Contents
1. Production Systems and ACT
2. Knowledge Representation
3. Spread of Activation
4. Control of Cognition
5. Memory for Facts
6. Procedural Learning
7. Language Acquisition
Notes
References
Index of Authors
General Index
To Russ and J.J., from whom I have learned much about the evolution and development of cognition
Copyright © 1983 by the President and Fellows of Harvard College
All rights reserved
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
This book is printed on acid-free paper, and its binding materials have been chosen for strength and durability.
Library of Congress Cataloging in Publication Data
Anderson, John Robert, 1947-
The architecture of cognition.
(Cognitive science series; 5)
Bibliography: p.
Includes index.
1. Cognition-Data processing. 2. Human information processing. 3. Digital computer simulation. I. Title. II. Series.
BF311.A5894 1983    153    82-21385
ISBN 0-674-04425-8
Preface
In the mid-1960s, when I was an undergraduate, three of the active areas of research in psychology were learning theory, psycholinguistics, and cognitive psychology. At that time there was no coherent connection among the three, but the question of how language could be acquired and integrated with the rest of cognition seemed interesting. However, there was no obvious way to tackle the question because the field just did not have the relevant concepts. There were the options of pursuing a graduate career in learning theory, cognitive psychology, or psycholinguistics. I chose cognitive psychology and I believe I chose wisely (or luckily). When I went to Stanford in 1968, Gordon Bower assigned me a hot issue of the time: to understand the categorical structure in free recall. As we analyzed free recall it became clear that we needed a complete model of the structure of human memory, with a particular focus on meaningful structure. Bower and I worked on this for a number of years, first developing the model FRAN (Anderson, 1972), then HAM (Anderson and Bower, 1973). Through this work we came to appreciate the essential role of computer simulation in developing complex models of cognition. HAM, the major product of that effort, was a complete model of the structures and processes of human memory, having as its central construct a propositional network representation. It successfully addressed a great deal of the memory literature, including a fair number of experiments on sentence memory that we performed explicitly to test it. Critical to predicting a particular experimental phenomenon was deciding how to represent the experimental material in our system. Usually our intuitions about representation did lead to correct predictions, but that was not a very satisfactory basis for
a complete prediction. The memory system had to be understood as a component of a more general cognitive system. The reason for choosing a particular representation lay in the properties of the general system. For instance, the choice of a representation for a sentence depended on the operation of a natural-language parser, which, unlike the facts represented in HAM, is a skill. There is a fundamental distinction between declarative knowledge, which refers to facts we know, and procedural knowledge, which refers to skills we know how to perform. HAM, the theory of the declarative system, was incomplete because the representation chosen in the declarative system depended on the operations of components of the procedural system such as the parser.

The first step to developing a more complete theory was identifying an appropriate formalism to model procedural knowledge. My first effort was to use the augmented transition networks (ATNs), which had seemed so appropriate for natural-language processing. A theory of language acquisition was developed in that framework. However, ATNs proved to be at once too restrictive as computational formalisms and too powerful as models of human cognition. These difficulties led me to consider the production-system models Allen Newell had been promoting, which had some similarities to ATNs. I have to admit that at first, when I focused on surface features, production systems seemed unattractive. But as I dug deeper, I became more and more convinced that they contained key insights into the nature of human procedural knowledge.

The ACT system, the product of my three years at Michigan, 1973-1976, and described in Anderson (1976), was a synthesis of the HAM memory system and production-system architecture. In this system a production system was used for the first time as an interpreter of a propositional network. ACT also borrowed the spreading activation concept from other researchers, such as Collins and Quillian. Its production system operated on a working memory defined by the active portion of its long-term memory network. In a computer simulation version of ACT, called ACTE, it was possible to "program" production sets that modeled various tasks. Much of my 1976 book describes such production sets that modeled various small tasks. However, although ACTE answered the question of where propositional representations came from (they came from the actions of productions), the natural question now was, where do productions come from? This led to the issue of a learning theory for production systems. Paul Kline, Charles Beasley, and I then developed ACTF, the first production system to contain an extensive theory of production acquisition. A viable version of ACTF was running in 1977.

After moving to Carnegie-Mellon in 1978, I spent the next four years trying to "tune" the ACT system and make it address language acquisition. My attempts produced some major reorganizations in the theory, including changes in the spreading activation mechanisms, a theory of production pattern matching, augmentations to the architecture to handle goals, and additions to the production-learning mechanisms, all of which are described in this book. Paul Kline and David Neves collaborated in the early part of this effort. In the later years I was guided by the A.C.T. research group, which has included Frank Boyle, Gary Bradshaw, Renee Elio, Rob Farrell, Bill Jones, Matt Lewis, Peter Pirolli, Ron Sauers, Miriam Schustack, and Jeff Shrager. I was able to finally apply ACT to language acquisition.

The new version of ACT is called ACT* (to be read ACT-star). The theory has evolved from the original concern with categorical free recall to address an ever-widening set of questions. Each theory along the way raised questions that the next one answered. Finally, the widening circles have expanded to encompass my original interest in learning theory, psycholinguistics, and cognitive psychology. While it is undoubtedly not the final theory, it achieves a goal set fifteen years earlier.

ACT* is a theory of cognitive architecture, that is, a theory of the basic principles of operation built into the cognitive system. ACT stands for Adaptive Control of Thought. It is worth reviewing what this title means and why it is apt. First, this theory concerns higher-level cognition or thought. A major presupposition in this book is that higher-level cognition constitutes a unitary human system. A central issue in higher-level cognition is control: what gives thought its direction, and what controls the transition from thought to thought. As will become apparent, production systems are directed at this central issue. A major concern for me has been to understand the principles behind the control of thought in a way that exposes the adaptive function of these principles. Ultimately, understanding adaptive function brings us to issues of human evolution, which are largely excluded from this book (but see Anderson, 1982c). It needs to be emphasized that production systems address the issue of control of cognition in a precise way that is relatively unusual in cognitive psychology. Other types of theoretical analyses may produce precise models of specific tasks, but
how the system sets itself to do a particular task in a particular way is left to intuition. In a production system the choice of what to do next is made in the choice of what production to execute next. Central to this choice are the conflict resolution strategies (see Chapter 4). Thus production systems have finally succeeded in banishing the homunculus from psychology.

In writing this book I have been helped by a great many people who provided comments on drafts. Gary Bradshaw, Bill Chase, Renee Elio, Ira Fischler, Susan Fiske, Jim Greeno, Keith Holyoak, Bill Jones, Paul Kline, Steve Kosslyn, Matt Lewis, Brian MacWhinney, Michael McCloskey, Allen Newell, Jane Perlmutter, Rolf Pfeifer, Steven Pinker, Peter Pirolli, Zenon Pylyshyn, Roger Ratcliff, Lance Rips, Paul Rosenbloom, Miriam Schustack, and Jeff Shrager have commented on one or more of these chapters. Robert Frederking, Jay McClelland, Ed Smith, and two anonymous reviewers have read the manuscript in its entirety. Eric Wanner has been an exceptional editor, reading and commenting on the entire book. Lynne Reder has gone through this book with me, word by word and idea by idea, exposing weaknesses and providing a number of the ideas developed here. The A.C.T. research group, as well as my "Thinking" classes, have gone over these ideas many times with me. Many people have worked hard to get the programs and experiments running. My graduate students, Gary Bradshaw, Renee Elio, Bill Jones, Matt Lewis, Peter Pirolli, and Miriam Schustack, have been valuable collaborators. Takashi Iwasawa and Gordon Pamm wrote countless experiment-running programs and data analyses; Barbara Riehle and Rane Winslow greatly assisted in testing subjects; and Takashi Iwasawa and Frank Boyle did much work on the geometry project. Two outstanding undergraduates, Rob Farrell and Ron Sauers, have been responsible for the GRAPES simulation of the acquisition of LISP programming skills. Among her many other duties, Rane Winslow had primary responsibility for preparing this manuscript, and I am deeply grateful to her for all of her resourcefulness and hard work. Monica Wallace was responsible for the last months of shepherding the book into publication.

Finally, I want to thank those government agencies who have contributed support to my research. Research grants from NIMH and NIE were critical in the development of ACTE and ACTF, and a current grant from the Information Sciences branch of NSF (IST-80-15357) has supported the geometry
project. My simulation work has taken place on the SUMEX facility at Stanford, supported by grant P41 RR-00785-08 from NIH; on Carnegie-Mellon's computer facilities, supported by our contract F33615-81-K-1539 from ARPA; and on Psychology's VAX, purchased through grant BNS-80-13051 from NSF. My language acquisition research is supported by a contract from ONR. I would like to express special appreciation to the Personnel and Training Research Program of ONR (Marshall Farr and Henry Halff) and to the Memory and Cognition Program at NSF (Joe Young). These two groups have been long-standing supporters of the ACT research and have provided the stable funding that a project of this scale needs; the current sources of support are N00014-81-C-0335 from ONR and BNS-82-08189 from NSF. Marshall Farr, Henry Halff, and Joe Young have also provided valuable input to the research.
ENCODING

The encoding of propositional representations is more abstract than that of temporal strings or spatial images in that the code is independent of the order of information. For instance, the propositional representation (hit John Bill) does not encode the difference between John hit Bill and Bill was hit by John. What it does encode is that the two arguments, John and Bill, are in the abstract relation of hitting. In encoding a scene of John hitting Bill, the propositional representation does not code who is on the left and who is on the right. Propositional encodings are abstract also in that they identify certain elements as critical and ignore all others. Thus the encoding of the scene may ignore all physical details, such as John's or Bill's clothing.
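The order-independence of the propositional code can be made concrete with a small sketch. The function below is purely illustrative (the role labels agent and object are assumptions, not ACT*'s actual notation): it encodes a proposition as a relation plus role-labeled arguments, so that active and passive surface forms reduce to the same structure while a genuine role reversal does not.

```python
def encode(relation, **roles):
    """Encode a proposition as a relation plus role-labeled arguments.

    A frozenset of (role, filler) pairs makes the encoding independent
    of the order in which the arguments happen to be listed.
    """
    return (relation, frozenset(roles.items()))

# Both surface sentences reduce to the same abstract proposition:
active = encode("hit", agent="John", object="Bill")    # "John hit Bill"
passive = encode("hit", object="Bill", agent="John")   # "Bill was hit by John"
assert active == passive

# Swapping the roles themselves yields a different proposition:
assert active != encode("hit", agent="Bill", object="John")
```

The design point mirrors the text: surface order is discarded, but who fills which slot of the abstract relation is preserved.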
One of the principal lines of empirical evidence for propositional representations comes from the various sentence memory studies showing that memory performance is better predicted by semantic variables than by the word structure of the original sentence. This research includes the long tradition of experiments showing that memory for the gist is better than memory for exact wording (Begg, 1971; Bransford and Franks, 1971; Sachs, 1967; Wanner, 1968) and experiments showing that the best prompts for recalling a particular word in a sentence are words that are semantically rather than temporally close (Anderson and Bower, 1973; Lesgold, 1972; Wanner, 1968). Similar demonstrations have been offered with respect to picture memory (Bower, Karlin, and Dueck, 1975; Mandler and Ritchey, 1977), showing that it is the underlying semantic relations that are predictive of memory. In reaction against this research, some experiments have demonstrated good verbatim memory for sentences (Graesser and Mandler, 1975) or good memory for visual detail that is not semantically important (Kolers, 1978). However, these observations are not troublesome for the multirepresentational position being advanced here, although they can be embarrassing for the pure propositional position that has been advanced. What is important for current purposes is that there are highly reproducible circumstances in which memory is good for the meaning of a stimulus event and not its physical details. To account for these situations it is necessary to propose a representation that extracts the significant semantic relationships from these stimuli. To account for situations that show good memory for detail, one can use the temporal string or image representation.

Another distinctive feature of abstract propositions is that there are strong constraints among the elements. Thus believe must have as one of its two arguments an embedded proposition. There is nothing like this with strings or images, which directly encode what is in the world, and any combination of elements is logically possible. One element of a string or image does not constrain what the other elements might be. Propositions, on the other hand, represent relational categorizations of experience, and the mind has learned only certain patterns.

Propositional representations have been particularly important to the development of ACT*. Historically, ACT* emerged from ACTE, which had only propositional representations, and therefore many of the empirical and theoretical analyses were concerned with these. Although I believe that the ACT architecture applies equally well to all knowledge representations, the majority of the analyses in subsequent chapters work with propositional representations.

As with the other representations, the true significance of the proposition lies in the way it is treated by the production system's abstract processes. Unlike the encoding processes for temporal strings or spatial images, the structure of an abstract proposition is not a direct reflection of environmental structure. Rather it reflects an abstraction of an event, and the encoding process itself is something that must be learned. With respect to language, each child must learn the process of comprehension (sometimes innocuously called a "parser") for his native language. Similar extraction processes must be at work in learning to interpret nonlinguistic experiences and to identify the meaningful invariances (innocuously called perceptual learning and concept acquisition). Because the propositional encodings are not direct reflections of external structure but are determined by experience, many of the representations proposed over the years have had a somewhat ad hoc character. Until the abstraction processes underlying the formation of perceptual and linguistic parsers are specified, there will be unwanted degrees of freedom in propositional representations, and they will remain as much a matter of intuition as of principle. The learning theory in Chapters 6 and 7 is a step in the direction of specifying how we learn to transform input into new higher-order representations.

Propositional representations, like strings and images, involve structure, category, and attribute information. Figure 2.10 illustrates how this information would be used to represent The tall lawyer believed the men were from Mars. A central node represents the propositional structure, and links emanating from it point to the various elements. The labels identify the semantic relationships.
The labeled network notation is appropriate because the links in a net are order-free, just as elements of a proposition are. The reader may recognize such a representation as the structure introduced by Rumelhart, Lindsay, and Norman (1972) in the early days of the LNR model. Many other, more complex network notations exist (Anderson and Bower, 1973; Norman and Rumelhart, 1975; Schank, 1972). Kintsch (1974) introduced a linear notation for network structure that is technically more tractable for large sets of propositions than a network representation like that in Figure 2.10. As with the other representational types, such notational differences are not significant. What is important is the information that the notation encodes and how that information is used. Choice of notation is a matter of convenience.
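The equivalence of network and linear notations can be sketched as follows. The node names and link labels below are illustrative, loosely following Figure 2.10 rather than reproducing it: a proposition is a central node with labeled, order-free links, and a Kintsch-style linear rendering of the same node carries identical information.

```python
# A labeled network: each node maps link labels to targets.
# Node names (P1, X1, ...) and labels are illustrative only.
network = {
    "P1": {                      # central propositional node
        "relation": "believe",
        "agent": "X1",           # the tall lawyer
        "object": "P2",          # an embedded proposition
    },
    "X1": {"category": "lawyer", "attribute": "tall"},
    "P2": {"relation": "from", "agent": "X2", "object": "Mars"},
    "X2": {"category": "man", "number": "plural"},
}

def linearize(node, net):
    """A linear notation for one node: sorted (label, target) pairs.
    Because the links are labeled, nothing is lost in translation."""
    return tuple(sorted(net[node].items()))

# Round trip: the linear form recovers exactly the network node.
assert dict(linearize("P1", network)) == network["P1"]
```

The point, as in the text, is that the choice between the two notations is a matter of convenience, not of content.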
Figure 2.10 A network encoding of the proposition The tall lawyer believed the men came from Mars (a central proposition node with links labeled ATTRIBUTE, CATEGORY, and so on).

ALL-OR-NONE STORAGE AND RETRIEVAL

According to ACT*, a proposition such as X believed Y, in Figure 2.10, is encoded and retrieved in an all-or-none manner. There has been a mild debate about this (see Anderson, 1980; Anderson and Bower, 1973; R. C. Anderson, 1974; Goetz, Anderson, and Schallert, 1981), and I have found myself on the other side of the issue, that is, proposing partial memory for propositions. The basic empirical research concerns subjects' memory for sentences in which it seems reasonable to assume that certain phrases convey basic propositions. For instance, The doctor shot the lawyer might be said to convey a basic proposition. Subjects cued with part of a proposition, such as the doctor, may recall only part of the remainder. Numerous theories have been developed to account for this partial recall (Anderson and Bower, 1973; Jones, 1978). One problem is that the degree of recall is much less than would be expected under some notions of chance. For instance, if we cue with the subject of the sentence, recall of the object is much greater if the verb is recalled than if it is not. Depending on the experiment, object recall is 60-95 percent if the verb is recalled, and 3-15 percent if it is not. This empirical evidence provides weak evidence at best on all-or-none memory for propositions, and elsewhere (Anderson, 1976, 1980) I have tried to identify the ambiguities. In contrast to the murkiness of the empirical picture, the evidence for an all-or-none system is quite clear from considerations of functional value within a production system. Our experience has been that it is not adaptive to store or retrieve partial propositions because partial information cannot be easily used in further processing, and its retrieval only clutters up working memory, or worse, misleads the information processing. It seems unlikely that an adaptive system would waste capacity on useless partial information. In this case, where the empirical evidence is ambiguous, our general framework can guide a decision.

Arithmetic facts may be the easiest examples for illustrating the impact of partial encoding. If we store the proposition (5 = (plus 3 2)) as (= (plus 3 2)), with the sum omitted in a partial encoding, the fact is of no use in a system. Whether partial encoding leads to disaster rather than just waste depends on one's system assumptions; if propositions encoded facts like (6 ≠ (plus 3 2)), imagine what a disaster the partial encoding (≠ (plus 3 2)) would lead to! The crisp semantics associated with arithmetic propositions makes clear the consequences of partial encoding. However, similar problems occur in an inferential system if (Reagan defeated Carter) is encoded as (defeated Carter) or if (Mary give Bill Spot tomorrow) is encoded as (Mary give Bill Spot). The complaint against the partial encoding scheme is not simply that errors are made because of the failure to encode information. It is transparent that humans do fail to encode information and do make errors as a consequence. It can be argued that failure to encode is actually adaptive in that it prevents the memory system from being cluttered. The occasional failure of memory may be worth the savings in storage and processing. This is a difficult argument to evaluate, but it does make the point that failure to encode may not be maladaptive. However, the result of partial encoding is maladaptive; the system still has to store, retrieve, and process the information, so there is no savings in processing or storage. The partial memory that is retrieved is no better than nothing and perhaps is worse. An adaptive system would either jettison such partial encodings if they occurred or at least refuse to retrieve them.
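The uselessness of a partial arithmetic proposition can be sketched directly. The mini-store below is an illustrative stand-in, not ACT*'s actual machinery: a complete fact answers the question "what is 3 + 2?", while a fact with the sum omitted cannot answer it at all, yet would still cost storage and retrieval.

```python
# Facts as tuples: ("equals", sum, ("plus", addend1, addend2)).
# The representation is an illustrative sketch, not ACT* notation.
COMPLETE = {("equals", "5", ("plus", "3", "2"))}   # 5 = 3 + 2
PARTIAL = {("equals", ("plus", "3", "2"))}         # sum omitted

def lookup_sum(addends, facts):
    """Return the stored sum for the given addends, if any fact is
    complete enough to supply one."""
    for fact in facts:
        if len(fact) == 3 and fact[0] == "equals" and fact[2] == ("plus",) + addends:
            return fact[1]
    return None

assert lookup_sum(("3", "2"), COMPLETE) == "5"
# The partial fact cannot answer the question; it is pure overhead.
assert lookup_sum(("3", "2"), PARTIAL) is None
```

This is the functional argument in miniature: the partial proposition is stored and scanned like any other, but contributes nothing to further processing.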
PATTERN MATCHING: SALIENT PROPERTIES

One of the salient properties of a propositional structure is its ability to detect that elements are connected before detecting how. The ability to make connectedness judgments shows up in a wide variety of experimental paradigms, but it would be useful to describe an experiment from my laboratory which had
as its goal simply to contrast judgments of connectedness with judgments of form. Subjects studied simple subject-verb-object sentences like The lawyer hated the doctor and then saw test sentences that matched exactly (The lawyer hated the doctor), that had subject and object reversed (The doctor hated the lawyer), or that had one word changed (The lawyer hated the sailor, The lawyer kicked the doctor). Two types of judgments were made. In the proposition-matching condition, subjects were asked to recognize if a test sentence conveyed the same meaning as the study sentence. In this condition they should respond positively to the first type of test sentence given above and negatively to the other two types. In the connectedness task, subjects were asked if all three words came from the same sentence. Thus, they should respond positively to the first two types of test sentences and negatively to the third type. All subjects responded more rapidly in the connectedness condition, indicating that access to information about whether concepts are connected is more rapid than is access to how they are connected.

Reder and Anderson (1980) and Reder and Ross (1981) also present evidence that subjects are able to make judgments about thematic relatedness more rapidly than about exact relationships. In those experiments subjects learned a set of thematically related facts about a person, for instance, a set of facts about John at the circus. Subjects could judge that a fact was consistent with what they had studied faster than they could judge whether it had been studied. For instance, they could judge that John watched the acrobats was consistent before they could decide whether they had studied that sentence. This paradigm will be considered further in Chapter 5.

In many memory experiments this rapid detection of connectivity can interfere with rejection of a foil. Collins and Quillian (1972) report that subjects find it difficult to reject Madrid is in Mexico because of the spurious connection. Glucksberg and McCloskey (1981) report that subjects find it easier to decide that they don't know a fact like John has a rifle if they have learned nothing connecting John and rifle than if they have explicitly learned the fact It is not known whether John has a rifle. Anderson and Ross (1980) showed that subjects were slower to reject A cat is a snake if they had learned some irrelevant connecting fact like The cat attacked the snake. King and Anderson (1976) report a similar effect in an experiment in which subjects retrieved experimentally learned facts. These similarity effects will be discussed further in Chapter 3.
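The contrast between the two judgments in the experiment above can be sketched as follows. The representation is an illustrative assumption, not the actual model: the connectedness task consults only which concepts are linked (an order-free set test), while proposition matching must also check how they are connected.

```python
# Studied sentence as (subject, verb, object).
studied = ("lawyer", "hated", "doctor")

def connected(words, sentence):
    """Connectedness task: do all the probe words come from the
    sentence? Order-free, so it ignores HOW concepts are linked."""
    return set(words) <= set(sentence)

def same_meaning(words, sentence):
    """Proposition-matching task: must also respect which concept
    fills which role, so order matters."""
    return tuple(words) == sentence

# Exact match: positive on both tasks.
assert connected(("lawyer", "hated", "doctor"), studied)
assert same_meaning(("lawyer", "hated", "doctor"), studied)

# Subject and object reversed: still connected, meaning differs.
assert connected(("doctor", "hated", "lawyer"), studied)
assert not same_meaning(("doctor", "hated", "lawyer"), studied)

# One word changed: negative on both tasks.
assert not connected(("lawyer", "hated", "sailor"), studied)
```

The set test requires strictly less information than the structural test, which is one way to gloss why connectedness judgments were faster.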
PATTERN MATCHING: DEGREE OF MATCH
There is a natural tendency to think of propositions and word strings as being the same, a single, verbal representation (Begg and Paivio, 1969). However, the strong ordinal metric properties in the pattern matching of word strings are not found with propositions. In particular, as discussed earlier, there is a strong dependence on matching first elements in word strings. The same phenomenon is not found with propositions (Dosher, 1976). In unpublished research in our laboratory we have had subjects memorize in a meaningful way sentences like The lawyer helped the writer. They were then presented with a set of words and were asked to recognize the sentence that these words came from. We found they were just as fast to recognize the probe Writer helped lawyer as Lawyer helped writer. Thus the memory for meaningful information does not show the same order dependence as the memory for meaningless strings of items.

CONSTRUCTION OF PROPOSITIONS

As with images and strings, propositional structures can be created by combining either primitive elements or elements that are structures. However, the relational structure imposes a unique condition on proposition construction: the relation takes a fixed number of slots, no more and no less. When a relation is constructed but not all the arguments specified, we fill in the missing arguments. Thus if we hear "Fred was stabbed," we cannot help but fill in a dummy agent. The various proposals for propositional systems differ in the richness proposed for default slots and for inference procedures to fill these slots. One feature that often accompanies proposals for "semantic decomposition" (Schank, 1972; Norman and Rumelhart, 1975) is a rich system for inferring the occupants of various slots. However, by their nature all propositional systems require some default system. The notion of a missing slot that must be filled in with a default value is not meaningful for images or strings.
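The fixed-slot constraint and default filling can be sketched in a few lines. The relation definitions and the dummy filler below are illustrative assumptions: each relation fixes its slots, and any slot left unspecified at construction time is filled with a dummy value.

```python
# Each relation takes a fixed number of slots, no more and no less.
# The relation table and the "SOMEONE" dummy are illustrative only.
ARITY = {"stab": ("agent", "object")}

def construct(relation, **given):
    """Build a proposition, filling unspecified slots with a dummy."""
    slots = ARITY[relation]
    assert set(given) <= set(slots), "a relation admits no extra slots"
    return {slot: given.get(slot, "SOMEONE") for slot in slots}

# "Fred was stabbed" specifies no agent, so a dummy agent is supplied.
p = construct("stab", object="Fred")
assert p == {"agent": "SOMEONE", "object": "Fred"}
```

A richer system, in the spirit of semantic decomposition proposals, would replace the single dummy with inference procedures that guess plausible slot fillers; the sketch shows only the minimal default mechanism every propositional system needs.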
FUNCTION OF A PROPOSITIONAL CODE
The distinctive properties of propositions derive from their abstract, setlike structure. People learn from experience which aspects or higher-order properties of an event are significant, and to represent these they develop a code, which is more direct and efficient than storing the details. Rather than representing all the pieces of information that enable one to infer that
A has thrown a ball (A raised his hand over his head, A's hand held a round object, A's hand thrust forward, A's hand released the ball, the ball moved forward, etc.) or the exact words of the sentence that was parsed into this meaning, the code represents the significant relationship directly. The propositional representation does yield economy of storage in long-term memory, but other advantages are probably more significant. For instance, the representation will occupy less space in working memory and will not burden the pattern matcher with needless detail. Thus, it often will be easier to manipulate (think about) these abstracted structures. Another advantage is that the inferential rules need be stored only for the abstract relation and not separately for all the types of input that can give rise to that relation.

Storage and Retrieval

In ACT* theory storage and retrieval are identical for the three representational types. A unit (phrase, image, or proposition) is treated as an unanalyzed package. Its internal contents are not inspected by the declarative processes, so these processes have no basis for responding to a phrase differently than to a proposition. It is only when units are operated upon by productions that their contents are exposed and processing differences occur. This is a fundamental difference between declarative memory, which treats all types the same, and production memory, which treats them differently. If ACT* is correct in this hypothesis, traditional memory research (for example, Anderson and Paulson, 1978; Paivio, 1971), which attempted to find evidence for different types, is doomed to failure because it is looking at declarative memory. Traditional research has been used to argue for different types by showing better memory for one type of material, but one can argue that the research is better explained in terms of differential elaboration (Anderson, 1976, 1980; Anderson and Bower, 1973; Anderson and Reder,
1979).

The term cognitive unit (Anderson, 1980) is used for structures that have all-or-none storage and retrieval properties. By this definition, all three types are cognitive units. Because of limits on how much can be encoded into a single unit, large knowledge structures must be encoded hierarchically, with smaller cognitive units embedded in larger ones. It has been suggested (Broadbent, 1975) that the limits on unit size are related to the limits on the ability to access related information in working memory. For a unit to be encoded into long-term memory, all of the elements must be in working memory and the system must be able to address each element. Broadbent notes that the number of elements in a chunk corresponds to the number of values one can keep separate on physical dimensions (for instance, the number of light intensities that one can reliably label). He suggests that problems with larger chunks might be "discrimination" problems in identifying the locations of the individual elements in working memory.

One can retrieve elements from a hierarchical structure through a top-down process, by starting with the top structure, unpacking it into its elements, and unpacking these, and so on. Similarly, it is possible to retrieve elements in a bottom-up manner by starting with a bottom node, retrieving its embedding structure, then the structure that embeds it, and so on. These steps can fail either because the unit to be retrieved was not encoded or because it cannot be retrieved. Figure 2.11 presents a hypothetical hierarchical structure, with twenty-seven terminal elements, in which certain units (marked with an X) are unavailable for recall. Using a top-down search, it would be possible to retrieve A, B, and C from the top structure; D, E, and F from A; 1, 2, and 3 from D; and 4, 5, and 6 from E. The structures from F and B are not available; J, K, and L can be retrieved from C; the structures from J and K are not available, but 25, 26, and 27 are available from L. (This retrieval process can be driven by spreading activation, which will be described in Chapter 3.) Thus, although each individual act of retrieval is all-or-none, only nine terminal elements are retrieved from the twenty-seven-element terminal array. Also
Figure 2.11  A hypothetical hierarchical encoding in which the boxed units cannot be retrieved.
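The top-down retrieval process described above can be sketched in a few lines of code. The tree below is a hypothetical illustration in the spirit of Figure 2.11, not its exact structure; each nonterminal unit either unpacks into its elements or is unavailable as a whole:

```python
# All-or-none top-down retrieval from a hierarchical encoding.
# The tree and the set of unavailable units are hypothetical examples.

hierarchy = {
    "top": ["A", "B", "C"],
    "A": ["D", "E"], "B": ["F"], "C": ["K"],
    "D": [1, 2, 3], "E": [4, 5, 6], "F": [11, 12], "K": [25, 26, 27],
}
unavailable = {"B", "E"}      # units whose encodings cannot be retrieved

def recall(unit):
    """Return the terminal elements reachable by top-down unpacking."""
    if unit in unavailable:
        return []             # the whole unit is lost: retrieval is all-or-none
    recalled = []
    for element in hierarchy.get(unit, []):
        if isinstance(element, int):
            recalled.append(element)          # terminal element
        else:
            recalled.extend(recall(element))  # unpack an embedded unit
    return recalled

print(recall("top"))   # [1, 2, 3, 25, 26, 27]: B's and E's subtrees are lost
print(recall("F"))     # [11, 12]: cuing a surviving fragment recovers its elements
```

Losing unit B costs all of F's elements in free recall, yet cuing the surviving fragment F directly still yields 11 and 12, which parallels the cuing result discussed in the text.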
note that if cued with 10, the subject would be able to retrieve the fragment F and hence the elements 11 and 12 but nothing else of the hierarchy.⁵ Such hierarchical retrieval would produce the phrase patterns documented for linear strings (Johnson, 1970), propositional structures (Anderson and Bower, 1973), and story structures (Owens, Bower, and Black, 1979; Rumelhart, 1978). To my knowledge no one has explored this issue with respect to picture memory, but it would be surprising if such hierarchical structures were not also found in recall there.

If one could segment a structure to be recalled into its hierarchical units, one should see all-or-none recall for the separate units. The empirical phenomenon is never that clear-cut, partly because subjects do not adopt entirely consistent hierarchical encoding schemes. The hierarchical structure of their scheme might differ slightly from the one assumed by the experimenter. A more important reason for deviation from hierarchical all-or-none recall is that the subject may produce elaborations that deviate from the specified hierarchy. For example, consider a subject's memory for the multiproposition sentence The rich doctor greeted the sick banker. A typical propositional analysis (Kintsch, 1974; Goetz, Anderson, and Schallert, 1981) would assign rich and doctor to one unit, and sick and banker to another. However, a subject, elaborating mentally on what the sentence means, might introduce direct memory connections between rich and banker or between doctor and sick. Then, in recall, he may recall sick and doctor but not rich and banker, violating the expected all-or-none pattern. The complications produced by such elaborations are discussed in Anderson (1976).

Mixed Hierarchies and Tangled Hierarchies

To this point the discussion has largely assumed that hierarchies consist of units of the same representational type. However, representations can be mixed, and there is considerable advantage to this flexibility. If one wanted to represent John chanted "One, two, three," it is more economical to represent the object of John's chanting as a string. That is, the string would appear as an element of a proposition. Again, to represent the sequence of events at a ball game, one might want a linear ordering of a sequence of propositions describing the significant events. Strings and images would be mixed to represent a spatial array of nonsense syllables or a sequence of distinct images. One would use a mixture of images and propositions to encode comments about pictures or the position of various semantically significant objects without encoding the visual details of the object.

This discussion of hierarchies has assumed that a particular element or subhierarchy appears in only one hierarchy, but much of the expressive power of the system derives from the fact that hierarchies may share subhierarchies, creating tangled hierarchies. For instance, the same image of a person can appear in multiple propositions encoding various facts about him. Anderson and Bower (1973, chap. 9) showed that subjects had considerable facility at sharing a subproposition that participated in multiple embedding propositions. Hierarchies can overlap in their terminal nodes also, as in the case of two propositions connected to the same concept. Figure 2.12 shows a very tangled hierarchy inspired by the script from Schank and Abelson (1977, pp. 43 and 44). Note that the central structure is a hierarchical string of events with various propositions and images overlaid on it. In general, what Schank and Abelson refer to as a script corresponds to a temporal string structure setting forth a sequence of events.⁷ This structure is overlaid with embellishing image, propositional, and string information. In Figure 2.12 a string encodes the main elements of the restaurant sequence (enter, order, eat, and exit), another string unpacks the sequence involved in entering, an image unpacks the structure of a table, and so on. Schank's most recent proposal of MOPs (1980) comes closer to the generality of this tangled hierarchy concept.
Knowledge Representation
Figure 2.12  A tangled hierarchy of multiple representational types.
Final Points
Table 2.1 summarized the features that distinguish the three types of representation: they encode different information and have different pattern-matching and construction principles. One might question whether these processes are really distinct. To consider a wild but instructive example, suppose someone proposed the following "propositional" model to account for distance effects in judging relative position in a linear ordering. Each object is given a propositional description that uniquely identifies its position. So the string ABCDEFGH might be encoded as follows: "A's position is 0 followed by 0 followed by 0," "B's position is 0 followed by 0 followed by 1," and so on, with each position encoded in binary by a sequence of three propositions specifying the three digits. To judge the order of two items, a subject would have to retrieve the binary encodings and then scan them left to right until a first mismatching digit was found. Then a judgment could be made. The farther apart the items, the fewer propositions would have to be scanned on the average to find a mismatch.

One could make numerous challenges to this proposal, but I will focus on one. The time necessary to make linear-order judgments (often less than a second) is less than the time to chain through three propositions in memory (seconds; see the experiment by Shevell and Anderson in Anderson, 1976, p. 366). Thus one cannot get the temporal parameters right for this hypothetical propositional model. This illustrates an important constraint that blocks many creative attempts to reduce one process to another, supposedly more basic process: the times for the basic processes must add up to the time for the reduced process.

Despite the fact that the three representational types clearly have different processes working on them, I have used the same basic network notation for all three, with structure, attribute, and category information.
The fact that similar notation can be used for distinct types reflects the fact that the notation itself does not contain the significant theoretical claims. Finally, I should stress that the production system framework makes it easy to communicate among the representational types. The condition and action of a production can specify different types of structures. Also, the condition of a production can match working memory elements of one type, and the action can create working memory elements of another type. For instance, in reading one can imagine matching first the spatial image code, converting this to a string code, then converting this to a propositional code.
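The hypothetical binary-position model criticized under Final Points is easy to make concrete. This sketch is my own rendering of the thought experiment, not a model anyone endorses: each of the eight items gets a three-digit binary position code, and order is judged by scanning for the first mismatching digit.

```python
# Hypothetical "propositional" model of linear-order judgments: each item's
# position is encoded as three binary digits, and order is judged by
# scanning left to right for the first mismatching digit.

items = "ABCDEFGH"
# e.g. code["A"] == (0, 0, 0) and code["H"] == (1, 1, 1)
code = {item: tuple(int(bit) for bit in format(i, "03b"))
        for i, item in enumerate(items)}

def judge_order(x, y):
    """Return (earlier item, number of digits scanned to find a mismatch)."""
    for depth, (dx, dy) in enumerate(zip(code[x], code[y]), start=1):
        if dx != dy:
            return (x if dx < dy else y, depth)
    raise ValueError("identical positions")

# The farther apart two items are, the earlier (on average) the mismatch:
print(judge_order("A", "H"))  # ('A', 1): mismatch on the first digit
print(judge_order("C", "D"))  # ('C', 3): adjacent items need all three digits
```

The scan depth falls with distance, reproducing the distance effect; the model fails only on timing, since each digit comparison here stands in for a propositional retrieval that would take on the order of seconds.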
Appendix: Production Set for Mental Rotation

Table 2.2 provides a production set that determines whether two Shepard and Metzler figures are congruent. This production set is much more general than the Shepard and Metzler task, however; it will decide whether any pair of simultaneously presented connected figures is congruent after rotation. Figure 2.13 illustrates the flow of control produced by the production system among the subgoals of the task. This production set assumes that the subject uses the stimuli as an external memory and is internally building an image in working memory. Figure 2.9, earlier, indicated where attention is focused in external memory and what is currently being held in internal working memory at various points during the correspondence.
Table 2.2  A production system for rotating Shepard and Metzler figures

P1   IF the goal is to compare object 1 to object 2
     THEN set as the subgoal to create an image of object 2 that is congruent to object 1.

P2   IF the goal is to create an image of object 2 that is congruent to object 1
        and part 1 is a part of object 1
     THEN set as a subgoal to create an image of a part of object 2 corresponding to part 1.

P3   IF the goal is to create an image of a part of object 2 corresponding to part 1
        and part 2 is an untried part of object 2 in locus A
        and part 1 is in locus A of object 1
     THEN set as a subgoal to create an image of part 2 that is congruent to part 1
        and tag part 2 as tried.

P4   IF the goal is to create an image of object 2 that is congruent to object 1
        and object 2 has no subparts
     THEN build an image of object 2
        and set as a subgoal to make the image congruent to object 1.

P5   IF the goal is to make image 1 congruent to object 2
        and image 1 and object 2 do not have the same orientation
        and the orientation of object 2 is less than 180° more than the orientation of image 1
     THEN rotate image 1 counterclockwise.

P6   IF the goal is to make image 1 congruent to object 1
        and image 1 and object 1 have the same orientation
        and image 1 and object 1 do not match
     THEN POP with failure.

P7   IF the goal is to make image 1 congruent to object 1
        and image 1 and object 1 match
     THEN POP with success
        and record that image 1 is congruent to object 1.

P8   IF the goal is to create an image of object 2 that is congruent to object 1
        and an image is congruent to object 1
     THEN POP with that image as the result.

P9   IF the goal is to create an image of object 2 that is congruent to object 1
        and no congruent image was created
     THEN POP with failure.

P10  IF the goal is to create an image of a part of object 2 corresponding to part 1
        and an image is congruent to part 1
     THEN POP with that image as the result.

P11  IF the goal is to create an image of a part of object 2 corresponding to part 1
        and there are no more candidate parts of object 2
     THEN POP with failure.

P12  IF the goal is to create an image of object 2 that is congruent to object 1
        and there is an image of part 2 of object 2 that is congruent to part 1 of object 1
        and part 3 of object 1 is attached to part 1
        and part 4 of object 2 is attached to part 2
     THEN build an image of part 4
        and set as the subgoal to attach to the image of part 2 this image of part 4 so that it is congruent to part 3.

P13  IF the goal is to attach image 2 to image 1 so that image 2 is congruent to part 3
        and image 1 is an image of part 1
        and image 2 is an image of part 2
        and part 2 is attached to part 1 at locus A
     THEN attach image 2 to image 1 at locus A
        and set as a subgoal to test if image 2 is congruent to part 3.

P14  IF the goal is to test if an image is congruent to an object
        and the image and the object match
     THEN POP with success.
Table 2.2 (continued\ P15
P16
Pt7
P19
P20
IF the goal is to test if an image is congruent to a Part and the image and the obiect do not match THEN POP with failure. IF the goal is to attach image 1 to image 2 so that it is congruent to a Part and a subgoalhas resulted in failure THEN POP with failure. IF the goal is to create an image of a part of obiect 2 corresPonding to Part 1 and part 2 is an untried part of obiect 2 THEN set ai a subgoal to createan image of part 2 that is congment to Part 1 and tag part 2 as tried. IF the goal is to attach to image 1 image 2 so that it is congruent to Part 3 and this has been successfully done THEN POP with the combined image 1 and image 2 as a result. IF the goal is to create an image of object 1 that is congruent to obiect 2 and obiect 2 is not Primitive and a successfulimage has been synthesized THEN that image is of obiect 1 and it is congruent to obiect 2 and POP with that image as the result' IF the goal is to createan image of object 2 that is congruent to obiect 1 andanimagelofpart2ofobject2hasbeencreated and the image is. congruent to Part 1 of obiect 1 and part 1 is attached to Part 3 and part 2 is attached to Part 4 and Part 4 is not Primitive THEN set as a subgoal lo attach images of primitive parts of par t 4t ot heim agesot hat t heyar econgr uent t opar t 3. IF the goal is to attach images of primitive parts of obiect 2 to image 1 so that they are congment to obiect 1 and part 2 is a primitive part of obiect 2 and Part 2 is attached to obiect 4 and image 1 is an image of obiect 4 and image 1 is congruent to obiect 3 and part 1 is attached to obiect 3 and part 1 is a primitive part of obiect 1 THEN build an image 2 of Patt 2 and set as the subgoalto attachimage2 to image 1 so that it is congruent to Palt 1.
Table 2.2 (continued)

P22  IF the goal is to attach images of primitive parts of object 2 to image 1 so that they are congruent to object 1
        and image 2 has been created of part 2 of object 2
        and part 2 is attached to part 4 of object 2
        and image 2 is congruent to part 1 of object 1
        and part 1 is attached to part 3 of object 1
     THEN build image 3 of part 4
        and set as the subgoal to attach image 3 to image 2 so that it is congruent to part 3.

P23  IF the goal is to attach primitive parts of object 2 to image 1 so that they are congruent to object 2
        and all the primitive parts have been attached
     THEN POP with the result being the synthesized image.
P24  IF the goal is to compare object 1 and object 2
        and an image of object 2 has been created congruent to object 1
     THEN POP with the conclusion that they are congruent.

Figure 2.13  Flow of control among the subgoals of the mental rotation task.
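The goal/subgoal control structure that Table 2.2 implements can be sketched in miniature. This is a deliberate simplification of my own: it omits the rotation step and the untried-part bookkeeping, represents objects as nested tuples, and keeps only the recursive create-subimage and POP-with-success-or-failure skeleton.

```python
# Minimal sketch of the goal/subgoal control behind Table 2.2 (hypothetical
# simplification): each goal either succeeds ("POP with success") or fails
# ("POP with failure"), and composite goals spawn subgoals for their parts.

def create_congruent_image(obj1, obj2):
    """Try to build an image of obj2 congruent to obj1.
    Objects are nested tuples; leaves are primitive features."""
    if not isinstance(obj2, tuple):               # object has no subparts
        return obj2 if obj1 == obj2 else None     # match test: success/failure
    if not isinstance(obj1, tuple) or len(obj1) != len(obj2):
        return None                               # POP with failure
    image = []
    for part1, part2 in zip(obj1, obj2):          # subgoal per corresponding part
        sub = create_congruent_image(part1, part2)
        if sub is None:
            return None                           # a subgoal failed: POP failure
        image.append(sub)                         # attach the part image
    return tuple(image)                           # POP with synthesized image

def congruent(obj1, obj2):
    """Compare two objects by trying to build a congruent image."""
    return create_congruent_image(obj1, obj2) is not None

print(congruent(("a", ("b", "c")), ("a", ("b", "c"))))  # True
print(congruent(("a", ("b", "c")), ("a", ("b", "d"))))  # False
```

The recursion mirrors how the productions push subgoals for parts and pop back up with the combined image or a failure signal.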
3  Spread of Activation

Introduction
ONE OF THE KEY FACTORS in human intelligence is the ability to identify and to utilize the knowledge that is relevant to a particular problem. In the ACT theory, spreading activation serves a major role in that facility. Activation controls the rate at which information is processed by the pattern matcher for production conditions. Since information can have an impact on behavior only by being matched in the condition of a production, activation controls the rate of information processing. It is the "energy" that runs the "cognitive machinery." Activation spreads through the declarative network along paths from original sources to associated concepts. A piece of information will become active to the degree that it is related to current sources of activation. Thus, spreading activation identifies and favors the processing of information most related to the immediate context (or sources of activation).

There are numerous reasons for believing in a spreading activation mechanism. One is that it corresponds to our understandings of neurophysiology. The neurons and their connections can be thought of as a network. The rate of firing of a neuron can be thought of as the level of activation. (It is believed [Hinton and Anderson, 1981] that neurons encode information by the rate of firing.) However, there are many ways to make a correspondence between neural constructs and the cognitive constructs of a spreading activation model, and there is no compelling basis for deciding among them. Rather than a node-to-neuron correspondence, it may be more reasonable for nodes to correspond to sets of neurons (Hinton and Anderson, 1981). Nonetheless, the general set of
"neurophysiological hunches" that we possess is probably an important consideration in many people's acceptance of spreading activation.

It is hard to pinpoint the intellectual origins of the idea of spreading activation. In part it was anticipated in associationist models of thought going back to Aristotle (see Anderson and Bower, 1973, for a discussion of the history of associative models). The process of tracing chains of connections can be found in early experimental psychology programs, in Freud's psychodynamics, and in Pavlov's ideas. These models were largely serial, with one thought leading to just one other. On the other hand, the neural network models of the 1940s and 1950s (for example, Hebb, 1949) stressed parallel interactions among many elements. The work of Quillian (1969) was important to the current resurgence of work on spreading activation, and he is probably responsible for the use of the term. His major contribution was to relate this idea to the growing understanding of symbolic computation and to suggest how it might be used to facilitate the search of semantic networks. Collins and Quillian (1969, 1972) popularized an application of this construct to retrieval of categorical facts (for example, a canary is a bird), leading to what was once called "the semantic memory paradigm."

Currently, two major research paradigms are important to understanding spreading activation. One is the priming paradigm (Meyer and Schvaneveldt, 1971), concerned with the effect that presenting an item has on the processing of associated items. For instance, recognition of the word dog is facilitated if it is preceded by the word cat. These results provide direct evidence that processing of items can be primed through association. The literature on fact retrieval (Anderson, 1974) studies subjects' ability to retrieve previously studied information (such as whether they recognize The lawyer is in the bank).
Both literatures will be discussed in detail in later sections of the chapter.

Functions of Activation

Activation measures the likelihood that a particular piece of knowledge will be useful at a particular moment. It is a reasonable heuristic that knowledge associated with what we are processing is likely to be relevant to that processing. Spreading activation is a parallel mechanism for spreading measures of associative relevance over the network of knowledge. These
measures are used to decide how to allocate resources to later, more time-consuming processes, such as pattern matching, that operate on the knowledge. These later processes will devote more resources to the more active knowledge on the heuristic assumption that this is likely to be the most useful. This was the basis of Quillian's argument for why spreading activation was a good artificial intelligence mechanism. It avoided or reduced the costly knowledge search processes that can be the pitfall of any artificial intelligence project with a large data base. Unfortunately, the process of computing the spread of activation is computationally expensive on standard computers. On serial computers, time to compute spread of activation tends to increase with the square of the number of nodes in the spreading process. Fahlman (1981) has tried to develop parallel machinery that will reduce the computational cost. Until such machinery becomes available, this apparently good idea will not see extensive use in pure artificial intelligence applications (applied knowledge engineering). However, it is reasonable to propose that it is relatively cheap for the brain to spread activation but relatively expensive for it to perform symbolic computation.

Activation does not directly result in behavior. At any point of time a great deal of information may be active, but an active network is like Tolman's rat buried in thought (Guthrie, 1952, p. 143). There must be processes that convert this activation into behavior. A serious gap in much of the theorizing about spreading activation is that these processes have not been specified. A major strength of the ACT production system is that one is forced to specify these interfacing processes. In ACT*, activation has an impact on behavior through the production system's pattern matcher, which must determine whether a set of knowledge structures in the network fits the pattern specified by a production condition.
The level of activation of the knowledge structures determines the amount of effort that goes into making the correspondence. If the structure is inactive, no attempt is made to match it. If its activation level is low, the rate of matching may be so slow that the system follows some other course of behavior before the pattern matcher can completely match the structure. If it is sufficiently active to lead to a production execution, the activation level will determine the speed of the pattern matching and hence of the execution.

In the standard terminology of cognitive psychology, spreading activation defines working memory for ACT. Because there are degrees of activation, ACT's concept of working memory is one of degrees. The process of spreading activation amounts to
the process of bringing information into working memory, thus making it available.

In this chapter I will first discuss the properties of spreading activation in the ACT theory, explain how spreading activation affects behavior, and present arguments for its general conception. Then I will consider some evidence from the priming and recognition literatures to determine whether ACT is compatible with the results. Finally, I will consider the relation between ACT's spreading activation and traditional concepts of working and short-term memory.
Mechanisms of Activation

Activation flows from a source and sets up levels of activation throughout the associative network. In ACT, various nodes can become sources. In this section I will specify how nodes become and stay sources, how activation spreads through the network, and how levels of activation in the network and rate of production firing are related. Because this section uses quite a lot of mathematical notation, Table 3.1 provides a glossary.

Sources of Activation
There are three ways in which an element can become a source of activation in working memory. First, an element that encodes an environmental stimulus will become a source. For example, if a word is presented, its memory representation will be activated. Second, when a production executes, its actions create structures that become sources of activation. Third, a production can focus the goal element on a structure in working memory, and the elements of such a focused structure can become sources of activation. To illustrate the first and second ways, consider someone reading and recognizing the sentence In the winery the fireman snored. At a low level of processing, the encodings of individual letter strokes (features) would appear as active entities in working memory. At a second level, letter- and word-recognition productions would recognize patterns of these features as words and deposit in working memory active encodings of their word interpretations (not unlike the word recognition model of McClelland and Rumelhart, 1981, or the model in Chapter 4). At this level a perceptual interpretation of an event is being produced by a production action. After the second level, productions would recognize the sentence encoding and deposit an active representation of the response code in working memory.¹
Table 3.1  Glossary of notation

a          Level of activation of a node
a_i        Level of activation of node i
a_i(t)     Level of activation of node i at time t
A          Vector giving level of activation of all nodes in network
A(t)       Vector giving activation of nodes at time t
A_i(t)     Vector giving the activation at time t supported by source i
B          Strength of coupling between a node and its input
c^*        Amount of source activation
c_i^*      Amount of source activation for node i
c_i^*(t)   Amount of source activation for node i at time t
C^*        Vector giving source activations of all nodes in network
C^*(t)     Vector giving source activations of nodes at time t
c          Net source activation, defined as c = Bc^*/p^*
c_i        Net source activation of node i
C          Vector giving net source activation of all nodes in network
\delta t   Delay in transmission of activation between nodes
\Delta t   Period of time that a node will stay a source without maintenance
n_i(t)     Total activation input to node i at time t
N(t)       Vector giving input to all nodes at time t
p^*        Rate of decay of activation
\rho       Amount of activation maintained in spread, defined as \rho = B/p^*
r_{ji}     Strength of the connection from node j to node i
R          Matrix of the connecting strengths between the nodes in the network
s_i(T)     Total activation over the nodes in the network maintained by source i after i has been a source for T seconds
The elements entered into working memory (letter features, words, and so on) all start out as sources of activation. However, unless focused, these elements stay active only a period of time \Delta t. Thus a great deal of information can be active at any one time, but only a relatively small amount will be maintained active. This is like the conception of short-term memory sketched out in Shiffrin (1975). Sources of activation are not lost, however, if they are encodings of elements currently being perceived. Even if the node encoding the perceptual object did become inactive, it would be immediately reactivated by reencoding. An element that is made a source by direct perception can become inactive only after it is no longer directly perceived. Also, the current goal element serves as a permanent source of activation.

Cognitive Units Provide Network Connections

As discussed in Chapter 2, declarative memory consists of a network of cognitive units. For instance, two elements like doctor and bank can be connected just by appearing in the same propositional structure (The doctor is in the bank). Figure 3.1 gives an example of a piece of long-term memory in which unit nodes connect element nodes. Two units in temporary active memory encode the presented sentence The doctor hates the lawyer who is in the bank. Nodes corresponding to the main concepts are sources of activation. From these sources activation spreads throughout the network. For purposes of understanding spread of activation, we can think of long-term memory as a network of recognition nodes, with no distinction between unit nodes and element nodes.²

Figure 3.1  A hypothetical network structure. In temporary active memory is encoded The doctor hates the lawyer who is in the bank. In long-term memory are illustrated the facts The doctor hates the priest, The lawyer hates the judge, The lawyer was in the court on Friday, The sailor is in the bank, The bank was robbed on Friday, and The sailor was robbed on Friday.
Spread of Activation

The network over which activation spreads can be represented as an n x n matrix R with entries r_{ij}. The value r_{ij} = 0 if there is no connection between i and j; otherwise, its value reflects the relative strength of connection from i to j. A constraint in the ACT theory is that \sum_j r_{ij} = 1. This guarantees that the activation from node i is divided among the attached nodes according to their strengths of connection. The level of activation of a node will increase or decrease as the activation coming into it from various sources increases or decreases. The activation of a node varies continuously with increases or decreases in the input. A bound is placed on the total activation of a node by a decay process. This conception can be captured by the model of a leaky capacitor (Sejnowski, 1981), and the change in activation of node i is described by the differential equation:

\frac{da_i(t)}{dt} = Bn_i(t) - p^*a_i(t)    (3.1)

Here a_i(t) is the activation of node i at time t, and n_i(t) is the input to node i at time t. This equation indicates that the change in activation level is positively proportional (proportionality given by the strength of the coupling factor, B) to the amount of input and negatively proportional (proportionality given by the decay factor, p^*) to the current level of activation. The second factor describes an exponential decay process. The behavior of this equation is fairly easy to understand in the case of constant input n. Suppose the initial activation of node i is a_0 and there is constant input n. Then the activation of node i at time t is given as:

a_i(t) = a_0 + \left(\frac{nB}{p^*} - a_0\right)(1 - e^{-p^*t})    (3.2)

Thus the activation moves from a_0 to nB/p^*. It approaches its new value according to an exponentially decelerating function. The new value is proportional to both the input, n, and the strength of coupling, B, and inversely proportional to the decay factor, p^*. The rate of approach is controlled by p^*.

The behavior of the system is usually more complicated than Eq. (3.2) indicates, because the input to i varies with time and depends on the reverberation of the activation through the network. However, Eq. (3.2) still serves as a good first approximation for understanding what is happening. That is, the current input is driving the activation toward some resting level, and the rate of approach is exponential. If a node is not a source node, the input is just the activation received from the nodes connected to it. If it is a source node,³ it receives an additional source activation c_i^*. (As will be discussed in Chapter 5, the size of this source activation depends on the strength of node i.) The input to node i can be expressed as:

n_i(t) = c_i^*(t) + \sum_j r_{ji}a_j(t - \delta t)    (3.3)

where c_i^*(t) = 0 if i is not a source node at time t, and c_i^*(t) = c_i^* if it is. The term in the sum indicates that the activation from a node j is divided among the attached nodes according to the relative strengths r_{ji}. The value \delta t appears in Eq. (3.3) to reflect the delay in transmission between node j and node i. In a neural model it might reflect the time for information to travel down an axon.

Equations (3.1) to (3.3) may be represented more generally if we consider an n-element vector A(t) to represent the activation of the n nodes in the network, another n-element vector N(t) to represent the inputs to them, and a third n-element vector C^*(t) to reflect the activation from the source nodes. Then Eqs. (3.1) and (3.3) become:

\frac{dA(t)}{dt} = BN(t) - p^*A(t)    (3.4)

and

N(t) = C^*(t) + RA(t - \delta t)    (3.5)

where R is the matrix of connection strengths. The general solution for Eq. (3.4) is given as:

A(t) = e^{-p^*t}A(0) + \int_0^t e^{-p^*(t-x)}BN(x)\,dx    (3.6)

There is no general closed-form solution to this equation.
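Although there is no general closed-form solution, the dynamics are straightforward to integrate numerically. The sketch below uses simple Euler integration with illustrative parameter values and a hypothetical four-node network; it is an illustration of the form of Eqs. (3.4) and (3.5), not part of the theory itself.

```python
# Numerical sketch of the activation dynamics, Eqs. (3.4)-(3.5).
# All parameter values and the network itself are hypothetical.
import numpy as np

B, p_star, dt = 0.8, 1.0, 0.01        # coupling B, decay p*, Euler step size

# R[i, j] is the strength r_ij from node i to node j; each row sums to 1,
# as the theory requires.
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
C_star = np.array([1.0, 0.0, 0.0, 0.0])   # node 0 is the only source

A = np.zeros(4)
for _ in range(5000):                  # integrate long enough to equilibrate
    N = C_star + R.T @ A               # Eq. (3.5): n_i = c_i* + sum_j r_ji a_j
    A = A + dt * (B * N - p_star * A)  # Euler step on Eq. (3.4)

# At asymptote dA/dt = 0, which reduces to the linear system A = C + rho R^T A
rho, C = B / p_star, (B / p_star) * C_star
A_inf = np.linalg.solve(np.eye(4) - rho * R.T, C)
print(np.round(A, 3), np.round(A_inf, 3))   # the two agree
```

The simulated vector converges to the solution of the asymptotic linear system, which is the set of simultaneous equations derived later in the chapter as Eq. (3.7).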
McClelland (1979) dealt with the situation where \delta t = 0 and there was no reverberation (that is, activation does not feed back from a node to itself either directly or through intermediate nodes). The first assumption may be approximately true, but the second is invalid in the current context. In ACT* \delta t is assumed to be small (on the order of a millisecond), and the decay factor p^* is large. In this case the activation level of the network will rapidly achieve asymptotic level. At asymptotic level, dA(t)/dt = 0, and we can derive from Eqs. (3.4) and (3.5):⁴

A = C + \rho RA    (3.7)

where C = BC^*/p^* and \rho = B/p^*. If \rho < 1, Eq. (3.7) describes a set of simultaneous linear equations with finite, positive solutions for all the a_i. (If \rho \ge 1, activation grows without bound and there is no solution for Eq. (3.7).) Thus the activation pattern in the network can be determined by solving this set of equations. The parameter \rho is the maintenance factor that reflects what proportion of a node's activation is maintained in the spread to the neighboring nodes.

In this system each source supports a particular subset of activation over the network. Let A_i(t) be a vector describing the subset of activation supported by source i. Then the total activation is just the sum of the subsets supported by all sources i, that is
A(t) = \sum_i A_i(t)    (3.8)
Let s_i(T) denote the sum of activation over all the nodes supported by source i, T seconds after i became a source node. That is, s_i(T) is the activation of i due to its source contribution, plus the activation of its associated nodes due to the source contribution, plus the activation of their associated nodes, and so on; s_i(T) is the sum of the entries in A_i(t). To calculate s_i(T) we need to represent T in terms of the number, n, of \delta t time units that have passed: T = n\delta t. The activation will have had a chance to spread n units deep in the network by time T. Then,
=,,,Al ry(,,6r) [,
B%
(3.e)
Spread of Activation
where each additional spreading step multiplies a source's contribution by the maintenance factor p. In the limit this has the value:

s_i(∞) = c/(1 - p)
Reasonable values for the parameters might be p* = 1, B = .8 (and hence p = .8), and δt = 1 msec. With such values, the quantity in Eq. (3.9) achieves 73 percent of its asymptotic value in 10 msec, 91 percent in 20 msec, and 99 percent in 40 msec. Thus, to a good approximation one can reason about the impact of the asymptotic activation values without worrying about how asymptote was achieved. Thus Eq. (3.7) is the important one for determining the activation pattern. The appendix to this chapter contains some uses of Eq. (3.7) for deciding about the impact of various network structures on distribution of activation. It shows that level of activation tends to decay exponentially with distance from source.

The rapidity of spread of activation is in keeping with a good number of experimental results that will be reviewed in this chapter, and with the adaptive function of spreading activation. As mentioned earlier, activation is supposed to be a relevancy heuristic for determining the importance of various pieces of information. It guides subsequent processing and makes it more efficient. It would not make much adaptive sense to compute a spreading activation process if that computation took a long time to identify the relevant structures of memory.

Summing Up

This section has shown in detail how a spreading activation mechanism, consistent with our knowledge of neural functioning, produces a variation in activation levels. Equation (3.7) describes asymptotic patterns of activation, which, it was shown, are approached so rapidly that one can safely ignore spreading time in reaction-time calculations. Thus a coherent rationale has been established for using Eq. (3.7). In some enterprises (see Chapter 5), faced with complex networks, one might want to resort to solving the simultaneous equations in (3.7). However, for the purposes of this chapter, its qualitative implications are sufficient. The level of activation of a piece of network structure determines how rapidly it is processed by the pattern matcher.

Chapter 4 will go into detail about how the pattern matcher performs its tests and how it is controlled by level of activation. For current purposes the following approximation is useful: if the
The Architecture of Cognition
pattern tests to be performed have complexity K and the data has activation A, the time to perform the tests will be K/A. Thus time is directly proportional to our measure of pattern complexity and inversely proportional to activation. This implies a multiplicative rather than an additive relationship between pattern complexity and level of activation. In this chapter pattern complexity will remain an intuitive concept, but it will be more precisely defined in the next chapter.
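The multiplicative claim lends itself to a quick numerical illustration. This sketch is my own, and the complexity and activation values are assumptions chosen only for the demonstration: under time = K/A, lowering activation slows a complex pattern more than a simple one.

```python
# Illustrative match times under the approximation time = K / A,
# where K is pattern complexity and A is the data's activation.
# All numbers are assumptions, chosen only to show the interaction.

def match_time(K, A):
    return K / A

simple, complex_ = 2.0, 6.0        # pattern complexity K (arbitrary units)
high, low = 1.0, 0.5               # activation A

slowing_simple = match_time(simple, low) - match_time(simple, high)
slowing_complex = match_time(complex_, low) - match_time(complex_, high)
print(slowing_simple, slowing_complex)  # the complex pattern is slowed more
```

An additive model would predict the same slowing for both patterns; the ratio form of K/A is what produces the interaction.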
Priming Studies

Priming involves presenting subjects with some information and then testing the effect of the presentation on access to associated information. In the lexical decision task (Becker, 1980; Fischler, 1977; McKoon and Ratcliff, 1979; Meyer and Schvaneveldt, 1971; Neely, 1977), the one most frequently used, it is found that less time is required to decide that an item is a word if it is preceded by an associated word. For instance, butter is more rapidly judged to be a word when preceded by bread. These paradigms strongly demonstrate the importance of associative spread of activation. They are particularly convincing because the priming information is not directly part of the measured task, and subjects are often unaware of the associative relations between the prime and the target. The fact that priming is obtained without awareness is a direct reflection of the automatic and ubiquitous character of associative spread of activation. Beneficial effects of associatively related primes have also been observed in word naming (Warren, 1977), free association (Perlmutter and Anderson, unpublished), item recognition (Ratcliff and McKoon, 1981), sentence completion and sensibility judgments (Reder, in press), and word completion (Schustack, 1981).

While facilitation of the associated word is the dominant result, inhibition is sometimes found for words unrelated to the prime (Neely, 1977; Fischler and Bloom, 1979; Becker, 1980). For instance, judgment of butter may be slower when preceded by the unrelated glove than when preceded by xxxxx, if the subject consciously expects the word following glove to be related to it, such as hand rather than butter. If the subject is not aware of the predictive relation between the associative prime and the target, or if the prime-to-target interval is too short to permit an expectation to develop, only positive facilitation is observed. Neely has related this to Posner and Snyder's (1975) distinction between automatic and conscious activation.
A Production System Model

Table 3.2 provides a set of productions that model performance in a lexical decision task, and Figure 3.2 illustrates the flow of control among them. Central to this analysis is the idea that the subject enters the experiment with productions, such as P1, which automatically label stimuli as words. There would be one such production per word.

Table 3.2  Productions for performance in the lexical decision task

1. A word-naming production
P1   IF the word is spelled F-E-A-T-H-E-R
     THEN assert that the word is similar to FEATHER.

2. Productions that perform automatic lexical decision
P2   IF the goal is to judge if the stimulus is spelled correctly
     and a word is similar to the stimulus
     and the stimulus mismatches the spelling of the word
     THEN say no.
P3   IF the goal is to judge if the stimulus is spelled correctly
     and a word is similar to the stimulus
     and the stimulus does not mismatch the spelling of the word
     THEN say yes.

3. Productions that perform deliberate lexical decision
P4   IF the goal is to judge if the stimulus matches an anticipated word
     and a word is anticipated
     and the stimulus does not mismatch the spelling of the word
     THEN say yes.
P5   IF the goal is to judge if the stimulus matches an anticipated word
     and a word is anticipated
     and the stimulus mismatches the spelling of the word
     THEN change the goal to judging if the stimulus is correctly spelled.

4. An optional production that capitalizes on nonwords similar to the anticipated word
P6   IF the goal is to judge if the stimulus matches an anticipated word
     and a word is anticipated
     and the stimulus is similar to the word
     and the stimulus mismatches the spelling of the word
     THEN say no.
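As an illustration only (not the book's implementation), the control structure of these productions can be rendered as a toy Python simulation. Word identification by P1 is reduced to a nearest-spelling lookup over a small assumed lexicon, and P5's goal switching is expressed as fall-through to the spelling-check path.

```python
# Toy rendering of productions P1-P6 (Table 3.2) for lexical decision.
# The lexicon and the similarity rule (fewest differing letters among
# same-length words) are simplifying assumptions of this sketch.

LEXICON = ["FEATHER", "SHOULDER", "BREAD", "BUTTER"]

def closest_word(stimulus):
    """P1: label the stimulus with the most similar known word."""
    same_len = [w for w in LEXICON if len(w) == len(stimulus)]
    if not same_len:
        return None
    return min(same_len,
               key=lambda w: sum(a != b for a, b in zip(w, stimulus)))

def lexical_decision(stimulus, anticipated=None, use_p6=False):
    if anticipated is not None:
        if stimulus == anticipated:
            return "yes"                      # P4: matches anticipation
        if use_p6 and closest_word(stimulus) == anticipated:
            return "no"                       # P6 (optional): similar nonword
        # P5: switch goal and fall through to judging a similar word
    word = closest_word(stimulus)             # P1 labels a similar word
    if word == stimulus:
        return "yes"                          # P3: no spelling mismatch
    return "no"                               # P2: spelling mismatch

print(lexical_decision("FEATHER"))                                       # yes
print(lexical_decision("FEATHFR"))                                       # no
print(lexical_decision("FEATHFR", anticipated="FEATHER", use_p6=True))   # no
```

The third call shows why P6 is worth having: the similar nonword is rejected without ever entering the spelling-check stage.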
Figure 3.2  A representation of the flow of control among the productions in Table 3.2.

Not only will they label actual words, but they will label near-words like FEATHFR with the closest matching word. The mechanisms of such word identification and partial matching will be described in the next chapter. It is reasonable to suppose that such productions exist, given that throughout our lives we have to recognize briefly presented and imperfectly produced words. The difficulty of proofreading text for spelling errors is also evidence that such partial-matching productions exist.

Since productions like P1 will label partial matches, the subject cannot respond simply upon the firing of P1; the spelling of the similar word must be checked against the stimulus. It has been suggested that the results typically observed in lexical decision tasks depend on the fact that the distractors are similar to words (James, 1975; Neely, 1977). Certainly the subject needs some basis for rejecting distractors, and that basis may well have an impact upon the process for recognizing targets. According to the model presented here, the subject rejects a stimulus as a nonword if it mismatches a hypothesized word and accepts the stimulus if a mismatch cannot be found.

Productions P2 and P3 model performance in those situations where the subject is not consciously anticipating a particular
word but is operating with the goal of seeing if the stimulus matches the spelling of the word judged as similar. These productions therefore must wait for a production like P1 to first label the stimulus.⁵

Productions P4 and P5 model performance when the subject is consciously anticipating a word. In this case there is no need to wait for the stimulus to be labeled as similar to a word; it can be matched against the anticipated word. If it matches, the subject can exit with a quick yes. If not, the subject cannot yet say no. He might have expected FEATHER but was presented with SHOULDER or SHOULDFR, in which case he must return to the goal of judging a similar word. Production P5 switches to the goal of judging the similar word. It is assumed that while P5 was applying, a word-naming production like P1 would also have applied to produce a hypothesis about the identity of the stimulus. Therefore, in Figure 3.2, P5 goes directly to the similarity-judging box where P2 and P3 can apply. Note that P2 and P3 have the goal of checking the spelling, while P4 and P5 have the goal of matching an anticipated word. Because the two goals are contradictory, P2 and P3 cannot apply in parallel with P4 and P5, but only after P5.

P6 reflects an optional production for which there is some experimental evidence, as I will discuss. It generates a no if the stimulus does not match but is similar to an anticipated word. In contrast to P5, P6 avoids the need to go on to P2 and P3. Thus, if FEATHER were anticipated, it would generate a quick no to FEATHFR. It would also erroneously generate a quick no to HEATHER. However, no experiment has used words similar but not identical to anticipated words.

The Pattern-Matching Network
Figure 3.3 provides a schematic illustration of how the pattern matching for productions P2-P5 is implemented. The top half of the figure illustrates the pattern network, and the bottom half a hypothetical state of the declarative network that drives the pattern matcher. Critical to the operation of this system is the subnode A, which detects conflicts between word spelling and the information in the stimulus. When the goal is to judge a similar word, P2 and P3 will be applicable. Note that P2 receives positive input from both A, which detects a spelling mismatch, and from the clause noting the most similar word. P2 thus performs a test to see if there is a misspelling of the similar word. P3 receives a positive input from the similar-word node but a negative input from the A node. P3 thus checks that there are no misspellings. P2 and P3 are mutually inhibitory, and only one can apply. Similarly, P4 and P5 are applicable when the goal is to judge an anticipated word. If there is no conflict with the anticipated word, P4 will apply to say yes. If there is a conflict with the anticipated word, P5 will apply to switch the goal to judging a similar word. As with P2 and P3, P4 and P5 are mutually inhibitory.

Figure 3.3  The pattern network representing the conditions for productions P2, P3, P4, and P5 in the lexical decision task. Also represented are the temporary structures created to encode the probe, the sources of activation (from which rays emanate), and the connections to long-term memory. For simplicity, the goal elements in P2-P5 are not represented.

The bottom section of Figure 3.3 illustrates the situation in declarative memory. Anticipating a word (for example, hand) amounts to keeping active in memory a proposition to that effect ("hand is anticipated"), with activation maintained by the goal element. The letters of the stimulus keep the spelling information active. Activation spreads from these temporary memory structures to activate spelling information for various words in long-term memory. In addition to the stimulus and the anticipated word, a third potential source of activation is presentation of an associated priming word, which can activate the target word.
Basic Predictions of the Model

This model explains the basic effects observed in the priming paradigms. First, presenting a word will increase the activation of related words and their spelling information. This will speed the rate at which P2 can detect a mismatch to a related word and hence the rate at which P3 can decide that there is a match. On the other hand, if the probe is not related, the rate at which P2 and P3 apply will be unaffected by the prime. Thus the model predicts only facilitation and no inhibition in paradigms of automatic priming.

Conscious anticipation of the probe has two advantages: first, the spelling information is more active, and second, there is no need for initial identification by a word-naming production like P1. On the other hand, if the anticipation is not met, there is the cost of switching the goal to judging a similar word. Thus the model predicts both benefit and cost in paradigms of conscious priming.

Table 3.3 is an attempt to analyze the full combinatorial possibilities of the lexical decision task. The subject may or may not have a conscious anticipation and may or may not have automatically primed a set of words. In the case of conscious anticipation or automatic priming, the stimulus may or may not match the items anticipated or primed (a nonword FEATHFR would be considered to match a primed word like FEATHER).

Table 3.3  Analysis of automatic and conscious priming for words and nonwords: comparisons with control conditions¹
                          Match          Not match      No
                          anticipation   anticipation   anticipation
Match prime      Word     S+, A+         E-, A+         A+
                 Nonword  E-, A+         E-, A+         A+
Not match prime  Word     S+             E-
                 Nonword  E-             E-
No priming       Word     S+             E-             Control
                 Nonword  E-             E-             Control

1. A+ is the advantage of automatic activation. S+ is the advantage of conscious anticipation. E- is the cost if the anticipated word is not presented.
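The combinatorial analysis behind Table 3.3 can be written out as a small predictor. The encoding below is my own paraphrase of the table's logic, assuming productions P2-P5 without the optional P6; it returns the predicted effects relative to the no-anticipation, no-priming controls.

```python
# Effects relative to the no-anticipation / no-priming controls,
# summarizing the analysis behind Table 3.3 (productions P2-P5,
# without the optional P6).  This encoding is my own paraphrase.
#   A+ : advantage of automatic activation (priming)
#   S+ : advantage of conscious anticipation
#   E- : cost when the anticipation cannot yield a quick yes

def predicted_effects(is_word, primed, matches_prime,
                      anticipated, matches_anticipation):
    effects = set()
    if primed and matches_prime:
        effects.add("A+")                # automatic activation helps
    if anticipated:
        if matches_anticipation and is_word:
            effects.add("S+")            # anticipation helps words only
        else:
            effects.add("E-")            # failed (or unusable) anticipation
    return effects

# Neely's contrast: an unanticipated but automatically primed word
# carries both the benefit A+ and the cost E-, while an unprimed
# surprise word carries only the cost, hence more net inhibition.
print(predicted_effects(True, True, True, True, False))
print(predicted_effects(True, True, False, True, False))
```

Because A+ and E- are independent set members here, the predictor expresses the text's claim that automatic and conscious effects are cumulative.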
Finally, the stimulus may be a word or a nonword. Table 3.3 gives all eighteen possibilities created by these combinations. The standard control cases are when there is no anticipation or priming; Table 3.3 gives the relevant differences between these controls for word and nonword and the other cases. These predictions are based on the assumption that the subject uses productions P2-P5, as shown in Table 3.2, but not the optional P6. There are three factors: A+ refers to the activation-based advantage of automatic priming; S+ refers to the priming advantage associated with conscious anticipation, through activation of spelling information and omission of the similarity-judgment production (P1); and E- refers to the cost of having judgment of the anticipated word followed by the attempt to match some other word. As the table shows, the theory predicts that the effects of automatic and conscious priming are cumulative. Also, no advantage of conscious anticipation is predicted for nonwords.

Neely (1977) came close to including all the possibilities of Table 3.3 in one experiment. He presented subjects with a prime followed at varying delays by a target stimulus. He was able to separate automatic from conscious priming by telling subjects to anticipate the name of a building when they saw a body part as a prime. In this way there would be conscious anticipation of a building but automatic priming of a body part. On some trials Neely surprised subjects by following body part with a body-part word such as leg. When there was a 700 msec delay between prime and target, the benefit of priming for a body part was less than the cost of the failed anticipation, yielding a net inhibition. However, on such trials there was less inhibition for a surprise body-part word than for a surprise bird word. This is the result predicted if one assumes that the benefits of automatic priming combine with the cost of conscious inhibition. It corresponds to the contrast between the not-match-anticipation, match-prime situation and the not-match-anticipation, not-match-prime control in Table 3.3.

Fischler and Bloom (1979) and Neely (1977) have reported that conscious priming has a beneficial effect on nonword judgments, contradicting the predictions in Table 3.3. There appear to be small decreases in latency for nonwords similar to the primed word. However, there is also a slight increase in the false-alarm rate. According to Table 3.2, conscious priming has no beneficial effects for nonwords because execution of P5, which rejects the word as failing expectations, must be followed
by P2, which rejects the match to a similar word. Thus if a subject expects FEATHER and sees SHOULDFR, P5 will reject the match to the expected word and change the goal to judging the spelling of a similar word. If SHOULDER is selected as similar, P2 will reject the match to SHOULDFR and execute no. There is no benefit associated with having a nonword similar to a conscious expectation, so a subject will be no faster if FEATHFR is presented when FEATHER is expected. To account for the small benefits observed, one might assume that subjects sometimes use the optional P6 in Table 3.2, which would produce faster performance for similar nonwords.

Interaction with Stimulus Quality
A critical question is how well ACT can explain the various factors that modulate the degree of priming. For instance, Meyer, Schvaneveldt, and Ruddy (1975) report that the effect of priming is increased in the presence of degraded stimuli. The patterns matched in P2 and P3 are combinations of information about the physical stimulus and about the spelling pattern. The rate of pattern matching will be a function of both sources of activation. Degrading the stimulus will lower the level of activation from the physical stimulus and so increase processing time. It also means that pattern matching is more dependent on activation of the spelling information, so there will be an increased priming effect.

It should be noted here that Meyer and colleagues propose a different interpretation of their results. Using Sternberg's (1969) additive-factors logic, they assume from the interaction of stimulus quality and priming that semantic context primes the perceptual processing before information from lexical memory is involved. The ACT analysis is that stimulus quality affects the rate of lexical pattern matching (P2 and P3) by affecting the amount of activation coming from the perceptual encoding. The experiment by Meyer and colleagues offers no basis for separating the two interpretations. However, it should be noted that the ACT explanation predicts the direction of the interaction, not just its existence.

Effects of Size of Expected Set

In many experiments subjects cannot predict the specific word that will occur. For instance, they might be primed with a category (dog) and be led to expect any member of that category. They might prepare for a small number of members in the category (collie, poodle, labrador) and avoid the similarity stage (P2
and P3) if the item is one of these. However, the number of items they can expect is small, corresponding to the limitation on the number of items that can be kept active in working memory. If the set of words that might follow the prime is greater than the number that can be kept active, subjects have only a probability of correctly anticipating the word. Consistent with this, Becker (1980) has shown that the amount of facilitation is a function of the size of the primed set. Fischler and Bloom (1979) have shown that there are positive benefits only for the most probable continuations of a sentence.

Preparing for more than one word would mean holding more than one assertion of the form "collie is anticipated" active in working memory. The number of assertions that can be held active should be limited by the capacity to maintain information in an active state. One effect of maintaining more assertions would be to decrease the amount of activation being expended by P5 to test any one anticipated word, because activation from the stimulus has to be divided among all the words. Thus it should take longer for P5 to detect a mismatch and to switch to the similarity stage. In this way ACT can explain the other result observed by Becker: that inhibition was greater when larger sets were expected.
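The division of a fixed activation capacity among the anticipated words can be illustrated with a short sketch. This is my own illustration, with assumed capacity and complexity values; it combines the capacity-sharing idea with the earlier time = K/A approximation.

```python
# If the stimulus's activation capacity is divided among the n words
# held as anticipations, the activation supporting P5's test of any
# one word is capacity / n, so by the time = K/A approximation the
# mismatch-detection time grows with set size.  Numbers are assumed.

def p5_mismatch_time(set_size, capacity=4.0, K=2.0):
    activation_per_word = capacity / set_size
    return K / activation_per_word

times = [p5_mismatch_time(n) for n in (1, 2, 4, 8)]
print(times)  # greater inhibition (longer waiting) with larger sets
```

The linear growth of the mismatch time with set size is one simple way to capture Becker's finding that inhibition increases with the size of the expected set.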
Time Course of Priming

By looking at the amount of priming at various intervals after a stimulus, it is possible to make inferences about the loss of activation in the semantic network. Meyer, Schvaneveldt, and Ruddy (1972) show that approximately half of the priming benefit remains 4 seconds after the prime. Other studies (for example, Swinney, 1979) have shown rapid decay of priming by the unintended meaning of an ambiguous word, such that in less than a second no priming remains. From the Meyer et al. study, something like 4 seconds can be taken as a high estimate of the half-life of priming, and from the Swinney study, something like 400 msec can be taken as a low estimate. According to ACT theory, these half-life estimates reflect the parameter governing how long a node can stay active without rehearsal. There is some difficulty in assessing when subjects cease to rehearse or focus an item. The higher times in Meyer et al. may reflect maintenance activity by the subject.

A related issue is the time course of the rise of activation. Fischler and Goodman (1978), Ratcliff and McKoon (1981), and Warren (1977) have shown that automatic priming is nearly at maximum less than 200 msec after presentation of the prime.
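The half-life estimates above can be converted into decay rates with a line of arithmetic. The exponential form is my assumption for illustration; only the two half-life bounds come from the text.

```python
import math

# If residual priming decays as exp(-k * t), the half-life is
# t_half = ln(2) / k.  The text's bounds on the half-life of priming
# (about 4 s from Meyer et al., about 400 msec from Swinney) then
# bound the decay rate k.
k_slow = math.log(2) / 4.0    # per second, for a 4-second half-life
k_fast = math.log(2) / 0.4    # per second, for a 400-msec half-life
print(round(k_slow, 3), round(k_fast, 3))
```

The two estimates differ by a factor of ten, which is one way of seeing how much room subject-controlled maintenance leaves in interpreting the slower decay.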
Schustack (1981) found high degrees of priming even when the onset of the prime followed the onset of the to-be-primed target. This superficially anomalous result can be understood when we realize that processing of the target occurs over a time interval that overlaps with presentation of the prime (Fischler and Goodman, 1978). In any case it is apparent that onset of facilitation is rapid. These studies also provide little or no evidence for a gradual rise in priming as claimed by Wickelgren (1976). That is, the size of the priming effect is nearly maximal close to its onset. The empirical picture is consistent with the ACT analysis, in which an asymptotic pattern of activation is rapidly set up in the declarative network and this pattern determines the rate of pattern matching.

Interaction with Pattern Matching

The size of the priming effect depends on the difficulty of the pattern matching. Word naming (Warren, 1977) produces smaller priming effects than lexical decision. In a word-naming task one simply has to say the word, without checking it carefully for a subtle misspelling. (Indeed, word naming is basically implemented by P1 in Table 3.2.) Similarly, McKoon and Ratcliff (1979) have found larger priming effects on item recognition than on word-nonword judgments. Recognition judgment requires matching contextual information and so is more demanding of the pattern-matching apparatus. As noted earlier, ACT predicts this relationship between pattern complexity and level of activation.

The Stroop Phenomenon

One challenging phenomenon that must be accounted for is the effect of associative priming in the Stroop task (Warren, 1974). The Stroop task involves naming the color of a word. It takes longer to name the color if the word is preceded by an associatively related word. In this task the priming effect is negative. To explain this, it is necessary to assume, as is basic to all analyses of the Stroop task (for a review see Dyer, 1973), that there are competing tendencies to process the color of the word and to name the word. These competing tendencies can be represented in ACT by the following pair of productions:
P7   IF the goal is to say the name of the color
     and LVconcept is the color of the stimulus
     and LVcode is the articulatory code for LVconcept
     THEN generate LVcode.

P8   IF the goal is to say the name of the word
     and LVconcept is the identity of LVstimulus
     and LVcode is the articulatory code for LVconcept
     THEN generate LVcode.
Consider what happens when dog is presented in red. The following information will be active in working memory:

1. The goal is to say the name of the color.
2. Red is the color of the stimulus.
3. "Red" is the response code for red.
4. Dog is the identity of the stimulus.
5. "Dog" is the response code for dog.
Fact 5 is a long-term-memory structure that is made more active through priming of dog. There is a correct instantiation of P7 involving facts 1, 2, and 3. However, there is a partial instantiation of P7 involving 1, 2, and 5, and a partial instantiation of P8 involving 1, 4, and 5. The instantiation of P7 is partial because red in 2 does not match dog in 5 (that they should match is indicated by the same variable, LVconcept, in the second and third clauses of P7). The instantiation of P8 is partial because word in the first clause of P8 mismatches color in fact 1. These partial matches will compete with the correct instantiation of P7, and the amount of competition they provide will be increased by the primed activation of fact 5.⁷

This analysis does not identify the Stroop interference as being at the level of either perceptual encoding (Hock and Egeth, 1970) or response competition (Morton, 1969). Rather, it is taking place at the level of matching competing patterns to data in working memory, where the critical piece of data (the response-code association in fact 5 above) is neither a perceptual encoding nor a motor response. This corresponds most closely to the "conceptual encoding" analysis of the Stroop task (Seymour, 1977; Stirling, 1979).

This example shows that increased activation does not inevitably lead to better performance. If that activation goes to a production responsible for generating the correct behavior, it will lead to better performance; if it goes to unrelated productions it will have no effect; if it goes to competing productions it will have a detrimental effect.
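The competition among instantiations can be made concrete with a toy matcher. This is my own sketch: facts 1-5 are flattened to simple tuples, and only P7's variable binding is modeled. The complete instantiation binds LVconcept to red, while the partial one binds it to dog, whose response code is active only through priming.

```python
# Working-memory facts from the Stroop example, flattened to tuples
# (a simplification of this sketch, not the book's representation).
FACTS = [
    ("goal", "say-color"),       # 1  the goal is to say the color's name
    ("color-of", "red"),         # 2  red is the color of the stimulus
    ("code-for", "red"),         # 3  "red" is the response code for red
    ("identity-of", "dog"),      # 4  dog is the identity of the stimulus
    ("code-for", "dog"),         # 5  "dog" is the response code for dog
]

def p7_instantiations():
    """Candidate instantiations of P7: each response-code fact binds
    LVconcept; the instantiation is complete only if that same concept
    is also the color of the stimulus (same LVconcept in both clauses)."""
    return [(concept, ("color-of", concept) in FACTS)
            for rel, concept in FACTS if rel == "code-for"]

insts = p7_instantiations()
print(insts)  # [('red', True), ('dog', False)]
```

The partial ('dog', False) instantiation is the competitor whose strength grows with the primed activation of fact 5, producing the negative priming effect.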
The Fact-Retrieval Paradigm

In a typical fact-retrieval situation (Anderson, 1974; Anderson and Bower, 1973; Hayes-Roth, 1977; King and Anderson, 1976; Lewis and Anderson, 1976; Thorndyke and Bower, 1974), a fact is presented that subjects have stored in memory (either a fact about the world, known by all subjects, or a fact learned as part of an experiment). Subjects are simply asked whether or not they recognize the fact, such as Hank Aaron hits home runs, or they must reject foils like Hank Aaron comes from India (Lewis and Anderson, 1976).

The fact-retrieval paradigm is a much more direct and deliberate study of retrieval than the typical priming study. It is not at all obvious, a priori, that the two paradigms would tap the same processes, and the relation between the paradigms needs further research. However, the preliminary indication is that the two phenomena do involve the same processes. It is possible to prime the recognition of experimentally studied material (Fischler, Bryant, and Querns, unpublished; McKoon and Ratcliff, 1979). Experimentally acquired associations have been shown to lead to priming of lexical decisions (Fischler, Bryant, and Querns, unpublished; McKoon and Ratcliff, 1979).

A major experimental variable in the study of fact retrieval has been called fan, which refers to the number of facts studied about a concept. The more facts associated with one particular concept, the slower is the recognition of any one of the facts. In the current framework this manipulation is interpreted as affecting the amount of activation that is spread to a particular fact. The assumption is that each concept or node has a limited capacity for spreading activation, and as more paths are attached to it, the amount of activation that can be spread down any one path is reduced. There are considerable experimental advantages in being able to control more or less directly the amount of activation spread to a portion of the network.
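The limited-capacity interpretation of the fan effect can be sketched in a few lines. This is my illustration, with assumed activation and complexity values, again using the time = K/A approximation from earlier in the chapter.

```python
# Limited-capacity spreading: a source concept emits a fixed amount
# of activation divided among its fan attached paths, so each fact
# receives source / fan, and by time = K/A recognition slows as the
# fan grows.  All values are illustrative assumptions.

def recognition_time(fan, source_activation=3.0, K=1.5):
    activation_per_fact = source_activation / fan
    return K / activation_per_fact

times = {fan: recognition_time(fan) for fan in (1, 2, 3)}
print(times)  # recognition time rises with the number of attached facts
```

Under these assumptions recognition time grows linearly with fan, which is the qualitative signature of the fan manipulation described above.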
Production Implementation
In Table 3.4 productions P1 through P4 model recognition and rejection of probes in the fact-retrieval experiment described by Anderson (1976, p. 258). Subjects studied location-subject-verb sentences of the form In the bank the lawyer laughed. After committing the sentences to memory, subjects were shown four types of probes: (1) three-element targets, which were identical to the sentences studied;
Table 3.4  Productions for performance in a fact-retrieval task

P1   IF the goal is to recognize the sentence
     and the probe is "In the LVlocation the LVperson LVaction"
     and (LVaction LVperson LVlocation) has been studied
     THEN say yes.
P2   IF the goal is to recognize the sentence
     and the probe is "In the LVlocation the LVperson LVaction"
     and (LVaction LVperson LVlocation) has not been studied
     THEN say no.
P3   IF the goal is to recognize the sentence
     and the probe is "The LVperson LVaction"
     and (LVaction LVperson LVlocation) has been studied
     THEN say yes.
P4   IF the goal is to recognize the sentence
     and the probe is "The LVperson LVaction"
     and (LVaction LVperson LVlocation) has not been studied
     THEN say no.
(2) three-element foils, consisting of a location, subject, and verb that had all been studied, but not in that combination; (3) two-element targets in which, for example, the subject and verb were from the target sentence; and (4) two-element foils in which the subject and verb came from different sentences. Response to these four types of probes is handled by productions P1-P4, respectively, in Table 3.4.

Production P1 recognizes that the current probe matches a proposition previously stored in memory. The elements in quotes refer to the probe, and the elements in parentheses refer to the proposition in memory. P1 determines whether the probe and proposition match by checking whether the variables (LVlocation, LVperson, and LVaction) bind to the same elements in the probe and the proposition. P2 will fire if the variables do not match in a three-element probe. Productions P3 and P4 are like P1 and P2 except that they respond to two-element probes.

Pattern-Matching Structure

Figure 3.4 illustrates schematically the structure of production memory for P1-P4 and their connections to declarative memory. Each production is represented as consisting of two clauses,⁸ which are represented as separate elements at the bottom of the figure. The elements, called terminal nodes, perform
tests to find clauses in declarative memory that match them. These clauses are combined at higher nodes. The two-input nodes with two positive lines perform tests of the variable identity between the two clauses. So P1 checks that LVlocation, LVperson, and LVaction are the same in the probe as in the memory structure. A negative two-input node like P2 will fire if there is input on its positive line and no compatible input on its negative line.

In Figure 3.4 declarative memory is partitioned into temporary memory, representing the probe, and long-term memory, encoding the studied facts. The main sources of activation are the individual elements (bank, lawyer, laugh), which are encodings of the external probe. From these elements activation spreads to the probe and throughout long-term memory. The probe is connected to two terminal nodes in production memory that test for the two probe patterns (two-element and three-element). The rate at which the nodes perform their tests is determined by the level of activation of the probe structure. Similarly, the propositions in declarative memory are connected to the proposition terminal node in production memory. Again, the rate at which any proposition is processed by the pattern node is a function of the level of activation of that proposition. Also, the rate of pattern testing at the higher nodes in production memory is a function of the level of activation of the data (declarative memory) elements being tested. In the case of the positive P1 and P3, this level will be the sum of the activation of the probe and the memory elements. In the case of the negative P2 and P4, this level will be affected only by the probe activation. The absence test in P2 is implemented by setting up an inhibitory relation between P1 and P2, and similarly, the absence test in P4 is handled by an inhibitory relation between P3 and P4. Strong evidence for P1 will repress P2 and prevent it from firing.
If there is not sufficient evidence for P1, P2 will build up evidence for itself and eventually fire. P2 in this model is set to accumulate activation from the proposition twice as fast as P1, so if there is not a good match to P1, P2 will repress it. This inhibitory relationship makes P2 wait to see if P1 will match. The mechanisms of production pattern matching are described in Chapter 4. However, there are three important features to note now about how the pattern matcher treats P1-P4:

1. Pattern matching will take longer with high-fan probes, those whose elements appear in multiple study sentences. The fan from an element reduces the amount of activation that can
go to any propositional trace or to the probe encoding it. Pattern matching for targets is a function of the activation of the propositional trace and the probe.

2. It should take longer to recognize larger probes because more tests must be performed. In this experiment, three-element probes took longer than two-element probes. For ample additional evidence in support of this prediction, see Anderson (1976, chap. 8).

3. Foils that are more similar to studied sentences should be harder to reject. In this experiment "overlap" foils were used that had two of three elements in common with a studied sentence. Subjects found these harder to reject than nonoverlap foils. Again, for many confirmations of this prediction, see Anderson (1976, chap. 8).

ACT* predicts that a partial match of the positive production pattern (for example, P1) will inhibit growth of evidence at the negative production pattern (for example, P2). More generally, ACT* predicts difficulty in rejecting partial matches.

REJECTION OF FOILS

An interesting question is, how does a person decide that he does not know something? In these experiments this question is addressed by ACT's model for foil rejection.9 The most obvious model, which is obviously incorrect, is that subjects exhaustively search their memories about a concept. However, foils are rejected much too quickly for this to be true; typically, the times to reject foils are only slightly longer than the times to accept targets. Anderson (1976) and King and Anderson (1976) proposed what was called the waiting model, in which subjects waited some amount of time for the probe to be recognized. If it was not recognized in that time, they would reject it. The assumption was that subjects would adjust their waiting time to reflect factors, like fan, that determine the time taken to recognize targets.

Implementation of the waiting model. The current ACT theory provides a more mechanistic instantiation of the waiting model.
As indicated in Figure 3.4, a foil is rejected by a production whose condition pattern detects the absence of information in memory. If a production is looking for the presence of subpattern S1 and the absence of pattern S2, two pattern nodes are created. One corresponds to the positive conjunction S1&S2, and the other to S1&-S2. In Figure 3.4 the S1&S2 conjunctions correspond to productions P1 and P3, and the S1&-S2 conjunctions to P2 and P4. An inhibitory relation is established between the positive S1&S2 and the negative S1&-S2. Both positive and negative patterns receive activation from S1, but only the positive pattern receives activation from S2.10 In the figure, the common subpattern S1 refers to the encoding of the probe, and S2 refers to the memory proposition. The S1&-S2 pattern builds up activation either until total activation reaches a threshold or until it is repressed by accruing evidence for the positive S1&S2 pattern.

A long-standing question in the ACT theory is how subjects adjust their waiting time to reflect the fan of elements in a foil. Such an adjustment makes sense, because if they did not wait long enough for high-fan targets, they would be in danger of spuriously rejecting them. However, for a long time there was no plausible mechanism to account for adjusting the waiting time. The obvious idea of counting links out of a node and setting waiting time according to the counted fan is implausible. But the current pattern-matching system provides a mechanism for adjusting waiting time with no added assumptions. Note in Figure 3.4 that the fan of elements will affect not only the activation of the memory elements but also that of the probe encoding. This activation will determine the amount of activation that arrives at the S1&-S2 conjunctions in P2 and P4, and thus fan will cause activation to build more slowly to threshold for the foil-detecting productions. It will also have the desired effect of giving high-fan targets more time to complete pattern matching.

One should not conclude from this discussion that the only way of responding no is by this waiting process. As in the Glucksberg and McCloskey (1981) experiment, subjects can retrieve information that allows them to explicitly decide they don't know.
They may also retrieve information that implies the probe is false (for example, Ronald Reagan is a famous liberal senator from Alabama). However, absence detection by waiting is an important basic mechanism for concluding that one does not know anything about a particular fact.

Tests of the waiting model. It is predicted that a foil is harder to reject the more features it shares with a target (leading to activation of S2 and S1&S2 in the above analysis). As reported earlier, Anderson (1976) found that overlap foils including a number of words from the studied sentence took longer to reject than nonoverlap foils. The experiment by King and Anderson illustrates another kind of similarity that may slow down foil rejection. We had subjects study sentences such as
The doctor hated the lawyer.
The doctor ignored the model.

This was called a connected set because the two sentences had the same subject. Unconnected sets were created in the same mold but did not have the same subject. The task was to recognize whether verb-object pairs came from the same sentence. So hated the lawyer would be a positive probe and hated the model a negative probe. Negative probes or foils were always constructed by pairing a verb and an object from different sentences in a set, either connected or unconnected. Subjects showed no difference in speed of recognizing positive probes from connected and unconnected sentence sets, but they were slower and made more errors in rejecting foils from connected than those from unconnected sets. The connected foils were spuriously connected from verb to object through the shared subject, causing a partial match to the positive conjunction (S1&S2) and inhibiting the production that detected the absence (S1&-S2).

Anderson and Ross (1980) have performed an extension of this logic to study what is nominally called semantic memory. Subjects studied sentences like The cat attacked the snake in the first phase of the experiment, then judged the truth of categorical probes like A cat is a snake in the second phase. They were slower and made more errors in these categorical judgments when they had learned an irrelevant sentence, like The cat attacked the snake, linking the two categories in the first phase. Anderson and Ross suggested that similarity effects in semantic memory are to be understood in terms of spurious intersections. These similarity effects in semantic memory are the findings that it is harder to reject a pair of the form An A is a B the more similar A and B are. So A dog is a bird is harder to reject than A dog is a rock.
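The waiting model described above can be caricatured as a race between a positive pattern node and an absence node, with fan dividing the activation that drives both. The sketch below is illustrative only; the rates, the threshold, and the linear evidence build-up are hypothetical simplifications of ACT*'s continuous accrual dynamics, not the actual mechanism.

```python
# Illustrative race between the positive pattern node (S1&S2) and the
# absence node (S1&-S2).  All rates and the threshold are hypothetical
# simplifications; ACT*'s actual accrual dynamics are continuous.

def decision(fan, overlap, base=1.0, threshold=1.0):
    """Decide yes/no for a probe whose elements have the given fan.
    `overlap` is the fraction of the probe matching a stored trace."""
    source = base / fan                    # fan divides source activation
    # only a complete match lets the positive node reach threshold
    t_pos = threshold / (2 * source) if overlap == 1.0 else float("inf")
    # partial matches feed the positive node and so inhibit the absence node
    neg_rate = source * (1.0 - 0.5 * overlap)
    t_neg = threshold / neg_rate
    return ("yes", t_pos) if t_pos < t_neg else ("no", t_neg)
```

Consistent with the data reviewed above, the sketch makes overlap foils and high-fan foils slower to reject, while giving high-fan targets correspondingly more time to complete their match.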
Thinking of the similarity effect in terms of the number of prior connections between the subject and predicate of the sentence makes it possible to understand the experiments (Collins and Quillian, 1972) in which subjects are slow to reject foils when the relationship between subject and predicate is not one of similarity but some other associative relation. Example sentences from Collins and Quillian are An almond has a fortune and Madrid is Mexican. Another result that can be understood in this framework has been reported by Glucksberg, Gildea, and Bookin (1982). They found that subjects have considerable difficulty in rejecting as
false statements like Some surgeons are butchers, which have a high degree of truth metaphorically but a low degree of truth literally. This difficulty can be predicted because of intersections between subject and predicate.

EFFECTS OF THE NUMBER OF SOURCES
In ACT*, activation spreads from multiple sources at once, and activation converging on one node from multiple sources will sum. A typical sentence recognition experiment presents probes with multiple concepts like The fireman snored in the winery. Fireman, snored, and winery all provide indices into memory and hence are sources for activation. The amount of activation converging on the trace connecting these concepts should be the sum of the activation from each concept. Thus, the time taken to recognize the sentence should be affected by the fan of each concept. As reviewed in Anderson (1976), this prediction has been consistently confirmed, despite fairly strong efforts to get the subject to focus on only one of the elements in the sentence.

Another implication of the sum model is that the more concepts provided for recognition, the more activation should accumulate. For instance, the sentence In the bank the lawyer mocked the doctor consists of four major concepts: bank, lawyer, mock, and doctor. If the subject is presented with two, three, or four of these elements and asked if all the words occurred in one sentence, there should be twice as much activation accumulated at the propositional trace with a four-element probe as with a two-element probe.11 Unfortunately, it does not follow that recognition times will be faster, because the subject must perform more complex pattern tests to determine that more elements are properly configured. The evidence in Anderson (1976) was that the greater complexity of the pattern tests overrode the activation advantage. I have found a similar outcome in simulating pattern matching with networks such as the one in Figure 3.4. However, a recent unpublished experiment of mine has avoided this confounding.
In this experiment subjects did not have to recognize all elements in the sentences. They learned to assign numbers to four-element (location-subject-verb-object) sentences such as In the bank the lawyer cheated the doctor, then were presented with probes consisting of all four elements, random subsets of three, or random subsets of two. All the elements came from one sentence, and subjects were asked to retrieve the number of that sentence. The number could always
be retrieved from any two words in a probe, so the complexity of the pattern matching did not increase with the number of elements in the probe (although it might take longer to initially encode the stimulus). In each case subjects only had to test if two words came from a sentence. The extra word or two in three- and four-element probes just provided extra sources of activation. In this experiment, subjects took 2.42 seconds to recognize two-element probes, 2.34 seconds to recognize three-element probes, and 2.29 seconds to recognize four-element probes.

Another prediction of this model is that fan effects should be smaller when the probes contain more elements. This is based on the following analysis: Suppose n elements, with fan f, are presented. The total activation arriving should be nA/f, and recognition time should be a function of the inverse, or f/nA. Note that the fan effect f is divided by the number of elements n. Another unpublished experiment contrasting recall and recognition provides a test of this prediction. The material learned consisted of subject-verb-object sentences. The objects were always unique (fan 1), but the subject and verb occurred in one, two, or three sentences. In the recognition condition, subjects saw a sentence and had to recognize it. In the recall experiment, they saw subject and verb and had to recall the object. In both cases the pattern-matching operations would have to identify subject, verb, and object, so pattern complexity was held constant. Although the subject and verb might have occurred in multiple sentences, they occurred only once together, so the to-be-recalled object was uniquely specified. Table 3.5 presents the recognition and recall performance as a function of subject and verb fan. In this experiment the two fans were correlated; if the subject fan was n, the verb fan was n also. As can be seen, fan had a much greater effect on the recall condition than on recognition.
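The inverse-activation analysis above can be sketched numerically. The intercept and scale constants below are hypothetical placeholders, not estimates from the experiment; the point is only the qualitative pattern: more probe elements mean more summed activation, hence faster times and smaller fan effects.

```python
# RT as an inverse function of summed activation n*A/f, per the analysis
# in the text.  intercept and scale are hypothetical illustration values.

def predicted_rt(n, f, A=1.0, intercept=1.0, scale=1.0):
    """n = number of probe elements, f = fan of each element."""
    total_activation = n * A / f        # each element contributes A/f
    return intercept + scale / total_activation

# more probe elements -> faster recognition
rts = [predicted_rt(n, f=1) for n in (2, 3, 4)]
# the fan effect (fan 3 minus fan 1) shrinks as elements are added
fan_effects = [predicted_rt(n, 3) - predicted_rt(n, 1) for n in (2, 3, 4)]
```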
Table 3.5 also presents the predictions for this experiment, under the assumption that the activation available in the recognition condition was (2/f + 1)A and in the recall condition was (2/f)A. The 2/f refers to the summed activation from subject and verb; the 1 in the recognition equation refers to the extra activation from the one-fan object. Reaction time is predicted by an inverse function of activation, and a single intercept was estimated for the recall and recognition conditions. As can be seen, this model does a good job of accounting for the differences in the size of fan effects.

Table 3.5  Observed and predicted reaction times as a function of fan and whether recognition or recall was required (1)

Fan     Recognition (2)         Recall (3)
1       Obs: 1.35 sec           Obs: 1.54 sec
        Pred: 1.33 sec          Pred: 1.55 sec
2       Obs: 1.58 sec           Obs: 2.07 sec
        Pred: 1.55 sec          Pred: 2.22 sec
3       Obs: 1.70 sec           Obs: 2.96 sec
        Pred: 1.58 sec          Pred: 2.89 sec

1. Obs = observed; Pred = predicted; RT = reaction time. Correlation: r = .991; standard error of observed times = .07 sec.
2. Predicting equation: RT = .88 + 1.34/(2/f + 1).
3. Predicting equation: RT = .88 + 1.34/(2/f).

In Anderson (1981) a similar but more elaborate model has been applied to predicting the differences in interference (fan) effects obtained in paired-associate recognition versus recall.

JUDGMENTS OF CONNECTEDNESS

As discussed in Chapter 2, subjects can judge whether elements are connected in memory more rapidly than they can judge how they are connected (Glucksberg and McCloskey, 1981). Detecting connectivity is a more primitive pattern-matching operation than identifying the type of connection. I speculated in Chapter 2 that detection of connectivity might be a property unique to propositional structures. Another unpublished study confirms again the salience of connectivity information within propositional structures and also checks for an interaction with fan. After studying true and false sentences of the form It is true that the doctor hated the lawyer and It is false that the sailor stabbed the baker, subjects were presented with simple subject-verb-object sentences (The sailor stabbed the baker). They were asked to make one of three judgments about the sentence: whether it was true, whether it was false, and whether it had been studied (as either a true or false sentence). The last judgment could be made solely on the basis of connectivity, but the other two required a more complex pattern match that would retrieve the studied truth value. Crossed with the type of question was the type of sentence (true, false, or a re-pairing of studied elements) about which the question was asked. The terms could be either one- or two-fan. Thus the design of the experiment was 3 (types of question) x 3 (types of sentence) x 2 (fan).
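The predicting equations in the notes to Table 3.5 can be checked directly. One caveat: the scale constant 1.34 is inferred from the tabled predictions rather than quoted cleanly, so treat it as an assumption; the intercept .88 is as printed.

```python
# Check of Table 3.5's predicting equations, RT = intercept + c / activation,
# where activation is 2/f (recall) or 2/f + 1 (recognition).
# c = 1.34 is inferred from the tabled predictions (an assumption).
intercept, c = 0.88, 1.34

def recall_rt(f):
    return intercept + c / (2 / f)

def recognition_rt(f):
    return intercept + c / (2 / f + 1)

print([round(recall_rt(f), 2) for f in (1, 2, 3)])    # [1.55, 2.22, 2.89]
print([round(recognition_rt(f), 2) for f in (1, 2)])  # [1.33, 1.55]
```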
Table 3.6 presents the results of the experiment classified according to the described factors. Table 3.7 presents the average times and fan effects collapsed over type of question or type of material. Note that subjects were fastest to make studied judgments, in which they only had to judge the connectivity of the elements. They were slower on true and false judgments. Significantly, the fan effect is 497 msec for true and false judgments but only 298 msec for studied judgments. Thus fan has less effect on connectivity judgments than on judgments of exact relationship. This is further evidence for the expected interaction between complexity of the pattern test and level of activation. Subjects are also faster to judge re-paired material than other material, but they do not show smaller fan effects. Thus, the effect of question type on fan is not just a matter of longer times showing larger effects.

Table 3.6  Mean reaction times (sec) and percentages correct (in parentheses) in the truth experiment

Question             True?          False?         Studied?
NO-FAN QUESTION
  True               1.859 (.913)   2.886 (.841)   1.658 (.903)
  False              2.612 (.832)   2.741 (.785)   1.804 (.903)
  Re-paired          1.703 (.982)   2.174 (.931)   1.440 (.970)
FAN QUESTION
  True               2.457 (.842)   3.262 (.822)   1.896 (.862)
  False              2.863 (.801)   3.429 (.774)   2.171 (.829)
  Re-paired          2.165 (.962)   2.786 (.881)   1.728 (.929)

Table 3.7  Average reaction times (sec) and fan effects (sec) in Table 3.6

AVERAGE RT
  Type of question:   True? 2.276    False? 2.880   Studied? 1.783
  Type of material:   True 2.335     False 2.603    Re-paired 1.999
AVERAGE FAN EFFECTS
  Type of question:   True? .435     False? .559    Studied? .298
  Type of material:   True .402      False .435     Re-paired .454

An interesting series of experiments (Glass and Holyoak, 1979; Meyer, 1970; Rips, 1975) has been done on the relative difficulty of universal statements (All collies are dogs) versus particular statements (Some collies are dogs, Some pets are cats). When subjects have to judge only particulars or universals, they judge particulars more quickly than universals. On the other hand, when particulars and universals are mixed, subjects are faster to judge universals. This can be explained by assuming that subjects adopt a connectivity strategy in the particular-only blocks. That is, it is possible to discriminate most true particulars (Some cats are pets) from false particulars (Some cats are rocks) on the basis of connectivity. On the other hand, many false universals have strong connections between subject and predicate (All cats are pets and All dogs are collies), which rules out the possibility of a connectivity strategy.

The Nature of Working Memory
Working memory, that subset of knowledge to which we have access at any particular moment, can be identified with the active portion of ACT's memory. According to this analysis working memory consists both of temporary knowledge structures and the currently active parts of long-term memory. Such a conception of working memory has been offered by a number of researchers, including Shiffrin (1975). Since activation varies continuously, working memory is not an all-or-none concept. Rather, information is part of working memory to various degrees.

MEMORY SPAN

What is the relationship between this conception of an active or working memory and traditional conceptions of short-term memory? The momentary capacity of working memory is much greater than the capacity of short-term memory, which traditionally was placed in the vicinity of seven units based on the immediate memory span (Miller, 1956). However, the capacity of working memory refers to the amount of information that is momentarily active rather than to a sustained store. There are severe limitations on the amount of information that can be maintained in an active state in the absence of external stimulation. The only element that sustains activation without rehearsal is
the goal element. The size of the memory span can be seen to partly reflect the number of elements that can be maintained active by the goal element. Rehearsal strategies can be viewed as an additional mechanism for pumping activation into the network. By rehearsing an item, one makes that item a source of activation for a short time.

Broadbent (1975) has argued that memory span consists of a reliable three or four elements that can always be retrieved and a second set of variable size that has a certain probability of retrieval. That is, subjects can recall three or four elements perfectly but can recall larger spans, in the range of seven to nine elements, only with a probability of around .5. The ACT analysis offered here corresponds to Broadbent's analysis of memory span. The certain three or four elements are those whose activation is maintained from the goal. The other elements correspond to those being maintained probabilistically by rehearsal. According to this view rehearsal is not essential for a minimal short-term memory. Probably the correct reading of the relation between rehearsal and short-term memory is that while rehearsal is often involved and is supportive, minimal sets can be maintained without rehearsal (Reitman, 1971, 1974; Shiffrin, 1973). It is also relevant to note here the evidence linking memory span to the rate of rehearsal (Baddeley, Thomson, and Buchanan, 1975).

THE STERNBERG PARADIGM

The various paradigms for studying memory span have been one methodology for getting at the traditional concept of short-term memory. Another important methodology has been the Sternberg paradigm (Sternberg, 1969). In the Sternberg experiment, time to recognize that an element is a member of a studied set is found to be an approximately linear function (with slope about 35 msec per item) of the size of the set. This is true for sets that can be maintained in working memory.
The traditional interpretation of this result is that the subject performs a serial high-speed scan of the contents of short-term memory looking for a match to the probe. J. A. Anderson (1973) has pointed out that it is implausible that a serial comparison process of that speed could be implemented neurally. Rather he argues that the comparisons must be performed in parallel. It is well known (Townsend, 1974) that there exist parallel models which can predict the effects attributed to serial models. In the ACT framework, these judgments could be implemented by productions of the form:
P5   IF the goal is to recognize if the probe is in LVset
        and LVprobe is presented
        and LVprobe was studied in LVset
     THEN say yes.

P6   IF the goal is to recognize if the probe is in LVset
        and LVprobe is presented
        and LVprobe was not studied in LVset
     THEN say no.
These productions are of the same logic as the fact-recognition productions in Table 3.4. Their rate of application will be a function of the level of activation of the matching structure. The more elements there are in the memory set, the lower will be the activation of any item in that set, because it will receive less activation from the goal element and less maintenance rehearsal. A rather similar idea was proposed by Baddeley and Ecob (1973). The amount of activation coming to these productions will be a function of the fan of LVset, among other things. The amount of activation from LVset will be A/n, where n is the number of elements in the memory set. Then the total activation of the structure being matched to the probe is A* + A/n, where A* is the activation coming from other sources such as LVprobe. Under the hypothesis that match time is an inverse function of activation, recognition time will vary as a function of 1/(A* + A/n). This predicts a somewhat negatively accelerated function of the set size n; the degree of negative acceleration will be a function of A*. Despite the textbook wisdom about the linear effect of set size, the obtained functions are more often than not negatively accelerated (Briggs, 1974). ACT's waiting process for absence detection also predicts that the time to reject foils approximately parallels the target functions. That is, activation of the first clause in P6 (the goal is to judge if the probe is in LVset) will be a function of the fan from LVset. Note that the ACT model is just one instantiation of the class of parallel models for performing in the Sternberg task (see Ratcliff, 1978; Townsend, 1974). Variations on this analysis of the Sternberg task were offered in Anderson and Bower (1973) and Anderson (1976). However, at that time they stood as post hoc analyses nearly identical in prediction to the serial account of the phenomenon. There was no independent evidence for this account over the serial one.
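The set-size prediction above can be sketched with a few lines of arithmetic. The parameter values are hypothetical, chosen only to show the shape of the function, not fitted to any data.

```python
# RT proportional to 1/(A* + A/n): grows with set size n, but with
# shrinking increments when A* > 0 (negative acceleration).
# All parameter values are hypothetical.
A, A_star, intercept, scale = 1.0, 0.4, 0.35, 0.1

def sternberg_rt(n):
    return intercept + scale / (A_star + A / n)

rts = [sternberg_rt(n) for n in range(1, 7)]
increments = [b - a for a, b in zip(rts, rts[1:])]
# increments are positive but shrinking; with A_star = 0 the function
# would be exactly linear in n
```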
Some recent experiments by Jones (Jones and Anderson, 1981; Jones, unpublished) have confirmed predictions that discriminate between the two accounts. We compared associatively related memory sets (plane, mountain, crash, clouds, wind) with unrelated sets. Because of the associative interrelationships, activation of some members of the set should spread to others. Thus the dissipation in activation with increase in set size should be attenuated. Correspondingly, we did find smaller set size effects with the related memory sets. These experiments provide evidence that a spreading-activation-based conception of the Sternberg paradigm is better than the high-speed memory scan. That is, we have shown that the long-term associative relationships over which the spread occurs can facilitate recognition of an item in short-term memory.12

Appendix: Example Calculations

It is useful to look at a few hypothetical examples of asymptotic activation patterns. Some involve rather simple and unrealistic network structures, but they illustrate some of the properties of the spreading activation mechanisms.

LINEAR CHAINS

Figure 3.5a represents a simple linear chain that starts with node 0 and extends to nodes 1, 2, 3, and so on, with node n connected to both n - 1 and n + 1, except for 0, which is connected just to 1. Assume that all links in the chain are of equal strength. Assuming that node 0 becomes a source and that one unit of activation is input at 0, one can derive the asymptotic pattern of activation. Letting a_i refer to the activation of the ith node in the chain, a_i = k r^i, where r = (1 - √(1 - p^2))/p and k = 2/(1 - rp). The exception to this rule is the activation for node 0, which has activation level a_0 = 1/(1 - rp). Since r is a fraction, activation level decays away exponentially with distance from the source. Assuming p = .8, which is a typical value in our simulations, r = .5, k = 3.33, and a_0 = 1.67. Note that although a_0 is only given one unit of activation as a source, reverberation increases its level an additional .67 units. Figure 3.5b illustrates another kind of linear chain, this one centered at node 0 and extending to minus and plus infinity.
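The closed-form values quoted for the first chain (r = .5, k = 3.33, a_0 = 1.67 at p = .8) can be recovered numerically. The sketch below assumes each node divides its outgoing strength equally among its links and transmits fraction p of its activation; that reading of the spreading rule is my inference from the formulas, not spelled out in this passage.

```python
# Relaxation to the asymptotic activation pattern on a half-infinite chain,
# approximated by a long finite chain.  Node 0 receives one unit of input.
p = 0.8
N = 100                        # long enough that the far end is irrelevant
a = [0.0] * N
for _ in range(2000):          # iterate the spreading equations to a fixed point
    new = a[:]
    new[0] = 1.0 + p * a[1] / 2          # node 1 splits its strength two ways
    new[1] = p * (a[0] + a[2] / 2)       # node 0 sends all strength to node 1
    for i in range(2, N - 1):
        new[i] = p * (a[i - 1] + a[i + 1]) / 2
    a = new

r = (1 - (1 - p * p) ** 0.5) / p         # closed form: .5 for p = .8
k = 2 / (1 - r * p)                      # closed form: 3.33
print(round(a[0], 2), round(a[3], 3), round(k * r ** 3, 3))  # 1.67 0.417 0.417
```

The relaxation converges because each update shrinks errors by at least the factor p; the simulated activations match the closed form to three decimals.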
Again we can solve for the asymptotic pattern of activation assuming one unit of input at node 0. Again activation decays exponentially with distance, a_i = k r^i, and again r = (1 - √(1 - p^2))/p and a_0 = 1/(1 - rp), but now k = 1/(1 - rp). Thus activation tends to decay away exponentially with distance from the source, and one can safely ignore the effects of distant structure. This is one reason why a network will reach a near-asymptotic pattern of activation rather quickly. That is,
the distant structure that the activation takes longer to reach has little effect on the final pattern of activation. The data of McKoon and Ratcliff (1980) on linear chains in paragraphs are relevant here. They were able to show that priming decayed approximately exponentially as a function of distance, as predicted by this analysis.

Figure 3.5  Two simple linear chains for calculating effects of distance on activation.

UNANALYZED STRUCTURE

In typical applications one can specify only a small fragment of the semantic network because the full network contains millions of nodes. I will refer to the specified fragment as the analyzed structure and the remainder as the unanalyzed structure. In an application we assume that some nodes in the analyzed structure are sources of activation, and we calculate the spread among the other nodes. We either ignore the existence of the unanalyzed structure or assume that activation spreads into the unanalyzed structure and never spreads back. It is impossible to fully calculate reverberations into the unanalyzed structure and back. The interesting question is, what are the consequences of failing to consider the unanalyzed structure? Does it change everything in the analyzed structure by a multiplicative scale factor, or does it change the ordinal relations among the activation levels of the analyzed nodes? Is there some way to "correct" for the effect of reverberation through the unanalyzed structure without actually specifying the reverberation?

Figure 3.6  Node X is connected to some analyzed network structure and some unanalyzed structure. This figure is used to determine the effect of reverberation with the unanalyzed structure on patterns of activation in the analyzed structure.

The effect of the unanalyzed structure will depend on its properties. In one situation, shown in Figure 3.6, it does have minimal impact. One node X from the analyzed structure is represented, with no connections from the unanalyzed structure to the analyzed structure other than directly through X.13 Thus, in a sense, the analyzed structure has captured all the "relevant" connections in memory. Assume that the relative strength of X's connections to the unanalyzed structure is s and hence the relative strength of its connections to the analyzed structure is 1 - s. We can classify the nodes in the unanalyzed structure according to their minimum link distance from X. A node will be in level i if its minimum distance is i. Node X will be considered to be level 0. Let a_i be the total of the activation of all nodes in level i. By definition a node in level i has connections only to nodes in level i - 1, i + 1, and possibly i.14 Let s_1 be the relative strength of all connections to level i - 1, s_2 the relative strength of all connections to level i, and s_3 the strength of all connections to level i + 1; s_1 + s_2 + s_3 = 1. It will be assumed that the same values for these parameters apply at all levels except 0. For all levels except 0 and 1, the following equation describes the pattern of activation:

a_i = p s_1 a_(i-1) + p s_2 a_i + p s_3 a_(i+1)    (3.11)
It can be shown that once again the level of activation decays exponentially with distance, such that

a_i = k r^i    (3.12)

where

r = (1 - p s_2 - √((1 - p s_2)^2 - 4 p^2 s_1 s_3)) / (2 p s_3)    (3.13)

and

k = p^2 s V / (r (1 - p s_3 r - p s_2 - p^2 s s_1))    (3.14)

and

a_0 = p V (1 - p s_3 r - p s_2) / (1 - p s_3 r - p s_2 - p^2 s s_1)    (3.15)
1.340V, and ao : !.199V.
The point of this analysis is to show that reverberation from the unanalyzed structure may just multiply by a constant the activation V of X, calculated on the basis of the analyzed structure. This multiplicative constant might be called the correction factor. Hence calculations for the analyzed structure can be adjusted by introducing multiplicative constants to represent effects of reverberation through the unanalyzed structure, given values for s, p, s_1, s_2, and s_3. Thus the typical practice of analyzing part of the network need not lead to serious difficulties if the relationship between the analyzed and unanalyzed structure approximates that shown in Figure 3.6.
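The numeric values quoted above can be recomputed from equations (3.13) through (3.15). A caveat: the algebraic forms used below follow my reconstruction of those equations from the surrounding derivation, so small rounding differences from the quoted values are expected.

```python
# Check of r, k (per unit V), and a_0 (per unit V) from equations
# (3.13)-(3.15), using the parameter values given in the text.
from math import sqrt

p, s = 0.8, 0.67
s1, s2, s3 = 0.6, 0.1, 0.3

r = (1 - p * s2 - sqrt((1 - p * s2) ** 2 - 4 * p ** 2 * s1 * s3)) / (2 * p * s3)
denom = 1 - p * s3 * r - p * s2 - p ** 2 * s * s1
k = p ** 2 * s / (r * denom)             # coefficient of V in equation (3.14)
a0 = p * (1 - p * s3 * r - p * s2) / denom   # coefficient of V in (3.15)

print(round(r, 3), round(k, 2), round(a0, 2))   # 0.623 1.34 1.2
```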
4. Control of Cognition
HUMAN COGNITION at all levels involves choosing what to process. Alternatives present themselves, implicitly or explicitly, and our cognitive systems choose, implicitly or explicitly, to pursue some and not others. We orient our senses to only part of the environment; we do not perceive everything we sense; what we do perceive we do not recognize in all possible patterns; only some of what we recognize do we use for achieving our goals; we follow only some ways of pursuing our goals; and we choose to achieve only some of the possible goals.

In ACT* many of these choices are made by the conflict resolution principles. A theory of conflict resolution has an ambiguous status, given the current categories of cognitive psychology. In certain ways it is a theory of attention; in other ways it is a theory of perception; in other ways it is a theory of problem solving; in still other ways it is a theory of motor control. However, it is not really a schizophrenic theory, but a unified theory facing a schizophrenic field. Unfortunately, the idea of conflict resolution, as defined within production systems, is not familiar to cognitive psychology. While the term is more common in artificial intelligence, it is still not that common. Therefore, in this chapter I first set the stage and identify some of the relevant issues and past work.

Current Status of the Field

DATA-DRIVEN VERSUS GOAL-DIRECTED PROCESSING

According to one of cognitive psychology's recurrent hypotheses, there are two modes of cognitive processing. One is automatic, less capacity-limited, possibly parallel, invoked directly
by stimulus input. The second requires conscious control, has severe capacity limitations, is possibly serial, and is invoked in response to internal goals. The idea was strongly emphasized by Neisser (1967) in his distinction between an early "preattentive" stage and a later controllable, serial stage. Lindsay and Norman (1977) made the distinction between data-driven and conceptually driven processing the cornerstone of their introductory psychology text. The idea is found in the distinction of Posner and Snyder (1976) between automatic pathway activation and conscious attention. They argue that pathway activation in the system can automatically facilitate the processing of some information without cost to the processing of other information. On the other hand, if attention is consciously focused on the processing of certain information, this processing will be facilitated but at a cost to the processing of what is not attended. Shiffrin and Schneider (1977) and Schneider and Shiffrin (1977) make a distinction between automatic and controlled processing of information and endorse the idea that automatic processing can progress without taking capacity away from other ongoing processing, whereas controlled processing consumes capacity. According to Shiffrin and Schneider, automatic processing can occur in parallel whereas controlled processing is serial. Only with a great deal of practice and only under certain circumstances can controlled information processing become automatic. LaBerge and Samuels (1974) argue for a similar conception of the development of automaticity. Recent work on problem solving (Larkin et al., 1980) has also found that a move from goal-directed to data-driven processing is associated with growing expertise.

A somewhat similar but not identical distinction, called bottom-up versus top-down processing, is frequent in computer science. Bottom-up processing starts with the data and tries to work up to higher levels of the system; top-down processing is driven by goals. The distinction is also found in many models of perceptual processing. Interestingly, continued advances in architecture, as in the design of augmented transition networks (Woods, 1970) for parsing or the HARPY system for speech perception (Lowerre, 1976), have blurred this distinction. In such systems the processing occurs in response to goals and data jointly. Whether one studies tasks that are basically perceptual (that is, they start at the bottom of the cognitive system) or basically cognitive (that is, they
start at the top of the system), one must address the issue of how top-down processing and bottom-up processing are mixed. There are numerous interesting psychological results concerning this mixing. Deliberately focusing attention on a perceptual task can facilitate it (LaBerge, 1973; Posner and Snyder, 1976). LaBerge showed, for instance, that subjects are faster to recognize letters they expect. Contextual factors have pervasive effects on how perceptual patterns are perceived. The effects of the high level on the low level have been long known and are now well documented in the laboratory. A recent surprise has been how much low-level processes affect what is supposedly high-level processing. Some of the early evidence of this came from studies of chess and the game of go, in which it was found that what separated experts from novices was the ability to perceive relevant patterns on the game board (Chase and Simon, 1973; Reitman, 1976; Simon and Gilmartin, 1973). For an expert, the lines of development in the game are suggested directly by the patterns on the board, just as our perception of a dog is usually determined by the data, without any intent or plan to see a dog. The key to expertise in many problem-solving domains like physics (Larkin, 1981) or geometry (Anderson, 1982) is development of these data-driven rules. Such rules respond to configurations of data elements with recommendations for problem development, independent of any higher-level goals. In the problem shown in Figure 4.1, from our work on geometry, what distinguishes experienced from novice students is that experts very quickly perceive that ∠ACM = ∠BDM even though they do not know how this fact will figure in the final proof.
THE HEARSAY SYSTEM AND OPPORTUNISTIC PLANNING

Artificial intelligence has been greatly concerned with how to mix top-down and bottom-up processing in many domains. The HEARSAY architecture (Erman and Lesser, 1979; Reddy et al., 1973) was developed for speech perception but has proven quite influential in cognitive psychology (for example, Rumelhart, 1977). In particular, Hayes-Roth and Hayes-Roth (1979) have adapted it in their proposal for opportunistic planning. This type of architecture provides a set of comparisons and contrasts
[Figure 4.1: a geometry diagram with labeled points A through F. GIVEN: M is the midpoint of AB and CD. PROVE: M is the midpoint of EF.]

Figure 4.1 A geometry problem that serves to distinguish novice students from experienced ones. Experienced students immediately perceive ∠ACM = ∠BDM without knowing how the fact will fit into the final proof.

that are useful for identifying what is significant about the ACT architecture. Certain aspects of the HEARSAY architecture have much in common with production systems. For instance, there is a blackboard which, like working memory, contains a wide range of relevant data, organized generally according to level. In speech perception, HEARSAY's blackboard contained hypotheses about sounds, syllable structures, word structures, word sequences, syntax, semantics, and pragmatics. Numerous knowledge sources, which are like productions, respond to data at one level and introduce data at another level. At any point in time the system must choose which source to apply from a set of potentially relevant knowledge sources. This is the conflict-resolution problem, and it is in the solution to this problem that the HEARSAY architecture differs most fundamentally from ACT. In HEARSAY, conflict-resolution decisions are made dynamically and intelligently by considering any relevant information. Various knowledge sources are responsible for evaluating the state of knowledge and deciding what should be done next. This contrasts with the simple, compiled conflict-resolution schemes of production systems. The HEARSAY scheme has the potential for cognitive flexibility but
at considerable computational cost. One of the potentials of HEARSAY is that it allows for a radical shift of attention when a new hypothesis seems promising.

The opportunistic system proposed by Hayes-Roth and Hayes-Roth is an interesting attempt to extend the flexible control structure of HEARSAY to planning. The dominant view of planning (Miller, Galanter, and Pribram, 1960; Newell and Simon, 1972; Sacerdoti, 1977) sees planning as a top-down, focused process that starts with high-level goals and refines them into achievable actions. This is sometimes referred to as successive refinement or problem decomposition. In contrast, Hayes-Roth and Hayes-Roth claim that multiple asynchronous processes develop the plan at a number of levels. The particular planning task studied by these researchers (subjects planning a series of errands through a town) supported their view. In this task subjects mixed low-level and high-level decision making. Sometimes they planned low-level sequences of errands in the absence or in violation of a prescriptive high-level plan. The researchers characterized this behavior as "opportunistically" jumping about in the planning space to develop the most promising aspects of the plan. This certainly seemed in violation of successive refinement and more compatible with HEARSAY's architecture, which can handle multiple focuses of attention at multiple levels.

To implement this opportunistic control structure, Hayes-Roth and Hayes-Roth proposed a complex blackboard structure that represents many aspects of the plan simultaneously. Again, knowledge sources respond to whatever aspect of the plan seems most promising. Figure 4.2 illustrates their blackboard structure and just a few of the knowledge sources (see Hayes-Roth and Hayes-Roth for an explanation). This structure, containing multiple planes, each with multiple levels, is even more complex than the original HEARSAY blackboard.
This causes problems, because skipping among its many planes and levels makes unrealistic demands on working memory.2 Human ability to maintain prior states of control in problem solving is severely limited (Chase, 1982; Greeno and Simon, 1974; Simon, 1975). Indeed, in some situations, phenomena that appear to violate hierarchical planning are actually simple failures of working memory. For instance, subjects may pursue details of a current plan that is inconsistent with their higher goals, simply because they have misremembered the higher goals.

Hayes-Roth and Hayes-Roth have made a clear contribution
[Figure 4.2: the multi-plane blackboard structure for opportunistic planning proposed by Hayes-Roth and Hayes-Roth, showing an Executive plane with levels such as Priorities and Problem Definition.]
P2 IF the goal is to place Y in a puzzle
      and Y fits into an available location
   THEN place Y in the location
      and POP the goal.

P3 IF the goal is to place Y in a puzzle
      and Y does not fit into any available location
   THEN set an intention to later place Y in a location
      and POP the goal.

P4 IF there is an intention to place Y in a location
      and Y fits into an available location
   THEN place Y in the location.
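Stripped of the ACT machinery, the control cycle in these productions can be sketched as a tiny interpreter loop. This is only an illustrative rendering in Python: the piece names, the `prereqs` encoding of "fits into an available location," and the random shuffle standing in for P1 (whose text falls outside this excerpt) are all my own assumptions.

```python
import random

def solve(pieces, prereqs, rng=random.Random(0)):
    """Sketch of productions P2-P4 as a match-act loop.

    pieces  : piece names; P1's random selection is approximated by a shuffle
    prereqs : piece -> set of pieces that must already be placed before the
              piece "fits into an available location"
    """
    placed, intentions = [], []
    goals = pieces[:]
    rng.shuffle(goals)                      # P1 (approximation): pick pieces randomly
    while goals:
        y = goals.pop()                     # goal: place Y in the puzzle
        if prereqs[y] <= set(placed):       # P2: Y fits an available location
            placed.append(y)                #     place Y and POP the goal
        else:
            intentions.append(y)            # P3: set an intention to place Y later
        # P4: data-driven "demon" fires whenever a suspended piece now fits
        for s in intentions[:]:
            if prereqs[s] <= set(placed):
                intentions.remove(s)
                placed.append(s)
    return placed

# Periphery pieces have no prerequisites; the one center piece needs two neighbors.
order = solve(["edge1", "edge2", "edge3", "center"],
              {"edge1": set(), "edge2": set(), "edge3": set(),
               "center": {"edge1", "edge2"}})
print(order)
```

Whatever order the shuffle produces, the center piece is always placed after its prerequisite edge pieces, either directly by P2 or later by the P4 demon.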
These productions describe a puzzle-solving behavior in which he starts fitting pieces randomly into the periphery and fits center pieces when the periphery pieces have been placed. P1 is a production for randomly selecting pieces for placement. If the piece can be placed, P2 will place it; if not, P3 sets an intention to later place it. (The clause Y fits into location is meant to reflect recall of a memorized location; J.J. has the locations of the pieces memorized.) P4 is a data-driven "demon" which will place a suspended piece when the right prerequisite pieces are
placed. J.J. does appear to return to suspended pieces when he can place them. Of course, this requires that he remember his intentions. If there were many interior pieces he would not be able to remember all his suspended intentions. Fortunately, his puzzles have at most two or three interior pieces.

Appendix: Calculation of Pattern Node Activation

The following equations describe the behavior of the pattern matcher. For many reasons the simulation of the ACT* pattern matcher only approximates these equations. In part, this is because the simulation tries to create a discrete approximation of a continuous process. In part, this is because of optimizations associated with pruning the proliferation of pattern instantiations. The actual code can be obtained by writing to me. As in the declarative network (see Chapter 3), the change in activation of the pattern node, x, is a positive function of the input and a negative function of a decay factor:

da_x(t)/dt = B · n_x(t) − p · a_x(t)   (4.2)

where n_x(t) is the input to node x. It is the sum of excitatory bottom-up input, e_x(t), and of inhibitory input, i_x(t). Thus

n_x(t) = e_x(t) + i_x(t)   (4.3)

The value of e_x(t) is positive, but the value of i_x(t) can be positive or negative. The excitatory input depends on whether the node is a positive two-input node or a negative one-input node. If it is a positive two-input node, the excitatory factor is defined as the weighted sum of activation of its two subpatterns, y and z:

e_x(t) = r_yx · a_y(t) + r_zx · a_z(t)   (4.4)
where r_yx is the strength of connection from y to x. It is defined as

r_yx = s_x / Σ_i s_i   (4.5)

where the summation is over all patterns i that include y as a subpattern. The strength s_i of a pattern i is a function of its frequency and recency of exposure. In the case of a negative one-input node x, its excitatory input is a function of the activation of the one-input node y that feeds into it:

e_x(t) = n · r_yx · a_y(t)   (4.6)
where n is a multiplicative factor greater than 1 to help negative nodes compete with two-input nodes. The inhibitory factor i_x(t) in Eq. (4.3) is defined as the sum of all net inhibitory effects between x and all of its competitors y:

i_x(t) = Σ_y i_xy(t)   (4.7)

The net inhibitory effect between two nodes x and y is defined as

i_xy(t) = g_x(t) · T_x(t) − g_y(t) · T_y(t)   (4.8)

where g_x(t) is again a measure of the goodness of the tests, and T_x(t) is the activation of the superpatterns of x or the activation of x if it has no superpatterns. The value of g_x(t) can be positive or negative. The assumption is that the tests performed at the pattern node are relatively slow and require some time to complete, and that evidence gradually accrues with time. The actual simulation emulates the result of this gradual accrual of evidence by the following equation:

g_x(t) = min[Value_x, g_x(t − 1) + increment_x(t − 1)]   (4.9)

where

increment_x(t) = f · a_x(t − 1) · (Value_x / MAX_x)   (4.10)

where Value_x is the total evidence for x when the pattern matching completes, and MAX_x is the value if there is a perfect match. In the case of a one-input node the increment is just f · a_y(t − 1), and this increases until a maximum value is achieved.

The system described above is basically linear, with one important nonlinearity imposed on it. This is that the activation of a node is bounded below by zero. When it reaches this bound, it ceases to participate in the calculations.
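A discrete simulation of the kind described can be sketched by Euler-stepping Eq. (4.2) for a positive two-input node, with e_x from Eq. (4.4) and n_x from Eq. (4.3). The parameter values (B, p, dt) below are arbitrary illustrations of my own, not the values used in the actual ACT* code:

```python
def step_two_input(a_x, a_y, a_z, r_yx, r_zx, i_x, B=1.0, p=0.5, dt=0.1):
    """One Euler step of da_x/dt = B*n_x(t) - p*a_x(t), Eq. (4.2),
    for a positive two-input node with excitatory input
    e_x = r_yx*a_y + r_zx*a_z (Eq. 4.4) and n_x = e_x + i_x (Eq. 4.3)."""
    e_x = r_yx * a_y + r_zx * a_z
    n_x = e_x + i_x
    a_new = a_x + dt * (B * n_x - p * a_x)
    return max(0.0, a_new)   # the nonlinearity: activation is bounded below by zero

# With constant input the node's activation climbs toward the
# equilibrium B*n_x/p (here 2.0) rather than growing without bound.
a = 0.0
for _ in range(100):
    a = step_two_input(a, a_y=1.0, a_z=1.0, r_yx=0.6, r_zx=0.4, i_x=0.0)
print(round(a, 3))
```

With strong net inhibition (i_x sufficiently negative), the update would drive the activation to the zero bound, at which point the node drops out of the calculation, as the text notes.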
5 | Memory for Facts

Introduction

HUMAN MEMORY has been a major domain for testing the ACT theory. Although the theory is more general than any particular memory paradigm, it is important to show that its assumptions, when combined, predict the results of specific paradigms. Every experimental result that can be derived from existing assumptions is further evidence for the accuracy and generality of those assumptions. The first part of this chapter will discuss the ACT assumptions about encoding, retention, and retrieval of facts. The later sections will consider some of the major phenomena documented in the experimental literature on human memory.

ENCODING

Encoding in ACT refers to the process by which cognitive units (in the sense of Chapter 2) become permanent long-term memory traces. When a cognitive unit is created, to record either some external event or the result of some internal computation, it is placed in the active state. However, this active unit is transient. There is a fixed probability that it will be made a permanent trace. The term trace is reserved for permanent units. This encoding assumption is spectacularly simple. The probability is constant over many manipulations. It does not vary with intention or motivation to learn, which is consistent with the ample research indicating that intention and motivation are irrelevant if processing is kept constant (Nelson, 1976; Postman, 1964). This is probably an adaptive feature, because it is unlikely that our judgments about what we should remember are very predictive of what will in fact be useful to remember.
The probability of forming a long-term memory trace also does not vary with the unit's duration of residence in working memory. This is consistent with research (Nelson, 1977; Woodward, Bjork, and Jongeward, 1973; Horowitz and Newman, 1969) that finds uninterrupted presentation time is a poor predictor of ultimate recall probability. However, probability of recall is found to increase with two presentations, even if those presentations are back to back (Horowitz and Newman, 1969; Nelson, 1977). Similarly, Loftus (1972) found that duration of fixation on a picture part has no effect on the probability of recalling the part but that the number of fixations on that part does have an effect. One would suppose that a second presentation or fixation has some chance of creating a new working-memory copy. Every time an item is reentered into working memory it accrues an additional probability of being permanently encoded. In this assumption is a partial explanation of the spacing effect (see Crowder, 1976, for a review): that two presentations of an item produce better memory the farther apart they are spaced. If two presentations are in close succession, the second may occur when a trace from the first is still in working memory, and a new working-memory trace might not be created. This is similar to Hintzman's (1974) habituation explanation of the spacing effect, or what Crowder more generally calls the inattention hypothesis. Consistent with this explanation is the evidence of Hintzman that at short intervals it is the second presentation that tends not to be remembered.2 Also consistent with this analysis is the evidence that memory is better when a difficult task intervenes between two studies (Bjork and Allen, 1970; Tzeng, 1973) than when an easy task intervenes. A difficult task is more likely to remove the original trace from working memory because it tends to interfere with the source nodes maintaining the trace.
If the trace is removed from working memory, the second presentation will offer an independent opportunity for encoding the trace. In most situations the length of uninterrupted study of an item has small but detectable effects on the probability of recall, so I would not want the above remarks to be interpreted as implying that duration of study has no effect. The duration of uninterrupted study might affect the probability of encoding in a number of ways. First, if the subject's attention wanders from the item, it may fall out of working memory, and when attention returns the item will get the benefit of a new entry into working memory. In some situations it is almost certain that traces exit from and reenter working memory, for instance,
when a subject reads and then rereads a story that involves too many cognitive units to be all held simultaneously in working memory. In this situation the subject must shift his working memory over the material. The second effect of duration of study is that the subject may use the extra study time to engage in elaborative processing, which produces redundant traces. That is, while added duration of residence in working memory will not increase that trace's probability of being stored, the time may be used to create traces that are redundant with the target trace. For example, an image of a horse kicking a boy can be regarded as redundant with the paired associate horse-boy. This possibility will be discussed at great length later in the chapter. This view is somewhat similar to the original depth-of-processing model (Craik and Lockhart, 1972; Craik and Watkins, 1973), which held that duration of rehearsal was not relevant to memory if the rehearsal simply maintained the trace in working memory. Rather, duration would have a positive impact only if the rehearsal involved elaborative processing. However, the current proposal differs from that in the claim that spaced repetition of an item will increase memory even if neither presentation leads to elaborative processing. The current proposal only claims that duration of uninterrupted residence in working memory is ineffective, which is more in keeping with Nelson's conclusions (1977) on the topic. Additional presentations after a trace has been established increase the strength of the trace. All traces have an associated strength. The first successful trial establishes the trace with a strength of one unit, and each subsequent trial increases the strength by one unit. The strength of a trace determines its probability and speed of retrieval. Thus strength is the mechanism by which ACT predicts that overlearning increases the probability of retention and speed of retrieval.
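The arithmetic behind this account of the spacing effect can be made concrete. Assuming a fixed per-entry encoding probability p (the value .3 below is arbitrary, chosen only for illustration), two spaced presentations give two independent encoding opportunities, while two massed presentations, on this account, give only one, because the second arrives while the first trace still occupies working memory:

```python
p = 0.3                      # arbitrary per-entry encoding probability
massed = p                   # second presentation finds the trace still in working memory
spaced = 1 - (1 - p) ** 2    # two independent entries into working memory
print(massed, spaced)
```

So any p strictly between 0 and 1 predicts an advantage for spaced presentation, with no appeal to elaborative processing.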
RETENTION

According to the ACT theory, a trace once formed is not lost, but its strength may decay. Studies of long-term retention show gradual but continuous forgetting. Based on data summarized by Wickelgren (1976) and data of our own, trace strength, S, is a power function of time with the form

S = t^(−b)   (5.1)

where the time t is measured from the point at which the trace
was created in working memory and the exponent b has a value on the interval 0 to 1. The smaller the value of b the slower the forgetting or loss of strength. It reflects the "retentiveness" of the system. The function has a strange value at t = 0, namely, infinity. However, the strength value is only relevant to performance at times t > 0. I regard this decay function as reflecting a fundamental fact about how the physical system stores memory in the brain (for example, see Eccles's discussion (1972) of neural effects of use and disuse). Such a power function is to be contrasted with an exponential function (for example, S = a^t, where a < 1), which would produce much more rapid forgetting than is empirically observed.

One interesting issue is what the retention function is like for a trace that has had multiple strengthenings. The ACT theory implies that its total strength is the sum of the strengths remaining from the individual strengthenings, that is,

S = Σ_i t_i^(−b)   (5.2)

where t_i is the time since the ith strengthening. Evidence for this assumption will be given later, when the effects of extensive practice are discussed.

Traces are cognitive units that interconnect elements. Not only do the unit nodes have strength but so do the element nodes they connect. Every time a unit node acquires an increment in strength, there is also an increment in the strength of the elements. An element node can have more strength than a unit it participates in because it can participate in other units and gain strength when these are presented too. Thus all nodes in the declarative network have associated strengths, and all accumulate strength with practice. The strength of a node affects its level of activation in two ways. First, the amount of activation spread to a node from associated nodes is a function of the relative strength, r_ij, of the link from node i to node j, as developed in Chapter 3. Let S_j be the strength of j. Then r_ij = S_j / Σ_k S_k, where the summation is over all nodes k that are connected to element i. Second, every time a trace is strengthened there will be an increment in the activation capacity of its elements. This capacity determines how much activation the element can spread if it is presented as part of the probe (that is, the source activation capacity, in the terms of Chapter 3). This idea will be developed in the section on practice.

RETRIEVAL

The probability of retrieving a trace and the time taken to retrieve are functions of the trace's level of activation. Retrieval of a trace in the ACT framework requires that the trace be matched by a production or productions that generate the memory report. As developed in the last two chapters, the time to perform this matching will be an inverse function of the level of activation. Thus the time, T(A), to generate a memory report of a trace with activation A is:

T(A) = I + B/A   (5.3)

In this equation, I is an "intercept" parameter giving time to perform processes like initial encoding that are not affected by trace activation. As developed in Eq. (4.1), the parameter B reflects factors such as pattern complexity (C), pattern strength (S), goodness of match (G), and number of interfering patterns (I). That is, B = CI/SG from Eq. (4.1). The memory trace will not be retrieved if the pattern matching fails or terminates before it is complete. Thus there is a race between some termination time and the time it takes to complete the pattern matching. For many reasons both the pattern-matching time and the cutoff time should be variable. The pattern matching can be variable because of variation either in activation or in the efficiency with which the pattern tests are performed. Different contexts can affect both the speed of pattern matching and the terminating conditions that determine the cutoff time. Without a detailed physiological model, it is foolish to make a strong claim about the form of this variability. However, in the past work on ACT (Anderson, 1974, 1976, 1981b; King and Anderson, 1976), we have assumed that the probability of a successful match before the cutoff was an exponential function of level of activation:

R(A) = 1 − e^(−KA/B)   (5.4)

This is most naturally read as the probability of an exponentially variable matching process with mean B/A completing before a fixed cutoff K. However, it is possible to read it as the probability of a fixed matching process completing before a variable cutoff. The probabilistic retrieval process implies that if repeated memory tests are administered, the results should be a mixture
of recall of an item and failure to recall it, and this is observed (Estes, 1960; Goss, 1965; Jones, 1962). It is also observed that an item successfully recalled on one trial has a greater probability of recall on a second trial. The above analysis would appear to imply that recall of each trial is independent, but there are a number of explanations for the observed nonindependence. First, the above analysis is only of retrieval and ignores the probability of encoding the trace. The all-or-none encoding process would produce nonindependence among successive recalls. Second, some nonindependence would result because the successful trial provides a strengthening experience and so increases the level of activation for the second test. Third, nonindependence could be produced by item selection effects if the items varied considerably in the level of activation they could achieve (Underwood and Keppel, 1962). An implication of this analysis is that if enough time was put into trying to retrieve, every item that has had a long-term trace formed would be successfully recalled. If the subject repeats the retrieval process often enough he will eventually get lucky and retrieve the item. Indeed, it has been observed that there are slow increments in the ability to recall with increased opportunity for recall (Buschke, 1974).

In the rest of the chapter I will consider how ACT's general theory of memory applies to a number of the major memory phenomena. Of course, in motivating the preceding theoretical discussion we already reviewed some of the theory's most direct application to data.

Judgments of Associative Relatedness

Chapter 2 mentioned that subjects can judge that concepts in memory are connected independent of the exact relationship and can make connectedness judgments more rapidly than judgments of exact relationship. This capacity is important to understanding memory performance in many situations. Therefore this chapter provides a more detailed analysis of two aspects of this phenomenon. One aspect concerns judging the thematic consistency of a fact rather than whether a fact has been studied. As Reder (1982) has argued, in many memory situations subjects are making consistency judgments rather than recognition judgments. The second aspect concerns selecting a subnode of a concept for focus of activation, which proves to be a way of avoiding the effects of interfering facts.
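The retention and retrieval assumptions above, Eqs. (5.2) through (5.4), can be combined in a short sketch. Treating the summed strength directly as the activation A is a simplification of my own, and the parameter values (b, I, B, K) are arbitrary illustrations, not fitted values from the text:

```python
import math

def strength(strengthening_times, now, b=0.5):
    """Eq. (5.2): S = sum_i t_i**(-b), where t_i is the time
    elapsed since the i-th strengthening."""
    return sum((now - t) ** (-b) for t in strengthening_times if now > t)

def retrieval_time(A, I=0.4, B=1.0):
    """Eq. (5.3): T(A) = I + B/A."""
    return I + B / A

def retrieval_prob(A, K=1.0, B=1.0):
    """Eq. (5.4): R(A) = 1 - exp(-K*A/B)."""
    return 1 - math.exp(-K * A / B)

# An item strengthened twice retains more strength than one strengthened
# once, so (on this simplified reading) it is retrieved faster and more often.
S1 = strength([0.0], now=10.0)
S2 = strength([0.0, 5.0], now=10.0)
print(retrieval_time(S2) < retrieval_time(S1),
      retrieval_prob(S2) > retrieval_prob(S1))
```

The sketch also makes the text's contrast visible: the power-law terms decay slowly, so even the strengthening from time 0 still contributes appreciably at time 10.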
THEMATIC JUDGMENTS

The experiments of Reder and Anderson (1980) and Smith, Adams, and Schorr (1978) are examples of situations in which subjects can use this process of connectedness judgment to verify statements. These researchers had subjects study a set of facts about a person that all related to a single theme, such as running. So a subject might study:

Marty preferred to run on the inside lane.
Marty did sprints to improve his speed.
Marty bought a new pair of Adidas.

The subjects then had to recognize these facts about the individual and reject facts from different themes (for example, Marty cheered the trapeze artist). The number of such facts studied about a person was varied. Based on the research on the fan effect (see Chapter 3) one might expect recognition time to increase with the number of facts studied about the individual. However, this material is thematically integrated, unlike the material in the typical fan experiment. In these experiments, recognition time did not depend on how many facts a subject studied about the individual. Reder and Anderson postulated, on the basis of data from Reder (1979), that subjects were actually judging whether a probe fact came from the theme and not carefully inspecting memory to see if they had studied that fact about the individual. To test this idea, we examined what happened when the foils involved predicates consistent with the theme of the facts that had been studied about the probed individual. So if the subject had studied Marty preferred to run on the inside lane, a foil might be Marty ran five miles every day (the subject would have studied ran five miles every day about someone other than Marty). In this situation subjects took much longer to make their verifications, and the fan effect reemerged. We proposed that subjects set up a representation like that in Figure 5.1, in which the thematically related predicates are already associated with a theme node like running.
A subnode has been created to connect the traces of that subset of the facts that were studied about the person. This subnode is associated with the theme and with the individual theme predicates. The subnode is basically a token of the theme node, which is the type. The figure shows two such theme nodes and two subnodes. Reder and Anderson proposed that in the presence of
[Figure 5.1: a network in which Marty is linked to Subnode 1 (train-trip theme: check schedule, arrive at station, buy ticket, wait for train, hear conductor, arrive at Grand Central) and to Subnode 2 (running theme: warmed up by jogging, preferred inside lane, ran five miles, did sprints, wanted to make team, bought new Adidas), with a trace node for each studied fact.]

Figure 5.1 Network representation for the Reder and Anderson experiments. Facts about Marty are organized into two subnodes according to theme.
179
should dissipate activation, and this effect of theme fan should be observed whether the foils are related or not. These predictions have been confirmed by Reder and Anderson (1980) and Reder and Ross (1983). Another prediction is that the number of facts learned about theme A should have no effect on the time to verify a fact from theme B (assuming at least one fact is studied about theme A). This is because the fan out of the subnode for theme A has no effect on the activation of theme B nor on the activation of any of the facts attached to that theme. Reder and Anderson and Reder and Ross verified this prediction that fan of the irrelevant theme has no effect on time to recognize a fact from the target theme.

Calculation of activation patterns. Activation patterns were calculated for a representation like that in Figure 5.1 by solving a set of simultaneous linear equations of the form of Eq. (3.7). In doing so, the amount of source activation from the predicate (which for simplicity is represented by a single node) was set at 10 and the activation from the person at 1, since more activation would come from the multiple familiar concepts of the predicate. All the links in Figure 5.1 were considered to be of equal relative strength. The conditions of the Anderson and Reder experiment can be classified according to the number of facts learned about the tested theme (one or three) and the number of facts about the untested theme (zero, one, or three). (The case of zero facts means there is not a second theme.) The pattern of activation was calculated in each of the six situations defined by the crossing of these two factors. For each of these conditions Table 5.1 reports the level of activation of the node corresponding to the trace in the presence of a target and also the level of activation of the subnode in the presence of a target, a related foil, and an unrelated foil.
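Since Eq. (3.7) itself is not reproduced in this excerpt, the calculation can only be sketched generically: assume the steady-state activations satisfy a = c + W·a, with c the vector of source activations (10 for the predicate, 1 for the person) and W a matrix of relative link strengths, and iterate to the fixed point. The four-node layout and the weights below are invented for illustration; they are not the Figure 5.1 network:

```python
# Toy nodes: 0 = predicate, 1 = person, 2 = subnode, 3 = trace.
c = [10.0, 1.0, 0.0, 0.0]          # source activations, as in the text
W = [                               # invented relative link strengths
    [0.0, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.3, 0.0],
    [0.3, 0.3, 0.0, 0.3],
    [0.0, 0.0, 0.3, 0.0],
]

a = [0.0] * 4
for _ in range(200):                # iterate a <- c + W a to the fixed point
    a = [c[i] + sum(W[i][j] * a[j] for j in range(4)) for i in range(4)]
print([round(x, 2) for x in a])
```

Because the weights are small enough for the iteration to contract, this converges to the same solution a direct linear solve of (I − W)a = c would give; here the subnode's activation settles at 3.3/0.73, about 4.52.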
Note that the activation of the target trace decreases with the number of facts in the same theme as the probe, decreases when there is a second theme, but shows little variation between one and three facts in the nontested theme. This is precisely the fan effect reported by Reder and Anderson when related foils were used. Thus it appears, as hypothesized for the related-foil condition, that activation of the trace controls judgment time. It was hypothesized that when the foils were unrelated, the level of activation of the subnode and not that of the trace would control judgment time. The subnode actually shows a reverse fan effect (greater activation for three facts) when a tar-
Memory for Facts
The Architecture of Cognition
Table 5.1  Level of activation of various nodes in Figure 5.1 under various conditions

                                 Number of facts     1 fact about     3 facts about
Node and situation               about other theme   tested theme     tested theme

Target trace in presence         0                   7.78             6.83
of target                        1                   7.26             6.57
                                 3                   7.24             6.56

Subnode in presence              0                   7.44             7.49
of target                        1                   5.59             6.29
                                 3                   5.50             6.22

Subnode in presence              0                   5.30             5.94
of related foil                  1                   4.07             4.88
                                 3                   4.00             4.81

Subnode in presence              0                   1.40             1.42
of unrelated foil                1                    .70              .75
                                 3                    .66              .71
get is presented as a probe. This is because there are more paths converging on the subnode in the case of three facts. Although these additional paths may not be direct routes from sources of activation to the subnode, they are indirect routes. Thus, presented with Marty arrived at the station, activation can spread from arrive at station to train theme to hear conductor to subnode 1. Also note that the subnode has a high level of activation in the presence of a related foil. Although this level is not as high as for a target, it is sufficiently high to result in errors to such foils if the subject responds positively to any subnode on a highly active path. If the time to identify the relevant subnode is affected just by its level of activation, there should be a reverse fan effect, as Table 5.1 indicates. However, Reder and Anderson (1980) and Smith and colleagues both report no fan effects in the presence of unrelated foils. Reder (1982) and Reder and Ross (1983)
speculated that the subjects adopt a mixed strategy in this case, sometimes responding to subnode activations and sometimes to the trace. The direct and reverse fan effects would then tend to cancel each other out. Consistent with this hypothesis, Reder and Ross showed that when subjects are explicitly instructed to respond on a thematic basis (that is, accept both studied sentences and unstudied but related sentences), they do show a reverse fan effect. Reder and Ross also found subjects slower to accept unstudied related sentences than studied sentences in their thematic judgment conditions. This is to be expected from Table 5.1 because in the presence of related foils (which are the same as Reder and Ross's related, nonstudied targets) there is a lower level of activation of the subnode than in the presence of targets.

Refocusing on Subnodes
In the previous section the activation patterns were calculated on the assumption that the subject spreads activation from the person node, such as Marty in Figure 5.1. In these calculations the activation from Marty is broken up twice before getting to the target predicate, once between the two subnodes and then among the facts attached to the subnode. This implies that the activation level of the traces should be no different if six facts are attached to one subnode than if six facts are divided between two subnodes. In both cases, one-sixth of the activation reaches the subnode. In fact, however, there is evidence (McCloskey and Bigler, 1980; unpublished research in my lab) that subjects are faster in the two-subnode condition. These and other results (Anderson, 1976; Anderson and Paulson, 1978) lead to the second aspect of the subnode model, the refocusing process. Even in cases where subjects must retrieve the specific fact, they can first identify the relevant subnode and then focus activation on it. This is a two-stage process: first the subnode is selected, then activation spreading from the subnode enables the target fact to be identified. This subnode-plus-refocusing model explains the low estimate of the strength of prior associations that we have gotten in some previous experiments (Lewis and Anderson, 1976; Anderson, 1981). As suggested in Anderson (1976), subjects may create an experimental subnode and use contextual associations to focus on it, which would largely protect them from the interference of prior associations. This model offers an explanation for why people are faster at retrieving information about familiar concepts. Presumably such concepts have a well-developed and
perhaps hierarchical subnode structure that can be used to focus the retrieval process on a relatively small subset of the facts known about that concept.
Practice

People get better at remembering facts by practicing them, and it should come as no surprise that ACT predicts this. However, the serious issue is whether the ACT theory can predict the exact shape of the improvement function and how this varies with factors such as fan.

Accumulation of Strength with Practice

ACT makes interesting predictions about the cumulative effects of extensive practice at wide intervals such as twenty-four hours. The reason for looking at such wide spacings is to avoid complications due to the diminished effects of presentations when they are massed together. In a number of studies I have done on this topic, we have given the subject multiple repetitions of an item on each day and repeated this for many days. In the analyses to follow, the cumulative impact of the multiple repetitions in a day will be scaled as one unit of strength. Assume the material has been studied for a number of sessions, each one day apart. The total strength of a trace after the pth day and just before day p + 1 will be (by Eq. 5.2):

    S = Σ_{i=1}^{p} s·i^(-b)                                    (5.5)

It can be shown (Anderson, 1982) that this sum is closely approximated as:

    S = D·P^c - a                                               (5.6)

where c = 1 - b, D = s/(1 - b), and a = b·s/(1 - b). Thus strength approximately increases as a power function of practice (that is, P days). Note that not only will the unit nodes in these traces accrue strength with days of practice, but also the element nodes will accrue strength. As will be seen, this power function prediction corresponds to the data about practice.

Effects of Extensive Practice

A set of experiments was conducted to test the prediction that strength increases as a power function with extensive practice. In one experiment, after subjects studied subject-verb-object sentences of the form The lawyer hated the doctor, they had to discriminate these sentences from foil sentences consisting of the same words as the target sentences but in new combinations. There were twenty-five days of tests and hence practice. Each day subjects in one group were tested on each sentence twelve times, and in the other group, twenty-four times. There was no difference in the results for these two groups, which is consistent with earlier remarks about the ineffectiveness of massing of practice, so the two groups will be treated as one in the analysis. There were two types of sentences: no-fan sentences, consisting of words that had appeared in only one sentence, and fan sentences, with words that had appeared in two sentences. Figure 5.2 shows the change in reaction time with practice. The functions that are fit to the data in the figure are of the form T = I + B·P^(-c), where I is an intercept not affected by strengthening, I + B is the time after one day's practice, P is the amount of practice (measured in days), and the exponent c is the rate of improvement. It turns out that these data can be fit assuming different values of B for the fan and no-fan sentences and keeping I and c constant. The equations are:

    T = .36 + .77(P - 1/2)^(-.36)     for no fan                (5.7)

    T = .36 + 1.15(P - 1/2)^(-.36)    for fan                   (5.8)

Figure 5.2 Recognition times for fan and no-fan sentences as a function of practice. The solid lines represent the predictions of the model described in the text.

The value P - 1/2 is the average practice on day P. Note that one implication of Figure 5.2 and of these equations is that a practiced fan fact can be faster than a less practiced no-fan fact. These equations imply that the fan effect diminishes with practice, but also that the fan effect never disappears. After P days the fan effect is .38(P - 1/2)^(-.36) according to these equations. Hayes-Roth (1977) reported data on practice from which she concluded that the fan effect disappeared after ten days and one hundred practice trials. However, this is not what these equations imply, and Figure 5.2 shows that there still is a fan effect after twenty-five days and six hundred trials. Perhaps the Hayes-Roth conclusion was a case of erroneously accepting the null hypothesis.

Equation (5.6) showed that strength increases as a power function of practice. As will now be shown, this implies that reaction time should decrease as a power function. Recall that the amount of activation sent to a trace from a concept is a product of the activation emitted from the concept and the relative strength of the trace, and that the activation emitted by a concept is a function of its strength. If R is the prior strength of a concept, then its strength after P days of practice will be S' + D·P^c, where S' = R - a from Eq. (5.6). This assumes that the prior strength of the concept is stable over the experiment. The relative strength of one of n experimental facts attached to the concept will be 1/n, if we assume that subjects can completely filter out by a subnode structure any interference from preexperimental associations. This implies that the activation converging on the trace will be 3(S' + D·P^c)/n, with the 3 reflecting the fact that activation is converging from three concepts (subject, verb, object). According to the earlier retrieval assumption, Eq. (5.3), recognition time will be a function of the inverse of this quantity:

    RT(P) = I(P) + n·B / [3(S' + D·P^c)]                        (5.9)

which can be rewritten as

    RT(P) = I(P) + n·B'·P^(-c) / (S'/(D·P^c) + 1)               (5.10)

where B' = B/3D and I(P) is the intercept after P days of practice. To the extent that S', the prior strength of the concept, is small relative to the impact of the massive experimental practice, this retrieval term becomes n·B'·P^(-c). Some fraction of the intercept, I(P), is speeding up as well, reflecting the strengthening of general procedures:

    I(P) = I1 + I2·P^(-c)                                       (5.11)

where I2 reflects that part of the improvement due to general practice. It includes strengthening of the productions used in the task. It is assumed that the general speed-up is at the same rate (parameter c) as retrieval of the specific fact. So we can rewrite Eq. (5.10) as

    RT(P) = I1 + (I2 + n·B')·P^(-c)                             (5.12)

Derivation of Eq. (5.12) required two approximating assumptions. The first is that prior strength S' is zero and the second is that the general speed-up is at the same rate as the speed-up in retrieval of a specific fact. However, Eq. (5.12) will yield a good fit to data in many cases where these assumptions are not true. It is a more general form of Eqs. (5.7) and (5.8) that were fit to Figure 5.2. Equations (5.7) and (5.8) can be derived from Eq. (5.12) by setting I1 = .36, I2 = .39, B' = .38, and c = .36.

Interaction between Practice and Prior Familiarity
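As a numerical sanity check on these claims, the sketch below (my own illustration, with arbitrary values of s and b for the strength example) compares the exact sum of Eq. (5.5) against the power-function approximation of Eq. (5.6), and evaluates the fitted curves of Eqs. (5.7) and (5.8) to show the fan effect shrinking with practice without ever disappearing:

```python
# Check that S = sum_{i=1}^{P} s * i**(-b) is well approximated by
# D*P**c - a, with c = 1 - b, D = s/(1 - b), a = b*s/(1 - b).
# The values s = 1.0 and b = 0.5 are arbitrary illustrations.
s, b, P = 1.0, 0.5, 25
exact = sum(s * i ** (-b) for i in range(1, P + 1))
c, D, a = 1 - b, s / (1 - b), b * s / (1 - b)
approx = D * P ** c - a
print(exact, approx)  # roughly 8.64 versus 9.0

# Fitted practice curves of Eqs. (5.7) and (5.8), and the fan effect.
no_fan = lambda day: .36 + .77 * (day - .5) ** -.36
fan = lambda day: .36 + 1.15 * (day - .5) ** -.36
for day in (1, 10, 25):
    print(day, round(fan(day) - no_fan(day), 3))  # shrinks, stays > 0
```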
One basic consequence of this increase in concept strength is that subjects can remember more facts about frequently used concepts and can more rapidly retrieve facts of similar relative strength about more frequently used concepts. Anderson (1976) reported that subjects can retrieve facts about familiar people (Ted Kennedy is a senator) more rapidly than facts about less familiar people (Birch Bayh is a senator; this experiment was done when he was still a senator). Anderson (1981b) noted that there are serious issues about whether pairs of facts like these are equated with each other in terms of other properties. In that research report, I had subjects learn new facts about familiar or unfamiliar people and tried to control such things as degree of learning for these new facts. Still subjects were at an advantage both in learning and in retrieving new facts about the more familiar person. We recently performed an experiment in which we compared time to verify sentences studied in the experiment like Ted Kennedy is in New York with time to verify other studied sentences like Bill Jones is in New Troy. Subjects were initially more rapid at verifying the experimental facts about the familiar concepts, consistent with Anderson (1981). However, we also looked at the effects of fan and practice on these verification times. Figure
5.3 shows what happened to the effects of fan and familiarity over nine days of practice. As can be seen, the fan effects were largely maintained over the period, diminishing only from .30 sec to .20 sec, while the familiarity effects decreased from .30 sec to .12 sec. This is what would be predicted on the basis of Eq. (5.9). As practice P increases, the effect of prior strength S' diminishes. The functions fit to the data in Figure 5.3 are of the form I + B/(S + P^c), where I is the asymptote, B is the retrieval time parameter, S is prior strength and strength accumulated in original learning, P is the independent variable (number of days),
and c is the exponent controlling growth of strength. The quantity P^c reflects strength after P days. The value of I for all four conditions was estimated as .36 sec. Since the parameter B reflects fan (that is, B = I2 + nB' from Eq. [5.12]), separate values of B were estimated for the no-fan (1.4? sec) and fan conditions (1.84 sec). Since S reflects prior strength, separate values of S were estimated for the familiar material (.88) and the unfamiliar material (.39). Finally, a single parameter for c, .31, was estimated for all four conditions.

On day 10, subjects were asked to learn some new facts of a different form (Bill Jones hated the doctor) about the old people (those studied in the experiment) and about some new people not yet studied. Some of the new people were familiar famous names and others were unfamiliar. After learning these new facts, the subjects went through one session of verification for these. There was no difference in the time (.96 sec) they took to recognize the new facts about old familiar or new familiar people. They took longer to recognize new facts about the old unfamiliar people (1.00 sec), so the practice did not completely eliminate the differences between familiar and unfamiliar. However, they took longest to recognize facts about the new unfamiliar people (1.06 sec), so the practice increased the capacity of the unfamiliar nodes.
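To make the diminishing familiarity effect concrete, the fitted function I + B/(S + P^c) can be evaluated with the values quoted for the fan conditions (I = .36, B = 1.84, S = .88 for familiar and .39 for unfamiliar material, c = .31). The evaluation itself is my own illustration, not an analysis from the text:

```python
# RT = I + B/(S + P**c) with the fan-condition parameter values
# quoted in the text: I = .36, B = 1.84, c = .31,
# S = .88 (familiar material) or .39 (unfamiliar material).
rt = lambda S, day: .36 + 1.84 / (S + day ** .31)
effect = [round(rt(.39, day) - rt(.88, day), 3) for day in (1, 5, 9)]
print(effect)  # familiarity effect shrinks with practice but stays positive
```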
Figure 5.3 Recognition of fan and no-fan sentences about familiar and unfamiliar concepts as a function of practice.

Other Results
The hypothesis that more frequent nodes have greater activation capacity is consistent with a number of results in the literature. For instance, Keenan and Baillet (1980) found that subjects are better able to remember material about more familiar people and locations. They used an incidental-learning paradigm in which subjects were asked an encoding question about these concepts (Is your best friend kind? versus Is your teacher kind?, where the second is about a less familiar person). Later, subjects were asked whether they had studied pairs like teacher-kind. Subjects were better able to remember such pairs when the people were more familiar. Keenan and Baillet also found that subjects answered the encoding questions more rapidly about familiar items, which is also what would be expected if familiarity affected level of activation. A similar concept is needed to understand the marvelous performance of the subject SF, studied by Chase and Ericsson (1981), who was able to increase his memory span to over eighty digits after hundreds of hours of practice. He was able to commit these numbers to long-term memory with a rate of
presentation typical of memory-span experiments and to reliably retrieve these items from long-term memory. Some of this performance depended on developing mnemonic encoding techniques, such as encoding three- and four-digit numbers into running times. He might encode 3492 as 3 minutes and 49.2 seconds, a near-record mile time. This allowed him to get his memory span up to more than twenty digits (for example, 7 running times × 4 digits = 28 digits). Presumably, SF's ability to remember these times reflects the fact that running times were more frequently used concepts for him than strings of four digits. He did not in fact have a prior concept of 3:49.2. However, the various elements that went into this concept were sufficiently familiar that they could be reliably retrieved. Note that this mnemonic technique by itself left SF far short of the eighty-digit level. To achieve this, he developed an abstract retrieval structure for organizing the running times in a hierarchy. He stored these running times in long-term memory and then was able to reliably retrieve them back into working memory. Figure 5.4 illustrates the hierarchical structure proposed by Chase and Ericsson for SF's performance on an eighty-digit list. The numbers at the bottom refer to the size of the chunks he encoded mnemonically. If we ignore the last five numbers, which are in a rehearsal buffer, SF has a node-link structure that organizes twenty-one chunks in long-term memory. He had to practice with a particular hierarchy for a considerable time (months) before he became proficient with it. This practice was necessary to increase the capacity of the individual
nodes in this hierarchy so that they could support accurate and rapid retrieval. This indicates that practice can increase the capacity of completely abstract nodes, such as the ones in this structure, as well as the capacity of more concrete nodes.

Figure 5.4 Representation of SF's hierarchical structure for an eighty-digit list. (From Chase and Ericsson, 1981.)

More generally, an important component of many mnemonic techniques is converting the task of associating unfamiliar, weak elements to the task of associating familiar, strong elements. For instance, the critical step in the key-word method (Atkinson and Raugh, 1975) for foreign language acquisition is to convert the unfamiliar foreign word into a familiar word. When the foreign word becomes familiar through practice, the longer conversion step is no longer required for accurate recall of meaning.

Recognition versus Recall

The difference between recognition and recall is straightforward under the ACT analysis. In a recognition paradigm, parts of a trace are presented and the subject is asked whether they were studied. In a recall paradigm, the subject is also asked to retrieve other components of the trace. In ACT, activation converges on the trace from all presented components. If there is sufficient activation, the trace becomes available. In a recognition test the subject simply announces that there is a trace. In a recall test the subject must also generate parts of the trace. More of the trace is usually presented in a recognition paradigm. For instance, in paired-associate recognition both stimulus and response are typically presented, but in paired-associate recall only the stimulus is presented. However, it is possible to test paired-associate recognition by simply presenting a stimulus. One would expect a high conditional probability between success at recognizing the stimulus and success at recalling the response, and indeed there is (Martin, 1957).

Paired-Associate Recognition

In one interesting analysis of recognition versus recall, Wolford (1971) tried to relate recognition of a paired associate to the probability that the stimulus would lead to recall of the response and to the probability that the response would lead to recall of the stimulus. He showed that, correcting for guessing, recognition of a paired associate could be predicted by the probabilities of forward and backward recall. His model was basically that a paired associate could be recognized if the subject could retrieve the response from the stimulus or the stimulus
from the response. Let Pf and Pb be these two probabilities of recall. On the assumption that the two directions are independent, Wolford derived the following equation for corrected recognition Pr:

    Pr = Pf + (1 - Pf)Pb                                        (5.13)
Under the ACT* theory the subject is viewed not as performing two independent retrievals in recognition but rather as converging activation from the two sources. This is an important way in which ACT* differs from the earlier ACTs (Anderson, 1976) and their predecessor HAM (Anderson and Bower, 1973). Nonetheless, ACT* can predict the relationship documented by Wolford. Let As denote the amount of activation that comes from the stimulus and Ar the amount from the response. The probability of retrieving the memory trace will be a function of the activation from the stimulus in the case of forward recall, of the activation from the response in the case of backward recall, and of the sum of these two activations in the case of recognition. The following equations specify probability of forward recall, backward recall, and recognition:

    Pf = 1 - e^(-K·As)                                          (5.14)

    Pb = 1 - e^(-K·Ar)                                          (5.15)

    Pr = 1 - e^(-K(As + Ar))                                    (5.16a)
       = 1 - (e^(-K·As))(e^(-K·Ar))                             (5.16b)
       = 1 - (1 - Pf)(1 - Pb)                                   (5.16c)
       = Pf + (1 - Pf)Pb                                        (5.16d)

Thus, even though there are not two independent retrievals, ACT can predict that the probability of recognition is the probabilistic sum of forward and backward recall as found by Wolford. The above analysis assumes that the probability of forming the trace is 1 and that all failures of recall derive from failures of retrieval. If there is some probability of failing to encode the trace, the ACT analysis predicts that probabilities of forward and backward recall would overpredict probability of recognition because forward and backward recall would no longer be independent. While Wolford found no evidence at all for nonindependence, other researchers (for instance, Wollen, Allison, and Lowry, 1969) have found evidence for a weak nonindependence.

Word Recognition

Another major domain where recall and recognition have been contrasted is memory for single words. A subject is given a list of single words and then must either recall (typically, by free recall) or recognize them. Recognition performance can be much higher than recall performance in such experiments. According to the framework set forth in Anderson (1972) and Anderson and Bower (1972, 1974), it is assumed that the subject forms traces linking the words to various contextual elements. Although the contextual elements are undoubtedly more complex, they are often represented as a single context node. This simplified situation is illustrated in Figure 5.5, where each line corresponds to a trace.

Figure 5.5 Network representation of the word-context associations for a single list.

With this recognition model, recognition involves retrieving the context from the word and verifying that it is indeed a list context. Direct recall involves retrieving list words from the list context. However, because of the high fan from the context, the subject will have limited success at this. Thus the subject will use an auxiliary process, using various strategies to generate words that can then be tested for recognition. This has been called the generate-test model of recall. For a review of relevant positive evidence see Anderson and Bower (1972) or Kintsch (1970). The major challenge to this analysis has come from various experiments showing contextual effects. These results will be reviewed in the next sections.

The basic assumptions of the generate-test model are consistent with the current framework. The current framework makes it clear that recognition is better than recall because the fan from the word nodes is smaller than that from the context nodes. If the same word appears in multiple contexts the fan from the word node would be large, and this would hurt recognition. Anderson and Bower (1972, 1974) present evidence that recognition performance degrades if a word appears in multiple contexts.
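The step from converging activation to Wolford's probabilistic sum is a simple exponential identity, which can be checked numerically. The values of K and the activations below are arbitrary choices for illustration:

```python
import math

# Pf = 1 - exp(-K*As), Pb = 1 - exp(-K*Ar), Pr = 1 - exp(-K*(As + Ar)).
K, As, Ar = 0.8, 1.5, 0.9
Pf = 1 - math.exp(-K * As)
Pb = 1 - math.exp(-K * Ar)
Pr = 1 - math.exp(-K * (As + Ar))

# Summing the activations reproduces the probabilistic sum of the
# two recall probabilities, even though no independent retrievals occur.
print(abs(Pr - (Pf + (1 - Pf) * Pb)) < 1e-12)  # True
```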
Figure 5.6 A representation of the relevant network structure in the encoding-specificity experiments. The numbers associated with the nodes are the strengths assumed in the spreading-activation analysis. This is the memory representation when the subject has studied train-black.
Effects of Encoding Context
A large number of studies have displayed an effect of encoding context on both recognition and recall (for example, Flexser and Tulving, 1978; Tulving and Thomson, 1971; Watkins and Tulving, 1975). These experiments are said to illustrate the encoding specificity principle, that memory for an item is specific to the context in which it was studied. The experiment by Tulving and Thomson (1971) is a useful one to consider. Subjects studied items (for example, black) either in isolation, in the presence of a strongly associated encoding cue (white, say), or in the presence of a weakly associated encoding cue (train). The strong and weak cues were selected from association norms. Orthogonal to this variable, subjects were tested for recognition of the word in one of these three contexts. Recognition was best when the study context matched the test context. We have explained this result in terms of selection of word senses (Reder, Anderson, and Bjork, 1974; Anderson and Bower, 1974) or in terms of elaborative encodings (Anderson, 1976). These explanations still hold in the current ACT* framework. Figure 5.6 illustrates the network structure that is assumed in this explanation for the case of study with a weak encoding cue (that is, the subject has studied the pair train-black). Black, the to-be-recalled word, has multiple senses. In this case
it is illustrated as having two senses, black1, to which is attached the weak associate train, and black2, to which is attached the strong associate white. The oval nodes in Figure 5.6 are the traces encoding these associations; the nodes leading to others1 and others2 represent other unidentified associations. Similarly, the nodes at the bottom attached to train and white represent other unidentified associations. For simplicity, only the multiple senses of black are represented.

At first, people often have the intuition that there is only one sense for a word like black. However, there are a number of distinct if similar senses. In the presence of white, one is likely to think of black as referring to a prototypical color or a race of people. In the presence of train, one is likely to associate it with the glistening color of a polished toy train. The encoding context determines the sense of the word chosen, and a trace is formed involving that sense and, perhaps, the encoding context. When the subject is tested, context will again determine the sense chosen, and activation will spread from the chosen sense. The probability of recognition will be greater when the same sense is chosen, because activation will be spreading from a node directly attached to the trace.

It should be noted that a sense for the word black can also be chosen by means of spreading activation. That is, the sense chosen in the context of train-black is the one that receives the greatest activation from train and black. In Figure 5.6 this will be black1, which lies at the intersection of train and black. Thus,
there are two "waves" of activation. The first determines the senses of the words, and the second spreads activation from the chosen senses to retrieve the trace. This same double activation process is used in selecting a subnode.
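The two-wave idea can be sketched as a toy computation in which the first wave selects the sense receiving the most activation from the presented words, and the second wave would then spread from that sense. The association strengths and sense names below are invented for illustration; they are not the values of Figure 5.6:

```python
# Invented association strengths from cue words to the two senses of "black".
ASSOC = {
    "train": {"black1": 0.7, "black2": 0.1},
    "white": {"black1": 0.1, "black2": 0.9},
    "black": {"black1": 0.5, "black2": 0.5},
}

def choose_sense(context_words):
    """First wave: the sense receiving the most total activation wins."""
    senses = {"black1": 0.0, "black2": 0.0}
    for word in context_words:
        for sense, w in ASSOC.get(word, {}).items():
            senses[sense] += w
    return max(senses, key=senses.get)

# Studied as train-black: black1 is encoded in the trace.  Retesting in the
# same context re-selects black1, so the second wave spreads from a node
# directly attached to the trace; a white-black test selects against it.
print(choose_sense(["train", "black"]))  # black1
print(choose_sense(["white", "black"]))  # black2
```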
FoR MULTIPLE SnNss NoPrs
Although one can explain the encoding-specificity result by assuming multiple senses as in Figure 5.6, it is not necessary to assume multiple senses to explain why recognition is higher if the study context is presented at test. To see this, note in Figure 5.6 that the study context, train, is associated with the trace. This means that more activation will converge at the trace at test if.train is presented again, independent of any sense selection.a However, a number of additional results indicate the need for the multiple-sense-node explanation. One is that recognition is worse in a test context that promotes selection of the wrong sense than in a neutral test context that simply fails to present any encoding word (Reder, Anderson, and Bjork, 1974; Tulving and Thomson, 1973; Watkins and Tulving, 1975). For instance, after studying train-black, subjects are worse at recognizing black in the context white-black than when black is presented alone. Thus, it is not just the absenceof train that is hurting recognition, it is the presence of another context that actively selects against the original interpretation. The multiple-sense representation is important for under' standing the results of Light and Carter-Sobell (1970). They had subjects study a pair like raspberry-iam, with iam as the target word. At test, subiects were presented with raspberry-jam or strawberry-iam, which tapped the same sense of that word, or log-iam, which tapped a different sense. The identical pair produced the best recognition, and log-jam, which tapped a different sense, produced the worst. Figure 5.7 shows a schematic of the memory representation for their experiment. Simultaneous
\,a
LOG
TRAFFIC
RRY STRAWBE
'\
IrJ
3
j
a
L (l'
z ox
a u) (u ,tr F
lrJ td
q) lt q)
s
q) (r) q)
(? o
=
s
o
lrj
k
z
I z
9
= = o ()
J lrj E,
f
f cl
F
(t L q) t c) OO oo
E
o F
(nn/H
s .S Tq)
> - .E H ./'-( FH z I F J
lrJ E
\3
s s q) L
\-
:3 (J
i' L{ GGI la(J
o
Ee, o z oo'6
UJ
.SX Ha,
td
..-'ii o O.
:) (n
sc
G(d
Soo
(t)
Compilation and Transformation
While the production set in Table 7.1 is relatively elegant in that it reduces the knowledge to a minimal set of general rules, the rules underlying language generation probably are often more specific. For instance, it is reasonable to propose that composition forms rules like:

IF the goal is to describe LVobject
   and LVobject is known to the listener
   and LVobject has LVproperty that is to be described
THEN set as subgoals
   1. To generate the
   2. To generate the name for LVproperty
   3. To generate the name for LVobject.
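The flavor of such goal-decomposition rules can be sketched as a toy production system. Everything here, including the representation of goals, the single production, and the control loop, is my own illustrative invention rather than ACT's implementation:

```python
# Toy goal-decomposition production system in the spirit of the text.
LEXICON = {"rich": "rich", "doctor": "doctor"}  # invented lexicon

def describe_object(goal):
    """IF the goal is to describe an object with a known property,
    THEN set subgoals: say "the", say the property, say the object."""
    if goal[0] == "describe":
        _, obj, prop = goal
        return [("say", "the"), ("say", LEXICON[prop]), ("say", LEXICON[obj])]
    return None

def generate(top_goal, productions):
    stack, words = [top_goal], []
    while stack:
        goal = stack.pop(0)
        if goal[0] == "say":              # primitive goal: emit a word
            words.append(goal[1])
            continue
        for p in productions:             # find a matching production
            subgoals = p(goal)
            if subgoals is not None:
                stack = subgoals + stack  # decompose goal into subgoals
                break
    return " ".join(words)

print(generate(("describe", "doctor", "rich"), [describe_object]))
```

Composition in ACT would correspond to collapsing several such productions that habitually fire in sequence into one production with the combined subgoal list.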
This pruduciitln woulcl g,eneratethe dccl"rrativcsctttettcc,Ir.lllsform it to question form, then say the transformed sentence' Thus ACT'; planning and transformation Processes(see Chapter 4) underlie the transformational component of English'{ The compilation process can apPly to this goal structure too, and it couli create ihu follo*ing production, which compiles out the planning: IF the goal is to question whether the proposition (I-Vrelation LVagent LVobiect)is true and LVrelationdescribesongoing LVaction and LVaction is currently happening THEN set as subgoals 1. To generateis 2. To describeLVagent 3. To generatethe name for LVrelation 4. To generateing 5. To describeLVobiect.
Predicole
Figure 7.2 Themeaningstructureunderlyingthegeneration in Figure7.1.
Thjs one production would be able to generate the phrase The rich doctor, which required three productions in Figure 2.1. This production speeds up generation considerably u.? gr""tty reduces the demand on working memory for retrieving"infoimation about goal structure. The original productions th"atgave rise to cornpiled productions like thii would still be arounJurrt would only apply in unusual circumstances where none of the compiled productions would. These hierarch.icalgoal structures permit the computation of transformations.s For instance, consider the question transforin English that converts The tawyer is iuying the car into 1a.tio1 the lawyer buying the car? The foilowing producti"on generates \ this transformation, starting with u q.r"ti"i meaning structure. IF the goal is to question whether the proposition (LVrelation LVagentLVobject)is true THEN set as subgoals 1'. to plan the communication (LVrerationLVagent LVobject) 2. to move the first word in the description of LVrelation to the beginning 3. to executethe plan.
In this case, information controlling generation of the auxiliary verb is has migrated to the main generation production. The previous prodiction planned a sentence, transformed it, and it; the iurrent production directly generates the if,"tr "*".rrt"d question. Such a compiled production would lead to more automatic and less resource-demanding sentence generation' OF LANGUAGEGNNNNNTTON SrCNrrrCINr PNOPERTIES This example illustrates some of the properties of language generatior, ui developed in the ACT* framework. Language to other cognitive activities, feneration is similar in character one. Sentences problem-solving a and its structure is basically until achievsuccessively goals are generated by decomposing structure hierarchical the produces This able-goals are reached. plans generation to apPly can iust of lariguage. Transformations slow, associate to tend We plans. problem-solving as to "tn"t conscious, ind effortful processing with problem solving and automatic processing with language generation. However, is there are many problem-solving situations where behavior editing-Card, text computer expert (for eiample, automatic Moran, and Newell, 1980) and situations where language generation is effortful, as in speaking a foreign language. The issue of conscious effort versus automaticity is one of practice, not one of problem solving versus language It is useful to consider how the ACT theory handles the insertion of lexical items into the sentence. Insertion of function
The Architecture of Cognition
words (such as the) and inflections (such as -ed) is specified, like word order, by templates in the action sides of productions for generation. Content words are retrieved from long-term memory through their meaning and inserted into the variable slots of these templates.5 This implies that the semantic features controlling the insertion of a morpheme like the are implicit, hidden in the productions that generate the. Thus the speaker will not be able to tell us what the means. On the other hand, the speaker should have conscious and declarative access to the meaning of a content term like doctor.6 The separation between function words and content words is consistent with the neurological evidence that word order and function words are affected by lesions to Broca's area and that content words are affected by lesions to Wernicke's area (Goodglass and Geschwind, 1976). There are, of course, always other interpretations of lesion results, but this evidence is at least suggestive.

ACT is a considerable advance over our earlier work on language acquisition (LAS; Anderson, 1977) because it treats word order and function words together as parts of language templates. The earlier LAS found it much easier to learn languages that relied heavily on word order and did not deal well with highly inflected languages that relied less on word order. This is not a problem for the current system.

In ACT syntactic knowledge is encoded separately for generation and for comprehension. Productions for the two behaviors may have striking similarities, but they are distinct. This is an unavoidable consequence of the production system architecture. Knowledge is applied efficiently only by compiling specific versions of the knowledge for specific applications. It is possible that at some past time common rules underlay the generation and comprehension of language. These would be declarative rules that would be applied interpretively as discussed in the previous chapter.
For instance, it is not unreasonable to suppose that initial generation and comprehension of a class-taught foreign language refer often to a common set of declarative rules of syntax. However, as discussed in the previous chapter, such interpretive application of knowledge is inefficient. Efficiency comes when productions are compiled that are specific to the intended use of the language. It is unlikely that in first language acquisition the representation of syntax for generation is ever the same as that for comprehension. Some studies (Fraser, Bellugi, and Brown, 1963; Petretic and Tweney, 1977) show that young children have access to a syntactic rule in receptive circumstances, but not in generative
Language Acquisition
circumstances, and vice versa. Moreover, it is not just a case of comprehension being ahead of generation or vice versa. Some studies (Schustack, 1979) have shown that the child has access to some rules in comprehension and different rules in generation.

While generation and comprehension productions are distinct, some declarative knowledge is used in common by the two processes. For instance, the productions modeled in Table 7.1 referred to declarative facts about word-meaning connections. These same facts can be used in comprehension. My young son seems to have no vocabulary items in generation that he does not have in comprehension (excluding nonreferential terms like the, by, and so on). Comprehension may lay the groundwork for generation by building in many of the word-concept links that will be used. There are probably other examples of structures shared by generation and comprehension. Semantic properties such as animateness may be declaratively represented and used by both. Word-class information is another candidate for common knowledge. However, all syntactic knowledge that has production embodiment is specific to use, and this is the major fraction of syntax.

The fact that adults are quite consistent in their syntactic rules in comprehension and generation might seem to contradict ACT*'s separation between comprehension and generation. Even for children, inconsistency is probably the exception. However, this is not inconsistent with ACT, since both abilities must deal with the same language. Indeed, generation and comprehension productions can be generated from the same learning experience. That they agree is only testimony that the acquisition mechanisms are successful in both cases. In addition, it is certainly possible that generation can be used to train comprehension and vice versa. For instance, comprehension productions can be acquired by using generated sentences.
Also, failures of the comprehension productions to process self-generations can be used as evidence that these generations are in error.

The rest of this chapter is concerned with acquisition. There are four major sections in the discussion to follow. First, I will state some general assumptions about language learning; then I will report on two simulations of the acquisition of English.7 The first simulation concerns how acquisition would proceed if it were not burdened by capacity limitations. This work has been reported elsewhere (Anderson, 1981c), but the current
simulation is more advanced. In this effort, the concern is only with reproducing some of the most salient aspects of language-learning phenomena. The second simulation incorporates limitations one might assume for a child between the ages of one and three. Here the goal is to have the simulation produce sentences that correspond to those of the child. Finally, the chapter will discuss more generally ACT's language-acquisition facility, focusing on how it might account for the purported universals in the syntax of natural languages.
Assumptions about Language Learning

MEANING-UTTERANCE PAIRINGS

One basic assumption of this simulation is that the learner has access to pairings of an utterance and its meaning. This assumption underlies most formal analyses of language acquisition (Anderson, 1977; Langley, 1981; MacWhinney, 1980; Pinker and Lebeaux, in press; Selfridge, 1981; Wexler and Culicover, 1980). Such pairings can occur in several ways. The speaker may generate an utterance and the learner may figure out its meaning from the context. Or the learner may generate an utterance and get a correct expansion of that sentence. It is also possible for the learner to have memorized complete strings and to use these as targets against which to compare and correct his generations (MacWhinney, 1980). For instance, if the learner generates goed but recalls having heard went to communicate past tense, he has a basis for correcting his generation.

In the first simulation the program is always given correct pairings of meaning and sentence. This is an idealization in many ways. The learner will encounter many sentences that do not have their meanings identified, and he will have many to-be-expressed meanings without getting feedback as to the correct sentence. However, ACT would not learn on these trials, so the idealization avoids wasted cycles. A more serious problem is that in a realistic situation the learner would receive mispairings of sentence and meaning. The first simulation ignores this difficulty, but the second one will consider how a program can cope with as much as 50 percent mispairings. This is an idealization also in that the learner may be able to determine the meaning only of individual phrases, not whole sentences. However, the same learning mechanisms will work for phrase-meaning correlations, as will be seen in the child-language simulation.

Another assumption of this first learning simulation is that the program knows the meaning of a substantial number of the referential words before learning of syntax begins. There is evidence that children accomplish their initial lexicalization by having individual words paired directly with their referents (MacWhinney, 1980). (Certainly this was very much the case for my son.) This assumption allows us to focus on the learning of syntax; it is not essential to the working of the program. Again, the child-language simulation will consider what happens when the child starts without knowing the meanings of any words. In that simulation most of the initial learning is devoted to inducing word meaning; only later does the program pick up any interesting syntax. A function of the one-word stage in children's language acquisition may be to permit this initial lexicalization.

IDENTIFYING THE PHRASE STRUCTURE

Before ACT can learn from a paired sentence and meaning, it must identify the sentence's hierarchical phrase structure. For a number of reasons, inducing the syntax of language becomes easier once the phrase structure has been identified:

1. Much of syntax is concerned with placing phrase units within other phrase units.
2. Much of the creative capacity for generating natural-language sentences depends on recursion through phrase-structure units.
3. Syntactic contingencies that have to be inferred are often localized to phrase units, bounding the size of the induction problem by the size of the phrase unit.
4. Natural-language transformations are best characterized according to phrase units, as the transformational school has argued.
5. Finally, many of the syntactic contingencies are defined by phrase-unit arrangements. For instance, the verb is inflected to reflect the number of the surface-structure subject.

The ACT theory predicts that natural language will have a phrase structure. Given ACT's use of hierarchical goal structures, it has no choice but to organize sentence generation into phrases. The learning problem for the child is to identify the hierarchical organization and make the hierarchical control of his behavior match that. Seen this way, language acquisition is just a particularly clear instance of acquiring a society-defined skill.
Also, even before it knows the language, ACT* cannot help but impose a hierarchical structure on a long string of words that it hears. That is, long temporal strings can only be encoded hierarchically, as was discussed in Chapter 2. ACT has a set of perceptual principles for chunking a string of words as it comes in. I will specify these principles in a language-specific manner, but it is an open question to what degree these principles are language-specific and to what degree they reflect general principles for perceptually chunking a temporal string. A production system has been created that will actually apply these principles to segment an incoming string. However, to ease the burden of detail in exposition and to get to the point, I will not present this production system but rather describe the principles it implements and their effect.8 One simplification is that the system receives sentences segmented into morphemes, so this work completely ignores the issues of how the speech stream is segmented or how morphemes are discovered. This is also an assumption of other efforts (Langley, 1981; Pinker and Lebeaux, in press; Selfridge, 1981; Wexler and Culicover, 1980), and I do not feel that it distorts the essential character of syntax acquisition. As it turns out, this simplification can create problems that a child does not have. In breaking kicked into kick plus ed, we solved the segmentation problem for the program but created the problem of how to decide that kick and ed are closely related.

The graph-deformation principle. Natural-language sentences usually satisfy the graph-deformation condition (Anderson, 1977), which claims that the hierarchical structure of the sentence preserves the structure of the semantic referent. The graph-deformation condition is illustrated in Figure 7.3. Part (a) is a semantic-network representation for a set of propositions, and part (b) is a sentence that communicates this information.
The network structure in (a) has been deformed in (b) so that it sits above the sentence, but all the node-to-node linkages have been preserved. As can be seen, this captures part of the sentence's surface structure. At the top level is the subject clause (node X in the graph), gave, book, and the recipient (node Y) identified as a unit. The noun phrases for X and Y are segmented into phrases according to the graph structure. For instance, the graph structure identifies that lives and house belong together in a phrase and that big, girl, lives, and house belong together in a larger phrase. Because sentences usually satisfy the graph-deformation condition, one can use the semantic referent of a sentence to infer its surface structure. For instance, the graph deformation in (b)
Figure 7.3 An illustration of the application of the graph-deformation condition. [figure: (a) a semantic network with nodes such as BOY, GIRL, BOOK, HOUSE, SMALL, BIG, and LIVES connected by RELATION, AGENT, OBJECT, CATEGORY, and ATTRIBUTE links; (b) the same network deformed to sit above the sentence it generates]
identifies the location of the terms for which there are meanings in the surface structure of the sentence. However, a term like the before big girl remains ambiguous in its placement. It could either be part of the noun phrase or directly part of the main clause. Thus, some ambiguity about surface structure remains and will have to be resolved on other bases. In LAS the remaining morphemes were inserted by a set of ad hoc heuristics that worked in some cases and completely failed in others. One of the goals in the current enterprise is to come up with a better set of principles for determining phrase boundaries.

Research on acquisition of artificial grammars provides empirical evidence for the use of the graph-deformation condition. Moeser and Bregman (1972, 1973) have shown that possession of a semantic referent is critical to induction of syntax, as would be predicted by the graph-deformation condition. Morgan and Newport (1981) show that the critical feature of the semantic referent is that it provides evidence for the chunking of elements, as predicted by the graph-deformation condition. Anderson (1975) demonstrated that languages with referents that systematically violate the graph-deformation condition are at least as difficult to learn as languages with no semantic referent.
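A minimal sketch of this use of the graph-deformation condition follows, assuming nested-tuple meaning structures and a word-to-concept lexicon (all names here are invented for illustration, not the simulation's code). It brackets words by the meaning subtrees their concepts fall in; morphemes with no meaning association, such as a or the, are simply omitted, since their placement is ambiguous, as the text notes.

```python
# Sketch of the graph-deformation idea: words whose concepts fall in the
# same subtree of the meaning structure are bracketed together. Words with
# no entry in the lexicon (function words) are left out of the bracketing.

def bracket(meaning, words, lexicon):
    # Map each concept to the surface position of the word expressing it.
    pos = {lexicon[w]: i for i, w in enumerate(words) if w in lexicon}

    def build(node):
        if not isinstance(node, tuple):               # leaf concept
            return (pos[node], words[pos[node]]) if node in pos else None
        parts = [p for p in (build(n) for n in node) if p is not None]
        if not parts:
            return None
        parts.sort(key=lambda p: p[0])                # surface order
        if len(parts) == 1:
            return parts[0]
        return (parts[0][0], [p[1] for p in parts])   # one bracketed unit

    built = build(meaning)
    return built[1] if built else []

# e.g. the meaning ('GIVE', ('BOY', 'SMALL'), ('GIRL', 'BIG')) over the
# words "small boy gave big girl" groups ['small', 'boy'] and
# ['big', 'girl'] as phrases around 'gave'.
```

Because the bracketing is read off the meaning subtrees, a sentence that satisfies the condition comes out with its phrases grouped even though the learner knows no syntax yet.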
The graph-deformation condition is violated by certain sentences that have undergone structure-modifying transformations that create discontinuous elements. Examples in English are: The news surprised Fred that Mary was pregnant, and John and Bill borrowed and returned, respectively, the lawnmower. Transformations that create discontinuous elements are more common in languages in which word order is less important than it is in English. However, the graph-deformation condition is a correct characterization of the basic tendency of all languages. The general phenomenon has been frequently commented upon and has been called Behaghel's First Law (see Clark and Clark, 1977).

Sentences that violate the graph-deformation condition cause two problems. First, such a sentence cannot be hierarchically organized by using the semantic referent, as was done in Figure 7.3. Fortunately, ACT has other principles for chunking. Second, it is necessary to learn the transformations underlying these sentences. As will be seen, ACT can learn such transformations, but with difficulty.

Other principles for phrase-structure identification. The existence of nonreferential morphemes means that the graph-deformation condition in itself cannot provide an adequate basis for the phrase structuring of sentences. For instance, consider the placement of the article a between gave and book in Figure 7.3(b). Given that a has no meaning association, there is no basis for deciding whether it is part of the verb phrase or the noun phrase. To assign such morphemes to the appropriate phrases, the simulation will have to use nonsemantic cues.

A number of other cues can be used for chunking a string into phrases. First, there may be pauses after certain morphemes and not after others. Normal speech does not always have such pauses in the correct places and sometimes has pauses in the wrong places; however, pausing is a fairly reliable indicant of phrase structure (Cooper and Paccia-Cooper, 1980).
Also, as Cooper and Paccia-Cooper discuss, there is information in the intonational contour as to the correct segmentation of a string. It is argued that parent speech to children is much better segmented than adult speech to adults (see de Villiers and de Villiers, 1978). Furthermore, ACT does have the facility to recover from the occasional missegmentation. Children also occasionally missegment (MacWhinney, 1980) and, of course, they recover eventually.

Another basis for segmentation relies on the use of statistics about morpheme-to-morpheme transitions. For instance, in Latin the segment ae will more frequently follow agricol, with which it is associated, than it will precede laud, with which it is not associated. The differences in transitional frequencies would be sharper in Latin, which has a free word order, but they also exist in English. Thus, ACT can associate ae with agricol if ae has followed agricol more frequently than it has preceded laud. It strikes some as implausible to suppose that people could keep the necessary statistical information about morpheme-to-morpheme transitions. However, Hayes and Clark (1970) have shown that in listening to nonsense sound streams subjects can use differential transition probabilities as a basis for segmentation. Such information has also proven useful in computational models of speech recognition (Lesser et al., 1977).

The foregoing analysis has assumed that the learner has no grammatical rules for chunking the sentence string. However, in many situations the learner will have rules that can analyze a subchunk of the sentence, for example, a noun phrase. In that case, the grammatical analysis can be used to help anchor the chunking of the remainder.

Application of segmentation rules. In various simulations I have either used these segmentation rules to chunk the strings, or I have skipped their application and provided the learning program with chunked strings. Anderson (1981c) compares these two methods. Learning proves to be slower by almost an order of magnitude when the program must induce the segmentation, for two reasons. First, certain sentences come in without adequate basis for segmentation, particularly early in the learning history when clear-cut transition frequencies have not been compiled and there are no syntactic rules available to anchor the sentence chunking. Second, misparsings can occur, and these have to be overcome. The existence of misparsings in children's language is evidence that they have similar difficulties. In the two simulations reported on here, the program was provided with prechunked strings. The rate of learning was slow enough to make another order of magnitude intolerable. All our simulations have worked with fewer than 10,000 utterance-meaning pairings, while a child probably encounters many more than that in a single month. Therefore in many ways our learning simulations are forced to be unrealistically fast.

FORMATION OF INITIAL RULES

The initial chunking analyzes the sentence into chunks and assigns meaning to each chunk. These provide the necessary
ingredients for the formation of language-generation rules. That is, the learner can form the rule that he should generate the structure associated with the chunk to communicate the meaning associated with the chunk. So, for instance, suppose ACT encounters the pairing of the meaning ((carry) (horse X) (farmer Y)) and the chunked string ((equ + i) (agricol + as) (port + ant)). Further, assume ACT knows that the meaning of equ is horse, the meaning of agricol is farmer, and the meaning of port is carry. Then, it would form the following rules:9

IF the goal is to communicate ((carry) agent object)
THEN set as subgoals
1. to describe the agent
2. to describe the object
3. to describe (carry).

IF the goal is to describe (horse X)
THEN generate (equ + i).

IF the goal is to describe (farmer Y)
THEN generate (agricol + as).

IF the goal is to describe (carry)
THEN generate (port + ant).

As a shorthand, these rules are denoted:

(relation agent object) → "agent object relation" if relation involves carry
(horse X) → equ + i
(farmer Y) → agricol + as
(carry) → port + ant.

As discussed in Chapter 6, the claim is not really that the learner forms such production rules from a single experience. Rather, a single experience would only create a declarative structure that could be interpreted analogically to guide later generations.10 Production rules such as those above would eventually be compiled from the analogical application. However, again to speed up the rate of learning, the program creates these rules in production form right away. In general, the simulation will learn in the following way.
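The formation of these initial rules from a single pairing can be sketched as follows. This is an illustrative reconstruction with invented function and variable names, not the simulation's code; it assumes the meaning chunks are given as tuples with the relation chunk first, then agent, then object, and that a lexicon links each stem to its concept.

```python
# Sketch of initial rule formation: pair each string chunk (in surface
# order) with the meaning chunk whose head concept its stem expresses,
# emit one generation rule per pairing, and record the surface order of
# the roles as the top-level rule.

def form_rules(meaning_chunks, string_chunks, lexicon):
    """meaning_chunks: [(relation...), (agent...), (object...)];
    string_chunks: chunked morpheme strings in surface order;
    lexicon: stem -> concept (e.g. 'equ' -> 'horse')."""
    rules, order = [], []
    for s in string_chunks:                 # walk the surface string
        stem = s.split()[0]                 # 'equ' from 'equ + i'
        concept = lexicon[stem]
        m = next(c for c in meaning_chunks if c[0] == concept)
        rules.append(f'{m} -> "{s}"')       # e.g. ('horse', 'X') -> "equ + i"
        order.append('relation' if m is meaning_chunks[0] else
                     'agent' if m is meaning_chunks[1] else 'object')
    top = f'(relation agent object) -> "{" ".join(order)}"'
    return [top] + rules
```

On the example from the text, the pairing of ((carry) (horse X) (farmer Y)) with ((equ + i) (agricol + as) (port + ant)) yields the top-level shorthand rule (relation agent object) → "agent object relation" plus one word rule per chunk.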
The program starts out not knowing any of the syntactic rules of the language. In this situation the program can still generate sentences, relying on default orders based on structure and salience in the meaning referent. There is some evidence that children use default word orders; that is, they
generate word orders not found in adult speech (Clark, 1975; de Villiers and de Villiers, 1978; MacWhinney, 1980). In any case, the program will start to acquire rules like those above. These will require generalization and discrimination, with which much of the learning history will be concerned. Also, the program will encounter sentences that violate the graph-deformation condition and require learning transformations. My simulations of child language acquisition have focused on the acquisition of rules for phrase structures and transformations and on the discrimination, generalization, and strengthening of these rules. The learning mechanisms involved are those of the previous chapter, but they have distinctive applications in the context of language acquisition. The next section, on the acquisition of a rather demanding fragment of English, illustrates this application.

A Competence Example

The first simulation involved learning a fragment of English that had a heavy weighting of the verb-auxiliary system and question transformations. The sentences could contain the modals can (able), could (able), should (obligation), would (intention), will (future), did (emphatic), do (emphatic), does (emphatic), and may (possibility), with the corresponding meaning components in parentheses. These meaning components were not assigned to the terms, but rather had to be induced from the context. The sentences were also marked for tense and, optionally, for perfect, progressive, and stative. There were sets of four adjectives, eight nouns, six transitive verbs, and four intransitive verbs. Among the words were man, hit, shoot, and run, all of which have irregular inflections. Therefore another problem for the simulation was to learn the special inflections associated with these terms. Finally, the training set involved yes-no questions and wh-questions. The learning history involved 850 pairings of target sentences
For each sentence the program generated a candidate sentenceand then tried to learn by comparing its generation to the feedbackprovided by the target sentence. Table 7.2, later in the chapter, illustratessome of the gentences in tlrat learninghistory nlongwith the nenteCIcen f,en'l'lto 0ffilatlhy tlru lfl\rll,mnl, puctrllnruerrrnntlc cftnrnctcr oi t[e itlnlrJnr.r'H rlr,r,lvurlrrun llrtr lirr.lllrrrt llrly wt,t.0g0nsrnlelrlrflrt,
durrrly wlthlrrllruuyrrlnrllc unrslrnlnlr, lfy atlfrirtlng llrurnrrdonrgurertrllurr wernntlctheeentenceu moreconllrnrunreleru, plex on the averagewith time. However, this gradual increase
in syntactic complexity is not particularly critical to performance of the simulation. See Anderson (1981c) for a rather similar record of success on a rather similar subset without gradually increasing syntactic complexity.

THE FIRST SENTENCE

The first target sentence presented to the program was The boys are tall; its meaning structure is illustrated in Figure 7.4. All the relevant semantic information controlling this syntactic structure is present in this figure, including the subject's definite and plural features and the predicate's present and stative features. However, it is unreasonable to suppose that the learner actually knows that all these features are relevant to the communication. Suppose stative and definite are thought to be relevant, but not present or plural. Then we should represent the hierarchical structure imposed by the to-be-communicated structure as ((stative (*tall)) (definite (*boy))).11 In all meaning structures used in the simulation, the relational descriptors are embedded around the core relation (for example, tall) of the communication and similarly around the core nouns. This will produce a "nouniness" ordering on words in the noun matrix and a "verbiness" ordering on verbs in the verb matrix. As discussed earlier in the chapter, the same effect can be achieved by a data-specificity principle of production selection. However, this simulation did not have that conflict-resolution principle built into it in a way that reflected the specificity of the meaning structure. Therefore one might think of noun-phrase and verb-phrase embedding as implementing this aspect of the specificity principle.

The program had no syntactic rules at this point, so it fell back on the default rule of generating the terms it knew in the order they occurred in the semantic referent (generally ordered as
the predicate-object or relation-agent-object). Since it knew words for tall and boy, it generated tall boy. The target sentence it received as feedback after chunking was ((the (boy + s)) (are (tall))). Comparing the target sentence and the meaning led to the creation of generation rules. From the noun phrase it formed the rules:

1. (Definite object) → "the object."
2. (*Boy term) → "boy + s."

From the main clause, the following rule was learned:

3. (Predicate object) → "object predicate" if predicate ascribes *tall.
And from the predicate it formed the rules:

4. (Stative attribute) → "are attribute."
5. (*Tall) → "tall."

Each of these rules12 was entered with one unit of strength. These strength measures increased or decreased as the rules proved successful or not.

THE SECOND SENTENCE

The next pairing involved the target sentence ((the (boy)) ((shoot + s) (a (lawyer)))) and was paired with the meaning structure ((*shoot) (definite (*boy W)) (indefinite (*lawyer V))). The program generated shoot the boy s lawyer. The verb-agent-object ordering is just the default ordering. The noun phrase the boy s was generated by application of rules 1 and 2 learned for the previous sentence. The program formed rules to handle the cases for which there were only default orderings:
Figure 7.4 The meaning structure underlying the sentence The boys are tall. [figure: a semantic network linking DEFINITE, PLURAL, PRESENT, STATIVE, PREDICATE, and ATTRIBUTE relations to the node *TALL]
6.(Relationagentobiect)+,,aSentrelationobiect,,ifthere|ation is about *shoot. 7. (*Shoot\+ "shoot * s." 8. (lndefinite obiect)+ "A obiect"' 9. (*LawYer)+ "Iau)Yer"' which Rule 1, which was leamed from the first sentence and strengthwas so and g"."r"i" d, the in this context, was correct s, was Ined by one unit. Rule 2, which inflected the noun with disapand strength incorrect and so was weakened to zero
280
The Architectureof Cognition
. peared.rsAn action discrimination was formed. A searchwas made for a feature that was true of the current context and not trug o{ the previous use of.boy + s. Although this was not the only difference, a random searchretrieved tf,e fact that the term was singular in the current context. so the following rule is formed:
LanguageAcquisition
generatedby ACTand the feeilbackprouideil Table 7,2 Samplesentences
some of the more interesting of the first twenty-five sentences are illustrated in Table 2.2. Note that sentence 9 is the first one in which-ACT's generation matched the target sentence. It acquired the rule for the trom the first sentence.From the sixth sentenceit learned the agent-actionordering for iump and the ed inflection. From sentence7 it learned the slnfleltior, for lautyer. At this point all of the grammaticalrules are specific to single concepts. ACT's performanceon the tenth sentencereveals that its performance on the ninth was luck. Here it uses the s inflection for lawyer when the noun is singular. It also incorrectly uses the ed inflection fordance(acquired?ro* the third sentence).This leads to a set of three discriminations. The s inflection ior lawyer is strong enough to justify a condition discrimination u, *"ll u, an action discrimination: 11. (*Lawyerterm)+ "lautyer* s,'if termis plural. 12. (*Lawyerterm)-+ "luutyer,,if term is singlhr. It also makesan action discrimination to give the s inflection for dance. Contrasting sentence 3 with ,"rrt-"r,." L0, it selects the tense feature:
Sentence generated by ACT
Sentence number
BoY
1
relt, snoor
3
DANCE TIIE LADY
THE LADY DANCE ED
6
'UMP THE FARMER cooD THE LAwYER
THE FARMER S JUMP ED THE LAWYER S ARE GOOD
10
THE LAwYER s IUMP ED THE LAwYER s DANcE ED
THE LAWYER S TUMP ED TI{E LAWYER DANCE S
14
Krss rHE
THE FARMER S ARE KISS ING
7 9
THE BOY S ARE TALL
rIrE
BoY s LAwYER
FARMER A BoY
(*Dance) + "dance * s" if the tense is present.
In thesediscriminations ACT is just looking for some feature in the semantic or syntactic context where tlie correct and the incorrect applications differ. ACT is biased to detect differences cfoserto the discriminatedterm (for example,dancesin rule 13) beforemore clistantdifferences.so ACT will use the numb., oi thc sulrjectnoun to ctlnstrain the subiect-nouninflection before it will considerthe number of the obiectnoun for that purpose. This reflectsthe principle of selectingdiscriminationJ according to level of activation, which was discussedin chapter 6.
THE BOY SHOOT S A LAWYER
THE BOY S ll7
THE TALL LAwYER HAs Is
THE TALL LAWYER HAS
170
IUMP ING soMB FARMER s Hrr
ED THE
BEEN JUMP INC SOME FARMER S HIT THE
208
THE DocroR
MAY TICKLE ED
THE DOCTOR MAY TICKLE
358
THE sAILoR s MAY ARE
472
wHo
632
THE FUNNY sArLoR
751
wHo
LADY
LADY
THE FUNNY FARMER
THE FUNNY FARMER
BEING BAD wAs rHE
THE SAILOR S MAY BE BEING BAD
FUNNY
WHO WAS THE FUNNY
LAWYER BEING HIT ED BY s rs
KISSED BY THE BAD BOY HAVE RUN ED
NO
THE GIRL HAs RUN ED
790
ARE soME DocroR
805
HAs A sArLoR
811
wHo
s BEING
HIT ED BY SOME LADY S RUN ED
MAY BE sHoor
LAWYER BEING HIT BY TTIE FUNNY SAILOR S ARE XISS ED BY THE BAD BOY WHO TIAVE RUN
ED BY
SOME GOOD LAWYER S
THE GIRL HAS RUN ARE SOME DOCTOR S BEING HIT BY SOME LADY S HAS A SAILOR RUN WHO MAY BE SHOT BY SOME GOOD LAWYER S
815
THE ANGRY BoY cAN
824
THE SMART LADY S MAY
835 838
SOME MEN DANCB
SOME MEN DANCE ED
SOME TALL GIRL S MAY
SOME TALL GIRL S MAY
BErNG
BAD RUN ED
13.
Feedback fmm target sentence
2
10. (*Boyterm)-- "boA" if term is singular. THn Frnsr TwnNry-FrvE SnNruNcns
281
SHOOT ED THE ANGRY SAILOR WOULD THE 8OY S HAVE RUN ED
TTIE ANGRY BOY CAN BE BAD THE SMART LADY S MAY RUN
SHOOT THE ANGRY SAILOR WOULD THE BOY S HAVE RUN
282
The Architectureof Cognition
LanguageAcquisition
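The feature-selection step just described can be sketched in code. This is an illustrative reconstruction, not ACT's actual implementation; the function, the feature names, and the activation values are all invented for the example, with higher activation standing in for closeness to the discriminated term.

```python
# Illustrative sketch (not ACT's code) of discrimination: find a feature
# on which the correct and incorrect applications differ, preferring
# more active features (those closer to the discriminated term).

def choose_discrimination(correct_context, incorrect_context, activation):
    """Return the most active feature that distinguishes the context of
    a correct application from that of an incorrect one, or None."""
    differing = list(correct_context ^ incorrect_context)
    # Higher activation = closer to the discriminated term.
    differing.sort(key=lambda f: activation.get(f, 0.0), reverse=True)
    return differing[0] if differing else None

# Contrasting sentence 3 (past: "dance ed") with sentence 10 (present):
feature = choose_discrimination(
    {"tense:present", "subject-number:singular"},
    {"tense:past", "subject-number:singular"},
    {"tense:present": 0.9, "object-number:plural": 0.3})
print(feature)  # -> tense:present
```

The selected feature then becomes the new rule's condition, as in rule 13, where the tense test constrains the s inflection for dance.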
ACT will also consider elements in the semantic structure before it considers the syntactic structure (goal structure). Other than that, it chooses randomly from among the possible features. Therefore, the fact that all three discriminations were correct in the above example was somewhat a matter of luck.

Sentence 14 is a good illustration of an erroneous discrimination. Note that ACT had generated the term a where the target used some. In response to this difference ACT formed an action discrimination. This time, by chance, ACT noted that the successful use of a had not been in the context of *kiss. Therefore it built the following erroneous rule:

19. (Indefinite object) → "some object" in the context of *kiss.

This rule later died when it led to an incorrect generation.

GENERALIZATION

The first generalizations occurred after sentence 25. At this point ACT had the following rules, among others:

20. (Relation agent) → "agent relation" if relation involves *dance.
13. (*Dance) → "dance + s" if present tense.
21. (*Dance) → "dance + ed."
22. (*Dance) → "dance + s" if present and singular agent.
23. (*Dance) → "dance" if present and plural agent.
24. (Relation agent) → "agent relation" if relation involves *jump.
25. (*Jump) → "jump + s" if present.
26. (*Jump) → "jump + ed."
27. (*Jump) → "jump + ing" if there is the context of progressive.
28. (Relation agent) → "agent relation" if relation involves *play.
29. (*Play) → "play + ed."

These rules are not a particularly parsimonious (let alone accurate) characterization of the language structure. Because of specificity ordering, they work better than might seem possible. For instance, rules 22 and 23 take precedence over 13, which does not test for number, so 13 will not cause trouble. Similarly, rules 13, 22, and 23 all take precedence over rule 21. Thus rule 21 will not be able to generate an ed inflection when the tense is present, even though 21 does not explicitly test for past tense.

ACT's generalization mechanism would like to generalize rules 20, 24, and 28 together. This would involve replacing the concepts *dance, *jump, and *play by a variable. However, as discussed in the previous chapter, there must be a constraint on what fills the variable slot. Similarly, ACT wants to generalize rules 21, 26, and 29 by replacing the concepts *dance, *jump, and *play by a variable and the words dance, jump, and play by a variable. However, to make such generalizations, it again needs a constraint on what may be substituted for these variables. ACT does not have any appropriate constraint stored in memory for any of these potential generalizations. When faced with a number of potential generalizations, all of which require a variable constraint, ACT decides to create a word class. It created a class, which I will call verb, and stored in long-term memory the facts that *dance, *jump, and *play (or, equivalently, dance, jump, play) are instances of this class. Having done so, it was in a position to create the following rules:

30. (Relation agent) → "agent relation" if relation involves a verb.
31. (*Verb) → "verb + s" if present tense.
32. (*Verb) → "verb + ed."
33. (*Verb) → "verb + s" if present and singular.
34. (*Verb) → "verb" if present and plural.
35. (*Verb) → "verb + ing" if in the context of progressive.

In these rules verb refers to a word in the verb class, and *verb refers to its meaning. The development of the verb word class does not stop after the first twenty-five sentences. As evidence builds up about the syntactic properties of other words, ACT will want to add additional words to the class. A major issue is when words should be added to the same class. It is not the case that this occurs whenever two rules can be merged, as above. The existence of overlapping declensions and overlapping conjugations in many languages would result in disastrous overgeneralizations. ACT considers the set of rules that individual words appear in. It will put two words into a single class when:

1. The total strength of the rules for both words exceeds a threshold indicating a satisfactory amount of experience. Thus, one does not form class generalizations until a sufficient data base has been created.
2. A certain fraction (currently two-thirds) of the rules that have been formed for one word (as measured by strength) has been formed for the other word.

When such a class is formed, the rules for the individual words can be generalized to it. Also, any new rules acquired for one word will generalize to others in that class. Once a class is formed, new words can be added according to criteria (1) and (2). Further, two classes can be merged, again according to the same criteria. Thus, it is possible to gradually build up large classes, such as the first declension in Latin (see Anderson, 1981c).

The word-specific rules are not lost when the class generalizations appear. Furthermore, one form of discrimination is to create a rule that is special for a word. For instance, if the general rule for pluralization does not apply for man, one action discrimination would be to propose that men is the special plural form for man. Because of the specificity ordering in production selection, these word-specific rules will be favored when applicable. This means that the system can live with a particular word (such as dive) that is in a general class but has some exceptional features.

The creation of word classes could be a language-specific operation, but it is possible that there are other cases of class formation in nonlinguistic skill acquisition. Most constraints on variables involve use of properties that are already stored with the objects. In the bridge generalization discussed in the preceding chapter, the variable cards were constrained by the fact that they were touching honors. Rules for acting on general (variabilized) objects (for example, throwing) make reference to the objects' physical properties (say, size and weight). Word classes are arbitrary. The only feature their members have in common is that the same set of syntactic rules apply to them. It is tempting to think that adjectives and verbs have distinctive semantic properties, but this may not be so (see Maratsos and Chalkley, 1981; MacWhinney, in press, for discussion). For instance, why should active be an adjective and think a verb?
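The two class-formation criteria can be sketched as follows. This is a hedged illustration of the bookkeeping, not the ACT implementation: the strength threshold, the rule signatures, and the function name are assumptions for the example; only the two-thirds fraction comes from the text.

```python
# Illustrative sketch (not ACT's code) of the two criteria for putting
# two words into one class. The threshold value is invented; the
# two-thirds overlap fraction is the one stated in the text.

STRENGTH_THRESHOLD = 20.0     # assumed "satisfactory experience" level
OVERLAP_FRACTION = 2.0 / 3.0  # fraction of shared rules, by strength

def may_merge(rules_a, rules_b):
    """Each argument maps a rule signature (e.g. 'ed-inflection') to
    its strength for one word. Merge when (1) total strength exceeds
    a threshold and (2) two-thirds of each word's rule strength lies
    in rules also formed for the other word."""
    total = sum(rules_a.values()) + sum(rules_b.values())
    if total < STRENGTH_THRESHOLD:
        return False  # criterion 1: not enough experience yet
    def overlap(src, other):
        shared = sum(s for r, s in src.items() if r in other)
        return shared / sum(src.values())
    # criterion 2, applied in both directions
    return (overlap(rules_a, rules_b) >= OVERLAP_FRACTION and
            overlap(rules_b, rules_a) >= OVERLAP_FRACTION)

dance = {"agent-order": 6.0, "s-present": 5.0, "ed-past": 4.0}
jump = {"agent-order": 5.0, "s-present": 4.0, "ing-progressive": 2.0}
print(may_merge(dance, jump))  # -> True
```

Because both criteria are stated over strengths rather than raw rule counts, rarely used rules contribute little, which is one way such a scheme avoids merging words that merely share an occasional inflection.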
The arbitrariness of word classes is even clearer in other languages, where nouns are almost randomly divided among declensions and verbs among conjugations.14 Robert Frederking (personal communication) has argued that arbitrary nonlinguistic classes do occur in game playing. There are arbitrary classes of pieces determined only by their function in the game. In the game of Hi-e he induced the existence of four classes of pieces. Having those four classes was essential to his being able to win the game.

THE FINAL PHRASE-STRUCTURE RULES

It is useful to examine some of the sets of rules that ACT possessed after 850 sentences. Among the word classes it formed
was the noun class, which included all the nouns. The rules for this class were:

34. (*Noun term) → "noun + s." (11)
35. (*Noun term) → "noun" if singular. (511)
36. (*Noun term) → "noun + s" if plural. (385)
37. (*Noun term) → "men" if plural and *noun = *man. (71)
To the right of each rule is its eventual strength. Rule 34 is a residual rule that will always be blocked by the others according to specificity. Rules 35 and 36 are the standard rules for plural and singular inflection. Rule 37 is an exception that was formed by an action discrimination from 36.

In contrast to the noun word class, which had a relatively parsimonious structure, the rules for inflection of the perfect aspect as has, had, or have had a relatively complex structure. Figure 7.5 is an attempt to illustrate the structure of these rules. Each terminus in the discrimination tree represents a single production; the number beside it is its strength. The paths through the network reflect the operation of conflict resolution in choosing a production. The tests involving class 4750 and class 3795 are tests of whether perfect occurs in the context of a modal.

[Figure 7.5 A discrimination network specifying the various rules learned for inflecting perfect tense.]

By the same generalization process that led to formation of word classes, ACT has formed classes of modal contexts. Class 4750 contains the modals would, should, and may. Class 3795 contains the modals do, did, will, and could. Eventually, they would have been collapsed into a single class, but that had not yet occurred by sentence 850. Also, able (can) appears in Figure 7.5 as a separate test because it has not been merged into a modal class. The nine rules in the figure lead to nearly perfect performance on the perfect. However, they could have been replaced by the following three rules:
(Perfect action) → had action if past tense.
(Perfect action) → has action if present and singular.
(Perfect action) → have action.

Because of specificity, the last rule would apply only if the first two did not. Given its learning history, ACT has worked its way to a more complex characterization. In general, rules derived from language learning are not likely to be the most parsimonious, nonredundant characterization of the language. This is an important way in which the conception of language based on ACT differs from the conception promoted in linguistics. Parsimony has been applied in linguistics to a too narrow range of phenomena, and the result is a seriously misleading characterization of language (see Chapter 1).

SYNTACTIC VERSUS SEMANTIC DISCRIMINATION

The production set for perfect in Figure 7.5 is in error in that it tests the semantic subject (agent) of the sentence for number rather than the syntactic subject. This means the simulation would generate The boy + s has been hug + ed by the girl, where the number of the verb agrees with the number of the syntactic object.15 There are a number of places where ACT has a similar error for other verbal constructions. This derives from the assumption that the semantic structure is more active than the syntactic structure, and consequently ACT would consider a discrimination involving the semantic structure first. Situations where the number of the semantic and of the syntactic subject is different are sufficiently rare that it takes a long time for ACT to form the right discrimination. Eventually it would, but it failed to achieve this for the perfect construction in the first 850 sentences. This is somewhat distressing, because I know of no cases in the child-language-acquisition literature where children made this particular error. Perhaps the assumption about precedence of semantic features is incorrect. It is also possible that children generate such errors covertly, as will be discussed more fully in the section on the child simulation.

ACQUISITION OF TRANSFORMATIONS

ACT needs to get its syntactic rules clear before it is ready to deal with question transformations. Actually, the subject question does not involve learning a transformation. Given the meaning structure ((progressive (hit)) (query X) (definite (boy Y))), ACT would generate by its subject-verb-object rule ? was hitting the boys. The question mark indicates that it failed to have a lexical item for query. Comparing this with the feedback Who was hitting the boys?, ACT would infer that query is realized as Who. However, the other two constructions are more difficult: Was the girl hitting the boys? and Who was the girl hitting? In both cases the graph-deformation condition is violated in moving the participle was from the verb matrix to the front of the sentence. Early in the learning history, ACT typically refused to try to learn from a sentence that violated the graph-deformation condition because it could not formulate any phrase-structure rule to accommodate it and could not see a way to transform the output of existing phrase-structure rules. However, as its phrase-structure rules became more adequate, it was eventually able to propose a transformation. Figure 7.6, showing ACT's first postulation of a question
[Figure 7.6 The goal structure generated for the sentence that led to the first question transformation.]
transformation, illustrates the decomposition of the meaning structure into components according to the rules ACT possessed at this time. Not knowing how to realize the question element, ACT simply tried to generate the embedded proposition that was questioned. This was realized as The funny lady s will run. The target sentence came in chunked as (will (the (funny (lady s))) (run)). ACT noticed that it could achieve the target sentence by simply rearranging the structure of its generation. This is one of the circumstances for learning a transformation. Therefore it formed the following planning rule:

IF the goal is to communicate (question assertion)
THEN plan the generation of assertion
  and move the first morpheme in the relation matrix to the front
  and then generate the sentence.

As discussed at the beginning of this chapter, this planning rule might eventually be replaced by a compiled rule of the form:

IF the goal is to communicate (question (relation agent))
  and the tense is future
THEN set as subgoals
  1. to generate will
  2. to describe agent
  3. to describe relation.

A separate compiled rule would be learned for all frequent sentence structures. The planning rule above would remain for the less frequent constructions that had not been compiled.

ACT offers an explanation for an interesting aspect of the question transformation. Note that the general transformation rule does not apply to the simplest verb constructions. For example, applied to the sentence The boy kicked the sailor, this rule would produce the ungrammatical Kicked the boy the sailor? Rather, the best way to create a question is to propose a do-insertion: Did the boy kick the sailor? The acquisition of this rule can be explained by proposing that the general planning rule was applied (probably covertly) and led to this incorrect question form. Then the discrimination process would be evoked to produce the right action, as shown by the following planning production:
IF the goal is to communicate (question ((*verb) agent object))
  and the tense is past
  and there is no modal, perfect, or progressive aspect
THEN plan the generation of ((*verb) agent object)
  and then insert did at the front
  and replace verb + ed by verb
  and then generate the sentence.

Compiled, this becomes

IF the goal is to communicate (question ((*verb) agent object))
  and the tense is past
  and there is no modal, perfect, or progressive
THEN set as subgoals
  1. to say did
  2. then to describe agent
  3. then to say verb
  4. then to describe object.

Thus our compiled rule for did insertion is no different from any of the other compiled rules for the question transformation. The same learning principles can deal with did insertion without any special-case heuristics.

SUMMARY OF ACT'S PERFORMANCE

Figure 7.7 summarizes ACT's progress. The number of syntactic errors per sentence is plotted against the number of sentences studied. Since the complexity of sentences tends to increase through the learning history (because of a bias in the random sentence generator), there are more opportunities for errors on later sentences. Thus the rate of learning is actually faster than the graph implies.

[Figure 7.7 The mean number of errors per sentence generated as a function of the number of sentence-meaning pairings.]

The sentences in Table 7.2 are a sample biased to represent errors, because these are the more interesting cases. The table lists all the errors made in the last hundred sentences. As can be seen, the final errors almost exclusively involve irregular verbs in less frequent constructions. With time ACT would learn these too, but it has not yet had enough opportunities to learn them. Children also have a long history of residual trouble with irregular constructions.

One type of error produced by the program seems unlike the errors made by children. This is illustrated in sentence 117, where the program generated the verb matrix has is jumping rather than has been jumping. It had learned from previous sentences that is plus ing communicates the progressive in the context of present singular. The program has not yet learned that the element is becomes been in the context of perfect. Thus, has is jumping is a perfectly (pun intended) reasonable generalization. Perhaps children remember word-to-word transitions and are unwilling to venture an infrequent transition (such as has-is) except in the context of a strong transition. This idea will be employed in the section on child language.

In summary, this example simulation should make a fairly convincing case for the power of ACT's language-acquisition system. Except for limitations of time and computer memory, it is unclear whether there are any aspects of a natural language that this program cannot acquire. The full import of this claim is a little uncertain, however. The success of the learning program depends on properties of the semantic referent, and it is difficult to know whether I have built in solutions to critical problems in the structure of the semantic referent. This question can be resolved by using the same semantic referent to learn very different languages, but that project will require a great deal of effort.

Child Language

The performance generated by the previous simulation is definitely unchildlike in ways other than its vocabulary choice. Its rate of initial progress is rapid, and its length of utterance is unbounded from the start. Every sentence it generates is a serious attempt to reproduce a full-length English sentence and, as we see in sentence 9, it can hit the target early. In contrast, utterances generated by children are one word in length for a long while, then two words for quite a while; only after considerable experience do longer utterances start to appear. The first multiword combinations are clearly not English sentences. (My son's first noted two-word utterance was "more gish," translated "I want more cheese.")
CONSTRAINTS ON CHILD LANGUAGE ACQUISITION

It would be easy to conclude that children are simply less capable than ACT, but this would fail to understand what is going on. The preceding simulation has numerous unrealistic aspects, and the child's behavior is largely in response to the reality of his situation.

Vocabulary learning and rule learning. Unlike the previous simulation, the child starts out knowing neither the pronunciation of the words nor their meaning. Therefore, a lot of the initial learning is drill and practice, setting in place the prerequisites to syntax acquisition. Also the child cannot immediately create compiled productions for generation. It will take some time to compile the rules, and in the meantime executing them will impose extra short-term memory burdens. Indeed, generating a single word is probably not a single compiled unit for a child. For instance, I noted for my son that the quality of generation of a word deteriorates when it is preceded by a word, and the more syllables there are in that first word, the more the quality deteriorates.

Rate of learning. In addition to slow compilation, there are limitations on the amount of information the child can learn from a meaning-utterance pairing. In the preceding simulation, the program acquired six rules from sentence 14. Acquiring each rule requires holding and inspecting a trace of the meaning, the sentence generation, and the target sentence. A child would be lucky if he could hold enough information to enable one rule to be formed.

Quality of training data. Because of cognitive limitations (short-term memory) and because of noisy data (ungrammatical utterances, sentences that do not correspond to the assumed meaning), the process of identifying the meaning-utterance correspondences is much more precarious for a child and there will be many erroneous correspondences. Therefore, the child must proceed with a lot more caution.
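The caution demanded by noisy correspondences can be illustrated with a small sketch of word-meaning learning, in which a hypothesis is encoded only after repeated consistent support (the J.J. simulation described later in this chapter uses a criterion of three consistent reinforcements). The class, the data layout, and the example words here are invented for illustration.

```python
# Illustrative sketch (not the ACT simulation itself): encode a
# word-meaning pairing only after repeated consistent evidence, so
# that occasional erroneous correspondences are not learned. The
# criterion of three follows the simulation described later.
from collections import defaultdict

CRITERION = 3  # consistent reinforcements before encoding

class Lexicon:
    def __init__(self):
        self.evidence = defaultdict(lambda: defaultdict(int))
        self.encoded = {}

    def observe(self, word, hypothesized_meaning):
        """Record one word-meaning co-occurrence; encode the meaning
        once it has accumulated enough consistent support."""
        counts = self.evidence[word]
        counts[hypothesized_meaning] += 1
        best = max(counts, key=counts.get)
        if counts[best] >= CRITERION:
            self.encoded[word] = best  # may replace a wrong hypothesis

lex = Lexicon()
for meaning in ["banana", "cheese", "banana", "banana"]:
    lex.observe("nana", meaning)
print(lex.encoded)  # -> {'nana': 'banana'}
```

Note that a wrong early hypothesis (here "cheese") is simply outvoted as consistent evidence accumulates, rather than having to be explicitly unlearned.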
Short-term memory limitations and conceptual limitations. There are severe limitations on the intake and analysis of a meaning-sentence pairing in the absence of comprehension of the sentence. According to traditional assertions and the discussion in Chapter 3, a child can hold five or fewer chunks in memory. Early in language these chunks may be a morpheme or less; later they may be as large as a word; still later they will be the length of comprehended phrases. Moreover, when a child's short-term memory is overloaded, it will be selective in what it retains. Anderson (1975) proposed the telegraphic-perception hypothesis that children tend to retain the words they know and the semantically important words. The child presented with The dog is catching the ball might record doggy catch ball. Small wonder, then, that his utterances have the same telegraphic quality. In addition to a limitation on the strings he encodes, there is a limitation on the meaning structures the child can pair with these utterances. These are limitations of memory capacity but also of conceptual development. If the child is not perceiving possession in the world he can hardly acquire the syntactic structures that communicate it. Slobin (1973) has argued that much of the timing of the child's syntactic development is determined by his conceptual development.

Understanding the nature of language. The child's appreciation of the function of linguistic communication is probably also developing as he acquires language. In the preceding simulation, and indeed in the forthcoming one, it is assumed that the learner is always operating with the motivation to learn the full communicative structure of language. However, a child may well start out with more limited goals, and restrictions on his utterances may reflect only his aspiration level.

Impact of the constraints. It is important to determine what happens to the performance of the ACT language-learning program when the child's situation is better approximated.
The character of its generations should better approximate the character of the child's generations. And it should still eventually learn a natural language just as a child does. This second point is not trivial. The child initially produces (and presumably learns) a language which is not the target language. His rules are not simply a subset of the adult grammar; they are different from the adult grammar. Therefore, in acquiring the correct rules, the child must overcome his initial formulations. Are initial rules hindrances, neutral, or stepping stones to acquiring the correct rules? For instance, where the adult will
model Mommy kisses Daddy, the child might generate Mommy kiss or Daddy kiss. In ACT these initial rules can either be replaced by stronger and more specific rules, or transformations can be acquired to convert their output into more adequate expressions.

SIMULATION OF J.J.

ACT was applied to simulating the early linguistic development of my son, J.J., who began to use words at twelve months and was beginning the two-word stage at eighteen months. When the final draft of this book was written, J.J. was twenty-four months and was producing some fairly interesting three-word and some occasional four- or five-word utterances (I need more water, Daddy; Mommy coming back in a minute).16 To this point the simulation did a fair job of keeping up with his development and reproducing the character of his generations (as well as reproducing some of the generations themselves).

At the time of writing, the simulation had been exposed to 5,000 sentences, which presumably is what J.J. would encounter in several days. Given that the program has reproduced a great deal of the syntactic development from twelve to twenty-four months, there are still some unrealistic idealizations left in it. However, this is a more realistic simulation than the first in a number of ways.

Vocabulary learning and rule learning. The simulation does not start out knowing the meanings of words; it must infer these from the sentence-meaning pairings. Furthermore, a word is not considered a chunk until it has achieved a criterion frequency of exposure (frequency divided by number of syllables must be 10 in the current simulation). The consequence of trying to generate a nonchunk word is that the rest of the generation plan is destroyed. Many times the program started to produce Bye-bye Daddy, but had used up its capacity in generating Bye-bye. The program much sooner learned to generate Hi Daddy, because hi has a single syllable.
Interestingly, this ordering was observed in J.J.'s learning, even though we said bye-bye to him (I regret to report) at least as often as hi.

Also, the program will not make a transition between two words unless it has encountered that transition a criterion number of times (five in the current simulation) or the rule for the transition has achieved a criterion strength (ten units in the current simulation). Thus, the simulation edits utterances from weak rules according to its memory of correct utterances and so avoids expressions like has is jumping (see the
discussion in the previous section). Consequently, the program's initial utterances are repeats of sequences in the adult model; only later does it come up with novel utterances. The same was observed of J.J. His first uses of more were utterances like more gish (I want more cheese) and more nana (I want more banana), and only later did he produce novel generations that had not been modeled. An example of a novel combination is more high, by which he asked to be lifted high again. (Please note that high does not refer to drug use.) Interestingly, when he had wanted to be lifted high before this point, he simply would say more or high.

An interesting consequence of these assumptions can be illustrated with utterances like Russ bark or Mommy bark. For a while, the simulation was not capable of these two-word transitions, nor was J.J. This simply reflected the weakness of the rule authorizing this two-word transition. Thus J.J. and the program would generate only one of these two words. If the simulation generated this plan and went through it left to right, it would generate Russ or Mommy. However, J.J.'s one-word utterance in this situation was bark, although he knew the words Mommy and Russ, and we had modeled the two-word transitions many times. Therefore, the program has a "subgoal omitter"; if it came upon a subgoal that was going to exhaust its resources and a more important part of the utterance remained, it simply omitted that subgoal and went on to the more important part. Similar ideas have been proposed by Bloom (1970). Thus, in our program, as in J.J., the first generations were bark and only later did they become agent + bark. The program regarded the relational term as most important, then the object, and finally the agent. This reproduced J.J.'s ordering, although it is reported (MacWhinney, 1980) that many children prefer the agent. In the case of J.J., an only child, it was usually obvious from context who the agent was, and this perhaps accounts for his preference. MacWhinney (1980) reports that some children will move the most important term forward out of its correct position, but we did not observe any clear cases of this with J.J.

Another interesting consequence of the assumptions concerns the frequency of generations of the structure operator + object versus object + operator. Braine (1963) has characterized early child speech as being formed by these two structures, and indeed early on the J.J. simulation did create two word classes, one consisting of first-position operators and one consisting of all second-position operators. However, in the early speech of J.J. and the program, the operator + object
MacWhinney (1980)reports that some children will move the most important term forward out of its correctposition, but we did not observeany clear casesof this with J.f. Another interesting consequence of the assumptions concerns the. frequency of generations of the structure oPera' tor * object versusobject * operator. Braine (1963)has characterized early child sPeech as being formed by these two structures, and indeed early on the J.J. simulation did create two word classes,one consisting of first-position oPerators and one consisting of all second-position oPerators. However, in the early speechof I.l. and the Program, the operator * obiect
construction was more frequent because the operators ryere more frequently used and compiled more rapidty. Ih"t- the first word in ihe operator * obiect sequencewas less likely to exhaust capacity than the first word in the obiect * operator se-
294
quence. ' Rot, of learninS.In line with the earlier discussionon rate of leaming, the piogram was limited to learning one rule Per meaninlg-stringpresentation.Becausethe utterancesit worked with *Jt" so thott, there was often just a single rule to learn' However, when there were multiple rules to learn, the program chose the left-most rule with the imallest span. For instance, if given the generation Do88y chaseskitty and it did not know the ireaning for doggy,it would learn this lexical item in preference to learning theitquence agent-action-object.This ordering was produced-Vy ^ ptogt"* that scanned the sentencein a left-totignt depth-firtt -"t t er looking for the first to-be-learned stiucture. It seemedreasonablethat the program should focus on the pieces before learning how to put the pieces together' One consequencewas that thl utterancesfirst produced tended to be short and fragmentary. to generate Quality of training itata.when the program failed 50 utterance, the target match to failed any utterance or *[". it corthe as to feedback incorrect percent of the time it received rect utterance. (When incorrect feedback is given, another random utterance is chosen from the model's repertoire.) Even when its generations matched the target utterance, it was given incorrect ieedback 20 percent of the time. The smaller percentage in this second situation reflects the idea that learner and t*a"t are often not in corresPondencewhen the learner cannot express himself, but are more often in correspondence when the learner does exPresshimself correctly. One consequenceof this is that when the Program does hit upon the correct rule, that rule is generally reinforced. The progiu* takes various measures to protect itself from noisy data. When it detects too great a disparity between its generationand the model's, it iust refusesto learn. More import-ant,it does not weaken the rules that led to generation of the mismatched sentences. 
Incorrect feedback createsthe greatest problems for acquisition of word-meaning pairings. Almost 50 percent of the program's hypothesesaslo a word's meaning are incorrect. ThereIor", a word-meaning corresPondenceis encoded only after a consistent reinforcethree-of certain number-currently ments. So set, there was one occasionwhen a wrong hypothesis
The Architecture of Cognition
was formed, but the program kept accumulating information about word-meaning pairings and eventually the correct hypothesis acquired greater frequency and replaced the wrong one.

Short-term memory limitations and conceptual limitations. This simulation's limitations on sentence complexity and conceptual complexity were relatively unsystematic. Whenever we could identify a conceptualization that was part of J.J.'s repertoire and that we found ourselves communicating in our utterances, I added it to the array of sentence-meaning pairings that were presented to the program. For instance, at the 1,500th pairing (I equate this pairing with J.J.'s development at about seventeen months), we had evidence that J.J. could understand the construction involving a relation between people and their body parts, and we were practicing constructions like J.J.'s nose, Mommy's ear, Russ's eye, Daddy's hand. Therefore "possessor bodypart" constructions were included in the presentation set along with an appropriate semantics after the 1,500th pairing.

Understanding the nature of language. There was no attempt to model J.J.'s growing appreciation of the function of language, solely for lack of relevant ideas. However, there is evidence that he did not start out appreciating the full communicative power of language. There was a curious early stage (around thirteen to fourteen months) when J.J. would only use his utterances descriptively to name objects and not to request them. He had the concept of nonverbal request, as he was a very effective pointer. It was frustrating to us as parents that our son could want something, be crying for it, have the name in descriptive mode, and "refuse" to tell us what he wanted. It was a joyous day for us all when he pointed to a banana he wanted and said nana. Negation and question-asking came in even later, although J.J. understood both much earlier.
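The frequency bookkeeping for word-meaning pairings described above (encode a referent only after three consistent reinforcements; let a more frequent correct hypothesis eventually displace a wrong early one) might be sketched like this. The class and method names are my assumptions, not the book's code.

```python
from collections import Counter, defaultdict

# Sketch of the word-meaning bookkeeping described in the text:
# a hypothesized referent is encoded only after a criterion number of
# reinforcements (three in the simulation), and a wrong early winner
# is replaced once the correct referent accumulates greater frequency.
CRITERION = 3

class Lexicon:
    def __init__(self):
        self.evidence = defaultdict(Counter)  # word -> referent counts
        self.encoded = {}                     # word -> currently encoded referent

    def observe(self, word, hypothesized_referent):
        self.evidence[word][hypothesized_referent] += 1
        referent, count = self.evidence[word].most_common(1)[0]
        if count >= CRITERION:
            self.encoded[word] = referent     # may replace a wrong hypothesis

lex = Lexicon()
# Three wrong guesses encode "kitty"; continued evidence corrects it.
for ref in ["kitty", "kitty", "kitty", "doggy", "doggy", "doggy", "doggy"]:
    lex.observe("doggy", ref)
print(lex.encoded["doggy"])  # → doggy
```

The point of the criterion is visible in the example: the wrong pairing does get encoded, but it cannot survive once the correct referent's frequency overtakes it.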
His increase in appreciation of the communicative function of language is reflected (but not explained) in the growth in the types of constructions provided in the training sequence for the program.

PERFORMANCE OF THE J.J. SIMULATION

Training data. The total set of constructions presented to the simulation is described by the following pairings:

1. (Describe/request (*object/*property)) → "object/property" (such as "cookie," "up").
2. (Point out (*object)) → "this."
3. (Request (*more (*object X))) → "more object" ("more cheese").
Language Acquisition
4. (Request (*up/down (? X))) → "up/down JJ."
5. (Describe (*see (*person X))) → "hi person" ("hi Russ").
6. (Describe (*depart (*person X))) → "bye-bye person" ("bye-bye Daddy").
7. (Request/describe (*action (*person X))) → "person action" ("Mommy jump").
8. (Describe (*property (*object X))) → "property object" ("hot bagel").
9. (Negative conceptualization) → "no conceptualization" ("no more Sally").
10. (Describe (bodypart (*person X))) → "person part" ("Mommy nose").
11. (Describe (associate (*object1 X) (*object2 X))) → "object 1 object 2" ("baby book").
12. (Describe/request (*verb (*agent X) (*object Y))) → "agent verb object" ("daddy feed Russ").
13. (Describe/request (*direction (*agent X))) → "agent go direction" ("JJ go down").
14. (Describe/request (*back (*agent X) (*object Y))) → "agent put back object" ("Daddy put back blocks").
15. (Describe/request (*off (*agent X) (*object Y))) → "agent turn off object" ("Sarah turn off Sally").
16. (Describe (*open/*close/*broken (*object X))) → "object open/closed/broken" ("door open").
17. (Question (location (object X))) → "where's object" ("where's balloon").
18. (Describe/request (*up (agent X) (object Y) (location Z))) → "agent put object up on location" ("Ernie put duck up on bed").
19. Delete agent in all requests.
20. Optional transformations: put back object → put object back; turn off object → turn object off.
21. Optional transformation: replace inanimate object by it.
22. (Proposition location) → proposition in location ("Mommy read book in playroom").

These constructions are given in the order they were introduced into the training sequence. As can be seen, there is an increase in complexity of both meaning and utterance. The first nine were introduced by the 1,500th pairing, which I equate with J.J.'s development at seventeen months. The next eight were introduced by the 3,000th pairing, which I equate with nineteen months. The last five pairings were introduced by the 4,000th pairing.
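The pairings above can be thought of as templates from which individual meaning-utterance training items are generated. The encoding below is my own illustration (the book gives no data format); the tuple shapes, filler sets, and function name are assumptions.

```python
import random

# Illustrative (assumed) encoding of a few of the meaning-utterance
# pairings above as (meaning-template, utterance-template) pairs that
# a trainer samples from and instantiates with concrete fillers.
CONSTRUCTIONS = [
    (("Request", ("*more", "object")), "more {object}"),             # pairing 3
    (("Describe", ("*see", "person")), "hi {person}"),               # pairing 5
    (("Describe", ("*property", "object")), "{property} {object}"),  # pairing 8
]

def sample_pairing(fillers):
    """Pick a construction and fill its slots with random vocabulary items."""
    meaning, template = random.choice(CONSTRUCTIONS)
    slots = {slot: random.choice(words) for slot, words in fillers.items()}
    return meaning, template.format_map(slots)

fillers = {"object": ["cheese", "bagel"], "person": ["Russ", "Mommy"],
           "property": ["hot", "big"]}
meaning, utterance = sample_pairing(fillers)
print(meaning, "->", utterance)
```

Growing the `CONSTRUCTIONS` list in stages (nine, then eight, then five more) would mirror the schedule the text describes for the 1,500th, 3,000th, and 4,000th pairings.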
The system started out just learning single-word pairings (construction 1). This is intended to model a true one-word stage in ability when the child is given or records only a single word and selects a single object as its referent. Construction 2
was introduced to account for our child's early tendency to present objects to us with the utterance dis. Construction 3 received a lot of modeling at the kitchen table: "Does J.J. want more cheese?" Sometimes we modeled just the two-word pairing "More apple juice?" and sometimes full sentences, but again the assumption was that J.J. extracted the critical two words from the longer utterances. The expression "up JJ" derives from our idiosyncratic use of "up with J.J." and "down with J.J." Although we did frequently use the more normal order "Does J.J. want down?" our son latched onto the other order. Rules 12, 14, and 15 all describe agent-verb-object sequences that we started modeling with fairly high frequency. Until the twenty-first month, we heard only verb-object utterances from J.J. There are multiple hypotheses for the omission of the agent. He may have been modeling the command form, which omits the agent. Or he may have been omitting the most obvious term because of capacity limitations and to avoid blocking the critical verb. Initially, the simulation was trained on requests with sentences that contained the agent. This corresponds to forms we sometimes used to J.J., such as "Russ, drop the ball" or "Will J.J. give me the ball?" However, I decided at the 3,000th pairing that it was more appropriate to provide the simulation with the deleted-agent version of requests (rule 19). Whatever the justification for this midstream change, it demonstrated the learning program's ability to recover from mistraining. Rules 20 and 21 reflect optional transformations and so tap the program's ability to deal with constructions in free variation. In this case, if the program generated the equivalent expression (Daddy put back ball for Daddy put ball back), the program was not given feedback that it was incorrect, but was presented with the alternative form as another construction. Note that function morphemes (a, the, -ed, 's, and so on) are not part of the input to the program.
This reflects the belief that J.J. did not initially record these items (the telegraphic-perception hypothesis). When he began echoing our phrases at the twenty-first month he often succeeded in echoing constructions and words he was not producing spontaneously. However, he usually omitted the function words. So, for instance, "Russ is a good dog" would become "Russ good dog," and "Nicky goes to the doctor" would become (approximately) "Nicka go docka." The first function word he began to use productively was it (rule 21), at about twenty-two months. The articles (a, the) and s for pluralization and possessives only appeared in the
Table 7.3  Growth in the linguistic competence of the simulation

Number of pairings    Vocabulary size    Mean length of utterance
1-500                 12                 1.00
501-1,000             31                 1.00
1,001-1,500           39                 1.09
1,501-2,000           44                 1.13
2,001-2,500           54                 1.16
2,501-3,000           55                 1.26
3,001-3,500           77                 1.24
3,501-4,000           97                 1.26
4,001-4,500           110                1.23
4,501-5,000           122                1.41
5,001-5,500           136                1.50
5,501-6,000           142                1.58
twenty-fourth month.17 In addition to slowly increasing the complexity of the training constructions, we increased the training vocabulary as J.J. appeared to understand more words. Undoubtedly the simulation's training vocabulary seriously underestimated the pool of words J.J. was working with. J.J.'s vocabulary is increasing so rapidly that it is impossible to keep an up-to-date inventory of all the words he understands. At seventeen months his vocabulary was approximately a hundred words. By sampling every tenth page of Webster's Collegiate Dictionary and adding 20 percent for items not included (Grover, Milkbones, and so on), we estimated that he had a 750-word vocabulary at twenty-four months.

RESULTS

Table 7.3 provides a summary of the J.J. simulation through the first 6,000 pairings (twenty-fourth month) in terms of vocabulary size and mean length of utterance. Compared to the previous simulations, the growth of linguistic competence was spectacularly slow. The mean length of utterance was calculated omitting those cases where the program failed to generate anything. So, as in measures of mean length of utterance for children, this measure is bounded below at 1.0. As can be inferred, the first multiword utterances only began appearing after the 1,000th pairing. The following list gives some of the program's notable multiword utterances in the order generated during the first 6,000 pairings:

MORE BOTTLE
MOMMY READ
HI MOMMY
WANNA GRAPES
BYE DADDY
PLEASE MORE
DOWN JJ
DADDY GO DOWN
UP JJ
RUSS WALK
DADDY EAT COOKIE
DOOR CLOSED
MOMMY BARK
NO MORE
MOMMY CHIN
NO MORE APPLE JUICE
NO MOMMY WALK
ROGERS EAT ORANGE
WHERE'S IT
GOOD FOOD
MOMMY TALK
JJ COOK
READ BOOK
HOT FIRE
PLEASE MOMMY READ BOOK
DADDY EAT BIG CRACKER
DADDY GO UP
GO WALK
SARAH READ BOOK
JJ GO DOWN, DADDY
ERNIE GO BY CAR
WANNA IT
NICE RUSS

It is of interest that the program generates No Mommy walk. This is a generalization of the construction No more apple juice, which has the form negation followed by conceptualization. By the 3,000th utterance the simulation had merged more, hi, and bye into a single word class of prefix operators; walk, bark, talk, and cook into a class of postfix operators; and down and up into a third class. By the 4,000th utterance it had formed a class of transitive verbs. It is worth noting that the program's generations underestimate its grammatical competence. For instance, the program has agent-verb-object rules like Mommy turn off Sally or Daddy put back blocks. However, the rules it knows have not been sufficiently practiced to appear in the generations. This is one advantage of a computer simulation over a child. That is, one can peer below the surface performance to see what the system really knows. (It also doesn't spit up in restaurants.) It is interesting that the J.J. simulation has been run to the limit of the current implementation; there is just not any more memory in the SUMEX INTERLISP system the simulation is run on. Any further simulation will have to use a new system, and the implementation will have to be optimized for space efficiency. To do a realistic simulation requires storing a great many rules and information about many past sentences.

SUMMARY

This simulation establishes that the same learning mechanisms that gave superhuman learning in the previous simulation can result in much better approximations of early child language when these mechanisms are constrained by the limitations believed to constrain the child. It was able to reproduce J.J.'s early two- and three-word utterances. However, the system is still at a very early stage of development, and much remains to be determined.
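The mean-length-of-utterance measure reported for the simulation in Table 7.3 can be sketched as follows. The function and variable names are my assumptions; what is grounded in the text is that null generations are excluded, which is why the measure is bounded below at 1.0.

```python
# Sketch of the mean-length-of-utterance computation described for
# Table 7.3: trials where the program generated nothing are excluded,
# so every counted generation has at least one word (MLU >= 1.0).
def mean_length_of_utterance(generations):
    produced = [g for g in generations if g]  # drop null/empty generations
    if not produced:
        return None
    return sum(len(g.split()) for g in produced) / len(produced)

gens = ["more cheese", "", "hi mommy", None, "daddy go down"]
print(mean_length_of_utterance(gens))  # (2 + 2 + 3) / 3
```

Computed per block of 500 pairings over the program's generations, this would yield the right-hand column of the table.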
Adequacy

This chapter has reported two simulation efforts to assess the adequacy of the ACT learning mechanisms for natural language. The first simulation showed that the system was capable of acquiring a reasonable subset of natural language. The second showed that when constrained by reasonable information-processing limitations, it provides a good approximation to child language acquisition. These two demonstrations are in lieu of the impractical demonstration, that is, that the program can learn any natural language in a humanlike manner. Wexler and Culicover (1980) have attempted an approximation of this goal for a rather different learning system. They showed that a set of learning mechanisms could acquire any transformational grammar that satisfied a good number of constraints. Their formal proof of this did not say anything about whether the course of language acquisition would approximate child language acquisition. Nonetheless, if natural languages satisfy these constraints, then they at least have a sufficiency proof for their learning scheme. Coming at the problem from this point of view, they actually are more interested in the constraints than the learning scheme. Although some of the constraints seem purely technical for purposes of achieving a successful proof, others are important claims about the universal properties of natural language. While the Wexler and Culicover endeavor makes sense if one accepts their framework, it does not if one comes at the problem from the ACT point of view. If ACT is the right kind of model for language acquisition, there is no elegant characterization of what constitutes a natural language. A natural language is anything that the learning mechanisms can acquire, given meaning-sentence pairings.
This is unlikely to be a member of some pristine formal class satisfying just a small number of elegant formal constraints.18 Thus, while it is certainly worthwhile to identify the formal properties of languages that ACT can and cannot learn, a proof in the mold of Wexler and Culicover does not make much sense.

LINGUISTIC UNIVERSALS

Wexler and Culicover propose that their constraints on natural language are linguistic universals. Chomsky popularized the idea that the syntax of all natural languages satisfies certain universal constraints and that natural languages are learnable only because of these constraints. There have been numerous proposals for what might be universal features of natural languages. If it could be established that such universals exist, it is often thought that they would be evidence for a language-specific acquisition device. This is because the universal properties are only meaningful in the context of acquiring language and would not have relevance in other learning situations. An interesting question is whether the purported universals of natural language can be accounted for in the ACT scheme. This would provide another way of determining whether ACT is adequate to the properties of natural language. A number of general syntactic properties are often considered to be universals of natural languages. These include the facts that all languages have nouns and verbs, phrase structure, and transformations, and that transformations are cast with respect to phrase structure rather than word strings. All these features are also true of the languages learned by ACT. The fact that all languages have elements like noun and verb derives from the relation-argument structure of propositional units. The fact of phrase structures and transformations on these phrase structures derives from ACT's goal structure and planning structure. However, some rather specific constraints have been suggested on the types of transformations that might apply. One of the more discussed constraints started out with some observations by Chomsky (1973) of what he called the A-over-A constraint. For instance, sentence 1 below seems acceptable while sentence 2 is not:
1. Which woman did John meet who knows the senator?
2. *Which senator did John meet the woman who knows?

These would be derived transformationally from 3 and 4, respectively:

3. John did meet (which woman) who knows the senator.
4. John did meet the woman who knows (which senator).

The constraint appeared to be that one could not extract a noun phrase for wh-fronting that was itself embedded within a noun phrase. Chomsky proposed in general that any extraction transformation must apply to the highest constituent that satisfies the structural description of that transformation. It is worth noting that wh-fronting can extract a term many levels deep if there are not intervening relative clauses:

5. Which woman does Mary believe that Bill said that John likes?
but that the identical semantic content becomes unacceptable if a relative clause is inserted:

6. *Which woman does Mary believe the fact that Bill said that John likes?

Ross (1967) noted that a similar restriction appears to hold for movement of adjectives. Contrast

7. Stupid though Mary believes John said Fido is, everyone likes the dog.
8. *Stupid though Fido bit a man who was, everyone blames the dog.

Ross proposed the complex NP constraint (the term complex NP refers to noun phrases that contain relative clauses): no transformation can extract a constituent from a complex NP. Ross also noted that it is impossible to extract a constituent from a coordinate structure, as in

9. *Who does Mary like John and?

Therefore he proposed a coordinate-structure constraint that nothing can be moved out of a coordinate structure. In total, he enumerated a fair number of apparently distinct constraints on transformations in English. These constraints may derive from ambiguities in applying transformational specifications. Consider a production for wh-movement:

IF the goal is to question LVobject in (LVrelation LVagent LVobject)
   and this occurs as part of LVstructure
THEN set as subgoals
   1. to plan the communication of LVstructure
   2. to move the first morpheme in the main verb structure to the front
   3. to move the object after the main verb structure to the front.

There is an ambiguity in this transformation if more than one object follows a verb. Assuming the pattern matcher selects the highest object, this transformation would successfully generate
1 and 5 but not 2, 6, or 9. (A similar statement of the transformation for 7 and 8 would produce that phenomenon.) There is nothing language-specific about this analysis. Such ambiguities in specification come about in computer text editors, where one has the same problem of retrieving multiple elements that meet the same description. In various problem-solving situations, one is likely to find that an optimization transformation selects the first object that satisfies its description. Consider a plan to paint three objects, with the structure of the plan specified as:
1a. Fill bucket with enough paint for object A.
1b. Paint A.
2a. Fill bucket with enough paint for object B.
2b. Paint B.
3a. Fill bucket with enough paint for object C.
3b. Paint C.

If the bucket can hold enough paint for any two objects, a likely transformation of this plan is

1a'. Fill bucket with enough paint for A + B.
1b'. Paint A.
1c'. Paint B.
2a'. Fill bucket with enough paint for C.
2b'. Paint C.

The optimization transformation has applied to the first two objects in the sequence but not the last two. If this discussion is correct, this constraint is not a fundamental limitation on natural language. There is no reason why, if the learner received appropriate training, he could not learn a fuller description that could select the appropriate object. This would require more complex training data and make learning somewhat more difficult. However, it would just be a matter of degree. Thus, this analysis predicts that if English had constructions like 2, 6, and 9, it would still be learnable, if more difficult. There has been a considerable constellation of research around the A-over-A constraint and its related manifestations. It may well be that the exact idea offered here will not extend to all of this constellation. However, a general point about such "constraints on transformations" is that given finite bounds on pattern descriptions, it is always possible to come up with data structures sufficiently complex that the pattern descriptions will fail to extract a desired object. This will be true for linguistic pattern descriptions, other "human" pattern descriptions, and nonhuman pattern descriptions such as those one gives to a text editor. Any system that learns finite pattern descriptions is going to face ambiguities in their range of application. Therefore the mere observation that our adult system seems unable to extract certain patterns says nothing about the language specificity of that system nor of its acquisition. In order to show that the system is language-specific, it must be shown that in nonlinguistic systems the human pattern extractor functions differently. This has never been attempted. Similarly, the mere observation of such a limitation says nothing about whether there is a fundamental limitation on pattern extraction or whether the experience just has not been sufficiently complex to force sufficiently refined pattern specifications. Unless one can establish a fundamental limitation, the observation of constraints says little about whether or not the learning mechanism is language-specific.
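The behavior of a pattern matcher that simply retrieves the highest constituent satisfying a description can be illustrated with a small sketch. This is my own example, not the book's; the tree encoding and function names are assumptions.

```python
# Sketch of the ambiguity discussed above: a depth-first matcher over a
# parse tree returns the left-most, highest node satisfying a finite
# pattern description, so a match embedded inside that node can never
# be the one extracted (the A-over-A behavior).
def first_match(tree, predicate):
    """Depth-first search returning the highest/left-most matching node."""
    if predicate(tree):
        return tree
    for child in tree.get("children", []):
        found = first_match(child, predicate)
        if found is not None:
            return found
    return None

# Rough structure of "John did meet (which woman) who knows the senator":
# the NP "the senator" is embedded inside the NP "which woman".
sentence = {"cat": "S", "children": [
    {"cat": "NP", "word": "which woman", "children": [
        {"cat": "NP", "word": "the senator"}]}]}

is_np = lambda node: node["cat"] == "NP"
print(first_match(sentence, is_np)["word"])  # the higher NP, never "the senator"
```

A richer predicate (a "fuller description," in the text's terms) could pick out the embedded NP, which is exactly why the constraint looks like a fact about training and pattern refinement rather than a fundamental limitation.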
Notes
1. Production Systems and ACT

1. Relevant to this interpretation is the fact that children who suffer damage to their language areas are nonetheless capable of acquiring a language, presumably using other areas to store the linguistic programs (Lenneberg, 1967).
2. It might be argued that models that go down to the neural level (for example, Hinton and Anderson, 1981) are more precise. However, these models have been applied in such a small range of tasks that they are totally vague on the issue of control of cognition (see Chapter 4), in which the precision of production systems is most exact. This is not to say it would not be possible to construct a neural model more precise than production systems.
3. In Table 1.1 and elsewhere productions are presented in an English-like syntax for readability. In the actual computer implementations the syntax tends to be much more technical and much less comprehensible. Readers can obtain the actual production code by writing to me.
4. Unlike this version, earlier versions of ACT had a construct of a global variable whose value could be passed from one production to the next.
5. The typical formal definition of computationally universal is that the system is capable of mimicking a Turing Machine. The things computable by a Turing Machine have been shown to be identical to the things computable by a number of formalisms, including the modern computer (Minsky, 1967). A frequently accepted conjecture, known as Church's thesis, is that any behavior capable of being precisely specified will be in the class of things computable by a Turing Machine.
6. This follows from the fact that there are always different ways to specify the solution of the task, and each of these specifications can be implemented. However, the phenomenon of multiple ways to perform the same task is not unique to computationally universal
systems. For instance, this ambiguity occurs with finite-state machines.
7. Others may prove more successful at uncovering negative evidence because it is difficult for the theorist to recognize the weakest points of his theory. In the past the majority of negative evidence came from the research and theoretical analyses of others.
8. It should be noted that a limit on the size of cognitive units is quite different from a limit on the capacity of working memory. An interesting question is whether there is a connection between these two limits. ACT* does not contain in its assumptions an answer to this question.
9. Chunking assumptions do not really help. Many tasks require having simultaneously available a lot of information that cannot reasonably be assumed to be part of the same chunk. For instance, in parsing a sentence one needs to hold simultaneously information about a novel string of words, about the state in each level of parsing, about the semantics of words, about the speaker's knowledge and intentions, about references introduced earlier in the conversation, about analogous past experiences, and more.

2. Knowledge Representation

1. A similar set of codes was proposed by Wallach and Averbach (1955).
2. It should be noted that temporal strings are not tied to the verbal modality, nor spatial images to the visual modality. For instance, a temporal string can encode the place of people in a queue, and a spatial image can encode the position of sounds in space.
3. Burrows and Okada (1974) present evidence that a pause is treated like an item in a Sternberg task.
4. It is an entirely open question how to represent music, in which some interval properties do seem to be represented. One possibility is that blank time is represented by pauses as elements of the string (as it is explicitly coded on the sheet of music). It is also possible that music has its own code or is an instance of a motor-kinesthetic code.
5. This is not to deny that there are strong statistical correlations among objects in an image. For instance, it is improbable to have a fire hydrant in a scene of a room (Biederman, Glass, and Stacy, 1973). However, statistical probability is not the same as logical necessity; given the semantics of the relational terms, it is possible to have a fire hydrant in a room. On the other hand, it is not possible to have the proposition John decided the fire hydrant (except as in John decided to buy the fire hydrant, where the object is an embedded proposition, not fire hydrant).
6. As will be discussed with respect to tangled hierarchies, a unit may be an element of more than one larger structure. In this case there is an ambiguity in going up as to which larger structure to retrieve. Other features, such as other elements of the subject, must serve to select the correct structure.
7. Galambos and Rips (1974) document that a script involves temporal string structures organized hierarchically. That is, they show effects associated both with hierarchical structure (an advantage for information high in the hierarchy) and with temporal structure (distance effects for order judgments).

3. Spread of Activation
1. As this example illustrates, I do not intend to correlate active memory with what one is currently conscious of.
2. While the units in Figure 3.1 are all propositions, the same analysis would apply if the units were strings or images.
3. If the source node is an encoding of an external stimulus, its source activation reflects sensory stimulation. If it is the result of a production execution, its source activation is a result of activation pumped into the network from the production. If it is linked to the goal, its activation comes from the goal.
4. This assumes that there is a set of activation sources that stay constant for a long period of time relative to the decay rate and the delay of transmission. In many situations this can be a useful idealization.
5. In contrast to productions like P1 that exist before the experiment, it is assumed that productions P2-P6 in Table 3.2 are created specifically to perform in the experiment and that they are compiled (see Chapter 6) during the first few trials.
6. This implies that subjects go through a stage of processing in which a nonword is identified with a word. There is no need to assume that subjects are conscious of this at the level of being able to make a verbal report, but I can report that in piloting such experiments I am aware of identifying a nonword with a similar word on the large majority of the trials.
7. It should be noted that this analysis predicts particularly large Stroop interference when the word itself is a color name. In this case the response code for the word is being primed by the task and the color to be named. Also, new partial instantiations of productions will compete with the correct one.
8. For simplification the first goal clause is ignored since it is constant across the productions.
9. I thank Robert Frederking for pointing out to me the analysis of foil judgments reported here.
11. There are, of course, limits on this prediction, related to limits on the number of items that can be maintained as sources of activation.
12. It is important to note that these judgments cannot be done by deciding if the probe belongs to a category. That is, plane, mountain, crash, clouds, and wind do not belong to any category that would exclude the foils.
13. For instance, no node in the unanalyzed structure for X can be connected to node C.
14. If the node had any connections to nodes of level lower than i - 1, its minimum link distance from node X would not be i. The node cannot have connections to nodes of level higher than i + 1 because they are only one link removed from a level-i node.
4. Control of Cognition

1. One example of the distinction was developed in the preceding chapter in the distinction between conscious and automatic priming in the lexical decision task. We saw that automatic priming occurred when information was made especially active in the network, and conscious priming occurred when the subject set his goal to allow a potentially more efficient production to apply. In general, controlled or attentional processing corresponds in ACT to setting goals to favor a special mode of processing.
2. In the last chapter I argued that the momentary capacity of working memory is high. However, opportunistic planning requires a large sustained capacity.
3. It appears that students get around these difficulties by building larger patterns that encompass both uses of the data. Thus, rather than seeing two triangle patterns sharing the same segment, they see one "adjacent triangles" pattern that involves that single shared segment. Associated with the pattern is the information that the one segment can be used with the reflexive rule of congruence.
4. Later in the chapter I discuss how goal structures have special mechanisms to deal with data refractoriness.
5. As will be discussed later, "goal" in ACT refers to immediate goals of current behavior, such as generating a sentence. Only one such goal can be active. This is not to deny that a person may have many "high-level" goals like "making money." Such goals are better called policies because they do not have direct control over current behavior. Again, under this analysis hunger, fear, attraction, and the like are not goals but states that can lead to the setting of goals.
6. It should be noted that this seriality is a property of productions that involve the goal element only. Productions that do not involve the goal element match disjoint sets of data and can apply in parallel. A particularly important class of such productions consists of those for automatic pattern matching. Thus multiple letters on a page can be matched simultaneously.
7. Many people's introspective reports do not agree with this particular simulation. Some people fail to see either (a) or (b) as a word. A number of people have reported seeing (b) as FOUL, which is not represented in Figure 4.5. Such examples are only meant to illustrate how the word-superiority effect is obtained. Until we actually implement a reasonable approximation to a human lexicon, the perceptions of our simulation are unlikely to correspond exactly to human perceptions.
8. For instance, rather than simply executing a pop, other production languages might have many special-case production rules like:

IF the current goal is to be POPped
   and the current goal has a supergoal
   and the supergoal has another subgoal
   and that subgoal is ordered after the current goal
THEN make that subgoal the current goal.

IF the current goal is to be POPped
   and the current goal has a supergoal
   and the supergoal has no subgoals after the current goal
THEN set the supergoal as the current goal
   and this supergoal is to be POPped.

9. Wexler and Culicover (1980) note that linguistic transformations tend to apply only to two levels of a linguistic phrase structure at a time. Perhaps the reason for this restriction is that this is all the goal structure that can be kept active in working memory at one time.

5. Memory for Facts

1. Although Woodward et al. found that presentation time had no effect on probability of recall, they found that it did have an effect on recognition time. Crowder (1976) and Glenberg, Smith, and Green (1977) have proposed that this is because of the formation of contextual tags. One might argue that context changes with time and that this creates new units encoding the associations between the item and the new contexts. Thus time does not increase the probability of encoding a contextual unit but rather increases the probability of forming a new contextual unit.
2. It is unlikely that this is the sole explanation of the spacing effect, particularly at long intervals (Glenberg, 1974; Bahrick, 1979). It is improbable that the spacing effect has a single explanation. Other factors probably include change in context (Glenberg) and some general fatiguing of the capacity of nodes to form and strengthen associations (see the section on practice later in this chapter).
3. This is not to deny that there are other important components to some mnemonic techniques, such as chunking multiple units into a single unit or imposing an orderly retrieval structure on recall.
4. Past versions of the generate-test model assumed that the context word made no contribution to the retrieval of the trace except in terms of sense selection. However, in this model the context word can be just as important a source of activation as the target word. In fact, some results suggest it might be a more important source (Bartling and Thompson, 1977; Rabinowitz, Mandler, and Barsalou, 1977). To the extent that the context word is the more important
Nofesto Pages795-261
Nofes to Pages266-299
source, encoding-specificity results should occur even with those words that truly nuu" a single sense(Tulving and Watkins, \977')' 5. underwood and Humphreys (1977) have basically replicated the results of Light and Carter-sobell, but they argue that the magniIt tude of the results does not iustify the multiple-senseinterpretation' effect' the of magnitude the about predictions clear make to hard is 6. [t is interesting to ask to what extent the difference in activaention patterns set ,rp i. ACT instantiates what Tulving means by coding specificitY.
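The two goal-popping productions quoted earlier in these notes describe a simple control regime: when a goal is POPped, control passes to the next ordered subgoal of its supergoal if one exists; otherwise the supergoal itself becomes current and is popped in turn. The following Python sketch (the names `Goal` and `pop` are illustrative, not ACT's actual implementation) renders their combined effect:

```python
# Hypothetical sketch of the two goal-popping productions: on a POP,
# control passes to the next ordered subgoal of the supergoal if one
# exists (first production); otherwise the supergoal becomes current
# and is itself popped (second production, modeled here by recursion).

class Goal:
    def __init__(self, name, subgoals=None):
        self.name = name
        self.subgoals = subgoals or []   # ordered left to right
        for sg in self.subgoals:
            sg.supergoal = self
        self.supergoal = None

def pop(current):
    """Return the goal that becomes current after POPping `current`."""
    parent = current.supergoal
    if parent is None:
        return None                      # top-level goal: nothing above it
    siblings = parent.subgoals
    i = siblings.index(current)
    if i + 1 < len(siblings):            # first production applies
        return siblings[i + 1]
    return pop(parent)                   # second production: POP supergoal

# Example: a supergoal A with two ordered subgoals B and C
b = Goal("B"); c = Goal("C")
a = Goal("A", [b, c])
print(pop(b).name)   # -> C    (control moves to the next subgoal)
print(pop(c))        # -> None (no later subgoal; A is popped as well)
```

The recursion in the last line compresses the second production's repeated firing: each exhausted supergoal is popped until a later subgoal, or the top of the goal structure, is reached.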
6. Procedural Learning
1. This assertion does not imply that composition can only apply to productions that involve goals. However, goal structures provide one important means for determining which productions belong together and for producing certain optimizations in the products of composition.
2. Each time an attempt is made to recreate an existing production (for instance, through composition), the existing production gains an increment of strength rather than a copy being created.
3. No claim is made for the appropriateness of these rules for actually playing bridge.
4. These terms are slight misnomers. A condition discrimination just changes the condition of the incorrect production. However, an action discrimination makes a change in both condition and action.
5. Recall, however, from Chapter 4, that an exception to a strong general production can still take precedence because of a specificity advantage. This requires that the exception have a modest amount of strength. This has some interesting implications for exceptions to inflectional rules for words (for example, the plural of man is men). For an exception production to take precedence over the regular rule, the word must occur with at least moderate frequency. The exceptions to general inflectional rules do appear to occur for more frequent words.
6. Data refractoriness (see Chapter 4) will prevent the selection of January or September a second time.

7. Language Acquisition
1. It is somewhat unfortunate that the term "generation" is used throughout this chapter, since it has a different technical meaning in linguistics. However, the other obvious choice, "production," is even more contaminated in the current context.
2. The earlier set of ideas about declarative knowledge representation and spreading activation defined on that knowledge representation are no doubt also applicable to language processing, as indeed the priming studies indicate (Swinney, 1979). However, these mechanisms are more important to the semantics of language than to the syntax.
3. Recently, it has been argued in linguistics (for example, Bresnan, 1981) that there are no transformations. While earlier linguistic theories may have overemphasized their use, it seems improbable that there are no transformations. For instance, constructions like respectively and vice versa remain for me consciously transformational.
4. Later I will discuss what to do about verb phrases that do not have auxiliaries for pre-posing.
5. As discussed later in the chapter, content words in ACT are those that have an associated concept.
6. Of course, at certain points in the processing of content words, implicit, production-based knowledge comes in. Thus it is very reasonable to propose that our knowledge of what a dog looks like is embedded in dog-recognition productions. However, in generating the word dog, the speaker calls on a declarative link between the lexical item and its meaning representation.
7. Some work has been done with Latin and French; see Anderson (1981c).
8. A description of this production system will be provided upon request.
9. It would also be possible to form analogous comprehension rules from this experience. Thus, the acquisition of comprehension and generation rules need not be independent, even if the rules are.
10. See the discussion in Chapter 6 of analogy as a basis for using declarative rules.
11. Asterisks are being used to denote the concepts corresponding to content words.
12. Each of these rules is specific to a particular concept. For instance, (1) is specific to definite and (2) is specific to boy.
13. Unlike the weakening principles set forth in the last chapter, which involved a multiplicative change, this simulation simply subtracts one unit of strength. The success of the current simulation illustrates the arbitrariness of the exact strengthening principles.
14. If MacWhinney is right in this debate and there are distinguishing semantic features, the ACT learning mechanisms described in Chapter 5 will apply without this modification to incorporate arbitrary classes.
15. As it turns out, ACT learns the passive as a separate phrase-structure rule rather than as a transformation. Unlike the question transformations to be discussed, the passive does not violate the graph-deformation condition.
16. How one should measure the length of these utterances is an interesting question. We are fairly certain that I need, coming back, and in a minute are fixed phrases for J.J.
17. It is of some interest to consider J.J.'s use of the fourteen morphemes discussed by Brown (1973). He used the ing inflection to describe actions, but he most definitely did not have the present progressive auxiliary (is, are, and so on). He used in and on quite successfully (thank you, Sesame Street). He occasionally used articles, plural inflections, and possessives. Except for ing, all verb auxiliaries and inflections were missing.
18. However, it is not the case that any language is learnable in an ACT-like framework. Given an input, ACT will entertain certain hypotheses and not others about the structure of the input. It is logically impossible for any inductive system to identify all possible languages, given finite input. ACT is no exception. It is an open question whether ACT's preferences about language hypotheses always correspond to human choices. The chapter has shown that they do in the case of the two subsets of English.
References
Abelson, R. P. 1981. Psychological status of the script concept. American Psychologist 36, 715-729.
Anderson, J. A. 1973. A theory for the recognition of items from short memorized lists. Psychological Review 80, 417-438.
Anderson, J. A., and Hinton, G. E. 1981. Models of information processing in the brain. In G. E. Hinton and J. A. Anderson, eds., Parallel Models of Associative Memory. Hillsdale, N.J.: Erlbaum Associates.
Anderson, J. R. 1972. FRAN: A simulation model of free recall. In G. H. Bower, ed., The Psychology of Learning and Motivation, vol. 5. New York: Academic Press.
1974. Retrieval of propositional information from long-term memory. Cognitive Psychology 6, 451-474.
1975. Computer simulation of a language acquisition system: a first report. In R. L. Solso, ed., Information Processing and Cognition: The Loyola Symposium. Hillsdale, N.J.: Erlbaum Associates.
1976. Language, Memory, and Thought. Hillsdale, N.J.: Erlbaum Associates.
1977. Induction of augmented transition networks. Cognitive Science 1, 125-157.
1978. Arguments concerning representations for mental imagery. Psychological Review 85, 249-277.
1979. Further arguments concerning representations for mental imagery: a response to Hayes-Roth and Pylyshyn. Psychological Review 86, 395-406.
1980a. Concepts, propositions, and schemata: what are the cognitive units? Nebraska Symposium on Motivation 28, 121-162.
1980b. Cognitive Psychology and Its Implications. San Francisco: Freeman.
1981a. Tuning of search of the problem space for geometry proofs. Proceedings of the Seventh International Joint Conference on Artificial Intelligence.
1981b. Effects of prior knowledge on memory for new information. Memory and Cognition 9, 237-246.
1981c. A theory of language acquisition based on general learning mechanisms. Proceedings of the Seventh International Joint Conference on Artificial Intelligence.
1981d. Acquisition of cognitive skill. ONR Technical Report 81-1.
1981e. Interference: The relationship between response latency and response accuracy. Journal of Experimental Psychology: Human Learning and Memory 7, 311-325.
1982a. A proposal for the evolution of the human cognitive architecture. Unpublished manuscript, Carnegie-Mellon University.
1982b. Acquisition of cognitive skill. Psychological Review 89, 369-406.
1982c. Acquisition of proof skills in geometry. In J. G. Carbonell, R. Michalski, and T. Mitchell, eds., Machine Learning: An Artificial Intelligence Approach. San Francisco: Tioga Press.
1982d. Representational types: a tricode proposal. Technical Report #ONR-82-1, Carnegie-Mellon University.
Anderson, J. R., and Bower, G. H. 1972a. Recognition and retrieval processes in free recall. Psychological Review 79, 97-123.
1972b. Configural properties in sentence memory. Journal of Verbal Learning and Verbal Behavior 11, 594-605.
1973. Human Associative Memory. Washington: Winston and Sons.
1974a. A propositional theory of recognition memory. Memory and Cognition 2, 406-412.
1974b. Interference in memory for multiple contexts. Memory and Cognition 2, 509-514.
Anderson, J. R., Farrell, R., and Sauers, R. 1982. Learning to plan in LISP. Technical Report #ONR-82-2, Carnegie-Mellon University.
Anderson, J. R., Greeno, J. G., Kline, P. J., and Neves, D. M. 1981. Acquisition of problem-solving skill. In J. R. Anderson, ed., Cognitive Skills and Their Acquisition. Hillsdale, N.J.: Erlbaum Associates.
Anderson, J. R., Kline, P. J., and Beasley, C. M. 1977. A theory of the acquisition of cognitive skills. ONR Technical Report 77-1, Yale University.
1979. A general learning theory and its application to schema abstraction. In G. H. Bower, ed., The Psychology of Learning and Motivation, vol. 13, 277-318. New York: Academic Press.
1980. Complex learning processes. In R. E. Snow, P. A. Federico, and W. E. Montague, eds., Aptitude, Learning, and Instruction, vol. 2. Hillsdale, N.J.: Erlbaum Associates.
Anderson,