THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory
VOLUME 16
CONTRIBUTORS TO THIS VOLUME
Wil...
54 downloads
3996 Views
15MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory
VOLUME 16
CONTRIBUTORS TO THIS VOLUME
William G . Chase K . Anders Ericsson Arthur C . Graesser Alice E Healy Werner K . Honig Barbee T. Mynatt Glenn V. Nakamura Zehra E Peynircioglu Kirk H . Smith Roger K . R . Thompson Michael J . Watkins
THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory
EDITEDBY GORDON H. BOWER STANFORD UNIVERSITY, STANFORD, CALIFORNIA
Volume 16 1982
ACADEMIC PRESS A Subsidiary of Harcourt Brace Jovanovich, Publishers
New York London Paris 0 San Diego 0 San Francisco 0SBo Paulo
Sydney
Tokyo 0 Toronto
COPYRIGHT @ 1982, BY ACADEMIC PRESS,INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY F OR M OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING F R OM T H E PUBLISHER.
ACADEMIC PRESS, INC.
1 1 1 Fifth Avenue. New York, New York 10003
Uriired Kirigdoni Edition ptrblislred by ACADEMIC PRESS, INC. ( L O N D O N ) LTD. 24/28Oval Road, London N W l 7 D X
LIBRARY OF
CONGRESS CATALOG CARD N U M B E R :
ISBN 0- 12-5433 16-6 PRINTED IN THE UNITED STATES OF AMERICA
82838485
9 8 7 6 5 4 3 2 1
66-30104
CONTENTS
Contributors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
ix
Contents of Previous Volumes ................................................
xi
SKILL AND WORKING MEMORY William G. Chase and K . Anders Ericsson
111. A Theory of Skilled Memory ......... IV. Further Studies of Skilled Me V. Conclusion ....................................................
24
THE IMPACT OF A SCHEMA ON COMPREHENSION AND MEMORY Arthur C . Graesser and Glenn V . Nakamura I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11. Methods ...................................... 111. A Schema Copy Plus Tag Model IV. Some Issues Confronting the SC
..........
60 66 71 79 93 97 103
V. The Fate of Four Alternative Mo ............................... VI. The Process of Copying Schema o Specific Memory Traces . . . . . . VII. Questions for Further Research .................................... References .................................... 105 V
vi
Contents
CONSTRUCTION AND REPRESENTATION OF ORDERINGS IN MEMORY Kirk H . Smith and Barbee i? Mynatf
I. 11. 111. IV. V. VI. VII. VIII. IX.
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Review of Previous Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ................. Overview of the Experiments . . Experiment 1: Retrieval from Pa Experiment 2: The Role of Determinacy in Constructing Partia Experiment 3: Node Construction .................... Experiment 4: Diverging and Conv Experiment 5 : The Role of the Schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary ..................................... . .. . . . .. .... References . . . . . . . . . . . . . . . . . . . . . . . . . .
111 1I4 121 122 127 134 140 145 149 150
A PERSPECTIVE ON REHEARSAL Michael J . Watkins and Zehra E Peynircioglu
I. 11. 111. IV. V.
Overview ..... . . ... ... ... ... ...... . . . . . . . . . . . . . . . . . . . . . ... . . . . . . . . . . . 153 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 Meaning of Rehearsal . . Rehearsal for Free Recall Reconsidered . . . ... . . . . . . 158 Rehearsal of Nonverbal Stimuli . . . . . . . . . . . . . . . . . . . . . . . . . . . Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 References
SHORT-TERM MEMORY FOR ORDER INFORMATION Alice E Healy
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 11. Experiment 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202 111. IV. . . . . . . . . . . . . . . . . . 227 V. VI. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . .
Contents
vii
RETROSPECTIVE AND PROSPECTIVE PROCESSING IN ANIMAL WORKING MEMORY Werner K . Honig and Roger K . R . Thompson 1. Introduction: Retrospective and Prospective Remembering . . . . . . . . . . . . . . . . 11. Representations of Initial and Test Stimuli . ....................... 111. Differentiation of Trial Outcomes . . . . . . . . . .......................
239 242 250
IV. Comparisons among Working Memory Paradigms . . V. Memory for Multiple Items ...................... VI. Discrimination and Memory of Stimulus Sequences
. . . . . . . . . . 212 .......................... 211 References .............................................
Index . . . . . . . . .
.......................................
284
This Page Intentionally Left Blank
CONTRIBUTORS Numbers in parentheses indicate the pages on which the authors' contributions begin.
William G. Chase, Department of Psychology, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213 (1)
K. Anders Ericsson, Department of Psychology, University of Colorado, Boulder, Colorado 80309 (1) Arthur C. Graesser, Department of Psychology, California State University, Fullerton, California 92634 (59) Alice F. Healy, Department of Psychology, University of Colorado, Boulder, Colorado 80309 (191) Werner K. Honig, Department of Psychology, Dalhousie University, Halifax, Nova Scotia B3H 324, Canada (239) Barbee T. Mynatt,' Department of Psychology, Bowling Green State University, Bowling Green, Ohio 43403 (111) Glenn V. Nakamura, Department of Psychology, California State University, Fullerton, California 92634 (59) Zehra F. Peynircioglu, Department of Psychology, Rice University, Houston, Texas 77251 (153) Kirk H. Smith, Department of Psychology, Bowling Green State University, Bowling Green, Ohio 43403 (111) Roger K. R. Thompson, Department of Psychology, Franklin and Marshall College, Lancaster, Pennsylvania 17604 (239) Michael J. Watkins, Department of Psychology, Rice University, Houston, Texas 77251 (153) 'Present address: Computer Science Department, Bowling Green State University, Bowling Green, Ohio 43403.
ix
This Page Intentionally Left Blank
CONTENTS OF PREVIOUS VOLUMES Volume 1
Volume 3
Partial Reinforcement on Vigor and Persistence Abram Amsel A Sequential Hypothesis of Instrumental Learning E. J. Capaldi Satiation and Curiosity Harry Fowler A hlulticornponent Theory of the Memory Trace Gordon Bower Organization and Memory George Mandler Author Index-Subject Index
Stimulus Selection and a “Modified Continuity Theory ” Allan R. Wagner Abstraction and the Process of Recognition Michael I. Posner Neo-Noncontinuity Theory Marvin Levine Computer Stimulation of Short-Term Memory: A Component-Decay Model Kenneth R. Laughery Replication Process in Human Memory and Learning Harley A. Bernbach Experimental Analysis of Learning to Learn Leo Postman Short-Term Memory in Binary Prediction by Children: Some Stochastic Information Processing Models Richard S. Bogartz Author Index-Subject Index
Volume 2 Incentive Theory and Changes in Reward Frank A. Logan Shift in Activity and the Concept of Persisting Tendency David Birch Human Memory: A Proposed System and Its Control Processes R. C. Atkinson and R. M. Shiffrin Mediation and Conceptual Behavior Howard K. Kendler and Tracy S. Kendler Author Index-Subject Index
Volume 4 Learned Associations over Long Delays Sam Revusky and John Garcia On the Theory of Interresponse-Time Reinforcement xi
xii
Contents of Previous Volumes
G. S. Reynolds and Alastair McLeod Sequential Choice Behavior Jerome L. Meyers T h e Role of Chunking and Organization in the Process of Recall Neal F. Johnson Organization of Serial Pattern Learning Frank Restle and Eric Brown Author Index-Subject Index
Volume 5 Conditioning and a Decision Theory of Response Evocation G. Robert Grice Short-Term Memory Bennet B. Murdock, Jr. Storage Mechanisms in Recall Murray Glanzer By-products of Discriminative Learning H. S. Terrace Serial Learning and Dimensional Organization Sheldon M. Ebenholtz FRAN: A Simulation Model of Free Recall John Robert Anderson Author IndexSubject Index
Volume 6 Informational Variables in Pavlovian Conditioning Robert A. Rescorla T h e Operant Conditioning of Central Nervous System Electrical Activity A. H. Black T h e Avoidance Learning Problem Robert C. Bolles Mechanismsof Directed Forgetting William Epstein Toward a Theory of Redintegrative Memory: Adjective-Noun Phrases Leonard M. Horowitz and Leon Manelis
Elaborative Strategies in Verbal Learning and Memory William E. Montague Author Index-Subject Index
Volume 7 Grammatical Word Classes: A Learning Process and Its Simulation George R. Kiss Reaction Time Measurements in the Study of Memory Processes: Theory and Data John Theios Individual Differences in Cognition: A New Approach to Intelligence Earl Hunt, Nancy Frost, and Clifford Lunneborg Stimulus Encoding Processes in Human Learning and Memory Henry C. Ellis Subproblem Analysis of Discrimination Learning Thomas Tighe Delayed Matching and Short-Term Memory in Monkeys M. R. D’Amato Percentile Reinforcement: Paradigms for Experimental Analysis of Response Shaping John R. Platt Prolonged Rewarding Brain Stimulation J. A. Deutsch Patterned Reinforcement Stewart H. Hulse Author Index-Subject Index
Volume 8 Semantic Memory and Psychological Semantics Edward E. Smith, Lance]. Rips, and Edward J. Shoben Working Memory Alan D. Baddeley and Graham Hitch T h e Role of Adaptation Level in Stimulus Generalization David R. Thomas
Contents of Previous Volumes
Recent Developments in Choice Edmund Fantino and Douglas Navarick Reinforcing Properties of Escape from Frustration Aroused in Various Learning Situations Helen B. Daly Conceptual and Neurobiological Issues in Studies of Treatments Affecting Memory Storage James L. McGaugh and Paul E. Gold The Logic of Memory Representations Endel Tulving and Gordon H. Bower Subject Index
...
XI11
Toward a Framework for Understanding Learning John D. Bransford and Jeffrey J. Franks Economic Demand Theory and Psychological Studies of Choice. Howard Rachlin, Leonard Green, John H. Kagel, and Raymond C. Battalio Self-punitive Behavior K. Edward Renner and Jeanne B. Tinsley Reward Variables in Instrumental Conditioning: A Theory Roger W. Black Subject Index
Volume 9 Prose Processing Lawrence T . Frase Analysis and Synthesis of Tutorial Dialogues AllanCollins, Eleanor H. Warnock, andJosephJ. Passafiume On Asking People Questions about What They Are Reading Richard C;. Anderson and W. Barry Biddle The Analysis of Sentence Production M. F. Garrett Coding Distinctions and Repetition Effects in Memory Allan Paivio Pavlovian Conditioning and Directed Movement Eliot Hearst A Theory of Context in Discrimination Learning Douglas L. Medin Subject Index
Volume 11 Levelsof Encodingand Retention of Prose D.JamesDoolingand Robert E. Christiaansen Mind Your p’s and q’s: T h e Role of Content and Context in Some Uses of And, Or, and If Samuel Fillenbaum Encoding and Processing of Symbolic Information in Comparative Judgments William P. Banks Memory for Problem Solutions Stephen K. Reed and Jeffrey A. Johnson Hybrid Theory of Classical Conditioning Frank A. Logan Internal Constructions of Spatial Patterns Lloyd R. Peterson, Leslie Rawlings, and Carolyn Cohen Attention and Preattention Howard Egeth Subject Index
Volume 10 Some Functions of Memory in Probability Learning and Choice Behavior W. K. Estes Repetition and Memory Douglas L. Hintzman
Volume 12 Experimental Analysis of Imprinting and Its Behavioral Effects Howard S. Hoffman Memory, Temporal Discrimination, and
xiv
Contents of Previous Volumes
Learned Structure in Behavior Charles P. Shimp The Relation between Stimulus Analyzability and Perceived Dimensional Structure Barbara Burns, Bryan E. Shepp, Dorothy McDonough, and Willa K . Wiener-Ehrlich Mental Comparison Robert S. Moyer and Susan T. Dumais Th e Simultaneous Acquisition of Multiple Memories Benton J. Underwood and Robert A. Malmi Th e Updating of Human Memory Robert A. Bjork Subject Index
Immediate Memory and Discourse Processing Robert J. Jarvella Subject Index
Volume 14
A Molar Equilibrium Theory of Learned Performance William Timberlake Fish as a Natural Category for People and Pigeons R. J. Herrnstein and Peter A. de Villiers Freedom of Choice: A Behavioral Analysis A. Charles Catania A Sketch of an Ecological Metatheory for Theories of Learning Volume 13 Timothy D. Johnston and M. T. Turvey SAM: A Theory of Probabilistic Search of Pavlovian Conditioning and the Mediation Associative Memory of Behavior Jeroen G. W. Raaijmakers and J. Bruce Overmier and Janice A. Richard M. Shiffrin Lawry Memory-Based Rehearsal A Conditioned Opponent Theory of Ronald E. Johnson Pavlovian Conditioning and Habituation Individual Differences in Free Recall: Jonathan Schull When Some People Remember Better Memory Storage Factors Leading to Than Others Infantile Amnesia Marcia Ozier Norman E. Spear Index Learned Helplessness: All of Us Were Right (and Wrong): Inescapable Shock Has Multiple Effects Steven F, Maier and Raymond L. Volume 15 Jackson On the Cognitive Component of Learned Helplessness and Depression Conditioned Attention Theory Lauren B. Alloy and Martin E. I? R. E. Lubow, I. Weiner, and Paul Seligman Schnur A General Learning Theory and Its A Classification and Analysis of ShortApplication to Schema Abstraction Term Retention Codes in Pigeons John R. Anderson, Paul J. Kline, and Donald A. Riley, Robert G. Cook, and Charles M. Beasley, Jr. Marvin R. Lamb Similarity and Order in Memory Inferences in Information Processing Robert G. Crowder Richard J. H am s Stimulus Classification: Partitioning Many Are Called but Few Are Chosen: Strategies and Use of Evidence The Influence of Context on the Effects Patrick Rabbitt
Contents of Previous Volumes of Category Size Douglas L. Nelson Frequency, Orthographic Regularity, and Lexical Status in Letter and Word Perception Dominic W. Massaro, James E. Jastrzembski, and Peter A. Lucas
xv
Self and Memory Anthony G. Greenwald Children’s Knowledge of Events: A Causal Analysis of Story Structure Tom Trabasso, Nancy L. Stein, and Lucie R. Johnson Index
This Page Intentionally Left Blank
SKILL AND WORKING MEMORY William G. Chase CARNEGIE-MELLON UNIVERSITY PITTSBURGH, PENNSYLVANIA
K . Anders Ericsson UNIVERSITY OF COLORADO BOULDER, COLORADO
I. The Skilled Memory Effect.. . . . . . . . . . . .................... A. Short-Tern Memory Capacity.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B. Chess and Other Game Skills . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C. Nongame Skills . . . . . . . . . . . . . . . . . . . . . . . . ........... 11. Analysis of a Memory-Span Expert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. The Effects of Practice on Digit Span ................... B. Mechanisms of Skilled Memory.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111. A Theory of Skilled Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. The Structure of Long-Term Memory ................... B. Short-Term Memory and Attention.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C. Memory Operations.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D. Interference. . . . . . . . . . . . . . . . . . . . . . .............. E. Working Memory. . . . . . . . . . . . . . . . . ....... IV. Further Studies of Skilled Memory.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. Analysis of a Mental Calculation Expert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B. The Memory of a Waiter.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C. Sentence Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . V. Conclusion ................................................ References .........................................
2 2 2 5 I 7 8 24 24 28 28 36 40 42 43 49 51 55 56
Why is memory so much better for skilled people in their domain of expertise? Our interest in this problem first began 3 years ago, when we started training a subject on the digit-span task. Over the course of 2 years of practice, our subject was able to increase his digit span from 7 digits to over 80 digits, and our analysis of this subject led us to our interest in memory performance of skilled individuals. In this article, we shall first review the literature on skilled memory, then we shall describe our analysis of skilled memory in the digit-span task, and finally we shall discuss THE PSYCHOLOGY OF LEARNING AND MOTIVATION, VOL. 16
1
Copyright 0 1982 by Academic Presr. Inc All rights 01 reproduction in any form reserved ISBN 0-12-543316-6
2
William G . Chase and K. Anders Ericsson
our latest work with a mental calculation expert, a waiter who memorizes food orders, and we shall discuss extensions of our work with normal subjects.
I. A.
The Skilled Memory Effect
SHORT-TERM MEMORYCAPACITY
The capacity of short-term memory has long been accepted as one of the most fundamental limits in people’s ability to think, solve problems, and process information in general (Miller, 1956; Newell & Simon, 1972). The memory span (about 7 unrelated symbols) is the most accepted measure of short-term memory capacity (Miller, 1956), and this severe limit on readily accessible symbols is commonly taken as a fundamental limit on the working memory capacity of the human information-processing system (Baddeley, 1976; Klatzky, 1980). This is, recent events attended to in the environment, knowledge states activated from long-term memory, and intermediate computations necessary for performing complex information-processing tasks are assumed to be held in short-term memory for immediate access. Working memory is equated with shortterm memory, and it is this severe constraint on the number of readily accessible symbols that limits our information-processing capacity. Memory span has even been taken by some people as a fundamental measure of intelligence (Bachelder & Denny, 1977a,b). The superior memory performance by experts in their area of expertise seems to fly in the face of these basic limits. B.
CHESSAND OTHERGAMESKILLS
The skilled memory effect has been in the literature for some time. de Groot ( 1966) discovered that chess masters have virtually perfect recall of a chess board after viewing it for only a few seconds (5-10 sec), whereas novices can recall only three or four pieces (Chase & Simon, 1973a). Chase and Simon (1973a) showed that this memory is specific to the master’s knowledge domain by presenting chess players with randomized chess positions and finding that recall was uniformly poor for all players, regardless of their skill level. In addition to the master’s superior memory for chess positions, Chase and Simon (1973b) also found that the master has greatly superior memory for sequences of moves. According to Chase and Simon (1973b), this memory performance is
Skill and Working Memory
3
the result of a vast knowledge base that the master has acquired through years of practice. This knowledge includes procedures for generating moves, stereotyped sequences of moves, and stereotyped patterns of pieces. In order to explain the master’s superior memory for positions, Chase and Simon suggested that the master recognizes familiar patterns that he sees often in his study and play, whereas the novice is able to notice only rudimentary relations in the limited time allowed in the chess memory task. When Chase and Simon (1973a) measured memory performance in terms of patterns rather than individual pieces, master and novice memory performance were much more similar, and the absolute magnitude of memory performance was closer to seven. They concluded that the limit in performance in the chess memory task is due to the limited capacity of short-term memory. The master holds retrieval cues in short-term memory for seven patterns, located in long-term memory; and at recall, these cues are used to retrieve each pattern, one at a time from long-term memory. The novice, on the other hand, must utilize all of his short-term memory capacity to store the identity, color, and location of three or four individual chess pieces. There was one discrepant finding in the Chase and Simon (1973a) study which, in retrospect, seems critical to our analysis of skilled memory. They found that even when the master’s memory performance was scored in terms of patterns recalled, and the sophisticated guessing strategies of the master were discounted, the master’s recall still often exceeded the accepted limits of short-term memory capacity (7 2). In short, the master’s recall of patterns even exceeded the capacity of short-term memory, and Chase and Simon (1973b) were unable to fully explain this phenomenon. Charness (1976) later demonstrated that these chess patterns (i.e., their retrieval cues) are not retained in short-term memory because they are not susceptible to, interference effects in short-term memory. Later we shall try to show that this result is perfectly compatible with our new conception of working memory. This skilled memory effect has been replicated many times (Charness, 1976; Chi, 1978; Ellis, 1973; Frey & Adesman, 1976; Goldin, 1978, 1979; Lane & Robertson, 1979), and the same effect has been found with expert players in the games of go, gomoku, and bridge. Reitman (1976) studied a professional-level go player whose perceptual memory for go patterns closely paralleled that of chess masters for chess positions. In another study, Eisenstadt and Kareev (1975) compared recall of go and gomoku patterns. They took advantage of the fact that go and gomoku are played on the same 19 X 19 board with the same black and white stones, but the objects of the’ games are different and the types of patterns are
*
4
William G. Chase and K. Anders Ericsson
different. In go, the object of the game is to surround the opponent's stones, whereas in gomoku, the object is to place 5 items (stones) in a row. They trained subjects to play both games, and then, in one experiment, they asked subjects to recall a go position and a gomoku position. In fact, subjects were shown the same pattern, except that it had been rotated 90" and the color of the pieces had been reversed so that subjects were unaware of the structural identity of the positions. The interesting finding of this study was that when subjects thought they were recalling a go position, their recall of go patterns (i.e. , stones crucial to the analysis of the position as a go game) was far superior to their recall of gomoku patterns (by a factor of almost 2 to l ) , and when subjects thought they were recalling a gomoku position, their recall favored the gomoku patterns by almost a 2-to-1 margin. Rayner (1958), in an interesting training study, was able to trace the development of gomoku patterns with practice. By studying a group of people over a 5-week period as they acquired skill in the game of gomoku, Rayner (1958) was able to describe the types of patterns that players gradually learned to look for, and the associated strategies for each pattern. The patterns are quite simple; the difficulty in learning them arises from the number of moves required to generate a win from a pattern. The most complicated strategy that Rayner described was an 11-move sequence starting from a fairly simply and innocuous-looking pattern of four stones. In his analysis of the acquisition of gomoku, Rayner (1958) described a process by which his subjects gradually switched from an analytic mode of working through the strategies to a perceptual mode in which they searched for familiar patterns for which they had already learned a winning strategy. In short, Rayner (1958) analyzed in his laboratory, over a 5-week period in a microcosm, the perceptual learning process that is presumed to occur on a much larger scale, over the course of years of practice, as chess players gradually acquire master-level proficiency. The skilled memory effect has also been found in the game of bridge, which has no obvious spatial component. Charness (1979) and Engle and Bukstel (1978) have both reported that high-level bridge experts can remember an organized bridge hand (arranged by suit and denomination) almost perfectly after viewing it for only a few seconds, whereas less experienced bridge players show much poorer recall. With unorganized hands, performance is uniformly poor for both experts and less experienced players. In addition, bridge experts were able to generate bids faster and more accurately, they planned the play of a hand faster and more accurately, and they had superior memory for hands they had played. Thus, it is our contention that bridge expertise, like chess, depends in
Skill and Working Memory
5
part on fast-access pattern recognition because patterns are associated with procedural knowledge about strategies and correct lines of play.
C. NONGAMESKILLS The skilled memory effect has also been demonstrated in domains other than games, such as visual memory for music (Salis, 1977: Slaboda, 1976). An additional important property of skilled memory has emerged from several of these nongame skill studies: hierarchical knowledge structures. Akin (in press) has analyzed the recall of building plans by architects and found several interesting results. First, as with chess players, architects recall plans pattern by pattern. Second, architectural plans are recalled hierarchically. At the lowest level in the hierarchy, patterns are fairly small parts of functional spaces, such as wall segments, doors, table in a corner. The next higher level in the hierarchy contains rooms and other areas, and higher levels contain clusters of rooms or areas. The fairly localized property of architectural patterns at the lowest level in the hierarchy is reminiscent of the localized nature of chess patterns reported by Chase and Simon (1973a). Only at the next level in the hierarchy do architectural drawings take on the functional form of the architectural space: rooms, halls, and so on. Architectural patterns seem similar to chess patterns in that functional properties are more important at higher levels, while structural properties are more important at lower levels. Egan and Schwartz (1979) have found superior recall of circuit diagrams by expert electronics technicians after a brief exposure (5-15 sec) of the diagram. Egan and Schwartz have also found evidence of a higher level organization for the skilled electronics technician. At the lowest level, the basic patterns were very similar to the chess patterns and architectural patterns in terms of their localized nature. The skilled technicians, however, were faster and more accurate in their between-pattern recall than the novices, which is good evidence for the existence of higher level organization. Egan and Schwartz concluded that expert technicians use their conceptual knowledge of the circuit’s function to aid in their recall. In the domain of computer programming, Shneiderman (1976) presented a print-out of a simple FORTRAN program or a scrambled print-out of a simple FORTRAN program to programmers with varying degrees of experience. The number of perfectly recalled lines of code from the real program increased dramatically with experience, whereas there was virtually no increase in recall with the scrambled program; for the most experienced programmers, there was a 3-to-1 difference in recall ( 6 vs 18 lines). McKeithen, Reitman, Rueter, and Hirtle (1981) have since repli-
6
William G. Chase and K. Anders Ericsson
cated this result with ALGOL programs. Schneiderman (1976) further showed that the nature of the errors by the experienced programmersreplacing variable names and statement labels consistently, changing the order of lines when it did not affect the program’s result-provided evidence that the experienced programmers were using knowledge of the program’s function to organize their memory for lines of programming code. The existence of higher level functional knowledge in the more experienced individuals has also been demonstrated in baseball fans. Chiesi, Spilich, and Voss (1979) have found that the differential recall of baseball events by individuals with high and low baseball knowledge can be traced to their differential ability to relate the events to the game’s goal structure. That is, high- and low-knowledge individuals were equally competent at recalling single sentences of baseball information. However, highknowledge individuals were better at recalling sequences of baseball events, presumably because they were better able to relate each sequence to the game’s hierarchical goal structure of advancing runners, scoring runs, and winning. A very similar result on normal subjects has been demonstrated by Bransford and Johnson (1973) for recall of paragraphs. Bransford and Johnson showed that subjects were better at recalling ideas from a paragraph if they were given an organizing principle for the paragraph at the time of learning, such as a title, an illustration of the main idea of the paragraph, or the topic of the paragraph. We suggest that recall is facilitated by the use of some abstract hierarchical organizing structure for the paragraph. The same must be true of scripts and schemas as organizing structures for stories and scenes (Biederman, 1972; Bower, Black & Turner, 1979). Although we will discuss this topic more fully in the analysis of our mental calculation expert, we briefly note here that mental calculation experts, as a side-effect of their computational skill, generally exhibit a digit span that is two or three times larger than normal (Hatano & Osawa, 1980; Hunter, 1962; Mitchell, 1907; Muller, 1911). To sum up the analysis so far, the skilled memory effect has been demonstrated in a variety of game-playing and non-game-playing domains, although the bulk of the research has centered on exceptional memories of chess masters. In theory, this exceptional memory performance has been attributed to the existence of a vast long-term knowledge base built up by the expert with years of practice. In game-playing domains this knowledge takes the form, in part, of patterns which serve as retrieval aids for desirable courses of action. It was suggested that in other domains, hierarchical knowledge structures exist in the expert for the purpose of organizing knowledge. For architectural drawings, functional
Skill and Working Memory
I
areas (e.g., rooms) serve to organize lower level structures (walls, furniture, etc.); for circuit diagrams and computer programs, function is used to organize the components; and for baseball games, the hierarchical goal structure of the game is used to organize sequences of events. Although Chase and Simon (1973a,b) did not find much evidence for the existence of hierarchical structure in the master’s memory of chess positions, we suggest that there must indeed be some organizing principle to account for the fact that the master’s recall of patterns exceeds his short-term memory capacity. We shall come back to this problem again later. Finally, before we enter into the analysis of our digit-span expert, we should briefly mention a distinctly different but related type of memory expert: the mnemonist. Unlike the skill-based expert, the mnemonist does not achieve his exceptional memory performance in a particular area of expertise. Rather, the mnemonist has acquired a system or repertoire of techniques for memorizing nonsense material. Persons with trained memories can use mnemonic techniques to memorize long lists of words, names, numbers, and other arbitrary items. The most common mnemonic technique is the use of visual images as mediating devices, and the most powerful system is the method of loci, in which items to be remembered are imagined in a series of well-memorized locations interacting with objects in these locations. Mnemonists have generally made themselves known as stage performers, although the techniques have received a great deal of attention recently in the psychological literature. A cognitive theory of exceptional memory should deal with both the expertise-based memory performance and the mnemonics-based memory performance. We shall return to the cognitive principles underlying mnemonics in a later section. (See Bower, 1972, for a good scientific analysis of mnemonic techniques; Yates, 1966, for a good historical analysis; and Lorayne and Lucas, 1974, for the current best-selling system.) 11.
Analysis of a Memory-Span Expert
In this section, we will describe the highlights of our previous analysis of digit-span experts (reported more fully in Chase & Ericsson, 1981; Ericsson, Chase, & Faloon, 1980), and in addition we report some new results of interest to our theory of skilled memory. A.
THEEFFECTSOF PRACTICE ON DIGITSPAN
The basic procedure in the memory span task is to read digits to subjects at the rate of 1 digit per sec followed by ordered recall. If the sequence is reported correctly, the length of the next sequence is in-
8
William G . Chase and K. Anders Ericsson
creased by one digit; otherwise the next sequence is decreased by one digit. Immediately after the recall of each trial, subjects are asked for a verbal report of their thought processes during the trial. At the end of each session, subjects are also asked to recall as much of the material as they can from the session. On some days, experimental sessions are run instead of practice sessions. Figure 1 shows the average digit span of two subjects as a function of practice. Both subjects demonstrate a steady, although somewhat irregular, increase in digit span with practice. It appears that 200-300 hours of practice is sufficient to yield performance that exceeds the normal memory span by a factor of 10. Our original subject, SF, began the experiment in May 1978 and continued for 2 years (a total of 264 sessions) before the experiment ended. The highest digit-span performance achieved by SF was 82 digits. We started training our second subject, DD, in February 1980 to see if it was possible to train another person to SF’s system, and now, after 286 sessions, the highest span achieved by DD is 68 digits. Until now, the highest digit spans reported in the literature have been around 20 digits, and these have generally been achieved by mental calculation experts (Hatano & Osawa, 1980; Hunter, 1962; Martin & Fernberger, 1929; Mitchell, 1907; Muller, 1911). How is this memory feat possible? To answer this question, we have resorted to an extensive analysis of our subjects’ verbal reports, we have conducted over 100 experimental procedures of various kinds on our two subjects, and we have even written a computer simulation of SF’s coding strategies. In the process, we have discovered three principles of memory skill that we believe characterize the cognitive processes underlying this memory skill: (a) subjects use meaningful associations with material in long-term memory, (b) subjects store the order of items in another longterm memory structure that we have called a “retrieval structure,” and (c) subjects’ encoding and retrieval operations speed up with practice. We shall consider each of these in turn. B.
MECHANISMS OF SKILLED MEMORY
1. The Mnemonic System
When we first started this experiment, we simply wanted to run a subject for a couple of weeks to see if it was possible to increase the memory span with practice and, if so, whether we could use the subject’s retrospective reports to figure out how it happened. The verbal reports ‘Sadly, SF died of a chronic blood disorder in the spring of 1981
Skill and Working Memory
9
80
60
Z
a4
v)
k 40
20
10
20
40
30
50
PRACTICE ( 5-DAY BLOCKS )
Fig. 1. Average digit span for SF (-0)
and DD
(Ap&
as a function of practice.
were very revealing of both the mnemonic system and the retrieval structure. The first 4 hours of the experiment were fairly uneventful. SF started out like virtually all the naive subjects we have run. On the first day, he simply tried to hold everything in a rehearsal buffer, and this strategy resulted in a perfectly average span of 7 digits. The next three days, SF tried another common strategy: Separate one or two groups of three digits each in the beginning of the list, concentrate on these sets first and then set them “aside” somewhere, and then hold the last part of the list in the rehearsal buffer; at recall, retrieve and recall the initial sets while simultaneously concentrating on the rehearsal buffer, and then recall the rehearsal buffer. (This strategy represents the first rudimentary use of a retrieval structure, which is the second component of the skill, to be described later.) This simple grouping strategy seemed to produce a slight improvement in performance (to eight or nine digits), but by Day 4, SF
William G. Chase and K. Anders Ericsson
10
reported that he had reached his limit and no further improvements were possible. And then, on the fifth day, SF’s span suddenly jumped beyond 10 digits, and he began to report the use of a mnemonic aid. From then on, SF’s performance steadily increased, along with the reported use of his mnemonic system and accompanying retrieval system. It turned out that SF was a very good long-distance runner-a member of an NCAA championship cross-country junior-college team-and he was using his knowledge of running times as a mnemonic aid. For example, 3492 = “near world-record mile time.” He initially coded only 1and 2-mile times, but he gradually expanded his mnemonic codes to include 1 1 major categories from 4 mile to marathon. In addition, he added years (e.g., 1943 = “near the end of World War II”), and later he added ages for digit groups that could not be coded as running times. For example, 896 cannot be a time because the second digit is too big, so SF coded this digit group as “eighty-nine point six years old, very old man.” Table I shows the major categories used by SF and the session number when they first appeared in the verbal protocols. By the end of 6 months-1 00 sessions-SF had essentially completed his mnemonic system and he was coding 95% of all digit sequences, of which the majority were running times (65%), a substantial minority were ages (25%), and the rest were either years or numerical patterns (5%). After 200 hours, SF coded virtually everything. Later, when we wanted to see if it was possible to train another subject to use SF’s menmonic system, we were able to enlist another exceptional runner, DD, who was a College Division 111 All-American cross-country runner. DD was able to learn SF’s mnemonic system without any trouble, although the system he eventually developed is somewhat different, in TABLE I MAJORCODINGSTRUCTURES Coding structure Three-digit groups Time Age + decimal Four-digit groups Time (3, 4. 5, 10 min) Time decimal Digit + time Year Age age
+
+
Examp1e
First reported (session no.)
8:05
5
49.7
70
13:20 4:10.6 9-7:05 1955 46 76
20 26 60 64 64
Skill and Working Memory
II
part because of the differences in the races he specializes in. DD also coded virtually everything after 200 hours of practice, and the relative proportions of running times, ages, years, and numerical patterns were similar to SF’s. It should be emphasized that the semantic memories of our two subjects are very rich. That is, SF and DD do not simply code digit groups as a member of a major category; there are many subcategories within each major category. For example, there are dozens of mile times: near worldrecord time, good work-out time for high school, training time for the marathon. Table I1 is a listing of the 1-mile categories around 4:OO derived from DD’s verbal protocol when he was recently asked to sort into categories a deck of 31 cards with running times ranging from 3:40 to 4:lO. The left-hand column of Table I1 contains the categories derived from a different protocol taken from SF 3 years earlier, after SF had had about 3 months of practice on the digit-span task. In this early protocol, we asked SF to divide the running-time spectrum into categories, although we did not ask him to describe each category. We were simply TABLE I1 SF’s AND DD’s CATEGORIES FOR TIMES BETWEEN 3:40 AND 4:20 ~~
~
DD’s categories SF’s category times
Times
349
34c344 346-349 347 349
350 35 1
350 35 1 352 353
Description of semantic category Slow M-mile times Coe and Ovett. I imagine a picture 1 saw in a magazine with Coe or Ovett and 348 on it. 347 point something is the new world record John Walker. With a decimal time, I think of John Walker in a race. Without a decimal, I picture Coe or Ovett New Barrier Old World Record for a long time Indoor World Record Darrell Waltrip
352-358 354-356 357-359
Now middle of the pack in a great race Breaking the 4-min mile
400 401-402
A sec or two off the 4-min mile
359 400
Still the Big Barrier
401414 4 0 3 4 12 415 416419 420
413420
Seems like everyone has run one of these Every good college miler has done a 40-something Teens. Usually associated with high school times
12
William G. Chase and K. Anders Ericsson
interested in determining the size of SF’s semantic network of running times. In that early protocol, SF reported 210 distinct running-time categories, including 81 1-mile categories. When this protocol was taken (after 3 months of practice), SF was coding mostly 1-mile and 2-mile times, which together comprised two-thirds of the 2 10 categories reported by SF at the time. Despite the differences in procedures, different amounts of practice, and important changes in the running world, it is interesting to examine the two sets of categories side by side. Although there is little direct correspondence of categories, there are some striking similarities. There are 10 or more distinct categories for each subject over this small range, many of which contain only a single (nondecimal) time. Note in DD’s protocol that several times are associated with specific events or people. Table 11 illustrates an important point about these mnemonic codes: They are semantically rich and distinctive. On the basis of SF’s verbal protocols, we were able to figure out his coding rules and eventually to incorporate these rules into a computer simulation model that predicted how SF would code a string of digits, with 90% accuracy. We have also conducted many experiments to test our theory of SF’s coding system. The first two experiments we conducted (Days 42 and 47) were a direct test of SF’s mnemonic system. We hypothesized that if SF were using a mnemonic system and we presented him with digit sequences that he could not code with his mnemonic system, then his performance would decline. We therefore presented SF with digit sequences that could not be coded with running times or easy numerical patterns. At that time, SF had not yet invented other categories for digit sequences that were nontimes. As expected, SF’s performance dropped about 20% from his normal average of 16 digits. In our second experiment, we presented SF with digit sequences that could all be coded as running times; under these circumstances, SF’s performance jumped by over 25%. We have several other indications that our subjects are using long-term memory in the digit-span task. Perhaps the most straightforward evidence is that both our subjects can recall almost all the digit sequences that they have heard after an hour’s session, although they cannot remember the order. Both our subjects, when asked to recall everything from a session, systematically recall three- and four-digit sequences category by category, starting with the shortest times @-mile times in DD’s system, and 4mile times in SF’s system) and they work their way through to the longest times (marathon), followed by ages, years, and patterns. Furthermore, within each category, they generally also start with the shortest times and work their way through to the longest times. We believe that our subjects
Skill and Working Memory
13
are using a simple generate-and-test strategy to search their semantic memory categories for recently presented times. To give a concrete example of the generate-and-test strategy in another domain, suppose you asked subjects to name all the states in the Union that begin with the letter M. One common strategy is to generate initial consonant-vowel sounds beginning with /m/,systematically working through all the vowel sounds, and see if any states come to mind. By “come to mind” we mean that a retrievaI cue is sufficiently similar to a node in long-term memory to cause its activation. In the subsequent recall task of our experiment, we believe that our subjects systematically think of running times within small ranges, such as those described in Table 11, and if any such traces have been generated recently, there is a high probability that they will be reactivated. Figure 2 shows the average percentage of items recalled by each of our subjects as a function of practice. Although we did not think of running this experiment until several weeks of practice had elapsed, we suppose that our two subjects were like other naive subjects in the beginning,
I
I
I
I
I
I
10
20
30
40
50
I
PRACTICE ( 5-DAY BLOCKS )
Fig. 2. Average percentage of aftersession recall for SF (0-0) function of practice.
and DD
(A----A)
as a
14
William G . Chase and K. Anders Ericsson
which is to say that virtually nothing is recalled from a digit-span task after an hour’s session. With practice, however, subsequent recall gradually approached 90% over the 200-300 hour range we studied. In another experiment (after about 4 months of practice), we tested SF’s recognition memory for digit sequences because recognition memory is a much more sensitive measure of retention than recall. On that occasion, SF not only recognized perfectly three- and four-digit sequences from the same day, he also showed substantial recognition of sequences from the same week. In another experiment (after about 4 months of practice), after an hour’s session we presented SF with threeand four-digit sequences but with the last digit missing, and he was asked to name the last digit. SF was able to recall the last digit 67% of the time after 4 months of practice; after 250 hours of practice, SF was virtually perfect at naming the last digit of a probe. Finally, we ran an extended recall session after Day 125 (Williams, 1976). At that time, SF was normally recalling about 80% of digit sequences from the session, and he generally took about 5 min to do it. We asked SF to try harder and keep trying until he could recall all the digit sequences from the session. After about an hour of extended recall, SF had recalled all but one four-digit sequence from the session. Every time we have asked for extended recall since then, SF has shown virtually perfect recall. We recently ran DD on extended recall after Session 286 and he too had virtually perfect recall (97%). Up to this point it seems clear that our subjects are making extensive use of semantic memory. We next address a question of theoretical importance concerning the role of short-term memory in this task. 2.
Short-Term Memory
How much information is being processed in short-term memory? Has the extensive practice produced an increase in the capacity of short-term memory? In one experiment, we attempted to determine how much information is in short-term memory by asking SF. In this experiment, we interrupted SF at some random point during a trial while he was being presented with digits, and we asked for an immediate verbal protocol. We wanted to know what SF’s running short-term memory load was and how far behind the spoken sequence he lagged. That is, how many uncoded digits and how many coded groups are kept in short-term memory? From SF’s verbal reports, we found that he was actively coding the previous group of three or four digits while the digits for the current group were still coming in, a lag of about 4 to 7 sec in time. DD’s verbal reports show a similar pattern, although he reports more information about numerical
Skill and Working Memory
15
patterns within groups and semantic patterns between groups. For example, typical relations noticed by DD, given the sequence 415527 are “a four-fifteen mile time with a repeating digit for the decimal; the time was run by a twenty-seven year-old man.” The interesting fact from both subjects’ protocols is that very little except the most recent few seconds are in short-term memory at any moment in time. We conclude that the contents of short-term memory include: (1) the most recent one, two, or three uncoded digits; (2) the previous group of three or four digits; and (3) all the semantic information associated with the mnemonic coding of the previous group. In a series of rehearsal suppression experiments, we wanted to see how much of the digit series was retained if the rehearsal interval between presentation and recall were disrupted. In one experiment, immediately after the list was presented, SF recited the alphabet as quickly as possible for 20 sec before recall. This procedure resulted in the loss of only the rehearsal buffer at the end-the last group of three to five digits at the end of the list. In two other experiments, we suppressed visual rehearsal by having SF either copy or rotate and copy geometric shapes for 20 sec in between presentation and recall. This procedure has been shown by Charness (1976) to interfere with short-term visual retention. However, in the digit-span task, this visual suppression procedure had no effect on performance. Two further experiments were designed to interfere with short-term memory processes during the presentation of digits. In one experiment, we introduced a concurrent chanting task (“Hya-Hya”) that has been used by Baddeley and his associates to suppress short-term memory (Baddeley & Hitch, 1974). In this task, SF said “Hya” after each presented digit. This procedure produced no decrement, and SF reported that he organized the chanting in a different phenomenal (spatial) location than his perception and coding of digits. In the second experiment, we produced a very substantial amount of interference with a concurrent shadowing task. We presented SF with a random letter of the alphabet between each digit-group boundary (every third or fourth digit), and his task was to say the presented letter as soon as he heard it. One experimenter read digits to SF at the rate of 1 digit per sec, and the other experimenter read a letter at the end of each group. Unlike the concurrent chanting task, this procedure produced a 35% drop in performance, even though there was only 4 to d as much verbalization required by the subject. It appears that the concurrent chanting task does not interfere with the phonemic short-term memory buffer, as Baddeley (1981) has also recently concluded. However, we believe that the shadowing task interferes with SF’s normal strategy of lagging behind the input of digits and using the
16
William G . Chase and K. Anders Eriesson
phonemic short-term memory buffer as a temporary storage for the incoming group while processing semantically the immediately preceding group. Finally, other evidence suggests that short-term memory capacity did not increase with practice. (1) SF’s and DD’s mnemonically coded groups were virtually always three and four digits. (2) Their rehearsal group virtually never exceeded six digits. (3) In their hierarchical organization of digit groups (to be described later), SF and DD never grouped together more than three or four digit groups. (4)There was no increase in SF’s or DD’s consonant letter span with practice on digits. ( 5 ) Without a single exception in the literature, expert mental calculators and other memory experts have digit groups of three to five digits (Hunter, 1962; Mitchell, 1907; Muller, 1911). These many converging lines of evidence led Chase and Ericsson (1981) to conclude that the reliable capacity of short-term memory is 3 or 4 units, independent of practice. The usual measure of short-term memory, the span, is the length of list that can be reported 50% of the time. However, the optimum group size for digits is three or four digits (Wickelgren, 1964), the running memory span is only about three digits, and long-term memory groups are also three or four items (Broadbent, 1975). Thus, the reliable capacity of short-term memory-the amount of material available almost all the time-is closer to three or four symbols. In speeded skills, three or four symbols is a more realistic estimate of shortterm memory capacity. In the digit-span task, the evidence seems to uniformly suggest that only a very small portion of the list of digits is in short-term memory at any point in time. During presentation, only a few seconds’ worth of material is in short-term memory, and after presentation, only the last group of three to six digits is rehearsed. Almost everything seems to be mnemonically coded in long-term memory. This leads to our next problem: If these digit groups are in long-term memory, how do subjects retrieve them? 3.
The Retrieval System
The simple model of retrieval in skilled memory proposed by Chase and Simon (1973a,b) is clearly inadequate to explain digit-span performance by our experts. They proposed that retrieval cues for chess patterns are stored in short-term memory and then used at recall to retrieve items from long-term memory. First, the rehearsal suppression experiments showed that very little coded information is retained in short-term memory. Second, both SF and DD recall too much (22 digit groups for SF and
Skill and Working Memory
17
19 digit groups for DD). Third, we ran a subject who used this simple strategy, and her digit span reached an asymptote of about 18 digits, or four mnemonically coded groups of digits. This subject developed a mnemonic system based on days, dates, and times of day (e.g., 9365342 = “September third, 1965, at 3:42 P . M . ” ) . This subject never developed a retrieval system, and she tried to hold the retrieval cues for these mnemonic codes in short-term memory. Her performance improved about as rapidly as SF’s and DD’s in the beginning, but she could never improve her performance above four mnemonic groups and she eventually quit the experiment from loss of motivation after about 100 hours. We have several reasons for proposing that our subjects developed what we have termed a “retrieval structure.” A retrieval structure is a long-term memory structure for indexing material in long-term memory. It can be used to store and order information, but is more versatile because it can allow direct retrieval of any identifiable location. A good example of a retrieval structure is the mnemonic system known as the Method of Loci because it provides a mechanism for retrieving a series of concrete items associated with identifiable locations via interactive images. We suggest that our subjects have developed a retrieval structure, analogous in some respects to the Method of Loci, for retrieving mnemonically coded digit groups in the correct order. The verbal protocols are very revealing about the retrieval structures. Before every trial, SF and DD both explicitly decide how they are going to group the digits. Figure 3 illustrates the development of SF’s retrieval structure, as revealed in his verbal protocols. SF started out by relying only on the short-term phonemic buffer (R) as his retrieval mechanism until he hit on the idea of setting aside the initial groups of digits and holding only the last few digits in the rehearsal buffer. This strategy is fairly common among subjects, however, and it is not unique to our skilled subjects. What makes the retrieval structure so powerful is that SF was able to store his mnemonically coded digit groups in these locations. Without the mnemonic, it is not clear how subjects would be able to associate many distinctive items with the different locations. Even so, SF experienced a great deal of difficulty keeping the order straight for more than three or four groups of digits. After about a month of practice, SF introduced a very important innovation in his retrieval structure: hierarchical organization. He began to separate groups of 4 digits followed by groups of 3 digits. We have termed these clusters of groups “supergroups.” Finally, when these supergroups became too large (more than four or five groups), SF introduced another level in his hierarchy (Day 109), and his performance improved continuously thereafter. DD’s hierarchical organization is very
Fig. 3. Development of SF's retrieval structure. On the left is shown the session number in which the retrieval structure was first reported, and on the right is shown the range of digits over which the retrieval structure works. Squares linked together correspond to supergroups, and inside each square is the number of digits corresponding to that group. The circled R corresponds to the rehearsal group of 4 to 6 digits.
DD's hierarchical organization is very similar to SF's, and Fig. 4 illustrates our best guess as to SF's grouping structure for 80 digits and DD's grouping structure for 68 digits. At their current levels of practice, SF and DD use at least a three-level hierarchy: (1) Digits → Groups, (2) Groups → Supergroups, and (3) Supergroups → Clusters of Supergroups. In another study, run separately on SF and DD, after an hour's session we presented our subjects with 3- and 4-digit groups from the session and asked them to recall as much as they could about that group. Subjects invariably recalled the mnemonic code they used, and they often recalled the location of the group within the supergroup.
Fig. 4. SF's retrieval structure for 80 digits and DD's retrieval structure for 68 digits.
On rare occasions when they were able to recall a preceding or following group, this recall was always associated with some relation between the groups, such as two adjacent 1-mile times. With the exception of this type of episodic information, retrieval of these mnemonic codes seems to be achieved via these hierarchically organized retrieval structures rather than through direct associations between digit groups. Another interesting aspect of our subjects is that they generally spend between 30 sec and 2 min rehearsing the list before they recall it, and their rehearsal pattern is revealing about the underlying retrieval structure. According to their verbal reports, both subjects rehearse the digit sequence in reverse, supergroup by supergroup, except for the first supergroup. That is, both subjects rehearse the last supergroup, then the next-to-last supergroup, and so on, until they come to the first supergroup. Instead of rehearsing this initial supergroup, the subjects then go directly to the beginning of the list and start their recall. Within supergroups, SF generally rehearses in forward order and DD rehearses in reverse order. The interesting thing about these rehearsal patterns is that rehearsal is organized in supergroups.
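To make the hierarchical organization concrete, the following is a minimal sketch, in Python, of a three-level retrieval structure of the kind just described. It is our illustration rather than the subjects' procedure or the authors' model; the particular group sizes, the supergroup and cluster sizes, and the sample digit string are assumptions chosen only for the example.

    # Illustrative sketch of a three-level retrieval structure:
    # digits -> groups -> supergroups -> clusters of supergroups.
    # The 4-4-4-4-3 grouping pattern and the sample digits are assumptions.

    def build_retrieval_structure(digits, group_sizes=(4, 4, 4, 4, 3)):
        """Partition a digit string into groups, supergroups, and clusters."""
        groups, i, k = [], 0, 0
        while i < len(digits):                      # 1. digits -> groups of 3 or 4
            size = group_sizes[k % len(group_sizes)]
            groups.append(digits[i:i + size])
            i += size
            k += 1
        # 2. groups -> supergroups of at most three groups
        supergroups = [groups[j:j + 3] for j in range(0, len(groups), 3)]
        # 3. supergroups -> clusters of supergroups
        clusters = [supergroups[j:j + 3] for j in range(0, len(supergroups), 3)]
        return clusters

    if __name__ == "__main__":
        digits = "1945926053486702894361095873219487265013" * 2   # 80 digits
        for cluster in build_retrieval_structure(digits):
            print(cluster)

Each printed cluster is a list of supergroups, and each supergroup is a list of 3- or 4-digit groups, mirroring the kind of organization shown in Fig. 4.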
Besides the verbal protocols, there is a great deal of additional evidence that our subjects use retrieval structures. The best evidence comes from the speech patterns during recall. In the literature, pauses, intonation, and stress patterns are well-known indicators of linguistic structure (Halliday, 1967; Pike, 1945). The speech patterns of SF and DD typically follow the same pattern. Digit groups are recalled rapidly at a normal rate of speech (approximately 3 digits per sec), with pauses between groups (about 2 sec between groups, on average, with longer pauses when subjects experience difficulty remembering). At the end of a supergroup, however, there is falling intonation, generally followed by a longer pause. In another study, we conducted a memory search experiment with SF after about 100 days of practice. We presented SF with a list of digits, but, instead of asking for recall of the sequence, we presented SF with a group of digits from the list and asked him to name the preceding or following group of digits. It took SF more than twice as long to name the groups preceding or following the probe if he had to cross a hierarchical boundary (10.1 vs 4.4 sec). Up to this point, we have described the two most important mechanisms underlying our subjects' memory performance: the mnemonic system and the retrieval structure. However, these mechanisms are not sufficient to fully explain the performance of our subjects. These systems were essentially completed within the first 100 hours of practice for both subjects. Yet the performance of both subjects showed continuous improvement through 250 hours of practice, and there is no sign of a limit. There must be another mechanism.

4. Encoding and Retrieval Speed
This aspect of memory skill has been the most elusive mechanism to track down. For one thing, our subjects' verbal reports are of little use in analyzing changes in the speed of mental operations. For another, we have not been able to obtain a great amount of data supporting our theory of speedup. Nevertheless, we believe that the little evidence we have suggests that speedup is an important mechanism in skill acquisition in the memory span task. We have recorded latency data on both SF and DD in a self-paced presentation task several times over the past 3 years. In this task, we presented subjects with one digit at a time on a computer-controlled video display; the subject controlled the rate at which he received digits by pressing a button each time he wanted a digit, and we measured the time between button pushes. We also systematically manipulated the size of the list.
Figure 5 shows these latency data for both subjects as a function of the size of the list and practice. As one might expect, pauses tend to occur between groups, so we have displayed only the time between groups in Fig. 5. For both subjects, pause time increases with the size of the list. This result has been known for many years (Woodworth, 1938, p. 21): there is more learning overhead for larger lists. The practice data are not as clear-cut for DD as for SF. Over a 2-year period, SF's coding time has shown a very substantial decrease, and the decrease interacted with the size of the list so that there are bigger practice effects for larger lists. In SF's case, the practice effect is so pronounced that there seems to be very little learning overhead for the larger lists after a couple of hundred hours of practice. In another experiment, we have several direct comparisons between our subjects and other memory experts in the literature on the speed of encoding a 50-digit matrix (from Luria, 1968). Subjects in this task are shown a 50-digit matrix of 13 rows and 4 columns and are timed while they study it. Subjects are then timed while they recall the matrix, and then they are timed while they recall various subparts of the matrix (rows, columns, diagonals, and so on). These data are shown in Table III for DD, for SF (two trials spaced 1 year apart), for two well-known mnemonists in the literature (Hunt & Love, 1972; Luria, 1968), for our mental calculation expert AB, and for four unskilled subjects. A close examination of Table III reveals several interesting results. First, there is an enormous difference between memory experts and unskilled subjects in the time needed to memorize the list. Second, there is a large practice effect on learning time for SF.
Fig. 5. Intergroup times for SF and DD as a function of list size and practice. The dependent variable is the time that subjects paused between groups when they controlled the visual presentation of digits.
TABLE III
STUDY AND RECALL TIME ON LURIA'S (1968) 50-DIGIT MATRIX (all times in seconds)

Skilled subjects

                     SF   SF (1 year later)   DD    AB   Luria's S   Hunt and Love's VP   Mean     SD
Study time          187          81          193   222      180             390            209    101
Recall time
  Entire matrix      43          57           38    51       40              42             45    7.3
  Third column       41          58           68    56       80              58             60   13.0
  Second column      41          46           28    40       25              39             36    8.2
  Second column up   47          30           27    54       30              40             38   10.9
  Zigzag             64          38           41    52       35               -             46   11.9

Unskilled subjects

                     S1     S2    S3    S4   Mean    SD
Study time          798   1240   685   715    860   258
Recall time
  Entire matrix      77     95    42    51     66    24
  Third column      125    117    42    78     90    38
  Second column      81    110    31    40     66    37
  Second column up  112     83    46    63     76    28
  Zigzag            123    107    78    94    101    19
After a year's practice, SF was substantially faster than the other subjects on this task. Finally, there was very little difference in retrieval times among any of the subjects. This last result is unexpected, but interesting because it suggests that retrieval time depends upon how well learned the matrix is rather than on memory skill per se. That is, unskilled subjects can achieve almost as rapid retrieval as memory experts, provided that the former take the time to learn the digit matrix as well as the memory experts. In speeded tasks, of course, we would expect a deterioration in retrieval speed for the unskilled subjects because learning time would be severely limited. It is possible to compare SF's learning time on the 50-digit matrix with Ruckle's data (Fig. 6), reported by Muller (1911). As far as we know, Ruckle's data represent the fastest learning times ever reported in the literature for digits (Woodworth, 1938, p. 21), and SF's times are comparable after 2 years of practice. The data of Fig. 6 are only for visually presented lists; Ruckle's auditory digit span was only about 18 digits.
Fig. 6. A comparison between SF and Professor Ruckle. Shown is the time required to memorize visually presented digits as a function of the number of digits. SF's data are taken from the experiment on the Luria matrices (Table III) and Ruckle's data are derived from Muller (1911).
We mention one final experiment on encoding times. After about 50 hours of practice, we presented SF with digits at a rapid rate (3 digits/sec) and found that SF could not code digits presented at this rate; his performance dropped back to 8 or 9 digits. However, after 250 hours of practice, SF and DD were both able to code digits at these fast rates. They were both able to code one or two groups of 3 digits each and hold about 5 digits in their rehearsal buffer, to achieve a span of about 11 digits. This concludes our review of the major mechanisms underlying skilled performance in the memory span task. We next present our current ideas for a theory of skilled memory, along with some additional theoretical issues and more data of interest.

III. A Theory of Skilled Memory
We perceive the central issues of a theory of skilled memory to be the following: First, what is the structure of long-term memory? Second, what storage and retrieval mechanisms operate on this semantic memory to produce skilled memory performance? Finally, what role do retrieval structures play in skilled memory performance; and, in general, what is the role of working memory in skilled performance?

A. THE STRUCTURE OF LONG-TERM MEMORY
1. Semantic Memory
We assume that our subjects' knowledge of running times is stored as a hierarchical structure, which can be represented as a discrimination tree. In Fig. 7, we illustrate the portion of DD's semantic network outlined in Table II. We assume that as digits are presented to DD, he searches his discrimination tree for these categories. When he searches to a terminal node, we assume that recognition has taken place and a link is established between the terminal node in the semantic network and the episodic trace of the current digit group in short-term memory. Several aspects of our subjects' behavior are consistent with this assumed structure. First, it explains the systematic generate-and-test characteristic of our subjects' recall after the session. We assume that they simply search through this structure, activating each terminal node in turn, and from a terminal node they then activate any links between that terminal node and associated traces and report these traces. Second, there is evidence in the verbal protocols that subjects search a hierarchical structure. When we stopped subjects in the middle of a trial and asked for the contents of short-term memory (reported earlier), our subjects reported that when they are being presented with digits, they first notice the major category before making any finer categorizations. For example, given 357, they first notice that it is a 1-mile time before they notice that it is near the 4-min barrier.
Fig. 7. DD's semantic network of 1-mile times over the range of 346 to 420, derived from Table II.
DD, in fact, explicitly reported that he waits until he hears the first two digits before he thinks about the category, because one digit is too ambiguous. In our model of the semantic structure, two digits are sufficient to activate a nonterminal node in the tree, whereas one digit is not. After hearing two digits, DD says that he then makes a category decision (age, mile time, etc.), and then the third digit is used to find a more meaningful category if possible. Finally, we report some latency data on SF that support our hierarchical model. In this experiment, after a session SF was presented visually with a digit group with one digit missing, and the task was to name the missing digit. Figure 8 shows that both the mean latency and the variance decreased monotonically with the position of the missing digit in the probe from first to last position, corresponding in our model to depth in the hierarchy. Further, the mean latencies decreased over a fairly large range (approximately 8 sec to 1 sec); a mean latency of 8 sec indicates a considerable amount of memory search. SF's verbal protocols indicated that the earlier the missing digit is in the probe, the more extensive is his memory search. When the missing digit was in the third or fourth position, SF often reported having direct access to the memory trace without any conscious awareness of search.
Fig. 8. Latency to name the missing digit as a function of the location of the missing digit in the probe. The location of the missing digit is indicated at the bottom of the figure. Open squares represent three-digit probes, and darkened squares represent four-digit probes. Brackets represent ±1 standard deviation, based on 10 or fewer observations.
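As an illustration of the kind of discrimination tree assumed here, the sketch below classifies a digit group by using its leading digits to select a major category and its third digit to refine that category. The particular category boundaries and labels are our own assumptions for the example; DD's actual network is the one summarized in Table II and Fig. 7.

    # Illustrative sketch of a discrimination-tree classification. The category
    # boundaries below are hypothetical; DD's actual network (Fig. 7) is richer.

    def classify(group: str):
        """Return a list of increasingly specific semantic features for a digit group."""
        features = []
        lead = int(group[:2])             # the first two digits select the major category
        if 34 <= lead <= 42:              # e.g., 357 is read as a 1-mile running time
            features.append("RUNNING TIME")
            features.append("1 MILE")
            if int(group[:3]) < 400:      # the third digit refines the category
                features.append("NEAR 4-MIN BARRIER")
            elif 403 <= int(group[:3]) <= 412:
                features.append("GOOD COLLEGE TIME")
        elif 60 <= lead <= 99:            # hypothetical alternative category
            features.append("AGE")
        else:
            features.append("NO CATEGORY")
        return features

    print(classify("357"))     # ['RUNNING TIME', '1 MILE', 'NEAR 4-MIN BARRIER']
    print(classify("4054"))    # ['RUNNING TIME', '1 MILE', 'GOOD COLLEGE TIME']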
2. The Retrieval Structures
The second type of long-term memory structure that is relevant to skilled performance in the digit-span task is the retrieval structure.
We assume that SF's and DD's retrieval structures have the hierarchical forms portrayed in Fig. 4, and that they can also be systematically searched. In the beginning, we assume that the nodes in this retrieval structure are minimally differentiated, but with practice, each node takes on a distinctive set of features. That is, we assume that it takes practice, extensive practice, to use this retrieval system, just like any mnemonic system, and that practice involves learning to generate a set of distinctive features to differentiate one location from another. As with any mnemonic system, the more distinctive the better. One important issue concerns how versatile these retrieval locations are: What exactly can be stored in these locations? We had assumed that these locations were specific to abstract numerical concepts (running times, ages, years, and patterns for our subjects) because our subjects' letter span did not improve along with their digit span, although we did not give our subjects much practice with letters. In another experiment, SF was able to store and recall perfectly a list of 14 names using his retrieval structure, so we do have some tentative evidence that these retrieval structures can store information other than digits. Storage locations in mnemonic systems have a similar limitation, but they seem more versatile. For example, the locations in the Method of Loci are specialized for concrete items for which a visual image can be generated. Rhymes are specialized for phonemically similar patterns. As we will discuss later, we think of a retrieval structure as a featural description of a location that is generated during encoding of digit groups, and these features are stored as part of the memory trace of a digit group. Then, at recall, these features will serve as a mechanism for activating the trace, when the featural description is attended to. The idea of a retrieval structure as a set of features stored with the memory trace, we believe, explains a great deal about the types of confusion errors that we have observed (to be described later).
3. Context
Finally, a third type of long-term memory structure is relevant to the digit-span task: the context. We think that it is necessary to suppose that attended information is associated with the current context: the day, the trial number, the list length, the room and building, and probably much more. Furthermore, we think that attended information is automatically bound to the current context, unlike the retrieval structure, which requires control processes to bind information. We think that it is necessary to postulate the existence of a current context because, otherwise, how is information not in short-term memory normally retrieved?
That is, the context is the everyday retrieval structure, or working memory, that people use all the time to retrieve recent facts that are not in short-term memory but are relevant to the ongoing task. We do not have any concrete ideas about the form of the context, but it is probably not unreasonable to suppose that there is some type of hierarchical knowledge structure, analogous to a script, to which the current events are bound in some stereotypic fashion. In any case, we assume that in the digit-span task, memory traces are associated with the current context.
B. SHORT-TERM MEMORY AND ATTENTION
We simply assume that short-term memory is the set of knowledge structures that are currently active. Thus, short-term memory can contain graphic, phonemic, and semantic features. The rehearsal buffer, we assume, is a control structure, or retrieval structure if you will, for storing the order of a set of phonemic or articulatory features. We assume that for some basic unspecified reason, there is a limit to the number of knowledge structures that can be active at any moment. Attention refers to a property of the information-processing system which limits processing. The contents of attention refer to that subset of information in short-term memory that is attended to, and by "attended to," we mean that this information serves as input to a process that requires attention. There is a class of processes that interfere with each other, that slow each other down, that compete with each other for sensory input channels, for short-term memory space, and so on. These processes are said to be attention demanding, or controlled. Without getting involved in a discussion of the nature of attention, we will simply state that short-term memory places a limit on the number of knowledge structures that can be held simultaneously as input to a control process. As we discussed earlier, this limit seems to be about three or four symbols for the chunking process. We will equate our binding operation in long-term memory with attention. Our short-term memory and attention assumptions are of little consequence for the digit-span task, except that only one or two digit groups and their associated semantic information are in short-term memory at any point in time. The interesting assumptions concern storage and retrieval operations.

C. MEMORY OPERATIONS
1. Storage
Our storage assumption is very simple: Memory traces attended to at the same time as an active long-term memory node are bound to that node, provided that they fit the node's range.
For example, in Fig. 7, DD's node for a GOOD COLLEGE MILE TIME will fit any time from 4:03 to 4:12; this node is not a good mnemonic for any arbitrary sequence of digits, but only for a sequence of digits in the range specified. We adopt a featural representation of binding in which the memory trace and the semantic features activated from long-term memory are chunked together by virtue of being attended together. In the digit-span task, we assume that a digit group is bound to three long-term knowledge structures: the mnemonic association, the retrieval structure, and the current context. To take a concrete example, what happens when DD hears the digit string 4054? First, as the digit string is being perceived, he actively attends to the magnitude of the digits in order to classify it in his mnemonic system. As he perceives the first two digits, they are sufficient to activate two features in semantic memory corresponding to RUNNING TIME and 1 MILE. When he perceives the third digit, it is sufficient to activate the semantic feature of GOOD COLLEGE TIME, and when he hears the fourth digit, he notices that it is the same as the first digit, which activates a feature corresponding to SAME AS FIRST DIGIT. (We shall describe how our subjects parse decimals more fully later, when we discuss differentiation.) This set of features is simultaneously attended to along with the trace of 4054, and a new memory chunk is formed. The current context and the location in the retrieval structure are also bound to the memory trace. The subject, as he is decoding the mnemonic code, also simultaneously thinks of the location within the retrieval structure and the current context, and featural descriptions of these long-term memory structures are activated and attended to simultaneously along with the trace and its mnemonic code. For example, suppose that DD notices that the previous group was also a 1-mile time, that it was faster, that these represent first and second place, respectively, in some imaginary race, and further that he had a similar time, 406.2, on the previous trial (a typical report). This information is also included as part of the context. Figure 9 depicts the final memory trace for 4054. We believe that this representation is consistent with a large number of observations. The additional links in Fig. 9 are included to illustrate the variety of associations that we have observed. The link between the location and the semantic code reflects the fact that subjects very often know any of several semantic features without knowing the actual digits. In fact, subjects' verbal protocols indicate that the semantic code is invariably retrieved first, suggesting that the major link between the location and the trace is through the semantic code. However, the direct link between location and trace is necessary because subjects are able to recall digit groups without semantic codes.
Fig. 9. DD's memory trace for the 1-mile time 4:05.4. Stored with the trace are semantic features describing the trace as a running time, features describing its location in the retrieval structure, and features corresponding to the current context. Included in the context are local features describing the decimal point as well as noticed relationships between the trace and other nearby digit groups, and global features describing noticed relationships between the trace and earlier digit groups, the trial number, and other global contextual features.
The links between context and location and between context and the semantic code are there because the local context can be used to disambiguate either or both. The dotted lines indicate that the context contains information about other digit groups; the existence of these links is clearly seen in the clustering that occurs in the aftersession recall. The direct link between the context and the trace is there because people can still recall small, recently heard groups of digits, even though they are not in short-term memory, provided that there have not been too many such sequences. In our theory, context is virtually useless as a retrieval cue because it is not unique: if several digit sequences have been linked to the same context, then there are too many links to achieve activation. Finally, one might ask why it is necessary to assume a trace at all. Why isn't memory retrieval a reconstructive process in which the set of features represents a sufficient code to reconstruct the event (Neisser, 1967)? The answer is that the semantic code is not sufficient to uniquely specify the event. In our example, GOOD COLLEGE TIME only specifies a range; in DD's semantic network, 100 possible times (including decimal times) could fit this category. What the semantic code does is narrow the search in long-term memory for the memory trace. A good mnemonic should narrow the search to a single trace. But there still must be a trace.
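The representation depicted in Fig. 9 can be thought of as a record that binds the episodic trace to three kinds of features. The sketch below is a minimal rendering of that idea; the field names and the example values are ours, chosen for illustration rather than taken from the subjects' protocols.

    # Illustrative sketch of the assumed binding: a digit-group trace stored
    # together with its semantic code, its retrieval-structure location, and
    # the current context. Field names and example values are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class MemoryTrace:
        digits: str            # the episodic trace itself, e.g., "4054"
        semantic_code: list    # features supplied by the mnemonic system
        location: dict         # position in the retrieval structure
        context: dict          # trial-level and local contextual features

    trace = MemoryTrace(
        digits="4054",
        semantic_code=["RUNNING TIME", "1 MILE", "GOOD COLLEGE TIME",
                       "SAME AS FIRST DIGIT"],
        location={"cluster": 1, "supergroup": 2, "group": 3},   # assumed indices
        context={"trial": 4, "previous_group_also_mile_time": True},
    )
    print(trace.semantic_code)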
Our theory is consistent with two related observations in the digit-span task concerning the retrieval structure: the limited size of supergroups and the hierarchical organization of the retrieval structure. Why should this be true? After all, there do not seem to be any such constraints with other mnemonic systems, such as the Method of Loci. We speculate that with the Method of Loci and other mnemonic systems, the locations are so rich and distinctive that subjects have no trouble differentiating them. However, in the digit-span task, the subjects face the problem of building retrieval structures from nothing but position information. How is the subject to do this? We suppose that the subjects build supergroups by chunking them. That is, at the end of a supergroup, the subject must, according to our encoding assumption, attend to all groups simultaneously in conjunction with the current context. In fact, subjects' introspections suggest that they are able to attend to only a few semantic features while grouping. Thus, according to our theory, the short-term capacity places a limit on the size of supergroups, and the hierarchical structure occurs because subjects have only enough capacity to group together a few abstract features representing groups, rather than the groups themselves. Another interesting property of the memory representation is redundancy. It is very common in our subjects' retrospective reports that they notice such things as repeating semantic codes (e.g., two 1-mile times in a row) and many other kinds of relations. These redundant relations are very important to our subjects because they help to disambiguate the memory code and they aid in error recovery. It is very common for our subjects to retrieve only a very few features associated with a trace and, with a combination of inference and further search, eventually recover from an error or retrieve a missing trace. Our subjects are also good at judging the certainty of their answers, and they can virtually always indicate when a digit group is right or wrong. The redundancy of the memory trace is a possible mechanism for this judged certainty. Before describing our retrieval assumptions, we should point out that our theory has focused on meaningful associations as the major mechanism for building long-term memory structures, and we have said nothing about trace strength. This is in contrast to most memory theories, which focus on repetition as the major mechanism and on dwell time in short-term memory as the major determiner of strength (e.g., Anderson & Bower, 1974; Atkinson & Shiffrin, 1968; Raaijmakers & Shiffrin, 1981). We believe that both mechanisms operate, that attention time and number of redundant associations jointly determine the strength of meaningful associations, and that this distinction between attention and meaningful associations vs short-term memory occupancy and rote repetition underlies the empirical distinction between elaborative and maintenance rehearsal (Bjork, 1975; Craik & Watkins, 1973).
We believe that meaningful associations are much more powerful, useful, and pervasive, and that rote rehearsal is the default mechanism that people use when they cannot think of any meaningful associations.
2. Retrieval

The process of retrieval during a trial, we assume, involves attending to a set of features in short-term memory, and this attention process will cause the activation of memory traces in long-term memory which contain that set of features. After a trial, with no information in short-term memory except an index to the current context, recall begins by activating the current context along with the first location of the retrieval structure. This should result in activation of the location information contained in the memory trace. From there, we assume that activation spreads jointly to the trace and to the semantic code, and spreading activation from the semantic code to the trace should normally be sufficient to activate the trace. In the case of recall after the session, retrieval is achieved by activating links between semantic memory and the trace. However, it is commonly reported by both SF and DD that during a trial, when they have trouble remembering a digit group, they use the alternate, time-consuming strategy of searching for it in semantic memory. When they do not know the mnemonic category, SF and DD sometimes take several minutes to search the semantic network before they retrieve an item. Figure 10 illustrates the various retrieval routes to the memory trace.
Fig. 10. Schematic representation of retrieval of the memory trace. The trace is accessible through its semantic code, its location in the retrieval structure, and the context.
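To make the two kinds of routes concrete, the following is a minimal sketch, under our own assumptions about how traces might be stored, of retrieval that first tries direct activation from a semantic feature and otherwise falls back to a search through the retrieval structure. It is an illustration of the distinction drawn in the text, not the authors' model.

    # Illustrative sketch of two retrieval routes to a stored trace: direct
    # activation via the semantic code versus a controlled search through the
    # retrieval structure. The store layout and example values are hypothetical.

    def retrieve(traces, probe_code=None, location=None):
        """traces: list of dicts with 'digits', 'semantic_code', and 'location' keys."""
        if probe_code is not None:
            for t in traces:                      # route 1: direct semantic access
                if probe_code in t["semantic_code"]:
                    return t["digits"], "direct semantic access"
        if location is not None:
            for t in sorted(traces, key=lambda t: t["location"]):
                if t["location"] == location:     # route 2: search the retrieval structure
                    return t["digits"], "retrieval-structure search"
        return None, "not found"

    traces = [
        {"digits": "4054", "semantic_code": ["1 MILE", "GOOD COLLEGE TIME"], "location": 3},
        {"digits": "907",  "semantic_code": ["AGE"], "location": 4},
    ]
    print(retrieve(traces, probe_code="AGE"))    # ('907', 'direct semantic access')
    print(retrieve(traces, location=3))          # ('4054', 'retrieval-structure search')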
It is interesting to compare the retrieval times for semantic memory and working memory (i.e., the retrieval structure). In four memory search experiments (after about 100 hours of practice), we timed SF as he responded to a probe after being presented with a sequence of 30 digits. Two of the experiments involved accessing information via semantic memory: (1) name the last digit of the probe, and (2) point to the location of the probe. In the first experiment, we assume that the first digits of the probe lead SF directly to the appropriate node in semantic memory, and SF uses the features of this node to activate semantic information in the memory trace. In the second experiment, SF is given the probe and must point to the location of the probe in a graphic representation of the retrieval structure. In this case, we assume that the probe activates the memory trace, which in turn activates the features corresponding to its location in the retrieval structure. In both cases, there is only a single direct link to activate, and the average latency was 1.6 sec (SD = .49 sec). The other two experiments involved searching the retrieval structure for the trace: (1) name the digit group pointed to in a graphic representation of the retrieval structure, and (2) name the group preceding or following the probe. In the first case, search begins with the retrieval structure, as in a normal recall trial; in the second case, the probe is first used to derive its location information, and from there the retrieval structure is entered. Unlike the previous two tasks, retrieval is achieved via the retrieval structure. In both these cases, search time was much slower (average = 6.4 sec, SD = 2.9 sec). We interpret these results to mean that direct access in semantic memory is automatic and fast; access in working memory is controlled and relatively slow (Schneider & Shiffrin, 1977). As a corollary, we assume that the bottleneck in skilled performance is access to working memory, and that practice has its greatest effect on the speed of storage and retrieval operations in working memory.

3. Differentiation
Differentiation refers to processes that produce unique memory traces. We describe two such processes that our subjects use: (1) updating semantic codes and (2) coding the decimal place. According to our theory, mnemonics and meaningful associations derive their power from their ability to narrow the search in long-term memory to a unique memory trace. We have already discussed the role of redundancy in search. We have evidence from our subjects' protocols that another mechanism is operating, a mechanism we will call updating.
The issue concerns what happens when the subject is presented with more than one digit group within the same mnemonic category. In the example presented earlier, what happens when the subject hears 4054 after hearing 4062 on a previous trial, since they both belong to the same semantic category? If they are not differentiated, then the semantic category will no longer serve as a unique cue to the memory trace. According to our theory, when the subject perceives the current digit group and activates the semantic features for the mnemonic category, this automatically results in the activation of any previous memory traces from the same category, within the same context. Thus, in our example, upon categorizing 4054, 4062 (from the same category) is automatically reactivated, and this information is incorporated in the new memory trace. It is reasonable to suppose that a new hierarchical memory trace is formed from the combined memory traces, including any comparative information between the two traces, such as which is greater in magnitude. We have some evidence that updating is, in fact, occurring with our subjects. First, updated items are invariably recalled together in the aftersession recall. For a sample of updated items taken from DD's protocols, the average pause time between these items, which clustered in the output, was 1.6 sec (SD = .92 sec), compared to 3.2 sec (SD = 3.32 sec) for pause times between nearby items. Second, on several sessions, we asked SF in his verbal reports after each trial to tell us when a digit group had reminded him of an earlier group. Out of a sample of 276 digit groups from two sessions, SF noticed similarities in 47 groups, approximately 17%. Our other subject, DD, reports slightly fewer such instances of updating (about 13%). In one experiment, after a regular practice session of 60 digit groups presented in six sequences, we presented DD with probes with varying degrees of similarity to the groups from the session. We presented these probes at the usual 1-sec-per-digit rate and we instructed DD to code them as he would in a normal session, but to indicate immediately when a probe reminded him of an earlier sequence. In this experiment, DD only recognized digit sequences in which the first three digits matched a previous group, and recognition occurred within a second after hearing the third digit. Thus, both subjects appear to be updating their memory traces. Finally, the speed of the process (from 1 sec or less to as much as 2 sec) is suggestive of the fast-access automatic retrieval from semantic memory that we described earlier. Both of our subjects report that they code decimal digits in terms of numerical patterns, although DD's system is much more elaborate than SF's. Figure 11, derived from SF's verbal protocols, illustrates his coding system for decimals, which is basically designed around reference points. This simple system contains a total of only ten parsing rules, or ten features, which SF uses to code the decimal.
Fig. 11. SF's system for coding digits, derived from his verbal protocols.
In contrast, DD's system is much more complicated. In one experiment, we asked DD to sort 181 running times (printed on cards) in the range 3400 to 4100 into equivalent piles. Within semantic categories, we counted 29 rules, all based on numerical relations, that DD used to code the decimal. Only 4 of these rules were similar to SF's in that they assigned a feature to the decimal based only on the magnitude of the decimal. These rules were, using DD's terminology: (1) 0 = "flat," (2) 5 = "half," (3) 8 or 9 = "almost," and (4) 1 or 2 = "just above." The rest of the rules all involved numerical relations between the last digit and the preceding digits. These include such things as the last digit is the same as one of the preceding digits; the last digit is above the preceding digit by 1, 2, or 3; the digits are all odd or all even; and the last digit is some numerical combination of some of the previous digits. Furthermore, there is a rule hierarchy because the rules overlap. DD's system is a very complex but rule-governed system for coding the last digit in terms of numerical patterns. The system is designed to discover a feature that can be used to uniquely code the decimal point relative to the semantically coded part of the trace. Both SF's and DD's digit-coding systems seem to work extremely well. From an analysis of the errors, we found that the chance of making an error on the decimal, given that the semantic part of the trace was reported correctly, was less than 1% for both subjects. This error rate is quite low compared to the unconditional error rate per digit group of about 4%.
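As an illustration, the sketch below implements the four magnitude-based rules quoted above together with two of the relational rules mentioned in the text. It is our simplification of the idea, not DD's full 29-rule hierarchy; only the quoted rule names are taken from the text.

    # Illustrative sketch of decimal coding: the four magnitude rules quoted in
    # the text plus two relational rules. DD's full system has 29 overlapping
    # rules arranged in a hierarchy; this is a simplification.

    def code_decimal(group: str):
        """Return a feature describing the last (decimal) digit of a digit group."""
        preceding, last = [int(d) for d in group[:-1]], int(group[-1])
        # Magnitude-based rules, using DD's terminology.
        if last == 0:
            return "flat"
        if last == 5:
            return "half"
        if last in (8, 9):
            return "almost"
        if last in (1, 2):
            return "just above"
        # Two examples of relational rules.
        if last in preceding:
            return "same as a preceding digit"
        if all(d % 2 == last % 2 for d in preceding):
            return "all odd" if last % 2 else "all even"
        return "no simple feature"

    for g in ("4050", "4054", "3575", "3571"):
        print(g, "->", code_decimal(g))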
The two processes described in this section, we speculate, are instances of more general processes for differentiating semantic codes. Updating probably occurs all the time during normal cognitive processing; whenever more than one instance of an abstract category is noticed, it is important to keep each instance separate. The digit-coding system, on the other hand, is probably an instance of the more general process of generating elaborated, redundant memory codes in order to facilitate retrieval and disambiguation of memory traces.
D. INTERFERENCE
So far, we have said little about mechanisms of forgetting. However, we have some data on interference effects, most of which can be described within the theoretical framework we have outlined. Perhaps the most interesting data we have concern the buildup of proactive interference within a session. Figure 12 shows, for each subject over the last 100 sessions, the probability of recalling a sequence correctly as a function of the trial number within the session. Since we are using the up-and-down method, the average percentage correct is 50%.
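For readers unfamiliar with it, the up-and-down method adjusts the list length from trial to trial so that accuracy converges on 50% correct. The sketch below is a generic one-up, one-down staircase; the starting length, step size, and the simulated subject are arbitrary assumptions and are not taken from the experiments reported here.

    # Generic sketch of an up-and-down (staircase) procedure: lengthen the list
    # after a correct trial and shorten it after an error, so that accuracy
    # converges on 50%. The parameters and simulated subject are assumptions.
    import random

    def run_staircase(p_correct_at, start=30, step=1, trials=20, seed=0):
        rng = random.Random(seed)
        length, history = start, []
        for _ in range(trials):
            correct = rng.random() < p_correct_at(length)
            history.append((length, correct))
            length += step if correct else -step
        return history

    # Hypothetical subject whose accuracy falls to 50% at about 32 digits.
    span_model = lambda n: max(0.0, min(1.0, 1.0 - (n - 28) / 8))
    for length, correct in run_staircase(span_model):
        print(length, "correct" if correct else "error")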
Fig. 12. Percentage of correct recall for a trial as a function of the trial number, for both SF and DD, for the last 100 sessions. The standard error for these percentages, based on 100 observations, is about 5%.
Both subjects have a substantial increase in the error rate as the session progresses. Further, for both subjects there is also a substantial increase in the rehearsal interval as the session progresses. Figure 13 shows the average latency to begin recall as a function of trial number (for correct trials only, although the data are similar but slightly longer for incorrect trials). There is an important theoretical issue here: Is this forgetting due to a loss of order information, or are the semantic codes being weakened? In our theory, as memory fills up with traces, is there a loss of differentiation because the codes cannot be retrieved due to confusions among the similar locations in the retrieval structures, or are the semantic connections being lost?
Fig. 13. Time between the last presented digit and the first recalled digit as a function of trial number, for SF and DD. For the eight data points shown, the average SD, based on the last 100 sessions, is 33 sec, and the average SE is 4.2 sec.
According to the Encoding Specificity Principle (Tulving, 1979), long-term memory traces are not lost; what is forgotten are the appropriate retrieval cues. We have analyzed some data bearing on this issue. First, we analyzed 275 errors over an 86-day period for DD and 213 errors over a 78-day period for SF. As one might expect, there are many types of errors, almost all of which are at the group level: failure to recall a group, transposition of groups, intrusion of similar groups from earlier trials, and so on. Table IV shows a breakdown of errors into order errors and item errors. Item errors are more common than order errors, and the most frequent type of item error is reporting a digit group in the appropriate semantic category but failing to get the digits exactly right. The most common type of order error is transposing two digit groups, usually from the same location in two supergroups. Thus, there are clearly some order errors, but there are more (partial) retrieval failures. The question still remains: What percentage of retrieval failures are caused by a loss of the connection between the location in the retrieval structure and the semantic code? Figure 14 presents some data bearing on this issue. These data show 10 days' worth of data for both subjects on the aftersession recall task as a function of trial number. It is interesting to compare Fig. 14 with Figs. 12 and 13: the aftersession recall of digit groups is best for the digit groups showing the poorest recall within the session. These data clearly suggest that the buildup of proactive interference over trials is due to a loss of connections between the location in the retrieval structure and the memory trace, because the memory trace is clearly accessible through the semantic code. Another interesting result in Fig. 14 is the significant loss on the early trials; there does appear to be a weakening of the memory trace in semantic memory. According to the Encoding Specificity Principle, the difference between good and poor mnemonic codes should disappear with a recognition test, because the aftersession recall task is really a generate-and-test recognition procedure. These results suggest that some amount of forgetting has occurred for memory traces from the early part of a session, contrary to predictions from the Encoding Specificity Principle.

TABLE IV
PERCENTAGE OF ITEM AND ORDER ERRORS

        Item    Order
SF        82       18
DD        71       29
Fig. 14. Percentage of correct recall of digit groups after the session as a function of trial number for SF (sessions 99-108) and DD (sessions 111-120). The standard error for these percentages, based on slightly more than 100 observations, is about 5%.
It could still be argued, however, that the aftersession recall is not really a recognition procedure, and that much better performance was obtained with our recognition experiment (reported earlier). The alternative interpretation is that forgetting involves weakening of the connection between the semantic features and the memory trace. Finally, we report an interference experiment designed to see how fragile the retrieval structure is. Is a single schematic retrieval structure used over and over again, or are there multiple retrieval structures, one for each trial? We tested these possibilities by giving DD two trials in a row and then asking for recall of both lists; DD first recalled the most recent list, and then he recalled the previous list. In this procedure, DD was presented with the first list and then given a normal amount of time to rehearse the list. However, instead of then asking for recall of that list, a second list was presented to DD, followed by rehearsal of the second list and then recall. Only when the second list had been recalled did DD attempt to recall the first list. In an hour's session (on Day 195), we gave DD three pairs of lists of 36 digits each.
Although DD was unable to achieve perfect recall of two lists in a row, on two of the three trials he missed perfect recall by only a single error. On the third attempt, he missed about 30% of the previous list. In short, DD is able to differentiate trials well enough that we reject the idea of a single schematic retrieval structure. We think the representation we have proposed in Fig. 9 is compatible with all the empirical results. It accounts for the present results by assuming that the context can be used to differentiate retrieval locations from previous trials. At the same time, it accounts for the confusion errors observed between different retrieval locations by assuming a partial loss of location features in the memory trace. Intrusion errors from previous trials, according to the theory, are caused by a loss of context features in the memory trace, and semantic errors are caused by a loss of connections between location features and semantic features in the trace.
E. WORKING MEMORY
In this section, we want to expand on what we think is an important implication of our work for a theory of skilled memory: the concept of working memory. Working memory has traditionally been thought of as the part of the memory system where active information processing takes place (Baddeley, 1976; Klatzky, 1980). Working memory is not exactly synonymous with short-term memory, because short-term memory is usually taken to mean a passive storage system for item information, whereas working memory also contains control processes, because they too require memory capacity. Baddeley and Hitch (1974) and Baddeley (1981) include the articulatory loop, the "visuospatial scratch pad," and a central executive as part of the structure of working memory. The concept of working memory alone is not adequate to explain the performance of our skilled subjects in the digit-span task, or the skilled memory effect in general. Our research suggests that experts make associations with information in semantic memory, and they do not have to keep the information active during the retention interval; they can rely on retrieval mechanisms to reactivate information at recall. In the digit-span task, our subjects developed an elaborate retrieval structure for storing digit sequences. In the chess research, the reason the Chase and Simon (1973a,b) model underestimated the recall of chess masters was that it assumed that information was retained in short-term memory. We argue that the idea of working memory should be reconceptualized to include retrieval mechanisms that provide direct access to recent memory traces not in active memory. Perhaps this is a semantic distinction, and perhaps another term, such as intermediate-term memory (Hunt, 1972), should be used to refer to temporary knowledge structures relevant to the ongoing task.
Nevertheless, these retrieval structures have the properties associated with working memory. The important properties of the short-term memory (STM) component of working memory are direct and fast access to knowledge structures for input into processes. Retrieval structures provide direct access to knowledge structures, and they provide relatively fast access (say, within the range of 1-5 sec), thus avoiding the difficulties normally associated with long-term memory retrieval (such searches take a lot of time and cause interference by activating competing knowledge structures). Perhaps we should call these retrieval structures the intermediate-term memory (ITM) component of working memory. An important point we want to make about skilled memory is that the size of the ITM component of working memory expands with skill acquisition, and its retrieval speed increases. We speculate that at high levels of skill, retrieval speed from ITM approaches that of STM, which is less than a second. Thus, ITM can serve as a useful part of working memory, greatly expanding the available knowledge states as inputs to mental operations. We think this is one reason why the skilled performance of experts in many domains seems vastly superior to novice performance. This reconception of working memory is helpful in interpreting the literature in other domains besides skilled performance. For example, Shiffrin (1976) has argued that short-term memory does not have enough capacity to sustain performance in many tasks, and that context-tagged information in long-term memory is used to perform complex tasks. In other words, context can also serve as a retrieval structure for knowledge in some ongoing task, and hence can also serve as an important component of working memory. One reason that Baddeley (1976, 1981) has argued for an expansion of the concept of working memory is that complex tasks such as reasoning, comprehension, mental calculation, and learning can proceed with very little decrement when subjects have to maintain a near-span digit load simultaneously in STM (Baddeley & Hitch, 1974). Kintsch (1981) has recently argued that the current concepts of STM and working memory do not adequately account either for people's ability to retain and use the meaning of text during reading, or for their ability to retrieve more detailed propositional memory from reading text. In a recent article, Daneman and Carpenter (1980) showed that a domain-specific measure of working memory capacity is a far better predictor of reading ability than the traditional short-term memory span. In this measure, subjects were required to read a series of sentences and then recall the last word of each sentence in order. Correlations between this measure of working memory and measures of reading comprehension were typically in the range of .7 to .9, whereas word span correlated only about .35 with measures of reading comprehension.
Daneman and Carpenter argued that the reading processes of good readers are faster, more efficient, and take up less capacity in working memory, thus releasing more storage capacity for knowledge structures in working memory; hence these readers' higher sentence memory span. Good readers achieve better comprehension, according to Daneman and Carpenter, because they have more facts in working memory at any time for their comprehension processes to work on. Although we agree in principle that skill development is associated with automated processing, our theory of skilled memory requires a different interpretation of their result. The working memory of good readers is expanded, according to our theory, because they have developed better structures for organizing and retrieving, from semantic memory during the reading process, information of various types relevant to the comprehension process. Their larger sentence memory span, we argue, is the result of utilizing these structures for storing sentences, or some deep-structure representations of the sentences, in long-term memory. Nevertheless, we agree with the important point made by Daneman and Carpenter's experiment, namely, that skill in some domain is associated with an expanded working memory. We want to make one more point about encoding and working memory. How well an item is retrieved depends upon how it is coded for later use. This idea has been in the literature for some time as the encoding-retrieval interaction principle derived from the levels-of-processing literature (Tulving, 1979), and as the constructability principle in the information-processing literature (Norman & Bobrow, 1979). The idea is that a good encoding anticipates how it will be retrieved because it builds into the representation the retrieval cues that will arise at recall. In other words, skilled individuals have learned how to code information in a useful way, so that when it is needed in some context, the retrieval cues will be appropriate to achieve recall. Novices typically do not know when a fact is relevant, and they often fail to retrieve knowledge in their long-term memory that is relevant to some task performance (Jeffries, Turner, Polson, & Atwood, 1981). This is perhaps the reason that mnemonic systems do not seem very useful in skills: The retrieval mechanisms have to be domain specific because retrieval must occur when a fact is useful.
IV. Further Studies of Skilled Memory
In this section we present subsequent work in which we have attempted to expand our theory of skilled memory into other domains.
Our later work has taken two courses. In one direction, we have analyzed already existing exceptional skills. We have been fortunate to be able to study two skilled individuals: a mental calculation expert (Chase, Benjamin, & Peterson, in preparation) and a waiter who remembers large numbers of orders (Ericsson & Polson, in preparation). In another direction, we have studied normal people in a domain where most people are skilled: sentence memory (Ericsson & Karat, 1981).
A. ANALYSIS OF A MENTAL CALCULATION EXPERT
Our subject AB has a magic act that he terms "mathemagics," in which he does a variety of rapid mental calculation feats. For example, he can square a 2-digit number in 1 or 2 sec, he can square a 4-digit number in about 30 sec, and he can multiply two 2-digit numbers in about 5 sec. These mental calculation feats are far beyond the capacity of average people, as well as of mathematicians and engineers. AB claims that he is the only person in the United States with such a mathemagics act. AB's digit span is about 13 digits, for which he uses a mnemonic system (described later), and his performance on Luria's (1968) 50-digit matrix is also comparable to that of other memory experts (see Table III). There is a well-documented literature on mental calculation experts, or "lightning calculators," most of whom lived in the last century, before the advent of mechanical calculating aids. A common misconception is that most lightning calculators are mentally retarded or "idiot savants." Although there are a few documented cases of mentally retarded lightning calculators, most of the lightning calculators have been well-educated professionals. To take a few examples, Bidder was a very prominent British engineer, Ruckle was a German mathematics professor, and the great German mathematician and astronomer Gauss demonstrated his lightning calculating ability as a boy. (See Mitchell, 1907, and Scripture, 1891, for good reviews.) The only recent psychological study of a mental calculation expert is Hunter's (1968) analysis of A. C. Aitken, a Cambridge mathematics professor and perhaps the most skilled lightning calculator reported in the literature. Aitken's skill is based on two types of knowledge: (1) computational procedures and (2) properties of numbers. Aitken had gradually acquired a large variety of computational procedures designed to reduce memory load in mental computation. With years of intensive practice, these computational procedures gradually became faster and more automatic, to the point where Aitken's computational skills were astounding. In addition to his computational procedures, Aitken also possessed a tremendous amount of "lexical" knowledge about numbers.
For example, he could “instantly” name the factors of any number up to 1500. Thus, for Aitken, all the 3-digit numbers and a few 4-digit numbers were unique and semantically rich, whereas for most of us, this is true only for the digits and a few other numbers, such as one’s age. This knowledge also provides a very substantial reduction in the memory load during mental calculation. Our subject AB has a typical history for a mental calculator. His interest in numbers really began at about age 6 (he is now 20 years old), and from that time to the present AB estimates that he has averaged several hours of practice a day. During this extended period of continuous practice, AB has discovered many numerical concepts by trial and error. For example, at around age 12, AB discovered the algorithm he uses to square numbers; interestingly, Aitken was about the same age when he also discovered the same squaring algorithm. Our analysis of AB began with his ability to square numbers, which turned out to be a fairly complex procedure. We expected, on the basis of our theory of skilled memory, that AB would use some type of retrieval structure to store the results of intermediate computations, and then he would retrieve these computations at some later point when he needed them. Our analysis of AB’s squaring procedure is based on about 10 hours of protocols, from which we derived a model, and about 20 hours of latency and error data on squaring 2-to-5-digit numbers. The heart of AB’s squaring procedure is the algorithm that reduces squaring to easy multiplication, and it is based on the following equation:
A² = (A + d)(A − d) + d²

For example:

9² = 10 × 8 + 1²
109² = 100 × 118 + 9²
In words, the algorithm involves finding a number, d, which, when added to or subtracted from the number to be squared, A, generates a new number consisting of a single digit with trailing zeros. This in effect reduces the computation from a difficult n-digit by n-digit multiplication to a much easier 1-digit by n-digit multiplication, plus an (n − 1)-digit square.
square, and so on. Recursion stops with two-digit numbers because all two-digit squares are either memorized or, in those few cases where AB claims to use the algorithm on two-digit squares, the computations are so rapid and so familiar that they are virtually long-term memory retrieval. To give a concrete example of how the algorithm works, consider the following four-digit problem:

3,456² = 3000 × 3912 + 456²
       = 11,736,000 + 500 × 412 + 44²
       = 11,736,000 + 206,000 + 1,936
       = 11,943,936
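A minimal sketch of this recursive procedure, written in Python under our own naming, is given below. It follows the description in the text (choose d so that A + d or A - d becomes a single digit followed by zeros, recurse on d², and treat two-digit squares as memorized facts); it is an illustration of the algorithm, not AB's actual mental routine.

```python
# A sketch of the recursive squaring algorithm described in the text:
# A**2 = (A + d) * (A - d) + d**2, with d chosen so that A + d or A - d
# becomes a single digit followed by zeros. Two-digit (and smaller) squares
# are treated as memorized facts, as the text suggests.

def round_to_single_digit(a: int) -> int:
    """Round a to the nearest number with one leading digit and trailing zeros."""
    power = 10 ** (len(str(a)) - 1)
    return round(a / power) * power


def square(a: int) -> int:
    if a < 100:                                  # two-digit squares are retrieved, not computed
        return a * a
    anchor = round_to_single_digit(a)            # e.g., 3456 -> 3000
    d = abs(a - anchor)                          # e.g., 456
    companion = a + d if anchor < a else a - d   # e.g., 3912
    return anchor * companion + square(d)        # one easy multiplication plus a smaller square


if __name__ == "__main__":
    print(square(3456))                          # 11943936, matching the worked example
```

Tracing square(3456) reproduces the partial products in the example above: 3000 × 3912, then 500 × 412, then 44².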
Note that, as a result of the recursive process, three fairly large partial products accumulate in memory and must be added together. In general, for an n-digit square, there are n - 1 partial products. These types of mental arithmetic problems impose severe memory management problems, and this is what makes AB's squaring procedure interesting for our theory of skilled memory. How can AB remember all of these numbers?

One of the first things we discovered was that AB was using a mnemonic to store these partial products. AB had previously learned a standard mnemonic technique for converting digits to consonants and making a word out of the consonants. For example, the partial product in the above example, 736, can be converted to consonants (7 = k, 3 = m, and 6 = g), and the consonants are then converted into words, such as 736 = "key mug." Then, at a later point in the problem, when AB needs to add partial products, he retrieves the mnemonic and decodes it. AB also uses his fingers as a mnemonic aid to store the hundreds digit. In the above example, AB stores the digit 9 on his fingers.

From AB's verbal protocols we were able to derive a process model of his squaring algorithm. With the model, we were then able to make several predictions about how fast AB would be able to solve problems of varying degrees of difficulty; the model also gave us a way to analyze objectively the memory demands involved in squaring a number with the algorithm.

The first analysis tried to account for the speed of problem solving as a function of problem size. Figure 15 shows the average time taken by AB to solve two-digit through five-digit squares, and Fig. 16 shows these same data replotted as a function of the model's prediction of the number of symbols processed in working memory. Several structural variables from the model were regressed against solution time: (1) number of functions in the program, (2) number of
Fig. 15. AB's average solution time for squaring numbers as a function of number of digits. Brackets are SDs of averages for 17 days. (SDs for two and three digits, 0.2 and 1.0 sec, respectively, are too small to be shown.) Each daily average is based on 7 or 8 observations, and each total mean is based on about 130 observations.
arithmetic operations, (3) number of mental operations, (4) number of chunks processed in working memory, and (5) number of symbols processed in working memory. None of these variables was able to account adequately for the rapid increase in time with problem size, but the two measures that did best were number of chunks and number of symbols processed in working memory, with the latter variable (shown in Fig. 16) predicting best (RMSD = 7.6 sec). The fitted rates were 482 msec/symbol, 1082 msec/chunk, and 3222 msec/mental operation. The magnitude of these parameters seems well in line with what is generally known about the speed of mental operations (Chase, 1978).

Our model thus seems to be a good first approximation of the speed of AB's squaring algorithm. The model still does not predict a fast enough increase in solution time with problem complexity; however, we think that most of this discrepancy can be accounted for with further refinements of the model. Specifically, we think that we need to measure separately the speeds of the various mental operations in our model, rather than simply assume that all operations take the same amount of time. We are currently analyzing, at a finer grain, the basic processes of addition and multiplication, which are used in the more complex procedures.
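As an aside, the digit-to-consonant mnemonic AB uses to hold partial products (described a few paragraphs above) can also be sketched in code. Only the correspondences 7 = k, 3 = m, and 6 = g are given in the text; the remaining digit-consonant pairs below follow the conventional "major system," and they, like the tiny word list, are our assumptions.

```python
# A minimal sketch of a digit-to-consonant mnemonic of the kind AB uses.
# Only 7 -> k, 3 -> m, and 6 -> g are given in the text; the rest of the
# mapping follows the conventional "major system," and the toy vocabulary
# used to turn consonant strings into words is likewise an assumption.

DIGIT_TO_CONSONANTS = {
    "0": "sz", "1": "td", "2": "n", "3": "m", "4": "r",
    "5": "l", "6": "gj", "7": "kc", "8": "fv", "9": "pb",
}

VOCABULARY = ["key", "mug", "ham", "rock", "pie", "leaf"]   # hypothetical word list


def consonant_skeleton(word: str) -> str:
    """Map a word to its digit string, ignoring vowels and unmapped letters."""
    digits = []
    for ch in word.lower():
        for digit, consonants in DIGIT_TO_CONSONANTS.items():
            if ch in consonants:
                digits.append(digit)
                break
    return "".join(digits)


def encode(number: str) -> list[str]:
    """Greedily cover the digit string with words whose consonants spell it."""
    words, remaining = [], number
    while remaining:
        for word in VOCABULARY:
            skeleton = consonant_skeleton(word)
            if skeleton and remaining.startswith(skeleton):
                words.append(word)
                remaining = remaining[len(skeleton):]
                break
        else:
            raise ValueError(f"no word in the toy vocabulary covers {remaining!r}")
    return words


if __name__ == "__main__":
    print(encode("736"))   # ['key', 'mug'], matching the "key mug" example in the text
```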
Fig. 16. Observed and predicted solution time as a function of number of symbols processed in working memory.
Our model also makes predictions about error rates. We found that error rate was linear with the number of arithmetic operations. According to our model, each arithmetic operation that AB performs has a 2.7% chance of error. The overall error rates in the squaring procedure ranged from approximately 7% for 2-digit squares to approximately 45% for 5-digit squares.

The last, and perhaps the most interesting, analysis with respect to our theory of memory is that of the retrieval distance of various mental operations: How far back does AB have to go in memory to find inputs for his mental operations? This analysis has to be done within the framework of our model. That is, for problems of various sizes, we examined the trace of the model and computed retrieval distance in terms of how far back in the trace the inputs to the current operation occurred. We generated a trace for three problems (345², 3,456², and 34,567²), and the inputs for every mental operation were classified according to how many mental operations back they occurred in memory. Figure 17 shows the frequency of retrieval distance in operations; the distributions for the three problem sizes were combined because they were indistinguishable. Inputs that required decoding a mnemonic or other external memory aid are indicated in the figure with a dot.

There are some interesting things to notice about these data. First, most of the inputs for mental operations come from very recent mental operations. In fact, over half the inputs for a mental operation come from the
Fig. 17. Frequency distribution of retrieval distance (in operations) for AB's squaring algorithm. These frequencies are derived from three problems: 345², 3456², and 34567². Xs are retrievals without mnemonic aids, and dots are retrievals with the aid of a mnemonic.
immediately preceding operation. Second, inputs that were stored and retrieved with the aid of a mnemonic are retrieved over much longer distances.

From this analysis, we were surprised at how well AB's squaring procedure keeps the inputs for operations close in time. That is, AB's squaring procedure seems to have been designed to minimize working memory demands by deriving inputs to mental operations from immediately preceding operations. Even so, the squaring procedure is too complex to
keep everything in short-term memory. Partial products must be stored for fairly long periods (and with many intervening mental operations) before they are needed again. Under these circumstances, AB has resorted to mnemonics. Finally, we point out that even though the logic of AB's squaring algorithm is recursive, recursion is very expensive in terms of memory load. AB has devised a complex procedure, the logic of which is iterative rather than recursive, to avoid the memory problems associated with recursion.

B. THE MEMORY OF A WAITER
Ericsson and Polson (in preparation) have studied a waiter (JC) who is able to take up to 17 menu orders without any form of memory aid. The main focus of this research has been to describe the performance of this waiter in an experimentally controlled environment and to characterize the cognitive processes and structures underlying this memory feat.

The initial phase of this study was concerned with finding an experimental analog of the restaurant environment. The people at the table in the restaurant were simulated by pictures of faces, and each order was read by an experimenter as the waiter pointed to the corresponding picture. To mimic the restaurant situation, JC was allowed to ask for repetitions of items. JC controlled the rate at which he took orders, and he was timed until he signaled the experimenter that he was ready to recall. Each order consisted of a main course of a meat dish (eight alternatives) cooked to a certain temperature (five alternatives), with a starch (three alternatives) and a choice of salad dressing (five alternatives), and, during the first part of the experiment, also a beverage (nine alternatives). The beverage item was later omitted because JC argued that beverage orders are taken separately for dinners. Orders were generated randomly by a computer program. According to our subject JC, the experimental situation is much harder than the restaurant situation because of the randomness of the orders: in the restaurant situation a relatively small number of the possible combinations is frequent.

The experimental sessions consisted of two blocks, each containing tables of three, five, and eight people presented in random order. JC was instructed to proceed as rapidly as possible without making errors when recalling the collection of orders. During some sessions JC was instructed to "think aloud" while doing the task. JC was also tested for his memory of the orders at the end of the experimental session.

Even though our experimental analysis of JC's memory skill is not complete, we have found considerable evidence for the skilled memory
mechanisms described earlier. In our laboratory situation we were able to show that JC was able to perform the memory task with few, if any, errors in recall. The average presentation time for the first five sessions (5 items per order) is given in Fig. 18. In the same figure we have plotted the average times for Sessions 12-14 and Sessions 24-32, which are both based on 4 items per order. The presentation time is short, and for the sessions with the most practice it approaches the reading time for orders from a table of three people. We can also see a reliable decrease in presentation time as a function of practice. It may appear somewhat unexpected to find such a large speedup, given that JC had been taking orders without notes for several years prior to the experimental sessions. However, there appears to be little pressure for increasing encoding speed in the restaurant situation beyond the rate at which people are able to generate orders, and this rate is relatively slow.

One of the difficulties in remembering dinner orders for normal people is the similarity between the orders. JC avoids interference by capitalizing on the redundancy created by similar items. From our thinking-aloud protocols it is clear that JC, at the time an order for a person is read to him, reorganizes this information into sublists with items of a given
Fig. 18. Speed of taking orders for the skilled waiter. The SEs for the above points range from 1.5 sec to 13.8 sec, and the average SE is 6.5 sec.
category. Each sublist contains four items or fewer. For salad dressings JC uses the initial letters and searches for patterns or meaningful abbreviations or words. For example, JC once encoded "Blue cheese-Oil and vinegar-Oil and vinegar-Thousand islands" as B-O-O-T, or "boot." For temperatures, JC is sensitive to the dimension of rareness, which ranges from rare to well done, and encodes progressions and other types of patterns as well. There are only three different kinds of starches, and therefore there is a high probability of some kind of pattern occurring. JC encodes many other kinds of information about the "spatial" position of the person making the order and relationships between the ordered items and the person making the order. However, the within-category encoding appears to be his principal means of encoding.

One piece of evidence for JC's coding strategy comes from the order in which he gives his immediate recall. In recalling orders from tables with five and eight people he does not preserve the presentation order of the items. Instead, JC recalls all salad dressings first and then all entrees, temperatures, and starches. For a table of three persons JC originally recalled the information as presented (i.e., entree, temperature, starch, and salad dressing) for each order before moving on to the next order. Recently, JC has changed to within-category recall even for three-person orders. We are now conducting experiments designed to demonstrate the priority of the within-category encoding more directly.

We have also studied JC's memory for orders after the session. After Session 1 we reconstructed the pictures corresponding to the first table of five people, and JC could accurately recall 10 items, or 40% of the presented items. He recalled the encoding for the salad dressings (COOBB) and a few isolated items, but not a single complete order. Then we reconstructed the second and last table of five persons, and JC recalled the presented information perfectly. This is suggestive evidence that a subsequent encoding of an order from a table with the same number of people leads to a reduction of memory for the initial encoding. After Session 3 we asked JC to recall as much as possible about salad dressings. From the most recent set of tables (Block 2) JC recalled 14 items (88%) without regard to order, or 11 items (69%) if the order within a table had to be exactly correct. From the first block of tables JC recalled 4 items (25%). It should be noted that a similar low level of recall might have been obtained for our digit-span experts if they had had to rely on episodically based recall.
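JC's within-category encoding of the salad dressings (the B-O-O-T example above) can be made concrete with a short sketch. The initial-letter scheme and the example order come from the text; the small word list used to spot a word-like pattern is our own hypothetical stand-in.

```python
# A sketch of JC's initial-letter encoding for salad dressings.
# The B-O-O-T example follows the text; the word list used to detect a
# word-like pattern is a hypothetical stand-in.

KNOWN_WORDS = {"BOOT", "BOB", "TOT"}          # hypothetical pattern vocabulary


def initial_letters(dressings: list[str]) -> str:
    """Collapse one dressing per customer into a string of initial letters."""
    return "".join(d[0].upper() for d in dressings)


def describe(dressings: list[str]) -> str:
    letters = initial_letters(dressings)
    spelled = "-".join(letters)               # e.g., "B-O-O-T"
    if letters in KNOWN_WORDS:
        return f"{spelled} (read as the word {letters!r})"
    return spelled                            # fall back on the raw letter string


if __name__ == "__main__":
    table = ["Blue cheese", "Oil and vinegar", "Oil and vinegar", "Thousand islands"]
    print(describe(table))                    # B-O-O-T (read as the word 'BOOT')
```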
C. SENTENCE MEMORY

Most of the above demonstrations of skilled memory refer to skills that only a small portion of the general population ever acquires. This raises
the issue of whether all adults are able to acquire and exhibit skilled memory. To address this concern, Ericsson and Karat (1981) set out to search for evidence of skilled memory in a domain where all adults have developed a skill. The most obvious skill that all normal adults have is their ability to comprehend and generate meaningful language. In most respects we can compare the language skills of any human adult with other complex skills, such as chess.

To make our study as comparable as possible to the earlier work of Chase and Ericsson, we decided to use the methodology of measuring memory spans. We read sequences of words to subjects for immediate verbatim recall. We wanted to demonstrate a finding analogous to that of Chase and Simon (1973a), that for scrambled chess pieces on a chessboard the chess master is no better than a novice in immediate recall of chessboards. We thus compared subjects' immediate recall for meaningful sentences with that for the same words presented in a random, scrambled sequence. From an extensive literature we know that normal subjects' memory span for unrelated words is on average about 6 words. Although we have not been able to find any attempts to measure people's memory spans for meaningful sequences of words (i.e., sentences), it is clear from several studies and experiments that the span should be considerably higher, 10-12 words or more.

The class of meaningful sentences is not well defined, so we did not attempt to generate the sequences ourselves. Like other investigators of skilled performance, we collected instances (sentences) from real life. We sampled sentences of different lengths from two sources. The first source was two collections of short stories. The second source was three novels by John Steinbeck. We copied these sentences and only substituted pronouns for names. We generated scrambled word sequences by randomizing the order of words in these sentences. The subjects were first given a series of sentences, and then a series of scrambled sequences. All sequences of words were read at a constant rate (1 word/sec) in a monotone voice except for the last word, which was stressed to signal the subjects to write down the sequence verbatim.

In Fig. 19 we have plotted the percentage of perfectly recalled sequences as a function of the number of presented words. Each point corresponds to averages based on more than 15 subjects' responses to five or more different sequences (for more details see Ericsson & Karat, 1981). A measure of memory span is the number of words an "average" subject will correctly recall half of the time. The memory span for scrambled sequences is between 6 and 7 words, whereas the memory span for meaningful sequences (i.e., sentences) is about 14 words. The difference is statistically reliable.
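This span measure (the sequence length an average subject recalls perfectly half of the time) can be computed by linear interpolation between adjacent list lengths, as sketched below. The proportions in the example are made up purely for illustration, and the interpolation rule is a common convention rather than necessarily the exact procedure Ericsson and Karat used.

```python
# Estimate memory span as the sequence length at which the proportion of
# perfectly recalled sequences crosses 50%, interpolating linearly between
# adjacent lengths. The proportions below are illustrative, not real data.

def estimate_span(prop_correct: dict[int, float]) -> float:
    lengths = sorted(prop_correct)
    for shorter, longer in zip(lengths, lengths[1:]):
        p1, p2 = prop_correct[shorter], prop_correct[longer]
        if p1 >= 0.5 >= p2:                      # the 50% point lies in this interval
            return shorter + (p1 - 0.5) / (p1 - p2) * (longer - shorter)
    raise ValueError("performance never crosses 50% in the tested range")


if __name__ == "__main__":
    # Hypothetical proportions of perfectly recalled scrambled sequences.
    scrambled = {5: 0.80, 6: 0.60, 7: 0.40, 8: 0.15, 9: 0.05}
    print(round(estimate_span(scrambled), 1))    # ~6.5 words, between 6 and 7
```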
Fig. 19. Percentage of correctly recalled scrambled sequences as a function of the number of words in each sequence.
1. Coding
Some interesting results support the hypothesis that the words are not encoded and stored as units, but encoded in some other form. The almost linear relationship between the number of words in a sentence and the percentage of recall was based on averages over many sentences. Among these sentences we can find individual sentences for which this relationship does not hold. We found exceptionally difficult shorter sentences, such as the following 12-word sentence that less than a third of our subjects recalled correctly: "He had won a few dollars from a guard on the flatcar." (The italicized words were frequently altered, with the rest of the sequence recalled correctly.) On the other hand, several sentences with 20 words were recalled correctly by more than half our 15 subjects. In a subsequent experiment we included sentences of up to 30 words. One of the 26-word sentences was recalled correctly by 4 subjects, and the following 28-word sentence was recalled correctly by 2 subjects: "She brushed a cloud of hair out of her eyes with the back of her glove and left a smudge of earth on her cheek in doing it."

Further evidence is obtained from a preliminary analysis of errors. Subjects virtually always recall sentences that are semantically consistent with the presented sentence. Most errors concern lexical substitutions that
do not affect meaning, such as exchanging definite and indefinite articles and exchanging prepositions. Sometimes modifiers, such as adjectives and adverbs, are omitted.

2. Postsession Recall
In one experiment we wanted to test subjects' incidental long-term memory for presented sentences, for which substantial memory would be expected, versus scrambled words, for which little or no memory would be expected. We alternated sentences and scrambled sequences and asked for immediate written recall after each sequence. The major difference from earlier experiments was that we asked the subjects unexpectedly for cued recall of all the presented information afterward. A unique word from each sentence and each scrambled word sequence was presented in random order. Subjects were asked to recall as much as they could about the corresponding sequence. They were asked to underline parts of sequences they felt confident were verbatim.

The main result from this experiment is that subjects' cued recall of the sentences is remarkably high, but their recall of scrambled word sequences is essentially nil. In only 12% of the cases could subjects recall anything from the scrambled sequences, and in only 4% of the cases were they able to recall more than a single word. In contrast, sentences were recalled 79% of the time, with subjects mostly able to recall more than half of the presented words. This clearly suggests to us that a single cue word was able to access an integrated representation rather than just a single chunk or subunit. In a pilot study subjects were given only a free-recall instruction, and these subjects were able to recall only a few sentences. The superiority of cued recall indicates some interesting restrictions on when memory for the sentences can be accessed and used.

Another aspect of skilled memory was demonstrated in this experiment, namely, the ability to monitor the correctness of one's memory. Recall was almost 90% for words that subjects underlined to mark confidence that these words were verbatim. The corresponding percentage for words not underlined was only about 55%. This shows a highly reliable ability to assess the correctness of recall. In another experiment we had subjects underline verbatim parts of their immediate recalls. Underlined words were correct 96% of the time and nonunderlined words were correct 75% of the time.

3. Individual Differences

Our experiments have also consistently found systematic individual differences in the ability to recall sentences. Using traditional methods for
calculating span, we find the span for words in sentences to range from about 11.0 to about 20.5 words for different subjects. When we analyze our data in terms of the number of perfectly recalled sentences or the percentage of recalled words, we find reliable individual differences as well.

In the last experiment we attempted to explore the source of these reliable individual differences in the ability to recall sentences. According to the skilled memory model, the best predictor of people's ability to recall sentences verbatim is their level of language skill, which we attempted to assess with a test measuring correct language use and a test of verbal reasoning. To evaluate mediation by general achievement and intelligence, subjects were also given a test of numeric reasoning. Following our earlier procedure, we had subjects recall sentences and scrambled words. A regression analysis showed that the number of perfectly recalled sentences could best be predicted by a linear combination of language skill scores and the number of perfectly recalled scrambled sequences. It is interesting that language usage and verbal reasoning were unrelated to recall of scrambled sequences, which suggests that at least two independent factors underlie the ability to recall sentences: language skill and efficiency of rehearsal.
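The form of this regression analysis can be sketched with ordinary least squares. The scores below are fabricated placeholders used only to show the shape of the computation (predicting perfectly recalled sentences from a language-skill score and from scrambled-sequence recall); they are not the authors' data.

```python
# Ordinary least-squares sketch of the individual-differences analysis:
# predict number of perfectly recalled sentences from a language-skill score
# and from the number of perfectly recalled scrambled sequences.
# All numbers are hypothetical placeholders, not the authors' data.

import numpy as np

language_skill   = np.array([42., 55., 61., 48., 70., 66., 50., 58.])
scrambled_recall = np.array([ 3.,  4.,  6.,  3.,  5.,  6.,  4.,  5.])
sentence_recall  = np.array([ 8., 11., 14.,  9., 15., 15., 10., 12.])

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(language_skill), language_skill, scrambled_recall])
coef, *_ = np.linalg.lstsq(X, sentence_recall, rcond=None)

predicted = X @ coef
r = np.corrcoef(predicted, sentence_recall)[0, 1]
print("intercept, b_language, b_scrambled =", np.round(coef, 3))
print("multiple R =", round(float(r), 3))
```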
V. Conclusion
In our work over the past 3 years, we have tried to discover the cognitive mechanisms underlying skilled memory performance. We have shown that skilled individuals are able to associate information to be remembered with the large knowledge base in their domain of expertise and, further, to index that information properly for later retrieval. In addition, practice at storing and retrieving information causes these processes to speed up. The major theoretical point we wanted to make here is that one important component of skilled performance is the rapid access of a sizable set of knowledge structures that have been stored in directly retrievable locations in long-term memory. We have argued that these ingredients produce an effective increase in working memory capacity for that knowledge base.

The question arises as to what, exactly, working memory is. In part, there is a problem of definition and, in part, there is still considerable doubt about the mechanisms of working memory. For the sake of terminology, we suggest that working memory has at least the following components: (1) short-term memory, which provides direct and virtually immediate access to very recent or attended knowledge states; (2) intermediate-term memory, the task-specific retrieval structure in long-term
memory, which provides direct and relatively fast access to knowledge states; and (3) context, which contains structures for controlling the flow of processing within the current task and provides relatively fast and direct access to knowledge structures relevant to the current task and context. The auditory and visual-spatial buffers are also important components of working memory, although they are not the focus of this article. The focus of our article has been on the important role of retrieval structures as working memory states.

ACKNOWLEDGMENTS

This research was supported by contract number N00014-81-0335 from the Office of Naval Research. We are grateful to Arthur Benjamin, Dario Donatelli, and Steve Faloon for serving as expert subjects.
REFERENCES

Akin, O. The psychology of architectural design. London: Pion, in press.
Anderson, J. R., & Bower, G. H. Human associative memory. New York: Holt, 1974.
Atkinson, R. C., & Shiffrin, R. M. Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory. New York: Academic Press, 1968.
Bachelder, B. L., & Denny, M. R. A theory of intelligence: I. Span and the complexity of stimulus control. Intelligence, 1977, 1, 127-150. (a)
Bachelder, B. L., & Denny, M. R. A theory of intelligence: II. The role of span in a variety of intellectual tasks. Intelligence, 1977, 1, 237-256. (b)
Baddeley, A. D. The psychology of memory. New York: Basic Books, 1976.
Baddeley, A. D. The concept of working memory: A view of its current state and probable future development. Cognition, 1981, 10, 17-23.
Baddeley, A. D., & Hitch, G. Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation. New York: Academic Press, 1974.
Biederman, I. Perceiving real world scenes. Science, 1972, 177, 77-80.
Bjork, R. A. Short-term storage: The ordered output of a central processor. Hillsdale, New Jersey: Erlbaum, 1975.
Bower, G. H. Mental imagery and associative learning. In L. W. Gregg (Ed.), Cognition in learning and memory. New York: Wiley, 1972.
Bower, G. H., Black, J. B., & Turner, T. J. Scripts in memory for text. Cognitive Psychology, 1979, 11, 177-220.
Bransford, J. D., & Johnson, M. K. Considerations of some problems of comprehension. In W. G. Chase (Ed.), Visual information processing. New York: Academic Press, 1973.
Broadbent, D. E. The magical number seven after fifteen years. In A. Kennedy & A. Wilkes (Eds.), Studies in long-term memory. New York: Wiley, 1975.
Charness, N. Memory for chess positions: Resistance to interference. Journal of Experimental Psychology: Human Learning and Memory, 1976, 2, 641-653.
Charness, N. Components of skill in bridge. Canadian Journal of Psychology, 1979, 33, 1-50.
Chase, W. G. Elementary information processes. In W. K. Estes (Ed.), Handbook of learning and cognitive processes. Hillsdale, New Jersey: Erlbaum, 1978.
Chase, W. G., & Ericsson, K. A. Skilled memory. In J. R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, New Jersey: Erlbaum, 1981.
Chase, W. G., & Simon, H. A. Perception in chess. Cognitive Psychology, 1973, 4, 55-81. (a)
Chase, W. G., & Simon, H. A. The mind's eye in chess. In W. G. Chase (Ed.), Visual information processing. New York: Academic Press, 1973. (b)
Chi, M. T. H. Knowledge structures and memory development. In R. S. Siegler (Ed.), Children's thinking: What develops? Hillsdale, New Jersey: Erlbaum, 1978.
Chiesi, H. L., Spilich, G. J., & Voss, J. F. Acquisition of domain-related information in relation to high and low domain knowledge. Journal of Verbal Learning and Verbal Behavior, 1979, 18, 257-273.
Craik, F. I. M., & Watkins, M. J. The role of rehearsal in short-term memory. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 599-607.
Daneman, M., & Carpenter, P. A. Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 1980, 19, 450-466.
de Groot, A. D. Perception and memory versus thought: Some old ideas and recent findings. In B. Kleinmuntz (Ed.), Problem solving: Research, method, and theory. New York: Wiley, 1966.
Egan, D. E., & Schwartz, B. J. Chunking in recall of symbolic drawings. Memory and Cognition, 1979, 7, 149-158.
Eisenstadt, M., & Kareev, Y. Aspects of human problem solving: The use of internal representations. In D. A. Norman & D. E. Rumelhart (Eds.), Explorations in cognition. San Francisco, California: Freeman, 1975.
Ellis, S. H. Structure and experience in the matching and reproduction of chess patterns. Unpublished doctoral dissertation, Carnegie-Mellon University, 1973.
Engle, R. W., & Bukstel, L. Memory processes among bridge players of differing expertise. American Journal of Psychology, 1978, 91, 673-690.
Ericsson, K. A., Chase, W. G., & Faloon, S. Acquisition of a memory skill. Science, 1980, 208, 1181-1182.
Ericsson, K. A., & Karat, J. Memory for words in sequences. Paper presented at the 22nd Annual Meeting of the Psychonomic Society, Philadelphia, Pennsylvania, 1981.
Frey, P. W., & Adesman, P. Recall memory for visually-presented chess positions. Memory and Cognition, 1976, 4, 541-547.
Goldin, S. E. Effects of orienting tasks on recognition of chess positions. American Journal of Psychology, 1978, 91, 659-672.
Goldin, S. E. Recognition memory for chess positions. American Journal of Psychology, 1979, 92, 19-31.
Halliday, M. A. K. Intonation and grammar in British English. The Hague: Mouton, 1967.
Hatano, G., & Osawa, K. Digit span of grand experts in abacus-derived mental computation. Paper presented at the 3rd Noda Conference on Cognitive Science, 1980.
Hunt, E., & Love, T. How good can memory be? In A. W. Melton & E. Martin (Eds.), Coding processes in human memory. New York: Holt, 1972.
Hunter, I. M. L. An exceptional talent for calculative thinking. British Journal of Psychology, 1962, 53, 243-258.
Hunter, I. M. L. Mental calculation. In P. C. Wason & P. N. Johnson-Laird (Eds.), Thinking and reasoning. Baltimore, Maryland: Penguin, 1968.
Jeffries, R., Turner, A. A., Polson, P. G., & Atwood, M. E. The processes involved in designing software. In J. R. Anderson (Ed.), Cognitive skills and their acquisition. Hillsdale, New Jersey: Erlbaum, 1981.
Kintsch, W. Comprehension and memory for text. Colloquium talk, Carnegie-Mellon University, October, 1981.
Klatzky, R. L. Human memory: Structures and processes (2nd ed.). San Francisco, California: Freeman, 1980.
Lane, D. M., & Robertson, L. The generality of levels of processing hypothesis: An application to memory for chess positions. Memory and Cognition, 1979, 7, 253-256.
Lorayne, H., & Lucas, J. The memory book. New York: Ballantine, 1974.
Luria, A. R. The mind of a mnemonist. New York: Avon, 1968.
Martin, P. R., & Fernberger, S. W. Improvement in memory span. American Journal of Psychology, 1929, 41, 91-94.
McKeithen, K. B., Reitman, J. S., Rueter, H. H., & Hirtle, S. C. Knowledge organization and skill differences in computer programmers. Cognitive Psychology, 1981, 13, 307-325.
Miller, G. A. The magical number seven, plus or minus two. Psychological Review, 1956, 63, 81-97.
Müller, G. E. Zur Analyse der Gedächtnistätigkeit und des Vorstellungsverlaufes: Teil I. Zeitschrift für Psychologie, Ergänzungsband 5, 1911.
Neisser, U. Cognitive psychology. New York: Appleton, 1967.
Newell, A., & Simon, H. A. Human problem solving. New York: Prentice-Hall, 1972.
Norman, D. A., & Bobrow, D. G. An intermediate stage in memory retrieval. Cognitive Psychology, 1979, 11, 107-123.
Pike, K. The intonation of American English. Ann Arbor: University of Michigan Press, 1945.
Raaijmakers, J. G. W., & Shiffrin, R. M. Search of associative memory. Psychological Review, 1981, 88, 93-134.
Rayner, E. H. A study of evaluative problem solving. Part 1: Observations on adults. Quarterly Journal of Experimental Psychology, 1958, 10, 155-165.
Reitman, J. Skilled perception in Go: Deducing memory structures from inter-response times. Cognitive Psychology, 1976, 8, 336-356.
Salis, D. The identification and assessment of cognitive variables associated with reading of advanced music at the piano. Unpublished doctoral dissertation, University of Pittsburgh, Pittsburgh, Pennsylvania, 1977.
Schneider, W., & Shiffrin, R. M. Controlled and automatic human information processing. I. Detection, search and attention. Psychological Review, 1977, 84, 1-66.
Scripture, E. W. Arithmetical prodigies. American Journal of Psychology, 1891, 4, 1-59.
Shiffrin, R. M. Capacity limitations in information processing, attention and memory. In W. K. Estes (Ed.), Handbook of learning and cognitive processes (Vol. 4). Hillsdale, New Jersey: Erlbaum, 1976.
Shneiderman, B. Exploratory experiments in programmer behavior. International Journal of Computer and Information Sciences, 1976, 5, 123-143.
Sloboda, J. Visual perception of musical notation: Registering pitch symbols in memory. Quarterly Journal of Experimental Psychology, 1976, 28, 1-16.
Tulving, E. Relation between encoding specificity and levels of processing. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human memory. Hillsdale, New Jersey: Erlbaum, 1979.
Wickelgren, W. A. Size of rehearsal group and short-term memory. Journal of Experimental Psychology, 1964, 68, 413-419.
Williams, M. D. Retrieval from very long-term memory. Unpublished doctoral dissertation, University of California, San Diego, 1976.
Woodworth, R. S. Experimental psychology. New York: Holt, 1938.
Yates, F. A. The art of memory. London: Routledge & Kegan Paul, 1966.
THE IMPACT OF A SCHEMA ON COMPREHENSION AND MEMORY

Arthur C. Graesser and Glenn V. Nakamura
CALIFORNIA STATE UNIVERSITY
FULLERTON, CALIFORNIA
I. Introduction
   A. What Is a Schema?
   B. How Do Schemas Function?
   C. Memory for Schema-Relevant versus -Irrelevant Information
II. Methods
   A. Preparation of Acquisition and Test Materials
   B. Measures of Memory
III. A Schema Copy Plus Tag Model
   A. Representational Assumptions and Predictions
   B. Guessing Assumptions and Predictions
   C. Retrieval Assumptions for Recall and Recognition
   D. Retention Assumptions and Predictions
IV. Some Issues Confronting the SC+T Model
   A. Does the Typicality Effect Occur for Different Types of Schemas?
   B. Does the Typicality Effect Persist When More than One Schema Guides Comprehension?
   C. Does the Typicality Effect Occur When Scripted Activities Are Videotaped?
   D. Is the Typicality Effect Influenced by Presentation Rate?
   E. Is the Typicality Effect Influenced by the Goals of the Comprehender?
   F. Does the Typicality Effect Occur in Ecologically Valid Settings?
   G. Are Unpresented Typical Items Inferred at Comprehension or at Retrieval?
V. The Fate of Four Alternative Models
   A. Problems with the Filtering Model
   B. Problems with the Attention-Elaboration Model
   C. Problems with the Partial Copy Model
   D. Problems with the Schema Pointer Plus Tag Model
VI. The Process of Copying Schema Nodes into Specific Memory Traces
   A. Activation of Inferred Actions via the Generic Script
   B. Activation of Inferred Actions via Stated Passage Actions
   C. Activation of Inferred Actions via the Activation of a Subchunk
   D. Predicting False Alarm Rates for Unstated Script Actions
VII. Questions for Further Research
References
I. Introduction
How do generic knowledge structures (called schemas) influence the encoding and retrieval of meaningful stimulus input? This fundamental question has recently received a great deal of attention in several subareas of psychology, ranging from perception and memory to social psychology. As a consequence of this collective enthusiasm, there are literally volumes of data, speculations, and theorizing on the role of schemas in information processing (Adams & Collins, 1979; Anderson, 1977; Bobrow & Norman, 1975; Bregman, 1977; Flavell, 1963; J. Mandler, 1979, 1982; Minsky, 1975; Norman & Bobrow, 1976, 1979; Rumelhart & Ortony, 1977; Spiro, 1977; Taylor & Crocker, 1981; Thorndyke & Hayes-Roth, 1979; Thorndyke & Yekovich, 1980). This article will not provide an exhaustive summary of this literature. An entire article would be needed to discuss the historical roots and the current status of the schema construct in research and theory. We shall propose a schema-based model of encoding and retrieval in this article, but shall fall short of achieving a complete and satisfactory explanation.

One question that a schema-based model must address is how the comprehender processes information that is typical versus atypical of a central organizing schema. There rarely is a perfect match between specific input and the schemas that the comprehender identifies and applies to the specific input. Some information is directly inconsistent with a schema, whereas other information is simply irrelevant. One of the highlights of our proposed schema-based model is that it explains memory for information that varies in typicality with respect to a schema.

A. WHAT IS A SCHEMA?
Researchers do not entirely agree on what schemas are. This fact has been unsettling to some researchers. It has also led to some debate and confusion as to whether it is appropriate to use the schema construct to guide theory and research. We acknowledge this problem, but at the same time we encourage psychologists to pursue their schema-based models and to clarify the essence of the theoretical construct. At some point, psychologists may converge on a "schema for schema" that is adequately developed, articulated, and shared by everyone. Until then, researchers should specify in some detail what they mean by a schema.

For the purposes of this article, we shall adopt a relatively broad definition of a schema that would be accepted by most researchers. Schemas are generic knowledge structures that guide the comprehender's interpretations, inferences, expectations, and attention. A schema is generic
in that it is a summary of the components, attributes, and relationships that typically occur in specific exemplars. A schema for eating at a restaurant differs from the specific memory trace constructed when an individual eats at a specific restaurant at a specific time and place. A schema consists of knowledge in that its properties typically apply to its referent. Thus, its components, attributes, and relationships may normally apply to a specific referent, but need not apply out of necessity. The content of a schema is highly structured, rather than being simply a list of features or properties.

It is convenient to view schemas as having variables which are eventually filled as a schema guides the comprehension of specific input (Minsky, 1975; Rumelhart & Ortony, 1977). For example, the variables of a restaurant schema include character variables (customer, waitress, cook, hostess); object variables (table, chairs, food, menus); and action, plan, or goal variables (the customer orders food, the waitress serves the food, the customer eats). These variables are filled with contextually specific referents when someone comprehends a specific restaurant experience. A schema is instantiated when variables have been filled and conceptually interrelated in a specific context. Sometimes a specific variable is filled by default. For example, we normally infer that a cook prepared the food, even though we never see this task. A passage about someone eating at a restaurant might not state that the customer pays the bill, but this would be inferred by default by the comprehender. When one variable of an instantiated schema is filled, there may be repercussions on the other variables. For example, if a restaurant has a hostess, we would expect the cuisine to be relatively impressive. The variables affect one another in a complex but systematic way. When a schema is instantiated, the constructed memory trace is a highly integrated structure.

We shall assume that schemas represent many different knowledge domains. There are schemas for person stereotypes and roles (Hamilton, 1981; Reeder & Brewer, 1979; Taylor & Crocker, 1981), goal-oriented action sequences (Nelson, 1977; Schank & Abelson, 1977), spatial scenarios (Biederman, in press; Brewer & Treyens, 1981; Goodman, 1980; Palmer, 1975), and other knowledge domains. Some schemas are very abstract, such as story schemas (Mandler & Johnson, 1977; Rumelhart, 1975, 1977; Stein & Glenn, 1979; Thorndyke, 1977). Other schemas are more concrete and embody world knowledge, such as an action-based schema for eating at a restaurant or a visual-spatial schema for an office. Several terms have been invented to capture the unique properties of different types of schemas, for example, scripts, stereotypes, themes, macrostructures, models, frames, and memory organization packages
(MOPs). Scripts will receive a great deal of attention in this article. Scripts correspond to activities that one or more characters enact frequently (e.g., eating at a restaurant). The actions that characters perform in a script are ordered logically, conventionally, or in a manner constrained by the environment. Schank and Abelson (1977; Abelson, 1981) introduced the script construct and have specified the properties and functioning of scripts in detail.
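The idea of schema variables that are filled by contextually specific referents, or by default when the input never mentions them, can be made concrete with a toy data structure. This is only an illustration of the general notion described above; the slot names, defaults, and restaurant example are ours, not a formalism proposed by the authors.

```python
# A toy restaurant schema: variables (slots) with default fillers.
# Instantiating the schema fills slots from the specific input; any slot the
# input leaves open is filled by default, which is one way inferences arise.
# Slot names and defaults are illustrative assumptions, not the authors' model.

RESTAURANT_SCHEMA = {
    "customer":  None,            # must come from the input
    "server":    "a waitress",    # default fillers
    "cook":      "a cook",
    "food":      "a meal",
    "pays_bill": "the customer pays the bill",
}


def instantiate(schema: dict, stated: dict) -> dict:
    """Fill schema variables from the stated input, falling back on defaults."""
    trace = {}
    for slot, default in schema.items():
        if slot in stated:
            trace[slot] = {"filler": stated[slot], "source": "stated"}
        elif default is not None:
            trace[slot] = {"filler": default, "source": "inferred by default"}
        else:
            trace[slot] = {"filler": None, "source": "unfilled"}
    return trace


if __name__ == "__main__":
    passage = {"customer": "Jack", "food": "a steak"}   # the passage never mentions the cook
    for slot, info in instantiate(RESTAURANT_SCHEMA, passage).items():
        print(f"{slot:10} -> {info['filler']} ({info['source']})")
```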
B. HOW DO SCHEMAS FUNCTION?

A distinction will be made between two stages of schema utilization, called schema identification and schema application (see Norman & Bobrow, 1976; Schank & Abelson, 1977). During schema identification, the comprehender identifies a schema which provides a good fit to some aspects of the input. This is a process of pattern recognition. As information accrues in a data-driven fashion, the information matches the components, attributes, and relationships of a particular schema better than those of alternative schemas. Of course, ambiguities sometimes occur when the information matches more than one schema equally well.

It is important in experiments to determine whether the comprehender has identified the appropriate schema. When a schema title is provided for a passage, it is safe to assume that the comprehender has identified the appropriate schema. When no schema title is provided, then comprehenders must induce a schema or theme. Dramatic differences in comprehension and memory performance may emerge between conditions in which the schema has been identified and conditions in which it has not. For example, memory for passages substantially improves when the subject has identified the central schema for a passage that would otherwise be vague or ambiguous (Anderson, Spiro, & Anderson, 1978; Bransford & Johnson, 1973; Dooling & Lachman, 1971).

Once a schema is identified during comprehension, the schema application stage begins. During schema application, the schema guides processing in a conceptually driven fashion. Several events and processes occur during schema application. First, the schema influences the perception and interpretation of presented information. Experiences would be ambiguous or difficult to interpret if no schemas provided background knowledge. Second, the schema governs the attention that is allocated to elements in the stimulus material. In most conditions, more attention is devoted to information that deviates from the schema than to information that is relevant to the schema (Bellezza & Bower, 1981b, 1982; Friedman, 1979; Krueger, 1981; Loftus & Mackworth, 1978; Taylor & Crocker, 1981; den Uyl & van Oostendorp, 1980). For example, more attention
would be allocated to an octopus in a farm scene than to a tractor in a farm scene. However, when the comprehender is confronted with more than one activity simultaneously and intends to concentrate on only one activity, the schema associated with the target activity guides attention to schema-relevant information (Neisser, 1976; Neisser & Becklen, 1975). Third, the schema provides the background knowledge needed to generate inferences. As we mentioned, inferences occur when schema variables are filled by default. Fourth, schemas provide the knowledge base for formulating expectations about subsequent events during comprehension.

It should be apparent that schemas are very powerful and intelligent knowledge structures. During schema application, the schema imposes an interpretation on the input, guides attention, generates inferences, and formulates expectations.

C. MEMORY FOR SCHEMA-RELEVANT VERSUS -IRRELEVANT INFORMATION
When a specific passage or experience is comprehended, some information is relevant to a central organizing schema (typical), whereas other information is irrelevant or inconsistent (atypical). Tearing up a bill is inconsistent with a restaurant schema, and reading a letter is irrelevant. How well is typical versus atypical information retrieved from memory? We shall propose a model that attempts to explain the influence of typicality on memory. However, before turning to this model, we shall describe four alternative hypotheses or models that have predicted how typicality and memory are related. These are the filtering hypothesis, the attention-elaboration hypothesis, the partial copy model (Bower, Black, & Turner, 1979), and the schema pointer plus tag model (Graesser, 1981). There has been widespread disagreement on the effects of typicality on memory, as well as on an explanation of these effects.

According to the filtering hypothesis, schema-relevant information tends to be preserved in memory, whereas irrelevant information tends to be discarded. The schema organizes the typical information in a cohesive manner, with irrelevant information being loosely associated or not acquired. Many researchers have implicitly adopted a filtering hypothesis when deriving predictions about memory. This filtering hypothesis can be readily detected in memory studies that involve person stereotypes (Cantor & Mischel, 1977; Hamilton, 1981; Rothbart, 1981; Taylor & Crocker, 1981), prose passages (Bransford, 1979; Kintsch & Van Dijk, 1978; Spilich, Vesonder, Chiesi, & Voss, 1979), perception of real-world scenes (Brewer & Treyens, 1981), and chess configurations (Goldin, 1978).
Researchers have not controlled for guessing in most studies that allegedly support the filtering hypothesis. Suppose that a student were asked to recall all the actions and gestures that a professor performed during a lecture on a specific day. The student would probably recall several lecture-related actions that the professor never performed. The student might erroneously recall that the professor opened a briefcase, yet the student would not erroneously recall an irrelevant action, such as taking an aspirin. Such guesses yield an unfair advantage to actions that are typical of a schema. Of the presented actions that are recalled, some would be accurately retrieved from the specific memory trace, whereas others would be guesses. Of course, the problem of guessing also occurs on recognition tests. An accurate assessment of memory involves a discrimination between presented and unpresented information.

In contrast to the filtering hypothesis, the attention-elaboration hypothesis predicts poorer memory for schema-relevant information than for irrelevant or inconsistent information. According to the attention-elaboration hypothesis, more cognitive resources are allocated to information that deviates in some way from the schema. When an individual allocates more cognitive resources to an item, there is an increase in attention, rehearsal, depth or breadth of conceptual elaboration, or the number of associations with other items. More resources are allocated to the deviations because they are difficult to relate to the mass of schema-relevant information. Since the comprehender allocates many resources to deviations from the schema, this information is later remembered better than typical information. Some researchers have implicitly or explicitly adopted the attention-elaboration hypothesis (Bobrow & Norman, 1975; Hastie, 1980; Hastie & Kumar, 1979; Srull, 1981).

The attention-elaboration hypothesis has received its share of support. Experiments have confirmed that atypical information draws more cognitive resources than does typical information when resources are measured by study times during prose comprehension (Bellezza & Bower, 1981b; den Uyl & van Oostendorp, 1980) and by eye movements during the inspection of visual input (Friedman, 1979; Loftus & Mackworth, 1978). Memory experiments have also supported the prediction that memory discrimination is poorer for typical than atypical information. Schema-inconsistent information is remembered better than schema-consistent information when passages involve scripts (Bower et al., 1979) or stereotypes (Hastie, 1980; Hastie & Kumar, 1979; Srull, 1981). There is better recognition memory for faces that have unusual features than for faces that are prototypical (Going & Read, 1979; Light, Kayra-Stuart, & Hollander, 1979). Irrelevant information is remembered better than schema-relevant information when the stimulus material involves pictorial scenes
(Friedman, 1979; Goodman, 1980), scripted passages (Graesser, 1981; Graesser, Gordon, & Sawyer, 1979; Graesser, Woll, Kowalski, & Smith, 1980), and stereotype-based descriptions of people (Woll & Graesser, 1982). However, the causal relationship between resource allocation and memory has generally been difficult to pin down (see Reynolds & Anderson, 1980), and there are reasons for doubting that resource allocation explains the effect of typicality on memory (Friedman, 1979; Graesser, 1981; Light et al., 1979).

The partial copy model was recently introduced by Bower, Black, and Turner (1979) in the context of scripted passages. According to this model, when a scripted passage is comprehended, two different memory codes are established. The first code, called the episodic memory structure, is simply a record of actions explicitly mentioned in the passage. Access to this episodic memory structure is restricted to shorter retention intervals. This code fades quickly over time at an exponential rate; it would not be accessible after a long retention interval such as a day. The second code is established in what Bower et al. call the knowledge store. The knowledge store corresponds to the generic script. Actions in the script are activated by the passage information. Explicit actions in the passage produce a high level of activation in the corresponding actions of the script. Script-relevant actions not stated in the passage receive a lower degree of activation. Memory discrimination via the general knowledge store decays independently from that of the episodic memory structure. After a long retention interval, the different levels of activation are virtually indistinguishable. Therefore, subjects would find it difficult to distinguish presented from unpresented script actions.

Bower et al. reported findings that support the partial copy model. First, unstated typical script actions had high false alarm rates on recognition tests and high intrusion rates on recall tests. False alarms are high for these items because they are activated in the generic script. Second, memory discrimination between stated versus unstated actions was higher for irrelevant actions than for typical actions. This predicted outcome also follows from the fact that unstated typical actions receive some activation during encoding, but unstated irrelevant actions do not. Third, unstated script actions had higher false alarm rates (recognition ratings) when the subject read more and more versions of a given scripted activity, for example, listening to one versus three passages of a "visit health professional" script. An unstated typical action should receive higher activation when the subject reads three scripts rather than one script, because the action would then receive several activations in the generic script rather than only one.

The schema pointer plus tag (SP+T) model also predicts better memory for information that is atypical than typical of the schema. However,
the explanation of this typicality effect is different from that of the attention-elaboration hypothesis and the partial copy model. The SP+T model bears some resemblance to the schema-with-correction hypothesis introduced decades ago (Woodworth, 1958; Woodworth & Schlosberg, 1954). According to Graesser's (1981) SP+T model, a specific memory trace is constructed when a schema-based passage (or experience) is comprehended. The memory trace consists of (a) a pointer to the generic schema, which interrelates both the stated and unstated schema-relevant information as a whole, and (b) a set of tags for information that is atypical of the schema. A tag is constructed for each atypical item, but only some typical items receive tags, namely, marginally typical actions. The untagged typical information is represented by the pointer to the schema. This implies that all of the generic schema is copied into the specific memory trace. The generic schema constitutes one chunk of information containing many typical actions; each tagged item constitutes an additional chunk of information. The other properties of the SP+T model need not be discussed at this point.

Several predictions of the SP+T model have been confirmed in memory experiments involving scripted passages (Graesser, 1981; Graesser et al., 1979, 1980; Smith & Graesser, 1981) and paragraphs organized around person stereotypes and roles (Woll & Graesser, 1982). First, memory discrimination between presented and unpresented items is better for atypical than typical information. Memory discrimination is poor for typical information because this information is often copied into the memory trace even when it is not stated in the passage. Second, there is no memory discrimination for very typical information. These very typical items would always be incorporated in the memory trace, even when unstated. Third, recognition false alarm rates and recall intrusions are higher for typical than atypical items. The unstated typical items would be copied into the memory trace by virtue of the pointer to the generic schema.

The four hypotheses and models discussed in this section do not adequately account for the effects of typicality on memory. The shortcomings will be enumerated in this article after we have reported some pertinent studies. In a later section we shall propose a model that provides an impressive fit to available data.
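The representational claim of the SP+T model just described (a pointer that copies the whole generic schema into the trace, plus explicit tags for atypical items) can be rendered as a toy recognition procedure. This sketch is our own paraphrase of that claim, not Graesser's implementation; the decision rule and the example items are assumptions.

```python
# A toy rendering of the schema pointer plus tag (SP+T) representation:
# the specific trace = a pointer to the generic schema (so every typical
# action is implicitly "in" the trace) + explicit tags for atypical items.
# The recognition rule below is our simplification, not Graesser's model.

GENERIC_RESTAURANT_SCRIPT = {
    "was seated at a table", "ordered a meal", "ate the meal", "paid the bill",
}


class SPTTrace:
    def __init__(self, schema: set[str], atypical_items: list[str]):
        self.schema_pointer = schema          # the whole generic schema is copied in
        self.tags = set(atypical_items)       # one tag per atypical (schema-irrelevant) item

    def seems_familiar(self, test_item: str) -> bool:
        """Typical items match via the pointer (even if never stated); atypical ones need a tag."""
        return test_item in self.schema_pointer or test_item in self.tags


if __name__ == "__main__":
    trace = SPTTrace(GENERIC_RESTAURANT_SCRIPT, ["put a pen in his pocket"])
    print(trace.seems_familiar("paid the bill"))            # True even if unstated -> false alarms
    print(trace.seems_familiar("put a pen in his pocket"))  # True, via its tag
    print(trace.seems_familiar("took an aspirin"))          # False: atypical and untagged
```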
II. Methods
This section summarizes the methods we have adopted when preparing acquisition materials and assessing memory. Specific methodological details and constraints were incorporated in virtually all of our studies on
schemas and memory. Some of these methodological details proved critical for an adequate assessment of memory.

A. PREPARATION OF ACQUISITION AND TEST MATERIALS
The acquisition materials were constructed systematically, and the acquisition items were scaled on a number of informative dimensions. In most of the studies, the acquisition materials were passages containing action sequences. Most of the actions were typical of the underlying schema, whereas some actions were irrelevant (atypical). In this section we assume that the acquisition materials are passages conveying scripted action sequences.

The typical actions in a scripted activity were a subset of actions drawn from a free generation set. Subjects in a free generation group were presented a script title, such as eating at a restaurant. The subjects wrote down actions that were typical of the script. An action was included in the free generation set if it was mentioned by at least two subjects in the free generation group. Each action of a script was scaled on generic recallability, which is the likelihood that an action is listed as a typical script action in a free generation task. An action was defined as typical of a script if it was a member of the script's free generation set. Free generation tasks have been used by other researchers as an empirical method of exposing the content of scripts (Bower et al., 1979) and stereotypes (Cantor, 1978).

The investigators supplied the atypical actions in each scripted activity. It is important to point out that the atypical actions were not bizarre, weird, or emotionally salient. The atypical actions were mundane actions that were simply irrelevant to the script. For example, putting a pen in the pocket was an atypical action in the scripted activity of eating at a restaurant.

For each scripted activity, the typical and atypical actions were rated on a typicality scale by a normative rating group of subjects. Each action was rated on the following 6-point typicality scale: (1) very atypical; (2) moderately atypical; (3) uncertain, but probably atypical; (4) uncertain, but probably typical; (5) moderately typical; and (6) very typical. For the scripted activities investigated, the ratings of the typical actions (i.e., those in the free generation set) ranged from 4.5 to 6.0, with a mean of 5.41. The ratings of the atypical actions ranged from 1.2 to 4.4, with a mean of 2.84. The generic typicality of an item was defined as its mean rating on this 6-point typicality scale.

A counterintuitive finding in previous studies has been the low correlation between generic recallability and generic typicality within the set of
typical actions. For scripts, we found a nonsignificant correlation, r = .10 (Graesser et al., 1980). The fact that an item is very typical of a schema does not ensure that the item will be articulated in a free generation task. A very typical action may be difficult to capture in words or may be inferred from other actions mentioned in a free generation protocol. In either case, the process of verbally articulating schematic knowledge is substantially different from the process of judging the typicality of information in the schema.

TABLE I
EXAMPLE PASSAGES AND TEST ACTIONS

Restaurant script (version A)
That evening Jack wanted to go out to dinner so he called a friend who recommended several good restaurants. Jack took a shower, went out to his car, picked up his girlfriend and gave his girlfriend a book. He stopped the car in front of the restaurant and had the valet park the car. They walked into the restaurant and sat for a few minutes in the waiting area until the hostess escorted them to their table. They sat down at the table, the waitress introduced herself, and they ordered cocktails. Jack talked to his girlfriend and asked her how her job was doing, and they decided on what to eat. Jack cleaned his glasses, paid the bill, and bought some mints. Then they left the restaurant and drove home.

Restaurant script (version B)
That evening Jack wanted to go out to dinner so he called a friend who recommended several good restaurants. Jack took a shower, put away his tennis racket, put on a jacket, and picked up his girlfriend. He stopped the car in front of the restaurant and had the valet park the car. Jack confirmed his reservations and they sat for a few minutes in the waiting area. Jack put a pen in his pocket and the hostess escorted them to their table. The waitress introduced herself and they ordered cocktails. Jack talked to his girlfriend and they decided on what to eat. They ordered dinner, ate their meal, and Jack picked up a napkin off the floor. Jack left a tip, left the restaurant, and they drove home.

Test actions for restaurant script
(1) Jack asked his girlfriend how her job was doing (Atypical-A)
(2) Jack ordered dinner (Typical-B)
(3) They sat down at the table (Typical-A)
(4) Jack gave his girlfriend a book (Atypical-A)
(5) Jack put on a jacket (Atypical-B)
(6) They walked into the restaurant (Typical-A)
(7) Jack left a tip (Typical-B)
(8) Jack cleaned his glasses (Atypical-A)
(9) Jack put a pen back in his pocket (Atypical-B)
(10) Jack confirmed his reservations (Typical-B)
(11) Jack picked up a napkin off the floor (Atypical-B)
(12) Jack bought some mints (Atypical-A)
(13) Jack went out to his car (Typical-A)
(14) Jack put away his tennis racket (Atypical-B)
(15) Jack paid the bill (Typical-A)
(16) They ate their meal (Typical-B)

After the actions were generated and scaled, scripted passages were prepared. There were always two versions (A and B) of each scripted activity. Version A contained a different sample of actions than version B. There were three sets of actions. Common typical actions were presented in versions A and B; these were context actions that were never analyzed or tested in later recall or recognition tasks. A set of A actions included typical and atypical actions presented in version A, but not in version B. Similarly, B actions were typical and atypical actions presented in version B, but not in version A. The A actions and B actions served as test actions for a scripted activity.

The rationale for having two versions of each scripted activity is important. This design feature permits an assessment of sophisticated guessing for each test action. In a recall task, a subject may recall a test action that was not presented. In a recognition task, a subject may decide that he or she experienced a test action that in fact was never presented. An estimate of guessing is essential for an assessment of what is remembered about a passage.

Table I shows the A and B versions of a restaurant script. Listed below these versions are the test actions associated with the scripted activity. Four typical test actions are presented in version A, but not in version B; four typical test actions are presented in B, but not in A; four atypical test actions are presented in A, but not in B; and four atypical test actions are presented in B, but not in A. For subjects who listened to version A, the typical and atypical A actions serve as target test actions and the typical and atypical B actions serve as nontarget actions. The reverse holds true for subjects presented version B. These design and counterbalancing constraints have been imposed in all the studies conducted in our laboratory.

The design features illustrated in Table I can be adopted for different types of acquisition materials. We have incorporated these design features for tape-recorded scripted activities, videotaped scripted activities, and personality descriptions organized around role and stereotype schemas. We have also constructed a Jack story, which involves a character named Jack who engages in several scripted activities.
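The counterbalancing scheme just described can be made concrete with a short sketch. The following Python fragment is our illustration, not software used in the original studies; the function names, the even split rule, and the data structures are assumptions introduced only to show how test actions are assigned to versions A and B and how target versus nontarget status follows from the version a subject receives.

import random

def build_versions(common_typical, typical_test, atypical_test, seed=0):
    """Split the typical and atypical test actions evenly between versions A and B.

    common_typical: context actions presented in both versions (never tested).
    typical_test, atypical_test: candidate test actions for the scripted activity.
    """
    rng = random.Random(seed)
    typical_test = list(typical_test)
    atypical_test = list(atypical_test)
    rng.shuffle(typical_test)
    rng.shuffle(atypical_test)
    half_t, half_a = len(typical_test) // 2, len(atypical_test) // 2

    version_a = common_typical + typical_test[:half_t] + atypical_test[:half_a]
    version_b = common_typical + typical_test[half_t:] + atypical_test[half_a:]
    test_actions = typical_test + atypical_test        # the A actions plus the B actions
    return {"A": version_a, "B": version_b, "test": test_actions}

def target_status(action, version_heard, versions):
    """A test action is a target if it appeared in the version the subject heard,
    and a nontarget (used to estimate sophisticated guessing) otherwise."""
    return "target" if action in versions[version_heard] else "nontarget"

Scoring a subject's recall protocol or recognition ratings then only requires looking up each test action's status for the version that subject received.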
B. MEASURES OF MEMORY
Both recall and recognition memory tests have been administered to subjects after they are presented the acquisition passages. When they
receive recall tests, they are given the script title and they write down as many actions as they can remember. The experimenter points out that the acquisition passage included actions that are typical of the script and actions that are irrelevant. The subjects are encouraged to recall both types of actions.

When a recognition test is administered, the subjects rate each test action on the following 6-point scale: (1) positive that the item was not presented; (2) fairly sure that the item was not presented; (3) uncertain, but guess that the item was not presented; (4) uncertain, but guess that the item was presented; (5) fairly sure that the item was presented; and (6) positive that the item was presented. Decisions of 4, 5, and 6 constitute YES judgments, whereas 1, 2, and 3 are NO judgments.

An appropriate assessment of memory involves a discrimination between presented actions and actions that were not presented in an acquisition passage. Performance on a recognition test improves to the extent that subjects say YES to target actions and NO to nontarget actions. A subject's hit rate is the likelihood of saying YES to presented (target) actions, whereas the false alarm rate is the likelihood of saying YES to unpresented (nontarget) actions. Recognition memory improves with an increase in the difference between hit rates and false alarm rates, that is, [p(hit) - p(false alarm)]. An alternative and more widely accepted measure of memory discrimination is a d' score (Green & Swets, 1966; Kintsch, 1977). Sometimes we shall refer to a Memory Score, which is a measure of memory that corrects for guessing:

Memory score = [p(hit) - p(false alarm)] / [1 - p(false alarm)]     (1)
When passages are tested by recall, analyses focus exclusively on the test actions (see Table I). A recall proportion consists of the likelihood of recalling a target test action. An intrusion proportion is the likelihood of recalling a nontarget test action. We do not score actions that are not test actions. Thus, when version A is presented, we score only intrusions that are B actions; when version B is presented, we score only intrusions that are A actions. Good memory discrimination consists of a high recall proportion and a low intrusion proportion. A d' score is normally not computed for recall, but Memory Scores may be computed. Analogous to formula 1, the Memory Score for recall is shown in formula 2:

Memory score = [p(recall) - p(intrusion)] / [1 - p(intrusion)]     (2)
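The measures defined by formulas 1 and 2 are easy to compute once each test action has been classified as a target or a nontarget. The sketch below is a minimal illustration in Python, assuming one subject's recognition ratings are stored in a dictionary keyed by test action; the helper names and data layout are ours, and only the YES criterion (ratings of 4-6), the Memory Score formula, and the standard d' formula come from the text.

from statistics import NormalDist

def is_yes(rating):
    return rating >= 4                      # ratings of 4, 5, and 6 count as YES judgments

def hit_and_false_alarm(ratings, targets, nontargets):
    hit = sum(is_yes(ratings[a]) for a in targets) / len(targets)
    fa = sum(is_yes(ratings[a]) for a in nontargets) / len(nontargets)
    return hit, fa

def memory_score(p_hit, p_false_alarm):
    # Formula (1); with recall data, substitute p(recall) and p(intrusion) as in formula (2).
    return (p_hit - p_false_alarm) / (1.0 - p_false_alarm)

def d_prime(p_hit, p_false_alarm, floor=0.01):
    # Equal-variance signal detection measure; proportions are clipped so that the
    # inverse-normal transform stays finite for perfect scores.
    z = NormalDist().inv_cdf
    def clip(p): return min(max(p, floor), 1.0 - floor)
    return z(clip(p_hit)) - z(clip(p_false_alarm))

For example, a hit rate of .73 with a false alarm rate of .22 yields a Memory Score of about .65 and a d' of about 1.4.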
III. A Schema Copy Plus Tag Model
In this section we will describe the assumptions of our schema copy plus tag (SC+T) model and report some data that support it. This model is almost the same as the schema pointer plus tag (SP+T) model described in Section I,C. In fact, the two models are identical except for one property. In the SP+T model, the memory trace contained a pointer to the generic schema as a whole. This constraint implies that all information in the generic schema is passed to the specific memory trace. However, it seems more plausible that only a subset of the information in a schema is passed to the memory trace. The SC+T model reflects our change in attitude.

According to the SC+T model, only a subset of the information in a generic schema is copied into the specific memory trace. Of course, an important issue is what schema information is copied into the memory trace. We shall address this issue later. For the present, we want to emphasize that much of the schema is copied into the memory trace, but usually not all of it.

A. REPRESENTATIONAL ASSUMPTIONS AND PREDICTIONS
Three assumptions of the SC+T model pertain to the representation and content of specific memory traces. According to the SC+T model, a specific memory trace is constructed whenever a specific passage (or experience) is comprehended at a specific time and place. We shall assume for the moment that the passage (or experience) is organized around one central schema. We shall also assume that passages serve as acquisition materials. Of course, the scope of our model would extend to experiences other than discourse comprehension.

Assumption 1
The memory representation for a passage contains a partial copy of the generic schema that best fits the input statements compared to the set of alternative schemas in memory. Very typical items are always copied into the trace, yielding a set of very typical items (1 to n).

Assumption 2
Some items in a passage are only moderately typical of a generic schema. Other items are relevant to a generic schema, but do not fit in with other typical items that are explicitly stated. These typical items are linked to the schema copy with tags (i.e., an associative relation that
signifies a contrast). There is a unique tag for each of these typical items, yielding a set of tagged typical items (1 to m).

Assumption 3
Some items in a passage are atypical of the generic schema and are linked to the memory representation with tags. There is a distinct tag for each atypical item, yielding a set of tagged atypical items (1 to q).

With these three assumptions, we are ready to make some predictions about memory for specific passages. One prediction is that memory discrimination should be better for tagged typical actions (assumption 2) and tagged atypical actions (assumption 3) than for actions that are not tagged (assumption 1). Since many typical actions are inferred by virtue of the schema content being copied into the memory trace, the subject would be unable to remember whether a typical action was explicitly stated or merely inferred. On the other hand, the tagged actions would be distinct and salient in memory in the form of a separate organizational unit which contrasts with the mass of typical information. All the atypical actions are tagged, yet only a subset of the typical actions are tagged. Since tagging directly predicts memory discrimination, memory discrimination will be better for atypical than typical actions.

A second prediction is that there should be no memory discrimination for the very typical actions. These actions would be inferred in virtually any scripted activity. The very typical actions (with a 6.0 typicality rating) would be copied into the memory trace when they are stated and when they are not stated in a passage. Subjects should be unable to decide correctly whether these actions were presented in a passage.

A third prediction is that false alarm rates and intrusion proportions should be higher for typical actions than atypical actions. A subset of the typical schema actions would be copied into the memory trace, even though these actions were never stated in the passage. These typical actions should evoke false alarms on recognition tests and intrusions on recall tests. However, unpresented atypical actions would not be copied into the memory trace, so their false alarm rates and intrusion proportions should be lower.

A fourth prediction is that hit rates and recall proportions should not vary with typicality in any simple, elegant manner. On the one hand, hit rates and recall proportions should increase with typicality because the likelihood of being copied into the memory trace increases with typicality. On the other hand, hit rates and recall proportions should decrease with typicality because the likelihood of an action being tagged decreases with typicality.
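Assumptions 1-3 and the predictions that follow from them can be summarized in a toy encoding routine. The fragment below is only a sketch under our own simplifying assumptions; the typicality cutoffs, the copy probability, and the recognition rule are placeholders chosen to make the three classes of items (copied, tagged typical, tagged atypical) visible, and they should not be read as parameter values of the SC+T model.

import random

VERY_TYPICAL = 5.5        # placeholder cutoffs on the 6-point typicality scale
ATYPICAL = 3.5

def encode_trace(stated_items, schema, rng=None, p_copy=0.8):
    """stated_items and schema are dicts mapping an item to its rated typicality."""
    rng = rng or random.Random(0)
    trace = {"copied": set(), "tags": set()}
    # Assumption 1: a partial copy of the best-fitting schema; very typical items are
    # always copied, other schema items are copied with some probability.
    for item, typicality in schema.items():
        if typicality >= VERY_TYPICAL or rng.random() < p_copy:
            trace["copied"].add(item)
    for item, typicality in stated_items.items():
        if typicality <= ATYPICAL:
            trace["tags"].add(item)          # Assumption 3: every atypical item is tagged
        elif item not in trace["copied"]:
            trace["tags"].add(item)          # Assumption 2: some stated typical items are tagged
    return trace

def judged_old(item, trace):
    # Copied schema content is judged "old" whether or not it was actually stated,
    # which is what produces false alarms to unpresented typical items; tagged items
    # are the only ones that support genuine discrimination.
    return item in trace["copied"] or item in trace["tags"]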
These four predictions of the SC+T model have been consistently supported in our previous experiments. We have confirmed all four predictions when the acquisition materials involved scripts (Graesser, 1981; Graesser et al., 1979; Graesser et al., 1980; Smith & Graesser, 1981) and roles or stereotypes (Woll & Graesser, 1982). The predictions were supported for recognition tests and for recall tests. Figure 1 summarizes the outcomes of the recognition studies we have conducted. Analogous trends would occur for recall by substituting recall proportions for hit rates, intrusion proportions for false alarms, and Memory Scores (see formula 2) for d’ scores. The predictions of the SC+T model are supported by these trends. First, there is better memory discrimination for atypical actions than typical actions (see d’ scores). Second, d’ scores are essentially zero for very typical actions (6.0 typicality ratings); the hit rates do not differ from false alarm rates for these actions.
Fig. 1. Hit rates, false alarm rates (A), and d' scores (B) as a function of the typicality of an item with respect to a schema.
Third, false alarm rates increase with typicality, particularly within the typicality interval of 4 to 6. Fourth, hit rates vary only modestly with typicality. In fact, hit rates have increased with typicality in some experiments, have decreased with typicality in other experiments, and have shown no change in still others. In summary, the four predictions of the SC+T model have been consistently supported in recognition and recall studies.

B. GUESSING ASSUMPTIONS AND PREDICTIONS
Loosely speaking, subjects are guessing when there are false alarms in a recognition test and intrusions in a recall test. Subjects often guess that typical actions were presented, but rarely guess that atypical actions were presented. Unfortunately, the term "guessing" has connotations that are inappropriate in the present context. The false alarms and intrusions are not products of coin tossing or random generations of responses. These guesses are based on schematic knowledge and reflect encoding processes (i.e., copying mechanisms), rather than processes invoked when there is a lack of information to serve as decision criteria. Despite these unfortunate connotations, we shall use "guessing" in this section for lack of a better word.

Correlational analyses have been performed on the typical actions (not the atypical) in order to examine guessing processes. Guessing on a recognition test can be predicted primarily by an action's generic typicality and secondarily by an action's generic recallability. For scripts, the correlation between false alarms (i.e., guessing on a recognition test) and generic typicality is substantial, r = .42, p < .05 (Graesser et al., 1980). The correlation between false alarms and generic recallability is low, but consistent, r = .19, p < .10 (Graesser et al., 1980). As we mentioned earlier, typical actions show a low or nonsignificant correlation between generic recallability and generic typicality.

Guessing on a recall test is predicted by an action's generic recallability, but not by its generic typicality. Typical actions in scripts show a robust correlation between intrusion proportions (i.e., guessing on a recall test) and generic recallability, r = .67, p < .05 (Graesser et al., 1980). Not surprisingly, both recall and free generation tasks share similar mechanisms for the articulation of linguistic codes. These articulation mechanisms do not map directly onto the generic typicality for the set of typical actions. Moreover, generic typicality does not significantly correlate with intrusion proportions, r = .03.

On recognition tests, subjects occasionally decide that very atypical nontarget actions were presented. These incorrect decisions suggest that subjects occasionally guess randomly on recognition tests. Obviously, an
individual would not infer the occurrence of an irrelevant action in a scripted activity.

The earlier SP+T model introduced three guessing assumptions which capture the above trends in recognition and recall tests (Graesser, 1981). These three assumptions will be incorporated into the SC+T model:

Assumption 4
The likelihood of guessing a typical item at recall increases with its generic recallability.

Assumption 5
The likelihood of guessing YES to a typical item at recognition increases with its generic typicality and, to a smaller extent, with its generic recallability.

Assumption 6
Individuals sometimes guess YES randomly on a recognition test, but this is unrelated to either generic typicality or generic recallability.
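The correlational pattern behind assumptions 4-6 is straightforward to verify given item norms. A minimal sketch, assuming the generic typicality and generic recallability norms and the observed false alarm and intrusion proportions for the unpresented typical actions are stored in parallel lists, is shown below; the variable and function names are ours, and statistics.correlation requires Python 3.10 or later.

from statistics import correlation

def guessing_correlations(generic_typicality, generic_recallability,
                          false_alarm_rate, intrusion_rate):
    """Pearson correlations over the set of typical test actions."""
    return {
        "recognition guessing vs. typicality": correlation(false_alarm_rate, generic_typicality),
        "recognition guessing vs. recallability": correlation(false_alarm_rate, generic_recallability),
        "recall guessing vs. recallability": correlation(intrusion_rate, generic_recallability),
        "recall guessing vs. typicality": correlation(intrusion_rate, generic_typicality),
    }

On the pattern reported above, the first and third correlations should be substantial (roughly .4 and .7 for scripts) and the remaining two small.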
C. RETRIEVAL ASSUMPTIONS FOR RECALL AND RECOGNITION
The retrieval assumptions in the SP+T model were different for recall and recognition tests. Such differences are also assumed to exist in the SC+T model. When subjects recall a scripted activity, they are presented a script title (eating at a restaurant) and are asked to recall all actions that were mentioned in the activity. When subjects complete a recognition test they receive test actions in addition to the script title. The retrieval cues are quite different for recall and recognition. It is not surprising, therefore, that the retrieval processes would differ.

When subjects recall a scripted activity, their recall is guided by an organized retrieval strategy. This retrieval is said to be conceptually driven. Since the subjects are prompted by a script title, the script has a substantial impact on the conceptually driven retrieval strategy. Conceptually driven retrieval often requires effort to accomplish as the individual invokes some strategy to access and decode the contextually specific memory trace.

Assumption 7
Recall of an item is directed by an organized retrieval strategy that is conceptually driven and influenced by the generic schema.

Recognition tasks involve more than conceptually driven retrieval. Some test actions may be retrieved by conceptually driven retrieval, but
others are accessed by data-driven retrieval. A test item on a recognition test contains information that provides more direct access to the item in memory. The test item serves as a copy cue, that is, a rich configuration of information that has a close match to the original encoding. Data-driven retrieval is analogous to "detection of familiarity" (Atkinson & Juola, 1974) and "intraitem elaboration" (Mandler, 1980) in models of recognition memory. Data-driven retrieval is often accomplished quickly, is not always guided by a strategy, and does not always require a reinstatement of the original script context in which the item was embedded. Thus, individuals may correctly recognize that Jack put a pen in his pocket, but forget whether this action occurred in a restaurant script or in a packing-for-vacation script. Graesser (1981) has described the differences between data-driven retrieval and conceptually driven retrieval in more detail.

Assumption 8
Recognition of an item is accomplished either by data-driven retrieval through the test item as a copy cue, or by a conceptually driven retrieval strategy.

The SC+T model adopts a dual-process model of recognition memory. In this sense, the SC+T model is compatible with dual-process models of word recognition (Atkinson & Juola, 1974; Kintsch, 1977; Mandler, 1972, 1980). Whereas recall is guided by conceptually driven retrieval, recognition is guided by both conceptually driven and data-driven processes.

D. RETENTION ASSUMPTIONS AND PREDICTIONS
According to the SC+T model, the retention functions differ for conceptually driven and data-driven retrieval. Moreover, the rate of decay is different for atypical and typical actions. In the present discussion, rate of decay is simply a description of the retention function. The decay rate is undoubtedly a product of interference mechanisms. Two retention assumptions are associated with conceptually driven retrieval. Assumption 9 addresses the impact of typicality on decay rate, and assumption 10 addresses the shape of the decay function.

Assumption 9
As the retention interval increases, the schema plays a more important role in guiding conceptually driven retrieval. Thus, atypical items have a faster decay rate than typical items.
Assumption 10
The likelihood of accessing an item via conceptually driven retrieval decreases exponentially over the retention interval.

Assumption 9 agrees with the hypothesis that memory shifts from being reproductive (faithfully close to what is stated) to being reconstructive (close to the schema) at longer retention intervals (Bartlett, 1932; Cofer, Chmielewski, & Brockway, 1976; D'Andrade, 1974; Kintsch & van Dijk, 1978; Spiro, 1977). The schema has a more central role in guiding conceptually driven retrieval as the retention interval increases. As the retention interval increases, there is a greater bias toward retrieving typical tagged items than atypical tagged items. According to assumption 10, there is an exponential decay rate, which agrees with Ebbinghaus's classical observations about recall and retention interval.

If assumptions 9 and 10 are correct, then an interaction should occur between typicality and retention interval for conceptually driven retrieval. The interaction reflects the claim that atypical actions decay at a faster rate than typical actions. Such an interaction has in fact been reported in previous studies involving scripted activities (Graesser, 1981; Graesser et al., 1980; Smith & Graesser, 1981). The data actually show a crossover. Recall for atypical actions is better than recall for typical actions through 2 or 3 days; after a few days there is better recall for typical than for atypical actions. This crossover is depicted in Fig. 2.

Two retention assumptions are associated with data-driven retrieval. Assumption 11 addresses the impact of typicality on decay rate, and assumption 12 addresses the shape of the decay rate.

Assumption 11
The decay rate is the same for typical and atypical items when tagged items are accessed via data-driven retrieval.

Assumption 12
The likelihood of accessing an item via data-driven retrieval approximates a linear decrease over time.

Data-driven retrieval does not always require a reinstatement of the original passage context and is not always substantially influenced by a strategy. Consequently, typical and atypical items have comparable decay rates. The linear decay rate probably is a simplification of the true decay function for data-driven retrieval. Data-driven retrieval is guided by a copy cue, which invokes several dimensions and levels of information to serve as retrieval cues. The retention function that corresponds to any one dimension may be exponential. However, the resolution of several exponential functions, with different decay rates, is a decay function that approaches linearity.

[Fig. 2. Retention functions for conceptually driven retrieval and data-driven retrieval, for typical and atypical actions.]

If assumptions 11 and 12 correctly specify data-driven retrieval, then there should be no interaction between typicality and retention interval. When memory scores are computed from recognition tests, there are significant interactions (Graesser, 1981; Graesser et al., 1980; Smith & Graesser, 1981). However, both conceptually driven retrieval and data-driven retrieval contribute to the Memory Scores. An accurate assessment of data-driven retrieval would partial out contributions from conceptually driven retrieval. Graesser (1981) used recall memory scores as estimates
of conceptually driven retrieval at recognition. After these estimates were partialed out of the recognition memory scores, the resulting functions for data-driven retrieval were plotted. These corrected data-driven functions approached linearity and the lines were nearly parallel for typical and atypical script actions. As expected, retrieval was better for atypical than typical actions. The decay rate was roughly constant for the two types of actions; if anything, the decay rate was steeper for typical than atypical actions. The pattern of data-driven retrieval is depicted in Fig. 2.

Smith and Graesser (1981) assessed memory for typical versus atypical script actions after retention intervals of 1/2 hour, 2 days, 1 week, and 3 weeks. The actions of a given scripted activity were tested at only one of the four retention intervals by either a recall or a recognition test (but never both). The Memory Scores for recall and recognition supported the assumptions of the SC+T model. A mathematical simulation of the data demonstrated that the proposed assumptions provided a better fit to the data than alternative assumptions. An exponential decay function for conceptually driven retrieval provided a better fit than a linear decay function; a linear decay function for data-driven retrieval was better than an exponential function. The decay rate for conceptually driven retrieval was steeper for atypical than typical actions; the data-driven decay rates were roughly the same for atypical and typical actions. Memory was initially better for atypical than typical actions in both data-driven and conceptually driven retrieval. A dual-process recognition mechanism showed a better fit to the Memory Scores than did a single-process recognition mechanism.

In summary, the SC+T model explains memory for information that varies in typicality with respect to a central organizing schema. The model can explain recall and recognition of typical versus atypical actions after varying retention intervals. Recognition is uniformly better for atypical than typical actions at all retention intervals. Recall is initially better for atypical than typical actions, but the opposite holds true after 3 or 4 days. Thus, typical actions are remembered best only when memory is assessed by recall after a long retention interval.
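The retention assumptions can also be expressed as a small simulation. The following sketch uses made-up parameter values and an independence rule for combining the two retrieval routes; it is meant only to reproduce the qualitative pattern described above (an early recall advantage for atypical actions that reverses after a few days, and roughly parallel linear data-driven functions), not the quantitative fits reported by Smith and Graesser (1981).

import math

def conceptually_driven(days, typical):
    # Assumptions 9 and 10: exponential decay, with a steeper rate for atypical items.
    start, rate = (0.55, 0.15) if typical else (0.80, 0.30)    # placeholder parameters
    return start * math.exp(-rate * days)

def data_driven(days, typical):
    # Assumptions 11 and 12: approximately linear decay, same slope for both item types.
    start, slope = (0.45, 0.02) if typical else (0.75, 0.02)   # placeholder parameters
    return max(0.0, start - slope * days)

def recognition(days, typical):
    # Assumption 8 treated as two independent routes; an item is recognized if either
    # route succeeds (the independence rule is our simplification, for illustration only).
    c, d = conceptually_driven(days, typical), data_driven(days, typical)
    return c + d - c * d

for days in (0.02, 2, 7, 21):          # roughly 1/2 hour, 2 days, 1 week, 3 weeks
    print(days,
          round(conceptually_driven(days, typical=False), 2),
          round(conceptually_driven(days, typical=True), 2))

With these placeholder parameters, conceptually driven retrieval of atypical items starts higher but falls below that of typical items between 2 and 3 days, mirroring the crossover in Fig. 2.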
IV. Some Issues Confronting the SC+T Model

The data reported in the previous section provide encouraging support for the SC+T model. However, some questions have not been answered regarding the role of schemas in comprehension and memory. The purpose of this section is to address some questions that colleagues have raised and to report research that should help clarify these issues.
A. DOES THE TYPICALITY EFFECT OCCUR FOR DIFFERENT KINDS OF SCHEMAS?
We believe that the reported effects of typicality on memory generalize to knowledge domains other than scripted activities. The same effects should emerge for person schemas, visual scenarios, and schemas that correspond to other knowledge domains. In Section I,C we reported studies that confirm the typicality effect for picture memory.

In a study by Woll and Graesser (1982) subjects listened to descriptions of fictitious people and were later given a recognition test on actions and traits. The actions and traits varied in typicality with respect to a person schema which was foregrounded at the beginning of the personality description. We examined person schemas that corresponded to roles (e.g., professor, cowboy) and stereotypes (e.g., macho male, aggressive female). As with the scripted activities, there were two versions of each personality description, so that the target items of version A were nontarget items in version B, and vice versa. The pattern of d' scores in three studies consistently supported the SC+T model. Memory was substantially better for atypical than typical actions and there was no memory discrimination for very typical information (d' = .10).

A few studies in the social cognition area have reported better memory for information that is congruent (typical) with a person schema than information that is irrelevant (e.g., Hastie & Kumar, 1979; Hastie, 1980). However, the memory measures in these studies did not control for guessing, so the data are difficult to interpret. We have reexamined the means reported in the Hastie and Kumar study and have estimated a memory score based on a reasonable guessing likelihood (.10). These memory scores show better memory for irrelevant than typical items. It is important to control for guessing and response biases when assessing the impact of stereotypes on memory (see Bellezza & Bower, 1981a; Clark & Woll, 1981).
B. DOES THE TYPICALITY EFFECT PERSIST WHEN MORE THAN ONE SCHEMA GUIDES COMPREHENSION?
In our previous studies, passages were organized around a central schema. However, most passages and experiences foreground more than one schema. The schemas correspond to different knowledge domains and levels of structure. We have recently conducted some memory studies on passages that invoke both a script schema and a person schema. This research will be reported here. The acquisition passages contained actions that varied in typicality
with respect to a role schema and with respect to a script. At the beginning of the passage, the relevant script and role were identified, for example, "Bill was a professor and he decided to go horseback riding." Here professor is the role schema and riding a horse is the script. After this introductory statement, the passage included a series of actions which varied in typicality. There were four categories of test actions: (1) script typical and role typical, (2) script typical and role atypical, (3) script atypical and role typical, and (4) script atypical and role atypical. In addition to these critical actions, which were later tested, each passage contained several context actions typical of the role or script. As in the previous studies, there were several passage versions so that a test item was a target in some versions and a nontarget in other versions.

Free generation groups and normative rating groups were run in order to systematically prepare the acquisition passages. In addition, special versions and counterbalancing procedures were employed, so that a given test action was typical in some versions and atypical in others. For example, put on a hat would be atypical for a professor; the same action would be typical of a cowboy role in a passage introduced as "Bill was a cowboy and he decided to go horseback riding." Therefore, any effects of typicality on memory could not be attributed to intrinsic properties of the items such as imagery, concreteness, or salience.

Eighty subjects listened to eight experimental passages that were designed with the above constraints. Approximately 30 min later they completed a recognition test on the test actions using the 6-point rating scale. Altogether there were 128 test actions, with 16 per passage. Again the typicality of a test action varied, depending on the passage version that the subject received. The recognition data are shown in Table II. Table II includes d' scores, hit rates, and false alarm rates for the four categories of actions.

TABLE II
RECOGNITION MEMORY AS A FUNCTION OF SCRIPT TYPICALITY AND ROLE TYPICALITY

                        Script typical                 Script atypical
Recognition measure     Role typical   Role atypical   Role typical   Role atypical
d' score                    .50            .71             .98           1.12
Hit rate                    .68            .73             .71            .63
False alarm rate            .55            .50             .36            .25
For present purposes, we shall focus on the d' scores because d' provides an accurate assessment of memory discrimination that controls for guessing. The pattern of d' scores supported the SC+T model. The d' scores were significantly higher for script-atypical than script-typical actions, 1.05 versus .61, respectively, F(1, 79) = 87.13, p < .05. The d' scores were significantly higher for role-atypical than role-typical actions, .92 versus .74, respectively, F(1, 79) = 6.91, p < .05. The interaction between role typicality and script typicality was not significant.

The pattern of d' scores supported the claim that the scripts were more critical than the role schemas in guiding comprehension and memory for the passages. A difference score between atypical and typical items may be used as an index of the impact of a schema on comprehension processes. This difference score was .44 (1.05 - .61) for scripts and .18 (.92 - .74) for roles. Since script typicality predicted memory better than role typicality, it appears that there is a script bias in comprehension and memory.

Some follow-up experiments were conducted in order to assess the robustness of the script bias. In one follow-up experiment we varied the instructions that subjects received before listening to the passages. In a script emphasis condition, subjects were instructed to pay careful attention to the actions that the fictitious characters performed. In a role emphasis condition, subjects were instructed to pay careful attention to the characters' personalities. Forty subjects participated in each of these conditions. The acquisition passages, recognition test booklets, and recognition scale were identical to those in the previous study.

Table III shows the recognition data for subjects in the script emphasis and role emphasis conditions. The instructions had no impact on memory for the passages. The main effect of instructions was nonsignificant, and instructions did not significantly interact with role typicality or script typicality. The latter variables did significantly predict memory and consistently supported the SC+T model. Script-atypical actions had significantly higher d' scores than the script-typical actions, 1.09 versus .56, respectively, F(1, 78) = 64.02, p < .05. Role-atypical actions had significantly higher d' scores than did role-typical actions, .90 versus .75, respectively, F(1, 78) = 4.82, p < .05. The role typicality X script typicality interaction was not significant. The atypical-typical difference scores supported the idea of a script bias. The difference score was .54 for scripts and .15 for roles. Varying instructions did not have an impact on script bias.

TABLE III
RECOGNITION MEMORY AS A FUNCTION OF SCRIPT TYPICALITY, ROLE TYPICALITY, AND INSTRUCTIONAL SET

                                            Script typical                 Script atypical
Recognition measure   Instructional set     Role typical   Role atypical   Role typical   Role atypical
d' score              Script emphasis           .48            .66             .97           1.17
                      Role emphasis             .44            .67            1.12           1.08
Hit rate              Script emphasis           .71            .75             .70            .61
                      Role emphasis             .70            .75             .71            .61
False alarm rate      Script emphasis           .55            .54             .36            .23
                      Role emphasis             .56            .53             .33            .21

Another study was conducted to assess further the robustness of script bias. We modified the passages in order to attenuate any potential emphasis on the scripts. This was accomplished by deleting all the context script
actions (i.e., typical script actions that were presented in all passage versions). The rewritten passages contained a higher proportion of role-relevant actions and very few script-relevant actions. Otherwise the passages were identical to the previous passages. The recognition booklets and the recognition scale were also identical to the previous two studies. Forty subjects participated in the experiment.

Table IV shows the recognition data in the follow-up study. The pattern of d' scores supported the script bias idea as well as the SC+T model. The d' scores were significantly higher for script-atypical than script-typical actions, 1.04 versus .68, respectively, F(1, 39) = 22.96, p < .05. The d' scores were significantly higher for role-atypical than role-typical actions, .96 versus .76, respectively, F(1, 39) = 8.51, p < .05. There was a nonsignificant script typicality X role typicality interaction. The atypical-typical difference score was greater for scripts (.36) than for roles (.20), which supports the notion of a script bias.

TABLE IV
RECOGNITION MEMORY AS A FUNCTION OF SCRIPT TYPICALITY AND ROLE TYPICALITY WHEN PASSAGES EMPHASIZE ROLE PROCESSING

                        Script typical                 Script atypical
Recognition measure     Role typical   Role atypical   Role typical   Role atypical
d' score                    .55            .80             .96           1.12
Hit rate                    .63            .70             .72            .67
False alarm rate            .43            .42             .38            .26

In summary, the three experiments support the SC+T model. The typicality effect persists when more than one schema is foregrounded during comprehension. Each schema has an independent effect on memory, with schema-relevant information being remembered less well than
schema-irrelevant information. We also found a script bias in these passages, which foregrounded both a script and a role. Compared to the role schema, the script had a more robust impact on comprehension and memory. This script bias was not influenced by the comprehenders' goals (instructions) or by the ratio of role-relevant to script-relevant information. Scripts are apparently more central organizing schemas than are roles.
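As a worked arithmetic example, the marginal d' means and the atypical-typical difference scores quoted for the first experiment follow directly from the four cell values in Table II; the short fragment below simply averages the appropriate cells (the dictionary layout is ours, for illustration only).

TABLE_II_D_PRIME = {                     # (script typicality, role typicality) -> d'
    ("typical", "typical"): 0.50,
    ("typical", "atypical"): 0.71,
    ("atypical", "typical"): 0.98,
    ("atypical", "atypical"): 1.12,
}

def marginal_mean(factor_index, level):
    cells = [v for key, v in TABLE_II_D_PRIME.items() if key[factor_index] == level]
    return sum(cells) / len(cells)

script_effect = marginal_mean(0, "atypical") - marginal_mean(0, "typical")   # about 1.05 - .61 = .44
role_effect = marginal_mean(1, "atypical") - marginal_mean(1, "typical")     # about .92 - .74 = .18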
C. DOES THE TYPICALITY EFFECT OCCUR WHEN SCRIPTED ACTIVITIES ARE VIDEOTAPED?

The studies we have reported so far have one common property: the acquisition materials have involved passages. What happens when the acquisition material is nonverbal? In order to answer this question, we compared memory for actions in videotaped scripted activities and tape-recorded scripted activities.

The actions and scripts were identical in the videotaped and the tape-recorded action sequences. There were four experimental scripts: setting the table, polishing shoes, fixing lunch, and typing a letter. There were two versions (A and B) of each scripted activity, so that a given test action was presented in one version, but not the other. The tape-recorded scripted activities were recorded at a medium rate of approximately 150 words per min. The videotaped scripted activities had no sound. In order to foreground the appropriate script, the script title was presented on the screen immediately prior to the videotaped action sequence. All four experimental scripted activities were enacted in approximately 10 min.

Thirty subjects were assigned to the videotaped condition and 30 to the tape-recorded condition. Approximately 15 min after viewing or listening to the scripts, subjects completed a recognition test using the 6-point recognition scale. Half of the 48 test actions were typical and half were
atypical; half of the actions were target actions and half were nontarget actions for any given subject.

Table V shows the recognition data for the videotaped and tape-recorded scripted activities. The d' scores were significantly higher for atypical actions than typical actions, 1.90 versus 1.27, respectively, F(1, 58) = 28.52, p < .05. There were slightly, but not significantly, higher d' scores in the videotaped condition than the tape-recorded condition, 1.79 versus 1.38, respectively, F(1, 58) = 3.37, p < .07. There was no significant interaction between typicality and presentation mode, F(1, 58) = 2.42, .10 < p < .13. These data confirm the typicality effect and support the SC+T model. The atypical-typical difference score was .45 for videotaped scripts and .81 for tape-recorded scripts. Thus, the typicality effect emerges in both linguistic and nonlinguistic acquisition materials, and in both auditory and visual modalities.

TABLE V
RECOGNITION MEMORY AS A FUNCTION OF SCRIPT TYPICALITY AND PRESENTATION MODALITY

                        Tape recorder (auditory)   Videotape (visual)
Recognition measure     Typical     Atypical       Typical     Atypical
d' score                   .97        1.78            1.56       2.01
Hit rate                   .82         .73             .83        .72
False alarm rate           .54         .22             .37        .15

D. IS THE TYPICALITY EFFECT INFLUENCED BY PRESENTATION RATE?
Several researchers have either assumed or asserted that the typicality effect is a product of the amount of attention, rehearsal, cognitive resources, or conceptual elaboration that items receive during encoding (see Section I,C). These "attention and elaboration" explanations are very popular among researchers. When colleagues are asked to explain why atypical actions are remembered better than typical actions, they frequently reply that "subjects pay more attention to the atypical items" or "subjects study the atypical input harder" because the atypical information does not fit in with the typical information.
The attention-elaboration explanation differs substantially from the SC+T explanation of the typicality effect.

The mechanism underlying an attention-elaboration explanation is simple and straightforward. During comprehension the comprehender identifies one or more schemas as relevant to the passage. For each incoming item, the comprehender assesses the typicality of the item with respect to the available foregrounded schema. If the item is evaluated as typical, then the item does not require additional processing and analysis. If the item is atypical, then the item draws additional cognitive resources and further elaboration at a conceptual or semantic level. Since atypical items receive additional resources and conceptual elaboration at comprehension, they would be remembered better than typical items.

According to the SC+T explanation, the typicality effect is a product of the organizational processes that are invoked automatically. For the most part, the representational code and the typicality effect do not depend on the encoding strategies and the goals of the comprehender during comprehension. The magnitude of the typicality effect should remain essentially constant across different encoding contexts.

We conducted an experiment that varied the rate at which scripted activities were presented to subjects auditorily. In a medium rate condition, the Jack story was presented at a normal conversational rate of 175 words per min. The Jack story contained 10 scripted activities. In a fast rate condition the scripted activities were presented as quickly as possible without sacrificing comprehensibility of the material (280 words per min). Forty subjects were assigned to each condition. Approximately 30 min after listening to the passage, subjects completed a recognition test.

The purpose of varying presentation rate was to test between the SC+T explanation and the attention-elaboration explanation of the typicality effect. The SC+T model predicts no interaction between presentation rate and typicality on recognition memory. However, such an interaction would support the attention-elaboration explanation. According to the attention-elaboration hypothesis, atypical actions are remembered better than typical actions because the atypical actions receive more cognitive resources during comprehension. Varying processing resources among actions is manageable at a medium presentation rate. However, at a very fast presentation rate it should become more difficult, if not impossible. Consequently, the attention-elaboration explanation predicts that the differences in memory between atypical and typical actions should be less pronounced in the fast rate condition than in the medium rate condition.

Table VI shows the recognition memory data for typical and atypical actions in the medium and fast presentation rate conditions.

TABLE VI
RECOGNITION MEMORY AS A FUNCTION OF SCRIPT TYPICALITY AND PRESENTATION RATE

                        Medium rate             Fast rate
Recognition measure     Typical     Atypical    Typical     Atypical
d' score                   .63        1.96         .37        1.60
Hit rate                   .80         .76         .77         .66
False alarm rate           .62         .16         .66         .17
The d' scores were significantly higher for atypical than for typical actions, 1.79 versus .50, respectively, F(1, 78) = 229.63, p < .05. The d' scores were also significantly higher in the medium than in the fast presentation rate condition, 1.29 and .99, respectively, F(1, 78) = 6.28, p < .05. The interaction between presentation rate and typicality was not significant, F(1, 78) = .34, p > .50. Thus, the influence of item typicality on memory persists at all presentation rates. Since the magnitude of the typicality effect was constant at both presentation rates, organizational properties of the memory trace appear to best explain the effects of typicality on memory. The construction of the memory tags for atypical actions seems to be achieved very quickly during comprehension. Variations in attention and elaboration have little or no impact on the typicality effect.

E. IS THE TYPICALITY EFFECT INFLUENCED BY THE GOALS OF THE COMPREHENDER?
To what extent is the typicality effect sensitive to the goals of the comprehender? Does memory vary for typical versus atypical information depending on the attention or emphasis that the comprehender gives to the two types of actions? We conducted an experiment to answer this question. We varied the comprehenders' goals by varying the instructions that subjects received before listening to the scripted activities. The acquisition material consisted of the Jack story. The experiment included six instructional set conditions with 20 subjects assigned to each condition. The conditions are listed and described below.

1. Vague condition. The subjects did not receive specific instructions on how to process the Jack story. The subjects were told that they would
later be asked questions about the Jack story. This vague condition will be regarded as a normal processing environment and a prototype against which to compare the other instructional set conditions.

2. Personality condition. The subjects were told that they would be given a test that assessed their perceptions of Jack's personality. The subjects were instructed to pay careful attention to unusual actions that Jack performed, because unusual actions convey much information about a person's personality. Subjects were expected to place more emphasis on the atypical actions and less emphasis on the typical actions, compared to the vague condition.

3. Global condition. The instructions were designed to encourage subjects to focus on the global levels of the Jack story rather than the details (i.e., individual actions). Subjects were told to monitor changes in spatial settings. A change in spatial setting occurred roughly at the junctures between scripts. The subjects wrote down a tally mark in a box on the instruction sheet whenever they detected a change in spatial scenario. For example, a scenario change would occur if Jack's activities changed from the location of his home to that of a department store. Subjects were expected to place less emphasis on atypical actions, compared to the vague condition.

4. Specific condition. The instructions were designed to encourage subjects to focus on individual actions in the Jack story. The subjects were told to write a tally mark in a box on the instruction sheet whenever Jack executed a skilled action, which was defined as an action requiring extensive training or education to perform. The subjects were given examples of skilled and unskilled actions. For example, fixing a radio is a skilled action, whereas eating a sandwich is not a skilled action. The subjects were expected to place more emphasis on typical actions (and probably also the atypical actions), compared to the vague condition.

5. Recall condition. The subjects were told that they would later be asked to recall, in writing, the contents of the Jack story. Recall instructions presumably promote an increased concern with organizing and interrelating information in a cohesive fashion. Since typical actions form the cohesive core of a passage, more emphasis should be placed on typical actions and less emphasis should be placed on atypical actions, compared to the vague condition.

6. Recognition condition. The subjects were told that they would later be given a recognition test on the Jack story. The format of the recognition test was described to subjects. Subjects would presumably focus on details in this condition. Compared to the vague condition, subjects should place more emphasis on typical actions.

The above six instruction conditions were designed to manipulate the
emphasis on processing typical versus atypical actions. Of course, the instructions only indirectly control the allocation of processing resources, and there is no assurance that the instructions produce the intended manipulation. However, if the typicality effect is very sensitive to the goals of the comprehender and the allocation of resources to typical versus atypical actions, then a typicality X instructional set interaction should occur when the recognition data are analyzed. If, however, the typicality effect [d'(atypical) - d'(typical)] is relatively impervious to variations in the comprehenders' goals, then the typicality X instructional set interaction would be nonsignificant.

In all instructional set conditions, a recognition test was administered to subjects approximately 30 min after the Jack story was presented. Table VII shows the recognition data. The d' scores were higher for atypical than typical actions, 1.64 versus .52, respectively, F(1, 114) = 319.03, p < .05. There were significant differences in d' scores among the instructional set conditions, with means of .74, .85, .98, 1.02, 1.42, and 1.48 in the global, vague, specific, recall, recognition, and personality conditions, respectively, F(5, 114) = 9.05, p < .05. However, the interaction between typicality and instructional set was not statistically significant, F(5, 114) = 1.67, p > .10.

TABLE VII
RECOGNITION MEMORY (d' SCORES) AS A FUNCTION OF SCRIPT TYPICALITY AND INSTRUCTIONAL SET

Instructional set     Typical     Atypical
Recall                   .49        1.54
Recognition              .74        2.10
Global                   .28        1.20
Specific                 .48        1.47
Personality              .87        2.09
Vague                    .30        1.41
A series of Newman-Keuls tests was performed on the d' scores of the six instructional set conditions in order to examine the source of the significant main effect of instructional set. The outcome of the Newman-Keuls tests supported the following ordering among means using a .05 level of significance: personality = recognition > recall = specific = vague = global.

The fact that there was no typicality X instructional set interaction suggests that the typicality effect is rather impervious to the comprehenders' goals. The typicality effect difference scores were 1.11, 1.22, .99, .92, 1.36, and 1.05 in the vague, personality, specific, global, recognition, and recall instruction conditions, respectively. These typicality effect scores are roughly constant and do not systematically vary with the amount of resources that we expected to be allocated to atypical versus typical actions at comprehension. The instructional set variables did influence memory, but these variations in the comprehenders' goals had a constant effect over typical and atypical actions.

In summary, an explanation of the typicality effect involves an organized representational code that is established automatically during comprehension. The typicality effect is a robust phenomenon that is not malleable by the comprehender's goals and the allocation of cognitive resources.

F. DOES THE TYPICALITY EFFECT OCCUR IN ECOLOGICALLY VALID SETTINGS?
In all the experiments we have reported so far, the subjects have been aware that they were in an experiment. Does the typicality effect occur in more ecologically valid situations when an individual does not anticipate being tested in some form?

Some social rules and pragmatic constraints are followed when individuals comprehend prose or experience events in an experimental setting. In the context of discourse, the speaker and listener assume that whatever is said is important and relevant to the goals of the interchange. In the context of an experiment, the subjects assume that all presented material is important and relevant to the goals of the experimental session. These pragmatic rules underlie speech acts, discourse, and social interaction (de Beaugrande, 1980; Grice, 1975; Searle, 1969). The studies supporting the typicality effect and the SC+T model may be restricted to contexts in which the above pragmatic rules operate. When subjects comprehend an atypical action in a scripted passage, they would assume that the atypical action is particularly important and a relevant part of the message. The subject would believe that the speaker had an important reason for including information which would otherwise be irrelevant to the topic. Consequently, the atypical information would receive more attention and elaboration. The same pragmatic rules might
not apply when the material is not prose and when the comprehender is not in an experimental setting.

We conducted an experiment to assess the typicality effect in an ecologically valid setting. Students received a lecture and subsequently completed a surprise recognition test on the actions performed by the lecturer. The actions varied in typicality with respect to the lecture script. Since the students were not aware that they were in an experiment during the lecture, they would not assume that all input was relevant and important. Moreover, lecturers normally communicate important and relevant messages by sentences rather than by actions; the lecturer does not intend to communicate each action or gesture executed in a lecture. According to the SC+T model, memory should be better for atypical than typical actions. However, if the typicality effect is restricted to experimental materials and prose materials, then the atypical actions would not be remembered better than typical actions.

The experiment included a lecture phase, an intervening task phase, and a test phase. During the lecture phase, the students received a 15-min lecture at the beginning of their scheduled laboratory section. During the lecture, the lecturer performed a number of actions that varied in typicality with respect to a lecture script. Some typical actions were the lecturer pointing to the blackboard and the lecturer handing a student a sheet of paper. Some atypical actions were the lecturer taking off a watch and the lecturer wiping off his glasses. Both typical and atypical actions were performed in a smooth, nonobvious manner. Before the lecturer delivered the lecture, he told the students that the material was review and that they did not need to take notes. The purpose of this comment was to encourage students to look at the lecturer rather than at their notes. There were two lectures, each of which included a different sample of 10 typical and 10 atypical actions. There were 20 students in each lecture session.

After the lecture, there was a 20-min intervening task. During this intervening task, the lecturer led the students to a different room in order to give a demonstration on the use of a computer. While the students and lecturer were in the computer room, a confederate cleaned up the laboratory room in order to destroy all clues as to what actions may have been performed during the lecture. After the intervening activity, the students were led back to the laboratory room and were given a recognition test on the lecture actions.

The recognition data supported the SC+T model. The d' scores were significantly higher for atypical than typical actions, 1.02 versus .15, respectively, F(1, 39) = 21.00, p < .01. The hit rates were .66 and .56 for typical versus atypical actions, whereas the false alarm rates were .62
versus .23, respectively. These data again confirm the generality of the typicality effect. The typicality effect persists in ecologically valid settings when subjects are unaware they are in an experiment. The typicality effect is not an artifact of social rules, pragmatic constraints, or conversational postulates.

G. ARE UNPRESENTED TYPICAL ITEMS INFERRED AT COMPREHENSION OR AT RETRIEVAL?
The effects of typicality on memory are mainly determined by the false alarm rates. Hit rates do not vary systematically with typicality and usually show a flat function. Figure 1 illustrates the effect of typicality on false alarm rates. Between the interval of moderately typical and very typical, there is a sharp increase in false alarm rates; the corresponding d' scores show a dramatic decrease. In other words, memory varies antagonistically with the false alarm rates. Are the false alarm rates a product of encoding mechanisms or retrieval mechanisms? The false alarms may correspond to the inferences that the comprehender generated during comprehension. These inferences would be copied into the memory trace, as specified by the SC+T model. There is an alternative possibility, however. Perhaps these false alarms are not comprehension-generated inferences. Instead, the inferences may have been retrieval generated, that is, derived only at test time. To what extent are the false alarms comprehension-generated inferences versus retrieval-generated inferences? Researchers would probably not quibble with the claim that a subset of the false alarms reflects comprehension-generated inferences, whereas another subset reflects retrieval-generated inferences. In fact, there is evidence for both types of inferences. One finding suggests that some false alarms are a product of retrieval, but not of comprehension. Specifically, false alarm rates show a modest increase between the interval of very typical to moderately atypical. None of these atypical actions would have been made at comprehension, yet there was a systematic change in false alarm rates. Studies by Yekovich (Dunay, Balzer, & Yekovich, 1981; Yekovich & Yekovich, 1982) have supported the conclusion that many unpresented typical actions are generated at comprehension. Subjects were presented with several scripted activities. Within 5 and 8 sec after listening to an excerpt, subjects received test words and decided as quickly as possible whether the test word was (a) explicit (i.e., mentioned in the passage), (b) implicit, or (c) unrelated to the passage. Some of the nouns were explicitly mentioned in the passage. Other test nouns were part of plausible
inferences that were not explicitly mentioned. A third group of nouns was totally unrelated to the passage. Yekovich reported an extremely high false alarm rate for implicit, related words. The false alarm rates varied from .33 to .46. These false alarm rates are almost as high as the false alarm rates for typical actions after a 30-min or 3-week delay (.53 in Smith & Graesser, 1981). Thus, within a few seconds after comprehension, subjects judged that unpresented typical information was being presented. These data strongly indicate that a substantial number of inferences would be generated at comprehension. A related question is whether subjects sometimes avoid searching memory when the test action is very typical of the script. We have reported that memory discrimination is low for moderately typical actions and zero for very typical actions. Perhaps memory is poor for these actions because the subjects prematurely conclude that these actions “must have been presented” and they avoid searching their memory. If subjects avoid search processes, there might be good memory for an item, but this would not be manifested in the recognition data. A study was conducted to assess whether the poor memory for typical actions was an artifact of memory search avoidance. In this study (Graesser et al., 1980), two types of recognition tests were administered. One test format had a YES/NO format, whereas the other had a twoalternative, forced-choice (2AFC) format. For the 2AFC test, the two test actions of each pair were matched on typicality. Consequently, typicality could not serve as a criterion for deciding whether an action was presented; subjects would be forced to search their memory traces in order to decide which of the two actions had been presented. Suppose that subjects sometimes avoided memory search on the YES/NO test. If they did, then memory would be better in the 2AFC than the YES/NO format. If, however, memory search was not avoided in the YES/NO format, d’ scores would be the same for YES/NO and 2AFC formats. The Graesser et al. (1980) study did not support the idea that subjects sometimes avoid memory search when given a YES/NO recognition test. The d‘ scores were the same in the YES/NO format and the 2AFC format. The YES/NO recognition format accurately taps the memory subjects have about actions in scripted activities.
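The memory discrimination scores used throughout this section follow standard signal detection logic: d' is the difference between the z-transformed hit rate and the z-transformed false alarm rate (Green & Swets, 1966). The sketch below is ours, not part of the original analyses, and values computed from the aggregate rates quoted earlier (e.g., the lecture experiment) will only approximate the published means, which were presumably obtained by computing d' for each subject before averaging.

from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    # d' = z(hit rate) - z(false alarm rate), where z is the inverse of the
    # standard normal cumulative distribution function.
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Aggregate rates from the lecture experiment described above.
print(round(d_prime(0.56, 0.23), 2))  # atypical actions: roughly 0.9
print(round(d_prime(0.66, 0.62), 2))  # typical actions: roughly 0.1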
V. The Fate of Four Alternative Models
In an earlier section we described four alternative models specifying the impact of schemas on comprehension and memory. How do the four alternative models compare to the SC+T model in explaining the
available data? We believe that the SC+T model provides the best account of the data. The purpose of this section is to point out the weaknesses of the other four models.
A. PROBLEMS WITH THE FILTERING MODEL
The filtering model predicts that typical information will be encoded and retained in memory better than atypical information. This prediction was not supported in the studies reported in this article. There was better memory discrimination for atypical information than typical information. This outcome seriously challenges the filtering model and has led us to reject the model. It is interesting, however, that the intuitions and theories of many researchers adopt a filtering mechanism. There is only one condition in which memory for typical information exceeds that of atypical information. This condition involves recall after a long retention interval. This outcome is accommodated by the SC+T model's assumptions regarding recall and conceptually driven retrieval. Recall involves conceptually driven retrieval and the schema plays a more important role in guiding conceptually driven retrieval as the retention interval increases. For conceptually driven retrieval, the tagged typical items decay at a slower rate than the tagged atypical items (see Fig. 2). The obtained crossover effect for conceptually driven retrieval is accommodated by the SC+T model.
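The crossover claim can be made concrete with a toy retention calculation. The exponential functions and parameter values below are purely hypothetical (they are not the SC+T model's estimates, which are developed in Graesser, 1981, and Smith & Graesser, 1981); the sketch only illustrates how a trace that starts stronger but decays faster eventually falls below one that starts weaker but decays slowly, the pattern obtained for recall of atypical versus typical actions.

import math

def retention(initial_strength, decay_rate, days):
    # A simple exponential retention function: R(t) = R0 * exp(-k * t).
    return initial_strength * math.exp(-decay_rate * days)

# Hypothetical parameters: atypical tags start stronger but decay quickly
# under conceptually driven (recall) retrieval; typical tags decay slowly.
for days in (0, 2, 7, 21):
    atypical = retention(0.60, 0.15, days)
    typical = retention(0.35, 0.02, days)
    print(f"day {days:2d}: atypical {atypical:.2f}   typical {typical:.2f}")
# With these values the two curves cross between roughly day 2 and day 7.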
B. PROBLEMS WITH THE ATTENTION-ELABORATION MODEL
According to the attention-elaboration explanation of the typicality effect, atypical items receive more attention and conceptual elaboration than the typical items. This differential attention and elaboration are presumed to explain differences in memory. According to the SC+T model, there can be systematic differences in attention and elaboration devoted to items, but such differences do not explain the typicality effect. Instead, properties of the representational code explain the typicality effect, and the code is constructed automatically at comprehension. Several findings challenge the attention-elaboration explanation of the typicality effect. First, there is zero memory discrimination for very typical actions, yet these items must have received some attention and elaboration. Second, memory for typical and atypical actions is not influenced by a variety of script distortions that would presumably draw resources away from atypical actions and therefore attenuate memory for such items. Specifically, memory for script actions does not decrease when (a) more and more atypical actions are included in the scripted
activities, (b) a script is interrupted by another scripted activity, (c) actions of one scripted activity are interleaved with actions of another scripted activity (Graesser, 1981; Graesser et al., 1979). Third, it is difficult for the attention-elaboration model to explain the fact that recall for atypical information is lower than that of typical actions after a long retention interval. Fourth, the typicality effect is not influenced by the presentation rate of the material. Fifth, the typicality effect is not influenced by variations in instructions which are designed to manipulate the relative amount of attention and elaboration that typical versus atypical actions receive. Thus, the typicality effect is clearly not a malleable phenomenon contingent on resource allocation during encoding. Rather, it is a robust phenomenon that is relatively impervious to the distribution of cognitive resources. Other researchers have arrived at similar conclusions in the context of picture memory (Friedman, 1979; Light et al., 1979). We acknowledge that we have not completely eliminated an attentionelaboration explanation of the typicality effect. Perhaps the comprehender’s attentional resources can oscillate very quickly between typical and atypical actions. Future experiments need to test this possibility. As it stands, however, evidence for an attention-elaboration explanation is slim. Our attempts to manipulate the allocation of resources to items and thereby affect memory have consistently failed. The typicality effect has been particularly resilient to variations in resource allocation. Such variations may modestly influence memory in the form of a second level code, but they do not explain the typicality effects reported in this article.
C. PROBLEMS WITH THE PARTIAL COPY MODEL
Bower et al. (1979) proposed a partial copy model as an explanation of how scripts influence comprehension and memory (see Section 1,C). According to this model, two independent codes are formed when scripted activities are comprehended. One code is the episodic memory structure, which is a list of propositions explicitly stated in the passage. According to Bower et al., the likelihood of accessing a proposition in this episodic list decays quickly according to an exponential function. After a week or so there would be no memory for this code. The second type of code involves the generic script. When a scripted passage is comprehended, actions of the generic script are activated by script-relevant passage actions. For script-relevant actions stated in a passage, there is a corresponding action in the generic schema that receives a strong activation. There are also actions in the generic schema that receive a weak activation because they are inferred in the passage by
default. An action is later remembered if it meets or exceeds some critical level of activation. With time, the activation level for an action in the generic script decays, and is reactivated when the script is used again. After a long retention interval, there will be no discrimination between actions that received a weak versus a strong activation in the context of a specific scripted passage. Thus, after a long retention interval there should be little or no memory discrimination between presented and unpresented typical script actions. There are two problems with the partial copy model. The first problem is the finding of zero memory discrimination for very typical actions after a 30-min retention interval. According to the partial copy model, there should be some memory discrimination for these items. These actions would have some likelihood of being retrieved from the episodic memory structure, that is, the list of explicit passage propositions. These actions should also show memory discrimination by virtue of the generic script; the very typical stated actions should have a high activation, whereas very typical unstated actions should have a weak activation. The second problem with the partial copy model involves recognition memory after a long retention interval. According to the partial copy model, there should be no memory for irrelevant passage actions after a long retention interval. After 3 weeks, for example, the episodic memory structure would certainly be completely decayed and the generic script would provide no basis for remembering the irrelevant information. However, there is in fact substantial recognition memory for irrelevant actions after 3 weeks (Graesser, 1981; Smith & Graesser, 1981). D.
PROBLEMS WITH THE SCHEMA POINTER PLUS TAG MODEL
The SP+T model is quite similar to the SC+T model and provides a close fit to the data reported in this article. The major shortcoming of the SP+T model pertains to the amount of generic schema information that is incorporated into the specific memory trace for a passage. According to the SP+T model, the memory trace contains a pointer to the generic schema, so that the entire generic schema is copied into the specific memory trace. As we noted earlier, it is implausible that the entire generic schema is copied into the specific memory trace. For reasons discussed by Bower et al. (1979), only a subset of the generic schema content would be activated during the comprehension of a specific passage. Therefore, we abandoned this strong assumption of the SP+T model and adopted the SC+T model. The SC+T model assumes that only a subset of the nodes in a generic schema are copied into the memory trace. This is not a strong claim. The
assumption does not predict which subset of generic nodes is passed to the memory representation of a specific passage. In the next section, we shall examine whether there is a systematic relationship between (a) the subset of generic schema nodes that are copied into a passage representation and (b) the set of items that are explicitly mentioned in a given passage. The SC+T model would be strengthened if it could specify when a schema node is or is not copied into a specific memory trace.
VI. The Process of Copying Schema Nodes into Specific Memory Traces
When a schema-based passage is comprehended, many nodes in the generic schema are copied into the passage representation, even though the nodes were not explicitly mentioned. These nodes are inferences. Which of the generic nodes end up being passage inferences? In this section we will report a study designed to address this question. The study was part of a master's thesis completed by Donald Smith at California State University in Fullerton (Smith, 1981). Smith's thesis focused on memory for scripted activities. Smith examined whether it is possible to predict which actions of a generic script are copied into the memory trace of a scripted passage. The false alarm rate of an unpresented script action served as an index of whether an action was an inference in a scripted passage. When an action had a high false alarm rate, it was regarded as a passage inference. Again, we should mention that the false alarm rates have been responsible for the pattern of memory discrimination scores (d' scores) in nearly all the experiments reported in this chapter. As shown in Fig. 1, typicality substantially influences false alarm rates and d' scores, but not hit rates. Within the typicality interval of 4 (moderately typical) to 6 (very typical) there is a dramatic increase in false alarm rates. What factors predict false alarm rates for typical actions? Smith (1981) investigated the extent to which an unpresented action node is activated by three different knowledge sources. First, an inference may be activated by the generic script. When the script is identified at the beginning of the scripted activity (e.g., Jack decided to eat dinner at a restaurant), then a set of script-relevant actions are activated as inferences. Second, an inference may be activated by an action explicitly stated in the passage. For example, if the scripted activity stated that Jack took out his wallet, then a plausible inference would be that Jack paid the bill. Third, an inference may be activated by a conceptual subchunk within a generic script. The significance of subchunks will be discussed shortly. In the
following subsections, we shall describe the three knowledge sources and also how script actions were scaled with regard to these sources of activation.
A. ACTIVATION OF INFERRED ACTIONS VIA THE GENERIC SCRIPT
Some script actions are activated when the generic script is introduced at the beginning of the scripted activity. For example, when Jack decided to eat dinner at a restaurant is mentioned, then the restaurant script is identified and some actions are activated, even when they are not explicitly mentioned in the rest of the passage. Some plausible script-activated actions for the restaurant script are Jack ordered food, someone gave Jack the food, and Jack paid for the food. These script-activated actions would presumably include actions that are central, essential, or characteristic of the script. The typicality ratings for script actions provide a reasonable index of the extent to which the actions are script activated. Subjects in a normative group scaled the actions on the 6-point typicality scale described earlier. The actions have also been scaled on a 6-point necessity scale. Subjects rated how necessary it is to execute an action when enacting a given script. For example, going to the restaurant would be a necessary action in the restaurant script. The necessity ratings are highly correlated with the typicality ratings, r = .91 (Graesser, 1981; Graesser et al., 1979). The mean typicality rating for an action served as a measure of script activation in Smith's (1981) thesis.
B. ACTIVATION OF INFERRED ACTIONS VIA STATED PASSAGE ACTIONS
Some inferences are activated by an explicitly stated passage action together with the generic script context. For example, suppose the restaurant script is identified and the passage states that Jack sat down at the table. The comprehender would probably infer that Jack walked to the table. This inference would be activated by the passage action plus the restaurant script, or by the passage action alone. The inference may not have been activated by the restaurant script alone; in some restaurants the customers do not eat at tables. In Smith's thesis, necessity scores were computed from measures collected in a normative rating group of subjects. Necessity scores were computed for the test actions in the Jack story scripts (Graesser, 1981; Graesser et al., 1980). Subjects were given the script title and then rated
pairs of actions on a necessity scale. A pair of actions involved an activator and an activatee action and was placed in the following frame: Given that (activator), how necessary is it that (activatee)? For example, if the activator action is the person ordered food and the activatee action is the person ate food, then the action pair would be: Given that the person ordered food, how necessary is it that the person ate the food? The subjects rated these action pairs on a 6-point necessity scale: 1 = very unnecessary; 2 = somewhat unnecessary; 3 = uncertain, but probably unnecessary; 4 = uncertain, but probably necessary; 5 = somewhat necessary; and 6 = very necessary. Trabasso, Secco, and Brock (1982) have used a similar test for determining whether actions, events, and states in stories are causally related. Of the 22 typical actions in each script, only 8 served as test actions in our previous memory studies. The same 8 actions of a script were the activatee items of the action pairs in the Smith thesis. Each of the 8 critical actions had 21 action pairs for which necessity ratings were collected; the activatee action was paired with the other 21 actions in a script. Since there were 8 critical actions and 21 pairs per critical action, 168 pairs were rated for each script. Smith computed a strength of activation, Pij, for each action pair, which corresponded to the proportion of subjects who rated an activatee action, Aj, as being necessary for a given activator action, Ai. The criterion for being necessary was a rating of 4, 5, or 6 on the necessity scale. Once these activation scores were collected, the total activation was computed for an inference action in an acquisition passage. The total activation for an inference action obviously depended on what actions were presented in the acquisition passage. An inference action would tend to be activated when there were one or more passage actions that had high activation scores associated with the inference. The total activation could be measured for inference Aj in a specific passage that contained a given subset of script actions:

Total activation (Aj) = Σ (i = 1 to 22) Pij, given that Ai was presented in the passage    (3)

The necessity score for action Aj in a specific passage was the average amount of activation per explicit action. If there were n explicit actions, then the necessity score for Aj would be

Necessity score (Aj) = Total activation (Aj)/n    (4)
The false alarm rate for inference Aj is expected to increase as its necessity score increases.
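Equations (3) and (4) can be restated as a short computation. The data structure and names below are ours, not Smith's: necessity maps each (activator, activatee) pair to Pij, the proportion of raters who judged the activatee necessary (a rating of 4, 5, or 6) given the activator.

def total_activation(inference, stated_actions, necessity):
    # Eq. (3): sum Pij over the actions Ai explicitly stated in the passage.
    return sum(necessity.get((activator, inference), 0.0)
               for activator in stated_actions)

def necessity_score(inference, stated_actions, necessity):
    # Eq. (4): average activation per explicit passage action.
    return total_activation(inference, stated_actions, necessity) / len(stated_actions)

# Hypothetical ratings for two stated restaurant-script actions.
necessity = {
    ("Jack sat down at the table", "Jack walked to the table"): 0.90,
    ("Jack ordered food", "Jack walked to the table"): 0.20,
}
stated = ["Jack sat down at the table", "Jack ordered food"]
print(necessity_score("Jack walked to the table", stated, necessity))  # 0.55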
C. ACTIVATION OF INFERRED ACTIONS VIA THE ACTIVATION OF A SUBCHUNK
It is quite plausible that the generic script is organized into subchunks. For example, the restaurant schema might have the following subunits: waiting, ordering, being served, eating, and paying. The idea that scripts, schemas, and passages are subdivided into subchunks has been proposed by a number of researchers (Bower et al., 1979; Black & Bower, 1979; Rumelhart & Ortony, 1977; Schank, 1980). In the context of scripts, Schank calls these subchunks "scenes." Thus, there would be a waiting scene, ordering scene, and so on. It is possible that the content of a generic schema is copied into the memory trace subchunk by subchunk. When scripts are involved, the memory trace is constructed scene by scene. A scene-by-scene composition would provide some flexibility in the construction process. When an explicit action invokes a specific scene, then many nodes associated with the scene would be copied into the memory trace. If, however, there is no mention of any action associated with a given scene, then the memory trace would contain no nodes associated with that scene. For example, if the restaurant passage does not mention anything about paying, then the entire paying scene would be left out. Moreover, an inference action may be activated by virtue of a subchunk, but not by virtue of script activation or explicit action activation. Smith used a sorting task as a method for identifying subchunks in the eight scripted activities. Subjects were given a set of 22 cards, with each card containing an action in the script. Subjects sorted the actions into different piles. Subjects were told what the script title was before they sorted the actions. The number of chunks or piles that subjects decided to use was at their own discretion, but the experimenter emphasized that actions within a chunk should be conceptually related. The mean number of chunks per script ranged from 3.5 to 8.4, with a mean of 5.6. The mean number of actions per chunk ranged from 2.6 to 6.1, with a mean of 3.9. Smith computed a "chunk score" for the eight critical actions in each scripted activity. The chunk score for action Aj was an index of the extent to which the action Aj would be activated by virtue of being in the same chunk as actions explicitly stated in an acquisition passage. Suppose that Pij is the proportion of subjects who sorted activatee action Aj into the same pile as action Ai. Then the total chunk activation for action Aj in a given
passage would be

Total chunk activation (Aj) = Σ (i = 1 to 22) Pij, given that Ai is stated in the passage and Pij > .25    (5)
There was a .25 minimal activation threshold in formula 5 because the subjects rarely placed only one action in a pile and we desired a sensitive assessment of chunk activation. The chunk score was an average chunk activation per explicit passage action. If there were n explicit actions in the scripted passage, then the chunk score for inference Aj would be

Chunk score (Aj) = Total chunk activation (Aj)/n    (6)
The false alarm rate for inference Aj is expected to increase with the chunk score for Aj. Obviously, the chunk score for an inference should vary as a function of which actions are stated in the acquisition passage.
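Equations (5) and (6) admit the same kind of sketch. Here co_sort maps a pair of actions to Pij, the proportion of sorters who placed the two actions in the same pile; pairs at or below the .25 threshold contribute nothing. The names and proportions are illustrative assumptions, not Smith's materials.

def total_chunk_activation(inference, stated_actions, co_sort, threshold=0.25):
    # Eq. (5): sum the co-sorting proportions Pij over the stated actions Ai,
    # counting only pairs whose proportion exceeds the threshold.
    total = 0.0
    for activator in stated_actions:
        p = co_sort.get((activator, inference), 0.0)
        if p > threshold:
            total += p
    return total

def chunk_score(inference, stated_actions, co_sort):
    # Eq. (6): average chunk activation per explicit passage action.
    return total_chunk_activation(inference, stated_actions, co_sort) / len(stated_actions)

# Hypothetical sorting proportions around a "paying" scene.
co_sort = {
    ("Jack took out his wallet", "Jack paid the bill"): 0.80,
    ("Jack ordered food", "Jack paid the bill"): 0.10,  # below threshold
}
stated = ["Jack took out his wallet", "Jack ordered food"]
print(chunk_score("Jack paid the bill", stated, co_sort))  # 0.40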
D. PREDICTING FALSE ALARM RATES FOR UNSTATED SCRIPT ACTIONS
To what extent can false alarm rates for script actions be predicted by the typicality ratings, necessity scores, and chunk scores? These three predictor variables were assumed to measure the activation of an action via the script, a stated action, versus a chunk. Smith analyzed the false alarm data in the Smith and Graesser (1981) study involving memory for the Jack story. We should remind the reader that there were two versions of the Jack story (A and B) and that each scripted activity had contained (a) 14 common typical actions presented in both versions A and B, (b) 4 typical actions presented in A but not in B, and (c) 4 typical actions presented in B but not in A. Smith attempted to predict the false alarm rates involved in the latter two sets of actions, that is, the B actions when subjects listened to version A, and the A actions when subjects listened to version B. Since there were 8 scripted activities and 8 critical test actions per script, Smith attempted to predict false alarms for 64 actions. Smith used multiple regression techniques when assessing the extent to which the false alarm rates for the 64 actions could be predicted by the actions' typicality ratings, necessity scores, and chunk scores. Analyses revealed that only the typicality ratings significantly predicted the false alarm rates. The fact that the necessity scores and chunk scores failed to predict false alarm rates is not surprising in retrospect. The inference actions in a script may have been usually activated by several explicit actions. Perhaps the activation levels of the unstated actions always exceeded threshold levels because they were activated several times by
several knowledge sources. If there was extensive multiple activation, then the false alarm rates would be high and not sensitive to the chunk scores and necessity scores. In fact, the range of chunk scores was .08 to .42, and the range of necessity scores was .07 to .82. Since a score of .06 indicates activation from one activator (.06 X 18 stated typical actions = 1.08), all of the activatees (potential inferences) received at least one activation and usually several activations. Therefore, a more sensitive test was needed to assess the impact of necessity scores and chunk scores on false alarms. In order to prevent multiple activation, Smith wrote new scripted activities that contained a small subset of the actions used in the Smith and Graesser study. There were several versions of each scripted activity with a different subset of eight actions in each version. The versions were composed and manipulated in order to assess the false alarm rate for a single inference action (activatee). The details of these contextual variations are described in Smith’s (1981) thesis. The upshot of Smith’s careful manipulation of script versions was that there were four context conditions, and the false alarm rate for a critical inference (activatee) was measured in each context condition. The four context actions are (1) low necessity score and low chunk score, (2) low necessity score but high chunk score, (3) high necessity score but low chunk score, and (4) high necessity score and high chunk score. In the low necessity score conditions, none of the 8 explicit actions in the passage would activate the critical inference by virtue of necessity; similarly, in the low chunk score conditions, none of the eight explicit actions in the passage would activate the critical inference by virtue of being in the same subchunk as a stated action. In the high necessity (or chunk) condition, one or two of the eight explicit actions activated the critical inference. Table VIII shows the mean false alarm rates for the critical inference actions as a function of necessity score and chunk score. An analysis of variance was performed using item variability in the error term. False alarms were somewhat higher in the high than low necessity score conditions, .40 versus .28, respectively, F(1, 14) = 7.59, p < .05. False alarms were higher in the high than low chunk score conditions, .37 versus .30, respectively, but not quite significantly. The most interesting outcome was the chunk score X necessity score interaction, F(1, 14) = 4.82, p < .05. The false alarm rate was very high (.47) when the chunk score and the necessity score were both high compared to the other three conditions, which had roughly the same false alarm rates (.30). The interaction suggests that a generic script node must satisfy dual criteria before it is copied into the memory trace as an inference. First, the generic node must be part of the same subchunk as a
TABLE VIII
FALSE ALARM RATES AS A FUNCTION OF ACTIVATION VIA CHUNKING AND NECESSITY

                             Activation via necessity
Activation via chunking      Low        High
Low                          .28        .33
High                         .28        .47
node that is explicitly stated in a passage. Second, the generic node must be a necessary antecedent or consequent of an action that is explicitly stated. A generic node does not tend to be copied into the memory trace if it satisfies none or only one of these two criteria. Stated differently, an explicit action tends to activate an inference if (a) the inference is part of the same subchunk as the stated action, and (b) the inference is a necessary antecedent or consequence of the stated action. The findings reported in Smith’s ( 1981) thesis need to be replicated and are clearly not the final word on the question of which schema nodes are copied into specific memory traces in the form of inferences. However, Smith’s thesis findings are consistent with the following claims regarding scripts:
1. When the script is identified, the script activates some generic nodes and these nodes are copied into the specific memory trace. Nodes that are very typical of the script tend to be activated.
2. One or more actions in a passage activate a subchunk of information (i.e., a scene) within the generic script. Some of the generic nodes within the subchunk are activated and copied into the specific memory trace, namely, nodes that are necessary antecedents and consequences of the explicit actions (a minimal sketch of this conjunctive rule is given after this list).
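Read as a processing rule, the Table VIII interaction amounts to a conjunctive test, which the sketch below states directly. The helper predicates stand for the chunk and necessity criteria defined earlier and are ours; the sketch also assumes, following the wording above, that the same explicit action supplies both criteria, a reading that Smith's design does not strictly separate from one in which different actions supply them.

def copied_as_inference(node, stated_actions, shares_subchunk, is_necessary):
    # A generic script node is copied into the specific memory trace only if
    # some explicit action both (a) belongs to the same subchunk as the node
    # and (b) makes the node a necessary antecedent or consequence.
    return any(shares_subchunk(node, action) and is_necessary(node, action)
               for action in stated_actions)

# Toy criteria tables (hypothetical).
same_chunk = {("Jack paid the bill", "Jack took out his wallet")}
necessary = {("Jack paid the bill", "Jack took out his wallet")}
print(copied_as_inference(
    "Jack paid the bill",
    ["Jack took out his wallet", "Jack ordered food"],
    lambda node, action: (node, action) in same_chunk,
    lambda node, action: (node, action) in necessary))  # True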
VII. Questions for Further Research
We believe that the SC+T model is a strong competitor with alternative schema-based models of comprehension and memory. The SC+T model has provided a close fit to the data reported in this article and has been articulated in the form of a mathematical model (see Graesser, 1981; Smith & Graesser, 1981). The model also has a fairly broad scope. First, the model accounts for the effects of typicality on memory after different
retention intervals. Second, the model isolates differences between recall and recognition processes. Third, the model applies to different knowledge domains, to different types of schemas, and to situations in which more than one schema guides the processing of incoming information. Fourth, the model’s predictions are confirmed under different encoding conditions and variations in comprehender goals. Fifth, the model provides a better fit to the data than some alternative schema-based models. The model also is easily expanded to explain which generic schema nodes are copied into specific memory traces. There are a number of questions for future research. Experiments need to be conducted to examine further which generic nodes are copied into a memory trace. We have isolated some factors that predict inference activation. The Smith (1981) study suggests that some inferences are activated in a top-down fashion by the generic schema. Other inferences are generated in a bottom-up fashion, so that explicit information activates a subchunk in the schema and inferences within the subchunk are activated if and only if they are a necessary implication of the explicit information. Other factors might predict inferencing. For example, structural properties of the generic schema might predict inference generation. It is plausible that the generic schema would activate nodes that are more superordinate in a hierarchical structure when the schema structure is hierarchical (Bower et al., 1979; Graesser, 1978). An inference may tend to be activated by a schema if the node is related directly to many other nodes in the generic schema (Graesser, 1978, 1981; Graesser, Robertson, & Anderson, 1981). It is beyond the scope of this article to address the structural dimensions of schemas in comprehension and memory. This is clearly an important issue for future research and for future development of the SC+T model. A second question for future research involves the temporal dimensions of schemas in comprehension and memory. For some schemas, events and actions unfold in a chronological order. In fact, temporality is critical in scripted action sequences. Actions unfold in either a logical order (e.g., the waitress must serve the food before the customer eats), a conventional order (e.g., the customer usually eats before leaving a tip), or an order that reff ects certain environmental constraints. Researchers have examined story schemas and have specified temporal constraints that exist in stories (Thorndyke, 1977; Mandler & Johnson, 1977; Rumelhart, 1975, 1977; Stein & Glenn, 1979). When story episodes are presented out of order, the order of recalling episodes drifts toward the temporal constraints of story schemas (Mandler, 1978; Stein & Nezworski, 1978). These systematic errors in recall order also occur in scripted passages (Bower et al., 1979). In the future, the SC+T model
must address temporality, because this dimension has a central role in many knowledge domains. We have not addressed temporality in this article because we chose to concentrate on aspects of schema processing that would apply to any knowledge domain. For some types of schemas (e.g., stereotypes, roles, and spatial scenarios), the dimension of temporality is not particularly salient or important. A third problem for future research involves the collection and explanation of reaction time data when memory is assessed by a recognition test. For example, the model should account for the latencies of hits, false alarms, correct rejections, and misses when typical and atypical actions are tested at different retention intervals. In fact, we are at present collecting and analyzing these data. Reaction time data should provide a richer data base for testing and extending the SC+T model. Still other questions may be pursued within the framework of the SC+T model. The model will undoubtedly need to be modified and extended as new findings accumulate. With further efforts in research and theory, we hope to converge on a scientifically rigorous, detailed, general, and decisive schema for schema processing. ACKNOWLEDGMENTS The research reported in this article was supported by a National Institute of Mental Health grant (MH-33491) awarded to the first author. We would like to thank the following members of the Cognitive Research Group at California State University at Fullerton who conducted the experiments reported in Sections V and VII of this article: Lea Adams, Hank Bruflodt, Leslie Clark, Scott Elofson, Sharon Goodman, Tami Murachver, James Riha, Carol Rossi, Don Smith, Judy Zimmerman, and Professor Stanley Woll.
REFERENCES Abelson, R. P. The psychological status of the script concept. American Psychologist, 1981, 36, 715-729. Adams, M. J., & Collins, A. A. A schematic-theoretic view of reading. In R. 0. Freedle (Ed.), New directions in discourse processing (Vol. 2). Norwood, New Jersey: Ablex, 1979. Anderson, R. C. The notion of schemata and the educational enterprise: General discussion of the conference. In R. C. Anderson, R. J . Spiro, & W. E. Montague (Eds.), Schooling and the acquisition of knowledge. Hillsdale, New Jersey: Erlbaum, 1977. Anderson, R. C., Spiro, R. J., & Anderson, M. C. Schemata as scaffolding for the representation of information in connected discourse. American Educational Research Journal, 1978, 15, 433-440. Atkinson, R. C., & Juola, J . F. Search and decision processes in recognition memory. In D. H. Krantz, R. C. Atkinson, R. D. Luce, & P. Suppes (Eds.), Contemporary developments in mathematical psychology (Vol. 1). San Francisco, California: Freeman, 1974.
106
Arthur C. Graesser and Glenn V. Nakamura
Bartlett, F. C. Remembering. Cambridge, Massachusetts: Cambridge University Press, 1932. de Beaugrande, R. Text, discourse, and process. Norwood, New Jersey: Ablex, 1980. Bellezza, F. S., & Bower, G. H. Person stereotypes and memory for people. Journal offersonality and Social Psychology, 1981. 41, 856-865. (a) Bellezza, F. S., & Bower, G. H. The representation and processing characteristics of scripts. Bulletin of the Psychonomic Society, 1981, 18, 1, 4. (b) Bellezza, F. S . , & Bower, G . H. Remembering script-based text. Poetics, 1982, in press. Biederman, I. On the semantics of a glance at a scene. In M. Kubovy & J . R. Pomerantz (Eds.), Perceptual organization. Hillsdale, New Jersey: Erlbaum, 1982, in press. Black, J. B., & Bower, G. H. Episodes as chunks in narrative memory. Journal of Verbal Learning and Verbal Behavior, 1979, 18, 309-318. Bobrow, D. G . , & Norman, D. A. Some principles of memory schemata. In D. G. Bobrow & A. Collins (Eds.), Representation and understanding. New York: Academic Press, 1975. Bower, G. H., Black, J. B., &Turner, T. J . Scripts in memory for text. Cognitive Psychology, 1979, 11, 177-220. Bransford, J . D. Human cognition: Learning, understanding, and remembering. Belmont, California: Wadsworth, 1979. Bransford, J. D., & Johnson, M. K. Considerations of some problems on comprehension. In W. G. Chase (Ed.), Visual information processing. New York: Academic Press, 1973. Bregman, A. S. Perception and behavior as compositions of ideals. Cognitive Psychology, 1977, 9, 250-292. Brewer, W. F., & Treyens, J . C. Role of schemata in memory for places. Cognitive Psychology, 1981, 13, 207-230. Cantor, N. Prototypicality and personality judgements. Unpublished doctoral dissertation, Stanford University, 1978. Cantor, N., & Mischel, W. Traits as prototypes: Effects on recognition memory. Journal of Personality and Social Psychology, 1977, 35, 38-48. Clark, L. F., & Woll, S. B. Stereotypes: A reconstructive analysis of reconstructive effects. Journal of Personality and Social Psychology, 1981, 41, 1064-1072. Cofer, C. N., Chmielewski, D. L., & Brockway, J. F. Constructive processes and the structure of human memory. In C. N. Cofer (Ed.), The structure of human memory. San Francisco, California: Freeman, 1976. D’Andrade, R. G . Memory and the assessment of behavior. In H. Blalock (Ed.), Measuremenr in the social sciences. Chicago, Illinois: Aldine, 1974. Dooling, D. J., & Lachman, R. Effects of comprehension on the retention of prose. Journal of Experimental Psychology, 1971, 88, 216-222. Dunay, P. K., Blazer, R. H., & Yekovich, F. R. Using memory schemata to comprehend scripted texr. Paper presented at the meeting of the American Psychological Association, Los Angeles, California, 1981. Flavell, J. H. The developmental psychology of Jean Piaget. Princeton, New Jersey: Van NostrandReinhold, 1963. Friedman, A. Framing pictures: The role of knowledge in automatized encoding and memory for gist. Journal of Experimental Psychology: General, 1979, 108, 316-355. Going, M., & Read, J. D. Effects of uniqueness, sex of subject, and sex of photograph on facial recognition. Perceptual and Motor Skills, 1974, 39, 109-1 10. Goldin, S. E. Memory for the ordinary: Typicality effects in chess memory. Journal of Experimental Psychology: Human Learning and Memory. 1978, 4, 605-616. Goodman, G. S. Picture memory: How the action schema affects retention. Cognitive Psychology, 1980, 12, 473-495.
Schemas, Comprehension, and Memory
107
Graesser, A. C. How to catch a fish: The representation and memory of common procedures. Discourse Processes, 1978, I, 72-89. Graesser, A. C. Prose comprehension beyond the word. New York: Springer-Verlag, 1981. Graesser, A. C., Gordon, S. E., & Sawyer, J. D. Memory for typical and atypical actions in scripted activities: Test of a script pointer + tag hypothesis. Journal of Verbal Learning and Verbal Behavior, 1979, 18, 319-332. Graesser, A. C., Robertson, S. P., & Anderson, P. A. Incorporating inferences in narrative representations: A study of how and why. Cognitive Psychology, 1981, 13, 1-26. Graesser, A. C., Woll, S. B., Kowalski, D. J., & Smith, D. A. Memory for typical and atypical actions in scripted activities. Journal of Experimental Psychology: Human Learning and Memory, 1980, 6 , 503-513. Green, D. M., & Swets, J. A. Signal detection theory andpsychophysics. New York: Wiley, 1966. Grice, H. P. Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and semantics (Vol. 3): Speech acts. New York: Seminar Press, 1975. Hamilton, D. L. Illusory correlations as a basis for stereotyping. In D. L. Hamilton (Ed.), Cognitive processes in stereotyping and intergroup behavior. Hillsdale, New Jersey: Erlbaum, 1981. Hastie, R. Memory for behavioral information that confirms a personality impression. In R. Hastie, T. M. Ostrom, E. B. Ebbesen, R. S. Wyer, D. L. Hamilton, & D. E. Carlston (Eds.), Person memory: The cognitive basis of social perception. Hillsdale, New Jersey: Erlbaum, 1980. Hastie, R., & Kumar, A. P. Person memory: Personality traits as organizing principles in memory for behaviors. Journal of Personality and Social Psychology, 1979, 37, 25-38. Kintsch, W. Memory and cognition. New York: Wiley, 1977. Kintsch, W., & Van Dijk, T. A. Toward a model to text comprehension and production. Psychological Review, 1978, 85, 363-394. Krueger, L. E. Is identity or regularity more salient than difference or irregularity? Paper presented at the meeting of the American Psychological Association, Los Angeles, California, 1981. Light, L. L.. Kayra-Stuart, F., & Hollander, S. Recognition memory for typical and unusual faces. Journal of Experimental Psychology: Human Learning and Memory, 1979, 5 , 212-228. Loftus, G. R., & Mackworth, N. H. Cognitive determinants of fixation location during picture viewing. Journal of Experimental Psychology: Human Perception and Performance, 1978, 4, 565-572. Mandler, G. Organization and recognition. In E. Tulving & Donaldson (Eds.), Organization and memory. New York: Academic Press, 1972. Mandler, G. Recognizing: The judgement of previous occurrence. Psychological Review, 1980,87, 252-271. Mandler, J. M. A code in the node: The use of a story schema in retrieval. Discourse Processes, 1978, 1, 14-35. Mandler, I. M. Categorical and schematic organization in memory. In C. R. Puff (Ed.), Memory organization and structure. New York: Academic Press, 1979. Mandler, J. M. Representation. In J. H. Flavell and E. M. Markman (Eds.), Cognitive development. Vol. 2 of P. Mussen (Ed.), Manual of child psychology. New York: Wiley, 1982, in press. Mandler, J. M., & Johnson, N. S. Rememberance of things passed: Story structure and recall. Cognitive Psychology, 1977, 9 , 11 1-151. Minsky, M. A. A framework for representing knowledge. In P. H. Winston (Ed.), Thepsychology of computer vision. New York McGraw-Hill, 1975. Neisser, U. Cognition and reality. San Francisco, California: Freeman, 1976. Neisser, U., & Becklen, R. 
Selective looking: Attending to visually specific events. Cognitive Psychology, 1975, 7, 480-494. Nelson, K. Cognitive development and the acquisition of concepts. In R. C. Anderson, R. J. Spiro,
108
Arthur C. Graesser and Glenn V. Nakamura
& W. E. Montague (Eds.), Schooling and the acquisition of knowledge. Hillsdale, New Jersey: Erlbaum, 1977. Norman, D. A., & Bobrow, D. G. On the role of active memory processes in perception and cognition. In C. N. Cofer (Ed.), The structure of human memory. San Francisco, California: Freeman, 1976. Norman, D. A., & Bobrow, D. G. Descriptions: An intermediate stage in memory retrieval. Cognitive Psychology, 1979, 11, 107-123. Palmer, S. E. The effect of conceptual scenes on the identification of objects. Memory and Cognition, 1975, 3, 519-526. Reeder, G. D., & Brewer, M. B. A schematic model of dispositional attribution in interpersonal perception. Psychological Review, 1979, 86, 61-79. Reynolds, R. E., & Anderson, R. C. The influence of questions on the allocation of attention during reading. Technical Report #183. Center for the Study of Reading, University of Illinois, Champaign-Urbana, Illinois, 1980. Rothbart, M. Memory processes and social beliefs. In D. L. Hamilton (Ed.), Cognitiveprocesses in stereotyping and intergroup behavior. Hillsdale, New Jersey: Erlbaum, 1981. Rumelhart, D. E. Notes on a schema for stories. In D. G. Bobrow & A. Collins (Eds.), Representation and understanding. New York: Academic Press, 1975. Rumelhart, D. E. Understanding and summarizing brief stories. In D. Laberge & S. J . Samuels (Eds.), Basic processes in reading: Perception and comprehension. Hillsdale, New Jersey: Erlbaum, 1977. Rumelhart, D. E., & Ortony, A. the representation of knowledge in memory. In R. C. Anderson, R. J . Spiro, & W. E. Montague (Eds.), Schooling and the acquistion of knowledge. Hillsdale, New Jersey: Erlbaum, 1977. Schank, R. C. Language and memory. Cognitive Science, 1980, 4, 243-284. Schank, R. C., & Abelson, R. Scripts, plans, goals, and understanding. Hillsdale, New Jersey: Erlbaum, 1977. Searle, J. R. Speech acts. London: Cambridge University Press, 1969. Smith, D. A. What schema-relevant inferences are passed to the memory representation of text? Unpublished masters thesis, California State University, Fullerton, 198 1. Smith, D. A,, & Graesser, A. C. Memory for actions in scripted activities as a function of typicality, retention interval, and retrieval task. Memory and Cognition, 1981, 9, 550-559. Spilich, G. J . , Vesonder, G. T., Chiesi, H.L., & Voss, J. F. Text processing of domain related information for individuals with high and low domain knowledge. Journal of Verbal Learning and Verbal Behavior, 1979, 18, 275-290. Spiro, R. J. Remembering information from text: Theoretical and empirical issues concerning the “state of schema” reconstruction hypothesis. In R. C. Anderson, R. J. Spiro, & W. E. n of knowledge. Hillsdale, New Jersey: Erlbaum, Montague (Eds.), Schooling and the acqu 1977. S ~ l l T. , K. Person memory: Some tests of associative storage and retrieval models. Journal of Experimental Psychology: Human Learning and Memory, 1981, 7 , 440-463. Stein, N. L., & Glenn, G. G. An analysis of story comprehension in elementary school children. In R. 0. Freedle (Ed.), New directions in discourse processing (Vol. 2). Norwood, New Jersey: Ablex, 1979. Stein, N. L., & Nezworski, T. The effects of organization and instructional set on story memory. Discourse Processes, 1978, 1, 177-193. Taylor, S. E., & Crocker, J . Schmatic bases of social information processing. In E. T. Higgins, P. Herman, & M. P. Zanna (Eds.), The Ontario Symposium on personality and social pschology. Hillsdale, New Jersey: Erlbaum, 1981.
Schemas, Comprehension, and Memory
I09
Thorndyke, P. W. Cognitive structures in comprehension and memory for narrative discourse. Cognitive Psychology, 1977, 9, 77-1 10. Thorndyke, P. W., & Hayes-Roth, B. The use of schemata in the acquisition and transfer of knowledge. Cognitive Psychology. 1979, 11, 82-106. Thorndyke, P. W., & Yekovich, F. R. A critique of schema-based theories of human story memory. Poerics. 1980, 9, 23-49. Trabasso, T., Secco, T., & Brock, P. V. D. Causal cohesion and story coherence. In H. Mandl, N. L. Stein, & T. Trabasso (Eds.), Learning and comprehension ofrext. Hillsdale, New Jersey: Erlbaum, 1982, in press. den Uyl, M., & Van Oostendorp, H. The use of scripts in text comprehension. Poerics, 1980, 9, 275-294. Woll, S. B., & Graesser, A. C. Memory discrimination for information typical or atypical of person schemata. Social Cognition, 1982, in press. Woodworth, R. S. Dynamics of behavior. New York: Holt, 1958. Woodworth, R. S., & Schlosberg, H. Experimental psychology. New York: Holt, 1954. Yekovich, F. R.,& Yekovich, C. W. The use of scripts in the study of knowledge-based comprehension of text. In U. Connor (Ed.),Discourse approaches to reading comprehension, 1982, in press.
CONSTRUCTION AND REPRESENTATION OF ORDERINGS IN MEMORY

Kirk H. Smith and Barbee T. Mynatt
BOWLING GREEN STATE UNIVERSITY
BOWLING GREEN, OHIO
I. Introduction
II. Review of Previous Research
III. Overview of the Experiments
IV. Experiment 1: Retrieval from Partial Orderings
    A. Method
    B. Results and Discussion
    C. Conclusions and Implications
V. Experiment 2: The Role of Determinacy in Constructing Partial Orderings
    A. Method
    B. Results and Discussion
VI. Experiment 3: Node Construction
    A. Method
    B. Results and Discussion
VII. Experiment 4: Diverging and Converging Nodes
    A. Method
    B. Results and Discussion
VIII. Experiment 5: The Role of the Schema
    A. Method
    B. Results and Discussion
    C. Conclusions from Experiments on Presentation Orders
IX. Summary
References

I. Introduction
Implicit in much of the recent work in cognitive psychology on the acquisition, retention, and retrieval of information on ordered relationships is the assumption that the linear order is a "good figure" (De Soto, 1960; Henley, Horsfall & De Soto, 1969). In general, when people are confronted with a set of asymmetric, transitive relations such as "A is greater than B," there is a strong tendency to represent it in a single, complete ordering. Experiments by Barclay (1973), De Soto (1960),
Potts (1972), and Trabasso, Riley, and Wilson (1975) provide a variety of demonstrations of the strength of this tendency. Unfortunately, the world is not a simple, well-ordered place. Much of our knowledge cannot fit neatly into the complete ordering schema. Important examples of partial orderings include family trees and causal relations. The present article is concerned with the conditions that facilitate construction of appropriate mental representations of partially ordered information. We begin with an examination of several domains of knowledge that are typically portrayed as networks of partially ordered concepts or events. A review of the experimental literature on networks and partial orderings follows. We then describe five experiments on partial orderings that have not been reported previously. These studies make two contributions. First, they point out the limitations of previously published studies and suggest the need for exploration of a greater variety of partial orderings. Second, the experiments were designed to determine whether the acquisition of partial orderings can be understood in the same terms as the acquisition of linear orderings . Our investigations of presentation order strongly suggest that the same theory is satisfactory for both partial and complete orderings. What common kinds of information form networks of partially ordered objects or events? One instance that is very familiar to cognitive psychologists is the hierarchy (see Fig. 1A). Networks of this type have been used to represent the grammatical relationships among the words in a sentence (Chomsky, 1957; Johnson, 1968), as well as a person’s knowledge about the semantic relationships among words or concepts (Collins & Quillian, 1969). Two influential theories of semantic memory combine both types of information in complex networks (Anderson & Bower, 1973; Norman, Rumelhart, & the LNR Research Group, 1975). These theoretical networks are relatively complex in that several different kinds of relationships between words (or lexical entries) are represented. Another example of a frequently encountered hierarchy of partially ordered entities is an organizational chart or chain of command. A second type of information that can form a partial ordering is a network of causal relations. Suppose, for example, that B, J, H, K, F, and D are events in a narrative. Several relations among two or more events are possible. In the simplest, B might be the sole cause of J. More complex possibilities are that J and K jointly cause H or that H is the cause of both F and D. Such causal networks are sometimes represented in graphs like those of Fig. 1. (The examples above describe the situation at the top of Fig. 1B.) Historical relationships are often portrayed in this way. A family tree represents a special case of a history described in causal terms. The flow of material through a manufacturing process often
Fig. 1. Examples of recently investigated networks of partially ordered entities from the following studies: (A) Nelson and Smith (1972), (B) Hayes-Roth and Hayes-Roth (1973), and (C) Moeser and Tarrant (1977).
can be usefully represented as a partial ordering, and the planning of complex development projects is frequently characterized this way (as in a “PERT Chart,” Moder & Phillips, 1964). Underlying all the examples is an implicit time line. However, to the extent that the relationships are partially ordered, time need not be fully specified. In Fig. lC, the precise time at which events H and M occur is unspecified. What is important is that both H and M precede N. For a variety of reasons, information about a network of relationships is often acquired sequentially. We learn about a family tree by listening to relatives talk about the individuals who comprise it. We read about causal relations one at a time. History is most often recounted serially. But even when we are not constrained by the serial nature of language as a medium of communication, experience enforces sequential acquisition on us. Our unguided experience with the interactions of a group of people necessarily follows a time line, even though what we eventually come to understand about a group may be most accurately reflected in a sociogram or organizational chart (cf. De Soto, 1960). Perhaps the most striking example of a network of relationships that is not completely ordered but must be translated into a serial representation is found in a computer program. For some purposes, a program must be understood as a set of complex logical relationships among the operations the computer can execute. However, the program must also be realized as a single, rigorously ordered series of symbolic statements. Writing a program involves a translation from the first kind of representation (in the programmer’s head) to the second. Often of more practical importance is the translation in the other direction, as when a program with logical errors is debugged or when someone other than the original programmer
tries to correct or modify a program. (Sometimes even the original programmer has this problem after the passage of time.)

The preceding examples make it clear that people need to be able to understand networks of partially ordered entities. The question is how this is done. The writings of De Soto (1960; as well as Henley et al., 1969) seem to imply that even when sophisticated people give the relationships their most thoughtful consideration, they cannot handle certain kinds of networks. Yet some of our examples indicate that such networks cannot be impossible to understand and remember. The purpose of this article is to explore certain variables that affect people's ability to understand and remember networks of partially ordered entities, and to show that they are the same variables that affect the apprehension of complete orderings.

II. Review of Previous Research
As indicated above, the hypothesis that the linear order is a "good figure" comes from the work of De Soto. The most relevant finding for present purposes comes from an experiment in which college students had to learn 12 relationships among four people (De Soto, 1960). The task was to report for each ordered pair of names whether or not the first influenced the second. Subjects required approximately nine repetitions, or trials, to learn all 12 relations when the latter formed a complete or linear ordering. Roughly 12 trials were required when the ordering was not complete, that is, when it formed a partial ordering and resembled a hierarchy or organization chart.

A thorough treatment of De Soto's work is beyond the scope of this article; however, several comments are needed to place our work in the proper perspective. From the beginning De Soto was concerned with how certain social relations are perceived. Thus, the 1960 experiment also contained groups of subjects that learned a set of statements identical to those described above, except that "influences" was replaced by "likes." These groups had equal difficulty with complete and partial orderings, but performed better when the relationship formed a transitive symmetric structure. De Soto concluded that "influence" is understood to be an asymmetric relation, whereas "like" is symmetric. It should be noted that no logical inconsistency is involved in a situation in which Al likes Bill, but Bill does not like Al, whereas Al cannot be both older and younger than Bill, nor both the father and the son of Bill. The present study was concerned almost exclusively with the latter type of relationships. Thus,
our work should be interpreted as an exploration of the role of completeness in understanding asymmetric, transitive structures. Two other aspects of De Soto's work deserve comment. First, we are not concerned with the observation that a set of items with two or more conflicting orderings (e.g., on two different dimensions) produces cognitive strain (De Soto, 1961). This "predilection for single orderings" is easily confused with people's tendency to reduce a partially ordered set of items (on a single dimension) to one (incorrect) ordering. Second, the present work is not intended to be a definitive treatment of how people understand and remember systems of causal relations or the logical structure of computer programs. We recognize that intransitivities play an important role in these instances. (In fact, the loop, one of the most important structures in computer programs, corresponds formally to what Henley et al. (1969) call a cycle.) However, many of the issues raised here, especially the methodological ones, must be faced in future studies of how people deal with even more complex networks than the ones considered here.

The experiments considered so far have all used a paired associate learning procedure. A paper by Nelson and Smith (1972), which raises a number of important questions about partial orderings, explored a graphical form of presentation. Graphs are popular devices for facilitating comprehension of networks. Indeed, the word "network" is applied to partial orderings by an analogy between the graphs that are used and things like fishing nets. Examples are called flow charts, family trees, PERT charts, and sociograms. The value of a graph lies in its accurate and economical representation of the important aspects of a partial ordering. For example, Fig. 1B expresses not only the determinate relationships between K and H and J and H, represented by lines, but also the indeterminate relationship between K and J. (Each line in Fig. 1 represents an asymmetric, transitive relationship between two symbols. If the relationship is "greater than," then the symbol with the higher location on the graph is the larger.) Nelson and Smith examined learning and retention of the 34 determinate relations represented in Fig. 1A. The relations were presented either as a set of 34 associations between letters (that is, C → M, G → M, B → M, . . . , D → K) or as a diagram like the one in the figure. Tests required the subjects to make checkmarks in a 14 × 14 matrix for which rows indicated first letters of the associations (letters lower in the hierarchy in Fig. 1A) and columns indicated second letters (or letters higher in the hierarchy). The letters heading the rows and columns were assigned randomly from trial to trial. Thus, Nelson and Smith's subjects had to learn how to translate one representation (a set of pairs or a graph) into
another (a matrix). There is no special reason to assume that the matrix representation is unique or cognitively simpler than a set of pairs or a graph. Although all possible combinations are available in the matrix, the subject has to identify only the ones presented in the association condition and does not have to discriminate between logically incorrect combinations and those that are indeterminate.

Nelson and Smith found that subjects learned the 34 pairings in fewer trials, retained more pairs, and required fewer trials to relearn them when they were presented graphically than when only the pairs were presented. The difference was particularly striking in the number of errors made in learning. These results indicate that the information conveyed by a graph enhances in some way a person's knowledge about a set of partially ordered relationships. Of particular importance is the finding that learning was unaffected when the left-to-right ordering of the branches in the graphs was rearranged from trial to trial. Groups receiving graphs that changed in this way performed as well as groups that received identical graphs throughout learning. Apparently, the subjects were able to discriminate the essential features of the graphs they were shown from their nonessential aspects. Graphs of the kind shown in Fig. 1 contain a number of details extraneous to the information they represent. The order of the nodes from left to right in the drawing is irrelevant, and the length of the lines carries no meaning. The conventions for drawing graphs of this kind permit ordinal, and sometimes interval, information to be expressed under some circumstances, as when time is represented in a graph illustrating the political history of western Europe in the eighteenth century. The latter example also makes it clear that graphs representing networks are abbreviated and impoverished in important ways. The nodes or boxes and even the lines or arrows are presented in a symbolic shorthand; one must usually read an accompanying text to get the complete story.

Although Nelson and Smith demonstrated that college students are able to master the information in a partial ordering (at least temporarily for the purpose of completing a laboratory experiment), a series of recently published papers seems to argue that the knowledge is qualitatively different from that acquired when a linear ordering is learned. This conclusion is based on the finding that the "distance effect," universally found with linear orderings, has not been obtained with partial orderings. Briefly, the distance effect refers to the fact that in judging which of two items rank higher on a linear ordering, people tend to be faster and more accurate as the number of intervening things on the scale increases. In a typical demonstration of this effect, subjects first learn a completely ordered set of relations, A > B, B > C, C > D, D > E, E > F. They are
then asked either to judge whether various test items of the same form are true or false or to pick out which of two things rank higher. Reaction times tend to be shorter and error rates lower in verifying B > E than B > C, C > D, and D > E. Both Potts (1972) and Trabasso et al. (1975) found the distance effect even when relationships of greater distance (e.g., B > E) were not presented until testing.

In contrast, when Hayes-Roth and Hayes-Roth (1975) tested subjects who had learned the 11 relationships represented by Fig. 1B, they obtained a reverse distance effect. That is, adjacent relations, involving two letters connected by a single line (e.g., B > J in Fig. 1B), were judged more quickly than remote relations involving letters connected by two or more lines (e.g., H > P). This is, of course, just the opposite of what had been found with linear orderings. Hayes-Roth and Hayes-Roth described their experiment within the context of the semantic memory literature, in which the typical material is composed of sentences about class inclusion (e.g., "Canaries are birds"), and reaction time increases with distance rather than decreases. They went on to demonstrate that repeated testing on remote relationships could change the observed effects of distance, a result that clearly has methodological implications for the verification-time procedure used to study semantic relations. This latter aspect of their paper has been largely ignored, and subsequent work has focused on the failure to find the appropriate "distance effect" (remote relations faster than adjacent ones) in partial orderings.

Moeser and Tarrant (1977) pointed out that the procedure used by Hayes-Roth and Hayes-Roth (1975) encourages subjects to learn the individual relations between letters as isolated units in memory, rather than to integrate them into a network. Moeser and Tarrant argued that people do not spontaneously integrate information into holistic representations except under special circumstances, and that none of these conditions had occurred in the Hayes-Roth and Hayes-Roth experiment. Moeser and Tarrant therefore changed a number of aspects of the learning situation. They pointed out that in the Hayes-Roth and Hayes-Roth experiment, the relationships were presented as abstract inequalities involving meaningless letter pairs. Arguing that integration is more likely to occur with concrete and familiar material, Moeser and Tarrant used sentences that related a set of male names in terms of age (e.g., "Hugh is older than Bob"). They also required subjects to learn specific ages for some of the names. And in one condition, they showed subjects a network representation (along the lines of Fig. 1C) and encouraged subjects to store information in this format. They argued that these changes should lead to integrated storage and the usual distance effect observed with linear orders. In fact, judgment times for adjacents and remotes were equal.
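The distance measure itself is easy to state precisely. The following sketch is our own illustration (in Python), not part of any of the studies cited; it computes the number of intervening terms for test pairs drawn from the six-term example above, and the labels "adjacent" and "remote" follow the usage in the text.

    # Symbolic distance in the six-term linear ordering A > B > C > D > E > F
    # used as the example in the text.
    order = ["A", "B", "C", "D", "E", "F"]

    def distance(x, y):
        """Number of items intervening between x and y in the linear ordering."""
        return abs(order.index(x) - order.index(y)) - 1

    # The distance effect: remote test pairs such as B > E tend to be verified
    # faster and more accurately than adjacent pairs such as B > C.
    for pair in [("B", "C"), ("C", "D"), ("D", "E"), ("B", "E"), ("A", "F")]:
        label = "adjacent" if distance(*pair) == 0 else "remote"
        print(pair, "distance =", distance(*pair), label)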
It is not entirely clear how this last result should be interpreted. On the one hand, the finding of equally long times for adjacent and remote relationships can still be interpreted as evidence that partial orderings are not represented and accessed in the same way as linear orderings, even when special precautions are taken to ensure that all the information has been properly stored and integrated. On the other hand, both studies of complex, partial orderings (Hayes-Roth & Hayes-Roth, 1975; Moeser & Tarrant, 1977) differed from research on linear orders in several important ways. These differences might explain why the pattern of reaction times was different. One early hypothesis was that the experiments on partial orderings had used much larger structures. The number of elements in the partial orderings was 12, compared to at most 6 in Potts (1974) and Trabasso et al. (1975). Indirect evidence now suggests that the number of elements in the ordering is probably not responsible for the difference in results. Another line of investigation has looked more closely at the procedures used in this research. The results here are less clearcut; however, we argue in a subsequent section that these results tell us as much about the complex relationship between the distance effect and mental organization as they do about the difference between partial and complete orderings. There are two lines of evidence that the number of elements in an ordering cannot explain the difference between results from experiments on partial orderings and linear orderings. First, Pliske and Smith (1979) and Woocher, Glass, and Holyoak (1978) have reported distance effects using linear orderings of 12 and 16 terms, respectively. Second, Warner and Griggs (1980) had subjects learn a seven-term partial ordering under a variety of conditions designed to ensure that the information was correctly represented. In spite of these efforts, no distance effect was observed. Thus, it appears that the distance effect reflects the organization in memory of information from linear, but not partial, orderings. Unfortunately, no studies have directly compared distance effects in complete and partial orderings using the same procedures and testing the same relationships. The one study that has compared complete and partial orderings of the same size (14 elements in Moeser, 1979) failed to obtain a distance effect for both linear orderings and partial orderings. However, the procedure of this study was quite different from any discussed so far. Procedural variations seem to account for a good deal of the confusion in the literature on partial orderings. First, investigators have used a variety of methods to present the relationships that make up an ordering. Both Hayes-Roth and Hayes-Roth (1975) and Moeser and Tarrant (1977) used elaborate training sequences made up of many exposures to the
relationships. By contrast, Pliske and Smith (1979) and Woocher et al. (1978) gave people a list to learn before coming to the laboratory and tested the success of this procedure by requiring each subject to recite the list in order. Second, the partial order studies have analyzed the time required to judge whether an assertion, such as "Carl is older than Mike," is true or false (sentence verification procedure). Many of the linear ordering studies have used a procedure in which the subject is shown two terms (e.g., "Carl Mike") side by side on a display screen and required to press a response key under the older of the two (two-choice procedure). Polich and Potts (1977) compared the two procedures and found that the verification procedure produces interactions between the presence of an end-anchor (highest or lowest ranked element in the ordering) and whether the sentence is true or false. Not only were such interactions absent in the two-choice procedure but Polich and Potts also reported that the overall variability of the response times was significantly less with this procedure.

The importance of these procedural variations is dramatically illustrated by the two experiments that constitute a master's thesis by Pliske (1978). In the first, unpublished experiment, subjects learned a 12-term linear order and were tested on a series of adjacent relationships and a selected subset of the possible remote relationships. The method was designed to follow as closely as possible the one used by Moeser and Tarrant (1977), but without any special training on how to represent the ordering. Response times were highly variable; and the effect of distance, although evident and statistically significant, was not nearly as straightforward and compelling as that obtained in the second, published experiment (Pliske & Smith, 1979), in which subjects studied the list on their own and were tested with the two-choice procedure. Such variations in method may be more important than has been previously realized. For example, we cannot rule out the possibility that the distance effect is to some extent a reflection of the acquisition and testing procedures used in these larger structures.

Two recent papers by Griggs and his students (Griggs, Keen, & Warner, 1980; Warner & Griggs, 1980) explored the procedural variations already discussed, along with several others. In no case were distance effects obtained for partial orderings. Elaborate preliminary instructions about the nature of partial orderings and their representation in graphical form did not lead to distance effects. Warner and Griggs (1980) found that without exposure to a graphical representation of the information, less than 60% of their subjects' responses were consistent with the correct seven-term partial ordering. Only when subjects were required to draw
the correct graph from memory on two consecutive trials and were then trained on the adjacent comparisons to two consecutive correct trials, did they respond correctly to remote comparisons on the test. Even this rigorous program did not lead to a distance effect, although this approach can be criticized because the extensive training on adjacent comparisons may have facilitated responses to them. Warner and Griggs's third experiment comes the closest to matching the procedures of the earlier studies with large linear orderings. Following preliminary instructions on the nature of partial orderings and their graphical representation, subjects were given a graph of either a 7- or 12-item partial order and told to memorize it for a second experimental session. At the beginning of the second session, the subject had to draw the structure both before and after familiarization with the test procedure. Finally, testing made use of a modification of the two-choice procedure similar to that used by Pliske and Smith and Woocher et al. The modification involved the addition of a third button to be used when a pair of items were indeterminate, that is, not ordered by the information given to the subject. In spite of the procedural similarities, Warner and Griggs found a reverse distance effect for the 12-item partial order used by Moeser and Tarrant. (It should be noted that 2 of the 20 subjects in this condition of Warner and Griggs' experiment were unable to learn the correct structure.)

In summary, 6 years of research suggests that the information in partial orderings is more difficult to memorize than similar completely ordered information. However, after receiving special instructions about the nature of incomplete ordering, accompanied by graphs and practice in using them, most college students are able to judge whether or not a given pair of elements is ordered and to draw correct inferences about the relationship between pairs of elements that are ordered but not specifically presented. The only evidence that the partial orderings are represented in an inherently different fashion is the failure to find a distance effect for judgments requiring inferences. The correct interpretation of this difference depends on how the distance effect is interpreted for complete sets of elements (cf. Potts, Banks, Kosslyn, Moyer, Riley, & Smith, 1978). In any case, these conclusions are based on investigations of a remarkably small sample of different partial orderings. No arguments have been offered to support the claim that the sample is representative. If our survey is complete, exactly four partial orderings have been examined, the two 12-item orderings shown in Figs. 1B and 1C, a 7-item ordering studied by Griggs and his students, and a 14-item ordering that Moeser (1979) compared with a complete ordering.
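The judgment these studies require of subjects, namely whether a given pair of elements is ordered at all and, if so, in which direction, can be stated mechanically. The sketch below is our own minimal illustration; the element names and structure are hypothetical rather than taken from any of the studies reviewed above.

    # A partial ordering stored as directed "immediately above" relations.
    # The element names and structure are hypothetical illustrations.
    relations = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}

    def dominates(x, y):
        """True if x is above y through any chain of relations (transitivity)."""
        frontier, seen = list(relations.get(x, [])), set()
        while frontier:
            node = frontier.pop()
            if node == y:
                return True
            if node not in seen:
                seen.add(node)
                frontier.extend(relations.get(node, []))
        return False

    def judge(x, y):
        """The three-way judgment subjects must make about a pair of elements."""
        if dominates(x, y):
            return "ordered: %s above %s" % (x, y)
        if dominates(y, x):
            return "ordered: %s above %s" % (y, x)
        return "indeterminate"

    print(judge("A", "E"))   # a correct remote inference
    print(judge("B", "C"))   # an indeterminate pair, the signature of a partial ordering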
III. Overview of the Experiments

As part of a research project dealing with the construction of linear orders (Foos, Smith, Sabol, & Mynatt, 1976; Mynatt & Smith, 1977; Smith & Foos, 1975), we became interested in partial orderings or networks because they seemed to be a rich domain into which we could extend our theory of constructive processes (see Foos et al. and Smith's section of Potts et al.). Our work has focused on the construction of four-, five-, and six-element linear orderings. With such small sets of relationships, the construction of a branch or node from two relations (e.g., A > B, A > C) did not appear to be fundamentally different from the construction of a linear ordering (e.g., from A > B, B > C). Indeed, an early study in our laboratory (Smith & Mynatt, 1975) indicated that four- and five-term partial orderings were no more difficult to construct than similar-sized linear orderings. These preliminary observations were in sharp contrast to the previously published studies we have reviewed.

In what follows, the first experiment we report was an investigation of the distance effect in retrieving information from partial orderings. One condition of the experiment was essentially a replication of the experiments by Moeser and Tarrant (1977) and Warner and Griggs (1980). It differed from the previous studies mainly in the additional procedures included prior to testing in order to guarantee that subjects understood the indeterminacy of partial orderings and had learned the specific adjacent relationships. A second condition tested another structure with the same number of elements (12) but a configuration similar to the hierarchical network studied by Nelson and Smith (1972). The second experiment reported below was designed to explore the diversity of structural configurations possible in 12-element partial orderings. A less elaborate testing procedure was used, and the focus was on whether subjects could answer questions and draw accurate diagrams on the basis of a set of sentences describing a partial ordering. The sentences were continuously in view in order to eliminate the effects of memory storage and retrieval. Results of the first two experiments were interpreted as evidence that, at least for college students, partial orderings can be learned and the resulting knowledge is not fundamentally different from what is learned in a linear ordering.

The last three experiments, using four- and five-element orderings, were concerned with the process of construction. How are the relationships in individual sentences combined to form mental networks? The first of these experiments contrasted the process of extending a linear ordering with the process of building a node or branch (e.g., the structure
involving J, R, and D, at the top of Fig. 2). The second experiment investigated the construction of different types of nodes; the third was concerned with the effects of context on constructive processes. Different contexts (in this case, the sentence frames used to express relationships) were expected to elicit more or less appropriate representational schemas from the subjects' permanent memory. Throughout these last three experiments, the Foos et al. theory of constructive processes was extended and modified to apply to networks or partial orderings.
IV. Experiment 1: Retrieval from Partial Orderings
None of the recent studies of partial orderings has found a distance effect, in which responses to remote relationships are faster than to adjacent ones. Various attempts (Hayes-Roth & Hayes-Roth, 1975; Moeser & Tarrant, 1977; Warner & Griggs, 1980) suggest that this failure cannot be attributed to differences in training procedures, response measurement procedures, or to the number of elements in the structure. However, only four configurations have been investigated. Each of these four structures seems arbitrarily complex and unlike anything a college student might have encountered previously. The conclusion that distance effects cannot be obtained in any partial ordering is obviously premature. A more familiar and intuitively simpler configuration of relationships that form a partial ordering is a hierarchy or family tree. The present experiment compared retrieval time for information in the hierarchical network shown in Fig. 2 with comparable performance for the irregular network used by Moeser and Tarrant. The two structures have the same number of elements (12). The elements were one-syllable given names from Battig and Montague's (1969) norms, and the relations between them were described as age relations.
A. METHOD
The subjects were 22 undergraduate students at Bowling Green State University. Their participation partially fulfilled a course requirement for introductory psychology. The 11 students in the “hierarchy” condition worked with the relations graphed in Fig. 2. The 11 students in the “irregular” condition worked with the relations that form the irregular partial ordering (see Fig. 1C) studied by Moeser and Tarrant, and Warner and Griggs. The training phase was based to some extent on the procedures used by Nelson and Smith. It had several steps and attempted to expose the
Fig. 2. The hierarchical network that subjects in the hierarchy condition of Experiment 1 learned.
subjects to the various properties of the order in a thorough but relatively unstructured way. The subjects were first given a sheet containing 11 sentences describing the age relation between adjacent elements from the order and were asked to draw a diagram representing the information presented in the sentences. The experimenter checked these drawings and discussed any inaccuracies with the subjects. The subjects were then asked to verify four other diagrams representing the same information in four somewhat different ways. These diagrams had lines that were longer or shorter or of varying length and had different arrangements of the branches. Two of the diagrams had minor errors in them, and two had no errors. The purpose of this phase was to allow the subjects to see that a variety of diagrams could be accurate. Again the subjects' responses were immediately scored and discussed. The subjects then read 26 sentences describing possible age relations between the names and decided whether each sentence was true, false, or indeterminate based on the information presented in the original sentences. Feedback was also given to the subjects on these decisions. The next training task required the subjects to fill in a 12 × 12 matrix which had the 12 names printed along the left and top borders. Subjects were instructed to place a checkmark in every cell in which the name from the row was older than the name from the (intersecting) column. These responses were corrected and discussed. Up to this point, the subjects had the original sentences and all other materials available for reference. However, the subjects were told at the outset that eventual memorization of the relationships would be necessary. At this point, the subjects were asked to study any or all of the materials as long as they wished, until they felt they knew the material completely. They were then given one final test in which they placed check marks on another arrangement of the 12 × 12 matrix.

Following the training phase, subjects retrieved information from memory about the relative age of pairs of names in the partial ordering. Pairs of names appeared on an Owens-Illinois Digi-Vu screen, and response times were recorded as subjects pressed one of two marked keys
on a keyboard under the name of the older person or pressed the space bar to indicate an indeterminate relation. Presentation of the name pairs and response records were under the control of a Nova 1220 computer. Each block of trials consisted of 78 name-pair presentations. The composition of the trials depended on the condition. For the hierarchical structure, on each block of trials all 26 possible determinate relations were presented, including 11 adjacent relations, 9 remote relations with a step size of 1, and 6 remote relations with a step size of 2. (Step size is defined by the number of elements between the two test items.) Each of these was presented once in a left-to-right order on the screen, and again in a reversed order. A subset of 26 of the 80 possible indeterminate relations was also included in each block. To equate the hierarchy condition with the irregular condition as much as possible, subjects in the latter condition were likewise tested on 26 determinate relations presented in both forward and reverse orders and 26 indeterminate relations. However, these relations do not exhaust all the possible relations in either category. Of the determinate relations, 7 were adjacent pairs, 5 had a step size of 1, 5 a step size of 2, 4 a step size of 3, 3 a step size of 4, and 2 a step size of 5. Each subject was required to complete three blocks of trials in which their error rate did not exceed 5%.
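Because the exact branching of Fig. 2 cannot be recovered from the text alone, the sketch below (ours) assumes one hierarchy that reproduces the reported composition of the trial blocks (11 adjacent pairs, 9 pairs of step size 1, and 6 of step size 2) and shows how step size can be computed as path length minus one; the six bottom-level labels are placeholders rather than the names actually used.

    from collections import Counter

    # One 12-element hierarchy consistent with the counts reported for Fig. 2;
    # the branching below J, R, and D is assumed for illustration only.
    children = {"J": ["R", "D"], "R": ["P"], "D": ["G", "N"],
                "P": ["t1", "t2", "t3"], "G": ["t4"], "N": ["t5", "t6"]}

    def step_size(older, younger):
        """Number of elements intervening on the path from older to younger."""
        depth, queue = {older: 0}, [older]
        while queue:
            node = queue.pop(0)
            for child in children.get(node, []):
                if child not in depth:
                    depth[child] = depth[node] + 1
                    queue.append(child)
        return depth[younger] - 1 if younger in depth else None   # None = indeterminate

    elements = set(children) | {c for kids in children.values() for c in kids}
    sizes = Counter(step_size(x, y) for x in elements for y in elements
                    if x != y and step_size(x, y) is not None)
    print(sizes)   # {0: 11, 1: 9, 2: 6}: the 26 determinate pairs used in each block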
B. RESULTS AND DISCUSSION
The mean number of trial blocks required to meet the criterion was 3.7 blocks for the hierarchy group and 4.4 for the irregular group. The difference was not significant, although it was in the direction of our original hypothesis that the irregular structure is more difficult to master than the hierarchy. Response times on error trials were replaced with the mean of response times for correct trials of the same type. The response times from the irregular condition parallel other reported results: There was no evidence of a distance effect like that found for linear orders. In fact, adjacent relations, with a mean response time of 2.80 sec, produced significantly faster responses than the nonadjacent relations, with a mean of 3.11 sec, F(1, 10) = 14.7, MSe = 2.16. However, response times to pairs from the hierarchical structure showed distance effects: Response time decreased as step size increased (see Table I). Analyses showed that response time to the adjacent pairs, with a mean of 3.51 sec, was slower than to the remote pairs, with a mean of 2.31 sec, F(1, 10) = 72.6, MSe = 8.25, and that 1-step pairs, with a mean of 2.91 sec, were slower than 2-step pairs, with a mean of 1.41 sec, F(1, 10) = 69.5, MSe = 7.74.
TABLE I
MEAN RESPONSE TIME (SEC) TO DETERMINATE PAIRS AS A FUNCTION OF STEP SIZE FOR THE HIERARCHY CONDITION

Older term in pair^a     Step size 0    Step size 1    Step size 2
J                           1.34           1.49           1.41
R                           3.40           3.52            -
D                           3.60           3.74            -
P                           4.50            -              -
G                           3.66            -              -
N                           4.06            -              -
Mean^b (with J):            3.51           2.91           1.41
Mean (without J):           3.99           3.63            -

^aThe actual elements used in the structure were one-syllable first names. For convenience, only the first letter of the name is used. Refer to Fig. 2 for the placement of each term in the structure.
^bThe last two rows present weighted means. Note that certain names have more than one relation of a certain type, so that the means given do not always equal the means of the corresponding column of table values.

A more detailed analysis of the data for the irregular condition failed to
reveal any obvious trends or patterns. However, the data from the hierarchy condition suggest that the obtained distance effect consisted of two components. First, any pair of names containing J, the element at the top of the hierarchy, led to faster responding than pairs not containing J. As can be seen in Table I, this was true for both adjacent pairs, F(1, 10) = 57.7, MSe = 13.09, and for pairs of step size 1, F(1, 10) = 40.0, MSe = 14.09. (All pairs of step size 2 involve J.) The fact that J is the correct response to any pair containing it appears to confer on it a special status. Subjects can store this specific information with the term in memory and use it in making a rapid, categorical decision in much the same way that Pliske and Smith's (1979) subjects used the gender of names in a linear order to make rapid decisions when all names of one gender preceded names of the other gender in the ordering. However, even when pairs containing J were removed from the analysis, there remained a distance effect of the kind typical for linear orderings. The last row of Table I presents the means for adjacent and remote (step size 1) pairs that did not include J. The difference was significant when tested with a comparison that was not orthogonal to those given earlier, F(1, 10) = 24.7, MSe = 1.43. A second component of the distance effect shown in Table I is a significant increase in response times to adjacent pairs from the second
level of the hierarchy (R and D) compared to the third level (P, G, and N), F(1, 10) = 7.69, MSe = 8.01. (This planned comparison is orthogonal to all but the last one presented above.) This pattern of results suggests a search process operating like the spread of activation (cf. Collins & Loftus, 1975) from the top of the hierarchy (J) and directed downward. If such a search process is assumed to terminate only when both members of a pair have been located (as would be necessary to classify a pair as indeterminate), then most of the results in Table I fall into place. For example, responses to pairs of step size 1 containing R and D (3.52 and 3.74 sec, respectively) took longer than the adjacent pairs containing these letters (3.40 and 3.60 sec, respectively). The apparent distance effect is the result of averaging response times for adjacent pairs at different levels of the hierarchy (i.e., pairs involving P, G, and N). The results in Table I can also be seen to display an effect similar to that of a propositional fan (Anderson, 1976). Response times were shorter to pairs containing R, with only one subordinate, than to D, with two. And response times to pairs containing G, N, and P, with one, two, and three subordinates, respectively, increased as expected. (The mean response time to indeterminate pairs, 4.35 sec, was significantly longer than the mean to determinate pairs, 2.82 sec, t(10) = 8.21, SE = .375; however, we could find no easily interpretable trends in the times for indeterminates.)

C. CONCLUSIONS AND IMPLICATIONS
The pattern of results from the hierarchical structure displayed a traditional distance effect (when analyzed in the usual way), whereas for the irregular structure the pattern was reversed. What conclusions should be drawn from this outcome? Following the logic of other studies of partial orderings published within the last 8 years, we might conclude that the hierarchy was represented and stored like a linear ordering, but the irregular structure was not. It is interesting to speculate on the direction research on partial ordering might have taken if Hayes-Roth and Hayes-Roth (1975) had chosen to investigate a hierarchical structure.

We contend that the search for a set of conditions that lead to a distance effect with partial orders has been misdirected. The presence of a distance effect can be the result of averaging the response time data for adjacents and remotes (or even for rank-ordered distances) in ways that obscure the effects of very different variables and processes. Our data for the hierarchical structure illustrate this kind of confusion quite clearly. One hypothesis that is consistent with all the data discussed so far is a two-process model such as Pliske and Smith (1979) have suggested to explain retrieval times for linear orderings. The general form of this
model contains two components, a rapid decision process based on specific categorical information about the elements of an ordering and a slower systematic serial search process that moves from element to element along the learned connections among them. For linear orderings, an example of categorical information is the gender of names used in an ordering. The parallel in the hierarchy would be responses to pairs of names containing the topmost element (J). An example of serial search processes in linear orderings is the proposal that subjects search the ordering from the ends inward (cf. also Woocher et a/., 1978). The parallel for a hierarchy is the spreading activation notion discussed above. The irregular structure may have been learned in the same way as the hierarchy (or a linear ordering, for that matter), and the retrieval processes may have been basically the same. The difference is that irregular structures do not have a small number of elements that can be uniquely categorized by their structural properties; and sequences of the serial search (or the pathways of spreading activation) may be more idiosyncratic from subject to subject, or even from trial to trial for the same subject. In effect, irregular structures such as Figs. 1B and 1C may be interpreted as situations in which only the underlying search processes are manifest in the data because the effects of other retrieval processes, especially rapid categorical decisions, have been randomized. However, if this interpretation is correct, investigations of partial orderings should have used many more irregular structures. It also follows from this interpretation that patterns of retrieval time reflect a great deal more than the incorporation of a set of relations into an integrated memory representation. Because our goal was understanding how people construct and represent partial orderings, we drew an important lesson from our first experiment. The time taken to retrieve comparative information from an ordering is influenced by many factors other than the process of understanding and representing the information as an integrated whole. A more appropriate methodology is needed to investigate how people combine relationships and construct integrated representations of partial orderings. There is no doubt that the process is more difficult for partial orderings than for linear orderings, but it can be done. The question is how.
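One way to make the two-process account concrete is the sketch below. It is our own illustration of the idea, using the same assumed hierarchy as in the earlier sketch, and it should not be read as the model with which the data were analyzed; the breadth-first search order is one arbitrary choice among several that the account would allow.

    # An illustrative rendering of the two-process account: a fast categorical
    # decision when a pair contains the top element, and otherwise a serial
    # search downward from the top that stops once both members are located.
    children = {"J": ["R", "D"], "R": ["P"], "D": ["G", "N"],
                "P": ["t1", "t2", "t3"], "G": ["t4"], "N": ["t5", "t6"]}

    def retrieve(pair, top="J"):
        a, b = pair
        if top in pair:
            # categorical decision: the top element is the answer to any pair containing it
            return "categorical decision (fast)"
        found, frontier, visited = set(), [top], 0
        while frontier and not {a, b} <= found:
            node = frontier.pop(0)     # breadth-first is one arbitrary choice of search order
            visited += 1
            found.add(node)
            frontier.extend(children.get(node, []))
        return "serial search, %d elements visited" % visited

    print(retrieve(("J", "t4")))   # resolved without search
    print(retrieve(("R", "t1")))   # shallower pairs are reached after fewer visits
    print(retrieve(("N", "t6")))   # deeper pairs require a longer search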
V. Experiment 2: The Role of Determinacy in Constructing Partial Orderings
In reviewing research on partial orderings, we noted that a very limited number of structures have been intensively investigated. Whatever conclusions have been reached cannot really be generalized to
"partial orderings," but must be confined to these few specific structures. The obvious remedy is a systematic exploration of the domain; however, the number of possible configurations of partial orderings is surprisingly large and diverse. A feeling for this diversity can be gotten from Fig. 3, which illustrates five possible partial orderings of 12 elements. The only configuration in Fig. 3 that has been investigated previously is Fig. 3B, the ordering devised by Moeser and Tarrant (1977). At present there is no way of guaranteeing that the structures shown are representative of partial orderings, even with the restriction that exactly 12 elements be ordered.

The partial orderings in Fig. 3 were chosen according to several principles derived from our intuitions about the possible sources of difficulty people have in understanding and remembering the relations in such structures. Figure 3B was included as a reference point, since it has been investigated not only by Moeser and Tarrant, who introduced it, but also by Warner and Griggs (1980) and by us (reported in Section IV). It has been clearly established that with appropriate background and procedures of presentation, college students can learn the relationships involved in this ordering and can draw the correct inferences about remote relationships (even though the speed of their performance may not correspond to that of subjects who are retrieving information from a linear ordering). The question raised here is whether people have more or less difficulty with the other structures. If so, we wanted to isolate the reasons for these difficulties.
C
B H
I
H
B
I\ L
D
J
\ I.. E
I
I F I
\ / F
I
F
I
G
G
G
D
E
47\7\/D ,7 A
H
l
J
K
G
I
L
D
/A\ E
/A\ F
G
H
A\ I
J
Fig. 3. The partial orderings investigated in Experiment 2.
The fundamental difference between partial and linear orderings is that the former leave the relationship between some pairs of elements indeterminate. One obvious way in which partial orderings can differ is in the extent of this indeterminacy. For example, 12 elements have 66 possible pairwise relationships. A linear ordering specifies 11 of these, which in turn determine all of the remaining 55 relationships. Thus, a linear ordering is a complete ordering; there are no indeterminate relations. By comparison, Moeser and Tarrant's partial ordering (Fig. 3B) leaves 25 of the 55 potentially determinable relations indeterminate. The remaining four structures in Fig. 3 were chosen for study in part on the basis of their level of indeterminacy. Figure 3A is in some ways a much simpler, more orderly configuration than Fig. 3B, which has received so much attention; yet Fig. 3A has the same degree of indeterminacy. Specifying 11 of the 66 possible relationships between pairs of elements leaves 25 of the remaining 55 relationships indeterminate. By contrast, Fig. 3C seems intuitively very similar to Fig. 3B, but in fact has more indeterminate relations: 32 out of 55, instead of 25. Figure 3E, which is a hierarchy similar to that investigated in Section IV, has 46 indeterminate relationships, a still higher level of indeterminacy. (The hierarchy in Fig. 2 has 45 indeterminate relations.) Figure 3D was constructed to have almost the same number of indeterminate relations (45) as Fig. 3E, but to look very different, at least superficially.

The five structures selected for study can be seen to possess levels of indeterminacy similar to either Moeser and Tarrant's irregular structure or our hierarchy, with the exception of Fig. 3C, which falls in between. In fact, Fig. 3C has a little more than half indeterminate relations (58%), although it "looks" a lot more like Fig. 3B than it does like Figs. 3A, 3D, or 3E. The only difference is that in Fig. 3B, H is greater than B, whereas in Fig. 3C, H is less than A. One further dimension distinguishes the five structures. Figures 3A, 3B, and 3C all have one long linear ordering of 7 elements, ABCDEFG, whereas in Figs. 3D and 3E, the longest chain is 3 elements long. Although the differences outlined above may not correspond to the relevant cognitive dimensions of such configurations, they seem to reflect the diversity that exists within the domain of 12-element structures.

In order to find out whether these structures differ in difficulty, we gave subjects a set of 11 statements describing adjacent relationships among the 12 elements and tested whether they could draw a diagram representing the partial ordering and answer questions about the nonadjacent relations implied by the ordering. We selected a paper-and-pencil version of the task in which all 11 specified relations were available for
inspection during testing and subjects could work at their own pace. Our purpose was to find out whether people could understand partial orderings of this size and complexity, independent of the demands made on memory to retain the 11 relationships.

The present experiment was designed to determine whether there are aspects of partial orderings for which people do not have a readily accessible schema. A more global source of difficulty may be the fact that comparisons of age per se, especially among several persons or objects, are most easily made in terms of numerical values rather than rankings. Thus, people may assume that a set of age comparisons forms a complete ordering. Another familiar relationship, "parent of," implies relative age but only within broad limits. A set of sentences such as "Mary is the mother of Ted" and "Sam is the father of Rita" might be expected to lead to better understanding of a partial ordering. Although the effect of changing the sentence frame might seem minimal, De Soto (1960) found that the same structure differed in difficulty depending on the sentence frame used. In the present experiment, the partial orderings were presented as either parent-of or older-than sentences, although the test sentences were age comparisons (older than) in all cases.
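Indeterminacy counts like those given above for the Fig. 3 structures can be checked mechanically. The sketch below is ours; because the exact composition of the Fig. 3 panels is not reproduced here, the partial ordering used as an example is hypothetical, and only the general counting logic (transitive closure over 11 adjacent relations among 12 elements) is intended to carry over.

    from itertools import combinations

    def indeterminate_pairs(elements, adjacent):
        """Count unordered pairs left indeterminate by a set of adjacent relations."""
        above = {x: set() for x in elements}
        for a, b in adjacent:
            above[a].add(b)
        changed = True                  # simple transitive closure by repeated expansion
        while changed:
            changed = False
            for x in elements:
                new = set().union(*(above[y] for y in above[x])) if above[x] else set()
                if not new <= above[x]:
                    above[x] |= new
                    changed = True
        return sum(1 for x, y in combinations(elements, 2)
                   if y not in above[x] and x not in above[y])

    elements = list("ABCDEFGHIJKL")     # 12 elements, hence 66 pairs in all

    # Eleven adjacent relations arranged as a chain determine every pair.
    chain = [(elements[i], elements[i + 1]) for i in range(11)]
    print(indeterminate_pairs(elements, chain))      # 0

    # The same eleven relations arranged as a 7-element chain with five extra
    # elements attached to A (a hypothetical structure, not a Fig. 3 panel).
    partial = chain[:6] + [("A", x) for x in "HIJKL"]
    print(indeterminate_pairs(elements, partial))    # 40 of the remaining 55 pairs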
A. METHOD
The subjects were 200 undergraduate students drawn from the same source and in the same manner as for the first experiment. Five groups of 40 students were assigned to work with each of the five partial orderings. Within each group, half of the subjects drew the diagram first, then answered questions; the other half answered questions first. The subjects in the diagram-first conditions, when given the instructions for answering questions, were told that they could refer back to their diagrams. The subjects were run in small groups of 2 to 8. Verbal instructions asked them to follow a set of printed instructions and to ignore whatever a neighbor might be doing, because each person had different materials. Each subject received a three-page booklet containing materials and instructions. For both tasks, 11 sentences appeared at the top of the page. For half the students, the sentences had the form "Beth is older than Dave"; for the other half, the sentences had the form "Beth is the mother of Dave." The 11 sentences were the set of adjacent relations necessary to describe one of the 12-element networks shown in Fig. 3. The six male and six female names used as elements were again chosen from Battig and Montague's (1969) norms. The order of the sentences on the page was randomly determined.

For the diagram-drawing task, the instructions asked the subject to draw a tree or diagram which would accurately represent the age
relationships among the people described by the sentences. They were told to use arrows to connect the names, with the head of the arrow pointing toward the younger person. Space for the drawing was provided at the bottom of the sheet.

For the question-answering task, 33 statements were listed below the 11 sentences defining the order, and the subject was told to read each statement and decide whether it was true, false, or indeterminate based on the information in the 11 sentences at the top. Three columns of blanks were printed next to the statements with the headings "True," "False," and "Can't Tell," allowing the subject to put a check in the appropriate column to indicate an opinion about each statement. The types of statements used with each of the five partial orderings were randomly chosen from among the 132 possible statements in proportion to the occurrence of each type. For example, for Fig. 3E there are 20 possible true statements, 20 possible false statements, and 92 indeterminate relations. Of the 33 statements tested, approximately 68% (23) were indeterminate, 15% (5) were true, and 15% (5) were false. The indeterminate statements were randomly selected from all possible indeterminates. The determinate statements were selected with the restriction that the proportions of adjacents and remotes were matched to their proportions of occurrence in the set of all possible determinates.
B. RESULTS AND DISCUSSION
TABLE II
NUMBER OF SUBJECTS WHOSE DIAGRAMS WERE COMPLETELY CORRECT IN EXPERIMENT 2

                              Structure
Condition*                A      B      C      D      E
Diagram first
  Older                   9      6      8      8      8
  Parent                 10      9      9^b    6      9
Questions first
  Older                   7      6      5      4      9^b
  Parent                  9      7      8      8      8

*A total of 10 subjects were tested in each condition.
^bDue to an error, two subjects, one in each of the indicated conditions, were not requested to draw a diagram.

The number of subjects in each condition who drew completely correct diagrams on the basis of 11 relationships is shown in Table II. In general, subjects tended to be more successful when the sentences described the
parent-child relationship than when the same elements were related in age, 83% vs 70%, respectively; χ²(1) = 5.28, p < .05. The differences among the five structures were not significant; χ²(4) = 7.75, p < .10. There was also no indication that answering a set of 33 questions about the implications of the 11 sentences had any impact on success in drawing a diagram, or vice versa. Of the 200 subjects, 8 drew linear orderings and 4 drew two separate orderings. These errors were not confined to any one condition, however.

The errors made in answering questions were compiled separately for determinate and indeterminate relations and converted to percentages, which are shown in Table III. These values were submitted to an analysis of variance in which type of question, determinate or indeterminate, was a repeated measure. The .01 level of significance was used as a criterion for discussing any comparison. The findings are presented here in order, beginning with the effect that accounted for the greatest proportion of total variance attributable to experimental manipulations.

Subjects made significantly more errors on indeterminate than on determinate relations, F(1, 180) = 107.1, MSe = 324.6, and this difference accounted for 37% of the variance. The effect was consistent across all conditions of the experiment. The five structures differed significantly in difficulty, F(4, 180) = 16.2, MSe = 306.8, accounting for another 21% of the variance. However, examination of Table III suggests that the effect of structure was different for indeterminate and determinate relations. The interaction was significant, F(4, 180) = 10.5, MSe = 324.6, and accounted for 15% of the variance.

TABLE III
PERCENTAGE OF ERRORS ON QUESTIONS ABOUT RELATIONSHIPS IN FIVE STRUCTURES TESTED IN EXPERIMENT 2^a

                            Figure 3 structure
Type of relationship      A      B      C      D      E      Overall
Determinate               7     17     12      5      8        10
Indeterminate            25     27     51     21     13        29
Overall                  16     22     31     16     11

^aPercentages based on different numbers of questions for different structures of Fig. 3.

The percentages in Table III make clear that our original hypothesis that difficulty is affected by the amount of indeterminacy was incorrect. Subjects made the greatest number of errors on Fig. 3C, in which the level of indeterminacy is intermediate between Figs. 3A and 3B (with
relatively few indeterminate relations) and Figs. 3D and 3E (with many). For Fig. 3C, questions about indeterminate relationships seemed to pose the greatest difficulty. The only subjects who failed to get any indeterminate questions correct were, with a single exception, in this condition (five subjects out of six). Figure 3D was more difficult than Fig. 3E because the indeterminate relations led to more errors, but Moeser and Tarrant's structure (Fig. 3B) was more difficult than Fig. 3A because the determinate relationships produced more errors. Long-chain structures (Figs. 3A, 3B, and 3C) did not emerge as strikingly more difficult than short-chain structures (Figs. 3D, 3E). Structures with certain elements connected to many other elements (especially Fig. 3E and, to a lesser extent, Fig. 3D) were not different from structures with little multiple connectedness (e.g., Fig. 3A). (The average number of elements connected to a given element, a measure of the fan effect (Anderson, 1976), is the same for all the Fig. 3 structures.)

The effect of the sentence frame on the question-answering data was consistent with the subjects' success in drawing correct diagrams. Fewer errors were made on questions about partial orderings based on "parent" sentences than on "older" sentences, F(1, 180) = 13.5, MSe = 306.8. This significant difference, which accounted for 4.4% of the total variance, was primarily due to the reduced numbers of errors to questions about indeterminate relations when "parent" sentences were used. This interaction was also significant, F(1, 180) = 9.4, MSe = 324.6, accounting for 3.3% of the variance.

Finally, the only effect of procedure was on determinate questions. Fewer errors were made on determinates when subjects had first drawn a diagram of the ordering (6%) than when they answered questions first (14%), whereas errors on indeterminates were unaffected (29 and 28% error rates for diagram-first and questions-first conditions, respectively). This interaction accounted for only 2.5% of the variance but was significant, F(1, 180) = 7.3, MSe = 324.6. One interpretation of this result is that a diagram does not aid the subjects' understanding of indeterminate relations, although it helps in making inferences about remote relationships. However, subjects received no feedback on the diagrams they made. Thus, their errors on questions after drawing a diagram may have been a reflection of their initial misunderstandings in composing a diagram.

C. CONCLUSIONS
The results of this experiment show that partial orderings can differ considerably in difficulty. By difficulty we refer to the problems people have in understanding the implications of a set of relationships that does
not assign each element a unique rank. The results for both diagram drawing and question answering support the contentions of earlier investigators that the necessary schema for representing partial orderings is not very salient. Our subjects generally did better on both diagrams and questions when the sentences described parent-child relations than when they simply stated that one individual was older than another. We assume that parent-child relationships awaken a schema of age ordering that admits indeterminacies. That subjects in the parent-child condition did better on questions about the relative age of members of a family and that the principal gain in performance was in answering indeterminate questions are especially compelling evidence for the importance of the schema. These results also indicate that our subjects, who were college students, did not lack the appropriate schema altogether. Rather, they seemed to be much less likely to use it with pure age relations. Instead of a predilection for linear orderings in all situations, people appear to gravitate toward the linear ordering in situations where both (a) complete ranking is possible and (b) interval measurement makes sense.

Although our results indicate that much of the difficulty with partial orderings has to do with indeterminate relations, no conclusion about the importance of various structural variables seems possible. In particular, the amount of indeterminacy (the proportion of indeterminate relationships among the unspecified ones) in a partial ordering does not affect its difficulty in any simple, monotonic way. (The only hypothesis consistent with our data is that difficulty increases as the ratio of indeterminate to determinate relations approaches unity. However, even this idea is limited; structures with roughly the same ratios of indeterminacy showed differences in accuracy of question answering.) One important lesson about structural variables may be gleaned from our results. There is no reason to suppose that a haphazardly selected configuration of partially ordered relationships is representative of such structures. It is important to stop generalizing about "partial orderings" and begin a more systematic search for the processes people use in integrating and understanding this kind of information.
VI. Experiment 3: Node Construction
Presentation order is another factor that has been suggested as a possible explanation for the difficulty people have in correctly integrating the components of a partial ordering. Moeser (1979), for example, compared three groups of subjects (Experiment 3), one which learned a partial ordering and two which learned a linear ordering. One of the latter groups
was exposed to the sentences describing relationships (specifically, "Todd is older than Herb") using what Moeser labeled a "match" order, that is, A > B, B > C, C > D, . . . , M > N. The other linear order group received the sentences in a "nonmatch" order so that the ninth sentence introduced two names not previously mentioned (a nonmatch situation). Three more sentences were given before a sentence referred again to a name in the seventh sentence, and not until the twelfth sentence was there sufficient information to complete a 13-element ordering. All subjects received instructions on how to represent partial and linear orderings with diagrams and how to answer questions about them correctly. Moeser found that subjects in the linear match condition did better than those in the other two conditions, which did not differ. On the basis of this finding, she concluded that some of the difficulty in integrating and understanding a set of partially ordered relationships is due to the fact that partial orderings must always be presented in nonmatch orders.

In spite of its plausibility, Moeser's argument does not really explain either her results or the difficulty of partial orderings. The terms "match" and "nonmatch" are taken from a theory of linear order construction proposed by Foos et al. (1976). Detailed application of this theory to the presentation orders Moeser used reveals that they are not of comparable difficulty. In the Foos et al. theory, match orders require the subject to add new elements to the ordering by one of two processes, both of which involve locating a single element in common between the ordering previously constructed and the new relationship. (This common element is the "match.") The M1 process detects a match at the end of a previously constructed ordering. For example, given B > C > D, the relation D > E is added by the M1 process. The M2 process detects a match at the beginning of a previously constructed ordering, so that given B > C > D, the relation A > B is added by the M2 process, which has been shown to be slightly more difficult than the M1 process. A nonmatch order in the Foos et al. theory requires at least one nonmatch situation, for example, when A > B > C > D is established and E > F is to be added eventually (but must be held temporarily separate in memory). There is no reason why a partial ordering has to be presented in a nonmatch order. For example, in Moeser's 16-element ordering, once the central linear ordering has been presented, the remaining comparisons may be added by processes comparable to M1 and M2. For example, given the previously constructed chain B > C > D > E, branches can be added by relations such as C > Q (parallel to M1) and X > D (parallel to M2). While Foos et al. did not consider such node-construction processes, the process of locating a common element (the "match") would be the same. Whether a node is more difficult to
Whether a node is more difficult to construct than a line is an empirical question, which the present experiment considered.

Moeser did, in fact, introduce a nonmatch situation in presenting the partial ordering. However, the nonmatch was not introduced at precisely the same point in the presentation order for partial and linear orderings. For the partial ordering, the first true nonmatch occurred on the eleventh comparison and was immediately followed by a resolution (or “double-match,” D1, process). A study by Foos and Sabol (1981) reported that as the number of comparisons between the nonmatch and its resolution increases, performance on nonmatch orders declines. Whereas the nonmatch was immediately resolved in the partial ordering (no intervening comparisons), three comparisons intervened in Moeser’s linear nonmatch condition. The Foos and Sabol data were based on shorter, simpler orderings (strings of six letters to be constructed from five letter pairs), so the results may not be strictly comparable. However, the point is that Moeser’s three conditions confound a number of variables known to affect the difficulty of constructing linear orderings. The linear match condition will obviously be the easiest, not only because a match order was used but also because only the M1 process was required. The linear nonmatch condition has a substantial delay following the nonmatch, making it more difficult than the partial ordering; but the latter contains match processes that result in nodes, the difficulty of which is unknown. The difference between M1 and M2 processes suggests that other match processes are unlikely to be as easy as M1.

A good deal of the confusion about the difficulty of different presentation orders is probably due to the fact that subjects cannot master a 14-element ordering of any kind in one exposure. Beginning with the second repetition, the classifications of Foos et al. do not apply, strictly speaking. The subjects have heard all the elements once, so no comparison can pose a nonmatch of the kind envisioned by the theory. Moreover, a question such as the relative difficulty of adding a node is best posed in designs similar to those used in unraveling the effects of presentation order in constructing linear orderings. Therefore, in the remaining experiments, we limited our attention to orderings with only a few elements and explored a wide range of presentation orders. We have not been exhaustive, however, as were Foos et al. The reason is that partial orderings create a much more extensive range of possibilities, even with orderings of only five elements.

In the first of these experiments, we compared the difficulty of constructing either a linear order or one of the three partial orders shown in Fig. 4.
Fig. 4. The three partial orderings constructed by subjects in Experiment 3. They differ in the location of the node or branch relative to the four-term linear ordering ABCD.
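The three diagrams themselves do not survive reproduction here. The short sketch below reconstructs them from the presentation orders listed in Table IV (orders 5-10 end in AE, BE, or CE), on the assumption that each network is the chain A > B > C > D with the fifth element E attached beneath A, B, or C; this reading is our inference from the table, not a verbatim copy of the figure.

    # Hypothetical reconstruction of the three Fig. 4 networks (inferred from
    # Table IV, orders 5-10): the chain A > B > C > D plus one branch to E.
    chain = [("A", "B"), ("B", "C"), ("C", "D")]
    networks = {
        "branch at A": chain + [("A", "E")],   # orders ending in AE
        "branch at B": chain + [("B", "E")],   # orders ending in BE
        "branch at C": chain + [("C", "E")],   # orders ending in CE
    }
    for name, relations in networks.items():
        print(name, relations)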
The presentation orders are given in Table IV along with an analysis of the construction processes specified by the Foos et al. theory. All presentation orders consist of three sentences that can be combined to form a four-term linear ordering. Half the presentation orders in Table IV accomplished this with process M1 (orders 1, 2, 5, 7, and 9) and the other half with process M2 (orders 3, 4, 6, 8, and 10). The last sentence in each presentation order either completed a five-term linear ordering or resulted in one of the networks. Completing a linear ordering required either process M1 (orders 1 and 3) or process M2 (orders 2 and 4). In the
TABLE IV
MEAN PROPORTION OF CORRECT TRIALS AS A FUNCTION OF PRESENTATION ORDER IN EXPERIMENT 3

                              Constructive process^b involved
                         ---------------------------------------------
                         After second   After third   After fourth   Mean proportion
 Presentation order^a    sentence       sentence      sentence       correct

 Linear orderings
  1. AB,BC,CD,DE         M1             M1            M1             .62
  2. BC,CD,DE,AB         M1             M1            M2             .43
  3. CD,BC,AB,DE         M2             M2            M1             .35
  4. DE,CD,BC,AB         M2             M2            M2             .42

 Networks
  5. AB,BC,CD,AE         M1             M1            Nd             .42
  6. CD,BC,AB,AE         M2             M2            Nd             .45
  7. AB,BC,CD,BE         M1             M1            Nd             .33
  8. CD,BC,AB,BE         M2             M2            Nd             .29
  9. AB,BC,CD,CE         M1             M1            Nd             .51
 10. CD,BC,AB,CE         M2             M2            Nd             .28