THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory VOLUME 20
EDITED BY GORDON H. BOWER STANFORD UNIVERSITY, STANFORD, CALIFORNIA
Volume 20 1986
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers
Orlando San Diego New York Austin Boston London Sydney Tokyo Toronto
COPYRIGHT © 1986 BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC. Orlando, Florida 32887
United Kingdom Edition published by
ACADEMIC PRESS INC. (LONDON) LTD.
24-28 Oval Road, London NW1 7DX
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 66-30104
ISBN 0-12-543320-4 (alk. paper)
PRINTED IN THE UNITED STATES OF AMERICA
86 87 88 89 9 8 7 6 5 4 3 2 1
CONTENTS
RECOGNITION BY COMPONENTS: A THEORY OF VISUAL PATTERN RECOGNITION
Irving Biederman
I. Introduction
II. An Analogy between Speech and Object Perception
III. Theoretical Domain: Primal Access to Contour-Based Perceptual Categories
IV. Basic Phenomena of Object Recognition
V. Recognition by Components: An Overview
VI. Nonaccidentalness: A Perceptual Basis for a Componential Representation
VII. A Set of 36 Components Generated from Differences in Nonaccidental [...] among Generalized Cones
VIII. Relation of RBC to Principles of Perceptual Organization
IX. A Limited Number of Components?
X. [...]tion
XI. Componential Recovery Principle
XII. Conclusion
References
ASSOCIATIVE STRUCTURES IN INSTRUMENTAL LEARNING
Ruth M. Colwill and Robert A. Rescorla
I. Introduction
II. Evidence for Response-Reinforcer Associations
III. Separation of R-Reinforcer from S-Reinforcer Learning
IV. The Role of the Stimulus in Instrumental Behavior
V. Conclusion
References
THE STRUCTURE OF SUBJECTIVE TIME: HOW TIME FLIES
John Gibbon
I. Introduction
II. The Temporal Middle
III. Experiment 1: Baseline Time Left
IV. Time-Left Mixture: The Harmonic Mean
V. Experiment 2: Arithmetic and Harmonic Mean Standards
VI. Experiment 3: Harmonic Mean Asymptote
VII. Concluding Remarks
Appendix: Double Standard
References
THE COMPUTATION OF CONTINGENCY IN CLASSICAL CONDITIONING
Richard H. Granger, Jr. and Jeffrey C. Schlimmer
I. Introduction: Theory and Experiment in Classical Conditioning
II. A Three-Level Analysis of Classical Conditioning
III. Background: Historical Perspective on Contingency
IV. Detail: The Contingency Computation, Algorithm, and Implementation
V. Breadth of the Theory: Blocking, Latency, Tracking, Learned Irrelevance
VI. Summary: Limitations and Contributions of the Theory
Appendix A: Derivation of Contingency Surface
Appendix B: Comparative Analysis of Performance of Contingency A[...]
References
BASEBALL: AN EXAMPLE OF KNOWLEDGE-DIRECTED MACHINE LEARNING
Elliot Soloway
I. Introduction: Motivation and Goals
II. Representing the Game of [...]
III. Interpretation Process
IV. Generalization Process
V. Evaluation Process
VI. Experiments
VII. Concluding Remarks
References
MENTAL CUES AND VERBAL REPORTS IN LEARNING
Francis S. Bellezza
I. Introduction
II. Mental Cues and the Computer Metaphor
IV. Properties of Mental Cues Important in Learning
V. Mental Cues Formed under Different Task Sets
References
MEMORY MECHANISMS IN TEXT COMPREHENSION
Murray Glanzer and Suzanne Donnenwerth Nolan
I. Introduction: Restrictions
II. Background: Preceding Work
III. Text Comprehension Studies
IV. Theoretical Analysis of Thematic Information Carryover
V. General Theoretical Statement
[...] Text: Abstraction Paradigm
Index
RECOGNITION BY COMPONENTS: A THEORY OF VISUAL PATTERN RECOGNITION
Irving Biederman
DEPARTMENT OF PSYCHOLOGY, STATE UNIVERSITY OF NEW YORK AT BUFFALO, BUFFALO, NEW YORK 14260
I. Introduction
This article describes recent research and theory on the human’s ability to recognize visual entities. The fundamental problem of object recognition is that any single object can project an infinity of image configurations to the retina. The orientation of the object to the viewer can vary continuously, each giving rise to a different two-dimensional projection. The object can be occluded by other objects or texture fields, as when viewed behind foliage. The object need not be presented as a full-colored, textured image, but instead can be a simplified line drawing. Moreover, the object can even be missing some of its parts or be a novel exemplar of its particular category. But it is only with rare exceptions that an image fails to be rapidly and readily classified, either as an instance of a familiar object category or as an instance that cannot be so classified (itself a form of classification).
A Do-It-Yourself Example
Consider the object shown in Fig. 1. We readily recognize it as one of those objects that cannot be classified into a familiar category. Despite its overall unfamiliarity, there is near unanimity in its descriptions. We parse (or segment) its parts at regions of deep concavity and describe those parts with common, simple volumetric terms, such as “a block,” “a cylinder,” “a funnel or truncated cone.” We can look at the zigzag horizontal brace as a texture region or zoom in and interpret it as a series of connected blocks. The same is true of the mass at the lower left: we can see it as a texture area or zoom in and parse it into its various bumps. Although we know that it is not a familiar object, after a while we can say what it resembles: a New York City hot dog cart, with the large block being the central food storage and cooking area, the rounded part underneath as a wheel, the large arc on the right as a handle, the funnel as an orange juice squeezer, and the various vertical pipes as vents or umbrella supports. It is not a good cart, but we can see how it might be related to one. It is like a 10-letter word with 4 wrong letters. We readily conduct the same process for any object, familiar or unfamiliar, in our foveal field of view. The manner of segmentation and analysis into components does not appear to depend on our familiarity with the particular object being identified. The naive realism that emerges in descriptions of nonsense objects may be reflecting the workings of a representational system by which objects are identified.
Fig. 1. A do-it-yourself object. There is a strong consensus in the segmentation loci of this configuration and in the description of its parts.
II. An Analogy between Speech and Object Perception
As will be argued in a later section, the number of categories into which we can classify objects rivals the number of words that can be readily identified when listening to speech. Lexical access during speech perception can be successfully modeled as a process mediated by the identification of individual primitive elements, the phonemes, from a relatively small set of primitives (Marslen-Wilson, 1980). We only need about 38 phonemes to code all the words in English, 15 in Hawaiian, and 55 to represent virtually all the words in all the languages spoken on earth. Because the set of primitives is so small and each phoneme is specifiable by dichotomous (or trichotomous) contrasts (e.g., voiced vs unvoiced, nasal vs oral) on a handful of attributes, one need not make particularly fine discriminations in the speech stream. The representational power of the system derives from its permissiveness in allowing relatively free combinations of its primitives.
The hypothesis explored here is that a roughly analogous system may account for our capacities for object recognition. In the visual domain, however, the primitive elements would not be phonemes, but a modest number of simple volumes such as cylinders, blocks, wedges, and cones. Objects are segmented, typically at regions of sharp concavity, and the resultant parts matched against the best-fitting primitive. The set of primitives derives from combinations of contrastive characteristics of the edges in a two-dimensional image (e.g., straight vs curved, symmetrical vs asymmetrical) that define differences among a set of simple volumes (viz., those that tend to be symmetrical and lack sharp concavities). The particular properties of edges that are postulated to be relevant to the generation of the volumetric primitives have the desirable properties that they are invariant over changes in orientation and can be determined from just a few points on each edge. Consequently, they allow a primitive to be extracted with great tolerance for variations of viewpoint and noise.
Just as the relations among the phonemes are critical in lexical access (“fur” and “rough” have the same phonemes, but are not the same words), the relations among the volumes are critical for object recognition: Two different arrangements of the same components could produce different objects. In both cases, the representational power derives from the enormous number of combinations that can arise from a modest number of primitives. The relations in speech are limited to left-to-right (sequential) orderings; in the visual domain a richer set of possible relations allows a far greater representational capacity from a comparable number of primitives.
The matching of objects in recognition is hypothesized to be a process in which the perceptual input is matched against a representation that can be described by a few simple volumes in specified relations to each other.
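The idea that a representation consists of a few simple volumes in specified relations can be sketched as a small data structure. The following is only an illustrative sketch, not a claim about how the theory is implemented: the class names (`Relation`, `StructuralDescription`), the helper `describe`, and the relation labels ("on-top-of", "beside") are hypothetical, chosen to show that two descriptions with identical components but different relations are distinct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One relation between two named components (labels are hypothetical)."""
    part_a: str
    kind: str      # e.g. "on-top-of" or "beside"; illustrative labels only
    part_b: str


@dataclass(frozen=True)
class StructuralDescription:
    """Components plus the relations that hold among them."""
    components: frozenset
    relations: frozenset


def describe(components, relations):
    # Freeze both collections so that descriptions can be compared and hashed.
    return StructuralDescription(frozenset(components), frozenset(relations))


# Two hypothetical objects built from the same two volumes.
object_1 = describe({"cylinder", "cone"}, {Relation("cone", "on-top-of", "cylinder")})
object_2 = describe({"cylinder", "cone"}, {Relation("cone", "beside", "cylinder")})

same_parts = object_1.components == object_2.components
same_object = object_1 == object_2
print(same_parts, same_object)  # True False
```

Comparing whole descriptions rather than component sets is the design point: the components alone do not distinguish the two objects, just as a set of phonemes alone does not distinguish two words.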
III. Theoretical Domain: Primal Access to Contour-Based Perceptual Categories
Our theoretical goal is to account for the initial categorization of isolated objects. Often, but not always, this categorization will be at a basic level, for example, when we know that a given object is a typewriter, banana, or giraffe (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976). Much of our knowledge about objects is organized at this level of categorization: the level at which there is typically some readily available name to describe that category (Rosch et al., 1976). The hypothesis explored here predicts that in certain cases subordinate categorizations can be made initially, so that we might know that a given object is a floor lamp, sports car, or dachshund, more rapidly than we know that it is a lamp, car, or dog (e.g., Jolicoeur, Gluck, & Kosslyn, 1984).
THE ROLE OF SURFACE CHARACTERISTICS
There is a restriction on the scope of this approach of volumetric modeling that should be noted. The modeling has been limited to concrete entities of the kind typically designated by English count nouns. These are concrete objects that have specified boundaries and to which we can apply the indefinite article and number. For example, for a count noun such as chair we can say “a chair” or “three chairs.” By contrast, mass nouns are concrete entities to which the indefinite article or number cannot be applied, such as water, sand, or snow. So we cannot say “a water” or “three waters” unless we refer to a count noun shape as in “a drop of water,” “a bucket of water,” or “a grain of sand,” each of which does have a simple volumetric description. We conjecture that mass nouns are identified primarily through surface characteristics such as texture and color rather than through volumetric primitives. Under restricted viewing conditions, as when an object is partially occluded, texture, color, and other cues (such as position in the scene and labels) may contribute to the identification of count nouns, as, for example, when we identify a particular shirt in the laundry pile from just a bit of fabric. Such identifications are indirect, typically the result of inference over a limited set of possible objects. The goal of the present effort is to account for what can be called primal access: the first contact of a perceptual input from an isolated, unanticipated object to a representation in memory.
IV. Basic Phenomena of Object Recognition
Independent of laboratory research, the phenomena of everyday object identification provide strong constraints on possible models of recognition. In addition to the fundamental phenomenon that objects can be recognized at all (not an altogether obvious conclusion), at least five facts are evident. Typically, an object can be recognized (1) rapidly, (2) when viewed from novel orientations, (3) under moderate levels of visual noise, (4) when partially occluded, and (5) when it is a new exemplar of a category.
Implications
The preceding five phenomena constrain theorizing about object interpretation in the following ways.
1. Access to the mental representation of an object should not be dependent on absolute judgments of quantitative detail because such judgments are slow and error prone (Miller, 1956; Garner, 1966). For example, distinguishing among just several levels of the degree of curvature or length of an object typically requires more time than that required for the identification of the object itself. Consequently, such quantitative processing cannot be the controlling factor by which recognition is achieved.
2. The information that is the basis of recognition should be relatively invariant with respect to orientation and modest degradation.
3. Partial matches should be computable. A theory of object interpretation should have some principled means for computing a match for occluded, partial, or new exemplars of a given category. We should be able to account for the human’s ability to identify, for example, a chair when it is partially occluded by other furniture, or when it is missing a leg, or when it is a new model.
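One way a partial match of the kind required by the third constraint could be scored is with a set-based measure in the spirit of Tversky's (1977) contrast model, which the chapter cites later as a useful framework. The sketch below is an assumption-laden illustration rather than the chapter's procedure: the function names, the component labels, and the weights `theta`, `alpha`, and `beta` are hypothetical, each component counts equally, and the relations among components are ignored.

```python
def contrast_similarity(image_parts, stored_parts, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast-style score: shared parts count for, distinctive parts count against.

    theta, alpha, and beta are hypothetical weights chosen for illustration.
    """
    image_parts, stored_parts = set(image_parts), set(stored_parts)
    shared = len(image_parts & stored_parts)        # components in both
    only_image = len(image_parts - stored_parts)    # novel components in the image
    only_stored = len(stored_parts - image_parts)   # components missing from the image
    return theta * shared - alpha * only_image - beta * only_stored


def best_match(image_parts, memory):
    """Return the name of the stored representation with the highest score."""
    return max(memory, key=lambda name: contrast_similarity(image_parts, memory[name]))


# Hypothetical stored representations, each a set of labeled components.
MEMORY = {
    "chair": {"seat_slab", "back_slab", "leg_1", "leg_2", "leg_3", "leg_4"},
    "table": {"top_slab", "leg_1", "leg_2", "leg_3", "leg_4"},
}

# An image of a chair that is missing one leg.
three_legged_chair = {"seat_slab", "back_slab", "leg_1", "leg_2", "leg_3"}

scores = {name: contrast_similarity(three_legged_chair, parts)
          for name, parts in MEMORY.items()}
print(scores)                                  # {'chair': 4.5, 'table': 1.0}
print(best_match(three_legged_chair, MEMORY))  # chair
```

Under these assumed weights the incomplete chair still matches "chair" best, because the missing leg costs less than the seat and back that a table lacks.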
V. Recognition by Components: An Overview
Our hypothesis, recognition by components (RBC), bears some relation to several prior conjectures for representing objects by parts or modules (e.g., Binford, 1971; Guzman, 1971; Marr, 1977; Marr & Nishihara, 1978; Tversky & Hemenway, 1984). RBC’s contribution lies in its proposal for a particular vocabulary of components derived from perceptual mechanisms and its account of how an arrangement of these components can access a representation of an object in memory.
When an image of an object is painted across the retina, RBC assumes that a representation of the image is segmented (or parsed) into separate regions at points of deep concavity, particularly at cusps where there are discontinuities in curvature (Hoffman & Richards, 1985). In general, concavities will arise whenever convex volumes are joined, a principle that Hoffman and Richards (1985) call transversality. Such segmentation conforms well with human intuitions about the boundaries of object parts and does not depend on familiarity with the object, as was demonstrated with the nonsense object in Fig. 1. The resultant parsed regions are then approximated by simple volumetric components that can be modeled by generalized cones (Binford, 1971; Marr, 1977, 1982). A generalized cone is the volume swept out by a cross section moving along an axis (as illustrated later in Fig. 5). [Marr (1977, 1982) showed that the contours generated by any smooth surface could be modeled by a generalized cone with a convex cross section.] The cross section is typically hypothesized to be at right angles to the axis. Secondary segmentation criteria (and criteria for determining the axis of a component) are those that afford descriptions of volumes that maximize symmetry, length, and constancy of the size and curvature of the cross section of the component. Of these, symmetry often provides the most compelling subjective basis for selecting subparts (Brady & Asada, 1984; Connell, 1985). These secondary bases for segmentation and component identification are discussed below.
The primitive components are hypothesized to be simple, typically symmetrical volumes lacking sharp concavities, such as blocks, cylinders, spheres, and wedges. The fundamental perceptual assumption of RBC is that the components can be differentiated on the basis of perceptual properties in the two-dimensional image that are readily detectable and relatively independent of viewing position and degradation. These perceptual properties include several that have traditionally been thought of as principles of perceptual organization, such as good continuation, symmetry, and Prägnanz. RBC thus provides a principled account of the relation between the classic phenomena of perceptual organization and pattern recognition: Although objects can be highly complex and irregular, the units by which objects are identified are simple and regular. The constraints toward regularization (Prägnanz) are thus assumed to characterize not the complete object, but the object’s components.
By the preceding account, surface characteristics such as color and texture will typically have only secondary roles in primal access. This should not be interpreted as suggesting that the perception of surface characteristics per se is delayed relative to the perception of the components, but merely that in most cases the surface characteristics are less efficient routes for accessing the classification of a count object; that is, we may know that a chair has a particular color and texture simultaneously with its volumetric description, but it is only the volumetric description that provides efficient access to the mental representation of “chair.”¹
Relations among the Components
Although the components themselves are the focus of this article, as noted previously, the arrangement of primitives is necessary for representing a particular object. Thus, an arc side-connected to a cylinder can yield a cup, as shown in Fig. 2. Different arrangements of the same components can readily lead to different objects, as when an arc is connected to the top of the cylinder to produce a pail in Fig. 2. Whether a component is attached to a long or short surface can also affect classification, as with the arc producing either an attaché case or a strongbox in Fig. 2. The identical situation between primitives and their arrangement exists in the phonemic representation of words, where a given subset of phonemes can be rearranged to produce different words. The representation of an object would thus be a structural description that expressed the relations among the components (Winston, 1975; Brooks, 1981; Ballard & Brown, 1982). A suggested (minimal) set of relations is described in Table I and would include specification of the relative sizes of the components and their points of attachment.
Fig. 2. Different arrangements of the same components can produce different objects.
¹ There are, however, objects that would seem to require both a volumetric description and a texture region for an adequate representation, such as hairbrushes, typewriter keyboards, and corkscrews. It is unlikely that many of the individual bristles, keys, or coils are parsed and identified prior to the identification of the object. Instead, those regions are represented through the statistical processing that characterizes their texture (e.g., Beck, Prazdny, & Rosenfeld, 1983; Julesz, 1981), although we retain a capacity to zoom down and attend to the volumetric nature of the individual elements. The structural description that would serve as a representation of such objects would include a statistical specification of the texture field along with a specification of the larger volumetric components. These compound texture-componential objects have not been studied, but it is possible that the characteristics of their identification would differ from objects that are readily defined solely by their arrangement of volumetric components.
STAGES OF PROCESSING
Figure 3 presents a schematic of the presumed subprocesses by which an object is recognized. An early edge extraction stage provides a line drawing description of the object. From this description, nonaccidental properties of the image, described below, are detected. Parsing is performed at concave regions simultaneously with a detection of nonaccidental properties. The nonaccidental properties of the parsed regions provide critical constraints on the identity of the components. Within the temporal and contextual constraints of primal access, the stages up to and including the identification of components are assumed to be bottom up. A delay in the determination of an object's components should have a direct effect on the identification latency of the object. The arrangement of the components is then matched against a representation in memory. It is assumed that the matching of the components occurs in parallel, with unlimited capacity. Partial matches are possible, with the degree of match assumed to be proportional to the similarity in the components between the image and the representation.² This stage model is presented to provide an overall theoretical context. The focus of this article is on the nature of the units of the representation.
Fig. 3. Presumed processing stages in object recognition. [Flow diagram: Edge Extraction; Parsing at Regions of Concavity, together with Detection of Nonaccidental Properties; Determination of Components; Matching of Components; Object Identification.]
² Modeling the matching of an object image to a mental representation is a rich, relatively neglected problem area. Tversky's (1977) contrast model provides a useful framework with which to consider this similarity problem in that it readily allows distinctive features (i.e., components) of the image to be considered separately from the distinctive components of the representation. This allows principled assessments of similarity for partial objects (components in the representation, but not in the image) and novel objects (containing components in the image that are not in the representation). It may be possible to construct a dynamic model based on a parallel distributed process as a modification of the kind proposed by McClelland and Rumelhart (1981) for word perception, with components playing the role of letters. One difficulty facing such an effort is that the neighbors for a given word are well specified and readily available from a dictionary; the set of neighbors for a given object is not.
VI. Nonaccidentalness: A Perceptual Basis for a Componential Representation
Recent theoretical analyses of perceptual organization (Binford, 1981; Lowe, 1984; Witkin & Tenenbaum, 1983) suggest a perceptual basis for RBC. The central organizational principle is that certain properties of the two-dimensional image are taken by the visual system as strong evidence that the three-dimensional object contains those same properties. For example, if there is a straight line in the image, the visual system infers that the edge producing that line in the three-dimensional world is also straight. Images that are symmetrical only under reflection are interpreted as arising from objects with that property. The visual system ignores the possibility that the property in the image is merely a result of a (highly unlikely) accidental alignment of eye and a curved edge.
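The claim that a straight image line is only rarely an accident of viewpoint can be illustrated numerically. The following sketch is not part of the theory: it assumes orthographic projection, uniformly random viewing directions, and an arbitrary straightness tolerance (`tol`), and every function name is hypothetical. It compares how often a straight three-dimensional edge and a curved one (a quarter circle of unit radius) project to an image that passes as straight.

```python
import math
import random


def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_view(rng):
    """Uniformly random viewing direction (normalised Gaussian triple)."""
    while True:
        v = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        if dot(v, v) > 1e-12:
            return unit(v)


def project(points, view):
    """Orthographic projection onto the image plane perpendicular to `view`."""
    helper = (1.0, 0.0, 0.0) if abs(view[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = unit(cross(view, helper))
    v = cross(view, u)
    return [(dot(p, u), dot(p, v)) for p in points]


def max_deviation(points_2d):
    """Largest distance of any image point from the chord joining the endpoints."""
    (ax, ay), (bx, by) = points_2d[0], points_2d[-1]
    chord = math.hypot(bx - ax, by - ay)
    if chord < 1e-9:  # edge seen end-on: measure spread around the first point
        return max(math.hypot(x - ax, y - ay) for x, y in points_2d)
    return max(abs((bx - ax) * (y - ay) - (by - ay) * (x - ax)) / chord
               for x, y in points_2d)


def fraction_straight(points, trials=2000, tol=0.01, seed=0):
    """Fraction of random viewpoints from which the edge looks straight."""
    rng = random.Random(seed)
    hits = sum(max_deviation(project(points, random_view(rng))) < tol
               for _ in range(trials))
    return hits / trials


steps = [i / 20 for i in range(21)]
line_3d = [(t, 2 * t, -t) for t in steps]
arc_3d = [(math.cos(t * math.pi / 2), math.sin(t * math.pi / 2), 0.0) for t in steps]

line_fraction = fraction_straight(line_3d)
arc_fraction = fraction_straight(arc_3d)
print(line_fraction, arc_fraction)
```

Under these assumptions the straight edge looks straight from every sampled viewpoint, whereas the arc does so only from the small fraction of viewpoints lying close to its own plane, which is the sense in which a straight image line is strong evidence for a straight edge.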
If the image is symmetrical, we assume that the object projecting that image is also symmetrical. The order of symmetry is also preserved: Images that are symmetrical under both rotation and reflection, such as a square or circle, are interpreted as arising from objects (or surfaces) that are symmetrical under both rotation and reflection. Although skew symmetry is often readily perceived as arising from a tilted symmetrical object or surface, there are cases where skew symmetry is not readily detected (Attneave, 1983). Parallelism and cotermination constitute the remaining nonaccidental relations. All five of these two-dimensional nonaccidental properties and the associated three-dimensional inferences are described in Fig. 4 (modified from Lowe, 1984). Witkin and Tenenbaum (1983; see also Lowe, 1984) argue that the leverage provided by these nonaccidental relations for inferring a three-dimensional structure from a two-dimensional image is so powerful that they pose a challenge to the effort in computer vision and perceptual psychology that assigned central importance to variation in local surface characteristics, such as luminance. The psychological literature provides considerable evidence supporting the assumption that these nonaccidental properties can serve as primary organizational constraints in human image interpretation.
Fig. 4. Five nonaccidental relations (adapted from Lowe, 1985). Principle of nonaccidentalness: critical information is unlikely to be a consequence of an accident of viewpoint. Three-space inference from image features (2-D relation: 3-D inference):
1. Collinearity of points or lines: collinearity in 3-space.
2. Curvilinearity of points of arcs: curvilinearity in 3-space.
3. Symmetry (skew symmetry?): symmetry in 3-space.
4. Parallel curves (over small visual angles): curves are parallel in 3-space.
5. Vertices, two or more terminations at a common point (examples: "fork," "arrow"): curves terminate at a common point in 3-space.
PSYCHOLOGICAL EVIDENCE FOR THE RAPID USE OF NONACCIDENTAL RELATIONS
There can be little doubt that images are interpreted in a manner consistent with the nonaccidental principles. But are these relations used quickly enough so as to provide a perceptual basis for the components that allow primal access? Although all the principles have not received experimental verification, the available evidence does suggest that the answer to the preceding question is “yes.” There is strong evidence that the visual system quickly assumes and uses collinearity, curvature, symmetry, and cotermination. This evidence is of two sorts: (1) demonstrations, often compelling, showing that when a given two-dimensional relation is produced by an accidental alignment of object and image, the visual system accepts the relation as existing in the three-dimensional world; and (2) search tasks showing that when a target differs from distracters in a nonaccidental property, as when one is searching for a curved arc among straight segments, the detection of that target is facilitated compared to conditions where targets and background do not differ in such properties.
1. Collinearity versus Curvature
The demonstration of the collinearity or curvature relations is too obvious to be performed as an experiment. When looking at a straight segment, no observer would assume that it is an accidental image of a curve. That the contrast between straight and curved edges is readily available for perception was shown by Neisser (1963). He found that a search for a letter composed only of straight segments, such as a Z, could be performed faster when it was embedded in a field of curved distracters, such as C, G, O, and Q, than when it was among other letters composed of straight segments such as N, W, V, and M.
2. Symmetry and Parallelism
Many of the Ames demonstrations, such as the trapezoidal window and Ames room, derive from an assumption of symmetry that includes parallelism (Ittelson, 1952). Palmer (1980) showed that the subjective directionality of arrangements of equilateral triangles was based on the derivation of an axis of symmetry for the arrangement. King, Meyer, Tangney, and Biederman (1976) demonstrated that a perceptual bias toward symmetry accounted for a number of shape constancy effects. Garner (1974), Checkosky and Whitlock (1973), and Pomerantz (1978) provided ample evidence that not only can symmetrical shapes be quickly discriminated from asymmetrical stimuli, but the degree of symmetry was also a readily available perceptual distinction. Thus, stimuli that were invariant under both reflection and 90° increments in rotation could be rapidly discriminated from those that were only invariant under reflection (Checkosky & Whitlock, 1973).
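The distinction between degrees of symmetry can be made concrete with a small check on binary patterns. This is a sketch under stated assumptions, not a model of the cited experiments: the grid patterns are hypothetical stimuli, the function names are illustrative, and only left-right mirror reflection and a single 90° rotation are tested.

```python
def rotate90(grid):
    """Rotate a square grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def reflect(grid):
    """Mirror a grid left to right."""
    return [list(row[::-1]) for row in grid]


def symmetries(grid):
    """Report invariance under left-right reflection and under 90-degree rotation."""
    grid = [list(row) for row in grid]
    return {
        "reflection": reflect(grid) == grid,
        "rotation_90": rotate90(grid) == grid,
    }


# Hypothetical 3 x 3 stimuli (1 = filled cell).
PLUS = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]
TEE = [[1, 1, 1],
       [0, 1, 0],
       [0, 1, 0]]
ELL = [[1, 0, 0],
       [1, 0, 0],
       [1, 1, 1]]

print(symmetries(PLUS))  # {'reflection': True, 'rotation_90': True}
print(symmetries(TEE))   # {'reflection': True, 'rotation_90': False}
print(symmetries(ELL))   # {'reflection': False, 'rotation_90': False}
```

The three patterns fall into three classes: invariant under both transformations, invariant under reflection only, and asymmetrical, which mirrors the contrast between stimuli described in the text.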
3. Cotermination
The "peephole perception" demonstrations, such as the Ames chair (Ittelson, 1952) or the physical realization of the impossible triangle (Penrose & Penrose, 1958), are produced by accidental alignment of noncoterminous segments. The success of these demonstrations documents the immediate and compelling impact of this relation. The registration of cotermination is important for determining vertices that provide information which can serve to distinguish the components. In fact, one theorist (Binford, 1981) has suggested that the major function of eye movements is to determine coterminous edges. With polyhedra (volumes produced by planar surfaces), the Y, arrow, and L vertices allow inference as to the identity of the volume in the image. For example, the silhouette of a brick contains a series of six vertices, which alternate between L's and arrows, and an internal Y vertex, as illustrated in any of the straight-edged cross-sectioned volumes in Fig. 6. The Y vertex is produced by the cotermination of three segments, with none of the angles greater than 180°. (An arrow vertex contains an angle that exceeds 180°.) This vertex is not present in components that have curved cross sections, such as cylinders, and thus can provide a distinctive cue for the cross-sectional edge. Perkins (1983) has described a perceptual bias toward parallelism in the interpretation of this vertex.³ [Chakravarty (1979) has discussed the vertices formed by curved regions.] Whether the presence of this particular internal vertex can facilitate the identification of a brick versus a cylinder is not yet known, but a recent study by Biederman and Blickle (1985, described below) demonstrated that deletion of vertices adversely affected object recognition more than the deletion of the same amount of contour at midsegment.
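The angular definitions of the vertex types can be turned into a small classifier. The sketch below is illustrative only: the function name, the representation of a vertex as a list of segment directions in degrees, and the tolerance `tol` are assumptions. It treats two coterminating segments as an L, and three as a Y or an arrow according to whether any angle between neighboring segments exceeds 180°; three segments with one angle of exactly 180° are labeled T here, corresponding to one segment terminating on another, which the chapter treats as a special case.

```python
def classify_vertex(directions_deg, tol=1.0):
    """Classify a vertex from the directions (degrees) of the segments leaving it.

    The angles considered are the gaps between neighboring segments going once
    around the vertex, so they always sum to 360 degrees.
    """
    angles = sorted(a % 360.0 for a in directions_deg)
    n = len(angles)
    gaps = [(angles[(i + 1) % n] - angles[i]) % 360.0 for i in range(n)]
    largest = max(gaps)

    if n == 2:
        # Two collinear segments form a straight contour, not a vertex.
        return "straight" if abs(largest - 180.0) <= tol else "L"
    if n == 3:
        if abs(largest - 180.0) <= tol:
            return "T"      # two arms collinear: one segment ends on another
        return "arrow" if largest > 180.0 else "Y"
    return "other"


print(classify_vertex([90, 210, 330]))  # Y
print(classify_vertex([60, 90, 120]))   # arrow
print(classify_vertex([0, 90]))         # L
print(classify_vertex([0, 180, 270]))   # T
```

Only the ordering of the segment directions around the vertex matters to this classification, not their exact lengths or angles, which is in keeping with the emphasis on contrasts that do not require fine quantitative judgments.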
The T vertex represents a special case in that it is not a locus of cotermination (of two or more segments), but only the termination of one segment on another. Such vertices are important for determining occlusion and thus segmentation (along with concavities) in that the edge forming the (normally) vertical segment of the T cannot be closer to the viewer than the segment forming the top of the T (Binford, 1981). By this account, the T vertex might have a somewhat different status than the Y, arrow, and L vertices in that the T’s primary role would be in segmentation rather than in establishing the identity of the volume.⁴
Vertices composed of three segments, such as the Y and arrow, and their curved counterparts, are important determinants as to whether a given component is volumetric or planar. Planar components are discussed below but, in general, such components lack three-pronged vertices.
The high speed and accuracy of determining a given nonaccidental relation, for example, whether some pattern is symmetrical, should be contrasted with performance in making absolute quantitative judgments of variations in a single, physical attribute, such as length of a segment or degree of tilt or curvature. For example, the judgment as to whether the length of a given segment is 10, 12, 14, 16, or 18 inches is notoriously slow and error prone (Miller, 1956; Garner, 1962; Beck et al., 1983; Virsu, 1971a,b; Fildes & Triggs, 1985). Even these modest performance levels are challenged when the judgments have to be executed over the brief 100-msec intervals (Egeth & Pachella, 1969) that are sufficient for accurate object identification. Perhaps even more telling against a view of object recognition that would postulate the making of absolute judgments of fine quantitative detail is that the speed and accuracy of such judgments decline dramatically when they have to be made for multiple attributes (Miller, 1956; Garner, 1962; Egeth & Pachella, 1969). In contrast, object recognition latencies for complex objects are reduced by the presence of additional (redundant) components (Biederman, Ju, & Clapper, 1985, described below).
³When such vertices formed the central angle in a polyhedron, Perkins (1983) reported that the surfaces would almost always be interpreted as meeting at right angles as long as none of the three angles was less than 90°. Indeed, such vertices cannot be projections of acute angles (Kanade, 1981), but the human appears insensitive to the possibility that the vertices could have arisen from obtuse angles. If one of the angles in the central Y vertex was acute, then the polyhedron would be interpreted as irregular. Perkins found that subjects from rural areas of Botswana, where there was a lower incidence of exposure to carpentered (right-angled) environments, had an even stronger bias toward rectilinear interpretations than Westerners (Perkins & Deregowski, 1982).
⁴The arrangement of vertices, particularly for polyhedra, offers constraints on “possible” interpretations of lines as convex, concave, or occluding (e.g., Sugihara, 1984). In general, the constraints take the form that a segment cannot change its interpretation (e.g., from concave to convex) unless it passes through a vertex. “Impossible” objects can be constructed from violations of this constraint (Waltz, 1975) as well as from more general considerations (Sugihara, 1982, 1984). It is tempting to consider that the visual system captures these constraints in the way in which edges are grouped into objects, but the evidence would seem to argue against such an interpretation. The impossibility of most impossible objects is not immediately registered, but requires scrutiny and thought before the inconsistency is detected. What this means in the present context is that the visual system has a capacity for classifying vertices locally, but no perceptual routines for determining the global consistency of a set of vertices.
VII. A Set of 36 Components Generated from Differences in Nonaccidental Properties among Generalized Cones
I have emphasized the particular set of nonaccidental properties shown in Fig. 4 because they may constitute a perceptual basis for the generation of the set of components. Any primitive that is hypothesized to be the basis of object recognition should be rapidly identifiable and invariant over viewpoint and noise. These characteristics would be attainable if differences among components were based on differences in nonaccidental properties. Although additional nonaccidental properties exist, there is empirical support for rapid perceptual access to the five described in Fig. 4. In addition, these five relations reflect intuitions about significant perceptual and cognitive differences among objects.
Fig. 5. Variations in generalized cones that can be detected through nonaccidental properties. Constant-sized cross sections have parallel sides; expanded or expanded and contracted cross sections have sides that are not parallel. Curved versus straight cross sections and axes are detectable through collinearity or curvature. The three values of cross-sectional symmetry (symmetrical under reflection and 90° rotation, reflection only, or asymmetrical) are detectable through the symmetry relation.
From variation over only two or three levels in the nonaccidental relations of four attributes of generalized cylinders, a set of 36 components can be generated. A subset is illustrated in Fig. 5. Some of the generated volumes and their organization are shown in Fig. 6. Three of the attributes describe characteristics of the cross section: its shape, symmetry, and constancy of size as it is swept along the axis. The fourth attribute describes the shape of the axis:
1. Cross section
   A. Edges
      S   Straight
      C   Curved
   B. Symmetry
      ++  Symmetrical: Invariant under rotation and reflection
      +   Symmetrical: Invariant under reflection
      -   Asymmetrical
   C. Constancy of size of cross section as it is swept along axis
      +   Constant
      -   Expanded
      --  Expanded and contracted
2. Axis
   D. Curvature
      +   Straight
      -   Curved
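The count of 36 follows from crossing the levels of the four attributes (2 × 3 × 3 × 2). A minimal sketch of that enumeration is given here; the dictionary keys and level names are illustrative labels chosen for this example, not notation from the text.

```python
from itertools import product

# Levels of the four attributes, spelled out in words (labels are illustrative).
ATTRIBUTES = {
    "edge": ("straight", "curved"),
    "symmetry": ("rotation and reflection", "reflection only", "asymmetrical"),
    "size": ("constant", "expanded", "expanded and contracted"),
    "axis": ("straight", "curved"),
}


def generate_components():
    """Return every combination of attribute levels as a list of dicts."""
    names = tuple(ATTRIBUTES)
    return [dict(zip(names, values)) for values in product(*ATTRIBUTES.values())]


components = generate_components()
print(len(components))  # 2 * 3 * 3 * 2 combinations

# A cylinder: circular cross section of constant size swept along a straight axis.
cylinder = {
    "edge": "curved",
    "symmetry": "rotation and reflection",
    "size": "constant",
    "axis": "straight",
}
print(cylinder in components)
```

Each entry is a purely qualitative description; metric properties such as aspect ratio are deliberately absent from it.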
A. PERCEPTUAL BIASES AMONG THE COMPONENTS
The values of these four attributes are presented as contrastive differences in nonaccidental properties: straight versus curved, symmetrical versus asymmetrical, parallel versus nonparallel. Cross-sectional edges and curvature of the axis are distinguishable by collinearity or curvilinearity. The constant versus expanded size of the cross section would be detectable through parallelism; a constant cross section would produce a generalized cone with parallel sides (as with a cylinder or brick); an expanded cross section would produce edges that were not parallel (as with a cone or wedge), and a cross section that expanded and then contracted would produce an ellipsoid with nonparallel sides and extrema of positive curvature (as with a lemon). As Hoffman and Richards (1985) have noted, such extrema are invariant with viewpoint. The three levels of cross-sectional symmetry are equivalent to Garner's (1974) distinction of the number of different stimuli produced by 90° rotations and reflections of a stimulus. Thus, a square or circle would be invariant under 90° rotation and reflection; but a rectangle or ellipse would be invariant only under reflection, as 90° rotations would produce a second figure. Asymmetrical figures would produce eight different figures under 90° rotation and reflection.
Fig. 6. Proposed partial set of volumetric primitives (geons) derived from differences in nonaccidental properties.
1. Negative Values
The plus values are those favored by perceptual biases and memory errors. No bias is assumed for straight and curved edges of the cross section. For symmetry, clear biases have been documented. For example, if an image could have arisen from a symmetrical object, then it is interpreted as symmetrical (King et al., 1976). The same is apparently true of parallelism. If edges could be parallel, then they are typically interpreted as such, as with the trapezoidal room or window.
2. Curved Axes
Figure 7 shows three of the most negatively marked primitives with curved cross sections. Such volumes often resemble biological entities. An expansion and contraction of a rounded cross section with a straight axis produces an ellipsoid (lemon) (Fig. 7a), an expanded cross section with a curved axis produces a horn (Fig. 7b), and an expanded and contracted rounded cross section with a curved axis produces a banana slug or gourd (Fig. 7c).
In contrast to the natural forms generated when both cross section and axis are curved, the components swept by a straight-edged cross section traveling along a curved axis (e.g., the components on the first, third, and fifth rows of Fig. 8) appear somewhat less familiar and more difficult to apprehend than their curved counterparts. It is possible that this difficulty may merely be a consequence of unfamiliarity. Alternatively, the subjective difficulty might be produced by a conjunction-attention effect (CAE) of the kind discussed by Treisman (e.g., Treisman & Gelade, 1980). CAEs are described in the section on attentional effects. In the present case, given the presence in the image of curves and straight edges (for the rectilinear cross sections with curved axis), attention (or scrutiny) may be required to determine which kind of segment to assign to the axis and which to assign to the cross section. Curiously, the problem does not present itself when a curved cross section is run along a straight axis to produce a cylinder or cone. The issue as to the role of attention in determining components would appear to be empirically tractable using the paradigms created by Treisman and her colleagues (Treisman & Gelade, 1980; Treisman, 1982a,b; Treisman & Schmidt, 1983).
Fig. 7. Three curved components with curved axes or expanded and/or contracted cross sections. These tend to resemble biological forms. [Panel labels: (a) cross section with curved edge (C), symmetrical (+), expanded and contracted (--); (b) horn: curved edge (C), symmetrical (+), expanded, curved axis (-); (c) gourd: curved edge (C), symmetrical (+), expanded and contracted (--), curved axis (-).]
3. Asymmetrical Cross Sections
There are an infinity of possible cross sections that could be asymmetrical. How does RBC represent this variation? RBC assumes that the differences in the departures from symmetry are not readily available and thus do not affect primal access. For example, the difference in the shape of the cross section for the two straight-edged volumes in Fig. 9 might not be apparent sufficiently quickly to affect object recognition. This does not mean that an individual could not store the details of the volume produced by an asymmetrical cross section. But if such detail required additional time for its access, then the expectation is that it could not mediate primal access. As of this writing, I do not know of any case where primal access depends on discrimination among asymmetrical cross sections within a given component type, for example, among curved-edged cross sections of constant size, straight axes, and a specified aspect ratio. For example, the curved cross section for the component that can model an airplane wing (or car door) is asymmetrical. Different wing designs might have different-shaped cross sections. I assume that most people, including wing designers, will know that the object is an airplane, or even an airplane wing, before they know how to classify the wing on the basis of the asymmetry of its cross section.
[Fig. 8. Column headings: Cross Section Edge (Straight S, Curved C); Symmetry (Rot & Ref ++, Ref +, Asymm -); Size (Constant, Expanded, Exp & Cont --); Axis (Straight +, Curved -).]
Fig. 9. Volumes with an asymmetrical, straight-edged cross section. Detection of differences between such volumes might require attention.
A second way in which asymmetrical cross sections need not be individually represented is that they often produce volumes that resemble symmetrical but truncated wedges. This latter form of representing asymmetrical cross sections would be analogous to the schema-plus-correction phenomenon noted by Bartlett (1932). The implication of a schema-plus-correction representation would be that a single primitive category for asymmetrical cross sections and wedges might be sufficient. For both kinds of volumes, their similarity may be a function of the detection of a lack of parallelism in the volume. One would have to exert scrutiny to determine whether a lack of parallelism was caused by a cross section with nonparallel sides or by a symmetrical cross section that varied in size. In this case, as with the components with curved axes described in the preceding section, a single primitive category for both wedges and asymmetrical straight-edged volumes could be postulated that would allow a reduction in the number of primitive components.
There is considerable evidence that asymmetrical patterns require more time for their identification than symmetrical patterns (Checkosky & Whitlock, 1973; Pomerantz, 1978). Whether these effects have consequences for the time required for object identification is not yet known.
4. Conjunction-Attentional Effects
A single feature can often be detected without any effect of the number of distracting items in the visual field. For example, the time for detecting a blue shape (a square or a circle) among a field of green distracter shapes is unaffected by the number of green shapes. However, if the target is defined by a conjunction of features, for example, a blue square among distracters consisting of green squares and blue circles, so that both the color and the shape of each item must be
determined to know if it is or is not the target, then target detection time increases linearly with the number of distracters (Treisman & Gelade, 1980). These results have led to a theory of visual attention that assumes that the human can monitor all potential display positions simultaneously and with unlimited capacity for a single feature (e.g., something blue or something curved). But when a target is defined by a conjunction of features, then a limited capacity attentional system that can only examine one display position at a time must be deployed (Treisman & Gelade, 1980). The extent to which Treisman and Gelade’s (1980) demonstration of conjunction-attention effects may be applicable to the perception of volumes and objects has yet to be evaluated. In the extreme, in a given moment of attention, it may be the case that the values of the four attributes of the components are detected as independent features. In cases where the attributes, taken independently, can define different volumes, as with the shape of cross sections and axis, an act of attention might be required to determine the specific component generating those attributes: Am I looking at a component with a curved cross section and a straight axis or is it a straight cross section and a curved axis? At the other extreme, it may be that an object recognition system has evolved to allow automatic determination of the components. The more general issue is whether relational structures for the primitive components are defined automatically or whether a limited attentional capacity is required to build them from their individual edge attributes. It could be the case that some of the most positively marked volumes are detected automatically, but that the volumes with negatively marked attributes might require attention.
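The contrast between single-feature and conjunction search can be written as a toy prediction function. This is only a sketch of the qualitative pattern reported by Treisman and Gelade (1980): the function name is hypothetical, and the baseline and per-item values are placeholders chosen for illustration, not measured data.

```python
def predicted_detection_time(n_distracters, target_type,
                             base_ms=450.0, ms_per_item=30.0):
    """Toy prediction of target detection time in milliseconds.

    base_ms and ms_per_item are hypothetical placeholders, not measured values.
    """
    if n_distracters < 0:
        raise ValueError("n_distracters must be non-negative")
    if target_type == "feature":
        # Single feature: all positions are monitored at once,
        # so the number of distracters has no effect.
        return base_ms
    if target_type == "conjunction":
        # Conjunction: positions are examined one at a time,
        # so time grows linearly with the number of distracters.
        return base_ms + ms_per_item * n_distracters
    raise ValueError("target_type must be 'feature' or 'conjunction'")


for n in (0, 15, 30):
    print(n,
          predicted_detection_time(n, "feature"),
          predicted_detection_time(n, "conjunction"))
```

Whether component attributes behave like the "feature" or the "conjunction" branch is exactly the open empirical question raised in the text.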
That some limited capacity is involved in the perception of objects (but not necessarily their components) is documented by an effect of the number of irrelevant objects on perceptual search (Biederman, 1981). Reaction times and errors for detecting an object, for example, a chair, increased linearly as a function of the number of nontarget objects in a 100-msec presentation of a clockface display (Biederman, 1981). Whether this effect arises from the necessity to use a limited capacity to construct a component from its attributes or whether the effect arises from the matching of an arrangement of components to a representation is not yet known.
B. ADDITIONAL SOURCES OF CONTOUR VARIATION
1. Metric Variation
For any given component type, there can be an infinite degree of metric variation in aspect ratio, degree of curvature (for curved components), and departure from parallelism (for nonparallel components). How should this quantitative variation be conceptualized? The discussion will concentrate on aspect ratio, probably the most important of the variations. But the issues will be
generally applicable to the other metric variations as well. [Aspect ratio is a measure of the elongation of a component. It can be expressed as the width-to-height ratio of the smallest bounding rectangle that would just enclose the component. It is somewhat unclear how to handle components with a curved axis. The bounding rectangle could simply enclose the component, whatever its shape. Alternatively, two rectangles could be constructed.] One possibility is to include specification of a range of aspect ratios in the structural description of the object. It seems plausible to assume that recognition can be indexed, in part, by aspect ratio in addition to a componential description. An object’s aspect ratio would thus play a role similar to that played by word length in the tachistoscopic identification of words, where long words are rarely proffered when a short word is flashed. Consider an elongated object, such as a baseball bat with an aspect ratio of 15:1. When the orientation of the object is orthogonal to the viewpoint so that the aspect ratio of its image is also 15:1, recognition might be faster than when presented at an orientation where the aspect ratio of its image differed greatly from that value, say 2:1. One need not have a particularly fine-tuned function for aspect ratio as large differences in aspect ratio between two components would, like parallelism, be preserved over a large proportion of arbitrary viewing angles. Another way to incorporate variations in the aspect ratio of an object’s image is to represent only qualitative differences so that variations in aspect ratios exert an effect only when the relative sizes of the longest dimensions undergo reversal. Specifically, for each component and the complete object, three variations could be defined, depending on whether the axis was much smaller, approximately equal to, or much longer than the longest dimension of the cross section.
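The qualitative three-level coding of aspect ratio can be sketched as follows. The sketch assumes orthographic projection of a cylinder-like component, ignores the contribution of the end faces to the projected length, and uses a hypothetical threshold (`ratio_threshold`) to separate "much longer" from "approximately equal"; none of these choices comes from the text.

```python
import math


def projected_axis_length(axis_length, slant_deg):
    """Image length of the axis when it makes slant_deg with the line of sight.

    Assumes orthographic projection: 90 degrees is a side view, 0 is end-on.
    """
    return axis_length * abs(math.sin(math.radians(slant_deg)))


def qualitative_aspect(axis_length, cross_section_extent, ratio_threshold=1.5):
    """Code the axis as shorter than, about equal to, or longer than the cross section.

    ratio_threshold is a hypothetical cutoff chosen for illustration.
    """
    ratio = axis_length / cross_section_extent
    if ratio >= ratio_threshold:
        return "axis longer"
    if ratio <= 1.0 / ratio_threshold:
        return "axis shorter"
    return "approximately equal"


# A bat-like component: axis 15 units long, cross-section diameter 1 unit.
for slant in (90, 30, 4, 2):
    image_axis = projected_axis_length(15.0, slant)
    print(slant, round(image_axis, 2), qualitative_aspect(image_axis, 1.0))
```

Under these assumptions the qualitative code stays the same over most viewing angles and changes only when the component is seen almost end-on.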
For example, for a component whose axis was longer than the diameter of the cross section (which would be true in most cases), only when the projection of the cross section became longer than the axis would there be an effect of the object’s orientation, as when the bat was viewed almost from on end so that the diameter of the handle was greater than the projection of its length. A close dependence of object recognition performance on the preservation of the aspect ratio of a component in the image would be inconsistent with the emphasis by RBC on dichotomous contrasts of nonaccidental relations. Fortunately, these issues on the role of aspect ratio are readily testable. Bartram’s (1976) experiments, described later in Section XI,A, suggest that sensitivity to variations in aspect ratio need not be given heavy weight: Recognition speed is unaffected by variation in aspect ratio across different views of the same object.
2. Planar Components
A special case of aspect ratio needs to be considered: When the axis for a constant cross section is much smaller than the greatest extent of the cross section, a component may lose its volumetric character and appear planar, as the
flipper of the penguin in Fig. 10 or the eye of the elephant in Fig. 11. Such shapes can be conceptualized in two ways. The first (and less favored) is to assume that these are just quantitative variations of the volumetric components, but with an axis length of zero. They would then have default values of a straight axis (+) and a constant cross section (+). Only the edge of the cross section and its symmetry could vary. Alternatively, it might be that a flat shape is not related perceptually to the foreshortened projection of the volume that could have produced it. Using the same variation in cross-sectional edge and symmetry as with the volumetric components, seven planar components could be defined. For ++ symmetry, there would be the square and circle (with straight and curved edges, respectively), and for + symmetry the rectangle, triangle, and ellipse. Asymmetrical (-) planar components would include trapezoids (straight edges) and drop shapes (curved edges). The addition of these seven planar components to the 36 volumetric components yields 43 components (a number close to the number of phonemes required to represent English words). The triangle is here assumed to define a separate component, although a triangular cross section was not assumed to define a separate volume under the intuition that a prism (produced by a triangular cross section) is not quickly distinguishable from wedges. My preference for assuming that planar components are not perceptually related to their foreshortened volumes is based on the extraordinary difficulty of recognizing objects from views that are parallel to the axis of the major components, as shown in Fig. 26 (below). What might be critical here is the presence of a trihedral vertex, such as a fork or an arrow, or a curved counterpart to such vertices (Chakravarty, 1979). Such vertices provide strong evidence that the image is generated from a volumetric rather than a planar component.
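The tally of seven planar components, and the total of 43, can be checked with a small lookup table. The table keys (edge type and symmetry level, written out in words) are an illustrative encoding; the shapes themselves are the ones named in the text.

```python
# Planar components named in the text, indexed by (edge, symmetry).
PLANAR_COMPONENTS = {
    ("straight", "rotation and reflection"): ["square"],
    ("curved", "rotation and reflection"): ["circle"],
    ("straight", "reflection only"): ["rectangle", "triangle"],
    ("curved", "reflection only"): ["ellipse"],
    ("straight", "asymmetrical"): ["trapezoid"],
    ("curved", "asymmetrical"): ["drop shape"],
}

n_planar = sum(len(shapes) for shapes in PLANAR_COMPONENTS.values())
n_volumetric = 2 * 3 * 3 * 2  # edge x symmetry x size x axis levels
print(n_planar, n_volumetric, n_planar + n_volumetric)
```

The triangle is the only cell with two entries, which reflects the text's decision to treat it as a separate planar component.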
3. Selection of Axis
Given that a volume is segmented from the object, how is an axis selected? Subjectively, it appears that an axis is selected that would maximize its length, the symmetry of the cross section, and the constancy of the size of the cross section. It may be that by having the axis correspond to the longest extent of the component, bilateral symmetry can be more readily detected as the sides would be closer. Typically, a single axis satisfies all three criteria, but sometimes these criteria are in opposition and two (or more) axes (and component types) are plausible (Brady, 1983). Under these conditions, axes will often be aligned to an external frame, such as the vertical (Humphreys, 1983).
4. Parsing at Joins without Concavities
RBC assumes that parsing is primarily performed at regions of concavity. Some objects, however, can be readily modeled with a pair of components but no
concavity is apparent at the join of the components. For example, a rocket (or any cylinder with a tapered end) can be modeled by joining a cylinder and a cone. A cane furnishes another example. The join between the handle (a cylinder with a curved axis) and the long straight section does not have a concavity. Because the cross sections of the components in these cases are of identical shape and size, no concavity is produced. Such cases can be accommodated by formulating a secondary parsing rule: Parsing, if it is performed at all in the absence of concavities, occurs at regions where nonaccidental properties vary. In the case of the rocket, there would be a change from parallelism of the sides of the rocket’s tank to converging (nonparallel) edges for its nose cone. For the cane, it would be the change from straight to curved sides of the components. Almost always, of course, whenever volumes have different sized cross sections or differ in a nonaccidental property, concavities will be produced and it is these concavities that provide the most compelling support for segmentation. It is possible that when the secondary rule forms the only basis for parsing, recognition performance would suffer compared to objects whose components were segmentable at concavities.
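The primary and secondary parsing rules can be sketched as a small procedure. The sketch assumes an object has already been described as an ordered list of regions, each tagged with two nonaccidental properties, plus a flag for each join saying whether a concavity is present. The `Region` class, its field names, and `parse_points` are all hypothetical constructs for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A stretch of an object with uniform nonaccidental properties (illustrative)."""
    name: str
    sides_parallel: bool
    sides_straight: bool


def parse_points(regions, concave_joins):
    """Return (join index, rule) pairs for the joins at which parsing occurs.

    concave_joins[i] says whether the join between regions[i] and
    regions[i + 1] shows a concavity.
    """
    if len(concave_joins) != len(regions) - 1:
        raise ValueError("need exactly one concavity flag per join")
    cuts = []
    for i, concave in enumerate(concave_joins):
        a, b = regions[i], regions[i + 1]
        if concave:
            # Primary rule: parse at regions of concavity.
            cuts.append((i, "primary"))
        elif (a.sides_parallel, a.sides_straight) != (b.sides_parallel, b.sides_straight):
            # Secondary rule: no concavity, but a nonaccidental property changes.
            cuts.append((i, "secondary"))
    return cuts


rocket = [Region("tank", True, True), Region("nose cone", False, True)]
cane = [Region("handle", True, False), Region("shaft", True, True)]

print(parse_points(rocket, [False]))
print(parse_points(cane, [False]))
```

In both examples the join has no concavity, so only the secondary rule applies: the rocket is parsed where parallel sides give way to converging ones, and the cane where curved sides become straight.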
VIII. Relation of RBC to Principles of Perceptual Organization
Textbook presentations of perception typically include a section on gestalt organizational principles. This section is almost never linked to any other function of perception. RBC posits a specific role for these organizational phenomena in pattern recognition. Specifically, as suggested by the section on generating components through nonaccidental properties, the gestalt principles (or better, nonaccidental relations) serve to determine the individual components rather than the complete object. A complete object, such as a chair, can be highly complex and asymmetrical, but the components will be simple volumes. A consequence of this interpretation is that it is the components that will be stable under noise or perturbation. If the components can be recovered and object perception is based on the components, then the object will be recognizable. This may be the reason why it is difficult to camouflage objects by moderate doses of random occluding noise, as when a car is viewed behind foliage. According to RBC, the components accessing the representation of an object can readily be recovered through routines of collinearity or curvature that restore contours (Lowe, 1984). These mechanisms for contour restoration will not bridge cusps. For visual noise to be effective, by these considerations, it must obliterate the concavity and interrupt the contours from one component at the precise point where they can be joined, through collinearity or constant curvature, with the contours of another component. The likelihood of this occurring
by moderate random noise is, of course, extraordinarily low, and it is a major reason why, according to RBC, objects are rarely rendered unidentifiable by noise. The consistency of RBC with this interpretation of perceptual organization should be noted. RBC holds that the (strong) loci of parsing are at cusps; the components are organized from the contours between cusps. In classical gestalt demonstrations, good figures are organized from the contours between cusps. Experiments subjecting these conjectures to test are described in a later section.
IX. A Limited Number of Components?
The motivation behind the conjecture that there may be a limit to the number of primitive components derives from both empirical and computational considerations in addition to the limited number of components that can be discriminated from differences in nonaccidental properties among generalized cones. People are not sensitive to continuous metric variations, as evidenced by severe limitations in the human’s capacity for making rapid and accurate absolute judgments of quantitative shape variations.⁵ The errors made in the memory for shapes also document an insensitivity to metric variations. Computationally, a limit is suggested by estimates of the number of objects we might know and the capacity for RBC to readily represent a far greater number with a limited number of primitives.
A. EMPIRICAL SUPPORT FOR A LIMIT
Although the visual system is capable of discriminatingextremely fine detail, I have been assuming that the number of volumetric primitives sufficient to model rapid human object recognition may be limited. It should be noted that the number of proposed primitives is greater than the three-cylinder, sphere, and cone-advocated by many how-to-draw books. Although these three may be sufficient for determining relative proportions of the parts of a figure and can furnish aid for perspective, they are not sufficient for the rapid identification of objects.6 Similarly, M a n and Nishihara’s (1978) pipe cleaner (viz., cylinder) ‘Absolute judgments are judgments made against a standard in memory (e.g., that shape A is 14 inches in length). Such judgments are to be distinguished from comparative judgments in which both stimuli are available for simultaneous comparison (e.g., that shape A, lying alongside shape B, is longer than B). Comparative judgments appear limited only by the resolving power of the sensory system. Absolute judgments are limited, in addition, by memory for physical variation. That the memory limitations are severe is evidenced by the finding that comparative judgments can be made quickly and accurately for differences so fine that tens of thousands of levels can be discriminated. But accurate absolute judgments rarely exceed 7 f 2 categories (Miller, 1956). 6Paul Cezanne is often incorrectly cited on this point. “Treat nature by the cylinder, the sphere, the cone, everything in proper perspective so that each side ofan object or plane is directed towards
24
Irving Biedennan
representations of animals (1978) would also appear to posit an insufficient number of primitives. On the page, in the context of other labeled pipe cleaner animals, it is certainly possible to arrive at an identification of a particular (labeled) animal (e.g., a giraffe). But the thesis proposed here would hold that the identifications of objects that were distinguished only by the aspect ratios of a single component type would require more time than if the representation of the object preserved its componential identity. In modeling only animals, it is likely that Marr and Nishihara capitalized on the possibility that appendages (e.g., legs and neck) can often be modeled by the cylindrical forms of a pipe cleaner. By contrast, it is unlikely that a pipe cleaner representation of a desk would have had any success. The lesson from Marr and Nishihara’s demonstration, even limited for animals, may well be that a single component, varying only in aspect ratio (and arrangement with other components), is insufficient for primal access. As noted earlier, one reason not to posit a representation system based on fine quantitative detail (e.g., many variations in degree of curvature) is that such absolutejudgments are notoriously slow and error prone unless limited to the 7 2 2 values argued by Miller (1956). Even this modest limit is challenged when the judgments have to be executed over a brief 100-msec interval (Egeth & Pachella, 1969) that is sufficient for accurate object identification. A further reduction in the capacity for absolute judgments of quantitative variations of a simple shape would derive from the necessity, for most objects, to make simultaneous absolute judgments for the several shapes that constitute the object’s parts (Miller, 1956; Egeth & Pachella, 1969). 
This limitation on our capacities for making absolute judgments of physical variation, when combined with the dependence of such variation on orientation and noise, makes quantitative shape judgments a most implausible basis for object recognition. RBC’s alternative is that the perceptual discriminations required to determine the primitive components can be made qualitatively, requiring the discrimination of only two or three viewpoint-independent levels of variation.’ Our memory for irregular shapes shows clear biases toward “regularization” (e.g., Woodworth, 1938). Amply documented in the classical shape memory literature was the tendency for errors in the reproduction and recognition of irregular shapes to be in a direction of regularization in which slight deviations from symmetrical or regular figures were omitted in attempts at reproduction. Alternatively, some irregularities were emphasized ( “accentuation”), typically
a central point” (italics mine; Cezanne, 1904/1941). Cezanne was referring to perspective, not the veridical representation of objects. 7Limitationon our capacities for absolute judgments also occurs in the auditory domain (Miller, 1956). It is possible that the limited number of phonemes derives more from this limitation for accessing memory for fine quantitative variation than it does from limits on the fineness of the commands to the speech musculature.
by the addition of a regular subpart. What is the significance of these memory biases? By the RBC hypothesis, these errors may have their origin in the mapping of the perceptual input onto a representational system based on regular primitives. The memory of a slightly irregular form would be coded as the closest regularized neighbor of that form. If the irregularity was to be represented as well, an act that would presumably require additional time and capacity, then an additional code (sometimes a component) would be added, as with Bartlett’s (1932) idea of “schema with correction.”
B. COMPUTATIONAL CONSIDERATIONS: ARE 36 COMPONENTS SUFFICIENT?
Is there sufficient coding capacity in a set of 36 components to represent the basic level categorizations that we can make? Two estimates are needed to provide a response to this question: (1) the number of readily available perceptual categories, and (2) the number of possible objects that could be represented by 36 components. The number of possible objects that could be represented by 36 components will depend on the allowable relations among the components. Obviously, the value for estimate (2) would have to be greater than the value for estimate (1) if 36 components are to prove sufficient.
C. HOW MANY READILY DISTINGUISHABLE OBJECTS DO PEOPLE KNOW?
How might one arrive at a liberal estimate for this value? One estimate can be obtained from the lexicon. There are fewer than 1500 relatively common basic level object categories, such as chairs and elephants.⁸ If we assume that this estimate is too small by a factor of two, then we can assume potential classification into approximately 3000 basic level categories. As will be discussed, RBC holds that perception is based on the particular subordinate level object rather than the basic level category, so we need to estimate the mean number of instances per basic level category that would have readily distinguishable exemplars. Almost all natural categories, such as elephants or giraffes, have one or only a few instances with differing componential description. Dogs represent a rare exception for natural categories in that they have been bred to have considerable variation in their descriptions. Person-made categories vary in the number of allowable types, but this number often tends to be greater than for the natural categories. Cups, typewriters, and lamps have just a few (in the case of cups) to perhaps 15 or more (in the case of lamps) readily discernible exemplars. Let’s assume (liberally) that the mean number of types is 10. This would yield an estimate of 30,000 readily discriminable objects (3000 categories × 10 types/category).
A second source for the estimate is the rate of learning new objects. A total of 30,000 objects would require learning an average of 4.5 objects per day every day for 18 years, the modal age of the subjects in the experiments described below. Although the value of 4.5 objects learned per day seems reasonable for a child in that it approximates the maximum rates of word acquisition during the ages of 2-6 years (Carey, 1978; Miller, 1977), it certainly overestimates the rate at which adults develop new object categories. The impressive visual recognition competence of a child of 6, if it was based on 30,000 visual categories, would require the learning of 13.5 objects per day, or about 1 per waking hour. By the criterion of learning rate, 30,000 categories would appear to be a liberal estimate.
⁸This estimate was obtained from three sources: (1) Several linguists and cognitive psychologists provided guesses ranging from 300 to 1000 concrete noun object categories. (2) The 6-year-old child can name most of the objects that he or she sees on television and has a vocabulary that is under 10,000 words. Perhaps 10%, at most, are concrete nouns. (3) Perhaps the most defensible estimate was obtained from a sample of Webster’s Seventh New Collegiate Dictionary. The author sampled 30 pages and counted the number of readily identifiable, unique concrete nouns that would not be subordinate to other nouns. Thus, “wood thrush” was not counted because it could not be readily discriminated from “sparrow.” “Penguin” and “ostrich” and any doubtful entries were counted as separate noun categories. The mean number of nouns per page was 1.4; with a 1200-page dictionary, this is equivalent to 1600 noun categories.
D. RELATIONS AMONG THE COMPONENTS AND THE REPRESENTATIONAL CAPACITY OF 36 COMPONENTS
This calculation is dependent upon two estimates: (1) the number of components needed to uniquely define each object, and (2) the number of readily discriminable relations among the components. We will start with estimate (2) and see if it will lead to a plausible value for estimate (1). A possible set of relations is presented in Table I. Like the components, the properties of the relations noted in Table I are nonaccidental in that they can be determined from almost any viewpoint, are preserved in the two-dimensional image, and require the discrimination of only two or three levels. The specification of these five is conservative in that (1) it is a nonexhaustive set in that other relations can be defined, and (2) the relations are only specified for a pair, rather than triples, of components. Let’s consider these in order of their appearance on the table.
Relative size. For any pair of components, C1 and C2, C1 could be much greater than, smaller than, or approximately equal to C2.
Verticality. C1 can be above or below C2, a relation, by the author’s estimate, that is defined for at least 80% of the objects. Thus, giraffes, chairs, and typewriters have a top-down specification of their components, but forks and knives do not.
TABLE I
GENERATIVE POWER OF 36 COMPONENTS

     36      First component, C1
  ×  36      Second component, C2
  ×  3       Size (C1 >> C2, C2 >> C1, C1 = C2)
  ×  1.8     C1 top or bottom (represented for 80% of the objects)
  ×  2       Nature of join [end to end (off-center) or end to side (centered)]
  ×  2       Join at long or short surface of C1
  ×  2       Join at long or short surface of C2
  =  55,987  possible two-component objects

With three components: 55,987 × 36 × 43.2 = 87 million possible three-component objects; equivalent to learning 13,242 new objects every day (~827/waking hour or 13/minute) for 18 years
Centering. The connection between any pair of joined components can be end to end (and of equal-sized cross section at the join), as the upper and forearms of a person, or end to side, producing one or two concavities, respectively (Marr, 1977). Two-concavity joins are far more common in that it is rare that two end-to-end joined components will have equal-sized cross sections. A more general distinction might be whether the end of one component in an end-to-side join is centered or off-centered at the side of the other component. The end-to-end join might represent only the limiting, albeit special, case of off-centered joins. In general, the arbitrary connection of any two volumes (or shapes) will produce two concavities. Hoffman and Richards (1985) discuss the production of concavities through the meeting of surfaces as a principle of transversality.
Relative size of surfaces at join. Other than a sphere and a cube, all primitives will have at least a long and a short surface. The join can be on either surface. The attaché case in Fig. 2a and the strongbox in Fig. 2b differ by the relative lengths of the surfaces of the brick that are connected to the arch (handle). The handle on the shortest surface produces the strongbox; on a longer surface, the attaché case. Similarly, among other differences, the cup and the pail in Fig. 2c and d, respectively, differ as to whether the handle is connected to the long surface of the cylinder (to produce a cup) or the short surface (to produce a pail). In considering only two values for the relative size of the surface at the join, we are conservatively estimating the relational possibilities. Some volumes such as the wedge have as many as five surfaces, all of which can differ in size.
Representational Calculations
The 1296 different pairs of the 36 volumes (i.e., 36²), when multiplied by the number of relational combinations, 43.2 (the product of the various values of the five relations), give us 55,987 possible two-component objects. If a third component is added to the two, then this value has to be multiplied by 1555 possible additions (36 components × 43.2 ways in which the third component can be related to one of the two components) to yield 87 million possible three-component objects. If only 1% of the possible combinations of components were actually used (i.e., 99% redundancy), then the 36 components with the five relations could represent 870,000 possible objects. One would have to acquire 132 objects per day for 18 years (or about 8 per waking hour) to reach this value. This value constrains the estimate of the number of components per object that would be required for unambiguous identification. If objects were distributed relatively homogeneously among combinations of relations and components, then only two or three components would be sufficient to unambiguously represent most objects! We do not yet know if there is a real limit to the number of components. A limit to the number of components would imply categorical effects such that quantitative variations in the contours of an object (e.g., degree of curvature) which did not alter a component’s identity would have less of an effect on the identification of the object than contour variations that did alter a component’s identity.
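The arithmetic in this section can be checked with a short sketch. It is offered only as an illustration: the function and variable names are hypothetical, and the assumptions of 365 days per year and 16 waking hours per day are introduced here to reproduce the per-day and per-hour figures and are not stated in the text.

```python
import math

N_COMPONENTS = 36

# Number of discriminable levels for each of the five relational values
# listed in Table I (names are hypothetical labels for the table rows).
RELATION_LEVELS = {
    "relative_size": 3,      # C1 >> C2, C2 >> C1, C1 = C2
    "verticality": 1.8,      # top or bottom, defined for about 80% of objects
    "nature_of_join": 2,     # end to end or end to side
    "join_surface_c1": 2,    # long or short surface of C1
    "join_surface_c2": 2,    # long or short surface of C2
}


def relation_combinations(levels=RELATION_LEVELS):
    """Product of the relational values (43.2 in the text)."""
    return math.prod(levels.values())


def object_count(n_parts, n_components=N_COMPONENTS):
    """Possible objects built from n_parts components.

    The first component has 36 choices; each further component
    contributes 36 choices times the relational combinations.
    """
    per_added_part = n_components * relation_combinations()
    return n_components * per_added_part ** (n_parts - 1)


def objects_per_day(total_objects, years=18, days_per_year=365):
    """Average learning rate needed to acquire total_objects (assumed calendar)."""
    return total_objects / (years * days_per_year)


if __name__ == "__main__":
    two = object_count(2)
    three = object_count(3)
    used = 0.01 * three  # 99% redundancy
    print(f"two-component objects:   {two:,.0f}")
    print(f"three-component objects: {three:,.0f}")
    print(f"1% of three-component:   {used:,.0f}")
    print(f"objects/day for 1%:      {objects_per_day(used):.1f}")
    print(f"objects/waking hour:     {objects_per_day(used) / 16:.1f}")
    print(f"objects/day for 30,000:  {objects_per_day(30_000):.2f}")
```

Under these assumptions the sketch reproduces the values quoted above to within rounding, which makes it easy to see how strongly the totals depend on the number of relational levels assumed.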
X. Experimental Support for a Componential Representation
According to the RBC hypothesis, the preferred input for accessing object recognition is that of the volumetric components. In most cases, only a few appropriately arranged volumes would be all that is required to uniquely specify an object. Rapid object recognition should then be possible. Neither the full complement of an object’s components nor its texture, color, or the full bounding contour (or envelope or outline) need be present for rapid identification. The problem of recognizing tens of thousands of possible objects becomes, in each case, just a simple task of identifying the arrangement of a few from a limited set of components.
Overview of Experiments
Several object-naming reaction time experiments have provided support for the general assumptions of the RBC hypothesis, although none has provided tests for the specific set of components proposed by RBC. In all experiments, subjects
named briefly presented pictures of common objects. That RBC may provide a sufficient account of object recognition was supported by experiments indicating that objects drawn with only two or three of their components could be accurately identified from a single 100-msec exposure. When shown with a complete set of components, these simple line drawings were identified almost as rapidly as full-colored, detailed, textured slides of the same objects. That RBC may provide a necessary account of object recognition was supported by a demonstration that degradation (contour deletion), if applied at the regions that are critical according to RBC, rendered an object unidentifiable. All the original experimental results reported here have received at least one (and often several) replication.
A. PERCEIVING INCOMPLETE OBJECTS
Biederman et al. (1985) studied the perception of briefly presented partial objects lacking some of their components. A prediction of RBC was that only two or three components would be sufficient for rapid identification of most objects. If there was enough time to determine the components and their relations, then object identification should be possible. Complete objects would be maximally similar to their representation and should enjoy an identification speed advantage over their partial versions.
1. Stimuli
The experimental objects were line drawings of 36 common objects, 9 of which are illustrated in Fig. 10. The depiction of the objects and their partition into components were done subjectively, according to generally easy agreement among at least three judges. The artists were unaware of the set of components described in this article. For the most part, the components corresponded to the parts of the object. Seventeen component types were sufficient to represent the 180 components comprising the complete versions of the 36 objects. The objects were shown either with their full complement of components or partially, but never with fewer than two components. The first two components that were selected were the largest and most diagnostic components from the complete object. Additional components were added in decreasing order of size and/or diagnosticity, as illustrated in Figs. 11 and 12, subject to the constraint that the additional component be connected to the existing components. For example, the airplane, which required nine components to look complete, would have the fuselage and two wings when shown with three of the nine components. The objects were displayed in black line on a white background and averaged 4.5° in greatest extent. The purpose of this experiment was to determine whether the first few components that would be available from an unoccluded view of a complete object
Fig. 10. Nine of the experimental objects.
would be sufficient for rapid identification of the object. In normal viewing, the largest and most diagnostic components are available for perception. We ordered the components by size and diagnosticity because our interest, as just noted, was on primal access in recognizing a complete object. Assuming that the largest and most diagnostic components would control this access, we studied the contribution of the nth largest and most diagnostic component, when added to the n - 1 already existing components, because this would more closely mimic the contribution of that component when looking at the complete object. (Another kind of experiment might explore the contribution of an “average” component by balancing the order of addition of the components. Such an experiment would be relevant to the recognition of an object that was occluded in such a way that only the displayed components would be available for viewing.)
Fig. 11. Illustration of the partial and complete versions of two three-component objects (the wine glass and flashlight) and a nine-component object (the penguin).
Fig. 12. Illustration of partial and complete versions of a nine-component object (airplane).
2. Complexity
The objects shown in Fig. 10 illustrate the second major variable in the experiment. Objects differ in complexity, defined by RBC as the number of components that they require to look complete. For example, the lamp, flashlight, watering can, scissors, and elephant require two, three, four, six, and nine components, respectively. As noted previously, it would seem plausible that partial objects would require more time for their identification than complete objects, so that a complete airplane of nine components, for example, might be more rapidly recognized than a partial version of that airplane with only three of its components. The prediction from RBC was that complex objects, furnishing more diagnostic combinations of components, would be more rapidly identified than simple objects. This prediction is contrary to those models that assume that objects are recognized through a serial contour tracing process (e.g., Hochberg, 1978; Ullman, 1983).
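The rule for building partial versions described under Stimuli (largest and most diagnostic components first, each added component connected to those already shown) can be sketched as a greedy ordering. This is a hypothetical illustration, not the procedure the judges followed: the ranking and the list of joins are supplied by hand, the component names are invented, and the sketch applies the connectivity constraint from the second component onward, which is an assumption.

```python
def presentation_order(ranked_names, joins):
    """Order in which components would be added to a partial object.

    ranked_names: component names, largest / most diagnostic first
                  (a subjective ranking supplied by the caller).
    joins:        iterable of (a, b) pairs of components that touch.
    """
    neighbors = {name: set() for name in ranked_names}
    for a, b in joins:
        neighbors[a].add(b)
        neighbors[b].add(a)

    shown = [ranked_names[0]]
    remaining = list(ranked_names[1:])
    while remaining:
        # Take the best-ranked remaining component that touches the shown set.
        for candidate in remaining:
            if neighbors[candidate] & set(shown):
                shown.append(candidate)
                remaining.remove(candidate)
                break
        else:
            raise ValueError("no remaining component touches the shown set")
    return shown


def partial_version(ranked_names, joins, n_shown):
    """The first n_shown components of the presentation order."""
    return presentation_order(ranked_names, joins)[:n_shown]


if __name__ == "__main__":
    # Invented five-part airplane used only to exercise the function.
    ranked = ["fuselage", "left_wing", "right_wing", "engine", "tail"]
    joins = [
        ("fuselage", "left_wing"),
        ("fuselage", "right_wing"),
        ("left_wing", "engine"),
        ("fuselage", "tail"),
    ]
    print(partial_version(ranked, joins, 3))
```

With this invented airplane, the three-component version consists of the fuselage and two wings, in line with the example given in the text.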
3. General Procedure
Trials were self-paced. The depression of a key on the subject’s terminal initiated a sequence of exposures from three projectors. First, the corners of a 500-msec fixation rectangle (6° wide), which corresponded to the corners of the object slide, were shown. The fixation slide was immediately followed by a 100-msec exposure of a slide of an object that had varying numbers of its components present. The presentation of the object was immediately followed by a 500-msec pattern mask consisting of a random-appearing arrangement of lines. The subject’s task was to name the object as fast as possible into a microphone which triggered a voice key. The experimenter recorded errors. Prior to the experiment, the subjects read a list of the object names to be used in the experiment. [Subsequent experiments revealed that this procedure for name familiarization produced no effect. When subjects were not familiarized with the names of the experimental objects, results were virtually identical to those obtained when such familiarization was provided. This finding indicates that the results of these experiments were not a function of inference over a small set of objects.] Even with the name familiarization, all responses that indicated that the object was identified were considered correct. Thus, “pistol,” “revolver,” “gun,” and “handgun” were all acceptable as correct responses for the same object. Reaction times (RTs) were recorded by a microcomputer which also controlled the projectors and provided speed and accuracy feedback on the subject’s terminal after each trial. Objects were selected that required two, three, six, or nine components to look complete. There were 9 objects for each of these complexity levels, yielding a total set of 36 objects. The various combinations of the partial versions of these objects brought the total number of experimental trials (slides) to 99.
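The trial sequence just described can be written down as a small timeline. The sketch below is only illustrative: the event names are hypothetical, and it assumes each event begins at the instant the previous one ends, as the phrase "immediately followed" suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    duration_ms: int


# Durations taken from the procedure; the names are hypothetical labels.
TRIAL = (
    Event("fixation_corners", 500),
    Event("object_slide", 100),
    Event("pattern_mask", 500),
)

# Four complexity levels with nine objects each.
COMPLEXITY_LEVELS = (2, 3, 6, 9)
OBJECTS_PER_LEVEL = 9


def schedule(events):
    """Return (name, onset_ms, offset_ms) for back-to-back events."""
    t = 0
    rows = []
    for event in events:
        rows.append((event.name, t, t + event.duration_ms))
        t += event.duration_ms
    return rows


if __name__ == "__main__":
    for name, onset, offset in schedule(TRIAL):
        print(f"{name:17s} {onset:5d} to {offset:5d} msec")
    print("objects in set:", len(COMPLEXITY_LEVELS) * OBJECTS_PER_LEVEL)
```

Laid out this way, the object is visible only from 500 to 600 msec after the key press, with the mask occupying the following 500 msec.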
Each of 48 subjects viewed all the experimental slides, with balancing accomplished by varying the order of the slides.
4. Results
Figure 13 shows the mean error rates as a function of the number of components actually displayed on a given trial for the conditions in which no familiarization was provided. Each function is the mean for the nine objects at a given complexity level. Although each subject saw all 99 slides, only the data for the first time that a subject viewed a particular object will be discussed here. For a given level of complexity, increasing numbers of components resulted in better performance, but error rates were modest. When only three or four components for the complex objects (those with six or nine components to look complete) were present, subjects were almost 90% accurate (10% error rate). In general, the complete objects were named without error, so it is necessary to look at the RTs to see if differences emerge for the complexity variable.
Fig. 13. Mean percentage of error as a function of the number of components in the displayed object (abscissa) and the number of components required for the object to appear complete (parameter). Each point is the mean for nine objects on the first occasion when a subject saw that particular object. [Plot not reproduced. Abscissa: number of components presented; separate functions for objects requiring 2, 3, 6, or 9 components to look complete.]
Mean correct RTs, shown in Fig. 14, provide the same general outcome as the errors, except that there was a slight tendency for the more complex objects, when complete, to have shorter RTs than the simple objects. This advantage for the complex objects was actually underestimated in that the complex objects had longer names (three and four syllables) and were less familiar than the simple objects. Oldfield (1959) showed that object-naming RTs were longer for names that have more syllables or are infrequent. This effect of slightly shorter RTs for naming complex objects has been replicated, and it seems safe to conclude, conservatively, that complex objects do not require more time for their identification than simple objects. This result is contrary to serial contour tracing models of shape perception (e.g., Hochberg, 1978; Ullman, 1983). Such models would predict that complex objects would require more time to be seen as complete compared to simple objects, which have less contour to trace. The slight RT advantage enjoyed by the complex objects is an effect that would be expected if their additional components were affording a redundancy gain from more possible diagnostic matches to their representations in memory.
Fig. 14. Mean correct reaction time as a function of the number of components in the displayed object (abscissa) and the number of components required for the object to appear complete (parameter). Each point is the mean for nine objects on the first occasion when a subject saw that particular object. [Plot not reproduced. Abscissa: number of components presented; separate functions for objects requiring 2, 3, 6, or 9 components to look complete.]
B. LINE DRAWINGS VERSUS COLORED PHOTOGRAPHS
The components that are postulated to be the critical units for recognition can be depicted by a line drawing. Color and texture would be secondary routes for recognition. From this perspective, Biederman and Ju (1985) reasoned that naming RTs for objects shown as line drawings should closely approximate naming RTs for those objects when shown as colored photographic slides with complete detail, color, and texture. In the Biederman and Ju experiments, subjects identified brief presentations (50-100 msec) of slides of common objects. Each object was shown in two versions: professionally photographed in full color or as a simplified line drawing showing only the object’s major components (such as those in Fig. 10). Color and lightness were diagnostic for some of the objects (e.g., banana, fork, fish, camera), but not others (e.g., chair, pen, mitten, bicycle pump). In three experiments subjects named the object; in a fourth experiment a yes-no verification task was performed against a target name. Overall, performance levels with the two types of stimuli were equivalent: mean latencies in identifying images presented by color photography were 11 msec shorter than for the line drawings, but with a 3.9% higher error rate. An occasional advantage for the color slides was likely due to a more faithful rendition of the object’s components rather than any use of color for recognition: The advantage for the colored slides was independent of whether an object’s color was diagnostic of its identity. Moreover, there was no color diagnosticity advantage (much less an increased advantage) of the color slides on the verification task, where the color of the to-be-verified object could be anticipated. If color mediated recognition, then targets such as banana, when
shown as a color slide, should have enjoyed an increased advantage over their line-drawn versions compared to targets such as chair. This failure to find a color diagnosticity effect, when combined with the finding that simple line drawings can be identified so rapidly as to approach the naming speed of fully detailed, textured, colored photographic slides, supports the premise that the earliest access to a mental representation of an object can be modeled as a matching of a line drawing representation of a few simple components. Such componential descriptions are thus sufficient for primal access. Surface characteristics can be instrumental in defining edges and are powerful determinants of visual search, but may play only a secondary role in speeded recognition.
C. THE PERCEPTION OF DEGRADED OBJECTS
Evidence that a componential description may be necessary for object recognition (under conditions where contextual inference is not possible) derives from experiments on the perception of objects which have been degraded by deletion of their contour (Biederman & Blickle, 1985). RBC holds that parsing of an object into components is performed at regions of concavity. The nonaccidental relations of collinearity and curvilinearity allow filling in: They extend broken contours that are collinear or smoothly curvilinear. In concert, the two assumptions of (1) parsing at concavities and (2) filling in through collinearity or smooth curvature lead to a prediction as to what should be a particularly disruptive form of degradation: If contours were deleted at regions of concavity in such a manner that their endpoints, when extended through collinearity or curvilinearity, bridge the concavity, then the components would be lost and recognition should be impossible. The cup in the right column of the top row of Fig. 15 provides an example. The curve of the handle of the cup is drawn so that it is continuous with the curve of the cylinder forming the back rim of the cup. This form of degradation in which the components cannot be recovered from the input through the nonaccidental properties is referred to as nonrecoverable degradation and is illustrated for the objects in the right column of Fig. 15. An equivalent amount of deleted contour in a midsection of a curve or line should prove to be less disruptive as the components could then be restored through collinearity or curvature. In this case, the components should be recoverable. Examples of recoverable forms of degradation are shown in the middle column of Fig. 15.
In addition to the procedure for deleting and bridging concavities, two other applications of nonaccidental properties were employed to prevent determination of the components: (1) Vertices were altered by deleting one or two of their segments so that forks or Y’s were made into L’s or line segments, often producing
Fig. 15. Example of five stimulus objects in the experiment on the perception of degraded objects. The left column shows the original intact versions. The middle column shows the recoverable versions. The contours have been deleted in regions where they can be replaced through collinearity or smooth curvature. The right column shows the nonrecoverable versions. The contours have been deleted at regions of concavity so that collinearity or smooth curvature of the segments bridges the concavity. In addition, vertices have been altered (e.g., from Y’s to L’s) and misleading symmetry and parallelism introduced.
a simple planar surface, as illustrated in the stool in Fig. 15; and (2) misleading symmetry and parallelism were introduced, as in the spout of the watering can and the parallel edges of the surfaces among the rungs of the stool (Fig. 15). Even with these techniques, it was difficult to remove all the components, and some remained in nominally nonrecoverable versions, as with the handle of the scissors. Subjects viewed 35 objects in both recoverable and nonrecoverable versions. Prior to the experiment, all subjects were shown several examples of the various forms of degradation for several objects that were not used in the experiment. In addition, familiarization with the experimental objects was manipulated between subjects. Prior to the start of the experimental trials, different groups of six subjects (1) viewed a 3-second slide of the intact version of the objects, for example, the objects in the left column of Fig. 15, which they named, (2) were provided with the names of the objects on their terminal, or (3) were given no
familiarization. As in the prior experiment, the subject’s task was to name the objects. A glance at the second and third columns in Fig. 15 is sufficient to reveal that one doesn’t need an experiment to show that the nonrecoverable objects would be more difficult to identify than the recoverable versions. But we wanted to determine if the nonrecoverable versions would be identifiable at extremely long exposure durations (5 sec) and whether the prior exposure to the intact version of the object would overcome the effects of the contour deletion. The effects of contour deletion in the recoverable condition were also of considerable interest when compared to the comparable conditions from the partial object experiments.
1. Results
The error data are shown in Fig. 16. Identifiability of the nonrecoverable stimuli was virtually impossible: The median error rate for those slides was 100%. Subjects rarely guessed wrong objects in this condition. Almost always they merely said that they “don’t know.” In those few cases where a nonrecoverable object was identified, it was for those instances where some of the components were not removed, as with the circular rings of the handles of the scissors. Even at 5 sec, error rates for the nonrecoverable stimuli, especially in the name and no familiarization conditions, were extraordinarily high. (Data for the 5-sec exposure duration are not shown in Fig. 16.) Objects in the recoverable condition were named at high accuracy at the longer exposure durations. As in the previous experiments, there was no effect of familiarizing the subjects with the names of the objects compared to the condition in which the subjects were provided with no information about the objects. There was some benefit, however, of providing intact versions of the pictures of the objects. Even with this familiarity, performance in the nonrecoverable condition was extraordinarily poor, with error rates exceeding 60% when subjects had a full 5 sec for deciphering the stimulus. As noted previously, even this value underestimated the difficulty of identifying objects in the nonrecoverable condition in that identification was possible only when the contour deletion allowed some of the components to remain recoverable. The emphasis on the poor performance in the nonrecoverable condition should not obscure the extensive interference that was evident at the brief exposure durations in the recoverable condition. The previous experiments had established that intact objects, without picture familiarization, could be identified at near-perfect accuracy at 100 msec.
At this exposure duration, error rates for the recoverable stimuli in the present experiment, whose contours could be restored through collinearity and curvature, were approximately 65%. The high error rates at 100-msec exposure duration suggest that these filling in processes require both time (on the
Fig. 16. Mean percentage of errors in object naming as a function of exposure duration, nature of contour deletion (recoverable vs nonrecoverable components), and prefamiliarization (none, name, or picture). No differences were apparent between the none and name pretraining conditions, so they have been combined into one function. [Plot not reproduced. Abscissa: exposure duration (msec); separate functions for nonrecoverable and recoverable versions under name-or-none and picture familiarization.]
order of 200 msec) and an image (not merely a memory representation) to be successfully executed. The dependence of componential recovery on the availability of contour and time was explored parametrically by Biederman and Blickle (1985). To produce the nonrecoverable versions of the objects, it was necessary to delete or modify the vertices. The recoverable versions of the objects tended to have their contours deleted in midsegment. It is possible that some of the interference in the nonrecoverable condition was a consequence of the removal of vertices rather than the production of inappropriate components. The experiment also compared these two loci (vertex or midsegment) as sites of contour deletion. Contour
deletion was performed either at the vertices or at midsegments for 18 objects, but without the accidental bridging of components through collinearity or curvature that was characteristic of the nonrecoverable condition. The percentage of contour removed was also varied with values of 25, 45, and 65% removal, and the objects were shown for 100, 200, or 750 msec. Other aspects of the procedure were identical to the previous experiments, with only name familiarization provided. Figure 17 shows an example for a single object. The mean percentages of errors are shown in Fig. 18. At the briefest exposure duration and the most contour deletion (100-msec exposure duration and 65% contour deletion), removal of the vertices resulted in considerably higher error rates than the midsegment removal, 54 and 31% errors, respectively. With less contour deletion or longer exposures, the locus of the contour deletion had only a slight effect on naming accuracy. Both types of loci showed a consistent improvement with longer exposure durations, with error rates below 10% at the 750-msec duration. By contrast, the error rates in the nonrecoverable condition in the prior experiment exceeded 75%, even after 5 sec. We conclude that the filling in of contours, whether at midsegment or vertex, is a process that can be completed within 1 sec. But the suggestion of a misleading component through collinearity or curvature that bridges a concavity produces an image that cannot index the original object, no matter how much time there is to view the image.
Fig. 17. Illustration for a single object of 25, 45, and 65% contour removal centered at either midsegment or vertex. [Figure not reproduced. Panels: locus of deletion (at midsegment, at vertex) by proportion of contour deleted.]
Fig. 18. Mean percentage of object-naming errors as a function of locus of contour removal (midsegment or vertex), percentage of removal, and exposure duration.
Fig. 19. Mean correct object-naming reaction time (milliseconds) as a function of locus of contour removal (midsegment or vertex), percentage of removal, and exposure duration.
Although accuracy was less affected by the locus of the contour deletion at the longer exposure durations and the lower deletion proportions, there was a consistent advantage on naming latencies of the midsegment removal, as shown in Fig. 19. (The lack of an effect at the 100-msec exposure duration with 65% deletion is likely a consequence of the high error rates for the vertex deletion stimuli.) This result shows that if contours are deleted at a vertex, they can be restored as long as there is no accidental filling in, but the restoration will require more time than when the deletion is at midsegment. Overall, both the error and RT data document a striking dependence of object identification on what RBC assumes to be a prior and necessary stage of componential determination.
2. Perceiving Degraded versus Partial Objects
Consider Fig. 20, which shows for some sample objects one version in which whole components are deleted so that only three (of six or nine) of the components are present and another version in which the same amount of contour is removed, but in midsegment, distributed over all of the object’s components.
Fig. 20. Sample stimuli with equivalent proportion of contours removed either at midsegments or as whole components.
With objects with whole components deleted, it is unlikely that the missing components are added imaginally prior to recognition. Logically, one would have to know what object was being recognized to know what parts to add. Instead, indexing (addressing) a representation most likely proceeds in the absence of the parts. The two methods for removing contour may thus be affecting different stages. Deleting contour in midsegment affects processes prior to and including those involved in the determination of the components (Fig. 3). The removal of whole components (the partial object procedure) is assumed to affect the matching stage, reducing the number of common components between the image and the representation and increasing the number of distinctive components in the representation. Contour filling in is typically regarded as a fast, low-level process. We (Biederman, Beiring, Ju, & Blickle, 1985) studied the naming speed and accuracy of six- and nine-component objects undergoing these two types of contour deletion. At brief exposure durations (e.g., 65 msec), performance with partial objects was better than with objects that had the same amount of contour removed in midsegment (Figs. 21 and 22). At longer exposure durations (200 msec), the RTs reversed, with the midsegment deletion now faster than the partial objects. Our interpretation of this result is that although a diagnostic subset of a few components (a partial object) can provide a sufficient input for recognition, the activation of that representation (or its elicitation of a name) is not optimal compared to a complete object. Thus, in the partial object experiment described previously, recognition RTs were shortened with the addition of components to an already recognizable object. If all of an object’s components were degraded (but recoverable), recognition would be delayed until contour restoration was completed. Once the filling in was completed and the complete complement of an object’s components was available, a better match to the object’s representation (or the elicitation of its name) would be possible than with a partial object that had only a few of its components. We are currently attempting to formally model this result. More generally, the finding that partial complex objects, with only three of their six or nine components present, can be recognized more readily than objects whose contours can be restored through filling in documents the efficiency of a few components for accessing a representation.
Fig. 21. Mean percentage of errors of object naming as a function of the nature of contour removal (deletion of midsegments or components) and exposure duration.
Fig. 22. Mean correct reaction time (milliseconds) in object naming as a function of the nature of contour removal (deletion at midsegments or components) and exposure duration.
3. Contour Deletion by Occlusion
The degraded recoverable objects in the right columns of Fig. 15 have the appearance of flat drawings of objects with interrupted contours. Biederman and
Blickle (1985) designed a demonstration of the dependence of object recognition on componential identification by aligning an occluding surface so that it appeared to produce the deletions. If the components were responsible for an identifiable volumetric representation of the object, we would expect that with the recoverable stimuli, the object would complete itself under the occluding surface and assume a three-dimensional character. This effect should not occur in the nonrecoverable condition. This expectation was met, as shown in Figs. 23 and 24. These stimuli also provide a demonstration of the time (and effort?) requirements for contour restoration through collinearity or curvature. We have not yet obtained objective data on this effect, which may be complicated by masking effects from the presence of the occluding surface, but we invite the reader to share our subjective impressions. When looking at a nonrecoverable version of an object in Fig. 23, no object becomes apparent. In the recoverable version in Fig. 24, an object does pop into a three-dimensional appearance, but most observers report a delay (our own estimate is ~500 msec) from the moment the stimulus is first fixated to when it appears as an identifiable three-dimensional entity. This demonstration of the effects of an occluding surface to produce contour interruption also provides a control for the possibility that the difficulty in the nonrecoverable condition was a consequence of inappropriate figure-ground groupings, as with the stool in Fig. 15. With the stool, the ground that was
Fig. 23. Nonrecoverable version of an object where the contour deletion is produced by an occluding surface.
Fig. 24. Recoverable version of an object where the contour deletion is produced by an occluding surface. The object is the same as that shown in Fig. 23. The reader may note that the three-dimensional percept in this figure does not occur instantaneously.
apparent through the rungs of the stool became figure in the nonrecoverable condition. (In general, however, only a few of the objects had holes in them where this could have been a factor.) This would not necessarily invalidate the RBC hypothesis, but merely would complicate the interpretation of the effects of the nonrecoverable noise in that some of the effect would derive from inappropriate grouping of contours into components and some of the effect would derive from inappropriate figure-ground grouping. That the objects in the nonrecoverable condition remain unidentifiable when the contour interruption is attributable to an occluding surface suggests that figure-ground grouping cannot be the primary cause of the interference from the nonrecoverable deletions.
D. SUMMARY AND IMPLICATIONS OF THE EXPERIMENTAL RESULTS
The sufficiency of a component representation for primal access to the mental representation of an object was supported by two results: (1) partial objects with two or three components could be readily identified under brief exposures, and (2) identification performance was comparable for line drawings and color photography. The experiments with degraded stimuli established that the components are necessary for object perception. These results suggest an underlying principle by which objects are identified.
XI. Componential Recovery Principle
The results and phenomena associated with the effects of degradation and partial objects can be understood as the workings of a single principle of componential recovery: If the components in their specified arrangement can be readily identified, object identification will be fast and accurate. In addition to those aspects of object perception for which experimental research was described previously, the principle of componential recovery might encompass at least four additional phenomena in object perception: (1) objects can be more readily recognized from some orientations than others (orientation variability); (2) objects can be recognized from orientations not previously experienced (object transfer); (3) articulated (or deformable) objects, with variable componential arrangements, can be recognized even when the specific configuration might not have been experienced previously (deformable object invariance); and (4) novel instances of a category can be rapidly classified (perceptual basis of basic level categories).
A. ORIENTATION VARIABILITY
Objects can be more readily identified from some orientations compared to other orientations (Palmer, Rosch, & Chase, 1981). According to the RBC hypothesis, difficult views will be those in which the components extracted from the image are not the components (and their relations) in the representation of the object. Often such mismatches will arise from an “accident” of viewpoint where an image property is not correlated with the property in the three-dimensional world. For example, when the viewpoint in the image is parallel to the major components of the object, the resultant foreshortening converts one or some of the components into surface components, such as disks and rectangles in Fig. 25, which are not included in the componential description of the object. In addition, as illustrated in Fig. 25, the surfaces may occlude otherwise diagnostic components. Consequently, the components extracted from the image will not readily match the mental representation of the object, and identification will be much more difficult compared to an orientation, such as that shown in Fig. 26, which does convey the components. A second condition under which viewpoint affects identifiability of a specific object arises when the orientation is simply unfamiliar, as when a sofa is viewed from below, or when the top-bottom relations among the components are perturbed, as when a normally upright object is inverted. Palmer et al. (1981) conducted an extensive study of the perceptibility of various objects when presented at a number of different orientations. Generally, a three-quarters front view was most effective for recognition. Their subjects showed a clear preference for such views. Palmer et al. termed this effective and
Fig. 25. A viewpoint parallel to the axes of the major components of a common object.
preferred orientation of the object its canonical orientation. The canonical orientation would be, from the perspective of RBC, a special case of the orientation that would maximize the match of the components in the image to the representation of the object. An apparent exception to the preference for a three-quarters frontal view was the finding of Palmer et al. (1981) that frontal (facial) views enjoyed some favor in viewing animals. But there is evidence that routines for processing faces have evolved to differentially respond to cuteness (Hildebrandt, 1982; Hildebrandt & Fitzgerald, 1983), age (e.g., Mark & Todd, 1985), and emotion and threats (e.g., Coss, 1979; Trivers, 1985). Faces may thus constitute a special stimulus case in that specific mechanisms have evolved to respond to biologically relevant quantitative variations, and caution may be in order before results with face stimuli are considered as characteristic of the perception of objects in general.
Fig. 26. The same object as in Fig. 25, but with a viewpoint not parallel to the major components.
B. TRANSFER BETWEEN DIFFERENT VIEWPOINTS
When an object is seen at one viewpoint or orientation, it can often be recognized as the same object when subsequently seen at some other viewpoint, even though there can be extensive differences in the retinal projections of the two views. The componential recovery principle would hold that transfer between two viewpoints would be a function of the componential similarity between the views. This could be experimentally tested through priming studies, with the degree of priming predicted to be a function of the similarity (viz., common minus distinctive components) of the two views. If two different views of an object contained the same components, RBC would predict that aside from effects attributable to variations in aspect ratio, there should be as much priming as when the object was presented at an identical view. An alternative possibility to componential recovery is that a presented object would be mentally rotated (Shepard & Metzler, 1971) to correspond to the original representation. But mental rotation rates appear to be too slow and effortful to account for the ease and speed with which transfer occurs between different orientations.
There may be a restriction on whether a similarity function for priming effects will be observed. Although unfamiliar objects (or nonsense objects) should reveal a componential similarity effect, the recognition of a familiar object, whatever its orientation, may be too rapid to allow an appreciable experimental priming effect. Such objects may have a representation for each orientation that provided a different componential description. Bartram’s (1974) results support this expectation that priming effects might not be found across different views of familiar objects.
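The "common minus distinctive components" measure mentioned above can be illustrated with a small sketch. It is only a toy illustration under stated assumptions: the component labels, the equal default weights, and the function name `componential_similarity` are hypothetical and do not come from the chapter, and the sketch ignores the relations among components and aspect ratio, both of which RBC treats as relevant.

```python
def componential_similarity(view_a, view_b, w_common=1.0, w_distinct=1.0):
    """Weighted count of common components minus distinctive components.

    Each view is a collection of component labels (hypothetical names).
    The weights are illustrative assumptions, not values from the text.
    """
    a, b = set(view_a), set(view_b)
    common = len(a & b)       # components present in both views
    distinctive = len(a ^ b)  # components present in only one view
    return w_common * common - w_distinct * distinctive


# Hypothetical component descriptions of three views of one object.
front = {"brick", "cylinder", "wedge"}
three_quarter = {"brick", "cylinder", "wedge"}
end_on = {"brick", "disk"}  # foreshortening replaces components with a surface

print(componential_similarity(front, three_quarter))  # 3.0
print(componential_similarity(front, end_on))         # -2.0
```

On this reading, two views that yield the same components would be predicted to prime each other as much as identical views, whereas a foreshortened view that yields different components would be predicted to prime less.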
Bartram performed a series of studies in which subjects named 20 pictures of objects over eight blocks of trials. [In another experiment, Bartram (1976) reported essentially the same results from a same-different name-matching task in which pairs of pictures were presented.] In the identical condition, the pictures were identical across the trial blocks. In the different view condition, the same objects were depicted from one block to the next, but in different orientations. In the different exemplar condition, different exemplars, for example, different instances of a chair, were presented, all of which required the same response. Bartram found that the naming RTs for the identical and different view conditions were equivalent, and both were shorter than control conditions, described below, for concept and response priming effects. Bartram theorized that observers automatically compute and access all possible three-dimensional viewpoints when viewing a given object. Alternatively, it is possible that there was high componential similarity across the different views, and the experiment was
insufficiently sensitive to detect slight differences from one viewpoint to another. However, in four experiments with colored slides, we (Biederman & Lloyd, 1985) failed to obtain any effect of variation in viewing angle and have thus replicated Bartram’s basic effect (or lack of an effect). At this point, our inclination is to agree with Bartram’s interpretation, with somewhat different language, but restrict its scope to familiar objects. It should be noted that both Bartram’s and our results are inconsistent with a model that assigned heavy weight to the aspect ratio of the image of the object or postulated an underlying mental rotation function.
C. DIFFERENT EXEMPLARS WITHIN AN OBJECT CLASS
Just as we might be able to gauge the transfer between two different views of the same object based on a componentially based similarity metric, we might be able to predict transfer between different exemplars of a common object, such as two different instances of a lamp or chair. Bartram (1974) also included a different exemplar condition in which different objects with the same name (e.g., different cars) were depicted from block to block. Under the assumption that different exemplars would be less likely to have common components, RBC would predict that this condition would be slower than the identical and different view conditions, but faster than a different object control condition with a new set of objects that required different names for every trial block. This was confirmed by Bartram. For both different views of the same object as well as different exemplars (subordinates) within a basic level category, RBC predicts that transfer would be based on the overlap in the components between the two views. The strong prediction would be that the same similarity function that predicted transfer between different orientations of the same object would also predict the transfer between different exemplars with the same name.
D. THE PERCEPTUAL BASIS OF BASIC LEVEL CATEGORIES
Consideration of the similarity relations among different exemplars with the same name raises the issue as to whether objects are most readily identified at a basic as opposed to a subordinate or superordinate level of description. The componential representations described here are representations of specific subordinate objects, though their identification was always measured with a basic level name. Much of the research suggesting that objects are recognized at a basic level has used stimuli, often natural, in which the subordinate level had the same componential description as the basic level objects.
Only small componential differences or color or texture distinguished the subordinate level objects.
Thus, distinguishing Asian elephants from African elephants or Buicks from Oldsmobiles requires fine discriminations for their verification. It is not at all surprising that with these cases basic level identification would be most rapid. On the other hand, many human-made categories, such as lamps, or some natural categories, such as dogs (which have been bred by humans), have members that have componential descriptions that differ considerably from one exemplar to another, as with a pole lamp versus a ginger jar table lamp, for example. The same is true of objects that are different from a prototype, such as penguins or sports cars. With such instances, which unconfound the similarity between basic level and subordinate level objects, perceptual access should be at the subordinate (or instance) level, a result supported by a recent report by Jolicoeur, Gluck, and Kosslyn (1984). It takes but a modest extension of the componential recovery principle to problems of the similarity of objects. Simply put, similar objects will be those that have a high degree of overlap in their components and in the relations among these components. A similarity measure reflecting common and distinctive components (Tversky, 1977) may be adequate for describing the similarity between a pair of objects or between a given instance and its stored or expected representation, whatever their basic or subordinate level designation.
E. THE PERCEPTION OF NONRIGID OBJECTS
Many objects and creatures, such as people and telephones, have articulated joints that allow extension, rotation, and even separation of their components. There are two ways in which such objects can be accommodated by RBC. One possibility is that independent structural descriptions are necessary for each sizable alteration in the arrangement of an object’s components. For example, it may be necessary to establish a different structural description for Fig. 27a than for Fig. 27d.
If this were the case, then a priming paradigm might not reveal any priming between the two stimuli. Another possibility is that the relations among the components can include a range of possible values (Marr & Nishihara, 1978). In the limit, with a relation that allowed complete freedom for movement, the relation might simply be joined. Even that might be relaxed in the case of objects with separable parts, as with the handset and base of a telephone. In that case, it might be either that the relation is nearby, or else different structural descriptions are necessary for attached and separable configurations. Empirical research needs to be done to determine if less restrictive relations, such as join or nearby, have measurable perceptual consequences. It may be the case that the less restrictive the relation, the more difficult the identifiability of the object. Just as there appear to be canonical views of rigid objects (Palmer et al., 1981), there may be a canonical “configuration” for a nonrigid object. Thus, Fig. 27d might be identified as a woman more slowly than Fig. 27a.
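The second possibility, a relation that admits a range of values, can be pictured with a small data-structure sketch. Everything in it is a hypothetical illustration rather than part of RBC as stated: the part names, the use of a single joint angle in degrees, and the names `Relation` and `matches` are assumptions made only to show how a permissive relation accepts configurations that a restrictive one rejects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation between two components that tolerates a range of angles."""
    part_a: str
    part_b: str
    min_angle: float  # degrees, illustrative unit
    max_angle: float

    def allows(self, angle):
        return self.min_angle <= angle <= self.max_angle


def matches(description, observed):
    """True if every stored relation is satisfied by the observed angles."""
    for rel in description:
        angle = observed.get((rel.part_a, rel.part_b))
        if angle is None or not rel.allows(angle):
            return False
    return True


# A restrictive description versus one that allows free articulation.
rigid = [Relation("upper_arm", "forearm", 170.0, 180.0)]
articulated = [Relation("upper_arm", "forearm", 30.0, 180.0)]

bent_pose = {("upper_arm", "forearm"): 90.0}
print(matches(rigid, bent_pose))        # False
print(matches(articulated, bent_pose))  # True
```

Under the first possibility in the text, the bent pose would instead need its own separate structural description; the sketch shows only the alternative in which one description with a wide range covers both poses.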
Fig. 27. Four configurations of a nonrigid object.
XII. Conclusion
To return to the analogy with speech perception made in Section II, the characterization of object perception that RBC provides bears close resemblance to many modern views of speech perception. In both cases, one has a modest set of primitives: in speech, the 55 or so phonemes that are sufficient to represent almost all words of all the languages on earth; in object perception, perhaps, a limited number of simple components. The ease with which we are able to code tens of thousands of words or objects may derive less from a capacity for making exceedingly fine physical discriminations than from allowing a free combination of a modest number of categorized primitives.
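A rough count shows how quickly free combination of a small set of primitives outruns "tens of thousands" of items. The sketch below is a simplified illustration under assumptions: it counts ordered sequences with repetition, ignores phonotactic constraints and the relations among components, and takes the figure of 36 components from the title of Section VII of this chapter. The function name is hypothetical.

```python
def ordered_combinations(n_primitives, length):
    """Number of ordered sequences of a given length, repetition allowed."""
    return n_primitives ** length


# 55 phonemes (from the text) and 36 components (Section VII title).
for k in (1, 2, 3):
    print(k, ordered_combinations(55, k), ordered_combinations(36, k))
# 1 55 36
# 2 3025 1296
# 3 166375 46656
```

Even at three primitives per item, both counts already reach the tens of thousands or more, before any variation in the relations among components is considered.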
ACKNOWLEDGMENTS
This research was supported by the Air Force Office of Scientific Research (Grant F4962083C0086). I would like to express my deep appreciation to Tom Blickle and Ginny Ju for their invaluable contributions to all phases of the empirical research described in this article. Thanks are also due to Mary Lloyd, John Clapper, Elizabeth Beiring, and Robert Bennett for their assistance in the conduct of the experimental research. Aspects of the manuscript profited through discussions with James R. Pomerantz, John Artim, and Brian Fisher.
REFERENCES
Attneave, F. (1983). Prägnanz and soap bubble systems: A theoretical exploration. In J. Beck, B. Hope, & A. Rosenfeld (Eds.), Human and machine vision. New York: Academic Press.
Ballard, D., & Brown, C. M. (1982). Computer vision. Englewood Cliffs, NJ: Prentice-Hall.
Barrow, H. G., & Tenenbaum, J. M. (1981). Interpreting line drawings as three-dimensional surfaces. Artificial Intelligence, 17, 75-116.
Bartlett, F. C. (1932). Remembering. Cambridge: Cambridge University Press.
Bartram, D. (1974). The role of visual and semantic codes in object naming. Cognitive Psychology, 6, 325-356.
Bartram, D. (1976). Levels of coding in picture-picture comparison tasks. Memory and Cognition, 4, 593-602.
Beck, J., Prazdny, K., & Rosenfeld, A. (1983). A theory of textural segmentation. In J. Beck, B. Hope, & A. Rosenfeld (Eds.), Human and machine vision. New York: Academic Press.
Biederman, I. (1981). On the semantics of a glance at a scene. In M. Kubovy & J. R. Pomerantz (Eds.), Perceptual organization. Hillsdale, NJ: Erlbaum.
Biederman, I., Beiring, E., Ju, G., & Blickle, T. (1985). A comparison of the perception of partial vs degraded objects. Unpublished manuscript. State University of New York at Buffalo.
Biederman, I., & Blickle, T. (1985). The perception of objects with deleted contours. Unpublished manuscript. State University of New York at Buffalo.
Biederman, I., & Ju, G. (1985). The perceptual recognition of objects depicted by line drawings and color photography. Unpublished manuscript. State University of New York at Buffalo.
Biederman, I., Ju, G., & Clapper, J. (1985). The perception of partial objects. Unpublished manuscript. State University of New York at Buffalo.
Biederman, I., & Lloyd, M. (1985). Experimental studies of transfer across different object views and exemplars. Unpublished manuscript. State University of New York at Buffalo.
Binford, T. O. (1971). Visual perception by computer. IEEE Systems Science and Cybernetics Conference, Miami, December.
Binford, T. O. (1981). Inferring surfaces from images. Artificial Intelligence, 17, 205-244.
Brady, M. (1983). Criteria for the representations of shape. In J. Beck, B. Hope, & A. Rosenfeld (Eds.), Human and machine vision. New York: Academic Press.
Brady, M., & Asada, H. (1984). Smoothed local symmetries and their implementation. International Journal of Robotics Research, 3, 3.
Brooks, R. A. (1981). Symbolic reasoning among 3-D models and 2-D images. Artificial Intelligence, 17, 205-244.
Carey, S. (1978). The child as word learner. In M. Halle, J. Bresnan, & G. A. Miller (Eds.), Linguistic theory and psychological reality. Cambridge, MA: MIT Press.
Cezanne, P. (1904/1941). Letter to Emile Bernard. In J. Rewald (Ed.), Paul Cezanne’s letters (M. Kay, Trans.). London: B. Cassirer.
Chakravarty, I. (1979). A generalized line and junction labeling scheme with applications to scene analysis. IEEE Transactions, PAMI, April, 202-205.
Checkosky, S. D., & Whitlock, D. (1973). Effects of pattern goodness on recognition time in a memory search task. Journal of Experimental Psychology, 100, 341-348.
Connell, J. H. (1985). Learning shape descriptions: Generating and generalizing models of visual objects. Unpublished master’s thesis, Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA.
Coss, R. G. (1979). Delayed plasticity of an instinct: Recognition and avoidance of 2 facing eyes by the jewel fish. Developmental Psychobiology, 12, 335-345.
Egeth, H., & Pachella, R. (1969). Multidimensional stimulus identification. Perception and Psychophysics, 5, 341-346.
Fildes, B. N., & Triggs, T. J. (1985). The effect of changes in curve geometry on magnitude estimates of road-like perspective curvature. Perception and Psychophysics, 37, 218-224.
Garner, W. R. (1962). Uncertainty and structure as psychological concepts. New York: Wiley.
Garner, W. R. (1974). The processing of information and structure. New York: Wiley.
Guzman, A. (1971). Analysis of curved line drawings using context and global information. Machine intelligence (Vol. 6). Edinburgh: Edinburgh Univ. Press.
Hildebrandt, K. A. (1982). The role of physical appearance in infant and child development. In H. E. Fitzgerald, E. Lester, & M. Youngman (Eds.), Theory and research in behavioral pediatrics (Vol. 1). New York: Plenum.
Hildebrandt, K. A., & Fitzgerald, H. E. (1983). The infant’s physical attractiveness: Its effect on bonding and attachment. Infant Mental Health Journal, 4, 3-12.
Hochberg, J. E. (1978). Perception (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Hoffman, D. D., & Richards, W. (1985). Parts of recognition. Cognition, 18, 65-96.
Humphreys, G. W. (1983). Reference frames and shape perception. Cognitive Psychology, 15, 151-196.
Ittelson, W. H. (1952). The Ames demonstrations in perception. New York: Hafner.
Jolicoeur, P., Gluck, M. A., & Kosslyn, S. M. (1984). Picture and names: Making the connection. Cognitive Psychology, 16, 243-275.
Ju, G., Biederman, I., & Clapper, J. (1985, April). Recognition-by-components: A theory of image interpretation. Paper presented at the meetings of the Eastern Psychological Association, Boston, MA.
Julesz, B. (1981). Textons, the elements of texture perception, and their interaction. Nature (London), 290, 91-97.
Kanade, T. (1981). Recovery of the three-dimensional shape of an object from a single view. Artificial Intelligence, 17, 409-460.
King, M., Meyer, G. E., Tangney, J., & Biederman, I. (1976). Shape constancy and a perceptual bias towards symmetry. Perception and Psychophysics, 19, 129-136.
Kroll, J. F., & Potter, M. C. (1984). Recognizing words, pictures, and concepts: A comparison of lexical, object, and reality decisions. Journal of Verbal Learning and Verbal Behavior, 23, 39-66.
Lowe, D. (1984). Perceptual organization and visual recognition. Unpublished doctoral dissertation, Department of Computer Science, Stanford University, Stanford, CA.
Mark, L. S., & Todd, J. T. (1985). Describing perception information about human growth in terms of geometric invariants. Perception and Psychophysics, 37, 249-256.
Marr, D. (1977). Analysis of occluding contour. Proceedings of the Royal Society of London B, 197, 441-475.
Marr, D. (1982). Vision. San Francisco: Freeman.
Marr, D., & Nishihara, H. K. (1978). Representation and recognition of three-dimensional shapes. Proceedings of the Royal Society of London B, 200, 269-294.
Marslen-Wilson, W. (1980). Optimal efficiency in human speech processing. Unpublished manuscript, Max Planck Institut für Psycholinguistik, Nijmegen, The Netherlands.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception. Part I: An account of basic findings. Psychological Review, 88, 375-407.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.
Miller, G. A. (1977). Spontaneous apprentices: Children and language. New York: Seabury.
Neisser, U. (1963). Decision time without reaction time: Experiments in visual scanning. American Journal of Psychology, 76, 376-385.
Neisser, U. (1967). Cognitive psychology. New York: Appleton.
Oldfield, R. C. (1966). Things, words, and the brain. Quarterly Journal of Experimental Psychology, 18, 340-353.
Oldfield, R. C., & Wingfield, A. (1965). Response latencies in naming objects. Quarterly Journal of Experimental Psychology, 17, 273-281.
Palmer, S. E. (1980). What makes triangles point: Local and global effects in configurations of ambiguous triangles. Cognitive Psychology, 12, 285-305.
Palmer, S., Rosch, E., & Chase, P. (1981). Canonical perspective and the perception of objects. In J. Long & A. Baddeley (Eds.), Attention and performance (Vol. 9). Hillsdale, NJ: Erlbaum.
Penrose, L. S., & Penrose, R. (1958). Impossible objects: A special type of illusion. British Journal of Psychology, 49, 31-33.
Perkins, D. N. (1983). Why the human perceiver is a bad machine. In J. Beck, B. Hope, & A. Rosenfeld (Eds.), Human and machine vision. New York: Academic Press.
Perkins, D. N., & Deregowski, J. (1982). A cross-cultural comparison of the use of a Gestalt perceptual strategy. Perception, 11, 279-286.
Pomerantz, J. R. (1978). Pattern and speed of encoding. Memory and Cognition, 5, 235-241.
Pomerantz, J. R., Sager, L. C., & Stoever, R. J. (1977). Perception of wholes and their component parts: Some configural superiority effects. Journal of Experimental Psychology: Human Perception and Performance, 3, 422-435.
Rock, I. (1984). Perception. New York: Freeman.
Rosch, E., Mervis, C. B., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.
Rosenthal, S. (1984). The PF474. Byte, 9, 247-256.
Ryan, T., & Schwartz, C. (1956). Speed of perception as a function of mode of representation. American Journal of Psychology, 69, 60-69.
Shepard, R. N., & Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171, 701-703.
Sugihara, K. (1984). An algebraic approach to shape-from-image problems. Artificial Intelligence, 23, 59-95.
Treisman, A. (1982). Perceptual grouping and attention in the visual search for features and for objects. Journal of Experimental Psychology: Human Perception and Performance, 8, 194-214.
Treisman, A. M., & Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14, 107-141.
Treisman, A., & Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97-136.
Trivers, R. (1985). Social evolution. Menlo Park: Benjamin/Cummings.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.
Tversky, B., & Hemenway, K. (1984). Objects, parts, and categories. Journal of Experimental Psychology: General, 113, 169-193.
Ullman, S. (1983). Visual routines. Artificial Intelligence Laboratory, Memo No. 723, MIT, Cambridge, MA.
Virsu, V. (1971a). Tendencies to eye movements and misperception of curvature, direction, and length. Perception and Psychophysics, 9, 65-72.
Virsu, V. (1971b). Underestimation of curvature and task dependence in visual perception of form. Perception and Psychophysics, 9, 339-342.
Waltz, D. (1975). Understanding line drawings of scenes with shadows. In P. H. Winston (Ed.), The psychology of computer vision. New York: McGraw-Hill.
Winston, P. H. (1975). Learning structural descriptions from examples. In P. H. Winston (Ed.), The psychology of computer vision. New York: McGraw-Hill.
Witkin, A. P., & Tenenbaum, J. M. (1983). On the role of structure in vision. In J. Beck, B. Hope, & A. Rosenfeld (Eds.), Human and machine vision. New York: Academic Press.
Woodworth, R. S. (1938). Experimental psychology. New York: Holt.
ASSOCIATIVE STRUCTURES IN INSTRUMENTAL LEARNING

Ruth M. Colwill and Robert A. Rescorla
DEPARTMENT OF PSYCHOLOGY
UNIVERSITY OF PENNSYLVANIA
PHILADELPHIA, PENNSYLVANIA 19104
I. Introduction
In instrumental learning, the likelihood of behavior changes as a result of its consequences. This learning process has been a major focus of experimental psychology. Many naturally occurring instances of learning seem to fit this paradigm, and substantial energy has gone into its analysis in the laboratory. The intention of this article is to consider the nature of the associative mechanisms involved in a particular sort of instrumental learning, that in which an animal’s action produces a positive outcome.

It is common to acknowledge three major elements in any instrumental learning situation: a response that changes in probability, a reinforcer that is contingent upon that response, and a stimulus in the presence of which that contingency takes place. In the typical case, repeated exposure to the instrumental contingency results in an increased likelihood of the response occurring in the presence of the stimulus. For example, one commonly studied instance involves rat subjects in operant chambers. In that case, making food contingent upon lever pressing produces enhanced lever pressing in the chamber. Theories attempting to explain such changes in instrumental behavior have typically appealed to simple associative mechanisms, but they have differed in the selection of elements between which associations are assumed to form. Three different associative structures have dominated theoretical discussions.

1. The possibility that appealed to many early psychologists is that an association is formed between the response and the stimulus in the presence of which the response is reinforced (Guthrie, 1952; Hull, 1943). The assumed growth of an association between some antecedent stimulus (S) and the response (R) seemed to account most naturally for the observation that the response becomes more likely during the stimulus. In this S-R theory, the role of the contingent event is literally to reinforce this S-R association.
The reinforcer does not itself become
part of the associative structure; it simply serves as a kind of catalyst facilitating the formation of an association between two other events, the response and the antecedent stimulus. For many early writers there was an obvious parallel to evolutionary theory: The reinforcer was seen as the analog to natural selection, sampling successful S-R contiguities from the array that occurred whenever the animal behaved during the stimulus. One particularly appealing feature of such a mechanism was its ability to generate behavior that appeared to be purposive or goal directed without actually involving any encoding of the goal itself. This view of instrumental learning so dominated thinking during the 1940s and 1950s that discussion turned from examination of the nature of the underlying associative structure to exploration of the properties that an event needed in order to be a reinforcer (e.g., Premack, 1965; Sheffield, 1966).

2. Many authors, however, have felt that this simple S-R alternative fails to capture the richness of an animal’s knowledge after instrumental training. Various kinds of evidence (some of which we review here) indicate that the animal has more knowledge of the reinforcer than is allowed by this S-R position. Several authors have suggested that this evidence could be accommodated by acknowledging a second association, that between the antecedent stimulus and the reinforcer. Many have argued that instrumental learning situations contain within them the conditions necessary for Pavlovian conditioning: When a response is reinforced during a stimulus, that stimulus is also explicitly paired with the reinforcer. According to two-process accounts, this Pavlovian S-reinforcer association occurs in parallel with the instrumental S-R association and provides the means for encoding information about the reinforcer.
Some theorists (e.g., Rescorla & Solomon, 1967; Spence, 1956) give this Pavlovian association motivational properties, whereas others (e.g., Trapold & Overmier, 1972) see it primarily as playing a mediational role in which feedback from the Pavlovian response provides an additional source of stimulus support for the instrumental response. But an important consequence of both versions of two-process theories is that the instrumental reinforcer plays two roles: a catalyst for the S-R association and an associate for the S. Although the reinforcer is not represented as part of the fundamental instrumental association, it is encoded as part of a parallel Pavlovian association that forms in the course of instrumental learning.

3. Recently, it has become increasingly popular to view instrumental learning in a way that has somewhat more intuitive appeal: as an association between the response and the reinforcer (Bolles, 1972; Mackintosh & Dickinson, 1979; Tolman, 1933). According to this view, which represents a return to the earlier ideas of Konorski and Miller (1937), the organism learns the very relationship that the experimenter most carefully arranges, that the response produces the reinforcer. The animal directly encodes the goal as associated with the response. An especially attractive feature of this interpretation is that it may allow application of much of the theoretical power that has been developed for the explanation
of Pavlovian conditioning. A response-reinforcer view of instrumental learning parallels the widely held stimulus-reinforcer account of Pavlovian conditioning. It might then be possible to go some distance toward an understanding of the associations underlying instrumental learning by applying the rules uncovered for Pavlovian conditioning.

In the discussion that follows, we present some recent evidence from our laboratory relevant to evaluating these possibilities. The structure of that discussion is as follows: First, we consider evidence suggesting that the organism forms response-reinforcer associations, that is, that the reinforcer plays a role beyond that of a catalyst by entering into associations with antecedent responses. We describe in detail two sorts of data recently collected in our laboratory and briefly review several other types of historically important evidence. Second, we consider the problem of separating the response-reinforcer view from a two-process alternative. Many of the data described in Section II clearly demonstrate that the organism learns about the reinforcer; but they are less clear in deciding whether the reinforcer is encoded in terms of a response-reinforcer or stimulus-reinforcer association. Section III discusses these alternatives and describes some data favoring the response-reinforcer view. Third, we consider the role that the stimulus might play in an account of instrumental learning that rests primarily on a response-reinforcer association.

Throughout the discussion we emphasize the techniques and logic that allow analysis of the associative structure of instrumental learning as much as the answers that these techniques yield in the particular cases that we have studied. These techniques are primarily ones that were originally developed for the study of associative structures in Pavlovian conditioning but turn out to have considerable power and generality for studying various instances of associative learning.

II. Evidence for Response-Reinforcer Associations
In this section we describe in detail two procedures for identifying response-reinforcer associations. Both are derived from parallel procedures that have been quite successful in analyzing the structure of Pavlovian conditioning, and both yield clear evidence for response-reinforcer associations. We then briefly review the results of several other techniques that have been used to identify response-reinforcer associations. Finally, we discuss the generality of the finding that instrumental training results in response-reinforcer learning.

A. POSTCONDITIONING CHANGES OF THE REINFORCER
Perhaps the most straightforward way of detecting encoding of the reinforcer is to manipulate separately the value of the reinforcer after learning has taken place. We can then inspect the animal's likelihood of continuing the instrumental
performance in the absence of any further reinforcer deliveries. To the degree that changing the value of a particular reinforcer produces a specific change in the probability of responses that it has previously reinforced, we have evidence for a response-reinforcer association.

This kind of logic has proved to be extremely successful in analyzing the associative structure of Pavlovian conditioning. Following the pairing of two stimuli, S2 and S1, one can identify an S2-S1 association by changing the value of S1 and inspecting the response to S2. Under many circumstances changes in the value of S1 modify the response to S2, suggesting that S1 was encoded as an associate of S2 (e.g., Rashotte, Griffin, & Sisk, 1977; Rescorla, 1979, 1980). Under other circumstances, the response to S2 is relatively impervious to changes in the value of S1, suggesting some other associative structure (e.g., Amiro & Bitterman, 1980; Cheatle & Rudy, 1978; Holland & Rescorla, 1975; Nairne & Rescorla, 1981; Rizley & Rescorla, 1972). Although we do not yet have an adequate characterization of the determinants of these different outcomes, it is clear that the postconditioning change technique can be a valuable analytic tool.

Attempts to apply this tool to the case of instrumental learning have also led to a variety of results. Some authors have found evidence that a good deal of instrumental behavior persists after a change in the value of the reinforcer or reinforcer-correlated stimuli (e.g., Adams, 1980, 1982; Garcia, Kovner, & Green, 1970; Holman, 1975; Morgan, 1974; Morrison & Collyer, 1974; Tolman, 1933; Wilson, Sherman, & Holman, 1981). Others have found results that encourage the inference of a response-reinforcer association (e.g., Adams, 1982; Adams & Dickinson, 1981b; Chen & Amsel, 1980; Dickinson, Nicholas, & Adams, 1983; Khavari & Eisman, 1971; Krieckhaus & Wolf, 1968; Miller, 1935; St. Claire-Smith & MacLaren, 1983; Tolman & Gleitman, 1949).
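The logic of the postconditioning change technique can be made concrete with a small sketch. The two classes below are hypothetical toy models, not models proposed in this chapter: one stores only a habit strength (the S-R view, in which the reinforcer acts as a catalyst), and the other stores the reinforcer as an associate of the response. The response labels (R1, R2), reinforcer labels (O1, O2), trial counts, and numeric values are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SRAgent:
    """Toy S-R account: the reinforcer strengthens a habit but is not stored."""
    habit: dict = field(default_factory=dict)  # response -> strength

    def train(self, response, reinforcer, trials):
        # The reinforcer acts only as a catalyst; its identity is discarded.
        self.habit[response] = self.habit.get(response, 0.0) + trials

    def tendency(self, response, value):
        # Current reinforcer values cannot influence performance.
        return self.habit.get(response, 0.0)


@dataclass
class ROAgent:
    """Toy response-reinforcer account: the reinforcer is stored as an associate."""
    links: dict = field(default_factory=dict)  # response -> {reinforcer: strength}

    def train(self, response, reinforcer, trials):
        by_reinforcer = self.links.setdefault(response, {})
        by_reinforcer[reinforcer] = by_reinforcer.get(reinforcer, 0.0) + trials

    def tendency(self, response, value):
        # Performance is weighted by the current value of each stored reinforcer.
        return sum(s * value[r] for r, s in self.links.get(response, {}).items())


def extinction_test(agent, value):
    """Response tendencies with no further reinforcer deliveries."""
    return {resp: agent.tendency(resp, value) for resp in ("R1", "R2")}


value = {"O1": 1.0, "O2": 1.0}       # assumed initial reinforcer values
sr, ro = SRAgent(), ROAgent()
for agent in (sr, ro):
    agent.train("R1", "O1", trials=10)
    agent.train("R2", "O2", trials=10)

value["O1"] = 0.0                    # postconditioning devaluation of O1 only

sr_test = extinction_test(sr, value)
ro_test = extinction_test(ro, value)
print("S-R agent:", sr_test)
print("Response-reinforcer agent:", ro_test)
```

Under these assumptions only the second agent shows a selective depression of R1, which is the pattern that would count as evidence for a response-reinforcer association; an agent that responds equally on R1 and R2 after devaluation corresponds to the persistence results cited above.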
A particularly compelling example of sensitivity of the instrumental response to changes in the value of the reinforcer was recently reported by Colwill and Rescorla (1985a). They used a within-subjects design in which rats were trained on two different instrumental responses (lever pressing and chain pulling), each associated with a different reinforcer (sucrose liquid or Noyes pellets). Then each animal received pairings of one reinforcer with a lithium chloride (LiCl) toxin in an attempt to decrease its value artificially. The other reinforcer was presented but not poisoned. After this differential treatment of the reinforcers, the animals were once again given access to the instrumental response manipulanda and tested in the absence of the reinforcers. The question of interest was whether the animals would prefer to make the response whose reinforcer had not been devalued by pairing with toxin, thereby displaying knowledge of the specific response-reinforcer contingency. Because this experiment will serve as a prototype in subsequent analyses, we describe the procedure in somewhat more detail. After magazine training and one
session of continuous reinforcement on each response, animals received variable interval (VI) training on each manipulandum. They received, with each manipulandum, one 16-min session on a VI 30-sec schedule and then one 20-min session on a VI 60-sec schedule. Then the manipulanda were removed from the chambers and the animals were given five 2-day cycles of flavor-aversion training. On odd-numbered days the animals received 30 deliveries of one reinforcer, given at a rate of 1/min. On each of these days, the session terminated in a 0.5% body weight intraperitoneal injection of 0.6 M LiCl. On even-numbered days the other reinforcer was delivered in the same manner, but no toxin was administered. Conditioning of this sort is extremely successful; on the last conditioning cycle, the animals consumed a mean of 0.1 and 30 of the poisoned and nonpoisoned reinforcers, respectively. Finally, each animal was given a 20-min extinction test during which it had simultaneous access to both instrumental response manipulanda. Figure 1 shows the results of that test, separated according to reinforcer identity and poisoning treatment. It is clear that for both reinforcers, instrumental responding was profoundly affected by poisoning of that reinforcer. Animals showed substantially lower response rates on the manipulandum whose reinforcer had been poisoned.

[Figure 1. Panels: Sucrose Reinforcer; Pellet Reinforcer. Abscissa: blocks of 4 minutes. Open symbols: not poisoned; solid symbols: poisoned.]
Fig. 1. Sensitivity of the instrumental response to reinforcer devaluation. Mean responses per minute during the extinction test, shown separately for responses that had been reinforced by sucrose (left panel) or by Noyes pellets (right panel). An aversion had been conditioned to one reinforcer (solid symbols), but not to the other (open symbols). From Colwill and Rescorla (1985a). © 1985 by the American Psychological Association.

The specificity of that depression implies that the
animal encoded the reinforcer identity as part of its knowledge about the instrumental learning situation.

A similar result can be obtained if the reinforcer is devalued by motivational means. In a companion experiment, Colwill and Rescorla (1985a) found that selectively satiating the animal on the reinforcer earned by one response led to a selective depression in the rate of making that response. These results thus suggest that instrumental performance is appropriate to the current value of the reinforcer when either motivational or associative procedures are used to devalue that reinforcer.

It is worth noting two methodological features of these demonstrations of the impact of postconditioning changes in the value of the reinforcer. First, notice that all of the instrumental training and the reinforcer devaluation manipulations took place in the same chamber. This means that any general effects of the reinforcer devaluation manipulations on the chamber or on responding per se cannot account for the differential performance. Second, these experiments attempted to maximize the similarity between the conditions under which the reinforcer was earned and those under which it was devalued. The devaluation procedure involved delivery of the reinforcer at approximately the same rate and in the same chamber as it had been earned during instrumental training. This matching may be crucial for encouraging the animal to identify the reinforcer undergoing a change in value as being the same as the response-contingent reinforcer. Other results (e.g., Adams, 1982) suggest that the animal is extremely sensitive to relatively minor differences in the mode of delivery of the reinforcer. Failure to match the details of the manner in which the reinforcer is delivered may have contributed to earlier failures to find devaluation effects like those reported here (see Colwill & Rescorla, 1985a, for a fuller discussion).

B. CONTINGENCY EFFECTS
A second line of evidence indicating the development of response-reinforcer associations comes from the study of reinforcer contingencies. In recent years it has become clear that Pavlovian conditioning can depend heavily on the contingency between the conditioned stimulus (CS) and the unconditioned stimulus (US), as distinguished from their simple contiguity. One result that has encouraged that view is the adverse effect of presenting USs in the interval between CSs. If one holds constant the number of USs that occur during a CS, varying the frequency of USs at other times can produce dramatic variations in conditioning of that CS (e.g., Durlach, 1983; Rescorla, 1968). That result suggests that the animal is sensitive not simply to the frequency of CS-US pairings, but rather to the contingency between the two events.

Hammond (1980) and Dickinson and Charnock (1985) have recently demonstrated parallel results for instrumental training. A lever press that results in food
will be acquired less well if food otherwise occurs at a high rate in the absence of lever pressing. Some recent experiments in our laboratory have attempted to use that observation to analyze the nature of instrumental learning. The notion was that if instrumental responding depends on learning a response-reinforcer association, then presenting that same reinforcer in the absence of the response should have a more devastating effect than would presenting a different, but equivalently valued reinforcer.

In order to provide a well-controlled and sensitive test of this proposition, we trained rats to make two different instrumental responses (lever press and chain pull), each leading to a particular reinforcer (liquid sucrose or Noyes pellet). Then we added response-independent presentations of one of the two reinforcers and inspected the consequences for each of the behaviors. If the animal encodes which reinforcer follows each response, then the adverse consequences of a free reinforcer should be more severe for the response that otherwise earned that particular reinforcer.

The rats were trained on what Hammond (1980) has called a “constant probability” schedule. In this procedure, the session is divided into 1-sec intervals, and the reinforcer is delivered with some probability at the end of each interval. After some initial training, the animals were exposed to 14 sessions each with the lever and with the chain. The probability of a reinforcer was set at the value p for each second that contained a response. The values of p were .5, .25, .10, and .10 for the first four training days; p was set at .05 for the remaining 10 days. Then all animals received two sessions during which both the lever and the chain were available, and the probability of each reinforcer was set separately and independently at .05 for each second containing a response. Throughout this training, the probability of a reinforcer was set at zero for intervals in which no responding occurred.
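A constant probability schedule of this kind is easy to simulate, and doing so makes the notion of contingency concrete. The sketch below is only an illustration: the function names, the simulated animal's fixed response probability, the session length, and the seed are assumptions, and the difference score computed by `delta_p` is one common summary of contingency rather than a measure taken from the experiments described here. The free-reinforcer probability is a parameter, so the same code covers both response-contingent delivery (probability zero in intervals without a response) and delivery that is independent of responding.

```python
import random


def simulate(n_intervals, p_respond, p_earned, p_free, seed=0):
    """Simulate 1-sec intervals of a constant probability schedule.

    p_respond: assumed chance that an interval contains a response.
    p_earned:  P(reinforcer) at the end of an interval containing a response.
    p_free:    P(reinforcer) at the end of an interval with no response.
    """
    rng = random.Random(seed)
    counts = {"resp": 0, "resp_rf": 0, "none": 0, "none_rf": 0}
    for _ in range(n_intervals):
        responded = rng.random() < p_respond
        reinforced = rng.random() < (p_earned if responded else p_free)
        if responded:
            counts["resp"] += 1
            counts["resp_rf"] += reinforced
        else:
            counts["none"] += 1
            counts["none_rf"] += reinforced
    return counts


def delta_p(counts):
    """P(reinforcer | response) - P(reinforcer | no response).

    Assumes both kinds of interval occurred at least once.
    """
    return (counts["resp_rf"] / counts["resp"]
            - counts["none_rf"] / counts["none"])


# Response-contingent schedule: p = .05 with a response, zero otherwise.
contingent = simulate(100_000, p_respond=0.5, p_earned=0.05, p_free=0.0)
# Same earned probability, with free reinforcers added at the same rate.
degraded = simulate(100_000, p_respond=0.5, p_earned=0.05, p_free=0.05)

print("contingent:", round(delta_p(contingent), 3))
print("degraded:  ", round(delta_p(degraded), 3))
```

In this sketch the number of earned reinforcers per response is the same in both conditions; only the reinforcers delivered in the absence of responding differ, which is what drives the difference score toward zero.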
For the next 15 sessions both manipulanda remained present, but deliveries of one reinforcer were added in the absence of responding. The probability of that reinforcer was set at .05 both for intervals that contained the appropriate response and for intervals lacking that response. The other reinforcer continued to be delivered with a probability of .05 only in intervals containing the other response. Figure 2 displays the results of those manipulations. To the left are shown the relatively high rates of responding prior to the introduction of response-independent reinforcers. The middle portion of the figure shows the consequences of introducing free deliveries of one reinforcer. Unearned reinforcer deliveries produced an immediate drop in the rate of both responses. That loss may partly be due to the increased time spent consuming the unearned reinforcers. But more interestingly, free reinforcers produced a more substantial loss in the response which otherwise earned that particular reinforcer. The right-hand portion of Fig. 2 shows the results of a subsequent extinction test carried out in the absence of all reinforcers. During that test, the two
[Figure 9: diagram of competitive hypotheses (PHYSICAL-COMPETITION links among THROW, SWINGHIT, and SWINGMISS acts) matched against subsequently observed episodes, illustrated with a swing-and-miss episode.]
Fig. 9. Type I prediction.
Knowledge-Directed Machine Learning
Elliot Soloway
-Y should be observed; if X is a sufficient condition for Y, then whenever X is observed, Y should be observed. Based on the need for a rule to be both sufficient and necessary, BASEBALL makes two predictions of events that should NOT be observed: (1) It should not be the case that if -X is observed, then Y is also observed, and (2) it should not be the case that if X is observed, -Y is observed. The former case disconfirms that X is a necessary condition for Y, while the latter case disconfirms that X is a sufficient condition for Y.

As an example of the former prediction for necessity, BASEBALL predicts that if it first observes -([ON B1 FIRSTBASE] OCCURS-BEFORE [CATCH A3 FIRSTBASE BALL]), that is, A3 CATCHES the BALL before B1 executes ON FIRSTBASE, and then it observes [ON B1 FIRSTBASE], then OCCURS-BEFORE is not a necessary condition for B1 to remain ON FIRSTBASE. Thus, an observation of such a prediction would then have the effect of invalidating the hypothesis upon which the prediction was based.

Predictions of this type are also passed to AF where they are matched against the subsequent input. If either a necessity or a sufficiency prediction (a Type II prediction) is found to occur, the confidence value for the hypothesis on which the prediction is based is made to go negative. The motivation here is that one negative piece of evidence for a hypothesis serves to effectively eliminate it; however, positive evidence (Type I predictions) does not necessarily confirm a hypothesis, but only serves to increase the confidence in that hypothesis.

C. FROM HYPOTHESIS TO TRUTH

BASEBALL is given a threshold for accepting hypotheses as truth. The dangers involved with the choice of a threshold are well known: If the threshold is set too low, then false hypotheses may be turned into truths; if the threshold is set too high, not all the truths will be discovered.
Once a hypothesis has been accepted as a truth, it is used to modify the confidence values of other hypotheses: Those whose goals are consistent with the new truth have their confidence value increased by a constant, while those that are inconsistent have their confidence value decreased by a constant. For example, when BASEBALL accepts the hypothesis that getting ON FIRSTBASE is the intended goal of the batter in an infield single episode, then it can decrease the confidence value of an inconsistent hypothesis from an outfield single episode that says getting ON FIRSTBASE is not what was intended by the batter. The use of acquired knowledge to aid in the evaluation of other hypotheses is risky business: It assumes that a hypothesis verified in one context is relevant to the verification (or elimination) of hypotheses put forward in other contexts. In order to soften the effect of this procedure, the constant by which the confidence values on the hypotheses are modified has been kept low. Finally, in a complete learning system, the ability to recover from an incorrect
hypothesis accepted as true would be provided. However, detecting when one’s view of a situation is out of whack and then deciding what to do about it is no mean feat! Currently, BASEBALL is not equipped with such backtracking capability. If it accepts a falsehood as truth and proceeds to eliminate other actual truths as being falsehoods, it cannot recover.
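The evaluation scheme just described can be summarized in a short sketch. This is not BASEBALL's code, which is not reproduced here; the class names, the size of the confidence increment, the propagation constant, the example threshold, and the way consistency between hypotheses is represented (same goal label, same or opposite polarity) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    goal: str                 # hypothetical label for the hypothesized goal
    asserts_goal: bool        # True: goal was intended; False: it was not
    confidence: float = 0.0
    accepted: bool = False

    @property
    def eliminated(self):
        # In this sketch a negative confidence marks an eliminated hypothesis.
        return self.confidence < 0


class Evaluator:
    def __init__(self, hypotheses, threshold, step=1.0, nudge=0.5):
        self.hypotheses = hypotheses
        self.threshold = threshold   # acceptance threshold (supplied externally)
        self.step = step             # assumed gain per confirmed Type I prediction
        self.nudge = nudge           # small propagation constant, kept low

    def type1_observed(self, h):
        """Positive evidence only raises confidence."""
        if h.accepted or h.eliminated:
            return
        h.confidence += self.step
        if h.confidence >= self.threshold:
            self._accept(h)

    def type2_observed(self, h):
        """A single disconfirming observation eliminates the hypothesis."""
        if not h.accepted:           # no backtracking once accepted as truth
            h.confidence = -1.0

    def _accept(self, truth):
        truth.accepted = True
        for other in self.hypotheses:
            if other is truth or other.accepted or other.eliminated:
                continue
            if other.goal != truth.goal:
                continue             # unrelated hypotheses are left alone
            if other.asserts_goal == truth.asserts_goal:
                other.confidence += self.nudge   # consistent with the new truth
            else:
                other.confidence -= self.nudge   # inconsistent with the new truth


# Hypothetical example loosely following the ON FIRSTBASE illustration.
h_infield = Hypothesis("ON FIRSTBASE", asserts_goal=True)
h_outfield = Hypothesis("ON FIRSTBASE", asserts_goal=False, confidence=2.0)
h_other = Hypothesis("ON FIRSTBASE", asserts_goal=True, confidence=1.0)

ev = Evaluator([h_infield, h_outfield, h_other], threshold=3.0)
for _ in range(3):
    ev.type1_observed(h_infield)     # three confirmed Type I predictions
ev.type2_observed(h_other)           # one Type II observation
```

Because accepted hypotheses are never revisited, the sketch shares the limitation noted above: a falsehood that crosses the threshold stays accepted and continues to lower the confidence of inconsistent, possibly correct, hypotheses.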
D. THE INTERACTION BETWEEN GENERALIZATION AND EVALUATION
The interaction between generalization and evaluation is a subtle and troublesome one. Generalization tends to relax constraints on hypotheses, and thus a generalized hypothesis has more contexts in which its predictions can potentially be matched. One implication of this observation is that incorrect hypotheses which are overgeneralized have a tendency to become verified. Since BASEBALL currently has no error recovery procedures, our strategy has been to decrease the likelihood of an overgeneralization and thus decrease the chance of verifying an incorrect hypothesis. The conservative generalization technique (CDDG) described in Section IV,B is the key to this strategy. In Section VI we compare the performance of this technique with another more ambitious generalization technique with regard to this issue.
VI. Experiments

Since practical considerations prevent us from running BASEBALL on enough games of baseball to be statistically meaningful, we must be satisfied with a more qualitative method of evaluation. In particular, we will present several runs of BASEBALL in which we vary different parameters of the system in order to illustrate some of the points made earlier (e.g., the interplay between generalization and evaluation).

In the runs described below, BASEBALL was fed a continuous string of 1680 snapshots containing 105 episodes; the distribution of those 105 episodes is given in Table IV. The ordering of those episodes was random and thus does not conform to the rules of baseball. A baseball fan would no doubt be upset by the sequence of events generated in this manner; for example, in one episode string, a fielder’s choice follows a double play. However, since BASEBALL does not possess knowledge which could link episodes together, this simplification of the game does not affect BASEBALL’s sense of propriety. The episodes do contain the variability necessary to test the generalization capabilities of the system; various values for the location, player, and timing features are present in the data. Finally, while the details of AF’s processing can be found in Soloway (1978), it is sufficient here to say that AF reduced the number of snapshots per episode and the number of pattern descriptions per snapshot significantly. In what follows, then, we report only on the processing carried out by levels beyond AF.

TABLE IV
EPISODES PRESENTED TO BASEBALL

     NAME OF EPISODE                          NUMBER OF TIMES IT APPEARS
  1  Infield Single                             3
  2  Infield Groundout                          6
  3  Outfield Single-I                          3
  4  Outfield Single-II                         3
  5  Outfield Single-III                        4
  6  Infield Flyout                             6
  7  Outfield Flyout                            9
  8  Outfield Double                            3
  9  Out at Secondbase                          3
 10  Infield Single plus Baserunner-I           2
 11  Infield Single plus Baserunner-II          2
 12  Infield Single plus Baserunner-III         2
 13  Double-play                                3
 14  Fielder's Choice-I                         3
 15  Fielder's Choice-II                        2
 16  Fielder's Choice-III                       2
 17  Fielder's Choice-IV                        4
 18  Swing-and-miss                            18
 19  Throw-and-noswing                         27
     Total                                    105

A. EXPERIMENT 1: A REPRESENTATIVE RUN
In this first run, we will try to illustrate some of the key strengths and weaknesses of BASEBALL. We will focus on the levels of processing starting with the applications of the causal link schemata. In particular, Table Va depicts the total number of CLS activations; the numbers were computed on the basis of one example from each of the 19 episode types (Table IV). In accordance with the rules of baseball, we evaluated whether a competitive hypothesis was correct and found that 75% of them were correct. On the average, 5.7 hypotheses of competitive and cooperative interactions were put forward in an episode (Table Vc).

The numbers in Table V provide some evaluation of the following question: Did the domain knowledge given to BASEBALL guide it directly to an understanding of the specific events in baseball? In that table, a competitive interaction simply indicates that some competitive relationship was hypothesized to exist between two opposing players; based on one example from each of the 19 episode types, there were 59 such interactions hypothesized. However, sometimes more than one CLS was put forth as an explanation for a competitive interaction; that is, on the average, 1.2 competitive hypotheses were suggested for each competitive interaction. The maximum number of alternative hypotheses put forward for a competitive interaction was 3. Quite frankly, not all that many alternatives were generated; we would have felt better if the average number of alternative competitive hypotheses put forward was higher.

TABLE V
PERFORMANCE STATISTICS

(a) OVERALL STATISTICS FOR THE CAUSAL-LINK SCHEMAS
    Total # of competitive and cooperative hypotheses        109
    Total # of cooperative hypotheses (a)                      40
    Total # of competitive hypotheses                          69
    Total # of correct competitive hypotheses (b)              52
    Total # of incorrect competitive hypotheses (b)            17
    Percentage of correct competitive hypotheses              75%

(b) DISTRIBUTION OF COMPETITIVE HYPOTHESES
                                                                        F/S   S/F
    Total # of order-of-occurrence competition hypotheses                17    18
    Total # of physical-competition hypotheses                            9    13
    Total # of state-of-distinguished-object-competition hypotheses       5     3
    Total # of logical competition hypotheses                             2     2

(c) AVERAGE NUMBER OF CLS ACTIVATIONS PER EPISODE
    Avg. # of cooperative hypotheses                          2.1
    Avg. # of competitive hypotheses                          3.6
    Avg. # of competitive/cooperative hypotheses              5.7

(d) ALTERNATIVE INTERPRETATIONS
    Total # of distinct competitive interactions               59
    Total # of competitive hypotheses                          69
    Avg. # of competitive hypotheses per competitive interaction              1.2
    Maximum # of alternative competitive hypotheses per competitive interaction  3

(a) Ignores trivial physical cooperation hypotheses.
(b) Correctness and incorrectness was determined by human evaluation in accordance with the rules of baseball.

Before displaying the classes of events developed by BASEBALL in this run, let us review some aspects of the generalization and verification processes. First, both processes function in “real time”; as the data are observed, episodes are generalized and verified (or eliminated). Since no external teacher is employed in the former process, BASEBALL must form its own classes. In particular, an episode class is formed on the basis of two (or more) episodes having the same hypothesized competitive and cooperative interactions. Within an episode class, classes of individual interactions are formed. Such classes and the episodes they compose are evaluated on the basis of a confidence value. In this run, the confidence value on a hypothesis must reach 6 before it is accepted as truth. An episode is considered to be verified if the final competitive hypothesis and half of the other competitive hypotheses are verified.

BASEBALL’s overall box score for this run is depicted in Tables VIa and b. While the correct number of episode classes represented by the 19 episode types was 12, BASEBALL actually formed 31 classes and accepted (verified) 10 of them as correct (Table VIa). Multiple and erroneous hypotheses caused a large number of classes to be formed initially. However, as the erroneous hypotheses were eliminated, the episode classes containing those hypotheses were also eliminated.
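The acceptance rule used in this run can be written out as a minimal sketch. The function names and the representation of an episode as a list of confidence values are assumptions, and "half of the other competitive hypotheses" is read here as "at least half", which the text does not state explicitly.

```python
from typing import Sequence

THRESHOLD = 6  # confidence a hypothesis must reach in this run


def is_truth(confidence: float, threshold: float = THRESHOLD) -> bool:
    """A hypothesis is accepted once its confidence reaches the threshold."""
    return confidence >= threshold


def episode_verified(final_conf: float, other_confs: Sequence[float]) -> bool:
    """Verified if the final competitive hypothesis is accepted and at least
    half of the remaining competitive hypotheses are accepted (assumed reading)."""
    if not is_truth(final_conf):
        return False
    verified = sum(is_truth(c) for c in other_confs)
    return 2 * verified >= len(other_confs)


print(episode_verified(6, [6, 2]))     # final accepted, one of two others accepted
print(episode_verified(5, [6, 6]))     # final hypothesis not yet accepted
print(episode_verified(7, [1, 2, 6]))  # only one of three others accepted
```

Writing the rule this way makes one consequence visible: an episode can be verified while some of its competitive hypotheses are still undecided.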
Ten episode classes were finally verified, nine of which were deemed correct inasmuch as they were based on a correct analysis of the observed activity (Fig. 10). The one incorrect episode class which should not have been verified was a class composed of flyouts; in these episodes, BASEBALL mistakenly hypothesized that a competitive timing relationship existed between the outfielder who was CATCHing the BALL and the batter who was reaching FIRSTBASE. Some classes one might expect to be formed were not; BASEBALL did not verify hypotheses which would have established a class of outfield singles or those which would have established a class in which the pitcher throws the BALL and the batter does not swing his bat. Also, a large number (14) of episode classes were left “undecided.” These classes were formed at the end of the run and really should have been merged with episode classes formed earlier. The problem was that a number of those classes were verified or eliminated before the features in the episodes were fully generalized. Once verified or eliminated, the features became frozen in their undergeneralized state and thus were no longer able to accommodate new episodes. In Table VIb we break down the verified episode classes along another interesting dimension: level of generality. In Section IV,B, we described the rather
Elliot Soloway
TABLE VI
BOX SCORE FOR RUN 1

(a) Episodes Formed = 31

Verified        Eliminated      Undecided
10              [illegible]     14

(b) Episodes Verified = 10

Overgeneralized                                   2
Correctly Generalized                             4
Undergeneralized                                  3
Episodes Which Should Not Have Been Verified      1
conservative process by which features were generalized; variables were substituted for features which were constrained to match only those values that had already been observed. This generalization strategy resulted in a number of episode classes (5) being verified before they were sufficiently generalized. For example, three classes of groundouts were formed, two of which were verified (see Fig. 10), where each class covered only a portion of the possibly observable episodes of that type. On the other hand, infield singles and doubles were overgeneralized into the same class because of the order in which they were observed; that is, in the descriptions of the infield single and outfield double provided to BASEBALL, there is little to which the system attends which discriminates between them; in both types of episodes a fielder throws the ball to the
[Figure 10: diagram with two columns, TYPE OF EPISODE and EPISODE CLASS. Episode types shown: Out at Secondbase, Groundout at Firstbase, Infield Flyout, Outfield Flyout, Fielder's Choice-I, Fielder's Choice-II, Fielder's Choice-III, Fielder's Choice-IV, Infield Single plus Baserunner-III, Infield Single, Outfield Single-I, Outfield Single-II, Outfield Single-III, Throw-and-swing-miss, Throw-and-no-swing.]
Fig. 10. Levels of classes formed by BASEBALL. UG, undergeneralized; CG, correctly generalized; OG, overgeneralized. * indicates an episode class which should not have been verified, since the analysis was incorrect.
baseman while the runner arrives at that base just before the opposing fielder at the base catches the ball. At a higher level of abstraction, BASEBALL formed three classes on the basis of only the final competitive goal of an episode (see Fig. 10). We have supplied the labels hit, strike, and out; they summarize the concepts acquired in the schemas. In the fielder’s choice episodes, there were two final competitive goals, one for the batter and one for the runner; either the batter failed or the runner failed with his goal of getting ON some base. Thus, this class of episodes participates in two classes, hit and out, at this higher level of abstraction.

There were many (93) classes of individual competitive interactions formed. However, the majority of these classes were identical; for example, the competitive interaction between pitcher and the opposing batter appears in each of the episode classes. Thus, when duplicates were eliminated, the 36 classes of individual competitive interactions which were verified reduce to only 8 different classes. It is these classes in their production rule representation which are added to the set of causal link schemas and used to make specific inferences of goals and relationships in subsequently observed activity. Table VII lists the English equivalents of several of the learned rules.
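As a rough illustration of the two ideas in this paragraph, the sketch below removes duplicate interaction classes and then applies a surviving class as a condition-action rule. The representation is an assumption made for illustration only: `InteractionRule`, its fields, and the two sample rules (loosely paraphrased from Table VII) are hypothetical and do not reproduce BASEBALL's internal production rule format.

```python
from typing import Dict, List, NamedTuple, Optional


class InteractionRule(NamedTuple):
    """Hypothetical encoding of one class of competitive interaction."""
    first_action: str    # action of the first player
    second_action: str   # action of the opposing player
    relation: str        # relation between the two actions
    first_outcome: str   # SUCCEED or FAIL for the first player
    second_outcome: str  # SUCCEED or FAIL for the opposing player


def unique_rules(rules: List[InteractionRule]) -> List[InteractionRule]:
    """Drop duplicate rules, keeping the first occurrence of each."""
    return list(dict.fromkeys(rules))


def infer_goals(rule: InteractionRule,
                observed: Dict[str, str]) -> Optional[Dict[str, str]]:
    """If the observation satisfies the rule's condition, hypothesize outcomes."""
    condition = {
        "first_action": rule.first_action,
        "second_action": rule.second_action,
        "relation": rule.relation,
    }
    if all(observed.get(key) == value for key, value in condition.items()):
        return {"first_player": rule.first_outcome,
                "second_player": rule.second_outcome}
    return None


CATCH_FLY = InteractionRule("CATCH BALL", "HIT BALL",
                            "before BALL hits ground", "SUCCEED", "FAIL")
PITCH_HIT = InteractionRule("HIT BALL", "THROW BALL",
                            "thrown from PITCHER'S MOUND", "SUCCEED", "FAIL")

# The pitcher-batter interaction recurs in every episode class.
collected = [PITCH_HIT, CATCH_FLY, PITCH_HIT, PITCH_HIT]
rules = unique_rules(collected)
print(len(collected), "->", len(rules))  # 4 -> 2
```

Because the rules are plain value tuples, two classes count as duplicates exactly when all of their fields agree; a real system would presumably need a looser notion of equivalence between generalized patterns.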
B. EXPERIMENT 2: THE EFFECTS OF ALTERNATIVE GENERALIZATION STRATEGIES
In this section, we shall analyze the performance of two generalization techniques: constrained data-directed generalization (CDDG) and unconstrained data-directed generalization (UDDG). Both techniques accommodate observed differences in the data by inserting a variable into the generalized pattern description to replace differing constants. In the former technique, that variable is constrained to subsequently match only pattern descriptions whose value for that feature is a member of the set of already observed values; for example, a variable would be inserted into the generalized pattern description for the differing location features in (batter hits ball to centerfield) and (batter hits ball to rightfield) which would be constrained to match only instances of this pattern description in which the ball was hit to either centerfield or rightfield. The results presented in Experiment 1 were obtained using this technique. Alternatively, in UDDG a variable is substituted which is allowed to match any value for that feature.

The choice of generalization strategy is important, since our task differs in two crucial respects from most generalization situations reported in the literature: (1) The data over which BASEBALL will generalize are not necessarily correct, and (2) the data have not been partitioned into the correct classes by an external source. It is the interaction of these two problems which causes the trouble; we shall see that each technique can cope with one of the problems, but not with the other. In particular, we shall analyze the behavior of these two techniques with
TABLE VII
LEARNED RULES EXPRESSED IN ENGLISH

1. If a player at HOMEPLATE HITS a BALL which was THROWn from the PITCHER'S MOUND by a member of the opposing team,
   Then Hypothesize that the batter wanted to HIT the BALL and thus SUCCEEDed with his goal; the pitcher wanted to prevent the batter from performing that action and thus FAILed with his goal.

2. If a player at HOMEPLATE SWINGS at a BALL and MISSES it, and an opposing player at the PITCHER'S MOUND threw that BALL,
   Then Hypothesize that the batter did not want to miss the BALL and thus he FAILed with his goal; the pitcher wanted the batter to miss, thus the pitcher SUCCEEDed with his goal.

3. If a player arrives at FIRSTBASE or SECONDBASE before an opposing player at the base CATCHes the BALL,
   Then Hypothesize that the former player wanted to arrive at the base, and thus he SUCCEEDed with his goal; the latter player wanted to prevent that outcome, and thus FAILed with his goal.

4. If a player at FIRSTBASE or SECONDBASE CATCHes the BALL before an opposing player reaches the base, and who thereupon WALKS to his DUGOUT,
   Then Hypothesize that the latter player did not want to go to his DUGOUT and thus he FAILed with his goal; the former player wanted this outcome, and thus he SUCCEEDed with his goal.

5. If a player CATCHes a BALL which was HIT by an opposing player before it hits the ground,
   Then Hypothesize that the former player wanted to perform this action, and thus SUCCEEDed with his goal; the latter player did not want this outcome and thus he FAILed with his goal.
respect to their tendency to produce the correct level of generalization for episode classes and their role in the acceptance of incorrect hypotheses. We shall conclude that neither strategy produces completely desirable results and that additional knowledge needs to be employed in order to cope with the complex generalization situations arising in a real-world task such as baseball. In the previous experiment, the confidence value of a hypothesis was computed on the basis of predictions and consistency or inconsistency with previously acquired knowledge. In this experiment, however, we wanted to study only the effects of the alternative generalization strategies on the evaluation of a hypothesis. Thus, in this experiment we did not allow BASEBALL to use previously acquired knowledge to increase or decrease the confidence value of a hypothesis. Rather, the confidence value was based solely on the results of
predictions. Since the level of generality of a hypothesis was directly reflected in its predictions, we were able to draw a clearer picture of the contribution of the alternative generalization methods to the evaluation process.

In column 4 of Table VIIIa, we depict the results of running the system using CDDG with a threshold setting of 4. While nine episode classes were verified, three of them were considered to be undergeneralized. For example, the class of flyout episodes was limited to matching only those episodes in which the ball was hit to either LEFTFIELD, RIGHTFIELD, or SHORTSTOP; this is clearly only a subset of the possible locations to which the BALL could be hit. The reason for this undergeneralization is actually quite simple. BASEBALL had previously observed several infield and outfield flyout episodes in which the BALL was HIT to only the locations listed above. Thus, the variable in the generalized episode description was constrained to match only those locations that had been observed so far. The confidence values for this class then reached threshold, and the class was then accepted as correct. Since the features in the pattern description were frozen at that point, BASEBALL could not merge together any new flyout episodes in which the BALL was HIT to locations other than LEFTFIELD, RIGHTFIELD, or SHORTSTOP. A similar analysis holds for other undergeneralized classes.

We made a straightforward change to our generalization routines so that they would perform UDDG instead of CDDG. We then passed the same data through the system, and Table VIIIa also summarizes these results. For all threshold settings, UDDG formed fewer undergeneralized classes than did CDDG. Moreover, UDDG was not sensitive to the threshold setting at all once it reached the reasonable level of 4. While UDDG generalized faster than CDDG and thus tended to reach the correct level of class generalization sooner, there are still problems with UDDG.
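The difference between the two strategies, and the effect of freezing a class once it is verified, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original routines: `EpisodePattern`, `absorb`, the `ANY` marker, and the feature names are hypothetical, and the sketch assumes that every episode passed in already belongs to the same candidate class.

```python
from typing import Dict

ANY = "*"  # hypothetical marker for an unconstrained variable


class EpisodePattern:
    """Generalized pattern: each feature maps to the set of values it may take."""

    def __init__(self, episode: Dict[str, str], strategy: str):
        if strategy not in ("CDDG", "UDDG"):
            raise ValueError("strategy must be 'CDDG' or 'UDDG'")
        self.strategy = strategy
        # Start from constants: one observed value per feature.
        self.slots = {feature: {value} for feature, value in episode.items()}
        self.frozen = False  # set once the class is verified or eliminated

    def matches(self, episode: Dict[str, str]) -> bool:
        if episode.keys() != self.slots.keys():
            return False
        return all(ANY in allowed or episode[feature] in allowed
                   for feature, allowed in self.slots.items())

    def absorb(self, episode: Dict[str, str]) -> bool:
        """Merge an episode, generalizing differing features if still allowed."""
        if self.matches(episode):
            return True
        if self.frozen or episode.keys() != self.slots.keys():
            return False
        for feature, allowed in list(self.slots.items()):
            if ANY in allowed or episode[feature] in allowed:
                continue
            if self.strategy == "CDDG":
                allowed.add(episode[feature])   # variable limited to seen values
            else:
                self.slots[feature] = {ANY}     # variable matches anything
        return True


def flyout_run(strategy: str) -> bool:
    """Observe three flyouts, freeze the class, then try a new location."""
    pattern = EpisodePattern({"event": "FLYOUT", "location": "LEFTFIELD"}, strategy)
    for location in ("RIGHTFIELD", "SHORTSTOP"):
        pattern.absorb({"event": "FLYOUT", "location": location})
    pattern.frozen = True  # confidence reached threshold; class accepted
    return pattern.absorb({"event": "FLYOUT", "location": "CENTERFIELD"})


print(flyout_run("CDDG"))  # False
print(flyout_run("UDDG"))  # True
```

In this sketch the UDDG pattern would equally accept a flyout to any other location value, plausible or not, which hints at the overgeneralization risk of an unconstrained variable.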
In particular, UDDG’s blind substitution of an unconstrained variable may also lead to overgeneralization. While the number of verified episode classes which were based on a correct analysis and which were overgeneralized is virtually the same for both generalization strategies (Table VIIIa), Table VIIIb tells a different story; that is, overgeneralizing a hypothesis-and the corresponding predictions based on it-which is incorrect tends to give that hypothesis more contexts in which it can be verified. Thus, in Table VIIIb we see that the UDDG strategy consistently accepted as truth more incorrect hypotheses than did the CDDG strategy. We can summarize the tradeoffs between the UDDG and CDDG approaches as follows: 1. In order to be able to recognize a new episode, a system which employs CDDG needs to have seen an example of it already, while a system which
TABLE VIII
BOX SCORE FOR RUN 2

                                                  θ = 2          θ = 4          θ = 6
                                                UDDG  CDDG     UDDG  CDDG     UDDG  CDDG
(a)
Overgeneralized                                   1     1        1     1        1     1
Correctly Generalized                             4     2        6     4        5     2
Undergeneralized                                  5     8        0     3        0     2
(b)
Episodes Which Should Not Have Been Verified      4     3        3     0        2     1

UDDG, unconstrained data-directed generalization; CDDG, constrained data-directed generalization.
employs UDDG needs only to have seen two examples before it creates a generalized template. 2. Since UDDG can form classes faster than CDDG, it is reasonable that the former is less sensitive than the latter to threshold setting with regard to the level of generality of the classes formed. 3. However, since UDDG is faster than CDDG, it tends to accept incorrect hypotheses and form inappropriate classes.

Thus, there is a subtle interaction between generalization strategy and evaluation of hypotheses. Neither UDDG nor CDDG is always effective. It appears that domain knowledge might be needed in the evaluation phase of BASEBALL in order to cope with the problems created by the domain-independent generalization strategies.
VII. Concluding Remarks

As evidenced by all the machinery in BASEBALL, learning is a complex activity, and, as we mention below, BASEBALL is still missing a very major component. However, by attempting to build a system that starts from raw sensory input and progresses all the way to develop an understanding of the activity in terms of the relationships and intentions of the observed actors, we feel that we have had a chance to explore some interesting issues that would otherwise not have been so readily apparent:
By attempting to use knowledge that is acquired in subsequent interpretation activity, we had to confront two key issues: (1) There is a subtle interaction between generalization processes and evaluation processes: The speed with which generalization takes place must be balanced with the speed with which the system accepts hypotheses as truth; and (2) The system needs to have the knowledge that is acquired in a format that is usable by the rest of the system, and the system must be able to know when to use that new knowledge.

Mere recurrence isn’t enough to evaluate hypotheses; the system needs to follow out the implications of the hypotheses in order to provide a realistic evaluation. BASEBALL attempted to predict events based on its hypotheses; the results of these predictions were used to evaluate the hypotheses.

Sometimes a system must be able to take a “second look” at information that it initially threw away. In wanting to “see” a competitive interaction, BASEBALL was sometimes forced to look again among the actions that were initially filtered out during the early stage of AF.

These problems came to light due to the number of levels of processing in BASEBALL and their interactions.

What was the contribution of the domain knowledge initially given to BASEBALL? Was BASEBALL so biased that its learning was really reduced to recollection? The experiments described in Section VI provide some information on this issue. Quite frankly, the number of alternative hypotheses generated by BASEBALL was not all that large. Thus, BASEBALL appeared to converge rapidly on a reasonable understanding on the actors. On the other hand, BASEBALL did exhibit some variability when the threshold setting was varied and when the sequence of observed actions was not opportune; that is, BASEBALL developed incorrect interpretations of the sort that a human might, given the particular sequence of input snapshots.
Thus, we do not feel that the amount and type of knowledge initially given to BASEBALL trivialized its efforts. BASEBALL’S major weakness was its inability to recover from an error. We saw in Section VI that when an incorrect hypothesis was accepted as truth, BASEBALL would then merrily continue using that incorrect information; that
is, the confidence in other hypotheses was adjusted to reflect its newfound-and incorrect-truth. Essentially, BASEBALL would need processes to enable it to notice that its current view of the situation was growing more and more inappropriate, and then it would need processes to identify where it went awry; that is, it would need to cope with the credit assignment problem (Minsky, 1963). While some researchers have explored the latter problem (e.g., Waterman, 1970), the former problem is still quite perplexing.

One can discern two general approaches to learning in the AI literature: (1) The system is given general knowledge about a domain, and then it somehow customizes that general knowledge to the current specific situation [e.g., BASEBALL (Collins, 1985)]; and (2) the system is given specific knowledge about situations and attempts to map that knowledge over to a new specific situation (e.g., Burstein, 1983). Thus, how can the slave boy learn the proof of the Pythagorean theorem without knowing it already? AI research of the sort described here and elsewhere in the literature has begun to provide some mechanisms that can carry out just such nontrivial learning.

ACKNOWLEDGMENTS

This work was supported by the U.S. Army Research Institute for the Behavioral and Social Sciences under Grant Nos. DAHC19-77-G-0012 and DAHC19-77-G-0013. This article is based on the author’s Ph.D. dissertation research carried out at the University of Massachusetts at Amherst.
REFERENCES

Avedon, E. M., & Sutton-Smith, B. (1971). The study of games. New York: Wiley.
Bruce, B., & Newman, D. (1978). Interacting plans. Technical Report 88, Center for the Study of Reading, Univ. of Illinois at Urbana-Champaign.
Burstein, M. (1983). Concept formation by incremental analogical reasoning and debugging. In Proceedings of the International Machine Learning Workshop (R. S. Michalski, Ed.), Univ. of Illinois at Urbana-Champaign, June, 1983.
Collins, G. (1986). Organization of memory for generating explanations. Ph.D. dissertation, Yale University, New Haven, Connecticut.
Hayes-Roth, F., & McDermott, J. (1976). Knowledge acquisition from structural descriptions. Technical Report, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA.
Larson, J., & Michalski, R. (1977). Program comprehension: Theory and implications. SIGART Newsletter, June, No. 63.
Lehnert, W. (1981). Plot units and narrative summarization. Cognitive Science, 5(4).
Lenat, D. (1976). AM: An artificial intelligence approach to discovery in mathematics. Technical Report 286, AI Laboratory, Stanford University, Stanford, CA.
Michalski, R. (1977). Towards computer-aided induction. Technical Report, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA.
Minsky, M. (1963). Steps towards artificial intelligence. In Computers and thought (J. Feldman & E. Feigenbaum, Eds.). New York: McGraw-Hill.
Newell, A., Shaw, J., & Simon, H. (1959). Report on a general problem-solving program. In Proceedings of the International Conference on Information Processing. Paris: UNESCO House.
Plato (1949). Meno (B. Jowett, Trans.). Liberal Arts Press.
Plotkin, G. (1970). A note on inductive generalization. In Machine intelligence (Vol. 5). New York: American Elsevier.
Reynolds, J. (1970). Transformational systems and the algebraic structure of atomic formulas. In Machine intelligence (Vol. 5). New York: American Elsevier.
Schank, R. C., & Abelson, R. (1977). Scripts, plans, goals and understanding. Hillsdale, NJ: Erlbaum.
Schmidt, C., Sridharan, N. S., & Goodson, J. L. (1978). The plan recognition problem: An intersection of psychology and artificial intelligence. Artificial Intelligence, 11, 477-478.
Soloway, E. (1978). “Learning = interpretation + generalization:” A case study in knowledge directed learning. Technical Report COINS 78-13, Department of Computer and Information Science, University of Massachusetts, Amherst.
Sussman, G. (1973). A computational model of skill acquisition. Technical Report TR-297, AI Laboratory, MIT, Cambridge, MA.
Vere, S. (1975). Induction of concepts in the predicate calculus. In Proceedings of IJCAI-4. International Joint Conference on AI, Tbilisi, USSR.
Vere, S. (1978). Inductive learning of relational productions. In Pattern-Directed Inference Systems (R. Hayes-Roth & D. Waterman, Eds.). New York: Academic Press.
Waterman, D. (1970). Generalization techniques for automating the learning of heuristics. Artificial Intelligence, 1, 121-170.
Francis S. Bellezza
DEPARTMENT OF PSYCHOLOGY
OHIO UNIVERSITY
ATHENS, OHIO 45701
Learning without thought is labor lost.
The Confucian Analects, Book 2:15
THE PSYCHOLOGY OF LEARNING AND MOTIVATION, VOL. 20
Copyright © 1986 by Academic Press, Inc. All rights of reproduction in any form reserved.

I. Introduction

What is the relation between learning and thinking? When learning, can a person verbally describe what he or she is thinking? Can such verbal descriptions provide valuable insights into the learning process? These are some of the questions addressed here. The approach to learning theory implied by these questions provides a link between the learning process and other mental processes and mental structures. This approach is in keeping with those recent developments in cognitive psychology that emphasize the importance of consciousness and mental events as integral to a scientific psychology (Hilgard, 1980; Miller, 1962). It is argued that mental events play a crucial role in learning and remembering, and a number of specific ways in which this occurs are discussed.

A few introductory comments are necessary before addressing the issues that are the focus of this article. Two points are emphasized. First, mental cues consist of mental context generated by the cognitive system and present in conscious memory during learning. This contextual information resides in conscious memory along with mental representations created by perception of the immediate environment. When a particular arrangement of mental context and perceived information is stored in permanent memory, learning occurs. This allows for the mental context to be later reconstructed by the cognitive system and to serve as a memory cue for the perceived information. The second point to be emphasized is that mental context present during learning can often be concurrently described by the learner. This communication between learner and investigator provides valuable data that can be used by the experimenter in a scientific manner.

The notions of mental cues, conscious memory, and verbal reports are interrelated. The following points outline the discussion that follows: (1) A person is aware of information that represents some of the structures and processes of the memory system. (2) The information one is aware of is stored in short-term conscious memory. (3) Verbal reports can be given describing the contents of conscious memory. (4) Information generated by the cognitive system and available in conscious memory may become linked to new information presented to the learner. (5) Later, some of the contents of conscious memory can function as recall cues for other information that previously occurred with it. (6) Verbal reports provided by the learner comprise scientific data. Recall performance may be influenced by the experimenter by re-presenting parts of the reports as explicit recall cues. (7) Mental events must have certain properties to be effective as recall cues. The properties of constructibility, associability, discriminability, and invertibility (Bellezza, 1981) are discussed. (8) Some new experimental evidence is presented in which mental events were verbally reported in a variety of learning contexts and also manipulated as recall cues. (9) Some limitations on the use of verbal reports are discussed as well as some unresolved issues involving learning and awareness.
II. Mental Cues and the Computer Metaphor
Up to the beginning of the twentieth century, the goal of learning theories was to explain the laws of the association of mental events (Boring, 1950; Warren, 1921). But from approximately 1910 to 1960, behaviorism was the preeminent theoretical orientation of American psychology. During this period mental mechanisms were not used to explain learning.

A. CHUNKS AS MENTAL CUES
By the late 1950s and early 1960s, cognitive mechanisms were proposed to explain learning in a variety of paradigms traditionally used with human subjects. An early example was Bousfield’s (Bousfield & Cohen, 1953) explanation of the recall of lists made up of words from common categories such as trees, vehicles, and tools. Bousfield and Cohen proposed that as each list word, such as hammer, was studied, the superordinate category to which the word belonged was activated in memory. Later, when recall took place, the category labels were first recalled and then were used to cue the words associated to them during learning. This resulted in the clustering in recall of words from the same category, even though these words were not presented together. To explain how items not preexperimentally related become associated, Miller (1956) proposed the notion of chunking by which items unrelated to one another could, nevertheless, become part of the same mental unit, called a chunk. In a
Mental Cues and Verbal Reports
similar vein, Tulving (1962) proposed that subjective units are formed in memory when words not naturally related to one another are presented for free-recall learning. As a result of these and other studies, many investigators in the 1960s emphasized the role of organization as a process in learning (Mandler, 1967), with new information organized into some sort of mental packages.

According to the emerging cognitive perspective, these cognitive units or chunks seemed to function as mental cues in recall. It was assumed that units formed during list learning had to be recalled as implicit cues before the separate items comprising them could be recalled (Bower, 1972a). Recalling the chunk increased the probability that its constituent information would be recalled. Unlike a common category, a newly formed chunk may be a mental structure with no verbal label. The precedence of chunk recall over item recall is frequently inferred through the analysis of transition-error probabilities (Johnson, 1972) and trial-by-trial consistencies in recall (Tulving, 1962). However, newly formed chunks can often be described or labeled by the learner. Later, these labels may be presented by the experimenter as effective recall cues for the items contained in each chunk (Bellezza & Hartwell, 1981).

The early work on chunking was influenced by the notion that human memory is similar to the information-processing system of a digital computer. The simple serial connectionism of traditional associationist psychology appeared to be inadequate to explain some of the chunking phenomena found in recall experiments. A component was needed in the memory system in which the creation of chunks took place. This feature of the memory system will be referred to as conscious memory.

B. CONSCIOUS MEMORY
What James (1950) referred to as primary memory has in recent times been given a variety of different names. For convenience, the term conscious memory will be used here as a label for that part of memory of which we are immediately aware. What James called secondary memory will be referred to as permanent memory. A complete formulation of conscious memory will not be presented; instead, only those general properties that most investigators agree upon will be discussed.

The similarities between the operation of conscious memory and the operation of the executive processor of a digital computer are striking (Gilmartin, Simon, & Newell, 1976; Newell & Simon, 1972). Permanent memory is similar to the permanent memory store of the computer from which information can be fetched by the executive processor, operated upon or transformed, and then stored again. Like the computer, the dual-storage memory system has input devices (the senses), buffer stores (iconic and echoic store), and output devices (language and other behavior). Simon (1974) has characterized what is here termed permanent memory and conscious memory in the following way: Permanent memory is associatively organized, has unlimited capacity, has a relatively slow storage time, and has a slightly faster accessing time. Conscious memory has a relatively short storage and access time and very limited capacity. Furthermore, conscious memory is a serial processor similar in function to a computer’s executive processor (Minsky, 1975). The role of conscious memory in learning is an important one, as is explained below.

C. SYMBOL ASSOCIATION

Symbol association is the basis for most of the verbal-learning theory and experiments that have been formulated. In these traditional experiments, materials such as nonsense syllables, words, sentences, passages, and pictures have been used. Symbol association involves the formation of new relations among percepts and concepts that, as symbols, are already part of the cognitive system (Newell & Simon, 1972; Simon, 1976). Types of learning that do not involve symbol association are some skill learning (Anderson, 1982), the conditioning of drives and emotions (Dollard & Miller, 1950), and rote learning (Bellezza, 1982). Perceptual learning is the process by which symbols are learned (Gibson, 1969), but symbol association, as the term is used here, does not include perceptual learning.

The notion of symbol association and its limits has not been extensively elaborated in psychology. However, it seems that for cognition to occur, operations and transformations of symbols must occur (Newell & Simon, 1972; Simon, 1976). For example, the symbols corresponding to the mental representations that result from perception can be associated during the learning process and later can be retrieved from memory to form visual images. From this point of view, then, learning involves the formation of links to interconnect a set of symbols in conscious memory. Symbol association may be defined as the association or linking of information capable of being represented in conscious memory.
Information can be represented in conscious memory by a symbol only if that information exists as a unit in permanent memory. The units may be of various sizes and be embedded in one another. Historically, these units have been called chunks, but more complex memory organizations can be formed called schemas (Rumelhart, 1980). The information in conscious memory is of two general types. First, external stimulation is interpreted by the memory system, and symbolic representations of the stimulation are activated in conscious memory. This process of perception may consist of a search for a match between the pattern of information in a sensory storage buffer and the patterns previously stored in permanent memory (Sowa, 1984, Chap. 2). Second, symbols may be placed in conscious memory with little or no external stimulation, as would occur when someone is deep in thought and is receiving very little external stimulation.
How does recall take place in the system? For information to be recalled, cues associated with the information must first be activated in conscious memory (Shiffrin & Atkinson, 1969). New information in permanent memory that previously has become associated with these cues can then be accessed. These cues may originate in a variety of ways. The recall cues may have been perceived in the environment and represent background context present during learning (Smith, Glenberg, & Bjork, 1978). They also may originate as spoken or written communication from another person (Tulving & Pearlstone, 1966). Finally, they may be cues generated by the cognitive system itself, as in the use of the method of loci (Bellezza, 1981). There is evidence that little or no deliberate retrieval from permanent memory takes place without the rememberer attending to the task, that is, without awareness of the mental cues necessary for retrieval (Read & Bruce, 1982).
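As a toy illustration of cue-dependent recall, the sketch below links each studied word to a category label and then retrieves words only through the cues that are activated. It is a deliberately simplified model, not one proposed in the chapter: the function names and the word list are assumptions made for illustration, with the categories tools, trees, and vehicles borrowed from the earlier discussion of categorized lists.

```python
from collections import defaultdict
from typing import Dict, List


def study(words: List[str], category_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Link each studied word to the category cue it activates."""
    memory = defaultdict(list)
    for word in words:
        memory[category_of[word]].append(word)
    return dict(memory)


def cued_recall(memory: Dict[str, List[str]], cues: List[str]) -> List[str]:
    """Retrieve only the words linked to the cues that are activated."""
    recalled = []
    for cue in cues:
        recalled.extend(memory.get(cue, []))
    return recalled


# Hypothetical study list, presented with the categories interleaved.
CATEGORY_OF = {
    "hammer": "tools", "oak": "trees", "truck": "vehicles",
    "saw": "tools", "maple": "trees", "bus": "vehicles",
}
presented = ["hammer", "oak", "truck", "saw", "maple", "bus"]
memory = study(presented, CATEGORY_OF)

print(cued_recall(memory, ["tools", "trees", "vehicles"]))
# ['hammer', 'saw', 'oak', 'maple', 'truck', 'bus']
```

Although the words were presented in interleaved order, recall comes out clustered by category, and words whose cue is never activated are not retrieved at all.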
D. REHEARSAL AND LEARNING
The notion that information is chunked in conscious memory has been suggested by a number of investigators (Anderson, 1980; Mandler, 1975; Miller, 1956; Simon, 1976). In order to be chunked, the relevant information must first be assembled there. Because conscious memory is of limited capacity, there is the possibility that information entering conscious memory early in the process may be overwritten by new information or may decay before the chunking process has been completed. To offset this, the subject may use the strategy of rehearsing the information to prevent it from decaying. Rehearsal consists of the process of recycling the information in conscious memory so it can be maintained there (Atkinson & Shiffrin, 1968; Shiffrin & Atkinson, 1969). It has been shown that there are two types of rehearsal. Maintenance rehearsal refers to a process by which information is maintained in conscious memory with little attempt to transfer the information to permanent memory. Elaborative rehearsal refers to rehearsal with the goal of transferring new information to permanent memory (Craik & Lockhart, 1972). This transfer process occurs when creating a chunk, which is the symbol in permanent memory representing the assembly of information in conscious memory. The symbol can then be rehearsed rather than the assembly itself, conserving the limited capacity of conscious memory (Simon, 1974). Conscious memory capacity for presented information is greater during maintenance rehearsal than under elaborative rehearsal (Bellezza & Walker, 1974; Geiselman & Bellezza, 1977; Geiselman, Woodward, & Beatty, 1982). This results because elaborative rehearsal involves not only the storage of symbols for presented information, but also information retrieved from permanent memory. The “old” information is associated with the “new” information, and the
Francis S. Bellezza
composite is stored in permanent memory using the organizational structure of the old information (Bellezza & Walker, 1974). This process is diagrammed in Fig. 1. The old information provides a framework or a set of cues to which the new information can be linked in permanent memory. This framework may be a natural language mediator, a visual image, a category label, a memory schema, a mnemonic device, or information idiosyncratic (Tulving, 1962) to the subject. These organizational structures are discussed in more detail below.
E. THE VERBALIZATION OF CONSCIOUS MEMORY
The term rehearsal does not necessarily refer to overt vocalization, although rehearsal is often experienced as if one is speaking to oneself (Landauer, 1962). Because the cognitive system in general and conscious memory in particular utilize symbols representing units of information in permanent memory, it is not surprising that many of these symbols have verbal labels. Our natural language systems have verbal equivalents in permanent memory for representations of physical objects, classes of physical objects, and abstract concepts. As a consequence, it is possible for people to verbalize what they are thinking about, that is, what is in conscious memory. When structuralism was an important theoretical position in American psychology (Titchener, 1909) and the scientific study of consciousness was assumed to be the goal of psychology, experimental subjects were trained to describe the contents of consciousness. Later, during the behaviorist era, consciousness and conscious memory
Fig. 1. Representation of new information and schematic information in conscious memory. (Diagram labels: permanent memory, memory schema, new information, episodic connections.)
verbal reports of consciousness were not considered a source of scientific data. However, in recent years subjects’ verbal reports of concurrent thinking have been collected and used productively in the study of problem solving (Newell & Simon, 1972).
III. Learning Paradigms Using Verbal Reports
Verbal reports cannot tell us everything that we would like to know about cognitive structures and processes. In fact, verbal reports may be very useful, but only in a limited number of situations. Nisbett and Wilson (1977) have cautioned that subjects’ verbal reports may have little relation to the cognitive processes actually being used. However, Ericsson and Simon (1980) discuss experimental paradigms in which verbal reports given concurrently with cognitive activity accurately reflect the contents of conscious memory and provide useful information about the goals and cognitive procedures being utilized by the subject. It is proposed here that concurrent verbal reports can be profitably used to study the process of learning. In order to bolster this claim, I will review several learning paradigms in which verbal reports were collected and correlated with recall.
A. OVERT VERBAL REHEARSAL
Rundus and Atkinson (1970) utilized a method of verbal report whereby the learner spoke aloud the presented items that he was rehearsing. They found that the more times an item was verbalized, the more likely that it would be transferred to permanent memory. Rundus and Atkinson’s results provided support for a system in which conscious memory and rehearsal play a role in learning and recall. In a later experiment, Rundus (1971) presented lists made up of words from a number of categories and recorded overt rehearsal. He found that the experimental subjects tended to retrieve from permanent memory and rehearse previously presented words from the same category whenever a new word from that category was presented. Words from the same category were rehearsed together, even if these words were not adjacent on the presentation list. This result supports Bousfield and Cohen’s (1953) hypothesis that each presented word’s category label is activated when a list of category items is presented for learning. In a series of experiments designed to study the effects of persuasive messages on attitude change, Greenwald (1968) used a procedure different from the Rundus procedure. He had subjects write their own thoughts during the presentation of persuasive messages or immediately afterward. He found that the subjects often rehearsed their current attitude about the topic during presentation of the message, and that these attitudes could be contrary to those expressed in the
message. He proposed that this rehearsal of internally generated information rather than presented information explains why the subjects’ attitudes did not necessarily change, although they sometimes could remember the presented message.
B. NATURAL LANGUAGE MEDIATION
Before the mid-1960s, many verbal-learning experiments followed in the footsteps of Ebbinghaus (1964) and used nonsense syllables unfamiliar to the learner. It was assumed that learning could be investigated in a pure form because nonsense syllables had no prior meaning to the learner. This assumption may have been wrong. It is true that a nonsense syllable does not have a unitary symbolic representation in permanent memory. Therefore, a nonsense syllable cannot be processed in memory as a unit when first presented. In contrast to this, a pronounceable nonsense syllable may produce an integrated verbal response, though it does not have a unitary symbolic representation in memory. If pronounceable nonsense syllables are used as responses, successful rote learning of the nonsense syllables as motor responses may take place. The more common nonpronounceable nonsense syllables, however, are likely learned as a sequence of three separate letters, and this sequence has to be chunked before successful learning takes place (Bower, 1972b; Underwood & Schulz, 1960). In order to learn quickly, subjects may try to substitute for each nonsense syllable a word that is similar in spelling to it. This result was found by Mattocks in an experiment reported by Underwood and Schulz (1960). A similar substitution process was discussed by Miller, Galanter, and Pribram (1960). This process of natural language mediation (the term coined by Montague, Adams, & Kiess, 1966) soon came to be studied in a variety of experiments (Prytulak, 1971; Montague, 1972). The advantage of natural language mediators for learning is obvious: A natural language mediator is a word that corresponds to an established symbol in permanent memory.
This symbol can then be used in conscious memory to represent a nonsense syllable instead of three separate symbols. Early experiments investigating the role of natural language mediation often were paired-associate learning experiments. The goal of these experiments was to determine if natural language mediators could aid the learning of nonsense syllable responses. For example, Montague et al. (1966) reported that subjects instructed to form natural language mediators learned pairs of nonsense syllables better than did control subjects who were not given mediation instructions. Furthermore, subjects seemed to be able to recall the correct response only if they could also recall the natural language mediator. If the mediator that was formed for a pair was recalled, then the probability of a correct response was .73. If the mediator could not be recalled, then the probability of a correct response was .02. Finally, if
learning was rote, that is, with no mediator reported during learning, then the probability of a correct response was .06. The natural language mediator must be present during both learning and remembering. Schulz and Lovelace (1964) manipulated the time given for learning and recalling pairs and found that mediator formation did not facilitate recall if insufficient time was provided for the mediator to be recalled before the response. All the above results support the notion that a natural language mediator represents a mental event present during learning that acts as a cue during recall. Natural Language Mediators as Recall Cues. Research on natural language mediation has also been performed in which verbal units such as words rather than nonsense syllables are presented. For example, Bellezza and Poplawsky (1974) presented pairs of nouns to college students instructed to give a one-word mediator that was somehow connected to the nouns presented. The subjects were instructed to simply study other pairs on the list. In both a paired-associate and a free-recall task, those pairs were better recalled for which subjects were instructed to form mediators. This recall difference shows that the mediators were not automatically elicited from memory by the presented material. Their creation was dependent on the learning strategy used by the subject for each particular item. The finding that recall of the language mediator is necessary for recall of the response item has been obtained in diverse verbal-mediation experiments (Bellezza, 1984a; Sweeney & Bellezza, 1982). An additional procedure used by Bellezza and Poplawsky was to present each subject with his own self-generated mediators as recall cues. Each mediator was found to be a more effective recall cue for each of the words in the pair than was either one of the presented words themselves.
As in the research involving nonsense syllables, the language mediators reflected the information that was added to conscious memory by the cognitive system. This added information from permanent memory enabled the word pair to be stored as a unit in permanent memory and to be later retrieved. In a later study, Bellezza, Poplawsky, and Aronovsky (1977) tried to deal with the criticism that natural language mediation is an epiphenomenon that somehow accompanies learning but plays no important role (Adams & McIntyre, 1967; Underwood, 1972). They formulated a word-association model in which the one-word mediator was assumed to be a high associate to one of the words of the pair, but did not reflect any learning process. This model could explain the results of Bellezza and Poplawsky (1974) as successfully as their mediation model. The association model was tested against the mediation model by instructing one group of subjects to form one-word mediators; another group of subjects gave a verbal association to either one of the words in each pair. In a test of recall, the subjects in the mediation condition performed significantly better than subjects in
the association condition. Hence, a simple model of word association cannot explain the verbal mediation process and its effect on learning.
C. VISUAL-IMAGERY MEDIATION
It is not possible here to review the extensive evidence for visual images existing as mental symbols independent of their verbal labels. However, two examples may help to make the distinction clear. First, it is possible to form the visual image of a face for which the name has been forgotten. In most cases, describing a face in words is difficult, and the image seems to exist independent of any verbal description. A second example comes from Hatano, Miyake, and Binks (1977). They found that abacus masters could do complex calculations using only a visual image of an abacus rather than the abacus itself. In this case, as with the preceding example, it is difficult to argue that verbal representations were the only ones used. It has been shown that instructions to form visual images help subjects learn pairs of words, and that those pairs for which an image can be formed are remembered better than pairs for which no image can be formed (Bower, 1972c; Paivio, 1969). However, there have not been many experiments that have studied visual imagery in learning and, in addition, have asked subjects to give verbal reports of the visual images formed. This is because verbal mediation and visual-imagery mediation may be confounded when verbal reports are requested. To deal with this problem, Paivio and Foth (1970) had subjects either write sentences describing their verbal mediators or draw pictures describing their visual images. This procedure ensured that the specific mediation instructions given were being followed. They found that verbal mediation was optimal for the learning of abstract noun pairs and visual-imagery mediation was optimal for the learning of concrete noun pairs. Abstract nouns are those nouns typically rated low in imagery and concrete nouns are those typically rated high. Bellezza et al. (1977) also presented subjects pairs of abstract and concrete nouns, and the subjects had to give a one-word mediator for each pair.
For the abstract pairs, a correct response was never recalled unless the mediator word was also recalled. However, for the concrete pairs there was a significantly greater proportion of responses correctly recalled with no mediator recalled. Bellezza et al. speculated that in some instances visual-image mediators were recalled for pairs even though the verbal mediators could not be. The results of the Paivio and Foth (1970) and the Bellezza et al. (1977) experiments argue for the operation of mental cues that can be verbally described but are independent of the language system. In summary, the creation of visual images seems to be a powerful strategy for learning, but only if this strategy can be implemented by the learner. When visual images are recreated, they become effective mental cues for the recall of the previously presented information.
D. MNEMONIC DEVICES
Perhaps the oldest and most persuasive example of the role of mental cues in learning is the method of loci. In the method of loci the learner first memorizes a series of visual images, usually representing a sequence of places (loci) with which he or she is already familiar (Yates, 1966). When a list of items is to be memorized, a visual image of each item is combined with the image of the locus in the same sequential position. In this manner representations of the information to be remembered are stored in permanent memory. When recall is to take place, the mnemonist mentally reviews the images of the loci. Combined with each locus is the image representing the information to be recalled. This procedure is remarkably effective (Bower & Reitman, 1972; Morris & Reid, 1970), but depends upon the learner being able to form visual images for the material to be remembered. Similar methods using verbal mediation, however, are available (Bobrow & Bower, 1969). As in the case of learning experiments using visual imagery, there have not been many studies reported in which experimental subjects using a mnemonic device gave verbal reports describing the contents of conscious memory. Rather, what typically happens is that the investigator assumes that the subjects did what they were told after the mnemonic instructions had been given. In contrast to this typical procedure, Reddy and Bellezza (1983) had subjects make up a story from a list of presented words and vocalize the story as they proceeded through the list. This is an example of the story mnemonic (Bower & Clark, 1969) and seems to result in learning through a complex combination of story rules, verbal mediators, and visual mediators (Bellezza, 1981). In a free-association condition Reddy and Bellezza gave subjects no specific learning instructions, but had them simply vocalize about what they were thinking as they studied the list words.
During free recall, subjects had to again verbalize as they tried to recall the list words. As might be expected, subjects in the story condition reconstructed their story in order to recall the list words. Subjects in the free-association condition also tried to reconstruct what they were thinking about when they studied the words. In both conditions the mental events experienced during learning acted as cues for recalling the list words. Recall of mental context was a necessary condition for recall. It was found that if the mental context present at the time of learning could not be recalled, then the corresponding list word could not be recalled. Subjects in the story condition recalled more than those in the free-association condition, only because they could reconstruct more of the mental context present during learning. These results support the hypothesis that the mental events that occur during learning are important because they can later function as internal cues for the target information. These cues must be regenerated by the cognitive system during remembering because they are not available in the external environment. Also, the cuing context must be identical to that
generated by the subject during learning (Tulving & Thomson, 1973). In the Reddy and Bellezza study, subjects given someone else’s verbalization recalled more poorly than those who simply free recalled with no external cues provided. These latter subjects were more likely to generate successfully their previous mental context. Chase and Ericsson (1981) had subjects practice recalling random sequences of digits. Although they did not teach any mnemonic procedure, Chase and Ericsson studied the mnemonic procedures subjects developed on their own. Their most successful subject, SF, who was a very good long-distance runner and knowledgeable about track events, gradually started to think of subsets of successive digits as running times and remembered them in this manner. When he had to recall, he first thought of the sequence of track events he used for encoding, and from these he recalled the digits. The track events functioned as mental cues for the digits.
E. MEMORY SCHEMAS
The notion of a memory schema has had a profound effect on contemporary theories of learning (Norman, 1982; Rumelhart, 1980; Anderson, 1980; Anderson, 1984). In brief, a memory schema is an organized set of knowledge stored in permanent memory that becomes activated when the person processes information similar to that stored in the schema. In a way, a memory schema is like a natural category, but has much more structure (Mandler, 1984). One form of a schema is a cognitive map (Neisser, 1976). A person may have in memory a set of visual images that represents some geographic area with which she is very familiar, such as the inside of her house or the street layout of her neighborhood. This schema can be used as a map to navigate through her house or neighborhood. It enables a person to know how to get to the bathroom from the basement of the house or how to get to the grocery store when at the gas station. However, a cognitive map has other uses. It can be used to store new information.
This occurs, for example, when using the method of loci. Our immediate concern is with how schemas function as sets of mental cues and how knowledge about their functioning can be obtained through verbal reports. A type of schema commonly used in learning experiments is the script (Schank & Abelson, 1977). A script, such as the restaurant script, is built up in permanent memory through experience and provides a plan for what to do when eating in a restaurant. It contains information about seating, ordering, table manners, paying, and so on. However, a script also allows a person to comprehend and remember language descriptions of other people eating in restaurants. When we read or hear “Joan was hungry, so she walked into McDonald’s,” the restaurant script is activated; that is, parts of the restaurant script become active in conscious memory, and visual images may be formed from
these parts. This process can be validated by asking people to make and report inferences from the restaurant script. These verbal reports consist of what may be happening concurrently with the events being presented (Joan opened a door) or what might happen next (Joan will look at the menu on the wall) (Graesser, 1981, Chap. 6). The answers given to probe questions are indications of whether the description is being comprehended, that is, whether the script has been activated. Schemas as Sets of Cues. Not all the information in a printed or spoken text is already stored as part of some script in memory. The name of the particular person involved (Joan), the restaurant (McDonald’s), and other information such as what Joan ate, how much she paid, who she sat with, and so on is information particular to the event being described. However, the activated script plays a role in storing this information in episodic memory. Workers in artificial intelligence have proposed that the generic script has “slots” in which specific information can be stored (Minsky, 1975; Schank & Abelson, 1977). If no specific information is provided in the text, the information that should be in the slots is inferred. For example, no specific information may be given in a text that a restaurant provides a napkin for the patron. However, this is assumed to occur in the event being described because it is a common event and is part of the script in memory. The notion of filling slots in a schema with information will not be used here. Rather, it will be assumed that the activated script provides a set of organized internal cues to which the new information can be associated. Association rather than slot filling is assumed to be the process by which new information becomes linked to an activated schematic structure. Hence, the memory script or schema, like a mnemonic device, provides a set of organized internal cues and thereby supports learning (Bellezza, 1983a; Bellezza & Bower, 1982). 
Like any set of mental cues, the schema must be activated both during learning and during recall (Thorndyke & Hayes-Roth, 1979). The use of schemas for making inferences can also be explained using the association mechanism. If specific information is expected in a schema-based text but is not provided, then it can be inferred from what has been associated to the schema in the past. Bellezza and Bower (1982; Experiment 2) compared the effectiveness of the restaurant script to a set of pegword cues such as those used in a mnemonic device. The pegwords were high-imagery concrete nouns not semantically related to one another in any systematic manner. There were two learning conditions. In one condition subjects learned a list of concrete script nouns appropriate to a restaurant script. Some subjects associated the script nouns to the text of the restaurant script during learning, and other subjects associated the script nouns to the set of pegwords. In the test of recall that followed, the subjects using the script as a set of cues recalled significantly better than subjects using the pegwords as a set of cues. This occurred because the script nouns fit the script well, but could not as easily be related to the pegwords. In a second condition of the
experiment, randomly sampled concrete nouns were used as the list words. The recall results were just the opposite of what occurred when script nouns were used as list words. Subjects using the restaurant script recalled the random nouns more poorly than did subjects using the pegword cues. These recall results are shown in the bottom part of Fig. 2. The random nouns typically did not fit into the script framework at all, whereas they could be related to the pegwords in a moderately successful manner. As part of this experiment, the subjects had been instructed to rate during learning the quality of each visual image they experienced. It turned out that the pattern of imagery ratings matched the pattern of recall results. This result indicated that the recall of each item was closely related to the subjects' ability to create a visual image for that item. The mean imagery ratings are shown in the upper part of Fig. 2. When the script nouns were presented as list words, they could easily be fit into the script images. But when the random nouns were presented, they did not fit into the script images. On the other hand, the type of list words made little difference when using pegword cues, because some sort of relation between each cue and each list word could frequently be found using visual-imagery mediation. Bellezza and Bower suggested that recall for both the script and the pegword cues was based on similar processes, but that the script had a narrower bandwidth. The bandwidth of a set of cues determines what
Fig. 2. Recall performance and imagery ratings for script nouns and random nouns when memorized using script cues or mnemonic pegword cues. The lower two lines in the figure represent recall performance and the upper two lines represent the imagery ratings. The horizontal axis shows the type of cue. (Reprinted with permission from Bellezza & Bower, 1982; © 1982 by North-Holland Publishing Company.)
words can be associated with the cues. The greater the bandwidth, the greater the possible interpretations allowed for the information associated with each cue in the set. For the pegword cues, which were simply single, high-imagery nouns, any word could conceivably be related to each mnemonic cue using visual imagery. On the other hand, for a word to fit into a restaurant script and be remembered, it has to make sense in the context of eating in a restaurant. Bellezza (1983a) replicated these results and demonstrated that scripts could sometimes be used to remember words that were rated as not fitting into the activated script. But this occurred only when a script-based mental cue occurred in conjunction with a list word in a meaningful relationship. The results of Bellezza and Bower (1982) and Bellezza (1983a) support the notion that memory schemas support learning by providing a set of organized mental cues to which new information can be associated. Furthermore, the appropriate subparts of the schema used must be present in conscious memory when learning and recall take place; that is, the subject must be aware of this mediating information and thus be able to report on it using verbal descriptions or some other reporting procedure. A question may be raised as to how a person can discriminate between information in memory that originated in the external environment and information generated by the cognitive system. It seems that people are often, but not always, successful in distinguishing between these two types of information. This discrimination process has been labeled reality monitoring (Johnson & Raye, 1981) and is necessary for successful cuing to occur. In a typical recall test, the subject is conscious of both context information and the information to be recalled, but must discriminate between the two and recall only that information previously presented by the experimenter.
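The idea that a set of cues has a bandwidth can be made concrete with a small sketch. It is a toy illustration under strong assumptions: a cue either accepts a word or does not, script cues accept only words listed for them, and pegword cues accept any word. The cue names, word lists, and function names are hypothetical, and the sketch does not attempt to reproduce the recall levels reported by Bellezza and Bower (1982); in particular, it does not capture the finding that script nouns were recalled better with script cues than with pegword cues.

```python
def learn(cues, words, fits):
    """Associate each word with the first free cue it can be related to."""
    links = {}
    for word in words:
        for cue in cues:
            if cue not in links and fits(cue, word):
                links[cue] = word
                break
    return links


def recall(cues, links):
    """Regenerate the cues in order and report the word linked to each one."""
    return [links[cue] for cue in cues if cue in links]


# Hypothetical script cues, each accepting only words that make sense with it.
SCRIPT_FITS = {
    "being seated": {"table", "chair"},
    "ordering": {"menu", "waiter"},
    "paying": {"bill", "tip"},
}
script_cues = list(SCRIPT_FITS)
pegword_cues = ["bun", "shoe", "tree"]


def script_fits(cue, word):
    # Narrow bandwidth: the word must make sense at this point in the script.
    return word in SCRIPT_FITS[cue]


def pegword_fits(cue, word):
    # Wide bandwidth: assume an image can relate any word to any pegword.
    return True


script_nouns = ["menu", "table", "tip"]
random_nouns = ["cloud", "hammer", "violin"]

for label, words in [("script nouns", script_nouns), ("random nouns", random_nouns)]:
    by_script = recall(script_cues, learn(script_cues, words, script_fits))
    by_pegs = recall(pegword_cues, learn(pegword_cues, words, pegword_fits))
    print(label, "| script cues:", by_script, "| pegword cues:", by_pegs)
```

With these assumptions the random nouns find no script cue to attach to and so cannot be recalled through the script, whereas the pegword cues accept both kinds of list words. This captures only the qualitative point that a narrow-bandwidth cue set restricts what can be associated with it.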
F. PREVIOUS USE OF VERBAL REPORTS IN STUDIES OF LEARNING
It is proposed here that verbal reports can play an important role in the study of human learning. Yet, the question may be raised as to why there have not been many learning experiments collecting verbal reports about language mediators, visual images, and memory schemas. There seem to be a number of reasons for this paucity of studies. (1) Investigators have relied on insights gained from their own mental experiences when they themselves learn symbolic material. Experiments can be performed that indirectly confirm these insights. For example, Lea (1975) collected reaction times from subjects learning a list of words using the method of loci. One of his results was that subjects take less time to generate mental images of familiar loci than to generate mental representations of the list words newly associated with these loci. This result agrees with what users of the method of loci sense in their own performance. (2) Special instructions are sometimes used to get people to use procedures that seem to create mental events
similar to those experienced by the investigator. Therefore, providing instructions to engage in certain mental activities is assumed to replace validation through concurrent verbal reports of these activities. This assumption is sometimes warranted. For example, visual-imagery instructions (Paivio, 1971) or instructions regarding how to use a mnemonic device (Bower & Clark, 1969) can have a major impact on learning new information. (3) Verbal reports can be intrusive. In some special circumstances, such as in the study of visual-imagery mediation in learning, care must be taken to separate imagery mediation from verbal mediation. Having subjects give verbal reports can bias them toward a verbal mediation strategy (Paivio, 1971). (4) Normative data have often been collected for materials that vary in how well they elicit mental events. Early work on nonsense syllables referred to this property as “meaningfulness” or “associative value” (Underwood & Schulz, 1960). In a similar manner, imagery and concreteness ratings have been collected for nouns (Paivio, Yuille, & Madigan, 1968). Also, normative data have been collected for categories (Battig & Montague, 1969) and scripts (Bower, Black, & Turner, 1979). Experimenters using these materials assume that the mental events experienced by the subjects can be controlled in part by the kinds of materials presented, and that subjects react in a similar manner to the same materials. Therefore, verbal reports are not needed to describe mental events. (5) Early functionalists and behaviorists were skeptical of the value of verbal reports collected by the structuralists, and there continues to be good reason to be skeptical of verbal reports as explanations of cognitive processes (Nisbett & Wilson, 1977). However, verbal reports can represent the current contents of conscious memory, and this often is useful information (Ericsson & Simon, 1980).
In summary, the reasons why verbal reports have not been widely used in the study of learning have to do primarily with the manner in which the study of learning has developed historically. Taking into account certain limitations discussed below, none of the above reasons is compelling enough to justify not using subjects’ verbal reports in the study of learning.
G. WHY VERBAL REPORTS SHOULD BE USED
Why are verbal reports useful in the study of cognitive processes such as learning? There are a number of reasons: (1) Subjects do not always follow instructions. In experiments using mnemonic devices, it is not unusual for a large proportion of the subjects to not follow the procedures in which they have been instructed or trained (Bellezza, 1981). (2) Normative data are limited in their usefulness. What is meaningful, of high imagery, or schematic for one person may not be so for another. Differences in prior knowledge do exist among people, even with materials as simple as categories of common objects. Also,
people seem to vary in how they respond to materials from one occasion to the next (Bellezza, 1984b). Verbal reports recorded during learning allow the investigator to deal with some of these problems. (3) It has recently become clear how verbal reports can be useful and when they can be used. Ericsson and Simon (1980) distinguish among three levels of verbalization. Level 1 verbalization is a direct articulation of the information in conscious memory that is already in a language code. An example of this would be the overt verbalization of implicit speech. Level 2 verbalization involves the recoding of nonverbal information without additional processing. This would occur when describing a visual image. Level 3 verbalization involves articulation preceded by decisions, inferences, or generative acts that involve not a description of information in conscious memory, but a transformation of it using other cognitive processes. Ericsson and Simon propose that Level 1 and Level 2, but not Level 3 verbalizations are legitimate ways to study the contents of conscious memory. Level 1 verbalizations do not change the course and structure of the cognitive processes, or their speed. Level 2 verbalizations may slow down performance and the verbalizations may be incomplete, but the course and structure of the cognitive processes will remain largely unchanged. In most learning experiments, the rate of presentation of the new information can be adjusted so that Level 1 or Level 2 verbalization can occur. For the reasons outlined above, the investigator can gain a greater degree of experimental control by using verbal reports. However, what can be accomplished in the study of learning that has not already been achieved? (1) The most important use of verbal reports follows from the theory of mental cues that is presented here. Verbal reports allow the investigator to determine to some extent the nature of the mental structures that act as mediators in learning. 
(2) Once these cognitive structures have been identified, they can be manipulated by the experimenter for each individual subject. For example, verbal reports can be presented as recall cues after being analyzed into parts or otherwise modified. (3) Verbal reports force investigators to develop theories that account for both the verbal reports and other overt behavior; hence, verbal-report data create more stringent criteria for learning theories (Simon, 1979).
IV. Properties of Mental Cues Important in Learning
Psychologists working in the field of paired-associate learning have proposed properties of stimuli that are crucial to successful learning (Battig, 1968; McGuire, 1961). Accordingly, mental cues must also have certain properties for successful learning to occur. Bellezza (1981) proposed four key properties of mental cues used in mnemonic devices, and these same four properties are
important for all the types of mental cues discussed here. These are the properties of constructibility, associability, discriminability, and invertibility.
A. CONSTRUCTIBILITY
The property of constructibility (Norman & Bobrow, 1979) refers to the reliability with which information can be constructed by the cognitive system, both at the time of learning and at the time of recall. If a mental symbol is activated in conscious memory at the time of learning and becomes associated with new information also there, then that symbol must be activated as a cue if the new information is to be recalled. Sometimes environmental stimuli will elicit mental cues (S. Smith et al., 1978), but often the mental context created from the cognitive system is what is associated with new information, and at recall these mental cues must be strategically regenerated to act as recall cues (Greenwald, 1981). Perhaps the most easily understood example of constructibility occurs in the method of loci. If a locus is forgotten during recall, then its corresponding list item will be forgotten. Similarly, if the loci are recalled in an order different from the presentation order, then the list items will be recalled in an order different from the original. Buschke and Hinrichs (1968) demonstrated the importance of constructibility. They found that after presenting numbers in the range from 1 to 20 for recall, performance was better when subjects recalled the numbers in ascending order compared to recalling them in the order they were presented. To recall in ascending order, the subject used the strategy of “marking” in memory each number as it was presented. At recall, the numbers from 1 to 20 were mentally reviewed to see which were marked. However, to recall the numbers in their order of presentation, this strategy could not be used. Another strategy had to be used that stored both the numbers and their order (Buschke, 1968). To recall the numbers in their ascending order, the “number loci” could be used, but these loci could not be used when recalling numbers in their order of presentation.
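The contrast between the two recall strategies in the Buschke and Hinrichs task can be made concrete with a small sketch. It is only an illustration of the logic described above, not a model proposed in the source: the function names and the example numbers are hypothetical, and a Python set stands in for the "marks" placed on the number loci.

```python
def recall_ascending(presented, lo=1, hi=20):
    """Marking strategy: tag each presented number, then scan the fixed range."""
    marked = set(presented)  # "mark" each number as it is presented
    # At recall the cue sequence lo..hi is regenerated in its stereotyped order
    # and every marked number is reported as it is reached.
    return [n for n in range(lo, hi + 1) if n in marked]


def recall_in_presentation_order(presented):
    """Order recall needs a store that keeps both the items and their positions."""
    stored = list(enumerate(presented))    # (position, number) pairs
    return [n for _, n in sorted(stored)]  # read back by stored position


presented = [14, 3, 19, 7, 11]  # hypothetical presentation sequence
print(recall_ascending(presented))
print(recall_in_presentation_order(presented))
```

The marking store holds no order information at all, which is why the regenerated sequence 1 to 20 can cue ascending recall but cannot reproduce the presentation order.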
The point is that a set of mental cues must usually be generated in a stereotyped order, and if this does not correspond to the required recall order, then these cues cannot be used during learning. Constructibility is also important in schema-based learning. Memory schemas facilitate remembering by providing an organized set of mental cues to which new information can be associated. If the schema is not well formed in memory, then the schema components activated during learning may not be identical to the components activated during recall. Because schema components function as mental cues, the cues used during learning may not be available during recall, thereby reducing the amount of new information recalled. One reason why people with expert knowledge are able to remember new information related to
their area of expertise (Smith, Adams, & Schorr, 1978) is that at different times they are able to reliably generate many cues with identical organization.
B. ASSOCIABILITY
Not only do mental cues have to be reliably generated in the learning and testing situation, but they must also be readily linked to new information. All familiar words have symbolic representations in memory that can be activated. However, words that have associated visual images are more easily associated with new information than words low in imagery (Paivio, 1969). Similarly, mnemonic devices and memory schemas whose components are high in visual imagery will result in better learning than mnemonic devices and schemas containing low-imagery components (Delprato & Baker, 1974). In addition to visual imagery, other factors can enhance associability. For example, if the mental context and new information represent verbal symbols that have often been experienced together, such as dog-cat, then they will be easy to link. Similarly, typical actions are easier to remember in script-based texts than are atypical activities (Graesser, Woll, Kowalski, & Smith, 1980). This notion of associability is similar to the redundancy of Miller and Selfridge (1950) and the congruity of Craik and Tulving (1975). Because a memory schema often represents objects and situations of a specialized nature, only a narrow range of information can be associated to it, and the schema is said to have a narrow bandwidth (Bellezza & Bower, 1982). Hence, the notion of the bandwidth of a set of mental cues refers to how easily schematic cues are associated with a wide range of information.
C. DISCRIMINABILITY
Mental cues, like physical cues, must be discriminable to support learning; they must not be confused with one another. The anonymous author of Ad Herennium, an ancient Roman textbook on rhetoric, suggested that the locations to be used in the method of loci not be too much alike and should be at least 30 feet apart (Yates, 1966, pp. 7-8). This advice has received empirical support from contemporary paired-associate experiments. It has been shown that word stimuli similar in meaning, that is, similar in their mental representations, result in poorer learning than do word stimuli dissimilar in meaning. Day and Bellezza (1983) found that paired-associate learning was poorer when the stimulus words were made similar by being chosen from the same natural category (such as fruits) than when they represented dissimilar physical objects. This was true even though all the stimuli were meaningful and high in visual imagery. Comparable results have been found by Underwood, Ekstrand, and Keppel (1965). In another
study, Bellezza (1983b) has reported that different word lists presented on visually distinct background patterns are better recalled than different lists presented on the same pattern. It seems that the patterns acted as mental cues for the lists, and their discriminability in memory was an important factor in their effectiveness. So far, only semantic similarity among mental cues has been discussed as influencing discriminability. But episodic similarity is also possible. The same mental cue may become associated with new information on a variety of different occasions; that is, the same symbol may occur in a number of different codes in episodic memory (Tulving, 1972). For example, if a person is instructed to memorize a series of five lists of words using the same set of loci, he or she may have difficulty remembering if asked to recall the fourth word from the third list. This is because the subject is forced to associate the same mental cue (perhaps the visual image of the learner’s front lawn) with the fourth word in every list. When asked for the fourth word from the third list, the fourth word from one of the other lists might be recalled and an error made. It is surprising to investigators that people are able to perform so well in this type of recall task. However, it appears that the learner must rely on temporal-contextual information in memory in addition to the semantic characteristics of the mental cue (Anderson & Bower, 1972, 1974; Bellezza, 1982; Shiffrin, 1976). But use of this temporal-contextual information is not well understood. This whole problem of episodic similarity of mental cues has been studied extensively using traditional interference paradigms, though without the theoretical perspective used here (Postman, 1971).
D. INVERTIBILITY
Invertibility means that a mental cue and its corresponding information are bidirectionally associated, and this property is important for mental cues.
During learning, new information coded into conscious memory may activate old information in permanent memory. Yet this activated information may have to serve later as a cue for the information that preceded it in time. For example, when reading a passage about eating in a restaurant, the passage may mention that the patrons were met inside the door by someone wearing a red dress. Information regarding a red dress becomes activated in conscious memory before the inference is made that this person is the hostess. However, at recall the order of mental events may be different. When a person tries to remember, he or she may first think about a restaurant. Next, specific cues may be generated from the restaurant script regarding the roles, props, and actions in a restaurant. The subject may think about the fact that when eating in some restaurants one must deal with a headwaiter or hostess. The symbol for hostess in conscious memory
may therefore act as a cue for the information in the passage regarding the fact that the hostess was wearing a red dress. This reversal in the order of events during recall means that the associations between mental cues and new information in memory should be bidirectional. If symbol A precedes symbol B in conscious memory, symbol B should later be able to elicit symbol A. This invertibility seems to occur when the symbols associated are visual images, but the strength of the association may be asymmetric when only verbal responses are involved (Paivio, 1971, Chap. 8; Ekstrand, 1966). Without the property of invertibility, mental cues are ineffective because they cannot elicit information from episodic memory preceding them in conscious memory. One possible distinction between symbol association and rote learning is that in symbol association bidirectionality is likely to be preserved, whereas in rote learning it is not. In rote learning the symbols in conscious memory are not directly linked, but generate a sequence of motor responses. These motor responses become associated in the sequence in which they are verbalized.
V. Mental Cues Formed under Different Task Sets
An understanding of mental cuing is important for the study of learning, and verbal reports are a valuable tool for gaining that understanding. In this section, two studies are discussed that demonstrate the necessity of mental cues in learning and the importance of the four properties proposed for them. Use is made of verbal reports to provide a description of the mental cues.
A. STUDY 1: CONSTRUCTIBILITY
In this experiment, randomly selected concrete nouns were studied by subjects who were instructed to report whatever came to mind in response to each presentation. Rather than being a simple free-association experiment, four different tasks were specified which varied from word to word. For a quarter of the words the subject was requested to give a word or phrase that sounded like the word presented. This was the sound task. For another quarter of the words, a dictionary definition had to be generated. For the remaining words, a personal experience related to the word had to be described, or the subject had to describe where in his or her house the object named by the word would be most appropriately placed. This last task was the house task. For each word and task combination, each subject wrote down the required response as a description of the mental events he or she experienced. Of course, the complete contents of conscious memory cannot be represented by such written descriptions or by any other kind
of verbal report. However, it was assumed that the written descriptions were representative of the cognitive content of conscious memory when the task was completed. It was hypothesized that later recall of the words could occur only if the mental context generated during learning was also generated as a mental cue (see Reddy & Bellezza, 1983; Bellezza, 1984a). Because of the procedure used here, the terms mental context, mental cues, verbal reports, and written descriptions are used interchangeably. The hypothesis tested in Study 1 was that constructibility is an important attribute of mental cues. If one's house forms a cognitive map (Neisser, 1976; Norman, 1982), then at the time of recall it should be possible to recall words processed with the house task. This is because the mental cues derived from the house map are easily reconstructible. If one's personal experiences are well organized in memory, the same should be true for the experience task (Bellezza, 1984a). But constructibility should be less operative for the mental cues generated in the sound and definition tasks because sounds of randomly selected concrete nouns or the definitions of these nouns are not organized in any systematic manner in memory. For these two tasks, schemas are not available to mediate the organization of words in episodic memory. Other theoretical approaches to learning make different predictions regarding the effectiveness of these four tasks. The definition task does require a great deal of semantic processing. If depth or level of processing is an important determinant of free recall, as opposed to mental cuing, then the definition task should result in high levels of recall (Craik & Lockhart, 1972; Craik & Tulving, 1975). The definition task may also result in the maximum discriminability of episodic memory codes because the creation of dictionary definitions requires the identification of the unique properties of each of the defined words. 
If recall performance is good in the definition task, that would support the idea that the distinctiveness of the memory code is an important determinant of recall (Jacoby & Craik, 1979).
1. Method
Four different list forms were made up, each consisting of 48 concrete nouns randomly sampled from Toglia and Battig (1978). Each list was based on the same sample of nouns presented in the same order. Each noun had a value between 5.0 and 7.0 on the dimensions of concreteness, imagery, and meaningfulness and had a value between 5.5 and 7.0 on the dimension of familiarity. Next to each word was printed one of the four tasks: similar-sounding word, write a definition, a personal experience, or where in your house? Each task occurred only once in every successive set of four words, but appeared in a different random order for each set. The list forms were created so that each word was paired with each task once across the four forms. The noun task items were printed in booklets, and the subjects spent 30 sec writing down a response to each
noun task combination. The subjects were paced through the booklets by the experimenter and were instructed to write down a maximum of about 10 words in each of their responses. After they wrote down their verbal response and within the 30-sec period allowed, they also had to rate how difficult it was to generate a response for the noun task combination. A rating of 7 indicated that the task for the noun was very difficult, and a rating of 1 indicated that the task was very easy. Subjects were presented a practice list of four words to practice each task once. When the 48 noun task items in the main list were completed, the booklets were collected and a blank booklet was handed out. The subjects were instructed to write down both the list words and their own previous responses to them. They were to write down the list words and responses in two parallel columns. It was emphasized that if a list word, but not the response given to it, could be recalled, the list word should nevertheless be written down. Similarly, if a response, but not the list word that elicited it, could be remembered, the response should be written down. The subjects were given as much time as needed for this free-recall task. A total of 54 subjects were tested in two sessions. Approximately the same number of subjects were tested on each of the four list forms.
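The counterbalancing constraints stated in the Method (each task once in every successive set of four words, a different random task order per set, and every word paired with every task once across the four list forms) can be satisfied in more than one way. The sketch below shows one possibility, a cyclic Latin-square rotation of a randomly generated base form. The source does not say how the forms were actually constructed, so the procedure, the function name `build_list_forms`, and the placeholder nouns are all assumptions made for illustration.

```python
import random

TASKS = ["sound", "definition", "experience", "house"]


def build_list_forms(words, tasks=TASKS, seed=0):
    """Return one list form per task rotation; each form is a list of (word, task)."""
    k = len(tasks)
    if len(words) % k != 0:
        raise ValueError("number of words must be a multiple of the number of tasks")
    rng = random.Random(seed)
    # Base assignment: an independent random task order for every block of k words.
    base = []
    for _ in range(len(words) // k):
        order = list(range(k))
        rng.shuffle(order)
        base.extend(order)
    # Form f shifts every base task index by f (a cyclic rotation), so each block
    # still contains every task once and each word meets each task across forms.
    return [
        [(w, tasks[(t + f) % k]) for w, t in zip(words, base)]
        for f in range(k)
    ]


words = [f"noun{i:02d}" for i in range(48)]  # placeholders, not the actual nouns
forms = build_list_forms(words)
print(forms[0][:4])
```

Rotating a single random base form keeps the word order identical across forms, as in the study, while guaranteeing that the task factor is fully crossed with items.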
2. Results
A one-way analysis of variance was performed on the dependent variables, with processing task as the one within-subjects factor. The results are shown in Table I.

TABLE I
DIFFICULTY, ELABORATION, AND FREE-RECALL MEASURES FOR THE FOUR TASKS USED IN STUDY 1

Measure              Sound    Definition    Experience    House
Difficulty           2.03     2.48          2.14          2.16
Elaboration          1.14     6.58          6.95          4.65
Mental-cue recall     .32      .39           .41           .63
List-word recall      .34      .41           .44           .65

a. Difficulty Ratings. The means for the difficulty ratings were significantly different across the four processing tasks, F(3, 159) = 8.61, MSe = .234, p < .001. Using Tukey’s HSD tests (Kirk, 1982), it was found that the definition task was rated as the most difficult, with the other three tasks not significantly different from one another.
b. Elaboration. The mean number of written words per response also differed between tasks, F(3, 159) = 270.39, MSe = 1.38, p < .001. Tukey’s HSD tests showed that the definition and experience tasks resulted in a larger number of words per response than did the house task. The task in which subjects had to generate a similar-sounding word resulted in the fewest words per response.
c. Recall. There was a significant effect of task on the number of list words recalled, F(3, 159) = 43.38, MSe = .022, p < .001. Tukey’s HSD tests showed that more words were recalled in the house task than in the definition or experience tasks, which were not significantly different from each other. Recall was poorest following the task where subjects generated a similar-sounding word. The analysis of variance for recall of the subjects’ own responses gave results very similar to the statistical results resulting from analysis of the list words. As can be seen from Table I, the recall means of the list words and subject responses were almost identical in each task. When a list word was recalled, its subject-generated response was also likely to be recalled, and vice versa. Similar results have been reported by Bellezza (1984a, Experiment 2).
3. Discussion
The level of free recall in the house task was about 53% higher than recall in the definition and experience tasks and about 91% greater than in the task that required similar-sounding words be generated. The most important characteristic of the house task was that the subjects utilized the cognitive map of their house during learning. When the cognitive map was again activated at recall, the components generated from it had become associated with the recently presented list words and could act as recall cues. Thus, the mental cues utilized in the house task were more constructible than those cues generated by any of the other three tasks. Other investigators have suggested that personal experiences are effective mediators for free recall (Bower & Gilligan, 1979), but personal experiences do not seem to be organized in memory as well as other schemas (Bellezza, 1984a), such as the schema for one’s own dwelling. The results of Study 1 support the notion that the recall of the list items was mediated by the mental cues described in written reports collected during learning. Also, these cues showed a high degree of associability. Whenever a mental cue was recalled, the list word associated with that cue was also recalled. One could argue that the list word was first recalled, and the verbal report was recreated using the list word. However, the significant differences in the levels of recall found among the various tasks do not support this counterargument. Theories of learning that emphasize levels of processing propose that semantic processing will result in better recall than nonsemantic processing. To some extent, this occurred in Study 1 because the task in which subjects generated words similar in sound to the list words resulted in the poorest recall. A levels-of-processing approach, however, cannot account for all the results. Some investigators have suggested that items with the most distinctive memory codes are the
most retrievable from memory (Jacoby, Craik, & Begg, 1979). Of the words in the four tasks, the words in the definition task should have had the most distinctive episodic memory codes. When giving a dictionary definition, a person must access in permanent memory those properties of the defined word that most clearly distinguish it from the other items on the list. If discriminability is of paramount importance, then the definition task should result in the greatest amount of processing. It did indeed result in the greatest amount of semantic processing, for the definition task was rated as significantly more difficult than the other three tasks. However, the definition task did not result in as much recall as did the house task. The definitions generated by the subjects were not organized in memory and therefore could not be reconstructed during recall to form an effective set of mental cues. Another hypothesis related to the levels-of-processing approach is that those items that are most broadly processed (Craik & Tulving, 1975) or elaborated (Anderson & Reder, 1979) will be best recalled. One measure of elaboration is the mean number of words per response generated by the subjects in the four tasks. The definition and experience tasks resulted in a significantly greater number of words per response than did the house task, but the house task resulted in a greater level of free recall. The obvious result of Study 1 is that the house task resulted in the best free recall because the house task involved the most constructible mental cues.
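The percentage advantages quoted at the start of the Study 1 discussion can be checked against the list-word recall means in Table I. The short calculation below is a sketch under one assumption that the source does not state explicitly: that the 53% figure compares the house task with the average of the definition and experience means. The helper name `percent_higher` is introduced here only for illustration.

```python
# List-word recall proportions taken from Table I
recall = {"sound": 0.34, "definition": 0.41, "experience": 0.44, "house": 0.65}


def percent_higher(a, b):
    """Percentage by which proportion a exceeds proportion b."""
    return 100.0 * (a - b) / b


# Assumption: definition and experience are pooled by simple averaging.
pooled = (recall["definition"] + recall["experience"]) / 2
house_vs_pooled = percent_higher(recall["house"], pooled)
house_vs_sound = percent_higher(recall["house"], recall["sound"])

print(round(house_vs_pooled), round(house_vs_sound))  # 53 91
```

Under that assumption the tabled means reproduce both reported figures, which suggests the percentages were computed from list-word recall rather than mental-cue recall.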
B. STUDY 2: ASSOCIABILITY, DISCRIMINABILITY, AND INVERTIBILITY
In Study 1, written reports of the contents of conscious memory were collected for list words during both learning and recall. It was found that a list word was almost always recalled when the mental events accompanying its encoding were recalled, and vice versa. This was taken as evidence that the mental events functioning as mental cues had a high degree of associability. In Study 2 a more direct assessment of the associability of mental cues was made. The presentation procedures and materials were the same as those used in Study 1, but there was a 3-day retention interval preceding free recall. Following free recall, each subject was given two types of recall cues. One set of cues consisted of half the nouns previously presented. For these the subject had to provide the same verbal report for each noun as he or she provided 3 days previously. The second set of cues was made up of the written reports the subject gave for the other half of the nouns. To these he or she had to give the list word for which the written report had been made. The written reports presented to each subject were always his or her own. Using this procedure, a direct measure of associability could be made by determining how well the verbal descriptions generated in each task elicited the original list words. Also, a measure of the invertibility of the mental cues
could be determined by computing how well the verbal reports elicited the original list words versus how well the list words regenerated the verbal descriptions given 3 days earlier.
1. Method
The procedure and materials were the same as those used in Study 1, but only up to the point where the booklets containing the written responses to the words were collected from the subjects. At this point, the subjects were dismissed and reminded to return 3 days later for the second part of the experiment. No mention was made of what the second session would entail. During the 3-day interval a cued-recall test booklet was made up for each subject. Half the words in each task were listed as recall cues. Also, the written descriptions made in response to the other half of the list words were included as recall cues. If the list word was contained in the written description, then a blank was substituted for it in the description. When the subjects returned for testing, they were given a blank booklet and the free-recall instructions used in Study 1. After 10 min these booklets were collected and the cued-recall tests handed out. The cues were arranged in the booklet so that it was clear which cue was a written description requiring a list word as a response and which cue was a list word requiring the description given 3 days earlier. The subject was not told what task had been originally associated with each cue, although when the written descriptions were cues, this information could often be inferred. A total of 40 subjects were tested in two sets of sessions.
2. Results
a. Free Recall. An analysis of variance on the proportion of list words free recalled showed a significant effect of task, F(3, 117) = 13.02, MSe = .011, p < .001. A similar result was obtained when analyzing the proportion of written descriptions recalled, F(3, 117) = 19.76, MSe = .010, p < .001. These proportions are shown in Table II. Tukey HSD tests showed that more list words and written descriptions of mental context were recalled in the house task than in any of the other three tasks. The latter were not significantly different from each other.
These free-recall results are similar to those obtained in Study 1, except that the levels of recall are lower. As in Study 1, recall of a list word was almost always accompanied by recall of its associated written description, and vice versa.
b. Cued Recall. The proportion of items recalled in the cued-recall test for each task is also shown in Table II. A 4 × 2 analysis of variance was performed on these proportions, with the first factor being task and the second factor being cue type. Both factors were within-subjects factors. Task was significant, F(3, 117) = 26.59, MSe = .047, p < .001. Tukey HSD tests showed that overall
TABLE II
FREE-RECALL AND CUED-RECALL MEASURES FOR THE FOUR TASKS USED IN STUDY 2
Measure Free recall Mental-cue recall List-word recall Cued recall List-word recall Mental-cue recall Correct-task recall
Sound
Definition
Experience
House
.10 .13
.14 .16
.12 .13
.26 .26
.46 .53
.83 .74 .77
.67
.53 .59 .67
.64
.58
.65
cued recall in the definition task (.79) was superior to any of the other three tasks: experience (.62), house (.56), or similar-sounding word (.50). The only other significant difference was between the experience and the similar-sounding-word task. The main effect of cue type was not significant, F < 1.
3. Discussion
The pattern of cued-recall results is markedly different from that of free recall. Why should this be? The answer is that the most important property of a set of mental cues in free recall is constructibility. If the mental cues cannot be constructed at recall, then the other properties of the mental cues do not have the opportunity to influence performance. However, in the tests of cued recall, written descriptions of the mental cues were used to activate their mental representations in the subject’s conscious memory. The property of constructibility was no longer necessary because the cues were presented to the subject by the experimenter. The best cued-recall performance occurred in the definition task. This was the case regardless of whether the written description was presented as a cue for the list word or the list word was presented as a cue for the written description. Hence, the associability of the mental cues in the definition task was greater than the associability of the cues in any of the other three tasks. Furthermore, the mental cues in all four tasks showed invertibility. Each written response described the mental context elicited by a combination of list word and task during the learning phase. Later, list words were again presented to cue the written descriptions, and written descriptions were presented to cue the original list words. If one type of presented cue had been more effective than the other, this would provide evidence for directionality in the association. However, there was no significant difference in the effectiveness of these types of cues, so bidirectionality of association was found. It should be noted that in Study 2 the degree of invertibility varied somewhat from task to task. Although there was no main effect for cue type, there was a marginally significant task by cue type interaction. It would not be surprising if future research found invertibility to be better for some types of learning cues than for others. The definition task resulted in the best cued recall because of the strong association between the list words and the definitions generated by the subjects. However, associability is not the only property of mental cues that is relevant here. The subjects correctly recalled the definitions from the words and the words from the definitions, but the subjects may not have been aware that these were the same words and definitions used 3 days earlier; that is, it is possible for the subject to generate the correct response from semantic memory without recognizing the word as having occurred at some earlier time (Tulving & Thomson, 1973, Experiment 3). Because of this possibility, the question can be raised as to whether these mental cues are discriminable in episodic memory. As mentioned earlier, mental cues must be discriminable on a number of dimensions in order to be effective. These dimensions are often semantic ones dealing with the meaning of the words. However, the relevant dimension can be the temporal-contextual dimension that is important to the functioning of episodic memory. Because the same set of words may function as a mental cue for many episodes in a person’s life, he or she must distinguish among many instances of the same mental cue by using temporal-contextual attributes. How discriminable from episodes involving the same words were the mental events in Study 2 after a retention interval of 3 days?
When the written descriptions were presented as cues, it was often clear from the description what the original task was. However, when the list words were presented, the subjects were not told what the task had been for each word. They had to look at each word and remember the response they had given to it 3 days previously. They simply could not write down a definition for every word, because four different tasks were involved. The last row of Table II indicates how often subjects gave a response representing the correct task category regardless of whether their written description was correct. As can be seen, these proportions are much larger than the chance value of .25 and are generally larger than the proportion of correct descriptions recalled. It appears that subjects could sometimes recall the task associated with the word but not make the correct response. The tasks could be more easily recalled than the particular written response first given to the word. Both the presented word and the recalled task functioned as cues for recall of the written description. It can be concluded that the episodic codes containing the words were successfully discriminated from other personal episodes of the subjects in which the list word was a part.
VI. Limitations of Verbal Reports about Learning
It is proposed that the analysis of verbal reports is useful for a better understanding of human learning. The reasons for this are as follows: (1) Much of human learning consists of the creation of new chunks in episodic memory by the interconnecting of cognitive symbols in conscious memory. These symbols represent a mixture of newly perceived information and information already organized in permanent memory. Some of these symbols can be later regenerated by the cognitive system as recall cues. (2) The symbols in conscious memory can, to some extent, be verbally or linguistically described. (3) The verbal data provided permit the investigator to better understand by experimental manipulation the information contributed by the subject’s cognitive system during learning.
Controversy regarding the nature of verbal reports has existed from the time of the Würzburg School’s discovery of imageless thought (Humphrey, 1963) to the present day (Ericsson & Simon, 1980; Nisbett & Ross, 1980; Nisbett & Wilson, 1977). The use of verbal reports in present-day psychology should not be thought of as a reversion to the method of introspection favored by structural psychologists such as Titchener. The structuralists believed that the study and understanding of the contents of consciousness was the primary goal of psychology and not simply one aspect of psychology’s methodology. They argued that conscious experience must be analyzed into irreducible elements such as sensations, images, and feelings. Also, each element had attributes such as quality, intensity, and extensity (MacLeod, 1964). Contemporary cognitive psychologists tend to regard verbal reports as another source of data, useful for the study of mental processes and their relation to language and behavior (Hilgard, 1980).
A. PROCEDURAL AND DECLARATIVE KNOWLEDGE
The distinction between what is called declarative versus what is called procedural knowledge has been made by a variety of investigators (Anderson, 1976; Ryle, 1949; Winograd, 1975). Declarative knowledge is knowledge in symbolic form stored in permanent memory. Anderson (1983, Chap. 2) proposes three types of information that can make up declarative knowledge and thus become part of the content of conscious memory. These are abstract propositions, spatial images, and temporal strings. When any one of these three types of knowledge is accessed or activated, the learner immediately becomes aware of it and its related information in the same cognitive unit. In addition to declarative knowledge, there is another kind of knowledge called procedural knowledge. This knowledge may not be stored in a form that makes us aware of it when we use it. Our knowledge of grammar is such a type of knowledge. We create both original and grammatically correct sentences often without being able to explain how we do it. Other examples of procedural knowledge are those knowledge sets necessary
for reading, writing, tying a bow, or driving an automobile. Simple memory processes may also represent procedural knowledge. In the act of retrieving information from permanent memory, we are aware only of the result of this act of retrieval. Although some of the symbols in conscious memory may be acting as retrieval cues, we are not aware of the process by which information is being retrieved from permanent memory. There seems to be agreement among investigators that verbal reports reflect the symbolic contents of conscious memory, but tell us little about the memory retrieval processes themselves (Ericsson & Simon, 1980; Nisbett & Wilson, 1977; Read & Bruce, 1982). Procedures that initially are accompanied by a large number of mental events may gradually be performed with little or no preceding thought. An example of this is learning to play a piano or learning to touch-type. When a procedure can be performed without having to think about it, we say that it has become automatic (Shiffrin & Schneider, 1977). Anderson (1982) suggests that all skill acquisition consists of two stages: The first is a declarative stage in which knowledge about the skill is in propositional form that must be interpreted before the skill can be performed. In this stage the information that the person utilizes can be expressed verbally. In fact, this information is rehearsed and used in a piecemeal fashion to perform the skill. The second stage is procedural. Through a series of processes that take place with practice, the skills become faster and more efficient. Furthermore, the skills become automatic; that is, the skills are performed without the declarative knowledge controlling the skill first being activated in conscious memory and then being interpreted. Even after a skill becomes automatic, the declarative knowledge used to first perform it may sometimes be available in permanent memory. Often, however, this knowledge is forgotten. 
For example, try to explain to someone how to tie a bow. There also seem to be learning situations in which little or no declarative knowledge is available when the skill is first being learned. This is true of many motor skills. Developing a skill such as throwing darts, dribbling a basketball, or learning to ride a bicycle seems to involve certain component skills that are not initially accompanied by required mental events; that is, learning does not consist of new organizations of mental symbols in the cognitive system, but rather involves the development of perceptual-motor skills not paralleled by new symbol organization. Learning occurs in the procedural-knowledge system that is independent of the declarative-knowledge system. Each improvement based on procedural knowledge is not necessarily preceded or accompanied by some mental event signaling that improvement, even though the learner may notice his or her improvement after it has occurred.
B. VOCALIZATION AS A SKILL
The process of overt rehearsal, which involves repeatedly verbalizing information after it has been presented, seems to be a fairly complex skill (Flavell,
Friedrichs, & Hoyt, 1970). Rehearsal does not spontaneously occur in a verbal learning task when the subjects are young children or mentally retarded. Yet training in rehearsal improves memory performance for both the young children (Kenny, Cannizzo, & Flavell, 1967) and the retarded (Brown, Campione, Bray, & Wilcox, 1973). Furthermore, unrestricted reporting of the contents of conscious memory may be more difficult than the overt rehearsal of only externally presented information. For this reason, researchers often instruct the subject to report only material previously presented and still in conscious memory. However, vocalization with respect to learning and problem solving involves the reporting of as much information from conscious memory as possible, including those elaborations contributed by the subject. It should be kept in mind that this type of vocalization may itself be a complex skill developed over a long period of time.
C. CONDITIONING AND AWARENESS
A controversy related to the meaning of verbal reports concerns whether different types of human conditioning occur without awareness. Attention has been focused on three learning paradigms: verbal mediation in transfer experiments (Bugelski & Scharlock, 1952), classical conditioning (Brewer, 1974), and operant conditioning (Greenspoon, 1962). The problem of conditioning and awareness is a complex one, and reviews are provided by Brewer (1974) and Spielberger (1965). The issue of awareness is equivalent to the issue of whether the learner can accurately report the psychological processes responsible for learning. In most cases, awareness means that the learner must be able to report the process or relation that is critical for conditioning to occur. Prytulak (1971) argues that verbal transfer must be accompanied by awareness, and Brewer (1974) provides an extensive review of the conditioning literature and concludes that awareness is necessary for conditioning to occur in humans.
But the question may not be resolved (Dulany, 1974). As might be expected, many of the inconsistencies in experimental results have arisen from inadequacies in assessing awareness. As Brewer points out, many experimenters have not reported the procedure or test by which awareness was assessed.
D. VERBAL REPORTS AND AFFECT
In addition to cognitive symbols, affective symbols representing current feelings are also active during cognition. Bower has proposed that cognitive symbols and affective symbols become interlinked in memory in a manner similar to the interconnection of cognitive symbols themselves (Bower, 1981; Bower & Cohen, 1982). However, the association of affective and cognitive components may complicate the process of thinking, learning, and verbalization. Because of the affect associated with some information in permanent memory, people may not be able to verbally report this information. Consequently, the kind of learning
necessary for healthy cognitive functioning will not occur. According to Dollard and Miller (1950) the term repression refers to the automatic tendency to stop thinking about certain anxiety-producing information and avoid the necessity of activating it into consciousness. It is implicitly assumed that these negative feelings increase as we become more aware of the associated symbols and diminish as we become less aware of them. Whether the process of repression actually occurs in this manner is still uncertain (Holmes, 1974). However, it seems reasonable to assume that any person will have difficulty verbalizing information that has been associated with very negative feelings.
ACKNOWLEDGMENTS
Portions of this research were presented at the meetings of the Psychonomic Society in San Diego, California, during November 1983 and in San Antonio, Texas, during November 1984. This research is supported in part by a grant from the Field-Wiltsie Foundation. Thanks go to Steven J. Lynn and Hal Arkes for their helpful comments on an earlier version of this article and to Kathy Sandy and Daniel Scully for their assistance in collecting and scoring the data. Thanks also go to Ohio Computer and Learning Services for making computer time and their facilities available.
REFERENCES
Adams, J. A., & McIntyre, J. S. (1967). Natural language mediation and all-or-none learning. Canadian Journal of Psychology, 21, 436-449.
Anderson, J. R. (1976). Language, memory, and thought. Hillsdale, NJ: Erlbaum.
Anderson, J. R. (1980). Concepts, propositions, and schemata: What are the cognitive units? In H. E. Howe, Jr. & J. H. Flowers (Eds.), Nebraska Symposium on Motivation (Vol. 28, pp. 121-162). Lincoln, NE: Univ. of Nebraska Press.
Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369-406.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard Univ. Press.
Anderson, J. R., & Bower, G. H. (1972). Recognition and retrieval process in free recall. Psychological Review, 79, 97-123.
Anderson, J. R., & Bower, G. H. (1974). A propositional theory of recognition memory. Memory and Cognition, 2, 406-412.
Anderson, J. R., & Reder, L. M. (1979). An elaborative processing explanation of depth of processing. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human memory (pp. 385-403). Hillsdale, NJ: Erlbaum.
Anderson, R. C. (1984). Some reflections on the acquisition of knowledge. Educational Psychologist, 13, 5-10.
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation (Vol. 2, pp. 89-195). New York: Academic Press.
Battig, W. F. (1968). Paired-associate learning. In T. R. Dixon & D. L. Horton (Eds.), Verbal behavior and general behavior theory (pp. 149-171). Englewood Cliffs, NJ: Prentice-Hall.
Battig, W. F., & Montague, W. E. (1969). Category norms of verbal items in 56 categories: A replication and extension of the Connecticut category norms. Journal of Experimental Psychology Monographs, 80(3, Pt. 2).
Bellezza, F. S. (1981). Mnemonic devices: Classification, characteristics, and criteria. Review of Educational Research, 51, 247-275.
Bellezza, F. S. (1982). Updating memory using mnemonic devices. Cognitive Psychology, 14, 301-327.
Bellezza, F. S. (1983a). Recalling script-based text: The role of selective processing and schematic cues. Bulletin of the Psychonomic Society, 21, 267-270.
Bellezza, F. S. (1983b). The spatial-arrangement mnemonic. Journal of Educational Psychology, 75, 830-837.
Bellezza, F. S. (1984a). The self as a mnemonic device: The role of internal cues. Journal of Personality and Social Psychology, 47, 506-516.
Bellezza, F. S. (1984b). Reliability of retrieval from semantic memory: Common categories. Bulletin of the Psychonomic Society, 22, 324-326.
Bellezza, F. S., & Bower, G. H. (1982). Remembering script-based text. Poetics, 11, 1-23.
Bellezza, F. S., & Hartwell, T. C. (1981). Cuing subjective units. The Journal of Psychology, 107, 209-218.
Bellezza, F. S., & Poplawsky, A. J. (1974). The function of one-word mediators in the recall of word pairs. Memory and Cognition, 2, 447-452.
Bellezza, F. S., Poplawsky, A. J., & Aronovsky, L. A. (1977). The functional role of one-word mediators. Bulletin of the Psychonomic Society, 10, 460-462.
Bellezza, F. S., & Walker, R. J. (1974). Storage-coding trade-off in short-term store. Journal of Experimental Psychology, 102, 629-633.
Bobrow, S. A., & Bower, G. H. (1969). Comprehension and recall of sentences. Journal of Experimental Psychology, 80, 455-461.
Boring, E. G. (1950). A history of experimental psychology. New York: Appleton.
Bousfield, W. A., & Cohen, B. H. (1953). The effects of reinforcement on the occurrence of clustering in the recall of randomly arranged associates. Journal of Psychology, 36, 67-81.
Bower, G. H. (1972a). A selective review of organizational factors in memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory (pp. 93-137). New York: Academic Press.
Bower, G. H. (1972b). Perceptual groups as coding units in immediate memory. Psychonomic Science, 27, 217-219.
Bower, G. H. (1972c). Mental imagery and associative learning. In L. Gregg (Ed.), Cognition in learning and memory (pp. 51-87). New York: Wiley.
Bower, G. H. (1981). Mood and memory. American Psychologist, 36, 129-148.
Bower, G. H., Black, J. B., & Turner, T. J. (1979). Scripts in memory for text. Cognitive Psychology, 11, 177-220.
Bower, G. H., & Clark, M. C. (1969). Narrative stories as mediators for serial learning. Psychonomic Science, 14, 181-182.
Bower, G. H., & Cohen, P. R. (1982). Emotional influences in memory and thinking: Data and theory. In M. S. Clark & S. T. Fiske (Eds.), Affect and cognition: The Seventeenth Annual Carnegie Symposium on Cognition (pp. 291-331). Hillsdale, NJ: Erlbaum.
Bower, G. H., & Gilligan, S. G. (1979). Remembering information related to one’s self. Journal of Research in Personality, 13, 420-432.
Bower, G. H., & Reitman, J. S. (1972). Mnemonic elaboration in multilist learning. Journal of Verbal Learning and Verbal Behavior, 11, 478-485.
Brewer, W. F. (1974). There is no convincing evidence for operant or classical conditioning in adult humans. In W. B. Weimer & D. S. Palermo (Eds.), Cognition and the symbolic processes (Vol. 1, pp. 1-42). Hillsdale, NJ: Erlbaum.
Brown, A. L., Campione, J. C., Bray, N. W., & Wilcox, B. L. (1973). Keeping track of changing variables: Effects of rehearsal training and rehearsal prevention in normal and retarded adolescents. Journal of Experimental Psychology, 101, 123-131.
Bugelski, B. R., & Scharlock, D. P. (1952). An experimental demonstration of unconscious mediated association. Journal of Experimental Psychology, 44, 334-338.
Buschke, H. (1968). Perceiving and encoding two kinds of item-information. Perception and Psychophysics, 3, 331-336.
Buschke, H., & Hinrichs, J. V. (1968). Controlled rehearsal and recall order in serial list retention. Journal of Experimental Psychology, 78, 502-509.
Chase, W. G., & Ericsson, K. A. (1981). Skilled memory. In J. R. Anderson (Ed.), Cognitive skills and their application (pp. 141-189). Hillsdale, NJ: Erlbaum.
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671-684.
Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268-294.
Day, J. C., & Bellezza, F. S. (1983). The relation between visual-imagery mediators and recall. Memory and Cognition, 11, 251-257.
Delprato, D. J., & Baker, E. J. (1974). Concreteness of pegwords in two mnemonic systems. Journal of Experimental Psychology, 102, 521-522.
Dollard, J., & Miller, N. E. (1950). Personality and psychotherapy. New York: McGraw-Hill.
Dulany, D. E. (1974). On the support of cognitive theory in opposition to behavior theory: A methodological problem. In W. B. Weimer & D. S. Palermo (Eds.), Cognition and the symbolic processes (Vol. 1, pp. 43-56). Hillsdale, NJ: Erlbaum.
Ebbinghaus, H. (1964). Memory: A contribution to experimental psychology (H. A. Ruger & C. E. Bussenius, Trans.). New York: Dover. (Original work published 1885)
Ekstrand, B. (1966). Backward (R-S) associations. Psychological Bulletin, 65, 50-64.
Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87, 215-251.
Flavell, J. H., Friedrichs, A. G., & Hoyt, J. D. (1970). Developmental changes in memorization processes. Cognitive Psychology, 1, 324-340.
Geiselman, R. E., & Bellezza, F. S. (1977). Eye movements and overt rehearsal in word recall. Journal of Experimental Psychology: Human Learning and Memory, 3, 305-315.
Geiselman, R. E., Woodward, J. A., & Beatty, J. (1982). Individual differences in verbal memory performance: A test of alternative information-processing models. Journal of Experimental Psychology: General, 111, 109-134.
Gibson, E. J. (1969). Principles of perceptual learning and development. New York: Appleton.
Gilmartin, K. J., Simon, H. A., & Newell, A. (1976). A program modeling short-term memory under strategy control. In C. M. Cofer (Ed.), The structure of human memory (pp. 15-30). San Francisco: Freeman.
Graesser, A. C. (1981). Prose comprehension beyond the word. New York: Springer-Verlag.
Graesser, A. C., Woll, S. B., Kowalski, D. J., & Smith, D. A. (1980). Memory for typical and atypical actions in scripted activities. Journal of Experimental Psychology: Human Learning and Memory, 6, 503-515.
Greenspoon, J. (1962). Verbal conditioning and clinical psychology. In A. J. Bachrach (Ed.), Experimental foundations of clinical psychology (pp. 510-553). New York: Basic Books.
Greenwald, A. G. (1968). Cognitive learning, cognitive response to persuasion, and attitude change. In A. G. Greenwald, T. C. Brock, & T. M. Ostrom (Eds.), Psychological foundations of attitudes (pp. 147-170). New York: Academic Press.
Greenwald, A. G. (1981). Self and memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 15, pp. 201-236). New York: Academic Press.
Hatano, G., Miyake, Y., & Binks, M. G. (1977). Performance of expert abacus operators. Cognition, 5, 57-71.
Hilgard, E. R. (1980). Consciousness in contemporary psychology. Annual Review of Psychology, 31, 1-26.
Holmes, D. S. (1974). Investigations of repression: Differential recall of material experimentally or naturally associated with ego threat. Psychological Bulletin, 81, 632-653.
Humphrey, G. (1963). Thinking: An introduction to its experimental psychology. New York: Wiley. (Original work published in 1952)
Jacoby, L. L., & Craik, F. I. M. (1979). Effects of elaboration of processing at encoding and retrieval: Trace distinctiveness and recovery of initial context. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human memory (pp. 1-21). Hillsdale, NJ: Erlbaum.
Jacoby, L. L., Craik, F. I. M., & Begg, I. (1979). Effects of decision difficulty on recognition and recall. Journal of Verbal Learning and Verbal Behavior, 18, 585-600.
James, W. (1950). The principles of psychology. New York: Dover. (Original work published in 1890)
Johnson, M. K., & Raye, C. L. (1981). Reality monitoring. Psychological Review, 88, 67-85.
Johnson, N. J. (1972). Organization and the concept of a memory code. In A. W. Melton & E. Martin (Eds.), Coding processes in human memory (pp. 125-159). Washington, DC: Winston.
Kenny, T. J., Cannizzo, S. R., & Flavell, J. H. (1967). Spontaneous and induced verbal rehearsal in a recall task. Child Development, 38, 953-966.
Kirk, R. E. (1982). Experimental design: Procedures for the behavioral sciences (2nd ed.). Belmont, CA: Wadsworth.
Landauer, T. K. (1962). Rate of implicit speech. Perceptual and Motor Skills, 15, 646.
Lea, G. (1975). Chronometric analysis of the method of loci. Journal of Experimental Psychology: Human Perception and Performance, 1, 95-104.
MacLeod, R. B. (1964). Phenomenology: A challenge to experimental psychology. In T. W. Wann (Ed.), Behaviorism and phenomenology (pp. 47-78). Chicago: Univ. of Chicago Press.
Mandler, G. (1967). Organization and memory. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation (Vol. 1, pp. 327-372). New York: Academic Press.
Mandler, G. (1975). Memory storage and retrieval: Some limits on the reach of attention and consciousness. In P. M. A. Rabbit & S. Dornic (Eds.), Attention and performance (Vol. 5, pp. 499-516). New York: Academic Press.
Mandler, J. M. (1984). Stories, scripts, and scenes: Aspects of schema theory. Hillsdale, NJ: Erlbaum.
McGuire, W. J. (1961). A multiprocess model for paired-associate learning. Journal of Experimental Psychology, 62, 335-347.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits in our capacity for processing information. Psychological Review, 63, 81-97.
Miller, G. A. (1962). Psychology: The science of mental life. New York: Harper & Row.
Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the structure of behavior. New York: Holt.
Miller, G. A., & Selfridge, J. A. (1950). Verbal context and the recall of meaningful material. American Journal of Psychology, 63, 176-187.
Minsky, M. (1975). A framework for representing knowledge. In P. H. Winston (Ed.), The psychology of computer vision (pp. 211-277). New York: McGraw-Hill.
Montague, W. E. (1972). Elaborative strategies in verbal learning and memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 6, pp. 225-302). New York: Academic Press.
Montague, W. E., Adams, J. A., & Kiess, H. O. (1966). Forgetting and natural language mediation. Journal of Experimental Psychology, 72, 829-833.
Morris, P. E., & Reid, R. L. (1970). The repeated use of mnemonic imagery. Psychonomic Science, 20, 337-338.
Neisser, U. (1976). Cognition and reality. San Francisco: Freeman.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Nisbett, R. E., & Ross, L. (1980). Human inference: Strategies and shortcomings of social judgment. Englewood Cliffs, NJ: Prentice-Hall.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.
Norman, D. A. (1982). Learning and memory. San Francisco: Freeman.
Norman, D. A., & Bobrow, D. G. (1979). Descriptions: An intermediate stage in memory retrieval. Cognitive Psychology, 11, 107-113.
Paivio, A. (1969). Mental imagery in associative learning and memory. Psychological Review, 76, 241-263.
Paivio, A. (1971). Imagery and verbal processes. New York: Holt.
Paivio, A., & Foth, D. (1970). Imaginal and verbal mediators and noun concreteness in paired-associate learning: The elusive interaction. Journal of Verbal Learning and Verbal Behavior, 9, 384-390.
Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology Monograph, 76(1, Pt. 2).
Postman, L. (1971). Transfer, interference, and forgetting. In J. W. Kling & L. A. Riggs (Eds.), Woodworth and Schlosberg’s experimental psychology (pp. 1019-1132). New York: Holt.
Prytulak, L. S. (1971). Natural language mediation. Cognitive Psychology, 2, 1-56.
Read, J. D., & Bruce, D. (1982). Longitudinal tracking of difficult memory retrievals. Cognitive Psychology, 14, 280-300.
Reddy, B. G., & Bellezza, F. S. (1983). Encoding specificity in free recall. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 167-174.
Rumelhart, D. E. (1980). Schemata: The building blocks of cognition. In R. Spiro, B. Bruce, & W. Brewer (Eds.), Theoretical issues in reading comprehension (pp. 33-58). Hillsdale, NJ: Erlbaum.
Rundus, D. (1971). Analysis of rehearsal processes in free recall. Journal of Experimental Psychology, 89, 63-77.
Rundus, D., & Atkinson, R. C. (1970). Rehearsal processes in free recall: A procedure for direct observation. Journal of Verbal Learning and Verbal Behavior, 9, 99-105.
Ryle, G. (1949). The concept of mind. New York: Harper & Row.
Schank, R., & Abelson, R. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ: Erlbaum.
Schulz, R. W., & Lovelace, E. A. (1964). Mediation in verbal paired-associate learning: The role of temporal factors. Psychonomic Science, 1, 95-96.
Shiffrin, R. M. (1976). Capacity limitations in information processing, attention, and memory. In W. K. Estes (Ed.), Handbook of learning and cognitive processes (Vol. 4, pp. 177-236). Hillsdale, NJ: Erlbaum.
Shiffrin, R. M., & Atkinson, R. C. (1969). Storage and retrieval processes in long-term memory. Psychological Review, 76, 179-193.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127-190.
Simon, H. A. (1974). How big is a chunk? Science, 183, 482-488.
Simon, H. A. (1976). The information storage system called “human memory.” In M. R. Rosenzweig & E. L. Bennett (Eds.), Neural mechanisms of learning and memory (pp. 79-96). Cambridge, MA: MIT.
Simon, H. A. (1979). Information processing models of acquisition. Annual Review of Psychology, 30, 363-396.
Smith, E. E., Adams, N., & Schorr, D. (1978). Fact retrieval and the paradox of interference. Cognitive Psychology, 10, 438-464.
Smith, S. M., Glenberg, A. M., & Bjork, R. A. (1978). Environmental context and human memory. Memory and Cognition, 6, 342-353.
Sowa, J. F. (1984). Conceptual structures: Information processing in mind and machine. New York: Addison-Wesley.
Spielberger, C. D. (1965). Theoretical and epistemological issues in verbal conditioning. In S. Rosenberg (Ed.), Directions in psycholinguistics (pp. 149-200). New York: Macmillan.
Sweeney, C. A., & Bellezza, F. S. (1982). Use of the keyword mnemonic in learning English vocabulary words. Human Learning, 1, 155-163.
Thorndyke, P. W., & Hayes-Roth, B. (1979). The use of schemata in the acquisition and transfer of knowledge. Cognitive Psychology, 11, 82-106.
Titchener, E. B. (1909). Lectures on the experimental psychology of the thought processes. New York: Macmillan.
Toglia, M. P., & Battig, W. F. (1978). Handbook of semantic word norms. Hillsdale, NJ: Erlbaum.
Tulving, E. (1962). Subjective organization in free recall of “unrelated” words. Psychological Review, 69, 344-354.
Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory (pp. 381-403). New York: Academic Press.
Tulving, E., & Pearlstone, Z. (1966). Availability versus accessibility of information in memory for words. Journal of Verbal Learning and Verbal Behavior, 5, 381-391.
Tulving, E., & Thomson, D. M. (1973). Encoding specificity and retrieval processes in episodic memory. Psychological Review, 80, 352-373.
Underwood, B. J. (1972). Are we overloading memory? In A. W. Melton & E. Martin (Eds.), Coding processes in human memory (pp. 1-23). Washington, DC: Winston.
Underwood, B. J., Ekstrand, B. R., & Keppel, G. (1965). An analysis of intralist similarity in verbal learning with experiments on conceptual similarity. Journal of Verbal Learning and Verbal Behavior, 4, 447-462.
Underwood, B. J., & Schulz, R. W. (1960). Meaningfulness and verbal learning. Philadelphia: Lippincott.
Warren, H. C. (1921). A history of association psychology. New York: Scribner’s.
Winograd, T. (1975). Frame representation and the declarative/procedural controversy. In D. G. Bobrow & A. Collins (Eds.), Representation and understanding: Studies in cognitive science (pp. 185-210). New York: Academic Press.
Yates, F. A. (1966). The art of memory. London: Routledge & Kegan Paul.
Murray Glanzer and Suzanne Donnenwerth Nolan DEPARTMENT OF PSYCHOLOGY NEW YORK UNIVERSITY NEW YORK, NEW YORK 10003
I. Introduction: Restrictions
This article describes a series of studies concerned with the role of memory mechanisms and, particularly, short-term storage in comprehension of text. The work described had the following restrictions.
A. ANALYSIS OF ONGOING COMPREHENSION
There are many studies that are concerned with the effects of such variables as text organization on subjects’ eventual recall or comprehension of the text. That work leaves open the question of just when the variable has had its effect: during comprehension, during later recall, or at some point in the time separating the two. Our focus was on effects that were measured during the course of comprehension. We routinely measured the subjects’ later recall or comprehension of text, but for the variables we used those later measures were not of primary interest. They also turned out not to be informative in our experimental arrangements.
B. USE OF “NORMAL” TEXT
The texts used were drawn from the types of material ordinarily read by our subjects, college students. They included both narration and exposition. The texts were restricted in specific ways for the purposes of some of the experiments; for example, sentence length was held constant. In all cases, however, the texts used could appear in the usual reading of our subjects without being noticed as remarkable. They were neither simpleminded nor artificial.
C. FOCUS ON “NORMAL” READING
Our concern was with obtaining a picture of the ongoing process of comprehension during the reading of ordinary text. This is a rapid, automatic process for our readers. The work aimed at this rapid, automatic aspect. We intervened with secondary tasks and slowed the subject with a button-pressing task, but only to define the comprehension process as it occurred rapidly and automatically. We tried, particularly in the later experiments, to keep the obvious interventions to a minimum while still obtaining a reasonable amount of information from each experimental trial. The attempt to focus on normal reading is also seen in our use of normal text.
D. MAINTENANCE OF RELATION TO SIMPLE MEMORY TASKS
We assumed that the processes involved in the memory of a simple list of words were also those involved in the comprehension of complex text. This assumption arose in part from the history of the work that started with simple memory tasks. The assumption seems, however, to be a necessary one for any psychologist concerned with either memory or comprehension. The assumption here leads to theorizing that draws on concepts which have been developed and tested in simple memory tasks. It also leads to the use of techniques that have been developed and tested for the study of simple memory performance, such as distractor tasks.
E. POSTULATION OF ENTITIES ONLY AS REQUIRED

A major emphasis in our work was to keep the theoretical structure as simple as possible. This emphasis is closely connected to the attempt to maintain the relation to simple memory tasks. It is seen first in our attempt to keep the postulated contents of short-term storage to a minimum. At one point we thought we could hold the number of items involved in the ongoing processing down to one or two recent sentences held verbatim in short-term storage. The fuller probing of the performance involved in reading revealed the role of additional factors. Thematic or topic information, which had initially been difficult to substantiate in ongoing processing, turned out to be a major factor. Moreover, in order to describe the processing fully, we drew on the concept of cuing of information in long-term storage in addition to the concept of short-term storage. However, both of these concepts were supported by data from simple memory experiments. The view of the comprehension process we arrived at was, not surprisingly, more complicated than the one we started with. However, it remains much simpler than most current views of the process.

The result of the emphasis on simple theoretical structure is that our view contrasts with views that attempt to explain comprehension on the basis of complex cognitive structures. These views are sometimes labeled constructivist. It also contrasts with views that attempt to explain comprehension on the basis of complex and deliberate processes enacted by the reader, which we would label voluntaristic and others would label effortful (Hasher & Zacks, 1979). We recognize that the reader’s knowledge plays an important role at all levels of comprehension, from the interpretation of words to the expectations concerning sequences of events in a story. This knowledge is assumed here, however, to be mediated by the same memory mechanisms that produce the recall of simple word lists. The demonstration that such knowledge plays a role defines a problem. Mechanisms have to be specified and tested.

Another characteristic that differentiates our approach from others is that we have not assumed that readers translate the text into propositions. The form in which internal representations are held remains unclear. Adoption of a particular form at this time seems premature (Anderson, 1978).

The experimental work described in the following will demonstrate what can be done on the basis of the listed restrictions, particularly that of theoretical simplicity. It will show how the work leads to the demonstration of regularities in representative reading tasks. Moreover, the work will show that the approach is flexible enough to lead, when the data call for it, to the formulation of an extended theory and, in turn, to further testable propositions.
II. Background: Preceding Work
A. REGULARITIES IN MEMORY FOR SIMPLE LISTS

Our starting point was earlier work on free recall of word lists. That work had demonstrated a number of strong regularities for long-term storage and regularities for short-term storage (Glanzer, 1972). This division in regularities was the basis for asserting that a separate short-term storage existed. The regularities for long-term storage included the effects of rate of presentation, length of list, number and spacing of repetitions, meaningfulness, intelligence, and aging. Those for short-term storage included the effects of filled delay, presentation modality (auditory vs visual), and grouping. Data were also presented showing that these regularities generalize across a range of memory tasks.

Once the regularities of short-term storage had been established, it was possible to begin the examination of its function in other types of performance, in particular, in text processing. The variables that had a specific effect on short-term storage, such as grouping, suggested that it played a particularly important role in language and text processing.

The concept of short-term storage has been widely used in the analysis of text comprehension. It appears in proposals by a number of investigators (Chafe, 1973; Kieras, 1981; Kintsch & van Dijk, 1978). This does not mean that there is agreement on the use of the concept in this area. Some investigators have questioned the utility of the concept of short-term storage. Others have offered alternative labels: foregrounding (Lesgold, Roth, & Curtis, 1979), primary memory, and working memory (Baddeley & Hitch, 1974). In some cases the alternative labeling has been associated with assertions about the characteristics of the memory, for example, that it varies in the amount of information it can hold.

One popular relabeling that seems to avoid controversy is that of “activation.” What we will talk about as short-term storage could in almost all cases be equally well named “activated portion of memory.” The term has the advantage of not implying that there is a structure involved, that is, a definite physical unit. It has the disadvantage for us of losing sight of the fact that the contents of short-term storage include verbatim representations of recent portions of preceding text. We will hold to the term short-term storage and not go into issues that digress from our main concern: its role in comprehension.

It was decided to look at the role of short-term storage in, specifically, the comprehension of text. The stimulus for the work came in part from the following assertion by Huey (1908):

    It is of the greatest service to the reader or the listener that at each moment a considerable amount of what is being read should hang suspended in the primary memory of the inner speech. It is doubtless true that without something of this there should be no comprehension of speech at all. When a considerable amount is thus suspended, the attention may wander backward and forward to get a fuller meaning where this is needed, with no fear of losing the minor parts, which are taken care of physiologically and may be taken into the focus of consciousness at will. (p. 148)
The picture of comprehension that we derived will be less voluntaristic than Huey’s, but will support his assertion concerning the usefulness of short-term storage. Several background investigations helped determine characteristics of short-term storage relevant to comprehension of text. The first was concerned with the amount of material that can be held in short-term storage. Can it carry a sufficient amount of information to play a role in intersentence processing?
B. THE NUMBER AND KIND OF UNITS IN SHORT-TERM STORAGE

A series of investigations (Glanzer & Razel, 1974) was carried out to determine the number and kind of units held in short-term storage. As a preliminary step, a survey was carried out on a range of studies on memory for lists of unrelated words. Analysis of the results of that survey indicated that short-term storage held an average of two units. The surveyed studies used lists of unrelated words, and so we inferred that the capacity of short-term storage was two words.

The next question was whether larger units consisting of sequences of words were also held in short-term storage. Moreover, if larger units were held, would two of them be held? These questions are critical. If short-term storage was limited to a couple of words, we would be restricted to the consideration of short-term storage in intrasentence processing. Our interest, however, included intersentence processing, the processing referred to in the statement by Huey (1908).

The experiments reported by Glanzer and Razel (1974) show, first, that familiar word sequences such as proverbs function much the same way as unrelated single words in free recall. Subjects show serial position curves for lists of proverbs that are very similar to those for single words (see Fig. 1). In particular, the end peak for proverbs is very much like the end peak for words. That end peak represents primarily output from short-term storage. It shows a large short-term storage effect for sentences. Indeed, the amount estimated as held in short-term storage for proverbs was comparable to that for words: approximately two units.

Fig. 1. Serial position curves for words and proverbs in Glanzer and Razel (1974), Experiment 3. (Horizontal axis: serial position.)

The sentences used in that study were, as noted, familiar sentences, proverbs. A relevant question was whether the same number of units would be held if the sentences were not familiar. To answer that question, a study was carried out in which half the sentences were familiar sentences, proverbs, and half were new sentences, matched with the proverbs in vocabulary and structure. Both sets of sentences were recalled under both delay and no-delay conditions. The results are shown in Fig. 2. Under both the delay and no-delay recall conditions, both types of sentences show the serial position curves found with unrelated words. The new, unfamiliar sentences do show a marked and expected difference from the proverbs in the long-term storage component, as indicated by the separation of the early portions of the serial position curve. Both the proverbs and the new sentences, however, have marked end peaks in the no-delay condition, indicating sizable amounts held in short-term storage for both.

Fig. 2. Serial position curves for proverbs and new sentences in Glanzer and Razel (1974), Experiment 6. Both delay and no-delay conditions are shown. (Horizontal axis: serial position.)

The estimate of the amount held in short-term storage for proverbs is two sentences. For new sentences it is 1.5. These numbers, moreover, probably underestimate the amount in short-term storage. One reason for the underestimation is that there is output interference in recall. When the subject recalls one item, the probability of recall of other items from short-term storage is reduced. A clear demonstration of this effect in a probe recall task is seen in a study by Tulving and Arbuckle (1963). Other demonstrations are found in studies by Dalezman (1976) and Tulving and Arbuckle (1966). The number of sentences actually held in short-term storage may therefore be three or more. Thus, we have a basis for examining the role of short-term storage in intersentence processing.
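The passage reports estimates of two and 1.5 sentences but does not spell out how they were computed. One simple way to obtain an estimate of this kind, offered here only as an illustrative sketch, is to treat the delay curve as the long-term storage contribution and to sum the excess of the no-delay curve over the final serial positions. The function name, the choice of four end positions, and the recall proportions below are all hypothetical; they are not taken from Glanzer and Razel (1974) and may not match the procedure used there.

```python
def sts_estimate(no_delay, delay, end_positions=4):
    """Sum the (no-delay minus delay) recall proportions over the final
    serial positions. Negative differences are counted as zero.
    Hypothetical helper, not the published estimation procedure."""
    if len(no_delay) != len(delay):
        raise ValueError("curves must cover the same serial positions")
    pairs = zip(no_delay[-end_positions:], delay[-end_positions:])
    return sum(max(nd - d, 0.0) for nd, d in pairs)


# Made-up recall proportions by serial position (illustration only).
no_delay = [0.50, 0.40, 0.35, 0.35, 0.40, 0.55, 0.75, 0.95]
delay = [0.50, 0.40, 0.35, 0.35, 0.35, 0.35, 0.35, 0.35]

# Estimated number of units attributable to short-term storage.
print(round(sts_estimate(no_delay, delay), 2))
```

Because output interference lowers recall of the last items, a subtraction of this kind would, as the text argues, tend to understate the amount actually held.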
III. Text Comprehension Studies

A. DIRECT EVIDENCE OF SHORT-TERM STORAGE CARRYING INFORMATION DURING THE COMPREHENSION OF TEXT: RECALL OF RECENT SENTENCES
Now that short-term storage is shown to have the capacity to play a role in intersentence processing, the next question is whether it holds sentences in the same way during normal text comprehension. It is possible that the capacity measured in the studies above was peculiar to the recall of lists of unrelated units. That short-term storage did hold text sentences in the same way was indicated, however, by the studies of Jarvella (1971, 1979) and Sachs (1967). Moreover, it has been assumed to be present and an important functioning component in theories of comprehension (Just & Carpenter, 1980; Kintsch & van Dijk, 1978; Miller & Kintsch, 1980), with indirect evidence of short-term storage playing an important role in some of the work based on those theories. In order to uphold the relation to simple memory tasks, we thought it important to show fully that the regularities found for short-term storage in the recall of unrelated word lists are also found in the recall of sentences from organized text. The regularity we focused on was the serial position effect, the end peak seen in Fig. 1. We were also concerned with whether the capacity for short-term storage estimated in the preceding work would be the same in the processing of text. A series of experiments was carried out (Glanzer, Dorfman, & Kaplan, 1981, Experiments 1a-d) to examine these issues. Subjects listened to tape-recorded text which was interrupted at various points. At each interruption, the subjects were given a probe cue to recall a sentence one (last sentence heard) to four positions back. The recall was to be verbatim. The main results of the study are shown in Fig. 3. The regularities that hold for short-term storage in recall of simple word lists also hold for the recall of text sentences. The end peak is present. The number of sentences held is approximately two.
The ease with which the subjects gave the verbatim recall for the last few sentences supports the idea that short-term storage holds the sentence in verbatim form. These experiments and the experiments that precede them, in particular those of Sachs (1967), give evidence that short-term storage holds two sentences from preceding text in verbatim form. Jarvella’s (1971, 1979) work raises the issue of whether the unit stored is clauses rather than sentences. Most of the work in the literature favors a phrasing in terms of simple sentences or clauses (Chang, 1980; Clark & Sengul, 1979; Jarvella & Herman, 1972). However, the issue is not important for the kinds of experimental procedures we used and will not be considered further here.

Fig. 3. Serial position curves for probe recall of text sentences in Glanzer et al. (1981), Experiments 1a-d. (Horizontal axis: sentence probe.)

B. EFFECTS OF STANDARD DISTRACTOR TASKS (COUNTING, ARITHMETIC) ON TEXT PROCESSING: PARALLELS TO SIMPLE MEMORY PERFORMANCE
Although there is evidence that short-term storage may be functioning during text processing, there is no evidence so far that it plays an important role in the processing. The short-term storage effects could be a by-product of processes operating during comprehension rather than a necessary part. To demonstrate that it is a necessary part requires several steps. The first step is to show that a standard operation used to eliminate the contents of short-term storage in simple memory tasks has a damaging effect on the comprehension of text, either slowing the process or lowering the amount comprehended and remembered. To show this, two series of experiments were carried out (Glanzer et al., 1981) in which subjects read paragraphs. In the control condition, the sentences followed each other without interruption. In the experimental condition, in one series the sentences alternated with simple addition problems, while in the other series the sentences alternated with a simple counting task. The time the subjects took to read the sentences was measured, and their comprehension was also tested after they had completed the text.
The effect of both the arithmetic task and the counting task was to slow the reading of the text sentences by approximately 400 msec. Distractor tasks had little effect here and subsequently on the later comprehension measures. In most cases, comprehension measured after the reading in the interrupted condition was either equivalent to that after the control condition, or lower. When it was lower, the difference was not statistically significant. It is not surprising, however, that there were only weak effects on this later measure. The reading task was self-paced, and the subjects had full opportunity in their slower reading to recover from the distractor task in order to maintain comprehension. Since for our self-paced reading experiments the final comprehension scores here and subsequently were not informative, they will not be discussed further. However, they were collected for all the experiments described.

In the experiments of this study that used counting as a between-sentence distraction (3a and 3b), that distractor task was also used as a concurrent task. Subjects counted aloud as they read the text sentences silently. Such concurrent counting had a very strong effect in slowing reading. Those results indicate the role of short-term storage in intrasentence processing as well. Our focus here, however, is on the effect of the between-sentence distractor task as evidence that short-term storage plays a role in intersentence processing.
C. EFFECTS OF READING DISTRACTOR TASKS ON TEXT PROCESSING: CONTENT VERSUS SET, TEXT VERSUS UNRELATED SENTENCES
In the preceding experiments, two standard types of distractor tasks were used: counting and addition. The fact that they slow reading and that they also lower the end peak in recall of both lists of unrelated items and text sentences may indicate that there is a general influence on comprehension of a reduced short-term storage. It is possible, however, that the slowing observed with the between-sentence distractor tasks was due to the switch from one task to another, that is, from counting to reading or from arithmetic to reading. It will be seen later that such general task-related effects, which we label “set effects,” do exist. However, we were specifically concerned here with establishing the role of short-term storage in carrying information needed for text comprehension. To determine whether we were dealing with a loss of information or a loss of set, an experiment was set up in which the distractor task was also a reading and comprehension task (Glanzer, Fischer, & Dorfman, 1984, Experiment 1). In the control condition, the text was read normally, one sentence after the other, and then a series of factual statements was read. In the experimental condition, the text sentences alternated with the unrelated factual statements. The subjects’ reading time for both sentence types was recorded. A sample paragraph in the experimental condition is given in Table I.
TABLE I
TEXT OF A PARAGRAPH WITH INTERLEAVED FACTUAL STATEMENTS FROM GLANZER, FISCHER, AND DORFMAN’S (1984) EXPERIMENT

  Jupiter is unlike the Earth in almost every way.
+ Weizmann, the first president of Israel, was a well-known chemist before he took public office.
  We used to think it had a hard core, covered with a layer of ice.
+ Some Roman emperors used lotteries to give away property and slaves to guests at feasts.
  Now we can see it with a telescope.
+ Pigeons and doves, unlike most birds, keep their bills in water and drink with a pumping action.
  It seems now that it is made entirely of gas.
+ In the reign of Peter the Great, a factory was established for the manufacture of asbestos articles.
  This is mostly hydrogen, some of which is combined to form poisonous compounds.
+ Presidents John Adams and Thomas Jefferson died on the same day, July 4, 1826.
  It is clear that no life can exist there.
+ Amboy Street in Brooklyn was the site of the first birth control clinic in America.
  Not only is the atmosphere poisonous, but Jupiter is too far from the sun.
+ George Washington was the sole survivor of 10 children at the time of his death.
  The planet is very cold.
+ The first field hospital treating wounded soldiers on the battlefield was introduced by Queen Isabella of Spain.
+ Like Venezuela, the economic backbone of the Caribbean islands of Trinidad and Tobago is petroleum.
+ Until 1830, the deaf, dumb, and blind were not included in the U.S. Census.

What was our earlier idea of how Jupiter was constructed?
Of what do we now think Jupiter is composed?
What is one ingredient in the poisonous compounds found on Jupiter?
Are there other gases besides hydrogen on Jupiter?
What about the atmosphere prevents life from existing on Jupiter?
Why is Jupiter cold?

What was the career of the first president of Israel before he took office?
What was given away by lottery in early Rome?
Which birds drink with a pumping action?
In what state was the first birth control clinic in the U.S. located?
In which country was the first field hospital located?
What is the backbone of Venezuela’s economy?

Each sentence was presented separately and in succession. In the continuous condition all the paragraph sentences were presented in immediate succession. In both conditions the last statement was followed by two blocks of comprehension questions. The plus was presented before each factual statement to make sure that the subject could easily distinguish the two sets of material. Below the paragraph are the comprehension questions.
TABLE II
MEAN READING TIMES (MSEC) FOR THE CONDITIONS IN GLANZER, FISCHER, AND DORFMAN’S (1984) EXPERIMENT 1

                          Reading condition
                       Continuous   Interrupted
Text                      3896         4210
Unrelated sentences       6690         6695
Table II shows the reading times for the continuous and the interrupted condition and for the text sentences and the unrelated factual statements. The text sentences were slowed by more than 300 msec in the experimental condition. It can be concluded therefore that the slowing occurs when the distractor task involves the same kinds of processes as the paragraph comprehension task, that is, even when there is no change in processing set. The slowing cannot be ascribed simply to a change in set.

Also very important is the absence of any interruption effect on the unrelated factual statements. In the interrupted condition the unrelated factual statements were interrupted by the paragraph sentences just as the paragraph sentences were interrupted by the unrelated factual statements. The data show that the interruption effects occur, however, only for the continuous text. Reading times increase for the related text sentences after an interruption, but do not increase for the unrelated factual statements.

The data support the following conclusions. Short-term storage holds information concerning preceding text. The loss of this information results in a slowing of reading. The slowing may result from the subjects taking additional time to recover missing information before or while continuing the following text. It may also result from the subjects continuing their reading without recovering the missing information, but reading slowly because they are handicapped by its absence. The fact that comprehension tests given after the text was read show negligible effects of interruption supports the first alternative. If the subjects had been reading that subsequent text at a lesser level of comprehension, some sign of that lesser level should have appeared in the later comprehension tests.
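The arithmetic behind these statements can be checked directly from the four means in Table II. The short sketch below simply subtracts the continuous mean from the interrupted mean for each sentence type; the dictionary layout and the function name are illustrative choices, not part of the original report.

```python
# Mean reading times (msec) as reported in Table II.
reading_times = {
    "text": {"continuous": 3896, "interrupted": 4210},
    "unrelated": {"continuous": 6690, "interrupted": 6695},
}


def interruption_effect(times):
    """Slowing (msec) in the interrupted condition relative to continuous."""
    return times["interrupted"] - times["continuous"]


effects = {kind: interruption_effect(t) for kind, t in reading_times.items()}
print(effects)  # {'text': 314, 'unrelated': 5}
```

The 314-msec difference for text sentences is the "more than 300 msec" slowing described above, while the 5-msec difference for the unrelated statements corresponds to the absence of an interruption effect.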
D. ROLE OF THEMATIC OR TOPIC INFORMATION IN SHORT-TERM STORAGE: FIRST ATTEMPT
The experiment on probe recall showed that subjects could recall verbatim one to two of the most recent sentences read. That finding and its congruence with the findings from free recall of unrelated items led us to think that the main information carried in short-term storage was those sentences. However, theorists have assumed the presence of thematic or topic statements, higher-order statements (Kieras, 1981; Kintsch & van Dijk, 1978), during the processing of text, and have assumed that those statements were in short-term storage. These assumptions seem reasonable, since it would be expected that if subjects were interrupted in their reading and were asked what they were reading, they could produce a topic statement. But to draw this conclusion, several things have to be established. It is necessary to determine the timing characteristics of such topic statements in order to interpret that presumed performance. It is necessary to determine, for example, whether subjects would produce a topic statement as easily as a report on the last sentence read. The time to make the response is of major importance in determining whether thematic information is indeed carried in short-term storage. If subjects take longer to give thematic information than to report the last sentence read, then its retrieval may be from long-term storage. (In Section III,I we will describe the use of time to respond to thematic information as a technique to analyze the presence of that information in short-term storage.)

It is also necessary to establish whether topic statements play a critical role in text processing. One way to determine whether topic statements have this role is by measuring the effect on text processing when they are eliminated from short-term storage. We made several attempts using this technique to measure the presence and the role of thematic or topical information in short-term storage. As will be seen, the initial attempts did not succeed.

The first attempt was with a procedure that added a step to that used in the preceding experiments. As before, a distractor task was used to clear short-term storage.
Then the subjects were given a topic word that would presumably place the topic or theme again in short-term storage. The re-placing of the theme into short-term storage should, if the topic were important for ongoing processing, counter the effect of the distractor task. Text was read in either a continuous or interrupted condition. The distractor task (in the interrupted condition) was addition. Crossed with these two conditions was a theme versus no-theme condition. In the theme condition each paragraph sentence was preceded by a word designed to remind the subjects of the topic. In the paragraph shown in Table I the reminder word was “Jupiter.” In the no-theme condition the word preceding the next sentence would be a neutral item such as “text.” Our expectation was that if part of the critical information that subjects lost from short-term storage with the distractor task was thematic information, then furnishing them with a topic word would lessen the effect of the distractor. The results were negative. Although there was a strong effect of the distractor task, there was no effect of the topic reminder word.
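The prediction in this design amounts to a difference of differences: the slowing produced by the distractor should be smaller in the theme condition than in the no-theme condition if the topic word restores lost thematic information. The sketch below shows that contrast. The cell means are made-up numbers chosen only to mimic the reported pattern (a clear distractor effect and no benefit of the reminder word); the source gives no cell means for this experiment, and the function names are hypothetical.

```python
def distractor_effect(means, theme):
    """Slowing (msec) caused by interruption within one reminder condition."""
    return means[(theme, "interrupted")] - means[(theme, "continuous")]


def reminder_benefit(means):
    """Reduction (msec) in the distractor effect when a topic word is given.
    A positive value would mean the topic word countered the distractor."""
    return distractor_effect(means, "no_theme") - distractor_effect(means, "theme")


# Hypothetical mean reading times (msec); NOT data from the experiment.
means = {
    ("no_theme", "continuous"): 3900,
    ("no_theme", "interrupted"): 4300,
    ("theme", "continuous"): 3900,
    ("theme", "interrupted"): 4300,
}

print(distractor_effect(means, "no_theme"), reminder_benefit(means))
```

With these illustrative values the distractor effect is 400 msec in both reminder conditions, so the reminder benefit is zero, which is the shape of the negative result described above.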
There were several possible reasons for the negative results. One possible reason was that thematic information was not being carried in short-term storage or was not important for ongoing processing. Another was that the thematic information was selectively maintained by the subjects despite the distractor task, although our detailed analysis of the data gave no sign of such special maintenance. A third possible reason was that the words we had selected did not correspond to the topic or thematic information subjects normally carry forward. Perhaps that information was more detailed than the information that was furnished, or it may have been carried in a different, more highly processed form. To try the large number of possible thematic statements did not seem feasible, particularly since those statements might be carried in a highly processed, abstract form. We decided instead to concentrate on the information we were fairly sure the subjects were carrying in short-term storage and to analyze its role further. In the course of this analysis, we thought we might identify several types of information being carried, including topic information.
E. SPECIFIC CONTENTS OF SHORT-TERM STORAGE IN READING

We were fairly sure that the subjects were carrying the last one or two sentences in short-term storage, but we were somewhat less sure that they were critical for ongoing processing. We were also unsure whether those last two sentences were the only information being carried. The question considered next was whether all that the subjects needed for the continued processing of text were those last one or two sentences. To answer that question we tried to determine whether subjects would recover completely from an interruption if they were given the last sentence or two that preceded the interruption (a distractor task). We would clear short-term storage with a distractor task, reinsert the last one or two sentences, and then measure the subjects’ reading times. The logic is similar to that of the preceding experiment. If it turned out that resupplying the subjects with the last two sentences was sufficient for them to continue with normal reading, then the ongoing processing could be viewed as primarily sentence-to-sentence linkage. If the last two sentences did not suffice, then another type of necessary information would be indicated.

In the following experiments, the massive use of distractors between all sentences of a text that characterized the preceding experiments was replaced by a more focused technique. A distractor or interruption occurred at one place in a text, and the effect of that distractor with respect to reading time for successive sentences was measured. Thus, it was possible to view not only the impact of the interruption, but also the course of recovery from the interruption. In the first experiment (Glanzer et al., 1984, Experiment 3) with this technique, three conditions were used: (1) continuous: the paragraph sentences were read in immediate succession; (2) interrupted: the paragraph was interrupted by the reading of another unrelated text; and (3) interrupted with repetition: the paragraph was interrupted as in condition 2, but instead of continuing after the interruption with the next sentence of the paragraph, the last two sentences read in that paragraph (before the interruption) were repeated. The interruption was effected by alternating blocks from different texts so that both texts furnished information concerning the effects of interruption. A sample sequence is presented in Table III.

The results of the experiment are shown in Fig. 4. The control condition is the continuous condition and affords the baseline for the other reading conditions. As might be expected, the first four sentences, which in the experimental conditions precede the interruption, are read at the same speed in all three conditions. In the simple interruption condition, the sentence that immediately follows the interruption, sentence 5, is slowed by an average of 355 msec. The sentence that follows it, sentence 6, has still not returned to the control speed (although this elevation is not statistically significant). The third sentence after the interruption has fully returned to normal reading speed. These data support the assertion that two recent sentences have to be carried in short-term storage.

In the interruption with repetition condition the contents of short-term storage were presumably removed and the last two sentences (before the interruption) placed again in short-term storage by presenting them again. The data show that the repetition eliminates almost all of the effect of the interruption. The slight elevation of 84 msec on sentence 5 above the control condition is not statistically significant. We could conclude that the reinsertion of the last two sentences fills in all the information lost during the interruption.
However, the 84-msec elevation was bothersome, particularly since such elevations appear in the subsequent experiments. This deviation will be considered again.

Also of interest are the reading times for the repeated sentences in the repetition condition. Both repeated sentences are read faster the second time than the first. However, there is a considerable drop in the reading time for the second repeated sentence as compared to the first. Even the reading of a repeated sentence is aided by the presence of a preceding sentence in short-term storage. The faster reading of a repeated sentence is not unexpected, but it does require an explanation in terms of any theory of comprehension that is adopted. The explanation we will use is closely related to the reason we now think that there is an advantage for the subjects in having a verbatim representation in short-term storage. The speed in reading the repeated sentence is interpreted as arising from its special relation to the trace it left in the text representation during its first reading. The repeated verbatim sentence is a strong cue for eliciting that part of the representation. The subject can therefore fit the repeated sentence into the representation without a search in long-term storage for its place in that representation. The place is automatically accessed. The additional speeding that occurs for the second repeated sentence may indicate that two verbatim sentences, the first in short-term storage from the previously read sentence, the second entering with the currently read sentence, are better cues for eliciting relevant parts of the underlying representation than one cue. The other possibility is that for the second repeated sentence there are three useful cuing units held in short-term storage: the first sentence (stored), the parts of the underlying text representation recovered by its presentation, and the second verbatim sentence (just entered). The idea of the role of both incoming and stored information as retrieval cues and their additivity will be considered again. They are additions to our initial formulation.

Except for the slight 84-msec elevation previously mentioned, the data support the argument that the presence of the preceding one or two sentences in short-term storage is sufficient for normal reading. This was the argument that was made in the paper cited (Glanzer et al., 1984). The next question concerned the specific information in those repeated sentences that permitted the subjects to recover from an interruption that cleared short-term storage of its contents. Several possible factors may be involved. One was already discussed in considering the faster reading of repeated sentences, namely, their cuing function. The preceding text sentences elicit the relevant parts of the underlying text representation. The second factor is higher-order thematic information, which may be embedded in or cued by the preceding sentences. A third factor, and the one considered next, concerns the way in which the new sentence is related to the preceding text. This factor, brought to the fore by the work of linguists (Grimes, 1975; Halliday & Hasan, 1976), will be considered here under the label of sentence-to-sentence linkages. It will be argued later that the linkage factor is closely related to the cuing factor. The information needed for sentence-to-sentence linkage is clearly returned with the repetition of sentences. The next question addressed was the role of that factor in the subjects’ use of short-term storage. Some of the data of the experiment previously described could be analyzed to shed some light on the issue of linkage information. Thematic information will be considered again later.

TABLE III
A SEQUENCE OF INTERRUPTED PARAGRAPHS IN GLANZER, FISCHER, AND DORFMAN’S (1984) EXPERIMENT 3ᵃ,ᵇ

1 There have been many scientists over the years who have wanted to produce cheap synthetic diamonds.
1 One chemist tried putting charcoal made from sugar between two blocks of very hot iron.
1 He then plunged the blocks into cold water causing a sudden contraction of the iron.
1 The contraction of the iron was supposed to exert great pressure on the sugar charcoal.

2 During the Middle Ages, the British military was controlled by people who spoke the French language.
2 It was further influenced by France because many of its campaigns were fought in that region.
2 These two factors resulted in the introduction of many French military terms into the English vocabulary.
2 Many of the words acquired long ago from the French are still commonly used today.

1 According to his theory, the sudden pressure on the charcoal would turn it into a diamond.
1 After many unsuccessful experiments, he finally had something that he thought was the real gem stone.
1 However, later tests showed that he had failed once again and merely produced a carbide.
1 Carbides are carbon compounds that are quite different from the carbon which is a diamond.

2 These two factors resulted in the introduction of many French military terms into the English vocabulary.
2 Many of the words acquired long ago from the French are commonly used today.
2 Some of the words which we acquired from the French include “army,” “navy,” and “sergeant.”
2 Some of the terms were borrowed from the French to identify a new object or idea.
2 In other cases, objects or ideas were identified by both an English and French word.
2 For example, the word “battle” comes from the French and the word “fight” from the English.

There has never been much interest in producing man-made diamonds.
One scientist tried to make a diamond by compressing sugar.
This scientist thought that he had created a real gem.
He actually produced a carbon compound called a carbide.
The English army had a great influence on the French.
The English language contains military terms acquired from the French.
Words were borrowed in order to communicate with the French.
“Battle” and “fight” were both borrowed from the French.

ᵃ The first paragraph is in the interrupted without repetition condition. The second paragraph is in the interrupted with repetition condition. Each sentence was presented separately and in succession. In the continuous condition all of the first paragraph was presented and then all of the second. All pairs of paragraphs were followed by eight comprehension questions. The numeral 1 or 2 was presented before each paragraph sentence to make sure that the subject could distinguish between the two paragraphs.
ᵇ Below the paragraph are the true-false comprehension questions.

Fig. 4. Mean reading times across eight sentences of text for the three experimental conditions of Glanzer et al. (1984), Experiment 3. Abscissa positions 3R and 4R refer to repetitions of sentences 3 and 4. Breaks in the curves symbolize interruptions by another text. Legend: Control; Interrupt, No Repeat; Interrupt, Repeat 3, 4.
F. INITIAL EVIDENCE FOR THE ROLE OF LINKAGE INFORMATION IN SHORT-TERM STORAGE

Using a suggestion from Grimes (1975) concerning factors involved in making text coherent, we classified sentences according to their role in linking with preceding text. Two classes were defined: dependent and independent. Dependent sentences contain words or phrases that can only be interpreted fully if the preceding text has been read. These included certain classes of anaphora such as pronouns and classes of connectives (e.g., causal connectives). An example of a dependent sentence is, “Eventually he learned to make what he needed instead of having to search for them.” The words “eventually,” “he,” and “them” make the sentence dependent on the preceding text. An example of an independent sentence is, “Scientists who study gorillas in their native African habitat know what to do if one charges.” The topic sentence of a paragraph, particularly when it is the first sentence of the paragraph, is almost always independent. Other sentences in a paragraph may, however, also be independent. The preceding sentence on gorillas came from the middle of a paragraph. The idea of dependence and independence is closely related to the linguistic analysis of text coherence. For the present, we will consider the idea as defined here and as it relates to the subject’s processing of text. The first question of interest was the extent to which this linkage factor accounted for the effect of the
TABLE IV
MEAN COMBINED READING TIMES (MSEC) FOR THE CONDITIONS IN GLANZER, FISCHER, AND DORFMAN’S (1984) EXPERIMENTS 3 AND 4
Reading condition
Sentence type
Control, continuous
Interrupt, no repetition
Dependent Independent
4429
4876 4807
4877
distractor task. To do this we classified the text sentences of the preceding experiment and another experiment involving such interruptions as dependent or independent. We then examined the reading times for each class in both the continuous and interrupted condition. The means are given in Table IV. It appeared that the interruption effect occurred only with dependent sentences. The data seemed to indicate again that the effect of the interruption was wholly due to the disruption of sentence-to-sentence linkage. This implied that the key and only needed component carried in short-term storage was linkage information. Linkage information could, of course, be nicely covered by the storage of verbatim sentences in short-term storage. As will be seen later, the data from this post hoc comparison were somewhat misleading. Moreover, there were signs mentioned earlier that the repetition of previously read sentences was not sufficient for the full recovery of reading speed. There was the slight elevation of 84 msec of the interrupt with repetition condition over the control condition in sentence 5 (see Fig. 4). This suggested that something was needed by the reader in addition to the last two sentences. The elevation was not statistically significant, but similar elevations appeared in this condition in the next two experiments. Therefore, the possibility remained that something else had to be recovered by the subject or be replaced in short-term storage. The possibility we considered next was the thematic or topic information that we had not been able to demonstrate earlier.
G. ROLE OF THEMATIC OR TOPIC INFORMATION: SECOND ATTEMPT

In the next experiment (Glanzer et al., 1984, Experiment 4), we pitted topic information against the information contained in the immediately preceding sentence. The materials and basic structure of the preceding experiment were used. There was a control, continuous condition, an interruption without repetition condition, and now two interruption with repetition conditions. In one, the sentence that immediately preceded the interruption was repeated. In the other, a topic sentence, the first sentence of the paragraph, was repeated.

The results of the experiment are shown in Fig. 5. The results for the control condition, interruption without repetition condition, and interruption with repetition of sentence 5 (the last paragraph sentence read before the interruption) condition replicate the findings of the preceding experiment. Interruption with repetition of the topic sentence, however, disrupts the reading. The increase over the control condition is over 500 msec. The results seemed to indicate that topical information was unimportant and that the sentence-to-sentence linkage information was critical. The story is not complete, however, since the information in the preceding sentence and topical information were in opposition here. The effects of the two were not being measured separately. However, the indications were again strong that a major factor was linkage information. These indications were further strengthened by the results from a follow-up experiment. That experiment showed that the specific location of the repeated sentence in the preceding text (early vs late) was not the important variable. A sentence that linked appropriately with the next text sentence would, when repeated, serve equally well to help subjects recover from an interruption. At this point, short-term storage seemed to hold nothing but the last one or two sentences read and their function was solely to permit the subject to link successive sentences to form a coherent text. The weight of evidence was against any strong, positive effect of thematic information in immediate processing.

Fig. 5. Mean reading times across nine sentences of text for the four experimental conditions of Glanzer et al. (1984), Experiment 4. Abscissa position R refers to the repetition of either sentence 1 or 5. Breaks in the curve symbolize interruptions by another text. Legend: Control; Interrupt, No Repeat; Interrupt, Repeat 5; Interrupt, Repeat 1.

H. DIRECT EXAMINATION OF THE BUILDING OF COHESION, SENTENCE-TO-SENTENCE LINKAGE
The post hoc analysis (Section III,F) indicated that a critical type of information carried in short-term storage was related to the linkage or dependence of the new sentence to preceding text. Linguists have considered what was labeled dependence under the heading of cohesion devices. We will follow the usage and system introduced by Halliday and Hasan (1976), who have outlined four major classes of grammatical cohesion devices: reference (pronominals, demonstratives, definite articles, and comparatives), substitution, ellipsis, and connectives (additive, adversative, causal, and temporal). The devices used in the studies to follow were primarily in the class of reference: demonstratives, definite articles, and pronominals. A second major class used was conjunction, connectives such as “however” and “thus.” Also used were substitutions, for example, “phenomenon” substituting for a more specific noun such as “hibernation” in a succeeding sentence. The following dependent sentence uses both reference (the pronominals “them” and “its”) and conjunction (the adversative connective “but”): “But few of them realize what a remarkable achievement its construction was.” The sentence cannot be understood fully without some preceding text. It would be made independent by rewriting it as follows: “Few people crossing the Brooklyn Bridge realize what a remarkable achievement the bridge’s construction was.”

In order to examine the role of grammatical cohesion devices, linkages, in short-term storage and text comprehension, a series of four studies (Fischer & Glanzer, 1986) was carried out. The studies varied the dependence of specific sentences in the text and examined the reading times for those sentences when they had been preceded by an interruption. The interruption was designed to remove the contents of short-term storage as in the earlier experiments reported here. A sample of the material used is given in Table V. The distractor task used
TABLE V
SAMPLE TEXT USED BY FISCHER AND GLANZER (1986), EXPERIMENT 1ᵃ

Computers

The word “computer” may be used to refer to any device that calculates or computes.
However, its use is most often restricted to a particular device that has several distinguishing features. (Dependent)
The word “computer” is usually used to refer to a particular device with several distinguishing features. (Independent)
Computers are always electronic and because of this they can operate at very high speeds. (Independent)
They are always electronic and because of this they can operate at very high speeds. (Dependent)
A second important feature is that they have the ability to retain facts and figures. (Dependent)
An important feature of computers is that they are able to retain facts and figures. (Independent)
A computer’s ability to retain facts and figures is referred to as memory or internal storage. (Independent)
The feature is quite often referred to as the memory or internal storage of the computer. (Dependent)
Information stored in the machine’s memory can be recalled quickly and easily at some future time.
Another distinguishing feature is that a computer holds in its memory a set of instructions.
A set of instructions that is held in a computer’s memory is called a program.

ᵃ The sentences preceded by a (1) appeared in one version of the text, the sentences preceded by a (2) in the other, counterbalanced version. The material in parentheses, numerals and classification, did not, of course, appear in the text presented to subjects.
in the first experiment of this series was the reading of a set of unrelated factual statements. We expected, with these materials, to repeat the findings shown in Table IV: a disruptive effect of the distractor task on the dependent sentences, but not on the independent sentences. What we found was a differential effect of the distractor task on the dependent sentences, but a considerable effect on the independent sentences as well. The results are given in Table VI, and we will rely on those results for our further discussion, since they are based on a wider sampling of sentences and a careful matching and counterbalancing of the sentence conditions. That statement does not apply to the post hoc analysis that gave the data in Table IV.

TABLE VI
MEAN READING TIMES (MSEC) FOR THE CONDITIONS IN FISCHER AND GLANZER’S (1986) EXPERIMENT 1

                     Reading condition
Sentence type     Continuous     Distractor
Dependent         4730           5718
Independent       4822           5410

A main and expected finding seen in Table VI was the strong differential disruptive effect of the distractor on the dependent sentences. This fully supports the effect found in Table IV. Table VI also indicates that in the continuous condition dependent sentences are read faster than independent sentences. This effect appears in all five experiments we have run in which this factor was analyzed, but it is relatively small in size and not statistically significant in any one of the experiments. Moreover, the basis for the effect is not clear at present. It may be due to the efficiency with which text can be organized when it consists of dependent sentences. Dependent sentences indicate very clearly how the successive sentences are linked. However, the effect may also be due to word frequency effects. Dependent sentences have higher-frequency words (e.g., pronouns instead of nouns) than independent sentences. These points do not, of course, take away from the disruptive effect on dependent sentences in the interruption condition.

Another important characteristic of the data in Table VI is the large increase of reading time in the interruption condition, which cannot be ascribed to the loss of linkage information. In the case of independent sentences, the distractor task increases the reading time in resuming the text by nearly 600 msec. This finding indicates that another class of information besides linkage information is lost with the interruption. One class of information that may be missing, information removed by the distractor task, is topic information. Further data on topic information will be presented later (Section III,J). The function of the topic information during ongoing processing will also be considered later. It will be proposed that topic information serves to cue relevant parts of the text representation for retrieval during processing. Two further experiments of the same type as the last were carried out, one using digit recall, the other using addition problems as the distractor task. The same general pattern of results was obtained as that shown in Table VI. Dependent sentences were slowed by interruption more than independent sentences. The overall effect of the interruption was much greater with these other distractor tasks, however.
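As a rough check on the figures in Table VI, the interruption cost for each sentence type can be obtained by subtracting the continuous reading time from the distractor reading time. The short Python sketch below does only that arithmetic on the four tabled means; the variable and function names are illustrative assumptions and are not taken from Fischer and Glanzer (1986).

```python
# Mean reading times (msec) as reported in Table VI
# (Fischer & Glanzer, 1986, Experiment 1).
reading_times = {
    "dependent": {"continuous": 4730, "distractor": 5718},
    "independent": {"continuous": 4822, "distractor": 5410},
}


def interruption_cost(times):
    """Slowdown (msec) after the distractor task for one sentence type."""
    return times["distractor"] - times["continuous"]


# Cost of the interruption for each sentence type.
costs = {kind: interruption_cost(t) for kind, t in reading_times.items()}

# Extra cost carried by dependent sentences over independent ones
# (illustrative name; this is simply the difference of the two costs).
differential = costs["dependent"] - costs["independent"]

print(costs)         # {'dependent': 988, 'independent': 588}
print(differential)  # 400
```

On these means, the independent sentences show the increase of nearly 600 msec discussed in the text, and the dependent sentences are slowed by a further 400 msec.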
With digit recall the independent sentences increase in reading by 895 msec, and with addition problems by 1704 msec. The increase with a reading distractor task is only 588 msec (Table VI). These large increases in reading time with digit recall and addition suggest still another factor in determining reading time: a set for reading. Initially, we assumed on the basis of ideas from interference theorizing that using reading as a distractor task would produce larger effects than tasks that did not involve reading. The performance called for in the distractor task does have an effect, but not the one we expected. The less like reading the distractor task is, the more disruptive it is. We labeled the additional factor “reading set.” This set may be the same as the very general structures often assumed to control the reading of a text and include the goals and interests of the reader in reading the text (Kintsch & van Dijk, 1978; Meyer, 1984). They may also include the more specific knowledge structures that apply to a particular type of text, for example, a narrative story schema for a simple story (Mandler & Johnson, 1977) or the organizational schemata for expository prose outlined by Meyer (1975, 1984). The factor we are calling reading set may include such controlling schemata. The increases in reading time caused by the loss of reading set reflect, in part, the time needed to reconstruct or reactivate the controlling schemata that the subjects use in processing texts. A considerable amount of further work is needed, however, to specify how this factor appears or reappears during ongoing processing.

In a final experiment in the series, the effect of both arithmetic and reading distractor tasks was examined using within-subjects comparisons. In addition, various classes of distractor reading material were compared: cases in which the interpolated sentences were unrelated to the text, cases in which the interpolated sentences offered possible but incorrect referents for the anaphors in subsequent sentences, and cases in which the interpolated sentences were thematically related to the interrupted text. Again, on the basis of interference theorizing we expected that thematically related distractor sentences would be particularly disruptive.
The data showed, however, that any effect the thematic information had in this case was facilitative. On the basis of the various classes of distractor tasks, we were able to define four main factors as determinants of reading time. They were the following: independence, presence of needed linkage material in short-term storage, presence of topic or theme in short-term storage, and set. These four factors were incorporated in a multiple-regression analysis that is summarized in Table VII. The regression weights can be translated into the time it takes the subjects to recover from the lack of the listed factor. For example, if needed linkage information is not in short-term storage, as would occur with a dependent sentence after a distractor task, it costs the subject 448 msec to recover. The regression analysis was extended then to include all four experiments of the series, adding some constants to cover changes in the base reading rates for the four experiments. When extended in this fashion, we obtained a similar set of weights for the four main factors in Table VII, but with higher significance levels. (For example, the weight for dependence was now 156 and its significance level p < .076.) The multiple correlation squared was .986, F(8,13) = 112.12,
TABLE VII
RESULTS OF MULTIPLE REGRESSION ANALYSIS FOR FISCHER AND GLANZER’S (1986) EXPERIMENT IVᵃ
Factor
Regression weights
t (df = 5)
p
Dependence (b1)
Needed linkage information in STS (b2)
Theme in STS (b3)
Reading set (b4)
142 448
1.17 3.49