THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory
VOLUME 3
CONTRIBUTORS TO THIS VOLUME
Harley A. Bernbach
Richard S. Bogartz
Kenneth R. Laughery
Marvin Levine
Michael I. Posner
Leo Postman
Allan R. Wagner
EDITED BY
GORDON H. BOWER
STANFORD UNIVERSITY, STANFORD, CALIFORNIA
AND
JANET TAYLOR SPENCE
UNIVERSITY OF TEXAS, AUSTIN, TEXAS
1969
ACADEMIC PRESS
New York · London
COPYRIGHT © 1969, BY ACADEMIC PRESS, INC.
ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, RETRIEVAL SYSTEM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS.
ACADEMIC PRESS, INC. 111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD. Berkeley Square House, London W1X 6BA
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 66-30104
PRINTED IN THE UNITED STATES OF AMERICA
LIST OF CONTRIBUTORS
Harley A. Bernbach, Cornell University, Ithaca, New York
Richard S. Bogartz, University of Illinois, Urbana, Illinois
Kenneth R. Laughery, State University of New York at Buffalo, Buffalo, New York
Marvin Levine, State University of New York at Stony Brook, Stony Brook, New York
Michael I. Posner, University of Oregon, Eugene, Oregon
Leo Postman, University of California, Berkeley, California
Allan R. Wagner, Yale University, New Haven, Connecticut
PREFACE
This is the third volume of the annual serial publication “The Psychology of Learning and Motivation,” and its format is similar to that of the first two volumes. Ideally these volumes are to provide a forum in which a contributor can pull together the several facets of his research around a single problem or theory, providing thereby a sustained and integrated characterization of his recent research and its import. The contributions typically involve the reporting of new experimental results along with selective reviewing of results previously scattered throughout the professional journals. In the midst of the scientific knowledge explosion, such collections of summary papers, in which prominent investigators provide an overview of the present state of their research, can serve a unique and valuable function not presently served by current journals or by annual reviews of an entire field. The aim of science, after all, is comprehension, and our comprehension of the thrust of another man’s research is materially improved if he is permitted “elbow room” to expand upon the context of his experimental efforts: the background, intuitive hunches, and speculative relations between his data and other phenomena of the science. In this regard, contributors to these volumes are allowed considerably more leeway and freedom than in current edited journals to tell their research stories as they wish, emphasizing what they believe to be exciting and significant. The editors have hoped to be eclectic in taste and range of coverage of the various subdivisions and topics within the areas spanned by learning and motivation. The main criterion for inviting a contribution is that the editors felt that the investigator had something new and interesting to write about. To this end, contributions have been invited from a number of eminent investigators spanning a broadly diverse range of topics, with the date of the contributions to be chosen by the investigator.
The vicissitudes of acceptances and self-selected deadlines by contributors produce some unintentional nonrandomness in coverage within particular volumes; thus, Volume 3 is more heavily weighted on the side of human learning and information processing, while the projected Volume 4 will equalize matters by having more articles on motivation, conditioning, and animal learning. The range of topics in these volumes will vary about as much as the range in random samples of five to seven articles from any issue of current experimental and theoretical journals in psychology.
GORDON H. BOWER
JANET T. SPENCE
September 1969
CONTENTS
List of Contributors
Preface
Contents of Previous Volumes

STIMULUS SELECTION AND A “MODIFIED CONTINUITY THEORY”
Allan R. Wagner
I. Introduction
II. The Research Strategy
III. Cue Validity and Stimulus Selection
IV. Theoretical Alternatives
V. An Experimental Evaluation of Modified Continuity Theory
VI. Concluding Comments
References

ABSTRACTION AND THE PROCESS OF RECOGNITION
Michael I. Posner
I. Introduction
II. Stimulus Examination
III. Past Experience
IV. Visual Representation in Memory
V. Separating the Visual and Name Codes of Prior Stimulation
VI. Summary and Conclusions
References

NEO-NONCONTINUITY THEORY
Marvin Levine
I. Introduction
II. Probing for Hs
III. The Dynamics of H Testing
IV. Discussion
V. Appendix
References

COMPUTER SIMULATION OF SHORT-TERM MEMORY: A COMPONENT-DECAY MODEL
Kenneth R. Laughery
I. Introduction
II. The Model: An Overview
III. The Model: A Detailed Description
IV. A Sample Simulation
V. Some Simulation Results
VI. Discussion and Conclusions
References

REPLICATION PROCESSES IN HUMAN MEMORY AND LEARNING
Harley A. Bernbach
I. Introduction
II. Basic Properties of the Theory
III. Serial-Position Effects in Short-Term Memory
IV. Some Other Short-Term Memory Tasks
V. Repeated Presentations and Learning
VI. Some Evidence for Rehearsal Processes
VII. Concluding Remarks
References

EXPERIMENTAL ANALYSIS OF LEARNING TO LEARN
Leo Postman
I. Introduction
II. The Role of Warm-Up
III. Two-Stage Analysis of Nonspecific Transfer
IV. Whole versus Part Learning
V. Acquisition of Transfer Skills
VI. The Effects of Practice on Recall
VII. Conclusions
References

SHORT-TERM MEMORY IN BINARY PREDICTION BY CHILDREN: SOME STOCHASTIC INFORMATION PROCESSING MODELS
Richard S. Bogartz
I. Single Alternation
II. A Model for Single Alternation
III. Data
IV. Extension to the Effects of Intertrial Interval Duration
V. Extension to Interpolated Events
VI. Extension to Markov Event Sequences
VII. Noncontingent Event Sequences
VIII. Conclusions and Directions
References

Author Index
Subject Index
CONTENTS OF PREVIOUS VOLUMES

Volume 1
Partial Reinforcement Effects on Vigor and Persistence
ABRAM AMSEL
A Sequential Hypothesis of Instrumental Learning
E. J. CAPALDI
Satiation and Curiosity
HARRY FOWLER
A Multicomponent Theory of the Memory Trace
GORDON BOWER
Organization and Memory
GEORGE MANDLER
AUTHOR INDEX-SUBJECT INDEX

Volume 2
Incentive Theory and Changes in Reward
FRANK A. LOGAN
Shift in Activity and the Concept of Persisting Tendency
DAVID BIRCH
Human Memory: A Proposed System and Its Control Processes
R. C. ATKINSON AND R. M. SHIFFRIN
Mediation and Conceptual Behavior
HOWARD K. KENDLER AND TRACY S. KENDLER
AUTHOR INDEX-SUBJECT INDEX
STIMULUS SELECTION AND A “MODIFIED CONTINUITY THEORY”
Allan R. Wagner
YALE UNIVERSITY, NEW HAVEN, CONNECTICUT

I. Introduction
II. The Research Strategy
III. Cue Validity and Stimulus Selection
   A. The Basic Experiment
   B. Confirming Data
   C. An Empirical Extension
IV. Theoretical Alternatives
   A. Attentional Theory
   B. Modified Continuity Theory
V. An Experimental Evaluation of Modified Continuity Theory
VI. Concluding Comments
References
I. Introduction
It is rarely, if ever, the case in a learning situation that only a single descriptive feature of the environment offers information concerning the availability of reward or the occasions for reinforcement. It is generally possible to identify a number of “elements,” “dimensions,” or “attributes” of the situation, each of which has some degree of correlation with the signaled event, and to which it is known that the subject could be trained to respond discriminatively. A persistent question that has been associated with an uncommon degree of controversy (see, e.g., Goodrich, Ross, & Wagner, 1961; Mackintosh, 1965a; Trabasso & Bower, 1968) concerns the degree to which Ss make use of, or learn about, such multiple “cues.” For example, does S learn about each cue as though it were the only cue available, or does S “focus” on only a portion, or on a single one, of the potential cues? There has rarely been any real issue over the fact that the availability of one cue in the environment may reduce the amount that S learns about the other available cues (e.g., Spence, 1936), and there is ample evidence (e.g., Hughes & North, 1959) that S may learn about more than a single cue at once. Thus, it is tempting to agree with Bruner, Matter, and Papanek (1955) that it is after all an empirical question to determine the range of cues responded to in a given situation. However, data concerning the degree of focusing or stimulus selection that characterizes common learning situations have been viewed as crucial for determining whether “attention” or some “attention-like” construct is to be awarded a prominent role in an adequate theory of learning (e.g., Lashley, 1942; Mackintosh, 1965a). It can be argued (see Wagner, 1969a) that few experimental designs have been particularly relevant for judging the usefulness of attentional theory because of the lack of experimental control over the schedule of stimulation and reinforcement which Ss receive. There is little question, however, that there has been a recent wave of sympathy for attentional theory (e.g., Sutherland, 1964; Trabasso & Bower, 1968; Zeaman & House, 1963; Lovejoy, 1965; Mackintosh, 1965a), based in part on the apparent pervasiveness of stimulus-selection effects.

In the present chapter, data will be presented which support the contention that stimulus selection is a potent effect, even in experimental situations that allow considerable control over the schedules of stimulation and reinforcement to which S is exposed. Yet, it will be questioned whether or not such data demand an “attentional” interpretation. A theoretical alternative suggested by Kamin (1968, 1969) and elaborated by Wagner (1969a) as a “modified continuity theory” will be discussed, and new data will be presented for which this approach appears to offer a relatively unique account. Although at this time the theory must be regarded as especially tentative and incomplete, opportunity will be taken to suggest its potential usefulness in interpreting several phenomena that have been in search of theoretical integration.

Preparation of this paper and the research reported were supported in part by National Science Foundation Grant GB-6534.
II. The Research Strategy

There are a number of common experimental procedures for evaluating the degree of focusing or stimulus selection. Perhaps the most obvious involves a comparison of the degree of learning exhibited to some stimulus when it is the only relevant cue available, as compared to a condition in which there are additional relevant cues. The degree to which the presence of additional relevant cues reduces the apparent amount learned is taken to indicate the degree of stimulus selection (e.g., Sutherland & Andelman, 1967; Lovejoy & Russell, 1967). Another procedure involves holding constant the number of relevant cues, but varying in some manner the saliency of the redundant cues. It is thereby possible to evaluate the decrement in behavioral control acquired by a cue as a function of the increasing potency of the alternative cues (e.g., Lawrence, 1950; Mackintosh, 1965b). It is also possible to examine the correlations between the amounts learned about several redundant cues: If there is intersubject variability in the learning rates with respect to one such cue, will there be concomitant variability in the learning with respect to other available cues such that it appears that the more S learns about one cue the less it learns about others (e.g., Sutherland & Holgate, 1966; Sutherland, 1966)? The present research strategy follows in this basic line, but the essential question has been whether or not the learning that occurs with respect to one cue is dependent upon the validity of other concomitant cues. The problem to which the research has been addressed can best be elaborated at this point by considering a simple experimental paradigm. Suppose a cue (X) is sometimes followed by a reinforcing stimulus (US) in a classical conditioning situation, or by the availability of reward, consequent to some response, in an instrumental learning situation. Also suppose, however, that X is never present alone but always in compound with a second cue (A) and furthermore, that, in the general case, A and the reinforcing event are each free to occur with some frequency in the absence of X. Given such a training situation, it is possible to construct one contingency matrix describing the frequencies with which reinforcement and nonreinforcement occur in relationship to the presence and absence of A, and a second matrix describing the frequencies with which reinforcement and nonreinforcement occur in relationship to the presence and absence of X.
It is then possible to ask which of the frequencies contained in the two matrixes are important in determining the degree of learning that occurs with respect to X, i.e., its “associative strength” or “signal value.” Nonselective treatments of associative learning (e.g., Hull, 1943, 1950) have conventionally emphasized the importance of the frequencies contained in only one row of the X matrix, i.e., the number of occasions on which X has been experienced and then “paired” with reinforcement, and the number of occasions on which X has been experienced and then not “paired” with reinforcement. These respective conditioning and extinction experiences are assumed to determine the degree to which X will come to be responded to as though it signaled reinforcement. Other theorists (e.g., Tolman and Brunswik, 1935; Rescorla, 1966) would challenge this emphasis with the suggestion that all of the frequencies in the X matrix may be important, including the number of occasions on which reinforcement and nonreinforcement have been experienced in the absence of X. Rescorla (1968) has, in fact, shown that as the probability of reinforcement in the absence of a cue approaches the probability of reinforcement in the presence of that cue, the apparent associative strength of the cue becomes negligible, regardless of the frequency of reinforcement in the presence of the cue. Such evidence suggests that the signal value of X may depend upon the correlation between the presence and absence of X and the presence and absence of reinforcement, i.e., upon the validity of X in predicting reinforcement. The research to be described will comment on the above question, but the major issue in the present context involves whether or not some portion of the A matrix may be of significance, independent of the X matrix, in determining the signal value of X. Since X occurs only in compound with A, will S’s experiences with respect to the presence and absence of A influence what is learned about X? If there were some stimulus-selection process so that X competed with A for the behavioral effects resulting from reinforcement and nonreinforcement, such might be expected. A relevant attentional view has been voiced by Sutherland (e.g., 1964) and Mackintosh (e.g., 1965a). According to this position, the acquisition of behavioral control is assumed to be mediated by a stimulus-selection mechanism (stimulus analyzer), such that if an appropriate analyzer is not “switched in,” a cue will be ineffective in acquiring new associative tendencies as a result of reinforcement or nonreinforcement. The likelihood that an analyzer appropriate to a cue will be switched in is assumed to depend upon the validity of that cue (“on differences [in the outputs of the stimulus analyzer] being consistently associated with the subsequent occurrence of events of importance to the animal,” Sutherland, 1964, p. 57). It is assumed also, however, to depend inversely upon the validity of other concomitant cues, since the subject is assumed to be capable of attending simultaneously to only a limited number of cues, i.e., of having only a limited number of analyzers switched in.
Thus, in the case of the paradigmatic example, it should be expected that all of the frequencies in the A and X matrixes would be important in determining the signal value of X : The degree to which a subject will attend to, and hence learn about, the X cue should depend upon the relative correlations of A and X with the occurrence of reinforcement.
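The two contingency matrixes of the paradigm can be made concrete with a minimal sketch. The trial counts below are hypothetical, and the helper names `contingency` and `delta_p` are illustrative assumptions rather than anything used in the chapter. The index computed, the difference between the probability of reinforcement in the presence and in the absence of a cue, is only one simple way of expressing the comparison attributed above to Rescorla (1968).

```python
from collections import Counter

def contingency(trials, cue):
    """Return a 2x2 table {(cue_present, reinforced): frequency} for one cue."""
    table = Counter()
    for present_cues, reinforced in trials:
        table[(cue in present_cues, reinforced)] += 1
    return table

def delta_p(table):
    """P(reinforcement | cue present) minus P(reinforcement | cue absent)."""
    def p_reinf(present):
        n = table[(present, True)] + table[(present, False)]
        return table[(present, True)] / n if n else 0.0
    return p_reinf(True) - p_reinf(False)

# Hypothetical schedule: X occurs only in compound with A, while A and
# reinforcement are each free to occur in the absence of X.
trials = (
    [({"A", "X"}, True)] * 10      # AX compound, reinforced
    + [({"A", "X"}, False)] * 10   # AX compound, nonreinforced
    + [({"A"}, True)] * 10         # A alone, reinforced
    + [({"A"}, False)] * 10        # A alone, nonreinforced
    + [(set(), False)] * 40        # neither cue, nonreinforced
)

table_a = contingency(trials, "A")   # the "A matrix"
table_x = contingency(trials, "X")   # the "X matrix"
dp_a = delta_p(table_a)
dp_x = delta_p(table_x)
```

With these made-up frequencies the probability of reinforcement is the same (.5) in the presence of A and of X, yet the two cues differ in validity because reinforcement also occurs on A-alone trials, that is, in the absence of X.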
III. Cue Validity and Stimulus Selection

The following section describes a series of interrelated experiments designed to evaluate the proposition that the signal value of a cue depends upon the relative validity of other available cues. The studies, as will be seen, employed several different training situations, including both classical and instrumental conditioning. Each environment was selected by virtue of its relatively frequent experimental usage, and by virtue of allowing reasonably good experimental control over the conditions of stimulation and reinforcement, as compared, for example, with a selective learning situation. Such control, which is critical to the problem under investigation, was no doubt better afforded in those studies that employed classical conditioning than in those that involved instrumental learning. Yet, it was deemed advantageous to employ some variety of common learning situations. If there were an apparent selectivity in whether a cue would come to be reacted to as a signal, depending on the validity of other available cues, it would be important to determine whether such effects were rather general, or were peculiar to the idiosyncratic characteristics of one experimental situation or to the choice of one referent behavior.
A. THE BASIC EXPERIMENT

The first experiments to be described closely adhered to the paradigm presented in Section II. In the initial study (Wagner, 1969b), the signaled event was the availability of food reward in an operant conditioning situation. Thirty-six rats were shaped to bar-press on a Variable Interval 20-second reward schedule. For all Ss, the VI reward schedule was then arranged to be in effect only during irregularly occurring 1-minute intervals during daily 2-hour training sessions. Twenty such occasions in each session were signaled by a simultaneous compound (AX) consisting of a 2500-Hz tone (A) and the 3/second flashing of two relatively bright chamber lamps (X). The Ss were divided into three treatment groups, distinguished by whether or not the A cue was ever presented alone, and by whether or not reward was available if A were presented alone. For Group I, A was presented alone daily on 20 1-minute occasions, randomly interspersed in sequence with the compound occasions, and reward was then available on the same VI schedule as during the compound. For Group II, A was never presented alone. Finally, for Group III, A was presented alone on the same scheduled occasions as in Group I, but reward was not then available. At all other times, A and X were absent and reward was unavailable for all Ss. Training was continued for 25 sessions, but testing was begun on Day 7, by which time bar-pressing performance of the three groups had stabilized. Testing involved the nonreinforced presentation of the X element alone, once every second day in place of a normally occurring compound trial. There were thus 10 occasions, distributed over the last 19 2-hour sessions, in which the light cue, which was otherwise presented only in the AX compound for all groups, was presented alone for 1 minute. The principal question was whether or not responding to X alone would differ in the three groups.
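The schedule just described can be turned into separate correlations of A and X with reward availability. The sketch below rests on assumptions that are not in the text: each 2-hour session is binned into 120 one-minute intervals, the X-alone test trials are ignored, and the phi coefficient is taken as the measure of correlation. The function name `phi` and the `groups` dictionary are hypothetical, and the resulting values should not be read as those plotted in the chapter's figure.

```python
from math import sqrt

def phi(n_cue_r, n_cue_nr, n_nocue_r, n_nocue_nr):
    """Phi coefficient of a 2x2 table: cue present/absent by reward available/unavailable."""
    num = n_cue_r * n_nocue_nr - n_cue_nr * n_nocue_r
    den = sqrt((n_cue_r + n_cue_nr) * (n_nocue_r + n_nocue_nr)
               * (n_cue_r + n_nocue_r) * (n_cue_nr + n_nocue_nr))
    return num / den if den else 0.0

SESSION_MINUTES = 120  # one 2-hour session in 1-minute bins (binning is an assumption)

# Minutes per session: (AX with reward, A alone with reward, A alone without reward)
groups = {"I": (20, 20, 0), "II": (20, 0, 0), "III": (20, 0, 20)}

validity = {}
for name, (ax_r, a_r, a_nr) in groups.items():
    neither = SESSION_MINUTES - ax_r - a_r - a_nr  # no cue present, reward unavailable
    # A is present on both compound and A-alone minutes.
    phi_a = phi(ax_r + a_r, a_nr, 0, neither)
    # X is present only on compound minutes.
    phi_x = phi(ax_r, 0, a_r, a_nr + neither)
    validity[name] = (round(phi_a, 3), round(phi_x, 3))
```

Under these assumptions the number of reinforced occasions in the presence of X is identical across groups, while the correlation of X with reward availability is lower in Group I than in Groups II and III, and the correlation of A is lower in Group III than in Groups I and II.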
If the signal value of a cue treated such as X depends only upon the number of occasions on which reward does or does not occur in the presence of the cue, there should have been no differences. If, however, those events that occur in the absence of X also influence the signal value of X, the groups might have been expected to differ in their test trial responding to X. More specifically, if the signal value of X is dependent on the relative validities of A and X in the manner suggested by Sutherland (1964), the degree of responding to X should have been ordered Group I < Group II < Group III. Figure 1 summarizes, for each of the three training conditions, the probability of reinforcement in the presence of the AX compound, in the presence of the A cue when occurring alone, and in the absence of both A and X [represented as “not (A or X)”]. Also indicated are the separate correlations of A and X with reinforcement. As may be seen, for Group I, A was perfectly valid, but the occurrence of reward in the presence of A
[FIG. 1. Probability of reinforcement (in the presence of AX, of A alone, and of neither A nor X) and correlation of A and of X with reinforcement, for Groups I, II, and III.]
[TABLE III. Formulas for the estimates of $\alpha$, $\beta$, and $\gamma$ in four cases (rows I to IV), each case defined by order relations among the relative frequencies $\hat{p}_{ij}$, where $\hat{p}_{ij} = n_{ij} n_{i\cdot}^{-1}$ $(i, j = e, c)$. Table note: Try both row II and row III of this table. Choose the solution giving the greater value for ln L in Eq. (25).]
Short-Term Memory in Binary Prediction by Children
the various transitions. Table IV provides examples which correspond, row by row, to the four cases covered in Table III. Consider row II of Table IV. The transition count matrix indicates that 40 correct responses followed correct responses, 60 errors followed correct responses, 60 correct responses followed errors, and 30 errors followed errors. Converting to relative frequencies or conditional probability estimates by dividing each count by its associated row total, i.e., 40/100, 60/100, 60/90, and 30/90, we obtain the matrix of relative frequencies. Inspection of these relative frequencies reveals that they do stand in the relations indicated in row II of Table III. The estimates of $\alpha$, $\beta$, and $\gamma$ in Table IV are then obtained using the corresponding formulas in Table III. A similar discussion applies to the other three rows of the two tables.

[TABLE IV. EXAMPLES FOR THE VARIOUS CASES CONSIDERED IN TABLE III. Columns: Case (I to IV); Transition counts $(n_{ij})$; Relative frequencies $(n_{ij} n_{i\cdot}^{-1})$; Parameter estimates ($\hat{\alpha}$, $\hat{\beta}$, $\hat{\gamma}$). Table note: Rows II and III both give in this case $\hat{\alpha} = \hat{\beta} = 0$, $\hat{\gamma} = 1$, since for row II $\hat{\alpha} < 0$ and for row III $\hat{\beta} < 0$; therefore use either row to obtain $\hat{\gamma} = 1$.]

The basis for the method used to obtain the entries in Table III can be found in an article by Brunk (1958). The method consists of a series of steps. In Step 1, the unconstrained maximum likelihood estimates of the parameters are found by the usual methods [Eq. (27)]. If they lie in the region of constraint, they are the appropriate values (see row IV, Table III), and no further steps are taken. If one or more lie outside the region, one is chosen and it is brought to the boundary of the region (e.g., a negative value of $\beta$ becomes 0) and the estimates are obtained again, given the assumption that the maximum likelihood estimate for the chosen parameter is the value at the boundary. If all the remaining parameter estimates now are in the permissible region, the maximum likelihood estimates are now in hand and the procedure stops. Otherwise, it continues by repetition of Step 1 using one of the parameters still remaining outside the theoretically permissible region.

In referring to the article by Brunk (1958) in the present context, the reader may wish to have available the fact that the maximization problem of interest here can be viewed as the minimization of a convex function on the intersection of closed, convex sets. First note that maximization of the expression in Eq. (25), which will be denoted by $f(\beta, \gamma)$, is equivalent to minimization of $-f(\beta, \gamma)$. The domain of $f(\beta, \gamma)$ is the parallelogram with sides $\gamma = 0$, $\beta = 1 - \gamma/2$, $\gamma = 2$, and $\beta = \gamma/2$. The theoretical constraints limit $f(\beta, \gamma)$ to the triangular domain with boundaries $\beta = 0$, $\gamma = 0$, and $\beta + \gamma = 1$. Given that a set $S$ is convex if for every $x$ and $y$ in $S$ and every $0 \le \theta \le 1$, $\theta x + (1 - \theta)y$ is also in $S$ (geometrically, if two points are in the set, then any point on the line joining those two points is also in the set), it is obvious that both domains are convex and that the triangular domain is the intersection of the two.
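The definition of convexity just given can be spot-checked numerically for the triangular domain bounded by $\beta = 0$, $\gamma = 0$, and $\beta + \gamma = 1$. The following is a minimal sketch, not a proof: the function names, the random seed, and the small numerical tolerance are all illustrative assumptions.

```python
import random

def in_triangle(beta, gamma, tol=1e-12):
    """True if (beta, gamma) satisfies beta >= 0, gamma >= 0, beta + gamma <= 1."""
    return beta >= -tol and gamma >= -tol and beta + gamma <= 1 + tol

def random_point(rng):
    """Draw a point in the triangle by rejection sampling from the unit square."""
    while True:
        beta, gamma = rng.random(), rng.random()
        if in_triangle(beta, gamma):
            return beta, gamma

rng = random.Random(0)
all_inside = True
for _ in range(10_000):
    (b1, g1), (b2, g2) = random_point(rng), random_point(rng)
    theta = rng.random()
    # Convex combination theta*x + (1 - theta)*y of two points in the set.
    b = theta * b1 + (1 - theta) * b2
    g = theta * g1 + (1 - theta) * g2
    all_inside = all_inside and in_triangle(b, g)
```

Every sampled convex combination remains inside the triangle, which is what the definition requires; a point such as (0.8, 0.5) fails the third constraint and is rejected by `in_triangle`.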
It is a simple matter to show that $-f(\beta, \gamma)$ is a convex function if we note that (1) a sufficient condition for convexity is that the Hessian of $-f(\beta, \gamma)$, i.e., the matrix of second partial derivatives with respect to $\beta$ and $\gamma$, be positive semidefinite, and (2) a sufficient condition for positive semidefiniteness of a matrix is given by the Hadamard-Gershgorin theorem which states that a matrix is positive semidefinite if for every $i$,

$$q_{ii} \ge \sum_{j \ne i} |q_{ij}|,$$

where $q_{ij}$ is the entry in the $i$th row and the $j$th column of the matrix (Wolfe, 1965). It follows immediately that Brunk’s (1958, p. 438) theorem (Theorem 2.1) applies here, providing a justification for Table III. We may also note in passing that the convexity of $-f(\beta, \gamma)$ implies the concavity of $f(\beta, \gamma)$, and it is well known that any local maximum of a concave function over a given domain is in fact its global maximum over that domain. Thus, the value of the likelihood function the logarithm of which is given in Eq. (25) attains its global maximum (rather than just a local maximum as may occur in the solution of likelihood equations) over the parameter space with insertion of the appropriate value from Table III.

No new difficulties arise in the estimation of parameter values for a group of subjects assumed to be homogeneous in their parameter values. The maximum likelihood estimate of the initial probability of a correct response is just the proportion of correct responses on the first trial, and the maximum likelihood estimates of $\alpha$, $\beta$, and $\gamma$ are obtained as for an individual subject, using the transition count over all subjects in conjunction with Table III.
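The bookkeeping behind the group estimates can be sketched as follows. The response protocols are hypothetical strings of `c` (correct) and `e` (error), and the function names are illustrative assumptions. The sketch stops at pooled transition counts, the first-trial proportion, and relative frequencies; the estimates of $\alpha$, $\beta$, and $\gamma$ themselves require the formulas of Table III and are not computed here.

```python
def transition_counts(protocol):
    """Count transitions in a string of 'c' (correct) and 'e' (error) responses.

    counts[i][j] is the number of times response j followed response i.
    """
    counts = {"c": {"c": 0, "e": 0}, "e": {"c": 0, "e": 0}}
    for prev, nxt in zip(protocol, protocol[1:]):
        counts[prev][nxt] += 1
    return counts

def pool(count_list):
    """Sum transition counts over subjects."""
    total = {"c": {"c": 0, "e": 0}, "e": {"c": 0, "e": 0}}
    for counts in count_list:
        for i in "ce":
            for j in "ce":
                total[i][j] += counts[i][j]
    return total

def relative_frequencies(counts):
    """Divide each count by its row total."""
    freqs = {}
    for i in "ce":
        row_total = counts[i]["c"] + counts[i]["e"]
        freqs[i] = {j: (counts[i][j] / row_total if row_total else 0.0) for j in "ce"}
    return freqs

# Hypothetical protocols for three subjects (not data from the experiment).
protocols = ["ceccecce", "eccccecc", "cecececc"]
pooled = pool([transition_counts(p) for p in protocols])
initial_p_correct = sum(p[0] == "c" for p in protocols) / len(protocols)

# The transition counts quoted in the text for row II of Table IV.
row_ii = relative_frequencies({"c": {"c": 40, "e": 60}, "e": {"c": 60, "e": 30}})
```

Applied to the row II counts, the last call reproduces the relative frequencies 40/100, 60/100, 60/90, and 30/90 given in the text.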
III. Data

A. EXPERIMENT I: PREDICTION OF A SINGLE ALTERNATION SEQUENCE

In this section and the next one, two sets of data are analyzed from the point of view of the theory just considered. None of these data sets was collected for the purpose of testing the theory; they had all been collected for other purposes before the theory was formulated. This comment will explain the presentation of data in Experiment II based on relatively small numbers of trials per subject.

1. Method

The subjects in this first experiment were 20 4- and 5-year-old preschool children attending classes at the University Preschool Laboratories of the University of Iowa. Each child was taken individually to an experimental room and seated at a table upon which the apparatus was mounted. From the child’s point of view, the apparatus was a small, metal receptacle and a microswitch mounted on the front of a large black box. Concealed within the box were 105 marbles stored sequentially such that each time a buzzer mounted within the box sounded, the next marble in sequence could be released into the receptacle by depression of the microswitch. After the marble was released, the switch was then deactivated until the next sounding of the buzzer since the circuit was arranged so that the current sources used to sound the buzzer also charged a capacitor which was discharged across a solenoid to release the marble. The child was told that he was going to play a marble game. He was instructed to guess the color of the next marble each time the buzzer sounded, and then to release the next marble by pressing the switch only after making his guess. He was then shown a pair of marbles and asked to identify their colors. A randomly selected pair of the four colors black, blue, white, and yellow was assigned to each child. He was told that the pair of colors would be used in the game but that first he would practice with some red marbles. The first 5 marbles were red for all children and the remaining 100 marbles followed in a single alternation sequence of the two randomly selected colors. For example, a child assigned the pair blue-yellow received the sequence RRRRRBYBY . . . BY. After the five practice trials with the red marbles, the child was again shown the two colors and reminded that only those colors would be used thereafter. The 100 trials were then run without interruption. The buzzer, programmed by two cycling Hunter decade interval timers, sounded for .3 second every 8 seconds. The experimenter removed each marble from the receptacle after approximately 1 second of exposure and placed it in an opaque container.

2. Results
A standard $\chi^2$ test for homogeneity of transition matrices over the 20 subjects (Anderson & Goodman, 1957) gave a $\chi^2$ of 154.33 on 38 df, which is equivalent to a z of 8.91, is significant, and indicates rejection
[FIG. 4. Observed and predicted mean proportions of correct responses in successive 20-trial blocks (Experiment I). Solid curve: observed; dashed curve: predicted. Abscissa: blocks of 20 trials, 1 through 5.]
of the hypothesis that the individual protocols were all sampled from the same Markov chain. Heterogeneity of this kind renders the standard $\chi^2$ test for order (Order 1 versus Order 2; Anderson & Goodman, 1957) uninterpretable, since that test presupposes sampling from a single chain and becomes subject to selection effects which inflate the $\chi^2$ when heterogeneity of transition matrices exists. For the record, this $\chi^2$ was 52.57 on 2 df. A similar comment applies to the test for stationarity of the transition probabilities (Anderson & Goodman, 1957), which yielded a $\chi^2$ of 212.66 on 198 df, equivalent to a z of .749. It will be seen later with other data that when homogeneity of the transition matrices is obtained, the $\chi^2$ for order also falls into line. Another way to approach the test for order
is to select subjects having similar parameter estimates. When this was done, the order test again gave nonsignificant $\chi^2$'s. In Fig. 4, the solid curve shows the observed relative frequency of correct responses for the group of 20 subjects on the five consecutive blocks of 20 trials. A curve of this type will be referred to as an observed mean performance curve. The dashed curve in that figure gives the predicted mean performance curve. There are, in fact, two distinct methods for computing a predicted mean performance curve. The first uses the corollary to Theorem 2 with a single set of parameter estimates inserted into Eq. (8). These are group estimates obtained under the hypothesis of homogeneity of subjects, as indicated in the last paragraph of Section II,D. The second method uses the parameter estimates for the individual subjects in conjunction with Eq. (8) to obtain a predicted mean performance curve for each individual. The average of these
TABLE V
OBSERVED AND PREDICTED RELATIVE FREQUENCIES OF THE 3-TUPLES: EXPERIMENT I

3-tuple   Obs.    Pred.AI   Pred.G
ccc       .560    .551      .541
cce       .096    .102      .114
cec       .095    .090      .092
cee       .042    .042      .045
ecc       .097    .108      .115
ece       .041    .032      .024
eec       .042    .044      .046
eee       .027    .030      .023
predicted curves is then used as the predicted mean performance curve. Strictly speaking, the second method is always the appropriate one. In practice, the two curves obtained by these two methods are almost always very close to one another. Although the amount of computational labor is greatly reduced when the first method is used, in the absence of a high speed computer the second method can be used with a desk calculator for good-sized groups of subjects without becoming overly burdensome. Similar remarks apply to each of the other types of data analysis that are presented (3-tuples, runs, and so on). Hereafter, predicted values based on group parameter estimates will be subscripted with G and predicted values obtained as the average of individual predicted values will be subscripted with AI. As it happens, the predicted AI (pred.AI) curve in Fig. 4 could be replaced by the predicted G (pred.G) curve with no discernible difference. The values for the pred.G curve are identical to three decimal places with those of the pred.AI curve over the last four blocks and differ by .002 on the first block.
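The two ways of obtaining a predicted mean performance curve can be sketched in code. This is a minimal illustration only: it assumes a generic two-state (correct/error) first-order chain with the recursion p(n+1) = p(n)·P(c|c) + (1 − p(n))·P(c|e), which stands in for the chapter's Eq. (8) (not reproduced in this section). The parameter triples are hypothetical, and the "group" parameters are simply averaged here rather than estimated under the homogeneity hypothesis as in the text.

```python
def mean_performance_curve(p1, p_cc, p_ce, n_trials, block_size):
    """Block means of P(correct) for a two-state first-order chain.

    p1   : probability of a correct response on trial 1
    p_cc : P(correct | previous response correct)
    p_ce : P(correct | previous response an error)
    """
    probs = []
    p = p1
    for _ in range(n_trials):
        probs.append(p)
        p = p * p_cc + (1.0 - p) * p_ce  # one-step recursion for P(correct)
    return [sum(probs[i:i + block_size]) / block_size
            for i in range(0, n_trials, block_size)]


def average_of_individual_curves(params, n_trials, block_size):
    """Analogue of pred.AI: average the curves predicted for each subject."""
    curves = [mean_performance_curve(*p, n_trials, block_size) for p in params]
    return [sum(c[b] for c in curves) / len(curves)
            for b in range(len(curves[0]))]


# Hypothetical (p1, p_cc, p_ce) triples for three subjects.
subjects = [(0.5, 0.90, 0.60), (0.5, 0.80, 0.70), (0.6, 0.95, 0.50)]

pred_ai = average_of_individual_curves(subjects, n_trials=100, block_size=20)

# Analogue of pred.G: one parameter set for the whole group
# (a plain average here, which is a simplifying assumption).
group = tuple(sum(s[i] for s in subjects) / len(subjects) for i in range(3))
pred_g = mean_performance_curve(*group, n_trials=100, block_size=20)

for block, (ai, g) in enumerate(zip(pred_ai, pred_g), start=1):
    print(f"block {block}: pred.AI = {ai:.3f}  pred.G = {g:.3f}")
```

With heterogeneous subjects the two curves need not coincide, which is why the text treats the average of individual predictions as the strictly appropriate method.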
The observed relative frequencies of the eight 3-tuples are shown in Table V together with the two types of predicted values obtained using Theorem 3. Although there is a consistent advantage, from the standpoint of goodness of fit, in using the average of the individual predicted values, the differences are quite small. Clearly, either method gives an adequate description of the observed values.

TABLE VI
OBSERVED AND PREDICTED MEAN NUMBERS OF RUNS OF ERRORS OF LENGTHS 1 THROUGH 5: EXPERIMENT I

Run length   Obs.    Pred.AI   Pred.G
1            9.80    9.31      9.40
2            2.25    2.70      3.07
3            1.30     .94      1.00
4             .45     .36       .33
5             .15     .16       .11
The observed mean total number of runs of errors was 13.95, pred.G was 13.96, and pred.AI was 13.60. Tables VI and VII present the observed means and the two types of predicted means for the number of error runs of length one through five and the number of joint occurrences of correct responses k trials apart for values of k from 1 to 5.

TABLE VII
OBSERVED AND PREDICTED MEAN NUMBERS OF JOINT OCCURRENCES OF CORRECT RESPONSES SEPARATED BY K TRIALS FOR VALUES OF K FROM 1 THROUGH 5: EXPERIMENT I

K    Obs.     Pred.AI   Pred.G
1    64.80    65.21     64.86
2    64.20    63.41     62.09
3    61.50    62.51     61.13
4    62.15    61.80     60.44
5    60.20    61.14     59.80
The existence of badness of fit at the level of the individual subject, which is masked by averaging over a group, is a consideration to which some attention should always be paid. That such masking is not occurring in this experiment may be seen by inspection of Fig. 5, which shows a plot of observed and predicted values for individual subjects on the measures of the total number of runs of errors and the number of runs of errors of length
one, and by inspection of Fig. 6, which shows a plot of the individual observed and predicted values of $C_k$ for values of k from 1 through 5. The raw data are presented in Table VIII, coded with a one for a correct response and a zero for an error. Discussion of this experiment is postponed until Section III, C.
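The sequential statistics used in these analyses (3-tuple relative frequencies, runs of errors by length, and joint occurrences of correct responses k trials apart) can be computed directly from a protocol coded as in Table VIII. The sketch below is illustrative only: it assumes that 3-tuples are counted over overlapping triples of consecutive trials and that a run is a maximal string of consecutive errors, and it uses a short made-up protocol rather than any subject's data. The function names are hypothetical.

```python
from collections import Counter

TUPLES = ("ccc", "cce", "cec", "cee", "ecc", "ece", "eec", "eee")


def triple_frequencies(protocol):
    """Relative frequency of each 3-tuple over overlapping triples.

    protocol: string of '1' (correct) and '0' (error), one character per trial.
    """
    coded = protocol.replace("1", "c").replace("0", "e")
    triples = [coded[i:i + 3] for i in range(len(coded) - 2)]
    counts = Counter(triples)
    return {t: counts[t] / len(triples) for t in TUPLES}


def error_runs(protocol):
    """Map run length -> number of maximal runs of consecutive errors."""
    return Counter(len(run) for run in protocol.split("1") if run)


def joint_correct(protocol, k):
    """Number of trial pairs (n, n + k) on which both responses are correct."""
    return sum(1 for i in range(len(protocol) - k)
               if protocol[i] == "1" and protocol[i + k] == "1")


toy = "1101001110"  # made-up 10-trial protocol, not taken from Table VIII
print(triple_frequencies(toy))
print(dict(error_runs(toy)))
print([joint_correct(toy, k) for k in range(1, 6)])
```

Group values such as those in Tables V through VII would then be means of these per-subject quantities, under the same counting assumptions.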
[FIG. 5. Plots of individual predicted versus observed total number of runs of errors (R) and runs of length one (r1). The plot for r1 is displaced upward by 10 units (Experiment I). Abscissa: predicted.]

[FIG. 6. Plots of individual observed versus predicted values of C(K), the frequency of occurrence of two correct responses K trials apart, for K = 1-5. The plot for each value of K is displaced 20(K - 1) units to the right (Experiment I). Abscissa: predicted.]
TABLE VIII
INDIVIDUAL PROTOCOLS: EXPERIMENT I
Subject
Protocol
1
10010001011001001111111111111100111111011111111111 11111010111000011100111111011101111111101111111110
2
11111111111111111111111111111111111111111111111111 11111111111110111111111111111111111111111111111111
3 4
5
6
7 8 9
10111111111111111011111111101110111110101111100011 11100111111111001011111111110111111111111111111111 00111111011111101110111111111111101111111111111111 01111011101111101111111110101110111010101111111111 01110111111010110111111110111111111111110001111111 11111111011111111111111111111111111111101111111111 11000111111110111111111111100101011111111010111100
10110111110111111111110111111111110111001111111111 01110110101010001111101110111111001011111000111110 10111110110010100010011011100000111000100011110101 01111111110111111111110101010011011111110111111101 11110001111111111000111101101110101111010101111101 11111011111111111111111011100111110111111111111111 10010111110011111111101011001111011111100110001111
10
11001100111111111000110000111100110001010001110001 01100111111100001110111000011100101111110001111101
11
11111101111111111111111011111111010111011111111111 11011111110111111111111111111111111110111111101010
12
10101000001111111011110000100110000111111101111111 11111100111111101111111111111111100011101100001111
13
01100111111111101001111111111111111111111111111111 11111111111111110111111111111111111111011111111111 10111111101111111111111111111010011111111110111110 11111110111111011111100111111111111111111001111111 11111111111111111110110111111111111111110101111111 11111111111111110111111111111111111111111111111111
14 15
16
11011101011110111011111111111111111110110100111111 11111111111111111111111110111111111111111111111111
17
19
11101111001111101110111110111011101101101111111011 11101111111111111111111010011000010001111111101110 11001111111111111111011111111101001101010111001010 01101001010111110101011001011101110101010100001101 01110110101010001111101110111111001011111000111110
20
01111111110111111111110101010011011111110111111101
18
10111110110010100010011011100000111000100011110101
11110001111111111000111101101110101111010101111101
B. EXPERIMENT II: PREDICTION OF A SINGLE ALTERNATION SEQUENCE FOLLOWING PREDICTION OF A MARKOVIAN TENDING-TO-ALTERNATE SEQUENCE

In this section, we consider the data from a group of subjects that participated in a more extensive experiment which is reported elsewhere (Bogartz, 1966a). For the present purposes, we regard this group as a replication of Experiment I.

1. Method

The apparatus, procedure, and instructions were the same as in Experiment I with the following exceptions. Each of the six 5-year-old and five 4-year-old subjects from the University of Iowa Preschool Laboratories used only black and white marbles. Each child had a different
[FIG. 7. Observed and predicted mean proportions of correct responses in successive 15-trial blocks (Experiment II). Curves: observed and predicted. Abscissa: blocks of 15 trials, 1 through 5.]
75:25 tending-to-alternate Markov sequence of blacks and whites during the first 75 trials and then was transferred with no interruption to a single alternation sequence for the final 75 trials. In the 75:25 tending-to-alternate Markov sequence, Pr(black on trial n + 1 | white on trial n) = Pr(white on trial n + 1 | black on trial n) = .75. Including the initial five practice trials with the red marbles as in Experiment I, each child had 155 trials.

2. Results
Only the data from the final 75 trials will be considered at this point. The $\chi^2$ test for homogeneity of subjects gave a $\chi^2$ of 71.91 on 20 df, indicating significant differences between individual transition matrices. The $\chi^2$'s for stationarity and order were 167.46 on 148 df (z = 1.13) and 7.83 on 2 df, respectively. Figure 7 displays a plot of the observed mean performance curve and the pred.AI mean performance curve plotted for blocks of 15 trials.
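The z equivalents quoted for the large-df $\chi^2$ values in Experiments I and II are numerically consistent with Fisher's normal approximation, z = sqrt(2·chi2) − sqrt(2·df − 1). The chapter does not state which conversion was used, so the sketch below rests on that assumption; it simply checks the three reported pairs against the formula.

```python
import math


def chi2_to_z(chi2, df):
    """Fisher's approximation: sqrt(2*chi2) - sqrt(2*df - 1) is roughly
    standard normal when df is large."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * df - 1.0)


# (chi-square, df, z as reported in the text)
reported = [
    (154.33, 38, 8.91),    # homogeneity, Experiment I
    (212.66, 198, 0.749),  # stationarity, Experiment I
    (167.46, 148, 1.13),   # stationarity, Experiment II
]

for chi2, df, z_text in reported:
    print(f"chi2 = {chi2:7.2f} on {df:3d} df: "
          f"z = {chi2_to_z(chi2, df):.3f} (text: {z_text})")
```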
The deviations of the observed points from the theoretical curve are larger than those found in Experiment I (see Fig. 4). Since each observed point is based on 165 pieces of binary data, the deviations appear quite large. We should note, however, that positive autocorrelation exists within the individual protocols. Such autocorrelation tends to perpetuate departures from the expected curve. Inspection is not a satisfactory method for evaluating the magnitude of the departures in this case. There is no theory developed for the distribution of this particular statistic. Consequently, an empirical investigation was undertaken to obtain quantitative information concerning the expected magnitudes of departures of observed mean performance curves from pred.AI curves when the theory is correct.

To perform such an investigation, we began by creating theoretical counterparts to the 11 real subjects that were actually run. Such counterparts have been referred to in other contexts as mathematical robots or stat-rats (Bush & Mosteller, 1955) and, more appropriately for this context, stat-children (Zeaman & House, 1963). Each of these 11 stat-children was caused to "behave" according to the theory for 75 trials, thereby generating a replication of the experiment. By doing this many times, it is possible to obtain an empirical distribution of protocols from which the distribution of any statistic of interest can be obtained. As the number of replications increases, furthermore, the empirical distribution approaches the appropriate theoretical distribution.

Each stat-child was assigned a true initial probability of a correct response and a matrix of true transition probabilities corresponding to Matrix (5). The values assigned as the true values were the maximum likelihood estimates for the real children. Thus, the first stat-child had as his true initial probability the maximum likelihood estimate of that probability for Subject 1. As his true matrix of transition probabilities he had the maximum likelihood estimates of the transition probabilities for Subject 1. The second stat-child had as his true values the values estimated for Subject 2, and so on. Each stat-child's response protocol was then determined by using a new set of 75 random numbers sampled from the uniform distribution of numbers between zero and one. The first random number was compared with the initial probability of a correct response, the comparison resulting in the assignment of a correct response to the first trial when the random number was less than the initial probability or the assignment of an error if it exceeded the initial probability. The remaining random numbers, indexed by the integers 2 through 75, were compared with the appropriate entry in the stat-child's transition matrix, either with the probability of a correct given a correct if the response assigned to the previous trial was correct, or with the probability of a correct given an error when the response assigned to the
previous trial was an error. Each of these comparisons resulted in the assignment of a response to the trial indexed with the same integer as the random number. When the random number was less than the probability to which it was compared, a correct response was assigned. Otherwise, an error was assigned. Once the 11 stat-children were run for their 75 trials, a replication of the experiment was obtained. These pseudodata were then treated exactly as we would have treated genuine data. A "pred.AI" curve was obtained, an "observed" curve was "plotted," and the deviations of the observed from the pred.AI were obtained for each of the five blocks of
[FIG. 8. Observed and predicted mean proportions of correct responses in successive 15-trial blocks, without Subject 3. The vertical lines at each block span the acceptance region for the model (Experiment II). Abscissa: blocks of 15 trials.]
15 trials. Replication of this procedure 1000 times gave an empirical distribution of deviations at each of the five blocks which is an approximation to the theoretical distribution of deviations. The .025 and .975 percentage points in this empirical distribution were then used as critical values to define a critical region having a probability of approximately .05 when the theory is correct. Approximate significance tests could then be performed on the real data by comparing the observed deviations with these critical values. The observed mean proportion for the third block of 15 trials (see Fig. 7) was the only point to fall in a critical region, the critical values for that block being .704 - .087 = .617 and .704 + .081 = .785. On the one hand, this test is conservative. It employs a .05 region at each block of trials and, therefore, has an experiment-wise error rate for false rejection of the model when it is true which is closer to $1 - (.95)^5 = .226$. On the other
hand, Subject 3 (see Table XII) does have a highly atypical protocol and almost certainly is not described by the model discussed in Section II. His protocol is not incompatible with the general theoretical approach taken here, however, and we will consider him again. Eliminating this one subject, recomputing the observed and pred.AI curves and the Monte Carlo-generated critical regions based on the parameter estimates

TABLE IX
OBSERVED AND PREDICTED RELATIVE FREQUENCIES OF THE 3-TUPLES: EXPERIMENT II

3-tuple   Obs.    Pred.AI   Pred.G
ccc       .438    .461      .435
cce       .098    .099      .107
cec       .067    .068      .056
cee       .067    .063      .077
ecc       .106    .107      .110
ece       .034    .034      .027
eec       .067    .067      .079
eee       .122    .102      .108
for the remaining 10 subjects, the results shown in Fig. 8 were obtained. None of the departures of the observed mean proportions from the predicted values is now significant, even with the conservative test. The results of the various sequential analyses will now be given. Tables IX-XI present observed, pred.AI, and pred.G values of the 3-tuple proportions, the mean number of runs of errors of lengths 1 through 5,

TABLE X
OBSERVED AND PREDICTED MEAN NUMBER OF RUNS OF ERRORS OF LENGTHS 1 THROUGH 5: EXPERIMENT II

Run length   Obs.    Pred.AI   Pred.G
1            5.36    5.50      4.52
2            2.18    2.25      2.57
3            1.18    1.10      1.47
4             .46     .60       .84
5             .46     .35       .48
and the mean number of joint occurrences of correct responses K trials apart for values of K from 1 through 5. The observed mean total number of runs of errors was 10.46, pred.AI was 10.45, and pred.G was 10.50. Figure 9 shows the plot of the individual observed and predicted values for the total number of runs and the runs of length one. Figure 10 shows a plot of the individual observed and predicted values of C(K) for values
TABLE XI
OBSERVED AND PREDICTED MEAN NUMBERS OF JOINT OCCURRENCES OF CORRECT RESPONSES SEPARATED BY K TRIALS FOR VALUES OF K FROM 1 THROUGH 5: EXPERIMENT II

K    Obs.     Pred.AI   Pred.G
1    39.91    41.78     40.08
2    36.91    38.98     35.84
3    35.64    37.61     33.97
4    35.64    36.75     32.97
5    33.91    36.08     32.31
of K from 1 through 5. Again it can be seen that the model fits the data of most of the individual subjects well. The individual protocols are given in Table XII.
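The stat-children procedure used above to obtain critical values for the block deviations can be sketched as follows. This is a simplified illustration, not the original computation: the parameter triples are hypothetical placeholders for the maximum likelihood estimates, and the predicted curve is computed once from the "true" parameters, whereas the text re-derives a pred.AI curve from each set of pseudodata. The last line reproduces the experiment-wise error rate quoted in the text for five blocks each tested at the .05 level.

```python
import random


def simulate_stat_child(p1, p_cc, p_ce, n_trials, rng):
    """One protocol (1 = correct, 0 = error) from a first-order chain.

    A uniform random number below the relevant probability yields a correct
    response, as in the procedure described in the text.
    """
    protocol = []
    for trial in range(n_trials):
        if trial == 0:
            threshold = p1
        elif protocol[-1] == 1:
            threshold = p_cc
        else:
            threshold = p_ce
        protocol.append(1 if rng.random() < threshold else 0)
    return protocol


def trial_probabilities(p1, p_cc, p_ce, n_trials):
    """P(correct) on each trial implied by the chain."""
    probs, p = [], p1
    for _ in range(n_trials):
        probs.append(p)
        p = p * p_cc + (1.0 - p) * p_ce
    return probs


def block_means(values, block_size):
    return [sum(values[i:i + block_size]) / block_size
            for i in range(0, len(values), block_size)]


def critical_deviations(params, n_trials=75, block_size=15, n_reps=1000, seed=1):
    """Empirical .025 and .975 points of (observed - predicted) per block."""
    rng = random.Random(seed)
    n_subj = len(params)
    per_trial = zip(*(trial_probabilities(*p, n_trials) for p in params))
    predicted = block_means([sum(col) / n_subj for col in per_trial], block_size)
    deviations = [[] for _ in predicted]
    for _ in range(n_reps):
        protocols = [simulate_stat_child(*p, n_trials, rng) for p in params]
        observed = block_means([sum(col) / n_subj for col in zip(*protocols)],
                               block_size)
        for b, (obs, pred) in enumerate(zip(observed, predicted)):
            deviations[b].append(obs - pred)
    bounds = []
    for dev in deviations:
        dev.sort()
        bounds.append((dev[int(0.025 * n_reps)], dev[int(0.975 * n_reps) - 1]))
    return predicted, bounds


# Hypothetical (p1, p_cc, p_ce) for 11 stat-children; real use would insert
# each child's own maximum likelihood estimates.
stat_children = [(0.5, 0.85, 0.55)] * 11
predicted, bounds = critical_deviations(stat_children, n_reps=200)

# Chance of at least one false rejection over five blocks tested at .05 each.
experimentwise = 1 - 0.95 ** 5
print(predicted, bounds, round(experimentwise, 3))
```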
C. DISCUSSION

The analyses of the data collected in Experiments I and II indicate the capability of a model derived from the present theoretical approach to correspond well, statistically, to data generated in a single experimental situation. While this is no mean accomplishment, especially for
[FIG. 9. Plots of individual predicted versus observed total number of runs of errors (R) and runs of length one (r1): Experiment II. The plot for r1 is displaced upward by 10 units. A plus sign denotes two points. Abscissa: predicted.]
data from 4- and 5-year-old children, still one feels a bit uneasy, after inspecting the individual protocols, about how much information there is in many of the protocols. Subject 2 in Table VIII is of course the extreme example of a lack of information. If all the data were like his, one would need no theory or model, just a new experiment. In short, then, while goodness of fit of the model is nice, we require something more of
[FIG. 10. Plots of individual observed versus predicted values of C(K), the frequency of occurrence of two correct responses K trials apart, for K = 1-5: Experiment II. The plot for each value of K is displaced 30(K - 1) units to the right. Abscissa: predicted.]
TABLE XII
INDIVIDUAL PROTOCOLS: EXPERIMENT II

Subject   Protocol
1         11110001110100000011111111111111111011000000011110 0001001100100111101111111
2         01111110111110111101101100110101000011000111000011 1110111111111110001111111
3         00000000111010001111000000000000000000000011001100 1000000110001110011111111
4         01111011101111111111111111101101111111111100001101 1111011111101111111011011
5         01111111111111111111011111111111111110011111111101 1111001111111111111110011
6         11111111111111111111101111111111111101111111111111 1111111001110111111111101
7         01111100010011110011111001111110111111111111101111 1111000001100011111111100
8         10101110101010110001110100110100111011001110000000 1000100000101110000010111
9         00000000010001100000010111000100110000000111111111 1100000110010101111110000
10        10101111111100111111111111110110100111011111~11111 1111101111111111111111111
11        01011111101111111111111111111100000111111111011011 1000100111100110001110111
the theory. Namely, the theory should be psychologically revealing in that it directs us to experiments that tend to confirm the relevance of the processes embedded in the theory to the behavior we are studying. In the next two sections, the theory is used to generate two such experiments.

Some attention should be paid to the poorness of fit for Subject 3 and perhaps also Subject 9 in Experiment II. It appears that these subjects are operating almost exclusively with response traces and null traces. In addition, the input from the distractor does not remain constant. Instead, it seems that during portions of the trials they hold the $t_r$ well and use it. This gives the long runs of errors and the not-so-frequent long runs of corrects. At other portions of the session, the distractor gets very active. Perhaps the subjects attempt to engage E in conversation, bathroom needs arise, they begin to think about extraneous things, or irrelevant objects in the experimental room attract their attention. Thus, other portions of the protocol show up quite free of long runs, looking more like chance performance (of course long runs would be expected with chance performance, but with low frequency). What is important to note here is not the badness of fit, which may simply show how unrealistic it is to suppose not only that all of the subjects operate according to the one model so far considered, but also that the probabilities remain constant over trials indefinitely. Rather, the important feature of these data is the long error runs that produce the badness of fit. They indicate clearly the response-response dependence that has been assumed to play an important role in binary prediction and for which the response-trace mechanism was explicitly provided. Thus, while the particular model presented above probably does not apply to these two subjects, the general theoretical ideas may nevertheless be appropriate.
IV. Extension to the Effects of Intertrial Interval Duration

A. THE EFFECT OF LENGTHENING THE INTERTRIAL INTERVAL
First, suppose that the subject does not attend to the event and encode it. Then, if the response trace $t_r$ is still present when the event is removed, $t_r$ will determine the next response provided that it remains in the STM store through the intertrial interval $I_n$ (Fig. 1). On the other hand, if the event is encoded and $t_e$, the event trace, has replaced $t_r$ (or some $t_0$ that has already replaced $t_r$), then $t_e$ must remain in the memory through $I_n$ in order to determine the next response. It seems reasonable, then, to assume that the longer the intertrial interval, the smaller the probability that $t_r$ or $t_e$ will determine the next response and, therefore, the greater the probability of guessing as a result of transfer of a $t_0$. Any
monotonically decreasing decay function for traces will of course yield such a prediction. Also, any reasonable notion about extraneous information entering the memory and interfering with retrieval of $t_r$ or $t_e$ will yield the same prediction.
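As a purely illustrative instance of this argument, take an exponential decay function for the probability that a trace survives the intertrial interval. The exponential form and the decay rate are hypothetical choices, not part of the theory; any other monotonically decreasing function would give the same ordering of the two intervals.

```python
import math


def p_trace_survives(iti_seconds, decay_rate):
    """Hypothetical exponential survival of a trace over the intertrial interval."""
    return math.exp(-decay_rate * iti_seconds)


def p_guess(iti_seconds, decay_rate):
    """Probability that the trace is lost, so that the response is a guess."""
    return 1.0 - p_trace_survives(iti_seconds, decay_rate)


# The 3- and 10-second intervals are those of Experiment III;
# the rate of .05 per second is an arbitrary value for illustration.
for iti in (3, 10):
    print(f"ITI = {iti:2d} sec: P(guess) = {p_guess(iti, decay_rate=0.05):.3f}")
```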
B. EXPERIMENT III: THE BOGARTZ AND PEDERSON STUDY

Portions of the results of this study have been presented elsewhere (Bogartz & Pederson, 1966) with a more detailed statement of the method than will be given here. The study supports the theoretical expectations but contains certain flaws which will be discussed later. The analysis of the results in terms of the model has not been presented before.

1. Method
The Ss were 40 4- and 5-year-old children. Each was seated in front of a display panel having two circular apertures, each 1.5 inches in diameter, one above the other. A blue or green light could appear in either aperture. The S was informed that he was going to play a guessing game with blue and green lights, shown a blue-green-blue-green alternation sequence, told that the sequence would always be blue-green-blue-green-. . ., and instructed to guess what the next color would be as soon as a buzzer sounded. Only the blue-green color sequence was presented throughout the experiment. Following attainment of six consecutive correct responses, S was given two stages of 40 trials each. At the close of a 2-minute interval between Stage I and Stage II, S was reminded of the blue-green alternation sequence and informed of a change in spatial position of the stimuli which was to take place. Each trial consisted of a .3-second buzz, S's verbal color prediction, color onset, and color offset after 5 seconds. On each trial, E, seated behind S, pressed the stimulus onset switch immediately upon hearing S's prediction and recorded the prediction as well as the response latency (to the nearest 1/100 of a second). The intertrial interval (ITI), the time between stimulus offset and the following buzz onset, was 3 seconds for 20 Ss and 10 seconds for the other 20 Ss. Two patterns of stimulus position were used. In Series A, position of the color alternated from aperture to aperture on each trial; in Series S, the stimulus color always appeared in the bottom aperture. Thus, an instance of Series A was blue in the bottom aperture, green in the top, blue in the bottom, green in the top; in Series S, it was blue in the bottom, green in the bottom, blue in the bottom, green in the bottom. One half of the Ss in each ITI condition received Series A in the 40 trials of Stage I and Series S in the 40 trials of Stage II. The other Ss
received Series S in Stage I and Series A in Stage II. The design is thus a repeated measurement two-by-two Latin square replicated at two ITI's.

2. Results

Two measures, number of errors and mean response latency, were obtained in each block of 10 trials. Figure 11 shows the mean latencies and percentages of errors as a function of intertrial interval and trial blocks in each stage. A significantly greater number of errors is made with the 10-second ITI than with the 3-second ITI ($F_{1,36}$ = 6.86), but the apparent difference on the latency measure was not significant ($F_{1,36}$ = 1.88). In view of the close correspondence of the latency curves to the error curves, the lack of significance is probably a Type-II error attributable to the notoriously large variances of children's latencies. The model was fit to each of the eight sets of data (two ITI's x 2
[FIG. 11. Mean latency and percent errors as a function of intertrial interval and trial blocks in each stage. Curves: latency and errors for the 3-second and 10-second ITI groups. Abscissa: blocks of 10 trials, 1 through 4, within Stage I and Stage II.]
stages x 2 series, A-S or S-A). Table XIII shows the observed and predicted mean performance curves for blocks of 20 trials, and Tables XIV-XVI give the observed and predicted values for the 3-tuples, error runs, and $C_k$ statistics. Generally, the model describes the data well. Inspection of Table XIII does suggest, however, that with the 3-second ITI, the mean performance curves for the S series slope somewhat more steeply than the model predicts, and that under the 10-second ITI the same discrepancy may exist for the subjects receiving Series A in Stage I and Series S in Stage II.
TABLE XIII
OBSERVED AND PREDICTED MEAN PROPORTIONS OF CORRECT RESPONSES IN SUCCESSIVE 20-TRIAL BLOCKS: EXPERIMENT III

                 ITI = 3 seconds                                 ITI = 10 seconds
                 Position A-S            Position S-A            Position A-S            Position S-A
Stage  Block     Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G
I      1         .950    .946    .946    .905    .864    .863    .925    .872    .868    .865    .854    .861
I      2         .940    .943    .944    .815    .854    .852    .805    .860    .859    .855    .858    .859
II     1         .925    .885    .883    .925    .925    .918    .825    .776    .776    .875    .865    .866
II     2         .835    .877    .876    .905    .920    .912    .715    .762    .762    .850    .857    .859
TABLE XIV
OBSERVED AND PREDICTED RELATIVE FREQUENCIES OF THE 3-TUPLES: EXPERIMENT III

                 ITI = 3 seconds                                 ITI = 10 seconds
                 Position A-S            Position S-A            Position A-S            Position S-A
Stage  3-tuple   Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G
I      ccc       .839    .848    .845    .700    .701    .685    .692    .704    .690    .642    .650    .637
I      cce       .050    .046    .049    .082    .071    .082    .087    .075    .082    .103    .096    .103
I      cec       .047    .044    .047    .061    .059    .056    .063    .061    .059    .097    .090    .102
I      cee       .005    .006    .005    .026    .028    .035    .024    .026    .032    .018    .023    .018
I      ecc       .050    .045    .047    .068    .068    .078    .079    .072    .079    .097    .095    .102
I      ece       .003    .004    .003    .013    .016    .009    .008    .010    .009    .021    .016    .017
I      eec       .005    .006    .005    .024    .027    .034    .024    .025    .031    .018    .022    .018
I      eee       .000    .001    .000    .026    .030    .021    .024    .027    .017    .003    .008    .003
II     ccc       .708    .723    .710    .795    .812    .791    .513    .512    .483    .661    .664    .639
II     cce       .087    .073    .080    .055    .051    .060    .108    .109    .127    .089    .089    .103
II     cec       .066    .062    .064    .044    .045    .047    .105    .096    .107    .095    .083    .101
II     cee       .021    .023    .025    .021    .014    .017    .055    .053    .053    .021    .025    .018
II     ecc       .079    .071    .077    .052    .049    .057    .095    .105    .122    .079    .086    .100
II     ece       .005    .011    .009    .011    .008    .004    .055    .038    .032    .034    .019    .016
II     eec       .021    .022    .024    .018    .014    .017    .047    .050    .051    .021    .024    .018
II     eee       .013    .015    .010    .003    .007    .006    .021    .037    .026    .000    .001    .003
TABLE XV
OBSERVED AND PREDICTED MEAN NUMBER OF RUNS OF ERRORS: EXPERIMENT III

                        ITI = 3 seconds                                 ITI = 10 seconds
                        Position A-S            Position S-A            Position A-S            Position S-A
Stage  Run length       Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G
I      r1               1.80    1.73    1.83    2.60    2.31    2.23    2.70    2.38    2.35    4.10    3.61    4.07
I      r2                .20     .19     .16     .60     .60     .84     .40     .59     .81     .60     .67     .60
I      r3 and longer     .00     .04     .01     .40     .49     .49     .50     .40     .42     .10     .20     .11
II     r1               2.70    2.45    2.53    1.70    1.74    1.84    4.20    3.80    4.21    3.90    3.28    3.98
II     r2                .50     .55     .70     .70     .38     .49    1.50    1.20    1.37     .80     .70     .60
II     r3 and longer     .30     .32     .26     .10     .17     .17     .60     .90     .66     .00     .25     .10
TABLE XVI
OBSERVED AND PREDICTED MEAN NUMBERS OF JOINT OCCURRENCES OF CORRECT RESPONSES SEPARATED BY K TRIALS FOR VALUES OF K FROM 1 THROUGH 5: EXPERIMENT III

                 ITI = 3 seconds                                 ITI = 10 seconds
                 Position A-S            Position S-A            Position A-S            Position S-A
Stage  K         Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G  Obs.    Pred.AI Pred.G
I      1         34.80   34.87   34.86   30.20   30.11   29.89   30.30   30.40   30.11   29.00   29.12   28.86
I      2         33.70   33.90   33.89   28.90   28.88   28.16   28.70   29.05   28.48   28.10   28.14   28.07
I      3         32.80   32.99   33.00   28.10   27.92   27.16   27.80   28.07   27.53   27.50   27.36   27.33
I      4         32.10   32.10   32.11   27.40   27.06   26.36   27.00   27.21   26.74   27.70   26.61   26.60
I      5         30.90   31.21   31.22   26.60   26.26   25.62   26.10   26.39   25.99   25.90   25.87   25.86
II     1         30.90   31.04   30.82   33.20   33.68   33.18   24.00   24.20   23.76   29.10   28.96   29.34
II     2         29.40   29.83   29.41   31.90   32.57   31.84   23.50   23.09   22.40   28.70   28.15   28.39
II     3         28.60   28.90   28.54   30.70   31.64   30.91   22.00   22.33   21.72   27.60   27.41   27.59
II     4         27.60   28.06   27.75   30.20   30.76   30.06   22.10   21.68   21.13   26.90   26.68   26.82
II     5         26.70   27.26   26.98   29.50   29.90   29.22   20.80   21.07   20.54   25.90   25.94   26.08
Table XIII also suggests that performance is better under Series A than under Series S. An analysis of variance (see Bogartz & Pederson, 1966) indicated that this was in fact the case ($F_{1,108}$ = 12.28). This effect may be attributable to the completely redundant position cue, which may have provided another trace that could be used in conjunction with another rule. Thus, a position trace could be used with the rule: if top position last time, predict blue; if bottom, predict green. There is, however, the possibility that the alternating position of the light simply serves to maintain attention more effectively.

A similar interpretive problem exists for the ITI effect. Obviously, for S to be correct at a greater-than-chance level, some trace or other representation of either his response or the event on the previous trial must carry over to the next trial. It is reasonable to assume that the longer the ITI, the less likely is this carry-over, either as a result of greater trace decay or greater opportunity for the occurrence of responses incompatible with maintaining some representation of what happened on the previous trial. Unfortunately, ITI is confounded with time in the experimental situation. Thus, the apparent effect of a longer ITI may be the result of, say, a general lagging of attention which produces a greater performance decrement the longer S is in the situation. Additional information bearing on this question can be obtained by varying ITI within Ss instead of between Ss. That is, let ITI vary from trial to trial, using a different random sequence for each S. Decremental effects should average out, and a cleaner picture of the effects of ITI should be obtained. This is done in Experiment IV.
C. A MODEL FOR WITHIN-Ss VARIATION IN THE ITI DURATION

We now introduce an experimental manipulation together with a theoretical assumption concerning the effects of that manipulation. Suppose that $I_n$, the intertrial interval on trial n, is not held constant from trial to trial, but instead is manipulated as a "within-subjects" variable such that on each trial one of s possible values of $I_n$ is used according to some experimenter-determined schedule of ITI's. We will assume that the effect of the intertrial interval duration is located only in the values of $d_r$ and $d_e$, that the effect is independent of the trial number and other experimental occurrences (response, event, and so on), and that $d_r$ and $d_e$ are monotonic increasing with increasing values of $I_n$. Let $d_{rk}$ and $d_{ek}$ be the values of $d_r$ and $d_e$ associated with the occurrence of $I_{kn}$, the kth type of ITI occurring on trial n. Then, $\alpha_k = p(1 - d_{ek})$, $\beta_k = (1 - p)(1 - d_{rk})$, $\gamma_k = 1 - \alpha_k - \beta_k$, and we have the transition matrix given as Matrix (29).
Short-Term Memory in Binary Prediction by Children
Given this experimental manipulation and theoretical assumption, the sequence of correct responses and errors is, in general, no longer a Markov chain with stationary transition probabilities, since the transition probabilities are not necessarily constant over trials. In view of the many analytical tools available for the treatment of such chains, it is useful to know which experimenter-determined within-subject trial-to-trial variations in the intertrial interval will preserve the stationarity of the transition probabilities. A sufficient condition will be given. Letting $p_{ij}(n)$ be the probability of a transition from response $i$ on trial $n$ to response $j$ on trial $n + 1$, and $p_{ij}^{(k)}$ be the conditional probability of such a transition given the occurrence of $I_{kn}$ (i.e., letting $p_{ij}^{(k)}$ be the general entry in Matrix (29)), then

\[
p_{ij}(n) = \sum_{k=1}^{s} p_{ij}^{(k)}\, t_{ik}(n),
\tag{30}
\]

where $t_{ik}(n)$ is the conditional probability of $I_{kn}$ given response $i$ on trial $n$. By definition, the sequence of correct responses and errors is a finite Markov chain with stationary transition probabilities if $p_{ij}(n)$ is independent of $n$, i.e., a constant, $p_{ij}$. From Eq. (30) it can be seen that this will be the case if the sequence of intervals is constructed using a probability density $t_{ik}$ that is independent of $n$, that is, if $t_{ik}$ is a constant. The required condition, therefore, is that the sequence of intertrial interval durations be at most a simple contingent sequence. Putting this another way, it is required that the sequence be such that the probability of a given interval occurring on any trial depends at most upon the response made on that trial. Thus, of course, random equiprobability sequences of intervals ($t_{ik} = 1/s$) and noncontingent sequences ($t_{ik} = t_k$) also yield the desired stochastic process.
A rigorous proof of the assertions in the previous paragraph is available in an article by Rouanet and Rosenberg (1964), although some translation is required, in that their discussion is in the context of models for response continua. However, as they indicate, their analysis handles discrete random variables also. For translation, their response random variable $x_n$ should be defined on the response set $(c_n, e_n)$, their reinforcement random variable should be defined on the set of intertrial intervals, and their “state of the organism” variable $z_n$ can be ignored or set equal to $x_n$. Then their Theorem 2 (p. 222) provides the desired result.
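The sufficient condition can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: it assumes two ITI types whose matrices have the row form used for the long and short ITI's in Matrix (34), and the parameter values and the names `iti_matrix` and `mix_transition` are hypothetical. With weights $t_{ik}$ that do not depend on the trial number, the mixture of Eq. (30) is a single fixed matrix.

```python
def iti_matrix(alpha, beta, gamma):
    """Two-state matrix for one ITI type; state 0 = correct, 1 = error.

    Assumes alpha + beta + gamma = 1 (hypothetical illustrative values).
    """
    return [
        [1 - gamma / 2, gamma / 2],
        [alpha + gamma / 2, beta + gamma / 2],
    ]


def mix_transition(mats, t):
    """Eq. (30): p[i][j] = sum over k of mats[k][i][j] * t[i][k].

    t[i][k] is the probability of ITI type k given response i; it is
    constant over trials, which is the 'simple contingent' case.
    """
    n_types = len(mats)
    return [
        [sum(mats[k][i][j] * t[i][k] for k in range(n_types)) for j in range(2)]
        for i in range(2)
    ]


short = iti_matrix(0.60, 0.25, 0.15)   # hypothetical short-ITI parameters
long_ = iti_matrix(0.35, 0.15, 0.50)   # hypothetical long-ITI parameters

# Random equiprobable schedule: t_ik = 1/2 for every response i.
p_equal = mix_transition([short, long_], [[0.5, 0.5], [0.5, 0.5]])

# Simple contingent schedule: the ITI probabilities depend on the response.
p_contingent = mix_transition([short, long_], [[0.8, 0.2], [0.3, 0.7]])
```

Because neither weight table involves $n$, both `p_equal` and `p_contingent` are the same on every trial, which is what stationarity requires.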
Richard S. Bogartz
We note here that in Section V we shall make use of this result again, although there in regard to different interpolated events occurring during the ITI rather than to different durations of the ITI. Interpreting $I_{kn}$ as the occurrence of the $k$th type of interpolated event during the ITI, the above result continues to apply. The result is actually more general, but we shall have no occasion to consider its other extensions.
1. The Distribution of 3,2-Tuples
We find here the distribution of a set of statistics similar to the 3-tuples treated in Theorem 3. Again, letting $y_n$ denote the generic response on trial $n$ ($c_n$ or $e_n$) and $i_n$ the generic ITI on trial $n$, the general 3,2-tuple beginning on trial $n$ is $y_n i_n y_{n+1} i_{n+1} y_{n+2}$. The proportion of occurrences of this 3,2-tuple in $N$ trials is

\[
P(y_n i_n y_{n+1} i_{n+1} y_{n+2}) = (N - 2)^{-1} \sum_{n=1}^{N-2} P(y_n)\, J_1 K_1 J_2 K_2,
\tag{31}
\]

where $J_1$ is a value of $t_{\cdot\cdot}$ depending at most upon $y_n$, $J_2$ is a value of $t_{\cdot\cdot}$ depending at most upon $y_{n+1}$, $K_1$ is a value of $p_{\cdot\cdot}^{(\cdot)}$ which is an entry in an appropriate matrix of the form of Matrix (29) determined by $i_n$, and $K_2$ is the same as $K_1$ but from a matrix determined by $i_{n+1}$. Thus, for example, suppose there are only two intervals, $I_1$ and $I_2$, and they occur at random according to a 50:50 schedule. Then

\[
P(c\, I_1\, e\, I_2\, c) = (N - 2)^{-1} \sum_{n=1}^{N-2} P(c_n)(.5)(\gamma_1/2)(.5)(\alpha_2 + \gamma_2/2).
\tag{32}
\]
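The product structure of the 3,2-tuple probabilities can be sketched in a few lines. This is an illustration under stated assumptions, not the author's computation: the sum over trials is replaced by a single value of $P(c_n)$ (as if that probability were constant), the per-ITI parameter values are hypothetical, and the names `iti_matrix` and `tuple_probabilities` are invented for the example.

```python
from itertools import product


def iti_matrix(alpha, beta, gamma):
    """Transition probabilities for one ITI type (alpha + beta + gamma = 1)."""
    return {
        "c": {"c": 1 - gamma / 2, "e": gamma / 2},
        "e": {"c": alpha + gamma / 2, "e": beta + gamma / 2},
    }


def tuple_probabilities(params, weights, p_correct):
    """Probability of every 3,2-tuple y1 i1 y2 i2 y3.

    params:    {iti_label: (alpha, beta, gamma)}, hypothetical values
    weights:   {iti_label: probability of that ITI}, noncontingent schedule
    p_correct: P(c_n), treated here as constant over trials (an assumption)
    """
    mats = {k: iti_matrix(*abg) for k, abg in params.items()}
    start = {"c": p_correct, "e": 1 - p_correct}
    probs = {}
    for y1, i1, y2, i2, y3 in product("ce", params, "ce", params, "ce"):
        probs[(y1, i1, y2, i2, y3)] = (
            start[y1] * weights[i1] * mats[i1][y1][y2]
            * weights[i2] * mats[i2][y2][y3]
        )
    return probs


params = {"I1": (0.60, 0.25, 0.15), "I2": (0.35, 0.15, 0.50)}
weights = {"I1": 0.5, "I2": 0.5}
probs = tuple_probabilities(params, weights, p_correct=0.77)
```

With two ITI types there are 32 tuples, and the entry for `("c", "I1", "e", "I2", "c")` has exactly the factor pattern of Eq. (32): $P(c)(.5)(\gamma_1/2)(.5)(\alpha_2 + \gamma_2/2)$.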
To find $P(c_n)$ we use the fact that the sequence of correct responses and errors is a two-state Markov chain with stationary transition probabilities, since we are limiting our attention to use of at most simple contingent sequences. Theorem 2 gives the value of $P(c_n)$ provided that each of the parameters $\alpha$, $\beta$, and $\gamma$ is interpreted as an average over the values for each ITI. Estimation of these average values, denote them by $\bar{\alpha}$, $\bar{\beta}$, and $\bar{\gamma}$, is made using the transition counts as in Section II,D, treating the data as if the ITI variation did not occur. As in Eq. 9, letting

\[
v_N = (N - 2)^{-1} \sum_{n=1}^{N-2} P(c_n)
\quad\text{and}\quad
(1 - v_N) = (N - 2)^{-1} \sum_{n=1}^{N-2} P(e_n),
\]

we will use a quantity

\[
\bar{v}_N = P(c_\infty) - Z\,(1 - \bar{\beta}^{\,N-2})(N - 2)^{-1}(1 - \bar{\beta})^{-1},
\tag{33}
\]

where now $P(c_\infty) = (\bar{\alpha} + \bar{\gamma}/2)/(1 - \bar{\beta})$ and $Z = P(c_\infty) - P(c_1)$. Estimation of $\alpha_k$, $\beta_k$, and $\gamma_k$ would of course be made in the usual way using transition counts appropriate to Matrix (29). Thus, for example, if there are $n_{c \cdot k}$ transitions from $c_n \cap I_{kn}$, and $n_{cek}$ of them are to $e_{n+1}$, the maximum likelihood estimate of $\gamma_k$ would be $\hat{\gamma}_k = 2 n_{cek}/n_{c \cdot k}$ (ignoring the problem of constraints, for which the treatment is the same as in Section II,D).
D. EXPERIMENT IV: WITHIN-S VARIATION IN THE ITI DURATION
1. Method
The Ss were 25 4- and 5-year-old preschool children. Each was taken to an experimental room and seated at a table opposite E. On the table was a stack of 102 4 x 6-inch file cards concealed behind a small black box which could hold the entire stack. Centered on each card was a 1.5 x 2.0-inch rectangular patch of red or green tape. The colors in the stack alternated (RGRG . . . or GRGR . . .). The S was shown the first two cards in the stack, one after the other, after being asked to name the colors on the two cards. Following correct naming of the two colors, a 6-V buzzer was sounded briefly and the child was told that each time he heard the buzzer he was to guess quickly the next color in the stack. On each of 100 trials, following each buzz, S made his prediction, E removed the top card from the stack, turned it color side up in front of S for about 1 second, and then placed it color side down in the box. The buzzer sounded for .3 second every 8 seconds, except when the E depressed a foot switch which opened the buzzer circuit. If this occurred, the interbuzz interval was increased from 7.7 seconds to 15.7 seconds. A different random sequence of long and short intervals was used with each child; thus, approximately half the intervals were long and half short.
2. Results
Since there are two ITI’s and they are presented according to a random equiprobability schedule ($t_{ik} = .5$, all $i$ and $k$), the sequence of correct responses and errors is theoretically a sample from a two-state Markov chain with stationary transition probabilities. Therefore, all of the model analyses appropriate to the data in the previous experiments are also appropriate in this experiment. Table XVII presents the observed and predicted mean performance curve in blocks of 20 trials, Table XVIII presents the observed and predicted relative frequencies of the 3-tuples, and Table XIX the observed and predicted error runs and $C_k$ statistics. The only suggestion of badness of fit in these tables is the perhaps greater than expected

TABLE XVII
MEAN PERFORMANCE CURVE IN 20-TRIAL BLOCKS
Block        1      2      3      4      5
Obs.       .796   .788   .788   .754   .712
Pred.Ss    .751   .758   .758   .758   .758
Pred.G     .758   .770   .773   .770   .770
TABLE XVIII
OBSERVED AND PREDICTED RELATIVE FREQUENCY OF 3-TUPLES

3-tuple     ccc    cce    cec    cee    ecc    ece    eec    eee
Obs.       .530   .107   .085   .046   .107   .026   .046   .053
Pred.Ss    .523   .109   .080   .048   .109   .026   .051   .055
Pred.G     .529   .103   .076   .056   .111   .023   .057   .042
TABLE XIX
OBSERVED AND PREDICTED MEAN VALUES OF r_j, THE NUMBER OF ERROR RUNS OF LENGTH j, AND C_k, THE NUMBER OF JOINT OCCURRENCES OF TWO CORRECT RESPONSES k TRIALS APART

r_j     Pred.G    Pred.Ss    Obs.
r_1      7.78      8.28      8.88
r_2      3.28      2.67      2.00
r_3      1.38      1.07      1.40
r_4       .58       .50       .68
r_5       .25       .26       .12

C_k     Pred.G    Pred.Ss    Obs.
C_1     62.90     63.17     63.00
C_2     58.99     60.25     60.32
C_3     57.56     58.82     58.12
C_4     56.76     57.87     57.96
C_5     56.11     57.11     56.84
downward slope to the observed performance curve. The trend suggests a possible drift in the parameter values, probably in the probability of an encoding response occurring. The drift is not large over the 100 trials and actually appears to be nonexistent during the first 70 or 80 trials. It does not hamper the ability of the model to describe well the detailed structure of the data.
To treat the effects of the ITI duration we must estimate the transition probabilities for the long (L) and short (S) ITI’s. For the long ITI the theoretical matrix is

\[
\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap L & 1 - \gamma_L/2 & \gamma_L/2 \\
e_n \cap L & \alpha_L + \gamma_L/2 & \beta_L + \gamma_L/2
\end{array}
\tag{34}
\]

with the corresponding matrix for the short ITI obtained by replacing the subscript L by S. The maximum likelihood estimates for the group were

\[
\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap L & .734 & .266 \\
e_n \cap L & \ldots & \ldots
\end{array}
\qquad\text{and}\qquad
\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap S & .926 & .074 \\
e_n \cap S & \ldots & \ldots
\end{array}
\tag{35}
\]

Using theoretical Matrix (34), we can estimate the group mean probability of a $t_0$ being passed to the generator after a long ITI by the quantity $2p_{ceL} = 2(.266) = .532$ and the corresponding value for a short ITI by $2p_{ceS} = 2(.074) = .148$. Thus, the probability of a guess following a long ITI is about 3½ times the guessing probability following a short ITI, as was predicted. To further evaluate the ability of the model to describe the effects of the variation in ITI duration, the observed and predicted values of the 32 possible 3,2-tuples were obtained and are shown in Table XX. The agreement of observed with predicted is excellent.
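The arithmetic behind the guessing-probability comparison is short enough to check directly. In this sketch the two correct-to-error transition estimates are the values quoted in the text; the function name `guess_probability` is a hypothetical label, and the doubling rests on the assumption that the correct-to-error entry of the matrix equals $\gamma/2$.

```python
def guess_probability(p_ce):
    """Estimated probability that a null trace reaches the generator.

    Assumes P(error | correct) = gamma / 2, so gamma is twice that entry.
    """
    return 2 * p_ce


gamma_long = guess_probability(0.266)    # estimate after a long ITI
gamma_short = guess_probability(0.074)   # estimate after a short ITI
ratio = gamma_long / gamma_short         # how much more guessing after a long ITI
```

The ratio comes out a little above 3.5, in line with the statement that guessing is several times more likely after the long interval.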
E. DISCUSSION
A remark concerning the effects of ITI duration in Experiment III is needed. It seems likely that in the 3-second condition, the pacing of the trials is so rapid that there are attentional effects as well as memory effects. With an 8- or 10-second ITI, there is time for the child to reorient, become involved with other matters between predictions. With a 3-second ITI, the child is captured by the sequence of events and prevented from disorienting. For this reason, children in the 3-second condition are probably attending to almost every event. We know almost nothing about the effects of the trial pace on children’s learning, but these results suggest that control of attention could be
TABLE XX
OBSERVED AND PREDICTED RELATIVE FREQUENCY OF THE 3,2-TUPLES

3,2-Tuple     Obs.    Pred.Ss   Pred.G
c S c S c     .156     .166      .164
c S c L c     .123     .133      .130
c S c S e     .016     .012      .013
c S c L e     .054     .045      .047
c S e S c     .009     .009      .008
c S e L c     .011     .009      .008
c S e S e     .004     .005      .006
c S e L e     .004     .005      .006
e S c S c     .025     .030      .030
e S c L c     .027     .022      .024
e S c S e     .003     .003      .002
e S c L e     .011     .010      .009
e S e S c     .011     .011      .014
e S e L c     .011     .013      .015
e S e S e     .016     .015      .012
e S e L e     .014     .013      .011
c L c S c     .137     .133      .130
c L c L c     .114     .108      .103
c L c S e     .006     .010      .010
c L c L e     .031     .034      .037
c L e S c     .032     .030      .028
c L e L c     .033     .029      .030
c L e S e     .018     .019      .023
c L e L e     .021     .020      .021
e L c S c     .030     .030      .032
e L c L c     .025     .023      .025
e L c S e     .004     .003      .003
e L c L e     .009     .009      .009
e L e S c     .013     .013      .013
e L e L c     .011     .013      .014
e L e S e     .014     .013      .011
e L e L e     .008     .013      .010
maintained by a rapid pacing. This might help us understand why young children in the laboratory seem to have attentional limitations they do not seem to have outside the laboratory (except perhaps in classrooms, which also, for the individual child, have much blank time). Even irrelevant filler tasks which do little but fill time (although consideration of their novelty effects would also have to be given) might be very helpful. (An excellent place to see this effect is at parades where, for the young child, even the uninteresting dignitaries in cars seem to help span the interclown or interfloat intervals in preference to just the blank time of the strung-out parade.)
The results of Experiment IV support the general theoretical point of view taken here. The support is at the level of predicted directional experimental effects and predicted directional differences in parameter values rather than simply the goodness of fit of a model with parameters free for estimation. The results justify the introduction of a second memory axiom giving a formal statement of the assumption concerning the effects of the ITI duration; however, that axiom will not be given here. Nor will any of the other additional assumptions to be presented later be given the same formal treatment as was presented in Section II, although there are no serious problems that would prevent this.
There is another interesting piece of evidence which should be mentioned at this point. In the original design of Experiment IV, it was planned that for half of the subjects the event card would not be exposed for only 1 second, but instead would remain exposed until after the subject made his next prediction and it was covered by the next event card. Thus, information as to what event occurred on the previous trial would always be available to the child. The child is thereby provided with a distraction-free memory which is not subject to the effects of his responses, provided that he looks at the exposed card when he receives the cue to make his next prediction. Six additional subjects were randomly assigned to this condition at the beginning of Experiment IV. The condition was then terminated after these six because the subjects were making so few errors that it was felt more information would be obtained if the limited number of subjects were used in the 1-second exposure of the card condition. The largest number of errors made by any of the six was four in the 100 trials. The mean error probability for the six subjects was .023 as compared with a mean error probability of .232 for the 25 subjects in the 1-second exposure condition. Thus performance in the prediction task can be raised from a level of less than 80% correct to practically perfect performance by supplementing the child’s distractible memory with a distraction-free memory that is not subject to effects of the child’s own previous response.
V. Extension to Interpolated Events
A. INTERFERENCE EFFECTS
Experiment IV demonstrated that performance dependent upon the presence of the response and event traces is degraded by the lengthening of a temporal interval during which one of those traces must be held in memory. This does not necessitate a “fading-trace” hypothesis, since the possibility of distracting events causing replacement of the trace in the memory also exists. The experiment to be considered in Section V,C will show that such interference effects can be produced experimentally. This will not rule out the possibility of trace decay occurring, but will at least demonstrate occurrence and manipulability of interference effects in the alternation prediction task. A natural extension of the model based on the theoretical ideas presented above will be used to demonstrate the power of the present theoretical position to describe the effects of this experimental manipulation.
The memory store subsystem that has been used above is a one-slot memory. That is, only one trace is stored at a time. The theoretical meaning of interference is the displacement of one trace by another. Storage of an event trace, for example, must interfere with use of the response trace, since if the $t_e$ is stored, the $t_r$ is displaced with probability 1.0. Likewise, entry of a $t_0$ into the memory during the ITI displaces whichever trace is stored there and thereby produces complete interference with it. We wish now to consider experimental production of interference effects by attempted manipulation of the contents of the memory store. The manipulation to be considered is one in which the subject is required to respond to some stimulus during the ITI. The presumed probabilistic consequence of this manipulation is the entry into the memory store of a trace $t_i$ produced by the encoding of the interpolated event. We shall be specifically interested in an overt naming response that the subject makes to a stimulus presented during the ITI.
We shall require a categorization of the types of stimuli that can be presented and the types of $t_i$’s that can be the consequences of naming such stimuli. Remembering that we are now still within the context of the alternation prediction task, it will be convenient to introduce a twofold categorization of the possible interpolation events. To do this we first note that the generator determines a partition of the set of possible traces. Recalling that $A_1$ and $A_2$ denote the two possible PR’s, $t_1^{(1)}$ and $t_2^{(1)}$ are the traces produced, respectively, by $A_1$ and $A_2$, $t_1^{(2)}$ and $t_2^{(2)}$ are the traces produced by encoding of $E_1$ and $E_2$, respectively, and $t_0^{(0)}$ is the null trace, we now introduce $t_i = \{t_0^{(3)}, t_1^{(3)}, t_2^{(3)}\}$, the set of possible traces produced by naming the interpolated event. We see that the generator rules partition the set $\{t_0^{(0)}, t_1^{(1)}, t_2^{(1)}, t_1^{(2)}, t_2^{(2)}, t_0^{(3)}, t_1^{(3)}, t_2^{(3)}\}$ into the sets $t_1 = \{t_1^{(1)}, t_1^{(2)}, t_1^{(3)}\}$, $t_2 = \{t_2^{(1)}, t_2^{(2)}, t_2^{(3)}\}$, and $t_0 = \{t_0^{(0)}, t_0^{(3)}\}$. The set $t_1$ is the set of traces each of which results in the PR $A_1$; the elements of $t_2$ all result in the PR $A_2$; and the elements of $t_0$ result in a guess.
We can now introduce the twofold categorization. An interpolated event is first categorized as relevant or irrelevant. Any interpolated event the naming of which can result in entry of a $t_0^{(3)}$ in the memory store is an irrelevant (I) event. An event that can result in entry of a $t_1^{(3)}$ or a $t_2^{(3)}$ is a relevant event. Relevant events will be categorized as either same (S) or different (D) according to the following. On any given trial, a relevant interpolated event is an S event if its encoding would enter the trace $t_i^{(3)}$, if the prediction made on that trial resulted in entry of $t_j^{(1)}$, and if $i = j$. Thus, for any $k$ ($k = 1, 2$), if the event is an S event, then $t_j^{(1)} \in t_k$ implies $t_i^{(3)} \in t_k$. Encoding the interpolated event enters into the memory a trace that will cause the generator to produce the same prediction that would have been produced if the response trace on that trial had been transferred to the generator. A relevant event that is not an S event is a D event. Thus, the trace entered by encoding a D event will cause the generator to produce the prediction $A_i$ such that, if $A_j$ is the prediction that would have been produced if the response trace on that trial had been transferred to the generator, then $i \neq j$.
To summarize: Encoding of irrelevant events enters the equivalent of a null trace into the memory; encoding of an S event enters the equivalent of the response trace entered on that trial; encoding of a D event enters a trace equivalent to that which would have been entered if the response that did not occur on that trial had occurred.
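The S/D/I categorization amounts to a small decision rule. The sketch below is only an illustration: it codes the two predictable events as 1 and 2, treats any other named stimulus as irrelevant, and uses a hypothetical function name, `classify_interpolation`, that does not appear in the original.

```python
def classify_interpolation(predicted, named, relevant=(1, 2)):
    """Classify an interpolated event as "S", "D", or "I".

    predicted: index j of the prediction A_j made on this trial
    named:     label of the stimulus named during the ITI
    relevant:  labels whose naming enters a non-null trace (assumed coding)
    """
    if named not in relevant:
        return "I"            # irrelevant: equivalent to a null trace
    if named == predicted:
        return "S"            # same: equivalent to the response trace
    return "D"                # different: equivalent to the other response


examples = [
    classify_interpolation(predicted=1, named=1),
    classify_interpolation(predicted=1, named=2),
    classify_interpolation(predicted=2, named="other"),
]
```

Note that the category depends on the child's own prediction on that trial, not on whether the prediction was correct.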
B. A MODEL FOR THE EFFECTS OF INTERPOLATED EVENTS
The model to be considered here is for the case of noncontingent interpolations, i.e., where the probabilities of S, D, and I events remain constant from trial to trial. It will be assumed that on each trial the probability of the interpolated event being encoded and transferred to the generator is a constant $\delta$. Thus, $\alpha$, $\beta$, $\gamma$, and $\delta$ are the probabilities that $t_e$, $t_r$, $t_0$, and $t_i$, respectively, are transferred to the generator on a given trial. Since these are mutually exclusive, exhaustive events, $\alpha + \beta + \gamma + \delta = 1.0$. The transition matrix

\[
\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap S & 1 - \gamma/2 & \gamma/2 \\
e_n \cap S & \alpha + \gamma/2 & \beta + \delta + \gamma/2
\end{array}
\tag{36}
\]

gives the transition probabilities when an S event occurs during the ITI between trial $n$ and trial $n + 1$. Since the effect of encoding an S event is the same as retaining the response trace, we can arrive at Matrix (36) by simply adding, in each row, a $\delta$ to the column containing the term $\beta$ in the matrix appropriate to alternation prediction without interpolations, Matrix (5). Similarly, for D events we add, in Matrix (5), $\delta$ in each row to the column that does not contain $\beta$, since the effect of encoding a D event is opposite to that of retaining the response trace. This gives

\[
\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap D & 1 - \delta - \gamma/2 & \delta + \gamma/2 \\
e_n \cap D & \alpha + \delta + \gamma/2 & \beta + \gamma/2
\end{array}
\tag{37}
\]

Finally, for I events, encoding of which has the same effect as entry of a null trace, we simply replace $\gamma$ by $\delta + \gamma$ everywhere in Matrix (5), giving

\[
\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap I & 1 - (\delta + \gamma)/2 & (\delta + \gamma)/2 \\
e_n \cap I & \alpha + (\delta + \gamma)/2 & \beta + (\delta + \gamma)/2
\end{array}
\tag{38}
\]

Since the sequence of interpolated events is noncontingent, the result of Section IV,C applies, giving that the sequence of correct responses and errors will be a two-state Markov chain with stationary transition probabilities. The transition probabilities will be weighted averages of the entries in corresponding positions of Matrices (36), (37), and (38), with the weights equal to the probabilities of S, D, and I events occurring. Thus, the data can be analyzed as if the interpolated events manipulation had not been introduced, as was done in Experiment IV with the variable ITI duration. To test the effects of the interpolations, an approximate $\chi^2$ test can be applied to test the goodness of fit of the matrix

\[
\begin{array}{c|cc}
 & c & e \\ \hline
c_n \cap S & 1 - \gamma/2 & \gamma/2 \\
c_n \cap D & 1 - \delta - \gamma/2 & \delta + \gamma/2 \\
c_n \cap I & 1 - (\delta + \gamma)/2 & (\delta + \gamma)/2 \\
e_n \cap S & \alpha + \gamma/2 & \beta + \delta + \gamma/2 \\
e_n \cap D & \alpha + \delta + \gamma/2 & \beta + \gamma/2 \\
e_n \cap I & \alpha + (\delta + \gamma)/2 & \beta + (\delta + \gamma)/2
\end{array}
\tag{39}
\]

to the corresponding table of observed transition relative frequencies. For large $n$, the standard $\chi^2$ statistic will be distributed as $\chi^2$ with 6 - 3 = 3 df (since three parameters are estimated) under the hypothesis that the data are a sample from the model.
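The construction of the six rows from the four transfer probabilities can be sketched as follows. This is an illustration under assumptions, not the author's code: the base rows are taken to be $[\alpha + \beta + \gamma/2,\ \gamma/2]$ after a correct response and $[\alpha + \gamma/2,\ \beta + \gamma/2]$ after an error, the parameter values are hypothetical, and the function names are invented for the example.

```python
def interpolation_rows(alpha, beta, gamma, delta):
    """Rows (P(correct next), P(error next)) for each (response, event type)."""
    if abs(alpha + beta + gamma + delta - 1.0) > 1e-9:
        raise ValueError("alpha + beta + gamma + delta must equal 1")
    g = gamma / 2
    return {
        # S: delta goes to the column that contains beta
        ("c", "S"): (alpha + beta + delta + g, g),
        ("e", "S"): (alpha + g, beta + delta + g),
        # D: delta goes to the column that does not contain beta
        ("c", "D"): (alpha + beta + g, delta + g),
        ("e", "D"): (alpha + delta + g, beta + g),
        # I: gamma is replaced by gamma + delta, i.e. delta is split evenly
        ("c", "I"): (alpha + beta + g + delta / 2, g + delta / 2),
        ("e", "I"): (alpha + g + delta / 2, beta + g + delta / 2),
    }


def averaged_chain(rows, weights):
    """Weighted average over event types for a noncontingent schedule."""
    return {
        state: tuple(
            sum(weights[kind] * rows[(state, kind)][col] for kind in weights)
            for col in range(2)
        )
        for state in ("c", "e")
    }


rows = interpolation_rows(0.3, 0.1, 0.4, 0.2)            # hypothetical values
chain = averaged_chain(rows, {"S": 0.25, "D": 0.25, "I": 0.5})
```

When S and D events are equally likely, the averaged chain coincides with the I-event rows, because the two placements of $\delta$ average to an even split.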
C. EXPERIMENT V: THE INTERPOLATED EVENTS EXPERIMENT
1. Method
The subjects were 40 4- and 5-year-old children attending the University Preschool Laboratories of the University of Iowa. The stimuli were Animal Rummy cards (Whitman Publishing Company). Thirty-eight of the following animals were used: dog, owl, squirrel, kitty, goose, turtle, lamb, mouse, fox, bunny, and chick. For each subject, four of these animals were randomly selected as stimuli and assigned as stimuli A, B, C, and D. Randomization was restricted such that both elements of the following pairs could not be included in the same set: goose and chick; kitty and bunny; squirrel and fox; dog and kitty. The first three of these restrictions were because the animal pictures in those pairs were considered too similar in appearance. The fourth restriction was included because of probable association value.
The child was escorted into the experimental room and seated at a table opposite the experimenter. The experimenter showed him an instance of each of the four stimulus cards, A, B, C, and D, asked him to name them, and then told him that they were the kinds of cards to be used in the game. The stimulus cards for a given subject were arranged in two decks: an alternating deck and an interpolation deck. The two decks were on the table, face down, a few inches apart and directly in front of the child. The alternating deck contained 10 A and B alternating cards. The interpolation deck contained, for each subject, a different randomly ordered sequence of 64 interpolations.
The game was explained to the child as follows. “This is the naming deck (the interpolation deck was touched) and this is the guessing deck (the alternation deck was touched). The guessing deck has only A’s and B’s in it. The first one is an A (the first card was turned face up). The next one is a B (the next card was turned over). The next one is an A (the same).” Several alternating A’s and B’s were shown this way. The child was asked to guess a few. When the experimenter felt confident that the child knew the rule, he tested him by asking “If I have just shown you an A from this deck, what will be next? And if I have just shown you a B, what comes next?” In all cases the children gave satisfactory answers to these questions and there was no need for more practice. Next, the experimenter explained the interpolation deck. “This is the naming deck. It has all four kinds of cards in it (the four initial examples were picked up and fanned in front of the subject). When I show you a card from this deck, you just tell me what it is, and when I say guess, you guess a card from this (alternation) deck.”
Each subject predicted an alternation sequence of 65 A and B stimulus cards. Following each card in this sequence except the last, he was shown a series of interpolation cards to be named aloud. The set of 32 interpolations following A cards was identical to the set following B cards, except for order. Of the 32, 16 were of length 1 and 16 were of length 3. Half of the interpolations of each length (8) contained relevant cards (A’s and B’s) and the other half contained irrelevant cards (C’s and D’s). Thus the length-1 interpolations in each set of 32 consisted of 4 A’s, 4 B’s, 4 C’s, and 4 D’s. The length-3 interpolations consisted of the eight possible AB triples (AAA, AAB, . . ., BBB) and the eight possible CD triples (CCC, CCD, . . ., DDD).
The sequence started with the child guessing the top card in the alternation deck. As soon as the child made his guess, the guessed card was turned over so he could see it and then placed face down on the bottom of the deck. The correct number of cards from the interpolation deck (one or three) was turned over one at a time and the child named each. The used cards from the interpolation deck were placed face down in back of the deck. The experimenter attempted to show the cards at a 2-second rate; however, there was much variability because of response latency variability.
2. Results
Because the S, D, and I interpolation events occur on any given trial with respective noncontingent probabilities of 1/4, 1/4, and 1/2, the sequence of correct responses and errors is, according to the model, a sample from the same type of stochastic process assumed to have generated the data in the previous experiments. Therefore, the usual analyses may be again performed, this time as if the interpolation procedure had not occurred. The results of these analyses are shown in Table XXI, which gives the observed and predicted values for the mean performance curve in blocks of 13 trials, the 3-tuple analysis, the error runs analysis, and the $C_k$ analysis. It can be seen that the model again describes in detail the statistical properties of the data. The most obvious difference between these results and those for the first four experiments is the relatively low level of performance in this experiment. As the model predicts, the mean performance curve is flat throughout the 65 trials, but the probability of a correct response is now at about .66, whereas in the previous experiments the performance level was near .80 or above. This result is not unexpected because the introduction of the interpolation schedule used in this experiment should have a net effect of degrading the overall performance level.
TABLE XXI
STATISTICAL ANALYSES FOR EXPERIMENT V, IGNORING THE INTERPOLATION MANIPULATION

Mean performance curve in 13-trial blocks
Block        1      2      3      4      5
Obs.       .685   .667   .638   .669   .665
Pred.Ss    .677   .666   .666   .666   .666
Pred.G     .673   .663   .663   .663   .663

Relative frequency of 3-tuples
3-Tuple     ccc    cce    cec    cee    ecc    ece    eec    eee
Obs.       .337   .134   .119   .075   .131   .062   .074   .068
Pred.Ss    .356   .126   .110   .077   .125   .061   .076   .070
Pred.G     .332   .138   .113   .082   .137   .057   .082   .060

Mean numbers of runs of errors of lengths 1-5
Run length    1      2      3      4      5
Obs.        7.95   2.52   1.30    .40    .28
Pred.Ss     7.25   2.74   1.15    .52    .25
Pred.G      7.43   3.09   1.29    .54    .22

Mean number of joint occurrences of correct responses separated by K trials: K = 1-5
K             1       2       3       4       5
Obs.        30.08   28.72   27.82   27.88   27.15
Pred.Ss     30.83   29.35   28.66   28.13   27.65
Pred.G      30.07   28.01   27.37   26.90   26.46

Preliminary analyses of the effects of the interpolation procedure focused on the difference between the naming of one and naming of three interpolated stimulus cards. Study of the responses following different triples such as SSS versus DDD revealed no simple or obvious differences that could not be equally well accounted for by using just the last card of the triple to categorize the interpolation. The effects of the interpolation were, therefore, analyzed as if on each interpolation of three cards, only the last card of the triple had been interpolated. Table XXII shows the observed and predicted transition probabilities corresponding to the theoretical Matrix (39). Maximum likelihood estimates of $\alpha$, $\beta$, $\gamma$, and $\delta$ were found to be .286, .119, .385, and .210 for the group. Thus, the children were using $t_e$’s about 30% of the time, $t_r$’s about 10%, guessing about 40%, and using a memory trace of the interpolated event about 20% of the time. A $\chi^2$ test of the hypothesis that the observed transition relative frequencies are compatible with the theoretical Matrix (39) yielded a nonsignificant $\chi^2$ of 4.61 on 3 df, which, considering the large N’s upon which the estimates are based, indicates a remarkable agreement of the data with the model.
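The goodness-of-fit comparison can be retraced from the rounded figures in Table XXII. The sketch below is a reconstruction under assumptions rather than the original computation: predicted proportions are built from the reported group estimates using the row construction described for S, D, and I events, the statistic is the ordinary Pearson sum over both columns of each row, and the variable names are hypothetical. Because the inputs are rounded to three decimals, the result lands near, but not exactly on, the reported 4.61.

```python
# Reported group estimates (alpha, beta, gamma, delta).
alpha, beta, gamma, delta = 0.286, 0.119, 0.385, 0.210
g = gamma / 2

# Predicted probability of a correct response for each row of Matrix (39).
pred_c = {
    "C and S": alpha + beta + delta + g,
    "C and D": alpha + beta + g,
    "C and I": alpha + beta + g + delta / 2,
    "E and S": alpha + g,
    "E and D": alpha + delta + g,
    "E and I": alpha + g + delta / 2,
}

# Observed proportion correct and number of transitions, from Table XXII.
obs_c = {"C and S": 0.808, "C and D": 0.607, "C and I": 0.698,
         "E and S": 0.435, "E and D": 0.663, "E and I": 0.618}
n = {"C and S": 438, "C and D": 420, "C and I": 843,
     "E and S": 223, "E and D": 199, "E and I": 437}

# Pearson chi-square; for a two-cell row the two terms combine into
# N (o - p)^2 / (p (1 - p)).
chi2 = sum(
    n[k] * (obs_c[k] - pred_c[k]) ** 2 / (pred_c[k] * (1 - pred_c[k]))
    for k in n
)
```

Most of the statistic comes from the three rows following an error, where the counts are smallest and the observed and predicted proportions differ most.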
D. DISCUSSION
Experiment V demonstrates the susceptibility of the memory store to information intrusions produced by the encoding of the interpolated events as a result of the overt naming response. The effects of these naming responses are exactly those that were suggested during the discussion of the data of the single subject in Section I,B. The naming of a stimulus event tends to enter a trace of that event into the memory, increasing the probability that the prediction of the next response will be based upon the informational properties of the named event in relationship to the rules of the generator.

TABLE XXII
ANALYSIS OF THE INTERPOLATION EFFECT

             Group predicted     Group observed
               C       E           C       E        N
C and S      .808    .192        .808    .192      438
C and D      .597    .403        .607    .393      420
C and I      .702    .298        .698    .302      843
E and S      .478    .522        .435    .565      223
E and D      .688    .312        .663    .337      199
E and I      .583    .417        .618    .382      437
The fact that $t_i$’s and $t_e$’s are being used by the generator suggests that there is imperfect tagging of the trace with respect to its source or, perhaps equivalently, imperfect discrimination of the tags by the generator. The child uses the traces, at least to a certain extent, interchangeably. On the other hand, the fact that even though the child must name the interpolated events, the probability of using a $t_i$ is still only about .20, may indicate either a resistance to encoding the interpolation events or rejection of them at some later stage of processing. These are questions that require subtle experimental study, and this sort of discussion is at most conjecture at this point. An attempt to train children to tag the traces should be one fruitful avenue of attack on the problem and would give some indication of the child’s capacity to sort information in memory.
VI. Extension to Markov Event Sequences
A. THE IMPLICATION OF A ONE-TRIAL MEMORY FOR PREDICTION OF MORE COMPLEX EVENT SEQUENCES
The theory proposed so far states that in predicting an alternation sequence the child uses stored information from just the previous trial. The short-term store is the locus of trial-to-trial effects, while any long-term effects are located in the set of generator rules. One implication of these assumptions is that as long as the generator rules stay fixed, the long-term properties of the event sequence are of no consequence. The subject’s response is as predictable as it can be solely on the basis of what his last response was and which event occurred.
The assumption of a one-trial memory is very strong. We know that 4- and 5-year-old children can predict a double alternation sequence (AABBAABB . . .). This fact strongly suggests, but, contrary to popular belief (e.g., Restle, 1961), does not require a two-trial memory (a memory with two slots). Memory depth (number of slots) can be replaced by response-trace diversification together with an expanded set of rules. For example, in predicting the double alternation sequence red-red-green-green- . . ., the child may actually output two different responses for each type of event. Responding with the iambs red-RED, green-GREEN, or the trochees RED-red, GREEN-green, differences in stress or volume may result in four response traces rather than just two. Then, for the iamb system, for example, the rules r → R, R → g, g → G, and G → r would permit double alternation behavior using only one memory trace. It is of interest, then, to see how far the assumption of a one-trial memory can be carried reasonably. Also, even if memory effects from two or more trials back do occur, as they certainly do with adults, in young children these effects may be slight, so that a one-trial memory model may capture the bulk of the truth.
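The iamb example can be run as a tiny one-slot system. This is a minimal sketch with hypothetical names (`RULES`, `COLOR`, `predict_sequence`): lower case stands for the unstressed response and upper case for the stressed one, and the only thing carried from trial to trial is the single most recent response trace.

```python
# Generator rules for the iamb system: trace of last response -> next response.
RULES = {"r": "R", "R": "g", "g": "G", "G": "r"}

# Both stress variants of a response name the same color.
COLOR = {"r": "red", "R": "red", "g": "green", "G": "green"}


def predict_sequence(first_response, n_trials):
    """Colors predicted over n_trials using one stored trace and RULES."""
    trace = first_response
    colors = []
    for _ in range(n_trials):
        colors.append(COLOR[trace])
        trace = RULES[trace]      # the one-slot memory is overwritten
    return colors


sequence = predict_sequence("r", 8)
```

Starting from the unstressed "red", the predicted colors run red, red, green, green and then repeat, so double alternation is produced without any second memory slot.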
The fact that in Experiment V the error probability was about .33 and yet the children did not abandon the alternation rule indicates that generator rules are rather resistant to change in the standard prediction task. Prediction errors do not seem to cause the child to abandon a rule very quickly (although this is perhaps not true during the first few trials, when rule selection often takes place). This, in turn, suggests that even if the sequence did not alternate on every trial, but just alternated fairly often, the alternation rule might be held. In this case, prediction of a sequence that alternated imperfectly would still depend only on the response and event of the previous trial. Incorporation into the model of the irregular nature of the event sequence should permit data analysis that would test the basic theoretical ideas in such an extension of the alternation prediction task. The model to be given in
Richard S . Bogartz
Section VI,C will do this and will be applied in Section VI,D to data that were obtained from children predicting an event sequence which alternated on a random 75% of the trials. This constitutes one extension of the model to the so-called probability learning task (“so-called” because from the present point of view the subjects do not learn probabilities). In the following section, a new, alternative response axiom is introduced to permit application of the basic ideas to prediction of sequences that tend to repeat rather than alternate.
B. THE USE OF A REPETITION RULE
Thus far we have considered models in which Response Axiom R1, the consequences of which are shown in Fig. 3, has played a part because the tasks have all involved prediction of an alternation sequence. According to this axiom, the child is using an alternation rule with event and response traces. We now wish to introduce a different rule, a repetition rule, because we also wish to treat data obtained from children who were predicting an event sequence in which the events alternated only 25% of the time, i.e., an event followed itself 75% of the time. Whereas the alternation rule causes the trace of one response to lead to the occurrence of the other response and the trace of one event to lead to the prediction of the other event, the repetition rule is such that the trace of a response results in the occurrence of that same response on the next trial, i.e., a response perseveration or repetition, and the trace of an event results in the prediction that the same event will occur again. Thus, when a repetition rule is in use Response Axiom R1 (Section II) would be replaced by the following
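RESPONSE AXIOM R2. For every $C_{n-1}$ in $X$, every $i$ ($i = 1, 2$), every $f$ ($f = 0, 1, 2$), every $g$ ($g = 1, 2$ if $f \neq 0$; $g = 1$ if $f = 0$), and every $n$ ($n = 1, 2, \ldots$),
$$P(A_{i,n} \mid C_{n-1} \cap t_{fg,n}) = P(A_{i,n} \mid t_{fg,n}) = \begin{cases} 1 & \text{if } f = 1 \text{ or } 2 \text{ and } i = g, \\ 1/2 & \text{if } f = 0, \\ 0 & \text{otherwise.} \end{cases}$$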
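The contrast between the two rules can be made concrete with a small Python sketch. The coding used here is an assumption made for illustration: `f = 0` stands for "no usable trace" (the child guesses), `f = 1` or `2` for an available trace, and `g` for the alternative that the trace records. The function name `response_prob` is hypothetical.

```python
from fractions import Fraction


def response_prob(i, f, g, rule):
    """Probability of response A_i given a trace of type f recording alternative g.

    Assumed coding: f == 0 means no trace is available; f in (1, 2) means a
    trace is available; g in (1, 2) is the alternative held in the trace.
    """
    if f == 0:
        return Fraction(1, 2)  # no trace: the two responses are equally likely
    if rule == "repetition":
        return Fraction(int(i == g))  # respond with the traced alternative
    if rule == "alternation":
        return Fraction(int(i != g))  # respond with the other alternative
    raise ValueError(f"unknown rule: {rule}")
```

Under the repetition rule a trace of alternative 1 yields response 1 with probability 1, whereas under the alternation rule the same trace yields response 2 with probability 1; with no trace both rules give 1/2.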
C. TWO MODELS FOR PREDICTION OF MARKOV EVENT SEQUENCES
Two Markov event sequences (Bush & Mosteller, 1955, p. 124) are of interest here, and a model for prediction of each will be derived. For later notational convenience we shall let the symbol $\pi$ denote a probability that is always greater than 1/2. To define the two types of sequences we again use the notion of $C_n$, an $n$-cylinder set of the sample space $X$ defined in terms of at most the first $n$ trial outcomes. (The sample space $X$ for the two models to be considered here can be defined as was that
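defined in Section B). Then, a tending-to-alternate (TA) event sequence is one such that for any $C_n$,
$$P(E_{j,n+1} \mid C_n \cap E_{i,n}) = \begin{cases} \pi & \text{if } i \neq j, \\ 1 - \pi & \text{if } i = j, \end{cases}$$
and a tending-to-repeat (TR) event sequence is one such that for any $C_n$,
$$P(E_{j,n+1} \mid C_n \cap E_{i,n}) = \begin{cases} \pi & \text{if } i = j, \\ 1 - \pi & \text{if } i \neq j. \end{cases}$$
Thus, for example, a TA event sequence with $\pi = .75$ would have the transition probability matrix
$$\begin{pmatrix} .25 & .75 \\ .75 & .25 \end{pmatrix},$$
while the TR matrix for $\pi = .75$ would be
$$\begin{pmatrix} .75 & .25 \\ .25 & .75 \end{pmatrix}.$$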
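To make the two kinds of event sequence concrete, the following Python sketch builds the transition matrix for a given value of the probability and simulates a sequence from it. The function names, the 0/1 coding of the two events, and the seeded random generator are assumptions made for illustration only.

```python
import random


def transition_matrix(pi, kind):
    """Row = current event, column = next event; kind is 'TA' or 'TR'."""
    stay = 1 - pi if kind == "TA" else pi  # probability that the event repeats
    return [[stay, 1 - stay], [1 - stay, stay]]


def simulate(pi, kind, n_trials, seed=0):
    """Simulate n_trials binary events (coded 0 and 1) from the chain."""
    rng = random.Random(seed)
    matrix = transition_matrix(pi, kind)
    event = rng.randrange(2)
    sequence = [event]
    for _ in range(n_trials - 1):
        event = 0 if rng.random() < matrix[event][0] else 1
        sequence.append(event)
    return sequence


def alternation_rate(sequence):
    """Proportion of trials on which the event differs from the previous one."""
    changes = sum(a != b for a, b in zip(sequence, sequence[1:]))
    return changes / (len(sequence) - 1)
```

With a probability of .75, a long simulated TA sequence should alternate on roughly three quarters of the trials and a TR sequence on roughly one quarter.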
Using Axiom R1 with TA event sequences and Axiom R2 with TR sequences, we obtain the two tree diagrams shown in Figs. 12 and 13. These summarize the models in the same way that Fig. 3 summarized the first model for prediction of the single alternation sequence. By arguments similar to those in Section II it can be shown that the sequences of correct responses and errors for both the TA and the TR models are samples from a two-state first-order Markov chain with stationary transition probabilities. Let $\phi(x) = \pi x + (1 - \pi)(1 - x)$. Then for both the TA model and the TR model, the transition matrix for correct responses and errors is
$$\begin{array}{c|cc} & c_{n+1} & e_{n+1} \\ \hline c_n & \phi(1 - \gamma/2) & \phi(\gamma/2) \\ e_n & \phi(\alpha + \gamma/2) & \phi(\beta + \gamma/2) \end{array}$$
Thus, for example, $P(c_{n+1} \mid c_n) = \pi(1 - \gamma/2) + (1 - \pi)(\gamma/2)$. To estimate $\alpha$, $\beta$, and $\gamma$, we first note that $\hat{\phi}(\alpha + \gamma/2) = n_{ec}/n_e$, $\hat{\phi}(\beta + \gamma/2) = n_{ee}/n_e$, and $\hat{\phi}(\gamma/2) = n_{ce}/n_c$ are maximum likelihood estimates. It is then easily shown that
$$\hat{\beta} = \frac{\hat{\phi}(\beta + \gamma/2) - \hat{\phi}(\gamma/2)}{2\pi - 1} \tag{43}$$
are the maximum likelihood estimates of $\alpha$, $\beta$, and $\gamma$. If either Eq. (42) or (43) yields an estimate that falls outside the permissible values for probabilities, modified estimates must be used. These modified estimates are given in Table XXIII for the various possible violations that may arise.
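As a rough illustration of how such estimates could be computed from data, the Python sketch below counts transitions in a string of correct (`c`) and error (`e`) outcomes and inverts the function defined above as $\pi x + (1 - \pi)(1 - x)$. The line for beta follows Eq. (43). The lines for alpha and gamma are assumptions made here: they come from the same inversion under the further assumption that the three parameters sum to 1. The function names are hypothetical, and the Table XXIII corrections for out-of-range estimates are not implemented.

```python
def transition_counts(outcomes):
    """Count c->c, c->e, e->c, and e->e transitions in a string of 'c'/'e'."""
    counts = {"cc": 0, "ce": 0, "ec": 0, "ee": 0}
    for prev, nxt in zip(outcomes, outcomes[1:]):
        counts[prev + nxt] += 1
    return counts


def estimate(counts, pi):
    """Return (alpha, beta, gamma) estimates; requires pi > 1/2."""
    n_c = counts["cc"] + counts["ce"]  # transitions that start from a correct response
    n_e = counts["ec"] + counts["ee"]  # transitions that start from an error
    phi_alpha = counts["ec"] / n_e  # estimate of phi(alpha + gamma/2)
    phi_beta = counts["ee"] / n_e  # estimate of phi(beta + gamma/2)
    phi_gamma = counts["ce"] / n_c  # estimate of phi(gamma/2)
    scale = 2 * pi - 1
    beta = (phi_beta - phi_gamma) / scale  # Eq. (43)
    alpha = (phi_alpha - phi_gamma) / scale  # assumed analogue for alpha
    gamma = 2 * (phi_gamma - (1 - pi)) / scale  # assumed: direct inversion of phi
    return alpha, beta, gamma


example = estimate({"cc": 70, "ce": 30, "ec": 55, "ee": 45}, pi=0.75)
```

In this made-up example the counts were chosen so that the estimates come out near .5, .3, and .2, which sum to 1 as the assumption requires.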
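[Tree diagram; column headings: TRIAL n RESPONSE-EVENT PAIR, TRIAL n ENCODING RESPONSE.]
[Table XXIII. Modified estimates for out-of-range parameter estimates; rows not recoverable.]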
The most common “probability learning” task is that involving noncontingent binary events. In this case, the sequence of events $E_1$ and $E_2$ is a sequence of independent trials in which $E_1$ occurs with a probability $\pi \geq .5$ and $E_2$ occurs with probability $1 - \pi$. (There are other variations involving blank trials in which neither event occurs and double trials in which both occur, but these will not concern us.) The transition matrix for the event sequence is then
$$\begin{array}{c|cc} & E_{1,n+1} & E_{2,n+1} \\ \hline C_n \cap E_{1,n} & \pi & 1 - \pi \\ C_n \cap E_{2,n} & \pi & 1 - \pi \end{array}$$
for every $C_n$. The case in which $\pi = .5$ requires a separate model, so the first two models to be considered will apply only to the case $\pi > .5$.
B. INDIVIDUAL DIFFERENCES IN RULE SELECTION
In Section VI it was suggested that at least two rules, a repetition rule and an alternation rule, tend to be adopted by the children we have studied. As it happened, all of the children predicting the TR sequence in Experiment VI could be assumed to have been using a repetition rule and all of the children predicting the TA sequence could be assumed to have been using the alternation rule. This sort of uniformity of rule adoption is by no means typical. Bogartz (1965) found some children alternating even with an essentially noncontingent sequence with $\pi = .8$ (in which case the events tend to repeat on about 68% of the trials). Craig and Myers (1963) found, as we would expect, that kindergarten children alternated more to a sequence with $\pi = .6$ than to one with $\pi = .8$, the former having more event alternations than the latter. Also, of course, children can be expected to have initial preferences for rules they may use almost regardless of the probabilistic structure of the event sequence. The major consequence of such individual differences in rule adoption is that tests of the theory will have to incorporate the different rule possibilities. It would be a mistake, for example, to apply a repetition rule model to data from a group of subjects some of whom were using an alternation rule. In the absence of any a priori reason for believing a given subject was using a particular rule (such information could come from analysis of pretraining data, instructions given to the child concerning the use of rules, and so on), the logic of deciding which rule and, therefore, which model applies, seems to be that of parameter estimation, although of a somewhat unusual sort. Further comment on this will now be postponed until the two models of interest are presented.
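The figure of about 68% repetitions quoted above for a noncontingent sequence can be checked directly: with independent trials, an event repeats when the same event occurs twice in a row. The short Python sketch below, with the hypothetical helper name `repetition_rate`, computes this for both event probabilities mentioned in the paragraph.

```python
def repetition_rate(pi):
    """Probability that two successive independent events are the same."""
    return pi ** 2 + (1 - pi) ** 2


rate_80 = repetition_rate(0.8)  # more likely event has probability .8
rate_60 = repetition_rate(0.6)  # more likely event has probability .6
```

The first value is .68 and the second is .52, so the sequence with the lower event probability contains more event alternations (48% against 32%), in line with the comparison drawn above.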
C. TWO MODELS FOR PREDICTION OF NONCONTINGENT EVENT SEQUENCES WITH $\pi > .5$
To derive the two models of concern here, we continue to apply the same basic ideas as before concerning the processes governing the child’s predictions but incorporate the new set of event probabilities. This immediately gives the tree diagram shown in Fig. 14, assuming a repetition rule to be in effect. Analysis of this stochastic structure reveals that the sequence of correct responses and errors is not a first-order Markov chain as it has been in the models considered above but, in fact, the sequence of $A_1$ and $A_2$ responses is such a Markov chain with stationary transition probability matrix
It is of interest here to note that the use of a repetition rule with $\alpha = 1.0$ (use of only the event trace) is equivalent to the well-known win-stay, lose-switch strategy. The present formulation is more general
[Fig. 14. Tree diagram for the repetition rule; column headings: TRIAL n RESPONSE-EVENT PAIR, TRIAL n ENCODING RESPONSE, TRIAL n + 1 MEMORY TRACE, TRIAL n + 1 RESPONSE, TRIAL n + 1 EVENT.]