0 in Footnote 1) or that both effects are nil (b = d = 0). Alternatively, we could posit a compromise Computation-Pre-storage model in which absolute and consistent relative information is stored and inconsistent relative information is computed. But if anything, this latter model is more ad hoc than the one outlined above, and we stick to the former alternative in the discussions that follow.
Relative properties in memory
will be labeled short). However, inconsistent items will possess two such properties, marked with respect to the alternative reference points: for example, poinsettia will be listed as tall-for-flowers and short-for-average-objects. Both predicates will have to be checked in verifying inconsistent PA sentences, and a choice made between them, yielding slow and error-prone responses. With PN sentences, by contrast, only the superordinate reference point is considered, producing faster, error-free decisions. This modified Pre-storage model, like its Computation rival, can therefore account for the results of Figures 1 and 2, but at the expense of some ad hoc assumptions. In short, our findings are sufficient to reject both models in their original form. However, both can be revived by including the distinction between consistent and inconsistent items. To differentiate these modified theories, we need to explore variables other than syntactic form.
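The modified Pre-storage account can be sketched as a lookup over tagged properties. The sketch below is an illustration under stated assumptions, not the authors' implementation: `Stored`, `MEMORY`, `verify_pa`, and `verify_pn` are hypothetical names, the poinsettia entry follows the example in the text, and the violet entry is an invented consistent item. The point is structural: a PA sentence consults every stored predicate (two for an inconsistent item, which can disagree), while a PN sentence consults only the one tied to the named superordinate.

```python
from typing import NamedTuple


class Stored(NamedTuple):
    """One pre-stored relative property, tagged with its reference point."""
    adjective: str   # e.g. "tall" or "short"
    reference: str   # e.g. "flowers" (superordinate) or "objects"


# Hypothetical memory entries. Poinsettia follows the text's example of an
# inconsistent item; violet is an invented consistent item.
MEMORY = {
    "poinsettia": [Stored("tall", "flowers"), Stored("short", "objects")],
    "violet": [Stored("short", "flowers")],
}


def verify_pa(item, adjective):
    """PA sentence ("Poinsettias are tall"): no reference point is named,
    so every stored predicate is consulted. Returns the set of verdicts and
    the number of predicates checked; two verdicts signal a conflict that
    must be resolved by choosing between them. Assumes the query adjective
    is one of the stored antonyms (tall/short)."""
    entries = MEMORY[item]
    verdicts = {entry.adjective == adjective for entry in entries}
    return verdicts, len(entries)


def verify_pn(item, adjective, superordinate):
    """PN sentence ("Poinsettias are tall flowers"): only the predicate
    tagged with the named superordinate is consulted."""
    entries = [e for e in MEMORY[item] if e.reference == superordinate]
    verdicts = {entry.adjective == adjective for entry in entries}
    return verdicts, len(entries)


if __name__ == "__main__":
    print(verify_pa("poinsettia", "tall"))             # conflicting verdicts
    print(verify_pn("poinsettia", "tall", "flowers"))  # a single verdict
```

In this toy version the extra predicate check and the possible conflict stand in for the slower, error-prone PA responses to inconsistent items.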
Experiment 2
The main feature that distinguishes the Computation model from the Pre-storage model is its extra comparison step. Previous studies have identified factors that affect this step, and if we can show that these factors also affect verification of sentences with relative adjectives, we will have obtained some prima facie support for the Computation model. This is the strategy that we pursue in the experiments reported below, using symbolic distance as the critical factor. In Experiment 2 we look for evidence of this effect in ratings of the truth of relative sentences, while in Experiment 3 we use reaction time data.
Symbolic distance predictions
Symbolic distance is the subjective difference in the size of two objects (Moyer & Bayer, 1976), and in general, it determines the speed with which objects can be mentally compared, with greater distance producing faster comparison times. For example, subjects take less time to decide that a horse is larger than a rabbit than that a horse is larger than a deer (Banks & Flora, 1977; Holyoak, 1977; Jamieson & Petrusic, 1975; Moyer, 1973; and Paivio, 1975). The mechanism responsible for this effect is a matter of current dispute (see Banks, 1977, and Moyer & Dumais, 1978, for reviews), but in most of the theories proposed to date, the size values of the two objects are retrieved from the relevant concepts in semantic memory and are compared to determine which is larger. This process is clearly similar to that of the Computation model and suggests that we look for symbolic distance effects in relative sentences.
L. J. Rips and W. Turnbull
However, in this case, the critical distance will be between a subordinate item and its superordinate category, rather than between two coordinate categories. For example, consider the true PN sentences Airplanes are large vehicles and Trucks are large vehicles. According to the Computation model, these sentences are verified by comparing the size of an airplane or truck to that of a normal-sized vehicle. Assuming that the symbolic distance between airplane and vehicle is greater than that between truck and vehicle, it should be easier to confirm the first of the two sentences above. This symbolic distance prediction can be elaborated in view of the results of Experiment 1. In explaining the consistency findings of that experiment, we were led to assume that the Computation model performs two comparisons in verifying relative sentences. If this hypothesis is correct, we should also expect to find two symbolic distance effects: one of them will depend on the difference in size between the subject category and the superordinate reference point, while the other will depend on the difference between the subject category and the object reference point. To put this a bit more precisely, let $I$ denote the normal (subjective) size of instances in the subject category, $S$ the reference point for the immediate superordinate, and $O$ the reference point for objects. The difference $I - S$ then represents the amount by which instances of the subject category exceed the superordinate reference point, and $I - O$ represents the amount by which the instances exceed the object reference point. For a given subject category, we will call $I - S$ its “superordinate size” and $I - O$ its “object size”. In these terms, the Computation model predicts that the perceived truth of a sentence containing an adjective like big will increase as superordinate size increases.
For example, Airplanes are big (vehicles) should be given higher truth ratings than Trucks are big (vehicles), since airplanes are bigger vehicles than trucks. In the same vein, the model predicts that rated truth will increase with increasing object size. For although both airplanes and elephants are large members of their respective categories, Airplanes are big (vehicles) should receive higher ratings than Elephants are big (animals), since airplanes are bigger than elephants. These effects, however, may depend on whether or not the superordinate is specified. When the superordinates vehicle and animal are mentioned, they indicate that the truth of the sentence should be judged relative to these categories. So while the effect of superordinate size may be greater for predicate-noun than predicate-adjective constructions, the effect of object size should be greater for the predicate adjectives. Of course, all of these predictions are peculiar to the Computation model. Since the Pre-storage model has no comparison stage, it does not predict variations in ratings with changes in distance.
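The two-comparison predictions can be made concrete with a small scoring function. This is a minimal sketch under explicit assumptions: `predicted_truth` is a hypothetical name, the graded truth score is taken to be a weighted sum of superordinate size ($I - S$) and object size ($I - O$), and both the weights and the subjective sizes below are invented for illustration rather than estimated from the study. Only the directions of the effects come from the text.

```python
def predicted_truth(instance, super_ref, object_ref, construction):
    """Graded truth score for "I's are big" (PA) or "I's are big S's" (PN)
    under a two-comparison reading of the Computation model.

    instance:   normal subjective size of the subject category (I)
    super_ref:  reference point of the immediate superordinate (S)
    object_ref: reference point for everyday objects (O)
    Higher scores stand for higher predicted truth ratings.
    """
    superordinate_size = instance - super_ref   # I - S
    object_size = instance - object_ref         # I - O
    # Illustrative weights only: PN leans on the superordinate comparison,
    # PA gives relatively more weight to the object comparison.
    if construction == "PN":
        w_sup, w_obj = 0.8, 0.2
    elif construction == "PA":
        w_sup, w_obj = 0.4, 0.6
    else:
        raise ValueError("construction must be 'PA' or 'PN'")
    return w_sup * superordinate_size + w_obj * object_size


# Invented subjective sizes on an arbitrary scale (not data from the study).
OBJECT_REF = 4.0
airplane = predicted_truth(9.0, super_ref=5.0, object_ref=OBJECT_REF, construction="PA")
truck = predicted_truth(7.0, super_ref=5.0, object_ref=OBJECT_REF, construction="PA")
elephant = predicted_truth(7.0, super_ref=3.0, object_ref=OBJECT_REF, construction="PA")
```

Here airplanes outscore trucks because of a larger superordinate size, and outscore elephants (equal superordinate size) because of a larger object size. With the PN weights the airplane-elephant gap shrinks, mirroring the expectation that object size matters more for predicate adjectives.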
Method
We began with 172 of the items from Experiment 1 for which there had been good agreement about immediate superordinates (one noun from the original set was inadvertently omitted). For each of these items, separate groups of subjects were asked to provide ratings of the following variables: (a) the size of the items with respect to an average member of their immediate superordinate (e.g., the size of apples with respect to the average fruit); (b) the size of the items with respect to average objects; (c) the truth of PA sentences of the form I's are big (e.g., Apples are big); (d) the truth of PN sentences of the form I's are big S's (e.g., Apples are big fruits). The first two of these measures were used to determine symbolic distance; the truth ratings in (c) and (d) serve as dependent variables. For the size ratings of Task (a), we followed the procedure described in Experiment 1 (see the section entitled Rating Task). The procedure for Task (b) was somewhat similar; however, in this case, subjects received a dittoed set of instructions together with a computer-generated list of noun-adjective pairs (e.g., apple-big). The instructions asked subjects to compare each item to an object of average size with respect to the indicated property. The subject responded on an 11-point scale, with 0 designated much less than average, 5 average, and 10 much more than average. All of the 172 nouns were paired with the adjective big, but a number of these nouns were repeated with other relative adjectives. These additional pairs were used to examine the consistency of the items in Experiment 1, as described in the previous Results and Discussion section. Altogether the list contained 306 pairs. The pairs on the list were randomized in a new order for each subject. (This was also true of the lists for Tasks (c) and (d) below.)
In the remaining tasks, subjects evaluated the degree of truth for a set of PA sentences (Task (c)) or PN sentences (Task (d)). All 172 nouns appeared with the adjective big, but as in Task (b), some of the nouns were repeated with other adjectives. The ratings were made on the usual 11-point scale, with 0 denoting definitely false and 10 denoting definitely true. Forty-eight subjects provided these ratings, twelve in each task. The subjects were part of the same population as those in Experiment 1, but had not taken part in the earlier study. They were tested in groups of 1 to 12 individuals and were paid $2.00 for an hour-long session.
Results and Discussion
In order to assess our predictions statistically, we performed a regression analysis on the mean truth ratings with superordinate size, object size, and sentence construction as independent variables. This choice of method was motivated by the continuous nature of the two size variables and the correlation between them (r = 0.65 for these data). All effects were estimated using the procedure for repeated measures described by Cohen and Cohen (1975, Ch. 10). The truth and size ratings were first re-expressed to compensate for the upper and lower bounds of the scale using the logit transformation $Y = \log X - \log(10 - X)$, where $X$ is the rating on the original 0 to 10 scale and $Y$ is the transformed variable (Mosteller & Tukey, 1977, Ch. 5). The construction factor was coded +1 for predicate-noun sentences and -1 for predicate-adjective sentences. The results of this analysis were in good agreement with the predictions of the Computation model. First, the rated truth value of a sentence increased as the superordinate size of the subject noun increased (b = 0.70, SE = 0.034, F(1, 169) = 424), indicating that subjects' judgments were sensitive to average size within the immediate superordinate category. This effect is larger for predicate-noun constructions, where the superordinate is mentioned, than for predicate-adjective sentences, where it is not (b = 0.28, SE = 0.021, F(1, 169) = 180). More interestingly, object size had an independent effect on truth ratings (b = 0.44, SE = 0.040, F(1, 169) = 116), suggesting that subjects also made use of a standard associated with everyday objects. Object size also interacted with sentence construction, exerting a larger influence on predicate-adjective than predicate-noun sentences (b = -0.19, SE = 0.025, F(1, 169) = 59). So while superordinate size dominates for predicate-noun constructions, object size is more important for predicate-adjective items.
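The re-expression and the regression design can be sketched with NumPy. This is a simplified illustration, not the repeated-measures procedure of Cohen and Cohen that the analysis actually used: `logit10` and `fit_truth_model` are hypothetical names, and the fit is plain ordinary least squares over rows that each hold one item's mean ratings under one construction (coded +1 for PN, -1 for PA), with both size-by-construction interaction terms.

```python
import numpy as np


def logit10(x):
    """Re-express ratings on a bounded 0-10 scale: Y = log X - log(10 - X).
    Ratings must lie strictly between 0 and 10; the log base only rescales Y."""
    x = np.asarray(x, dtype=float)
    if np.any((x <= 0) | (x >= 10)):
        raise ValueError("ratings must lie strictly between 0 and 10")
    return np.log(x) - np.log(10.0 - x)


def fit_truth_model(truth, sup_size, obj_size, construction):
    """Ordinary least squares with an intercept, superordinate size, object
    size, construction (+1 = PN, -1 = PA), and the two size-by-construction
    interactions. Returns the six coefficients in that order."""
    sup_size = np.asarray(sup_size, dtype=float)
    obj_size = np.asarray(obj_size, dtype=float)
    construction = np.asarray(construction, dtype=float)
    design = np.column_stack([
        np.ones_like(sup_size),
        sup_size,
        obj_size,
        construction,
        sup_size * construction,   # superordinate size x construction
        obj_size * construction,   # object size x construction
    ])
    coef, *_ = np.linalg.lstsq(design, np.asarray(truth, dtype=float), rcond=None)
    return coef
```

In use, all three rating variables would be passed through `logit10` before fitting. With the +1/-1 coding, a positive superordinate-by-construction coefficient means a larger superordinate effect for PN, and a negative object-by-construction coefficient means a larger object effect for PA, which is the pattern reported above.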
Superordinate size, object size, and their interactions with sentence type together account for 85.8% of the variance among the mean truth ratings, and the success of the Computation model in predicting these ratings encourages us to look for similar effects in a reaction time task like that of Experiment 1. The ratings collected in the present study stand us in good stead in this regard, since they allow us to partition the stimulus instances into those high and low along both size continua. Moreover, the truth ratings can be used to assign a truth value to individual sentences rather than relying on the now suspect Immediate Superordinate hypothesis.
Experiment 3
The predictions of the Computation model are slightly more complicated for reaction times than for truth ratings. The perceived truth of the sentence X's are big should increase as the X's increase in size, but verification time for such a sentence should instead follow an inverted U-shaped function. As the X's get larger, X's are big goes from being definitely false (e.g., Hummingbirds are big) to dubiously false (e.g., Sparrows are big) to dubiously true (Pigeons are big) to definitely true (Flamingos are big), with slower verification times for the dubious cases. This means that our superordinate size and object size variables should interact with the truth of the stimulus sentences. For true sentences, verification time should decrease with superordinate (or object) size, while for false sentences, verification time should increase with size. The results of Experiment 2 also prompt us to expect an interaction of the size variables with sentence construction, with superordinate size having a larger effect on predicate-noun sentences and object size having a larger effect on predicate-adjective sentences.
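The inverted-U prediction, and the size-by-truth interaction that follows from it, can be illustrated numerically. This sketch assumes a Gaussian-shaped slowdown around a truth criterion; the function name `predicted_rt`, its parameter values, and the subjective sizes of the four example birds are all hypothetical and serve only to show the shape of the prediction.

```python
import math


def predicted_rt(size, criterion, base=600.0, peak=400.0, spread=1.5):
    """Illustrative verification time (msec) for "X's are big".

    Time is longest when the X's are close to the criterion (the dubious
    cases) and shortest far from it on either side: an inverted U.
    All parameter values are arbitrary choices for the sketch."""
    return base + peak * math.exp(-((size - criterion) / spread) ** 2)


# Hypothetical subjective sizes with the criterion at 5 (not data from the study).
CRITERION = 5.0
sizes = {"hummingbirds": 1.0, "sparrows": 4.0, "pigeons": 6.0, "flamingos": 9.0}
for noun, size in sizes.items():
    truth = size > CRITERION
    print(f"{noun:<13} true={truth!s:<5} rt={predicted_rt(size, CRITERION):.0f} msec")
```

Among true items (pigeons, flamingos) the larger instance is verified faster, while among false items (hummingbirds, sparrows) the larger instance is verified more slowly. That crossover is the truth-by-size interaction the experiment tests.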
Method
We again used the adjective big to test the above predictions. To one group of subjects (analogous to the PA group of Experiment 1) we presented plural nouns (e.g., apples) singly on a CRT. On each trial, the subject was to decide whether the sentence frame “___ are big” would be true if the noun were substituted in the blank. A second, PN, group viewed the same nouns, this time accompanied by their immediate superordinates (e.g., apples-fruits), and they were asked to determine the truth of the sentence “___ are big ___” when the instance filled the first slot and the superordinate the second (the frames themselves did not appear during the trial). To select the stimuli, we employed the ratings collected in Experiments 1 and 2. From the pool of 172 nouns used in the second experiment, we selected 112 according to the following criteria. First, for each item, both the predicate-adjective and predicate-noun sentences that contained it received a mean truth rating greater than 5.00 (True items) or both received a mean rating less than 5.00 (False items). This rule was adopted to simplify the analysis of the results, since the truth of a given item is then fixed across PA and PN groups. The True items and False items were then separately classified as large or small with respect to rated superordinate and object size. This classification produces eight categories (e.g., true, small superordinate size, large object size; false, large superordinate size, small object size; etc.),
and the final set of items was chosen with an equal number of instances (viz., 14) in each category. For the True instances, mean superordinate size was 8.24 for large items and 6.26 for small ones; mean object size was 6.52 for large and 4.72 for small items on the 0 to 10 scale (SE = 0.24). For False instances, mean superordinate size was 4.38 for large and 2.52 for small items, while object size was 3.56 for large and 1.77 for small items (SE = 0.23). Median word frequency for the True instances was 9.5 tokens per million words for small superordinate size, 5.5 for large superordinate size, 4.5 for small object size, and 26 for large object size. The corresponding frequencies for False instances were 0, 2, 0, and 3.5 tokens per million. To these critical items we added 34 fillers so that for most (9 of 16) superordinate categories, half of the instances in each were True and half were False. Over the entire set of 146 items there was also an equal number of True and False instances. This set of instances was presented to subjects four times in successive blocks of trials, two during a first day's session and two on a following day, with stimulus order randomized anew at each presentation. The procedure during a trial was similar to that of Experiment 1, but with a few minor changes. The subject was seated this time at a CRT terminal with a response apparatus that consisted of a button at his left and three buttons about 18 mm apart on his right. He initiated the trial by pressing the left-hand button with his left index finger, and for a 2-sec interval thereafter he saw the word “ready” presented on the screen about 400 mm away. At the end of the warning interval, the ready signal was replaced by either a single instance (for the PA group) or a superordinate-instance pair (for the PN group) with the superordinate just above the instance.
The subject made his true or false decision by moving his right index finger from the center of the three buttons on the right to one of the neighboring buttons. The position of the True button was at the right of center for half the subjects in each group and at the left for the other half. The response terminated the display and was followed by a 2-sec period in which the reaction time for that trial (but no indication of accuracy) was presented to the subject as feedback. At the end of a session (i.e., two blocks of trials) the experimenter informed the subject of both his mean reaction time and error rate. Delaying accuracy feedback until the end of the session was intended to discourage rote learning of the assigned truth value, while at the same time encouraging correct responses. The experiment was preceded by a 20-trial practice session during which subjects were asked to press the appropriate button in response to the word “true” or “false”. The PA and PN groups consisted of eight subjects each. These subjects were right-handed and belonged to the same subject pool as those of Experiments 1 and 2; however, none of them had been involved in the previous experiments. They received $4.00 apiece for participating, plus a 50-cent bonus for each block in which their error rate was less than 10 per cent. Subjects received an average of $4.47.
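The item-selection scheme in the Method can be sketched as a two-stage classification. The truth rule (both mean ratings above 5.00, or both below) comes from the text. The large/small cut does not: the text does not say how items were split, so the median split within each truth group used here is an assumption. The final step of drawing exactly 14 items per cell is also not reproduced. `truth_label` and `classify` are hypothetical names.

```python
from statistics import median


def truth_label(pa_rating, pn_rating):
    """True if both mean truth ratings exceed 5.00, False if both fall below;
    None (item excluded) when they disagree or either equals 5.00."""
    if pa_rating > 5.0 and pn_rating > 5.0:
        return True
    if pa_rating < 5.0 and pn_rating < 5.0:
        return False
    return None


def classify(items):
    """items maps noun -> (PA truth, PN truth, superordinate size, object size).
    Returns {(truth, superordinate class, object class): [nouns]}.
    The large/small cut is a median split within each truth group: an
    assumption, since the text does not state the criterion."""
    by_truth = {True: {}, False: {}}
    for noun, (pa, pn, sup, obj) in items.items():
        label = truth_label(pa, pn)
        if label is not None:
            by_truth[label][noun] = (sup, obj)
    cells = {}
    for label, group in by_truth.items():
        if not group:
            continue
        sup_cut = median(sup for sup, _ in group.values())
        obj_cut = median(obj for _, obj in group.values())
        for noun, (sup, obj) in group.items():
            key = (label,
                   "large" if sup > sup_cut else "small",
                   "large" if obj > obj_cut else "small")
            cells.setdefault(key, []).append(noun)
    return cells
```

Crossing truth with the two size classes yields the eight cells of the design. Because the size classes are formed separately within True and False items, each truth group has its own small and large ranges, as in the cell means reported in the Method.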
Results and Discussion
The main reaction time and error data are shown in Table 1. Perhaps the most obvious fact about them is the very fast times for the PN group, a difference that may be due to the way we presented the stimuli. PN subjects saw the superordinate noun above and slightly before the instance, and the superordinate may have given them a head start in processing the following word (Meyer & Schvaneveldt, 1971). In most respects the data bear out the predictions of the Computation model. To see this, consider first the results of the superordinate size variation. If we combine, for the moment, data from the PA and PN groups, we find that mean correct verification time for True items decreases from 906 msec for small instances to 840 msec for large ones. However, verification time for False items increases with size from 861 to 913 msec, producing the predicted interaction of superordinate size and truth; SE(subjects) = 21 msec, SE(items) = 22 msec, min F'(1, 41) = 4.79, p < 0.05 (for the min F' statistic, see H. Clark, 1973). The error rates also conform to this pattern, decreasing with size from 21.5% to 9.2% for True items and increasing with size from 9.2% to 15.2% for False items (SE(subjects) = 1.3%, SE(items) = 2.0%, min F'(1, 46) = 11.33, p < 0.01). The effect of object size is at least equally strong in Table 1. The interaction with truth value is evidenced by decreasing verification time with size for True items (from 912 msec for small instances to 834 msec for large ones) and increasing times for False items (from 828 msec to 946 msec). Error rates again echo the reaction times, dropping from 19.4% for small True items to 11.2% for large Trues, and rising from 7.0% for small False items to 17.5% for large False ones.
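The min F' statistic cited here (H. Clark, 1973) combines a by-subjects F ratio (F1) and a by-items F ratio (F2) so that both subjects and items are treated as random effects. The sketch below implements the standard formulas, $\min F' = F_1 F_2 / (F_1 + F_2)$ with error degrees of freedom $(F_1 + F_2)^2 / (F_1^2/n_2 + F_2^2/n_1)$, where $n_1$ and $n_2$ are the error df of F1 and F2. The function name `min_f_prime` is our own, and the F1 and F2 values behind the reported statistics are not given in the text, so none of them are reproduced here.

```python
def min_f_prime(f1, f1_error_df, f2, f2_error_df):
    """Quasi-F lower bound min F' (H. Clark, 1973).

    f1: F ratio from the by-subjects analysis, with f1_error_df error df
    f2: F ratio from the by-items analysis, with f2_error_df error df
    Returns (min F', error df); the numerator df is that of the effect.
    The error df is usually rounded to an integer when reported.
    """
    if f1 <= 0 or f2 <= 0:
        raise ValueError("F ratios must be positive")
    value = f1 * f2 / (f1 + f2)
    error_df = (f1 + f2) ** 2 / (f1 ** 2 / f2_error_df + f2 ** 2 / f1_error_df)
    return value, error_df
```

Because min F' is always smaller than both F1 and F2, it gives a conservative test: an effect must hold up across subjects and items to reach significance.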
This interaction is once again significant for reaction times (SE(subjects) = 19 msec, SE(items) = 22 msec, min F'(1, 63) = 10.61, p < 0.01) and for error rates as well (SE(subjects) = SE(items) = 2.0%, min F'(1, 52) = 11.2, p < 0.01). We can also check how the above effects differ for the PA and PN groups. On the basis of Experiment 2, we would expect larger effects of superordinate size for predicate-noun constructions, but larger object size effects for the predicate-adjective items. In the context of the present experiment, these predictions imply that the interaction of superordinate size and truth, examined above, should be larger for the PN than for the PA group, while the object size by truth interaction should show the reverse effect. Turning to the data, we find that the relevant difference in reaction time for superordinate size is not reliable, though it shows a trend in the predicted direction (SE(subjects) = 29 msec, SE(items) = 24 msec, min F' < 1). The errors show a somewhat stronger effect, with the interaction increasing from 4.6% for the PA group to 13.7% for the PN group; however, this effect is only marginally significant by the min F' test (SE(subjects) = 2.6%, SE(items) = 2.0%, min F'(1, 34) = 3.73, 0.05 < p < 0.10). The object size predictions, however, are clearly confirmed, since the size of the interaction is 172 msec for the PA group and only 22 msec for the PN group (SE(subjects) = 27 msec, SE(items) = 24 msec, min F'(1, 38) = 9.32, p < 0.01). The error rates show a parallel difference, though in this case not a significant one (SE(subjects) = 2.8%, SE(items) = 2.0%, min F'(1, 31) = 2.15, p > 0.10). Taken together, the above results provide rather strong support for the Computation model. Moreover, by partitioning the items, we have been able to show effects of both superordinate and object size when these factors vary orthogonally.
Table 1.
Mean Reaction Time (msec) and Percent Errors (in Parentheses) for Predicate Adjective (PA) and Predicate Noun (PN) Sentences in Experiment 3, by Truth, Superordinate Size, and Object Size.
                                    Superordinate Size
Sentences         Object Size       Small           Large
True PA           Small             1175 (31.0)     1080 (13.6)
                  Large             1003 (9.6)      988 (11.6)
False PA          Small             922 (2.0)       984 (5.6)
                  Large             1142 (18.5)     1189 (17.8)
True PN           Small             735 (27.2)      656 (6.0)
                  Large             712 (18.1)      634 (5.6)
False PN          Small             691 (3.8)       716 (16.5)
                  Large             688 (12.7)      763 (21.0)
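The truth-by-size interactions discussed in the text can be recomputed from the reaction-time cells of Table 1. In the sketch below, the cell values are transcribed from the table. The interaction measure is the average of the size effect for True items (small minus large) and the size effect for False items (large minus small). That definition is our choice, but it reproduces the combined means of 906/840 and 861/913 msec and the 172 and 22 msec object-size interactions for the PA and PN groups. `RT`, `marginal`, and `size_by_truth` are hypothetical names.

```python
from statistics import fmean

# Mean correct RT (msec) from Table 1:
# (group, truth, superordinate size, object size) -> RT
RT = {
    ("PA", True, "small", "small"): 1175, ("PA", True, "small", "large"): 1003,
    ("PA", True, "large", "small"): 1080, ("PA", True, "large", "large"): 988,
    ("PA", False, "small", "small"): 922, ("PA", False, "small", "large"): 1142,
    ("PA", False, "large", "small"): 984, ("PA", False, "large", "large"): 1189,
    ("PN", True, "small", "small"): 735, ("PN", True, "small", "large"): 712,
    ("PN", True, "large", "small"): 656, ("PN", True, "large", "large"): 634,
    ("PN", False, "small", "small"): 691, ("PN", False, "small", "large"): 688,
    ("PN", False, "large", "small"): 716, ("PN", False, "large", "large"): 763,
}

_POSITION = {"sup": 2, "obj": 3}  # where each size factor sits in the key


def marginal(truth, factor, level, groups=("PA", "PN")):
    """Unweighted mean RT over cells with the given truth value and level
    ("small"/"large") of one size factor, pooling the listed groups."""
    pos = _POSITION[factor]
    return fmean(rt for key, rt in RT.items()
                 if key[0] in groups and key[1] == truth and key[pos] == level)


def size_by_truth(factor, groups=("PA", "PN")):
    """Average of the True-item speed-up and the False-item slow-down with size."""
    true_gain = marginal(True, factor, "small", groups) - marginal(True, factor, "large", groups)
    false_cost = marginal(False, factor, "large", groups) - marginal(False, factor, "small", groups)
    return (true_gain + false_cost) / 2
```

For example, `size_by_truth("obj", ("PA",))` evaluates to 172.25 and `size_by_truth("obj", ("PN",))` to 22.25, matching the rounded values in the text.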
General Discussion
Computation versus Pre-Storage Models
The Computation model evolved from the basic idea that the truth of sentences with relative adjectives is determined by mental comparison. For the
sentence Spruces are tall, this would mean comparing the stored height of spruces with that of their immediate superordinate, tree. However, the results of Experiment 1 led us to modify this assumption by suggesting that two comparisons were involved: one to the normal value of the superordinate and the other to a normal value for everyday objects. Experiments 2 and 3 lent some support to this proposal. RTs, errors, and truth ratings all showed effects of symbolic distance both to the superordinate value and to the object value. The Pre-storage model stacks up less well against the evidence. While it was able to explain the results of the first experiment on the assumption that two relative properties are stored, it ran into difficulties in accounting for the symbolic distance effects in Experiments 2 and 3. Of course, our results do not imply that relative properties are never pre-stored; what the evidence rules out is pre-storage for all relative properties of common object concepts. Although the results favor a Computation approach, there are a number of residual problems with such a model that we should consider carefully. One of these concerns the inefficiency of Computation, for it seems redundant to calculate the truth of a relative sentence in the elaborate manner that the model dictates. Why not store the result of an initial computation once and for all so that it can be referred to as needed? The question of efficiency, however, depends on the relative costs attached to storage and processing. If storage consumes a large share of the system's resources, it may prove more efficient to store a minimal amount of information. By analogy, mental arithmetic would be computationally easier if one memorized the multiplication table for all pairs of numbers less than 100. The fact that few of us do so indicates that computational simplicity must trade against storage economy.
Furthermore, while storage is not out of the question for the kinds of sentences considered here, we should remember that relative information is used in other ways as well - for example, to compare two instances (A spruce is taller than a refrigerator) or to compare an instance to a metric reference point (A spruce is more than six feet tall) or to a contextually established reference point (Spruces are the tallest trees on this block). Since there is an unlimited number of such propositions, not all of them can be pre-stored. Given that computation is needed in these cases, it would not be surprising if a similar process were applied to sentences such as those considered here. One can grant the plausibility of a computation process, however, and still object to the model outlined above. In particular, the idea of two distinct comparisons seems odd, since in functional terms a single comparison would be easier to perform and would simplify communication about relative facts. It may be possible to formulate the Computation model in a way
that omits the object comparison and that is still consistent with the experimental results. For example, we can suppose that instead of the double comparisons, a subject weights the result of a single superordinate comparison by the absolute size of the instance, with instances at the extremes of the size continuum receiving high weight. But of course in these terms, the question then becomes why any weighting is needed in determining the sentence’s truth value. Note, too, that something akin to an object reference point is still required in this alternative model to decide what constitutes the upper extreme of a dimension like size that is unbounded above. A second possibility is that the object comparison be explained away as an artifact of the experimental situation. In all of the studies reported here, subjects received a randomized set of instances drawn from a variety of categories, and the range of instances may itself provide a context against which any given instance will appear big (tall, thick, and so on). We can think of this as a type of adaptation level that could be absent in more naturalistic settings. While we have no firm evidence against the adaptation theory, Experiment 3 provides some suggestive data. If the effect of object size is due to adaptation, we should find that this effect increases over the four blocks of trials. But in fact, the opposite trend appears in the results: the crucial interaction of object size and truth decreases steadily (though not significantly) across blocks. Moreover, there are independent reasons why an object reference point might be important. First, for very atypical instances, the immediate superordinate may be uncertain or inaccessible. Second, even if the immediate superordinate is obvious, its reference point may not be. 
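The single-comparison alternative can be written out to make the text's objection visible. In this minimal sketch (`weighted_single_comparison`, the linear weight, and `alpha` are all hypothetical), the only comparison is to the superordinate. Its result is scaled up for instances near either extreme of the size continuum. Deciding what counts as "extreme", however, requires a `midpoint` argument, which does the work of an object reference point.

```python
def weighted_single_comparison(instance, super_ref, midpoint, alpha=0.5):
    """Hypothetical single-comparison alternative.

    The only comparison is to the superordinate (instance - super_ref); its
    result is scaled by a weight that grows as the instance moves toward an
    extreme of the size continuum. 'midpoint' marks the middle of that
    continuum, so it plays much the role of an object reference point.
    """
    weight = 1.0 + alpha * abs(instance - midpoint)
    return weight * (instance - super_ref)
```

With invented sizes, an airplane (instance 9, superordinate 5) and an elephant (instance 7, superordinate 3) have the same superordinate difference. The airplane still scores higher (14.0 versus 10.0 with `midpoint=4`), but only because of its distance from `midpoint`, which reintroduces an object-like standard.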
For example, the superordinate size of a category like weapon will vary greatly depending on which instances we are willing to include in this category (the size will be quite large if such things as missiles are included). Both problems can be avoided by using the object reference point.
Properties in Semantic Memory
While our experiments have tried to determine the status (pre-stored or computed) of relative properties, we have simply assumed that absolute properties are pre-stored. However, it is possible to challenge this assumption, and in fact, there are several good reasons for doing so. First, if we consider properties like being non-pink or being-a-resident-of-a-state-beginning-with-l, then it becomes clear that not all absolute properties can be pre-stored. Non-pink is an absolute property if pink is, but it is unlikely that concepts such as grass and snow contain non-pink in their property lists. While we could memorize the fact that grass is non-pink, we need not do so, but can infer it from other sources of information. Second, it is easy to imagine how even common absolute properties could be computed rather than stored. In answering the question Is a banana yellow? we may compare the hue of bananas with some prototypical yellow in much the same way as we would compare the size of a banana in determining whether it is big. Te Linde and Paivio (1979) have obtained clear distance effects when subjects must determine the similarity between color chips and a named color. Stephens (Note 1) has also found distance effects for absolute properties of named objects (by asking questions like Which is more yellow, a lemon or a banana?) that parallel those for relative properties. These possibilities suggest that the substantive difference between relative and absolute adjectives may depend, not on whether they are computed or pre-stored, but on the kind of computation involved. In this respect, the notion of two reference points provides one way that this difference might be framed. As a first approximation, we can suppose that adjectives vary in the importance attached to the superordinate and object points during the comparison. Relative adjectives would depend most on the superordinate point but, for reasons described above, would be influenced to a lesser extent by the object point. Absolute adjectives, on the other hand, would be dominated mainly by the object point, so that judgments would be indifferent to the category membership of the modified noun (cf. Wheeler, 1972). In this way, we can account for the logical distinctions proposed by Katz (1972) and Vendler (1968) and, at the same time, explain our intuition of a continuum between absolute and relative adjectives, as discussed earlier. However, viewing adjectives in this way leads us to a number of difficult questions.
Clearly, not all properties can be computed, since if this were true there would be nothing for the comparison process to operate on. But while some core of data must be present to make the computations possible, it appears to be a very difficult task to get at these core properties. Perhaps there is some underlying level of analysis in which all properties are prestored. But it is equally possible that pre-storage occurs with just a few landmark instances. For example, in determining whether an object X is big, we could try to recall its relation to some other object Y that we have already determined to be big. If X and Y share the same superordinate, and if we can show that X contains Y, or that Y is a part of X, or that X completely occludes Y when X is immediately in front of Y, then we can deduce that X is also big. Such a process may be less elegant than a simple comparison, but it is not out of the running (see Banks, 1977). Another question concerns the ultimate grounds for the distinction between relative and absolute adjectives. Why, for example, are color terms
absolute and dimensional adjectives relative? The difference apparently does not lie in our ability to distinguish variation in the corresponding qualities, for we can certainly discern degrees of yellowness. The number of underlying dimensions is also immaterial, since big, which depends on three dimensions, is no less a relative adjective than tall, which depends on one. One possibility is that the difference has less to do with the type of attribute than with its distribution among objects. For relative adjectives, variability of the corresponding property may be greater between superordinate classes than within them, so that a comparison to the superordinate reference point will convey valuable information. For absolute adjectives, variability may be equally great within as between superordinates, so that such a comparison is irrelevant. This question is far from settled, however, and the distinction may depend also on the integrality of the property (Garner, 1974), the salience of its component dimensions (Kamp, 1975), or the way in which the reference point changes with exposure to new instances (Wheeler, 1972). Finally, it is important to realize that absolute and relative adjectives do not exhaust the range of adjectives in English. For example, we have not considered “fictionalizing” adjectives like mythical that map real entities into imaginary ones, or “negators” like fake or pseudo that signal nonmembership in a given category (R. Clark, 1970). Adjectives like these probably call for a very different kind of analysis than the one offered above. However, these items take us further from the traditional view of adjectives as properties stored with the nouns they modify, and in this way, they echo the message of the preceding studies.
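The distributional hypothesis (more variability between superordinate classes than within them for relative adjectives) can be given a concrete index. The text names no measure, so the choice of the correlation ratio (eta squared) here is our assumption. `between_class_share` is a hypothetical name, and any ratings fed to it would have to come from new data; none are supplied by the study.

```python
from statistics import fmean


def between_class_share(values_by_class):
    """Share of the total variability of a property that lies between
    superordinate classes (the correlation ratio, eta squared).

    values_by_class maps a class name to the property values of its members.
    Near 1: the property mostly separates classes; near 0: it varies as much
    within classes as between them.
    """
    groups = [list(vs) for vs in values_by_class.values() if len(vs) > 0]
    all_values = [v for vs in groups for v in vs]
    grand = fmean(all_values)
    total = sum((v - grand) ** 2 for v in all_values)
    if total == 0:
        raise ValueError("property does not vary")
    between = sum(len(vs) * (fmean(vs) - grand) ** 2 for vs in groups)
    return between / total
```

On this reading, a relative property such as size would be expected to yield a higher share across superordinates like vehicle, fruit, and animal than an absolute property such as yellowness. That is a hypothesis the index could test, not a result.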
References
Anderson, J. R. (1976) Language, memory, and thought. Hillsdale, N.J., Lawrence Erlbaum Associates.
Banks, W. P. (1977) Encoding and processing of symbolic information in comparative judgments. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 11). New York, Academic Press.
Banks, W. P., and Flora, J. (1977) Semantic and perceptual processes in symbolic comparisons. J. exp. Psychol.: Human Perception and Performance, 3, 278-290.
Battig, W. F., and Montague, W. E. (1969) Category norms for verbal items in 56 categories: A replication of the Connecticut category norms. J. exper. Psychol. Mono., 80 (3, Pt. 2).
Bierwisch, M. (1967) Some universals of German adjectivals. Found. Lang., 3, 1-36.
Bierwisch, M. (1971) On classifying semantic features. In D. D. Steinberg and L. A. Jakobovits (Eds.), Semantics: An interdisciplinary reader in philosophy, linguistics, and psychology. Cambridge, Cambridge University Press.
Clark, H. H. (1973) The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. J. verb. Learn. verb. Behav., 12, 335-359.
Clark, R. (1970) Concerning the logic of predicate modifiers. Nous, 4, 311-355.
Cohen, J., and Cohen, P. (1975) Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, N.J., Lawrence Erlbaum Associates.
Collins, A. M., and Loftus, E. F. (1975) A spreading-activation theory of semantic processing. Psychol. Rev., 82, 407-428.
Cresswell, M. J. (1976) The semantics of degree. In B. H. Partee (Ed.), Montague grammar. New York, Academic Press.
Fillmore, C. J. (1971) Entailment rules in a semantic theory. In J. F. Rosenberg and C. Travis (Eds.), Readings in the philosophy of language. Englewood Cliffs, N.J., Prentice-Hall.
Garner, W. R. (1974) The processing of information and structure. Potomac, Md., Lawrence Erlbaum Associates.
Higgins, E. T. (1976) Effects of presuppositions on deductive reasoning. J. verb. Learn. verb. Behav., 15, 419-430.
Holyoak, K. J. (1977) The form of analog size information in memory. Cog. Psychol., 9, 31-51.
Huttenlocher, J., and Higgins, E. T. (1971) Adjectives, comparatives, and syllogisms. Psychol. Rev., 78, 487-504.
Jamieson, D. G., and Petrusic, W. M. (1975) Relational judgments with remembered stimuli. Percep. Psychophys., 18, 373-378.
Kamp, J. A. W. (1975) Two theories about adjectives. In E. L. Keenan (Ed.), Formal semantics of natural language. Cambridge, Cambridge University Press.
Katz, J. J. (1972) Semantic theory. New York, Harper and Row.
Kintsch, W. (1974) The representation of meaning in memory. Hillsdale, N.J., Lawrence Erlbaum Associates.
Kučera, H., and Francis, W. N. (1976) Computational analysis of present-day American English. Providence, R.I., Brown University Press.
Langford, C. H. (1942) Moore's notion of analysis. In P. A. Schilpp (Ed.), The philosophy of G. E. Moore. Chicago, Northwestern University Press.
Meyer, D. E., and Schvaneveldt, R. W. (1971) Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. J. exper. Psychol., 90, 227-234.
Miller, G. A., and Johnson-Laird, P. N. (1976) Language and perception. Cambridge, Mass., Belknap Press.
Mosteller, F., and Tukey, J. W. (1977) Data analysis and regression. Reading, Mass., Addison-Wesley.
Moyer, R. S. (1973) Comparing objects in memory: Evidence suggesting an internal psychophysics. Percep. Psychophys., 13, 180-184.
Moyer, R. S., and Bayer, R. H. (1976) Mental comparison and the symbolic distance effect. Cog. Psychol., 8, 228-246.
Moyer, R. S., and Dumais, S. T. (1978) Mental comparisons. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 12). New York, Academic Press.
Paivio, A. (1975) Perceptual comparisons through the mind's eye. Mem. Cog., 3, 635-647.
Parsons, T. (1972) Some problems concerning the logic of grammatical modifiers. In D. Davidson and G. Harman (Eds.), Semantics of natural language. Dordrecht, Holland, D. Reidel.
Ross, W. D. (1930) The right and the good. Oxford, Clarendon Press.
Sapir, E. (1944) Grading: A study in semantics. Philos. Sci., 11, 93-116.
Smith, E. E. (1978) Theories of semantic memory. In W. K. Estes (Ed.), Handbook of learning and cognitive processes (Vol. 6). Hillsdale, N.J., Lawrence Erlbaum Associates.
Smith, E. E., Rips, L. J., and Shoben, E. J. (1974) Semantic memory and psychological semantics. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8). New York, Academic Press.
Suzuki, T. (1970) An essay on the anthropomorphic norm. In R. Jakobson and S. Kawamoto (Eds.), Studies in general and oriental linguistics. Tokyo, TEC Company.
Te Linde, J., and Paivio, A. (1979) Symbolic comparisons of color similarity. Mem. Cog., 7, 141-148.
Vendler, Z. (1968) Adjectives and nominalizations. The Hague, Mouton.
Wallace, J. (1972) Positive, comparative, superlative. J. Philos., 69, 773-782.
Wheeler, S. C. (1972) Attributives and their modifiers. Nous, 6, 310-334.
Wierzbicka, A. (1972) Semantic primitives. Frankfurt, Athenäum.
Winer, B. J. (1971) Statistical principles in experimental design. New York, McGraw-Hill.
Reference Notes
1. Stephens, D. (1978) Processing of pictures versus words in a comparative judgment task. Unpublished manuscript, University of Chicago.
Previous studies of semantic memory have overlooked an important distinction among what have been called "property assertions". Assertions with relative adjectives (e.g., flamingos are tall) imply a comparison with a reference point or standard associated with the superordinate category (e.g., a flamingo is tall for a bird). The truth value of assertions with absolute adjectives (e.g., flamingos are pink) is generally independent of this type of reference. The psychological consequences of this distinction were studied in Experiment 1. Subjects verified sentences containing either absolute or relative adjectives in predicate-adjective form (e.g., a flamingo is tall (pink)) or in predicate-noun form (e.g., a flamingo is a tall bird (pink bird)). In the latter sentences the predicate noun is the immediate superordinate. Reaction times and errors were lower for relative sentences when the superordinate term was specified. For absolute sentences there was no difference. These data suggest that the truth value of relative sentences depends not only on the superordinate term but also on more general norms for familiar objects. Experiment 2 shows that ratings of the truth of relative sentences are a function of the difference between the given instance and the superordinate standard (for example, the height of a flamingo relative to that of an average bird) and of the difference between the instance and the norm for familiar objects. Experiment 3 replicates these results using reaction time as the dependent measure.
Cognition, 8 (1980) 175-185
© Elsevier Sequoia S.A., Lausanne. Printed in the Netherlands
Very long term memory for tacit knowledge*
RHIANON ALLEN**
The Graduate Center of CUNY
ARTHUR S. REBER
Brooklyn College of CUNY
Abstract
Very long term memory for abstract materials was examined by recalling subjects who had served in a synthetic grammar learning experiment two years earlier. In that study (Reber & Allen, 1978) we differentiated among several cognitive modes of acquisition, their resultant memorial representations, and their associated decision processes. Two years later and without any opportunity for rehearsal or relearning, subjects still retain knowledge of these grammars to a remarkable degree. Although some differences have become blurred with the passage of time, the form and structure of that knowledge and the manner in which it is put to use remain strikingly similar to the original. That is, differences traceable to acquisition mode and conditions of initial training can still be observed. As in the original study, these results are discussed within the general context of a functionalist approach to complex cognitive processes.
This paper reports a remarkably persistent long term memory for highly abstract and complex materials: specifically, knowledge of the grammatical structure of two artificial languages after a two year hiatus.1 In researching the area of very long term memory we were struck by how little attention has been paid to memories of this kind. For the most part, the study of long term memory has dealt with real world knowledge which is both highly codable and likely to be either rehearsed or refreshed by day to day activities and events. The few studies that we found which used arbitrary stimuli, however, suggested that human memory is quite robust (Wickelgren, 1972; Burtt, 1941; Kolers, 1976). In this experiment we take these notions of arbitrariness and nonrehearsability of the stimulus materials to previously unexplored extremes. First, we are focusing on very long term memory for knowledge of a complex stimulus domain which was specifically selected to be as remote as possible from normal day to day activities.2 Second, the knowledge of grammatical structure which resulted from the original learning was largely unconscious, so we shall be looking at the longevity of implicit knowledge, not explicit. Third, during the original experiment the subjects did not know that there was to be this later follow-up, and we can be quite confident that our subjects have not rehearsed the material in the interim. Indeed, it is far from clear, given our understanding of memorial strategies, how one can rehearse abstract information which is tacitly coded.3
*This research was conducted while the senior author was supported by a doctoral fellowship from the Social Sciences and Humanities Research Council of Canada.
**Requests for reprints should be sent to Rhianon Allen, Developmental Psychology Program, The Graduate Center of CUNY, 33 West 42nd Street, New York, NY, 10036, USA.
1The original study was reported two years ago in this journal (Reber & Allen, 1978). Although we provide a synopsis below of the major findings of that experiment, the interested reader should refer to that paper for details on procedure and results as well as a full discussion of the theoretical issues which underlie the learning of complexly structured, rule-governed stimulus domains.
Before proceeding, it seems prudent to review briefly some of the basic findings from the original study so that the kinds of memorial residues we are looking for can be specified. In that experiment, subjects learned about the underlying grammatical structure of two different artificial languages under two different training conditions. On one occasion a paired associate (PA) procedure was used, in which exemplary letter strings from one artificial language were paired with the names of cities; on the other occasion an observation (OBS) procedure was used, in which the same subjects attended to a series of exemplars from the other language. Knowledge of each synthetic language was assessed using a well-formedness test in which subjects had to judge the "grammaticality" of a large number of novel letter strings.
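For readers unfamiliar with such stimuli, the sketch below shows what a finite-state grammar and a well-formedness judgment look like. The grammar here is a hypothetical toy over the letters T, P, V, X, and S. It is not either of the grammars used by Reber and Allen (1978), whose details are given in that paper. The function names and the single-substitution method for building nongrammatical strings are likewise illustrative assumptions.

```python
import random

# Hypothetical toy grammar (NOT one of the grammars used in the study).
# Each state maps a permitted letter to the next state; reaching state 4 ends a string.
GRAMMAR = {
    0: {"T": 1, "V": 2},
    1: {"P": 1, "T": 3},  # P may repeat
    2: {"X": 2, "V": 3},  # X may repeat
    3: {"S": 4},
}
ACCEPT = {4}
LETTERS = "PSTVX"


def is_grammatical(s: str) -> bool:
    """Follow the transitions letter by letter; any unlicensed letter is a violation."""
    state = 0
    for letter in s:
        state = GRAMMAR.get(state, {}).get(letter)
        if state is None:
            return False
    return state in ACCEPT


def generate(rng: random.Random, max_len: int = 8) -> str:
    """Random walk through the grammar, retrying walks that exceed max_len."""
    while True:
        state, letters = 0, []
        while state not in ACCEPT and len(letters) < max_len:
            letter, state = rng.choice(sorted(GRAMMAR[state].items()))
            letters.append(letter)
        if state in ACCEPT:
            return "".join(letters)


def make_violation(s: str, rng: random.Random) -> str:
    """Substitute one letter at random, retrying until the string breaks a rule."""
    while True:
        i = rng.randrange(len(s))
        replacement = rng.choice([c for c in LETTERS if c != s[i]])
        candidate = s[:i] + replacement + s[i + 1:]
        if not is_grammatical(candidate):
            return candidate
```

With this toy grammar, TPPTS and VXVS are grammatical. TXS is not, because X cannot follow an initial T.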
The results revealed that subjects have available three basic cognitive modes for acquiring knowledge of such complex stimulus environments.
2The decision to use stimulus materials whose structure was dictated by finite-state grammars was motivated by theoretical issues concerning acquisition of tacit knowledge. They seemed a reasonable choice because they are arbitrary and can be made arbitrarily complex; they are organized, and deeply so; and they have structural forms that are most unlikely to be amenable to the typical subject's bank of heuristic devices for learning about rule-governed systems. These points are discussed in more detail in Reber and Glick (forthcoming).
3This issue of unconscious rehearsal or unconscious "work" has received some attention in the area of problem solving. The so-called "incubation" period, during which solutions to problems are often achieved, certainly suggests that some kind of long term unconscious cogitation takes place. A nice discussion of a number of mechanisms which could be operating to produce the incubation effect may be found in Posner (1973). However, we suspect that there are fundamental differences here, since the problem solver is aware that he is a problem solver; our subjects were unaware that they would later be memorially responsible for the material learned.
Each acquisition mode results in a particular form of memorial representation and an attendant set of operations for making decisions. Let us review each.
(a) Explicit rule induction
This procedure consists of the overt formation and testing of hypotheses about aspects of letter order and the establishment of consciously held rules. We found this mode to a limited extent in both the PA and OBS training procedures. Typically, these rules were correct reflections of the letter order constraints, although they were not particularly sophisticated. They consisted almost entirely of relatively simple notions about short letter groups (primarily bigrams) which occur in initial and terminal positions of letter strings. Even at this simple level, however, subjects reported using them on only about 40% of the test trials. Generally speaking, this explicit mode can be identified with the phenomena which have been extensively studied in the literature on concept formation, problem solving, pattern learning, etc.
(b) Individuated memory and the analogic strategy
This procedure consists of attending to and memorizing specific items and/or discriminable differences between items during learning: operations which result in a fairly concrete memorial space. The PA task, by its very nature, invited such a mode, and hence the mode was associated almost entirely with that learning procedure. Decisions about the acceptability of test strings tended to be made by searching for an analogy between the to-be-judged item and the contents of this individuated memory (see Brooks, 1978, for a general theoretical discussion of this procedure). Not surprisingly, this strategy led to high accuracy in the assignment of grammaticality to the few "old" test items which had also been part of the learning set. It was, however, also associated with a relatively poor knowledge of structure and a high rate of erroneous rejection of novel grammatical test strings. These "omission" errors frequently stemmed from a tendency to reject any item for which no acceptable analogy could be recalled.
(c) Implicit learning and the abstraction strategy
This acquisition mode consists of the unconscious abstraction of the underlying rule system inherent in the exemplars presented during learning. Characteristic of this mode is that little or no specific concrete information about the actual learning items is retained, and decisions about the well-formedness of test strings are made largely on an intuitive basis. Although there was evidence that some learning of this kind accompanied the PA procedure, the abstraction strategy was strongly associated with the OBS training procedure, which, unlike PA, has no specific task demands. The advantage in dealing with "old" strings found with the analogy strategy was totally absent here; all strings from the learning set were dealt with as if they were novel strings.
In this study, then, we are looking for evidence with respect to three important issues in the study of very long term memory. First, can a body of unconscious knowledge be retained for an extended period of time without the opportunity for rehearsal? Second, how important is the mode of acquisition of original knowledge in determining what is retained? Third, how closely does the form of two-year-old knowledge resemble that of original knowledge?
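The contrast between the analogic and abstraction strategies can be illustrated with two toy decision rules. This sketch is not a model proposed by the authors. It assumes, purely for illustration, that the analogic strategy accepts a test string only if some memorized learning item lies within a small edit distance of it. It stands in for abstracted structure with the set of adjacent letter pairs (bigrams, with string edges marked) attested anywhere in the learning set. The names, the distance threshold, and the bigram representation are all hypothetical.

```python
from typing import List, Set, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def analogic_judgment(item: str, memory: List[str], max_distance: int = 1) -> bool:
    """Accept only if some memorized learning item is similar enough to serve as an analogy."""
    return any(edit_distance(item, old) <= max_distance for old in memory)


def edge_bigrams(s: str) -> List[Tuple[str, str]]:
    """Adjacent letter pairs, with '^' and '$' marking the start and end of the string."""
    padded = "^" + s + "$"
    return list(zip(padded, padded[1:]))


def abstract_bigrams(learning_items: List[str]) -> Set[Tuple[str, str]]:
    """Pool the bigrams attested anywhere in the learning set, discarding the items themselves."""
    return {pair for s in learning_items for pair in edge_bigrams(s)}


def abstraction_judgment(item: str, bigrams: Set[Tuple[str, str]]) -> bool:
    """Accept if every adjacent pair in the item was attested during learning."""
    return all(pair in bigrams for pair in edge_bigrams(item))
```

Suppose the learning items are TPPTS, TTS, VXXVS, and VVS, drawn from a hypothetical grammar whose strings are T, any number of Ps, T, S, or V, any number of Xs, V, S. Both rules accept the old item TTS. The analogic rule rejects the novel grammatical string TPPPPTS because no stored item is within one edit of it, which is an omission error of the kind described above. The bigram rule accepts it, and it rejects TXS because the pair TX never occurred during learning.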
Method
Subjects
Of the ten subjects in the original experiment, we were able to recall eight. The two unavailable ones were typical of the group as a whole and, since each was from a different order condition in the original design, there is no reason to suspect that any systematic biases were introduced by the failure to corral all ten. For reasons explained in the original paper, these subjects were hand-picked advanced undergraduates and graduate students who agreed to serve without pay or other remuneration.
Stimulus materials
The stimuli used were the letter strings from the two tests for well-formedness in the original study. In that experiment, the knowledge of grammatical structure acquired during learning was evaluated by presenting each subject with a set of 100 strings of letters (actually only 50 distinct items were used, each being presented twice), one-half of which conformed with the rules for letter order (the grammatical strings) and one-half of which contained one or more violations of those rules (the nongrammatical strings). Details of these test items are given in Reber and Allen (1978). For our purposes here, note that five of the grammatical strings had been used as part of the original learning stimuli (the "old" items) and the other 20 grammatical strings occurred only during testing (the "novel" items).
Procedure
Prior to testing, subjects were told which grammar they would be responsible for and asked in all cases to respond "yes" or "no" depending upon whether or not, as best as they could recall, each item conformed to the rules of that grammar. All subjects were reminded that half of the items were acceptable and half were not. There was no opportunity for relearning or refamiliarization with the materials. No other information about the materials or the task was given: no mention was made of the repetition of test items or of the existence of the old items, no feedback about the correctness of responses was given, and no reference was made to the fact that these test strings were the same ones which had been used two years ago. Both grammars were tested in exactly the same manner, each time reminding the subject about the procedure used to learn that particular grammar two years ago. After completing the well-formedness test, subjects were asked to estimate how well they thought they had done by stating how many of the 100 items they had classified correctly.
Counterbalancing and notation
The order of running was counterbalanced, with four of the subjects first tested with the strings based on the grammar learned by the PA procedure two years earlier (denoted as PA-1st subjects) and the remaining four subjects beginning testing with the one learned with the OBS procedure (OBS-1st subjects). Following testing on the first grammar, subjects proceeded directly to the task for the other grammar (denoted as PA-2nd and OBS-2nd). Note that subjects referred to as OBS-2nd are the same subjects as PA-1st, and similarly for the PA-2nd and OBS-1st subjects. All subjects were run in the same order condition as two years ago. For example, PA-1st subjects here are the same subjects as those who were described as PA-1st in Reber and Allen (1978). This point will be important later, since we will report on some effects that can be traced back to the order of running in the initial training sessions.
Results
Introspections
At the outset, only one or two subjects thought that they were now capable of performing above chance on this task. But as testing continued, all reported that they were, to their surprise, becoming more and more aware of their ability to make accurate decisions, and all but one of the subjects estimated their performance to be above chance. However, unlike two years ago, the overall correlation between estimated and actual performance was not significantly different from zero: our subjects knew they were performing above chance, but they had no accurate sense of just how well they were doing. The pattern of justifications offered two years ago had revealed some strong differences in the types of reasons given for making decisions following the two learning procedures. Here, no differences were observed. For both grammars we received a mixture of justifications like "I'm just guessing", "This one somehow feels right (or wrong)", "I think I remember this one", and so forth. However, the frequency of such justifications was very low. Two years ago roughly 40% of all responses could be justified; in this follow-up, a concrete reason for a decision was a relatively rare event. This apparent loss of conscious contact with at least some sense of what is known probably accounts for the lack of confidence that subjects had in their knowledge and their generally poor ability to assess actual performance. Finally, virtually all subjects felt that the task became easier as testing proceeded, and they thought their performance improved consistently. Although there was a trend in this direction over the full course of testing, it failed to reach significance, F(3, 21) = 1.68. The sense of increased performance over trials probably has more to do with a refamiliarization with the task than with an actual increase in the amount of recalled knowledge.
Probability of a correct response (Pc)
Table 1 gives the mean Pc values for the grammatical and nongrammatical items for the grammars learned under each condition in both the original experiment and the follow-up. The single most interesting value in this experiment is the overall Pc for the follow-up of 0.667. With chance at 0.5 and Pc > 0.6 needed for significance for an individual subject, this value demonstrates that sufficient knowledge of these grammars has survived the two year hiatus for our subjects to reliably distinguish well-formed from non-well-formed strings. However, it is also clear that there has been a decrease in overall performance; the difference between the Pc values from the original and follow-up testing sessions is significant,4 F(1, 7) = 26.2, p < 0.005.
Table 1. Probability of a Correct Response (Pc) on the Two Well-formedness Tasks
                  Original Task                               Follow-up Task
Item Status       Observation  Paired Associates  Means       Observation  Paired Associates  Means
Grammatical       0.845        0.710              0.778       0.678        0.650              0.664
Nongrammatical    0.775        0.780              0.778       0.690        0.650              0.670
Means             0.808        0.740              0.778       0.684        0.650              0.667
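The individual-subject criterion of Pc > 0.6 can be checked with an exact binomial test. The sketch below rests on assumptions the text does not state. It treats each subject's 100 responses as independent trials, although only 50 distinct items were each presented twice. It also takes chance as 0.5 and uses a two-tailed 0.05 level.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_for_significance(n: int = 100, alpha: float = 0.05) -> int:
    """Smallest number correct out of n whose two-tailed exact binomial
    p-value against chance (0.5) falls below alpha."""
    for k in range(n // 2 + 1, n + 1):
        # With p = 0.5 the distribution is symmetric, so two-tailed = 2 * upper tail.
        if 2 * binomial_upper_tail(k, n) < alpha:
            return k
    raise ValueError("no score out of n reaches significance")


threshold = min_correct_for_significance()
print(threshold, threshold / 100)
```

Under these assumptions the function returns 61, so a subject needs at least 61 correct out of 100 (Pc > 0.6). This agrees with the criterion stated in the text. A one-tailed test would lower the threshold slightly.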
In the original experiment the OBS procedure produced better overall performance than PA, and there was an item status by training procedure interaction. These effects are no longer statistically detectable. On the surface this suggests a greater loss of knowledge of the grammar learned using the OBS procedure than of the PA-acquired grammar. However, the interactions which would reflect such an effect (training procedure by time of testing, and training procedure by time of testing by grammatical status) were not significant, F(1, 7) = 4.88 and 5.43 respectively, p's > 0.05. Two years ago subjects were better at detecting nongrammatical items which contained multiple errors than those with but a single violation. This effect emerges intact two years later, Pc = 0.80 and 0.65 for multiple and single letter violations respectively. Two years ago subjects were also better at detecting nongrammatical items with single violations in the initial position than in any other position. This effect is no longer present; no particular violation location shows an advantage over any other. This result is probably due to the loss of explicit knowledge about letter position constraints. As mentioned above, a large proportion of the justifications which subjects supplied in the initial testing concerned initial letters and initial bigrams. Once this concrete information is lost from memory, the detection advantage accruing to first letter violations goes with it.
4Wherever statistical comparisons are drawn between the original and follow-up studies, the data from the two subjects not recalled have been discarded. All tests to follow, therefore, utilize a completely within subjects design with eight subjects. The deletion of these two original subjects seems not to have resulted in any systematic loss of data.
Representativeness
The issue here concerns the extent to which subjects' knowledge of structure is an accurate reflection of grammatical structure as displayed in the original learning stimuli. We had noted two kinds of non-representativeness in the original experiment. The explicit rule induction strategy occasionally led subjects to articulate rules which were simply incorrect. The analogy strategy often led subjects to consistently misclassify items on the grounds that candidates for analogy-by-similarity were not in memory. The existence of non-representativeness is determined by analyzing the pattern of responses to the two presentations of each test item, comparing (by a χ² test) the number of repeated misclassifications (EE) to the number of single misclassifications (CE and EC). Table 2 shows all four possible patterns for each learning procedure and order of running.
Table 2. Patterns of Responding to Successive Presentations on the Follow-up Well-formedness Task
                                 Pattern
Training Condition               CC     CE     EC     EE
Observation, Run 1st             109    24     36     31
Observation, Run 2nd             104    35     26     35
Paired Associates, Run 1st        93    20     32     55
Paired Associates, Run 2nd       106    29     40     25
(C = correct, E = error; e.g., CE denotes a correct response on the first presentation and an error on the second.)
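The logic of the EE comparison can be sketched under an assumed guessing model. If a subject's knowledge base is representative, items whose status is known yield CC. Items whose status cannot be determined are guessed at 0.5 on each presentation, so CE, EC, and EE should be equally frequent among them. The same reasoning extends to four presentations, where every pattern other than CCCC and EEEE serves as the baseline. The function below compares the all-error count with the mean of the mixed patterns using a two-cell, one-degree-of-freedom χ² statistic. This formulation and the function name are assumptions, since the text does not spell out the computation.

```python
from itertools import product
from math import erfc, sqrt
from typing import Dict, Tuple


def all_error_test(counts: Dict[str, int], presentations: int) -> Tuple[float, float]:
    """Compare the all-error pattern with the mean of the mixed patterns.

    `counts` maps response patterns such as "CE" (C = correct, E = error)
    to frequencies. Returns (chi_square, p_value) for a two-cell test with
    1 degree of freedom (an assumed formulation). The statistic is
    direction-free, so compare the observed count with the baseline too.
    """
    all_correct, all_error = "C" * presentations, "E" * presentations
    patterns = ["".join(p) for p in product("CE", repeat=presentations)]
    mixed = [p for p in patterns if p not in (all_correct, all_error)]
    # Under guessing, each mixed pattern and the all-error pattern are equally likely.
    baseline = sum(counts.get(p, 0) for p in mixed) / len(mixed)
    observed = counts.get(all_error, 0)
    # Two cells (observed vs baseline) with equal expected frequencies.
    chi_square = (observed - baseline) ** 2 / (observed + baseline)
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value
```

Applied to the Paired Associates, Run 1st row of Table 2 (EE = 55 against a mean of 26 for CE and EC), this formulation gives χ² of about 10.4, p < 0.01. That is close to, but not identical with, the reported χ²(1) = 10.43, so the authors' exact procedure may differ.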
The overall EE rate here is significantly higher than would be expected if errors were simply a result of guessing when an item's grammatical status is not determinable given a representative knowledge base, χ²(3) = 12.0, p < 0.01. This effect, as it was at the time of original testing, is contained entirely in the PA-1st subjects, χ²(1) = 10.43, p < 0.01. This result raises the question of whether subjects are still using the same inappropriate strategies and hence consistently misclassifying the same items on the follow-up. Since exactly the same test items were used both times, the overall pattern of classification responses can easily be traced through all four separate presentations. The proper test here is to compare the EEEE rate with the mean of the other 14 combinations of correct and error responding, to ensure that the most conservative test is being applied. As before, only the PA-1st subjects show a significant tendency to commit repeated errors, χ²(1) = 6.13, p < 0.01, with three of the four subjects in this condition reaching significance. Interestingly, fully 88% of these consistent misclassifications were rejections of items which were actually grammatical. This is the reverse of all other conditions, where the modal error was to accept nongrammatical items. The tendency for PA-1st subjects to consistently err on grammatical items thus seems to result from their persistent use of the analogic strategy, a strategy which gets one into difficulty when a test item does not sufficiently resemble the letter strings in memory.
"Old" versus "novel" grammatical test items
After original learning, all subjects had performed equally well when assigning grammatical status to "old" test items and to novel grammatical items. In the follow-up, however, we observed a significant learning procedure by old/new status interaction, F(1, 7) = 10.62, p < 0.025. Specifically, PA subjects now perform significantly worse on novel grammatical items than on old items, F(1, 7) = 6.15, p < 0.05; OBS subjects show no significant difference. This result clearly indicates that specific learning set materials are retained after a two year lag, and that this retention is associated with the acquisition mode that most strongly directed subjects' attention to the physical features of the stimulus material. In summary, there is no doubt that knowledge of these grammars has survived remarkably well. Some of it is in an abstract form and some in reasonably concrete form, and these memorial forms correspond quite closely with the memory systems of two years ago. Moreover, as indicated by the analysis of the response patterns to old and novel items and by the emergence of non-representativeness in the PA-1st subjects, both the beneficial and detrimental impacts of these memorial forms can still be felt.
Discussion To return to the original questions of robustness, form and mode of acquisition, it seems quite remarkable that information gained over the course of a 10 to 15 minute exposure to an artificial language can be retained for as long as two years without intervening exposure or rehearsal. Even two years after learning, all subjects are significantly above chance at assigning grammatical status to test items. But it is not the case that all types of knowledge are equally robust. Explicit, conscious knowledge in particular appears to be relatively fragile in nature.s From a “levels of processing” point of view as ‘Rather, we should say that explicit knowledge is fragile without rehearsal. It seems an obvious point that if a rule (e.g., a chess move) is rehearsed periodically, it will be remembered - perhaps indefinitely. The important notion here is that the other two modes are robust without rehearsal.
184
R. Allen and A. S. Reber
put forward by Craik and his co-workers (Craik & Lockhart, 1972; Craik & Tulving, 1975) one is led to the surprising implication that knowledge gained from conscious, analytic procedures is less deeply processed than knowledge achieved by alternative means. Knowledge acquired in an implicit mode, on the other hand, can still be detected after the two year hiatus; subjects continue to be able to make accurate judgments in the absence of verbalizable knowledge. What is known here is still abstract in nature; no advantage accrues to old items and it remains an accurate reflection of the underlying structure of the grammar. While some blurring of structure knowledge comes with time, and subjects report that immediate intuitive apprehension of grammaticality is somewhat harder to come by, knowledge gained in the implicit mode is persistent in both form and quality. A surprising result was the persistence of individuated memories in the PA-1st subjects. Although few could consciously recall learning set items, they continued to perform at a high rate on “old” letter strings. While these subjects perform well, their reliance on concrete memory and analogy carries some disadvantages. First, holes in memory are not going to be patched in the course of time. Consequently, there is a high likelihood that items initially rejected on the grounds that they do not resemble anything in concrete memory will be repeatedly rejected. Second, the individuated memory space seems to be established at the expense of structural knowledge, resulting in subjects emerging from the PA training session with little aside from concrete memories of letter strings and fragments thereof. From the functionalist point of view which we favor, the level of processing necessary for very long term memory can be attained by either implicit processing or memorization of exemplars. 
That is, both abstract structural knowledge and concrete individuated memories are processed deeply enough to result in knowledge that is resistant to the passage of time. Yet, pragmatic distinctions can be drawn between these two modes. The abstraction strategy encouraged by the OBS procedure confers an advantage in identification of underlying grammaticality, in the recognition of that which is structurally regular. The memorize-and-analogize strategy optimized by the PA procedure yields an advantage in identification of specific stimuli, in the recognition of that which has been confronted previously. Our original data suggested that strategies of acquisition are tailored to immediate task demands, task expectations, and stimulus parameters. These learning strategies resulted in distinctive types of memorial representations which are still detectable two years after learning. Very long term memory appears not to be uniform in nature. That is, knowledge can be represented
in either abstract or concrete form, and it seems to maintain its form from the time of initial entry or formation. While the two memorial forms and their attendant decision processes are remarkably robust, they may find themselves adaptive to different kinds of ecological niches when it comes to their application and deployment.
References
Brooks, L. R. (1978) Nonanalytic concept formation and memory for instances. In E. Rosch and B. B. Lloyd (Eds.), Cognition and categorization. Hillsdale, N.J., Lawrence Erlbaum Associates.
Burtt, H. E. (1941) An experimental study of early childhood memory. J. genet. Psychol., 58, 435-439.
Craik, F. I. M. and Lockhart, R. S. (1972) Levels of processing: A framework for memory research. J. verb. Learn. verb. Beh., 11, 671-684.
Craik, F. I. M. and Tulving, E. (1975) Depth of processing and the retention of words in episodic memory. J. exper. Psychol.: General, 104, 268-294.
Kolers, P. (1976) Pattern analyzing memory. Science, 191, 1280-1281.
Posner, M. I. (1973) Cognition: An introduction. Glenview, Ill., Scott, Foresman and Co., Chap. 7.
Reber, A. S. and Allen, R. (1978) Analogic and abstraction strategies in synthetic grammar learning: A functionalist interpretation. Cog., 6, 189-221.
Reber, A. S. and Glick, J. A. Implicit learning and stage theory. Int. J. Beh. Devel., in press.
Wickelgren, W. A. (1972) Trace resistance and the decay of long term memory. J. math. Psychol., 9, 418-455.
This study concerns long-term memory for abstract material. Two years earlier, the subjects had taken part in an experiment on synthetic grammar learning. That study (Reber and Allen, 1978) identified several modes of cognitive acquisition, the memory representations they induced, and the decision processes associated with them. Two years later, with no opportunity for rehearsal or relearning, the subjects remembered these grammars remarkably well. Although some nuances had faded with time, the form and structure of the knowledge and the ways it was used remained very comparable to the originals. The variations observed in mode of acquisition during initial training could still be seen. As in the first study, these results are discussed in the general context of a functionalist approach to complex cognitive processes.
Cognition, 8 (1980) 187-207. ©Elsevier Sequoia S.A., Lausanne. Printed in the Netherlands.
The acquisition of homonymy*
ANN M. PETERS
University of Hawaii
ERAN ZAIDEL
University of California, Los Angeles
and California Institute of Technology
Abstract
The growth in children's ability to perform the task of separating the sounds of words from their meanings was investigated by asking children between 3;3 and 6;3 to select homonyms from pictures. The results show a growth in ability with age, with a jump at 4;4. An investigation of the developmental changes in the strategies employed shows that the task is cognitively complex. In the younger children, performance is hampered less by an inability to handle the linguistic aspects of the task than by a resource-limited inability to cope with many cognitive factors at once. These cognitive factors include access to vocabulary, rehearsal of intermediate results, and implementation of a search strategy.
*We thank Deborah Burke for advice on test design, Leslie L. Wolcott for drawing the test materials, Susan Fischer and Danny Steinberg for help with statistics, and the All Saints Day Care Center for making subjects and facilities available. Thanks are also due to H. and V. Wayland for supporting the first author during this study. This work was also supported by NIMH grant MH-03372 and NSF grant BNS76-01629 to Prof. R. W. Sperry, by USPH awards MH-00179 and RR07003, and by NSF grant BNS78-24729 to E. Zaidel.

Introduction
In English, with its phonologically-based writing system (as opposed, for example, to the Chinese ideographically-based system), reading readiness must depend in part on an ability to separate the sounds of words from their meanings. At what point in their linguistic development are English-speaking children able to effect this separation? Is there a clearly marked point at which such an ability appears? To separate the sound of a word from its meaning, a child has to be able to operate on language metalinguistically. That is, both cognitive and linguistic development must have progressed to the point where the child can manipulate the pieces of language as if they were objects unrelated to any immediate need to communicate. Even very young children can, to some extent, separate the sounds of words from their meanings, but the circumstances under which this can happen seem to be very limited. Thus, children as young as 2;4 years have been observed to play with the sounds of their language in noncommunicative contexts (e.g., Chao, 1951; Keenan, ms.), and Iwamura (1977) has observed 3-year-olds discussing the pronunciation of words in the course of an ongoing conversation. If, however, children are to have enough control of their phonological systems to make use of them (for instance, in learning to read), they must be able to do such metalinguistic tasks whenever necessary and not just when the circumstances are optimal. This ability seems to be still developing past age 3. Thus, it has been shown that although 4-year-olds seem able to use phonemic information to recall labels of pictures (they recall more rhyming labels than non-rhyming ones) (Locke, 1971), for 3-year-olds the semantic aspects of labels are still more important, since they recall significantly more words in semantically similar ensembles than in phonemically similar control lists (Locke and Locke, 1971). The Lockes observe that their "young Ss were respectful of the symbolic value of language. They treated words as words, as units of reference and meaning, rather than nonrepresentational phonemic strings" (ibid., p. 189). In our culture, phonological awareness seems first to be introduced to children through the vehicle of rhyme, especially through nursery rhymes and children's jingles.
That is, words which share a partial phonological similarity (the same sounds at the ends) are juxtaposed, more or less consciously. Reading readiness exercises go on to introduce another kind of partial phonological similarity: the idea of words which begin with the same sound (alliteration). Total similarity of sound between two words (homonymy), however, is rarely brought explicitly to children's attention. It is unclear whether this is because homonymy is rarer in the language than rhyme and alliteration, because it is considered too confusing, or because homonymy does not give access to single phonological units in the way that partial similarity does. Yet it seems as if it should be simpler to determine whether two words are totally similar phonologically (homonymy) than whether they are only partially similar (rhyme or alliteration). In this study, therefore, we ask whether there is an age before which children cannot (in general) separate the sound of a word from its meaning,
as measured by their ability to find pictures representing two homonymous words from a given set of four pictures. We further ask how the linguistic strategies which children bring to bear in solving such a problem develop with age. In particular, we will look at the errors they make to see whether younger children tend to make more semantically-based errors while older children make more phonologically-based mistakes.
Method
Subjects
Thirty middle class children of normal intelligence attending a private day care center in Pasadena, California, participated in the study: there were 5 boys and 5 girls from each of three age groups. The mean ages for these groups were 3;10 years (range 3;3 to 4;5, s.d. 4 mo.), 4;9 years (range 4;3 to 5;1, s.d. 3 mo.), and 5;8 years (range 5;1 to 6;3, s.d. 5 mo.).
Materials
Twelve sets of picturable homonym pairs were chosen with a fairly wide range of vocabulary difficulty.¹ Three sets of homonyms were reserved for training; the other 9 pairs were used for testing. For each of the test sets, four picturable distractor items were found: two semantic associates (one for each member of the pair), a rhyme, and an alliteration. This gave sets of six items. See Table 1 for the entire list of words depicted. For each such set of 6, eight line drawings were made, similar to those in the Peabody Picture Vocabulary Test: one for each of the four distractor items and two different pictures for each of the homonyms. The pictures were arranged in sets of four, each containing a target word, its homonym, its semantic associate, and either its rhyme or its alliteration (chosen randomly). An attempt was made to pick the easier meaning of a homonym pair as the first target word, and these first sets were presented on the first pass (underlined items in Table 1). The second set, with semantic focus on the other word of the pair and using the other phonological associate, was presented on a second pass. The pictures were arranged in rectangular formation. The positions of the four elements were varied so that the homonym pair fell equally often in each of the 6 possible positions, and the positions of the other elements were also randomized. Either all 4 pictures of a set were colored or none was. For example:

first pass: flying-bat, baseball-bat, mitt, hat
repeat pass: baseball-bat, back, flying-bat, spider

Figure 1 illustrates these two test sets.

¹Vocabulary difficulty was not easy to estimate ahead of time, for two reasons. Most of the homonym pairs are also homographs, and the relative frequencies of the two meanings are not separated out in, e.g., Thorndike-Lorge. In addition, Thorndike-Lorge and similar sources are based on written materials and seem inappropriate for the spoken language of preschool children anyway. There was indeed a clear range of vocabulary difficulty, which can be inferred from the children's performance on the tests. However, it does not correspond to the Thorndike-Lorge vocabulary ratings, nor does it uniquely predict ability to recognize homonymy.

Table 1. Homonym sets(a)

          Semantic 1   Homonym 1           Homonym 2          Semantic 2   Rhyme   Allit         Repeat order
Training
          necklace     ring (jewel)        ring (bell)                     swing
          cups         glasses (drink)     glasses (specs)                         glad/girl
          hammer       nail (metal)        nail (finger)
Test
  1.      mitt         bat (baseball)      bat (mammal)       spider       hat     back          2.
  2.      gun          bow (arrow)         bow (ribbon)       knot         hoe     bone          6.
  3.      drum         horn (instrument)   horn (animal)      tusk         corn    horse         5.
  4.      hippo        trunk (elephant)    trunk (chest)      suitcases    skunk   train         7.
  5.      jacket       tie (cravat)        tie (package)      sew          cry     tire          1.
  6.      lion         bear                bare               clothes      pear    barrel        4.
  7.      day          night               knight             queen        kite    knife         3.
  8.      foot         palm (hand)         palm (tree)        bush         bomb    [illegible]   9.
  9.      screw        spring (metal)      spring (season)    fall         ring    [illegible]   8.

(a)The order indicated on the left is the original order of presentation. The order indicated on the right is the order for the repetitions. The underlined items appeared in the first presentations, the other items (plus the homonyms) in the repeats.
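The positional constraint described under Materials is easy to get wrong by hand, so a short sketch may help. It makes one assumption that the text implies but does not state: balance is taken over all 18 test presentations (9 first-pass sets plus 9 repeats). Under that assumption, each of the 6 ways of placing the homonym pair in a 2 × 2 array occurs exactly 3 times, and the two distractors fill the remaining slots at random. The function `make_layouts` and the slot labels are hypothetical. The authors do not describe their actual assignment procedure.

```python
import itertools
import random

SLOTS = 4  # four pictures in a 2 x 2 rectangle, slots 0-3
# Every way of choosing 2 of the 4 slots for the homonym pair: C(4, 2) = 6.
PAIR_POSITIONS = list(itertools.combinations(range(SLOTS), 2))


def make_layouts(n_presentations=18, seed=None):
    """Return one picture layout per presentation (hypothetical helper).

    Each layout is a list of 4 labels indexed by slot. The homonym pair
    occupies each of the 6 possible slot pairs equally often; the two
    distractors fill the remaining slots in random order.
    """
    reps, remainder = divmod(n_presentations, len(PAIR_POSITIONS))
    if remainder:
        raise ValueError("exact balance needs a multiple of 6 presentations")
    rng = random.Random(seed)
    pair_slots = PAIR_POSITIONS * reps  # new list: each slot pair `reps` times
    rng.shuffle(pair_slots)             # randomize order across presentations
    layouts = []
    for pair in pair_slots:
        homonyms = ["homonym_a", "homonym_b"]
        rng.shuffle(homonyms)           # which meaning goes in which slot
        distractors = ["semantic", "phonological"]
        rng.shuffle(distractors)        # randomize the other two positions
        others = [s for s in range(SLOTS) if s not in pair]
        layout = [None] * SLOTS
        for slot, label in zip(pair, homonyms):
            layout[slot] = label
        for slot, label in zip(others, distractors):
            layout[slot] = label
        layouts.append(layout)
    return layouts
```

For example, `make_layouts(18, seed=1)` returns 18 layouts. Any seed yields a balanced set; fixing the seed only makes a run reproducible.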
Testing Procedures
The children were tested one at a time in a small room or office at the school (whichever happened to be free at the time). Sessions took varying lengths of time depending on the ages of the children: some of the younger children took 45 minutes while some of the older children finished in 15. All sessions were tape recorded.
Figure 1. Sample items from the homonym test: "Find two pictures that sound the same but mean different kinds of things." Left: first pass; right: repeat pass (see text).
1. Prenaming
Many of the pictures could be labelled in a number of different ways (e.g., horn/trumpet, palm/tree), so an attempt was made to associate the desired label with each picture by means of a preliminary "prenaming" task. A second set of pictures was prepared for this task. It was identical to the set made for the homonym test but excluded the repeat pictures of the homonym words, and it was again arranged in groups of 4, this time taking care that pictures from any given homonym set did not appear in the same group. The child was asked Can you point to X? for each of the 4 labels depicted in each set of pictures. On some of the less obvious items (e.g., bare, finger-nail), the child was warned This one is tricky. If the child couldn't find a particular picture, the investigators pointed it out and made sure the child could recognize it, also giving a verbal association such as bare like bare feet or nail like on your finger. Any vocabulary difficulties, including hesitations, were noted. This task, then, helped to associate the desired label with each picture and also gave an estimate of each child's passive (receptive) vocabulary.
2. Homonyms
a. Training. Three sets of homonyms were reserved for pretraining (see Table 1). The child was shown the first set of 4 pictures and told, This is a game about words that have the same sound but mean different kinds of things. I want you to show me two pictures that sound just the same but mean different kinds of things. Like this: ring, ring. A ring that you put on your finger and ring the bell. They sound exactly the same: ring, ring. But they mean different kinds of things. The child was then given two practice sets to do before testing began.

b. Testing: first pass. The first 9 homonym sets were then presented one at a time. All pointing responses were recorded on a preprinted score sheet, along with response times measured by a stopwatch. The child was first asked (Task 1), Find two pictures that sound just the same but mean different kinds of things. For each pair that the child pointed to, s/he was asked What's the word? if the word was not spontaneously given. If the wrong pair was pointed to, the child was encouraged to continue searching. If a rhyme or alliteration was chosen, the child was asked, Do those sound exactly the same? When a semantically associated pair was chosen, the investigator said, Yes, but that's the same kind of thing. I want two pictures that sound the same but mean different kinds of things. If the child gave up, or the right pair was not found after several responses or about 30 seconds on Task 1, the investigator pointed to one of the homonym pictures and asked (Task 2), Can you find another picture that sounds exactly the same as this one? The child moved on to Task 3 if s/he still could not find the homonym pair, or seemed to have found it on Task 1 or Task 2 but refused to say the word. In Task 3 the child was asked, Can you point to X? And can you point to another kind of X? If the child did this correctly after having silently pointed to the right pair in Task 1 or Task 2, s/he was given credit for knowing the homonym passively.

c. Repeat set. After the first 9 homonym sets, the child was shown the prenaming pictures for the distractor items that would appear in the repeat set of homonyms (Now we'll play the first pointing game a little more.). As mentioned above, the homonym pictures were also changed on the repeat sets, but the new pictures were not shown in either prenaming. The repeat set was administered to see whether the children transferred learning from the original task when new pictures depicting the same concepts were shown, or whether their performance on the repeats was indistinguishable from that on the original presentations. The repeat sets were administered in a different order (see Table 1), and the placement of the target pairs was changed for each homonym. Otherwise, the administration was the same as on the original pass.
Scoring
As soon as possible after testing, the tapes from each session were reviewed. Any verbal comments made by the child were transcribed onto a new score sheet, together with a copy of the pointing responses and the timing information noted at the time of testing. Each child's responses were scored according to the following rules:
1. Correct responses.
   a) overt: the correct pair was indicated and the child could say the word.
   b) passive: the correct pair was pointed to and, although the child would not say the word, s/he did Task 3 correctly.
2. Errors.
   a) semantic (S).
   b) phonological (P), including (1) rhyme (R) and (2) alliteration (A).
   c) random association (X): the child pointed to a pair that was neither correct nor S nor P (i.e., an association between the phonological and semantic distractor items, or between the non-target homonym and the semantic distractor).
   d) no response (-): the child refused to point to a pair and either said nothing or said I can't or I don't know.
   e) phonological "inventions" (I): the child either tried to invent rhymes or alliterations that were not words used in the prenaming, or tried to force homonymy through neutralization or by "brute force relabelling" (see "Discussion" under "Development of strategies for finding homonyms").
3. Errors were scored as originally designed unless a verbalization indicated that some other strategy was being used. For example, knight (with sword), knife was scored as A (alliteration), but if the child said knife, sword, it was scored as S (semantic).
4. No response (-) was counted as an error only on the first request for each task. A refusal to make another try after a child had made at least one response was not counted as a further error, on the assumption that after one overt attempt, no further response simply indicated that the child had no better guess to offer.
5. Any given pair was counted only once, even if it was pointed to more than once.
6. If a child indicated a pair but then rejected this choice himself, it was not counted.
There are several possible ways of assessing each child's basic "homonym ability," due to the facts that (1) each child was encouraged to keep
searching for each homonym pair until s/he either found it or gave up, (2) when a child failed at Task 1, the problem was made easier by shifting to Task 2, and (3) "passive" answers were noted. The measures used in this study are:
H0 = number of overt homonyms found on Task 1, first tries only.
H1 = number of overt homonyms found on Task 1, all tries.
H2 = number of overt homonyms found on Tasks 1 and 2, all tries.
Hp = number of overt and passive homonyms found on Tasks 1 and 2, all tries.
H0 thus gives a very conservative estimate of homonym-finding ability, since it is restricted to overt first tries. H1 shows how well a child did on Task 1, while H2 reflects performance on both homonym-finding tasks. Hp is the most generous estimate of homonym ability, since it also includes passively correct answers.
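Because the four measures nest inside one another, one per-set record is enough to compute all of them. The sketch below assumes a hypothetical record layout: the `SetRecord` fields and the response codes are illustrative and do not come from the authors' score sheets. A child's score on each measure is the number of the 18 presentations (9 sets × 2 passes) that satisfy it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SetRecord:
    """Hypothetical record of one child's responses to one homonym set.

    task1:   codes for the pairs pointed to on Task 1, in order
             ("C" = overt correct; "S", "R", "A", "X", "-" = errors).
    task2:   "C" if an overt correct answer was reached on Task 2, else None.
    passive: True if the correct pair was pointed to silently and the
             child then succeeded on Task 3.
    """
    task1: List[str] = field(default_factory=list)
    task2: Optional[str] = None
    passive: bool = False


def homonym_scores(records):
    """Compute H0, H1, H2 and Hp for one child (maximum 18 each)."""
    h0 = sum(1 for r in records if r.task1[:1] == ["C"])             # first try only
    h1 = sum(1 for r in records if "C" in r.task1)                    # any Task 1 try
    h2 = sum(1 for r in records if "C" in r.task1 or r.task2 == "C")  # Tasks 1 and 2
    hp = sum(1 for r in records
             if "C" in r.task1 or r.task2 == "C" or r.passive)       # plus passives
    return {"H0": h0, "H1": h1, "H2": h2, "Hp": hp}
```

Each condition includes the one before it, so H0 ≤ H1 ≤ H2 ≤ Hp holds by construction. This matches the ordering from most conservative to most generous.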
Results and Discussion
Homonym Performance and Age
1. Results
The group means for the four homonym scorings are given in Table 2. Separate 2 × 3 (sex × age) analyses of variance were run for each scoring, and each showed a significant main effect of age. Post hoc correlated-sample t-tests compared scores on first tries only (H0) with scores on all tries (H1) on Task 1. This difference reached the greatest level of significance for the middle group (p < 0.001) and a lesser level for the oldest group (p < 0.01), and it was not significant at all for the youngest group. Increases in scores from Task 1 (H1) to Task 2 (H2) were significant for all three groups, whereas adding in passive scores made significant differences only for the two younger groups (p < 0.01) (see Table 2).

Even though most of the youngest children could find at least one or two homonyms, there was a clear jump in ability at the boundary between the youngest and middle age groups (4;4 years). To locate this jump, the children were ranked by age, and one-way analyses of variance were performed on the homonym scores of "older" versus "younger" groups as the boundary between the two groups was systematically raised. The value of F reached its maximum [F = 42.6, df = (1, 28)] with the boundary at 4;4, signalling the point of greatest certainty in a score difference. A second maximum [F = 19.2, df = (1, 28)] occurred with the boundary set at 5;2 years, setting off the 8 oldest children as the most able group.

A one-way analysis of variance on the differences in scores between Pass 1 and Pass 2 (originals versus repeats) was not significant [F = 1.43, df = (1, 58)], so these scores have been combined into a total score for each child (maximum = 18). The lack of change from originals to repeats shows that the children evoked the names of the concepts rather than having learned to associate them by rote with specific pictures. The children were not told that the second set of homonyms involved the same words as the first set.

Table 2. Age group means for the 4 homonym scorings (maximum for each = 18). Significant differences between adjacent scorings were computed from correlated-sample t-tests.

              H0           H1           H2           Hp
Oldest       10.7    *    13.7    *    15.6         16.5
Middle        9.0   **    10.5   **    13.4    *    15.1
Youngest      2.5          3.5   **     6.6    *     8.4
All           7.4   **     9.2   **    11.9   **    13.3

* p < 0.01. ** p < 0.001. Asterisks between two columns mark the significance of the difference between those two scorings.
H0 = overt homonyms found on Task 1, first tries.
H1 = overt homonyms found on Task 1, all tries.
H2 = overt homonyms found on Tasks 1 and 2, all tries.
Hp = overt and passive homonyms found on Tasks 1 and 2, all tries.

2. Discussion
Since the children were always asked to verbalize the homonyms for each pair they chose, it was very clear whether or not they had really found a pair. Their scores were therefore not compared to chance.
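The boundary-sweep analysis reported in the Results can be sketched as follows. The children are ranked by age and split into "younger" and "older" groups at every possible boundary, and a one-way ANOVA F is computed for each split. With two groups, F has df = (1, n - 2), which for 30 children gives the df = (1, 28) reported above. The helper names, the age-in-months encoding and the minimum group size of 2 are assumptions. The data in the accompanying example are invented for illustration and are not the study's scores.

```python
from statistics import mean


def age_in_months(age):
    """Convert a 'years;months' age such as '4;4' into months (here 52)."""
    years, months = age.split(";")
    return int(years) * 12 + int(months)


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of scores."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    df_between, df_within = len(groups) - 1, len(scores) - len(groups)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / df_between) / (ss_within / df_within)


def boundary_sweep(children, min_size=2):
    """children: list of (age_in_months, score) pairs.

    Returns (boundary_age, F) for every split, where boundary_age is the
    age of the youngest child placed in the 'older' group.
    """
    ranked = sorted(children, key=lambda child: child[0])
    results = []
    for cut in range(min_size, len(ranked) - min_size + 1):
        younger = [score for _, score in ranked[:cut]]
        older = [score for _, score in ranked[cut:]]
        results.append((ranked[cut][0], one_way_f([younger, older])))
    return results


def best_boundary(children):
    """The split with the largest F, i.e. the most certain score difference."""
    return max(boundary_sweep(children), key=lambda result: result[1])
```

On invented data where children younger than 4;4 score about 2 and older children about 12, `best_boundary` picks 52 months (4;4). A local maximum elsewhere in the list returned by `boundary_sweep` corresponds to a secondary boundary like the 5;2 split. Note that F is undefined, and the code raises `ZeroDivisionError`, if every score within both groups is identical.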
Even among the children who had the hardest time, most were able to find at least one or two homonym pairs. In each case it was very clear that they understood the goal and were aware that they had solved that particular problem. Every statistical test we made showed the expected increase with age in ability to find homonyms. More interesting is the relationship between age and scoring seen in Table 2 and supported by the t-tests. The older the children, the better they did on their first try (H0), while the younger the children, the more they benefited from extra tries and passive scoring. In particular, the oldest children did best within Task 1, with their biggest increase in scores from H0 to H1, while the younger two groups profited most by moving to Task 2, with their biggest increase from H1 to H2. No significant interaction between homonym ability and sex was found for any score taken separately. Still, it is worth noting three differences: the girls did better than the boys on Task 1, particularly on first tries (H0); on Task 2 (H2) the two sexes performed almost identically; and when passives were counted, the boys did better than the girls.

Development of Strategies for Finding Homonyms
1. Results: By Age Group
Three 3-way analyses of variance were performed to look at the effects of age and sex on the strategies used by the children, as reflected by the types of errors they made. The first analysis was limited to the most common error types, phonological and semantic (P and S). The second investigated the phonological errors further by separating them into rhymes and alliterations (R, A, S). The third added random choices and refusals to answer (P, S, X, -). All three analyses showed significant main effects for age (p < 0.01) and for errors (p < 0.05 for the first two analyses and p < 0.01 for the third). All three also showed a significant interaction between age and errors (p < 0.05 for the first and third analyses and p < 0.01 for the second). The interaction effects arise because each group of children had clear strategy preferences: the youngest children used S and X more than the older two groups, the middle group used P the most, and the oldest children used no response (-) the least. When P was broken down into R and A, all three groups used rhymes about equally, but the oldest children used alliterations only about half as much as either of the younger two groups. Since sex had no effect on strategies, the boys and girls were combined in the subsequent analyses of strategies. For each age group, Table 3 summarizes the means for each possible response on first tries for Task 1. Since only first tries are tabulated, each row sums to 180 (10 children × 18 presentations). (Cp = number passively correct.)
Post hoc t-tests on the group mean scores show that the oldest and middle groups never differed significantly in their use of any individual strategy. The youngest group, however, differs significantly from both the middle and oldest groups in number correct (p < 0.001), number of passive homonyms (p < 0.05), number of semantically related choices (p < 0.01), and number of unrelated pairs (p < 0.05 for middle versus youngest, p < 0.01 for oldest versus youngest) (see Table 3). The percentages of each strategy are graphed in the top of Figure 2, giving a strategy profile on first tries for each age group. The differences between strategies by age group change very little from first tries to all tries. At first glance, it seems reasonable that, although the youngest children make many more S and X tries, there is no significant difference between age groups in phonologically-based guesses (P), either on first tries alone or on all tries. This, however, may be an artifact of ceiling effects in the oldest group.

Table 3. Age group means for strategies on Task 1, first tries. Significant differences in each strategy are shown between youngest and middle groups and between youngest and oldest groups (bottom row), as computed by t-tests.

                         H0            Cp    P     S     X     -
Oldest   (5;1-6;3)     107 (59%)
Middle   (4;3-5;1)      90 (50%)
Youngest (3;3-4;5)      25 (14%)**
[Remaining cells illegible in the source; other legible values: 39 (22%), 42 (23%), 43 (24%), 12 (7%), 22 (12%).]

* p < 0.01. ** p < 0.001. (*) p < 0.05. H0 = number of overt homonyms found. Cp = number found passively. P = phonological errors. S = semantic errors. X = random associations. - = no response.
2. Results: By Ability Groups
During testing, it became evident that homonym-finding ability did not vary strictly with age: some of the children in the middle group were clearly much better at the task than some of the older children. These children found the homonyms quickly and efficiently, using few tries. In addition, the few errors they did make seemed qualitatively different from those of the older children who had more difficulty. To investigate this observation, the children were divided into three "ability groups" based on their H0 scores. The most able group (A) comprised 7 children from the oldest group and 3 from the middle group, including one of the youngest from that group, aged 4;5. The second group (B) contained 7 children from the middle age group, 2 from the oldest, and 1 from the youngest. The least able group (C) contained the remaining 9 of the youngest children and one child from the oldest group. Table 4 summarizes the ability group means for each strategy on Task 1, first tries, and the bottom of Figure 2 gives a strategy profile for each ability group. A comparison of the top and bottom of Figure 2 shows that the strategy differences between ability groups are more marked than those between age groups.

Figure 2. Percent responses for each strategy used on Task 1, first tries. Top: by age group; bottom: by ability group. Symbols as in Table 3.
[Graphs not reproduced. Top panels: Oldest, Middle, Youngest; bottom panels: Best (A), Middle (B), Worst (C); categories in each panel: H0, Cp, P, S, X, -.]

The original observation that ability did not vary strictly with age was verified: the middle group (B) used 2.4 times as many extra guesses after the first try as did the best group (A). In addition, with this grouping there is a significant difference in phonologically-based choices between A and B; group B made 3 to 4 times as many such errors as group A. The bottom group still made significantly more semantically-based guesses than the other groups, and also gave significantly more passively correct responses. Refusal to respond (-) is fairly evenly distributed across all groups, whereas random associations (X) tend to be concentrated in the bottom groups.

Table 4. Ability group means for strategies on Task 1, first tries. Significant differences in each strategy are shown between best and middle groups, between middle and worst groups, and between best and worst groups (bottom row), as computed by t-tests. Symbols as in Table 3.

                          H0            Cp           P             S             X     -
A: best   (4;5-6;3)     129 (72%)**
B: middle (4;3-5;6)      69 (38%)**
C: worst  (3;3-5;2)      24 (13%)**    12 (7%)(*)   49 (27%)(*)   46 (26%)*
[Remaining cells illegible in the source; other legible values: 58 (32%), 19 (11%), 7 (4%), (24%).]
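The regrouping by ability and the strategy profiles plotted in Figure 2 can be sketched in a few lines. The procedure is: rank children by H0, cut the ranking into three equal groups, and pool each group's first-try responses on Task 1. Each response code is then expressed as a percentage of the group's total, which is 10 children × 18 presentations = 180 first tries in Tables 3 and 4. The tie-breaking rule and all names are assumptions. The paper does not say how ties in H0 were resolved, and this sketch simply keeps input order (Python's `sorted` is stable). The example data are invented.

```python
from collections import Counter

CODES = ("H0", "Cp", "P", "S", "X", "-")  # response codes as in Table 3


def ability_groups(h0_scores, n_groups=3):
    """Split children into equal-sized groups by H0, best group first.

    h0_scores: dict mapping child id -> H0 score. Ties keep dict order
    because sorted() is stable; this is an assumption, not the authors' rule.
    """
    ranked = sorted(h0_scores, key=h0_scores.get, reverse=True)
    size, remainder = divmod(len(ranked), n_groups)
    if remainder:
        raise ValueError("children cannot be split into equal groups")
    return [ranked[i * size:(i + 1) * size] for i in range(n_groups)]


def strategy_profile(first_tries, group):
    """Percentage of each response code among a group's Task 1 first tries.

    first_tries: dict mapping child id -> list of codes, one per presentation.
    """
    counts = Counter(code for child in group for code in first_tries[child])
    total = sum(len(first_tries[child]) for child in group)
    return {code: 100 * counts[code] / total for code in CODES}
```

Percentages are returned unrounded. Rounding for display, as in the tables, is left to the caller.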
3. Discussion
There were several other readily observable strategy differences between the groups. In both the youngest (Fig. 2, top) and least able (Fig. 2, bottom) groups, "passive" responses were extremely common. When asked What's the word?, these children tended not to want to say any words aloud, whether they had picked a correct pair or a phonologically or semantically associated one. In fact, in Task 1, 81% of the passive responses were made by children in the youngest age group. (For further discussion of passive vocabulary, see below.) Another clear developmental difference was a shift in the type of semantic responses given. The youngest children not only indicated many semantically associated pairs for which they refused to verbalize; when they did say the words, they also tended to label the individual members of a class rather than give a single superordinate class label. For instance, they would point to lion and bear and say lion, bear rather than animal, which would at least have used the same word for both pictures. The youngest group made 77% of such class-membership responses, whereas superordinate class responses were quite evenly distributed across the age groups (youngest 31%, middle 35%, oldest 35%). A cognitively-based difference that separated one group from another was apparent in the type of search strategy employed. Thus, while the most able children tended to scan each array of 4 pictures silently (though often subvocalizing, as evidenced by lip movement), smile, point to the right
pair and say the words, the youngest seemed to just pick two pictures. If these were wrong, they often picked the other two pictures and then gave up. The intermediate children, however, seemed to be on the way to developing a systematic search strategy without having quite gotten there. First, they tended to want to name all 4 pictures aloud without making any choices. Then they often seemed to pick out one picture which served as a focus for their comparisons and would systematically pair it with each of the other 3, indicating that each such pairing was a guess at the right answer. If they happened to choose one of the homonym pictures as their focus, this strategy was often successful. If, however, they picked a non-homonym as focus, they often could not find the homonym even though they applied the correct labels to the pictures: although they said the correct words aloud, they seemed not to be able to carry the sounds over from one comparison to the next. A shift to Task 2, however, in which one of the homonyms was indicated by the investigator, seemed to help these children get unstuck from that first choice of focus. This phenomenon was much more common among the older two groups of children, occurring only rarely among the youngest. It may reflect local rigidity associated with flexibility in another cognitive locus. It is as if the child has a limited resource for flexible open-ended search which s/he can apply to the search for focus or to the search for identical labels but not simultaneously to both. (See Norman and Bobrow, 1975, for a discussion of resource limitations.) An increase in ability to deal with the phonological nature of the problem was also evident, being more pronounced in the ability grouping than in the age groups. The children acted very much as if there was a hierarchy of strategies at their disposal, and if a higher strategy didn’t work they would fall back on a lower one. 
The apparent sequence was: get it right, make a phonological choice, make a semantic choice of the inclusive kind, make a semantic choice of the associative kind, guess randomly. (Giving up could occur at any point - how soon a child refused to try any more seemed to depend on the individual’s personality.) The older children had more control over the higher end of the sequence - the most able children almost always found the right answers and when they had trouble they would fall back on P or S almost equally often (see Fig. 2, bottom). The least able children, who had great difficulty, seemed also to use P and S about equally often, but the middle group used P much more often than S (again, see Fig. 2, bottom). This is because, aware of the phonological nature of the problem, some of them used every trick they could muster to find two words that sounded alike, including hunting for rhymes (by means of both real words and invented nonsense words) and “forcing” identity of sound between two words (“invention” (I) errors). For example, K.B. (5;1) was a prolific rhymer, producing drum/turn, arrow/barrow, horse/morse and suggesting mitten/kitten, among others. There were two different ways in which identity of sound was “forced”: through “brute force relabelling” and through phonological neutralization. “Brute force relabelling” occurred when a child pointed to two non-homonymous pictures and applied the word for one of them to both. E.N. (4;3) did this some 10 times, e.g., pointing to hoe and bow-and-arrow but saying arrow, arrow.

The acquisition of homonymy
A somewhat more subtle strategy involved taking advantage of the near phonological identity of some of the phonological associates and pronouncing such pairs “halfway between” so that the phonological contrast was neutralized. Thus A.J. (5;1) asserted that pear and bear sounded exactly the same by devoicing the /b/ in bear, producing [pʰɛr], [pɛr]. She also pronounced palm and bomb identically. And A.D. (4;0) tried to pronounce bat and back the same, producing batk. A developing ability to manipulate the phonological aspects of words, divorced from their meanings, is thus apparent among the intermediate children. There is now substantial evidence that the left cerebral hemisphere is specialized for processing phonological information in speech (Zaidel, 1978). If it also controls the recognition of homonymy, we would have evidence for a rather early onset of cortical lateralization of language, at about 4;6. Consequently, we wanted a developmental estimate of the abilities of the adult right and left hemispheres to recognize homonymy. In a separate study (Zaidel and Peters, 1979), we administered an extended version of the homonym test separately to the right (RH) and then the left (LH) hemispheres of two patients who had undergone complete cerebral commissurotomy to alleviate intractable epilepsy (Bogen and Vogel, 1975). First tries of Task 1 are precisely comparable across these two studies. The LHs obtained perfect scores, far superior to the corresponding RH scores, which themselves fit quite well within the developmental progression found for the children. Thus, the RH of patient N.G. (a 45-year-old woman who had surgery at age 30 and first signs of epilepsy at age 17) had scores quite similar to those of the lowest ability children (Table 4), with 11% correct responses (13% for the children), and with the 36% phonological errors only slightly outnumbering the 34% semantic errors (27% and 26%, respectively, for the children). The RH of patient L.B.
(a 25-year-old man who had surgery at age 13 and first epileptic symptoms at age 3) scored similarly to the middle ability group: 59% correct first tries on Task 1 (38% for the children) and many more phonological than semantic errors (about a 4 to 1 ratio for L.B., 3 to 1 ratio for the children). Comparison of the adult with the child data thus suggests a rather early LH specialization for phonological
encoding and individual differences in RH processing ability. Furthermore, the data are consistent with the hypothesis of a developmental arrest for the RH in the acquisition of the skill. This is not a universal result - other tasks show slightly higher equivalent mental ages for RH competence (e.g., in receptive syntax) and divergent error patterns as well as performance styles between the RHs and children who had obtained the same total score on the test (Zaidel, 1978).

Vocabulary Difficulty and Homonym Performance
Since the vocabulary items involved in the various pairs were of varying degrees of difficulty, it seemed likely that some children were better at finding homonyms because they had a greater vocabulary proficiency. Therefore, two prenaming scores were calculated for each child based on the number of items for which difficulty was encountered in the prenaming task: (1) a total prenaming score, Pt, based on the 54 items used in the 9 test sets, and (2) a homonym prenaming score, Ph, based on the subset of 18 homonym words used in the test sets. When calculated for the whole group, correlations between Pt and each of the 4 homonym scorings were significant at the 0.001 level, as were correlations between Ph and each of the 4 homonym scorings (see Table 5). When, however, these correlations were calculated for the individual age groups, prenaming scores turned out to be the most highly correlated with homonym performance for the oldest group, significantly correlated only with Task 1 performance for

Table 5. Group correlations between prenaming scores and homonym scores.a
Whole Group Pt Ho
HI
-0.84
Oldest
PO.66
**
**
-0.85
-0.71
**
**
H2
PO.78
m-o.73
HP
PO.78 **
m-o.70 **
**
pt
Ph
**
i PO.80 I * 1 PO.82 ; I
, ,
Middle
Youngest Ph
ph
Pt
ph
pt
-0.67 (*)
-0.72 (*)
-0.71 (*)
-0.80 * _
*
-0.77 *
-0.76 (*) \ ,
-0.85 *
PO.67
-0.76
(*)
(*)
aSignificant differences: **p < 0.001; *p < 0.01. Pt = prenaming score on total set of pictures; Ph = prenaming score on set of homonym pictures. Other symbols as in Table 2.
r
-0.69 (*) -0.4;
-
-
0.76 -0.58 (*)
I I
, ,
~0.61
(*) p < 0.05.
-0.55
i
-0.62
i
-0.01
_! PO.13
1 0.47
PO.36
PO.29
0.30
the middle group, and not correlated at all for the youngest group except for Pt with H1 (Table 5). Thus, although prenaming proficiency has something to do with homonym-finding ability, it does not tell the whole story, especially for the youngest children. It is as if vocabulary proficiency releases resources for searching and matching. When all of the component prerequisites for the task (searching, matching, vocabulary proficiency) are mature enough, growth in ability in any one area releases cognitive resources to improve performance in the whole task. It also seemed likely that if a child did not know one or both of a given pair of homonym words at the prenaming stage, s/he would have difficulty finding that particular pair in the homonym test. Therefore, we looked at how well the three age groups did at finding homonyms contingent upon whether they did or did not have prenaming success with the homonym words. This analysis showed that the oldest children did quite well even when they had vocabulary difficulty (91% of the homonyms in this case). Both the oldest and middle groups did well when they had no vocabulary difficulty (95% and 96%, respectively, of these homonyms). The youngest children got only 58% of items where they had no vocabulary problems, and 42% when such problems existed. Again, when there was no vocabulary difficulty, the older two groups had few passives (1% and 4%) while the youngest had 19%. When, however, there had been vocabulary problems, the middle group went up to 17% passives and the youngest to 24%. Thus, the youngest children seem to be relatively unable to take advantage of exposure to difficult vocabulary items at the prenaming stage, as shown by their increase in homonym errors for just those words (23% to 34%). The middle children, on the other hand, could utilize at least some of the prenaming information, as evidenced by the increase in passive responses (4% to 17%).
And the oldest children seem to have taken such good advantage of their prenaming problems that their homonym performance dropped very little when they had vocabulary difficulty (96% to 91%). The fact that the youngest children did find 42% of these homonyms where prenaming difficulty occurred shows that mastery of vocabulary as measured by success in prenaming is by no means necessary for success in homonym finding. Although the prenaming scores do correlate fairly well with the homonym scores, the interaction between the two tasks seems much more complex. Indeed, the prenaming task was designed to associate particular labels with particular pictures in the minds of the children before they were confronted with the homonym sets, and judging from the children’s homonym scores on those items for which they had vocabulary difficulty, the prenaming task seems to have functioned much as it was intended to (although it did not work perfectly since the children did not always remember the desired
labels). In particular, any vocabulary items which a child knew to some extent but had temporarily forgotten were likely to be reinforced, often to the point where finding the homonym was a possibility, passively if not actively. In addition, the pressure to perform well probably further enhanced this reinforcing effect. An interesting phonological difficulty arose for some of the children when the alliteration happened to contain the whole target word phonologically as its first part. This happened with the words tie and tire, and bear/bare and barrel. Somehow these were much more confusing than minimal pairs such as bat and back, horn and horse, or night/knight and knife. A final question that needs to be discussed with respect to the effects of vocabulary on homonym performance is that of homonymy versus polysemy. That is, is there any evidence that any of these pairs of words were stored in the children’s lexicons as two sub-meanings of a single entry rather than as two separate entries which happened to sound alike? Of all the homonym pairs, only tie (a string) and necktie seemed to be at all polysemous. (One child, age 4;10, spontaneously remarked, You tie something around your neck an’ you tie on your shoe, too.) This did not, however, seem to be the case for all the children. The ability to find homonym pairs depends, then, not only on an understanding of the nature of the task involved, but also on having access to the phonological representations of the critical words in order to be able to compare them for identity. Active (productive) versus passive (only receptive) knowledge of words probably has its effect here - in the case of “passively correct” choices the children seemed to be able to hear enough of the relevant words in their heads to make their decisions but were not sure enough of the words to want to say them out loud.
The tendency of the middle children to want first to name all 4 pictures aloud before making any choices also seems to relate to the need to be able to hear the words in order to compare them. When a child’s control over pronunciation is not fully developed, it is unclear whether his difficulties with pronunciation will tend to carry over into his phonological comparisons or not. The child who had the least success in finding homonyms was a boy (4;0) whose phonological development was very slow. According to his teacher, this trait ran in his family and was always eventually outgrown. How much of his difficulty with homonyms was due to this developmental characteristic is unclear, but probably the effect was not negligible. The homonym test calls for the coordination of a number of cognitive prerequisites. These include the ability (1) to understand the task, i.e., what “sound the same” means, (2) to conduct an exhaustive search through the set of alternative pictures, (3) to access the phonological representations
of the critical words, (4) to rehearse a label while searching for others to match with it, (5) to match two labels phonologically once found, and (6) to cycle through alternative labels for a picture in cases of phonological mismatch.* Inefficient processing or immaturity in any of the component processes, or in the ability to coordinate them, could result in failure to perform the task. Maturation of some component processes can release resources for processing others. Thus, the younger children were particularly limited by mastery of vocabulary - a problem which hardly affected the older children. That improvement in ability to find homonyms is a function of maturation rather than learning is shown by the fact that exposure to one exemplar of a homonym (Pass 1) did not result in improved performance on exposure to a second exemplar of the same homonym pair (Pass 2), viz. the fact that the overall scores on the two passes were not significantly different. And yet there is a sharp improvement in recognizing homonyms at age 4;4 without any special training. Thus, the resource limitations affecting performance on this task would seem to be biologically determined rather than learning-dependent.
Summary

In our investigation of pre-school children’s ability to find homonyms, we have found not only that children over age 4;4 had considerably more success than their juniors, but also that success at solving this problem depended on a complex interaction of cognitive and linguistic development. Thus, even though children were able to deal with the linguistic aspects of the problem, the fact that they had not yet developed an efficient search strategy could, if they were unlucky in their choice of a focus for comparisons, cause insurmountable problems. And, on the other hand, even if a search strategy was well developed, linguistic problems could cause a particular pair to be missed. The youngest children had both cognitive and linguistic problems; the middle children were learning to deal with both: sometimes difficulties arose in one area, sometimes in the other. The most able children had their searching strategies well developed and only rarely had linguistic difficulties.

*As noted in “Homonym Performance and Age”, even the children who had the hardest time were able to find one or two homonym pairs, and in these cases it was clear that they knew they had solved the problem and found two words that sounded the same. Thus, when they had difficulties with the other pairs, it was not because of problems with component (1) alone, but rather mainly with the cognitive components of search (2) and rehearsal (4) and/or the linguistic components of access to and phonological representation of vocabulary (3), (5), and (6).
The linguistic abilities needed for finding these homonyms were of two kinds: lexical and phonological. If a child had no lexical access to a particular vocabulary item, s/he could not use it in the task. If such access was only passive (receptive), it might be sufficient to allow the child to find the homonym but insufficient for the child to want to risk producing the word. Such passive success was most common among the youngest children. The oldest children were the most lexically facile - if they happened to forget the particular label associated with a picture at prenaming, they were able to try out several names for each picture. Phonological ability here refers to the capability of separating sound from symbol and then manipulating that sound by comparing it with the sounds of other words. The youngest children showed relatively little evidence of having developed such abilities - they tended to fall back on semantic association as a criterion for similarity. The intermediate children, however, had developed a fair repertory of phonological manipulations they could perform. Since they were not as efficient as the most able group, they made numerous guesses, looking for rhymes and alliterations, inventing them if they had to, or trying in some way to force identity of sound. The ability to recognize phonological similarity would seem to be a necessary if not sufficient prerequisite for learning to read via phonological decoding. Indeed, the disconnected left hemisphere is proficient both in recognizing homonymy and in translating graphemes to phonemes, whereas the right hemisphere is proficient in neither. The improvement in ability to recognize homonyms between 4 and 6 years apparently reflects left hemisphere maturation (Zaidel and Peters, 1979) - if so, then age 5 seems a natural biological (rather than purely cultural) starting point for learning to read.
And yet the fact that the oldest group in our experiment did not precisely consist of the most able homonym finders should be kept in mind: some children simply “had their act together” (both cognitive and linguistic) at an earlier age than others.
References

Bogen, J. E. and Vogel, P. J. (1975) Neurologic status in the long term following complete cerebral commissurotomy. In F. Michel and B. Schott (Eds.), Les Syndromes de Disconnexion Calleuse chez l’Homme. Lyon, Hôpital Neurologique.
Chao, Y. R. (1951) The Cantian idiolect: an analysis of the Chinese spoken by a twenty-eight-months-old child. Reprinted in A. Bar-Adon and W. Leopold (Eds.), Child Language: A Book of Readings. Englewood Cliffs, New Jersey, Prentice-Hall.
Iwamura, S. J. (1977) Games and other Routines in the Conversation of Preschool Children. Unpublished Ph.D. dissertation, University of Hawaii.
Keenan, E. O. (n.d.) Evolving discourse - the next step. Ms.
Locke, J. L. (1971) Phonetic mediation in four-year-old children. Psychon. Sci. 24, 409.
Locke, J. L. and Locke, V. L. (1971) Recall of phonetically and semantically similar words by 3-year-old children. Psychon. Sci. 24, 189.
Norman, D. A. and Bobrow, D. G. (1975) On data-limited and resource-limited processes. Cog. Psychol. 7, 44-64.
Thorndike, E. L. and Lorge, I. (1944) The Teacher’s Word Book of 30,000 Words. New York, Teachers College Press.
Zaidel, E. (1978) Lexical organization in the right hemisphere. In P. Buser and A. Rougeul-Buser (Eds.), Cerebral Correlates of Conscious Experience. Amsterdam, Elsevier.
Zaidel, E. and Peters, A. M. (1979) Phonological encoding and ideographic reading by the disconnected right hemisphere: Two case studies. Submitted for publication.
The authors study the development of children’s ability to dissociate the sounds and the meanings of words. The task requires children aged 3;3 to 6;3 to pick out homonyms from drawings. The results show that the development of this ability undergoes a sudden acceleration at 4;4. The longitudinal study of the strategies used points to a cognitively complex task. The performance of young children is limited more by a fundamental inability to cope with several cognitive factors at once than by an inability to handle the linguistic aspects of the task. The cognitive factors include access to vocabulary, the enumeration of intermediate results, and the establishment of a search strategy.
Cognition, 8 (1980) 209-225
©Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands
Discussion
The ATN and the Sausage Machine: Which one is baloney?
ERIC WANNER* Sussex University
*Reprint requests should be sent to Eric Wanner, Harvard University Press, 79 Garden Street, Cambridge, Mass. 02138, U.S.A.

In a recent issue of Cognition, Lyn Frazier and Janet Dean Fodor proposed a new two-stage parsing model, dubbed the Sausage Machine (Frazier and Fodor, 1978). One of the major results which Frazier and Fodor bring forward in support of their proposal concerns a parsing strategy which, following Kimball (1973), they call Right Association. The centerpiece of their argument concerns an interaction between this parsing strategy and another one, which they call Minimal Attachment. Frazier and Fodor (henceforth FF) provide interesting evidence that the language user makes tacit use of both strategies to resolve temporary syntactic ambiguities that arise during parsing. FF then proceed to argue that the existence of these strategies, as well as the apparent interaction between them, can be fully explained if we assume that the language user’s parsing system is configured along the lines of the Sausage Machine. In FF’s view, the Augmented Transition Network (ATN) runs a very poor second to the Sausage Machine, for according to FF’s argument, it is impossible even to describe the two parsing strategies within the ATN framework. In effect, then, FF are claiming that the Sausage Machine achieves explanatory adequacy in this case while the ATN fails to reach the level of descriptive adequacy. These are strong and potentially important claims. If correct, they obviously provide grounds for pursuing parsing models built along the lines of the Sausage Machine rather than the ATN. However, when FF’s arguments are examined at close range, the comparison between parsing systems comes out rather differently than they claim. In particular, it appears that the Sausage Machine explanation of Right Association and its interaction with Minimal Attachment is empirically incorrect. The inadequacy of this explanation completely cancels the Sausage Machine’s ability to describe the interaction between strategies that FF have observed. This follows because
Eric Wanner
FF aspire to an explanation that renders independent description of the parsing strategies unnecessary. The Sausage Machine contains no apparatus for describing strategies. Hence, the failure to achieve explanatory adequacy automatically entails descriptive failure as well. In contrast, and in contradiction of FF’s negative claim, the ATN can provide a perfectly general description for each strategy in terms of scheduling principles that constrain the order in which arcs in an ATN grammar are attempted. Moreover, when these scheduling principles are coupled with an ATN version of the grammar FF tacitly employed to generate their pivotal cases, FF’s observations about the interactions between strategies are completely accounted for. Thus, although the ATN framework does not provide an explanation for either parsing strategy, it appears to achieve descriptive adequacy. Moreover, the descriptive framework of the ATN makes it possible to discern just what phenomena require explanation and to speculate in a reasonable way about the explanatory principles that underlie the parsing strategies FF have discovered.
The Sausage Machine

As advertised, the Sausage Machine has two very distinct stages. According to Frazier and Fodor’s proposal, “... the human sentence parsing device assigns phrase structure to word strings in two steps. The first stage parser (called the PPP) assigns lexical and phrasal nodes to substrings of roughly six words. The second stage parser (called the SSS) then adds higher nodes to link these phrasal packages together into a complete phrase marker” (p. 291). Although FF do not provide a detailed characterization of how the Sausage Machine works, they do supply the following sketch: The PPP has a “viewing window” which “shifts continuously through the sentence and accommodates perhaps half a dozen words” (p. 305). The PPP uses the rules of the grammar to assign each input string within the window “its lower lexical and phrasal nodes” (p. 296). It is important to understand that in making these structural assignments, the PPP can only take account of the six words within its current window plus any low level structure it may have already assigned to the words within the window. Given the severe “shortsightedness” of the PPP, the SSS “can survey the whole phrase marker for the sentence as it is computed, and it can keep track of dependencies between items that are widely separated in the sentence and of long term structural commitments which are acquired as the analysis proceeds” (p. 292). The SSS works only on the output of the PPP. The low level phrasal packages assembled by the PPP are deposited “in the path of the SSS which
The ATN and the Sausage Machine
is sweeping through the sentence behind it” (p. 306). As it sweeps along, the SSS also uses the grammar to assemble the phrases left to it by the PPP into a complete phrase marker for the input sentence. Although this description is somewhat vague, it is precise enough for FF’s purposes. According to their argument, there are only three features of the Sausage Machine which provide its explanatory power. These are also the features which most notably distinguish it from the ATN:
(A) The existence of two separate stages of parsing.
(B) The PPP’s limitation to a six-word viewing window.
(C) The SSS’s ability to appraise the whole phrase marker as it develops and therefore to make decisions contingent upon the geometry of the entire parse tree.
Can the Sausage Machine Cut the Mustard?

In FF’s terms, a parsing strategy is a rule that governs situations in which the grammar permits the parser to attach a constituent in more than one possible way to the developing parse tree. So, for example, both sentences (1) and (2) are ambiguous because the final word in each can be attached at two possible points in the phrase marker:

(1) Tom said that Bill had taken the cleaning out yesterday.
(2) Joe called the friend who had smashed his new car up.
In (1), yesterday can be attached as an adverbial modifier either to the topmost S in the phrase marker (Tom said ...) or to the embedded S (Bill had taken ...). Similarly, in (2), up can be attached as a particle to the verb in the topmost S (called) or to the verb in the embedded S (smashed). In both sentences, the lower of the two possible attachments seems to be preferred by most people, and Frazier (1978) has provided experimental evidence for the reliability of this preference. According to FF, this type of bias can be adequately described by Kimball’s principle of Right Association, which dictates that an ambiguous constituent should be “attached into the phrase marker as a right sister to existing constituents and as low in the tree as possible” (p. 294). The Right Association strategy applies in the obvious way to make the correct predictions about the language user’s preferences in sentences (1) and (2). But what explains the existence of this particular strategy? Why should the language user be uniformly biased toward low right attachment as opposed to (say) high right attachment? According to FF, the Sausage Machine can supply the answer. Their story begins with the observation that “the tendency towards low right association of an incoming constituent sets in only when the word is at some distance from the other daughter constituents of the higher node to which it might have been attached” (p. 299). Sentences (3) and (4) provide the evidence for FF’s claim that Right Association “sets in only ... at some distance”.

(3) Joe bought the book that I had been trying to obtain for Susan.
(4) Joe bought the book for Susan.
In (3) there are two possible attachments for the final prepositional phrase for Susan: it can be attached either to the object noun phrase (the book that I had been trying to obtain for Susan) or to the main clause verb phrase (bought the book for Susan). Right Association correctly predicts the preference for the first of these attachments, which is at the lower right margin of the phrase marker. Notice, however, that in sentence (4), this preference seems to be reversed. The preferred attachment is to the verb phrase, not the noun phrase; and as phrase markers (5) and (6) demonstrate, this is clearly the higher of the two possible attachments:
(5) [Phrase marker: for Susan attached to the verb phrase headed by bought]
(6) [Phrase marker: for Susan attached to the noun phrase the book]
FF argue that the preference for (5) over (6) is a special case of the general parsing strategy they call Minimal Attachment. This strategy also governs situations where the grammar permits more than one possible attachment
for a given constituent and it stipulates that the ambiguous item “is to be attached into the phrase marker with the fewest possible number of nonterminal nodes linking it with the nodes that are already present” (p. 320). Comparison of (5) and (6) will show that noun phrase attachment involves one more nonterminal node than verb phrase attachment; hence the Minimal Attachment principle correctly predicts the language user’s preference for (5). But why does Minimal Attachment prevail over Right Association in sentence (4)? And why does Right Association appear to set in only at a distance? Here FF offer an ingenious explanation based exclusively on the architecture of the Sausage Machine:

Let us suppose for the sake of argument that the first stage parser has the capacity to retain six words of the sentence, together with whatever lexical and phrasal nodes it has assigned to them. Then in processing (4), it will still be able to ‘see’ the verb when it encounters for Susan. It will know that there is a verb phrase node to which the prepositional phrase could be attached, and also that this particular verb is one which permits a for-phrase. But in sentence (3), where a long noun phrase follows the verb bought, the first stage parser will have lost access to bought by the time for Susan must be entered into the structure; the only possible attachment will be within the long noun phrase, as a modifier to trying to obtain (p. 300).

Notice that according to this account, there need be no independent statement of Right Association anywhere in the Sausage Machine. The PPP simply makes whatever attachments it can. In long sentences like (3) the low right attachment of for Susan is the only attachment the PPP can make
because its limited window prevents it from “seeing” the higher attachment possibility. Note also that this account automatically explains why Minimal Attachment prevails over Right Association in (4). Since there is no independent statement of Right Association in the parser, there is no conflict to be explained. In short sentences like (4), the PPP will “see” both attachment possibilities. Therefore, there will be no bias towards low right attachment and the Minimal Attachment strategy prevails by default.¹ On the basis of this demonstration, FF claim to have achieved, at least in one important instance, their announced goal of showing that “the parser’s decision preferences can be seen as an automatic consequence of its structure” (p. 297).
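FF's two-part account can be made concrete with a toy model. The sketch below is not FF's formalism: it assumes, hypothetically, that the PPP's window holds the last six words up to and including the final word of the ambiguous constituent, that each candidate attachment site is identified by the position of its head word, and that each candidate carries the structure it would add, encoded as nested tuples. Among the sites left in view, the model applies Minimal Attachment by counting nonterminal nodes. The names `ppp_attach`, `count_nonterminals` and `FOR_SUSAN` are illustrative only.

```python
# Hypothetical toy model of FF's account (names and encoding are assumptions):
#   1. the PPP can only consider attachment sites whose head lies in its window;
#   2. among the visible sites, Minimal Attachment picks the analysis that
#      adds the fewest nonterminal nodes.
# Trees are (label, child, ...) tuples; plain strings are words (terminals).

WINDOW = 6  # FF: the window holds "perhaps half a dozen words"


def count_nonterminals(tree):
    """Count non-leaf nodes in a (label, child, ...) tuple tree."""
    if isinstance(tree, str):
        return 0
    _label, *children = tree
    return 1 + sum(count_nonterminals(child) for child in children)


def ppp_attach(tokens, end, sites, window=WINDOW):
    """Choose an attachment for the constituent ending at tokens[end].

    sites maps the index of each candidate head word to a pair
    (label, new_structure), where new_structure is the material that
    attachment would add to the phrase marker.
    """
    start = max(0, end - window + 1)
    visible = [site for index, site in sites.items() if start <= index <= end]
    if not visible:
        raise ValueError("no attachment site inside the window")
    label, _structure = min(visible, key=lambda site: count_nonterminals(site[1]))
    return label


FOR_SUSAN = ("PP", ("P", "for"), ("NP", ("N", "Susan")))

if __name__ == "__main__":
    s4 = "Joe bought the book for Susan".split()
    s3 = "Joe bought the book that I had been trying to obtain for Susan".split()

    # (4): both sites are visible. NP attachment needs an extra NP node
    # above the existing NP, abbreviated here to the single leaf "the book".
    print(ppp_attach(s4, len(s4) - 1, {
        1: ("VP (bought)", FOR_SUSAN),
        3: ("NP (the book)", ("NP", "the book", FOR_SUSAN)),
    }))  # VP (bought)

    # (3): "bought" has left the window, so only the low site remains.
    print(ppp_attach(s3, len(s3) - 1, {
        1: ("VP (bought)", FOR_SUSAN),
        10: ("VP (obtain)", FOR_SUSAN),
    }))  # VP (obtain)
```

In this toy version no Right Association rule appears anywhere. The low attachment in (3) falls out of the window test alone, and the VP preference in (4) falls out of the node count, which mirrors the structure of FF's argument as summarized above.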
¹FF also offer a structural account for Minimal Attachment which is quite irrelevant to the interaction between the two strategies. Here it is sufficient to note that on FF’s account, Minimal Attachment is insensitive to distance effects in the manner putatively characteristic of Right Association. Hence, Minimal Attachment continues to operate in contexts where Right Association does not.
There are, however, serious problems with this claim. If the preference for low right attachment “sets in ... at some distance” just because of the PPP’s limitation to a six-word window, then this limitation ought to operate uniformly in all cases. Just as the preference for low right attachment dissolves as sentence (3) is shortened into sentence (4), so it should also dissolve as sentences (1) and (2) are shortened. But it does not. Sentence sets (7) and (8) represent progressive shortenings of sentences (1) and (2):

(7) (a) Tom said that Bill had taken the cleaning out yesterday.
(b) Tom said that Bill had taken it out yesterday.
(c) Tom said that Bill had taken it yesterday.
(d) Tom said that Bill took it yesterday.
(e) Tom said that Bill died yesterday.
(f) Tom said Bill died yesterday.

(8) (a) Joe called the friend who had smashed his new car up.
(b) Joe called the friend who had smashed his car up.
(c) Joe called the friend who had smashed it up.
(d) Joe called the friend who smashed it up.
(e) Joe called everyone who smashed it up.
(f) Joe called everyone who smashed up.
Notice that as these sentences shrink, there is no noticeable tendency for the preference for low right attachment to diminish. Indeed, informants to whom I have given just the (f) versions uniformly report a preference for the analysis in which the final word is attached to the lower of the two clauses.² But neither (f) version is more than six words long. Both (f) sentences fit comfortably within the PPP’s window. Hence the PPP could readily “see” both clauses as candidates for attachment. Therefore, the structure of the PPP cannot provide any explanation of the language user’s continued preference for low right attachment in these short sentences.³

²Some informants find the higher attachment in (8f) ungrammatical, presumably because it requires an intransitive interpretation of “smashed”. However, these informants all prefer the low right attachment in (8e), where there is no possible confound from the ungrammaticality of either attachment.

³The same sort of argument can be brought to bear upon some of FF’s other arguments for the explanatory power of the PPP’s limited window. For example, FF argue that the multiply embedded sentence (a) is easier than the identically embedded sentence (b) because its major constituents (marked here by brackets) are approximately the length of the PPP’s window:

(a) [The very beautiful young woman] [the man the girl loved] [met on a cruise ship in Maine] [died of cholera in 1972].
(b) The woman the man the girl loved met died.

But again it is possible to construct an equivalent sentence which is short enough to fall entirely within the PPP’s window yet is very difficult to comprehend:

(c) Women men girls love meet die.
The ATN and the Sausage Machine
One might hope to save the Sausage Machine by somehow incorporating the Right Association strategy within the PPP itself. It might be possible to stipulate, for example, that the PPP tries to fashion the longest possible phrases from the words within its window. But this move would leave us without an explanation of why Minimal Attachment appears to prevail over Right Association in sentence (4). Moreover, it would necessarily entail the abandonment of FF’s goal of explaining Right Association exclusively in terms of Sausage Machine architecture. For as FF point out themselves, there is nothing about the division of labor between the PPP and SSS which might explain why the PPP should strive to build maximally long phrases:

Trying to squeeze extra words into the current package could also be counterproductive, for it might happen that the limits of the PPP’s capacity are reached at a point which is not a natural phrasal break in the sentence. In such circumstances it would have been better for the PPP to terminate the current package a word or two sooner, and start afresh with a new phrase as a new package (p. 312).
To summarize, it now appears that, contrary to the Sausage Machine prediction, Right Association is not limited to cases of distant attachment. Moreover, the Sausage Machine offers no explanation of why the language user appears to follow the Right Association strategy in some short sentences (7f and 8f) but not in others (4). Accordingly, it seems clear that the Sausage Machine’s putative explanation of the behavior of the Right Association strategy is simply incorrect. There is nothing about FF’s observations which would require a parser with properties (A) and (B). However, it remains to be seen whether a parser like the ATN, which has neither two stages nor a limited input window, can give a satisfactory account of the behavior of Right Association and Minimal Attachment, as well as their somewhat puzzling interaction.
Is the ATN in the Same Pickle?

According to FF, Minimal Attachment and Right Association cannot be described within the ATN framework. The problem, as they see it, is that the ATN lacks property (C): the ability to make structural assignments contingent on the geometry of the developing phrase marker. In FF’s words:

An ATN parser could certainly be designed so that it would make exactly the same decisions at choice points as the Kimball parser. But because its decisions are determined by the ranking of arcs for specific word and phrase types, rather than in terms of concepts like ‘lowest rightmost node in the phrase marker’, the parser’s structural preferences would have to be built in separately for each type of phrase and each sentence context in which it can appear. Evidence that the human sentence parser exhibits general preferences based on the geometric arrangement of nodes in the phrase marker indicates that its executive component does have access to the results of its prior computations. Its input at each choice point must consist of both the incoming lexical string and the phrase marker (or some portion thereof) which it has already assigned to previous lexical items (p. 294).

It is difficult to determine, in general, whether the ATN will eventually require the addition of something like property (C). However, it is quite clear that no such property is required to give a perfectly general description of the two parsing strategies that FF have proposed. The structural preferences involved in these strategies would not have to be “built in separately for each type of phrase and each sentence context”. On the contrary, it appears to be possible to formulate scheduling principles for the ATN that completely capture the structural preferences involved and that do so without explicit appeal to the geometry of the phrase marker. Moreover, when these principles are combined with an ATN grammar for FF’s crucial sentences, the residual mysteries concerning the interaction between Right Association and Minimal Attachment are completely resolved. To see this, recall first that a scheduling rule in an ATN, as described by Kaplan (1975, 1972) and by Wanner and Maratsos (1978), is essentially a specification of the order in which the ATN processor considers the arcs leaving a state in an ATN grammar network. Recall also that the ATN network includes at least five types of arcs:⁴

- WORD arcs that analyze specific grammatical morphemes such as “that” or “to”;
- CAT arcs that analyze grammatical categories such as Noun (N) or Verb (V);
- SEEK arcs that analyze whole phrases or clauses such as NP, VP, or S;
- SEND arcs which terminate a network;
- JUMP arcs which provide a free transition between states, thus expressing the optionality of certain sub-paths through a network.

Given this enumeration of arc types, we can formulate two general constraints on ATN scheduling rules which provide a general description of Right Association and Minimal Attachment:
⁴For a more detailed discussion of these arc types, see the ATN sources cited above.
(9) Right Association: Schedule all SEND arcs and all JUMP arcs after every other arc type. (Since SEND arcs and JUMP arcs never leave the same state, there is no ambiguity here with respect to the relative ordering of these two arc types.)

(10) Minimal Attachment: Schedule all CAT arcs and WORD arcs before all SEEK arcs.
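Rules (9) and (10) amount to a priority ordering over arc types, applied uniformly at every state. The sketch below is a minimal illustration, assuming a serial processor that tries arcs depth-first and backtracks on failure. The `Arc` class, the priority numbers, the toy `GRAMMAR`, and the `LEXICON` are hypothetical: they are not Wanner’s or Kaplan’s networks. The sketch parses (7f) twice, once with the arcs in the order the toy grammar happens to list them and once reordered by the two rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Arc:
    kind: str                    # "WORD", "CAT", "SEEK", "SEND" or "JUMP"
    label: Optional[str] = None  # word (WORD), category (CAT) or network name (SEEK)
    to: Optional[str] = None     # destination state; None for SEND


# Lower numbers are tried first (hypothetical encoding of the rules).
# Rule (10), Minimal Attachment: CAT and WORD arcs before SEEK arcs.
# Rule (9), Right Association: SEND and JUMP arcs after every other type.
PRIORITY = {"WORD": 0, "CAT": 0, "SEEK": 1, "SEND": 2, "JUMP": 2}


def schedule(arcs: List[Arc]) -> List[Arc]:
    """Order the arcs leaving one state by rules (9) and (10).
    sorted() is stable, so arcs of equal priority keep their grammar order."""
    return sorted(arcs, key=lambda arc: PRIORITY[arc.kind])


# Hypothetical toy grammar; JUMP arcs are deliberately listed first so that
# the unscheduled order differs from the scheduled one.
GRAMMAR: Dict[str, Dict[str, List[Arc]]] = {
    "S": {
        "S/0": [Arc("SEEK", "NP", "S/1")],
        "S/1": [Arc("CAT", "V", "S/2")],
        "S/2": [Arc("JUMP", to="S/3"), Arc("SEEK", "S", "S/3")],   # optional complement clause
        "S/3": [Arc("JUMP", to="S/4"), Arc("CAT", "ADV", "S/4")],  # optional adverb
        "S/4": [Arc("SEND")],
    },
    "NP": {
        "NP/0": [Arc("CAT", "N", "NP/1")],
        "NP/1": [Arc("SEND")],
    },
}
LEXICON = {"Tom": "N", "Bill": "N", "said": "V", "died": "V", "yesterday": "ADV"}

Order = Callable[[List[Arc]], List[Arc]]


def parse(words: List[str], order: Order) -> Optional[tuple]:
    """Depth-first, backtracking traversal; returns the first complete parse."""

    def walk(net: str, state: str, i: int, kids: tuple) -> Iterator[Tuple[tuple, int]]:
        for arc in order(GRAMMAR[net][state]):
            if arc.kind == "SEND":                    # finish this network
                yield (net, *kids), i
            elif arc.kind == "JUMP":                  # free transition, no input used
                yield from walk(net, arc.to, i, kids)
            elif arc.kind == "SEEK":                  # push to a lower network
                for sub, j in walk(arc.label, arc.label + "/0", i, ()):
                    yield from walk(net, arc.to, j, kids + (sub,))
            elif i < len(words):                      # CAT or WORD consumes one word
                w = words[i]
                if (arc.kind == "WORD" and w == arc.label) or (
                    arc.kind == "CAT" and LEXICON.get(w) == arc.label
                ):
                    yield from walk(net, arc.to, i + 1, kids + ((arc.label, w),))

    for tree, end in walk("S", "S/0", 0, ()):
        if end == len(words):                         # must consume the whole input
            return tree
    return None


sentence = "Tom said Bill died yesterday".split()        # sentence (7f)
print(parse(sentence, schedule))  # "yesterday" ends up inside the lower S
print(parse(sentence, list))      # grammar order: "yesterday" attaches to the higher S
```

In this toy, trying the CAT arc for the adverb before the JUMP arc that would close the lower clause is enough to keep “yesterday” in the lower clause. The ordering refers only to arc types, not to the geometry of the phrase marker.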
Consider Minimal Attachment first. Basically, this strategy stipulates that the parser should never add an additional non-terminal node to the parse tree unless it is forced to by the grammar. Scheduling rule (10) enforces this strategy by providing that any input element will be analyzed as a category or a word of the current phrase before any SEEK to a lower phrase is attempted. Suppose, for example, that our ATN grammar includes the following network level that analyzes X phrases (XP):
CAT Y