ECOLOGICAL PSYCHOLOGY, 14(1–2), 1–4 Copyright © 2002, Lawrence Erlbaum Associates, Inc.
INTRODUCTION
Nonlinear Dynamics and Psycholinguistics

Guy C. Van Orden
Cognitive Systems Group, Arizona State University

Requests for reprints should be sent to Guy C. Van Orden, Cognitive Systems Group, Department of Psychology, Arizona State University, Tempe, AZ 85287–1104. E-mail: guy.van.orden@asu.edu
A small change in handwriting and the word bug becomes bag. Small changes that result in disproportionate or qualitative differences in outcome are symptomatic of phenomena that may be described using nonlinear equations. If the system of nonlinear equations includes time as a variable, then it constitutes a nonlinear dynamic system—the term dynamic simply means that the system changes in time. The term nonlinear marks a contrast with linear descriptions. In a linear description of the previous example, the small change in handwriting should produce a proportional difference in outcome. Half the change should equal half the difference in outcome; twice the change should equal twice the difference in outcome—as though a handwritten string were in some sense a weighted average of bug and bag. Twentieth-century cognitive psychology was built on the assumption of linearity. The essential goal of cognitive research programs has been to describe the mental architecture that underlies cognitive and linguistic acts, such as reading or engaging in conversation. This goal requires an empirical method sufficient to reduce observed behavior to underlying cognitive processes. This, in turn, requires that effects of cognitive processes, at some level of description, combine linearly to make up the whole of the behavior in question (plus random noise). Linearity allows the reduction of behavioral effects to cognitive processes. Thus, it makes sense that most statistical tools derive from the general linear model, and most research
programs assume that measurements of cognitive behavior lend themselves to linear reductive analysis. All the authors of this special issue believe that nonlinear dynamic systems theory may be usefully applied to problems in psycholinguistics. They do not agree, however, on the feasibility of reducing behavior to a mental architecture. On one hand, individual processes such as word recognition may be usefully described as nonlinear dynamic systems, while it nevertheless remains possible to isolate the contributions of individual processes in behavior. Thus, measurements of linguistic behavior may be partitioned among specific linguistic processes, such as word recognition or syntax. If so, then nonlinear dynamic systems theory would supply theoretical tools, and traditional linear assumptions could yet suffice in empirical analyses. On the other hand, nonlinear interactions may occur among cognitive processes, between cognitive agents and their environments, and even among cognitive agents themselves. If so, then empirical analyses will require nonlinear methods. In either case, we are informed by contemporary sciences that have grown up around nonlinear dynamic systems theory—sciences that have pioneered the conceptual tools and empirical methods for working outside the linear box. Ecological psychologists have long been working outside the box. Thus, nonlinear dynamic systems theory could become a bridge between cognitive and ecological research programs. This would be a welcome development. These two approaches often proceed without contact, or, when contact occurs, proponents sometimes talk at each other rather than to each other. But if both draw assumptions from nonlinear dynamic systems theory, albeit sometimes for different purposes, then the assumptions themselves become a common basis for discussion—a basis for useful debate. The articles in this special issue may contribute to this debate. They describe ways of thinking about linguistic acts that assume nonlinear dynamics. They do not always describe things the same way, but that is part of why they may be useful. Jay Rueckl's article is a concise overview of connectionist models of visual word recognition. Connectionist models, sometimes called "neural" networks, are often described using nonlinear equations that include time as a variable. Connectionist theory thus inherits some of its basic assumptions from nonlinear dynamic systems theory. As Rueckl makes clear, connectionist models retain the traditional assumption that we may isolate and specify the functions of individual psycholinguistic processes—processes such as word recognition that reside in between a stimulus and a response. However, the processes themselves can be described as nonlinear dynamic systems. One appeal of this view is that it maintains open lines of communication between traditional theories (e.g., information processing theories) and connectionist theories. Nonlinear phenomena such as hysteresis are assumed to be commensurate with results from linear analyses, and both kinds of theories may draw on both kinds of empirical motivation. Whitney Tabor also believes that we may specify internal processes of language and cognition. He develops an original thesis around the central traditional
problem of syntax. Syntax concerns the order in which classes of words, such as nouns and verbs, may appear in a sentence such that the result is an acceptable, well-formed sentence. Languages can differ widely in syntax. Tabor's bold hypothesis is that different natural language grammars may originate in general lawful properties of pattern formation. He supports this argument with existence proofs in the form of computer simulations. In the simulations, distinct, historically important, artificial grammars emerge as a model is taught different artificial languages. Thus, the conventional notion of linguistic competence appears as a more inclusive construct—a competence for self-organization within linguistic processes that may be common to other cognitive processes (and nature at large). John (Jay) Holden applies concepts from fractal geometry to describe performance in traditional laboratory reading tasks. The same kinds of laboratory tasks are discussed in Rueckl's article, which allows the two approaches to be compared. The crux of this comparison is the point already mentioned: whether specific cognitive processes can be isolated in measurements of reading performance—that is, whether we may reduce human behavior to a causally intermediate architecture of cognition. Holden speculates that systematic changes in the variability in response times imply an iterative fractal process. As he envisions this abstract process, positive feedback exists between agents and discourse environments, which extends the system "outward" to include the discourse environment. If Holden is correct, then cognitive research programs confront a possibility already made explicit in ecological theories—irreducible relations between agents and environments. Assumptions about intentionality play a central role in ecological theories, but intentionality has proven to be a recalcitrant problem for psycholinguistics. Van Orden and Holden consider a new hypothesis about how intentional contents may control behavior. Essentially, intentional contents reduce the degrees of freedom for behavior to include only intentionally appropriate actions, and self-organizing processes select the actions that are actually observed. This hypothesis, proposed by Alicia Juarrero (1999), is similar in many respects to contemporary neo-Gibsonian accounts (Riley & Turvey, 2001). Van Orden and Holden contrast assumptions from Juarrero's account with assumptions at the foundation of traditional cognitive research programs. They go on to claim that long-range correlations in the variability of measured laboratory performances—fractal patterns—can be seen as provisional corroboration of Juarrero's hypothesis. If they are correct, then we should seriously consider that embodied control processes may self-organize intentional actions. If so, then we require methods of nonlinear analysis to adequately describe human behavior. Differences aside, each of these articles illustrates how theories and methods of nonlinear analysis may be given serious consideration. Nonlinear phenomena are ubiquitous in psycholinguistic performances (and human behavior at large). It is inevitable that they will shape our understanding of human discourse.
REFERENCES

Juarrero, A. (1999). Dynamics in action. Cambridge, MA: MIT Press.
Riley, M. A., & Turvey, M. T. (2001). The self-organizing dynamics of intentions and actions. American Journal of Psychology, 114, 160–169.
ECOLOGICAL PSYCHOLOGY, 14(1–2), 5–19 Copyright © 2002, Lawrence Erlbaum Associates, Inc.
ARTICLES
The Dynamics of Visual Word Recognition

Jay G. Rueckl
Department of Psychology, University of Connecticut, and Haskins Laboratories
This article provides an overview of a dynamical systems approach to visual word recognition. In this approach, the dynamics of word recognition are characterized in terms of a connectionist network model. According to this model, seeing a word results in changes in the pattern of activation over the nodes in the lexical network such that, over time, the network moves into an attractor state representing the orthographic, phonological, and semantic properties of that word. At a slower timescale, a learning process modifies the strengths of the connections among the nodes in a way that attunes the network to the statistical regularities in its environment. This view of word identification accommodates a wide body of empirical results, a representative sampling of which is discussed here. Finally, the article closes with a discussion of some of the theoretical issues that should be addressed as the dynamical approach continues to develop.
Requests for reprints should be sent to Jay G. Rueckl, Department of Psychology, Box U-1020, University of Connecticut, Storrs CT 06269. E-mail: [email protected]

Despite its apparent simplicity, visual word recognition is a remarkable skill. For one thing, the number of words a reader is familiar with is quite large—up to 250,000 for the typical reader. Moreover, word identification involves two distinct subtasks. Words have meaning, and words can be pronounced, and a skilled reader gains access to the semantic and phonological properties associated with a written word within a few hundred milliseconds of seeing it. Interestingly, the mappings involved in these two subtasks are quite different. The mapping from spelling to phonology is fairly systematic—words that look alike generally sound alike as well (e.g., lake, take, bake, wake).
In contrast, with the exception of morphological relatives (e.g., bake, baker, bakery), whether two words look alike has no bearing on whether they are similar in meaning. Yet, despite these differences, the knowledge associated with each mapping can be used to read unfamiliar as well as familiar letter strings. Thus, skilled readers of English can pronounce both mave and zill, and they can understand what is meant by greenify and ecologize. Because of its interesting properties, and because, too, it is easily accessible to experimental investigation, visual word identification has served as an important model system in cognitive psychology for many years. In the late 1800s a sizable literature on word identification blossomed, foreshadowing much of the work that would follow the "cognitive revolution" of the 1950s and 1960s. Since the 1960s visual word identification has been a proving ground for a variety of theoretical constructs, including serial search (Forster & Davis, 1984), spreading activation (McClelland & Rumelhart, 1981; Morton, 1969), and rule-based computation (Coltheart, 1978). Most recently, word identification has served as one of the primary test cases in the exploration of the utility of connectionist networks—and by extension, dynamical systems—as models of cognitive processes. The purpose of this article is to provide an overview of a dynamical systems approach to visual word recognition.
DYNAMICS OF WORD IDENTIFICATION: A CONNECTIONIST APPROACH A connectionist network is composed of many simple, neuronlike processing units called nodes, which communicate by sending excitatory and inhibitory signals to one another. Each signal is weighted by the strength of the connection that it is sent across, and the state of each node (its activation) is a nonlinear function of the sum of these weighted signals. A learning algorithm is used to adjust the strengths of the connections (the weights) such that the flow of activation is tailored to the structure and task demands of the environment in which the network is embedded (for overviews, see Elman et al., 1996; Rumelhart & McClelland, 1986b). Many of the early connectionist models (e.g., Rumelhart & McClelland, 1986a; Seidenberg & McClelland, 1989) employed relatively simple feed-forward architectures. In feed-forward networks there is a unidirectional flow of activation from input nodes to output nodes (often with a set of “hidden” nodes in between), and as a consequence these early models had the flavor of quasi-behaviorist stimulus-response associators. It has become increasingly clear, however, that simple feed-forward models are best thought of as simplifications of more powerful (and more interesting) interactive networks. Interactivity (the recurrent flow of activation via feedback connections) allows the pattern of activation in a network to evolve over time, even if the external input to the network remains constant. As a result, interactive networks exhibit self-organizing attractor dynamics—over time a network’s
pattern of activation migrates toward a stable state, and once the network reaches an attractor state it remains there until the input to the network changes. Many of the attempts to apply the connectionist framework to the understanding of cognitive processes have focused on the case of visual word identification (e.g., Grossberg & Stone, 1986; Harm, 1998; Harm & Seidenberg, 1999; Kawamoto, 1993; Masson, 1995; Plaut, McClelland, Seidenberg, & Patterson, 1996; Rueckl & Raveh, 1999; Seidenberg & McClelland, 1989; Stone & Van Orden, 1994). These efforts have converged on a canonical “triangle model,” a network which includes separate layers of nodes responsible for representing the orthographic, phonological, and semantic properties of a word, with hidden units mediating the interactions among these layers. The representations of the triangle model are organized such that similarly spelled words have similar patterns of activation over the orthographic layer, semantically similar words have similar patterns of activation over the semantic layer, and so on. When a given word comes into view, the resulting input initiates a flow of activation within the network. Over time, the network settles into a pattern of activation, which in turn provides input to speech generation and language comprehension processes. At this point it may be useful to explicitly characterize the structure and operation of the triangle model in the language of nonlinear dynamics. The following points are particularly important:
• The state of a dynamical system is characterized by one or more state (or order) parameters. In the triangle model these parameters are the activation values of the individual nodes. Thus, the network lives in a high-dimensional state space (where the number of dimensions is equal to the number of nodes).

• Dynamical systems are self-causal: Changes in the state of a system are a consequence of state-dependent processes. Thus, the behavior of a dynamical system can be characterized by a flow field that determines the trajectory of a system through its state space. In the triangle model (as in all recurrent networks), the change in the network's pattern of activation is a function of its current activation, and thus the behavior of the network is state dependent.

• The state space of many dynamical systems includes fixed points (attractors and repellers)—states that the system will remain in until it is perturbed. When the triangle model is properly trained, each word has a unique attractor, and the positions of the attractors in the state space are organized to reflect similarities in spelling, pronunciation, and meaning.

• In a dynamical system there are typically one or more control parameters, the values of which determine the structure of the flow field (e.g., the location of the fixed points, and so on). The control parameters in the triangle model include the weights and the external input. The weights are coupling parameters that control the interactions among the nodes and are determined by a learning process that attunes the network to its environment and task demands. The external input is assumed to come from basic visual processes, which feed the network information
about the identity and position of the letters in a written word, and thus has a direct impact on the activation of nodes in the orthographic layer.1

Thus, the identification of a written word is a dynamical process that is shaped by two sets of constraints. The internal constraints (to borrow a phrase from Gestalt psychology) are embodied in the weights and act to ensure that the states of the components of the network are mutually consistent. The flow field generated by these constraints is multistable, with a large number of attractors (one or more per word) organized to capture the relations among the orthographic, phonological, and semantic properties of the words in the reader's vocabulary. The external constraints on the dynamics of word identification reflect the "optical push" that seeing a written word exerts on the lexical system, and they are captured in the triangle model by the external input parameters. If the internal constraints were absent, nodes that receive external input would be driven to a certain level of excitation or inhibition, but nodes that do not receive external input would remain in whatever state they were in. (Thus, in the absence of other forces, the flow field generated by the external input would form an attractor manifold—a hyperplane in the system's state space.) Of course, for the system to work properly, the internal constraints must be present. Hence, in the model the effect of the external constraints is to distort the flow field generated by the internal constraints, strengthening the attractor corresponding to the word now being seen (and perhaps some of its neighbors), and weakening or destroying the attractors corresponding to other words. The direction, speed, and outcome of the system's resulting movement through its state space depend both on the structure of the flow field (as jointly determined by the weights and the external input) and on the initial position of the system (its pattern of activation at the time that the external input begins to change).

1Network models sometimes include additional control parameters (e.g., decay, attentional gain), but in the research of relevance here these parameters have played a relatively minor role and, thus, will not be considered further.
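To make this flow-field description concrete, here is a minimal sketch of an interactive network settling into an attractor. It is an illustration of the general scheme, not an implementation of the triangle model: the two stored "word" patterns, the Hebbian weight matrix, and the leaky tanh update rule are all invented for demonstration.

```python
import numpy as np

# Toy interactive network: internal constraints live in the symmetric
# weights W; the external input supplies the "optical push." This is a
# minimal sketch for illustration, not the triangle model itself.

rng = np.random.default_rng(0)
n_nodes = 40
words = rng.choice([-1.0, 1.0], size=(2, n_nodes))  # two attractor patterns

W = (words.T @ words) / n_nodes                     # Hebbian coupling weights
np.fill_diagonal(W, 0.0)

def settle(external_input, steps=100, gain=2.0, rate=0.2):
    """State-dependent update: each new state is a function of the
    current state, so the trajectory follows the flow field until it
    reaches (approximately) a fixed point."""
    a = np.zeros(n_nodes)                           # initial condition
    for _ in range(steps):
        net = W @ a + external_input                # weighted signals + input
        a_new = (1 - rate) * a + rate * np.tanh(gain * net)
        if np.max(np.abs(a_new - a)) < 1e-4:        # settled into an attractor
            break
        a = a_new
    return a

# A weak, noisy view of word 0 is enough to pull the state into its attractor.
stimulus = 0.3 * words[0] + rng.normal(0.0, 0.05, n_nodes)
final = settle(stimulus)
print("overlap with word 0:", round(float(final @ words[0]) / n_nodes, 2))
print("overlap with word 1:", round(float(final @ words[1]) / n_nodes, 2))
```

Strengthening selected entries of W deepens the corresponding attractor, which is the role the learning process plays at the slower timescale.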
DYNAMICS OF WORD IDENTIFICATION: REPRESENTATIVE PHENOMENA

The dynamical approach offers a conception of word identification that differs rather dramatically from other kinds of cognitive models. Word identification is not a matter of accessing prestored mental representations, nor does it involve the use of rules to strip a word into its constituents and transform these constituents from one kind of representation to another. Instead, word identification involves self-organizing processes at two timescales. At the faster timescale, commensurate with the rate at which individual words are read, the flow of activation within the lexical network is drawn toward an attractor state, which is jointly determined by the visual input and the network's pattern of connectivity.
The pattern of connectivity is in turn the product of a self-organizing process that occurs at a slower timescale. This slower process adjusts the weights to attune the network to the structure of its environment and task demands. One way to further clarify the characteristics of these dynamic processes and draw out the unique aspects of the dynamical perspective is to consider how the dynamical approach accounts for a variety of experimental findings. Of the many sorts of empirical phenomena that are of interest to theories of word identification, only a representative subset will be considered here. Collectively, these phenomena reflect two of the primary characteristics of dynamical systems: the manner in which the dynamics are shaped by the control parameters, and the sensitivity of these dynamics to initial conditions.

Consistency Effects

In a typical word identification experiment, a reader is shown a letter string on a computer screen and asked to either name it aloud or push a button indicating whether or not it is a real word. A starting point for any theory of word identification is the observation that the speed and accuracy with which these responses can be made varies systematically as a function of a number of lexical properties. One such property is word frequency—the more experience a reader has with a word (as estimated by its frequency in a large corpus of text), the faster and more accurate the response in a naming or lexical decision task (Scarborough, Cortese, & Scarborough, 1977). Another, and perhaps less obvious, determinant of reading performance concerns the relation between how a word is spelled and how it is pronounced. Although most words in an alphabetic language adhere to standard spelling–sound correspondences (e.g., save, mint), a minority of words violate these correspondences (e.g., have, pint). It is commonly observed that words such as mint are read faster than words such as pint, provided that they are relatively low in frequency. For high-frequency words, effects related to spelling–sound correspondence are generally weak or absent (Coltheart, 1978).

One prominent approach to visual word identification (e.g., Coltheart, 1978) holds that this pattern of behavior reveals the interplay of two kinds of processes: a search process that provides direct access to lexical representations on the basis of a word's orthographic structure, and a process that uses rules to transform a sequence of letters to a phonological code ("sounding out the word") in order to access the lexicon. According to the dual-route account, "regular" words (mint) conform to the rules used by the phonological route, but "exception" words (pint) do not. As a consequence, although regular words benefit from the operation of both routes, exception words must be read via the direct route. To the degree that the phonological route is involved, it will either slow the correct response or result in an error, and thus regular words are identified faster than exceptions. The interaction of regularity and frequency is attributed to the effect of frequency on the direct route.
It is assumed that this route is so efficient in processing high-frequency words that it completes its operation well before the phonological route has a chance to produce an output. Hence, for high-frequency words the routes in effect neither cooperate nor compete, and regular and exception words are read at the same rate.

The dynamical approach offers a different perspective, from which the difference between mint and pint reflects the statistical structure of the correspondences between spelling and phonology. Most of the English words that end in int rhyme with mint (e.g., hint, lint, tint, print). The one word that is inconsistent with this pattern is pint. Thus, for the word mint, hint and tint are "friends," but pint is an "enemy." In terms of spelling–sound correspondences, mint is more consistent than pint—it has a higher ratio of friends to enemies. From a dynamical perspective, the putative effects of regularity are actually effects of consistency: The more consistent the word, the more quickly and accurately it will be read (Glushko, 1979; Jared, McRae, & Seidenberg, 1990).

In the triangle model, consistency effects are a consequence of the way in which learning structures the pattern of connectivity, and hence the activation dynamics. It is assumed that learning never ceases. Thus, each time a word is read, the weights are adjusted to strengthen that word's attractor. One effect of these weight changes is to improve the network's behavior with regard to that word (i.e., in subsequent encounters with that word, it will be identified more quickly and accurately). In addition, however, because the pattern of connectivity controls the network's response to all of the words that it sees, the learning that results from an encounter with a given word will have consequences for the subsequent processing of other words as well. In other words, learning about one word can interfere with the subsequent processing of another word. Because dissimilar words are represented by dissimilar (orthogonal) patterns of activation, interference has minimal effect on words that are dissimilar in spelling, pronunciation, and meaning. For words that are similar along one or more of these dimensions, interference can be either beneficial or detrimental. For example, because mint and hint are similar in both spelling and pronunciation, the weight changes that strengthen the association between the orthographic and phonological representations of mint tend to strengthen the corresponding association for hint as well. Thus, learning about mint improves the network's performance on hint. In contrast, because the bodies of mint and pint are pronounced differently, the changes in the orthographic–phonological weights resulting from an encounter with the word mint are inappropriate for the word pint. (In fact, they tend to make the network pronounce pint so that it rhymes with mint; Seidenberg & McClelland, 1989.)
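The friends-and-enemies tally can be made concrete in a few lines. The mini-lexicon, the crude three-letter "body," and the unweighted friend ratio below are assumptions made for illustration; consistency measures in the literature typically define the body phonologically and weight neighbors by frequency.

```python
# Illustrative friends/enemies tally for words sharing the -int body.
# The lexicon and pronunciation codes are invented for demonstration.

lexicon = {
    "mint": "Int", "hint": "Int", "lint": "Int",
    "tint": "Int", "print": "Int", "pint": "aInt",
}

def consistency(word):
    """Proportion of same-body neighbors that agree in pronunciation."""
    body = word[-3:]                                  # crude orthographic body
    neighbors = [w for w in lexicon if w != word and w.endswith(body)]
    friends = sum(lexicon[w] == lexicon[word] for w in neighbors)
    return friends / len(neighbors) if neighbors else 1.0

for w in ("mint", "pint"):
    print(w, round(consistency(w), 2))                # mint -> 0.8, pint -> 0.0
```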
Consistency effects reflect the cumulative effects of experiences with a word's friends and enemies on the network's pattern of connectivity, and hence on its activation dynamics. The more friends a word has, the faster it will be identified; the more enemies, the slower. Of course, the pattern of connectivity depends not only on the network's experience with a word's friends and enemies but also on its experience with that word itself. As noted above, the effect of spelling–sound consistency on word identification is modulated by frequency—for high-frequency words, consistency has little or no effect. Like readers, the triangle model also exhibits this interaction of frequency and consistency (Plaut et al., 1996; Seidenberg & McClelland, 1989). This interaction comes about because each encounter with a word provides the network with an opportunity to adjust its weights in order to improve its performance. Given enough learning trials, the weights can be adjusted to fully compensate for the interference produced by a word's enemies. Practice makes perfect.

To summarize, frequency and consistency effects reflect the statistical structure of the mappings among the properties of the words in a reader's vocabulary. Over the course of learning, this structure shapes the pattern of connectivity within the lexical system. The pattern of connectivity, in turn, constrains the flow of activation within the system and, hence, its behavior. It is worth noting that this linkage between statistical structure, learning, and behavior provides the basis for understanding a variety of empirical phenomena. For example, readers are influenced by the morphological structure of the words that they read (for reviews, see Feldman, 1995; Henderson, 1985). In the triangle model, morphology plays an important role in word identification because morphology is virtually the only source of structure in the mappings from orthography and phonology to meaning. (That is, with the exception of morphological relatives, words that are similar in form are typically not similar in meaning—contrast make, made, maker with make, take, lake.) Simulations have demonstrated that networks are sensitive to morphological regularities in much the same way that they are influenced by orthographic–phonological consistency (Plaut & Gonnerman, 2000; Rueckl & Raveh, 1999).

Repetition Priming

The explanation of consistency effects developed above was couched primarily in terms of the effects of learning on the pattern of connectivity in a connectionist network. It may be instructive to restate this explanation using the terminology of nonlinear dynamics. To wit, after a word is identified, the lexical system's control parameters are adjusted so as to strengthen the attractor corresponding to that word. However, changes in the control parameters have global consequences—a change in a control parameter deforms the entire flow field. Consequently, if two words have nearby attractors in both the orthographic and phonological subspaces, strengthening one of their attractors strengthens the other as well. In contrast, if two words with nearby orthographic attractors have relatively distant attractors in the phonological subspace, strengthening one attractor weakens the other. The impact of these transfer effects depends on the frequency with which the unprimed word is encountered. More frequent words have stronger attractors, and thus transfer effects have a negligible impact on them.

Consistency effects reveal the collective influence of many learning events on the dynamics of word identification.
However, if each encounter with a word causes a change in the lexical system's control parameters, then in principle it should be possible to observe the behavioral consequences of a single learning event. In fact, they are observable—in a phenomenon known as repetition priming. Repetition priming is the facilitation in the identification of a word that results from having seen that word recently. Priming effects occur on a scale of minutes, hours, or even days. They can be observed in a variety of experimental tasks, and they influence the behavior of readers at all skill levels (for general reviews, see Roediger & McDermott, 1993; Tenpenny, 1995). Repetition priming has been modeled in a variety of ways—for example, as a change in a word detector's threshold (Morton, 1969); as a reordering of the lexical entries subjected to a serial search process (Forster & Davis, 1984); and as the influence of episodic memory traces on perception (Jacoby & Dallas, 1981; Kolers, 1979). According to the dynamical approach, priming is a manifestation of the process that adjusts the system's control parameters (i.e., the network's weights) after a word has been identified. These changes strengthen the word's attractor, and as a result, on subsequent encounters with that word the system responds faster and more accurately.

Repetition priming has been extensively studied, both by theorists interested in word identification and by theorists concerned with the nature of memory.2 An extensive review of these findings and their fit with the dynamical approach is provided by Rueckl (in press). For present purposes, it is sufficient to discuss several of the characteristics of repetition priming that are especially illuminating with regard to the nature of the dynamical approach.

2In the memory literature, repetition priming often goes by the name of implicit memory.

One important set of findings concerns transfer effects. Because a change in a system's control parameters deforms its entire flow field, the dynamical account holds that the effects of priming should not be limited to the subsequent identification of the prime, but instead should have consequences for the identification of other words, particularly those that are similar to the prime (and hence have nearby attractors). Consistent with this account, priming has been found to transfer to words that are similar to the prime in spelling, pronunciation, and meaning (for a review, see Rueckl, in press). A particularly robust example of a transfer effect is morphological priming. Numerous studies (e.g., Rueckl, Mikolinski, Raveh, Miner, & Mars, 1997; Stanners, Neiser, Hernon, & Hall, 1979) have shown that the identification of a word is facilitated by the prior presentation of a morphological relative (e.g., walk–walks, compute–computer). Morphological priming has often been taken to indicate that morphemes are explicitly represented in the mental lexicon. On this view, morphological priming occurs because the identification of morphologically related words involves access to the same mental representations. The dynamical approach offers an alternative account. On this account, morphological priming occurs because morphological relatives are typically similar in spelling, pronunciation, and meaning.
Thus, morphological relatives live close together in state space, so that changes in the control parameters that strengthen the attractor for one word strengthen the attractors for its morphological relatives as well. (Consistent with this account, morphological priming varies with orthographic similarity—e.g., made primes make more than bought primes buy; see Rueckl et al., 1997.)

Another interesting aspect of priming from a dynamical perspective is pseudoword priming. Pseudowords are pronounceable nonwords such as mave and zill. An important tenet of the dynamical approach is that words and pseudowords are processed in the same way. That is, seeing a pseudoword, like seeing a real word, causes a flow of activation within the lexical processing network such that over time the network moves into an attractor state, which allows the reader to behave in appropriate ways (e.g., pronouncing the word in a sensible way). Thus, pseudoword identification is an example of automatic generalization—unfamiliar inputs are processed in fundamentally the same way as familiar inputs. If this view is correct, then the process that produces repetition priming for words should also produce repetition priming for pseudowords. The empirical evidence supports this claim (for reviews, see Rueckl, in press; Tenpenny, 1995). Although some early findings suggested that repetition priming is purely a lexical phenomenon, it appears that these findings were the consequences of methodological flaws. The bulk of the evidence shows that words and pseudowords benefit from repetition priming in much the same way, as the dynamical approach predicts.

Short-Lag Priming

Studies of consistency effects and repetition priming exemplify one strategy for exploring the behavior of a dynamical system: Determine how that system's behavior varies with changes in a control parameter. With regard to consistency effects and repetition priming, the control parameters in question are the lexical network's connection weights. An equally important investigative strategy is to document how a system's behavior varies as a function of initial conditions. In the case of word identification, this strategy involves examining how the response to a word is conditioned by the state of the system when that word first comes into view. One well-established experimental technique for doing this is the short-lag priming paradigm. In this paradigm two stimuli—a prime and a target—are presented in rapid succession. (Typically, the prime is presented between 25 and 250 msec before the target, although both longer and shorter asynchronies are occasionally used.) The main observation that grows out of this paradigm is that the relation between the prime and target influences how quickly responses to the targets can be made. For example, naming and lexical decision responses are generally faster if the prime and target are related in meaning (e.g., bird–robin) than if they are not (e.g., fruit–robin). Responses are also influenced by the orthographic, phonological, and morphological relations between the prime and target, although the various forms of priming differ somewhat as a function of prime duration and other task characteristics (for reviews, see Lukatela, Frost, & Turvey, 1999; Neely, 1991).
Superficially, short-lag priming resembles the long-lag repetition effects discussed in the previous section, in that both involve the effects of seeing a word on subsequent processing, albeit at different timescales. However, from a dynamical perspective, these phenomena are fundamentally distinct. Long-lag priming is a consequence of learning and changes in the lexical network's pattern of connectivity. In contrast, short-lag priming is an effect of initial conditions. The presentation of the prime causes the system to move toward its corresponding attractor. Because the attractors for similar words are relatively close together, as the system moves toward the attractor for a related prime, it also moves toward the attractor for the target. Thus, because response times increase with the distance between the initial state and the target's attractor, response times are faster if the prime and target are related (for applications of this account to a wide variety of priming phenomena, see Masson, 1995; Plaut & Booth, 2000).

Understood in this way, differences in priming as a function of prime duration and prime–target relation provide a window on the dynamics of word identification. For example, when the prime duration is very short, morphologically related prime–target pairs that differ in semantic relatedness (e.g., professes–profess vs. professor–profess) produce equivalent levels of priming. However, at longer prime durations priming is correlated with semantic similarity (Raveh, 1999). This pattern is what would be expected on the assumptions that (a) semantic similarity is reflected in the proximity of attractors in the semantic subspace, but not in the orthographic and phonological subspaces; and (b) the external input affects the system's position in the orthographic subspace earlier than it affects its position in the semantic subspace.
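The two mechanisms can be contrasted in a toy settling network of the kind sketched earlier. In the code below, short-lag priming is modeled as a changed initial condition (the prime has already moved the state partway toward a nearby attractor), and long-lag priming as a small Hebbian adjustment of the weights; the network, patterns, and parameter values are invented, so the step counts are qualitative illustrations only.

```python
import numpy as np

# Contrast the two priming mechanisms described above in a toy network:
# short-lag priming = a changed initial condition; long-lag priming = a
# small change to the weights (a control parameter). Values are invented.

rng = np.random.default_rng(1)
n = 60
target = rng.choice([-1.0, 1.0], n)
patterns = np.vstack([target, rng.choice([-1.0, 1.0], (3, n))])
W = (patterns.T @ patterns) / n
np.fill_diagonal(W, 0.0)

def steps_to_settle(W, start, stim, tol=1e-3, rate=0.2, gain=2.0):
    """Count update steps until the activation pattern stops changing."""
    a = start.copy()
    for t in range(1, 500):
        a_new = (1 - rate) * a + rate * np.tanh(gain * (W @ a + stim))
        if np.max(np.abs(a_new - a)) < tol:
            return t
        a = a_new
    return 500

stim = 0.2 * target                      # the target word comes into view
neutral = np.zeros(n)                    # unprimed initial condition
primed = 0.5 * target                    # residue of a just-seen related prime

print("unprimed:        ", steps_to_settle(W, neutral, stim))
print("short-lag primed:", steps_to_settle(W, primed, stim))

# Long-lag priming: one Hebbian nudge deepens the target's attractor.
W2 = W + (0.2 / n) * np.outer(target, target)
print("long-lag primed: ", steps_to_settle(W2, neutral, stim))
```

Starting nearer the attractor typically shortens settling, paralleling the faster responses to related targets; the weight change speeds settling from a neutral start, paralleling repetition priming.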
Hysteresis

The phenomena discussed thus far (consistency effects, repetition priming, short-lag priming, and so on) have been explored using an arsenal of time-honored and well-honed experimental techniques. Although these techniques provide a glimpse of the dynamical processes underlying word identification, it is also worth noting that none of these methods were specifically developed with dynamical processes in mind. Thus, one of the challenges for the dynamical approach is to develop methodologies that are well suited to elucidating the organization of the dynamics of a cognitive task. One technique that is often used in the investigation of dynamical systems is to observe how the behavior of a system changes as a control parameter is varied in a smooth and continuous fashion. A well-known success story involving this technique concerns the effects of oscillation frequency (a control parameter) on the dynamics of interlimb coordination. By slowly increasing or decreasing the frequency with which two limbs oscillate, one can observe bifurcations, multistability, hysteresis, and other signatures of dynamical processes.

FIGURE 1 An example of a stimulus continuum used in the hysteresis experiments.
Indeed, the results of studies using this technique have provided a remarkably detailed and elegant understanding of the dynamics of interlimb coordination (see Kelso, 1995). One attempt to adapt this technique to the study of a more cognitive task was reported by Tuller, Case, Ding, and Kelso (1994) in their study of the dynamics of speech perception. By varying an acoustic parameter that distinguishes the spoken words say and stay (namely, the duration of the silent interval following the s), Tuller et al. constructed a continuum of 20 stimuli that ranged from a clear say to a clear stay. Listeners heard the tokens of the continuum in order, first in one direction (e.g., say to stay) and then in the other (e.g., stay to say). The listeners' behavior exhibited many of the hallmarks of nonlinear dynamics, including multistability and context-dependent behavior. For example, the listeners often exhibited hysteresis—the tendency to remain in the same state, even when other options are available. Thus, a particular token that was heard as stay as the continuum changed from stay to say would be heard as say as the continuum changed in the opposite direction.

In a series of unpublished experiments, Jason Fourahar and I applied an analogous technique to the study of written word perception. For example, in one experiment we constructed several continua of handwritten words with two orthographically similar words at their endpoints (see Figure 1). Following Tuller et al. (1994), readers were presented with the tokens of these continua in order, starting at one endpoint, moving through the continuum to the other endpoint, and then back. We observed hysteresis on over 60% of the trials. Moreover, the effects of hysteresis were rather dramatic, often changing the perception of three or four consecutive tokens.

These experiments on hysteresis represent an initial step toward developing methodologies that are particularly appropriate for studying the dynamics of a cognitive task. The articles by Holden and Van Orden in this issue also represent steps in this direction. One of the important challenges for a dynamical approach to word recognition (and other cognitive processes) is to continue to develop experimental methodologies that allow us to ask the kinds of questions that the dynamical approach tells us we should ask.
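The sweep logic itself is easy to simulate. The sketch below uses a generic bistable system (a tilted double well) rather than any model of the say–stay or handwriting continua; the control parameter k stands in for the position along the stimulus continuum.

```python
import numpy as np

# Sweep a control parameter up and then back down through a bistable
# regime and record where the system settles at each step. The system
# dx/dt = k + x - x**3 is a generic double well, used here only to show
# the sweep logic, not as a model of any particular stimulus continuum.

def settle(x, k, dt=0.01, steps=2000):
    for _ in range(steps):
        x += dt * (k + x - x**3)        # simple Euler integration
    return x

ks = np.linspace(-0.6, 0.6, 21)
x = -1.0                                # start in one perceptual "well"

upward = []
for k in ks:                            # sweep in one direction...
    x = settle(x, k)
    upward.append(x)

downward = []
for k in ks[::-1]:                      # ...then sweep back
    x = settle(x, k)
    downward.append(x)

for k, up, down in zip(ks, upward, downward[::-1]):
    print(f"k={k:+.2f}  up={up:+.2f}  down={down:+.2f}")
# For intermediate k the two sweeps disagree: the system remains in its
# current well until that well disappears. The disagreement is hysteresis.
```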
SUMMARY AND DISCUSSION

The purpose of this article was to provide an overview of a dynamical systems perspective on visual word identification. From this perspective, word identification is a consequence of dynamical processes occurring on (at least) two different timescales. At the faster scale, seeing a word results in changes in the state of the lexical system such that, over time, it moves into an attractor state capturing the orthographic, phonological, and semantic properties associated with that word. At a slower scale, a learning process adjusts the system's control parameters in a way that attunes the system to the structure of its environment and task demands. Connectionist networks provide a framework for modeling these dynamics. Indeed, the account of word identification developed here is a connectionist account, although presented in a way that takes advantage of the rich body of constructs offered by dynamical systems theory. Over the last decade or so, the dynamical systems perspective (usually couched in terms of connectionist networks) has been applied to a wide variety of empirical phenomena, including not only aspects of the behavior of skilled readers (Harm, 1998; Plaut et al., 1996) but also the effects of brain damage on reading (Plaut et al., 1996) and the problems associated with developmental dyslexia (Harm & Seidenberg, 1999). In the present article a handful of representative phenomena were discussed. These phenomena were chosen because they exemplify central aspects of the dynamical account, including the conceptualization of the connection weights as control parameters, the manner in which the weights are determined by learning and thus come to reflect the structure of the reader's linguistic environment, and the influence of initial conditions on word identification.

Because the application of dynamical systems theory to cognitive processes is relatively new, a brief discussion of several concerns and speculations is in order. One issue of interest to the readers of this journal concerns the points of contact between the present account and the ecological approach to psychology. It seems likely that in comparison to more traditional information processing approaches to word identification, the present account might be more appealing to ecological psychologists in several respects. In particular, both the appeal to self-organizing dynamics and the emphasis on the structure of the environment should seem familiar to proponents of the ecological approach. On the other hand, in the dynamical approach to word identification, relevant environmental regularities are not invariants, but instead are statistical facts concerning the degree of consistency or ambiguity in the mappings among orthography, phonology, and semantics. Moreover, the present approach is representational in the sense that it posits internal states that carry information about an external stimulus, and it is difficult to imagine how an account of a phenomenon such as short-lag priming could do without such states. At the same time, it might be noted that, as Elman (1995) put it, "it is more accurate to think of [the internal state of a network] as the result of processing the word, rather than as a representation of the word itself" (p. 207). Perhaps there is a middle ground—a way of characterizing internal states that is rich enough to capture the phenomena of interest to cognitive psychologists without doing violence to the underpinnings of the ecological approach.

Another issue worth considering concerns the dimensionality of the state space occupied by the lexical system. Many of the successes of dynamical systems theory
have been achieved by discovering how to characterize the behavior of a complex system in terms of a small number of order parameters. For example, the dynamics of interlimb coordination can be captured by a single order parameter, relative phase (Kelso, 1995). The present account assumes that the lexical system lives in a high-dimensional state space—the dimensionality of a network model is equivalent to the number of nodes that comprise it, and recent versions of the triangle model (e.g., Harm, 1998) include several thousand nodes. Finding a way to dramatically reduce the dimensionality of this space would be attractive on both explanatory and methodological grounds, but at this point it remains unclear whether such a reduction might be achieved. One approach might be to project the high-dimensional system onto a low-order one using principal component analysis or other analytic techniques (e.g., Elman, 1995; Tabor, this issue). Whether this approach could do justice to a reader’s ability to distinguish among 250,000 or so different words remains an open question. Finally, although the dynamical approach puts an emphasis on the statistical structure of the mappings of relevance to reading and how this structure comes to organize the system’s flow field, important issues concerning this structure remain unresolved. For example, the measure of orthographic–phonological consistency discussed previously was defined with regard to a word’s body (the vowel and final consonants of a monosyllabic word). However, consistency measures could also be defined at other grain sizes (such as individual letters and whole words), and indeed, the consistency of these other units also has behavioral consequences (Cortese & Simpson, 2000; Ziegler, Perry, Jacobs, & Braun, 2001). Although the emphasis on statistical structure seems to be a step in the right direction, at this point we are far from a general theory of how to characterize the behaviorally relevant regularities in the mappings among linguistic properties. A more fundamental question concerns the origin of these regularities. What simulations of connectionist models have shown is that if a network is exposed to a structured mapping, the regularities in this mapping are captured in the network’s pattern of connectivity, and consequently influence the network’s activation dynamics. Although this might be the right story for understanding the behavior of a child growing up in a linguistic environment, it is not a sufficient story for understanding the origins of that linguistic environment. The question that lurks is why language has the structure that it does. The hope is that dynamical systems theory can provide the conceptual tools needed to answer this question as well.
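The projection idea can be sketched with plain linear algebra. The "network states" below are fabricated (a two-dimensional random walk mixed into 1,000 dimensions); with a real model one would instead record the activation patterns along settling trajectories.

```python
import numpy as np

# Project fabricated high-dimensional "network states" onto their first
# two principal components via the singular value decomposition. With a
# real model, `states` would hold recorded activation patterns over time.

rng = np.random.default_rng(2)
T, n_nodes, n_latent = 200, 1000, 2

latent = np.cumsum(rng.normal(size=(T, n_latent)), axis=0)  # hidden 2-D drift
mixing = rng.normal(size=(n_latent, n_nodes))
states = latent @ mixing + 0.1 * rng.normal(size=(T, n_nodes))

centered = states - states.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / np.sum(S**2)
trajectory_2d = centered @ Vt[:2].T        # a low-order description

print("variance captured by two components:",
      round(float(explained[:2].sum()), 3))
```

Whether such a low-order description would survive contact with a trained lexical network is, as the text notes, an open question.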
ACKNOWLEDGMENT

The preparation of this article was supported by National Institute of Child Health and Human Development Grant HD–01994 to Haskins Laboratories.
REFERENCES

Coltheart, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed.), Strategies of information processing (pp. 151–216). London: Academic.
Cortese, M. J., & Simpson, G. B. (2000). Regularity effects in word naming: What are they? Memory & Cognition, 28, 1269–1276.
Elman, J. L. (1995). Language as a dynamical system. In R. F. Port & T. van Gelder (Eds.), Mind as motion: Explorations in the dynamics of cognition (pp. 195–223). Cambridge, MA: MIT Press.
Elman, J. L., Bates, E. A., Johnson, M. H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.
Feldman, L. B. (Ed.). (1995). Morphological aspects of language processing. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Forster, K. I., & Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 680–698.
Glushko, R. J. (1979). The organization and activation of orthographic knowledge in reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 5, 674–691.
Grossberg, S., & Stone, G. (1986). Neural dynamics of word recognition and recall: Attentional priming, learning, and resonance. Psychological Review, 93, 46–74.
Harm, M. W. (1998). Division of labor in a computational model of visual word recognition. Dissertation Abstracts International, 60(02), 0849B.
Harm, M. W., & Seidenberg, M. S. (1999). Phonology, reading acquisition, and dyslexia: Insights from connectionist models. Psychological Review, 106, 491–528.
Henderson, L. (1985). Towards a psychology of morphemes. In A. W. Ellis (Ed.), Progress in the psychology of language (Vol. 1, pp. 15–72). London: Lawrence Erlbaum Associates, Inc.
Jacoby, L. L., & Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306–340.
Jared, D., McRae, K., & Seidenberg, M. S. (1990). The basis of consistency effects in word naming. Journal of Memory and Language, 29, 687–715.
Kawamoto, A. H. (1993). Nonlinear dynamics in the resolution of lexical ambiguity: A parallel distributed processing account. Journal of Memory and Language, 32, 474–516.
Kelso, J. A. S. (1995). Dynamic patterns: The self-organization of brain and behavior. Cambridge, MA: MIT Press.
Kolers, P. A. (1979). A pattern-analyzing basis for recognition memory. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing and human memory (pp. 363–384). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Lukatela, G., Frost, S., & Turvey, M. T. (1999). Identity priming in English is compromised by phonological ambiguity. Journal of Experimental Psychology: Human Perception and Performance, 25, 775–790.
Masson, M. E. J. (1995). A distributed memory model of semantic priming. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 3–23.
McClelland, J. L., & Rumelhart, D. E. (1981). An interactive activation model of context effects in letter perception: Pt. 1. Psychological Review, 88, 375–407.
Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165–178.
Neely, J. (1991). Semantic priming effects in visual word recognition: A selective review of current findings and theories. In D. Besner & G. Humphreys (Eds.), Basic processes in reading: Visual word recognition (pp. 264–336). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Plaut, D. C., & Booth, J. R. (2000). Individual and developmental differences in semantic priming: Empirical and computational support for a single-mechanism account of lexical processing. Psychological Review, 107, 786–823.
Plaut, D. C., & Gonnerman, L. M. (2000). Are non-semantic morphological effects incompatible with a distributed connectionist approach to lexical processing? Language and Cognitive Processes, 15, 445–485.
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996). Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.
Raveh, M. (1999). The contribution of frequency and semantic similarity to morphological processing. Dissertation Abstracts International, 60(08), 4280B.
Roediger, H. L., & McDermott, K. B. (1993). Implicit memory in normal human subjects. In H. Spinnler & F. Boller (Eds.), Handbook of neuropsychology (Vol. 8, pp. 63–130). Amsterdam: Elsevier.
Rueckl, J. G. (in press). A connectionist perspective on repetition priming. In J. S. Bowers & C. Marsolek (Eds.), Rethinking implicit memory. New York: Oxford University Press.
Rueckl, J. G., Mikolinski, M., Raveh, M., Miner, C., & Mars, F. (1997). Morphological priming, fragment completion, and connectionist networks. Journal of Memory and Language, 36, 382–405.
Rueckl, J. G., & Raveh, M. (1999). The influence of morphological regularities on the dynamics of a connectionist network. Brain and Language, 68, 110–117.
Rumelhart, D. E., & McClelland, J. L. (1986a). On learning the past tenses of English verbs. In J. L. McClelland, D. E. Rumelhart, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 2. Psychological and biological models (pp. 216–271). Cambridge, MA: MIT Press.
Rumelhart, D. E., & McClelland, J. L. (Eds.). (1986b). Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations. Cambridge, MA: MIT Press.
Scarborough, D. L., Cortese, C., & Scarborough, H. (1977). Frequency and repetition effects in lexical memory. Journal of Experimental Psychology: Human Perception and Performance, 3, 1–17.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523–568.
Stanners, R. F., Neiser, J. J., Hernon, W. P., & Hall, R. (1979). Memory representation for morphologically related words. Journal of Verbal Learning and Verbal Behavior, 18, 399–412.
Stone, G. O., & Van Orden, G. C. (1994). Building a resonance framework for word recognition using design and system principles. Journal of Experimental Psychology: Human Perception and Performance, 20, 1248–1268.
Tenpenny, P. L. (1995). Abstractionist vs. episodic theories of repetition priming and word identification. Psychonomic Bulletin & Review, 2, 339–363.
Tuller, B., Case, P., Ding, M., & Kelso, J. A. S. (1994). The nonlinear dynamics of speech categorization. Journal of Experimental Psychology: Human Perception and Performance, 20, 1–16.
Ziegler, J. C., Perry, C., Jacobs, A. M., & Braun, M. (2001). Identical words are read differently in different languages. Psychological Science, 12, 379–384.
ECOLOGICAL PSYCHOLOGY, 14(1–2), 21–51 Copyright © 2002, Lawrence Erlbaum Associates, Inc.
The Value of Symbolic Computation

Whitney Tabor
Department of Psychology, University of Connecticut
Standard generative linguistic theory, which uses discrete symbolic models of cognition, has some strengths and weaknesses. It is strong on providing a network of outposts that make scientific travel in the jungles of natural language feasible. It is weak in that it currently depends on the elaborate and unformalized use of intuition to develop critical supporting assumptions about each data point. In this regard, it is not in a position to characterize natural language systems in the lawful terms that ecological psychologists strive for. Connectionist learning models offer some help: They define lawful relations between linguistic environments and language systems. But our understanding of them is currently weak, especially when it comes to natural language syntax. Fortunately, symbolic linguistic analysis can help connectionism if the two meet via dynamical systems theory. I discuss a case in point: Insights from linguistic explorations of natural language syntax appear to have identified information structures that are particularly relevant to understanding ecologically appealing but analytically mysterious connectionist learning models.
Requests for reprints should be sent to Whitney Tabor, Department of Psychology, U-0020, University of Connecticut, Storrs, CT 06269–1020. E-mail: [email protected]

This article is concerned with the relation between discrete, symbolic systems of the sort that have been widely used in linguists' formal analyses of natural languages and continuous dynamical systems that many ecological psychologists have found insightful, especially for understanding limb movement and visual perception. It begins by casting the discrete, symbolic linguistic models as unecological in several respects (see also Carello, Turvey, Kugler, & Shaw, 1984). Connectionist (or artificial neural-network) models, a particular type of dynamical system, are offered as a more ecological alternative. But despite their strengths, these connectionist models suffer from a certain opacity, which makes it difficult to understand what they are doing and how to improve their performance. A helpful way of overcoming this opacity is to explore their capabilities using discrete, symbolic models as reference points. The symbolic models are identical in behavior to certain special cases of the
[email protected] 22
TABOR
dynamical models, and these cases are useful to know about because they are relatively easy to understand. The result points to a general correspondence between regimes of symbolic and dynamical systems and suggests that to understand particularly complex dynamical processes, symbolic insights may be helpful.

Requests for reprints should be sent to Whitney Tabor, Department of Psychology, U-0020, University of Connecticut, Storrs, CT 06269–1020. E-mail: [email protected]

By discrete symbolic computation, I mean something very like Carello et al.'s (1984) use of the term discrete-mode computation. This kind of computation goes hand in hand with the "representationalist" approach to cognition, which Gibson (1979/1986) so soundly rejects, and discrete symbolic theories of psychology often take out "a loan of intelligence" (Dennett, 1978, p. 12) of the sort that many ecological psychologists quite reasonably deplore. The aim of the present article is not to suggest that discrete symbolic computation is accurate or complete as a foundation for psychology, but rather that it provides insights into the structuring of information, and these insights may turn out to be helpful to ecological psychology. Synergy between the perspectives might thus be worth seeking.

I begin with some examples. The formal linguistic theory called generative linguistics, working in a discrete and symbolic mode, maps a sentence such as (1) to a representation along the lines of (2).

(1) The June bug, which Bob had trapped in the neighbor's screened porch, careened erratically but vigorously between the siding and the mesh.

(2) [phrase-structure tree diagram for sentence (1); figure not reproduced]
Under the framework proposed by Frege (1892/1952) and laid out by Montague (1970/1974), this representation supports an interpretation along the following lines: The generic meaning of the and the generic meaning of June bug combine to
form the meaning of the June bug; that meaning combines with the meaning of trapped in such a way as to convey the notion of the June bug being trapped, rather than doing the trapping, for example; the generic meanings of the and neighbor also combine to define a meaning for the neighbor, and so on. It thus fits into a rather broad coverage and effective theory (compositional semantics) of how the literal meanings of sentences can be derived from the generic meanings of their words.
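Schematically (my rendering, in the standard double-bracket notation of compositional semantics, not a formula from the article), each step is the application of one generic meaning to another:

\[
[\![\text{the June bug}]\!] = [\![\text{the}]\!]\bigl([\![\text{June bug}]\!]\bigr),
\qquad
[\![\text{trapped}]\!]\bigl([\![\text{Bob}]\!],\; [\![\text{the June bug}]\!]\bigr),
\qquad
[\![\text{the neighbor}]\!] = [\![\text{the}]\!]\bigl([\![\text{neighbor}]\!]\bigr)
\]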
ECOLOGICAL CRITIQUE

Despite its power, there are a number of drawbacks to Analysis 2 if it is taken as a portrayal of the mentation associated with an actual instance of comprehending (or producing) a natural language sentence. Many of these drawbacks can be related to concerns that ecological psychologists have expressed about representationalist approaches to cognition in general. I will discuss three: staticness, context-freeness, and lack of emphasis on lawfulness.

First, staticness: Time is not a variable in a phrase-structure diagram. Thus, the phrase-tree viewpoint seems immediately at odds with the ecological emphasis on action: "The ecological approach asserts that the concept of information cannot be developed systematically apart from considerations of activity" (Turvey & Carello, 1981, p. 316). I take the relevant sense of "activity" here to be the activity of interacting with (e.g., comprehending, producing) language as it unfolds over time. It is true that a phrase-structure analysis often provides a map of its utterance across time (usually, left-to-right in the diagram corresponds to history-to-future in time). But because such a map is not explicit about what mental states are occurring as the utterance develops through time, theories of parsing have been proposed—that is to say, theories of how the tree structure and its meaning get built over time when an utterance is interpreted or formulated. There are a multitude of possible temporal programs for building tree structures, and much research is dedicated to measuring human interaction with sentences over time in an effort to figure out which one of these programs is correct. But if the temporal information were not factored out of the encoding process in the first place—that is, if the theory of encoding were required to produce something that not only embodied structures but also built them as the input unfolded temporally—then it might well be possible to derive predictions about the time course of processing directly from the mechanism that establishes the encoding, bypassing this nettlesome problem.

A second concern is that the phrase-structure approach gains a great deal of its efficiency from its assumption that the building blocks of tree diagrams (e.g., rules such as NP → Det N) are context-free abstractions over language content. Ecological psychologists have long objected to context-free mental objects. Indeed, they are cumbersome when the focus is on the ongoing mutual influence between organism and environment. The standardly cited linguistic cases involve deixis—linguistic elements whose function is to refer context dependently to entities in the world
(e.g., I, tonight; Turvey & Carello, 1981). Abstract phrase-structure objects, such as NP → Det N, might seem, at first glance, to be less susceptible to criticism on this account, because they capture patterns that are remarkably stable across instances of the use of a language, and in fact, they provide a helpfully sturdy skeletal structure within which the flexibility of deictic elements can be well modeled (van Eijck & Kamp, 1996). Nevertheless, there are reasons to be skeptical. There are many informational regularities that cut across the independent phrasal units (Charniak, 1993). For example, in (3),

(3) The chicken which Fred baked was not ready to eat.

the chicken is interpreted as the patient of eating (the one that gets eaten) when the verb eat is read. But in (4),

(4) The chicken which Fred fed was not ready to eat.

the chicken is more likely to be interpreted as the agent of eating. The difference between the two chickens is determined by the content of the embedded relative clause, which is phrase-structurally quite remote from the verb eat. Somehow, the information has to be transported across the tree so that the parser can select the right role assignments at eat. A useful strategy adopted by several linguistic theorists (e.g., Bresnan, 1982; Joshi & Schabes, 1996; Pollard & Sag, 1994) is to employ features that "percolate" from daughter nodes to mother nodes or vice versa and thus carry information to places the context-free rules do not get it to (other theories employ syntactic transformations to similar effect, e.g., Chomsky, 1981). It is tempting to adopt such a strategy in this case by, for example, letting the verb feed generate a feature [+alive] specifying that its patient (the thing fed) ought to be alive; by letting [+alive] percolate from its starting point in the relative clause up to chicken; by simultaneously letting the verb eat specify that its agent should be [+alive] while its patient should be [–alive]; by letting the feature from chicken percolate down through the main clause to eat; and by letting the parser examine the two different ways of linking the subject of be with an argument of eat in order to choose the one for which the features are consistent.

Workable as this approach might be for this case, its plausibility is cast into doubt by (5),

(5) The chicken which Fred last fed just yesterday is now ready to eat.

which has exactly the same verbs and nouns in the same phrase-structural relations to one another but seems to be biased toward the cooked-chicken interpretation. The temporal adverbs (last, just yesterday, and now) are making the difference, but it is not clear how that difference could be expressed via modulation of the percolating features. Taking an inspiration from Gibson (1979/1986), one might
suggest that perceivers of language do not perceive meaning via context-free syntactic abstractions but rather directly perceive meaning. See Tanenhaus, Carlson, and Trueswell (1989) for an experimental development that points toward a similar conclusion.

Third, the program of inquiry underlying analysis (2) only weakly supports a lawful treatment of the language–environment system. By a lawful treatment, I mean here a complete and coherent characterization of how the proposed mental state and the proposed environment coevolve at the timescale of moment-to-moment experience. This definition sounds like it could, in principle, accommodate types of lawfulness different from the "specificational" sense that ecological psychologists generally focus on (Turvey & Carello, 1985). For the case at hand, I do not think it does. In the body of this article I use the term lawfulness as I have just defined it. In the conclusion, I return to the question of how my use of the term is related to its use among Gibsonian psychologists.

For linguistic lawfulness, two levels of completeness need to be considered. I refer to the first, less complete level as processing lawfulness and to the second, more complete level as inductive lawfulness. Processing lawfulness characterizes the relation between a language utterance unfolding through time and the associated mental trajectory of its generator–perceiver. Inductive lawfulness characterizes the relation between an entire linguistic environment (consisting, for example, of all the linguistic experience a person has during childhood) and the associated processing map from utterances in contexts to mental trajectories.

Generative linguistic theory has not rejected lawfulness in either of these senses. But there is an issue of how much commitment the methodology of the approach has to seeing the lawfulness through. The problem is not so much with processing lawfulness. If the individual words can be identified and assigned meanings, then the standard models based on diagrams such as (2) seem to be within range of providing a lawful account of how linguistic utterances specify mental states (up to symbolic ambiguity) and vice versa (Kamp & Reyle, 1993; Montague, 1970/1974; van Eijck & Kamp, 1996). But generative linguistics has shied away from studying inductive lawfulness ever since Chomsky (1957) argued that "discovery procedures" (which build grammars from scratch, based on experience) are hard to devise, but "evaluation procedures" (which select among a finite set of clearly distinct options that evolution has made conveniently available) are easier. The idea seems to have been that if enough of the territory of linguistic structure could be mapped out in accurate detail, then the problem of induction could be reduced to making a finite set of binary, or n-ary, choices based on the observation of easily recognized "triggers." Indeed, a number of proposals along these lines have been put forth (Gibson & Wexler, 1994; Hyams, 1986; Lasnik, 1990). But unifying laws are lacking, and most researchers in the field pay little attention to the predictions of these inductive models, relying instead primarily on intuition to establish most of the structural background (in particular, the tree diagrams) on which models of specific utterances are built.
I suggest that in the case of language, it is crucial to have a full-fledged account of both processing lawfulness and inductive lawfulness. Here, language contrasts, at least in degree, with the domains that are typically studied by ecological psychologists, for example, visually guided locomotion, dynamic touch, wielding, and so forth. Ecological psychologists do not usually object, for example, to attempts to characterize organism-relevant invariants of optic flow without first building a theory of how an organism learns to detect those invariants (if, indeed, learning is required at all in that case). But there is a fairly strong sense in which we understand and can accurately model the nature of the materials (e.g., light, matter) involved in situations where optic flow is important. The materials involved in “syntactic flow” (ostensibly, words; lexical classes, such as noun, verb, adjective, etc.; and phrasal classes, such as noun phrase, verb phrase, sentence, etc.) are more mysterious. In particular, their definitions are highly mutually dependent. Generative theory assumes, for example, that an uttered word w counts as belonging to lexical class C and as forming part of an instance of phrase class P, because assuming the existence of C and P, along with the role that w plays in the instance at hand, provides optimal generalization of the theory about the set of potential utterances that a native speaker commands. It is with the definition of “optimal generalization” that inductive models are concerned. Because of the considerable interdependence of structure definitions under this notion, a theory of induction is essential to a lawful treatment of language use.
CONNECTIONIST INSIGHT

Connectionist (or artificial neural-network) models (Haykin, 1994; Hertz, Krogh, & Palmer, 1991; Rumelhart, McClelland, & the PDP Research Group, 1986) of language offer an alternative to the generative linguistics approach that helps address several of these ecological concerns. The next subsection gives a brief introduction to the class of connectionist language models considered here. The following subsection examines the connectionist approach in light of the critiques.

Connectionist Models

Connectionist models are mathematical or computational models of organism behavior, which take their inspiration from neurobiology (O'Reilly & Munakata, 2000). In particular, they consist of networks of nodes and connections that resemble, in a very pared-down way, networks of neurons and axons, synapses, and dendrites. Each node i is associated with a number, a_i, called its activation (analogous to neural firing rate in real brain tissue); each connection between nodes is unidirectional and is associated with a number, w_ij, called its weight (analogous to synaptic strength). By convention, I use the term w_ij to denote the weight on the connection from node j to node i.
Networks undergo two types of change: activation change, which happens quickly, is intended as a model of what psychologists typically call "behavior"; weight change, which happens slowly, is intended as a model of "learning." Activation change is specified by equation (1), where t indexes time:

    a_i(t) = f(net_i(t)),    net_i(t) = ∑_j w_ij a_j(t − 1)    (1)
In the present work, the activation function f is a smoothed step function or sigmoid, usually f(x) = 1 / (1 + e^−x). Thus, each node computes a weighted sum of the input from nodes that feed into it and becomes strongly activated if the sum is positive and weakly activated if the sum is negative.

The model is immersed in an environment. Weight change is the process by which the network attunes itself to the regularities (invariances) in its environment. In the present case, following the paradigm developed by Elman (1990, 1991, 1995), the model iteratively senses an event corresponding to the perception of a word in a stream of speech and tries to predict the next event, which is assumed to be the perception of the following word. To implement this idea, it is convenient to define one set of nodes within the network as "input" (for sensing the current event) and another as "output" (for predicting the next event); see Figure 1.¹ To make no prior assumptions about the structure of the data, an unbiased input–output encoding is used: Each word form corresponds to one unit on the input layer and one unit on the output layer. Words from a large sample of language are presented to the model in sequence. At the point of each word presentation, the unit corresponding to the current word is activated on the input layer (all other input and output units have activation 0). Observation of the following word in the language sample provides the basis for adjusting the weights of the model slightly so as to improve its ability to predict the future of its environment.

¹To be able to interpret the outputs as probabilities, I have used the normalized exponential sigmoid for the output units as a group: a_i = e^net_i / ∑_{j∈Outputs} e^net_j.

FIGURE 1  The simple recurrent network (Elman, 1990). [figure not reproduced]

Effective weight change is accomplished by referring to a cost function that computes the discrepancy between the activation pattern the network produces on its output layer and the word that actually occurs next at each point in time. By choosing, as the measure of discrepancy, the Kullback–Leibler divergence, D, between output activation and observed event, D = ∑_{i∈Outputs} t_i log(t_i / a_i), where t_i is 1 if word i occurred and 0 otherwise, one arrives at a particularly simple and intuitive formula for reducing total cost on the basis of the network's experiences with each event: The change in the weight on the connection from unit j to unit i, ∆w_ij, is given by ∆w_ij = (t_i − a_i) a_j (Rumelhart, Durbin, Golden, & Chauvin, 1995). This formula is called the delta rule. It says, essentially, to change those weights most that come from active nodes (for they are the ones that are creating the current pattern), and to change them in the direction that makes the network more strongly expect what just happened in this context to happen again the next time this context is encountered.

As Elman (1990, 1991, 1995), Christiansen (1994), Christiansen and Chater (1999), Rohde and Plaut (1999), Tabor (1994), and Tabor, Juliano, and Tanenhaus (1997) have shown, this kind of network can do a reasonable job of learning syntactic and semantic structure from a simple corpus of words approximating patterns that occur in English. When it is exposed to a several-hundred-thousand-word corpus of sentences based on a simple vocabulary of 30 lexemes or so, it learns to distribute activation over its output units in probability distributions that fairly accurately characterize the semantic and syntactic constraints inherent in the corpus. After an initial the, for example, the trained network distributes activations mostly over adjectives and nouns; if it then gets an adjective such as happy, it distributes activation over compatible second adjectives and compatible nouns (e.g., those that refer to sentient creatures). In this sense, the network learns a time-sensitive encoding of the syntactic and semantic structure of the corpus.
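To make the mechanics concrete, here is a minimal Python sketch (mine, not code from the simulations reported here; the sizes and learning rate are placeholder values) of one Elman-style step: equation (1) for the hidden layer, the normalized exponential of footnote 1 for the output layer, and the delta rule applied to the hidden-to-output weights. A full simple recurrent network would also train the input and recurrent weights by backpropagation; only the output-weight update appears below because that is where the delta rule shows up directly.

import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, HIDDEN_SIZE = 30, 10          # illustrative sizes, not Tabor's

W_ih = rng.normal(0, 0.1, (HIDDEN_SIZE, VOCAB_SIZE))   # input -> hidden
W_hh = rng.normal(0, 0.1, (HIDDEN_SIZE, HIDDEN_SIZE))  # context -> hidden (recurrence)
W_ho = rng.normal(0, 0.1, (VOCAB_SIZE, HIDDEN_SIZE))   # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())               # normalized exponential, as in footnote 1
    return e / e.sum()

def step(word, next_word, hidden, W_ih, W_hh, W_ho, lr=0.1):
    """One step: sense `word`, predict the next word, then nudge the
    output weights with the delta rule (W_ho is updated in place)."""
    x = np.zeros(VOCAB_SIZE)
    x[word] = 1.0                          # localist input encoding
    hidden = sigmoid(W_ih @ x + W_hh @ hidden)   # equation (1)
    output = softmax(W_ho @ hidden)              # prediction of the next word
    t = np.zeros(VOCAB_SIZE)
    t[next_word] = 1.0                     # the observed event
    W_ho += lr * np.outer(t - output, hidden)    # delta rule: (t_i - a_i) a_j
    return hidden, output

hidden = np.zeros(HIDDEN_SIZE)
hidden, pred = step(3, 7, hidden, W_ih, W_hh, W_ho)
print(pred[7])   # probability the model assigned to the word that occurred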
Ecological character of connectionism. In what ways does the connectionist model address the previously outlined ecological critique of representationalist treatments?

Regarding the staticness of the generativist representations, the network model brings at least some improvement. At the level of the objects with which it is designed to deal (sentences), the network interacts with those objects in a temporal sequence that is similar to the sequence in which people interact with them in life: one word at a time, starting from the first word, going toward the last word. By embracing this particular dynamical aspect of speech, the network encodes syntactic invariances enmeshed with those "processing" invariants associated with temporal sequencing. Indeed, Christiansen and Chater (1999) have shown that the behavior of the network model diverges from the predictions of the simplest generative parsing theory in empirically prominent ways that closely track human performance (e.g., both networks and humans struggle with center-embedded structures such as "the dog the cat the rat chased bit died"). Generative models of parsing generally posit an additional mechanism ("memory load"; e.g., E. Gibson, 1998) to account for these facts. The network account, by being more direct, avoids this disjunction.

Although no network of this sort has yet been successfully trained to pick up on facts as subtle as the chicken–dinner vagaries illustrated previously, there is a significant sense in which the network approach opens the door to a treatment of the highly context-sensitive nature of natural language interpretation. Unlike the representationalist account, which begins with context-freeness and tries to back away from it as the data warrant, the network account begins with openness to arbitrary context sensitivity and then tries to mold its attention (though not categorically) to focus just on the most relevant facts in each context. For this reason, learning network models essentially never assign the same encoding to two different perceptions. Their internal codes are real-valued and sensitive to subtle properties of their environment, so the chance of two codes being identical is small (unless "design governs"—see reference to this later in this article). The network models are thus more naturally context sensitive.

Finally, as previously suggested, because it learns an encoding in the service of a simple functional task (predicting the future of its environment), the network introduces a degree of lawfulness that surpasses the lawfulness achieved by the generative program. In network terminology, both activation change and weight change are lawfully related to the properties of the network's environment. These correspond, respectively, to processing lawfulness and inductive lawfulness. Regarding the latter, it is reasonable to say that the network builds a "grammar" by doing the best it can to fit some rather flexible materials in its possession (its "hidden unit manifold") to the structure of the environmental invariants. How, one might ask, is this different from the post hoc inductive models, mentioned previously, which have been constructed on the basis of linguistic structural discoveries? If the discrete parameters of the generative models can have arbitrary form and be great in number, then the difference between symbolic, discrete-parameter models and dynamic, continuous-parameter models may be very subtle and hard to establish empirically. What is different is that the network model effectively derives the parameter list from a single, simple principle—the learning rule. Also, by addressing
induction in quantitative terms, the connectionist approach necessitates the development of specific distributional characterizations of the training data, so its analytical emphasis is much more evenly distributed between organism and environment than in the more mindcentric generative approach.² It can be said, then, that the network model improves on lawfulness by providing a simple principle for lawful learning, as well as for lawful behavior.

²It is true that I and many connectionist modelers are quite happy to take it as a goal of research to model "the mentation associated with an actual instance of comprehending." This perspective might seem to be at odds with that of J. Gibson (1979/1986) and many of his successors, who do not seek laws characterizing sequences of mental events. I think this apparent contrast is false. Gibsonians may not explicitly model mental events, but when they seek laws describing, for example, the affordances provided to a deer as opposed to a chipmunk by a downed tree, they are, in effect, characterizing animal-specific mental processes. The difference between the classical symbolic approach on the one hand, and the ecological and connectionist approaches on the other, lies not in whether mental events are accepted or rejected but in the qualities that are ascribed to them.

A challenge for the connectionist approach. The foregoing discussion suggests that connectionist language models address several of the ecological complaints about representationalist models of language. But there is, it must be noted, a substantial practical challenge for the connectionist approach. Connectionist models labor hard to learn the kinds of complex temporal dependencies illustrated in Example (1). Their temporal realism introduces a bias that makes the signals coming from short temporal dependencies louder than the signals coming from longer ones; moreover, their generalism with regard to possible dependencies implies that, with even a few tens of vocabulary items and a couple of intervening words, the correlational signals they are trying to detect in the longer, phrasal-dependency cases are shrouded in a haze of noise created by irrelevant potential dependencies, and this haze makes learning very difficult (Servan-Schreiber, Cleeremans, & McClelland, 1991). In other words, the very properties that make the models desirable on ecological grounds seem to fetter them when it comes to handling the complexities of natural language syntax.

Addressing the challenge. What can be done? In this section, I outline a method (described in detail in Tabor, 2000) for encoding languagelike complex dependencies in connectionist networks. The method, called dynamical automata or fractal grammars, skips over the problem of learning and focuses directly on encoding. Of course, this means it fails to fully meet the lawfulness desideratum identified previously, because it does not address induction. It also has a discrete symbolic bias—it is a method of creating idealized, context-free structures whose sentence-parsing abilities are exactly equivalent to those of the idealized symbol-manipulating grammars that form the backbone of generative linguistic theories. This is useful, I suggest, because it helps establish a set of bearing points in the wide sea of nonlinear systems that the connectionist networks are capable of embodying (Moore, 1998).
In a later section (Simulations), I bring the learning algorithm back in and reexamine it with the help of these bearing points. Encouragingly, the learning networks appear to converge on the types of encodings predicted by the dynamical automaton models. Moreover, there is a revealing alignment between symbolic and dynamical computation hierarchies.

Dynamical Automata

A major feature of the syntactic structure of natural language is its nested structure: There are many cases where a phrase occurs inside another phrase, including ones in which the embedded phrase is of the same type as the dominating phrase—recall (1). To think clearly about nested sequence structures, it is helpful to design a simple example. Consider a language, called Language 0, which has a prototype sentence consisting of the words a, b, c, and d in sequence. Suppose that after any word of Language 0, it is possible to insert an embedded instance of this prototype sentence. Thus a typical sentence would be something such as a b a b c d c a a b c d b c d d. Language 0 can be described by the context-free grammar shown in Table 1.

TABLE 1
Grammar 0, Which Defines Language 0

S → A B C D
A → a S    A → a
B → b S    B → b
C → c S    C → c
D → d S    D → d

Note. To generate a sentence, the grammar-interpreter starts with a rule of the form "S → …" and stores, in a special staging area of its memory, the sequence of symbols to the right of the →. It then attempts to replace each of these symbols following the rules of the grammar under the assumption that the symbol "→" means "can be replaced by." The process terminates whenever it reaches a point where no more replacements can be made. The sequence that lies in the staging area at that point is the generated sentence.
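To see the Note's replacement procedure in action, here is a minimal sketch (mine). Table 1 assigns no probabilities to the alternative expansions, so the embedding probability p_embed below is an illustrative assumption, set low enough that expected sentence length stays finite:

import random

# Grammar 0 (Table 1): S -> A B C D, and X -> x S | x for each (X, x) pair.
RULES = {"A": "a", "B": "b", "C": "c", "D": "d"}

def generate(p_embed=0.2):
    """Generate one sentence of Language 0 by repeated rule application.
    p_embed is the probability of choosing the recursive alternative
    (X -> x S); the value 0.2 is my choice, not Table 1's, made so that
    the expected number of embeddings per phrase (4 * 0.2) stays below 1."""
    out = []
    def expand_S():
        for _, terminal in RULES.items():
            out.append(terminal)
            if random.random() < p_embed:   # take the X -> x S alternative
                expand_S()                  # embedded prototype sentence
    expand_S()
    return " ".join(out)

print(generate())   # e.g., "a b a b c d c d"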
Table 1 describes a symbolic system for generating Language 0. A symbolic system computes by putting symbols in memory registers and following symbolic rules for manipulating the symbols. By contrast, a connectionist network computes by adjusting the real-valued activations of its nodes. The relation between the form of a symbolic system's symbols and the information they point to is arbitrary. By contrast, a connectionist network computes in a metric space, where distances between states are defined and nearby states predict similar behaviors; thus, the form of the network's encoding tends to bear a predictable relation to its content. How might a connectionist, metric-space computer be designed so it could easily produce all and only the strings of Language 0?

To keep track of the temporal dependencies in Language 0, it is necessary to keep track of each point at which an a b c d sequence was started but not finished. A symbolic machine uses a stack for this purpose—a list of the incomplete embeddings that need to be completed. Suppose this list uses the stack symbol A for an embedding under a; B for an embedding under b; and so forth. Then, for Language 0, the set of all possible stack states that can occur in the language is the set of all strings composed of 0 or more As, Bs, and Cs (a D stack symbol is not needed because d always ends a phrase). This set is called {A, B, C}*. To keep track of stack states, a metric-space computer needs to map each member of {A, B, C}* to a unique point in the metric space. Figure 2 shows one scheme for doing this in a two-unit neural network.

FIGURE 2  A useful way of mapping stack states for Language 0 to neural activation states. [figure not reproduced]

The two axes identify activation values of the two units. The points associated with stack states are points in a fractal set called the Sierpinski triangle—note that each stack state is at the midpoint of the hypotenuse of a triangle that is isomorphic to the whole set. Of course, there are (uncountably infinitely) many ways to map the members of {A, B, C}* to points in a connected metric space. The proposed way translates naturally into a connectionist encoding because (a) the infinite set of stack states lies in a bounded region—this helps because unit activations are bounded; (b) states that are prominently different in a sensory sense (the state of expecting b vs. the state of expecting c) are separable from one another by straight-line boundaries, or linearly separable (Minsky & Papert, 1969)—this helps because sigmoidal activation functions approximate linear separators; and (c) the larger scale shape of the trajectory associated with a particular phrase stays constant across levels of embedding (e.g., a b c is always "lower right to lower left to upper left" in Figure 2)—this helps because the structure of the separators can be used to make appropriate distinctions at all levels of embedding, provided the scaling is appropriately normalized. Table 2 shows how to cash these benefits in a connectionist encoding. The essence of the network is a two-element vector, z, corresponding to a position on the Sierpinski triangle (Barnsley, 1993). When z is in the subset of the plane specified in the Compartment column, the possible input
TABLE 2
Dynamical Automaton 0 (DA 0)

Compartment            Input    State Change
z1 > ½ and z2 < ½      b        z ← z − (½, 0)
z1 < ½ and z2 < ½      c        z ← z + (0, ½)
z1 < ½ and z2 > ½      d        z ← 2(z − (0, ½))
Any                    a        z ← ½ z + (½, 0)
words are those shown in the Input column. Given a compartment and a legal input for that compartment, the change in z that results from reading the input is shown in the State Change column. If we specify that the network must start with z = (½, ½), make state changes according to the rules in Table 2 as symbols are read from an input string, and return to z = (½, ½) (the Final Region) when the last symbol is read, then the computer functions as a recognizer for the language of Grammar 0—that is, the rules will bring it back to the starting point for all and only the sentences of Language 0. To see this intuitively, note that any subsequence of the form a b c d invokes the identity map on z. Thus Dynamical Automaton 0 (DA 0) is equivalent to the nested finite-state machine version of Grammar 0. I refer to Table 2 as a "connectionist encoding" because the formulas translate directly into an artificial neural implementation using standard connectionist devices (Tabor, 2000). Tabor also shows that the method illustrated in this example is sufficiently general that it can handle all nested phrase-structure dependencies.
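Table 2 translates directly into code. The following sketch (mine; exact rational arithmetic is used only to avoid floating-point drift) runs DA 0 as a recognizer: start at z = (½, ½), check that each symbol is legal in the current compartment, apply its state change, and accept if the final state is back at (½, ½):

from fractions import Fraction as F

HALF = F(1, 2)

def legal_inputs(z1, z2):
    """Symbols legal in the current compartment (Table 2); 'a' is legal anywhere."""
    if z1 > HALF and z2 < HALF:
        return {"a", "b"}
    if z1 < HALF and z2 < HALF:
        return {"a", "c"}
    if z1 < HALF and z2 > HALF:
        return {"a", "d"}
    return {"a"}

def accepts(string):
    """DA 0 as a recognizer for Language 0 (symbols separated by spaces)."""
    z1, z2 = HALF, HALF                              # start state
    for symbol in string.split():
        if symbol not in legal_inputs(z1, z2):
            return False
        if symbol == "a":
            z1, z2 = HALF * z1 + HALF, HALF * z2     # z <- z/2 + (1/2, 0)
        elif symbol == "b":
            z1, z2 = z1 - HALF, z2                   # z <- z - (1/2, 0)
        elif symbol == "c":
            z1, z2 = z1, z2 + HALF                   # z <- z + (0, 1/2)
        else:  # "d"
            z1, z2 = 2 * z1, 2 * (z2 - HALF)         # z <- 2(z - (0, 1/2))
    return (z1, z2) == (HALF, HALF)                  # the Final Region

print(accepts("a b c d"))                            # True
print(accepts("a b a b c d c a a b c d b c d d"))    # True (the sample sentence)
print(accepts("a b d c"))                            # False

Note how the contractive map for a pushes the state one level down the Sierpinski triangle (an embedding is opened), while the expansive doubling on d rescales back up (the embedding is closed); the balance of the two is what lets one trajectory track unboundedly many stack states in a bounded region.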
Lyapunov Analysis

Having used representational conceptions to design dynamical systems (neural networks) for processing phrase-structure languages, I wanted to see whether self-organizing (learning) neural networks were, in fact, creating similar encodings when faced with unbounded nesting languages. But because the networks need to work with a few more than two dimensions (otherwise learning success is a long shot), it is not easy to compare dynamical automata to trained neural symbol processors simply by "looking at" their hidden unit encodings. A tool is needed.

Lyapunov characteristic exponents (Abarbanel, 1996; Oseledec, 1968) are useful for characterizing the complexity of dynamical processes (processes such as walking, seeing, haptically exploring, or, in this case, understanding and producing language). Many dynamical systems have attractors—states that the system tends toward over time. Attractors can be single, static system states (such as the hanging-straight-down state of a pendulum). They can also be dynamic—for example, an easily sustained rhythmic gait in a walking organism. I suggest that familiar phrasal sequences (e.g., determiner-adjective-noun, noun_phrase-verb-noun_phrase, etc.) are associated with trajectories on a dynamic attractor that underlies language processing. There are two main classes of dynamic attractors: attractive limit cycles—bounded dynamic sequences that repeat—and chaotic attractors, which are bounded like limit cycles but move around so wildly (and unpredictably) that they never repeat. Lyapunov exponents measure the average rate of contraction (or, equivalently, divergence) of system states near the attractor of a dynamical system. In deterministic dynamical systems, Lyapunov exponents can be used to classify an attractor as repeating or chaotic: The maximal Lyapunov exponent on a limit cycle is negative; the maximal exponent on a chaotic attractor is positive. Thus, by measuring Lyapunov exponents, one can discover qualitative distinctions between dynamical systems.

I suspect that natural languages have complexity like that of chaotic attractors (and thus positive Lyapunov exponents). One of the main points of this article is to suggest that linguistic theory's grammars are associated with dynamical systems that have Lyapunov exponents equal to 0. This puts them right on the border between repeating processes and unpredictable ones. Though linguistic processes probably do not live right on the border, my suspicion is that they live near it, because they show a lot of similarity to the cases on the border. Thus, understanding how the cases on the border work may be a helpful step toward understanding how natural languages work.

The standard definition of Lyapunov exponents applies only to deterministic dynamical systems. The Appendix describes an extension to symbol-driven stochastic dynamical systems, which I use in the following analyses. Under this extended definition, dynamical automata that process context-free languages have Lyapunov exponents equal to 0. The next section reports measurements of Lyapunov exponents of self-organizing (learning) connectionist networks trained to generate languages defined by symbolic processes. I tested the hypothesis that the learning networks trained on phrasal-embedding languages would exhibit 0 Lyapunov exponents.

Zero Lyapunov exponents in connectionist networks trained on context-free languages would be of interest for several reasons. First, they would indicate that the self-organizing connectionist models develop the same fractal organization as the preprogrammed dynamical automata, and thus that insight into the easily understood dynamical automata can be extended to poorly understood connectionist models. Second, they would indicate a correspondence between an important dividing line in the realm of dynamical systems (zero Lyapunov exponents) and an important dividing line in the realm of symbolic computers (infinite-state Turing machines, which lie between finite-state machines and nonrecursive devices). Third, the "edge of chaos" (where Lyapunov exponents are 0) has been identified as important for living systems on the basis of rather different considerations. For example, Alexander and Globus (1996) argue that brains have a recursive cellular organization that puts their dynamics on the edge of chaos, and Kauffman (1993) argues that optimal biological evolution occurs when systems are on the edge of chaos. The identification of a pervasive property of natural language syntax as chaos proximal would suggest that language theory might be incorporated into biology in a helpful new way and that linguistic analysis offers tools for understanding complex phenomena that might have application in other sciences.
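The Appendix's extension to symbol-driven stochastic systems is not reproduced here, but the deterministic case is easy to sketch. The following code (mine, a generic illustration rather than the measurement procedure used in the simulations) estimates the maximal Lyapunov exponent of a one-dimensional map as the trajectory average of log |f′(x)|, using the logistic map as a stand-in system that has both a limit-cycle and a chaotic regime:

import math

def lyapunov(f, df, x0, n=100_000, burn_in=1_000):
    """Estimate the maximal Lyapunov exponent of the 1-D map x -> f(x)
    as the trajectory average of log|f'(x)|. Negative values indicate
    contraction onto a fixed point or limit cycle; positive values, chaos."""
    x = x0
    for _ in range(burn_in):                 # let the trajectory settle on the attractor
        x = f(x)
    total = 0.0
    for _ in range(n):
        total += math.log(max(abs(df(x)), 1e-12))   # guard against log(0)
        x = f(x)
    return total / n

logistic = lambda r: (lambda x: r * x * (1 - x))
dlogistic = lambda r: (lambda x: r * (1 - 2 * x))

print(lyapunov(logistic(3.2), dlogistic(3.2), 0.4))   # about -0.92: period-2 limit cycle
print(lyapunov(logistic(4.0), dlogistic(4.0), 0.4))   # about +0.69 (ln 2): chaos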
SIMULATIONS

Which stack-based languages should be used to test the neural networks? Linguistic research has identified several patterns in natural languages that motivate the use of various kinds of stack memories. In this section, I introduce the training data by describing some of that research.

Linguistic–Computational Guidance

The syntactic patterns of natural language come in dizzying variety. Chomsky (1957) attempted to shine a beam on this feature of human-generated languages by describing a series of increasingly powerful computing mechanisms, now referred to as the Chomsky hierarchy (Table 3). About this series of devices, Chomsky posed the question: Which is the least powerful device that can encode all the order-of-morpheme patterns of each human language in the world?

TABLE 3
The Formal Language Classes of the Chomsky (1957) Hierarchy

Finite state         Finite number of states
Context free         Pushdown stack
Context sensitive    Linear bounded tape
Turing machine       Unbounded tape

A finite-state language is a set of element sequences that can be generated by a computer, called a finite-state machine, which only occupies a finite number of states. Each context-free language can be generated by a finite-state machine manipulating an unbounded pushdown stack (i.e., a first-in, last-out memory). Each context-sensitive language can be generated by a finite-state machine manipulating a memory that can be accessed in any order, provided the amount of memory needed by the device grows at most linearly with the length of the output. Each Turing language can be generated by a finite-state machine coupled with an any-access-order memory of unbounded (though still finite) size (Hopcroft & Ullman, 1979). This hierarchy provides a kind of crude map, which is useful for navigating among the syntactic patterns that occur in natural languages. I concur with Moore (1998) that the Chomsky hierarchy is an imperfectly designed tool and that it will be useful to replace it with a better apparatus. I will discuss part of the case for this claim below, but the hierarchy is not altogether useless, and it serves well for getting started.

Grammars

Counting. Chomsky (1957) argued for the need for (at least) stack-based computation (above finite state on the hierarchy) on the basis of constructions such as (6), where Si corresponds to some declarative sentence.
(6) a. If S1 then S2.
    b. If it is the case that if S1 then S2 then S3.

A less formalized example reveals that the structure of (6b) can be part of a normal-sounding sentence (7).

(7) If it is true that if we leave without telling Bunnie where we are going then she will ransack the apartment when she returns, then I'd say let's wait until she gets back before we head out.

Simplification and generalization of (6), with substitution of shorter symbols for longer, yields (8).

(8) If separator if separator if … then S1 then … then Sk, where the number of ifs = the number of thens = k.

Assuming the content of the Si's is arbitrary (a case of the simplifying context-freeness assumption), the embedding structure of this form is isomorphic to the embedding structure of the language {a^n b^n : n ∈ {1, 2, 3, …}} (i.e., the set of strings consisting of 1 or more a's followed by the same number of b's). As Rodriguez (2001) noted, a symbolic stack for generating this language functions simply as a counter—it counts the number of a's and remembers the count in order to determine how many b's to generate. This "counting language" is, in some sense, the simplest non-finite-state language. I chose it as the basis for the first network simulation. To design a generator with an embedding distribution similar to that of natural language, I used the probabilized version of a^n b^n shown in Row S1 of Table 4. In this language, sentences of embedding level i occur half as often as sentences of embedding level i − 1.

Palindromic. Typically, languages do not repeat the same elements over and over in the same sentence (as in the language a^n b^n just discussed). But on an abstract level, they use the same patterns over and over again, sometimes in rather elaborately nested configurations. For example, the transitive clause pattern (subject-verb-object) appears repeatedly in (9).

(9) The realization [1-Subj] that someone [2-Subj/3-Obj] she [3-Subj] had known [3-Verb] well but had not seen [3-Verb] for over twenty years was about to walk [2-Verb] into the room filled [1-Verb] Tia [1-Obj] with a kind of delicious dread.

Many subject–noun phrases can combine felicitously with a limited class of verbs. Simplifying considerably, we can approximate the dependencies with a list of preferred sequences (10).

(10) N1 V1 N2 V2
TABLE 4
Infinite-State Symbol-Generating Grammars

S1  Counting      S → S1 p;  S1 → a (S1) b
S2  Palindromic   S → S1 p;  S1 → S11/S12;  S11 → a (S1) b;  S12 → x (S1) y
S3  Interleaving  S → Wi wi p;  Wi → {n1, n2}*;  wi is hom(Wi)ᵃ

Note. A constituent in parentheses is present in half the instances and absent in the other half. Two or more constituents with slashes (/) between them split the instances equally among them. In Language S3, the probability of generating a string of length 2k + 1 was 1/2^k. The symbol p is an end-of-sentence marker.
ᵃhom(Wi) replaces n1 with v1 and replaces n2 with v2.
The nesting then creates patterns such as (11).

(11) N2 N1 V1 V2
     N2 N2 V2 V2
     N1 N2 N2 V2 V2 V1
     etc.

which, taken all ways, define the language WW′, where each W is a sequence of one or more N1s and N2s, and each W′ is the corresponding reversed sequence of V1s and V2s (see Row S2 of Table 4). This language is a type of palindrome language. In the realm of symbolic devices, a pushdown stack is the minimal device that is needed to keep track of the dependencies in this language. For the second simulation, I used a pushdown automaton to generate strings from the palindrome language. The probability of embedding was again equal to .5 wherever embedding was possible.

Interleaving. Harman (1963), perhaps inspired by the ubiquity of nested dependencies in English and other languages, promoted the thesis that a restrictive theory of grammar should use only (context-free) phrase-structure grammars for syntactic representation (see also Pullum & Gazdar, 1982). Claims such as his prompted linguists to search the languages of the world for patterns that could not be handled by context-free rules. In fact, several cases turned up, most of them involving structures along the lines of the Dutch sentence (12)
(Huybregts, 1976; example taken from Bresnan, Kaplan, Peters, & Zaenen, 1982; see also Savitch, 1987).

(12) … dat Jan Piet Marie de kinderen zag helpen laten zwemmen
     … that Jan Piet Marie the children saw help make swim
     '… that Jan saw Piet help Marie make the children swim.'

The subject–verb correlations in this sentence have the interleaved structure shown in (13).

(13) [diagram of the crossed subject–verb dependencies, Jan–zag, Piet–helpen, Marie–laten, de kinderen–zwemmen; figure not reproduced]
Again, approximating the informational dependencies with constraints on the sequencing of types, we consider the language WW′, where W is a sequence of nouns chosen from {N1, N2} and W′ is the corresponding sequence of verbs with the verbs in the same order as the nouns. I refer to this language as an interleaving language. A simple interleaving language with two types of Ns and two types of Vs is listed on Row S3 of Table 4. The existence of crossed-serial dependencies in Dutch and other languages was originally interpreted as evidence that natural languages lie higher on the Chomsky hierarchy than the context-free level. Following Moore (1998), I suggest a different interpretation: The Chomsky hierarchy is an imperfect framework. The simplest symbolic device that can keep track of the dependencies in an interleaving language is a queue automaton, which uses a first-in, first-out memory. The difference between pushdown stacks and queues is not a computational power difference but a difference in the kind of computational device involved. The fact that natural languages exhibit both kinds of structure is evidence that the theory of language needs to cut in at a more fundamental level. One response (not insightful) would be to list both stacks and queues in the array of devices available for the learners of natural languages to choose from when they set up a grammar. A better response, I suggest, is to define a more general computational framework and to let structure, in the sense of stacks and queues, develop emergently, in response to experience with the data. The following results indicate that the class of connectionist networks studied here constitutes such a framework.
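For concreteness, here is a sketch (mine) of generators for the three training languages of Table 4. The counting language nests one pair, the palindrome language nests two pairs (last in, first out), and the interleaving language repeats its noun order in the verbs (first in, first out). The continuation probability of .5 follows the text; the palindrome vocabulary follows examples (10) and (11), since Table 4's S2 definition is partially garbled in the source:

import random

def counting():
    """S1 -> a (S1) b: a^n b^n, with level i half as frequent as level i-1."""
    n = 1
    while random.random() < 0.5:        # embed again with probability .5
        n += 1
    return ["a"] * n + ["b"] * n + ["p"]

def palindrome():
    """W W': center-embedded dependencies over the pairs (n1, v1) and
    (n2, v2), as in examples (10)-(11); verbs appear in reverse order."""
    left, right = [], []
    while True:
        n, v = random.choice([("n1", "v1"), ("n2", "v2")])
        left.append(n)
        right.insert(0, v)              # pushdown-stack (last-in, first-out) order
        if random.random() >= 0.5:
            return left + right + ["p"]

def interleaving():
    """W hom(W): crossed-serial dependencies, as in the Dutch pattern (12)-(13);
    verbs repeat the nouns' order (first-in, first-out), length 2k + 1
    occurring with probability 1/2^k."""
    hom = {"n1": "v1", "n2": "v2"}
    nouns = [random.choice(["n1", "n2"])]
    while random.random() < 0.5:
        nouns.append(random.choice(["n1", "n2"]))
    return nouns + [hom[n] for n in nouns] + ["p"]

for gen in (counting, palindrome, interleaving):
    print(gen.__name__, " ".join(gen()))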
Control Cases

The dynamical automaton hypothesis predicts that if a neural network learns any language that can be efficiently characterized with a stack mechanism (either pushdown or queuelike), then the maximal Lyapunov exponent of the induced stochastic process should approximate 0. Because a learning network only converges on a perfect stack emulator in the limit (and is also limited by the precision of its implementation), measurements will produce values near 0, but not exactly 0. Thus, to test the dynamical automaton hypothesis, it is useful to construct a set of control cases against which quantitative comparisons can be made. When should the maximal exponent not be 0? If the learning algorithm functions conservatively, in the sense that it does not build an emergent device more complex than the task at hand requires, then all finite-state processes should lead to negative maximal exponents. Likewise, memoryless infinite-state processes should lead to negative maximal exponents. I examined a variety of such control cases.

Finite, memoryless. Within the finite-state languages, there is an even smaller class of languages that Chomsky did not include as a separate entry in his hierarchy—the finite languages. The sentences of a finite language can be listed in a finite-length list. For the first control case, I trained a network on the finite language consisting of the single sentence a b c (repeated over and over again throughout the training process; see Language C1 of Table 5).

Finite with memory. Crudely speaking, the difference between the finite-state and infinite-state languages on the Chomsky hierarchy is the inclusion of memory. But the memory of a stack or tape is a special kind of memory because its size is unbounded. There are many finite languages and finite-state languages that require the use of a memory too—a finite-length memory. One may wonder whether the presence of any correlational structure that requires the use of memory will induce a zero Lyapunov exponent. To address this question, I included a control network that was trained on the language consisting of the sentence a b a c (Language C2 in Table 5). In order to distinguish between the b and c events, the processor needs to remember the event preceding each a while simultaneously attending to the current symbol—a simple, finite-memory task.

TABLE 5
Finite-State Symbol-Generating Grammars

C1  Finite, memoryless          S → a b c
C2  Finite with memory          S → a b a c
C3  Infinite-state, memoryless  S → a (P1/P2) p;  P1 → a1 (P11/P12);  P2 → a2 (P21/P22);
                                P11 → a11 (P111/P112);  P12 → a12 (P121/P122);
                                P21 → a21 (P211/P212);  P22 → a22 (P221/P222);  …
C4  Finite-state with memory    S → (NP) v (NP) p;  NP → n (NP)

Note. A constituent in parentheses is present in half the instances and absent in the other half. Two or more constituents with slashes (/) between them split the instances equally among them. The symbol p is an end-of-sentence marker.

Infinite-state, memoryless. It is also possible to define an infinite-state language that requires no memory if one employs an infinite alphabet of symbols. Language C3 in Table 5 is such a language. This language has the structure of a set of lineages read randomly off an infinite-depth taxonomic tree starting at the root, with truncation probability always equal to .5. This language provides a valuable comparison to the palindrome language and the interleaving language because in all three cases, there is a hyperbolic (Zipf's law) relation between each state's frequency and its frequency rank (Zipf, 1949). One might, a priori, expect such 1/f structure in the frequency distribution to be associated with a chaotic or edge-of-chaos dynamical process. But because memory is not involved, the dynamical automaton hypothesis predicts limit-cycle dynamics and hence a negative maximal Lyapunov exponent.

Finite-state with memory. Early in the history of psycholinguistics, it was noted that probabilistic finite-state machines (also called finite-state Markov models) can provide good approximations of natural languages (Osgood & Sebeok, 1954). As control Language C4 (Table 5), I included the output of one such device, inspired by English compound noun and clause structure. This language contains sentences of unbounded length and requires information to be stored over unbounded time, but it only requires a finite number of states.

For all four of these control cases, the dynamical automaton hypothesis predicts limit-cycle dynamics and, hence, negative maximal Lyapunov exponents in the network solution.

Results

The simulations were run with the simulator Lens (Rohde, 2001). Each network was trained for 3,000,000 pattern presentations with learning rate 0.0001. The simulations used "Doug's momentum" (value 0.9), a method of avoiding overly radical adjustment of weights when the cost function is steep. The root mean squared error (RMSE) at the end of training for each type of network (computed with respect to the grammar-derived probabilities on an appropriate test corpus of 200 random sentences in each case) is shown in the column labeled RMSE in Table 6. Root mean squared error is the standard error measure reported in neural-network studies—it gives an approximate sense of how the network performed quantitatively with respect to the cost function it was trying to minimize, but it does not give a very good sense of the qualitative character of the results in cases such as the ones at hand, where correctness on specific structures is important.
To address qualitative performance, a particular word-to-word transition was defined as "correctly processed" if the vector of network activations on that transition was closer to the correct grammar-derived probability vector than it was to any other grammar-derived probability vector (Tabor & Tanenhaus, 1999). At the end of training, each network that processed embedded structures processed at least 69% of the transitions in doubly embedded sentences correctly, 88% of the transitions in singly embedded sentences correctly, and 98% of the transitions in matrix sentences correctly in a sample of 200 sentences, although performance was much better on the simplest case, a^n b^n (> 99.3% of all transitions correct down to 6 levels of embedding). Each network that processed data from the taxonomic grammar processed all sentences at least 4 words long correctly in a sample of 200 random sentences. Each finite-state-trained network processed all transitions correctly in a sample of 200 random sentences. The superior results for matrix sentences and finite-state grammars may reflect the finite-state bias of the random initial state of each network noted by Christiansen and Chater (1999) and Tiňo, Čerňanský, and Beňušková (2001). I note below, however, that at least in the case where the network did well on an infinite-state language (a^n b^n), the network cum learning algorithm may have an infinite-state bias.

Table 6 shows the average maximal Lyapunov exponent for each type of network. An analysis of variance (ANOVA) with language environment as a random factor indicated that the maximal exponents were less negative for the set of stack-trained networks than for the set of non-stack-trained networks, F(1, 5) = 15.66, p = .011. This result is encouragingly consistent with the dynamical automaton hypothesis. A second analysis showed that the maximal exponents for just the stack-trained networks were significantly less than 0. This result was expected because the networks appear to approximate infinite-state computation by building progressively more complex limit-cycle machines over the course of training. The limit cycles lead one to expect negative exponents; only in the limit of infinite training should the maximal exponent actually equal 0.
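The "correctly processed" criterion above can be stated compactly. A sketch (mine; the distance metric and the vectors are illustrative, because the text does not specify the metric):

import numpy as np

def correctly_processed(output, correct_vector, all_grammar_vectors):
    """A transition counts as correctly processed if the network's output
    is closer (here, in Euclidean distance) to the grammar-derived
    probability vector for this transition than to any other
    grammar-derived vector (after Tabor & Tanenhaus, 1999)."""
    d_correct = np.linalg.norm(output - correct_vector)
    return all(
        d_correct <= np.linalg.norm(output - np.asarray(v))
        for v in all_grammar_vectors
    )

# Illustrative check with made-up vectors over a 3-word vocabulary:
grammar_vectors = [np.array([0.5, 0.5, 0.0]), np.array([0.0, 0.0, 1.0])]
output = np.array([0.4, 0.45, 0.15])
print(correctly_processed(output, grammar_vectors[0], grammar_vectors))  # True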
TABLE 6
RMSE and Maximal Lyapunov Exponents for the Networks, Organized by Training Grammar

Label  Grammar                      RMSE     SD (RMSE)   Stack Memory?  Maximal Exponent  SD
S1     Counting                     0.05     0.02        Yes            –0.32             0.10
S2     Palindromic                  0.15     0.01        Yes            –0.31             0.20
S3     Interleaving                 0.14     0.01        Yes            –0.22             0.09
C1     Finite                       0.00014  0.0000039   No             –1.88             0.21
C2     Finite with memory           0.00029  0.000091    No             –1.44             0.24
C3     Infinite-state, memoryless   0.03     0.01        No             –1.74             0.19
C4     Finite-state with memory     0.013    0.003       No             –0.75             0.23

Note. RMSE = root mean squared error.
Structuralism and Causality

A related result sheds some new light on the discussion about the relation between discrete (or "symbolic") and dynamical computation. Carello et al. (1984) argued against a dualist treatment of these types (e.g., as advocated by Pattee, 1982) and stated that under their "strategy of elaborating continuous dynamics, the so-called discrete mode would be relieved of an explanatory role and relegated to the status of just one way (out of several or many ways) that a complex system might behave" (p. 237). "One way out of several" is an accurate description of the status of the discrete mode in the models at hand. The implementation of stacklike computation in the recurrent network, a dynamical computer, depends on the creation of a precise balance in the timing of the model's habitation of expansive and contractive regions of the hidden unit manifold. Tabor (2000) showed that in one parameterizable dynamical automaton, the cases of context-free computation (one kind of discrete-mode computation) are rare atolls in a wide sea of non-context-free behavior. Countability considerations indicate that these non-context-free behaviors must consist mostly of super-Turing processes; that is, they are outside the realm of the discrete mode.

When Carello et al. (1984) talked of relieving the discrete mode of its "explanatory role," I interpret them as rejecting the structuralist habit of discovering a pattern in a domain and describing a characterization of the pattern as an explanation. One wants to know why the pattern is there in a more ultimate sense. The present work attempts to probe more ultimate causes by examining language learning (inductive lawfulness). But one particular behavior of the learning mechanism at hand adds an interesting twist to the debate about the explanatory relevance of symbolic models. Tabor (2001) asked, How do networks such as those described above generalize when they are trained on just the most frequent sentences of one of the infinite-state probabilistic languages examined above? This question is a way of asking how the network goes beyond its input. The answer is that it shows a clear bias toward the infinite-state process from which the finite approximations were derived, even to the point of distorting its approximation of situations it has reliable experience with.

In Tabor (2001), I trained the network on the output of a series of finite-state grammars derived from the counting grammar mentioned previously. The first grammar had just the sentence a b; the second had, in addition, a a b b; the third added a a a b b b; and so forth. The probabilities of the sentences were the same as they were for the infinite-state counting language described above, except that the most deeply embedded sentence had twice the probability of its counterpart in the counting language and there was no possibility of continuing on after the last a. I trained networks such as those described above on this succession of grammars (200,000 words presented from each). I compared the performance of the network after each stage of training to the performance of a variable-length Markov model (VLMM; Ron, Singer, & Tishby, 1996) trained on the same data. VLMMs work on
SYMBOLIC COMPUTATION
43
a simple principle: Predict the future by finding the longest identical past to the past of the current state; if there are multiple longest identical pasts, take their mean as the current prediction.3 VLMMs are among the best finite-state methods of approximating natural language phenomena (Guyon & Pereira, 1995). Thus, comparing the network’s performance to that of the VLMM provided a way of asking how the network tended to diverge from optimal finite-state behavior. I tested the network and the VLMM on their predictions about new sentences with greater levels of embedding than those included in each training grammar. I used the infinite-state process as a standard. For all novel sentences, the network diverged from the VLMM in the direction of the infinite-state process: Its error with respect to the behavior of the infinite-state process was substantially lower than the error of the VLMM with respect to the same process (Figure 3). In fact, it even showed an infinite-statelike response to the most deeply embedded observed sentence, even though the training data provided a statistically reliable signal indicating finiteness—Figure 3. Tabor (2001) takes this result as suggesting that the infinite-state process may be a kind of attractor for the neural network: It tends to gravitate to it, even when the input only approximates it. The possibility that the simple, context-free grammar anbn is an attractor of the learning process has particular relevance to the question of causes. Imagine a flock of neural networks, such as these, providing the training signals for “younger” neural networks that are trying to learn the “language of their community.” If a particular grammar is an attractor in this sense, then it is plausible that the community would migrate toward that grammar across the generations. Such a model is, to be sure, rather unreal, because it is disembodied: There is no world that the language refers to; thus, for example, the language does not have a communicative function with respect to such a world. But it is not implausible that if extra linguistic reference were brought into the picture and it was useful to use context-free (or interleaved) embedded structures to describe the world at hand, then a lineage of network talkers such as these, with stacklike mental attractors, might well converge on one or another of the attractors over time. In this sense, the current model looks helpful for probing causes. Thus, the simple infinite-state systems that symbolic theories have identified as structurally relevant may also turn out to be relevant to a theory of the causes of the nature of language organization. To be sure, just identifying these states as important, as discrete-mode theorizing has done, is not adequate for understanding the causality involved. But it may be a helpful first step. The present study suggests that learning neural-network models may be able to take this development a step further. 3In natural language modeling work using real corpora, exhaustive exploration of identical paths is not feasible and so approximation methods are needed. In such contexts, VLMMs are thus associated with a specific strategy for employing long histories only where they help (Guyon & Pereira, 1995; Ron et al., 1996). Because the present study involves simpler, artificial languages, maximal matching contexts and their associated probabilities can be computed accurately in all cases, so no truncation is used.
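As a concrete illustration of this prediction rule, here is a minimal exact-matching sketch (the toy corpus and function names are mine, not Tabor’s simulation code). Pooling every occurrence of the longest matching past and normalizing the continuation counts amounts to the frequency-weighted mean described above:

    # Minimal sketch of the exact-matching VLMM prediction rule described
    # in the text. The corpus and names are illustrative assumptions.
    from collections import Counter

    def vlmm_predict(corpus, context):
        """Predict the next symbol by matching the longest suffix of
        `context` that occurs somewhere in `corpus`."""
        for k in range(len(context), -1, -1):
            suffix = context[len(context) - k:]
            # Every symbol that followed this suffix during training.
            followers = Counter(
                corpus[i + k]
                for i in range(len(corpus) - k)
                if corpus[i:i + k] == suffix
            )
            if followers:  # longest past with at least one continuation
                total = sum(followers.values())
                return {s: n / total for s, n in followers.items()}
        return {}

    # Toy data in the spirit of a finite approximation of a^n b^n.
    corpus = list("ab" + "aabb" + "ab" + "aaabbb" + "ab" + "aabb")
    print(vlmm_predict(corpus, list("aab")))  # {'b': 1.0}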
FIGURE 3 Generalization behavior of the $a^n b^n$ model when it is trained on successively longer strings. Subgraph i corresponds to the training grammar containing sentences with embedding levels 1 … i. Each subgraph is stratified into 10 levels of embedding (x axis). The curves marked 0 show the average Kullback–Leibler divergence per word between the VLMM and the infinite-state process. The curves marked X show the average divergence per word between the network and the infinite-state process. The shaded regions indicate the depths of embedding on which the network and VLMM were trained.
CONCLUSIONS

In a nutshell, this article has developed the following argument: Generative linguistic models of sentence-level structure in natural languages have several properties that ecological psychologists have rightly criticized in representationalist models in general: staticness, context-freeness, and lack of a lawful basis. Connectionist learning models offer an alternative that is more dynamic, desirably context sensitive, and explicit about the laws relating the organism’s state to its environment. But it has not been clear how connectionist learning devices can handle the complex temporal patterns that characterize natural language syntax. For this problem, representationalist mechanisms are manifestly useful. In earlier work (Tabor, 2000), I described a method (dynamical automata–fractal grammars) of translating the principles of their successes into the encoding framework in which connectionist models operate. The present study tested connectionist learning of complex, languagelike processes to see if they shared an important feature with the dynamical automata for such languages—zero Lyapunov exponents, or balance between habitation of contractive and expansive regions. Indeed, the results suggest that the learning connectionist models converge on representations with the same kind of fractal organization as the corresponding dynamical automata. Moreover, under similar learning conditions, Tabor (2001) found evidence suggesting that the infinite-state fractal computers are attractors of the learning system.

These results suggest a potentially insightful way of aligning discrete and dynamical computation. Limit cycles correspond to finite-state devices. Edge-of-chaos processes correspond to stack-based mechanisms. The fact that a Turing machine can be built with three stacks suggests that all recursively enumerable computations may lie on the edge of chaos in the current sense. I speculate that chaotic processes correspond to so-called super-Turing computation (Siegelmann, 1999). The results thus locate one of the core entities of discrete computation (stack mechanisms) at the heart of dynamical computation as well. This insight shows promise of helping us figure out how to understand connectionist learning of the complex dependencies of natural language syntax. For example, one of the first insightful analyses of connectionist learning of complex languages, Rodriguez (2001), succeeds by training networks and then building similar dynamical automata. If, as Tabor (2001) suggests, the stack-based computers are attractors of the dynamical learning mechanism, then they may also play a central role in understanding what causes languages to be organized into constituent structures.

The value, then, of symbolic, representationalist objects is that they provide important bearing points for exploring complex dynamical learning systems, a psychologically appealing class of models. They are not by themselves, as the orthodox theories hold, good models of human mental states. The problem is that if one restricts one’s ontological prospect to just the bearing points, then one is helpless when it comes to navigation in the surrounding space. Such navigation is very helpful for addressing inductive lawfulness, as the connectionist models demonstrate. But having some bearing points for the navigation is also a good thing.

In the introduction, I characterized a lawful theory, for present purposes, as one that provides a complete and coherent account of how mental states and environments coevolve at the timescale of moment-to-moment experience. When lawfulness is the topic, ecological psychologists are inclined to emphasize “specificational” lawfulness—illustrated, for example, by the relation between a physical situation and the optic flow pattern that it produces (Turvey & Carello, 1985). Specificational lawfulness contrasts with “indicational” lawfulness, which characterizes, among other things, the relation between a linguistic symbol and the meaning to which it is conventionally linked. If one assumes that linguistic objects (by which I mean actually occurring phonemes, morphemes, phrases, etc.) are to be treated as symbols and are thus of a fundamentally different ilk from other objects in the world (e.g., sidewalks, doors, and chairs), then it would appear that indicational lawfulness is fundamentally different from specificational lawfulness. Indeed, linguistic understanding seems, on initial observation, to require a different kind of computation from the kind that works well for detecting kinematically relevant invariants in rays of light.

But what the connectionist models discussed here suggest is that linguistic understanding may not operate so differently after all. At the basic level, it may depend on the same necessity of establishing invariants that support effective action. It may require the same enfolded organization of metric-space computation, rather than the disconnected memory structure of Turing machines. The suggested similarity is supported by the fact that connectionist models operating on very similar principles to the ones described here have been used successfully to characterize nonlinguistic visual, olfactory, and auditory domains that appear to involve specification in the classical Gibsonian sense (Arbib, 1995).

Convention seems strange in comparison to something from classical physics (e.g., gravity) because of the prominent role of arbitrariness. But it seems less strange in comparison to biological variety (Millikan, 2001), and there are good reasons to believe that biological variety is founded on a specificational relation between each organism’s environment and its nature (Thompson, 1917/1992). The prominence of linguistic arbitrariness may reflect the fact that we live on such intimate terms with linguistic conventions. The shift of focus from language processing to language induction makes the conventional features less distracting because they become small bits of material that form the substance of a larger, law-governed whole.

I conclude with four ideas:
• It might be helpful to study connectionist models of phenomena, such as haptic perception, dynamic touch, locomotion, and so forth, in order to situate these in the computational (as well as the dynamical) framework outlined here.

• Representationalist entities might play a useful role in the ecological study of nonlinguistic perception if they were viewed as conceptual bearing points rather than as sufficient models of the phenomena. For example, perhaps there are learned schemata of limb movement that could so serve.

• The notion that symbolic objects might be attractors of mental dynamics suggests a new species of observer’s paradox: Why are representationalist models so appealing to many researchers? Perhaps the conceptions they identify are familiar because our mental states are likely to spend time near them. Why is it hard to examine the bigger picture, even when we can model it? Perhaps it is because the nonsymbolic states are mental transients that we experience only fleetingly.

• Symbolic objects as attractors do, I believe, help probe the question of causality, but they also raise an interesting question about the proper approach to explanation from an ecological perspective: Are the stable states there because they afford us a selective advantage in the world (e.g., fractal language is useful for talking about the recursive structure of nature)? Or are they there because our minds and the world are traveling in a kind of abstract, information-theoretic reality, and the observed topologies are the (objective) nature of that reality? The notion of universal “laws of information” favors the latter view.
ACKNOWLEDGMENTS

This research was partly supported by University of Connecticut Research Foundation Grant FRS 444038. Thanks to Chaopeng Zhou for discussions and help with running the simulations. Many thanks to Carol Fowler, Claire Michaels, William Mace, and Guy Van Orden for helpful feedback on earlier drafts of this article.
REFERENCES

Abarbanel, H. D. I. (1996). Analysis of observed chaotic data. New York: Springer-Verlag.
Alexander, D. M., & Globus, G. G. (1996). Edge-of-chaos dynamics in recursively organized neural systems. In E. MacCormac & M. I. Stamenov (Eds.), Fractals of brain, fractals of mind: In search of a symmetry bond (pp. 31–73). Amsterdam: Benjamins.
Arbib, M. A. (1995). The handbook of brain theory and neural networks. Cambridge, MA: MIT Press.
Barnsley, M. (1993). Fractals everywhere (2nd ed.). Boston: Academic.
Bresnan, J. (1982). The mental representation of grammatical relations. Cambridge, MA: MIT Press.
Bresnan, J., Kaplan, R. M., Peters, S., & Zaenen, A. (1982). Cross-serial dependencies in Dutch. Linguistic Inquiry, 13, 613–635.
Carello, C., Turvey, M. T., Kugler, P. N., & Shaw, R. E. (1984). Inadequacies of the computer metaphor. In M. S. Gazzaniga (Ed.), Handbook of cognitive neuroscience (pp. 229–248). New York: Plenum.
Charniak, E. (1993). Statistical language learning. Cambridge, MA: MIT Press.
Chomsky, N. (1957). Syntactic structures. The Hague, The Netherlands: Mouton.
Chomsky, N. (1981). Lectures on government and binding. Dordrecht, The Netherlands: Foris.
Christiansen, M. H. (1994). Connectionism, learning, and linguistic structure. Unpublished doctoral dissertation, Department of Linguistics, University of Edinburgh, Scotland.
Christiansen, M. H., & Chater, N. (1999). Toward a connectionist model of recursion in human linguistic performance. Cognitive Science, 23, 157–205.
Crutchfield, J. P. (1994). The calculi of emergence: Computation, dynamics, and induction. Physica D, 75, 11–54.
Dennett, D. C. (1978). Brainstorms: Philosophical essays on mind and psychology. Cambridge, MA: MIT Press.
Elman, J. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Elman, J. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7, 195–225.
Elman, J. (1995). Language as a dynamical system. In R. Port & T. van Gelder (Eds.), Mind as motion: Explorations in the dynamics of cognition. Cambridge, MA: MIT Press.
Frege, G. (1952). On sense and reference. In P. T. Geach & M. Black (Eds.), The philosophical writings of Gottlob Frege (pp. 56–78). Oxford, England: Basil Blackwell. (Original work published 1892)
Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68, 1–76.
Gibson, E., & Wexler, K. (1994). Triggers. Linguistic Inquiry, 25, 407–454.
Gibson, J. J. (1986). The ecological approach to visual perception. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc. (Original work published 1979)
Guyon, I., & Pereira, F. (1995). Design of a linguistic postprocessor using variable memory length Markov models. In International Conference on Document Analysis and Recognition (pp. 454–457). Montreal, Quebec, Canada: IEEE Computer Society Press. (Available at www.clopinet.com/isabelle/Papers/index.html)
Harman, G. H. (1963). Generative grammars without transformational rules. Language, 39, 597–616.
Haykin, S. S. (1994). Neural networks: A comprehensive foundation. New York: Macmillan.
Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. Redwood City, CA: Addison-Wesley.
Hopcroft, J. E., & Ullman, J. D. (1979). Introduction to automata theory, languages, and computation. Menlo Park, CA: Addison-Wesley.
Huybregts, M. A. C. (1976). Overlapping dependencies in Dutch. Utrecht Working Papers in Linguistics, 1, 24–65.
Hyams, N. M. (1986). Language acquisition and the theory of parameters. Dordrecht, The Netherlands: Reidel.
Joshi, A. K., & Schabes, Y. (1996). Tree-adjoining grammars. In G. Rozenberg & A. Salomaa (Eds.), Handbook of formal languages (Vol. 3, pp. 69–123). New York: Springer-Verlag.
Kamp, H., & Reyle, U. (1993). From discourse to logic: Introduction to model-theoretic semantics of natural language, formal logic, and discourse representation theory. Dordrecht, The Netherlands: Kluwer Academic.
Kauffman, S. (1993). The origins of order: Self-organization and selection in evolution. Oxford, England: Oxford University Press.
Lasnik, H. (1990). Essays on restrictiveness and learnability. Dordrecht, The Netherlands: Kluwer Academic.
Millikan, R. G. (2001). Purposes and cross-purposes: On the evolution of languages and language. Unpublished manuscript, Department of Philosophy, University of Connecticut.
Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge, MA: MIT Press.
Montague, R. (1974). English as a formal language. In R. H. Thomason (Ed.), Formal philosophy: Selected papers of Richard Montague (pp. 108–221). New Haven, CT: Yale University Press. (Original work published 1970)
Moore, C. (1998). Dynamical recognizers: Real-time language recognition by analog computers. Theoretical Computer Science, 201, 99–136.
O’Reilly, R. C., & Munakata, Y. (2000). Computational explorations in cognitive neuroscience. Cambridge, MA: MIT Press.
Oseledec, V. I. (1968). A multiplicative ergodic theorem: Lyapunov characteristic numbers for dynamical systems. Trudy Moskovskogo Matematicheskogo Obshchestva, 19, 197.
Osgood, C., & Sebeok, T. (1954). Psycholinguistics: A survey of theory and research problems. Journal of Abnormal and Social Psychology, 49(4, Pt. 2), 1–203.
Pattee, H. H. (1982). The need for complementarity in models of cognitive behaviors: Response to Carol Fowler and Michael Turvey. In W. Weimer & D. Palermo (Eds.), Cognition and the symbolic processes (Vol. 2). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Pollard, C., & Sag, I. A. (1994). Head-driven phrase structure grammar. Chicago: University of Chicago Press.
Pullum, G. K., & Gazdar, G. (1982). Natural languages and context-free languages. Linguistics and Philosophy, 4, 471–504.
Rodriguez, P. (2001). Simple recurrent networks learn context-free and context-sensitive languages by counting. Neural Computation, 13, 2093–2118.
Rohde, D. (2001). Lens, the light, efficient network simulator [Computer software]. Retrieved from http://www.cs.cmu.edu/dr/Lens/
Rohde, D., & Plaut, D. (1999). Language acquisition in the absence of explicit negative evidence: How important is starting small? Cognition, 72, 67–109.
Ron, D., Singer, Y., & Tishby, N. (1996). The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning, 25, 117–149.
Rumelhart, D., Durbin, R., Golden, R., & Chauvin, Y. (1995). Backpropagation: The basic theory. In Y. Chauvin & D. Rumelhart (Eds.), Backpropagation: Theory, architectures, and applications (pp. 1–34). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Rumelhart, D. E., McClelland, J. L., & PDP Research Group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1). Cambridge, MA: MIT Press.
Savitch, W. J. (Ed.). (1987). The formal complexity of natural language. Norwell, MA: Kluwer.
Servan-Schreiber, D., Cleeremans, A., & McClelland, J. L. (1991). Graded state machines: The representation of temporal contingencies in simple recurrent networks. Machine Learning, 7, 161–193.
Siegelmann, H. T. (1999). Neural networks and analog computation: Beyond the Turing limit. Boston: Birkhäuser.
Tabor, W. (1994). Syntactic innovation: A connectionist model. Dissertation Abstracts International, 55(01), 3178A.
Tabor, W. (2000). Fractal encoding of context-free grammars in connectionist networks. Expert Systems: The International Journal of Knowledge Engineering and Neural Networks, 17, 41–56.
Tabor, W. (2001). Lyapunov characteristic exponents of discrete stochastic neural networks. Unpublished manuscript, University of Connecticut.
Tabor, W., Juliano, C., & Tanenhaus, M. (1997). Parsing in a dynamical system: An attractor-based account of the interaction of lexical and structural constraints in sentence processing. Language and Cognitive Processes, 12, 211–271.
Tabor, W., & Tanenhaus, M. K. (1999). Dynamical models of sentence processing. Cognitive Science, 23, 491–515.
Tanenhaus, M., Carlson, G., & Trueswell, J. (1989). The role of thematic structures in interpretation and parsing. Language and Cognitive Processes, 4, SI 211–234.
Thompson, D. W. (1992). On growth and form. New York: Dover. (Original work published 1917)
Tiňo, P., Čerňanský, M., & Beňušková, Ľ. (2001). Markovian architectural bias of recurrent neural networks. Unpublished manuscript, Neural Computing Research Group, Aston University, Birmingham, England.
Turvey, M. T., & Carello, C. (1981). Cognition: The view from ecological realism. Cognition, 10, 313–321.
Turvey, M. T., & Carello, C. (1985). The equation of information and meaning from the perspectives of situation semantics and Gibson’s ecological realism. Linguistics and Philosophy, 8, 81–90.
van Eijck, J., & Kamp, H. (1996). Representing discourse in context. In J. van Benthem & A. ter Meulen (Eds.), Handbook of logic and linguistics (pp. 179–237). Oxford, England: Elsevier.
Von Bremmen, H., Udwadia, F. E., & Proskurowski, W. (1997). An efficient method for the computation of Lyapunov numbers in dynamical systems. Physica D, 110, 1–16.
Wolf, A., Swift, J., Swinney, H., & Vastano, J. (1985). Determining Lyapunov exponents from a time series. Physica D, 16, 285–317.
Zipf, G. K. (1949). Human behavior and the principle of least effort: An introduction to human ecology. Cambridge, MA: Addison-Wesley.
APPENDIX: LYAPUNOV EXPONENTS FOR SYMBOL-DRIVEN STOCHASTIC DYNAMICAL SYSTEMS

Let

$$\vec{x}_{t+1} = f(\vec{x}_t) \tag{A1}$$
be a discrete, deterministic dynamical system with an $n$-dimensional state $\vec{x}$. The Lyapunov exponents $\lambda_i$, for $i = 1, \ldots, n$, of the trajectory starting at $\vec{x}$ are the logarithms of the eigenvalues of the matrix

$$\mathrm{OSL}(\vec{x}) = \lim_{t \to \infty} \left[ (Df_t)^{T} (Df_t) \right]^{\frac{1}{2t}} (\vec{x}) \tag{A2}$$
where $T$ denotes transpose, $Df(\vec{x})$ is the Jacobian of $f$ at $\vec{x}$, and

$$Df_t(\vec{x}) = D\left[ f_t \circ f_{t-1} \circ \cdots \circ f(\vec{x}) \right] = Df(\vec{x}_t) \cdot Df(\vec{x}_{t-1}) \cdots Df(\vec{x}) \tag{A3}$$
For $\vec{x}$ in the basin of a single attractor, the values of the eigenvalues are essentially independent of the choice of $\vec{x}$, so we may speak of the Lyapunov exponents of the attractor. From another perspective, the $i$th eigenvalue measures the average rate of growth of the $i$th principal axis of infinitesimal ellipses surrounding points on the attractor (Wolf, Swift, Swinney, & Vastano, 1985). The sum of the Lyapunov exponents indicates the global stability of the system: The sum must be negative for the system to have Lyapunov stability. If all the exponents are negative, then the system is a limit cycle and visits only finitely many points. If at least one exponent is positive (and the sum is negative), then the system is chaotic. The case in which the greatest Lyapunov exponent is 0 in a discrete system is a special case that can yield complex behavior (famously for the logistic map at the “edge of chaos”; Crutchfield, 1994).
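In the simplest, one-dimensional setting, definition (A2) reduces to the long-run average of $\ln |f'(x_t)|$ along a trajectory, which can be estimated numerically. The sketch below, with illustrative parameter values rather than values from this article, recovers the three regimes just described for the logistic map:

    # Estimate the Lyapunov exponent of the logistic map f(x) = r*x*(1-x)
    # as the long-run average of ln|f'(x_t)|, the 1-D special case of (A2).
    # Parameter values are illustrative.
    import math

    def logistic_lyapunov(r, x0=0.3, burn_in=1_000, steps=100_000):
        x = x0
        for _ in range(burn_in):                      # discard the transient
            x = r * x * (1 - x)
        total = 0.0
        for _ in range(steps):
            total += math.log(abs(r * (1 - 2 * x)))   # ln |f'(x)|
            x = r * x * (1 - x)
        return total / steps

    print(logistic_lyapunov(3.2))        # negative: limit cycle
    print(logistic_lyapunov(4.0))        # positive: chaos (about ln 2)
    print(logistic_lyapunov(3.5699456))  # near 0: the "edge of chaos"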
Definition (A2) can be extended to symbol-driven stochastic dynamical systems (such as the neural networks discussed previously) as follows: Because of the nondeterminism, we have to let the definition of the eigenvalues depend not only on the initial condition but also on the specific random sequence of inputs, $S$, which the autonomously functioning network chooses:

$$\mathrm{OSL}(\vec{x}, S) = \lim_{t \to \infty} \left[ (Df_t)^{T} (Df_t) \right]^{\frac{1}{2t}} (\vec{x}) \tag{A4}$$
In Tabor (2001), I provide simulation evidence that this particular extension of the definition of Lyapunov exponents to stochastic systems is useful in the sense that the logarithms of the eigenvalues of this matrix are constants for almost all $\vec{x}$ and corresponding sequences $S$. In other words, the generalized Lyapunov exponents defined by (A4) and the standard Lyapunov exponents defined for deterministic systems are invariants with respect to the same types of sets. This suggests that they provide a stable characterization of the stochastic system’s expansion and contraction behavior. Encouraged by such results, I use these measures here as a way of characterizing and comparing various stochastic dynamical systems (dynamical automata and learning neural networks). I conjecture also that the Lyapunov exponents of the stochastic dynamical systems provide the same kind of classificatory information about the dynamical regimes of the stochastic systems that the Lyapunov exponents of deterministic systems provide for deterministic systems: Negative exponents indicate limit cycles; a negative sum of exponents indicates a dissipative system; positive exponents in a dissipative system indicate chaos; and so forth (see Abarbanel, 1996).

In the case of pushdown dynamical automata such as DA 0 above, the characteristic exponents can be calculated exactly. If DA 0 starts at $\vec{z} = (\tfrac{1}{2}, \tfrac{1}{2})$, then every trajectory produces a string with an equal number of as, bs, cs, and ds. The Jacobian for the b and c transitions is $I$, the identity matrix. For the a transitions, it is

$$\begin{pmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{pmatrix},$$

and for the d transitions, it is

$$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.$$

Therefore, the Oseledec matrix is $I$ in the limit and both Lyapunov exponents are 0. A similar analysis applies anytime a fractal set is used to keep track of a stack memory, where well-formedness maps to an empty stack. In the general case, there must be at least one dimension of fractal scaling, and that dimension will produce a zero Lyapunov exponent. The Lyapunov values reported in the main body of the paper were calculated using the algorithm of Von Bremmen, Udwadia, and Proskurowski (1997).
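The exact argument for DA 0 is easy to check numerically. In the sketch below, the diagonal Jacobian entries are taken from the analysis above; the sample strings are illustrative legal strings with equal numbers of as, bs, cs, and ds:

    # Numeric check that DA 0's Lyapunov exponents vanish: bs and cs
    # contribute the identity, and the a (contractive) and d (expansive)
    # Jacobians cancel over any legal string. Jacobian values follow the
    # analysis above; the strings are illustrative.
    import math

    JACOBIAN_LOG = {
        "a": math.log(0.5),   # diag(1/2, 1/2): stacklike push
        "d": math.log(2.0),   # diag(2, 2): stacklike pop
        "b": 0.0,             # identity
        "c": 0.0,             # identity
    }

    def lyapunov_exponent(string):
        """Average log stretching rate along the trajectory. Both
        dimensions share one exponent because every Jacobian here is a
        scalar multiple of the identity."""
        return sum(JACOBIAN_LOG[s] for s in string) / len(string)

    for s in ["abcd", "aabbccdd", "aaabbbcccddd"]:
        print(s, lyapunov_exponent(s))   # 0.0 (up to rounding) in each case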
ECOLOGICAL PSYCHOLOGY, 14(1–2), 53–86 Copyright © 2002, Lawrence Erlbaum Associates, Inc.
Fractal Characteristics of Response Time Variability

John G. Holden
Department of Psychology
California State University, Northridge
The relations between English spellings and pronunciations have been described as a fractal pattern. Manipulations of word properties are constructed to coincide with the fractal pattern of ambiguity in these relations (sampled as random variables). New word naming and lexical decision experiments replicate previously established effects of relations between word spellings and pronunciations. The structure of these relations establishes a measurement scale, which is nested in a manner that is loosely analogous to the way that centimeters are nested within meters, and so on. The experiments revealed that broader, increasingly stretched—more variable—distributions of pronunciation and response time accompany the introduction of higher resolution measurement scales to the naming and lexical decision measurement process. Moreover, the distribution of lexical decision response times is shown to obey an inverse power-law scaling relation, which implies lexical decision does not conform to a characteristic measurement scale.
Fractal patterns in nature are composed of nested forms, which cannot be measured on a single scale of measurement. The result of a measurement will depend on the size of the increment, or scale, used in the process of measurement (Bassingthwaighte, Liebovitch, & West, 1994; Mandelbrot, 1983; Schroeder, 1991). For example, the measured length of the British coastline will increase proportionally if the “yardstick” is shortened from kilometers to meters. An even shorter centimeter measurement scale would result in a further proportional increase in the measured length of the coastline. The changing measurements result from using regular line segments, the yardstick, to approximate the nested self-similar structure of the coastal bays and peninsulas. The length measurement increases when a bay or peninsula that is not captured at a lower resolution adds length at a higher resolution. Thus, as smaller and smaller sub-bays and subpeninsulas are resolved, they add to the length of the coastline.

Requests for reprints should be sent to John G. Holden, Department of Psychology, California State University, Northridge, 18111 Nordhoff Street, Northridge, CA 91330–8255. E-mail: jay.holden@csun.edu

When measurements change as a function of measurement scale, there is no “true” or characteristic value for the measurement. The length of the British coastline grows as a power function of the precision of the yardstick used to measure length (plus a constant). This power-law scaling relation implies that the results of a measurement depend on the measurement scale, or sampling unit, used to take the measurement (over a finite range of scales). Power-law scaling relations, linear relations between the logarithm of the scale and the logarithm of the measurement result, are commonly observed in natural phenomena described using fractal geometry and are symptomatic of self-similar patterns (Bassingthwaighte et al., 1994). In mathematical fractals, this nonlinear, multiplicative measurement amplification can appear at an infinite range of scales, a precise form of self-similarity. In natural fractals, the most common form of self-similarity appears to be statistical self-similarity. Nevertheless, the result of a measurement of a natural fractal is also amplified in proportion to the measurement scale.

Two naming experiments follow that measure variability in pronunciation times, the time to initiate a pronunciation from a word’s spelling. The experiments contrast words that have nested sources of ambiguity in the relations between their spellings and pronunciations. Ambiguity is conceived here as abstractly parallel to the nested structure of a coastline. A coastline is jagged at several scales: Large bays nest smaller bays within them, and so on. Likewise, the pronunciations of words’ spellings are ambiguous at several scales, as I explain in the next section. Just as length accumulates as a scale gets access to successively nested bays, variability in pronunciation time is expected to grow as ambiguity manipulations access the nested ambiguity in the relation between words’ spellings and pronunciations.

Similarly, two lexical decision experiments measure the time to recognize a word as a word—the time to respond that a spelling pattern is a legitimate English word. In this case, ambiguity appears at different resolutions in the relations between words’ pronunciations and their spellings—now emphasizing the inverse mapping from pronunciation to spelling. As in naming, the measure of variability in recognition time is expected to grow as nested sources of ambiguity accrue in the relations between pronunciation and spelling.

Response-time distributions that are increasingly stretched by nested ambiguity may be expected if word naming and recognition performance are fractal processes. In contrast to the fractal perspective, conventional word-recognition theories offer no a priori basis for systematic changes in response-time variability. Standard inferential statistical procedures assume that, in the long run, measured (dependent) variables yield approximately symmetrical Gaussian distributions. As a consequence, the term variability is often used in that framework to refer specifically to the variance or standard deviation of a distribution. I use the term variability more inclusively—more variable distributions encompass a wider range of numerical values. Thus, increases in variability loosely refer to simultaneous increases in parameters, such as variance and skew. Ultimately, however, increasing variability refers here to a pattern of progressive “stretching” in the shapes of density functions of pronunciation and lexical decision times.
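The scaling relation can be made concrete with a synthetic coastline. The sketch below, a minimal illustration that is not part of the analyses reported here, measures the exactly self-similar Koch curve at successively finer yardsticks; log length is linear in log scale with slope 1 - D, where D = log 4 / log 3 is the curve's fractal dimension:

    # Measure a Koch-curve "coastline" at finer and finer yardsticks.
    # At yardstick s = 3**-n the measured length is (4/3)**n, so
    # log(length)/log(scale) equals 1 - D. Values are exact, not data.
    import math

    D = math.log(4) / math.log(3)           # fractal dimension, about 1.262

    for n in range(1, 7):
        yardstick = 3.0 ** -n               # measurement scale
        measured = (4.0 / 3.0) ** n         # length recovered at that scale
        slope = math.log(measured) / math.log(yardstick)
        print(f"s = {yardstick:.5f}  L = {measured:.3f}  slope = {slope:.3f}")

    print("1 - D =", 1 - D)                 # about -0.262, matching the slopes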
SPELLING AND PRONUNCIATION

Van Orden, Pennington, and Stone (2001) described the relations between English spellings and pronunciations as a fractal pattern. They described relations at three scales, grapheme-phoneme, body-rime, and whole-word, which I illustrate next.

Ambiguous Graphemes

The grapheme B, at the beginning of a word, is always pronounced /b/, and the phoneme /b/ is always spelled B. This defines an invariant, symmetrical relation between a grapheme and a phoneme. Some consonant graphemes and phonemes have invariant relations, but vowel spellings are always ambiguous. Ambiguity implies a spelling that can be pronounced in more than one way (or phonemes that can be spelled in more than one way). Vowel spellings can always be pronounced in multiple ways. For example, duck, burn, and dude include multiple pronunciations of the spelling _ u _ _. Similarly, the phoneme /u/ can be spelled _ u _ _, _ o _ _, or _ o e _, respectively, in duck, monk, and does. These examples illustrate the two forms of ambiguity in vowel graphemes and phonemes. Ambiguous vowel graphemes map onto more than one phoneme, and ambiguous vowel phonemes map onto more than one grapheme.

Ambiguous Spelling Bodies

Ambiguous vowel graphemes sometimes rely on contextual constraints to specify a single pronunciation option. An example is the context provided for the spelling _ u _ _ by the relation between a word’s spelling body (e.g., _uck) and its pronunciation rime (e.g., /uk/). The word duck includes an invariant, symmetrical relation between its spelling body, _uck, and its pronunciation rime, /_uk/. All words that contain the spelling body _uck are pronounced to rhyme with duck (e.g., luck and buck), and all words that are pronounced to rhyme with /_uk/ are spelled using the same _uck spelling pattern. The vowel letter _ u _ _ entails a large set of potential relations between spelling and pronunciation, which subdivides into smaller sets of relations, one of which corresponds to _uck ↔ /uk/. It is in this sense that a body-rime relation is nested within a large set of potential vowel relations. Invariant body-rime relations such as _uck ↔ /uk/ collapse the ambiguity of the vowel relation into a single rime pronunciation. Invariant body-rime relations limit ambiguity to the scale of grapheme-phoneme relations. Body-rime invariant words only require that grapheme ambiguity be resolved to specify a single pronunciation option. Thus, pronunciation time variability based on these words reflects ambiguity at the grapheme scale.

Invariant body-rime relations are relatively rare in English because most spelling bodies are ambiguous (Stone, Vanhoy, & Van Orden, 1997). The spelling body _int in pint, for example, is pronounced /aInt/ in pint but also maps to /Int/ as in mint. An ambiguous spelling body reveals ambiguity over and above vowel ambiguity at the grapheme scale. Body-rime ambiguous words require that grapheme ambiguity and spelling-body ambiguity be resolved to specify a single pronunciation option. Pronunciation time variability based on these words reflects ambiguity at the grapheme scale and the spelling-body scale.

Ambiguous Whole Words

Ambiguity in the _int vowel pronunciation is resolved once the context supplied by the whole-word spelling p-i-n-t is taken into account. The whole word pint has an invariant, symmetrical relation between its whole-word spelling and its whole-word pronunciation. Pint has only one correct pronunciation, and its pronunciation only has one legitimate spelling. This invariant symmetrical relation is “contained” within the set of potential body-rime relations for _int. Taking into account the whole-word spelling has further subdivided _ i _ _ and _int’s potential relations into a particular relation, pint ↔ /paInt/. Invariant whole-word relations limit ambiguity to the grapheme and spelling-body scales.

Most English words have invariant whole-word relations, but there are important exceptions. Homographs are whole words with more than one pronunciation. For example, the homograph lead has two legitimate pronunciations: one rhymes with bead, the other with head. This illustrates ambiguity in a whole-word spelling. In a laboratory setting that presents lead in relative isolation, a unique vowel pronunciation cannot be resolved from available episodic constraints—they would be balanced between the two pronunciations. Historical sources of constraint may tip the balance, however. A participant’s unique history of literate discourse, and consequent internalized constraints, will favor one or the other pronunciation (cf. Kawamoto & Zemblidge, 1992). Homographs reveal ambiguity at all three scales of the relations between spelling and pronunciation. Thus, pronunciation time variability based on these words reflects ambiguity at the grapheme scale, the spelling-body scale, and the whole-word spelling scale.

The overall pattern of relations between English spelling and pronunciation is proposed to be a fractal pattern of ambiguity. Ambiguous grapheme-phoneme relations can entail nested, ambiguous body-rime relations that, themselves, can entail nested ambiguous whole-word relations. As noted, homographs, such as lead, are ambiguous at all three scales of the nested relations between spelling and pronunciation.
SPELLING, PRONUNCIATION, AND THE BRITISH COASTLINE

With respect to the coastline analogy, a word in which ambiguity is limited to the grapheme scale would correspond to a relatively low-resolution “kilometer scale” to measure how vowel ambiguity affects pronunciation time variability. In contrast, spelling-body scale vowel ambiguity amplifies grapheme scale ambiguity and pronunciation time variability, compared to a word with a spelling body that specifies a single pronunciation option. It reveals more of the process that resolves ambiguity in performance measures. Similarly, homographs amplify vowel ambiguity and pronunciation time variability, over and above ambiguity at the spelling-body scale, and reveal still more of the process that ends in a pronunciation. The empirical goal of this research is to track changes in pronunciation and lexical decision time variability that accompany the separate scales of relations between word spellings and pronunciations.

Distinguishing the perspective of a means analysis from the perspective of a variability analysis serves as an entry point for the hypotheses that follow. Pronunciation times and lexical decision times are typically viewed as random variables. Presenting particular words under particular circumstances yields a range, or distribution, of response-time measurements. A means analysis assumes the existence of a characteristic pronunciation time that is shrouded by unsystematic, additive sources of noise. By contrast, a variability analysis is concerned with identifying systematic changes in variability; it assumes that the pattern of changes in variability is informative, that it may reflect the intrinsic dynamics of the system (cf. Riley & Turvey, in press). Just as I might ask, “What is the length of the British coastline?” and get different answers with different scales—so may I ask, “How variable are response times?” and get different answers from different scales of ambiguity.

My main prediction is that using progressively higher resolution scales to measure performance will yield a pattern of progressive increases in the parameters that describe variability—that is, the shape of pronunciation and response-time distributions. Changing scales will not simply shift a distribution’s mean. Visually, this will appear as a progressive “stretching” of a response-time distribution toward very slow response times. Each scale of resolution in ambiguous relations, sampled as random variables, should amplify the variability observed using a lower resolution ambiguity scale.

A similar pattern can be observed in the distributions of synthetic pronunciation times generated by recurrent (iterative) network models of word naming (I. Choi, personal communication, October 31, 2001). The reasons these patterns emerge in a model’s behavior motivate my predictions concerning human performance. Multiple, mutually inconsistent relations (connections) among nodes are an iterative model’s analogue to vowel ambiguity. Iterative networks multiply a vector of node values by a matrix of connection weights, and the result serves as the next input vector. The network continues to iterate in this way until a relatively consistent, stable pattern of node values emerges. This means that iterative models rely on multiplicative interactions to stabilize their dynamics. As such, an iterative model’s distribution of finishing times reflects multiplicative interactions among random variables. The product of two or more random variables yields a lognormal distribution, a point I return to later (Ulrich & Miller, 1993; West & Deering, 1995).

All else being equal, active, mutually inconsistent node relations decrease the probability that a model will stabilize its interaction on a given iteration. Over many synthetic trials, competing relations may reveal a wider range of a model’s finishing times because reducing the probability that a model will stabilize on any particular iteration does not preclude it from doing so in the long run. This is one basis of my prediction that access to multiple scales of mutually inconsistent pronunciation options, for example, may amplify variability in empirical distributions of response times.
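The link between multiplicative stabilization and lognormal, positively skewed finishing times can be simulated directly. In this minimal sketch the factor distribution and sample sizes are arbitrary illustrative choices, not parameters of any word naming model:

    # Products of many positive random factors are approximately
    # lognormal: the log of a product is a sum of logs, and sums approach
    # a Gaussian (central limit theorem). All settings are illustrative.
    import math
    import random
    import statistics

    def random_product(n_factors=20):
        """Multiply many positive random values, as a stand-in for
        multiplicative interactions accumulating across iterations."""
        p = 1.0
        for _ in range(n_factors):
            p *= random.uniform(0.5, 1.5)
        return p

    products = [random_product() for _ in range(10_000)]
    logs = [math.log(p) for p in products]

    def skew(xs):
        m, sd = statistics.mean(xs), statistics.stdev(xs)
        return statistics.mean([((x - m) / sd) ** 3 for x in xs])

    print("skew of raw products:", round(skew(products), 2))  # positive: stretched tail
    print("skew of log products:", round(skew(logs), 2))      # near 0: lognormal shape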
EXPERIMENT 1: GRAPHEME SCALE AND SPELLING-BODY SCALE AMBIGUITY

Experiment 1 uses a word naming task, which measures the time required to pronounce individual words, and replicates the standard effect of spelling-body ambiguity (pint, hereafter called the spelling-body scale) in a contrast with spelling-body invariant words (duck, hereafter called the grapheme scale; Glushko, 1979; Jared, McRae, & Seidenberg, 1990). It tests whether the conventional mean effect of spelling-body scale ambiguity is better described as a skewing, or more precisely, a stretching of pronunciation time distributions toward slower times—an increase in variability. This hypothesis is tested by contrasting distributions of pronunciation times at the grapheme scale (duck) with those at the spelling-body scale (pint). This experiment was constructed to closely mimic previous demonstrations of this effect.

In this and all following experiments, contrasts were accomplished by item yoking. Each grapheme scale item was yoked to a spelling-body scale item. Yoked item pairs allowed for within-subject distribution analyses; each participant served as his or her own control. Yoked items were matched with respect to variables, other than spelling-body ambiguity, that affect pronunciation time. Only item pairs that both produced correct pronunciation times, from the same participant, were included in statistical analyses. (In every experiment, analyses that admitted error pronunciations and response times yielded the same pattern of results.)
Method

Participants. Thirty-five introductory psychology students, who were all native English speakers, participated in the experiment.
Materials. The spelling-body scale of ambiguity is defined with respect to word bodies (Jared et al., 1990). The key stimuli consisted of 21 yoked pairs of grapheme scale (duck) and spelling-body scale (pint) items. All items were monosyllabic four- and five-letter English words.

A word frequency count estimates the rate of recurrence (per million words) of individual words in large samples of printed text. Words that have high frequency counts appear more often in samples of printed text than do words with low frequency counts. Frequently encountered words are responded to more quickly and accurately in a variety of word-recognition tasks, such as word naming and lexical decision. I yoked the grapheme scale and spelling-body scale words with respect to their frequency counts: Grapheme Scale, M = 4.95 (SD = 4.19); Spelling-Body Scale, M = 5.33 (SD = 4.78). I also yoked items with respect to their number of letters: Grapheme Scale, M = 4.38 (SD = 0.50); Spelling-Body Scale, M = 4.43 (SD = 0.51). In addition, a voice key takes different amounts of time to detect different initial phonemes, so I matched the words on that basis as well.

Word bodies are spelling patterns (e.g., _int in mint and pint) that define a spelling neighborhood. A spelling neighborhood is composed of all monosyllabic words that include the same spelling body. For example, pint, mint, lint, and all other words ending in _int comprise _int’s spelling neighborhood. Ambiguous spelling bodies, such as _int, map to more than one pronunciation rime. Body-rime relations can be divided into dominant and subordinate relations. For example, with the exception of pint, every word in _int’s spelling neighborhood uses the _int ↔ /Int/ relation. Thus, _int ↔ /Int/, as in mint, is _int’s dominant body-rime relation. By contrast, _int ↔ /aInt/, as in pint, is _int’s subordinate body-rime relation.

A pronunciation ratio estimates the relative dominance of a body-rime relation. The pronunciation ratio is computed by dividing the summed frequency of the words that share a body-rime relation by the overall summed frequency of all words that have the spelling body. In cases where there are exactly two possible pronunciations and this ratio is less than one half, the body-rime relation is a subordinate relation. In cases where there are several alternative body-rime relations in a spelling neighborhood, the subordinate relation is the relation with the smallest pronunciation ratio.

Words that are both body-rime and rime-body invariant are relatively rare in English, which severely restricted the available items (Stone et al., 1997). Rime-body ambiguity refers to a pronunciation rime that has multiple spellings (e.g., /ûrn/, as in fern, could be spelled _urn, as in turn). Although all items were strictly body-rime invariant, the definition of rime-body invariance was relaxed to rime-body dominance for 7 grapheme scale items and 12 spelling-body scale items. Thus, rime-body dominant items appeared in similar proportions at both scales. The relaxed definitions were justified, in large part, because rime-body ambiguity does not appear to reliably affect pronunciation time (Ziegler, Montant, & Jacobs, 1997). Seven of the 21 grapheme scale items had alternate rime spellings in words with frequencies of occurrence greater than 1 per million. Most of these words were relatively obscure, with frequencies of 7 or less per million (Kučera & Francis, 1967). Two notable exceptions were the target weed, with rime /ed/, which may also be spelled _ead, as in bead, and the target quake, with rime /ak/, which may also be spelled as in break. Of the 12 spelling-body scale targets, 10 word rimes had alternative spellings in words with frequency counts greater than 10 per million. Again, rime-body status has, so far, failed to reliably affect pronunciation time, so these items are not expected to influence the outcome of the experiment on the basis of their rime-body status.

Almost all the spelling-body scale targets were chosen to have subordinate body-rime relations, rouge and wharf being the exceptions. Also, with one exception, the target broth, I used no items whose whole-word dominant pronunciation ratio was derived from a single very high frequency word. Body-rime invariant items that have large spelling neighborhoods, with several high-frequency members, were preferred in the selection of grapheme scale items. For example, duck has several high-frequency neighbors (e.g., truck, luck, and stuck). Hermit words that are themselves the only member of a spelling neighborhood were not used as targets. None of the target spelling bodies appeared more than once in the experiment. The yoked pairs appear in Appendix A.
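To make the pronunciation-ratio computation concrete, here is a minimal worked sketch. The _int neighborhood membership follows the text, but the per-million frequency counts are hypothetical placeholders, not Kučera and Francis (1967) values:

    # Worked sketch of the pronunciation ratio: the summed frequency of
    # words sharing a body-rime relation divided by the summed frequency
    # of the whole spelling neighborhood. Frequencies are hypothetical.
    NEIGHBORHOOD = {
        # word: (rime, frequency per million)
        "mint": ("/Int/", 5),
        "lint": ("/Int/", 1),
        "hint": ("/Int/", 12),
        "tint": ("/Int/", 2),
        "pint": ("/aInt/", 6),
    }

    def pronunciation_ratio(rime):
        shared = sum(f for r, f in NEIGHBORHOOD.values() if r == rime)
        total = sum(f for _, f in NEIGHBORHOOD.values())
        return shared / total

    print(pronunciation_ratio("/Int/"))   # dominant relation (> one half)
    print(pronunciation_ratio("/aInt/"))  # subordinate relation (< one half)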
Procedure. A trial began with a fixation stimulus (+++) presented for 415 msec at the center of a standard PC screen. The fixation was followed by a blank screen for 200 msec, after which a target word appeared. Participants named the target word aloud, quickly and accurately (as instructed). A voice key registered the participant’s pronunciation time, and the target remained visible until the next trial was initiated. The participant pressed one of two keys to initiate the next trial: A correct key signaled a successful trial; an error key signaled that the participant had mispronounced the word. An intertrial interval of 1,200 msec followed the key press. (The naming procedure was modeled after Spieler & Balota, 1997.)

During the experiment, the experimenter sat out of the participant’s view and transcribed errors. This ensured that an improper “error” key press by the participant did not jeopardize coding. Also, some words, such as hoof, are pronounced differently in different regions of the country. Thus, on occasion, a participant produced an ambiguous pronunciation. These pronunciations were verified by having the participant pronounce the item again, the way he or she typically pronounces it, at the end of the experiment.

Participants began by completing 50 randomly ordered practice trials and received feedback from the experimenter during the practice trials concerning what constitutes an error. This was followed by 5 fixed-order buffer trials and a 140-trial experimental block. The 42 experimental stimuli were embedded in a set of 98 filler items. The items were presented in a pseudorandom order, which ensured that targets were randomly but evenly distributed among subsets of fillers. The entire experimental procedure required approximately 20 min to complete.
Results and Discussion

Variability analysis. I used one-tailed significance tests, with alpha set at .05, for all planned contrasts in this and all experiments that follow, because all involve directional hypotheses. As noted in the introduction, the higher resolution spelling-body scale is expected to amplify the parameters that describe the shapes of the pronunciation time distributions (i.e., variability and skew). Evidence of this pattern is revealed in a stretching analysis. I could not directly test the stretching hypothesis with standard variance and skew statistics, however. Higher moments of response-time distributions are difficult to estimate because they fluctuate wildly (Ratcliff, 1979). This is to be expected from attempts to measure a process that has no characteristic scale of measurement. Furthermore, variance (and skew) are computed relative to a distribution’s own mean (and variance), and meaningful contrasts require that both distributions have the same mean and variance.

Instead, I recorded the first and third quartiles of each participant’s distribution of correct grapheme scale and spelling-body scale pronunciation times. A pattern of stretching in the spelling-body scale distribution, with respect to the grapheme scale distribution, is established if the difference between the third quartiles of the two distributions is reliably larger than the difference between the same distributions’ first quartiles. This pattern would appear as an overadditive interaction in a 2 × 2 repeated measures analysis of variance (ANOVA), which crosses the two ambiguity scales with the two quartile measures. The interaction term of an ANOVA that treated participants as a random variable was statistically reliable, Fp(1, 34) = 14.30, p < .05, as was the interaction term for an ANOVA that treated items as a random variable, Fi(1, 20) = 10.94, p < .05.

I used planned contrasts to reveal the pattern of the interactions. On average, the difference between the first quartiles of the participant-analysis grapheme and spelling-body scale distributions was 24 msec and, as predicted, the average difference between the third quartiles of the same distributions was larger, at 62 msec. Both differences were statistically reliable, tp(34) = 3.54, p < .05, and tp(34) = 5.53, p < .05, respectively. In the item analysis, the average difference of 21 msec between the first quartiles of the grapheme and spelling-body scale distributions was not statistically reliable. However, the 82-msec difference between the third quartiles of the same two distributions was statistically reliable, ti(20) = 3.84, p < .05. The stretching pattern established by these analyses corroborates the hypothesis that variability in pronunciation time is amplified as ambiguity scales—the spelling-body scale—access nested ambiguity in the relation between word spellings and pronunciations.

Multiplicative interaction. The stretching pattern is readily visible in the pronunciation time distributions.
I estimated probability density functions for the grapheme scale and spelling-body scale distributions by transforming the empirical pronunciation time distributions into smoothed probability density curves with a nonparametric lognormal kernel-density estimator (Silverman, 1986). A lognormal kernel density represents each pronunciation time measurement as a small lognormal curve, rather than a point, centered about its recorded value. A probability density is estimated by summing (and normalizing) the area occupied by the kernels over successive intervals of pronunciation time. (A lognormal kernel corresponds to applying a symmetric Gaussian kernel to the logarithms of the pronunciation times and then transforming the resulting density back into the linear domain.) The extent of smoothing is controlled by adjusting the standard deviation of the kernel, just as one might adjust the width of a histogram bin. The smoothing parameter was chosen using a formula based on the standard deviation of the pronunciation time distributions (cf. Silverman, 1986; Van Zandt, 2000). I used a lognormal kernel because it progressively increases the amount of smoothing as it moves into the sparse, slow tail of the distribution. This is appropriate for positively skewed distributions. It preserves detail in the dense front end of the distribution while simultaneously compensating for the lower proportion of raw data points in the slow tail of the distribution.

A lognormal empirical density function will closely resemble a standard Gaussian density function after a logarithmic transformation of the raw data. The approximate lognormal shape of response-time distributions has been previously noted (e.g., Ratcliff & Murdock, 1976). The dashed line in Panel A of Figure 1 depicts an ideal Gaussian distribution with equivalent variability, centered about the median of the empirical grapheme scale distribution depicted by the solid line (treating the x axis as a linear scale). The close similarity of the grapheme scale distribution to an ideal Gaussian distribution, in the logarithmic domain, could imply that word naming performance entails multiplicative interactions among system parameters, a point I elaborate in the Summary.

FIGURE 1 The solid line of Panel A depicts the kernel-smoothed density function of pronunciation times to the grapheme scale items after a logarithmic transformation. The dashed line of Panel A represents an ideal Gaussian curve, with the same standard deviation, centered on the empirical distribution’s median. The empirical distribution’s approximate Gaussian shape, in the logarithmic domain, indicates a close resemblance to a lognormal distribution and implies multiplicative interaction. Panel B contrasts variability on the grapheme and spelling-body scales, now in the linear domain. The dashed line depicts the variability for grapheme scale words. The solid line depicts the variability at the spelling-body scale. The distribution at the spelling-body scale displays a longer, more stretched tail than that of its grapheme scale counterpart.

The variability changes that accompanied the change in measurement scale are apparent in a visual contrast of the shapes of the two density functions of pronunciation time. Individual parametric estimates of variability, such as variance and skew, describe specific aspects of the shape of a distribution—they are abstracted, shorthand shape descriptors. A visual examination of the entire distribution supplies a more complete account of the shape of a distribution, however. The change from the grapheme scale to the spelling-body scale amplified the measured variability in the observed probability density functions. Probability density functions for the two scales are plotted in Panel B of Figure 1. The dashed line portrays the distribution of observed variability at the grapheme scale. The solid line portrays the variability on the spelling-body scale. Variability in the pronunciation times at the spelling-body scale appears as a longer, more stretched slow tail compared to the grapheme scale. Although the modes of the two distributions are similar, the spelling-body scale yields a distribution with many more extreme, slow pronunciation times.

Next, I describe a systematic replication, expanded to include contrasts among all three scales of spelling ambiguity.
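Before turning to Experiment 2, here is a minimal sketch of the lognormal kernel-density idea described above: a Gaussian kernel applied to the log response times, mapped back to the linear (msec) domain by the change-of-variables factor 1/x. The bandwidth and sample values are illustrative and do not reproduce Silverman’s (1986) bandwidth formula:

    # Lognormal kernel-density estimate: smooth log(RT) with a Gaussian
    # kernel, then return to the linear domain via the 1/x Jacobian.
    # Bandwidth and data are illustrative placeholders.
    import math

    def lognormal_kde(times, bandwidth):
        logs = [math.log(t) for t in times]
        norm = len(logs) * bandwidth * math.sqrt(2 * math.pi)

        def density(x):
            u = math.log(x)
            g = sum(math.exp(-0.5 * ((u - v) / bandwidth) ** 2)
                    for v in logs) / norm      # Gaussian KDE in log domain
            return g / x                       # change of variables back

        return density

    rts = [512, 540, 575, 610, 630, 700, 845, 1210]  # hypothetical RTs (msec)
    f = lognormal_kde(rts, bandwidth=0.15)
    for x in (500, 600, 800, 1200):
        print(x, round(f(x), 5))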
EXPERIMENT 2: GRAPHEME VERSUS SPELLING-BODY VERSUS WHOLE-WORD SPELLING AMBIGUITY

Experiment 2 estimates variability in pronunciation times on all three scales of vowel ambiguity. Pronunciation time distributions on a whole-word scale, which uses homographs such as lead as targets, are contrasted with distributions on a spelling-body scale, which uses words such as pint as targets. Pronunciation times on a spelling-body scale are in turn contrasted with pronunciation times on a grapheme scale, using words such as duck (a direct replication of Experiment 1). Items were yoked across all three scales, which allows a simultaneous contrast of the variability on the three scales.

Previous research has established that homographs yield exaggerated pronunciation times compared to nonhomographic words (Kawamoto & Zemblidge, 1992). I conducted Experiment 1 to establish a stronger variability contrast between the grapheme and spelling-body scales than I could accomplish in this three-scale contrast. Most homographs are high-frequency words, and performance tends to be asymptotic for high-frequency words (cf. Jared, 1997). This weakens the contrast of the three scales. Furthermore, there are very few homographs in English (compared to Hebrew, for example), and only seven homographs were suitable for use in this experiment. Consequently, the three-scale contrast relies on fewer items than the two-scale contrast of Experiment 1.

Given the clear demonstration of variability changes at the grapheme and spelling-body scales established by Experiment 1, I gave priority to the control factors between the whole-word spelling scale (homographs) and the spelling-body scale in Experiment 2. Homographs tend to be pronounced with their statistically dominant pronunciation (see pronunciation ratio in Method), which requires a contrast to the dominant pronunciation on a spelling-body scale. For most homograph pronunciations, mint, rather than pint, would be the proper spelling-body contrast, for example. As such, the contrast between variability on the grapheme scale and variability on the spelling-body scale is weaker in this experiment than it was in Experiment 1. Because I could not know in advance which of the alternate pronunciations a participant might provide to a homograph, I selected two different control items for each homograph.

If naming resembles a fractal or iterative process, then admitting more nested scales of vowel ambiguity into the measurement procedure should change the result of the measurement. Each scale of resolution should nonlinearly amplify the variability measurements taken at lower resolutions. The result of the measurement would thus depend on the measurement scale used to measure pronunciation time variability.

Method

Participants. Thirty-five additional introductory psychology students, all native English speakers, participated in the experiment.

Materials. The key stimuli were 7 four- and five-letter monosyllabic homographs. The critical control variable concerns the relative frequency of the two alternative body-rime relations that may be used to determine a pronunciation for a homograph. Each homograph has two legitimate pronunciations: a statistically dominant pronunciation, defined as the most frequently occurring body-rime relation, and a subordinate pronunciation, corresponding to the less frequently occurring, alternative body-rime relation. For example, the homograph bass has a frequency count of 16 occurrences per million. Its statistically dominant pronunciation rhymes with mass. Adding the frequencies of all the words that contain the _ass spelling body, using either the _ass ↔ /as/ or the _ass ↔ /ace/ body-rime relation (including bass), yields a total summed frequency count of 603 per million (e.g., class, pass, etc.). The statistically subordinate pronunciation of bass rhymes with space, and bass is the only word in the _ass spelling neighborhood that uses the _ass ↔ /ace/ body-rime relation. Its maximum estimated frequency can be no greater than 16 occurrences per million. For each homograph, the subordinate pronunciation ratio was computed by dividing the summed frequency of words that have the subordinate body-rime relation into the total summed frequency of the words that contain the spelling body. The relative dominance of a body-rime relation was estimated by subtracting the subordinate ratio from one. If no other words in the homograph's spelling neighborhood used the alternative body-rime relation, the frequency of the homographic letter string itself was used to estimate the subordinate pronunciation frequency. Thus, in the case of bass (pronounced to rhyme with space), the subordinate body-rime ratio was .03 (i.e., 16/603), and the dominant body-rime ratio was .97 (i.e., 1 – .03). The same procedure was applied to wound and wind; both letter strings represented the only alternative body-rime relation in their respective spelling neighborhoods.

The controls chosen to yoke with a homograph's dominant pronunciation were spelling-body scale ambiguous words matched, as closely as possible, to the homograph's dominant pronunciation ratio. Likewise, controls for the subordinate pronunciation were matched to the homograph's subordinate pronunciation ratio. The pronunciation that each participant actually provided for each homograph determined which control item was used in subsequent analyses.

The dominance assignments for three items were equivocal, because reasonable alternative assignment procedures could have been adopted. The spelling neighborhood for lead includes another homograph, read; I assigned the frequency of lead to one body-rime relation and read to the other. The spelling neighborhoods of dove and close allow three possible body-rime relations, and in the case of dove it could be argued that its neighborhood does not have a dominant body-rime relation. In all three cases, the other yoking constraints limited the list of candidate controls to the point that no better matches were available, using either method of computing the dominance ratio.

The mean relative dominance ratio was .10 (SD = .07) for the homographs' subordinate pronunciations and .13 (SD = .08) for the subordinate spelling-body scale yokes. The mean relative dominance ratio was .90 (SD = .07) for the homographs' dominant pronunciations and .73 (SD = .10) for the dominant spelling-body scale yokes. In addition, all the items were matched for length: Grapheme Scale, M = 4.00 (SD = 0.00); Dominant Spelling-Body Scale yokes, M = 4.14 (SD = 0.38); Subordinate Spelling-Body Scale yokes, M = 4.14 (SD = 0.38); Whole-Word Spelling Scale, M = 4.29 (SD = 0.49); and word frequency: Grapheme Scale, M = 57.14 (SD = 68.00); Dominant Spelling-Body Scale yokes, M = 91.14 (SD = 139.63); Subordinate Spelling-Body Scale yokes, M = 57.29 (SD = 79.97); Whole-Word Spelling Scale, M = 69.29 (SD = 84.60). Care was taken to ensure close frequency matches for items with relatively low frequency counts. Only a minority of words occur often enough to be classified as high-frequency words, so to increase the pool of available yoking candidates, the frequency-matching criteria for high-frequency items were relaxed. Likewise, the strict initial-phoneme matching criteria were relaxed to allow yoking of categorically similar initial phonemes, which trigger a voice key in a similar way (S. Goldinger, personal communication, April 6, 2000). The yoked quartets appear in Appendix B.
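The ratio computations described above reduce to simple arithmetic over neighborhood frequency counts. The sketch below reproduces the bass example; the dictionary of frequencies is a hypothetical stand-in for the word-frequency norms, and the function name is illustrative. The same division, applied to pronunciation neighborhoods rather than spelling neighborhoods, yields the spelling ratios used in Experiments 3 and 4.

```python
def dominance_ratios(body_rime_frequencies):
    """Subordinate and dominant pronunciation ratios for one spelling body.

    `body_rime_frequencies` maps each body-rime relation in a spelling
    neighborhood to the summed frequency (per million) of the words
    that use it; the subordinate relation is the least frequent one.
    """
    total = sum(body_rime_frequencies.values())
    subordinate_ratio = min(body_rime_frequencies.values()) / total
    return subordinate_ratio, 1.0 - subordinate_ratio

# The _ass neighborhood: bass (frequency 16) is the only word using the
# _ass <-> /ace/ relation; the whole neighborhood sums to 603 per million.
ass_neighborhood = {"_ass -> /as/": 603 - 16, "_ass -> /ace/": 16}
sub, dom = dominance_ratios(ass_neighborhood)
print(round(sub, 2), round(dom, 2))  # 0.03 0.97
```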
Procedure. The timing of target presentations and the practice procedures were identical to those of Experiment 1. Practice trials were followed by a 134-trial experimental block. It began with 6 fixed-order buffer trials, followed by 126 pseudorandomly ordered trials. Each set of four yoked targets was presented randomly within blocks of fillers, and the blocks themselves were presented in a random order. During the experiment, the experimenter sat out of the participant's view and transcribed homograph pronunciations and errors. The entire experimental procedure required approximately 20 min to complete.

Results and Discussion

Variability analysis. A pattern of stretching at the spelling-body scale, as compared to the grapheme scale, is established if the average difference between the third quartiles of the spelling-body and grapheme scale distributions is reliably larger than the average difference between the first quartiles of the same distributions. This would appear as an overadditive interaction in an ANOVA that crosses the two scales with the pronunciation time measures taken at the two quartiles. I tested the participant- and item-interaction terms using 2 × 2 repeated measures ANOVAs. The interaction term for the participant analysis was statistically reliable, Fp(1, 34) = 4.55, p < .05; the interaction term for the item analysis failed to reach statistical significance. The null outcome of the item analysis is not surprising. The effects of most standard factors that affect naming are attenuated for high-frequency words (Van Orden, Pennington, & Stone, 1990). Also, the homograph yoking constraints imposed a weaker contrast between these two scales, and, after all, there were only seven items at each scale.

Planned contrasts were used to reveal the pattern of the participant-analysis interaction. The average difference of 2 msec between the first quartiles of the grapheme and spelling-body scales was not statistically reliable, but the average difference of 24 msec between the third quartiles of the same distributions was reliable, tp(34) = 2.34, p < .05. I also used planned contrasts to determine whether the statistically unreliable item-analysis interaction at the grapheme and spelling-body scales might nevertheless yield an apparent stretching pattern. The average difference between the first quartiles of the grapheme and spelling-body scale distributions was only 14 msec, whereas the average difference between the third quartiles of the same distributions was 25 msec. Although the item-analysis contrasts were not statistically reliable, the apparent pattern of the differences is consistent with the statistically reliable stretching pattern of the participant analysis.

An important goal of Experiment 2 was to establish variability changes in a contrast between the spelling-body and whole-word scales. A pattern of stretching across the spelling-body and whole-word scales is established if the difference between the distributions' third quartiles is reliably larger than the difference between the same distributions' first quartiles. I tested two interaction terms; one treated participants as a random variable, and the other treated items as a random variable. Both analyses used 2 × 2 repeated measures ANOVAs, and both interaction terms were statistically reliable, Fp(1, 34) = 7.32, p < .05, and Fi(1, 6) = 14.79, p < .05, respectively. I again used planned contrasts to reveal the pattern of these interactions. The 21-msec average difference between the first quartiles of the spelling-body and whole-word scale participant-analysis distributions was reliable, tp(34) = 3.42, p < .05, and the third quartiles of the same distributions yielded a larger, statistically reliable average difference of 108 msec, tp(34) = 3.38, p < .05. Likewise, the average difference of 16 msec between the first quartiles of the spelling-body and whole-word scale item distributions was statistically reliable, ti(6) = 2.16, p < .05, and the average difference of 46 msec between the third quartiles of the same distributions was larger and also statistically reliable, ti(6) = 3.76, p < .05. The result of the variability measurement appears to depend on the scale used to measure variability.

The variability changes that track the three measurement scales are apparent in the probability density functions of the pronunciation times. Estimated probability density functions of the three distributions were obtained using the kernel-density estimation procedure described in Experiment 1 and are plotted in Figure 2. The dotted line depicts the pronunciation time variability at the grapheme scale, the dashed line depicts the variability at the spelling-body scale, and the solid line depicts the variability at the whole-word spelling scale. The qualitative pattern revealed by this contrast is identical to that revealed by the variability contrast at the grapheme and spelling-body scales in Experiment 1. The distribution of pronunciation times to words at the whole-word spelling scale has a longer, more stretched tail than the distribution at the spelling-body scale, which is in turn more stretched than the distribution at the grapheme scale. As before, the distributions illustrate a progressive divergence, primarily in their slow tails, to include more and more extreme, slower pronunciation times, a change in variability that is statistically reliable by participants.

This stimulus set comprised mostly high-frequency words. High-frequency words are, by definition, more commonly encountered while reading. Thus, in contrast to the low-frequency items of Experiment 1, the high-frequency words' pronunciation times are compressed toward the fast end of the distribution. This can be seen in the more pointed (leptokurtic) peaks of the distributions in this experiment, compared to the distributions of Experiment 1. Nevertheless, the slow tail of the whole-word scale distribution is stretched beyond that of the two other distributions and contains pronunciation times even in the interval between 1,000 and 1,500 msec.

To take a measurement of something is to make contact with it in some way. To measure the duration of the act of naming a printed word, I must constrain the act by presenting individual words. Control parameters are sampled probabilistically as particular words are presented to particular participants at particular points in time.
FIGURE 2 The dotted line pertains to the kernel-smoothed density function of pronunciation times measured at the grapheme scale. The dashed line depicts the pronunciation time variability at the spelling-body scale. The solid line represents the pronunciation time variability measured on a whole-word spelling scale. The greater variability at the whole-word spelling scale is apparent in the density function's longer, more stretched tail, as compared to the same measurement taken at the spelling-body scale, which is, in turn, more variable than the same measurement taken at the grapheme scale.

Presenting many words to many participants yields a range of sampled control-parameter values. As such, not all word presentations yield identical measurements. Density functions of pronunciation time yield probabilistic estimates of the range of naming-event durations. The visual and statistical analyses identified systematic differences in relative variability in pronunciation time, changes that exceeded the boundaries of change that are likely due to measurement error. Apparently, a wider range of control-parameter values was sampled as the resolution of the measurement scale was increased. The particular measurement procedures, the measurement scale, used to make contact with the naming act influenced the measurement result.

All the measurements, taken using each class of words, each scale, are legitimate measurements. After all, there is no a priori reason to point to any particular scale as a tool that is more or less suitable for taking naming measurements.
To consider the pronunciation times to all words, together, would only increase the range of the measurements, which again implies that pronunciation time does not conform to a characteristic measurement scale.

As in Experiment 1, the result of my variability measurement increased as I moved to more detailed measurement scales. Conventional word-recognition theories are silent as to why systematic increases in variability appear as the measurement scale is changed. Conventional perspectives assume that pronunciation times conform to a characteristic scale of measurement. Implicit in the assumption of a characteristic scale is the requirement that disagreements among the results of measurements taken at different scales are unsystematic and due only to measurement error. However, a basic method used to characterize a fractal process is to describe precisely how the process interacts with the measurement procedure, that is, to establish a scaling relation. As predicted, the variability measurement taken at a whole-word scale was amplified over and above the same measurement taken at a spelling-body scale, which was in turn amplified over and above the same measurement taken at a grapheme scale. Moreover, the progressive rank ordering of the variability measurements is consistent with the idea that the process of resolving ambiguity may have no characteristic scale of measurement. The result of the variability measurement appears to depend on where I "step into" the process to take my measurement. After all, other things being equal, a homograph does not entail any more grapheme scale vowel ambiguity than a word such as duck, whose vowel ambiguity is fully resolved at the lower resolution spelling-body scale.

Next, I illustrate the same pattern of variability amplification in lexical decision performance, using nested ambiguity in the relations between pronunciations and patterns of spelling: the mirror image of the ambiguity scale in spelling–pronunciation relations.
EXPERIMENT 3: PHONEME SCALE VERSUS RIME-BODY SCALE AMBIGUITY

The next two experiments use the lexical decision task. Lexical decision measures the time required to decide whether a presented letter string is a legitimate English word. The first lexical decision study is a conceptual replication of Experiment 1, which contrasted variability in pronunciation times for words on a grapheme scale versus a spelling-body scale of ambiguity. Naming performance requires the participant to supply a correct pronunciation; it emphasizes the relation between spelling and pronunciation (Ziegler, Montant, et al., 1997). Lexical decision performance emphasizes the inverse relation, the relation between word pronunciations and spellings (Stone et al., 1997; Ziegler, Montant, et al., 1997; Ziegler, Van Orden, & Jacobs, 1997).

It may not be possible to know all that transpires in making a lexical decision. However, it is intuitive that one must judge whether the presented spelling is that of a known word. Previously established effects of rime-body ambiguity suggest that spelling verification takes this relation into account. If lexical decision entails something like this, then pronunciations that map to more than one spelling option may amplify variability in performance measures, just as spellings that map to more than one pronunciation do.

This experiment estimates variability in response time using two scales of pronunciation ambiguity. Response-time distributions for words that have invariant rime-body relations (e.g., duck; hereafter called the phoneme scale) are contrasted with distributions for words that have ambiguous rime-body relations (e.g., compare /ûrn/ in fern and turn; hereafter called the rime-body scale). Again, with respect to the coastline analogy, this experiment contrasts variability estimated on the phoneme scale with variability estimated on the rime-body scale. The move from a phoneme scale to a rime-body scale is a move to a more detailed measurement scale, "zooming in" on lexical decision performance.

As with Experiment 1, the method of this experiment was constructed to mimic previous demonstrations of rime-body scale ambiguity. There is some question in the literature as to whether rime-body scale ambiguity reliably influences performance (Peereman, Content, & Bonin, 1998), and an independent replication, which relies on a strong manipulation of rime-body scale ambiguity, may help resolve this debate.

Method

Participants. Seventy-four introductory psychology students, all right-handed native English speakers, participated in the experiment.

Materials. The key stimuli were 20 yoked pairs of phoneme scale and rime-body scale ambiguous words. Pronunciation rimes refer to the particular pronunciations of word bodies (e.g., /ûrn/ in fern and turn). A pronunciation neighborhood is defined as all the monosyllabic words that use the same pronunciation rime. For example, fern, turn, learn, and all monosyllabic words that rhyme with fern comprise /ûrn/'s pronunciation neighborhood. Ambiguous rime-body relations are pronunciation rimes, such as /ûrn/, that map to more than one spelling body. The spelling ratio provides a quantitative estimate of the relative dominance of a rime-body relation, based on the word frequencies of words that either share or do not share particular rime-body relations in a pronunciation neighborhood. The spelling ratio of a rime-body relation is estimated by dividing the sum of the frequencies of words that share a rime-body relation into the overall sum of the frequencies of all words that have the pronunciation rime. If, in cases where there are exactly two possible spellings, this ratio is less than one half, the rime-body relation is a subordinate relation. Where there are several alternative rime-body relations in a pronunciation neighborhood, the subordinate relation is the relation with the smallest spelling ratio.
I used only rime-body scale ambiguous words that have subordinate rime-body relations, and I used no items whose whole-word dominant spelling ratios were established by a single high-frequency word. Phoneme scale items that have large spelling neighborhoods with several high-frequency members were preferred in the selection process. For example, duck has several high-frequency neighbors (e.g., truck, luck, and stuck). Hermit words, which are themselves the only member of a spelling neighborhood, were not used as targets. Five rime-body invariant relations appeared more than once in the experiment. Only two of the items with invariant rime-body relations (rape and click) had alternative spelling-to-pronunciation relations with frequencies above zero per million. The exception spelling bodies are rare: crepe has a frequency of one, and chic has a frequency of seven. All of the items chosen for the rime-body scale stimulus list were strictly body-rime invariant. This was important because spelling-body scale ambiguity affects lexical decision performance (Stone et al., 1997). The yoked item pairs were closely matched for word frequency: Phoneme Scale, M = 3.70 (SD = 2.60); Rime-Body Scale, M = 3.80 (SD = 2.31); and number of letters: Phoneme Scale, M = 4.45 (SD = 0.51); Rime-Body Scale, M = 4.50 (SD = 0.51). The experimental stimuli were embedded in a list of 138 filler words and 178 (half four-letter, half five-letter) pronounceable nonwords, drawn at random from a nonword database supplied by Greg Stone. The yoked item pairs appear in Appendix C.

Procedure. A trial began with a fixation stimulus (+++) presented for 415 msec at the center of a standard PC screen. The fixation was followed by a blank screen for 200 msec, after which a target letter string appeared. Participants were instructed to respond quickly and accurately by pressing one of two keys; the yes key signaled a word letter string, and the no key signaled a nonword letter string. The letter string remained visible until the participant responded. An intertrial interval of 629 msec followed the key press. Each participant completed 50 practice trials immediately prior to the 356-trial experimental block, half of which were nonword trials. The yokes containing repeated rime-body relations were separated by presenting one yoked pair, embedded among nonwords and fillers, at the beginning of the experiment, and the other pair, also embedded among nonwords and fillers, at the end of the experiment. The presentation order was counterbalanced across participants; post hoc analyses failed to detect a rime-body repetition effect. The remaining targets were distributed evenly among fillers and nonwords and presented in a pseudorandom order. The entire experimental procedure required approximately 20 min to complete.

Results and Discussion

Standard means analysis. I include a standard participant- and item-means analysis because there is some question in the literature as to whether rime-body scale ambiguity reliably affects lexical decision performance (Peereman et al., 1998). This analysis represents the typical manner in which the literature presents rime-body scale ambiguity as an "effect," a difference in means. I wanted to use data from as many participants as possible, while ensuring that the included participants were engaged in the task. The data for 13 of the 74 participants were eliminated because their combined error rates on filler word trials and nonword foil trials exceeded 10%. All analyses were conducted on the response-time data of the remaining 61 participants. (Identical analyses, using stricter and looser exclusionary criteria, yielded the same qualitative pattern of results.) A standard participant- and item-means analysis was conducted on yoked correct response times that fell within the interval between 200 and 2,500 msec. This analysis revealed a reliable effect of rime-body scale ambiguity, which replicates the standard effect, tp(60) = 8.70, p < .05, and ti(19) = 4.17, p < .05. Mean response times to words on a phoneme scale, such as duck, were faster than mean response times to words on a rime-body scale, such as fern: Phoneme Scale, M = 654.37 (SD = 89.50); Rime-Body Scale, M = 739.37 (SD = 107.24).
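For readers who want the mechanics, here is a minimal sketch of a standard participant-means analysis of this kind, assuming trial records in a simple long format. The trimming window and condition labels follow the description above, but every name is illustrative.

```python
from scipy import stats

def participant_means(trials, lo=200, hi=2500):
    """Mean correct response time per (participant, condition).

    `trials` is an iterable of (participant, condition, rt_ms, correct)
    records; only correct responses inside the trimming window count.
    """
    sums = {}
    for subj, cond, rt, correct in trials:
        if correct and lo <= rt <= hi:
            total, n = sums.get((subj, cond), (0.0, 0))
            sums[(subj, cond)] = (total + rt, n + 1)
    return {key: total / n for key, (total, n) in sums.items()}

def paired_t(means, subjects, cond_a="phoneme", cond_b="rime-body"):
    """Participant analysis: paired t test across the two scales."""
    a = [means[(s, cond_a)] for s in subjects]
    b = [means[(s, cond_b)] for s in subjects]
    return stats.ttest_rel(b, a)
```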
Variability analysis. A pattern of stretching is established if the average difference between the third quartiles of the phoneme and rime-body scale distributions is reliably larger than the average difference between the first quartiles of the same two distributions. I recorded the first and third quartiles of each participant's correct phoneme scale and rime-body scale response-time distributions. I then conducted the stretching analysis, as described in Experiment 1, by first testing for a reliable interaction of the average first- and third-quartile response-time measurements across the two scales. I conducted two 2 × 2 repeated measures ANOVAs; the first treated participants as a random variable, and the second treated items as a random variable. The interaction term for the participant analysis was statistically reliable, Fp(1, 60) = 23.77, p < .05, as was the interaction term for the item analysis, Fi(1, 19) = 8.57, p < .05.

Planned contrasts revealed the pattern of the interaction. The average difference of 47 msec between the first quartiles of the participant-analysis phoneme and rime-body scale distributions was statistically reliable, tp(60) = 7.43, p < .05, and the average difference of 123 msec between the third quartiles of the same two distributions was larger and also statistically reliable, tp(60) = 7.98, p < .05. Likewise, the average difference of 60 msec between the first quartiles of the item-analysis phoneme and rime-body scale distributions was statistically reliable, ti(19) = 4.55, p < .05, and the average difference of 159 msec between the third quartiles of the same two distributions was larger and also statistically reliable, ti(19) = 3.77, p < .05. As predicted, the variability measurement that results from using the higher resolution rime-body scale is larger than that found at the phoneme scale. A visual contrast of the probability density functions of these distributions, presented in Figure 3, corroborates the results of the statistical analysis.
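The quartile "stretching" test has a compact equivalent form: with participants as the random variable, the scale-by-quartile interaction of the 2 × 2 repeated measures ANOVA is equivalent to a paired t test of the difference between third-quartile differences and first-quartile differences (F = t²). A sketch under that equivalence, with illustrative names:

```python
import numpy as np
from scipy import stats

def stretching_test(rts_by_subject, fast="phoneme", slow="rime_body"):
    """Paired t test of third- vs. first-quartile differences.

    `rts_by_subject` maps each participant to a dict of correct
    response-time arrays, one array per scale.
    """
    q1_diffs, q3_diffs = [], []
    for scales in rts_by_subject.values():
        q1_fast, q3_fast = np.percentile(scales[fast], [25, 75])
        q1_slow, q3_slow = np.percentile(scales[slow], [25, 75])
        q1_diffs.append(q1_slow - q1_fast)
        q3_diffs.append(q3_slow - q3_fast)
    # A reliably larger third-quartile difference indicates stretching,
    # that is, the slow tail diverges more than the fast edge.
    return stats.ttest_rel(q3_diffs, q1_diffs)
```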
Probability density functions for the two distributions, constructed as described in Experiment 1, are plotted in Figure 3. The dashed line depicts the variability of the distribution of response times to phoneme scale words. The solid line depicts the density function representing the variability found at the rime-body scale. The qualitative pattern revealed by this contrast is virtually identical to the pattern revealed by the parallel contrast at the grapheme and spelling-body scales in naming. The distribution at the rime-body scale has a more stretched tail than the distribution at the phoneme scale. As predicted, the result of my variability measurement is amplified as I move to a higher resolution measurement scale.

FIGURE 3 The kernel-smoothed probability density functions depict the response-time variability to words at the phoneme and rime-body scales. The dashed line depicts the variability measurement at the phoneme scale. The solid line depicts the variability measurement at the rime-body scale. The qualitative stretching pattern revealed by this contrast is virtually identical to the pattern revealed by the contrast at the grapheme and spelling-body scales in naming.

Next, I describe a nearly identical lexical decision experiment that included contrasts among all three scales of pronunciation ambiguity.
EXPERIMENT 4: PHONEME VERSUS RIME-BODY VERSUS WHOLE-WORD PRONUNCIATION AMBIGUITY

This final lexical decision study is a conceptual replication of Experiment 2, which contrasted pronunciation time variability on a grapheme scale, a spelling-body scale, and a whole-word spelling scale. Homographs have more than one legitimate pronunciation and thus entail all three nested scales of ambiguous spelling–pronunciation relations: grapheme-phoneme, body-rime, and whole-word spelling. Similarly, homophones refer to different words with different meanings but identical pronunciations. Because they are whole-word pronunciation ambiguous, they entail all three nested scales of ambiguous pronunciation–spelling relations. Experiment 4 measures variability in response-time distributions on a phoneme scale (duck), on a rime-body scale (fern), and on a whole-word scale (using homophones such as reed, hereafter called the whole-word pronunciation scale). I expect the variability measurement taken at the whole-word pronunciation scale to be larger than one would expect given the homophones' status as rime-body scale words. Detecting a variability increase in a contrast between the rime-body and whole-word pronunciation scales requires yoking whole-word pronunciation scale words with rime-body scale words that have similar spelling and pronunciation ratios (see Method).

Previous research has established that homophones yield exaggerated lexical decision times compared to nonhomophonic words (Pexman, Lupker, & Jared, 2001). I conducted Experiment 3 to establish a stronger variability contrast between the phoneme and rime-body scales than I could accomplish in the three-scale contrast of this experiment. Both experiments used the same participants, which meant I could not reuse any phoneme scale words in the direct replication of the phoneme and rime-body scale contrast provided by Experiment 4; I gave priority to the contrast at the phoneme and rime-body scales in Experiment 3. Thus, the phoneme scale items used in Experiment 3 came from more common spelling neighborhoods (as measured by adding the frequency counts of all the monosyllabic words that share a phoneme scale word's rime-body relation) than did the phoneme scale words used in Experiment 4. This may imply that weaker historical sources of constraint, and consequent internalized constraints, are available to the process that ends in a lexical decision, and thus implies a weaker contrast at the phoneme and rime-body scales in this experiment than in Experiment 3.

Method

Participants. The participants were the same 74 right-handed, native English-speaking introductory psychology students who participated in Experiment 3.
Materials. The key stimuli were 20 yoked triplets of phoneme scale (duck), rime-body scale (fern), and whole-word pronunciation scale (reed) words. Only subordinate homophones were used. A subordinate homophone is defined as having the lowest frequency count of all the alternative spellings of its pronunciation. Thus, blew, which has a frequency of 12 per million, is subordinate to blue, which has a frequency of 143 per million.

Word bodies refer to particular spelling patterns (e.g., _int in mint and pint). A spelling neighborhood is defined as all the monosyllabic words that use the same spelling body. For example, pint, mint, lint, and all other words ending in _int comprise _int's spelling neighborhood. Ambiguous body-rime relations are spelling bodies, such as _int, that map to more than one pronunciation rime. The pronunciation ratio provides a quantitative estimate of the relative dominance of a body-rime relation. The pronunciation ratio of a body-rime relation is computed by dividing the sum of the frequencies of words that share a body-rime relation into the overall sum of the frequencies of all the words that have the spelling body. Similarly, pronunciation rimes refer to the particular pronunciations of word bodies (e.g., /ûrn/ in fern and turn). A pronunciation neighborhood is defined as all of the monosyllabic words that use the same pronunciation rime. For example, fern, turn, learn, and all other monosyllabic words that rhyme with fern comprise /ûrn/'s pronunciation neighborhood. The spelling ratio of a rime-body relation is estimated by dividing the sum of the frequencies of the words that share a rime-body relation into the overall sum of the frequencies of all the words that have the pronunciation rime.

The rime-body scale and whole-word pronunciation scale yokes were matched with respect to their pronunciation and spelling ratios: Rime-Body Scale pronunciation ratio, M = .90 (SD = .30); Whole-Word Pronunciation Scale pronunciation ratio, M = .91 (SD = .26); Rime-Body Scale spelling ratio, M = .13 (SD = .18); Whole-Word Pronunciation Scale spelling ratio, M = .16 (SD = .23). The yoked word triplets were closely matched for word frequency: Phoneme Scale, M = 3.95 (SD = 3.03); Rime-Body Scale, M = 3.80 (SD = 2.84); Whole-Word Pronunciation Scale, M = 4.05 (SD = 3.36); and length: Phoneme Scale, M = 4.35 (SD = 0.49); Rime-Body Scale, M = 4.10 (SD = 0.85); Whole-Word Pronunciation Scale, M = 4.45 (SD = 0.51). The 60 experimental stimuli were embedded in a list of 118 filler words and 178 (half four-letter, half five-letter) pronounceable nonwords, drawn at random from a nonword database supplied by Greg Stone. The yoked triplets appear in Appendix D.

Procedure. The procedures were identical to those used in Experiment 3.

Results and Discussion

Variability analysis. I recorded the first and third quartiles of each participant's correct phoneme scale, rime-body scale, and whole-word pronunciation scale response-time distributions. I began the stretching analyses by first testing for a reliable interaction of the first- and third-quartile response-time measurements on the phoneme and rime-body scales. A pattern of increasing variability in a contrast between the phoneme and rime-body scales is established if the difference between the distributions' third quartiles is larger than the difference between their first quartiles. Tests of the interaction terms for both the participant and item analyses, using 2 × 2 repeated measures ANOVAs, failed to reach statistical significance. Nevertheless, I conducted the planned contrasts to determine whether the qualitative pattern of the average quartile differences was consistent with the hypothesis of increasing variability, and it was. The average difference of 17 msec between the first quartiles of the participant-analysis phoneme and rime-body scale distributions was statistically reliable, tp(60) = 2.59, p < .05, and the average difference of 32 msec between the third quartiles of the same distributions was larger and statistically reliable, tp(60) = 1.79, p < .05. The average difference of 9 msec between the first quartiles of the item-analysis phoneme and rime-body scale distributions was not statistically reliable, however. The average difference of 34 msec between the third quartiles of the same two distributions was indeed larger but also failed to reach statistical significance. As noted, the phoneme and rime-body contrast was weaker in this experiment than in Experiment 3. The apparent stretching pattern revealed by the planned contrasts mirrors the statistically reliable stretching pattern established in Experiment 3, which included a stronger contrast between the phoneme and rime-body scales.

The critical contrast of Experiment 4 concerns the variability measurements taken at the rime-body and whole-word pronunciation scales. A test of the interaction term for the participant analysis, using a 2 × 2 repeated measures ANOVA at the rime-body and whole-word pronunciation scales, was statistically reliable, Fp(1, 60) = 26.18, p < .05. Likewise, the interaction term of a 2 × 2 repeated measures ANOVA, using items as a random variable, was statistically reliable, Fi(1, 19) = 6.98, p < .05. The 41-msec average difference between the first quartiles of the participant-analysis rime-body and whole-word pronunciation scale distributions was statistically reliable, tp(60) = 4.12, p < .05, and the third quartiles of the same two distributions yielded a larger and statistically reliable average difference of 139 msec, tp(60) = 6.94, p < .05. Similarly, the average difference of 58 msec between the first quartiles of the item-analysis rime-body and whole-word pronunciation scale distributions was reliable, ti(19) = 2.50, p < .05, and the larger average difference of 176 msec between the third quartiles of the same two distributions was also statistically reliable, ti(19) = 3.23, p < .05. Once again, the result of the variability measurement increased as the move to a higher resolution measurement scale was made.

I corroborated the results of the statistical analyses with a visual contrast of the probability density functions of all three distributions. The overall prediction was a progressive increase in variability measurements that tracks the successively nested, higher resolution scales. Probability density functions for the three distributions, generated as previously described, are plotted in Figure 4.
FIGURE 4 The probability density functions for the three scales of ambiguous pronunciation–spelling relations are depicted. The dotted line represents the kernel-smoothed density function of response times to words at the phoneme scale. The dashed line depicts the distribution of response times to words at the rime-body scale. The solid line depicts the distribution of response times, now on a whole-word pronunciation scale. The qualitative pattern revealed by this contrast is virtually identical to the patterns revealed by the contrast at the grapheme, spelling-body, and whole-word scales in naming.
The dotted line depicts the variability measurement at the phoneme scale. The dashed line depicts the variability found at the rime-body scale. The solid line depicts the result of the variability measurement, now at the whole-word pronunciation scale. The modes of the distributions tend to be similar, but the distributions diverge in their slow tails to include more and more slow responses. As predicted, the result of the variability measurement at each scale is different: It increases as scale resolution increases.
INVERSE POWER-LAW BEHAVIOR

Pronunciation and response-time distributions that resemble inverse power-law distributions render plausible the hypothesis that naming and lexical decision are indeed fractal processes. Next, I explore this possibility. If naming and lexical decision are fractal processes, it may be possible to observe empirical patterns that are more strictly associated with fractal processes.

How might distributions of response time appear if one backed away from all the typical assumptions that go into a conventional means analysis? For instance, let us ignore the a priori biases that distinguish word-response times from nonword-response times, correct response times from incorrect response times, skilled from unskilled performance, and typical responses from atypical responses (i.e., outliers). That is, let us take a very conservative look at how response times are distributed.

Panel A of Figure 5 represents every response I collected that fell between 200 and 5,500 msec in the two lexical decision experiments described earlier (i.e., Experiments 3 and 4). This distribution comprises more than 26,000 response times. It includes both correct and incorrect times to all targets, all fillers, and all nonword foils. It includes the responses from all 74 participants, even those whose data were originally excluded because they failed to meet my accuracy criterion. The distribution's mode is 600 msec. Past that point, the distribution's slow tail diminishes dramatically, but it nevertheless includes responses even in the neighborhood of 5 sec. This distribution appears as a unitary object, even though it represents responses to many different words and nonwords taken from many different participants.

FIGURE 5 Panel A depicts the kernel-estimated density for all responses from Experiments 3 and 4 that fell between 200 and 5,500 msec. Included are both correct and incorrect responses to both words and nonwords for all 74 participants. Panel B depicts the same density, now plotted on double logarithmic coordinates. The heavy solid line represents a least squares regression line, fit in the logarithmic domain, over the interval extending from the density's mode to the end of its slow tail. Apparently, the distribution's slow tail is well described by an inverse power-law scaling relation.

Panel B of Figure 5 displays the same probability density function, now plotted on double logarithmic coordinates. In this domain the x axis corresponds to the natural logarithm of response time and the y axis corresponds to the natural logarithm of the probability density. Apparently, the slow tail of the probability density diminishes as a linear function of response time in logarithmic coordinates, with a slope of –3.40 (r² = .99); the wavy oscillations are an artifact of the kernel procedure estimating very sparse intervals of the distribution.¹ An inverse power-law scaling relation is precisely a linear relation on double logarithmic coordinates along the slow tail of a distribution (Jensen, 1998; West & Deering, 1995). This represents a positive, conservative test for a scaling relation in lexical decision performance. Apparently, the probability density function of response time is well described in terms of an inverse power-law scaling relation: the line extending from the distribution's mode out to the end of its slow tail. This is the strongest evidence I have presented so far that response times may have no characteristic scale of measurement.

From this perspective, even the very slow 5-sec response times obey the same power-law scaling relation as the faster response times. Thus, the relative number of very slow response times is strictly related to the relative number of very fast response times, and vice versa. A conventional perspective presumes that different classes of words may require different mechanisms (e.g., Coltheart, 1978; Coltheart, Curtis, Atkins, & Haller, 1993), which are revealed by distinct time courses. Alternatively, the fractal perspective allows that all response times gauge the selfsame process. Here, all the response times closely follow a power law, and no natural break point exists that would distinguish additional processes.

I have speculated that the increases in variability described in all four experiments result from the higher resolution scales increasing access to power-law behavior. They reveal a more exhaustive sample of the range of possible performances. If so, then classic cognitive assumptions may simply restrict what is admitted into the measurement process (the resolution or entry point from which a scientist may observe a cognitive performance), just as one might restrict which sub-bays and subpeninsulas are admitted into measurements of the coastline of Great Britain.

¹An inverse power-law density function that has a slope of –2 or greater has infinite variance, and a slope of –1 or greater implies an infinite mean (Jensen, 1998). There were too few pronunciation times to present them in an identical plot. Elsewhere, my colleagues and I established the same pattern in pronunciation time distributions (Holden, Van Orden, & Turvey, 2002).
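A minimal sketch of the tail-fitting procedure described for Panel B, assuming a density estimated on a grid (e.g., from the kernel estimator sketched in Experiment 1); the regression runs from the mode to the end of the grid, and all names are illustrative.

```python
import numpy as np

def tail_slope(grid_ms, density):
    """Least squares fit to the slow tail in double logarithmic coordinates.

    Regresses log density on log time from the distribution's mode to
    the end of the grid; a tight linear fit with a negative slope is
    the signature of an inverse power-law scaling relation.
    """
    mode_idx = int(np.argmax(density))
    x = np.log(grid_ms[mode_idx:])
    y = np.log(density[mode_idx:])
    keep = np.isfinite(y)  # drop intervals where the estimate is zero
    slope, intercept = np.polyfit(x[keep], y[keep], 1)
    r = np.corrcoef(x[keep], y[keep])[0, 1]
    return slope, intercept, r ** 2
```

Applied to the pooled lexical decision density, a procedure like this would be expected to return a slope near –3.4 with r² near .99, matching the reported fit.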
The introduction illustrated how printed text, a cultural artifact, resembles a fractal pattern (Van Orden et al., 2001), and performance yields patterns of variability that are consistent with fractal processes. The experiments demonstrated how the fractal structure of the cultural artifact constrains performance; ultimately (over much slower time scales of cultural change), the fractal patterns of performance supply constraints on the "trajectory" of the cultural artifact (e.g., the fluctuating patterns of typical spellings and pronunciations; cf. wast and was; see also Van Orden, Moreno, & Holden, 2003).

Perhaps it is here that fractal perspectives on performance make contact with ecological perspectives on performance. Iterative models of reading performance formally illustrate how a fractal perspective implies that linguistic performances entail irreducible relations between the linguistic environment and embodied linguistic constraints, constraints that emerge as a consequence of the idiosyncratic details of a person's history of interaction in his or her linguistic environment. Likewise, ecological perspectives seek to catalog the irreducible relations that constrain other kinds of perceptual performances. For example, haptic perceptions do not simply gauge an isolated property of an object, such as its mass. Instead, they entail irreducible relations between an object's objective physical properties and the properties of the force-producing neuromuscular system that wields the object (Turvey, Shockley, & Carello, 1999): irreducible relations among environmental and embodied constraints.
SUMMARY

In each of the four experiments presented here, variability measurements of pronunciation and lexical decision times changed as a function of the scale of resolution used to take the measurement. These experiments corroborate the hypothesis that variability in response time is amplified as the scale of ambiguity accesses nested ambiguity in the relations between word spellings and pronunciations. Furthermore, the response times from the two lexical decision experiments were shown to obey an inverse power-law scaling relation, which suggests that response time does not conform to a characteristic measurement scale.

So what kinds of systems have an inherent capacity to produce the kinds of systematic changes in variability that were observed here? As noted earlier, the same process repeating itself can be viewed as iteration in time, as an iterative dynamical system. Some processes that give rise to fractal patterns result from the repeated application of a simple mathematical rule. Recurrent network models of word naming (e.g., Farrar & Van Orden, 2001; Kawamoto & Zemblidge, 1992) and word recognition (e.g., Gibbs & Van Orden, 1998) are also examples of iterative processes that are predicated on multiplicative interaction. The approximate lognormal shape of the grapheme scale pronunciation time distribution suggests multiplicative interaction of system (control) parameters. Naming is a laboratory task that more closely parallels normal reading activities than does lexical decision, for example (Bosman & Van Orden, 1997). As such, it may enlist well-tuned, well-learned relations between spellings and their pronunciations, resulting in relatively stable performances. If so, variability distributed in the lognormal shape may be a benchmark of skilled performance (cf. Holden, Van Orden, & Turvey, 2002). Nearly all the other response-time distributions diverged from the lognormal shape, with exaggerated slow tails. Thus, most response-time distributions better resemble inverse power-law distributions, distributions in which variability outpaces the benchmark established by the more constrained multiplicative model (Schroeder, 1991; West & Deering, 1995).
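A toy simulation can make the multiplicative-interaction argument concrete: when many positive factors combine by multiplication, their logarithms combine by addition, so the central limit theorem pushes the product toward a lognormal shape. The factor ranges and counts below are arbitrary illustrative assumptions, not estimates of any naming parameter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each simulated "trial" multiplies many positive component factors, as
# if system parameters interacted multiplicatively. The log of a product
# is a sum of logs, so the central limit theorem pushes the product
# toward a lognormal shape.
n_trials, n_factors = 10_000, 12
factors = rng.uniform(0.9, 1.3, size=(n_trials, n_factors))
times = 400.0 * factors.prod(axis=1)  # arbitrary 400-msec base duration

# The log-transformed times are approximately Gaussian, so the raw times
# are approximately lognormal. Summing the same factors instead would
# yield an approximately Gaussian shape in the linear domain.
log_times = np.log(times)
print(round(float(log_times.mean()), 2), round(float(log_times.std()), 2))
```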
In my introduction I described how recurrent networks represent a model of how variability may accumulate in a multiplicative iterative process. In a fractal process, however, the amplification of variability outpaces the limits of multiplicative interaction, and parametric measurements of variability entail even more dramatic, proportional amplification, which is observed as a power-law scaling relation.

Variability measurements at higher resolutions in the nested fractal relations between English spellings and pronunciations were larger than the same measurements taken with lower resolution scales, in both the word naming and lexical decision tasks. Apparently, a wider range of iterations in the process that resolves ambiguity in spelling–pronunciation relations was revealed as higher resolution ambiguity scales were introduced to the measurement procedure. In terms of my analogy to the measurement of a coastline, it was as if more jagged details of the coastline were revealed as more detailed measurements were made.
ACKNOWLEDGMENTS

This article is based, in part, on a doctoral dissertation submitted to Arizona State University, Tempe. Preparation of this article was funded by a grant from the College of Social and Behavioral Sciences, California State University, Northridge, as well as by an Independent Scientist Award (1 K02 NS01905) to Guy Van Orden.
REFERENCES

Bassingthwaighte, J. B., Liebovitch, L. S., & West, B. J. (1994). Fractal physiology. New York: Oxford University Press.
Bosman, A. M. T., & Van Orden, G. C. (1997). Why spelling is more difficult than reading. In C. A. Perfetti, L. Rieben, & M. Fayol (Eds.), Learning to spell (pp. 173–194). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Coltheart, M. (1978). Lexical access in simple reading tasks. In G. Underwood (Ed.), Strategies of information processing (pp. 151–216). New York: Academic.
Coltheart, M., Curtis, B., Atkins, P., & Haller, M. (1993). Models of reading aloud: Dual-route and parallel-distributed-processing approaches. Psychological Review, 100, 589–608.
Farrar, W. T., IV, & Van Orden, G. C. (2001). Errors as metastable response options. Nonlinear Dynamics, Psychology, and Life Sciences, 5, 223–265.
Gibbs, P., & Van Orden, G. C. (1998). Pathway selection's utility for control of word recognition. Journal of Experimental Psychology: Human Perception and Performance, 24, 1162–1187.
Glushko, R. (1979). The organization and activation of orthographic knowledge in reading aloud. Journal of Experimental Psychology: Human Perception and Performance, 5, 674–691.
Holden, J. G., Van Orden, G. C., & Turvey, M. T. (2002). Inverse power-law behavior and response time distributions. Manuscript submitted for publication.
Jared, D. (1997). Spelling-sound consistency affects the naming of high frequency words. Journal of Memory and Language, 36, 505–529.
Jared, D., McRae, K., & Seidenberg, M. S. (1990). The basis of consistency effects in word naming. Journal of Memory and Language, 29, 687–715.
Jensen, H. J. (1998). Self-organized criticality. Cambridge, England: Cambridge University Press.
Kawamoto, A. H., & Zemblidge, J. (1992). Pronunciation of homographs. Journal of Memory and Language, 31, 349–374.
Kučera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
Mandelbrot, B. B. (1983). The fractal geometry of nature. San Francisco: Freeman.
Peereman, R., Content, A., & Bonin, P. (1998). Is perception a two-way street? The case of feedback consistency in visual word recognition. Journal of Memory and Language, 39, 151–174.
Pexman, P. M., Lupker, S. J., & Jared, D. (2001). Homophone effects in lexical decision. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 139–156.
Ratcliff, R. (1979). Group reaction time distributions and an analysis of distribution statistics. Psychological Bulletin, 86, 446–461.
Ratcliff, R., & Murdock, B. B. (1976). Retrieval processes in recognition memory. Psychological Review, 83, 190–214.
Riley, M. A., & Turvey, M. T. (in press). Variability and determinism in motor behavior. Journal of Motor Behavior.
Schroeder, M. R. (1991). Fractals, chaos, power laws: Minutes from an infinite universe. New York: Freeman.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman & Hall.
Spieler, D. H., & Balota, D. A. (1997). Bringing computational models of word naming down to the item level. Psychological Science, 8, 411–416.
Stone, G. O., Vanhoy, M. D., & Van Orden, G. C. (1997). Perception is a two-way street: Feedforward and feedback phonology in visual word recognition. Journal of Memory and Language, 36, 337–359.
Turvey, M. T., Shockley, K., & Carello, C. (1999). Affordance, proper function, and the physical basis of perceived heaviness. Cognition, 73, B17–B26.
Ulrich, R., & Miller, J. (1993). Information processing models generating lognormally distributed reaction times. Journal of Mathematical Psychology, 37, 513–525.
Van Orden, G. C., Moreno, M. A., & Holden, J. G. (2003). A proper metaphysics for cognitive performance. Nonlinear Dynamics, Psychology, and Life Sciences, 7, 47–58.
Van Orden, G. C., Pennington, B. F., & Stone, G. O. (1990). Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97, 488–522.
Van Orden, G. C., Pennington, B. F., & Stone, G. O. (2001). What do double dissociations prove? Cognitive Science, 25, 111–172.
Van Zandt, T. (2000). How to fit a response time distribution. Psychonomic Bulletin & Review, 7, 424–465.
West, B. J., & Deering, B. (1995). The lure of modern science: Fractal thinking. River Edge, NJ: World Scientific.
Ziegler, J. C., Montant, M., & Jacobs, A. M. (1997). The feedback consistency effect in lexical decision and naming. Journal of Memory and Language, 37, 533–554.
Ziegler, J. C., Van Orden, G. C., & Jacobs, A. M. (1997). Phonology can help or hurt the perception of print. Journal of Experimental Psychology: Human Perception and Performance, 23, 845–860.
APPENDIX A
Yoked Grapheme Scale and Spelling-Body Scale Items

Grapheme Scale Item  Letters  Word Frequency    Spelling-Body Scale Item  Letters  Word Frequency
BOLT                 4        10                BUSH                      4        14
SLANG                5        2                 SOOT                      4        1
BARGE                5        7                 BROOD                     5        9
WEDGE                5        4                 WHARF                     5        4
DICE                 4        14                DOME                      4        17
WICK                 4        4                 WARP                      4        4
SKID                 4        2                 SWAP                      4        2
SURF                 4        1                 SWATH                     5        1
WISP                 4        2                 WASP                      4        2
PRANK                5        1                 PLAID                     5        1
ROBE                 4        6                 ROUGE                     5        7
SPOIL                5        3                 SKULL                     5        3
QUAKE                5        2                 QUART                     5        3
SCRUB                5        9                 SQUAT                     5        7
DUCK                 4        9                 DOLL                      4        11
BUDGE                5        3                 BROTH                     5        3
HUMP                 4        2                 HOOF                      4        2
PILL                 4        15                PINT                      4        13
WELD                 4        4                 WORM                      4        4
SNAG                 4        3                 SWAN                      4        3
WEED                 4        1                 WAND                      4        1
M                    4.38     4.95              M                         4.43     5.33
SD                   0.50     4.19              SD                        0.51     4.78
APPENDIX B
Yoked Grapheme, Spelling-Body, and Whole-Word Spelling Scale Items

Grapheme Scale       Dominant Spelling-Body Scale       Subordinate Spelling-Body Scale     Whole-Word Spelling Scale
Item   Word Freq     Item   Pron. Ratio   Word Freq     Item   Pron. Ratio   Word Freq     Item   Pron. Ratio   Word Freq
WAKE   23            WASTE  .97           35            WARD   .08           25            WOUND  .02           28
PUMP   11            PEAK   .66           10            PORK   .04           10            BASS   .03           16
WAGE   56            WOOD   .76           56            WAVE   .11           46            WIND   .05           63
TUNE   10            TOSS   .72           9             TOMB   .22           11            TEAR   .06           11
KEPT   186           GIVE   .57           391           COST   .24           229           CLOSE  .15           234
DIME   4             TOAD   .75           4             DRONE  .07           3             DOVE   .19           4
LACK   110           LIST   .68           133           LAID   .17           77            LEAD   .21           129
M      57.14                .73           91.14                .13           57.29                .10 (.90)     69.29
SD     68.00                .12           139.63               .08           79.97                .08           84.60

Note. Ratios for the dominant yokes are dominant pronunciation ratios; ratios for the subordinate yokes and the whole-word (homograph) items are subordinate pronunciation ratios. The parenthetical mean is the homographs' mean dominant pronunciation ratio.
APPENDIX C
Yoked Phoneme Scale and Rime-Body Scale Items

Phoneme Scale Item  Letters  Word Frequency    Rime-Body Scale Item  Letters  Word Frequency
RAPE                4        5                 RUDE                  4        6
HUMP                4        2                 HOOP                  4        3
FROG                4        1                 FERN                  4        1
SPIT                4        11                SPED                  4        9
PUKE                4        1                 POMP                  4        3
GULP                4        2                 GRAIL                 5        2
BARGE               5        7                 SEIZE                 5        6
STING               5        5                 SMIRK                 5        3
LOFT                4        2                 YAWN                  4        2
ROBE                4        6                 STIR                  4        7
KELP                4        2                 CRUX                  4        2
HUNCH               5        7                 FRAUD                 5        8
NUDGE               5        2                 NICHE                 5        3
HULK                4        2                 HURL                  4        3
LUST                4        5                 MULE                  4        4
PUNCH               5        5                 TRUCE                 5        5
SLANG               5        2                 STUNG                 5        2
CLICK               5        2                 GAUZE                 5        1
BUDGE               5        3                 BOOZE                 5        4
FLING               5        2                 PLUME                 5        2
M                   4.45     3.70              M                     4.50     3.80
SD                  0.51     2.60              SD                    0.51     2.31
APPENDIX D
Yoked Phoneme, Rime-Body, and Whole-Word Pronunciation Scale Items

Items in the same row are yoked across the three panels.

Phoneme Scale Item   Word Frequency
COIN                 10
HOOK                 5
BUST                 7
DRIP                 1
CUBE                 1
WEDGE                4
SKID                 2
MASK                 9
MARSH                5
STUB                 3
PROBE                6
DUCK                 9
ROACH                2
TORCH                2
ROOST                1
SMUG                 7
BLIMP                1
BRAG                 2
SURF                 1
SOCK                 2
M                    3.95
SD                   3.03

Rime-Body    Word        Pronunciation   Spelling
Scale Item   Frequency   Ratio           Ratio
HUNT         10          1               .11
BEAN         5           1               .37
BLAZE        7           1               .06
BRAWL        1           1               .005
HONE         2           0.07            .18
BLISS        4           1               .05
SNIFF        2           1               .02
SLUM         8           1               .02
QUEER        6           1               .04
QUIZ         2           1               0
RINSE        6           1               .008
GOAT         6           1               .32
PLAID        1           0               0
CLOAK        3           1               .08
GLIDE        2           1               .62
STUD         7           1               .28
PERCH        1           1               .002
PERK         1           1               .05
PUFF         1           1               .45
MOAN         1           1               .03
M            3.80        0.904           0.135
SD           2.84        0.297           0.178

Whole-Word Pronunciation   Word        Pronunciation   Spelling
Scale Item                 Frequency   Ratio           Ratio
BLEW                       12          0.99            .07
REED                       5           1               .78
BEECH                      6           1               .06
BOAR                       1           1               .001
HARE                       1           0.11            .08
BERTH                      4           1               .01
FEINT                      2           1               .02
HYMN                       9           1               .003
WEIGH                      4           1               0
LIEN                       2           1               .003
REIGN                      7           1               .008
SEAM                       9           1               .47
KNEAD                      1           0.17            .21
CHUTE                      2           1               .08
GRATE                      3           1               .57
RITE                       8           1               .26
WAIVE                      1           1               .002
YOLK                       1           1               .11
REEL                       2           1               .43
LUTE                       1           1               .08
M                          4.05        0.914           0.162
SD                         3.36        0.265           0.225
ECOLOGICAL PSYCHOLOGY, 14(1–2), 87–109 Copyright © 2002, Lawrence Erlbaum Associates, Inc.
Intentional Contents and Self-Control Guy C. Van Orden Cognitive Systems Group Arizona State University
John G. Holden Department of Psychology California State University, Northridge
Conventional research programs adopt efficient cause as a metaphor for how mental events affect behavior. Such theory-constitutive metaphors usefully restrict the purview of research programs, defining the space of possibilities. However, conventional research programs have not yet offered a plausible account of how intentional contents control action, and such an account may be beyond the range of their theoretical possibilities. Circular causality supplies a more inclusive metaphor for how mental events might control behavior. Circular causality perpetuates dynamic structures in time. Mental contents are seen as emergent dynamic constraints perpetuated in time and vertically coupled across their multiple timescales. Intentional contents are accommodated as extraordinary boundary conditions (constraints) that evolve on timescales longer than those of motor coordination (Kugler & Turvey, 1987). Intentional contents, on their longer timescales, are thus available to control embodied processes on shorter timescales. One key assumption—that constraints are vertically coupled in time—is motivated empirically by correlated noise: long-range correlations in the background variability of measured laboratory performances.
“The classification of behavior in categories, the limits of which are rigidly fixed, together with the adoption of a specific terminology, frequently serves to check scientific advance. … The terms ‘reflex,’ ‘involuntary,’ ‘voluntary’ and ‘automatic’ are more than classificatory designations; they have come to carry a burden of implications, philosophical, physiological and psychological. … It is essential that they be used with caution, and that the hypothetical implications which they have acquired during the 17th and 18th centuries be regarded as provisional only” (Fearing, 1930/1970, p. 253).

Requests for reprints should be sent to Guy C. Van Orden, Cognitive Systems Group, Department of Psychology, Arizona State University, Tempe, AZ 85287–1104. E-mail: [email protected]
Several years ago, the cover of the American Psychologist announced: “Behavior—It’s Involuntary.” The banner heading referred to articles in the “Science Watch” section. The articles summed up studies reporting that mostly involuntary, automatic processes underlie human behavior (“Science Watch,” 1999).

The conventional distinction between automatic behavior and intentional, controlled, voluntary, willed, or strategic behavior stems from Descartes’ famous analogy to water-driven motions of garden statues. Descartes proposed that some actions of living beings might originate in clocklike automata. Contemporary thinkers amend his proposal and claim that most human behavior is automatic. A nagging concern, however, is the perpetual absence of reliable criteria by which to distinguish automatic behavior. For example, Fearing’s (1930/1970) historical review notes that even a knee-jerk reflex, a classic automatic behavior, is difficult or impossible to distinguish from voluntary behavior. He concluded, at the time of his review, that no reliable criteria exist by which to distinguish automatic behavior. Fearing’s conclusion applies to current studies as well. They remain stuck on the same issue. It is still the case that no generally accepted definition exists that can distinguish automatic laboratory performances from intentional performances. Now, as in the past, definitions of the term automatic behavior lean precariously and exclusively on intuition and a few illustrative laboratory performances (Vollmer, 2001).

The Stroop effect is an example of autonomous automatic processing. The term autonomous means that the Stroop effect has its basis in an involuntary process that operates outside of, and possibly in opposition to, a laboratory participant’s goals or intentions. A Stroop experiment presents a color word such as red or blue printed in red ink (for instance), and the participant names the color of the ink. The Stroop effect refers to faster color naming times in the congruent “red on red” condition than in the incongruent “red on blue” condition (Stroop, 1935). Presumably, the color words’ “names” are automatically generated and reinforce (or interfere with) color naming, irrespective of participants’ intentions. The Stroop effect is the most widely cited example of automaticity (MacLeod, 1992).

Several definitions have been proposed to capture the essential character of examples such as the Stroop effect. Criteria for automaticity have included (a) absence of voluntary control (as noted previously); (b) absence of resource limitations, which means that resource-limited processes such as attention cannot be the essential basis of automaticity; and (c) ballistic, whereby effects proceed automatically and inevitably from their causes, their stimulus triggers—like a bullet fired from a gun. All these criteria are challenged by results from careful empirical studies. For instance:

Contrary to what has been frequently assumed … automatic processing is sensitive to resource limitations [and] can be controlled, at least to some extent … which in turn challenges the criterion of (the absence) of volition. This has led some to question the usefulness of the very concept of automaticity. (Tzelgov, 1997, p. 442)
Presently, the term ballistic remains in play, but even the Stroop effect is demonstrably not ballistic. A ballistic process should not be affected by factors extraneous to the trigger events. Nevertheless, Besner and his colleagues demonstrate reduced or absent Stroop effects after small extraneous changes, such as restricting the colored ink to only one of blue’s letters (Besner & Stolz, 1999b; see also Bauer & Besner, 1997; Besner & Stolz, 1999a; Besner, Stolz, & Boutilier, 1997).

So what do we talk about when we talk about automaticity? Apparently, no one knows for sure. Juarrero (1999) described a similar failure by philosophers of action to adequately distinguish automatic acts. She attributed this failure to “a flawed understanding of … cause and explanation” in intentional behavior. Juarrero proposed a philosophical view of intentionality in which the meaningful content of “intentions flow into behavior” and “unequivocally inform[s] and constrain[s] behavior” (p. 103).

Like Juarrero, we give priority to intentionality and take seriously the protracted failure to adequately define automatic behavior. In other words, we take issue here with the conclusion so confidently displayed on the cover of American Psychologist (“Science Watch,” 1999). Laboratory performances are never involuntary, in the conventional sense, but are by their very nature intentional (Gibbs & Van Orden, 2001). We consider next why laboratory performances are always intentional, and then explain why conventional research programs are biased nevertheless to discover automatic behavior. After that we describe more inclusive metaphysical assumptions that may accommodate intentional control.
INTUITIVE INTENTIONALITY

The view of behavior as mostly involuntary is a bit strange. It suggests a robot world in which mindless individuals stagger along trajectories that change in billiard-ball-type “collisions.” In the robot world, a scientist’s clever external stimulation of the robot would be perpetuated through connected modules in the robot brain and output as behavior. Change always refers eventually and exclusively to external sources; “mental processes … are put into motion by features of the environment and … operate outside of conscious awareness and guidance” (Bargh & Chartrand, 1999, p. 462). That is what it means for behavior to be automatic, in the conventional metaphor.

Lacking intentionality, the robot world appears incomplete compared to the meaningfully animated and purpose-filled world in which we actually live (Searle, 1992; Velmans, 2000). In the world we occupy, people generally interpret the behavior of others as intentional, to make sense of their behavior. We evaluate intentions in all domains of discourse (Gibbs, 1999). Laboratory performances are themselves intuitively intentional, and scientists show the same disposition as anyone else to see them that way.
Take, for instance, the typical scenario of a psychology experiment that may discover automatic behavior. A participant is told the response options and instructed to respond quickly and accurately. But not every person actually behaves as instructed. A rare uncooperative person may produce the same response on every trial. Someone else, equally disagreeable, may produce a nonsensical pattern of responses, ignoring the instruction to respond accurately, or they may dawdle in the task, responding too slowly to produce usable response-time data. Investigators actually eliminate data from analyses on the basis of such idiosyncrasies, an implicit evaluation of the participant’s disagreeable intentions.

The point of this example, however, does not concern uncooperative performances per se, but their opposite. The simple fact that scientists are disposed to evaluate participants’ intentions contradicts (or at least qualifies) any claim that performances are automatic. The contrast with uncooperative performances makes more salient the spheres of intentionality that surround every cooperative performance (Vollmer, 2001). Otherwise, the attribution of uncooperative performance is paradoxical.

To be fair, the conventional term intentional automatic processing seems to circumvent the paradox. Speeded word naming is an example of intentional automatic processing. In a speeded naming experiment, a participant is presented with a printed word and the instructions to read it aloud quickly and accurately. Similar to the Stroop effect, speeded word naming is based on automatic retrieval of words’ names. Thus the automatic performance is directly aligned with the task instructions, the source of directed intentions to perform speeded word naming. The conventional view of speeded word naming seems to make room for both intentions, which initiate behavior, and automatic processes, which follow from those intentions (Jacoby, Levy, & Steinbach, 1992). Intentional contents are equated with representations, causal states along the same lines as the representations in automatic processes. To make scientific sense, however, this use of the term intention must entail empirical methods that can dissociate intentions from other representations. Otherwise, its use merely pretends to address intentionality, and ducks the issue altogether. (We discuss empirical methods in the next section.)

But if it is so intuitive that laboratory performances are intentional, then why do laboratory studies discover that behavior is mostly automatic? Conventional research methods presume the limited view of cause and effect in which representations cause behavioral effects. Consequently, when they address intentionality, they must misread intentions as mediating, causal states of mind—states of mind with causal powers no greater than colliding billiard balls or chains of falling dominoes (Gibbs & Van Orden, 2001). Conventional methods are blind to other possibilities. We claim that this limited view of cause and effect leads inevitably to the conventional emphasis on automatic processes (cf. Wegner & Wheatley, 1999). The next section spells out the basis of our claim in conventional metaphysics.
CONVENTIONAL METAPHYSICS

Most research efforts in cognitive psychology concern the series of mental representations that result in behavior. Structural hierarchy theory distinguishes such states of the mind from the rest of nature, insofar as nature is a nearly decomposable system (Simon, 1973). Nearly decomposable systems comprise a hierarchy of structures nested, one inside the other, like Chinese boxes. A necessary assumption is that the Chinese boxes are vertically separated in time. Vertical separation means simply that larger boxes change states on longer timescales. The crucial point of vertical separation is that changes on different timescales may be separated in terms of their causal implications—we may isolate causal properties on different timescales.

For instance, some scientists believe that linguistic competence changes on the very long timescale of evolution, whereas the advent of literacy refers to a separate long timescale of cultural change. Linguistic competence changes on a longer timescale than culture, and both change on longer timescales than mental events. If so, then they present a static background for states of mind in an automatic action such as speeded word naming in a word naming experiment. In turn, mental states provide a static background for neural events, interactions that occur on the still-shorter timescales of the nervous system. The timescales are sufficiently separate that changes on long timescales appear frozen in time when seen from the perspective of shorter timescales. Neural interactions themselves contribute unsystematic variability to measurements of mental events on longer timescales. Interactions on the very short timescale of the nervous system contribute random variability around the average pronunciation time to a printed word, for example, as measured in a word naming experiment. Methods for measurement of mental effects, on their characteristic timescale, are simply too coarse-grained to pick up systematic variability on the much shorter timescales of neural processes (and other bodily processes on very short timescales). Thus neural events show themselves as a background of unsystematic variability—uncorrelated noise (which becomes important in a later section).

The basic premise of structural hierarchy theory is that discernible mediating causal states, or representations, exist and may be described (see also Markman & Dietrich, 2000). Take speeded word naming, for instance. A component process of sensation may represent optical features of a stimulus word, which perception takes as input to supply representations of the word’s letters. Letter representations, in turn, serve as input to a component process of word recognition that may represent the pronunciation of the particular word. The representation of a word’s pronunciation, in its own turn, triggers elements of articulation, which we observe as the pronunciation response.

Vertical separation is one of several assumptions that are crucial for an analysis of mental states as mediating causes. If mental events unfold on their own, separate, characteristic timescale, then total elapsed response time can be parsed into the time course of component events that preceded a response. A separate characteristic timescale allows a sequence of mental effects to be treated as a causal chain distinct from effects on longer and shorter timescales (A. Newell, 1990). Thus, vertical separation must be true a priori if we are to dissociate mental events from other phenomena of nature.

Another assumption must be true so that we may dissociate mediating representations in measured behavior. Component processes must interact approximately linearly, an assumption Simon (1973) dubbed loose horizontal coupling. For example, additive factors method can be viewed as a test for loose horizontal coupling. Experiments with several experimental manipulations in factorial designs provide the opportunity for interaction. If the effects of two or more factors are strictly additive, then the manipulations satisfy the superposition principle. They selectively influence distinct components (Sternberg, 1969).

Loose horizontal coupling respects a very old intuition about human behavior—that it originates in component-dominant dynamics among specialized component devices of mind: sensation, perception, memory, language, and so on. As the term component-dominant suggests, the intrinsic dynamics of the components—dynamics inside the components—dominate interactions among components. This may ensure the integrity of component effects. It encapsulates component effects such that they can be recovered in the measured behavior of the whole. Thus, component effects may be individuated in measurements of a system’s behavior. Component effects that are reliably individuated in measurements of a person’s behavior—as with additive factors method—reduce to the causal properties of the components themselves.

Notably, additive factors method tests the “adequacy of the assumptions that underlie its use,” whether response time actually entails loosely coupled components (Pachella, 1974, p. 50). This sets additive factors method apart. Other methods that would reduce behavior to mediating states include no test of their assumptions. Other reductive methods rely on a priori knowledge of the components that they seek to justify the search. Examples include subtractive methods and dissociation analyses popular in cognitive neuroscience (for a critique, see Uttal, 2001). But how does one know a priori whether laboratory tasks differ simply by distinct mental components, or which components a task would entail, or whether manipulations refer to distinct components? Additive factors method does not require a priori knowledge of mental states; it requires only the assumption that components are loosely coupled, and it tests this assumption each time it asks whether component effects are additive. Thus, additive factors method illustrates a scientifically conservative test of whether laboratory performances implicate a nearly decomposable system.

But what about intentionality? Simon (1973) did not discuss intentionality, but the assumptions of vertical separation and loose horizontal coupling allow only two choices for intentionality: Either intentions fit as a link in a causal chain of mental representations, or intentionality is not a proper subject for scientific discourse. No other choices exist within this framework. It is in this sense that efficient cause serves as the theory-constitutive metaphor for how we think about cause and behavior. Vertical separation and loose horizontal coupling extend the metaphysics of billiard-ball causality (efficient cause) to mental events. Actions are viewed as end states that follow from chains of mediating representations. If so, then intentions must be representations. Otherwise there is no entry point for intentions in the analysis.

We mentioned Juarrero’s (1999) philosophical argument that intentions, as a basis of ongoing control, cannot possibly reduce to mediating causal states (see also Greve, 2001). If she were correct, then we would expect empirical studies to fail to justify mediating causes in human performance. As a matter of fact, nonadditive effects are the rule in cognitive experiments, not the additive effects that would corroborate loose horizontal coupling. A vast nexus of nonadditive interactions, across published experiments, precludes assigning any factors to distinct components. This is true in particular for the vast literature that has grown out of laboratory reading experiments (Van Orden, Pennington, & Stone, 2001). Performances attendant on reading are conditioned by task demands, culture, and language, whether they come from tasks that require controlled processing or from automatic performances such as speeded word naming. Consider the implications within the guidelines of additive factors logic. Cognitive factors in reading are neither individuated as causes nor causally segregated from the context of their manipulation—task, culture, or language. No reliable evidence exists that may individuate any mediating representations in human performance; no evidence exists that would motivate the core assumptions of structural hierarchy theory.
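Additive factors logic can be made concrete with a small sketch. The following Python fragment is our own illustration, not part of the original article, and the 2 × 2 cell means are invented. Under superposition (loose horizontal coupling), the interaction contrast of a factorial design should be near zero:

```python
import numpy as np

# Hypothetical mean response times (ms) from a 2 x 2 factorial design:
# rows are levels of factor A, columns are levels of factor B.
cell_means = np.array([[520.0, 560.0],
                       [585.0, 625.0]])

# If factors A and B selectively influence separate components, their
# effects add, and the interaction contrast is (near) zero.
interaction = (cell_means[0, 0] - cell_means[0, 1]
               - cell_means[1, 0] + cell_means[1, 1])
print(interaction)  # 0.0 for these made-up means: strictly additive effects
```

A reliably nonzero contrast marks a nonadditive interaction, the outcome that the published reading literature reports as the rule.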
LIVING SYSTEMS

Conventional research programs have failed, so far, to produce empirical corroboration for the assumptions of structural hierarchy theory. However, do not confuse these failures with naive falsification of vertical separation and loose horizontal coupling. The failures up until the present moment could mean that we have not yet correctly described the set of factors that do combine linearly in performance. A correct parsing of performance using correctly manipulated factors could yet discover elementary additive interactions. Likewise, do not take the outcome, so far, as falsification of representation. The question is not whether there are mediating states, but whether a research program is feasible that must equate mediating representations with units of cognitive performance. Said slightly differently, we ask whether a research program is feasible that must recover component effects in measurements of behavior for a reduction to component causes.

We expect that the conventional research program will continue its failure to corroborate core assumptions, because it cannot accommodate the complex autonomous behavior of living beings. In contrast, a more contemporary and inclusive view of causality recognizes a basis for self-control in complexity theory and self-organization (Juarrero, 1999). From this perspective, intentional acts are observed every time a person performs a laboratory task (Kugler & Turvey, 1987). This and the sections that follow introduce a more contemporary metaphysics that finds a place for living systems.

Living systems are complex systems; increasingly complex dynamic structure makes possible increasingly autonomous behavior. Autonomous behavior originates in positive feedback processes. Positive feedback processes include billiard-ball causality, but they do not reduce to billiard-ball causality. Life itself originates in the circular, positive feedback process of chemical autocatalysis. In autocatalysis, the output of a chemical reaction becomes, in turn, its input and catalyzes the same reaction. Autocatalysis thus perpetuates the reaction in cycles of chemical reproduction. The chemical reaction, itself, appears as a coherent, self-organized, iterative structure—a cycle perpetuated through time. Contemporary evolutionary theory treats chemical autocatalysis as an archetype. The units of selection in natural selection, for instance, are positive feedback processes of metabolism, development, and behavior. Such units are metaphorical extensions of autocatalysis, with cycles that recur on longer timescales (Depew & Weber, 1997).

Circular causality—illustrated by archetypal chemical autocatalysis—presents us with an alternative theory-constitutive metaphor. Positive feedback perpetuates dynamic structures in time, on their own timescale. The timescale of a simple dynamic structure corresponds to the time course of its cycles. Different dynamic structures may live on widely divergent timescales. Positive feedback among structures on different timescales adds another dimension to this metaphor. For example, a single complex system may evolve simultaneously on many timescales. The additional dimension may be directly contrasted with Simon’s (1973) metaphysics. In Simon’s metaphysics, vertical separation partitioned nature among segregated, characteristic timescales. Vertical separation implied that mental events would appear as random fluctuations, in measurements of events on an evolutionary timescale, for example. But feedback processes in complex living systems are vertically coupled on different timescales. As a consequence, fractal patterns of long-range correlation may emerge, which can be discovered in a system’s behavior (i.e., correlated noise, or fractal time, which we describe shortly).

Perhaps a made-up concrete example of vertical coupling can give a better feel for the contrast with vertical separation. Suppose that human performances (e.g., invention, consumption) attendant on mental events may reverberate through cultures (e.g., industrialization, consumerism) and environments (e.g., pollution, deforestation, global warming). Such reverberations could alter the niches that the environment affords for us, and for the species with which we coevolved, and alter the relations among species. A fitness landscape summarizes the complex web of these relations; more stable relations are represented as occupying higher, fitter peaks (Bak, 1996; Goodwin, 1994; Kauffman, 1995). Sufficient change will elicit new relations (e.g., new species), changes in existing relations (e.g., altered phenotypes, invasion of one species’ niche by another species), or eliminate the potential for previously viable relations (e.g., precipitate small or large cascades of extinction). This evolutionary process yields an ever-changing fluid topology of fitness.

Vertical coupling among events on multiple timescales allows the previous linked changes to occur. Sufficient change in bottom-up “microlevel” interactions among individuals may alter the possibilities for top-down “macrolevel” control of these interactions. Thus, changes in the relations among species on their long timescales are inherently coupled to changes in the relations among individuals acting on shorter timescales. Vertical coupling takes into account that relations among species are emergent products of interactions among individuals (and environments). Control parameters index emergent, self-perpetuating, abstract relations among individuals and groups of individuals. Abstract relations come into or out of existence if the balance among constraints changes sufficiently. Control hierarchy theory summarizes the network of complex relations in an abstract hierarchy of control parameters. Juarrero (1999) proposed that intentional contents are part of this endlessly evolving hierarchy of control parameters. Intentional contents emerge and are perpetuated in time via circular causality. Because they are perpetuated in time, they are available to constrain control processes on shorter timescales. Thus, intentional actions self-organize in embodied, vertically coupled, control processes.
CONTROL HIERARCHY THEORY

Structural hierarchy theory was concerned with the causal status of mental structures. To map out the functional organization of a cognitive system would be to map out the causal interactions among the components of the system—the system’s flowchart. To do so would require knowing the states (representations) associated with each component and how representations are causally dependent on each other. As such, structural hierarchy theory was concerned exclusively with causal relations in the form of efficient causes. Again, efficient cause served as its theory-constitutive metaphor for how to think about cause and behavior.

In the more contemporary picture, we substitute vertically coupled feedback processes, summarized in a hierarchy of control parameters, for Simon’s (1973) vertically separated Chinese boxes. Control hierarchy theory draws on the theory-constitutive metaphor of circular causality in the form of positive feedback, a form of self-cause (Pattee, 1973). This metaphor may be extended to laboratory performances that are also embedded in the complex ecology of living systems. Feedback across timescales reaches all the way out, into the long timescales of culture (e.g., social constraints that emerge as laboratory etiquette) and evolution (e.g., capacities for categories of action such as articulation), and all the way in, to the very short timescales of the nervous system (and so on). Relations among levels in the hierarchy are “causally and interpretively bidirectional” (Lumsden, 1997, p. 35). Laboratory performances emerge from this endlessly evolving hierarchy of vertically coupled processes.

Most important, for our present argument, control hierarchy theory makes a place for participants’ intentional contents in explanations of cooperative and uncooperative behavior. Intentional contents supply exceptional boundary conditions for behavior (Kugler & Turvey, 1987). For example, it is intuitive that instructions and other aspects of laboratory control define “boundaries” that limit the behavioral options of participants. That is why we so carefully prepare detailed laboratory scripts to guarantee that participants perform as planned.

Self-organizing systems may perpetuate their dynamic structure in time (on multiple timescales), which invokes again the analogy with autocatalytic processes and the theory-constitutive metaphor of circular causality. Likewise, intentional contents that evolve (self-organize) on longer timescales are perpetuated in time relative to control processes on shorter timescales, which makes them available to limit the degrees of freedom for interactions among processes on shorter timescales. Intentional contents emerge out of, and control, cognitive performances. Juarrero (1999) describes at length how intentional contents reduce degrees of freedom in a human capacity for self-organization. Intentions modify a system’s phase space and restrict the potential set of trajectories through that space. In this way, intentional contents reduce the degrees of freedom for behavior and thereby construct specialized devices—as laboratory participants may make of themselves specialized laboratory devices: simple reaction time devices, word naming devices, or whatever, as required by task instructions (Kugler & Turvey, 1987).

According to nonlinear, far-from-equilibrium science … systems are created from interacting components, which they then, in turn, control. As a result of this strange loop relation between parts and wholes, these dynamical systems are not mere epiphenomena; they actively exercise causal power over their components. (Juarrero, 1999, p. 131)

Instructions, as directed participant intentions, set boundaries and limit the options for laboratory performance. Agreeable intentional performances self-organize within the understood boundaries. This capacity to sustain directed intentions for laboratory performance emerges within a control hierarchy of vertically coupled constraints. Unlike Simon’s (1973) Chinese boxes, however, vertically coupled constraints fluctuate and interact across many timescales, including timescales within the time course of an experiment. This core assumption can be tested. It sets us up to expect correlated noise in measurements of human performance.

CORRELATED VERSUS UNCORRELATED NOISE

Self-organization concerns the integrity of a whole that may become a specialized device, as circumstances require. In a self-organizing system, interactions among component processes may dominate the intrinsic dynamics of the components themselves—call this interaction-dominant dynamics. When interactions among component processes dominate their intrinsic dynamics, then the behavior of the whole is different from the sum of its parts. Self-organization requires these more flexibly coupled dynamics (Jensen, 1998).

Intentional contents fluctuate on timescales longer than the trial-by-trial pace of a laboratory experiment—longer than the trial pace at which response times are taken, for example. These and other fluctuations on longer timescales are the source of long-range correlations in the background variability of performance measures—correlated noise (Van Orden, Holden, & Turvey, 2002). We are particularly interested in pink noise, a statistically self-similar (fractal) pattern of long-range correlations in trial-to-trial variability. Pink noise has been observed previously in response-time studies. Figure 1a illustrates pink noise as it may appear in a participant’s trial series of simple reaction times (prepared for spectral analysis—see figure caption).

Pink noise is also called fractal time. Fractal objects such as pink noise occupy fractal dimensions that lie “in-between” the dimensions of more familiar, ideal, geometric objects such as lines and planes. Variability in response time can be conceptualized to partly occupy, or leak into, the next higher Euclidean dimension. Figure 1’s series of reaction times, graphed in the ordered series in which they were collected, appear as points connected by a line—a trial series. Clearly, if we could “pull” this line taut, and make it straight, then it would have a Euclidean dimension of 1. But any departure from the ideal form of a line begins to occupy or leak into the next higher, second, Euclidean dimension (likewise, departures from an ideal plane leak into the third dimension, and so on). In this sense, variability in response time may occupy area and will have a dimension between an ideal one-dimensional line and an ideal two-dimensional plane. The more jagged and irregular the graph of response times, the more area it occupies.

Conventional analyses require that background variability is exclusively uncorrelated noise, uniformly distributed Gaussian noise—white noise—as prescribed by structural hierarchy theory. Otherwise measurements must be “corrected” to create statistical independence between successive trials (West & Hepworth, 1991). White noise yields a jagged and irregular line with a fractal dimension of 1.5 that gauges the extent to which it occupies two-dimensional space. The fractal dimension of white noise derives from a familiar scaling relation, which may serve to introduce less widely appreciated possibilities. The scaling relation is familiar from the equation for the standard error of the mean (SE)—the standard deviation of a sampling distribution of means, where each sample mean characterizes a distribution drawn from a standardized, homogeneous, Gaussian, independent, random variable.
SD_Pop / √N = SE    (1)
FIGURE 1 Panel A displays the pattern of pink noise in a trial series of simple reaction times. The x axis is the trial number, in the order of the experiment, and the y axis is reaction time. To prepare the series for spectral analysis, reaction times were normalized to have a mean of zero and unit standard deviation (after linear and quadratic trends were removed). Panel B depicts the spectral analysis of the same trial series. Like Fourier analysis, a spectral analysis fits a large set of sine (and cosine) waves to approximate a complex waveform. The x axis of Panel B indexes the period of oscillation (frequency), and the y axis indexes amplitude (relative height) of each component wave. Panel C represents the results of Panel B’s spectral analysis after a transformation to a double logarithmic scale. The slope of the line (–.60) in Panel C estimates an inverse power law that describes the relation between frequency and power (amplitude squared) of the component oscillations. Slopes near –1 suggest relatively strong positive correlations across a wide range of frequencies. The observed slope is consistent with the hypothesis that fluctuations in reaction time comprise a nested, statistically self-similar pattern—pink noise, fractal time. (These data from one participant come from a study that contrasted several participants’ data with yoked surrogate data in analyses that concluded in favor of pink noise; Van Orden et al., 2002.)
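The spectral analysis summarized in the Figure 1 caption can be sketched in a few lines of Python. The fragment below is our own illustration, not the authors' software; it normalizes and detrends a trial series, computes the power of each Fourier component, and fits the slope relating log power to log frequency. White noise yields slopes near 0; pink noise yields slopes near –1.

```python
import numpy as np

def spectral_slope(rt):
    """Log-log slope of the power spectrum of a trial series."""
    n = len(rt)
    # Normalize to zero mean and unit SD, and remove linear and
    # quadratic trends, as in the Figure 1 caption.
    x = (rt - rt.mean()) / rt.std()
    trials = np.arange(n)
    x = x - np.polyval(np.polyfit(trials, x, 2), trials)
    # Power (amplitude squared) of each Fourier component.
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    freq = np.fft.rfftfreq(n)
    keep = freq > 0  # exclude the zero-frequency (mean) component
    return np.polyfit(np.log10(freq[keep]), np.log10(power[keep]), 1)[0]

# Surrogate white noise for demonstration; a measured reaction time
# series would be loaded here instead.
rng = np.random.default_rng(1)
print(round(spectral_slope(rng.standard_normal(1024)), 2))  # near 0
```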
SD_Pop in Equation (1) is the population standard deviation for the sample size N. Standardized, SD_Pop equals one, which allows Equation (1) to be rewritten as:

1 / √N = SE    (2)
Equation (2) is a scaling relation between the index of variability SE and sample size N. Taking the logarithm of both sides of Equation (2) yields:

–0.5 × log(N) = log(SE)    (3)
Equation (3) describes how error in the estimation of the mean of a Gaussian variable is reduced as sample size increases. A plot of this scaling relation, on log-log scales, is a straight line with slope –0.5 (the dashed line in Figure 2).
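Equation (3) is easy to verify numerically. This sketch is our own illustration; the number of samples and the sample sizes are arbitrary. It estimates SE at several sample sizes for a standardized Gaussian variable and recovers a log-log slope of about –0.5:

```python
import numpy as np

rng = np.random.default_rng(42)
log_n, log_se = [], []
for n in (4, 8, 16, 32, 64, 128):
    # 2,000 independent samples of size n; the SD of their means
    # estimates the standard error for that sample size.
    means = rng.standard_normal((2000, n)).mean(axis=1)
    log_n.append(np.log10(n))
    log_se.append(np.log10(means.std(ddof=1)))

slope = np.polyfit(log_n, log_se, 1)[0]
print(round(slope, 2))  # close to -0.5, as Equation (3) predicts
```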
FIGURE 2 Dots represent paired values of log[bin size] and log[Standardized Dispersion] from the two rightmost columns of Table A1. The x axis indexes the logarithm of bin size (sample size; see Appendix), and the y axis indexes the logarithm of corresponding values of standardized dispersion. The dashed line has a slope of –.5, which would be expected if reaction times were statistically independent from trial to trial. The filled circles are the basis of the linear least squares fit regression line (on the log scales). The filled circles correspond to the values in Table A1 that are underlined. The open circles represent values that were excluded from the regression. The solid regression line has a slope of –.30. The fractal dimension of the trial series is 1.30, consistent with pink noise.
The fractal dimension of white noise is calculated by subtracting this slope from 1, its Euclidean dimension (Bassingthwaighte, Liebovitch, & West, 1994). The standard error of the mean thus illustrates how uncertainty in an estimated sample population parameter scales as a function of sample size. Measured variability of a homogeneous uncorrelated signal, such as white noise, stabilizes relatively quickly, as sample size is increased.

In contrast to white noise, we may find nested, correlated, statistically self-similar fluctuations—pink noise. Nested long-range correlations yield a graphical picture of response times that is less jagged than white noise, leaks less into the second Euclidean dimension, and yields a fractal dimension closer to 1. Correlated noise implies that samples of all sizes tend to “hang together,” which leads to counterintuitive statistical properties. As larger samples are considered, variability tends to increase rather than stabilize and may lead to a notable limiting case in which the variance is undefined (Bassingthwaighte et al., 1994). Heterogeneity in variability measured at different scales destabilizes parametric measurements and creates a challenge for conventional statistical methods that must assume stable parameters.

Relative dispersion analysis is a robust method to estimate fractal dimension (Eke et al., 2000). Dispersion analysis is related to the renormalization group procedures used by physicists to study critical point behavior (e.g., see Bruce & Wallace, 1989). Van Orden et al. (2002) used dispersion analysis to gauge how variability scales with the size of adjacent samples in trial series of simple reaction times and speeded word naming times. This fractal method repeatedly resamples the trial series using sampling units of different sizes to estimate the fractal dimension of a trial series. The fractal dimension of variability in the trial series gauges the scaling relation between variability and sample size, whether variability converges fast enough, as sample size increases, to yield stable population parameters.

The results of a dispersion analysis on the data from Figure 1 are plotted in Figure 2, as described in the Appendix. The solid line represents the least squares regression line for the relation between the relative dispersion (y axis) and sample size (x axis, i.e., bin size; see Appendix). The dashed line in the figure has a slope of –.5, and represents the ideal slope of white noise—compare to the previous Equation (3). The slope of the solid line is –.30, which implies a fractal dimension of 1.30. Empirical fractal dimensions of pink noise may range between 1.5 (white noise) and 1.2 (ideal pink noise). Van Orden et al. (2002) found fractal dimensions consistent with pink noise in almost every participant’s trial series of simple reaction times and speeded word naming times. (One participant’s simple reaction time trial series yielded a fractal dimension that fell on the boundary that distinguishes pink noise from brown noise.)

Correlated noise has been observed widely in spectral analyses of motor performances, such as swinging pendula, tapping, and human gait (Chen, Ding, & Kelso, 1997, 2001; Hausdorff et al., 1996; Schmidt, Beek, Treffner, & Turvey, 1991). It is found in controlled processing tasks, such as mental rotation, lexical decision,
visual search, repeated production of a spatial interval, repeated judgments of an elapsed time, and simple classifications (Aks, Zelinsky, & Sprott, 2002; Clayton & Frey, 1997; Gilden, 1997; Gilden, Thornton, & Mallon, 1995; Kelly, Heathcote, Heath, & Longstaff, 2001). And the word naming experiment of Van Orden et al. (2002) demonstrates correlated noise in an automatic cognitive performance based on learned associations. Correlated noise is most pronounced in tasks such as simple reaction time, which repeat identical trial demands (Gilden, 2001). Apparently, correlated noise can be observed in all appropriately measured laboratory performances.
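Surrogate series with a known spectral slope are useful checks on such analyses. The sketch below is our own illustration rather than anything used in the studies cited above; it synthesizes approximate 1/f noise by rescaling the Fourier amplitudes of white noise. Feeding its output to a spectral analysis like the one described in the Figure 1 caption should recover a slope near –1.

```python
import numpy as np

def synth_noise(n, alpha, seed=0):
    """Synthesize a series whose power spectrum falls off as 1/f**alpha.

    alpha = 0 gives white noise; alpha = 1 approximates pink noise.
    """
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freq = np.fft.rfftfreq(n)
    scale = np.zeros_like(freq)
    # Power ~ 1/f**alpha means amplitude ~ f**(-alpha / 2).
    scale[1:] = freq[1:] ** (-alpha / 2.0)  # scale[0] = 0 removes the mean
    return np.fft.irfft(spectrum * scale, n)

pink = synth_noise(8192, alpha=1.0)  # surrogate pink-noise trial series
```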
SELF-ORGANIZED CRITICALITY

Pink noise is a characteristic pattern of correlated noise associated with interaction-dominant dynamics and states of self-organized criticality. Self-organization entails a capacity to move between different, ordered, dynamic states—between qualitatively different patterns of behavior (Nicolis, 1989). Criticality refers to the balance among constraints that yields one or another ordered state. At or near a critical point, active competing constraints can be “forcefully” present at the same time in the same system. Near a critical point, mutually inconsistent constraints are poised together as potential constraints. The “pull” of these constraints extends across the entire system through interactions among neighboring processes. “The system becomes critical in the sense that all members of the system influence each other” (Jensen, 1998, p. 3). The presence of the pink noise pattern justifies serious consideration of this hypothesis.

The appeal of self-organized criticality is that systems near critical points are poised to access all potential behavioral trajectories (within the boundary conditions). Thus, near a critical point, the system is exquisitely context sensitive. The intention to perform speeded word naming, for example, positions the body as a word naming device near a critical point, which makes available a large set of mutually exclusive articulatory trajectories. Perception of the target word, with its entailed cognitive constraints, further restricts the set of potential trajectories. Over time, mutually consistent constraints combine to prune the set and exclude those trajectories that bear only superficial resemblance to the target pronunciation (Van Orden & Goldinger, 1994; Van Orden, Pennington, & Stone, 1990; cf. Kello & Plaut, 2000).

Protracted “cognitive pruning” of action trajectories implies that cognitive constraints are continually available to—are vertically coupled to—“peripheral” control processes of motor coordination (within the boundaries specified by intentional contents and other control processes on longer timescales). This hypothesis is supported by a growing family of experiments in which “central” constraints (attendant on cognitive factors) are available to peripheral control processes. Cognitive constraints are subtly reflected in the actual kinematics of motor trajectories (Abrams & Balota, 1991; Balota & Abrams, 1995; Gentilucci, Benuzzi, Bertolani, Daprati, & Gangitano, 2000; Zelinsky & Murphy, 2000). Pruning, itself, resembles simulated annealing (Shaw & Turvey, 1999; cf. Smolensky, 1986). Vertical coupling of embodied constraints simultaneously takes into account well-tuned cognitive constraints (e.g., learned relations between spellings and pronunciations), the current status of intentional contents, and other embodied constraints (the current status of articulatory muscles, breath, heartbeat, neural fluctuations, and so on) that are all implicated in each unique pronunciation trajectory. Over time, all pronunciations that fail to satisfy converging constraints are pruned from the potential set, which yields a globally coherent, locally efficient, articulatory trajectory (cf. Shaw, Kadar, & Kinsella-Shaw, 1994; Shaw, Kugler, & Kinsella-Shaw, 1990).

We just described the intentional basis for speeded word naming—an intentional automatic performance in conventional terms. But what about autonomous automatic processing as in the Stroop effect? Instructions in the Stroop procedure emerge as directed intentional contents that restrict behavior to ink-color naming—intentional contents that constrain a human body to become an ink-color naming device. This sets up a potential set of color-name articulatory trajectories, a necessary backdrop for the Stroop effect. Perception of ink color provides additional constraints that, with converging constraints, prune the potential set to an appropriate, globally coherent trajectory—a color-name pronunciation. However, if the colored ink is arranged in a shape that spells a color name, then cognitive constraints entailed by the color word come to bear in pruning, which may reinforce (speed up) or interfere with (slow down) pruning of extraneous trajectories.

If laboratory performances self-organize, then intentional contents are causally intertwined with learned constraints in so-called automatic performances. Moreover, intentional contents have an essential a priori function; they must emerge before there is any possibility of word naming or Stroop phenomena. We hope these examples illustrate how central the problem of intentionality is to laboratory observations of human performance. Any credible research program should begin with a plausible story of how laboratory protocols yield cooperative performances—a plausible story about intentional contents and self-control.
SUMMARY

This article has described the core assumptions of two research programs as they are spelled out in structural hierarchy theory and control hierarchy theory. By core assumptions we mean something close to what Lakatos (1970) called negative heuristics: defining assumptions that a research program must hold onto at all costs. To let go of a core assumption is to become a different research program. A successful research program will accrue empirical support for its core assumptions, which may stabilize research efforts around those assumptions. The inherent pattern of variability in behavioral measures could supply this kind of empirical support (cf. K. M. Newell & Slifkin, 1998; Riley & Turvey, in press).
Both research programs were evaluated as to whether they may accommodate intentionality—arguably the first question of psychology. For example: Am I an automaton, or am I an intentional being? Is my apparently intentional act simply the end product of billiard-ball causality, or could it attend on a capacity for self-control as in self-organization? Do I reduce to specialized devices of mind, or am I a coherent whole that creates of itself specialized devices as circumstances require? Do the morphologically reductive methods of linear statistical analysis or strategically reductive nonlinear methods have greater utility for understanding my behavior? The two views that we have contrasted supplied answers to the previous questions and made explicit links among the answers. (We know of no third alternative that can equally supply linked assumptions from existential head to methodological foot.)

Structural hierarchy theory answered the questions as follows: Cognitive systems are automata. Behavior reduces to specialized component devices. Behavior can be viewed as the end product of linked efficient causes, and the methods of linear analysis should suffice to discover mental components as component effects. It is easy to see why research efforts grounded in the assumptions of structural hierarchy theory inevitably discover automata. Automatic behavior is the only kind of behavior that structural hierarchy theory acknowledges. But the core assumptions of vertical separation and loose horizontal coupling have yet to be corroborated. Moreover, correlated noise brings into question the basic premise of structural hierarchy theory, that mediating states can be individuated (Van Orden, Jansen op de Haar, & Bosman, 1997).

Correlated noise may imply a self-organizing system, which warrants research efforts that may take into account this possibility. Efforts to understand cognitive performances could pattern themselves after, and build on, previous efforts to understand motor coordination in terms of self-organization. Among other things, such research could focus on qualitative changes in cognitive performance to characterize, and motivate empirically, the relevant control parameters (e.g., Van Orden, Holden, Podgornik, & Aitchison, 1999). Of course this is not without its difficulties. Nonlinear methods can be challenging in their own right. Efforts may be rewarded, however. They may pay off in a plausible theory of self-control.
ACKNOWLEDGMENTS

We thank Lori Buchanan, Heidi Kloos, Markus Nagler, Clark Presson, Roger Schvaneveldt, and Whit Tabor for comments on previous versions of this article.
REFERENCES Abrams, R. A., & Balota, D. A. (1991). Mental chronometry: Beyond reaction time. Psychological Science, 2, 153–157.
104
VAN ORDEN AND HOLDEN
Aks, D. J., Zelinsky, G., & Sprott, J. C. (2002). Memory across eye-movements: 1/f dynamic in visual search. Nonlinear Dynamics, Psychology, and Life Sciences, 6, 1–25. Bak, P. (1996). How nature works. New York: Springer-Verlag. Balota, D. A., & Abrams, R. A. (1995). Mental chronometry: Beyond onset latencies in the lexical decision task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 1289–1302. Bargh, J. A., & Chartrand, T. L. (1999). The unbearable automaticity of being. American Psychologist, 54, 462–479. Bassingthwaighte, J. B., Liebovitch, L. S., & West, B. J. (1994). Fractal physiology. New York: Oxford University Press. Bauer, B., & Besner, D. (1997). Mental set as a determinant of processing in the Stroop task. Canadian Journal of Experimental Psychology, 51, 61–68. Besner, D., & Stolz, J. A. (1999a). Unconsciously controlled processing: The Stroop effect reconsidered. Psychonomic Bulletin & Review, 6, 449–455. Besner, D., & Stolz, J. A. (1999b). What kind of attention modulates the Stroop effect? Psychonomic Bulletin & Review, 6, 99–104. Besner, D., Stolz, J. A., & Boutilier, C. (1997). The Stroop effect and the myth of automaticity. Psychonomic Bulletin & Review, 4, 221–225. Bruce, A., & Wallace, D. (1989). Critical point phenomena: Universal physics at large length scales. In P. Davies (Ed.), The new physics (pp. 236–267). New York: Cambridge University Press. Caccia, D. C., Percival, D., Cannon, M. J., Raymond, G., & Bassingthwaighte, J. B. (1997). Analyzing exact fractal time series: Evaluating dispersional analysis and rescaled range methods. Physica A, 246, 609–632. Cannon, M. J., Percival, D. B., Caccia, D. C., Raymond, G. M., & Bassingthwaighte, J. B. (1997). Evaluating scaled windowed variance methods for estimating the Hurst coefficient of time series. Physica A, 241, 606–626. Chen, Y., Ding, M., & Kelso, J. A. S. (1997). Long memory processes (1/fα type) in human coordination. Physical Review Letters, 79, 4501–4504. Chen, Y., Ding, M., & Kelso, J. A. S. (2001). Origins of time errors in human sensorimotor coordination. Journal of Motor Behavior, 33, 3–8. Clayton, K., & Frey, B. B. (1997). Studies of mental “noise.” Nonlinear Dynamics, Psychology, and Life Sciences, 1, 173–180. Depew, D. J., & Weber, B. H. (1997). Darwinism evolving: Systems dynamics and the genealogy of natural selection. Cambridge, MA: MIT Press. Eke, A., Hermán, P., Bassingthwaighte, J. B., Raymond, G. M., Percival, D. B., Cannon, M., et al. (2000). Physiological time series: Distinguishing fractal noises from motions. European Journal of Physiology, 439, 403–415. Fearing, F. (1970). Reflex action: A study in the history of physiological psychology. Cambridge, MA: MIT Press. (Original work published 1930) Gentilucci, M., Benuzzi, F., Bertolani, L., Daprati, E., & Gangitano, M. (2000). Language and motor control. Experimental Brain Research, 133, 468–490. Gibbs, R. W. (1999). Intentions in the experience of meaning. New York: Cambridge University Press. Gibbs, R. W., & Van Orden, G. C. (2001). Mental causation and psychological theory: An essay review of Dynamics in action: Intentional behavior in a complex system. Human Development, 44, 368–374. Gilden, D. L. (1997). Fluctuations in the time required for elementary decisions. Psychological Science, 8, 296–301. Gilden, D. L. (2001). Cognitive emissions of 1/f noise. Psychological Review, 108, 33–56. Gilden, D. L., Thornton, T., & Mallon, M. W. (1995). 1/f noise in human cognition. Science, 267, 1837–1839. Goodwin, B. (1994). 
How the leopard changed its spots: The evolution of complexity. New York: Scribner. Greve, W. (2001). Traps and gaps in action explanation: Theoretical problems of a psychology of human action. Psychological Review, 108, 435–451.
INTENTIONAL CONTENTS AND SELF-CONTROL
105
Hausdorff, J. M., Purdon, P. L., Peng, C.-K., Ladin, Z., Wei, J. Y., & Goldberger, A. L. (1996). Fractal dynamics of human gait: Stability of long-range correlations in stride interval fluctuations. Journal of Applied Physiology, 80, 1448–1457.
Jacoby, L., Levy, B. A., & Steinbach, K. (1992). Episodic transfer and automaticity: Integration of data-driven and conceptually-driven processing in rereading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 15–24.
Jensen, H. J. (1998). Self-organized criticality. Cambridge, England: Cambridge University Press.
Juarrero, A. (1999). Dynamics in action. Cambridge, MA: MIT Press.
Kauffman, S. (1995). At home in the universe: The search for laws of self-organization and complexity. New York: Oxford University Press.
Kello, C. T., & Plaut, D. C. (2000). Strategic control in word reading: Evidence from speeded responding in the tempo-naming task. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 719–750.
Kelly, A., Heathcote, A., Heath, R., & Longstaff, M. (2001). Response time dynamics: Evidence for linear and low-dimensional nonlinear structure in human choice sequences. Quarterly Journal of Experimental Psychology, 54A, 805–840.
Kugler, P. N., & Turvey, M. T. (1987). Information, natural law, and the self-assembly of rhythmic movement. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Lakatos, I. (1970). Falsification and the methodology of scientific research programmes. In I. Lakatos & A. Musgrave (Eds.), Criticism and the growth of knowledge (pp. 91–195). London: Cambridge University Press.
Lumsden, C. J. (1997). Holism and reduction. In C. Lumsden, W. Brandts, & L. Trainor (Eds.), Physical theory in biology (pp. 17–44). River Edge, NJ: World Scientific.
MacLeod, C. M. (1992). The Stroop task: The "gold standard" of attentional measures. Journal of Experimental Psychology: General, 121, 12–14.
Markman, A. B., & Dietrich, E. (2000). In defense of representation. Cognitive Psychology, 40, 138–171.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Newell, K. M., & Slifkin, A. B. (1998). The nature of movement variability. In J. P. Piek (Ed.), Motor behavior and human skill: A multidisciplinary approach (pp. 143–160). Champaign, IL: Human Kinetics.
Nicolis, G. (1989). Physics of far-from-equilibrium systems and self-organisation. In P. Davies (Ed.), The new physics (pp. 316–347). New York: Cambridge University Press.
Pachella, R. G. (1974). The interpretation of reaction time in information processing research. In B. Kantowitz (Ed.), Human information processing: Tutorials in performance and cognition (pp. 41–82). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Pattee, H. H. (1973). The physical basis and origin of hierarchical control theory. In H. H. Pattee (Ed.), Hierarchy theory: The challenge of complex systems (pp. 73–108). New York: Braziller.
Riley, M. A., & Turvey, M. T. (in press). Variability and determinism in elementary behaviors. Journal of Motor Behavior.
Schmidt, R. C., Beek, P. J., Treffner, P. J., & Turvey, M. T. (1991). Dynamical substructure of coordinated rhythmic movements. Journal of Experimental Psychology: Human Perception and Performance, 17, 636–651.
Science Watch. (1999). American Psychologist, 54, 461–515.
Searle, J. R. (1992). The rediscovery of the mind. Cambridge, MA: MIT Press.
Shaw, R. E., Kadar, E. E., & Kinsella-Shaw, J. (1994). Modeling systems with intentional dynamics: A lesson from quantum mechanics. In K. Pribram (Ed.), Origins: Brain and self-organization (pp. 53–101). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Shaw, R. E., Kugler, P., & Kinsella-Shaw, J. (1990). Reciprocities of intentional systems. In R. Warren & A. Wertheim (Eds.), Perception and control of self-motion (pp. 579–619). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Shaw, R. E., & Turvey, M. T. (1999). Ecological foundations of cognition: II. Degrees of freedom and conserved quantities in animal-environment systems. In R. Núñez & W. J. Freeman (Eds.), Reclaiming cognition (pp. 111–123). Bowling Green, OH: Imprint Academic.
Simon, H. A. (1973). The organization of complex systems. In H. H. Pattee (Ed.), Hierarchy theory: The challenge of complex systems (pp. 1–27). New York: Braziller.
Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In D. E. Rumelhart, J. L. McClelland, & the PDP Research Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition: Vol. 1. Foundations (pp. 194–281). Cambridge, MA: MIT Press.
Sternberg, S. (1969). The discovery of processing stages: Extensions of Donders' method. Acta Psychologica, 30, 276–315.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–661.
Tzelgov, J. (1997). Specifying the relations between automaticity and consciousness: A theoretical note. Consciousness and Cognition, 6, 441–451.
Uttal, W. R. (2001). The new phrenology: The limits of localizing cognitive processes in the brain. Cambridge, MA: MIT Press.
Van Orden, G. C., & Goldinger, S. D. (1994). Interdependence of form and function in cognitive systems explains perception of printed words. Journal of Experimental Psychology: Human Perception and Performance, 20, 1269–1291.
Van Orden, G. C., Holden, J. G., Podgornik, M. N., & Aitchison, C. S. (1999). What swimming says about reading: Coordination, context and homophone errors. Ecological Psychology, 11, 45–79.
Van Orden, G. C., Holden, J. G., & Turvey, M. T. (2002). Self-organization of cognitive performances. Manuscript submitted for publication.
Van Orden, G. C., Jansen op de Haar, M. A., & Bosman, A. M. T. (1997). Complex dynamic systems also predict dissociations, but they do not reduce to autonomous components. Cognitive Neuropsychology, 14, 131–165.
Van Orden, G. C., Pennington, B. F., & Stone, G. O. (1990). Word identification in reading and the promise of subsymbolic psycholinguistics. Psychological Review, 97, 488–522.
Van Orden, G. C., Pennington, B. F., & Stone, G. O. (2001). What do double dissociations prove? Cognitive Science, 25, 111–172.
Velmans, M. (2000). Understanding consciousness. London: Routledge.
Vollmer, F. (2001). The control of everyday behaviour. Theory & Psychology, 11, 637–654.
Wegner, D. M., & Wheatley, T. (1999). Sources of the experience of will. American Psychologist, 54, 480–492.
West, S. G., & Hepworth, J. T. (1991). Statistical issues in the study of temporal data: Daily experiences. Journal of Personality, 59, 609–662.
Zelinsky, G. J., & Murphy, G. L. (2000). Synchronizing visual and language processing: An effect of object name length on eye movements. Psychological Science, 11, 125–131.
APPENDIX

A dispersion analysis yields the fractal dimension of a trial series. The fractal dimension gauges the change in variability attendant on changing sample sizes. It may indicate whether variability converges fast enough, as sample sizes increase, to yield stable population parameters. If not, then the process that produced the variability is scale free: It has no characteristic "quantity" of variability. This appendix includes some guidelines for computing the fractal dimension of a trial series and a description of the specific analysis of the simple reaction time trial series portrayed in Figure 1a.
There are several ways to compute fractal dimension, but dispersion techniques are more accurate than other methods (Bassingthwaighte et al., 1994; Caccia, Percival, Cannon, Raymond, & Bassingthwaighte, 1997; Eke et al., 2000). Also, dispersion statistics are computed using means and standard deviations, familiar statistical constructs. To highlight the relation between these techniques and basic statistical theory, we adapted the usual technique of relative dispersion analysis to use normalized data instead of raw data. Relative dispersion analyses typically use the relative dispersion statistic, which is expressed as a ratio of the standard deviation and the mean, that is, RD = SD/M (see Bassingthwaighte et al., 1994). Using normalized data yields dispersion measurements in units of the standard error of the mean.

Begin with an experiment that will leave at least 1,024 observations after outliers and the like are removed. These techniques can be applied to shorter data series, but, all other things being equal, fractal dimension estimates become more variable as progressively shorter data series are used (Cannon, Percival, Caccia, Raymond, & Bassingthwaighte, 1997). In addition, the measurements should be collected together as a continuous trial series. A "lined-up" series of measurements that were actually collected across different experimental sessions distorts the timescale, and a fractal dimension analysis may not accurately characterize the temporal structure of such a series. The data in Figure 1 came from a procedure that presented 1,100 simple reaction time trials, which included a healthy 76-trial buffer.

Response time tasks usually yield some extremely long (or short) response times. Regardless of whether these outliers result from equipment problems or represent legitimate measurements, they may distort the outcome of the fractal dimension analysis. For the illustrated analysis, we removed simple reaction times greater than 1,000 msec, then computed the series mean and standard deviation, and removed times that fell beyond ±3 standard deviations from the trial series mean. If more than 1,024 measurements remain after trimming, then eliminate initial transients by truncating enough of the early trials to leave 1,024 observations.

Trial series that display self-similar fluctuations may be expected to display nonstationary drift at all scales. It can be difficult to distinguish a nested, fractal pattern of long-range fluctuations from long-range trends (Hausdorff et al., 1996), and long-range trends may bias estimates of fractal dimension. They may even overwhelm the fractal dimension analysis, yielding spurious fractal dimension statistics (Caccia et al., 1997; Hausdorff et al., 1996). Consequently, it is prudent to remove linear and quadratic trends (at least) before conducting the analysis. As a general rule, if the trial series has fractal structure, progressively more liberal detrending procedures will not dramatically change the fractal dimension estimate (cf. Hausdorff et al., 1996). In the present example, the fractal dimension estimate was essentially the same whether only linear trends were removed or trends were removed up to a quartic. That being the case, we removed linear and quadratic trends. After that, the trial series was normalized, leaving the measurements in units of standard deviation, with a mean of 0 and a standard deviation of 1.
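To make these preparatory steps concrete, a minimal sketch in Python follows, assuming the trial series is available as a one-dimensional NumPy array of reaction times in milliseconds. The function name preprocess, the n_keep and ceiling parameters, and the polynomial detrending routine are our illustrative choices, not prescriptions from the sources cited above.

```python
import numpy as np

def preprocess(rts, n_keep=1024, ceiling=1000.0):
    """Prepare a reaction time trial series for dispersion analysis.

    Follows the steps described in the text: remove extreme times,
    trim +/- 3 SD outliers, truncate initial transients to leave
    n_keep observations, remove linear and quadratic trends, and
    normalize using the population formula.
    """
    x = np.asarray(rts, dtype=float)

    # Remove extremely long times (here, greater than 1,000 msec).
    x = x[x <= ceiling]

    # Remove times beyond +/- 3 standard deviations of the series mean.
    x = x[np.abs(x - x.mean()) <= 3.0 * x.std()]

    # Truncate early trials (initial transients) to leave n_keep points.
    if x.size < n_keep:
        raise ValueError("too few observations after trimming")
    x = x[x.size - n_keep:]

    # Remove linear and quadratic trends via a 2nd-degree polynomial fit.
    t = np.arange(n_keep)
    x = x - np.polyval(np.polyfit(t, x, 2), t)

    # Normalize to mean 0 and SD 1, using the population formula
    # (ddof=0, i.e., divide by n rather than the bias-corrected n - 1).
    return (x - x.mean()) / x.std(ddof=0)
```

Note that NumPy's standard deviation uses the population formula by default (ddof = 0), which matches the convention adopted in the next step.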
When normalizing the series, and when measuring dispersion in the subsequent steps, compute the standard deviation using the population formula (i.e., use n, the number of data points, in the calculation, rather than the usual bias-corrected n – 1).

Fractal dimension is calculated as follows. Construct a table, such as Table A1. The standard deviation of the series (SD = 1) estimates the overall dispersion of the series. Begin the table by recording a 1 in both the points-per-bin column and the dispersion column. Essentially, this treats the standard deviation (SD = 1) as a population parameter, and for this initial step the bin size also equals 1. In the next step, group adjacent pairs of data points into two-point bins and compute the average of the two points in each bin. The resulting 512 means become the new sample of data. Compute the standard deviation of this new sample. Enter a 2 in the first column of the table (two points were averaged to get each mean) and, next to it, in the column labeled Standardized Dispersion, enter the standard deviation of the new sample. Repeat the previous step until only two data points remain: The second iteration should yield 256 bins of size 4, the third iteration yields 128 bins of size 8, and so on, until there are only two bins, one corresponding to the first half of the original trial series and one to the last half. At each step, enter the number of points per bin and the standard deviation of the sample means.

Next, plot bin size against standardized dispersion on log scales, as illustrated in Figure 2 (bases other than 10 will also work). Typically, the last few dispersion measurements, corresponding to the largest bin sizes, are excluded at this point (Cannon et al., 1997). Excluded points appear in Figure 2 as open circles at the larger bin sizes; the dispersion statistic for the largest, 512-point bins fell outside the axis limits and does not appear on the plot.

TABLE A1
Values That Come From the Iterative Procedure Used to Calculate the Fractal Dimension of the Trial Series Portrayed in Figure 1

Bin Size   Standardized Dispersion   Log10 Bin Size   Log10 Standardized Dispersion
   1            1.0                       0.0                0.0
   2            0.84                      0.30              –0.07
   4            0.71                      0.60              –0.15
   8            0.58                      0.90              –0.24
  16            0.48                      1.20              –0.32
  32            0.36                      1.51              –0.44
  64            0.29                      1.81              –0.54
 128            0.19                      2.11              –0.72
 256            0.14                      2.41              –0.85
 512            0.05                      2.71              –1.30

Note. Bin size is simply the number of points per bin at each iteration of the procedure. The values in the two rightmost columns that entered the regression correspond to the filled circles of the graph in Figure 2; the remaining values correspond to open circles.
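The iterative halving that fills Table A1 can be expressed compactly. The following sketch, under the same illustrative assumptions as the previous one, returns the bin sizes and standardized dispersions; it presumes a normalized series whose length is a power of 2, such as the 1,024-point series described above.

```python
import numpy as np

def dispersion_by_bin_size(z):
    """Compute standardized dispersion at each bin size.

    z is a normalized trial series (mean 0, SD 1) whose length is a
    power of 2. Returns parallel arrays of bin sizes (1, 2, 4, ...)
    and standardized dispersions, stopping when two bins remain.
    """
    means = np.asarray(z, dtype=float)
    bin_sizes, dispersions = [1], [1.0]  # SD of the full series is 1
    while means.size > 2:
        # Average adjacent pairs; each pass doubles the bin size.
        means = means.reshape(-1, 2).mean(axis=1)
        bin_sizes.append(bin_sizes[-1] * 2)
        # Population SD of the bin means is the dispersion entry.
        dispersions.append(means.std(ddof=0))
    return np.array(bin_sizes), np.array(dispersions)
```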
The large bin sizes are so close to the size of the full data set that their variability estimates do not differ appreciably. Natural fractals are "truncated" by their finite range of scales, so the linear relation breaks down for the largest bin sizes (or sometimes for the smallest, or both). In the standardized series, the dispersion statistics for the largest bins approach zero (and negative infinity when the log transformation is performed) and bias the slope of the regression line (for additional refinements of this technique, especially for shorter data sets, see Caccia et al., 1997).

If a linear relation exists between bin size and the standardized dispersion statistic on log-log scales, then the trial series may be a simple fractal. The illustrated linear relation is a power-law scaling relation. The slope of the regression line is –.30, and the fractal dimension is given by subtracting the slope from 1: 1 – (–.30) = 1.30.
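A sketch of this final regression step follows, under the same assumptions as before. The number of large-bin points to exclude, n_exclude, is our illustrative default; in practice, the exclusion is judged from the plot, as described above.

```python
import numpy as np

def fractal_dimension(bin_sizes, dispersions, n_exclude=2):
    """Estimate fractal dimension from a dispersion analysis.

    Excludes the n_exclude largest bin sizes, fits a regression line
    to log10(dispersion) as a function of log10(bin size), and
    returns 1 minus the slope.
    """
    log_n = np.log10(bin_sizes[:-n_exclude])
    log_d = np.log10(dispersions[:-n_exclude])
    slope, _ = np.polyfit(log_n, log_d, 1)
    return 1.0 - slope
```

Chaining the three sketches, fractal_dimension(*dispersion_by_bin_size(preprocess(rts))) would be expected to return a value near 1.30 for the trial series of Figure 1a, matching the slope of –.30 reported above.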