ATTRACTION, DISTRACTION AND ACTION Multiple Perspectives on Attentional Capture
ADVANCES IN PSYCHOLOGY 133 Editor:
G. E. STELMACH
ELSEVIER Amsterdam - London - New York - Oxford - Paris - Shannon - Singapore - Tokyo
Edited by
Charles L. FOLK, Department of Psychology, Villanova University, Villanova, PA, U.S.A.
Bradley S. GIBSON, Department of Psychology, University of Notre Dame, Notre Dame, IN, U.S.A.
2001
ELSEVIER Amsterdam - London - New York - Oxford - Paris - Shannon - Tokyo
ELSEVIER SCIENCE B.V. Sara Burgerhartstraat 25, P.O. Box 211, 1000 AE Amsterdam, The Netherlands
© 2001 Elsevier Science B.V. All rights reserved.
This work is protected under copyright by Elsevier Science, and the following terms and conditions apply to its use:
Photocopying
Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use.
Permissions may be sought directly from Elsevier Science Global Rights Department, PO Box 800, Oxford OX5 1DX, UK; phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also contact Global Rights directly through Elsevier’s home page (http://www.elsevier.nl), by selecting ‘Obtaining Permissions’.
In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (+1) (978) 7508400, fax: (+1) (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 207 631 5555; fax: (+44) 207 631 5500. Other countries may have a local reprographic rights agency for payments.
Derivative works
Tables of contents may be reproduced for internal circulation, but permission of Elsevier Science is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations.
Electronic Storage or Usage
Permission of the publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter. Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier Science Global Rights Department, at the mail, fax and e-mail addresses noted above.
Notice
No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.
First edition 2001
Library of Congress Cataloging in Publication Data
A catalog record from the Library of Congress has been applied for.
ISBN: 0 444 50676 4
ISSN: 0166-4115 (Series)
∞ The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper). Printed in The Netherlands.
Preface
The notion that certain mental or physical events can capture attention has strong intuitive appeal. Such intuitions are typically based on experiences in which an irrelevant event summons or attracts attention away from the demands of a current task. Although this apparent vulnerability to external distraction can, in some situations, be detrimental to the mental and physical health of the organism (as when the distracting event causes us to have an automobile accident), it may also be beneficial to the organism in situations where adaptation to important environmental change is required (as when the distracting event is itself potentially harmful and should therefore be avoided). Because attentional capture can have profound consequences (both positive and negative) for mental and physical action, it is necessary to go beyond a simple intuitive understanding of this complex behavior. Indeed, scientific interest in attentional capture has grown exponentially over the last 10 years. A good part of this interest stems from the fact that modeling attentional capture has the potential to provide fundamental insights into the nature of cognitive control in general. More specifically, attentional capture provides an important empirical domain for modeling the interaction between "automatic" and "controlled" processing. However, a broad survey of this field suggests that the term "capture" means different things to different people. In some cases, it refers to shifts of spatial attention, in others to involuntary saccades, and in still others to general distraction by irrelevant stimuli. The properties that elicit "capture" can also range from abrupt flashes of light, to unexpected tones, to semantic novelty, to recurring thoughts. There also appear to be a number of different theoretical perspectives on the mechanisms underlying "capture" (both functional and neurophysiological) and on the level of cognitive control over capture.
Thus, the study of attentional capture appears to be at something of a crossroads. Although there is growing interest in the phenomenon, and general agreement as to its practical and theoretical importance, there is also a growing diversity of empirical findings, theoretical perspectives, and experimental approaches. We believe that at this crossroads, it is critical to pause and attempt to reach some consensus on the existing state of research on attentional capture, and to chart new directions for future research on this important topic. However, given the diversity of experimental approaches to attentional capture, there is currently no forum for bringing together researchers to accomplish these goals. Existing conferences, such as Psychonomics, ARVO, Neuroscience, SRCD, Cognitive Aging, etc., rarely attract all the relevant researchers. Therefore, the first conference and workshop devoted exclusively to the study of attentional capture was held on June 3-4, 2000 at Villanova University. Over twenty-five researchers from a variety of different theoretical and methodological perspectives participated. The express purpose of the conference
was twofold: the first purpose was to provide a forum for researchers to present their latest empirical findings and theoretical developments; the second purpose was to engage in structured discussions concerning such fundamental issues as the definition of attentional capture, behavioral manifestations of attentional capture, and the measurement of attentional capture. By far, the issue of how to define attentional capture generated the most extensive discussion, with no clear consensus emerging. (Indeed, one of the discussion group leaders described his role as akin to "herding cats.") Nonetheless, although many of the fundamental issues remained unresolved, the interdisciplinary nature of the conference resulted in an exciting exchange of ideas and theory, many of which are represented in the following chapters.
The present volume is organized into six different topic areas, or "perspectives." Each chapter reflects either cutting-edge research or state-of-the-art reviews of specific content areas. The Neuroscience section contains chapters that explore the biological underpinnings of attentional capture. The Visual Cognition section explores the theoretical boundaries of attentional capture within the visual domain, with particular emphasis on the debate regarding the degree of top-down control over attentional capture. The Multiple Modalities section extends the phenomenon of attentional capture to modalities other than vision, including work on pre-pulse inhibition, auditory attention, and cross-modal interactions. The Developmental section addresses how attentional capture varies across the life span, whereas the Individual Differences section addresses how attentional capture varies across individuals at similar stages of development. Finally, the Dynamical Systems/Evolution section addresses the function of attentional capture from a broad, evolutionary perspective.
We owe a debt of gratitude to the National Science Foundation and Villanova University for providing generous funding for this project. We would especially like to thank Helene Intraub at NSF for her encouragement and support. We would also like to thank all those who participated in the Villanova Capture Conference, including presenters as well as those who participated in the workshop discussions; collectively, you made it an unqualified success. Finally, we thank Clare Gideon for her clerical assistance in preparing the manuscript on which this volume is based.
Charles L. Folk
Bradley S. Gibson
Contents
Preface . . . v
List of contributors . . . ix
Part I. Neuroscience
1. Electrophysiological studies of reflexive attention
Joseph B. Hopfinger and George R. Mangun . . . 3
2. Inhibition of return in monkey and man
Raymond M. Klein, Douglas P. Munoz, Michael C. Dorris, and Tracy L. Taylor . . . 27
Part II. Visual Cognition
3. Inattentional blindness and attentional capture: Evidence for attention-based theories of visual salience
Bradley S. Gibson and Mary A. Peterson . . . 51
4. Involuntary orienting to flashing distractors in delayed search?
Harold Pashler . . . 77
5. Attentional capture in the spatial and temporal domains
Howard E. Egeth, Charles L. Folk, Andrew B. Leber, Takehiko Nakayama, and Sharma K. Hendel . . . 93
6. Attentional and oculomotor capture
Jan Theeuwes and Richard Godijn . . . 121
7. Attention capture, orienting, and awareness
Steven B. Most and Daniel J. Simons . . . 151
Part III. Multiple Modalities
8. Using pre-pulse inhibition to study attentional capture: A warning about pre-pulse correlations
J. Toby Mordkoff and Hilary Barth . . . 177
9. Temporal expectancies, capture, and timing in auditory sequences
Mari Riess Jones . . . 191
10. Crossmodal attentional capture: A controversy resolved?
Charles Spence . . . 231
Part IV. Developmental
11. Testing models of attentional capture during early infancy
James L. Dannemiller . . . 265
12. Attentional capture, attentional control, and aging
Arthur F. Kramer, Charles T. Scialfa, Matthew S. Peterson, and David E. Irwin . . . 293
Part V. Individual Differences
13. A multidisciplinary perspective on attentional control
Douglas Derryberry and Marjorie A. Reed . . . 325
14. Capacity, control and conflict: An individual differences perspective on attentional capture
Andrew R. A. Conway and Michael J. Kane . . . 349
Part VI. Dynamical Systems/Evolution
15. A dynamic, evolutionary perspective on attention capture
William A. Johnston and David L. Strayer . . . 375
Subject index . . . 399
Contributors
Hilary Barth, Massachusetts Institute of Technology
Andrew R. A. Conway, University of Illinois at Chicago
James L. Dannemiller, University of Wisconsin at Madison
Douglas Derryberry, Oregon State University
Michael C. Dorris, New York University
Howard E. Egeth, Johns Hopkins University
Charles L. Folk, Villanova University
Bradley S. Gibson, University of Notre Dame
Richard Godijn, Vrije Universiteit
Sharma K. Hendel, Johns Hopkins University
Joseph B. Hopfinger, University of North Carolina at Chapel Hill
David E. Irwin, University of Illinois at Urbana-Champaign
William A. Johnston, University of Utah
Michael J. Kane, University of North Carolina at Greensboro
Raymond M. Klein, Dalhousie University
Arthur F. Kramer, University of Illinois at Urbana-Champaign
Andrew B. Leber, Johns Hopkins University
George R. Mangun, Duke University
J. Toby Mordkoff, Pennsylvania State University
Steven B. Most, Harvard University
Douglas P. Munoz, Queen's University
Takehiko Nakayama, Johns Hopkins University
Harold Pashler, University of California at San Diego
Mary A. Peterson, University of Arizona
Matthew S. Peterson, University of Illinois at Urbana-Champaign
Marjorie A. Reed, Oregon State University
Mari Riess Jones, Ohio State University
Charles T. Scialfa, University of Calgary
Daniel J. Simons, Harvard University
Charles Spence, University of Oxford
David L. Strayer, University of Utah
Tracy L. Taylor, Dalhousie University
Jan Theeuwes, Vrije Universiteit
Part I Neuroscience
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture
C. Folk and B. Gibson (Editors)
© Elsevier Science B.V. All rights reserved.
Electrophysiological Studies of Reflexive Attention
Joseph B. Hopfinger and George R. Mangun
Models of human cognition hold that information processing occurs in a series of stages. Cognitive psychology, in particular, is concerned with the internal mental processes that begin with the appearance of an external stimulus and result in a behavioral response. An enduring question has focused on determining the stage or stages of information processing at which attention might have an influence. Measures of overt behavior have long been used to make inferences about the internal mental mechanisms of attention. Increasingly though, physiological measures of human brain activity have been used to provide direct measures of discrete stages of information processing during attentional performance. In this chapter, we briefly review the event-related potential (ERP) approach to the study of attention, and present recent results utilizing this methodology in the study of reflexive attentional capture. These experiments have revealed that reflexive attention is able to influence multiple stages of information processing beginning at a relatively early stage of visual cortical analysis.
Background
Tracking information processing in the brain: Electrophysiological methods
The development of electrophysiological recording techniques dates back to the early 1930s, when Hans Berger and Herbert Jasper developed techniques that would later be used to directly examine the neural mechanisms of the human brain's attention systems (Jasper, 1935). By recording from electrodes placed on a human subject's scalp, they were able to measure small voltage fluctuations that reflected underlying neural activity. The recording of the ongoing voltage variations measured on the scalp is called the electroencephalogram (EEG); it is now understood to be primarily a measure of the post-synaptic (dendritic) potentials from populations of synchronously active and aligned neurons (see Nunez, 1981, for a more comprehensive discussion). Early electrophysiologists analyzed large rhythmic fluctuations in the EEG (e.g., "alpha" waves) that could index overall states of arousal (e.g., Jasper, 1935). Although the ongoing EEG can provide a measure of the subject's global brain state, it is not as well suited for identifying patterns of brain activity associated with specific types of stimulus processing or specific mental functions. This is because the larger rhythmic potentials
of the ongoing EEG may be several times larger in amplitude than the relatively small fluctuations produced by neural activity supporting individual mental events. The EEG reflects processes occurring throughout the brain related to a host of mental activities, as well as voltage fluctuations that are not due to brain activity (e.g., "artifacts" generated by muscles on the head or neck). As a result, the neural activity generated by a specific mental event of interest can be difficult or impossible to observe in the ongoing EEG. The voltage fluctuations produced by particular events of interest can, however, be detected using signal averaging procedures. For example, the neural activity produced by a specific visual stimulus can be measured if the ongoing EEG is averaged over multiple occurrences of that specific visual event (Figure 1). Epochs of time surrounding the visual event of interest are extracted from the EEG record and aligned to the onset of the visual stimulus in each epoch. The voltage is then averaged across epochs at each time point separately, resulting in a single event-related potential (ERP) waveform. The ERP thus represents the response to a specific event, time-locked to the onset of that event. The averaging process effectively cancels out the electrical activity in the EEG that is not time-locked to the stimulus event of interest. This occurs because, on average over many trials, the uncorrelated activity is just as likely to be of positive as of negative polarity at any post-stimulus time point. Given a sufficient number of trials, the averaging process leaves essentially only the activity evoked by the event of interest.
[Figure 1 panels: Electroencephalogram (EEG), with successive visual stimulus onsets S1, S2, S3 ... Sn (S = onset of visual stimulus; 500 msec scale); Signal Averager, summing the n aligned epochs (k = 1 ... n) and dividing by n; Visual Event-Related Potential (ERP), with the N2 labeled and a time axis from -100 to 400 msec relative to onset of visual stimulus.]
Figure 1. At left is shown an example of the scalp recorded electroencephalogram (EEG), recorded continuously while the event of interest (in this case a visual stimulus: S) is presented multiple times. Epochs of the EEG surrounding the onset of the visual event are extracted, aligned according to the onset time of the event of interest, and then averaged point by point (middle column). The resulting average is referred to as the Event-Related Potential (ERP; right column). Note that the amplitude of the ERP is much less than that of the EEG, a typical situation that necessitates the averaging procedure.
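The averaging procedure described above and in Figure 1 can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' analysis pipeline: the function name `average_erp`, the 500 Hz sampling rate, the -100 to 400 ms epoch window, and the simulated data (a 5 µV response peaking at 100 ms, buried in background activity with a 10 µV standard deviation) are all assumptions chosen for the example.

```python
import numpy as np

def average_erp(eeg, onsets, fs, tmin=-0.1, tmax=0.4):
    """Cut an epoch around each stimulus onset and average the epochs point by point.

    eeg: 1-D array of voltages from one electrode; onsets: sample indices of the
    stimulus; fs: sampling rate in Hz; tmin/tmax: epoch limits in seconds.
    """
    start = int(round(tmin * fs))   # first epoch sample, relative to onset
    stop = int(round(tmax * fs))    # one past the last epoch sample
    epochs = np.array([eeg[o + start:o + stop] for o in onsets
                       if o + start >= 0 and o + stop <= len(eeg)])
    times = np.arange(start, stop) / fs       # seconds relative to stimulus onset
    return times, epochs.mean(axis=0)         # average across epochs at each time point

# Hypothetical simulation: a small evoked response buried in larger ongoing activity.
rng = np.random.default_rng(0)
fs, n_trials = 500, 200
t = np.arange(int(0.4 * fs)) / fs
evoked = 5.0 * np.exp(-((t - 0.1) / 0.02) ** 2)        # 5 uV bump peaking 100 ms after onset
onsets = np.arange(n_trials) * fs + rng.integers(100, 200, n_trials)
eeg = rng.normal(0.0, 10.0, n_trials * fs + fs)        # background activity, SD 10 uV
for o in onsets:
    eeg[o:o + len(evoked)] += evoked                    # add the time-locked response

times, erp = average_erp(eeg, onsets, fs)
print(f"ERP peak at {times[np.argmax(erp)] * 1000:.0f} ms")
```

Because the simulated background activity is not time-locked to the onsets, it tends to cancel in the average while the small evoked response survives. Real pipelines typically add steps such as artifact rejection and filtering, which this sketch omits.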
A canonical ERP waveform consists of a series of voltage fluctuations, representing positive and negative potentials generated by the event of interest. As shown in Figure 1, the voltage fluctuations are typically labeled according to: (1) polarity ("P"ositive or "N"egative; note that the convention followed here plots positive voltages downward); and (2) order of occurrence (P1 = first major exclusively positive component) or latency of occurrence (e.g., the peak of the NP80 component, which can be negative or positive depending on the location of the visual stimulus, occurs at approximately 80 ms latency). The "prestimulus" period represents the activity time-locked to the event of interest that occurs before the stimulus appears. Since the event of interest has not yet occurred, under most circumstances there should be no systematic activity in this period. It can therefore be used to gauge how effectively the averaging procedure has eliminated activity that is not due to the event of interest. Using ERPs, it is possible to measure neural activity from the moment in time a stimulus is presented, through multiple levels of processing, up to and including response execution. ERP components can be related to hypothesized stages of mental processing (indicated schematically in Figure 2). Although much work remains to be done in order to understand the specific mental functions subserved by each particular ERP component, many of these components can at least be classified as indexing either simple sensory processes or higher-order cognitive processes. The ability to track mental processing in real time has proven very useful in helping to elucidate the stage(s) of processing at which attention acts to modify mental processing.
Figure 2. Shown at the top are a few of the many hypothesized stages of information processing that intercede between the initial presentation of a physical stimulus and an eventual response to that stimulus. At bottom, an ERP waveform is shown approximately aligned with the hypothesized stages of processing. The ERP waveform shown here is only for illustration purposes - the components shown here are typically observed at different scalp sites; not all would be observed at the same scalp location. In this chapter, we will focus mainly upon the sensory P1 component, and on the P300 component that indexes post-sensory higher-order cognitive processing.
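The use of the prestimulus period as a check on averaging can likewise be sketched. Assuming the residual activity is independent, zero-mean noise from trial to trial, its root-mean-square (RMS) amplitude in the average should shrink roughly in proportion to 1/√N as the number of trials N grows. The hypothetical helper `prestimulus_rms` below measures that residual on simulated, noise-only epochs; the trial counts, epoch length, and 10 µV noise level are illustrative assumptions.

```python
import numpy as np

def prestimulus_rms(epochs, n_pre):
    """RMS amplitude of the averaged waveform over its first n_pre (prestimulus) samples.

    epochs: array of shape (n_trials, n_samples), each row time-locked to stimulus onset.
    With no systematic activity before onset, this residual reflects noise that the
    averaging has not yet cancelled.
    """
    avg = epochs.mean(axis=0)
    return float(np.sqrt(np.mean(avg[:n_pre] ** 2)))

# Hypothetical noise-only epochs: 400 trials of 250 samples, SD 10 uV, no evoked response.
rng = np.random.default_rng(1)
n_pre = 50
all_epochs = rng.normal(0.0, 10.0, size=(400, 250))
for n in (25, 100, 400):
    # The residual should scale roughly as 10 / sqrt(n).
    print(n, round(prestimulus_rms(all_epochs[:n], n_pre), 2))
```

With real data the prestimulus residual need not keep falling as trials are added, since activity from preceding events or slow drifts can also appear in that window.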
Early versus late selection
A classic debate in psychology has concerned the nature of our ability to filter out unwanted information. Specifically, the debate concerns the level of information processing at which relevant information is selected. One possibility is that this selection process occurs only just before a response must be made. This would be the extreme version of the "late-selection" argument, which holds that all information received by the senses is fully processed to the level of semantic meaning (e.g., Deutsch & Deutsch, 1963). Accordingly, all sensory inputs would be completely processed, and selection would involve choosing to respond to one of several completely processed inputs (e.g., Allport et al., 1985). Alternatively, as suggested by "early-selection" theories, information may be filtered out well before it is ever processed to a level of semantic meaning. Broadbent (1958) argued that selective attention acts as a gate that allows only the desired information to proceed to higher-order processing, while keeping out all irrelevant information. Treisman (1960) argued along less extreme lines that attention acts to attenuate, rather than completely filter out, the processing of unattended inputs. Eason, Harter, and White (1969) used the ERP technique to show that alertness and attention could affect neuronal processing prior to the decision level. Specifically, they showed that attentional alertness could alter neural processing of a visual stimulus as early as 200 ms after stimulus presentation. Van Voorhis and Hillyard (1977) showed that covert visual selective attention (that is, attention shifted in the absence of any overt eye movements) could enhance visual processing starting within about 100 ms after stimulus presentation.
Further investigations have shown that the P1, a positive deflection in the visual evoked ERP that peaks around 90-110 ms latency and is maximal at posterior occipital scalp sites, is the earliest visually evoked component to be reliably affected by spatial attention (e.g., Clark & Hillyard, 1996; Luck, Hillyard, Mouloua, Woldorff, Clark, & Hawkins, 1994; Mangun & Hillyard, 1988, 1990, 1991). The P1 component is referred to as a visual sensory component, in that it is evoked by visual stimuli and is sensitive to physical features of the stimulus. Scalp current density mapping and dipole modeling of scalp-recorded electrical activity in attention studies have suggested that these P1 attention effects, produced by voluntary spatial selective attention, are generated in lateral extrastriate cortex (Gomez Gonzalez, Clark, Fan, Luck, & Hillyard, 1994; Mangun, Hillyard, & Luck, 1993). Combined ERP and functional neuroimaging studies have provided further evidence that the P1 is generated in the fusiform gyrus of extrastriate cortex in humans (Heinze et al., 1994; Mangun, Hopfinger, Kussmaul, Fletcher, & Heinze, 1997; Woldorff et al., 1997). Investigations using multiple methods have thereby converged on the conclusion that voluntary attention can affect neural processing at relatively early levels. However, there are components of the visual ERP that occur earlier than the P1 and are not reliably modulated by selective voluntary attention. The NP80 component, thought to be generated by activity in the striate cortex (area V1), has not been found to be reliably affected by voluntary selective spatial attention (e.g., Clark & Hillyard,
1996). Although some neuroimaging and non-human primate studies have provided evidence for attention-related modulations in the striate visual cortex (e.g., Worden & Schneider, 1996; Motter, 1993), a recent combined neuroimaging and ERP study found that the modulation of activity in striate cortex was related to processing that occurred after the NP80 component (Martinez et al., 1999). This result suggests that modulations of striate cortex happen via feedback pathways, after the initial sensory processing in that region (indexed by the NP80) has been completed without being influenced by voluntary spatial attention. While previous research has thus been able to identify the precise stages of processing at which voluntary attention can and cannot affect visual processing, much less research has been devoted to understanding the stage(s) of processing affected by reflexive attentional capture. More generally, recent theories of attention suggest that attentional selection cannot adequately be described as either simply early or late (see Pashler, 1998, for a comprehensive discussion). For example, Lavie and Tsal (1994) provided evidence that task difficulty plays a significant role in determining whether behavioral measures show evidence for early or for late selection. Specifically, under high levels of perceptual load, voluntary attention has been shown to act as an early filter: all available resources are consumed by the difficult task, so unattended information is not completely processed. Under low levels of perceptual load, however, attentional resources exceed what is needed to perform the easy perceptual task, and attention may act only at a later stage of processing. Handy and Mangun (2000) recently demonstrated an enhancement of the P1 by voluntary attention under conditions of high perceptual load, and no modulation of the P1 under conditions of low perceptual load.
Finally, Lavie (2000) has suggested that in addition to the perceptual difficulty of the task (perceptual load), cognitive load (e.g., working memory resources; task coordination) may also play a significant role in the control of attention.
Reflexive versus voluntary attention
Although both voluntary and reflexive attention influence the focus of our "mind's eye," evidence supports a strong distinction between these two attention systems. For instance, compared to voluntary attention, reflexive attention is engaged more rapidly, is more resistant to interference, and dissipates more quickly (e.g., Cheal & Lyon, 1991; Jonides, 1981; Müller & Rabbitt, 1989; Posner, Nissen, & Ogden, 1978). In addition, the effects of reflexive attention change significantly as time passes after an attention-capturing event (i.e., a non-predictive exogenous "cue"), whereas voluntary attention is more stable over time. Unlike voluntary attention, reflexive attention produces a biphasic effect on response times. Specifically, the initial facilitation produced by reflexive attentional capture gives way to a period in which items at the cued location are actually responded to more slowly (i.e., inhibition of return, or IOR; Posner et al., 1985; Posner & Cohen, 1984). In addition, neuropsychological studies indicate that reflexive attention may
be controlled by partially or wholly separate neural mechanisms from those involved in voluntary attention (see Rafal, 1996, for review). While there has been an abundance of research into the neural mechanisms of voluntary attention (including work in both humans and non-human animals), relatively little is known about the neural consequences of reflexive attention. In part, this may reflect an implicit assumption that the reflexive system is somehow more basic than the voluntary system, and that voluntary attention works through the same mechanisms as reflexive attention, merely adding higher-order control mechanisms. As Briand (1998) points out, however, reflexive and voluntary attention have been shown to have distinct properties and qualitative differences, which make such assumptions tenuous. For example, evidence suggests that reflexive attention plays a role in feature integration, while voluntary attention alone does not (Briand, 1998; Briand & Klein, 1987). Therefore, reflexive attention mechanisms cannot be completely understood on the basis of inferences drawn from the results of voluntary attention studies. The relative lack of neurophysiological studies of reflexive attention may also be due in part to the difficulty of attributing neural activity to specific events when those events (e.g., reflexive cue and target) occur very close together in time, as is typical in studies of reflexive attention. Specifically, two or more events occurring in a short period of time may produce partially overlapping patterns of neural activity recorded at the scalp (and in neuronal recordings in animals). Certain precautions therefore need to be taken to ensure that electrophysiological recordings will not be contaminated by the overlapping activity. If successive events are separated by only a brief interval, and if the interval is constant across all trials, it is not possible to completely differentiate the event-related activity from the two events.
This is because any activity time-locked to the second event will also be time-locked to the first event, since the second event itself is perfectly time-locked to the first event. If, on the other hand, the interval between events is varied over a range of time, then it may be possible, via signal analysis methods, to obtain distinct ERPs for both events. This procedure of randomly varying the interstimulus intervals (ISIs) can be quite effective if (1) the range of ISI variation is larger than the period of the slowest component of interest, and (2) there is a sufficiently long interval between the events, such that the processing of one event finishes before the processing of the next begins. However, in studies of reflexive attention, this second requirement often cannot be met, because the transient nature of reflexive attentional capture requires very short interstimulus intervals. Therefore, even with a randomly varying interval, the average ERPs from both events still contain some overlapping activity, because the events generating them (i.e., reflexive cues and subsequent targets) are only a few tens of milliseconds offset from one another. Within the past decade, however, advances in signal processing techniques have provided scientists with the tools (e.g., the adjacent response filter of Woldorff, 1993) to dissociate overlapping patterns of brain activity, allowing the investigation of neural activity related to specific events occurring at short interstimulus intervals. As described previously (Woldorff, 1993), this procedure
Electrophysiological Studies
estimates overlapping activity by convolving the recorded ERP waveforms with the actual distribution of the interstimulus intervals. For example, the overlap from cue processing onto the target ERP can be estimated by convolving the cue ERP with the distribution of intervals by which cues preceded targets. This estimate of overlap from the cue may then be subtracted from the recorded target ERP, providing a better estimate of the target-related activity. This improved target estimate can then be convolved with the interstimulus interval distribution to estimate the overlap from the target onto the preceding cue ERP. The procedure is then iterated using the new estimates of the cue and target waveforms until a stable solution is reached (i.e., until successive iterations no longer change the estimates).
The following experiments investigated the effects that reflexive attentional capture has on subsequent visual processing: specifically, whether reflexive attention can modulate neural processing within early sensory processing stages, as early as voluntary attention does. These experiments examined processing at both short and long interstimulus intervals to investigate both the early facilitatory effects of reflexive attention and the later inhibitory effect (inhibition of return, IOR). Finally, across experiments, we were able to examine the effects of reflexive attention in different tasks, in order to test whether simple task changes would affect the "automatic" effects of reflexive attention.
The Effects of Reflexive Attentional Capture on Visual Processing: ERP Studies
Part 1: Reflexive attention in a difficult discrimination task
Recently, we investigated the effects of reflexively oriented attention on visual processing by measuring neural activity in human subjects using the ERP method (Hopfinger & Mangun, 1998). This study used a paradigm known to produce a reflexive shift of attention, in order to investigate the effects that attentional capture has on the processing of subsequent visual events (i.e., events occurring after attention has been captured by a brief visual transient, the reflexive cue). Similar to the early versus late selection debate discussed above, this study was motivated in part by the question of whether reflexive attention would be able to affect visual processing as early as voluntary attention does. ERPs were recorded from human subjects while they performed a discrimination task in which non-predictive "cues" preceded each target stimulus. Subjects maintained fixation on a centrally located cross on a computer monitor throughout all trials (see Figure 3). On either side of fixation, four small white dots demarcated the corners of an imaginary rectangle 1.03 degrees wide and 1.37 degrees tall. The center of each imaginary rectangle was located 1.5 degrees above and 6.4 degrees lateral to fixation. Each trial began with the four dots on one side of fixation (equally probable on the left or right of fixation)
Hopfinger and Mangun
Figure 3. Discrimination Experiment. Example of stimulus display showing a trial with a target occurring at a cued location (left column) and a trial with the target occurring at an uncued location (right column). The "cue" was a 34-msec offset and then re-appearance of the 4 dots on one side of fixation. The cue-to-target inter-stimulus interval (ISI) was randomly varied over a short (34-234 msec) or long (566-766 msec) interval. The target was a vertical bar presented for 50 msec. The participants' task was to judge whether the bar was the "tall" or "short" bar, and press the appropriate button as quickly as possible.
being extinguished for 34 ms and then reappearing, giving the subjective impression that one set of dots blinked. This "cue" was used in order to minimize neuronal
refractory effects and overlap from the cue ERP onto the target-evoked ERP, while still producing an effective sensory cue. Subjects were informed that the cue would be completely non-predictive of the location of the subsequent target, as described below. After a variable interval (ranging randomly over either 34-234 or 566-766 ms; rectangular distribution within each range), a vertical target bar was flashed to one side of fixation, centered between the dots on that side. The target bar was equally likely to appear on the right or left of fixation and equally likely to appear in the same or the opposite hemifield as the preceding cue. The target remained on the screen for 50 ms and was either a short (1.8 deg by .69 deg) or a tall (2.3 deg by .69 deg) vertical bar. Subjects performed a height discrimination in which they rapidly pressed one button for short bars and a different button for tall bars. The intertrial interval varied randomly between 1500 and 2000 ms, and each block consisted of 40 trials in which short (34-234 ms) and long (566-766 ms) cue-to-target intervals (interstimulus interval, ISI) were randomly intermixed. Each block was 90-100 seconds long, and each subject performed 80 blocks, 40 on each of 2 separate testing days. Catch trials, in which no target appeared, accounted for 20% of the trials and were included to reduce the likelihood of subjects forming temporal expectancies and to prevent anticipatory responses. Data from 8 healthy, right-handed volunteers (4 female), ages 18-30, with normal or corrected-to-normal vision, were analyzed. Although the cue was subtle and did not overlap on the retina with the target, the scalp-recorded neural responses to the cue still overlapped with the responses recorded to the target stimulus, especially at the shortest cue-to-target intervals.
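The trial structure just described can be sketched as a small block generator. This is a minimal sketch under stated assumptions: it assumes each 40-trial block is exactly balanced (32 target trials crossing ISI range, cued/uncued, cue side, and bar height twice, plus 8 catch trials, i.e., 20%), which the text does not specify; the names `Trial` and `make_block` are hypothetical, and ISIs are drawn as continuous values even though a real display would quantize them to screen refreshes.

```python
import random
from dataclasses import dataclass
from itertools import product
from typing import List, Optional

SHORT_ISI_MS = (34.0, 234.0)   # short cue-to-target range (rectangular)
LONG_ISI_MS = (566.0, 766.0)   # long cue-to-target range (rectangular)
ITI_MS = (1500.0, 2000.0)      # intertrial interval range


@dataclass
class Trial:
    cue_side: str
    target_side: Optional[str]  # None on catch trials
    isi_ms: Optional[float]     # None on catch trials
    bar: Optional[str]          # "short" or "tall"; None on catch trials
    iti_ms: float

    @property
    def is_catch(self) -> bool:
        return self.target_side is None

    @property
    def cued(self) -> bool:
        return not self.is_catch and self.target_side == self.cue_side


def make_block(rng: random.Random, n_catch: int = 8) -> List[Trial]:
    """Hypothetical balanced block: 32 target trials plus n_catch catch trials."""
    trials = []
    for isi_range, validity, cue_side, bar, _ in product(
        (SHORT_ISI_MS, LONG_ISI_MS), ("cued", "uncued"), ("left", "right"), ("short", "tall"), range(2)
    ):
        other_side = "right" if cue_side == "left" else "left"
        target_side = cue_side if validity == "cued" else other_side
        trials.append(Trial(cue_side, target_side, rng.uniform(*isi_range), bar, rng.uniform(*ITI_MS)))
    for _ in range(n_catch):
        # Catch trials still present a cue but no target.
        trials.append(Trial(rng.choice(("left", "right")), None, None, None, rng.uniform(*ITI_MS)))
    rng.shuffle(trials)  # intermix short/long ISIs, cued/uncued, and catch trials
    return trials


block = make_block(random.Random(1))
```

Seeding a private `random.Random` keeps each simulated block reproducible without touching global random state.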
In order to eliminate the possibility that any differences in early ERP components might be due to overlapping neural activity produced by the cues, the adjacent response (Adjar) filter method (Woldorff, 1993) was employed to remove confounding potentials generated by the lateralized cues. As described briefly earlier, this procedure iteratively estimates and subtracts the overlap from adjacent events (i.e., cue and target) until the estimates of the cue and target overlap no longer change over successive iterations, at which point the overlap is considered to have been removed from the original waveforms (see Hopfinger & Mangun, 1998, for more details on how this procedure was applied to the present data). Physiological measures were gathered by recording from 64 electrodes distributed over the scalp of each volunteer. In agreement with prior reaction time (RT) studies using non-predictive peripheral visual transients (Jonides, 1981; Miller, 1989; Müller & Rabbitt, 1989; Theeuwes, 1991), subjects in the present experiment responded reliably faster to targets at the cued location than to targets at the uncued location (517 ms versus 533 ms, respectively) for the short cue-to-target intervals (main effect of cueing F(1,7) = 24.41, p.35).
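The iterative logic of the overlap-removal procedure can be illustrated with a toy simulation. This is a minimal sketch, not Woldorff's (1993) Adjar implementation: it assumes post-stimulus epochs only (no prestimulus baseline), ISIs measured in whole samples from a rectangular distribution, and ERP activity that is zero beyond the epoch. All function names and simulated waveforms are hypothetical.

```python
import math


def overlap_onto_target(cue_erp, isis):
    """Cue activity that leaks into the target-locked average.

    A cue that led the target by `isi` samples contributes cue_erp[t + isi]
    at target-locked sample t; averaging over the ISI distribution is a
    discrete convolution (cue activity past the epoch is taken as zero).
    """
    n = len(cue_erp)
    return [sum(cue_erp[t + d] for d in isis if t + d < n) / len(isis) for t in range(n)]


def overlap_onto_cue(target_erp, isis):
    """Target activity that leaks into the cue-locked average.

    A target that followed the cue by `isi` samples contributes
    target_erp[t - isi] at cue-locked sample t (nothing before target onset).
    """
    n = len(target_erp)
    return [sum(target_erp[t - d] for d in isis if t - d >= 0) / len(isis) for t in range(n)]


def iterative_overlap_removal(recorded_cue, recorded_target, isis, tol=1e-10, max_iter=10_000):
    """Alternately subtract estimated overlap until the estimates stop changing."""
    cue_est, target_est = list(recorded_cue), list(recorded_target)
    for iteration in range(1, max_iter + 1):
        # Remove the current estimate of cue overlap from the recorded target average.
        new_target = [r - o for r, o in zip(recorded_target, overlap_onto_target(cue_est, isis))]
        # Use the cleaner target estimate to remove target overlap from the cue average.
        new_cue = [r - o for r, o in zip(recorded_cue, overlap_onto_cue(new_target, isis))]
        change = max(abs(a - b) for a, b in zip(new_cue + new_target, cue_est + target_est))
        cue_est, target_est = new_cue, new_target
        if change < tol:  # successive iterations no longer alter the estimates
            return cue_est, target_est, iteration
    raise RuntimeError("estimates did not stabilize")


def bump(n, center, width, amplitude):
    """Gaussian-shaped toy 'ERP component' sampled at n points."""
    return [amplitude * math.exp(-(((t - center) / width) ** 2)) for t in range(n)]


N = 40                     # epoch length in samples (hypothetical)
ISIS = list(range(5, 16))  # rectangular ISI distribution in samples (hypothetical)
true_cue = bump(N, 8, 3.0, 2.0)
true_target = bump(N, 12, 4.0, -3.0)

# Simulated averages: each true response plus overlap from the adjacent event.
recorded_cue = [c + o for c, o in zip(true_cue, overlap_onto_cue(true_target, ISIS))]
recorded_target = [g + o for g, o in zip(true_target, overlap_onto_target(true_cue, ISIS))]

cue_est, target_est, n_iter = iterative_overlap_removal(recorded_cue, recorded_target, ISIS)
```

With jittered ISIs the loop recovers both toy waveforms. In this toy, replacing `ISIS` with a single fixed value (e.g. `[10]`) reproduces the constant-interval problem noted earlier in the chapter: the loop stops almost immediately, with the shared activity assigned to the cue waveform instead of recovering `true_target`.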
Figure 4 [plot omitted; x-axis: Distractor Condition (No Distractor, Static Distractor, Flashing Distractor); y-axis: RT (ms), 2000-4000]. Mean correct RTs (in ms) in Experiment 1 (same green digits flash on and off) as a function of distractor condition and target presence/absence.
Table 1. Mean percent errors in Experiments 1 and 2 as a function of target presence/absence and distractor condition.

Distractor condition | Exp. 1 Target Present | Exp. 1 Target Absent | Exp. 2 Target Present | Exp. 2 Target Absent
No Distractor | 16.4 | 1.8 | 19.5 | 2.0
Static Distractor | 19.1 | 1.9 | 17.2 | 1.7
Flashing Distractor | 17.9 | 2.5 | 18.4 | 2.1
Pashler
Experiment 2
Using visual search designs, Yantis and colleagues have found that visual transient signals that do not signal the appearance of new objects generally do not produce involuntary orienting (Yantis & Hillstrom, 1994; see Yantis, 2000, for a review). It is not clear whether or not the reappearance of the green distractor digits in the displays used in Experiment 1 should be regarded as signaling the appearance of new objects. To see whether the flashing of the distractors would remain harmless (and indeed, helpful) even when new objects appeared, Experiment 2 was conducted with one change: when the green digits reappeared every 400 ms, each digit was replaced with a new (usually different) digit in the same location.
Method
Subjects. Forty-two UCSD undergraduates (9 male) participated in partial fulfillment of a course requirement. The Apparatus, Procedure and Design were exactly as in Experiment 1. The only difference was that in the flashing-distractor condition, a new randomly chosen set of green digits flashed on every 400 ms.
Results
No subjects had overall error rates in excess of 25%, and RTs were trimmed as in Experiment 1. Figure 5 shows the mean correct reaction times for target-present and target-absent trials in the three conditions. The effect of distractor condition was significant, F(2,82) = 22.5, p .80, nor the interaction between cue type and validity were significant, F(4, 48) = 1.88, p > .12. In the absence of both a main effect of cue type and an interaction between cue type and validity, contrasts were computed on data collapsed across the two cue types. The increase in accuracy on valid as compared to null trials was significant, F(1, 12) = 26.11, p < .001, as was the decrease for horizontal trials relative to the null cue, F(1, 12) = 7.05, p < .05. However, the differences between the vertical and null cues and between the diagonal and null cues did not reach significance, F(1, 12) = 3.06, p > .10, and F(1, 12) = 1.63, p > .22, respectively. In Hendel's Experiment 3b, both the red cue and the orientation cue led to shifts of attention. This result is expected under the contingent capture model if subjects are assumed to be in singleton detection mode. It is also expected under the salience model, if we can assume that both the color and the orientation cue displays contain a salient discontinuity. However, the results of Hendel's Experiment 3a suggest that salience alone cannot explain the observed pattern of validity effects. In that study, significant validity effects were obtained only when the target-defining feature and the cue-defining feature were the same (red), suggesting that observers maintained a set for red stimuli.
Taken as a whole, Hendel's Experiments 3a and 3b are more consistent with the contingent capture hypothesis than with the salience hypothesis, for the following reasons. As we have argued, the salience hypothesis predicts that subjects will shift their attention to the most salient element in the cue display. In each of these experiments, any given cue display contains only one salient difference: the cue. Consequently, a shift of attention to the most salient element should always have been a shift to the cue, producing a cue-validity effect for both cues in Experiments 3a and 3b. Because no validity effect was observed with the orientation cue in Experiment 3a, the outcome predicted by the salience hypothesis was not obtained. Nonetheless, a different interpretation of these data may be entertained. Theeuwes (1994) argued that the typically brief presentation of stimuli in the cueing paradigm may lead subjects to integrate the cue and target displays. Attention would then be likely to go to the most salient element in the entire trial.
Spatial and Temporal Capture
If this were the case, then the salience hypothesis could be interpreted to predict that attention would shift to the target, rather than the cue, if the target were more salient than the cue. In Experiment 3a, then, it is possible that the target (a red L among white Ls) was more salient than the oriented bar cue presented on half of the trials. The red bar cue, on the other hand, may have been even more salient than the red target, leading to the validity effects observed for the red cue. Note that this interpretation requires the assumption that the red bar cue was the most salient element in the trial sequence, followed by the red target L, with the orientation cue less salient than both the red cue and the red target. This interpretation would be ruled out if one could find a target task in which the orientation cue leads to attentional capture but the color cue does not. This was the rationale for Hendel's Experiment 4. In Experiment 4 the cue types used previously (a color singleton and an orientation singleton) were used in conjunction with a new target task in which 24 subjects were set for a stimulus of a particular orientation. Subjects had to detect the presence of a Q-like character among Os. The "stem" of the Q was a horizontal bar on either the left or the right of the target. Subjects were asked to report whether the stem was on the left or right side of the character. The rationale was that this task would lead subjects to set their attention for horizontal bars. The experiment was similar to the preceding ones except that the orientation cue was now always a matrix of vertical bars with a single horizontal bar; as before, the color cue consisted of a single red vertical bar among white vertical bars. The results are shown in Figure 3. The crucial result is that, for color cues, contrast analysis showed no reliable differences between the null cue and any of the other cues (all p values > .15). However, the orientation cue led to a reliable improvement in the valid cue condition compared to the null cue condition (p .15] (and the adjacent-cells test, albeit unjustified, also revealed no pair-wise differences). Finally, the startle peak latencies in the high- versus low-correlation conditions did not differ in the control condition [t(7.50) = 1.55, p > .15], nor in the -50-ms condition [t(6.75) < 1], but did differ at all of the positive SOAs [+50 ms: t(6.45) = 4.37, p < .005; +150 ms: t(6.70) = 3.84, p < .01; +350 ms: t(6.21) = 5.33, p < .005; and +750 ms: t(5.75) = 6.61, p < .005].
Discussion
In general, the data from the high-correlation condition replicated previous studies of pre-pulse inhibition that have used this sort of experimental design (e.g.,
Mordkoff and Barth
Stitt et al., 1980). In particular, pre-pulse inhibition waxed and waned over the SOA range of +50 to +750 ms, reaching a maximum at an SOA of +150 ms, and there was weak (and, in this case, statistically nonsignificant) augmentation at an SOA of -50 ms. Also replicating and extending previous work (e.g., Graham & Murray, 1977), pre-pulse tones reduced startle peak latency in the high-correlation condition, but only when they preceded the startle stimulus. In contrast, the data from the low-correlation condition show a very different pattern. With regard to pre-pulse inhibition, the maximum effect was much smaller here (25%, as opposed to 60%), started and peaked sooner in terms of SOA (at +50 ms, as opposed to +150 ms), and faded in less than half the time (within 350 ms, as opposed to 750 ms). Furthermore, no effect on startle peak latency was observed at any SOA. Taken as a whole, the results from the high- and low-correlation conditions make several points concerning classical conditioning of pre-pulse inhibition. First, the impressive size and extended time-course of this phenomenon are probably due to some form of conditioning. When the correlation between pre-pulse tones and startling taps is weakened (as was done here by adding pre-pulses during the inter-trial intervals), the magnitude and time-course of pre-pulse inhibition are substantially reduced. This extends related results observed in non-humans (e.g., Gewirtz & Davis, 1995). Second, the finding of reflex augmentation by tones that follow the startling taps (as opposed to precede them) is probably also due to classical conditioning. When the correlation between the tones and taps is weakened, no such augmentation is observed. Finally, the reduction in startle latency observed when a tone precedes the tap is most likely due to some form of conditioning. As above, when the correlation is decreased, no such effects are observed.
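The latency comparisons in the Results report fractional degrees of freedom (e.g., t(7.50), t(5.75)), which is what an unequal-variance (Welch) t test with the Welch-Satterthwaite approximation produces. The chapter does not name its test, so this is an assumption about how such values arise. The sketch uses only Python's standard `statistics` module; `welch_t` is a hypothetical helper and the samples are made-up numbers, not the study's data.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = variance(sample_a) / n_a  # squared standard error of mean a (n - 1 variance)
    se2_b = variance(sample_b) / n_b  # squared standard error of mean b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Made-up samples, only to show that the df need not be an integer.
t, df = welch_t([1, 2, 3, 4], [2, 4, 6, 8, 10])
```

Here `t` is about -2.25 with roughly 5.52 degrees of freedom; the df shrinks toward the smaller group's n - 1 when that group's variance dominates.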
An alternative explanation: Pre-pulse habituation
Before continuing to discuss other implications of these findings, at least one alternative interpretation of the difference in results between the high- and low-correlation conditions must be addressed. This alternative focuses on the number of pre-pulse tones administered during the experimental session, rather than on the correlation between the pre-pulse tones and startling taps. In other words, this alternative concerns possible habituation of pre-pulse inhibition (see, e.g., Gewirtz & Davis, 1995). To begin, note that participants in the low-correlation condition experienced approximately five times as many pre-pulse tones as those in the high-correlation condition. Therefore, if repeated presentation of the tones weakens their effect (regardless of correlation), then the smaller amount of pre-pulse inhibition (and the null effect of the tones on startle peak latency) in the low-correlation condition can be explained without reference to classical conditioning. While this alternative cannot be ruled out definitively, there are several reasons to doubt that it is the sole cause of the difference between the high- and low-correlation conditions. First, while the overall amount of pre-pulse inhibition was smaller in the condition that involved more tones, the amount of pre-pulse inhibition at one SOA was actually larger (albeit not significantly so). Second, correlation condition affected not only the amount but also the time-course of pre-pulse inhibition, and it is not clear why habituation would affect the latter. Third, a post-hoc analysis of the relative peak amplitudes that included "practice" as an additional factor (by dividing the twelve blocks of trials into three sets of four) produced neither a main effect nor any interactions involving "practice" [all F < 1.00]. Finally, at least one direct test for habituation of pre-pulse inhibition (after controlling for changes in the startle reflex, as was done here) failed to find any evidence for such an effect (Blumenthal, 1997).
Pre-pulse Inhibition
Automatic and conditioned components of pre-pulse inhibition
On the assumption that the low-correlation condition did not evoke any classical conditioning and the high-correlation condition did, the present results may now be used to provide separate estimates of the automatic and conditioned components of pre-pulse inhibition. This analysis rests on the auxiliary assumption that the two components are additive, such that the amount of one has no influence on the amount of the other. (This assumption cannot be tested using these data, but is used here to gain a foothold on the results.) To do this, one first uses the results from the low-correlation condition as a direct estimate of the automatic component. Next, because the high-correlation condition is assumed to measure the sum of the automatic and conditioned components, one simply subtracts the amount of pre-pulse inhibition in the low-correlation condition from the amount in the high-correlation condition to obtain the conditioned component. The results of this procedure are shown in Figure 3 (plotted in terms of inhibition vs. augmentation, as opposed to relative peak magnitude). As can be seen, the automatic component of pre-pulse inhibition is relatively small, always inhibitory, and very short-lived. In contrast, the conditioned component is larger, both facilitatory and inhibitory (depending on SOA), and extends over a wider period of time.
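The additive decomposition described above amounts to one subtraction per SOA. This is a minimal sketch under the chapter's additivity assumption; the conversion from relative peak magnitude to percent inhibition (100 x (1 - relative magnitude), positive = inhibition, negative = augmentation) is an assumed convention, the function names are hypothetical, and the input numbers are invented for illustration (loosely echoing the 25% and 60% maxima mentioned in the text), not the study's data.

```python
def percent_inhibition(relative_peak_magnitude):
    """Assumed convention: relative magnitude = probe peak / control peak.

    Positive output = inhibition; negative output = augmentation.
    """
    return 100.0 * (1.0 - relative_peak_magnitude)


def split_components(high_corr, low_corr):
    """Additive decomposition: automatic = low-correlation effect,
    conditioned = high-correlation effect minus low-correlation effect.

    Both arguments map SOA (ms) -> percent inhibition.
    """
    automatic = dict(low_corr)
    conditioned = {soa: high_corr[soa] - low_corr[soa] for soa in high_corr}
    return automatic, conditioned


# Hypothetical relative peak magnitudes per tone-tap SOA (NOT the study's data).
high = {soa: percent_inhibition(r) for soa, r in {-50: 1.1, 50: 0.6, 150: 0.4, 350: 0.7, 750: 0.9}.items()}
low = {soa: percent_inhibition(r) for soa, r in {-50: 1.0, 50: 0.75, 150: 0.85, 350: 1.0, 750: 1.0}.items()}
automatic, conditioned = split_components(high, low)
```

Because the split is a plain difference, any interaction between the two components would be silently folded into the "conditioned" estimate, which is why the chapter flags additivity as untested.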
Figure 3 [plot omitted; x-axis: Tone-Tap SOA (ms) at -50, 50, 150, 350, 750; y-axis: inhibition vs. augmentation, ticks from -50 to 40; series: Automatic Component, Conditioned Component]. Separate estimates of the automatic and conditioned components of reflex modification. (The filled points represent the automatic effect; the open points represent the conditioned effect. This analysis assumes additivity between the two components and uses the low-correlation condition as a direct estimate of the automatic effect.)
Going further, the automatic component of pre-pulse inhibition bears a remarkable resemblance to two phenomena from the capture literature: the attentional blink (e.g., Raymond et al., 1992) and attentional dwell-time (e.g., Ward et al., 1996). These effects are observed in visual tasks that require participants to watch for and report the presence of various stimuli, usually displayed in rapid succession. As can be seen by comparing the results across these studies, the time-courses of these perceptual interference effects are very similar to the present estimate of the automatic component of pre-pulse inhibition. Also similar to several recent discussions of pre-pulse inhibition (e.g., Brunia, 1993; Hackley, 1993), the attentional blink and attentional dwell-time have mostly been discussed in terms of sensory or attentional gating (see, also, Moore, Egeth, Berglan, & Luck, 1996). In contrast, the conditioned component of pre-pulse inhibition is probably better understood in terms of what a correlated pre-pulse can "tell" an experimental participant, and what the likely response to this information would be. Recall that under the high-correlation design, a pre-pulse tone is a perfect predictor of a startling tap: if the participant has just detected a tone, a tap must occur within 750 ms (if it has not already). In light of this, the participant's initial reaction could well
be something like fear, which has been shown to increase the startle reflex (see, e.g., Falls & Davis, 1994; Leaton & Cranney, 1990); hence the finding of pre-pulse augmentation of the eye-blink reflex at SOAs near zero (see, also, Flaten, 1993). Upon further processing, however, the participant could use the "warning" provided by the pre-pulse to prepare for the upcoming tap; hence the reduction of the reflex at SOAs of 150 ms or more (see, also, Ison, Sanes, Foss, & Pinckney, 1990).
Conclusions
In summary, the present study has shown that several of the important and well-known effects of pre-pulse tones on the glabellar-tap eye-blink reflex are probably at least partially due to classical conditioning. In particular, the automatic component of pre-pulse inhibition is much smaller (and much shorter-lived) than the total effect observed when the experimental design includes a strong correlation between the pre-pulse and startle stimuli. Furthermore, while the effects of a pre-pulse tone on eye-blink peak latency are large and facilitatory in the correlated condition, the effects of an uncorrelated tone are nil. Therefore, besides raising the general issue of classical conditioning in studies of pre-pulse inhibition in humans, the present study should also be seen as a warning to researchers to pay special attention to the experimental designs used to examine this class of phenomena. On the positive side, as long as these warnings are heeded, the present study strongly suggests that pre-pulse inhibition could become an important new tool for the study of attentional capture. The main value of this new measure is that it does not require instructions (people cannot help but blink when they are tapped on the glabella) and the involuntary reaction (an eye-blink) is easily measured.
References
Anthony, B. J. (1985). In the blink of an eye: Implications of reflex modification for information processing. In Advances in Psychophysiology, Vol. 1, pp. 167-218.
Blumenthal, T. D. (1995). Prepulse inhibition of the startle eyeblink as an indicator of temporal summation. Perception & Psychophysics, 57, 487-494.
Blumenthal, T. D. (1997). Prepulse inhibition decreases as startle reactivity habituates. Psychophysiology, 34, 446-450.
Blumenthal, T. D., & Levey, B. J. (1989). Prepulse rise time and startle reflex modification: Different effects for discrete and continuous prepulses. Psychophysiology, 26, 158-165.
Bradley, M. M., Cuthbert, B. N., & Lang, P. J. (1993). Pictures as prepulse: Attention and emotion in startle modification. Psychophysiology, 30, 541-545.
Brunia, C. H. M. (1993). Waiting in readiness: Gating in attention and motor preparation. Psychophysiology, 30, 327-339.
Cohen, M. E., Hoffman, H. S., & Stitt, C. L. (1981). Sensory magnitude estimation in the context of reflex modification. Journal of Experimental Psychology: Human Perception and Performance, 7, 1363-1370.
Crofton, K. M., Dean, K. F., Sheets, L. P., & Peele, D. B. (1990). Evidence for an involvement of associative conditioning in reflex modification of the acoustic startle response with gaps in background noise. Psychobiology, 18, 467-474.
Dawson, M. E., Hazlett, E. A., Filion, D. L., Nuechterlein, K. H., & Schell, A. M. (1993). Attention and schizophrenia: Impaired modulation of the startle reflex. Journal of Abnormal Psychology, 102, 633-641.
Falls, W. A., & Davis, M. (1994). Fear-potentiated startle using three conditioned stimulus modalities. Animal Learning & Behavior, 22, 379-383.
Flaten, M. A. (1993). Startle reflex facilitation as a function of classical eyeblink conditioning in humans. Psychophysiology, 30, 581-588.
Gewirtz, J. C., & Davis, M. (1995). Habituation of prepulse inhibition of the startle reflex using an auditory prepulse close to background noise. Behavioral Neuroscience, 109, 388-395.
Graham, F. K. (1975). The more or less startling effects of weak prestimulation. Psychophysiology, 12, 238-248.
Graham, F. K., & Murray, G. M. (1977). Discordant effects of weak prestimulation on the magnitude and latency of reflex blink. Physiological Psychology, 5, 108-114.
Hackley, S. A. (1993). An evaluation of the automaticity of sensory processing using event-related potentials and brain-stem reflexes. Psychophysiology, 30, 415-428.
Hackley, S. A., & Graham, F. K. (1984). Early selective attention effects on cutaneous and acoustic blink reflexes. Physiological Psychology, 11, 235-242.
Hackley, S. A., & Graham, F. K. (1987). Effects of attending selectively to the spatial position of reflex-eliciting and reflex-modulating stimuli. Journal of Experimental Psychology: Human Perception and Performance, 13, 411-424.
Hoffman, H. S., Cohen, M. E., & Stitt, C. L. (1981). Acoustic augmentation and inhibition of the human eyeblink. Journal of Experimental Psychology: Human Perception and Performance, 7, 1357-1362.
Hoffman, H. S., & Ison, J. R. (1980). Reflex modification in the domain of startle: I. Some empirical findings and their implications for how the nervous system processes sensory input. Psychological Review, 87, 175-189.
Hoffman, H. S., & Stitt, C. L. (1980). Inhibition of the glabella reflex by monaural and binaural stimulation. Journal of Experimental Psychology: Human Perception and Performance, 6, 769-776.
Ison, J. R., Sanes, J. N., Foss, J. A., & Pinckney, L. A. (1990). Facilitation and inhibition of the human startle blink reflexes by stimulus anticipation. Behavioral Neuroscience, 104, 418-429.
Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (1990). Emotion, attention, and the startle reflex. Psychological Review, 97, 377-395.
Leaton, R. N., & Cranney, J. (1990). Potentiation of the acoustic startle response by a conditioned stimulus paired with acoustic startle stimulus in rats. Journal of Experimental Psychology: Animal Behavior Processes, 16, 279-287.
Marsh, R. R., Hoffman, H. S., & Stitt, C. L. (1979). Eyeblink elicitation and measurement in the human infant. Behavior Research Methods & Instrumentation, 11, 498-502.
Moore, C. M., Egeth, H., Berglan, L. R., & Luck, S. J. (1996). Are attentional dwell times inconsistent with serial visual search? Psychonomic Bulletin & Review, 3, 360-365.
Mordkoff, A. M., Edelberg, R., & Ustick, M. (1967). The differential conditionability of two components of the skin conductance response. Psychophysiology, 4, 40-47.
Mordkoff, J. T., & Yantis, S. (1991). An interactive race model of divided attention. Journal of Experimental Psychology: Human Perception and Performance, 17, 520-538.
Perlstein, W. M., Fiorito, E., Simons, R. F., & Graham, F. K. (1993). Lead stimulation effects on reflex blink, exogenous brain potentials, and loudness judgments. Psychophysiology, 30, 347-358.
Raymond, J. E., Shapiro, K. L., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849-860.
Stitt, C. L., Hoffman, H. S., & DeVido, C. J. (1980). Modification of the human glabella reflex by antecedent acoustic stimulation. Perception & Psychophysics, 27, 82-88.
Ward, R., Duncan, J., & Shapiro, K. L. (1996). The slow time-course of visual attention. Cognitive Psychology, 30, 79-109.
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © Elsevier Science B.V. All rights reserved.
Temporal Expectancies, Capture, and Timing in Auditory Sequences
Mari Riess Jones
It is surely true that we know more about the way people attend to visual than to auditory arrays. Yet anyone who has tried to follow a friend's conversation or listened to a musical performance knows that attention is not limited to a specific modality. This chapter considers guided attending as well as stimulus-driven attention in the auditory domain, with special emphasis on the dynamics of attending. It is divided into four parts. The introduction reviews major paradigms and findings in research on attending to visual and auditory events, concentrating primarily on auditory research. The second section describes dynamic attending in the context of temporal sequences; it incorporates a new role for abrupt onsets and capture in events that unfold over time. The third section presents results from recent experiments on dynamic attending to auditory sequences, and a final section provides concluding remarks.
I. Attending to Visual and Auditory Events: An Overview
In Part I, I attempt to provide an overview of research on auditory attention with some focus on issues of timing. Relative to research on visual attention, much less is known either about attention to auditory events or about the role of time in attending to these events. Consequently, although I draw comparisons between attending to visual and auditory stimuli in this introductory section, my main focus is on auditory events and the role of time. My stepping-stone into the world of auditory attention is a discussion of paradigms used in research on visual attention, because these are familiar and well established. This starting point also invites observations about important differences between visual and auditory paradigms. In the visual domain, it is common to rely on search tasks in which the presented elements form static spatial arrays, whereas in the auditory domain tasks are more likely to rely on monitoring elements that make up dynamic temporal arrays.
Format differences between visual-spatial and auditory-temporal presentations are critical because they impose different constraints on attending. Consequently, my review of auditory attending concentrates on temporal properties of tasks and stimuli.
Riess Jones
Visual arrays
In attending to visual arrays, it has been useful to distinguish voluntary control of a spatial search process, which involves expectancies, from an involuntary process, which involves stimulus-driven capture. Although the voluntary versus involuntary distinction becomes less clear-cut in research with auditory arrays, I organize this part of the chapter around two topics associated with this distinction, namely expectancy and capture.
Expectancy. Expectancy typically refers to an anticipatory orienting of attention. By its very nature, anticipation implies a temporal component of expectancy. Nevertheless, it is most common to link attentional orientation simply to the spatial locale of a future target; thus, expectancy involves either a specific or a nonspecific orientation to some location in space. A specific expectancy is one confined to a narrow spatial region, whereas a nonspecific expectancy is one in which an attender anticipates that a future target may occur anywhere within a wide spatial region. Operationally, specific and nonspecific orientations may be associated, respectively, with cued and uncued search paradigms. A cued search paradigm relies on distinct cues to instill specific expectancies about "where" a target might occur in space, whereas an uncued search (by definition) does not. In visual attention, cued search usually involves two discrete, successively presented visual elements, namely a cue and a target; together they form a short sequence. This paradigm often examines the locative function of a cue stimulus: the cue is used to signal the future spatial location of the second element in the sequence, the target. As shown in Figure 1a, the cue-target task involves time constraints; these largely concern the time interval between the onset of the cue and that of the target. This interval, known as the inter-onset interval (IOI; also termed the stimulus onset asynchrony, SOA), usually assumes only one or two values within a session.
If the cue is an arbitrary stimulus (e.g., a symbol such as an arrow) presented at a neutral location, it is termed an endogenous cue; this is distinguished from an exogenous cue (discussed shortly), which may be similar to a cued target, located close to a possible target location, or both. Usually an endogenous cue acquires its meaning for a viewer by virtue of its probabilistic connection to a forthcoming target. For instance, an endogenous cue is effective in guiding attending if a viewer knows, in some sense, that there is a high conditional probability (validity) of the target appearing at a given location following a given cue (Kahneman & Treisman, 1984; Kahneman & Tversky, 1982; Posner, 1980; Theeuwes, 1991). In other words, cue validity is operationally taken as a determinant of expectancy. In this context, a valid symbolic cue typically yields faster and more accurate responses to a target that appears at the specified location than does an invalid cue (Downing, 1988; Posner, 1980; Posner, Snyder, & Davidson, 1980; Shulman, Remington, & McLean, 1979). This sort of cuing task, along with the evident influence of cue validity, has reinforced the widespread practice of reserving the term 'expectancy' to describe endogenous cuing. In visual attention, valid endogenous cues are commonly considered to be an
Auditory Attentional Capture
193
important vehicle for the voluntary orienting of visual attending to locations in space. Endogenous cuing paradigms often equate cued attending with specific expectancies about "where" a target will occur. But emerging evidence suggests that we can enlarge the description of endogenous cuing to include the temporal component of an expectancy. People appear able to take advantage of the temporal constraints of the cue-target task to anticipate "when" as well as "where" a future target may occur. Admittedly, to suggest that an endogenous cue directs attending to cued regions in time is less conventional than to suggest the allocation of attention in space. Nevertheless, recent research shows such effects (Coull & Nobre, 1998; Miniussi, Wilding, Coull, & Nobre, 1999; Coull, Frith, Büchel, & Nobre, 2000; Kingstone, 1992; Rothstein, 1985). The general strategy pairs two different symbolic cues (e.g., cue A and cue B), respectively, with two different time intervals (IOIs) of forthcoming targets. Figure 1a (left panel) shows that cue A is paired most often with a long IOI (t3 - t1) and cue B with a short one (t2 - t1). People pick up on these probabilities, responding more quickly to targets validly cued by cue A (appearing after a long IOI) than to those invalidly cued by cue A. That is, a target that occurs earlier than expected (i.e., after cue A and a short IOI) catches the viewer by surprise. One interpretation offered for these findings is that unexpectedly early targets produce an automatic reaction, whereas unexpectedly late ones stimulate a voluntary re-orientation of attention. With only two IOIs, when a target fails to appear a short IOI after cue B (at t2), one can confidently predict "when" it will occur, namely after the long IOI (i.e., at t3) in Figure 1a (left panel). To sum up, given the right visual cue as well as some consistency of experienced time intervals, people can learn to anticipate specifically "when" a target will occur after a cue.
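Because cue validity is defined as a conditional probability, it can be estimated directly from a log of trials, whether the cued outcome is a location in space or, as in the temporal cuing designs just described, a short versus long IOI. The sketch below assumes a hypothetical list of (cue, outcome) pairs with made-up counts chosen to give a validity of .75; the function name and the data are illustrative only and are not taken from the studies cited.

```python
from collections import Counter


def cue_validity(trials, cue, predicted):
    """Estimate P(outcome == predicted | cue) from (cue, outcome) pairs.

    `trials` is a hypothetical trial log; an outcome can be a spatial
    location ("left"/"right") or a temporal one ("short"/"long" IOI).
    """
    outcomes = [outcome for c, outcome in trials if c == cue]
    if not outcomes:
        raise ValueError(f"no trials with cue {cue!r}")
    return Counter(outcomes)[predicted] / len(outcomes)


# Hypothetical session in the style of Figure 1a (left panel):
# cue A is usually followed by a long IOI, cue B by a short one.
session = (
    [("A", "long")] * 75 + [("A", "short")] * 25
    + [("B", "short")] * 75 + [("B", "long")] * 25
)

print(cue_validity(session, "A", "long"))   # 0.75
print(cue_validity(session, "B", "short"))  # 0.75
```

The same function covers spatial cuing by changing the outcome labels, which mirrors the text's point that a cue can specify "when" as well as "where".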
Clearly, people can allocate attention in time as well as in space. Many cue-target tasks use only a single visual cue, as in the right panel of Figure 1a. Such tasks offer greater cue specificity with respect to space but not with respect to time. Because a wide range of IOIs is associated with the same cue, the target will be anticipated to occur anywhere within a broad temporal region from t2 to t3. For instance, if the short and long IOIs are 1/2 sec and 1 sec, respectively, then attention will be focused in time over this 1/2-second region. All of this means that attending has a temporal component which can be focused narrowly or widely (in time) depending on the range of IOIs used in a task. In other words, depending on the temporal constraints of the task, a cue-target experiment designed to orient attending specifically to a "where" in some space-like dimension may also inadvertently provoke distinct temporal expectancies. By contrast, in uncued visual search, expectancies in both space and time are less specific. The common uncued search task presents a large number, d, of elements simultaneously within a 2D spatial array. By definition, this task lacks the sequential presentation of two items (cue, target). Because both cues and IOIs are absent, specific expectancies about a target's location in space or time are less likely. In spite of such uncertainties, a general expression regarding expectancies about target location in space and time is possible. Probabilistically, the canonical search
task is maximally uncertain; on average, when a target is present, its probability of being at a given location is 1/d (see, e.g., Nissen & Corkin, 1985, for unequal target probabilities). Often such search is conceived as a series of deliberate attention shifts from one location to another resulting from various goal-oriented strategies (e.g., feature search, singleton search). Imagine, for instance, an array of d elements, say short lines, that differ from one another with respect to orientation and color. A conjunctive search task defines a target in terms of a certain co-occurrence of two features, as in a vertical red line, a definition that motivates the search. In such tasks, it has been argued that attention shifts serially over many spatial locations, guided by the goal of target localization and possibly by prioritized relationships among displayed elements (Cave & Wolfe, 1990; Treisman & Sato, 1990; Wolfe, Cave, & Franzel, 1989; Wolfe, 1994).¹ I speculate that subjectively such shifts transform a simultaneous array into a successive one, leading to the creation of an attentional trajectory in space and time: attention moves from location X at time $t_n$ to location Y at time $t_{n+1}$, and so on. Building on probabilistic notions, rough estimates of the expected time to target discovery can be determined from the expected number of shifts, d/2 (assuming equal probability weights). If the average attention shift time is T, then the expected search time is Td/2. Thus, Td/2 represents a kind of temporal expectancy in that it gives an expected time to target discovery. Furthermore, T can be regarded as the average pace of attending (i.e., how fast or slow one shifts attention) along this space-time trajectory, and this pace can be affected by instructions and task goals, among other things.
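The expected-search-time estimate can be checked numerically. The sketch below assumes a serial, self-terminating search in which each location is inspected exactly once in a fixed order and the target is found on the shift that reaches it; under that assumption the exact expected number of shifts with equal weights is (d + 1)/2, which the rough estimate d/2 approaches as d grows. The function names and the values d = 20 and T = 50 ms are hypothetical illustrations; the `weights` argument also allows unequal target probabilities.

```python
import random


def expected_shifts(weights):
    """Expected number of attention shifts in a serial, self-terminating search.

    Locations are inspected one at a time in a fixed order and never revisited;
    a target at position k (1-based) is found on shift k. weights[k - 1] is the
    (unnormalized) probability that the target occupies position k.
    """
    total = sum(weights)
    return sum(k * w / total for k, w in enumerate(weights, start=1))


def expected_search_time(d, T):
    """Expected time to target discovery with equal weights (1/d per location).

    The exact value is T * (d + 1) / 2; the rough estimate in the text is T * d / 2.
    """
    return T * expected_shifts([1.0] * d)


def simulate_search_time(d, T, n_trials=100_000, seed=0):
    """Monte Carlo check: place the target uniformly at random and count shifts."""
    rng = random.Random(seed)
    shifts = (rng.randint(1, d) for _ in range(n_trials))
    return T * sum(shifts) / n_trials


# Hypothetical values: d = 20 elements, T = 50 ms per shift.
print(expected_search_time(20, 50))   # about 525 ms (T*d/2 gives 500 ms)
print(simulate_search_time(20, 50))   # close to the analytic value
```

The simulation is included only as a sanity check on the closed form; it is not meant as a model of how observers actually search.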
My motive for describing uncued search of a large spatial array in this manner is strategic: later in this chapter I suggest that an important difference between searching static spatial arrays and monitoring dynamic temporal ones concerns the way attending is paced in the two situations. The description of uncued search I have outlined emphasizes that in these visual-spatial tasks one's attentional pace can be flexible because people are free to pace themselves; thus, when asked to respond quickly, they can voluntarily change T. Later, I will contrast this with the constraints imposed on attentional pacing in a sequence monitoring task involving d temporally distributed auditory elements; in this task, people are "paced" by the rate imposed by a succession of elements. Despite the flexibility available in visual search, an expected trajectory of attention can be short-circuited, in some cases, by a single distinctive stimulus, i.e., a singleton (Egeth & Yantis, 1997). If the singleton is putatively task-irrelevant but nonetheless so distinctive that it grabs our attention, then attention shifts to an unscheduled spatial location at an unexpected time. According to Yantis and Egeth (1999), in this case the singleton captures attending. When this happens in visual search, where the expected time to target discovery is Td/2, capture represents a kind of violation of the expectancy algorithm. That is, a target is discovered either earlier or later than expected. If a distracting singleton is the target, then observed search time will be less than expected (Td/2); if not,² then observed search time will exceed Td/2. To sum up, probabilistically based expectancies can be identified in both
cued and uncued spatial search tasks. In cued search tasks, they express relatively specific anticipations about "where" in space and "when" in time a target may occur, whereas in uncued search they are less specific. Nonetheless, in spite of the uncertainties in the latter, it is possible to estimate a global temporal expectancy about target discovery, which suggests that attention is allocated flexibly over time in uncued search. Moreover, expectancies about where and when a target may occur are generally seen as reflecting mainly voluntary attentional activities. Capture. Capture has been touched upon above in connection with the uncued search task. It refers to an apparent truncation of a largely voluntary search due to the presence of a distinctive singleton which "pulls" attention to a particular spatial location at an unexpected time. Central to the debate over visual capture is whether certain stimulus properties, inherent in the attractor element, are responsible for a derailment of attending. Some contend that task parameters and the goal object 'set' attention in such a way that, under the right circumstances, any unique or relationally distinct element can pull attention to itself (or its location) by virtue of its relevance to the task (Folk, Remington, & Johnston, 1992; Folk, Remington, & Wright, 1994; Gibson & Amelio, 2000). Others maintain that certain special properties of a task-irrelevant singleton, such as its abrupt onset, are important for capture to occur, either because they signal a new object (Jonides & Yantis, 1988; Yantis, 1993; Yantis & Egeth, 1999; Yantis & Jonides, 1984) or because they introduce a significant luminance change over time (Gellatly, Cole, & Blurton, 1999; Gibson, 1996a, 1996b; Yantis & Jonides, 1996). Abrupt onsets are central to debates over capture, perhaps because they differ intriguingly from other salient features of a singleton, such as color or form.
The defining property of an abrupt onset is dynamic rather than static: it involves relative timing. Relative to the onsets of other elements in a display, a singleton onset is one that happens early or late; in a real sense, its 'abruptness' is a matter of relative timing because it depends on the onset times and time intervals associated with other elements (see, e.g., Miller, 1989; Remington, Johnston, & Yantis, 1992). Any element with an unusual onset time relative to the established time structure of a task can draw attention. For example, a cue-target IOI of 200 ms can seem surprisingly short when it follows a series of 1,100 ms intervals but less so when it follows a series of 400 ms intervals. Some timing deviations make powerful claims on our attention. In this respect, time deviations, including abrupt onsets, assume an exogenous cueing function in that, regardless of how often they occur (cue validity), they seem to "pull" attention to their spatio-temporal locations. Furthermore, because abrupt onsets seem to override cue validity and instructions, the attentional shifts they provoke are considered, at least partially, automatic (Theeuwes, 1991). To sum up, although often overlooked, the relative timing of a stimulus element within its experimental context may be an important aspect of the larger debate about capture. The debate over capture really turns on the issue of automaticity. A popular dichotomy holds that expectancies are voluntarily controlled whereas stimulus-driven attending is involuntary (where involuntary equates with automatic).
This dichotomy goes hand-in-hand with the distinctions about stimulus cues drawn above: endogenous cues are purported to instantiate voluntary attending, whereas exogenous cues provoke involuntary, i.e., automatic, control of attending. Because involuntary attending is equated with "stimulus-driven" attending, one area of concern in debates over capture is whether a particular stimulus item "qualifies" as an exogenous cue. Presumably, an item qualifies if it (1) immediately overrides voluntary, intentional control of search, and (2) is insensitive to memory/cognitive load (Kahneman & Treisman, 1984; Theeuwes, 1991). Folk and colleagues (Folk et al., 1992, 1994, 1999) have maintained that, by these criteria, capture is not automatic; rather, it is contingent upon the attentional set one voluntarily assumes in a given task (i.e., whether or not a feature is task-relevant). In their view, involuntary attentional orientation depends on voluntary control settings conferred by task-relevant features of the goal object (e.g., Gibson & Amelio, 2000). In this case, it is possible to ignore even compelling stimulus singletons (e.g., abrupt onsets) if they are not relevant to a task goal. To the contrary, Yantis maintains that even task-irrelevant abrupt onsets can capture attention, given a neutral attentional set, i.e., a relatively wide attentional focus (Yantis & Jonides, 1990, 1996; Theeuwes, 1991). Only when a task tightly circumscribes the goal item and search, such that one's attention is narrowly focused on a particular property or spatial region, will an irrelevant abrupt onset fail to capture attending. Thus, the kernel of the debate on the automaticity of stimulus-driven attending is whether or not the attention-getting potential of certain stimulus properties is independent of one's goals in a task. Summary. Expectancy is often defined operationally in a probabilistic fashion in visual attention (using endogenous cue validity).
Such expectancies appear not only to direct attending to "where" in space a target might occur but also to allow people to allocate attention in time, anticipating "when" that target will happen. The latter finding reflects a temporal component of attending inherent in the concept of expectancy. The phenomenon of attentional capture also incorporates a role for timing, in that the elements most likely to compel attention are often ones that violate an established task time structure: they arrive unexpectedly early or late (e.g., abrupt onsets). These sudden onsets seem to override expectancies set in motion by valid endogenous cues. But debates over the automaticity of visual capture remain.
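The relative-timing example given earlier (a 200 ms IOI following runs of 1,100 ms versus 400 ms intervals) can be made concrete by expressing a probe IOI as a proportional deviation from the mean of the IOIs that preceded it. This index is a hypothetical, Weber-fraction-style illustration chosen for this sketch; it is not a measure proposed in the studies cited, and the function name is invented.

```python
from statistics import mean


def relative_deviation(context_iois, probe_ioi):
    """Hypothetical index: proportional deviation of a probe IOI from its context.

    Negative values mean the probe arrives earlier than the preceding IOIs
    would suggest; positive values mean it arrives later.
    """
    baseline = mean(context_iois)
    return (probe_ioi - baseline) / baseline


# The text's example: a 200 ms IOI after runs of 1,100 ms or 400 ms intervals.
print(relative_deviation([1100] * 5, 200))  # about -0.82: markedly early
print(relative_deviation([400] * 5, 200))   # -0.5: less surprising
```

On this index the same absolute interval counts as a larger timing deviation in the slower context, which is the sense in which "abruptness" depends on the established time structure of a task.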
Auditory arrays
In the auditory domain, much research employs temporal rather than spatial arrays; these arrays comprise short or long sequences of sounded elements. I discuss attending to these sequences by revisiting the topics of expectancy and capture outlined in the preceding section. Thus, I consider auditory versions of the cue-target task involving short sequences (d = 2 elements), as well as auditory counterparts to the uncued spatial search task, namely sequence monitoring tasks involving many (d > 2) temporally distributed elements. Precise parallels between
auditory and visual tasks are risky, especially in the latter case (i.e., uncued spatial search versus sequence monitoring). Some reasons for caution are instructive. First, the large visual arrays of the uncued search task are usually spatial and static, whereas the long auditory arrays of sequence monitoring are temporal and dynamic. I contend that these formatting constraints differentially affect the flexibility and the pacing of attending. A second reason for caution involves the respective functions underlying responses to spatial and temporal arrays. In our everyday lives, visual search tasks often call for responses that resemble quests for a lost item, whereas auditory sequence monitoring resembles situations where we must listen, reply to and/or assess communications of others (speech, music). Both require attending, but in the former it serves a locative function, whereas in the latter it serves as the foundation for communication.
[Figure 1a: Cue-Target Tasks. Left panel: cue A and cue B, each followed by a target after a long or a short IOI. Right panel: a single cue followed by a target after a short or a long IOI. Vertical axis: space-like dimension; horizontal axis: time, with t1, t2, t3 marked.]
Figure 1a. Left panel: The cue-target paradigm is shown, with two distinct cues (A, B) associated over trials with different IOIs for target occurrences (targets are solid ovals). Given the onset time, t1, of a cue on a trial, one cue (cue A) provides a specific temporal expectancy for a target at time t3, whereas the other (cue B) provides a temporal expectancy at t2. Right panel: The same cue is associated with two different IOIs on different trials; targets (solid ovals) are anticipated in the temporal region between t2 and t3, yielding a less specific temporal expectancy than in the left panel.
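[Figure 1b: A Sequence Monitoring Task. A standard tone is followed by interpolated tones and a final comparison tone (judged "Same" or "Lower"); the last interval is marked Critical IOI; horizontal axis: Time.]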
Figure 1b. A sequence monitoring task showing tones interpolated between standard and comparison pitches. Time intervals between successive interpolated tones are IOIs; these specify a sequence rate and rhythm; the final IOI, the critical IOI, may be a variable in certain tasks.
Expectancy. Expectancies can be conceived in several ways. As we have seen, the probabilistic association view links expectancy with a slow and deliberate voluntary control process. It is most evident in visual cue-target tasks, where expectancy is operationalized probabilistically as cue validity (e.g., with endogenous cues). However, a different conception emerges in sequence monitoring tasks. It involves pattern-directed attending, in which expectancy is linked to pattern relationships (Jones & Yee, 1993; Garner, 1974). In this view, an expectancy is an extrapolation of attending that is determined partly by relationships among sequence elements. In this case, expectancy is neither necessarily slow nor inevitably voluntary. Cue-target paradigm. Paradigmatically, auditory versions of the cue-target task offer close parallels to their visual counterparts. In the auditory domain, these designs continue to facilitate questions about the locative function of a cue, now a sound cue: How does one sound signal the location of another? Again, Figure 1a is applicable to the cue-target paradigm. A sound that is arbitrarily related to the sound of a target (e.g., a church bell followed by a bird whistle) may acquire a signaling function merely through probabilistic associations. When endogenous cues are correlated with a target in terms of conditional probability, they become valid and function to determine a listener's expectancy about a target's location in space. A word is necessary about the term "space-like" on the ordinate of Figure 1a. Whereas in visual cueing studies the cue often orients attending to a location in two- or three-dimensional space in the world, in auditory studies two variants of this exist. In one, the sound cue signals the location of another sound in the 3D environment, meaning that "space-like" refers to a measure of real spatial distance.
In the other variant, one sound cue signals another's position in frequency (Hz), i.e., in pitch; here "space-like" refers to an interval distance in pitch space. It has been argued that in the auditory domain pitch space functions psychologically much like real space (Kubovy, 1981; Jones, 1976; Woods et al., 2001).
In pursuit of probabilistically determined expectancies, Spence and Driver (1994) asked whether valid sound cues can orient attending in real space. They pitted endogenous sound cues against so-called exogenous ones in a task where listeners judged the spatial elevation (high versus low) of a target sound. The target sound could occur either on the listener's left or right side (Experiment 5). Listeners were told that a 2 kHz endogenous sound would validly (.75) signal the location of a forthcoming target on the side opposite that of the endogenous sound cue. On invalid trials, the same sound exogenously cued the target, which then appeared on the same side as the exogenous sound cue. The design was similar to that of Figure 1a (right panel), where a cue sound preceded a target by one of three cue-target IOIs (from 100 ms to 1,000 ms, with a variable cue onset time, t1). People were faster at identifying validly than invalidly cued targets at longer IOIs, a finding consistent with the probabilistic view that valid endogenous sound cues are linked to slow voluntary expectancies. In turn, these expectancies support a listener's shift of attention to a region in space, as reported with visual cue-target designs (e.g., Downing, 1988; Posner, 1980). Others confirm that acoustic cues can orient listeners to specific regions in 3D space, but these studies involved (arguably) exogenous cues and are discussed shortly (Mondor, 1999; Mondor & Breau, 1999; Mondor & Zatorre, 1995; Mondor, Zatorre, & Terrio, 1998; Spence & Driver, 1994). It appears that useful parallels exist between auditory and visual cuing of attending to regions in space in cue-target designs. One, not mentioned above, involves time constraints. If attending is not modality-specific (and I suspect it is not), then time intervals (IOIs) may operate in a similar way in all cue-target designs.
I previously argued that, over the course of a session, the IOIs used in these designs permit people to anticipate a region in time as well as one in space. This remains relevant to understanding the anticipatory aspect of expectancies in auditory as well as in visual cue-target designs. For example, in the Spence and Driver experiment, the target follows a cue sound within 1/10 sec to 1 sec (with a probability of 1.00); this consistent pairing of a temporal range with a single sound cue circumscribes a region in time in which attending may be heightened (see also Swets & Green, 1966). In sum, regardless of modality, it appears that cue-target designs afford the orienting of attending in space and in time. People may come to expect not only the "where" of future targets, but also their "when". Sequence Monitoring Tasks. Rigid parallels between visual and auditory arrays are problematic in the monitoring of long sequences (d > 2). It is probably wiser to acknowledge that auditory sequences are important objects of study in their own right because they approximate the stimuli we routinely encounter in our acoustic environment in the guise of speech, music and other environmental sound patterns. Unlike research on uncued visual search of large spatial arrays, fledgling research on attending to many elements distributed in temporal arrays is less standardized. One version of the monitoring task embeds a target sound (e.g., a change in pitch, timbre or duration) either within or at the end of a sequence. Figure 1b illustrates the case where a target (pitch) is specified by an initial standard tone and a comparison tone terminates the sequence. Although superficially some target
embedding tasks bear similarities to visual search tasks, as a rule these tasks differ importantly in the temporal constraints they levy on attending. As indicated earlier, in visual search people can voluntarily pace attending to each of d visual elements by varying the rate (defined earlier as T) of attention shifts. By contrast, in auditory monitoring people have less freedom to voluntarily adjust the pace of attending. In fact, because a sequence conveys a series of IOIs between sounds, a temporal sequence may preclude voluntary attentional pacing. A static visual array does not change over time: objects do not appear, then disappear. But this is precisely what happens within a temporal sequence. Somehow, people must "attend at the right time" by coordinating their moment-to-moment attending with "when" successive elements appear, either voluntarily or involuntarily. I suspect this means that people must adjust the pace of attending (T) to match the rate (IOI) of the sequence (Jones et al., 2001; Large & Jones, 1999). I consider this topic in part II. Let's return to the conceptions of expectancy mentioned earlier. The probabilistic association view links expectancy to a slow voluntary control of attending based on probabilities associated with a single discrete cue. This approach to expectancy is eminently well suited to studying the locative function of sounds. But applying it to sequence monitoring is challenging. First, this requires identifying a plausible discrete cue within a sequence comprising many potential cues. Just as with its static visual counterpart (i.e., uncued search), the sequence monitoring task offers many potential sound cues, leading to great uncertainty about cuing.
Second, when presented with a sound sequence, people rarely respond to it by listening for a discrete sound as an indicator of the spatial location (in the environment) of a subsequent sound; rather, their default attention mode seems to involve tracking relationships among a series of sounds, much as we naturally do when listening to the prosody of speech or music. Consequently, in sequence monitoring tasks, expectancy has also been based on pattern-directed attending. This view links expectancy to an extrapolation of compelling aspects of pattern structure. Instead of relying strictly on cue validity, it assumes that relationships among successive elements, including their time relationships, contribute to attending and to the induction of expectancies. In both short and long sequences, these relationships include pitch changes and pitch arrangements as well as temporal relationships between onset times and IOIs (rate, rhythm). Expectancies have often been conceived as slow and voluntary. Is this necessarily the case with expectancies in sequence monitoring? According to some probabilistic interpretations, the answer is "yes." This implies that expectancies should appear only in response to slow sequences. By contrast, according to certain pattern-directed approaches, attentional pace and expectancies are rate dependent because time is part of pattern structure. As I propose later (part II), certain aspects of tracking a pattern that are involved in anticipatory orienting may become less efficient at fast rates. However, this view does not necessarily link changes in tracking efficiency with a sharp dichotomy between involuntary processes (e.g., at fast rates) and voluntary ones (e.g., at slow rates). Clearly, attentional pacing is a focal issue in this chapter, and I formalize it
shortly. But to pave the way for this, let us first consider evidence for attentional pacing in the monitoring of slow auditory sequences (mean IOI > 200 ms). Expectancies, as extrapolations of attentional pace, can be assessed by examining the effects of the time structure of an induction sequence on listeners' judgments about "when" future targets may occur. Barnes and Jones (2000) found that a regular stimulus rhythm produced temporally paced expectancies, which influenced people's judgments about subsequent time intervals. Time judgments were more accurate with expected than with unexpected target timing, given the rate and rhythm of the induction sequence (cf. McAuley & Kidd, 1998; Large & Jones, 1999). Such findings suggest that temporal expectancies exist in responding to long sequences of tones and, once again, are based on stimulus timing (IOIs). Because these were slow sequences, it is tempting to also infer that these expectancies are voluntary. But, at least by one criterion of involuntary control (overriding instructions), we cannot firmly conclude this, because listeners in the Barnes and Jones study had trouble complying with instructions to "ignore" the induction sequence. Sequence time structure also affects other aspects of monitoring performance. Jones, Boltz, and Kidd (1982) found that sequence rhythm (as well as pitch patterning) affected monitoring; listeners were better at detecting a change in the pitch of a target when it was located on a rhythmic accent than when it occurred at a temporally unaccented time point (see also Boltz, 1993; Kidd, 1993; Kidd, Boltz, & Jones, 1984). Often when instructions are used to direct attending to certain pitch regions (e.g., attend to high tones), their influence on performance is qualified by listeners' bias toward relying on pattern structure itself, including its rhythm (Jones, Jagacinski, Yee, Floyd, & Klapp, 1995; Klapp, Hill, Tyler, Martin, Jagacinski, & Jones, 1985; Klein & Jones, 1996).
Finally, classic findings have shown that people's anticipations about future elements in slow auditory (as well as visual) sequences are often predicated on various relationships among elements earlier in a sequence (e.g., Garner, 1974; Garner & Gottwald, 1968; Restle, 1970; see Jones, 1981, for a review). For example, a high tone that occurs rarely in a sequence may, in spite of its low probability of occurrence, be strongly expected at a particular point in time simply if it "fits" within local pattern relationships. All of these findings indicate that time relations as well as other pattern relationships (e.g., pitch) between discrete tones strongly affect people's expectancies about the "when" in time and the "where" in pitch space of some target tone. In sum, research with slow sequences reveals the presence of pattern-directed attending but offers no conclusive evidence bearing on whether or not the resulting expectancies are voluntary. According to Bregman (1990), any expectancies evident at these rates, including those based on rhythm, are voluntary and result from domain-specific learning that operates only at slow rates. Others eschew the voluntary/involuntary dichotomy. Thus, I have suggested that to the extent pattern-directed attending is engaged by sequence rhythm at any rate, it rests on internal activities that are unlearned, rather primitive and responsive to timing (Jones, 1976). What happens with fast auditory patterns? If expectancies indeed reflect
slow voluntary attending, based on learned schemes, then they should disappear in tasks where people monitor fast sequences. Although findings with fast sequences are less clear-cut, there remains some evidence of pattern-directed attending at these rates. Howard, O'Toole, Parasuraman, and Bennett (1984) assessed pattern-directed attending in fast tone sequences using trained listeners. Performance depended on several factors, including the relative pitch of the target, the conditional probability of a given (target) pitch within a particular pattern, and the pattern itself (rising vs. falling pitch patterns). This is one of the few reports in which the conditional probability of a single tone within a pattern was pitted against the nature of pattern structure itself; in a sense, the whole pattern functioned as a "cue." The data suggest that both the conditional probability (validity) of a target and pattern structure contribute to the allocation of attending in a sequence. Dowling, Lung, and Herrbold (1987) reached similar conclusions. People listened for probe tones within a sequence when distractor tones were added. In different experiments, variations of pitch and timing relationships significantly affected performance, suggesting their influence on attending to fast sequences (replicated by Puente & Jones, under review). However, other research fails to support pattern-directed attending; I consider it shortly (e.g., Mondor & Terrio, 1998). In sum, with fast auditory sequences there is mixed evidence for pattern-directed attending. One theoretical account of such findings assumes that different processes underlie the monitoring of slow versus fast sequences. Whereas voluntary expectancies have been proposed as the vehicle of selective attending in slow sequences, involuntary perceptual grouping processes (using Gestalt principles) are supposed to determine pattern perception in fast sequences (Bregman, 1990).
In this view, grouping is an after-the-fact automatic process that precludes attention and expectancies. Another account assumes that pattern-directed attending operates at all rates: attending is paced by stimulus time relationships, but accurate rhythmic pacing systematically falters at fast rates (Jones, 1976). Summary of research on auditory expectancies. Research with short auditory sequences (using the cue-target paradigm) indicates that expectancies about the locations of future sounds can be manipulated probabilistically through the cue validity of endogenous sound cues. Research with longer sequences (using monitoring tasks) indicates that pattern relationships, including rate and rhythm, influence expectancies and target identification performance in slow sequences. With rapid auditory sequences, people continue to respond to pattern relationships, but evidence for the impact of pitch and time relationships on attending and expectancies is mixed.
Capture
Capture is often cited as the signature example of stimulus-driven attending, although its claim of automaticity invites scrutiny. Issues pertinent to auditory capture can be addressed using exogenous sound cues in a cue-target paradigm, and in principle, using monitoring tasks by embedding a distinctive sound
Auditory Attentional Capture
singleton in a sequence of sounds. However, common criteria for automaticity of stimulus-driven attending seem more applicable to the former task than to sequence monitoring. That is, criteria such as the power of a stimulus cue or singleton to quickly override voluntary control conferred by instructions/intentions, cue validity, and cognitive/memory load can easily be adapted to the cue-target paradigm, whereas this is more challenging for sequence monitoring. In this section, I follow the organization of the preceding one, assessing first cue-target and then sequence monitoring tasks. To preview, relatively little research using auditory stimuli speaks directly to stimulus-driven attending and issues of attentional automaticity. The cue-target paradigm. The simplest question relating to auditory capture asks whether a discrete sound cue can specifically "call" attention to its location, either in the environment or in pitch space, regardless of its cue validity. Surely, the fact that alarm sounds are so ubiquitous across different cultures suggests something rather universal about the compelling effect of sudden or unusual environmental sounds. Intuitively, such sounds certainly seem to grab our attention. But a sudden sound may only be surprising. And a threatening sound has little survival value if it merely provokes surprise; in principle, such sounds afford information about the location, distance, or time-of-arrival of a threatening sound source. Such information can be picked up and used if the sound waves reaching the listener automatically effect an orienting of attention characteristic of exogenous cueing. Spence and Driver (1994) addressed the issue of exogenous cueing. In Experiment 1 they found evidence for stimulus-driven attention by sound cues using the task of Figure 1a.
They told listeners to ignore an initial cue sound that was temporally surprising (i.e., t1 was unpredictable) but that carried no valid information about the target's location. The cue was followed by a target at IOIs ranging from 100 to 1,000 ms. The sudden onset of the exogenous sound cue indeed "pulled" attending to the perceived location of the cue, thereby quickening responses to targets appearing there, especially at short IOIs. These findings meet certain automaticity criteria: first, a temporally unpredictable exogenous cue overrode instructions to ignore the cue; second, it operated in spite of low validity; and third, its effects were immediate, i.e., confined to short IOIs. Related research suggests that the compelling effect of an exogenous sound cue grows stronger as its distance from the target sound source diminishes and as its validity increases (Mondor & Zatorre, 1995; see also Mondor, 1999; Mondor & Bregman, 1994; Mondor, Zatorre, & Terrio, 1998). It is possible to limit the discussion of exogenous cueing to the automatic orientation of attending to a locale in space (as in the Spence & Driver study above). But given the focus of this chapter, let me comment on the implications of the time constraints used in these tasks. I suggest that spatial orientation of attending is only part of the story in cue-target designs. Because these designs incorporate a defined set of time intervals between an exogenous cue and target, they inevitably also invite expectancies about "when" a target will occur. So, is there evidence for such a
claim? Admittedly, there is very little evidence, largely because the topic is rarely addressed. Consequently, we find only suggestive evidence that time constraints may be important to understanding exogenous cueing. For example, Mondor (1999), using exogenous sound cues to orient attending in real space, manipulated IOIs to assess inhibition of return (IOR). He found that performance, specifically the locus of IOR, changed systematically depending on whether or not targets were temporally predictable. Such data are intriguing because they suggest that experiments designed to assess exogenous cues to spatial location may also inadvertently introduce temporal expectancies associated with these cues. In this case, people are more likely to anticipate specifically "when" temporally predictable targets will happen than "when" unpredictable ones will. Not all exogenous cueing studies meet automaticity criteria. Nevertheless, a number do show that people's attention is somehow drawn to the region in 3D space of the sound cue. Similar effects emerge in another variant of the auditory cueing design, in which the pitch distance between cue and target sounds is varied. An exogenous sound cue with a given tone frequency appears to draw attending to that region in pitch space, hence facilitating responses to a target of similar pitch (e.g., Scharf, Quigley, Aoki, Peachey, & Reeves, 1984; Mondor & Bregman, 1994). However, we do not know how attention is allocated in pitch space: Is it shifted voluntarily or involuntarily? Available research offers no conclusive answers. Research on this topic has not directly assessed whether exogenous cues to a pitch region override either instructions or the effect of valid endogenous cues at brief IOIs (as research with visual exogenous cues demonstrates). Nor have invalid exogenous sound cues been pitted against valid exogenous ones to determine whether the former override the latter in orienting attention to specific sound frequencies.
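A cueing effect of the kind Spence and Driver (1994) reported is usually summarized as the mean reaction time (RT) for targets on the uncued side minus the mean RT for targets on the cued side, computed separately at each cue-target IOI. The minimal Python sketch below assumes a hypothetical trial format (IOI, whether the target appeared at the cue's location, RT); the function name `cueing_effect_by_ioi` is hypothetical, and the RTs are invented purely for illustration, not data from any study.

```python
from collections import defaultdict
from statistics import mean


def cueing_effect_by_ioi(trials):
    """Return {ioi_ms: mean RT (uncued) - mean RT (cued)} in ms.

    Each trial is a tuple (ioi_ms, target_on_cued_side, rt_ms).
    A positive value means faster responses at the cued location.
    IOIs lacking either cued or uncued trials are skipped.
    """
    rts = defaultdict(lambda: {True: [], False: []})
    for ioi, cued, rt in trials:
        rts[ioi][cued].append(rt)
    effects = {}
    for ioi in sorted(rts):
        groups = rts[ioi]
        if groups[True] and groups[False]:
            effects[ioi] = mean(groups[False]) - mean(groups[True])
    return effects


# Hypothetical reaction times (ms), invented purely for illustration.
trials = [
    (100, True, 410), (100, False, 450),
    (100, True, 420), (100, False, 440),
    (1000, True, 430), (1000, False, 432),
]
print(cueing_effect_by_ioi(trials))
```

A positive effect that shrinks at the longer IOI would be the signature of immediate, exogenous orienting; tabulating the effect per IOI also keeps any contribution of temporal expectancies visible rather than averaging it away.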
In a few studies, exogenous sound cues, when invalid, do appear to orient attending to the target's pitch-space locale, as indicated by a decline in performance with increased separation of cue and target in pitch space (cf. Greenberg & Larkin, 1968; Scharf et al., 1984). But in others, cue similarity (tone frequency) and validity have been correlated. For example, Mondor and Bregman (1994) varied cue validity, with valid cues always identical to a particular target frequency. With cue-target IOIs ranging from 550 to 1,600 ms,³ listeners were faster and more accurate in identifying target properties (e.g., tone duration) with valid cues at longer IOIs; this outcome pattern resembles the one often found with endogenous cueing. Nevertheless, Mondor and Bregman suggest that these sounds exogenously guided attending to regions in pitch space, with attending distributed as a space-like gradient. [See also Jones (1976), who earlier hypothesized that people allocate attending to regions of space (real or pitch space) and of time.] To sum up, it seems clear that unexpected sounds do more than simply surprise people. Exogenous cues can serve a locative function, facilitating attentional orientation to locations in real or pitch space related to a cue sound. Nevertheless, several questions remain. One concerns how attentional allocation is accomplished: Is exogenous cueing truly automatic? Another concerns the degree to which timing constraints modulate the observed effects of exogenous cueing.
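Mondor and Bregman's proposal that attending is distributed over pitch space as a space-like gradient can be illustrated with a simple weighting function. The sketch below is an assumption, not their model: it measures cue-target separation in semitones (a log-frequency scale) and assumes a Gaussian gradient whose width, the hypothetical parameter `sigma_semitones`, is an arbitrary illustrative value. The weights fall with separation, mirroring the decline in performance with cue-target distance in pitch space described above.

```python
import math


def semitone_distance(f_cue_hz, f_target_hz):
    """Absolute cue-target separation in semitones (log-frequency scale)."""
    return abs(12.0 * math.log2(f_target_hz / f_cue_hz))


def attention_weight(f_cue_hz, f_target_hz, sigma_semitones=2.0):
    """Hypothetical Gaussian attentional gradient centred on the cue's pitch.

    Returns 1.0 at the cue frequency and falls toward 0 with distance.
    sigma_semitones is an illustrative free parameter, not an empirical value.
    """
    d = semitone_distance(f_cue_hz, f_target_hz)
    return math.exp(-d * d / (2.0 * sigma_semitones ** 2))


# Targets roughly 0, 2, 6 and 12 semitones above a 1000 Hz cue.
for target in (1000, 1122, 1414, 2000):
    print(target, round(attention_weight(1000, target), 3))
```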
Sequence monitoring tasks. I conclude Part I by considering whether something akin to capture by sound singletons occurs in auditory sequence monitoring. Theoretically, to emulate capture paradigms in sequence monitoring, listeners must be required to respond to a feature of a target sound that coincides with a singleton only some of the time. The idea is to discover whether the singleton facilitates feature identification when target feature and singleton co-occur, and inhibits target identification when they do not, owing to the power of the singleton to "call" attending to itself and to its serial location in a sequence. Moreover, to ensure that any observed capture is determined entirely by an automatic "pull" exerted by the singleton and is not contingent on the task, the singleton feature should be irrelevant to the task goal; it should be neither a defining nor a reported feature. In practice, these guidelines are very rarely applied in the research reviewed. One reason that few monitoring studies have addressed capture and automaticity as such is that other issues have dominated this field. Because sequences are the foundations of communication patterns, common questions have concerned how people perceive, attend to, and make sense of the sequences themselves. In this context, it is not surprising to learn from sequence monitoring studies that people are more likely to notice elements that "stand out" relationally from surrounding elements in various ways (e.g., increasing pitch or time difference; Bregman, 1990; Miller & Heise, 1950; Heise & Miller, 1951; van Noorden, 1975; Woods et al., 2001). Relatedly, distinctive tones are especially important in musical sequences, where their attention-getting potential dignifies them as "accents"; accents are often distributed strategically in time by composers and performers with the goal of manipulating "when" listeners should attend (e.g., Jones, 1987).
Generally, the degree to which any sound element grabs attention depends on its relationships to surrounding tones; if these are all similar to one another, then a distinctive singleton will seem still more prominent and attention-getting. Conversely, the more similar a singleton becomes to other items in a well-formed group, the less accurately it is judged (Bregman, 1990; Bregman & Rudnicky, 1978; Divenyi & Hirsh, 1978; Jones, Kidd, & Wetzel, 1981; Jones & Yee, 1993; Mondor, Zatorre, & Terrio, 1998; Watson, Kelly, & Wroton, 1976; van Noorden, 1975). Furthermore, the salience of a cue may depend on sequence rate; for instance, at fast rates, singletons based on frequency differences are more salient cues than those associated with sound source location (left versus right ear), but the reverse obtains at slow rates (Woods et al., 2001). One interpretation of all this is that the attention-getting potential of a single tone within a larger serial context depends on the way a listener responds to the context. If, in listening to an unfolding sequence, the singleton "fits" relationally with surrounding tones and/or confirms an expectancy about the pattern structure, then that tone will be less likely to be noticed as a separate object and more likely to be integrated into the ongoing sequence. Recently, direct attempts have been made to emulate visual capture in auditory sequence monitoring. Listeners monitor a sequence for a given feature (duration, rise time, intensity, etc.) that may or may not coincide with a distinctive
singleton (Woods, Alho, & Algazi, 1994; Woods et al., 2001). For example, Mondor and Terrio (1998) asked untrained listeners to respond to an irrelevant target feature in regularly timed (isochronous) tone sequences forming either rising or falling pitch trajectories. The singleton either departed from a trajectory in pitch (near or far in pitch space) or fell on the trajectory (null pitch change). A to-be-identified target feature (duration, intensity, etc.) always coincided with the singleton. Listeners performed best with very distinctive (far) pitch singletons and worst when the singleton tone "fit into" a pitch trajectory (null pitch change). That distinctive singletons facilitate target identification suggests that they may capture attention. At the same time, because performance was poorest for the target on the pattern-directed trajectory, Mondor and Terrio concluded that pattern-directed expectancies were not present. Nevertheless, it is interesting that in a subsequent study they found that irregularities within a pitch sequence weakened the capture-like effect, thus underscoring the importance of pattern structure. It remains possible that capture-like phenomena are somehow contingent upon listeners' use of pattern regularities. Although the contingent capture idea suggested here currently differs in important respects from that proposed by Folk and his colleagues for visual capture, greater convergence may emerge over time. I would be remiss if I ignored the attentional blink (AB) task in a discussion of the impact of singletons planted in fast sequences (Raymond & Shapiro, 1992; Shapiro, Raymond, & Arnell, 1994). Only a few auditory AB studies exist; all employ random sequences of spoken digits or letters that are presented in a regular time pattern (Arnell & Jolicoeur, 1997; Chun & Potter, 1995).
Unlike in other sequence monitoring research, pattern relationships have not been central to explaining the common AB finding that a distinctive singleton (target) briefly interferes with identification of a subsequent element (probe). Nevertheless, I speculate that a case might be made that the time pattern induces a regular attentional pace and that this pace is somehow briefly disrupted (the blink) by the target in these sequences. In this interpretation, a kind of capture is initiated by the singleton. Moreover, because the blink has been shown to emerge only when people are instructed to monitor explicitly for the target, one may infer that such capture is contingent on attentional set and task relevance; I propose that the blink may also be contingent on sequence rate and rhythm. Finally, given that abrupt onsets have been central to the debate over visual capture of attention, it is surprising that no comparable published work exists with abrupt-onset singletons in auditory sequences. I suspect this is due to format differences between spatial search and temporal monitoring, because in sequences all onsets are, in one sense, abrupt. Following my earlier claim that abrupt onsets may operate by virtue of their relative (not absolute) time properties, it is possible that the most attention-getting onsets in auditory sequences are those that deviate from a temporal regularity implied by other sound onsets. Certainly, larger temporal violations of an ongoing time pattern are more noticeable than smaller ones (e.g., Jones & Yee, 1998; Large & Jones, 1999). Of course, in these time judgment studies, time was task relevant; it was both the defining and the reported feature.
Accordingly, we cannot conclude that evidence for the noticeability of a time change represents stimulus-driven capture by an unexpected time change if people are already set to attend to timing. Recently, however, Ralph Barnes (in our lab) was able to demonstrate capture by tones with unexpected timings in isochronous sequences: temporally deviant tones were better identified in pitch, in a task where pitch (not time) was both the defining and the reported dimension. Summary of research on auditory capture. In both cue-target and sequence monitoring tasks, the evidence regarding stimulus-driven attending is less clear-cut than the evidence for expectancies based on sound stimuli. Nevertheless, some findings indicate that a sudden, invalid cue sound "calls" attention to locations in real and (possibly) pitch space. Other research suggests that in sequence monitoring a version of attentional capture may obtain, but that it is contingent on listeners' use of sequence structure.
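The suggestion that the most attention-getting onsets in a sequence are those that deviate from a temporal regularity implied by earlier onsets can be made concrete by comparing an onset with an extrapolated expected time. In the hypothetical sketch below, the expected time is simply the previous onset plus the mean of the preceding IOIs (an assumption chosen for simplicity, not a measure used in the studies cited), and the deviation is reported both in milliseconds and relative to that mean IOI, in keeping with the idea that relative rather than absolute timing matters.

```python
def onset_deviation(onsets_ms):
    """Deviation of the final onset from the time implied by earlier IOIs.

    Hypothetical measure: the expected time is the last context onset plus
    the mean IOI of all preceding onsets. Returns (deviation in ms,
    deviation as a proportion of the mean IOI). Positive means late.
    Requires at least three onsets (two IOIs before the final tone).
    """
    if len(onsets_ms) < 3:
        raise ValueError("need at least three onsets")
    context = onsets_ms[:-1]
    iois = [b - a for a, b in zip(context, context[1:])]
    mean_ioi = sum(iois) / len(iois)
    expected = context[-1] + mean_ioi
    deviation = onsets_ms[-1] - expected
    return deviation, deviation / mean_ioi


print(onset_deviation([0, 600, 1200, 1800, 2400]))   # → (0.0, 0.0)
print(onset_deviation([0, 600, 1200, 1800, 2550]))   # → (150.0, 0.25)
```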
Summary of Part I
A recurrent theme of this, admittedly selective, review of attending to auditory events involves the role of stimulus timing. This theme emerges in my interpretations of data arising from both cue-target and sequence monitoring tasks. It is justified, I think, because auditory events, unlike visual objects, are preeminently temporal. In light of this, it is rather astonishing that we know less about the orienting of auditory attending in time than we do about its orienting in real space and pitch space. Accordingly, in my summary I return to two questions implicit in my discussions of attending in time. The first question simply asks whether people can allocate attending in time. Evidence from both visual and auditory domains indicates that they can. Temporal expectancies have been manipulated in cue-target designs mainly via valid endogenous cues, and in sequence monitoring tasks by assessing the effects of sequence rate and rhythm on judgments about "when" future sounds will occur. My bias is that the basis for these expectancies rests in the time structure of a task and its stimuli; thus, we should find that consistency of onset times and IOIs in certain cue-target designs facilitates expectancies about "when" a target may occur, and that similar temporal expectancy effects arise from the regularities of tone onsets and recurrent IOIs in stimulus sequences. It is conceivable that people use consistent time relationships in either short (cue-target) or long (monitoring) sequences to pace attending, and that a temporal expectancy represents an extrapolation of this pace. But one lesson to be drawn from this review is that relatively little current research has directly addressed this topic. I believe that this is largely because time (e.g., as in IOIs) is not usually conceived as part of the structure of a task or a stimulus; it is rarely considered a potentially relevant aspect of an exogenous or endogenous "cue" itself.
Although time has certainly been an important variable in the reviewed research, its manipulation often reflects the view that time is a void that can be filled with various processing activities (e.g., rehearsal, decay, etc.). I suggest that this view of time limits its interpretation as part of the stimulus structure that people use to attend.
The second question concerns capture. What is capture in the auditory domain, and is it contingent on aspects of the task, including task goals? Capture has received far less study in the auditory than in the visual domain. Yet sudden sounds can indeed orient attending in exogenous cueing designs, and distinctive singleton elements appear to attract attention within auditory sequences. However, because it is the "out-of-context" sounds in sequences that tend to grab attention, my inclination is to view capture as an expression of a listener's response to a violation of the structural relationships that characterize a task or stimulus sequence. Capture-like performance in response to relationally distinct singletons has been demonstrated in the monitoring of long sequences, where it appears contingent on people's use of pitch relationships among surrounding sequence tones. Does this mean that capture reflects a listener's response to an expectancy violation? This question is more speculative, and pertinent research is limited. Finally, it is odd that debates over abrupt onsets as a possible source of automatic control of attending find no counterpart in auditory sequence monitoring. Is this because all onsets in sequences are abrupt? If the singleton quality of an abrupt onset is removed, does this reduce its salience? Or is there a role for abrupt onsets in attending to auditory sequences? I address such questions in the next section.
II. Dynamics of Attending to Auditory Sequences
In this section, I develop the idea of dynamic attending as a form of pattern-directed attending based on stimulus time relationships. I attempt to show how dynamic attending explains auditory expectancies and temporal capture in slow sequence monitoring tasks.
Temporal expectancies: If people can allocate attention in time, how do they do this? The clearest evidence that people can allocate attending to regions in time comes from two sources. First, as indicated in Part I, over the course of a session, valid visual cues that are probabilistically paired with time intervals effectively elicit expectancies about the future onset times of a target. It is possible to argue that this dynamic attending is voluntary and based on the kind of learning that enables us to use clock codes to plan for future events. The second source for temporal targeting of attending comes from findings that people show temporal expectancies in their monitoring of slow sequences. In this case, dynamic attending is more directly influenced by stimulus time relationships, and this attending is possibly more primitive. As suggested earlier, curiously little research exists in the auditory domain on orienting attending in time. For instance, is it possible to exogenously "cue" a time interval? I suspect this is difficult for many of us to imagine. But it is not unreasonable to propose that such a cue would simply be another time interval. Although I have suggested that IOIs indeed play a role in understanding auditory
attention, this role remains to be examined, and the applicability of terminology such as exogenous/endogenous is unsettled; in fact, it is not clear that IOIs should even be considered "cues," because they participate in time patterns. But if forced into such dichotomies, then an interesting candidate for an exogenous time cue might be the repeated IOIs that form a rhythm. This analogy tends to misconstrue both rhythm as a time pattern and dynamic attending as it relates to rhythm. Nevertheless, because the cue terminology is so familiar, perhaps this flawed analogy will facilitate understanding of forthcoming experiments that raise the possibility of stimulus-driven aspects of attending and expectancies. In this part of the chapter my concern is to demonstrate how the idea of dynamic attending can be used to explain temporal expectancies that are influenced by temporal aspects of slow auditory sequences. Therefore, I do not focus upon how a temporal expectancy is acquired in response to discrete symbolic cues, although this is an important problem. Instead, I consider ways in which certain stimulus properties in sequences facilitate the allocation of attending in time; mainly these properties relate to onsets of single sounds and IOIs between successive sounds within sequences (although I do not rule out the impact of recurrent time intervals throughout a session). Key to understanding this approach is the principle of synchrony (Jones, 1976). The idea is simple: attending, let's say one's attentional focus, must be selectively synchronized with a to-be-attended object to ensure accurate judgments about that object. My colleagues and I have proposed that synchrony is achieved in one or both of two ways, by: 1. Anticipatory attending, which is directed in advance of the onset of a tone; and/or 2.
Reactive attending, which entails a quick re-direction of an attentional focus in time to a tone following its onset (Jones, Moynihan, MacKenzie, & Puente, in press; Barnes & Jones, 2000). Anticipatory attending is pattern-directed; it realizes a stimulus-controlled pace based on pattern timing. In this view, an expectancy is an extrapolation of such an attentional pace. Reactive attending refers to people's rapid responses to element onsets that are not correctly anticipated (i.e., expected). Reactive attending is stimulus-driven by abrupt onsets, with strong parallels to conventional capture in visual attention; I return to this point shortly. What is unusual about this argument is the claim that anticipatory attending can also be stimulus-driven. As I will show, this is because it is responsive to patterns of stimulus onsets and stimulus sequence IOIs; in this regard, we find a new role for abrupt onsets. Together, anticipatory and reactive attending occur in response to stimulus sequences and effectively pace attending. Let me clarify anticipatory attending in order to justify the controversial claim that expectancies can be stimulus driven. First, I assume that, locally, an abrupt onset captures attending by time-locking an individual's attentional focus (Jones, 1976; Posner, 1980). Sequences, by definition, comprise many onsets, and although the abrupt onset is no longer a singleton within a sequence, it retains its abrupt quality in conveying a local change from silence to sound; in auditory sequences such onsets remain quite salient (Vos & Ellerman, 1989). Furthermore, if each onset in a tone sequence commandeers a brief reactive
attentional shift, then together a series of attentional shifts occurs, with each attention shift lasting T ms. I claim that T comes to approximate the sequence IOIs (T ≈ IOI). Thus, attentional pace, first discussed as a feature of flexible attentional search in uncued visual search, returns in sequence monitoring, where it is much more constrained by the temporal array. Furthermore, a new role emerges for abrupt onsets: they summon attention to particular points in time, thereby outlining the IOIs that ultimately determine T. Clearly, anticipatory attending depends, in part, on abrupt onsets. Effectively, anticipatory attending builds on the local capture of attending by successive onsets. But it also depends on stimulus IOIs, which participate in sequence rate and rhythm. The average IOI of a sequence determines its rate, and when all IOIs are equal, the sequence has a very simple and regular (isochronous) rhythm. I suggest that these stimulus properties come to control the pace of attending in an anticipatory manner. Moreover, if the same or relationally congruent IOIs occur over several onsets, then sequence rate determines attentional pace, thus limiting attentional flexibility. Instead, I propose that a periodic activity is engaged that synchronizes attending with tone onsets and matches T to pattern IOIs. Extrapolation of this induced attending activity means that it may persist over time. In this manner, stimulus-driven temporal expectancies reflect a specific anticipatory targeting of an attentional focus in time. Now consider reactive attending. This provides a more elementary way of pacing attending, one that relies largely on fast reflexive attentional shifts to prior tones. Whereas anticipatory attending depends on the oscillator period and stimulus rate, reactive attending does not. It involves a fast response to onsets and so may be more evident than anticipatory attending in fast sequences.
Moreover, it is more likely to occur in irregular than in regular rhythms. Nevertheless, in this view both anticipatory and reactive attending are largely stimulus driven.
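The claim that the attentional pace T comes to approximate the sequence IOI, with reactive shifts absorbing whatever anticipation misses, can be illustrated with a toy tracker. This is a hypothetical sketch, not Large's oscillator model: the function `track_pace` nudges T toward each observed IOI by a fixed fraction `alpha` (simple exponential smoothing, an assumption) and records the signed difference between each observed onset and the onset expected from the previous one.

```python
def track_pace(onsets_ms, initial_period_ms, alpha=0.3):
    """Toy tracker: an attentional pace T that drifts toward recent IOIs.

    Hypothetical illustration only (exponential smoothing, not Large's model).
    For each new onset, the expected time is the previous onset plus T
    (anticipatory attending); the signed gap between observed and expected
    onset is the reactive re-orienting needed. T then moves a fraction
    alpha of the way toward the IOI just observed.
    Returns the final T (ms) and the list of reactive shifts (ms).
    """
    period = float(initial_period_ms)
    shifts = []
    for prev, cur in zip(onsets_ms, onsets_ms[1:]):
        expected = prev + period                   # anticipatory expectancy
        shifts.append(cur - expected)              # reactive correction needed
        period += alpha * ((cur - prev) - period)  # pace adapts toward the IOI
    return period, shifts


onsets = [i * 500 for i in range(12)]  # isochronous sequence, IOI = 500 ms
T, shifts = track_pace(onsets, initial_period_ms=800)
print(round(T, 1), [round(s) for s in shifts[:4]])
```

Starting from a mismatched pace of 800 ms against a 500 ms isochronous rhythm, the required reactive shifts shrink onset by onset as T approaches the IOI, which is one way to picture anticipatory attending coming under stimulus control.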
Figure 2a. An entrainment model responding to a series of tone onsets (black bars) by changing its period (peak to subsequent peak) to match inter-onset-time intervals (IOIs) and aligning its phase (expected - observed onset times) to the stimulus rhythm. An expected point in time is given by a pulse peak.
[Figure 2b: "A Predicted Expectancy Profile." Y-axis: Proportion Correct (0.5 to 0.8); x-axis: Critical IOI (Very Early, Early, On Time, Late, Very Late).]
Figure 2b. A predicted expectancy profile in which proportion correct (PC) for judgments about tones is shown as a function of the onset time of a final tone (critical IOI) relative to an induction rhythm. The profile 'recovers' the shape of an attention pulse from the entrainment model (top panel).
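The expectancy profile in Figure 2b can be approximated by evaluating an attentional pulse at the onset time of the final tone. The sketch below assumes a von Mises (circular-normal) pulse shape scaled to a peak of 1, a focus parameter `kappa`, hypothetical critical-IOI offsets expressed as fractions of the oscillator period, and a linear mapping of pulse height onto proportion correct between 0.5 and 0.8 (the range shown on the figure's axis). The function names are hypothetical, and these are illustrative choices, not the fitted model of Large and Jones (1999).

```python
import math


def attentional_pulse(phase, kappa):
    """Pulse height at an onset's phase relative to the expected time.

    phase: deviation of the onset from the expected time as a fraction of
    the oscillator period (0 = on time, -0.1 = 10% early, 0.1 = 10% late).
    kappa: focus parameter; larger values give a narrower pulse.
    Von Mises (circular-normal) shape, scaled so the peak equals 1.
    """
    return math.exp(kappa * (math.cos(2 * math.pi * phase) - 1.0))


def predicted_pc(phase, kappa, floor=0.5, ceiling=0.8):
    """Map pulse height linearly onto proportion correct (illustrative)."""
    return floor + (ceiling - floor) * attentional_pulse(phase, kappa)


labels = ["Very Early", "Early", "On Time", "Late", "Very Late"]
phases = [-0.2, -0.1, 0.0, 0.1, 0.2]  # hypothetical critical-IOI offsets
for label, phase in zip(labels, phases):
    print(f"{label:>10}: PC = {predicted_pc(phase, kappa=3.0):.3f}")
```

The values peak at the on-time position and fall symmetrically for early and late arrivals, tracing the inverted-U shape of Figure 2b; raising `kappa` narrows the profile.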
Recently, Ed Large and I outlined a formal model of dynamic attending (developed by Large) that incorporates both anticipatory and reactive attending (Large & Jones, 1999). Figure 2a illustrates some properties of this approach, which describes attending as the entrainment of one or more internal oscillations (limit-cycle oscillators).⁴ Entrainment refers to a real-time 'locking' of attending to certain properties of an unfolding stimulus (Jones, 1976). In this model, Large proposed that a given oscillator carries a pulse of attentional energy, where the pulse expresses an attentional focus in time. In other words, pulse width represents the width of an attentional focus over a region in time; it covers a temporal region of heightened attending surrounding an expected onset time. Basically, the oscillator entrains to timing patterns by generating, quasi-periodically, a pulse of attentional energy, as suggested in Figure 2a. When entrained to an isochronous rhythm, an oscillator temporally targets an attentional focus (pulse) once per period, with the period equal to the sequence IOI. If we further assume that accuracy in identifying an element (tone) within a sequence increases with the attending energy associated with that tone in pitch space and time, then tones that fall at temporally expected times will enjoy a greater concentration of attention and will be judged more accurately than ones that happen at unexpected times.
Temporal capture: How does it relate to stimulus-driven temporal expectancies? A temporal expectancy is associated with a shift of attention in time that realizes anticipatory attending. Expectancies are affected by stimulus time properties, such as sequence (and session) IOIs; in other words, they are strongly
affected by sequence rate. Furthermore, because anticipatory attending relies on discrete stimulus onsets as well as IOIs, in a real sense it depends upon reactive attending. Thus, in various ways, the two kinds of attending are interdependent. In part for these reasons, I postpone discussion of whether we can confidently describe anticipatory attending, and hence expectancies, as strictly voluntary, and reactive attending (which underlies capture) as strictly automatic and involuntary. For the present, I merely stipulate that different stimulus timing properties contribute, respectively, to determining these two aspects of entrainment. If expectancies can be, in a sense, stimulus driven, where does capture fit into this picture? In the entrainment model, capture, by which I will mean temporal capture (Barnes & Jones, 2000), is related to reactive attending. This is modeled as phase entrainment of oscillatory attending. Reactive attending is determined mainly by a phase parameter; this parameter describes a time-locking function of attending to an abrupt onset. Although the phase parameter governs reflexive shifts of attention in time to tone onsets, each shift is contingent on the degree to which an onset time deviates from an expected time. In other words, reactive attending is contingent on pattern-directed attending in time. Thus, if a listener correctly anticipates the time of a tone's onset, then expected and observed times are identical and no adaptive phase shift will occur. Nor does reactive attending appear in this case. Adaptive shifts occur only when anticipatory attending incorrectly targets attending in time. In that case, the phase parameter modulates alignment of an attentional pulse to the onset time of a tone; this is phase entrainment. It governs the extent to which an attentional pulse is pulled to each stimulus onset in time, reducing the temporal difference between observed and expected times and thereby achieving attentional synchrony.
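Phase entrainment as described here, with a correction that vanishes when an onset arrives exactly at its expected time and that pulls the expected time toward an onset arriving early or late, can be sketched in a few lines. The sine-shaped coupling term, the function name `phase_correction`, and the value of `eta_phi` (standing in for the phase parameter) are simplifying assumptions for illustration; Large and Jones (1999) give the model's actual equations.

```python
import math


def phase_correction(observed_ms, expected_ms, period_ms, eta_phi=0.5):
    """Return the new expected-time estimate after one phase-entrainment step.

    phi is the relative phase of the onset (observed minus expected, as a
    fraction of the period), wrapped into [-0.5, 0.5). The assumed coupling
    term eta_phi * sin(2*pi*phi) / (2*pi) is zero for an on-time onset and,
    for small deviations, pulls the expected time toward the observed onset.
    """
    phi = (observed_ms - expected_ms) / period_ms
    phi = (phi + 0.5) % 1.0 - 0.5  # wrap relative phase into [-0.5, 0.5)
    shift_ms = eta_phi * math.sin(2 * math.pi * phi) / (2 * math.pi) * period_ms
    return expected_ms + shift_ms


print(phase_correction(1000, 1000, 500))  # on time: no shift → 1000.0
print(phase_correction(1050, 1000, 500))  # late onset: expectation pulled later
```

For small deviations the shift is roughly `eta_phi` times the timing error, so in this sketch a larger phase parameter means a stronger pull toward an unexpected onset, that is, stronger temporal capture.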
Reactive attending, then, parallels in time the more familiar version of capture in space often found with abrupt visual onsets. However, in temporal events, we refer to this type of capture as temporal capture. Temporal capture is less evident as attentional focus narrows. In this model, period entrainment occurs along with phase entrainment: an oscillator's period comes to match IOIs within a stimulus pattern. This gravitation of the oscillator's period to the average IOI (sequence rate) of an auditory pattern is a slower adaptive process than phase entrainment. It is determined by the value of a period parameter, which governs how rapidly an oscillator adapts its period in response to changes in stimulus IOIs. Adaptability means, for instance, that if an auditory pattern begins at one rate and then speeds up (a common occurrence), the oscillator will, within limits, keep pace with it. In real time, the oscillator always attempts to align its attentional focus with stimulus onsets and to match its period to the average IOI of a sequence. In sum, in Large's model an attending oscillator adapts to shifts in the rate of a pattern by changing its pace accordingly. This dynamic version of pattern-directed attending carries two implications for understanding temporal capture. The first we have already mentioned: all tone onsets have the potential to capture attention via phase entrainment. The second implication draws this model closer to debates on capture because it qualifies the
Auditory Attentional Capture
213
condition of temporal capture in terms of attentional focus. That is, attentional focus width changes over time in response to the ongoing time pattern, and capture is more or less likely depending on focus width. Specifically, attentional focus in time widens in irregular time patterns and narrows in regular ones. With a narrow focus, attentional targeting in time is more precise than with a wide focus; thus, with regular sequences, expectancies are more specific to certain temporal regions. More attending energy is allocated to expected than to unexpected onset times. Accordingly, with a narrow focus, the model predicts that tones will be identified more accurately when they arrive at expected rather than unexpected times. Indeed, given a very narrow attentional focus, people may even fail to notice extremely unexpected tones, implying that temporal capture is less likely with a narrow focus. By contrast, irregular time patterns induce a wide attentional focus; expectancies are less specific, ranging over a broader region in time. In irregularly timed sequences, attending becomes more variable and attentional pulses are erratically targeted in time. In these cases, listeners have a better chance of detecting very unexpected tone onsets than they would with a narrow attentional focus, and hence they are more susceptible to temporal capture. In short, temporal capture by abrupt onsets is contingent on the width of the attentional focus in time.
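Phase and period entrainment can be combined in a small simulation. The sketch below assumes a linear correction scheme of the kind often used in synchronization modeling, not the specific equations of Large's model. The values of `eta_phi` (fast phase correction) and `eta_p` (slower period correction) are hypothetical. The oscillator is fed a sequence that starts at a 600-ms IOI and then speeds up to 500 ms, and its period is recorded after each onset.

```python
from itertools import accumulate


def track_sequence(onsets, period0, eta_phi=0.7, eta_p=0.2):
    """Adapt phase and period to a list of onset times (ms).

    Assumed linear rules (illustrative only):
      error     = observed onset - expected onset
      period   += eta_p * error       (slow period entrainment)
      expected += eta_phi * error     (fast phase entrainment)
      expected += period              (anticipate the next onset)
    Returns the oscillator period after each onset following the first.
    """
    expected = onsets[0] + period0   # first anticipated onset
    period = period0
    periods = []
    for onset in onsets[1:]:
        error = onset - expected
        period += eta_p * error
        expected += eta_phi * error + period
        periods.append(period)
    return periods


# Ten 600-ms IOIs followed by twenty 500-ms IOIs (the pattern speeds up).
iois = [600] * 10 + [500] * 20
onsets = list(accumulate([0] + iois))
periods = track_sequence(onsets, period0=600)
print([round(p, 1) for p in periods[8:16]])   # around the tempo change
print(round(periods[-1], 1))                  # period after the last onset
```

While the rate is constant, every onset is on time and the period stays at 600 ms. After the speed-up, the period drifts toward 500 ms over several onsets rather than jumping there, which is the sense in which period entrainment is slower than phase entrainment. Raising `eta_p` in this sketch makes the period follow tempo changes faster.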
III. Evidence for Dynamic Attending to Slow Auditory Sequences
In this section, I describe experiments concerned with stimulus timing and dynamic attending.⁵ First, I show that slow auditory sequences, simply by virtue of recurrent IOIs and tone onsets, may pace anticipatory attending and bring about a directed allocation of attending in time. Second, I describe experiments suggesting that attending is allocated in time on the basis of sequence rate and rhythm. Finally, I show that when temporal regularity is removed from an inducing pattern, the effects of temporal targeting of attending vanish. All of the research I report uses listeners with minimal musical training who receive relatively little practice in the task. The task requires them to judge the pitch of a comparison tone relative to a preceding standard tone. Our strategy was to render the onset of a comparison tone either temporally expected or unexpected by introducing an interpolated rhythmic sequence between the standard and comparison tones, and then to assess its effects on comparison judgments. In some respects this reverses a task used by Mondor and Bregman (1994), in which people judged the duration of a tone cued by the frequency of another tone: we required people to judge the pitch of a tone whose onset time is cued by a series of stimulus IOIs. Although our task presents auditory sequences, and thereby qualifies as a sequence monitoring task, in our version the comparison (i.e., target) tone always follows the sequence, and listeners are actually told to ignore the interpolated sequence. Following Yantis and Egeth (1999), in this task pitch is both the defining and the reported dimension: the frequency of the to-be-attended (comparison) tone is systematically varied, and people must report its relative pitch.
214
Riess Jones
Our aim is to manipulate relative timing in a task where time is putatively irrelevant and probabilistically uninformative. In addition, we sought to discover whether sequence timing might override instructions to ignore the distractor tones.
The task and general methodology
The task adapts an old procedure, the interpolated tones task (see Deutsch, 1999, for a review). It is shown in Figure 1b. A standard tone of 150 ms is followed by eight 60-ms interpolated tones (randomly rearranged on each trial). All tones (recorder voice) and sequences were generated via customized software (MIDILAB 6.0; Todd, Boltz, & Jones, 1989) using a Yamaha TG100 tone generator interfaced with a Pentium PC. Sound sequences were delivered over Beyerdynamic DT770 headphones at a comfortable listening level. Altogether we used six different standard pitches, 415 Hz (Ab4), 440 Hz (A4), 466 Hz (Bb4), 622 Hz (Eb5), 659 Hz (E5), and 698 Hz (F5), each associated with three different comparison pitches (differences of +1, -1, or 0 semitones (ST)). Interpolated tones varied randomly within three semitones of 659 Hz (554.4 Hz to 784 Hz) if the standard was between 415 and 466 Hz; they varied within three semitones of 440 Hz (370 Hz to 523.3 Hz) if the standard was between 622 and 698 Hz. Many different interpolated pitches and arrangements of pitches were employed (one constraint, described below, relates to the final interpolated tone). The listener's task was to judge the pitch of the comparison (Same, Higher, Lower) relative to the standard. In adapting this task to study dynamic attending we introduced three modifications. First, in our initial experiments we fixed all IOIs of interpolated tones at the same 600-ms value to create a time pattern with a regular rhythm, and we varied the relative onset time of the comparison tone to render it either temporally expected or unexpected given this rhythm. The final IOI, which immediately preceded the comparison tone and which we term the critical IOI, assumed five different values, rendering a comparison either Very Early, Early, On Time (600-ms IOI), Late, or Very Late.
A second modification built upon Deutsch's finding that a single repetition of the standard pitch within the interpolated sequence boosts overall accuracy in this rather difficult task (Deutsch, 1972). Therefore, we included one repetition of the standard in our interpolated sequence, constraining it to be the final tone. This had two advantages. In addition to making the task less difficult, it prevented spurious frequency cuing associated with the final interpolated tone. It also controlled a bias, evident in many pilot studies, whereby responses depended on whether the final interpolated tone was higher or lower than the comparison.⁶ The third modification involved instructions. From prior research (our own and others') we know that people do well in this task in the absence of interpolated tones. Therefore, we asked participants to "Ignore all intervening tones." They were told (validly) that this would help their performance in the task. However, our motives were not entirely altruistic. One criterion of automaticity of stimulus-driven attending is that a task-irrelevant stimulus property cannot be ignored in spite of instructions to do so (Yantis & Egeth, 1999). In principle, timing is irrelevant in this task because listeners must judge only pitch. Moreover, to the degree that they succeed in "tuning out" these distracting tones, they will be more accurate and less likely to show influences of sequence time structure. But if people respond involuntarily to these timing patterns, then they should be unable to comply with instructions.
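The stimulus frequencies in the task description follow twelve-tone equal temperament. As a check, the sketch below recomputes them from MIDI note numbers. It assumes the usual convention that A4 is MIDI note 69 at 440 Hz; the chapter does not state the tuning reference the tone generator used. The same formula gives the three-semitone ranges for the interpolated tones and the ±1 ST comparison pitches.

```python
# Standard pitches named in the text, mapped to (assumed) MIDI note numbers.
STANDARDS = {"Ab4": 68, "A4": 69, "Bb4": 70, "Eb5": 75, "E5": 76, "F5": 77}


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency with A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def comparison_pitches(standard_note: int) -> dict:
    """Comparison tones 1 ST below, equal to, and 1 ST above the standard."""
    return {st: midi_to_hz(standard_note + st) for st in (-1, 0, 1)}


def interpolated_range(standard_note: int, half_width_st: int = 3):
    """Pitch range for interpolated tones: centered on E5 (MIDI 76) for the
    low standards and on A4 (MIDI 69) for the high standards, extending
    half_width_st semitones in each direction."""
    center = 76 if standard_note <= 70 else 69
    return midi_to_hz(center - half_width_st), midi_to_hz(center + half_width_st)


for name, note in STANDARDS.items():
    low, high = interpolated_range(note)
    print(f"{name}: {midi_to_hz(note):6.1f} Hz, interpolated {low:.1f}-{high:.1f} Hz")
```

Rounded to the nearest hertz, the computed standards match the six values in the text. The interpolated ranges come out at about 554-784 Hz around E5 and 370-523 Hz around A4.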
Experiment 1: Allocation of attending to expected regions in time
The main independent variable in Experiment 1 was the critical IOI, which assumed five levels, one of which matched the sequence rate (IOI). The dynamic attending model (Part II) suggests that sequence rate and rhythm systematically guide attending and thereby determine the accuracy of pitch judgments as a function of the critical IOI. Specifically, if people extrapolate the pace induced by the auditory sequence, then they will most likely target attending to a comparison consistent with that rate. This predicts an expectancy profile over critical IOIs in which the expected time corresponds to the critical IOI identical to the sequence IOIs. Figure 2b suggests such a profile. The proportion of correct pitch judgments (PC) is best for the temporally expected comparison and worst for very unexpected ones, thus recovering the shape of an attentional pulse (Figure 2a). In light of the discussion in Part I, the range of critical IOIs in this task outlines a broad expectancy region. At the same time, the model of Part II suggests that a regular sequence rhythm can focus attention more narrowly within this region. On each trial, entrainment to tone onsets (phase entrainment) and to IOIs (period entrainment) paces attending to the sequence IOIs. This leads to the prediction of a narrow attentional focus and specific temporal expectancies, weighted in favor of the critical IOI of 600 ms. Accordingly, attentional energy should be maximal at the expected time point and should drop symmetrically as critical IOIs depart from the expected one.
Methodological Details. A full description of our methodology appears elsewhere (Jones et al., in press). Twenty-one participants, all with little musical training, served for 45 minutes in Experiment 1. They received 180 trials, with the five levels of critical IOI occurring equally often (in random order).
Critical IOI values were 524, 579, 600, 621, and 676 ms (ranging, respectively, from Very Early to Very Late). Participants also received a post-session questionnaire that asked them, among other things, whether they had complied with instructions.
Results and discussion. Figure 3 presents the results of Experiment 1, with mean PC plotted as a function of comparison onset time (five critical IOIs). On average, listeners were best at judging pitch when the comparison tone sounded at the expected time (On Time) and worst in the two very unexpected conditions (Very Early and Very Late), yielding a main effect of timing (critical IOI), F(4,80) = 3.79, Mse = .012, p = .007. The quadratic trend traced out by the observed expectancy profile was also significant, F(1,20) = 9.27, Mse = .005, p = .006.
Figure 3. Observed expectancy profiles of PC from Experiments 1 and 2 as a function of critical IOIs. (Panel a: Experiment 1, critical IOIs of 524, 579, 600, 621, and 676 ms; panel b: Experiment 2, critical IOIs of 1,124, 1,179, 1,200, 1,221, and 1,276 ms. Conditions: Very Early, Early, On Time, Late, Very Late. x-axis: onset time of comparison, IOI (ms); y-axis: PC.)
These findings suggest that the rhythm of an interpolated tone sequence significantly affects subsequent pitch judgments. Although sequence timing is, in principle, irrelevant in this task, people apparently tacitly "used" consistent stimulus time relationships to allocate attention in time. The observed expectancy profile reinforces the hypothesis that stimulus timing contributes to expectancies. These were relatively slow sequences, best suited to revealing anticipatory attending stimulated by the pattern's time structure. Automatic Gestalt grouping processes are probably not involved because, according to Bregman (1990), they operate at faster rates; furthermore, grouping principles do not predict anticipatory attending. For instance, if the Gestalt rule of temporal proximity were operating, then the comparison tone most likely to group with the interpolated tones would be the one given by the critical IOI of 524 ms. In other words, temporal grouping by proximity predicts a linear, not a quadratic, PC trend. Scores should be lowest for the 524-ms critical IOI, where interference from grouping is maximal, and highest for the 676-ms critical IOI.
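The linear-versus-quadratic argument can be made concrete with orthogonal polynomial contrasts for five ordered levels. The sketch below is illustrative only. It treats the five critical IOIs as equally spaced levels, although they are not equally spaced in milliseconds. The two PC profiles are hypothetical shapes, not the observed data. A peaked profile, as the entrainment account predicts, loads on the quadratic contrast. A monotonic profile, as grouping by temporal proximity predicts, loads on the linear contrast.

```python
# Standard orthogonal polynomial coefficients for five equally spaced levels
# (Very Early, Early, On Time, Late, Very Late).
LINEAR = (-2, -1, 0, 1, 2)
QUADRATIC = (2, -1, -2, -1, 2)


def contrast(profile, weights):
    """Weighted sum of mean PC values across the five levels."""
    if len(profile) != len(weights):
        raise ValueError("profile and weights must have the same length")
    return sum(w * pc for w, pc in zip(weights, profile))


# Hypothetical PC profiles (illustrative values only, not data).
peaked = [0.62, 0.66, 0.72, 0.66, 0.62]      # best at On Time
monotonic = [0.60, 0.63, 0.66, 0.69, 0.72]   # worst at the shortest critical IOI

for name, profile in [("peaked", peaked), ("monotonic", monotonic)]:
    print(name,
          "linear =", round(contrast(profile, LINEAR), 3),
          "quadratic =", round(contrast(profile, QUADRATIC), 3))
```

A negative quadratic value indicates an inverted U with its peak at the middle level. In a within-subjects design, one common way to test such a contrast is to compute it for each listener and test the mean against zero. The squared one-sample t then equals an F with 1 and n - 1 degrees of freedom, which is the form of the trend tests reported here.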
Experiment 2: How do people allocate attending in time?
The entrainment model proposes that the regular timing of interpolated tones induces an attentional periodicity. If so, then this oscillation of attentional energy should persist for at least a few (IOI) cycles before dying out; in musical terms, "the beat goes on." To test this we inserted a "missing beat," namely a lengthened silence equal to two sequence IOIs, between the last interpolated tone and the onset of the expected comparison tone. Thus, the On Time comparison had a critical IOI of 1,200 ms instead of the 600 ms used in Experiment 1. The dynamic attending hypothesis continues to predict a quadratic PC profile in this case. The stimulus rhythm should induce a periodic entrainment process in which attending oscillations persist, cyclically, over lengthened silent intervals before they die out. The temporal separation of the comparison tone from the interpolated sequence also permits us to assess several other predictions. One concerns instructions; the others concern the role of absolute time. With respect to instructions, people in Experiment 1 may have had difficulty ignoring the interpolated tones simply because they did not distinguish the comparison from the interpolated tones. In Experiment 2, the comparison is clearly segregated in time from the interpolated tones. If this segregation improves listeners' ability to comply with instructions to ignore the interpolated tones, then we should observe much better performance in Experiment 2 than in Experiment 1, and the quadratic profile associated with the interpolated rhythm should vanish. Absolute time refers to the length of a critical IOI on an interval scale. People may be responding to time in an absolute (linear) fashion rather than in a relative (periodic) fashion. If so, then the absolute duration of a critical IOI may form the basis either for Gestalt grouping by similarity of time intervals or for a retention interval that leads to forgetting. In either case, a linear, not a quadratic, trend over critical IOIs is predicted. For example, suppose the Gestalt rule of similarity applies to these time intervals. Then, on an interval scale, similarity to the interpolated IOIs (600 ms) is greatest for the Very Early critical IOI and lowest for the Very Late critical IOI.
Because grouping tends to increase errors, this leads to the prediction of a linear trend, with the poorest performance where grouping is maximal.⁷ Finally, if memory loss due to decay or interference during a retention interval is operative, then we expect poorer performance in Experiment 2 than in Experiment 1 and a monotonic decline over time (absolute critical IOI), with the greatest accuracy for the shortest critical IOI.
Methodological Details. In Experiment 2 we used 19 participants in a task identical to that of Experiment 1, except that we added 600 ms to all levels of the critical IOI. Relative to the last tone in the interpolated sequence, these were Unexpected Very Early (1,124 ms), Unexpected Early (1,179 ms), Expected On Time (1,200 ms), Unexpected Late (1,221 ms), and Unexpected Very Late (1,276 ms). Although the critical IOIs were lengthened, we kept the same magnitudes of deviation from the expected On Time value (here 1,200 ms). As a result, the expectancy violations associated with very unexpected onsets were proportionally smaller (.06).
Results and discussion. Results of Experiment 2 appear in Figure 3 along with those of Experiment 1. Overall, accuracy was slightly, but not significantly, higher in Experiment 2 than in Experiment 1. But again we found that people were significantly more accurate in judging the pitch of comparisons that occurred at the expected time (On Time comparisons), F(4,72) = 2.51, Mse = .009, p < .05. This suggests that the rhythmic expectancy established by interpolated timing persists through a missing beat. The quadratic trend was weaker than in Experiment 1 but still significant, owing to relatively good performance with the On Time comparison, F(1,18) = 7.18, Mse = .003, p = .013. Our findings are more compatible with a dynamic attending hypothesis than with the hypothesis that a time gap facilitates ignoring the interpolated tones. Nor are they compatible with absolute-time explanations based either on Gestalt rules or on memory loss; these two accounts predict linear, not quadratic, functions over time. Instead, the observed quadratic trend suggests that temporal expectancies are carried by a persisting periodic process induced by the stimulus rhythm. This interpretation is supported by the finding that peak accuracy occurred when the critical IOI equaled two interpolated IOIs. Suppose the attentional shift of an oscillator entrained to the stimulus rhythm corresponds to a period of T = IOI (from pulse peak to pulse peak). Then, when this oscillator extrapolates the internalized stimulus rhythm, two periodic attending shifts will require a total time of 2T = 2 IOIs.
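The 2T argument can be illustrated by extrapolating an oscillator that has locked to the 600-ms rhythm and keeps running through the silence. The sketch below assumes a perfectly stable period with no decay; in the model itself the oscillation eventually fades. It places expected pulse peaks at whole multiples of T after the last interpolated onset and measures each critical IOI against the nearest peak. Deviations are given both in milliseconds and as a proportion of the expected interval. The Experiment 3 Very Late value (1,315 ms) is derived here as 1,200 + 115 ms.

```python
def nearest_beat(critical_ioi_ms, period_ms=600.0):
    """Nearest expected pulse peak after the last interpolated onset.

    Assumes the entrained oscillator keeps its period through the silence
    (no decay), so peaks fall at k * period_ms for k = 1, 2, ...
    Returns (k, signed deviation in ms, |deviation| / (k * period_ms)).
    """
    k = max(1, round(critical_ioi_ms / period_ms))
    deviation = critical_ioi_ms - k * period_ms
    return k, deviation, abs(deviation) / (k * period_ms)


# Critical IOIs of Experiment 2 (ms after the last interpolated tone).
experiment_2 = {"Very Early": 1124, "Early": 1179, "On Time": 1200,
                "Late": 1221, "Very Late": 1276}
for label, ioi in experiment_2.items():
    k, dev, prop = nearest_beat(ioi)
    print(f"{label:<10} {ioi} ms: beat {k}, deviation {dev:+.0f} ms, proportion {prop:.3f}")

# Very Late conditions compared across experiments.
for name, ioi in [("Experiment 1", 676), ("Experiment 2", 1276), ("Experiment 3", 1315)]:
    print(name, f"{nearest_beat(ioi)[2]:.3f}")
```

Under this assumption, the On Time comparison falls exactly on the second beat. The same 76-ms deviation is proportionally half as large after a missing beat as it was in Experiment 1.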
Experiment 3: Larger expectancy violations
In Experiment 3 we enlisted new participants and assigned some to an experimental condition, in which they received the same interpolated rhythmic sequence as in Experiment 2 but with larger expectancy violations. Other participants were assigned to a control condition, in which they received no interpolated rhythm between the standard and comparison tones. Two alternative hypotheses address possible differences between the experimental and control groups. First, suppose the quadratic trends in earlier experiments result merely from range or midpoint effects conferred by the set of critical IOIs. Then both groups in Experiment 3 should exhibit the same quadratic PC trend as a function of critical IOI. If listeners are responding only to the set of critical IOIs encountered in a session, and not to the rhythm of the interpolated sequences, then we should not be able to reject the null hypothesis that the control and experimental groups do not differ. On the other hand, if rhythm has a special role in determining this expectancy profile, then we should find a quadratic trend in the performance of the experimental group but not of the control group.
Methodological Details. We recruited 13 naïve participants for the experimental condition. For this condition, we duplicated the Experiment 2 sequences, except that the very unexpected time changes were increased to render proportionally larger (.09) changes, given the longer critical IOIs. The four unexpected comparison times were Unexpected Very Early (-115 ms), Unexpected Early (-15 ms), Unexpected Late (+15 ms), and Unexpected Very Late (+115 ms), all relative to 1,200 ms. The very unexpected deviations exceed those of Experiment 2 in absolute magnitude (115 ms vs. 76 ms), whereas the other two are slightly smaller (15 ms vs. 21 ms). The Expected On Time comparison remained identical in Experiments 2 and 3 (0-ms deviation relative to the 1,200-ms IOI). In the control condition, 16 other naïve listeners received the same set of five critical IOIs as in the experimental condition, but on every trial all interpolated tones except the final one were eliminated. Participants in this condition received the standard tone, a silence of 4,800 ms, and then the single (final) interpolated tone. That tone was followed, equally often, by one of the five critical IOIs, which was always terminated by the onset of the comparison tone.
Results and discussion. In Experiment 3, the performance of control listeners differed significantly from that of experimental listeners, both overall, F(1,27) = 20.17, Mse = .368, p = .0001, and in its interaction with time level, F(4,108) = 10.16, Mse = .006, p < .0001. The control group performed very well (PC > .95) at all five critical IOIs, yielding a flat expectancy profile. The experimental group performed less well (mean PC = .67) and showed a significant quadratic trend over time levels, F(1,12) = 23.27, Mse = .002, p = .004. Figure 4a presents the mean PC scores for this group as a function of time level. The observed expectancy profile is significantly sharper than that observed in Experiment 2, as reflected in a significant difference between the two quadratic trends, F(1,30) = 7.30, Mse = .004, p = .011. This suggests that the proportionately larger expectancy violations in Experiment 3 indeed produced expectancy levels that were more difficult to cope with.
Figure 4. Observed expectancy profiles of PC from Experiments 3 and 4 as a function of critical IOIs. (Panel a: Experiment 3; panel b: Experiment 4. x-axis: critical IOI, Very Early, Early, On Time, Late, Very Late; y-axis: PC, 0.5 to 0.8.)
The findings of Experiment 3 indicate that people in the experimental condition did not succeed in "tuning out" the interpolated rhythm as instructed. Rather, their performance indicates that with a lengthened critical IOI (missing beat), attending persists in a periodic fashion over time before it fades.
Experiment 4: Attentional focus, stimulus timing and attentional capture
In this final experiment, we consider other ways in which stimulus timing may exert control over moment-to-moment attending. The preceding experiments were designed to determine whether anticipatory attending is induced by regular stimulus IOIs; in these, we hypothesized that attending would involve a relatively narrow attentional focus. Our dynamic attending model assumes that variability of stimulus timing, i.e., rhythmic irregularity, widens the focus. In this case, we might find a greater role for reactive attending to abrupt onsets. The idea is that the different IOIs in an irregular rhythm (or an experimental session) can force a widening of listeners' attentional focus. In turn, more incorrect temporal expectancies will arise, providing greater opportunity for reactive attending (see, e.g., Large & Jones, 1999). Accordingly, in Experiment 4 we increased the IOI variability of the interpolated sequences to produce irregular rhythms without changing the average rate (mean IOI = 600 ms). Our model predicts that the resulting expectancy profiles will be symmetrical around the mean IOI but flatter than those observed with regular rhythms.
Methodological Details. We recruited 11 naïve listeners as before. Methodologically, all aspects of Experiment 4 were identical to those of the experimental condition of Experiment 3, except that the sequences were temporally irregular. The randomly arranged context IOIs had the following properties:
1. The mean IOI of interpolated sequences remained 600 ms.
2. The standard deviation of IOIs was 211 ms for half the sequences and 249 ms for the others.
3. IOIs ranged from 200 ms to 850 ms.
4. The first and last context IOIs remained 600 ms.
5. The On Time comparison IOI remained 1,200 ms, and the four unexpected comparison IOIs were identical to those of Experiment 3.
Results and Discussion. Results appear in Figure 4b.
The mean PC scores do not follow a quadratic trend over critical IOI levels (this trend component is not significant); instead, the expectancy profile is flat, with an overall average PC of .71. We compared performance with regular (Experiment 3, experimental group) and irregular (Experiment 4) rhythms. The two conditions did not differ significantly in overall accuracy, but context timing (regular vs. irregular) interacted significantly with comparison timing (five levels of comparison IOI), F(4,88) = 4.90, Mse = .009, p = .0013. In contrast to listeners with regular timing, those encountering irregular rhythms did better with very unexpected comparisons, especially those that were unexpectedly late, suggesting a widening of the attentional focus. In this experiment the distribution of critical IOIs was identical to that in Experiment 3, yet a clear expectancy profile emerged in Experiment 3 but not in Experiment 4. This suggests that the dominant stimulus timing influence in Experiments 1-4 involved the sequence rhythm. In addition, listeners in Experiment 4 were better at judging unexpectedly late comparisons in the irregular context than in the regular one. This finding admits two interpretations. One assumes that a wide attentional focus, instigated by an irregular rhythm, renders the listener more susceptible to reactive attending and temporal capture. The other evokes visual cue-target designs. In these designs, unexpectedly early targets were deemed to slow responding automatically, and unexpectedly late ones were interpreted as inducing a voluntary re-orientation of attending (see Part I). In other words, listeners may simply "wait" for a late comparison when a wide focus is induced. We are exploring both of these possibilities.
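The constraints listed for the irregular context sequences in Experiment 4 can be expressed as a small rejection sampler. This is a hypothetical sketch, not the generation procedure the authors used. It assumes eight context IOIs per sequence and integer millisecond values. It fixes the first and last IOIs at 600 ms, draws the middle IOIs uniformly from 200 to 850 ms, and solves for one middle IOI so that the mean is exactly 600 ms. It does not try to match the specific standard deviations reported for the two sets of sequences; that would need a further acceptance test.

```python
import random
import statistics


def irregular_iois(n_iois=8, mean_ms=600, lo=200, hi=850, rng=None, max_tries=10_000):
    """Sample integer IOIs (ms) meeting the stated Experiment 4 constraints.

    Hypothetical procedure: the first and last IOIs are fixed at mean_ms, the
    middle IOIs lie in [lo, hi], and the total equals n_iois * mean_ms so the
    mean is exact. The standard deviation is not controlled.
    """
    if n_iois < 3:
        raise ValueError("need at least one IOI between the fixed first and last IOIs")
    if rng is None:
        rng = random.Random()
    n_middle = n_iois - 2
    middle_total = n_middle * mean_ms               # sum the middle IOIs must reach
    for _ in range(max_tries):
        free = [rng.randint(lo, hi) for _ in range(n_middle - 1)]
        last = middle_total - sum(free)             # forces the exact mean
        if lo <= last <= hi:
            middle = free + [last]
            rng.shuffle(middle)                     # solved IOI need not sit in a fixed slot
            return [mean_ms] + middle + [mean_ms]
    raise RuntimeError("no valid sequence found; loosen the constraints")


seq = irregular_iois(rng=random.Random(1))
print(seq, "mean =", statistics.mean(seq), "SD =", round(statistics.pstdev(seq), 1))
```

Passing a seeded `random.Random` makes a set of sequences reproducible across sessions. Leaving `rng` unset draws fresh sequences on each call.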
Summary of experimental findings
The main outcomes of these experiments are as follows:
1. Regular rhythms, interpolated between standard and comparison tones, selectively facilitated pitch judgments for comparisons occurring at rhythmically expected times, as indicated by significant quadratic PC profiles over time (observed expectancy profiles).
2. The quadratic expectancy profile emerged even with critical IOIs lengthened relative to the interpolated IOIs, suggesting that the rhythm created by onsets and IOIs of the interpolated rhythm induced a persisting periodic attending process.
3. The quadratic profile was sharper with larger violations of the rhythmic expectancy.
4. The quadratic profile flattened for the same critical IOIs when the interpolated rhythm was missing or irregular.
5. The quadratic profile was based on data from listeners told to ignore the interpolated sequences.
IV. General Concluding Remarks
A main goal of our research is to learn about the effects of event timing on attending to auditory events. Although spatial dimensions figure prominently in attending to visual elements, the time dimension becomes more critical in attending to auditory elements. Yet we know relatively little about the effects of timing on attentional orienting to sounds. In the introduction, I suggested one important difference between attending to spatial and to temporal arrays. The former offers an attender greater opportunity for voluntary control, i.e., to search at a comfortable pace among spatially distinct elements. In uncued search, where spatial arrays are static, no obvious time constraints enforce a particular attentional pace. But temporal arrays (sequences) bring stimulus time constraints that transform the search task into a monitoring task. These constraints derive from stimulus properties and limit both the flexibility of timed attending and the voluntary pacing of attending. In this respect, such stimulus properties control attending by enforcing a particular pace not only on attending but also on expectancies. At least two quite general consequences follow from our experimental findings. The first involves the stipulation of time as an irrelevant dimension in this task. In principle, timing could be ignored because listeners were not explicitly required to make judgments about it or to attend to time; that is, pitch was both the defining and the reported dimension. Nevertheless, the timing of interpolated tones had a systematic influence on pitch judgments. The second general consequence is that these results blur the strict dichotomy between voluntary and involuntary attention that has been fruitful in research on visual attending. Voluntary control of attention is often equated with expectancies, and involuntary control with stimulus-driven attending. However, if stimulus properties contribute to expectancies, then perhaps attending in time depends upon an interaction of voluntary and involuntary components.
Expectancies: What are they?
According to a probabilistic-association tradition, an expectancy is an orientation of attention conveyed by knowledge acquired from conditional probabilities (cue validity) that link endogenous symbolic cues to targets. In this view an expectancy is not essentially stimulus driven. It derives from the acquisition of statistical contingencies, which, in turn, determine slow, deliberate and voluntary processes. According to a pattern-directed approach, an expectancy may also be an orientation of attention given by an extrapolation of stimulus relationships inherent in task structure (within a session) and/or sequence structure (within a trial or stimulus pattern). This suggests that expectancies are, at least in part, stimulus based. The claim that an expectancy may be stimulus driven is clearly baffling in light of the widespread belief that expectancies are determined by cue validity. But this presents a puzzle only if we overlook the defining feature of an expectancy. I have argued that an expectancy is simply an orientation of attending toward a future happening (as opposed to reactive attending or remembering, both of which suggest orienting to past events). This future orientation is manifest in anticipatory attending, which may or may not be voluntary, conscious and deliberate. And anticipations can be affected in many ways! One way entails cue validity. For instance, people can acquire future-oriented attending from training with valid symbolic cues to time intervals (Kingstone, 1992; Coull & Nobre, 1998). But another way involves direct reliance on stimulus time relationships; strong anticipatory attending is likely when these are regular (current findings; see also Barnes & Jones, 2000; Large & Jones, 1999). The real puzzle lies in understanding how these different circumstances actually promote timed anticipations and whether all anticipations are necessarily entirely voluntary.
Furthermore, it is not clear whether the same mechanisms are responsible for these different ways of establishing expectancies. The research reported here examined the second way of establishing an attentional orientation and temporal expectancies. In this case, reinforcement may entail entrainment of internal periodicities. The dynamic attending model instantiates this stimulus-driven approach to temporal expectancy. It does so in a way that makes clear the interdependency between anticipatory attending (expectancies) and reactive attending (responses to expectancy violation and capture).
Stimulus properties and temporal capture
Although expectancy figures prominently in this analysis, capture is also an essential part of the story. Theoretically, I suggest that temporal capture involves an oscillator's phase adjustment to a tone onset. It can happen in two contexts. In one, anticipatory attending is weak, meaning a wide attentional focus; in the other, anticipatory attending is strong, meaning a narrow focus in time. Initially, in novel situations, anticipatory attending is always weak and nonspecific and is associated with a wide attentional focus. This characterizes attending as a listener monitors the first few sounds of a sequence and/or a session. It also characterizes attending to irregularly timed sounds throughout a session (e.g., Experiment 4). In either case, a wide attentional focus renders a listener open to onsets associated with a broad range of time intervals. Here temporal capture refers to the local phase shift of attending to any tone onset. Thus, one interpretation of the flat expectancy profile in Experiment 4 is that people are more adept at responding to very unexpected onsets than are listeners in experiments with regular rhythms (Experiments 1-3). Alternatively, because this is especially true for unexpectedly late onsets, phase adjustments may be made more readily for late than for early unexpected onsets when the attentional focus is wide (see Coull and Nobre in Part I for a related hypothesis about endogenous temporal cuing). Each sound onset falling within this focal region locally commandeers a swift corrective attending shift. Anticipatory attending is strong and specific, instantiating a narrow attentional focus, in contexts where a time structure has been consistently established, rhythmically or otherwise. In such contexts, singular unexpected onsets always have the potential to violate specific expectancies about "when" an element will occur.
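The contrast between a wide and a narrow attentional focus can be pictured with a periodic attentional pulse whose sharpness is set by a focus parameter. The sketch below assumes a von Mises-shaped pulse, scaled so that its peak equals 1, with a hypothetical focus parameter `kappa` (larger values mean a narrower focus). The pulse form used in the entrainment model may differ in detail. The code compares the relative attentional energy for a 600-ms period at an on-time onset and at onsets displaced by 15 ms and 115 ms from the expected time.

```python
import math


def pulse_energy(deviation_ms: float, period_ms: float = 600.0, kappa: float = 4.0) -> float:
    """Relative attentional energy at an onset deviation_ms from the pulse peak.

    Assumed von Mises shape exp(kappa * (cos(theta) - 1)): it equals 1 at the
    peak (theta = 0), is symmetric for early and late onsets, and falls off
    faster as kappa (the focus parameter) increases.
    """
    if kappa < 0:
        raise ValueError("kappa must be non-negative")
    theta = 2.0 * math.pi * deviation_ms / period_ms   # deviation as a phase angle
    return math.exp(kappa * (math.cos(theta) - 1.0))


for kappa, label in [(1.0, "wide focus"), (8.0, "narrow focus")]:
    row = [round(pulse_energy(d, kappa=kappa), 3) for d in (0, 15, 115)]
    print(f"{label:12s} kappa={kappa}: energy at 0, 15, 115 ms = {row}")
```

With `kappa = 8`, almost no energy remains 115 ms from the peak, so a very deviant onset could be "tuned out." With `kappa = 1`, about half remains, giving a flatter profile in which such an onset can still engage reactive attending.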
The degree to which an expectancy is corrected and the attentional pulse realigned to a tone's onset depends upon the magnitude of the expectancy violation. However, given a narrow focus of attention in time, people are relatively able to "tune out" extremely deviant onsets. In Experiments 1-3, the regular rhythmic context appeared to induce a narrow focus in time, as the resulting expectancy profiles suggest. In other words, if a tone onset is not very unexpected, given a rhythmic context, then phase adjustments can "fine tune" a temporal expectancy by correcting such violations. In this respect, there are parallels with visual capture, where a distinctive singleton may derail attending if it is not too remote in space. In temporal sequences, attentional re-orienting in time is reactive attending and involves temporal capture. To sum up, both the dynamic attending model and the data suggest that temporal capture may be contingent upon attentional set (focus width). According to the entrainment model, a phase shift in time to an abrupt onset ensues whenever that onset time departs from an expected time. However, temporal capture is more likely to be evident with a wide than with a narrow attentional focus. Clearly, this analysis assumes that the presence of a pattern-directed expectancy and the nature of the attentional focus determine the extent to which temporal capture affects performance.
Footnotes
1 For illustrative purposes I assume serial search in this example, although Ward, Duncan, & Shapiro (1996) review many alternative strategies.
2 Others suggest priority tags and queues (Yantis & Johnson, 1990).
3 The ISIs ranged from 500 ms to 1,500 ms, but tone durations varied over different experiments.
4 Limit cycle oscillators are dynamical systems that, in the limit, settle into a stable period of some specified value. Stability means that whenever a single perturbation to the period occurs (e.g., an IOI that differs from preceding ones, implying a new period), the oscillator's period will change briefly in the direction of the new IOI, but will return, following the perturbation, to the attractor state in which the oscillator period matches the average IOI.
5 Experiments (and related data) of part III are reported in greater detail in Jones et al. (in press).
6 In post-session questionnaires, very few listeners reported even being aware of this; data from those who did were not analyzed. In addition, we eliminated from the analysis the data from any subject who produced a significantly higher proportion of "Same" responses.
7 Grouping in temporal sequences, unlike grouping in visual arrays, leads to predictions of lowered accuracy for judgments about tones within a common group. Whereas in visual attention facilitation of elements belonging to a common object may occur, in auditory attention interference occurs when elements are grouped (see part I).
References
Arnell, K. M., & Jolicoeur, P. (1999). The attentional blink across stimulus modalities: Evidence for central processing limitations. Journal of Experimental Psychology: Human Perception and Performance, 25, 630-648.
Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cognitive Psychology, 41, 254-311.
Boltz, M. (1993). The generation of temporal and melodic expectancies during musical listening. Perception & Psychophysics, 53, 585-600.
Bregman, A. (1990). Auditory scene analysis. Cambridge: MIT Press.
Bregman, A., & Rudnicky, A. I. (1975). Auditory segregation: Stream or streams? Journal of Experimental Psychology: Human Perception and Performance, 1, 263-267.
Cave, K. R., & Wolfe, J. M. (1990). Modeling the role of parallel processing in visual search. Cognitive Psychology, 22, 225-271.
Chun, M. M., & Potter, M. C. (1995). A two-stage model for multiple target detection in rapid serial visual presentation. Journal of Experimental Psychology: Human Perception and Performance, 21, 109-127.
Coull, J. T., Frith, C. D., Büchel, C., & Nobre, A. C. (2000). Orienting attention in time: Behavioral and neuroanatomical distinction between exogenous and endogenous shifts. Neuropsychologia, 38, 808-819.
Coull, J. T., & Nobre, A. C. (1998). Where and when to pay attention: The neural systems for directing attention to spatial locations and to time intervals as revealed by both PET and fMRI. The Journal of Neuroscience, 18, 7426-7435.
Deutsch, D. (1999). The processing of pitch combinations. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 349-411). London: Academic Press.
Deutsch, D. (1972). Effect of repetition of standard and comparison tones on recognition memory for pitch. Journal of Experimental Psychology, 93, 156-162.
Divenyi, P. L., & Sachs, R. M. (1978). Discrimination of time intervals bounded by tone bursts. Perception & Psychophysics, 24, 429-436.
Dowling, W. J., Lung, K. M., & Herrbold, S. (1987). Aiming attention in pitch and time in the perception of interleaved melodies. Perception & Psychophysics, 41(6), 642-656.
Downing, C. J. (1988). Expectancy and visual-spatial attention: Effects on perceptual quality. Journal of Experimental Psychology: Human Perception and Performance, 14, 188-202.
Egeth, H., & Yantis, S. (1997). Visual attention. Annual Review of Psychology, 48, 269-297.
Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030-1044.
Folk, C. L., Remington, R. W., & Wright, J. H. (1994). The structure of attentional control: Contingent attentional capture by apparent motion, abrupt onset, and color. Journal of Experimental Psychology: Human Perception and Performance, 20, 317-329.
Garner, W., & Gottwald, R. L. (1968). The perception and learning of temporal patterns. Quarterly Journal of Experimental Psychology, 20, 97-109.
Garner, W. R. (1974). The perception and learning of temporal patterns. The processing of information and structure. Potomac, MD: Erlbaum.
Gellatly, A., Cole, G., & Blurton, A. (1999). Do equiluminant object onsets capture visual attention? Journal of Experimental Psychology: Human Perception and Performance, 25, 1609-1624.
Gibson, B. S. (1996). Visual quality and attentional capture: A challenge to the special role of abrupt onsets. Journal of Experimental Psychology: Human Perception and Performance, 22, 1496-1504.
Gibson, B. S., & Amelio, J. (2000). Inhibition of return and attentional control settings. Perception & Psychophysics, 62, 496-504.
Green, D. M., & Swets, J. (1966). Signal detection theory and psychophysics. New York: Wiley.
Greenberg, G. Z., & Larkin, W. D. (1968). Frequency-response characteristic of auditory observers detecting signals of a single frequency in noise: The probe-signal method. Journal of the Acoustical Society of America, 44, 1513-1523.
Heise, G. A., & Miller, G. A. (1951). An experimental study of auditory patterns. American Journal of Psychology, 64, 68-77.
Howard, J. H. J., O'Toole, A. J., Parasuraman, R., & Bennett, K. B. (1984). Pattern-directed attention in uncertain-frequency detection. Perception & Psychophysics, 35, 256-264.
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review, 83, 323-335.
Jones, M. R. (1981). Music as a stimulus for psychological motion: Part I. Some determinants of expectancies. Psychomusicology, 1(2), 34-51.
Jones, M. R. (1987). Dynamic pattern structure in music: Recent theory and research. Perception & Psychophysics, 41, 621-634.
Jones, M. R., Boltz, M., & Kidd, G. (1982). Controlled attending as a function of melodic and temporal context.
Jones, M. R., Jagacinski, R. J., Yee, W., Floyd, R. L., & Klapp, S. (1995). Tests of attentional flexibility in listening to polyrhythmic patterns. Journal of Experimental Psychology: Human Perception and Performance, 21, 293-307.
Jones, M. R., Kidd, G., & Wetzel, R. (1981). Evidence for rhythmic attention. Journal of Experimental Psychology: Human Perception and Performance, 7, 1059-1073.
Jones, M. R., & McAuley, D. J. (2000). Categorical time judgments in extended temporal contexts.
Jones, M. R., Moynihan, H., MacKenzie, N., & Puente, J. (in press). Temporal aspects of stimulus-driven attending in dynamic arrays. Psychological Science.
Jones, M. R., & Ralston, J. T. (1991). Some influences of accent structure on melody recognition. Memory & Cognition, 19, 8-20.
Jones, M. R., & Yee, W. (1993). Attending to auditory events: The role of temporal organization. In S. McAdams & E. Bigand (Eds.), Thinking in sound: The cognitive psychology of human audition (pp. 69-112). Oxford: Clarendon Press.
Jonides, J., & Yantis, S. (1988). Uniqueness of abrupt visual onset in capturing attention. Perception & Psychophysics, 43, 346-354.
Kahneman, D., & Treisman, A. (1984). Changing views of attention and automaticity. In R. Parasuraman & D. R. Davies (Eds.), Varieties of attention. New York: Academic Press.
Kahneman, D., & Tversky, A. (1982). Variants of uncertainty. Cognition, 11, 143-157.
Kidd, G. R. (1993). Temporally directed attention in the detection and discrimination of auditory pattern components. Poster (2pPP19) presented at the conference of the Acoustical Society of America, Toronto.
Kidd, G. R., Boltz, M., & Jones, M. R. (1984). Some effects of rhythmic context on melody recognition. American Journal of Psychology, 97, 153-173.
Kingstone, A. (1992). Combining expectancies. The Quarterly Journal of Experimental Psychology, 44, 69-104.
Klapp, S., Hill, M. D., Ryler, J. G., Martin, A. E., Jagacinski, R. J., & Jones, M. R. (1985). On marching to two different drummers: Perceptual aspects of the difficulties. Journal of Experimental Psychology: Human Perception and Performance, 11(6), 814-827.
Klein, J. M., & Jones, M. R. (1996). Effects of attentional set and rhythmic complexity on attending. Perception & Psychophysics, 58, 34-46.
Kubovy, M. (1981). Concurrent pitch segregation and the theory of indispensable attributes. In M. Kubovy & J. Pomerantz (Eds.), Perceptual organization (pp. 55-98). Hillsdale, NJ: Erlbaum.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106(1), 119-159.
McAuley, D. J., & Kidd, G. R. (1998). Effect of deviations from temporal expectations on tempo discrimination of isochronous tone sequences. Journal of Experimental Psychology: Human Perception and Performance, 24, 1786-1800.
Miller, G. A., & Heise, G. A. (1950). The trill threshold. The Journal of the Acoustical Society of America, 22, 637-638.
Miniussi, C., Wilding, E. L., Coull, J. T., & Nobre, A. C. (1999). Orienting attention in time. Brain, 122, 1507-1518.
Miller, J. (1989). The control of attention by abrupt visual onsets and offsets. Perception & Psychophysics, 45, 567-572.
Mondor, T. (1999). Predictability of the cue-target relation and the time-course of auditory inhibition of return. Perception & Psychophysics, 61, 1501-1509.
Mondor, T., & Zatorre, R. J. (1995). Shifting and focusing auditory spatial attention. Journal of Experimental Psychology: Human Perception and Performance, 21, 387-409.
Mondor, T., Zatorre, R. J., & Terrio, N. A. (1998). Constraints on the selection of auditory information. Journal of Experimental Psychology: Human Perception and Performance, 24, 66-79.
Mondor, T. A., & Breau, L. M. (1999). Facilitative and inhibitory effects of location and frequency cues: Evidence of a modulation in perceptual sensitivity. Perception & Psychophysics, 61, 438-444.
Mondor, T. A., & Bregman, A. S. (1994). Allocating attention to frequency regions. Perception & Psychophysics, 56, 268-276.
Mondor, T. A., & Terrio, N. A. (1998). Mechanisms of perceptual organization and auditory selective attention: The role of pattern structure. Journal of Experimental Psychology: Human Perception and Performance, 24, 1628-1641.
Nissen, M. J., & Corkin, S. (1985). Effectiveness of attentional cueing in older and younger adults. Journal of Gerontology, 40, 185-191.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
Posner, M. I., Snyder, C. R. R., & Davidson, B. J. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109, 160-174.
Puente, J., & Jones, M. R. (under review). Determinants of attending and expectancy in listening to auditory patterns.
Raymond, J. E., Shapiro, K., & Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849-860.
Remington, R. W., Johnston, J. C., & Yantis, S. (1992). Involuntary attentional capture by abrupt onsets. Perception & Psychophysics, 51, 279-290.
Restle, F. (1970). Theory of serial pattern learning: Structural trees. Psychological Review, 77, 481-495.
Rothstein, A. (1973). Effect on temporal expectancy of the position of a selected foreperiod within a range. The Research Quarterly, 44, 132-139.
Scharf, B., Quigley, S., Aoki, C., Peachey, N., & Reeves, A. (1987). Focused auditory attention and frequency sensitivity. Perception & Psychophysics, 42, 215-223.
Shapiro, K., Raymond, J. E., & Arnell, K. M. (1994). Attention to visual pattern information produces the attentional blink in rapid serial visual presentation. Journal of Experimental Psychology: Human Perception and Performance, 20, 357-371.
Shulman, G. L., Remington, R. W., & McLean, J. P. (1979). Moving attention through visual space. Journal of Experimental Psychology: Human Perception and Performance, 5, 522-526.
Spence, C., & Driver, J. (1994). Covert spatial orienting in audition: Exogenous and endogenous mechanisms. Journal of Experimental Psychology: Human Perception and Performance, 20, 555-574.
Swets, J. A., & Green, D. M. (1966). Signal detection theory and psychophysics. New York: John Wiley.
Theeuwes, J. (1991). Exogenous and endogenous control of attention: The effect of visual onsets and offsets. Perception & Psychophysics, 49, 83-90.
Treisman, A., & Sato, S. (1990). Conjunction search revisited. Journal of Experimental Psychology: Human Perception and Performance, 16, 459-478.
Todd, R., Boltz, M., & Jones, M. R. (1989). The MIDILAB auditory research system. Psychomusicology, 8, 17-30.
Ward, R., Duncan, J., & Shapiro, K. (1996). The slow time-course of visual attention. Cognitive Psychology, 30, 79-109.
Watson, C. S., Kelly, W. J., & Wroton, H. W. (1976). Factors in the discrimination of tonal patterns. II. Selective attention and learning under various levels of uncertainty. Journal of the Acoustical Society of America, 60, 1176-1186.
van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. Eindhoven: University of Technology.
Vos, P., & Ellerman, H. H. (1989). Precision and accuracy in the reproduction of simple tone sequences. Journal of Experimental Psychology: Human Perception and Performance, 15, 179-187.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202-238.
Wolfe, J. M., Cave, K. R., & Franzel, S. L. (1989). Guided Search: An alternative to the Feature Integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15, 419-433.
Woods, D. L., Alho, K., & Algazi, A. (1994). Stages of auditory feature conjunction: An event-related brain potential study. Journal of Experimental Psychology: Human Perception and Performance, 20, 81-94.
Woods, D. L., Alain, C., Diaz, R., Rhodes, D., & Ogawa, K. H. (2001). Location and frequency in auditory selective attention. Journal of Experimental Psychology: Human Perception and Performance, 27, 65-74.
Yantis, S. (1993). Stimulus-driven attentional capture. Current Directions in Psychological Science, 2, 156-161.
Yantis, S., & Egeth, H. (1999). On the distinction between visual salience and stimulus-driven attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 25, 661-676.
Yantis, S., & Jonides, J. (1984). Abrupt visual onsets and selective attention: Voluntary versus automatic allocation. Journal of Experimental Psychology: Human Perception and Performance, 10, 601-621.
Yantis, S., & Johnson, D. N. (1990). Mechanisms of attentional priority. Journal of Experimental Psychology: Human Perception and Performance, 16, 812-825.
Yantis, S., & Jonides, J. (1990). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 16, 121-134.
Yantis, S., & Jonides, J. (1996). Attentional capture by abrupt onsets: New perceptual objects or visual masking? Journal of Experimental Psychology: Human Perception and Performance, 22, 1505-1513.
Acknowledgements
I am grateful to colleagues who assisted in this research and who read and commented on earlier versions of this chapter. These include Ralph Barnes, Jennifer Hoffman, Susan Holleran, Noah Mackenzie, Heather Moynihan, and Amandine Pennel. I also thank William Johnston and Charles Folk, who commented on an earlier version of this chapter. This research was sponsored in part by a grant awarded to Mari Riess Jones from the National Science Foundation (BCS-9809446). Portions of this research were also reported at the annual meeting of the Psychonomic Society (New Orleans, Louisiana), November 2000.
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture
C. Folk and B. Gibson (Editors)
© Elsevier Science B.V. All rights reserved.
10
Crossmodal Attentional Capture: A Controversy Resolved?
Charles Spence
There has been a rapid growth of interest in the study of crossmodal attentional capture in recent years, as more and more researchers have started to address the question of whether or not the presentation of a spatially-nonpredictive peripheral event in one sensory modality will lead to a reflexive shift of attention in another modality (for example, whether a sudden white noise burst or tap on the hand will capture visual attention). Although there has been a great deal of controversy regarding the existence of crossmodal capture between audition and vision (e.g., Spence & Driver, 1997a; Ward, 1994; Ward, McDonald, & Lin, 2000), empirical research now supports the view that crossmodal capture effects occur between all combinations of auditory, visual, and tactile stimuli, at least under certain conditions. In the present chapter, the key behavioural findings on crossmodal capture are reviewed, and an attempt is made to resolve this controversy over the existence of audiovisual capture effects.
Introduction
Our senses are constantly bombarded by information from the multitudinous distal events occurring in our everyday environment. These events are often specified by information that is available to several sensory modalities simultaneously. For example, when conversing we not only listen to the sound of a person's voice, but also watch their lip movements to help us understand what is being said (Driver & Spence, 1994). Although the information arriving at the various sensory epithelia is initially processed independently, converging neural pathways rapidly lead to extensive multisensory integration in a variety of neural structures, such as the superior colliculus and the inferior parietal lobe (e.g., Bushara et al., 1999; Sherrington, 1920; see Stein & Meredith, 1993, for a review). In fact, multisensory integration has been reported to occur in all species known to possess more than one sensory system (Stein, London, Wilkinson, & Price, 1996). Given this extensive multisensory convergence, it would make sense for our attentional mechanisms to be coordinated across the modalities as well.1 To date, however, the majority of empirical research has focused on the study of attentional capture within just vision (e.g., Folk, Remington, & Johnston, 1992; Yantis, 2000), or just audition (e.g., Jones, Moynihan, MacKenzie, & Hoffman, in press; McDonald & Ward, 1999; Spence & Driver, 1994).
Everyday life provides numerous examples of overt crossmodal attentional capture, for example, when we suddenly turn our heads to inspect the source of a loud bang (the auditory capture of visual attention), or to look at a fly that has unexpectedly landed on our arm (the tactile capture of visual attention). However, most research has focused on the question of whether or not crossmodal links also exist for covert attentional orienting (i.e., for orienting that takes place in the absence of eye, head, or hand movements; cf. Posner, 1978; Spence & Driver, 1994). Many researchers have distinguished between two different types of covert attentional orienting: the exogenous (also referred to as reflexive, automatic, involuntary, or stimulus-driven) orienting sometimes found in response to salient but spatially nonpredictive peripheral events, such as a sudden sound or an unexpected tap on the hand; and the endogenous (or voluntary) orienting induced by advance knowledge regarding where a target is most likely to occur, such as a verbal instruction informing participants that targets will be more likely on one hand than the other (e.g., Klein & Shore, 2000; Spence & Driver, 1994). Numerous qualitative differences have been found between these two forms of covert orienting, and different neural substrates have been implicated (e.g., Briand, 1998; Briand & Klein, 1987; Ladavas, 1993; Rafal, 1996; Rafal, Henik, & Smith, 1991).
Although the present review will focus primarily on the nature of any crossmodal links in purely exogenous spatial orienting, it should be noted that extensive crossmodal links in endogenous spatial attention have also been reported (e.g., Driver & Spence, 1994; Eimer & Driver, 2000; Eimer & Schröger, 1998; Hillyard, Simpson, Woods, Van Voorhis, & Münte, 1984; Lloyd, Merat, McGlone, & Spence, submitted; Spence & Driver, 1996; Spence, Pavani, & Driver, 2000; Teder-Sälejärvi, Münte, Sperlich, & Hillyard, 2000). The majority of experiments on crossmodal attentional capture (or exogenous crossmodal spatial orienting)2 have utilized chronometric measures of performance (Posner, 1978), in which changes in the speed and/or accuracy of performance are taken to show that the presentation of a particular spatial cue in one modality can exogenously capture attention in another sensory modality. For example, in a typical spatial-cueing study, participants are instructed to maintain central fixation while making a speeded detection or discrimination response to a target presented on either side of fixation. A spatially-nonpredictive peripheral cue (such as a sudden visual onset or short noise burst) is presented shortly before the target (typically at stimulus onset asynchronies [SOAs] of 0-1,000 ms) on either the same or the opposite side. Numerous studies have now shown that response latencies are often faster for targets presented on the same side as the cue (sometimes referred to as ipsilateral or valid trials) than for targets appearing on the uncued side (contralateral or invalid trials). These spatial cueing effects, which typically last for several hundred milliseconds after cue onset, have been shown to occur both when the cue and target are presented in the same modality and, more importantly for present purposes, when they are presented in different modalities. However, although such crossmodal cueing results have often been attributed to a beneficial shift in covert exogenous crossmodal attention toward the cued position (i.e., to crossmodal attentional capture), there are several alternative, nonattentional explanations for the behavioral effects reported in the majority of previous studies. In this review, I will summarize the key findings from studies of crossmodal attentional capture, and highlight the methodological confounds that compromise the interpretation of many of these previous studies. I hope to show that although there has been a great deal of controversy in this area in recent years (e.g., Spence & Driver, 1997a; Ward, 1994; Ward et al., 2000), a consensus view is now emerging in this fertile research area that crossmodal attentional capture effects can occur between all combinations of auditory, visual, and tactile stimuli.
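The chronometric logic of the spatial-cueing paradigm can be made concrete with a short computation. In this minimal sketch, the `Trial` record, its field names, and the function `cueing_effects` are hypothetical and not taken from any of the studies cited. The cueing effect is taken as mean correct RT on invalid (contralateral) trials minus that on valid (ipsilateral) trials, and the same difference is computed for error rates so that a speed benefit bought with extra errors can be spotted.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    valid: bool    # True if the target appeared on the cued (ipsilateral) side
    rt_ms: float   # response latency in milliseconds
    correct: bool  # True if the response was correct


def cueing_effects(trials: list[Trial]) -> tuple[float, float]:
    """Return (RT effect in ms, error effect in percentage points).

    Each effect is computed as invalid minus valid, so a positive RT effect
    means faster responses on the cued side, and a negative error effect
    means more errors on the cued side. Mean RT uses correct trials only.
    """
    def mean_correct_rt(group: list[Trial]) -> float:
        return mean(t.rt_ms for t in group if t.correct)

    def error_rate(group: list[Trial]) -> float:
        return 100.0 * sum(not t.correct for t in group) / len(group)

    valid = [t for t in trials if t.valid]
    invalid = [t for t in trials if not t.valid]
    rt_effect = mean_correct_rt(invalid) - mean_correct_rt(valid)
    error_effect = error_rate(invalid) - error_rate(valid)
    return rt_effect, error_effect


if __name__ == "__main__":
    # Made-up trials for illustration only; these are not data from any study.
    demo = [
        Trial(True, 600, True), Trial(True, 620, True), Trial(True, 900, False),
        Trial(False, 650, True), Trial(False, 670, True), Trial(False, 660, True),
    ]
    rt_effect, error_effect = cueing_effects(demo)
    print(f"RT effect: {rt_effect:.0f} ms, error effect: {error_effect:.1f} points")
    if rt_effect > 0 and error_effect < 0:
        # Faster but less accurate on the cued side: a speed-accuracy trade-off
        print("Possible criterion shift rather than a perceptual benefit")
```

Reporting both numbers side by side is the design choice that matters here: an RT benefit accompanied by no accuracy cost is easier to attribute to attention than one accompanied by extra errors on the cued side.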
Speeded Detection Tasks
Perhaps the task most commonly used by researchers to investigate crossmodal capture effects has been the simple detection task. Many studies have shown that simple detection latencies for visual targets presented to the left or right of fixation can be facilitated by the prior (or simultaneous) presentation of a spatially nonpredictive auditory (or visual) cue from the same rather than the opposite side, both in normal participants (e.g., Klein, Brennan, D'Aloisio, D'Entremont, & Gilani, 1987, Experiment 1; Reuter-Lorenz & Rosenquist, 1996, Experiment 2; Schmitt, Postma, & de Haan, 2000, Experiments 1 & 2; Schmitt, Postma, & de Haan, in press, Experiment 1; though see Ward et al., 1998, Experiment 1, for contradictory results) and in unilateral parietal patients (Farah, Wong, Monheit, & Morrow, 1989). By contrast, auditory detection latencies in normal participants have been reported to be unaffected by either auditory or visual spatial cueing (e.g., Klein et al., 1987, Experiment 6; Schmitt et al., 2000, Experiment 1; Schmitt et al., in press, Experiment 1; Spence & Driver, 1994, Experiment 8; Ward et al., 1998, Experiment 2). Some researchers (e.g., Buchtel & Butter, 1988) have taken these results to demonstrate the existence of asymmetrical crossmodal capture effects, such that auditory cues can capture visual attention, while visual cues do not capture auditory attention. However, an alternative interpretation is that auditory detection latencies may simply be relatively insensitive to the spatial distribution of attention (e.g., Klein et al., 1987; Posner, 1978; Spence & Driver, 1994).
Some of the strongest evidence in support of this claim comes from the fact that visual and auditory cues still have no effect on auditory detection latencies when they are made spatially predictive with regard to the likely location of the upcoming target (i.e., when both exogenous and endogenous attention should facilitate auditory target detection on the cued side). For example, Buchtel and Butter (1988) reported that visual cues which predicted the target side on 80% of trials had no effect on auditory detection latencies, a result that has been replicated by several other researchers (e.g., Posner, 1978; Schmitt et al., 2000, Experiment 3; Schmitt et al., in press, Experiment 3). Moreover, Spence and Driver (1994) have also reported that spatially-predictive auditory cues (75% valid with respect to the likely target location) have no effect on auditory detection latencies, despite the fact that the same cues lead to clear attentional effects in a variety of auditory discrimination tasks (see also Hugdahl & Nordby, 1994; McDonald & Ward, 1999; Schmitt et al., 2000; though see Buchtel, Butter, & Ayvasik, 1996). Taken together, these results suggest that the most parsimonious explanation for why visual cues have no effect on auditory detection latencies is that auditory detection latencies are simply insensitive to the spatial distribution of attention.3
To date, very few published studies have examined crossmodal capture between other pairs of sensory modalities using the simple detection task (Butter, Buchtel, & Santucci, 1989; Tassinari & Campara, 1996). For example, Butter et al. reported that visual and tactile detection latencies were facilitated by the prior presentation of a spatially-predictive peripheral cue in either vision or touch, hence apparently showing symmetrical crossmodal attentional capture effects between vision and touch. However, it is important to note that the use of informative and peripheral cues in Butter et al.'s studies means that both exogenous orienting (to the location of the peripheral cueing event) and endogenous orienting (to the same location, but only because the subsequent target was more likely to appear there) may have been induced by the cues. It therefore remains uncertain whether the facilitatory effects on tactile and visual detection latencies they reported should be attributed to exogenous orienting (i.e., to crossmodal attentional capture), to endogenous orienting, or to some unknown combination of these two effects (see Mondor & Amirault, 1998; Spence & Driver, 1996; Ward, 1994, on this point). In fact, because the cues were spatially informative, they may have produced a strategic shift in attention to the likely side in just the expected target modality (i.e., a purely unimodal shift of attention).
For example, in the case of a visual target presented after an informative tactile cue, there may have been only an endogenous shift in just visual attention to the likely side of the anticipated visual target, exactly as would have taken place following, say, the interpretation of a central arrow (see Driver & Spence, 1994; Johnen, Wagner, & Gaese, 2001; Pashler, 1998, pp. 91-92). It should be noted that this uncertainty regarding the most appropriate interpretation of Butter et al.'s results also applies to many other studies in which researchers have attempted to investigate crossmodal attentional capture using predictive peripheral cues (e.g., Buchtel & Butter, 1988; Mondor & Amirault, 1998, Experiments 2 & 3; Schmitt et al., 2000, Experiments 3, 5, & 6; Schmitt et al., in press, Experiments 3 & 4). Finally, it is also important to note that the facilitatory effects reported in all of these simple detection studies may reflect a shift in the participant's criterion for responding rather than a genuine perceptual effect. In fact, it has been argued for many years within the visual attention literature (e.g., Duncan, 1980; Müller & Findlay, 1987; Sperling & Dosher, 1986) that spatial cueing effects on simple RT may reflect criterion shifts rather than genuine attentional effects. That is, participants may simply reduce the amount of evidence necessary for deciding that a target has occurred on the cued side, and possibly also increase their criterion for responding on the uncued side, thus producing differences in simple detection latencies for targets presented ipsilateral versus contralateral to the cue without the need to invoke attention. Therefore, the most parsimonious conclusion here is probably that the cueing effects reported in previous crossmodal attentional capture studies that have used a simple detection response measure reflect attentional effects, criterion-shifting effects, or some combination of the two.
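How a criterion shift alone can produce faster responses on the cued side can be illustrated with a toy random-walk (sequential sampling) model. This is a hypothetical sketch, not a model used in any of the studies reviewed here: evidence accrues in unit steps that favour the correct response with probability `p_step_correct`, and a response is made when the walk reaches `+threshold` (correct) or `-threshold` (error). Perceptual quality (the step probability) is held identical on both sides; only the assumed threshold differs, as it would if participants simply lowered their criterion on the cued side.

```python
import random
from statistics import mean


def simulate(threshold: int, p_step_correct: float = 0.6,
             n_trials: int = 5000, seed: int = 1) -> tuple[float, float]:
    """Simulate a random walk between +threshold and -threshold.

    Each step moves toward the correct boundary with probability
    p_step_correct (a stand-in for perceptual quality, held constant here).
    Returns (mean number of steps to a decision, proportion of errors).
    """
    rng = random.Random(seed)
    decision_times = []
    errors = 0
    for _ in range(n_trials):
        evidence = 0
        steps = 0
        while abs(evidence) < threshold:
            evidence += 1 if rng.random() < p_step_correct else -1
            steps += 1
        decision_times.append(steps)
        if evidence < 0:  # the walk reached the wrong boundary
            errors += 1
    return mean(decision_times), errors / n_trials


if __name__ == "__main__":
    # Hypothetical thresholds: a laxer criterion on the cued side and a
    # stricter one on the uncued side; perceptual quality is identical.
    for label, threshold in (("cued (lax)", 3), ("uncued (strict)", 6)):
        steps, error_rate = simulate(threshold)
        print(f"{label:16s} mean steps {steps:5.1f}, errors {error_rate:.1%}")
```

For a walk of this kind starting midway between the boundaries, the error probability is 1/(1 + (p/q)^threshold) with q = 1 - p, which is roughly 23% for a threshold of 3 and 8% for a threshold of 6 when p = 0.6. The lax criterion therefore yields much faster decisions but more errors, which is why accuracy must be measured alongside latency before a cueing effect is credited to attention.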
Speeded Discrimination Tasks
Given the problems inherent in the use of simple detection latencies to assess the spatial distribution of attention, many researchers have opted to use speeded discrimination tasks instead, so that both speed and accuracy can be measured. (The adoption of a more risky criterion should result in faster but less accurate performance.) These tasks can be grouped into four broad categories depending on the particular discrimination involved: non-spatial discrimination tasks, left/right discrimination tasks, orthogonal-cueing tasks, and implicit spatial discrimination tasks. Markedly different conclusions regarding the existence of crossmodal attentional capture have been drawn from studies using each of these methodologies, as highlighted below.
Non-spatial discrimination tasks Klein et al. (1987; Experiment 5) reported an experiment in which participants made a speeded duration discrimination responses (short vs. long) to visual targets presented to either side of fixation. Spatially nonpredictive auditory cues were presented 250 ms before the target from either exactly the same position as the target, or from the mirror-symmetrical location on the other side. Target discrimination latencies were significantly faster for visual targets presented on the cued side (mean RT of 612 ms) than for targets presented on the uncued side (mean RT of 651 ms). However, participants also tended to make more erroneous responses on cued trials than on uncued trials (means of 20% vs. 18% errors respectively), making it uncertain whether Klein et al.'s RT facilitation effects reflect a genuine perceptual attentional benefit for targets on the cued side, a simple criterion-shifting effect, or some unknown combination of the two effects. Mondor and Amirault (1998, Experiment 1) failed to demonstrate crossmodal attentional capture in their study when participants were required to make speeded discrimination responses to an unpredictable sequence of auditory and visual targets presented from the left or right of fixation. Participants had to judge the colour (red vs. green) of visual targets and the direction of frequency change (upward vs. downward frequency glide) of auditory targets. Every target was preceded unpredictably by an auditory or visual cue from the same or opposite side of fixation (at SOAs of 150 or 300 ms). Mondor and Amirault reported that while auditory cues reliably captured auditory spatial attention, and visual cues reliably captured visual attention, there was no evidence of crossmodal attentional capture (either from auditory cues to visual targets or vice versa; though see
Spence
Widmann & Schröger, 1999), unless the cues were made spatially predictive with regard to the likely target location (i.e., 75% valid, 25% invalid; Experiments 2 & 3), which makes this finding ambiguous for present purposes. Similarly, Ward et al. (1998, Experiments 3 & 4) reported no evidence of crossmodal attentional capture when participants made non-spatial discrimination responses ('x' vs. '+' discrimination for visual targets, and 3,000 Hz vs. 4,000 Hz pure-tone frequency discrimination in audition; though see also Ward et al., Experiment 9). Mondor and Amirault's (1998) failure to demonstrate any crossmodal cueing effects following spatially nonpredictive cues may have been due to the relative positioning of their auditory and visual stimuli. The visual stimuli were presented from a computer monitor 14 degrees to either side of fixation, while the auditory stimuli were presented from loudspeakers centered 17 degrees to either side. Similarly, Ward et al. (1998) presented visual targets at an eccentricity of 12 degrees and auditory targets at an eccentricity of 24 degrees to either side of fixation. (Note that Klein et al., 1987, conducted one of the few early crossmodal attentional cueing studies to present cue and target stimuli from exactly the same spatial location on ipsilateral trials.) Research has shown that introducing even small lateral discrepancies (of as little as 3 degrees) between the locations of auditory and visual stimuli can lead to a dramatic reduction, or even elimination, of crossmodal attentional effects (see Eimer & Schröger, 1998, for a particularly convincing demonstration of this).
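The speed-accuracy argument applied to Klein et al.'s (1987) data can be made concrete with a small sketch. The function names below are hypothetical and are not taken from any of the cited studies; the code simply computes the cueing effect as uncued minus cued RT and flags the pattern that makes the result ambiguous, namely cued trials that are both faster and more error-prone. It is a descriptive heuristic only and performs no statistical test.

```python
def cueing_effect(uncued: float, cued: float) -> float:
    """Cueing effect: uncued (contralateral) minus cued (ipsilateral) score.

    For reaction times, a positive value means faster responses on the cued side.
    """
    return uncued - cued


def possible_tradeoff(rt_cued: float, rt_uncued: float,
                      err_cued: float, err_uncued: float) -> bool:
    """True if cued trials are both faster and more error-prone.

    That faster-but-less-accurate pattern is what a criterion shift predicts,
    so an RT benefit accompanied by it cannot be read as a pure attentional effect.
    """
    return rt_cued < rt_uncued and err_cued > err_uncued


# Mean values reported for Klein et al. (1987, Experiment 5): RT in ms, errors in %.
rt_benefit = cueing_effect(uncued=651, cued=612)
print(rt_benefit, possible_tradeoff(612, 651, 20, 18))  # prints: 39 True
```

Applied to the means reported later for McDonald et al. (2001), 642 vs. 696 ms with 3.9% vs. 4.0% errors, the same check returns False, which is the pattern needed to rule out a criterion-shifting account.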
Dufour (1999; Experiment 2) also reported that auditory cues do not facilitate speeded orientation discrimination responses for visual targets (line segments oriented at ±45 degrees, presented 40 ms later) presented amongst visual line-segment distractors, even when the cue was presented from the same lateral eccentricity as the visual target on ipsilateral trials. Interestingly, Dufour reported that auditory cues did improve visual performance on an unspeeded conjunction discrimination task. On each trial of this experiment, a target letter 'T', flanked by four 'T' distractors in different orientations, was presented randomly to either the left or right of fixation, and participants had to make an unspeeded discrimination response regarding the orientation of the target. Participants performed significantly better (by approximately 8%; overall performance was in the range of 55-65% correct) on this conjunction discrimination task when the auditory cue was presented ipsilateral to the target. These results suggest that the crossmodal capture of visual attention by auditory cues may only occur when the task requires a particularly attention-demanding discrimination response (such as when searching for a conjunction target; Treisman & Gelade, 1980).4 Unfortunately, however, an overt orienting explanation of Dufour's cueing effects cannot be ruled out, since participants' eye position was not monitored. The results of these studies of crossmodal attentional capture using non-spatial discrimination tasks do not, therefore, provide unequivocal evidence for the existence of audiovisual crossmodal attentional capture effects.

More promising results have been reported by Spence, Nicholls, Gillespie, and Driver (1998) in experiments where participants made speeded continuous vs.
Crossmodal Attentional Capture
pulsed discrimination responses to tactile targets presented unpredictably to the index finger of either hand. Every target was preceded by a spatially uninformative auditory or visual cue (at an SOA of 150, 200, or 300 ms) on either the same or the opposite side. Tactile discrimination responses were significantly faster, and also more accurate, when the cue was presented on the same side as the target, revealing the crossmodal capture of tactile attention by both auditory and visual cues. An overt orienting account of these results was ruled out by monitoring participants' eye position and eliminating all trials on which an eye movement was detected. Spence et al.'s (1998) results therefore provide the strongest evidence to date for crossmodal attentional capture (the auditory capture of touch, and the visual capture of touch) using non-spatial discrimination tasks. Only further research will reveal whether unambiguous crossmodal capture effects can also be demonstrated between other pairs of sensory modalities using non-spatial discrimination tasks.
Left/right discrimination task

Simon and Craft (1970) reported that participants made speeded left/right spatial discrimination responses to visual targets more rapidly when the targets were accompanied by a spatially uninformative auditory cue (presented over headphones) on the same rather than the opposite side. Similar results have also been reported by Bernstein and Edelstein (1971) for sounds presented monaurally up to 45 ms after the lateralized visual target. By contrast, Ward (1994; Experiment 1; see also Ward et al., 1998, Experiments 5 & 6) reported no effect of free-field auditory cues on visual left/right discrimination RTs, despite the fact that visual cues facilitated visual discrimination responses. However, in a second experiment, Ward (1994) found that auditory spatial discrimination responses were facilitated by both visual and auditory cues, with the largest facilitation effects occurring when auditory and visual cues were presented simultaneously from the same side. Ward (1994) took the asymmetrical results of his left/right discrimination experiments to demonstrate asymmetrical crossmodal attentional capture, such that visual cues capture auditory attention, but auditory cues do not capture visual attention. However, Spence and Driver (1997a) have argued that Ward's results, together with those reported in other left/right discrimination studies (e.g., Bernstein & Edelstein, 1971; Simon & Craft, 1970), may partially reflect the facilitatory effects of response priming (or spatial compatibility) instead. The lateralized cues could have biased participants toward making a response on the side of the cue. Given that participants responded with their left hand to left targets and with their right hand to right targets, such a bias would be expected to speed responses to targets appearing ipsilateral to the cue, and hence to facilitate performance on ipsilateral trials.
Moreover, Spence and Driver (1997a) argued that Ward's (1994) failure to demonstrate any facilitatory effect of auditory cues on visual left/right discrimination latencies could also be explained in terms of relative stimulus-response compatibility effects, since
auditory and visual stimuli were presented from different lateral eccentricities (e.g., auditory cues were presented 24 degrees from fixation, whereas visual targets were presented only 12 degrees from fixation; though see Ward et al., 2000). However, Spence and Driver's account of Ward's null results now seems less tenable, given that the same null effect of auditory cues on visual left/right discrimination responses has subsequently been replicated by Ward et al. (1998, Experiment 7) when auditory and visual stimuli were presented from the same lateral eccentricity. Interestingly, however, Schmitt et al. (2000; Experiment 2) reported robust facilitatory effects for all four possible combinations of auditory and visual cue and target stimuli (each cue-target combination was presented in a separate block of trials) when both stimuli were presented from the same eccentricity on ipsilateral trials. The most obvious methodological difference between Schmitt et al.'s study and the experiments reported by Ward and colleagues (Ward, 1994; Ward et al., 1998) is that Schmitt et al. presented auditory cues in a simple, blocked cueing environment (i.e., only auditory cues were presented within a particular block of trials), whereas Ward and colleagues presented auditory cues in a more complex cueing environment (where they were unpredictably mixed with visual and multimodal cues; see Ward et al., 1998, on this point). It now seems that left/right discrimination responses to visual targets are facilitated by an ipsilateral auditory cue only when the cues are presented in a simple cueing environment (e.g., Bernstein & Edelstein, 1971; Schmitt et al., 2000; Simon & Craft, 1970), but not when they are presented in a more complex cueing environment (e.g., Ward, 1994; Ward et al., 1998, Experiments 5-7; a point to which we return later). Nevertheless, it is important to note that no firm conclusions regarding the magnitude of crossmodal attentional capture effects can be drawn from experiments using the left/right discrimination task, given the possibility of response bias confounds.
Orthogonal-cueing paradigm

Spence and Driver (1994, 1997a; Driver & Spence, 1998) developed the orthogonal spatial-cueing paradigm to investigate attentional capture in a spatial task that was free from response bias (thus extending the non-spatial orthogonal crossmodal cueing paradigm first developed by Klein et al., 1987, Experiment 5). Participants in Spence and Driver's studies made speeded discrimination responses regarding the elevation (up vs. down) of a series of targets presented from above or below fixation on either the left or right (see Figure 1). Every target was preceded by a spatially nonpredictive cue in either the same or a different sensory modality. Participants were instructed to ignore the cue as much as possible, and to make a speeded discrimination response (using either response buttons or foot pedals) to the elevation of the target, regardless of the side on which it appeared, and also regardless of its modality.
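Why the orthogonal design sidesteps response priming can be illustrated with a minimal, hypothetical trial generator (all names are invented for illustration). It builds a fully crossed design (cue side x target side x target elevation), which makes the cue spatially nonpredictive, and then asks how often the correct response is the response on the cue's side. The sketch treats the cue as carrying only lateral (left/right) information and ignores SOAs, catch trials, and target modality.

```python
from itertools import product

SIDES = ("left", "right")
ELEVATIONS = ("up", "down")


def build_trials():
    """Fully crossed design: cue side x target side x target elevation.

    Every combination occurs once, so the cue is nonpredictive
    (half the trials are ipsilateral, half contralateral).
    """
    return [
        {"cue_side": c, "target_side": t, "elevation": e}
        for c, t, e in product(SIDES, SIDES, ELEVATIONS)
    ]


def correct_response(trial, task):
    """Response demanded by each task."""
    if task == "left_right":   # report the target's side
        return trial["target_side"]
    if task == "orthogonal":   # report the target's elevation
        return trial["elevation"]
    raise ValueError(f"unknown task: {task}")


def cue_matches_response(trials, task, ipsilateral):
    """Proportion of trials on which the correct response is the cue's side."""
    subset = [t for t in trials
              if (t["cue_side"] == t["target_side"]) == ipsilateral]
    hits = sum(correct_response(t, task) == t["cue_side"] for t in subset)
    return hits / len(subset)


trials = build_trials()
for task in ("left_right", "orthogonal"):
    print(task,
          cue_matches_response(trials, task, ipsilateral=True),
          cue_matches_response(trials, task, ipsilateral=False))
```

In the left/right task the cue's side coincides with the correct response on every ipsilateral trial (1.0) and with the wrong response on every contralateral trial (0.0), so cue validity and response priming are perfectly confounded. In the orthogonal task the lateral cue never specifies an up/down response, so any ipsilateral advantage cannot come from priming a lateralized response.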
Figure 1. Schematic view of the position of cue and target loudspeakers (shown by ellipses), the target lights (black circles), the central fixation light, and the participant in Spence and Driver's (1997a) studies of audiovisual links in covert spatial attention. (Figure labels: Target Loudspeaker and Light; Cue Loudspeaker.)

Over a number of studies, it has been shown that spatially nonpredictive auditory cues on one side lead to better elevation judgments (on average, around 20-30 ms faster, and somewhat more accurate) for auditory, visual, and tactile events presented in the vicinity of the sound shortly after its onset (at SOAs of 100-300 ms; Driver & Spence, 1998; Schmitt et al., 2000, Experiment 5; Spence & Driver, 1994, 1997; Vroomen et al., in press, Experiment 1; though see Ward et al., 1998, Experiment 8). These results show that salient auditory events can lead to a rapid capture of covert visual and tactile spatial attention, though spatially coincident bimodal audiovisual cues have been shown to be no more effective at capturing auditory attention than unimodal auditory cues (e.g., Vroomen, Bertelson, & de Gelder, in press, Experiment 1; Spence & Driver, 1999). Similarly, spatially nonpredictive tactile events on one hand lead to better auditory, visual, and tactile judgments on that side (e.g., Chong & Mattingley, 2000; Kennett, Eimer, Spence, & Driver, 2001; Kennett, Spence, & Driver, submitted; Spence & McGlone, in press; Spence et al., 1998), showing that tactile events also elicit crossmodal attentional capture. Finally, nonpredictive visual flashes have been shown to lead to better visual and tactile judgments in their vicinity (Chong & Mattingley, 2000; Kennett et al., 2001, submitted). Importantly, these crossmodal capture effects occur even when the cues are presented in a modality that is completely irrelevant to the participant's task (e.g., when the cues are always auditory, while the targets are always visual).

Spence and Driver (1997a) have, however, repeatedly found that visual cues do not affect auditory judgments (at least when eye movements are prevented). This null result has been shown to hold across numerous variations in the physical properties of the visual and auditory stimuli used, and has now been replicated by several different researchers (e.g., Rorden & Driver, 1999, Experiment 4; Schmitt et al., 2000, Experiment 5; Spence & Driver, 2000; Vroomen et al., in press, Experiments 1-3).5 In fact, the only situation in which spatially nonpredictive visual cues have been shown to lead to the crossmodal capture of auditory attention in the orthogonal-cueing task is when they are paired with a centrally presented auditory cue (Spence & Driver, 2000; Vroomen et al., in press). For example, Vroomen et al. reported a series of experiments showing that a laterally presented visual cue captured auditory attention only when it was presented at the same time as an auditory cue from a loudspeaker at fixation (but not when it was presented in isolation), presumably due to the well-known ventriloquism effect. (Note that the auditory cue, by itself, could not have induced a lateralized orienting effect, since it was presented centrally.) Similar results, albeit with a somewhat different time-course, have been reported by Spence and Driver (2000) in their study of attentional capture in the vertical dimension. They showed that a visual cue presented either above or below fixation led to a vertical shift of auditory attention only when it was paired with a hard-to-localize auditory pure-tone cue centered at fixation, but not when it was paired with a highly localizable white noise cue (see Figure 2). These results demonstrate that attentional capture can be directed toward the apparent location of a ventriloquized sound, suggesting that multisensory integration precedes (or else co-occurs with) reflexive shifts of covert attention (see Driver, 1996, for a similar conclusion for the case of endogenous attention).

Figure 2A. Schematic front-on view of the position of the cue and target loudspeakers, cue lights (grids of LEDs), and fixation light, as seen by participants in Spence and Driver's (2000) study of attentional capture by ventriloquized sounds. On each trial, either the upper or the lower grid of lights was illuminated, serving as the visual cue. This was paired either with a hard-to-localize pure-tone cue from all five loudspeaker cones, or else with an easy-to-localize white noise burst from just the loudspeaker situated behind the fixation light. The target consisted of pulsed white noise presented from one of the four corner loudspeakers. In order to maintain the orthogonality of the design, participants were required to make a left/right discrimination response, given that cueing occurred in the vertical direction. (Figure labels: Cue Lights; Fixation Light; Loudspeaker Cone.)

Figure 2B. Mean cueing effects in reaction time (contralateral minus ipsilateral trials, in ms) as a function of the cue sound and the cue-target stimulus onset asynchrony (SOA). Pure-tone cue trials (Unlocalizable Tone Cue) are represented by the striped bars, and localizable white noise cue trials (Localizable White Noise Cue) by the dotted bars. The * indicates a significant attentional capture effect, which was not compromised by the accuracy data.

Researchers have also used the orthogonal-cueing paradigm to investigate whether crossmodal attentional capture facilitates responses to all stimuli presented on the cued side, or instead produces a more spatially specific cueing effect. For example, participants in a study by Driver and Spence (1998) were presented with spatially nonpredictive auditory cues from one of four possible loudspeakers, two on either side of fixation (at eccentricities of 13 and 39 degrees; see Figure 3A). Lights were placed directly above and below each of these loudspeakers, and visual targets consisted of the brief offset of one of these eight lights. Participants were required to make a speeded discrimination response regarding the elevation of the visual offset. The maximal facilitation of response latencies (and the most accurate responses) occurred when cue and target were presented from the same lateral position, with cueing effects dropping off as the lateral separation between cue and target increased, irrespective of whether the cue and target stimuli were presented in the same or different hemispaces. These results show that the peripheral presentation of a spatially nonpredictive auditory cue leads to a spatially specific crossmodal capture of visual attention to a particular location within a hemifield (see Schmitt et al., in press, Experiment 2, for similar results using a 4-button localization paradigm). Similar results have been reported in unimodal studies of both visual and auditory attention (see Rorden & Driver, 2001).

To date, virtually all studies of crossmodal attentional capture have been performed with the eyes and head in alignment (i.e., with the eyes looking straight ahead with respect to the head). However, gaze is frequently deviated in daily life, which realigns visual receptors relative to auditory receptors. This raises the important issue of whether audiovisual links in spatial attention are controlled by a fixed ear-retina mapping, or whether instead the relationship between the modalities is spatially remapped whenever gaze is deviated. Recent studies using the orthogonal-cueing paradigm suggest that spatial alignment is maintained even when the eyes are deviated with respect to the head, or when the hands are crossed over the midline (see Figure 3B; Driver & Spence, 1998). Crossmodal links are thus maintained under receptor misalignment, so that attention may be focused on the same external location across the modalities regardless of the posture adopted (see also Figures 3C and 3D).
Figure 3A. Schematic view of Driver and Spence's (1998) study, in which participants made speeded elevation discriminations for target lights, regardless of where the immediately preceding sound cue had been. In a typical crossmodal attention study, participants fixate directly ahead, and visual discriminations are best at the same eccentricity and side as the immediately preceding auditory cue. In Figure 3B, where participants fixated eccentrically (note that all visual events have been laterally translated along with gaze), visual discriminations were again best for lights at the same external location as the immediately preceding sound, but these now occupied different retinal locations as compared with Figure 3A. This result demonstrates remapping between auditory and retinal locations in the control of exogenous crossmodal attention, to keep vision and audition in register as regards external space despite deviations in gaze. A similar remapping of visuotactile space has also been demonstrated when participants adopt a crossed-hands posture. In particular, a tactile cue presented to the left hand will facilitate elevation discrimination responses to visual targets on the left when the hands are uncrossed (Figure 3C), but will facilitate responses to lights on the right when the hands are crossed (Figure 3D).
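The two accounts contrasted above (a fixed ear-retina mapping versus remapping into external space) make different predictions once gaze is deviated, and a small geometric sketch shows how. The function names are hypothetical; the 13-degree cue eccentricity comes from Driver and Spence (1998), but the 20-degree gaze deviation is an illustrative assumption. The sketch considers azimuth only, keeps the head fixed facing straight ahead, and ignores head rotation and elevation.

```python
def retinal_azimuth(external_deg: float, gaze_deg: float) -> float:
    """Eye-centred azimuth of an external location (head fixed, facing ahead).

    Angles in degrees; negative = left of the head's midline.
    """
    return external_deg - gaze_deg


def facilitated_location(cue_external_deg: float, gaze_deg: float,
                         remapped: bool) -> float:
    """External azimuth predicted to benefit from an auditory cue.

    remapped=True: attention follows the cue's external location.
    remapped=False: a fixed ear-retina mapping treats the cue's head-centred
    azimuth as a retinal coordinate, so the benefit moves with the eyes.
    """
    if remapped:
        return cue_external_deg
    return cue_external_deg + gaze_deg


def tactile_cue_side(stimulated_hand: str, hands_crossed: bool) -> str:
    """External side of space occupied by the stimulated hand."""
    if stimulated_hand not in ("left", "right"):
        raise ValueError(f"unknown hand: {stimulated_hand}")
    if not hands_crossed:
        return stimulated_hand
    return "right" if stimulated_hand == "left" else "left"


# Hypothetical trial: sound 13 degrees left, gaze deviated 20 degrees left.
cue, gaze = -13.0, -20.0
print(facilitated_location(cue, gaze, remapped=True))   # -13.0
print(facilitated_location(cue, gaze, remapped=False))  # -33.0
print(retinal_azimuth(cue, gaze))                       # 7.0
print(tactile_cue_side("left", hands_crossed=True))     # right
```

Under remapping, the benefit stays at the sound's external location, which now falls 7 degrees right of the fovea; a fixed mapping would instead predict a benefit 33 degrees to the left. The results summarised in Figures 3B and 3D match the remapping predictions.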
Implicit spatial discrimination task

McDonald and Ward (1999, 2000, submitted) have developed another spatial task, called the implicit spatial discrimination task, to investigate intramodal and crossmodal attentional capture effects. In one combined behavioral and electrophysiological study, McDonald and Ward (2000) presented spatially nonpredictive auditory cues from either the left or right of fixation, followed a short time later by a visual stimulus on the left, on the right, or at fixation. Participants were required to make a speeded detection response to visual onsets presented on either side (go signals, 80% of trials), but to refrain from responding on trials where the visual stimulus was presented at fixation (no-go signals, 20% of trials). Participants responded significantly faster on ipsilaterally cued trials than on contralaterally cued trials at short SOAs (100-300 ms). McDonald and Ward (2000) also recorded event-related brain potentials (ERPs) while participants performed the implicit spatial discrimination task, in order to examine the neural basis of the capture effect. They found that the spatially nonpredictive auditory cue modulated ERPs to visual targets over modality-specific extrastriate visual cortex, as reported previously in unimodal exogenous studies of visual attention (e.g., Hopfinger & Mangun, 1998). Interestingly, this crossmodal effect took place only after the initial sensory processing of the visual target had been completed (i.e., spatial cueing had no effect on the N1 and P1 components at posterior brain sites, the early negative and positive peaks related to early sensory-perceptual processing).6

These results show that crossmodal attentional capture can influence the sensory processing of visual target stimuli as early as the extrastriate visual cortex, presumably via re-entrant input to visual cortex from higher multisensory areas (see Driver & Spence, 2000; Kennett et al., 2001; Miyauchi et al., 1993, for similar results regarding the tactile capture of visual attention). In a number of other studies using the implicit spatial discrimination task, McDonald and Ward (1999, 2000, submitted; Ward et al., 2000) have also shown that auditory cues capture auditory attention, and that visual cues capture both visual and auditory attention. Unfortunately, however, it has proved difficult to rule out a criterion-shifting account of the behavioral effects reported in these experiments, given the use of a speeded detection response (see Ward et al., p. 1263; McDonald et al., 2001, p. 144; though note that criterion shifting seems unlikely to explain the electrophysiological effects reported by McDonald and Ward in modality-specific visual cortex).

Using a modified version of the task, McDonald et al. (2001) have recently shown that visual cues still facilitate responses to ipsilaterally presented auditory stimuli when speed-accuracy trade-offs have been ruled out. Participants in McDonald et al.'s study were presented with a spatially nonpredictive visual cue to either side of fixation, followed by a high or low tone (1,175 Hz vs. 1,109 Hz, respectively) presented from the left, the right, or the center. Participants were required to make a speeded frequency discrimination response to tones presented from either of the peripheral locations, but to refrain from responding to tones presented from fixation. Participants responded significantly faster on ipsilateral trials (mean RT of 642 ms) than on contralateral trials (mean RT of 696 ms). Importantly, by using a two-alternative discrimination response, rather than the simple detection response used in their previous go/no-go studies, McDonald et al. were able to show that participants made no more errors on ipsilateral trials than on contralateral trials (3.9% vs. 4.0% errors, respectively), hence ruling out a criterion-shifting account of their findings. Analysis of the ERP data from their study also revealed that visual cues led to a significant modulation of the neural processing of auditory targets at both early and late stages of auditory processing. The early negative peak (occurring 120-140 ms after target onset), thought to be related to initial sensory-perceptual selection, was significantly larger for sounds presented ipsilateral to the visual cue than for sounds presented contralateral to the cue. Importantly, McDonald and colleagues (McDonald, Teder-Sälejärvi, & Hillyard, 2000; McDonald, Teder-Sälejärvi, Di Russo, & Hillyard, 2000; see also Widmann & Schröger, 1999) have also recently demonstrated, using both psychophysical (signal detection) and electrophysiological measures, that the auditory capture of visual attention still occurs when participants perform a task in which criterion shifting can be ruled out.
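Signal detection measures separate the two possibilities at issue here: a change in sensitivity (d') versus a shift in response criterion (c). The sketch below uses the standard equal-variance Gaussian formulas; the function name and the hit and false-alarm rates are hypothetical, chosen so that the z-scores are round numbers, and they are not McDonald et al.'s data. It shows an observer whose criterion, but not sensitivity, changes on cued trials, which is the pattern a pure criterion-shifting account would predict.

```python
from __future__ import annotations

from statistics import NormalDist

_z = NormalDist().inv_cdf   # z-transform: inverse standard-normal CDF
_phi = NormalDist().cdf     # standard-normal CDF


def sdt(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Equal-variance Gaussian signal detection indices (d', c).

    d' = z(H) - z(FA) measures sensitivity; c = -(z(H) + z(FA)) / 2 measures
    the response criterion (negative c = a more liberal, 'yes'-prone observer).
    Both rates must lie strictly between 0 and 1.
    """
    zh, zf = _z(hit_rate), _z(false_alarm_rate)
    return zh - zf, -(zh + zf) / 2


# Hypothetical observer whose criterion, but not sensitivity, shifts on cued trials.
uncued = sdt(_phi(0.5), _phi(-0.5))   # d' close to 1.0, c close to 0.0
cued = sdt(_phi(1.0), _phi(0.0))      # d' close to 1.0, c close to -0.5
```

On cued trials this observer produces more hits (about 84% vs. 69%) but also more false alarms (50% vs. 31%); raw hit rates alone would suggest a cueing benefit, whereas d' reveals that none exists.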
Crossmodal Attentional Capture

It should be clear from the preceding review that there has been a great deal of research on crossmodal attentional capture over the last few years. Using a variety of experimental paradigms, researchers have demonstrated different patterns of crossmodal capture, and have consequently developed a variety of different, and often conflicting, theories to account for their data. Many of the inconsistencies in this area can, however, be attributed to one or more of the following non-attentional explanations: criterion shifting, response priming, overt orienting, and the use of insensitive response measures. With the development of more advanced experimental paradigms, such as the orthogonal spatial-cueing paradigm and the implicit spatial discrimination task, researchers are now able to show robust and reliable crossmodal attentional capture between most combinations of successive auditory, visual, and tactile stimuli. The most contentious remaining issue is whether auditory cues capture visual attention, and conversely, whether visual cues capture auditory attention (e.g., Spence & Driver, 1997a; Ward, 1994; Ward et al., 1998, 2000). However, research using the orthogonal-cueing paradigm (Driver & Spence, 1998; Spence & Driver, 1997, 1998; Schmitt et al., 2000; Vroomen et al., in press), the implicit spatial discrimination paradigm (McDonald & Ward, 2000, submitted), and other psychophysical and electrophysiological measures (e.g., McDonald et al., 2000a, b; Spence & Lupiáñez, 1998) now shows that auditory cues can capture visual attention. Similarly, recent work by McDonald et al. (2001; McDonald & Ward, 2000; see also Spence & Lupiáñez, 1998; Widmann & Schröger, 1999) also provides irrefutable evidence that visual cues can capture auditory attention under certain conditions. For example, Spence and Lupiáñez reported a temporal order judgment study in which two pure tones (2,000 Hz and 500 Hz) were presented, one on either side of fixation, at SOAs of 15-250 ms. Participants made unspeeded discrimination responses regarding which tone (high or low) had been presented first (or second). On each trial, a spatially nonpredictive peripheral visual cue was presented from an LED placed directly in front of one of the loudspeakers used to present the tones. As in other crossmodal capture studies, participants were instructed to ignore the visual cue as much as possible, while still keeping their eyes open. Participants judged the tone presented on the visually cued side to have occurred 44 ms before the tone on the uncued side, demonstrating psychophysically that the visual capture of auditory attention also speeds up the 'time of arrival' of stimuli on the cued side (the phenomenon of crossmodal spatial prior entry; e.g., Shore, Spence, & Klein, 2001; Spence, Shore, & Klein, in press). Taken together, the empirical evidence therefore supports the conclusion that crossmodal attentional capture can occur between all possible combinations of auditory, visual, and tactile stimuli (see Figure 4). The question that must now be answered is why researchers have sometimes failed to demonstrate such crossmodal capture effects.
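The prior-entry result can be expressed as a shift in the point of subjective simultaneity (PSS). The sketch below assumes a logistic psychometric function, which is a common modelling choice but is not stated in the source; the function names and the 30 ms slope are hypothetical, and only the 44 ms shift comes from the study described above. Positive SOAs mean the tone on the cued side physically led.

```python
import math


def p_cued_first(soa_ms: float, prior_entry_ms: float,
                 slope_ms: float = 30.0) -> float:
    """Probability of reporting the cued-side tone as first (logistic model).

    soa_ms > 0 means the cued-side tone physically led. The cue is assumed to
    advance the perceived arrival time of the cued tone by prior_entry_ms.
    """
    effective_lead = soa_ms + prior_entry_ms
    return 1.0 / (1.0 + math.exp(-effective_lead / slope_ms))


def soa_for_probability(p: float, prior_entry_ms: float,
                        slope_ms: float = 30.0) -> float:
    """Invert the logistic: the SOA at which p_cued_first equals p."""
    return slope_ms * math.log(p / (1.0 - p)) - prior_entry_ms


# Point of subjective simultaneity (p = 0.5) with the 44 ms shift reported above.
pss = soa_for_probability(0.5, prior_entry_ms=44.0)
print(pss)  # -44.0
```

A PSS of -44 ms means that the cued-side tone has to lag by 44 ms before the two tones are judged equally often to have come first, which is how a 44 ms prior-entry effect appears in temporal order judgment data.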
Figure 4. Schematic view of the crossmodal attentional capture effects demonstrated to date. Note that auditory, visual, and tactile cues have now been shown to facilitate responses to ipsilaterally presented targets in all three modalities. (Figure labels: Touch; Vision.)
Many of the apparent failures to demonstrate crossmodal cueing effects for auditory target stimuli can be accounted for simply by the use of response measures, such as auditory simple detection latencies, that are insensitive to the spatial distribution of attention (e.g., Buchtel & Butter, 1988; Klein et al., 1987; Spence & Driver, 1994). Many other null results may be attributed to the fact that auditory stimuli have often been presented from different (in particular, more eccentric) locations than the visual stimuli (e.g., Mondor & Amirault, 1998; Ward, 1994). Moreover, closer inspection of some of the apparent failures to demonstrate the auditory capture of visual attention reveals numerical trends toward a cueing effect, suggesting that some of these null results may reflect a lack of statistical power rather than the absence of crossmodal cueing per se. For example, there was an 11 ms non-significant trend in Ward et al. (2000), which was only 1 ms away from significance; a 7 ms trend toward the auditory capture of visual attention in Ward (1994); a 10 ms trend in Mondor and Amirault's (1998, Experiment 1) study, which was associated with a similar trend in the error data; and a 20 ms trend in Dufour's (1999) study (see also Spence & Driver, 1997a, p. 17). Finally, it now seems likely that the null effect of visual cues on auditory elevation discrimination responses in the orthogonal-cueing task may have been caused by the fact that auditory targets were presented from different elevations than the visual cues, even on ipsilateral trials (McDonald & Ward, submitted; Ward et al., 2000). A review of previous studies reveals that, in the majority of studies, the auditory targets were presented at least 14 degrees away from the visual cue (e.g., Schmitt et al., 2000; Spence & Driver, 1997; Vroomen et al., in press).7

The spatial distribution of attention following crossmodal capture may depend more on the attributes of the cue modality than on the modality of the target (cf. Spence & Driver, 1998, p. 134). In particular, as suggested by Ward et al. (2000, p. 1264), it is possible that visual cues lead to a more spatially localized shift of attention than either auditory or tactile cues.
It is already well known that the spatial acuity of the visual system is far better than that of the tactile or auditory systems (e.g., Fisher, 1962; Simpson, 1972; Warren, 1970), and consequently it is possible that visual cues may also elicit a more spatially localized crossmodal capture of auditory and tactile attention than either auditory or tactile cues. Empirical support for this claim comes from a recent study by Chong and Mattingley (2000), who investigated crossmodal attentional capture between vision and touch using the orthogonal-cueing paradigm. They found that a tactile cue facilitated responses to visual targets irrespective of their distance from the tactile cue. By contrast, a visual cue facilitated tactile elevation discrimination responses more when the visual cue was close to the hand than when it was further away. If, as Chong and Mattingley's results suggest, visual cues lead to a more spatially focused attentional capture effect than tactile (and perhaps auditory) cues (see Figure 5), this would provide one possible explanation for why visual cues have only been shown to capture auditory attention when auditory targets are presented close to the cue, as in the implicit spatial discrimination task (e.g., McDonald et al., 2001; Spence & Lupiáñez, 1998; Widmann & Schröger, 1999), or, when visual or tactile targets are used, when targets are presented from similar elevations to the cues, as in the orthogonal-cueing task (e.g., Chong & Mattingley, 2000; Kennett et al., 2001, submitted; see Figure 3).8 This means that it may be particularly important in future research to ensure that cue and target stimuli are presented from the same position when assessing crossmodal attentional capture following visual cues (cf. Kadunce, Vaughan, Wallace, Benedek, & Stein, 1997).9 It will clearly also be important for future research to address more carefully the spatial distribution of attention following the peripheral presentation of spatially nonpredictive cues.
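The idea that cue modalities differ in the spatial spread of the attention they capture can be illustrated with a toy gradient model. Modelling facilitation as a Gaussian function of cue-target separation is an assumption made here for illustration; the source proposes no specific functional form. The peak and width values are hypothetical; only the 14-degree separation is taken from the text.

```python
import math


def cueing_benefit(distance_deg: float, peak_ms: float, width_deg: float) -> float:
    """Facilitation (ms) at a given cue-target separation, Gaussian gradient."""
    return peak_ms * math.exp(-(distance_deg ** 2) / (2.0 * width_deg ** 2))


# Hypothetical gradients: equal peak benefit, narrower spread for visual cues.
GRADIENTS = {
    "auditory cue": {"peak_ms": 25.0, "width_deg": 20.0},
    "visual cue": {"peak_ms": 25.0, "width_deg": 5.0},
}

for cue, params in GRADIENTS.items():
    at_cue = cueing_benefit(0.0, **params)
    at_14 = cueing_benefit(14.0, **params)   # separation noted in the text
    print(f"{cue}: {at_cue:.1f} ms at the cue, {at_14:.1f} ms at 14 degrees")
```

With these made-up parameters both cues produce the same benefit at their own location, but at a 14-degree separation the broad auditory gradient still yields roughly 20 ms of facilitation, whereas the narrow visual gradient yields well under 1 ms. A design that always separates visual cues from auditory targets by that distance would therefore be expected to find a null result even if visual cues do capture auditory attention.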
[Figure 5 panels: Auditory Cue, Visual Cue; axis label: Spatial Attention]
Figure 5. Schematic illustration of how spatially nonpredictive cues in different sensory modalities (here audition, Figure 5A; and vision, Figure 5C) might elicit attentional capture effects of different spatial specificity. This provides one explanation for why auditory cues may facilitate elevation discrimination responses for visual targets (Figure 5B), while visual cues may fail to elicit any significant effect on auditory elevation discrimination latencies (Figure 5D).

Taken together, methodological factors, combined with these differential spatial-cueing effects of different cue modalities, can account for many of the failures to demonstrate crossmodal attentional capture effects in previous studies. Before moving on, however, it is important to assess a recent claim by Ward and colleagues (e.g., Ward, 1994; Ward et al., 2000) that crossmodal attentional capture effects may be modulated by strategic factors. In particular, they suggest that participants may adopt different strategies depending on the cueing environment. One situation presents them with a single type of cue. The other presents a variety of different cues (i.e., the cueing environment is complex). On this view, participants can ignore auditory cues when they are intermixed with visual cues (i.e., when the cueing environment is complex), but cannot ignore the
248
Spence
auditory cues when only a single type of cue is presented at any given time. According to this claim, auditory cues should only capture visual attention in simple, but not complex (i.e., multimodal), cueing environments. Support for this view comes from the left/right discrimination task. Here, auditory cues facilitate visual performance when the cue modality is predictably auditory (e.g., Bernstein & Edelstein, 1971; Schmitt et al., 2000; Simon & Craft, 1970), but not when it is unpredictably either auditory or visual (e.g., Ward, 1994; Ward et al., 1998, Experiment 8). Until recently, most orthogonal-cueing studies also presented auditory cues in a simple unimodal cueing environment, and so provided results consistent with Ward's claim (e.g., Driver & Spence, 1998; Schmitt et al., 2000; Spence & Driver, 1997, 1998). However, more recent studies have shown that auditory cues still facilitate visual elevation discrimination responses even in a complex cueing environment (e.g., Spence & Driver, 1999, 2000; Vroomen et al., in press). The cue complexity account of crossmodal attentional capture therefore cannot be correct, at least when performance is assessed in the orthogonal-cueing task. 10
Spatial relevance

One general finding to emerge from the study of crossmodal attentional capture is that cueing effects appear more robust when participants make some form of spatial discrimination response than when they make a non-spatial response. A similar pattern of results has also been reported in purely auditory studies of attentional capture (e.g., McDonald & Ward, 1999; Spence & Driver, 1994; see Pavani, Làdavas, & Driver, in press, for similar results in a patient population). Neither methodological confounds nor strategic factors can account for such findings. In a thorough review of this area, McDonald and Ward (1999) proposed the so-called spatial relevance hypothesis (see also Klein & Taylor, 1994). It holds that attentional capture effects for auditory target stimuli will only be demonstrated when space is made relevant to the participant's task, either by requiring a spatial discrimination response or by some other means. 11 Making space relevant to the task probably forces participants to respond on the basis of a representation of the auditory stimuli in which spatial information is explicitly coded. Many of the neural structures that represent auditory spatial information are multimodal (e.g., the superior colliculus and the inferior parietal lobule; Bushara et al., 1999; King, 1993; Weeks et al., 1999). These structures therefore provide plausible neural substrates for the behavioural effects identified in these crossmodal capture studies. Crossmodal attentional capture effects may thus only be demonstrated when participants are forced to respond on the basis of information coded in multimodal brain structures, such as those typically implicated in spatial representation. For example, using PET, Bushara et al.
(1999) recently found that both auditory and visual localization tasks result in activation in the inferior parietal lobe (see also Weeks et al., 1999). The fact that common neural
substrates are involved in both auditory and visual spatial processing might help to explain why crossmodal capture effects are so strong when a spatial discrimination response is required. An interesting question for future research will be whether certain tasks are affected more by intramodal attentional capture, whereas other tasks are affected equally by intramodal and crossmodal attentional capture. Hopfinger and Mangun (1998) and McDonald and Ward (2000) provide preliminary electrophysiological support for such a distinction.
Modality-Specific vs. Supramodal Attention Systems

Over the years, researchers have proposed a number of different accounts of how attention may be coordinated across the modalities (see McDonald & Ward, submitted; Spence & Driver, 1996, for reviews). One of the earliest suggestions came from Wickens (e.g., 1980, 1984). He proposed that people have entirely modality-specific (i.e., auditory, visual, and tactile) attentional systems, such that the distribution of attention in one modality has no effect on attention in the other modalities (see Figure 6A). This purely modality-specific resource account is
[Figure 6 panels A-D: Visual, Auditory, and Tactile modality-specific systems; Supramodal (Auditory + Visual + Tactile) system]
Figure 6. Schematic illustration of the ways in which researchers have conceptualized how the attentional systems might be coordinated across the different sensory modalities. A) Independent modality-specific attentional resources; B) Single supramodal attention system; C) Hierarchical supramodal plus modality-specific attentional systems; and D) Separable-but-linked attentional systems (see McDonald & Ward, submitted; and Spence & Driver, 1996, for reviews).
clearly inconsistent with many of the behavioral and electrophysiological results reported here. Other researchers have argued for a single supramodal attentional system (e.g., Farah et al., 1989, pp. 469-470) that allocates attention to locations in space regardless of the modality of the stimuli presented there (see Figure 6B). This account has been ruled out for endogenous spatial orienting by studies showing that people can direct their auditory, visual, and/or tactile attention in different directions simultaneously (e.g., Driver & Spence, 1994; Lloyd et al., submitted; Spence & Driver, 1996; Spence et al., 2000). Still other researchers have suggested more complex attentional architectures. One example is the hybrid account proposed by Posner (1990, pp. 202-203), whereby the various modality-specific attentional subsystems feed into a higher-level supramodal system (see Figure 6C; see also Bushara et al., 1999, p. 764; Woods, Alho, & Algazi, 1992). By contrast, Spence and Driver (1996) argued for separable-but-linked modality-specific attentional systems to account for their endogenous spatial attention data (i.e., without the need for a higher-order supramodal system). On their account, separate modality-specific attentional systems may operate upon the representations of auditory, visual, and tactile space. Strong crossmodal links, however, ensure that attention in the different modalities is normally directed to the same spatial location (see Figure 6D). Until recently, there was no unambiguous evidence that visual cues could capture auditory attention. Many researchers therefore argued that the separable-but-linked account developed for endogenous orienting might also provide the most parsimonious account of crossmodal links in exogenous spatial attention (e.g., Driver & Spence, 1998; Mondor & Amirault, 1998, p. 753; Schmitt et al., 2000, in press; Spence et al., 1998).
However, recent empirical data show that crossmodal capture can occur between all combinations of auditory, visual, and tactile stimuli (at least when cue and target are presented from the same spatial location). The validity of the single supramodal account therefore needs to be reassessed. 12 One critical behavioral experiment could help to tease apart these alternatives. It would present spatially nonpredictive cues simultaneously from different positions in different modalities (e.g., a visual cue on the left together with an auditory cue on the right), followed unpredictably by auditory or visual targets. According to the separable-but-linked hypothesis, responses to auditory targets should be facilitated on the side of the auditory cue, and responses to visual targets on the visually-cued side. This effect might well be modulated by the relative intensity and timing of the cues used. Ward (1994) actually carried out this experiment, but his use of the confounded left/right localization task makes any interpretation of his results difficult. Using their implicit spatial discrimination task, Ward et al. (2000) reported no significant spatial cueing effects when auditory and visual cues were presented simultaneously from opposite sides. Resolving this issue will clearly be an important goal for future research (e.g., Farah et al., 1989; McDonald et al., 2001; Spence & Driver, 1996), but may well require an increasing reliance on
cognitive neuroscience techniques to elucidate the brain structures underlying these crossmodal attentional effects (see Driver & Spence, 2000; Macaluso, Frith, & Driver, 2000; Spence & Driver, 1996).

Neural Correlates of Crossmodal Capture
The crossmodal capture effects reported here imply that some degree of spatial integration must arise between the sensory modalities before participants respond, such that common locations are treated as such across the different senses even when different postures are adopted (Driver & Spence, 1998). However, how such crossmodal coordination of spatial representation arises is a nontrivial problem, given that information is initially coded in very different coordinates in the various senses. For instance, visual stimuli are initially coded retinotopically, auditory stimuli tonotopically, and tactile stimuli somatotopically. A shared external location therefore has little in common across the modalities at input (i.e., at the level of the sensory epithelia). One candidate neural substrate for the crossmodal capture effects reviewed here is the superior colliculus (SC; Spence & Driver, 1997a). A majority of cells in the deeper layers of this subcortical structure are multimodal (Stein & Meredith, 1993). Neurophysiological and neuropsychological studies have also implicated the SC in the subcortical control of overt and covert exogenous orienting, both in animals (e.g., Peck, 1987; Robinson & Kertzman, 1995; Stein, Meredith, Honeycutt, & McDade, 1989; Stein, Wallace, & Meredith, 1995) and in humans (e.g., Rafal et al., 1991). Neurophysiological studies in a number of species have demonstrated what happens by the time sensory information reaches the deeper layers of the SC. At that stage it has been transformed into spatiotopically arrayed 'maps' of auditory, visual, and somatosensory space (see, e.g., Groh & Sparks, 1996; King, 1993; Stein & Meredith, 1993).
Moreover, these maps are in approximate spatiotopic register with each other. For example, a bimodal visual-auditory cell that responds to visual stimuli above and to the right of the animal will also respond to sounds coming from the upper right. The maps are also in register with the motor maps found in the deepest layers, which are associated with overt orienting of the eyes, head, and body. Much of the neuroscience interest in recent years has focused on multimodal integration and its implications for spatial orienting within the SC alone. This is perhaps because the deeper layers of the SC contain one of the densest concentrations of multisensory neurons in the brain (Stein et al., 1995). Yet many other neural centers, such as the posterior parietal cortex, the putamen, and the premotor cortex, also show multimodal spatial integration and may also be involved in the modulation of attention (e.g., Graziano & Gross, 1994, 1998; Rizzolatti, Scandolara, Matelli, & Gentilucci, 1981). For example, Graziano and Gross (1994, 1998) demonstrated bimodal cells in several areas of the monkey brain (including the premotor cortex, parietal area 7b, and the putamen). These cells respond to tactile stimuli on the hand, as well as to visual stimuli presented near the hand.
Critically, the visual receptive fields (RFs) of such neurons follow the hand as different postures are adopted, hence maintaining appropriate visuotactile register across posture changes. Cells in these areas therefore provide another possible neural substrate for crossmodal capture effects between vision and touch when participants adopt different postures, such as crossing their hands (e.g., Kennett et al., 2001, submitted; Spence, Kingstone, Shore, & Gazzaniga, 2001).
Crossmodal Capture in the Applied Domain

Given the extensive evidence for crossmodal attentional capture in the laboratory, it is important to ask whether such findings have any application outside it. In recent years, interest has grown rapidly in the use of auditory, tactile, and multimodal warning signals to capture the attention of operators working in visually-cluttered environments (e.g., Liu, 2001; Selcon, Taylor, & McKenna, 1995; Sklar & Sarter, 1999; see Spence & Driver, 1997b, for a review). Such signals are particularly valuable when operators have to respond rapidly to time-critical information, where even small time savings can be vital. One example is missile approach warnings for pilots (see Doll, Gerth, Engelman, & Folds, 1986; Selcon et al., 1995). Free-field (and virtual) auditory cues have been shown to capture a pilot's visual attention crossmodally and effectively. This holds particularly when pilots search for visual targets in the large and cluttered visual displays typical of many aircraft (e.g., Bolia, D'Angelo, & McKinley, 1999; Doyle & Snowden, 1999; Perrott, Cisneros, McKinley, & D'Angelo, 1996; Perrott, Saberi, Brown, & Strybel, 1990; Perrott, Sadralodabai, Saberi, & Strybel, 1991). Perrott et al. (1996) studied pilots localizing and responding to a visual target presented amongst visual distractors. Targets could appear at any azimuth from 0 to 360 degrees and at elevations from 90 degrees above to 70 degrees below fixation. Presenting a localized free-field auditory warning signal from the same spatial location dramatically reduced the response time. There was no concomitant increase in errors, which rules out a speed-accuracy trade-off explanation of these findings.
These auditory facilitation effects have been reported not only for visual targets presented outside the current field of view. They also occur for visual targets lying within just a few degrees of fixation, where the RT facilitation seen in cluttered visual environments can still exceed 300 ms. In fact, in certain situations, using auditory cues to capture a pilot's visual attention has been shown to yield greater benefits than enhancing the saliency of the visual target stimuli themselves (e.g., Perrott et al., 1991). Given the growing interest in this area, cockpit designers will clearly move increasingly toward auditory, tactile, and/or multimodal warning signals to crossmodally capture the visual attention of pilots (and other interface operators). However, as Spence and Driver (1997b; see also McBride & Ntuen, 1997) have pointed out, multimodal warning signals carry a potential trade-off. Their implementation may require the operator to monitor additional channels, and hence to divide attention between several modalities simultaneously (see Spence, Nicholls, & Driver, 2001).
Conclusions
There has been a rapid growth of interest in the study of crossmodal attentional capture in recent years. Taken together, this research clearly shows that crossmodal attentional capture can occur between all possible combinations of auditory, visual, and tactile stimuli, at least under certain conditions. These effects occur even when the cue modality is entirely irrelevant to the participant's task, suggesting that crossmodal capture occurs automatically. An important question for future research will nevertheless be how far exogenous crossmodal capture effects can be modulated by endogenous factors. One such factor is the direction of endogenous spatial attention to a particular modality or location (cf. Klein et al., 1987; Spence, Ranson, & Driver, 2000; Widmann & Schröger, 1999). Crossmodal attentional capture seems to have a particularly robust effect on spatial tasks, or in situations in which space is relevant to the participant's task (McDonald & Ward, 1999, submitted; Posner, 1978; Spence & Driver, 1994). This may be because such tasks require participants to respond on the basis of a multimodal neural representation of space in the brain. In conclusion, extensive crossmodal links in spatial attention between audition, vision, and touch make good functional sense. The signals that a single event produces in different modalities will normally originate from the same spatial location. Crossmodal attentional capture ensures that mechanisms of attention are coordinated across the modalities. As a result, relevant information from novel events in our environment gets selected together across the different senses, regardless of posture.

Footnotes
1 Such coordination poses a considerable computational challenge. The stimulus properties signalling a common source across the modalities (e.g., the various cues to location in audition, vision, and touch) differ greatly at the initial stages of sensory processing. Vision is retinotopic, audition is initially tonotopic and then head-centred, and touch is initially coded somatotopically.

2 The term 'crossmodal attentional capture' is used here to denote situations in which a spatially-nonpredictive peripheral event in one sensory modality leads to an exogenous shift of attention in another modality to the cued location. This use of the term 'crossmodal capture' must be distinguished from its use in crossmodal conflict situations, where information in one modality dominates over conflicting information presented in another modality. For example, in the well-known ventriloquist effect, we hear a voice as coming from the lips we see moving, even when the voice and the lips are presented from different (i.e., conflicting) locations. This use of the term 'capture' to describe such intersensory bias effects (i.e., crossmodal perceptual capture) has a long history in experimental psychology (e.g., Caclin, Soto-Faraco, Kingstone, & Spence, submitted; Posner,
Nissen, & Klein, 1976; Rock & Harris, 1967), but should be distinguished from the crossmodal attentional capture effects discussed here.

3 Simple detection responses to auditory stimuli may be based on an 'early' tonotopic stimulus representation in which spatial location information is not made explicit (see McDonald & Ward, 1999; Spence & Driver, 1994, on this point). This contrasts with vision, where even the earliest representations are spatial (i.e., retinotopic).

4 Dufour used a speeded discrimination response in one experiment and an unspeeded discrimination response in the other. This makes it difficult to draw firm conclusions about why crossmodal capture effects were reported in only one experiment. The difference could be attributed either to the nature of the tasks or to the particular response measures used. Briand and Klein (1987) reported a somewhat different pattern of results in their unimodal study of visual capture. They showed that a peripheral visual cue facilitated performance on both visual feature detection and conjunction detection tasks, though the effects were larger for the conjunction task.

5 The only exception to this finding was reported by Ward et al. (1998, Experiment 8). They found that visual cues had a significant inhibitory effect on ipsilateral auditory elevation responses in the orthogonal-cueing task. As discussed later, this atypical result may have been caused by Ward et al.'s use of highly localizable white-noise cues. Previous studies had used pure-tone cues, which are hard to localize in terms of their elevation.
6 Hopfinger and Mangun (1998) showed P1 modulation at short SOAs, suggesting a possible difference between intramodal and crossmodal attentional capture effects.

7 This spatial elevation discrepancy between cue and target stimuli in the orthogonal-cueing task is particularly pronounced for auditory targets. Here, the target loudspeakers have to be separated by a large elevation difference for participants to discriminate target elevation reliably.

8 One result does not immediately fit into this framework. Visual cues facilitate elevation discrimination responses for visual targets positioned 14 degrees or more above or below the cue light (Spence & Driver, 1997; Ward et al., 1998, Experiment 8). However, such a result may partially reflect local landmarking rather than attentional facilitation, since landmarking may facilitate elevation discrimination responses for targets on the cued side.

9 In most orthogonal-cueing studies, cue and target stimuli can be presented from the same, or very similar, locations (e.g., see Figure 3). Spatial co-location is more difficult in one situation: when visual cues precede auditory targets (see Figure 1). This is precisely the situation in which this task has not demonstrated crossmodal attentional capture effects.
10 Note that one problem with Ward et al.'s cue complexity account is that no definition has yet been given of what constitutes a complex, rather than a simple, cueing environment.

11 Note, though, that McDonald and Ward's (1999) spatial relevance hypothesis cannot account for the intramodal auditory capture effects reported by Mondor and Amirault (1998, Experiment 1).

12 However, as McDonald et al. (2001) point out, even reciprocal crossmodal capture effects between all possible combinations of auditory, visual, and tactile stimuli do not necessarily imply a purely supramodal mechanism of attentional capture. One cannot rule out the possibility that a shift of attention in one modality might elicit a separate shift of attention in the other 'tightly-linked' modalities.

References
Bernstein, I. H., & Edelstein, B. A. (1971). Effects of some variations in auditory input upon visual choice reaction time. Journal of Experimental Psychology, 87, 241-247.
Bolia, R. S., D'Angelo, W. R., & McKinley, R. L. (1999). Aurally-aided visual search in three-dimensional space. Human Factors, 41, 664-669.
Briand, K. A. (1998). Feature integration and spatial attention: More evidence of a dissociation between endogenous and exogenous orienting. Journal of Experimental Psychology: Human Perception and Performance, 24, 1243-1256.
Briand, K. A., & Klein, R. M. (1987). Is Posner's "beam" the same as Treisman's "glue"?: On the relation between visual orienting and feature integration theory. Journal of Experimental Psychology: Human Perception and Performance, 13, 228-241.
Buchtel, H. A., & Butter, C. M. (1988). Spatial attention shifts: Implications for the role of polysensory mechanisms. Neuropsychologia, 26, 499-509.
Buchtel, H. A., Butter, C. M., & Ayvasik, B. (1996). Effects of stimulus source and intensity on covert orientation to auditory stimuli. Neuropsychologia, 34, 979-985.
Bushara, K. O., Weeks, R. A., Ishii, K., Catalan, M.-J., Tian, B., Rauschecker, J. P., & Hallett, M. (1999). Modality-specific frontal and parietal areas for auditory and visual spatial localization in humans. Nature Neuroscience, 2, 759-765.
Butter, C. M., Buchtel, H. A., & Santucci, R. (1989). Spatial attentional shifts: Further evidence for the role of polysensory mechanisms using visual and tactile stimuli. Neuropsychologia, 27, 1231-1240.
Chong, T., & Mattingley, J. B. (2000). Preserved cross-modal attentional links in the absence of conscious vision: Evidence from patients with primary visual cortex lesions. Journal of Cognitive Neuroscience, 12 (Suppl.), 38.
Doll, T. J., Gerth, J. M., Engelman, W. R., & Folds, D. J. (1986). Development of simulated directional audio for cockpit applications (USAF Report AAMRL-TR-86-014). Wright-Patterson Air Force Base, OH: Armstrong Aerospace Medical Research Laboratory.
Doyle, M. C., & Snowden, R. J. (1999). The effect of auditory warning signals on visual target identification. In D. Harris (Ed.), Engineering Psychology and Cognitive Ergonomics, Vol. 4: Job Design, Product Design and Human-Computer Interaction (pp. 245-251). Hampshire: Ashgate Publishing.
Driver, J. (1996). Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading. Nature, 381, 66-68.
Driver, J., & Spence, C. J. (1994). Spatial synergies between auditory and visual attention. In C. Umiltà & M. Moscovitch (Eds.), Attention and performance: Conscious and nonconscious information processing (Vol. 15, pp. 311-331). Cambridge, MA: MIT Press.
Driver, J., & Spence, C. (1998). Crossmodal links in spatial attention. Philosophical Transactions of the Royal Society Section B, 353, 1319-1331.
Driver, J., & Spence, C. (2000). Multisensory perception: Beyond modularity and convergence. Current Biology, 10, R731-R735.
Dufour, A. (1999). Importance of attentional mechanisms in audiovisual links. Experimental Brain Research, 126, 215-222.
Duncan, J. (1980). The demonstration of capacity limitation. Cognitive Psychology, 12, 75-96.
Eimer, M., & Driver, J. (2000). An event-related brain potential study of cross-modal links in spatial attention between vision and touch. Psychophysiology, 37, 697-705.
Eimer, M., & Schröger, E. (1998). ERP effects of intermodal attention and cross-modal links in spatial attention. Psychophysiology, 35, 313-327.
Farah, M. J., Wong, A. B., Monheit, M. A., & Morrow, L. A. (1989). Parietal lobe mechanisms of spatial attention: Modality-specific or supramodal? Neuropsychologia, 27, 461-470.
Fisher, G. H. (1962). Resolution of spatial conflict. Bulletin of the British Psychological Society, 46, 3A.
Folk, C. L., Remington, R. W., & Johnston, J. C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030-1044.
Graziano, M. S. A., & Gross, C. G. (1994). Mapping space with neurons. Current Directions in Psychological Science, 3, 164-167.
Graziano, M., & Gross, C. (1998). Spatial maps for the control of movement. Current Opinion in Neurobiology, 8, 195-201.
Groh, J. M., & Sparks, D. L. (1996). Saccades to somatosensory targets. 2. Motor convergence in primate superior colliculus. Journal of Neurophysiology, 75, 428-438.
Hillyard, S. A., Simpson, G. V., Woods, D. L., Van Voorhis, S., & Munte, T. F. (1984). Event-related brain potentials and selective attention to different modalities. In F. Reinoso-Suarez & C. Ajmone-Marson (Eds.), Cortical integration (pp. 395-414). New York: Raven Press.
Hopfinger, J. B., & Mangun, G. R. (1998). Reflexive attention modulates processing of visual stimuli in human extrastriate cortex. Psychological Science, 9, 441-447.
Hugdahl, K., & Nordby, H. (1994). Electrophysiological correlates to cued attentional shifts in the visual and auditory modalities. Behavioral and Neural Biology, 62, 21-32.
Johnen, A., Wagner, H., & Gaese, B. H. (2001). Spatial attention modulates sound localization in barn owls. Journal of Physiology, 85, 1009-1012.
Jones, M. R., Moynihan, H., MacKenzie, N., & Hoffman, J. (in press). Stimulus-driven attending in dynamic arrays. Psychological Science.
Kadunce, D. C., Vaughan, J. W., Wallace, M. T., Benedek, G., & Stein, B. E. (1997). Mechanisms of within- and cross-modality suppression in the superior colliculus. Journal of Neurophysiology, 78, 2834-2847.
Kennett, S., Eimer, M., Spence, C., & Driver, J. (2001). Tactile-visual links in exogenous spatial attention under different postures: Convergent evidence from psychophysics and ERPs. Journal of Cognitive Neuroscience, 13, 462-468.
Kennett, S., Spence, C., & Driver, J. (submitted). The spatial coordinates of visuo-tactile links in covert exogenous spatial attention. Perception & Psychophysics.
King, A. J. (1993). A map of auditory space in the mammalian brain: Neural computation and development. Experimental Physiology, 78, 559-590.
Klein, R., Brennan, M., D'Aloisio, A., D'Entremont, B., & Gilani, A. (1987). Covert cross-modality orienting of attention. Unpublished manuscript.
Klein, R. M., & Shore, D. I. (2000). Relationships among modes of visual orienting. In S. Monsell & J. Driver (Eds.), Control of cognitive processes: Attention and performance XVIII (pp. 195-208). Cambridge, MA: MIT Press.
Klein, R. M., & Taylor, T. L. (1994). Categories of cognitive inhibition with reference to attention. In D. Dagenbach & T. H. Carr (Eds.), Inhibitory processes in attention, memory, and language (pp. 113-150). Academic Press.
Làdavas, E. (1993). Shifts of attention in patients with visual neglect. In I. H. Robertson & J. C. Marshall (Eds.), Unilateral neglect: Clinical and experimental studies (pp. 193-209). Hillsdale, NJ: Erlbaum.
Liu, Y.-C. (2001). Comparative study of the effects of auditory, visual and multimodal displays on drivers' performance in advanced traveller information systems. Ergonomics, 44, 425-442.
Lloyd, D. M., Merat, N., McGlone, F., & Spence, C. (submitted). Crossmodal links in covert endogenous spatial attention between audition and touch. Perception & Psychophysics.
Macaluso, E., Frith, C., & Driver, J. (2000). Modulation of human visual cortex by crossmodal spatial attention. Science, 289, 1206-1208.
McBride, M. E., & Ntuen, C. A. (1997). The effects of multimodal display aids on human performance. Computers and Industrial Engineering, 33, 197-200.
McDonald, J. J., Teder-Sälejärvi, W. A., Di Russo, F., & Hillyard, S. A. (2000a). Looking at sound: Involuntary auditory attention modulates neural processing in extrastriate visual cortex. Poster presented at the Annual Meeting of the Society for Psychophysiological Research, San Diego, California, October.
McDonald, J. J., Teder-Sälejärvi, W. A., & Hillyard, S. A. (2000b). Involuntary orienting to sound improves visual perception. Nature, 407, 906-908.
McDonald, J. J., Teder-Sälejärvi, W. A., Heraldez, D., & Hillyard, S. A. (2001). Electrophysiological evidence for the "missing link" in crossmodal attention. Canadian Journal of Experimental Psychology, 55, 143-151.
McDonald, J. J., & Ward, L. M. (1999). Spatial relevance determines facilitatory and inhibitory effects of auditory covert spatial orienting. Journal of Experimental Psychology: Human Perception and Performance, 25, 1234-1252.
McDonald, J. J., & Ward, L. M. (2000). Involuntary listening aids seeing: Evidence from human electrophysiology. Psychological Science, 11, 167-171.
McDonald, J. J., & Ward, L. M. (submitted, 2000). Crossmodal consequences of involuntary spatial attention and inhibition of return. Journal of Experimental Psychology: Human Perception and Performance.
Miyauchi, S., Hikosaka, O., Shimojo, S., & Okamura, H. (1993). Spatial attention is cross-modal: An evoked potential study. Investigative Ophthalmology and Visual Science, 34, 1234.
Mondor, T. A., & Amirault, K. J. (1998). Effect of same- and different-modality spatial cues on auditory and visual target identification. Journal of Experimental Psychology: Human Perception and Performance, 24, 745-755.
Müller, H. J., & Findlay, J. M. (1987). Sensitivity and criterion effects in the spatial cueing of visual attention. Perception & Psychophysics, 42, 383-399.
Pashler, H. E. (1998). The Psychology of Attention. Cambridge, MA: MIT Press.
Pavani, F., Làdavas, E., & Driver, J. (in press). Selective deficit of auditory localisation in patients with visuospatial neglect. Neuropsychologia.
Peck, C. K. (1987). Visual-auditory interactions in cat superior colliculus: Their role in control of gaze. Brain Research, 420, 162-166.
Perrott, D. R., Cisneros, J., McKinley, R. L., & D'Angelo, W. (1996). Aurally aided visual search under virtual and free-field listening conditions. Human Factors, 38, 702-715.
Perrott, D. R., Saberi, K., Brown, K., & Strybel, T. Z. (1990). Auditory psychomotor coordination and visual search performance. Perception & Psychophysics, 48, 214-226.
Perrott, D. R., Sadralodabai, T., Saberi, K., & Strybel, T. Z. (1991). Aurally aided visual search in the central visual field: Effects of visual load and visual enhancement of the target. Human Factors, 33, 389-400.
Posner, M. I. (1978). Chronometric explorations of mind. Hillsdale, NJ: Erlbaum.
Posner, M. I. (1988). Structures and functions of selective attention. In T. Boll & B. K. Bryant (Eds.), Master lectures in clinical neuropsychology and brain function: Research, measurement and practice (pp. 171-202). Washington, DC: American Psychological Association.
Posner, M. I. (1990). Hierarchical distributed networks in the neuropsychology of selective attention. In A. Caramazza (Ed.), Cognitive neuropsychology and neurolinguistics: Advances in models of cognitive function and impairment (pp. 187-210). Hillsdale, NJ: Erlbaum.
Posner, M. I., Nissen, M. J., & Klein, R. M. (1976). Visual dominance: An information-processing account of its origins and significance. Psychological Review, 83, 157-171.
Rafal, R. (1996). Visual attention: Converging operations from neurology and psychology. In A. F. Kramer, M. G. H. Coles, & G. D. Logan (Eds.), Converging operations in the study of visual selective attention (pp. 139-102). Washington, DC: American Psychological Association.
Rafal, R., Henik, A., & Smith, J. (1991). Extrageniculate contributions to reflex visual orienting in normal humans: A temporal hemifield advantage. Journal of Cognitive Neuroscience, 3, 322-328.
Reuter-Lorenz, P. A., & Rosenquist, J. N. (1996). Auditory cues and inhibition of return: The importance of oculomotor activation. Experimental Brain Research, 112, 119-126.
Rizzolatti, G., Scandolara, C., Matelli, M., & Gentilucci, M. (1981). Afferent properties of periarcuate neurons in macaque monkeys. II. Visual responses. Behavioural Brain Research, 2, 147-163.
Robinson, D. L., & Kertzman, C. (1995). Covert orienting of attention in macaques. III. Contributions of the superior colliculus. Journal of Neurophysiology, 74, 713-721.
Rock, I., & Harris, C. S. (1967, 17 May). Vision and touch. Scientific American, 216, 96-104.
Rorden, C., & Driver, J. (1999). Does auditory attention shift in the direction of an upcoming saccade? Neuropsychologia, 37, 357-377.
Rorden, C., & Driver, J. (2001). Spatial deployment of attention within and across hemifields in an auditory task. Experimental Brain Research, 137, 487-496.
Schmitt, M., Postma, A., & de Haan, E. (2000). Interactions between exogenous auditory and visual spatial attention. Quarterly Journal of Experimental Psychology, 53A, 105-130.
Schmitt, M., Postma, A., & de Haan, E. (in press). Cross-modal exogenous attention and distance effects in vision and hearing. European Journal of Cognitive Psychology.
Selcon, S. J., Taylor, R. M., & McKenna, F. P. (1995). Integrating multiple information sources: Using redundancy in the design of warnings. Ergonomics, 38, 2362-2370.
Sherrington, C. S. (1920). Integrative action of the nervous system. New Haven: Yale University Press.
Shore, D. I., Spence, C., & Klein, R. M. (2001). Visual prior entry. Psychological Science, 12, 205-212.
Simon, J. R., & Craft, J. L. (1970). Effects of an irrelevant auditory stimulus on visual choice reaction time. Journal of Experimental Psychology, 86, 272-274.
Simpson, W. E. (1972). Latency of locating lights and sounds. Journal of Experimental Psychology, 93, 169-175.
Sklar, A. E., & Sarter, N. B. (1999). Good vibrations: Tactile feedback in support of attention allocation and human-automation coordination in event-driven domains. Human Factors, 41, 543-552.
Spence, C. J., & Driver, J. (1994). Covert spatial orienting in audition: Exogenous and endogenous mechanisms facilitate sound localization. Journal of Experimental Psychology: Human Perception and Performance, 20, 555-574.
Spence, C., & Driver, J. (1996). Audiovisual links in endogenous covert spatial attention. Journal of Experimental Psychology: Human Perception and Performance, 22, 1005-1030.
Spence, C., & Driver, J. (1997a). Audiovisual links in exogenous covert spatial orienting. Perception & Psychophysics, 59, 1-22.
Spence, C., & Driver, J. (1997b). Cross-modal links in attention between audition, vision, and touch: Implications for interface design. International Journal of Cognitive Ergonomics, 1, 351-373.
Spence, C., & Driver, J. (1998). Auditory and audiovisual inhibition of return. Perception & Psychophysics, 60, 125-139.
Spence, C., & Driver, J. (1999). A new approach to the design of multimodal warning signals. In D. Harris (Ed.), Engineering Psychology and Cognitive Ergonomics, Vol. 4: Job Design, Product Design and Human-Computer Interaction (pp. 455-461). Hampshire: Ashgate Publishing.
Spence, C., & Driver, J. (2000). Attracting attention to the illusory location of a sound: Reflexive crossmodal orienting and ventriloquism. Neuroreport, 11, 2057-2061.
Spence, C., Kingstone, A., Shore, D. I., & Gazzaniga, M. S. (2001). Representation of visuotactile space in the split brain. Psychological Science, 12, 90-93.
Spence, C., & Lupiáñez, J. (1998). Crossmodal links in attention revealed by the orthogonal temporal order judgment task. Paper presented at II Congreso de la Sociedad Espanyola de Psicologia Experimental (SEPEX 98), Granada, Spain, 17th December.
Spence, C., & McGlone, F. P. (in press). Reflexive orienting of tactile attention. Experimental Brain Research.
Spence, C., Nicholls, M. E. R., & Driver, J. (2001). The cost of expecting events in the wrong sensory modality. Perception & Psychophysics, 63, 330-336.
Spence, C., Nicholls, M. E. R., Gillespie, N., & Driver, J. (1998). Crossmodal links in exogenous covert spatial orienting between touch, audition, and vision. Perception & Psychophysics, 60, 544-557.
Spence, C., Pavani, F., & Driver, J. (2000). Crossmodal links between vision and touch in covert endogenous spatial attention. Journal of Experimental Psychology: Human Perception and Performance, 26, 1298-1319.
Spence, C., Ranson, J., & Driver, J. (2000). Crossmodal selective attention: On the difficulty of ignoring sounds at the locus of visual attention. Perception & Psychophysics, 62, 410-424.
Spence, C., Shore, D. I., & Klein, R. M. (in press). Multimodal prior entry. Journal of Experimental Psychology: General.
Sperling, G., & Dosher, B. A. (1986). Strategy and optimization in human information processing. In K. Boff, L. Kaufman, & J. Thomas (Eds.), Handbook of Perception and Performance, Vol. 1 (pp. 2-1 - 2-65). New York: Wiley.
Stein, B. E., London, N., Wilkinson, L. K., & Price, D. P. (1996). Enhancement of perceived visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497-506.
Stein, B. E., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT Press.
Stein, B., Meredith, M. A., Honeycutt, W. S., & McDade, L. (1989). Behavioral indices of multisensory integration: Orientation to visual cues is affected by auditory stimuli. Journal of Cognitive Neuroscience, 1, 12-24.
Stein, B. E., Wallace, M. T., & Meredith, M. A. (1995). Neural mechanisms mediating attention and orientation to multisensory cues. In M. S. Gazzaniga (Ed.), The cognitive neurosciences (pp. 683-702). Cambridge, MA: MIT Press.
Tassinari, G., & Campara, D. (1996). Consequences of covert orienting to non-informative stimuli of different modalities: A unitary mechanism? Neuropsychologia, 34, 235-245.
Teder-Sälejärvi, W. A., Münte, T. F., Sperlich, F.-J., & Hillyard, S. A. (2000). Intra-modal and cross-modal spatial attention to auditory and visual stimuli. An event-related brain potential study. Cognitive Brain Research, 8, 327-343.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
Vroomen, J., Bertelson, P., & de Gelder, B. (in press). Directing spatial attention towards the illusory source of a ventriloquized sound. Acta Psychologica.
Ward, L. M. (1994). Supramodal and modality-specific mechanisms for stimulus-driven shifts of auditory and visual attention. Canadian Journal of Experimental Psychology, 48, 242-259.
Ward, L. M., McDonald, J. A., & Golestani, N. (1998). Cross-modal control of attention shifts. In R. Wright (Ed.), Visual attention (pp. 232-268). New York: Oxford University Press.
Ward, L. M., McDonald, J. J., & Lin, D. (2000). On asymmetries in crossmodal spatial attention orienting. Perception & Psychophysics, 62, 1258-1264.
Warren, D. H. (1970). Intermodality interactions in spatial localization. Cognitive Psychology, 1, 114-133.
Weeks, R. A., Aziz-Sultan, A., Bushara, K. O., Tian, B., Wessinger, C. M., Dang, N., Rauschecker, J. P., & Hallett, M. (1999). A PET study of human auditory spatial processing. Neuroscience Letters, 262, 155-158.
Welch, R. B., & Warren, D. H. (1986). Intersensory interactions. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and performance, Vol. 1: Sensory processes and perception (pp. 25-1 - 25-36). New York: John Wiley and Sons.
Widmann, A., & Schröger, E. (1999). Do lateralized visual stimuli exogenously orient auditory attention? Poster presented at the Annual Meeting of the Society for Psychophysiological Research, Granada, Spain, October.
Woods, D. L., Alho, K., & Algazi, A. (1992). Intermodal selective attention 1: Effects on event-related potentials to lateralized auditory and visual stimuli. Electroencephalography and Clinical Neurophysiology, 82, 341-355.
Yantis, S. (1996). Attentional capture in vision. In A. F. Kramer, M. G. H. Coles, & G. D. Logan (Eds.), Converging operations in the study of visual selective attention (pp. 45-76). Washington, DC: American Psychological Association.
Yantis, S. (2000). Goal-directed and stimulus-driven determinants of attentional control. In S. Monsell & J. Driver (Eds.), Control of cognitive processes: Attention and performance XVIII (pp. 73-103). Cambridge, MA: MIT Press.
Author Notes
The author wishes to extend his thanks to Ray Klein and John McDonald for extremely helpful comments on an earlier version of this manuscript, to David Shore and Steffan Kennett for helpful discussions on many of the points covered here, and to Chris Rorden, Francesco Pavani, and Steffan Kennett for artistic assistance. Correspondence concerning this article should be addressed to Dr. Charles Spence, Department of Experimental Psychology, University of Oxford, South Parks Road, Oxford, OX1 3UD, UK. Electronic mail may be sent to
[email protected].
Part IV Developmental
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © Elsevier Science B.V. All rights reserved.
11
Testing Models of Attentional Capture During Early Infancy
James L. Dannemiller
The Selectivity of Visual Attention Early in Life
There is a large literature on the development of visual attention during the first year of life (e.g., Atkinson, Hood, Wattam-Bell & Braddick, 1992; Casey & Richards, 1988; Cohen, 1972; Freeseman, Colombo & Coldren, 1993). Ever since Fantz (1958) demonstrated that infants are selective in their looking behavior, developmental psychologists have used this selective visual attention to ask what infants at various ages can discriminate and what natural preferences exist during this early period. Models of these preferences have been proposed based on factors such as contour density (Karmel, 1969) or visibility and contrast sensitivity (Banks & Ginsburg, 1985; Gayl, Roberts & Werner, 1983). More recently, specific phenomena associated with visual attention in adults, such as inhibition of return (Hood, 1993), cued facilitation and covert orienting (Johnson & Tucker, 1996), the gap effect in saccadic responding (Matsuzawa & Shimojo, 1997), and attentional pop-out (Catherwood, Skoien & Holt, 1996; Quinn & Bhatt, 1998), have been studied in this age range as well.
What do we know about attentional capture as it develops during infancy? In answering this question, it is important to understand that capture has an operational definition in visual search work with adults that is not necessarily the same as its definition in the developmental literature. Capture in the adult visual search literature refers to the interference that typically results from the appearance of an unexpected, odd or novel stimulus (Folk & Remington, 1998; Theeuwes, 1991). When attention is captured by such stimuli, reaction times to detect a different target increase modestly but significantly. There is controversy in the adult literature on whether involuntary capture ever really occurs (Folk & Remington, 1998; Yantis & Egeth, 1999).
Notice, however, that the operational definition of capture in the adult literature requires instructions to attend to a primary target and is measured by the extent to which a strong, unexpected stimulus interferes with the detection of the primary target. It should be obvious that such an operational definition of visual capture will not work for infants because it is impossible to "instruct" infants to attend to a primary target.
Despite this obvious difference in how capture is operationalized in infancy and adulthood, there is evidence of attentional capture of a different sort in infancy. During early infancy, a strong stimulus may capture an infant's visual attention so thoroughly that s/he has difficulty disengaging attention to shift it to another location. This type of capture has been called "sticky fixation" because the infant's fixation seems stuck on a particular object or location, with an inability to disengage attention from that location. This dramatic type of capture usually disappears by two months of age (Hood & Atkinson, 1993). Other forms of capture during infancy are less dramatic, but they are reasonably reliable. Overt orienting to a strong, exogenous stimulus, especially one presented in the periphery, is easily observable from birth. I have used this type of exogenous orienting to ask what captures an infant's attention early in life, how multiple objects might compete for attention, and how easily young infants can differentiate multiple stimuli that appear simultaneously within their visual fields (Dannemiller, 1998). Our visual fields are typically populated by numerous surfaces and features, so the studies described below are designed to bring that type of complexity into the laboratory to study the development of visual attention. My long-term goal is to use the results of studies like these not only to understand the development of visual attention but also to inform our understanding of visual attention and attentional capture in the mature state. To accomplish this goal, I think that it is necessary to model exogenously driven visual attention quantitatively. I will describe one such model, and tests of its predictions, in this chapter.
Methodology and Modeling
Methodology
It is necessary to discuss some preliminary issues before proceeding to a model of attentional capture. The paradigm that I have used to study the early development of visual attention relies on a stimulus display that looks much like a visual search display used with adults. Figure 1 shows an example of a display that I have used over the last several years to study the selectivity of visual attention in infants. There are always equal numbers of bars on both sides of the display. All of the bars are static throughout a trial with the exception of one bar (the target) that oscillates in place, usually at 1.2 or 2.4 Hz (periods of 833 and 417 ms, respectively), through one degree of visual angle. This target bar appears randomly across trials on the right or the left side of the display, usually 10 degrees from the center. Across trials, all of the bars on the display may be the same color and have the same contrast, or they may differ in color or contrast. In all of the work to be reported below, a maximum of two different types of bars is presented on the display. For example, half of the bars on the display may be red, and half may be green. Half may have positive contrast polarity (brighter than the background), and half may have negative contrast polarity (darker than the background). The balance across the display between the two colors or contrasts is manipulated. Balance here refers simply to the number of bars of each of the two types on each side of the display. A balanced display has an equal number of each of the two types of bars on both sides of the display. An unbalanced display has more bars of one type on one side of the display and more bars of the other type on the other side of the display. It is this spatial balance variable that is the major independent variable used to assess the selectivity of early orienting. More details on this manipulation follow below. Finally, the number of bars on the display may also be manipulated. All of the bars on the display appear simultaneously from a uniform field. In other words, at the start of a trial the infant is looking at a display with no bars on a spatially uniform background (typically white), and the bars then appear with a sharp temporal onset.
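The spatial balance manipulation is easiest to see as bookkeeping of bar counts per half of the display. The sketch below is a hypothetical helper, not the author's stimulus software. It assumes the values given in the Figure 1 caption: 14 bars per half, an 11:3 split of each class, and 14 bars of each class per trial. It returns the class counts for the homochromatic, ipsilateral, and contralateral conditions. The counts include the moving target, and the sketch does not model the target's own color class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BarCounts:
    """Number of bars of each class on each half of the display (hypothetical)."""
    target_side: str   # "left" or "right": the half holding the moving singleton
    high_left: int     # putatively higher-salience bars (e.g., red) on the left
    low_left: int      # lower-salience bars (e.g., pink) on the left
    high_right: int
    low_right: int


def make_counts(condition: str, target_side: str,
                per_side: int = 14, majority: int = 11) -> BarCounts:
    """Return per-half class counts for one trial.

    Assumed defaults follow the Figure 1 caption (14 bars per half, 11:3 split).
    In the homochromatic condition all bars are one class, stored as "low".
    """
    if target_side not in ("left", "right"):
        raise ValueError("target_side must be 'left' or 'right'")
    minority = per_side - majority
    if condition == "homochromatic":
        return BarCounts(target_side, 0, per_side, 0, per_side)
    if condition == "ipsilateral":
        heavy = target_side            # most high-salience bars share the target's half
    elif condition == "contralateral":
        heavy = "right" if target_side == "left" else "left"
    else:
        raise ValueError(f"unknown condition: {condition!r}")
    if heavy == "left":
        return BarCounts(target_side, majority, minority, minority, majority)
    return BarCounts(target_side, minority, majority, majority, minority)
```

For example, `make_counts("contralateral", "left")` places 3 high- and 11 low-salience bars on the target's (left) half and 11 high- and 3 low-salience bars on the right. Each class still totals 14 bars, so only the spatial balance differs between conditions.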
[Figure 1 image: ipsilateral display (left panel) and contralateral display (right panel)]
Figure 1. Heterochromatic ipsilateral (left panel) and heterochromatic contralateral (right panel) displays. The display was 40 deg horizontally by 31 degrees vertically (not drawn to scale). The bars were 5 deg tall and 0.75 degrees wide. The bars were distributed randomly within 14 imaginary, equal-size columns (seven columns per side) with the constraints that no more than two bars could occupy the same column and all bars had to be completely visible. The moving target (indicated by arrows) appeared randomly across trials 10 degrees to the right or left of center (a static bar always appeared in the same position on the side opposite to the moving singleton). The target oscillated horizontally in place at 1.2 Hz through either 0.75 or 1.0 degrees (peak to mean); all the other bars in the field were static throughout the trial, and all of the bars on the display appeared simultaneously from a uniform background. In these two heterochromatic conditions, the two classes of bars were distributed in the ratio of 11:3 across the two halves of the display, with 14 bars of each class always present on each trial. The terms ipsilateral and contralateral always refer to the location of most of the putatively higher salience bars (red) with respect to the moving singleton. Red bars in this figure are represented by the black bars, and pink bars are represented by the white bars. In the homochromatic condition (not shown), all of the bars on the display were identical.
An online observer starts each trial and watches the infant's overt orienting behavior to make a quick (usually < 2 sec), forced-choice judgment about the side of the display with the oscillating target. This observer is "blind" to the location of the target on each trial. Given a sufficiently large number of trials, the only way to produce data that exceed 50% correct is for the infant to orient preferentially to the side of the display with the moving target, and for the adult observer to be sensitive to these overt orienting cues. The observer is given feedback on each trial in an attempt to maximize the percentage of correct judgments within the constraint of responding quickly.
With several exceptions, this paradigm is essentially equivalent to Teller's (1979) Forced Choice Preferential Looking (FPL) procedure, which has been used successfully to study early visual development. One difference is that in the present research the FPL observer is instructed to respond as quickly as possible while keeping the percentage of correct judgments as high as possible. This contrasts with the standard use of FPL, in which the observer can wait indefinitely to accumulate information from the infant before making a judgment. The other major difference is that I am using this technique to study discrimination rather than detection. FPL usually generates data on the detection of threshold stimuli: one side of the display is typically blank and the other side has a near-threshold stimulus. This is very different from the display shown in Figure 1, in which both sides of the display contain visible elements. In this sense, an infant faced with a display such as that shown in Figure 1 is doing something more like a discrimination task than a detection task. The discrimination may involve motion, because only one side of the display contains a moving target, or it may involve differences on other stimulus dimensions such as color or luminance contrast.
Is This Visual Search?
Those familiar with visual search studies will notice the resemblance of the display in Figure 1 to visual search displays used with adults. The resemblance stops there. There are several critical differences between visual search tasks with adults and the attentional phenomena that I have been studying with these displays.
1. Adults can be instructed to search for a specific target; young infants cannot be so instructed, at least not directly.
2. As a result of point 1, the data from such displays with adults reflect their interpretation of the task and any attentional set or strategy induced by the instructions. In contrast, the data from such displays when presented to infants reflect their natural orienting tendencies.
3. The data from visual search studies with adults reflect the sensitivities of the adults directly. The data from infants are filtered through the sensitivities of the FPL observer. I have used the same FPL observer for the last six years in my lab, and this observer has tested more than 1000 infants, so I consider the contribution of this observer to the data collected with this paradigm to be stable and, in a sense, transparent. A different observer might yield higher average percentages of correct judgments, but the important variables in these studies are manipulated within subjects, so such inter-observer differences are largely irrelevant.
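The requirement that accuracy exceed 50% "given a sufficiently large number of trials" can be made concrete with an exact binomial test. The null hypothesis is that the blind observer's left/right choices carry no information about the target's side, so each choice is correct with probability 0.5. The sketch below is an illustration, not the chapter's analysis. The function name is hypothetical, and it assumes trials are independent, which is a simplification when many trials come from one infant.

```python
from math import comb


def p_above_chance(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value, P(X >= correct).

    Null hypothesis: each forced-choice judgment is independently correct with
    probability `chance` (0.5 for a left/right choice), i.e. the observer's
    choices carry no information about which side holds the moving target.
    """
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(correct, trials + 1))
```

For instance, 30 correct out of 40 trials gives a p-value of roughly 0.001. A score of 22 out of 40 (55%) is entirely compatible with guessing, which is why a reasonably large number of trials is needed before above-chance performance can be claimed.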
A model of attentional capture during early infancy
Definitions. Given these methodological preliminaries, it is now time to consider a model of what attentional capture might look like during this early postnatal period. What would it mean for an infant's attention to be captured with the display shown in Figure 1? I propose that capture would be indicated by two pieces of evidence: a) the percentage of correct judgments about the location of the moving singleton target would be above chance (50%), and b) the percentage of correct judgments would be influenced reliably by the balance of the different static elements across the two sides of the display. This latter criterion essentially pits motion against the spatial imbalance manipulation, to determine whether responding is systematically related to the imbalance on trials on which attention does not appear to be captured by the motion singleton. Consider these two criteria. First, capture by a motion singleton parallels similar effects in the literature on adult visual search (Nothdurft, 2000). Second, and perhaps more importantly, on trials when attention apparently is not captured by this singleton, it is nonetheless not randomly directed. Instead, it is influenced systematically by other elements in the visual field. This second criterion appears to be closer than the first to the meaning of capture in the literature with adults: these other elements interfere with orienting to the target. In adults, interference from salient distractors can also occur, although its likelihood may depend on whether the interfering element shares a feature with the target of the visual search (Yantis & Egeth, 1999). I have modeled this process quantitatively using signal detection theory. The oscillating bar is the signal, and there are multiple noise sources (all of the other static bars on the display).
As such, it is similar to signal detection models that have been proposed to explain certain types of visual search data collected from adults (Palmer, Ames & Lindsey, 1993; Palmer, Verghese & Pavel, 2000). It should be noted that although the bars on the display may differ on multiple dimensions (e.g., color, contrast, motion), the decision variable is unidimensional, as described below.
Model Assumptions. It is useful to make the assumptions of this model explicit.
1. On each trial, each of the elements on the display produces a signal to orient to its location.
2. These signals are perturbed by internal noise. This noise is independent of all display characteristics.
3. The variance of this internal noise is equal across all of the elements on the display, independent of the mean level of response, random from trial to trial, and independent across different elements on the display.
4. When two different classes of bars are present on this display (e.g., red bars and pink bars), these classes may differ in their mean internal responses. It would typically be assumed that bars with the more saturated color (red) or with more luminance contrast would lead to higher mean internal responses than bars with less saturation or luminance contrast.
5. The internal noise that perturbs these orienting signals is the only significant source of noise in the system.
6. The decision rule that characterizes the infant's overt attentional orienting is to orient to the side of the display with the element that produces the maximum internal response.
Notice that an overt, directional response is determined by the element that produces the largest internal response (maximum response decision rule). This is not the only decision rule that could be used. Perhaps the response is actually determined by the side of the display with the greater aggregate response. Indeed, other decision rules are possible, and I will return to this important issue below. Additionally, one might reasonably question the assumption that there are no other significant sources of noise in the system. The FPL observer may contribute noise, perhaps in the decision stage, but in this model this noise is considered negligible relative to the noise internal to the infant's visual system. One way of thinking about this is as follows. Suppose the infant exhibited hypothetically identical orienting behaviors (e.g., direction of first look, directional head movement) on multiple trials. The FPL observer would then have a high probability of making the same forced choice on all of these trials. In contrast, given identical stimulus configurations on multiple trials, the probability would be lower that the infant would orient in exactly the same way on those trials. It is also worth pointing out that there is no way to verify directly the assumption that orienting is always driven by the element with the largest internal response. Just as it is impossible to observe the internal noise that perturbs responses in a standard signal detection paradigm, so here it must be inferred from a pattern of data consistent with the predictions of the model.
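Assumptions 1 to 6 can be turned into a small Monte Carlo simulation. The sketch below follows the text in using equal-variance Gaussian noise. All numerical values (target, red, and pink means; bar counts; trial count) are made up for illustration and are not estimates from the chapter. It implements the maximum response rule and, for comparison, the hypothetical aggregate ("sum") rule mentioned above. With these invented values both rules predict higher accuracy in the ipsilateral than in the contralateral condition. The direction of that difference alone therefore cannot tell the rules apart, which is the issue of alternative models raised in the text.

```python
import random


def simulate_pc(mu_target: float, mu_high: float, mu_low: float,
                target_side: tuple[int, int], other_side: tuple[int, int],
                rule: str = "max", trials: int = 10_000,
                sigma: float = 1.0, seed: int = 0) -> float:
    """Monte Carlo proportion of trials on which orienting goes to the target's half.

    target_side / other_side: (n_high, n_low) counts of STATIC bars on each half;
    the moving target is added to target_side separately.
    Each element's internal response = its mean + independent Gaussian noise
    with a common standard deviation `sigma` (assumptions 2-3).
    rule="max": orient toward the half holding the single largest response (assumption 6).
    rule="sum": hypothetical alternative, orient toward the larger aggregate response.
    """
    if rule not in ("max", "sum"):
        raise ValueError(f"unknown rule: {rule!r}")
    rng = random.Random(seed)
    pool = max if rule == "max" else sum

    def half(counts: tuple[int, int]) -> list[float]:
        n_high, n_low = counts
        return ([rng.gauss(mu_high, sigma) for _ in range(n_high)] +
                [rng.gauss(mu_low, sigma) for _ in range(n_low)])

    hits = 0
    for _ in range(trials):
        near = half(target_side) + [rng.gauss(mu_target, sigma)]
        far = half(other_side)
        hits += pool(near) > pool(far)   # ties have probability zero
    return hits / trials


# Usage with made-up means (target > red > pink = 0). Counts keep 14 elements
# per half by treating the moving target as one of the 14 pink bars.
ipsi = simulate_pc(2.0, 0.5, 0.0, target_side=(11, 2), other_side=(3, 11))
contra = simulate_pc(2.0, 0.5, 0.0, target_side=(3, 10), other_side=(11, 3))
print(f"ipsilateral {ipsi:.3f}  contralateral {contra:.3f}")
```

The predicted accuracy in the homochromatic condition, relative to these two, depends on the invented parameter values, so the sketch makes no claim about it.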
A potential problem arises when alternative models predict the same pattern of data, but this is a problem in any empirical study, so I will address it below when I consider alternative models. Why would data that obeyed this model constitute evidence of attentional capture? First, everyday, common definitions of capture are consistent with the idea that a strong or odd stimulus typically captures our attention. The maximum response decision rule simply instantiates this everyday definition of capture. Second, overt attention in this case is nearly synonymous with eye and head movements.¹ In other words, looking at one side of the display is prima facie evidence that attention has been drawn to that side of the display. This model is the simplest testable model of capture that I could generate given the limitations of the paradigm and common definitions of capture. It is not necessarily a model of attentional capture in adults. As is shown below, it is also not the only model that might reasonably predict the data from infants in this paradigm. One advantage of studying these processes in infants is that it permits us to examine the origins of attentional capture uncontaminated by issues of task relevance versus irrelevance (Yantis & Egeth, 1999). Highly salient featural singletons (e.g., a red target among all green distractors) may only capture attention to the extent that they provide some reliable information relevant to the location of the actual target. Because it is impossible to instruct infants to search for the moving target in the displays that we use, task relevance is a moot issue. Instead, this paradigm may allow us to examine the influence of salience when the salient elements are neither task-relevant nor task-irrelevant. It is interesting to speculate on when task relevance may first come to play a role in visual attention, although that question is beyond the scope of this chapter.
Is the FPL paradigm the best way to test this model? One could argue that the response measure, a speeded forced-choice left versus right judgment, is not well matched to the predictions of the model. Wouldn't it be better to measure eye movements directly, because the model predicts orienting to the specific location of the element that produced the largest internal response? It should be noted again that the maximum internal response is not an observable quantity, so the model does not really predict the specific location of the maximum response on each trial. The model contains the assumption that the processes that govern early orienting use the maximum response as the basis for a directional decision. From this assumption is derived the pattern of results that should be observed when the spatial balance of the two types of static bars on the display is manipulated. The model explains differences in the percentage of correct judgments made by the FPL observer as arising from differences in the mean internal responses to the two classes of stimuli on this display. These internal responses cannot be observed any more than the noise that is a standard part of signal detection theory can be observed.
As is shown below, it is possible with such FPL data to refute the model proposed above, so for now the direct measurement of saccadic eye movements and fixational dwell times would provide useful but not absolutely necessary converging evidence. Model Predictions. What would the pattern of data look like if this were a valid model of attentional capture in early infancy? Consider the following three conditions all involving 28 bars: a) homochromatic trials in which all of the bars on the display are the same color, b) heterochromatic trials in which more of the putatively higher salience bars appear on the side ipsilateral to the moving target (ipsilateral condition), and c) heterochromatic trials in which more of the putatively higher salience bars appear on the side of the display contralateral to the moving target (contralateral condition). Salience in this case might simply and intuitively be operationalized by color saturation: red bars appear more saturated than pink bars when both are embedded in a white background. In the model presented above, the mean intemal signal to orient produced by a red bar is represented as being greater than the mean intemal signal to orient produced by a pink bar. Both of these mean internal signals, in turn, are less than the mean intemal signal to orient produced by the singleton moving target. To derive quantitative predictions from such a model, it is necessary to specify the noise distributions that perturb these intemal signals to orient as well as the mean internal responses to these various elements. In most signal detection
Dannemiller
models, equal-variance Gaussian distributions are used to model the noise (e.g., Green & Swets, 1966). I have used such distributions, but it is easier to model the noise using double exponential distribution functions (Yellott, 1977). Given the precision of the data generated by the paradigm described above, these double exponential distribution functions are probably indistinguishable from Gaussian distribution functions. Their advantage is that they make the mathematical predictions from the single-target/multiple-noise maximum-response model more tractable. Yellott (1977) has shown that choice probabilities, such as those for the left versus right choices involved in the FPL task described above, have particularly simple expressions when these double exponential distribution functions are used. The interested reader is referred to Yellott (1977, p. 123 and pp. 137-141) for details of this derivation. For example, Equation 1a suffices to predict the percentage of correct judgments in homochromatic conditions:
$$P_c = \frac{e^{\mu_m} + (n-1)\,e^{\mu_s}}{e^{\mu_m} + (2n-1)\,e^{\mu_s}} \tag{1a}$$
Here, $\mu_m$ represents the mean internal response to the moving target, and $\mu_s$ represents the mean internal response to the static distractors. There are n - 1 static distractors on the same side as the target, and 2n - 1 static distractors on the whole display. Equation 1a simply gives the probability that the maximum internal response will arise either from the moving target or from one of the static distractors on the same side as the moving target. Equation 1a can be simplified to Equation 1b by assuming, without loss of generality, that the internal response to the static bars averages zero, thus yielding $\exp(\mu_s) = 1.0$.
$$P_c = \frac{e^{\mu_m} + (n-1)}{e^{\mu_m} + (2n-1)} \tag{1b}$$
In order to generate a point prediction from Equation 1b, it is necessary to assume a value for the mean internal response to the moving target, $\mu_m$. Alternatively, one could estimate this parameter from the observed data. If one were further willing to assume that this mean internal response to the moving target remained invariant over changes in the number of static bars on the display, then Equation 1 would become most useful for predicting how the percentage of correct judgments should vary as the number of static noise bars on the display is manipulated.
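As a check on Equations 1a and 1b, the maximum-response rule can be written out both in closed form and as a direct simulation. The Python sketch below is illustrative only: the function names are hypothetical, the noise is assumed to be standard double exponential (Gumbel) noise with unit scale, and $\mu_m = 2.0$ is simply the example value used later in the chapter. Under that assumption the simulated proportion correct should agree with Equation 1a up to sampling error, which is the simplification Yellott (1977) exploits.

```python
import math
import random


def pc_homochromatic(mu_m, n, mu_s=0.0):
    """Equation 1a: probability that the maximum internal response falls on the
    target side, with n bars per side (one of them the moving target).
    With mu_s = 0 this reduces to Equation 1b."""
    e_m, e_s = math.exp(mu_m), math.exp(mu_s)
    return (e_m + (n - 1) * e_s) / (e_m + (2 * n - 1) * e_s)


def gumbel(rng):
    """One draw of standard double exponential (Gumbel) noise via the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # random() can return exactly 0.0, and log(0) is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_pc(mu_m, n, mu_s=0.0, trials=50_000, seed=1):
    """Monte Carlo version of the maximum-response rule: every bar receives
    independent Gumbel noise, and a trial is correct when the single largest
    response lies on the target side."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        target_side = max([mu_m + gumbel(rng)] +
                          [mu_s + gumbel(rng) for _ in range(n - 1)])
        other_side = max(mu_s + gumbel(rng) for _ in range(n))
        correct += target_side > other_side
    return correct / trials


# Four versus 14 bars per side, as in the number manipulation discussed later.
for n in (4, 14):
    print(n, round(pc_homochromatic(2.0, n), 3), round(simulate_pc(2.0, n), 3))
```

The closed-form values are 0.722 and 0.593, the 72% and 59% quoted later in the chapter; the simulated values should land within about a percentage point of them.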
Capture in Infancy
Equations 1a and 1b suffice for the first of the three conditions described above: the homochromatic condition. Now consider the equations for the two conditions that represent the spatial imbalance manipulation. Let the mean internal response to the putatively lower salience class of bars (pink) be $\mu_l$, with the mean internal response to the static red bars again assumed to be 0. Each side of the display has 14 bars. Equation 2a represents the ipsilateral condition, with 11 of the 14 bars on the target side being red and 3 of the 14 bars on the side opposite to the target being red. The balance is reversed on the contralateral trials represented by Equation 2b. The moving target is assumed here to be red, although the prediction can easily be altered if it is assumed to be pink. The target side therefore contains the moving target, 10 static red bars, and 3 pink bars on ipsilateral trials, and the moving target, 2 static red bars, and 11 pink bars on contralateral trials; in both cases the display as a whole contains 13 static red bars and 14 pink bars. These two heterochromatic conditions, with half red and half pink bars, lead from the same model to Equations 2a and 2b:

$$P_{c,\mathrm{ipsi}} = \frac{e^{\mu_m} + 10 + 3\,e^{\mu_l}}{e^{\mu_m} + 13 + 14\,e^{\mu_l}} \tag{2a}$$

$$P_{c,\mathrm{contra}} = \frac{e^{\mu_m} + 2 + 11\,e^{\mu_l}}{e^{\mu_m} + 13 + 14\,e^{\mu_l}} \tag{2b}$$
It is worth stating that these equations capture the model in the sense that they give the probability that the maximum internal response on a given trial will arise from one of the 13 static bars or from the one moving bar on the target side of the display. In other words, it is possible to be correct in this two-alternative forced-choice paradigm either because the maximum internal response came from the moving singleton target or because it came from one of the other 13 (or, more generically, n - 1) static bars on that side. This probability depends on the balance of the two classes of colored bars on the heterochromatic trials, so Equations 2a and 2b look a little more complicated, but they adhere to the same rule as Equations 1a and 1b. Using Gaussian noise distributions makes these equations more complicated because they involve integrals that cannot easily be simplified, but the results are basically the same. Foley and Schwarz (1998) have used a similar model to explain the detection of single targets at threshold in the presence of multiple spatially displaced distractors by adults. Some intuitive sense for these equations can be gained by considering the following simpler situation. Suppose that the decision rule is to look to the side with the element that produced the largest internal response. Suppose further that all of the elements on the display are identical, that none of them is moving, and that one side has n elements and the other side of the display has m elements. What is the
probability that the maximum response will come from the side with n elements? It is not even necessary to specify the form of the density function that describes the internal noise in this case. All the elements have an equal probability of contributing the maximum response, so the probability that this maximum will occur on the side with n elements is simply n/(n + m). Equations 1a, 1b, 2a, and 2b are generalizations of this intuition to the case where the elements are not all identical, but where their mean internal responses can be modeled as shifts along the same internal scale. I will concentrate first on the pattern of data predicted by this model in the two heterochromatic conditions described by Equations 2a and 2b. To give a feel for the predictions from Equations 2a and 2b, consider a case in which the mean internal response to the moving target is 2.0 and the mean internal response to the lower-salience pink bars is assumed to be -0.3. Both of these values are relative to the mean internal response of 0 to the higher-salience red bars. The values of 2.0 and 0.3 are similar to d' values in standard signal detection theory. As in standard signal detection theory, what matters is just the separation between the centers of the relevant distributions relative to their spreads, and the resulting distributions of the maxima of sets of random draws from these distributions. With these parameters, and with 14 bars on each side of the display distributed in the ratio of 11:3 or vice versa, the model predicts 64% correct in the ipsilateral condition and 57% correct in the contralateral condition. The absolute values of these numbers are not as important as the fact that the maximum-response model predicts a difference between the conditions when the static bars compete to capture initial orienting.
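The two heterochromatic predictions can be reproduced directly from Equations 2a and 2b. In the hypothetical Python helper below, the static-bar counts on each side are passed explicitly so that either balance can be described; red static bars have mean 0, pink bars have mean $\mu_l$, and the moving target is taken to be red, as in the text. Subtracting the two numerators gives $8(1 - e^{\mu_l})$, so an ipsilateral advantage appears only when $\mu_l < 0$.

```python
import math


def pc_heterochromatic(mu_m, mu_l, red_target_side, pink_target_side,
                       red_other_side, pink_other_side):
    """Equations 2a/2b: probability that the maximum internal response lies on
    the target side. Counts are static bars only; the (red) moving target with
    mean mu_m is added to the target side. Red static bars have mean 0."""
    e_m, e_l = math.exp(mu_m), math.exp(mu_l)
    target = e_m + red_target_side + pink_target_side * e_l
    other = red_other_side + pink_other_side * e_l
    return target / (target + other)


mu_m, mu_l = 2.0, -0.3
ipsi = pc_heterochromatic(mu_m, mu_l, 10, 3, 3, 11)    # Equation 2a
contra = pc_heterochromatic(mu_m, mu_l, 2, 11, 11, 3)  # Equation 2b
print(f"ipsilateral {ipsi:.0%}, contralateral {contra:.0%}")  # ipsilateral 64%, contralateral 57%

# A much stronger motion signal swamps the static bars and shrinks the gap.
for strength in (0.5, 2.0, 6.0):
    gap = (pc_heterochromatic(strength, mu_l, 10, 3, 3, 11)
           - pc_heterochromatic(strength, mu_l, 2, 11, 11, 3))
    print(strength, round(gap, 3))
```

The second loop illustrates the point made next in the text: as $\mu_m$ grows, the moving singleton dominates both numerators and denominators, and the ipsilateral versus contralateral difference shrinks toward zero.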
Essentially, the difference in the percentage of correct judgments between these two conditions depends on the likelihood that the maximum response on a given trial will come from one of the static bars on the display. The probability that it comes from a static bar on the target side is higher when more of the higher-salience bars are placed ipsilaterally. Examination of Equations 2a and 2b makes it clear that altering the spatial balance of the two classes of bars should affect the percentage of correct judgments, with the percentage being higher on ipsilateral trials than on contralateral trials, because $\mu_l$ is constrained to be negative. The denominators of these two equations are the same; only the numerators differ (their difference is $8(1 - e^{\mu_l})$, which is positive whenever $\mu_l < 0$). The contribution of the motion singleton is the same in the numerators of both equations. For a fixed value of the moving target strength, the predicted difference in the percentage of correct judgments between these two conditions depends only on the relative salience of the red and pink bars. The effects are straightforward given Equations 2a and 2b. Larger differences in salience between the two classes of bars will produce larger swings in the percentage of correct judgments as the balance between the two classes across the two sides of the display is manipulated. This ipsilateral versus contralateral difference will also depend on the strength of the signal from the moving stimulus if that is also allowed to vary. An extremely strong signal from this stimulus will always capture attention, making the spatial balance between the two classes of static bars largely irrelevant; both the numerators and the denominators of Equations 2a and 2b will be dominated by the response to the moving singleton. In
contrast, a very weak signal from this moving target will make it more likely that one of the static bars will capture attention, whether ipsilateral or contralateral to the target. Summary of model predictions. In the sections below I will concentrate on two of the predictions from this model. These predictions were made by assuming that the 2n bars on the display were divided equally between the two sides of the display, and that one of these 2n bars was the moving target.
1. For displays with only one type of element (e.g., all red bars), as n increases, the probability of a correct judgment decreases.
a. This prediction can be understood as follows. The distribution of the maximum of the n responses on the target side is dominated by the distribution of responses to the target because of its greater mean internal response. As more static bars are added on the target side, there is little effect on the distribution of the maximum from this side. In contrast, as more static bars are added to the side without the moving target, the distribution of the maximum shifts to higher levels on the internal scale. The probability of a correct response depends on the separation between these two distributions, and this separation decreases (the overlap increases) as the number of static bars on the display is increased.
2. When two types of static bars are displayed with the single moving target, the probability of a correct judgment will depend on the balance in the spatial distribution of the two types of bars. The percentage of correct judgments (indexed by the location of the moving target) should be higher when more of the higher-salience bars (e.g., those of higher contrast or saturation) are placed on the same side as the moving target. Conversely, the percentage of correct judgments should be lower when more of these higher-salience bars are placed on the side opposite to the moving target.
a. This prediction follows from the property of the model that if the two types of static bars differ in their salience (i.e., in their mean internal responses), then on trials when the internal response is not completely dominated by the moving target, the probability that one of the static bars on the target side will produce the maximum internal response depends on how many of the higher-salience bars are on the target side.
I will only discuss these two predictions in the sections below. The model makes other interesting predictions that I am in the process of testing. For example, one interesting property of this model is that it predicts invariance in the percentage of correct judgments under uniform expansion of the set of elements on the display (see Yellott, 1977, p. 137 for a discussion of invariance under uniform expansion). In other words, if the number of elements of each type, including the moving target, is simply increased by a factor of k, then the model predicts that this manipulation should have no effect. The factor of k would
multiply both the numerator and the denominator for all the terms in Equations 1 and 2, so it would effectively cancel leaving the same predicted percentage of correct judgments. Here is a simple intuition for understanding this property of the model. If the probability that the maximum internal response will come from the left side of the display in a display with m identical elements on the left and n identical elements on the right is p, then the probability is also p that the maximum will come from the left side if the display is uniformly expanded to include km elements on the left and kn elements on the right.
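The invariance under uniform expansion can be confirmed numerically. In this sketch (the helper names are hypothetical), each side is described as a list of (count, mean internal response) pairs and the choice rule implied by double exponential noise is applied, that is, the summed $e^{\mu}$ on the target side divided by the summed $e^{\mu}$ on the whole display. Multiplying every count by k, including the count of moving targets, leaves the prediction unchanged because k factors out of every term.

```python
import math


def pc_from_counts(target_side, other_side):
    """Choice rule behind Equations 1 and 2: summed exp(mean response) on the
    target side over the summed exp(mean response) on the whole display.
    Each side is a list of (count, mean_internal_response) pairs."""
    def strength(side):
        return sum(count * math.exp(mu) for count, mu in side)
    s_target = strength(target_side)
    return s_target / (s_target + strength(other_side))


def expand(side, k):
    """Uniform expansion: multiply the count of every element type by k."""
    return [(k * count, mu) for count, mu in side]


# Ipsilateral display of Equation 2a: moving target (2.0), 10 red (0), 3 pink (-0.3).
target = [(1, 2.0), (10, 0.0), (3, -0.3)]
other = [(3, 0.0), (11, -0.3)]
for k in (1, 2, 5):
    print(k, round(pc_from_counts(expand(target, k), expand(other, k)), 4))
```

All three lines print the same value (about 0.638, the 64% of Equation 2a).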
Some Sample Data on Selectivity and Capture
Spatial imbalance effects
Figure 2 shows data from thirty-two 3.5-month-olds tested with these red and pink bars (each having 66% luminance contrast with the white surround). Sixteen of these infants were tested with the moving target oscillating at 1.2 Hz through 0.75 degrees (peak to mean; left panel), while the other sixteen were tested with the same target oscillating through 1.0 degrees (right panel). The data in each panel have been averaged over the color of the moving target. Notice several things about these data. First, performance is above chance (50%) in all conditions. Second, not unexpectedly, performance is slightly better with the larger amplitude (right panel), at least in the
[Figure 2: two panels, titled "0.75 degrees" (left) and "1.0 degrees" (right); x-axis: Balance Condition (ipsi, contra); y-axis ticks from 40 to 100.]
Figure 2. Percentages of correct judgments with the target oscillating through 0.75 degrees (peak to mean) (left panel) or 1.0 degrees (right panel). Data in each panel have been collapsed across the color of the moving target. The dashed horizontal line indicates chance responding. The ipsilateral advantage in each case is predicted by the maximum-response model.
ipsilateral conditions. The important point is that performance is systematically related to the balance across the display between the red (more saturated) and pink (less saturated) bars, just as predicted by the model if a) saturation is assumed to lead to salience differences and b) the mean internal response to the oscillating target is assumed to be monotonically related to the amplitude of oscillation. Putting 11 of the 14 red bars on the side opposite to the target interferes with orienting to the target, presumably by capturing attention on a nontrivial proportion of trials. This same pattern of ipsilateral performance exceeding contralateral performance has been observed repeatedly across various salience manipulations. For example, red bars are more effective in capturing attention than green bars when both are embedded in a white background (Dannemiller, 1998), but this difference disappears when they are embedded in a yellow background that more nearly equalizes the two color contrasts (Ross & Dannemiller, 1999). When pure luminance contrast is used, higher luminance contrasts capture attention more effectively than lower luminance contrasts, although the effect does not appear to be as strong as it is with color contrasts (Ross & Dannemiller, 1999). When pink and green are paired, the ipsilateral versus contralateral effect is eliminated, as would be predicted from the red/green and red/pink results (Dannemiller, in press). Luminance decrements are more effective in capturing attention than equal-magnitude luminance increments, just as would be predicted if perceived contrast at this young age followed Michelson contrast as it does in adults (Dannemiller & Stephens, under review).
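The Michelson-contrast argument can be made concrete with a worked example. For a step of size $\Delta L$ on a background of luminance $L$, Michelson contrast, $(L_{max} - L_{min})/(L_{max} + L_{min})$, equals $\Delta L/(2L + \Delta L)$ for an increment but $\Delta L/(2L - \Delta L)$ for a decrement, so an equal-magnitude decrement always yields the higher contrast. The luminance values in this Python sketch are hypothetical and chosen only for round numbers; they are not the stimulus values used in the studies cited.

```python
def michelson(l_a, l_b):
    """Michelson contrast between two luminances (order does not matter)."""
    return abs(l_a - l_b) / (l_a + l_b)


background = 100.0  # hypothetical background luminance, arbitrary units
step = 50.0         # hypothetical equal-magnitude change for both bars

increment = michelson(background + step, background)  # 50 / 250 = 0.2
decrement = michelson(background - step, background)  # 50 / 150 = 0.333...
print(f"increment {increment:.3f}, decrement {decrement:.3f}")
```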
In all of these cases, an imbalance across the display between two classes of elements results in lower percentages of correct judgments when most of the putatively higher salience elements appear contralateral to the moving target, just as predicted by the maximum-response model.
The maximum response model fails
Does this model of attentional capture fare as well in its other predictions? No. Equation 1 shows how the percentage of correct judgments should depend on signal strength, $\mu_m$, and on the number of static bars on the display. Again, for very strong signals, performance should be independent of the number of bars on the screen because the numerator and denominator are dominated by the response to the signal, so the probability of discriminating the two sides of the display approaches 1.0, as it does in adults when there is a highly suprathreshold featural singleton and search is described as being parallel (McLeod, Driver & Crisp, 1988; although see also Theeuwes, Kramer & Atchley, 1999). For less powerful motion signals, the number of bars on the display should have a significant impact on orienting. For example, compare the predictions of the maximum-response model in Equation 1 when there are four bars on each side versus 14 bars on each side. With the mean internal response to the moving target singleton set at 2.0, the percentage of correct judgments with four bars on each side of the display should be 72%. With 14 bars
on each side, the predicted percentage decreases to 59%, a difference of 13%. The actual difference will depend on the strength of the motion signal. To test this model, I collected data from 97 infants ranging in age from seven to 21 weeks. The amplitude of movement was fixed at 0.75 degrees (peak to mean) and the temporal frequency at 1.2 Hz for all infants. We know from previous work (Roessler & Dannemiller, 1997) that sensitivity to movement improves substantially over this age range, so there should be considerable variance in the overall percentages of correct judgments. This should make it easier to test the predictions of the model from Equation 1. Each infant was tested both with eight and with 28 bars on the display. Twenty-four trials of each were presented, and all of the bars on the display were red with the same luminance contrast (66%) against the white background. Figure 3 shows the percentages of correct judgments for all 97 subjects in each of the two conditions. The two regression lines show the best fits to the data from the two conditions. There was a steady increase in the overall percentage of correct judgments with age, as expected. What was not expected, however, was the absence of the effect of the number of bars predicted by the maximum-response model. This model basically predicts that near 50% correct, the difference between the two conditions should be close to zero. As the percentage of correct judgments increases toward approximately 75%, the model predicts approximately a 15% difference in favor of the condition with eight bars. As the percentage of correct
[Figure 3: scatter plot of proportion correct (y-axis ticks from 0.3 to 1.0) against Age (days) (x-axis ticks from 40 to 150), with one regression line per condition; legend: Number of Bars, 8 and 28.]
Figure 3. Proportions of correct judgments with eight bars (4 per side; solid symbols) and 28 bars (14 per side; open symbols) for infants from approximately seven to 21 weeks of age. Number was manipulated within subjects. All of the bars on this display were identical in color and size. The two lines are from the regressions of the proportion of correct judgments against age. The predictions from the maximum-response model do not fit these data well: the proportions of correct judgments should have been systematically higher with eight bars than with 28 bars, especially at the older ages.
judgments then exceeds 75% and approaches 100%, the difference should once again approach zero. It is clear that the data do not conform to this prediction. Notice that in Equation 1 there is only one free parameter relating the internal response to the motion singleton to the percentage of correct judgments for a given number of static bars. To test the predictions of the model more precisely, the percentage of correct judgments for each infant in the condition with eight bars was used to estimate the mean internal response to the motion target for that infant. This estimated parameter, $\mu_m$, was then used to predict the percentage of correct judgments for each infant with 28 bars.² Given the distribution of estimates of this parameter from the condition with eight bars, the data from the condition with 28 bars should have averaged approximately 10% lower in this sample across age. Instead, the average percentages of correct judgments in these two conditions were 63.4% (SEM = 1.12%) and 61.8% (SEM = 1.22%). The observed difference of 1.6% falls far short of the theoretical prediction of 10%.
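The estimate-then-predict step amounts to inverting Equation 1b. Solving for $e^{\mu_m}$ gives $e^{\mu_m} = [P_c(2n-1) - (n-1)]/(1 - P_c)$, which is defined only when $P_c$ lies between $(n-1)/(2n-1)$ and 1. The Python sketch below assumes, as the text does, that $\mu_m$ is the same with eight and 28 bars; the function names and example proportions are hypothetical. Because the chapter's 10% figure is based on per-infant estimates, applying the function to a group mean need not reproduce it exactly.

```python
import math


def pc_homochromatic(mu_m, n):
    """Equation 1b with n bars per side (one of them moving)."""
    e_m = math.exp(mu_m)
    return (e_m + n - 1) / (e_m + 2 * n - 1)


def estimate_mu_m(pc, n):
    """Solve Equation 1b for mu_m given an observed proportion correct pc:
    exp(mu_m) = (pc * (2n - 1) - (n - 1)) / (1 - pc)."""
    lower = (n - 1) / (2 * n - 1)  # limit of Equation 1b as mu_m -> -infinity
    if not lower < pc < 1.0:
        raise ValueError("pc lies outside the range Equation 1b can produce")
    return math.log((pc * (2 * n - 1) - (n - 1)) / (1.0 - pc))


def predict_28_from_8(pc_eight_bars):
    """Estimate mu_m from the 8-bar condition (4 per side), then predict the
    28-bar condition (14 per side), assuming mu_m is unchanged."""
    mu_m = estimate_mu_m(pc_eight_bars, n=4)
    return pc_homochromatic(mu_m, n=14)


# Hypothetical per-infant proportions correct with eight bars.
for pc8 in (0.55, 0.65, 0.75):
    print(pc8, round(predict_28_from_8(pc8), 3))
```

An infant scoring perfectly, or at or below about 43% with eight bars, cannot be fitted by this inversion, so such cases would need separate handling.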
Considering Other Models
Explaining the failure of the maximum-response model
There are several possible explanations for why the model of attentional capture described by Equations 1, 2a, and 2b might have failed to capture the full set of results from the experimental conditions described above. In particular, the model appears to predict the differences in ipsilateral versus contralateral performance, but it fails when the number of static bars is manipulated. Here are several potential explanations for the failure of this model of attentional capture:
1. Increasing the number of bars from eight to 28 also increased the density of bars within the display because the size of the display was fixed. The proximity of static bars near the oscillating target was greater on average with 28 bars than with eight bars. Perhaps this increased sensitivity to the moving target, so that the assumption of equal mean internal responses to the moving target in both conditions was wrong.
2. The method is just not sensitive enough to measure the predicted differences when the number of bars is manipulated.
3. While the actual numbers of bars on the display were eight and 28, the effective number of bars may have been far smaller, leading to less substantial effects of this variable.
4. An alternative model of attentional capture explains both sets of results better.
Consider these possibilities in turn. Increased density: Detection of the moving singleton might have been easier in the condition with 28 bars because static reference bars were on average closer to it than in the condition with eight bars. In adults, the presence of nearby static reference lines enhances the detection of oscillatory motion,
especially at temporal frequencies below approximately 5 Hz (Tyler & Torres, 1972). There is little evidence on this issue in the infant literature, although Dannemiller and Freedland (1989) found no evidence that attention to movement was enhanced at 20 weeks of age by the presence of nearby static reference bars. Nonetheless, if sensitivity to the movement in infants were affected by the density of static bars in the vicinity of the moving target, then finding less than the predicted decrease in performance as the number of bars was increased wouldn't necessarily invalidate the maximum-response model. Recall that to generate the predictions for these two conditions I assumed that the mean internal response to the moving target remained invariant. If the proximity of the static reference bars plays a role in determining the value of the $\mu_m$ parameter, then the model could still be correct: the value of this parameter could have been larger in the condition with 28 bars, offsetting the predicted drop in the percentage of correct judgments as more bars were added to the display. There are several ways to test the hypothesis that density near the moving target contributed to the differences, or the lack of differences, between the two conditions. On each trial, the location of every static bar (as well as the target) on the display was recorded. From these data, it was possible to calculate the average distance of the three static bars nearest to the moving target on each trial. I used three static bars because this comprised all of the static bars on the target side in the condition with eight bars on the display. The average distances between the target bar and the nearest three static bars were 117.75 arcmin (SD = 21.16 arcmin) and 61.88 arcmin (SD = 13.38 arcmin) for the trials with eight and 28 bars, respectively.
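The density measure can be sketched as follows, assuming bar positions are stored as (x, y) coordinates in arcmin. The function name and the example coordinates are hypothetical, not the recorded data. Following the text, the three nearest static bars are taken from the whole display rather than from the target side only. The sketch uses `math.dist`, which requires Python 3.8 or later.

```python
import math


def mean_nearest_static_distance(target, static_bars, k=3):
    """Average distance from the moving target to its k nearest static bars.
    Positions are (x, y) pairs in one common unit (arcmin in the chapter)."""
    if len(static_bars) < k:
        raise ValueError("need at least k static bars")
    distances = sorted(math.dist(target, bar) for bar in static_bars)
    return sum(distances[:k]) / k


# Hypothetical positions in arcmin: target at the origin, five static bars.
target = (0.0, 0.0)
bars = [(60.0, 80.0), (0.0, 50.0), (120.0, 0.0), (-30.0, 40.0), (300.0, 0.0)]
print(mean_nearest_static_distance(target, bars))
```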
Although the mean proximities of static bars near the moving target on displays with 8 and 28 bars differed as they should have, the two conditions overlapped on this measure in the range from 48.1 to 117 arcmin. In other words, even on trials with only three static bars on the same side as the moving target, these static bars were occasionally positioned as close to the moving target as, or even closer than, the three nearest static bars on displays with 28 bars. For each subject, I calculated the percentage of correct judgments with eight and 28 bars for those trials on which the density measure was within the overlapping range. The average percentages of correct judgments were 66.1% (SEM = 1.6%) and 61.9% (SEM = 1.3%) for eight and 28 bars, respectively. A two-tailed direct-difference t-test showed that this difference was significant, t(96) = 2.40, p = .018. Thus, considering only those trials on which the densities of static bars near the moving target were similar in the two conditions did increase the difference in the percentage of correct judgments in the direction predicted by the model. The percentage of correct judgments with 28 bars held nearly constant at 61.9%, while the percentage with eight bars increased from 63.4% to 66.1% when the density was equalized. The density-adjusted difference (4.2%) is still less than half of the difference (10%) predicted by the maximum-response model for this sample. Density differences between the two conditions may be part of the answer, but they are not the complete answer to why the model failed with the number manipulation.
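Per-infant scoring restricted to the overlap range, followed by the direct-difference t-test, might look like the Python sketch below. The function names are hypothetical and the 48.1 to 117 arcmin bounds are taken from the text. The p-value step is omitted because it needs a t distribution (for example from SciPy), which this standard-library sketch does not include; infants with no trials in the overlap range would have to be dropped before the test.

```python
import math
from statistics import mean, stdev


def percent_correct_in_range(trials, lo=48.1, hi=117.0):
    """Percentage correct for one infant in one condition, keeping only trials
    whose nearest-three-bar distance (arcmin) falls inside the overlap range.
    trials: list of (distance_arcmin, correct) pairs."""
    kept = [correct for distance, correct in trials if lo <= distance <= hi]
    if not kept:
        return float("nan")
    return 100.0 * sum(kept) / len(kept)


def paired_t(scores_a, scores_b):
    """Direct-difference (paired) t statistic and its degrees of freedom.
    Raises ZeroDivisionError if every difference is identical."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical scores for three infants (eight bars, 28 bars), in percent.
t, df = paired_t([70.0, 65.0, 60.0], [62.0, 63.0, 55.0])
print(round(t, 3), df)
```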
Sensitivity of the measurements: As noted above, the observed difference between the percentages of correct judgments with eight versus 28 bars was not significant. Could this simply reflect large amounts of measurement error leading to insensitive measures? It is hard to argue for this possibility given the results from numerous previous studies (see above) with heterochromatic conditions that yielded robust differences between ipsilateral and contralateral conditions. In these previous studies, ipsilateral versus contralateral differences of 5% to 10% in the mean percentages of correct judgments were routinely detected by the experimental procedures. Notice that this null effect of the number of static bars on the screen is exactly what would be predicted from studies of featural singletons in adults. Indeed, many researchers studying visual search in adults would be surprised by the maximum-response model's prediction of a decrement in performance as set size increases, given that a single moving target is so easily distinguished from its static neighbors. Visual search is usually labeled parallel when detection of a singleton target is relatively uninfluenced by the number of distractors on the display. For all of the reasons listed above, however, visual search in adults is very different from the paradigm with infants employed here, and sensitivity to movement is not nearly as high in infants in this age range as it is in adults (cf. Roessler & Dannemiller, 1997 with Wright & Johnston, 1985). Perhaps the maximum-response model is wrong, and detection of the moving target by these infants should just be considered a case of parallel detection, because the percentage of correct judgments was relatively unaffected by an increase in the number of bars from eight to 28. There is one potential problem with this argument.
If discrimination of the two sides of the display and subsequent orienting were parallel in these infants in these two conditions, then why would the spatial balance manipulation in heterochromatic conditions produce such reliable effects? Detection should be parallel in those conditions as well, leading to no effects when the spatial balance between bars of the two colors was manipulated. Yet the procedures used here revealed reliable effects in the heterochromatic conditions, but essentially no effects in the homochromatic conditions. Additionally, parallel detection is usually indexed by almost perfect performance, but the average percentages across conditions were near 65%. It therefore seems unlikely that the model's failure to predict the results of the number manipulation reflects low power or insensitive measures. Alternatively, one could argue that when all bars on the display are identical, the process of detecting the moving singleton is explained by a model that differs qualitatively from the model used to explain the data from conditions with different types of bars on the display simultaneously. One might also suppose that parallel detection occurred, but that after the detection of the moving target an additional source of noise reduced the overall percentage of correct judgments. These are certainly both possibilities, but it would be more parsimonious to find a
single model that could explain both sets of results, or to avoid proliferating different noise sources. I consider this issue in more detail below. Actual versus effective number of bars: The calculations from Equation 1 used the actual number of bars in each display: eight and 28. There is no guarantee that all of these bars affected the process that determines orienting in these infants. Any factor that reduced the number of bars that actually influenced the orienting process could invalidate the predictions of the model. For example, suppose that only the bars near the center of the screen actually influenced orienting. Fixation was drawn to the center of the display before the start of each trial, so it is certainly possible that bars near the center of the display were weighted more heavily in the orienting process than bars in the periphery. There is some evidence for a gradient of detectability away from the center of the visual field in infants in this age range (Dannemiller & Nagata, 1995). Conversely, this type of orienting to peripheral stimuli tends to be driven more strongly by elements in the temporal visual field than by elements in the nasal visual field (Rafal, Henik & Smith, 1991), so it is also possible that the bars near the edges of the display may have been more effective than bars near the center. Recall that the bars were positioned randomly among seven equal-size imaginary columns on each side of the display. The argument above for a differential gradient of effectiveness makes a clear prediction. A reduction in the actual number of bars to some effective number of bars should result in a proportional reduction across the two conditions, so that the effective numbers of bars per side should remain in the ratio of 4:14. For example, if either of the above gradient effects held, with perhaps half of each side of the display containing effective elements, then the effective manipulation would be 2:7.
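The effect of a proportional reduction can be checked against Equation 1b by letting n stand for an effective, possibly fractional, number of bars per side. In this Python sketch the function name is hypothetical and $\mu_m = 2.0$ is the example value used in the chapter.

```python
import math


def pc_homochromatic(mu_m, n):
    """Equation 1b; n (bars per side, one of them moving) may be fractional when
    it stands for an effective rather than an actual number of bars."""
    e_m = math.exp(mu_m)
    return (e_m + n - 1) / (e_m + 2 * n - 1)


# Actual (4:14) and proportionally reduced (2:7, 1:3.5) bars per side.
for few, many in [(4, 14), (2, 7), (1, 3.5)]:
    p_few, p_many = pc_homochromatic(2.0, few), pc_homochromatic(2.0, many)
    print(f"{few} vs {many} per side: {p_few:.0%} vs {p_many:.0%}")
```

This yields 72% versus 59%, 81% versus 66%, and 88% versus 74%. The gaps of about 13, 15, and 14 percentage points show that shrinking the effective display in proportion does not erase the predicted difference.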
This is easily accommodated in the model. In fact, the model predicts slightly larger differences when the effective number of bars changes in this direction.³ For a given value of the mean internal response to the moving singleton (e.g., $\mu_m = 2.0$), the model predicts percentages of correct judgments of 72% versus 59% for four versus 14 bars per side, 81% versus 66% for two versus seven bars per side, and 88% versus 74% for one versus 3.5 bars per side. Once again, either the model is incorrect, or the lack of a difference in the percentages of correct judgments is not explained by an effective number of bars that is smaller than the actual number of bars. Alternative models: It is worth considering alternative models that might explain all of the results described above. I will consider one alternative model next; others are certainly possible. There may be a clue to an alternative model in the fact that the maximum-response model succeeds in predicting the ipsilateral versus contralateral difference under heterochromatic conditions but fails to predict the data from homochromatic conditions when the number of bars is manipulated. The ipsilateral versus contralateral manipulation is inherently a spatially directional manipulation: one side is loaded with more higher-salience bars than the other. In contrast, the manipulation of the number of bars does not involve any directional imbalance
Capture in Infancy
283
beyond that induced by the motion singleton. Both sides of the display are stochastically identical with the exception of the moving singleton. But the orienting response is inherently directional. It is not possible to look to the right and the left simultaneously. A final common pathway resolves competition between the two directional responses, and one wins out. The alternative model that might capture both results is one that recognizes the global nature of this directional response competition. Suppose that the maximum-response model is wrong in the sense it assumes that the bars on the screen ultimately lead to 28 different internal responses that are independent at the point at which overt, directional orienting movements are planned (e.g., saccades and head movements). According to this model, the individual bars on the display are differentiated from each other and lead to independent internal responses. The model assumes independence of these responses once they are perturbed by internal noise. I will call this assumption the differentiated visual field assumption. The differentiated visual field assumption may not be correct for infants in this age range under these conditions.
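Equation 1 is not reproduced in this part of the chapter, so the following Python sketch is a hypothetical reconstruction of the maximum response model, not the author's own formula. It assumes that every bar's internal response is perturbed by independent double-exponential (Gumbel) noise, that static bars have a mean internal response of 0, and that the moving singleton has mean μm. Under those assumptions the probability that the largest response falls on the target side takes a Luce choice form (Yellott, 1977), and with μm = 2.0 it reproduces the percentages quoted above for four versus 14, two versus seven, and one versus 3.5 bars per side. The function names `p_correct` and `estimate_mu_m` are illustrative; `estimate_mu_m` follows the procedure described in the footnotes of setting below-chance percentages to 50% before estimating μm.

```python
import math


def p_correct(mu_m: float, bars_per_side: float) -> float:
    """Hypothetical maximum response model with Gumbel noise (an assumption).

    Returns the probability that the largest noisy internal response comes
    from the side containing the moving singleton. Static bars have mean 0
    (weight exp(0) = 1); the moving bar has mean mu_m (weight exp(mu_m)).
    The moving bar replaces one static bar on the target side.
    """
    w_target = math.exp(mu_m)
    static_on_target_side = bars_per_side - 1
    static_total = 2 * bars_per_side - 1
    return (w_target + static_on_target_side) / (w_target + static_total)


def estimate_mu_m(p_observed: float, bars_per_side: float) -> float:
    """Invert p_correct for mu_m.

    Below-chance proportions are set to 0.5 first (treated as binomial
    sampling error), which yields an estimate of mu_m = 0.
    """
    p = max(p_observed, 0.5)
    if p >= 1.0:
        raise ValueError("p_observed must be below 1 to give a finite estimate")
    a = bars_per_side - 1        # static bars on the target side
    b = 2 * bars_per_side - 1    # static bars in the whole display
    # Solve p = (w + a) / (w + b) for the weight w = exp(mu_m).
    w = (p * b - a) / (1.0 - p)
    return math.log(w)


if __name__ == "__main__":
    # Actual (4 vs. 14 per side) and reduced effective numbers of bars.
    for small, large in [(4, 14), (2, 7), (1, 3.5)]:
        print(round(100 * p_correct(2.0, small)), round(100 * p_correct(2.0, large)))
    # Expected lines: "72 59", "81 66", "88 74"
```

In this form, each added bar per side contributes one static weight to the numerator but two to the denominator, so the prediction drifts toward 50% as bars are added. That drift is the decline the homochromatic data did not show.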
A hemifield comparison model
The elements that appear on the display are certainly visible and easily differentiated by a normal adult human visual system. Are they necessarily differentiated by the visual systems of young human infants? We know from numerous studies that acuity (Dobson & Teller, 1978) and contrast sensitivity (Gwiazda, Bauer, Thorn & Held, 1997) in this age range (2 to 5 months) are far from adult-like. Is it possible that the initial response to the appearance of these bars in the visual field is so much less differentiated than in adults that there is only an initial, coarse hemifield differentiation? One way to model this would be to assume that the effective variable that determines orienting is based on the aggregate or summed response to each half of the visual field. Once these two aggregate responses have been computed, they are compared (differenced), and perhaps some post-comparison noise is added to the computed difference. Orienting is then directed to the hemifield with the larger aggregate response. Large receptive fields, with their attendant extensive spatial summation of local responses to the individual elements, could implement this part of the alternative model. If the internal noise that perturbed the individual responses prior to their summation were negligible relative to the post-comparison noise, then this model would essentially predict no difference between the percentages of correct judgments in the conditions with eight versus 28 bars. It is easy to understand this prediction. Essentially all that matters in this model is that a moving element replaces one of the static elements on one side of the visual field, regardless of whether there are eight or 28 bars. This leads to the same difference in the aggregate responses to the two sides of the display in both conditions. This difference may be corrupted by some additional decision noise, leading to less than perfect performance even for a relatively strong motion singleton, but importantly, as long as the two sides differ only by one moving target, this model predicts no difference as the number of static bars is increased. The model explicitly denies the differentiated visual field assumption. Instead, it is the global balance between the two hemifields that drives orienting to one side or the other. This alternative model may be referred to as the hemifield comparison model. Does this model also predict the ipsilateral versus contralateral difference observed in the numerous studies reported above? Yes. If there is a difference in salience (mean internal response) between the two classes of static bars that are distributed unevenly across the two halves of the display, then there will be an additional difference in the aggregate responses between the two sides of the display beyond the motion singleton. When more of the higher-salience static bars are on the same side as the moving target, this imbalance will favor the target side. When more of the higher-salience static bars are on the side opposite to the target, this imbalance will favor the contralateral side and compete with the moving target. The outcome of the competition will depend on the strength of the motion stimulus and the relative saliences of the two classes of static bars. The important point is that the hemifield comparison model qualitatively predicts the pattern of results both in the homochromatic condition, when the number of static bars is manipulated, and in the heterochromatic conditions, when the balance between the two sides of the display is manipulated. Additional experiments are underway to test quantitative predictions from this alternative model.
Arguments against the hemifield comparison model. It is worth considering some of the implications of the alternative model.
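The central quantitative claim of the hemifield comparison model can be made concrete before weighing its implications. The chapter does not specify the noise distributions, so the Python sketch below assumes Gaussian noise: each bar's response carries pre-summation noise with a hypothetical standard deviation `sigma_pre`, and decision noise with a hypothetical standard deviation `sigma_post` is added to the difference between the two hemifield sums. Orienting goes to the target side when that noisy difference is positive. The mean responses (`MOVING`, `HIGH`, `LOW`) are illustrative values, not estimates from the studies.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_orient_to_target(target_side, other_side, sigma_pre=0.0, sigma_post=1.0):
    """Hypothetical hemifield comparison model with Gaussian noise.

    target_side, other_side: mean internal responses of the bars in each
    hemifield. Each bar adds independent noise (SD sigma_pre) before the
    responses are summed; decision noise (SD sigma_post) is added to the
    difference of the two sums. Returns P(target hemifield total is larger).
    """
    mean_diff = sum(target_side) - sum(other_side)
    n_bars = len(target_side) + len(other_side)
    sd = math.sqrt(n_bars * sigma_pre ** 2 + sigma_post ** 2)
    return normal_cdf(mean_diff / sd)


MOVING, HIGH, LOW = 1.0, 0.3, 0.0   # illustrative mean responses (assumed)

# Homochromatic displays: one moving bar among 8 or 28 bars in total.
eight = ([MOVING] + [LOW] * 3, [LOW] * 4)
twenty_eight = ([MOVING] + [LOW] * 13, [LOW] * 14)

# Heterochromatic displays: four higher-salience static bars split 3:1
# toward (ipsilateral) or away from (contralateral) the target side.
ipsi = ([MOVING, HIGH, HIGH, HIGH], [HIGH, LOW, LOW, LOW])
contra = ([MOVING, HIGH, LOW, LOW], [HIGH, HIGH, HIGH, LOW])

if __name__ == "__main__":
    for label, sigma_pre in [("no pre-summation noise", 0.0), ("sigma_pre = 0.5", 0.5)]:
        print(label,
              round(p_orient_to_target(*eight, sigma_pre=sigma_pre), 3),
              round(p_orient_to_target(*twenty_eight, sigma_pre=sigma_pre), 3))
    print("ipsilateral", round(p_orient_to_target(*ipsi), 3),
          "contralateral", round(p_orient_to_target(*contra), 3))
```

With `sigma_pre = 0`, the eight-bar and 28-bar displays give identical probabilities, because only the single moving bar changes the difference between the hemifield sums. Performance falls with the number of bars only when pre-summation noise is not negligible. The same function favors the ipsilateral arrangement over the contralateral one, mirroring the qualitative pattern described above.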
One of the most striking aspects of this model is that it implies a largely undifferentiated visual field early in development. This implication would appear to be contradicted by numerous studies with young infants showing differentiated responses to elements in the visual field. Two types of studies are relevant. First, eye movement studies over this age range clearly show that infants direct saccades to individual elements in the visual field (e.g., Aslin & Salapatek, 1975; Bronson, 1994). Second, changes to the internal features of patterns are discriminable by 4-month-olds, although 1-month-olds have difficulty (Milewski, 1976). These latter studies imply that attention can be directed to specific parts of a pattern, and not just to the pattern as a whole. Both types of studies would appear to contradict the idea that multiple elements within the visual field are only coarsely differentiated early in development. The contradiction may be more apparent than real. It is important to keep in mind that there are methodological differences between the current paradigm and results and the previous studies that imply a differentiated visual field. First, recall that the empirical results reported above testing the maximum response model relied mostly on orienting behavior within the first two seconds after the simultaneous appearance of all of the bars in the visual field. In contrast, scanning eye movement studies with infants typically involve extended inspection (e.g., Bronson, 1994). Second, in the pattern discrimination studies, the patterns were also available for inspection for long periods of time, and the measures of discrimination generally involved cumulative durations of fixation on these patterns. Finally, in the eye movement study by Aslin and Salapatek (1975), the visual field was populated by only one or at most two elements on each trial. This is very different from the studies reported above, with eight or 28 small bars scattered more or less randomly across the visual field. It is possible that the initial response to the onset of multiple elements in the visual field at this age is a transient, low-spatial-resolution response that globally compares the two hemifields to generate an initial orienting reaction. After the elements have been present in the visual field for some time, more spatially refined and local responses may then guide subsequent inspection of the visual field. The studies reported above and the previous literature may therefore not be in conflict, given the very different temporal parameters of these studies and the different complexity of the visual patterns. Lasky and Spiro (1980) reported that 5-month-old infants required at least two seconds between the offset of a brief visual pattern and the onset of a mask before showing recognition of the familiar pattern. Masks that followed the offset of the pattern by less than two seconds disrupted recognition. This could imply that the information available to infants as old as five months within the first two seconds of the onset of a visual pattern is too coarse to support good pattern recognition, although it may be sufficient to support a global comparison of the two hemifields, as indicated by the results reported above.
Hemifield comparisons in adults? Is there any evidence for this type of coarse, hemifield competition process in mature visual attention? In adults, there are several lines of evidence suggesting this type of hemifield competition.
Rizzolatti, Riggio, Dascola and Umilta (1987) have argued for a premotor theory of attention. Part of this theory involves the idea that eye movements and attention are closely linked. There is a cost if an eye movement program to one hemifield has to be cancelled and a new movement to the opposite hemifield programmed. Because moving the eyes in one direction versus the other involves different muscle activation patterns, there is a sharp divide in attention at the vertical meridian. Shifting attention horizontally across the vertical meridian incurs costs that are more severe than would be predicted solely from the distance between an invalid cue and a target. Although Rizzolatti et al. (1987) do not argue for complete homogeneity of attention within a hemifield, the premotor theory may be compatible with the alternative model described above because in both cases a response in one direction or the other has to be resolved based on a comparison of activity in the two hemifields. A second line of evidence for hemifield competition comes from the phenomenon of visual extinction (e.g., Friedrich, Egly, Rafal & Beck, 1998; Mattingley, Pisella, Rossetti, Rode, Tiliket, Boisson, & Vighetto, 2000). Individuals with unilateral brain lesions, especially right parietal cortical lesions, can detect single targets in the contralesional visual field when these targets are presented alone. They have great difficulty detecting these targets, however, if a competing stimulus is simultaneously presented in the ipsilesional hemifield. Mattingley et al. (2000) have suggested that this type of extinction may result from an inability to divide attention across the vertical meridian. They attribute this difficulty ultimately to the conflicting requirements of programming eye movements in opposite directions. Once again, a process that must resolve conflicting movements toward the two hemifields appears to play an important role in how visual attention works. I would suggest that this hemifield competition may be revealed in the studies reported above with young infants: a coarse comparison of stimulation in the two hemifields is used to resolve the problem of where to look first when multiple elements appear in the visual field simultaneously. Nakayama and Mackeben (1989) have argued that in adults there is an initial transient component to focal visual attention that differs both in locus and in time course from a more sustained component. Additionally, Nakayama and Mackeben (1989) have argued that both the transient and sustained components of focal visual attention are cortical in origin, although they do not necessarily share the same cortical substrates. The transient component is supposed to operate at earlier stages of visual cortical processing. Both the transient nature of this system and its cortical substrate are compatible with aspects of the infant data reported above. The hemifield comparison model is meant to apply only to the initial orienting response to the appearance of multiple potential attentional targets in the visual field. The transient aspect of the process is similar to the transient component of the Nakayama and Mackeben (1989) model. Additionally, several of the heterochromatic studies cited above apparently involved the differential salience of color contrast, and color is thought to be processed cortically (Lueck, Zeki, Friston, Deiber, Cope, Cunningham, Lammertsma, Kennard, & Frackowiak, 1989).
Other data on hemifield comparisons in infants. What evidence is there in infants for these kinds of hemifield differences? Under monocular viewing, there are hemifield differences in the simple detection of visual targets even in young infants. Targets that appear in the temporal visual field are more easily detected than identical targets that appear at the same eccentricity in the nasal visual field (Lewis & Maurer, 1992). Fogel, Karns and Kawai (1990) have argued for a model of right-side dominance for attention control in young infants. Liegeois and De Schonen (1997) showed that simultaneous attention to the two hemifields emerges very late in development: as late as 24 months of age. All of these studies provide evidence that visual attention may involve coarse, hemifield mechanisms from early in development. The hemifield comparison model is compatible with this type of coarse differentiation. Whether it provides a quantitatively robust account of the development of visual capture early in life can only be determined from future experiments.
Conclusions
A maximum response model of visual capture, which assumes a well-differentiated visual field, was tested with infants from approximately two months of age to five months of age. The model predicted the observed differential salience effects in initial orienting to the simultaneous appearance of multiple potential attentional targets when those salience effects involved an imbalance between the two sides of the stimulus display. When more of the putatively higher-salience elements in the visual field appeared ipsilateral to a moving singleton target, orienting was biased toward the target side. In contrast, when more of these higher-salience elements appeared contralateral to the moving target, competition ensued and attention was drawn less reliably to the moving target. These effects could be explained by a signal detection model in which it is assumed that initial orienting is determined by the element in the visual field that leads to the maximum internal response. However, this maximum response model failed to predict the effects of a manipulation of the number of elements in the visual field. Whereas the model predicted a decrease in orienting to the moving singleton target as more bars were added to both sides of the display, the percentage of correct judgments was essentially unaffected by the number of elements in the visual field. The density of static elements near the moving target differed between the two conditions and may have been responsible for some of the reduction in the size of the predicted effect, but it was probably not responsible for the full reduction. An alternative hemifield comparison model was proposed to account for the results of both types of manipulations (spatial imbalance and number). According to this model, attention is captured initially by the side of the visual field with the greater aggregate response.
In contrast to the maximum response model, which assumes that all of the elements in the visual field lead to independent, noise-perturbed responses (the differentiated visual field assumption), the hemifield comparison model assumes that a coarse comparison (difference computation) of the two visual hemifields drives initial orienting. Evidence from adult studies of visual attention, involving costs associated with switching attention across the vertical meridian, visual extinction, and a transient component to focal visual attention, bears some similarity to the proposed hemifield comparison model with infants. Data on the development of visual hemifield asymmetries are also compatible with this model. Additional studies will be necessary to test the quantitative predictions from this alternative model of attentional capture during early postnatal life.
Footnotes
1 See Sheliga, Craighero, Riggio & Rizzolatti (1997) for similarities between spatial attention and directional response systems.
2 Especially at the youngest ages, the percentage of correct judgments with eight bars was sometimes below chance (50% correct). This was considered measurement error from binomial sampling, and the percentage of correct judgments was set to 50% to estimate μm. This yields an estimate of 0 for this parameter and a predicted percentage of correct judgments of 50% with 28 bars.
3 Equation 1b can be used to generate predictions assuming that the effective number of bars on each side of the display is approximately half of the actual number of bars. For example, instead of 13 static bars on the target side and 27 total static bars, one would use 6.5 static bars on the target side (numerator) and 13.5 total static bars (denominator). The half of a bar could be handled conceptually (although technically not exactly) by assuming that on half the trials there were six effective static bars and on half the trials there were seven effective static bars on the target side. Instead of comparing the predictions of Equation 1b for 8 versus 28 bars, they can be compared for 4 versus 14 bars. When this is done, one of the properties of the model is that the same ratio produces larger differences in the percentage of correct judgments when the number of bars is small. The intuition is that as the number of bars grows, the predicted percentage of correct judgments approaches an asymptote of 50% because the contribution of the moving target, which appears in both the numerator and the denominator, gets diluted by the additional static bars. Although the same 8:28 ratio can be realized in many ways, as the number of bars grows, the floor on the percentage of correct judgments at 50% constrains the predicted difference between two conditions that differ in this ratio.
References
Aslin, R.N., & Salapatek, P. (1975). Saccadic localization of peripheral targets by the very young human infant. Perception & Psychophysics, 17, 293-302.
Atkinson, J., Hood, B., Wattam-Bell, J., & Braddick, O.J. (1992). Changes in infants' ability to switch visual attention in the first three months of life. Perception, 21, 643-653.
Banks, M.S., & Ginsburg, A.P. (1985). Infant visual preferences: A review and new theoretical treatment. In: H.W. Reese (Ed.), Advances in Child Development and Behavior. New York: Academic Press.
Bronson, G. (1994). Infants' transitions toward adult-like scanning. Child Development, 65, 1243-1261.
Casey, B.J., & Richards, J.E. (1988). Sustained visual attention in young infants measured with an adapted version of the visual preference paradigm. Child Development, 59, 1514-1521.
Catherwood, D., Skoien, P., & Holt, C. (1996). Colour pop-out in infant response to visual arrays. British Journal of Developmental Psychology, 14, 315-326.
Cohen, L. (1972). Attention-getting and attention-holding processes of infant visual preferences. Child Development, 43, 869-879.
Dannemiller, J.L. (1998). A competition model of exogenous orienting in 3.5-month-old infants. Journal of Experimental Child Psychology, 68, 169-201.
Dannemiller, J.L. (in press). Relative color contrast drives competition in early exogenous orienting. Infancy.
Dannemiller, J., & Freedland, R. (1989). The detection of slow stimulus movement in 2- to 5-month-olds. Journal of Experimental Child Psychology, 47, 337-355.
Dannemiller, J.L., & Nagata, Y. (1995). The robustness of infants' detection of visual motion. Infant Behavior & Development, 18, 371-389.
Dannemiller, J.L., & Stephens, B. (under review). Contrast polarity and moving target detection in young human infants. Journal of Vision.
Dobson, V., & Teller, D.Y. (1978). Visual acuity in human infants: A review and comparison of behavioral and electrophysiological studies. Vision Research, 18, 1469-1483.
Fantz, R.L. (1958). Pattern vision in young infants. Psychological Record, 8, 43-47.
Fogel, A., Karns, J., & Kawai, M. (1990). Lateral asymmetry in attention for three-month-old human infants during face-to-face interaction with mother. Developmental Psychobiology, 23, 1-14.
Folk, C.L., & Remington, R. (1998). Selectivity in distraction by irrelevant featural singletons: Evidence for two forms of attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 24, 847-858.
Foley, J.M., & Schwarz, W. (1998). Spatial attention: Effect of position uncertainty and number of distractor patterns on the threshold-versus-contrast function for contrast discrimination. Journal of the Optical Society of America Part A, Optics and Image Science, 15, 1036-1047.
Freeseman, L.J., Colombo, J., & Coldren, J.T. (1993). Individual differences in infant visual attention: 4-month-olds' discrimination and generalization of global and local stimulus properties. Child Development, 64, 1191-1203.
Friedrich, F.J., Egly, R., Rafal, R.D., & Beck, D. (1998). Spatial attention deficits in humans: A comparison of superior parietal and temporal-parietal junction lesions. Neuropsychology, 12, 193-207.
Gayl, I.E., Roberts, J.O., & Werner, J.S. (1983). Linear systems analysis of infant visual pattern preferences. Journal of Experimental Child Psychology, 35, 30-45.
Green, D.M., & Swets, J.A. (1966). Signal Detection Theory and Psychophysics. New York: Wiley.
Gwiazda, J., Bauer, J., Thorn, F., & Held, R. (1997). Development of spatial contrast sensitivity from infancy to adulthood: Psychophysical data. Optometry and Vision Science, 74, 785-789.
Hood, B.M. (1993). Inhibition of return produced by covert shifts of visual attention in 6-month-old infants. Infant Behavior and Development, 16, 245-254.
Hood, B.M., & Atkinson, J. (1993). Disengaging visual attention in the infant and adult. Infant Behavior and Development, 16, 405-422.
Johnson, M.H., & Tucker, L.A. (1996). The development and temporal dynamics of spatial orienting in infants. Journal of Experimental Child Psychology, 63, 171-188.
Karmel, B.Z. (1969). The effect of age, complexity, and amount of contour on pattern preferences in human infants. Journal of Experimental Child Psychology, 7, 339-354.
Lasky, R.E., & Spiro, D. (1980). The processing of tachistoscopically presented visual stimuli by five-month-old infants. Child Development, 51, 1292-1294.
Lewis, T.L., & Maurer, D. (1992). The development of the temporal and nasal visual fields during infancy. Vision Research, 32, 903-911.
Liegeois, F., & De Schonen, S. (1997). Simultaneous attention in the two visual hemifields and interhemispheric integration: A developmental study on 20- to 26-month-old infants. Neuropsychologia, 35, 381-385.
Lueck, C.J., Zeki, S., Friston, K.J., Deiber, M.P., Cope, P., Cunningham, V.J., Lammertsma, A.A., Kennard, C., & Frackowiak, R.S.J. (1989). The colour centre in the cerebral cortex of man. Nature, 340, 386-389.
Matsuzawa, M., & Shimojo, S. (1997). Infants' fast saccades in the gap paradigm and development of visual attention. Infant Behavior and Development, 20, 449-455.
Mattingley, J.B., Pisella, L., Rossetti, Y., Rode, G., Tiliket, C., Boisson, D., & Vighetto, A. (2000). Visual extinction in oculocentric coordinates: A selective bias in dividing attention between hemifields. Neurocase, 6, 465-475.
McLeod, P., Driver, J., & Crisp, J. (1988). Visual search for a conjunction of movement and form is parallel. Nature, 332, 154-155.
Milewski, A.E. (1976). Infants' discrimination of internal and external pattern elements. Journal of Experimental Child Psychology, 22, 229-246.
Nakayama, K., & Mackeben, M. (1989). Sustained and transient components of focal visual attention. Vision Research, 29, 1631-1647.
Nothdurft, H.C. (2000). Salience from feature contrast: Additivity across dimensions. Vision Research, 40, 1183-1201.
Palmer, J., Ames, C.T., & Lindsey, D.T. (1993). Measuring the effect of attention on simple visual search. Journal of Experimental Psychology: Human Perception and Performance, 19, 108-130.
Palmer, J., Verghese, P., & Pavel, M. (2000). The psychophysics of visual search. Vision Research, 40, 1227-1268.
Quinn, P.C., & Bhatt, R.S. (1998). Visual pop-out in young infants: Convergent evidence and an extension. Infant Behavior and Development, 21, 273-288.
Rafal, R., Henik, A., & Smith, J. (1991). Extrageniculate contributions to reflex visual orienting in normal humans: A temporal hemifield advantage. Journal of Cognitive Neuroscience, 3, 322-328.
Rizzolatti, G., Riggio, L., Dascola, I., & Umilta, C. (1987). Reorienting attention across the horizontal and vertical meridians: Evidence in favor of a premotor theory of attention. Neuropsychologia, 25, 31-40.
Roessler, J., & Dannemiller, J. (1997). Changes in infants' sensitivity to slow displacements over the first 6 months. Vision Research, 37, 417-423.
Ross, S., & Dannemiller, J.L. (1999). Color contrast, luminance contrast and competition within exogenous orienting in 3.5-month-old infants. Infant Behavior and Development, 22, 383-404.
Sheliga, B.M., Craighero, L., Riggio, L., & Rizzolatti, G. (1997). Effects of spatial attention on directional manual and ocular responses. Experimental Brain Research, 114, 339-351.
Teller, D.Y. (1979). The forced-choice preferential looking procedure: A psychophysical technique for use with human infants. Infant Behavior and Development, 2, 135-153.
Theeuwes, J. (1991). Exogenous and endogenous control of visual attention: The effect of visual onsets and offsets. Perception & Psychophysics, 49, 83-90.
Theeuwes, J., Kramer, A.F., & Atchley, P. (1999). Attentional effects on preattentive vision: Spatial precues affect the detection of simple features. Journal of Experimental Psychology: Human Perception and Performance, 25, 341-347.
Tyler, C.W., & Torres, J. (1972). Frequency response characteristics for sinusoidal movement in the fovea and periphery. Perception & Psychophysics, 12, 232-236.
Wright, M.J., & Johnston, A. (1985). The relationship of displacement thresholds for oscillating gratings to cortical magnification, spatiotemporal frequency and contrast. Vision Research, 25, 187-193.
Yantis, S., & Egeth, H.E. (1999). On the distinction between visual salience and stimulus-driven attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 25, 661-676.
Yellott, J.I. (1977). The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology, 15, 109-144.
Author Note
James L. Dannemiller is in the Department of Psychology and the Waisman Center at the University of Wisconsin - Madison. This research was supported by NICHD R01 HD32927. I thank Mari Riess Jones for helpful comments on an earlier version of this manuscript. I thank Jacqueline Roessler for observing the infants, Manya Qadir for scheduling the infants, and Daniel Replogle for the computer programming. Correspondence concerning this article should be addressed to James L. Dannemiller, Waisman Center, University of Wisconsin - Madison, 1500 Highland Avenue, Madison, WI 53705-2280. Electronic mail may be sent via the Internet to [email protected].
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © Elsevier Science B.V. All rights reserved.
12
Attentional Capture, Attentional Control and Aging
Arthur F. Kramer, Charles T. Scialfa, Matthew S. Peterson and David E. Irwin
The goal of the present chapter is to address the issue of whether there are age differences in attentional capture and if so to explore the nature of such age-related changes in attention. However, before addressing this specific question we provide a brief description of the theoretical context within the realm of cognition and aging in which the issue of attentional capture might be addressed. More specifically, we discuss current models of cognition and aging, such as general slowing models, inhibitory deficit models, and executive control models, and describe how these models treat the issue of attentional control and attentional capture. We then provide a brief discussion of the major issues of concern to the field of attentional capture and control. Next, we review the literature of relevance to age-related changes in attentional capture and control in a number of different domains including spatial cueing, visual search, focused attention, and overt attention. Finally, we conclude with suggestions for future research on aging and attentional capture.
Cognitive Aging: Theory and Research
Perhaps the most robust observation in the literature on cognitive aging over the past several decades has been that performance declines on a multitude of laboratory and real-world tasks from young adulthood to old age. Indeed, decreases in performance during aging have been observed from the simplest laboratory tasks, such as simple, choice and disjunctive reaction time, to complex real-world tasks such as driving, flying and the operation of automated teller machines (Birren & Schroots, 1996; Mead & Fisk, 1998; Salthouse, 1996; Tsang, 1996). However, although observations of age-related declines in cognition abound, the psychological mechanisms that underlie these observations continue to be studied and debated. One line of research and theorizing has focused on processing speed as a central explanatory construct for age-related declines in cognitive function. Early processing speed models suggested that slowing was the result of a general decline in function as a consequence of increased noise in the central nervous system attributable to neuronal and glial degeneration (Birren, 1965). More recent slowing models have suggested that multiple independent factors are responsible for slowing under different conditions and in different tasks (Cerella, 1985; Lawrence, Myerson & Hale, 1998; Salthouse, 1996). For example, distinctions have been made between verbal and visuospatial slowing, with visuospatial processing showing more substantial age-related declines (Jenkins et al., 2000). Indeed, a large body of research now suggests that a substantial amount of age-related variance in complex cognitive tasks can be accounted for by a relatively small set of slowing factors. Another line of inquiry concerning cognitive decline during aging can be traced to a seminal paper by Hasher and Zacks (1988). In that paper, the authors provided a detailed and critical review of age-related differences in the inhibition of representations of, and actions toward, environmental events as well as information stored in long-term and working memory. Hasher and Zacks (1988; see also Zacks & Hasher, 1997) suggested that age-related processing deficits in a variety of cognitive skills can be accounted for by a decrease in the efficiency of inhibitory processing. More specifically, inefficient inhibition could result in failures of selective attention, which may, in turn, result in the intrusion of task-irrelevant information into working memory. The consequences of these inhibitory failures would include both increased processing time and reductions in recognition and recall of relevant information. A thorough review of the evidence for and against this general inhibitory hypothesis is beyond the scope of this chapter (see Burke, 1997; McDowd, 1997; Zacks & Hasher, 1997 for reviews of this literature). However, within the context of visual attention it is becoming increasingly apparent that specific rather than general inhibitory deficits are observed during the course of normal aging. Consider a classic interference paradigm, the Stroop task. In this task subjects are to verbalize the color in which a word is printed while ignoring the semantic content of the word. Older adults take substantially longer to verbalize colors that are inconsistent with the semantics of the word (e.g.,
the word blue printed in red ink; Houx, Jolles & Vreeling, 1993; Kwong See & Ryan, 1995; Rogers & Fisk, 1991; Spieler, Balota & Faust, 1996; but see Salthouse, 1996; Vakil, Manovich, Ramati & Blachstein, 1996). Thus, it would appear that older adults have more difficulty suppressing word meaning during color naming. The negative priming paradigm has also served as a popular testbed for this proposed age-related decline in general inhibition. In the negative priming task, subjects are asked to respond to targets and ignore simultaneously presented distractor stimuli. The critical comparison is between trials in which a distractor from trial n-1 becomes the target on trial n (i.e., the ignored repetition (IR) condition) and trials in which different target and distractor stimuli are presented on trials n and n-1 (i.e., the control condition). In general, longer reaction times (RTs) are obtained in IR conditions than in control conditions; this difference defines the negative priming effect. The negative priming effect is quite robust and has been obtained in a variety of tasks and for a number of stimulus and response types (Neill & Valdes, 1996). Initial aging and negative priming studies suggested that older adults failed to produce the difference between ignored repetition and control conditions seen in younger adults (Kane et al., 1994; Hasher, Stoltzfus, Zacks & Rypma, 1991; Tipper, 1991). These results were interpreted as indicating a failure of selective inhibition by the older adults. However, more recent studies (Kieley & Hartley, 1997; Kramer
Capture, Control and Aging
et al., 1994; Sullivan & Faust, 1993; Sullivan, Faust & Balota, 1995) have reported equivalent negative priming effects for young and old adults. Although there is no agreement on why this discrepancy exists, possibilities include the sensitivity of the experimental design to the relatively small negative priming effect (on the order of 10 to 20 ms) and the difficulty of selection in the task (i.e., larger negative priming effects have been reported when selection of the target is difficult; Moore, 1994). The greater variability in response time for older than for younger adults may mask the negative priming effects for older adults. It is also conceivable that, since the use of inhibition in the negative priming task is presumably effortful (Engle, Conway, Tuholski & Shisler, 1995), its use by older adults will be observed only in difficult selection tasks. Finally, the negative priming effect may be subserved by multiple inhibitory mechanisms, only some of which are sensitive to aging (May et al., 1995; Zacks & Hasher, 1997). In any event, it is clear from the extant literature that at least some aspects of inhibitory processing are compromised in older adults (Dempster, 1992; Kramer et al., 1994). Whether such processing deficits affect the manifestations and mechanisms that support attentional capture (and attentional control) in older adults will be addressed below.

Although inhibitory failures might be implicated in lifespan differences in attentional capture, other types of attentional control might also play a role. A broader view of age-related changes in cognition, which subsumes inhibitory control, has been offered in the form of executive control/frontal lobe theories of aging. In his recent critical review of the literature on the neuroanatomy, neurophysiology and neuropsychology of aging, West (1996) concluded that relatively strong evidence exists for the frontal lobe hypothesis of cognitive aging (see also Dempster, 1992; Kramer et al., 1994).
The frontal lobe hypothesis suggests that older adults are disproportionately disadvantaged on tasks that rely heavily on cognitive processes (e.g. executive control processes) that are supported, in large part, by the frontal and prefrontal lobes of the brain. Indeed, there is a good deal of evidence to suggest that morphological and functional changes in the brain do not occur uniformly during the course of normal aging (Raz, 2000). Researchers have reported substantially larger reductions in gray matter volume in association areas of cortex, and in particular in the prefrontal and frontal regions, than in sensory cortical regions (Coffey et al., 1992; Pfefferbaum et al., 1992; Raz et al., 2000). Studies of functional brain activity employing Positron Emission Tomography (PET) have reported similar trends, with prefrontal regions showing substantially larger decreases in metabolic activity than sensory areas of cortex (Azari et al., 1992; Salmon et al., 1991; Shaw et al., 1984). These data on the structure and function of the aging brain are consistent with numerous reports of large and robust age-related deficits in the performance of tasks that are largely supported by the frontal and prefrontal regions of the cortex, as compared to relatively small age-related deficits on non-frontal lobe tasks (Ardila & Rosselli, 1989; Daigneault et al., 1992; Shimamura & Jurica, 1994). Indeed, many of the tasks subserved, in large part, by the frontal lobes involve processes associated with executive control functions such as the selection, control, and
coordination of computational processes that are responsible for perception and action. For example, large age-related deficits have generally been reported when adults are required to perform two or more tasks at the same time or to rapidly shift emphasis among tasks (Korteling, 1991; Kramer et al., 1999; Mayr & Liebscher, 1996; Rogers et al., 1994). Functional magnetic resonance imaging and positron emission tomography studies have shown enhanced activation of regions of the prefrontal and frontal cortices when two tasks are performed together but not when they are performed separately (Corbetta et al., 1991; D'Esposito et al., 1995). Kliegl and colleagues (Mayr & Kliegl, 1993; Verhaeghen et al., 1997) have also reported reliably larger age-related performance decrements in tasks that require coordinative operations (i.e. mental arithmetic operations in which a product must be held in working memory as other computations are performed) than in tasks that require sequential operations (i.e. mental arithmetic operations that do not require storing and retrieving products from working memory while carrying out arithmetical operations). Furthermore, such differences were independent of general age-related differences in the speed of performance.

In summary, a good deal of behavioral, neuropsychological, and neuroanatomical evidence has accrued in recent years which supports the view that selective aspects of executive control (e.g. inhibitory processing, coordination of multiple skills and tasks) decline with advancing age. Given the role of control processes in attention, we might then expect to observe age-related changes in attentional capture, at least under conditions in which top-down factors such as expectations and intentions oppose stimulus-driven factors such as the appearance of new objects or other changes that render some stimuli dimensionally unique (e.g. a flashing marquee on a theatre).
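Several claims in this chapter, that an age difference is independent of, or cannot be accounted for by, general slowing, rest in their simplest form on a piece of arithmetic. The sketch below assumes the most basic multiplicative slowing model, in which every older-adult mean RT is a single constant k times the corresponding young-adult mean. Under that assumption an absolute difference between two conditions grows by the factor k, whereas the same difference expressed as a proportion of the baseline condition does not change; an age difference that survives in proportional scores is therefore not explained by this particular form of slowing. The function names, the condition labels, k = 1.5 and the RT values are hypothetical illustrations, not data or analysis code from the studies cited.

```python
def slowed(condition_means, k):
    """Hypothetical pure general-slowing model: every mean RT is multiplied by k."""
    return {cond: k * rt for cond, rt in condition_means.items()}


def absolute_effect(means, critical, baseline):
    """Difference score in ms, e.g. invalid-cue RT minus valid-cue RT."""
    return means[critical] - means[baseline]


def proportional_effect(means, critical, baseline):
    """The same difference expressed as a proportion of the baseline mean."""
    return (means[critical] - means[baseline]) / means[baseline]


# Hypothetical young-adult condition means in ms (not data from any cited study).
young = {"baseline": 400.0, "critical": 440.0}
old = slowed(young, k=1.5)  # assumed slowing factor

print(absolute_effect(young, "critical", "baseline"))      # 40.0
print(absolute_effect(old, "critical", "baseline"))        # 60.0: larger in ms
print(proportional_effect(young, "critical", "baseline"))  # 0.1
print(proportional_effect(old, "critical", "baseline"))    # 0.1: unchanged
```

Real slowing functions need not be purely multiplicative (the multiple-factor slowing models described above allow different slowing in different tasks), so a proportional comparison of this kind is a first approximation rather than a complete control for speed.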
Attentional Control: Interaction of Stimulus-Driven and Goal-Directed Attention
Although the central focus of this chapter is on the phenomenon of attentional capture, it is difficult to discuss this construct without first describing the two highly interacting components of attention that determine whether attentional capture will be realized in any particular context. Concepts like goal-directed (top-down) and stimulus-driven (bottom-up) attention have been discussed for at least the past century (James, 1890) in an effort to describe the interactions that occur in the human information processing system in the service of visual selection. Goal-directed attention refers to an individual's ability to selectively process information in the environment. Central to the definition of goal-directed attention is that this form of attentional control relies on an observer's expectancies about events in the environment, knowledge of and experience with similar environments, and the ability to develop and maintain an attentional set for particular kinds of environmental events. In contrast, stimulus-driven attention entails the control of attention by characteristics of the environment, independently of an observer's intentions, expectancies or experience.
Quite often goal-directed and stimulus-driven aspects of control interact to determine the focus of attention (Egeth & Yantis, 1997). For example, searching your office for a particular book might draw on some, however imperfect, knowledge of where you last left the book (i.e. goal-directed attention) along with the fact that the book is unusually large (i.e. stimulus-driven attention). In order to understand how these two forms of control interact to influence the focus of attention, researchers have deemed it important to examine the mechanisms and characteristics of each independently of the other. Indeed, the phenomenon of attentional capture, the main focus of this chapter, has been defined as an expression of stimulus-driven attention in the absence of observer expectancies, attentional set or preparation (Pashler, 1988; Theeuwes, 1992; Yantis & Jonides, 1984). That is, capture occurs when a feature of the environment for which an observer is not searching grabs attention.

Although it is beyond the scope of this chapter to provide an extensive review of the literature on attentional capture, we briefly mention a few important points in order to provide a context in which to explore changes in attentional capture and control during aging. First, task-irrelevant singletons have been shown to capture attention under some circumstances. For example, Pashler (1988) had subjects search for an obliquely oriented line among "0" distractors, a feature search task. On a subset of trials one of the distractors appeared in a unique color. Although subjects were instructed to ignore these task-irrelevant singletons, their appearance disrupted search performance (see also Theeuwes, 1991). Other studies have failed to find evidence of capture of attention by task-irrelevant singletons. For example, Jonides and Yantis (1988) instructed subjects to search for a predefined target letter among letter distractors.
In each display one letter differed from the other letters in either luminance or color. However, subjects were instructed that this singleton was uncorrelated with the location and the identity of the target; that is, the singleton was the target on 1/n of the trials, with n being the total number of letters in the display. In this situation, evidence of attentional capture by the task-irrelevant singleton would be provided by search performance that was fast and independent of the number of distractors in the display when the singleton served as the target. However, performance did not differ for singleton and non-singleton targets (see also Hillstrom & Yantis, 1994; Todd & Kramer, 1997; Yantis & Egeth, 1999). Several lines of research have helped to uncover the reasons for the discrepancies between studies that have found evidence that task-irrelevant singletons capture attention and those that have not. Bacon and Egeth (1994) argued that the attentional strategies adopted by subjects influence whether a task-irrelevant singleton will capture attention. More specifically, Bacon and Egeth suggested that under certain conditions subjects can adopt a singleton detection search strategy in which they search for the most salient object in the display. In the case of search for a singleton target in the presence of a singleton distractor (e.g. a uniquely shaped target and a uniquely colored distractor), use of the singleton detection search strategy will lead to the capture of attention by the distractor, at least on a proportion
of the experimental trials. Indeed, when Bacon and Egeth made it difficult for subjects to use the singleton search strategy, by presenting a non-singleton target (i.e. a target that was not unique along any stimulus dimension relative to the other objects in the display), the task-irrelevant singleton distractor failed to capture attention (see also Yantis & Egeth, 1999). The ability of task-irrelevant singletons to capture attention has also been suggested to be modulated by attentional control settings (Folk, Remington, & Johnston, 1992, 1993; Folk & Remington, 1998; see also Atchley et al., 2000; Gibson & Kelsey, 1998), which determine what aspects of the environment can automatically guide attention. For example, Folk et al. (1992) found that when the target was a uniquely colored item, an onset distractor precue with no predictive value failed to capture attention; likewise, when the target was defined as the onset item, a colored precue showed no evidence of attentional capture. Only when the precue matched the target-defining attribute did it show evidence of attentional capture. These results led Folk and colleagues to propose the contingent involuntary orienting hypothesis, which suggests that stimulus-driven shifts of attention are contingent on attentional control settings, expectancies and experience. However, although high-level cognitive processes determine the nature of the attentional set, the capture of attention by environmental events is purely stimulus-driven, without the opportunity for further control (Folk et al., 1993).

The results of a number of studies suggest that a subset of stimulus characteristics, specifically abrupt onsets and the appearance of new objects in the visual field, may engender attentional capture independently of at least some forms of top-down control and attentional control settings. For example, Yantis and Jonides (1984) had subjects search for a predefined target letter among other letters in a display.
In each display all but one of the letters were constructed by removing segments of figure-8 premasks. These letters were referred to as non-onset stimuli. In addition, one new letter was added to the display concurrently with the removal of segments of the figure-8 premasks. This new letter was referred to as an onset. Although in these experiments the onset letter was no more likely to be the target than any of the other letters (i.e. the onset letter was the target on 1/n of the trials, with n being equal to the total number of letters in a display), when the onset letter was the target, search performance was fast and independent of the number of letters in the display. These data were interpreted as evidence that the onset was always attended first, that is, that abrupt onsets capture attention. More recent research (Yantis & Hillstrom, 1994) has suggested that it is not the abrupt onset of the stimulus per se which captures attention but rather the fact that a new object has appeared in the visual field. Similar capture effects for abrupt onsets or new objects have also been observed even when the onset never serves as a target (Remington et al., 1992) and for saccadic eye movements as well as covert attention (Irwin et al., 2000; Kramer et al., 1999, 2000; Theeuwes et al., 1998). However, even capture of attention by onsets or new objects can be moderated by other factors. The effectiveness of onsets in capturing attention can be reduced by focusing attention elsewhere in the visual field (Theeuwes, 1991; Theeuwes et al., 1998; Yantis & Jonides, 1990),
with extensive practice in searching for a target in the presence of an onset distractor singleton (Warner et al., 1990), and when multiple offsets occur in a display (Martin-Emerson & Kramer, 1997).

In summary, a great deal of evidence now exists for the independence of stimulus-driven and goal-directed components of attention as well as for their interaction in guiding attention in the environment. It also appears clear that while stimulus-driven attention can have a substantial influence on the prioritization of stimuli for visual selection, a variety of goal-directed and strategic factors often interact with stimulus-driven aspects of control in influencing the prioritization hierarchy.
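The capture criterion in the Jonides and Yantis (1988) and Yantis and Jonides (1984) experiments described above is a statement about search slopes: if the singleton or onset is always examined first, RT on trials where it is the target should stay roughly flat as display size grows, whereas RT for the other targets typically rises with the number of letters that must be searched. A minimal sketch, assuming per-condition mean RTs at each display size and an ordinary least-squares fit; the display sizes, RT values and function names are hypothetical, not data from those studies. It also shows the 1/n baseline: if the target is equally likely to be any of the n letters, a given letter (such as the onset) is the target on 1/n of the trials.

```python
from statistics import mean


def search_slope(set_sizes, rts):
    """Ordinary least-squares slope of mean RT (ms) on display size, in ms per item."""
    mx, my = mean(set_sizes), mean(rts)
    num = sum((x - mx) * (y - my) for x, y in zip(set_sizes, rts))
    den = sum((x - mx) ** 2 for x in set_sizes)
    return num / den


def chance_target_rate(n_items):
    """Probability that one particular letter is the target when the target
    is equally likely to be any of the n letters in the display."""
    return 1 / n_items


# Hypothetical mean RTs (ms) at display sizes 2, 4 and 6.
sizes = [2, 4, 6]
onset_target = [520.0, 521.0, 522.0]     # nearly flat: onset examined first
nononset_target = [560.0, 620.0, 680.0]  # rises with the number of letters

print(search_slope(sizes, onset_target))     # 0.5
print(search_slope(sizes, nononset_target))  # 30.0
print(chance_target_rate(4))                 # 0.25
```

Comparing slopes rather than single RTs is what distinguishes "attended first" from "merely easier to identify": the latter would lower the intercept but need not flatten the slope.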
Aging and Attentional Capture

In subsequent sections we examine the influence of age, from young adulthood to old age, on attentional capture and control. The following sections are organized around specific experimental and theoretical issues in attention as well as around the paradigms that have been employed to examine substantive theoretical issues concerning aging and attention.
Attentional capture and spatial cueing

The study of spatial cueing has long served an important role in the examination of top-down and bottom-up factors in the control of attention. For example, Jonides (1981) conducted a series of studies in which he examined the influence of centrally located symbolic cues (e.g. an arrow) and peripheral location markers (e.g. bar markers) on visual spatial attention in tasks that required finding a target among distractors. Important manipulations in these studies included the validity with which the cue predicted the location of the target and the instructions to subjects to either attend to or ignore the cues. When subjects were instructed to attend to the cue, substantial validity effects (i.e. faster and more accurate performance when the cue predicted the location of the target than when it did not) were obtained for both the peripheral and central cues. However, when subjects were instructed to ignore the cues, a validity effect was obtained only for the peripheral cue (see also Remington et al., 1992). These data were interpreted as evidence that peripheral onset cues captured attention automatically while central symbolic cues required voluntary effort to direct attention (i.e. were employed in a strategic fashion). More recently, Müller and Rabbitt (1989; see also Cheal & Lyon, 1991) observed that the temporal dynamics differed for central and peripheral cues, with cueing effects reaching asymptote earlier and decreasing more rapidly for peripheral than for central cues. The results of studies such as these prompted researchers to argue for two different attentional mechanisms. The exogenous mechanism presumably responds to peripheral onsets, captures attention automatically and engenders rapid but transitory cueing effects. On the other hand, the endogenous mechanism appears to
respond to symbolic cues, require voluntary effort and display a slower but more sustained time course. Studies that have examined age-related differences in peripheral and central cueing effects have, until recently, produced mixed results. For example, studies of cueing effects with centrally located symbolic cues have observed similar cueing effects for older and younger adults (Hartley et al., 1990), larger cueing effects for older than for younger adults (Greenwood & Parasuraman, 1994; Nissen & Corkin, 1985), and smaller cueing effects for older adults (Folk & Hoyer, 1992). Peripheral, abrupt onset cues have been observed to produce similar cueing effects for young and old adults, at least up to age 75 (Folk & Hoyer, 1992; Greenwood & Parasuraman, 1994; Hartley et al., 1990), as well as smaller cueing effects for older adults (Madden, 1990). Discrepancies in the results of these studies could be due to a number of factors, including the validity of the cues, the age and general health of the populations, the nature of the tasks, and the stimulus onset asynchronies (SOAs) between the cues and the imperative displays. A number of recent studies have addressed some of these issues in examining potential age differences in spatial cueing effects. Lincourt, Folk and Hoyer (1997) examined both the time course and the magnitude of cueing effects with central and peripheral cues. The central arrow cues indicated the location of the target with 75% validity, while the peripheral cues had a validity of 25% (chance, given four possible target locations). The time course and magnitude of the central cue effects were similar for young and old adults. Peripheral cueing effects were substantially larger for the old than for the young adults. However, there is one important caveat: the peripheral cueing effects for the young adults were unusually small and not significantly different from zero.
Nevertheless, these results are potentially important in that they showed larger peripheral cueing effects for the older adults under conditions in which endogenous or top-down control would not likely be effective (i.e. since the peripheral cues predicted neither the location nor the identity of the target). Juola, Koshino, Warner, McMickell and Peterson (2000) presented both central and peripheral cues on each trial. Subjects were instructed that the central arrow cues would predict the target location with high reliability (i.e. 75%). One group of young and older adults was instructed that the peripheral cues would also be highly reliable (i.e. 75%), while a second group of subjects was instructed to ignore the peripheral cues since they would not reliably predict the location of the subsequent target (i.e. validity was 25%). SOAs were also manipulated such that the peripheral cue appeared either simultaneously with the central cue or 157 ms later. The old and young adults showed similar cueing effects when both the peripheral and central cues were valid or invalid. However, while young adults were able to ignore the unreliable peripheral onset cue, especially when it appeared after the central cue, the older adults were unable to ignore the unreliable peripheral cue even when it followed the central cue. Thus, these results, like those obtained by Lincourt et al. (1997), suggest that older adults have a diminished ability to inhibit attentional capture. It is conceivable, however, that age equivalence might be observed if
subjects are given a longer period of time to prefocus attention on the basis of a central cue before the appearance of an abrupt onset peripheral cue. Pratt and Bellomo (1999) employed Folk et al.'s (1992) precueing paradigm to examine potential age-related differences in response to color and onset cues. Subjects were presented with either a spatially unreliable color cue or an onset cue and asked, on different trials, to report the identity of a target that appeared either as an onset or in a unique color. For both young and old adults, an unpredictive color cue captured attention when the target was defined by color but not when the target was an onset. Furthermore, the magnitude of this cueing effect was similar for the two age groups. On the other hand, older adults showed a larger capture effect than younger adults when an onset cue preceded an onset target. Moreover, the age difference remained even in an analysis of proportional validity effects, suggesting that the larger capture effects for the older adults cannot be accounted for by general slowing. In summary, the bulk of the data collected thus far suggests that older adults have more difficulty than young adults in inhibiting the capture of covert attention by unreliable peripheral onset cues. However, additional research is needed to determine whether such deficiencies can be overcome by additional preparation time (i.e. a longer interval to shift attention to a different location in the visual field prior to the appearance of an onset cue) or by training.

Attentional capture and overt attention
In the majority of the research that we discuss in the present chapter, attention is assumed to be covert in nature. That is, researchers are interested in attentional capture that occurs independently of the position of the eyes. Indeed, in many of these studies displays are presented briefly to prevent eye movements, or subjects are asked to maintain fixation as they perform a task. However, outside the laboratory door, shifts of attention often entail shifts of the eyes in order to foveate areas of interest in the visual field. Thus, it appears reasonable to ask whether age-related differences are observed with overt attention (i.e. eye movements or saccades) as they are with covert attention.

Before discussing research on age-related differences in eye movement control and capture, we believe that it is important to briefly discuss the relationship between covert and overt (eye movement) attention. It is clearly the case that we can shift attention independently of the eyes (Eriksen & Yeh, 1985; Posner, 1980). However, research that has examined the relationship between covert attention and overt attention (i.e. via saccades) in free viewing situations has, in general, found a close coupling between saccade programming and covert attention. For example, Deubel and Schneider (1996) found that letter identification performance was best when the letter to be identified was also the target of a saccade. Similarly, Hoffman and Subramaniam (1995) had subjects detect a visual target just prior to making a saccade and found that detection performance was best when the location of the target and the subsequent saccade
were the same (see also Henderson & Hollingworth, 1999; Kowler et al., 1995; Sheliga et al., 1997). Indeed, we have recently obtained evidence that attention precedes not only voluntary but also involuntary or reflexive saccades (Irwin, Brockmole & Kramer, submitted). Finally, a number of recent Positron Emission Tomography (PET) and functional Magnetic Resonance Imaging (fMRI) studies have reported highly overlapping activation patterns in the brain during covert and overt attentional tasks (Corbetta, 1998; Nobre et al., 2000). In summary, it would appear that attention often precedes saccades to locations in the visual field.

A number of aging studies have employed an eye movement paradigm that would appear to be ideally suited to the study of attentional control and capture. The antisaccade task, first introduced by Hallett in 1978, involves the presentation of an abrupt onset stimulus to the right or left of fixation in an otherwise empty visual field. The subject's task is to detect the onset using peripheral vision and rapidly look in the opposite direction. Performance on the antisaccade task, which clearly requires that subjects suppress a reflexive eye movement towards the onset stimulus while programming and executing a goal-directed saccade in the opposite direction, is dramatically affected by lesions in the frontal and prefrontal regions of the brain that are involved in the programming of goal-directed saccades (Guitton et al., 1985; Pierrot-Deseilligny et al., 1991; Rivaud et al., 1994). Frontal lobe patients have great difficulty inhibiting reflexive saccades to the onset stimulus, typically making saccades to the onset on 70 to 80% of the trials (as compared with approximately 10% misfixations by non-patients).
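The two antisaccade measures that recur in the studies below, the proportion of erroneous prosaccades toward the onset and the latency of correct antisaccades relative to prosaccades, can be made concrete with a small scoring sketch. It assumes each trial has already been reduced to the side of the onset, the horizontal direction of the first saccade, and that saccade's latency; the `Trial` type, its field names and all numbers are hypothetical and do not reproduce any study's data, saccade-detection method or exclusion criteria.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    onset_side: int    # hypothetical coding: -1 = onset left of fixation, +1 = right
    saccade_dir: int   # horizontal direction of the first saccade (-1 or +1)
    latency_ms: float  # first-saccade latency from onset appearance


def prosaccade_error_rate(antisaccade_trials):
    """Proportion of antisaccade trials whose first saccade went toward the onset."""
    errors = [t for t in antisaccade_trials if t.saccade_dir == t.onset_side]
    return len(errors) / len(antisaccade_trials)


def mean_correct_latency(antisaccade_trials):
    """Mean latency of correct antisaccades (first saccade away from the onset)."""
    return mean(t.latency_ms for t in antisaccade_trials
                if t.saccade_dir == -t.onset_side)


# Hypothetical antisaccade block: one saccade toward the onset in four trials.
anti = [Trial(+1, -1, 300.0), Trial(-1, +1, 320.0),
        Trial(+1, +1, 180.0), Trial(-1, +1, 340.0)]
prosaccade_latency = 200.0  # hypothetical mean latency from a prosaccade block

print(prosaccade_error_rate(anti))                      # 0.25
print(mean_correct_latency(anti))                       # 320.0
print(mean_correct_latency(anti) - prosaccade_latency)  # 120.0 (antisaccade cost)
```

Comparing the antisaccade cost in the last line across age groups is one way to express the "disproportionate increase" in correct-antisaccade latency relative to prosaccade latency discussed below.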
Given the frequently reported changes in frontal lobe morphology and decreases in metabolism during the course of normal aging (Azari et al., 1992; Coffey et al., 1992; Raz, 2000; West, 1996), the antisaccade task would appear to provide an excellent testbed for the examination of overt shifts of attention, particularly with regard to the study of resistance to capture by the abrupt onset stimulus. Interestingly, the first study of potential age-related differences in performance on the antisaccade task was published only recently. Olincy et al. (1997) had young and old adults perform both prosaccade (i.e. move your eyes to the peripheral onset stimulus) and antisaccade tasks. Three important findings were obtained in their study. First, the proportion of saccades to the onset stimulus (erroneous prosaccades in the antisaccade task) increased linearly from approximately 10% for 20-year-olds to 50% for 80-year-olds. Second, the latency on those trials on which the eyes did move in the direction opposite the stimulus (i.e. correct antisaccade trials) increased substantially with age. Third, the latency of eye movements on correct antisaccade trials was disproportionately increased for older adults relative to the latency on trials in which subjects were instructed to move their eyes to the onset stimulus. On the basis of these results, the authors concluded that the inhibitory processes necessary for the suppression of eye movements to task-irrelevant events are compromised during the course of normal aging. However, other researchers have failed to replicate Olincy et al.'s (1997) age-related differences in antisaccade performance. For example, in large lifespan studies of performance on a variety of eye movement tasks, Fischer et al. (1997) and
Munoz et al. (1998) found that older adults made about as many prosaccade errors (i.e. movements of the eyes toward the onset) on the antisaccade task as younger adults. These researchers also failed to find the disproportionate increase in saccadic latency on correct antisaccade trials for the older adults that was observed by Olincy et al. (1997). One potential reason for the discrepancy among these studies might be the amount of practice received by the subjects on the eye movement tasks. The subjects in the Olincy et al. (1997) study received substantially less practice than subjects in Fischer et al.'s and Munoz et al.'s studies. Thus, older adults might be capable of improving their performance on the antisaccade task, that is, of successfully suppressing an eye movement to the abrupt onset, with a modest amount of practice. Butler et al. (1999) required subjects to identify the direction of an arrow at the position to which the eyes were to move while also performing the antisaccade task. Clearly, this dual-task version of the antisaccade task requires a greater degree of control and coordination than the traditional antisaccade task and therefore might be expected to reinstate the age-related differences in performance observed by Olincy et al. (1997) with unpracticed subjects. Indeed, older subjects made a greater number of prosaccade errors in the antisaccade task, but the increase in saccadic latency from the pro- to the antisaccade task was similar for the young and the old adults. The studies reviewed thus far suggest that older adults have difficulty suppressing an inappropriate eye movement in situations in which they are relatively unfamiliar with the task and procedures (Olincy et al., 1997) and when attempting to coordinate their eye movements with an additional task (Butler et al., 1999). Interestingly, Roberts et al.
(1994; see also Walker et al., 1998) found that young adults also make substantially more prosaccade errors in the antisaccade task when required to perform it concurrently with a working memory task. Thus, working memory, which is often reported to decline with age (West, 1996), would appear to be necessary to ensure that the eyes are directed away from the onset, possibly by maintaining this goal in an active state. In an attempt to further examine age-related changes in eye movement control, Nieuwenhuis et al. (2000) manipulated the SOA between the direction cue and the peripheral target (similar to the peripheral target to be identified in Butler et al., 1999) and found that at short SOAs the older adults showed prosaccade error rates and saccadic latencies in the antisaccade task similar to those of younger adults. However, when the SOA between the direction cue and the peripheral target was lengthened, older adults' prosaccade errors and saccadic latency increased to a much greater extent than they did for younger adults. The authors interpreted these results as evidence that older adults are able to capitalize on the exogenous nature of the peripheral target at the short cue-target SOAs. That is, older (and younger) adults' eyes were captured by the peripheral abrupt onset target which, in turn, served to diminish age-related differences in eye movement errors and saccadic reaction time. On the other hand, when this was no longer possible, that is, when eye movements had to be programmed prior to the presentation of the peripheral targets (i.e., at long
Kramer, Scialfa, Peterson and Irwin
cue-target SOAs), older adults were impaired at maintaining and implementing the goal of the task (i.e., to look in the direction opposite the cue). In other words, older adults appear to demonstrate goal neglect, that is, the inability to maintain the cue-action representation at a sufficient activation level to prevent inappropriate eye movements (Duncan, 1995; De Jong et al., 1999). Indeed, this distinction fits well with our increasing understanding of the components of the oculomotor system that support voluntary (goal-directed) and reflexive (stimulus-driven) saccades and of their relative sensitivity to aging (LaBerge, 1995; Pierrot-Deseilligny et al., 1995; Schall, 1995). Goal-directed saccades depend on the functional integrity of a number of frontal and prefrontal areas, including the frontal eye fields, supplementary eye fields, and dorsolateral prefrontal cortex. Reflexive saccades, on the other hand, appear to be generated in a parietal-midbrain (i.e., superior colliculus) circuit. Furthermore, frontal regions have been found to be more sensitive to aging than midbrain areas such as the superior colliculus (Raz,
2000). The results of a recent series of studies conducted in our laboratory are consistent with the notion that age-related differences in oculomotor capture will be observed to the extent that subjects must exert voluntary control over their eye movements in the presence of stimulus-driven influences such as task-irrelevant abrupt onsets. Old and young adults were presented with six gray circles, each containing a small figure-8 pre-mask. After 1000 ms, the color of five of the circles changed to red and segments of the figure-8 pre-masks were removed to reveal letters. Subjects were instructed to move their eyes from the center of the display to the color singleton (i.e., the uniquely colored item) as soon as they detected the color change and to identify the letter inside the gray circle. On a subset of trials, a new red circle (i.e., an abrupt onset) appeared simultaneously with the color change that cued the location of the color singleton target. The abrupt onset never served as the target (as in the antisaccade task), nor did it predict the location of the target. This task has been referred to as the oculomotor capture paradigm (Theeuwes et al., 1998). Under the conditions described above, older and younger adults misdirected their eyes to the task-irrelevant onset on essentially the same proportion of trials (approximately 20 to 40%) across three separate experiments (Kramer et al., 1999, 2000). Interestingly, the great majority of both young and old adults were unaware of the occurrence of the task-irrelevant onset, and the few subjects who noticed its appearance, often on only a small proportion of the trials on which it actually occurred, said that they never looked at it.
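The capture measure in these experiments is the proportion of onset-present trials on which the eyes go to the task-irrelevant onset. The Python sketch below shows one way such trials might be scored from first-saccade landing positions. The `Trial` record, the 2-degree landing `radius`, and the rule that the saccade must also land closer to the onset than to the target are illustrative assumptions, not the scoring criteria used by Kramer et al. (1999, 2000) or Theeuwes et al. (1998).

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) position in degrees of visual angle


@dataclass
class Trial:
    """Hypothetical record of one oculomotor capture trial."""
    first_saccade_end: Point           # landing position of the first saccade
    target_pos: Point                  # location of the color singleton target
    onset_pos: Optional[Point] = None  # None on onset-absent trials


def is_captured(trial: Trial, radius: float = 2.0) -> bool:
    """Assumed rule: the first saccade lands within `radius` degrees of the
    onset and closer to the onset than to the target."""
    if trial.onset_pos is None:
        return False
    d_onset = math.dist(trial.first_saccade_end, trial.onset_pos)
    d_target = math.dist(trial.first_saccade_end, trial.target_pos)
    return d_onset <= radius and d_onset < d_target


def capture_rate(trials: List[Trial], radius: float = 2.0) -> float:
    """Proportion of onset-present trials scored as captured."""
    onset_trials = [t for t in trials if t.onset_pos is not None]
    if not onset_trials:
        return float("nan")
    return sum(is_captured(t, radius) for t in onset_trials) / len(onset_trials)
```

Computing `capture_rate` separately for each age group and condition yields a value that can be compared directly with the 20 to 40% range reported above; onset-absent trials are excluded from the denominator.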
However, when we made subjects aware of the task-irrelevant onset, either by making it brighter than the other stimuli or by informing them of its occurrence, and asked them to make sure that they did not look at it, older adults had much more difficulty complying with the instructions than did younger adults (Kramer et al., 2000). That is, older adults had more difficulty than young adults suppressing inappropriate eye movements when asked to exert voluntary control, but not when they were unaware of the onset distractor. This may have occurred because working memory was engaged to retain multiple goals (e.g., move your eyes to the color target while
Capture, Control and Aging
ignoring the new distractor object) when subjects were aware of the task-irrelevant distractor (De Jong, 2001; Roberts et al., 1994). Given that age is associated with diminished working memory capacity (Salthouse, 1994; Waters & Caplan, 2001), older adults would be expected to have greater difficulty maintaining multiple goals and thus to be more susceptible to stimulus-driven capture by the onset distractor. Of course, this hypothesis remains to be tested in future research. In summary, the literature on age differences in the capture of eye movements suggests that while reflexive control of saccades is relatively age-invariant, voluntary control of eye movements in the presence of task-irrelevant prepotent stimuli is subject to age-related decline. However, it is also apparent from the literature that voluntary control of saccades is subject to substantial individual differences, particularly among older adults (Fischer et al., 1997; Munoz et al., 1998; Olincy et al., 1997). Examining how these individual differences relate to differences in the performance of tasks that tap different control processes (Kramer et al., 1994) might help to explicate the factors that influence age-related decline in the voluntary control of behavior and cognition.
Attentional capture and visual search
The view that we are explicating in this chapter is that attentional capture can be considered a manifestation of attentional control. That is, in many circumstances attentional control mediates efficient, preferential selection of stimulus qualities that are consistent with current goals and expectations. However, this same selection can impair performance if it is maintained when it is no longer useful, as when either the stimuli or the goals change. From this perspective, there are phenomena in visual search that fall within the domain of attentional control and capture. Thus, although research that directly and explicitly examines aging and attentional capture in search is just beginning to appear, other findings can be brought to bear on the issue. Since Rabbitt's (1965) demonstration that older adults had difficulty in card sorting, there have been many investigations of age-related differences in visual search. In broad terms, this literature concludes that age effects are trivial in feature and conjunction search when target-distractor similarity is low (Humphrey & Kramer, 1997; Kramer, Martin-Emerson, Larish & Andersen, 1996; Plude & Doussard-Roosevelt, 1989; Scialfa, Esau & Joffe, 1998; Scialfa & Joffe, 1997; Scialfa, Thomas & Joffe, 1994). In contrast, age differences can be substantial when target-distractor similarity is increased, as in difficult feature or conjunction search (Humphrey & Kramer, 1997; Plude & Doussard-Roosevelt, 1989; Scialfa et al., 1998; Scialfa & Joffe, 1997). In addition, several micro-longitudinal studies of the development of visual search skill indicate that search proficiency increases at about the same rate for older and younger adults (Anandam & Scialfa, 1997; Kramer et al., 1996; Madden & Nebes, 1980; Salthouse & Somberg, 1982; Scialfa et al., 2000; Ho & Scialfa, submitted).
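Search efficiency in studies like these is commonly summarized by the slope of the function relating reaction time to display size, in milliseconds per item, so that age differences in search can be expressed as differences in slope. The Python sketch below computes an ordinary least-squares slope and intercept. The function name is hypothetical, and the display sizes and RTs in the example are invented for illustration; they are not taken from any of the studies cited.

```python
from typing import Sequence, Tuple


def search_slope(display_sizes: Sequence[float],
                 mean_rts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of mean RT (ms) on display size (items).

    Returns (slope in ms/item, intercept in ms).
    """
    n = len(display_sizes)
    if n != len(mean_rts) or n < 2:
        raise ValueError("need at least two (display size, RT) pairs")
    mean_x = sum(display_sizes) / n
    mean_y = sum(mean_rts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(display_sizes, mean_rts))
    sxx = sum((x - mean_x) ** 2 for x in display_sizes)
    if sxx == 0:
        raise ValueError("display sizes must vary")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical group means for display sizes of 4, 8, and 16 items.
young_slope, _ = search_slope([4, 8, 16], [520, 540, 580])
older_slope, _ = search_slope([4, 8, 16], [730, 810, 970])
```

In this invented example the older group's slope (20 ms/item) is four times the younger group's (5 ms/item). The same computation applied to memory set size rather than display size yields memory-set slopes such as those reported by Fisk et al. (1990).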
Efficient search is subserved by the preferential processing of targets relative to distractors, and so these findings are relevant to the topic of age differences in attentional control. However, because efficient selection in search is intentional, it may not fall under the heading of attentional capture. On the other hand, there are times when objects that have served as search targets for sustained periods suddenly become distractors, and vice versa. In these "reversal" conditions, allocating attention to the old target items is costly, and yet this disruptive selection often continues involuntarily (Shiffrin & Dumais, 1991). The link between disruption at reversal and attentional capture is made explicit in strength-theoretic models of skill (Schneider, 1985). Under this view, disruption occurs because the attention-attraction strength of targets relative to distractors is so great that the targets draw attention involuntarily. Anandam and Scialfa (1999) examined age differences in feature search for an oriented "Y" (taken from Enns, 1989) embedded among "Y"s differing from it in orientation by 180 degrees. After approximately 2800 consistent-mapping (CM) trials, observers underwent a full reversal and searched for the former distractor among 1, 3, or 7 former targets. Disruption was substantial, increased with display size (particularly on target-absent trials), and brought search performance back to pre-training levels. That this disruption resulted from attentional capture is supported by the absence of disruption after an equivalent amount of training on a varied-mapping (VM) task, in which the target and distractor changed roles from one training block to the next. Under these conditions, attention-attraction strength would not accrue to any item, and so no item would be expected to capture attention.
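One way to make "back to pre-training levels" concrete is to express the reversal cost as a fraction of the RT gain achieved during training, so that 0 means no disruption and 1 means performance has returned to where it started. This index, the function names, and the RTs in the example are illustrative assumptions; the sketch is not the measure used by Anandam and Scialfa (1999).

```python
from typing import Dict


def relative_disruption(rt_start: float, rt_end: float, rt_reversal: float) -> float:
    """Reversal cost as a fraction of the RT gain achieved in training.

    rt_start:    mean RT at the beginning of training (ms)
    rt_end:      mean RT at the end of training (ms)
    rt_reversal: mean RT after targets and distractors swap roles (ms)
    Returns 0.0 for no disruption and 1.0 for a return to the pre-training level.
    """
    gain = rt_start - rt_end
    if gain <= 0:
        raise ValueError("no training gain to compare against")
    return (rt_reversal - rt_end) / gain


def disruption_by_display_size(start: Dict[int, float], end: Dict[int, float],
                               reversal: Dict[int, float]) -> Dict[int, float]:
    """Apply relative_disruption separately at each display size."""
    return {n: relative_disruption(start[n], end[n], reversal[n]) for n in sorted(start)}


# Hypothetical CM means (ms) for total display sizes of 2, 4, and 8 items.
cm = disruption_by_display_size(
    start={2: 700, 4: 800, 8: 1000},
    end={2: 550, 4: 580, 8: 640},
    reversal={2: 640, 4: 760, 8: 1000},
)
```

In the invented CM data the index grows with display size and reaches 1.0 for the largest displays. Under VM training the same index would be expected to stay near 0, since no item accrues attention-attraction strength.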
Two age effects are worthy of note. Compared with their younger counterparts, the elderly showed less evidence of disruption at reversal. By itself, this observation would suggest that the elderly had not developed an automatic response to the target, so that the target did not evoke capture at reversal. This conclusion must be qualified, however, because when the analysis was restricted to targets occurring near the central regions of the display, younger and older people exhibited the same, substantial disruption. Thus, older adults demonstrated the same amount of attentional capture as young adults but also exhibited a reduced useful field of view (Ball, Beard, Roenker, Miller, & Griggs, 1988; Scialfa et al., 1994) that limits the spatial extent over which capture operates. In a more recent study, Scialfa, Jenkins, Hamaluk and Skaloud (2000) compared younger and older adults in the development of automaticity in conjunction search. Observers were trained to look for targets defined by their orientation and contrast polarity (e.g., a black, right-oriented target among white right-oriented and black left-oriented distractors). In Experiment 2, both RTs and eye movements were used to index the change in performance with CM practice and reversal. At reversal, the typical disruption was observed: RTs increased, particularly for larger displays. The same effect was observed in the number of fixations prior to a correct response, indicating that capture has an oculomotor component in search (see also Irwin et al., 2000; Kramer et al., 1999, 2000; Theeuwes et al., 1998). As well, closer
examination of the objects on which fixations landed indicated that at reversal there was a tendency to fixate the former target. Importantly, older adults showed these effects to the same degree as the young. Again, there is no evidence for an age difference in attentional capture. Fisk, Rogers and their colleagues have examined age differences in the development of automaticity in a variety of tasks. Included in this program is work on visual search, memory search, and semantic category search, in which observers search for exemplars of target categories embedded among exemplars of distractor categories. The above-mentioned work on fairly traditional visual search tasks prompts the expectation of minimal age differences in performance. In fact, quite a different picture emerges. Fisk et al. (1990) compared younger and older adults in digit-letter search under CM and VM training, which alternated across trial blocks. Memory set size varied between one and four items, and display size was fixed at two items. After CM training, memory set size slopes approached zero for younger adults but remained at 16 ms/item for the older observers. At transfer, only the young group showed significant disruption, presumably because attentional capture was operating only in that group. Rogers and Fisk (1991) compared older and younger people under consistent-mapping, varied-mapping, and attenuated-priority training, the last of which allowed associative learning to occur while minimizing priority learning. Transfer followed all training conditions, in both letter and semantic category search. Age differences were greater after consistent-mapping training than after attenuated-priority training, suggesting that older adults had difficulty with the priority learning that underlies the development of an automatic attention response to the target. These results were corroborated at transfer, in that the young exhibited more disruption than their older counterparts.
This finding is consistent with the view that younger adults automatize responses to the target and, in consequence, show greater attentional capture. Rogers (1992) and Rogers et al. (1994) gave observers of varying ages a semantic category search task with a memory set of one category and display sizes ranging from one to four. CM and VM blocks alternated throughout the sessions. Relative to the older observers, younger adults showed greater CM improvement and greater disruption at CM reversal. The authors interpreted this finding to indicate that younger adults had learned to attend better to the CM target and to inhibit processing of distractor items, with the result that their performance was more adversely affected when this priority learning was no longer appropriate. For the most part, studies of aging and skill acquisition have focused on the costs associated with automatic responses that become inappropriate when targets and distractors are reversed. There has been only one recent study of aging and the positive transfer of automatized skill. Fisk et al. (1997) compared younger and older observers in a semantic category visual search task. Participants were given CM training followed by one reversal session consisting of three different conditions. In the trained/trained (T/T) condition, observers searched displays containing the same
target words used in training. In the untrained target/trained category (U/T) condition, the target semantic category remained unchanged but different exemplars were used. In the untrained target/untrained category (U/U) condition, new exemplars in new categories were used. Younger adults demonstrated positive transfer in the U/T condition, presumably because attention was captured by exemplars of the previously trained categories. Older adults did not show as much benefit (capture), consistent with the view that they did not automatize their responses to the target category. Thus, in several studies of practice-based changes in search performance, age deficits have been observed in the disruption that follows reversal of CM targets and distractors. These findings have been taken as evidence that older adults do not develop an automatic attention response to trained CM targets or do not exhibit attentional capture when these items become distractors. This conclusion is at odds with the more traditional visual search and aging literature, which indicates that older adults demonstrate as much capture as the young. A synthesis of these apparently contradictory views may be approached by considering the cognitive differences between visual and semantic category search. First, the memory component of semantic category search is much larger than that of visual search tasks. Because older adults are known to have difficulties in episodic encoding and retrieval (cf. Kausler, 1994), they may be deficient in the development of automatic attention responses in semantic category search and, as a result, demonstrate less capture when these automatized responses are no longer useful. Second, though it is not an inherent property of semantic category search, the protocol employed by Fisk, Rogers and their colleagues generally requires observers to perform several cognitively heterogeneous tasks in interleaved fashion.
It is possible that older adults perform less well in these conditions because they have difficulty with task switching (Bailey & Lauber, 1998; Kramer, Hahn, & Gopher, 1999; Kray & Lindenberger, 2000), which compromises the development of automaticity and thus the attentional capture that results from it.
Attentional capture and focused attention
There is another general class of attentional phenomena in which attentional control is required and attentional capture may become manifest. These often come under the heading of focused attention tasks. They include the flanker task (Eriksen & Eriksen, 1974) and the Stroop task (MacLeod, 1991). In the flanker task, sometimes called non-search detection, observers typically decide which of several possible stimuli has been presented centrally while ignoring stimuli presented in its immediate vicinity. Attentional capture, that is, involuntary processing of stimuli that are spatially close to the central target, is seen as benefits when the flanking information is consistent with the target response, but perhaps more clearly as costs associated with incompatible flankers. Capture may also be evidenced when, relative to a no-flanker control condition, any flanking letters produce performance decrements.
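The benefits and costs described above reduce to simple differences between condition means. In the Python sketch below, the class and method names and the use of the no-flanker condition as the baseline are illustrative assumptions (studies differ in whether they use a no-flanker or a neutral-flanker baseline), and the RTs in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlankerMeans:
    """Mean correct RTs (ms) for one observer or group."""
    no_flanker: float    # target presented alone
    compatible: float    # flankers associated with the target's response
    incompatible: float  # flankers associated with the other response

    def compatibility_effect(self) -> float:
        """Incompatible minus compatible: the usual flanker effect."""
        return self.incompatible - self.compatible

    def benefit(self) -> float:
        """Speed-up from compatible flankers relative to the target alone
        (negative if any flankers slow responding)."""
        return self.no_flanker - self.compatible

    def cost(self) -> float:
        """Slowing from incompatible flankers relative to the target alone."""
        return self.incompatible - self.no_flanker


# Hypothetical group means (ms).
young = FlankerMeans(no_flanker=500.0, compatible=510.0, incompatible=540.0)
```

The compatibility effect corresponds to the incompatible-minus-compatible differences reported by Wright and Elias (1979): 22.5 ms for young and 11.6 ms for older adults. A negative benefit, as in the hypothetical example, is the pattern in which any flanking letters impair performance relative to the target alone.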
In one of the first aging studies of the flanker effect, Wright and Elias (1979, Experiment 1) compared older and younger adults in identifying a central letter presented alone or flanked by neutral noise letters. Both age groups had longer RTs when the target was flanked by noise letters, suggesting some obligatory processing of them, but the older adults were not slowed disproportionately in this condition. In a second experiment, Wright and Elias (1979) compared younger and older adults in a more common variant of the flanker task, in which a central letter is identified under no-noise, compatible-noise, and incompatible-noise conditions. For both groups, incompatible noise resulted in longer latencies, but the difference between incompatible-noise and compatible-noise trials was 22.5 ms for the young and 11.6 ms for the elderly. Thus, the elderly showed less evidence of capture. These findings must be qualified, however, because the elderly have a reduced useful field of view (Ball et al., 1988; Scialfa et al., 1994). As such, they may show smaller costs from the more eccentric flankers. In fact, Cerella (1985) found that age deficits in flanker effects can be large when item separation is small but actually reverse when item separation is larger. Contrary to Cerella's hypothesis, Kramer et al. (1994) found age-equivalent flanker effects with closely spaced targets and distractors (see also Madden & Gottlob, 1997). However, their subjects were well practiced in the task, unlike those in previous flanker studies of aging. Although they are not normally considered flanker studies, two additional investigations bear mention here. Madden (1983) and Nissen and Corkin (1985) carried out aging studies of the benefit of advance spatial information in visual search. For example, Madden (1983) presented observers with four-item letter displays.
On some trials, a two-sided arrow indicated the possible locations of the impending target letter. If the non-cued items fail to capture or vie for attention, then there should be benefits from the advance cues. In fact, older adults showed more benefit than did the young. Nissen and Corkin reported similar results (but see Plude & Hoyer, 1986). Results from a more recent study suggest that older adults might avoid distractor effects under such conditions by employing a narrower focus of attention than younger adults (Madden & Gottlob, 1997). The Stroop color-word task (cf. MacLeod, 1991) has been an archetypal paradigm for the study of focused attention. Interference produced in the condition in which the word names a color different from the ink color reflects obligatory processing of semantic information and is thus appropriately considered a form of attentional capture. Typically, older adults exhibit greater Stroop interference (Dulaney & Rogers, 1994; Hartley, 1993; Houx, Jolles, & Vreeling, 1993; Klein, Ponds, Houx, & Jolles, 1997). While this might be seen as evidence of greater susceptibility to capture among the elderly, the conclusion must be tempered by at least three considerations. First, several analyses suggest that age differences in Stroop interference are eliminated once generalized slowing is controlled (Salthouse & Meinz, 1995;
Verhaeghen & De Meersman, 1998; but see Spieler et al., 1996). Second, Hartley (1993) demonstrated that when the color and the word are separated spatially, older and younger observers show equivalent Stroop effects. This might be taken to mean that older adults show greater attentional capture by semantic content, but only when spatial filtering cannot be used to segregate color and semantic information. Finally, it is surprising how little attention researchers often give to normal age-related changes in color vision. Color naming conditions will perhaps reveal gross deficits in color perception, but even in the absence of pathology, opacification of the lens (Weale, 1986) will render short wavelengths less intense and can change the speed with which that color information is transduced and communicated to visual cortex and beyond. Thus, greater Stroop interference among the elderly may well be attributable to non-attentional factors.
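The first consideration, controlling for generalized slowing, can be illustrated by comparing absolute Stroop interference with interference expressed as a proportion of baseline RT. This proportional score is only one simple way of adjusting for slowing and is not necessarily the procedure used in the analyses cited above. The function name and the RTs in the example are hypothetical, chosen so that the older group is uniformly 1.5 times slower than the young.

```python
from typing import Tuple


def stroop_interference(baseline_rt: float, incongruent_rt: float) -> Tuple[float, float]:
    """Return (absolute interference in ms, interference as a proportion of baseline).

    baseline_rt:    mean color-naming RT in a congruent or neutral condition (ms)
    incongruent_rt: mean RT when the word names a color other than the ink color (ms)
    """
    if baseline_rt <= 0:
        raise ValueError("baseline RT must be positive")
    absolute = incongruent_rt - baseline_rt
    return absolute, absolute / baseline_rt


# Hypothetical means: the older group is 1.5 times slower in both conditions.
young_abs, young_prop = stroop_interference(600.0, 700.0)
older_abs, older_prop = stroop_interference(900.0, 1050.0)
```

Here the older group shows 50 ms more absolute interference, yet the proportional scores are identical, which is the sense in which an apparent age difference can disappear once general slowing is taken into account.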
Summary and Conclusion
The literature discussed above provides a relatively broad and complex view of aging and attentional control, given the potentially different forms of capture that have been studied (e.g., capture engendered by training versus capture that occurs without training, such as that observed for onset distractors), the theoretical issues that have been pursued, and the paradigms that have been employed. The populations of young and old adults employed in studies of attentional control and capture also differ across studies in age range, health, and overall intellectual functioning. Given such heterogeneity in important aspects of the studies, one might wonder whether a set of coherent conclusions can be formulated on the basis of this literature. We believe that the answer is affirmative. There is increasing evidence that sudden onsets or new objects that appear in the environment have a greater impact on older than on younger adults (Juola et al., 2000; Lincourt et al., 1997; Pratt & Bellomo, 1999). That is, such stimuli appear more likely to redirect the spatial attention of older adults than that of younger adults, even when the stimuli are clearly irrelevant, and indeed harmful, to the task at hand. Whether such age-related deficiencies can be reduced with training or increased preparation time is an important question for future research. The study of overt attention, or eye movements, seems to provide a picture of age-related changes in control and capture similar to that provided by the covert cueing literature. In situations in which subjects receive little practice or perform eye movement tasks in concert with other tasks (e.g., with a target discrimination task at the intended location of the eye movement), older adults have more difficulty suppressing inappropriate eye movements than young adults (Butler et al., 1999; Nieuwenhuis et al., 2000).
On the other hand, older adults' eye movement behavior, even in the presence of prepotent stimuli such as onsets or new but task-irrelevant objects, is similar to that of younger adults when voluntary control is not exerted (Kramer et al., 1999, 2000). Such a pattern of results may suggest either (a) that different varieties of inhibitory processes are associated with reflexive and voluntary
saccadic control, and that inhibitory processes associated with voluntary control are more sensitive to aging, or (b) that older adults have more difficulty than younger adults in maintaining the appropriate task goals in order to overcome capture of the eyes by the onset stimulus. Indeed, it is conceivable that inhibitory failure and goal neglect are both implicated in age-related difficulties in avoiding covert and overt capture of attention. Future studies will be necessary to distinguish the contributions of these mechanisms to age-related differences in covert and overt control and capture. At first glance, the research on attentional capture within the context of visual search appears somewhat perplexing. Some studies of training-based capture (i.e., training with specific targets and distractors for thousands of trials before reversing the roles of the targets and distractors) have found similar effects for old and young adults (Anandam & Scialfa, 1999; Scialfa et al., 2000), while other studies have failed to find the development of automaticity, and therefore capture, for older adults (Fisk et al., 1997; Rogers & Fisk, 1991). However, a potentially important difference between these studies is the role of memory and task coordination. For example, the search studies conducted by Fisk, Rogers and colleagues have, for the most part, employed hybrid memory-visual search paradigms in which subjects are required to search for a number of different targets in a display of targets and distractors. Furthermore, different search tasks have often been intermixed in their studies (e.g., searching for one set of targets on one block of trials and for another set of targets on the next). Scialfa and colleagues, on the other hand, have employed more traditional visual search paradigms that entail searching for a single target among many distractors.
Thus, it seems reasonable to speculate that goal maintenance and updating, and the efficiency of inhibition, might be more of a concern in the Fisk and Rogers studies, thereby diminishing the training effects on the development of automaticity (and on capture with target/distractor reversal) for the older adults. One way to examine this hypothesis would be to systematically vary the memory and task-switching demands (see Kramer et al., 1999; Meiran et al., 2001) of a search task in studies of the acquisition of automaticity or training-based capture. Our expectation is that increasing memory load and switching demands would systematically diminish older adults' ability to automatize target detection, thereby reducing capture effects when targets and distractors are reversed. Focused attention studies present a relatively straightforward picture of the influence of age on attentional control and capture. Research with both the flanker and Stroop paradigms suggests that older adults are capable of restricting their spatial attention to successfully ignore task-irrelevant (or harmful) distractors (Hartley, 1993; Kramer et al., 1994; Madden & Gottlob, 1997). However, there is also evidence from the Stroop paradigm that older adults have more difficulty ignoring task-irrelevant information when restricting spatial attention is not a viable strategy (i.e., when the task-relevant and task-irrelevant information are integrated in a single object; Dulaney & Rogers, 1994; Hartley, 1993; Klein et al., 1997). Thus, it would appear that capture of attention by
prepotent semantic information (i.e., well-known color words), like the capture of attention by onsets and new objects, increases as a function of age. Whether similar mechanisms underlie capture of attention by these different varieties of stimuli is an interesting topic for future research. Thus far, the research that we have discussed has concerned the behavioral study of age-related changes in attentional control and capture. However, the burgeoning field of neuroscience and neuroimaging provides another means of exploring age-related changes in attentional control. At present, there has been little research on age-related differences in the functional brain activity that underlies aspects of attention and memory (but see Cabeza, 2000; Grady, 2000, for recent reviews of this growing literature). However, the neuroimaging research on age-related differences in brain activation that has been conducted, using positron emission tomography (PET) and functional magnetic resonance imaging (fMRI), has produced some intriguing findings that lead to a set of tentative but potentially important conclusions. First, a number of neuroimaging studies have found that older adults often show less activation than younger adults in a variety of brain regions, across a variety of memory and attention tasks. Second, studies have reported that older adults often recruit either different regions of cortex or additional regions of cortex compared with young adults performing the same task. For example, Madden et al. (1997) found that younger adults showed greater activation in the occipitotemporal pathway than older adults while performing a divided attention task. Older adults, on the other hand, showed greater activation in prefrontal regions than did younger adults.
Madden and colleagues interpreted these findings as evidence of age-related differences in the forms of control used to perform the task, with younger adults relying primarily on letter identification processes and older adults relying primarily on executive control processes supported by prefrontal regions. Other researchers have come to similar conclusions concerning age-related shifts in control strategies in attention and memory tasks on the basis of changes in brain activation patterns (Buckner & Logan, in press; Reuter-Lorenz et al., 2000). Thus, techniques such as PET, fMRI (D'Esposito et al., 1999) and optical imaging (Gratton & Fabiani, 1998), in conjunction with more traditional behavioral measures, offer the promise of enhancing our understanding of age-related changes in the processes that underlie attentional capture and control, as well as of explicating how such processes are implemented in the brain.
References
Anandam, B. T. & Scialfa, C. T. (1999). Aging and the development of automaticity in feature search. Aging, Neuropsychology, and Cognition, 6, 117-140.
Ardila, A. & Rosselli, M. (1989). Neuropsychological characteristics of normal aging. Developmental Neuropsychology, 5, 307-320.
Atchley, P., Kramer, A.F., & Hillstrom, A.P. (2000). Contingent capture for onsets and offsets: Attentional set for perceptual transients. Journal of Experimental Psychology: Human Perception and Performance, 26, 595-606.
Azari, N.P., Rapoport, S.I., Salerno, J.A., Grady, C.L., Gonzales-Aviles, A., Schapiro, M.B., & Horwitz, B. (1992). Intergenerational correlations of resting cerebral glucose metabolism in old and young women. Brain Research, 552, 556-559.
Bacon, W.F., & Egeth, H.E. (1994). Overriding stimulus-driven attentional capture. Perception & Psychophysics, 55, 485-496.
Bailey, A., & Lauber, E.J. (1998). Learning to task switch and aging. Paper presented at the 1998 Cognitive Aging Conference, Atlanta, GA.
Ball, K., Beard, B., Roenker, D., Miller, R., & Griggs, D. (1988). Age and visual search: Expanding the useful field of view. Journal of the Optical Society of America A, 5, 2210-2219.
Birren, J.E. (1965). Age changes in the speed of behavior: Its central nature and physiological correlates. In A.T. Welford & J.E. Birren (Eds.), Behavior, aging and the nervous system (pp. 114-216). Springfield, IL: Charles C. Thomas.
Birren, J.E., & Schroots, J.J. (1996). History, concepts, and theory in the psychology of aging. In J.E. Birren & K.W. Schaie (Eds.), Handbook of the psychology of aging (pp. 1-23). San Diego, CA: Academic Press.
Buckner, R.L., & Logan, J.M. (in press). Frontal contributions to episodic memory encoding in the young and elderly. In A.E. Parker, E.L. Wilding, & T. Bussey (Eds.), The cognitive neuroscience of memory encoding and retrieval. Philadelphia: Psychology Press.
Burke, D.M. (1997). Language, aging, and inhibitory deficits: Evaluation of a theory. Journal of Gerontology: Psychological Sciences, 52, 254-264.
Butler, K.M., Zacks, R.T., & Henderson, J.M. (1999). Suppression of reflexive saccades in younger and older adults: Age comparisons on an antisaccade task. Memory & Cognition, 27, 584-591.
Cabeza, R. (in press). Functional neuroimaging of cognitive aging. In R. Cabeza & A. Kingstone (Eds.), Handbook of functional neuroimaging of cognition. Cambridge, MA: MIT Press.
Cerella, J. (1985). Information processing rates in the elderly. Psychological Bulletin, 98, 67-83.
Cheal, M.L., & Lyon, D.R. (1991). Central and peripheral precueing of forced-choice discrimination. Quarterly Journal of Experimental Psychology, 43A, 859-880.
Coffey, C.E., Wilkinson, W.F., Parashos, I.A., Soady, A.A.R., Sullivan, R.J., Paterson, L.J., Figiel, G.S., Webb, M.C., Spritzer, C.E., & Djang, W.T. (1992). Quantitative cerebral anatomy of the aging human brain: A cross-sectional study using magnetic resonance imaging. Neurology, 42, 527-536.
Corbetta, M., Miezin, F.M., Dobmeyer, S., Shulman, G.L., & Petersen, S.E. (1991). Selective and divided attention during visual discrimination of shape, color, and speed: Functional anatomy by positron emission tomography. Journal of Neuroscience, 11, 2383-2402.
Kramer, Scialfa, Peterson and Irwin
Corbetta, M. (1998). Frontoparietal cortical networks for directing attention and the eye to visual locations: Identical, independent, or overlapping neural systems? Proceedings of the National Academy of Sciences, 95, 831-838.
Daigneault, S., Braun, C., & Whitaker, H. (1992). Early effects of normal aging on perseverative and non-perseverative prefrontal measures. Developmental Neuropsychology, 8, 99-114.
D'Esposito, M., Detre, J., Alsop, D., Shin, R., Atlas, S., & Grossman, M. (1995). The neural basis of the central executive system of working memory. Nature, 378, 279-281.
D'Esposito, M., Zarahn, E., & Aguirre, G.K. (1999). Event-related functional MRI: Implications for cognitive psychology. Psychological Bulletin, 125, 155-164.
De Jong, R. (2001). Adult age differences in goal activation and goal maintenance. European Journal of Cognitive Psychology, 13, 71-89.
De Jong, R., Berendsen, E., & Cools, R. (1999). Goal neglect and inhibitory limitations: Dissociable causes of interference effects in conflict situations. Acta Psychologica, 101, 379-394.
Dempster, F.N. (1992). The rise and fall of the inhibitory mechanism: Toward a unified theory of cognitive development and aging. Developmental Review, 12, 45-75.
Deubel, H., & Schneider, W.X. (1996). Saccade target selection and object recognition: Evidence for a common attentional mechanism. Vision Research, 36, 1827-1837.
Dulaney, C., & Rogers, W.A. (1994). Mechanisms underlying reduction in Stroop interference with practice for young and old adults. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 470-484.
Duncan, J. (1995). Attention, intelligence, and the frontal lobes. In M.S. Gazzaniga (Ed.), The cognitive neurosciences (pp. 721-733). Cambridge, MA: MIT Press.
Egeth, H.E., & Yantis, S. (1997). Visual attention: Control, representation, and time course. Annual Review of Psychology, 48, 269-297.
Engle, R.W., Conway, A.R., Tuholski, S.W., & Shisler, R.J. (1995). A resource account of inhibition. Psychological Science, 6, 122-125.
Enns, J.T. (1989). Three-dimensional features that pop out in visual search. In D. Brogan (Ed.), Visual search (pp. 37-45). London: Taylor & Francis.
Eriksen, B.A., & Eriksen, C.W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics, 16, 143-149.
Eriksen, C.W., & Yeh, Y.Y. (1985). Allocation of attention in the visual field. Journal of Experimental Psychology: Human Perception and Performance, 11, 583-597.
Fischer, B., Biscaldi, M., & Gezeck, S. (1998). On the development of voluntary and reflexive components in human saccade generation. Brain Research, 754, 285-297.
Fisk, A.D., Rogers, W.A., & Giambra, L.M. (1990). Consistent and varied memory/visual search: Is there an interaction between age and response-set effects? Journal of Gerontology: Psychological Sciences, 45, P81-P87.
Fisk, A.D., Hertzog, C., Lee, M.D., Rogers, W.A., & Anderson-Garlach, M. (1994). Long-term retention of skilled visual search: Do young adults retain more than old adults? Psychology and Aging, 9, 206-215.
Fisk, A.D., & Rogers, W.A. (1991). Toward an understanding of age-related memory and visual search effects. Journal of Experimental Psychology: General, 120, 131-149.
Fisk, A.D., Rogers, W.A., Cooper, B.P., & Gilbert, D.K. (1997). Automatic category search and its transfer: Aging, type of search, and level of learning. Journals of Gerontology Series B: Psychological Sciences and Social Sciences, 52B, 91-102.
Folk, C.L., & Hoyer, W.J. (1992). Aging and shifts of visual spatial attention. Psychology and Aging, 7, 453-465.
Folk, C.L., & Remington, R.W. (1998). Selectivity in distraction by irrelevant featural singletons: Evidence for two forms of attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 24, 847-858.
Folk, C.L., Remington, R.W., & Johnston, J.C. (1992). Involuntary covert orienting is contingent on attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 18, 1030-1044.
Folk, C.L., Remington, R.W., & Johnston, J.C. (1993). Contingent attentional capture: A reply to Yantis. Journal of Experimental Psychology: Human Perception and Performance, 19, 682-685.
Foster, J.K., Behrmann, M., & Stuss, D.T. (1995). Aging and visual search: Generalized cognitive slowing or selective deficit in attention? Aging and Cognition, 2, 279-299.
Gibson, B.S., & Kelsey, E.M. (1998). Stimulus-driven attentional capture is contingent on attentional set for displaywide visual features. Journal of Experimental Psychology: Human Perception and Performance, 24, 699-706.
Grady, C.L. (2000). Functional brain imaging and age-related changes in cognition. Biological Psychology, 54, 259-281.
Gratton, G., & Fabiani, M. (1998). Dynamic brain imaging: Event-related optical signal (EROS) measures of the time course and localization of cognitive-related activity. Psychonomic Bulletin & Review, 5, 535-563.
Greenwood, P.M., & Parasuraman, R. (1994). Attentional disengagement deficit in nondemented elderly over 75 years of age. Aging and Cognition, 1, 188-202.
Guitton, D., Buchtel, H.A., & Douglas, R.M. (1985). Frontal lobe lesions in man cause difficulties in suppressing reflexive glances and in generating goal-directed saccades. Experimental Brain Research, 58, 455-472.
Hallett, P.E. (1978). Primary and secondary saccades to goals defined by instructions. Vision Research, 18, 1279-1296.
Hartley, A.A. (1993). Evidence for the selective preservation of spatial selective attention in old age. Psychology and Aging, 8, 371-379.
Hartley, A.A., Kieley, J.M., & Slabach, E.H. (1990). Age differences and similarities in the effects of cues and prompts. Journal of Experimental Psychology: Human Perception and Performance, 16, 523-537.
Hasher, L., Stoltzfus, E.R., Zacks, R., & Rypma, B. (1991). Age and inhibition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 163-169.
Hasher, L., & Zacks, R. (1988). Working memory, comprehension, and aging: A review and a new view. In G.H. Bower (Ed.), The psychology of learning and motivation (Vol. 22, pp. 193-225). San Diego, CA: Academic Press.
Henderson, J.M., & Hollingworth, A. (1999). The role of fixation position in detecting scene changes across saccades. Psychological Science, 10, 438-443.
Hillstrom, A.P., & Yantis, S. (1994). Visual motion and attentional capture. Perception & Psychophysics, 55, 399-411.
Ho, G., & Scialfa, C.T. (submitted). Age, skill transfer, and conjunction search.
Kausler, D.H. (1994). Learning and memory in normal aging. New York: Academic Press.
Hoffman, J.E., & Subramaniam, B. (1995). The role of visual attention in saccadic eye movements. Perception & Psychophysics, 57, 787-795.
Houx, P., Jolles, J., & Vreeling, F. (1993). Stroop interference: Aging effects assessed with the Stroop color-word test. Experimental Aging Research, 19, 209-224.
Humphrey, D.G., & Kramer, A.F. (1997). Age differences in visual search for feature, conjunction, and triple-conjunction targets. Psychology and Aging, 12, 704-717.
Irwin, D.E., Brockmole, J., & Kramer, A.F. (submitted). Attention precedes involuntary saccades.
Irwin, D.E., Colcombe, A.M., Kramer, A.F., & Hahn, S. (2000). Attentional and oculomotor capture by onset, luminance, and color singletons. Vision Research, 40, 1443-1458.
James, W. (1890). The principles of psychology. New York: Henry Holt & Company.
Jenkins, L., Myerson, J., Joerding, A., & Hale, S. (2000). Converging evidence that visuospatial cognition is more age-sensitive than verbal cognition. Psychology and Aging, 15, 157-175.
Jonides, J. (1981). Voluntary vs. automatic control over the mind's eye's movement. In J.B. Long & A.D. Baddeley (Eds.), Attention and performance IX (pp. 187-203). Hillsdale, NJ: Erlbaum.
Jonides, J., & Yantis, S. (1988). Uniqueness of abrupt visual onset in capturing attention. Perception & Psychophysics, 43, 346-354.
Juola, J.F., Koshino, H., Warner, C.B., McMickell, M., & Peterson, M. (2000). Automatic and voluntary control of attention in young and elderly adults. American Journal of Psychology, 113, 159-178.
Kane, M.J., Hasher, L., Stoltzfus, E.R., Zacks, R.T., & Connelly, S.L. (1994). Inhibitory attentional mechanisms and aging. Psychology and Aging, 9, 103-112.
Kieley, J.M., & Hartley, A.A. (1997). Age-related equivalence of identity suppression in the Stroop color-word task. Psychology and Aging, 12, 22-29.
Klein, M., Ponds, R.W.H.M., Houx, P.J., & Jolles, J. (1997). Effect of test duration on age-related differences in Stroop interference. Journal of Clinical and Experimental Neuropsychology, 19, 77-82.
Korteling, J. (1991). Effects of skill integration and perceptual competition on age-related differences in dual-task performance. Human Factors, 33, 35-44.
Kowler, E., Anderson, E., Dosher, B., & Blaser, E. (1995). The role of attention in the programming of saccades. Vision Research, 35, 1897-1916.
Kramer, A.F., Hahn, S., & Gopher, D. (1999). Task coordination and aging: Explorations of executive control processes in the task switching paradigm. Acta Psychologica, 101, 339-378.
Kramer, A.F., Hahn, S., Irwin, D.E., & Theeuwes, J. (1999). Attentional capture and aging: Implications for visual search performance and oculomotor control. Psychology and Aging, 14, 135-154.
Kramer, A.F., Hahn, S., Irwin, D.E., & Theeuwes, J. (2000). Age differences in the control of looking behavior: Do you know where your eyes have been? Psychological Science, 11, 210-216.
Kramer, A.F., Humphrey, D.G., Larish, J.F., Logan, G.D., & Strayer, D.L. (1994). Aging and inhibition: Beyond a unitary view of inhibitory processing in attention. Psychology and Aging, 9, 491-512.
Kramer, A.F., Larish, J., Weber, T., & Bardell, L. (1999c). Training for executive control: Task coordination strategies and aging. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII. Cambridge, MA: MIT Press.
Kramer, A.F., Martin-Emerson, R., Larish, J., & Andersen, G.J. (1996). Aging and filtering by movement in visual search. Journal of Gerontology: Psychological Sciences, 51, 201-216.
Kray, J., & Lindenberger, U. (2000). Adult age differences in task switching. Psychology and Aging, 15, 126-147.
Kwong See, S.T., & Ryan, E.B. (1995). Cognitive mediation of adult age differences in language performance. Psychology and Aging, 10, 458-468.
LaBerge, D. (1995). Attentional processing. Cambridge, MA: Harvard University Press.
Lawrence, B., Myerson, J., & Hale, S. (1998). Differential decline of verbal and visuospatial processing across the adult lifespan. Aging, Neuropsychology, and Cognition, 5, 129-146.
Lincourt, A.E., Folk, C.L., & Hoyer, W.J. (1997). Effects of aging on voluntary and involuntary shifts of attention. Aging, Neuropsychology, and Cognition, 4, 290-303.
Madden, D.J. (1983). Aging and distraction from highly familiar stimuli during visual search. Developmental Psychology, 19, 499-507.
Madden, D.J. (1990). Adult age differences in the time course of visual attention. Journal of Gerontology: Psychological Sciences, 45, 9-16.
Madden, D.J., & Gottlob, L.R. (1997). Adult age differences in strategic and dynamic components of focusing visual attention. Aging, Neuropsychology, and Cognition, 4, 185-210.
Madden, D.J., & Nebes, R.D. (1980). Aging and the development of automaticity in visual search. Developmental Psychology, 16, 377-384.
Madden, D.J., Turkington, T.G., Provenzale, J.M., Hawk, T.C., Hoffman, J.M., & Coleman, R.E. (1997). Selective and divided visual attention: Age-related changes in regional cerebral blood flow measured by H₂¹⁵O. Human Brain Mapping, 5, 389-409.
Martin-Emerson, R., & Kramer, A.F. (1997). Offset transients modulate attentional capture by sudden onsets. Perception & Psychophysics, 59, 739-751.
May, C.P., Kane, M.J., & Hasher, L. (1995). Determinants of negative priming. Psychological Bulletin, 118, 35-54.
Mayr, U., & Kliegl, R. (1993). Sequential and coordinative complexity: Age-based processing limitations in figural transformations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1297-1320.
Mayr, U., & Liebscher, T. (1998). Poster presented at the 28th Attention and Performance Conference, June, Cumberland, England.
McDowd, J.M. (1997). Inhibition in attention and aging. Journal of Gerontology: Psychological Sciences, 52, 265-273.
McDowd, J.M., Oseas-Kreger, D.M., & Filion, D.L. (1995). Inhibitory processes in cognition and aging. In F.N. Dempster & C.J. Brainerd (Eds.), Interference and inhibition in cognition (pp. 363-400). San Diego, CA: Academic Press.
MacLeod, C.M. (1991). Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin, 109, 163-203.
Mead, S., & Fisk, A.D. (1998). Measuring skill acquisition and retention with an ATM simulator: The need for age-specific training. Human Factors, 40, 516-523.
Meiran, N., Gotler, A., & Perlman, A. (2001). Old age is associated with a pattern of relatively intact and impaired task-set switching abilities. Journals of Gerontology Series B: Psychological Sciences and Social Sciences, 56, 88-102.
Moore, C. (1994). Negative priming depends on probe trial conflict: Where has all the inhibition gone? Perception & Psychophysics, 56, 133-144.
Müller, H.J., & Rabbitt, P.M.A. (1989). Reflexive and voluntary orienting of visual attention: Time course of activation and resistance to interruption. Journal of Experimental Psychology: Human Perception and Performance, 15, 315-330.
Munoz, D.P., Broughton, J.R., Goldring, J.E., & Armstrong, I.T. (1998). Age-related performance of human subjects in saccadic eye movement tasks. Experimental Brain Research, 121, 391-400.
Neill, T., & Valdes, L. (1996). Facilitatory and inhibitory aspects of attention. In A.F. Kramer, M.G.H. Coles, & G.D. Logan (Eds.), Converging operations in the study of visual selective attention. Washington, DC: APA Press.
Nieuwenhuis, S., Ridderinkhof, K.R., de Jong, R., Kok, A., & van der Molen, M.W. (2000). Inhibitory inefficiency and failures of intention activation: Age-related decline in the control of saccadic eye movements. Psychology and Aging, 15, 635-647.
Nissen, M.J., & Corkin, S. (1985). Effectiveness of attentional cueing in older and younger adults. Journal of Gerontology, 40, 185-191.
Nobre, A.C., Gitelman, D.R., Dias, E.C., & Mesulam, M.M. (2000). Covert visual spatial orienting and saccades: Overlapping neural systems. NeuroImage, 11, 210-216.
Olincy, A., Ross, R.G., Young, D.A., & Freedman, R. (1997). Age diminishes performance on an antisaccade eye movement task. Neurobiology of Aging, 18, 483-489.
Pashler, H. (1988). Cross-dimensional interaction and texture segregation. Perception & Psychophysics, 43, 307-318.
Pfefferbaum, A., Lim, K.O., Zipursky, R.B., Mathalon, D.H., Rosenbloom, M.J., Lane, B., Ha, C.N., & Sullivan, E.V. (1992). Brain gray and white matter volume loss accelerated with aging in chronic alcoholics: A quantitative MRI study. Alcoholism: Clinical and Experimental Research, 16, 1078-1089.
Pierrot-Deseilligny, C., Rivaud, S., & Gaymard, B. (1991). Cortical control of reflexive visually-guided saccades. Brain, 114, 1473-1485.
Pierrot-Deseilligny, C., Rivaud, S., Gaymard, B., Müri, R., & Vermersch, A.I. (1995). Cortical control of saccades. Annals of Neurology, 37, 557-567.
Plude, D., & Doussard-Roosevelt, J. (1989). Aging, selective attention, and feature integration. Psychology and Aging, 4, 98-105.
Plude, D.J., & Hoyer, W.J. (1986). Age and the selectivity of visual information processing. Psychology and Aging, 1, 4-10.
Posner, M. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32, 3-25.
Pratt, J., & Bellomo, C.N. (1999). Attentional capture in younger and older adults. Aging, Neuropsychology, and Cognition, 6, 19-31.
Rabbitt, P.M.A. (1965). An age decrement in the ability to ignore irrelevant information. Journal of Gerontology, 20, 233-237.
Raz, N. (2000). Aging of the brain and its impact on cognitive performance: Integration of structural and functional findings. In F. Craik & T. Salthouse (Eds.), Handbook of aging and cognition. New Jersey: Erlbaum.
Remington, R.W., Johnston, J.C., & Yantis, S. (1992). Involuntary attentional capture by abrupt onsets. Perception & Psychophysics, 51, 279-290.
Reuter-Lorenz, P.A., Jonides, J., Smith, E.E., Hartley, A., Miller, A., Marshuetz, C., & Koeppe, R.A. (2000). Age differences in the frontal lateralization of verbal and spatial working memory as revealed by PET. Journal of Cognitive Neuroscience, 12, 174-187.
Rivaud, S., Muri, R.M., Gaymard, B., Vermersch, A.I., & Pierrot-Deseilligny, C. (1994). Eye movement disorders after frontal eye field lesions in humans. Experimental Brain Research, 102, 110-120.
Roberts, R.J., Hager, L.D., & Heron, C. (1994). Prefrontal cognitive processes: Working memory and inhibition in the antisaccade task. Journal of Experimental Psychology: General, 123, 374-393.
Rogers, W.A. (1992). Age differences in visual search: Target and distractor learning. Psychology and Aging, 7, 526-535.
Rogers, W.A., & Fisk, A.D. (1991). Are age differences in consistent-mapping visual search due to feature learning or attention training? Psychology and Aging, 6, 542-550.
Rogers, W.A., Fisk, A.D., & Hertzog, C. (1994). Do ability-performance relationships differentiate age and practice effects in visual search? Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 710-738.
Salmon, E., Maquet, P., Sadzot, B., Degueldre, C., Lemaire, C., & Franck, G. (1991). Decrease of frontal metabolism demonstrated by positron emission tomography in a population of healthy elderly volunteers. Acta Neurologica Belgica, 91, 288-295.
Salthouse, T.A. (1996). General and specific speed mediation of adult age differences in memory. Journal of Gerontology: Psychological Sciences, 51, 30-42.
Salthouse, T.A. (1994). The aging of working memory. Neuropsychology, 8, 535-543.
Salthouse, T.A., & Meinz, E.J. (1995). Aging, inhibition, working memory, and speed. Journal of Gerontology: Psychological Sciences, 50, 297-306.
Salthouse, T.A., & Somberg, B.L. (1982). Skilled performance: Effects of adult age and experience on elementary processes. Journal of Experimental Psychology: General, 111, 176-207.
Schall, J.D. (1995). Neuronal basis of saccadic target selection. Reviews in the Neurosciences, 6, 63-85.
Scialfa, C.T., Esau, S.P., & Joffe, K.M. (1998). Age, target-distractor similarity, and visual search. Experimental Aging Research, 24, 337-358.
Scialfa, C.T., Jenkins, L., Hamaluk, E., & Skaloud, P. (2000). Aging and the development of automaticity in conjunction search. Journal of Gerontology: Psychological Sciences, 55B, P27-P46.
Scialfa, C.T., & Joffe, K.M. (1997). Age differences in feature and conjunction search: Implications for theories of visual search and generalized slowing. Aging, Neuropsychology, and Cognition, 4, 1-21.
Scialfa, C.T., Thomas, D.M., & Joffe, K.M. (1994). Age differences in the useful field of view: An eye movement analysis. Optometry and Vision Science, 71, 1-7.
Shaw, T., Mortel, K., Meyer, J., Rogers, R., Hardenberg, J., & Cutaia, M. (1984). Cerebral blood flow changes in benign aging and cerebrovascular disease. Neurology, 34, 855-862.
Sheliga, B.M., Craighero, L., Riggio, L., & Rizzolatti, G. (1997). Effects of spatial attention on directional manual and ocular responses. Experimental Brain Research, 114, 339-351.
Shiffrin, R.M., & Dumais, S.T. (1981). The development of automatism. In J.R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 111-140). Hillsdale, NJ: Erlbaum.
Shimamura, A.P., & Jurica, P.J. (1994). Memory interference effect and aging: Findings from a test of frontal lobe function. Neuropsychology, 8, 408-412.
Spieler, D.H., Balota, D.A., & Faust, M.E. (1996). Stroop performance in healthy younger and older adults and in individuals with dementia of the Alzheimer type. Journal of Experimental Psychology: Human Perception and Performance, 22, 461-479.
Sullivan, M.P., & Faust, M.E. (1993). Evidence for identity inhibition during selective attention in older adults. Psychology and Aging, 8, 589-598.
Sullivan, M.P., Faust, M.E., & Balota, D.A. (1995). Identity negative priming in older adults and individuals with dementia of the Alzheimer type. Neuropsychology, 9, 537-555.
Theeuwes, J. (1991). Exogenous and endogenous control of attention: The effects of visual onsets and offsets. Perception & Psychophysics, 49, 83-90.
Theeuwes, J. (1992). Perceptual selectivity for color and form. Perception & Psychophysics, 51, 599-606.
Theeuwes, J., Kramer, A.F., Hahn, S., & Irwin, D.E. (1998). Our eyes do not always go where we want them to go: Capture of the eyes by new objects. Psychological Science, 9, 379-385.
Theeuwes, J., Kramer, A.F., Hahn, S., Irwin, D.E., & Zelinsky, G.J. (1999). Influence of attentional capture on eye movement control. Journal of Experimental Psychology: Human Perception and Performance, 25, 1595-1608.
Tipper, S. (1991). Less attentional selectivity as a result of declining inhibition in older adults. Bulletin of the Psychonomic Society, 29, 45-47.
Todd, S., & Kramer, A.F. (1994). Attentional misguidance in visual search. Perception & Psychophysics, 56, 198-210.
Tsang, P. (1996). Boundaries of cognitive performance as a function of age and flight performance. International Journal of Aviation Psychology, 6, 359-377.
Verhaeghen, P., & De Meersman, L. (1998). Aging and the Stroop effect: A meta-analysis. Psychology and Aging, 13, 435-444.
Verhaeghen, P., Kliegl, R., & Mayr, U. (1997). Sequential and coordinative complexity in time-accuracy functions for mental arithmetic. Psychology and Aging, 12, 555-564.
Vakil, E., Manovich, R., Ramati, E., & Blachstein, H. (1996). The Stroop color-word test as a measure of selective attention: Efficiency in the elderly. Developmental Neuropsychology, 12, 313-325.
Walker, R., Husain, M., Hodgson, T., Harrison, J., & Kennard, C. (1998). Saccadic eye movement and working memory deficits following damage to the human prefrontal cortex. Neuropsychologia, 36, 1141-1159.
Warner, C.B., Juola, J.F., & Koshino, H. (1990). Voluntary allocation versus automatic capture of attention. Perception & Psychophysics, 48, 243-251.
Waters, G.S., & Caplan, D. (2001). Aging, working memory, and on-line syntactic processing in sentence comprehension. Psychology and Aging, 16, 128-144.
Weale, R.A. (1986). Senescence and color vision. Journal of Gerontology, 41, 635-640.
West, R.L. (1996). An application of prefrontal cortex function theory to cognitive aging. Psychological Bulletin, 120, 272-292.
Wright, L.L., & Elias, J.W. (1979). Age differences in the effects of perceptual noise. Journal of Gerontology, 34, 704-708.
Yantis, S., & Egeth, H. (1999). On the distinction between visual salience and stimulus-driven attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 25, 661-676.
Yantis, S., & Hillstrom, A.P. (1994). Stimulus-driven attentional capture: Evidence from equiluminant visual objects. Journal of Experimental Psychology: Human Perception and Performance, 20, 95-107.
Yantis, S., & Jonides, J. (1984). Abrupt visual onsets and selective attention: Evidence from visual search. Journal of Experimental Psychology: Human Perception and Performance, 10, 601-621.
Yantis, S., & Jonides, J. (1990). Abrupt visual onsets and selective attention: Voluntary versus automatic allocation. Journal of Experimental Psychology: Human Perception and Performance, 16, 121-134.
Zacks, R., & Hasher, L. (1997). Cognitive gerontology and attentional inhibition: A reply to Burke and McDowd. Journal of Gerontology: Psychological Sciences, 52, 274-283.
Acknowledgments
The preparation of this chapter was supported by grants from the National Institute on Aging (R01 AG14966) and the Institute for the Study of Aging. We would like to thank Jan Theeuwes and Charles Folk for their helpful comments on an earlier draft of this manuscript.
Part V
Individual Differences
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © Elsevier Science B.V. All rights reserved.
13
A Multidisciplinary Perspective on Attentional Control
Douglas Derryberry and Marjorie A. Reed
Recent years have seen an increasing interest in the higher-level processes that influence attention. This trend is particularly evident in behavioral research on attentional capture, where capture by abrupt onsets and new objects appears to depend on top-down modulation by attentional control settings (e.g., Folk & Remington, 1999; Yantis & Egeth, 1999). It is also evident in neuropsychological investigations of the attentional problems that often arise from damage to the frontal cortex (e.g., Grafman, Holyoak & Boller, 1995). The study of such control processes is important to our understanding of attentional functioning and, moreover, to our understanding of the control of cognition in general.
This chapter has three general goals. The first goal is to consider the role of motivational processes in regulating attention. Although motivation has received little emphasis in most models of cognition, its relation to attention is straightforward: Ongoing motivational states bias attention in favor of stimuli that are relevant to the current need (Derryberry & Tucker, 1993). Along these lines, we illustrate several examples of what might be referred to as "incentive capture". Attentional orienting is biased by the motivational valence (positive or negative) of potential target locations, as well as by the motivational "value" (low or high) of targets appearing at those locations. These effects may be distinct from the usual forms of capture by abrupt onsets or novel targets and may involve different mechanisms.
The second goal is to consider potential mechanisms underlying motivational effects. Given the assumption that motivational processes exert both reactive and voluntary effects on attention, it is suggested that motivational effects may be mediated through several attentional systems.
Specifically, the involuntary influences may bias posterior attentional operations involved in orienting, whereas the voluntary effects may be mediated via frontal executive operations (Posner & Raichle, 1994). For a person in an anxious state, for example, orienting may be reflexively biased in favor of locations carrying potential threats. Nevertheless, it is possible for the anxious person to voluntarily control this bias, allowing attention to shift to safer locations.
The third goal is to consider individual differences, both at the level of motivational processes and at the level of attentional control processes. Motivational differences have long been emphasized in the field of personality, with variability related to dimensions such as Extraversion and Anxiety. In addition, recent models suggest that attentional differences, particularly the capacity for voluntary self-control, are also of importance to personality. On this view, a person's capacity for self-control depends upon the strength of reactive motivational tendencies relative to more voluntary attentional skills. People with good voluntary control of attention will be better able to regulate more reactive motivational tendencies, thereby enhancing their performance in many situations (Rothbart, Derryberry & Posner, 1994).
The chapter begins with a brief review of the relations between motivation and attention and summarizes relevant behavioral and physiological evidence. Next, a temperament perspective is presented, emphasizing individual differences in the underlying motivational and attentional systems and their contribution to personality. Finally, several recent studies that address these processes are described. The studies suggest that attentional orienting, and therefore vulnerability to capture, depends on (1) the motivational valence or value attached to a location, (2) the individual's motivational tendencies related to trait anxiety and extraversion, and (3) the individual's capacity for voluntary attentional control.
Relationships Between Motivation and Attention
Most psychological models view motivation as a set of processes that are initiated by current deficits or deviations related to the organism's appetitive and defensive needs. Traditional models assumed that the primary role of these motivational processes is to bias motor and autonomic responses. This led to models suggesting that motives function at a relatively high level to potentiate a set of responses (e.g., various forms of approach or avoidance) that might prove useful in satisfying the current need (Gallistel, 1980). Such hierarchical approaches were consistent with emerging knowledge of the brain, which emphasized descending control of the brainstem and spinal cord by motivational projections from the limbic system. However, more recent models have gone beyond descending response modulation to propose that motivational processes will be more effective if they also modulate incoming sensory information. For example, defensive or security needs promote avoidance behavior, but at the same time they facilitate the processing of information relevant to avoidance, such as environmental sources of danger and safety. This view is supported by recent findings that motivational processes originating within the limbic system exert a variety of ascending influences on perceptual networks within the cortex. Thus, motivational states can be more thoroughly defined as organizing influences that arise from current deficits and function to modulate both perceptual and response pathways in order to link needed goals to adaptive behaviors (Derryberry & Tucker, 1992).
Multidisciplinary Perspectives
The modulation of perception is carried out through several mechanisms. The most general mechanism involves motivational adjustments of "arousal", thought to be carried out by projections from the limbic regions to the ascending subsystems of the reticular core. For example, Tucker and his colleagues have suggested that anxious states recruit ascending dopaminergic subsystems, which in turn promote a tonic activation of the cortex and a consequent narrowing of attention (Tucker, 1992; Tucker & Williamson, 1984). Such a general narrowing mechanism is supported by findings that anxious individuals show enhanced processing of central as compared to peripheral stimuli (e.g., Hockey, 1979) and of local as opposed to global perceptual elements (Derryberry & Reed, 1998). More specific motivational influences may be carried out by relatively automatic pathway activation. Deutsch (1960) suggested that an animal's state of hunger functions by activating a memory representation of a spatial location associated with food. As activation spreads out and dissipates from that location, the animal uses the resulting "motivational gradient" as a navigational pathway for approaching and finding the food. Along similar lines, a defensive motivational state (e.g., fear) might function by directly activating locations related to danger and safety, thereby setting up an effective escape route. Such direct mechanisms are also consistent with anatomical evidence of direct connections from limbic to perceptual pathways. For example, Mesulam (1981) has emphasized interactions between a "motivational map" in the paralimbic cingulate cortex and a "sensory map" in the parietal neocortex. Still other models have suggested that a motivational regulation of perception should prove most flexible if mediated by attentional processes. In the case of hunger, for example, Wise (1987) suggested that hunger functions by increasing the attentional "holding power" of food objects.
Rather than increasing the probability that an animal will make contact with a food object, hunger makes it harder for the animal to shift away. Thus, hunger prolongs eating periods without increasing their frequency. During defensive states such as anxiety, Gray proposed that attention is directed to relevant stimuli in the environment to promote an effective assessment of risk and optimal response selection. As discussed in more detail below, motivational effects on attention are supported by anatomical connections from limbic motivational circuitry to attentional networks within the posterior and anterior parts of the cortex. Also supportive are a number of human studies linking motivation and attention. Several studies using the Stroop task have found that when subjects are deprived of food, they are delayed in naming the color of food-related words, with the amount of delay correlating with their subjective report of hunger (Channon & Hayward, 1990). Similarly, studies using the dichotic listening task have found evidence for attentional biases when sexual words are presented, and enhancing the sexual motive through testosterone injections increases the bias favoring sexual
Derryberry and Reed
words (Alexander, Swerdloff, Wang, & Davidson, 1997). In addition, studies using the "dot-probe" task have found that when two words are simultaneously presented and followed by a detection target, trait-anxious subjects are biased in favor of threatening words. This bias has been shown to increase as the anxious state increases, with students becoming more attentive to threatening words (e.g., test, flunk) immediately before an important exam (MacLeod & Mathews, 1988). While motivation models tend to view attention as a unified system, models from cognitive neuroscience indicate that attention arises from several interacting networks. This opens up the possibility that motivational effects may be exerted through different sets of attentional processes. For example, Posner and his colleagues suggest that attention involves interacting vigilance, posterior orienting, and anterior executive subsystems (Posner & DiGirolamo, 1998; Posner & Raichle, 1994; Posner & Rothbart, 1998a). The "vigilance system" is thought to involve ascending noradrenergic projections from the brainstem, and is thus similar to the relatively general arousal mechanisms described above. Single-cell studies indicate that the source cells of the noradrenergic system (the locus coeruleus) are highly responsive to the motivational significance of stimuli. In Posner's model, the vigilance mechanism serves to facilitate the functioning of higher level attentional systems. The "posterior attentional system" involves a network interconnecting the parietal cortex, superior colliculus, and thalamic pulvinar nucleus. Its primary function is that of orienting attention from one location to another, for the most part in a relatively reflexive manner. Performance and neuropsychological data indicate that orienting involves three component operations: disengaging attention from the current location, moving to a new location, and engaging the new location. 
The engage operation facilitates the flow of information to frontal regions for further attentional and response-related processing. As noted above, motivationally significant objects can appear in different locations, and thus a mechanism for facilitating information in specific locations would seem highly adaptive. While the performance studies mentioned above suggest motivational influence on orienting (e.g., MacLeod & Mathews, 1988), the relevant anatomical connections remain obscure. Connections from limbic regions to the colliculus, pulvinar, and parietal cortex exist, but their roles have not been adequately characterized. The "anterior attentional system" involves frontal circuitry centering upon the anterior cingulate region. This system is thought to serve an "executive" function in regulating the posterior orienting system. In addition, the anterior system functions to inhibit dominant responses, to inhibit dominant conceptual associations, and to aid in the detection and correction of errors (Posner & DiGirolamo, 1998; Posner & Rothbart, 1998a). A similar "supervisory attentional system" involving related functions has been described by Shallice and his colleagues (Stuss, Shallice, Alexander & Picton, 1995). Motivational influences on the high level attentional
systems are supported by the extensive projections from limbic to frontal regions, which are particularly strong in the case of the "paralimbic" anterior cingulate region. Many anatomists have noted the close connectivity between the limbic and frontal regions, with some viewing the frontal lobe as the cortical representative of the limbic system (Nauta, 1971). Although clear experimental evidence is lacking, motivational influences at the executive level should be highly adaptive. For example, several motivational states are often active simultaneously and may come into conflict. Higher level mechanisms for controlling orienting and suppressing dominant responses may help one motive to suppress others and gain control of behavior. In addition, the anterior attentional system is closely interconnected with adjacent cortical regions that appear crucial to volition. Thus, this system may allow motivational systems to function in a more voluntary as opposed to reactive way. It can be seen from this brief overview that motivational effects extend beyond simple behavioral facilitation or increases in effort. Both psychological and physiological perspectives suggest that motivation may also play important roles in regulating attention. Perhaps one of the reasons that motivational processes have been neglected in much of psychology is that they show considerable variability across individuals. As will be seen in the next section, however, such individual differences provide a useful perspective for understanding the underlying mechanisms, as well as for appreciating their importance in everyday life.
Individual Differences in Motivation and Attention
The topic of individual differences in motivation has been addressed most directly by temperament approaches to personality. Temperament approaches attempt to relate major personality dimensions to individual differences in the reactivity of underlying neural systems. While some earlier models emphasized arousal systems (e.g., Eysenck, 1967), recent approaches have focused on systems related to motivation and emotion (e.g., Depue & Collins, 1999; Gray & McNaughton, 1996). An underlying assumption is that by playing a central guiding role in information processing, motivational processes are also fundamental to personality and its development (Derryberry & Reed, 1996). Although it may someday be possible to identify more specific motivational/personality relations, current approaches emphasize relatively general systems related to appetitive and defensive needs. Examples of appetitive systems include Gray's (1987) "behavioral activation system," Panksepp's (1998) "expectancy system," and Depue and Collins' (1999) "behavioral facilitation system." These systems are generally thought to receive information concerning positive incentives (e.g., signals of reward or non-punishment) by means of cortical projections to the amygdala and hypothalamus. Outputs from the limbic circuits
interact with ascending dopaminergic projections to facilitate the organization of approach behavior within the basal ganglia and frontal cortex. The resulting motivational state features attention to positive incentives, approach or exploratory behavior, and emotional feelings such as hope, relief, and anticipatory eagerness. Individual differences in the reactivity of the appetitive systems are most commonly related to the personality dimension of Extraversion. Appetitive motivation is viewed as increasing in strength as one moves from more introverted to more extraverted individuals, though some models suggest that it will be greatest in extraverts who are also high in Neuroticism. In contrast, the defensive systems respond to punishing inputs or to threatening signals that predict punishment, and promote avoidant or inhibited behaviors accompanied by fear or anxiety. Relevant information may be transmitted from the thalamus or the cortex to limbic circuitry within the hippocampus, amygdala, and hypothalamus, and to brainstem circuitry within the periaqueductal gray. Examples of systems responding to immediate threat include Panksepp's (1998) "fear" system and Gray's "fight-flight" system. Gray has also described a "behavioral inhibition system" that responds to predicted threat by inhibiting approach behavior and promoting an attentional pattern aimed at risk assessment (Gray & McNaughton, 1996). In general, the defensive systems are most often related to the general dimension of Neuroticism or to more specific traits such as anxiety. Thus, the strength of defensive motivation increases as one moves from individuals low in Neuroticism to those high in Neuroticism, with anxiety perhaps being strongest in the more introverted neurotics. Temperament models emphasizing general appetitive and defensive systems provide a useful framework for understanding personality differences related to Extraversion and Neuroticism. 
Because the models incorporate motivational effects on behavior, attention, and emotion, predictions can be made regarding the emotional and behavioral differences across personalities. In addition, these models help in understanding clinical disorders arising from different patterns of appetitive and defensive reactivity. For example, clinical anxiety most likely involves particularly strong defensive motivation, whereas depression can be viewed as weak appetitive motivation (often accompanied by strong defensive reactivity). In contrast, impulsive disorders (e.g., anti-social behavior, psychopathy) may arise from strong appetitive motivation and/or weak defensive motivation (Fowles, 1994; Gray, 1994). Furthermore, given the assumption that the underlying motivational systems influence attention, it is possible to better understand the cognitive functioning of these various individuals. Anxious persons, for example, are particularly attentive to threatening information, which may in turn activate dangerous conceptual content and thus promote worrisome, ruminative, and even catastrophic forms of thought.
When such attentional processes are taken into account, however, it can be seen that differences in motivational systems provide only part of the personality picture. Because attentional systems are separable from motivational systems, individuals may also vary in the reactivity or efficiency of their attentional subsystems. If these subsystems are recruited by a motivational system, then the efficiency of the motivational function should depend on the efficiency of the recruited attentional function. In general, individuals with relatively weak voluntary attention may be prone to more inefficient or maladaptive motivation, whereas those with stronger attention should show more successful motivation. For example, successful defense requires that attention is allocated to the environmental sources of safety as well as the current threat. To the extent that an individual's attention allows them to disengage from threat and engage safety, they should be better able to remain in and cope with the stressful situation (e.g., Derryberry & Reed, 1996). Similarly, many appetitive situations require attention both to the reward and the potential obstacles that may block its pursuit. Individuals who cannot disengage from the reward may have difficulty dealing with these sources of frustration, and in the long run may fail to learn from their unsuccessful experiences (e.g., Newman, 1987). Also, in both defensive and appetitive situations, multiple threats or rewards may be present that vary in incentive value. If the person can flexibly shift attention in order to assess the relative importance of these incentives, they should be better able to select the one that is most rewarding or most dangerous. Such ideas have led theorists to propose that individual differences in attention may be pivotal to effective motivational functioning. 
Rothbart and her colleagues have proposed that "effortful control" serves a higher level function of regulating more reactive processes related to positive and negative motivation (Derryberry & Rothbart, 1997; Posner & Rothbart, 1998b; Rothbart et al., 1994). Variability in effortful control is thought to reflect the functioning of Posner's anterior attentional system, including the voluntary control of posterior orienting and the inhibition of prepotent response tendencies. Developmental studies have shown that children with high effortful control (measured by parent report) show reduced frequencies of negative affect, a finding consistent with the idea that effective use of attention may help attenuate distress (Eisenberg, Fabes, Nyman, Bernzweig & Pinuelas, 1994; Rothbart, Ziaie & O'Boyle, 1992). In addition, skillful use of attention appears important in the ability to suppress impulsive approach, as demonstrated in Mischel's studies of delay of gratification (e.g., Metcalfe & Mischel, 1999). Individual differences in effortful control are also related to socioemotional variables, correlating negatively with aggression and positively with empathy and conscience (Kochanska, Murray, Jacques, Koenig & Vandegeest, 1996; Rothbart, Ahadi & Hershey, 1994). While effortful attention may help regulate more reactive motives, it is important to avoid the notion of a single higher level executive that functions with
homuncular powers. It is thus worthwhile to reconsider the higher level processes that contribute to regulating anterior attentional functions. In terms of cognitive controls, the frontal and cingulate regions receive extensive perceptual and conceptual content arising from posterior sensory and association areas. Such afferents provide pathways through which various beliefs, expectancies, and metacognitive knowledge can influence executive attentional functions (e.g., Wells & Matthews, 1994). In terms of motivational controls, massive projections arising from limbic and paralimbic circuits (e.g., amygdala, hippocampus, hypothalamus, orbital and medial frontal cortex) converge on the attentional regions of the anterior cingulate cortex. These afferents may provide pathways through which motivational processes can recruit attentional functions, and thereby come to function in a more voluntary and less reactive way. This can be most easily seen in conflict situations, where several relatively reactive motives compete for the control of behavior. If one of these motives can gain access to the high level control provided by the anterior attentional system, it will be at an advantage due to its enhanced capacity to suppress the orienting and response tendencies related to alternative motives. In most instances, such motives will work in conjunction with available conceptual information, such as the person's beliefs, strategies, and metacognitive knowledge. As an example, a person's motivation to suppress approach and resist temptation should be strengthened by beliefs about the costs of approach and the benefits of resistance.
Studies Relating Temperament, Motivation, and Attention
Our research has attempted to address such processes through psychometric and performance measures. To assess individual differences in underlying motivational processes, we use standard scales thought to measure differences in appetitive and defensive motives. Measures of Extraversion and Impulsivity are taken to assess the strength of appetitive motivation, and measures of Trait Anxiety and Neuroticism to assess defensive motives. While the trait measures are assumed to reflect relatively tonic differences in motivational processes, we also manipulate more phasic motivational processes on a trial-by-trial or block-by-block basis. This is done by varying the incentive value of targets that appear in various locations. Some targets carry appetitive or positive value in the sense that fast (and correct) responses lead to an increase in points, whereas slow responses result in no loss of points. In contrast, defensive or negative targets lead to a loss of points if the response is slow, and no loss if the response is fast. Each reaction time (RT) is scored as "fast" or "slow" by comparing it to the participant's median RT on the previous block of trials. Such a criterion gives rise to roughly equal numbers of fast and slow responses, and thus scores tend to stay close to zero.
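The scoring rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in these studies: the function name `score_response`, the 10-point stake (borrowed from the orienting study described next), counting RTs exactly at the median as "slow", and scoring only correct responses are all our assumptions.

```python
from statistics import median


def score_response(rt_ms, valence, previous_block_rts, stake=10):
    """Return the point change for one correct detection response.

    Hypothetical helper: a response is "fast" if its RT is below the median RT
    of the participant's previous block, otherwise "slow" (ties count as slow,
    an assumption). Incorrect responses are assumed to be handled elsewhere.
    """
    cutoff = median(previous_block_rts)
    fast = rt_ms < cutoff
    if valence == "positive":
        # Appetitive target: fast responses gain points, slow ones lose nothing.
        return stake if fast else 0
    if valence == "negative":
        # Defensive target: slow responses lose points, fast ones lose nothing.
        return 0 if fast else -stake
    raise ValueError(f"unknown valence: {valence!r}")


previous_block = [280, 300, 310, 320, 350]  # hypothetical RTs; median = 310 ms
for rt, valence in [(290, "positive"), (330, "positive"),
                    (290, "negative"), (330, "negative")]:
    print(rt, valence, score_response(rt, valence, previous_block))
```

Because the cutoff is the previous block's median, roughly half of the responses fall on each side of it, which is why cumulative scores hover near zero when positive and negative targets are mixed.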
Orienting and motivational valence
An initial set of studies examined the effects of these trait and state variables on attentional orienting (Derryberry & Reed, 1994). The task was a modified spatial orienting task in which positive or negative valences were attached to opposing locations. The basic trial display consisted of three outlined boxes, one located in the screen's center and the other two in the left and right visual fields. Incentive values were assigned to the left and right locations by placing an arrow pointing up on one side (e.g., midway between the central and left box) and an arrow pointing down on the other (e.g., between the central and right box). The arrow pointing up indicated a positive value in the sense that fast responses to targets in that location would result in a gain of 10 points. The arrow pointing down indicated a negative value in that slow responses to targets in that location would result in a loss of 10 points. Each trial began with a brightening of one of the three boxes. When the central box brightened, targets were equally likely to appear in either peripheral box. However, when one of the peripheral boxes brightened, 80% of the upcoming targets appeared in the cued box (i.e., valid cues) and 20% in the uncued box (i.e., invalid cues). Targets appeared at SOAs of either 100 ms or 500 ms following the cue. They took the form of a small circle appearing in one of the two peripheral boxes and remained visible until the subject responded with a simple key press. Each response was immediately followed by a feedback signal indicating whether the response was fast or slow. Thus, the peripheral cues served to initiate orienting to a location carrying either a positive or negative incentive value, and the extent of such orienting could be examined by comparing RTs following valid and invalid cues across the two SOAs. Two motivational/personality effects were found in these studies. 
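The trial structure of this cueing task can be sketched as a small generator. The 80/20 validity split after peripheral cues, the 50/50 split after central cues, and the 100/500 ms SOAs come from the description above; choosing each cue position and SOA with equal probability, and the function and field names, are illustrative assumptions.

```python
import random

SOAS_MS = (100, 500)  # SOAs used in the valence study


def make_trial(rng, p_valid=0.8):
    """Generate one trial of the valenced cueing task (illustrative sketch)."""
    cue = rng.choice(["center", "left", "right"])  # equal odds: an assumption
    soa = rng.choice(SOAS_MS)
    if cue == "center":
        # Central cue: the target is equally likely in either peripheral box.
        target, validity = rng.choice(["left", "right"]), "neutral"
    elif rng.random() < p_valid:
        # Valid peripheral cue: the target appears in the brightened box.
        target, validity = cue, "valid"
    else:
        # Invalid peripheral cue: the target appears in the opposite box.
        target, validity = ("right" if cue == "left" else "left"), "invalid"
    return {"cue": cue, "soa": soa, "target": target, "validity": validity}


rng = random.Random(2001)
trials = [make_trial(rng) for _ in range(10_000)]
peripheral = [t for t in trials if t["cue"] != "center"]
share_valid = sum(t["validity"] == "valid" for t in peripheral) / len(peripheral)
print(f"valid share among peripheral cues: {share_valid:.3f}")
```

Seeding a dedicated `random.Random` instance keeps the trial sequence reproducible without touching the global random state.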
The first involved individuals scoring below the median in Extraversion and above the median in Neuroticism as measured by the Eysenck Personality Questionnaire. These neurotic introverts, who tend to be high in trait anxiety, showed an attentional bias at the short SOAs when the negative location, where points could be lost, was cued. This finding is consistent with others demonstrating enhanced attention to threat in anxious people (Wells & Matthews, 1994). The second effect involved impulsive individuals (i.e., neurotic extraverts), who showed a bias favoring positive cues, especially on trials following negative feedback. This is consistent with other evidence that extraverts enhance their approach motivation in aversive situations such as those involving punishment (Newman, 1987). These attentional biases did not arise from faster RTs to targets in cued locations, but instead involved slower RTs to targets in uncued locations; i.e., they were present on trials involving invalid but not valid cues. For example, anxious individuals were slower than low anxious persons in detecting targets in a location
opposite to a negative cue. These effects were assessed more precisely by using RTs following central cues (which initiated no pretarget orienting) to estimate the "benefits" of valid peripheral cues and the "costs" of invalid peripheral cues. Anxiety-related differences were found only in the costs data. Their absence in the benefits data suggests that anxiety does not promote a stronger automatic activation of the threatening location, for such a direct activation might be expected to facilitate attentional movement toward the cued location. In addition, the lack of differences in benefits argues against a facilitation of the posterior "move" operation. Rather than suggesting enhanced orienting toward negative incentives, the increased costs found in anxious persons suggest a difficulty in disengaging attention from such stimuli. As discussed in more detail below, the delays in disengagement may arise from an incentive-related suppression of the "disengage" operation and/or an enhancement of the "engage" operation. Although this effect may not arise from the same mechanism involved in other studies of attentional capture, it illustrates one way in which attention can be "captured" or "held" by motivationally significant stimuli. In terms of the higher level controls that adjust these attentional settings, the effects are found regardless of whether the incentive cue actually predicts the target's location, and thus cognitive expectancies seem to play a minor role. What seems more influential are the tonic motivational processes related to Anxiety and Extraversion, interacting with phasic changes elicited by negative and positive cues. If motivational processes make it difficult to disengage from significant stimuli, one might expect tendencies to get locked into escalating emotional states. 
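The cost/benefit decomposition described above reduces to two differences against the central-cue baseline. The sketch below uses hypothetical RTs for a single participant; the function name and the data are ours, not values from the study.

```python
from statistics import mean


def costs_and_benefits(rt_valid, rt_invalid, rt_central):
    """Cueing benefit and cost (ms), using central-cue RTs as the baseline.

    benefit = mean RT after central cues - mean RT after valid peripheral cues
    cost    = mean RT after invalid peripheral cues - mean RT after central cues
    """
    baseline = mean(rt_central)
    benefit = baseline - mean(rt_valid)
    cost = mean(rt_invalid) - baseline
    return benefit, cost


# Hypothetical RTs (ms) for one participant after a negative (threat) cue.
benefit, cost = costs_and_benefits(
    rt_valid=[290, 300, 310],    # mean 300
    rt_invalid=[340, 350, 360],  # mean 350
    rt_central=[300, 310, 320],  # mean 310
)
print(f"benefit = {benefit} ms, cost = {cost} ms")
```

In this made-up example the cost (40 ms) is much larger than the benefit (10 ms), which is the pattern the text attributes to anxious participants: slowed disengagement rather than faster orienting.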
This is often the case for anxious people, who report getting stuck on a threatening stimulus (e.g., an angry look, a hazardous object) along with a consequent increase in anxiety and anxious cognition. However, effective coping with threatening situations often requires attention not only to a dangerous object, but also to the available sources of safety, escape routes, and so on. Anxious people may be at a specific disadvantage because they are unable to take advantage of such information; as a result, they cannot prepare effective coping responses and fail to experience the relief and reassurance that such options can provide. In contrast, other individuals (and perhaps some anxious people) may be able to override this bias and disengage effectively. One mechanism that may allow such control is the anterior attentional system, especially through its capacity to regulate the posterior system's orienting. Our recent studies have therefore attempted to assess individual differences related to anterior attentional functioning. Because a variety of anterior functions have been proposed, we developed a general scale aimed at assessing overall differences in voluntary "Attentional Control." The scale consists of twenty items assessing the ability to focus attention and avoid distraction (e.g., "When I am reading or studying, I am easily distracted if there are people talking in the same
room"), to shift attention between tasks (e.g., "It is easy for me to read or write while I'm also talking on the phone"), and to flexibly control thought (e.g., "It is hard for me to break from one way of thinking about something and look at it from another point of view"). As can be seen, the items focus on attentional rather than behavioral processes, and are set within neutral contexts that avoid strong defensive or appetitive motivation. The Attentional Control scale is internally consistent (alpha = .85). Its relations to other scales are consistent with the idea that high attentional control helps to constrain Trait Anxiety (r = -.50) and to facilitate positive emotionality related to Extraversion (r = .30). In addition to the orienting effects described below, the Attentional Control scale has been found to predict performance in several studies focusing on response inhibition. Subjects high in attentional control show reduced response interference in a stimulus-response compatibility task, and faster stop times in a modified stop-signal task. Our first set of orienting studies modified the paradigm described above to examine differences between anxious people with good or poor attentional control (Derryberry & Reed, 2001). Rather than signaling the opportunity to gain or lose points, the pretarget cue signaled the probable outcome of the response. "Threat" cues (an arrow pointing down) informed subjects that targets appearing in that location would be difficult and result in a slow response 75% of the time. "Safe" cues (an arrow pointing up) indicated that targets in the cued location would be easy and result in a fast response 75% of the time. Targets appearing on the uncued side of the screen always carried a probable outcome opposite to that on the cued side. 
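Internal consistency figures such as the alpha of .85 reported above are conventionally computed as Cronbach's alpha. The sketch below is a generic implementation of that formula, not the authors' scoring code; the 1-4 response range in the hypothetical `reverse_score` helper is an assumption, since the chapter does not give the scale's response format. Reverse-keying matters here because some items (e.g., "I am easily distracted") are worded so that agreement indicates poor control.

```python
from statistics import pvariance


def reverse_score(value, low=1, high=4):
    """Reverse-key an item so higher always means better attentional control.

    The 1-4 response range is a hypothetical default, not taken from the text.
    """
    return low + high - value


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents x items matrix (list of equal-length rows).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical two-item data from three respondents.
print(round(cronbach_alpha([[1, 2], [2, 3], [3, 3]]), 3))  # 0.857 (= 6/7)
```

Population and sample variances give the same alpha as long as one is used consistently, because the n/(n-1) factor cancels in the ratio.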
Thus, if an arrow pointing down appeared in the left visual field (LVF), targets on the left would be difficult whereas those on the right would be easy, and subjects should view the left as the dangerous location and the right as the safe location. Targets were presented either 250 or 500 ms after the cue, and a central feedback signal was presented immediately after the response. Feedback following fast responses was signaled by an arrow pointing up and slow responses by an arrow pointing down, identical in form and color to the pretarget cues. Cutoffs for "fast" and "slow" responses were again based on the median RT from the previous block, but adjusted trial-by-trial in terms of the target's difficulty. Groups were based on median splits on the State-Trait Anxiety Inventory and the Attentional Control Scale. When the target followed a threatening cue by 250 ms, all anxious subjects were slower than low anxious subjects in responding to targets in the uncued location. As can be seen in Figure 1, anxiety had no effect given safe cues, but given invalid threat cues (and thus targets in the uncued safe location), anxious subjects were delayed relative to low anxious subjects. This is consistent with our earlier findings, again indicating that anxious people are slow in disengaging from potentially threatening locations. At the 500 ms SOAs, however, individual
[Figure 1 plot: RT (ms; axis approx. 280-340) for valid (V) and invalid (I) cues by Cue Valence (Threat, Safe), with separate lines for HA and LA groups]
Figure 1. Anxiety x Validity interactions at 250 ms SOAs for threatening (i.e., hard) and safe (i.e., easy) cues. V=valid cue; I=invalid cue; HA=high Trait Anxiety; LA=low Trait Anxiety (based on median split).
differences in Attentional Control became evident. Although all subjects showed a tendency to shift from the threatening to the safe location, such disengagement was least effective in anxious people with poor control. In contrast, anxious subjects with good control shifted away more effectively, equaling the performance of the low anxious groups. These interactions are illustrated in Figure 2. The anxiety-related bias was thus limited to anxious people with poor control at long SOAs; anxious people with good control were able to shift from a cued threatening location to respond to a target at a safe location. This interaction was significant in multiple regression as well as analyses of variance, indicating that it is not a spurious effect arising from correlated personality variables. These findings are important in suggesting that our Attentional Control scale does tap individual differences related to executive attentional functions (i.e., the control of posterior orienting). Two aspects of the data are consistent with an anterior function. First, the more reactive influence of trait anxiety was evident at 250 ms, whereas the putative anterior influence became apparent at 500 ms. This is consistent with the idea that anterior intervention should take longer due to the greater time required for frontal processing. Second, the effect appears to involve a voluntary form of control. Specifically, the tendency to shift from the cued to the uncued location was stronger when the cued location was threatening than safe,
[Figure 2 plot: RT (ms; axis approx. 280-340) for valid (V) and invalid (I) cues, with separate lines for HA and LA groups, in Low and High Attentional Control panels]
Figure 2. Anxiety x Validity interactions at 500 ms SOAs for subjects low (left) and high (right) in Attentional Control (based on median split). V=valid cue; I=invalid cue; HA=high Trait Anxiety; LA=low Trait Anxiety (based on median split).
a tendency most easily interpreted as a voluntary or strategic shift from the harder to the easier location. The underlying mechanism will be discussed in more detail later, but for now, the simplest account is that the anterior system sends a signal to the posterior system allowing attention to disengage from the cued location. This signal may be in some way stronger or faster in individuals with good attentional control. Its influence may not be apparent in low anxious subjects, who have no underlying difficulty in disengaging. But given the early impaired disengagement arising from anxiety, the more effortful influence can become manifest. Regarding the underlying motivational mechanisms, the simplest model would be one in which the functions of responding to threat and safety were carried out by a single defensive motivational system. If this were the case, then the detection of threat should occur early and result in a biasing of posterior orienting (e.g., enhancing engagement), as demonstrated at the short SOAs. The safety detection function would occur later, and lead to a recruitment of anterior functions to regulate orienting (e.g., suppressing engagement). In the present task, these defensive functions are presumably preset based on the degree of activation within the person's motivational system, though the relevant valence and location assignments must be reset on a trial-by-trial basis. Alternatively, it is also possible that the threat and safety functions are carried out by different motivational systems.
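The Anxiety x Attentional Control interaction reported above was tested both with analyses of variance on median-split groups and with multiple regression on the continuous scores. The sketch below contrasts the two codings; it is not the authors' analysis. The dependent variable `dv` (for example, a per-participant invalid-cue cost), the tie rule in `median_split`, and the mean-centring step are our assumptions, and significance testing is omitted. It relies on NumPy's `np.linalg.lstsq`.

```python
import numpy as np


def median_split(scores):
    """Return True for scores above the sample median ("high" group).

    Scores exactly at the median are coded "low"; the chapter does not say
    how ties were handled, so this rule is an assumption.
    """
    scores = np.asarray(scores, dtype=float)
    return scores > np.median(scores)


def fit_moderated_regression(anxiety, control, dv):
    """Least-squares fit of dv = b0 + b1*A + b2*C + b3*(A*C).

    A and C are mean-centred before forming the product term, a common
    convention for moderated regression. Returns [b0, b1, b2, b3]; b3 is
    the interaction coefficient. No significance test is computed here.
    """
    a = np.asarray(anxiety, dtype=float)
    c = np.asarray(control, dtype=float)
    a = a - a.mean()
    c = c - c.mean()
    X = np.column_stack([np.ones_like(a), a, c, a * c])
    coefs, *_ = np.linalg.lstsq(X, np.asarray(dv, dtype=float), rcond=None)
    return coefs


# Hypothetical, noise-free data on a 3 x 3 grid of (already centred) scores.
A, C = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
A, C = A.ravel(), C.ravel()
dv = 20 + 5 * A - 2 * C - 4 * A * C
print(fit_moderated_regression(A, C, dv).round(3))  # should recover 20, 5, -2, -4
print(median_split([30, 45, 50, 60]))
```

Keeping the scores continuous avoids discarding information at the median, which is one reason a regression result is a useful complement to the median-split ANOVA.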
In a model such as Gray's (1987), for example, the "behavioral inhibition" and "behavioral activation" systems might work together in responding to threat and safety signals. The defensive system would function primarily through the posterior system and the safety system through the anterior system. Although the biasing functions could again be preset, more complex interactions between the two systems (e.g., reciprocal inhibition) would need to be considered. For now, the most important point concerns the implications for temperament and everyday performance. The ability to disengage and take advantage of other information is pivotal in coping with threat. When attention is strongly captured and held by a threatening input, coping options become limited and anxiety tends to increase. Often, the only options available are simple endurance or avoidance of the situation. Neither of these is a particularly good option, for both limit the person's ability to learn from the situation. To the extent that the person can disengage, however, they can consider available sources of safety and alternative response options. If effectively carried out, these options should increase relief and decrease anxiety, thereby reducing the likelihood of maladaptive avoidance.
Orienting and motivational value
Another common type of conflict involves potential stimuli that vary in their value or importance rather than in their positive or negative valence. Specifically, we often face situations where we must select among stimuli with differing values, with some stimuli being more rewarding or punishing than others. Effective responding in such situations usually requires distributing attention across all of the stimuli so that each can be evaluated. Otherwise, one could easily respond to a relatively trivial rather than an important stimulus. 
This again seems like the type of situation in which more reactive motivational and voluntary attentional processes will come into play. Such processing has been examined in recent studies that held the target's valence constant across locations but varied the point value of potential targets (Derryberry & Reed, in preparation). Subjects alternated between positive and negative blocks of trials, on which they could gain points (for fast responses) and lose points (for slow responses), respectively. Within each block, each trial began with the appearance of two numbers that signaled the number of points that could be gained or lost if the target appeared in that location. One number always had a higher value than the other (e.g., 8 vs. 4), thereby defining its location as strategically the more important. Five hundred milliseconds after the numbers appeared, a cue was presented by turning one number red. This cue signaled the probable location of the target, which appeared adjacent to the red number 75% of the time. Detection targets followed the location cues at SOAs of 250 or 500 ms, and each response was immediately followed by a feedback signal. The cutoffs for fast or slow responses were again based on the median RT of the previous block of trials, and were equal for targets appearing in either location. This paradigm sets up a conflict between the two potential target locations, which should strategically be resolved in favor of the higher value location. However, a more interesting conflict arises between the point value of the location and the expected probability of a target. On the one hand, subjects should be motivated to attend to the location carrying the higher value, because it carries the possibility of gaining or losing a greater number of points. But at the same time, they should also be motivated to attend to the cued location, because the target is more likely to appear there. One way of resolving this conflict would be to adjust the cued orienting based on the value of the cued location. In other words, a reasonable strategy would be to enhance orienting when the higher value location is cued and to attenuate orienting when the lower value location is cued. We expected that subjects with good attentional control would be better at making such trial-by-trial adjustments. This prediction was based on the notion that efficient anterior attentional functioning should provide greater flexibility in controlling posterior orienting. In addition, we expected that motivational biases related to traits of anxiety and extraversion would render some individuals particularly vulnerable to high value negative or positive targets. The results showed that subjects were generally capable of making strategic adjustments in orienting. Most showed a larger orienting effect (i.e., the RT difference between targets at uncued and cued locations) when the higher value location was cued. Along with its strategic nature, the fact that this effect increased from the 250 ms to the 500 ms SOA is consistent with an anterior regulation of posterior orienting.
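The two quantities this design depends on, the fast/slow feedback cutoff and the orienting effect, can be made concrete with a short Python sketch. It assumes a hypothetical trial layout of (cue value, cue valid?, RT in ms) tuples and invented RTs; the function names are illustrative, and treating an RT exactly at the median as "slow" is an assumption the text does not specify.

```python
from statistics import mean, median

def feedback(rt_ms, previous_block_rts):
    """Label a response 'fast' or 'slow' against the median RT of the
    previous block. Assumption: an RT exactly at the median counts as slow."""
    cutoff = median(previous_block_rts)
    return "fast" if rt_ms < cutoff else "slow"

def orienting_effect(trials, cue_value):
    """Orienting effect for one cue value: mean RT to targets at the uncued
    location minus mean RT to targets at the cued location (ms).
    `trials` holds (cue_value, valid, rt_ms) tuples (hypothetical layout)."""
    cued = [rt for value, valid, rt in trials if value == cue_value and valid]
    uncued = [rt for value, valid, rt in trials if value == cue_value and not valid]
    return mean(uncued) - mean(cued)

# Invented RTs for illustration only; these are not data from the study.
trials = [
    ("high", True, 250), ("high", True, 254), ("high", False, 290),
    ("low", True, 262), ("low", False, 276),
]
high_effect = orienting_effect(trials, "high")  # 290 - 252 = 38
low_effect = orienting_effect(trials, "low")    # 276 - 262 = 14
adjustment = high_effect - low_effect           # extra orienting to high-value cues
print(feedback(255, [240, 260, 300]), high_effect, low_effect, adjustment)
```

A positive adjustment, as in this invented example, corresponds to the pattern reported above: stronger orienting when the higher value location is cued.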
Also consistent with our predictions, the strategic adjustment depended on Attentional Control. As seen in Figure 3, which graphs the data for 500 ms SOAs, poor attenders show very little if any strategic adjustment. In comparison, good attenders show stronger orienting to high value cues and weaker orienting to low value cues. To further explore these effects, we performed multiple regression analyses predicting RTs separately for trials involving valid and invalid cues. These more precise analyses indicated that the difference between good and poor attenders involved invalid trials given both low and high value cues; that is, good attenders were relatively fast to shift from a low value cue to a target in the high value location, and slow to shift from a high value cue to a low value target (see Figure 3). The involvement of uncued targets is similar to our earlier findings, and again suggests a modulation of the ease with which attention is disengaged from the cued location. In this case, good attenders are delayed in disengaging from high value locations and fast in shifting from low value locations. This is a reasonable strategy in this task, allowing good attenders to score more points than poor attenders.
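The good/poor attender comparison rests on a median split of Attentional Control scores (see the Figure 3 caption). A minimal Python sketch of that grouping follows, assuming hypothetical subject IDs, invented scores, and a per-subject "strategic adjustment" (orienting effect after high-value cues minus orienting effect after low-value cues, in ms). How ties at the median were handled is not stated, so placing them in the low group is an assumption, and the authors' regression analyses are not reproduced here.

```python
from statistics import mean, median

def median_split(scores):
    """Split subjects into low ('poor') and high ('good') Attentional Control
    groups at the median score. Assumption: ties at the median go to 'poor'."""
    cut = median(scores.values())
    poor = [s for s, score in scores.items() if score <= cut]
    good = [s for s, score in scores.items() if score > cut]
    return poor, good

def group_mean(values, subjects):
    """Mean of a per-subject measure over one group of subject IDs."""
    return mean(values[s] for s in subjects)

# Hypothetical scores and adjustments (ms), invented for illustration.
control_scores = {"s1": 40, "s2": 55, "s3": 62, "s4": 70}
adjustment_ms = {"s1": 2, "s2": 4, "s3": 20, "s4": 30}

poor, good = median_split(control_scores)
print(poor, good)  # ['s1', 's2'] ['s3', 's4']
print(group_mean(adjustment_ms, poor), group_mean(adjustment_ms, good))
```

A median split discards within-group variation, which is one reason a regression on the continuous scores can give a more precise picture than the grouped means.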
Also of interest was a separate, noninteracting effect of Extraversion. Both introverts and extraverts showed stronger orienting given high value cues, but the adjustment was stronger in extraverts. The Extraversion influence appears similar to that of Attentional Control, but regression analyses suggest that a different mechanism may be involved. While the Attentional Control effects arose on invalid trials, the difference between introverts and extraverts was limited to valid trials; that is, extraverts were faster than introverts in responding to targets in cued high value locations. More research will of course be needed to differentiate the extraversion and attentional control effects. At this point, however, we suspect that the extraversion effect may involve a facilitation of response processing given high value cues. This interpretation is based on a follow-up study in which numbers were presented as targets adjacent to the cue numbers, and subjects responded only if the two numbers matched. Extraverts again showed faster responses to targets adjacent to high value cues, but they also made more errors in responding to nonmatching targets adjacent to high value cues. This speed-accuracy tradeoff suggests facilitated responding to potential targets in high value locations.
[Figure 3 plot: mean RT in ms (axis roughly 240 to 300) for valid (V) and invalid (I) cues, plotted separately for low- and high-value cues, in panels for low and high Attentional Control.]
Figure 3. Cue Value × Validity interactions at 500 ms SOAs for subjects low (left) and high (right) in Attentional Control (based on median split). V = valid cue; I = invalid cue; Low = low value cue; High = high value cue.
The extraversion effect is also of interest in that it does not fit neatly within the types of motivational processes typically related to Extraversion. Most models suggest that extraversion involves appetitive processes in response to rewarding or relieving cues. However, Depue and Collins (1999) describe a "behavioral facilitation system" thought to underlie extraversion. A key component of this system is a "motive circuit" centered on the nucleus accumbens that computes the motivational value of incentive stimuli and adjusts the intensity of the response accordingly. It is possible that such circuitry gives extraverts an advantage in computing the relative value of contextual stimuli and/or facilitating responses.
Summary and Conclusions
Our studies demonstrate several types of motivational processes that contribute to the control of spatial orienting. The studies featuring dangerous versus safe targets illustrate the role of stimulus valence (negative versus positive), while the studies with high and low value targets illustrate the role of stimulus value. The findings are consistent with earlier studies demonstrating attentional biases favoring motivationally significant stimuli, as well as physiological evidence linking motivational and attentional circuits. In general, it makes considerable sense that motivational processing has evolved to promote adaptive behavior, which should clearly benefit from the use of attention. Somewhat less intuitive is the nature of the motivational effect. Although our findings are generally consistent with the idea that attention is captured by significant stimuli, this "motivational capture" does not seem to arise from an attraction of attention to a particular location. Rather than enhancing the "attracting power" of potential locations, motivational processes appear to regulate the "holding power" of such locations once they are engaged. This may seem counterintuitive in that we often think of motivation as promoting an active search for specific goals. However, a mechanism specific to regulating holding power would be adaptive in allowing initial attentional movement to remain free of bias so that it can move effectively to all locations. Once these locations are engaged, their motivational relevance can be evaluated more thoroughly, and the holding power can be adjusted accordingly. Rather than influencing the frequency with which significant objects are attended, such a mechanism would influence the duration of attention, as first suggested by Wise (1987) in regard to hunger and food objects.
More specifically, an increase in holding power could arise from a motivational enhancement of Posner's engage operation or an attenuation of the disengage operation. We have recently completed several preliminary studies suggesting that the primary influence may involve the engage operation. Peripheral cues were presented that predicted the location of the upcoming target (on either the cued or the uncued side), thereby motivating subjects either to engage the location of the valenced cue or to disengage from it. Anxious subjects showed delays in shifting from the negative cues, but only when that location was engaged. When the negative cue informed subjects to disengage and shift to the other side of the screen, anxious and low anxious subjects were equally fast to disengage. If the negative cue functioned only to suppress the disengage operation, such voluntary disengagement should have been delayed. Clearly, more research is required to isolate the focus of the motivational effect. Nevertheless, a regulation specific to the engage operation would be adaptive in several ways. First, the facilitated processing that results from the enhanced engagement should promote more effective evaluation and response selection. Second, by making it more difficult to inadvertently disengage, the enhanced engagement may protect the individual from unintentional disengagement and distraction. When dealing with imminent danger, for example, distraction by irrelevant stimuli can interfere with effective escape or avoidance responses. Third, if the disengage operation is not directly suppressed, it can remain relatively open to more voluntary forms of control. For example, an anxious person may engage a threatening stimulus quite strongly, but may still be able to voluntarily activate the disengage operation in order to shift to a source of safety. This of course does not rule out the possibility that voluntary disengagement may also be promoted by an inhibition of the engage function. The results are also consistent with temperament models that emphasize individual differences in motivational and attentional processes (e.g., Rothbart et al., 1994). The dimension of trait anxiety appears to bias attention in favor of threatening information, as might be expected given high tonic activity within a defensive motivational system (e.g., Gray's behavioral inhibition system). Such a bias may be adaptive in the sense that it facilitates the processing of information that is clearly important. However, if it makes it difficult for the anxious person to disengage, then they will be at a disadvantage when processing other crucial information relevant to safety and coping options.
Not only will this bias lead to increasing anxiety within a situation, but across time, it may lead to the development of perceptual and conceptual systems that emphasize threat at the expense of safety. Thus, anxious individuals may come to construe the world as a dangerous place and themselves as vulnerable (Derryberry & Reed, 1996). In contrast, our studies suggest that extraversion is associated with motivational circuitry that assesses the relative value of both positive and negative information (e.g., Depue and Collins' behavioral facilitation system). In several models, extraversion has been related to motivational systems that respond to positive incentives by promoting approach behavior (e.g., Gray's behavioral activation system). Our earlier studies found evidence for attentional biases favoring reward in tasks where positive and negative incentives varied randomly within a block of trials (Derryberry & Reed, 1994). However, the present studies blocked the positive and negative incentives, and extraverts showed no biases favoring positive cues. Instead, they were relatively fast given targets in high value locations, regardless of whether points could be gained or lost. In addition, we suspect that extraverts' bias may involve response-related rather than more purely attentional processes. As mentioned earlier, this sort of value extraction processing is consistent with Depue and Collins' (1999) behavioral facilitation system, which might be viewed as functioning under both appetitive and defensive conditions to promote strong approach behavior given reward and active avoidance behavior given danger. Such rapid approach behaviors will give extraverts an advantage in situations that offer targets of varying value and require rapid responses. For example, extraverts may excel in social contexts in part because they recognize opportunities and respond quickly (e.g., Matthews, 1997). But at the same time, there may be disadvantages related to approach responses that are in some way inappropriate. Perhaps of greatest interest in the present context are the individual differences in attentional control. Anxious subjects with good control were better able than those with poor control to disengage from a threatening location and respond to a target in a safe location. In addition, all subjects with good attentional control were better able to adjust orienting in favor of high value targets. Such flexible and adaptive control is consistent with one of the functions of Posner's anterior attentional system, namely the control of posterior orienting. Assuming that reactive motivational influences enhance posterior engagement, the presumably voluntary functions mediated through the anterior system may operate in several ways. The anxiety interaction in the valence studies may involve an inhibition of the engage mechanism that allows faster disengagement to safe locations. The general effect in the value study may involve an enhancement of the engage operation leading to slower disengagement to low value locations. In addition, the anterior system may modulate the disengage operation more directly, or even the perceptual input to the disengage operation.
But whatever mechanisms are involved, the extent of "capture" by motivational stimuli appears to depend on individual differences in the ability to employ strategic attentional processing as well as on motivation. We have also run additional studies examining other anterior functions related to response processing. In a stimulus-response compatibility task, subjects responded to arrows pointing right or left that appeared in the right or left visual field. Anxious subjects with good attentional control were better able than those with poor control to suppress the dominant tendency to respond with the hand corresponding to the target's irrelevant spatial location. In a stop-signal task, subjects high in impulsivity were slower to stop their responses, but only if they were also low in attentional control. These findings suggest that the Attentional Control scale taps individual differences in response inhibition as well as orienting functions attributed to frontal attentional systems. More research will be needed to explore the attentional differences related to other executive functions. A particularly important anterior function is that of inhibiting dominant conceptual associations. Individual differences in such a function could be crucial to personality, which often involves relatively automatic and chronic ways of thinking. For example, trait anxiety involves dominant appraisal and attributional patterns that tend to exacerbate threat and undermine perceived control (Mineka & Zinbarg, 1996), self-evaluative processing that emphasizes punishment for shortcomings (Higgins, 1996), and general tendencies to focus negatively on the self (Matthews, 1997; Wells & Matthews, 1994). Anxious people report that they often get caught up in worrisome and ruminative thought, leading to considerable interference in their daily lives. If the person can inhibit such dominant thought tendencies, they should be able to constrain the associated feelings of anxiety and, ideally, arrive at a more optimistic or controllable view of the world. It is interesting that recent therapeutic approaches have recognized the role of attentional differences, and various training techniques are being developed (e.g., Wells & Matthews, 1994). Also interesting is the fact that many pharmacological treatments, such as the monoaminergic antidepressants, function in part by facilitating neurochemical systems involved in attention. Additional executive functions are those involved in working memory. The notion that motivation functions to enhance the "holding power" of significant information is generally consistent with working memory models that emphasize the maintenance of task goals in the face of distraction. Conway and Kane (this volume) review a series of studies indicating that individual differences in working memory capacity are related to differences in attentional control. It makes good sense that motivational and attentional processes interact in maintaining working memory, rendering individuals more or less vulnerable to distraction and capture.
A simple example would be students prone to test anxiety, who often report difficulty in staying on task due to their distraction by thoughts about failure.
In conclusion, this chapter has provided a multidisciplinary perspective on attentional capture that emphasizes the influence of reactive motivational and voluntary attentional processes. The relation between these processes and others related to abrupt onsets and novelty marks an important area for future research. What all of these processes have in common, however, is a concern with the potential importance of incoming information. By investigating the various ways in which importance arises, as well as the ways it varies across individuals, we should be in a better position to understand attention and its control.
References
Alexander, G. M., Swerdloff, R. S., Wang, C. W., & Davidson, T. (1997). Androgen-behavior correlations in hypogonadal men and eugonadal men: I. Mood and response to auditory sexual stimuli. Hormones & Behavior, 31, 110-119.
Channon, S., & Hayward, A. (1990). The effect of short-term fasting on processing of food cues in normal subjects. International Journal of Eating Disorders, 9, 447-452.
Depue, R. A., & Collins, P. F. (1999). Neurobiology of the structure of personality: Dopamine, facilitation of incentive motivation, and extraversion. Behavioral and Brain Sciences, 22, 521-555.
Derryberry, D., & Reed, M. A. (1994). Temperament and attention: Orienting toward and away from positive and negative signals. Journal of Personality and Social Psychology, 66, 1128-1139.
Derryberry, D., & Reed, M. A. (1996). Regulatory processes and the development of cognitive representations. Development and Psychopathology, 8, 215-234.
Derryberry, D., & Reed, M. A. (1998). Anxiety and attentional focusing: Trait, state and hemispheric influences. Personality and Individual Differences, 25, 745-761.
Derryberry, D., & Rothbart, M. K. (1997). Reactive and effortful processes in the organization of temperament. Development and Psychopathology, 9, 633-652.
Derryberry, D., & Tucker, D. M. (1992). Neural mechanisms of emotion. Journal of Consulting and Clinical Psychology, 60, 329-338.
Derryberry, D., & Tucker, D. M. (1993). Motivating the focus of attention. In P. Niedenthal & S. Kitayama (Eds.), The heart's eye: Emotional influences in perception and attention (pp. 170-196). San Diego, CA: Academic Press.
Deutsch, J. A. (1960). The structural basis of behavior. Chicago: University of Chicago Press.
Eisenberg, N., Fabes, R. A., Nyman, M., Bernzweig, J., & Pinuelas, A. (1994). The relations of emotionality and regulation to children's anger-related reactions. Child Development, 65, 109-128.
Eysenck, H. J. (1967). The biological basis of personality. Springfield, Illinois: Thomas.
Folk, C. L., & Remington, R. (1999). Can new objects override attentional control settings? Perception & Psychophysics, 61, 727-739.
Fowles, D. C. (1994). A motivational theory of psychopathology. In W. G. Spaulding (Ed.), Nebraska symposium on motivation, Vol. 41: Integrative views of motivation, cognition, and emotion (pp. 181-238). Lincoln, Nebraska: University of Nebraska Press.
Gallistel, C. R. (1980). The organization of action: A new synthesis. Hillsdale, New Jersey: Erlbaum.
Grafman, J., Holyoak, K. J., & Boller, F. (Eds.). (1995). Annals of the New York Academy of Sciences, Volume 769. Structure and functions of the human prefrontal cortex. New York: New York Academy of Sciences.
Gray, J. A. (1987). Perspectives on anxiety and impulsivity: A commentary. Journal of Research in Personality, 21, 493-509.
Gray, J. A. (1994). Framework for a taxonomy of psychiatric disorder. In S. H. M. van Goozen, N. E. Van de Poll, & J. A. Sergeant (Eds.), Emotions: Essays on emotion theory (pp. 29-60). Hillsdale, NJ: Erlbaum.
Gray, J. A., & McNaughton, N. (1996). The neuropsychology of anxiety: Reprise. In D. A. Hope (Ed.), Nebraska Symposium on Motivation, Volume 43: Perspectives on anxiety, panic, and fear (pp. 61-134). Lincoln, Nebraska: University of Nebraska Press.
Higgins, E. T. (1996). Ideals, oughts, and regulatory focus: Affect and motivation from distinct pains and pleasures. In P. M. Gollwitzer & J. A. Bargh (Eds.), The psychology of action: Linking cognition and motivation to behavior (pp. 91-114). New York: Guilford.
Hockey, R. (1979). Stress and the cognitive components of skilled performance. In V. Hamilton & D. M. Warburton (Eds.), Human stress and cognition: An information processing approach (pp. 141-177). New York: Wiley.
Kochanska, G., Murray, K., Jacques, T. Y., Koenig, A. L., & Vandegeest, K. A. (1996). Inhibitory control in young children and its role in emerging internalization. Child Development, 67, 490-507.
MacLeod, C., & Mathews, A. (1988). Anxiety and the allocation of attention to threat. Quarterly Journal of Experimental Psychology, 40, 653-670.
Matthews, G. (1997). Extraversion, emotion and performance: A cognitive-adaptive model. In G. Matthews (Ed.), Cognitive science perspectives on personality and emotion (pp. 399-442). Amsterdam: Elsevier.
Mesulam, M. M. (1981). A cortical network for directed attention and unilateral neglect. Annals of Neurology, 10, 309-325.
Metcalfe, J., & Mischel, W. (1999). A hot/cool-system analysis of delay of gratification: Dynamics of willpower. Psychological Review, 106, 3-19.
Mineka, S., & Zinbarg, R. (1996). Conditioning and ethological models of anxiety disorders: Stress-in-dynamic-context anxiety models. In D. A. Hope (Ed.), Nebraska Symposium on Motivation, Volume 43: Perspectives on anxiety, panic, and fear (pp. 133-210). Lincoln, Nebraska: University of Nebraska Press.
Nauta, W. J. H. (1971). The problem of the frontal lobe: A reinterpretation. Journal of Psychiatric Research, 8, 167-187.
Newman, J. P. (1987). Reaction to punishment in extraverts and psychopaths: Implications for the impulsive behavior of disinhibited individuals. Journal of Research in Personality, 21, 464-480.
Panksepp, J. (1998). Affective neuroscience. New York: Oxford.
Posner, M. I., & DiGirolamo, G. J. (1998). Executive attention: Conflict, target detection and cognitive control. In R. Parasuraman (Ed.), The attentive brain (pp. 401-423). Cambridge, MA: MIT Press.
Posner, M. I., & Raichle, M. E. (1994). Images of mind. New York: Scientific American Library.
Posner, M. I., & Rothbart, M. K. (1998a). Attention, self-regulation and consciousness. Philosophical Transactions of the Royal Society of London B, 353, 1915-1927.
Posner, M. I., & Rothbart, M. K. (1998b). Attention, self-regulation and consciousness. Philosophical Transactions of the Royal Society of London B, 353, 1915-1927.
Rothbart, M. K., Ahadi, S. A., & Hershey, K. L. (1994). Temperament and social behavior in childhood. Merrill-Palmer Quarterly, 40, 21-39.
Rothbart, M. K., Derryberry, D., & Posner, M. I. (1994). A psychobiological approach to the development of temperament. In J. E. Bates & T. D. Wachs (Eds.), Temperament: Individual differences at the interface of biology and behavior (pp. 83-116). Washington, DC: American Psychological Association.
Rothbart, M. K., Ziaie, H., & O'Boyle, C. (1992). Self-regulation and emotion in infancy. In N. Eisenberg & R. A. Fabes (Eds.), Emotion and self-regulation in early development: New directions in child development (pp. 7-24). San Francisco: Jossey-Bass.
Stuss, D. T., Shallice, T., Alexander, M. P., & Picton, T. W. (1995). A multidisciplinary approach to anterior attentional functions. In J. Grafman, K. J. Holyoak, & F. Boller (Eds.), Annals of the New York Academy of Sciences, Volume 769. Structure and functions of the human prefrontal cortex. New York: The New York Academy of Sciences.
Tucker, D. M. (1992). Developing emotions and cortical networks. In M. Gunnar & C. A. Nelson (Eds.), Minnesota Symposium on Child Psychology, Vol. 24: Developmental behavioral neuroscience (pp. 75-128). Hillsdale, NJ: Erlbaum.
Tucker, D. M., & Williamson, P. A. (1984). Asymmetric neural control systems in human self-regulation. Psychological Review, 91, 185-215.
Wells, A., & Matthews, G. (1994). Attention and emotion: A clinical perspective. Hillsdale, NJ: Erlbaum.
Wise, R. A. (1987). Sensorimotor modulation and the variable action pattern (VAP): Toward a noncircular definition of drive and motivation. Psychobiology, 15, 7-20.
Yantis, S., & Egeth, H. E. (1999). On the distinction between visual salience and stimulus-driven attentional capture. Journal of Experimental Psychology: Human Perception and Performance, 25, 661-676.
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture. C. Folk and B. Gibson (Editors). © 2001 Elsevier Science B.V. All rights reserved.
14
Capacity, Control and Conflict: An Individual Differences Perspective on Attentional Capture
Andrew R. A. Conway and Michael J. Kane
Webster's dictionary defines capture as "the act of catching or gaining control by force, stratagem, or guile." A clear example of capture is a military takeover, such as the North Vietnamese takeover of South Vietnam in 1975. But what does it mean to capture attention? The phrase "attentional capture" suggests that some "thing" is being captured and that control has been displaced. It also suggests that the "thing" is limited in supply. After all, capture by force would not be necessary if there were an unlimited supply of the "thing." Central, then, to the study of attentional capture are the issues of capacity, control and conflict. Capacity, control and conflict (or interference) have been fundamental issues in the study of attention and memory since the cognitive revolution of the 1950s. Since then, most information-processing theories have incorporated a limited capacity system and a mechanism or process of control, with the implicit assumption that the system resolves conflict. For example, Broadbent's (1958) model of selective attention posited an "early" filter in order to constrain the amount of information that passed to the limited capacity channel that sat just beyond the selective filter. Thus, there was a limited capacity channel and a filter that controlled the flow of information and in so doing resolved perceptual conflict or interference. Another example, from the memory literature, is Atkinson and Shiffrin's (1968) model of memory, in which there was a limited capacity short-term store and ill-specified control processes that managed the flow of information and resolved interference. Despite these influential cognitive models and half a century of experimental investigations of the processes and mechanisms involved in memory and attention, the precise relationship between capacity and control is still not understood. In this chapter we present an individual differences perspective on attentional capture.
We will suggest that individuals with greater working-memory capacity (WMC) exhibit greater attentional control than do individuals with lesser WMC. In doing so, we will present a theory of memory and attention, according to which: (1) working memory is a system responsible for the maintenance of goal-relevant information in the face of concurrent processing, (2) individual differences in WMC correspond to the ability to maintain goal-relevant information, especially in contexts providing sources of competition or interference with that goal, and (3) this maintenance ability determines susceptibility to attentional capture. After a brief review of the development of the concept of WMC, we will present evidence from a series of experiments investigating individual differences in WMC and attentional control across a variety of tasks. Our approach to exploring attentional control and capture differs from most of those discussed in this book, particularly insofar as our research represents a blend of experimental and differential psychology. Rigorous experimental method is necessary to explore the subtle complexities of attentional control, and we therefore make use of classic experimental paradigms such as dichotic listening and the Stroop (1935) color-word task. Yet our objective is also to understand individual differences in WMC and cognitive ability more broadly, and therefore we apply ideas and statistical procedures borrowed from psychometrics. We suggest that combining these two "disciplines of scientific psychology" (Cronbach, 1957) will result in a richer understanding of capacity and control processes in attention and memory. Such a dual approach also lends itself to further exploring the increasingly evident importance of WMC and attention control to complex cognitive capacities such as reasoning and intelligence (e.g., Carpenter, Just, & Shell, 1990; Dempster, 1991; Engle, Tuholski, Laughlin, & Conway, 1999; Just & Carpenter, 1992; Kyllonen & Christal, 1990; Miyake, Friedman, Emerson, Witzki, Howerter, & Wager, 2000). Thus, while some chapters in this book present a detailed analysis of attentional capture within a single paradigm, our broad interests in attention control lead us to explore vulnerability to capture across a variety of experimental tasks. Before we discuss the empirical evidence linking WMC to attentional control, it is necessary to review the development of the concept of working memory and the development of our particular theory of individual differences in WMC.
We therefore begin with a brief review of working memory research, paying close attention to notions of capacity and control.
Working Memory and Working Memory Capacity
In 1974, Baddeley and Hitch began their now seminal chapter on working memory with the statement, "Despite more than a decade of intensive research on the topic of short-term memory (STM), we still know virtually nothing about its role in normal human information processing" (p. 47). One of the motivating forces behind this statement was the collection of findings suggesting that short-term memory capacity is not a good predictor of more complex cognitive behavior, such as reading comprehension or problem solving. This was inconsistent with the modal model of memory (e.g., Atkinson & Shiffrin, 1968), which conceived of the short-term store as the gateway to the information processing system. The notion that the capacity of the gateway had little impact on the general performance of the system was clearly problematic (see also Crowder, 1982). Baddeley and Hitch (1974) argued that measures of short-term memory capacity, such as the digit span task, are not predictive of more general cognitive behavior because such tasks only tap a passive storage buffer. They further argued that cognitive behavior is typically more dynamic than static, with maintenance of active memories required in the face of concurrent processing. They therefore proposed a system called "working memory," which is responsible for active maintenance of information in the service of more complex cognition. Their structural model of working memory consisted of a central executive and two storage buffers, the phonological loop for verbal information and the visuo-spatial sketchpad for spatial information. The role of the central executive was not clearly defined initially, but the executive was assumed to be responsible for coordination, integration, and control processes.
Measurement and theories of working memory capacity
Baddeley and Hitch (1974) further proposed that working memory is a limited-capacity system and that this capacity constrains cognitive performance. An open question following the publication of their chapter was how to assess this capacity, given that simple span tasks such as digit span tapped only the storage aspect of the working memory system. A task was needed that required not only storage but also concurrent information processing. It was several years before Daneman and Carpenter (1980) introduced the first task designed to measure WMC. Their reading span task required subjects to read sentences and remember the last word of each sentence for later recall. The number of sentences presented before each recall cue varied, typically from 2 to 6. The largest series for which a subject could read each sentence and recall all the sentence-final words was scored as that subject's working-memory span, or WMC. Notice that this task requires not only a storage function (maintaining the sentence-final words) but also the simultaneous reading of each sentence. Such simultaneous processing is the hallmark of the working memory system, as defined by Baddeley and Hitch (1974), and it is now incorporated into most, if not all, measures of WMC. In contrast to prior research attempting to link measures of immediate memory to higher-order cognition, Daneman and Carpenter (1980) found that the reading span measure predicted Verbal Scholastic Aptitude Test (VSAT) scores, and that it did so much better than did a simple word span task.¹ At first glance, the fact that the reading span task predicts the VSAT may not be surprising. After all, the processing component of the reading span task is reading sentences! Thus, one might argue that better readers have more time or resources to devote to the storage component of the task and therefore score higher on the span task.
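The scoring rule just described (the largest series with perfect processing and perfect recall) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Daneman and Carpenter's procedure: it assumes each series is logged with its set size and two pass/fail flags, and the names `SeriesResult` and `span_score` are hypothetical. Scoring schemes that give partial credit or require several series per set size would need a different rule.

```python
from dataclasses import dataclass


@dataclass
class SeriesResult:
    """One series of a span task (hypothetical record format)."""

    set_size: int        # sentences (or operations) presented before recall
    processing_ok: bool  # every sentence read / every operation solved correctly
    recall_ok: bool      # every to-be-remembered word recalled


def span_score(series: list[SeriesResult]) -> int:
    """Largest set size with perfect processing and perfect recall (0 if none)."""
    passed = [s.set_size for s in series if s.processing_ok and s.recall_ok]
    return max(passed, default=0)


# Toy record: perfect on series of 2, 3, and 4 items; recall breaks down at 5.
results = [
    SeriesResult(2, True, True),
    SeriesResult(3, True, True),
    SeriesResult(4, True, True),
    SeriesResult(5, True, False),
]
print(span_score(results))  # 4
```

The same function applies to the operation span task described next, since its scoring criterion (all operations solved and all words recalled) has the same shape.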
According to such an argument, a skill or ability that is specific to the processing component of the span task accounts for its relation to the VSAT. An alternative account, however, is that both the span task and the VSAT tap a general process or ability that is not specific to the processing component of the span task. We refer to these two alternatives as the domain-specific and the domain-general views, respectively. A number of findings support the domain-general view of the relation between span measures and measures of more complex cognition. For example, Turner and Engle (1989) developed the operation span task, which is similar to the reading span task except that, instead of reading sentences, the subject is required to solve mathematical operations. Thus, a math problem and a word are presented together (e.g., IS (6+4) / 2 = 5 ? TREE) and the subject must solve the math problem and attempt to remember the word for later recall. The number of operation-word pairs per series varies, and working-memory span or capacity is defined as the largest series for which the subject can correctly solve the math problems and remember all the words. Turner and Engle found that the reading-span and operation-span tasks correlate equivalently with the VSAT and, furthermore, that the two measures account for the same variance in VSAT. Such findings are clearly not consistent with a strong version of the domain-specific view outlined above, for the operation span task does not involve reading comprehension per se. Further support for the domain-general view comes from a series of experiments by Engle, Cantor, and Carullo (1992). Engle et al. had subjects perform both operation span and reading span, and they recorded a number of dependent measures from each, including the time spent viewing each portion of the processing component (i.e., time spent reading each word, time spent viewing each component of the operation). They also measured the time to read sentences and solve operations without the added requirement of recall. Partial correlations revealed that none of these task-specific (or potentially strategic) measures accounted for the correlation between span and VSAT. That is, the relation between WMC and higher-order verbal ability was not due to specific skills or strategies operating within the working-memory span tasks themselves (see also Conway & Engle, 1996). In 1992, two influential theories of a domain-general WMC were published (Engle, Cantor & Carullo, 1992; Just & Carpenter, 1992).
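To make the operation-word format concrete, the following sketch builds the answer key for an item laid out like the example above, "IS (6+4) / 2 = 5 ? TREE". It assumes that exact layout (an expression, an equals sign, the displayed answer, a question mark, and the to-be-remembered word). The function names are hypothetical, and this is not the software used by Turner and Engle (1989). The expression is evaluated with Python's `ast` module rather than `eval`, so only numbers and the four arithmetic operators are accepted.

```python
import ast
import operator

# Operators an item may use (an assumption about the item format).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST) -> float:
    """Evaluate a parsed expression built only from numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError(f"unsupported element: {ast.dump(node)}")


def parse_item(item: str) -> tuple[str, float, str]:
    """Split 'IS <expression> = <answer> ? <WORD>' into its three parts."""
    question, word = item.split("?")
    expression, shown = question.strip().removeprefix("IS").split("=")
    return expression.strip(), float(shown), word.strip()


def answer_key(item: str) -> bool:
    """True if the displayed answer equals the value of the expression."""
    expression, shown, _word = parse_item(item)
    value = _evaluate(ast.parse(expression, mode="eval"))
    return abs(value - shown) < 1e-9


print(answer_key("IS (6+4) / 2 = 5 ? TREE"))  # True
print(answer_key("IS (6+4) / 2 = 4 ? TREE"))  # False
```

In the task itself the subject supplies the true/false judgment. A key like this is only what that judgment would be scored against, while the word is held for later recall.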
These general capacity theories proposed that language comprehension and other complex cognitive tasks are constrained by the amount of activation available to the cognitive system, and that WMC represents that total amount of activation. Cantor and Engle (1993) provided key empirical support for these general capacity theories of working memory. Applying the logic of spreading-activation models of cognition (e.g., Anderson, 1983), Cantor and Engle reasoned that if WMC were equivalent to the total amount of activation available to the cognitive system, then individuals who differ in WMC should also differ in tasks that tap the spread of this general activation. Cantor and Engle tested this prediction by examining individual differences in the fan effect (Anderson, 1974). The fan effect is demonstrated in experiments that require subjects to memorize a large number of sentences and then verify their memory of the sentences. Each subject memorizes a number of sentences of the form, "The person is in the place" (e.g., "The lawyer is in the park"). The number of locations associated with each person typically varies from 1 to 6, and this is referred to as the "person fan." Likewise, the number of people associated with each location is varied and is referred to as the "location fan." In the test phase, sentences are presented individually and the subject must verify, as quickly and accurately as possible, whether or not the sentence had been studied. The fan effect refers to the finding that verification time and accuracy are a function of both person fan and location fan, with reaction times and error rates increasing with fan size (there are exceptions, however; for updated reviews of the fan paradigm see Anderson & Reder, 1999; Radvansky, 1999). To test whether individuals with lesser WMC would reveal a more dramatic fan effect than would individuals with greater WMC, Cantor and Engle (1993) assessed each subject's WMC with the operation span task, classifying subjects in the upper and lower quartiles of the distribution as high and low WMC, respectively. They then compared high- and low-WMC subjects' performance on the fan task. As predicted, low-WMC subjects showed a more dramatic fan effect than did high-WMC subjects. Moreover, when the slope of the fan effect was statistically partialed out of the significant correlation between operation span and VSAT scores, the correlation disappeared. These findings supported the notion that individual differences in WMC correspond to the amount of domain-general activation available to the cognitive system, as suggested by general-capacity theories of working memory.
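Two computations in this design can be sketched: a per-subject fan-effect slope (verification time regressed on fan size) and the extreme-groups classification of span scores into upper and lower quartiles. The function names, the use of ordinary least squares for the slope, and the quartile method (`statistics.quantiles` with its default "exclusive" method) are assumptions made for illustration. The chapter does not specify how the slopes or the quartile cut-offs were computed.

```python
from statistics import mean, quantiles


def fan_slope(fan_sizes: list[float], rts: list[float]) -> float:
    """Ordinary least-squares slope of verification time on fan size."""
    mx, my = mean(fan_sizes), mean(rts)
    sxy = sum((x - mx) * (y - my) for x, y in zip(fan_sizes, rts))
    sxx = sum((x - mx) ** 2 for x in fan_sizes)
    return sxy / sxx


def extreme_groups(spans: dict[str, float]) -> tuple[list[str], list[str]]:
    """Subjects at or below the first quartile (low) and at or above the third (high)."""
    q1, _median, q3 = quantiles(list(spans.values()), n=4)
    low = [sid for sid, score in spans.items() if score <= q1]
    high = [sid for sid, score in spans.items() if score >= q3]
    return low, high


# Toy values, invented purely to exercise the code.
print(fan_slope([1, 2, 3], [1000, 1100, 1200]))  # 100.0
spans = {f"s{i}": 10 * i for i in range(1, 9)}
print(extreme_groups(spans))  # (['s1', 's2'], ['s7', 's8'])
```

A larger slope means verification slows more steeply as fan increases, which is the sense in which low-WMC subjects showed "a more dramatic fan effect".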
Working memory "capacity" or working memory "control"?
An alternative interpretation of Cantor and Engle's (1993) results is that WMC corresponds to the regulation or control of activation rather than to the sheer amount of activation, or capacity. Note that in their implementation of the fan paradigm, Cantor and Engle manipulated the person fan, with each person associated with one, three, or four places, and so the fan effect was a function of the number of locations associated with each person. That is, reaction time and error rate increased as the number of locations associated with an individual person increased. Importantly, Cantor and Engle did not manipulate location fan: every location was associated with two people. For example, if one of the sentences was, "The lawyer is in the park," another sentence might have been, "The artist is in the park." It was therefore possible that response competition between different characters in the same location influenced performance in the verification stage. In order to verify that the sentence, "The lawyer is in the park," was studied, the subject may have had to block or inhibit the sentence, "The artist is in the park." Thus, individual differences in WMC might be related to the ability to resolve interference rather than to the rate or amount of spreading activation. In order to examine this possibility, Conway and Engle (1994) had subjects perform a task that married the fan paradigm with a Sternberg-type memory-scanning task. For example, subjects were required to memorize four sets of letters, each set consisting of 2, 4, 6, or 8 letters. For subjects in the response-competition condition, each letter was a member of two different sets. So, if the letter X was a member of set 2, it might also be a member of set 6. For subjects in the no-competition condition, each letter was a member of only one set.
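The two conditions differ only in how letters are distributed across the four sets. The sketch below builds both versions from a pool of letters and counts how many sets each letter belongs to. It is a hypothetical construction. Conway and Engle's (1994) actual letter assignments are not described here, and this simple wrap-around layout is just one arrangement that satisfies the stated constraint (two sets per letter with competition, one set per letter without) for set sizes 2, 4, 6, and 8.

```python
from collections import Counter

SET_SIZES = (2, 4, 6, 8)  # the four memory-set sizes described in the text


def build_sets(letters: str, competition: bool) -> list[list[str]]:
    """Fill the four sets (20 slots in total) from a pool of distinct letters.

    competition=True  -> 10 letters, each appearing in two different sets.
    competition=False -> 20 letters, each appearing in exactly one set.
    """
    slots = sum(SET_SIZES)
    needed = slots // 2 if competition else slots
    pool = list(dict.fromkeys(letters))[:needed]  # de-duplicate, keep order
    if len(pool) < needed:
        raise ValueError(f"need {needed} distinct letters")
    # Write the pool into the slots in order, twice (competition) or once.
    # For sizes 2, 4, 6, 8 this never puts a letter twice into one set.
    sequence = pool * 2 if competition else pool
    sets, start = [], 0
    for size in SET_SIZES:
        sets.append(sequence[start:start + size])
        start += size
    if any(len(set(s)) != len(s) for s in sets):
        raise ValueError("a letter was repeated within a set")
    return sets


def memberships(sets: list[list[str]]) -> Counter:
    """Number of sets each letter belongs to."""
    return Counter(letter for s in sets for letter in set(s))


letters = "BCDFGHJKLMNPQRSTVWXZ"
print(set(memberships(build_sets(letters, competition=True)).values()))   # {2}
print(set(memberships(build_sets(letters, competition=False)).values()))  # {1}
```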
The critical question was whether individual differences in WMC would be seen in both conditions, as predicted by a capacity view, or only when competition was present, as predicted by a control view. In fact, low-WMC subjects revealed a more dramatic set-size effect than did high-WMC subjects only in the response-competition condition (the condition resembling that used by Cantor and Engle, 1993). Although the slopes of the set-size effect in the no-competition condition were also substantial, they were equivalent for high- and low-WMC subjects. This pattern suggested that individual differences in WMC do not correspond to the rate of spreading activation or to the sheer amount of activation. Rather, individual differences in WMC are related to the ability to control the activation of relevant information and to block the activation of distracting information. The results of Conway and Engle (1994) motivate the prediction that individual differences in WMC will reveal themselves in contexts that present a significant source of interference. Subsequent work has indeed demonstrated WMC-related differences in long-term memory retrieval in the face of proactive, retroactive, and output interference (Kane & Engle, 2000; Rosen & Engle, 1997, 1998). However, at the heart of Conway and Engle's interpretation of WMC is the stronger prediction that WMC-related differences should appear in any task that requires attention control to resolve interference or competition, even one that places minimal demands on memory. Thus, if WMC fundamentally reflects an attention control capability that is important in cases of competition and conflict, then high- and low-WMC individuals should differ even in more "molecular" attention tasks that involve competition but make no explicit demands on memory retrieval. Below we present empirical evidence from four different paradigms that individual differences in WMC are, in fact, related to individual differences in performance of "attentional control" tasks. These tasks, to be discussed in turn, are dichotic listening, visual orienting (anti-saccade), Stroop, and continuous performance.
Working Memory Capacity Predicts Attentional Control and Capture: The Evidence
Dichotic listening
A dichotic-listening task requires the subject to shadow, or repeat aloud, a message presented to one ear while ignoring a message presented to the other ear. Early work using the dichotic listening paradigm revealed that subjects were very capable of successful shadowing and successful blocking. In fact, subjects are so successful at blocking the unattended message that little or no semantic content is ever reported from the irrelevant channel (Broadbent, 1958; Cherry, 1953). However, Moray (1959) found that when one's own name is presented on the unattended channel, 33% of subjects report hearing it. It therefore appears that some semantic information is capable of capturing attention and reaching awareness, at least for some individuals. Using more sophisticated sound technology, Wood and Cowan (1995) replicated Moray's (1959) study and found that 34.6% of subjects reported hearing their own name on the unattended channel. The question remained: why do some subjects recognize their name while other subjects do not? By a capture/control view of dichotic listening performance, those who notice their names are those who are less successful in controlling attention by blocking task-irrelevant information. Thus, individuals with low WMC should be more likely to hear their name. In contrast, by a capacity view of dichotic listening, those who notice their names are those who have more attentional capacity to devote simultaneously to the task-relevant and task-irrelevant channels. By this view, individuals with high WMC should be more likely to hear their name. Conway, Cowan, and Bunting (2001) tested these possibilities with 20 high- and 20 low-WMC subjects in a version of Moray's dichotic-listening task, with high and low WMC reflecting the upper and lower quartiles of the distribution of operation span scores, respectively. The listening task required subjects to shadow 400 unrelated words presented to the right ear and ignore 350 unrelated words presented to the left ear. After 4 or 5 minutes of shadowing, the subject's own name was presented on the unattended channel. Words were presented simultaneously at a rate of one word per second. The attended channel was always a female voice and the unattended channel was always a male voice.
Figure 1. Proportion of high and low span subjects who reported hearing their own name in the unattended channel.
Conway, Cowan, and Bunting (2001) found very large WMC-related differences in name detection: low-WMC subjects were much more likely to hear their own name than were high-WMC subjects (see Figure 1). Although low-WMC subjects committed more overall shadowing errors (M = 30) than did high-WMC subjects (M = 10), the WMC groups did not differ in the number of shadowing errors committed on the two words presented just before the name. This suggests that the key finding, that low spans disproportionately heard their name, was not simply due to attention wandering to the unattended channel at the opportune time. Finally, shadowing performance on the words following presentation of the name was also examined. Presumably, hearing one's own name on the unattended channel would come with a cost, and this was indeed the case. Regardless of WMC, subjects who reported hearing their name committed more shadowing errors on the two words following presentation of the name than did subjects who did not report hearing their name. This cost persisted for only two words, as there was no difference on the third or fourth word following presentation of the name. The results of Conway et al. (2001) provide strong support for the notion that WMC is related to attention control. Specifically, high- and low-WMC subjects differ in performance when blocking a particularly salient, and habitually attended to, stimulus. When attempting to ignore one auditory channel while shadowing another, individuals with lesser WMC are more susceptible to attentional capture by a powerful orienting cue than are those with greater WMC.
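The before/after comparison of shadowing errors amounts to aligning each subject's error record on the position of the name and computing an error rate at each offset. The sketch below assumes a hypothetical record format: a list of per-word error flags for the attended channel, plus the index of the attended word presented at the same time as the name (offset 0). The toy records in the usage lines are invented only to exercise the code and are not data from Conway et al. (2001).

```python
import math


def error_rate_by_offset(
    subjects: list[tuple[list[bool], int]], offsets: range
) -> dict[int, float]:
    """Proportion of subjects making a shadowing error at each offset from the name.

    Each subject is (errors, name_index): errors[i] is True if attended word i
    was shadowed incorrectly, and name_index is the attended word presented
    together with the name. Negative offsets precede the name.
    """
    rates = {}
    for k in offsets:
        flags = [errors[idx + k] for errors, idx in subjects if 0 <= idx + k < len(errors)]
        rates[k] = sum(flags) / len(flags) if flags else math.nan
    return rates


# Toy records (invented): both subjects err on the word after the name,
# and only the second also errs two words after it.
heard = [
    ([False] * 6 + [True] + [False] * 3, 5),
    ([False] * 6 + [True, True] + [False] * 2, 5),
]
print(error_rate_by_offset(heard, range(-2, 5)))
```

Running the same function separately for subjects who did and did not report their name gives the two curves that the comparison above contrasts.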
Visual orienting
The results from dichotic listening suggest that low-WMC subjects encounter particular difficulty when a primed stimulus (hearing one's name) interferes with the task goal (ignoring the sound source of one's name). An experimental paradigm that analogously pits a task goal against a pre-potent visual response is the anti-saccade paradigm (Hallett, 1978; Hallett & Adams, 1980). The anti-saccade task requires the subject to detect an abrupt-onset visual cue in the environment and then use that cue to direct attention and the eyes away from the cue in order to identify a target stimulus presented at the opposite spatial location (for a review see Everling & Fischer, 1998). Despite its simplicity, the anti-saccade task is much more demanding than a pro-saccade version of the task, in which the visual cue predictably appears in the same spatial location as the subsequent target. In the pro-saccade situation, attention and the eyes may be reflexively drawn to the cued location in order to identify targets. Thus, only in the anti-saccade task is there a conflict between the more automatic, habitual response (look toward the cue) and the goal or target response (look away from the cue). Kane, Bleckley, Conway, and Engle (2001) predicted that individual differences in WMC would not be related to performance on pro-saccade trials because orienting in these trials occurs reflexively. In contrast, given the conflict between goal and reflex presented by the anti-saccade task, Kane et al. predicted that high span subjects would be better able to resist attentional capture here than would low span subjects. To test these predictions, Kane et al. assessed WMC using the operation span task and classified the upper and lower quartiles of the distribution of span scores as high- and low-WMC, respectively. They then had 107 high- and 96 low-WMC subjects perform both pro-saccade and anti-saccade tasks.²
In both tasks, subjects identified a pattern-masked target letter (B, P, or R), presented 11.5° to the left or right of fixation on a computer screen. In the pro-saccade condition, a blinking visual cue was presented immediately before the target, one character space beneath its eventual location. Thus, the cue elicited a pre-potent orienting response that guided detection of the target. In the anti-saccade condition, the cue was presented immediately before the target but on the opposite side of the computer screen from the target. Successful identification of the target here required blocking the pre-potent orienting response and initiating an eye movement in the opposite direction. In the first experiment, identification latency and accuracy were recorded as dependent measures. Kane et al. (2001) found that WMC predicted visual orienting performance only in the anti-saccade condition, where high span subjects identified the target nearly 200 ms faster than did low span subjects (see Figure 2). In contrast, mean identification times for the two groups in the pro-saccade condition were within 10 ms of one another. Although high span subjects appeared to be less susceptible to capture from the abrupt-onset cue than were low spans, an open question was whether high spans were able to inhibit the pre-potent orienting response altogether, or whether they simply recovered from erroneous saccades faster than did low span subjects.
Figure 2. Mean target-identification latencies for high and low span subjects in the pro-saccade (Pro-) and anti-saccade (Anti-) tasks. Error bars depict standard errors of the means.
Kane et al. (2001) conducted a second experiment to answer this question, as well as to examine practice effects on anti-saccade performance, by monitoring subjects' eye movements during several blocks of anti-saccade trials. Twenty high span and 20 low span subjects were tested. As shown in Figure 3, the eye-movement analysis revealed that individuals with low WM spans were more likely than individuals with high WM spans to make erroneous reflexive saccades in the anti-saccade task, and this WMC difference persisted across all 10 blocks of trials. Thus, low spans were more likely to look in the direction of the cue when they should not have, even after considerable practice. Not shown here is that, after making reflexive errors, low spans also took longer than high spans to disengage their gaze from the cue and move toward the target (Ms = 674 and 512 ms, respectively). Thus, compared to individuals with greater WMC, those with lesser WMC not only made more saccade errors but also took much longer to correct an error once they had committed it. As in dichotic listening, then, high- and low-WMC individuals demonstrated substantial performance differences in the anti-saccade task, where a powerful, reflexive response captured attention away from a relatively weak goal imposed by the experimental context.
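The pro-/anti-saccade manipulation and the error measure plotted in Figure 3 can be expressed compactly. In the sketch below, a trial's target appears on the cued side for pro-saccade trials and on the opposite side for anti-saccade trials. A reflexive error is scored when the first eye movement on an anti-saccade trial goes toward the cue. The `Trial` record, the left/right coding of saccade direction, and the function names are assumptions for illustration, not Kane et al.'s (2001) software.

```python
import random
from dataclasses import dataclass

OPPOSITE = {"left": "right", "right": "left"}


@dataclass
class Trial:
    task: str         # "pro" or "anti"
    cue_side: str     # side of the abrupt-onset cue
    target_side: str  # side of the masked target letter


def make_trial(task: str, rng: random.Random) -> Trial:
    """Pro: target on the cued side. Anti: target on the opposite side."""
    cue = rng.choice(["left", "right"])
    target = cue if task == "pro" else OPPOSITE[cue]
    return Trial(task, cue, target)


def reflexive_error_rate(first_saccades: list[str], trials: list[Trial]) -> float:
    """Share of anti-saccade trials whose first eye movement went toward the cue."""
    anti = [(s, t) for s, t in zip(first_saccades, trials) if t.task == "anti"]
    if not anti:
        raise ValueError("no anti-saccade trials")
    return sum(s == t.cue_side for s, t in anti) / len(anti)


rng = random.Random(1)
block = [make_trial("anti", rng) for _ in range(4)]
# Hypothetical eye-tracking record: the first two saccades go toward the cue.
saccades = [t.cue_side for t in block[:2]] + [t.target_side for t in block[2:]]
print(reflexive_error_rate(saccades, block))  # 0.5
```

Computing this rate separately for each block of trials would give one point per block, which is the kind of per-block error proportion plotted in Figure 3.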
Figure 3. Mean proportion of reflexive eye movements made in error across 10 anti-saccade trial blocks (A1 - A10) for high and low span subjects. Error bars depict standard errors of the means.
Stroop
Successful performance in tasks like dichotic listening and anti-saccade clearly requires resistance to interference through blocking or inhibition, and so WMC differences in such tasks may be taken to reflect a difference in inhibitory capability (see Engle, 1996; Hasher & Zacks, 1988). Several theorists have recently suggested, however, that successful inhibition or blocking may rely on the active maintenance of task-relevant information or of the goal state (e.g., Cohen, Dunbar & McClelland, 1990; De Jong, Berendsen, & Cools, 1999; Roberts & Pennington, 1996). According to these theorists, inhibition is a by-product of successful goal maintenance. Thus, whether WMC should be considered a cause or an effect of efficient inhibition is a current point of controversy (see Kane et al., 2001; May, Hasher & Kane, 1999). Kane and Engle (2001) attempted to tease apart the contributions of memory maintenance and inhibition to the performance of capture tasks by examining WMC-related differences in the Stroop (1935) task. The Stroop task requires the subject to name the colors in which words are presented. In the critical interference condition, the words themselves are color names, incongruent with the ink color (e.g., the word BLUE presented in red). Thus, the habitual process of identifying the written word comes into conflict with the task goal of naming the ink color. Kane and Engle (2001) reasoned that actively maintaining the task goal in working memory would be more important to success when the list-wide proportion of congruent trials (e.g., RED presented in red) was high. That is, if the majority of words in a Stroop task are presented in a color that matches their name, then subjects might periodically "forget" that they were supposed to be naming the ink color, not the word itself, because congruent trials impose no cost for periodic neglect of the task goal. In contrast, when all the trials are incongruent, such that the ink color and the word always conflict, the task goal is reinforced on every single trial, making active goal maintenance less necessary. Note that accurate color naming on incongruent trials requires blocking the habitual word-reading response regardless of whether the prevailing context contains many or no congruent trials. Thus, any differences in Stroop interference between such contexts, and span differences therein, may be better attributed to differences in goal maintenance than to differences in inhibition.
Figure 4. Response-time interference effects for high and low span subjects, by proportion-congruency condition. Interference effects were calculated by subtracting neutral-trial latencies from incongruent-trial latencies. Vertical lines depict standard errors of the means.
Kane and Engle (2001) therefore manipulated the proportion of congruent trials in the Stroop task and compared the performance of high and low span subjects. As with the dichotic listening and anti-saccade studies, WMC was assessed using the operation span task, and the upper and lower quartiles of the span distribution were classified as high and low span, respectively. Subjects were randomly assigned to one of three conditions representing different list-wide proportions of congruent trials (0%, 50%, or 75%) out of 288 total trials. In addition to congruent trials, which presented the words RED, BLUE, or GREEN in their matching color, all subjects saw 36 neutral trials (JKM, XTQZ, FPSTW, presented in red, blue, or green), and the remaining trials were incongruent (presenting mismatched words and colors). Subjects were instructed to name the ink color as quickly and accurately as possible, and were told that, even if many trials were congruent, performance on the critical incongruent trials would be best if they always tried to ignore the word. Stroop interference was assessed by taking the difference between the incongruent trials and the neutral trials in both naming time and error rate. As illustrated in Figure 4, there was no difference between high- and low-WMC subjects in interference in any condition, as measured by latencies. In contrast, Figure 5 shows that, in errors, high- and low-WMC subjects differed in interference in the 75% congruent condition only, where low-WMC subjects committed almost twice as many errors on incongruent trials as did high-WMC subjects. These results suggest that individual differences in WMC may not be evident when the Stroop context contains a large proportion of incongruent trials. In contrast, a large WMC-related difference was clearly seen when the experiment contained a large proportion of congruent trials, and this difference reflected low-WMC individuals naming the word aloud instead of the color. That is, individuals with lesser WMC were especially likely to perform as if they periodically lost access to the goal of the task: to name the color and ignore the word. These findings, in conjunction with those from the dichotic listening and anti-saccade tasks, provide a further demonstration that WMC may determine susceptibility to attentional capture by a habitual, dominant response when habit and goal are in conflict. Moreover, the results indicate that the successful application of inhibitory processes, as measured by tasks like the Stroop test, may depend on the active maintenance of goals in memory (for a formal implementation of this idea, see Cohen et al., 1990; Cohen & Servan-Schreiber, 1992). Thus, individual differences in inhibitory ability may actually be due to individual differences in goal maintenance (Kane et al., 2001).
Figure 5. Error interference effects for high and low span subjects, by proportion-congruency condition. Interference effects were calculated by subtracting the number of neutral-trial errors from the number of incongruent-trial errors. Vertical lines depict standard errors of the means.
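The list composition and the two interference scores described above can be sketched as follows. The sketch assumes, as the text states, that the congruent proportion is taken out of all 288 trials, so that the 36 neutral trials are constant and incongruent trials fill the remainder. The function names are hypothetical. Following the captions of Figures 4 and 5, latency interference is a difference of means and error interference is a difference of error counts.

```python
from statistics import mean

TOTAL_TRIALS = 288
NEUTRAL_TRIALS = 36


def trial_counts(prop_congruent: float) -> dict[str, int]:
    """Trial types in one list, assuming the congruent share is of all 288 trials."""
    congruent = round(prop_congruent * TOTAL_TRIALS)
    incongruent = TOTAL_TRIALS - NEUTRAL_TRIALS - congruent
    if incongruent < 0:
        raise ValueError("too many congruent trials")
    return {"congruent": congruent, "neutral": NEUTRAL_TRIALS, "incongruent": incongruent}


def rt_interference(incongruent_rts: list[float], neutral_rts: list[float]) -> float:
    """Mean incongruent naming time minus mean neutral naming time."""
    return mean(incongruent_rts) - mean(neutral_rts)


def error_interference(incongruent_errors: int, neutral_errors: int) -> int:
    """Number of incongruent-trial errors minus number of neutral-trial errors."""
    return incongruent_errors - neutral_errors


for p in (0.0, 0.5, 0.75):
    print(p, trial_counts(p))
```

Under this reading the three lists contain 252, 108, or 36 incongruent trials. An error score normalized by the number of trials of each type is one alternative, but the sketch keeps raw counts to match the caption.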
Continuous performance
The experiments reviewed thus far support the notion that WMC is related to attentional control and capture. One limitation, however, at least from a psychometric perspective, is that all of this research has been conducted with samples of college students. An open question is whether the relation between WMC and attentional control will be evident in other populations. For instance, recent work from the developmental literature suggests that the processes underlying the performance of working memory tasks may be different for children than for adults (e.g., Towse, Hitch, & Hutton, 1998; but see Kail & Hall, 2001). It is therefore not clear whether the relation between WMC and attentional control holds for school-aged children. In an attempt to address this question, Conway, Bottoms, Nysse, Haegerich, and Davis (2001) examined the relation between WMC and distractibility in 7- to 8-year-old children. Working memory capacity was measured using the counting span task, originally designed as a developmental assessment tool (Case, Kurland, & Goldberg, 1982). In counting span, the subject is presented with a series of displays, each containing a varying number of targets (e.g., blue circles) and a varying number of distracters (e.g., red circles). The subject's task is to count the targets aloud and remember the total for later recall. After a series of displays (typically between 2 and 6), the subject recalls all the totals from the current series. This is considered a test of WMC because it taps not only the storage component of working memory (i.e., remembering digits) but also the processing component (i.e., visual search and counting). In factor-analytic studies with adult subjects, the counting span task loads on the same factor as other measures of WMC, including the reading span and operation span tasks (Conway, Cowan, Bunting, Therriault, & Minkoff, in press; Engle et al., 1999). Conway et al. (2001) measured capture with a task from the Gordon Diagnostic System (Gordon, 1991), a tool used to diagnose ADHD in both children and adults. The Gordon distractibility task is a continuous performance test (CPT) with distracters. Digits are presented on a three-column display device, one every second. The subject's task is to press a button when a 1 is followed by a 9 in the middle column of the display and to ignore the digits presented in the left and right columns. Three primary measures are derived from the CPT task. First, the total number of correct responses is recorded, that is, the number of times the subject presses the button when he or she should. Second, the number of commission errors is recorded, that is, the number of times the subject presses the button when he or she should not. Third, the latency of the button press is recorded: the time from the onset of the digit to the pressing of the button. Conway et al. (2001) had 66 children (mean age approximately 8 years) perform both the counting span and CPT tasks. Correlations among the main measures are presented in Table 1. Consistent with evidence from the adult literature linking WMC to capture, there was a significant negative correlation between counting span (CSPAN) and commission errors (r = -.32): greater WMC was associated with fewer commission errors. In contrast, there was no significant correlation between CSPAN and correct responses (r = -.02) or between CSPAN and latency (r = .13). Particularly striking is the finding that WMC is correlated with commission errors but not with the total number of correct responses. Thus, children with lesser WMC did not fail to understand the task; they simply made more commission errors than children with greater WMC.
Table 1. Correlations Among Main Measures from the Distractibility Task
                    1       2       3       4       5
1. AGE             ---
2. CSPAN           .23     ---
3. CORRECT         .20     .02     ---
4. COMMISSIONS    -.16    -.32*    .26*    ---
5. LATENCY        -.18     .13    -.44*    .02     ---
* p < .05, two-tailed test
Conway et al. also examined the types of commission errors associated with WMC. They used the coding scheme "XXX" to refer to the stimuli presented in the middle column of the display on three consecutive displays. For example, "358" refers to a situation in which a 3 was presented in the middle column, followed by a 5, followed by an 8. The letter "X" was used as a wildcard: "19X" refers to a situation in which a 1 was presented in the middle column, followed by a 9, followed by any number other than 1 or 9. A commission error in the "19X" condition represents a case in which the child pressed the button too late. That is, the child should have pressed the button when the 9 was presented but pressed it just after the presentation of the 9. The different types of commission errors (with the error occurring on the third display) were XXX, 19X, XX9, XX1, X1X, and X9X. Note that X19 is not a type of commission error because it represents the target sequence. Correlations between CSPAN, latency to respond, and types of commission errors are presented in Table 2. Significant correlations were found between CSPAN and X1X errors (r = -.34) and X9X errors (r = -.32). CSPAN was not significantly correlated with other types of commission errors. Latency was also correlated with X1X errors (r = -.50) and X9X errors (r = .34). Given the correlations with latency, Conway et al. examined the correlations between CSPAN, X1X errors, and X9X errors while controlling for latency. The correlations remained significant. Finally, there was a significant correlation between latency and 19X errors (r = .47). This is not surprising, as slower children would be more likely to commit a false alarm in this situation.
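The coding scheme can be made concrete with a small scorer. It assumes a hypothetical input format: the stream of middle-column digits and the set of display indices during which the button was pressed. The left and right distracter columns are not modeled. A press counts as correct when a 1 on the previous display is followed by a 9 on the current display. Any other press is a commission error labeled by the coded pattern of the current display and the two before it, and patterns outside the six listed categories are grouped as "other". This is a sketch, not the Gordon Diagnostic System's scoring logic.

```python
from collections import Counter

ERROR_PATTERNS = {"XXX", "19X", "XX9", "XX1", "X1X", "X9X"}


def code_digit(digit: int) -> str:
    """Coding scheme from the text: 1 and 9 are kept, any other digit is 'X'."""
    return str(digit) if digit in (1, 9) else "X"


def score_cpt(middle: list[int], presses: set[int]) -> Counter:
    """Classify button presses against the middle-column digit stream.

    middle[t] is the middle-column digit on display t; presses holds the
    display indices during which the button was pressed (hypothetical format).
    """
    targets = {t for t in range(1, len(middle)) if middle[t - 1] == 1 and middle[t] == 9}
    result = Counter()
    for t in sorted(presses):
        if t in targets:
            result["correct"] += 1
        elif t < 2:
            result["other"] += 1  # too early to code three displays
        else:
            pattern = "".join(code_digit(d) for d in middle[t - 2:t + 1])
            result[pattern if pattern in ERROR_PATTERNS else "other"] += 1
    result["missed"] = len(targets - presses)
    return result


stream = [3, 1, 9, 4, 1, 6, 2, 9, 5]
print(score_cpt(stream, {2, 3, 5, 7}))
```

In this toy stream the press at display 2 completes the 1-9 target, while the presses at displays 3, 5, and 7 are coded 19X, X1X, and XX9. The total number of commission errors is the sum of every key other than "correct" and "missed".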
Individual Differences
Table 2. Correlations Among CSPAN, LATENCY, and Types of Commission Errors

Measure       1      2      3      4      5      6      7      8
1. CSPAN     ---
2. XXX       .07    ---
3. 19X       .01    .36*   ---
4. XX9      -.18    .10    .41*   ---
5. XX1      -.14    .35*   .33*   .16    ---
6. X1X      -.34*   .12    .01    .08    .23    ---
7. X9X      -.32*   .10    .46*   .58*   .29*   .07    ---
8. LATENCY   .13    .01    .47*   .13    .03   -.50*   .34*   ---

* p < .05, two-tailed test
Table 3.

(A) Average number of X1X commission errors as a function of latency and CSPAN (n = number of subjects)

                          LATENCY
CSPAN               Slow            Fast
Low capacity        1.60 (n=15)     4.27 (n=22)
High capacity       1.59 (n=17)     2.00 (n=12)

(B) Average number of X9X commission errors as a function of latency and CSPAN

                          LATENCY
CSPAN               Slow            Fast
Low capacity        1.47 (n=15)      .45 (n=22)
High capacity        .18 (n=17)      .25 (n=12)
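Collapsing Table 3 over latency shows the pattern from another angle. The following sketch is our own calculation, not one reported by Conway et al.: it computes n-weighted means for each capacity group from the rounded cell means, so the results are approximate, and the dictionary layout and function name are hypothetical.

```python
# Cell means and group sizes transcribed from Table 3: (mean errors, n).
X1X = {
    ("low", "slow"): (1.60, 15), ("low", "fast"): (4.27, 22),
    ("high", "slow"): (1.59, 17), ("high", "fast"): (2.00, 12),
}
X9X = {
    ("low", "slow"): (1.47, 15), ("low", "fast"): (0.45, 22),
    ("high", "slow"): (0.18, 17), ("high", "fast"): (0.25, 12),
}


def capacity_mean(cells, capacity):
    """n-weighted mean error count for one CSPAN group, collapsing over latency."""
    rows = [(mean, n) for (cap, _latency), (mean, n) in cells.items() if cap == capacity]
    return sum(mean * n for mean, n in rows) / sum(n for _, n in rows)


for label, cells in (("X1X", X1X), ("X9X", X9X)):
    low, high = capacity_mean(cells, "low"), capacity_mean(cells, "high")
    print(f"{label}: low capacity {low:.2f}, high capacity {high:.2f}")
```

Overall, low-capacity children make more of both error types (about 3.19 vs. 1.76 X1X errors, and 0.86 vs. 0.21 X9X errors), but the cell means show where the excess sits: X1X errors are concentrated among fast low spans (4.27) and X9X errors among slow low spans (1.47).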
The pattern of correlations between CSPAN, type of commission error, and latency is particularly intriguing. Specifically, the correlations between CSPAN and X1X errors and X9X errors are both negative, again suggesting that children with greater WMC exhibit fewer commission errors. In contrast, the correlations between latency and X1X errors and X9X errors are mixed, with a negative correlation between latency and X1X errors but a positive correlation between latency and X9X errors. This suggests that some children with lesser WMC are more likely to commit "fast" commission errors (i.e., X1X) while others are more likely to commit "slow" commission errors (i.e., X9X). To further demonstrate this point, Conway et al. compared X1X errors and X9X errors for fast and slow responders as well as for high and low capacity children. Fast and slow responders were determined by a median split on latency and high and low capacity children
Conway and Kane
were determined by a median split on CSPAN. Average numbers of X1X and X9X commission errors per group are reported in Table 3. This analysis nicely demonstrates that (1) there are slow low spans and there are fast low spans, and (2) they exhibit the same deficiency, namely an inability to stop a response when primed with part of the target sequence. The detailed analysis of the types of commission errors provides insight into the type of responding associated with low WMC. Specifically, children with lesser WMC were most likely to make a commission error in two situations: (1) when the middle column in the display prior to the target display contained a 1 (i.e., X1X), and (2) when the middle column in the display prior to the target display contained a 9 (i.e., X9X). Importantly, the relationship between WMC and the frequency of these types of errors cannot be accounted for by processing speed. That is, the correlations remained significant after controlling for latency. Instead, it appears that children with low WMC have trouble blocking, or inhibiting, a response when primed by part of the response requirements. For example, when the display prior to the target display contained either element of the target combination, 1 or 9, children with low WMC were more likely to commit an error. It appears as if having part of the response requirement primed a response that the child was unable to suppress. When none of the response requirements were present on the display prior to the target (i.e., XXX, XX9, XX1), the frequency of commission errors was not associated with WMC. Thus, low WMC is not associated with general impulsiveness but is associated with a tendency to be impulsive when primed for a response.
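A first-order partial correlation shows why controlling for latency need not weaken these effects. The sketch below applies the standard formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)) to the rounded zero-order correlations in Table 2. Conway et al. report only that the correlations remained significant; whether they used exactly this computation, and the exact values they obtained, are not given here, so these numbers are our own approximations and the variable names are hypothetical.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero-order correlations from Table 2 (x = CSPAN, y = error type, z = LATENCY).
r_cspan_latency = 0.13
r_cspan_x1x, r_x1x_latency = -0.34, -0.50
r_cspan_x9x, r_x9x_latency = -0.32, 0.34

x1x_partial = partial_r(r_cspan_x1x, r_cspan_latency, r_x1x_latency)
x9x_partial = partial_r(r_cspan_x9x, r_cspan_latency, r_x9x_latency)
print(f"CSPAN-X1X controlling for latency: {x1x_partial:.2f}")
print(f"CSPAN-X9X controlling for latency: {x9x_partial:.2f}")
```

With these rounded inputs the CSPAN-X1X correlation moves only from -.34 to about -.32, and the CSPAN-X9X correlation grows from -.32 to about -.39. Because CSPAN and latency are nearly uncorrelated (r = .13), partialling out latency removes very little shared variance.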
Summary of the evidence

The empirical evidence suggests that WMC plays a role in a range of tasks traditionally thought to tap "low-level" attentional control and vulnerability to capture. Furthermore, WMC is not critical for all aspects of performance. Rather, WMC is critical in very specific situations, particularly when a pre-potent or habitual response conflicts with a task goal. To help illustrate the consistency of results across paradigms, the task goal, source of conflict, and pre-potent (or habitual) response for each paradigm are listed in Figure 6. Within each paradigm, individual differences in task performance were related to individual differences in WMC when the task goal came into conflict with the pre-potent response. For example, in the anti-saccade task the goal was to direct attention and orient in the opposite direction from the cue, but the pre-potent response is to orient to the cue. In this situation, individual differences in WMC predicted task performance. In contrast, when the task goal and the pre-potent response were consistent, as in the pro-saccade condition, individual differences in WMC did not predict task performance.
Experiment: Dichotic Listening
Task goal: Shadow message presented to the right ear and ignore message presented to the left ear
Source of conflict: One's own name presented to left ear
Erroneous pre-potent or habitual response: Orient to name

Experiment: Anti-saccade
Task goal: Shift eyes & attention in the opposite direction of an abrupt-onset cue in order to detect a target
Source of conflict: Abrupt-onset cue
Erroneous pre-potent or habitual response: Orient to the cue

Experiment: Stroop
Task goal: Name the color of the word
Source of conflict: The word name (especially difficult when there's a high % of congruent trials)
Erroneous pre-potent or habitual response: Read the word

Experiment: Gordon Continuous Performance Test
Task goal: Press a button when a 1 is followed by a 9
Source of conflict: A 1 followed by some number other than 9 (or a 9 preceded by some number other than 1)
Erroneous pre-potent or habitual response: Press the button

Figure 6. An overview of the experimental situations in which working memory capacity predicts attentional control.
According to our theoretical perspective (see Conway, Cowan, Bunting et al., in press; Engle, Tuholski et al., 1999; Engle, Kane et al., 1999; Kane et al., 2001), working memory is a system responsible for the active maintenance of goal-relevant information in the face of concurrent processing and/or interference. WMC does not refer to a total amount of mental capacity or a speed of information processing per se. Rather, WMC refers to an ability to maintain a task goal in the face of salient interference, such as in the situations outlined in Figure 6. In order to facilitate the comparison of our framework to other approaches discussed in this book, we embed our theory within Pashler's (1998) "controlled parallel" theory of selective attention. Pashler argued that the classic debate about early versus late filtering in selective attention has confounded two independent questions: (1) does the processing of multiple attended stimuli occur serially or in parallel? and (2) are "unattended" stimuli identified? Early selection models such as Broadbent (1958) epitomize the lower right quadrant of Figure 7. In contrast, late selection models (e.g., Deutsch & Deutsch, 1963; Norman, 1968) represent the upper left quadrant. Pashler's controlled parallel model represents a middle ground, in which multiple inputs can be processed in parallel and the extent to which unattended information is processed is under the control of the subject. Thus, if the subject's goal is to filter irrelevant information, then the system will reveal characteristics of an early filter model. However, if the subject's goal is to monitor multiple inputs, then the system will reveal characteristics of a late filter model.
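[Figure 7 diagram: a two-way layout crossing "Are unattended stimuli identified?" (yes/no) with "Processing of multiple attended stimuli is" (parallel/serial). Late Selection occupies the parallel/identified quadrant, Early Selection the serial/not-identified quadrant, and Controlled Parallel lies between them.]

Figure 7. Pashler's controlled parallel theory of selective attention.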
We would add an individual differences perspective to this framework. That is, individual differences in WMC are related to attentional control, such that individuals with greater capacity have greater control. Therefore, if the task goal is to block irrelevant information, then individuals with greater WMC will process less unattended information than individuals with lesser WMC (see Figure 8). In short, if the task goal is to filter irrelevant information, then individuals with greater WMC will exhibit evidence for early filter theory while individuals with lesser WMC will exhibit evidence for late filter theory. For example, the cocktail party study described above nicely illustrates this point. An open question at this point is whether individual differences in WMC correspond to attentional control in contexts in which the goal is to process/monitor multiple inputs. All of the research discussed here forced subjects to focus on a relevant goal and block a distracting/competing response. Future research should address whether individuals with greater WMC also have greater attentional flexibility.

[Figure 8 diagram: the Figure 7 layout ("Are unattended stimuli identified?" by "Processing of multiple attended stimuli is" parallel/serial) with working memory capacity added as a dimension.]

Figure 8. An individual differences interpretation of Pashler's controlled parallel theory of selective attention.
Supporting evidence from cognitive neuroscience

We conclude with a brief discussion of two recent studies in the field of cognitive neuroscience. Each of these studies nicely illustrates the relationship between WMC and attentional control in a way that complements our approach. Moreover, these studies begin to point to a neurological basis for the relationship between WMC and control. De Fockert, Rees, Frith, and Lavie (2001) used fMRI to examine the relation between WMC and selective attention. They had subjects classify famous names as either pop stars or politicians. The names were presented with faces that were either congruent or incongruent with the famous name (e.g., Bill Clinton's face presented with Bill Clinton's name or Ricky Martin's face presented with Bill Clinton's name). Distractor interference was assessed by subtracting reaction time in the congruent condition from that in the incongruent condition. Subjects performed the classification task along with a secondary memory task that imposed either a "low" or "high" load. In the low load condition subjects maintained four digits for later recall, but the digits were presented in the same order on every trial. In the high load condition subjects maintained four digits for later recall and the digit order changed every trial. De Fockert et al. found greater interference effects in the high load condition than in the low load condition. This is conceptually equivalent to our findings that low span subjects experience greater interference than high span subjects. They also found that greater memory load was associated with greater activity in the frontal cortex, which is consistent with previous findings suggesting that the active maintenance of information is particularly reliant on prefrontal cortex (for a review see Kane & Engle, 2001).
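The interference measure is a simple difference score, and the load effect on interference is a difference between two such scores. The sketch below uses made-up reaction times purely to show the arithmetic; they are not de Fockert et al.'s data, and the function name is hypothetical.

```python
from statistics import mean


def interference(incongruent_rts, congruent_rts):
    """Distractor interference (ms): mean incongruent RT minus mean congruent RT."""
    return mean(incongruent_rts) - mean(congruent_rts)


# Hypothetical RTs in ms for one subject, for illustration only.
low_load = interference([620, 640, 630], [600, 610, 605])
high_load = interference([700, 720, 710], [640, 650, 645])

# The load effect on interference is the difference between the two difference scores.
load_effect = high_load - low_load
print(low_load, high_load, load_effect)
```

A positive load effect, as in this toy example, corresponds to the direction of the result described above: more distractor interference under high memory load.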
Most relevant here is the finding that increased memory load (and a larger interference effect) was accompanied by greater activity in brain regions which have been shown to be critical for processing visual information (in particular faces), such as the fusiform gyrus and the extrastriate visual cortex. These findings suggest that when subjects experienced a greater memory load, frontal areas were less able to effect top-down control on posterior processing areas. Distractor processing was disinhibited by memory maintenance of irrelevant information. Similar results were reported in a neuropsychological investigation of auditory selective attention using ERP (Chao & Knight, 1998). Ten patients with unilateral lesions to the dorsolateral prefrontal cortex (dPFC) and ten healthy age-matched controls were tested in an auditory delayed-matching-to-sample task with a 5000 ms delay. The sample and test stimuli consisted of real-world sounds such as coughing, dogs barking, piano notes, and dishwasher noise. Half the trials included several auditory distractor "tone pips" (4000 Hz) between the offset of the sample sound and the onset of the test sound. The patients and controls showed similar error rates on no-distractor trials, but the patients showed significantly more errors than the controls on distractor trials. Furthermore, the patients showed more responding in auditory cortex to the distractor tones than did the controls, and they
showed less cortical responding to target stimuli than did the controls. These results suggest that the dPFC patients, similar to low span subjects, had difficulty maintaining the target representation during the delay, especially in the face of interference (for similar results from the primate literature, see Goldman-Rakic, 1987).

Conclusion
To summarize, we conceive of working memory as a system responsible for the active maintenance of information in the face of concurrent processing and interference. The working memory system is limited in capacity and this capacity constrains cognitive performance in a general manner. Individual differences in working memory capacity are evident in samples of college students and these differences are related to performance of a wide variety of cognitive tasks. Most relevant to readers of this book, WMC is related to attentional control and resistance to attentional capture, such that individuals with greater WMC are less susceptible to capture. We submit that these people are less susceptible to capture because they are more capable of goal maintenance in the face of salient interference.

Footnotes
1 Daneman and Carpenter (1980) found a correlation of .59 between reading span and VSAT. However, that correlation may have been inflated due to the small number of subjects tested (n=18) and the particular method used. Correlations between .35 and .49 are more typical (Daneman & Merikle, 1996; Engle, Tuholski, Laughlin, & Conway, 1999).

2 Task type (pro, anti) as well as task order (pro-first, anti-first) were both manipulated between groups. Only the results of the first task performed are reported here. There were 52 high spans and 45 low spans who performed pro-first and 55 high spans and 51 low spans who performed anti-first. Interested readers are referred to Kane et al. (2001) for a detailed discussion of order effects.

References
Anderson, J. R. (1974). Retrieval of propositional information from long-term memory. Cognitive Psychology, 6, 451-474.
Anderson, J. R. (1983). A spreading activation theory of memory. Journal of Verbal Learning and Verbal Behavior, 22, 261-295.
Anderson, J. R., & Reder, L. M. (1999). The fan effect: New results and new theories. Journal of Experimental Psychology: General, 128, 186-197.
Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning & motivation: Advances in research & theory (Vol. 2, pp. 89-195). New York: Academic Press.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47-89). New York: Academic Press.
Broadbent, D. E. (1958). Perception and communication. New York: Pergamon Press.
Cantor, J., & Engle, R. W. (1993). Working-memory capacity as long-term memory activation: An individual differences approach. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1101-1114.
Carpenter, P. A., Just, M. A., & Shell, P. (1990). What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices test. Psychological Review, 97, 404-431.
Case, R., Kurland, M. D., & Goldberg, J. (1982). Operational efficiency and the growth of short-term memory span. Journal of Experimental Child Psychology, 33, 386-404.
Chao, L. L., & Knight, R. T. (1998). Contribution of human prefrontal cortex to delay performance. Journal of Cognitive Neuroscience, 10, 167-177.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. Journal of the Acoustical Society of America, 25, 975-979.
Cohen, J. D., Dunbar, K., & McClelland, J. L. (1990). On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review, 97, 332-361.
Cohen, J. D., & Servan-Schreiber, D. (1992). Context, cortex, and dopamine: A connectionist approach to behavior and biology in schizophrenia. Psychological Review, 99, 45-77.
Conway, A. R. A., Bottoms, B. L., Davis, S. L., Nysse, K. L., & Haegerich, T. M. (2001). Working memory capacity and distractibility in children. Manuscript submitted for publication.
Conway, A. R. A., Cowan, N., & Bunting, M. F. (2001). The cocktail party phenomenon revisited: The importance of working memory capacity. Psychonomic Bulletin & Review, 8, 331-335.
Conway, A. R. A., Cowan, N., Bunting, M. F., Therriault, D., & Minkoff, S. (in press). A latent variable analysis of working memory capacity, short-term memory capacity, processing speed, and general fluid intelligence. Intelligence.
Conway, A. R. A., & Engle, R. W. (1994). Working memory and retrieval: A resource-dependent inhibition model. Journal of Experimental Psychology: General, 123, 354-373.
Conway, A. R. A., & Engle, R. W. (1996). Individual differences in working memory capacity: More evidence for a general capacity theory. Memory, 4, 577-590.
Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12, 671-684.
Crowder, R. G. (1983). The demise of short-term memory. Acta Psychologica, 50, 291-323.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450-466.
Daneman, M., & Merikle, P. M. (1996). Working memory and language comprehension: A meta-analysis. Psychonomic Bulletin & Review, 3, 422-433.
De Fockert, J. W., Rees, G., Frith, C. D., & Lavie, N. (2001). The role of working memory in visual selective attention. Science, 291, 1803-1806.
De Jong, R. D., Berendsen, E., & Cools, R. (1999). Goal neglect and inhibitory limitations: Dissociable causes of interference effects in conflict situations. Acta Psychologica, 101, 379-394.
Dempster, F. N. (1991). Inhibitory processes: A neglected dimension in intelligence. Intelligence, 15, 157-173.
Deutsch, J. A., & Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80-90.
Engle, R. W. (1996). Working memory and retrieval: An inhibition-resource approach. In J. T. E. Richardson, R. W. Engle, L. Hasher, R. H. Logie, E. R. Stoltzfus, & R. T. Zacks, Working memory and human cognition. New York: Oxford University Press.
Engle, R. W., Cantor, J., & Carullo, J. J. (1992). Individual differences in working memory and comprehension: A test of four hypotheses. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 972-992.
Engle, R. W., Kane, M. J., & Tuholski, S. W. (1999). Individual differences in working memory capacity and what they tell us about controlled attention, general fluid intelligence and functions of the prefrontal cortex. In A. Miyake & P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 102-134). New York: Cambridge University Press.
Engle, R. W., Tuholski, S. W., Laughlin, J. E., & Conway, A. R. A. (1999). Working memory, short-term memory and general fluid intelligence: A latent variable approach. Journal of Experimental Psychology: General, 128, 309-331.
Everling, S., & Fischer, B. (1998). The antisaccade: A review of basic research and clinical findings. Neuropsychologia, 36, 885-899.
Goldman-Rakic, P. S. (1987). Circuitry of primate prefrontal cortex and regulation of behavior by representational memory. In F. Plum (Ed.), Handbook of physiology: The nervous system (Vol. 5, pp. 373-417). Bethesda, MD: American Physiological Society.
Hallett, P. E. (1978). Primary and secondary saccades to goals defined by instructions. Vision Research, 18, 1279-1296.
Hallett, P. E., & Adams, B. D. (1980). The predictability of saccadic latency in a novel voluntary oculomotor task. Vision Research, 20, 329-339.
Hasher, L., & Zacks, R. T. (1988). Working memory, comprehension, and aging: A review and a new view. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 22). New York: Academic Press.
Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 122-149.
Kail, R., & Hall, L. (2001). Distinguishing short-term memory from working memory. Memory & Cognition, 29, 1-9.
Kane, M. J., Bleckley, M. K., Conway, A. R. A., & Engle, R. W. (2001). A controlled-attention view of working memory capacity: Individual differences in memory span and the control of visual orienting. Journal of Experimental Psychology: General, 130, 169-183.
Kane, M. J., & Engle, R. W. (2000). Working memory capacity, proactive interference, and divided attention: Limits on long-term memory retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 336-358.
Kane, M. J., & Engle, R. W. (2001a). The contributions of working-memory capacity, goal neglect, and task set to Stroop interference. Manuscript submitted for publication.
Kane, M. J., & Engle, R. W. (2001b). The role of prefrontal cortex in working memory capacity, executive attention, and general fluid intelligence. Manuscript submitted for publication.
Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389-433.
May, C. P., Hasher, L., & Kane, M. J. (1999). The role of interference in memory span. Memory & Cognition, 27, 759-767.
Miyake, A., Friedman, N. P., Emerson, M. J., Witzki, A. H., Howerter, A., & Wagner, T. (2000). The unity and diversity of executive functions and their contributions to complex "frontal lobe" tasks: A latent variable analysis. Cognitive Psychology, 41, 49-100.
Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Norman, D. A. (1968). Toward a theory of memory and attention. Psychological Review, 75, 522-536.
Pashler, H. (1998). The psychology of attention. Cambridge, MA: The MIT Press.
Radvansky, G. A. (1999). The fan effect: A tale of two theories. Journal of Experimental Psychology: General, 128, 198-206.
Roberts, R. J., Jr., & Pennington, B. F. (1996). An interactive framework for examining prefrontal cognitive processes. Developmental Neuropsychology, 12, 105-126.
Rosen, V. M., & Engle, R. W. (1997). The role of working memory capacity in retrieval. Journal of Experimental Psychology: General, 126, 211-227.
Rosen, V. M., & Engle, R. W. (1998). Working memory capacity and suppression. Journal of Memory and Language, 39, 418-436.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643-662.
Towse, J. N., Hitch, G. J., & Hutton, U. (1998). A reevaluation of working memory capacity in children. Journal of Memory and Language, 39, 195-217.
Turner, M. L., & Engle, R. W. (1989). Is working memory capacity task dependent? Journal of Memory and Language, 28, 127-154.
Wood, N., & Cowan, N. (1995). The cocktail party phenomenon revisited: How frequent are attention shifts to one's name in an irrelevant auditory channel? Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 255-260.

Authors' Notes
Andrew R. A. Conway, Department of Psychology, University of Illinois at Chicago; Michael J. Kane, Department of Psychology, University of North Carolina at Greensboro. We would like to recognize our collaborators on the experiments described here. They are Kate Bleckley, Bette Bottoms, Michael Bunting, Nelson Cowan, Suzanne Davis, Randy Engle, Tamara Haegerich, and Kari Nysse. Correspondence concerning this manuscript should be sent to Andrew Conway, University of Illinois at Chicago, Department of Psychology (M/C 285), 1007 West Harrison Street, Chicago, IL 60607-7137 (email: [email protected]).
Dynamical Systems/Evolution
Attraction, Distraction, and Action: Multiple Perspectives on Attentional Capture
C. Folk and B. Gibson (Editors)
© Elsevier Science B.V. All rights reserved.

15

A Dynamic, Evolutionary Perspective on Attention Capture¹
William A. Johnston and David L. Strayer
Science, like biological evolution, may progress primarily when relatively unproductive steady states are punctuated by abrupt changes, paradigm shifts, or revolutions (e.g., Kuhn, 1970). We suggest that the study of attention may be due for such a change. In what follows, we examine some possible limitations of the current approach and then outline a new, or supplementary, direction that the field might productively take, one that is based on concepts drawn from dynamical systems theory and from both biological and cultural evolution. We conclude by suggesting that one potential advantage of the proposed new direction is that it has important implications for broader social issues. Our primary goal in this chapter is no doubt somewhat unusual. It is neither to summarize the literature bearing on a particular research or theoretical issue with respect to attention capture nor to present new empirical data. Rather, it is to suggest a different way of thinking about attention capture, an alternative to the usual metatheoretical framework within which research and theory on attention capture are carried out. Because this alternative framework has yet to be put into significant empirical and theoretical practice in the study of attention, our treatment of it is of necessity highly speculative and relies to some extent on anecdotal evidence.² Indeed, it is not altogether clear what epistemological procedures would be most appropriate for pursuing and "testing" the ideas generated within this paradigm. For example, as we suggest below, naturalistic observation may be more fruitful than the standard laboratory experimentation that is characteristic of contemporary cognitive psychology. Although our focus is on attention capture, the basic ideas, suggestions, and speculations apply as well to contemporary approaches to most other areas of cognition.
The Phenomena to be Explained and the Explanations

Just what is attention capture? We consider here the natural-language and typical cognitive-psychological meanings of attention capture. Further below we consider our proposed alternative view.
Johnston and Strayer
Phenomenology of attention capture

With respect to private, subjective experience, strong and weak forms of attention capture may be distinguished. A strong form of attention capture is experienced when an imperative stimulus or event suddenly interrupts some ongoing processing or task and "breaks into awareness." Candidates for such stimuli include sudden onsets such as an object that blinks or moves in a field of steady or stationary objects (e.g., Yantis, 1993), odd stimuli or singletons such as a red object in a field of blue objects (e.g., Folk, Remington, & J. Johnston, 1993; Treisman & Gelade, 1984), and, possibly, novel or unexpected objects in a familiar field (e.g., W. Johnston, Hawley, Plewe, Elliott, & DeWitt, 1990; W. Johnston & Hawley, 1994). A weaker form of attention capture is experienced when one becomes aware spontaneously of one of many nonimperative stimuli in the environment. An example might be when one is idly gazing out a window, drifting in reverie, and suddenly becomes aware of some relatively inconspicuous and innocuous object such as one of many plants in a garden or persons in a crowd. The above description of attentional phenomenology, like the vast majority of research on attention, focuses on attention to inputs from the external environment. However, attention may at times be directed "inward" to memories, fantasies, thoughts, and feelings. This internal attention might also be said to be captured when certain of these mental events spring uninvited into awareness. Extreme forms of internal attention may characterize certain thought disorders. We shall return to a consideration of internal attention in a subsequent section. In the meantime, our discussion refers primarily to attention to external stimuli.
Conventional theoretical framework

Processing of external stimuli is typically divided into preattentional and postattentional stages. Awareness or consciousness is associated with postattentional processing. Because postattentional processing (a.k.a. awareness) is assumed to be limited in capacity and incapable of the parallel processing of the usually massive preattentive inflow of stimulus information, selection of just a small subset of this information is often necessary. In order for behavior to be adaptive and goal directed, this selection must be systematic. Attention is the process by which this systematic selection is accomplished. Although theories differ in terms of the extent or depth of preattentional processing, most of them appeal to some sort of gate-keeper that is responsible for the systematic admission of preattentive data into consciousness. This gate-keeping mechanism has been variously dubbed, among other terms, attention director, control processor, executive, and search mechanism (e.g., LaBerge, 1975; Posner & Snyder, 1975; Shiffrin & Schneider, 1977; Wolfe, 1994).
Dynamical Systems and Attention Capture
Attention may be systematically directed to certain, task-relevant, target stimuli depending on the individual's current motivation and goals. On occasion, this systematic direction of attention may be interrupted by an attention-capturing event, in which case the capturing input bursts through the gate. Although this sketch of typical theories of attention is somewhat of a caricature and may not do justice to any given theory, it attempts to capture enough of the gist of these theories to expose some of their shortcomings.
Limitations to Contemporary Theories of Attention Capture

We have discussed some of the shortcomings of contemporary theories of attention and attention capture elsewhere (e.g., W. Johnston & Dark, 1986; W. Johnston & Hawley, 1994; W. Johnston, Strayer, & Vecera, 1998). Cognitive psychology purports to be a scientific discipline, one that endorses the received view that any concepts and explanations developed be subject to empirical test. It is against this very standard that many cognitive theories of attention may be considered to come up short.³ Below we summarize the specific problems of appealing to intelligent homunculi and consciousness and the general problems of theoretical circularity and reductionism.
The problem of homunculi

Many theories attempt to explain the intelligent behavior of an organism by appealing to an intelligent internal entity, often assumed to be the seat of consciousness, such as an executive, search mechanism, or control agent. This is related to the internal I with which we humans identify ourselves (e.g., I think..., I believe..., I am...). Although the idea of an internal homunculus may be consistent with our subjective experience and may be of some heuristic value, it is vacuous as an explanation. The appeal to an intelligent homunculus to explain the intelligent behavior of an organism begs the question of what underlies the intelligence of the homunculus and leads to an infinite regress. It leads also to the problem of consciousness.
The problem of consciousness

Consciousness or awareness of an input is usually considered to be that which is captured when attention is captured by the input. We certainly do not deny the existence of consciousness. Indeed, it is arguable that in every waking moment each of us is confined within his/her own consciousness and only indirectly in touch with anything outside of it, such as a material world. Consciousness is useful as a natural-language concept and we shall make use of it below. However, the
mind-body problem has been considered by scholars for millennia and has never been resolved. All of the classical solutions to the problem continue to be tenable, including the idea that only mind or consciousness exists and the material world is an illusion. The problem is that consciousness is not amenable to scientific analysis; there is no consensus on what, if anything, it is composed of, where, if anywhere, it resides, and how, or even if, it can be empirically assessed. Indeed, consciousness may be entirely epiphenomenal and not play a causal role in behavior at all (e.g., James, 1890/1950; W. Johnston & Dark, 1986). Therefore, to appeal to consciousness in descriptions and explanations of attention is to appeal to a concept that itself defies description and explanation within the usual parameters of Western science. Similar arguments can be directed at related concepts that are often appealed to in the literature on attention, such as intention, volition, and deliberateness.⁴
The problem of theoretical circularity
Subjective experience and ordinary-language concepts, such as the internal I and consciousness, may be of some value as a starting point for the study of attention capture. However, they cannot be relied upon both to define and to explain attention capture. An explanation that incorporates the phenomena to be explained is circular and explains nothing at all. Consider the claim that the organism manages to attend systematically and adaptively to outside stimuli because intelligent attentional devices inside its head systematically and adaptively attend to the internal representations of these stimuli. This claim does not explain attention or attention capture. Unfortunately, most contemporary theories of attention do little more than redescribe the phenomenology of attention in different words.
The problem of reductionism
Parts vs. wholes. Following the lead of traditional Western science, the cognitive and neural sciences have attempted to understand the behavior of whole organisms by analyzing them into their parts. The physical sciences attempt to understand complex material entities in terms of their molecular, atomic, and subatomic composition. In the same way, the cognitive and neural sciences attempt to understand attention by focusing on the internal cognitive and neural mechanisms presumed to govern it. The problem with reductionism is that the phenomena of interest may be emergent properties of all the parts of the system (e.g., the body of an organism) acting in concert. They may even depend on specific, contextually variable relationships to other, outside systems (e.g., other organisms or whole ecosystems). The phenomena may simply not exist in any component of the system, or even in all the
Dynamical Systems and Attention Capture
components taken together. We suggest below that attention capture may be one such non-reducible, emergent phenomenon, arising in the relationship between the whole organism and the specific environmental context that it temporarily inhabits.
Laboratory methodology. A further limitation of the reductionistic approach is that it is usually associated with laboratory methodology. Human subjects in most studies of attention are disconnected from the rich, dynamic natural habitats in which they normally live and engage in complex behaviors (e.g., driving an automobile, cooking dinner, conversing with friends). Instead, they are placed in drab cubicles facing contrived, discrete displays of relatively simple stimuli, to some of which they make simple, discrete, arbitrary responses such as button presses. To disconnect an organism from its natural habitat runs the serious risk of altering and distorting the very processes under investigation. Attentional processes have no doubt emerged because they have had adaptive value in the sorts of complex, natural ecologies in which the species of interest evolved. These processes may not be fully deployed and manifested in contexts sufficiently different from these naturalistic ones, and may even be corrupted by them.⁵
A Dynamical Systems Framework
While scientific reductionism is not without merit, it is also not without serious limitations. As noted above, attention capture may be an emergent phenomenon. It may arise from the fluctuating interactions and relationships among all of the parts acting in concert within complex, dynamic, naturalistic contexts. The process of attention capture is likely to be altered, perhaps seriously, when the organism is disconnected from its natural habitats and placed in contrived, artificial laboratory contexts. Such contexts effectively disassemble the organism by engaging a small subset of its systems (e.g., visual and brain systems) to a disproportionate degree.⁶ We suggest that an understanding of attention capture can benefit from more holistic and naturalistic approaches. We now consider how dynamical systems theory might form the basis for one such approach.
Overview of dynamical systems theory
We provide here only a broad, qualitative outline of dynamical systems theory. More thorough treatments are available elsewhere (e.g., Gleick, 1987; Kauffman, 1993; Lewin, 1992; Prigogine & Stengers, 1984).
Webs of relationships. The systems approach focuses on relationships rather than on material entities or things. Systems are viewed as dynamic webs or patterns of relationships. The nodes or intersections in any one web are themselves viewed as lower-order webs of relationships rather than as material things. The entire universe is viewed as a vast, complex, and dynamical web of relationships, within which are embedded countless interconnected lower-order systems or webs of relationships. The human body is an incredibly complex, hierarchical web of relationships. It is composed of lower-order webs defining organs, tissues, cells, and so on, all the way down to molecules, atoms, and subatomic systems. The higher-order patterns cannot be reduced to the lower-order patterns for two reasons. First, important emergent phenomena would be lost. Second, the interdependencies between the former and the latter are bidirectional (i.e., the "whole" both constrains and is constrained by the "parts"). The atoms, molecules, and cells composing the body are in constant turnover, but the pattern of relationships remains more or less the same. We still recognize people we have not seen for years even though the matter of which they are composed has been replaced several times over. What defines the human body, then, is not a material, reducible entity but a non-reducible pattern of relationships. The human body is itself a node in many higher-order webs, such as the immediate environmental context, family, profession, culture, and the planet as a whole. These webs are spun out across time as well as space. Thus, to understand human behavior and any phenomenon it manifests, such as attention capture, it is not enough to consider the human body and its parts. It is also necessary to consider the broader systems in which the body participates, with which it has evolved, and to which it must adapt. We must also consider the historical, evolutionary webs from which humans have emerged and to which they contribute. For example, human anatomy, physiology, and behavior reflect the dynamical, pattern-making processes that have shaped the courses of biological and cultural evolution, not to mention geological and cosmological evolution.⁷
Chaos and butterfly effects. Activity anywhere and at any time in the universal web can ripple widely and potentially affect systems anywhere else and at any later time in the web. This leads to what have been termed butterfly effects (e.g., Gleick, 1987). Most natural systems are nonlinear, often exemplifying deterministic chaos. Western science often attempts to reduce such systems to additive, linear components, but unlike those components, these systems allow tiny causes to have big effects and big causes to have tiny effects. Butterfly effects ripple across time as well as space.

One butterfly effect of ancient origin on human bodies and behaviors is that the first hominids (e.g., Lucy) stood upright. This upright stance required a change in hip structure and a narrowing of the birth canal of females. This, in turn, required that babies be born "prematurely," that is, before their crania and brains were as fully developed as those of their primate cousins. As a result, brain organization became more flexible and responsive to the actual environmental contexts into which the infants were born. The self-organizing web of relationships emergent in the brain of an Australopithecus infant could reflect the fine-grained statistical regularities of its immediate environment. It could do so in addition to reflecting the general, coarse-grained regularities made innately available by biological evolution. The upright stance also freed the hands for carrying and wielding objects, and it changed the articulatory apparatus in ways that made human speech possible. All of these changes may have contributed to the evolutionary increase in hominid brain size, the emergence of human speech and symbolic thought, and, eventually, cultural and technological evolution. We suggest that attention capture has been affected by, and has contributed to, all of these evolutionary changes. In short, a thorough understanding of a system, such as an organism, must consider two things: the webs of relationships of which it is composed, and the whole history or evolution of the broader webs in which it is embedded.
Self-organizing complexification. The basic idea is that systems tend to evolve away from simple states close to thermodynamic equilibrium toward higher levels of complexity. This evolution is self-organizing rather than based on some sort of blueprint or plan, and it often exemplifies deterministic chaos (e.g., Kauffman, 1993; Lewin, 1992). Along this self-organizing course, systems pass through various attractor phases. Each phase is self-perpetuating and tends to maintain the system in a dynamic stasis until a sufficiently strong perturbation forces it to undergo a phase transition into another attractor. As the system passes through these attractors, it often increases in complexity and dissipates more energy (e.g., Prigogine & Stengers, 1984). Examples of self-organizing complexification include cosmological evolution, biological evolution, cultural evolution, and individual development (both prenatal and postnatal). All of these are characterized by relatively simple, embryonic states and immensely complex later states.⁸
Edge between order and chaos. Kauffman (1993) suggests that systems tend to self-organize toward, and thrive near, the edge between order and chaos, a zone of optimal system adaptability (see also Lewin, 1992). This abstract edge is reminiscent of the stability/plasticity dilemma and of the costs and benefits of expertise (Grossberg, 1987; W. Johnston & Hawley, 1994; W. Johnston, Strayer, & Vecera, 1998). Systems with too little plasticity may be overly rigid and vulnerable to stagnation and decay. Systems with too much plasticity may be overly fragile, sensitive to even the slightest perturbations, and vulnerable to dissolution or deterioration. Experts and specialists may be excessively stable. They tend to perform very well within the particular domains and contexts to which they have become precisely attuned and adapted. However, they do so at some loss of flexibility or plasticity and may suffer costs of expertise should their environmental context change. Novices and generalists may be excessively labile. They are not precisely attuned to any particular niche, but they are flexible enough to resonate with a changing, evolving web and avoid becoming marooned in an obsolete attractor. In times of social and environmental change, a "jack of all trades and master of none" is likely to have the advantage over a highly specialized master of a trade.
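Standard textbook toy systems can illustrate two dynamical notions invoked above. The first is sensitive dependence on initial conditions (butterfly effects). The second is attractors that hold a system in place until a sufficiently strong perturbation forces a phase transition. The sketch below is offered only as an illustration and is not part of the authors' framework.

- It uses the logistic map x → r·x·(1 − x) at r = 4.0 as a stand-in for a chaotic system, and at r = 2.5 as an orderly one.
- It uses the one-dimensional flow dx/dt = x − x³, whose stable states at x = −1 and x = +1 play the role of two attractors.
- All parameter values, step sizes, and function names are illustrative assumptions.
- Nothing here implies that attention capture obeys these particular equations.

```python
"""Toy illustrations of butterfly effects and attractors (illustrative assumptions only)."""


def logistic_separation(x0, r, eps=1e-10, steps=60):
    """Iterate the logistic map from x0 and from x0 + eps; return |difference| at every step."""
    a, b = x0, x0 + eps
    gaps = [abs(a - b)]
    for _ in range(steps):
        a = r * a * (1.0 - a)
        b = r * b * (1.0 - b)
        gaps.append(abs(a - b))
    return gaps


def settle(x, dt=0.01, steps=2000):
    """Relax x under dx/dt = x - x**3 with Euler steps; the stable states are -1 and +1."""
    for _ in range(steps):
        x += dt * (x - x ** 3)
    return x


def perturb_and_settle(start, kick):
    """Let the system settle into an attractor, apply an instantaneous kick, then settle again."""
    return settle(settle(start) + kick)


if __name__ == "__main__":
    chaotic = logistic_separation(0.2, r=4.0)  # a tiny cause grows into a large effect
    orderly = logistic_separation(0.2, r=2.5)  # the same tiny cause dies out
    print(f"r=4.0: start gap {chaotic[0]:.1e}, largest gap {max(chaotic):.2f}")
    print(f"r=2.5: start gap {orderly[0]:.1e}, final gap {orderly[-1]:.1e}")
    print("small kick ->", round(perturb_and_settle(-1.0, 0.5), 3))  # returns to the -1 attractor
    print("large kick ->", round(perturb_and_settle(-1.0, 1.5), 3))  # transition to the +1 attractor
```

The logistic map and the double-well flow were chosen because each isolates one idea with a single parameter or a single kick. In the chaotic regime, a difference of 10⁻¹⁰ in the starting value grows to the order of the whole range within a few dozen steps. In the double-well flow, a small kick is absorbed by the current attractor, whereas a larger kick crosses the unstable point at 0 and carries the system into the other attractor.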
Attentional dynamics
Attentional processes lend themselves readily to a dynamical-systems perspective because they are inherently relational. They relate organisms to one another and to other systems. They also potentially keep organisms connected to, and sensitive to changes in, the ecological webs in which they participate. We suggest that attention capture is a non-reducible, whole-body phenomenon and that it serves important functions related to the stability/plasticity dilemma.
Whole body vs. brain. Contemporary approaches to the study of attention capture tend to be human-centered and brain-centered. This reflects the reductionistic assumption that attention is entirely a cerebral phenomenon. In contrast, we suggest that attention capture is an emergent phenomenon of whole bodies and a fundamental characteristic of countless species of organisms. In many species, including humans, this whole-body response often entails a whole pattern of bodily changes. These include head orientation, pupil dilation, muscle tensing, and postural changes that prepare the body to respond quickly and appropriately to the capturing event (e.g., Sokolov, 1963; Näätänen, 1992).
Holistic, functional approach. Attention capture in humans, stink bugs, crawfish, and any number of other organisms is probably implemented by different internal anatomical and physiological dynamics. Its primary function, however, is probably very much the same in all species.⁹ We suggest that a holistic, functional approach can lead to an understanding of attention capture that applies to all species, in ways that a strictly reductionistic focus on internal mechanisms cannot. We suggest further that an important function of attention capture is to maintain viable relationships between whole organisms and their habitats. In particular, we suggest that attention capture helps to resolve the stability/plasticity dilemma.
Attentional capture and the stability/plasticity dilemma. To survive and remain viable, organisms must familiarize themselves with their environments so that they can navigate through them efficiently and adaptively and otherwise relate to them. Organisms become attuned to their environments, capable of anticipating, efficiently processing, and responding to environmental regularities. They become biased toward the predictable features of their habitats and settle into adaptive behavioral routines, or attractors. Examples of the bias toward familiar and expected stimuli abound in the cognitive literature (see W. Johnston & Hawley, 1994). In short, relatively stable relationships develop between organisms and their ecosystems. An ecosystem is usually composed of a complex web of diverse forms of life, each developing relatively stable relationships with the others, and the whole web tends to settle into a self-perpetuating attractor. Thus, individual organisms and whole ecosystems tend to move away from the edge of chaos toward increasing order and stability.

Stability has immense benefits; without it, organisms and ecosystems could not long survive. But stability can also have important costs. No organism or ecosystem exists in isolation. Every system is at least indirectly connected to every other one, and the broader webs are always dynamic and evolving. Suppose a given system has settled for too long in a self-perpetuating attractor. It can then become mired there and unable to respond to changes in the broader webs to which it is connected. It may then fail to undergo a phase transition into a new, more viable and adaptive attractor.¹⁰ Excessive stability can lead to obsolete and isolated attractors in which systems begin to succumb more quickly to the second law of thermodynamics. Thus, to avoid excessive and maladaptive rigidity, systems must maintain a degree of plasticity. They must reside somewhere in the optimal zone between order and chaos.¹¹ We suggest that an important function of attention capture is to help resolve the stability/plasticity dilemma for the organism and to protect it against excessive rigidity.
Change detection. Most supposed instances of strong attention capture in the literature involve some sort of change:

- A sudden onset is a change in a relatively static environment.
- An odd stimulus is a change in a relatively homogeneous environment.
- An unexpected stimulus is a change in an otherwise familiar and predictable environment.

Attention capture is a bias toward deviations from the ordinary, commonplace, and predictable. As such, it counteracts the strong bias, noted above, toward the predictable features of the environment. Because of attention capture, the bias of organisms toward the expected features of their environments is balanced to varying degrees by a bias toward unexpected stimuli.¹² This bias toward change affords organisms a degree of vigilance toward potentially important (e.g., threatening) intrusions into their habitats. It also guards against excessive stability and entrenchment in obsolete and maladaptive attractors. In general, attention capture helps to keep organisms flexible and dynamic, capable of resonating to and evolving with the dynamic webs in which they are embedded.
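A minimal change detector can make concrete the claim that attention capture is a bias toward deviations from the predictable. The sketch below is an illustrative assumption, not a model proposed in this chapter. It works as follows:

- It keeps an exponentially weighted running mean and variance of a one-dimensional input.
- It scores each new sample by how many running standard deviations the sample lies from that expectation.
- Only after scoring does it fold the sample into the expectation.

The names `surprise_scores`, `captured`, `alpha`, `min_var`, and the capture threshold are all hypothetical. A sudden onset in a static stream scores very high. If the same onset keeps recurring, it is gradually absorbed into the expectation and its score falls. This is a crude analogue of an unexpected event becoming familiar.

```python
import math


def surprise_scores(signal, alpha=0.1, min_var=1e-6):
    """Score each sample by its deviation from a running expectation.

    The expectation is an exponentially weighted mean and variance with
    learning rate `alpha` (an illustrative choice). Each sample is scored
    before it is used to update the expectation, so a score measures how
    unexpected the sample was given everything seen so far. `min_var`
    keeps the running variance from collapsing to zero.
    """
    mean, var = signal[0], 1.0  # hypothetical starting expectation
    scores = [0.0]
    for x in signal[1:]:
        diff = x - mean
        scores.append(abs(diff) / math.sqrt(var))
        # Fold the sample into the expectation (incremental EW mean and variance).
        mean += alpha * diff
        var = max((1.0 - alpha) * (var + alpha * diff * diff), min_var)
    return scores


def captured(scores, threshold=3.0):
    """Indices whose surprise exceeds a (hypothetical) capture threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    # A quiet stream, then an onset of 10.0 that recurs every fourth sample.
    stream = [0.0] * 20 + [10.0, 0.0, 0.0, 0.0] * 5
    scores = surprise_scores(stream)
    onsets = [i for i, x in enumerate(stream) if x == 10.0]
    print([round(scores[i], 1) for i in onsets])  # scores shrink as the onset recurs
    print(captured(scores))                       # only the earliest onsets pass the threshold
```

Scoring each sample before updating the expectation is the design choice that makes the example work. The detector responds to what departs from its own learned regularities rather than to absolute stimulus magnitude. As a result, the same physical event can be highly capturing in one context and barely noticed in another.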
Segue
In the remainder of this chapter we trace the evolutionary history of attention capture, especially in humans. This evolutionary approach reveals possible shortcomings of the standard, reductionistic analysis of attention capture. In particular, it reveals important features of attention capture that are not often addressed in the contemporary literature:

- the mutual attraction of attention among organisms;
- the ecology of attention capture;
- the contextual variability of attention capture;
- the internal capture of attention;
- the co-opting of attention capture in humans to serve cultural and institutional systems;
- the relevance of all of this to broader social issues.
Attention Capture and Biological Evolution
Attention capture is adaptive and biologically primitive
Clearly, humans are not the only organisms whose attention can be captured. One can readily witness attention capture in other species, even so-called primitive ones with simpler brains. When one steps too close to a stink bug on a hiking trail, it immediately stops, collapses onto its front end, and raises its rear end. When one comes too close to a crawfish in a stream, it too abruptly ceases whatever it is doing and rears up in a defensive posture, with its claws extended upward and outward. Attention capture in stink bugs and crawfish is manifested in the whole body, often as an adaptive response to some environmental perturbation or change.
Strong capture. Change detection is often a strong form of attention capture, and it probably emerged very early in the evolution of life on earth. It likely arose even before the arrival of stink bugs and crawfish, perhaps even before vision and audition, and certainly before bipedalism and frontal lobes.¹³ The early onset of strong attention capture in biological evolution no doubt reflects its adaptive value. Organisms that failed to detect and respond appropriately to change in their habitats were less likely to survive. Indeed, complex organisms and ecosystems would probably not have evolved at all if attention capture had not emerged very early in the history of life. The original role of attention capture was no doubt to keep organisms sensitive to environmental events such as a rustle in the bushes, a novel odor wafting in the breeze, or a shriek in the night. Such events signal potential biologically imperative intrusions into the organism's habitat, such as environmental perturbations (e.g., storms, fires, and floods) and predators, prey, and mates.
The attention-capturing power of abrupt, odd, and unexpected stimuli for our human subjects today probably had its origin at least as early as the Cambrian explosion of multicellular life forms half a billion years ago. It may even have originated in the microcosmic world of bacteria some three billion years earlier.
Weak capture. Weak attention capture can also be adaptive and no doubt also emerged early in the evolution of life, though perhaps somewhat later than strong capture. Some objects draw attention even when they are nonimperative and do not represent a dramatic change in the environment, and this too can affect survival. A possible example is mate selection in primates and other organisms.¹⁴ Consider an adult male passively monitoring his environment. His attention might be more prone to capture by a group of females than by a grove of trees. It might be especially prone to capture by a female in the group whose morphology is indicative of pubescence, health, and child-bearing capability. Likewise, the attention of a female might be prone to capture by a male whose morphology and other characteristics suggest that he would be an excellent protector and provider of resources. Males and females whose attention is weakly captured by such signs of reproductive fitness are more likely to mate and pass these physical and attentional traits on to their offspring.
Attention capture is an ecological process
The above examples of attention capture in stink bugs and crawfish, and of mate selection in primates, share an interesting aspect: the capture can be reciprocal and contagious. The crawfish and the observer might capture each other's attention. The attentional responses of the observer might in turn capture the attention of a third party of a different species and lead it to notice the crawfish. The result is a three-species ensemble of attention capture, a local and transitory ecological web of attentional relationships. Similarly, potential mates can capture each other's attention, and this mutual attraction of attention might capture the attention of a competitor. These examples illustrate another sense in which attention capture is not just a phenomenon of brain activity; it is a phenomenon of whole, dynamical ecosystems. We suggest that ecosystems evolve and flourish in part because of the mutual capturing of attention among many of their participant organisms. Attention capture may very well help to form and govern the complex webs of behavioral relationships that define particular ecosystems and render them viable. Thus, attention capture might be a vital feature of whole, thriving ecosystems, one that keeps them within an optimal zone between order and chaos.
Attention capture is contextual
Attention capture appears to be contextually dependent, and this is especially evident in naturalistic situations. What captures the attention of female lions depends on whether they are hungry or in heat. Even the attention-capturing power of sudden onsets varies with context (e.g., Folk, Remington, & J. Johnston, 1993). In hiking through a glen on a sunlit day, one may encounter many sudden onsets in the form of shafts of sunlight breaking through small gaps in the dense canopy of branches and leaves. At first these sudden onsets might capture one's attention. Probably owing to their repetitiveness, however, their attention-capturing power soon diminishes. This diminution of attention capture by repeated events might itself be an adaptive feature of the attentional processes of organisms. Indeed, many contextual variations in attention capture are likely to be adaptive and may have emerged very early in the course of biological evolution. Organisms are more likely to survive if what captures their attention is contextually appropriate. However, not every such variation is adaptive. Below, in connection with cell phones and automobile driving, we discuss a context in which the attention-capturing power of sudden onsets is weakened, and this weakening is probably maladaptive.
Attention capture co-evolves
The ecology of attention capture may play an important role in the coevolution of species. This is especially evident in the evolutionary "arms races" between predators and prey, such as cheetahs and gazelles or birds and moths (e.g., Dawkins, 1986). The predator whose attention is most readily captured by its main prey is most likely to survive and pass on this trait to its descendants. The same is true of the prey whose attention is most readily captured by its main predator. Predators, prey, and other cohabitants of ecosystems may be locked into an evolutionary spiral of attention capture. The same spiral may apply to a number of other traits and processes (e.g., morphology and sensory processes). In general, the shifting attentional relationships in multispecies ecosystems may serve the evolutionary vitality of these systems.
Attention capture in human evolution
Weak and strong forms of attention capture no doubt played important roles in the early stages of human evolution. Our hominid ancestors first stood upright and ventured from the jungles onto the savannas. When they did, they must have confronted new biologically important inputs or, at least, new perspectives on these inputs. For example, they could perceive potential predators, prey, and mates at greater distances than they could in the dense jungles. Over the first several millennia of their dual residence in jungles and savannas, biological evolution must have shaped attention capture in these early hominids to sensitize them appropriately to these new inputs. Whatever species-specific forms of weak and strong attention capture evolved in early humans, contextually variable change detection very likely remained an adaptive algorithm. It would have served them well even as they confronted new inputs in their broadened habitats. It is likely that attention capture continued to be fine-tuned in adaptive ways as human evolution passed through various attractor phases over the last four million or so years. This fine-tuning may have helped to give Homo habilis the competitive edge over Australopithecus, Homo erectus the competitive edge over Homo habilis, and so on. However, we suggest that attention capture in humans underwent a much more dramatic phase transition, or series of transitions, only very recently in our evolutionary history. We further suggest that this transition is attributable more to cultural evolution than to biological evolution.
Attention Capture and Cultural Evolution
In this section, we first outline cultural evolution and how it has transformed the human mind and human behavior in general. We then examine how it may have transformed human attention in particular.
Cultural evolution
Upper Paleolithic and Neolithic revolutions. As Diamond (1992, 1997) points out, for most of its history human evolution was unspectacular, with all of its phase transitions up to and including the first anatomically modern humans. The brains of our anatomically modern ancestors 100,000 years ago were very similar, if not identical, to our brains today. Yet their behavior barely distinguished them from the earliest, smaller-brained protohumans and their primate cousins. At least 99% of human evolution passed before significant changes in human behavior occurred, and these changes are attributable more to cultural evolution than to biological evolution. We suggest that these profound effects of cultural evolution on human behavior have been mediated, in part, by various cognitive processes, including attentional capture.¹⁵ The first step in this profound change was the Upper Paleolithic revolution, which began around 40,000 years ago. Among other things, it was characterized by (1) a rapid diversification of human artifacts, including a variety of specialized tools and weapons, body ornaments, and pottery, and (2) the emergence of language and self-reflective, symbolic thought, as evidenced in part by cave drawings. The next big step was the Neolithic revolution, the transition from a hunter-gatherer to an agrarian lifestyle. It occurred around 8,000 years ago in the region of the Near East (i.e., Mesopotamia) known as the Fertile Crescent.
Third nature and the institutional order. Linguistic, symbolically thinking humans eventually began to settle down into villages. Once they did, a whole new web of relationships with powerful emergent properties began to self-organize and complexify. This web has had profound effects on human life and, indeed, on the whole planet.
This web may even define a qualitatively new form of nature, a "third nature":

- First nature is material; it burst into existence with the big bang some 12 billion years ago.
- Second nature is biological; it began, on earth at least, with the first bacteria some 4 billion years ago.
- Third nature is ideological, cultural, and institutional. It began with the Upper Paleolithic and Neolithic revolutions in human minds and lifestyles, between 40 and 8 thousand years ago. It then self-organized into the vast web of relationships defined, in part, by what has been called the institutional order (Turner, 1997).

The institutional order includes, among other dynamical systems, politics, law, religion, the press, industry, the economy, technology, science, and education. Prior to the Neolithic revolution, all of the functions of what was to become the institutional order were performed within kinship-based clans. The transition to village living, however, required that these various functions be separated out of the clans. They moved into what was to become an immensely powerful and self-perpetuating institutional order. The history of third nature is especially evident in the history of Western civilization. The three natures are now co-evolving, and third nature is feeding back on first and second natures, often in ways that put the planet in jeopardy. To a nontrivial degree, third nature has literally re-sculpted first and second natures. It has extinguished and displaced many populations and species of organisms and replaced much of the fractal geometry of first and second nature with the rectilinear structures of third nature. Humans, especially in industrialized nations, constitute the medium through which third nature lives and wields its power. We suggest that third nature has infiltrated the human mind and controls human behavior in ways that serve the institutional order. Human minds spawned third nature and continue to serve as its "survival machine," and they are now affected, and potentially victimized, by it. Third nature may be considered an emergent property of human culture, a form of collective intelligence like that evidenced by ant and bee colonies (e.g., Franks, 1989).¹⁶
Memetic evolution. The basic units of first nature might be atoms or subatomic particles, those of second nature genes, and those of third nature memes (e.g., Dawkins, 1976).
Memes are concepts and belief systems, and they can be manifested in the various physical artifacts and technologies spawned by the institutional order (e.g., money, television, automobiles, and computers). Among the original memes that fueled the growth of the institutional order are three ideas: progress, control over and separation from the rest of nature, and human superiority to the rest of nature (e.g., Nisbet, 1994). The evolution of third nature is based on memetic evolution, and the human mind is the medium through which this evolution occurs. Third nature has restructured our environments, filling them with buildings, pavement, automobiles, and countless other artifacts. It may also have altered our minds in equally profound ways.
Transition from generalist to specialist. Prior to the birth of third nature, individual humans were generalists, a trait that may account for their success at surviving especially turbulent times on the earth (e.g., Potts, 1996). There was a gender-based division of labor between hunters and gatherers, but virtually all clan members possessed all of the skills needed to survive. With the evolution of third nature, individual humans became increasingly specialized. We have been recruited into particular niches, attractors, and areas of expertise within the institutional order. An adult Australopithecus africanus would be at a loss if suddenly thrust into the modern, third-nature environments of Western civilization, ill-equipped for any of its specializations. Most modern human adults would be similarly at a loss outside of third nature. They lack the general skills needed to survive for long in the first- and second-nature ecosystems within which our distant ancestors thrived.¹⁷ The transformation into specialists may have moved modern humans, at least those in industrialized nations, out of the optimal zone between order and chaos. It may also have rendered them more susceptible to the costs of expertise.

Third nature itself is dynamic and plastic, always moving and adjusting to effects that ripple across the institutional web. Religions change, the U.S. Constitution changes, governments change, national boundaries change, technology changes, and specializations (e.g., issues, theories, and methodologies in cognitive psychology) change. Yet there is a certain dynamic stasis in all of this flux: the institutional order as a whole is alive and well. As third nature evolves, the landscape of human skills, specializations, and belief systems changes, leaving some human experts to stagnate in obsolete attractors. Obsolete attractors and the people who fill them are replaced by new attractors filled by new, usually younger and more flexible, individuals. Third nature may therefore remain in the adaptive zone between order and chaos even if the human minds on which it once depended are left marooned in obsolete attractors. Those minds may be too rigid to keep pace as the landscape of institutional attractors evolves.
Just as the patterns of relationships defining our bodies remain alive and dynamic even as the cells composing them are constantly replaced, so the institutional order remains alive and dynamic even as the human minds on which it depends are constantly replaced.
Attentional effects of cultural evolution
We suggest that attention capture in humans has been both exploited and transformed by the evolution of third nature. Third nature has co-opted attentional capture. Clearly, our attention continues to be strongly captured by the same kinds of input that captured attention in our Paleolithic ancestors and the protohumans before them (e.g., sudden onsets and singletons of various sorts). However, both the sources and the survival value of these capturing events have changed dramatically with the rise of third nature. In lieu of a rustle in the bushes or a screech in the night, what captures our attention today is more likely to be a wailing siren, rap music booming out of a passing automobile, strident admonishments from social agents such as parents and teachers, a letter from a journal editor, funding agency, or department head, or flashing, beeping, colorful advertisements or signs on television, magazine covers, or marquees on casinos, movie theaters, and storefronts. The material objects that
Johnston and Strayer
capture our attention are more likely to be constructions of third nature than of second nature, such as computers, television, tabloids, fashion magazines, ice-cream wagons, cell phones, and alarm clocks in lieu of wild plants and animals. And what is served by attention capture in modern humans? It is less likely our survival in first and second natures that is served than our survival in third nature, and it may be less our survival that is served than the survival of the memes and institutions comprising third nature itself. Indeed, in some cases, the exploitation of our attentional processes by third nature does us a disservice, as exemplified by the flow of carcinogens, ulcers, anorexia, and cardiovascular illnesses through our population, by our fixation on physical attractiveness, wealth, and social status, and by the steadily increasing proportion of our time that we spend interacting with and relating to the machines and other artifacts of third nature in lieu of other people and the dwindling ecosystems of second nature. Biologically evolved attention capture served our ancestors well for hundreds of thousands of years, just as it has served all other species for hundreds of millions of years. Now these same processes have been co-opted by a fledgling third nature for its own self-perpetuating purposes, just as other biologically evolved processes, such as mate selection and the fulfillment of basic needs, have been. 18 Third nature has transformed attentional capture. We suggest that the evolution of third nature has effected a shift in the orientation of attention away from primarily external stimuli toward internal ones. We tend to be preoccupied with the memes with which third nature has infiltrated our minds. This internal, memetically driven attention may be another vehicle by which the institutional order keeps us in its service; it tunes us away from the external inputs of first and second natures and into the internalized memes of third nature.
Because of this external-to-internal shift, our attention may be less likely to be captured by the same sorts of external input that captured the attention of our pre-linguistic hunter-gatherer ancestors and that continue to capture the attention of our close primate cousins. Now when we engage in our routine activities at home and at work, stroll through a park, or even hike through a forest, we are likely to spend more time absorbed in some form of memetically controlled reverie than attending to external stimuli. Of course, the degree of attention to internal sources is likely to vary with context. For example, if our stroll through a city park happened to take us accidentally into a gang-infested neighborhood, our attention would very likely shift toward external sources. 19 Even the strong attention-capturing power of sudden onsets might be vitiated by internally directed attention, another example of contextually variable attention capture. Some suggestive evidence for this comes from our own research on the use of cell phones while performing a simulated driving task (Strayer & W. Johnston, in press). When our subjects were deeply involved in cell-phone conversations or difficult mental tasks, they not only performed the driving task more poorly but were more than twice as likely to miss occasional sudden onsets of light at the center of fixation as when they were not on the cell phone. One might argue that these sudden onsets of light did capture attention but that responses to them were suppressed. However, in more recent studies, we examined implicit, perceptual memory for words that were incidentally flashed at fixation and called for no responses. Implicit memory was reliably stronger when subjects were not engaged in a cell-phone conversation than when they were so engaged. This apparent reduction in the attention-capturing power of sudden onsets of words cannot be attributed to response suppression, since no responses to them were required. We acknowledge that this evidence is only suggestive, because attending to a cell-phone conversation entails a degree of external attention as well as internal attention. However, it bears some similarity to internal reverie in the sense that subjects are mentally engrossed in a meme-based dialog that bears no relationship to the primary visual-motor task and the proximal environment. Like many of our meme-based reveries, cell-phone conversations can pull our attention away from our current external-environmental context and into an internal, memetically based context. 20
Social Implications of Attention Capture
In addition to suggesting issues and aspects of attention capture not often encountered in the contemporary literature, the perspective on attention capture offered here has the advantage of bearing on some of the large-scale "social" problems and issues facing humanity and the planet as a whole. Our thesis is that cultural evolution has produced a vast institutional order, a third nature, that controls our behavior in self-perpetuating ways that serve its own survival and growth. This control stems in part from memetically driven attention capture. Our attention still could be, but rarely is, captured by the same first- and second-nature inputs to which millions of years of biological evolution have tuned us (e.g., a rustle in the bushes, a movement through the grass, or an animal cry in the night), in part because first and second natures have themselves been altered and shaped by third nature (e.g., they have been replaced by buildings, highways, domesticated plants and animals, and all of the technological artifacts with which we are constantly surrounded). Our biologically evolved attentional processes have been co-opted and exploited by third nature such that our attention is directed both externally and internally toward memes that encourage us to conform to and satisfy the needs and goals of third nature. Our attention and other cognitive processes are the "survival machines" of third nature, the means by which it perpetuates itself. Unfortunately, it has become painfully evident that third nature has altered the planet in ways that put it in peril. The rise of third nature, guided by the original memes of progress and of human superiority to, conquest of, and control over nature, has led to such serious problems as global warming, decimation of species, depletion of resources, degradation of whole ecosystems, overpopulation of the planet by humans, and, very possibly, an impoverishment of the modern human mind (e.g., Diamond, 1992, 1997; Safina, 1995; Smil, 1997). Because we are the instruments or survival machines of third nature, we are co-conspirators, witting or unwitting, in the degradation of the planet and, ultimately, of ourselves. It may not be too late for us to fight back and try to regain control over third nature, forcing it through phase transitions that place it more in harmony with first and second natures. To do this, we would ourselves need to undergo phase transitions in our minds and behaviors. One step in this direction is to stop relying so heavily on reductionistic methodological and theoretical paradigms in our scientific systems and to begin putting ourselves and our three natures back together again by assessing the dynamical webs of relationships in which we are all embedded. As individuals, we might try resurrecting the Native American tradition of tuning into first and second natures, to people and natural ecosystems in lieu of third-nature machines and memes, reconnecting to the planet and realizing that we are not superior to it but a vital and powerful facet of it. We certainly cannot exorcise our third nature and return to a hunter/gatherer lifestyle, but we may be able to push for a planet-friendly third nature and adopt more planet-friendly lifestyles. As investigators of attention capture and other cognitive and neural phenomena, we might concentrate on how to conceptualize and study these phenomena within the three natural webs (i.e., the three natures) from which they evolved and to which they relate. Like most natural systems, the third-nature system of psychology has been self-organizing and complexifying for over a century.
Since the "cognitive revolution" of around 1960, cognitive psychology has itself been carved up into an ever-increasing number of relatively isolated specializations, each with its own jargon and methodologies. The mind has essentially been broken down into a multitude of parts. Our primary goal in this chapter has been to suggest how a phase transition to a holistic, dynamical-systems approach might serve to put the mind and body back together again with the natural webs with which they have coevolved. Because these areas of psychology are dynamical systems, they will almost certainly undergo significant transformations in the coming years. If we students of attention want to keep pace with these transformations and not suffer serious costs of expertise, then we may be well advised to prepare ourselves for them. This may entail digging ourselves out of our professional attractors, moving closer to the edge of chaos, and beginning to consider, participate in, and help shape the inevitable transformations in our discipline.
Footnotes
1 We are grateful to Chip Folk and Brad Gibson for encouraging us to submit this rather radical perspective on attention capture and to Elizabeth Cashdan and Jim Dannemiller for providing comments on an earlier version of this chapter.
2 An ancillary goal of this chapter is to promote interest in these potential lines of research and theory.
3 We do not necessarily subscribe to the usual positivistic approach to and interpretation of cognitive psychology, or of any other discipline for that matter. Indeed, we suspect that much of what goes on in the name of the science of cognitive psychology is to some extent socially constructed. The received views of attention capture and the appropriate methodology for studying it may to some extent be artifacts of the cultural and technological contexts in which they arose and currently flourish (e.g., Shimp, 2001).
4 Nonetheless, many cognitive psychologists and other scholars continue not only to rely on the concept of consciousness but also to regard it as open to scientific investigation. For example, we have just received a "Call for Papers" for an international conference entitled Consciousness and its Place in Nature: Toward a Science of Consciousness (see also Dennett, 1991, and Jackendoff, 1987).
5 Admittedly, there may be no consensus on what distinguishes natural from unnatural contexts (e.g., environments and tasks). Indeed, any context that arises within the universe is, almost by definition, a natural one in the most general sense. However, we are using a more limited, natural-language meaning of naturalistic in this paper. To us, a naturalistic context for any species is one in which most members of that species are born, reared, survive, and reproduce. For our hunter/gatherer ancestors, hunting and gathering would be naturalistic. For modern humans, driving automobiles, cooking dinner, watching movies, reading books, listening to music, and working at their jobs would be considered naturalistic.
6 The breaking of organisms down into their parts, especially brain parts, is often done literally in research on animal cognition. In human research it is done by contriving situations that call into play only one or a few internal systems. Most contemporary studies of human attention effectively isolate vision and brain activity from the multi-sensory, multi-system interactions and interdependencies that characterize most naturalistic contexts. Naturalistic tasks engage multiple sensory modalities simultaneously; they engage the autonomic as well as the central nervous system, and the endocrine, immune, and various other systems as well as the nervous system.
7 One reviewer of this chapter expressed concern that placing attention capture in a complete evolutionary and cultural context might be desirable but is daunting and probably impractical. We agree that it would be impossible to trace all the strands of the vast web in which individual humans are embedded. However, we suggest that it is time at least to consider the web, to begin to venture outside the organism, and to examine some of the more obvious strands to which various cognitive processes are related and with which they have co-evolved.
8 Of course, ultimately all systems begin to reverse their self-organizing, complexifying course, deteriorate, and succumb to the second law of thermodynamics. But, like the phoenix, new stars arise from the ashes of old stars, new empires arise after the fall of old ones, and new life arises from the remains of old life.
9 We wonder whether contemporary theorists of human attention would apply the same ideas of attentional gates and intelligent, conscious homunculi to other organisms such as stink bugs and crawfish.
10 Examples of the costs of stability in the psychological literature are language (e.g., Werker, 1989), functional fixity, problem-solving set, and the self-fulfilling-prophecy characteristics of social stereotypes.
11 Examples of the stability/plasticity dilemma and the edge between order and chaos in real organisms and ecosystems are abundant in the ecological literature (e.g., Reice, 1994; Tilman, 1996; Vitousek, D'Antonio, Loope, & Westbrooks, 1996). For example, aquatic and forest systems that are too rarely perturbed by floods or fires tend to stagnate (e.g., Reice, 1994).
12 We have related attention capture to the stability/plasticity dilemma elsewhere (e.g., W. Johnston & Hawley, 1994; W. Johnston et al., 1996). W. Johnston & Hawley (1994) offered mismatch theory as a possible description of some of the computational dynamics from which simultaneous biases toward both expected and unexpected inputs arise. This theory does not appeal to homunculi and consciousness and is a possible example of how attention capture can be an emergent phenomenon of normal perceptual dynamics.
13 Whether or not some form of attention capture occurs in single-celled organisms and multicellular plants is moot, and we take no position on this issue here. However, we do suggest that attention capture occurs in most multicellular species of animals. Therefore, in this paper, the term organism refers to multicellular animals.
14 In at least some instances, as when a female animal is in heat, potential mates might strongly capture attention.
15 The fact that our brains may have remained virtually unchanged for at least 100,000 years while our behavior has changed dramatically argues against the current heavy reliance on reductionistic neural science to explain this behavioral change. One must venture outside the brain and into cultural history to understand how the same brains could mediate such dramatic behavioral differences.
16 One reviewer expressed a reluctance to accept the idea of a third nature, since second-nature human beings are involved in all of the institutions (e.g., politics) and ideologies (e.g., fascism) that we offer as examples of third nature. We agree that humans are involved in third-nature enterprises, but we suggest that there are emergent phenomena in these human-based collectives that are not found in their individual components. Atoms make up living cells, but there are important properties of cells that do not exist in atoms. Cells make up people, but there are emergent properties of people that are not manifested in their cells. People make up institutions, but there are important properties of institutions that do not exist in individual humans.
17 Of course, third nature of an embryonic sort no doubt existed even in our pre-linguistic hunter-gatherer ancestors. It was embedded in the clan culture, its rituals, mores, beliefs, and other collective attributes. Indeed, relatively simple forms of third nature, including culture and technology, can be found in other organisms, notably chimpanzees (e.g., Whiten, 2001).
18 Third nature constantly informs us of what we should wear, how we should look, and what we should consume. Our obedience to these instructions serves clothing designers and retailers, manufacturers of exercise equipment, pharmaceutical companies, food and beverage makers, and virtually the entire institutional order. The value of all of this to us, as biological organisms, is dubious.
19 We are grateful to Elizabeth Cashdan of the Department of Anthropology at the University of Utah for suggesting this example of the contextual variability of internal attention.
20 We are currently planning a variant of our cell-phone research in which subjects are probed with sudden onsets both in time blocks when they are preparing for an ensuing debate (e.g., pro- vs. anti-abortion) and in time blocks when they are reading a position on the issue. We expect to find that the attention-capturing power of the sudden onsets is weaker in the former condition than in the latter.
References
Dawkins, R. (1976). The selfish gene. Oxford: Oxford University Press.
Dawkins, R. (1986). The blind watchmaker. New York: W. W. Norton.
Dennett, D. C. (1991). Consciousness explained. Boston: Little Brown.
Diamond, J. (1992). The third chimpanzee. New York: Harper-Collins.
Diamond, J. (1997). Guns, germs, and steel. New York: W. W. Norton.
Folk, C. L., Remington, R. W., & Johnston, J. C. (1993). Contingent attention capture: A reply to Yantis (1993). Journal of Experimental Psychology: Human Perception and Performance, 19, 682-685.
Franks, N. R. (March-April, 1989). Army ants: A collective intelligence. American Scientist, 77, 134-145.
Gleick, J. (1987). Chaos. New York: Viking Press.
Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11, 23-63.
Jackendoff, R. (1987). Consciousness and the computational mind. Cambridge, MA: MIT Press.
James, W. (1890/1950). The Principles of Psychology. New York: Dover.
Johnston, W. A., & Dark, V. J. (1986). Selective attention. Annual Review of Psychology, 37, 43-75.
Johnston, W. A., & Hawley, K. J. (1994). Perceptual inhibition: The key that opens closed minds. Psychonomic Bulletin & Review, 1, 56-72.
Johnston, W. A., Hawley, K. J., Plewe, S. H., Elliott, J. M. G., & DeWitt, M. J. (1990). Attention capture by novel stimuli. Journal of Experimental Psychology: General, 119, 397-411.
Johnston, W. A., Strayer, D. L., & Vecera, S. P. (1998). Broadmindedness and perceptual flexibility: Lessons from dynamic ecosystems. In J. S. Jordan (Ed.), Systems Theories and A Priori Aspects of Perception. Amsterdam: Elsevier.
Kauffman, S. A. (1993). The Origins of Order. New York: Oxford University Press.
Kuhn, T. S. (1970). The Structure of Scientific Revolutions (2nd ed.). Chicago: University of Chicago Press.
LaBerge, D. (1975). Acquisition of automatic processing in perceptual and associative learning. In P. M. A. Rabbitt & S. Dornic (Eds.), Attention and Performance, Vol. 5 (pp. 50-64). New York: Academic Press.
Lewin, R. (1992). Complexity: Life at the Edge of Chaos. New York: Macmillan.
Näätänen, R. (1992). Attention and Brain Function. Hillsdale, NJ: Erlbaum.
Nisbet, R. (1994). History of the Idea of Progress. New Brunswick, NJ: Transaction.
Posner, M. I., & Snyder, C. R. R. (1975). Facilitation and inhibition in the processing of signals. In P. M. A. Rabbitt & S. Dornic (Eds.), Attention and Performance, Vol. 5 (pp. 669-682). New York: Academic Press.
Potts, R. (1996). Humanity's Descent. New York: William Morrow.
Prigogine, I., & Stengers, I. (1984). Order out of Chaos. New York: Bantam.
Reice, S. R. (Sept.-Oct., 1994). Nonequilibrium determinants of biological community structure. American Scientist, 82, 424-435.
Safina, C. (Nov., 1995). The world's imperiled fish. Scientific American, 46-53.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127-190.
Shimp, C. P. (2001). Behavior as a social construction. Behavioural Processes, 54, 11-33.
Smil, V. (July, 1997). Global population and the nitrogen cycle. Scientific American, 76-81.
Sokolov, E. N. (1963). Higher nervous functions: The orienting reflex. Annual Review of Physiology, 25, 545-580.
Strayer, D. L., & Johnston, W. A. (in press). Driven to distraction: Dual-task studies of driving and conversing on a cellular phone. Psychological Science.
Tilman, D. (1996). The benefits of natural disasters. Science, 273, 1518.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97-136.
Turner, J. H. (1997). The Institutional Order. New York: Addison Wesley.
Vitousek, P. M., D'Antonio, C. M., Loope, L. L., & Westbrooks, R. (Sept.-Oct., 1996). Biological invasions as global environmental change. American Scientist, 84, 468-478.
Werker, J. F. (Jan.-Feb., 1989). Becoming a native listener. American Scientist, 77, 54-59.
Whiten, A., & Boesch, C. (Jan., 2001). The culture of chimpanzees. Scientific American, 60-67.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202-238.
Yantis, S. (1993). Stimulus-driven attentional capture and attentional control settings. Journal of Experimental Psychology: Human Perception and Performance, 19, 676-681.
Subject Index
abrupt onsets, 194, 107, 111, 128, 134, 135, 136, 137, 138, 139, 154, 155, 158, 159, 191, 195, 196, 206, 208, 209, 210, 212, 213, 220, 223, 298, 300, 301, 302, 303, 304, 344
additional singleton paradigm, 134, 139, 155
ADHD, 361
adjacent response filter (Adjar), 8, 11, 18
anterior attentional system, 328, 331, 332, 334, 343
anticipatory attending, 209, 210, 211, 212, 213, 216, 220, 222, 223
anti-saccade task, 356, 357, 360, 364
attentional blink (AB), 53, 75, 93, 102, 103, 104, 106, 108, 116, 186, 206, 224
attentional control, 30, 51, 55, 57, 70, 72, 92, 94, 107, 108, 111, 112, 121, 127, 293, 295, 296, 298, 302, 305, 306, 310, 311, 312, 325, 326, 335, 337, 339, 340, 343, 344, 349, 350, 354, 361, 364, 365, 366, 367, 368
Attentional Control scale, 335, 336, 343
attentional dwell time, 178
attentional focus, 196, 209, 210, 211, 212, 213, 215, 220, 223, 224
attentional pace, 194, 200, 201, 206, 209, 210, 221
attentional pulse, 212, 213, 215, 223
attentional set, 55, 70, 93, 111, 124, 128, 135, 141, 154, 155, 156, 167, 158, 163, 164, 165, 167, 168, 169, 170, 196, 206, 223, 268, 296, 297, 298
attractors, 381, 382, 383, 389, 392
auditory attention, 191, 209, 233, 237, 239, 241, 243, 245, 247, 251
auditory capture, 232, 237, 244, 246, 256
automaticity, 9, 15, 17, 21, 53, 78, 90, 153, 154, 179, 180, 185, 186, 187, 193, 195, 196, 202, 203, 204, 205, 208, 212, 214, 232, 306, 307, 308, 311, 334, 344, 356
awareness, 52, 60, 61, 62, 65, 66, 68, 72, 143, 144, 151, 152, 153, 159, 160, 161, 162, 164, 165, 166, 167, 168, 169, 170, 354, 376, 377
biological evolution, 375, 381, 384, 386, 387, 391
butterfly effects, 380
central cues, 153, 158, 159, 299, 300, 334
classical conditioning, 179, 184, 185, 187
co-evolution, 386
cognitive load, 7, 153, 196
conscious detection, 53, 60, 61, 62, 66, 68, 71, 74, 154, 159, 162, 166
consciousness, 151, 152, 372, 378, 381, 382, 383, 384, 437, 440
contingent involuntary orienting (CIO), 59, 78, 88, 93, 97, 98, 100, 102, 107, 108, 116, 128, 129, 131, 135, 142, 156, 157, 163, 206, 298
continuous performance, 354, 361
covert orienting, 29, 45, 153, 232, 265
crossmodal attention, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 243, 244, 245, 246, 248, 251, 252, 254, 255, 256
cued search, 192, 195
cue-saccade task, 33, 34, 40
cue-target paradigm, 28, 30, 197, 198, 202, 203
cultural evolution, 375, 380, 381, 387, 389, 391
default setting, 90
deterministic chaos, 380, 381
dichotic listening, 327, 350, 354, 355, 356, 358, 359, 360
difference signals, 51, 155
disengagement, 29, 101, 129, 131, 328, 334, 336, 337, 342, 343
divided attention, 53, 59, 72, 73, 312
dorsolateral prefrontal cortex (DLPFC), 304, 367
dual task, 52, 69, 72, 103, 104, 105, 106, 107, 303
dynamic attending, 191, 208, 209, 211, 213, 214, 215, 217, 218, 220, 222, 223
dynamical systems, 224, 375, 379, 388, 392
early selection, 6, 7, 9, 38
ecology of attention capture, 384, 386
entrainment, 210, 211, 212, 215, 216, 223
event-related potential (ERP), 3, 4, 5, 6, 9, 11, 12, 13, 15, 19, 20, 21, 27, 29, 244, 267
executive control, 293, 295, 296, 312
exogenous cue, 192, 195, 196, 199, 203, 204, 208
expectancy profile, 211, 215, 216, 218, 219, 220, 221, 223
expectations, 55, 152, 154, 160, 162, 163, 165, 167, 169, 170, 296, 305
explicit attention capture, 151, 159, 169
extrastriate cortex, 6, 20, 21, 243, 367
extraversion, 325, 326, 330, 332, 333, 334, 335, 339, 340, 342
feature search mode, 111, 139, 140
filtering cost, 106, 113, 125, 127, 132, 142
first nature, 388
flanker effect, 309
focused attention, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 65, 66, 68, 69, 70, 71, 72, 73, 109, 116, 151, 169, 293, 308, 309
frontal lobe hypothesis, 295
Gaussian noise, 273
goal maintenance, 311, 359, 368
goal-directed selection, 57, 93, 96, 97, 157, 121, 134, 137, 296, 297, 299, 302, 304
guided search, 29, 52, 53, 54, 70
habituation, 181, 184, 185
hemifield comparison model, 283, 284, 286, 287
implicit attention capture, 151, 152, 153, 159, 168, 169
implicit spatial discrimination, 235, 243, 245, 247, 251
impulsivity, 343, 346
inattentional blindness, 52, 53, 60, 62, 65, 143, 151, 152, 161, 162, 166, 170
individual differences, 156, 305, 325, 326, 329, 331, 332, 334, 336, 342, 343, 349, 350, 352, 353, 354, 356, 360, 364, 366
inhibition, 11, 17, 20, 21, 27, 28, 29, 30, 37, 39, 40, 41, 42, 43, 127, 132, 133, 134, 137, 157, 177, 178, 179, 180, 184, 185, 186, 187, 265, 294, 311, 330, 331, 335, 338, 342, 343, 358
inhibition of return (IOR), 11, 17, 28, 29, 42, 43, 127, 137, 265
inhibitory surrounds, 119, 132
institutional order, 387, 388, 389, 390, 391, 395
integrated hazard function, 142
internal attention, 376, 391, 395
intramodal attention, 243, 249, 255, 256
involuntary orienting, 77, 78, 79, 84, 90, 113, 127, 156
late selection, 6, 7, 9, 365
maximum response model, 272, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 286, 287
memes, 388, 390, 391, 392
memetic evolution, 388
motion singletons, 56, 155
motivational valence, 325, 326, 333
negative priming, 294
Neolithic revolution, 387
neuroimaging, 6, 27, 312
neuronal processing, 6
neuronal stimulation, 40, 41, 42, 121, 286
neuroticism, 330, 332, 333
NP80 component, 5, 6
occipital, 6, 12, 14
oculomotor capture, 122, 134, 135, 137, 139, 141, 144, 304
oculomotor programming, 28, 37
oculomotor-IOR paradigm, 32
orthogonal cuing, 242, 248, 255
oscillator, 210, 211, 212, 218, 223, 224
overt orienting, 236, 244, 252, 267
P1 component, 5, 6, 12, 13, 14, 18, 19, 21, 243
P300 component, 5, 13, 15, 16, 17, 18, 19, 21
parallel processing, 126, 127, 139, 376
pattern-directed attending, 198, 200, 201, 202, 208, 212
perceptual cycle, 152, 160, 161, 162, 166, 167, 168, 169
perceptual load, 7, 126
peripheral cues, 20, 28, 30, 35, 36, 159, 153, 154, 157, 158, 232, 234, 299, 300, 333, 334
phase transitions, 381, 383, 386, 387, 392
pitch relationships, 208
posterior attentional system, 328
preattentive processes, 51, 52, 53, 54, 55, 57, 59, 124, 127, 131, 139
prefrontal cortex, 37, 367
pre-pulse inhibition, 177, 178, 179, 181, 182, 183, 184, 185, 186, 187
priming, 34, 38, 113, 237, 244, 294, 295
probabilistic association, 198, 200
probe detection task, 131
rapid serial visual presentation (RSVP), 29, 44, 52, 93, 102, 103, 104, 105, 106, 107, 108, 109, 111, 112, 113, 116
reactive attending, 209, 210, 211, 212, 220, 222, 223
reductionism, 377, 378, 379
reflexive attention, 7, 8, 9, 12, 13, 15, 17, 18, 19, 20, 21, 22, 41, 210
reflexive shifts, 166, 167, 212, 241
rhythm, 198, 200, 201, 202, 206, 207, 209, 210, 211, 213, 214, 215, 216, 217, 218, 219, 220, 221
saccades, 28, 34, 35, 38, 40, 41, 42, 43, 90, 122, 135, 136, 137, 283, 284, 290, 301, 302, 304, 305, 357
saccadic reaction time (SRT), 33, 34, 35, 36, 40, 43, 303
salience, 29, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 65, 66, 67, 68, 69, 70, 71, 72, 93, 97, 100, 102, 123, 124, 127, 131, 134, 139, 155, 159, 205, 208, 267, 271, 273, 214, 275, 277, 282, 284, 286, 287
salience map, 124
saturation, 270, 271, 273, 277
schemas, 160, 165, 169
second nature, 388, 390, 391, 392
selective attention, 6, 63, 89, 294, 349, 365, 366, 367
selective looking, 152, 161, 162
sensory gating, 12, 178
sequence monitoring, 194, 196, 198, 200, 203, 205, 206, 207, 208, 210, 213, 221
serial search, 29, 138, 224
short-term memory, 350
signal analysis, 8
signal detection, 244, 269, 270, 271, 274, 287
singletons, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 71, 72, 73, 96, 97, 98, 100, 101, 102, 103, 104, 105, 107, 111, 112, 113, 114, 115, 116, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 139, 140, 141, 142, 143, 155, 156, 160, 194, 195, 203, 205, 206, 208, 209, 223, 267, 269, 271, 273, 274, 277, 279, 299, 304
singleton detection mode, 96, 97, 100, 102, 111, 112, 116, 140
slowing, 20, 293, 301, 309
spatial attention, 6, 12, 107, 108, 113, 124, 125, 126, 127, 128, 132, 133, 135, 137, 138, 232, 235, 239, 242, 250, 251, 254, 287, 299, 310, 311
spatial capture, 113, 115
spatial filtering, 310
spatial orienting, 29, 152, 232, 250, 252, 333, 341
spatial relevance, 249, 256
startle, 177, 178, 179, 182, 183, 184, 185, 187
sticky fixation, 266
stimulus-driven selection, 55, 56, 121, 122, 124, 131, 134, 140, 154, 157, 191, 195, 202, 203, 207, 209, 210, 211, 214, 222, 232, 296, 297, 298, 299, 304, 305
striate cortex, 6
strong capture, 384
Stroop, 105, 294, 308, 309, 311, 327, 350, 354, 358, 359, 360
superior colliculus (SC), 30, 35, 37, 38, 39, 40, 41, 42, 231, 249, 251, 252, 304, 328
supramodal attention, 250, 251, 256
sustained attention, 151, 158, 159, 162, 166, 167, 168, 169
sustained inattention blindness, 162, 164
synchrony principle, 209
temperament, 326, 329, 330, 332, 338, 342
temporal capture, 208, 212, 220, 223
third nature, 387, 388, 389, 390, 391, 392, 394, 395
trait anxiety, 326, 333, 336, 342, 344
transient attention, 158, 161, 167
uncued search, 193, 194, 195, 197, 200, 221
upper-Paleolithic revolution, 387
visual search, 51, 52, 54, 55, 57, 58, 59, 60, 62, 64, 65, 66, 69, 70, 71, 72, 93, 96, 101, 107, 122, 129, 138, 139, 154, 193, 194, 197, 199, 210, 265, 266, 268, 269, 281, 293, 305, 307, 308, 309, 311, 361
visual transients, 8, 9, 11, 15, 18, 84, 85, 88, 89, 90, 106, 152, 157, 158, 159, 160, 161, 166, 167, 169, 285, 286, 287
voluntary attention, 6, 7, 8, 9, 17, 18, 71, 156, 158, 160, 195, 200, 331, 338, 344
voluntary shifts, 167
working memory capacity, 305, 344, 351, 365, 368