Unconscious Memory Representations in Perception
Advances in Consciousness Research (AiCR) Provides a forum for scholars from different scientific disciplines and fields of knowledge who study consciousness in its multifaceted aspects. Thus the Series includes (but is not limited to) the various areas of cognitive science, including cognitive psychology, brain science, philosophy and linguistics. The orientation of the series is toward developing new interdisciplinary and integrative approaches for the investigation, description and theory of consciousness, as well as the practical consequences of this research for the individual in society. From 1999 the Series consists of two subseries that cover the most important types of contributions to consciousness studies: Series A: Theory and Method. Contributions to the development of theory and method in the study of consciousness; Series B: Research in Progress. Experimental, descriptive and clinical research in consciousness. This book is a contribution to Series B.
Editor Maxim I. Stamenov
Bulgarian Academy of Sciences
Editorial Board
David J. Chalmers, Australian National University
Steven Laureys, University of Liège, Belgium
Axel Cleeremans, Université Libre de Bruxelles
George Mandler, University of California at San Diego
Gordon G. Globus, University of California Irvine
John R. Searle, University of California at Berkeley
Christof Koch, California Institute of Technology
Petra Stoerig, Universität Düsseldorf
Stephen M. Kosslyn, Harvard University
Volume 78 Unconscious Memory Representations in Perception. Processes and mechanisms in the brain Edited by István Czigler and István Winkler
Unconscious Memory Representations in Perception Processes and mechanisms in the brain Edited by
István Czigler Institute for Psychology, Hungarian Academy of Sciences/ Debrecen University, Hungary
István Winkler Institute for Psychology, Hungarian Academy of Sciences/ University of Szeged, Hungary
John Benjamins Publishing Company Amsterdam / Philadelphia
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ansi z39.48-1984.
Library of Congress Cataloging-in-Publication Data Unconscious memory representations in perception : processes and mechanisms in the brain / edited by István Czigler, István Winkler. p. cm. (Advances in Consciousness Research, issn 1381-589X ; v. 78) Includes bibliographical references and index. 1. Implicit memory. I. Czigler, István. II. Winkler, István. BF378.I55U53 2010 153.1’3--dc22 2009053346 isbn 978 90 272 5214 2 (Hb ; alk. paper) isbn 978 90 272 8835 6 (Eb)
© 2010 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Co. · P.O. Box 36224 · 1020 me Amsterdam · The Netherlands John Benjamins North America · P.O. Box 27519 · Philadelphia pa 19118-0519 · usa
Table of contents
Contributors
Preface
1. Conscious and unconscious aspects of working memory
   Amanda L. Gilchrist and Nelson Cowan
2. Markers of awareness? EEG potentials evoked by faint and masked events, with special reference to the “attentional blink”
   Rolf Verleger
3. In search for auditory object representations
   István Winkler
4. Representation of regularities in visual stimulation: Event-related potentials reveal the automatic acquisition
   István Czigler
5. Auditory learning in the developing brain
   Minna Huotilainen and Tuomas Teinonen
6. Neurocomputational models of perceptual organization
   Susan L. Denham, Salvador Dura-Bernal, Martin Coath and Emili Balaguer-Ballester
7. Are you listening? Language outside the focus of attention
   Yury Shtyrov and Friedemann Pulvermüller
8. Unconscious memory representations underlying music-syntactic processing and processing of auditory oddballs
   Stefan Koelsch
9. On the psychophysiology of aesthetics: Automatic and controlled processes of aesthetic appreciation
   Thomas Jacobsen
Appendix: Using electrophysiology to study unconscious memory representations
   Alexandra Bendixen
Index
Contributors
Alexandra Bendixen, Institute for Psychology I, University of Leipzig, Seeburgstr. 14-20, D-04103 Leipzig, Germany
Amanda L. Gilchrist and Nelson Cowan, Department of Psychological Sciences, 18 McAlester Hall, University of Missouri, Columbia, MO 65211, USA
István Czigler, Institute for Psychology, Hungarian Academy of Sciences, 1394 Budapest, P. O. Box 398, Hungary
Susan L. Denham, Centre for Theoretical and Computational Neuroscience, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK
Minna Huotilainen and Tuomas Teinonen, University of Helsinki, P.O. Box 9 (Siltavuorenpenger 20 D), FIN-00014 University of Helsinki
Thomas Jacobsen, Experimental Psychology Unit, Faculty of Humanities and Social Sciences, Helmut Schmidt University, University of the Federal Armed Forces Hamburg, Holstenhofweg 85, 22043 Hamburg, Germany
Stefan Koelsch, Department of Psychology, Pevensey Building, University of Sussex, Brighton, BN1 9QH, UK
Yury Shtyrov and Friedemann Pulvermüller, Medical Research Council, Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge, CB2 7EF, UK
Rolf Verleger, Department of Neurology, University of Lübeck, 23538 Lübeck, Germany
István Winkler, Institute for Psychology, Hungarian Academy of Sciences, H-1394 Budapest, P. O. Box 398, Hungary
Preface
The study of processes underlying conscious experience is a traditional topic of philosophy, psychology, and neuroscience. In contrast, until recently, the role of implicit (non-conscious) memory systems needed to establish veridical perception received much less discussion. With many new results and novel theoretical models emerging in recent years, it is time to take a look at the state of the art in psychology and neuroscience about the role non-conscious processes and memory representations play in perception. Although behavioral studies (e.g. priming) in cognitive psychology yielded indirect evidence about many of the characteristics of implicit memory representations, cognitive neuroscience and computational modeling can provide more direct insights into the structure and contents of these representations and into the various processes related to them. This book presents several different approaches to the study of implicit memory representations, using psychological, neuroscience, and computational modeling methods, assessing these representations in relatively simple situations, such as perceiving discrete auditory and visual stimulus events, as well as in high-level cognitive functions, such as speech and music perception and aesthetic experience. Each chapter reviews a different topic, its theoretical and major empirical issues. A large part of the results reviewed in this book were obtained through recording event-related brain potentials (ERP). This is because ERPs offer high temporal resolution, and as a consequence, they are sensitive indicators of the dynamics of human information processing. Non-expert readers are offered an appendix to help them understand and assess the ERP data reviewed in the various chapters. However, ERPs provide less precise information regarding the sources of the observed neural activity in the brain.
Therefore, an increasing amount of research utilizes more recent brain imaging methods, such as functional magnetic resonance imaging (fMRI), whose spatial resolution exceeds that of ERPs. Although the experimental paradigms used in brain imaging are often not yet as highly specific as those worked out through the years in behavioral or ERP research, some of these studies have already brought significant new insights into the working of implicit memories. Common to all chapters is
the approach that empirical evidence is evaluated in terms of its importance to models of the processes underlying conscious perception. In the first chapter, Gilchrist and Cowan review the role of working memory in conscious as well as in non-conscious cognitive processing. Verleger’s chapter discusses ERP components related to subjective experience and non-conscious processes. Winkler suggests that predictive representations of acoustic regularities form the core of auditory perceptual objects, while Denham, Dura-Bernal, Coath and Balaguer-Ballester describe a neurocomputational approach to model the processes underlying perceptual objects. The role of sensory memory in automatic visual change detection and perception is discussed in Czigler’s chapter. Huotilainen and Teinonen focus on the role of learning and implicit memory in perceptual development. Shtyrov and Pulvermüller’s chapter describes a novel paradigm for studying the memory representations involved in speech perception. They evaluate the relevant theories in light of their results and provide new insights into the non-conscious processes underlying speech perception. Koelsch presents an advanced theory of the perception of musical structure, emphasizing the role of non-conscious processes and implicit memory representations. Finally, the role of non-conscious and conscious processes in aesthetic appreciation is explored in Jacobsen’s chapter. In the Appendix, Bendixen provides a concise description of the ERP method. In summary, this book provides a theoretical and empirical overview of the various topics of and approaches to the question: What is the role of non-conscious memory representations in perception? We thank Maxim Stamenov for his initiative and stimulating advice. Zsuzsa d’Albini’s technical help is gratefully acknowledged. Our work was supported by the Hungarian Research Fund (OTKA 71600) and the European Community’s Seventh Framework Programme (grant n° 231168 – SCANDLE).
István Czigler and István Winkler Budapest, November 2009
chapter 1
Conscious and unconscious aspects of working memory Amanda L. Gilchrist and Nelson Cowan University of Missouri, Columbia, USA
1.1 Working-memory models and consciousness
A simple (and, alas, oversimplified) history of working memory models helps to lend perspective to the present endeavor. George Miller (1956) is generally credited with kicking off a renaissance in the study of temporary memory related to the conscious mind, in his published speech on the limit of temporary memory to “the magical number seven plus or minus two” (though many researchers failed to understand his humorous intent in describing a magical number surrounded by confidence intervals). Shortly afterward, Miller et al. (1960) coined the term working memory to describe the memory for current goals and a small amount of information that allows one’s immediate plans to be carried out. Several models using something like this working memory concept will now be described.
1.1.1 Information-flow models
Broadbent (1958) proposed a very simple model of memory that took into account the latest evidence; in that model, a large amount of sensory information was encoded in a temporary buffer (like what is now called working memory). However, only a small proportion of it ever made its way into the limited-capacity part of memory, where it was analyzed and categorized with the benefit of the vast amount of information saved in long-term memory, and eventually added to that long-term bank of knowledge. Atkinson and Shiffrin (1968) then developed that sort of memory framework into a more explicit model in which mathematical simulations were made of the transfer of information from one store to another through encoding, retrieval, and rehearsal processes.
1.1.2 Multi-component models
Baddeley and Hitch (1974) soon found that this simple treatment of working memory would not do. In addition to information temporarily held in a form likely to be related to consciousness (James 1890), they argued in favor of other storage faculties that operated automatically, outside of voluntary control. Broadbent (1958) already had conceived of sensory memory that way, but sensory memory was not enough. There was said to be a phonologically-based temporary memory store that was vulnerable to interference from additional verbal items, no matter whether their source was visual or auditory. Similarly, there was a visuo-spatial temporary memory store responsive to the spatial layout and visual qualities of items; in theory, this store would be vulnerable to interference from spatial information even if its source were auditory. Baddeley (1986) later thought he could do without the central store or primary memory entirely; whereas Baddeley (2000) essentially restored it in the form of an episodic buffer.
1.1.3 Activation-based models
A form of model that will become the centerpiece of this chapter is the activation-based model. In that sort of model (Anderson 1983; Cowan 1988; Hebb 1949; Norman 1968; Treisman 1964) a temporary form of storage such as working memory is viewed not as a separate location in the mind or brain, but rather as a state of activation of some of the information in the long-term memory system. Hebb (1949) envisioned this activated memory as a cell assembly that consisted of a neural firing pattern occurring repeatedly so long as the idea it represented was active. Treisman (1964) and Norman (1968) distinguished between subliminal amounts of activation, which might influence behavior but not consciously, and supraliminal amounts, which attracted attention to the activated item, especially if it was pertinent to the person’s current goals and concerns. Anderson (1983) used the process of activation as a way to bring long-term knowledge into working memory. The instance of the activation-based models that we will discuss in most detail is the one that Cowan (1988, 1999) designed as a way to express what he believed to be known at the time. There were awkward aspects of the box-based models of the time, such as the well-known ones of Atkinson and Shiffrin (1968) and Baddeley (1986). These models did not readily express the assumption that information co-exists in several forms at once. Sensory information can be accompanied by more categorical information that is drawn from long-term memory into working memory. For example, the acoustic properties of a speech sound elicit the appropriate sound category that is stored in long-term memory. Other
sensory information persists without invoking long-term memory information. Cowan (1988, 1999) represented this state of affairs with embedded processes. Temporarily-activated features of memory can include both sensory and abstract features. Some of these features become active enough, or active in the right way, to become part of the individual’s current focus of attention, which is of limited capacity. The focus of attention is controlled in part voluntarily, and in part involuntarily through the brain’s reactions to abrupt physical changes in the environment (such as a sudden change in lighting or a loud noise). In the original model of Broadbent (1958), unattended sensory information was filtered out so that it faded away without ever reaching the capacity-limited store. This model became untenable when it was found that aspects of unattended sounds can attract attention (as when one’s own name is presented; see Moray 1959) so Treisman (1964) proposed that the filter only attenuated, rather than completely blocking out, sensory input that was unattended. Potentially pertinent input could still result in attention being attracted to that input. Cowan (1988) solved this problem in another way, by indicating that all sensory input made contact with long-term memory, but with extra activation for the deliberately-attended input or the abruptly-changing input that attracts attention involuntarily. This is the converse of the filtering concept. Cowan (2001) suggested that the focus of attention in the embedded process model of Cowan (1988, 1999) is limited to about 4 separate items in normal adults. This leaves open the basis of individual differences in capacity limits. Cowan and colleagues have suggested that there is a basic difference between individuals and age groups in the size of the focus of attention (Chen and Cowan 2009a; Cowan 2001; Cowan et al. 2005, 2009; Gilchrist et al. 2008, 2009).
An alternative basis of individual differences and age differences, still in keeping with the model, is that individuals appear to differ in the ability to control or allocate attention or executive function (Engle et al. 1999; Miyake et al. 2001) or, more specifically, to inhibit irrelevant items and prevent them from cluttering up the limited-capacity store (Lustig et al. 2007). These possibilities are not mutually exclusive and both of them are compatible with the basic embedded processes model. (Indeed, for a structural equation model similar to Engle et al. or Miyake et al., see Cowan et al. 2005.) Cowan’s embedded process model assumes that storage and processing both use attention, and that certain processing challenges must be kept in check using attentional capacity that otherwise could be devoted to the temporary storage of information. These challenges include stimuli that elicit strong habitual responses, and irrelevant distracters. Indeed, recent research has suggested that processing, such as tone identification, can interfere with very different storage, such as that of visual arrays of objects (Stevanovski and Jolicoeur 2007); and that,
conversely, storage, such as that of verbal list items, can interfere with very different processing, such as that in a speeded, nonverbal, choice reaction time task (Chen and Cowan 2009b). It remains to be seen whether attention holds multiple items at once directly or, alternatively, is used to refresh items one at a time before their activation is lost (Barrouillet et al. 2007).
1.1.4 Conscious and unconscious processing in different models
Although these models were constructed with the aim of explaining behavior, they also seem relevant to the issue of consciousness. Figure 1 shows three of the models with our interpretation of what, within each model, is likely to correspond to conscious and unconscious processes. Figure 1A depicts the original model of Broadbent (1958). In that model, incoming stimulation from the environment was said to be held in a temporary store, or buffer, and to go nowhere except for the small amount that is encoded into the limited-capacity buffer. Given that sensory memory is much richer than what individuals can report (which was shown for auditory stimuli by Broadbent himself, among others, and for visual stimuli by Sperling 1960), the model must consider the contents of the sensory buffer to be unconscious. Likewise, the vast information in long-term memory could not all be in consciousness at the same time. It is the limited-capacity buffer that would hold the contents of conscious awareness. Figure 1B shows the model of Baddeley (2000). Baddeley’s models, ever since Baddeley and Hitch (1974), implicitly have assumed more automatic processing than was made clear by Broadbent (1958). In particular, the entry of information into the phonological and visuo-spatial buffers is said to occur automatically, resulting in ready-to-use codes. These, however, need not be in consciousness. For example, Baddeley (1986) summarized research indicating that unattended speech could interfere with working memory for printed words, given that phonological codes are in memory for both types of stimulus at once, leading to corruption of the material to be remembered. In contrast, the episodic buffer, which was added to the model by Baddeley (2000) in order to account for temporary knowledge of abstract features, seems more likely to be linked to conscious awareness. 
Although to our knowledge Baddeley has not stated as much, abstract information has the advantage that it could refer to any stimulus. Consciousness seems to have the quality of unity in that our subjective experiences are such that there is only one pool of consciousness; for example, if we are looking at a work of art and listening to a tour guide’s description of it, we feel conscious of the work of art, the verbal description, and perhaps the tour guide all at once. In the Baddeley (2000) model, the only buffer capable of representing and inter-relating all of this information is the episodic buffer.
[Figure 1 panel labels. A. Broadbent (1958) model: Sensory buffer (unconscious); Filter (?); Capacity-limited buffer (conscious); Long-term memory (unconscious). B. Baddeley (2000) model: Central executive processes (?); Phonological buffer (unconscious); Episodic buffer (conscious); Visuo-spatial buffer (unconscious); Long-term store (unconscious). C. Cowan (1988, 1999) model: Central executive processes (?); Long-term memory (unconscious); Activated memory (unconscious); Focus of attention (conscious).]
Figure 1. A depiction of three models of working memory and information processing, and the status we presume for each component of each model with respect to its inclusion in conscious awareness (conscious) or exclusion from it (unconscious). Where we did not have a presumption one way or the other, we added a question mark. The arrows coming from the left-hand side of each figure represent the entry of sensory information into components of the system as shown. A. Broadbent’s early-filter model. B. Baddeley’s multicomponent model. C. Cowan’s embedded-processes model.
A more difficult, thorny issue is whether we are conscious of central executive functions. These are functions driven by motivations and intentions, as one can tell by testing individuals under different sets of instructions or reward contingencies. The central executive is, by definition, the set of mechanisms responsible for planning, scheduling, and executing information processing that transforms codes in buffers and memory stores, affects what information is attended, and affects what actions are performed. Although we are at least typically aware of the items to which attention has been directed and the items that are being transformed from one code to another or, in the parlance of the model, shuttled from
one buffer to another, we may not be aware of the process by which the transformations are accomplished. Because of the theoretical difficulty of this question, we leave a question mark in the central executive. Basically, it is not clear to us whether it is feasible to consider processes to be in consciousness, or whether it is only information in stores that is held open to conscious reflection. One resolution of this question is that processes may give rise to stored representations that are held in consciousness, whereas other processes operate without attention and consciousness directed to them. Indeed, when one becomes too aware of the central executive processes governing a complex behavior, it can lead to poor performance or “choking” (Beilock and Carr 2005); as Fyodor Dostoevsky’s character complained in the short novel, Notes From Underground, in such cases consciousness is a disease. Finally, Figure 1C shows the activation-based, embedded processes model of Cowan (1988, 1999). For our purposes, this model has the advantage that the access to consciousness is a key aspect of the model. This access to consciousness is synonymous with the focus of attention. Elements of memory outside of the focus of attention, whether active or not, are outside of conscious awareness. If the line between the focus of attention and the rest of activated memory is seen as a blurry or gradual one, then the focus of attention might be viewed as being the same as primary memory as described by James (1890). One reason to demarcate conscious versus unconscious processing in the model is that the information in conscious awareness is assumed to gain access to much deeper and more extensive perceptual and conceptual analysis using long-term memory information than information that is outside of conscious awareness. Therefore, directing conscious awareness is an important skill for harnessing information processing to succeed at a task.
One of the strong suits of Baddeley’s models is that they have been well-grounded in research on patients with brain injury. One can find instances in which patients appear to be deficient in temporary phonological storage or rehearsal (e.g., Baddeley et al. 1988), attention-related working memory (e.g., Jefferies et al. 2004), or long-term episodic storage (e.g., Baddeley and Warrington 1970). None of these cases are especially problematic for the Cowan model, either. It is acknowledged in the model that phonological activation is one type of activated memory and that rehearsal of it can take place without much attention. Attention-related working memory and automatic activation are both intrinsic to the Cowan model, as is long-term storage. There is no reason why specific lesions could not selectively impair these aspects of memory. Although patients with long-term episodic storage impairments obviously would be unable to activate information that was not stored in the first place, information in semantic
memory can become temporarily activated by new stimuli and, in that way, used in short-term storage tasks. An important issue that was addressed in enough detail by Cowan (1988, 1999) is where information about the individual’s goals is held. When the goals keep shifting or when there is information that undermines the goal, the goal presumably must be kept in the focus of attention, using up some of the limited capacity of that store. An example of this is the Stroop effect, in which the goal is to name the color of ink but the word spells out a different color. Kane and Engle (2003) showed that individuals with high working memory span carry out this task more successfully than low spans if a conflict only occurs on a small proportion of the trials. In contrast, if the goal becomes routine and it is not undermined by conflicting stimuli, that goal need not use up some of the limited capacity of the focus of attention and may be held instead in a portion of long-term memory (or in the central executive itself if it can be considered to have a memory, which was sometimes assumed by Baddeley, but not by Cowan).
1.1.5 More detail on the embedded-processes model
Thus, the embedded-processes model, rather than being composed of separable structures, includes within the long-term memory system a currently activated subset of memory and, within that in turn, a subset that is the current focus of attention. The activated portion of memory was said to include unlimited information at once but it was assumed to lose its activation through time-based decay within 10 to 20 seconds, provided that activated items are not attended or rehearsed. The focus of attention comprises a limited number of items that are not only highly activated, but are also most rapidly accessible and are in one’s awareness (‘in mind’). In contrast to the time-limited decay of activated long-term memory, the focus of attention was assumed not to decay but was limited to only several separate, meaningful items at once. The focus of attention functions as a zoom lens with respect to the information it comprises (see also LaBerge and Brown 1989). It can ‘zoom in’ on a particular item, leading to more focused and precise allocation of attention to a single item, with remaining items in the focus receiving considerably less attention and detail; this suggests that attentional resources can be allocated on the basis of prioritization if necessary. In contrast, if multiple items need to be processed simultaneously, the focus of attention can ‘zoom out’, leading to a greater breadth of processing, but with less precision overall. How information enters into awareness depends upon several different factors that are accounted for in the model. The first factor involves the origin of
information; items may enter into the focus of attention from either an internal (e.g., information contained in long-term memory) or an external (e.g., sensory information from the environment) source. The embedded-process model accounts for such differences through the inclusion of two different means to control the focus of attention. Attention is directed towards internally-generated items in memory via a central executive. Cowan (1999) defined the central executive as a subset of cognitive processes that are modified by task-based instructions, goals, and incentives. Consider how such effortful allocation of attention might operate in a general fluency task. Participants are presented with a given category, such as ‘animals’; they are to name as many animals as they can from memory. In this case, the central executive can direct a search of memory, selecting only those items that are relevant to the task (e.g., ‘ostrich’, but not ‘ostrogoth’). Attention is directed towards these items, allowing access to the focus of attention. Additionally, participants must monitor their responses, to keep from recalling a named item multiple times – the central executive can further select only those animal names that have not yet been recalled. These two processes may be repeated until no further animal names can be recalled – the central executive may terminate memory search and retrieval of category exemplars, thus ending the task. In contrast to the above description, allocating attention to external sensory information occurs through filtering by the central executive (similar to internally-generated information), as well as through orienting to changes in environmental stimuli. Information from the sensory stores enters the contents of memory; whether information gains access to the focus of attention depends upon item features (as when one listens to one person’s voice while ignoring other people in the room).
Processing within the embedded-process model also depends upon the relative familiarity or novelty of presented stimuli. In general, features that remain fixed over time or are not novel do not capture attention, though these features may be activated in memory if they are already sufficiently familiar and have been deemed task-relevant via the central executive. Rather, unchanging features are often subject to habituation and lose perceptual salience over time; it should be noted that for these cases, access to the focus of attention will be more likely to occur through voluntary control of attention through the central executive. In contrast, novel or dynamic features present in the environment demand orienting and allocation of attention, and are thus involuntarily captured in the focus of attention. Recent research provides additional evidence for a procedural, activation-based working memory system. Oberauer (2002, 2005) proposes a model that is quite similar to the embedded-process model described above. All memory is contained within the contents of long-term memory, a subset of which is activated.
The model proposed by Oberauer differs from the model proposed by Cowan in one important respect – the size of the focus of attention. Whereas the embedded-process model has a focus of attention containing four items, the Oberauer working memory model defines this as a region of direct access. Like Cowan’s focus of attention, the region of direct access holds approximately three or four items or chunks that are immediately accessible and available for cognitive processing. Unlike the embedded-process model, this direct-access region contains an additional nested process, which is the focus of attention. Here, the focus of attention holds only one item or chunk to use for a subsequent cognitive process.
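The nesting described in this section (long-term memory containing an activated subset, which in turn contains a capacity-limited focus of attention) can be illustrated with a small Python sketch. It is not an implementation from Cowan or Oberauer: the class name, the method names, the 20-second decay constant (the text gives a range of 10 to 20 seconds), and the rule that the oldest item leaves the focus when capacity is exceeded are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

FOCUS_CAPACITY = 4     # about four chunks (Cowan 2001)
DECAY_SECONDS = 20.0   # assumed value; the text gives 10 to 20 seconds


@dataclass
class EmbeddedProcessMemory:
    """Hypothetical toy model: long-term memory ⊇ activated memory ⊇ focus."""
    long_term: set = field(default_factory=set)
    activated: dict = field(default_factory=dict)  # item -> time of last activation
    focus: list = field(default_factory=list)      # attended items, oldest first

    def activate(self, item, now):
        """Activate an item in long-term memory without attending to it."""
        self.long_term.add(item)
        self.activated[item] = now

    def attend(self, item, now):
        """Bring an item into the focus of attention."""
        self.activate(item, now)
        if item in self.focus:
            self.focus.remove(item)
        self.focus.append(item)
        if len(self.focus) > FOCUS_CAPACITY:
            # Assumed displacement rule: the oldest item leaves the focus
            # but stays in activated memory until it decays.
            self.focus.pop(0)

    def tick(self, now):
        """Refresh attended items, then drop unattended items that have decayed."""
        for item in self.focus:
            self.activated[item] = now
        self.activated = {
            item: t for item, t in self.activated.items()
            if now - t < DECAY_SECONDS
        }


memory = EmbeddedProcessMemory()
for second, word in enumerate(["a", "b", "c", "d", "e"]):
    memory.attend(word, now=float(second))
memory.tick(now=25.0)
```

After the loop, the focus holds the four most recent items, while the displaced item remains activated until `tick` removes it; it stays in long-term memory throughout. Oberauer's variant could be sketched by treating this four-item set as the region of direct access and adding a one-item focus inside it.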
1.1.6
Further embedding of Oberauer (2002)
In a series of experiments examining shifts of attention and updating in lists of words or digits, Oberauer (2002, 2005) found that the direct access region proposed depends on currently-relevant lists, and that the focus of attention holds the item that was most recently updated in a task. Participants were presented with two lists of either 1 or 3 items, which were to be updated when a particular item was cued. For example, a participant might receive a visually-presented digit list: 4 7 2. One of these list items (e.g., the last item) might be cued for updating with –2, indicating to subtract two from the last item’s current value. Thus, the new list should now be 4 7 0 – this must be retained for a later test of recall. In addition to indicating which corresponding item was to be updated, cues also indicated which of the two lists was currently relevant to the task. A critical manipulation occurred when another cueing and subsequent updating instruction was presented to participants. Participants may be asked to update a previously-updated item within the task-relevant list (i.e., no switch) or another item in a task-relevant list (i.e., an object switch); finally, they may be instructed to shift their attention towards the previously-irrelevant list, termed a list switch. For updating items within the same list, response time for object switches is longer than when no item switch is required. All relevant-list items are contained within the region of direct access; they are immediately available for processing. The differences in response times for switching items versus no item switch suggest a shift of the focus of attention towards the newly-cued item; presumably, if all items were equally held, as they are in the Cowan model, item switches within an active list should not result in increased response time.
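The trial structure just described can be made concrete with a small sketch. It is only an illustration of the bookkeeping, not Oberauer's actual procedure or software: the class name, the method names, the label "first update", and the contents of the second list are all assumptions introduced here. Only the 4 7 2 list and the –2 cue come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class UpdatingTask:
    """Toy bookkeeping for a two-list updating task (illustrative only)."""
    lists: Dict[str, List[int]]                      # list name -> current item values
    relevant: str                                    # name of the currently relevant list
    last_updated: Optional[Tuple[str, int]] = None   # (list name, position) of last update

    def classify(self, list_name: str, position: int) -> str:
        """Label the upcoming update relative to the previous one."""
        if self.last_updated is None:
            return "first update"      # hypothetical label; nothing to switch from yet
        if list_name != self.relevant:
            return "list switch"       # cue points to the previously irrelevant list
        if (list_name, position) == self.last_updated:
            return "no switch"         # same item as the previous update
        return "object switch"         # different item within the relevant list

    def update(self, list_name: str, position: int, delta: int) -> str:
        """Apply an arithmetic cue such as -2 and return the switch type."""
        kind = self.classify(list_name, position)
        self.lists[list_name][position] += delta
        self.relevant = list_name
        self.last_updated = (list_name, position)
        return kind


# The 4 7 2 list comes from the text; list "B" is made up for the example.
task = UpdatingTask(lists={"A": [4, 7, 2], "B": [5, 1, 8]}, relevant="A")
print(task.update("A", 2, -2), task.lists["A"])  # cue -2 on the last item
print(task.update("A", 2, +3), task.lists["A"])  # same item again
print(task.update("A", 0, +1), task.lists["A"])  # another item, same list
print(task.update("B", 1, +4), task.lists["B"])  # previously irrelevant list
```

In this sketch the first cue reproduces the 4 7 0 example, and the following three cues are labelled as a no switch, an object switch, and a list switch, respectively. The sketch records only which condition a trial belongs to; it makes no claim about the response times observed in each condition.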
In addition to support for a one-item-limited focus of attention, Oberauer has found that list, but not object, switches result in increased response time for updating and recall of list items. This has been taken as evidence that switching lists may involve moving previously-relevant items from the direct-access region into activated long-term memory,
Amanda L. Gilchrist and Nelson Cowan
with newly-relevant items following the opposite process. Object switches will require less time to process, as all of the items are in the region of direct access; items will simply be moved in and out of the focus of attention. Finally, Oberauer found that response time increases with increasing list length (i.e., a list-length effect), but only for lists that were previously task-relevant. As switching lists requires moving items between activated long-term memory and the direct-access region, this suggests that moving items out of the focus of attention and direct-access region is more effortful and time-demanding than moving items into these nested processing regions. Specifically, moving items out of the region of direct access occurs in a serial fashion, taking more time as list length increases. The results above provide strong evidence for the further embedded processing described in Oberauer’s model. As we will discuss in a later section, this model fits well with current theories of conscious processing, particularly those where one item receives conscious processing at a time, such as Global Workspace Theory (GWT) (Baars et al. 2003; Baars and Franklin 2003, 2007). However, we are of the opinion that more exploration with the above paradigm is necessary. A particular concern relating to the above paradigm is the use of very small set sizes to test effects of list length and task relevance. We acknowledge that working memory is limited to a small subset of items, and thus smaller sets are necessary, but the range of set sizes tested is not sufficiently exhaustive. For example, Oberauer (2005) notes that list lengths of 2 and 4 were tested in pilot research, but the task was too difficult for participants under these conditions. For the most comprehensive account of the direct-access region and focus of attention, these set sizes should be included in experimental conditions.
This is particularly so for lists of one versus two items, where it is possible that attention must change from a singular focus to being divided in half. Additionally, set sizes that exceed span must also be considered to enrich this model further. As Oberauer (2002, 2005) only examined subspan lists of items, it is unclear how moving items between the embedded components will be affected. Surely interference between the items would be far more likely, and this must somehow be accounted for. Despite the differences between the two models, they are not necessarily in conflict with regard to processing of conscious, directly-accessible items. Although there are no further nested processing regions within the embedded-processing model, Cowan (2001) noted that attention is likely not equally divided between the items in the focus of attention, particularly when the number of items maintained is subspan (see also Zhang and Luck 2008). In cases where the number of items is below working memory span, more than one fixed slot can process a given item. This implies that some items within the focus may require more attentional allocation, and thus greater processing, than others. The idea that limited-capacity attention is allocated to active items is not a new one. Yantis
and Johnson (1990) examined various models of such prioritization in a series of visual search tasks. Participants were instructed to search for a target letter in an array of additional letters. Letters in the search array either were present in a prior, masked placeholder array, known as no-onset letters, or appeared in locations that were blank in the previous array (i.e., abrupt-onset items). Fitting the resulting data to the models, Yantis and Johnson found evidence for attentional prioritization in visual memory. Specifically, all abrupt-onset items were processed first, due to automatic registration; following this, no-onset letters received attention. Items that are deemed high priority may be processed via either a proposed queue mechanism or through tags that decay over time, though it is unclear how the order of processing these items is attained. For cases involving visual search, high-priority items are most likely processed on the basis of similarity to a target (i.e., task-relevant features), with items most resembling the target receiving attention first. Additionally, items that are most perceptually salient are most likely to capture attention initially, with less salient items following suit. With this in mind, the embedded-process model may utilize such prioritization within the focus of attention, ‘zooming in’ on one item that receives initial attention for reasons of task relevance or salience of its comprising features. This is analogous to Oberauer’s one-item focus of attention. We should note that more research and exploration are needed to determine the underlying mechanisms for determining priority, for both visual and verbal aspects of working memory. In contrast to the two differences that are initially assumed, the model proposed by Oberauer actually only differs in one respect: Oberauer’s model simply contains an additional level of nesting beyond that of the embedded-process model.
The manner in which conscious, directly-accessible items are processed within their respective regions in the two models, however, does not differ. To present in full the evidence for and against each of these models would be the topic of a long book rather than a chapter. Below, we will simply present some of the best-known relevant research in order to provide an understanding of some of the motivation for the construction of the embedded-processes model, to illustrate the motivation behind the assumptions about its relation to conscious awareness, and to explore the current status of these assumptions.
1.2
Research basis of consciousness and embedded processes in working memory
Cowan (1988, 1995, 1999, 2005) summarized diverse evidence relevant to the conscious versus unconscious status of information in activated memory versus the focus of attention. What is critical is that sensory features from multiple
stimuli can become activated in memory at the same time, without evidence that they enter awareness. However, deeper processing seems to require awareness, making the system more like that of Broadbent (1958). Cowan (1988) was noncommittal about whether the automatically activated information could include semantic information. Unlike Broadbent we still assume that it may but, like Broadbent, we do not believe that automatic activation of semantic features captures attention. The evidence comes from separate auditory and visual studies.
1.2.1
Auditory studies related to consciousness and working memory
Broadbent (1958) summarized work of this type by several investigators. In some of the work, a dichotic listening method was used in which different messages were presented in the two ears using a tape recording. The message in one ear was to be attended and repeated (shadowed) whereas a different message in the other ear was to be ignored. Occasionally, the tape was stopped and the research participant was to repeat as much as possible from the ear with the message to be ignored. Only the most recent information from that message could be remembered; the last few seconds at best. Many versions of the dichotic listening experiment yielded information consistent with the conclusion that information is held in a sensory form and is forgotten within a few seconds unless it is entered into attention and consciousness. Johnston and Heinz (1978) were able to verify the assumption that attention and awareness go together and are found for one channel in selective listening. They presented two auditory channels that were distinguished either in terms of physical cues like location and voice quality, as in most of the studies described by Broadbent (1958), or were presented by physically identical means, differing only in the semantic topic of the message. The key manipulation was that, in addition to shadowing one channel, participants also were to carry out the subsidiary task of pressing a key as quickly as possible when a visual signal appeared. The subsidiary task reaction time was known from other work to be a good measure of how much attention is free despite the selective listening task. It was found that the subsidiary task reaction time was much faster when the messages were physically separated than when they were only semantically separated. Within the embedded-processes model, this indicates that the focus of attention can be directed at one physical stream of information with little difficulty.
This cannot be done when the messages differ only in their semantic meanings because the analysis of semantics is not automatic; it requires attention and effort. It seems most likely that both messages are held in the focus of attention until the portions related to a semantically coherent message can be extracted. Given the need to hold so much
information while interpreting it on the basis of knowledge and guesswork, this is an effortful process that leaves little attention to be used for the subsidiary task. Both the embedded-processes model (Cowan 1988, 1999) and the Broadbent (1958) model are consistent with these findings indicating that not much automatic semantic processing contributes to shadowing performance. One way in which they differ is in the manner of the control of attention. Broadbent posits an attentional filter and one might think that the more channels there are to be rejected, the harder the filter must work. In Cowan’s model, in contrast, there is no filtering and the focus of attention must select one channel to be processed semantically, a process that need not depend on how many channels there are to be rejected. A further experiment of Johnston and Heinz (1978) seems to support this additional prediction. A condition in which there were only two physically different messages, one to be attended and one to be ignored, was compared to a condition in which there were three physically different messages, one to be attended (a female voice) and two to be ignored (two different male voices). The subsidiary task reaction time was the same no matter whether it was necessary to ignore one or two channels, as expected according to the embedded processes model in which there is no real filtering out of irrelevant stimuli, just accentuation of the relevant channel or message held in the focus of attention. At one time there was an important movement toward models in which information leaks through the attentional filter, such as the attenuating filter model of Treisman (1964). The main reason for it was that there were a number of studies suggesting that information did get through. The most famous of those was when the participant’s own name was presented in the unattended channel in selective listening (Moray 1959). 
The finding was that participants sometimes could recognize and remember hearing their own names. Wood and Cowan (1995a, 1995b) confirmed that participants could notice subtle changes in the unattended channel, including their own names. About one third of the participants noticed their names, about the same proportion obtained in the smaller study by Moray. However, there are different ways that this could come about. It could be that information about the name indeed leaked through the filter (or, in the embedded processes model, that acoustic components of the name made contact with the long-term memory representation of the name and thereby attracted attention). Alternatively, it could be that attention often wandered from the assigned message over to the unattended message, or was split between messages, and therefore that the name was already attended when it was heard. It appears that the latter is the case inasmuch as one study (Conway, Cowan, and Bunting 2001) showed that 65% of individuals in the lowest quartile of working memory noticed their names, whereas only 20% of individuals in the highest quartile of working memory noticed their names. The most straightforward interpretation is that the high spans were on task
more often than the low spans. Colflesh and Conway (2007) showed that when the task was to try to monitor both messages at once, high-span individuals noticed their names more often than low spans.
1.2.2
Visual studies related to consciousness and working memory
Sperling (1960) carried out a seminal study that can be considered the basis of the visual work relating working memory to consciousness, much as the work reviewed by Broadbent (1958) can be considered the basis of the corresponding auditory work. Sperling presented an array of characters (printed letters and numbers) on a computer screen, sometimes followed shortly afterward by a tone cue indicating whether the top, middle, or bottom row of characters was to be reported. With no tone cue, participants could recall about 4 items from the array. When the tone cue followed closely after the array, they could recall about 4 items from the selected row. This indicated that there are two limits in recall. First, there is a limit of about 4 items in how much information can be extracted into working memory. Second, there is a limit in how many items can be encoded from the array but it is not nearly as severe as 4 items. Instead, it appears that most or all of the array items are encoded, given that the participants did not know in advance which row would be cued but still could successfully recall 4 items from that row. The retention of most of the array was said to be in the form of sensory memory, usable if attention is turned to it but not reportable in anything near its entirety. This sensory memory therefore seems to be held outside of the focus of attention and awareness. In contrast, the small amount of working memory that is reportable (about 4 items) is in attention and awareness. Luck and Vogel (1997) created a cleaner version of Sperling’s procedure that was intended to quantify visual working memory. They used colored nonverbal shapes rather than verbal characters to show that working memory for even nonverbal information is limited to about 4 items. The array of objects to be remembered was followed by a probe that could even be a single item, the task being to indicate whether the item changed from the corresponding item in the original array. 
Performance was nearly perfect when there were 3 or 4 items in the array, and declined as the array size increased. Cowan (2001) derived a simple formula to estimate the number of array items in working memory for the single-item probe case, and Rouder et al. (2008) found a good mathematical fit of detailed data to the formula. The type of memory observed by Luck and Vogel (1997) is often considered to be visual working memory, which could make it a rather automatic store like the visuospatial buffer of Alan Baddeley’s models, potentially operating without
conscious awareness. Instead, we believe that it is a central memory that has more to do with the focus of attention and consciousness. Morey and Cowan (2004) showed that interference occurs from a spoken verbal memory load (seven random digits), but not from a spoken series that blocks articulation but does not load memory (a known seven-digit telephone number). Saults and Cowan (2007) used digits presented in different voices from four loudspeakers to prevent rehearsal and found that the memory for the digits traded off with memory for the colored visual objects; participants could remember about 3.5 visual objects or, if asked to remember both modalities, they could remember fewer visual objects but about 3.5 objects total, visual plus auditory. Cowan and Morey (2007) showed that this interference between visuospatial and auditory-verbal items occurred in the working memory retention phase, not just in the encoding of the stimuli. It seems, therefore, that we are dealing with a central working memory store that is limited to 3 or 4 items in adults. Not surprisingly, this central store would be the same that Sperling (1960) observed with printed verbal characters. This limited-capacity store matches the conscious information that the embedded-processes model conceives as the focus of attention and awareness. Studies reviewed by Cowan (2001) showed that individuals vary quite a bit in their working memory capacity; adults vary from 2 to 6 with most individuals in the range of 3 to 5 items, and children and older adults more typically about 1 item fewer. There are at least two mechanisms by which the visual store indexed by the task of Luck and Vogel (1997) could vary among age groups and individuals. First, it could be that some individuals are more distractible and therefore are less likely to encode the correct information into working memory.
This can be tested by including some objects that are to be ignored (e.g., red bars) and others that are to be attended (e.g., green bars; attend to their orientations). A distractible individual would incorrectly treat the red bars as if they were relevant and therefore would act as if their working memory were heavily loaded. In contrast, a less distractible individual would exclude the red bars and therefore would act as if their working memory was not so heavily loaded. Vogel et al. (2005) used an event-related potential measure of working memory load and found that individuals who remembered more items were less distractible. McNab and Klingberg (2008) duplicated this result in an fMRI study, showing individual differences in subcortical centers involved in excluding irrelevant objects. Of course, either encoding of information into an unconscious store or encoding of information into a conscious store could be influenced by the ability to filter out or exclude irrelevant items. This type of mechanism also agrees well with the notion that individuals with better span are those who are better able to exclude or inhibit irrelevant information (Lustig et al. 2007).
There is other evidence, though, of another mechanism whereby individuals differ in capacity. This mechanism is not well understood but it is a difference in storage ability per se. Gold et al. (2006) examined this in a behavioral study of individuals with schizophrenia and healthy controls using arrays of objects to be attended and ignored. They usually probed objects to be attended but occasionally probed objects to be ignored. The sum of the attended and ignored items in working memory gave an estimate of the total amount in working memory, whereas the difference between attended and ignored items, favoring the attended ones, gave an estimate of the efficiency of the attentional filtering process. Surprisingly, it turned out that the difference between the groups was almost entirely in the total amount in working memory, with very little difference in the filtering efficiency. Cowan et al. (2010) replicated this finding for young children (7 to 8 years old) in comparison to older children (12 to 13 years old) or adults, when the arrays included 2 relevant items (e.g., circles; attend to their colors) and 2 irrelevant items (e.g., triangles). However, when the arrays included 3 relevant and 3 irrelevant items, the large age difference in capacity (as measured by the sum of memory for relevant and irrelevant items) was now supplemented by a smaller age difference in the ability to filter out or exclude irrelevant items (as measured by the difference between memory for relevant versus irrelevant items). This finding suggests that the ability to filter out or exclude irrelevant items is compromised when working memory is overloaded. That is to be expected only if the working memory store in question is a central store that is attention-related in itself, as is the focus of attention as a storage device. 
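The sum-and-difference logic can be illustrated with a short calculation. The sketch below is an assumption-laden illustration rather than the analysis code of Gold et al. (2006) or Cowan et al. (2010). It assumes the single-probe capacity estimate usually attributed to Cowan (2001), k = N(H − FA), where N is the number of items of the probed type, H is the hit rate and FA is the false-alarm rate; the function names and all response rates are hypothetical.

```python
def cowan_k(set_size: int, hit_rate: float, false_alarm_rate: float) -> float:
    """Items in memory for a single-item probe: k = N * (H - FA)."""
    return set_size * (hit_rate - false_alarm_rate)


def total_and_filtering(k_attended: float, k_ignored: float):
    """Sum estimates total storage; difference estimates filtering efficiency."""
    return k_attended + k_ignored, k_attended - k_ignored


# Hypothetical response rates for an array of 2 relevant and 2 irrelevant items.
k_att = cowan_k(set_size=2, hit_rate=0.90, false_alarm_rate=0.10)  # relevant items probed
k_ign = cowan_k(set_size=2, hit_rate=0.55, false_alarm_rate=0.25)  # irrelevant items probed
total, filtering = total_and_filtering(k_att, k_ign)
print(f"attended k = {k_att:.2f}, ignored k = {k_ign:.2f}")
print(f"total = {total:.2f}, filtering = {filtering:.2f}")
```

With these made-up rates, two groups could have the same filtering score while differing in the total, which is the pattern described above for the group comparison. How N should be set for each probe type is an assumption of this sketch and may differ from the published analyses.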
Some controversy has arisen regarding the nature of memory in the Luck and Vogel (1997) array task based on the properties of binding between different features of objects. An example is remembering which shape went with which color, and/or which color was found at which location in the array. Our initial expectation a few years ago was that introducing an attentional distraction would differentially impair the ability to remember binding information as opposed to feature information. That is because the focus of attention was viewed as the vehicle for the retention of new bindings between features (Cowan 1995, 1999, 2001). The activation of information from long-term memory could not include new bindings between features, as one finds when new or meaningless objects are presented. It could include the information that an apple is red, for example, but not the information that an abstract circle that has been presented is purple. Similarly, perceptual theory popular in the field had long held that it takes attention to form bindings between features (Treisman and Gelade 1980). Two studies, however, showed that this was not the case. A distracting task during the array retention task was just as harmful for binding information as it was for the feature information (Allen et al. 2006; Cowan et al. 2006).
Does this finding contradict the understanding that retention of information in this task is attention-related, as in the focus of attention? Not necessarily. One way to understand this finding is that bringing an item into the focus of attention necessarily entails its features being bound. If the focus of attention were the main faculty in which information is stored in the task, introducing distraction would interfere with storage of the items and therefore would affect both memory for features and memory for the binding between these features.
1.2.3
Units of working memory and consciousness
Miller (1956) emphasized that working memory depends on the ability to group items together to form meaningful chunks. The letters PRF may form 3 chunks if they form no known acronym; the letters USA may form a single chunk if the acronym United States of America is known to the participant. In many procedures, the ability to group items together is limited (because the items are presented quickly and unpredictably, as in the visual array task or in running memory span; or because verbal repetition of a word or short phrase to eliminate verbal rehearsal is required) and the usual result is that about 3 to 5 items are recalled by adults (Cowan 2001). It remains unclear from this research, though, how general is this answer regarding the number of items that can be held in the focus of attention. The item limit appears to hold even when items are grouped together. Chen and Cowan (2009b) show this most clearly. They taught participants pairs of words to 100% correct cued recall, and also presented unpaired word singletons. Thus, there were 1- and 2-word chunks. During the recall of lists of words of various lengths ranging from 4 to 12 words, they showed that the recall limit was narrowly defined in chunks. Individuals could recall about 3 singletons from a list of those items, or they could recall about 3 pairs from a list of those items, regardless of the list length. Thus it does not appear that the limit in the focus of attention is in how many words can be held, but rather in how many meaningful units can be held. Similarly, Gilchrist et al. (2008) presented lists of unrelated spoken sentences and found that young adults could recall the key words from about 3 sentences; older adults could recall somewhat fewer. It is not necessarily the case that every word from a known chunk is held in working memory at the same time, but a marker of the unit in long-term memory is held in working memory. 
For instance, you probably can remember the trio of songs, Three Blind Mice, The Star-Spangled Banner, and Taps all at once and you might be able to hum them one at a time without a reminder. However, this does not mean that all of the words and notes from all three songs are dumped into the focus of attention at the same time. The focus shifts so that as you get to each song, the elements of the song are shifted into working memory as needed.
An ongoing debate is whether the resource that allows information to be held in the central part of working memory or the focus of attention is discrete or continuous. A discrete resource would allow up to a certain limited number of chunks to be held (cf. VanRullen and Koch 2003). In contrast, a continuous resource would be more fluid and could be divided up among different items in any way that was desired. This is actually a much more difficult question than it would seem at first glance. The finding that working memory, without rehearsal or grouping, is usually limited to a few items at a time could be interpreted as the result of a continuous resource. If the resource were spread unevenly over the items or these items were variably encoded, it would still be the case that some of them would surpass a threshold for memory and others would not. A couple of recent studies have used procedures in which the probe item’s discrepancy from the corresponding array item varies (Bays and Husain 2008), or in which the probe has to be recalled on a sliding scale and the amount of discrepancy between the stimulus item and the response can be measured (Zhang and Luck 2008). These studies differ in their interpretation and more research is needed to determine with certainty who is right. Meanwhile, Rouder et al. (2008) found a tradeoff between change detection and no-change detection, depending on the frequency of each type of trial in the trial block, that is linear as expected according to the model in which a discrete number of items is encoded into working memory (Cowan 2001). It is possible that discrete working memory is a programmable stance that the participant adopts rather than an inevitable quality of the mind. Given resources insufficient to retain all items in a visual array, participants may typically choose to retain a few items well.
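One way to see why a discrete-capacity account predicts a linear tradeoff is the following sketch. It assumes the simple guessing model usually associated with the single-probe formula of Cowan (2001): with probability d = min(k/N, 1) the probed item is in memory and the response is correct; otherwise the participant guesses "change" with probability g, which is assumed to shift with the proportion of change trials in a block. The function name, parameter names and values are illustrative and are not taken from Rouder et al. (2008).

```python
def predicted_rates(capacity: float, set_size: int, guess_change: float):
    """Return (hit rate, correct-rejection rate) under a discrete-slots guessing model."""
    d = min(capacity / set_size, 1.0)                    # chance the probed item is stored
    hit = d + (1.0 - d) * guess_change                   # stored, or a lucky "change" guess
    correct_rejection = 1.0 - (1.0 - d) * guess_change   # all but false alarms
    return hit, correct_rejection


# Vary the guessing bias while holding capacity (3 items) and set size (6) fixed.
for g in (0.2, 0.5, 0.8):
    hit, cr = predicted_rates(capacity=3, set_size=6, guess_change=g)
    print(f"g = {g:.1f}: hit = {hit:.2f}, correct rejection = {cr:.2f}, sum = {hit + cr:.2f}")
```

Under these assumptions the hit rate and the correct-rejection rate always sum to 1 + d, so shifting the guessing bias moves performance along a straight line. A continuous-resource account need not produce this pattern, which is why the shape of the tradeoff is informative.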
It is known, however, that individuals also are able to judge the statistical properties of a large number of items, such as the average size of circles in a large field (Chong and Treisman 2005). In future, it will be interesting to see whether attention can be fluidly spread across the field for this task or whether an attention limited to a few items can be used in a manner complex enough to explain working memory for statistical properties. The interplay between conscious and unconscious elements of working memory makes for some fascinating phenomena. Among them is change blindness (Simons and Rensink 2005). Even though it appears as if the visual field before you is in your attention, that is the case only in an impoverished manner and, actually, only a few items are fully attended. That is why it is difficult to recognize a change in an array of more than about 4 items. In some experiments, a realistic scene appears and then disappears momentarily and, when it returns, something important has changed (such as the color of shirt that an actor is wearing or the presence of a stop sign as opposed to a yield sign on the road). Such changes are very often missed and it seems that only a few details
of each scene, the ones in the focus of attention, are retained well enough that a change can be detected. If you spill pasta sauce on your shirt at an event, feel free to replace your shirt with one of a different color; chances are small that anyone at the event will notice the change. Armed with this body of research, we now return to a more theoretical treatment of the notion of conscious and unconscious processes and how they are involved in working memory.
1.3
Conscious and unconscious processing more generally considered
To understand how conscious and unconscious stimuli might operate within a working memory framework, it is important to understand how unconscious and conscious processes work, as well as how they manifest in cognitive behavior. In the current section, we broadly define what it is for something to be conscious or, conversely, unconscious. We follow this with empirical findings that provide support for how conscious and unconscious stimuli influence behavior in cognitive tasks. We present relevant effects that provide support for dissociating conscious and unconscious influences: priming and perception. Unconscious and conscious processes operate in a variety of cognitive tasks but the findings from these two areas allow for a clear dissociation between conscious and unconscious processes. Using a broad definition, conscious information can be considered any stimulus, either externally- or internally-generated, which we are aware of at any given time – thus, these items are ‘in mind’. In contrast, unconscious stimuli are those items which are currently not in awareness, and have no reportability. Although one is typically unable to classify unconscious information, previous studies find that such information still exerts effects on behavior, particularly on indirect or implicit tests of memory. Both conscious and unconscious mechanisms may be involved in myriad cognitive tasks, operating in a similar manner as automatic familiarity and controlled recollection, as described by process dissociation models of memory (Jacoby 1991). The degree of involvement of conscious and unconscious processing has been discussed for various cognitive phenomena; a subset of these is discussed in detail below.
1.3.1 Priming
First, consider effects of priming. In a prototypical procedure, participants are presented with a word (e.g., doctor). This word is often consciously processed, though it may also be presented below a threshold of awareness, as is done in
subliminal priming procedures discussed below. Participants are then presented with an indirect, or implicit, memory test; this often involves presenting scrambled anagrams or incomplete word stems where one must provide a permitted word (e.g., n________). Due to spreading semantic activation, strongly related associates of the word ‘doctor’ also receive activation, thus increasing their probability of being reported in a memory test. In the example provided, ‘nurse’ is a very strong associate of ‘doctor’; participants often complete the word stem with this or other strongly related associates, rather than with unassociated (e.g., neighbor) or weakly associated words (e.g., neurology). Additionally, responses to targets related to primes tend to be significantly faster. Here, the presented word is clearly conscious; however, primed associates are often no more conscious than any other item in long-term memory. Yet somehow overt responses and behavior are altered. Whereas both conscious and unconscious mechanisms contribute to the general priming effect, the degree of involvement of each is difficult to determine. However, it is clear that the underlying operations of the priming effect are not uniformly conscious in nature.
Studies have also examined subliminal priming. As briefly mentioned above, this paradigm uses methodology similar to that of a general priming task, with one important difference: primes are presented below the threshold of conscious awareness. Often items are presented so briefly that participants report not seeing any item. One variation of this presentation involves masking items quickly enough that the initial item is not registered in conscious perception or identification. In one task used by Morey et al. (2008), participants were presented with a masked digit prime; participants rarely reported being able to detect this prime.
Following presentation of a below-threshold prime, participants are presented with a target digit, which is consciously perceived and identified. Participants are simply asked whether the target digit is less than or greater than a given number (e.g., greater or less than 5). What is critical in this paradigm is the relationship between the prime and the target digit. Primes may either be congruent with the target (i.e., both prime and target are less than 5, or both are greater than 5) or incongruent with the target (e.g., prime is greater than 5, target is less than 5). In this paradigm, priming effects are manifest in the time taken to classify the target digit rather than in the content of the overt response. Specifically, the time needed to respond to the target is longer when the prime is incongruent. Although the prime is irrelevant to performing the task, as responses depend only on the target itself, some unconscious registration of the ‘unseen’ digit occurs, which affects task performance. This finding builds on earlier studies, such as that of Marcel (1983; Experiment 4), who found a similar subliminal associative priming effect in a lexical decision task. Items that were presented at a subjective detection threshold level, either
unmasked or pattern-masked to both eyes, influenced responding to associated words, with faster responding to targets. Morey et al. (2008), however, provide a statistical argument that our understanding of the subliminal priming paradigm is insufficient due to an incorrect assumption of the null hypothesis.
Even if subliminal priming does exist, there are limits to how it operates. Conscious stimuli influence unconscious behavior in classic priming procedures. Can it be said that unconscious primes influence conscious behavior? Not really; according to one study at least, they influence only other unconscious information. Specifically, Balota (1983) found semantic priming both from primes that were in awareness and from primes that were outside of awareness (i.e., imperceptible). The two kinds of primes differed, however, when it came to influencing direct memory tests. Recollection of having received a target was greater when it was paired with a related word that had been presented as a supraliminal prime, compared to a related word that had not been presented. In contrast, when a target was presented for recollection with a related word that had been presented as an unconscious prime, it had no beneficial effect on recollection compared to when it was paired with a different related word. In the terms of the embedded-process model of working memory (Cowan 1988, 1999), it appears that there is such a thing as automatic semantic activation, but that this type of activation does not influence what information reaches the focus of attention.
The unconscious activation still appears to depend to some extent on attention to the stimuli at the time of their presentation. Wood et al. (1997) carried out a procedure in which special word pairs were presented in the channel to be ignored in selective listening. An example is the word pair taxi-fare. The first part of the word pair disambiguates the second part.
When participants later were asked to spell a word that had two possible spellings (e.g., fare versus fair), those who had received the corresponding disambiguating word pair more often spelled the word in a manner consistent with that pair. Wood et al. found, though, that this effect occurred only when the attended channel was presented at a very slow rate. When it was presented at a rate more typical of selective listening procedures, this sophisticated version of unconscious semantic activation disappeared. Apparently, the slow presentation allowed attention to wander on to the message that was supposed to be ignored, which contained the disambiguating word pairs.
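The congruence effect in the digit-priming task described above can be made concrete with a short sketch. It assumes each trial is recorded as a (prime, target, response time) triple and that the comparison value is 5, as in the example in the text; the trial values below are invented for illustration and are not data from any of the cited studies.

```python
from statistics import mean

BOUNDARY = 5  # comparison value used in the text's example


def congruence_effect(trials):
    """Return mean incongruent RT minus mean congruent RT, in ms.

    A trial is congruent when prime and target fall on the same side
    of BOUNDARY, and incongruent otherwise.
    """
    congruent = [rt for prime, target, rt in trials
                 if (prime > BOUNDARY) == (target > BOUNDARY)]
    incongruent = [rt for prime, target, rt in trials
                   if (prime > BOUNDARY) != (target > BOUNDARY)]
    return mean(incongruent) - mean(congruent)


# Hypothetical trials: (masked prime digit, visible target digit, RT in ms).
trials = [(2, 3, 480), (7, 8, 470), (8, 2, 510), (1, 9, 500)]
effect = congruence_effect(trials)
```

A positive value indicates slower responding after incongruent primes, which is the pattern the paradigm treats as evidence of priming.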
1.3.2 Perception
In general, much of what we perceive requires some sort of initial conscious registration, which is automatic in nature and akin to the attentional orienting described in the embedded-process model (Cowan 1988, 1999). Any conscious
perception of environmental stimuli that follows this registration is modulated by controlled, selective attention. With this in mind, is it possible that perception may occur below a conscious threshold? Whereas findings from subliminal priming make a slightly stronger case for unconscious activation in memory, researchers are divided as to whether perception can occur without awareness. Objects that could be described as subliminal could alternatively be described as weakly conscious. One can test awareness of stimuli by presenting percepts at a subjective threshold (i.e., based on what participants report) or by setting the threshold at an objective value (e.g., d’, a common detection parameter, set to 0).
Prior studies provide support for unconscious or subliminal influences upon what we perceive. Snodgrass and Shevrin (2006) presented participants with a series of unmasked words briefly flashed on a computer screen, which they were later asked to identify in a forced-choice task. As items were presented onscreen, participants were instructed to follow one of two strategies: ‘look’, where they were to attempt to identify a word intentionally, or ‘pop’, where they were to report the first word that came to mind. Participants were later asked which strategy they preferred. Initial tests showed a strategy congruence effect on perception: participant performance was typically above chance for the preferred strategy, whereas performance with the nonpreferred strategy was below chance (see Snodgrass et al. 2004 for an in-depth discussion). Interestingly, grand means for all conditions centered around chance levels. According to Snodgrass and Shevrin (2006), if subliminal perception is simply weak conscious perception, this implies a hierarchy where higher-order responses are not possible without underlying, lower-order responses; thus, there is a direct relationship between all responses. 
If this is so, correct identification cannot occur if the ability to detect a stimulus is at chance. Yet, the results show that despite objective below-threshold detection levels (d’ at chance), a preferred strategy facilitated performance, with identification above chance. Studies by Marcel (1983) provide further evidence against this hierarchical processing structure. Here, masking perceptual stimuli at a subjective threshold level was less likely to affect semantic processing, or even graphic processing, than item detection. Decreasing the time between stimulus and mask led to chance performance on detection first, followed by graphic and semantic processing respectively (Experiment 1). Additionally, repetitions of a masked word led to greater associative priming in a lexical decision task, but had no effect upon an item’s probability of being detected (Experiment 5). These results suggest that performance on these types of tasks is not only due to conscious aspects of perception, and that unconscious aspects of perception are actually unconscious. These results were framed in terms of unconscious attributions from
participants. A brief flash of a given word increases its activation, regardless of whether it is consciously perceived; whether the activation is used to correctly identify a stimulus depends entirely on what the current and preferred strategy instructions are. If a preferred strategy is instructed, the activation increase influences correct responding (Snodgrass and Shevrin 2006). In the embedded-processes model, these effects could be viewed as consequences of conscious processes in preparing the field of activated memory for perception. The central executive’s shifting of the focus of attention leaves in its wake an activation field and the way in which it is set up can make a difference for perception. Different individuals apparently have different preferred methods of preparing the activation field for perception. This is an important area for future research.
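The objective threshold mentioned in this section is defined in terms of d’. As a minimal sketch, d’ can be computed under the standard equal-variance signal detection assumption as the difference between the z-transformed hit rate and false-alarm rate. The function name and the example rates are illustrative assumptions, not values from the studies discussed.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1; inv_cdf raises
    StatisticsError otherwise, so extreme rates need correcting first.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


at_chance = d_prime(0.5, 0.5)   # detection no better than guessing
above = d_prime(0.8, 0.2)       # hypothetical above-threshold observer
```

When hits and false alarms are equally likely, d’ is zero, which corresponds to the objective threshold at which detection is at chance.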
1.4 Theories of consciousness and unconscious processing
There are various theories and models that discuss how conscious and unconscious processes operate in human cognition. Given space constraints, we discuss those theories of consciousness that are most applicable to the model of working memory that we propose, with emphasis on pertinent behavioral and neuroscientific findings. We discuss Global Workspace Theory (GWT), a theoretical framework of consciousness that has the potential to explain both cognitive and neuroscientific phenomena relating to conscious processing, and a taxonomy of consciousness proposed by Dehaene et al. (2006). This latter framework, as we note in following sections, fits well within the parameters of the embedded-process model. Relevant conceptions of conscious processing are addressed below, followed by a hypothesized model of conscious awareness and processing in working memory.
1.4.1 Global workspace theory
GWT, proposed by Baars and colleagues (Baars et al. 2003; Baars and Franklin 2003, 2007), is quite possibly the most relevant model of conscious processing as it pertains to working memory. According to this model, there is a bidirectional information flow, with conscious processes influencing unconscious processes, and vice versa. The priming effects discussed above, where conscious information influences unconscious information (and vice versa), provide support for this. Within the workspace proposed by Baars, stimuli that are unconscious are processed on a local scale, with activation restricted to areas devoted to that
particular modality. For example, consider a sound made while you are sleeping. Most likely, unless you are in the lightest stages of sleep, this should not disturb you. Nevertheless, the brain is able to process this sound, despite the low level of arousal. However, this processing will only be present in neural regions directly responsible for hearing, namely primary areas in the temporal lobes, extending no farther (see Rees et al. 2002, for a review of similar findings regarding vision). Similar results providing support for local processing of unconscious stimuli have also been found for comatose patients (Kotchoubey et al. 2002), as well as those with blindsight (Cowey and Stoerig 1997).
When arousal levels are higher, and we have the capacity to be conscious of internal and external stimuli in our environment, the GWT predicts a different manner of processing. In contrast to local processing of unconscious stimuli, conscious stimuli evoke sufficient activation that not only affects local sensory areas, but also leads to overall global activation, particularly for frontal and parietal regions (e.g., Seth et al. 2005; Vuilleumier et al. 2001). The processing of conscious information allows for the convergence of various sensory systems and processing networks that would otherwise operate independently; this allows specialized networks to access other networks, coordinate cognitive operations, and combine various inputs into a bound whole. Thus, conscious inputs allow this proposed workspace to function. According to this theory, only one conscious input can dominate attention and cognitive processing in working memory at any given time; Baars notes that this is analogous to a spotlight focused on a region of a theater stage (Baars et al. 2003). The ‘spotlight’ of consciousness is focused and directed both by attention and executive control. 
Following Baars’s analogy, the remainder of the stage and the theater is considerably darker; thus, as attention and executive control are directed towards one particular item in consciousness, other items present in working memory or in long-term memory are consequently unconscious. Some unconscious items can exert effects on conscious processing – these are known as contexts. For example, structures in the parietal lobe, typically critical for recognizing object motion or location, are not critical for recognizing objects; however, these regions have spatially-oriented maps that can shape one’s perceptual experience of an object. In no case is this made clearer than in that of contralateral neglect: persons with damage to regions of parietal cortex experience the loss of half of their conscious visual field. Although the parietal cortex is not critical for object identification, it obviously plays a crucial role in one’s conscious visual experience, as neglect shows. Via attention, executive processing, and various contexts, conscious content is broadcast to the theater ‘audience’. In terms of neural activation, correlates of this one-item spotlight typically involve association areas of a specific sensory modality. Association areas contain fiber
projections to various regions of cortex, especially frontoparietal regions; thus, this can contribute to global activation of conscious input. Other sensory regions are not activated; it is likely that activation from conscious inputs mutually inhibits competing sensory inputs, keeping processing for these items within their respective regions.
This model fits well with activation-based models of working memory. One can see from the emphasis on a one-item spotlight of consciousness that Oberauer’s model is the most similar, due to his proposed one-item focus of attention. As mentioned above, however, the embedded-process model proposed by Cowan is not necessarily ruled out, as items within a four-item focus of attention may receive processing on the basis of priority; it is entirely possible that one attention-demanding item could receive a considerable share of available resources. These models treat the contents of memory as a workspace, where items may be activated, accessed, attended, chunked, and so on. Additionally, like these working memory models, GWT highlights the role of directed attention; items must receive attention to be conscious.
Despite the similarities between the two working memory models and GWT, which are a significant strength, this model of consciousness has one weakness of interest. Here, consciousness is assumed to be dichotomous: items are either conscious and receive a spotlight of attention, or they remain unconscious. This does not appear to follow from Baars’s theater analogy. Light not only shines on a particular object, but it also radiates out towards other items at a diminishing rate. Thus, shining an attentional spotlight on a given item not only helps one item remain in conscious awareness, but also ‘illuminates’ items in close proximity to a lesser degree. 
Proximal items are not completely hidden in darkness, yet are not as visible as an item in the spotlight; rather, they reside somewhere in the middle of these two extremes. Following this analogy, we assume that this middle area has a wide range of variance – an item may be barely visible, or may be relatively noticeable, depending on many conditions: how large the spotlight is, how much light something on stage receives, the dimensions of the theater, and so on. From this idea, it follows that while items proximal to an attended item are clearly not conscious, these items are not necessarily unconscious either. Instead, consciousness is more continuous, with these items existing on some middle plane between two extremes; how close a given item is to one of these extremes depends on many factors, such as how much attention a conscious item receives, whether attention is divided, and efficacy in executive control. An alternative taxonomy of consciousness has been described by Dehaene et al. (2006), and is discussed below, as this will have significant implications for how consciousness relates to our proposed model of working memory.
1.4.2 Dehaene’s taxonomy of conscious processing
As we mentioned above, we do not necessarily advocate the idea that consciousness is all-or-none. Rather, consciousness could exist on a continuum, with some items being conscious, some being unconscious, and many falling in the middle of these two extremes. The taxonomy proposed by Dehaene et al. (2006) implies a tripartite division of conscious processing, where stimuli are classified on the interactions of bottom-up processes, typically perception, and top-down processes, such as attention. Unconscious information and processing are ‘subliminal’ – here, bottom-up perceptual influences are not strong enough to provide sufficient activation that could allow an item the potential to enter consciousness. These items are completely inaccessible, and have limited neural activation beyond localized areas that process a given perceptual modality. Interestingly, this taxonomy allows subliminal information to receive attention, as this is presumed to have modulatory effects on processing. For example, allocating attention to a prime-target pair in a subliminal priming paradigm produces the desired priming effect; when that attention is unable to be allocated, the effect disappears. Similar results have been found in patients with blindsight by presenting a conscious cue for a target in the blind visual field. Although items still remain unseen to participants, allocating attention in the absence of strong perceptual processing amplifies subliminal processing effects; if these items cannot be attended, these effects will be less likely to occur. In contrast to subliminal processes, which have insufficient bottom-up stimulus strength, preconscious processes have enough perceptual stimulus strength, and sufficient neural processing, to become activated. But these items are not in conscious awareness, as there is no top-down allocation of attention. 
Unlike subliminal stimuli, which are not accessible at all, preconscious stimuli have the potential for conscious access, provided that attention is drawn towards these items. In terms of neural processing, preconscious activation is not as localized as processing for subliminal stimuli. Dehaene et al. (2006) propose that neural activation will spread beyond localized processing areas, into multiple sensorimotor regions; however, the lack of top-down processing keeps neural activation from becoming more global by extending to frontoparietal regions, a hallmark of processing items in conscious awareness.
As suggested, necessary conditions for conscious awareness include sufficient perceptual strength and sufficient allocation of attention. Items that meet these criteria are considered conscious in the proposed taxonomy; global neural activation occurs for these items via activation of critical frontal and parietal lobe regions. The concepts proposed here illustrate the various connections between
the strength of a stimulus, attention, and consciousness, and impose a three-part distinction of conscious processing instead of a dichotomy. This will have important implications for how conscious processing occurs, as described by the embedded-process model of working memory.
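The three-part distinction described in this section can be expressed as a small decision rule over bottom-up stimulus strength and top-down attention. The sketch below is only a reading of the prose above: the numeric strength scale and the threshold value are hypothetical, since the taxonomy is stated qualitatively.

```python
from enum import Enum


class Access(Enum):
    SUBLIMINAL = "subliminal"
    PRECONSCIOUS = "preconscious"
    CONSCIOUS = "conscious"


def classify(stimulus_strength: float, attended: bool,
             strength_threshold: float = 0.5) -> Access:
    """Classify a stimulus following the tripartite taxonomy.

    strength_threshold is a made-up cutoff standing in for 'sufficient
    bottom-up strength'; the source gives no numeric value.
    """
    if stimulus_strength < strength_threshold:
        # Attention may amplify subliminal effects, but in this scheme
        # it does not make the stimulus accessible.
        return Access.SUBLIMINAL
    return Access.CONSCIOUS if attended else Access.PRECONSCIOUS


label = classify(stimulus_strength=0.9, attended=False)  # Access.PRECONSCIOUS
```

Note that attention changes the outcome only when bottom-up strength is sufficient, which mirrors the claim that both conditions are necessary for conscious access.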
1.5 The embedded-process model and consciousness reconsidered
With a stronger understanding of what constitutes conscious and unconscious processing, as well as how these may operate within cognitive tasks and theoretical frameworks, we use the current section to propose how conscious and unconscious processing may operate within the embedded-process model outlined above. As was mentioned previously, this model of working memory contains three levels of nesting for items contained within memory. These levels of nesting require a framework of consciousness that is not dichotomous; for this reason, we frame processing of information in working memory in terms of the taxonomy proposed by Dehaene et al. (2006). Below, we discuss conscious, preconscious, and unconscious processing as they pertain to an embedded-process model of working memory.
1.5.1 Working memory and conscious processing
The embedded-process model described previously provides advantages over structural working memory models in clarifying our understanding of conscious and unconscious processing. This is due in part to an emphasis on how items are processed, rather than on the subcomponents that make up working memory. In the model we propose, items are more likely to enter into conscious awareness with increasing activation. Thus, provided that current task goals remain unchanged, items present within activated long-term memory have a higher probability of entering the focus of attention than do long-term memory representations that are not activated. On the other hand, if these goals are dynamic, activation of contents will also change rapidly as new items become task-relevant; an item’s current level of activation will no longer serve as a reliable index of its potential to enter consciousness. For information in memory that is already activated, the deciding factor in whether an item becomes conscious or not is allocation of attention.
In the current model, the focus of attention is considered an analogue of consciousness proper: items here have both enhanced activation and awareness, ensuring their reportability in cognitive tasks. Indeed, it is a rare event that items present within the focus are forgotten or are incorrectly recalled. These items are
not limited by time; as long as attention is maintained, these items will not be lost or deactivated. However, as mentioned previously, the focus of attention only holds around four items at any given time.
1.5.2 Preconscious processing
If conscious awareness finds its analogue within the contents of the focus of attention, this model describes unconscious processing through its treatment of long-term memory contents. We expect that processing of long-term contents differs, as activated long-term contents have a greater likelihood of access to conscious awareness under unchanging task conditions. Activated items that are not present in the focus of attention fit well with a description of information that undergoes the “preconscious” processing that Dehaene et al. (2006) discussed. Items that undergo this sort of processing have sufficient neural activation to be present in consciousness, but are buffered into unconscious stores because of a lack of top-down attentional allocation that is similar to the effortful directing of attention via the central executive. This implies that these items have the potential for entering conscious awareness, provided that they are given sufficient attention. Unlike items in the focus of attention, these items may be subject to time-based decay and interference, while their capacity is presumed to be limitless. Such limits seem intuitive. Preconscious contents have strong perceptual stimulus strength, but, due to a rapidly changing environment, such activation should decline with time (e.g., Sperling 1960). Attention not only causes these items to enter consciousness, but also stops the time-based decay of perceptual inputs.
Despite not being in conscious awareness, items in activated long-term memory are still processed in a way that we might call “behind the scenes” or “in the background”. Thus, these processes still exert influence over conscious processing in working memory without having the ability to be reported. For example, Hassin (2005) describes a study in which participants are shown matrices of empty or filled disks. 
Disks were presented in a sequential fashion in sets of five; participants were to recall each sequence as a test of working memory. A critical manipulation involved implicit rules that sets of disks could follow or fail to follow. In a rule set, locations and qualitative aspects of disks (i.e., filled vs. unfilled) followed an implicit rule. Control sets involved random presentations of disks and locations. The final sets were known as broken rule sets. Here, the first four items of a disk sequence followed an implicit rule, similar to rule sets; however, the final item within a set did not follow the inferred rule. Hassin’s rationale for the final presentation type was as follows: correct performance on rule sets involves an implicit extraction of the underlying rule. As items are being presented sequentially
at a rapid speed, the rule is highly unlikely to be consciously understood; indeed, all participants except one were unable to reconstruct a presented sequence rule in a following experiment. If such a rule is extracted implicitly, the final item in a broken rule set should be incorrectly recalled and should invoke a slower response; specifically, participants should recall the final item in a way that follows the underlying rule. Thus, systematic error patterns should result. Hassin found that reaction times for final items were significantly slower for broken rule sets than for rule or control sets. Additionally, rule sets had faster response times than control sets, providing support for unconscious rule extraction.
In the above example, the conscious portion of the task, namely many of the items present in the sequence, is stored and maintained in the focus of attention. Due to item limits in the focus, we expect that some items may also be present in activated long-term memory, linked to currently attended items. Any underlying rules are not critical for good task performance; one only needs to store and maintain a given sequence until it is to be recalled later. Relationships between the disks are irrelevant to the task. As these items are presented sequentially, we presume that local relationships between proximal items can be inferred; for example, a participant may be able to notice and recall that two subsequent disks are in consecutive grid locations. This can be a conscious process, but may quickly be moved outside of the focus as subsequent disks are presented. A global relationship between the full set of items (i.e., the underlying rule) may be unconsciously inferred and processed within activated long-term memory via multiple local relationships between consecutive disks. 
Additional processing may occur from allocation of resources within the central executive, which has been proposed to serve as one means of activating items in long-term memory (Baddeley 1996).
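The nesting described in Sections 1.5.1 and 1.5.2 can be sketched as a toy data structure: a capacity-limited focus of attention inside an unlimited set of activated items that decay unless attended. This is an illustration under stated assumptions, not an implementation of the embedded-process model. The decay rate, the activation threshold, and the rule that the oldest item leaves the focus first are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

FOCUS_CAPACITY = 4  # "around four items" in the text; exact value assumed


@dataclass
class EmbeddedProcessSketch:
    decay: float = 0.5       # hypothetical share of activation lost per step
    threshold: float = 0.1   # hypothetical level below which an item is inactive
    activation: dict = field(default_factory=dict)  # item -> activation level
    focus: list = field(default_factory=list)       # attended items, oldest first

    def attend(self, item):
        """Bring an item into the focus, displacing the oldest if full."""
        self.activation[item] = 1.0
        if item in self.focus:
            self.focus.remove(item)
        self.focus.append(item)
        if len(self.focus) > FOCUS_CAPACITY:
            self.focus.pop(0)  # displaced item stays activated, unattended

    def step(self):
        """Advance time: only items outside the focus lose activation."""
        for item in list(self.activation):
            if item in self.focus:
                continue
            self.activation[item] *= 1.0 - self.decay
            if self.activation[item] < self.threshold:
                del self.activation[item]

    def level(self, item):
        if item in self.focus:
            return "conscious"
        if item in self.activation:
            return "preconscious"
        return "inactive long-term memory"


model = EmbeddedProcessSketch()
for disk in ["d1", "d2", "d3", "d4", "d5"]:
    model.attend(disk)
status = model.level("d1")  # "preconscious": displaced from the focus
```

In this toy version an item displaced from the focus remains available for a few time steps before it returns to inactive long-term memory, which corresponds to the time-based decay the text attributes to preconscious contents.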
1.5.3 Unconscious processing
Using the embedded-process model as a guide, it appears that the loci of conscious and preconscious processing are the focus of attention and activated long-term memory, respectively. Less is known about the contents of long-term memory that are currently inactive. As with activated long-term memory, no item limits are present for these materials; all of the contents of memory are stored and maintained. As these items are not activated, duration limits are irrelevant, though we assume that items, once in long-term memory, can never be lost. However, contents currently within this portion of the model are less well understood than items within the other regions of the model, which depend on activation and/or attentional allocation. Going back to Dehaene’s taxonomy, we propose that items within
long-term memory that are not activated are truly subliminal. These items do not have sufficient stimulus strength to warrant activation, nor do they receive sufficient allocation of attention. They do not contribute to conscious or preconscious processing in any way. While Dehaene et al. (2006) proposed that subliminal processing can still contribute to cognitive operations, we are considerably less sure about this contribution. It is our hope that future studies will provide further understanding of how activated and inactivated items in long-term memory interact with each other, as well as how both contribute to processing of conscious and unconscious stimuli.
1.6 Conclusions
As a theoretical concept, working memory is inextricably tied to conscious phenomena. Keeping information accessible for short periods of time for various cognitive operations seems to imply that it must be held in conscious awareness. However, we suggest that working memory also necessarily accounts for processing which is not conscious. While not directly reportable or within awareness, both unconscious and preconscious processing influence behavior and performance on cognitive tasks, including those of priming and perception. Given the various models of working memory discussed above, we suggest that activation-based models, particularly those that include nested processes like the embedded-process models of Cowan (1988, 1999) and Oberauer (2002, 2005), are at an advantage in delineating conscious, preconscious, and subliminal processing. These models also fit relatively well with certain theories regarding conscious and unconscious phenomena (e.g., Baars et al. 2003; Dehaene et al. 2006).
Certainly, additional research is needed to gain a clearer understanding of the connections between working memory and conscious (or unconscious) processing. One research area that is of particular interest to us is how items in the different nested processes of working memory interact with each other. As mentioned above, preconscious contents within activated long-term memory may operate ‘behind the scenes’, potentially influencing conscious processing within the focus of attention, but it is unclear how such influence might be exerted. It is possible that unconscious contents may also interact with preconscious or conscious contents within working memory, and this warrants further exploration. Another issue of importance is whether processes per se can become part of consciousness (e.g., the process of selecting one object at the expense of another), or only elements or objects cast into the focus of attention as direct or indirect consequences of these processes. 
By using activation as a means to define various levels of consciousness in immediate memory, we hope that future studies will help us better understand the special phenomenon of consciousness and how it interplays with processing in working memory.
Acknowledgements
This work was completed with support from NIH Grant R01-HD-21338.
References
Allen, R. J., Baddeley, A. D. and Hitch, G. J. (2006). Is the binding of visual features in working memory resource-demanding? Journal of Experimental Psychology: General, 135, 298–313. Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press. Atkinson, R. C. and Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence and J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory (Vol. 2, pp. 89–195). New York: Academic Press. Baars, B. J. and Franklin, S. (2003). How conscious experience and working memory interact. Trends in Cognitive Sciences, 7, 166–172. Baars, B. J. and Franklin, S. (2007). An architectural model of conscious and unconscious brain functions: Global workspace theory and IDA. Neural Networks, 20, 955–961. Baars, B. J., Ramsøy, T. Z. and Laureys, S. (2003). Brain, conscious experience and the observing self. Trends in Neurosciences, 26, 671–675. Baddeley, A. D. (1986). Working memory. New York: Oxford University Press. Baddeley, A. D. (1996). Exploring the central executive. Quarterly Journal of Experimental Psychology, 49, 5–28. Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4, 417–423. Baddeley, A. D. and Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory (Vol. 8, pp. 47–89). New York: Academic Press. Baddeley, A. D., Papagno, C. and Vallar, G. (1988). When long-term learning depends on short-term storage. Journal of Memory and Language, 27, 586–595. Baddeley, A. D. and Warrington, E. K. (1970). Amnesia and the distinction between long- and short-term memory. Journal of Verbal Learning and Verbal Behavior, 9, 176–189. Balota, D. A. (1983). Automatic semantic activation and episodic memory encoding. Journal of Verbal Learning and Verbal Behavior, 22, 88–104.
Barrouillet, P., Bernardin, S., Portrat, S., Vergauwe, E. and Camos, V. (2007). Time and cognitive load in working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 570–585.
Amanda L. Gilchrist and Nelson Cowan
Bays, P. M. and Husain, M. (2008). Dynamic shifts of limited working memory resources in human vision. Science, 321, 851–854. Beilock, S. L. and Carr, T. H. (2005). When high-powered people fail: Working memory and “choking under pressure” in math. Psychological Science, 16, 101–105. Broadbent, D. E. (1958). Perception and communication. London: Pergamon Press. Chen, Z. and Cowan, N. (2005). Chunk limits and length limits in immediate recall: A reconciliation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1235–1249. Chen, Z. and Cowan, N. (2009a). How verbal memory loads consume attention. Memory and Cognition, 37, 829–836. Chen, Z. and Cowan, N. (2009b). Core verbal working memory capacity: The limit in words retained without covert articulation. Quarterly Journal of Experimental Psychology, 62, 1420–1429. Chong, S. C. and Treisman, A. (2005). Statistical processing: Computing the average size in perceptual groups. Vision Research, 45, 891–900. Colflesh, G. J. H. and Conway, A. R. A. (2007). Individual differences in working memory capacity and divided attention in dichotic listening. Psychonomic Bulletin and Review, 14, 699–703. Conway, A. R. A., Cowan, N. and Bunting, M. F. (2001). The cocktail party phenomenon revisited: The importance of working memory capacity. Psychonomic Bulletin and Review, 8, 331–335. Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104, 163–191. Cowan, N. (1995). Attention and memory: An integrated framework. Oxford Psychology Series, No. 26. New York: Oxford University Press. Cowan, N. (1999). An embedded-process model of working memory. In A. Miyake and P. Shah (Eds.), Models of working memory: Mechanisms of active maintenance and executive control (pp. 62–101). Cambridge, U.K.: Cambridge University Press. Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24, 87–185. Cowan, N. (2005). Working memory capacity. New York: Psychology Press. Cowan, N., Elliott, E. M., Saults, J. S., Morey, C. C., Mattox, C., Hismjatullina, A. and Conway, A. R. A. (2005). On the capacity of attention: Its estimation and its role in working memory and cognitive aptitudes. Cognitive Psychology, 51, 42–100. Cowan, N. and Morey, C. C. (2007). How can dual-task working memory retention limits be investigated? Psychological Science, 18, 686–688. Cowan, N., Morey, C. C., AuBuchon, A. M., Zwilling, C. E. and Gilchrist, A. L. (2010). Seven-year-olds allocate attention like adults unless working memory is overloaded. Developmental Science, 13, 120–133. Cowan, N., Naveh-Benjamin, M., Kilb, A. and Saults, J. S. (2006). Life-span development of visual working memory: When is feature binding difficult? Developmental Psychology, 42, 1089–1102. Dehaene, S., Changeux, J.-P., Naccache, L., Sackur, J. and Sergent, C. (2006). Conscious, preconscious, and subliminal processing: A testable taxonomy. Trends in Cognitive Sciences, 10, 205–211.
Engle, R. W., Tuholski, S. W., Laughlin, J. E. and Conway, A. R. A. (1999). Working memory, short-term memory, and general fluid intelligence: A latent-variable approach. Journal of Experimental Psychology: General, 128, 309–331. Gilchrist, A. L., Cowan, N. and Naveh-Benjamin, M. (2008). Working memory capacity for spoken sentences decreases with adult aging: Recall of fewer, but not smaller chunks in older adults. Memory, 16, 773–787. Gilchrist, A. L., Cowan, N. and Naveh-Benjamin, M. (2009). Investigating the childhood development of working memory using sentences: New evidence for the growth of chunk capacity. Journal of Experimental Child Psychology, 104, 252–265. Gold, J. M., Fuller, R. L., Robinson, B. M., McMahon, R. P., Braun, E. L. and Luck, S. J. (2006). Intact attentional control of working memory encoding in schizophrenia. Journal of Abnormal Psychology, 115, 658–673. Hassin, R. R. (2005). Non-conscious control and implicit working memory. In R. R. Hassin, J. S. Uleman and J. A. Bargh (Eds.), The new unconscious (pp. 196–224). New York: Oxford University Press. Hebb, D. O. (1949). Organization of behavior. New York: Wiley. Jacoby, L. L. (1991). A process dissociation framework: Separating automatic from intentional uses of memory. Journal of Memory and Language, 30, 513–541. James, W. (1890). The principles of psychology. NY: Henry Holt. Jefferies, E., Ralph, M. A. L. and Baddeley, A. D. (2004). Automatic and controlled processing in sentence recall: The role of long-term and working memory. Journal of Memory and Language, 51, 623–643. Johnston, W. A. and Heinz, S. P. (1978). Flexibility and capacity demands of attention. Journal of Experimental Psychology: General, 107, 420–435. Kane, M. J. and Engle, R. W. (2003). Working-memory capacity and the control of attention: The contributions of goal neglect, response competition, and task set to Stroop interference. Journal of Experimental Psychology: General, 132, 47–70. 
Kotchoubey, B., Lang, S., Bostanov, V. and Birbaumer, N. (2002). Is there a mind? Electrophysiology of unconscious patients. News in Physiological Science, 17, 38–42. LaBerge, D. and Brown, V. (1989). Theory of attentional operations in shape identification. Psychological Review, 96, 101–124. Luck, S. J. and Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279–281. Lustig, C., Hasher, L. and Zacks, R. T. (2007). Inhibitory deficit theory: Recent developments in a “new view.” In D. S. Gorfein and C. M. MacLeod (Eds.), Inhibition in cognition (pp. 145–162). Washington, D.C.: American Psychological Association. Marcel, A. J. (1983). Conscious and unconscious perception: Experiments on visual masking and word recognition. Cognitive Psychology, 15, 197–237. McNab, F. and Klingberg, T. (2008). Prefrontal cortex and basal ganglia control access to working memory. Nature Neuroscience, 11, 103–107. Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97. Miller, G. A., Galanter, E. and Pribram, K. H. (1960). Plans and the structure of behavior. New York: Holt, Rinehart and Winston, Inc. Miyake, A., Friedman, N. P., Rettinger, D. A., Shah, P. and Hegarty, M. (2001). How are visuospatial working memory, executive functioning, and spatial abilities related? A latent variable analysis. Journal of Experimental Psychology: General, 130, 621–640.
Miyake, A. and Shah, P. (Eds.). (1999). Models of working memory: Mechanisms of active maintenance and executive control. Cambridge, U.K.: Cambridge University Press. Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. Quarterly Journal of Experimental Psychology, 11, 56–60. Morey, C. C. and Cowan, N. (2004). When visual and verbal memories compete: Evidence of cross-domain limits in working memory. Psychonomic Bulletin and Review, 11, 296–301. Morey, R. D., Rouder, J. N. and Speckman, P. L. (2008). A statistical model for discriminating between subliminal and near-liminal performance. Journal of Mathematical Psychology, 52, 21–36. Norman, D. A. (1968). Toward a theory of memory and attention. Psychological Review, 75, 522–536. Oberauer, K. (2002). Access to information in working memory: Exploring the focus of attention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 28, 411–421. Oberauer, K. (2005). Control of the contents of working memory – A comparison of two paradigms and two age groups. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 714–728. Rees, G., Kreiman, G. and Koch, C. (2002). Neural correlates of consciousness in humans. Nature Reviews, 3, 261–270. Rouder, J. N., Morey, R. D., Cowan, N., Zwilling, C. E., Morey, C. C. and Pratte, M. S. (2008). An assessment of fixed-capacity models in visual working memory. Proceedings of the National Academy of Sciences of the United States of America, 105, 5975–5979. Saults, J. S. and Cowan, N. (2007). A central capacity limit to the simultaneous storage of visual and auditory arrays in working memory. Journal of Experimental Psychology: General, 136, 663–684. Seth, A. K., Baars, B. J. and Edelman, D. B. (2005). Criteria for consciousness in humans and other mammals. Consciousness and Cognition, 14, 119–139. Simons, D. J. and Rensink, R. (2005). Change blindness: Past, present, and future. 
Trends in Cognitive Sciences, 9, 16–20. Snodgrass, M., Bernat, E. and Shevrin, H. (2004). Unconscious perception: A model-based approach to method and evidence. Perception and Psychophysics, 66, 846–867. Snodgrass, M. and Shevrin, H. (2006). Unconscious inhibition and facilitation at the objective detection threshold: Replicable and qualitatively different unconscious perceptual effects. Cognition, 101, 43–79. Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74 (Whole No. 498). Stevanovski, B. and Jolicoeur, P. (2007). Visual short-term memory: Central capacity limitations in short-term consolidation. Visual Cognition, 15, 532–563. Treisman, A. M. (1964b). Selective attention in man. British Medical Bulletin, 20, 12–16. Treisman, A. M. and Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97–136. VanRullen, R. and Koch, C. (2003). Is perception discrete or continuous? Trends in Cognitive Sciences, 7, 207–213. Vogel, E. K., McCollough, A. W. and Machizawa, M. G. (2005). Neural measures reveal individual differences in controlling access to working memory. Nature, 438, 500–503.
Vuilleumier, P., Sagiv, N., Hazeltine, E., Poldrack, R., Swick, D., Rafal, R. D. and Gabrieli, J. D. E. (2001). Neural fate of seen and unseen faces in visuospatial neglect: A combined event-related functional MRI and event-related potential study. Proceedings of the National Academy of Sciences, 98, 3495–3500. Wood, N. and Cowan, N. (1995a). The cocktail party phenomenon revisited: How frequent are attention shifts to one’s name in an irrelevant auditory channel? Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 255–260. Wood, N. and Cowan, N. (1995b). The cocktail party phenomenon revisited: Attention and memory in the classic selective listening procedure of Cherry (1953). Journal of Experimental Psychology: General, 124, 243–262. Wood, N. L., Stadler, M. A. and Cowan, N. (1997). Is there implicit memory without attention? A re-examination of task demands in Eich’s (1984) procedure. Memory and Cognition, 25, 772–779. Zhang, W. and Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453, 233–235.
chapter 2
Markers of awareness? EEG potentials evoked by faint and masked events, with special reference to the “attentional blink”
Rolf Verleger
University of Lübeck, Germany
2.1 Paradigms
This review starts with a brief reminder of the pioneering research on the detection of auditory signals and then focuses on two more recent lines of research: (1) detection and identification of faint or masked visual stimuli, and (2) the attentional blink. Other paradigms that have to be skipped for reasons of space include binocular rivalry (e.g., Roeber et al. 2008), change blindness (e.g., Eimer and Mazza 2005; Fernandez-Duque et al. 2003; Koivisto and Revonsuo 2003; Schankin and Wascher 2007, 2008; Turatto et al. 2002; see Czigler this volume), and perception of ambiguous figures (e.g., Keil et al. 1999; Kornmeier and Bach 2004; Smith et al. 2006).
2.2 Detection of auditory signals
It is easily overlooked in present-day discussions that brain correlates of awareness were already studied at the very beginning of ERP research. This early approach used the paradigm of detection of faint auditory signals, starting with Hillyard et al.’s (1971) Science paper on “Evoked potential correlates of auditory signal detection”. The paradigm is illustrated in Figure 1. A warning light was or was not followed 0.5 s later by a brief tone. Prompted by another light 1 s later, participants had to indicate whether or not a tone had been present. Tone intensity was varied across blocks, ranging from low intensities at which participants’ responses were at chance level to clearly detectable intensities.
Figure 1. Auditory Detection Task of Hillyard et al. (1971). See text for description.
This study and the following series of papers established that the principal correlate of the subjective, conscious decision that some signal was present was the P3 component (Kerkhof and Uhlenbroek 1981; Parasuraman and Beatty 1980; Parasuraman et al. 1982; Paul and Sutton 1972; Ruchkin et al. 1980; K. Squires et al. 1973, 1975a, 1975b; N. Squires et al. 1978; Sutton et al. 1982; Wilkinson and Seales 1978). A corresponding approach with somatosensory stimuli was pioneered by Desmedt et al. as early as 1965, in the very first description of the P3 component (in parallel to Sutton et al. 1965). However, this approach did not have immediate impact, having been published as a monograph in French. Covering the literature on somatosensory stimuli would exceed the limits of this contribution. It may suffice to mention two recent papers that focused on the earliest signs of conscious feelings. Smid et al. (2004) described an early N1-type N60 that distinguished between detected and undetected small passive movements of the lower right leg. Schubert et al. (2006) masked a faint electrical pulse to the left index finger by a strong pulse to the right index finger, and obtained enhanced P100 and N140 amplitudes when the faint stimuli were detected compared to when they remained undetected.
2.3 P3 evoked by unidentified visual stimuli?
When ERPs became available in the sixties, subliminal visual perception had become a topic of general interest (e.g., Dixon 1971). Therefore, some of the earliest applications of ERPs covered this topic, using faint and briefly presented stimuli, either masked by following stimuli (Donchin and Lindsley 1965; Schwartz and Pritchard 1981) or not (Shevrin and Fritzler 1968a, 1968b). A number of findings were obtained (see Shevrin 2001, for a review on his work). But these studies were affected by several shortcomings arising from circumstances prevailing at that time, of which probably the most important one is that ERPs were not recorded at lateral posterior sites (P7, P8, PO7, PO8) where, as was later found, most evoked visual components are largest and are most sensitive to experimental variation. (Cf. Rossion and Jacques 2008, for the continuing relevance of this point.)
Figure 2. Stimulus Series in Brázdil et al. (1998). A portion of the stimulus series is depicted as an example. The interval between stimuli varied between 2 s and 5 s. Participants had to press a button in response to every X. The X targets occurred with a probability of 1/5 among the frequent O nontargets. Half the stimuli were well visible (denoted by black appearance), being presented for 200 ms. The other half were hardly visible (grey appearance), being presented for 10 ms.
Of the more recent studies, a few claimed that a small P3 component, though certainly smaller than the P3 evoked by well visible stimuli, could be obtained with infrequent stimuli presented among frequent stimuli, even though all these stimuli were so faint and so briefly presented that they could not be distinguished (Brázdil et al. 1998; Bernat et al. 2001). This finding was even made in intracranial recordings (Brázdil et al. 2001). These results bolstered the claim that some type of cognitive discrimination, reflected by P3, may be made entirely non-consciously. They are thereby in direct conflict with the notion that P3 is a correlate of conscious decision (cf. above, on auditory signal detection, and below). However, identification performance might not have been completely at chance in those studies. The paradigm used by Brázdil et al. (1998, 2001) is depicted in Figure 2. Brázdil et al. (2001) reported that, in absolute numbers, correct hits to the faint signals (brief “X”) occurred only 2/3 as often as false alarms to faint non-signals (brief “O”), and concluded that the number of hits remained well below chance. However, participants knew very well that the O was presented five times more often than the X, because the faint stimuli were intermingled with clearly visible stimuli. So the ratio of the numbers of hits / false alarms has to be multiplied by five to obtain the ratio of the rates of hits / false alarms, yielding 2/3 * 5 = 3.3. Thus, correct responses were 3.3 times more probable than false alarms, which is far above chance. A similar consideration may apply to Brázdil et al. (1998), where numbers of false alarms were not reported.
Figure 3 illustrates the paradigm used by Bernat et al. (2001). In their study, the words LEFT or RIGHT were presented for 1 ms, one of these words occurring with a frequency of 20%, the other with 80%. Participants’ ERPs included a slow positivity of about 0.5 µV mean amplitude in response to the 20% stimuli, significantly larger than in response to the 80% stimuli. Since participants were not able to consciously discriminate these words from blank fields in a separate block, Bernat et al. concluded that the slow positivity reflected subconscious discrimination between the rare and the frequent stimuli.
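The base-rate correction applied above to the counts of Brázdil et al. (2001) can be sketched in a few lines. This is only an illustration: the function name and the absolute counts are hypothetical and chosen solely to reproduce the two facts stated in the text, namely that hits occurred 2/3 as often as false alarms and that, as the text assumes, O stimuli were five times as frequent as X stimuli.

```python
def hit_to_false_alarm_rate_ratio(hits, false_alarms, n_targets, n_nontargets):
    """Return (hit rate) / (false-alarm rate), correcting raw counts for base rates."""
    if min(hits, false_alarms, n_targets, n_nontargets) <= 0:
        raise ValueError("all counts must be positive")
    hit_rate = hits / n_targets                   # responses per presented X
    false_alarm_rate = false_alarms / n_nontargets  # responses per presented O
    return hit_rate / false_alarm_rate


# Hypothetical counts (not taken from the study): 20 hits vs. 30 false alarms
# gives the reported 2/3 ratio; 100 X vs. 500 O gives the assumed 1:5 base rate.
ratio = hit_to_false_alarm_rate_ratio(hits=20, false_alarms=30,
                                      n_targets=100, n_nontargets=500)
print(round(ratio, 1))  # 3.3
```

Dividing each count by the number of opportunities is what turns a raw ratio below 1 into a rate ratio well above 1.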
Figure 3. Trial in Bernat et al. (2001). A sound provided the start signal. In response, participants had to indicate the trial number (“1, ready” until “300, ready”). Then, the word LEFT or RIGHT, with probabilities of 80/20, was presented for 1 ms. After 1 s, a double sound indicated the end of the trial. The oddball (20% probability) stimulus was presented in the 3rd, 4th, or 5th trial of a block of five trials.
But infrequent and frequent stimuli were not presented in completely random order. Rather, within successive blocks of five consecutive trials, the oddball stimulus was presented in the 3rd, 4th, or 5th trial (Verleger 2001). Thus, a critical question is whether participants were aware of the fact that stimuli were presented in this block structure of five (Bernat 2001). If so, the reported effect might have been due to a difference between earlier and later positions within the fixed order of five stimuli rather than to a difference between subconsciously perceived rare and frequent stimuli. Thus, these findings remain controversial.
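Why the block structure matters can be made concrete with a small calculation. The sketch below assumes one oddball per block of five, placed with equal probability on the 3rd, 4th, or 5th trial; the equal-probability assumption and the function name are illustrative and are not stated in the original study.

```python
from fractions import Fraction


def oddball_probability_by_position(block_length=5, oddball_positions=(3, 4, 5)):
    """Probability that the trial at each position within a block is the oddball.

    Assumes exactly one oddball per block, equally likely at each allowed position.
    """
    share = Fraction(1, len(oddball_positions))
    return {
        position: (share if position in oddball_positions else Fraction(0))
        for position in range(1, block_length + 1)
    }


probabilities = oddball_probability_by_position()
overall = sum(probabilities.values()) / len(probabilities)
print(probabilities[1], probabilities[3], overall)  # 0 1/3 1/5
```

Under these assumptions the overall oddball probability is the nominal 20%, yet trial position alone separates trials that can never be oddballs (positions 1 and 2) from trials that are oddballs one time in three, so an ERP difference between rare and frequent stimuli is confounded with position in the block.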
2.4 ERP signature of conscious identification of faint or masked visual stimuli
This section will focus on studies that searched for the neural correlate of consciousness, defined as the difference between identified and unidentified visual stimuli.
2.4.1 The P3 component
As a main finding, as in the above-mentioned auditory studies, the P3 component was present for identified stimuli and was absent, or at least much reduced, for unidentified stimuli. This finding is illustrated in Figures 4, 5, and 6 for the studies by Pins and ffytche (2003), Ergenoğlu et al. (2004), and Koivisto et al. (2005).
Figure 4. Study of Pins and ffytche (2003). Top: Trial structure. The grating was shifted across trials within a 450 ms interval. Its duration was continuously adapted such that it was perceived in 50% of the trials. Bottom: Grand means of recordings from O2 vs. linked mastoids. “Yes” trials where participants reported seeing the grating are in black, “no” trials in dark grey. Catch trials without any grating (used in 4 of the 5 participants) are in light grey (together with the dashed lines as a measure of their variability). Note: 0 ms denotes warning-tone onset in the upper graph but denotes onset of the grating in the lower graph. “Yes” trials differed from “no” trials by the P3 component as well as by the earlier P1 and N2 components (see text). (The lower part of the figure was adapted with permission from Fig. 6 of Pins and ffytche 2003.)
2.4.2 Earlier components: P1
Pins and ffytche (2003) even obtained a very early difference (Figure 4): Gratings that were detected elicited a P1 component at occipital sites at 100 ms, whereas undetected gratings did not. The authors speculated that this P1 effect was a “primary correlate of consciousness” (p. 473), unlike the following effects on N2 and P3 that were assumed to represent “downstream secondary processes” only (ibid.), not contributing directly to perception. This far-reaching interpretation of the P1 effect does not appear to be justified, though, because increases of visual P1 have often been obtained with undetected changes of visual input (Kimura et al. 2006, 2008; cf. the chapter by Czigler in this volume). Thus, the P1 effect, if replicable, may just indicate that some change was noticed by the visual system, without any consciousness involved. Such notice of change might indeed be a necessary prerequisite for conscious identification, but in view of Kimura et al.’s (2006, 2008) results not a sufficient one.
Figure 5. Study of Ergenoğlu et al. (2004). Top: Trial structure. A faint square was briefly projected at perceptual threshold. Participants pressed a button when they perceived the square. Bottom: Grand means of recordings (referred to linked earlobes). Trials with detected stimuli (bold) are distinguished by a marked P3 component. (The EEG grand means are reprinted with permission from Fig. 1 in Ergenoğlu et al. 2004.)
Figure 6. Study of Koivisto et al. (2005). Top: Trial structure. The letters H, T, or U were briefly presented and followed by a mask. One of the letters was the target. Bottom: Grand average recording for the target letter when detected (bold line) and when undetected (dashed line) from P7 (= T5; left panel) and from P3 (right panel), referred to nose. Detected targets evoked larger N2 and P3 components than undetected targets (which did evoke these components too, in contrast to the tasks depicted in Figs. 4 and 5, because at least the mask was clearly visible).
2.4.3 N2-type components
There are a number of reports on ERP differences between identified and unidentified stimuli after this very early P1 effect and before the late P3 effect. It is of some interest whether such differences reflect “downstream processes” only. Even if such components are “not contributing directly to perception” (Pins and ffytche 2003, p. 473), they might nevertheless reflect processes necessary to make the subject become aware of the result of perception. There is some evidence for a negative component that distinguishes between identified and unidentified stimuli. Pins and ffytche (2003) obtained an N260 (Figure 4), with its maximum probably at P7 (their Fig. 6; unfortunately, the authors displayed only maps of the distribution of current source densities, in which the P7 site, being situated at the margin of the montage, no longer appears). A similar negative difference with maximum at about 250 ms, again probably largest at P7 (no complete topographic data were reported), was obtained by Koivisto et al. (2005, Exp. 2; see Figure 6). Differing between detected and undetected target letters, this N250 was labeled “visual awareness negativity” by Koivisto et al. However, a similar (though smaller) difference was obtained in their Exp. 1 between target and nontarget letters when both were unidentified (being followed by a mask after 33 ms). Therefore, it is not plausible that N250 was a direct correlate of awareness in these experiments; rather, it was a correlate of selection of the relevant target. Indeed, the authors termed this N250 effect “selection negativity” when it occurred with unidentified stimuli. (There was a negative potential overlapping this selection negativity, as a difference between perceived and unperceived letters, in Koivisto et al.’s Exp. 1, which the authors termed “awareness negativity”. However, this effect might well be confounded by potentials evoked by the mask because mask timing systematically differed between perceived and unperceived letters. This confound did not occur in their Exp. 2, which was therefore selected here for report.)
Ojanen et al. (2003) reported a sharp negative potential at about 450 ms after onset of briefly presented pictures that were successfully rated as intact or scrambled (Figure 7). Being absent after pictures where participants could not tell the difference between intact and scrambled, this potential was again called “visual awareness negativity”. However, this “VAN” differs from Koivisto et al.’s (2005) VAN by its much later latency (after the P3 vs. before P3), by its topography (largest at Cz and entirely absent at posterior sites vs. largest at posterior sites) and by its sharp spiky waveform. Furthermore, the difference was confounded by physical factors because most of the identified pictures had high contrast and most of the unidentified pictures had “middle”, i.e., lower, contrast. Thus, the status of this sharp Cz-focused N450 is unclear. Of interest, no difference between identified and unidentified pictures was obtained for P3.
Possibly, the strict high-pass filter (0.3 s time constant) had filtered out much of the P3 component (Duncan-Johnson and Donchin 1979).
Dehaene et al. (2001) measured ERPs to briefly presented words that were embedded in masking streams of patterns of small quadrangles. Words were visible when the screen briefly (70 ms) remained blank before and after the word. Visible words elicited larger potentials than masked words, including P1 (like Pins and ffytche 2003), posterior N250 (like Pins and ffytche 2003, and Koivisto et al. 2005), central N340 (possibly word-specific), and P3. Unfortunately, no waveforms were shown in that paper, so it cannot be determined to what extent some of these effects might be due to different overlap of the ERPs evoked by the preceding and following blank vs. mask screens.
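To see why a 0.3 s time constant counts as a strict high-pass filter for a slow component such as P3, the time constant can be converted to a cutoff frequency. The sketch below assumes a first-order (RC-type) high-pass filter, for which the -3 dB cutoff is 1 / (2πτ); the filter order actually used in the study is not stated here, and the function name is illustrative.

```python
import math


def highpass_cutoff_hz(time_constant_s):
    """Return the -3 dB cutoff (Hz) of a first-order high-pass filter with time constant tau."""
    if time_constant_s <= 0:
        raise ValueError("time constant must be positive")
    return 1.0 / (2.0 * math.pi * time_constant_s)


cutoff = highpass_cutoff_hz(0.3)
print(f"{cutoff:.2f} Hz")  # 0.53 Hz
```

Under this assumption, activity slower than roughly half a cycle per second is attenuated, which is the frequency range in which a broad, slow positivity carries much of its energy.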
Figure 7. Study of Ojanen et al. (2003). Top: Coherent and scrambled pictures were presented for 27 ms at high, middle, or low contrast, in random order. Bottom: Grand averages (referred to linked mastoids). Solid lines denote trials in which participants were able to classify the pictures as intact or scrambled, dashed lines denote trials where the “don’t know” response was made. (This figure is compiled with permission from Fig. 1 and Fig. 2 of Ojanen et al. 2003.)
2.4.4 N2pc
A way to deal with the overlap of potentials evoked by masked and masking stimuli is to present task-relevant stimuli on one side of fixation and irrelevant stimuli on the other side. Then, an “N2pc” may be recorded in the difference potential between left and right lateral posterior sites: more negativity contralaterally to relevant stimuli than ipsilaterally, with a peak at about 250 ms after stimulus onset (Eimer 1996; Hopf et al. 2006; Luck and Hillyard 1994; Wascher and Wauschkuhn 1996). N2pc may sensitively disentangle participants’ selection of masked and masking stimuli, as is illustrated in Figure 8 and will be detailed in the following (data from Jaśkowski et al. 2002; cf. Verleger and Jaśkowski 2007). The paradigm is described in the legend of Figure 8. Conventional ERPs are shown in the upper ERP panel of Figure 8. They are pooled across trials with left or right side of the target shape and are recorded against a common reference (the nose-tip). The first stimulus pair, presented at 0 ms, evoked P1 and N1 components, no matter whether this pair was visible or not (167 ms vs. 83 ms SOA). The N1 component probably included contributions from the second stimulus, presented at 83 ms or 167 ms. Critically, the first notable distinction between compatible and incompatible trials (i.e., trials where the target shapes were at the same or at different positions in masked and visible stimulus pairs) was the N2 component, which was much larger in incompatible trials. This N2 was most probably evoked by the visible stimuli, peaking at the fixed latency of approximately 290 ms after onset of the visible stimulus with both 83 ms and 167 ms SOA (i.e., at 370 ms and 455 ms after onset of the masked stimulus). Therefore, this peak is denoted “N2(visible)” in Figure 8. But components evoked by the masked stimuli are not readily seen in these conventional ERPs. To obtain the lower ERP panel of Figure 8, trials were sorted according to side of the target shape in the visible stimulus, and the difference contralateral minus ipsilateral to the target was formed separately for left-side-target trials (P8-P7, i.e., right minus left site) and right-side-target trials (P7-P8). N2pcs, termed “N2pc(visible)”, are evident in the lower panel at the latencies of N2 in the upper panel, which shows that N2 has a larger contribution from the cortex contralateral to the target shape than from the ipsilateral one. Of importance, this effect is preceded by a divergence between contralateral and ipsilateral trials, peaking at 280 ms, which is evidently the N2pc evoked by the target shape in the masked stimulus, “N2pc(masked)” in Figure 8, pointing upwards or downwards, depending on side of the target shape in the masked stimulus (same side as the target in the visible stimulus triggers an upward deflection, opposite side a downward deflection).
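The contralateral-minus-ipsilateral difference described above can be sketched as follows. The function and argument names are hypothetical, each argument is assumed to be an averaged waveform (one amplitude in µV per time sample) from the named site and target side, and the final averaging of the two difference waves is a common convention added here for illustration, not a step reported in the text.

```python
def n2pc_difference_waves(p7_left_target, p8_left_target,
                          p7_right_target, p8_right_target):
    """Return (left-target, right-target, mean) contralateral-minus-ipsilateral waves.

    Left-side targets: contralateral site is P8, so the difference is P8 - P7.
    Right-side targets: contralateral site is P7, so the difference is P7 - P8.
    """
    waves = (p7_left_target, p8_left_target, p7_right_target, p8_right_target)
    if len({len(wave) for wave in waves}) != 1:
        raise ValueError("all waveforms must have the same number of samples")
    left = [p8 - p7 for p7, p8 in zip(p7_left_target, p8_left_target)]
    right = [p7 - p8 for p7, p8 in zip(p7_right_target, p8_right_target)]
    mean = [(a + b) / 2 for a, b in zip(left, right)]
    return left, right, mean


# Toy three-sample waveforms (hypothetical values in microvolts).
left, right, mean = n2pc_difference_waves(
    p7_left_target=[0.0, -1.0, -2.0], p8_left_target=[0.0, -3.0, -2.0],
    p7_right_target=[0.0, -4.0, -1.0], p8_right_target=[0.0, -2.0, -1.0],
)
print(mean)  # [0.0, -2.0, 0.0]
```

Because activity that is equal at both sites cancels in the subtraction, the difference wave retains only the lateralized part of the response, which is why it can separate the masked from the masking stimulus.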
Markers of awareness?
Figure 8. Data from Jaśkowski et al. (2002, Exp. 1). Top: A first pair of stimuli, left and right from fixation, was followed at the same screen positions by a second, larger pair, with stimulus-onset asynchrony of either 83 ms or 167 ms. This caused metacontrast masking of the first pair, more or less completely (83 ms) or incompletely (167 ms). Left/right choice responses had to be made according to position of the target in the visible stimulus. (Target was diamond or square, alternating between subjects). The masked stimuli could be compatible or incompatible with the visible stimuli (or neutral, not displayed here). See text for description of the ERP results. Bold lines are from trials with 167 ms SOA, thin lines from 83 ms SOA. Black lines denote compatible trials, grey lines incompatible trials.
48 Rolf Verleger
Of interest, this N2pc was evoked only with the 167 ms SOA, when the masked stimulus remained visible owing to its relatively long interval before the masking stimulus, but was insignificant with the 83 ms SOA, when the masked stimulus could not be identified. Thus, this N2pc(masked) distinguished between stimuli that were consciously perceived and stimuli that were not. Nevertheless, Jaśkowski et al. (2002) were reluctant to consider N2pc (termed PCN in that paper) as a direct reflection of awareness, and wrote: “We do not want to suggest that PCN is a direct reflection of conscious perception, rather it might reflect a process necessary for conscious perception, namely that attention be directed to the stimulus” (p. 53). Indeed, appreciable N2pc potentials were evoked by masked unidentified stimuli in the studies of Jaśkowski et al. (2003) and Woodman and Luck (2003), which makes it difficult to associate N2pc with awareness.
2.5 ERP signature of conscious identification of T2 in the Attentional Blink Paradigm
A special case of masked visual stimuli is provided by the “Attentional Blink” paradigm. Two target stimuli (“T1” and “T2”) have to be identified within a rapid stream of stimulation. Identification of T2 is compromised by the necessity to identify T1 in a time-dependent manner: With stimuli rapidly presented at a rate of 10/second (see Potter 2006, for some variation of this presentation rate) T2 is identified well if presented directly after T1 (“lag-1-sparing”), worst at an intermediate interval of 200–300 ms after T1, and then again better when presented at longer lags. Examples are given in Figures 9a and 10a. Due to this temporal specificity, this impairment has been named a “blink”, i.e., a temporary blocking of input processing. Debate is ongoing whether this attentional blink is due to capacity limitations of working memory (Dell’Acqua et al. 2009) or due to T1-induced over-attentive processing of distractors (Olivers 2007) or due to problems in shifting to the new search criterion needed for detecting T2 (Di Lollo et al. 2005; Nieuwenstein and Potter 2006). This debate notwithstanding, it is evident that the attentional blink is mainly caused by interference in cognitive processing rather than by perceptual masking, because T2 usually can be well reported in the very same sequence of stimuli if the instruction says that only T2 has to be reported and T1 may be ignored (see Figures 9b and 10b). Because of the fast regular sequence of stimuli, a priori the early P1 and N1 components evoked by T2 are poor indicators of perception because they undergo habituation due to the preceding sequence of stimuli. Moreover, all components are affected by overlap of potentials evoked by the fast series of consecutive
stimuli, similarly to the above-described sequences of masked and masking stimuli. To deal with the problem of overlap, ERP effects have often been measured in difference potentials with some appropriate control condition being subtracted, e.g. in Figure 9c.
Figure 9. “Attentional blink” in rapid serial visual presentation: Paradigm and data from Vogel et al. (1998, Exp. 4). (a) Two targets, “T1” and “T2”, were presented within a series of rapidly (83 ms SOA) presented black letters. T1 was a black digit and T2 was a white letter, either an E (in 15% of trials) or another letter. Interspersed between T1 and T2 were 0, 2, or 6 stimuli, = “lags” 1, 3, or 7. (b) When both T1 and T2 had to be identified (solid lines) the E could be identified with chance probability only at lag 3. (Dashed line: Block where only T2 had to be identified). (c) Correspondingly, in the ERPs, there was no P3 elicited by the E in lag 3 trials (densely dashed line). Solid line: lag 1 trials. Dashed line: lag 7 trials. Time-point 0 ms is T2 onset. Shown are difference waveforms between the infrequent E trials and other trials. Recording is from Pz versus averaged mastoids. Being the infrequent target, the E was expected to evoke a P3, which it did in lag 1 and lag 7 trials but not in lag 3 trials. (Parts (b) and (c) of this figure were adapted with permission from Figs. 8 and 9 of Vogel et al. 1998.)
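The relation between “lag” and the T1-T2 onset interval in the paradigms of Figures 9 and 10 can be made explicit with a short sketch. It assumes a constant stimulus-onset asynchrony throughout the stream and uses the nominal SOA values given in the figure legends (83 ms and 100 ms); the function name is hypothetical.

```python
def t1_t2_interval_ms(lag, soa_ms):
    """T1-to-T2 onset interval for a given lag.

    Lag n means that n - 1 distractors are interspersed between
    T1 and T2, so T2 starts n stimulus-onset asynchronies after T1.
    Assumes a constant SOA across the whole stream.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return lag * soa_ms


# Nominal values from the legend of Figure 9 (lags 1, 3, 7; 83 ms SOA).
vogel_intervals = {lag: t1_t2_interval_ms(lag, 83) for lag in (1, 3, 7)}

# Nominal values from the legend of Figure 10 (lags 1, 2, 7; 100 ms SOA).
kranczioch_intervals = {lag: t1_t2_interval_ms(lag, 100) for lag in (1, 2, 7)}
```

With these nominal values the critical lags fall at 249 ms (lag 3, 83 ms SOA) and 200 ms (lag 2, 100 ms SOA), that is, within the 200–300 ms range in which T2 identification is worst.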
Figure 10. “Attentional blink” in rapid serial visual presentation: Paradigm and data from Kranczioch et al. (2003, 2007). (a) Two targets, “T1” and “T2”, were presented within a series of rapidly (100 ms SOA) presented black letters. T1 was a green letter (vowel or consonant) and T2 was an X, presented in 75% of the trials. Interspersed between T1 and T2 were 0, 1, or 6 stimuli, = “lags” 1, 2, or 7. (b) When both T1 and T2 had to be identified (open circles) the X was identified below chance probability at lag 2. (Black circles: Block where only T2 had to be identified). (c) and (d) ERPs in trials with correctly responded T2 (left) and in trials with missed T2 (right). Time-point 0 ms is T2 onset. (c) P3 effect: In the ERPs to correctly responded T2, P3 was smallest with lag 1 (black line) and largest with lag 7 (dark grey), with lag 2 (light grey) ranging in-between. In the ERPs evoked by missed T2, no distinct potentials are visible. Recording is pooled from Cz and neighboring electrodes versus averaged recordings. (d) N2 effect: At the shaded latency, there is a negative-going difference (upper black line) between detected T2 trials (dark line in the left panel) and trials without T2 (grey line). Recording is pooled from left parieto-occipital electrodes versus averaged recordings. (Parts (b) and (c) of this figure were adapted with permission from Kranczioch et al. 2003, part (d) from Kranczioch et al. 2007.)
2.5.1 The P3 component
Since the study of Luck et al. (1996) that measured ERPs in this paradigm for the first time (focusing on the N400 component, see below), several studies agreed that it is specifically the P3 component that is sensitive to success or failure to report T2. Two arguments support this notion. First, P3 is suppressed when T2 is presented at the critical position compared to longer lags (Vogel et al. 1998; Vogel and Luck 2002; Dell’Acqua et al. 2003; Akyürek et al. 2007; Sessa et al. 2007). One might argue that this suppression is due to the short T1-T2 distance of about 300 ms causing some habituation of the P3 generator. Of particular importance, therefore, is that P3 at the critical lag was suppressed even in comparison to the still shorter lag 1 (Vogel et al. 1998). Vogel et al.’s result is depicted in Figure 9. This latter result was, however, replicated only once, in the MEG study of Kessler et al. (2006), and there for right sources only. In contrast, Kranczioch et al. (2003), which is to my knowledge the only other ERP study that compared T2-evoked P3 between lag 1 and other lags, obtained an increase of P3 amplitudes across lags, from no P3 at all at lag 1 via middle-sized P3 at lag 2 (which is where the attentional blink was largest in their preceding behavioral study) to large amplitudes at lag 7. These data are depicted in the left half of part (c) of Figure 10. The second, more specific argument to support the notion that P3 is a correlate of the failure to report T2 is that, at the critical lag, T2-evoked P3 is suppressed in trials where T2 cannot be reported compared to trials where T2 is reported (Rolke et al. 2001; Kranczioch et al. 2003, 2007; Sergent et al. 2005; Martens et al. 2006a, 2006b; Pesciarelli et al. 2007; Koivisto and Revonsuo 2008). This result is illustrated by the comparison between left and right halves of Figure 10c.
Thus, the complete suppression of P3 amplitudes in Vogel et al.’s (1998) lag-3 data (Figure 9c) is perhaps a consequence of all trials being included in their averages, irrespective of T2 identification, so that their lag-3 data mainly consist of trials in which T2 was missed (cf. Figure 9b).
2.5.2 Earlier components: N2, P2, P1, N1
Some studies measured an earlier N2-type component in T2-minus-control difference waves. Indeed, this posterior-lateral, left-enhanced N270 was larger when T2 was identified than when it was not (Sergent et al. 2005; Kranczioch et al. 2007; Koivisto and Revonsuo 2008). The result is illustrated in Figure 10d with data from Kranczioch et al. (2007; same paradigm as used in Kranczioch et al. 2003, which is shown in the other parts of Figure 10). Sergent et al. (2005) additionally described a subsequent, more centrally located N300, which may also be vaguely perceived in the left panel of Figure 10c. A fronto-centrally maximal P250 was smaller when T2 was presented at the critical lag 3 than at other lags (Vogel and Luck 2002; Vogel et al. 1998), but did not differ between detected and undetected items (Kranczioch et al. 2003). The early P1 and N1 components, as far as measurable in the presence of habituation due to the fast presentation, did not differ between identified and unidentified T2 (Sergent et al. 2005) or between T2 positions (Vogel et al. 1998).
2.5.3 N400
Words are expected to evoke an N400 component (Lau et al. 2008). N400 is sensitive to priming: when an earlier presented word has a meaning to which the present word is related, N400 is reduced (Kutas and Hillyard 1980). Importantly in the present context, because this reduction of N400 for related words depends on the meaning of the two words, it indicates that both words were perceived to the point that their meaning was processed. Applying this rationale to the attentional blink, Luck et al. (1996; same data reported in Vogel et al. 1998) presented one word before the fast sequence of stimuli, as “T0” [my term, R.V.] and another word as T2 (amidst senseless letter strings as distractors, preceded by a digit string as T1). N400 evoked by T2 was reduced when T2 was related to T0. Critically, this reduction of N400 did not differ between lags even though T2 words were consciously identified at the critical lag in only 65% of trials, being subject to the attentional blink. Luck et al. (1996) concluded that access of word meaning may proceed without conscious awareness. Giesbrecht et al. (2007) demonstrated that there is in fact a threshold for this non-conscious processing of semantics: N400 suppression was abolished when perceptual load associated with T1 identification increased. Apparently the “attentional blink” then became so intense as to involve a “semantic blink” [my term, R.V.]. Of note, even undetected words at T2 position can on their own prime processing of a third word (T3) presented after the trial, reducing the T3-evoked N400 when T2 and T3 are related (Rolke et al. 2001). Although clearly present, this priming effect on the ERP evoked by T3 tended to be smaller when T2 was missed than when T2 was identified (Rolke et al. 2001). Somewhat surprisingly, in a replication of this paradigm (Pesciarelli et al. 2007) it was restricted to an earlier portion of the T3-evoked waveshape, around the P2 peak at 270 ms, when T2 was missed, in contrast to a long-lasting modulation of the T3 waveshape, including the N400, when T2 was identified. Taken together, these data where T2 acts as a prime support the conclusions of the above studies (Luck et al. 1996; Giesbrecht et al. 2007) that the meaning of the unidentified T2 word is processed up to a certain degree.
2.5.4 N2pc
Some recent studies applied the N2pc rationale, as described above (Figure 8), to the attentional-blink paradigm. Being a correlate of selective processing of events, N2pc might be a good indicator of whether T2 stimuli were discriminated among the stream of other stimuli. In order to measure N2pc in the difference from recordings at posterior sites contralateral and ipsilateral to the relevant event, the standard task has to be changed, though. Stimuli have to be presented laterally, with the relevant stimulus presented on one side and an irrelevant stimulus on the other side. Thus, there must be two streams of stimuli, one left and one right, at least from T2 onwards. Figure 11 illustrates the task used by Verleger et al. (2009).
Figure 11. Dual-stream rapid serial visual presentation (Verleger et al. 2009). In this modification of the attentional-blink paradigm, T1 and T2 are presented left or right from fixation. T1 (here depicted as white) was red. The graph presents data on trials in which T1 and T2 were correctly identified, expressed as percentage of all trials in which T1 was correctly identified. These data are from Exp. 1 of Verleger et al. (2009).
Selecting between these two streams complicates the task, which becomes obvious with “lag-1-sparing”: While T2 is identified as perfectly as in the original task when immediately following T1 in the same stream, this lag-1-sparing is abolished and identification rates drastically decrease when T2 follows T1 at a different location (e.g., Breitmeyer et al. 1999; Verleger et al. 2009; see Figure 11). These considerations notwithstanding, results obtained for N2pc looked promising in such modified versions of the attentional-blink task. Applying this rationale for the first time, Jolicœur et al. (2006) obtained parallel patterns of identification rates and N2pc: T2 was well identified and elicited a large N2pc when T1 could be ignored, identification was worse and N2pc was smaller when both T1 and T2 had to be identified, and this was particularly true when the lag between T1 and T2 amounted to two stimuli, compared to a lag of eight stimuli. Parallel effects on N2pc and behavior were also obtained by Dell’Acqua et al. (2006) and Robitaille et al. (2007). Dell’Acqua et al. (2006) used one T1-T2 lag only (150 ms). N2pc and identification rates were reduced when T1 had to be processed compared to when T1 could be ignored, and were further reduced when processing load of T1 was high. Similarly, using a fixed T1-T2 interval of 350 ms, Robitaille et al. (2007) obtained reduced identification of T2 in parallel with slightly but reliably reduced N2pc to T2 when T1 required an infrequent rather than a frequent response, thereby probably demanding more capacity for its processing and leaving less capacity for processing T2. Thus, these studies converge on having N2pc as an ERP reflection of the attentional blink, similar to P3, but earlier than P3 and therefore possibly closer to the actual reason for the failure in processing, as a failure to select T2 from among the distractors. 
However, none of these studies demonstrated that N2pc closely covaried with identification by showing that N2pc was larger when T2 was identified than when it was missed. Dell’Acqua et al. (2006) actually reported this comparison but could not obtain significant differences in N2pc amplitudes, possibly due to the poor signal/noise ratio that is typical for difference potentials. This is in contrast to P3 where this comparison had been done, as noted above. Moreover, to exclude the objection that N2pc reduction with the critical lag compared to longer lags is simply due to habituation of the N2pc generator at short lags, N2pc should be large at lag 1 when there is lag-1 sparing. Figure 12 displays results of our study (Verleger et al. 2009) where N2pc was measured at lags 1, 2, and 5. When T1 and T2 were on the same side, N2pc amplitudes were smaller at the short lags 1 and 2 than at lag 5. This is in contrast to identification rates, which were higher at lags 1 and 2 than at lag 5 (Figure 11). When T2 and T1 were on different sides, N2pc amplitudes became more similar to identification rates. To detail, when T2 was on the right, both N2pc amplitudes and identification rates were reduced at lags 1 and 2 compared to lag 5. When T2 was on the left, both
Figure 12. Dual-stream rapid serial visual presentation: Results on N2pc from Verleger et al. (2009). Results of N2pc amplitude in the paradigm depicted in Fig. 11. Data are from Exp. 1 of Verleger et al. (2009). N2pc was measured separately for left T1, in the average across trials of the difference PO8-PO7, and for right T1, in the average of the difference PO7-PO8, separately for same-side and different-side T1 and T2. These difference waveforms are depicted in the upper panels. There, 0 ms is T1 onset, T2 onset is denoted by an arrow pointing down on the x-axis. The first negative peak is the N2pc evoked by T1. This is followed in the Left-T1 data (upper row) by a second negative peak, most probably evoked by the stimulus that follows T1. N2pc evoked by T2 is then seen as a divergence of the waveforms for left T2 and right T2 (bold and thin lines) at 200-300 ms after T2 onset (“T2-evoked”). In the lower graph, amplitudes of T2-evoked N2pc are shown, defined as mean amplitudes 200-300 ms after T2 onset. To measure these amplitudes at lags 1 and 2, the waveforms of lag 5 were subtracted. To measure the N2pc amplitudes at lag 5, mean amplitudes at 600–650 ms were subtracted.
N2pc amplitudes and identification rates were higher than when T2 was on the right. However, even in this case N2pc amplitudes were at least as large at lags 1 and 2 as at lag 5 although identification rates of T2 were worse at lags 1 and 2 than at lag 5. This latter minor dissociation notwithstanding, for different-side T1 and T2 one might keep the hypothesis that N2pc amplitude reflects identification rate. For same-side T2 the relationship between identification performance and N2pc amplitude does not hold because N2pc is much reduced with the short lags 1 and 2. This latter result might reflect saturation of the N2pc generator. More data are needed to further elucidate the role of N2pc. It may be mentioned that in Jolicœur et al.’s (2006) study, the preponderance of negativity contralateral to T2 that was phasically manifested in the N2pc peak continued during the entire presentation of distractor stimuli following T2. By and large, this tonic contralateral preponderance, termed SPCN by Jolicœur et al. (“sustained posterior contralateral negativity”), varied with the experimental variables in that study as well as in following ones in the same way as N2pc.
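The amplitude measure described in the legend of Figure 12 (mean amplitude in a 200–300 ms window after T2 onset, taken after a reference waveform has been subtracted) can be sketched as follows. This is a simplified illustration under stated assumptions: the time axis is expressed relative to T2 onset, both waveforms are assumed to share the same sampling grid, the window endpoints are treated as inclusive, and all sample values are hypothetical. It is not the analysis code of Verleger et al. (2009).

```python
def subtract_reference(wave, reference):
    """Point-by-point subtraction of a reference waveform."""
    if len(wave) != len(reference):
        raise ValueError("waveforms must have the same length")
    return [w - r for w, r in zip(wave, reference)]


def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean of all samples whose time lies within [start_ms, end_ms]."""
    selected = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    if not selected:
        raise ValueError("no samples inside the measurement window")
    return sum(selected) / len(selected)


# Hypothetical difference waveforms (microvolts); times relative to T2 onset.
times_ms = [0, 100, 200, 250, 300, 400]
short_lag_wave = [0.0, -0.5, -2.0, -3.0, -2.5, -1.0]
reference_wave = [0.0, -0.5, -1.0, -1.0, -1.0, -1.0]

corrected = subtract_reference(short_lag_wave, reference_wave)
amplitude = mean_amplitude(corrected, times_ms, 200, 300)
```

Subtracting a reference waveform before averaging is one way to remove activity that is common to both conditions, so that the window mean reflects only what the T2 adds.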
2.6 ERP signatures of preparatory states favorable for conscious awareness
2.6.1 Detection of faint stimuli
Some studies searched for EEG correlates of brain states that would increase the probability of detecting an upcoming faint signal. Lutzenberger et al. (1979) found that a moderate increase of sustained negativity (measured at Cz and Fz) was related to better identification of a gap in the outline of the schematic “rocket” that moved across the screen. However, to complicate matters, this rocket served at the same time as a feedback signal for the level of cortical negativity, so the results might have been affected by the manner in which participants dealt in a given trial with the dual task of controlling their brain negativity and detecting the signal. Similarly, detection of a faint brief laser point in the task illustrated above in Figure 5 was more likely when the DC level of EEG was more negative before stimulus onset (Devrim et al. 1999). Data were reported from Oz recordings only, although EEG was measured from other locations, too. It would be interesting to know whether this relation was indeed specific to the visual cortex, as is suggested by the Oz site. Analyzing another feature of these same data, Ergenoğlu et al. (2004) reported that alpha power was lower before correctly detected signals, specifically at posterior sites above visual cortex (where alpha is most pronounced anyway, so the specificity of this relation might again be doubted).
Increased negativity and reduced alpha might be reasonably interpreted as signs of increased preparatory state. Therefore, these results may be plausibly interpreted as indicating that increased preparatory state is helpful in detecting faint signals. The data do not speak against the possibility that this increased preparatory state is specific to the visual cortex, but may also be interpreted as reflections of heightened general arousal.
2.6.2 Attentional blink
Similar evidence has been collected for the attentional-blink task. It was investigated whether subjective states before the trials, as reflected in EEG rhythms, are favorable for T2 identification. Kranczioch et al. (2007) found that decreased coherence in the 10 Hz, 13 Hz, and 20 Hz bands before T1 onset and increased coherence in the 13 Hz and 20 Hz bands between T1 and T2 were advantageous for detecting T2. Generally increased coherence in the 40 Hz band, both before T1 and between T1 and T2, was reported by Nakatani et al. (2005) to be advantageous for detecting T2. The attentional-blink paradigm offers opportunities over and above the masking studies for investigating preconditions of stimulus identification: Because the main determinant of T2 identification is the processing of T1, the brain response to T1 may be a crucial determinant. This possibility was highlighted by McArthur et al. (1999) who likened the time-course of the attentional blink to the P300 waveform elicited by T1 identification: Just as the P300 waveform reaches its peak rather abruptly and then gradually returns to baseline, identification rates of T2 decrease rather abruptly towards their minimum with increasing distance of T2 from T1 and then gradually recover. Thus, McArthur et al. (1999) suggested that the state of cerebral positivity evoked by T1 and indicated by P3 might be a cause for the attentional blink. In possible conflict with this notion, the size of the P3 evoked by T1 did not correlate across participants with the size of the attentional blink in McArthur et al.’s (1999) data, nor in another EEG study that tested for this relationship (Martens et al. 2006b). Such a correlation was, however, obtained in an MEG study (Shapiro et al. 2006). The latter finding might be a chance result but, on the other hand, MEG is not as affected as EEG by irrelevant interindividually varying parameters like skull thickness, lending some credibility to this finding.
In the same vein, Shapiro et al. (2006) and Kranczioch et al. (2007) report that T1-evoked P3 is larger when T2 is missed than when it is identified, which implies some trade-off of capacities between T1 and T2. However, the T1 effect did not actually reach significance, neither in Shapiro et al. (2006) nor in the recordings actually displayed by Kranczioch et al. (2007; C. Kranczioch, personal communication, Oct. 1, 2008). Less ambiguous than the amplitude results are the available data on P3 latency: The peak of T1-evoked P3 was earlier intraindividually when T2 was identified than when it was not (Sergent et al. 2005) and, between individuals, in participants who could well identify T2 compared to participants who suffered from the attentional blink (Martens et al. 2006a). Thus, these data do support a role of timing of T1 processing, as reflected by P3 latency, for identifying vs. missing T2.
2.7 ERP signatures of motor activity evoked by indistinguishable signals
Masked, unidentified stimuli may evoke motor activation. This is illustrated in Figure 13 by data from Jaśkowski et al. (2002) along the lines first described by Leuthold and Kopp (1998). The left or right key had to be pressed in response to the visible stimulus pair according to instruction (alternating across participants: position of either square or diamond was relevant, and top/bottom was mapped to left/right or to right/left). Importantly, this visible pair was preceded by two smaller shapes which were masked by the visible shapes through metacontrast. The position of the target shape in the masked pair produced an early deflection in the Lateralized Readiness Potential (LRP), i.e., in the difference potential between the two motor cortices: The LRP became more negative at the motor cortex contralateral to the hand that would press the key that was assigned to the location of the target shape. Thus, when polarity of the LRP was defined with respect to the hand contralateral to the ensuing target, this early deflection evoked by the primes went to the wrong direction when positions of the target shapes differed between masked primes and masking visible stimulus, and started already going to the right direction when positions of the target shapes were the same for primes and visible stimuli. This result shows that the motor system may be triggered already by stimuli that are not consciously perceived, at least in such prime-target situations where the primes occur in a temporal window very close to the targets and where stimulus-response associations are simple. Indeed, this finding of an early LRP deflection evoked by primes has often been replicated under such situations, with three types of stimuli: First when, like in Figure 13, both the masked primes and the masking imperative stimuli were top-bottom or left-right pairs and responses had to be made with the key corresponding to target position (Jaśkowski et al. 2002; Jaśkowski et al. 2003; Klotz et al. 
2007; Leuthold and Kopp 1998; Verleger et al. 2008). Second, when both primes and imperative stimuli were arrows pointing left or right (Eimer and Schlaghecken 1998; Eimer 1999; Jaśkowski et al. 2008;
Figure 13. Motor activation by masked primes (Data from Jaśkowski et al. 2002, Exp. 2). Top: A first pair of stimuli, left and right from fixation, was followed at the same screen positions by a second, larger pair, with stimulus-onset asynchrony of either 83 ms or 167 ms. This caused metacontrast masking of the first pair, more or less completely (83 ms) or incompletely (167 ms). Left/right choice responses had to be made according to top/down position of the target in the visible stimulus. (Target was diamond or square, alternating between subjects). The masked stimuli could be compatible or incompatible with the visible stimuli (or neutral, not displayed here). Bottom: Difference potentials contralateral-ipsilateral to the responding hand. Bold lines are from trials with 167 ms SOA, thin lines from 83 ms SOA. Black lines denote compatible trials, grey lines incompatible trials. Divergence of black and grey lines, both for bold and thin lines, denotes motor priming by the masked stimuli. The absence of such patterns at |P7-P8| (lower panel) speaks for specificity of this effect to the motor system.
Praamstra and Seiss 2005; Seiss and Praamstra 2004; Verleger et al. 2004). As shown by Eimer and Schlaghecken (2003), these LRP effects are not only due to the intrinsic task-independent directional meaning of arrows but to the fact that arrows were targets, because no LRP deflections were produced by arrow primes when the imperative stimuli were the letters L and R. This finding lends some additional credibility to the LRP effects obtained with the third group of stimuli that do not have any spatial connotations: prime-induced LRP effects have even been reported when primes and imperative targets were the symmetrical symbols and > 2 × 85 ms = 170 ms). Presenting pure tones differing in frequency in the galloping version of the auditory streaming paradigm (see Figure 3), Winkler et al. (2005) asked participants to continuously indicate their perception of the tone sequence by keeping a response button depressed whenever they heard the galloping rhythm. In one condition, stimulus parameters promoted segregation of the two tones into separate streams (streaming condition); in the other condition, parameters allowed perception to fluctuate between segregation and integration (ambiguous condition). Occasional tone omissions elicited two different fronto‑centrally negative ERP responses. The earlier response (peaking between 56 and 72 ms from the time of the omission) was elicited by omissions in the ambiguous stimulus sequence, but not in the streaming sequence. However, the elicitation of this component did not depend on whether the participant perceived the sequence according to the integrated or the segregated organization. The later omission response (peaking between 170 and 180 ms from the start of the omission) was only elicited when participants perceived the sequence as a single integrated stream, but not when they perceived two streams. 
Thus, in accordance with the results of previous studies, tone omissions elicited an ERP response only when, due to grouping the high and low tones together, successive tones commenced within the previously observed 170‑ms temporal window. However, whereas the early omission response was fully determined by the stimulus parameters, the later response paralleled the participant’s perception, following its dynamic fluctuation within the stimulus sequences. We interpret these results as reflecting two different stages in auditory stream segregation. The early response reflects the outcome of a process (or multiple processes) that determines segregation vs. integration based on acoustic parameters (most likely, the within‑stream inter‑stimulus interval, because this was the parameter varied between the two stimulus conditions; cf. Bregman et al. 2000). In contrast, the late omission response reflects the outcome of auditory scene analysis, dynamically co‑varying with perception. As we stated in Section 3.2, sounds elicit MMN when they violate some regular feature of the preceding sequence. Thus the results obtained with the MMN response characterize the extraction and representation of sequential regularities. An ERP response, termed object‑related negativity (ORN), accompanying an immediate form of segregation has been discovered by Alain and his colleagues (Alain and McDonald 2007; Alain et al. 2001; Alain et al. 2002; Zendel and Alain 2009). The ORN is a fronto‑centrally maximal ERP component peaking between 140 and 180 ms from stimulus onset. It is elicited by mistuning a harmonic partial in a complex tone. ORN is elicited even without focused attention. The elicitation of ORN accompanies perception of the mistuned partial and the rest of the
In search for auditory object representations
harmonic complex as separate sounds. Thus it appears that ORN reflects sound segregation by harmonic (spectral) cues. Segregating sound streams by frequency separation, as was shown many times using the auditory streaming paradigm (see above), has been regarded as one of the primitive forms of stream segregation. In a previous study (Winkler et al. 2003b), we showed that, similarly to adults, with large frequency separation and fast presentation rates, the newborn auditory system segregates streams of tones. Thus the algorithms underlying auditory streaming by pitch separation are probably innate, as was assumed by Bregman (1990). On the other hand, schema‑based grouping algorithms are expected to be learned. Counting the notes in a bar, a skill often used by musicians, is a typical schema‑based algorithm. We tested the functioning of this algorithm by presenting professional musicians and non‑musician participants with sequences made up of tone groups separated in frequency and tone duration (van Zuijen et al. 2005; see Figure 5). During the presentation of the tone sequences, participants watched a silenced movie with subtitles. In the ‘Counting’ condition, standard groups (90%) consisting of 4 tones were intermixed with deviant groups (10%) consisting of 5 tones. In the ‘Time’ condition, standard tone groups lasted for 750 ms and deviants for 900 ms. Whereas the ‘Time’ deviants elicited MMN in both groups of participants, ‘Counting’ deviants only elicited MMN in professional musicians. These results suggest that whereas the algorithms extracting the time of sound groups are functional in most adults even when the sounds are not task‑relevant, algorithms “counting” the number of items in a sound group only become automatic during musical training. Thus this study provided evidence for the learning of a schema‑based grouping algorithm. Naturally, automatic operation is not a pre‑requisite of the functioning of a given grouping algorithm. 
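The structure of the ‘Counting’ condition described above (90% standard groups of 4 tones, 10% deviant groups of 5 tones, with group duration varying randomly between 610 and 890 ms as stated in the legend of Figure 5) can be sketched as a simple sequence generator. The randomization scheme used here (shuffling a fixed number of deviants, uniformly distributed durations) and all names are assumptions made for illustration; the source specifies only the proportions and the ranges.

```python
import random


def counting_condition(n_groups=100, deviant_ratio=0.10, seed=0):
    """Build a hypothetical 'Counting' sequence of tone groups.

    Standards contain 4 tones, deviants 5 tones. Group duration is
    drawn uniformly from 610-890 ms so that duration does not signal
    the number of tones (the uniform distribution is an assumption).
    """
    rng = random.Random(seed)
    n_deviants = round(n_groups * deviant_ratio)
    tone_counts = [5] * n_deviants + [4] * (n_groups - n_deviants)
    rng.shuffle(tone_counts)
    return [
        {
            "n_tones": n_tones,
            "duration_ms": rng.uniform(610, 890),
            "deviant": n_tones == 5,
        }
        for n_tones in tone_counts
    ]


groups = counting_condition(n_groups=100, seed=1)
n_deviant_groups = sum(group["deviant"] for group in groups)
```

Because group duration varies independently of tone number in this sketch, only a representation of the number of tones could distinguish deviants from standards, which is the logic of the ‘Counting’ condition.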
Some grouping algorithms may require focused attention. The role of attention in auditory stream segregation remains a point of dispute. Some results suggest that attention may be needed even for primitive forms of stream segregation (Carlyon et al. 2001; Cusack et al. 2004), whereas other results argue against the requirement of focused attention in such cases (Sussman et al. 2007; Winkler et al. 2003c). Other findings suggest parallel functioning of attentive and pre‑attentive processes in auditory streaming (Snyder et al. 2006; for a review of the role of attention in auditory stream segregation, see Snyder and Alain 2007). We argued above that the regularity representations underlying MMN generation are specific to auditory streams. But is this also true for each individual sound? Ritter et al. (2000, 2006) and De Sanctis et al. (2008) showed that sounds elicit the MMN by violating a regularity of the stream that they belong to, but not by violating a regularity of another sound stream. This means that the deviant sound was exclusively assigned to one stream. In vision, objects are separated from each other by spatio‑temporal borders. In search of auditory objects, a similar
Figure 5. Stimulus conditions for the experiment comparing grouping algorithms in professional musicians and non musicians. Sequences consisted of tone groups separated in frequency and tone duration; tones within each group had the same frequency and duration. ‘Counting’ condition (left): The majority of tone groups consisted of 4 tones; occasional deviant groups had 5 tones. The duration of the tone groups varied randomly between 610 and 890 ms. ‘Time’ condition (right): The duration of the majority of tone groups was 750 ms; occasional deviant groups were extended to 900 ms. The number of the tones in the group randomly varied between 2 and 6. (Adapted with permission from van Zuijen et al. 2005.)
86 István Winkler
principle may be set up. However, since we concentrate on abstract auditory objects, space cannot be the property in the auditory modality within which auditory objects may form borders. Kubovy and Van Valkenburg (2001) argue for the existence of spectro‑temporal borders of auditory objects. That is, auditory objects may be separated from each other by borders drawn in the spectral space. The validity of this notion can be assessed by taking another analogy from visual objects: exclusive assignment of the borders. When two visual objects overlap each other in the visual input (i.e., from the point of view of the observer), the border between them is assigned to the object closer to the observer (the foreground). Under normal circumstances, a sufficient number of depth cues are usually available for the observer to reliably determine foreground and background, and thus the assignment of the border is unambiguous. It is, however, possible to construct displays in which no additional depth cues are available (e.g., Rubin’s classical face‑vase reversal; Rubin 1915). As a consequence, foreground and background are determined by the assignment of the border. The object receiving the border becomes the foreground, whereas the other becomes the background. In order to assess this principle in the auditory modality, we tested whether or not spectro‑temporal borders are assigned exclusively to one of two possible auditory objects (Winkler et al. 2006). Figure 6 shows the stimulus paradigm. High (A, C), medium (E), and low pure tones (B, D) were presented in a cycle with a fixed order (ABCDE). The frequencies and the average SOA were set so that the low and high tones could not be integrated into a single sound stream, whereas the medium‑pitch tones could be grouped with either one of them. Figure 6 shows the patterns emerging when the medium‑pitch tones are grouped with the high (solid‑line rectangles) or the low tones (dashed‑line rectangles). 
SOAs were varied randomly, but within ranges allowing listeners to maintain the structure of the patterns (EAC for the medium‑high and BDE for the low‑medium group). Occasionally, the E tone was presented early, resulting in the formation of deviant medium‑high patterns (EACE followed by AC; marked with thick borders in Figure 6). However, the low‑medium pattern was not affected by this manipulation. Participants were asked to group together either the medium and high or the medium and low tones. They were to maintain this grouping throughout the stimulus blocks. (Grouping was monitored throughout the experiment by a continuous deviance detection task; this is not marked on the figure.) Thus the medium‑pitch tones formed a border between two auditory streams (the high and low streams) and listeners voluntarily assigned this border to one of the two streams. If the assignment of the border was exclusive, then regularities involving the border should only be detected and represented for the stream to which the border was assigned. That is, the deviation caused by the occasional early presentation of the medium‑pitch tone should only elicit the MMN when the participant grouped the
Figure 6. Schematic diagram of the stimulus paradigm for studying exclusive border assignment of auditory objects. Tones are marked with lettered rectangles. Tone frequency is calibrated in the y axis; the x axis marks time. See the text for the explanation. (Adapted with permission from Winkler et al. 2006.)
medium‑pitch tones with the high tones, but not when they grouped the low and the medium‑pitch tones together. If, however, both possible groupings of the medium‑pitch tones are evaluated by the auditory system, MMN should be elicited by the deviant tones irrespective of the participants’ voluntary grouping. We found that MMN was only elicited when the participant maintained the grouping that produced the pattern which was violated by the deviant. This means that this pitch‑time border was exclusively assigned to one stream, similarly to the exclusive assignment of space‑time borders in vision. In this section we showed that (1) regularity representations are formed according to the perceived sound streams; (2) the two assumed stages of auditory scene analysis are reflected by ERP responses involving regularity representations; (3) immediate and sequential sound grouping algorithms are indexed by separate ERP responses; (4) at least some of the primitive sound grouping processes affecting the regularity representations are innate, whereas some schema‑based algorithms are learned; and (5) sounds are usually assigned to only one auditory stream and are only taken into account with respect to the regularity representations belonging to that stream. In summary, the properties of the regularity representations involved in the MMN‑generating process match those expected for auditory object representations.
3.3.3 Generalizing across different stimulus instances
In the previous sections, we already came across some paradigms in which regularities were extracted from acoustically varying stimuli. The paradigm by which MMN was introduced in Section 3.2 is a good example (Figure 1; Winkler et al. 2003a). In this paradigm, the series of footsteps forming the standard was made up of ten different digitized natural footstep sounds. A widely different footstep
sound elicited the MMN. This result suggests that the representation underlying deviance detection generalized across the different, but generally similar footstep sounds, regarding them as instances of the same regularity. Several studies assessed the effects of acoustic variance on regularity extraction via deviance detection. The paradigms employed in these studies fall into two broad categories. In some of the studies, acoustic variance was introduced in a feature which was irrelevant for the regularity whose detection was tested. In perhaps the best example of this approach, both tone frequency and intensity were varied over a wide range while tone duration was kept common for most tones. Occasional longer tones elicited the MMN response, showing that the common tone duration was extracted from a widely different set of tones (Gomes et al. 1995). In other studies, acoustic variance was introduced in a feature which was involved in the regularity to be tested. For example, Winkler et al. (1990) varied tone intensity separately within different ranges. Infrequent tones falling outside the range of variance elicited an MMN response whose amplitude was inversely related to the width of the range of variance. There are also a number of studies addressing the issue of regularity extraction from varying exemplars in which the separation between the varied and the regular feature is not so clear‑cut. In some studies, the regularity tested was quite complex, such as in the study presenting the natural footstep sounds (Winkler et al. 2003a) or when different exemplars of the same vowel were contrasted with a different vowel (Aulanko et al. 1993; Sandridge and Boothroyd 1996). In other studies, the varied feature was closely linked to the regularity. 
The latter type of paradigms is best exemplified by studies testing whether the common direction or size of the pitch interval within a series of tone pairs can be extracted from tone pairs varying in absolute pitch (see Figure 7 for an illustration of this paradigm). These studies found that infrequent deviations from the common pitch direction or interval size elicit the MMN response in adults (Paavilainen et al. 1999, 2003; Saarinen et al. 1992) as well as in newborn babies (Carral et al. 2005; Stefanics et al. 2009). There are also regularities which are defined by the relationship between sounds in the sequence. That is, variance itself can be regular. Paavilainen and his colleagues (2001) presented tone sequences in which most sounds conformed to a rule defined as “the higher the pitch, the higher (or lower) the intensity of the tone”. Occasional tones violating the rule elicited MMN. Finally, rules based on learned categories or algorithms also provide means to generalize across different sounds. We already mentioned the result showing that professional musicians extract regularities based on the number of sounds within tonal groups varying in pitch (van Zuijen et al. 2005; see Figure 5). Several demonstrations of regularity representations based on learned categories have been obtained using speech stimuli. For example, Phillips and his colleagues (2000)
Figure 7. Schematic illustration of paradigms testing whether common pitch direction or the size of a pitch interval is extracted from tone pairs varying in absolute pitch. Tones are shown as filled squares. Tone pairs are connected for easier visualization. Regular pairs are marked with “S” (standard), whereas irregular ones with “D” (deviant).
presented synthesized consonants in the ‘d’‑‘t’ voice onset time continuum. When the majority of exemplars fell on one side of the border between ‘d’ and ‘t’, the ones falling into the less frequent category elicited MMN (for a review of similar long‑term learning effects, see Näätänen et al. 2001). Thus it appears that regularity representations are formed by extracting what is common amongst different stimuli. Stimuli that have the features defining the regularity are absorbed by the regularity representation. In contrast, stimuli that do not possess the features included in the regularity representation are treated as deviants. Thus the regularity representation generalizes across different instances exemplifying the common feature, whether the varying feature is relevant or irrelevant for the given regularity. The formation of regularity representations can be aided by learned algorithms or categories. These characteristics of the regularity representations inferred from studies of auditory deviance detection are in line with what is expected of perceptual object representations.
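The logic of the pitch‑direction paradigm (Figure 7) can be illustrated with a minimal sketch: the regularity is carried by the relation between the two tones of a pair, not by their absolute pitch. The function names and the majority‑vote rule over the whole sequence are assumptions made for illustration; they stand in for, and do not model, the online extraction process indexed by MMN.

```python
from collections import Counter


def direction(pair):
    """Return +1 for a rising pair, -1 for a falling pair, 0 for a repetition."""
    first, second = pair
    return (second > first) - (second < first)


def flag_deviants(pairs):
    """Return the indices of pairs whose pitch direction differs from the
    most common direction in the sequence (hypothetical 'standard')."""
    directions = [direction(p) for p in pairs]
    standard, _ = Counter(directions).most_common(1)[0]
    return [i for i, d in enumerate(directions) if d != standard]


# Tone pairs (frequencies in Hz) varying in absolute pitch;
# all pairs rise except the fourth one.
pairs = [(440, 494), (523, 587), (330, 370), (660, 587), (392, 440)]
deviants = flag_deviants(pairs)
```

Here `deviants` contains only the index of the falling pair, although no two pairs share the same absolute frequencies. The abstract feature (direction) is sufficient to separate standards from deviants.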
Figure 8. Schematic illustration of the effects of presentation rate and frequency ratio in the auditory streaming paradigm (see Figure 3). See text for details. (Adapted with permission from Denham and Winkler 2005.)
3.4 Predictive regularity representations and auditory streaming
As we stated before, the auditory streaming paradigm (van Noorden 1975; Figure 3) has been used extensively to test how coherent sound sequences are segregated in perception. The effect of stimulation parameters on the perceptual organization emerging after listening to short tone sequences is one of the most reliable findings of auditory scene analysis (Figure 8). Van Noorden found three distinct areas in the frequency‑ratio/SOA parameter space. Below the “Fission boundary”, participants could perceive the galloping rhythm (integrated percept), but not two parallel streams. Above the “Temporal coherence boundary”, participants could only perceive the sequence in terms of two separate streams of sound (segregated percept). Between the two boundaries, participants could voluntarily choose one or the other percept (ambiguous area). These findings have been taken to reflect the strength of the support for the two alternative sound organizations within the second phase of auditory scene analysis (Bregman 1990). When the large majority of the primitive heuristic processes support one of the possible organizations, the resulting percept is fully determined by stimulus‑driven (bottom‑up) processes. When, however, support for two or more alternatives is comparably strong, top‑down influences can tip the balance either way.
Another apparently reliable finding is that participants almost never report perceiving segregation at the beginning of the tone sequences. The probability of the segregated percept increases during the first 5–15 seconds of the sequence. This has been termed the build‑up of auditory streaming (Bregman 1990). Neural correlates of stream build‑up have been obtained by Snyder and his colleagues (2006). It has been assumed that initially, the brain attempts to integrate all incoming sound into a single stream. Streams are only segregated when sufficient evidence has been gathered in support of the presence of two or more separate coherent sequences being mixed together in the auditory input. Furthermore, it was also assumed that once the build‑up period is over, perception of the sound sequence remains constant, at least in the unambiguous areas of the parameter space. However, findings supporting the build‑up of auditory streaming may have been influenced by the methodological aspects of the experiments. Many studies of this phenomenon assumed that participants can only experience the sound sequence in terms of one of two distinct organizations: either hearing the galloping rhythm or, in parallel, a low and a high repeating sound sequence, each with its own presentation rate. As a consequence, some of the studies did not distinguish between the lack of segregation and integration or forced participants to choose between the two pre‑defined percepts, even when their actual perception did not match either one of them. In a pilot study (reported in Denham et al. in press), participants reported a number of different percepts during long (4‑minute) auditory streaming sequences. 
Some of these percepts could be categorized as integrated, because they included a repeating pattern comprising both low and high tones; others as segregated, because they only included repeating patterns made up separately of either only low or only high tones. Participants also reported multiple repeating patterns, at least one of which was integrated and another segregated (“both” percept), and sometimes they heard no repeating pattern at all. Thus the assumption of two mutually exclusive distinct percepts is probably not valid. Furthermore, recent studies found that, when participants listen to long sequences of the auditory streaming stimulus paradigm, perception does not settle on a final stable organization after the build‑up period (Denham et al. in press; Denham and Winkler 2006; Pressnitzer and Hupé 2005, 2006; Winkler et al. 2005), not even in the unambiguous areas of the parameter space (Denham et al. in press). Thus auditory streaming appears to be a bi‑ or multi‑stable phenomenon, similarly to those found in visual perception (for reviews, see Blake and Logothetis 2002; Rees et al. 2002), such as binocular rivalry (for a review, see Tong et al. 2006). Denham and Winkler (2005) formulated a new hypothesis to explain auditory stream segregation and, specifically, the results obtained in the auditory streaming paradigm. We suggested that auditory streaming can be described in terms of competition between predictions based on two or more alternative regularity
representations. A sequence of two alternating sounds can be described by at least two principally different rules. One rule makes a prediction about the sound immediately following the current one: “Sound ‘A’ will be followed by sound ‘B’ and vice versa”. Another possible rule makes predictions about non‑adjacent sounds: “Every second sound is ‘A’, while every other is ‘B’”. It is easy to see that in the auditory streaming paradigm, the first rule corresponds to the integrated, whereas the second to the segregated sound organization (see Figure 3). In general, regularity representations are based on links between sounds which are used to predict what sound will be encountered in the future. Following the chain of predictions for a given regularity creates a coherent (regular) sound stream. The hypothesis put forward by Denham and Winkler (2005) is that auditory streams are based on such regularity representations. Furthermore, we assume that regularities providing different predictions (forming different directed links) are incompatible with each other and thus vie for dominance. Competition continues as long as incompatible regularity representations exist in the brain. The organization based on the momentarily stronger set of regularities appears in perception. The strength of a link between two sounds is determined by the temporal and spatial overlap between the stimulus after‑effects in the brain. That is, given the well‑known tonotopic organization in many parts of the auditory system, two tones close to each other in frequency will activate more neurons in common than two tones more separated in frequency. Furthermore, when a sound follows another within a short period of time, it encounters a stronger lingering effect of the previous sound than if it followed the other sound after a longer time. 
These and similar simple principles govern how fast a given regularity can be discovered by the brain and how strong the links between sounds (the regularity representation) are. Many of the grouping principles initially described by researchers of the Gestalt school of psychology (see, e.g., Köhler 1947) can be easily translated into the language of the nervous system this way. In support of the above description of auditory streaming, Denham et al. (in press) found in the auditory streaming paradigm that with very fast presentation rates and large frequency separation between the two tones, segregation often emerges as the first reported percept. That is, no build‑up can be observed. The competition between regularity representations hypothesis provides an explanation for this phenomenon. When the frequency separation between the two tones is large, there is little overlap between their neural after‑effects. Thus in the auditory streaming tone sequence, links between adjacent sounds will be weak and take longer to build. Links between identical tones are not dependent on the frequency separation between the two tones. However, building these links is very sensitive to the interval separating successive identical tones. With a fast presentation rate, the time separating successive identical tones is short. Thus the
regularities based on links between identical tones will be strong and can be discovered fast. These two effects together explain why segregation (the organization based on links between identical tones) appears as the first percept in an auditory streaming sequence presented at a high rate and having large frequency separation between the two tones. Another piece of evidence supporting the competition between regularity representations hypothesis comes from observing the rate of switching between integrated and segregated representations as a function of the stimulation parameters. Traditional descriptions of auditory streaming assume that no or only a very small amount of switching should occur in the unambiguous areas of the parameter space. Switching may be present in the ambiguous area. In contrast, the regularity competition hypothesis suggests that most switching should occur when both regularity representations are strong. That is, most switching should occur when the frequency separation between the tones is small (strengthening the links between adjacent tones; the integrated percept) and the presentation rate is high (strengthening the links between identical tones; the segregated percept). Although part of this parameter range falls into the ambiguous area, (1) it also extends to unambiguous areas (see Figure 8) and (2) a large part of the ambiguous area has low to medium frequency separation and medium to slow presentation rates. It was found (Denham et al. in press) that the highest rate of switching falls exactly in the area predicted by the regularity competition hypothesis, whereas the amount of switching in a large part of the ambiguous area is much lower (see Figure 9). The interpretation of Denham et al.’s (2005, in press) results suggests that the auditory system forms several regularity representations for the same sound sequence in parallel. 
They also argue that the regularity representations must provide predictions about sounds occurring later in the sequence. Can we then find evidence that (a) sound sequences are described in parallel by multiple regularity representations in the brain and (b) regularity representations are predictive? As to the first question, Horváth and colleagues (2001) showed that representations for the two regularities described above for tone alternation (links between adjacent and links between identical tones) are active in parallel in the human auditory system. These authors showed that violating either one of the two regularities (without violating the other one) elicits the MMN response. Several studies found results compatible with the predictive nature of the regularity representation underlying the MMN response (Horváth et al. 2001; Paavilainen et al. 2007; Tervaniemi et al. 1994a and b). The most convincing evidence was obtained by Paavilainen and colleagues (2007), who showed that MMN is elicited by occasionally violating a prediction between adjacent sounds. Sequences were constructed from short low, short high, long low, and long high tones. Short tones were usually followed by low, long ones by high tones. The short‑long attribute was varied randomly with
Figure 9. Group-mean distribution of perceptual switching as a function of frequency separation (df in semitones) and the stimulus onset asynchrony (SOA in milliseconds). The grey scale indicates the mean number of switches across participants accumulated throughout the tone trains, separately for each combination of parameters. Note that the surface is interpolated between the discrete experimental data points indicated by the small empty circles. The temporal coherence and the fission boundary are marked by dashed lines.
equal probabilities. Occasional high tones following short tones and low tones following long tones elicited the MMN response. Because in this paradigm, MMN could only be elicited if low tones became expected following a short tone and high tones after a long tone, Paavilainen et al.’s results (since then replicated by Bendixen et al. 2008) strongly argue for predictions being generated on the basis of the regularity representations involved in MMN generation. Recently, direct ERP evidence of the predictive nature of auditory regularity representations has been obtained by Bendixen and colleagues (2009). In this section, we suggested a description of auditory streaming in terms of competition between regularity representations producing incompatible predictions. Evidence favoring this hypothesis over the traditional description of auditory streaming has been reviewed. Finally, we showed that the competing regularity representations are maintained in parallel in the auditory system and that predictions based on these regularity representations are utilized by the auditory deviance detection process indexed by the MMN response.
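The qualitative argument about link strength made in this section can be illustrated with a toy calculation. The exponential decay, the constants `sigma_st` and `tau_ms`, and the use of the weaker of the two links as a proxy for the amount of switching are all assumptions chosen for illustration; they are not the model of Denham and Winkler (2005). The sketch assumes a simple alternation of two tones, so that successive identical tones are separated by two SOAs.

```python
import math


def link_strength(df_semitones, dt_ms, sigma_st=4.0, tau_ms=300.0):
    """Toy overlap of two after-effects: decays with frequency
    separation and with the time elapsed between the two sounds."""
    return math.exp(-df_semitones / sigma_st) * math.exp(-dt_ms / tau_ms)


def competing_links(df_semitones, soa_ms):
    """Return (integrated, segregated) link strengths for an
    alternating two-tone sequence."""
    # Link between adjacent tones: different frequency, one SOA apart.
    integrated = link_strength(df_semitones, soa_ms)
    # Link between identical tones: same frequency, two SOAs apart.
    segregated = link_strength(0.0, 2 * soa_ms)
    return integrated, segregated


def rivalry(df_semitones, soa_ms):
    """Hypothetical proxy for switching: high only if both links are strong."""
    return min(competing_links(df_semitones, soa_ms))
```

With these assumed constants, a large frequency separation combined with a fast rate (e.g., 12 semitones, 75 ms SOA) yields a much stronger link between identical tones than between adjacent tones, whereas the proxy for switching is highest for a small separation combined with a fast rate, in line with the verbal argument.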
3.5 A conceptual model of auditory object formation
In the foregoing, we argued that predictive regularity representations are formed by the human auditory system and updated whenever their prediction is mismatched by the sound input. Updating initiated by such mismatches is reflected by the MMN event‑related potential (Winkler 2007). We have also shown that the properties of these regularity representations match those expected from auditory objects and that auditory stream segregation, the mechanism by which auditory objects are separated from each other within the sound input, can be conceptualized as competition between predictive regularity representations. Finally, the notion of regarding predictive regularity representations as the basic building blocks of perception is fully compatible with Gregory’s (1980) theory of perception, which suggests that perception is a constructive process akin to the generation of predictive scientific hypotheses. That is, the objects appearing in perception contain descriptive information not actually present in the sensory input, such as unseen parts of a visually presented object. In the same manner, predictive auditory regularity representations provide assumptions regarding the continuation of the sound object (Winkler et al. 2009). Figure 10 illustrates the outline of a system compatible with the above notion of auditory objects. The upper half of the figure depicts the two‑stage model of auditory scene analysis. Partly analyzed auditory information appears at the input of auditory scene analysis (upper left arrow in Figure 10). The heuristic grouping algorithms assumed by Bregman’s (1990) theory are then activated. The upper left box of the figure includes the distinction between immediate (spectral) and sequential (temporal) grouping processes. These processes provide candidate groupings for the second stage of auditory scene analysis. In the second stage, compatible candidate groupings form coalitions. 
Support for these groups is then compared to establish the dominant (most likely) solution (upper right box). The output of the second stage of auditory scene analysis contains sound information organized in terms of auditory objects (upper right arrow). The lower half of the figure depicts a system that maintains the temporal/sequential regularities detected within the preceding sound input. Such regularities (lower left box) are used to provide predictions for the sequential grouping algorithms. These predictions carry with them a weight measure which informs the system about the observed reliability (confidence) of the given regularity. This information is then used in resolving the competition between alternative sound organizations. That is, the support provided by different algorithms for the alternative organization(s) they are compatible with is not fixed. An algorithm whose prediction has been consistently found to be valid within the actual auditory scene, will provide more support for the sound organization it is compatible with than one whose prediction is
Figure 10. Schematic illustration of object formation and maintenance in the auditory system. See text for details. (Adapted with permission from Winkler 2007.)
often off the mark. It should be noted that the weight of predictions is influenced by several factors outside the process reflected by MMN. For example, initial confidence levels probably depend on long‑term experience and may even be specific to the actual context (e.g., certain algorithms may have been proven more reliable in often recurring situations, etc.). Furthermore, top‑down effects may also bias the choice between alternative sound organizations (e.g., active search for a sound pattern). Finally, longer‑term adaptation to repeating sounds may result in switches between alternative sound organizations, similarly to the assumptions made in modeling bi‑stable perceptual phenomena in the visual modality (e.g., Klink et al. 2008; Noest et al. 2007). However, should there be a change of the auditory scene, previously unreliable algorithms may become more reliable while the previously reliable ones may become less reliable. The proposed system thus shows continuous adaptation to the changes in the acoustic environment. Adaptation of the weights is achieved by feedback from the stage, where sound organization is finalized. We distinguish three possible outcomes. When the prediction from a stored regularity is matched by the input, its confidence weight is maintained. When it is mismatched, the confidence weight is reduced. Finally, there are often elements within the auditory input which have not been predicted by any of the existing regularity representations. This is typically the case, when a new sound source becomes active in the
environment. Thus that part of the incoming sound for which no prediction has been provided by any of the existing regularity representations is analyzed in order to discover new regularities. This description is fully compatible with the principle of the old‑plus‑new strategy (Bregman 1990). The old‑plus‑new principle suggests that the auditory system first extracts those parts of the sound input which appear to continue previously established auditory streams, then treats the remaining sounds as the onset of a new stream. In the above suggested model, the regularity representations extracted from the preceding sound input can be regarded as building blocks of auditory perceptual objects. Typically, several such regularity representations are involved in forming any perceived sound object. The set of stored predictive regularities forms a low‑level model of the auditory environment, which represents what kind of sensory regularities have been detected within the current auditory scene. The MMN ERP response is marked on the figure (bottom center), showing how it relates to the overall function of the system (i.e., it is activated when the confidence weight of a regularity representation is decreased; cf. Winkler and Czigler 1998). If MMN is derived from the outcome of auditory scene analysis which places each sound into the context of the whole scene, one should expect MMN to show effects not only of the immediately preceding sound sequence, but also of the context of the whole scene. Such auditory context effects have been observed for MMN (for a review, see Sussman 2007). For example, it was shown that a given sound is either regarded as regular or irregular, but not both at the same time and that the attribute of “being regular” is established at the level of the whole sequence (Sussman et al. 2003).5 Furthermore, deviations are evaluated in accordance with their information content within the global stimulus sequence (Sussman and Winkler 2001). 
Sussman and Winkler (2001) found that when deviants only occur in pairs delivered within ca. 170 ms within a sound sequence (double deviants), only the first of the two successive deviants elicits MMN. However, introducing single deviants into the stimulus sequence results in double deviants eliciting two successive MMNs; subsequent removal of the single deviants dynamically reverts the system into the state, in which double deviants again elicit only one MMN. Finally, as was already discussed, sounds are only evaluated within the auditory stream they belong to. That is, a sound only elicits MMN, when it violates some regularity within its own stream, but not when violating some regularity of another stream (De Sanctis et al. 2008; Ritter et al. 2000, 2006). Flexibility of the proposed system requires fast establishment of regularity representations, whereas its stability requires that established regularity representations should be retained for relatively long periods of time, even if their predictions prove to be false for some time. In accordance with these expectations,
MMN studies showed that regularity representations are established by very few (2–3) regular sounds (Bendixen et al. 2007; Cowan et al. 1993; Horváth et al. 2001; Schröger 1997; Winkler et al. 1996b). In contrast, established regularities have been shown to survive several successive deviant events (Winkler et al. 1996a and b) and can be reactivated by a single reminder even after quite long breaks (e.g., 30 s; Winkler et al. 2002; for a review, see Winkler and Cowan 2005).
Are we aware of the regularity representations stored for a given auditory scene? Not necessarily. MMN was elicited by violations of rules that participants were not aware of, even when they attended the sound sequence and were asked to detect the deviants (Paavilainen et al. 2003, 2007; van Zuijen et al. 2006). However, training to perceive a regularity enhances the MMN response (e.g., Näätänen et al. 1993) and the MMN response reflects various long-term learning effects (for a review, see Näätänen et al. 2001). Furthermore, voluntary selection of a given sound organization (when this is permitted by the ambiguous auditory make-up of the sequence) can govern the elicitation of MMN (Sussman et al. 2002; Winkler et al. 2006). Thus the formation of auditory perceptual objects is at the crossroads of conscious and unconscious processes. Although it is largely stimulus-driven and often impenetrable to conscious processes, it is indirectly affected by explicit learning and, under specific circumstances, can be directly affected by intentions.
In summary, we proposed a framework for conceptualizing the formation of perceptual auditory objects. We suggested that auditory objects are built from predictive regularity representations which are extracted from the ongoing auditory input and continuously updated with respect to their predictive success. Updating is at least partly done by a system detecting violations of auditory regularities.
The updating process, and through it the auditory regularity representations stored in the brain, can be studied with the mismatch negativity (MMN) event-related brain potential. Predictive object representations provide obvious advantages for humans, because they allow actions to be adapted to the future, thus aligning their effects with the evolving state of the environment.
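As a purely illustrative sketch, and not a formal specification of the framework, the updating scheme summarized above can be caricatured in a few lines of Python. Everything named here is an assumption introduced for illustration: the class `RepetitionRegularity`, the function `mismatch_positions`, and the numeric values of `gain`, `loss` and `threshold` are hypothetical and are not estimated from any data. The sketch only shows the logic: confirmations raise a confidence weight, violations lower it, and a mismatch signal (standing in for MMN) is emitted only when a violation decreases the weight of an already established regularity.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RepetitionRegularity:
    """Toy regularity: 'the next sound is the standard sound'."""

    standard: str
    confidence: float = 0.0   # hypothetical confidence weight in [0, 1]
    gain: float = 0.5         # hypothetical: how strongly a confirmation raises confidence
    loss: float = 0.1         # hypothetical: fraction of confidence lost per violation
    threshold: float = 0.6    # hypothetical: level at which the regularity counts as established

    def update(self, sound: str) -> bool:
        """Update the confidence weight; return True if a mismatch signal is emitted."""
        if sound == self.standard:
            # Prediction confirmed: move confidence toward 1.
            self.confidence += self.gain * (1.0 - self.confidence)
            return False
        # Prediction violated: the weight is decreased. A mismatch signal is
        # emitted only if the regularity was already established.
        was_established = self.confidence >= self.threshold
        self.confidence -= self.loss * self.confidence
        return was_established


def mismatch_positions(sounds: Sequence[str], standard: str = "A") -> List[int]:
    """Return the zero-based positions at which a mismatch signal is emitted."""
    regularity = RepetitionRegularity(standard=standard)
    return [i for i, sound in enumerate(sounds) if regularity.update(sound)]
```

With these hypothetical settings, `mismatch_positions("AAABAAB")` returns `[3, 6]`, whereas a deviant that follows a single standard, `mismatch_positions("AB")`, returns an empty list. This loosely mirrors the finding that two to three regular sounds are needed before a regularity representation is established, and that an established representation survives isolated deviants.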
Acknowledgements
This research was supported by the European Commission’s 7th Framework Programme for “Information and Communication Technologies” (project title: SCANDLE, acoustic SCene ANalysis for Detecting Living Entities, contract no.: 231168). I am grateful to Dr. Alexandra Bendixen for the helpful comments on an earlier version of this paper.
Notes
1. By unconscious we mean that the process is not reportable, but it has immediate or remote behavioral and/or psychophysiological consequences.
2. Although this property of perception has been regarded as an important argument for the modularity of perceptual processes (Fodor 1983), the current treatment does not assume the modularity point of view.
3. In an auditory oddball sequence, one sound is presented with high probability (typically > 75%), whereas one or more other sounds are presented with low probabilities. The order of different sounds is usually randomized.
4. However, please note that the two functions are not mutually exclusive.
5. Note, however, that conflicts may arise from opposite predictions based on the auditory and, e.g., the visual context (Widmann et al. 2004), although such conflicts may not affect the MMN (Ritter et al. 1999).
References
Alain, C., Arnott, S. R. and Picton, T. W. (2001). Bottom up and top down influences on auditory scene analysis: Evidence from event related brain potentials. Journal of Experimental Psychology: Human Perception and Performance, 27, 1072–1089.
Alain, C. and McDonald, K. L. (2007). Age-related differences in neuromagnetic brain activity underlying concurrent sound perception. Journal of Neuroscience, 27, 1308–1314.
Alain, C., Schuler, B. M. and McDonald, K. L. (2002). Neural activity associated with distinguishing concurrent auditory objects. Journal of the Acoustical Society of America, 111, 990–995.
Aulanko, R., Hari, R., Lounasmaa, O. V., Näätänen, R. and Sams, M. (1993). Phonetic invariance in the human auditory cortex. NeuroReport, 4, 1356–1358.
Belin, P. and Zatorre, R. J. (2000). ‘What’, ‘where’ and ‘how’ in auditory cortex. Nature Neuroscience, 3, 965–966.
Bendixen, A., Prinz, W., Horváth, J., Trujillo-Barreto, N. J. and Schröger, E. (2008). Rapid extraction of auditory feature contingencies. NeuroImage, 41, 1111–1119.
Bendixen, A., Roeber, U. and Schröger, E. (2007). Regularity extraction and application in dynamic auditory stimulus sequences. Journal of Cognitive Neuroscience, 19, 1664–1677.
Bendixen, A., Schröger, E. and Winkler, I. (2009). I heard that coming: ERP evidence for stimulus-driven prediction in the auditory system. Journal of Neuroscience, 29, 8447–8451.
Blake, R. and Logothetis, N. K. (2002). Visual competition. Nature Reviews Neuroscience, 3, 13–21.
Bregman, A. S. (1990). Auditory scene analysis: The perceptual organization of sound. Cambridge, MA: MIT Press.
Bregman, A. S., Ahad, P. A., Crum, P. A. C. and O’Reilly, J. (2000). Effects of time intervals and tone durations on auditory stream segregation. Perception & Psychophysics, 62, 626–636.
Brunswik, E. (1955). In defense of probabilistic functionalism: A reply. Psychological Review, 62, 236–242.
Carlyon, R. P., Cusack, R., Foxton, J. M. and Robertson, I. H. (2001). Effects of attention and unilateral neglect on auditory stream segregation. Journal of Experimental Psychology: Human Perception and Performance, 27, 115–127.
Carral, V., Huotilainen, M., Ruusuvirta, T., Fellman, V., Näätänen, R. and Escera, C. (2005). A kind of auditory ‘primitive intelligence’ already present at birth. European Journal of Neuroscience, 21, 3201–3204.
Cowan, N. (1984). On short and long auditory stores. Psychological Bulletin, 96, 341–370.
Cowan, N., Winkler, I., Teder, W. and Näätänen, R. (1993). Short and long term prerequisites of the mismatch negativity in the auditory event related potential (ERP). Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 909–921.
Cusack, R., Deeks, J., Aikman, G. and Carlyon, R. P. (2004). Effects of location, frequency region, and time course of selective attention on auditory scene analysis. Journal of Experimental Psychology: Human Perception and Performance, 30, 643–656.
De Sanctis, P., Ritter, W., Molholm, S., Kelly, S. P. and Foxe, J. J. (2008). Auditory scene analysis: The interaction of stimulation rate and frequency separation on pre-attentive grouping. European Journal of Neuroscience, 27, 1271–1276.
Denham, S. L., Gyimesi, K., Stefanics, G. and Winkler, I. (in press). Stability of perceptual organisation in auditory streaming. In E. A. Lopez-Poveda, A. R. Palmer and R. Meddis (Eds.), Advances in auditory research: Physiology, psychophysics and models. New York: Springer.
Denham, S. L. and Winkler, I. (2006). The role of predictive models in the formation of auditory streams. Journal of Physiology, Paris, 100, 154–170.
Deutsch, D. (1982). Grouping mechanisms in music. In D. Deutsch (Ed.), The psychology of music. New York: Academic Press.
Dowling, W. J. (1973). Rhythmic groups and subjective chunks in memory for melodies. Perception & Psychophysics, 14, 37–40.
Duncan, J. and Humphreys, G. (1989). Visual search and stimulus similarity. Psychological Review, 96, 433–458.
Escera, C. and Corral, M. J. (2007). Role of mismatch negativity and novelty-P3 in involuntary auditory attention. Journal of Psychophysiology, 21, 251–264.
Fabiani, M., Gratton, G. and Coles, M. G. H. (2000). Event related brain potentials. In J. T. Cacioppo, L. G. Tassinary and G. G. Berntson (Eds.), Handbook of psychophysiology (2nd edition, pp. 53–84). Cambridge, MA: Cambridge University Press.
Fodor, J. (1983). The Modularity of Mind. Cambridge, MA: MIT Press.
Fowler, C. A. and Rosenblum, L. D. (1990). Duplex perception: A comparison of monosyllables and slamming doors. Journal of Experimental Psychology: Human Perception and Performance, 16, 742–754.
Gomes, H., Bernstein, R., Ritter, W., Vaughan, H. G., Jr. and Miller, J. (1997). Storage of feature conjunctions in transient auditory memory. Psychophysiology, 34, 712–716.
Gomes, H., Ritter, W. and Vaughan, H. G., Jr. (1995). The nature of preattentive storage in the auditory system. Journal of Cognitive Neuroscience, 7, 81–94.
Gregory, R. L. (1980). Perceptions as hypotheses. Philosophical Transactions of the Royal Society of London – Series B: Biological Sciences, 290, 181–197.
Griffiths, T. D. and Warren, J. D. (2004). What is an auditory object? Nature Reviews Neuroscience, 5, 887–892.
Hall, M. D., Pastore, R. E., Acker, B. E. and Huang, W. (2000). Evidence for auditory feature integration with spatially distributed items. Perception & Psychophysics, 62, 1243–1257.
Helmholtz, H. L. (1867/1910). Handbuch der physiologischen Optik. Leipzig: L. Voss. Reprinted, with extensive commentary, in A. Gullstrand, J. von Kries and W. Nagel (Eds.), Handbuch der physiologischen Optik (3rd edition). Hamburg and Leipzig: L. Voss.
Horváth, J., Czigler, I., Sussman, E. and Winkler, I. (2001). Simultaneously active pre-attentive representations of local and global rules for sound sequences. Cognitive Brain Research, 12, 131–144.
Horváth, J., Czigler, I., Winkler, I. and Teder-Sälejärvi, W. A. (2007). The temporal window of integration in elderly and young adults. Neurobiology of Aging, 28, 964–975.
Jacobsen, T., Schröger, E., Winkler, I. and Horváth, J. (2005). Familiarity affects the processing of task-irrelevant ignored sounds. Journal of Cognitive Neuroscience, 17, 1704–1713.
James, W. (1890). The principles of psychology. New York: Holt.
Klink, P. C., van Ee, R., Nijs, M. M., Brouwer, G. J., Noest, A. J. and van Wezel, R. J. A. (2008). Early interactions between neuronal adaptation and voluntary control determine perceptual choices in bistable vision. Journal of Vision, 8, 16.1–18.
Köhler, W. (1947). Gestalt Psychology. New York: Liveright.
Kubovy, M. and Van Valkenburg, D. (2001). Auditory and visual objects. Cognition, 80, 97–126.
Kujala, T., Tervaniemi, M. and Schröger, E. (2007). The mismatch negativity in cognitive and clinical neuroscience: Theoretical and methodological considerations. Biological Psychology, 74, 1–19.
Müller, D., Widmann, A. and Schröger, E. (2005). Auditory streaming affects the processing of successive deviant and standard sounds. Psychophysiology, 42, 668–676.
Näätänen, R. (1990). The role of attention in auditory information processing as revealed by event related potentials and other brain measures of cognitive function. Behavioral and Brain Sciences, 13, 201–288.
Näätänen, R., Gaillard, A. W. K. and Mäntysalo, S. (1978). Early selective attention effect on evoked potential reinterpreted. Acta Psychologica, 42, 313–329.
Näätänen, R., Schröger, E., Karakas, S., Tervaniemi, M. and Paavilainen, P. (1993). Development of a memory trace for a complex sound in the human brain. NeuroReport, 4, 503–506.
Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P. and Winkler, I. (2001). ‘Primitive intelligence’ in the auditory cortex. Trends in Neurosciences, 24, 283–288.
Näätänen, R. and Winkler, I. (1999). The concept of auditory stimulus representation in cognitive neuroscience. Psychological Bulletin, 125, 826–859.
Nager, W., Teder-Sälejärvi, W., Kunze, S. and Münte, T. F. (2003). Preattentive evaluation of multiple perceptual streams in human audition. NeuroReport, 14, 871–874.
Noest, A. J., van Ee, R., Nijs, M. M. and van Wezel, R. J. A. (2007). Percept-choice sequences driven by interrupted ambiguous stimuli: A low-level neural model. Journal of Vision, 7, 10.1–14.
Nordby, H., Roth, W. T. and Pfefferbaum, A. (1988). Event related potentials to breaks in sequences of alternating pitches or interstimulus intervals. Psychophysiology, 25, 262–268.
Paavilainen, P., Arajärvi, P. and Takegata, R. (2007). Preattentive detection of nonsalient contingencies between auditory features. NeuroReport, 18, 159–163.
Paavilainen, P., Degerman, A., Takegata, R. and Winkler, I. (2003). Spectral and temporal stimulus characteristics in the processing of abstract auditory features. NeuroReport, 14, 715–718.
Paavilainen, P., Jaramillo, M., Näätänen, R. and Winkler, I. (1999). Neuronal populations in the human brain extracting invariant relationships from acoustic variance. Neuroscience Letters, 265, 179–182.
Paavilainen, P., Simola, J., Jaramillo, M., Näätänen, R. and Winkler, I. (2001). Preattentive extraction of abstract feature conjunctions from auditory stimulation as reflected by the mismatch negativity (MMN). Psychophysiology, 38, 359–365.
Phillips, C., Pellathy, T., Marantz, A., Yellin, E., Wexler, K., Poeppel, D., McGinnis, M. and Roberts, T. (2000). Auditory cortex accesses phonological categories: An MEG mismatch study. Journal of Cognitive Neuroscience, 12, 1038–1055.
Pressnitzer, D. and Hupé, J. M. (2005). Is auditory streaming a bistable percept? Forum Acusticum, Budapest, pp. 1557–1561.
Pressnitzer, D. and Hupé, J. M. (2006). Temporal dynamics of auditory and visual bistability reveal common principles of perceptual organization. Current Biology, 16, 1351–1357.
Pulvermüller, F. and Shtyrov, Y. (2006). Language outside the focus of attention: The mismatch negativity as a tool for studying higher cognitive processes. Progress in Neurobiology, 79, 49–71.
Rand, T. C. (1974). Dichotic release from masking for speech. Journal of the Acoustical Society of America, 55, 678–680.
Rauschecker, J. P. and Tian, B. (2000). Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proceedings of the National Academy of Sciences USA, 97, 11800–11806.
Rees, G., Kreiman, G. and Koch, C. (2002). Neural correlates of consciousness in humans. Nature Reviews Neuroscience, 3, 261–270.
Regan, D. (1989). Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic Fields in Science and Medicine. New York: Elsevier.
Ritter, W., DeSanctis, P., Molholm, S., Javitt, D. C. and Foxe, J. J. (2006). Preattentively grouped tones do not elicit MMN with respect to each other. Psychophysiology, 43, 423–430.
Ritter, W., Sussman, E., Deacon, D., Cowan, N. and Vaughan, H. G., Jr. (1999). Two cognitive systems simultaneously prepared for opposite events. Psychophysiology, 36, 835–838.
Ritter, W., Sussman, E. and Molholm, S. (2000). Evidence that the mismatch negativity system works on the basis of objects. NeuroReport, 11, 61–63.
Rubin, E. (1915). Synoplevede Figurer. Copenhagen: Gyldendalske.
Saarinen, J., Paavilainen, P., Schröger, E., Tervaniemi, M. and Näätänen, R. (1992). Representation of abstract attributes of auditory stimuli in the human brain. NeuroReport, 3, 1149–1151.
Sandridge, S. and Boothroyd, A. (1996). Using naturally produced speech to elicit mismatch negativity. Journal of the American Academy of Audiology, 7, 105–112.
Scholl, B. J. (2001). Objects and attention: The state of the art. Cognition, 80, 1–46.
Schröger, E. (1997). On the detection of auditory deviants: A pre-attentive activation model. Psychophysiology, 34, 245–257.
Smith, D. R., Patterson, R. D., Turner, R., Kawahara, H. and Irino, T. (2005). The processing and perception of size information in speech sounds. Journal of the Acoustical Society of America, 117, 305–318.
Shinozaki, N., Yabe, H., Sato, Y., Hiruma, T., Sutoh, T., Matsuoka, T. and Kaneko, S. (2003). Spectrotemporal window of integration of auditory information in the human brain. Cognitive Brain Research, 17, 563–571.
Sinkkonen, J. (1999). Information and resource allocation. In R. Baddeley, P. Hancock and P. Földiák (Eds.), Information theory and the brain (pp. 241–254). Cambridge, UK: Cambridge University Press.
Snyder, J. S. and Alain, C. (2007). Toward a neurophysiological theory of auditory stream segregation. Psychological Bulletin, 133, 780–799.
Snyder, J. S., Alain, C. and Picton, T. W. (2006). Effects of attention on neuroelectric correlates of auditory stream segregation. Journal of Cognitive Neuroscience, 18, 1–13.
Stefanics, G., Háden, G. P., Sziller, I., Balázs, L., Beke, A. and Winkler, I. (2009). Newborn infants process pitch intervals. Clinical Neurophysiology, 120, 304–308.
Sussman, E. (2005). Integration and segregation in auditory scene analysis. Journal of the Acoustical Society of America, 117, 1285–1298.
Sussman, E. S. (2007). A new view on the MMN and attention debate: The role of context in processing auditory events. Journal of Psychophysiology, 21, 164–175.
Sussman, E. S., Bregman, A. S., Wang, W. J. and Khan, F. J. (2005). Attentional modulation of electrophysiological activity in auditory cortex for unattended sounds within multistream auditory environments. Cognitive, Affective, & Behavioral Neuroscience, 5, 93–110.
Sussman, E., Čeponienė, R., Shestakova, A., Näätänen, R. and Winkler, I. (2001). Auditory stream segregation processes operate similarly in school aged children as adults. Hearing Research, 153, 108–114.
Sussman, E., Gomes, H., Nousak, J. M. K., Ritter, W. and Vaughan, H. G., Jr. (1998). Feature conjunctions and auditory sensory memory. Brain Research, 793, 95–102.
Sussman, E., Horváth, J., Winkler, I. and Orr, M. (2007). The role of attention in the formation of auditory streams. Perception & Psychophysics, 69, 136–152.
Sussman, E., Ritter, W. and Vaughan, H. G., Jr. (1998). Attention affects the organization of auditory input associated with the mismatch negativity system. Brain Research, 789, 130–138.
Sussman, E., Ritter, W. and Vaughan, H. G., Jr. (1999). An investigation of the auditory streaming effect using event related brain potentials. Psychophysiology, 36, 22–34.
Sussman, E., Sheridan, K., Kreuzer, J. and Winkler, I. (2003). Representation of the standard: Stimulus context effects on the process generating the mismatch negativity component of event-related brain potentials. Psychophysiology, 40, 465–471.
Sussman, E. and Winkler, I. (2001). Dynamic sensory updating in the auditory system. Cognitive Brain Research, 12, 431–439.
Sussman, E., Winkler, I., Huotilainen, M., Ritter, W. and Näätänen, R. (2002). Top-down effects on stimulus-driven auditory organization. Cognitive Brain Research, 13, 393–405.
Takegata, R., Brattico, E., Tervaniemi, M., Varyagina, O., Näätänen, R. and Winkler, I. (2005). Pre-attentive representation of feature conjunctions for simultaneous, spatially distributed auditory objects. Cognitive Brain Research, 25, 169–179.
Takegata, R., Paavilainen, P., Näätänen, R. and Winkler, I. (1999). Independent processing of changes in auditory single features and feature conjunctions in humans as indexed by the mismatch negativity. Neuroscience Letters, 266, 109–112.
Tervaniemi, M. and Huotilainen, M. (2003). The promises of change-related brain potentials in cognitive neuroscience of music. Neurosciences and Music, 999, 29–39.
Tervaniemi, M., Maury, S. and Näätänen, R. (1994a). Neural representations of abstract stimulus features in the human brain as reflected by the mismatch negativity. NeuroReport, 5, 844–846.
Tervaniemi, M., Saarinen, J., Paavilainen, P., Danilova, N. and Näätänen, R. (1994b). Temporal integration of auditory information in sensory memory as reflected by the mismatch negativity. Biological Psychology, 38, 157–167.
The Merriam-Webster Dictionary (1997). Springfield, MA: Merriam-Webster Inc.
Thompson, W. F., Hall, M. D. and Pressing, J. (2001). Illusory conjunctions of pitch and duration in unfamiliar tone sequences. Journal of Experimental Psychology: Human Perception and Performance, 27, 128–140.
Tong, F., Meng, M. and Blake, R. (2006). Neural bases of binocular rivalry. Trends in Cognitive Sciences, 10, 502–511.
Treisman, A. (1993). The perception of features and objects. In A. Baddeley and L. Weiskrantz (Eds.), Attention: Selection, awareness, & control. A Tribute to Donald Broadbent (pp. 5–35). Oxford: Clarendon Press.
Treisman, A. M. and Gelade, G. A. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97–136.
Treisman, A. M. and Schmidt, H. (1982). Illusory conjunctions in the perception of objects. Cognitive Psychology, 14, 107–141.
van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. PhD thesis, Eindhoven.
van Zuijen, T. L., Sussman, E., Winkler, I., Näätänen, R. and Tervaniemi, M. (2005). Auditory organization of sound sequences by a temporal or numerical regularity: A mismatch negativity study comparing musicians and non-musicians. Cognitive Brain Research, 23, 270–276.
van Zuijen, T. L., Simoens, V. L., Paavilainen, P., Näätänen, R. and Tervaniemi, M. (2006). Implicit, intuitive, and explicit knowledge of abstract regularities in a sound sequence: An event-related brain potential study. Journal of Cognitive Neuroscience, 18, 1292–1303.
Widmann, A., Kujala, T., Tervaniemi, M., Kujala, A. and Schröger, E. (2004). From symbols to sounds: Visual symbolic information activates sound representations. Psychophysiology, 41, 709–715.
Wightman, F. L. and Jenison, R. (1995). Auditory spatial layout. In W. Epstein and S. J. Rogers (Eds.), Perception of space and motion (2nd edition, pp. 365–400). San Diego, CA: Academic Press.
Winkler, I. (2003). Change detection in complex auditory environment: Beyond the oddball paradigm. In J. Polich (Ed.), Detection of change: Event-related potential and fMRI findings (pp. 61–81). Boston: Kluwer Academic Publishers.
Winkler, I. (2007). Interpreting the mismatch negativity (MMN). Journal of Psychophysiology, 21, 147–163.
Winkler, I. and Cowan, N. (2005). From sensory memory to long term memory: Evidence from auditory memory reactivation studies. Experimental Psychology, 52, 3–20.
Winkler, I., Cowan, N., Csépe, V., Czigler, I. and Näätänen, R. (1996b). Interactions between transient and long-term auditory memory as reflected by the mismatch negativity. Journal of Cognitive Neuroscience, 8, 403–415.
Winkler, I. and Czigler, I. (1998). Mismatch negativity: Deviance detection or the maintenance of the “standard”. NeuroReport, 9, 3809–3813.
Winkler, I., Czigler, I., Sussman, E., Horváth, J. and Balázs, L. (2005). Preattentive binding of auditory and visual stimulus features. Journal of Cognitive Neuroscience, 17, 320–339.
Winkler, I., Denham, S. L. and Nelken, I. (2009). Modeling the auditory scene: Predictive regularity representations and perceptual objects. Trends in Cognitive Sciences, 13, 532–540.
Winkler, I., Horváth, J., Teder-Sälejärvi, W. A., Näätänen, R. and Sussman, E. (2003a). Human auditory cortex tracks task-irrelevant sound sources. NeuroReport, 14, 2053–2056.
Winkler, I., Karmos, G. and Näätänen, R. (1996a). Adaptive modeling of the unattended acoustic environment reflected in the mismatch negativity event related potential. Brain Research, 742, 239–252.
Winkler, I., Korzyukov, O., Gumenyuk, V., Cowan, N., Linkenkaer-Hansen, K., Alho, K., Ilmoniemi, R. J. and Näätänen, R. (2002). Temporary and longer retention of acoustic information. Psychophysiology, 39, 530–534.
Winkler, I., Kushnerenko, E., Horváth, J., Čeponienė, R., Fellman, V., Huotilainen, M., Näätänen, R. and Sussman, E. (2003b). Newborn infants can organize the auditory world. Proceedings of the National Academy of Sciences USA, 100, 1182–1185.
Winkler, I., Paavilainen, P., Alho, K., Reinikainen, K., Sams, M. and Näätänen, R. (1990). The effect of small variation of the frequent auditory stimulus on the event-related brain potential to the infrequent stimulus. Psychophysiology, 27, 228–235.
Winkler, I., Sussman, E., Tervaniemi, M., Ritter, W., Horváth, J. and Näätänen, R. (2003c). Pre-attentive auditory context effects. Cognitive, Affective, & Behavioral Neuroscience, 3, 57–77.
Winkler, I., Takegata, R. and Sussman, E. (2005). Event-related brain potentials reveal multiple stages in the perceptual organization of sound. Cognitive Brain Research, 25, 291–299.
Winkler, I., van Zuijen, T., Sussman, E., Horváth, J. and Näätänen, R. (2006). Object representation in the human auditory system. European Journal of Neuroscience, 24, 625–634.
Wolfe, J. M., Cave, K. R. and Franzel, S. L. (1989). Guided search: An alternative to feature integration model for visual search. Journal of Experimental Psychology: Human Perception and Performance, 15, 419–433.
Yabe, H., Tervaniemi, M., Sinkkonen, J., Huotilainen, M., Ilmoniemi, R. J. and Näätänen, R. (1998). The temporal window of integration of auditory information in the human brain. Psychophysiology, 35, 615–619.
Yabe, H., Winkler, I., Czigler, I., Koyama, S., Kakigi, R., Sutoh, T. et al. (2001). Organizing sound sequences in the human brain: The interplay of auditory streaming and temporal integration. Brain Research, 897, 222–227.
Zendel, B. R. and Alain, C. (2009). Concurrent sound segregation is enhanced in musicians. Journal of Cognitive Neuroscience, 21, 1488–1498.
Zwislocki, J. J. (1960). Theory of temporal auditory summation. Journal of the Acoustical Society of America, 32, 1046–1060.
chapter 4
Representation of regularities in visual stimulation
Event-related potentials reveal the automatic acquisition
István Czigler
Institute for Psychology of the Hungarian Academy of Sciences, Budapest
4.1 Introduction
In the Web of Science database, 101 items can be retrieved for the term “visual consciousness”, whereas only three publications mention the term “auditory consciousness”. It seems that “visual consciousness” is an accepted term in the scientific community, whereas “auditory consciousness” is not. The database does not contain the terms “visual unconsciousness” and “auditory unconsciousness”. In spite of the non-existence of these terms, a large body of research has been devoted to non-conscious sensory phenomena. The effects of subliminal visual stimulation (e.g., priming) and the elimination of conscious experience (masking, crowding, etc.) are classical as well as recent research topics (for reviews see Block 2007; Dehaene and Naccache 2001; Kim and Blake 2005; Merikle 1992, 1997). In the auditory world a well-known automatic mechanism unconsciously evaluates regularities (see Winkler, this volume). In this chapter I review studies indicating a similar ‘primitive intelligence’ (Näätänen et al. 2001) in vision. The necessity of such a system is challenged by recent theories (O’Regan and Noë 2001), but, as evidence from cognitive neuroscience shows, some regularities of the visual world are preserved and compared to the representation of the ongoing stimulation. Representations of such regularities develop outside the focus of attention, and these representations are not necessarily accessible to consciousness. However, the violation of the regularities is not without behavioral consequences.
First I’ll list evidence indicating the existence of a high-capacity visual memory system; thereafter some data will be presented showing the possibility of implicit visual memories of longer
duration. The next part of the chapter reviews studies showing behavioral effects based on such memory representations. On the basis of psychophysiological data, some characteristics of an implicit visual memory system will be discussed. I’ll argue that this memory is sensitive to the regularities of stimulation, and responds to events incompatible with such regularities. Finally, the functional significance of this system will be discussed. An explicit purpose of the chapter is to build a bridge between the “complex unconscious mind” of many branches of psychology and the “dumb unconsciousness” of cognitive psychology (Bargh and Morsella 2008).
4.2 Lack of conscious detection of visual changes and large-capacity visual memories
As the striking change blindness phenomenon shows, even large changes in visual scenes remain unnoticed if (1) the changing objects are outside the field of focal attention, and (2) no change-transients are available (for reviews see Rensink 2000; Simons and Levin 1997). There are several paradigms for demonstrating change blindness. In the popular flicker paradigm, alternative scenes are presented for a short period of time (e.g., 250 ms), and a blank field of similar duration is presented between the presentations of the alternative scenes. Several accounts of the effect are available (see Simons 2000 for a review). It is possible that (1) no detailed visual memory is formed of the pre-change scene (e.g., O’Regan and Noë 2001); (2) a detailed memory is formed of the pre-change scene, but this representation is annulled by the following scene (e.g., Becker et al. 2000); (3) detailed representations are formed of both the pre- and post-change scenes, but no comparison is possible between these representations, because comparison is restricted to a limited subset of the pre- and post-change scenes (e.g., Hollingworth and Henderson 2002). Finally, (4) even if there is a comparison process, its results remain unconscious (implicit), except for a small subset of the pre- and post-change elements.
Concerning the latter option, there are some questions: (a) Is there any proof of a high-capacity memory system operating in the period between the offset of the pre-change and the onset of the post-change scene? (b) Are there other implicit memory systems operating shortly after the presentation of visual scenes? (c) Is there any proof that unnoticed changes influence behavior and/or brain activity? (d) What is the functional role of a system detecting changes if the outcome of this detection remains unconscious?1
4.2.1 Visual memories with large capacity
Landman et al. (2003) demonstrated the existence of a large-capacity visual storage. Participants were asked to report a changing element within eight-element displays. Pre- and post-change displays were separated by blank intervals of 1500 ms. Cues indicated the location of the possible changing elements. The cues were effective if they were presented within the blank period, implicating the activity of memory storage for the whole inter-display period. However, the effectiveness of the cue greatly diminished at the onset of the second display. This result shows that the presentation of the second display overwrites the representation of the first one, at least at the level of consciousness. However, the possibility remains that the memory for the pre-change display has unconscious effects. The now-classic experiments on iconic memory (e.g. Sperling 1960) demonstrated a large-capacity visual storage. By now the term “iconic memory” became ambiguous. In the sixties iconic memory was considered as a passive non-categorical storage, the ‘raw material’ of cognitive processes. The content of iconic memory was supposed to be transferred into a short-term store. Transfer from iconic storage was considered as a relatively slow, serial process (20 ms/item, see e.g. Sperling 1967), with the consequence of item loss before the transfer to the “short-term” storage. Transfer was considered as an active process, in sense that attentional (controlled) processes were supposed to influence the order transfer. Backward visual masking (presentation of patterned stimuli after the offset of a visual array) supposedly destroys the content of iconic memory. In principle, this type of iconic memory is capable of storing the content of the pre-change arrays, but post-change stimuli would overwrite its content. Furthermore, being a passive storage, comparison processes cannot take place within the iconic memory. Data like those reported by Landman et al. 
(2003) correspond to such a concept of iconic/sensory memory, because cue effectiveness decreased at the arrival of the post-change display. The only divergence from the iconic memory results is the longer estimated lifetime of the memory in the Landman et al. (2003) study, i.e. in the traditional measures of iconic memory (partial report of the cued items) the duration of the iconic storage is shorter. Response differences (binary decision) may explain the discrepancy, but there is no proof (neither direct nor indirect) of a connection between the storage underlying partial report superiority effects and the memory underlying any kind of change detection. Contrary to the concept of a passive iconic storage and slow acquisition of categorized (long-term) memory, an alternative view was pioneered by M. Coltheart (1980). According to his suggestion, identification (What is this?) and location (Where is it?) information is acquired quickly, and the coordination of the two types of information is automatic. Iconic memory is the result of such
coordinated information. However, the result of the coordination is fragile, and in the absence of consolidation processes it decays. Consolidation was supposed to be a capacity-controlled process (i.e., influenced by top-down effects). An open question is the fate of the non-consolidated contents. If such contents decay quickly, explanations of the change-blindness phenomena built upon the ‘revised’ icon would be similar to those based on the traditional view of iconic memory, i.e., except for a small subset of elements (objects), no proper raw material would be available for change detection. One may suppose that the role of the capacity-limited processes is the acquisition of memory representations capable of regulating the ongoing behavior and/or influencing the content of conscious experience. However, at the same time, memory representations outside the capacity-limited processes may preserve their contents for a longer period as implicit background knowledge. As a part of background knowledge, this memory system is suitable for participating in implicit change detection processes. Therefore the meaning of the term “consolidation” is not unequivocal. In the following, arguments will be presented favoring the persistence of a large-capacity visual memory. In an influential study Rock and Gutman (1981) presented two meaningless superimposed shapes. The shapes had different colors, and the task was to judge the aesthetic value of the shape with a particular color. Recognition performance for the shapes with the non-attended color in an incidental (unexpected) task was at chance level. Accordingly, no explicit memory for these shapes was available. However, in a later study DeSchepper and Treisman (1996) reported evidence for an implicit memory effect of such non-attended shapes. In this experiment the participants compared the attended shapes to test shapes, and the matching RT was measured. The design allowed for the possibility of a negative priming effect.
In negative priming studies (for a review see Tipper 2001) previously rejected stimuli elicit responses with longer RT than control stimuli. In some trials of the DeSchepper and Treisman (1996) study the shapes in the irrelevant color were presented again, but this time in the relevant color. RT increased for such shapes, i.e. a negative priming effect emerged. Accordingly, stimuli without explicit memory representation affect later behavior. The negative priming effect has a surprisingly long duration; the effect survived 200 intervening stimuli. Visual search is one of the most frequently employed paradigms in research on visual processing. In various versions of the paradigm participants have to indicate the presence/absence or the identity of a particular item (target). Latent learning of visual displays has been demonstrated in visual search tasks. Using a contextual cuing² variation of the search paradigm, Jiang and Leung (2005) asked the participants to search for a black (or white) T among black L and white L distractors. The task was the identification of the orientation of the T-shape. When the black T was the
target, black Ls were considered the attended context, whereas the white Ls served as the ignored context. In several trials either the attended or the ignored context was repeated. Afterwards the attended and ignored contexts changed color. The context repetition effect was assessed for the attended and ignored context, and the effect of transfer was assessed as the performance after the change of the context colors. Repetition of the attended distractor set facilitated the search, but no such context effect appeared for the repetition of the ignored set. However, when the color relevance was reversed, the previously ignored context led to faster RT than a new arrangement (context) of distractors. Accordingly, the repeated presentation of an arrangement of non-attended distractors resulted in latent learning, i.e., the acquisition of a long-lasting and high-capacity visual memory representation. The role of implicit memory in visual search was also demonstrated in a series of experiments by Lleras et al. (2005). These authors investigated target identification RT in the case of repeated displays containing 16 or 32 elements. Distributions of search RT were markedly different in the first and in the subsequent presentations. As an example (Experiment 1), at the first presentation only 4% of the responses were faster than 500 ms, while in the second and third presentations RTs shorter than 500 ms appeared with probabilities of 53% and 52%, respectively. The authors termed this effect ‘rapid resumption’. The effect is larger at longer stimulus durations, it is not disrupted by the presentation of a new search display, and its duration (as indicated by the interval between the consecutive presentations) is longer than 3000 ms. It is difficult to explain these results without considering a representation acquired after the first presentation. Further results are needed to disclose the capacity of this memory.
The authors suggested that it is restricted to a limited set of display elements, including the target. However, this memory is definitely implicit, and at the same time it is “active” in the sense that it contributes to the identification of the target at the next presentations of the display. As Lleras et al. (2005) suggested, a predictive model was formed at the first presentation, and at the consecutive presentations this model was tested.
4.2.2 Development of memory representations for visual stimuli
Several attempts were made to demonstrate the gradual build-up of memory representation in the change blindness paradigm. Rensink et al. (2000) presented one of the scenes for a longer period of time before the flickering sequence of changing scenes. This preview was considered an opportunity for the consolidation of the representation of the scene. The preview condition did not result in faster change detection (i.e., detection in fewer flicker cycles). However, in some other studies accumulation of information was observed over successive presentations. Object preview
increased localization performance (Hollingworth 2005). Recall memory with successive interrupted presentations of a scene (another scene was presented between two 250 ms presentations of a scene) was similar to the results without interruption (Melcher 2001). In a recent study Vierck and Kiesel (2008) varied the number of pre-change frames. Either one (AB cycle), two (AABB cycle) or five (AAAAABBBBB cycle) pictures were presented before the change. In the case of information accumulation, the longer the cycle, the fewer changes are needed for change detection. The results confirmed this assumption. When change detection in an “oddball-like” sequence, where one of the scenes was presented frequently and the other one infrequently (AAAAAB), was compared with detection in a “roving” sequence with an equal number of changes (AAABBB), the roving procedure produced faster detection. This result shows that the build-up of both representations contributes to conscious change detection. Finally, the authors obtained better change-detection performance for the regular presentation of the scenes (AABB) than for the random presentation with an identical number of repetitions and changes. It seems that the random presentation prevented the build-up of separate memory representations for the two different scenes. Such results emphasize the role of stimulus repetition. It seems that repetition is not only a temporal summation process. Successive presentations can be conceived as successive samples from the environment. Identical samples are capable of creating a representation of regularities, whereas a single sample, even if the sampling time is long, is just a single event. However, at this time there is no direct support for this suggestion. At any rate, this suggestion may account for the lack of a beneficial effect of preview in the Rensink et al. (2000) study.
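The presentation cycles used by Vierck and Kiesel (2008) can be written out explicitly. The following Python lines are only an illustrative sketch: the function names and the number of cycles are assumptions made for the example, not parameters taken from the study.

```python
def flicker_sequence(n_repeats, n_cycles):
    """Build a scene sequence in which scene A is shown n_repeats times,
    then scene B n_repeats times, repeated for n_cycles cycles."""
    cycle = ["A"] * n_repeats + ["B"] * n_repeats
    return cycle * n_cycles


def count_changes(sequence):
    """Count the positions where the scene differs from the preceding one."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)


# The three cycle types described in the text (two cycles each, an assumed value).
for n_repeats in (1, 2, 5):
    seq = flicker_sequence(n_repeats, n_cycles=2)
    print("".join(seq), "changes:", count_changes(seq))
```

With a fixed number of cycles the number of changes is the same for all three cycle types, while the number of pre-change presentations differs, which is the variable of interest in the study.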
In a general theory of visual scene representation Rensink (2000) proposed that coherent object representation depends on focal attentional processes. Without a low-capacity attentional system only volatile units (proto-objects) are formed. Proto-object generation is continuous, and in the absence of consolidated memory consecutive proto-objects replace each other at each stimulus presentation.³ Therefore proto-objects cannot participate in comparison processes between the pre- and post-change scenes. Proto-objects are stabilized when feed-back arrives from high-level connections (called nexus). This process requires attention. Objects in such a state belong to the coherence field. Withdrawal of attention leads to the loss of coherence, without any remaining detailed (visual) memory.⁴ Rensink (2002) also hypothesized fast, non-attentive processes capable of recognizing the abstract meaning (gist) of scenes, and capable of preserving the layout (spatial arrangement) of some objects. Gist and layout information may influence spatial attention, and in this way they contribute to the formation of object coherence. It is unclear, however, what the status of the ‘gist’ and the ‘layout’ is.
Fast and automatic processes contribute to the formation of short-term conceptual memory (e.g., Potter 1976; Intraub 1999). However, such memories are “fleeting” (V. Coltheart 1999), and these representations are usually considered only precursors of stabilized memory representations. Contrary to traditional models of cognitive psychology, the emergence of categorical representations is surprisingly fast. As an example, VanRullen and Thorpe (2001) asked participants to decide on the category membership (animals or vehicles) of picture contents. Event-related potentials (ERPs) differentiated the two categories as early as 70–80 ms after stimulus onset, and large differences were obtained 150 ms post stimulus (for related results see e.g. Mouchetant-Rostaing et al. 2000; Schendan et al. 1998). Saccadic eye movements to a target category were initiated as early as 120 ms (Kirchner and Thorpe 2005), with a median RT of 228 ms. Minimal manual RT in the categorization task was 260 ms, but the median was 400 ms, i.e., much longer (Delorme et al. 2004). Physical parameters of the pictures, such as contrast, influenced the processing time (Macé et al. 2005). Development of gist as a representation of visual scenes seems to be an automatic process. Li et al. (2002) obtained no performance decrement in a categorization task where the pictures were presented eccentrically in non-cued positions, and a simultaneous demanding task was performed at the center of the visual field. On the basis of the previous results one may predict no change blindness (a) when the gist changes while the elements remain similar, and (b) when scenes contain objects only within the capacity of short-term categorical memory. Testing possibility (a) is difficult (but not impossible). Concerning possibility (b), results of a recent study on a related phenomenon are promising.
As the inattentional blindness phenomenon (Mack and Rock 1998) shows, even a sudden appearance of an object remains unnoticed if the object is presented together with task-related stimuli. However, as Koivisto and Revonsuo (2007) demonstrated, unexpected stimuli with semantic relation to the task stimuli avoid the inattentional blindness phenomenon. Unexpected stimuli were recognized even if the physical appearance of the task stimuli and the unexpected stimuli were different (related words and pictures). As an explanation, semantic activation might contribute to the consolidation of the (otherwise fleeting) representation of the unexpected irrelevant stimuli.
4.2.3 Behavioral effects of non-conscious change on detection
Perceptual processes without awareness are well demonstrated (for a review see e.g. Merikle et al. 2001), and criteria of perception without awareness are carefully defined (e.g. Schmidt and Vorberg 2006). Semantic processing of verbal material
without awareness is shown in periods of attentional blink⁵ (Luck et al. 1996). However, the level of processing seems to be different when conscious processing is prevented by insufficient input quality (like masking) and when it is prevented by the lack of control processes (like inattention). In the former case higher-level processing is questionable (Kouider and Dehaene 2007), whereas in the latter semantic processing is well demonstrated, even at the level of brain electric activity (Vogel et al. 1998; Rolke et al. 2001). Dehaene (e.g. Dehaene et al. 2006) introduced the term ‘preconscious’ for cases when supra-threshold stimulation elicits no concurrent and feed-back activity from central executive (frontal) mechanisms. This may be the case in the implicit demonstrations of semantic processing. In a widely discussed study Fernandez-Duque and Thornton (2000) designed a version of unattended priming, and combined it with the change blindness effect. The task was the detection of the orientation of a white bar among an array of dark bars. The target was presented for a short duration, and its appearance was preceded by the presentation of two similar arrays. The timing of the preceding arrays corresponded to parameters optimal for change blindness (250 ms stimulus duration and 250 ms inter-stimulus interval). In certain trials one of the bars in the second array was rotated by 90 degrees (relative to its orientation in the first array). The most interesting finding of the study was an increased response accuracy in trials when (a) the changed bar had the same orientation as the test bar (congruency effect), even if (b) participants did not notice the rotation (change blindness), and (c) increased performance appeared also in locations other than the changing ones (i.e., responses were more accurate even in trials where the test bar appeared in a location opposite to the changing bar). Mitroff et al. (2002) challenged the implicit change detection explanation.
They pointed out that in the Fernandez-Duque and Thornton (2000) study the changing bar might function as a localization cue for the appearance of the test bar (the test bar occurred either in the position of change or in the position opposite to the change, but not in the other six possible positions). Eliminating the spatial effect (tests were presented in each location), Mitroff et al. (2002, Experiment 4B) obtained a performance increase only in the locations of the changing bar of the second array. Furthermore, in the experiment that copied the original design (Experiment 4A) the participants were aware of the contingency between the location of the changing bar and the test bar. However, Fernandez-Duque and Thornton (2003) pointed out that Mitroff et al. (2002) obtained no congruency effect (i.e., in the “spatially cued” trials performance was higher for both changed and unchanged bars, irrespective of the participants’ awareness of the change). The lack of a congruency effect was attributed to a floor effect (generally low accuracy and slow RT). Therefore Fernandez-Duque and Thornton (2003) went on to replicate the study using a condition with spatial uncertainty, but with a higher
general performance level (the duration of the test array was longer). They obtained a congruency effect even in trials with non-detected change. In this study RT was shorter in the congruent trials. Finally, in a more recent study Laloyaux et al. (2006) attempted to control for some possible methodological problems of the previous studies, and increased the number of participants. The results reinforced the implicit-change-effect view (in unaware change trials performance was better in both RT and accuracy when the orientation of the test stimulus was congruent with that of the bar after the change), i.e., the original suggestions as proposed by Fernandez-Duque and Thornton (2000).
4.2.4 Neuroscience of undetected changes
4.2.4.1 Change blindness related studies
Using the method of event-related brain potentials (ERPs), Turatto et al. (2002) investigated the effects of detected and undetected changes within the visual background and foreground. To this end they presented grey circles (foreground) over vertical black-and-white stripes (background). A pair of such stimuli was presented, and a “same-different” decision was required. While participants spontaneously detected brightness changes in the foreground (i.e., in the circles), changes of the background (i.e. change of the dark stripes to white and vice versa) were detected only after the instruction about the possibility of such changes. Unlike undetected changes (change in the background), detected changes (change of the circles and change of the stripes after the instruction) elicited ERP activity (late positivity). This activity emerged earlier over the anterior (frontal) regions than over the posterior (parietal) regions. The authors concluded that this anterior-posterior direction was a reflection of top-down influences. In this study no earlier activities were analyzed, so it is unknown whether undetected changes elicited activity different from stimulus repetition. In the latency range of late positivity there are various sources of ERP activity (P3a, P3b), and these activities are not hierarchically dependent. Therefore the results are not conclusive in assessing top-down influences. Using alphanumeric stimuli Niedeggen et al. (2001) also obtained differences in the late positivity latency range between detected and undetected changes. In this experiment either the identity or the location of a character became different. The alternative stimuli were presented in repeating cycles. Detected changes elicited the late positivity, whereas missed changes elicited ERP activity similar to the no-change trials.
Interestingly, late positivity emerged to changing stimuli in the stimulus cycle preceding the reported change, i.e., the ERP appeared to be a more sensitive measure of change detection than the explicit report. The authors did
not measure any change-related effects in earlier ERP latency ranges. It should be noted that the averaged number of trials was relatively low; therefore identification of ERP components with lower amplitudes would have been difficult. Koivisto and Revonsuo (2003) designed an ERP experiment using stimuli similar to those of the Fernandez-Duque and Thornton (2000) study. Pairs of displays containing vertical and horizontal bars were presented, and participants decided whether the second member was identical to the first one, or one of the bars changed orientation. As a hint of implicit change detection, they obtained longer RT in undetected change trials than in the no-change trials (see also Williams and Simons 2000). However, there was no ERP difference between the non-detected change and no-change conditions. Eimer and Mazza (2005) capitalized on results showing that attention directed to one side of the visual display elicits a negative ERP component over the contralateral posterior locations (N2pc). Pairs of displays containing faces were presented, with the possibility of a changing face across the successive pairs. The task was a “same-different” decision. Unlike undetected changes, detected changes elicited the N2pc component and an enhanced late positivity.⁶ Fernandez-Duque et al. (2003) presented pictures of complex scenes in a change blindness situation. They were able to compare the ERPs to undetected changes with the ERPs to non-changing scenes (i.e., when participants were unaware of the change versus non-changing scenes when the participants searched for a possible change). A significant ERP difference emerged in the 240–300 ms range as a positive deflection over the anterior locations to the undetected change. This finding opened the possibility of ERP signs of implicit change detection. In a recent investigation Kimura et al. (2008c) obtained an anterior positive ERP component to undetected irrelevant color changes.
The relevant stimuli (dots) appeared at the center; the task was the detection of a possible size change of the second member of a dot-pair. These dots were surrounded by other, task-unrelated colored dots. In several trials the color of the task-irrelevant dots was different in the two members of the stimulus pair. Color change elicited a positive ERP component. Interestingly, the latency of the positive component was shorter (160–180 ms) than that of the positivity in the Fernandez-Duque et al. (2003) study. The authors attributed the difference to the complexity difference between the displays of the two experiments. While ERP methods are sensitive to the temporal aspects of brain activity, sources of activity can be identified by using brain imaging methods (PET, fMRI). In an fMRI study Beck et al. (2001) compared the activity to detected and undetected changes within a secondary task. Undetected stimulus change elicited activity in posterior areas (fusiform and lingual gyri) and also in the inferior frontal gyrus. This pattern was markedly different from the activity following
detected changes. Detected changes elicited activity in parietal and dorsolateral frontal cortices, and the activity increased in the stimulus-specific brain areas. The importance of these results is twofold. First, undetected changes elicited characteristic brain activity, and second, in conscious detection prefrontal areas were recruited.⁷
4.3 “Oddball” studies: The visual mismatch negativity
Results from the auditory modality provided convincing evidence of an implicit memory system capable of detecting environmental regularities. Stimuli violating such regularities elicit the mismatch negativity (MMN) component of ERPs. MMN is elicited even if the irregular stimuli are irrelevant unattended events (for reviews see Schröger 2007; Winkler 2007; Winkler this volume). In the majority of studies MMN was investigated in the passive oddball paradigm. In this paradigm regularity is set up by identical (or almost identical) stimuli (standard), and the regularity is violated by the presentation of stimuli with different characteristics (e.g. deviant pitch, loudness, duration, etc.). Such deviant stimuli are presented infrequently, and the MMN is the difference between the ERPs to deviant and regular stimuli. The paradigm is passive, because in the typical case neither the standard nor the deviant stimuli are involved in the ongoing task. Participants usually read interesting books, play video-games, or perform various tasks. In order to investigate change-related brain electric activity without the involvement of focal attention and conscious awareness in vision, the proper paradigm has to be similar to the passive oddball paradigm of the auditory modality. The candidate deviant-related ERP component would be the visual mismatch negativity (vMMN). In fact, occipital/occipito-temporal negative waves to task-irrelevant, infrequently presented (deviant) visual stimuli were recorded as early as 1990 and 1992 (Alho et al. 1992; Czigler and Csibra 1990, 1992; Woods et al. 1992). However, the authors of these studies noted that attentive processing of the irrelevant stimuli was not strictly controlled. Later, starting with studies by Tales et al.
(1999) and Heslenfeld (2003), a considerable collection of papers claimed to demonstrate the emergence of ERP components indexing automatic (preconscious, preattentive) change detection (for reviews see Pazo-Alvarez et al. 2003; Czigler 2007). Figure 1 shows a typical vMMN. In oddball experiments vMMN was elicited by various deviant stimulus features, such as color, spatial frequency, spatial contrast, motion direction, shape, line orientation, stimulus location, and facial expression.⁸ In order to consider the vMMN (or any other ERP component) as an index of implicit memory-related
Figure 1. Event-related potentials and difference potentials in a passive visual oddball paradigm. The standard:deviant ratio was 9:1. Stimuli were grating patterns of low vs. high spatial frequency. In different sequences either the low or the high frequency stimuli were standard or deviant. Difference potentials (deviant minus standard) were calculated for the same stimuli in deviant and standard roles. Group average of 12 participants. (Czigler, unpublished data.)
activity, two important issues have to be clarified. First, in oddball studies an ERP difference between the rare (deviant) and frequent (standard) stimuli may emerge for reasons other than the mismatch between the memory representation of the standard and the representation of the deviant. Frequent stimulation may elicit a refractory state in brain structures; therefore the responses of the “fresh” neural substrate to the deviant may produce different ERP activity. Whereas some results support such an interpretation (e.g. Kenemans et al. 2003; Kimura et al. 2006), in the studies reviewed below the refractoriness interpretation is inappropriate (for research specifically investigating this issue see Czigler et al. 2002; Czigler et al. 2007; Kimura et al. 2008b; Pazo-Alvarez et al. 2004). The second issue concerns the
control of attention. Unfortunately there are only a few studies where the participants were asked whether they had noticed the vMMN-related stimulus change. However, in these studies (Winkler et al. 2005; Czigler and Pató 2009) vMMN was fairly similar to the results of other studies in the literature. In what follows, those vMMN results will be discussed that go beyond the simple “rare feature versus different feature” oddball paradigm, showing that the implicit memory system is more complex than a simple change-detection device.⁹ This is a system capable of registering regularities and the violation of regularities.
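The logic of the passive oddball paradigm described in this section (a 9:1 standard:deviant ratio and a deviant-minus-standard difference potential) can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: the function names, the random shuffling, and the toy epoch values are illustrative and are not taken from any of the studies cited; real sequences usually add constraints (e.g. no consecutive deviants) that are omitted here.

```python
import random


def oddball_sequence(n_trials, deviant_prob=0.1, seed=0):
    """Return a shuffled list of 'standard'/'deviant' labels.
    deviant_prob=0.1 corresponds to a 9:1 standard:deviant ratio."""
    rng = random.Random(seed)
    n_deviants = round(n_trials * deviant_prob)
    sequence = ["deviant"] * n_deviants + ["standard"] * (n_trials - n_deviants)
    rng.shuffle(sequence)
    return sequence


def average(epochs):
    """Average a list of equally long epochs sample by sample."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant-minus-standard difference potential, sample by sample."""
    deviant_erp = average(deviant_epochs)
    standard_erp = average(standard_epochs)
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


sequence = oddball_sequence(100)
print(sequence.count("standard"), sequence.count("deviant"))

# Toy epochs (arbitrary numbers, two samples each) for the same physical stimulus
# in its deviant and standard roles.
print(difference_wave([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0]]))
```

Comparing the same physical stimulus in its deviant and standard roles, as in Figure 1, keeps purely stimulus-driven differences out of the difference potential.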
4.3.1 Visual mismatch negativity to feature conjunction (Winkler, Czigler, Sussman, Horváth and Balázs 2005)
It is a truism that we perceive objects and scenes, not isolated features like color, contour, location, etc. The role of attentive processes in binding stimulus features is debated. Some prevalent theories emphasize the role of attention (e.g. Treisman and Gelade 1980; Rensink 2000; Quinlan 2003; Wolfe 1994), whereas object-related theories of attention emphasize the primacy of object formation (and accordingly the automatic conjunction of features; e.g. Duncan 1984). In a vMMN study we investigated the possibility of automatic conjunction of two features: line orientation and color. Colored (red/black or blue/black) grating patterns (horizontal or vertical) were constructed. Two conjunctions (e.g. blue-vertical and red-horizontal) were frequent members of a stimulus sequence, whereas the other two conjunctions were rare. The stimulus sequence was presented on the upper and lower parts of a screen as task-irrelevant stimuli. Stimulus duration was 17 ms, and the inter-stimulus interval was 350–450 ms. Participants fixated on the center of a dark stripe, and attended to a cross in this location. From time to time the cross was made wider or longer. Participants had to indicate the unpredictable change of the cross. ERPs to the rare and frequent color/orientation conjunctions were compared. The ERP difference emerged as a posterior negativity with 128 ms peak latency (vMMN), followed by a small posterior positive wave with 188 ms latency. Having completed this part of the study we conducted a recognition task. Gratings were presented and the participants were asked to choose the frequent ones. Performance was at chance level. In the second part of the experimental session the grating patterns became task relevant. Participants had to detect one of the two rare color/orientation conjunctions (target stimulus).
Grating patterns elicited characteristic attention-related ERP components, and from the records we were also able to reconstruct the vMMN.¹⁰ The main finding of this study is the identification of a memory system capable of registering the probability relationships among the grating patterns, even
if there is no conscious experience of such relationships. The memory system operates at a level above feature binding, because in the study all features had equal probabilities. The results indicate the possibility of automatic feature binding, at least in the case of a relatively simple visual environment. Furthermore, according to the results, this memory is capable of storing more than one representation (there were two frequent conjunctions).
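The point that all single features were equiprobable while the conjunctions differed in probability can be checked by simple counting. In the sketch below the counts per 100 stimuli are hypothetical (the text states only that two conjunctions were frequent and two were rare), and `feature_totals` is an illustrative helper name.

```python
# Hypothetical counts per 100 stimuli; the source gives no exact ratio.
conjunction_counts = {
    ("blue", "vertical"): 45,    # frequent conjunction
    ("red", "horizontal"): 45,   # frequent conjunction
    ("blue", "horizontal"): 5,   # rare conjunction
    ("red", "vertical"): 5,      # rare conjunction
}


def feature_totals(counts, index):
    """Sum conjunction counts over one feature dimension
    (index 0 = color, index 1 = orientation)."""
    totals = {}
    for features, n in counts.items():
        key = features[index]
        totals[key] = totals.get(key, 0) + n
    return totals


color_totals = feature_totals(conjunction_counts, 0)
orientation_totals = feature_totals(conjunction_counts, 1)
print(color_totals)
print(orientation_totals)
```

Each color and each orientation occurs equally often, so a deviance response cannot be driven by the rarity of a single feature; only the combination of color and orientation distinguishes rare from frequent stimuli.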
4.3.2 The storage underlying vMMN is not a passive iconic memory (Czigler, Weisz and Winkler 2007)
In the majority of vMMN studies the inter-stimulus interval was short, within the range of 300–1200 ms. This duration is within the life-time of the iconic memory. Sensitivity to backward visual masking is one of the defining characteristics of iconic memory. In this study both the standard and the deviant stimuli were followed by patterned masks. We varied the test-mask interval. In the case of a maskable memory no vMMN is expected to emerge at any of the test-mask intervals, because the appearance of a mask between the presentations of the two stimuli destroys the memory content of the “icon”. In the case of the development of a non-maskable memory, no vMMN is expected below the critical stimulus onset asynchrony (SOA), and vMMN emergence is expected above the critical SOA. Furthermore, the value of the critical SOA would indicate the duration necessary for the development of the memory system underlying vMMN. vMMN was investigated with green/black and red/black equiluminant checkerboards (test stimuli). One of the checkerboards appeared frequently within the sequence (standard), the other appeared infrequently (deviant). These stimuli were followed by masks (random arrangements of red and green hexagons). Both test and mask durations were 14 ms. In two experiments the SOA was varied between 14 and 174 ms. Both test and mask stimuli were irrelevant. Participants attended the center of the screen and performed a task similar to that of the previous (feature conjunction) study. At short SOAs (14 and 27 ms) no vMMN emerged. However, at longer SOAs we recorded a posterior negativity (vMMN) with ~130 ms peak latency, followed by a posterior positive component. vMMN amplitudes were similar in the 40 to 174 ms SOA range. These results were fairly similar to those of the Winkler et al. (2005) study.
In separate sequences we investigated the detection performance under similar conditions. Participants responded to the appearance of the deviant checkerboards. At 14 ms SOA detection performance was at chance level. Performance increased at 67 ms SOA. At 174 ms SOA detection was almost perfect.
Representation of regularities in visual stimulation 121
As these results show, above a critical SOA value a mask does not prevent vMMN emergence. Accordingly, the memory underlying vMMN is different from the maskable iconic storage. However, ~30–40 ms is needed for such a representation to develop. This interval is comparable to the duration necessary for the development of a memory that allows categorical decision (Kovacs et al. 1995). At the level of explicit discrimination, however, we obtained partial backward masking effects even at higher SOA values.
4.3.3 Sequential rules and vMMN (Czigler, Weisz and Winkler 2006)
The memory system underlying the (auditory) MMN is sensitive to sequential rules. As an example, a repeated stimulus within a sequence of alternating stimuli elicits the MMN (e.g. Horváth et al. 2001). In the Czigler et al. (2006) study regular sequences were constructed from isoluminant green/black and red/black checkerboards in an AABBAABB… order. Infrequently, an irregular third identical stimulus was presented (AABBAABBAAABB…). Accordingly, stimulus change per se was contrasted with the violation of regularity. The task was similar to those of our previous studies. There were two ERPs of interest in this paradigm: the ERP to the irregular repetition and the ERP to the regular change. These ERPs were compared to the ERPs to regular repetitions. Irregular repetition elicited a larger negativity as early as 100–140 ms, and a further negative component in the 220–260 ms epoch. Although the latency of the latter component is longer than that of the vMMNs of the previous studies, the distribution was clearly posterior, and the latency is within the latency range of some vMMN studies (e.g. Tales et al. 1999). Regular change, however, elicited no vMMN. Instead, the ERPs to such changes were positive in relation to the regular repetition. It is possible that this ERP difference is related to the “change positivity” reported by Kimura (Kimura et al. 2005, 2006, 2008a, b). The negativity to a stimulus repetition cannot be explained on the basis of a change in physical appearance, or as the refractoriness of an otherwise negative component emerging in this range. In conclusion, vMMN, like its auditory homolog, is sensitive to the violation of sequential regularities, not only to the physical change of stimulation.
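The logic of this sequence can be illustrated with a minimal sketch. It is not the stimulation program used in the study: the function name, the probability of an irregular repetition (p_irregular) and the labels are assumptions introduced only for illustration, since the chapter does not report these details.

```python
import random

def aabb_sequence(n_pairs, p_irregular=0.1, seed=0):
    """Return a list of (stimulus, label) tuples following an AABBAABB... rule.

    p_irregular is a hypothetical parameter: the chance that a pair is
    followed by an irregular third identical stimulus (AAA or BBB).
    """
    rng = random.Random(seed)
    seq = []
    current = "A"
    for _ in range(n_pairs):
        # First member of a pair: a regular change (except at sequence start).
        seq.append((current, "regular_change" if seq else "start"))
        # Second member of a pair: a regular repetition.
        seq.append((current, "regular_repetition"))
        # Occasionally add a third identical stimulus: the rule violation.
        if rng.random() < p_irregular:
            seq.append((current, "irregular_repetition"))
        current = "B" if current == "A" else "A"
    return seq

if __name__ == "__main__":
    print("".join(stim for stim, _ in aabb_sequence(6, p_irregular=0.3, seed=2)))
```

In such a sequence the irregular repetition involves no physical change relative to the preceding stimulus, whereas the regular change does; this is what allows the two to be dissociated in the ERP comparison against regular repetitions.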
4.3.4 Implicit change detection and vMMN (Czigler and Pató 2009)
To use vMMN as a tool for investigating the possibility of implicit registration of a changing scene, the method has to fulfil two requirements. First (as the Czigler et al. 2006 study indicated), the change has to violate the regularity of stimulation. Second, a careful test of the lack of conscious change detection is needed.
122 István Czigler
In order to establish regularity we used a “roving standard” procedure. In this procedure a sequence of identical stimuli is presented (in the present study the length of the sequence varied randomly within the range of 10–15 stimuli); thereafter, in a new sequence, a new stimulus type is presented. Such a cycle can be repeated several times (18–22 in the present study). The stimuli were grid patterns (either green on red background or vice versa). In the lower half of the visual field the rectangular elements of the grids were either horizontal or vertical within a sequence. In the upper half-field there was no stimulus change. The grid pattern was irrelevant. Participants responded to the size change of a small rectangle in the center of the screen. Halfway through the experimental session we introduced a semi-structured interview. If there was the smallest indication that a participant had detected the changing pattern in the lower half-field, her/his data were omitted from further processing and the session was terminated. From the initial 17 participants only 3 reported that the grids were not identical at all presentations (no one reported the real stimulus change). Before the second part of the session the “roving standard” construction was explained and demonstrated. Participants were instructed that the task remained the same, but that they were free to observe the changes of the pattern. In spite of the lack of any detected change in the first part of the session, in the 250–400 ms latency range the stimuli in the change position elicited a right posterior negative shift, in comparison to the ERPs to the regular stimuli (the fifth identical stimulus within the sequence). Markedly different change-related ERP effects emerged in the second part of the session. In similar comparisons we obtained two negativities. 
The earlier one emerged in the 205–220 ms latency range as a wide negativity over the right hemisphere, while the later one (in the 305–330 ms latency range) was a bilateral posterior negativity. Accordingly, the ERP correlates of conscious detection were not only stronger, but also qualitatively different from the effects of implicit registration of stimulus change. As a control condition we introduced regular, AABBAABB… presentation of the alternative grid patterns. In this case no participant reported any regularity within the stimulation. Comparing the ERPs to the changing and repeating stimuli, we obtained no ERP effects. As the results of these experiments show, in order to elicit vMMN-like ERP effects, it is necessary to establish the representation of an environmental regularity, and to violate that regularity.
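The structure of the “roving standard” sequence described in this section can be sketched as follows. This is an illustrative sketch only, not the original experimental code: the function name and labels are hypothetical, and only the orientation of the grid elements is modelled. The run length (10–15), the number of cycles (18–22) and the use of the fifth identical stimulus as the regular comparison are taken from the description above.

```python
import random

def roving_standard(seed=0, run_len=(10, 15), n_cycles=(18, 22)):
    """Return a list of (orientation, position_in_run, label) trials.

    Labels: 'change'  = first stimulus of a new run (regularity violated),
            'regular' = fifth identical stimulus within a run,
            'other'   = all remaining stimuli.
    """
    rng = random.Random(seed)
    cycles = rng.randint(*n_cycles)      # randint is inclusive at both ends
    orientation = rng.choice(["horizontal", "vertical"])
    trials = []
    for cycle in range(cycles):
        length = rng.randint(*run_len)
        for pos in range(1, length + 1):
            if pos == 1 and cycle > 0:
                label = "change"         # the very first run has no change
            elif pos == 5:
                label = "regular"
            else:
                label = "other"
            trials.append((orientation, pos, label))
        # The next run uses the other orientation (the roving standard).
        orientation = "vertical" if orientation == "horizontal" else "horizontal"
    return trials

if __name__ == "__main__":
    trials = roving_standard(seed=1)
    print(len(trials), sum(1 for t in trials if t[2] == "change"))
```

Because every stimulus type serves both as the changing stimulus and, a few positions later, as the established standard, the comparison between 'change' and 'regular' trials is made on physically identical stimuli.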
4.4 Functional significance of implicit registration of regularities in vision
Having demonstrated the possibility of a memory system capable of registering regularities in the unattended stimulus background and reacting to the violation of such regularities, the question is obvious: “Why do we need such a system?” Unlike the transient auditory world, visual scenes are supposed to be continuously available, as “memory outside” (O’Regan and Noë 2001). However, even in the case of a steady scene (like a painting) and strong fixation, eye-blinks occur about once every three seconds on average. The duration of an eye-blink is surprisingly long: darkness lasts more than 200 ms (e.g. Caffier et al. 2005; Casse et al. 2007). Furthermore, under normal conditions, 100 ms periods without stimulus processing (saccadic suppression) occur at each saccadic eye movement (e.g. Diamond et al. 2000). As a conservative estimate, we spend 7–10% of our total waking time in darkness. In the periods of darkness ‘local transients’ cannot cue changes. In a rich visual environment a limited-capacity memory would be unable to follow all potentially important changes. Accordingly, in order to detect potentially important new objects/events, a large-capacity visual memory, not unlike the auditory memory, has an evolutionary advantage in serving the orienting system. However, the operation of a visual memory does not mean that we have to be aware of the results of its operations (e.g. consciously detect the violated regularities). On the one hand, a phenomenon like change blindness cannot disprove the possibility of implicit detection. On the other hand, indirect measures (priming effects and ERP data) may indicate the contribution of the implicit system. In a recent paper Bargh and Morsella (2008) discussed the contrast between a dumb and a smart unconscious mind (Loftus and Klinger 1992). Based on research on subliminal, masked, etc. (i.e., ill-perceived) stimulation, the effects of nonconscious processes seem to be rather limited. 
In contrast, as Bargh and Morsella (2008) argue, the presence of some events automatically activates representations and response tendencies (contextual priming). In this sense the unconscious is a sophisticated, flexible, and adaptive behavior guidance system. The system disclosed by the vMMN research is “moderately smart”. As in the auditory modality, this system is a “primitive intelligence” (Näätänen et al. 2001), capable of registering not only basic visual features like colour, spatial frequency, direction, contrast, shape and movement direction, but also the conjunction of features, object-related changes and temporal regularities. However, there is no direct evidence about the adaptive functions of this memory. As we have suggested, violation of regularities may call for orientation processes (attentional capture, increased autonomic activation).
The system underlying auditory mismatch negativity is supposed to serve anticipatory functions (Winkler 2007, this volume). Registration of background activity is necessary for veridical auditory perception. A predictive model of the regularities of the acoustic background is an effective tool for the computation of the difference between the actual net stimulation and the estimated background activity (Winkler 2007). Veridical visual perception also requires the registration of background stimulation. A simple example is lightness constancy. In order to perceive the lightness (or colour) of an object,11 the perceptual system has to consider the general level of illumination. Since the characteristics of the background illumination and the brightness of any object are available simultaneously, one may argue that there is no need to register either the regularities of background illumination or the intrinsic lightness properties (reflectance) of the objects in a memory system. Irrespective of theoretical position (see Palmer 1999, pp. 126–133 for a summary), the models of lightness constancy assume the computation of a space-averaged reflectance value. This computation involves both fast and slow processes (Shimozaki et al. 2001). The latter operate across saccadic movements (Cornelissen and Brenner 1995) and, supposedly, across eye-blinks. Calculation of reflectance is a laborious computation (particularly in the case of coloured objects). Therefore registration of initial values and of the results of previous computations would be of considerable benefit. This initial value can be considered a part of the object file (Shimozaki et al. 1999). The stimulus background may change during periods without information uptake (e.g. eye-blink, saccadic suppression). Accordingly, an implicit memory system registering the regularities of the background stimulation, and reacting to the violation of these regularities, has real adaptive value. 
Recent results (Alvarez and Oliva 2008) show the operation of a memory system capable of storing ensembles of task-irrelevant visual features. In a multiple tracking task the participants followed the trajectory of four dots, whereas four other dots were irrelevant in the tracking task. All dots disappeared for a short period of time. Afterwards several dots reappeared. In one of the conditions the participants indicated the location of the sole missing dot (attended or unattended), and in the other condition they indicated the centroid of four (either the attended or the unattended) missing dots. As the results show, performance was poor when the task required the location of an unattended dot, whereas performance was well above chance level when the task concerned the identification of the centroid of the four unattended dots. This result shows that the representation of statistical/holistic properties of the visual field does not require focal attentional processes. This finding fits our suggestion about a memory system capable of storing the regularities of the background stimulation, and supports our speculations about the role of such a memory system in perceptual constancies.12
As in the auditory modality, updating of the content of implicit memory systems can be considered an event giving rise to recordable brain activity, i.e. the vMMN. The involvement of an implicit memory system in perceptual constancies is a topic for further research.
Acknowledgements
Supported by the National Research Fund of Hungary (OTKA – K716000). I thank László Balázs, Lívia Pató, István Winkler and Júlia Weisz for their help.
Notes
1. The terminology in this description is different from a frequently used one. Lamme (2003) (see also Block 1996) made a distinction between “phenomenal” and “access” consciousness. Access consciousness is available for report, whereas phenomenal consciousness is an intermediate state, emerging as a consequence of feed-forward activation in the hierarchy of brain areas involved in stimulus processing and the activation of feed-back loops. Competition among these brain areas (attention) is supposed to select the activation underlying “access” awareness. In the present paper the term “consciousness” is used in the sense of “access consciousness”. We suppose that the representation underlying ‘phenomenal consciousness’ has the capacity of producing implicit change detection. We consider this representation as unconscious. In the present paper the term ‘unconscious’ is similar to Lamme’s (2003) term ‘ineffectively unconscious’. Apart from the different usage of some terms, we share the view about the substantial role of feedback mechanisms in consciousness, as suggested by Lamme (2003).
2. The effectiveness of contextual cuing in visual search is debated (see e.g. Kunar et al. 2007). However, analysis of the processes underlying visual search performance is beyond the scope of this paper.
3. It should be noted that Johnson, Hollingworth and Luck (2008) obtained evidence showing that an attention-demanding task interfered no more with the detection of conjunction change than with feature search. Such a result contradicts the need for sustained attention in preserving feature conjunctions (Wheeler and Treisman 2002).
4. Note that in this respect this model is different from the account proposed by Kahneman, Treisman and Gibbs (1992). The object file in the latter model preserves the conjunction of features following a change of the attentional field.
5. See Verleger (this volume) for a comprehensive discussion of the attentional blink phenomenon and the results of event-related potential studies using this phenomenon.
6. In this chapter we concentrate on the effects of non-conscious stimulation. The other side of the coin, the event-related potential correlates of conscious processing, is discussed in the chapter by Rolf Verleger.
7. A detailed review of the neural correlates of consciousness is beyond the scope of this paper. For a review see Dehaene and Naccache (2001).
8. VMMN is sometimes preceded by a posterior positivity (‘change-related positivity’ (CRP); e.g. Kimura et al. 2005, 2006, in press). According to the results from this laboratory, CRP emerges to different kinds of stimulus change. However, among studies from other laboratories using the oddball paradigm, only one reported CRP-like activity (Fonteneau and Davidoff 2007). Furthermore, the attention-independence of this component has not been seriously investigated.
9. Only one published study attempted to use vMMN for investigating non-conscious representation of the pre-change displays in a change blindness situation. Henderson and Orbach (2006) presented arrays of six patches. The second member of a stimulus pair was either identical to the first, or one of the patches changed orientation. The task was a non-speeded “same-different” detection. Throughout the session the same pattern was presented in 300 consecutive trials, and in the next 300 trials patches with orthogonal orientation were presented. The location with changing orientation was correctly cued, miscued or uncued. Change trials elicited a posterior negativity in the 180–320 ms latency range. This negativity emerged even in the miscuing condition. In the miscuing condition detection performance was only 59.5%. However, in change trials in the absence of detected change no such posterior negativity emerged. These results are not without controversies. First, even if the same stimuli were presented in a series of trials as the first member of the pair, the number of different stimuli was relatively high (25%). Therefore the chance for a detected regularity was relatively low. Second, even if mismatch negativity (both visual and auditory) is considered to be elicited by unattended stimuli, it is possible that attentional processes modulate the component (for a well-known example from the auditory modality see Woldorff et al. 1991). Third, as also suggested by the authors, the change might not have been large enough to elicit the vMMN.
10. The difference between the ERP to the rare conjunction without task-related features (Color–/Direction–) and the average of the ERPs to stimuli with one target-related feature (Color+/Direction– and Color+/Direction+).
11. Lightness refers to the intrinsic property of an object, the proportion of reflected and absorbed light, whereas brightness refers to the value of light reflected by the object, i.e., it is dependent on illuminance and reflectance.
12. An obvious difference is the conscious access to ensemble features in the Alvarez and Oliva (2008) study and the non-conscious representation as disclosed by the vMMN results. Furthermore, due to the dual-task nature of this task, the non-tracked dots were not fully unattended.
References
Alho, K., Woods, D. L., Algazi, A. and Näätänen, R. (1992). Intermodal selective attention 2. Effects of attentional load on processing of auditory and visual stimuli in central space. Electroencephalography and Clinical Neurophysiology, 82, 356–368.
Alvarez, G. A. and Oliva, A. (2008). The representation of simple ensemble visual features outside the focus of attention. Psychological Science, 19, 392–398.
Bargh, J. A. and Morsella, E. (2008). The unconscious mind. Perspectives on Psychological Science, 3, 73–79.
Beck, D. M., Rees, G., Frith, C. D. and Lavie, N. (2001). Neural correlates of change detection and change blindness. Nature Neuroscience, 4, 645–650.
Becker, M. W., Pashler, H. and Anstis, S. M. (2000). The role of iconic memory in change detection tasks. Perception, 29, 273–286.
Block, N. (2007). Consciousness, accessibility, and the mesh between psychology and neuroscience. Behavioral and Brain Sciences, 30, 481–499.
Caffier, P. P., Erdmann, U. and Ullsperger, P. (2005). The spontaneous eye-blink as sleepiness indicator in patients with obstructive sleep apnoea syndrome: A pilot study. Sleep Medicine, 6, 155–162.
Casse, G., Sauvage, J. P., Adenis, J. P. and Robert, P. Y. (2007). Videonystagmography to assess blinking. Graefe’s Archive for Clinical and Experimental Ophthalmology, 243, 1789–1796.
Coltheart, M. (1980). Iconic memory and visible persistence. Perception and Psychophysics, 27, 183–228.
Coltheart, V. (1999). Introduction: Perceiving and remembering brief visual stimuli. In V. Coltheart (Ed.), Fleeting memories (pp. 1–12). Cambridge, MA: MIT Press.
Cornelissen, F. W. and Brenner, E. (1995). Simultaneous colour constancy revisited: An analysis of viewing strategies. Vision Research, 35, 2431–2448.
Czigler, I. (2007). Visual mismatch negativity: Violation of nonattended environmental regularities. Journal of Psychophysiology, 21, 224–230.
Czigler, I., Balázs, L. and Winkler, I. (2002). Memory-based detection of task-irrelevant visual change. Psychophysiology, 39, 869–873.
Czigler, I. and Csibra, G. (1990). Event-related potentials in a visual discrimination task: Negative waves related to detection and discrimination. Psychophysiology, 39, 869–873.
Czigler, I. and Csibra, G. (1992). Event-related potentials and the identification of deviant stimuli. Psychophysiology, 29, 471–484.
Czigler, I. and Pató, L. (2009). Unnoticed regularity violation elicits change-related brain activity. Biological Psychology, 80, 339–347.
Czigler, I., Weisz, J. and Winkler, I. (2006). ERPs and deviance detection: Visual mismatch negativity to repeated visual stimuli. Neuroscience Letters, 401, 178–182.
Czigler, I., Weisz, J. and Winkler, I. (2007). Backward masking and visual mismatch negativity: Electrophysiological evidence for memory-based automatic detection of deviant stimuli. Psychophysiology, 44, 610–619.
Dehaene, S., Changeux, J. P., Naccache, L., Sackur, J. and Sergent, C. (2006). Conscious, preconscious and subliminal processing: A testable taxonomy. Trends in Cognitive Sciences, 10, 204–211.
Dehaene, S. and Naccache, L. (2001). Towards a cognitive neuroscience of consciousness: Basic evidence and a workspace framework. Cognition, 79, 1–7.
Delorme, A., Rousselet, G. A., Mace, M. J. M. and Fabre-Thorpe, M. (2004). Interaction of top-down and bottom-up processing in the fast visual analysis of natural scenes. Cognitive Brain Research, 19, 103–113.
DeSchepper, B. and Treisman, A. (1996). Visual memory for novel shapes: Implicit coding without attention. Journal of Experimental Psychology: Learning, Memory and Cognition, 22, 27–47.
Diamond, M. R., Ross, J. and Morrone, M. C. (2000). Extraretinal control of saccadic suppression. Journal of Neuroscience, 20, 3449–3455.
Duncan, J. (1984). Selective attention and the organization of visual information. Journal of Experimental Psychology: General, 113, 501–517.
Fernandez-Duque, D., Grossi, G., Thornton, I. M. and Neville, H. J. (2003). Representation of change: Separate electrophysiological markers of attention, awareness, and implicit processing. Journal of Cognitive Neuroscience, 15, 491–507.
Fernandez-Duque, D. and Thornton, I. M. (2000). Change detection without awareness: Do explicit reports underestimate the representation of change in the visual system? Visual Cognition, 7, 323–334.
Fernandez-Duque, D. and Thornton, I. M. (2003). Explicit mechanisms do not account for implicit localization and identification of change: An empirical reply to Mitroff et al. (2002). Journal of Experimental Psychology: Human Perception and Performance, 29, 846–857.
Henderson, R. M. and Orbach, H. S. (2006). Is there mismatch negativity during change blindness? NeuroReport, 17, 1011–1015.
Heslenfeld, D. J. (2003). Visual mismatch negativity. In J. Polich (Ed.), Detection of change: Event-related potential and fMRI findings (pp. 41–59). Boston: Kluwer Academic Press.
Hollingworth, A. (2006). Scene and position specificity in visual memory for objects. Journal of Experimental Psychology: Learning, Memory and Cognition, 32, 58–69.
Hollingworth, A. and Henderson, J. M. (2002). Accurate visual memory for previously attended objects in natural scenes. Journal of Experimental Psychology: Human Perception and Performance, 28, 113–116.
Horváth, J., Czigler, I., Sussman, E. and Winkler, I. (2001). Simultaneously active pre-attentive representations of local and global rules for sound sequences in human brain. Cognitive Brain Research, 12, 131–144.
Intraub, H. (1999). Understanding and remembering briefly glimpsed pictures: Implications for visual scanning and memory. In V. Coltheart (Ed.), Fleeting memories (pp. 47–70). Cambridge, MA: MIT Press.
Jiang, Y. H. and Leung, A. W. (2005). Implicit learning of ignored visual context. Psychonomic Bulletin and Review, 12, 100–106.
Kahneman, D., Treisman, A. and Gibbs, B. J. (1992). The reviewing of object files: Object specific integration of information. Cognitive Psychology, 24, 175–219.
Kenemans, J. L., Jong, T. G. and Verbaten, M. N. (2003). Detection of visual change: Mismatch or rareness? NeuroReport, 14, 1239–1242.
Kim, C. Y. and Blake, R. (2005). Psychophysical magic: Rendering the visible ‘invisible’. Trends in Cognitive Sciences, 9, 381–388.
Kimura, M., Katayama, J. and Murohashi, H. (2005). Positive difference in ERPs reflect independent processing of visual changes. Psychophysiology, 42, 369–379.
Kimura, M., Katayama, J. and Murohashi, H. (2006). Probability-independent and -dependent ERPs reflecting visual change detection. Psychophysiology, 43, 180–189.
Kimura, M., Katayama, J. and Murohashi, H. (2008a). Involvement of memory-comparison-based change detection in visual distraction. Psychophysiology, 45, 445–457.
Kimura, M., Katayama, J. and Murohashi, H. (2008b). Implicit change detection: Evidence from event-related potential. Poster presented at the 29th International Congress of Psychology, Berlin.
Kimura, M., Katayama, J. and Ohira, H. (2008c). Event-related potential evidence for implicit change detection: A replication of Fernandez-Duque et al. (2003). Neuroscience Letters, 448, 236–239.
Kirchner, H. and Thorpe, S. J. (2005). Ultra-rapid object detection with saccadic eye movements: Visual processing speed revisited. Vision Research, 46, 1762–1776.
Koivisto, M. and Revonsuo, A. (2003). An ERP study of change detection, change blindness, and visual awareness. Psychophysiology, 40, 423–429.
Koivisto, M. and Revonsuo, A. (2007). How meaning shapes seeing. Psychological Science, 18, 845–849.
Kouider, S. and Dehaene, S. (2007). Levels of processing during non-conscious perception: A critical review of visual masking. Philosophical Transactions of the Royal Society B, 362, 857–875.
Kovacs, G., Vogels, R. and Orban, G. A. (1995). Cortical correlate of pattern backward masking. Proceedings of the National Academy of Sciences, USA, 92, 5587–5591.
Kunar, M. A., Flusberg, S., Horowitz, T. S. and Wolfe, J. M. (2007). Does contextual cueing guide the deployment of attention? Journal of Experimental Psychology: Human Perception and Performance, 33, 816–828.
Laloyaux, C., Destrebecqz, A. and Cleeremans, A. (2006). Implicit change identification: A replication of Fernandez-Duque and Thornton (2003). Journal of Experimental Psychology: Human Perception and Performance, 32, 1366–1397.
Lamme, V. A. F. (2003). Why visual attention and awareness are different. Trends in Cognitive Sciences, 7, 12–18.
Landman, R., Spekreijse, H. and Lamme, V. A. F. (2003). Large capacity storage of integrated objects before change blindness. Vision Research, 43, 149–164.
Li, F. F., VanRullen, R. and Perona, P. (2002). Rapid natural scene categorization in the near absence of attention. Proceedings of the National Academy of Sciences of the United States of America, 99, 9596–9601.
Lleras, A., Rensink, R. A. and Enns, J. T. (2005). Rapid resumption of interrupted visual search. Psychological Science, 16, 684–688.
Loftus, E. F. and Klinger, M. R. (1992). Is the unconscious smart or dumb? American Psychologist, 47, 761–765.
Luck, S. J., Vogel, E. K. and Shapiro, K. L. (1996). Word meaning can be assessed but not reported during the attentional blink. Nature, 383, 616–618.
Macé, M. J. M., Thorpe, S. J. and Fabre-Thorpe, M. (2005). Rapid categorization of achromatic natural scenes: How robust at very low contrasts? European Journal of Neuroscience, 21, 2007–2018.
Mack, A. and Rock, I. (1998). Inattentional Blindness. Cambridge, MA: MIT Press.
Melcher, D. (2001). Persistence for visual memory for scenes. Nature, 412, 401.
Merikle, P. M., Smilek, D. and Eastwood, J. D. (2001). Perception without awareness: Perspectives from cognitive psychology. Cognition, 79, 115–134.
Merikle, P. M. (1992). Perception without awareness: Critical issues. American Psychologist, 47, 792–796.
Mitroff, S. R., Simons, D. J. and Franconeri, S. L. (2002). The siren song of implicit change detection. Journal of Experimental Psychology: Human Perception and Performance, 28, 798–815.
Mouchetant-Rostaing, Y., Giard, M. H., Bentin, S., Augera, P. A. and Pernier, J. (2000). Neurophysiological correlates of face gender processing in humans. European Journal of Neuroscience, 12, 303–310.
Müller, D., Winkler, I., Roeber, U., Schaffer, S., Czigler, I. and Schröger, E. (2008). Pre-attentive object formation is revealed by differential processing of deviances related to same or different objects. Perception, 37, 212–Suppl., S.
Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P. and Winkler, I. (2001). ‘Primitive intelligence’ in the auditory cortex. Trends in Neurosciences, 24, 283–288.
Niedeggen, M., Wichmann, P. and Stoerig, P. (2001). Change blindness and time to consciousness. European Journal of Neuroscience, 14, 1719–1726.
Norvalis, A. and Miller, J. (2008). Unconscious detection of change: Can we do without awareness? Personal communication, 16th April, 2008.
O’Regan, J. K. and Noë, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral and Brain Sciences, 24, 883–975.
Palmer, S. E. (1999). Vision science. Cambridge, MA: MIT Press.
Pazo-Alvarez, P., Amenedo, E. and Cadaveira, F. (2004). Automatic detection of motion direction changes in the human brain. European Journal of Neuroscience, 19, 1978–1986.
Pazo-Alvarez, P., Cadaveira, F. and Amenedo, E. (2003). MMN in the visual modality: A review. Biological Psychology, 63, 199–236.
Potter, M. C. (1976). Short-term conceptual memory for pictures. Journal of Experimental Psychology: Human Learning and Memory, 2, 509–522.
Quinlan, P. T. (2003). Visual feature integration theory: Past, present, future. Psychological Bulletin, 129, 643–673.
Raymond, J. E., Shapiro, K. L. and Arnell, K. M. (1992). Temporary suppression of visual processing in an RSVP task: An attentional blink? Journal of Experimental Psychology: Human Perception and Performance, 18, 849–860.
Rensink, R. A. (2000). Seeing, sensing and scrutinizing. Vision Research, 40, 1469–1487.
Rensink, R. A. (2000). The dynamic representation of scenes. Visual Cognition, 7, 17–42.
Rensink, R. A. (2002). Internal vs. external information in visual perception. Hawthorne, NY: Symposium on smart graphics, 63–70.
Rensink, R. A., O’Regan, J. K. and Clark, J. J. (2000). On the failure to detect changes in scenes across brief interruptions. Visual Cognition, 7, 127–145.
Rock, I. and Gutman, D. (1981). The effect of inattention on form perception. Journal of Experimental Psychology: Human Perception and Performance, 7, 275–282.
Rolke, B., Heil, M., Streb, J. and Henninghausen, E. (2001). Missed prime words within the attentional blink evoke an N400 semantic priming effect. Psychophysiology, 38, 165–174.
Schendan, H. E., Ganis, G. and Kutas, M. (1998). Neurophysiological evidence for visual perceptual categorization of words and faces within 150 ms. Psychophysiology, 35, 240–251.
Schmidt, T. and Vorberg, D. (2006). Criteria for unconscious cognition: Three types of dissociation. Perception and Psychophysics, 68, 489–504.
Schröger, E. (2007). Mismatch negativity: A microphone into auditory memory. Journal of Psychophysiology, 21, 138–146.
Shimozaki, S. S., Eckstein, M. and Thomas, J. P. (1999). The maintenance of apparent luminance of an object. Journal of Experimental Psychology: Human Perception and Performance, 25, 1433–1453.
Shimozaki, S. S., Thomas, J. P. and Eckstein, M. P. (2001). Effects of luminance oscillations on simulated lightness discrimination. Perception and Psychophysics, 63, 1048–1062.
Simons, D. J. (2000). Current approaches to change blindness. Visual Cognition, 7, 1–15.
Simons, D. J. and Levin, D. T. (1997). Change blindness. Trends in Cognitive Sciences, 1, 78–89.
Sperling, G. (1960). The information available in brief visual presentations. Psychological Monographs, 74, 1–29.
Sperling, G. (1967). Successive approximations to a model for short term memory. Acta Psychologica, 27, 285–292.
Tales, A., Newton, P., Troscianko, T. and Butler, S. (1999). Mismatch negativity in the visual modality. NeuroReport, 10, 3363–3367.
Tipper, S. P. (2001). Does negative priming reflect inhibitory mechanisms? A review and integration of conflicting views. Quarterly Journal of Experimental Psychology, Section A. Human Experimental Psychology, 54, 321–343.
Treisman, A. M. and Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology, 12, 97–136.
Turatto, M., Angrilli, A., Mazza, V., Umilta, C. and Driver, J. (2002). Looking without seeing the background change: Electrophysiological correlates of change detection versus change blindness. Cognition, 84, B1–B10.
VanRullen, R. and Thorpe, S. J. (2001). The time course of visual processing: From early perception to decision making. Journal of Cognitive Neuroscience, 13, 454–461.
Vierck, E. and Kiesel, A. (2008). Change detection: Evidence for information accumulation in flicker paradigm. Acta Psychologica, 127, 309–323.
Vogel, E. K., Luck, S. J. and Shapiro, K. L. (1998). Electrophysiological evidence for postperceptual locus of suppression during attentional blink. Journal of Experimental Psychology: Human Perception and Performance, 24, 1654–1674.
Wheeler, M. and Treisman, A. M. (2002). Binding in short-term visual memory. Journal of Experimental Psychology: General, 131, 48–64.
Williams, P. and Simons, D. J. (2000). Detecting changes in novel, complex three-dimensional objects. Visual Cognition, 7, 297–332.
Winkler, I. (2007). Interpreting the mismatch negativity. Journal of Psychophysiology, 21, 147–163.
Winkler, I., Czigler, I., Sussman, E., Horváth, J. and Balázs, L. (2005). Preattentive binding of auditory and visual stimuli. Journal of Cognitive Neuroscience, 17, 320–339.
Woldorff, M. G., Hackley, S. A. and Hillyard, S. A. (1991). The effect of channel-selective attention on the mismatch negativity wave elicited by deviant tones. Psychophysiology, 28, 30–42.
Wolfe, J. M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238.
Woods, D. L., Alho, K. and Algazi, A. (1992). Intermodal selective attention 1. Effects on event-related potentials to lateralized auditory and visual stimuli. Electroencephalography and Clinical Neurophysiology, 82, 341–355.
chapter 5
Auditory learning in the developing brain Minna Huotilainen and Tuomas Teinonen University of Helsinki, Finland
5.1
Early auditory learning and measurements of behaviour
Today, we know that learning from the auditory environment starts already in utero. Among the first evidence was the preference of newborn infants for the voice of their mother, first demonstrated by DeCasper and Fifer (1980). They performed two experiments investigating the preference of newborn infants for different sounds. As an index of preference, they used a behavioural procedure in which the infant sucks on a special pacifier that measures the frequency of the sucking bursts. Infants were conditioned to learn that a change in the sucking frequency from the baseline produced either a recording of their mother reading a children’s story or, when the change was in the opposite direction, a recording of another infant’s mother reading aloud the same story. The procedure was counterbalanced by reversing the pattern in half of the infants. Eight of the 10 infants measured chose to listen to their mother’s voice by shifting the sucking frequency in the respective direction. As the infants were less than 4 days old, the result was a strong indication that the infants had learned their mother’s voice during the fetal period. This learning is most probably based largely on the typical prosodic patterns of the mother’s speech, and less on information at higher frequencies (voice quality, phoneme quality), which are attenuated in utero. This assumption is supported by recent studies of neonatal learning (Vouloumanos and Werker 2007), highlighting the importance of low frequencies for neonates. Similar learning results have also been obtained for music (Hepper 1991; Wilkin 1995), but more research is needed to confirm the extent of prenatal musical learning.
Most of the behavioural research on infant learning concentrates on infants of 6 months or older, when more developed behavioural measures are available. Jusczyk and Hohne (1997) tested the word-learning skills of 8-month-old infants
by exposing them ten times over a two-week period to 30 minutes of speech containing three recorded stories. The stories contained key words that occurred repeatedly. After a two-week interval in which the infants did not hear the stories, the learning of the words was tested using the head-turn preference procedure, in which infants display recognition of familiar or novel items by keeping their heads turned significantly longer towards a loudspeaker playing this type of stimulus (for details, see Jusczyk 1997: 244). Only infants exposed to the stories displayed a familiarity preference when presented with words from the stories along with novel words with similar stress and phonetic properties that did not occur in the stories. Infants in a control group, who were not exposed to the stories, showed no preference. Subsequently, Saffran et al. (2000) used a similar approach to investigate learning of a musical piece by seven-month-old infants. They found that the infants were able to distinguish the Mozart piano sonata movements that they had been exposed to from similar but novel music.
The potential for memorising speech and music exists early in development. However, general, context-independent memory representations for words and melodies might not be available until later in life. Houston and Jusczyk (2003) studied the memory of 7.5-month-old infants for words. In the first session, they exposed the infants to a short story containing 30 repetitions of the target words. After one day, the infants were tested for their recognition of the target words using the head-turn preference procedure. The infants recognised the target words when they were spoken by the same speaker as in the exposure. However, when the speaker identity was changed, they failed to recognise the words. This suggests that early in development, the memory for words is at least partly speaker-specific. Trainor et al.
(2004) studied 6-month-old infants’ memory for musical melodies with a similar approach. They exposed the infants to a melody for 3 minutes a day for seven days. On the eighth day, they tested the infants with the head-turn preference procedure. They found that the infants recognised the melody they had been listening to during the preceding week, as reflected by a preference for an alternative, novel melody. However, when either the tempo or the timbre of the familiar melody was changed, the infants showed no preference, indicating that they had memorised the specific tempo and timbre of the melody. Thus, analogously to the speaker-specific memory representations for words in speech, the authors concluded that the infants’ memory for music is timbre- and tempo-specific. An alternative explanation is possible, however. It should be noted that the infants showed a preference for a novel melody when it was compared to a known melody in its original tempo and timbre, but no preference for a novel melody when it was compared to a known melody in a changed tempo or timbre. This may be a general indication of a preference for any novel elements, be
it a melody that is completely novel, or a novel tempo or timbre in an otherwise recognised, known melody.
Using fluent natural speech as stimuli with infants is challenging for various reasons. Natural speech varies in multiple dimensions, and for many research questions it is important to keep all but one dimension constant. Natural language is immensely complex, and we still lack comprehensive knowledge of which cues infants are able to use and how they use them when processing natural language. Artificial languages have been developed to probe single components of language learning (Gómez and Gerken 2000). An artificial language typically contains a small set of simple words or syllables and rules that define how the language is produced. The studies focusing on statistical learning and rule-learning below all use artificial languages to control the properties of the language input.
Statistical learning has been suggested as one of the key components of learning in infancy. It comprises all learning mechanisms that are based exclusively on the statistical properties of the input, such as transitional probabilities and statistical distributions. Within the auditory domain, statistical learning could help infants to organise the speech input into intelligible units, grasp the underlying structure of music, or separate essential sounds from the environment. Saffran et al. (1996) found that 8-month-old infants were able to separate words from a stream of syllables using only information about the transitional probabilities between adjacent syllables. They exposed the infants for two minutes to a continuous stream of syllables containing four embedded trisyllabic pseudowords. After this period, they tested whether the infants were able to separate the pseudowords from non-words, i.e., words which contained the final syllable of one pseudoword and the first two syllables of another.
Thus, the syllables of the non-words had been present in the exposure but, unlike those of the pseudowords, in which the three syllables always occurred together, they were not statistically coherent. The infants were able to distinguish the pseudowords from the non-words, providing evidence for learning of the transitional probabilities embedded in the sequence. More recently, Teinonen et al. (2009) demonstrated this same ability in neonates. Saffran et al. (1999) showed that 8-month-old infants display similar learning skills also for sequences of tones. Thus, statistical learning skills are not tailored for language, but can be applied to other domains as well.
Maye et al. (2002) investigated the effects of statistical distributions of phonemes on the learning of phoneme categories. They presented 6- and 8-month-old infants with syllables ranging from /da/ to /ta/. These syllables followed either a unimodal distribution, in which the middle sounds in the continuum were presented with the highest frequency, or a bimodal distribution, in which the sounds close to the endpoints of the continuum had the highest frequency of occurrence.
Using a modified head-turn preference procedure, they showed that after only 2.3 minutes of exposure, the infants exposed to the bimodal distribution of syllables discriminated the test syllables /da/ and /ta/. However, the infants exposed to the unimodal distribution of syllables did not show this discrimination. Maye et al. (2002) concluded that infants are sensitive to the frequency distribution of speech sounds and that they are able to use this information to detect phonetic category structure. Recently, Teinonen et al. (2008) demonstrated that visual information contributes strongly to the learning of phoneme boundaries. They showed that even a unimodal distribution of speech sounds can be perceived bimodally by 6-month-old infants with the help of bimodal visual counterparts.
Learning of simple rules in 7-month-old infants was tested by Marcus and others (1999). They divided the infants into two groups that were exposed to trisyllabic pseudowords constructed according to one of two rules: either the first two syllables of the words were identical (e.g., “le-le-di”) or the two final syllables were identical (e.g., “ji-li-li”). After a 2-minute exposure, the infants were tested in a variant of the head-turn preference procedure to see whether they distinguished novel words following the rule that they had been exposed to from novel words following the other rule. In both groups, the infants attended longer to the test items that did not follow the rule that they had been exposed to. This demonstrated that they had learned the rule they were exposed to.
Not everything encountered in life can or even should be learned. Infants may deal with the abundance of information by having different constraints in their learning mechanisms. It is likely that evolution has shaped the learning mechanisms to focus on the information relevant to us and to ignore the majority that, from a learning perspective, is noise.
Saffran (2002) showed that adult participants acquire an artificial language more easily when it contains predictive dependencies similar to those present in the phrases of natural languages. The results were also generalised to 7- to 10-year-old children and to non-linguistic sounds. Infant research addressing the constraints or biases of learning (or, in a more restricted domain, of statistical learning) still has much to achieve. Today, the research is focused more on the important questions of what can be learned, when it can be learned, and what mechanisms are available for acquiring that particular type of information. The next step is to widen the perspective to the constraints infants have for learning from different types of sensory input. The two combined will give us a more comprehensive view of the learning process.
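The transitional-probability cue used in the syllable-stream studies described earlier in this section can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the four pseudowords are hypothetical (they are not the original stimuli), the stream length and the rule that a word never repeats immediately are choices made for this example, and the transitional probability is estimated simply as the count of a syllable pair divided by the count of its first syllable.

```python
import random
from collections import Counter

# Hypothetical trisyllabic pseudowords (illustrative only, not the original stimuli).
PSEUDOWORDS = [
    ("ba", "ki", "tu"),
    ("do", "le", "mi"),
    ("fa", "no", "re"),
    ("gu", "pe", "so"),
]


def build_stream(words, n_words, seed=0):
    """Concatenate randomly ordered pseudowords into one continuous syllable stream.

    Assumption of this sketch: the same word is never played twice in a row.
    """
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in words if w != previous])
        stream.extend(word)
        previous = word
    return stream


def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) for every adjacent pair."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


stream = build_stream(PSEUDOWORDS, n_words=300)
tp = transitional_probabilities(stream)

# Syllable pairs that lie inside a pseudoword versus pairs that span a word boundary.
within_pairs = {pair for w in PSEUDOWORDS for pair in zip(w, w[1:])}
within_word = [tp[pair] for pair in within_pairs]
across_word = [p for pair, p in tp.items() if pair not in within_pairs]

print("lowest within-word TP:", min(within_word))
print("highest across-boundary TP:", max(across_word))
```

Because every syllable in this toy language belongs to exactly one pseudoword, the within-word transitions are fully predictable, whereas transitions across word boundaries are spread over the three possible following words. A non-word built from the end of one pseudoword and the beginning of another therefore contains a low-probability transition, which is the only cue available for telling it apart.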
5.2
Mismatch negativity and other brain indices of simple auditory learning
The mismatch negativity (Näätänen et al. 1978), or MMN, is an event-related potential extracted from the EEG and recorded as a response to a change in an otherwise constant auditory environment. It is proposed to index the functioning of auditory short-term memory or auditory sensory memory. The MMN has turned out to be very useful in studying learning during the course of development, partly because it does not require any active participation from the subject. Indeed, the MMN is best recorded in neonates during sleep (Fellman and Huotilainen 2006; Kushnerenko et al. 2002). The MMN is typically determined from the subtraction signal obtained by subtracting the response to the repetitive sound, the standard, from that to the deviating sound, the deviant. The response is observed in adults in the 120–300 ms latency range as a negative peak appearing especially strongly at the frontocentral scalp sites and as a positive peak at the mastoids (Kujala et al. 2007). Naturally, not all peaks observed in the subtraction signal are necessarily MMNs, since the subtraction signal may contain two other types of peaks. Category 1: traces of differences in the obligatory responses to the deviant and the standard, related to habituation, refractoriness, tonotopy, and factors correlating with duration or intensity differences of the sounds. Category 2: later responses that are caused by processes triggered by the MMN, related to, for example, updating or strengthening of the memory trace, involuntary attention switching (in adults called P3a), voluntary attention switching (P3b/P300), semantic processing (N400), or re-orienting after an attention switch (RON). All peaks observed in the subtraction signal should be interpreted with caution.
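The subtraction procedure described above can be sketched in a few lines of code. This is a minimal illustration on synthetic data, not an analysis pipeline: the sampling step, the trial counts, the noise level, and the shape, latency and amplitude of the simulated responses are all assumptions made for the example, and the 120–300 ms search window is the adult latency range mentioned in the text.

```python
import math
import random

SAMPLE_STEP_MS = 4                                # hypothetical 250 Hz sampling
TIMES_MS = list(range(-100, 501, SAMPLE_STEP_MS))


def gaussian(t, centre, width, amplitude):
    """Gaussian-shaped deflection used to build the synthetic responses."""
    return amplitude * math.exp(-0.5 * ((t - centre) / width) ** 2)


def simulate_epoch(rng, is_deviant):
    """One synthetic single-trial epoch in microvolts (not real data)."""
    epoch = []
    for t in TIMES_MS:
        value = gaussian(t, 100, 30, 1.0)          # obligatory response to any sound
        if is_deviant:
            value += gaussian(t, 200, 40, -2.0)    # assumed MMN-like negativity
        epoch.append(value + rng.gauss(0.0, 0.5))  # background noise
    return epoch


def average(epochs):
    """Average single-trial epochs sample by sample."""
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]


def difference_wave(deviant_epochs, standard_epochs):
    """Subtraction signal: averaged deviant response minus averaged standard response."""
    deviant_erp = average(deviant_epochs)
    standard_erp = average(standard_epochs)
    return [d - s for d, s in zip(deviant_erp, standard_erp)]


def most_negative_peak(wave, window_ms=(120, 300)):
    """Return (latency, amplitude) of the most negative sample inside the window."""
    in_window = [
        (value, t) for value, t in zip(wave, TIMES_MS)
        if window_ms[0] <= t <= window_ms[1]
    ]
    amplitude, latency = min(in_window)
    return latency, amplitude


rng = random.Random(1)
standards = [simulate_epoch(rng, False) for _ in range(400)]
deviants = [simulate_epoch(rng, True) for _ in range(100)]
wave = difference_wave(deviants, standards)
latency_ms, amplitude_uv = most_negative_peak(wave)
print("peak latency (ms):", latency_ms, "amplitude (uV):", round(amplitude_uv, 2))
```

In this simulation the obligatory response is identical for both sounds and cancels in the subtraction, so only the simulated negativity remains. With real recordings the obligatory responses to acoustically different sounds do not cancel completely, which is the source of the Category 1 peaks described above.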
In practice, if the standard and deviant sounds do not differ acoustically from each other very much, all peaks observed in the subtraction signal are either the MMN or belong to Category 2, that is, they reflect processes that are triggered by the MMN or by some other, still unknown, process in the neonatal brain related to change detection. Different Category 2 responses may partly explain the different scalp distributions, polarities and latencies found in the different experiments (Kushnerenko et al. 2007). In sum, to answer questions related to learning in infants, it is often sufficient to make sure that the responses seen in the subtraction signal are not Category 1 peaks simply by keeping the acoustic difference between the sounds as small as possible. Another possibility is to compare the response to the deviant with the response to an identical sound recorded in a condition without memory trace effects, e.g., an equiprobable condition (Kushnerenko et al. 2001a).
The first pitch MMN recordings (1000 Hz vs 1200 Hz) in infants were published by Alho et al. (1990a). Thereafter, several studies have shown MMN-like
responses to changes in pitch: Kurtzberg et al. (1995) (1000 vs 1200 Hz) with two presentation rates, Cheour et al. (1999) and control groups of Ceponiene et al. (2000) (1000 vs 1100 Hz), Leppänen et al. (1997) (1000 vs 1300 and 1100 Hz) in newborns, Tanaka et al. (2001) (750 vs 1000 Hz) in newborns, Morr et al. (2002) (1000 vs 1200 and 2000 Hz) in a wide age group and Novitski et al. (2007) in neonates in a wide frequency range. These results, even though varying in polarity and scalp distribution, show that infants do build memory representations of repetitive auditory events and notice changes in the pitch of sinusoidal and harmonic tones. Memory for pitch is essential for the infants in their goal of language acquisition. Responses to changes in sound duration have been reported by Kushnerenko et al. (2001a) (200 vs 300 and 400 ms) and by Cheour et al. (2002a) (100 vs 40 and 200 ms) in newborns. Responses to changes in phoneme duration have been reported by Pihko et al. (1999) in 0-6-month-old infants, by Friederici et al. (2002) in 2-month-old infants, and by Kushnerenko et al. (2001b) with pseudowords in newborns. Responses to silent gaps in sinusoidal sounds were reported by Trainor et al. (2001) in 6–7-month-old infants. These studies show that the neonatal brain can learn a specific duration of a sound and use the memory trace of this duration quite accurately in comparing it to incoming sounds. Several groups have also reported infant MMN responses to changes in phonemes. For example, Dehaene-Lambertz and Dehaene (1994) showed responses (/ba/ vs /ga/) in 2–3-month-old infants, Cheour-Luhtanen et al. (1995, 1996) (/y/ vs /i/ but not to /y/ vs /y-i/) in full-term and pre-term newborns, Cheour et al. (1997) (/y/ vs /i/ and /y/ vs /y-i/) in 3-month-olds, and Martynova et al. (2002) (/o/ vs /e/) in neonates. Pang et al. (1998) showed responses to /da/-/ta/-contrast in 8-month-old infants, Cheour et al. 
(1998) to Finnish and Estonian vowels in 6–12-month-old infants. Before them, Kurtzberg et al. (1986) had also recorded a very slow negative response to deviant /ta/ syllables presented amongst standard /da/ syllables, but they interpreted this response as modality-nonspecific and not resembling the MMN. Deregnier et al. (2000), recording responses to full words, reported late negative responses in an equiprobable condition. These results show that the neonatal auditory system has highly accurate memory capabilities for phonemes. The memory traces are formed quickly and accurately enough to enable the detection of a change of phoneme category. This detection is, however, not proof of the existence of phonetic categories. The data merely demonstrate a sensory memory system and a detection of the phoneme-specific features (formant frequencies and their transitions) that are sufficiently accurate to allow the separation of the phonemes or syllables.
During the first year of life, the infant acquires a so-called phoneme map. On this map, most probably situated within auditory cortical areas but possibly also within lower brain areas, a representation of each of the native-language phonemes is stored. These long-term memory traces of phonemes serve as templates for speech perception, making it possible to perceive speech quickly and automatically and moving the focus of speech sound perception from acoustic features towards more generalised templates. Depending, evidently, on the number of vowels in the native language, these phoneme maps develop between the ages of 6 and 12 months for vowels and later for consonants (Kuhl 2004; Cheour et al. 1998). During the process of building the phoneme maps, the auditory system commits to the native language: the perception of the native language is enhanced, while the perception of foreign languages simultaneously becomes less and less accurate. Even at the age of 3–6 years, foreign phonemes can still be added to the native-language phoneme map in just a few months by exposing the child to a foreign language daily (Cheour et al. 2002c), while in adults this would probably require years of language lessons.
The basic MMN experiments in neonates demonstrate, in sum, that the neonatal auditory system is accurate enough to differentiate sounds with small pitch and duration differences and differences in natural and semi-synthetic phonemes. The results also show that differences in the temporal structure of the sound are detected. The learning in these experiments occurs on a time scale of a few seconds: when the standard sounds start repeating, a memory trace is immediately formed. Cheour et al. (2002b), however, presented healthy neonates with a more challenging task in which learning took a few hours.
The infants were presented with the semi-synthetic phoneme /i/, replaced occasionally with the phoneme /y-i/, synthesised to include formant information that places the phoneme at the midpoint between /i/ and /y/. In the initial experiments, the neonatal brain was not able to detect such a small change in formant frequencies. Thereafter, the infants were exposed to the same phonemes while they were sleeping. The next morning, the neonates showed MMN responses to the same change, demonstrating that learning had occurred on the basis of mere exposure to the phonemes during sleep. These important results have unfortunately not been replicated. It would also be important to know to what extent these memory traces persist and at which ages such sleep learning can take place.
5.3
Magnetoencephalography reveals fetal learning
Magnetoencephalography (MEG) (Huotilainen 2006) is a useful method for investigating learning and its neurophysiological correlates. In neonates, MEG has
been used to record MMNm responses (the magnetic counterpart of the MMN) to changes in sound frequency (Huotilainen et al. 2003) and phonemes (Kujala et al. 2004; Pihko et al. 2004; Cheour et al. 2004). In addition to simple MMN experiments, MEG is also a good tool for studies of higher cognitive functions in infants (Huotilainen et al. 2008). For example, Sambeth et al. studied attention allocation (2006) and speech prosody detection (2008) in neonates with MEG. Most importantly, however, MEG, with specific technical arrangements, is suitable for recording the magnetic field produced by the fetal brain completely noninvasively from outside the mother’s abdomen. Most large-scale flat-bottom instruments designed for magnetocardiography are suitable for the purpose, and a purpose-built instrument, SARA (SQUID Array for Reproductive Assessment), is also available (Eswaran et al. 2007). Studies with these instruments have demonstrated that fetuses also show the MMNm response (Huotilainen et al. 2005; replicated by Draganova et al. 2005 and 2007 with SARA). After these promising first simple experiments, it is expected that MEG will provide us with more detailed data on learning in the human fetus, including learning with more natural sounds.
5.4
Early learning in complex auditory environments
Recently, ERP research has progressed to more complex research designs. Consequently, the research questions have expanded from questions such as ‘is the brain able to discriminate A from B’ to questions such as ‘can the brain detect a violation of rule X’ or rather ‘can the brain learn the rule X’. This development is especially fruitful with infants, as the behavioural research methods available early in life are limited. In the following, some of the most recent developments within the more complex studies of auditory learning are presented.
A crucial task in organising the auditory environment is separating the sounds produced by different sources. This process, called auditory stream segregation, enables us to selectively attend to a sound produced by a specific source while ignoring other sounds. For infants, it would be useful in tasks such as language learning by enabling them to focus on, say, the mother’s voice in a noisy environment. Winkler and others (2003) studied auditory stream segregation in newborn infants. They presented the infants with two streams of tones so that every third tone was from the first stream and the rest from the second stream. The first stream contained standard tones and tones deviating in intensity in oddball fashion. In the second stream, the tones varied both in intensity and in pitch. When the tones in the two streams were close together in frequency (Figure 1a), the deviant tones embedded in the first stream did not evoke an MMN response, suggesting that the infants were not able to segregate the two streams. However,
Figure 1. In the one stream condition (a), intervening tones varied in frequency and intensity. For the two-stream condition (b), the frequencies of one of the streams (white tones) were lowered from those used in the one-stream condition, but the intensity values were retained. (Adapted with permission from Winkler et al. 2003.)
when the two streams were presented in separate frequency ranges (Figure 1b), the deviant tones evoked an MMN response, suggesting that the infants were able to segregate the two streams and form a memory trace for the standard tones in the stream with the higher pitch. The results demonstrate that even newborn infants can separate the sound sources in their auditory environment. In real life, timbre is also likely to facilitate auditory stream segregation notably.
Carral and others (2005) exposed newborn infants to sound pairs with a simple abstract rule: in most of the pairs, the second sound had a higher pitch than the first sound. However, in one eighth of the cases, the second sound had a lower pitch. The ERPs measured to the onset of the second sound differed significantly between the two cases. As the sound properties were otherwise acoustically balanced, this was seen as evidence that the newborn infants were able to learn the abstract rule, i.e., that typically the pitch of the second sound is higher than that of the first one.
Ruusuvirta and others (2003) used a novel approach to study auditory feature binding in newborn infants. They played the infants tones that varied in frequency and intensity. Altogether four tones were used, covering all possible combinations of the two intensity levels (50 and 70 dB sound pressure level) and the two frequency levels (750 and 1000 Hz). Two of the tones occurred frequently and the other two rarely (together one tone in ten). The tones were assigned to standards and deviants so that the probability distribution of the levels of both features was symmetrical across the categories. This way, a deviant could not be discriminated from a standard on the basis of either frequency or intensity alone. The measured ERPs displayed a significant difference between the standard and deviant responses.
Thus, the newborn brain was able to combine the two features of the tones when forming memory representations for these sounds.
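The logic of the feature-binding design can be checked with a short calculation. The sketch below is an illustration under assumptions: the text gives the two frequency levels, the two intensity levels, and the overall proportion of rare tones, but not which two combinations served as standards, so the assignment used here (standards differing from each other in both features, each deviant equally probable) is a hypothetical choice that satisfies the symmetry described above.

```python
from fractions import Fraction
from itertools import product

FREQUENCIES_HZ = (750, 1000)
INTENSITIES_DB = (50, 70)

# Assumed assignment: the two standards differ from each other in both features.
STANDARDS = {(750, 50), (1000, 70)}
P_DEVIANT = Fraction(1, 10)   # the two rare tones together: one tone in ten


def tone_probabilities():
    """Presentation probability of each (frequency, intensity) combination."""
    probabilities = {}
    for tone in product(FREQUENCIES_HZ, INTENSITIES_DB):
        share = (1 - P_DEVIANT) if tone in STANDARDS else P_DEVIANT
        probabilities[tone] = share / 2
    return probabilities


def p_deviant_given(probabilities, feature_index, level):
    """P(tone is a deviant | one feature has the given level)."""
    matching = {
        tone: p for tone, p in probabilities.items() if tone[feature_index] == level
    }
    deviant_mass = sum(p for tone, p in matching.items() if tone not in STANDARDS)
    return deviant_mass / sum(matching.values())


probabilities = tone_probabilities()
single_feature = {
    ("frequency", f): p_deviant_given(probabilities, 0, f) for f in FREQUENCIES_HZ
}
single_feature.update(
    {("intensity", i): p_deviant_given(probabilities, 1, i) for i in INTENSITIES_DB}
)
for key, value in single_feature.items():
    print(key, value)
```

Under this assignment, knowing only the frequency or only the intensity of a tone leaves the probability that it is a deviant at the overall rate of one in ten, so neither feature alone is informative. Only the conjunction of the two features identifies a tone as a standard or a deviant.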
5.5
Conclusions
Our view of auditory learning in the infant brain is constantly challenged by new results showing more complex, faster, and more specialised learning than was previously expected. These new results stem partly from new methods and from developments in data processing and analysis techniques that directly benefit infant learning studies. Still, most of the new results are obtained by developing novel paradigms for brain research, allowing scientists to approach more complex and naturalistic situations of infant learning.
The neonatal brain in particular is very plastic and can adapt to many kinds of environments. The limits of this adaptation are challenged in infants who have major brain damage or minor disadvantages prior to, during, or immediately after birth. In order to understand the nature of these problems, and especially the challenges they pose for speech development and other attention-demanding tasks in the future, the development of neonatal auditory processing, memory, and attention allocation should be studied with natural sounds. This exciting field is progressing rapidly, and it is expected that the first applications of early remediation of auditory perceptual problems will be tested in the near future. Indeed, major results are expected owing to the plasticity of the brain prior to the age of 12 months. These new approaches will be especially helpful to infants with brain damage or a very premature birth.
References
Alho, K., Sainio, K., Sajaniemi, N., Reinikainen, K. and Näätänen, R. (1990). Event-related brain potential of human newborns to pitch change of an acoustic stimulus. Electroencephalography and Clinical Neurophysiology, 77, 151–155.
Carral, V., Huotilainen, M., Ruusuvirta, T., Fellman, V., Näätänen, R. and Escera, C. (2005). A kind of auditory ‘primitive intelligence’ already present at birth. European Journal of Neuroscience, 21, 3201–3204.
Ceponiene, R., Hukki, J., Cheour, M., Haapanen, M. L., Koskinen, M., Alho, K. and Näätänen, R. (2000). Dysfunction of the auditory cortex persists in infants with certain cleft types. Developmental Medicine and Child Neurology, 42, 258–265.
Cheour-Luhtanen, M., Alho, K., Kujala, T., Sainio, K., Reinikainen, K., Renlund, M., Aaltonen, O., Eerola, O. and Näätänen, R. (1995). Mismatch negativity indicates vowel discrimination in newborns. Hearing Research, 82, 53–58.
Cheour-Luhtanen, M., Alho, K., Sainio, K., Rinne, T., Reinikainen, K., Pohjavuori, M., Aaltonen, O., Eerola, O. and Näätänen, R. (1996). The ontogenetically earliest discriminative response of the human brain. Psychophysiology, Special Report, 33, 478–481.
Cheour, M., Haapanen, M. L., Hukki, J., Ceponiene, R., Kurjenluoma, S., Alho, K., Tervaniemi, M., Ranta, R. and Näätänen, R. (1997). The first neurophysiological evidence for cognitive brain dysfunctions in CATCH children. NeuroReport, 8, 1785–1787.
Cheour, M., Ceponiene, R., Lehtokoski, A., Luuk, A., Allik, J., Alho, K. and Näätänen, R. (1998). Development of language-specific phoneme representations in the infant brain. Nature Neuroscience, 1, 351–353.
Cheour, M., Ceponiene, R., Hukki, J., Haapanen, M.-L., Näätänen, R. and Alho, K. (1999). Brain dysfunction in neonates with cleft palate revealed by the mismatch negativity (MMN). Electroencephalography and Clinical Neurophysiology, 110, 324–328.
Cheour, M., Kushnerenko, E., Ceponiene, R., Fellman, V. and Näätänen, R. (2002a). Electric brain responses obtained from newborn infants to changes in duration in complex harmonic tones. Developmental Neuropsychology, 22, 471–480.
Cheour, M., Martynova, O., Näätänen, R., Erkkola, R., Sillanpää, M., Kero, P., Raz, A., Kaipio, M.-L., Hiltunen, J., Aaltonen, O., Savela, J. and Hämäläinen, H. (2002b). Speech sounds learned by sleeping newborns. Nature, 415, 599–600.
Cheour, M., Shestakova, A., Alku, P., Ceponiene, R. and Näätänen, R. (2002c). Mismatch negativity shows that 3–6-year-old children can learn to discriminate non-native speech sounds within two months. Neuroscience Letters, 325, 187–190.
Cheour, M., Imada, T., Taulu, S., Ahonen, A., Salonen, J. and Kuhl, P. (2004). Magnetoencephalography is feasible for infant assessment of auditory discrimination. Experimental Neurology, 190, S44–51.
DeCasper, A. J. and Fifer, W. P. (1980). Of human bonding: Newborns prefer their mothers’ voices. Science, 208, 1174–1176.
Dehaene-Lambertz, G. and Dehaene, S. (1994). Speed and cerebral correlates of syllable discrimination in infants. Nature, 28, 293–294.
Deregnier, R. A., Nelson, C. A., Thomas, K. M., Wewerka, S. and Georgieff, M. K. (2000). Neurophysiologic evaluation of auditory recognition memory in healthy newborn infants and infants of diabetic mothers. Journal of Pediatrics, 137, 777–784.
Draganova, R., Eswaran, H., Murphy, P., Huotilainen, M., Lowery, C. and Preissl, H. (2005). Sound frequency change detection in fetuses and newborns, a magnetoencephalographic study. NeuroImage, 28, 354–361.
Draganova, R., Eswaran, H., Murphy, P., Lowery, C. and Preissl, H. (2007). Serial magnetoencephalographic study of fetal and newborn auditory discriminative evoked responses. Early Human Development, 83, 199–207.
Eswaran, H., Haddad, N. I., Shihabuddin, B. S., Preissl, H., Siegel, E. R., Murphy, P. and Lowery, C. L. (2007). Non-invasive detection and identification of brain activity patterns in the developing fetus. Clinical Neurophysiology, 118, 1940–1946.
Fellman, V., Kushnerenko, E., Mikkola, K., Ceponiene, R., Leipälä, J. and Näätänen, R. (2004). Atypical auditory event-related potentials in preterm infants during the first year of life: A possible sign of cognitive dysfunction? Pediatric Research, 56, 291–297.
Fellman, V. and Huotilainen, M. (2006). Cortical auditory event-related potentials in newborn infants. Seminars in Fetal and Neonatal Medicine, 11, 452–458.
Friederici, A. D., Friedrich, M. and Weber, C. (2002). Neural manifestation of cognitive and precognitive mismatch detection in early infancy. NeuroReport, 13, 1251–1254.
Gómez, R. L. and Gerken, L. (2000). Infant artificial language learning and language acquisition. Trends in Cognitive Sciences, 4, 178–186.
Hepper, P. G. (1991). An examination of fetal learning before and after birth. Irish Journal of Psychology, 12, 95–107.
144 Minna Huotilainen and Tuomas Teinonen
Houston, D. M. and Jusczyk, P. W. (2003). Infants’ long-term memory for the sound patterns of words and voices. Journal of Experimental Psychology: Human Perception and Performance, 29, 1143–1154. Huotilainen, M., Kujala, A., Hotakainen, M., Shestakova, A., Kushnerenko, E., Parkkonen, L., Fellman, V. and Näätänen, R. (2003). Auditory magnetic responses of healthy newborns. NeuroReport, 14, 1871–1875. Huotilainen, M., Kujala, A., Hotakainen, M., Parkkonen, L., Taulu, S., Simola, J., Nenonen, J., Karjalainen, M. and Näätänen, R. (2005). Short-term memory functions of the human fetus recorded with magnetoencephalography. NeuroReport, 16, 81–84. Huotilainen, M. (2006). Magnetoencephalography of the newborn brain. Seminars in Fetal and Neonatal Medicine, 11, 437–443. Huotilainen, M., Shestakova, A. and Hukki, J. (2008). Using magnetoencephalography in assessing auditory skills in infants and children. International Journal of Psychophysiology, 68, 123–129. Jusczyk, P. W. (1997). The discovery of spoken language. Cambridge, MA: MIT Press. Jusczyk, P. W. and Hohne, E. A. (1997). Infants’ memory for spoken words. Science, 277, 1984– 1986. Kuhl, P. (2004). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience, 5, 831–843. Kujala, A., Huotilainen, M., Hotakainen, M., Lennes, M., Parkkonen, L., Fellman, V. and Näätänen, R. (2004). Speech-sound discrimination in neonates as measured with MEG. NeuroReport, 15, 2089–2092. Kujala, T., Tervaniemi, M. and Schröger, E. (2007). The mismatch negativity in cognitive and clinical neuroscience: Theoretical and methodological considerations. Biological Psychology, 74, 1–19. Kurtzberg, D., Vaughan, H. G. Jr. and Novak, G. P. (1986). Discriminative brain responses to speech sounds in the newborn high risk infant. In V. Gallai (Ed.), Maturation of the CNS and evoked potentials (pp. 253–259). Amsterdam: Elsevier. Kurtzberg, D., Vaughan, H. G. Jr., Kreuzer, J. A. and Fliegler, K. Z. (1995). 
Developmental studies and clinical application of mismatch negativity: Problems and prospects. Ear and Hearing, 16, 105–117. Kushnerenko, E., Ceponiene, R., Fellman, V., Huotilainen, M. and Winkler, I. (2001a). Eventrelated potential correlates of sound duration: Similar pattern from birth to adulthood. NeuroReport, 12, 3777–3781. Kushnerenko, E., Cheour, M., Ceponiene, R., Fellman, V., Renlund, M., Soininen, K., Alku P., Koskinen, M., Sainio, K. and Näätänen, R. (2001b). Central auditory processing of durational changes in complex speech patterns by newborns: An event-related brain potential study. Developmental Neuropsychology, 19, 83–97. Kushnerenko, E., Ceponiene, R., Balan, P., Fellman, V. and Näätänen, R. (2002). Maturation of the auditory change detection response in infants: A longitudinal ERP study. NeuroReport, 13, 1843–1848. Kushnerenko, E., Winkler, I., Horváth, J., Näätänen, R., Pavlov, I., Fellman, V. and Huotilainen, M. (2007). Processing acoustic change and novelty in newborn infants. European Journal of Neuroscience, 26, 265–274. Leppänen, P. H. T., Eklund, K. M. and Lyytinen, H. (1997). Event-related brain potentialsto change in rapidly presented acoustic stimuli in newborns. Developmental Neurophysiology, 13, 175–204.
Auditory learning in the developing brain 145
Marcus, G. F., Vijayan, S., Bandi Rao, S. and Vishton, P. M. (1999). Rule learning in sevenmonth-old infants. Science, 283, 77–80. Martynova, O., Kirjavainen, J. and Cheour, M. (2003). Mismatch negativity and late discriminative negativity in sleeping human newborns. Neuroscience Letters, 340, 75–78. Maye, J., Werker, J. F. and Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82, B101–B111. Morr, M. L., Shafer, V. L., Keruzer, J. A. and Kurtzberg, D. (2002). Maturation of mismatch negativity in typically developing infants and preschool children. Ear and Hearing, 23, 118–136. Novitski, N., Huotilainen, M., Tervaniemi, M., Näätänen, R. and Fellman, V. (2007). Neonatal frequency discrimination in 250–4000-Hz range: Electrophysiological evidence. Clinical Neurophysiology, 118, 412–419. Pang, E. W., Edmonds, G. E., Desjardins, R., Khan, S. C., Trainor, L. J. and Taylor, M. J. (1998). Mismatch negativity to speech stimuli in 8-month-old infants and adults. International Journal of Psychophysiology, 29, 227–236. Pihko, E., Leppänen, P. H., Eklund, K. M., Cheour, M., Guttorm, T. K. and Lyytinen, H. (1999). Cortical responses of infants with and without a genetic risk for dyslexia: I. Age effects. NeuroReport, 10, 901–905. Ruusuvirta, T., Huotilainen, M., Fellman, V. and Näätänen, R. (2003). The newborn human brain binds sound features together. NeuroReport, 14, 2117–2119. Saffran, J. R., Aslin, R. N. and Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928. Saffran, J. R., Johson, E. K., Aslin, R. N. and Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70, 27–52. Saffran, J. R., Loman, M. M. and Robertson, R. R. W. (2000). Infant memory for musical experiences. Cognition, 77, B15–B23. Sambeth, A., Huotilainen, M., Kushnerenko, E., Fellman, V. and Pihko, E. (2006). 
Newborns discriminate novel from harmonic sounds: A study using magnetoencephalography. Clinical Neurophysiology, 117, 496–503. Sambeth, A., Ruohio, K., Alku, P., Fellman, V. and Huotilainen, M. (2008). Sleeping newborns extract prosody from continuous speech. Clinical Neurophysiology, 119, 332–341. Tanaka, M., Okubo, O., Fuchigami, T. and Harada, K. (2001). A study of mismatch negativity in newborns. Pediatrics International, 43, 281–286. Teinonen, T., Aslin, R. N., Alku, P. and Csibra, G. (2008). Visual speech contributes to phonetic learning in 6-month-old infants. Cognition, 108, 850–855. Teinonen, T., Fellman, V., Näätänen, R., Alku, P. and Huotilainen, M. (2009). Statistical language learning in neonates revealed by event-related brain potentials. BMC Neuroscience, 10, 21. Tervaniemi, M., Kallio, J., Sinkkonen, J., Virtanen, J., Ilmoniemi, R. J., Salonen, O. and Näätänen, R. (2005). Test-retest stability of the magnetic mismatch response (MMNm). Clinical Neurophysiology, 116, 1897–1905. Trainor, L. T., Samuel, S. S., Desjardins, R. N. and Sonnadara, R. R. (2001). Measuring temporal resolution in infants using mismatch negativity. NeuroReport, 12, 2443–2448. Trainor, L. J., Wu, L. and Tsang, C. D. (2004). Long-term memory for music: Infants remember tempo and timbre. Developmental Science, 7, 289–296. Vouloumanos, A. and Werker, J. F. (2007). Listening to language at birth: Evidence for a bias for speech in neonates. Developmental Science, 10, 159–164.
146 Minna Huotilainen and Tuomas Teinonen
Wilkin, P. (1995). A comparison of fetal and newborn responses to music and sound stimuli with and without daily exposure to a specific piece of music. Bulletin of the Council for Research in Music Education, 27, 163–169. Winkler, I., Kusnherenko, E., Horváth, J., Čeponienė, R., Fellman, V., Huotilainen, M., Näätänen, R. and Sussman, E. (2003). Newborn infants can organize the auditory world. Proceedings of the National Academy of Sciences of the USA, 100, 11812–11815.
Chapter 6
Neurocomputational models of perceptual organization
Susan L. Denham,* Salvador Dura-Bernal,* Martin Coath* and Emili Balaguer-Ballester**
* University of Plymouth, UK / ** University of Heidelberg, Germany
6.1 Introduction
Our perceptual systems provide us with sensors and effectors to probe the external world, and with processing systems that allow us to make sense of the wealth of incoming information. We assume here that the goal of perception is to find simplifying explanations for the incoming signals, so that we can detect, differentiate and predict the behaviour of animate and inanimate entities in the world. This is achieved through the formation of associations between different parts of the scene and between different events.
Perceptual systems typically receive discontinuous input sequences, so some means of linking discrete events is required. In audition this is self-evident, as the signal is inherently temporal and, for most sounds of interest, intermittent. However, even though models of visual perception often treat the problem as one of segmenting static images, the nature of saccadic eye movements means that the visual system also processes sequences of discrete events from which it constructs representations of entities in the environment. It is therefore necessary for perceptual systems to create and maintain temporally persistent representations of detected entities. It is these representations that we consider to constitute ‘perceptual objects’ and to be the goal of perceptual organisation.
The challenge for perceptual systems is to form appropriate representations on the fly, which reflect the nature of external entities as accurately as necessary. Importantly, for autonomous systems this also requires that the system has some means of verifying its representations, and the ability to assess the efficacy of its models of the world without supervision. These requirements constrain and determine
the processing strategies of perceptual organisation and the architecture of perceptual systems in interesting ways.
The approach that we argue for here is that, as originally suggested by Helmholtz (1860/1962), perception can be understood as a process of inference, in which prior knowledge is integrated probabilistically with, and influences, subsequent processing. The basic idea is that making predictions at many temporal and featural scales is an effective strategy for discovering ‘what’s out there’, and for refining and verifying the accuracy of representations of the world, because in this way the world can act as its own check. Mismatches between expected and actual sensory experience allow us to identify the things that we don’t know about, and hence fail to predict. Unexpected events are therefore information-bearing and can tell us new things about the world. This information can then be used to create, refine or update internal representations or models of the world, which in turn lead to better predictions. A natural consequence of these ideas is that the processing architecture and sensitivities should reflect the structure and statistics of natural sensory inputs (Friston 2005; Kiebel et al. 2008). In summary, we suggest that it is this process of active sensory exploration and model building that underpins the development and ongoing operation of intelligent perception.
The idea that the brain is continually and automatically extracting patterns from the incoming sensory signals is strongly supported by recent experimental findings on auditory cognition in neonates, suggesting that this processing strategy is innate. Experiments have shown that neonates are sensitive to patterns at many different time scales, including regularities in the sound waveform that define pitch independent of timbre (Háden et al. 2009), regularities in the interval relationships between successive pitches (Stefanics et al. 2009), regularities in the relationship between pitch and timbre which define the perceived size of a sound source (Vestergaard et al. 2009), regularities in the rhythmic structure of events defined by inter-onset timing intervals (Winkler et al. 2009) and regularities in tone sequences defined by repeating patterns of notes (Stefanics et al. 2007). Thus regularities or patterns, which may be defined by a range of features or feature conjunctions, are extracted at many different time scales right from birth, if not before.
In this chapter, we will consider models of perceptual organisation in the visual and auditory modalities and try to motivate our view of perception as a process of inference and verification through a number of examples. We first present some basic concepts as useful building blocks for understanding perceptual organisation, and then show how these ideas can provide unifying insights into models of visual and auditory perception. In the following sections we review a number of models of perceptual organisation and focus on a particular class of
models, hierarchical generative models, which have at their core notions of prediction, verification and model building, organized within a hierarchical architecture. Finally, we discuss the phenomenon of perceptual bistability, which is found in both modalities, and show how it can be interpreted within the proposed generic framework.
6.2 Perception as inference: Basic concepts
Although an attractively simple idea, interpreting and implementing perception as inference is a complex problem, requiring the perceptual system to develop an architecture which reflects the nature of the external world; i.e. the system requires a structure which allows it to represent the multi-scale nature of regularities found in the natural world. This view of perception, which has gained support from physiological studies and has inspired a number of computational models, highlights some important processes that are necessary for a generic inferential architecture. Here we outline some of the basic concepts.
Change: Processing aimed at highlighting changes in the incoming activity is prevalent within all sensory systems. In vision a change in texture or contrast typically defines the edge of an object, and many models of edge detection motivated by this idea have been proposed (e.g. Jehee et al. 2007). In the auditory system, sharply enhanced responses are found at event onsets at all levels of the system, and the modeling of change detection, primarily in order to identify event onsets, has been widely studied (e.g. Coath et al. 2005; Fishbach et al. 2001).
Regularities: However, change is only meaningful within the context of some regularity, i.e. change and regularity are relative concepts. The key is to find and represent patterns or regularities in the incoming activity so that the system is sensitive to significant changes, i.e. new information, while not being unnecessarily sensitive to insignificant variations. Adaptation to stimulus statistics can be understood from this point of view. The result is a heightened response to stimuli which differ in feature distribution from that to which the system has adapted; examples include the auditory inferior colliculus (Dean et al. 2005) and primary somatosensory cortex (Garcia-Lazaro et al. 2007).
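The relation between regularity and change described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited models: the class name, the use of exponential running averages and the parameters `rate`, `threshold` and `initial_var` are all assumptions chosen for illustration. The detector adapts to the recent statistics of a single feature and reports a sample as a change only when it deviates strongly from that adapted regularity.

```python
import math


class AdaptiveChangeDetector:
    """Adapts to the running statistics of one feature and flags outliers."""

    def __init__(self, rate=0.1, threshold=3.0, initial_var=1.0):
        self.rate = rate            # adaptation speed (hypothetical parameter)
        self.threshold = threshold  # surprise, in standard deviations, counted as change
        self.mean = None            # adapted estimate of the regularity
        self.var = initial_var      # adapted estimate of its variability

    def step(self, x):
        """Return (surprise, is_change) for sample x, then adapt to it."""
        if self.mean is None:       # first sample: nothing to compare against yet
            self.mean = x
            return 0.0, False
        deviation = x - self.mean
        # Surprise is the deviation measured relative to the adapted variability.
        surprise = abs(deviation) / math.sqrt(max(self.var, 1e-12))
        is_change = surprise > self.threshold
        # Exponential running averages implement the adaptation.
        self.mean += self.rate * deviation
        self.var += self.rate * (deviation ** 2 - self.var)
        return surprise, is_change


detector = AdaptiveChangeDetector()
# Small, regular variation: the detector adapts and stays quiet.
regular = [detector.step(x)[1] for x in [1.0, 1.2] * 25]
# A deviant event is judged against the adapted regularity.
surprise, changed = detector.step(5.0)
print(any(regular), changed)
```

In this toy setting the same absolute deviation is more or less surprising depending on the variability to which the detector has adapted, which is the sense in which change and regularity are relative concepts.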
Multi-scale analysis: Change and regularity are meaningful only within a particular time or featural scale, and the scale of analysis has to be chosen appropriately. For example, the time scale necessary for detecting the patterns which define musical form would completely obscure the patterns in the timing of individual events which lead to a sense of rhythm, or the even more rapid periodic variations in pressure which lead to a sense of pitch; conversely, the spatial/featural scale necessary for resolving the individual letters on a page in order to
read a book would generate far too complex a representation for appreciating larger-scale objects. Thus a multi-scale analysis of the world, and the ability to switch flexibly between scales, is an essential aspect of perception.
Resolution versus integration: Although in some cases it may be necessary to integrate information over long time scales (or large spatial scales) in order to perceive regularities, it is also important to be able to construct sharp boundaries between unrelated objects; i.e. mixing activity that belongs to some other object or event into the model or representation of an object or event would contaminate the model, thereby reducing its predictive power. This problem is particularly acute at the boundaries of objects or events, and a sharp reset of the long-scale integration is required if information relating to some other object is detected.
Spatial organisation of important features: The extraction and representation of features in terms of some form of topographic organisation is an intrinsic aspect of neural processing architectures. Biological sensory systems are typically organized hierarchically, with processing levels sensitive to increasing spatial (Felleman et al. 1991) and temporal (Hasson et al. 2008) scales at higher levels of the hierarchy. This allows increasingly complex features, spanning increasingly long time scales, to be represented.
Joint time/feature representations: Although time may be objectively measured, the internal representation of time, time intervals and durations is also fundamentally important and is necessary, for example, for generating appropriately timed behaviours. An important insight coming out of auditory modeling studies is that the transformation of time through projection into a topographic representation is useful at many time scales.
The prediction of event timing is implicit in this representation, and deviations in the expected timing of events can be detected by changes in the patterns of activity across the map, rather than by explicit forward projections in time. Even more importantly, the remapping of time can be made at many places in the processing hierarchy, which leads to an emergent joint time/feature representation that we believe may be a fundamental aspect of perceptual processing.
Competition between competing representations: Localized lateral inhibition is commonly found in the nervous system, and expresses a local competition between competing features which can serve to sharpen representations. An important outcome of this organisation is that finely balanced competition between different interpretations of the sensory input can ensure that the system remains in a critical state, easily controllable through top-down signals. An influential model of visual attention, the ‘biased competition’ model, is based on the idea that competition at all levels of the system provides a substrate whereby attention can influence processing by modulating or biasing the competition in response to task demands (Beck et al. 2008; Deco et al. 2002; Desimone et al. 1995; Duncan 1984), causing the system to switch to or maintain the required organisation. As described in the final section, similarities between the dynamics of visual and auditory bistability suggest that competition may be a generic aspect of biological processing, necessary for resolving the ambiguities inherent in the sensory input.
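The idea that a finely balanced competition can be tipped by a weak top-down signal can be sketched with two mutually inhibiting rate units. The following Python fragment is an illustrative toy, not the biased competition model of the cited papers: the function name, the rectified-linear dynamics, the Euler integration and all parameter values are assumptions.

```python
def biased_competition(inputs, bias, inhibition=2.0, dt=0.1, steps=500):
    """Integrate rate units that inhibit each other; return the final rates.

    inputs: bottom-up drive to each unit (one per competing interpretation)
    bias:   top-down signal added to each unit's drive
    """
    rates = [0.0] * len(inputs)
    for _ in range(steps):
        updated = []
        for i, rate in enumerate(rates):
            rivals = sum(rates) - rate                  # activity of the competitors
            drive = inputs[i] + bias[i] - inhibition * rivals
            target = max(0.0, drive)                    # firing rates cannot be negative
            updated.append(rate + dt * (target - rate)) # Euler step toward the target
        rates = updated
    return rates


# Identical sensory evidence for two interpretations; only the bias differs.
favour_first = biased_competition([1.0, 1.0], bias=[0.05, 0.0])
favour_second = biased_competition([1.0, 1.0], bias=[0.0, 0.05])
print(favour_first, favour_second)
```

With equal bottom-up input, the small bias term alone decides which unit wins, which is one simple way of picturing how task demands could switch or maintain a perceptual organisation.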
6.3 Hierarchical generative models in vision
It has long been appreciated that information falling on the retina cannot be mapped unambiguously back onto the real world; very different objects can give rise to similar retinal stimulation, and the same object can give rise to very different retinal images. So how can the brain perceive and understand the outside visual world based on these ambiguous two-dimensional retinal images? A possible explanation comes from the generative modeling approach, which has as its goal the mapping of external causes to sensory inputs. By building internal models of the world the brain can explain observed inputs in terms of inferred causes. This in turn suggests that the visual cortex might have evolved to reflect the hierarchical causal structure of the environment which generates the sensory data (Friston 2003b; Friston 2005; Friston et al. 2006a; Friston et al. 2006b), and that it can consequently employ processing analogous to hierarchical Bayesian inference to infer the causes of its sensations, as depicted in Figure 1.
Making inferences about causes depends on a probabilistic representation of the different values the cause can take, i.e. a probability distribution over the causes. This suggests replacing the classical deterministic view, where patterns are treated as encoding a feature (e.g. the orientation of a contour), with a probabilistic approach where population activity patterns represent uncertainty about stimuli (e.g. the probability distribution over possible contour orientations).
The model maps well onto anatomical, physiological and psychophysical aspects of the brain. Visual cortices are organized hierarchically (Felleman et al. 1991), in recurrent architectures using distinct forward and backward connections with functional asymmetries. While feedforward connections are mainly driving, feedback connections are mostly modulatory in their effects (Angelucci et al. 2003; Hupe et al. 2001).
Figure 1. Learned internal model in visual cortex reflects hierarchical causal structure of the environment which generates the sensory input. The ambiguous information provided by sensory inputs (e.g. 2D retinal image) is only a function of the internal state of the World (e.g. 3D objects). The brain (observer) needs to inversely map this function as precisely as possible to generate an accurate internal representation of the World. The hierarchical organization of the brain suggests it has evolved to reflect the inherent hierarchical structure of the World. (Adapted with permission from Rao 1999.)
Evidence shows that feedback originating in higher-level areas such as V4, IT or MT, with bigger and more complex receptive fields, can modify and shape V1 responses, accounting for contextual or extra-classical receptive field effects (Guo et al. 2007; Harrison et al. 2007; Huang et al. 2007; Sillito et al. 2006). As we will see in this section, hierarchical generative models are reminiscent of the described architecture, sharing many of the structural and connectivity properties. Moreover, they predict basic synaptic physiology such as associative and spike-timing-dependent plasticity.
In terms of the neural mechanisms involved, although it is not yet practical to test the proposed framework in detail, there are some relevant findings using functional magnetic resonance imaging (fMRI) and electrophysiological recordings. Murray et al. (2004) showed that when local information is perceptually organized into whole objects, activity in V1 decreases while activity in higher areas increases. They interpreted this in terms of high-level hypotheses
or causes ‘explaining away’ the incoming sensory data. Further, Lee and Mumford (2003) studied the temporal response of early visual areas to different visual illusions, concluding that there are increasing levels of complexity in information processing and that low-level activity is highly interactive with the rest of the visual system. Results of both experiments are consistent with the generative modeling approach.
The perspective of Bayesian inference also provides a unifying framework for modeling the psychophysics of object perception (Kersten et al. 2004; Knill et al. 1996), resolving its complexities and ambiguities by probabilistic integration of prior object knowledge with image features. Similarly, visual illusions, which are typically interpreted as errors of some imprecise neural mechanism, can in fact be seen as the optimal adaptation of a perceptual system obeying rules of Bayesian inference (Geisler et al. 2002). In further support of this view, Kording et al. (2004) concluded that the central nervous system also employs an inferential approach during sensorimotor learning. Additionally, phenomena such as repetition suppression in single unit recordings, and mismatch negativity and the P300 in electroencephalography, can also be explained in this framework (Friston et al. 2006a). It is not surprising, therefore, that recent reviews of cortical function (Carandini et al. 2005; Olshausen et al. 2005; Schwartz et al. 2007) point in the direction of more global approaches, which take into account the feedback of information from higher areas.
One of the first to propose formulating perception in terms of a generative model was Mumford (1991, 1992, 1996), based on ideas from Grenander’s pattern theory and earlier suggestions by Helmholtz (1860/1962). Applied to visual perception, this theory states that what we perceive is not the true sensory signal, but a rational reconstruction of what the signal should be.
The ambiguities present in the early stages of processing an image never become conscious because the visual system finds an explanation for every peculiarity of the image. Pattern theory is based on the idea that pattern analysis requires pattern synthesis, thereby adding to the previous purely bottom-up or feedforward structure a top-down or feedback process in which the signal or pattern is reconstructed.
The Helmholtz machine (Dayan et al. 1995) extended these ideas by implementing inferential priors using feedback. Here the generative and recognition models were both implemented as structured networks whose parameters have to be learned. The connectivity of the system is based on the hierarchical top-down and bottom-up connections in the cortex. This layered hierarchical connectionist network provides a tractable way of dealing with the exponential number of possible causes underlying each pattern, unlike other approaches such as the Expectation-Maximization algorithm, which run into prohibitive
computational costs. The key insight is to use an explicit recognition model with its own parameters, instead of using the generative model parameters to perform recognition in an iterative process.
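As a concrete illustration of the probabilistic view introduced at the start of this section, in which activity represents a distribution over possible contour orientations rather than a single value, the following Python sketch applies Bayes' rule on a discrete grid of orientations. It is a hedged toy example: the grid spacing, the Gaussian shapes and the numerical values of the prior and the measurement are invented, and the wrap-around of orientation at 180 degrees is ignored for simplicity.

```python
import math


def gaussian(x, mean, sd):
    """Unnormalised Gaussian profile."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)


def posterior(prior, likelihood):
    """Bayes' rule on a discrete set of causes: normalise prior * likelihood."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Candidate causes: contour orientations in degrees (illustrative grid).
orientations = list(range(0, 180, 15))

# Assumed prior knowledge: context makes near-vertical contours more probable.
prior = [gaussian(o, 90, 20) for o in orientations]
# Assumed sensory evidence: a noisy local measurement suggests about 60 degrees.
likelihood = [gaussian(o, 60, 20) for o in orientations]

belief = posterior(prior, likelihood)
most_probable = orientations[belief.index(max(belief))]
print(most_probable)
```

The resulting belief is a full distribution, so the uncertainty about the contour is retained alongside the most probable orientation, which here lies between the value suggested by the prior and the value suggested by the measurement.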
6.4 Predictive coding
A similar approach, inspired by concepts from Kalman filtering, was explored by Rao and Ballard (1999). In predictive coding, which under the Bayesian framework is derived from maximizing the posterior probability, each level of the hierarchical structure attempts to predict the responses of the next lower level via feedback connections. The difference between the predicted and actual input is then transmitted to higher-order areas via feedforward connections, and used to correct the estimate. The predictions are made on progressively larger-scale contexts, so, for example, if the ‘surround’ can predict the ‘centre’, little response is evoked by the ‘error-detecting’ neurons. In other words, when top-down predictions match incoming sensory information, the lower-level cortical areas are relatively inactive. However, when the central stimulus is isolated or difficult to predict from the surrounding context, the top-down predictions fail and a large response is elicited.
The basic architecture, shown in Figure 2, was implemented using the Kalman filter equation. The model achieved robust segmentation and object recognition, even with noisy images and occluded objects, and showed how receptive fields (model basis vectors) similar to those reported in V1 simple cells could be learnt by training the network with natural images. It also elegantly explains the end-stopping effect, as the activity of V1 error-signaling neurons is reduced only when higher levels (with larger receptive fields) manage to correctly predict the centre response using the surround stimulus.
Figure 2. Predictive coding architecture implemented using the Kalman filter: feedback paths generate predictions, while feedforward paths transmit the differences between the predicted and actual input; i.e. the prediction errors. (Adapted with permission from Rao and Ballard 1999.)
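The error-correcting loop summarised in Figure 2 can be sketched in a few lines of Python. This is a deliberately reduced illustration and not the implementation of Rao and Ballard: it uses plain gradient descent on the squared prediction error instead of the full Kalman filter update, it has a single level, and the basis matrix `U` is written by hand rather than learnt from natural images. The names `predict`, `infer_causes` and `rate` are assumptions.

```python
def predict(U, r):
    """Top-down (feedback) prediction of the input from the current causes r."""
    return [sum(U[i][j] * r[j] for j in range(len(r))) for i in range(len(U))]


def infer_causes(U, image, rate=0.1, steps=200):
    """Estimate causes r by repeatedly correcting them with the prediction error."""
    r = [0.0] * len(U[0])
    for _ in range(steps):
        # Feedforward signal: difference between actual and predicted input.
        error = [x - p for x, p in zip(image, predict(U, r))]
        # Each cause is nudged in the direction that reduces the error.
        for j in range(len(r)):
            r[j] += rate * sum(U[i][j] * error[i] for i in range(len(U)))
    residual = [x - p for x, p in zip(image, predict(U, r))]
    return r, residual


# Hand-written basis vectors (columns): each cause explains two adjacent pixels.
U = [[1.0, 0.0],
     [1.0, 0.0],
     [0.0, 1.0],
     [0.0, 1.0]]

# An input the model can predict leaves the error units almost silent ...
r_ok, err_ok = infer_causes(U, [2.0, 2.0, 0.0, 0.0])
# ... whereas an isolated element cannot be explained and leaves a residual error.
r_odd, err_odd = infer_causes(U, [2.0, 0.0, 0.0, 0.0])
print(sum(e * e for e in err_ok), sum(e * e for e in err_odd))
```

In this toy case the residual plays the role of the ‘error-detecting’ response: it vanishes when the higher-level causes account for the input and remains large when they cannot.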
A similar predictive coding scheme, simulated using a two-layer neuronal hierarchy (Friston et al. 2006a), reproduced the phenomenon of repetition suppression observed in the brain. The model illustrates the basic principles of a rich and complex cortical theory which proposes that brain dynamics act to minimize the free energy available or, under simplifying assumptions, to suppress prediction errors. Applying hierarchical Bayesian models to infer the causes of sensory input arises as a natural consequence of this theory. Supporting experimental evidence shows the suppression of prediction error through feedback connections (Harrison et al. 2007), as predicted by the free energy theory.
6.5 Bayesian belief propagation (BBP)
A different set of computational models is based on Bayesian belief propagation (BBP), which can be considered a general algorithm for implementing Bayesian inference. In this approach messages (probability distributions) are passed between the processing nodes of a graphical model (Bayesian network) so that each posterior probability is computed locally. This is achieved by combining higher-level and lower-level evidence at each node. The hierarchical arrangement, and the fact that all computations can be carried out locally, make the algorithm suitable for neural implementation. In this section we describe three different BBP models that provide significant insights into this approach.
In the first example, a model implemented as a single-layered recurrent network was shown to perform a simple visual motion detection task (Rao 2004, 2006). The firing rate of each neuron encoded the log of the posterior probability of being in a specific state, where the different states represented all possible combinations of position and direction of motion of the stimulus. To model the likelihood function, the input image was filtered by a set of feedforward weights (Gaussian functions), while the prior was approximated by multiplying the previous posterior probability by a set of recurrent weights representing the transition probabilities between states. The probability distributions are treated as the messages of the BBP architecture. The main contribution of this model is that it implemented Bayesian inference using equations representing a recurrently connected network of spiking neurons. The model was later extended by adding a second layer of Bayesian decision-making neurons that calculated a log-posterior ratio to perform the random-dot motion detection task.
A similar implementation, using a three-level hierarchical network with two interconnected pathways for features and locations (as observed in the visual system), was used to model attention (Rao 2004).
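The recursive computation described for the motion detection model, in which a likelihood from the current input is combined with a prior obtained by passing the previous posterior through transition probabilities, can be sketched as a small Bayesian filter. The Python code below illustrates that general scheme only; it is not the spiking network of Rao (2004, 2006). The ring of eight positions, the reversal probability `flip`, the Gaussian observation noise and all function names are assumptions.

```python
import math

N = 8  # number of positions on a ring (illustrative)
# Each state is a combination of position and direction of motion.
STATES = [(p, d) for p in range(N) for d in (-1, 1)]


def transition(prev, new, flip=0.1):
    """P(new | prev): keep moving in direction d, or reverse with probability flip."""
    (p, d), (q, e) = prev, new
    if e == d and q == (p + d) % N:
        return 1.0 - flip
    if e == -d and q == (p - d) % N:
        return flip
    return 0.0


def log_likelihood(observed, state, sd=1.0):
    """Log of an unnormalised Gaussian around the position encoded by the state."""
    gap = abs(observed - state[0]) % N
    gap = min(gap, N - gap)  # distance measured around the ring
    return -0.5 * (gap / sd) ** 2


def update(log_post, observed):
    """One filtering step, carried out on log posterior values."""
    new = []
    for s in STATES:
        # Prior: previous posterior weighted by the transition probabilities.
        prior = sum(transition(prev, s) * math.exp(lp)
                    for prev, lp in zip(STATES, log_post))
        new.append(log_likelihood(observed, s) + math.log(max(prior, 1e-300)))
    norm = math.log(sum(math.exp(v) for v in new))
    return [v - norm for v in new]


log_post = [math.log(1.0 / len(STATES))] * len(STATES)  # start with no preference
for observed in [0, 1, 2, 3, 4]:                        # stimulus drifting one way
    log_post = update(log_post, observed)

best = STATES[max(range(len(STATES)), key=lambda i: log_post[i])]
p_right = sum(math.exp(lp) for s, lp in zip(STATES, log_post) if s[1] == 1)
print(best, round(p_right, 3))
```

Direction of motion is never observed directly in this sketch; it is inferred because only states moving in the consistent direction keep receiving support from both the prior and the likelihood.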
At a more abstract level, the BBP approach was used by Lee and Mumford (2003) to explain processing in the ventral visual pathway (V1, V2, V4 and IT). The visual cortex was suggested to represent beliefs, or conditional probability distributions on feature values, which are passed forward and backward between the areas to update each other’s distribution. In other words, each area is assumed to compute or infer a set of beliefs based on the immediate bottom-up data conveyed through the feedforward pathway, and the top-down data or priors conveyed through feedback connections. In this way, the visual areas are linked together as a Markov chain, so that each area is continually updated based on changes to the conditional probabilities in both lower and higher areas (e.g. V2 is influenced only by V1 and V4).
This model implements hierarchical Bayesian inference by incorporating particle filtering. This mathematical tool approximates high-dimensional probability distributions using a set of sample points, or particles, and an attached set of weights that represent their probabilities. This algorithm has provided outstanding results in computer vision and artificial intelligence, and is argued to be a promising technique for statistical inference in large, multidimensional domains. The essential idea is to compute for each area not just one hypothesis for the true value of its set of features, but a moderate number of hypotheses. This allows multiple high-probability values to stay alive until longer-range feedback loops have had a chance to exert an influence. Global competition involving all aspects of each hypothesis allows the system to find the one that optimally integrates low-level and high-level data, making the probability of all other particles suddenly collapse.
To better understand how BBP can be applied to visual processing, consider the shadowed face example shown in Figure 3.
Initially, bottom-up cues from the illuminated part of the face cause a ‘face’ hypothesis to become activated at the higher levels. Then information about the likely features and proportions of a face is conveyed through top-down feedback to the lower-level high-resolution buffer. Re-examination of the data results in a reinterpretation of the faint edge in the shadowed area as an important part of the face contour. This new detailed information can then be used by the higher levels to infer additional characteristics of the image, such as the precise identity of the face. It is also important to note that the Bayesian framework provides a way of reconciling two apparently contradictory approaches: adaptive resonance or biased competition, where responses are sharpened (‘stop gossiping’ effect), and predictive coding, where responses are suppressed due to high-level explaining away (‘shut-up’ effect). The dilemma is resolved by considering two functionally distinct subpopulations, one encoding the causes of the sensory input or current prediction, and the other encoding the prediction error used to refine the probability distribution of causes. The compatibility of these two approaches has
Neurocomputational models of perceptual organization 157
Figure 3. Bayesian belief propagation architecture. (a) Initially, bottom-up cues from the illuminated part of the face (B1) cause a ‘face’ hypothesis to become activated at the higher levels. Then information about the likely features and proportions of a face is conveyed through top-down feedback (B2) to the lower-level high-resolution buffer. Re-examination of the data results in a reinterpretation of the faint edge in the shadowed area as an important part of the face contour. (b) Each area computes a set of beliefs, Xi, based on bottom-up sensory data (Xi–1) and top-down priors, P(Xi | Xi+1), which are integrated according to the Bayesian inference equation. Beliefs are continually updated according to changes in earlier and higher areas to obtain the most probable distribution of causes at each level. (Adapted with permission from Lee and Mumford 2003.)
recently been mathematically proven, and it has been highlighted that previous apparent inconsistencies may have resulted from the strong emphasis predictive coding places on error-detecting nodes, and the corresponding under-emphasis on prediction nodes (Spratling 2008). The prediction nodes, which can be considered equivalent to the Belief nodes, are responsible for maintaining an active representation of the stimulus and will show an enhanced response when top-down knowledge correctly predicts the input, while at the same time the error-detecting nodes show suppression.
An interesting architecture, and one which takes into account temporal as well as spatial information, is the Hierarchical Temporal Memory (HTM) proposed by Hawkins et al. (2007). Again, this model assumes that images are generated by a hierarchy of causes, and that a particular cause at one level unfolds into a sequence of causes at a lower level. HTMs therefore attempt to capture the algorithmic and structural properties of the cortex, which in turn are argued to match the causal hierarchy of image generation (Figure 1). When an HTM sees a novel input, it determines not only the most likely high-level cause of that input, but also the hierarchy of sub-causes.
During a learning stage, HTMs attempt to discover the causes underlying the sensory data. Each node consists of a spatial pooler that learns the most common input spatial patterns, and a temporal pooler that groups these patterns according to their temporal proximity and assigns them a label. For example, a set of corner lines at different positions (input spatial patterns) could be grouped into a common temporal group labeled ‘corner’. To do this, the input must be a motion picture with objects moving around, so that temporal information can be extracted. The spatial pooler in the parent node combines the output of several lower-level nodes, i.e. a probability distribution over the temporal groups of those nodes. This allows it to find the most common co-occurring temporal groups below, which then become the alphabet of ‘spatial patterns’ in the parent node, e.g. features of a face (eyes, nose, mouth, ...) which always move together. The learning process is repeated throughout the hierarchy to obtain the causes at the highest level. As a result, a tree-structured Bayesian network is obtained, based on the spatio-temporal characteristics of the inputs. For the inference stage, as higher levels have converging inputs from multiple low-level regions, several parallel and competing hypotheses will finally converge on a high-level cause, leading to recognition. This allows prior information related to that object to disambiguate or explain away lower-level patterns. The simulation results for a line drawing recognition system using this model showed very robust scale, translation and distortion invariance, even for very noisy inputs. It also provides a Bayesian explanation for the experimental finding that when activity is increased in higher-level object processing areas, the response in lower levels is reduced. 
The reason is that the high-level, more global interpretation narrows the hypothesis space maintained by lower level regions, which leads to a reduction in total activity. The main difference between HTM and other probabilistic models, such as the Helmholtz machine, is the use of temporal information to achieve position and scale invariance. A similar invariance approach was adopted in Hmax, a biologically realistic feedforward model of vision proposed by Riesenhuber et al. (1999), which has proved successful for understanding both perceptual and physiological aspects of object recognition (Cadieu et al. 2007; Serre et al. 2007). In fact this model shares the same hierarchical structure, including simple and complex layers which closely resemble HTM’s spatial and temporal groups. Units in the lower level, modelled using Gabor filters, show receptive field sizes, spatial frequencies and orientation bandwidths consistent with V1 simple cells. These are pooled over space and scale using a max operation to model the properties of complex cells (position and size invariance). Groups of complex cells are combined to form object features, which are then used as prototypes to compute the intermediate level’s response by performing a template-matching operation (selectivity). The
invariance and selectivity operations are repeated up the hierarchy, as observed in the ventral pathway, resulting in invariant object-tuned units which resemble neurons in inferotemporal cortex. Recently we have shown that this feedforward architecture can also be interpreted in probabilistic terms, and extended it to include feedback using the BBP algorithm (Dura et al. 2008). Feedforward responses are understood as probability distributions over the set of features at the different positions, which combine hierarchically to obtain, at the top level, the most probable object represented by the input image. The stored invariant prototype of this object is then fed back down the network and combined with the feedforward response at each level to obtain the resulting belief (see Figure 4). However, high-level invariant object prototypes generate ambiguous and diffuse feedback which needs to be refined. The disambiguation process, based on cues from the existing feedforward responses, uses a local extrapolation algorithm which exploits collinearity, co-orientation and good continuation to guide and adapt the feedback (see Figure 5). Previous studies have shown a plausible implementation of the interactions between feedback and lower-level perceptual grouping by the laminar circuits of V1 and V2 (Raizada et al. 2003). The resulting lower level beliefs, after top-down influence, are then fed forward to complete the recurrent cycle which gradually builds the optimal beliefs in each level. The proposed model performs successful feedforward object recognition, including cases of occluded and illusory images. Recognition is both position and size invariant. The model also provides a functional interpretation of feedback connectivity which accounts for several observed phenomena. The model responses qualitatively match representations in early visual cortex of occluded and illusory contours (Lee et al. 2001; Rauschenberger et al. 
2006), resulting from higher levels imposing their knowledge on lower levels through feedback; and fMRI data showing that when activity is increased in high-level object processing areas, the response in lower levels is reduced, as feedback narrows the hypothesis space maintained by lower regions. Additionally, a dynamic mechanism for illusory contour formation is proposed, which can adapt a single high-level object prototype to Kanizsa’s figures of different sizes, shapes and positions. The mechanism is consistent with a recent review on illusory contours (Halko et al. 2008), and is supported by experimental studies which suggest that interactions between global contextual feedback signals and local evidence, precisely represented in V1, are responsible for contour completion (Lee et al. 2001; Stanley et al. 2003). By imposing top-level priors the model can also simulate the effects of spatial attention, priming and visual search.
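The way in which feedback narrows the hypothesis space at a lower level can be illustrated with a minimal sketch of a single belief update. It assumes that the belief is simply the normalised product of a bottom-up likelihood and a top-down prior over a small set of weighted hypotheses; the hypothesis labels and all numerical values are hypothetical, chosen only for illustration, and the function names are not taken from any of the cited models.

```python
import math

def normalise(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def entropy(p):
    """Shannon entropy in bits; lower values mean a narrower hypothesis space."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def combine(likelihood, prior):
    """Belief as the renormalised product of bottom-up likelihood and top-down prior."""
    return normalise([l * p for l, p in zip(likelihood, prior)])

# Hypothetical interpretations of a faint edge in a lower visual area.
hypotheses = ["contour", "shadow", "texture", "noise"]
bottom_up = normalise([0.30, 0.30, 0.20, 0.20])  # ambiguous local evidence (assumed values)
top_down = normalise([0.70, 0.10, 0.10, 0.10])   # feedback from a 'face' hypothesis (assumed values)

belief = combine(bottom_up, top_down)
best = hypotheses[belief.index(max(belief))]
print(best, [round(b, 3) for b in belief])
print(round(entropy(bottom_up), 2), round(entropy(belief), 2))
```

In this toy case the top-down prior concentrates the belief on one hypothesis, so the entropy of the belief is lower than that of the bottom-up evidence alone, which is one simple way to read the statement that feedback narrows the hypothesis space.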
Figure 4. Diagram of hierarchical model of object recognition (Dura et al. 2009). A Kanizsa’s square input image is used to illustrate the output of the model at different points. Units in the lower level (S1), modelled using Gabor filters, show tuning parameters consistent with V1 simple cells. Their outputs are pooled over space and scale using a max operation to model the properties of complex cells (C1), which show position and size invariance. Groups of complex cells are combined to form object features, which are then used as prototypes to compute the intermediate level’s response (S2 and C2) by performing a template-matching operation (selectivity). The invariance and selectivity operations are repeated up the hierarchy, resulting in invariant object-tuned units (S3 and C3) which resemble neurons in inferotemporal cortex. Feedforward responses can also be understood as probability distributions over the set of features at the different positions, which combine hierarchically to obtain, at the top level, the most probable
object (square) represented by the input image (Kanizsa’s illusory square). The stored invariant prototype of this object is then fed back (C3F and S3F) down the network and combined with the feedforward response at each level to obtain the resulting Belief (BS3, BS2 and BS1). These are then fed forward back up the network, generating a recurrent process which, due to the overlapping of units and the extrapolation mechanism between C1F and S1F, builds up the contour of the illusory square in the lower levels as observed experimentally.
Figure 5. Feedback disambiguation using cues from the feedforward response (Dura et al. 2009). Left: Feedback from a single high-level prototype completing distorted versions of the Kanizsa’s square. Note that contour completion will only occur when the variation lies within the position and scale invariance range of the high-level prototype, and when there is enough local evidence to guide the local disambiguation process (interpolation/extrapolation). The second condition is not satisfied in the figure with blurred edges. Right: Example of feedback disambiguation for a Kanizsa input image smaller than the stored prototype. The feedback signal C1F is derived from a chain of processes starting at the highest level, where the invariant object prototype of the square is fed back, as shown in Figure 4. The precise spatial information contained in S1 provides the cues required to further refine C1F and generate S1F. In this way, Kanizsa’s illusory squares of different sizes can be completed using the same high-level square prototype.
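The two feedforward operations described for the hierarchy in Figure 4, invariance through max pooling and selectivity through template matching, can be sketched in one dimension as follows. This is an illustration only: the pooling width, the Gaussian form of the tuning function, the value of sigma and the response values are all assumptions made for the example, and the sketch omits the Gabor filtering stage and the feedback pathway entirely.

```python
import math

def max_pool(responses, width):
    """Complex-type unit: maximum over non-overlapping groups of simple-type responses."""
    return [max(responses[i:i + width]) for i in range(0, len(responses), width)]

def template_match(pattern, prototype, sigma=1.0):
    """Selectivity: Gaussian tuning around a stored prototype (assumed tuning function)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(pattern, prototype))
    return math.exp(-squared_distance / (2 * sigma ** 2))

# Hypothetical simple-cell responses along one spatial dimension.
s1 = [0.1, 0.9, 0.2, 0.0, 0.0, 0.3, 0.8, 0.1]
# The same two strong features, each shifted by one position within its pooling range.
s1_shifted = [0.9, 0.1, 0.2, 0.0, 0.0, 0.3, 0.1, 0.8]

c1 = max_pool(s1, 4)
c1_shifted = max_pool(s1_shifted, 4)

prototype = [0.9, 0.8]                 # stored feature prototype (assumed)
s2 = template_match(c1, prototype)
s2_other = template_match([0.1, 0.1], prototype)
print(c1, c1_shifted, s2, round(s2_other, 3))
```

The pooled responses are identical for the original and the shifted input, which is the sense in which the max operation yields position invariance, while the template-matching stage responds maximally only when the pooled pattern matches the stored prototype.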
6.6 Hierarchical generative models in auditory perception
The idea of using predictive coding techniques in processing sounds is not new; it was first applied to speech compression in the 1960s. The idea was to predict each new sound sample by a weighted sum of preceding samples, with the weights determined adaptively to minimise the prediction error (Schroeder 1999). Since speech sounds, and hence the weights, change more slowly than the sampling rate of the signal, a more efficient representation could be achieved. This technique is also relevant to speech recognition, since the weights carry information about the sound uttered, and can therefore be used to classify the sound. Furthermore, the hierarchical nature of communication signals led naturally to the development of hierarchically structured speech recognition systems. At progressively higher levels of the system, sequences of speech sounds, typically represented by hidden Markov models, are transformed into words, and at even higher levels, language models are used to transform word sequences into phrases or sentences. Information from higher levels is used to constrain and guide the choices made at lower levels. Most automatic speech recognition systems have been constructed to solve an engineering problem rather than to explain neural processing; nevertheless, the resulting solutions are rather similar. This makes sense, as the constraints are determined by the structure of the signal.
As we have mentioned previously, an explicit account of cortical organisation in terms of hierarchical predictive coding was proposed by Friston (2003a, 2005). In this model, at each level there are error and expectation units; feedforward signals encode prediction errors from the level below, and top-down signals provide modulatory information regarding prior expectations. More recently this model was extended to explore the notion that the cortical anatomy should reflect the temporal hierarchy of the environment (Kiebel et al. 2008). 
The validity of these ideas was demonstrated in a model of birdsong learning which was able to extract temporal structure at two time scales from a rapidly changing sensory input. The hypothesis is that sensory systems have evolved to infer the causes of sensory input through the formation of representations which provide temporally stable predictions about future input. In this account, the term ‘concept’ is used to refer to a state which exists for some time, and ‘percept’ to refer to more transient fluctuations. Concepts provide the contextual attractors within which the lower-level trajectories unfold. In essence, a concept at one level subsequently becomes a percept in relation to the level above, and so on. Competition between representations at each level ensures that sensory systems act as a set of nested, internally consistent dynamical systems. This issue of internal consistency/inconsistency is one that we will return to during the discussion of perceptual bistability. Clearly, the notion of concepts and percepts described in
this way coincides with ideas of objects and object parts. The distinction may be useful, though, as it highlights the temporal differences between representations at successive levels in the hierarchy.
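The linear prediction scheme mentioned at the start of this section, in which each new sample is predicted by a weighted sum of preceding samples and the weights are adapted to minimise the prediction error, can be sketched as follows. The sketch assumes a least-mean-squares (LMS) update as the adaptation rule, which is one common choice and not necessarily the method used in the work cited above; the test signal, the predictor order and the step size `mu` are likewise assumptions made for illustration.

```python
import math

def lms_predict(signal, order=2, mu=0.2):
    """Predict each sample from the `order` preceding samples.

    The weights are adapted after every sample with an LMS step
    (assumed adaptation rule) so as to reduce the prediction error.
    """
    weights = [0.0] * order
    errors = []
    for n in range(order, len(signal)):
        past = signal[n - order:n][::-1]  # most recent sample first
        prediction = sum(w * x for w, x in zip(weights, past))
        error = signal[n] - prediction
        weights = [w + mu * error * x for w, x in zip(weights, past)]
        errors.append(error)
    return weights, errors

# A pure tone is perfectly predictable from its two preceding samples:
# x[n] = 2*cos(omega)*x[n-1] - x[n-2].
omega = 1.0
signal = [math.sin(omega * n) for n in range(2000)]
weights, errors = lms_predict(signal)
print([round(w, 4) for w in weights], round(2 * math.cos(omega), 4))
```

For this signal the two weights settle near the values given by the recurrence in the comment, and the prediction error shrinks towards zero, so the slowly changing weights, rather than the samples themselves, carry the information about the sound.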
6.6.1 Pitch
An aspect of sensory processing which, to our knowledge, has not previously been studied within the hierarchical generative framework is how a multi-time-scale model can cope with abrupt changes in context, i.e. a change in the underlying cause. For example, in a conversation between two speakers, it is necessary to ensure that signals from one person are not confused with those from the other. A model which uses fixed temporal integration windows would inevitably, at some level, mix the two. We have recently considered this problem at a much lower level of the perceptual hierarchy, in the formation of a hierarchical predictive model of pitch perception (Balaguer-Ballester et al. 2009).
Pitch, one of the most important features of auditory perception, underlying the perception of melody in music and prosody in speech, is usually associated with periodicities in the sound signals (Moore 2004). Hence, a number of models of pitch perception are based upon a temporal analysis of the neural activity evoked by the stimulus (Cariani et al. 1996a, 1996b; Licklider 1951; Meddis et al. 1991, 1997; Slaney et al. 1990). As suggested in the introduction, a useful way to think about auditory perception is in terms of the iterative extraction of temporal regularities detected in the input at each stage of processing, and the projection of these temporal patterns along a spatial axis ordered in terms of the scale of the regularity. For example, at the first stage of processing, the regularities inherent in the individual frequency components of the sounds are projected along the cochlea, scaled from short (high frequency) to long (low frequency) time scales. 
The periodicities upon which the sense of pitch depends are essentially second-order regularities which can be extracted by projecting regularities in the patterns of auditory nerve activity onto yet another axis ordered from short periods (high pitches) to long periods (low pitches), within an initially spectrally localised organisation. The pitches can subsequently be derived by combining periodicity patterns within spectral clusters, which are formed through the detection of sharp discontinuities in the temporal patterns across different frequency channels (Balaguer-Ballester et al. 2007). A simple mathematical technique for extracting periodicity regularities is autocorrelation, which hence forms the basis for most temporal models of pitch perception. Most of these models compute a form of short-term autocorrelation of the simulated auditory nerve activity using an exponentially weighted integration
time window (Balaguer-Ballester et al. 2008; Bernstein et al. 2005; de Cheveigné et al. 2006; de Cheveigné 2005; Meddis et al. 1991, 1997). Autocorrelation models have been able to predict the reported pitches of a wide range of complex stimuli. However, choosing an appropriate integration time window has proved problematic. The problem is that in certain conditions, the auditory system is capable of integrating pitch-related information over time scales of several hundred milliseconds (Hall et al. 1981), while at the same time it is also able to follow changes in pitch or pitch strength with a resolution of only a few milliseconds (Wiegrebe 2001). This stimulus-dependent nature of pitch perception suggests that a simple fixed feed-forward model can never adequately account for the phenomenon. One way to think about it is as a change detection problem; i.e. if the pattern you are interested in emerges over a fairly long time, how is it possible to forget the old and no longer relevant pattern and establish the new one quickly? This suggests a formulation of pitch processing in terms of a generative modelling approach. The proposed model consists of a feed-forward process as well as a feedback process, which modifies the parameters of feed-forward processing (Balaguer-Ballester et al. 2009). The role of the feed-forward process is to detect pitches in the incoming stimulus. This is achieved by extracting the periodicities within each frequency channel using autocorrelation at relatively short time scales, and then integrating the output of this stage at the next level in the hierarchy using a much longer time constant. In contrast, the role of the feedback processing is to detect unexpected changes in the input stimulus, such as the offset of a tone in a sequence, and to modulate the temporal integration windows of the feed-forward processing when such changes occur. 
A stimulus change typically requires a fast system response, so that information occurring around the time of the change can be updated quickly. Thus, during periods when there is a significant discrepancy between the current and expected pitch estimates, the effective integration time windows become very short, so that the “memory” component of the model response is reduced to near zero and essentially reset. The principal novelty of the multi-scale predictive model was the introduction of feedback modulation of the recurrent inhibitory processes which operate at each level (Balaguer-Ballester et al. 2009). These top-down signals serve to alter the “effective” integration windows used at the different stages. As a result, the responses of the model adapt to recent and relevant changes in the input stimulus. This approach is consistent with the available neuroimaging data: a sustained pitch response (SPR) in lateral Heschl’s Gyrus has been shown to adapt to the recent temporal context of a pitch sequence, enhancing the response to rare and brief events (Gutschalk et al. 2007). The rapid reset caused by mismatches between the predictions at various levels of the model is important because it allows the representation of new information to build up with as little contamination as
possible from previous activity. This suggests that precisely timed offset activity should also be observed in the auditory system, as is indeed the case (Kadner et al. 2008). Furthermore, the observed rapid offset response is modulated by feedback, consistent with the proposed model circuit. A prediction of the model is that blocking the feedback circuits, i.e. cortical modulation of sub-cortical processing, would impair the segmentation of successive sound events (Balaguer-Ballester et al. 2009). An important computational principle embodied in this model is the transformation of time, or more precisely time intervals, into a topographic representation at many time scales. Prediction of expected timing is implicit in the representation, rather than involving an explicit projection forward in time, and deviations in the expected timing of events can be detected by changes in the patterns of activity across the map. This principle was explored in another modeling study in which the same architecture was applied at a much higher cognitive level to extract rhythmic structure, and to signal when changes in rhythm would become salient.
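Two of the ingredients described in this section, periodicity extraction by autocorrelation and slow integration that is reset when the input departs from the current estimate, can be sketched as follows. This is a deliberately reduced illustration and not the published model: it works on a raw waveform rather than simulated auditory nerve activity, and it replaces the feedback modulation of recurrent inhibition with a direct switch between two hypothetical time constants (`tau_long`, `tau_short`) triggered by an assumed mismatch `threshold`.

```python
import math

def autocorr_period(signal, min_lag, max_lag):
    """Return the lag (in samples) at which the autocorrelation is largest."""
    def acf(lag):
        return sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
    return max(range(min_lag, max_lag + 1), key=acf)

def track(values, tau_long=50.0, tau_short=1.0, threshold=None):
    """Leaky integration of successive period estimates.

    With a `threshold`, a large mismatch between the incoming value and the
    current estimate shortens the time constant for that step, which resets
    the memory of the integrator (assumed stand-in for feedback modulation).
    """
    estimate = values[0]
    trace = []
    for x in values:
        mismatch = abs(x - estimate)
        reset = threshold is not None and mismatch > threshold
        tau = tau_short if reset else tau_long
        estimate += (x - estimate) / tau
        trace.append(estimate)
    return trace

# Stage 1: a tone with a period of 20 samples.
tone = [math.sin(2 * math.pi * n / 20) for n in range(200)]
period = autocorr_period(tone, 5, 30)

# Stage 2: the period estimate jumps from 20 to 10 samples at step 100.
estimates = [20.0] * 100 + [10.0] * 100
fixed = track(estimates)
adaptive = track(estimates, threshold=2.0)
print(period, round(fixed[110], 2), adaptive[110])
```

With a fixed long time constant the integrated estimate is still dominated by the old period many steps after the change, whereas the version with the mismatch-driven reset follows the new period immediately while keeping the long integration window whenever the input is stable.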
6.6.2 Rhythm
One of the most compelling features of music is the sense of rhythm it engenders. Humans can quickly and accurately interpret musical rhythmic structure, and do so very flexibly; for example, they can easily distinguish between rhythmic, tempo and timing changes (Honing 2002). What are the representations and relevant features that humans so successfully use to interpret rhythm? One idea is that the perceived metrical hierarchy emerges from the relative coincidence of activity at one or more levels of representation; i.e. it arises from interactions between different time scales expressed in the patterns of activity in the various feature spaces (the joint time feature spaces outlined in the introduction). Structural grouping can occur across phrases longer than a single measure, and can be achieved by associations between accentuations (including harmonic and melodic implications) or by the use of expressive timing which breaks the strict regularity of the previously established meter. This suggests that the way in which these structural grouping cues affect perception can be understood in terms of the establishment of regularities which are then violated, for example by the use of expressive timing. Although this cue is rather subtle, it may nevertheless be sufficiently perceptible to make the events at that point in time more salient (Honing 2002). This is consistent with experimental findings that the perception of expressive timing is influenced by the listener’s categorical perception of time, since the differences are detected in relation to the regularities of the current context (Desain et al. 2003).
Figure 6. Model predictions and experimental thresholds (Vos et al. 1996) of tempo change detection in relation to the base tempo rates shown on the x axis.
In modeling the perception of rhythm, one problem is to differentiate variability in timing, such as that arising from the use of expressive timing, from an actual change in tempo, and to account for the dependence of perceptual sensitivity upon tempo. In an exploratory study, the model described above, with an appropriate choice of time constants at different levels of the hierarchy, was able to replicate human performance in a tempo discrimination task, as illustrated in Figure 6. Reset signals in the model caused by mismatches with expectations were used to indicate when a variation in timing would cause the model to decide that something new had happened which was inconsistent with its current representation, as suggested by experimental studies (Vuust et al. 2005, 2009). The modified pitch model was able to accurately predict the tempo-dependent thresholds found in a perceptual experiment (Vos et al. 1996).
6.6.3 Tonality
Natural environments contain highly structured information systems to which we are exposed in everyday life. As argued here, the human brain internalizes these regularities by passive exposure, and the acquired implicit knowledge influences perception and performance. There is some evidence that the similarity of musical scales and consonance judgments across humans arises from the statistical structure of naturally occurring periodic sounds (Schwartz et al. 2003). Motivated by evidence that image statistics predict the response properties of some aspects of visual perception, the relationship between music statistics and human
judgments of tonality was investigated (Serrà et al. 2008). A statistical analysis of chroma features, extracted from audio recordings of Western music, was used to build a tonal profile of the music. This empirical profile was compared with a range of tonal profiles proposed in the literature, either cognitively or theoretically inspired. The very high degree of correlation between the profiles supports the notion that human sensitivity to tonality faithfully reflects the statistics of the musical input to which listeners have been exposed. Tonality profiles can be seen as another kind of regularity found in music, and violations of tonality in the form of chromaticism may act in precisely the same way as rhythmic violations to enhance particular points in the music. Implicit learning of regularities such as tonality can be modeled using covariances in the stimuli (Chen et al. 2007). This goes beyond simple pair-wise correlations; because of the hierarchically structured representations, in which increasingly complex patterns spanning longer time scales come to be represented, implicit learning can also apply to higher-order patterns of coherent co-variation among stimulus properties. Hence this generic mechanism of perceptual learning can give rise to internal representations that capture a similarity structure which is different from that available directly from a simple sequential single-scale analysis of the sensory input. However, the time scale for tonality analysis appears to be surprisingly circumscribed. In the study of Serrà et al. (2008), the covariance profiles of chroma features were averaged over 0.5, 15 and 30 s long segments, representing pitch class distributions over three different time scales. 
The high accuracy of the 0.5 s profiles in predicting human judgment suggested that perceptions of tonal stability are strongly influenced by the short-term music listening experience, consistent with the experimental findings of Tillmann and Bigand (Tillmann et al. 2004) highlighting the importance of local structure in music perception. From their work they concluded that, using short temporal windows, listeners understand the local function of cadences and perceive changes in tonality. This is also consistent with the work of Leman (2000), who showed that a model which incorporated processing at time scales consistent with the typical span of short-term echoic memory (~1.5 seconds) could simulate human sensitivity to tonality in a probe-tone rating task, thereby refuting the claim that the sensitivity necessarily depended upon some long-term exposure to Western music. Clearly the generic architecture derived from the pitch model, illustrated in Figure 7, could be applied to the perception of tonality, and to the problem of detecting changes in key in musical excerpts; this is a subject of ongoing investigation.
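The comparison between an empirical pitch class distribution and a tonal profile can be illustrated with a small sketch. It makes several simplifying assumptions that differ from the study described above: note numbers are used in place of chroma features extracted from audio, the tonal profile is a hypothetical binary template marking the seven degrees of a major scale rather than any profile proposed in the literature, and Pearson correlation is used as the similarity measure.

```python
import math

def chroma_histogram(midi_notes):
    """Relative frequency of each of the 12 pitch classes in a note list."""
    counts = [0] * 12
    for note in midi_notes:
        counts[note % 12] += 1
    return [c / len(midi_notes) for c in counts]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical binary profile: 1 for the scale degrees of a major key.
MAJOR_TEMPLATE = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]

def best_major_key(midi_notes):
    """Correlate the pitch class histogram with the template rotated to each tonic."""
    profile = chroma_histogram(midi_notes)
    scores = []
    for tonic in range(12):
        rotated = [MAJOR_TEMPLATE[(pc - tonic) % 12] for pc in range(12)]
        scores.append(pearson(profile, rotated))
    return scores.index(max(scores)), scores

# One octave of a G major scale as MIDI note numbers (pitch class 7 = G).
g_major_scale = [67, 69, 71, 72, 74, 76, 78, 79]
tonic, scores = best_major_key(g_major_scale)
print(tonic, [round(s, 2) for s in scores])
```

Restricting the note list to a short excerpt before computing the histogram would correspond to using a short temporal window of the kind discussed above.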
Figure 7. A generic hierarchical architecture for the extraction and modeling of features contained in temporal patterns of activity. This generic circuit is hypothesized to recur in parallel in many different feature spaces, which are conjoined iteratively to span increasingly longer time scales.
6.7 Flexibility in perceptual organization
The phenomenon of perceptual bistability has been known for some time in vision. For example, if a different image is presented to each eye, perceptual awareness typically switches from one image to the other rather than forming some sort of composite image. This effect is known as ‘binocular rivalry’. There are many other forms of visual bistability, such as the Necker cube and the vase-face example. However, it has only recently been shown that auditory perception similarly exhibits bistability (Denham et al. 2006; Pressnitzer et al. 2005, 2006; Winkler et al. 2005). In the case of auditory streaming, in response to a sequence of tones, ABA-ABA-ABA- …, where A and B are two different frequencies, and – is a silent period, subjects report hearing a galloping pattern which characterizes the integrated organisation, or an isochronous pattern which signifies that the tones have been segregated into two separate streams. Our analysis showed that there are two fundamentally different phases of perceptual organisation (Denham et al. 2009); in the first perceptual phase (formation of associations), alternative interpretations of the auditory input are formed on the basis of stimulus features; and in the
Figure 8. The dynamics of percept choice (a) and percept switching (b). (Adapted with permission from Noest et al. 2007.)
second phase (coexistence of interpretations), perception stochastically switches between the alternatives, thus maintaining perceptual flexibility. Interestingly, the characteristics of the second phase are very similar to a number of aspects of visual bistability, suggesting common underlying mechanisms. Key ingredients in most models of perceptual bistability are competition between rivaling percepts, adaptation, so that the winning percept never remains dominant for all time, and a hierarchical organisation in which competition occurs simultaneously at different levels in the hierarchy (e.g. Wilson 2003; Brascamp et al. 2006; Noest et al. 2007). Consider the analysis of percept choice and switching illustrated in Figure 8, which depicts the dynamics of the model. The grey lines represent the stable states of each of the rivaling populations, also known as the nullclines of the system. The black arrows show how the system state will evolve from any particular point in the space. The fixed points of the system are the points where the grey lines intersect. The central one, known as the bifurcation point, is unstable, and the two outer ones are stable attractors which correspond to each of the alternative perceptual organisations. With adaptation, which has the effect of moving the line corresponding to the dominant percept as illustrated in the right-hand diagram, a stable attractor is gradually lost and a switch to the other attractor occurs. Key here is perceptual flexibility. If the neural system is maintained in a state close to the bifurcation point, then it is relatively easy to switch between different alternatives, and for attention to exert its influence. This is only possible if the two rivaling systems are more or less balanced. The diagram makes it clear that a significant imbalance between the competing inputs would eliminate the bifurcation
170 Susan L. Denham et al.
point and the existence of two stable attractors, as is the case in the right-hand diagram. This is interesting because it is well known that adaptation to the current context is pervasive throughout sensory systems. For example, adaptation to the distribution of stimulus amplitudes in the inferior colliculus has been demonstrated to move the sigmoidal rate-level function such that it optimally matches the distribution of input levels (Dean et al. 2005). The result is that the activity levels of neurons are held relatively constant, with fluctuations around the mean being accentuated. A natural consequence of such adaptation would be to maintain roughly equal activity levels in response to rivaling percepts, and thereby to keep the system close to critical bifurcation points. The effects of attention can be understood in the same way; i.e. ‘biasing’ the competition is essentially the same as moving one of the null-clines, though in this case in the opposite direction to adaptation. We can also consider a model of auditory streaming in the light of the generic model proposed above. The forward pathway extracts the pitches of the alternating tones (ABA-ABA-…) in the sequence, and results in sequences of activity associated with each tone. It has been shown that activity across primary auditory cortex gradually clusters into three regions, one responding to the A tones, one to the B tones, and an intermediate region to both tones (Fishman et al. 2001, 2004; Micheyl et al. 2005). Each region would then be characterized by a different periodicity, which could be projected to higher level areas as different quasi-static patterns, as suggested previously. Rivalry at the upper level between the different periodicity patterns would result in one pattern dominating and hence reaching perceptual awareness. Adaptation at all levels would gradually weaken the attractor until a switch to the other organisation occurred.
Incidentally, the model can explain why it is easier to switch between A-A-A and B----B----B than between A-A-A and ABA-ABA, since lateral competition across the tonotopic axis would allow both the separate streams to exist while suppressing the central cluster, and vice versa, as illustrated in Figure 9. Both in visual (Brascamp et al. 2006) and in auditory (Denham et al. 2008) bistability, there have been reports of so-called transition durations which, although not as long as the stable organisations typically experienced, are nevertheless longer than would be expected from the bifurcation model described above. An explanation for this phenomenon emerges from the proposed hierarchical model as follows. While the goal of sensory processing is to form a self-consistent embedded set of dynamical systems, our modeling studies have shown that the winner at one level of the hierarchy may be inconsistent with that at another level, and that it takes some time for the feedback projections to reinstate consistency between all levels. This, we suggest, corresponds to the transition period in which both organisations are reported.
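The interplay of competition and adaptation described above can be illustrated with a deliberately minimal simulation. The sketch below is not the model of Noest et al. (2007); it assumes two populations with a step-function activation, mutual inhibition, weak self-excitation and a slow subtractive adaptation variable, and all function names and parameter values are hypothetical choices made for illustration only. With balanced inputs the dominant population adapts until it loses its hold and the rival takes over; with a strongly imbalanced input the stronger percept remains dominant throughout.

```python
def simulate_rivalry(input_a=0.5, input_b=0.5, self_excitation=0.4,
                     inhibition=1.0, adaptation_gain=1.0,
                     tau=10.0, tau_adapt=1000.0, dt=1.0, duration=20000.0):
    """Euler-integrate two mutually inhibiting populations with slow adaptation.

    All parameters are illustrative assumptions (time in ms).
    Returns one dominance label ('A' or 'B') per time step.
    """
    def step_fn(x):
        # Step-function activation: the population is driven on only
        # when its net input is positive.
        return 1.0 if x > 0.0 else 0.0

    u_a, u_b = 1.0, 0.0   # activities; percept A is assumed to win the initial choice
    a_a, a_b = 0.0, 0.0   # slow adaptation variables
    labels = []
    for _ in range(int(duration / dt)):
        # Net drive: input + self-excitation - cross-inhibition - adaptation.
        drive_a = input_a + self_excitation * u_a - inhibition * u_b - adaptation_gain * a_a
        drive_b = input_b + self_excitation * u_b - inhibition * u_a - adaptation_gain * a_b
        # Fast activity dynamics and slow adaptation dynamics (synchronous update).
        du_a = (-u_a + step_fn(drive_a)) / tau
        du_b = (-u_b + step_fn(drive_b)) / tau
        da_a = (u_a - a_a) / tau_adapt
        da_b = (u_b - a_b) / tau_adapt
        u_a += dt * du_a
        u_b += dt * du_b
        a_a += dt * da_a
        a_b += dt * da_b
        labels.append('A' if u_a > u_b else 'B')
    return labels


def count_switches(labels):
    """Count the number of changes in the dominant percept."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
```

Calling count_switches(simulate_rivalry()) counts the alternations obtained with balanced inputs, whereas count_switches(simulate_rivalry(input_a=2.0)) gives zero under these assumed parameters, in line with the observation that a significant imbalance removes one of the two stable states.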
Figure 9. The first two stages in the perceptual hierarchy in the proposed model of auditory streaming. In the stimulus ABA-ABA- each tone has duration T. Competition between clusters of activity corresponding to different periodicity patterns at the upper level results in the emergence of a dominant perceptual organisation. Feedback to the lower level ensures consistent organisation throughout the hierarchy. Adaptation gradually weakens the upper level attractor, allowing a switch to another organisation.
6.8 Concluding remarks
In this chapter we have reviewed visual and auditory models of perceptual organisation and have argued for an interpretation of perception as a process of inference, consistent with the central role of sensory systems in discovering ‘what is out there’. A consistent framework emerges across modalities in the form of a hierarchical generative architecture, which performs a multi-scale analysis of the incoming data both in stimulus feature spaces and in time. In a novel extension to this approach, it was also shown that top-down modulation of inhibition at lower levels can result in an appropriate reset of processing when new information is detected, and that in this way the system can adjust its processing time constants to match the sensory input. In conclusion, we believe that an interpretation of sensory processing as information gathering can provide many insights into the likely architecture and functionality of sensory systems.
Acknowledgements
This work was supported by the European Research Area Specific Targeted Projects EmCAP (IST-FP6-013123) and SCANDLE (ICT-FP7-231168).
References
Angelucci, A. and Bullier, J. (2003). Reaching beyond the classical receptive field of V1 neurons: horizontal or feedback axons? Journal of Physiology-Paris, 97, 141–154. Balaguer-Ballester, E., Clark, N., Coath, M., Krumbholz, K. and Denham, S. L. (2009). Understanding pitch perception as a hierarchical process with top-down modulation. PLoS Computational Biology, 5, e1000301. Balaguer-Ballester, E., Coath, M. and Denham, S. L. (2007). A model of perceptual segregation based on clustering the time series of the simulated auditory nerve firing probability. Biological Cybernetics, 97, 479–491. Balaguer-Ballester, E., Denham, S. L. and Meddis, R. (2008). A cascade autocorrelation model of pitch perception. Journal of the Acoustical Society of America, 124, 2186–2195. Beck, D. M. and Kastner, S. (2008). Top-down and bottom-up mechanisms in biasing competition in the human brain. Vision Research, 49, 1154–1165. Bernstein, J. G. and Oxenham, A. J. (2005). An autocorrelation model with place dependence to account for the effect of harmonic number on fundamental frequency discrimination. Journal of the Acoustical Society of America, 117, 3816–3831. Brascamp, J. W., van Ee, R., Noest, A. J., Jacobs, R. H. and van den Berg, A. V. (2006). The time course of binocular rivalry reveals a fundamental role of noise. Journal of Vision, 6, 1244–1256. Cadieu, C., Kouh, M., Pasupathy, A., Connor, C. E., Riesenhuber, M. and Poggio, T. (2007). A model of V4 shape selectivity and invariance. Journal of Neurophysiology, 98, 1733–1750. Carandini, M., Demb, J. B., Mante, V., Tolhurst, D. J., Dan, Y., Olshausen, B. A., Gallant, J. L. and Rust, N. C. (2005). Do we know what the early visual system does? Journal of Neuroscience, 25, 10577–10597. Cariani, P. A. and Delgutte, B. (1996a). Neural correlates of the pitch of complex tones. II. Pitch shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance region for pitch.
Journal of Neurophysiology, 76, 1717–1734. Cariani, P. A. and Delgutte, B. (1996b). Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. Journal of Neurophysiology, 76, 1698–1716. Chen, Z., Haykin, S., Eggermont, J. J. and Becker, S. (2007). Correlative learning: A basis for brain and adaptive systems. Hoboken, NJ: John Wiley and Sons. Coath, M., Brader, J. M., Fusi, S. and Denham, S. (2005). Multiple views of the response of an ensemble of spectrotemporal response features supports concurrent classification of utterance, prosody, sex and speaker identity. Network: Computation in Neural Systems, 16, 285–300. Dayan, P., Hinton, G. E., Neal, R. M. and Zemel, R. S. (1995). The Helmholtz Machine. Neural Computation, 7, 889–904.
de Cheveigné, A. and Pressnitzer, D. (2006). The case of the missing delay lines: Synthetic delays obtained by cross-channel phase interaction. Journal of the Acoustical Society of America, 119, 3908–3918. de Cheveigné, A. (2005). Pitch perception models. In C. J. Plack, A. J. Oxenham, R. R. Fay and A. N. Popper (Eds.), Pitch: Neural coding and perception. New York: Springer. Dean, I., Harper, N. S. and McAlpine, D. (2005). Neural population coding of sound level adapts to stimulus statistics. Nature Neuroscience, 8, 1684–1689. Deco, G. and Lee, T. S. (2002). A unified model of spatial and object attention based on intercortical biased competition. Neurocomputing, 44–46, 775–781. Denham, S. L. and Winkler, I. (2006). The role of predictive models in the formation of auditory streams. Journal of Physiology-Paris, 100, 154–170. Denham, S. L., Gyimesi, K., Stefanics, G. and Winkler, I. (2009). Perceptual bi-stability in auditory streaming: How much do stimulus features matter? Biological Psychology, under review. Desain, P. and Honing, H. (2003). The formation of rhythmic categories and metric priming. Perception, 32, 341–365. Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of Neuroscience, 18, 193–222. Duncan, J. (1984). Selective attention and the organization of visual information. Journal of Experimental Psychology-General, 113, 501–517. Dura, S. and Denham, S. L. (2008). Feedback in a hierarchical model of object recognition: A Bayesian Inference Approach, Local-Area Systems and Theoretical Neuroscience Day, Gatsby Unit – UCL. London. Dura, S., Wennekers, T. and Denham, S. (2009). Feedback in a hierarchical model of object recognition in cortex. BMC Neuroscience, 10, P355. Felleman, D. J. and Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1, 1–47. Fishbach, A., Nelken, I. and Yeshurun, Y. (2001).
Auditory edge detection: A neural model for physiological and psychoacoustical responses to amplitude transients. Journal of Neurophysiology, 85, 2303–2323. Fishman, Y. I., Arezzo, J. C. and Steinschneider, M. (2004). Auditory stream segregation in monkey auditory cortex: Effects of frequency separation, presentation rate, and tone duration. Journal of the Acoustical Society of America, 116, 1656–1670. Fishman, Y. I., Reser, D. H., Arezzo, J. C. and Steinschneider, M. (2001). Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hearing Research, 151, 167–187. Friston, K. (2002). Functional integration and inference in the brain. Progress in Neurobiology, 68, 113–143. Friston, K. (2003a). Learning and inference in the brain. Network-Computation in Neural Systems, 16, 1325–1352. Friston, K. (2003b). Learning and inference in the brain. Network-Computation in Neural Systems, 16, 1325–1352. Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society of London, B Biological Sciences, 360, 815–836. Friston, K., Kilner, J. and Harrison, L. (2006). A free energy principle for the brain. Journal of Physiology-Paris, 100, 70–87.
Friston, K. J., Harrison, L. and Penny, W. (2003). Dynamic causal modelling. Neuroimage, 19, 1273–1302. Garcia-Lazaro, J. A., Ho, S. S., Nair, A. and Schnupp, J. W. (2007). Shifting and scaling adaptation to dynamic stimuli in somatosensory cortex. European Journal of Neuroscience, 26, 2359–2368. Geisler, W. S. and Kersten, D. (2002). Illusions, perception and Bayes. Nature Neuroscience, 5, 508–510. Guo, K., Robertson, R. G., Pulgarin, M., Nevado, A., Panzeri, S., Thiele, A. and Young, M. P. (2007). Spatio-temporal prediction and inference by V1 neurons. European Journal of Neuroscience, 26, 1045–1054. Gutschalk, A., Patterson, R. D., Scherg, M., Uppenkamp, S. and Rupp, A. (2007). The effect of temporal context on the sustained pitch response in human auditory cortex. Cerebral Cortex, 17, 552–561. Háden, G. P., Stefanics, G., Vestergaard, M. D., Denham, S. L., Sziller, I. and Winkler, I. (2009). Timbre-independent extraction of pitch in newborn infants. Psychophysiology, 46, 69–74. Halko, M. A., Mingolla, E. and Somers, D. C. (2008). Multiple mechanisms of illusory contour perception. Journal of Vision, 8, Article. Hall, J. W., 3rd and Peters, R. W. (1981). Pitch for nonsimultaneous successive harmonics in quiet and noise. Journal of the Acoustical Society of America, 69, 509–513. Harrison, L. M., Stephan, K. E., Rees, G. and Friston, K. J. (2007). Extra-classical receptive field effects measured in striate cortex with fMRI. Neuroimage, 34, 1199–1208. Hasson, U., Yang, E., Vallines, I., Heeger, D. J. and Rubin, N. (2008). A hierarchy of temporal receptive windows in human cortex. Journal of Neuroscience, 28, 2539–2550. Hawkins, J. and George, D. (2007). Hierarchical temporal theory: Concepts, theory and terminology. Redwood City, CA: Numenta Inc. Helmholtz, H. L. (1860/1962). Handbuch der physiologischen Optik (English translation). New York: Dover. Honing, H. (2002). Structure and interpretation of rhythm and timing. Tijdschrift voor muziektheorie, 7, 227–232. 
Huang, J. Y., Wang, C. and Dreher, B. (2007). The effects of reversible inactivation of posterotemporal visual cortex on neuronal activities in cat’s area 17. Brain Research, 1138, 111–128. Hupe, J. M., James, A. C., Girard, P., Lomber, S. G., Payne, B. R. and Bullier, J. (2001). Feedback connections act on the early part of the responses in monkey visual cortex. Journal of Neurophysiology, 85, 134–145. Jehee, J. F., Roelfsema, P. R., Deco, G., Murre, J. M. and Lamme, V. A. (2007). Interactions between higher and lower visual areas improve shape selectivity of higher level neurons: explaining crowding phenomena. Brain Research, 1157, 167–176. Kadner, A. and Berrebi, A. S. (2008). Encoding of temporal features of auditory stimuli in the medial nucleus of the trapezoid body and superior paraolivary nucleus of the rat. Neuroscience, 151, 868–887. Kersten, D., Mamassian, P. and Yuille, A. (2004). Object perception as Bayesian inference. Annual Review of Psychology, 55, 271–304. Kiebel, S. J., Daunizeau, J. and Friston, K. J. (2008). A hierarchy of time-scales and the brain. PLoS Computational Biology, 4, e1000209.
Knill, D. C., Kersten, D. and Mamassian, P. (1996). Implications of a Bayesian formulation of visual information processing for psychophysics. In D. C. Knill and W. Richards (Eds.), Perception as Bayesian Inference (pp. 239–286). Cambridge University Press. Kording, K. P. and Wolpert, D. M. (2004). Bayesian integration in sensorimotor learning. Nature, 427, 244–247. Lee, T. S. and Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A-Optics Image Science and Vision, 20, 1434–1448. Lee, T. S. and Nguyen, M. (2001). Dynamics of subjective contour formation in the early visual cortex. Proceedings of the National Academy of Sciences of the United States of America, 98, 1907–1911. Leman, M. (2000). An auditory model of the role of short term memory in probe-tone ratings. Music Perception, 17, 481–509. Licklider, J. C. R. (1951). A duplex theory of pitch perception. Experientia, 7, 128–134. Meddis, R. and Hewitt, M. J. (1991). Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification. Journal of the Acoustical Society of America, 89, 2866. Meddis, R. and O’Mard, L. (1997). A unitary model of pitch perception. Journal of the Acoustical Society of America, 102, 1811–1820. Micheyl, C., Tian, B., Carlyon, R. P. and Rauschecker, J. P. (2005). Perceptual organization of tone sequences in the auditory cortex of awake macaques. Neuron, 48, 139–148. Moore, B. C. J. (2004). An introduction to the psychology of hearing. London: Elsevier. Mumford, D. (1991). On the computational architecture of the neocortex I. The role of the thalamo-cortical loop. Biological Cybernetics, 65, 135–145. Mumford, D. (1992). On the computational architecture of the neocortex II. The role of the cortico-cortical loop. Biological Cybernetics, 66, 241–251. Mumford, D. (1996). Pattern theory: a unifying perspective. In D. C. Knill and W. Richards (Eds.), Perception as Bayesian Inference (pp. 25–62).
Cambridge University Press. Murray, S. O., Schrater, P. and Kersten, D. (2004). Perceptual grouping and the interactions between visual cortical areas. Neural Networks, 17, 695–705. Noest, A. J., van Ee, R., Nijs, M. M. and van Wezel, R. J. (2007). Percept-choice sequences driven by interrupted ambiguous stimuli: A low-level neural model. Journal of Vision, 7, 10. Olshausen, B. A. and Field, D. J. (2005). How close are we to understanding v1? Neural Computation, 17, 1665–1699. Pressnitzer, D. and Hupé, J. M. (2005). Is auditory streaming a bistable percept? Forum Acusticum, Budapest, 1557–1561. Pressnitzer, D. and Hupe, J. M. (2006). Temporal dynamics of auditory and visual bistability reveal common principles of perceptual organization. Current Biology, 16, 1351–1357. Raizada, R. D. S. and Grossberg, S. (2003). Towards a theory of the laminar architecture of cerebral cortex: Computational clues from the visual system. Cerebral Cortex, 13, 100–113. Rao, R. P. N. (1999). An optimal estimation approach to visual perception and learning. Vision Research, 39, 1963–1989. Rao, R. P. N. (2004). Bayesian computation in recurrent neural circuits. Neural Computation, 16, 1–38. Rao, R. P. N. (2006). Neural Models of Bayesian Belief Propagation. In K. Doya (Ed.), Bayesian brain: Probabilistic approaches to neural coding (pp. 239–268). Cambridge, MA: MIT Press.
Rao, R. P. N. and Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87. Rauschenberger, R., Liu, T., Slotnick, S. D. and Yantis, S. (2006). Temporally unfolding neural representation of pictorial occlusion. Psychological Science, 17, 358–364. Riesenhuber, M. and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025. Schroeder, M. R. (1999). Computer speech. New York: Springer. Schwartz, D. A., Howe, C. Q. and Purves, D. (2003). The statistical structure of human speech sounds predicts musical universals. Journal of Neuroscience, 23, 7160–7168. Schwartz, O., Hsu, A. and Dayan, P. (2007). Space and time in visual context. Nature Reviews Neuroscience, 8, 522–535. Serrà, J., Gómez, E., Herrera, P. and Serra, X. (2008). Statistical analysis of chroma features in western music predicts human judgements of tonality. Journal of New Music Research, 37, 299–309. Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M. and Poggio, T. (2007). Robust object recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 411–426. Sillito, A. M., Cudeiro, J. and Jones, H. E. (2006). Always returning: Feedback and sensory processing in visual cortex and thalamus. Trends in Neurosciences, 29, 307–316. Slaney, M. and Lyon, R. F. (1990). A perceptual pitch detector. International Conference on Acoustics, Speech, and Signal Processing ICASSP-90, 1, 357–360. Spratling, M. W. (2008). Predictive coding as a model of biased competition in visual attention. Vision Research, 48, 1391–1408. Stanley, D. A. and Rubin, N. (2003). fMRI activation in response to illusory contours and salient regions in the human lateral occipital complex. Neuron, 37, 323–331. Stefanics, G., Haden, G. P., Sziller, I., Balazs, L., Beke, A. and Winkler, I. (2009). Newborn infants process pitch intervals. 
Clinical Neurophysiology, 120, 304–308. Stefanics, G., Haden, G., Huotilainen, M., Balazs, L., Sziller, I., Beke, A., Fellman, V. and Winkler, I. (2007). Auditory temporal grouping in newborn infants. Psychophysiology, 44, 697–702. Tillmann, B. and Bigand, E. (2004). The relative importance of local and global structures in music perception. The Journal of Aesthetics and Art Criticism, 62, 211–222. Vestergaard, M. D., Háden, G. P., Shtyrov, Y., Patterson, R. D., Pulvermüller, F., Denham, S. L., Sziller, I. and Winkler, I. (2009). Auditory size-deviant detection in adults and newborn infants. Biological Psychology, 82, 169–175. Vos, P. G., van Assen, M. and Franek, M. (1996). Perceived tempo change is dependent on base tempo and direction of change: Evidence for a generalized version of Schulze’s (1978) internal beat model. Psychological Research, 59, 240–247. Vuust, P., Ostergaard, L., Pallesen, K. J., Bailey, C. and Roepstorff, A. (2009). Predictive coding of music – Brain responses to rhythmic incongruity. Cortex, 45, 80–92. Vuust, P., Pallesen, K. J., Bailey, C., van Zuijen, T. L., Gjedde, A., Roepstorff, A. and Ostergaard, L. (2005). To musicians, the message is in the meter pre-attentive neuronal responses to incongruent rhythm are left-lateralized in musicians. Neuroimage, 24, 560–564. Wiegrebe, L. (2001). Searching for the time constant of neural pitch extraction. Journal of the Acoustical Society of America, 109, 1082–1091. Wilson, H. R. (2003). Computational evidence for a rivalry hierarchy in vision. Proceedings of the National Academy of Sciences of the United States of America, 100, 14499–14503.
Winkler, I., Takegata, R. and Sussman, E. (2005). Event-related brain potentials reveal multiple stages in the perceptual organization of sound. Brain Research – Cognitive Brain Research, 25, 291–299. Winkler, I., Haden, G. P., Ladinig, O., Sziller, I. and Honing, H. (2009). Newborn infants detect the beat in music. Proceedings of the National Academy of Sciences of the United States of America, 106, 2468–2471. Yuille, A. and Kersten, D. (2006). Vision as Bayesian inference: Analysis by synthesis? Trends in Cognitive Sciences, 10, 301–308.
chapter 7
Are you listening? Language outside the focus of attention
Yury Shtyrov and Friedemann Pulvermüller
Medical Research Council, Cognition and Brain Science Unit, Cambridge, UK
7.1 Introduction
Do all stages of speech analysis require our active attention on the subject? Can some of them take place irrespective of whether or not we focus on incoming speech and can they be, in this sense, automatic? Are various types of linguistic information processed by our brain simultaneously or in a certain order? These and other questions have long been debated in both psycholinguistics and the neuroscience of language. Recent studies investigating the cognitive processes underlying spoken language processing found that, even under attentional withdrawal, the size and topography of certain brain responses reflect the activation of memory traces for language elements in the human brain. Familiar sounds of one’s native language may elicit a larger brain activation than unfamiliar sounds, and at the level of meaningful language units, words elicit a larger response than meaningless word-like sound combinations. This suggests that, using modern neuroimaging tools, we can trace the activation of memory networks for language sounds and spoken words even when no attention is paid to them. Unattended word stimuli elicit an activation sequence starting in the superior-temporal cortex and rapidly progressing to the left inferior-frontal lobe. The spatio-temporal patterns of these cortical activations depend on lexical and semantic properties of word stems and affixes, thus providing clues about lexico-semantic information processing. At the syntactic level, we can see reflections of grammatical regularities in word strings. This growing body of results suggests that lexical, semantic and syntactic information can be processed by the central nervous system outside the focus of attention in a largely automatic manner. Analysis of spatio-temporal patterns of generator activations underlying such attention-independent responses to speech
may be an important tool for investigating the brain dynamics of spoken language processing and the distributed cortical networks involved.
7.2 Types and stages of linguistic information processing: Theories and conflicting evidence in psycholinguistics and neurophysiology
It has been a matter of debate whether the distinctively human capacity to process language draws on attentional resources or is automatic. The ease with which we can perceive the entire complexity of incoming speech and seemingly instantaneously decode syntactic, morphological, semantic and other information while doing something else at the same time prompted suggestions that linguistic activity may be performed by the human brain in a largely automatic fashion (Fodor 1983; Garrett 1984; Garrod and Pickering 2004; Pickering and Garrod 2004). Before considering the experimental data that speak to the issue, let us briefly look at the more general psycholinguistic and neurophysiological background of what is known about linguistic information processing. Traditional psycholinguistic models of language comprehension (Morton 1969; Fromkin 1973; Garrett 1980; Dell 1986; MacKay 1987), also reflected in current approaches to speech comprehension and production (Levelt et al. 1999; Norris et al. 2000), distinguished a few types of information involved in these processes. Even though these theories may be rather different, they mostly agree on the plausibility of (1) a phonological processing level, at which speech sounds are analysed for their phonetic/phonological features (following the level of basic acoustic analysis), (2) lexical processing, sometimes conceptualised as the lookup of an item in a “mental lexicon”, which lists only word forms but not their meaning or other related information, (3) semantic processing, where the item’s meaning is accessed, and (4) a syntactic level, where grammatical information linking words in sentences to each other is analysed. These different “modules” of processing can sometimes be merged, omitted or extended to include more levels.
A great debate in psycholinguistics is, however, between models according to which processing of these information types is consecutive, with access to them commencing at substantially different times, and models implying that these processes take place near-simultaneously, even in parallel. This issue could potentially be resolved on the basis of neurophysiological data, using EEG or MEG, which can track brain processes with the highest possible temporal resolution, on a millisecond scale. Consequently, an entire body of neurophysiological research done with various linguistic materials assisted in establishing the dominance of a view that posits a serial order of linguistic information access in the brain. With basic acoustic
feature extraction commencing at 20–50 ms after the stimulus onset (Krumbholz et al. 2003; Lutkenhoner et al. 2003), phonological tasks modulated responses with latencies of 100–200 ms (Poeppel et al. 1996; Obleser et al. 2004). ‘Higher’ levels of information appeared to take substantially longer to be accessed by the human brain: the N400 component, which peaks around 400 ms after the onset of a critical visual word (Kutas and Hillyard 1980), is traditionally seen as the main index of semantic processes. A slightly earlier (350 ms) peaking sub-component of it is sometimes separated as an index of lexical processing (Bentin et al. 1999; Embick et al. 2001; Pylkkanen et al. 2002; Stockall et al. 2004), aiding the elusive distinction between the lexical and semantic levels of processing. An even more complicated situation emerged for syntactic processing, with an early component (ELAN, early left anterior negativity) appearing at ~100 ms and reflecting the phrase’s grammaticality (Neville et al. 1991; Friederici et al. 1993); this is complemented by later grammatically related frontal negativities with longer (> 250 ms) latencies (Münte et al. 1998; Gunter et al. 2000), and, finally, by the P600, a late positive shift reaching its maximum at ~600 ms at centro-parietal sites (for review, see Osterhout et al. 1997; Osterhout and Hagoort 1999). Although not without controversies, such data suggested a stepwise access to different types of linguistic information, thus forming a mainstream view in the neurophysiology of language (for review, see Friederici 2002). This view, however, came into conflict with a body of behavioural evidence collected in a number of psycholinguistic studies that indicated parallel processing of crucial information about incoming words and their context very early, within the first ~200 milliseconds after a critical word can be recognized (Marslen-Wilson 1973, 1987; Rastle et al. 2000; Mohr and Pulvermüller 2002).
For example, subjects can already make reliable button-press motor responses to written words according to their evaluation of aspects of phonological and semantic stimulus properties within 400–450 ms after their onset (Marslen-Wilson and Tyler 1975). Therefore, the earliest word-related psycholinguistic processes as such must take place substantially before this, as considerable time is required for the preparation and execution of the motor response. Surprisingly, at these early latencies, linguistic processing was already influenced by syntactic and semantic information (Marslen-Wilson and Tyler 1975). Furthermore, studies using the shadowing technique suggested that the language output must be initiated as early as 150 to 200 ms after the input onset (Marslen-Wilson 1985). Early behavioural effects reflecting semantic processing and context integration were documented in cross-modal priming, where specific knowledge about an upcoming spoken word could be demonstrated to be present well before its end, within 200 milliseconds after the acoustic signal allows for unique word identification or even well ahead of this (Zwitserlood 1989; Moss et al. 1997; Tyler et al. 2002).
Eye-tracking experiments demonstrated that a range of psycholinguistic properties of stimuli influence short-latency eye movement responses (Sereno and Rayner 2003). The view suggested by such studies is that near-simultaneous access to the different types of linguistic information may commence within 200 ms after the relevant information becomes available. This calls into question the stepwise models of language processing also put forward in psycholinguistics and supported, at least partially, by early ERP experiments.
7.3 Mismatch negativity as a tool for language science
The above controversy clearly presents a need for a more refined methodology that would address this issue. As we will see below, this issue is intrinsically bound up, for theoretical and methodological reasons, with the issue of automaticity and control in linguistic processing. To address these, we will focus on data obtained outside the focus of attention in a passive oddball paradigm using the so-called mismatch negativity (MMN) brain response. MMN is an evoked brain response elicited by rare (so-called deviant) acoustic stimuli occasionally presented in a sequence of frequent (standard) stimuli (Alho 1995; Näätänen 1995). Importantly, MMN can be elicited in the absence of the subject’s attention to the auditory input (Tiitinen et al. 1994; Schröger 1996). It has therefore come to be considered a reflection of the brain’s automatic discrimination of changes in the auditory sensory input and thus a unique indicator of automatic cerebral processing of acoustic events (Näätänen 1995). A number of MMN’s specific properties led to the suggestion to use it as a tool for investigating the neural processing of speech and language (Näätänen 2001; Pulvermüller and Shtyrov 2006). The main motivations for applying MMN to exploring the brain processing of language are: (a) MMN is early; (b) MMN is automatic; (c) MMN is a response to an individual sound; (d) MMN is a response to a change. Whereas (a) and (b) are based on the known properties of the MMN response tentatively related to its neural mechanisms, (c) and (d) are important from the technical, methodological point of view on recording brain responses to language.
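The passive oddball design described above can be sketched in a few lines. The example assumes that the MMN is estimated, as is common practice, as the difference between the averaged response to deviants and the averaged response to standards; the function names, the deviant probability of 0.15 and the rule that two deviants never follow each other are hypothetical choices made for illustration, not details taken from the studies cited here.

```python
import random


def make_oddball_sequence(n_trials, deviant_prob=0.15, seed=0):
    """Return a pseudo-random list of 'standard'/'deviant' trial labels.

    Assumption for this sketch: a deviant is never followed directly
    by another deviant, so every deviant interrupts a run of standards.
    """
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_trials):
        previous_was_deviant = bool(sequence) and sequence[-1] == "deviant"
        if not previous_was_deviant and rng.random() < deviant_prob:
            sequence.append("deviant")
        else:
            sequence.append("standard")
    return sequence


def average_response(epochs):
    """Average equally long epochs (lists of samples) point by point."""
    n_epochs = len(epochs)
    return [sum(samples) / n_epochs for samples in zip(*epochs)]


def difference_wave(standard_epochs, deviant_epochs):
    """Deviant-minus-standard difference wave, sample by sample."""
    standard_avg = average_response(standard_epochs)
    deviant_avg = average_response(deviant_epochs)
    return [d - s for d, s in zip(deviant_avg, standard_avg)]
```

In use, the labels returned by make_oddball_sequence would determine which stimulus is played on each trial, and the recorded epochs, sorted by label, would be passed to difference_wave; a negative deflection in the result in the 100–200 ms range is what would be read as an MMN.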
7.3.1 The earliness of MMN response
Various behavioural data suggest that linguistic processes in the brain commence within 200 ms after the relevant acoustic information is available. Should these processes be reflected in the dynamics of event-related activity recorded on the scalp surface, the respective brain responses must have similar, if not earlier,
latencies in order to be considered directly related to these processes rather than being their remote consequences. This clearly eliminates late shifts (N400, M350, P600) as potential indicators of early processing. Early obligatory auditory responses (P1, N1) have not been found sensitive to linguistic variables. ELAN, the early syntax-related response, is well within the time window, but cannot be seen for most types of linguistic (even syntactic) tasks. MMN, on the contrary, is both early (usually reported as having latencies of 100–200 ms) and firmly linked to such cognitive processes as memory, attention allocation, and primitive auditory system intelligence (Winkler et al. 1993; Näätänen 1995, 2000; Näätänen and Alho 1995; Näätänen et al. 2001). It is known to be sensitive to highly abstract features of auditory signals and is therefore a good candidate for identifying putative early linguistic activations.
7.3.2
Automaticity of the MMN
The jury is still out on whether, and to what extent, the MMN is independent of or modulated by attention (Picton et al. 2000). It is however uncontroversial that MMN can be easily elicited in the absence of attention to the stimulus input and therefore does not require active voluntary processing of stimulus material by the individual, who may be engaged in an unrelated primary task (typically watching a video film) while MMN responses are being evoked by random infrequent changes in auditory stimulation. In this very sense the MMN can be considered an automatic brain response, at least until a better term is found. For language research, this has an important implication. Typically, in language experiments, subjects are asked to attend to presented words or sentences (e.g., Neville et al. 1991; Osterhout and Swinney 1993; Friederici et al. 2000). The task is often to make a judgment of the stimulus material (e.g. familiar/unfamiliar, correct/incorrect) or even perform a specific linguistic task (e.g. lexical decision, grammar assessment). When attention is required, one cannot be sure to what extent the registered responses are influenced by brain correlates of attention rather than by the language-related activity as such. Attention-related phenomena are known to modulate a variety of the brain’s evoked responses involving a number of brain structures including those close to, or overlapping with, the core language areas (see e.g. Picton and Hillyard 1974; Alho 1992; Woods et al. 1993a, 1993b; Tiitinen et al. 1997; Escera et al. 1998; Yamasaki et al. 2002; Yantis et al. 2002). It is also likely that subjects pay more attention to unusual or incorrect stimuli (most frequently used stimulus types include e.g. pseudowords, nonsense sentences or grammatical violations) as they try to make some sense of them, or that they use different strategies to process proper and ill-formed
items. Such different stimulus-specific strategies and attention variation may be reflected in the event-related measures, thus obscuring, masking, modifying or even canceling any true language-related responses. So, to register the true language-related activity of the brain, it is essential to tease it apart from such attention- and task-related effects. The MMN provides a straightforward solution to this, as it can be recorded when the subjects are distracted from the stimuli and are not engaged in any stimulus-oriented tasks.
7.3.3
MMN as a single-item response
Even though early psycholinguistic processes had been suggested, such putative early activity remained mostly undetected neurophysiologically. One reason for the failure of most studies to detect any early language activation may be methodological in nature. In most brain studies of language, large groups of stimuli are investigated and compared with each other, and average responses are used to draw general conclusions on all materials falling into a certain category. This leads to the problem of physical stimulus variance, with stimuli having different physical features (e.g. duration, spectral characteristics, distribution of sound energy, etc.). Differences even in basic physical features may lead to differential brain activation (Näätänen and Picton 1987; Korth and Nguyen 1997) that could in principle overlap with, mask or be misinterpreted as language-related effects. Equally importantly, it raises the problem of psycholinguistic variance, with stimuli differing in their linguistic features, e.g. the frequency of their occurrence in the language or their word recognition parameters. The latter may be especially difficult to control, as different words, even of identical length, become uniquely recognized from their lexical competitor environment at different times, in extreme cases shortly after their onset or only after a substantial post-offset period (Marslen-Wilson 1987). Although the traditional strategy of matching average parameters across stimulus categories can help mitigate this, it still has a serious caveat: if the brain responses reflecting early linguistic processes are small and short-lived (as all known early ERP peaks are), the variance in the stimulus group may reduce or even remove any effects in the average brain responses (Pulvermüller 1999; Pulvermüller and Shtyrov 2006). Figure 1 illustrates how variability in a stimulus set may potentially lead to washing out of ERP effects.
Later responses (N400, P600), on the other hand, will likely survive such averaging, as they are large in amplitude and span across hundreds of milliseconds. Therefore, to locate the putative early effects with any certainty, stimulus variance should be maximally reduced. As MMN is typically a response to a single deviant item presented randomly a large number of times in order to optimize signal-to-noise ratio (SNR) of the ERP, this offers an ultimate control over the stimulus variance by removing it altogether.
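The washing-out argument can be made concrete with a toy simulation. The sketch below is purely illustrative and rests on stated assumptions: each item is assumed to evoke a Gaussian-shaped deflection of unit amplitude locked to its own word recognition point, the recognition points (200 to 500 ms) and the response widths (20 ms for a brief early response, 150 ms for a long-lasting late one) are hypothetical values, and noise is ignored altogether.

```python
import math

def brief_response(t_ms, peak_ms, width_ms, amplitude=1.0):
    """Gaussian-shaped deflection peaking at peak_ms (arbitrary units)."""
    return amplitude * math.exp(-0.5 * ((t_ms - peak_ms) / width_ms) ** 2)

def average_over_items(recognition_points_ms, width_ms, times_ms):
    """Average responses to several items, each locked to its own recognition point.

    Time is measured from word onset, as in conventional averaging.
    """
    n_items = len(recognition_points_ms)
    return [
        sum(brief_response(t, rp, width_ms) for rp in recognition_points_ms) / n_items
        for t in times_ms
    ]

times_ms = list(range(0, 1001, 5))
single_item = [300.0]                          # one word repeated many times
varied_items = [200.0, 300.0, 400.0, 500.0]    # hypothetical recognition points

# Peak of the averaged response for a short-lived (20 ms) and a long (150 ms) deflection.
early_single = max(average_over_items(single_item, 20.0, times_ms))
early_varied = max(average_over_items(varied_items, 20.0, times_ms))
late_single = max(average_over_items(single_item, 150.0, times_ms))
late_varied = max(average_over_items(varied_items, 150.0, times_ms))

print(early_single, early_varied, late_single, late_varied)
```

Under these assumptions the peak of the short-lived response drops to roughly a quarter of its single-item size once four items with different recognition points are averaged, whereas the long-lasting response retains well over half of its amplitude. This is the same qualitative pattern as described in the text: averaging across variable items penalises brief early effects far more than slow late ones.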
Figure 1. Physical and psycholinguistic variance between different spoken words: example acoustic waveforms of three English words. Approximate positions of individual word recognition points (WRP) are marked with white arrowheads. A hypothetical short-lived brain response that might reflect the comprehension of each word shortly after the point in time when it can be uniquely recognized from the acoustic signal is schematically indicated. Please note that physical features and WRP latencies differ substantially between words. Averaging of neurophysiological activity over such variable stimulus sets will therefore blur and thus minimize or remove any early short-lived brain response locked to the recognition point.
7.3.4
MMN as a difference response
Mismatch negativity is elicited first of all by contrasts between the standard and deviant stimuli and is computed as a difference between responses to these two types. In turn, this offers an opportunity to strictly control acoustic stimulus properties in language experiments. This can be done by using identical acoustic contrasts in different experimental conditions, while manipulating their linguistic properties. Consider the following hypothetical experimental conditions: (1) the word ‘ray’ as standard stimulus vs. ‘rays’ as deviant, (2) ‘lay’ vs. ‘lays’, (3) ‘may’ vs. ‘maze’, (4) ‘tay’ vs. ‘taze’. In all of these, the MMN responses would be elicited by the presence of the stimulus-final sound [z] in the deviant as opposed to silence in the standard one. So, a purely acoustic MMN elicited by the deviant-standard stimulus contrast should stay the same. However, the four conditions differ
widely in their linguistic context while incorporating the identical acoustic contrast: in condition 1, this contrast constitutes a noun inflection (number change); in condition 2, a verb is inflected (3rd person) instead. In condition 3, the same acoustic contrast signifies a complete change of the stimulus lexical and semantic features when both part-of-speech information and meaning diverge. Finally, in the fourth condition, two meaningless pseudowords are contrasted, offering an additional acoustic control for any effects that can be obtained in the other three conditions. So, by recording MMNs to such identical acoustic contrasts, one may focus on effects of the different linguistic contexts without the usual confound of diverging acoustic features. Additionally, the same deviant stimuli can be presented in a separate set of conditions as frequent standards, and the MMN can then be computed by using responses to physically identical items presented as both deviants and standards, offering an ultimate control over the physical stimulus features. The use of such strictly controlled stimulation proved to be very successful in a number of studies. Let us now briefly review this evidence.
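The deviant-minus-standard computation described in this section can be sketched as follows. This is a minimal illustration with made-up numbers, not an analysis pipeline from any of the studies reviewed: the function names, the six-sample epochs, the amplitude values and the 100–200 ms search window are all assumptions introduced for the example.

```python
def mean_waveform(epochs):
    """Average equal-length epochs sample by sample."""
    n_epochs = len(epochs)
    return [sum(samples) / n_epochs for samples in zip(*epochs)]

def difference_wave(deviant_epochs, standard_epochs):
    """Averaged deviant response minus averaged standard response."""
    deviant_avg = mean_waveform(deviant_epochs)
    standard_avg = mean_waveform(standard_epochs)
    return [d - s for d, s in zip(deviant_avg, standard_avg)]

def peak_negativity(wave, times_ms, window_ms=(100, 200)):
    """Most negative value within the window, returned as (amplitude, latency_ms)."""
    candidates = [
        (value, t) for value, t in zip(wave, times_ms)
        if window_ms[0] <= t <= window_ms[1]
    ]
    return min(candidates)

# Toy data (arbitrary units); every value here is hypothetical.
times_ms = [0, 50, 100, 150, 200, 250]
standard_epochs = [
    [0.0, 1.2, -1.0, 0.2, 0.4, 0.0],
    [0.0, 0.8, -1.0, -0.2, 0.6, 0.0],
]
deviant_epochs = [
    [0.0, 1.0, -1.5, -2.0, 0.0, 0.0],
    [0.0, 1.0, -1.5, -3.0, 0.0, 0.0],
]

mmn = difference_wave(deviant_epochs, standard_epochs)
peak_amplitude, peak_latency_ms = peak_negativity(mmn, times_ms)
print(mmn, peak_amplitude, peak_latency_ms)
```

The same `difference_wave` function covers the stricter control mentioned above: if `standard_epochs` holds the responses to the physically identical item recorded in a separate block where it served as the frequent standard, the subtraction compares the same sound in its deviant and standard roles.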
7.4
Current data: Early, parallel and automatic access to linguistic information
At this stage (winter 2008–2009), experimental evidence using MMN and non-attend designs has been collected in all major domains of linguistic information: phonological, lexical, semantic and syntactic (Figures 2–5).
7.4.1
Phonological processes
Single phonemes and their simple combinations, syllables, were the first linguistic materials to be studied using the MMN paradigm. These experiments showed that native language sounds, e.g. vowels, elicit larger MMN responses than their analogues that do not have corresponding representations in one’s phonological system (Dehaene-Lambertz 1997; Näätänen et al. 1997). This happened irrespective of the magnitude of acoustic contrasts between the standards and deviants. Furthermore, while MMN is usually a bilateral or even right-dominant response (Paavilainen et al. 1991), such phonetic MMNs showed left-hemispheric dominance, potentially linking their origin to the language-specific structures housed in the left hemisphere (Näätänen et al. 1997; Shtyrov et al. 1998, 2000). Further experiments even showed how changes in the pattern of the MMN response may reflect development of a new phonological representation in the process of learning a language by children or adults (Cheour et al. 1998; Winkler et al.
Figure 2. MMN reflections of early automatic access to linguistic information: phonological enhancement for syllables (adapted from Shtyrov et al. 2000), lexically enhanced magnetic MMN for meaningful words (adapted from Shtyrov et al. 2005); fMRI counterpart of lexical enhancement for words (adapted from Shtyrov et al. 2008).
1999). Later experiments demonstrated MMN sensitivity to phonotactic probabilities (Dehaene-Lambertz et al. 2000; Bonte et al. 2005, 2007), stress patterns (Honbolygo et al. 2004; Weber et al. 2004), audio-visual integration of phonetic information (Colin et al. 2002, 2004) and various other phonological variables. The timing of these phonetic and phonological mismatch negativities ranged from close to 100 ms (Rinne et al. 1999) to nearly 200 ms (Shtyrov et al. 2000). Accommodating such data in the theoretical framework of the MMN required a certain revision of that framework. It was suggested that, in addition to change-detection and short-term memory processes, MMN is sensitive to long-term memory traces pre-formed in the subject’s neural system in the process of their previous experience with spoken language (Näätänen et al. 1997, 2001; Shtyrov et al. 2000). Importantly, this implied that such long-term memory traces for language elements can become activated in the brain by a deviant stimulus in an oddball sequence and this specific activation can be recorded neurophysiologically even without attention to the stimulus or any stimulus-oriented task. This led to further MMN experiments with language stimuli that targeted ‘higher-order’ linguistic processes.
7.4.2
Lexical processes
Similar to the phonological enhancement of the MMN in response to native-language phonemes and syllables, we found that mismatch negativity evoked by individual words was greater than that to comparable meaningless word-like (i.e., following phonological rules of the language) stimuli. In a series of studies, we presented subjects with sets of acoustically matched word and pseudoword stimuli and found an increased MMN response whenever the deviant stimulus was a meaningful word (Pulvermüller et al. 2001, 2004; Shtyrov and Pulvermüller 2002; Shtyrov et al. 2005). This enhancement, typically peaking at 100–200 ms, is best explained by the activation of cortical memory traces for words realised as distributed strongly connected populations of neurones (Pulvermüller and Shtyrov 2006). The lexical enhancement of the MMN, repeatedly confirmed by our group, was also demonstrated by other groups using various stimulus set-ups and languages (Korpilahti et al. 2001; Kujala et al. 2002; Sittiprapaporn et al. 2003; Endrass et al. 2004; Pettigrew et al. 2004) including a recent validation of its superior- (STC) and medio-temporal (MTG) sources by fMRI (Shtyrov et al. 2008). Our studies indicated that the lexical status of the deviant stimulus was relevant for eliciting the MMN, but the lexical status of the standard stimulus did not significantly affect the MMN amplitude (Shtyrov and Pulvermüller 2002). Other reports suggested that the event-related brain response evoked by the standard stimulus may also be affected by its lexical status (Diesch et al. 1998; Jacobsen et al. 2004). More detailed investigations also showed that MMN may be sensitive to more than general lexicality, and can serve as an index of word-category-specific processing supporting, for example, the notion of early processing and representational differences between verbs and nouns (Hasting et al. 2008).
Further scrutiny of word-elicited MMN activations using state-of-the-art technology, such as whole-head high-density MEG combined with source reconstruction algorithms, allowed for detailing word-related processes in the brain at an unprecedented spatio-temporal scale. Using MMN responses to single words, we could demonstrate how the activation appears first in the superior temporal lobe shortly (~130 ms) after the information allows for stimulus identification, and, in a defined order (with a delay of ~20 ms), spreads to inferior-frontal cortices, potentially reflecting the flow of information in the language-specific perisylvian areas in the process of lexical access (Pulvermüller et al. 2003). Furthermore, we found a correlation between the latency of MMN responses in individual subjects and the stimulus-specific word recognition points determined behaviourally, which suggests that word-elicited MMN reflects the processes of individual word recognition by the human brain and the two processes may therefore be linked (Pulvermüller et al. 2006).
The pattern of MMN response to individual word deviants, as a result of such steadily accumulating evidence, came to be considered a word’s neural “signature”, its memory trace activated in the brain. This, in turn, offered an unparalleled opportunity to investigate neural processing of language in a non-attention-demanding and task-independent fashion. Indeed, investigations of language function using MMN that followed were able to provide a great level of detail on the spatio-temporal patterns of activation potentially related to processing of semantic information attached to individual words.
7.4.3
Semantic processes
To target semantic word properties using MMN, the very first studies in this domain utilised predictions of the semantic somatotopy model (SSM, Pulvermüller 1999, 2005). SSM is a part of a general framework maintaining that word-specific memory traces exist in the brain as distributed neuronal circuits formed in the process of mutual connection strengthening between different (even distant) areas, as actions, objects or concepts are experienced in conjunction with the words used to describe them (Hebb 1949; Pulvermüller 1999). The referential meaning is an integral part of a word’s semantics (Frege 1980). Action words’ regular usage for referring to arm/hand actions (words such as pick, write) or leg/foot actions (kick, walk) is therefore an essential characteristic of their meaning (even though their semantics may not be exhausted by it). If lexical representations become manifest cortically as neuronal assemblies and the actions referred to by these words are somatotopically represented in motor areas of the brain (Penfield and Rasmussen 1950), the semantic links between neuronal sets in these cortical regions should realise the semantic relationship between the word forms and their actions (Pulvermüller 2001). Crucially, this leads to the specific prediction that action words with different reference domains in the body also activate the corresponding areas of motor cortex. This claim, which received strong support from conventional neuroimaging studies (Hauk et al. 2004; Hauk and Pulvermüller 2004), was put to test in MMN experiments using different methods (EEG, MEG) and different stimuli of two languages (English, Finnish). In addition to the usually observed superior temporal MMN sources, the activation elicited by the words referring to face and/or arm movements involved inferior fronto-central areas likely including the cortical representation of the upper body (Shtyrov et al. 2004; Pulvermüller et al. 2005). 
Furthermore, the leg-related words elicited a stronger superior central source compatible with the leg sensorimotor representation. This leg-word specific superior fronto-central activation was seen later (~170 ms) than the more lateral activation for face- and
arm-related words (~140 ms). These spatio-temporal characteristics suggest that MMN sources in perisylvian areas along with near-simultaneous activation distant from the Sylvian fissure can reflect access to word meaning in the cortex. The minimal delays between local activations may be mediated by conduction delays caused by the traveling of action potentials between cortical areas. These data suggested that processing of semantic features of action words is reflected in the MMN as early as 140–170 ms after acoustic signals allow for word identification. Similar to lexical access and selection, meaning access may therefore be an early brain process occurring within the first 200 ms, and the two processes may thus be near-simultaneous. Similar, if not more provocative, results were produced by MMN experiments on semantic context integration. In these, complete phrases or word combinations were presented to the subjects in a passive non-attend oddball design while the physical stimulus properties were strictly controlled for. Sometimes, the deviant combinations included a semantic mismatch between the words, similar to the established semantic violation paradigm known to produce an N400 response. Surprisingly, these contextual sentence-level violations modulated the MMN response rather early in its time course and, at the same time, elicited no N400-like response in the passive oddball paradigm. In one of the studies, this modulation was seen as early as ~115 ms after the words could be recognised as different and was shown to be mediated by the superior-temporal and inferior-frontal cortices in the left hemisphere (Shtyrov and Pulvermüller 2007). In another study (Menning et al. 2005), the semantic abnormalities were detected by MMN responses at 150–200 ms.
Crucially, these separate proofs of MMN reflections of semantic context integration were obtained in different laboratories which utilised diverging techniques, stimulation protocols and even stimulus languages (Finnish, German). These MMN studies showed higher-order processes of semantic integration of spoken language as occurring not only outside the focus of attention, but also well before the N400 time range, within 200 ms. This strongly supports the notion of parallel access to different types of linguistic information. Additional support for this view was found in recent visual studies (Sereno et al. 2003; Penolazzi et al. 2007) indicating that semantic context integration affects visual brain responses at latencies under 200 ms. Even higher-level aspects of pragmatic processing, such as sentence incorporation in the discourse context or even world knowledge, which were shown to have effects in the N400 time range when attention is allocated to stimuli (van Berkum et al. 2003; Chwilla and Kolk 2005), have so far not been touched by the MMN research into early automaticity. It is worth noting however that there is already a discussion as to what extent linguistic MMNs are affected by the general language environment. For example, in one study (Peltola and Aaltonen 2005)
Figure 3. Early neurophysiological reflections of automatic semantic information processing: single-word category-specific semantic effects (adapted from Pulvermüller et al. 2005) and semantic context integration (adapted from Shtyrov and Pulvermüller 2007).
responses to the same stimuli in the oddball paradigm applied to bilingual subjects differed when they were specifically told which of the two languages (Finnish or English) was used in stimulation. However, in a different study (Winkler et al. 2003), where the experimental environment (interaction with experimenter, instructions, conversations prior to the study etc.) was switched between the two familiar languages (Hungarian vs. Finnish), this environmental change had no effect on the linguistic MMN responses. Additional research is therefore needed to explore such possible high-level interactions.
7.4.4
Syntactic processes
The domain of morpho-syntactic information processing in the brain was also addressed in a number of experiments. To control exactly for physical stimulus properties, we once again presented identical spoken stimuli in different contexts. In these experiments, the critical word occurred after a context word with which it matched in syntactic features or mismatched syntactically; this has been a standard approach used in neurophysiological studies on syntax (Osterhout 1997). We again used different methods (EEG and MEG) and languages (Finnish, German and English), enabling us to draw generalised conclusions which is
especially important in MMN studies when only a very limited set of stimuli can be used in each single study. The first experiment looked at the neurophysiological responses to Finnish pronoun-verb phrases ending in a verb suffix which did or did not agree with the pronoun in person and number; this was done in an orthogonal design in which physically identical stimuli marked a syntactically congruent or incongruent event (e.g. he brings vs. *we brings), thus fully controlling for physical stimulus features. The results showed an increase of the magnetic MMN to words in ungrammatical context compared with the MMNs to the same words in grammatical context. Similar results were also seen in English and German experiments (Pulvermüller and Shtyrov 2003; Pulvermüller and Assadollahi 2007). The latencies where grammaticality effects were found varied somewhat between studies but responses were generally present within 200 ms after the word recognition point, sometimes starting as early as 100 ms (Pulvermüller and Shtyrov 2003; Shtyrov et al. 2003). The cortical loci where the main sources of the syntactic MMN were localized also varied. MEG results indicated a distributed superior temporal main source with possibly some weak effects in inferior frontal cortex, whereas the EEG suggested the opposite, a pronounced grammaticality effect in the left inferior frontal cortex with possibly minor superior-temporal contribution. This divergence reflects the previous neuroimaging literature on the cortical basis of syntax, where this module is sometimes localized in frontal areas and sometimes in temporal lobes (e.g. Kaan and Swaab 2002; Bornkessel et al. 2005; Kircher et al. 2005).
It is likely therefore that different areas in perisylvian cortex contribute to grammatical and syntactic processing and their activity may be differentially reflected in EEG and MEG recordings due to the known specificity of these techniques to positioning and orientation of current sources in the brain. These findings of early syntactic processing outside the focus of attention were further supported by studies of other groups which used similar paradigms and found a specific MMN response peaking at 150–200 ms whenever the presented sentence stimulus contained a syntactic mismatch (Menning et al. 2005; Hasting et al. 2007). Remarkably, some of this work used a more elaborate stimulus setup, with longer sentences (unlike our two-word phrases; e.g. we *comes in Pulvermüller and Shtyrov 2003, vs. Die Frau düngt *den rosen im Mai in Menning et al. 2005), and found the syntactic mismatch effects nevertheless. One of these experiments was specifically designed to address the question of whether the syntactic MMN reflects grammaticality per se, or simply follows sequential string probability. To this end, strings with low and high sequential probability, also varying in their grammaticality, were used. The results clearly indicated that the increased amplitude of syntactic MMN is linked to
Figure 4. Early automatic syntax in the brain: syntactic MMN (adapted from Pulvermüller and Shtyrov 2003; and Shtyrov et al. 2003), and absence of attention effects on the earliest stages of syntactic processing (adapted from Pulvermüller and Shtyrov 2003; and Shtyrov et al. 2003).
grammaticality as such rather than the mere probability of item co-occurrence in the language (Pulvermüller and Assadollahi 2007). Further, to rule out the repetition confounds of the MMN design, a different study (Hasting and Kotz 2008) used a similar acoustically controlled approach without multiple repetitions, and found the early syntactic response independent of attention in the same time range. The early syntactic MMN resembles the ELAN component (Neville et al. 1991; Friederici et al. 1993) which has been suggested to index early syntactic structure building (Friederici 2002). Syntactic MMN results support this interpretation. Importantly, they also show that the early syntactic processing in the brain does not require focused attention on the language input. In this sense, early syntactic processing seems to be automatic. This appears true even for agreement violations, which are considered more demanding computationally and which did not elicit ELAN in the past. The late positivity (P600), which is abolished in the passive oddball task, may in turn reflect secondary controlled attempts at parsing a string after initial analysis has failed (Osterhout and Holcomb 1992; Friederici 2002).
7.4.5
Attentional control or automaticity?
One may argue that demonstrating language-related effects in a paradigm where subjects are instructed to attend to a video film or read a book while language stimuli are presented does not control strictly for attentional withdrawal. To draw conclusions on the automaticity of the processes investigated from such studies, subjects must strictly follow the instruction to try to ignore the speech stimuli, which may be difficult to verify. It would be more desirable to control for the level of attention in each subject throughout the experiment more closely. Therefore, we performed an experiment to further investigate the role of attention in language processing by comparing the classic MMN paradigm, with its moderate attentional withdrawal by a silent video film, with a distraction task where subjects had to continuously perform a non-speech acoustic detection task. Language (syntactic) stimuli were only played through the right ear while acoustic stimuli were delivered to the left ear. In a streaming condition, subjects had to press a button to a “deviant” acoustic stimulus in the left ear while, at the same time, the language stimuli were still played to the right ear. In the other condition, subjects were allowed to watch a video as usual, without further distraction, while the same stimuli, non-speech sounds and syntactic stimuli, were presented. Results showed a replication of the grammaticality effect, i.e. stronger MMNs to ungrammatical word strings than to grammatical ones. Up to a latency of ~150 ms, there was no difference between the tasks with different attentional withdrawal. Only later, we found significant interactions of the task and attention factors indicating that at these later stages the grammar processes revealed by the MMN were influenced by attention and task demand. We interpret this as strong evidence for the attention independence of the early part of the MMN and for the automaticity of early syntactic analysis (Pulvermüller et al. 2008).
Remarkably, we found an almost identical time course of language-attention interaction (early automaticity up to ~140–150 ms with attention-dependence at later stages) when investigating lexical contrasts (words vs. pseudowords, Garagnani et al. 2009; Shtyrov et al. 2010). These data resonate well with earlier suggestions of automatic early stages as opposed to attention-dependent late processes in e.g. syntactic analysis (Hahne and Friederici 1999; Gunter et al. 2000; Friederici 2002).
7.5
Discussion: Theories, challenges and directions
We have reviewed the premises for neurophysiological research into the earliest stages of language perception and their automaticity and how the MMN could be used to help elucidate them. Not only has the MMN been found sensitive to
these different types of linguistic information but it has also proved useful for disentangling the neural correlates of corresponding processes from those related to attention or stimulus-oriented tasks and strategies. These studies also allowed for the strictest possible control over the stimulus properties minimizing the possibility that the obtained effects are simply due to acoustically or psycholinguistically unbalanced stimulus sets. First of all, these studies shed new light on the time course of access to linguistic information. Throughout these experiments, different types of stimuli, including phonetic/phonological, lexical, semantic and syntactic contrasts, modulated evoked responses at latencies within 100–200 ms after the relevant acoustic information was present in the input (Figure 5). These data supported, and very strongly so, parallel or near-simultaneous access to different types of linguistic information commencing very rapidly in the brain shortly after the words could be uniquely identified. Such access was envisaged in some earlier psycholinguistic studies and has remained hotly debated until now. Secondly, in spatial domain, these results provided strong support for the existence of distributed neural circuits which may underlie the processing of the incoming linguistic information. For example, an interplay of activations between superior-temporal and inferior-frontal cortices was shown as occurring in the process of word perception (Pulvermüller et al. 2003; Shtyrov and Pulvermüller 2007). Depending on the exact referential semantics of the words presented, constellations of areas in temporal, inferior-frontal, fronto-central dorsal cortices were sparked by the stimuli, sometimes even spreading to the right hemisphere (Pulvermüller et al. 2004, 2005). 
Even more refined patterns of activation delays between areas were found when scrutinising MMN source dynamics for non-speech noise and different linguistic stimuli (Pulvermüller and Shtyrov 2009). To provide an account of early near-simultaneity and the critical latency differences between cortical area activations, a model is necessary that spells out language and conceptual processes in terms of mechanistic neuronal circuits and their activation. Mechanistic brain-based models of language have been available since the early 1990s (Damasio 1990; Mesulam 1990; Pulvermüller and Preissl 1991; Braitenberg and Schüz 1992). We highlight here an account that postulates lexico-semantic circuits with specific cortical distributions (Pulvermüller and Preissl 1991; Pulvermüller 1999, 2005), as we believe that such specificity is necessary for explaining fine-grained delays between near-simultaneous area activations. The model posits that strongly connected neuronal ensembles spread out over different sets of cortical areas are the basis of cognitive and language processing. The momentary explosion-like ignition of one of these networks accounts for near-simultaneity of area-specific activations and the conduction delays of
196 Yury Shtyrov and Friedemann Pulvermüller
Figure 5. Near-simultaneous early neural reflection of psycholinguistic information types in speech comprehension: summary. Phonological processing became manifest in a modulation of the MMN around 140 ms (Shtyrov et al. 2005) and lexicality was reflected by two sources in superior-temporal and inferior-frontal cortex, sparked, respectively, at 136 and 158 ms (Pulvermüller et al. 2003). Syntactic violations elicited a syntactic MMN at about the same time, with sources in inferior-frontal and superior-temporal cortex (Shtyrov et al. 2003; Menning et al. 2005). Semantic effects were seen at 140–170 ms when the same syllables were presented in words that indicated face/arm movements, arm or leg actions (Pulvermüller et al. 2005). These results are consistent with near-simultaneous early access to different types of psycholinguistic information. Critically, there were fine-grained time lag differences, especially in the semantic domain: Leg-related words (e.g., “kick”) activated the central-dorsal sensorimotor leg representation 30 ms later than inferior-frontal areas were sparked by face/arm-related words (“eat”, “pick”). This shows category-specificity in the temporal structure of semantic brain activation.
cortical fibers within the circuits explain fine-grained activation time lags in the millisecond range. Neuronal circuits processing spoken word forms comprise, at a minimum, neurons in superior-temporal cortex activated by phonetic features of a spoken word, neurons in inferior-frontal cortex programming and controlling articulatory movements, and additional neurons connecting the acoustic and articulatory populations. The strict simultaneity of acoustic, phonological and lexical processing indices is explained by this model, as neuronal populations in the same local structure, in superior-temporal cortex, are assumed to contribute to acoustic, phonological and lexical processes. Therefore, conduction times of the auditory input to these critical sites are roughly the same. The superior-temporal lobe indeed seems to contribute to all of these processes (Scott and Johnsrude 2003; Uppenkamp et al. 2006) and local activation differences between different sounds revealed by fMRI (as well as MEG; Pulvermüller and Shtyrov 2009) may be explained, in part, by their differential linkage to articulatory circuits. Importantly, the evidence for stronger cortical activation (Shtyrov and Pulvermüller 2002) and motor links (Buccino et al. 2001) of words compared with pseudowords supports the existence of perception-action circuits for spoken words. Further evidence that these lexical memory networks link superior-temporal (acoustic) circuits to inferior-frontal (speech motor planning) circuits comes from imaging work revealing coactivation of these areas in speech processing, as mentioned above. Moreover, close examination showed that the frontal areas involved are sparked 10–20 ms after superior-temporal activation, which is consistent with the conduction time of action potentials in the most common cortico-cortical fibers in human cortex (Pulvermüller et al. 2003; Shtyrov and Pulvermüller 2007).
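The relation between an inter-area activation lag and fiber conduction can be illustrated with a back-of-envelope calculation: the delay is simply the path length divided by the conduction velocity. The sketch below is illustrative only. The path length (10 cm) and the conduction velocities (5 and 10 m/s) are assumed round figures chosen for the example, not values reported in the studies cited above, and the function name is hypothetical.

```python
def conduction_delay_ms(path_length_cm: float, velocity_m_per_s: float) -> float:
    """Travel time (in ms) of an action potential along a fiber.

    Millimetres divided by metres per second gives milliseconds,
    so the path length is first converted from cm to mm.
    """
    if path_length_cm < 0 or velocity_m_per_s <= 0:
        raise ValueError("path length must be >= 0 and velocity must be > 0")
    path_length_mm = path_length_cm * 10.0
    return path_length_mm / velocity_m_per_s


# ASSUMPTIONS for illustration only (not taken from the reviewed studies):
ASSUMED_PATH_CM = 10.0                 # rough temporal-to-frontal fiber length
ASSUMED_VELOCITIES_M_S = (5.0, 10.0)   # rough range for thin cortico-cortical fibers

if __name__ == "__main__":
    for velocity in ASSUMED_VELOCITIES_M_S:
        delay = conduction_delay_ms(ASSUMED_PATH_CM, velocity)
        print(f"{ASSUMED_PATH_CM:.0f} cm at {velocity:.0f} m/s -> {delay:.0f} ms")
```

Under these assumed values the computed delays fall in the same 10–20 ms range as the observed lag between superior-temporal and inferior-frontal activation; other plausible lengths or velocities would shift the estimate accordingly.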
The MMN studies of syntactic processing may, in turn, contribute to theorising about the neural implementation of grammar. Differences in brain responses between grammatical and ungrammatical strings are usually interpreted as evidence that some grammar-related process is activated by grammatical deviance. The MMN studies provided an alternative view. Because physically identical items were used in grammatical or erroneous contexts, and sometimes outside of any context, we could detail the response dynamics more precisely. Compared with the control conditions where words were presented out of linguistic context (Pulvermüller and Shtyrov 2003), the grammatical violation did not significantly increase the early brain response; instead, the grammatically correct string modulated early word-evoked activity by reducing it. The most feasible explanation for this is that neural representations of frequently co-occurring items (e.g. a pronoun and a corresponding verb affix, as in “he brings”) become linked via, for example, a sequence detector (Knoblauch and Pulvermüller 2005). Therefore,
presentation of one such item leads not only to the activation of its own representation in the brain, but also to preactivation of the related representation. This is equivalent to priming, which is known to reduce the size of neurobiological responses. In syntactic context, word/morpheme representations can prime each other, possibly through a sequence detector linking them together, so that the brain response to the critical word would be attenuated by grammatical context. This process of syntactic priming at the neuronal level (Pulvermüller 2002) would be absent for words in both ungrammatical and non-linguistic contexts. The syntactic MMN-related effects are thus consistent with the sequence detector model for syntax, but are not easily explained as the sign of an “error signal” produced by the brain-internal grammar. The work reviewed here has been instrumental in promoting the view of early linguistic activity as independent of attention and therefore automatic. Discrete memory circuits for existing words appear to be fully activated irrespective of attention level when the respective words are heard. Strong mutual connections between the subparts of such memory networks may be the reason why the entire circuit becomes quickly activated to its full capacity even when little or no attention is allocated to language. Whereas modulation of these connections by attention is still possible, it does not appear relevant for the early network activity, which may thus indeed be automatic and even encapsulated from other cognitive processes. In computational terms, attention effects may be explained in terms of the strength of cortical feedback connections: whereas low attention resources may be realised as strong feedback inhibition, reduction in this inhibition leads to greater availability of attentional resources (Garagnani et al. 2008).
If such a global adjustment of cortical activity levels, a regulation system that can be a basis for attentional resources, exists, a clear prediction can be made. As attention is allocated to or withdrawn from the linguistic input, stimuli without existing representations would partially activate either a larger range of, or only a few, lexical representations belonging to the same cohort (Marslen-Wilson 1987). At the same time, early activation processes for existing words can be expected to be relatively immune to attentional modulation, as words would automatically activate their existing discrete memory circuits irrespective of attention variation (Garagnani et al. 2008). Thus, the advantage of words over pseudowords may only be observed when the stimuli are not attended (Shtyrov et al. 2010); when attention is required this effect may be cancelled or even reversed. The latter is certainly the case in N400 experiments, where an N400 advantage for pseudowords over words is typically found (Friedrich et al. 2006). An experimental test of this suggestion in the context of a single experiment could provide a further validation of the proposed model (Garagnani et al. 2009).
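The logic of this prediction can be illustrated with a deliberately minimal toy simulation. It is not the neuroanatomically grounded model of Garagnani et al. (2008); it is a single-unit sketch under stated assumptions. A circuit's activity is driven by a constant stimulus and by its own recurrent excitation, reduced by a global feedback inhibition term that stands in for attention (strong inhibition for low attention, weak inhibition for high attention). All names and numerical values below are hypothetical and chosen only to show the qualitative pattern.

```python
def settle(recurrent_gain: float, feedback_inhibition: float,
           stimulus: float = 0.2, steps: int = 50) -> float:
    """Iterate a one-unit circuit and return its final activity in [0, 1].

    Each step: activity <- clip(stimulus + (gain - inhibition) * activity).
    A net gain above 1 makes the circuit "ignite" up to the ceiling;
    a net gain below 1 yields only a partial, graded activation.
    """
    activity = 0.0
    net_gain = recurrent_gain - feedback_inhibition
    for _ in range(steps):
        activity = min(1.0, max(0.0, stimulus + net_gain * activity))
    return activity


# ASSUMED illustrative parameters (hypothetical, not fitted to any data):
WORD_GAIN = 1.6          # strongly connected circuit for an existing word
PSEUDOWORD_GAIN = 0.6    # weak, partial activation of cohort neighbours
INHIBITION_UNATTENDED = 0.5   # low attention = strong feedback inhibition
INHIBITION_ATTENDED = 0.1     # high attention = weak feedback inhibition

if __name__ == "__main__":
    for label, inhibition in (("unattended", INHIBITION_UNATTENDED),
                              ("attended", INHIBITION_ATTENDED)):
        word = settle(WORD_GAIN, inhibition)
        pseudoword = settle(PSEUDOWORD_GAIN, inhibition)
        print(f"{label}: word={word:.2f} pseudoword={pseudoword:.2f} "
              f"advantage={word - pseudoword:.2f}")
```

With these assumed values the word circuit reaches ceiling under both inhibition settings, whereas the pseudoword response grows when inhibition is reduced, so the word advantage is largest when stimuli are unattended. The toy does not reproduce a reversal of the effect; it only shows why strongly connected circuits can be insensitive to a global gain change that modulates weakly connected ones.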
MMN experiments are, however, still frequently criticised over the issue of attention control. Traditional passive MMN paradigms, in which the subjects’ primary task is to watch a video, play a computer game or read a book, indeed cannot guarantee that the subject does not sometimes ‘sneak a little listen’ to the experimental stimuli (Carlyon 2004). One of the possible directions for future research, therefore, is to further explore more stringent protocols, including distracters in the same modality, in order to be able to fully separate attention-independent, and thereby possibly automatic, linguistic processes from those that require the individual’s active participation. As reviewed above, we have been successful in doing this to validate the syntactic MMN as being attention-independent at its earliest stages. Although this, combined with other results, strongly suggests that the earliest linguistic access is automatic, this approach needs to be extended to different types of linguistic information processing (cf. Garagnani et al. 2009; Shtyrov et al. 2010). The reviewed data by no means falsify the earlier studies which delineated the later neurophysiological indices of language processing. Instead, they add a new dimension to neurolinguistic research, suggesting that the later processes reflected in, e.g., the N400, P600 or syntactic negativities (LAN) may build upon the early automatic ones and are possibly related to secondary mental processes that can be triggered by the earlier access to linguistic memory representations, but depend on attention and task-related strategies. As many psycholinguistic and conceptual processes seem to be brain-reflected both early and late, it is important to clarify the relation between the two time ranges. Are the early semantic, syntactic and lexical effects just the beginning of late effects? Their differential dynamics speak against this possibility.
Would the late processes just repeat the early ones, or occur only if the early ones are unsuccessful? Here, our own data indicate that the early near-simultaneous processes exhibit surprising specificity to information types, both topographically and in terms of cortical generators, whereas the late ones, for example the N400, seem equally modulated by different linguistic features (including word frequency, lexicality, and semantic properties; Hauk et al. 2006). Still, are the late components reflections of prolonged specifically linguistic processes, in-depth or re-processing, or would they rather reflect post-comprehension processes (Glenberg and Kaschak 2002) following completed psycholinguistic information access and context integration? And how fixed are the lags between different activations? It has recently been shown that stimulus context can modulate the time lag of brain responses indexing word and object processing (Barber and Kutas 2007). Such context dependence and flexibility is of greatest relevance in the study of cognitive processes and points the way to fruitful future research. The relationship between specific activation times of defined brain areas on the
one hand and specific cognitive processes on the other is one of the most exciting topics in cognitive neuroscience at present. Addressing this issue using MEG/EEG and source localization has only become possible very recently. If successful, the newly available methods will make it possible to read the activation signatures of cortical circuits processing language and concepts in the human brain in more and more detail, leading the science of language to a new stage.
Acknowledgements Supported by the Medical Research Council, UK (U.1055.04.014.00001.01, U.1055.04.003.00001.01).
References Alho, K. (1992). Selective attention in auditory processing as reflected by event-related brain potentials. Psychophysiology, 29, 247–263. Alho, K. (1995). Cerebral generators of mismatch negativity (MMN) and its magnetic counterpart (MMNm) elicited by sound changes. Ear and Hearing, 16, 38–51. Barber, H. A. and Kutas, M. (2007). Interplay between computational models and cognitive electrophysiology in visual word recognition. Brain Research Reviews, 53, 98–123. Bentin, S., Mouchetant-Rostaing, Y., Giard, M. H., Echallier, J. F. and Pernier, J. (1999). ERP manifestations of processing printed words at different psycholinguistic levels: Time course and scalp distribution. Journal of Cognitive Neuroscience, 11, 235–260. Bonte, M. L., Mitterer, H., Zellagui, N., Poelmans, H. and Blomert, L. (2005). Auditory cortical tuning to statistical regularities in phonology. Clinical Neurophysiology, 116, 2765–2774. Bonte, M. L., Poelmans, H. and Blomert, L. (2007). Deviant neurophysiological responses to phonological regularities in speech in dyslexic children. Neuropsychologia, 45, 1427–1437. Bornkessel, I., Zysset, S., Friederici, A. D., von Cramon, D. Y. and Schlesewsky, M. (2005). Who did what to whom? The neural basis of argument hierarchies during language comprehension. Neuroimage, 26, 221–233. Braitenberg, V. and Schüz, A. (1992). Basic features of cortical connectivity and some considerations on language. In J. Wind, B. Chiarelli, B. H. Bichakjian, A. Nocentini and A. Jonker (Eds.), Language origin: A multidisciplinary approach (pp. 89–102). Dordrecht: Kluwer. Buccino, G., Binkofski, F., Fink, G. R., Fadiga, L., Fogassi, L., Gallese, V., Seitz, R. J., Zilles, K., Rizzolatti, G. and Freund, H. J. (2001). Action observation activates premotor and parietal areas in a somatotopic manner: An fMRI study. European Journal of Neuroscience, 13, 400–404. Carlyon, R. P. (2004). How the brain separates sounds. Trends in Cognitive Sciences, 8, 465–471.
Cheour, M., Alho, K., Ceponiene, R., Reinikainen, K., Sainio, K., Pohjavuori, M., Aaltonen, O. and Näätänen, R. (1998). Maturation of mismatch negativity in infants. International Journal of Psychophysiology, 29, 217–226.
Chwilla, D. J. and Kolk, H. H. (2005). Accessing world knowledge: Evidence from N400 and reaction time priming. Brain Research – Cognitive Brain Research, 25, 589–606. Colin, C., Radeau, M., Soquet, A. and Deltenre, P. (2004). Generalization of the generation of an MMN by illusory McGurk percepts: Voiceless consonants. Clinical Neurophysiology, 115, 1989–2000. Colin, C., Radeau, M., Soquet, A., Demolin, D., Colin, F. and Deltenre, P. (2002). Mismatch negativity evoked by the McGurk-MacDonald effect: A phonetic representation within short-term memory. Clinical Neurophysiology, 113, 495–506. Damasio, A. R. (1990). Category-related recognition defects as a clue to the neural substrates of knowledge. Trends in Neurosciences, 13, 95–98. Dehaene-Lambertz, G. (1997). Electrophysiological correlates of categorical phoneme perception in adults. Neuroreport, 8, 919–924. Dehaene-Lambertz, G., Dupoux, E. and Gout, A. (2000). Electrophysiological correlates of phonological processing: A cross-linguistic study. Journal of Cognitive Neuroscience, 12, 635–647. Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93, 283–321. Diesch, E., Biermann, S. and Luce, T. (1998). The magnetic mismatch field elicited by words and phonological non-words. Neuroreport, 9, 455–460. Embick, D., Hackl, M., Schaeffer, J., Kelepir, M. and Marantz, A. (2001). A magnetoencephalographic component whose latency reflects lexical frequency. Brain Research – Cognitive Brain Research, 10, 345–348. Endrass, T., Mohr, B. and Pulvermüller, F. (2004). Enhanced mismatch negativity brain response after binaural word presentation. European Journal of Neuroscience, 19, 1653–1660. Escera, C., Alho, K., Winkler, I. and Näätänen, R. (1998). Neural mechanisms of involuntary attention to acoustic novelty and change. Journal of Cognitive Neuroscience, 10, 590–604. Fodor, J. A. (1983). The modularity of mind. Cambridge, MA: MIT Press. Frege, G. (1980).
Über Sinn und Bedeutung (first published in 1892). In G. Patzig (Ed.), Funktion, Begriff, Bedeutung (pp. 25–50). Göttingen: Huber. Friederici, A. (2002). Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences, 6, 78–84. Friederici, A. D., Pfeifer, E. and Hahne, A. (1993). Event-related brain potentials during natural speech processing: Effects of semantic, morphological and syntactic violations. Cognitive Brain Research, 1, 183–192. Friederici, A. D., Wang, Y., Herrmann, C. S., Maess, B. and Oertel, U. (2000). Localization of early syntactic processes in frontal and temporal cortical areas: A magnetoencephalographic study. Human Brain Mapping, 11, 1–11. Friedrich, C. K., Eulitz, C. and Lahiri, A. (2006). Not every pseudoword disrupts word recognition: An ERP study. Behavior and Brain Functions, 2, 36. Fromkin, V. A. (1973). Introduction. In V. A. Fromkin (Ed.), Speech errors as linguistic evidence (pp. 11–45). The Hague: Mouton. Garagnani, M., Shtyrov, Y. and Pulvermüller, F. (2009). Effects of attention on what is known and what is not: MEG evidence for functionally discrete memory circuits. Frontiers in Human Neuroscience, 3, Article 10. Garagnani, M., Wennekers, T. and Pulvermuller, F. (2008). A neuroanatomically grounded Hebbian-learning model of attention-language interactions in the human brain. European Journal of Neuroscience, 27, 492–513.
Garrett, M. (1980). Levels of processing in sentence production. In B. Butterworth (Ed.), Language Production I (pp. 177–220). London: Academic Press. Garrett, M. (1984). The organization of processing structures for language production. In D. Caplan, A. R. Lecours and A. Smith (Eds.), Biological perspectives on language (pp. 172–193). Cambridge, MA: MIT Press. Garrod, S. and Pickering, M. J. (2004). Why is conversation so easy? Trends in Cognitive Sciences, 8, 8–11. Glenberg, A. M. and Kaschak, M. P. (2002). Grounding language in action. Psychonomic Bulletin and Review, 9, 558–565. Gunter, T. C., Friederici, A. D. and Schriefers, H. (2000). Syntactic gender and semantic expectancy: ERPs reveal early autonomy and late interaction. Journal of Cognitive Neuroscience, 12, 556–568. Hahne, A. and Friederici, A. D. (1999). Electrophysiological evidence for two steps in syntactic analysis. Early automatic and late controlled processes. Journal of Cognitive Neuroscience, 11, 194–205. Hasting, A. S. and Kotz, S. A. (2008). Speeding up syntax: On the relative timing and automaticity of local phrase structure and morphosyntactic processing as reflected in event-related brain potentials. Journal of Cognitive Neuroscience, 20, 1207–1219. Hasting, A. S., Kotz, S. A. and Friederici, A. D. (2007). Setting the stage for automatic syntax processing: The mismatch negativity as an indicator of syntactic priming. Journal of Cognitive Neuroscience, 19, 386–400. Hasting, A. S., Winkler, I. and Kotz, S. A. (2008). Early differential processing of verbs and nouns in the human brain as indexed by event-related brain potentials. European Journal of Neuroscience, 27, 1561–1565. Hauk, O., Davis, M. H., Ford, M., Pulvermuller, F. and Marslen-Wilson, W. D. (2006). The time course of visual word recognition as revealed by linear regression analysis of ERP data. Neuroimage, 30, 1383–1400. Hauk, O., Johnsrude, I. and Pulvermüller, F. (2004).
Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41, 301–307. Hauk, O. and Pulvermüller, F. (2004). Neurophysiological distinction of action words in the fronto-central cortex. Human Brain Mapping, 21, 191–201. Hebb, D. O. (1949). The organization of behavior. A neuropsychological theory. New York: John Wiley. Honbolygo, F., Csepe, V. and Rago, A. (2004). Suprasegmental speech cues are automatically processed by the human brain: A mismatch negativity study. Neuroscience Letters, 363, 84–88. Jacobsen, T., Horvath, J., Schroger, E., Lattner, S., Widmann, A. and Winkler, I. (2004). Pre-attentive auditory processing of lexicality. Brain and Language, 88, 54–67. Kaan, E. and Swaab, T. Y. (2002). The brain circuitry of syntactic comprehension. Trends in Cognitive Sciences, 6, 350–356. Kircher, T. T., Oh, T. M., Brammer, M. J. and McGuire, P. K. (2005). Neural correlates of syntax production in schizophrenia. British Journal of Psychiatry, 186, 209–214. Knoblauch, A. and Pulvermüller, F. (2005). Sequence detector networks and associative learning of grammatical categories. In S. Wermter, G. Palm and M. Elshaw (Eds.), Biomimetic neural learning for intelligent robots. Heidelberg: Springer.
Korpilahti, P., Krause, C. M., Holopainen, I. and Lang, A. H. (2001). Early and late mismatch negativity elicited by words and speech-like stimuli in children. Brain and Language, 76, 332–339. Korth, M. and Nguyen, N. X. (1997). The effect of stimulus size on human cortical potentials evoked by chromatic patterns. Vision Research, 37, 649–657. Krumbholz, K., Patterson, R. D., Seither-Preisler, A., Lammertmann, C. and Lutkenhoner, B. (2003). Neuromagnetic evidence for a pitch processing center in Heschl’s gyrus. Cerebral Cortex, 13, 765–772. Kujala, A., Alho, K., Valle, S., Sivonen, P., Ilmoniemi, R. J., Alku, P. and Näätänen, R. (2002). Context modulates processing of speech sounds in the right auditory cortex of human subjects. Neuroscience Letters, 331, 91–94. Kutas, M. and Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203–205. Levelt, W. J. M., Roelofs, A. and Meyer, A. S. (1999). A theory of lexical access in speech production. Behavioral and Brain Sciences, 22, 1–75. Lutkenhoner, B., Krumbholz, K., Lammertmann, C., Seither-Preisler, A., Steinstrater, O. and Patterson, R. D. (2003). Localization of primary auditory cortex in humans by magnetoencephalography. Neuroimage, 18, 58–66. MacKay, D. G. (1987). The organization of perception and action. A theory of language and other cognitive skills. New York: Springer. Marslen-Wilson, W. (1973). Linguistic structure and speech shadowing at very short latencies. Nature, 244, 522–523. Marslen-Wilson, W. and Tyler, L. K. (1975). Processing structure of sentence perception. Nature, 257, 784–786. Marslen-Wilson, W. D. (1985). Speech shadowing and speech comprehension. Speech and Communication, 4, 55–74. Marslen-Wilson, W. D. (1987). Functional parallelism in spoken word-recognition. Cognition, 25, 71–102. Menning, H., Zwitserlood, P., Schoning, S., Hihn, H., Bolte, J., Dobel, C., Mathiak, K. and Lutkenhoner, B. (2005). 
Pre-attentive detection of syntactic and semantic errors. Neuroreport, 16, 77–80. Mesulam, M. M. (1990). Large-scale neurocognitive networks and distributed processing for attention, language, and memory. Annals of Neurology, 28, 597–613. Mohr, B. and Pulvermüller, F. (2002). Redundancy gains and costs in cognitive processing: Effects of short stimulus onset asynchronies. Journal of Experimental Psychology: Learning, Memory and Cognition, 28, 1200–1223. Morton, J. (1969). The interaction of information in word recognition. Psychological Review, 76, 165–178. Moss, H. E., McCormick, S. F. and Tyler, L. (1997). The time course of activation of semantic information during spoken word recognition. Language and Cognitive Processes, 12, 695–731. Münte, T. F., Schiltz, K. and Kutas, M. (1998). When temporal terms belie conceptual order. Nature, 395, 71–73. Näätänen, R. (1995). The mismatch negativity: A powerful tool for cognitive neuroscience. Ear and Hearing, 16, 6–18. Näätänen, R. (2000). Mismatch negativity (MMN): Perspectives for application. International Journal of Psychophysiology, 37, 3–10.
Näätänen, R. (2001). The perception of speech sounds by the human brain as reflected by the mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology, 38, 1–21. Näätänen, R. and Alho, K. (1995). Mismatch negativity – a unique measure of sensory processing in audition. International Journal of Neuroscience, 80, 317–337. Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., Vainio, M., Alku, P., Ilmoniemi, R. J., Luuk, A., Allik, J., Sinkkonen, J. and Alho, K. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature, 385, 432–434. Näätänen, R. and Picton, T. (1987). The N1 wave of the human electric and magnetic response to sound: A review and an analysis of the component structure. Psychophysiology, 24, 375–425. Näätänen, R., Tervaniemi, M., Sussman, E., Paavilainen, P. and Winkler, I. (2001). ‘Primitive intelligence’ in the auditory cortex. Trends in Neurosciences, 24, 283–288. Neville, H., Nicol, J. L., Barss, A., Forster, K. I. and Garrett, M. F. (1991). Syntactically based sentence processing classes: Evidence from event-related brain potentials. Journal of Cognitive Neuroscience, 3, 151–165. Norris, D., McQueen, J. M. and Cutler, A. (2000). Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences, 23, 299–370. Obleser, J., Lahiri, A. and Eulitz, C. (2004). Magnetic brain response mirrors extraction of phonological features from spoken vowels. Journal of Cognitive Neuroscience, 16, 31–39. Osterhout, L. (1997). On the brain response to syntactic anomalies: Manipulations of word position and word class reveal individual differences. Brain and Language, 59, 494–522. Osterhout, L. and Hagoort, P. (1999). A superficial resemblance does not necessarily mean you are part of the family: Counterarguments to Coulson, King and Kutas (1998) in the P600/SPS-P300 debate. Language and Cognitive Processes, 14, 1–14. Osterhout, L.
and Holcomb, P. J. (1992). Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language, 31, 785–806. Osterhout, L., McLaughlin, J. and Bersick, M. (1997). Event-related brain potentials and human language. Trends in Cognitive Sciences, 1, 203–209. Osterhout, L. and Swinney, D. A. (1993). On the temporal course of gap-filling during comprehension of verbal passives. Journal of Psycholinguistic Research, 22, 273–286. Paavilainen, P., Alho, K., Reinikainen, K., Sams, M. and Näätänen, R. (1991). Right hemisphere dominance of different mismatch negativities. Electroencephalography and Clinical Neurophysiology, 78, 466–479. Peltola, M. S. and Aaltonen, O. (2005). Long-term memory trace activation for vowels depends on the mother tongue and the linguistic context. Journal of Psychophysiology, 19, 159–164. Penfield, W. and Rasmussen, T. (1950). The cerebral cortex of man. New York: Macmillan. Penolazzi, B., Hauk, O. and Pulvermüller, F. (2007). Early semantic context integration and lexical access as revealed by event-related brain potentials. Biological Psychology, 74, 374–388. Pettigrew, C. M., Murdoch, B. E., Ponton, C. W., Finnigan, S., Alku, P., Kei, J., Sockalingam, R. and Chenery, H. J. (2004). Automatic auditory processing of English words as indexed by the mismatch negativity, using a multiple deviant paradigm. Ear and Hearing, 25, 284–301.
Pickering, M. J. and Garrod, S. (2004). Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences, 27, 169–190. Picton, T. and Hillyard, S. (1974). Human auditory evoked potentials: II. Effects of attention. Electroencephalography and Clinical Neurophysiology, 36, 191–200. Picton, T. W., Alain, C., Otten, L., Ritter, W. and Achim, A. (2000). Mismatch negativity: Different water in the same river. Audiology and Neurootology, 5, 111–139. Poeppel, D., Yellin, E., Phillips, C., Roberts, T. P., Rowley, H. A., Wexler, K. and Marantz, A. (1996). Task-induced asymmetry of the auditory evoked M100 neuromagnetic field elicited by speech sounds. Brain Research – Cognitive Brain Research, 4, 231–242. Pulvermüller, F. (1999). Words in the brain’s language. Behavioral and Brain Sciences, 22, 253–336. Pulvermüller, F. (2001). Brain reflections of words and their meaning. Trends in Cognitive Sciences, 5, 517–524. Pulvermüller, F. (2002). A brain perspective on language mechanisms: From discrete neuronal ensembles to serial order. Progress in Neurobiology, 67, 85–111. Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nature Reviews Neuroscience, 6, 576–582. Pulvermüller, F. and Assadollahi, R. (2007). Grammar or serial order?: Discrete combinatorial brain mechanisms reflected by the syntactic mismatch negativity. Journal of Cognitive Neuroscience, 19, 971–980. Pulvermüller, F., Kujala, T., Shtyrov, Y., Simola, J., Tiitinen, H., Alku, P., Alho, K., Martinkauppi, S., Ilmoniemi, R. J. and Näätänen, R. (2001). Memory traces for words as revealed by the mismatch negativity. Neuroimage, 14, 607–616. Pulvermüller, F. and Preissl, H. (1991). A cell assembly model of language. Network: Computation in Neural Systems, 2, 455–468. Pulvermüller, F. and Shtyrov, Y. (2003). Automatic processing of grammar in the human brain as revealed by the mismatch negativity. Neuroimage, 20, 159–172. Pulvermüller, F. and Shtyrov, Y. (2006).
Language outside the focus of attention: The mismatch negativity as a tool for studying higher cognitive processes. Progress in Neurobiology, 79, 49–71. Pulvermüller, F. and Shtyrov, Y. (2009). Spatiotemporal signatures of large-scale synfire chains for speech processing as revealed by MEG. Cerebral Cortex, 19, 79–88. Pulvermüller, F., Shtyrov, Y., Hasting, A. S. and Carlyon, R. P. (2008). Syntax as a reflex: Neurophysiological evidence for early automaticity of grammatical processing. Brain and Language, 104, 244–253. Pulvermüller, F., Shtyrov, Y. and Ilmoniemi, R. (2003). Spatiotemporal dynamics of neural language processing: An MEG study using minimum-norm current estimates. Neuroimage, 20, 1020–1025. Pulvermüller, F., Shtyrov, Y. and Ilmoniemi, R. (2005). Brain signatures of meaning access in action word recognition. Journal of Cognitive Neuroscience, 17, 884–892. Pulvermüller, F., Shtyrov, Y., Ilmoniemi, R. J. and Marslen-Wilson, W. D. (2006). Tracking speech comprehension in space and time. Neuroimage, 31, 1297–1305. Pulvermüller, F., Shtyrov, Y., Kujala, T. and Näätänen, R. (2004). Word-specific cortical activity as revealed by the mismatch negativity. Psychophysiology, 41, 106–112. Pylkkanen, L., Stringfellow, A. and Marantz, A. (2002). Neuromagnetic evidence for the timing of lexical activation: An MEG component sensitive to phonotactic probability but not to neighborhood density. Brain and Language, 81, 666–678.
Rastle, K., Davis, M. H., Marslen-Wilson, W. D. and Tyler, L. K. (2000). Morphological and semantic effects in visual word recognition: A time-course study. Language and Cognitive Processes, 15, 507–537. Schröger, E. (1996). On detection of auditory deviations: A pre-attentive activation model. Psychophysiology, 34, 245–257. Scott, S. K. and Johnsrude, I. S. (2003). The neuroanatomical and functional organization of speech perception. Trends in Neurosciences, 26, 100–107. Sereno, S. C., Brewer, C. C. and O’Donnell, P. J. (2003). Context effects in word recognition: Evidence for early interactive processing. Psychological Science, 14, 328–333. Sereno, S. C. and Rayner, K. (2003). Measuring word recognition in reading: Eye movements and event-related potentials. Trends in Cognitive Sciences, 7, 489–493. Shtyrov, Y., Hauk, O. and Pulvermüller, F. (2004). Distributed neuronal networks for encoding category-specific semantic information: The mismatch negativity to action words. European Journal of Neuroscience, 19, 1083–1092. Shtyrov, Y., Kujala, T., Ahveninen, J., Tervaniemi, M., Alku, P., Ilmoniemi, R. J. and Näätänen, R. (1998). Background acoustic noise and the hemispheric lateralization of speech processing in the human brain: Magnetic mismatch negativity study. Neuroscience Letters, 251, 141–144. Shtyrov, Y., Kujala, T., Palva, S., Ilmoniemi, R. J. and Näätänen, R. (2000). Discrimination of speech and of complex nonspeech sounds of different temporal structure in the left and right cerebral hemispheres. Neuroimage, 12, 657–663. Shtyrov, Y., Kujala, T. and Pulvermüller, F. (2010). Interactions between language and attention systems: Early automatic lexical processing? Journal of Cognitive Neuroscience, 22, in press. Shtyrov, Y., Osswald, K. and Pulvermüller, F. (2008). Memory traces for spoken words in the brain as revealed by the hemodynamic correlate of the mismatch negativity. Cerebral Cortex, 18, 29–37. Shtyrov, Y., Pihko, E. and Pulvermüller, F. (2005).
Determinants of dominance: Is language laterality explained by physical or linguistic features of speech? Neuroimage, 27, 37–47. Shtyrov, Y. and Pulvermüller, F. (2002). Neurophysiological evidence of memory traces for words in the human brain. Neuroreport, 13, 521–525. Shtyrov, Y. and Pulvermüller, F. (2007). Early MEG activation dynamics in the left temporal and inferior frontal cortex reflect semantic context integration. Journal of Cognitive Neuroscience, 19, 1633–1642. Shtyrov, Y., Pulvermüller, F., Näätänen, R. and Ilmoniemi, R. J. (2003). Grammar processing outside the focus of attention: An MEG study. Journal of Cognitive Neuroscience, 15, 1195– 1206. Sittiprapaporn, W., Chindaduangratn, C., Tervaniemi, M. and Khotchabhakdi, N. (2003). Preattentive processing of lexical tone perception by the human brain as indexed by the mismatch negativity paradigm. Annals of the New York Academy of Sciences, 999, 199–203. Stockall, L., Stringfellow, A. and Marantz, A. (2004). The precise time course of lexical activation: MEG measurements of the effects of frequency, probability, and density in lexical decision. Brain and Language, 90, 88–94. Tiitinen, H., May, P. and Näätänen, R. (1997). The transient 40-Hz response, mismatch negativity, and attentional processes in humans. Progress in Neuropsychopharmacology and Biological Psychiatry, 21, 751–771.
Tiitinen, H., May, P., Reinikainen, K. and Näätänen, R. (1994). Attentive novelty detection in humans is governed by pre-attentive sensory memory. Nature, 372, 90–92.
Tyler, L., Moss, H. E., Galpin, A. and Voice, J. K. (2002). Activating meaning in time: The role of imageability and form-class. Language and Cognitive Processes, 17, 471–502.
Uppenkamp, S., Johnsrude, I. S., Norris, D., Marslen-Wilson, W. and Patterson, R. D. (2006). Locating the initial stages of speech-sound processing in human temporal cortex. Neuroimage, 31, 1284–1296.
van Berkum, J. J., Zwitserlood, P., Hagoort, P. and Brown, C. M. (2003). When and how do listeners relate a sentence to the wider discourse? Evidence from the N400 effect. Brain Research – Cognitive Brain Research, 17, 701–718.
Weber, C., Hahne, A., Friedrich, M. and Friederici, A. D. (2004). Discrimination of word stress in early infant perception: Electrophysiological evidence. Brain Research – Cognitive Brain Research, 18, 149–161.
Winkler, I., Kujala, T., Alku, P. and Näätänen, R. (2003). Language context and phonetic change detection. Brain Research – Cognitive Brain Research, 17, 833–844.
Winkler, I., Kujala, T., Tiitinen, H., Sivonen, P., Alku, P., Lehtokoski, A., Czigler, I., Csepe, V., Ilmoniemi, R. J. and Näätänen, R. (1999). Brain responses reveal the learning of foreign language phonemes. Psychophysiology, 36, 638–642.
Winkler, I., Reinikainen, K. and Näätänen, R. (1993). Event-related brain potentials reflect traces of echoic memory in humans. Perception and Psychophysics, 53, 443–449.
Woods, D. L., Alho, K. and Algazi, A. (1993a). Intermodal selective attention: Evidence for processing in tonotopic auditory fields. Psychophysiology, 30, 287–295.
Woods, D. L., Knight, R. T. and Scabini, D. (1993b). Anatomical substrates of auditory selective attention: Behavioral and electrophysiological effects of posterior association cortex lesions. Brain Research – Cognitive Brain Research, 1, 227–240.
Yamasaki, H., LaBar, K. S. and McCarthy, G. (2002). Dissociable prefrontal brain systems for attention and emotion. Proceedings of the National Academy of Sciences USA, 99, 11447–11451.
Yantis, S., Schwarzbach, J., Serences, J. T., Carlson, R. L., Steinmetz, M. A., Pekar, J. J. and Courtney, S. M. (2002). Transient neural activity in human parietal cortex during spatial attention shifts. Nature Neuroscience, 5, 995–1002.
Zwitserlood, P. (1989). The locus of the effects of sentential-semantic context in spoken-word processing. Cognition, 32, 25–64.
chapter 8
Unconscious memory representations underlying music-syntactic processing and processing of auditory oddballs*
Stefan Koelsch
University of Sussex, Brighton, UK / Max-Planck-Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
8.1 Introduction
In 1992, a study from Saarinen and colleagues changed the concept of the mismatch negativity (MMN) dramatically (Saarinen et al. 1992). Whereas previous studies had investigated the MMN only with physical deviants (such as frequency, intensity, or timbre deviants), Saarinen et al. showed that a brain response reminiscent of the MMN can be elicited by changes of abstract auditory features (in that study, standard stimuli were tone pairs with frequency levels that varied across a wide range, but were always rising in pitch, whereas deviants were tone pairs falling in pitch). By introducing the concept of an “abstract feature MMN” (henceforth referred to as afMMN), Saarinen et al. implicitly changed the previous concept of the MMN as a response to a physical deviance within a repetitive auditory environment (henceforth referred to as phMMN) to a concept of the MMN as a negative event-related potential (ERP) response to mismatches in general, i.e. to mismatches that do not necessarily have to be physical in nature (for other studies reporting abstract feature MMNs see, e.g., Paavilainen et al. 1998, 2001, 2003, 2007; Korzyukov et al. 2003; Schröger et al. 2007). Hence, when a few years after the study from Saarinen et al. (1992) a study on neurophysiological correlates of music processing reported a mismatch response for music-syntactic regularities (Koelsch et al. 2000), it was difficult to decide whether or not this mismatch response should be referred to as MMN: In that study (Koelsch et al. 2000), stimuli were chord sequences, each sequence consisting of five chords. There were three sequence types of interest: (1) sequences consisting of music-syntactically regular chords, (2) sequences with a
music-syntactically irregular chord at the third position (i.e., in the middle) of the sequence, and (3) sequences with a music-syntactically irregular chord at the fifth (i.e., final) position of the sequence (Figure 1a; for studies using similar experimental stimuli see Loui et al. 2005; Leino et al. 2007). Irregular chords were so-called “Neapolitan sixth chords”, which are consonant chords when played in isolation, but which are harmonically only distantly related to the preceding harmonic context and, hence, sound highly unexpected when presented at the end of a chord sequence (right panel of Figure 1a). The same chords presented in the middle of these chord sequences (middle panel of Figure 1a), however, sound much less unexpected and are relatively acceptable (presumably because Neapolitan sixth chords are similar to subdominants, which are music-syntactically regular at that position of the sequence). In the experiments of Koelsch et al. (2000), chord sequences were presented in direct succession (reminiscent of a musical piece, Figure 1b), with 50 percent of the stimuli being regular sequences,
Figure 1. (a) Examples of chord sequences containing in-key chords only (left), a Neapolitan sixth chord at the third (middle), and at the fifth position (right). In the experiment, sequences were presented in direct succession (b). Compared to regular in-key chords, the music-syntactically irregular Neapolitan chords elicited an ERAN (c). Note that when Neapolitans are presented at the fifth position of a chord sequence (where they are music-syntactically highly irregular), the ERAN has a larger amplitude compared to when Neapolitan chords are presented at the third position of the sequences (where they are music-syntactically less irregular than at the fifth position).
25 percent containing an irregular chord at the third, and 25 percent an irregular chord at the final position of the sequence. The irregular chords elicited an ERP effect which had a strong resemblance with the MMN: It had negative polarity, maximal amplitude values over frontal leads (with right-hemispheric predominance), and a peak latency of about 150–180 ms (Figure 1c). However, when Tom Gunter and I discovered this brain response, we did not label it as “music-syntactic MMN”, but as early right anterior negativity (ERAN, Koelsch et al. 2000). One reason for this terminology was that the ERAN was also strongly reminiscent of an ERP effect elicited by syntactic irregularities during language perception: the early left anterior negativity (ELAN, Friederici 2002; see also below). Denoting the ERP response to harmonic irregularities as ERAN, thus, emphasized the notion that this ERP was specifically related to the processing of musical syntax (or “musical structure”). Nevertheless, some subsequent studies have also referred to this effect as music-syntactic MMN (Koelsch et al. 2002a, 2003a, b, c), not only due to the resemblance with the MMN, but also because the term early right anterior negativity falls short when the effect elicited by irregular chords is not significantly lateralized. Lack of lateralization also led authors to label effects elicited by music-syntactically irregular events as early anterior negativity (Loui et al. 2005), or early negativity (Steinbeis et al. 2006). However, other studies used the term ERAN even when the effect was not significantly right-lateralized over the scalp for two reasons: (1) Functional neuroimaging studies consistently showed right-hemispheric predominance for the activation of the structures that are assumed to be involved in the generation of the ERAN (see also section about the neural generators of the ERAN in this chapter). 
That is, although the ERAN was sometimes not significantly lateralized in the scalp topographies of electroencephalographic (EEG) data, it is highly likely that the brain activity underlying the generation of the ERAN was nevertheless stronger in the right than in the left hemisphere. (2) The term ERAN was also established with regard to the functional significance of this ERP component, rather than only for its scalp distribution (Koelsch et al. 2007; Maess et al. 2001; Miranda and Ullman 2007; Maidhof and Koelsch 2007). Note that similar conflicts exist for most (if not all) endogenous ERP components: E.g., the P300 is often not maximal around 300 ms (e.g., McCarthy and Donchin 1981), the N400 elicited by violations in high cloze probability sentences typically starts around the P2 latency range (Gunter et al. 2000; van den Brink et al. 2001), and the MMN has sometimes positive polarity in infants (e.g., Winkler et al. 2003; Friederici et al. 2002). Thus, the term ERAN should be used for (relatively early) ERP effects elicited by music-syntactic irregularities.
8.2 Musical syntax
The ERAN reflects music-syntactic processing, i.e. processing of abstract regularity-based auditory information. In major-minor tonal music (often simply referred to as “Western” music), music-syntactic processing of chord functions (be they constituted by chords or by subsequent tones) comprises several sub-processes (see Figure 2a for explanation of chord functions). These sub-processes are grouped in the following into several core aspects; the ordering of these aspects is not intended to reflect a temporal order of music-syntactic processing (that is, the sub-processes may partly happen in parallel). Note that musical syntax also comprises other structural aspects, such as melodic (e.g. voice leading), rhythmic, metric, and timbral structure. Syntactic processing of such structural aspects has, however, to my knowledge not been investigated so far.
(1) Extraction of a tonal centre. In major-minor tonal music, music-syntactic processing of chord functions (be they constituted of chords or melodies) relies on the extraction of a tonal centre, for example, C in the case of a passage in C major. The tonal centre corresponds to the root of a tonal key (and is perceived as the best representative of a tonal key, see Krumhansl and Kessler 1982). In terms of harmonic function, the tonal centre is also the root of the tonic chord (see Figure 2a for explanation of the term “tonic”), and thus the reference point for the tonal hierarchy or chord functions (see also below). The process of establishing a representation of a tonal centre is normally an iterative process (Krumhansl and Toiviainen 2001), and this process has to be engaged each time the tonal key changes. 
To extract a tonal centre, or to detect a shift in tonal key (and thus a shift of the tonal centre), listeners have to sequence musical information, abstract a tonal centre from the different tones of a musical passage, keep the representation of the tonal centre in short-term memory, and realize when the memory representation of a tonal centre differs from that of new musical information. A description of the cognitive representation of tonal keys is provided, e.g., in Krumhansl and Kessler (1982). Previous studies have shown that listeners tend to interpret the first chord (or tone) of a sequence as the tonic (that is, as the tonal centre; Krumhansl and Kessler 1982). In case the first chord is not the tonic, listeners have to modify their initial interpretation of the tonal centre during the perception of subsequent tones or chords (Krumhansl and Kessler 1982; for a conception of key identification within the tonal idiom see the intervallic rivalry model from Brown et al. 1994).
(2) Successive tones and chords are actively related to the tonal centre, as well as to each other, in terms of harmonic distance (see also Figure 3a). E.g., in C major, a G major chord is more closely related to C major than a G# major chord.
(3) Based on the harmonic relations between chords, a tonal order (or “hierarchy”, Bharucha and Krumhansl 1983; see also Tillmann et al. 2000, 2008) is established, according to which the configuration of previously heard chord functions forms a tonal structure (or a structural context). For example, within the tonal “hierarchy of stability” (Bharucha and Krumhansl 1983) the tonic chord is the most “stable” chord, followed by the dominant and the subdominant, whereas chords such as the submediant and the supertonic represent less stable chords (ibid.). Importantly, when individuals familiar with major-minor tonal music listen to such music, they transfer the sensory pitch information (that is, information about the pitches of the tones of melodies or chords) into a cognitive representation of the location of tones and chords within the tonal hierarchy of a key, as well as within the (major-minor) tonal key space (see also Figure 3). For example, within a G major context, the sensory percept of three simultaneously sounding tones with pitches forming a major triad (such as c’, e’, and g’) is transformed into the location relative to the tonic (Figure 3a) as well as relative to the tonal centre (that is, relative to the tonal reference point; see Figure 3). The term “location relative to the tonal centre” refers to the place in the map of keys (or on the torus of keys; Krumhansl and Kessler 1982, see also Figure 3d) in relation to the tonic, and the term “relative to the tonic” refers to the chord function (c-e-g is the subdominant in G major). 
In other words, when processing harmonic information, listeners relate new harmonic information to the previous harmonic context in terms of harmonic distance, and in terms of its functional-harmonic information. Note that pitch height information is one-dimensional (ranging from low to high), whereas the cognitive representation of major-minor tonal space is at least two- (if not four-) dimensional (Figure 3d and Krumhansl and Kessler 1982, see ibid. for a description of the multi-dimensional cognitive representation of major-minor tonal space; see also Krumhansl et al. 1982). The distances between chord functions and keys in terms of music theory correlate with acoustic similarity (see Leman 2000, who showed that the key profiles obtained by Krumhansl and Kessler 1982 can largely be accounted for by measures of acoustic similarity). However, acoustic similarity of two chords is a one-dimensional measure, whereas the tonal space of chord functions and keys appears to be represented in listeners in more than one dimension (see above and Krumhansl and Kessler 1982; Krumhansl et al. 1982). That is, when we listen to a
musical piece (at least if we are familiar with major-minor tonal music), we do not only monitor chord transitions with regard to their acoustic similarity, but also with regard to both the tonal space of keys, and the hierarchy of stability (that is, with regard to their chord function). This difference in dimensionality is an important difference between “acoustical” and “musical” processing.
(4) Once a harmonic hierarchy is established, moving away from a tonal centre may be experienced as tensioning, and moving back as releasing (see also Lerdahl 2001; Patel 2003). This simple statement has several important implications: Firstly, moving through tonal space establishes hierarchical processing (for details see Lerdahl 2001 and Patel 2003). Secondly, the tension-resolution patterns emerging from moving through tonal space have emotional quality (Meyer 1956; Steinbeis et al. 2006; Koelsch et al. 2008). Thirdly, moving away from, and back to, a tonal centre also opens the possibility for recursion, because while moving away from a tonal centre (e.g., to the dominant, that is in C major: a G major chord), a change of key might take place (e.g., from C major to G major), and within the new key (now G major) – which now has a new tonal centre – the music might again move away from the tonal centre (e.g. to the dominant of G major), until it returns to the tonal centre of G, and then to the tonal centre of C major (for studies investigating neural correlates of the processing of changes in tonal key with electroencephalography or functional magnetic resonance imaging see Koelsch et al. 2002c, 2003b, 2005b; Janata et al. 2002b).
(5) In addition, the succession of chord functions follows style-specific statistical regularities, that is, probabilities of chord transitions (Riemann 1877; Rohrmeier 2005). For example, in a statistical study by Rohrmeier (2005) on the frequencies of diatonic chord progressions in Bach chorales, the supertonic was five times more likely to follow the subdominant than to precede it. These statistical regularities are an important characteristic of musical syntax with regard to the harmonic aspects of major-minor tonal music (other characteristics pertain, e.g., to the principles of voice-leading). The representations of such regularities are stored in long-term memory, and by its very nature it needs listening experience to extract the statistical properties of the probabilities for the transitions of chord functions (see also Tillmann et al. 2000). These statistical properties are implicitly learned in the sense that they are extracted without conscious effort by individuals (usually even without individuals being aware of it), and stored in a long-term memory format.
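To make the notion of chord-transition probabilities in (5) concrete, the following is a minimal sketch of how first-order transition probabilities between chord functions could be estimated from a corpus. The function name `transition_probabilities` and the toy corpus are assumptions introduced for illustration only; the sequences are invented and are not the Bach chorale data analysed by Rohrmeier (2005).

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order probabilities P(next chord function | current one)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent pair (current, next) in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs


# Invented toy corpus; Roman numerals denote chord functions
# (I = tonic, ii = supertonic, IV = subdominant, V = dominant, vi = submediant).
toy_corpus = [
    ["I", "IV", "ii", "V", "I"],
    ["I", "IV", "V", "I"],
    ["I", "vi", "IV", "ii", "V", "I"],
    ["I", "ii", "V", "I"],
]

probs = transition_probabilities(toy_corpus)
for current in sorted(probs):
    print(current, probs[current])
```

In this toy corpus the supertonic follows the subdominant but never precedes it, which mimics (in exaggerated form) the kind of asymmetry described above. A listener's implicit knowledge is of course not assumed to be a literal lookup table of this kind.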
It is important to understand that, while listeners familiar with (Western) tonal music perceive a sequence of chords, they automatically make predictions of likely chord functions to follow. That is, listeners extrapolate expectancies for subsequent sounds of regular chords, based on representations of music-syntactic regularities; chords (or tones) that mismatch with the music-syntactic sound expectancy of a listener elicit processes that are electrically reflected in an ERAN (Koelsch et al. 2000). The mathematical principles from which the probabilities for chord transitions within a tonal key might have emerged are under current investigation (see, e.g., Woolhouse and Cross 2006, for the interval cycle-based model of pitch attraction), and it appears that many of these principles represent abstract, rather than physical (or acoustical) features (Woolhouse and Cross 2006; note that, in addition to transition probabilities of chord functions, frequencies of co-occurrences, as well as frequencies of occurrences of chord functions and tones also represent statistical regularities, see Tillmann et al. 2008). It is likely that steps (1) and (2) can – at least approximately – be performed even by humans without prior experience of Western music (e.g., by newborns, or by adult listeners naive to Western music). However, several studies suggest that the fine-grained cognitive processes required for tonic identification that are typically observed in Western listeners (even when they haven’t received formal musical training) are based on extensive musical experience (e.g., Lamont and Cross 1994). Likewise, calculating subtle distances between chord functions and a tonal centre appears to rely on learning (see also Tekman and Bharucha 1998). 
Whether step (3) can be performed without prior experience of Western music is unknown, but previous studies strongly suggest that the detailed nature of the tonal hierarchy schema is learnt through early childhood (Lamont and Cross 1994). That is, while it is conceivable that humans naive to Western music find the probabilities for chord transitions plausible (because they follow abstract mathematical principles which become apparent in specific transitions of chords, Woolhouse and Cross 2006), repeated experience of Western music is necessary to acquire the knowledge about the probabilities of the transitions of chord functions, as well as knowledge about frequencies of co-occurrences of chord functions, and frequencies of occurrences of chord functions and tones (see above). Because this knowledge is essential for the prediction of subsequent chord functions (and, thus, for building up a harmonic sound expectancy), it is highly likely that the ERAN would not be elicited without such knowledge (see also section on musical training in this chapter).
Figure 2. Chord functions are the chords built on the tones of a scale (a). The chord on the first scale tone, e.g., is denoted as the tonic, the chord on the second scale tone (in major) as supertonic, on the third scale tone as mediant, on the fourth scale tone as subdominant, and the chord on the fifth scale tone as the dominant. The major chord on the second tone of a major scale can be interpreted as the dominant to the dominant (square brackets). In major-minor tonal music, chord functions are arranged within harmonic sequences according to certain regularities. One example for a regularity-based arrangement of chord functions is that the dominant-tonic progression is a prominent marker for the end of a harmonic sequence, whereas a tonic-dominant progression is unacceptable as a marker of the end of a harmonic sequence (see text for further examples). A sequence ending on a regular dominant-tonic progression is shown in the left panel of (b). The final chord of the right panel of (b) is a dominant to the dominant. This chord function is irregular, especially at the end of a harmonic progression (sound examples are available at www.stefan-koelsch.de/TC_DD). In contrast to the sequences shown in Figure 1, the irregular chords are acoustically even more similar to the preceding context than regular chords (see text for details; adapted with permission from Koelsch 2005). (c) shows the ERPs elicited by the final chords of these two sequence types (recorded from a right-frontal electrode site [F4] from twelve subjects, from Koelsch 2005). Both sequence types were presented in pseudorandom order equiprobably in all twelve major keys. Although music-syntactically irregular chords were acoustically more similar to the preceding harmonic context than regular chords, the irregular chords still elicited an ERAN. The ERAN can best be seen in the difference wave (solid line), which represents regular subtracted from irregular chords. (d) With MEG, the magnetic equivalent of the ERAN was localized in the inferior frontolateral cortex (adapted with permission from Maess et al. 2001; single-subject dipole solutions are indicated by striped disks, the white dipoles indicate the grand-average of these source reconstructions). 
(e) shows activation foci (small spheres) reported by functional imaging studies on music-syntactic processing using chord sequence paradigms (Koelsch et al. 2005; Maess et al. 2001; Tillmann et al. 2003; Koelsch et al. 2002c) and melodies (Janata et al. 2002b). The two larger spheres show the mean coordinates of foci (averaged for each hemisphere across studies, coordinates refer to standard stereotaxic space). (Adapted with permission from Koelsch and Siebel 2005.)
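The difference wave mentioned in the caption (regular subtracted from irregular chords) can be sketched as follows. This is an illustrative sketch only: the data are synthetic, and the function names, the 10 ms sampling step, and the 100–300 ms search window are assumptions, not parameters of the studies cited.

```python
import math


def difference_wave(irregular, regular):
    """Subtract the regular-chord ERP from the irregular-chord ERP, sample by sample."""
    if len(irregular) != len(regular):
        raise ValueError("both ERPs must have the same number of samples")
    return [i - r for i, r in zip(irregular, regular)]


def negative_peak(wave, times_ms, window=(100, 300)):
    """Return (latency_ms, amplitude) of the most negative sample inside the window."""
    candidates = [(a, t) for a, t in zip(wave, times_ms) if window[0] <= t <= window[1]]
    amplitude, latency = min(candidates)
    return latency, amplitude


# Synthetic data (assumption): both conditions share a slow component, and the
# irregular condition carries an additional negativity centred on 180 ms.
times_ms = list(range(0, 500, 10))
shared = [0.5 * math.sin(t / 50.0) for t in times_ms]
extra_negativity = [-2.0 * math.exp(-((t - 180) / 40.0) ** 2) for t in times_ms]
regular = shared
irregular = [s + e for s, e in zip(shared, extra_negativity)]

wave = difference_wave(irregular, regular)
latency, amplitude = negative_peak(wave, times_ms)
print(latency, round(amplitude, 3))
```

Because activity common to both conditions cancels in the subtraction, the difference wave isolates the effect that is specific to the irregular chords.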
Figure 3. (a) Two-dimensional representation of key relationships in the visualization of Krumhansl and Kessler (1982). One linear axis corresponds to the Circle of Fifths, the other to the heterogeneous axes of parallel and relative third major/minor relationships. The topographical distances between keys roughly correspond to perceived harmonic distances. (b) By linking relative and parallel relations (with thirds on the horizontal plane and fifths on the vertical), Gottfried Weber (Weber 1817) created a schematic diagram of all major and minor keys. This figure shows a strip of keys based on Weber’s schematic diagram. (c) Curling of the key map presented in (b). For example, g-minor occurs as the relative minor of B♭-major, and as the parallel minor of G-major. When the strip is curled as above, these keys can be overlaid. (d) Once the redundant strips overlap, the curls can be compacted into a single tube, and the enharmonic equivalents at both ends of the tube can be wrapped horizontally to produce a three-dimensional torus. This torus roughly represents major-minor tonal key space. (Adapted with permission from Purwins et al. 2007.)
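One axis of the key map in Figure 3a is the Circle of Fifths. As a rough illustration of harmonic distance along that single axis, the sketch below counts the number of fifth-steps between two major keys. The names `fifths_position` and `fifths_distance` are hypothetical, and the measure deliberately ignores the second (relative/parallel) axis of the key map, so it is a simplification rather than a model of perceived key distance.

```python
# Pitch classes of major-key tonics (sharps only; enharmonic spelling is ignored).
PITCH_CLASS = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def fifths_position(pitch_class):
    """Map a pitch class to its position on the Circle of Fifths (C = 0, G = 1, ...).

    A fifth spans 7 semitones, and 7 * 7 = 49 = 1 (mod 12), so multiplying
    by 7 modulo 12 converts semitone steps into fifth steps.
    """
    return (pitch_class * 7) % 12


def fifths_distance(key_a, key_b):
    """Smallest number of fifth-steps between two major keys (0 to 6)."""
    steps = (fifths_position(PITCH_CLASS[key_a]) - fifths_position(PITCH_CLASS[key_b])) % 12
    return min(steps, 12 - steps)


print(fifths_distance("C", "G"))   # neighbouring keys on the circle
print(fifths_distance("C", "G#"))  # more distant keys
```

On this simplified measure, G major lies closer to C major than G# major does, in line with the example given under aspect (2) in the text.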
8.3 A neural mechanism underlying music-syntactic processing
With regard to interpreting event-related brain potentials (ERPs) such as the ERAN, it is important to be aware of the difficulty that music-syntactically irregular chords often also represent a physical deviance (as will be described below in more detail). This is the case, e.g., for the Neapolitan chords shown in Figure 1a: The regular chords belonged to one tonal key, thus most notes played in an experimental block belonged to this key (e.g., in C major all white keys on a keyboard), whereas the Neapolitan chords introduced pitches that had not been presented in the previous harmonic context (see the flat notes of the Neapolitan chords in Figure 1b). Thus, the ERAN elicited by those chords was presumably partly overlapped by a phMMN. In other words, the detection of the irregular chords was partly due to implicit, or “unconscious”, memory representations of music-syntactic irregularities, but also partly due to unconscious representations (or so-called sensory memory traces) of acoustic regularities (from which the Neapolitan chords deviated). Nevertheless, it is also important to note that the Neapolitan chords at the third position of the sequences served as acoustical control stimuli, allowing an estimate of the degree of physical mismatch responses to Neapolitan chords. The data showed that the ERAN elicited by chords at the final position of chord sequences was considerably larger than the ERAN elicited by chords at the third position of the sequences (Figure 1c). This showed that the effects elicited by the Neapolitan chords at the final position of the chord sequences could not simply be an MMN, because an MMN would not have shown different amplitudes at different positions within the stimulus sequence (Koelsch et al. 2001; in that study the ERAN, but neither the phMMN nor the afMMN differed between positions in the chord sequences). Corroborating these findings, the study from Leino et al. 
(2007) showed that the amplitude of the ERAN, but not the amplitude of an MMN elicited by mistuned chords, differed between different positions within chord sequences. A very nice feature of that study was that chord sequences consisted of seven chords, and that they were composed in a way that Neapolitan chords occurring at the fifth position were music-syntactically less irregular than Neapolitans at the third position (contrary to the sequences presented in Figure 1a). Consequently, the ERAN elicited at the fifth position was smaller than the ERAN elicited at the third position (and the ERAN was largest when elicited by Neapolitan chords at the seventh position, where they were most irregular). Importantly, irregular chords at both the third and at the seventh position followed dominant-seventh chords; therefore, the fact that the ERAN was larger at the seventh than at the third position cannot simply be explained by local statistical probabilities for the transition
of chord functions. This is a strong hint that the generation of the ERAN involved hierarchical processing, because within the hierarchical structure of the chord sequences, the degree of irregularity was higher at the seventh than at the third position of the chord sequences. However, the fact that the ERAN elicited by music-syntactically irregular events is often partly overlapped by a phMMN results from the fact that, for the most part, music-syntactic regularities co-occur with acoustic similarity. For example, in a harmonic sequence in C major, a C# major chord (that does not belong to C major) is music-syntactically irregular, but the C# major chord is also acoustically less similar to the C major context than any other chord belonging to C major (because the C# major chord contains two tones that do not belong to the C major scale). Thus, any experimental effects evoked by such a C# major chord cannot simply be attributed to music-syntactic processing. Because such a C# major chord is (in the first inversion) the enharmonic equivalent of a Neapolitan sixth chord, it is likely that effects elicited by such chords in previous studies (e.g., Koelsch et al. 2000; Loui et al. 2005; Leino et al. 2007) are not only due to music-syntactic processing, but also partly due to acoustic deviances that occurred with the presentation of the Neapolitan chords (for further details see also Koelsch et al. 2007). In fact, tonal hierarchies and music-syntactic regularities of major-minor tonal music are partly grounded on acoustic similarities (e.g., Leman 2000), posing considerable difficulty for the investigation of music-syntactic processing. However, a number of ERP studies have been published so far that aimed at disentangling the “cognitive” mechanisms (related to music-syntactic processing) from the “sensory” mechanisms (related to the processing of acoustic information; Regnault et al. 2001; Poulin-Charronnat et al. 
2006; Koelsch 2005, 2007; Koelsch and Jentschke 2008; for important behavioural studies on this issue see, e.g., Bigand et al. 1999; Tillmann et al. 2008), and some of them showed that the ERAN can be elicited even when the syntactically irregular chords are acoustically more similar to a preceding harmonic context than syntactically regular chords (Koelsch 2005, 2007; Koelsch and Jentschke 2008). For example, in the sequences shown in Figure 2b, the music-syntactically regular chords (i.e., the final tonic chord of the sequence shown in the left panel of Figure 2b) introduced two new pitches, whereas the irregular chords at the sequence ending (so-called double dominants, shown in the right panel of Figure 2b) introduced only one new pitch (the new pitches introduced by the final chords are indicated by the arrows of Figure 2b). Moreover, the syntactically irregular chords had more pitches in common with the penultimate chord than regular chords, thus the “sensory dissonance” (of which pitch commonality is the major component) between final and penultimate chord was not greater for the irregular than for the regular
sequence endings. Nevertheless, the irregular chord functions (occurring with a probability of 50 percent) elicited a clear ERAN, suggesting that electrical reflections of music-syntactic processing can be measured without the presence of a physical irregularity (Figure 2c). In the sequences of Figure 2b, the irregular chords (i.e., the double dominants) did not belong to the tonal key established by the preceding chords (similar to the Neapolitan chords of previous studies). However, experiments using in-key chords as music-syntactically irregular chords have shown that an ERAN can also be elicited by in-key chords, indicating that the elicitation of the ERAN does not require out-of-key notes (Koelsch et al. 2007; Koelsch and Jentschke 2008). The peak latency of the ERAN is often between 170 and 220 ms, with the exception of four studies: Koelsch and Mulder (2002) reported an ERAN with a latency of around 250 ms, Steinbeis et al. (2006) reported an ERAN with a latency of 230 ms (in the group of non-musicians), James et al. (2008) reported an ERAN with a latency of 230 ms in a group of musicians, and Patel et al. (1998) reported an ERAN-like response (the right anterior temporal negativity, RATN) with a peak latency of around 350 ms. The commonality of these four studies was that they used non-repetitive sequences, in which the position at which irregular chords could occur was unpredictable. It is also conceivable that the greater rhythmic complexity of the stimuli used in those studies had effects on the generation of the ERAN (leading to longer ERAN latencies), but possible effects of rhythmic structure on the processing of harmonic structure remain to be investigated. It is also interesting that all four of these studies (Patel et al. 1998; Koelsch and Mulder 2002; Steinbeis et al. 2006; James et al. 2008) reported a more centro-temporal than frontal scalp distribution of the ERAN. 
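Latency values such as those above are typically read off a difference wave (irregular minus regular chords). As a minimal sketch of that kind of measurement, and not the procedure of any of the cited studies, the following Python snippet finds the most negative point of a difference wave inside a search window. The function name, the 150 to 350 ms window, and the synthetic waveform are all assumptions made for illustration.

```python
def peak_latency_ms(regular, irregular, times_ms, window=(150, 350)):
    """Return (latency_ms, amplitude) of the most negative point of the
    irregular-minus-regular difference wave inside the search window.

    `window` is a hypothetical choice; real studies define their own.
    """
    difference = [i - r for r, i in zip(regular, irregular)]
    candidates = [
        (amplitude, t)
        for amplitude, t in zip(difference, times_ms)
        if window[0] <= t <= window[1]
    ]
    if not candidates:
        raise ValueError("no samples fall inside the search window")
    amplitude, latency = min(candidates)  # most negative amplitude wins
    return latency, amplitude


# Synthetic data (not real ERPs): a flat response to regular chords and a
# triangular negative deflection centred on 190 ms for irregular chords.
times_ms = list(range(0, 401, 10))
regular = [0.0 for _ in times_ms]
irregular = [-2.0 * max(0.0, 1 - abs(t - 190) / 60) for t in times_ms]

latency, amplitude = peak_latency_ms(regular, irregular, times_ms)
print(latency, amplitude)
```

With real data the result depends strongly on the chosen window and electrodes, which is one reason why latencies are hard to compare across studies.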
However, in a recent study that also used non-repetitive sequences, in which the occurrence of an irregular chord was unpredictable (Koelsch et al. 2008), the peak latency of the ERAN was between 158 and 180 ms, and the maximum of the ERAN was over frontal electrodes. The specific effects on latency and scalp distribution of the ERAN are, thus, not yet understood. A recent study on oscillatory activity associated with music-syntactic processing (Herrojo et al. 2009) showed that music-syntactic processing in response to irregular chord functions is primarily reflected by low frequency (< 8 Hz) brain oscillations. The spectral energy (total and evoked) of both delta (< 4 Hz) and theta (4–7 Hz) oscillations increased in response to chords presented at the final position of chord sequences, but this increase was stronger for the regular than for the irregular chords. The difference in oscillatory activity between regular and irregular chords was larger over the right than over the left hemisphere, and was presumably due to phase-resetting of oscillations in the theta band, and to both phase-resetting and an increase in amplitude of oscillations in
the delta band. Interestingly, in the ERAN time window no effects of irregular chords were found in the gamma band (differences were found, however, in a later time window, around 500–550 ms after stimulus onset). The ERAN can be elicited not only by chords. Two previous ERP studies with melodies showed that early anterior brain responses can also be elicited by single tones (Miranda and Ullman 2007; Brattico et al. 2006; the latter study referred to these brain responses as MMN). Moreover, a study by Schön and Besson (2005) showed that the ERAN can even be elicited by visually induced musical expectancy violations (that study also used melodies).
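The observation made earlier in this section, that a C# major chord contains two tones outside the C major scale, can be checked with simple pitch-class arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and counting out-of-scale tones is a crude proxy for acoustic similarity, not the measure used in the cited studies.

```python
# Pitch classes 0-11; enharmonic spellings collapse (E# is stored as F).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
C_MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}  # C D E F G A B


def major_triad(root):
    """Pitch classes of a major triad: root, major third, perfect fifth."""
    return {root % 12, (root + 4) % 12, (root + 7) % 12}


def out_of_scale_tones(chord, scale):
    """Names of chord tones that do not belong to the given scale."""
    return [NOTE_NAMES[pc] for pc in sorted(chord - scale)]


c_sharp_major = major_triad(1)  # C#, E# (= F), G#
g_major = major_triad(7)        # G, B, D: the in-key dominant of C major

print(out_of_scale_tones(c_sharp_major, C_MAJOR_SCALE))
print(out_of_scale_tones(g_major, C_MAJOR_SCALE))
```

The C# major triad yields two out-of-scale tones (C# and G#), whereas an in-key chord such as G major yields none, which is why effects of such out-of-key chords cannot be attributed to music-syntactic processing alone.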
8.4 Processing of acoustic vs. music-syntactic irregularities
There is a crucial difference between the neural mechanisms underlying the processing of acoustic irregularities (as reflected in phMMN and afMMN) on the one side, and music-syntactic processing (as reflected in the ERAN) on the other: The generation of both phMMN and afMMN is based on an on-line establishment of regularities – that is, based on representations of regularities that are extracted on-line from the acoustic environment. By contrast, music-syntactic processing (as reflected in the ERAN) relies on representations of music-syntactic regularities that already exist in a long-term memory format (although music-syntactic processing can modify such representations). That is, the statistical probabilities that make up music-syntactic regularities are not learned within a few moments, and the representations of such regularities are stored in a long-term memory format (as described above). With regards to the MMN, it is important to not confuse the on-line establishment of regularities with long-term experience or long-term representations that might influence the generation of the MMN: E.g., pitch information can be decoded with higher resolution by some musical experts (leading to a phMMN to frequency deviants that are not discriminable for most non-experts; Koelsch et al. 1999), or the detection of a phoneme is facilitated when that phoneme is a prototype of one’s language (leading to a phMMN that has a larger amplitude in individuals with a long-term representation of a certain phoneme compared to individuals who do not have such a representation; Näätänen et al. 1997; Winkler et al. 1999; Ylinen et al. 2006). In this regard, long-term experience has clear effects on the processing of physical oddballs (as reflected in the MMN). However, in all of these studies (Koelsch et al. 1999; Näätänen et al. 1997; Winkler et al. 1999; Ylinen et al. 
2006), the generation of the MMN was dependent on representations of regularities that were extracted on-line from the acoustic environment: For example, in the classical study from Näätänen et al. (1997),
the standard stimulus was the phoneme /e/, and one of the deviant stimuli was the phoneme /õ/, which is a prototype in Estonian (but not in Finnish). This deviant elicited a larger phMMN in Estonians than in Finnish subjects, reflecting that Estonians have a long-term representation of the phoneme /õ/ (and that Estonians were, thus, more sensitive to this phoneme). However, the regularity for this experimental condition (“/e/ is the standard and /õ/ is a deviant”) was independent of the long-term representation of the phonemes, and this regularity was established on-line by the Estonian subjects during the experiment (and could have been changed easily into “/õ/ is the standard and /e/ is the deviant”). That is, the statistical probabilities that make up the regularities in such an experimental condition are learned within a few moments, and the representations of such regularities are not stored in a long-term memory format. With regard to the phMMN and the afMMN, Schröger (2007) describes four processes that are required for the elicitation of an MMN, which are related here to the processes underlying the generation of the ERAN (see also Figure 4): (1) Incoming acoustic input is analyzed in multiple ways resulting in the separation of sound sources, the extraction of sound features, and the establishment of representations of auditory objects. Basically the same processes are required for music-syntactic processing, and thus for the elicitation of the ERAN (see also top left of Figure 4; for exceptions see Widmann et al. 2004; Schön and Besson 2005). (2) Regularities inherent in the sequential presentation of discrete events are detected and integrated into a model of the acoustic environment. Similarly, Winkler (2007) states that MMN can only be elicited when sounds violate some previously detected inter-sound relationship. 
These statements nicely illustrate a crucial difference between the cognitive operations underlying music-syntactic processing (as reflected in the ERAN) and processing of acoustic oddballs (as reflected in the MMN): As mentioned above, during (music-) syntactic processing, representations of regularities already exist in a long-term memory format (similar to the processing of syntactic aspects of language) and determine automatically, or unconsciously, the processing of music-syntactic information. That is, the regularities themselves do not have to be detected, and it is not the regularity that is integrated into a model of the acoustic environment, but it is the actual sound (or chord) that is integrated into a cognitive (structural) model according to long-term representations of regularities. That is, the representations of (music-) syntactic regularities are usually not established on-line, and they are, moreover, not necessarily based on the inter-sound relationships of the acoustic input (see top right of Figure 4). Note that, due to its relation to representations that are stored in a long-term format, music-syntactic processing is intrinsically connected to learning and memory. Unless an individual is formally trained in music theory,
Figure 4. Systematic overview of processes required to elicit MMN and ERAN (see text for details). Whereas the extraction of acoustic features is identical for both MMN and ERAN (top left quadrant), MMN and ERAN differ with regard to the establishment of a model of intersound-relationships (top right quadrant): In the case of the MMN, a model of regularities is based on inter-sound relationships that are extracted on-line from the acoustic environment. These processes are linked to the establishment and maintenance of representations of the acoustic environment, and thus to the processes of auditory scene analysis. In the case of the ERAN, a model of inter-sound relationships is built based on representations of music-syntactic regularities that already exist in a long-term memory format. The bottom quadrants illustrate that the neural resources for the prediction of subsequent acoustic events, and the comparison of new acoustic information with the predicted sound, presumably overlap strongly for MMN and ERAN.
these memory representations are unconscious in the sense that they are implicit, and that they cannot (and need not!) be described with words, or concepts. It is also important to note that music-syntactic processing usually requires processing of long-distance dependencies at a level of complexity termed phrase structure grammar (for explanation of phrase structure grammar see Fitch and Hauser 2004; for studies comparing neural correlates of phrase structure grammar and finite state grammar see, e.g., Friederici et al. 2006; Opitz and Friederici 2007). For instance, as mentioned in the section about musical syntax, moving away from a tonal centre in major-minor tonal space creates a hierarchical structure that is always concluded by the return to the initial tonal centre (with the possibility of recursion). By contrast, processing of acoustic oddballs involves
sequential processing guided by local organizational principles with regularities usually limited to neighbouring units, but not hierarchical processing. Interestingly, the ability to process phrase structure grammar is available to all humans, whereas non-human primates are not able to master such grammars (Fitch and Hauser 2004). Thus, it is highly likely that only humans can adequately process music-syntactic information at the phrase structure level. Auditory oddballs, by contrast, can be detected by non-human mammals such as cats and macaques (and auditory oddballs elicit MMN-like responses in these animals; for a brief overview see Näätänen et al. 2005, p. 26). (3) Predictions about forthcoming auditory events are derived from the model (see also Winkler 2007; Garrido et al. 2009). This process is similar (presumably at least partly identical) for the ERAN: A sound expectancy (Koelsch et al. 2000) for following musical events (e.g., a chord) is established based on the previous structural context and the knowledge about the most likely tone, or chord, to follow. As mentioned in (2), however, in the case of the MMN the predictions are based on regularities that are established on-line based on the inter-sound relationships of the acoustic input, whereas in the case of the ERAN the predictions are based on representations (of music-syntactic regularities) that already exist in a long-term memory format. In other words, in the case of the processing of auditory oddballs, the predictions are established based on local organizational principles, whereas in the case of music-syntactic processing, the predictions are usually based on the processing of phrase structure grammar involving probabilities for the transition of chord functions within hierarchical structures. To date it is not known to which degree the predictions underlying the generation of the MMN and the ERAN are established in the same areas. 
Because the premotor cortex (PMC, corresponding to Brodmann’s area 6) has been implicated in serial prediction (Schubotz 2007), it is likely that PMC serves the formation of both (i) predictions underlying processing of auditory oddballs and (ii) predictions underlying processing of music-syntactic information. However, it is also likely that, in addition to such overlap, predictions for the MMN are also generated in sensory-related areas (i.e., in the auditory cortex), and the predictions for the ERAN (perhaps also for the afMMN) in hetero-modal areas such as Broca’s area (BA 44/45; see bottom left of Figure 4; for neural generators of the ERAN see next section). (4) Representations of the incoming sound and the sound predicted by the model are compared. For the ERAN, this process is, again, presumably at least partly the same as for the MMN (see bottom right of Figure 4). However, similarly to (3) it is unknown whether such processing comprises primarily auditory areas for the MMN (where the sound representation might be more concrete, or
“sensory”, due to directly preceding stimuli that established the regularities), and primarily frontal areas for the ERAN (see also next section for further details). In addition, Winkler (2007) states that the primary function of the MMN-generating process is to maintain neuronal models underlying the detection and separation of auditory objects. This also differentiates the processes underlying the MMN from those underlying music-syntactic processing, because syntactic processing serves the computation of a string of auditory structural elements that – as a whole – represent a form that conveys meaning which can be understood by a listener familiar with the syntactic regularities (Koelsch and Siebel 2005; Steinbeis and Koelsch 2008).
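The contrast drawn in this section between regularities extracted on-line (MMN) and regularities held in a long-term memory format (ERAN) can be caricatured in a few lines of Python. This is a toy sketch under strong assumptions, not a model proposed in the literature discussed here: the class and function names are hypothetical, and the transition probabilities in the long-term table are made-up illustrative numbers.

```python
from collections import Counter


class OnlineRegularityModel:
    """Toy stand-in for on-line regularity extraction (MMN-like case):
    probabilities are estimated only from sounds heard in this session."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, sound):
        self.counts[sound] += 1

    def probability(self, sound):
        total = sum(self.counts.values())
        return self.counts[sound] / total if total else 0.0


# Toy stand-in for long-term knowledge (ERAN-like case): a fixed table of
# chord-function transitions. The numbers are hypothetical, chosen only so
# that a tonic after a dominant is more likely than a supertonic.
LONG_TERM_TRANSITIONS = {
    "dominant": {"tonic": 0.7, "submediant": 0.2, "supertonic": 0.1},
}


def long_term_probability(previous_function, next_function):
    return LONG_TERM_TRANSITIONS.get(previous_function, {}).get(next_function, 0.0)


def run_block(standard, deviant, n_standards=9):
    """Present n_standards standards and one deviant to a fresh model."""
    model = OnlineRegularityModel()
    for sound in [standard] * n_standards + [deviant]:
        model.observe(sound)
    return model


# Swapping standard and deviant between blocks flips the on-line estimate,
# while the long-term table is untouched by anything heard in the session.
block_a = run_block("/e/", "/õ/")
block_b = run_block("/õ/", "/e/")
print(block_a.probability("/õ/"), block_b.probability("/õ/"))
print(long_term_probability("dominant", "supertonic"))
```

The only point of the sketch is the source of the prediction: in the first case the estimate is rebuilt within each block, in the second it is looked up from stored knowledge. It says nothing about hierarchical (phrase structure) processing, which the flat table above does not capture.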
8.5 Unconscious interactions between music- and language-syntactic processing
In recent years, a number of studies have revealed interactions between music-syntactic and language-syntactic processing (Koelsch et al. 2005; Steinbeis and Koelsch 2008; Slevc et al. 2009; Fedorenko et al. 2009). In these studies, chord sequences were presented simultaneously with visually presented sentences while participants were asked to focus on the language-syntactic information, and to ignore the music-syntactic information. Using EEG and chord sequence paradigms similar to those described in Figure 1, two studies showed that the ERAN elicited by irregular chords interacts with the left anterior negativity (LAN) elicited by linguistic (morpho-syntactic) violations (Koelsch et al. 2005; Steinbeis and Koelsch 2008): The LAN elicited by words was reduced when the irregular word was presented simultaneously with an irregular chord (compared to when the irregular word was presented with a regular chord). In the study by Koelsch et al. (2005) a control experiment was conducted in which the same sentences were presented simultaneously with sequences of single tones. The tone sequences ended either on a standard tone or on a frequency deviant. The phMMN elicited by the frequency deviants did not interact with the LAN (in contrast to the ERAN), indicating that the processing of auditory oddballs (as reflected in the phMMN) does not consume resources related to syntactic processing. Whether the afMMN consumes such resources remains to be investigated. Results of these ERP studies (Koelsch et al. 2005; Steinbeis and Koelsch 2008) indicate that the ERAN reflects syntactic processing, rather than detection and integration of inter-sound relationships inherent in the sequential presentation of discrete events into a model of the acoustic environment. In other words, the results of those ERP studies indicate that the automatic processing of music-syntactic information (based on
unconscious long-term memory representations), but not the automatic processing of acoustic oddballs (based on unconscious sensory memory representations) interacts with language-syntactic processing. These ERP findings were corroborated by behavioural studies: In a study by Slevc et al. (2009) participants performed a self-paced reading of “garden-path” sentences (that is, of sentences which have a different syntactic structure than initially expected). Words (presented visually) occurred simultaneously with chords (presented auditorily). When a syntactically unexpected word occurred together with a music-syntactically irregular (out-of-key) chord, participants needed more time to read the word (that is, participants showed stronger garden-path effects). No such interaction between language-syntactic and music-syntactic processing was observed when words were semantically unexpected, nor when the chord presented with the unexpected word had an unexpected timbre (but was harmonically correct). Similar results were reported by Fedorenko et al. (2009) in a study in which sentences were sung. Sentences contained either subject-extracted or object-extracted relative clauses, and the note sung on the critical word of a sentence was either in-key or out-of-key. Participants were less accurate in their understanding of object-extracted clauses compared to subject-extracted clauses (as expected). Importantly, the difference between the comprehension accuracies of these two sentence types was larger when the critical word (the last word of a relative clause) was sung on an out-of-key note. No such interaction was observed when the critical word was sung with greater loudness. Thus, both of these studies (Slevc et al. 2009; Fedorenko et al. 2009) showed that music- and language-syntactic processing specifically interact with each other, presumably because they both rely on common processing resources. 
The findings of the mentioned EEG- and behavioural studies showing interactions between language- and music-syntactic processing have been corroborated by a recent patient study (Patel et al. 2008). This study showed that individuals with Broca’s aphasia also show impaired music-syntactic processing in response to out-of-key chords occurring in harmonic progressions (note that all patients had Broca’s aphasia, but only some of them had a lesion that included Broca’s area).
8.6 Neural basis of music-syntactic processing
A number of studies suggest that music-syntactic processing relies on neural sources located in the pars opercularis of the inferior fronto-lateral cortex (corresponding to inferior Brodmann’s area [BA] 44), presumably with additional
contributions from the ventrolateral premotor cortex and the anterior superior temporal gyrus (planum polare; Koelsch 2006). A study with magnetoencephalography (Koelsch 2000) using a chord sequence paradigm with the stimuli depicted in Figure 1a and b, reported a dipole solution of the ERAN with a two-dipole model, the dipoles being located bilaterally in inferior BA 44 (see also Maess et al. 2001, and Figure 2d; the dipole strength was nominally stronger in the right hemisphere, but this hemispheric difference was statistically not significant). The main frontal contribution to the ERAN reported in that study stands in contrast to the phMMN, which receives its main contributions from neural sources located within and in the vicinity of the primary auditory cortex, with additional (but smaller) contributions from frontal cortical areas (Alho et al. 1996; Alain et al. 1998; Giard et al. 1990; Opitz et al. 2002; Liebenthal et al. 2003; Molholm et al. 2005; Rinne et al. 2005; Schönwiesner et al. 2007; for a review see Deouell 2007). Likewise, the main generators of the afMMN have also been reported to be located in the temporal lobe (Korzyukov et al. 2003). That is, whereas the phMMN (and the afMMN) receives main contributions from temporal areas, the ERAN appears to receive its main contributions from frontal areas. The results of the MEG study (Koelsch 2000) were supported by functional neuroimaging studies using chord sequence paradigms (Koelsch et al. 2002; Koelsch et al. 2005; Tillmann et al. 2006), “real”, polyphonic music (Janata et al. 2002a), and melodies (Janata et al. 2002b). These studies showed activations of inferior fronto-lateral cortex at coordinates highly similar to those reported in the MEG study (Figure 2e). Particularly the functional magnetic resonance imaging (fMRI) study from Koelsch et al. 
(2005) supported the assumption of neural generators of the ERAN in inferior BA 44: As will be reported in more detail below, the ERAN has been shown to be larger in musicians than in nonmusicians (Koelsch et al. 2002), and in the fMRI study from Koelsch et al. (2005) effects of musical training were correlated with activations of inferior BA 44, in both adults and children. Data obtained from patients with lesions of the left inferior frontal cortex (left pars opercularis) showed that the scalp distribution of the ERAN differs from that of healthy controls (Sammler 2008), supporting the assumption that neural sources located in the pars opercularis are involved in the ERAN generation. Moreover, data recorded from intracranial grid-electrodes from patients with epilepsy identified two ERAN sources, one in the inferior fronto-lateral cortex, and one in the superior temporal gyrus (Sammler 2008; the latter one was inconsistently located in anterior, middle, and posterior superior temporal gyrus). Further support stems from EEG studies investigating the ERAN and the phMMN under propofol sedation: Whereas the phMMN is strongly reduced,
but still significantly present under deep propofol sedation (Modified Observer’s Assessment of Alertness and Sedation Scale level 2–3, mean Bispectral Index = 68), the ERAN is abolished during this level of sedation (Koelsch et al. 2006). This highlights the importance of the frontal cortex for music-syntactic processing, because propofol sedation appears to affect heteromodal frontal cortices earlier, and more strongly than unimodal sensory cortices (Heinke et al. 2004; Heinke and Koelsch 2005). An EEG study from James et al. (2008) reported a localization of the main generators of an ERAN potential in medial temporal areas (supposedly hippocampus and amygdala) and the insula, but this source analysis appears to be somewhat questionable because (a) converging evidence from a number of EEG, MEG, and fMRI studies indicates that main ERAN generators are located in the frontal lobe, and (b) no control ERP component (such as the P1) was localized in the study from James et al. (2008). Finally, it is important to note that inferior BA 44 (which is in the left hemisphere often referred to as Broca’s area) plays a crucial role for the processing of syntactic information during language perception (e.g., Friederici 2002), as well as for the hierarchical processing of action sequences (e.g., Koechlin and Jubault 2006), and for the processing of hierarchically organized mathematical formulas and termini (Friedrich and Friederici 2009). Thus, with regard to language-syntactic processing, the neural resources for the processing of musical and linguistic syntax appear to overlap strongly, and this view is particularly supported by the mentioned studies showing interactions between music-syntactic and language-syntactic processing (Koelsch et al. 2005; Steinbeis and Koelsch 2008; Slevc et al. 2009). 
On a more abstract level, it is highly likely that Broca’s area is involved in the processing of hierarchically organized sequences in general, be they musical, linguistic, action-related, or mathematical. Interestingly, the data from the studies on music-syntactic processing show us that Broca’s area is capable of processing hierarchical information even when this information is established based on implicit (and in this sense unconscious) memory representations. Moreover, 5-year-old children with specific language impairment (characterized by marked difficulties in language-syntactic processing) do not show an ERAN (whereas children with normal language development do; Jentschke et al. 2008), and 11-year-old children with musical training do not only show an increase of the ERAN amplitude, but also an increase of the amplitude of the ELAN (reflecting language-syntactic processing, Jentschke et al. 2005; see also section on development below). The latter finding was interpreted as the result of training effects in the musical domain on processes of fast and automatic syntactic sequencing during the perception of language.
8.7 A note on the lateralization of the ERAN
Although the ERAN was significantly right-lateralized in some studies (e.g., Koelsch et al. 2000, 2007; Koelsch and Sammler 2007; Koelsch and Jentschke 2008), it was not right-lateralized in a few studies also using chord-sequence paradigms (Steinbeis et al. 2006; Loui et al. 2005), or melodies (Miranda and Ullman 2007). An important difference between those studies is that the studies in which the ERAN was right-lateralized had relatively large numbers of participants (twenty or more; Koelsch et al. 2007; Koelsch and Sammler 2007; Koelsch and Jentschke 2008), whereas studies in which no lateralization of the ERAN was reported have mostly measured fewer than twenty subjects (eighteen in the study from Loui et al. 2005; ten in the study from Leino et al. 2007). This difference is important because it seems that the ERAN is more strongly lateralized in males than in females (who often show a rather bilateral ERAN, Koelsch et al. 2003a); thus, a relatively large number of subjects is required until the lateralization of the ERAN reaches statistical significance. Additional factors that modulate the lateralization of the ERAN might include the salience of irregular chords, attentional factors, and the signal-to-noise ratio of ERP data (see Koelsch 2009). However, as mentioned above, a number of functional neuroimaging studies showed that, on average, the neural activity underlying the processing of music-syntactically irregular chords has a right-hemispheric weighting (see also Koelsch 2009). Thus, even if the EEG effect is sometimes not significantly lateralized, it is reasonable to assume that the neural generators of the ERAN are activated more strongly in the right than in the left hemisphere.
8.8 Automaticity: Unconscious processing of musical syntax
So far, several ERP studies have used the ERAN to investigate the automaticity of music-syntactic processing. The ERAN has been observed while participants play a video game (Koelsch et al. 2001), read a self-selected book (Koelsch et al. 2002b), or perform a highly attention-demanding reading comprehension task (Loui et al. 2005). In the latter study, participants performed the reading task while ignoring all chord sequences, or they attended to the chord sequences and detected chords which deviated in their sound intensity from standard chords. These conditions enabled the authors to compare the processing of task-irrelevant irregular chords under an attend condition (intensity detection task) and an ignore condition (reading comprehension task). Results showed that an ERAN was elicited in both conditions and that the amplitude of the ERAN was reduced (but still significant) when the musical stimulus was ignored (Figure 5a; because
Figure 5. (a) shows difference ERPs (tonic subtracted from Neapolitan chords) elicited when attention was focussed on the musical stimulus (grey line), and when attention was focussed on a reading comprehension task (black line). The E(R)AN (indicated by the arrow) clearly differed between conditions, being smaller in the unattend-condition. (Adapted with permission from Loui et al. 2005.) (b) shows difference ERPs (tonic subtracted from Neapolitan chords) elicited in musicians (solid line) and non-musicians (dotted line). The ERAN (arrow) clearly differed between groups, being smaller in the group of non-musicians.
the ERAN was not significantly lateralized, it was denoted as early anterior negativity by the authors). Another recent study (Maidhof and Koelsch 2008) showed that the neural mechanisms underlying the processing of music-syntactic information (as reflected in the ERAN) are active even when participants selectively attend to a speech stimulus. In that study, speech and music stimuli were presented simultaneously from different locations (20° and 340° in the azimuthal plane). The ERAN was elicited even when participants selectively attended to the speech stimulus, but its amplitude was significantly decreased compared to the condition in which participants listened to music only. The findings of the latter two studies (Loui et al. 2005; Maidhof and Koelsch 2008) show that the neural mechanisms underlying the processing of harmonic structure operate in the absence of attention, but that they can be clearly modulated by different attentional demands. That is, in the conscious individual, musical syntax is processed by
unconscious mechanisms in the sense that (a) the listener does not have to pay attention to the music, and (b) the brains of listeners register music-syntactic irregularities even when these irregularities are not even consciously noticed by the listener. A recent study (Koelsch et al. 2006) showed that, in the unconscious individual, music-syntactic processing is presumably abolished: That study reported that no ERAN was observable under deep propofol sedation (where participants were in a state similar to natural sleep), in contrast to the phMMN, which was strongly reduced, but still significantly present during this level of sedation. This suggests that the elicitation of the ERAN requires a different state of consciousness on the part of the listeners than the phMMN (see also Heinke et al. 2004; Heinke and Koelsch 2005). With regard to the MMN, several studies have shown that the MMN amplitude can be reduced in some cases by attentional factors (for a review see Sussman 2007). However, it has been argued that such modulations could be attributed to effects of attention on the formation of representations for standard stimuli, rather than to the deviant detection process (Sussman 2007), and that MMN is largely unaffected by attentional modulations (Grimm and Schröger 2005; Sussman et al. 2004; Gomes et al. 2000). That is, the MMN seems to be considerably more resistant against attentional modulations than the ERAN.
8.9 Effects of musical training
Like the MMN, the ERAN can be modulated by both long-term and short-term training. Effects of musical training have been reported for the MMN with regard to the processing of temporal structure (Rüsseler 2001), the processing of abstract features such as interval and contour changes (Tervaniemi et al. 2001; Fujioka et al. 2004), as well as for the processing of pitch (Koelsch et al. 1999). In all these studies, the MMN was larger in individuals with formal musical training (“musicians”) than in individuals without such training (“non-musicians”). With regard to the ERAN, four studies have so far investigated effects of musical long-term training on the ERAN (Koelsch et al. 2002, 2007; Koelsch and Sammler 2007; Koelsch and Jentschke 2008). These studies have shown that the ERAN is larger in musicians (Figure 5b; Koelsch et al. 2002), and in amateur musicians (Koelsch et al. 2007) compared to non-musicians. In the latter study, the difference between groups was just above the threshold of statistical significance, and two recent studies reported nominally larger ERAN amplitude values for musicians (compared to non-musicians, Koelsch and Sammler 2007) and
amateur musicians (compared to non-musicians, Koelsch and Jentschke 2008), although the group differences did not reach statistical significance in these studies. Despite this lack of statistical significance it is remarkable that all studies that have so far investigated training effects on the ERAN observed larger ERAN amplitudes for musically trained individuals compared to non-musicians. Corroborating these ERP data, significant effects of musical training on the processing of music-syntactic irregularities have also been shown in fMRI experiments for both adults and 11-year-old children (Koelsch et al. 2005b). The evidence from the mentioned studies indicates that the effects of musical long-term training on the ERAN are small, but reliable and consistent across studies. This is in line with behavioural studies showing that musicians respond faster and more accurately to music-structural irregularities than non-musicians (e.g., Bigand et al. 1999), and with ERP studies on the processing of musical structure showing effects of musical training on the generation of the P3 using chords (Regnault et al. 2001), or the elicitation of a late positive component (LPC) using melodies (Besson and Faita 1995; see also Schön et al. 2004; Magne et al. 2006; Moreno and Besson 2006). The ERAN is presumably larger in musicians because musicians have (as an effect of the musical training) more specific representations of music-syntactic regularities and are, therefore, more sensitive to the violation of these regularities (for effects of musical training in children see next section). With regard to short-term effects on music-syntactic processing, a recent experiment presented two sequence types (one ending on a regular tonic chord, the other one ending on an irregular supertonic) for approximately two hours (Koelsch and Jentschke 2008; subjects were watching a silent movie with subtitles). 
The data showed that music-syntactically irregular chords elicited an ERAN, and that the amplitude of the ERAN decreased over the course of the experimental session. These results, thus, indicated that neural mechanisms underlying the processing of music-syntactic information can be modified by short-term musical experience. Interestingly, although the ERAN amplitude was significantly reduced, it was still present at the end of the experiment, suggesting that cognitive representations of basic music-syntactic regularities are remarkably stable, and cannot easily be modified. This notion was corroborated by a recent study (Carrión and Bly 2008) showing that the amplitude of the ERAN to irregular chords does not increase when participants undergo a training session in which they are presented with eighty-four chord training sequences that ended on syntactically correct chord sequences. Notably, in contrast to the ERAN, the P3b amplitude increased in that study as an effect of training.
8.10
Development
The youngest individuals in whom music-syntactic processing has been investigated so far with ERPs were, to my knowledge, 4-month-old babies. These babies did not show an ERAN (unpublished data from our group; irregular chords were Neapolitan chords; some of the babies were asleep during the experiment). In 2.5-year-old children (30 months) we observed an ERAN to supertonics as well as Neapolitan chords (unpublished data from our group). In this age group, the ERAN was quite small, suggesting that the development of the neural mechanisms underlying the generation of the ERAN commences around, or not long before, this age (for behavioural studies on the development of music-syntactic processing see, e.g., Schellenberg et al. 2005). By contrast, MMN-like responses can be recorded even in the fetus (Draganova et al. 2005; Huotilainen et al. 2005), and a number of studies have shown MMN-like discriminative responses in newborns (although sometimes with positive polarity; Ruusuvirta et al. 2003, 2004; Winkler et al. 2003; Maurer et al. 2003; Stefanics et al. 2007) to both physical deviants (e.g., Alho et al. 1990; Cheour et al. 2002a, 2002b; Kushnerenko et al. 2007; Winkler et al. 2003; Stefanics et al. 2007) and abstract feature deviants (Ruusuvirta et al. 2003, 2004; Carral et al. 2005). Cheour et al. (2000) reported that, in some experiments, the amplitudes of such ERP responses are only slightly smaller in infants than the MMN usually reported in school-age children (but see also, e.g., Maurer et al. 2003; Kushnerenko et al. 2007; Friederici 2005, for differences). With regard to pitch perception, Háden et al. (2009) reported that even neonates can generalize pitch across different timbres. Moreover, reliable pitch perception has been shown in neonates for complex harmonic tones (Ceponiene et al. 2002), speech sounds (Kujala et al. 2004), environmental sounds (Sambeth et al. 2006), and even noise (Kushnerenko et al. 2007). 
Neonates are less sensitive than adults, however, to changes in the pitch height of sinusoidal tones (Novitski et al. 2007). The findings that MMN-like responses can be recorded in the fetus and in newborn infants support the notion that the generation of such discriminative responses is based on the (innate) capability to establish representations of intersound regularities that are extracted on-line from the acoustic environment (and the innate capability to perform auditory scene analysis), whereas the generation of the ERAN requires representations of musical regularities that first have to be learned through listening experience, involving the detection of regularities (i.e. statistical probabilities) underlying, e.g., the succession of harmonic functions. Children at the age of 5 years show a clear ERAN, but with longer latency than adults (around 230–240 ms; Jentschke et al. 2008; in that study the ERAN was elicited by supertonics). Similar results were obtained in another study using
Neapolitans as irregular chords (Koelsch et al. 2003). It is not known whether the longer latency in 5-year-olds (compared to adults) is due to neuro-anatomical differences (such as fewer myelinated axons), or due to less specific representations of music-syntactic regularities (or both). At the age of nine, the ERAN appears to be very similar to the ERAN of adults. In a recent study, 9-year-olds with musical training showed a larger ERAN than children without musical training (unpublished data from our group), and the latency of the ERAN was around 200 ms, in both children with and without musical training, thus still being longer than in older children and adults. With fMRI, it was observed that children at the age of 10 show an activation pattern in the right hemisphere that is strongly reminiscent of that of adults (with clear activations of inferior frontolateral cortex elicited by Neapolitan chords; Koelsch et al. 2005b). In this study, children also showed an effect of musical training, notably a stronger activation of the right pars opercularis in musically trained children (as in adults, see above). In 11-year-olds, the ERAN has a latency of around 180 ms, regardless of musical training, and is practically indistinguishable from the ERAN observed in adults (Jentschke et al. 2005). As in 9-year-olds, 11-year-old children with musical training show a larger ERAN than children without musical training (Jentschke et al. 2005). With regard to its scalp distribution, we previously reported that 5-year-old girls showed a bilateral ERAN, whereas the ERAN was rather left-lateralized in boys (Koelsch, Grossmann et al. 2005; irregular chords were Neapolitans). However, in another study with 5-year-olds (Jentschke et al. 2008; irregular chords were supertonics) no significant gender difference was observed, and nominally the ERAN was even more right-lateralized in boys, and more left-lateralized in girls. 
Thus, when interpreting data obtained from children, gender differences in scalp distribution should be treated with caution. Interestingly, it is likely that, particularly during early childhood, the MMN system is of fundamental importance for music-syntactic processing: MMN is inextricably linked to the establishment and maintenance of representations of the acoustic environment, and thus to the processes of auditory scene analysis. The main determinants of MMN comprise the standard formation process (because deviance detection is reliant on the standard representations held in sensory memory), detection and separation of auditory objects, and the organization of sequential sounds in memory. These processes are indispensable for the establishment of music-syntactic processing, e.g. when harmonies are perceived within chord progressions, and when the repeated exposure to chord progressions leads to the extraction and memorization of statistical probabilities for chord- or sound-transitions. In addition, because music-syntactic irregularity
and harmonic distance are often related to acoustic deviance (see section about functional significance of the ERAN), the acoustic deviance detection mechanism sometimes provides information about the irregularity (i.e., unexpectedness) of chord-functions, and perhaps even the harmonic distance between some chords. Such information aids the detection of music-syntactic regularities, and the build-up of a structural model. Importantly, when processing an acoustically deviant music-syntactic irregularity, MMN-related processes also draw attention to music.
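The notion of extracting statistical probabilities for chord transitions from repeated exposure can be made concrete with a small sketch. The example below is purely illustrative and is not a model used in any of the studies cited here: it assumes a hypothetical toy corpus of chord progressions written as Roman numerals, estimates first-order transition probabilities by counting, and expresses the unexpectedness of a transition as surprisal in bits. The function names and the corpus are assumptions introduced for illustration. Because it only captures local transitions, the sketch says nothing about the hierarchical, long-distance dependencies discussed in the Conclusions.

```python
import math
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order probabilities P(next chord | current chord) by counting."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent pair (current, next) within each progression.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


def surprisal(probs, current, nxt):
    """Unexpectedness of a transition in bits; infinite if it was never observed."""
    p = probs.get(current, {}).get(nxt, 0.0)
    return math.inf if p == 0.0 else -math.log2(p)


# Hypothetical toy corpus (assumption): harmonic functions as Roman numerals.
corpus = [
    ["I", "IV", "V", "I"],
    ["I", "ii", "V", "I"],
    ["I", "IV", "V", "I"],
    ["I", "IV", "V", "ii"],  # rare ending on the supertonic
]

probs = transition_probabilities(corpus)
regular = surprisal(probs, "V", "I")     # frequent dominant-to-tonic transition
irregular = surprisal(probs, "V", "ii")  # rare dominant-to-supertonic transition
print(f"V -> I: {regular:.3f} bits, V -> ii: {irregular:.3f} bits")
```

In this toy corpus the dominant is followed by the tonic in three of four cases, so an ending on the supertonic carries the higher surprisal. Real listening experience would of course involve far larger and richer input than four progressions.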
8.11
Conclusions
In summary, processing of music-syntactically irregular information is often reflected electrically in an early right anterior negativity (ERAN). The ERAN resembles the MMN with regard to a number of properties, particularly polarity, scalp distribution, time course, sensitivity to acoustic events that mismatch with a preceding sequence of acoustic events, and sensitivity to musical training. Therefore, the ERAN has previously also been referred to as music-syntactic MMN. In cognitive terms, the similarities between MMN and ERAN comprise the extraction of acoustic features required to elicit both ERPs (which is identical for both ERPs), the prediction of subsequent acoustic events, and the comparison of new acoustic information with a predicted sound, processes which presumably overlap strongly for MMN and ERAN. However, there are also differences between music-syntactic processing and the processing of auditory oddballs, the most critical being that the processing of auditory oddballs (as reflected in phMMN and afMMN) is based on a model of regularities that is established based on intersound relationships which are extracted on-line from the acoustic environment. By contrast, music-syntactic processing (as reflected in the ERAN) is based on a structural model which is established with reference to representations of syntactic regularities that already exist in a long-term memory format. In other words, in the case of the processing of auditory oddballs, the predictions are established based on local organizational principles, whereas in the case of music-syntactic processing, the predictions are usually based on the processing of phrase structure grammar involving long-distance dependencies and hierarchical processing. That is, the representations of regularities building the structural model of the acoustic environment are in the case of the MMN more sensory, and in the case of the ERAN more cognitive in nature. 
It is perhaps this difference between ERAN and MMN which leads to the different topographies of neural resources underlying the generation of both
components, with the ERAN usually showing more frontal and less temporal lobe involvement than the MMN. Notably, MMN is inextricably linked to the establishment and maintenance of representations of the acoustic environment, and thus to the processes of auditory scene analysis. These processes are indispensable for the acquisition of representations of music-syntactic regularities during early childhood, e.g. when the repeated exposure to chord progressions leads to the extraction and memorization of statistical probabilities for chord- or sound-transitions. Thus, the mechanisms required for the MMN also represent the foundation for music-syntactic processing.
Acknowledgements
I thank Erich Schröger, Matthew Woolhouse, Aniruddh Patel, Barbara Tillmann, Istvan Winkler, Elyse Sussman, Daniela Sammler, Niko Steinbeis, and Sebastian Jentschke for helpful discussion.
Note
* A shortened version of this chapter has been published as an article in Psychophysiology (Koelsch 2009).
References
Alain, C., Woods, D. L. and Knight, R. T. (1998). A distributed cortical network for auditory sensory memory in humans. Brain Research, 812, 23–37. Alho, K., Tervaniemi, M., Huotilainen, M., Lavikainen, J., Tiitinen, H., Ilmoniemi, R. J., Knuutila, J. and Näätänen, R. (1996). Processing of complex sounds in the human auditory cortex as revealed by magnetic brain responses. Psychophysiology, 33, 369–375. Besson, M. and Faita, F. (1995). An event-related potential (ERP) study of musical expectancy: Comparison of musicians with non-musicians. Journal of Experimental Psychology: Human Perception and Performance, 21, 1278–1296. Bharucha, J. and Krumhansl, C. (1983). The representation of harmonic structure in music: Hierarchies of stability as a function of context. Cognition, 13, 63–102. Bigand, E., Madurell, F., Tillmann, B. and Pineau, M. (1999). Effect of global structure and temporal organization on chord processing. Journal of Experimental Psychology: Human Perception and Performance, 25, 184–197. Brattico, E., Tervaniemi, M., Näätänen, R. and Peretz, I. (2006). Musical scale properties are automatically processed in the human auditory cortex. Brain Research, 1117, 162–174.
Brown, H., Butler, D. and Jones, M. R. (1994). Musical and temporal influences on key discovery. Music Perception, 11, 371–391. Carral, V., Huotilainen, M., Ruusuvirta, T., Fellman, V., Näätänen, R. and Escera, C. (2005). A kind of auditory ‘primitive intelligence’ already present at birth. European Journal of Neuroscience, 21, 3201–3204. Carrión, R. E. and Bly, B. M. (2008). The effects of learning on event-related potential correlates of musical expectancy. Psychophysiology, 45, 759–775. Ceponiene, R., Kushnerenko, E., Fellman V., Renlund, M., Suominen, K. and Näätänen, R. (2002). Event-related potential features indexing central auditory discrimination by newborns. Brain Research – Cognitive Brain Research, 13, 101–113. Cheour, M., Ceponiene, R., Leppanen, P., Alho, K., Kujala, T. and Renlund, M. (2002a). The auditory sensory memory trace decays rapidly in newborns. Scandinavian Journal of Psychology, 43, 33–39. Cheour, M., Kushnerenko, E., Ceponiene, R., Fellman, V. and Naatanen, R. (2002b). Electric brain responses obtained from newborn infants to changes in duration in complex harmonic tones. Developmental Neurophysiology, 22, 471–479. Cheour, M., Leppanen, P. H. and Kraus, N. (2000). Mismatch negativity (MMN) as a tool for investigating auditory discrimination and sensory memory in infants and children. Clinical Neurophysiology, 111, 4–16. Deouell, L. Y. (2007). The Frontal Generator of the Mismatch Negativity Revisited. Journal of Psychophysiology, 21, 147–160. Fedorenko, E., Patel, A. D., Casasanto, D., Winawer, J. and Gibson, E. (2009). Structural integration in language and music: Evidence for a shared system. Memory and Cognition, 37, 1–9. Fitch, W. T. and Hauser, M. D. (2004). Computational constraints on syntactic processing in a nonhuman primate. Science, 303, 377–380. Friederici, A. D. (2002). Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences, 6, 78–84. Friederici, A. D. (2005). 
Neurophysiological markers of early language acquisition: From syllables to sentences. Trends in Cognitive Sciences, 9, 481–488. Friederici, A. D., Bahlmann, J., Heim, S., Schubotz, R. I. and Anwander, A. (2006). The brain differentiates human and non-human grammars: Functional localization and structural connectivity. Proceedings of National Academy of Sciences, 103, 2458–2463. Friederici, A. D., Friedrich, M. and Weber, C. (2002). Neural manifestation of cognitive and precognitive mismatch detection in early infancy. Neuroreport, 13, 1251–1254. Friedrich, R. and Friederici, A. D. (2009). Mathematical Logic in the Human Brain: Syntax. PLoS ONE, 4, e5599. doi:10.1371/journal.pone.0005599 Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R. and Pantev, C. (2004). Musical training enhances automatic encoding of melodic contour and interval structure. Journal of Cognitive Neurosciences, 16, 1010–1021. Garrido, M. I., Kilner, J. M., Stephan, K. E. and Friston, K. J. (2009). The mismatch negativity: A review of underlying mechanisms. Clinical Neurophysiology, 120, 453–463. Giard, M. H., Perrin, F., Pernier, J. and Bouchet, P. (1990). Brain generators implicated in the processing of auditory stimulus deviance: A topographic event-related potential study. Psychophysiology, 27, 627–640.
Gomes, H., Molholm, S., Ritter, W., Kurtzberg, D., Cowan, N. and Vaughan, H. G., Jr. (2000). Mismatch negativity in children and adults, and effects of an attended task. Psychophysiology, 37, 807–816. Grimm, S. and Schröger, E. (2005). Preattentive and attentive processing of temporal and frequency characteristics within long sounds. Cognitive Brain Research, 25, 711–721. Gunter, T. C., Friederici, A. D. and Schriefers, H. (2000). Syntactic gender and semantic expectancy: ERPs reveal early autonomy and late interaction. Journal of Cognitive Neuroscience, 12, 556–568. Háden, G. P., Stefanics, G., Vestergaard, M. D., Denham, S. L., Sziller, I. and Winkler, I. (2009). Timbre-independent extraction of pitch in newborn infants. Psychophysiology, 46, 69–74. Heinke, W., Kenntner, R., Gunter, T. C., Sammler, D., Olthoff, D. et al. (2004). Differential Effects of Increasing Propofol Sedation on Frontal and Temporal Cortices: An ERP study. Anesthesiology, 100, 617–625. Heinke, W. and Koelsch, S. (2005). The effects of anesthetics on brain activity and cognitive function. Current Opinion in Anesthesiology, 18, 625–631. Herrojo, R. M., Koelsch, S. and Bhattacharya, J. (2009). Decrease in early right alpha band phase synchronization and late gamma band oscillations in processing syntax in music. Human Brain Mapping, 30, 1207–1225. Janata, P., Birk, J., Van Horn, J. D., Leman, M., Tillmann, B. and Bharucha, J. J. (2002b). The cortical topography of tonal structures underlying Western music. Science, 298, 2167–2170. Janata, P., Tillmann, B. and Bharucha, J. J. (2002a). Listening to polyphonic music recruits domain-general attention and working memory circuits. Cognitive, Affective and Behavioral Neuroscience, 2, 121–140. Jentschke, S., Koelsch, S. and Friederici, A. D. (2005). Investigating the Relationship of Music and Language in Children: Influences of Musical Training and Language Impairment. Annals of the New York Academy of Sciences, 1060, 231–242. 
Jentschke, S., Koelsch, S., Sallat, S. and Friederici, A. D. (2008). Children with Specific Language Impairment also show impairment of music-syntactic processing. Journal of Cognitive Neuroscience, 20, 1940–1951. Koechlin, E. and Jubault, T. (2006). Broca’s area and the hierarchical organization of human behavior. Neuron, 50, 963–974. Koelsch, S. (2000). Brain and Music – A contribution to the investigation of central auditory processing with a new electrophysiological approach. PhD thesis, University of Leipzig, Leipzig, Germany. Koelsch, S. (2005). Neural substrates of processing syntax and semantics in music. Current Opinion in Neurobiology, 15, 1–6. Koelsch, S. (2006). Significance of Broca’s area and ventral premotor cortex for music-syntactic processing. Cortex, 42, 518–520. Koelsch, S. (2009). Music-syntactic processing and auditory memory – similarities and differences between ERAN and MMN. Psychophysiology, 46, 179–190. Koelsch, S., Fritz, T., Schulze, K., Alsop, D. and Schlaug, G. (2005b). Adults and children processing music: An fMRI study. Neuroimage, 25, 1068–1076. Koelsch, S., Grossmann, T., Gunter, T. C., Hahne, A., Schröger, E. and Friederici, A. D. (2003b). Children processing music: Electric brain responses reveal musical competence and gender differences. Journal of Cognitive Neuroscience, 15, 683–693.
Koelsch, S., Gunter, T. C., von Cramon, D. Y., Zysset, S., Lohmann, G. and Friederici, A. D. (2002c). Bach speaks: A cortical ‘language-network’ serves the processing of music. Neuroimage, 17, 956–966. Koelsch, S., Gunter, T. C., Friederici, A. D. and Schröger, E. (2000). Brain Indices of Music Processing: ‘Non-musicians’ are musical. Journal of Cognitive Neuroscience, 12, 520–541. Koelsch, S., Gunter, T. C., Schröger, E. and Friederici, A. D. (2003c). Processing tonal modulations: An ERP study. Journal of Cognitive Neuroscience, 15, 1149–1159. Koelsch, S., Gunter, T. C., Schröger, E., Tervaniemi, M., Sammler, D. and Friederici, A. D. (2001). Differentiating ERAN and MMN: An ERP-study. Neuroreport, 12, 1385–1389. Koelsch, S., Gunter, T. C., Wittfoth, M. and Sammler, D. (2005). Interaction between Syntax Processing in Language and in Music: An ERP Study. Journal of Cognitive Neuroscience, 17, 1565–1579. Koelsch, S., Heinke, W., Sammler, D. and Olthoff, D. (2006). Auditory processing during deep propofol sedation and recovery from unconsciousness. Clinical Neurophysiology, 117, 1746–1759. Koelsch, S., Jentschke, S., Sammler, D. and Mietchen, D. (2007). Untangling syntactic and sensory processing: An ERP study of music perception. Psychophysiology, 44, 476–490. Koelsch, S. and Jentschke, S. (2008). Short-term effects of processing musical syntax: An ERP study. Brain Research, 1212, 55–62. Koelsch, S., Kilches, S., Steinbeis, N. and Schelinski, S. (2008). Effects of unexpected chords and of performer’s expression on brain responses and electrodermal activity. PLoS-ONE, 3(7), e2631. Koelsch, S. and Mulder, J. (2002). Electric brain responses to inappropriate harmonies during listening to expressive music. Clinical Neurophysiology, 113, 862–869. Koelsch, S., Maess, B., Grossmann, T. and Friederici, A. D. (2003a). Electric brain responses reveal gender differences in music processing. Neuroreport, 14, 709–713. Koelsch, S. and Sammler, D. (submitted). 
Cognitive components of processing auditory regularities. Koelsch, S., Schmidt, B. H. and Kansok, J. (2002a). Influences of musical expertise on the ERAN: An ERP-study. Psychophysiology, 39, 657–663. Koelsch, S., Schröger, E. and Gunter, T. C. (2002b). Music Matters: Preattentive musicality of the human brain. Psychophysiology, 39, 1–11. Koelsch, S., Schröger, E. and Tervaniemi, M. (1999). Superior pre-attentive auditory processing in musicians. Neuroreport, 10, 1309–1313. Koelsch, S. and Siebel, W. (2005). Towards a neural basis of music perception. Trends in Cognitive Sciences, 9, 578–584. Korzyukov, O. A., Winkler, I., Gumenyuk, V. I. and Alho, K. (2003). Processing abstract auditory features in the human auditory cortex. Neuroimage, 20, 2245–2258. Krumhansl, C. and Kessler, E. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89, 334–368. Krumhansl, C. and Toivainen, P. (2001). Tonal Cognition. In R. J. Zatorre and I. Peretz (Eds.), The Biological Foundations of Music. Annals of the New York Academy of Sciences, 930, 77–91. Kujala, A., Huotilainen, M., Hotakainen, M., Lennes, M., Parkkonen, L., Fellman, V. and Näätänen, R. (2004). Speech-sound discrimination in neonates as measured with MEG. Neuroreport, 15, 2089–2092.
Kushnerenko, E., Winkler, I., Horváth, J., Näätänen, R., Pavlov, I., Fellman, V. and Huotilainen, M. (2007). Processing acoustic change and novelty in newborn infants. European Journal of Neurosciences, 26, 265–274. Lamont, A. and Cross, I. (1994). Children’s cognitive representations of musical pitch. Music Perception, 12, 68–94. Leino, S., Brattico, E., Tervaniemi, M. and Vuust, P. (2007). Representation of harmony rules in the human brain: Further evidence from event-related potentials. Brain Research, 1142, 169–177. Leman, M. (2000). An auditory model of the role of short-term memory in probe-tone ratings. Music Perception, 17, 481–509. Lerdahl, F. (2001). Tonal Pitch Space. Oxford: Oxford University Press. Liebenthal, E., Ellingson, M. L., Spanaki, M. V., Prieto, T. E., Ropella, K. M. and Binder, J. R. (2003). Simultaneous ERP and fMRI of the auditory cortex in a passive oddball paradigm. Neuroimage, 19, 1395–1404. Loui, P., Grent-’t Jong, T., Torpey, D. and Woldorff, M. (2005). Effects of attention on the neural processing of harmonic syntax in Western music. Cognitive Brain Research, 25, 589–998. Magne, C., Schön, D. and Besson, M. (2006). Musician children detect pitch violations in both music and language better than nonmusician children: Behavioral and electrophysiological approaches. Journal of Cognitive Neuroscience, 18, 199–211. Maidhof, C. and Koelsch, S. (2008). Effects of selective attention on neurophysiological correlates of syntax processing in music and speech. Submitted. Maess, B., Koelsch, S., Gunter, T. C. and Friederici, A. D. (2001). ‘Musical Syntax’ is processed in the area of Broca: An MEG-study. Nature Neuroscience, 4, 540–545. Meyer, L. B. (1956). Emotion and Meaning in Music. Chicago: Chicago University Press. McCarthy, G. and Donchin, E. (1981). A metric for thought: A comparison of P300 latency and reaction time. Science, 211, 77–80. Miranda, R. A. and Ullman, M. T. (2007). 
Double dissociation between rules and memory in music: An event-related potential study. Neuroimage, 38, 331–345. Molholm, S., Martinez, A., Ritter, W., Javitt, D. C. and Foxe, J. J. (2005). The neural circuitry of pre-attentive auditory change-detection: An fMRI study of pitch and duration mismatch negativity generators. Cerebral Cortex, 15, 545–551. Moreno, S. and Besson, M. (2006). Musical training and language-related brain electrical activity in children. Psychophysiology, 43, 287–291. Näätänen, R., Jacobsen, T. and Winkler, I. (2005). Memory-based or afferent processes in mismatch negativity (MMN): A review of the evidence. Psychophysiology, 42, 25–32. Näätänen, R., Lehtokoski, A., Lennes, M., Cheour, M., Huotilainen, M., Iivonen, A., Vainio, M., Alku, P., Ilmoniemi, R. J., Luuk, A., Allik, J., Sinkkonen, J. and Alho, K. (1997). Languagespecific phoneme representations revealed by electric and magnetic brain responses. Nature, 385, 432–434. Novitski, N., Huotilainen, M., Tervaniemi, M., Näätänen, R. and Fellman, V. (2007). Neonatal frequency discrimination in 250–4000-Hz range: Electrophysiological evidence. Clinical Neurophysiology, 118, 412–419. Opitz, B. and Friederici, A. D. (2007). Neural basis of processing sequential and hierarchical syntactic structures. Human Brain Mapping, 28, 585–592. Opitz, B., Rinne, T., Mecklinger, A., von Cramon, D. Y. and Schröger, E. (2002). Differential contribution of frontal and temporal cortices to auditory change detection: fMRI and ERP Results. Neuroimage, 15, 167–174.
Paavilainen, P., Arajärvi, P. and Takegata, R. (2007). Preattentive detection of nonsalient contingencies between auditory features. Neuroreport, 18, 159–163. Paavilainen, P., Degerman, A., Takegata, R. and Winkler, I. (2003). Spectral and temporal stimulus characteristics in the processing of abstract auditory features. Neuroreport, 14, 715–718. Paavilainen, P., Jaramillo, M. and Näätänen, R. (1998). Binaural information can converge in abstract memory traces. Psychophysiology, 35, 483–487. Paavilainen, P., Simola, J., Jaramillo, M., Näätänen, R. and Winkler, I. (2001). Preattentive extraction of abstract feature conjunctions from auditory stimulation as reflected by the mismatch negativity (MMN). Psychophysiology, 38, 359–365. Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6, 674–681. Patel, A. D., Gibson, E., Ratner, J., Besson, M. and Holcomb, P. (1998). Processing syntactic relations in language and music: An event-related potential study. Journal of Cognitive Neuroscience, 10, 717–733. Patel, A. D., Iversen, J. R., Wassenaar, M. and Hagoort, P. (2008). Musical syntactic processing in agrammatic Broca’s aphasia. Aphasiology, 22, 776–789. Poulin-Charronnat, B., Bigand, E. and Koelsch, S. (2006). Processing of musical syntax tonic versus subdominant: An event-related potential study. Journal of Cognitive Neuroscience, 18, 1545–1554. Pulvermüller, F. and Shtyrov, Y. (2003). Automatic processing of grammar in the human brain as revealed by the mismatch negativity. Neuroimage, 20, 159–172. Regnault, P., Bigand, E. and Besson, M. (2001). Different brain mechanisms mediate sensitivity to sensory consonance and harmonic context: Evidence from auditory event-related brain potentials. Journal of Cognitive Neuroscience, 13, 241–255. Riemann, H. (1877). Musikalische Syntaxis: Grundriss einer harmonischen Satzbildungslehre. Niederwalluf: Sändig. Rinne, T., Degerman, A. and Alho, K. (2005). 
Superior temporal and inferior frontal cortices are activated by infrequent sound duration decrements: An fMRI study. Neuroimage, 26, 66–72. Rohrmeier, M. (2005). Towards modelling movement in music: Analysing properties and dynamic aspects of pc set sequences in Bach’s chorales. MPhil Thesis. University of Cambridge. Darwin College Research Reports 04. (http://www.dar.cam.ac.uk/dcrr/dcrr004.pdf) Rüsseler, J., Altenmüller, E., Nager, W., Kohlmetz, C. and Münte, T. F. (2001). Event-related brain potentials to sound omissions differ in musicians and non-musicians. Neuroscience Letters, 308, 33–36. Saarinen, J., Paavilainen, P., Schröger, E., Tervaniemi, M. and Näätänen, R. (1992). Representation of abstract attributes of auditory stimuli in the human brain. Neuroreport, 3, 1149–1151. Sambeth, A., Huotilainen, M., Kushnerenko, E., Fellman, V. and Pihko, E. (2006). Newborns discriminate novel from harmonic sounds: a study using magnetoencephalography. Clinical Neurophysiology, 117, 496–503. Sammler, D. (2008). The Neuroanatomical Overlap of Syntax Processing in Music and Language – Evidence from Lesion and Intracranial ERP Studies. PhD thesis, University of Leipzig. Schellenberg, E. G., Bigand, E., Poulin-Charronnat, B., Garnier, C. and Stevens, C. (2005). Children’s implicit knowledge of harmony in Western music. Developmental Science, 8, 551–566.
Schön, D. and Besson, M. (2005). Visually induced auditory expectancy in music reading: A behavioural and electrophysiological study. Journal of Cognitive Neuroscience, 17, 694–705. Schön, D., Magne, C. and Besson, M. (2004). The music of speech: Music training facilitates pitch processing in both music and language. Psychophysiology, 41, 341–349. Schonwiesner, M., Novitski, N., Pakarinen, S., Carlson, S. Tervaniemi, M. and Naatanen, R. (2007). Heschl’s gyrus, posterior superior temporal gyrus, and mid-ventrolateral prefrontal cortex have different roles in the detection of acoustic changes. Journal of Neurophysiology, 97, 2075–2082. Schröger, E. (2007). Mismatch negativity – a microphone into auditory memory. Journal of Psychophysiology, 21, 138–145. Schröger, E., Bendixen, A., Trujillo-Barreto, N. J. and Roeber, U. (2007). Processing of abstract rule violations in audition. PLoS ONE, 2, e1131. Schubotz, R. I. (2007). Prediction of external events with our motor system: Towards a new framework. Trends in Cognitive Sciences, 11, 211–218. Shtyrov, Y., Hauk, O. and Pulvermüller, F. (2004). Distributed neuronal networks for encoding category-specific semantic information: The mismatch negativity to action words. European Journal of Neuroscience, 19, 1083–1092. Slevc, L. R., Rosenberg, J. C. and Patel, A. D. (2009). Making Psycholinguistics Musical: Selfpaced reading time evidence for shared processing of linguistic and musical syntax. Psychonomic Bulletin and Review, 16, 374–381. Stefanics, G., Háden, G., Huotilainen, M., Balázs, L., Sziller, I., Beke, A., Fellman, V. and Winkler, I. (2007). Auditory temporal grouping in newborn infants. Psychophysiology, 44, 697–702. Steinbeis, N. and Koelsch, S. (2008). Shared neural resources between music and language indicate semantic processing of musical tension-resolution patterns. Cerebral Cortex, 18, 1169–1178. Steinbeis, N., Koelsch, S. and Sloboda, J. (2006). 
The role of harmonic expectancy violations in musical emotions: Evidence from subjective, physiological, and neural responses. Journal of Cognitive Neuroscience, 18, 1380–1393. Sussman, E. S. (2007). A new view on the MMN and attention debate: The role of context in processing auditory events. Journal of Psychophysiology, 21, 164–170. Sussman, E. S., Kujala, T., Halmetoja, J., Lyytinen, H., Alku, P. and Näätänen, R. (2004). Automatic and controlled processing of acoustic and phonetic contrasts. Hearing Research, 190, 128–140. Tekman, H. G. and Bharucha, J. J. (1998). Implicit knowledge versus psychoacoustic similarity in priming of chords. Journal of Experimental Psychology: Human Perception and Performance, 24, 252–260. Tervaniemi, M., Rytkönen, M., Schröger, E., Ilmoniemi, R. J. and Näätänen, R. (2001). Superior formation of cortical memory traces for melodic patterns in musicians. Learning and Memory, 8, 295–300. Tillmann, B., Bharucha, J. J., Bigand, E. (2000). Implicit learning of tonality: A self-organized approach. Psychological Review, 107, 885–913. Tillmann, B., Janata, P. and Bharucha, J. J. (2003). Activation of the inferior frontal cortex in musical priming. Cognitive Brain Research, 16, 145–161.
Tillmann, B., Janata, P., Birk, J. and Bharucha, J. J. (2008). Tonal centers and expectancy: Facilitation or inhibition of chords at the top of the harmonic hierarchy? Journal of Experimental Psychology: Human Perception and Performance, 34, 1031–1043. Tillmann, B., Koelsch, S., Escoffier, N., Bigand, E., Lalitte, P., Friederici, A. D. and von Cramon, D. Y. (2006). Cognitive priming in sung and instrumental music: Activation of inferior frontal cortex. Neuroimage, 31, 1771–1782. van den Brink, D., Brown, C. M. and Hagoort, P. (2001). Electrophysiological evidence for early contextual influences during spoken-word recognition: N200 versus N400 effects. Journal of Cognitive Neuroscience, 13, 967–985. Weber, G. (1817). Versuch einer geordneten Theorie der Tonsetzkunst, 2 vols. Mainz: B. Schott. Widmann, A., Kujala, T., Tervaniemi, M., Kujala, A. and Schröger, E. (2004). From symbols to sounds: Visual symbolic information activates sound representations. Psychophysiology, 41, 709–715. Winkler, I. (2007). Interpreting the Mismatch Negativity. Journal of Psychophysiology, 21, 147–160. Winkler, I., Kujala, T., Tiitinen, H., Sivonen, P., Alku, P., Lehtokoski, A., Czigler, I., Csépe, V., Ilmoniemi, R. J. and Näätänen, R. (1999). Brain responses reveal the learning of foreign language phonemes. Psychophysiology, 36, 638–642. Winkler, I., Kushnerenko, E., Horváth, J., Ceponiene, R., Fellman, V., Huotilainen, M., Näätänen, R. and Sussman, E. (2003). Newborn infants can organize the auditory world. Proceedings of the National Academy of Sciences, 100, 11812–11815. Woolhouse, M. H. and Cross, I. (2006). An interval cycle-based model of pitch attraction. In Proceedings of the 9th International Conference on Music Perception & Cognition, Bologna, Italy (pp. 763–771). Ylinen, S., Shestakova, A., Huotilainen, M., Alku, P. and Näätänen, R. (2006). Mismatch negativity (MMN) elicited by changes in phoneme length: A cross-linguistic study. Brain Research, 1072, 175–185.
chapter 9
On the psychophysiology of aesthetics
Automatic and controlled processes of aesthetic appreciation
Thomas Jacobsen
Helmut Schmidt University / University of the Federal Armed Forces Hamburg / University of Leipzig, Germany
9.1
Aesthetic appreciation
Aesthetics deals with aspects of the visual, plastic or fine arts, such as painting, drawing, sculpture, architecture, film-making, photography, print-making and the like, and with the performing arts: opera, music, theatre and dance. It is also concerned with literature and other arts, including applied arts, and with the beauty of natural settings and everyday objects. This enormous variety in mental processes is reflected in the numerous different unconscious and conscious memory processes involved. Aesthetic processing is also a uniquely human faculty, which makes for a very interesting, albeit complex, research topic (e.g., Jacobsen 2006).
Aesthetic processing, in general, can be divided into receptive, central and productive processes (see Figure 1; Jacobsen and Höfel 2003). Aesthetic appreciation encompasses receptive and central processes. Productive processes relate to overt behavior, i.e. aesthetic expressions such as painting, music, poetry, dance, theater and the like. There are varied qualities of aesthetic appreciation that recruit different sub-processes. For instance, aesthetic judgment, aesthetic contemplation, and aesthetic distraction can be counted among aesthetic appreciation. All of these entail perception, i.e. receptive processes.
Aesthetic distraction can be observed when a person involuntarily switches attention towards the aesthetic processing of an entity, e.g., an object that has very high aesthetic appeal, or someone who is strikingly beautiful. Aesthetic contemplation, on the other hand, requires thinking. It involves reflection about a subjective evaluation, but it does not necessarily include an overt response or a judgment. One might think about the aesthetic value of an object without coming to a conclusion in the end. Aesthetic judgment also requires reflective thinking. Furthermore, it has a product, the judgment. This may take the form of an evaluative categorization and a decision whether the entity is beautiful, harmonic, elegant, etc., or not. Finally, there may be an overt expression of the decision (judgment report).
Figure 1. Schematic representation of sub-processes involved in aesthetic processing. Aesthetic processing includes receptive, central, and productive sub-processes. Aesthetic appreciation of beauty is a sub-form of aesthetic processing which mainly includes receptive and central processes. In the present experiment, aesthetic judgments of beauty were performed. Other aspects of aesthetic appreciation of beauty may be aesthetic distraction or aesthetic contemplation, for example. (Adapted with permission from Höfel and Jacobsen 2007b.)
This chapter will focus on receptive and central aspects of aesthetic processing, that is, on aesthetic appreciation. It will highlight the contributions of the unconscious and conscious memory processes involved.
When we take the different art forms mentioned above into consideration, it becomes readily obvious that the number and complexity of the mental component processes and stages of processing involved in an episode of aesthetic appreciation can vary dramatically. Contemplating the beauty of a nameless exotic flower differs vastly from enjoying a staging of Wagner’s Ring of the Nibelung. Expertise with the subject matter, naturally, determines the levels of processing that can be reached at all. Furthermore, there is virtually nothing, it seems, that cannot be appreciated aesthetically. The human mind accommodates all the different entities, objects, designs, works of art, and individual tastes. Are there limits to the extent of variation in aesthetic appreciation, limits to the seeming arbitrariness? Understanding the mental modes of processing appears to be crucial for answering this question. One approach to illuminating this issue is to look at automatic versus controlled processing.
9.2
Automatic and controlled processing
Situations of aesthetic appreciation differ vastly in complexity. A glimpse of an unknown but beautiful flower may be quick and relatively simple. It may engage only (a few) automatic mental processes, whereas a night at the opera requires a multitude of mental processes, automatic and controlled, operating in concert. There are hierarchies and cascades of processes that lock into each other. Some instances of aesthetic appreciation, like aesthetic contemplation and aesthetic judgment, entail controlled mental processing at the highest level of processing (e.g., Jacobsen and Höfel 2003; Leder et al. 2006). The case is less clear for aesthetic distraction or mere aesthetic preference (for beauty). In any case, numerous cognitive processes forming the informational basis for aesthetic appreciation are carried out automatically. The present chapter attempts to trace some of these processes by describing underlying memory systems, processes, and representations.
9.3
Memory systems, processes and representations
In this section, I review memory systems, processes and representations that have been implicated in the literature, or should be implicated, as contributing to aesthetic appreciation. These can be modality-specific, or even domain-specific within a given sensory modality, as can be postulated for music and speech, for instance. Or they can be supra-modal, central and perhaps also domain-general. Memory, obviously, is affected greatly by expertise at different levels of processing. Whether or not changes to perceptual organization based on training, frequent exposure or the like should be considered memory proper can be a matter of debate; nonetheless, they are included in this chapter. Apparently, memory operates at all levels of cognitive processing and contributes to aesthetic appreciation at each of them. The present chapter traces several lines of research in aesthetic appreciation. These do not constitute mutually exclusive postulations of memory systems; rather, they are lines of research that set different foci, such that their objectives may overlap in terms of memory systems.
9.3.1
Mere exposure / familiarity
The mere exposure account of aesthetic appreciation holds that familiarity through repeated exposure to an entity leads to increased liking (e.g., Zajonc 1968). This is a very well established account (see Berlyne 1971) that can operate outside
the cognitive framework as well. A behaviorist stimulus-response analysis is sufficient to make predictions about an individual’s aesthetic preference. In a recent study, Tinio and Leder (2009) investigated effects of familiarization on the aesthetic judgment of graphic patterns. They used stimuli that varied in symmetry as well as in complexity (Jacobsen and Höfel 2002) in different mere exposure protocols, and they could indeed show substantial effects using massive familiarization. Mere exposure is one of the candidate concepts accounting for the appeal and effects of advertising. Familiarization at shallow levels of processing is used to induce positive attitudinal effects. These processes, in general, occur automatically and non-consciously.
9.3.2
Fluency theory
Fluency theory of aesthetic appreciation holds that we generally prefer entities that we process fluently over those that we process less fluently (Reber et al. 2004). Apart from aspects of the architecture of the sensory system as reflected for instance by the Gestalt laws of visual perception, there are effects of perceptual training, the acquisition of expertise in a sensory domain, or simply sensory conventions that are culturally determined and that vary over time. Semantic memory and conceptual knowledge affect processing top-down. Taken together, these factors affect the perceptual fluency with which a stimulus is processed. Fluency theory is a broad concept that overlaps with almost all memory concepts mentioned in this chapter in one way or another. Research will have to determine under which conditions of aesthetic appreciation the fluency concept can account for most of the variance in the data, and under which conditions other theoretical notions hold more explanatory value (see e.g. the subsection on attitudes).
9.3.3
Procedural knowledge / specialized perceptual systems
We command a wealth of procedural knowledge that we most often engage automatically. What does this have to do with aesthetic appreciation, one may ask. Quite a lot, because our perceptual strategies, our ways of seeing, hearing, touching, and tasting things, depend on procedural knowledge. Perceptual strategies are massively historically, culturally and socially relative. When we have learned to see the world differently, the world becomes different. This bears directly on aesthetic appreciation, because procedural knowledge controls the construction of the mental representations that form the basis of aesthetic appreciation.
In a series of ERP studies of aesthetic judgment, for instance, we have observed that the cortical ERP components up to about 250 ms after stimulus onset in a visual protocol were not affected by the intention to make an aesthetic judgment or by beauty. The N170 component that is related to face processing was not modulated by the intention to make an aesthetic judgment in a study by Roye et al. (2008). Cognitive information construction takes place before intentional evaluation with respect to beauty sets in. In a similar vein, ERPs reflecting visual processes of novel graphic patterns were not affected by evaluative versus descriptive judgment processing in Jacobsen and Höfel (2003). Culturally and historically determined viewing strategies affect aesthetic appreciation. Starting with literacy, our media usage affects and determines perceptual processes and, with them, aesthetic appreciation. One example of culturally determined viewing habits affecting aesthetic judgment has been reported by Jacobsen and Wolsdorff (2007) in a study of the Bauhaus color-form icon. In audition, language and musical training have vast effects on auditory object formation (see the chapters on MMN and music in this volume). The perceptual construction of mental representations in aesthetic appreciation uses procedural knowledge. In a study of eye movements and ocular fixations, for instance, Nodine et al. (1993) observed that trained art viewers employed visual scanning patterns that differed from those of untrained individuals. That is, experts constructed information about the compositional design of a given visual artwork in a different way than non-experts. Procedural knowledge certainly operates at various levels of processing during aesthetic appreciation. While some processes employ conventional procedural knowledge that is culturally determined and therefore available to most individuals of a given group, expert knowledge requires substantial training and refinement.
Future research will help to plot the respective contributions of these kinds of procedural memory against each other.
9.3.4
Prototypes
Prototypes are long-term memory representations of semantic categories (Rosch 1975). They guide classification of perceptual objects via goodness of fit to a central exemplar of a category, i.e. the prototype. Hence they are part of semantic knowledge, but do receive special mention in the course of this chapter, because a body of work on their role in aesthetic appreciation has been compiled. Whitfield and Slatter (1979) showed that prototypicality of items of furniture had an effect on the aesthetic choices participants made. Furniture that fit a particular prototype better was preferred aesthetically over items with a poorer fit.
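The notion of goodness of fit to a prototype can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions, not a model taken from the studies cited here: it assumes that objects can be described by numeric feature vectors, that the prototype is the feature-wise mean of known exemplars, and that fit decreases with Euclidean distance from that mean. The function names and the two chair features are hypothetical.

```python
import math

def prototype(exemplars):
    """Feature-wise mean of equal-length feature vectors (assumed prototype)."""
    n = len(exemplars)
    return [sum(column) / n for column in zip(*exemplars)]

def goodness_of_fit(item, proto):
    """Map distance to the prototype onto (0, 1]; 1.0 means identical to it."""
    return 1.0 / (1.0 + math.dist(item, proto))

# Hypothetical chairs described by two invented features
# (seat height in m, backrest height in m); the values are illustrative only.
chairs = [[0.45, 0.90], [0.50, 1.00], [0.55, 1.10]]
proto = prototype(chairs)

typical = [0.50, 1.00]   # close to the category average
unusual = [0.20, 1.80]   # far from the category average

fit_typical = goodness_of_fit(typical, proto)
fit_unusual = goodness_of_fit(unusual, proto)
```

Under these assumptions the typical chair receives the higher fit value. Whether a higher fit translates into a stronger aesthetic preference is an empirical question that the sketch does not settle.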
In product design, the so-called MAYA principle, “most advanced, yet acceptable” (Raymond Loewy; see Hekkert et al. 2003), captures the idea that good design should seek to differentiate itself from the average while still being accommodated into the prototype for this design. A new Volkswagen Golf, for example, has to be different from its predecessor while still clearly being a Golf. If a beholder can successfully accommodate an object into the knowledge system, this object, on average, is likely to have greater aesthetic appeal than a very unusual object, or one that cannot be accommodated at all. Hekkert and colleagues (2003) showed that objects are likely to receive higher appraisal when they also feature a certain degree of novelty. Therefore, typicality and novelty are joint predictors of aesthetic preference. In a study on the appreciation of paintings, Hekkert and van Wieringen (1990) showed that prototypicality and complexity are determinants of the appraisal of cubist paintings (see also Martindale and Moore 1988, for a study on prototypes affecting preference). Our ERP studies on aesthetic appreciation (Jacobsen and Höfel 2003; Höfel and Jacobsen 2007a, 2007b; Roye et al. 2008) revealed time courses that suggest that perceptual processing takes place prior to aesthetic evaluation. Therefore, it may be argued that prototypes, if a beholder commands them, are automatically activated during the perceptual processing of an object that underlies aesthetic appreciation. Prototypes matter in the construction of a mental representation of an object. This, of course, does not imply that prototypicality is sufficient to account for aesthetic preference. Future research will have to determine under which conditions prototypicality can account for the bulk of the variance in aesthetic appreciation, and under which it cannot (see, for instance, the section on attitudes).
9.3.5
Semantic memory
Semantic memory, our conceptual knowledge system in a broader sense, plays a major role in aesthetic appreciation. In a study of expert and non-expert beholders, Hekkert and van Wieringen (1996) observed that expertise in visual art affects the aesthetic appreciation of works of art. In their study, experts weighted perceived originality of an artwork more strongly than non-experts did. Knowledge about art history, art theory and the like determines the richness, complexity, and precision of the mental representation of an artwork that a beholder is able to construct. Giving viewers background information about an artwork, for instance titles of differing degrees of elaborateness, therefore affects aesthetic appreciation (Leder et al. 2006). In a similar vein, Nodine et al. (1993) suggested that trained
art viewers constructed information about the compositional design of a given visual artwork in a different way than non-experts. This may in part be considered to be driven by procedural knowledge, but the search for particular pieces of information is guided by expert conceptual knowledge. Expert and non-expert judges alike often spontaneously use associations based on world knowledge in aesthetic processing, for instance when asked to produce color-form assignments (Jacobsen 2002). Participants also use spontaneous associations retrieved from semantic (and episodic) memory as informational cues for their evaluative processing, for instance in aesthetic judgments of novel graphic patterns (Jacobsen 2004; Jacobsen and Höfel 2002, 2003; Jacobsen et al. 2006). In addition, beholders command conceptual structures representing various domains of aesthetic appreciation. Semantic memory that is relevant for the aesthetics of objects (Jacobsen et al. 2004) or music (Istók et al. 2009) can be thought of as being represented in networks that engage when aesthetic processing in the relevant domain occurs. Future research will have to carefully determine which levels of representation and types of semantic memory contribute to domain-general and to domain-specific processes of aesthetic appreciation.
9.3.6
Cognitive fluency
Cognitive fluency denotes a variant of the fluency account that emphasizes higher-level cognitive processes. It deals with cognitive mastering, style classification and other related processes (e.g., Leder et al. 2006). For the appreciation of some works of art, having a little background knowledge, some history or even just a title makes all the difference. For a lot of modern art, giving a work a title changes the aesthetic processing considerably. On the other hand, if no background knowledge about the work is conveyed to the beholder, he or she might not have a chance to fully appreciate it. Cognitive fluency deals with these aspects, and also with style classification and other elements of higher cognition in aesthetic appreciation. In particular, being able to construct meaning attached to an artwork, on the basis of title information or other cues, may lead to a pleasurable aesthetic experience (e.g., Millis 2001). Even though access to the knowledge structure is predominantly automatic, active search for the critical bits of information, in a museum for instance, is highly controlled, intentional processing.
9.3.7
Attitudes
Attitudes are memory representations that entail three facets: a knowledge portion, a valence aspect, and a behavioural component. For a lot of entities, we store our evaluation and our behavioural tendency along with our knowledge. Attitudes are activated automatically and allow us to make quick assessments in everyday situations (for a review see e.g., Petty et al. 1997). Attitudes, not surprisingly, also have a powerful influence on aesthetic appreciation. It stands to reason that memory-stored evaluations may frequently override mere exposure, familiarity, or fluency where individual aesthetic preferences are concerned. Ritterfeld (2002) reported a study of interior design preferences. These preferences were governed by attitudes via social heuristic processing if the social marking of an object was decodable. An analysis based on formal features of an object (a piece of furniture) would only be conducted if the attitudinal information was not available or not sufficient. The implicit association test (IAT) is a behavioral experimental technique that allows the automatic activation of attitudes, i.e., the unconscious use of the attitude memory system, to be measured covertly (see e.g., Fazio and Olson 2003). In a study of the thin-body ideal with female undergraduate students, Ahern and colleagues (2008) investigated body dissatisfaction, internalization of the thin ideal, and eating disorder symptoms using the IAT as a measure of automatic attitude activation. (The IAT has also been used in so far unpublished studies of attitudes towards visual arts.) Based on the research on the automatic activation of attitudes, it stands to reason that attitudes may have a strong, fast effect on aesthetic appreciation whenever stimuli are accommodated into an individual’s attitude system. Based on knowledge and expertise, this will very often be the case. In some cases, attitudes may even lead to full-blown memory-based judgments of, for instance, beauty, rather than aesthetic ones.
Partly for these reasons some research on aesthetic appreciation of beauty has resorted to using novel stimuli in order to prevent attitude effects (e.g., Jacobsen 2004; Jacobsen and Höfel 2002, 2003; Höfel and Jacobsen 2003, 2007a, 2007b). Future research will have to illuminate the specific contributions of attitudes to aesthetic appreciation as well as their time courses in the various domains.
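The chapter does not describe how IAT data are scored. As a rough illustration of the general idea, the following Python sketch assumes that the measure rests on response latencies: responses are expected to be slower in blocks whose category pairing conflicts with a participant's attitude, and the latency difference is scaled by the variability of all trials. This is a simplified stand-in for published scoring procedures, which additionally handle errors and trial exclusion. The function name and the latencies are hypothetical.

```python
from statistics import mean, stdev

def iat_score(compatible_ms, incompatible_ms):
    """Latency difference between block types, scaled by the SD of all trials.

    Simplified illustration only: no error penalties, no trimming of
    extreme latencies, no practice blocks.
    """
    pooled_sd = stdev(compatible_ms + incompatible_ms)
    return (mean(incompatible_ms) - mean(compatible_ms)) / pooled_sd

# Invented response latencies in milliseconds for one participant.
compatible = [620, 650, 700, 640]
incompatible = [780, 820, 760, 800]

score = iat_score(compatible, incompatible)
```

With these invented latencies the score is positive, which under the stated assumptions would be read as a stronger automatic association for the pairing used in the compatible block.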
9.3.8
Episodic memory
Episodic memory, most likely, plays a fundamental and very important role in aesthetic appreciation. The systematic study of its involvement appears to be less developed than that of the other memory systems mentioned in the present chapter. In our studies, participants routinely report spontaneously employing associations based on episodic memory as part of the informational bases for their aesthetic judgments or their aesthetic preferences (e.g., Jacobsen 2004; Jacobsen and Höfel 2002, 2003). The brain network subserving aesthetic judgment of novel graphic patterns, as contrasted with a symmetry judgment of the same material, entails structures considered to be involved in the memory of life events (Jacobsen et al. 2006). Again, participants spontaneously make use of associations from episodic memory without being instructed to do so. These findings fit well with the fact that episodic memory is often considered to be activated spontaneously or even automatically. The latter holds especially when contents of strong valence or high emotional importance are involved. Episodic memories may often be revisited and restructured in the process. Episodic memory can even be considered constructive (for a review see, e.g., Schacter and Addis 2007). Automatic activation of the memory of life events can strongly induce (aesthetic) emotions. Konecni (2008) argued that episodic memory is a major, if not the major, factor governing the induction of emotion by music. Listeners prefer, and aesthetically appreciate, music for its sentimental value. The choice of the music per se may be somewhat arbitrary, but the strength of its link to episodic memory is not. This may in fact be the driving force, as Konecni suggests, behind the influential results of Blood and Zatorre (2001), who reported brain correlates of intensely pleasurable music reception. Participants were asked to bring music of their own choice that would reliably induce chills.
Their selection of music, indeed, comprised a wide range of musical styles and genres, suggesting that emotion elicitation is driven more strongly by individual episodic memory associations than by a common denominator in the music itself. In a different vein, Zentner et al. (2008) identified ‘nostalgia’ as one of the nine important aesthetic emotions in music. Using participants’ self-report data, a model of the structure of aesthetic emotions in music was derived based on a cluster analysis approach. Episodic memory may be of differential importance for various content domains of aesthetics and the arts and for different sensory modalities (recall the special status of smell), but its unconscious, as well as conscious, processes clearly render major contributions to aesthetic appreciation. Future research will illuminate these in greater detail.
9.3.9
Scripts/schemata
Scripts, or schemata in more general terms, are highly structured, often hierarchical long-term memory representations that allow us to navigate everyday life without being overtaxed by the most mundane things (see e.g., Abelson 1981). Entering a restaurant, we automatically activate a script that lets us expect to be given a menu after being seated, to make our choice of food and drink, to wait for our order to be brought to the table by a waitress or waiter, and to pay after we have eaten. When everything runs smoothly as expected, we often do not even notice the script that allows us to allocate processing resources to things other than eating and drinking, the conversation for instance. When our script is insufficient, or when someone challenges it or considers it inappropriate, we need to allocate attention to the situation at hand. Scripts are learned, they are historically, culturally, and socially relative, and we command quite a number of them for many types of situations (for an introduction see e.g., Brewer and Nakamura 1984). Schemata also govern many situations in which aesthetic appreciation takes place. Viewing a ready-made work of art like Marcel Duchamp’s “Fountain” in a museum or an art gallery is a fundamentally different experience from viewing the same object in a hardware store, provided that the beholder has activated an art script in the former and a utilitarian everyday script in the latter context. Schemata can also be social relational schemata that help us navigate encounters and interactions with other individuals (e.g., Baldwin 1992). In aesthetic appreciation, too, we have such mental representations of (ritualized) social interaction. Artists often make use of the mental script representations of their audience by introducing deviations and surprises (see Bernstein and Rubin 2004 for a study on the cultural relativity of scripts).
Research on the role of these memory systems in the various domains of aesthetic appreciation is quite scarce. This is so despite the fact that scripts and schemata are central concepts in accounting for the role of situational context. Furthermore, individuals have the capacity to engage schemata actively. The intention to contemplate an object aesthetically can be generated by a beholder, fully independently of an (automatic) activation of an art script. Future research in Experimental Aesthetics will elaborate on the pivotal role that schemata play in many aesthetic receptive situations.
9.4
Summary and conclusion
The present chapter has reviewed memory systems that operate at different levels of processing in aesthetic appreciation. Basic sensory, perceptual, and cognitive processing is carried out automatically, and largely unconsciously, providing the informational basis for higher-order processes of aesthetic contemplation or judgment. Our electrophysiological results, predominantly from ERP studies, show that aesthetic judgment of graphic patterns, faces and musical cadences is preceded by the construction of information based on various memory systems operating unconsciously.
References Abelson, R. P. (1981). Psychological status of the script concept. American Psychologist, 36, 715–729. Ahern, A. L., Bennett, K. M. and Hetherington, M. M. (2008). Internalization of the ultra-thin ideal: Positive implicit associations with underweight fashion models are associated with drive for thinness in young women. Eating Disorders, 16, 294–307. Baldwin, M. W. (1992). Relational Schemas and the Processing of Social Information. Psychological Bulletin, 112, 461–484. Berlyne, D. E. (1971). Aesthetics and Psychobiology. New York: Appleton-Century-Crofts. Bernstein, D. and Rubin, D. C. (2004). Cultural life scripts structure recall from autobiographical memory. Memory & Cognition, 32, 427–442. Blood, A. J. and Zatorre, R. J. (2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences, 98, 11818–11823. Brattico, E., Brattico, P. and Jacobsen, T. (2009). The origins of the aesthetic enjoyment of music. A review of the literature. Musicae Scientiae, Special Issue, 15–39. Brewer, W. F. and Nakamura, G. V. (1984). The nature and functions of schemas. Urbana-Champaign: University of Illinois Press. Fazio, R. H. and Olson, M. A. (2003). Implicit measures in social cognition research: Their meaning and use. Annual Review of Psychology, 54, 297–327. Istók, E., Brattico, E., Jacobsen, T., Krohn, K. and Tervaniemi, M. (2009). Aesthetic responses to music: A questionnaire study. Musicae Scientiae, 13, 183–206. Hekkert, P. and van Wieringen, P. C. W. (1990). Complexity and prototypicality as determinants of the appraisal of cubist paintings. British Journal of Psychology, 81, 483–495. Hekkert, P. and van Wieringen, P. C. W. (1996). Beauty in the eye of expert and nonexpert beholders: A study in the appraisal of art. American Journal of Psychology, 109, 389–407. Hekkert, P., Snelders, D. and van Wieringen, P. C. W. (2003). 
‘Most advanced, yet acceptable’: Typicality and novelty as joint predictors of aesthetic preference in industrial design. British Journal of Psychology, 94, 111–124.
256 Thomas Jacobsen
Höfel, L., Lange, M. and Jacobsen, T. (2007). Beauty and the teeth: Perception of tooth color and its influence on the overall judgment of facial attractiveness. International Journal of Periodontics & Restorative Dentistry, 27, 349–357. Höfel, L. and Jacobsen, T. (2003). Temporal stability and consistency of aesthetic judgments of beauty of formal graphic patterns. Perceptual and Motor Skills, 96, 30–32. Höfel, L. and Jacobsen, T. (2007a). Electrophysiological indices of processing aesthetics: Spontaneous or intentional processes? International Journal of Psychophysiology, 65, 20–31. Höfel, L. and Jacobsen, T. (2007b). Electrophysiological indices of processing symmetry and aesthetics: A result of judgment categorization or judgment report? Journal of Psychophysiology, 21, 9–21. Jacobsen, T. (2002). Kandinsky’s questionnaire revisited: Fundamental correspondence of basic colors and forms? Perceptual and Motor Skills, 95, 903–913. Jacobsen, T. (2004). Individual and group modeling of aesthetic judgment strategies. British Journal of Psychology, 95, 41–56. Jacobsen, T. (2006). Bridging the arts and sciences: A framework for the psychology of aesthetics. Leonardo, 39, 155–162. Jacobsen, T., Buchta, K., Köhler, M. and Schröger, E. (2004). The primacy of beauty in judging the aesthetics of objects. Psychological Reports, 94, 1253–1260. Jacobsen, T. and Höfel, L. (2002). Aesthetic judgments of novel graphic patterns: Analyses of individual judgments. Perceptual and Motor Skills, 95, 755–766. Jacobsen, T. and Höfel, L. (2003). Descriptive and evaluative judgment processes: Behavioral and electrophysiological indices of processing symmetry and aesthetics. Cognitive, Affective and Behavioral Neuroscience, 3, 289–299. Jacobsen, T., Schubotz, R. I., Höfel, L. and v. Cramon, D. Y. (2006). Brain correlates of aesthetic judgment of beauty. Neuroimage, 29, 276–285. Jacobsen, T. and Wolsdorff, C. (2007). Does history affect aesthetic preference? 
Kandinsky’s teaching of colour-form correspondence, empirical aesthetics, and the Bauhaus. The Design Journal, 10, 16–27. Konecni, V. J. (2008). Does music induce emotion? A theoretical and methodological analysis. Psychology of Aesthetics, Creativity, and the Arts, 2, 115–129. Leder, H., Carbon, C. C. and Ripsas, A. L. (2006). Entitling art: Influence of title information on understanding and appreciation of paintings. Acta Psychologica, 121, 176–198. Martindale, C. and Moore, K. (1988). Priming prototypically and preference. Journal of Experimental Psychology: Human Perception and Performance, 14, 661–670. Millis, K. (2001). Making meanings brings pleasure: The influence of titles on aesthetic experiences. Emotion, 1, 320–329. Nodine, C. F., Locher, P. J. and Krupinski, E. A. (1993). The role of formal art training on perception and aesthetic judgment of art compositions. Leonardo, 26, 219–227. Petty, R. E., Wegener, D. T. and Fabrigar, L. R. (1997). Attitudes and attitude change. Annual Review of Psychology, 48, 609–647. Reber, R., Schwarz, N. and Winkielman, P. (2004). Processing Fluency and Aesthetic Pleasure: Is Beauty in the Perceiver’s Processing Experience? Personality and Social Psychology Review, 8, 364–382. Ritterfeld, U. (2002). Social heuristics in interior design preferences. Journal of Environmental Psychology, 22, 369–386. Rosch, E. (1975). Cognitive Representations of Semantic Categories. Journal of Experimental Psychology: General, 104, 192–233.
Roye, A., Höfel, L. and Jacobsen, T. (2008). Aesthetics of faces: Behavioural and electrophysiological indices of evaluative and descriptive judgment processes. Journal of Psychophysiology, 22, 41–57. Schacter, D. L. and Addis, D. R. (2007). The cognitive neuroscience of constructive memory: Remembering the past and imagining the future. Philosophical Transactions of the Royal Society, London, Biological Sciences, 362, 773–786. Tinio, P. L. and Leder, H. (2009). Just how stable are stable aesthetic features? Symmetry, complexity, and the jaws of massive familiarization. Acta Psychologica, 130, 241–250. Whitfield, T. W. A. and Slatter, P. E. (1979). The effects of categorization and prototypicality on aesthetic choice in a furniture selection task. British Journal of Psychology, 70, 65–75. Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology Monograph Supplements, 9(2 Pt 2), 1–27. Zentner, M., Grandjean, D. and Scherer, K. R. (2008). Emotions evoked by the sound of music: Characterization, classification, and measurement. Emotion, 8, 494–521.
Appendix
Using electrophysiology to study unconscious memory representations
Alexandra Bendixen
Institute for Psychology of the Hungarian Academy of Sciences, Budapest, Hungary / Institute for Psychology I, University of Leipzig, Germany
10.1 Basic principles of electroencephalography (EEG)
Studying unconscious memory essentially taps into phenomena which cannot be verbally reported by the participant. Consequently, there is a need for measurement techniques that do not require overt responses. This appendix introduces one such technique, the electroencephalogram (EEG), as well as two main measures that can be derived from it, namely event-related potentials (ERPs) and oscillatory activity. The EEG technique is only briefly outlined here; for more comprehensive treatments, the reader is referred to Fabiani et al. (2007), Handy (2005), Luck (2005) as well as Rugg and Coles (1995). Electroencephalography is a relatively old technique for measuring brain activity (Berger 1929) that has kept its advantages in terms of non-invasiveness and low cost of the measurement. Most importantly, however, the EEG provides a temporal resolution on the order of milliseconds, which is superior to any other non-invasive technique. The beauty of this temporal accuracy lies in the acquisition of an on-line measure of brain activity. This allows for determining the timecourse of information processing, which is impossible with composite measures such as response times and error rates because they comprise all processes from stimulus perception to response execution. Unlike behavioral measures, the EEG also allows for tracking the processing of unattended stimuli. Moreover, the EEG facilitates the investigation of clinical groups and infants who may not be able to give a verbal or motor response. In principle, the EEG measures the electrical activity of the brain through electrodes placed on the scalp. Metal electrodes are specifically prepared to enhance conductivity, attached to the scalp surface, and connected to an amplifier.
Electrical activity can then be measured as a variation of voltage over time between an electrode of interest and an electrically neutral reference electrode. Although electric neutrality is an idealized assumption, some feasible reference positions have become common practice; these include the nose, earlobes, and mastoid sites. The EEG signal at a given electrode site thus consists of a voltage difference relative to the reference electrode.

For placing the electrodes of interest, the so-called 10–20 system has been developed as a standard scheme (Jasper 1958). As illustrated in Figure 1, anatomical landmarks are defined by the nasion (Nz), the inion (Iz), and the bilateral pre-auricular points (A1, A2). The distances between these points are divided in steps of 10 and 20%, respectively, to yield standard electrode locations labeled by their general region with C (central, midway between nasion and inion), F (frontal), Fp (frontopolar), P (parietal), O (occipital), and T (temporal). The last character of each label denotes the electrode position from left to right, with ‘z’ (‘zero’) indicating the midline (50% between the pre-auricular points), odd numbers denoting the left hemisphere (increasing towards the left pre-auricular point), and even numbers denoting the right hemisphere (increasing towards the right pre-auricular point). The 10–20 system was later expanded (Chatrian et al. 1985; Oostenveld and Praamstra 2001) to include intermediate electrode positions. The majority of electrophysiological studies use placement schemes derived from the 10–20 system, which facilitates comparability between studies (Nuwer et al. 1998; Picton et al. 2000).

Figure 1. Placement of electrodes in the 10–20 system (Jasper 1958). Intermediate positions are labeled accordingly, e.g., ‘C1’ halfway between Cz and C3, and ‘FCz’ halfway between Fz and Cz (Chatrian et al. 1985; Oostenveld and Praamstra 2001).

The obtained EEG signal reflects summated electrical activity in the brain, whether it was caused by neuronal activity (mainly postsynaptic potentials) or by other sources such as muscle activity, blood flow, and eye movements. The impact of the latter contributors can be reduced by appropriate recording, filtering, and artifact correction procedures. The remaining signal is thought to reflect neuronal activation mainly generated in vertically aligned pyramidal cells (Allison et al. 1986). Activity from other neuronal populations might remain undetectable if their geometric configuration is such that their activation does not summate, or if the population is not large enough: Synchronous activity of about 10,000 neurons is necessary for a voltage to be measurable at the scalp. Therefore, the EEG is insensitive to part of the neuronal activity, which should be kept in mind for its interpretation.

The ongoing EEG signal can be used for a rough assessment of cognitive states such as level of vigilance. Yet the major potential of the technique lies in analyzing the changes in the EEG that are time-locked to an internal or external event in order to track the processing of this event (and to compare it to the processing of a different event, or of the same event under different conditions). Unfortunately, signal changes related to event processing are small compared to background activity caused by ongoing processes in the brain. Nevertheless, it is possible to separate the relevant signal from the background (noise) by presenting the same event numerous times, cutting out EEG epochs around each event, and calculating an average of the resulting EEG segments (Figure 2).
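The averaging procedure can be illustrated with a small simulation. The following is only a sketch under simplifying assumptions that are not taken from any study cited here: the "true" event-related signal is a hypothetical Gaussian-shaped negative deflection, the background EEG is modeled as independent Gaussian noise, and all amplitudes and function names are chosen for illustration. Under these assumptions, the noise remaining in the average should shrink roughly with the square root of the number of epochs.

```python
import math
import random
import statistics

def simulate_epoch(signal, noise_sd, rng):
    """One EEG epoch: the time-locked signal plus independent Gaussian noise."""
    return [s + rng.gauss(0.0, noise_sd) for s in signal]

def average_epochs(epochs):
    """Point-by-point mean across epochs, i.e. the ERP estimate."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]

def residual_noise_sd(erp, signal):
    """Standard deviation of what remains after subtracting the true signal."""
    return statistics.pstdev([e - s for e, s in zip(erp, signal)])

rng = random.Random(0)
# Hypothetical "true" ERP: one negative-going deflection of 5 µV (assumed shape).
signal = [-5.0 * math.exp(-((t - 100) / 20.0) ** 2) for t in range(200)]
noise_sd = 20.0  # assumed background activity, much larger than the signal

for n_epochs in (1, 25, 100, 400):
    epochs = [simulate_epoch(signal, noise_sd, rng) for _ in range(n_epochs)]
    erp = average_epochs(epochs)
    print(n_epochs,
          "residual noise:", round(residual_noise_sd(erp, signal), 2),
          "predicted:", round(noise_sd / math.sqrt(n_epochs), 2))
```

Real background EEG is neither Gaussian nor independent across samples, so the simulation illustrates the principle rather than predicting how many epochs a particular experiment needs.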
The averaging technique is based on the assumption that any activity that is uncorrelated with the processes under study should fluctuate randomly and thus cancel out, whereas the activity of interest should be time-locked to event onset and thus remain as a characteristic potential elicited by the given event under the given conditions. This potential is called the event-related potential (ERP). ERP averaging reduces the EEG background noise by a factor equal to the square root of the number of epochs in the average. A further improvement of the signal-to-noise ratio is obtained by averaging across the ERPs elicited by the same event in different individuals. So-called grand averages are usually formed from the ERPs of 10 to 20 participants. The underlying assumption here is that of general psychology: the elicitation of similar processes and thus similar brain activity in all individuals in the sample. The interpretation of ERP data needs to take into account potential distortions caused by averaging over trials (e.g., Atienza et al. 2005) and over participants (e.g., Winkler et al. 2001). Single-trial ERP analysis remains challenging, yet highly desirable to solve these interpretation issues and to increase the flexibility of ERP protocols (e.g., Atienza et al. 2005; Jongsma et al. 2006; cf. review by Spencer 2005).

Figure 2. Schematic illustration of EEG measurement and ERP derivation. A stimulus is presented numerous times. Signs of stimulus processing are hardly noticeable in the ongoing EEG, but can be revealed by averaging over multiple EEG epochs time-locked to stimulus presentation. The signal-to-noise ratio of the resulting ERP increases with the square root of the number of epochs in the average. Note that the EEG and ERP panels have different scales and that negative is plotted upwards.

ERPs are displayed in voltage-over-time diagrams, with ‘0’ usually denoting event onset (Figure 2). ERP waveforms are characterized by a sequence of positive and negative voltage deflections with local maxima (peaks). These deflections are labeled by their polarity (‘P’ for positive, ‘N’ for negative) and by their ordinal position in the waveform (e.g., ‘N1’ denoting the first negative peak) or by their latency in ms (e.g., ‘N100’ denoting a negative peak occurring 100 ms after stimulus onset). Further subdivisions are indicated by small letters after the name (e.g., ‘N2a’, ‘N2b’ for two subcomponents of the second negative peak in the waveform). Many of these deflections became associated with putative cognitive functions as they were consistently observed under certain experimental conditions. New labels were chosen to reflect this functional aspect (e.g., ‘mismatch negativity’). Alternatively, names can be determined topographically (e.g., ‘left anterior negativity’) when a deflection exhibits a consistent voltage distribution over the scalp.

Associating voltage deflections with a circumscribed topography and functionality leads to the notion of an ERP component. An ERP component is a voltage deflection with a characteristic polarity, latency, amplitude, and topography (together called ‘morphology’) and with typical conditions of its elicitation that allude to the functionality of the underlying process. If an experimental manipulation taps into only a few sensory or cognitive processes that are widely separated in time, the relevant ERP components can be identified by means of the observable voltage peaks. Most of the time, however, processes will overlap such that there is no one-to-one mapping between ERP components and voltage peaks (see Luck 2005, for a thorough treatment of this issue). In this case, it may help to compare the voltage course between two experimental conditions by analyzing the difference wave between the ERPs elicited in the two conditions. Assuming that the processing in the two conditions is similar in many respects, deflections in the difference wave reveal components that are unique to one of the conditions, or that differ in their parameters between conditions.
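The difference-wave logic and the polarity-plus-latency naming scheme can be sketched in a few lines. The waveforms below are invented numbers, and the function names and the condition labels ‘standard’ and ‘deviant’ are illustrative assumptions. Picking the single largest deflection is a deliberate simplification; actual component identification also considers topography and eliciting conditions, as described above.

```python
def difference_wave(erp_a, erp_b):
    """Point-by-point difference between two ERPs sampled on the same time axis."""
    if len(erp_a) != len(erp_b):
        raise ValueError("ERPs must have the same number of samples")
    return [a - b for a, b in zip(erp_a, erp_b)]

def largest_deflection(wave, times_ms):
    """Return (latency_ms, amplitude) of the sample with the largest absolute voltage."""
    index = max(range(len(wave)), key=lambda i: abs(wave[i]))
    return times_ms[index], wave[index]

def peak_label(latency_ms, amplitude):
    """Polarity-plus-latency label in the style of 'N100' or 'P300'."""
    polarity = "P" if amplitude >= 0 else "N"
    return f"{polarity}{round(latency_ms)}"

# Hypothetical ERPs (µV), sampled every 50 ms from 0 to 300 ms after event onset.
times_ms = [0, 50, 100, 150, 200, 250, 300]
erp_standard = [0.0, 1.0, -3.0, 2.0, 0.5, 0.0, 0.0]
erp_deviant = [0.0, 1.0, -3.5, -1.0, 0.5, 0.5, 0.0]

diff = difference_wave(erp_deviant, erp_standard)
latency, amplitude = largest_deflection(diff, times_ms)
print(diff)                            # [0.0, 0.0, -0.5, -3.0, 0.0, 0.5, 0.0]
print(peak_label(latency, amplitude))  # N150
```

In this toy case the two conditions are nearly identical except around 150 ms, so the difference wave isolates a negative deflection that would be hard to read off either ERP alone.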
The comparison of ERPs elicited in different experimental conditions is actually the major strength of the ERP technique, as it allows for determining (1) whether the brain distinguished between two classes of events, and (2) at which point in time the divergence occurred at the latest (because parts of the neuronal activity are not captured by the EEG, differentiation at earlier stages cannot be excluded). The inference of a processing difference is important in its own right, but it becomes even more significant when the difference occurs in the latency range of a well-known component linked with a specific function. An overview of typically elicited components in cognitive paradigms is provided in the second section of this appendix.

In addition to ERPs, there is a second main approach for the analysis of EEG recordings: the transformation into frequency space to derive measures of oscillatory activity. Oscillations occur in the spontaneous EEG (i.e., uncorrelated with the experimental conditions), but also in systematic relation to internal and external events. Event-related oscillatory activity can be found with or without time- and phase-locking to the onset of the relevant event. The former is called ‘evoked’ oscillatory activity and is visible in the average ERP. The latter is called ‘induced’ activity; it results from latency shifts of oscillatory bursts in single trials and
averages out in ERP calculation. Special analysis methods are needed to reveal oscillatory activity. Wavelet analysis is a prominent approach; for illustrative descriptions, see Tallon-Baudry and Bertrand (1999) as well as Herrmann et al. (2005). The analysis of oscillatory event-related activity significantly complements the information provided by ERPs. For instance, oscillations can reflect synchronized activity patterns of neuronal assemblies coding features that belong to the same object (Eckhorn et al. 1988; Gray et al. 1989). The frequency band of this synchronized activity lies in the range of 30–100 Hz (the so-called gamma band). Beyond feature binding, oscillations in different frequency bands are associated with broader cognitive functions including perception, attention, and memory (Başar et al. 2001; Jensen et al. 2007; Ward 2003). For further details about the analysis and interpretation of oscillatory EEG activity, the reader is referred to Engel et al. (2001), Herrmann et al. (2005), and Ward (2003). Both ERPs and oscillatory activity can be characterized by their topographical scalp distribution, including the identification of electrodes with maximum amplitudes. However, obtaining an EEG signal at a given electrode is not equivalent to the brain areas underneath the electrode generating this signal. In principle, an infinite number of generator configurations can account for a given activation on the scalp (electromagnetic inverse problem; Helmholtz 1853). This ambiguity can only be overcome by introducing prior assumptions and constraints. Different source modeling approaches (for review, see Michel et al. 2004) have yielded encouraging results from source analyses of EEG recordings, especially with high-density electrode montages (Lantz et al. 2003). Nevertheless, inferences on generator localizations should always be conceived as approximations. 
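Returning to the distinction between evoked and induced oscillatory activity, the following sketch shows why induced activity vanishes from the average ERP while remaining present in single trials. It is not a wavelet analysis: plain mean squared amplitude serves as a stand-in for power. The 40 Hz frequency, the sampling rate, the trial count, and the uniform latency jitter are assumptions made for illustration.

```python
import math
import random

def trial(freq_hz, latency_shift_s, n_samples=250, srate_hz=1000.0):
    """One simulated trial: a sinusoidal oscillation delayed by latency_shift_s."""
    return [math.sin(2 * math.pi * freq_hz * (i / srate_hz - latency_shift_s))
            for i in range(n_samples)]

def mean_power(wave):
    """Mean squared amplitude, used here as a simple proxy for power."""
    return sum(v * v for v in wave) / len(wave)

def average(trials):
    """Point-by-point mean across trials (what ERP averaging does)."""
    n = len(trials)
    return [sum(samples) / n for samples in zip(*trials)]

rng = random.Random(0)
n_trials, freq_hz = 200, 40.0
cycle_s = 1.0 / freq_hz

# Evoked: identical latency on every trial. Induced: latency jittered across trials.
evoked = [trial(freq_hz, 0.0) for _ in range(n_trials)]
induced = [trial(freq_hz, rng.uniform(0.0, cycle_s)) for _ in range(n_trials)]

for name, trials in (("evoked", evoked), ("induced", induced)):
    power_of_average = mean_power(average(trials))       # survives only if phase-locked
    average_of_power = sum(mean_power(t) for t in trials) / n_trials
    print(name, round(power_of_average, 3), round(average_of_power, 3))
```

Averaging power after computing it trial by trial preserves the induced oscillation, which is the basic reason why time-frequency methods operate on single trials before averaging.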
When spatial information is the focus of interest, the EEG can be complemented by brain imaging techniques such as functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and optical imaging (event-related optical signal, EROS; near-infrared spectroscopy, NIRS). Another possibility, which is closest to the EEG in the activity that it captures, is the magnetoencephalogram (MEG; Hämäläinen et al. 1993). MEG exploits the fact that every current is surrounded by a magnetic field. In recording this magnetic field, MEG provides information that can easily be related to EEG, and can be analyzed by the same principles. Every ERP component has an MEG counterpart, usually denoted by a small ‘m’ after the name of the component (e.g., ‘N1m’). MEG provides fine spatial information unless deep sources are involved (Hillebrand and Barnes 2002). Yet the superior localization in MEG comes at the cost of a much more expensive measurement procedure.

In spite of the poor spatial resolution of the EEG and some other limitations of the technique (partly mentioned above; for comprehensive treatments, see Handy 2005; Luck 2005), important conclusions can be derived from electrophysiological measures when interpretations are made with due caution. Since the majority of findings reported in this book are based on ERPs, the remainder of this appendix introduces major ERP components that are typically found in cognitive paradigms. The description focuses on components elicited by visual and auditory stimuli, as these modalities are much better studied than the other senses.
10.2 Event-related potential components (ERPs)
ERP components can be divided into ones preceding and ones following a specified event (i.e., a stimulus or response). Event-preceding components include the readiness potential (RP) with its lateralized aspect (LRP), and the contingent negative variation (CNV). The RP (also called ‘Bereitschaftspotential’) is a slow negative potential preceding a response; it indicates the preparation of a voluntary movement (Kornhuber and Deecke 1965). If the responding hand (left vs. right) is known in advance, the RP is larger at contralateral than at ipsilateral electrode sites, which can be exploited to calculate the lateralized readiness potential (LRP) as a measure of differential response activation (Coles 1989). The LRP is a powerful tool for investigating whether the information provided by a stimulus was processed far enough to reach the motor system, even when a response is never overtly initiated (e.g., Verleger in this volume). The CNV likewise is a slow negative potential with preparatory or anticipatory characteristics (Walter et al. 1964). The CNV precedes a stimulus rather than a response, but it may also contain a motor component in case the stimulus calls for a response (Brunia and van Boxtel 2001). For recent reviews on anticipatory negativities including CNV subcomponents and their associated functionalities, see Leuthold et al. (2004), Macar and Vidal (2004), as well as van Boxtel and Böcker (2004). After the presentation of a stimulus, characteristic ERP components are elicited that can be categorized into ‘exogenous’ and ‘endogenous’ ones. Exogenous components occur until ca. 200 ms after stimulus onset. They are modality-specific and are assumed to constitute an obligatory response to the external stimulus, reflecting information transmission from the peripheral system to cortical areas as well as sensory cortical processing. 
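The calculation of the lateralized readiness potential mentioned above can be sketched as follows. This is one common formulation, the averaging or ‘double subtraction’ approach, written under assumptions not stated in the text: recordings at the sites C3 (left hemisphere) and C4 (right hemisphere), a contralateral-minus-ipsilateral sign convention, and invented toy waveforms. Other sign conventions are also in use.

```python
def mean_wave(waves):
    """Point-by-point mean across single-trial waveforms."""
    n = len(waves)
    return [sum(samples) / n for samples in zip(*waves)]

def lrp(c3_left, c4_left, c3_right, c4_right):
    """Lateralized readiness potential via the averaging approach.

    Each argument is a list of single-trial waveforms (µV) recorded at the
    assumed sites C3 or C4, for trials requiring a left-hand or a right-hand
    response.
    """
    # Contralateral minus ipsilateral, separately for each responding hand.
    left_hand = [c4 - c3 for c4, c3 in zip(mean_wave(c4_left), mean_wave(c3_left))]
    right_hand = [c3 - c4 for c3, c4 in zip(mean_wave(c3_right), mean_wave(c4_right))]
    # Averaging both differences cancels asymmetries unrelated to the hand.
    return [(l + r) / 2 for l, r in zip(left_hand, right_hand)]

# Toy data: the contralateral site grows more negative than the ipsilateral
# one, and C3 carries a constant +1 µV offset unrelated to response preparation.
c3_left, c4_left = [[1.0, 0.0, -1.0]], [[0.0, -2.0, -4.0]]
c3_right, c4_right = [[1.0, -1.0, -3.0]], [[0.0, -1.0, -2.0]]

print(lrp(c3_left, c4_left, c3_right, c4_right))  # [0.0, -1.0, -2.0]
```

The constant offset at C3 drops out of the result, which shows why the difference is computed for both hands and then averaged: only activity that reverses with the responding hand remains.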
In contrast, endogenous components are less sensitive to physical stimulus properties and more affected by higher-order processes resulting from psychological states of the participant such as the allocation of attention. The independence of endogenous components from physical stimulus parameters is most clearly illustrated by the fact that some can be elicited even by the unexpected absence of an event (Sutton et al. 1967). Endogenous components cover the time range of 100–500 ms after stimulus onset (later responses are considered ‘slow’ potentials, merging into the
preparation- and anticipation-related components mentioned above as event-preceding components). Yet a strict distinction between endogenous and exogenous components does not hold: Almost all components share characteristics of both groups, being influenced by both external and internal factors. Therefore, the endogenous-exogenous dimension can be seen as a continuum rather than a discrete classification (Coles and Rugg 1995), with components occurring at longer latencies usually being influenced to a higher degree by cognitive factors.

The early, ‘sensory’ components include the auditory brainstem responses (ABR) (e.g., Miller et al. 2008), mid-latency responses (MLR) (e.g., Yvert et al. 2001), P1 / P50 (e.g., Nagamoto et al. 1991) and N1 (Näätänen and Picton 1987) in audition, as well as C1 (Jeffreys and Axford 1972a, 1972b), P1 (e.g., Luck et al. 1990) and N1 (e.g., Vogel and Luck 2000) in vision (see also Di Russo et al. 2002). Stimulus processing continues with the elicitation of the P2 (e.g., Crowley and Colrain 2004) and N2 (e.g., Folstein and Van Petten 2008) components between 100 and 200 ms post-stimulus. At the same time, some additional ‘cognitive’ components emerge that are associated with processing specifics depending on attentional and contextual factors. Among these are the processing negativity (PN) (Näätänen, Gaillard, and Mäntysalo 1978), the negative difference (Nd) (Hillyard et al. 1973), and the N2pc (Luck and Hillyard 1994), all indicating sustained processing differences driven by selective attention (e.g., Alho et al. 1986; Muller-Gass and Campbell 2002). In this volume, both Czigler and Verleger illustrate how the N2pc can be used to answer questions of unconscious memory. Negativities of a more transient nature are elicited by stimuli that deviate from a given context; these include the mismatch negativity (MMN) (Näätänen et al. 1978) and the N2b (Näätänen et al. 1982).
MMN (sometimes also called N2a) is elicited when an auditory or visual stimulus violates a regularity that was set up by previous stimuli (Winkler 2007). As MMN elicitation is largely independent of the participant’s attention (Sussman 2007), the component lends itself to the study of unconscious phenomena (see Czigler; Huotilainen and Teinonen; Koelsch; Shtyrov and Pulvermüller; and Winkler in this volume). N2b is more dependent on attention; it indicates that a stimulus has been classified as a deviant event (Novak et al. 1990). If the deviant status of an event is defined not only by a regularity violation but also by some physical deviation from the preceding stimuli, deviance-related components of a more sensory character emerge, such as an N1 enhancement in audition (e.g., Jääskeläinen et al. 2004) and the change-related positivity (CRP) in vision (Czigler in this volume; Kimura et al. 2006). Components associated with deviance detection are often followed by positivities in the P3 latency range. These positive components reflect a number of different processes and have been given different names in different contexts (P300, P3a, P3b, novelty P3, target P3, and the like). P3a and novelty P3 indicate an
involuntary shift of attention towards a deviating or novel event (Friedman et al. 2001). Recent data suggest that the two components are identical (Polich and Criado 2006), their major characteristic being that the attention-capturing event was previously unattended. In contrast, P3b is elicited by task-relevant stimuli (Knight and Scabini 1998) that require a decision or response, as the alternative term ‘target P3’ implies. The name ‘P300’ itself (which is somewhat misleading in view of considerable latency variations between experiments) is mostly used synonymously with ‘P3b’. Partly because of its diversity, the debate on the functional significance of the P3 and its subcomponents is still unsettled (e.g., Barcelo et al. 2006; Linden 2005; Verleger 2008). For a discussion of how P3 may be related to conscious stimulus perception, see Verleger (this volume). In many cases, interpretations are not based on the elicitation of a component per se, but on its modulation by experimental manipulations. For instance, Shtyrov and Pulvermüller (this volume) show how MMN modulations can indicate certain aspects of language processing. Studies with linguistic material have also revealed a number of specifically language-related ERP components (the early left anterior negativity, ELAN; the left anterior negativity, LAN; the N400; and the P600), indicating the detection of syntactic or semantic violations and their re-integration into the linguistic context (Friederici 2002). Shtyrov and Pulvermüller (this volume) provide an overview of these components; Verleger (this volume) illustrates how the N400 can be applied to study awareness. In the domain of music processing, a similarly specific component is given by the early right anterior negativity (ERAN) indicating violations of musical syntax (Koelsch in this volume). There are numerous other components that are found in specific contexts. 
For instance, the N170 is a negative potential at (predominantly right) lateral occipital electrode sites elicited by face relative to non-face stimuli (cf. its application by Jacobsen in this volume). Many more components could be mentioned, but this short outline hopefully provides guidance along the main electrophysiological findings in this book. For further descriptions of relevant ERP components, the reader may consult Coles and Rugg (1995), Fabiani et al. (2007), Fonaryova Key et al. (2005), Luck (2005), as well as Muller-Gass and Campbell (2002).
Acknowledgements
This work was supported by the German Research Foundation (DFG, BE 4284/1-1).
References Alho, K., Paavilainen, P., Reinikainen, K., Sams, M. and Näätänen, R. (1986). Separability of different negative components of the event-related potential associated with auditory stimulus-processing. Psychophysiology, 23, 613–623. Allison, T., Wood, C. C. and McCarthy, G. M. (1986). The central nervous system. In M. G. H. Coles, E. Donchin and S. W. Porges (Eds.), Psychophysiology: Systems, processes, and applications (pp. 5–25). New York: Guilford. Atienza, M., Cantero, J. L. and Quian Quiroga, R. (2005). Precise timing accounts for posttraining sleep-dependent enhancements of the auditory mismatch negativity. Neuroimage, 26, 628–634. Barcelo, F., Escera, C., Corral, M.-J. and Periañez, J. A. (2006). Task switching and novelty processing activate a common neural network for cognitive control. Journal of Cognitive Neuroscience, 18, 1734–1748. Başar, E., Başar-Eroglu, C., Karakaş, S. and Schürmann, M. (2001). Gamma, alpha, delta, and theta oscillations govern cognitive processes. International Journal of Psychophysiology, 39, 241–248. Berger, H. (1929). Über das Elektroenkephalogramm des Menschen [On the human electroencephalogram]. Archiv für Psychiatrie und Nervenkrankheiten, 87, 527–570. Brunia, C. H. M. and van Boxtel, G. J. M. (2001). Wait and see. International Journal of Psychophysiology, 43, 59–75. Chatrian, G. E., Lettich, E. and Nelson, P. L. (1985). Ten percent electrode system for topographic studies of spontaneous and evoked EEG activities. American Journal of EEG Technology, 25, 83–92. Coles, M. G. H. (1989). Modern mind-brain reading: Psychophysiology, physiology, and cognition. Psychophysiology, 26, 251–269. Coles, M. G. H. and Rugg, M. D. (1995). Event-related brain potentials: An introduction. In M. D. Rugg and M. G. H. Coles (Eds.), Electrophysiology of mind: Event-related brain potentials and cognition (pp. 1–26). Oxford: Oxford University Press. Crowley, K. E. and Colrain, I. M. (2004). 
A review of the evidence for P2 being an independent component process: Age, sleep and modality. Clinical Neurophysiology, 115, 732–744. Di Russo, F., Martinez, A., Sereno, M. I., Pitzalis, S. and Hillyard, S. A. (2002). Cortical sources of the early components of the visual evoked potential. Human Brain Mapping, 15, 95–111. Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M. and Reitboeck, H. J. (1988). Coherent oscillations: A mechanism of feature linking in the visual cortex? Multiple electrode and correlation analyses in the cat. Biological Cybernetics, 60, 121–130. Engel, A. K., Fries, P. and Singer, W. (2001). Dynamic predictions: Oscillations and synchrony in top-down processing. Nature Reviews Neuroscience, 2, 704–716. Fabiani, M., Gratton, G. and Federmeier, K. D. (2007). Event-related brain potentials: Methods, theory, and applications. In J. T. Cacioppo, L. G. Tassinary and G. G. Berntson (Eds.), Handbook of psychophysiology (3rd ed., pp. 85–119). Cambridge: Cambridge University Press. Folstein, J. R. and Van Petten, C. (2008). Influence of cognitive control and mismatch on the N2 component of the ERP: A review. Psychophysiology, 45, 152–170.
Fonaryova Key, A. P., Dove, G. O. and Maguire, M. J. (2005). Linking brainwaves to the brain: An ERP primer. Developmental Neuropsychology, 27, 183–215. Friederici, A. D. (2002). Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences, 6, 78–84. Friedman, D., Cycowicz, Y. M. and Gaeta, H. (2001). The novelty P3: An event-related brain potential (ERP) sign of the brain’s evaluation of novelty. Neuroscience and Biobehavioral Reviews, 25, 355–373. Gray, C. M., König, P., Engel, A. K. and Singer, W. (1989). Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature, 338, 334–337. Hämäläinen, M., Hari, R., Ilmoniemi, R. J., Knuutila, J. and Lounasmaa, O. V. (1993). Magnetoencephalography – theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics, 65, 413–497. Handy, T. C. (Ed.). (2005). Event-related potentials: A methods handbook. Cambridge: MIT Press. Helmholtz, H. (1853). Ueber einige Gesetze der Vertheilung elektrischer Ströme in körperlichen Leitern mit Anwendung auf die thierisch-elektrischen Versuche [Some laws concerning the distribution of electric currents in volume conductors with applications to experiments on animal electricity]. Annalen der Physik, 165, 211–233. Herrmann, C. S., Grigutsch, M. and Busch, N. A. (2005). EEG oscillations and wavelet analysis. In T. C. Handy (Ed.), Event-related potentials: A methods handbook (pp. 229–259). Cambridge: MIT Press. Hillebrand, A. and Barnes, G. R. (2002). A quantitative assessment of the sensitivity of wholehead MEG to activity in the adult human cortex. Neuroimage, 16, 638–650. Hillyard, S. A., Hink, R. F., Schwent, V. L. and Picton, T. W. (1973). Electrical signs of selective attention in the human brain. Science, 182, 177–180. Jääskeläinen, I. P., Ahveninen, J., Bonmassar, G., Dale, A. M., Ilmoniemi, R. 
J., Levänen, S., Lin, F.-H., May, P., Melcher, J., Stufflebeam, S., Tiitinen, H. and Belliveau, J. W. (2004). Human posterior auditory cortex gates novel sounds to consciousness. Proceedings of the National Academy of Sciences of the United States of America, 101, 6809–6814. Jasper, H. H. (1958). The ten-twenty electrode system of the International Federation. Electroencephalography and Clinical Neurophysiology, 10, 371–375. Jeffreys, D. A. and Axford, J. G. (1972a). Source locations of pattern-specific components of human visual evoked potentials. I. Component of striate cortical origin. Experimental Brain Research, 16, 1–21. Jeffreys, D. A. and Axford, J. G. (1972b). Source locations of pattern-specific components of human visual evoked potentials. II. Component of extrastriate cortical origin. Experimental Brain Research, 16, 22–40. Jensen, O., Kaiser, J. and Lachaux, J.-P. (2007). Human gamma-frequency oscillations associated with attention and memory. Trends in Neurosciences, 30, 317–324. Jongsma, M. L. A., Eichele, T., Van Rijn, C. M., Coenen, A. M. L., Hugdahl, K., Nordby, H. and Quian Quiroga, R. (2006). Tracking pattern learning with single-trial event-related potentials. Clinical Neurophysiology, 117, 1957–1973. Kimura, M., Katayama, J. and Murohashi, H. (2006). Probability-independent and -dependent ERPs reflecting visual change detection. Psychophysiology, 43, 180–189. Knight, R. T. and Scabini, D. (1998). Anatomic bases of event-related potentials and their relationship to novelty detection in humans. Journal of Clinical Neurophysiology, 15, 3–13.
Kornhuber, H. H. and Deecke, L. (1965). Hirnpotentialänderungen bei Willkürbewegungen und passiven Bewegungen des Menschen: Bereitschaftspotential und reafferente Potentiale [Brain potential changes with voluntary and passive human movement: Readiness potential and reafferent potentials]. Pflügers Archiv, 284, 1–17.
Lantz, G., Grave de Peralta, R., Spinelli, L., Seeck, M. and Michel, C. M. (2003). Epileptic source localization with high density EEG: How many electrodes are needed? Clinical Neurophysiology, 114, 63–69.
Leuthold, H., Sommer, W. and Ulrich, R. (2004). Preparing for action: Inferences from CNV and LRP. Journal of Psychophysiology, 18, 77–88.
Linden, D. E. J. (2005). The P300: Where in the brain is it produced and what does it tell us? Neuroscientist, 11, 563–576.
Luck, S. J. (2005). An introduction to the event-related potential technique. Cambridge, MA: MIT Press.
Luck, S. J., Heinze, H. J., Mangun, G. R. and Hillyard, S. A. (1990). Visual event-related potentials index focused attention within bilateral stimulus arrays. II. Functional dissociation of P1 and N1 components. Electroencephalography and Clinical Neurophysiology, 75, 528–542.
Luck, S. J. and Hillyard, S. A. (1994). Spatial filtering during visual search: Evidence from human electrophysiology. Journal of Experimental Psychology: Human Perception and Performance, 20, 1000–1014.
Macar, F. and Vidal, F. (2004). Event-related potentials as indices of time processing: A review. Journal of Psychophysiology, 18, 89–104.
Michel, C. M., Murray, M. M., Lantz, G., Gonzalez, S., Spinelli, L. and Grave de Peralta, R. (2004). EEG source imaging. Clinical Neurophysiology, 115, 2195–2222.
Miller, C. A., Brown, C. J., Abbas, P. J. and Chi, S.-L. (2008). The clinical application of potentials evoked from the peripheral auditory system. Hearing Research, 242, 184–197.
Muller-Gass, A. and Campbell, K. (2002). Event-related potential measures of the inhibition of information processing: I. Selective attention in the waking state. International Journal of Psychophysiology, 46, 177–195.
Näätänen, R., Gaillard, A. W. K. and Mäntysalo, S. (1978). Early selective-attention effect on evoked potential reinterpreted. Acta Psychologica, 42, 313–329.
Näätänen, R. and Picton, T. (1987). The N1 wave of the human electric and magnetic response to sound: A review and an analysis of the component structure. Psychophysiology, 24, 375–425.
Näätänen, R., Simpson, M. and Loveless, N. E. (1982). Stimulus deviance and evoked potentials. Biological Psychology, 14, 53–98.
Nagamoto, H. T., Adler, L. E., Waldo, M. C., Griffith, J. and Freedman, R. (1991). Gating of auditory response in Schizophrenics and normal controls. Effects of recording site and stimulation interval on the P50 wave. Schizophrenia Research, 4, 31–40.
Novak, G. P., Ritter, W., Vaughan, H. G. and Wiznitzer, M. L. (1990). Differentiation of negative event-related potentials in an auditory discrimination task. Electroencephalography and Clinical Neurophysiology, 75, 255–275.
Nuwer, M. R., Comi, G., Emerson, R., Fuglsang-Frederiksen, A., Guérit, J. M., Hinrichs, H., Ikeda, A., Luccas, F. J. C. and Rappelsburger, P. (1998). IFCN standards for digital recording of clinical EEG. Electroencephalography and Clinical Neurophysiology, 106, 259–261.
Oostenveld, R. and Praamstra, P. (2001). The five percent electrode system for high-resolution EEG and ERP measurements. Clinical Neurophysiology, 112, 713–719.
Picton, T. W., Bentin, S., Berg, P., Donchin, E., Hillyard, S. A., Johnson, R., Jr., Miller, G. A., Ritter, W., Ruchkin, D. S., Rugg, M. D. and Taylor, M. J. (2000). Guidelines for using human event-related potentials to study cognition: Recording standards and publication criteria. Psychophysiology, 37, 127–152.
Polich, J. and Criado, J. R. (2006). Neuropsychology and neuropharmacology of P3a and P3b. International Journal of Psychophysiology, 60, 172–185.
Rugg, M. D. and Coles, M. G. H. (Eds.). (1995). Electrophysiology of mind: Event-related brain potentials and cognition. Oxford: Oxford University Press.
Spencer, K. M. (2005). Averaging, detection, and classification of single-trial ERPs. In T. C. Handy (Ed.), Event-related potentials: A methods handbook (pp. 209–227). Cambridge: MIT Press.
Sussman, E. S. (2007). A new view on the MMN and attention debate: The role of context in processing auditory events. Journal of Psychophysiology, 21, 164–175.
Sutton, S., Tueting, P., Zubin, J. and John, E. R. (1967). Information delivery and the sensory evoked potential. Science, 155, 1436–1439.
Tallon-Baudry, C. and Bertrand, O. (1999). Oscillatory gamma activity in humans and its role in object representation. Trends in Cognitive Sciences, 3, 151–162.
van Boxtel, G. J. M. and Böcker, K. B. E. (2004). Cortical measures of anticipation. Journal of Psychophysiology, 18, 61–76.
Verleger, R. (2008). P3b: Towards some decision about memory. Clinical Neurophysiology, 119, 968–970.
Vogel, E. K. and Luck, S. J. (2000). The visual N1 component as an index of a discrimination process. Psychophysiology, 37, 190–203.
Walter, W. G., Cooper, R., Aldridge, V. J., McCallum, W. C. and Winter, A. L. (1964). Contingent negative variation: An electric sign of sensorimotor association and expectancy in the human brain. Nature, 203, 380–384.
Ward, L. M. (2003). Synchronous neural oscillations and cognitive processes. Trends in Cognitive Sciences, 7, 553–559.
Winkler, I. (2007). Interpreting the mismatch negativity. Journal of Psychophysiology, 21, 147–163.
Winkler, I., Schröger, E. and Cowan, N. (2001). The role of large-scale memory organization in the mismatch negativity event-related brain potential. Journal of Cognitive Neuroscience, 13, 59–71.
Yvert, B., Crouzeix, A., Bertrand, O., Seither-Preisler, A. and Pantev, C. (2001). Multiple supratemporal sources of magnetic and electric auditory evoked middle latency components in humans. Cerebral Cortex, 11, 411–423.
Index
A
aesthetic processing 245–246, 251
articulatory movements 197
attentional blink 37, 48–52, 54, 57–58, 63–64, 114
audio-visual 187
auditory objects 71, 73, 77, 80, 85, 87–88, 96, 99, 223, 226, 235
auditory streaming 74, 80–81, 83–85, 91–95, 168, 170–171

B
Bayesian
  belief propagation 155, 157, 175
  inference 148–149, 151, 153, 155–158, 171, 173–175, 177, 263
blindsight 24, 26

C
chord-sequence 230
cognitive fluency 251
coherence field 112
contralateral neglect 24

D
downstream processes 43

E
electroencephalography 153, 214
embedded-processes model 15
event-related potential (ERP) 38–40, 43, 46–49, 51–52, 54, 56, 58, 60, 62, 64, 73–76, 84, 98, 113, 126, 129, 140, 184, 209, 211, 220, 226–227, 229–230, 233–234, 249–250, 255, 261–265
  Bereitschaftspotential 265, 270
  C1 260, 266
  CNV (contingent negative variation) 265, 270–271
  ELAN (early left anterior negativity) 181, 183, 193, 211, 229, 267
  ERAN 210–212, 215, 219–226, 228–237, 267
  LAN 226, 267
  LRP (lateralized readiness potential) 58, 60–62, 265
  MMN (mismatch negativity) 74–80, 82–85, 87–90, 94–100, 117, 119, 121, 124, 137–140, 182–185, 209, 211, 219, 222–226, 232, 234, 249, 266–268
  N1 38, 46, 48, 51–52, 62, 183, 262, 266
  N170 68, 249, 267
  N2 41, 43, 46, 50–51, 60, 63, 266
  N250 44
  N2b 263, 266
  N2pc 46, 48, 53–56, 60, 63, 116, 266
  N400 51–52, 67–68, 137, 181, 183–184, 190, 198–199, 211, 267
  N450 44
  P1 41–44, 46, 48, 51–52, 60, 62, 183, 229, 266
  P250 52
  P2 51–52, 211, 266
  P300 57, 137, 153, 204, 211, 266–267
  P3 38–44, 49–51, 54, 57–58, 63–64, 233, 266–267
  P3a 115, 137, 266
  P3b 115, 137, 233, 266–267
  P600 181, 183–184, 193, 199, 267
  PN (processing negativity) 266
  novelty P3 266
  target P3 266–267
executive function 3

F
feature binding 77–80, 120, 141, 264
feature conjunctions 78–80, 148

G
generative modeling 151
global workspace theory 10, 23

H
harmonic hierarchy 214

I
inattentional blindness 18, 113
intracranial recordings 39

L
lexical processing 180–181, 197

M
magnetic resonance imaging (fMRI) 15, 116, 152, 159, 188, 197, 214, 228–229, 233, 235, 264
magnetoencephalography (MEG) 51, 57, 139–140, 180, 188–189, 191–192, 197, 217, 228–229, 264
Markov models 162
memory 1–25, 27–30, 69–70, 80, 82, 109–113, 123–124, 157, 189, 247–253
  auditory sensory 75–76, 104, 137, 182, 237–238
  episodic 2, 4, 6, 31, 77, 251, 253
  implicit 110, 117, 123
  semantic 6, 12–13, 22, 113–114, 179–181, 186, 189–190, 195, 199, 248–251, 267
  sensory 5, 8, 14, 76, 109, 137–138, 235, 271
  short-term conceptual 113, 130
  working 1–2, 4–19, 21, 23–25, 27–28, 48, 63
morpho-syntactic information processing 191
multi-scale analysis 149–150, 171
music-syntactic 212

N
neapolitan chords 210, 219–221, 231, 234–235
neonatal auditory system 138–139

O
object related 84
oddball stimuli 40, 65

P
perceptual object 72–73, 77, 98–99, 147
perceptual organisation 147–148
phonetic and phonological mismatch 187
phonological 6, 103, 186, 195
predictive coding 154–157, 162, 176
priming 20–22, 198
primitive intelligence 107, 123
probabilistic 148, 151, 153, 158–159
procedural knowledge 249
pseudowords 135–136, 138, 186, 194, 197–198

R
regularity violations 76
rule-learning 135

S
somatotopy model 189
statistical learning 135–136, 145
stream segregation 84–85, 92, 96, 140–141
subliminal perception 22, 65

T
temporally persistent representations 147
temporary buffer 1

V
visuo-spatial buffers 4
Advances in Consciousness Research
A complete list of titles in this series can be found on the publishers’ website, www.benjamins.com
78 Czigler, István and István Winkler (eds.): Unconscious Memory Representations in Perception. Processes and mechanisms in the brain. 2010. x, 274 pp.