March 2011 Volume 15, Number 3 pp. 95–140 Editor Stavroula Kousta Executive Editor, Neuroscience Katja Brose
Journal Manager Rolf van der Sanden
Journal Administrator Myarca Bonsink
Advisory Editorial Board R. Adolphs, Caltech, CA, USA R. Baillargeon, U. Illinois, IL, USA N. Chater, University College, London, UK P. Dayan, University College London, UK S. Dehaene, INSERM, France D. Dennett, Tufts U., MA, USA J. Driver, University College, London, UK Y. Dudai, Weizmann Institute, Israel A.K. Engel, Hamburg University, Germany M. Farah, U. Pennsylvania, PA, USA S. Fiske, Princeton U., NJ, USA A.D. Friederici, MPI, Leipzig, Germany O. Hikosaka, NIH, MD, USA R. Jackendoff, Tufts U., MA, USA P. Johnson-Laird, Princeton U., NJ, USA N. Kanwisher, MIT, MA, USA C. Koch, Caltech, CA, USA M. Kutas, UCSD, CA, USA N.K. Logothetis, MPI, Tübingen, Germany J.L. McClelland, Stanford U., CA, USA E.K. Miller, MIT, MA, USA E. Phelps, New York U., NY, USA R. Poldrack, U. Texas Austin, TX, USA M.E. Raichle, Washington U., MO, USA T.W. Robbins, U. Cambridge, UK A. Wagner, Stanford U., CA, USA V. Walsh, University College, London, UK

Update Book Review
95 How does the brain make economic decisions? Review of: Foundations of Neuroeconomic Analysis (by Paul W. Glimcher)
Antonio Rangel
Opinion
97 What drives the organization of object knowledge in the brain?
Bradford Z. Mahon and Alfonso Caramazza
104 Specifying the self for cognitive neuroscience
Kalina Christoff, Diego Cosmelli, Dorothée Legrand and Evan Thompson
Review
113 Songs to syntax: the linguistics of birdsong
Robert C. Berwick, Kazuo Okanoya, Gabriel J.L. Beckers and Johan J. Bolhuis
122 Representing multiple objects as an ensemble enhances visual cognition
George A. Alvarez
132 Cognitive neuroscience of self-regulation failure
Todd F. Heatherton and Dylan D. Wagner
Editorial Enquiries Trends in Cognitive Sciences
Cell Press 600 Technology Square Cambridge, MA 02139, USA Tel: +1 617 397 2817 Fax: +1 617 397 2810 E-mail: [email protected]

Forthcoming articles
Implicit social cognition: from measures to mechanisms
Brian A. Nosek, Carlee Beth Hawkins and Rebecca S. Frazier
Thalamic pathways for active vision
Robert H. Wurtz, Kerry McAlonan, James Cavanaugh and Rebecca A. Berman
Posterior cingulate cortex: adapting behavior to a changing world
John M. Pearson, Sarah R. Heilbronner, David L. Barack, Benjamin Y. Hayden and Michael L. Platt
Visual Crowding: a fundamental limit on conscious perception and object recognition
David Whitney and Dennis M. Levi
Frontal Pole Cortex: encoding ends at the end of the endbrain
Satoshi Tsujimoto, Aldo Genovesio and Steven P. Wise
Cover: Failing to control one's own behavior underlies several social and mental health problems. On pages 132–139 Todd F. Heatherton and Dylan D. Wagner review a large body of recent psychological and neuroscientific research on self-regulation failures, including addictive or hedonistic behavior, lack of emotional control, as well as stereotyping and prejudicial behavior. The authors propose a model of self-regulation that accounts for self-regulation failures in terms of a loss of balance between prefrontal cortical regions that implement cognitive control and subcortical structures that drive appetitive behaviors. Although facetious, the cover image (Brett Lamb/iStock Vectors/Getty Images) powerfully demonstrates the detrimental effects of loss of control.
Update Book Review
How does the brain make economic decisions? Foundations of Neuroeconomic Analysis by Paul W. Glimcher. Oxford University Press, 2010. $69.95/£40.00 (488 pages) ISBN 978-0-19-974425-1.
Antonio Rangel Division of Humanities and Social Sciences & Computational and Neural Systems, Caltech, 1200 E. California Blvd, Pasadena, CA, USA
For millennia the quest to understand human nature and, in particular, why we behave the way we do, was mostly the domain of religion and philosophy. Over the last two centuries, this quest has become the domain of three scientific disciplines: behavioral neuroscience, psychology and economics. Although these disciplines share a common goal, their methodology and sensibilities are significantly different, which often leads to inconsistent and even contradictory explanations of the same behavioral phenomena. Consider, for example, the basic question of why some individuals become addicted whereas others do not. The most popular economic theory, called the rational addiction model [1], assumes that individuals become addicted as a result of maximizing a strong taste for consuming drugs in the short-term that also increases the desire to consume them in the future. By contrast, current neurobiological theories of addiction are based on the idea that consumption of drugs leads to a systematic malfunction of the brain’s reward learning systems, which induces addicted individuals to consume them even when it is not optimal to do so [2,3]. Neuroeconomics is a relatively new field that seeks to reconcile these conflicting theories of human behavior [4]. The goal of the field is to combine methods and theories from behavioral neuroscience, psychology, economics and computer science to answer the following basic questions: (i) What are the computations made by the brain to make different types of decisions? (ii) How does the underlying neurobiology implement and constrain those computations? (iii) What are the implications of this knowledge for understanding behavior in economic, clinical, policy and legal contexts? The ultimate goal of the field is to produce a computational and neurobiological account of decision-making that can serve as a common foundation for understanding human behavior across the natural and social sciences. 
In this sense, neuroeconomics can be thought of as the realization of the dream outlined by E.O. Wilson in Consilience: The Unity of Knowledge [5]. In Foundations of Neuroeconomic Analysis, Paul Glimcher, one of the founders of the field, outlines his vision for this ambitious research agenda. The book accomplishes several aims with remarkable effectiveness.
Corresponding author: Rangel, A. ([email protected]).
First, it makes the case for bringing all of the parent fields together in a unified and interdisciplinary effort to understand human behavior simultaneously at multiple levels of analysis. Importantly, Glimcher argues that the benefits of this ‘unholy marriage’ flow in all directions: economists and psychologists will benefit from grounding their theories in the reality of how the brain actually makes decisions, and neuroscientists will benefit by being forced to understand the brain at the computational level. Glimcher forcefully argues that this effort will result in a synthetic theory of human behavior that will generate new critical insights for all of the parent disciplines.

Second, the book provides a brilliant introduction to critical ideas in economics and psychology for neuroscientists, and to critical ideas in behavioral and perceptual neuroscience for economists and psychologists. For this reason alone, anyone considering doing research in the computational or neurobiological foundations of decision-making, and anyone interested in why we act the way we do (from lawyers to philosophers), should read this book.

Third, the book reviews some critical findings in the field and argues that they already provide a glimpse of how a unified model of decision-making might look. For example, Glimcher argues that we have begun to understand how the brain computes values, makes choices by comparing those values, and learns those values through a process known as reinforcement learning. He also argues that the existing findings, together with some basic neuroscience ideas such as divisive normalization [6] (a principle explaining how the cortex integrates competing inputs to maximize encoded information while keeping neurons within bounded firing ranges), provide a computational and neurobiological implementation of economic concepts such as prospect theory or random utility.
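The divisive normalization idea mentioned above can be illustrated with a minimal sketch. It assumes one common textbook form of the computation, in which each input is raised to a power and divided by a constant plus the pooled activity of all inputs; this is not necessarily the formulation used in the book or in [6]. The function name, the parameters `sigma`, `exponent` and `r_max`, and the input values are all hypothetical choices made for illustration.

```python
def divisive_normalization(inputs, sigma=1.0, exponent=2.0, r_max=1.0):
    """Normalize each input by the pooled activity of all inputs.

    Assumed form: r_i = r_max * x_i**n / (sigma**n + sum_j x_j**n)
    """
    powered = [x ** exponent for x in inputs]
    # The pool includes a positive constant, so every response stays below r_max.
    pool = sigma ** exponent + sum(powered)
    return [r_max * p / pool for p in powered]


# Hypothetical input drives: the second set is the first scaled by ten.
weak = divisive_normalization([1.0, 2.0, 3.0])
strong = divisive_normalization([10.0, 20.0, 30.0])

print("weak inputs:  ", [round(r, 3) for r in weak])
print("strong inputs:", [round(r, 3) for r in strong])
```

In this sketch, scaling all inputs by ten leaves the ratios between responses unchanged while every response stays below `r_max`, which is the bounded-range property the review describes.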
Although the ideas in this section of the book are controversial, they are also extremely thought-provoking. By necessity, this ambitious book also reflects some of the current shortcomings of this young field. For example, traditional economists are skeptical as to whether the field will provide transformative insights for their discipline [7], and at this early stage it is hard to provide concrete examples against this view. In addition, although I admire Glimcher’s attempt to begin sketching a synthetic model of choice, it can be argued that it might be too early to do so. For example, at this stage in our understanding of the brain’s decision-making circuitry, it is unclear how to reconcile the standard neuroeconomic model proposed in the book with evidence showing that behavior can be
influenced by at least three different behavioral controllers (called the Pavlovian, habitual and goal-directed controllers) that are often at odds with each other [4], or that there might be multiple and competing value learning systems. These caveats notwithstanding, I was truly inspired by this book. It is an impressive piece of scholarly work by one of the world’s most prominent neuroeconomists. Although I have been working in the field for years, it has changed the way I think about many of the open questions we study. The book will probably stir up debate among the parent disciplines about the feasibility and virtues of the neuroeconomics approach. It is beautifully written, with a voice that is scholarly yet accessible at the same time. It will be of interest not only to those working in the field, but also to a wide audience of readers. Finally, I suspect that the thoughtfulness of its arguments and the passion of its rhetoric will inspire a new generation of researchers to stake their careers on the vision outlined
by its author. In fact, in many ways this book might do for neuroeconomics what David Marr’s Vision did for vision science [8].

References
1 Becker, G. and Murphy, K. (1988) A theory of rational addiction. J. Polit. Econ. 96, 675
2 Redish, A.D. (2004) Addiction as a computational process gone awry. Science 306, 1944–1947
3 Redish, A.D. et al. (2008) Addiction as vulnerabilities in the decision process. Behav. Brain Sci. 31, 461–487
4 Rangel, A. et al. (2008) A framework for studying the neurobiology of value-based decision making. Nat. Rev. Neurosci. 9, 545–556
5 Wilson, E.O. (1999) Consilience: The Unity of Knowledge, Vintage
6 Reynolds, J.H. and Heeger, D.J. (2009) The normalization model of attention. Neuron 61, 168–185
7 Bernheim, B.D. (2009) On the potential of neuroeconomics: a critical (but hopeful) appraisal. Am. Econ. J. Microecon. 1, 1–41
8 Marr, D. (1982) Vision, W.H. Freeman and Co.

1364-6613/$ – see front matter doi:10.1016/j.tics.2010.12.006 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3
Opinion
What drives the organization of object knowledge in the brain?
Bradford Z. Mahon1,2 and Alfonso Caramazza3,4
1 Department of Brain and Cognitive Sciences, Meliora Hall, University of Rochester, Rochester, NY 14627, USA
2 Department of Neurosurgery, 601 Elmwood Ave, University of Rochester Medical Center, Rochester, NY 14642, USA
3 Department of Psychology, William James Hall, 33 Kirkland Street, Harvard University, Cambridge, MA 02138, USA
4 Center for Mind/Brain Sciences, University of Trento, Palazzo Fedrigotti, Corso Bettini 31, I-38068 Rovereto (TN), Italy
Various forms of category-specificity have been described at both the cognitive and neural levels, inviting the inference that different semantic domains are processed by distinct, dedicated mechanisms. In this paper, we argue for an extension of a domain-specific interpretation to these phenomena that is based on network-level analyses of functional coupling among brain regions. On this view, domain-specificity in one region of the brain emerges because of innate connectivity with a network of regions that also process information about that domain. Recent findings are reviewed that converge with this framework, and a new direction is outlined for understanding the neural principles that shape the organization of conceptual knowledge.
Corresponding authors: Mahon, B.Z. ([email protected]); Caramazza, A. ([email protected]).

Category-specificity as a means to study constraints on brain organization
Brain-damaged patients with category-specific semantic impairments have conceptual-level impairments that are specific to a category of items, such as animals, fruit/vegetables, nonliving things or conspecifics. Detailed analysis of those patients (Box 1) suggests that conceptual knowledge is organized according to domain-specific constraints [1,2]. According to the domain-specific hypothesis [2], there are innately dedicated neural circuits for the efficient processing of a limited number of evolutionarily motivated domains of knowledge. This interpretation of the neuropsychological phenomenon of category-specific semantic deficits has been extended to interpret results from functional magnetic resonance imaging (fMRI) in healthy subjects [3,4]. Much of the research using fMRI to study category-specificity has focused on the pattern of responses in the ventral visual pathway, which projects from early visual areas to lateral and ventral occipital–temporal regions, and processes object shape, texture and color in ways that are relatively invariant to viewpoint, size and orientation [5–7]. Different regions within the ventral pathway preferentially respond to images of faces, animals, tools, places, written words and body parts [4,6,8–13], see also [13–15].

The existence of consistent topographic biases by semantic category in the ventral stream raises fundamental questions about the principles that determine brain organization [4,10–12,16,17]. To date, the emphasis of research on the organization of the ventral stream has been on the stimulus properties that drive responses in a particular brain region, studied in relative isolation from other regions. This approach was inherited from well-established traditions in neurophysiology and psychophysics where it has been enormously productive for mapping psychophysical continua in primary sensory systems. It does not follow that the same approach will yield equally useful insights for understanding the principles of the neural organization of conceptual knowledge. The reason is that, unlike the peripheral sensory systems, the pattern of neural responses in higher order areas is only partially driven by the physical input – it is also driven by how the stimulus is interpreted, and that interpretation does not occur in a single, isolated region. The ventral object processing stream is the central pathway for the extraction of object identity from visual information in the primate brain – but what the brain does with that information about object identity depends on how the ventral stream is connected to the rest of the brain.

Here, we focus on visual object recognition, as this has been the aspect of object knowledge and processing that has been studied in greatest depth; however, similar principles would be expected to apply to other modalities as appropriate. We argue that there are innately determined patterns of connectivity that mediate the integration of information from the ventral stream with information computed by other brain regions. Those channels are at the grain of a limited number of evolutionarily relevant domains of knowledge. We further suggest that what is given innately is the connectivity, and that specialization by semantic category in the ventral stream is driven by that connectivity.
The implication of this proposal is that the organization of the ventral stream by category is relatively invariant to visually based, bottom-up, constraints. This approach corrects an imbalance in explanations of the causes of the consistent topography by semantic category in the ventral object-processing stream by giving greater prominence to endogenously determined constraints on brain organization.

The distributed domain-specific hypothesis
A domain-specific neural system is a network of brain regions [11] in which each region processes a different type
1364-6613/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.004 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3
Box 1. Cognitive neuropsychological evidence for domain-specific constraints

Patients with category-specific semantic deficits can be differentially or even selectively impaired for knowledge of animals, plants, conspecifics or artifacts (for review see [11]). The knowledge impairment cannot be explained in terms of a differential impairment to a sensory or motor-based modality of information. Although discussion and debate continue as to whether non-categorical dimensions of organization can lead to category-specific brain organization, there is consensus that the phenomenon itself is ‘categorical’ (see Figure I for representative patients’ performance in picture naming and answering semantic probe questions).

There are important parallels between the neuropsychological literature on category-specific semantic deficits and the findings from functional neuroimaging and neurophysiology. First, the categories that emerge from the neuropsychological literature map onto the categories that emerge in functional imaging and neurophysiology. This indicates that the different methods and populations are tracking the same underlying property of brain organization. Second, the resistance of category-specific deficits to be explained by dimensions of organization that do not include semantic category [2] parallels the same pattern that has emerged in imaging and neurophysiology [60]. It is clearly the case that the brain is organized by sensory and motor modalities, and it is also the case that different sensory and motor modalities participate to varying extents in the representation of items from different categories. However, the existence of category-specificity in imaging [4], neurophysiology [67] and neuropsychology [11] cannot be explained exclusively by appeal to modality-based principles of organization. This suggests that the dimensions of brain organization that express themselves as phenomena of category-specificity (across methods and populations) are in fact domain-specific constraints on brain organization.

Finally, there is emerging neuropsychological evidence for endogenous constraints on brain organization, including the existence of category-specific semantic deficits tested at age 16 years after stroke at 1 day of age [patient Adam, see below; ref 68]. There are also parallels between the patterns of category-specific semantic deficits and psychophysical studies of putatively specialized routes for processing specific classes of visual stimuli. For instance, New and colleagues [69], using a change detection paradigm, demonstrated a significant advantage for living animate stimuli. Thorpe and colleagues [70] have demonstrated extremely rapid and accurate detection of face and animal stimuli. Almeida and colleagues [65] have demonstrated that conceptual information about manipulable objects can be extracted from stimuli that are putatively not processed by the ventral visual pathway. These and other findings could indicate experimental ways of isolating domain-specific networks.
[Figure I: two bar-chart panels, not reproduced. Top panel: "Category-specific semantic deficits: picture naming performance by category" (y-axis: percent correct; x-axis: patients RC, EW, RS, MD, KS, APA, CW, PL; key: living animate, fruit/vegetable, nonliving, conspecifics). Bottom panel: "Semantic probe questions by category and modality" (y-axis: percent correct; x-axis: patients EW, GR, FM, DB, RC, ADAM; key: living: visual/perceptual, nonliving: visual/perceptual, living: nonvisual, nonliving: nonvisual).]
Figure I. Representative patients with category-specific semantic deficits. Patients with category-specific semantic deficits may have selective impairments for naming items from one category of items compared to other categories (top panel). Those patients may also have categorical impairments for answering questions about all types of object properties (i.e., visual/perceptual and functional/associative; bottom panel). For further discussion and references to the patients shown here, see [11].
of information about the same domain or category of objects [2,18]. The types of information processed by different parts of a network can be sensory, motor, affective or conceptual. The range of potential domains or classes of items that can have dedicated neural circuits is restricted to those with an evolutionarily relevant history that could have biased the system toward a coherent organization. A second important characteristic of domain-specific systems is that the computations that must be performed over items from the domain are sufficiently ‘eccentric’ [19] so as to merit a specialized process. In other words, the coupling across different brain regions that is necessary for successful processing of a given domain is different in kind from the types of coupling that are needed for other domains of knowledge. For instance, the need to integrate motor-relevant information with visual information is present for tools and
other graspable objects and less so for animals or faces. By contrast, the need to integrate affective information, biological motion processing and visual form information is strong for conspecifics and animals, and less so for tools or places. Thus, our proposal is that domain-specific constraints are expressed as patterns of connectivity among regions of the ventral stream and other areas of the brain that process nonvisual information about the same classes of items. For instance, specialization for faces in the lateral fusiform gyrus (fusiform face area [20–22]) arises because that region of the brain has connectivity with the amygdala and the superior temporal sulcus (among other regions) which are important for the extraction of socially relevant information and biological motion. Specificity for tools and manipulable objects in the medial fusiform gyrus is driven, in part, by connectivity between that region and regions of parietal cortex that subserve object manipulation [23–26]. Connectivity-based constraints can also be responsible for other effects of category-specificity in the ventral visual stream, such as connectivity between somatomotor areas and regions of the ventral stream that differentially respond to body parts [27–29] (extrastriate body area), connectivity between left lateralized frontal language processing regions and ventral stream areas specialized for printed words (visual word form area [30,31]), and connectivity between regions involved in spatial analysis and ventral stream regions showing differential responses to highly contextualized stimuli, such as houses, scenes and large non-manipulable objects (parahippocampal place area [32]).
The role of visual experience
According to the distributed domain-specific hypothesis, the organization by category in the ventral stream is not only a reflection of the visual structure of the world, it also reflects the structure of how ventral visual cortex is connected to other regions of the brain [11,23,33]. However, visual experience and dimensions of visual similarity are also crucial in shaping the organization of the ventral stream [34,35] – after all, the principal afferents to the ventral stream come from earlier stages in the visual hierarchy [36]. Although some authors have recently discussed nonvisual dimensions that could be relevant in shaping the organization of the ventral stream [4,6,7], many accounts differentially weight the contribution of visual experience in their explanation of the causes of category-specific organization within the ventral stream.

Several hypotheses have been developed, and we merely touch on them here to illustrate a common assumption: that the organization of the ventral stream reflects the visual structure of the world, as interpreted by domain-general processing constraints. Thus, the general thrust of those accounts is that the visual structure of the world is correlated with semantic category distinctions in a way that is captured by how visual information is organized in the brain. One of the most explicit proposals is that there are weak eccentricity preferences in higher order visual areas that are inherited from earlier stages in the processing stream. Those eccentricity biases interact with our experience of foveating some classes of items (e.g. faces) and viewing others in
the relative periphery (e.g. houses) [37]. Another class of proposals is based on the suppositions that items from the same category tend to look more similar than items from different categories, and similarity in visual shape is mapped onto the ventral occipital–temporal cortex [17]. It has also been proposed that a given category could require differential processing relative to other categories, for instance in terms of expertise [38], visual crowding [39] or the relevance of visual information for categorization [40]. Other accounts appeal to ‘feature’ similarity and distributed feature maps [41]. Finally, it has been suggested that multiple, visually based, dimensions of organization combine super-additively to generate the boundaries among category-preferring regions [12]. Common to all of these accounts is the assumption that visual experience provides the necessary structure, and that a visual dimension of organization happens to be highly correlated with semantic category.

Although visual information is important in shaping how the ventral stream is organized, recent findings indicate that visual experience is not necessary in order for the same, or similar, patterns of category-specificity to be present in the ventral stream. In an early positron emission tomography study, Buchel and colleagues [42] showed that congenitally blind subjects show activation for words (presented in Braille) in the same region of the ventral stream as sighted individuals (presented visually). Pietrini and colleagues [43] used multi-voxel pattern analysis to show that the pattern of activation over voxels in the ventral stream was more consistent across different exemplars within a category than exemplars across categories. More recently, we [44] have shown that the same medial-to-lateral bias in category preferences on the ventral surface of the occipital–temporal cortex that is present in sighted individuals is present in congenitally blind subjects.
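The within-category versus across-category comparison attributed above to Pietrini and colleagues [43] can be sketched in a few lines. This is only an illustration of the general logic of comparing pattern correlations, assuming Pearson correlation as the similarity measure; it is not the analysis pipeline of [43], and the function names, voxel values and category labels are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length response patterns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def within_between_similarity(patterns, labels):
    """Mean pattern correlation for same-category and different-category pairs."""
    within, between = [], []
    for i, j in combinations(range(len(patterns)), 2):
        r = pearson(patterns[i], patterns[j])
        (within if labels[i] == labels[j] else between).append(r)
    return sum(within) / len(within), sum(between) / len(between)


# Made-up toy data: four exemplars, each a response pattern over four voxels.
patterns = [
    [1.0, 0.9, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.9],
    [0.2, 0.1, 0.8, 1.0],
]
labels = ["animal", "animal", "tool", "tool"]

within_r, between_r = within_between_similarity(patterns, labels)
print(f"within-category r = {within_r:.2f}, between-category r = {between_r:.2f}")
```

A higher mean correlation for same-category pairs than for different-category pairs is the signature described in the text; with real data the comparison would also need a statistical test across subjects.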
Specifically, nonliving things, compared to animals, elicit stronger activation in medial regions of the ventral stream (Figure 1). Although these studies on category-specificity in blind individuals represent only a first-pass analysis of the role of visual experience in driving category-specificity in the ventral stream, they indicate that visual experience is not necessary in order for category-specificity to emerge in the ventral stream. This fact raises an important question – if visual experience is not needed for the same topographical biases in category-specificity to be present in the ventral stream, then, what drives such organization? One possibility, as we have suggested, is innate connectivity between regions of the ventral stream and other regions of the brain that process affective, motor and conceptual information.

Connectivity as an innate domain-specific constraint
A crucial component of the distributed domain-specific hypothesis is the notion of connectivity. The most obvious candidate to mediate such networks is white matter connectivity. However, it is important to underline that functional networks need not be restricted by the grain of white matter connectivity and, perhaps more importantly, task- and state-dependent changes could bias processing toward different components of a broader anatomical brain network. For instance, connectivity between lateral
[Figure 1: line plots, not reproduced. Title: "Category-specific organization does not require visual experience". Rows: sighted, picture viewing; sighted, auditory task; congenitally blind, auditory task. Columns: left ventral ROI; right ventral ROI. y-axis: t values (living - nonliving); x-axis: Tal. Coord. X Dim.]
Figure 1. Congenitally blind and sighted participants were presented with auditorily spoken words of living things (animals) and nonliving things (tools, non-manipulable objects) and were asked to make size judgments about the referents of the words. The sighted participants were also shown pictures corresponding to the same stimuli in a separate scan. For sighted participants viewing pictures, the known finding was replicated that nonliving things such as tools and large non-manipulable objects lead to differential neural responses in medial aspects of the ventral occipital–temporal cortex. This pattern of differential BOLD responses for nonliving things in medial aspects of the ventral occipital–temporal cortex was also observed in congenitally blind participants and sighted participants performing the size judgment task over auditory stimuli. These data indicate that the medial-to-lateral bias in the distribution of category-specific responses does not depend on visual experience. For details of the study, see [44].
and orbital prefrontal regions and the ventral occipital–temporal cortex [45,46] is crucial for categorization of visual input. It remains an open question whether multiple functional networks are subserved by this circuit, each determined by the type of visual stimulus being categorized. For instance, when categorizing manipulable objects, connectivity between parietofrontal somatomotor areas and prefrontal cortex could dominate, whereas when categorizing faces other regions could express stronger functional coupling to those same prefrontal regions. Such a suggestion would generate the expectation that whereas damaging prefrontal-to-ventral stream connections could
result in difficulties categorizing all types of visual stimuli, disruption of the afferents to the prefrontal cortex from a specific category-preferring area could lead to categorization problems selective to that domain. The neural basis of the connectivity that supports domain-specific neural systems is, admittedly, in need of further development and articulation. Below, we will return to expectations that can be drawn from this explanation.

Evidence for innate constraints
The signature of innate structure is similarity across individuals, both within a species and potentially across
species. ‘Innate’ does not imply ‘present-from-birth’, although present-from-birth strongly suggests an innate contribution. Maturation in the context of the right types of experience could be necessary for the expression of innate structure, and interactions between innate and experiential factors can jointly constrain outcome [47]. This is particularly the case for mental processes, as there would be nothing to process without the content provided by experience. Several lines of evidence show that genetic variables capture similarity in functional brain organization as it relates to the presence of domain-specific neural circuits.

Twin studies
Two recent reports highlight greater neural or functional similarity between monozygotic twin pairs than between dizygotic twin pairs (for discussion see [48,49]). The strength of these studies is that experiential contributions are held constant across the two types of twin pairs. In an fMRI study, Polk and colleagues [50] studied the similarity between twin pairs in the distribution of responses to faces, houses, pseudowords and chairs in the ventral stream. The authors found that face and place-related responses within face and place selective regions, respectively, were significantly more similar for monozygotic than for dizygotic twins. In another study, Wilmer and colleagues [51] studied the face recognition and memory abilities [52] in monozygotic and dizygotic twin pairs. The authors found that the correlation in performance on the face recognition task for monozygotic twins was more than double that for dizygotic twins. This difference was not present for control tasks of verbal and visual memory, indicating selectivity in the genetic contribution to behavioral abilities (see also [53]).
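One classical way to turn twin correlations such as those described above into a rough estimate of genetic contribution is Falconer's formula, h² = 2(r_MZ − r_DZ). The sketch below illustrates that formula under its usual assumptions (additive genetic effects and equally shared environments for both twin types); it is not the analysis used in [50] or [51], and the function name and correlation values are hypothetical.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's estimate of heritability: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations, chosen only for illustration
# (monozygotic correlation exactly double the dizygotic one).
r_mz, r_dz = 0.6, 0.3
h2 = falconer_heritability(r_mz, r_dz)
print(f"estimated heritability: {h2:.2f}")
```

When the monozygotic correlation is about twice the dizygotic one, as in this toy case, the estimate is close to the monozygotic correlation itself, which is the pattern expected if shared environment contributes little.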
Congenital prosopagnosia

Further evidence for a genetic contribution to face recognition abilities comes from congenital prosopagnosia, a developmental disorder in which individuals can have selective impairments for recognizing faces [54]. A recent study by Thomas and colleagues [55] found that congenital prosopagnosia was associated with reduced structural integrity of the inferior longitudinal fasciculus, which projects from the fusiform gyrus to anterior regions of the temporal lobe. Reduced structural integrity was also observed for the inferior fronto-occipital fasciculus, which projects from the ventral occipital–temporal cortex to frontal regions. Such observations of reduced integrity of major white matter tracts linking the posterior occipital–temporal cortex with other brain regions underline the strength of a network-level analysis in understanding the constraints that shape the organization of knowledge in the ventral stream.

Non-human primates

An expectation on the view that innate constraints shape category-specificity in the ventral stream is that such specificity, at least for some categories, can also be found in non-human primates. It is well known from neurophysiological recordings that preferences for natural object stimuli exist in the inferior temporal (IT) cortex of
monkeys [35,56], comparable to observations with similar methods in awake human subjects [15]. More recently, functional imaging with macaques [57] and chimpanzees [58] suggests that, at least for the category of faces, clusters of face-preferring voxels comparable to those observed in humans can be found in the temporal cortex of monkeys. Such common patterns of neural organization for some classes of items in monkeys and humans could, of course, be entirely driven by dimensions of visual similarity, which are known to modulate responses in the IT cortex [59]. However, even when serious attempts have been made to explain such responses in terms of dimensions of visual similarity, taxonomic structure emerges over and above the contribution of known visual dimensions. For instance, Kriegeskorte and colleagues [60] used multi-voxel pattern analysis to compare the similarity structure of a large array of different body, face, animal, plant and artifact stimuli in the monkey IT cortex and human occipital–temporal cortex. The similarity among the stimuli was measured in terms of the similarity of the patterns of brain responses they elicited, separately on the basis of the neurophysiological data (monkeys) [56] and fMRI data (humans). The similarity structure that emerged revealed a tight taxonomic structure that was common to monkeys and humans and could not be reduced to known dimensions of visual similarity.

Next steps

Specialization of function in the brain is clearest at the level of primary sensory and motor areas that have a physical organization in the brain that projects topographically onto a psychophysical dimension such as retinotopy, tonotopy or somatotopy. At the other end of the continuum, there are aspects of human cognition that have eluded neat parcellation in the brain, such as the neural instantiation of the abstract and recursive systems that make human thought and metacognition possible.
Somewhere in the middle are conceptual representations – they interface with and draw on the sensory and motor systems and at the same time require the flexibility characteristic of symbolic representations [61]. We have outlined a framework for understanding the causes of category-specific organization in the brain that is based on the hypothesis that there are innate patterns of connectivity that constrain the distribution of category-specific neural regions. This proposal fully embraces a hierarchical view of the organization of conceptual knowledge [3]: the organization of the ventral stream reflects the final product of a complex tradeoff of pressures, some of which are expressed locally within the ventral stream and some of which are expressed as connectivity to the rest of the brain. Our suggestion is that connectivity to the rest of the brain is the first, or broadest, principle according to which the ventral stream comes to be organized by semantic category. Although there is striking overlap in the semantic categories that can dissociate under conditions of brain damage and which show consistent topographic organization in the ventral stream (Box 1), there is some divergence between the lesion locations in patients with category-specific deficits and the patterns of neural activation observed with fMRI. In particular, focal lesions to category-preferring
regions within the ventral stream do not invariably lead to category-specific semantic deficits. This suggests that what is damaged in patients with category-specific semantic deficits are the broader neural circuits that are specialized for the impaired domain of knowledge. Damage to multiple regions within that domain-specific neural circuit could lead to a category-specific deficit by disrupting or disorganizing the broader network. Furthermore, damage to regions that serve to integrate processing across the whole domain, such as the anterior temporal lobes [62,63] for the domains of animals and conspecifics, could particularly disrupt functioning throughout the broader network. A second direction for research that is encouraged by the distributed domain-specific hypothesis is to characterize the patterns of both anatomical and functional connectivity within domain-specific neural circuits. The expectation is that there will be a tight coupling between patterns of connectivity and the locations of category-preferring regions. In this regard, it is important to note that regions expressing connectivity with category-specific regions within the ventral stream are not necessarily ‘downstream’ from visual object recognition, and do not necessarily represent ‘more developed’ or ‘more processed’ information than what is computed in the ventral stream. Stimuli are processed through multiple routes in parallel, such as subcortical processing of emotional face stimuli [20,21] and dorsal stream processing of manipulable objects [64,65]. Thus, one exciting possibility is that fast but coarse analysis of the visual input that bypasses the geniculostriate pathway could ‘cue’ or ‘bias’ processing within the ventral stream according to the content of the stimulus to be processed [45], analogous to attentional modulation of early visual responses.
A third way in which the distributed domain-specific hypothesis can be tested is to explore the connectivity of all the categories that show selective responses in the ventral stream. For instance, an expectation that could be generated is that stimuli from different domains, such as hands and tools, can be represented next to each other in the ventral stream because both would be predicted to have connectivity to the somatomotor cortex. In other words, the way in which representations are organized in the ventral stream should follow patterns of connectivity, such that they are organized according to similarity metrics represented in other parts of the brain, rather than (only) by dimensions of visual similarity. Perhaps the most pressing issue that must be addressed by the distributed domain-specific hypothesis is whether connectivity drives specialization by category, as we have proposed, or whether specialization of function is present independently of connectivity, and the connectivity emerges later. One way to empirically address this is to test individuals who have been blind since birth. Sensory deprivation will remove the influence of local constraints, presumably expressed over short-range bottom-up connections from earlier visual regions, but would not be expected to fundamentally alter the ‘longer range’ connections. Combining detailed analysis of connectivity in such individuals with analysis of the location of category-preferring regions in the ventral stream could ground inferences about whether connectivity in fact drives the location of category preferences in
the ventral stream. In particular, the regions specialized for printed words could offer a means to test this issue, as there is no motivation for presuming specialization of function to be innately present for printed words in the human brain. Because there are regions that are consistently specialized for printed words, the expectation would be that this specialization is driven by connectivity between the ventral stream and regions of the brain involved in linguistic processing. The prediction can be made that subject-by-subject variation in the location of the visual word form area (tested with Braille) in congenitally blind individuals will match up with subject-by-subject variation in connectivity between that region of the ventral stream and other language processing regions of the brain.

The core of our proposal, that specialization in a region of the brain is driven, in part, by constraints on how that information will ultimately be used in the service of behavior, is not new. It is well established that visual processing bifurcates into a dorsal stream for object-directed action and spatial processing and a ventral stream for the extraction of object identity [66]. The two visual system model places important restrictions on plasticity of function within the visual system. Analogously, the distributed domain-specific hypothesis places new limits on plasticity of function within the ventral object processing stream, and suggests that the key to describing those limits lies in the patterns of connectivity between the ventral stream and other category-specific brain regions.

References

1 Capitani, E. et al. (2003) What are the facts of category-specific deficits? A critical review of the clinical evidence. Cogn. Neuropsychol. 20, 213–261
2 Caramazza, A. and Shelton, J.R. (1998) Domain specific knowledge systems in the brain: the animate-inanimate distinction. J. Cogn. Neurosci. 10, 1–34
3 Caramazza, A. and Mahon, B.Z. (2003) The organization of conceptual knowledge: the evidence from category-specific semantic deficits. Trends Cogn. Sci. 7, 354–361
4 Martin, A. (2007) The representation of object concepts in the brain. Annu. Rev. Psychol. 58, 25–45
5 Miceli, G. et al. (2001) The dissociation of color from form and function knowledge. Nat. Neurosci. 4, 662–667
6 Grill-Spector, K. and Malach, R. (2004) The human visual cortex. Annu. Rev. Neurosci. 27, 649–677
7 Cant, J.S. et al. (2009) fMR-adaptation reveals separate processing regions for the perception of form and texture in the human ventral stream. Exp. Brain Res. 192, 391–405
8 Allison, T. et al. (1994) Human extrastriate visual cortex and the perception of faces, words, numbers, and colors. Cereb. Cortex 4, 544–554
9 Chao, L.L. et al. (1999) Attribute-based neural substrates in posterior temporal cortex for perceiving and knowing about objects. Nat. Neurosci. 2, 913–919
10 Kanwisher, N. (2000) Domain specificity in face perception. Nature 3, 759–763
11 Mahon, B.Z. and Caramazza, A. (2009) Concepts and categories: a cognitive neuropsychological perspective. Annu. Rev. Psychol. 60, 1–15
12 Op de Beeck, H.P. et al. (2008) Interpreting fMRI data: maps, modules and dimensions. Nat. Rev. Neurosci. 9, 123–135
13 Pitcher, D. et al. (2009) Triple dissociation of faces, bodies, and objects in extrastriate cortex. Curr. Biol. 19, 319–324
14 Bentin, S. et al. (1996) Electrophysiological studies of face perception in humans. J. Cogn. Neurosci. 8, 551–565
15 Kreiman, G. et al. (2000) Category-specific visual responses of single neurons in the human medial temporal lobe. Nat. Neurosci. 3, 946–953
16 Cantlon, J.F. et al. (2011) Cortical representations of symbols, objects, and faces are pruned back during early childhood. Cereb. Cortex 21, 191–199
17 Haxby, J.V. et al. (2001) Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293, 2425–2430
18 Carey, S. and Spelke, E. (1994) Domain specific knowledge and conceptual change. In Mapping the Mind: Domain Specificity in Cognition and Culture (Hirschfeld, L. and Gelman, S.A., eds), pp. 169–200, Cambridge University Press
19 Fodor, J. (1983) Modularity of Mind, MIT Press
20 Pasley, B.N. et al. (2004) Subcortical discrimination of unperceived objects during binocular rivalry. Neuron 42, 163–172
21 Vuilleumier, P. et al. (2004) Distant influences of amygdala lesion on visual cortical activation during emotional face processing. Nat. Neurosci. 7, 1271–1278
22 Martin, A. and Weisberg, J. (2003) Neural foundations for understanding social and mechanical concepts. Cogn. Neuropsychol. 20, 575–587
23 Mahon, B.Z. et al. (2007) Action-related properties shape object representations in the ventral stream. Neuron 55, 507–520
24 Valyear, K.F. and Culham, J.C. (2010) Observing learned object-specific functional grasps preferentially activates the ventral stream. J. Cogn. Neurosci. 22, 970–984
25 Noppeney, U. et al. (2006) Two distinct neural mechanisms for category-selective responses. Cereb. Cortex 16, 437–445
26 Rushworth, M.F.S. et al. (2006) Connection patterns distinguish 3 regions of human parietal cortex. Cereb. Cortex 16, 1418–1430
27 Astafiev, S.V. et al. (2004) Extrastriate body area in human occipital cortex responds to the performance of motor actions. Nat. Neurosci. 7, 542–548
28 Orlov, T. et al. (2010) Topographic representation of the human body in the occipitotemporal cortex. Neuron 68, 586–600
29 Peelen, M.V. and Caramazza, A. (2010) What body parts reveal about the organization of the brain. Neuron 68, 331–333
30 Dehaene, S. et al. (2005) The neural code for written words: a proposal. Trends Cogn. Sci. 9, 335–341
31 Martin, A. (2006) Shades of Déjerine – forging a causal link between the visual word form area and reading. Neuron 50, 173–190
32 Bar, M. and Aminoff, E. (2003) Cortical analysis of visual context. Neuron 38, 347–358
33 Riesenhuber, M. (2007) Appearance isn’t everything: news on object representation in cortex. Neuron 55, 341–344
34 Op de Beeck, H.P. et al. (2006) Discrimination training alters object representations in human extrastriate cortex. J. Neurosci. 26, 13025–13036
35 Tanaka, K. et al. (1991) Coding visual images of objects in the inferotemporal cortex of the macaque monkey. J. Neurophysiol. 66, 170–189
36 Felleman, D.J. and Van Essen, D.C. (1991) Distributed hierarchical processing in primate visual cortex. Cereb. Cortex 1, 1–47
37 Levy, I. et al. (2001) Center-periphery organization of human object areas. Nat. Neurosci. 4, 533–539
38 Gauthier, I. et al. (1999) Activation of the middle fusiform ‘face area’ increases with expertise in recognizing novel objects. Nat. Neurosci. 2, 568–573
39 Rogers, T.T. et al. (2005) Fusiform activation to animals is driven by the process, not the stimulus. J. Cogn. Neurosci. 17, 434–445
40 Mechelli, A. et al. (2006) Semantic relevance explains category effects in medial fusiform gyri. Neuroimage 3, 992–1002
41 Tyler, L.K. et al. (2003) Do semantic categories activate distinct cortical regions? Evidence for a distributed neural semantic system. Cogn. Neuropsychol. 20, 541–559
42 Buchel, C. et al. (1998) A multimodal language region in the ventral visual pathway. Nature 394, 274–277
43 Pietrini, P. et al. (2004) Beyond sensory images: object-based representation in the human ventral pathway. Proc. Natl. Acad. Sci. U.S.A. 101, 5658–5663
44 Mahon, B.Z. et al. (2009) Category-specific organization in the human brain does not require visual experience. Neuron 63, 397–405
45 Kveraga, K. et al. (2007) Magnocellular projections as the trigger of top-down facilitation in recognition. J. Neurosci. 27, 13232–13240
46 Miller, E.K. et al. (2003) Neural correlates of categories and concepts. Curr. Opin. Neurobiol. 13, 198–203
47 Lewontin, R. (2000) The Triple Helix: Genes, Organisms, and Environment, Harvard University Press
48 Park, J. et al. (2009) Face processing: the interplay of nature and nurture. Neuroscientist 15, 445–449
49 Zhu, Q. et al. (2010) Heritability of the specific cognitive ability of face perception. Curr. Biol. 20, 137–142
50 Polk, T.A. et al. (2007) Nature versus nurture in ventral visual cortex: a functional magnetic resonance imaging study of twins. J. Neurosci. 27, 13921–13925
51 Wilmer, J. et al. (2010) Human face recognition ability is specific and highly heritable. Proc. Natl. Acad. Sci. U.S.A. 107, 5238–5241
52 Duchaine, B. and Nakayama, K. (2006) The Cambridge Face Memory Test: results for neurologically intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic subjects. Neuropsychologia 44, 576–585
53 Zhu, Q. et al. (2010) Heritability of the specific cognitive ability of face perception. Curr. Biol. 20, 1–6
54 Duchaine, B.C. et al. (2006) Prosopagnosia as an impairment to face specific mechanisms: elimination of the alternative hypotheses in a developmental case. Cogn. Neuropsychol. 23, 714–747
55 Thomas, C. et al. (2009) Reduced structural connectivity in ventral visual cortex in congenital prosopagnosia. Nat. Neurosci. 12, 29–31
56 Kiani, R. et al. (2007) Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. J. Neurophysiol. 97, 4296–4309
57 Tsao, D.Y. et al. (2006) A cortical region consisting entirely of face-selective cells. Science 311, 670–674
58 Parr, L.A. et al. (2009) Face processing in the chimpanzee brain. Curr. Biol. 19, 50–53
59 Op de Beeck, H. et al. (2001) Inferotemporal neurons represent low-dimensional configurations of parameterized shapes. Nat. Neurosci. 4, 1244–1252
60 Kriegeskorte, N. et al. (2008) Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126–1141
61 Mahon, B.Z. and Caramazza, A. (2008) A critical look at the embodied cognition hypothesis and a new proposal for grounding conceptual content. J. Physiol. Paris 102, 59–70
62 Damasio, H. et al. (2004) Neural systems behind word and concept retrieval. Cognition 92, 179–229
63 Patterson, K. et al. (2007) Where do you know what you know? The representation of semantic knowledge in the human brain? Nat. Rev. 8, 976–987
64 Fang, F. and He, S. (2005) Cortical responses to invisible objects in the human dorsal and ventral pathways. Nat. Neurosci. 8, 1380–1385
65 Almeida, J. et al. (2008) Unconscious processing dissociates along categorical lines. Proc. Natl. Acad. Sci. U.S.A. 105, 15214–15218
66 Goodale, M.A. and Milner, A.D. (1992) Separate visual pathways for perception and action. Trends Neurosci. 15, 20–25
67 Kriegeskorte, N. et al. (2008) Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron 60, 1126–1141
68 Farah, M.J. and Rabinowitz, C. (2003) Genetic and environmental influences on the organization of semantic memory in the brain: Is “living things” an innate category? Cogn. Neuropsychol. 20, 401–408
69 New, J. et al. (2007) Category-specific attention for animals reflects ancestral priorities, not expertise. Proc. Natl. Acad. Sci. U.S.A. 104, 16598–16603
70 Thorpe, S. et al. (1996) Speed of processing in the human visual system. Nature 381, 520–522
Opinion
Specifying the self for cognitive neuroscience

Kalina Christoff1, Diego Cosmelli2, Dorothée Legrand3 and Evan Thompson4

1 Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC, V6T 1Z4 Canada
2 Escuela de Psicología, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, Macul, Santiago, Chile
3 Centre de Recherche en Epistémologie Appliqué (CREA), ENSTA-32, boulevard Victor, 75015 Paris, cedex 15, France
4 Department of Philosophy, University of Toronto, 170 St George Street, Toronto, ON, M5R 2M8 Canada
Cognitive neuroscience investigations of self-experience have mainly focused on the mental attribution of features to the self (self-related processing). In this paper, we highlight another fundamental, yet neglected, aspect of self-experience, that of being an agent. We propose that this aspect of self-experience depends on self-specifying processes, ones that implicitly specify the self by implementing a functional self/non-self distinction in perception, action, cognition and emotion. We describe two paradigmatic cases – sensorimotor integration and homeostatic regulation – and use the principles from these cases to show how cognitive control, including emotion regulation, is also self-specifying. We argue that externally directed, attention-demanding tasks, rather than suppressing self-experience, give rise to the self-experience of being a cognitive–affective agent. We conclude with directions for experimental work based on our framework.
Corresponding author: Thompson, E. ([email protected]).

Investigating self-experience in cognitive neuroscience

How does the embodied brain give rise to self-experience? This question, long addressed by neurology [1] and neurophysiology [2], now attracts strong interest from cognitive neuroscience and the neuroimaging community [3–6]. Recent neuroimaging studies have investigated self-experience mainly by employing paradigms that contrast self-related with non-self-related stimuli and tasks. Such paradigms aim to reveal the cerebral correlates of ‘self-related processing’ (see Glossary). Recent reviews identify several brain regions that appear most consistently activated in self-related paradigms such as assessing one’s personality, physical appearance or feelings; recognizing one’s face; or detecting one’s first name (see [4,6] for extensive reviews). The medial prefrontal cortex (mPFC) and the precuneus/posterior cingulate cortex (Precuneus/PCC) are the most frequently discussed [4–10], but two additional regions, the temporoparietal junction (TPJ) and temporal pole, are also consistently activated [6].

Although these studies have contributed valuable information about the neural correlates of self-related processing, two issues have recently arisen [3,6]. First, the identified regions, especially the midline regions (mPFC, Precuneus/PCC) often associated with self-related
processing [4,7–10], might not be self-specific, because they are also recruited for a wide range of other cognitive processes – recall of information from memory, inferential reasoning, and representing others’ mental states [3,5,6]. In addition, the PCC appears to be engaged in attentional processes and might be a hub for attention and motivation [11,12], whereas the TPJ is important for attentional reorienting [13]. Hence, describing these regions (singly or collectively) as self-specific could be unwarranted [3,5,6].

Second, studies employing self-related processing approach self-experience through the self-attribution of mental and physical features, and thereby focus on the self as an object of attribution and not the self as the knowing subject and agent. To invoke James’ [14] classic distinction, this paradigm targets the ‘Me’ – the self as known through its physical and mental attributes – and not the ‘I’ – the self as subjective knower and agent. Thus, relying exclusively on this paradigm would limit the cognitive neuroscience of self-experience to self-related processing (the ‘Me’), to the neglect of the self-experience of being a knower and agent (the ‘I’) [6,15].

Glossary

Cognitive control: the process by which one focuses and sustains attention on task-relevant information and selects task-relevant behavior.
Emotion regulation: the process by which one influences one’s experience and expression of emotion.
Homeostatic regulation: the process of keeping vital organismic parameters within a given dynamical range despite external or internal perturbations.
‘I’ versus ‘Me’: experiencing oneself as subjective knower and agent versus experiencing oneself as an object of perception or self-attribution.
Self-related processing: processing requiring one to evaluate or judge some feature in relation to one’s perceptual image or mental concept of oneself.
Self-specific: a component or feature that is exclusive (characterizes oneself and no one else) and noncontingent (changing or losing it entails changing or losing the distinction between self and non-self).
Self-specifying: any process that specifies the self as subject and agent by implementing a functional self/non-self distinction.
Sensorimotor integration: the mechanisms by which sensory information is processed to guide motor acts, and by which motor acts are guided to facilitate sensory processing.
Task-negative/default-network brain regions: regions exhibiting sustained functional activity during rest but showing consistent deactivations during externally directed, attention-demanding tasks. Such regions include the precuneus/posterior cingulate cortex, medial prefrontal cortex and bilateral temporoparietal junction.
Task-positive brain regions: regions consistently activated during externally directed, attention-demanding tasks. Such regions include the intraparietal sulcus, frontal eye field, middle temporal area, lateral prefrontal cortex and dorsal anterior cingulate.

1364-6613/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.001 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3

In this paper, we focus on the ‘I’ – experiencing oneself as the agent of perception, action, cognition and emotion – and
we propose a theoretical framework that links this type of self-experience to a wide range of neuroscientific findings at different levels of neural functioning. According to our proposal, experiencing oneself as an agent depends on the existence of specific types of dynamic interactive processes between the organism and its environment. We call these processes ‘self-specifying’ because they implement a functional self/non-self distinction that implicitly specifies the self as subject and agent [6,16]. To illustrate the basic principles of self-specifying processes, we describe two paradigmatic examples – sensorimotor integration and homeostatic regulation – that underlie the self-experience of being a bodily agent. We then argue that although externally directed attention-demanding tasks can compromise self-related processing [7–10,17–19], such tasks can be expected to enhance another fundamental type of self-experience, namely that of being a cognitive–affective agent [6,15,16]. In support of this point, and to show how cognitive neuroscience can begin to model this type of self-experience, we apply the concept of self-specifying processes to cognitive control, including emotion regulation. We conclude with suggestions for future experimental work based on our framework.

Self-experience as arising from self-specifying processes

Many neuroimaging studies have focused on the type of self-experience that occurs when a person directs his or her attention away from the external world (e.g. when task demands are low, when performing a self-reflective task or during rest) [7–10,17] (Figure 1a). At the same time, other lines of investigation concerned with embodied experience have examined self-experience during world-directed perception and action [1,20,21] (Figure 1b). These investigations have focused on bodily awareness in sensorimotor integration [20,21] and homeostatic regulation [1,22,23]. Central to this approach is the notion that the organism constantly integrates efferent and afferent signals in a way that distinguishes fundamentally between reafference – afferent signals arising as a result of the organism’s own efferent processes (self) – and exafference – afferent signals arising as a result of environmental events (non-self). By implementing this functional self/non-self distinction, efferent–afferent integration implicitly specifies the self as a bodily agent [6,16,21].

Sensorimotor integration

The notion of self-specifying processes is easiest to illustrate through the systematic linkage of sensory and motor processes in the perception–action cycle (Box 1). An organism needs to be able to distinguish between sensory changes arising from its own motor actions (self) and sensory changes arising from the environment (non-self). The central nervous system (CNS) distinguishes the two by systematically relating the efferent signals (motor commands) for the production of an action (e.g. eye, head or hand movements) to the afferent (sensory) signals arising from the execution of that action (e.g. the flow of visual or haptic sensory feedback). According to various models going back to Von Holst [24], the basic mechanism of this integration is a comparator that compares a copy of the motor command (information about the action executed) with the sensory reafference (information about the sensory modifications owing to the action) [25]. Through such a mechanism, the organism can register that it has executed a given movement, and it can use this information to
Figure 1. Two types of self-experience. (a) The ‘Me’ or self-related processing (here depicted as self-recognition and reflective thinking about oneself). Its neural substrates are thought to be restricted to a subset of midline cortical regions (mPFC and Precuneus/PCC). It is also thought to compete for cognitive resources when some aspect of the world demands attention. (b) The ‘I’ as embodied agent. This type of self-experience arises from the integration of efferent and reafferent processes, notably sensorimotor integration (green loop) and homeostatic regulation (red loop), as well as possible higher level efferent–reafferent regulatory loops such as the one instantiated by cognitive control processes (blue loop). Such regulatory loops implement a functional self/non-self distinction that implicitly specifies the self as agent. This type of self-experience implicitly occurs during attention-demanding interactions with the environment (black arrows).
Box 1. Self-experience and sensorimotor integration

The self-experience of being an embodied agent depends on the sensorimotor mechanisms that integrate efference with reafference (Figure I). A basic level mechanism allows efferences to be systematically related to their reafferent consequences. This anchoring of efference to reafference implements a functional self/non-self distinction that implicitly specifies the self as a bodily agent [6,21]. For example, consider the motor act of biting a lemon and the resulting taste. This experience is characterized by (i) a specific content (lemon, not chocolate); (ii) a specific mode of presentation (tasting, not seeing); and (iii) a specific perspective (my experience of tasting). The process of relating an efference (the biting) to a reafference (the resulting taste of acidity) is what allows the perception to be characterized not only by a given content (the acidity) but also by a self-specific perspective (I am the one experiencing the acidity of the lemon juice) [6,21].

The agent’s perspective is thus a central concept within this framework. Although the basic sensorimotor integration processes do not involve any representation of the self per se, they are nonetheless self-specifying [6] because they implement a unique egocentric perspective in perception and action, and thus implicitly specify the self as subject and agent of that perspective. According to this view, self-experience is present whenever a self-specific perspective exists, regardless of the properties of the represented content [6,15,16,21].

The original mechanism of sensorimotor integration (Figure I) can be elaborated to include higher level comparators between intended, predicted and actual reafference (Figure II). For example, Wolpert and colleagues [25] described a two-process model of action monitoring. The first process (Figure II, left) uses the motor command and the current state estimate to achieve a next state estimate using the forward model (or a prediction) to simulate the arm’s dynamics. The second process (Figure II, right) uses the difference between expected and actual sensory feedback to correct the forward model’s next state estimate. Through such sophisticated comparators, the model can handle higher level phenomena, such as intentions, predictions, mental simulation and goals [20].
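The two-process scheme described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the model of Wolpert and colleagues [25] itself: the one-dimensional effector, the class and function names, the direct sensor reading and the fixed `correction_gain` are all hypothetical simplifications introduced here.

```python
from dataclasses import dataclass


@dataclass
class ForwardModel:
    """Hypothetical 1-D effector: next_state = state + gain * motor_command."""
    gain: float = 1.0

    def predict_state(self, state_estimate: float, motor_command: float) -> float:
        # Simulate the effector's dynamics from the efference copy.
        return state_estimate + self.gain * motor_command

    def predict_reafference(self, state: float) -> float:
        # Assumed sensor model: the sensor reads the state directly.
        return state


def monitor_step(model, state_estimate, motor_command, actual_feedback,
                 correction_gain=0.5):
    """One monitoring cycle; returns (next_state_estimate, discrepancy)."""
    # Process 1: efference copy plus current estimate give a predicted next state.
    predicted_state = model.predict_state(state_estimate, motor_command)
    predicted_feedback = model.predict_reafference(predicted_state)
    # Comparator: the part of the feedback the forward model cannot account for.
    discrepancy = actual_feedback - predicted_feedback
    # Process 2: correct the predicted state using the sensory discrepancy.
    next_state_estimate = predicted_state + correction_gain * discrepancy
    return next_state_estimate, discrepancy


model = ForwardModel(gain=1.0)
# Self-generated movement only: feedback matches the prediction.
est_self, d_self = monitor_step(model, 0.0, 2.0, actual_feedback=2.0)
# Same command plus an external push of 1.0: the discrepancy is non-zero.
est_pushed, d_pushed = monitor_step(model, 0.0, 2.0, actual_feedback=3.0)
```

In this toy setting a zero discrepancy means the feedback is fully explained by the agent's own command (reafference), whereas a non-zero discrepancy signals a contribution from the environment (exafference). A fixed correction gain is used only for brevity; it stands in for whatever weighting a fuller model would compute.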
[Figures I and II: diagrams not reproduced. Recoverable labels: Self; Sensorimotor integration; Comparator; Efference copy; Motor command; Effector; External world; Reafference; Current state estimate; Predicted next state (Forward model); Predicted reafference; Actual reafference; Sensory discrepancy/state correction; Next state estimate.]
Figure I. Sensorimotor integration mechanism for relating efferent signals to reafferent sensory feedback.
Figure II. Two-process model of action monitoring (Ref. [25]).
process the resulting sensory reafference. The crucial point for our purposes is that reafference is self-specific, because it is intrinsically related to the agent’s own action (there is no such thing as a non-self-specific reafference). Thus, by relating efferent signals to their afferent consequences, the CNS marks the difference between self-specific (reafferent) and non-self-specific (exafferent) information in the perception–action cycle. In this way, the CNS implements a functional self/non-self distinction that implicitly specifies the self as the perceiving subject and agent.
Homeostatic regulation
Self-specifying reafferent–efferent processes are key components of homeostatic regulation, which implements the self/non-self distinction at the basic level of life preservation [1,16,22,23]. To ensure the organism’s survival through changing internal and external conditions, afferent signals conveying information about the organism’s internal state are continually coupled with corresponding efferent regulatory processes that keep afferent parameters within a tight domain of possible values [1,22,23]. Reafferent–efferent loops from spinal nuclei to brainstem nuclei and midbrain structures are involved in somatoautonomic adjustments; these loops are modulated by the hypothalamus as well as mid/posterior insula (sensory) and anterior cingulate (motor) cortices [23]. This vertically integrated, interoceptive homeostatic system specifies the self as a bodily agent by maintaining the body’s integrity (self) in relation to the environment (non-self) [22], and by supporting the implicit feeling of the body’s internal condition in perception and action [23].
Specifying the self as knowing subject and agent
The reafferent–efferent processes just described specify the self not as an object of perception or attribution (the ‘Me’) but as the experiential subject and agent of perception,
action and feeling (the ‘I’). Sensorimotor integration specifies a unique perceptual perspective on the world, whereas homeostatic regulation specifies a unique affective perspective based on the inner feeling of one’s body. The resulting perspective is self-specific in the strict sense of being both exclusive (it characterizes oneself and no one else) and noncontingent (changing or losing it entails changing or losing the distinction between self and non-self) [6].
In the general case, ‘I’ perceive and act from my self-specific perspective while implicitly experiencing myself as perceiver and agent. In some particular cases, what ‘I’ perceive is ‘Me’, such as when I visually recognize myself. Although many non-human animals can implicitly experience themselves as embodied agents through the types of self-specifying sensorimotor and homeostatic processes described above [26], only humans and a few other species seem capable of self-recognition [27], and thus of experientially relating the ‘I’ and the ‘Me’. What we emphasize here is that whereas the ‘Me’ consists in the features one perceives as belonging to oneself, the ‘I’ consists in the self-specific, agentive perspective from which such perceptions occur; hence, to explain the ‘I’ we need to explain how such a perspective is implemented.
Our proposal is that the reafferent–efferent processes of sensorimotor integration and homeostatic regulation implement a self-specific, agentive perspective at the bodily level of perception and feeling. This model predicts that if a brain process involves only afference without a matching efference/reafference, it will not specify the organism as subject or agent, and thus will not constitute a self-specifying process.
For example, the ‘feedforward sweep’ in visual processing from early visual areas to extrastriate areas, which Lamme [28] argues is not accompanied by conscious awareness, would not qualify as self-specifying, whereas ‘recurrent processing’ in multiple visual areas, which Lamme argues is associated with ‘phenomenal awareness’ (short-lived awareness that is not necessarily reportable), would qualify as self-specifying only if linked to matching efference/reafference. Our model thus allows that non-self-specifying processes occur in parallel with self-specifying ones, and it leaves open the question whether there exist conscious processes that do not include even minimal self-specification (as Lamme’s proposal suggests) or whether every conscious process is also minimally self-specifying (as others have argued [15]).
Given this model, we next consider the view, prevalent in the recent neuroimaging literature [7–10,17–19], that self-experience is suppressed during externally directed, attention-demanding tasks. We argue that this view needs qualification to take into account the self-experience of being a cognitive–affective agent.
Is self-experience suppressed during world-directed attention?
One outcome of functional magnetic resonance imaging (fMRI) studies using self-related processing as the main paradigm for understanding self-experience is the view that self-experience occurs mostly when individuals are not preoccupied with externally oriented tasks and that it is suppressed when such tasks do occur [7–10]. This view is
based partly on findings from a growing number of studies examining spontaneous fluctuations in the fMRI signal during task-free, resting-state conditions [29]. These findings have distinguished between (i) task-positive regions (e.g. dorsolateral PFC, inferior parietal cortex and supplementary motor area), whose activity increases during externally oriented attention and (ii) task-negative/default-network regions (e.g. mPFC, precuneus/PCC and TPJ), whose activity decreases across a wide variety of tasks. These task-positive and task-negative networks also appear to be anticorrelated in their spontaneous activity during the resting state [30], so that increased activity in one network has been noted to correlate with decreased activity in the other [17–19].
A prominent interpretation of these findings is that the brain alternates dynamically between a task-oriented, externally directed state and a task-independent, self-directed state, with self-experience in the form of self-related processing mainly occurring during the task-independent, self-directed state [8–10,18,19]. A wide variety of studies have been taken to support this interpretation; these studies indicate that externally oriented, attention-demanding tasks, which are considered to suppress introspective thoughts, tend to suspend default-network activity, whereas resting conditions, as well as practiced tasks that do not suppress introspective thoughts, correlate with an active default network (see [31] for a comprehensive review). Additional support is thought to come from the finding that tasks requiring individuals to make explicit reference to some aspect of themselves implicate medial prefrontal regions also active as part of the default network [4,5,26,31].
Hence, it has been proposed, on the one hand, that self-experience is largely absent during world-directed attention (because self-related processing is strongly suppressed) [17], and, on the other hand, that during rest conditions, subjects mainly engage in self-referential processing [7–10].
This conclusion, however, rests on the following assumptions: (i) the main way to experience the self is as an object of one’s attention (i.e. through self-related processing); (ii) self-reflective, introspective processes are linked to task-negative/default-network regions; and (iii) the brain is organized into a dynamic system of task-positive regions subserving world-directed attention and task-negative/default regions subserving self-directed attention, with these two networks acting in opposition so that recruitment of one suppresses the other. Each of these assumptions, however, needs qualification in light of the recent theoretical literature and empirical findings.
First, treating self-related processing as the main form of self-experience limits self-experience to the ‘Me’ (self as object of one’s attention) while neglecting the ‘I’ (self as knowing subject and agent). For example, if the agentic ‘I’ is considered at the bodily level of sensorimotor integration, then task-positive regions such as the supplementary motor cortex and inferior parietal cortex could be viewed as crucial to self-experience, for these regions serve to implement sensorimotor integration tasks [25,32,33]. More generally, although world-directed attention can suppress self-related processing, one cannot conclude that it suppresses
every form of self-experience, especially the self-experience of being a cognitive agent (which it can instead enhance).
Second, self-referential and introspective processes have also been linked to recruitment of regions outside the default network. For example, self-related processing activates the temporopolar cortex as consistently as the three main default network regions (mPFC, precuneus/PCC and TPJ) [34], and is also frequently associated with activations in the insula and lateral PFC [6]. Furthermore, introspective mental processes have been linked to a recruitment of the anterior portion of the lateral PFC, namely the rostrolateral PFC [35–37], which is considered to be part of a cognitive control network separable from the default network [38]. These findings indicate that self-referential processing is not uniquely associated with task-negative/default-network regions. Therefore, reduced or inhibited activity in default network regions does not necessarily indicate that self-directed introspective processes are suppressed, because they can be implemented through regions outside the default network.
Finally, recent studies have begun to qualify the picture of task-positive and task-negative/default networks as invariably acting in opposition to each other. A parallel recruitment of task-positive and task-negative/default-network regions has been observed during several tasks, such as passive sensory stimulation [39], continuous movie viewing [40], narrative speech comprehension [41], autobiographical planning [42] and mind wandering during a sustained attention task [36]. These diverse findings suggest that characterizing brain activity as either task-positive/world-directed or task-negative/self-directed is incomplete. Rather, such neural recruitments and cognitive processes can occur in parallel.
In contrast to the view that attention-demanding tasks suppress self-experience, we propose that such tasks can be expected to enhance the self-experience of being a cognitive–affective agent. An outstanding task for cognitive neuroscience is to integrate this type of self-experience and self-related processing into an overarching explanatory framework that can guide empirical research. In the next section, we propose what we believe is a crucial element of such a framework. By describing how the concept of self-specifying processes can be applied to cognitive control, including emotion regulation, we argue that cognitive–affective processes instantiate the self-experience of being a cognitive–affective agent. In this way, we show how cognitive neuroscience can investigate this type of self-experience by including paradigms involving attention to the external world.
Self-specifying processes during attention-demanding tasks
Can cognitive control processes in affectively neutral contexts and affectively arousing contexts implicitly specify the self as a cognitive–affective agent?
Cognitive control processes in affectively neutral contexts
Cognitive control processes serve both to focus attention on task-relevant information versus other competing sources of information and to select task-relevant behavior over
habitual or otherwise prepotent responses. For example, in a Stroop task, the goal is to name the ink color of a printed color name while ignoring the word’s meaning. Individuals are slower to respond when the information is incongruent (e.g. the word RED is printed in blue ink) than when it is congruent (e.g. the word RED is printed in red ink), and the slower response time is taken to reflect the need for higher attentional control when a conflict in perceptual information is present. According to the influential ‘conflict-monitoring model’ [43], cognitive control is implemented through a regulatory conflict–control loop consisting of two components. An evaluative or conflict-monitoring component detects conflicts in the information available for task performance, whereas a regulative component exerts a top-down biasing influence on the cognitive and motor processes required for task performance. At the neural level, the dorsal anterior cingulate cortex (dACC) has been proposed to support the evaluative process of conflict monitoring [43,44], whereas lateral PFC regions have been proposed to underlie the regulative process of cognitive control [43,45]. This model predicts that strong ACC activity should be followed by behavior reflecting relatively focused attention, and weak ACC activity by behavior reflecting less focused attention. In keeping with this prediction, Kerns and colleagues [46] found that high dACC activation for incongruent trials in the Stroop task was followed by low interference on the subsequent trial, as well as by strong activation in dorsolateral PFC. These findings suggest that the dACC could signal the need for control adjustments to lateral PFC and thereby strengthen cognitive control [45]. Our aim in describing the conflict-monitoring model is not to endorse it against other important models of cognitive control [47–49] or ACC functioning [50,51]. 
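The conflict–control loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the simulations of Ref. [43]: conflict is taken to be the product of the two competing response activations, control simply attenuates the word-reading pathway, and the update rule, parameter values and function names (`trial_conflict`, `adjust_control`, `run_trials`) are hypothetical choices introduced here.

```python
def trial_conflict(congruent: bool, control: float,
                   word_strength: float = 0.8) -> float:
    """Evaluative component: conflict on one Stroop trial.

    The ink-color response is fixed at 1.0. On incongruent trials the
    word drives a competing response, attenuated by the current level
    of control (assumed to lie between 0 and 1). Conflict is the
    product of the two competing activations (an assumption here).
    """
    color_response = 1.0
    word_response = 0.0 if congruent else word_strength * (1.0 - control)
    return color_response * word_response


def adjust_control(control: float, conflict: float,
                   decay: float = 0.5, gain: float = 0.5) -> float:
    """Regulative component: control decays toward zero and is
    strengthened in proportion to the conflict just detected."""
    return min(1.0, decay * control + gain * conflict)


def run_trials(trials, control: float = 0.0):
    """Run a sequence of trials (True = congruent, False = incongruent)
    and return the conflict registered on each one."""
    history = []
    for congruent in trials:
        conflict = trial_conflict(congruent, control)
        history.append(conflict)
        control = adjust_control(control, conflict)
    return history


if __name__ == "__main__":
    # One congruent trial followed by two incongruent trials.
    print(run_trials([True, False, False]))
```

In this sequence the second incongruent trial registers less conflict than the first, because the conflict detected on the first one has already strengthened control. That is the qualitative pattern reported by Kerns and colleagues [46], reproduced here only by construction of the toy rule.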
In using this model, we do not suppose that dACC is involved in cognitive but not emotional functions, whereas ventral ACC does the reverse [52], because recent experimental findings and theoretical considerations argue against both this particular cognitive–affective division [53] and emotion–cognition separations more generally in the brain and behavior [53,54]. Instead, we use the model to illustrate how cognitive–control processes can be self-specifying.
For the purposes of the present argument, the key feature of the conflict-monitoring model is the functional distinction between a regulatory function and an evaluative function. The control loop comprising these two functions (Figure 2) strongly resembles the integration of efferent and reafferent information during sensorimotor processing, with the regulative component corresponding to efferent influence and the evaluative component corresponding to a reafferent process. We propose that such a regulative–evaluative loop can implement a functional self/non-self distinction between, on the one hand, reafferent signals about modifications in level of conflict resulting from one’s own cognitive–control efforts (self), and, on the other hand, exafferent signals about the level of conflict resulting from environmental sources such as stimulus properties (non-self). By implementing this self-specific, agentive perspective in cognitive control, the regulatory conflict–control loop would implicitly specify the self as a cognitive agent. Note that this cognitive form of self-experience would subsume the self-experience of being an embodied agent resulting from sensorimotor integration, because cognitive control operates on sensorimotor processes themselves, and thus occurs at higher levels of integration in the perception–action cycle [55].
[Figure 2: diagram not reproduced. Recoverable labels: Dorsal ACC; Conflict detection; Evaluation/re-afference; Lateral PFC; Biasing influence; Regulation/efference; Modified level of conflict; Posterior brain regions.]
Figure 2. Cognitive control as a self-specifying process. The conflict-monitoring model of cognitive control [43] depicted as implementing a possible efferent/reafferent regulatory loop. This loop can define the functional self/non-self distinction between reafferent signals resulting from one’s own cognitive control efforts (self) and exafferent signals about the level of conflict resulting from environmental sources such as stimulus properties (non-self).
As originally conceived, the cognitive control of attention was closely linked to self-regulation [56,57], including the self-experience of being a cognitive agent [57]. Concern with this link, however, seems to have largely disappeared from the recent cognitive neuroscience literature, possibly because of the assumption that self-experience is suppressed during attention-demanding tasks [7–10,17–19], as well as the observation that brain regions associated with cognitive control, such as the lateral PFC and dACC, largely overlap with the task-positive regions outlined earlier. Indeed, meta-analyses show that the lateral PFC and dACC are among the most consistently recruited brain regions across a broad range of attention-demanding tasks, including perception, response selection, executive control, working memory, episodic memory and problem solving [58,59].
Nevertheless, as discussed above, recruitment of these task-positive regions is not mutually exclusive with recruitment of the task-negative/default-network regions. Although intense engagement in sensorimotor tasks can suppress the task-negative/default-network regions that also subserve self-related processing [17–19], one can envision situations (e.g. introspection, envisioning the perspective of others, mind wandering) in which the required mental processes call upon resources from both sets of regions and hence lead to more balanced activations between them, as indicated by recent results [36,39–42]. Furthermore, even in situations where the dACC and lateral PFC are recruited in opposition to task-negative/default-network regions (i.e.
with a concomitant deactivation of these regions), self-experience might still be crucially present in the form of the ‘I’ or self-as-cognitive-agent, as a result of cognitive control processes being self-specifying in the way just outlined above.
Emotion regulation
The cognitive and behavioral control of emotion in affectively arousing or challenging situations [60,61] provides another case where we can expect to find the self-experience of being a cognitive–affective agent. Although emotion
regulation and self-related processing have often been linked by pointing to their common reliance on midline cortical structures [61,62], we propose that another fundamental but less explored link between self-experience and emotion regulation can be found in how emotion regulation processes are also self-specifying. Recent discussions have proposed a distinction between two main forms of emotion regulation – a deliberate or voluntary form, and an implicit or incidental form [60,61,63–65]. Deliberate emotion regulation relies on the same cognitive control mechanisms required for attention-demanding tasks [61]. Thus, tasks requiring reappraisal – reinterpreting the meaning of a stimulus to change one’s emotional response to it [60,61] – recruit dACC and lateral PFC regions [61]. Here these regions are thought to subserve explicit reasoning about how the association between a situation and one’s emotional response to it can be changed. For example, if one is viewing a picture of a burn victim in a hospital bed, it might be possible to modify the original emotional response of distress or sadness by focusing on possible positive aspects, such as the victim’s successful progress toward a healthier state or that the victim survived. Maintaining such descriptions is thought to bias perceptual and associative-memory systems; these systems in turn send signals to subcortical appraisal systems, such as the amygdala and ventral striatum [61], and thus indirectly modify the original emotional response. We propose that such a regulatory–evaluative loop can implement a functional self/non-self distinction between the effortful reappraisal process (self) and the target of that process, namely the emotional scene (non-self). In this way, emotion regulation can implicitly specify the self as the cognitive–affective agent engaged in trying to reinterpret and thereby control an emotional response. 
Deliberate forms of emotion regulation are associated not only with dACC and lateral PFC – regions crucially involved in cognitive control – but also with recruitment of dorsomedial PFC (dmPFC) [61,64,65], a brain region considered to support reflective awareness of one’s feelings, and thus to enable higher level metarepresentations of one’s own experience [63]. By allowing the maintenance of such emotion-specific metarepresentations, and through its dense interconnections with the ventromedial PFC (vmPFC) [66], the dmPFC can exert a biasing influence on emotion processes during deliberate attempts at emotion regulation. Thus, by both influencing and re-representing the emotion processes in more ventral systems, the dmPFC and its interconnected ventral structures can form another regulatory–evaluative loop that implicitly specifies the self as cognitive–affective agent in effortful emotion regulation.
In contrast to deliberate emotion regulation, implicit or incidental emotion regulation has been linked to medial regions such as the rostral ACC (rACC), subgenual ACC and vmPFC [61]. For example, the rACC is associated with regulation of attention to emotional (but not non-emotional) distracters during an emotional version of the Stroop task [67,68]. During this task, subjects are not instructed to regulate their emotions, thus the recruitment of the rACC and its accompanying regulation of emotional attention can
be considered incidental to the main task [65]. Activation in rACC appears to be accompanied by a simultaneous and correlated reduction of amygdala activity; this relation suggests that resolving emotional conflict depends on an rACC–amygdala regulatory loop [67] that also appears to use the general cognitive monitoring mechanism of the dACC to detect the presence of conflict [68]. Thus, a self-specifying evaluative–regulatory loop can be formed between rACC and dACC, analogous to that between lateral PFC and dACC, but dedicated to the resolution of emotional conflict through an rACC biasing influence on amygdala activity.
Furthermore, regions playing a role in deliberate emotion regulation, such as the dACC and dmPFC [63,64], and possibly the right ventrolateral PFC [65], also appear to participate in implicit emotion regulation. For example, the dACC and dmPFC have autonomic regulatory functions mediated by direct neural connections with subcortical visceromotor centers such as the lateral hypothalamus [66]. In addition, neuroimaging studies noting an inverse correlation between medial PFC activity and heart rate variability suggest that medial PFC activity can have a tonic inhibitory effect mediated through the vagus nerve [63]. Based on these findings, researchers have described an evaluative–regulatory feedback mechanism, including an equilibration process between bottom-up and top-down interactions, through which the body state is altered as arousal processes become modulated and differentiated [63]. This mechanism provides another candidate for a self-specifying process at implicit levels of emotion regulation.
Given that these candidate self-specifying processes belong to implicit emotion regulation, the functional self/non-self distinction they implement would be closely related to the one established through homeostatic regulation between the feeling body and the environment.
Indeed, implicit emotion regulation processes overlap conceptually and neurally with the higher levels of the homeostatic regulation system described earlier [1,22,23,26]. Thus, the self-experience of being an emotional agent that these processes elicit would occur at the level of affect and action tendencies [26], whereas this bodily level would be subsumed by the self-experience of being a cognitive–affective agent in deliberate emotion regulation, analogous to the way the self-experience of being a cognitive agent also subsumes the self-experience of being an embodied agent in attention-demanding cognitive tasks.
Concluding remarks and future directions
Using the concept of self-specifying processes, we have outlined a model of how cognitive control processes, including emotion regulation, implicitly specify the self as a cognitive–affective agent. Our model suggests several questions for future investigations (Box 2). We highlight two issues here.
Box 2. Questions for further research
Is the ‘I’ or self-as-subject all or nothing, or graded? When multiple self-specifying processes are activated, does a stronger sense of ‘I’ occur?
Can self-specifying processes be altered through attentional and emotion regulation training?
Do self-specifying processes require higher level remapping of efferent–reafferent integration, or can such integration occur through dynamical mechanisms such as phase synchronization?
Can self-specifying processes be identified in neuroimaging data through functional connectivity measures, and can statistical measures such as Granger causality be used to identify directional influences in such processes?
Can self-specifying processes be identified as part of the brain’s intrinsic functional architecture through intrinsic connectivity measures in resting state neuroimaging data?
Can transcranial magnetic stimulation interfere selectively in self-specifying loops and thereby alter cognitive–affective self-experience?
Are self-specifying processes altered in psychiatric disorders, such as schizophrenia or anorexia nervosa, which involve altered self-experience and self-other evaluation?
One issue concerns the types of neural mechanisms that integrate the efferent–reafferent and regulatory–evaluative signals in self-specifying processes. On the one hand, the comparison between efferent and reafferent signals can be remapped at higher levels by specific neural structures. For example, the anterior insula can serve to remap the second-order comparison between efferent and reafferent signals in more posteriorly located motor and sensory regions during homeostatic regulation [23]. Similarly, during cognitive control, anteriorly located lateral PFC regions, such as the rostrolateral PFC, can remap the second-order comparison between the regulative and evaluative outcomes of processes supported by the more posteriorly located dorsolateral PFC and dACC [35]. Such hierarchically organized systems can be present at multiple neural levels and in multiple functional domains. On the other hand, another type of mechanism not requiring explicit remapping by dedicated neural structures, but relying instead on dynamical coupling across multiple areas [69] (e.g. through phase synchronization of neuronal signals [70]), could be responsible for signal integration. Such dynamical mechanisms can also be implemented at multiple neural levels and in various functional domains [69,70]. Whether self-specifying processes depend on either or both of these mechanisms is an important issue for future research.
A second issue concerns the subjective nature of self-experience. Although objective measures from experimentally controlled tasks and uncontrolled rest conditions are certainly useful, we believe a richer understanding of self-experience requires the incorporation of subjective measures such as self-reports into neuroimaging protocols [36,71]. Certain questions seem tractable only with such an approach. For example, is self-experience all-or-nothing or graded in character? When multiple self-specifying processes are activated at various levels of neural functioning, does a stronger sense of self occur than when only a few are recruited? Can mental training of attention and emotion regulation [72,73] alter self-experience and its neural substrates?
As argued here, how cognitive neuroscience specifies the self profoundly shapes our view of self-experience and its neural substrates.
By broadening our investigations to include the self-experience of being a cognitive agent, we can deepen our understanding of how the brain and body work together to create our sense of self.
Acknowledgments
For helpful comments we thank Norm Farb, Alisa Mandrigin, Luiz Pessoa, Rebecca Todd and four anonymous reviewers. K.C. was supported by grants from the Canadian Institutes of Health Research (CIHR MOP 81188), the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Michael Smith Foundation for Health Research (MSFHR); D.C. by Fondo Nacional de Desarrollo Científico y Tecnológico Grant 1090612; and E.T. by the Social Sciences and Humanities Research Council of Canada.
References
1 Damasio, A.R. (1999) The Feeling of What Happens, Harcourt
2 Llinas, R. (2001) The I of the Vortex, MIT Press
3 Gillihan, S. and Farah, M. (2005) Is self special? A critical review of evidence from experimental psychology and cognitive neuroscience. Psychol. Bull. 131, 76–97
4 Northoff, G. et al. (2006) Self-referential processing in our brain – a meta-analysis of imaging studies on the self. Neuroimage 31, 440–457
5 Uddin, L.Q. et al. (2007) The self and social cognition: the role of cortical midline structures and mirror neurons. Trends Cogn. Sci. 11, 153–157
6 Legrand, D. and Ruby, P. (2009) What is self-specific? A theoretical investigation and critical review of neuroimaging results. Psychol. Rev. 116, 252–282
7 Gusnard, D.A. et al. (2001) Medial prefrontal cortex and self-referential mental activity: relation to a default mode of brain function. Proc. Natl. Acad. Sci. U.S.A. 98, 4259–4264
8 Gusnard, D.A. (2005) Being a self: considerations from functional imaging. Conscious. Cogn. 14, 679–697
9 Wicker, B. et al. (2003) A relation between rest and self in the brain? Brain Res. Rev. 43, 224–230
10 Schneider, F. et al. (2008) The resting brain and our self: self-relatedness modulates resting state neural activity in cortical midline structures. Neuroscience 157, 120–131
11 Mohanty, A. et al. (2008) The spatial attention network interacts with limbic and monoaminergic systems to modulate motivation-induced attention shifts. Cereb. Cortex 18, 2604–2613
12 Engelmann, J.B. et al. (2009) Combined effects of attention and motivation on visual task performance: transient and sustained motivational effects. Front. Hum. Neurosci. 3, 1–17
13 Corbetta, M. et al. (2000) Voluntary orienting is dissociated from target detection in human posterior parietal cortex. Nat. Neurosci. 3, 292–297
14 James, W. (1890/1981) The Principles of Psychology, Harvard University Press
15 Legrand, D. (2007) Pre-reflective self-as-subject from experiential and empirical perspectives. Conscious. Cogn. 16, 583–599
16 Thompson, E. (2007) Mind in Life, Harvard University Press
17 Goldberg, I.I. et al. (2006) When the brain loses its self: prefrontal inactivation during sensorimotor processing. Neuron 50, 329–339
18 Fransson, P. (2005) Spontaneous low-frequency BOLD signal fluctuations: an fMRI investigation of the resting-state default mode of brain function hypothesis. Hum. Brain Mapp. 26, 15–29
19 Fransson, P. (2006) How default is the default mode of brain function? Further evidence from intrinsic BOLD signal fluctuations. Neuropsychologia 44, 2836–2845
20 Blakemore, S-J. and Frith, C. (2003) Self-awareness and action. Curr. Opin. Neurobiol. 13, 219–224
21 Legrand, D. (2006) The bodily self: the sensori-motor roots of prereflexive self-consciousness. Phenom. Cogn. Sci. 5, 89–118
22 Parvizi, J. and Damasio, A.R. (2001) Consciousness and the brainstem. Cognition 79, 135–160
23 Craig, A.D. (2009) How do you feel – now? The anterior insula and human awareness. Nat. Rev. Neurosci. 10, 59–70
24 Von Holst, E. (1954) Relations between the central nervous system and the peripheral organs. Br. J. Anim. Behav. 2, 89–94
25 Wolpert, D.M. et al. (1995) An internal model for sensorimotor integration. Science 269, 1880–1882
26 Northoff, G. and Panksepp, J. (2008) The trans-species concept of self and the subcortical-cortical midline system. Trends Cogn. Sci. 12, 259–264
27 de Waal, F.B.M. (2008) The thief in the mirror. PLoS Biol. 6, e201
28 Lamme, V.A.F. (2003) Why visual awareness and attention are different. Trends Cogn. Sci. 7, 12–18
29 Fox, M.D. and Raichle, M.E. (2007) Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nat. Rev. Neurosci. 8, 700–711
30 Fox, M.D. et al. (2005) The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc. Natl. Acad. Sci. U.S.A. 102, 9673–9678
31 Buckner, R.L. et al. (2008) The brain’s default network: anatomy, function, and relevance to disease. Ann. N. Y. Acad. Sci. 1124, 1–38
32 Andersen, R.A. and Buneo, C.A. (2003) Sensorimotor integration in posterior parietal cortex. Adv. Neurol. 93, 159–177
33 Haggard, P. and Whitford, B. (2004) Supplementary motor area provides an efferent signal for sensory suppression. Cogn. Brain Res. 19, 52–58
34 Christoff, K. et al. (2004) Neural basis of spontaneous thought processes. Cortex 40, 623–630
35 Christoff, K. and Gabrieli, J.D.E. (2000) The frontopolar cortex and human cognition: evidence for a rostrocaudal hierarchical organization within the human prefrontal cortex. Psychobiology 28, 168–186
36 Christoff, K. et al. (2009) Experience sampling during fMRI reveals default network and executive system contributions to mind wandering. Proc. Natl. Acad. Sci. U.S.A. 106, 8719–8724
37 McCaig, R.G. et al. (2010) Improved modulation of rostrolateral prefrontal cortex using real-time fMRI and meta-cognitive awareness. Neuroimage [Epub ahead of print].
38 Vincent, J.L. et al. (2008) Evidence for a frontoparietal control system revealed by intrinsic functional connectivity. J. Neurophysiol. 100, 3328–3342
39 Greicius, M.D. and Menon, V. (2004) Default-mode activity during a passive sensory task: uncoupled from deactivation but impacting activation. J. Cogn. Neurosci. 16, 1484–1492
40 Golland, Y. et al. (2007) Extrinsic and intrinsic systems in the posterior cortex of the human brain revealed during natural sensory stimulation. Cereb. Cortex 17, 766–777
41 Wilson, S.M. et al. (2008) Beyond superior temporal cortex: intersubject correlations in narrative speech comprehension. Cereb. Cortex 18, 230–242
42 Spreng, R.N. et al. (2010) Default network activity, coupled with the frontoparietal control network, supports goal-directed cognition. Neuroimage 53, 303–317
43 Botvinick, M.M. et al. (2001) Conflict monitoring and cognitive control. Psychol. Rev. 108, 624–652
44 Botvinick, M.M. et al. (2004) Conflict monitoring and anterior cingulate cortex: an update. Trends Cogn. Sci. 8, 539–546
45 Miller, E.K. and Cohen, J.D. (2001) An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci. 24, 167–202
46 Kerns, J.G. et al. (2004) Anterior cingulate conflict monitoring and adjustments in control. Science 303, 1023–1026
47 Egner, T. (2008) Multiple conflict-driven control mechanisms in the human brain. Trends Cogn. Sci. 12, 374–380
48 Verguts, T. and Notebaert, W. (2009) Adaptation by binding: a learning account of cognitive control. Trends Cogn. Sci. 13, 252–257
49 Mayr, U. and Awh, E. (2009) The elusive link between conflict and conflict adaptation. Psychol. Res. 73, 794–802
50 Rushworth, M.F.S. et al. (2007) Contrasting roles for anterior cingulate and orbitofrontal cortex in decisions and social behaviour. Trends Cogn. Sci. 11, 169–176
51 Etkin, A. et al. (2010) Emotional processing in anterior cingulate and medial prefrontal cortex. Trends Cogn. Sci. DOI: 10.1016/j.tics.2010.11.004
52 Bush, G. et al. (2000) Cognitive and emotional influences in anterior cingulate cortex. Trends Cogn. Sci. 4, 215–222
53 Pessoa, L. (2008) On the relationship between emotion and cognition. Nat. Rev. Neurosci. 9, 148–158
54 Pessoa, L. (2010) Emotion and cognition and the amygdala: from ‘what is it?’ to ‘what’s to be done?’. Neuropsychologia 48, 3416–3429
55 Botvinick, M.M. (2007) Multilevel structure in behaviour and in the brain: a model of Fuster’s hierarchy. Philos. Trans. R. Soc. Lond. B Biol. Sci. 362, 1615–1626
56 Norman, D.A. and Shallice, T.
(1986) Attention to action: willed and automatic control of behavior. In Consciousness and Self-regulation. Advances in Research and Theory (Vol. 4) (Davidson, R.J. et al., eds), In pp. 1–18, Plenum Press 57 Posner, M.I. and Rothbart, M.K. (1998) Attention, self-regulation and consciousness. Philos. Trans. R. Soc. Lond. B Biol. Sci. 353, 1915–1927
111
Opinion 58 Duncan, J. and Owen, A.M. (2000) Common regions of the human frontal lobe recruited by diverse cognitive demands. Trends Neurosci. 23, 475–483 59 Corbetta, M. and Shulman, G. (2002) Control of goal-directed and stimulus-driven attention in the brain. Nat. Rev. Neurosci. 3, 201–215 60 Gross, J.J. and Thomspon, R.A. (2007) Emotion regulation: conceptual foundations. In Handbook of Emotion Regulation (Gross, J.J., ed.), pp. 3–25, Guilford 61 Ochsner, K.N. and Gross, J.J. (2005) The cognitive control of emotion. Trends Cogn. Sci. 9, 242–249 62 Northoff, G. (2005) Is emotion regulation self-regulation? Trends Cogn. Sci. 9, 408–409 63 Lane, R.D. (2008) Neural substrates of implicit and explicit emotional processes: a unifying framework for psychosomatic medicine. Psychosom. Med. 70, 214–231 64 Phillips, M.L. et al. (2008) A neural model of voluntary and automatic emotion regulation: implications for understanding the pathophysiology and neurodevelopment of bipolar disorder. Mol. Psychiatr. 13, 833–857 65 Berkman, E.T. and Lieberman, M.D. (2009) Using neuroscience to broaden emotion regulation: theoretical and methodological considerations. Soc. Pers. Psychol. Comp. 3/4, 475–493
112
Trends in Cognitive Sciences March 2011, Vol. 15, No. 3 66 Price, J.L. et al. (1996) Networks related to the orbital and medial prefrontal cortex; a substrate for emotional behavior? Prog. Brain Res. 107, 523–536 67 Etkin, A. et al. (2006) Resolving emotional conflict: a role for the rostral anterior cingulate cortex in modulating activity in the amygdala. Neuron 51, 871–882 68 Egner, T. et al. (2008) Dissociable neural systems resolve conflict from emotional versus nonemotional distracters. Cereb. Cortex 18, 1475– 1484 69 Bressler, S.L. and Menon, V. (2010) Large-scale brain networks in cognition: emerging methods and principles. Trends Cogn. Sci. 14, 277– 290 70 Varela, F.J. et al. (2001) The brainweb: phase synchronization and large-scale integration. Nat. Rev. Neurosci. 2, 229–239 71 Jack, A. and Roepstorff, A. (2002) Introspection and cognitive brain mapping: from stimulus-response to script-report. Trends Cogn. Sci. 6, 333–339 72 Lutz, A. et al. (2008) Attention regulation and monitoring in meditation. Trends Cogn. Sci. 12, 163–169 73 Farb, N.A.S. et al. (2007) Attending to the present: mindfulness meditation reveals distinct neural modes of self-reference. Soc. Cogn. Affect. Neurosci. 2, 313–322
Review
Songs to syntax: the linguistics of birdsong
Robert C. Berwick¹, Kazuo Okanoya²,³, Gabriel J.L. Beckers⁴ and Johan J. Bolhuis⁵
¹ Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
² Department of Cognitive and Behavioral Sciences, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902, Japan
³ RIKEN Brain Science Institute, 2-1 Hirosawa, Wako-City, Saitama 351-0198, Japan
⁴ Department of Behavioural Neurobiology, Max Planck Institute for Ornithology, D-82319 Seewiesen, Germany
⁵ Behavioural Biology and Helmholtz Institute, University of Utrecht, Padualaan 8, 3584 CH Utrecht, The Netherlands
Unlike our primate cousins, many species of bird share with humans a capacity for vocal learning, a crucial factor in speech acquisition. There are striking behavioural, neural and genetic similarities between auditory-vocal learning in birds and human infants. Recently, the linguistic parallels between birdsong and spoken language have begun to be investigated. Although both birdsong and human language are hierarchically organized according to particular syntactic constraints, birdsong structure is best characterized as ‘phonological syntax’, resembling aspects of human sound structure. Crucially, birdsong lacks semantics and words. Formal language and linguistic analysis remain essential for the proper characterization of birdsong as a model system for human speech and language, and for the study of brain and cognitive evolution.

Human language and birdsong: the biological perspective
Darwin [1] noted strong similarities between the ways that human infants learn to speak and birds learn to sing. This ‘perspective from organismal biology’ [2] initially led to a focus on apes as model systems for human speech and language (see Glossary), with limited success, however [3,4]. Since the end of the 20th century, biologists and linguists have shown a renewed interest in songbirds, revealing fascinating similarities between birdsong and human speech at the behavioural, neural, genomic and cognitive levels [5–9]. Yip has reviewed the relationship between human phonology and birdsong [7]. Here, we address another potential parallel between birdsong and human language: syntax. Comparing syntactic ability across birds and humans is important, because at least since the beginning of the modern era in cognitive science and linguistics, a combinatorial syntax has been viewed as lying at the heart of the distinctive creative and open-ended nature of human language [10].
Here, we discuss current understanding of the relationship between birdsong and human syntax in light of recent experimental and linguistic advances, focusing on the formal parallels and their implications for underlying cognitive and computational abilities. Finally, we sketch the prospects for future experimental work, as part of the ongoing debate as to what is species specific about human language [3,11]. We show that, although it has a simple syntactic structure, birdsong cannot be directly compared with the syntactic complexity of human language, principally because it has neither semantics nor a lexicon.

Corresponding author: Bolhuis, J.J. ([email protected]).
1364-6613/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.002 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3

Glossary
Bigram: a subsequence of two elements (notes, words or phrases) in a string.
Context-free language (CFL): a set of strings that can be recognized or generated by a pushdown-stack automaton or context-free grammar. A CFL might have grammatical dependencies nested inside to any depth, but dependencies cannot overlap.
Finite-state automaton (FSA, FA): a computational model of a machine with finite memory, consisting of a finite set of states, a start state, an input alphabet, and a transition function that maps input symbols and current states to some set of next states.
Finite-state grammar (FSG): a grammar that formally replicates the structure of an FSA, also generating the regular languages.
K-reversible finite-state automaton: an FSA that is deterministic when one ‘reverses’ all the transitions so that the automaton runs backwards. One can ‘look behind’ k previous words to resolve any possible ambiguity about which next state to move to.
Language: any possible set of strings over some (usually finite) alphabet of words.
Locally testable language: a strict subset of the regular languages formed by the union, intersection, or complement of strictly locally testable languages.
(First-order) Markov model or process: a random process where the next state of a system depends only on the current state and not its previous states. Applied to word or acoustic sequences, the next word or acoustic unit in the sequence depends only on the current word or acoustic unit, rather than previous words or units.
Mildly context-sensitive language (MCSL): a language family that lies ‘just beyond’ the CFLs in terms of power, and thought to encompass all the known human languages. An MCSL is distinguished from a CFL in that it contains clauses that can be nested inside clauses arbitrarily deeply, with a limited number of overlapping grammatical dependencies.
Morphology: the possible ‘word shapes’ in a language; that is, the syntax of words and word parts.
Phoneme: the smallest possible meaningful unit of sound.
Phonetics: the study of the actual speech sounds of all languages, including their physical properties, the way they are perceived and the way in which vocal organs produce sounds.
Phonology: the study of the abstract sound patterns of a particular language, usually according to some system of rules.
Push-down stack automaton (PDA): an FSA augmented with a potentially unbounded memory store, a push-down stack, that can be accessed on a last-in, first-out basis, similar to a stack of dinner plates, with the last element placed on the stack being the top of the stack and the first accessible memory element. PDAs recognize the class of CFLs.
Recursion: a property of a (set of) grammar rules such that a phrase A can eventually be rewritten as itself with non-empty strings of words or phrase names on either side in the form aAb and where A derives one or more words in the language.
Regular language: a language recognized or generated by an FSA or an FSG.
Semantics: the analysis of the meaning of a language, at the word, phrase, sentence level, or beyond.
Strictly locally testable language (or stringset): a strict subset of the regular languages defined in terms of a finite list of strings of length less than or equal to some upper length k (the ‘window length’).
Sub-regular language: any subset of the regular languages, in particular generally a strict subset with some property of interest, such as local testability.
Syllable: in linguistics, a vowel plus one or more preceding or following consonants.
Syntax: the rules for arranging items (sounds, words, word parts or phrases) into their possible permissible combinations in a language.

[Figure 1: sound spectrogram. Vertical axis: frequency (0–10 kHz); scale bar: 0.5 s; annotated levels: motif, syllable, note; ‘i’ marks introductory notes.]
Figure 1. Sound spectrogram of a typical zebra finch song depicting a hierarchical structure. Songs often start with ‘introductory notes’ (denoted by ‘i’) that are followed by one or more ‘motifs’, which are repeated sequences of syllables. A ‘syllable’ is an uninterrupted sound, which consists of one or more coherent time-frequency traces, which are called ‘notes’. A continuous rendition of several motifs is referred to as a ‘song bout’.

Comparing human language and birdsong
Human speech and birdsong both consist of complex, patterned vocalizations (Figure 1). Such sequential structures can be analysed and compared via formal syntactic methods. Aristotle described language as sound paired with meaning [12]. Although this description is partly accurate, a proper interspecies comparison calls for a more articulated ‘system diagram’ of the key components of human language, and their non-human counterparts. We depict these as a tripartite division (Figure 2): (i) an ‘external interface’, a sensorimotor-driven, input–output system providing proper articulatory output and perceptual analysis; (ii) a rule system generating correctly structured sentence forms, incorporating words; and (iii) an ‘internal interface’ to a conceptual–intentional system of meaning and reasoning; that is, ‘semantics’. Component (i) corresponds to systems for producing, perceiving and
learning acoustic sequences, and might itself involve abstract representations that are not strictly sensorimotor, such as stress placement. In current linguistic frameworks, (i) aligns with acoustic phonetics and phonology, for both production and perception. Component (ii) feeds into both the sensorimotor interface (i) and a conceptual–intentional system (iii), and is usually described via some model of recursive syntax. Although linguists debate the details of these components, there seems to be more general agreement as to the nature of (i), less agreement as to the nature of (ii) and widespread controversy as to (iii). For instance, whereas the connection between a fully recursive syntax and a conceptual–intentional system is sometimes considered to lie at the heart of the species-specific properties of human language, there is considerable debate over the details, which plays out as the distinct variants of current linguistic theories [13–16]. Some of these accounts reduce or even eliminate the role of (ii), assuming a more direct relation between (i) and (iii) (e.g. [17,18]). The system diagram in Figure 2 therefore does not represent any detailed neuroanatomical or abstract ‘wiring diagram’, but rather offers a way to factor apart the distinct knowledge types in the sense of Marr [19]. Notably, our tripartite arrangement does not presuppose that only humans have syntactic rules, or that such rules always fix information content in a language-like manner. For example, in songbirds, sequential syntactic rules might exist only to construct variable song element sequences rather than variable meanings per se [9].

[Figure 2: block diagram. Centre: words (lexical items) + syntactic rules. External interface: phonological forms/sequencing, acoustic-phonetics; perception and production of sounds and gestures (external to organism). Internal interface: concepts, intentions, reasoning (internal to organism).]
Figure 2. A tripartite diagram of abstract components encompassing both human language and birdsong. On the left-hand side, an external interface (i), comprising sensorimotor systems, links the perception and production of acoustic signals to an internal system of syntactic rules, (ii). On the right-hand side, an internal interface links syntactic forms to some system of concepts and intentions, (iii). With respect to this decomposition, birdsong seems distinct from human language in the sense of lacking both words and a fully developed conceptual–intentional system.

Birdsong and human syntax: similarities and differences
Both birdsong and human language are hierarchically organized according to syntactic constraints. We compare them by first considering the complexity of their sound structure, and then turning, in the next section, to aspects beyond this dimension. Overall, we find that birdsong sound structure, at least for the Bengalese finch, seems characterizable by an easily learnable, highly restricted subclass of the regular languages (languages that can be recognized or generated by finite-state machines; see Box 3). Whereas human language sound structure also appears to be describable via finite-state machines, comparable learnability results are lacking in the case of human language, although certain parts of human language sound structure, such as stress patterns, have also recently been shown to be easily learnable [20].
In birdsong, individual notes can be combined as particular sequences into syllables, syllables into ‘motifs’, and motifs into complete song ‘bouts’ (Figure 1). Birdsong thus consists of chains of discrete acoustic elements arranged in a particular temporal order [21–23]. Songs might consist of fixed sequences with only sporadic variation (e.g. zebra finches), or more variable sequences (e.g. nightingales, starlings, or Bengalese finches), where a song element might be followed by several alternatives, with overall song structure describable by probabilistic rules between a finite number of states [23,24] (Figure I, Box 1).
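As a rough illustration of what ‘probabilistic rules between a finite number of states’ can look like in practice, the following Python sketch estimates transition probabilities between song elements by counting which element follows which. The toy songs, the `<s>` and `</s>` boundary markers and the function name `transition_probabilities` are assumptions made for illustration only; they are not data or methods from the studies cited above.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical song-boundary markers

def transition_probabilities(songs):
    """Estimate P(next element | current element) from example songs."""
    counts = defaultdict(Counter)
    for song in songs:
        padded = [START, *song, END]
        # Count every adjacent pair, including the song edges.
        for current, nxt in zip(padded, padded[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }

# Invented toy repertoire: each song is a list of element labels.
songs = [
    ["a", "b", "c"],
    ["a", "b", "b", "c"],
    ["a", "c"],
]

probs = transition_probabilities(songs)
for current, followers in probs.items():
    print(current, followers)
```

In this toy repertoire, element a is followed by b in two of three cases and by c in one, so the estimated rule for a is a choice between two alternatives with unequal likelihood.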
For example, a song of a nightingale is built out of a fixed 4-second note sequence. An individual nightingale has 100–200 song types, clustered into 2–12 ‘packages’. Package singing order remains probabilistic [25]. A starling song bout might last up to 1 minute, composed of many distinct motifs containing song elements in a fixed order lasting 0.5–1.5 seconds. Gentner and Hulse [26] found that a first-order Markov model (i.e. bigrams) suffices to describe most motif sequence information in starling songs (Box 2). Thus, for the most part, the next motif is predictable by the immediately preceding motif. Starlings also use this information to recognize specific song bouts. Similarly, in American thrush species, relatively low-order Markov chains suffice for modelling song sequence variability [27].
Can songbird ‘phonological syntax’ [28] ever be more complex than this? Bengalese finch song typically contains approximately eight song note types organized into 2–5 note ‘chunks’ that also follow local transition probabilities [29] (Figure I, Box 1). Unlike single-note Markov processes, chunks such as the three-note sequence cde can be reused in other places in a song [24,30]. However, chunks are not reused inside other chunks, so the hierarchical depth is strictly limited. If Bengalese finch song could be characterized solely in terms of bigrams, it would belong to the class of so-called ‘strictly locally 2-testable languages’, a highly restricted subset of the class of the regular languages. That is, a bird could verify, either for purposes of production or for recognition, whether a song is properly formed by simply ‘sliding’ a set of two-note sequences or ‘window constraints’ across the entire note sequence, checking to see that all the two-note sequences found ‘pass’ (Box 3). For example, if the valid note sequences were ab, abab, ababab, and so on, then a song must start with an a, every a must be followed by a b, and every b must be followed by an a, except at the song end. Thus, aside from the beginning and end of a song, a bird could check whether a song is well formed by using two bigram templates: [a-b] and [b-a].
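The sliding-window check described above can be sketched in a few lines of Python. The edge markers `<` and `>` and the function names are illustrative assumptions; the four templates correspond to the two bigrams in the text plus the song-start and song-end conditions.

```python
def allowed_bigrams(examples, start="<", end=">"):
    """Collect every adjacent pair, including edge markers, seen in the examples."""
    bigrams = set()
    for song in examples:
        padded = [start, *song, end]
        bigrams.update(zip(padded, padded[1:]))
    return bigrams

def is_well_formed(song, bigrams, start="<", end=">"):
    """Slide a two-element window over the song; every pair must be allowed."""
    padded = [start, *song, end]
    return all(pair in bigrams for pair in zip(padded, padded[1:]))

# Templates for the pattern ab, abab, ababab, ...
templates = {("<", "a"), ("a", "b"), ("b", "a"), ("b", ">")}

print(is_well_formed("abab", templates))  # True
print(is_well_formed("abba", templates))  # False: the pair (b, b) is not allowed
print(is_well_formed("aba", templates))   # False: a song may not end in a
```

Note that the checker keeps no memory beyond the current window, which is what makes this class of patterns so restricted; `allowed_bigrams(["ab", "abab"])` recovers the same four templates from two example songs.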
This turns out to be the simplest kind of pattern recognizable by a finite-state automaton (FSA), because the internal states of the automaton need not be used for any detailed computation aside from bigram note template matching (Box 3). The Bengalese finch song automaton in Figure I (Box 1), which encompasses the full song sequence repertoire extracted from a single, actual bird [31], indicates that birdsong structure can be more complicated than a simple
Box 1. Birdsong, human language syntax and the Chomsky hierarchy
All sets of strings, or languages, can be rank ordered via strict set-inclusion according to their computational power. The resulting ‘rings’ are called the ‘Chomsky hierarchy’ [61] (Figure I; ring numbers are used below). For birdsong and human syntax comparisons, the most important point is the small overlap between the possible languages generated by human syntax (the irregular shaded grey set), as opposed to birdsong syntax (the stippled grey set).
1. The finite languages, all finite sets of strings.
2. The FSA generating the regular languages. An FSA is represented as a directed graph of states with labelled edges, a finite-state transition network. The corresponding grammar of an FSA has rules of the form X → aY or X → a, or right-linear, where X and Y range over possible automaton states (nonterminals), and a ranges over symbols corresponding to the labelled transitions between states. The FSA recognizing the (ab)⁺ language needs only to test for four specific adjacent string symbol pairs (bigrams): the pairs (left-edge, a); (a, b); (b, a); and (b, right-edge) [62].
3. The PDA, generating the CFLs. PDAs are finite-state machines augmented with a potentially unbounded auxiliary memory that can be accessed from the top working down. PDAs can be thought of as augmenting FSA with the ability to use subroutines, yielding the recursive transition networks. Grammars for these languages are consequently more general and can include rules such as X → Ya, X → aYa or X → aXa, or context-free rules.
4. The PDA whose stacks might themselves be augmented with embedded stacks, generating the MCSLs. Examples of such patterns in human languages are rare, but do exist [63,64]. These patterns are exemplified by stringsets such as aⁿbᵐcⁿdᵐ, where the as and cs must match up in number and order and, separately, the bs and the ds, so-called ‘cross-serial’ dependencies (see [65,66]). A broad range of linguistic theories accommodate this much complexity [13–16,59,66]. No known human languages require more power than this.
The two irregular sets drawn cutting across the hierarchy depict the probable location of the human languages (shaded) and birdsong (stippled). Both clearly do not completely subsume any of the previously mentioned stringsets. Birdsong and human languages intersect at the very bottom owing to the possible overlap of finite lists of human words and the vocal repertoire of certain birds.
[Figure I: nested rings, from innermost to outermost: 1 Finite languages; 2 Regular languages (containing a Bengalese finch song automaton with states 0–3 and arcs labelled ab, cde and fg); 3 Context-free languages (aⁿbⁿ; example: the starling the cats want was tired); 4 Mildly context-sensitive languages (aⁿbⁿcⁿdⁿ; example: Jon Mary Peter Jane lets help teach swim); 5 Context-sensitive languages (aⁿbⁿcⁿdⁿeⁿ); 6 Recursively enumerable languages. Two irregular regions mark human languages and birdsong.]
Figure I. The Chomsky hierarchy of languages along with the hypothesized locations of both human language and birdsong. The nested rings in the figure correspond to the increasingly larger sets, or languages, generated or recognized by more powerful automata or grammars. An example of the state transition diagram corresponding to a typical Bengalese finch song [31] is shown in the next ring after this, corresponding to some subset of the regular languages.
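The Bengalese finch transition network in Figure I can be written down as a small lookup table. The Python sketch below is a reconstruction from the labels that survive in the figure and from the description in the main text (states 0 to 3, chunks ab, cde and fg, with state 3 as the end state); the exact set of arcs is therefore an assumption, and transition probabilities are left out.

```python
# Arcs reconstructed from Figure I (an assumption): (state, chunk) -> next state
TRANSITIONS = {
    (0, "ab"): 1,
    (1, "cde"): 2,
    (2, "ab"): 1,   # loop back: another cde ab repetition
    (2, "fg"): 3,
    (3, "ab"): 1,   # the song may continue after cde fg
}
START_STATE, FINAL_STATES = 0, {3}

def accepts(chunks):
    """Run the automaton over a sequence of note chunks."""
    state = START_STATE
    for chunk in chunks:
        key = (state, chunk)
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state in FINAL_STATES

print(accepts("ab cde fg".split()))                # True
print(accepts("ab cde ab cde ab cde fg".split()))  # True
print(accepts("ab cde fg ab cde fg".split()))      # True
print(accepts("ab cde ab".split()))                # False: stops before fg
print(accepts("ab fg".split()))                    # False
```

Unlike a bigram checker, this recognizer relies on its current state to remember where it is in the song, which is the extra power the main text attributes to the finch automaton.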
Box 2. Is recursion for the birds?
Recursive constructs occur in many familiar human language examples, such as the starling the cats want was tired, where one finds a full sentence phrase, the starling was tired, that contains within it a second, ‘nested’ or ‘self-embedded’ sentence, S, the cats want. In this case, the rule that constructs Sentences can apply to its own output, recursively generating a pattern of ‘nested’ or ‘serial’ dependencies. We can write a simple CFG with three rules that illustrates this concept as follows: S → aB; B → Sb; S → ε, where ε corresponds to the empty symbol. We can use this grammar to show that one can first apply the rule that expands S as aB and then apply the second rule to expand B as Sb, thus obtaining aSb; S now appears with non-null elements on both sides, so we say that S has been ‘self-embedded’. If we now use the third rule to replace S with the empty symbol, we obtain the output ab. Alternatively, we could apply the first and second rules over again to obtain the string aabb, or, more generally, aⁿbⁿ for any integer n. In our example, the as and the bs in fact form nested dependencies because they are correspondingly paired in the same way that the starling must be paired with the singular form was, rather than the plural were; similarly, the cats must be paired with want rather than the singular form wants. So, for example, to indicate a nested dependency pattern properly, the form a³b³ should be more accurately written as a₁a₂a₃b₃b₂b₁, where the indices indicate which as and bs must be paired up. Thus, any method to detect whether an animal can either recognize or produce a strictly context-free pattern requires that one demonstrates that the correct as and bs are paired up; merely recognizing that the number of as matches the number of bs does not suffice. This is one key difficulty with the Gentner et al. protocol and result [56], which probed only for the ability of starlings to recognize an equal number of as and bs in terms of warble and rattle sound classes (i.e. warble³rattle³ patterns) but did not test for whether these warbles and rattles were properly paired off in a nested dependency fashion. As a result, considerable controversy remains as to whether any nonhuman species can truly recognize strictly context-free patterns [11,67].
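The derivation in Box 2 can be traced mechanically. The Python sketch below expands the three rules a chosen number of times and then contrasts two tests: one that only counts as and bs, and one that uses a stack to check that indexed elements pair off in nested order. The indexed token format (`a1`, `b1`) and the function names are illustrative assumptions and are not part of any cited experimental protocol.

```python
def derive(n):
    """Apply S -> aB and B -> Sb n times, then S -> empty, yielding a^n b^n."""
    form = "S"
    for _ in range(n):
        form = form.replace("S", "aB")  # S -> aB
        form = form.replace("B", "Sb")  # B -> Sb
    return form.replace("S", "")        # S -> empty symbol

def same_count(tokens):
    """Weaker test: all as precede all bs and their numbers are equal."""
    letters = [t[0] for t in tokens]
    n = letters.count("a")
    return letters == ["a"] * n + ["b"] * n

def properly_nested(tokens):
    """Stronger test: each b must close the most recently opened a with the same index."""
    stack = []
    for token in tokens:
        kind, index = token[0], token[1:]
        if kind == "a":
            stack.append(index)
        elif not stack or stack.pop() != index:
            return False
    return not stack

nested = ["a1", "a2", "a3", "b3", "b2", "b1"]
crossed = ["a1", "a2", "a3", "b1", "b2", "b3"]

print(derive(3))                                      # aaabbb
print(same_count(nested), properly_nested(nested))    # True True
print(same_count(crossed), properly_nested(crossed))  # True False
```

The crossed sequence passes the counting test but fails the pairing test, which is the gap between matching numbers and demonstrating nested dependencies that the box describes.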
Box 3. Descriptive complexity, birdsong and human syntax
The substructure of the regular languages, sub-regular language hierarchies, could be relevant to gain insight into the computational capacities of animals and humans in the domain of acoustic and artificial language learning [62,68,69]. Similar to the Chomsky hierarchy, the family of regular languages can itself be ordered in terms of strictly inclusive sets of increasing complexity [69]. The ordering uses the notion of descriptive complexity, corresponding informally to how much local context and internal state information must be used by a finite-state machine to recognize a particular string pattern correctly. For example, to recognize the regular pattern used in the starling experiment [56], (ab)⁺, a finite-state machine needs only to check four adjacency relations or bigrams as they appear directly in a candidate string: the beginning of the string followed by an a; an a followed by a b; a b followed by an a; or else a b followed by the end of the string. We can say such a pattern is strictly locally 2-testable or SL2 [69]. As we increase the length of these factors, we obtain a strictly increasing set hierarchy of regular languages, the strictly locally testable languages, denoted SLk, where k is the ‘window length’ [56,62,68]. It might be of some value to understand the range of sub-regular patterns that birds can perceive or produce. To tentatively answer this question, we applied a program for computing local testability [38,44,70]. For example, the FSA in Figure I (Box 1) recognizes a language that is locally testable. This answer agrees with the independent findings of Okanoya [31] and Gentner [26,57]. Other sub-regular pattern families have been recently explored in connection with human language sound systems [20,71]. Some of these might ultimately prove relevant to birdsong because they deal with acoustic patterns. In particular, possible sound combinations might fall into the same classes as those of human languages. Finally, all these sub-regular families could be extended straightforwardly to include phrases explicitly, but still without the ability to ‘count’, as seems true of human language ([66,72–74]; R. Berwick, PhD Thesis, MIT, 1982). It is clear that we have only just begun to scratch the surface of the detailed structure of sub-regular patterns and their cognitive relevance.

bigram description. Although there are several paths through this network from the beginning state on the left to the double-circled end state on the right, the ‘loop’ back from state 2 to state 1 along with the loop from state 3 to 1 can generate songs with an arbitrary number of cde ab notes, followed by the notes cde fg. From there, a song can continue with the notes ab back to state 1, and so lead to another arbitrary number of cde ab notes, all finally ending in cde fg. In fact, the transitions between states are stochastic; that is, the finch can vary its song by choosing to go from state 2 back to state 1 with some likelihood that is measurably different from the likelihood of continuing on to state 3. In any case, formally this means that the notes cde fg can appear in the ‘middle’ of a song, arbitrarily far from either end, bracketed on both sides by some arbitrarily large number of cde ab repetitions. Such a note pattern is no longer strictly locally testable because now there can be no fixed-length ‘window’ that can check whether a note sequence ‘passes’. Instead of checking the note sequences directly, one must use the memory of the FSA indirectly to ‘wait’ after encountering the first of a possibly arbitrarily long sequence of cde abs. The automaton must then stay in this state until the required cde fg sequence appears. Such a language pattern remains recognizable by a restricted FSA, but one more powerful than a simple bigram checking machine. Such complexity seems typical. Figure 3 displays
more fully a second, more complex Bengalese finch song drawn according to the same transition network methodology, this time explicitly showing the probability that one state follows another via the numbers on the links between [()TD$FIG]
[Figure 3 graphic: state-transition diagram whose links are labelled with note sequences (including aaa, b, bcadb, adb, hh, eekfff, bhh, ilga, lga, f and jaa) and transition probabilities.]
Figure 3. Probabilistic finite-state transition diagram of the song repertoire of a Bengalese finch. Directed transition links between states are labelled with note sequences along with the probability of moving along that particular link and producing the associated note sequence. The possibility of loops on either side of fixed note sequences such as hh or lga means that this song is not strictly locally testable (see Box 3 and main text). However, it is still k-reversible, and so easily learned from example songs [35]. Adapted, with permission, from [75].
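A probabilistic transition network of this kind can be sketched in a few lines of code. The example below is only an illustration: the three-state layout loosely follows the verbal description of the cde ab / cde fg network in the text, while the state names, the `TRANSITIONS` table and every probability in it are hypothetical and are not read off Figure 3 or Figure I.

```python
import random
import re

END = "end"

# Hypothetical transition table: state -> list of (notes, next_state, probability).
# The probabilities are invented for illustration only.
TRANSITIONS = {
    "1": [("cde", "2", 1.0)],
    "2": [("ab", "1", 0.6), ("fg", "3", 0.4)],
    "3": [("ab", "1", 0.3), (None, END, 0.7)],
}

# Every song this toy network can produce, written as a regular expression.
SONG_PATTERN = re.compile(r"(cde ab )*cde fg( ab (cde ab )*cde fg)*")


def sample_song(rng):
    """Walk the network from state 1 to the end state, collecting note chunks."""
    state, song = "1", []
    while state != END:
        links = TRANSITIONS[state]
        weights = [p for _, _, p in links]
        notes, state, _ = rng.choices(links, weights=weights)[0]
        if notes is not None:
            song.append(notes)
    return song


rng = random.Random(0)
for _ in range(5):
    song = " ".join(sample_song(rng))
    # Each sampled song should be a member of the regular language above.
    print(song, bool(SONG_PATTERN.fullmatch(song)))
```

Because state 2 can return to state 1 any number of times, the chunk cde fg can end up arbitrarily far from the start of a sampled song, which is the property the text uses to argue against a fixed-window description.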
states [32]. It too contains loops, including one from the final, double-circled state back to the start, so that a certain song portion can occur arbitrarily far into the middle of a song. For example, among several other possibilities, the note sequence lga, which occurs on the transition to the double-circled final state, can be preceded by any number of b hh repetitions, as well as followed by jaa b bcadb and then an arbitrary number of eekfff adb notes, again via a loop. Nightingales, another species with complex songs, can sing motifs with notes that are similarly embedded within looped note chunks [33]. Considering that there are hundreds of such motifs in the song repertoire of a nightingale, their songs must be at least as complex as those of Bengalese finches, at least from this formal standpoint. More precisely and importantly, the languages involved here, at least in the case of the Bengalese finch, and perhaps other avian species, are closely related to constraints on regular languages that enable them to be easily learned [31,34,35]. Kakishita et al. [29] constructed a particular kind of restricted FSA generating the observed sequences (a k-reversible FSA). Intuitively, in this restricted case, a learner can determine whether two states are equivalent by examining only the local note sequences that can follow from them [36,37] (Figure I, Box 1). It is this local property that enables a learner to learn correctly and efficiently the proper automaton corresponding to external song sequences simply by listening to them, something that is impossible in general for FSA [38,39].
What about human language sound structure or its phonology? This is also now known to be describable purely in terms of FSA [40], a result that was not anticipated by earlier work in the field [41], which assumed more general computational devices well beyond the power of FSA (Box 1).
For example, there are familiar ‘phonotactic’ constraints in every language, such that English speakers know that a form such as ptak could not be a possible English word, but plast might be [42]. To be sure, such constraints are often not ‘all or none’ but might depend on the statistical frequency of word subparts. Such gradation might also be present in birdsong, as reflected by the probabilistic transitions between states, as shown in Figure I (Box 1) and Figure 3 [31,43]. Once stochastic gradation is modelled, phonotactic constraints more closely mirror those found in birdsong finite-state descriptions. Such formal findings have been buttressed by recent experiments with both human infants and Bengalese finches, confirming that adjacent acoustic dependencies of this sort are readily learnable from an early age using statistical and prosodic cues [32,44–46]. However, other human sound structure rules apparently go beyond this simplest kind of strictly local description, although remaining finite state. These include the rules that account for ‘vowel harmony’ in languages such as Turkish, where, for example, the properties of the vowel u in the word pul, ‘stamp’, are ‘propagated’ through to all its endings [7], and stress patterns (J. Heinz, PhD thesis, University of California at Los Angeles, 2007). Whereas the limited-depth hierarchies that arise in songbird syntax seem reminiscent of the bounded rhythmic structures or
‘beat patterns’ found in human speech or music, it remains an open question whether birdsong metrical structure is amenable to the formal analysis of musical meter, or even how stress is perceived in birds as opposed to humans [47–49] (Box 4).
Tweets to phrases: the role of words
Turning to syntactic description that lies beyond sound structure, we find that birdsong and human language syntax sharply diverge. In human syntax, but not birdsong, hierarchical combinations of effectively arbitrary depth can be assembled by combining words and word parts, such as the addition of s to the end of apple to yield apples, a word-construction process called ‘morphology’. Human syntax then goes even further, organizing words into higher-order phrases and entire sentences. None of these additional levels appear to be found in birdsong. This reinforces Marler’s long-standing view [28] that birdsong might best be regarded as ‘phonological syntax’, a formal language; that is, a set of units (here acoustic elements) that are arranged in particular ways but not others according to a definable rule set.
What accounts for this difference between birdsong and language? First, birdsong lacks semantics and words in the human sense, because song elements are not combined to yield novel ‘meanings’. Instead, birdsong can convey only a limited set of intentions, as a graded, holistic communication system to attract mates or deter rivals and defend territory. In terms of the tripartite diagram of Figure 2, the conceptual–intentional component is greatly reduced. Birds might still have some internalized conceptual–intentional system, but for whatever reason it is not connected to a syntactic and externalization component. By contrast, human syntax is intimately wedded to our conceptual system, involving words in both their syntactic and semantic aspects, so that, for example, combining ‘red’ with ‘apples’ yields a meaning quite distinct from, for example, ‘green apples’.
It seems plausible that this single distinction drives fundamental differences between birdsong and human syntax. In particular, birds such as Bengalese finches and nightingales can and do vary their songs in the acoustic domain, rearranging existing ‘chunks’ to produce hundreds of distinct song types that might serve to identify individual birds and their degree of sexual arousal, as well as local ‘dialect-based’ congener groups [50–52], although a recent systematic study of song recombination suggests that birds rarely introduce improvised song notes or sequences [32]. For example, skylarks mark individual identity by particular song notes [51], as starlings do with song sequences [52]; and canaries use special ‘sexy syllables’ to strengthen the effect of mate attraction [50]. However, more importantly, this bounded acoustic creativity pales in comparison with the seemingly limitless open-ended variation observed in even a single human speaker, where variation might be found not only at the acoustic level in how a word is spoken, but also in how words are combined into larger structures with distinct meanings, what could be called ‘compositional creativity’. It is this latter aspect that appears absent in birdsong. Song variants do not result in distinct ‘meanings’ with completely new semantics, but serve only to modify the entirety of the
original behavioural function of the song within the context of mating, never producing a new behavioural context, and so remaining part of a graded communication system. For example, the ‘sexy syllable’ conveys the strength of the motivation of a canary, but does not change the meaning of its song [50]. In this sense, birdsong creativity lies along a finite, acoustic combinational dimension, never at the level of human compositional creativity.
Second, unlike birdsong, human language sentences are potentially unbounded in length and structure, limited only by extraneous factors, such as short-term memory or lung capacity [53]. Here too words are important. The combination of the Verb ate and the Noun apples yields the combination ate apples that has the properties of a Verb rather than a Noun. This effectively ‘names’ the combination as a new piece of hierarchical structure, a phrase, with the label ate, dubbed the head of the phrase [54]. This new Verb-like combination can then act as a single object and enter into further syntactic combinations. For example, Allison and ate apples can combine to form Allison ate apples, again taking ate as the head. Phrases can recombine ad infinitum to form ever-longer sentences, so exhibiting the open-ended novelty that von Humboldt famously called ‘the infinite use of finite means’ [55], which is immediately recognized as the hallmark of human language: Pat recalled that Charlie said that Moira thought that Allison ate apples. Thus in general, sentences can be embedded within other sentences, recursively, as in the starling the cats want was tired, in a ‘nested’ dependency pattern, where we find one ‘top-level’ sentence, the starling was tired, consisting of a Subject, the starling, and a Predicate phrase, was tired, that in turn itself contains a Sentence, the cats want, formed out of another Subject, the cats, and a Predicate, want.
Informally, we call such embeddings ‘recursive’, and the resulting languages ‘context-free languages’ (CFLs; Box 1). This pattern reveals a characteristic possibility for human language, a ‘nested dependency’. The singular number feature associated with the Subject, the starling, must match up with the singular number feature associated with the top-level Verb form was, whereas the embedded sentence, the cats want, has a plural Subject, the cats, that must agree with the plural Verb form want. Such ‘serial nested dependencies’, in the abstract form a₁a₂b₂b₁, are both produced and recognized quite generally in human language [53].
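The difference between a nested and a crossed dependency can be made concrete with a small checker. The sketch below is illustrative only: the token encoding (a kind, "a" or "b", paired with an index) and the function name `is_nested` are assumptions introduced here, not part of any cited study. It uses a stack, the standard device for recognizing context-free nesting.

```python
def is_nested(tokens):
    """Return True if each a_i is closed by its own b_i, innermost first."""
    stack = []
    for kind, index in tokens:
        if kind == "a":
            stack.append(index)  # open a dependency
        elif kind == "b":
            # b must match the most recently opened, still unmatched a
            if not stack or stack.pop() != index:
                return False
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    return not stack  # no dependency may be left open


# the starling (a1) the cats (a2) want (b2) was (b1)
nested = [("a", 1), ("a", 2), ("b", 2), ("b", 1)]
# a crossed (overlapping) pattern a1 a2 b1 b2
crossed = [("a", 1), ("a", 2), ("b", 1), ("b", 2)]

print(is_nested(nested), is_nested(crossed))  # prints: True False
```

Checking only that some as are followed by an equal number of bs would accept both sequences; it is the index comparison that tests the dependency itself.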
The evidence for a corresponding ability in birds remains weak, despite recent experiments on training starlings to recognize such patterns (which must be carefully distinguished from the ability to produce such sequences in a naturalistic setting, as described in the previous section) [56,57]. In starlings, only the ability to recognize nesting was tested, and not the crucial dependency aspect that pairs up particular a elements with particular b elements [11] (Box 2). In fact, human syntax goes beyond this kind of recursion to encompass certain strictly mildly context-sensitive constructions that have even more complex, overlapping dependency patterns (Box 1). Importantly, even though they differ on much else, since approximately 1970 a broad range of syntactic theories, comprising most of the major strands of modern linguistic thought, have incorporated Bloomfield’s [54] central insight that human language syntax is combinatorially word-centric in the manner described above [13–16,58,59], as well as having the power to describe both nested and overlapped dependencies. To our knowledge, such mild context-sensitivity has never been demonstrated, or even tested, in any nonhuman species.
In short, word-driven structure building seems totally absent in songbird syntax, and this limits its potential hierarchical complexity. Birdsong motifs lack word-centric ‘heads’ and so cannot be individuated via some internal labelling mechanism to participate in the construction of arbitrary-depth structures. Whereas a starling song might consist of a sequence of warbles and rattle motif classes [57], there seems to be no corresponding way in which the acoustical features of the warble class are then used to ‘name’ distinctively the warble-rattle sequence as a whole, so that this combination can then be manipulated as a single unit and built into ever-more complex syntactic structures.
Birdsong phrase structure?
Nonetheless, recent findings suggest that birds have a limited ability to construct phrases, at least in the acoustic domain, as noted above, accounting for individual variation within species [32,33]. In particular, there might be acoustic segmentation chunking in the self-produced song of the Bengalese finch [29,31]. Suge and Okanoya used the ‘click’ protocol pioneered by Fodor et al. [60] to probe the ‘psychological reality’ of syntactic phrases in humans [34].
Box 4. Questions for future research
We do not know for certain the descriptive complexity of birdsong. Does it belong to any particular member of the sub-regular language hierarchies, or does it lie outside these, possibly in the family of strictly CFLs? If birdsong is contained in some sub-regular hierarchy, how is this result to be reconciled with the findings in the Gentner et al. starling study [56]? If birdsong is context free, then we can again ask to what family of CFLs it belongs: is it a deterministic CFL (as opposed to a general CFL)? Is it learnable from positive examples? Current tests of finite-state versus CFL abilities in birdsong have chosen only the weakest (computationally and descriptively simplest) finite-state language to compare against the simplest CFL. Can starlings be trained to recognize descriptively more complex finite-state patterns; for example, a locally testable but not strictly locally testable finite-state pattern, such as a⁺(ba⁺)⁺, where a bird would have to recognize a note(s) such as b arbitrarily far from both ends of a song [68]? What about sub-regular patterns that are more complicated than this? The Gentner et al. experiment [56] did not test for the nested dependency structure characteristic of embedded sentences in human language. Can birds be trained to recognize truly nested dependencies, even if just of finite depth? Using the methods developed in, for example, [71], what is the descriptive complexity of prosody or rhythmic stress patterns in birdsong? What are the neural mechanisms underlying variable song sequences in songbirds? Both human speech and birdsong involve sequentially arranged vocalizations. Are there similar neural mechanisms for the production and perception of such sequences in songbirds and humans? Bolhuis et al. [9] have summarized current knowledge of these mechanisms in humans and birds.
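To illustrate what separates a locally testable pattern from a strictly locally testable one, the following sketch assumes that the example pattern in this box is the regular expression a+(ba+)+ (one or more b notes, each flanked by runs of a). The boundary markers `>` and `<`, the window length of 2 and the function names are choices made for this illustration only.

```python
import re
from itertools import product


def factors(s, k=2):
    """Set of length-k substrings of s, padded with start '>' and end '<' markers."""
    padded = ">" + s + "<"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def locally_testable_accepts(s):
    """Decide membership from the set of 2-factors: some required, one forbidden."""
    f = factors(s)
    return {">a", "a<", "ab"} <= f and "bb" not in f


PATTERN = re.compile(r"a+(ba+)+")

# The factor-set test agrees with the regular expression on all strings up to length 8.
for n in range(9):
    for chars in product("ab", repeat=n):
        s = "".join(chars)
        assert locally_testable_accepts(s) == bool(PATTERN.fullmatch(s))

# A strictly 2-local description can only forbid factors, so it cannot demand a b:
# every factor of the non-member 'aaa' also occurs in the member 'aabaa'.
print(factors("aaa") <= factors("aabaa"))  # prints: True
```

The locally testable test may require that a factor such as ab is present somewhere, whereas a strictly local scan can only reject windows it has been told are illegal. The final line shows the case for a window of length 2; longer runs of a give the same argument for larger windows.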
Applied to human language, subjects given ‘click’ stimuli in the middle of phrases such as ate the apples tend to ‘migrate’ their perception of where the click occurs to the beginning or end of the phrase. Suge and Okanoya established that 3–4 note sequences, such as the cde in Figure I (Box 1), are perceived as unitary ‘chunks’, so that the finches tended to respond as if the click was at the c or e end of a cde ‘chunk’ [34]. Importantly, recall that Bengalese finches are also able to produce such sequence chunks, as described earlier and in Figure I (Box 1) and Figure 3. This is strikingly similar to the human syntactic capacity to ‘remember’ an entire sequence encapsulated as a single phrase or a ‘state’ of an automaton, and to reuse that encapsulation elsewhere, just as human syntax reuses Noun Phrases and Verb Phrases. However, Bengalese finches do not seem to be able to manipulate chunks with the full flexibility of dependent nesting found in human syntax. One might speculate that, with the addition of words, humans acquired the ability to label and ‘hold in memory’ in separate locations distinct phrases such as Allison ate apples and Moira thought otherwise, parallel to the ability to label and separately store in memory the words ate and thought. Once words infiltrated the basic pre-existing syntactic machinery, the combinatory possibilities became open ended.
Conclusions and perspectives
Despite considerable linguistic interest in birdsong, few studies have applied formal syntactic methods to its structure. Those that do exist suggest that birdsong syntax lies well beyond the power of bigram descriptions, but is at most only as powerful as k-reversible regular languages, lacking the nested dependencies that are characteristic of human syntax [11,29,56,57]. This is probably because of the lack of semantics in birdsong, because song sequence changes typically alter message strength but not message type.
This would imply that birdsong might best serve as an animal model to study learning and neural control of human speech [9], rather than internal syntax or semantics per se. Furthermore, comparing the structure of human speech and birdsong can be a useful tool for the study of the evolution of brain and behaviour (Box 4). Bolhuis et al. [9] have argued that, in the evolution of vocal learning, both common descent (homologous brain regions) and evolutionary convergence (distant taxa exhibiting functionally similar auditory-vocal learning) have a role.
References
1 Darwin, C. (1882) The Descent of Man and Selection in Relation to Sex, Murray
2 Margoliash, D. and Nusbaum, H.C. (2009) Language: the perspective from organismal biology. Trends Cogn. Sci. 13, 505–510
3 Hauser, M.D. et al. (2002) The faculty of language: what is it, who has it, and how did it evolve? Science 298, 1569–1579
4 Bolhuis, J.J. and Wynne, C.D.L. (2009) Can evolution explain how minds work? Nature 458, 832–833
5 Doupe, A.J. and Kuhl, P.K. (1999) Birdsong and human speech: common themes and mechanisms. Annu. Rev. Neurosci. 22, 567–631
6 Bolhuis, J.J. and Gahr, M. (2006) Neural mechanisms of birdsong memory. Nature Rev. Neurosci. 7, 347–357
7 Yip, M. (2006) The search for phonology in other species. Trends Cogn. Sci. 10, 442–446
8 Okanoya, K. (2007) Language evolution and an emergent property. Curr. Op. Neurobiol. 17, 271–276
9 Bolhuis, J.J. et al. (2010) Twitter evolution: converging mechanisms in birdsong and human speech. Nature Rev. Neurosci. 11, 747–759
10 Chomsky, N. (1966) Cartesian Linguistics, Harper & Row
11 Corballis, M.C. (2007) Recursion, language, and starlings. Cogn. Sci. 31, 697–704
12 Aristotle (1970) Historia Animalium. v.II, Harvard University Press
13 Steedman, M. (2001) The Syntactic Process, MIT Press
14 Kaplan, R. and Bresnan, J. (1982) Lexical-functional grammar: a formal system for grammatical relations. In The Mental Representation of Grammatical Relations (Bresnan, J., ed.), pp. 173–281, MIT Press
15 Gazdar, G. et al. (1985) Generalized Phrase-structure Grammar, Harvard University Press
16 Pollard, C. and Sag, I. (1994) Head-driven Phrase Structure Grammar, University of Chicago Press
17 Culicover, P. and Jackendoff, R. (2005) Simpler Syntax, Oxford University Press
18 Goldberg, A. (2006) Constructions at Work: The Nature of Generalization in Language, Oxford University Press
19 Marr, D. (1982) Vision, W.H. Freeman & Co
20 Rogers, J. et al. (2010) On languages piecewise testable in the strict sense. In Proceedings of the 11th Meeting of the Mathematics of Language Association (eds), pp. 255–265, Springer-Verlag
21 Okanoya, K. (2004) The Bengalese finch: a window on the behavioral neurobiology of birdsong syntax. Ann. NY Acad. Sci. 1016, 724–735
22 Sasahara, K. and Ikegami, T. (2007) Evolution of birdsong syntax by interjection communication. Artif. Life 13, 259–277
23 Catchpole, C.K. and Slater, P.J.B. (2008) Bird Song: Biological Themes and Variations (2nd edn), Cambridge University Press
24 Wohlgemuth, M.J. et al. (2010) Linked control of syllable sequence and phonology in birdsong. J. Neurosci. 29, 12936–12949
25 Todt, D. and Hultsch, H. (1996) Acquisition and performance of repertoires: ways of coping with diversity and versatility. In Ecology and Evolution of Communication (Kroodsma, D.E. and Miller, E.H., eds), pp. 79–96, Cornell University Press
26 Gentner, T. and Hulse, S. (1998) Perceptual mechanisms for individual vocal recognition in European starlings, Sturnus vulgaris. Anim. Behav. 56, 579–594
27 Dobson, C.W. and Lemon, R.E. (1979) Markov sequences in songs of American thrushes. Behaviour 68, 86–105
28 Marler, P. (1977) The structure of animal communication sounds. In Recognition of Complex Acoustic Signals: Report of the Dahlem Workshop on Recognition of Complex Acoustic Signals, Berlin (Bullock, T.H., ed.), pp. 17–35, Abakon-Verlagsgesellschaft
29 Kakishita, Y. et al. (2009) Ethological data mining: an automata-based approach to extract behavioural units and rules. Data Min. Knowl. Disc. 18, 446–471
30 Hilliard, A.T. and White, S.A. (2009) Possible precursors of syntactic components in other species. In Biological Foundations and Origin of Syntax (Bickerton, D. and Szathmáry, E., eds), pp. 161–184, MIT Press
31 Okanoya, K. (2004) Song syntax in Bengalese finches: proximate and ultimate analyses. Adv. Stud. Behav. 34, 297–345
32 Takahasi, M. et al. (2010) Statistical and prosodic cues for song segmentation learning by Bengalese finches (Lonchura striata var. domestica). Ethology 116, 481–489
33 Todt, D. and Hultsch, H. (1998) How songbirds deal with large amount of serial information: retrieval rules suggest a hierarchical song memory. Biol. Cybern. 79, 487–500
34 Suge, R. and Okanoya, K. (2010) Perceptual chunking in the self-produced songs of Bengalese finches (Lonchura striata var. domestica). Anim. Cog. 13, 515–523
35 Kakishita, Y. et al. (2007) Pattern extraction improves automata-based syntax analysis in songbirds. ACAL 2007. Lect. Notes in Artif. Intell. 828, 321–333
36 Kobayashi, S. and Yokomori, T. (1994) Learning concatenations of locally testable languages from positive data. Algorithmic Learning Theory, Lect. Notes in Comput. Sci. 872, 407–422
37 Kobayashi, S. and Yokomori, T. (1997) Learning approximately regular languages with reversible languages. Theor. Comput. Sci. 174, 251–257
38 Angluin, D. (1982) Inference of reversible languages. J. ACM 29, 741–765
39 Berwick, R. and Pilato, S. (1987) Learning syntax by automata induction. J. Mach. Learning 3, 9–38
40 Johnson, C.D. (1972) Formal Aspects of Phonological Description, Mouton
41 Chomsky, N. and Halle, M. (1968) The Sound Patterns of English, Harper & Row
42 Halle, M. (1978) Knowledge unlearned and untaught: what speakers know about the sounds of their language. In Linguistic Theory and Psychological Reality (Halle, M. et al., eds), pp. 294–303, MIT Press
43 Pierrehumbert, J. and Nair, R. (1995) Word games and syllable structure. Lang. Speech 38, 78–116
44 Kuhl, P. (2008) Early language acquisition: cracking the speech code. Nat. Rev. Neurosci. 5, 831–843
45 Newport, E. and Aslin, R. (2004) Learning at a distance. I. Statistical learning of non-adjacent regularities. Cog. Sci. 48, 127–162
46 Gervain, J. and Mehler, J. (2010) Speech perception and language acquisition in the first year of life. Ann. Rev. Psychol. 61, 191–218
47 Halle, M. and Vergnaud, J-R. (1990) An Essay on Stress, MIT Press
48 Lerdahl, F. and Jackendoff, R. (1983) A Generative Theory of Tonal Music, MIT Press
49 Fabb, N. and Halle, M. (2008) A New Theory of Meter in Poetry, Cambridge University Press
50 Kreutzer, M. et al. (1999) Social stimulation modulates the use of the ‘A’ phrase in male canary songs. Behaviour 136, 1325–1334
51 Briefer, E. et al. (2009) Response to displaced neighbours in a territorial songbird with a large repertoire. Naturwissenschaften 96, 1067–1077
52 Knudsen, D.P. and Gentner, T.Q. (2010) Mechanisms of song perception in oscine birds. Brain Lang. 115, 59–68
53 Chomsky, N. and Miller, G. (1963) Finitary models of language users. In Handbook of Mathematical Psychology (Luce, R. et al., eds), pp. 419–491, Wiley
54 Bloomfield, L. (1933) Language, Henry Holt
55 von Humboldt, W. (1836) Über die Verschiedenheit des menschlichen Sprachbaues und ihren Einfluss auf die geistige Entwickelung des Menschengeschlechts, Ferdinand Dümmler
56 Gentner, T.Q. et al. (2006) Recursive syntactic pattern learning by songbirds. Nature 440, 1204–1207
57 Gentner, T. (2007) Mechanisms of auditory pattern recognition in songbirds. Lang. Learn. Devel. 3, 157–178
58 Chomsky, N. (1970) Remarks on nominalization. In Readings in English Transformational Grammar (Jacobs, R.A.P. and Rosenbaum, P., eds), pp. 184–221, Ginn
59 Joshi, A. et al. (1991) The convergence of mildly context-sensitive grammar formalisms. In Foundational Issues in Natural Language Processing (Sells, P. et al., eds), pp. 31–82, MIT Press
60 Fodor, J. et al. (1965) The psychological reality of linguistic segments. J. Verb. Learn. Verb. Behav. 4, 414–420
61 Chomsky, N. (1956) Three models for the description of language. IRE Trans. Info. Theory 2, 113–124
62 Rogers, J. and Hauser, M. (2010) The use of formal language theory in studies of artificial language learning: a proposal for distinguishing the differences between human and nonhuman animal learners. In Recursion and Human Language (van der Hulst, H., ed.), pp. 213–232, De Gruyter Mouton
63 Huybregts, M.A.C. (1984) The weak adequacy of context-free phrase structure grammar. In Van Periferie Naar Kern (de Haan, G.J. et al., eds), pp. 81–99, Foris
64 Shieber, S. (1985) Evidence against the context-freeness of natural language. Ling. Philos. 8, 333–343
65 Kudlek, M. et al. (2003) Contexts and the concept of mild context-sensitivity. Ling. Phil. 26, 703–725
66 Berwick, R. and Weinberg, A. (1984) The Grammatical Basis of Linguistic Performance, MIT Press
67 van Heijningen, C.A.A. et al. (2009) Simple rules can explain discrimination of putative recursive syntactic structures by a songbird species. Proc. Natl. Acad. Sci. U.S.A. 106, 20538–20543
68 Rogers, J. and Pullum, G. Aural pattern recognition experiments and the subregular hierarchy. J. Logic, Lang. & Info (in press)
69 McNaughton, R. and Papert, S. (1971) Counter-free Automata, MIT Press
70 Trahtman, A. (2004) Reducing the time complexity of testing for local threshold testability. Theor. Comp. Sci. 328, 151–160
71 Heinz, J. (2009) On the role of locality in learning stress patterns. Phonology 26, 305–351
72 Crespi-Reghizzi, S. (1978) Non-counting context-free languages. J. ACM 4, 571–580
73 Crespi-Reghizzi, S. (1971) Reduction of enumeration in grammar acquisition. In Proceedings of the 2nd International Joint Conference on Artificial Intelligence (Cooper, D.C., ed.), pp. 546–552, William Kaufman
74 Crespi-Reghizzi, S. and Braitenburg, V. (2003) Towards a brain compatible theory of language based on local testability. In Grammars and Automata for String Processing: from Mathematics and Computer Science (Martin-Vide, C. and Mitrana, V., eds), pp. 17–32, Gordon & Breach
75 Hosino, T. and Okanoya, K. (2000) Lesion of a higher-order song nucleus disrupts phrase level complexity in Bengalese finches. Neuroreport 11, 2091–2095
Review
Representing multiple objects as an ensemble enhances visual cognition
George A. Alvarez
Vision Sciences Laboratory, Department of Psychology, Harvard University, 33 Kirkland Street, William James Hall, Room 760, Cambridge, MA 02138, USA
The visual system can only accurately represent a handful of objects at once. How do we cope with this severe capacity limitation? One possibility is to use selective attention to process only the most relevant incoming information. A complementary strategy is to represent sets of objects as a group or ensemble (e.g. represent the average size of items). Recent studies have established that the visual system computes accurate ensemble representations across a variety of feature domains and current research aims to determine how these representations are computed, why they are computed and where they are coded in the brain. Ensemble representations enhance visual cognition in many ways, making ensemble coding a crucial mechanism for coping with the limitations on visual processing.
Benefits of ensemble representation
Unlike artificial displays used in laboratory experiments, where there is no reliable pattern across individual items, the real world is highly structured and predictable [1,2]. For instance, at the object level, the visual field often consists of collections of similar objects – faces in a crowd, berries on a bush. At a more primitive feature level, natural images are highly regular in terms of their contrast and intensity distributions [3,4], color distributions [5–8], reflectance spectra [9,10] and spatial structure [2,11–14]. Where there is structure, there is redundancy, and where there is redundancy, there is an opportunity to form a compressed and efficient representation of information [15–17]. One way to capitalize on this structure and redundancy is to represent collections of objects or features at a higher level of description, describing distributions or sets of objects as an ensemble rather than as individuals. An ensemble representation is any representation that is computed from multiple individual measurements, either by collapsing across them or by combining them across space and/or time. For instance, any summary statistic (e.g.
the mean) is an ensemble representation because it collapses across individual measurements to provide a single description of the set. People are remarkably accurate at computing averages, including the mean size [18,19], brightness [20], orientation [18,21,22] and location of a collection of objects [23]; the average emotion [24], gender [24] and identity [25] of faces in a crowd; and the average number for a set of symbolically presented numbers [26,27]. These are all measures of central tendency for a collection of objects. Other statistics that describe a set, such as variance [28], skew and kurtosis, are also ensemble representations, although the ability to compute and represent these statistics has been the focus of less attention in recent research (but see [29,30] for reviews on earlier research). Finally, the concept of ensemble representations can be extended beyond first-order summary statistics, to include higher-order summary statistics [31–33]. Ensemble representations have been explored under various names in the literature, including ‘global features’ [32,34,35], ‘(w)holistic’ or ‘configural’ features [36–38], ‘sets’ [18,39] and ‘statistical properties’ or ‘statistical summaries’ [19,40]. Each of these terms shares the notion that multiple measurements are combined to give rise to a higher level description. The term ‘ensemble representation’ is used here as an umbrella term encompassing these different ideas. Although there is, as yet, no unifying model of ensemble representation across these domains, recent research on ensemble representation is unified by a common principle: representing multiple objects as an ensemble enhances visual cognition.
Corresponding author: Alvarez, G.A. ([email protected]).
The power of averaging
How can computing ensemble representations help overcome the severe capacity limitations of our visual system? The answer lies in the power of averaging: simply put, the average of multiple noisy measurements can be much more precise than the individual measurements themselves. For instance, one can measure reaction time with millisecond precision even when rounding reaction times to the nearest 100 ms (Box 1). The same principle is at play in the ‘wisdom of crowds’ effect, in which people guess the weight of an ox and the average response is closer to the correct answer than are the individual guesses on average [41].
These benefits arise because, when measurements are averaged, random error in one individual measurement will tend to cancel out uncorrelated random error in another measurement. Thus, the benefits of averaging depend on the extent to which the noise in individual measurements is correlated (less correlated, more benefit) and the number of individual measurements averaged (more measurements, more benefit). The benefit of averaging can be formalized mathematically, given certain assumptions regarding the noise in the individual measurements (Figure 1). If the human visual system is capable of averaging, then observers should be able to judge the average size of a set more accurately than they can judge the individuals in the set. This is exactly what was demonstrated by Dan Ariely’s
1364-6613/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2011.01.003 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3
Box 1. The power of averaging
Imagine you are running an experiment with an expected effect size of 20 ms, which is not uncommon in behavioral research (e.g. negative priming or simple detection tasks). Do you need to worry about the sampling rate of your keyboard? First let us consider what would happen if we simply rounded reaction times to the nearest 100 ms. By averaging multiple samples, individual errors owing to rounding will tend to cancel each other out, and it is possible to obtain millisecond precision in the estimate of the mean despite rounding. Figure Ia shows the results of a simulation with ten virtual subjects and only 30 trials per subject. The true average of the population is 600 ms, and subjects are normally distributed around this mean (i.e. each subject has their own true mean, but the average across subjects will be 600 ms). For each simulated trial, reaction time was simulated as the subject’s true mean plus 15% random noise around their true mean. This is fairly typical of reaction time data, but the simulation results do not depend crucially on this value. The simulated reaction times were then rounded to the nearest 100 ms. When the true reaction times (from the simulation) are compared to the rounded reaction times, the mean and variance of the two data sets are nearly indistinguishable.
[Figure I, panel (a): Effect of rounding.]
Now suppose your keyboard checks for a key once every 100 ms. This would be equivalent to rounding each reaction time up to the nearest 100 ms, which on the face of it sounds like it would add error to the estimate of the mean and variance of each condition. Indeed, it would lead to overestimates of the reaction time in each condition. However, the relative difference between conditions could be preserved. The simulation above was repeated with two conditions in which the true mean between conditions was simulated so that condition two was 20 ms slower than condition one on average. Figure Ib shows the results of the simulation, in which condition two was reliably slower than condition one for each individual subject, and the 20 ms difference is significant at p < 0.05 using a standard within-subject t-test. In general, whether the effect can be detected thus will depend on the degree of rounding, the expected size of the effect and the variability of the data. For the present purpose, the important point is that, by averaging a relatively modest number of trials, it is possible to overcome a great deal of noise in individual estimates to obtain a precise representation of the mean (Figure Ia) and to detect a subtle difference between two conditions (Figure Ib).
[Figure I, panel (b): Effect of rounding-up. Both panels plot Reaction time (ms) on the y-axis; panel (a) compares True values with Rounded values, and panel (b) compares Condition 1 with Condition 2.]
Figure I. (a) The effect of rounding on estimating the mean and variance in a single condition. Error bars depict the standard deviation across subjects. (b) The effect of rounding-up on the comparison of two conditions in which the true mean differs by 20 ms. Error bars depict the within-subject standard error of the mean.
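The simulation described in Box 1 can be sketched in a few lines. This is a minimal illustration rather than the author's actual code: the Gaussian form of the trial noise, the between-subject spread (`between_subject_sd`) and the random seed are assumptions introduced here, and all function names are hypothetical.

```python
import math
import random
import statistics

STEP_MS = 100  # rounding step described in Box 1


def simulate_trials(true_mean, n_trials, rng, noise_frac=0.15):
    """Draw reaction times around a subject's true mean.

    Assumption: noise is Gaussian with SD equal to 15% of the true mean.
    """
    return [rng.gauss(true_mean, noise_frac * true_mean) for _ in range(n_trials)]


def round_nearest(rt, step=STEP_MS):
    """Round to the nearest multiple of step (the Figure Ia scenario)."""
    return step * round(rt / step)


def round_up(rt, step=STEP_MS):
    """Round up to the next multiple of step, as a keyboard polled every
    step ms would do (the Figure Ib scenario)."""
    return step * math.ceil(rt / step)


def run_simulation(n_subjects=10, n_trials=30, pop_mean=600.0,
                   between_subject_sd=50.0, effect_ms=20.0, seed=1):
    """Return per-subject summaries for the rounding and rounding-up scenarios."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_subjects):
        # Each subject has their own true mean, scattered around the population mean.
        subject_mean = rng.gauss(pop_mean, between_subject_sd)
        cond1 = simulate_trials(subject_mean, n_trials, rng)
        cond2 = simulate_trials(subject_mean + effect_ms, n_trials, rng)
        up1 = [round_up(rt) for rt in cond1]
        up2 = [round_up(rt) for rt in cond2]
        results.append({
            "true_mean": statistics.mean(cond1),
            "rounded_mean": statistics.mean([round_nearest(rt) for rt in cond1]),
            "true_diff": statistics.mean(cond2) - statistics.mean(cond1),
            "rounded_up_diff": statistics.mean(up2) - statistics.mean(up1),
        })
    return results


if __name__ == "__main__":
    for row in run_simulation():
        print({key: round(value, 1) for key, value in row.items()})
```

Comparing `true_mean` with `rounded_mean`, and `true_diff` with `rounded_up_diff`, shows how much of the rounding error survives averaging for a given number of trials; increasing `n_trials` shrinks the gap further.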
influential research on the ability of people to perceive the mean size of a set [18], which showed that observers can estimate with high accuracy the average size of a set of objects, even when they appear unable to report the size of the individual objects in the set. This type of averaging provides a potential mechanism for coping with the severe limitations on attentional processing. Attention appears to be a fluid and flexible resource: we can give full attention to a single item and represent that item with high precision, or we can divide our attention among many items but consequently represent each item with lower precision [42–44]. In general, objects outside the focus of attention are perceived with less clarity [45], lower contrast [46] and a weaker high-frequency response [47,48]. Presumably all objects in the visual field are represented with varying degrees of precision, depending on the amount of attention they receive. In some cases, objects outside the focus of attention are so poorly represented that it seems like we have no useful information about them at all. However, it turns out to be
possible to combine that imprecise information to recover an accurate measure of the group [23]. Figure 2 illustrates how attention might affect the fidelity of ensemble representations. Inside the focus of attention (red beams), individual items will be represented with relatively high precision. The average of these items will be represented with even higher precision, as expected from the benefits of averaging. For items outside the focus of attention, we assume that they must be attended to some extent to be perceived at all. For instance, the results of inattentional blindness studies have shown that without attention, there is little or no consciously accessible representation of visual information [49–51]. These studies typically aim for participants to completely withdraw attention from the tested items, and in some cases observers even actively inhibit information outside of the attentional set [51]. However, when observers know they will be asked about information outside the focus of attention, it is probable that they diffusely attend to those items. Figure 2 implies a parallel system with multiple foci of
[Figure 1 and Figure 2 labels: Image of world; Individual representation; Ensemble representation; Focal attention.]
Figure 1. Gaining precision at a higher level of abstraction. By taking individual measurements and averaging them, it is possible to extract a higher-level ensemble representation. If error is independent between the individual representations, then the ensemble average will be more precisely represented than the individuals in the set. This benefit can be quantified after making certain assumptions. For instance, if each individual were represented with the same degree of independent, Gaussian noise (standard deviation = $\sigma$), then the average of these individual estimates would have less noise, with a standard deviation equal to $\sigma/\sqrt{n}$, where $n$ is the number of individual measurements. The process is depicted for the representation of object size, but the logic holds for any feature dimension.
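The square-root-of-n benefit described in this caption follows from standard variance algebra. A brief derivation is given here, assuming n measurements of the form x_i = μ + ε_i, where the noise terms ε_i have zero mean and common variance σ². The symbols μ, ε_i and ρ are introduced for illustration and do not appear in the original figure. The result for independent noise does not in fact require the noise to be Gaussian.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
\operatorname{Var}(\bar{x})
  = \frac{1}{n^{2}} \sum_{i=1}^{n} \operatorname{Var}(\varepsilon_i)
  = \frac{\sigma^{2}}{n} ,
\qquad
\operatorname{SD}(\bar{x}) = \frac{\sigma}{\sqrt{n}} .

% If every pair of noise terms instead shares a correlation \rho:
\operatorname{Var}(\bar{x})
  = \frac{\sigma^{2}}{n}\,\bigl[\, 1 + (n-1)\,\rho \,\bigr] .
```

With ρ = 0 the second expression reduces to the first, and with ρ = 1 averaging gives no benefit at all, which matches the statement in the main text that less correlated noise yields more benefit.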
attention, plus diffuse attention spread over items outside the foci of attention. However, a similar result could be modeled with a single spotlight of attention that spends more time in some locations than others. Either way this diffuse attention results in extremely imprecise representations of the individual items, and yet averaging even just three imprecise measurements results in a fairly precise representation of the ensemble. If a large enough sample of items is averaged together, then the ensemble representation for items outside the focus of attention can be nearly as accurate as the ensemble representation for items inside the focus of attention. The mechanisms of averaging Although there is general agreement that human observers can accurately represent ensemble features, many questions remain regarding ‘how’ these ensemble representations are computed, including: (i) Are individual representations computed and then combined to form an ensemble representation, or are ensemble representations somehow computed without computing individuals? (ii) If individual representations are computed, are they discarded once the ensemble has been computed? (iii) How many individual items are sampled and included in the calculation of the mean? Is it just a few or could it be all of them? (iv) Do all items contribute to the mean equally? Are ensembles built up from representations of individuals? Ariely [18] proposed that the visual system performs a type of compression, by creating an ensemble representation and then discarding individual representations. Some have interpreted this proposal to mean that the ensemble representation is computed without first directly computing individual measurements. For instance, it is possible that there is a ‘total activation map’ and a ‘number map’
[Figure 2 label: Distributed attention.]
Figure 2. Effect of attention on the fidelity of ensemble representations. Two sets of items are depicted: one set inside the focus of attention (red beams) and one set diffusely attended outside the focus of attention (pink region). For illustrative purposes, both sets are composed of identical individuals, and thus both sets have the same individual and mean representations. For items inside the focus of attention, individual representations will be relatively precise (red curves). The ensemble representation of the items inside the focus of attention will be even more precise, owing to the benefits of averaging. For items outside the focus of attention which are diffusely attended, the individual representations will be very imprecise (gray curves). However, the benefits of averaging are so great that the ensemble representation will be fairly precise, even when a relatively small number of individual representations are averaged (just three in this example).
and that mean size is computed by taking the total activation and dividing it by the number of items [52]. However, Ariely’s use of the term ‘discard’ suggests that his intended meaning was that the individual properties are computed, combined and then discarded. This type of averaging model has been supported by research on the computation of mean orientation [21]. Addressing this question empirically is a challenge because it is possible to compute accurate ensemble representations even from very imprecise individual measurements. Consequently, a poor representation of individual items cannot be used as evidence for mean computation without computing individuals – unless the mean can be shown to be represented more accurately than expected based on the number and fidelity of individual items represented. Are individual representations discarded? How do we explain such poor performance when observers are required to report the properties of individual members of a set? One possibility is that these properties are computed and then discarded. An important alternative possibility is that the individual representations are not discarded, but are simply so noisy and inaccurate that observers cannot consistently identify individuals from the set owing to this high level of noise. Alvarez and Oliva found support for this possibility by modeling their results [23], consistently finding that the accuracy of ensemble judgments is perfectly predicted from the accuracy of individual judgments – even when individuals appear to be judged with near chance accuracy. This alternative possibility fits with a framework in which the representation of an image is hierarchical, retaining information at multiple levels of abstraction [35,53].
[Figure 3 labels: Image of world; Individual representation; Ensemble representation.]
Figure 3. Effect of set size on the fidelity of individual and ensemble representations. The ensemble average should become more precise as the number of individual items increases, because the benefits of averaging accrue with each additional item averaged (with diminishing returns, of course). However, if the precision with which individual items can be represented decreases with set size, as depicted here, it is possible for this decrease to perfectly offset the benefits of averaging so that the precision of the average remains constant with set size.
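The offsetting effect described in this caption can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that per-item noise grows as a power of set size; the power-law form, the `exponent` parameter and the function names are hypothetical and are not taken from the article.

```python
import math


def individual_sd(base_sd, set_size, exponent):
    """Hypothetical rule: per-item noise grows as base_sd * set_size**exponent."""
    return base_sd * set_size ** exponent


def ensemble_sd(base_sd, set_size, exponent):
    """SD of the average of set_size independent items, each with the noise above."""
    return individual_sd(base_sd, set_size, exponent) / math.sqrt(set_size)


if __name__ == "__main__":
    for exponent in (0.0, 0.25, 0.5):
        sds = [round(ensemble_sd(10.0, n, exponent), 2) for n in (1, 2, 4, 8, 16)]
        print(f"exponent={exponent}: ensemble SD by set size -> {sds}")
```

Under this assumed rule, an exponent of 0 gives the full benefit of averaging, an exponent of 0.5 cancels it exactly so that the precision of the average is flat across set size, and intermediate values give a shallow slope.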
How many items are sampled? A great deal of enthusiasm surrounding studies on ensemble representations stems from the possibility that there are specialized ensemble processing mechanisms which are separate from the mechanisms employed to represent individual objects. However, this idea has spurred some controversy in the area of research on mean size perception, where a modeling study has shown that it is possible to accurately estimate the mean by sampling a small subset of items [54]. In some cases, the average of the set could be accurately estimated by strategically sampling as few as one or two items, and estimating the average of those items alone [54]. Consistent with this subset sampling hypothesis, the accuracy of the mean estimate is typically constant as the number of items in the set increases beyond four items [18,55,56], whereas the benefits of averaging should accrue as more items are averaged together. This would be expected if observers were sampling just a subset of the items. However, there are several reasons to believe that observers are not strategically subsampling when they compute the mean. In the case of crowded items, observers simply cannot sample individual items, thus it is unlikely that judgments for crowded displays [21] reflect a sampling strategy. When items are not crowded, it has been shown that intermixing conditions that would require different sampling strategies does not impair performance on mean size estimation [57], suggesting that subjects either are not using a strategic sampling strategy or can instantly deploy a new strategy based on some property of the display. This latter possibility is unlikely, given that the displays in [57] were only presented for 200 ms. One study on perceiving the average facial expression has shown that observers discount outliers when computing the average, but a sampling strategy would show a large effect of outliers [58].
Moreover, the accuracy of centroid estimates suggests that ‘all’ of the items must be averaged to compute the centroid with the level of precision observed, requiring the representation of a minimum of eight individual items [23]. If observers are not strategically subsampling, the fact that the precision of mean size estimation is constant with the number of items beyond four presents a bit of a mystery. One possibility is that the benefits of averaging accrue quickly, and that one would predict a steep improvement in the precision of mean estimation from one to four items, with a leveling off beyond four items [58]. Another possibility is that the precision with which each individual item is represented decreases as the number of items increases, because each item receives less attention [42,44] and/or because items are more crowded and appear further in the periphery on average. If this were the case, then the benefits from averaging additional items would be offset by the decrease in precision with which the individual items are represented, as illustrated in Figure 3. This account predicts that the slope of the function relating the precision of mean judgments to the number of items would depend on the degree to which the noise in individual items increases with the number of items. In practice, this slope is often fairly shallow or even flat [18,55,56]. This raises the intriguing possibility that averaging perfectly offsets the increase in noise that occurs as the number of items increases.
Do all items contribute to the mean equally? There is already some evidence that not all items contribute equally to the mean [58]. Intuitively, if some measures are very unreliable, and other measures are very reliable, we should give the more reliable measures more weight when combining these measurements. In general, computing a weighted average in which more reliable estimates are given greater weight will minimize the error in estimates of the mean. To illustrate this point, Figure 4 shows the results of a simulation in which the mean size of eight items was estimated. Half of the individual item sizes were estimated with high precision (low variance), whereas the other half were estimated with low precision (high variance). The individual measurements were then averaged using the standard equal-weight average or using a precision-weighted average in which each individual measurement was weighted proportional to its precision. A total of 1000 trials were simulated, and for each trial error was measured as the difference between the actual mean size and the estimated mean size. The error distributions show that error was lower for the precision-weighted average than for the standard, equal-weighted average.
[Figure 4 panels: Equal-weighted average; Precision-weighted average. x-axis: Error; y-axis: Frequency.]
Figure 4. Benefits of precision-weighted averaging. A standard equal-weighted average will be less precise on average than a precision-weighted average in which more reliable individual measurements are given more weight in the average. Thus, if the precision of individual measurements is known, the optimal strategy for computing the average is to combine individual measurements with more weight given to more reliable individual measurements.
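A simulation in the spirit of Figure 4 can be sketched as follows. This is not the article's own procedure: it uses inverse-variance weights (each weight equal to one over the measurement variance), a standard choice in the cue-integration literature, and it makes the simplifying assumption that all eight measurements are noisy readings of one common underlying size. The noise levels, size range, seed and function names are all hypothetical.

```python
import math
import random


def simulate_errors(n_trials=1000, n_items=8, sd_precise=1.0, sd_noisy=5.0, seed=0):
    """Return (equal_weight_errors, precision_weighted_errors) across trials."""
    rng = random.Random(seed)
    half = n_items // 2
    # Half of the items are measured precisely, the other half noisily.
    sds = [sd_precise] * half + [sd_noisy] * (n_items - half)
    # Inverse-variance weights: more reliable measurements count for more.
    weights = [1.0 / sd ** 2 for sd in sds]
    total_weight = sum(weights)
    equal_errors, weighted_errors = [], []
    for _ in range(n_trials):
        true_size = rng.uniform(20.0, 40.0)  # assumed common size on this trial
        measurements = [rng.gauss(true_size, sd) for sd in sds]
        equal_estimate = sum(measurements) / n_items
        weighted_estimate = sum(w * m for w, m in zip(weights, measurements)) / total_weight
        equal_errors.append(equal_estimate - true_size)
        weighted_errors.append(weighted_estimate - true_size)
    return equal_errors, weighted_errors


def rms(values):
    """Root-mean-square of a list of errors."""
    return math.sqrt(sum(v * v for v in values) / len(values))


if __name__ == "__main__":
    equal_errors, weighted_errors = simulate_errors()
    print("RMS error, equal weights:    ", round(rms(equal_errors), 3))
    print("RMS error, precision weights:", round(rms(weighted_errors), 3))
```

Under these assumptions the precision-weighted estimate has the smaller spread of errors, because the noisy half of the items contributes little to it. If the items genuinely differed in size and the goal were the mean of the sample itself, these weights would bias the estimate towards the precisely measured items, so a different scheme would be needed.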
Exactly how to implement precision-weighted averaging depends on how the problem is formulated. When faced with a group of samples to average, we could either assume that each individual item is a sample drawn from a single distribution or that each individual item is a sample drawn from a separate distribution. If we assume that individual measurements are separate samples from a single distribution, and the goal is to estimate the central tendency of the underlying distribution, then each measurement $i$ should be weighted by $1/\sigma_i^2$ (where $\sigma_i^2$ is the variance for item $i$). For instance, if one of the items has infinite variance, it will be completely ignored. This type of weighted average has been used extensively in the cue integration literature to define the optimal strategy for combining cues that have different degrees of reliability [59]. Alternatively, if the items are considered samples from separate distributions, and the goal is to estimate the mean of the sample, then items should never be given zero weight in the average. One strategy would be to compute the mean and variance of the samples, and to adjust the mean towards more reliable measures in proportion to their variance. In this case, an item with infinite variance would be included in the initial estimate of the mean, but there would be no additional updating of the mean towards this item. This strategy was employed in the simulations shown in Figure 4. For ensemble averaging mechanisms to employ this type of precision-weighted averaging, the visual system would either have to know the degree of reliability with which items are represented or have a heuristic to calculate it. Both of these routes are plausible. Some models of visual perception model representations of individual items as probabilistic [59–61], in which knowledge is stored as a probability distribution that explicitly contains a representation of the reliability/variance of the representation.
Alternatively, certain heuristics could be employed for estimating reliability, such as giving peripheral items less weight because visual resolution is known to drop off with eccentricity. Similarly, items inside the focus of attention might be weighted more than items outside the focus of attention because the precision with which items are represented is proportional to the amount of attention we give them. These heuristics would not be explicit representations of reliability, but they are cues that are tightly correlated with reliability, and thus they could be used to weigh individual items as a proxy for reliability. It has been suggested that attended items are given more weight in the averaging of crowded orientation signals [62]. One study has shown that when attention is drawn to a particular item in the set, the mean judgment is biased towards that item [63]. One possible interpretation of this finding is that attention enhances the resolution with which the attended item is represented [42–44,48], and that items are weighed by their precision or reliability when computing the mean [40]. This possibility is speculative and has not been directly tested in uncrowded displays. Beyond spatial averaging Recent research on ensemble representation has gone beyond assessing the ability of observers to average visual
features across space, including: (i) the ability to average features across time; (ii) the ability to represent other ensemble properties, such as the number of items in a set; (iii) the ability to represent spatial patterns; (iv) the relationship between ensemble representation and crowding; and (v) the neural correlates of ensemble representation. Computing ensemble representations across time In addition to spatial structure, there is a great deal of temporal structure and redundancy in the input to the visual system, and thus it would be advantageous to be able to also compute ensemble representations across time. Recent research has shown that observers can judge the mean size of a dynamically changing item or groups of items [40], or the mean expression of a dynamically changing face [56]. These findings demonstrate that perceptual averaging can operate over continuous and dynamic input, and that averaging across time can be as precise as averaging across space. Whether temporal averaging mechanisms constantly accumulate information or sample from high information points, such as salient transitions or discontinuities in the input stream, remains an open question. However, there is some evidence that certain information in a temporal sequence will be given more weight in the average than other information, possibly related to the amount of attention allocated to different points in the temporal sequence [40]. Number as an ensemble representation Perhaps the most basic summary description for a collection of items is the number of items in the set. Without verbally counting, observers are able to estimate the approximate number of items in a set [64–66]. Similar to the perception of mean properties, the ability to enumerate items in a set occurs rapidly. It is also possible to extract the number of items across multiple sets in parallel [39]. 
Surprisingly, there is even evidence that number is directly perceived in the same way as other primary visual attributes [67]. Burr and Ross [67] demonstrated that it is possible to adapt to number in the same way that it is possible to adapt to visual properties such as color, orientation or motion. Number literally seems to be a ‘perceived property’ of sets. The relationship between the mechanisms underlying number representation and perceptual averaging is an important topic for future research. Representing spatial patterns Statistical summary representations, such as the mean or number of items in a set, are extremely compact representations, collapsing the description of a set down to a single number. However, images often consist of spatially distributed patterns of information, also referred to as spatial regularities or spatial layout statistics. For example, natural images consist of regular distributions of orientation and spatial frequency information [34,68]. In one study, Oliva and Torralba [34] measured orientation energy at different spatial scales over thousands of images and conducted a principal components analysis on these measurements. This analysis revealed that there are regularities in the structure of natural images, with certain patterns of
[Figure 5 labels: Image of world; Individual representation; Spatial ensemble representation.]
Figure 5. Spatial ensemble representations. Individual orientation measurements can be combined to represent patterns of orientation information. For each pattern, local orientation measurements are made (depicted as Gaussian curves centered around the true orientation), but each individual measure has a high degree of noise or uncertainty. Similar orientation signals are then pooled together to characterize regions with similar orientation signals using the average orientation. In the first column, the top half of the image has a mean orientation of vertical, whereas the bottom half has a mean orientation of horizontal. The same is true for the image in the middle column. However, the pattern is flipped for the third column: here the top half has a mean of horizontal and the bottom half has a mean of vertical. Crucially, at the level of individual representations, the left and middle columns are just as different from each other as the left and right columns. However, at the ensemble level, the left and middle columns are more similar to each other than the left and right columns.
spatial frequency and orientation more likely to occur than other patterns. A schematic of a common pattern is shown in Figure 5, in which orientation signals tend to be more similar to each other within the top and bottom halves of the image than they are across the top and bottom halves. It would be efficient for the visual system to capitalize on the redundancy in natural images by using visual mechanisms that are tuned to the statistics of the natural world [11,69]. Indeed, a great deal of research has suggested that low-level sensory mechanisms are tuned to real-world statistical regularities [17,70–72]. The representation of such spatial ensemble statistics is robust to the withdrawal of attention, as would be expected if these ensemble representations are computed by pooling together local measurements [31]. For example, while attending to a set of moving objects in the foreground, changes to the background were only noticed when they altered the ensemble structure of the display, not when the ensemble structure remained the same, even though these changes were perfectly matched in terms of the magnitude of local change [31]. This suggests that the visual system maintains an accurate representation of the spatial ensemble statistics of a scene, even when attention is focused on a subset of items in the visual field. Ensemble representation and crowding Items in the visual field are often spaced too closely for each individual item to be resolved. For instance, it is unlikely that one can perceive the individual letters three sentences above or below this one. Yet, one can tell that there are letters present, that these letters are grouped into several words and so on. What is the nature of our perceptual representation when looking at a crowded collection of
objects? There is a growing body of evidence suggesting that one perceives the higher-order summary statistics of information within the crowded region [21,73]. For a crowded set of oriented items, one perceives the average orientation [21]. For more complex patterns, such as a set of letters, the perceived pattern appears to result from a more complex statistical representation [73]. Balas and colleagues generated stimuli using a model which uses the joint statistics of cells which code for position, phase, orientation and scale [73]. Any pattern, such as sets of letters, can be passed through this model, resulting in a synthetic image that is somewhat distorted, yet is statistically similar to the original. When directly viewed, the original and the synthetic image look very different. However, identification performance with these synthetic images correlates with identification performance for crowded letters in the periphery, suggesting that perception in the periphery could consist of a similar statistical representation. The relationship between ensemble representation and crowding raises important questions regarding whether ensemble coding occurs automatically and whether it is perceptual in nature (Box 2). Other studies suggest that there could be important differences between ensemble representation and crowding. For instance, crowding is greater in the upper visual field than the lower visual field, whereas under the same conditions the accuracy of ensemble judgments was the same in the upper and lower visual field [74]. Thus, although ensemble coding and crowding are closely related, there could be important dissociations between them.
Box 2. Automaticity and directly perceived ensemble representations
A central question is whether the visual system automatically computes ensemble representations without conscious intention or effort, or whether they are computed voluntarily based on task demands. If ensemble representations were automatically computed, then we would conclude that there are dedicated mechanisms for computing and representing them. We might then focus on identifying the core ensemble feature dimensions and assessing their tuning properties. To understand such mechanisms, we can bring to bear methods that have been employed to understand perception, such as single-cell physiology, and perceptual adaptation. If ensemble representations are not computed automatically, but instead reflect a voluntary high-level judgment, then the methods we would use, and questions we would ask, might be somewhat different. For instance, physiology and adaptation are unlikely to reveal much about these mechanisms and ensemble representations would probably depend on task incentives and observers’ goals. To understand such representations, we might explore regularities in how observers make ensemble judgments and turn our attention towards identifying consistent heuristics and biases in ensemble judgments. In addition to the distinction between automatic and voluntary, there is an important distinction between ‘directly perceived’ and ‘read-out’ ensemble representations. In some cases the observer directly perceives the ensemble representation. For example, when a collection of items is presented in the periphery, their orientations appear to be automatically averaged [28]. With such crowded items, the perceptual experience is of ‘directly seeing’ the average orientation (all items appear to have an orientation equal to the mean of the group), with an accompanying loss of perceptual access to the individual orientation signals.
By contrast, when the same display appears at the fovea, the oriented items are not crowded and the orientation signals do not appear to be obligatorily averaged: it is clear that the items have different orientations and none of them appears to have an orientation that matches the average. However, even for uncrowded displays, it is possible that ensemble representations are automatically computed. For example, ensemble representations appear to be automatically computed when the primary task does not require it [77] and even when they impair task performance [94].
Neural correlates of ensemble representation Relatively little research has explored the neural mechanisms of ensemble representation. Perhaps the most basic question we can ask is whether there are brain regions with neurons dedicated to computing ensemble representations (above and beyond the computation of individual object representations). Extensive research suggests that the parietal cortex plays an important role in the representation of number [75]. However, much less research has been done to explore the representation of perceptual averages, such as mean size, mean facial expression or mean orientation. Future research in this area would provide important insight into the nature of ensemble coding, as well as the functional organization of the visual cortex. Additional benefits of computing ensemble representations The present article has focused on one primary benefit of ensemble representation: the ability to combine imprecise individual measurements to construct an accurate representation of the group, or ensemble. However, computing ensemble representations could yield many related benefits [18,76], which are discussed here. Information compression Compression is the process of recoding data so that it takes fewer bits of information to represent that data. To the extent that the encoding scheme distorts or loses information, the compression is said to be lossy.
For instance, TIFF image encoding uses a form of lossless compression, whereas JPEG image encoding is a lossy form of image compression – although the information lost occurs at such a high spatial frequency that human observers typically cannot detect this loss. Ariely [18] proposed that reducing the representation of a set to the mean, and discarding individual representations, would be a sensible form of lossy compression for the human visual system: it leaves available an informative global percept which could potentially be used to navigate and choose regions of interest for further analysis. However, this form of compression would only be economical if ensemble representations and individual representations were ‘competing’ in some sense. Otherwise, in terms of compression, there is no advantage to discarding the individual representations, and one might as well extract the ensemble and retain the individual representations. There is some evidence that ensemble representations take the same memory space as individual representations [39], although other studies suggest that ensemble representations and noisy individual representations are maintained concurrently and that these levels are mutually informative [77,78]. These findings suggest that ensemble representations and individual representations probably do not compete for storage, at least not in a
Trends in Cognitive Sciences March 2011, Vol. 15, No. 3
mutually exclusive manner. However, none of these previous studies directly pitted ensemble memory against individual memory and assessed possible trade-offs between them. Future research will be necessary to explore the extent to which ensemble representations and individual representations compete in memory.

In terms of perceptual representations, it seems clear that individual and ensemble representations can be maintained simultaneously [23]. Whether ensemble coding is lossy or lossless depends on the fate of lower-level, individual representations. However, at the level of the ensemble representation, it is clear the data have been transformed into a more compressed form. It is possible that this format is more conducive to memory storage and learning. Ensemble representations are more precise than the lower-level representations composing them. Thus, there can be higher specificity of response at the ensemble level than at lower levels of representation. Such sparse coding has several advantages [79,80], including minimizing overlap between representations stored in memory [81] and learning associations in neural networks [82]. The extent to which observers can learn over ensemble representations of the type described in the present article is an important topic for future research, because it could bridge the gap between research on ensemble coding in visual cognition and the vast field of research on sparse coding and memory.

Ensemble representations as a basis for statistical inference and outlier detection
Another potential benefit to building an ensemble representation is to enable statistical inferences [83], including estimating the parameters of the distribution (mean, variance, range, shape), setting confidence intervals on those parameter estimates and classifying items into groups. A special case of classification is outlier detection, and an ensemble representation is ideal for this purpose [18,76].
For instance, if a set is well described by a distribution along an arbitrary dimension, say with a mean of 20 and standard deviation of 3, then an item with a value of 30 along this dimension is unlikely to be a member of the set. The ensemble representation would enable labeling this item as an outlier or even as a member of a different group. Outlier detection has been extensively studied using the visual search paradigm, in which the question has been whether an oddball item will instantly ‘pop out’ from a larger set of homogeneous items [84]. Items that are very different from the set, say a red item among green items, are said to be salient, and are easy to find in a visual search task [85,86]. Interestingly, computational models of saliency focus on ‘local differences’ between each item and its neighbors [87]. However, one could imagine displays in which the local context of a search target remained unchanged, but more distant items varied to either increase or decrease the degree to which the target appeared to be a member of the overall set. Finding that outlier status guides visual search above and beyond its effects on local saliency would provide strong support for the idea that ensemble representations play an important role in outlier detection.
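As a worked illustration of the numerical example above (a set with a mean of 20 and a standard deviation of 3, and an item with a value of 30), the following minimal Python sketch expresses the item's distance from the ensemble mean in standard deviations. It assumes, purely for illustration, that the dimension is normally distributed and that an item counts as an outlier beyond three standard deviations. Neither the threshold nor the function names `z_score` and `is_outlier` come from the studies cited here, and the sketch is not a model of how the visual system performs the computation.

```python
from statistics import NormalDist


def z_score(value: float, mean: float, sd: float) -> float:
    """Distance of an item from the ensemble mean, in standard deviations."""
    return (value - mean) / sd


def is_outlier(value: float, mean: float, sd: float, threshold: float = 3.0) -> bool:
    """Flag an item whose |z| exceeds the threshold (a hypothetical cut-off)."""
    return abs(z_score(value, mean, sd)) > threshold


# Values from the example in the text: mean 20, standard deviation 3, item at 30.
z = z_score(30, 20, 3)

# Probability of observing a value at least this large,
# ASSUMING the set is normally distributed (an illustrative assumption).
tail = 1.0 - NormalDist(mu=20, sigma=3).cdf(30)

print(f"z = {z:.2f}, upper-tail probability = {tail:.5f}")
print("outlier" if is_outlier(30, 20, 3) else "member of the set")
```

Under these assumptions the item lies more than three standard deviations above the mean, which is why it is unlikely to be a member of the set. A different threshold or a non-normal distribution would change the cut-off but not the logic.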
Although it would be interesting if ensemble representations could enable rapid outlier detection, this finding is not necessary to support the idea that ensemble representations play an important role in classifying and grouping items. For instance, a face with a unique facial expression does not pop out in a visual search task [88]. However, recent research shows that an outlier face is given reduced weight in the ensemble representation of a group of faces [58], even though observers often fail to perceive the outlier. This finding is consistent with the possibility that the ensemble representation enables labeling of items, but could also indicate that the ensemble computation gives outliers lower weight without attaching a classification label. The role of ensemble representations in determining set membership has not yet been extensively studied, and research in this area can potentially bridge the gap between research on ensemble representation, statistical inference and perceptual grouping.

Building a ‘gist’ representation that can guide the focus of attention
As detailed in previous sections, the power of averaging makes it possible to combine imprecise local measurements to yield a relatively precise representation of the ensemble (Figure 1). Moreover, it is possible to combine individual measurements to describe spatial patterns of information (Figure 5). A primary benefit of computing either type of ensemble representation is to provide a precise and accurate representation of the ‘gist’ of information outside the focus of attention. Without focused attention, our representations of visual information are highly imprecise [23]. If we were to simply discard or ignore these noisy representations, our conscious visual experience would be limited to only those items currently within the focus of attention. Indeed, some have argued that this is the nature of conscious visual experience [89,90].
In such a system, attention would be ‘flying blind’, without access to any information about what location or region to focus on next. Although locally imprecise, ensemble representations provide an accurate representation of higher-level patterns and regularities outside the focus of attention [23,31]. These patterns and regularities are highly diagnostic of the type of scene one is viewing [14], and therefore they are useful for determining which environment one is currently located within. Over experience, observers appear to learn associations between these ensemble representations and the location of objects in the visual field. For instance, observers appear to use global contextual information to guide the deployment of attention to locations likely to contain the target of a visual search task [33,91–93]. Thus, rather than flying blind, the visual system can compute ensemble representations, providing a sense of the gist of information outside the focus of attention, and guiding the deployment of attention to important regions of a scene.

In terms of forming a complete representation of a scene, gist representation and outlier detection probably work in tandem. For instance, when holding a scene in working memory, observers appear to encode the gist of the scene plus individual items that cannot be incorporated into the summary for the rest of the scene (i.e. outliers) [78].
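The power of averaging invoked in this section can be illustrated with a small simulation. The sketch below is not taken from any of the cited studies: the set size, the noise level and the function name `simulate_errors` are hypothetical, and the noise on each item is assumed to be independent and Gaussian. Under those assumptions, the error of the estimated mean should be smaller than the error for a single item by a factor of roughly the square root of the number of items.

```python
import random
import statistics


def simulate_errors(n_items=16, noise_sd=4.0, n_trials=2000, seed=0):
    """Compare the error for one noisy item with the error of the noisy mean.

    Each trial adds independent Gaussian noise (an assumption) to every item,
    mimicking imprecise individual measurements outside the focus of attention.
    Returns (mean absolute error for a single item,
             mean absolute error for the ensemble mean).
    """
    rng = random.Random(seed)
    true_sizes = [rng.uniform(10.0, 30.0) for _ in range(n_items)]
    true_mean = statistics.fmean(true_sizes)

    item_errors = []
    mean_errors = []
    for _ in range(n_trials):
        noisy = [size + rng.gauss(0.0, noise_sd) for size in true_sizes]
        # Error when reporting one individual item (the first one).
        item_errors.append(abs(noisy[0] - true_sizes[0]))
        # Error when reporting the average of all noisy measurements.
        mean_errors.append(abs(statistics.fmean(noisy) - true_mean))

    return statistics.fmean(item_errors), statistics.fmean(mean_errors)


item_err, mean_err = simulate_errors()
print(f"single-item error: {item_err:.2f}")
print(f"ensemble-mean error: {mean_err:.2f}")
print(f"ratio: {item_err / mean_err:.2f}")
```

If the noise on different items were correlated rather than independent, the benefit of averaging would be smaller than this sketch suggests.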
Benefits of building a hierarchical representation of a scene
There are distinct computational advantages to building a hierarchical representation of a scene. In particular, by integrating information across levels of representation, it is possible to increase the accuracy of lower-level representations. It appears that observers automatically construct this type of representation when asked to hold a scene in working memory [77,78]. For instance, when recalling the size of an individual item from a display, the remembered size was biased towards the mean size of the set of items in the same color, and towards the overall mean size of all items in the display [77]. These results were well captured by a Bayesian model in which observers integrate information at multiple levels of abstraction to inform their judgment about the size of the tested item.

Concluding remarks
Traditional research on visual cognition has typically assessed the limits of visual perception and memory for individual objects, often using random and unstructured displays. However, there is a great deal of structure and redundancy in real-world images, presenting an opportunity to represent groups of objects as an ensemble. Because ensemble representations summarize the properties of a group, they are necessarily spatially and temporally imprecise. Nevertheless, such ensemble representations confer several important benefits. Much of the previous research on ensemble representation has focused on the fact that the human visual system is capable of computing accurate ensemble representations. However, the field is moving towards a focus on investigating the mechanisms that enable ensemble coding, the nature of the ensemble representation, the utility of ensemble representations and the neural mechanisms underlying ensemble coding.
This future research promises to uncover important new properties of the representations underlying visual cognition and to further demonstrate how representing ensembles enhances visual cognition.

Acknowledgments
For helpful conversation and/or comments on earlier drafts, I thank Talia Konkle, Jason Haberman and Jordan Suchow. G.A.A. was supported by the National Science Foundation (Career Award BCS-0953730).

References
1 Kersten, D. (1987) Predictability and redundancy of natural images. J. Opt. Soc. Am. A 4, 2395–2400
2 Field, D.J. (1987) Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4, 2379–2394
3 Brady, N. and Field, D.J. (2000) Local contrast in natural images: normalisation and coding efficiency. Perception 29, 1041–1055
4 Frazor, R.A. and Geisler, W.S. (2006) Local luminance and contrast in natural images. Vis. Res. 46, 1585–1598
5 Webster, M.A. and Mollon, J.D. (1997) Adaptation and the color statistics of natural images. Vis. Res. 37, 3283–3298
6 Hyvärinen, A. and Hoyer, P.O. (2000) Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Comput. 12, 1705–1720
7 Judd, D.B. et al. (1964) Spectral distribution of typical daylight as a function of correlated color temperature. J. Opt. Soc. Am. A 54, 1031–1040
8 Long, F. et al. (2006) Spectral statistics in natural scenes predict hue, saturation, and brightness. Proc. Natl. Acad. Sci. U.S.A. 103, 6013–6018
9 Maloney, L.T. (1986) Evaluation of linear models of surface spectral reflectance with small numbers of parameters. J. Opt. Soc. Am. A 3, 1673–1683
10 Maloney, L.T. and Wandell, B.A. (1986) Color constancy: a method for recovering surface spectral reflectance. J. Opt. Soc. Am. A 3, 29–33
11 Field, D.J. (1989) What the statistics of natural images tell us about visual coding. SPIE: Hum. Vis. Vis. Process. Digit. Display 1077, 269–276
12 Burton, G.J. and Moorehead, I.R. (1987) Color and spatial structure in natural scenes. Appl. Opt. 26, 157–170
13 Geisler, W.S. (2008) Visual perception and the statistical properties of natural scenes. Annu. Rev. Psychol. 59, 167–192
14 Torralba, A. and Oliva, A. (2003) Statistics of natural image categories. Network 14, 391–412
15 Huffman, D.A. (1952) A method for construction of minimum redundancy codes. Proc. IRE 40, 1098–1101
16 Shannon, C.E. and Weaver, W. (1949) The Mathematical Theory of Communication, The University of Illinois Press
17 Atick, J.J. (1992) Could information theory provide an ecological theory of sensory processing? Network: Comput. Neural Syst. 3, 213–251
18 Ariely, D. (2001) Seeing sets: representation by statistical properties. Psychol. Sci. 12, 157–162
19 Chong, S.C. and Treisman, A. (2003) Representation of statistical properties. Vis. Res. 43, 393–404
20 Bauer, B. (2009) Does Steven’s power law for brightness extend to perceptual brightness averaging? Psychol. Rec. 59, 171–186
21 Parkes, L. et al. (2001) Compulsory averaging of crowded orientation signals in human vision. Nat. Neurosci. 4, 739–744
22 Dakin, S.C. and Watt, R.J. (1997) The computation of orientation statistics from visual texture. Vis. Res. 37, 3181–3192
23 Alvarez, G.A. and Oliva, A. (2008) The representation of simple ensemble visual features outside the focus of attention. Psychol. Sci. 19, 392–398
24 Haberman, J. and Whitney, D. (2007) Rapid extraction of mean emotion and gender from sets of faces. Curr. Biol. 17, R751–R753
25 de Fockert, J. and Wolfenstein, C. (2009) Rapid extraction of mean identity from sets of faces. Q. J. Exp. Psychol. (Colchester) 62, 1716–1722
26 Spencer, J. (1961) Estimating averages. Ergonomics 4, 317–328
27 Smith, A.R. and Price, P.C. (2010) Sample size bias in the estimation of means. Psychon. Bull. Rev. 17, 499–503
28 Morgan, M. et al. (2008) A ‘dipper’ function for texture discrimination based on orientation variance. J. Vis. 8, 1–8
29 Peterson, C.R. and Beach, L.R. (1967) Man as an intuitive statistician. Psychol. Bull. 68, 29–46
30 Pollard, P. (1984) Intuitive judgments of proportions, means, and variances: a review. Curr. Psychol. 3, 5–18
31 Alvarez, G.A. and Oliva, A. (2009) Spatial ensemble statistics are efficient codes that can be represented with reduced attention. Proc. Natl. Acad. Sci. U.S.A. 106, 7345–7350
32 Oliva, A. and Torralba, A. (2006) Building the gist of a scene: the role of global image features in recognition. Prog. Brain Res. 155, 23–36
33 Oliva, A. and Torralba, A. (2007) The role of context in object recognition. Trends Cogn. Sci. 11, 520–527
34 Oliva, A. and Torralba, A. (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42, 145–175
35 Navon, D. (1977) Forest before trees: the precedence of global features in visual perception. Cognit. Psychol. 9, 353–383
36 Kimchi, R. (1992) Primacy of wholistic processing and global/local paradigm: a critical review. Psychol. Bull. 112, 24–38
37 Thompson, P. (1980) Margaret Thatcher: a new illusion. Perception 9, 483–484
38 Young, A.W. et al. (1987) Configurational information in face perception. Perception 16, 747–759
39 Halberda, J. et al. (2006) Multiple spatially overlapping sets can be enumerated in parallel. Psychol. Sci. 17, 572–576
40 Albrecht, A.R. and Scholl, B.J. (2010) Perceptually averaging in a continuous visual world: extracting statistical summary representations over time. Psychol. Sci. 21, 560–567
41 Galton, F. (1907) Vox populi. Nature 75, 450–451
42 Palmer, J. (1990) Attentional limits on the perception and memory of visual information. J. Exp. Psychol. Hum. Percept. Perform. 16, 332–350
43 Alvarez, G.A. and Franconeri, S.L. (2007) How many objects can you track? Evidence for a resource-limited attentive tracking mechanism. J. Vis. 7, 1–10
44 Franconeri, S.L. et al. (2007) How many locations can be selected at once? J. Exp. Psychol. Hum. Percept. Perform. 33, 1003–1012
45 Titchener, E.B. (1908) Lectures on the Elementary Psychology of Feeling and Attention, Macmillan
46 Carrasco, M. et al. (2004) Attention alters appearance. Nat. Neurosci. 7, 308–313
47 Carrasco, M. et al. (2002) Covert attention increases spatial resolution with or without masks: support for signal enhancement. J. Vis. 2, 467–479
48 Yeshurun, Y. and Carrasco, M. (1998) Attention improves or impairs visual performance by enhancing spatial resolution. Nature 396, 72–75
49 Mack, A. and Rock, I. (1998) Inattentional Blindness, The MIT Press
50 Neisser, U. and Becklen, R. (1975) Selective looking: attending to visually specified events. Cognit. Psychol. 7, 480–494
51 Most, S.B. et al. (2005) What you see is what you set: sustained inattentional blindness and the capture of awareness. Psychol. Rev. 112, 217–242
52 Setic, M. et al. (2007) Modelling the statistical processing of visual information. Neurocomputing 70, 1808–1812
53 Kinchla, R.A. and Wolfe, J.M. (1979) The order of visual processing: “Top-down”, “bottom-up”, or “middle-out”. Percept. Psychophys. 25, 225–231
54 Myczek, K. and Simons, D.J. (2008) Better than average: alternatives to statistical summary representations for rapid judgments of average size. Percept. Psychophys. 70, 772–788
55 Chong, S.C. and Treisman, A. (2005) Attentional spread in the statistical processing of visual displays. Percept. Psychophys. 67, 1–13
56 Haberman, J. et al. (2009) Averaging facial expression over time. J. Vis. 9, 1–13
57 Chong, S.C. et al. (2008) Statistical processing: not so implausible after all. Percept. Psychophys. 70, 1327–1334
58 Haberman, J. and Whitney, D. (2010) The visual system discounts emotional deviants when extracting average expression. Atten. Percept. Psychophys. 72, 1825–1838
59 Kersten, D. and Yuille, A. (2003) Bayesian models of object perception. Curr. Opin. Neurobiol. 13, 150–158
60 Vul, E. and Pashler, H. (2008) Measuring the crowd within: probabilistic representations within individuals. Psychol. Sci. 19, 645–647
61 Vul, E. and Rich, A.N. (2010) Independent sampling of features enables conscious perception of bound objects. Psychol. Sci. 21, 1168–1175
62 Mareschal, I. et al. (2010) Attentional modulation of crowding. Vis. Res. 50, 805–809
63 de Fockert, J.W. and Marchant, A.P. (2008) Attention modulates set representation by statistical properties. Percept. Psychophys. 70, 789–794
64 Dehaene, S. et al. (1998) Abstract representations of numbers in the animal and human brain. Trends Neurosci. 21, 355–361
65 Feigenson, L. et al. (2004) Core systems of number. Trends Cogn. Sci. 8, 307–314
66 Whalen, J. et al. (1999) Nonverbal counting in humans: the psychophysics of number representation. Psychol. Sci. 10, 130–137
67 Burr, D. and Ross, J. (2008) A visual sense of number. Curr. Biol. 18, 425–428
68 Geisler, W.S. et al. (2001) Edge co-occurrence in natural images predicts contour grouping performance. Vis. Res. 41, 711–724
69 Chandler, D.M. and Field, D.J. (2007) Estimates of the information content and dimensionality of natural scenes from proximity distributions. J. Opt. Soc. Am. A 24, 922–941
70 Barlow, H.B. and Foldiak, P. (1989) Adaptation and decorrelation in the cortex. In The Computing Neuron (Durbin, R. et al., eds), pp. 54–72, Addison-Wesley
71 Lewicki, M.S. (2002) Efficient coding of natural sounds. Nat. Neurosci. 5, 356–363
72 Olshausen, B.A. and Field, D.J. (1996) Natural image statistics and efficient coding. Network 7, 333–339
73 Balas, B. et al. (2009) A summary-statistic representation in peripheral vision explains visual crowding. J. Vis. 9, 13–18
74 Bulakowski, P.F. et al. Reexamining the possible benefits of visual crowding: dissociating crowding from ensemble percepts. Atten. Percept. Psychophys. (in press)
75 Piazza, M. and Izard, V. (2009) How humans count: numerosity and the parietal cortex. Neuroscientist 15, 261–273
76 Cavanagh, P. (2001) Seeing the forest but not the trees. Nat. Neurosci. 4, 673–674
77 Brady, T.F. and Alvarez, G.A. Hierarchical encoding in visual working memory: ensemble statistics bias memory for individual items. Psychol. Sci. (in press)
78 Brady, T.F. and Tenenbaum, J.B. (2010) Encoding higher-order structure in visual working-memory: a probabilistic model. In Proceedings of the 32nd Annual Conference of the Cognitive Science Society (Ohlsson, S. and Catrambone, R., eds), pp. 411–416, Cognitive Science
79 Olshausen, B.A. and Field, D.J. (2004) Sparse coding of sensory inputs. Curr. Opin. Neurobiol. 14, 481–487
80 Olshausen, B.A. and Field, D.J. (1997) Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis. Res. 37, 3311–3325
81 Willshaw, D.J. et al. (1969) Non-holographic associative memory. Nature (Lond.) 222, 960–962
82 Zetzsche, C. (1990) Sparse coding: the link between low level vision and associative memory. In Parallel Processing in Neural Systems and Computers (Eckmiller, R. et al., eds), pp. 273–276, Elsevier Science
83 Rosenholtz, R. (2000) Significantly different textures: a computational model of pre-attentive texture segmentation. In Proceedings of the 6th European Conference on Computer Vision (Vernon, D., ed.), pp. 197–211, Springer-Verlag
84 Rosenholtz, R. (1999) A simple saliency model predicts a number of motion popout phenomena. Vis. Res. 39, 3157–3163
85 Itti, L. and Koch, C. (2001) Computational modelling of visual attention. Nat. Rev. Neurosci. 2, 194–203
86 Wolfe, J.M. (1994) Guided search 2.0: a revised model of visual search. Psychon. Bull. Rev. 1, 202–238
87 Itti, L. and Koch, C. (2000) A saliency-based search mechanism for overt and covert shifts of visual attention. Vis. Res. 40, 1489–1506
88 Nothdurft, H.C. (1993) Faces and facial expressions do not pop out. Perception 22, 1287–1298
89 Noë, A. and O’Regan, J.K. (2000) Perception, attention and the grand illusion. Psyche 6 (http://psyche.cs.monash.edu.au/v6/psche-6-15-noe.html)
90 O’Regan, J.K. (1992) Solving the “real” mysteries of visual perception: the world as an outside memory. Can. J. Psychol. 46, 461–488
91 Torralba, A. et al. (2006) Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychol. Rev. 113, 766–786
92 Ehinger, K.A. et al. (2009) Modeling search for people in 900 scenes: a combined source model of eye guidance. Vis. Cogn. 17, 945–978
93 Chun, M.M. (2000) Contextual cueing of visual attention. Trends Cogn. Sci. 4, 170–178
94 Haberman, J. and Whitney, D. (2009) Seeing the mean: ensemble coding for sets of faces. Hum. Percept. Perform. 35, 718–734
Review
Cognitive neuroscience of self-regulation failure
Todd F. Heatherton and Dylan D. Wagner
Department of Psychological and Brain Sciences, 6207 Moore Hall, Dartmouth College, Hanover, NH 03755, USA
Self-regulatory failure is a core feature of many social and mental health problems. Self-regulation can be undermined by failures to transcend overwhelming temptations, negative moods and resource depletion, and when minor lapses in self-control snowball into self-regulatory collapse. Cognitive neuroscience research suggests that successful self-regulation is dependent on top-down control from the prefrontal cortex over subcortical regions involved in reward and emotion. We highlight recent neuroimaging research on self-regulatory failure, the findings of which support a balance model of self-regulation whereby self-regulatory failure occurs whenever the balance is tipped in favor of subcortical areas, either due to particularly strong impulses or when prefrontal function itself is impaired. Such a model is consistent with recent findings in the cognitive neuroscience of addictive behavior, emotion regulation and decision-making.

The advantages of self-control
The ability to control behavior enables humans to live cooperatively, achieve important goals and maintain health throughout their life span. Self-regulation enables people to make plans, choose from alternatives, control impulses, inhibit unwanted thoughts and regulate social behavior [1–4]. Although humans have an impressive capacity for self-regulation, failures are common and people lose control of their behavior in a wide variety of circumstances [1,5]. Such failures are an important cause of several contemporary societal problems – obesity, addiction, poor financial decisions, sexual infidelity and so on. Indeed, it has been estimated that 40% of deaths are attributable to poor self-regulation [6]. Conversely, those who are better able to self-regulate demonstrate improved relationships, increased job success and better mental health [7,8] and are less at risk of developing alcohol abuse problems or engaging in risky sexual behavior [9].
An understanding of the circumstances under which people fail at self-regulation – as well as the brain mechanisms associated with those failures – can provide valuable insights into how people regulate and control their thoughts, behaviors and emotions.

Corresponding author: Heatherton, T.F. ([email protected]).

Self-regulation failure
The modern world holds many temptations. Every day, people need to resist fattening foods, avoid browsing the internet when they should be working, keep from snapping at annoying coworkers and curb bad habits, such as smoking or drinking too much. Psychologists have made considerable progress in identifying the individual and situational factors that encourage or impair self-control [4,5,10]. The most common circumstances under which self-regulation fails are when people are in bad moods, when minor indulgences snowball into full-blown binges, when people are overwhelmed by immediate temptations or impulses, and when control itself is impaired (e.g. after alcohol consumption or effort depletion). Researchers have examined each of these and we briefly discuss the major findings, beginning with the behavioral literature and then discussing recent neuroscience findings.

Negative moods
Among the most important triggers of self-regulation failure are negative emotions [11,12]. When people become upset they sometimes act aggressively [13], spend too much money [14], engage in risky behavior [15], including unprotected sex [16], comfort the self with alcohol, drugs or food [4,17], and fail to pursue important life goals. Indeed, negative emotional states are related to relapse for a number of addictive behaviors, such as alcoholism, gambling and drug addiction [18,19]. Laboratory studies have demonstrated that inducing negative affect leads to heightened cravings among alcoholics [12], increased eating by chronic dieters [20,21] and greater smoking intensity by smokers [22]. A theory by Heatherton and Baumeister provides an explanation for the roles of negative affect in disinhibited eating [23], which is also applicable to other self-regulatory failures. This theory proposes that dieters hold a negative view of self that is generally unpleasant (especially concerning physical appearance) and that dieters are motivated to escape from these unpleasant feelings by constricting their cognitive attention to the immediate situation while ignoring the long-term implications and higher-level significance of their current actions.
This escape from aversive self-awareness not only helps dieters to forget their unpleasant views of self, but also disengages long-term planning and meaningful thinking and weakens the inhibitions that normally restrain a dieter’s food intake. This might explain, in part, the lack of insight that occurs in drug addiction [24]. Other behavioral accounts of the impact of negative mood on behavior include the idea that negative affect occupies attention, thereby leading to fewer resources to inhibit behavior [25], or that engaging in appetitive behaviors reduces anxiety and comforts the self and is therefore a form of coping [26].
1364-6613/$ – see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.tics.2010.12.005 Trends in Cognitive Sciences, March 2011, Vol. 15, No. 3
Lapse-activated consumption
A common pattern of self-regulation failure occurs for addicts and chronic dieters when they ‘fall off the wagon’ by consuming the addictive substance or violating their diets [5]. Marlatt coined the term abstinence violation effect to refer to situations in which addicts respond to an initial indulgence by consuming even more of the forbidden substance [11]. In one of the first studies to examine this effect, Herman and Mack experimentally violated the diets of dieters by requiring them to drink a milkshake, a high-calorie food, as part of a supposed taste perception study [27]. Although non-dieters ate less after consuming the milkshakes, presumably because they were full, dieters paradoxically ate more after having the milkshake (Figure 1a). This disinhibition of dietary restraint has been replicated numerous times [20,28] and demonstrates that dieters often eat a great deal after they perceive their diets to be broken. It is currently not clear, however, how a small indulgence, which itself might not be problematic, escalates into a full-blown binge [29].
[Figure 1: bar graphs, not reproduced here. Panel (a): ice cream consumed (g) for dieters and non-dieters in the no-preload and milkshake conditions. Panel (b): BOLD signal change in the right NAcc (12, 9, -3) and left NAcc (-15, 3, -8) for dieters and non-dieters in the no-preload and milkshake conditions.]
Figure 1. (a) When restrained eaters’ diets were broken by consumption of a high-calorie milkshake preload, they subsequently show disinhibited eating (e.g. increased grams of ice-cream consumed) compared to control subjects and restrained eaters who did not drink the milkshake (figure based on data from [30]). (b) Restrained eaters whose diets were broken by a milkshake preload showed increased activity in the nucleus accumbens (NAcc) compared to restrained eaters who did not consume the preload and satiated non-dieters [64].

Cue exposure
At the core of self-regulation is impulse control, but how do impulses arise? Both human and animal studies have demonstrated that exposure to drug cues increases the likelihood that the cued substance will be consumed [30–33], and additionally increases cravings, attention and physiological responses such as changes in heart rate [33–35]. Yet people might be unaware that their environments are influencing them because stimuli can activate goals, cravings and so forth implicitly [36,37]. Even if people are somewhat aware of cues around them, they are unaware of the process by which exposure to those cues implicitly activates cognitive processes that determine behavior [38]. A recent meta-analysis of 75 articles found that implicit cognition is a strong and reliable predictor of substance use [39]. From this perspective, cognition that is spontaneously activated by stimuli from the environment alters how people act in a given situation.

The ability to transcend immediate temptations in the service of long-term goals is a key aspect of self-regulation [5,40]. In an important series of studies, Mischel and colleagues studied how preschoolers responded in the face of temptation in situations in which delaying gratification led to larger rewards [40,41]. Successful self-control was associated with either redirection of attention away from temptation or cognitive reframing of ‘hot’ appetitive features into ‘cool’ representations [40]. A related pattern is found in behavioral economic studies in which people discount future rewards in decision-making by choosing less objectively valuable rewards that are immediately available [42]. A common feature of these studies is that people respond to appetizing cues by succumbing to immediate gratification rather than resisting temptation to achieve long-term goals.
Self-regulatory resource depletion
Self-regulation, like many other cognitive faculties, is subject to fatigue. One of the more influential theories to emerge from this research is that self-regulation draws on a common domain-general resource, so that, for example, regulating one’s emotions over an extended period of time impairs subsequent attempts at resisting the temptation to eat appetizing foods and results in disinhibited eating [43]. Baumeister and Heatherton proposed a strength model of self-regulation in which it was hypothesized that the ability to effectively regulate behavior depends on a limited resource that is consumed by effortful attempts at self-regulation [5]. In addition, this model also posited that self-regulatory capacity can be built up through practice and training (Box 1). Since its formulation there has been a tremendous surge in research supporting the notion that self-regulation relies on a limited resource. Studies of self-regulatory resource depletion have demonstrated that self-regulatory resources can be depleted by a wide range of activities, from suppressing thoughts [44] and inhibiting emotions [43] to managing the impressions we make [45] and engaging in interracial interactions [46]. A recent meta-analysis of 83 studies of self-regulatory depletion concluded that the limited resource account of self-regulation remains the best explanation for this effect [10]. More recently, it has been suggested that self-regulation relies on adequate levels of circulating blood glucose that are temporarily reduced by tasks that require effortful self-regulation (Box 2).
Functional neuroimaging studies of self-regulation
Functional neuroimaging studies of self-regulation and its failures suggest that self-regulation involves a balance between brain regions representing the reward, salience and emotional value of a stimulus and prefrontal regions
Box 1. Can self-regulatory capacity be increased?
In addition to postulating that self-regulation relies on a limited domain-general resource, the limited resource account of self-regulatory failure [5] also predicted that self-regulatory capacity could be increased through practice or training. In the first study to examine the effect of self-regulatory training, participants engaged in a variety of daily tasks that required exertion of small amounts of self-control (e.g. remembering to maintain good posture). Compared to control participants, those who engaged in modest amounts of daily self-control were more resistant to the effects of self-regulatory depletion [100]. In addition, it has been shown that simple self-control regimens, such as using the non-dominant hand for daily activities, can reduce the depleting effects of suppressing stereotypes [101]. More recently, these results have been extended to health behaviors such as smoking cessation. Engaging in simple daily self-control exercises (e.g. avoiding unhealthy foods) before stopping smoking led to increased abstinence rates at follow-up for those who practiced self-control compared to a control group that did not [102]. These findings support the notion that self-regulatory strength can be increased through practice and that once increased, this newfound capacity to self-regulate can be used not only for comparatively banal tasks such as maintaining posture or using one’s non-dominant hand, but also for behaviors with important health consequences such as resisting the temptation to smoke. If self-regulatory capacity can be increased through simple self-control exercises over relatively short periods of time, what about people whose profession requires constant self-regulation (e.g. professional musicians, air traffic controllers)? Self-regulatory capacity in such populations remains largely unexplored; however, related research has shown that a relationship exists between musical training and grey matter in the dorsolateral prefrontal cortex [103], a brain region that has been implicated in both working memory and self-control [3].
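The strength model described in the main text, together with the training effect summarized in this box, can be caricatured as a simple resource budget. The sketch below is a toy illustration only and is not a model proposed in the cited studies: the resource units, the task costs and the size of the training gain are hypothetical.

```python
from typing import Sequence


def remaining_after(capacity: float, task_costs: Sequence[float]) -> float:
    """Resource left after a sequence of effortful tasks (floored at zero)."""
    remaining = capacity
    for cost in task_costs:
        remaining = max(0.0, remaining - cost)
    return remaining


def can_resist(remaining: float, demand: float) -> bool:
    """Self-control succeeds only if enough resource is left to meet the demand."""
    return remaining >= demand


if __name__ == "__main__":
    # Hypothetical numbers: suppressing emotions costs 7 units,
    # and resisting an appetizing food afterwards demands 5 units.
    suppression_cost, temptation_demand = 7.0, 5.0
    for label, capacity in (("untrained", 10.0), ("trained", 13.0)):
        left = remaining_after(capacity, [suppression_cost])
        print(label, "resists" if can_resist(left, temptation_demand) else "fails")
```

With these assumed values the untrained participant is left with 3 units and fails, whereas the trained participant is left with 6 units and resists, mirroring the qualitative pattern reported for practice effects.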
Box 2. Self-regulatory resource depletion and blood glucose
One issue with the limited resource model of self-regulation has been the lack of biological specificity in identifying the actual resource that is depleted by acts of self-control. It has recently been suggested that self-regulation relies on circulating blood glucose [104]. In a series of experiments, Gailliot and colleagues demonstrated that engaging in effortful self-control reduces blood glucose levels [105]. Moreover, they also found that artificially raising blood glucose levels eliminates the effects of self-regulatory depletion [105,106]. Although the notion that glucose metabolism affects self-regulation is recent, the impact of glucose on cognitive performance has been known for some time. For example, studies conducted in the 1990s showed that administering glucose improves performance on memory tasks and on tasks requiring response inhibition [107]. In many respects this should come as no surprise, because glucose metabolism is the primary contrast in functional neuroimaging with positron emission tomography (PET), which, among numerous other findings, has demonstrated that glucose metabolism increases with task difficulty [108]. In light of this research, it seems plausible that self-regulatory failure following resource depletion is at least partly due to a temporary reduction in brain glucose stores. Finally, self-regulation relies primarily on cognitive functions that are ascribed to the prefrontal cortex, so depletion effects should presumably be greatest when both the depleting task and the subsequent self-regulation task recruit the same region of the brain. Although this has yet to be tested, PET neuroimaging, with its ability to directly measure glucose metabolism, is an ideal method for investigating the link between focal glucose depletion in the brain and subsequent impairments in self-regulation.
[Figure 2 diagram. Labels recoverable from the image: threats to self-regulation (cue exposure, lapse-activated consumption, negative mood, resource depletion, alcohol consumption, prefrontal brain damage); effects (impulses overwhelm prefrontal control, prefrontal–subcortical circuit is broken, PFC function is impaired); regions (lateral PFC, NAcc, amygdala); outcome (self-regulatory failure).]
Figure 2. Schematic of a balance model of self-regulation and its failure, highlighting the four threats to self-regulation identified in the text and their putative impact on brain areas involved in self-regulation. This model suggests that self-regulatory failure occurs whenever the balance is tipped in favor of subcortical regions involved in reward and emotion, either due to the strength of an impulse or due to a failure to appropriately engage top-down control mechanisms.
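The logic of the schematic can be expressed as a toy calculation. The sketch below is an illustrative simplification, not the authors’ formal model: it treats bottom-up impulse strength and top-down control as two numbers in hypothetical units, lets cue exposure and lapse-activated consumption raise the impulse, lets the remaining threats lower control, and declares failure when the impulse exceeds control. All names and magnitudes are assumptions.

```python
from typing import Tuple

# Grouping of threats follows the distinction drawn in the text between
# influences on reward (bottom-up) and influences on the PFC (top-down).
BOTTOM_UP_THREATS = {"cue_exposure", "lapse_activated_consumption"}
TOP_DOWN_THREATS = {"negative_mood", "resource_depletion", "alcohol", "prefrontal_damage"}


def apply_threat(impulse: float, control: float,
                 threat: str, magnitude: float) -> Tuple[float, float]:
    """Return the (impulse, control) pair after one threat of a given size."""
    if magnitude < 0:
        raise ValueError("magnitude must be non-negative")
    if threat in BOTTOM_UP_THREATS:
        return impulse + magnitude, control            # stronger impulse
    if threat in TOP_DOWN_THREATS:
        return impulse, max(0.0, control - magnitude)  # weaker control
    raise ValueError(f"unknown threat: {threat}")


def regulation_fails(impulse: float, control: float) -> bool:
    """Failure occurs when the balance tips in favor of the impulse."""
    return impulse > control


if __name__ == "__main__":
    impulse, control = 2.0, 3.0  # hypothetical baseline: control outweighs impulse
    print("baseline fails:", regulation_fails(impulse, control))
    for threat in ("cue_exposure", "resource_depletion"):
        print(threat, "fails:", regulation_fails(*apply_threat(impulse, control, threat, 1.5)))
```

The point of the sketch is only that the same outcome can be reached from either side of the balance: a stronger impulse with intact control, or an unchanged impulse with weakened control.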
associated with self-control. When this balance tips in favor of bottom-up impulses, either because of a failure to engage prefrontal control areas or because of an especially strong impulse (e.g. the sight and smell of cigarettes for an abstinent smoker), then the likelihood of self-regulatory failure increases (Figure 2).
Regulation of appetitive behaviors
A universal feature of rewards, including drugs of abuse, is that they activate dopamine receptors in the mesolimbic dopamine system, especially the nucleus accumbens (NAcc) in the ventral striatum [47–49]. Functional neuroimaging studies have shown that the ingestion of drugs similarly increases activity in NAcc [50]. Earlier we noted that cue exposure is associated with self-regulation failure. Neuroimaging studies reveal a plausible mechanism for such effects. When addicted individuals are exposed to visual cues that have become associated with drugs (e.g. images of drugs and drug paraphernalia), they also show cue-related activity in the mesolimbic reward system [51–53] and the insula [54]. Likewise, in neuroeconomic studies of decision-making, activity in mesolimbic reward structures is associated with choosing immediate monetary rewards [55,56]. Indeed, dopamine agonists increase impulsive behavior in intertemporal choice tasks [57]. Hence, exposure to cues activates reward regions, probably because of learned expectancies that the observed stimulus will be consumed and provide genuine reward. That is, over the course of human evolution, food-relevant stimuli, for example, were usually real and edible rather than mere visual representations. Thus, cue exposure motivates people to seek out relevant rewards. Interestingly, it seems likely that cue reactivity might influence motivation outside of conscious awareness [24,37,38,54]. Indeed, Childress and colleagues found that ‘unseen’ stimuli of cocaine (presented for 33 ms and then backward masked) produced striatal activity for cocaine addicts [58].
This supports the proposition that implicit cognition might be important in part because people are unaware that such unconscious processes are shaping their behavior and are therefore unable to resist their influence [59].
Of particular interest is what happens when participants attempt to regulate their responses to reward cues such as those representing money, food or drugs. When cocaine users [60] or smokers [61,62] are instructed to inhibit craving, they show increased activity in regions of the prefrontal cortex (PFC) associated with self-control and reduced cue-reactivity in regions associated with reward processing. Specifically, Volkow and colleagues showed that when cocaine users inhibit their craving in response to cocaine cues, they show reduced activity in the orbitofrontal cortex and ventral striatum [60]. Moreover, the magnitude of this reduction is correlated with an increase in activity in lateral PFC [60]. Similarly, in smokers, activity in the dorsolateral PFC during regulation of smoking craving correlated with reduced activity in the ventral striatum to smoking cues and this relationship mediated reductions in self-reported craving [61]. This effect is also observed in healthy participants who are instructed to regulate their response to cues representing monetary rewards; regulation of their response to reward cues results in decreased cue-related activity in the ventral striatum [63]. Finally, a recent study extended the above findings by demonstrating that individual differences in activity in the lateral PFC during a simple inhibition task were associated with real-world reductions in cigarette craving and consumption among smokers over a 3-week period [64].
The above studies indicate that regulation of craving requires top-down control of brain reward systems by PFC control regions [60,61,63]. But what happens when self-control breaks down? As mentioned previously, one common reason why self-regulation fails is lapse-activated consumption, such as when dieters break their diet and temporarily engage in disinhibited eating [20,27,65,66].
One possible mechanism for this paradoxical pattern is that the initial intake of the food serves as a hedonic prime, and thereby brain regions involved in reward (i.e. NAcc) are freed from the regulatory influence of PFC, subsequently demonstrating a heightened response to appetizing food. A recent study tested this proposition by examining the effect of breaking a diet on neural cue-reactivity to appetizing foods in dieters [67]. Compared to both non-dieters and dieters whose diet remained intact, those who had their diet broken showed increased cue-reactivity to appetizing foods in the NAcc (Figure 1b), which echoes the behavioral findings of Herman and Mack [27]. Interestingly, non-dieters showed the opposite result; the NAcc showed the greatest response in the water condition, when subjects might have been hungry, but not in the milkshake condition, when participants were satiated. Thus, exposure to relevant cues or ingestion of forbidden substances heightens subcortical activity in reward regions, thereby tipping the balance so that frontal mechanisms seem to have less power over behavior.
Self-regulation failure also occurs when frontal executive functions are compromised, such as following alcohol consumption [68] or injury [3]. For instance, patients with frontal lobe damage show a preference for immediate rewards in intertemporal choice tasks [69]. Likewise, transcranial magnetic stimulation to lateral PFC increases choices of immediate over delayed rewards [70]. It is plausible that negative mood and resource depletion interfere with self-regulation because they disrupt frontal control, thereby tipping the balance. We noted above that negative emotional states are associated with self-regulation failure, possibly because they interfere with higher-order representations, such as those involved in self-awareness and insight. Sinha and colleagues found that recall of personally distressing episodes led to decreased activity in PFC and increased activity in ventral striatal regions [71], which supports the idea that stress tips the balance to favor subcortical structures.
Regulation of emotions
Paralleling studies of appetitive regulation, research on emotion regulation has converged on a top-down model whereby neural responses to emotional material in the amygdala and associated limbic regions are downregulated by the lateral PFC [72–74]. Analogous to the cue-reactivity research outlined above, a frequent finding in studies of emotion regulation is of an inverse relationship between activity in the lateral PFC and the amygdala, a limbic structure sensitive to emotionally arousing stimuli [74–78]. For instance, Wager and colleagues found that two independent pathways mediate frontal regulation of emotion: a frontal–striatal pathway is associated with successful regulation whereas a frontal–amygdala pathway is associated with less successful regulation [79]. Likewise, Schardt et al. found that increased functional coupling between lateral PFC and amygdala was associated with successful emotion regulation for those with genotypes associated with hyper-responsivity to negative stimuli [80]. Research on patients with mood disorders has demonstrated that the reciprocal relationship between PFC and amygdala during emotion regulation breaks down in patients suffering from major depressive disorder and borderline personality disorder (BPD) [75,81,82]. Recent studies suggest that this prefrontal–amygdala circuit might be related to differences in brain structure and connectivity. For instance, in contrast to controls, participants with BPD showed no coupling of metabolism between the medial PFC and the amygdala [83]. Similarly, reductions in white matter connectivity between the medial PFC and the amygdala, as measured with diffusion tensor imaging, were found for individuals with high anxiety [84]. In the non-clinical population, it has been shown that prolonged sleep deprivation leads to increased amygdala response to aversive images [85].
Regulation of attitudes and prejudice
Social psychological models of person categorization suggest that stereotypes are automatically activated on encountering outgroup members and that active inhibition is required to suppress stereotypes and thereby avoid prejudicial behavior [86,87]. Functional neuroimaging research on race perception has largely corroborated these models by showing evidence of top-down regulation of the amygdala by the lateral PFC when viewing members of a racial outgroup [88,89]. Echoing the findings on the regulation of craving and emotions outlined above, activity in the lateral PFC was found to be inversely correlated with amygdala activity to racial outgroup members (i.e. African Americans) when viewing faces [88] and when assigning a verbal label to faces [89]. Further evidence that the recruitment of lateral PFC observed in these studies reflects self-regulatory processes comes from a study by Richeson and colleagues that combined functional neuroimaging with a behavioral measure of self-regulatory resource depletion [90]. Activity in the PFC (specifically lateral PFC and anterior cingulate cortex) when viewing black versus white faces was correlated with the degree to which participants experienced self-regulatory resource depletion in a separate behavioral experiment in which they were required to discuss racially charged topics with a black confederate [90].
Put differently, the degree to which participants found the inter-racial interaction cognitively depleting was associated with increased activity in lateral prefrontal regions when viewing black versus white faces during fMRI. Taken together, these findings suggest that, as with emotions and drug cues, regulation of attitudes towards outgroup members requires downregulation of the amygdala by the PFC.
Prefrontal–subcortical balance model of self-regulation
A longstanding idea in psychology is that resisting temptations reflects competition between impulses and self-control [2,5,40]. More recently, such dual-system models have received support from imaging research, with substantial evidence of frontal–subcortical connectivity and reciprocal activity [15,49,60,91–94]. Neuroscientific models of emotion regulation and self-control in drug addiction share conceptual similarities. For instance, models of drug addiction posit that brain reward systems are hypersensitized to drug cues and become uncoupled from PFC regions involved in top-down regulation [95,96]. Likewise, neuroeconomic studies of decision-making find that PFC activity is associated with long-term outcomes, whereas subcortical activity is associated with more immediate outcomes [97]. Similarly, models of emotion regulation and stereotype suppression suggest that prefrontal regions are involved in actively regulating emotion – or prejudicial attitudes – based on the observation of an inverse relationship between PFC and activity in the amygdala [77,88,89]. Studies of patients with anxiety and mood disorders offer similar evidence in the form of reduced functional [75] and structural [84] connectivity between the PFC and the amygdala. Similarly, alcohol consumption, which is known to disrupt self-regulation, shifts activity from the PFC to subcortical limbic structures [98], whereas excessive alcohol use leads to degeneration in cortical areas important for controlling impulsivity [68], which might serve to further undermine attempts to control impulses among alcoholics. During development, when frontal executive functions are still maturing, subcortical structures might more easily tip the balance and overwhelm self-regulatory resources, thereby explaining why adolescents might be prone to heightened emotionality and risk-taking [15].
What these different models have in common is the notion that during successful self-regulation, there is a balance between prefrontal regions involved in self-control and subcortical regions involved in representing reward incentives, emotions or attitudes. We propose that the precise subcortical target of top-down control is dependent on the regulatory context that individuals find themselves in: when a person regulates their food intake, this involves a prefrontal–striatal circuit, and when this same person later regulates their emotions, they instead invoke a prefrontal–amygdala circuit. From this perspective, the nature of self-regulation is constant across different types of regulation, despite variability in the neural regions that are being regulated [49]. Indeed, a recent review of self-control across six different domains found that lateral PFC is involved in exerting control regardless of the specific domain [99].
This supports our conjecture that the mechanism for self-regulation is domain-general, whereas the subcortical region involved varies depending on the nature of the stimulus, which might explain why the effects of resource depletion are not tied to any one self-regulatory domain.
Why do people fail at self-regulation?
Giving in to temptations can occur for a variety of reasons; for instance, dieters attempting to control their food intake might find it easy to ignore most foods, but when confronted with their favorite dessert their craving can overpower their resolve. Similarly, bad moods or competing regulatory demands can all conspire to break the hold people have over their impulses and desires. From the perspective of the prefrontal–subcortical balance model outlined above, anything that tips the balance in favor of subcortical regions can lead to self-regulatory collapse. This can occur in a bottom-up manner when people are confronted with especially potent cues, such as a favorite food, a free drink or a strong emotion, and in a top-down manner, such as when prefrontal functioning is impaired either when self-regulatory resources are depleted or due to drugs, alcohol or brain damage [3]. Therefore, for successful self-regulation, current self-regulatory ability must withstand the strength of an impulse. On this point, researchers have generally neglected to consider the situational factors that influence the balance between activity in subcortical regions and the PFC in self-regulation failure (Box 3). Our review suggests that some classic
Box 3. Outstanding questions
Are individual differences in susceptibility to self-regulatory failure related to prefrontal–subcortical connectivity or the integrity of frontal circuitry?
Can direct measurements of brain glucose levels with FDG PET be used to test the glucose model of resource depletion?
Does self-regulatory training alter brain connectivity and morphometry and do these changes predict greater self-regulatory success?
Are patients with prefrontal damage, or adults with age-related cognitive decline, more susceptible to external cues such as appetizing foods or the sight and smell of cigarettes?
Does the frontal–subcortical reciprocal relation change during childhood development or during aging or as a function of substance use?
self-regulatory failures occur because of their influence on reward (i.e. cue reactivity and lapse-activated consumption) whereas others occur because of their influence on PFC (i.e. negative moods, self-regulatory depletion, physiological disruption or damage of PFC). We also note that self-regulatory failure depends on the individual. That is, the particular domain a person tries to control is the one that is most prone to self-regulation failure. For example, self-regulatory resource depletion might lead an abstinent smoker to turn to cigarettes, a dieter to high-calorie foods or a prejudiced individual to make bigoted remarks; although the outcome is different in each case and the underlying subcortical regions involved can even differ (i.e. striatum or amygdala), the overall process is probably the same.
Concluding remarks
In this review we highlighted a number of threats to self-regulation, from negative mood and potent appetitive cues to lapse-activated consumption and self-regulatory resource depletion. Neuroimaging research on self-regulatory failure is still in its infancy. Recently, a small number of studies of drug addicts, patients and healthy individuals have shed light on the neural mechanisms underlying self-regulatory failure. This research corroborates theoretical models of self-control in which the PFC is involved in actively regulating subcortical responses to emotions and appetitive cues. This prefrontal–subcortical balance model emphasizes that self-regulatory collapse can occur because of both insufficient top-down control and overwhelming bottom-up impulses.
Acknowledgments
We thank Bill Kelley and Paul Whalen for helpful discussions in developing this model. This work was supported by NIH grant R01DA022582.
References 1 Baumeister, R.F. et al. (1994) Losing Control: How and Why People Fail at Self-Regulation, Academic Press 2 Hofmann, W. et al. (2009) Impulse and self-control from a dualsystems perspective. Perspect. Psychol. Sci. 4, 162–176 3 Wagner, D.D. and Heatherton, T.F. (2010) Giving in to temptation: the emerging cognitive neuroscience of self-regulatory failure, In Handbook of Self-Regulation: Research, Theory, and Applications (2nd edn) (Vohs, K.D. and Baumeister, R.F., eds), pp. 41–63, Guilford Press 137
Review 4 Heatherton, T.F. (2011) Self and identity: neuroscience of self and selfregulation. Annu. Rev. Psychol. 62, 363–390 5 Baumeister, R.F. and Heatherton, T.F. (1996) Self-regulation failure: an overview. Psychol. Inq. 7, 1–15 6 Schroeder, S.A. (2007) We can do better – improving the health of the American people. New Engl. J. Med. 357, 1221–1228 7 Tangney, J.P. et al. (2004) High self-control predicts good adjustment, less pathology, better grades, and interpersonal success. J. Pers. 72, 271–324 8 Duckworth, A.L. and Seligman, M.E. (2005) Self-discipline outdoes IQ in predicting academic performance of adolescents. Psychol. Sci. 16, 939–944 9 Quinn, P.D. and Fromme, K. (2010) Self-regulation as a protective factor against risky drinking and sexual behavior. Psychol. Addict. Behav. 24, 376–385 10 Hagger, M.S. et al. (2010) Ego depletion and the strength model of selfcontrol: a meta-analysis. Psychol. Bull. 136, 495–525 11 Marlatt, G.A. and Gordon, J.R. (1985) Relapse Prevention: Maintenance Strategies in the Treatment of Addictive Behaviors, Guilford Press 12 Sinha, R. (2009) Modeling stress and drug craving in the laboratory: implications for addiction treatment development. Addict. Biol. 14, 84–98 13 Anderson, C.A. and Bushman, B.J. (2002) Human aggression. Annu. Rev. Psychol. 53, 27–51 14 Bruyneel, S.D. et al. (2009) I felt low and my purse feels light: depleting mood regulation attempts affect risk decision making. J. Behav. Decis. Making 22, 153–170 15 Somerville, L.H. et al. (2010) A time of change: behavioral and neural correlates of adolescent sensitivity to appetitive and aversive environmental cues. Brain Cogn. 72, 124–133 16 Bousman, C.A. et al. (2009) Negative mood and sexual behavior among non-monogamous men who have sex with men in the context of methamphetamine and HIV. J. Affect. Disord. 119, 84–91 17 Magid, V. et al. (2009) Negative affect, stress, and smoking in college students: unique associations independent of alcohol and marijuana use. 
Addict. Behav. 34, 973–975 18 Sinha, R. (2007) The role of stress in addiction relapse. Curr. Psychiatry Rep. 9, 388–395 19 Witkiewitz, K. and Villarroel, N.A. (2009) Dynamic association between negative affect and alcohol lapses following alcohol treatment. J. Consult. Clin. Psychol. 77, 633–644 20 Heatherton, T.F. et al. (1991) Effects of physical threat and ego threat on eating behavior. J. Pers. Soc. Psychol. 60, 138–143 21 Macht, M. (2008) How emotions affect eating: a five-way model. Appetite 50, 1–11 22 McKee, S. et al. (2010) Stress decreases the ability to resist smoking and potentiates smoking intensity and reward. J. Psychopharmacol. DOI: 10.1177/0269881110376694 23 Heatherton, T.F. and Baumeister, R.F. (1991) Binge eating as escape from self-awareness. Psychol. Bull. 110, 86–108 24 Goldstein, R.Z. et al. (2009) The neurocircuitry of impaired insight in drug addiction. Trends Cogn. Sci. 13, 372–380 25 Ward, A. and Mann, T. (2000) Don’t mind if I do: disinhibited eating under cognitive load. J. Pers. Soc. Psychol. 78, 753–763 26 Sinha, R. (2008) Chronic stress, drug use, and vulnerability to addiction. Ann. N.Y. Acad. Sci. 1141, 105–130 27 Herman, C.P. and Mack, D. (1975) Restrained and unrestrained eating. J. Pers. 43, 647–660 28 Herman, C.P. and Polivy, J. (2010) The self-regulation of eating: theoretical and practical problems, In Handbook of SelfRegulation: Research, Theory, and Applications (2nd edn) (Vohs, K.D. and Baumeister, R.F., eds), pp. 492–508, Guilford Press 29 Marlatt, G.A. et al. (2009) Relapse prevention: evidence base and future directions, In Evidence-Based Addiction Treatment (1st edn) (Miller, P.M., ed.), pp. 215–232, Elsevier/Academic Press 30 Drummond, D.C. et al. (1990) Conditioned learning in alcohol dependence: implications for cue exposure treatment. Br. J. Addict. 85, 725–743 31 Glautier, S. and Drummond, D.C. (1994) Alcohol dependence and cue reactivity. J. Stud. Alcohol. 55, 224–229 32 Jansen, A. 
(1998) A learning model of binge eating: cue reactivity and cue exposure. Behav. Res. Ther. 36, 257–272
138
33 Stewart, J. et al. (1984) Role of unconditioned and conditioned drug effects in the self-administration of opiates and stimulants. Psychol. Rev. 91, 251–268
34 Drobes, D.J. and Tiffany, S.T. (1997) Induction of smoking urge through imaginal and in vivo procedures: physiological and self-report manifestations. J. Abnorm. Psychol. 106, 15–25
35 Payne, T.J. et al. (2006) Pretreatment cue reactivity predicts end-of-treatment smoking. Addict. Behav. 31, 702–710
36 Ferguson, M.J. and Bargh, J.A. (2004) How social perception can automatically influence behavior. Trends Cogn. Sci. 8, 33–39
37 Stacy, A.W. and Wiers, R.W. (2010) Implicit cognition and addiction: a tool for explaining paradoxical behavior. Annu. Rev. Clin. Psychol. 6, 551–575
38 Bargh, J.A. and Morsella, E. (2008) The unconscious mind. Perspect. Psychol. Sci. 3, 73–79
39 Rooke, S.E. et al. (2008) Implicit cognition and substance use: a meta-analysis. Addict. Behav. 33, 1314–1328
40 Metcalfe, J. and Mischel, W. (1999) A hot/cool-system analysis of delay of gratification: dynamics of willpower. Psychol. Rev. 106, 3–19
41 Mischel, W. et al. (2010) ‘Willpower’ over the life span: mechanisms, consequences, and implications. Soc. Cogn. Affect. Neurosci. DOI: 10.1093/scan/nsq081
42 Bickel, W.K. and Marsch, L.A. (2001) Toward a behavioral economic understanding of drug dependence: delay discounting processes. Addiction 96, 73–86
43 Vohs, K.D. and Heatherton, T.F. (2000) Self-regulatory failure: a resource-depletion approach. Psychol. Sci. 11, 249–254
44 Muraven, M. et al. (2002) Self-control and alcohol restraint: an initial application of the self-control strength model. Psychol. Addict. Behav. 16, 113–120
45 Vohs, K.D. et al. (2005) Self-regulation and self-presentation: regulatory resource depletion impairs impression management and effortful self-presentation depletes regulatory resources. J. Pers. Soc. Psychol. 88, 632–657
46 Richeson, J.A. and Shelton, J.N. (2003) When prejudice does not pay: effects of interracial contact on executive function. Psychol. Sci. 14, 287–290
47 Baler, R.D. and Volkow, N.D. (2006) Drug addiction: the neurobiology of disrupted self-control. Trends Mol. Med. 12, 559–566
48 Robinson, T.E. and Berridge, K.C. (2003) Addiction. Annu. Rev. Psychol. 54, 25–53
49 Volkow, N.D. et al. (2008) Overlapping neuronal circuits in addiction and obesity: evidence of systems pathology. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 363, 3191–3200
50 O’Doherty, J.P. et al. (2003) Temporal difference models and reward-related learning in the human brain. Neuron 38, 329–337
51 Garavan, H. et al. (2000) Cue-induced cocaine craving: neuroanatomical specificity for drug users and drug stimuli. Am. J. Psychiatry 157, 1789–1798
52 Grant, S. et al. (1996) Activation of memory circuits during cue-elicited cocaine craving. Proc. Natl. Acad. Sci. U.S.A. 93, 12040–12045
53 Myrick, H. et al. (2008) Effect of naltrexone and ondansetron on alcohol cue-induced activation of the ventral striatum in alcohol-dependent people. Arch. Gen. Psychiatry 65, 466–475
54 Naqvi, N.H. and Bechara, A. (2009) The hidden island of addiction: the insula. Trends Neurosci. 32, 56–67
55 Diekhof, E.K. and Gruber, O. (2010) When desires collide with reason: functional interactions between anteroventral prefrontal cortex and nucleus accumbens underlie the human ability to resist impulsive desires. J. Neurosci. 30, 1488–1493
56 McClure, S.M. et al. (2004) Separate neural systems value immediate and delayed monetary rewards. Science 306, 503–507
57 Pine, A. et al. (2010) Dopamine, time, and impulsivity in humans. J. Neurosci. 30, 8888–8896
58 Childress, A.R. et al. (2008) Prelude to passion: limbic activation by “unseen” drug and sexual cues. PLoS ONE 3, e1506
59 Wagner, D.D. et al. (2011) Spontaneous action representation in smokers watching movie smoking. J. Neurosci. 31, 894–898
60 Volkow, N.D. et al. (2010) Cognitive control of drug craving inhibits brain reward regions in cocaine abusers. Neuroimage 49, 2536–2543
61 Kober, H. et al. (2010) Prefrontal-striatal pathway underlies cognitive regulation of craving. Proc. Natl. Acad. Sci. U.S.A. 107, 14811–14816
62 Brody, A.L. et al. (2007) Neural substrates of resisting craving during cigarette cue exposure. Biol. Psychiatry 62, 642–651
63 Delgado, M.R. et al. (2008) Regulating the expectation of reward via cognitive strategies. Nat. Neurosci. 11, 880–881
64 Berkman, E.T. et al. In the trenches of real-world self-control: neural correlates of breaking the link between craving and smoking. Psychol. Sci., in press
65 Heatherton, T.F. et al. (1992) Effects of distress on eating: the importance of ego-involvement. J. Pers. Soc. Psychol. 62, 801–803
66 Heatherton, T.F. et al. (1993) Self-awareness, task failure, and disinhibition: how attentional focus affects eating. J. Pers. 61, 49–61
67 Demos, K.E. et al. (2011) Dietary restraint violations influence reward responses in nucleus accumbens and amygdala. J. Cogn. Neurosci. DOI: 10.1162/jocn.2010.21568
68 Crews, F.T. and Boettiger, C.A. (2009) Impulsivity, frontal lobes and risk for addiction. Pharmacol. Biochem. Behav. 93, 237–247
69 Sellitto, M., Ciaramelli, E. and di Pellegrino, G. (2010) Myopic discounting of future rewards after medial orbitofrontal damage in humans. J. Neurosci. 30, 6429–6436
70 Figner, B. et al. (2010) Lateral prefrontal cortex and self-control in intertemporal choice. Nat. Neurosci. 13, 538–539
71 Sinha, R. et al. (2005) Neural activity associated with stress-induced cocaine craving: a functional magnetic resonance imaging study. Psychopharmacology 183, 171–180
72 Davidson, R.J. et al. (2000) Dysfunction in the neural circuitry of emotion regulation – a possible prelude to violence. Science 289, 591–594
73 Ochsner, K.N. and Gross, J.J. (2005) The cognitive control of emotion. Trends Cogn. Sci. 9, 242–249
74 Hariri, A.R. et al. (2003) Neocortical modulation of the amygdala response to fearful stimuli. Biol. Psychiatry 53, 494–501
75 Johnstone, T. et al. (2007) Failure to regulate: counterproductive recruitment of top-down prefrontal–subcortical circuitry in major depression. J. Neurosci. 27, 8877–8884
76 Ochsner, K.N. et al. (2002) Rethinking feelings: an FMRI study of the cognitive regulation of emotion. J. Cogn. Neurosci. 14, 1215–1229
77 Ochsner, K.N. et al. (2004) For better or for worse: neural systems supporting the cognitive down- and up-regulation of negative emotion. Neuroimage 23, 483–499
78 Urry, H.L. et al. (2006) Amygdala and ventromedial prefrontal cortex are inversely coupled during regulation of negative affect and predict the diurnal pattern of cortisol secretion among older adults. J. Neurosci. 26, 4415–4425
79 Wager, T.D. et al. (2008) Prefrontal–subcortical pathways mediating successful emotion regulation. Neuron 59, 1037–1050
80 Schardt, D.M. et al. (2010) Volition diminishes genetically mediated amygdala hyperreactivity. Neuroimage 53, 943–951
81 Donegan, N.H. et al. (2003) Amygdala hyperreactivity in borderline personality disorder: implications for emotional dysregulation. Biol. Psychiatry 54, 1284–1293
82 Silbersweig, D. et al. (2007) Failure of frontolimbic inhibitory function in the context of negative emotion in borderline personality disorder. Am. J. Psychiatry 164, 1832–1841
83 New, A.S. et al. (2007) Amygdala–prefrontal disconnection in borderline personality disorder. Neuropsychopharmacology 32, 1629–1640
84 Kim, M.J. and Whalen, P.J. (2009) The structural integrity of an amygdala–prefrontal pathway predicts trait anxiety. J. Neurosci. 29, 11614–11618
85 Yoo, S.S. et al. (2007) The human emotional brain without sleep – a prefrontal amygdala disconnect. Curr. Biol. 17, R877–878
86 Devine, P.G. (1989) Stereotypes and prejudice – their automatic and controlled components. J. Pers. Soc. Psychol. 56, 5–18
87 Fiske, S.T. (1998) Stereotyping, prejudice, and discrimination. In The Handbook of Social Psychology (Vol. 2) (Gilbert, D. et al., eds), pp. 357–411, McGraw-Hill
88 Cunningham, W.A. et al. (2004) Separable neural components in the processing of black and white faces. Psychol. Sci. 15, 806–813
89 Lieberman, M.D. et al. (2005) An fMRI investigation of race-related amygdala activity in African-American and Caucasian-American individuals. Nat. Neurosci. 8, 720–722
90 Richeson, J.A. et al. (2003) An fMRI investigation of the impact of interracial contact on executive function. Nat. Neurosci. 6, 1323–1328
91 Banks, S.J. et al. (2007) Amygdala–frontal connectivity during emotion regulation. Soc. Cogn. Affect. Neurosci. 2, 303–312
92 Batterink, L. et al. (2010) Body mass correlates inversely with inhibitory control in response to food among adolescent girls: an fMRI study. Neuroimage 52, 1696–1703
93 Li, C.S. and Sinha, R. (2008) Inhibitory control and emotional stress regulation: neuroimaging evidence for frontal–limbic dysfunction in psycho-stimulant addiction. Neurosci. Biobehav. Rev. 32, 581–597
94 MacDonald, K.B. (2008) Effortful control, explicit processing, and the regulation of human evolved predispositions. Psychol. Rev. 115, 1012–1031
95 Bechara, A. (2005) Decision making, impulse control and loss of willpower to resist drugs: a neurocognitive perspective. Nat. Neurosci. 8, 1458–1463
96 Koob, G.F. and Le Moal, M. (2008) Addiction and the brain antireward system. Annu. Rev. Psychol. 59, 29–53
97 Huettel, S.A. (2010) Ten challenges for decision neuroscience. Front. Neurosci. 4, 1–7
98 Volkow, N.D. et al. (2008) Moderate doses of alcohol disrupt the functional organization of the human brain. Psychiatry Res. 162, 205–213
99 Cohen, J.R. and Lieberman, M.D. (2010) The common neural basis of exerting self-control in multiple domains. In Self Control in Society, Mind, and Brain (Hassin, R. et al., eds), pp. 141–162, Oxford University Press
100 Muraven, M. et al. (1999) Longitudinal improvement of self-regulation through practice: building self-control strength through repeated exercise. J. Soc. Psychol. 139, 446–457
101 Gailliot, M.T. et al. (2007) Increasing self-regulatory strength can reduce the depleting effect of suppressing stereotypes. Pers. Soc. Psychol. Bull. 33, 281–294
102 Muraven, M. (2010) Practicing self-control lowers the risk of smoking lapse. Psychol. Addict. Behav. 24, 446–452
103 Bermudez, P. et al. (2009) Neuroanatomical correlates of musicianship as revealed by cortical thickness and voxel-based morphometry. Cereb. Cortex 19, 1583–1596
104 Gailliot, M.T. and Baumeister, R.F. (2007) The physiology of willpower: linking blood glucose to self-control. Pers. Soc. Psychol. Rev. 11, 303–327
105 Gailliot, M.T. et al. (2007) Self-control relies on glucose as a limited energy source: willpower is more than a metaphor. J. Pers. Soc. Psychol. 92, 325–336
106 Gailliot, M.T. et al. (2009) Stereotypes and prejudice in the blood: sucrose drinks reduce prejudice and stereotyping. J. Exp. Soc. Psychol. 45, 288–290
107 Benton, D. et al. (1994) Blood glucose influences memory and attention in young adults. Neuropsychologia 32, 595–607
108 Jonides, J. et al. (1997) Verbal working memory load affects regional brain activation as measured by PET. J. Cogn. Neurosci. 9, 462–475