THE ADAPTIVE BRAIN II Vision, Speech, Language, and Motor Control
ADVANCES IN PSYCHOLOGY 43 Editors G. E. STELMACH
P. A. VROON
NORTH-HOLLAND AMSTERDAM · NEW YORK · OXFORD · TOKYO
THE ADAPTIVE BRAIN II Vision, Speech, Language, and Motor Control
Edited by
Stephen GROSSBERG Center for Adaptive Systems Boston University Boston, Massachusetts U.S.A.
1987
NORTH-HOLLAND AMSTERDAM · NEW YORK · OXFORD · TOKYO
© ELSEVIER SCIENCE PUBLISHERS B.V., 1987
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
ISBN: 0 444 70118 4 ISBN Set: 0 444 70119 2
The other volume in this set is: The Adaptive Brain I: Cognition, Learning, Reinforcement, and Rhythm, S. Grossberg, Ed. (1987). This is volume 42 in the North-Holland series Advances in Psychology, ISBN: 0 444 70117 6.
Publishers:
ELSEVIER SCIENCE PUBLISHERS B.V. P.O. Box 1991 1000 BZ Amsterdam The Netherlands
Sole distributors for the U.S.A. and Canada: ELSEVIER SCIENCE PUBLISHING COMPANY, INC. 52 Vanderbilt Avenue New York, N.Y. 10017 U.S.A.
PRINTED IN THE NETHERLANDS
Dedicated to
Jacob Beck and George Sperling
With Admiration
EDITORIAL PREFACE
The mind and brain sciences are experiencing a period of explosive development. In addition to experimental contributions which probe the widest possible range of phenomena with stunning virtuosity, a true theoretical synthesis is taking place. The remarkable multiplicity of behaviors, of levels of behavioral and neural organization, and of experimental paradigms and methods for probing this complexity presents a formidable challenge to all serious theorists of mind. The challenge is, quite simply, to discover unity behind this diversity by characterizing a small set of theoretical principles and mechanisms capable of unifying and predicting large and diverse data bases as manifestations of fundamental processes. Another part of the challenge is to explain how mind differs from better understood physical systems, and to identify what is new in the theoretical methods that are best suited for a scientific analysis of mind. These volumes collect together recent articles which provide a unified theoretical analysis and predictions of a wide range of important psychological and neurological data. These articles illustrate the development of a true theory of mind and brain, rather than just a set of disconnected models with no predictive generality. In this theory, a small number of fundamental dynamical laws, organizational principles, and network modules help to compress a large data base. The theory accomplishes this synthesis by showing how these fundamental building blocks can be used to design specialized circuits in different neural systems and across species. Such a specialization is analogous to using a single Schrödinger equation in quantum mechanics to analyse a large number of different atoms and molecules. The articles collected herein represent a unification in yet another sense. They were all written by scientists within a single research institute, the Center for Adaptive Systems at Boston University.
The fact that a single small group of scientists can theoretically analyse such a broad range of data illustrates both the power of the theoretical methods that they employ and the crucial role of interdisciplinary thinking in achieving such a synthesis. It also argues for the benefits that can be derived from supporting more theoretical training and research programs within the traditional departments charged with an understanding of mind and brain phenomena. My colleagues and I at the Center for Adaptive Systems have repeatedly found that fundamental processes governing mind and brain can best be discovered by analysing how the behavior of individuals successfully adapts in real-time to constraints imposed by the environment. In other words, principles and laws of behavioral self-organization are rate-limiting in determining the design of neural processes, and problems of self-organization are the core issues that distinguish mind and brain studies from the more traditional sciences. An analysis of real-time behavioral adaptation requires that one identify the functional level on which an individual’s behavioral success is defined. This is not the level of individual nerve cells. Rather it is the level of neural systems. Many examples can now be given to illustrate the fact that one cannot, in principle, determine the properties which govern behavioral success from an analysis of individual cells alone. An analysis of individual cells is insufficient because behavioral properties are often emergent properties due to interactions among cells. Different types of specialized neural circuits govern different combinations of emergent behavioral properties.
On the other hand, it is equally incorrect to assume that the properties of individual cells are unimportant, as many proponents of artificial intelligence have frequently done to promote the untenable claim that human intelligence can be understood through an analysis of Von Neumann computer architectures. Carefully designed single cell properties are joined to equally carefully designed neural circuits to generate the subtle relationships among emergent behavioral properties that are characteristic of living organisms. In order to adequately define these circuits and to analyse their emergent behavioral properties, mathematical analysis and computer simulation play a central role. This is inevitable because self-organizing behavioral systems obey nonlinear laws and often contain very large numbers of interacting units. The mathematical theory that has emerged from this analysis embodies a biologically relevant artificial intelligence, as well as contributing new ideas to nonlinear dynamical systems, adaptive control theory, geometry, statistical mechanics, information theory, decision theory, and measurement theory. This mathematical work thus illustrates that new mathematical ideas are needed to describe and analyse the new principles and mechanisms which characterize behavioral self-organization. We trace the oceans of hyperbole, controversy, and rediscovery which still flood our science to the inability of some investigators to fully let go of inappropriate technological metaphors and nineteenth century mathematical concepts. Although initially attractive because of their simplicity and accessibility, these approaches have regularly shown their impotence when they are confronted by a nontrivial set of the phenomena that they have set out to explain. A unified theoretical understanding cannot be achieved without an appropriate mathematical language in our science any more than in any other science.
A scientist who comes for the first time to such a new theoretical enterprise, embedded in such a confusing sociological milieu, may initially become disoriented. The very fact that behavioral, neural, mathematical, and computer analyses seem to permeate every issue defies all the traditional departmental boundaries and intellectual prejudices that have separated investigators in the past. After this initial period of disorientation passes, however, such a scientist can begin to reap handsome intellectual rewards. New postdoctoral fellows at the Center for Adaptive Systems have, for example, arrived with a strong training in experimental psychology augmented by modest mathematical and computer coursework, yet have found themselves within a year performing advanced analyses and predictions of previously unfamiliar neural data through computer simulations of real-time neural networks. The theoretical method itself and the foundation of knowledge to which it has already led can catapult a young investigator to the forefront of research in an area which would previously have required a lifetime of study. We have found often that problems which seemed impossible without the theory became difficult but tractable with it. In summary, the articles in these volumes illustrate a theoretical approach which analyses how brain systems are designed to form an adaptive relationship with their environment. Instead of limiting our consideration to a few performance characteristics of a behaving organism, we consider the developmental and learning problems that a system as a whole must solve before accurate performance can be achieved. We do not take accurate performance for granted, but rather analyse the organizational principles and dynamical mechanisms whereby it is achieved and maintained. Such an analysis is necessary if only because an analysis of performance per se does not impose sufficiently many constraints to determine underlying control mechanisms.
The unifying power of such theoretical work is due, we believe, to the fact that principles of adaptation, such as the laws governing development and learning, are fundamental in determining the design of behavioral mechanisms. A preface precedes each article in these volumes. These commentaries link the articles together, highlight some of their major contributions, and comment upon future directions of research. The work reported within these articles has been supported by
the Air Force Office of Scientific Research, the Army Research Office, the National Science Foundation, and the Office of Naval Research. We are grateful to these agencies for making this work possible. We are also grateful to Cynthia Suchta for doing a marvelously competent job of typing and formatting the text, and to Jonathan Marshall for expertly preparing the index and proofreading the text. Beth Sanfield and Carol Yanakakis also provided valuable assistance.
Stephen Grossberg
Boston, Massachusetts
March, 1986
TABLE OF CONTENTS
CHAPTER 1: THE QUANTIZED GEOMETRY OF VISUAL SPACE: THE COHERENT COMPUTATION OF DEPTH, FORM, AND LIGHTNESS
1. Introduction: The Abundance of Visual Models
PART I
2. The Quantized Geometry of Visual Space
3. The Need for Theories Which Match the Data’s Coherence
4. Some Influences of Perceived Depth on Perceived Size
5. Some Monocular Constraints on Size Perception
6. Multiple Scales in Figure and Ground: Simultaneous Fusion and Rivalry
7. Binocular Matching, Competitive Feedback, and Monocular Self-Matching
8. Against the Keplerian View: Scale-Sensitive Fusion and Rivalry
9. Local versus Global Spatial Scales
10. Interaction of Perceived Form and Perceived Position
11. Some Influences of Perceived Depth and Form on Perceived Brightness
12. Some Influences of Perceived Brightness on Perceived Depth
13. The Binocular Mixing of Monocular Brightnesses
14. The Insufficiency of Disparity Computations
15. The Insufficiency of Fourier Models
16. The Insufficiency of Linear Feedforward Theories
17. The Filling-In Dilemma: To Have Your Edge and Fill-In Too
PART II
18. Edges and Fixations: The Ambiguity of Statistically Uniform Regions
19. Object Permanence and Multiple Spatial Scales
20. Cooperative versus Competitive Binocular Interactions
21. Reflectance Processing, Weber Law Modulation, and Adaptation Level in Feedforward Shunting Competitive Networks
22. Pattern Matching and Multidimensional Scaling Without a Metric
23. Weber Law and Shift Property Without Logarithms
24. Edge, Spatial Frequency, and Reflectance Processing by the Receptive Fields of Distance-Dependent Feedforward Networks
25. Statistical Analysis by Structural Scales: Edges With Scaling and Reflectance Properties Preserved
26. Correlation of Monocular Scaling With Binocular Fusion
27. Noise Suppression in Feedback Competitive Networks
28. Sigmoid Feedback Signals and Tuning
29. The Interdependence of Contrast Enhancement and Tuning
30. Normalization and Multistability in a Feedback Competitive Network: A Limited Capacity Short Term Memory System
31. Propagation of Normalized Disinhibitory Cues
32. Structural versus Functional Scales
33. Disinhibitory Propagation of Functional Scaling From Boundaries to Interiors
34. Quantization of Functional Scales: Hysteresis and Uncertainty
35. Phantoms
36. Functional Length and Emmert's Law
37. Functional Lightness and the Cornsweet Effect
38. The Monocular Length-Luminance Effect
39. Spreading FIRE: Pooled Binocular Edges, False Matches, Allelotropia, Binocular Brightness Summation, and Binocular Length Scaling
40. Figure-Ground Separation by Filling-In Barriers
41. The Principle of Scale Equivalence and the Curvature of Activity-Scale Correlations: Fechner's Paradox, Equidistance Tendency, and Depth Without Disparity
42. Reflectance Rivalry and Spatial Frequency Detection
43. Resonance in a Feedback Dipole Field: Binocular Development and Figure-Ground Completion
44. Binocular Rivalry
45. Concluding Remarks About Filling-In and Quantization
Appendix
References
CHAPTER 2: NEURAL DYNAMICS OF FORM PERCEPTION: BOUNDARY COMPLETION, ILLUSORY FIGURES, AND NEON COLOR SPREADING
1. Illusions as a Probe of Adaptive Visual Mechanisms
2. From Noisy Retina to Coherent Percept
3. Boundary Contour System and Feature Contour System
4. Boundary Contours and Boundary Completion
5. Feature Contours and Diffusive Filling-In
6. Macrocircuit of Processing Stages
7. Neon Color Spreading and Complementary Color Induction
8. Contrast, Assimilation, and Grouping
9. Boundary Completion: Positive Feedback Between Local Competition and Long-Range Cooperation of Oriented Boundary Contour Segments
10. Boundary Completion as a Statistical Process: Textural Grouping and Object Recognition
11. Perpendicular versus Parallel Contour Completion
12. Spatial Scales and Brightness Contrast
13. Boundary-Feature Trade-Off: Orientational Uncertainty and Perpendicular End Cutting
14. Induction of "Real" Contours Using "Illusory" Contour Mechanisms
15. Gated Dipole Fields
16. Boundary Completion: Oriented Cooperation Among Multiple Spatial Scales
17. Computer Simulations
18. Brightness Paradoxes and the Land Retinex Theory
19. Related Data and Concepts About Illusory Contours
20. Cortical Data and Predictions
21. Concluding Remarks
Appendix: Dynamics of Boundary Formation
References
CHAPTER 3: NEURAL DYNAMICS OF PERCEPTUAL GROUPING: TEXTURES, BOUNDARIES, AND EMERGENT SEGMENTATIONS
1. Introduction: Towards A Universal Set of Rules for Perceptual Grouping
2. The Role of Illusory Contours
3. Discounting the Illuminant: Color Edges and Featural Filling-In
4. Featural Filling-In Over Stabilized Scenic Edges
5. Different Rules for Boundary Contours and Feature Contours
6. Boundary-Feature Trade-off: Every Line End Is Illusory
7. Parallel Induction by Edges versus Perpendicular Induction by Line Ends
8. Boundary Completion via Cooperative-Competitive Feedback Signaling: CC Loops and the Statistics of Grouping
9. Form Perception versus Object Recognition: Invisible but Potent Boundaries
10. Analysis of the Beck Theory of Textural Segmentation: Invisible Colinear Cooperation
11. The Primacy of Slope
12. Statistical Properties of Oriented Receptive Fields: OC Filters
13. Competition Between Perpendicular Subjective Contours
14. Multiple Distance-Dependent Boundary Contour Interactions: Explaining Gestalt Rules
15. Image Contrasts and Neon Color Spreading
16. Computer Simulations of Perceptual Grouping
17. On-Line Statistical Decision Theory and Stochastic Relaxation
18. Correlations Which Cannot Be Perceived: Simple Cells, Complex Cells, and Cooperation
19. Border Locking: The Café Wall Illusion
20. Boundary Contour System Stages: Predictions About Cortical Architectures
21. Concluding Remarks: Universality of the Boundary Contour System
Appendix: Boundary Contour System Equations
References
CHAPTER 4: NEURAL DYNAMICS OF BRIGHTNESS PERCEPTION: FEATURES, BOUNDARIES, DIFFUSION, AND RESONANCE
1. Paradoxical Percepts as Probes of Adaptive Processes
2. The Boundary-Contour System and the Feature Contour System
3. Boundary Contours and Boundary Completion
4. Feature Contours and Diffusive Filling-In
5. Macrocircuit of Processing Stages
6. FIRE: Resonant Lifting of Preperceptual Data into a Form-in-Depth Percept
7. Binocular Rivalry, Stabilized Images, and the Ganzfeld
8. The Interplay of Controlled and Automatic Processes
9. Craik-O’Brien Luminance Profiles and Multiple Step Illusions
10. Smoothly Varying Luminance Contours versus Steps of Luminance Change
11. The Asymmetry Between Brightness Contrast and Darkness Contrast
12. Simulations of FIRE
13. Fechner’s Paradox
14. Binocular Brightness Averaging and Summation
15. Simulation of a Parametric Binocular Brightness Study
16. Concluding Remarks
Appendix A
Appendix B
References
CHAPTER 5: ADAPTATION AND TRANSMITTER GATING IN VERTEBRATE PHOTORECEPTORS
1. Introduction
2. Transmitters as Gates
3. Intracellular Adaptation and Overshoot
4. Monotonic Increments and Nonmonotonic Overshoots to Flashes on Variable Background
5. Miniaturized Transducers and Enzymatic Activation of Transmitter Production
6. Turn-Around of Potential Peaks at High Background Intensities
7. Double Flash Experiments
8. Antagonistic Rebound by an Intracellular Dipole: Rebound Hyperpolarization Due to Current Offset
9. Coupling of Gated Input to the Photoreceptor Potential
10. “Extra” Slow Conductance During Overshoot and Double Flash Experiments
11. Shift Property and its Relationship to Enzymatic Modulation
12. Rebound Hyperpolarization, Antagonistic Rebound, and Input Doubling
13. Transmitter Mobilization
14. Quantitative Analysis of Models
15. Comparison with the Baylor, Hodgkin, Lamb Model
16. Conclusion
References
CHAPTER 6: THE ADAPTIVE SELF-ORGANIZATION OF SERIAL ORDER IN BEHAVIOR: SPEECH, LANGUAGE, AND MOTOR CONTROL
1. Introduction: Principles of Self-organization in Models of Serial Order: Performance Models versus Self-organizing Models
2. Models of Lateral Inhibition, Temporal Order, Letter Recognition, Spreading Activation, Associative Learning, Categorical Perception, and Memory Search: Some Problem Areas
3. Associative Learning by Neural Networks: Interactions Between STM and LTM
4. LTM Unit is a Spatial Pattern: Sampling and Factorization
5. Outstar Learning: Factorizing Coherent Patterns From Chaotic Activity
6. Sensory Expectations, Motor Synergies, and Temporal Order Information
7. Ritualistic Learning of Serial Behavior: Avalanches
8. Decoupling Order and Rhythm: Nonspecific Arousal as a Velocity Command
9. Reaction Time and Performance Speed-Up
10. Hierarchical Chunking and the Learning of Serial Order
11. Self-Organization of Plans: The Goal Paradox
12. Temporal Order Information in LTM
13. Read-Out and Self-Inhibition of Ordered STM Traces
14. The Problem of STM-LTM Order Reversal
15. Serial Learning
16. Rhythm Generators and Rehearsal Waves
17. Shunting Competitive Dynamics in Pattern Processing and STM: Automatic Self-Tuning by Parallel Interactions
18. Choice, Contrast Enhancement, Limited STM Capacity, and Quenching Threshold
19. Limited Capacity Without a Buffer: Automaticity versus Competition
20. Hill Climbing and the Rich Get Richer
21. Instar Learning: Adaptive Filtering and Chunking
22. Spatial Gradients, Stimulus Generalization, and Categorical Perception
23. The Progressive Sharpening of Memory: Tuning Prewired Perceptual Categories
24. Stabilizing the Coding of Large Vocabularies: Top-Down Expectancies and STM Reset by Unexpected Events
25. Expectancy Matching and Adaptive Resonance
26. The Processing of Novel Events: Pattern Completion versus Search of Associative Memory
27. Recognition, Automaticity, Primes, and Capacity
28. Anchors, Auditory Contrast, and Selective Adaptation
29. Training of Attentional Set and Perceptual Categories
30. Circular Reactions, Babbling, and the Development of Auditory-Articulatory Space
31. Analysis-By-Synthesis and the Imitation of Novel Events
32. A Moving Picture of Continuously Interpolated Terminal Motor Maps: Coarticulation and Articulatory Undershoot
33. A Context-Sensitive STM Code for Event Sequences
34. Stable Unitization and Temporal Order Information in STM: The LTM Invariance Principle
35. Transient Memory Span, Grouping, and Intensity-Time Tradeoffs
36. Backward Effects and Effects of Rate on Recall Order
37. Seeking the Most Predictive Representation: All Letters and Words are Lists
38. Spatial Frequency Analysis of Temporal Patterns by a Masking Field: Word Length and Superiority
39. The Temporal Chunking Problem
40. The Masking Field: Joining Temporal Order to Differential Masking via an Adaptive Filter
41. The Principle of Self-Similarity and the Magic Number 7
42. Developmental Equilibration of the Adaptive Filter and its Target Masking Field
43. The Self-Similar Growth Rule and the Opposites Attract Rule
44. Automatic Parsing, Learned Superiority Effects, and Serial Position Effects During Pattern Completion
45. Gray Chips or Great Ships?
46. Sensory Recognition versus Motor Recall: Network Lesions and Amnesias
47. Four Types of Rhythm: Their Reaction Times and Arousal Sources
48. Concluding Remarks
Appendix: Dynamical Equations
References
CHAPTER 7: NEURAL DYNAMICS OF WORD RECOGNITION AND RECALL: ATTENTIONAL PRIMING, LEARNING, AND RESONANCE
1. Introduction
2. Logogens and Embedding Fields
3. Verification by Serial Search
4. Automatic Activation and Limited-Capacity Attention
5. Interactive Activation and Parallel Access
6. The View from Adaptive Resonance Theory
7. Elements of the Microtheory: Tuning, Categories, Matching, and Resonance
8. Counting Stages: Resonant Equilibration as Verification and Attention
9. Attentional Gain Control versus Attentional Priming: The 2/3 Rule
10. A Macrocircuit for the Self-organization of Recognition and Recall
11. The Schvaneveldt-McDonald Lexical Decision Experiments: Template Feedback and List-Item Error Trade-off
12. Word Frequency Effects in Recognition and Recall
13. Analysis of the Underwood and Freund Theory
14. Analysis of the Mandler Theory
15. The Role of Intra-List Restructuring and Contextual Associations
16. An Explanation of Recognition and Recall Differences
17. Concluding Remarks
References
CHAPTER 8: NEURAL DYNAMICS OF SPEECH AND LANGUAGE CODING: DEVELOPMENTAL PROGRAMS, PERCEPTUAL GROUPING, AND COMPETITION FOR SHORT TERM MEMORY
1. Introduction: Context-Sensitivity of Self-organizing Speech and Language Units
2. Developmental Rules Imply Cognitive Rules as Emergent Properties of Neural Network Interactions
3. A Macrocircuit for the Self-organization of Recognition and Recall
4. Masking Fields
5. The Temporal Chunking Problem: Seeking the Most Predictive Representation
6. The Word Length Effect
7. All Letters Are Sublists: Which Computational Units Can Self-organize?
8. Self-organization of Auditory-Motor Features, Items, and Synergies
9. Temporal Order Information Across Item Representations: The Spatial Recoding of Temporal Order
10. The LTM Invariance Principle
11. The Emergence of Complex Speech and Language Units
12. List Chunks, Recognition, and Recall
13. The Design of a Masking Field: Spatial Frequency Analysis of Item-Order Information
14. Development of a Masking Field: Random Growth and Self-similar Growth
15. Activity-Contingent Self-similar Cell Growth
16. Sensitivity to Multiple Scales and Intrascale Variations
17. Hypothesis Formation, Anticipation, Evidence, and Prediction
18. Computer Simulations
19. Shunting On-Center Off-Surround Networks
20. Mass Action Interaction Rules
21. Self-similar Growth Within List Nodes
22. Conservation of Synaptic Sites
23. Random Growth from Item Nodes to List Nodes
24. Self-similar Competitive Growth Between List Nodes
25. Contrast Enhancement by Sigmoid Signal Functions
26. Concluding Remarks: Grouping and Recognition Without Algorithms or Search
Appendix
References
AUTHOR INDEX
SUBJECT INDEX
Chapter 1
THE QUANTIZED GEOMETRY OF VISUAL SPACE: THE COHERENT COMPUTATION OF DEPTH, FORM, AND LIGHTNESS
Preface
The article which forms this Chapter introduces an ambitious research program aimed at creating a unified theory of preattentive visual perception; that is, a unified theory of 3-dimensional form, color, and brightness perception, including depth, texture, surface, and motion perception. The theory has since been developing very rapidly and has led to many new ideas and predictive successes. Four of the major published articles of the theory are contained in this volume (Chapters 1-4). In my prefaces, I highlight some of the key issues and directions for future research. As in all the articles in these volumes, the same small set of dynamical laws and mechanisms is used. What sets the different applications apart is not their local mechanisms. What sets them apart are the specialized circuits, built up from a common set of mechanisms, which have evolved to adaptively solve particular classes of environmental problems. The present theory became possible when sufficiently many of these mechanisms and circuits were discovered in other applications, notably during the development of adaptive resonance theory (Volume I and Chapters 6-8), to notice their applicability to visual perception. The theory in this Chapter was built up from two types of general purpose cooperative-competitive networks. The simpler type of network is an on-center off-surround network with feedforward pathways whose cells obey mass action, or shunting, laws. I showed that such a network generates a constellation of emergent properties that is of fundamental importance in visual perception, no less than in many other applications. A single such network is capable of: reflectance processing, conservation or normalization of total activation (limited capacity), Weber law modulation, adaptation level processing, noise suppression, shift property, ratio-sensitive edge processing, power law invariance, spatial frequency sensitivity,
energetic amplification of matched input patterns, and energetic suppression of mismatched input patterns. The second type of general purpose network is an on-center off-surround network with feedback pathways whose cells obey mass action, or shunting, laws. Such a network is capable of contrast enhancement, short term memory storage, normalization of total activation, multistability, hysteresis, noise suppression, and propagation of reflectance-sensitive and spatial frequency-sensitive standing waves. With these mechanisms and their constellations of emergent properties as tools, I was able to address variants of the following basic question: How are ambiguous local visual cues bound together into unambiguous global context-sensitive percepts? To this end, Part I of the article reviews data concerning context-sensitive interactions between properties of depth, brightness, color, and form, as well as the inability of various models to explain these interactions. Some of the issues raised by these interactions are: Why are binocular rivalry and binocular fusion two alternative visual modes? How can fusion occur with respect to one spatial scale while rivalry simultaneously occurs with respect to another spatial scale at the same region of perceptual space? How does rivalry inhibit the visibility of percepts that would be visible when viewed monocularly? How do binocular matches at a sparse number of scenic locations impart unambiguous depth to large binocularly ambiguous regions? Moreover, how do the perceptual qualities, such as color and brightness, of these ambiguous regions appear to inherit these depth values? How do we perceive flat surfaces as flat despite the fact that the binocular
fixation point is a zero disparity point, and all other unambiguous binocular matches have increasing disparity as a function of their eccentricity from the fixation point? Such concerns lead to the realization that either binocularly fused edges or monocularly viewed edges, but not binocularly mismatched edges, can trigger a filling-in process which is capable of rapidly lifting perceptual qualities, such as brightness and color, into a multiple-scale representation of form-and-color-in-depth. In order to understand how edge matches trigger filling-in, yet edge mismatches suppress filling-in, I introduced the concepts of filling-in generator (matched edge) and filling-in barrier (mismatched edge). I showed how to design a cooperative-competitive feedback network which extracts edges from pairs of monocular input patterns, binocularly matches the edges, and feeds the results back toward the monocular patterns. Matched edges then automatically lift a binocularly fused representation of the monocular patterns up to the binocular perceptive field, and fill-in this binocular representation until a filling-in barrier (mismatched edge) is reached. Mismatched edges do not lift their monocular input patterns into a binocular representation. Thus the binocular filling-in process is triggered by monocularly viewed edges or binocularly matched edges, but not by binocularly mismatched edges. I call such a process a filling-in resonant exchange, or FIRE. The Weber law properties of the binocular FIRE process enable the binocular activation levels to mimic binocular brightness data. Computer simulations of the FIRE process quantitatively demonstrate this property (Chapter 4). The FIRE process also clarifies many other visual data which are summarized in the Chapter. In particular, it was shown how a gated dipole field could be embedded within the FIRE process to generate some properties of binocular rivalry.
The FIRE theory is based upon a single edge-driven filling-in process. As my colleagues Michael Cohen, Ennio Mingolla, and I began to quantitatively simulate more and more brightness and form data, it gradually became clear that a different sort of filling-in, called diffusive filling-in, preprocesses monocular input patterns before they activate the FIRE process. This insight gradually led to the realization that a pair of distinct edge-driven systems exist, one devoted to boundary formation and segmentation and the other devoted to color and brightness detection and filling-in. Chapters 2 and 3 describe the rules of these systems and illustrate how their interactions can explain monocular form, color, and brightness percepts.
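As an illustration only, and not taken from the text, the reflectance processing and normalization attributed in this preface to a feedforward shunting on-center off-surround network can be sketched numerically. The sketch assumes the commonly used shunting form dx_i/dt = -A x_i + (B - x_i) I_i - x_i Σ_{k≠i} I_k, which this preface does not itself state; the names `decay` (A), `ceiling` (B), and both function names are hypothetical choices made for the example.

```python
import numpy as np


def shunting_equilibrium(inputs, decay=1.0, ceiling=1.0):
    """Closed-form steady state of the assumed feedforward shunting law.

    Setting dx_i/dt = 0 in
        dx_i/dt = -A*x_i + (B - x_i)*I_i - x_i * sum_{k != i} I_k
    gives x_i = B * I_i / (A + sum_k I_k).
    """
    inputs = np.asarray(inputs, dtype=float)
    return ceiling * inputs / (decay + inputs.sum())


def simulate(inputs, decay=1.0, ceiling=1.0, dt=0.001, steps=20000):
    """Forward-Euler integration of the same law, starting from zero activity."""
    inputs = np.asarray(inputs, dtype=float)
    x = np.zeros_like(inputs)
    total = inputs.sum()
    for _ in range(steps):
        off_surround = total - inputs  # summed input from all other cells
        dx = -decay * x + (ceiling - x) * inputs - x * off_surround
        x = x + dt * dx
    return x


if __name__ == "__main__":
    pattern = np.array([1.0, 2.0, 3.0])  # relative input sizes ("reflectances")
    for scale in (1.0, 10.0):            # overall input intensity
        x = shunting_equilibrium(scale * pattern)
        # Relative activities track the input ratios at every intensity,
        # while total activity stays below the ceiling B.
        print(scale, x / x.sum(), x.sum())
```

Under these assumptions the relative activities x_i / Σ x_k equal the input ratios I_i / Σ I_k regardless of overall intensity, and the total activity B·I/(A + I) never exceeds B, which is one way to read the "reflectance processing" and "normalization of total activation" properties listed above.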
The Behavioral and Brain Sciences 6, 625-657 (1983)
© 1983 Cambridge University Press
Reprinted by permission of the publisher
THE QUANTIZED GEOMETRY OF VISUAL SPACE: THE COHERENT COMPUTATION OF DEPTH, FORM, AND LIGHTNESS
Stephen Grossbergt
Abstract
A theory is presented of how global visual interactions between depth, length, lightness, and form percepts can occur. The theory suggests how quantized activity patterns which reflect these visual properties can coherently fill-in, or complete, visually ambiguous regions starting with visually informative data features. Phenomena such as the Cornsweet and Craik-O’Brien effects, phantoms and subjective contours, binocular brightness summation, the equidistance tendency, Emmert’s law, allelotropia, multiple spatial frequency scaling and edge detection, figure-ground completion, coexistence of depth and binocular rivalry, reflectance rivalry, Fechner’s paradox, decrease of threshold contrast with increased number of cycles in a grating pattern, hysteresis, adaptation level tuning, Weber law modulation, shift of sensitivity with background luminance, and the finite capacity of visual short term memory are discussed in terms of a small set of concepts and mechanisms. Limitations of alternative visual theories which depend upon Fourier analysis, Laplacians, zero-crossings, and cooperative depth planes are described. Relationships between monocular and binocular processing of the same visual patterns are noted, and a shift in emphasis from edge and disparity computations toward the characterization of resonant activity-scaling correlations across multiple spatial scales is recommended. This recommendation follows from the theory’s distinction between the concept of a structural spatial scale, which is determined by local receptive field properties, and a functional spatial scale, which is defined by the interaction between global properties of a visual scene and the network as a whole. Functional spatial scales, but not structural spatial scales, embody the quantization of network activity that reflects a scene’s global visual representation.
A functional scale is generated by a filling-in resonant exchange, or FIRE, which can be ignited by an exchange of feedback signals among the binocular cells where monocular patterns are binocularly matched.
Key Words: binocular vision; brightness perception; figure-ground; feature extraction; form perception; neural network; nonlinear resonance; receptive field; short-term memory; spatial scales; visual completion.
_____-_--
† Supported in part by the Air Force Office of Scientific Research (AFOSR 82-0148), the National Science Foundation (NSF IST-80-00257), and the Office of Naval Research (ONR N00014-83-K0337).
Chapter 1
The objects of perception and the space in which they seem to lie are not abstracted by a rigid metric but a far looser one than any philosopher ever proposed or any psychologist dreamed. -Jerome Lettvin (1981)
1. Introduction: The Abundance of Visual Models
Few areas of science can boast the wealth of interesting and paradoxical phenomena readily accessible to introspection that visual perception can. The sheer variety of effects helps to explain why so many different types of theories have arisen to carve up this data landscape. Fourier analysis (Cornsweet, 1970; Graham, 1981; Robson, 1975), projective geometry (Beck, 1972; Johannson, 1978; Kaufman, 1974), Riemannian geometry (Blank, 1978; Luneberg, 1947; Watson, 1978), special relativity (Caelli, Hoffman, and Lindman, 1978), vector analysis (Johannson, 1978), analytic function theory (Schwartz, 1980), potential theory (Sperling, 1970), and cooperative and competitive networks (Amari and Arbib, 1977; Dev, 1975; Ellias and Grossberg, 1975; Grossberg, 1970a, 1973, 1978e, 1981; Sperling, 1970; Sperling and Sondhi, 1968) are just some of the formalisms which have been used to interpret and explain particular visual effects. Some of the most distinguished visual researchers believe that this diversity of formalisms is inherent in the nature of psychological phenomena. Sperling (1981, p.282) has, for example, recently written:
In fact, as many kinds of mathematics seem to be applied to perception as there are problems in perception. I believe this multiplicity of theories without a reduction to a common core is inherent in the nature of psychology ... and we should not expect the situation to change. The moral, alas, is that we need many different models to deal with the many different aspects of perception.
The opinion Sperling offers is worthy of the most serious deliberation, since it predicts the type of mature science which psychology can hope to become, and thereby constrains the type of theorizing which psychologists will try to do. Is Sperling right? Or do there exist concepts and properties, heretofore not explicitly incorporated into the mainstream visual theories, which can better unify the many visual models into an integrated visual theory? Part I of this article reviews various visual data as well as internal paradoxes and inherent limitations of some recent theories that have attempted to explain these data. Part II presents a possible approach to overcoming these paradoxes and limitations and to explaining the data in a unified fashion. Numerical simulations that support the qualitative arguments and mathematical properties described in Part II are found in Cohen and Grossberg (1983a). Parts I and II are self-contained and can be read in either order.
PART I
2. The Quantized Geometry of Visual Space
There is an important sense in which Sperling's assertion is surely correct, but in this sense it is also true of other sciences such as physics. Different formalisms can probe different levels of the same underlying physical reality without excluding the possibility that one formalism is more general, or physically deeper, than another. In physics, such theoretical differences can be traced to physical assumptions which approximate certain processes in order to clarify other processes. I will argue that several approaches to visual perception make approximations which do not accurately represent the physical processes which they have set out to explain. For this reason, such theories have predictive limitations which do not permit them to account, even to a first approximation, for major properties of the data. In other words, the mathematical formalism of these theories has not incorporated fundamental physical intuitions into their computational structure. Once the intuitions are translated into a suitable formalism, the theoretical diversity in visual science will, I claim, gradually become qualitatively more like that known in physics. The comparison with physics is not an idle one. Certain of the intuitions which need to be formalized at the foundations of visual theory are well known to us all. They have not been acted upon because, despite their simplicity, they lead to conceptually radical conclusions that force a break with traditional notions of geometry. Lines and edges can no longer be thought of as a series of points; planes can no longer be built up from local surface elements or from sets of lines or points; and so on. All local entities evaporate as we build up notions of functional perceptual units which can naturally deal with the global context-dependent nature of visual percepts. The formalism in which this is achieved is a quantized dynamic geometry, and the nature of the quantization helps to explain why so many visual percepts seem to occur in a curved visual space. When a physicist discusses quantization of curved space, he usually means joining quantum mechanics to general relativity. This goal has not yet been achieved in physics. To admit that even the simplest visual phenomena suggest such a formal step clarifies both the fragmentation of visual science into physically inadequate formalisms, and the radical nature of the conceptual leap that is needed to remedy this situation.
3. The Need for Theories Which Match the Data’s Coherence
As background for my theoretical treatment, I will review various paradoxical data concerning interactions between the perceived depth, lightness, and form of objects in a scene. These paradoxes should not, I believe, be viewed as isolated and unimportant anomalies, but rather as informative instances of how the visual system completes a scene’s global representation in response to locally ambiguous visual data. These data serve to remind us of the interdependence and context-sensitivity of visual properties; in other words, of their coherence. With these reminders fresh in our minds, I will argue in Part II that by probing important visual design principles on a deep mathematical level, one can discover, as automatic mathematical consequences, the way many visual properties are coherently caused as manifestations of these design principles. This approach to theory construction is not in the mainstream of psychological thinking today. Instead, one often finds models capable of computing some single visual property, such as edges or cross-correlations. Even with a different model for each property, this approach does not suggest how related visual properties work together to generate a global visual representation. For example, the present penchant for modeling lateral inhibition by linear feedforward operators like a Laplacian or a Fourier transform to compute edges or cross-correlations (Marr and Hildreth, 1980; Robson, 1975) pays the price of omitting related nonlinear properties like reflectance processing, Weber law modulation, figure-ground filling-in, and hysteresis. To the argument that one must first understand one property at a time, I make this reply: The feedforward linear theories contain errors even in the analysis of the concepts they set out to explain. Internal problems of these theories prevent them from understanding the other phenomena that cohere in the data.
This lack of coherence, let alone correctness, will cause a heavy price to be paid in the long run, both scientifically and technologically. Unless the relationships among visual data properties are correctly represented in a distributed fashion within the system, plausible (and economic) ways to map these properties into other subsystems, whether linguistic, motor, or motivational, will be much harder to understand. Long-range progress, whether in theoretical visual science per se or in its relations to other scientific and technological disciplines, requires that the mathematical formalisms in which visual concepts are articulated be scrupulously criticized.
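The remark that a linear feedforward operator omits reflectance processing can be illustrated with a small numerical sketch. The code below is only an illustration under stated assumptions: the three-point discrete Laplacian, the reflectance values, and the two illumination levels are hypothetical choices, not quantities taken from the cited models. It shows that such an operator localizes the same edge under both illuminants, while the size of its response scales with the illuminant instead of reflecting the fixed reflectance ratio.

```python
import numpy as np

def laplacian_response(luminance):
    """Discrete 1-D Laplacian: a linear, feedforward center-surround operator."""
    kernel = np.array([1.0, -2.0, 1.0])  # hypothetical minimal kernel
    return np.convolve(luminance, kernel, mode="valid")

def zero_crossings(response):
    """Indices i where the response changes sign between i and i + 1."""
    signs = np.sign(response)
    return [i for i in range(len(signs) - 1) if signs[i] * signs[i + 1] < 0]

# Hypothetical scene: one reflectance step viewed under two illumination levels.
reflectance = np.array([0.2] * 5 + [0.8] * 5)
dim = 10.0 * reflectance
bright = 20.0 * reflectance

r_dim = laplacian_response(dim)
r_bright = laplacian_response(bright)

# The edge location is the same, but the response magnitude doubles with the
# illuminant, so the operator output confounds illumination with reflectance.
print(zero_crossings(r_dim), zero_crossings(r_bright))
print(np.abs(r_dim).max(), np.abs(r_bright).max())
```

Because the operator is linear, any property that depends on ratios of luminances, such as the reflectance ratio of the two surfaces, has to be supplied by some additional nonlinear stage.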
4. Some Influences of Perceived Depth on Perceived Size
Interactions between an object’s perceived depth, size, and lightness have been intensively studied for many years. The excellent texts by Cornsweet (1970) and by Kaufman (1974) review many of the basic phenomena. The classical experiments of Holway and Boring (1941) show that observers can estimate the actual sizes of objects at different distances even if all the objects subtend the same visual angle on the observers’ retinas. Binocular cues contribute to the invariant percept of size. For example, Emmert (1881) showed that monocular cues may be insufficient to estimate an object’s length. He noted, among other properties, that a monocular afterimage seems to be located on any surface which the subject binocularly fixates while the afterimage is active. Moreover, the perceived size of the afterimage increases as the perceived distance of the surface increases. This effect is called Emmert’s law. Although the use of monocular afterimages to infer properties of normal viewing is fraught with difficulties, other paradigms have also suggested an effect of perceived depth on perceived size. For example, Gogel (1956, 1965, 1970) has reported that two objects viewed under reduction conditions (one eye looks through a small aperture in dim light) will be more likely to be judged as equidistant from the observer as they are brought closer together in the frontal plane. In a related experiment, one object is monocularly viewed through a mirror arrangement whereas all other objects in the scene are binocularly viewed. The monocularly viewed object then seems to lie at the same distance as the edge that, among all the binocularly viewed objects, is retinally most contiguous to it. Gogel interpreted these effects as examples of an equidistance tendency in depth perception. The equidistance tendency also holds if a monocular afterimage occupies a retinal position near to that excited by a binocularly viewed object.
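The geometry behind these size-distance effects can be sketched numerically. The example below assumes only elementary trigonometry: a frontal object of linear size S at distance D subtends a visual angle of 2 arctan(S / 2D). Reading Emmert's law as "perceived size follows this relation with perceived distance substituted for D" is an interpretive assumption, and the function names and the 2 degree afterimage are hypothetical.

```python
import math

def linear_size(visual_angle_deg, distance):
    """Linear size of a frontal object that subtends the given visual angle."""
    return 2.0 * distance * math.tan(math.radians(visual_angle_deg) / 2.0)

def visual_angle_deg(size, distance):
    """Visual angle (in degrees) subtended by a frontal object of the given size."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))

# A hypothetical afterimage of fixed retinal extent (2 degrees) "projected"
# onto surfaces at increasing distances: its implied size grows with distance.
for distance in (1.0, 2.0, 4.0):
    print(distance, round(linear_size(2.0, distance), 4))
```

Under this reading, a fixed retinal extent is compatible with arbitrarily many sizes, so some estimate of distance has to enter the size percept, which is the point the afterimage demonstrations make.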
One way to interpret these results is to assert that the perceived distance of the binocular object influences the perceived distance of the adjacent afterimage by equidistance tendency, and thereupon influences the perceived size of the afterimage by Emmert’s law. Results such as these suggest that depth cues can influence size estimates. They also suggest that this influence can propagate between object representations whose cues excite disparate retinal points and that the patterning of all cues in the visual context of an object helps to determine its perceived length. The classical geometric notion that length can be measured by a ruler, or can be conceptualized in terms of any locally defined computation, thereby falls into jeopardy.
5. Some Monocular Constraints on Size Perception
Size estimates can also be modified by monocular cues, as in the corridor illusion (Richards and Miller, 1971; see Figure 1a). In this illusion, two cylinders of equal size in a picture are perceived to be of different sizes because they lie in distinct positions within a rectangular grid whose spatial scale diminishes toward a fixation point on the horizon. An analogous effect occurs in the Ponzo illusion shown on the right, wherein two horizontal rods of equal pictorial length are drawn superimposed over an inverted V (Kaufman, 1974; see Figure 1b). The upper rod appears longer than the lower rod. The perception of these particular figures may be influenced by learned depth perspective cues (Gregory, 1966), although this hypothesis does not explain how perspective cues alter length percepts. There exist many other figures, however, in which a perspective effect on size scaling is harder to rationalize (Day, 1972). Several authors have therefore modeled these effects in terms of intrinsic scaling properties of the visual metric (Dodwell, 1975; Eijkman, Jongsma, and Vincent, 1981; Restle, 1971; Watson, 1978). A more dramatic version of scaling is evident when subjective contours complete the boundary of an incompletely represented figure. Then objects of equal pictorial size that lie inside and outside the completed figure may appear to be of different size (Coren, 1972). The very existence of subjective contours raises the issue of how incomplete data about form can select internal representations which can span or fill-in the incomplete regions of the figure. How can we characterize those features or spatial scales in the incomplete figure which play an informative role in the completion process versus those features or scales which are irrelevant? Attneave (1954) has shown, for example, that when a drawing of a cat is replaced by a drawing in which the points of maximum curvature in the original are joined by straight lines, then the new drawing still looks like a cat (see Figure 2). Why are the points of maximum curvature such good indicators of the entire form? Is there a natural reason why certain spatial scales in the figure might have greater weight than other scales? Attneave’s cat raises the question: Why does interpolation between points of maximum curvature with lines of zero curvature produce a good facsimile of the original picture? Different spatial scales somehow need to interact in our original percept for this to happen. To understand this issue, we need a correct definition of spatial scale. Such a definition should distinguish between local scaling effects, such as those which can be understood in terms of a neuron’s receptive field (Robson, 1975), and global scaling effects, such as those which control the filling-in of subjective contours or of phantom images across a movie screen, which subtends a visual angle much larger than that spanned by any neuron’s receptive field (Smith and Over, 1979; Tynan and Sekuler, 1975; von Grunau, 1979; Weisstein, Maguire, and Berbaum, 1976).
Figure 1. (a) The corridor illusion. (b) The Ponzo illusion. (After Kaufman 1974. From Sight and Mind: An Introduction to Visual Perception. Copyright ©1974 by Oxford University Press, Inc. Reprinted by permission.)
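What "points of maximum curvature" means computationally can be sketched for a sampled closed contour. The example below is a minimal illustration, not a reconstruction of Attneave's procedure: it uses the standard curvature formula for a parametric plane curve, estimates derivatives by central differences with wraparound, and takes a hypothetical ellipse as the contour because its curvature maxima are known to lie at the two ends of its long axis.

```python
import numpy as np

def closed_contour_curvature(x, y):
    """Unsigned curvature of a closed contour sampled at equal parameter steps.

    Central differences with wraparound; the curvature formula does not depend
    on the parameter step, so a unit step per sample is used.
    """
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    return np.abs(dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5

def curvature_maxima(kappa):
    """Indices of strict local maxima of curvature around the closed contour."""
    previous = np.roll(kappa, 1)
    following = np.roll(kappa, -1)
    return np.flatnonzero((kappa > previous) & (kappa > following))

# Hypothetical contour: an ellipse with semi-axes 2 and 1, sampled every degree.
t = 2.0 * np.pi * np.arange(360) / 360
x, y = 2.0 * np.cos(t), np.sin(t)

kappa = closed_contour_curvature(x, y)
vertices = curvature_maxima(kappa)
print(vertices, x[vertices], y[vertices])
```

Note that the computation is purely local: each curvature value depends on a contour point and its two neighbors. Why a handful of such local extrema suffices to evoke the whole form is exactly the question about interacting spatial scales that the demonstration raises.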
Figure 2. Attneave's cat: Connecting points of maximum curvature with straight lines yields a recognizable caricature of a cat. (After Attneave 1954.)
6. Multiple Scales in Figure and Ground: Simultaneous Fusion and Rivalry
That interactions between several spatial scales are needed for form perception is also illustrated by the following type of demonstration (Beck, 1972). Represent a letter E by a series of nonintersecting straight lines of varying oblique and horizontal orientations drawn within an imaginary E contour and surrounded by a background of regular vertical lines. The E is not perceived because of the lines within the contour, since the several orientations of these interior lines do not group into an E-like shape. Somehow the E is synthesized as the complement of the regular background, or, more precisely, by the statistical differences between the figure and the ground. These statistical regularities define a spatial scale, broader than the scale of the individual lines, on which the E can be perceived. In a similar vein, construct a stereogram out of two pictures as follows (Kaufman, 1974; see Figure 3). The left picture is constructed from 45°-oblique dark parallel lines bounded by an imaginary square, which is surrounded by 135°-oblique lighter parallel lines. The right picture is constructed from 135°-oblique dark parallel lines bounded by an imaginary square whose position in the picture is shifted relative to the square in the left picture. This imaginary square is surrounded by 45°-oblique lighter parallel lines. When these pictures are viewed through a stereoscope, the dark oblique lines within the square are rivalrous. Nonetheless the square as a whole is seen in depth. How does this stereogram induce rivalry on the level of the narrowly tuned scales that interact preferentially with the lines, yet simultaneously generate a coherent depth impression on the broader spatial scales that interact preferentially with the squares? Kulikowski (1978) has also studied this phenomenon by constructing two pairs of pictures which differ in their spatial frequencies (see Figure 4). Each picture is bounded by the same frame, as well as by a pair of short vertical reference lines attached to the outside of each frame at the same spatial locations.
In one pair of pictures, spatially blurred black and white vertical bars of a fixed spatial frequency are 180° out of phase. In the other pair of pictures, sharp black and white vertical bars of the same spatial extent are also 180° out of phase. The latter pair of pictures contains high spatial frequency components (edges) as well as low spatial frequency components. During binocular viewing, subjects can fuse the two spatially blurred pictures and see them in depth with respect to the fused images of the two frames. By contrast, subjects experience binocular rivalry when they view the two pictures of sharply etched bars. Yet they still experience the rivalrous patterns in depth. This demonstration suggests that the low spatial frequencies in the bar patterns can be fused to yield a depth impression even while the higher spatial frequency components in the bars elicit an alternating rivalrous perception of the monocular patterns. The demonstrations of Kaufman (1974) and Kulikowski (1978) raise many interesting questions. Perhaps the most pressing one is: Why are fusion and rivalry alternative binocular perceptual modes? Why are coexisting unfused monocular images so easily supplanted by rivalrous monocular images? How does fusion at one spatial scale coexist with rivalry at a different spatial scale that represents the same region of visual space?
Figure 3. The Kaufman stereogram induces an impression of depth even though the darker line patterns are rivalrous. (After Kaufman 1974. From Sight and Mind: An Introduction to Visual Perception. Copyright ©1974 by Oxford University Press, Inc. Reprinted by permission.)
Figure 4. Demonstration of depth perception with and without fusion. (a) Sinusoidal gratings in antiphase can be fused to yield a depth impression. (b) The square wave gratings yield a depth impression even when their sharp edges become double. (c) A similar dichotomy is perceived when single sinusoidal or bars are viewed. (After Kulikowski 1978. Reprinted by permission from Nature, volume 275, pp.126-127. Copyright ©Macmillan Journals Limited.)
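The statement that the sharp bars contain high spatial frequency components in addition to the low ones can be checked with a short Fourier sketch. This is an illustration under assumed stimulus parameters (1024 samples, 8 cycles, unit contrast), none of which come from Kulikowski's displays: a sinusoidal grating has energy only at its fundamental, whereas a square wave grating of the same period adds odd harmonics with amplitudes close to 4/(pi k), and a 180° phase shift of the sinusoid is the same as reversing its contrast.

```python
import numpy as np

N_SAMPLES = 1024  # hypothetical sampling of one picture row
CYCLES = 8        # hypothetical number of bar cycles across the row

def gratings(n_samples=N_SAMPLES, cycles=CYCLES):
    """Return a sinusoidal grating and a square wave grating of equal period."""
    x = (np.arange(n_samples) + 0.5) / n_samples  # half-sample offset avoids zeros
    sine = np.sin(2.0 * np.pi * cycles * x)
    square = np.where(sine > 0.0, 1.0, -1.0)
    return sine, square

def amplitude_spectrum(signal):
    """One-sided amplitude spectrum, scaled so a unit sinusoid has amplitude 1."""
    return 2.0 * np.abs(np.fft.rfft(signal)) / len(signal)

sine, square = gratings()
spec_sine = amplitude_spectrum(sine)
spec_square = amplitude_spectrum(square)

# Antiphase pair: shifting by half a period equals contrast reversal.
half_period = N_SAMPLES // (2 * CYCLES)
antiphase_sine = np.roll(sine, half_period)

for harmonic in (1, 2, 3, 5):
    k = harmonic * CYCLES
    print(harmonic, round(spec_sine[k], 4), round(spec_square[k], 4))
```

The two stimulus pairs therefore share their low-frequency content and differ only in the higher harmonics that define the edges, which is the separation of scales that the fusion and rivalry percepts appear to respect.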
7. Binocular Matching, Competitive Feedback, and Monocular Self-Matching
These facts suggest some conclusions that will be helpful in organizing my data review and will be derived on a different theoretical basis in Part II. I will indicate how rivalry suggests the existence of binocular cells that can be activated by a single monocular input and that mutually interact in a competitive feedback network. First I will indicate why these binocular cells can be monocularly activated. The binocular cells in question are the spatial loci where monocular data from the two eyes interact to generate fusion or rivalry as the outcome. To show why at least some of these cells can be monocularly activated, I will consider implications of the following mutually exclusive possibilities: either the outcome of binocular matching feeds back toward the monocular cells that generated the signals to the binocular cells, or it does not. Suppose it does not. Then the activities of monocular cells cannot subserve perception; rather, perception is associated with activities of binocular cells or of cells more central than the binocular cells. This is because both sets of monocular cells would remain active during a rivalry percept, since the binocular interaction leading to the rivalry percept does not, by hypothesis, feed back to alter the activities of the monocular cells. Now we confront the conclusion that monocular cells do not subserve perception with the fact that the visual world can be vividly seen through a single eye. It follows that some of the binocular cells which subserve perception can be activated by inputs from a single eye. Having entertained the hypothesis that the outcome of binocular matching does not feed back toward monocular cells, let us now consider the opposite hypothesis. In this case, too, I will show that a single monocular representation must be able to activate certain binocular cells. To demonstrate this fact, I will again argue by contradiction. Suppose it does not. In other words, suppose that the outcome of binocular matching does feed back toward monocular cells but a single monocular input cannot activate binocular cells.
Because the visual world can be seen through a single eye, it follows that the activities of monocular cells subserve perception in this case. Consequently, during a binocular rivalry percept, the binocular-to-monocular feedback must quickly inhibit one of the monocular representations. The signals which this monocular representation was sending to the binocular cells are thereupon also inhibited. The binocular cells then receive signals only from the other monocular representation. The hypothesis that binocular cells cannot fire in response to signals from only one monocular representation implies that the binocular cells shut off, along with all of their output signals. The suppressed monocular cells are then released from inhibition and are excited again by their monocular inputs. The cycle can now repeat itself, leading to the percept of a very fast flicker of one monocular view superimposed upon the steady percept of the other monocular view. This phenomenon does not occur during normal binocular vision. Consequently, the hypothesis that a single monocular input cannot activate binocular cells must be erroneous. Whether or not the results of binocular matching feed back toward monocular cells, certain binocular cells can be activated by a single monocular representation. An additional conclusion can be drawn in the case wherein the results of binocular matching can feed back toward monocular cells. Here a single monocular source can activate binocular cells, which can thereupon send signals toward the monocular source. The monocular representation can thereby self-match at the monocular source using the binocular feedback as a matching signal. This fact implies that the monocular source cells are themselves binocular cells, because a monocular input can activate binocular cells which then send feedback signals to the monocular source cells of the other eye.
In this way the monocular source cells can be activated by both eyes, albeit less symmetrically than the binocular cells at which the primary binocular matching event takes place. This conclusion can be summarized as follows: The binocular cells at which binocular matching takes place are flanked by binocular cells that satisfy the following properties: (a) they are fed by monocular signals; (b) they excite the binocular matching cells; (c) they can be excited or inhibited due to feedback from the binocular matching cells, depending upon whether fusion or rivalry occur. It remains only to consider the possibility that the results of binocular matching do not feed back toward the monocular cells. The following argument indicates why this cannot happen. A purely feedforward interaction from monocular toward binocular cells cannot generate the main properties of rivalry, namely a sustained monocular percept followed by rapid and complete suppression of this percept when it is supplanted by the other monocular percept. This is because the very activity of the perceived representation must be the cause of its habituation and loss of competitive advantage relative to the suppressed representation. Consequently, the habituating signals from the perceived representation that inhibit the suppressed representation reach the latter at a stage at, or prior to, that representation’s locus for generating signals to the perceived representation that are capable of habituating. Such an arrangement allows the signals of the perceived representation to habituate but spares the suppressed representation from habituation. By symmetry, the two representations reciprocally send signals to each other that are received at, or at a stage prior to, their own signaling cells. This arrangement of signaling pathways defines a feedback network. One can now refine this conclusion by going through arguments like those above to conclude that (a) the feedback signals are received at binocular cells rather than at monocular cells, and (b) the feedback signals are not all inhibitory signals or else binocular fusion could not occur. Thus a competitive balance between excitatory and inhibitory feedback signals among binocular cells capable of monocular activation needs to be considered.
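The qualitative claim that reciprocal inhibition plus activity-dependent habituation yields alternating dominance can be illustrated with a deliberately small simulation. The sketch below is not the network of the theory: it assumes two rectified units with mutual inhibition, replaces habituation by a slow subtractive adaptation variable driven by each unit's own activity, and integrates with a simple Euler step. All parameter values and names are hypothetical, chosen so that mutual inhibition is strong enough for one unit to suppress the other while adaptation is strong enough to end each dominance phase.

```python
def simulate_rivalry(steps=20000, dt=0.05, drive=1.0, inhibition=2.0,
                     adaptation_gain=3.0, tau_adapt=100.0):
    """Euler simulation of two mutually inhibiting units with slow adaptation.

    Returns, for every time step, the index (0 or 1) of the more active unit.
    """
    u = [0.1, 0.0]  # fast activities; the small head start breaks the symmetry
    a = [0.0, 0.0]  # slow adaptation, a stand-in for habituation
    dominant = []
    for _ in range(steps):
        # Each unit is excited by its own input, inhibited by the other unit,
        # and weakened by its own accumulated adaptation.
        in0 = max(0.0, drive - inhibition * u[1] - adaptation_gain * a[0])
        in1 = max(0.0, drive - inhibition * u[0] - adaptation_gain * a[1])
        du0, du1 = -u[0] + in0, -u[1] + in1
        da0, da1 = (u[0] - a[0]) / tau_adapt, (u[1] - a[1]) / tau_adapt
        u = [u[0] + dt * du0, u[1] + dt * du1]
        a = [a[0] + dt * da0, a[1] + dt * da1]
        dominant.append(0 if u[0] >= u[1] else 1)
    return dominant

def count_switches(dominant):
    """Number of times the dominant unit changes."""
    return sum(1 for p, q in zip(dominant, dominant[1:]) if p != q)

dominant = simulate_rivalry()
print("dominance switches:", count_switches(dominant))
print("fraction of time unit 0 dominates:", dominant.count(0) / len(dominant))
```

In this toy system the dominant unit's own activity builds up the adaptation that eventually releases its competitor, which mirrors the argument that the activity of the perceived representation must cause its own loss of competitive advantage. Removing the adaptation term (setting `adaptation_gain` to zero) leaves a winner that is never supplanted.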
Given the possibility of monocular self-matching in this framework, one also needs to ask why the process of monocular self-matching, in the absence of a competing input from the other eye, does not cause the cyclic strengthening and weakening of monocular activity that occurs when two nonfused monocular inputs are rivalrous. One does not need a complete theory of these properties to conclude that no theory in which only a feedforward flow of visual patterns from monocular to binocular cells occurs (e.g., to compute disparity information) can explain these data. Feedback from binocular matching toward monocular computations is needed to explain rivalry data, just as such feedback is needed to explain the influence of perceived depth on perceived size or brightness. I will suggest in Part II how a suitably defined feedback scheme can give rise to all of these phenomena at once.
8. Against the Keplerian View: Scale-Sensitive Fusion and Rivalry
The Kaufman (1974) and Kulikowski (1978) experiments also argue against the Keplerian view, which is a mainstay of modern theories of stereopsis. The Keplerian view is a realist hypothesis which suggests that the two monocular views are projected point-by-point along diagonal rays, and that their crossing-points are loci from which the real depth of objects may be computed (Kaufman, 1974). When the imaginary rays of Kepler are translated into network hardware, one is led to assume that network pathways carrying monocular visual signals merge along diagonal routes (Sperling, 1970). The Keplerian view provides an elegant way to think about depth, because (other things being equal) objects which are closer should have larger disparities, and their Keplerian pathways should therefore cross at points which are further along the pathways. Moreover, all pairs of points with the same disparity cross at the same distance along their pathway, and thereby form a row of contiguous crossing-points. This concept does not explain a result such as Kulikowski’s, since all points in each figure (so the usual reasoning goes) have the same disparity with respect to the corresponding point in the other figure. Hence all points cross in the same row. In the traditional theories, this means that all points should match equally well to produce an unambiguous disparity measure. Why then do low spatial frequencies seem to match and yield a depth percept at the same disparity at which high spatial frequencies do not seem to match?
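The bookkeeping of the Keplerian grid can be made concrete with a few lines of code. The sketch assumes a one-dimensional idealization in which each monocular feature is just a horizontal position, every left-right pairing is one crossing-point, and disparity is defined as left position minus right position; the sign convention, the function name, and the feature positions are all hypothetical.

```python
from collections import defaultdict

def keplerian_crossings(left_positions, right_positions):
    """Group every left/right pairing (one crossing-point each) by disparity."""
    rows = defaultdict(list)
    for x_left in left_positions:
        for x_right in right_positions:
            rows[x_left - x_right].append((x_left, x_right))
    return dict(rows)

# Hypothetical figure: three features, all displaced by the same disparity of 2.
left = [3, 6, 9]
right = [1, 4, 7]

rows = keplerian_crossings(left, right)
total = sum(len(pairs) for pairs in rows.values())
for disparity in sorted(rows):
    print(disparity, rows[disparity])
print("crossing-points:", total)
```

The three corresponding pairs fall in a single row (disparity 2), as the Keplerian account requires, while the remaining six crossing-points pair features that do not correspond. Nothing in this grid refers to the spatial frequency content of the features, which is why it offers no handle on a result in which one scale matches and another does not at the same disparity.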
Rather than embrace the Keplerian view, I will suggest how suitably preprocessed input data of fixed disparity can be matched by certain spatial scales but not by other spatial scales. To avoid misunderstanding, I should immediately say what this hypothesis does not imply. It does not imply that a pair of high spatial frequency input patterns of large disparity cannot be matched, because only suitable statistics of the monocular input patterns will be matched, rather than the input patterns themselves. Furthermore, inferences made from linear statistics of the input patterns do not apply because the statistics in the theory need to be nonlinear averages of the input patterns to ensure basic stability properties of the feedback exchange between monocular and binocular cells. These assertions will be clarified in Part II. Once the Keplerian view is questioned, the problem of false-images (Julesz, 1971), which derives from this view and which has motivated much thinking about stereopsis, also becomes less significant. The false-images are those crossing-points in Kepler’s grid that do not correspond to the objects’ real disparities. Workers like Marr and Poggio (1979) have also concluded that false images are not a serious problem if spatial scaling is taken into account. Their definition of spatial scale differs from my own in a way that highlights how a single formal definition can alter the whole character of a theory. For example, when they mixed their definition of a spatial scale with their view of the false-image problem, Marr and Poggio (1979) were led to renounce cooperativity as well, which I view as an instance of throwing out the baby with the bathwater, since all global filling-in and figure-ground effects thereby become inexplicable in their theory. Marr and Poggio (1979) abandoned cooperativity because they did not need it to deal with false images.
In a model such as theirs, the primary goal of which is to compute unambiguous disparity measures, their conclusion seems quite logical. Confronted by the greater body of phenomena that are affected by depth estimates, such a step seems unwarranted.
9. Local versus Global Spatial Scales
Indeed, both the Kaufman (1974) and the Kulikowski (1978) experiments, among many others, illustrate that a figure or ground has a coherent visual existence that is more than the sum of its unambiguous feature computations. Once a given spatial scale makes a good match in these experiments, a depth percept is generated that pervades a whole region. We therefore need to distinguish the scaling property that makes good matches based on local computations from the global scaling effects that fill-in an entire region subtending an area much broader than the local scales themselves. This distinction between local and global scaling effects is vividly demonstrated by constructing a stereogram in which the left “figure” and its “ground” are both induced by a 5% density of random dots (Julesz, 1971, p.336) and the right “figure” of dots is shifted relative to its position in the left picture. Stereoscopically viewed, the whole figure, including the entire 95% of white background between its dots, seems to hover at the same depth. How is it that the white background of the “figure” inherits the depth quality arising from the disparities of its meagerly distributed dots, and the white background of the “ground” inherits the depth quality of its dots? What mechanism organizes the locally ambiguous white patches that dominate 95% of the pictorial area into two distinct and internally coherent regions? Julesz (1971, p.250) describes another variant of the same phenomenon using a random-dot stereogram inspired by an experiment of Shipley (1965). In this stereogram, the traditional center square in depth is interrupted by a horizontal white strip that cuts both the center square and the surround in half. During binocular viewing, the white strip appears to be cut along the contours of the square and it inherits the depth of figure or ground, despite the fact that it provides no disparity or brightness cues of its own at the cut regions.
10. Interaction of Perceived Form and Perceived Position
The choice of scales leading to a depth percept can also cause a shift in perceived form, notably in the relative distance between patterns in a configuration. For example, when a pattern AB C is viewed through one eye and a pattern A BC is viewed through the other eye, the letter B can be seen in depth at a position halfway between A and C (von Tschermak-Seysenegg, 1952; Werner, 1937). This phenomenon, called displacement or allelotropia, again suggests that the dynamic transformations in visual space are not of a local character since the location of entire letters, not to mention their points and lines, can be deformed by the spatial context in which they are placed. The nonlocal nature of visual space extends also to brightness perception, as the following section summarizes.
11. Some Influences of Perceived Depth and Form on Perceived Brightness
The Craik-O’Brien and Cornsweet effects (Cornsweet, 1970; O’Brien, 1958) show that an object’s form, notably its edges or regions of rapid spatial change, can influence its apparent brightness or lightness (Figure 5). Let the luminance profile in Figure 5a describe a cross-section of the two-dimensional picture in Figure 5b. Then the lightness of this picture appears as in Figure 5c. The edges of the luminance profile determine the lightnesses of the adjacent regions by a filling-in process. Although the luminances of the regions are the same except near their edges, the perceived lightnesses of the regions are determined by the brightnesses of their respective edges. This remarkable property is reminiscent of Attneave’s cat, since regions of maximum curvature, in the lightness domain, again help to determine how the percept is completed. In the present instance, the filling-in process overrides the visual data rather than merely completing an incomplete pattern.
Hamada (1976, 1980) has shown that this filling-in process is even more paradoxical than was previously thought. He compared the lightness of a uniform background with the lightness of the same uniform background with a less luminous Craik-O’Brien figure superimposed upon it. By the usual rules of brightness contrast, the lesser brightness of the Craik-O’Brien figure should raise the lightness of the background as its own lightness is reduced. Remarkably, even the background seems darker than the uniform background of the comparison figure, although its luminance is the same.
Just as form can influence lightness, apparent depth can influence lightness. Figures which appear to lie at the same depth can influence each other’s lightness in a manner analogous to that found in a monocular brightness constancy paradigm (Gilchrist, 1979).
12. Some Influences of Perceived Brightness on Perceived Depth
Just as depth can influence brightness estimates, brightness data can influence depth estimates. For example, Kaufman, Bacon, and Barroso (1973) studied stereograms built up from the two monocular pictures in Figure 6a. When these pictures are viewed through a stereoscope, the eyes see the lines at a different depth due to the disparity between the two monocular views. If the stereogram is changed so that the left eye sees the same picture as before, whereas the right eye sees the two pictures superimposed (Figure 6b), then depth is still perceived. If both eyes see the same superimposed pictures, then of course no depth is seen. However, if one eye sees the pictures superimposed with equal brightness, whereas the other eye sees the two pictures superimposed, one with less brightness and the other with more, then depth is again seen. In the latter case there is no disparity between the two figures, although there is a brightness difference. How does this brightness difference elicit a percept of depth? The Kaufman et al. (1973) study raises an interesting possibility. If a binocular brightness difference can cause a depth percept, and if a depth percept can influence
Figure 5. In (a) the luminance profile is depicted across a one-dimensional ray through the picture in (b). Although the interiors of all the regions have equal luminance, the apparent brightness of the regions is described by (c).
[Figure 6 panel labels: PICTURE 1, PICTURE 2]
Figure 6. Combinations of the two pictures in (a), such as in (b), yield a depth percept when each picture is viewed through a separate eye. Depth can be seen even if the two pictures are combined to yield brightness differences but no disparity differences.
perceived length, then a binocular brightness difference should be able to cause a change in perceived length. It is also known that monocular cues can sometimes have effects on perceived length similar to those of binocular cues, as in the corridor and Ponzo illusions. When these two phenomena are combined, it is natural to ask: Under what circumstances can a monocular brightness change cause a change (albeit small) in perceived length? I will return to this question in Part II.
13. The Binocular Mixing of Monocular Brightnesses
The Kaufman et al. (1973) result illustrates the fact that brightness information from each eye somehow interacts in a binocular exchange. That this exchange is not simply additive is shown by several experiments. For example, let AB on a white field be viewed with the left eye and BC on a white field be viewed with the right eye in such a way that the two B’s are superimposed. Then the B does not look significantly darker than A and C despite the fact that white is the input to the other eye corresponding to these letter positions (Helmholtz, 1962). In a similar fashion, closing one eye does not make the world look half as bright despite the fact that the total luminance reaching the two eyes is halved (Levelt, 1964; von Tschermak-Seysenegg, 1952). This fact recalls the discussion of monocular firing of binocular cells from Section 7.
The subtlety of binocular brightness interactions is further revealed by Fechner’s paradox (Hering, 1964). Suppose that a scene is viewed through both eyes but that one eye sees it through a neutral filter that attenuates all wavelengths by a constant ratio. The filter does not distort the reflectances, or ratios, of light reaching its eye, but only its absolute intensity. Now let the filtered eye be entirely occluded. Then the scene looks brighter and more vivid despite the fact that less total light is reaching the two eyes, and the reflectances are still the same. Binocular summation of brightness, in excess of probability summation, can occur when the monocular inputs are suitably matched “within some range, perhaps equivalent to Panum’s area .... Stereopsis and summation may be mediated by a common neural mechanism” (Blake, Sloane, and Fox, 1981). I will suggest below that the coexistence of Fechner’s paradox and binocular brightness summation can be explained by properties of binocular feedback exchanges among multiple spatial scales.
This explanation provides a theoretical framework in which recent studies and models of interactions between binocular brightness summation and monocular flashes can be interpreted (Cogan, Silverman, and Sekuler, 1982). Wallach and Adams (1954) have shown that if two figures differ only in terms of the reflectance of one region, then an effect quite the opposite of summation may be found. A rivalrous percept of brightness can be generated in which one shade, then the other, is perceived rather than a simultaneous average of the two shades. I will suggest below that this rivalry phenomenon may be related to the possibility that two monocular figures of different lightness may generate different spatial scales and thereby create a binocular mismatch. Having reviewed some data concerning the mutual interdependence and lability of depth, form, and lightness judgments, I will now review some obvious visual facts that seem paradoxical when placed beside some of the theoretical ideas that are in vogue at this time. I will also point out that some popular and useful theoretical approaches are inherently limited in their ability to explain either these paradoxes or the visual interactions summarized above.
14. The Insufficiency of Disparity Computations
It is a truism that the retinal images of objects at optical infinity have zero disparity, and that as an object approaches an observer, the disparities on the two retinas of corresponding object points tend to increase. This is the commonplace reason for assuming that larger disparities are an indicator of relative closeness. Julesz stereograms (Julesz,
1971) have moreover provided an elegant paradigm wherein disparity computations are a sufficient indicator of depth, since each separate Julesz random dot picture contains no monocular form cues, yet statistically reliable disparities between corresponding random dot regions yield a vivid impression of a form hovering in depth. This stunning demonstration has encouraged a decade of ingenious neural modeling.
Sperling (1970) introduced important pioneering concepts and equations in a classic paper that explains how cooperation within a disparity plane and competition between disparity planes can resolve binocular ambiguities. These ideas were developed into an effective computational procedure in Dev (1975) which led to a number of mathematical and computer studies (Amari and Arbib, 1977; Marr and Poggio, 1976). Due to these historical considerations, I will henceforth call models of this type Sperling-Dev models. All Sperling-Dev models assume that corresponding to each small retinal region there exists a series of disparity detectors sensitive to distinct disparities. These disparity detectors are organized in sheets such that cooperative effects occur between detectors of like disparity within a sheet, whereas competitive interactions occur between sheets. The net effect of these interactions is to suppress spurious disparity correlations and to carve out connected regions of active disparity detectors within a given sheet. These active disparity regions are assumed to correspond to a depth plane of the underlying retinal regions. Some investigators have recently expressed their enthusiasm for this interpretation by committing the homuncular fallacy of drawing the depth planes in impressive three-dimensional figures which carry the full richness of the monocular patterns, although within the model the monocular patterns do not differentially parse themselves among the several sheets of uniformly active disparity detectors.
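The cooperative and competitive interactions attributed to Sperling-Dev models above can be caricatured in a few lines. The sketch below is not the system of Sperling (1970) or of Dev (1975); it is a one-pass simplification under stated assumptions: a one-dimensional random-dot pair, binary match detectors, pooling over a fixed neighbourhood standing in for cooperation within a sheet, and a winner-take-all choice standing in for competition between sheets. All function names, the window radius, and the dot pattern are hypothetical.

```python
import numpy as np

def random_dot_pair(n=200, lo=70, hi=130, shift=2, seed=0):
    """1-D random-dot pair: dots in [lo, hi) are displaced by `shift` in the right image."""
    rng = np.random.default_rng(seed)
    left = rng.integers(0, 2, size=n)
    right = left.copy()
    right[lo + shift:hi + shift] = left[lo:hi]              # displaced central "figure"
    right[lo:lo + shift] = rng.integers(0, 2, size=shift)   # uncovered strip gets fresh dots
    return left, right

def depth_sheets(left, right, disparities, radius=12):
    """Return the winning disparity at each position (illustrative assumption,
    not a published algorithm)."""
    window = np.ones(2 * radius + 1)
    support = []
    for d in disparities:
        # Initial detector response: 1 where left[x] equals right[x + d].
        match = (left == np.roll(right, -d)).astype(float)
        # Cooperation within a sheet: pool like-disparity matches from neighbours.
        support.append(np.convolve(match, window, mode="same"))
    support = np.array(support)
    # Competition between sheets: only the best-supported disparity survives.
    return np.asarray(disparities)[np.argmax(support, axis=0)]

left, right = random_dot_pair()
winner = depth_sheets(left, right, disparities=[-2, -1, 0, 1, 2])
```

The output labels each position with a single disparity and nothing else. In that respect the sketch also exhibits the limitation raised in the text: the monocular pattern itself is not carried into the winning sheet.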
That something is missing from these models is indicated by the following considerations. The use of a stereogram composed of two separate pictures does not always approximate well the way two eyes view a single picture. When both eyes focus on a single point within a patterned planar surface viewed in depth, the fixation point is a point of minimal binocular disparity. Points increasingly far from the fixation point have increasingly large binocular disparities. Why does such a plane not recede toward optical infinity at the fixation point and curve toward the observer at the periphery of the visual field? Why does the plane not get distorted in a new way every time our eyes fixate on a different point within its surface? If disparities are a sufficient indicator of depth, then how do we ever see planar surfaces? Or even rigid surfaces? This insufficiency cannot be escaped just by saying that an observer's spatial scales get bigger as retinal eccentricity increases. To see this, let a bounded planar surface have an interior which is statistically uniform with respect to an observer's spatial scales (in a sense that will be precisely defined in Part II). Then the interior disparities of the surface are ambiguous. Only its boundary disparities supply information about the position of the surface in space. Filling-in between these boundaries to create a planar impression is not just a matter of showing that the same disparity, even after an eccentricity compensation, can be locally computed at all the interior points, because an unambiguous disparity computation cannot be carried out at the interior points. The issue is not just whether the observer can estimate the depth of the planar surface, but also how the observer knows that a planar surface is being viewed. This problem is hinted at even when Julesz stereograms are viewed. Staring at one point in the stereogram results in the gradual loss of depth (Kaufman, 1974).
Also, in a stereogram composed of three vertical lines to the left eye and just the two flanking lines to the right eye, the direction of depth of the middle line depends on whether the left line or the right line is fixated (Kaufman, 1974). This demonstration makes the problem of perceiving planes more severe for any theory which restricts itself to disparity computations, since it shows that depth can depend on the fixation points. What is the crucial difference between the way we perceive the depths of lines and planes? Kaufman (1974) seems to have had this problem in mind when he wrote that "all theories of stereopsis are really inconsistent with the geometry of stereopsis" (p.320).
Another problem faced by Sperling-Dev models is that they cannot explain effects of perceived depth on perceived size and lightness. The attractive property that the correct depth plane fills-in with uniform activity due to local cooperativity creates a new problem: How does the uniform pattern of activity within a disparity plane rejoin the nonuniformly patterned monocular data to influence its apparent size and lightness? Finally, there is the problem that only a finite number of depth planes can exist in a finite neural network. Only a few such depth planes can be inferred to exist by joining data relating spatial scales to perceived depth (such as the Kaufman (1974) and Kulikowski (1978) data summarized in Section 6) to spatial frequency data which suggest that only a few spatial scales exist (Graham, 1981; Wilson and Bergen, 1979). Since only one depth plane is allowed to be active at each time in any spatial position in a Sperling-Dev model, apparent depth should discretely jump a few times as an observer approaches an object. Instead, apparent depth seems to change continuously in this situation.
15. The Insufficiency of Fourier Models
An approach with a strong kernel of truth but a fundamental predictive limitation is the Fourier approach to spatial vision. The kernel of truth is illustrated by threshold experiments with four different types of visual patterns (Graham, 1981; Graham and Nachmias, 1971). Two of the patterns are gratings which vary sinusoidally across the horizontal visual field with different spatial frequencies. The other two are the sum and difference patterns of the first two. If the visual system behaved like a single channel wherein larger peak-to-trough pattern intensities were more detectable, the compound patterns would be more detectable than the sinusoidal ones. In fact, all the patterns are approximately equally detectable. A model in which the different sinusoidal spatial frequencies are independently filtered by separate spatial channels or scales fits the data much better. Recall from Section 6 some of the other data that also suggest the existence of multiple scales. A related advantage of the multiple channel idea is that one can filter a complex pattern into its component spatial frequencies, weight each component with a factor that mirrors the sensitivity of the human observer to that channel, and then resynthesize the weighted pattern and compare it with an observer’s perceptions. This modulation transfer function approach has been used to study various effects of boundary edges on interior lightnesses (Cornsweet, 1970). If the two luminance profiles in Figure 7 are filtered in this way, they both generate the same output pattern because the human visual system attenuates low spatial frequencies. Unfortunately, both output patterns look like a Cornsweet profile, whereas actually the Cornsweet profile looks like a rectangle. This is not a minor point, since the interior regions of the Cornsweet profile have the same luminance, which is false in the rectangular figure. 
This application of the Fourier approach seems to me to be misplaced, since the Fourier transform is linear, whereas a reflectance computation must involve some sorts of ratios and is therefore inherently nonlinear. The Fourier scheme is also a feedforward transformation of an input pattern into an output pattern. It cannot in principle explain how apparent depth alters apparent length and brightness, since such computations depend on a feedback exchange between monocular data to engender binocular responses. In particular, the data reviewed in Section 4 show that the very definition of a length scale can remain ambiguous until it is embedded in a binocular feedback scheme. The Fourier transform does not at all suggest why length estimates should be so labile. The multiple channel and sensitivity notions need to be explicated in a different formal framework.
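The claim that a Cornsweet profile and a rectangle yield nearly the same output once low spatial frequencies are attenuated can be checked numerically. The following is a minimal sketch, assuming an idealized brick-wall high-pass filter and exponential cusps at the edges; the profile shapes, the cutoff, and every name in the code are illustrative assumptions rather than the modulation transfer function used in the cited studies.

```python
import numpy as np

def profiles(n=1024, decay=20.0):
    """Periodic 1-D luminance profiles: a rectangle and a Cornsweet-like profile
    whose interiors share the same luminance (hypothetical parameters)."""
    x = np.arange(n)
    e1, e2 = n // 4, 3 * n // 4
    rectangle = ((x >= e1) & (x < e2)).astype(float)
    d1 = x - (e1 - 0.5)   # signed distance to the rising edge
    d2 = x - (e2 - 0.5)   # signed distance to the falling edge
    cornsweet = (0.5
                 + 0.5 * np.sign(d1) * np.exp(-np.abs(d1) / decay)
                 - 0.5 * np.sign(d2) * np.exp(-np.abs(d2) / decay))
    return rectangle, cornsweet

def attenuate_low_frequencies(signal, cutoff_bin=25):
    """Zero all Fourier components below `cutoff_bin` (an idealized filter)."""
    spectrum = np.fft.rfft(signal)
    spectrum[:cutoff_bin] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)

def correlation(a, b):
    """Pearson correlation between two profiles."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rectangle, cornsweet = profiles()
raw_similarity = correlation(rectangle, cornsweet)
filtered_similarity = correlation(attenuate_low_frequencies(rectangle),
                                  attenuate_low_frequencies(cornsweet))
```

Under these assumptions the two raw profiles are only weakly correlated, whereas the two filtered outputs are almost identical and both retain only cusps near the edges. This is the predictive limitation described in the text: the linear filter returns a Cornsweet-like output for both inputs, whereas the percept resembles the rectangle.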
Figure 7. When the Cornsweet profile (a) and the rectangle (b) are filtered in such a way that low spatial frequencies are attenuated, both outputs look like a Cornsweet profile rather than a rectangle, as occurs during visual experience.
16. The Insufficiency of Linear Feedforward Theories
The above criticisms of the Fourier approach to spatial vision hold for all computational theories that are based on linear and feedforward operations. For example, some recent workers in artificial intelligence (Marr and Hildreth, 1980) compute a spatial scale by first linearly smoothing a pattern with respect to a Gaussian distribution and then computing an edge by setting the Laplacian (the second derivatives) of the smoothed pattern equal to zero (Figure 8). The use of the Laplacian to study edges goes back at least to the time of Mach (Ratliff, 1965). The Laplacian is time-honored, but it suffers from limitations that become more severe when its zero-crossings are made the centerpiece of a theory of edges. One of many difficulties is that zero-crossings compute only the position of an edge and not other related properties such as the brightness of the pattern near the edge. Yet the Cornsweet and Craik-O’Brien figures pointedly show that the brightnesses of edges can strongly influence the lightness of their enclosed forms. Something more than zero-crossings is therefore needed to understand spatial vision. The zero-crossing computation itself does not disclose what is missing, so its advocates must guess what is needed. Marr and Hildreth (1980) guess that factors like position, orientation, contrast, length, and width should be computed at the zero-crossings. These guesses do not follow from their definition, or their computation, of an edge. Such properties lie beyond the implications of the zero-crossing computation, because this computation discards essential features of the pattern near the zero-crossing location. Even if the other properties are added to a list of data that is stored in computer memory, this list distorts, indeed entirely destroys, the intrinsic geometric structure of the pattern.
The replacement of the natural internal geometrical relationships of a pattern by arbitrary numerical measures of the pattern prevents the Marr and Hildreth (1980) theory from understanding how global processes, such as filling-in, can spontaneously occur in a physical setting. Instead, the Marr and Hildreth (1980) formulation leads to an approach wherein all the intelligence of what to do next rests in the investigator rather than in the model. This restriction to local, investigator-driven computations is due not only to the
present state of their model’s development, but also to the philosophy of these workers, since Marr and Hildreth write (1980, p.189): “The visual world is not constructed of ripply, wave-like primitives that extend and add together over an area.” Finally, because their theory is linear, it cannot tell us how to estimate the lightnesses of objects, and because their theory is feedforward, it cannot say how apparent depth can influence the apparent size and lightness of monocular patterns.
Figure 8. When a unit step in intensity (a) is smoothed by a Gaussian kernel, the result is (b). The first spatial derivative is (c), and the second spatial derivative is (d). The second derivative is zero at the location of the edge.
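The smoothing and differentiation steps discussed in this section, and the point that a zero-crossing records where an edge is but not how bright the pattern is beside it, can be illustrated in one dimension. This is a minimal sketch assuming a truncated Gaussian kernel and a discrete second difference in place of the Laplacian; the function names, kernel width, and tolerance are hypothetical choices, not those of Marr and Hildreth (1980).

```python
import numpy as np

def step_edge(n=200, edge=100, base=0.0, amp=1.0):
    """Luminance step of height `amp` on a background `base`, starting at index `edge`."""
    x = np.arange(n)
    return base + amp * (x >= edge)

def zero_crossings(signal, sigma=4.0, rel_tol=1e-6):
    """Positions (in samples) where the second difference of the
    Gaussian-smoothed signal changes sign."""
    radius = int(4 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # Pad with edge values so the array borders do not create spurious edges.
    padded = np.pad(np.asarray(signal, dtype=float), radius, mode="edge")
    smoothed = np.convolve(padded, kernel, mode="valid")
    second = np.zeros_like(smoothed)
    second[1:-1] = smoothed[2:] - 2 * smoothed[1:-1] + smoothed[:-2]
    # Ignore sign flips caused by floating-point noise in flat regions.
    strong = np.abs(second) > rel_tol * np.abs(second).max()
    flips = (second[:-1] * second[1:] < 0) & strong[:-1] & strong[1:]
    return np.flatnonzero(flips) + 0.5

dim_edge = zero_crossings(step_edge(base=0.0, amp=1.0))
bright_edge = zero_crossings(step_edge(base=10.0, amp=5.0))
```

Both calls return the same single location, halfway between the last sample below the step and the first sample above it, even though the two patterns differ in background luminance and in contrast. Whatever distinguishes them has to be supplied by additional computations.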
17. The Filling-In Dilemma: To Have Your Edge and Fill-In Too
Any linear and feedforward approach to spatial vision is in fact confronted with the full force of the filling-in dilemma: If spatial vision operates by first attenuating all but the edges in a pattern, then how do we ever arrive at a percept of rigid bodies with ample interiors, which are after all the primary objects of perception? How can we have our edges and fill-in too? How does the filling-in process span retinal areas which far exceed the spatial bandwidths of the individual receptive fields that physically justify a Gaussian smoothing process? In particular, in the idealized luminance profile in Figure 9, after the edges are determined by a zero-crossing computation, the directions in which to fill-in are completely ambiguous without further computations tacked on. I
will argue in Part II of this article that a proper definition of edges does not require auxiliary guesswork. I should emphasize what I do not mean by a solution to the filling-in dilemma. It is not sufficient to say that edge outlines of objects constitute sufficient information for a viewer to understand a three-dimensional scene. Such a position merely says that observers can use edges to arrive at object percepts, but not how they do so. Such a view begs the question. It is also not sufficient to say that feedback expectancies, or hypotheses, can use edge information to complete an object percept. Such a view does not say how the feedback expectancies were learned, notably what substrate of completed form information was sampled by the learning process, and it also begs the question. Finally, it is inadequate to say that an abstract reconstruction process generates object representations from edges if this process would require a homunculus for its execution in real time. Expressed in another way, the filling-in dilemma asks: If it is really so hard for us to find mechanisms which can spontaneously and unambiguously fill-in between edges, then do we not have an imperfect understanding of why the nervous system bothers to compute edges? Richards and Marr (1981) suggest that the edge computation compresses the amount of data which needs to be stored. This sort of memory load reduction is important in a computer program, but I will suggest in Part II that it is not a rate-limiting constraint on the brain design which grapples with binocular data.
Figure 9. In this luminance profile, zero-crossings provide no information about which regions are brighter than others. Auxiliary computations are needed to determine this.
I will suggest, in contrast, that the edge computation sets the stage for processes which selectively amplify and fill-in among those aspects of the data which are capable of matching monocularly, binocularly, or with learned feedback expectancies, as the case might be. This conclusion will clarify both why it is that edge extraction is such an important step in the processing of visual patterns, in partial support of recent models (Marr and Hildreth, 1980; Marr and Poggio, 1979), and yet edge preprocessing is just one stage in the nonlinear feedback interactions that are used to achieve a coherent visual percept.
PART II
18. Edges and Fixations: The Ambiguity of Statistically Uniform Regions
The remainder of this article will outline the major concepts that are needed to build up my theory of these nonlinear interactions. I will also indicate how these concepts can be used to qualitatively interrelate data properties that often cannot be related at all by alternative theoretical approaches. Many of these concepts are mathematical
properties of the membrane equations of neurophysiology, which are the foundation of all quantitative neurophysiological experimentation. The theory provides an understanding of these equations in terms of their computational properties. When the membrane equations are used in suitably interconnected networks of cells, a number of specialized visual models are included as special cases. The theory thereby indicates how these models can be interrelated within a more general, physiologically based, computational framework. Due to the scope of this framework, the present article should be viewed as a summary of an ongoing research program, rather than as a completely tested visual theory. Although my discussion will emphasize the meaning and qualitative reasons for various data from the viewpoint of the theory, previous articles about the theory will be cited for those who wish to study mathematical proofs or numerical simulations, and Appendix A describes a system that is currently being numerically simulated to study binocular filling-in reactions. I will motivate my theoretical constructions with two simple thought experiments. I will use these experiments to remind us quickly of some important relationships between perceived depth and the monocular computation of spatial nonuniformities. Suppose that an observer attempts to fixate a perceptually uniform rectangle hovering in space in front of a discriminable but perceptually uniform background. How does the observer know where to fixate the rectangle? Even if each of the observer’s eyes independently fixates a different point of the rectangle’s interior, both eyes will receive identical input patterns near their fixation points due to the rectangle’s uniformity. The monocular visual patterns near the fixation points match no matter how disparately the fixation points are chosen within the rectangle. Several conclusions follow from this simple observation.
Binocular visual matching between spatially homogeneous regions contains no information about where the eyes are pointed, since all binocular matches between homogeneous regions are equally good no matter where the eyes are pointed. The only binocular visual matches which stand out above the baseline of ambiguous homogeneous matches across the visual field are those which correlate spatially nonuniform data to the two eyes. However, the binocular correlations between these nonuniform patterns, notably their disparities, depend upon the fixation points of the two eyes. Disparity information by itself is therefore insufficient to determine the object’s depth. Instead, there must exist an interaction between vergence angle and disparity information to determine where an object is in space (Foley, 1980; Grossberg, 1976; Marr and Poggio, 1979; Sperling, 1970). This binocular constraint on resolving the ambiguity of where the two eyes are looking is one reason for the monocular extraction of the edges of a visual form and attendant suppression of regions which are spatially homogeneous with respect to a given spatial scale. Without the ability to know where the object is in space, there would be little evolutionary advantage in perceiving its solidity or interior. In this limited sense, edge detection is more fundamental than form detection in dealing with the visual environment. Just knowing that a feedback loop must exist between motor vergence and sensory disparities does not determine the properties of this loop. Sperling (1970) has postulated that vergence acts to minimize a global disparity measure. Such a process would tend to reduce the perception of double images (Kaufman, 1974). I have suggested (Grossberg, 1976b) that good binocular matches generate an amplification of network activity, or a binocular resonance.
An imbalance in the total resonant output from each binocular hemifield may be an effective vergence signal leading to hemifield-symmetric resonant activity which signifies good binocular matching and stabilizes the vergence angle. The theoretical sections below will suggest how these binocular resonances also compute coherent depth, form, and lightness information.
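The first thought experiment can be made concrete with a toy calculation. The sketch below assumes a one-dimensional luminance profile, a right-eye view that is simply the left-eye view displaced by a fixed disparity, and a negative mean squared difference as the match measure. These choices and all names are illustrative assumptions, not the matching mechanism developed in the theory.

```python
import numpy as np

def match_scores(left, right, start, stop, disparities):
    """Negative mean squared difference between a left-eye window and the
    right-eye window displaced by each candidate disparity."""
    patch = left[start:stop]
    return np.array([-np.mean((patch - right[start + d:stop + d]) ** 2)
                     for d in disparities])

n, true_disparity = 200, 3
left = np.full(n, 0.2)          # uniform background
left[80:120] = 1.0              # uniform rectangle
right = np.roll(left, true_disparity)
disparities = list(range(-5, 6))

# Window lying wholly inside the rectangle: every displacement matches equally well.
interior = match_scores(left, right, 90, 110, disparities)
# Window straddling the rectangle's left edge: one displacement stands out.
boundary = match_scores(left, right, 70, 90, disparities)
best_boundary_disparity = disparities[int(np.argmax(boundary))]
```

Within the homogeneous interior the score is identical for every candidate displacement, so the match says nothing about where the eyes are pointed. Only the window that contains the spatial nonuniformity singles out the displacement actually present.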
19. Object Permanence and Multiple Spatial Scales
The second thought experiment reviews a use for multiple spatial scales, rather than a single edge computation, corresponding to each retinal point. Again, our conclusions can be phrased in terms of the fixation process. As a rigid object approaches an observer, the binocular disparities between its nonfixated features increase proportionally. In order to achieve a concept of object permanence, and at the very least to maintain the fixation process, mechanisms capable of maintaining a high correlation between these progressively larger disparities are needed. The largest disparities will, other things being equal, lie at the most peripheral points on the retina. The expansion of spatial scales with retinal eccentricity is easily rationalized in this way (Hubel and Wiesel, 1977; Richards, 1975; Schwartz, 1980). It does not suffice, however, to posit that a single scale exists at each retinal position such that scale size increases with retinal eccentricity. This is because objects of different size can approach the observer. As in the Holway and Boring (1941) experiments, objects of different size can generate the same retinal image if they lie at different distances. If these objects possess spatially uniform interiors, then the boundary disparities of their monocular retinal images carry information about their depth. Because all the objects are at different depths, these distinct disparities need to be computed with respect to that retinal position in one eye that is excited by all the objects’ boundaries. Multiple spatial scales corresponding to each retinal position can carry out these multiple disparity computations.
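The geometry behind this thought experiment can be written down directly. The sketch below assumes a pinhole model of each eye, an object centered straight ahead, eyes fixated at optical infinity, and an interocular separation of 0.065 m; the separation and both function names are illustrative assumptions.

```python
import math

def visual_angle(size, distance):
    """Angle (radians) subtended at one eye by an object of a given size."""
    return 2.0 * math.atan(size / (2.0 * distance))

def disparity_from_infinity(distance, interocular=0.065):
    """Angular disparity (radians) of a point straight ahead at `distance`,
    measured relative to a point at optical infinity."""
    return 2.0 * math.atan(interocular / (2.0 * distance))

# Two objects whose sizes are proportional to their distances (meters).
near_size, near_distance = 1.0, 10.0
far_size, far_distance = 2.0, 20.0

same_retinal_image = math.isclose(visual_angle(near_size, near_distance),
                                  visual_angle(far_size, far_distance))
nearer_has_larger_disparity = (disparity_from_infinity(near_distance)
                               > disparity_from_infinity(far_distance))
```

The two objects subtend the same visual angle, so their monocular images coincide, yet their disparities differ. A single retinal position must therefore be able to support more than one disparity computation, which is the role assigned to multiple spatial scales in the text.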
I will now discuss how the particular scales which can binocularly resonate to a given object’s monocular boundary data thereupon fill-in the internal homogeneity of the object’s representation with length and lightness estimates, as well as the related question of how monocular cues and learned expectancies can induce similar resonances and thus a perception of depth.
20. Cooperative versus Competitive Binocular Interactions
One major difference between my approach to these problems and alternative approaches is the following: I suggest that a competitive process, not a cooperative process, defines a depth plane. The cooperative process that other authors have envisaged leads to sheets of network activity which are either off or maximally on. The competitive process that I posit can sustain quantized patterns of activity that reflect an object’s perceived depth, lightness, and length. In other words, the competitive patterns do not succumb to a homuncular dilemma. They are part of the representation of an object’s binocular form. The cells that subserve this representative process are sensitive to binocular disparities, but they are not restricted to disparity computations. In this sense, they do not define a “depth plane” at all. One reason that other investigators have not drawn this conclusion is that a binary code hypothesis is often explicit (or lurks implicitly) in their theories. The intuition that a depth plane can be perceived seems to imply cooperation, because in a binary world competition implies an “either-or” choice, which is manifestly unsuitable, whereas cooperation implies an “and” conjunction, which is at least tolerable. In actuality, a binary either-or choice does not begin to capture the properties of a competitive network. Mathematical analysis is needed to understand these properties. (I should emphasize at this point that cooperation and cooperativity are not the same notion. Both competitive and cooperative networks exhibit cooperativity, in the sense in which this word is casually used.) A large body of mathematical results concerning competitive networks has been discovered during the past decade (Ellias and Grossberg, 1975; Grossberg, 1970a, 1972d, 1973, 1978a, 1978c, 1978d, 1978e, 1980a, 1980b, 1981; Grossberg and Levine, 1975; Levine and Grossberg, 1976).
These results clarify that not all competitive networks enjoy the properties that are needed to build a visual theory. Certain competitive networks whose cells obey the membrane equations of neurophysiology do have desirable
properties. Such systems are called shunting networks to describe the multiplicative relationship between membrane voltages and the conductance changes that are caused by network inputs and signals. This multiplicative relationship enables these networks to automatically retune their sensitivity in response to fluctuating background inputs. Such an automatic gain control capacity implies formal properties that are akin to reflectance processing, Weber law modulation, sensitivity shifts in response to different backgrounds, as well as other important visual effects. Most other authors have worked with additive networks, which do not possess the automatic gain control properties of shunting networks. Sperling (1970, 1981) and Sperling and Sondhi (1968) are notable among other workers in vision for understanding the need to use shunting dynamics, as opposed to mere equilibrium laws of the form $I(A + J)^{-1}$. However, these authors did not develop the mathematical theory far enough to have at their disposal some formal properties that I will need. A review of these and other competitive properties is found in Grossberg (1981, Sections 10-27). The sections below build up concepts leading to binocular resonances.
21. Reflectance Processing, Weber Law Modulation, and Adaptation Level in Feedforward Shunting Competitive Networks
Shunting competitive networks can be derived as the solution of a processing dilemma that confronts all cellular tissues, the so-called noise-saturation dilemma (Grossberg, 1973, 1978e). This dilemma notes that accurate processing both of low activity and high activity input patterns can be prevented by sensitivity loss due to noise (at the low activity end) and saturation (at the high activity end) of the input spectrum. Shunting competitive networks overcome this problem by enabling the cells to retune their sensitivity automatically as the overall background activity of the input pattern fluctuates through time. This result shows how cells can adapt their sensitivity to input patterns that fluctuate over a dynamical range that is much broader than the output range of the cells. As I mentioned above, the shunting laws take the form of the familiar membrane equations of neurophysiology in neural examples. Due to the generality of the noise-saturation dilemma, formally similar laws should occur in non-neural cellular tissues. I have illustrated in Grossberg (1978b) that some principles which occur in neural tissues also regulate non-neural developmental processes for similar computational reasons. The solution of the noise-saturation dilemma that I will review herein describes intercellular tuning mechanisms. Data describing intracellular adaptation have also been reported (Baylor and Hodgkin, 1974; Baylor, Hodgkin, and Lamb, 1974a, 1974b) and have been quantitatively fitted by a model in which visual signals are multiplicatively gated by a slowly accumulating transmitter substance (Carpenter and Grossberg, 1981). The simplest intercellular mechanism describes a competitive feedforward network in which the activity, or potential, $x_i(t)$ of the $i$th cell (population) $v_i$ in a field of cells $v_1, v_2, \ldots, v_n$ responds to a spatial pattern $I_i(t) = \theta_i I(t)$ of inputs, $i = 1, 2, \ldots, n$. A collection of inputs comprises a spatial pattern if each input has a fixed relative size (or reflectance) $\theta_i$, but a possibly variable background intensity $I(t)$ (due, say, to a fluctuating light source). The convention that $\sum_{k=1}^{n} \theta_k = 1$ implies that $I(t)$ is the total input to the field; viz., $I(t) = \sum_{k=1}^{n} I_k(t)$. The simplest law which solves the noise-saturation dilemma describes the net rate $dx_i/dt$ at which sites at $v_i$ are activated and/or inhibited through time. This law takes the form:
$$ \frac{d}{dt} x_i = -A x_i + (B - x_i) I_i - (x_i + C) \sum_{k \neq i} I_k, \qquad (1) $$
$i = 1, 2, \ldots, n$, where $B > 0 \geq -C$ and $B \geq x_i(t) \geq -C$ for all times $t \geq 0$. Term $-A x_i$ describes the spontaneous decay of activity at a constant rate $-A$. Term $(B - x_i) I_i$
Figure 10. In the simplest feedforward competitive network, each input $I_i$ excites its cell (population) $v_i$ and inhibits all other populations $v_j$, $j \neq i$. (From Grossberg, 1978e.)
describes the activation due to an excitatory input $I_i$ in the $i$th channel (Figure 10). Term $-(x_i + C) \sum_{k \neq i} I_k$ describes the inhibition of activity by competitive inputs $\sum_{k \neq i} I_k$ from the input channels other than $v_i$. In the absence of inputs (namely all $I_i = 0$, $i = 1, 2, \ldots, n$), the potential decays to the equilibrium potential 0 due to the decay term $-A x_i$. No matter how intense the chosen inputs $I_i$, the potential $x_i$ remains between the values $B$ and $-C$ at all times because $(B - x_i) I_i = 0$ if $x_i = B$ and $-(x_i + C) \sum_{k \neq i} I_k = 0$ if $x_i = -C$. That is why $B$ is called an excitatory saturation point and $-C$ is called an inhibitory saturation point. When $x_i > 0$, the cell $v_i$ is said to be depolarized. When $x_i < 0$, the cell $v_i$ is hyperpolarized. The cell can be hyperpolarized only if $C > 0$ since $x_i(t) \geq -C$ at all times $t$.
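As an informal check of the saturation bounds just described, the following minimal sketch integrates system (1) with a forward-Euler step and compares the result with the steady state obtained by setting $dx_i/dt = 0$. The parameter values, the step size, and the function names `simulate_shunting` and `steady_state` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def simulate_shunting(inputs, A=1.0, B=1.0, C=0.25, dt=1e-3, steps=20000):
    """Forward-Euler integration of system (1), starting from x_i = 0.

    Returns the final activities together with the smallest and largest
    activity seen at any step. Parameter values are illustrative assumptions.
    """
    I = np.asarray(inputs, dtype=float)
    off_surround = I.sum() - I           # sum over k != i of I_k
    x = np.zeros_like(I)
    lowest, highest = 0.0, 0.0
    for _ in range(steps):
        dx = -A * x + (B - x) * I - (x + C) * off_surround
        x = x + dt * dx
        lowest = min(lowest, float(x.min()))
        highest = max(highest, float(x.max()))
    return x, lowest, highest


def steady_state(inputs, A=1.0, B=1.0, C=0.25):
    """Solve dx_i/dt = 0 in system (1) for each cell."""
    I = np.asarray(inputs, dtype=float)
    total = I.sum()
    return (B * I - C * (total - I)) / (A + total)


if __name__ == "__main__":
    pattern = np.array([1.0, 2.0, 5.0, 2.0])
    for scale in (1.0, 10.0):
        x, lowest, highest = simulate_shunting(scale * pattern)
        print(scale, np.round(x, 4), lowest, highest)
```

With these assumed values, the recorded extremes should stay inside the interval from $-C$ to $B$ even when the inputs are made ten times more intense. The step size must be small compared with $1/(A + \sum_k I_k)$ for the Euler scheme to behave well.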
Before noting how system (1) solves the noise-saturation dilemma, I should clarify its role in the theory as a whole. System (1) is part of a mathematical classification theory wherein a sequence of network variations on the noise-saturation theme is analysed. The classification theory characterizes how changes in network parameters (for example, decay rates or interaction rules) alter the transformation from input pattern $(I_1, I_2, \ldots, I_n)$ to activity pattern $(x_1, x_2, \ldots, x_n)$. The classification theory thereby provides useful guidelines for designing networks to accomplish specialized processing tasks. The inverse process of inferring which network can generate prescribed data properties is also greatly facilitated. In the present case of system (1), a feedforward flow of inputs to activities occurs wherein a narrow on-center of excitatory input (term $(B - x_i) I_i$) is balanced against a broad off-surround of inhibitory inputs (term $-(x_i + C) \sum_{k \neq i} I_k$). Deviations from these hypotheses will generate network properties that differ from those found in system (1), as I will note in subsequent examples.
To see how system (1) solves the noise-saturation dilemma, let the background input $I(t)$ be held steady for a while. Then the activities in (1) approach equilibrium. These equilibrium values are found by setting $dx_i/dt = 0$ in (1). They are
$$ x_i = \frac{(B + C) I}{A + I} \left[ \theta_i - \frac{C}{B + C} \right]. \qquad (2) $$
Equation (2) exhibits four main features:
(a) Factorization and automatic tuning of sensitivity. Term $\theta_i - C/(B + C)$ depends on the $i$th reflectance $\theta_i$ of the input pattern. It is independent of the background intensity $I$. Formula (2) factorizes information about reflectance from information about background intensity. Due to the factorization property, $x_i$ remains proportional to $\theta_i - C/(B + C)$ no matter how large $I$ is chosen to be. In other words, $x_i$ does not saturate.
(b) Adaptation level, featural noise suppression, and symmetry-breaking. Output signals from cell $v_i$ are emitted only if the potential $x_i$ is depolarized. By (1), $x_i$ is depolarized only if term $\theta_i - C/(B + C)$ is positive. Because the reflectance $\theta_i$ must exceed $C/(B + C)$ to depolarize $x_i$, term $C/(B + C)$ is called the adaptation level. The size of the adaptation level depends on the ratio of $C$ to $B$. Typically $B \gg C$ in vivo, which implies that $C/(B + C) \ll 1$. Were not $C/(B + C) < 1$, no choice of $\theta_i$ could depolarize the cell, since $\theta_i$, being a ratio, never exceeds 1. The most perfect choice of the ratio of $C$ to $B$ is $C/B = 1/(n - 1)$ since then $C/(B + C) = 1/n$. In this case, any uniform input pattern $I_1 = I_2 = \cdots = I_n$ is suppressed by the network because then all $\theta_i = 1/n$. Since also $C/(B + C) = 1/n$, all $x_i = 0$ given any input intensity. This property is called featural noise suppression, or the suppression of zero spatial frequency patterns. Featural noise suppression guarantees that only nonuniform reflectances of the input pattern can ever generate output signals. The inequality $B \gg C$ is called a symmetry-breaking inequality for a reason that is best understood by considering the special case when $C/B = 1/(n - 1)$. The ratio $1/(n - 1)$ is also, by (1), the ratio of the number of cells excited by the input $I_i$ divided by the number of cells inhibited by the input $I_i$.
Noise suppression is due to the fact that the asymmetry of the intercellular on-center off-surround interactions is matched by the asymmetry of the intracellular saturation points. In other words, the symmetry of the network as a whole is "broken" to achieve noise suppression. Any imbalance in this matching of intercellular to intracellular parameters will either increase or decrease the adaptation level and thereby modify the noise suppression property.
This symmetry-breaking property of shunting networks leads to a theory of how on-center off-surround anatomies develop that is different from the one implied by an additive approach, such as a Fourier or Laplacian theory, if only because additive theories do not possess excitatory and inhibitory saturation points. In Grossberg (1978e, 1982e) I suggested how the choice of intracellular saturation points in a shunting network may influence the development of intercellular on-center off-surround connections to generate the correct balance of intracellular and intercellular parameters. An incorrect balance could suppress all input patterns by causing a pathologically large adaptation level. My suggestion is that the balance of intracellular saturation points determines the balance of morphogenetic substances that are produced at the target cells to guide the growing excitatory and inhibitory pathways.
(c) Weber-law modulation. Term $\theta_i - C/(B + C)$ is modulated by the term $(B + C) I (A + I)^{-1}$, which depends only on the background intensity $I$. This term takes the form of a Weber law (Cornsweet, 1970). Thus (2) describes Weber law modulation of reflectance processing above an adaptation level.
(d) Normalization and limited capacity. The total activity of the network is
$$ x = \sum_{k=1}^{n} x_k = \frac{[B - (n - 1) C] I}{A + I}. \qquad (3) $$
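Three of the features of equation (2) can be checked numerically: factorization, featural noise suppression, and the total activity obtained by summing (2) over all cells. The sketch below is a minimal illustration only; the parameter values and the helper name `equilibrium_activity` are assumptions chosen for the example.

```python
import numpy as np


def equilibrium_activity(theta, I, A=1.0, B=1.0, C=0.25):
    """Equation (2): x_i = (B + C) I / (A + I) * (theta_i - C / (B + C))."""
    theta = np.asarray(theta, dtype=float)
    return (B + C) * I / (A + I) * (theta - C / (B + C))


n = 5
uniform = np.full(n, 1.0 / n)
peaked = np.array([0.10, 0.15, 0.50, 0.15, 0.10])

# With C / B = 1 / (n - 1), the adaptation level C / (B + C) equals 1 / n,
# so a uniform pattern is suppressed at every intensity.
for intensity in (1.0, 1e3, 1e6):
    print(intensity, equilibrium_activity(uniform, intensity))

# Factorization: raising the intensity rescales every activity by one factor.
dim = equilibrium_activity(peaked, 1.0)
bright = equilibrium_activity(peaked, 1e3)
print(bright / dim)

# Total activity for a smaller C, compared with (B - (n - 1) C) I / (A + I).
total = equilibrium_activity(peaked, 1e3, C=0.1).sum()
print(total, (1.0 - (n - 1) * 0.1) * 1e3 / (1.0 + 1e3))
```

The ratio `bright / dim` should be the same for every cell, which is the numerical face of the factorization of reflectance from background intensity.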
By (3), $x$ is independent of the number $n$ of cells in the network if either $C = 0$ or $C/(B + C) = 1/n$. In every case, $x \leq B$ no matter how intense $I$ becomes, and $B$ is independent of $n$. This tendency for total activity not to grow with $n$ is called total activity normalization. Normalization implies that if the reflectance of one part of the input pattern increases while the total input activity remains fixed, then the cell activities corresponding to other parts of the pattern decrease. Weber law modulated reflectance processing helps to explain aspects of brightness constancy, whereas the normalization property helps to explain aspects of brightness contrast (Grossberg, 1981). The two types of property are complementary aspects of the same dynamical process.
22. Pattern Matching and Multidimensional Scaling Without a Metric
The interaction between reflectance processing and the adaptation level implies that the sum of two mismatched input patterns from two separate input sources will be inhibited by network (1). This is because the mismatched peaks and troughs of the two input patterns will add to yield an almost uniform total input pattern, which will be quenched by the noise suppression property. By contrast, the sum of two matched input patterns is a pattern with the same reflectances $\theta_i$ as the individual patterns. The total activity $I + J$ of the summed pattern, however, exceeds the total activities $I$ and $J$ of the individual patterns. Consequently, by (2) the activities in response to the summed pattern are
$$ x_i = \frac{(B + C)(I + J)}{A + I + J} \left[ \theta_i - \frac{C}{B + C} \right], \qquad (4) $$
which exceed the activities in response to the separate patterns. Network activity is thereby amplified in response to matched patterns and attenuated in response to mismatched patterns due to an interaction between reflectance processing, the adaptation level, and Weber law modulation. The fact that the activity of each cell in a competitive network can depend on how well two input patterns match is of great importance in my theory. Pattern matching is not just a local property of input sizes at each cell. A given cell can receive two different inputs, yet these inputs may be part of perfectly matched patterns, hence the cell activity is amplified. A given cell can receive two identical inputs, yet these inputs may be part of badly mismatched patterns, hence the cell activity is suppressed. This matching property avoids the homuncular dilemma by being an automatic consequence of the network's pattern registration process. Various models in Artificial Intelligence, by contrast, use a Euclidean distance $\sum_k (I_k - J_k)^2$ or some other metric to compute pattern matches (Klatt, 1980; Newell, 1980). Such an approach requires a separate processor to compute a scalar distance between two patterns before deciding how to tack the results of this scalar computation back onto the mainstream of computational activity. A metric also misses properties of the competitive matching process which are crucial in the study of spatial vision, as well as in other pattern recognition problems wherein multiple scales are needed to represent the data unambiguously. In the competitive matching process, a match not only encodes the matched pattern; it also amplifies it. A metric does not encode a pattern, because it is a scalar rather than a vector. A metric does not amplify the matched patterns because it is minimized rather than maximized by a pattern match. Moreover, what is meant by matching differs in a metric and in a shunting network.
A metric makes local matches between corresponding input intensities, whereas a network matches reflectances, which depend upon the entire pattern. One could of course use a metric to match ratios of input intensities, but this computation requires an extra homuncular processing step and is insensitive to overall input intensity, which is not true of the network matching mechanism. When the
long-range inhibitory term $\sum_{k \neq i} I_k$ in (1) is replaced by distance-dependent inhibitory interactions, as in equation (22) of Section 24, a global match of patterns is replaced by simultaneous local matches on a spatial scale that varies monotonically with receptive field size. Although the properties of metric matches are disappointing in comparison to properties of feedforward network matching, they are totally inadequate when compared to properties of feedback network matching. In a feedback context, there is a flexible criterion of matching called the quenching threshold (Section 28). This criterion can be tuned by attentional and other cognitive factors. Furthermore, approximately matched patterns can mutually deform one another into a fused composite pattern via positive feedback signaling (Ellias and Grossberg, 1975; Grossberg, 1980b). These properties endow the matching process with hysteresis properties that can maintain a match during slow deformations of the input patterns (Fender and Julesz, 1967). When matching occurs between ambiguous bottom-up input patterns and top-down expectancies, the pattern fusion property can complete the ambiguous data leading to a cognitively mediated percept (Gregory, 1966; Grossberg, 1980b). The primary use of network matching in my binocular theory is to show how those spatial scales which achieve the best binocular match of monocular data from the two eyes can resonate energetically, whereas those spatial scales which generate a mismatched binocular interpretation of the monocular data are energetically attenuated. The ease with which these multidimensional scaling effects occur is due to properties that obtain in even the simplest competitive networks.
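The amplification of matched patterns and the suppression of mismatched ones in the feedforward network (1) can be illustrated with a few lines of arithmetic. The sketch below is a hedged example rather than a model of any particular data: the four-cell patterns, the parameter values, and the helper name `network_response` are assumptions, and the adaptation level is set to $1/n$ so that a uniform summed pattern is quenched exactly.

```python
import numpy as np


def network_response(inputs, A=1.0, B=1.0, C=1.0 / 3.0):
    """Equilibrium of system (1) for raw inputs I_i (equivalent to equation (2))."""
    I = np.asarray(inputs, dtype=float)
    total = I.sum()
    return (B * I - C * (total - I)) / (A + total)


# Four cells, so C / (B + C) = 1 / 4 matches the uniform reflectance 1 / n.
first = 10.0 * np.array([0.4, 0.1, 0.4, 0.1])
matched = 10.0 * np.array([0.4, 0.1, 0.4, 0.1])
mismatched = 10.0 * np.array([0.1, 0.4, 0.1, 0.4])

alone = network_response(first)
with_match = network_response(first + matched)
with_mismatch = network_response(first + mismatched)

print("alone:         ", np.round(alone, 4))
print("matched sum:   ", np.round(with_match, 4))
print("mismatched sum:", np.round(with_mismatch, 4))
```

Note that every cell receives the same summed input in the mismatched case, yet no scalar distance between the two patterns is ever computed: the suppression falls out of the network's own equilibrium.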
I use the term “multidimensional scaling” deliberately, since similar competitive rules often operate on a higher perceptual and cognitive level (Grossberg, 1978e), where metrical concepts have also been used as explanatory tools (Osgood, Suci, and Tannenbaum, 1957; Shepard, 1980). An inadequate model of how cell activity reflects matching can limit a theory’s predictive range. For example, in a binocular context, I will use this relationship to suggest how several types of data can be related, including the coexistence of Fechner’s paradox and binocular brightness summation (Blake et al., 1981), and the choice between binocular fusion and rivalry within a given spatial scale (Kaufman, 1974; Kulikowski, 1978). A reason for binocular brightness summation is already evident in equation (4). The effects of activities $I$ and $J$ on $x_i$ exceed those expected from noninteracting independent detectors, but are less than the sum $I + J$, as a result of Weber law modulation (Cogan et al., 1982). In a feedback network, the inputs $I_i$ and $J_i$ are chosen to be sigmoid, or S-shaped, functions of the network activities at a prior processing stage. The sigmoid signals are needed to prevent the network as a whole from amplifying noise (Section 28). Then (4) is replaced by a nonlinear summation process that clarifies the success of power law and sigmoid summation rules in fitting data about spatial and binocular brightness interactions (Arend, Lange, and Sandick, 1981; Graham, 1981; Grossberg, 1981; Legge and Rubin, 1981).
23. Weber Law and Shift Property Without Logarithms
The simple equation (1) has other properties which are worthy of note. These properties describe other aspects of how the network retunes itself in response to changes in background activity. The simplest consequence of this retuning property is the classical Weber law
$$ \frac{\Delta I}{I} = \text{constant}, \qquad (5) $$
where $\Delta I$ is the just noticeable increment above a background intensity $I$. The approximate validity of (5) has encouraged the belief that logarithmic processing determines visual sensitivity (Cornsweet, 1970; Land, 1977), since $\Delta \log I = (\Delta I)/I$, despite the
fact that the logarithm exhibits unphysical infinities at small and large values of its argument. In fact, Cornsweet (1970) built separate theories of reflectance processing and of brightness perception by using logarithms to discuss reflectances and shunting functions like $I(A + J)^{-1}$ to discuss brightness. By contrast, shunting equations like (2) join together reflectance processing and brightness processing into a single computational framework. Power laws have often been used in psychophysics instead of logarithms (Stevens, 1959). It is therefore of interest that equation (2) guarantees reflectance processing undistorted by saturation if the inputs $I_i$ are power law outputs $I_i = \lambda J_i^p$ of the activities $J_i$ at a prior processing stage. Reflectance processing is preserved under power law transformations because the form of (2) is left invariant by such a transformation. In particular,
$$ x_i = \frac{(B + C) I}{A + I} \left[ \Theta_i - \frac{C}{B + C} \right], $$
where
$$ \Theta_i = J_i^p \left( \sum_{k=1}^{n} J_k^p \right)^{-1} $$
and
$$ I = \lambda \sum_{k=1}^{n} J_k^p. $$
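To show how the Weber law (5) approximately obtains in (2), choose
$$ I_1 = K + \Delta I $$
and
$$ I_2 = I_3 = \cdots = I_n = K. $$
Then the total input before increment $\Delta I$ is applied to $I_1$ is $I = nK$. By (2),
$$ x_1 = \frac{(B + C)(I + \Delta I)}{A + I + \Delta I} \left[ \frac{K + \Delta I}{I + \Delta I} - \frac{C}{B + C} \right]. $$
If $I \gg \Delta I$ and $n \gg 1$, then
$$ \frac{K + \Delta I}{nK + \Delta I} - \frac{C}{B + C} = \frac{\Delta I (n - 1)}{n (I + \Delta I)} + D \cong \frac{\Delta I}{I} + D, $$
where $D = 1/n - C/(B + C)$. If $I \gg A$, then
$$ \frac{(B + C)(I + \Delta I)}{A + I + \Delta I} \cong B + C. $$
Consequently
$$ x_1 \cong (B + C) \left( \frac{\Delta I}{I} + D \right). $$
If $x_1$ is detectable when it exceeds a threshold $\Gamma$, then
$$ \frac{\Delta I}{I} \cong W, $$
where
$$ W = \frac{\Gamma}{B + C} - D = \text{constant}. $$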
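The Weber fraction derived above can be compared with the exact threshold increment implied by (2). The following is a minimal sketch under assumed parameter values ($n = 100$, a threshold of 0.05, $A = B = 1$, $C = 0.005$); the function names `x1` and `threshold_increment` are hypothetical. Because the derivation keeps only first-order terms, the exact fraction is expected to level off near $W$ for $I \gg A$ rather than to equal it.

```python
import numpy as np


def x1(delta, I, n, A=1.0, B=1.0, C=0.005):
    """Equation (2) for cell 1 when I_1 = K + delta and all other inputs are K = I / n."""
    K = I / n
    total = I + delta
    return (B + C) * total / (A + total) * ((K + delta) / total - C / (B + C))


def threshold_increment(I, n, gamma, A=1.0, B=1.0, C=0.005):
    """Exact increment at which x1 reaches the threshold gamma.

    x1 is a ratio of two linear functions of delta, so x1 = gamma can be
    solved in closed form.
    """
    return (gamma * (A + I) - (B + C) * I / n + C * I) / (B - gamma)


n, gamma, B, C = 100, 0.05, 1.0, 0.005
D = 1.0 / n - C / (B + C)
W = gamma / (B + C) - D                 # approximate Weber fraction from the text

for I in (1e2, 1e4, 1e6):
    delta = threshold_increment(I, n, gamma)
    print(I, delta / I, W)
```

With these assumptions the exact ratio of increment to background settles to a constant within a few percent of $W$ once the background is large compared with $A$.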
A more precise version of the Weber law (5) is the shift property. This property says that the region of maximal visual sensitivity shifts without compression as the background off-intensity is parametrically increased (Werblin, 1971). The shift property obtains when the on-center input $I_i$ is plotted in logarithmic coordinates despite the fact that (2) does not describe logarithmic processing. The shift property is important in a multidimensional parallel processing framework wherein the number and intensity of active input sources can fluctuate wildly through time. Given the shift property, one can fix the activity scale $(-C, B)$ and the network’s output threshold once and for all without distorting the network’s decision rules as the inputs fluctuate through time. A fixed choice of operating range and of output thresholds is impossible in a multidimensional parallel processing theory that is built up from additive processors. If a fixed threshold is selective when $m$ converging input channels are active, then it may not generate any outputs whatsoever when $n < m$ input channels of comparable intensity are active, and may unselectively generate outputs whenever $n > m$ input channels are active. Such a theory needs continually to redefine how big its thresholds should be as the input load fluctuates through time. To derive the shift property, rewrite (2) as
$$ x_i = \frac{(B + C) I_i - C I}{A + I}. \qquad (17) $$
Also write $I_i$ in logarithmic coordinates as $M = \log_e I_i$, or $I_i = e^M$, and the total off-surround input as $L = \sum_{k \neq i} I_k$. Then, in logarithmic coordinates, (17) becomes
$$ x_i(M, L) = \frac{B e^M - C L}{A + L + e^M}. \qquad (18) $$
The question of shift invariance is: Does there exist a shift $S$ such that
$$ x_i(M + S, L_1) = x_i(M, L_2) \qquad (19) $$
for all $M$, where $S$ depends only on $L_1$ and $L_2$? The answer is yes if $C = 0$ (no hyperpolarization). Then
$$ S = \log_e \left( \frac{A + L_1}{A + L_2} \right), \qquad (20) $$
which shows that successively increasing $L$ by linear increments $\Delta L$ in (18) causes progressively smaller shifts $S$ in (20). In particular, if $L_1 = (n - 1) \Delta L$ and $L_2 = n \Delta L$, then $S$ approaches zero as $n$ approaches infinity. If $C > 0$, then (19) implies that
$$ S = \log_e \left[ \frac{AB + (B + C) L_1 + AC (L_1 - L_2) e^{-M}}{AB + (B + C) L_2} \right]. \qquad (21) $$
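The shift formulas can be verified directly by substituting them back into (18). The sketch below assumes $A = B = 1$ and two arbitrary off-surround levels; the function names `x_log` and `shift` are hypothetical. It checks that the shifted curve coincides with the unshifted one, and that equal increments of $L$ give shrinking shifts when $C = 0$.

```python
import numpy as np


def x_log(M, L, A=1.0, B=1.0, C=0.0):
    """Equation (18): activity as a function of M = log_e(I_i) and off-surround input L."""
    return (B * np.exp(M) - C * L) / (A + L + np.exp(M))


def shift(M, L1, L2, A=1.0, B=1.0, C=0.0):
    """Shift S from (21); it reduces to (20) when C = 0."""
    numerator = A * B + (B + C) * L1 + A * C * (L1 - L2) * np.exp(-M)
    return np.log(numerator / (A * B + (B + C) * L2))


M = np.linspace(-2.0, 6.0, 9)
L1, L2 = 10.0, 1.0

for C in (0.0, 0.1):
    S = shift(M, L1, L2, C=C)
    gap = np.max(np.abs(x_log(M + S, L1, C=C) - x_log(M, L2, C=C)))
    print(f"C={C}: S ranges over [{S.min():.4f}, {S.max():.4f}], residual {gap:.2e}")

# Equal increments of L (Delta L = 1) produce progressively smaller shifts when C = 0.
levels = np.arange(1, 8)
steps = shift(0.0, levels[:-1], levels[1:])
print(np.round(steps, 4))
```

For $C = 0$ the printed range of $S$ collapses to a single value, which is the statement that the curve shifts without compression.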
By (21), $S$ depends on $M$ only via term $AC(L_1 - L_2)e^{-M}$, which rapidly decreases as $M$ increases. Thus the shift property improves, rather than deteriorates, at the larger intensities $M$ which might have been expected to cause saturation. Moreover, if $B \gg C$, as occurs physically, then (20) is approximately valid at all values of $M \geq 0$.
24. Edge, Spatial Frequency, and Reflectance Processing by the Receptive Fields of Distance-Dependent Feedforward Networks
Equation (1) is based on several assumptions which do not always occur in vivo. It is the task of the mathematical classification theory to test the consequences of
modifying these assumptions. One such assumption says that the inhibitory inputs excite all off-surround channels with equal strength, as in term $-(x_i + C) \sum_{k \neq i} I_k$ of (1). Another assumption says that only the $i$th channel is excited by the $i$th input, as in term $(B - x_i) I_i$ of (1). In a general feedforward shunting network, both the excitatory and the inhibitory inputs can depend on the distance between cells, as in the feedforward network
$$ \frac{d}{dt} x_i = -A x_i + (B - x_i) \sum_{k=1}^{n} I_k D_{ki} - (x_i + C) \sum_{k=1}^{n} I_k E_{ki}. \qquad (22) $$
Here the coefficients $D_{ki}$ and $E_{ki}$ describe the fall-off with the distance between cells $v_k$ and $v_i$ of the excitatory and inhibitory influences, respectively, of input $I_k$ on cell $v_i$. Equation (22) exhibits variants of all the properties enjoyed by equation (1). These properties follow from the equilibrium activities of (22), namely
$$ x_i = \frac{F_i I}{A + G_i I}, \qquad (23) $$
where
$$ F_i = \sum_{k=1}^{n} \theta_k (B D_{ki} - C E_{ki}) \qquad (24) $$
and
$$ G_i = \sum_{k=1}^{n} \theta_k (D_{ki} + E_{ki}), \qquad (25) $$
in response to a sustained input pattern $I_i = \theta_i I$, $i = 1, 2, \ldots, n$. See Ellias and Grossberg (1975) and Grossberg (1981) for a discussion of these properties. For present purposes, I will focus on the fact that the noise suppression property in the network (22) implies an edge detection and spatial frequency detection capability in addition to its pattern matching capability. The noise suppression property in (23) is guaranteed by imposing the inequalities
$$ B \sum_{k=1}^{n} D_{ki} \leq C \sum_{k=1}^{n} E_{ki}, \qquad (26) $$
$i = 1, 2, \ldots, n$. Noise suppression follows from (26) because then all $x_i \leq 0$ in response to a uniform pattern (all $\theta_i = 1/n$) by (23) and (24). The inequalities (26) say, just as in Section 21, that there exists a matched symmetry-breaking between the spatial bandwidths of excitatory and inhibitory intercellular signaling and the choice of inhibitory and excitatory intracellular saturation points $-C$ and $B$, respectively. A distance-dependent network with the noise suppression property can detect edges and other nonuniform spatial gradients for the following reason. By (26), those cells $v_i$ which perceive a uniform input pattern within the breadth of their excitatory and inhibitory scales are suppressed by the noise suppression property no matter how intense the pattern activity is (Figure 11). Only those cells which perceive a nonuniform pattern with respect to their scales can generate suprathreshold activity. This is also true in a suitably designed additive network (Ratliff, 1965). When the interaction coefficients $D_{ki}$ and $E_{ki}$ of (22) are Gaussian functions of distance, as in $D_{ki} = D \exp[-\mu (k - i)^2]$ and $E_{ki} = E \exp[-\nu (k - i)^2]$, then the equilibrium activities $x_i$ in (23) include and generalize the model of receptive field properties that is currently used to fit a variety of visual data. In particular, the term $F_i$ in (24) that appears in the numerator of $x_i$ depends on sums of differences of Gaussians. Difference-of-Gaussian form factors for studying receptive field responses appear in the work of
Figure 11. When the feedforward competitive network is exposed to the pattern in (a), it suppresses both interior and exterior regions of the pattern that look uniform to cells at these pattern locations. The result is the differential amplification of pattern regions which look nonuniform to the network, as in (b).
various authors (Blakemore, Carpenter, and Georgeson, 1970; Ellias and Grossberg, 1975; Enroth-Cugell and Robson, 1966; Levine and Grossberg, 1976; Rodieck and Stone, 1965; Wilson and Bergen, 1979). At least three properties of (23) can distinguish it from an additive difference-of-Gaussian theory. The first is that each difference-of-Gaussian form factor $B D_{ki} - C E_{ki}$ in (24) multiplies, or weights, a reflectance $\theta_k$, and all the weighted reflectances are Weber-modulated by a ratio of the background input $I$ to itself. The difference-of-Gaussian receptive field $B D_{ki} - C E_{ki}$ thereby becomes a weighting term in the reflectance processing of the network as a whole. The second property is that each difference-of-Gaussian factor $B D_{ki} - C E_{ki}$ is itself weighted by the excitatory saturation point $B$ and the inhibitory saturation point $C$ of the network, by contrast with a simple difference-of-Gaussian $D_{ki} - E_{ki}$. In networks in which zero spatial frequencies are exactly canceled by their receptive fields, the symmetry-breaking inequality $B \gg C$ of the shunting model predicts that the ratio $\mu \nu^{-1}$ of excitatory to inhibitory spatial bandwidths should be larger in a shunting theory than in an additive theory. A third way to distinguish experimentally between additive and shunting receptive field models is to test whether the contrast of the patterned responses changes as a function of suprathreshold background luminance. In an additive theory, the answer is no. In a distance-dependent shunting equation such as (23), the answer is yes. This breakdown is numerically and mathematically analysed in Ellias and Grossberg (1975). The ratios which determine $x_i$ in (23) lead to changes of contrast as the background intensity $I$ increases only because the coefficients $D_{ki}$ and $E_{ki}$ are distance-dependent. In a shunting network with a very narrow excitatory bandwidth and a very broad inhibitory bandwidth, the relative sizes of the $x_i$ are independent of $I$.
The contrast changes which occur as I increases in the distance-dependent case can be viewed as a partial breakdown of reflectance processing at high I levels due to the inability of inhibitory gain control to compensate fully for saturation effects. The edge enhancement property of a feedforward competitive network confronts us with the full force of the filling-in dilemma. If only edges can be detected by a network once it is constrained to satisfy, even approximately, such a basic property as noise suppression, then how does the visual system spontaneously fill-in among the edges to generate percepts of solid objects embedded in continuous media?
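The edge enhancement described in this section can be reproduced with a small numerical sketch of the equilibrium of (22). Everything specific here is an assumption made for illustration: the Gaussian rate constants, the 200-cell field, the wrap-around distances (used only so that the ends of the array do not act as edges), and the function names `gaussian_kernels` and `equilibrium`. The inhibitory kernel is scaled so that the noise suppression inequality (26) holds with equality.

```python
import numpy as np


def gaussian_kernels(n, mu=0.5, nu=0.02, B=1.0, C=0.25):
    """Gaussian coefficients D_ki and E_ki, scaled so that (26) holds with equality.

    Wrap-around distances are an assumption made here so that every cell
    sees the same kernel sums and the ends of the array create no edges.
    """
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(dist, n - dist)
    D = np.exp(-mu * dist**2)
    E = np.exp(-nu * dist**2)
    E *= B * D[0].sum() / (C * E[0].sum())
    return D, E


def equilibrium(inputs, D, E, A=1.0, B=1.0, C=0.25):
    """Equilibrium of (22), written for raw inputs I_k = theta_k * I."""
    I = np.asarray(inputs, dtype=float)
    excitation = I @ D                   # sum over k of I_k D_ki
    inhibition = I @ E                   # sum over k of I_k E_ki
    return (B * excitation - C * inhibition) / (A + excitation + inhibition)


n = 200
D, E = gaussian_kernels(n)

rectangle = np.zeros(n)
rectangle[70:130] = 10.0                 # a bright bar on a dark background

x = equilibrium(rectangle, D, E)
print("peak cell:", int(np.argmax(x)), "peak activity:", round(float(x.max()), 4))
print("activity at the centre of the bar:", float(x[100]))
print("largest response to a uniform field:",
      float(np.abs(equilibrium(np.full(n, 10.0), D, E)).max()))
```

Under these assumptions the depolarized activity concentrates just inside the two borders of the bar, while the centre of the bar and a uniform field both produce essentially no response, which is the filling-in dilemma in miniature.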
25. Statistical Analysis by Structural Scales: Edges With Scaling and Reflectance Properties Preserved
Before facing this dilemma, I need to review other properties of the excitatory input term $I_k D_{ki}$ and the inhibitory input term $I_k E_{ki}$ in (22). Let the interaction coefficients $D_{ki}$ and $E_{ki}$ be distance-dependent, so that $D_{ki} = D(|k - i|)$ and $E_{ki} = E(|k - i|)$, where the functions $D(j)$ and $E(j)$ are decreasing functions of $j$, such as Gaussians. Then the input terms $\sum_{k=1}^{n} I_k D_{ki}$ cross-correlate the input pattern $(I_1, I_2, \ldots, I_n)$ with the kernel $D(j)$. Similarly, the input terms $\sum_{k=1}^{n} I_k E_{ki}$ cross-correlate the input pattern $(I_1, I_2, \ldots, I_n)$ with the kernel $E(j)$. These statistics of the input pattern, rather than the input pattern itself, are the local data to which the network reacts. I will call the kernels $D(j)$ and $E(j)$ structural scales of the network to distinguish them from the functional scales that will be defined below. The structural scales perform a statistical analysis of the data before the shunting dynamics further transform these data statistics. Although terms like $\sum_{k=1}^{n} I_k D_{ki}$ are linear functions of the inputs $I_k$, the inputs are themselves often nonlinear (notably S-shaped or sigmoidal) functions of outputs from prior network stages (Section 28). Thus the statistical analysis of input patterns is in general a nonlinear summation process. These concepts are elementary, as well as insufficient, for our purposes. It is, however, instructive to review how statistical preprocessing of an input pattern influences the network's reaction to patterns more complex than a rectangle, say, a periodic pattern of high spatial frequency bars superimposed on a periodic pattern of low spatial frequency bars (Figure 12a). Suppose for definiteness that the excitatory scale $D(j)$ is narrower than the inhibitory scale $E(j)$ to prevent the occurrence of spurious peak splits and multiple edge effects that can occur even in a feedforward network’s response to spots and bars of input (Ellias and Grossberg, 1975). Then the excitatory structural bandwidth determines a unit length over which input data is statistically pooled, whereas the inhibitory structural bandwidth determines a unit length over which the pooled data of nearby populations are evaluated for their uniformity. It is easily seen that a feedforward network in which featural noise suppression holds and whose excitatory bandwidth approximates $a$ can react to the input pattern with a periodic series of smoothed bumps (Figure 12b). By contrast, a network whose excitatory bandwidth equals period $2a$ but is less than the entire pattern width reacts only to the smoothed edges of the input pattern (Figure 12c). The interior of the input pattern is statistically uniform with respect to the larger structural scale, and therefore its interior is inhibited by noise suppression. As the excitatory bandwidth increases further, the smoothed edges are lumped together until the pattern generates a single centered hump, or spot, of network activity (Figure 12d). This example illustrates how the interaction of a broad structural scale with the noise suppression mechanism can inhibit all but the smoothed edges of a finely and regularly textured input pattern. After inhibition takes place, the spatial breadth of the surviving edge responses depends on both the input texture and the structural scale; the edges have not lost their scaling properties.
The peak heights of these edge responses compute a measure of the pattern's reflectances near its boundary, since ratios of input intensities across the network determine the steady-state potentials $x_i$ in (23). Rather than discard these monocular scaling and lightness properties, as in a zero-crossing computation, I will use them in an essential way below as the data with which to build up binocular resonances.
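As a hedged illustration of this statistical preprocessing, the sketch below cross-correlates an input pattern with two Gaussian kernels and then evaluates the equilibrium of a feedforward shunting network with decay $-Ax_i$, excitation $(B - x_i)\sum_k I_k D_{ki}$, and inhibition $-(x_i + C)\sum_k I_k E_{ki}$. The Gaussian widths, the parameter values, the rectangular test pattern, the balanced choice of $C$, and all function names are assumptions made for illustration; they are not taken from the text.

```python
import numpy as np

def gaussian_kernel_matrix(n, sigma):
    """Matrix with entries K[k, i] = exp(-(k - i)^2 / (2 sigma^2))."""
    idx = np.arange(n)
    return np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2.0 * sigma ** 2))

def feedforward_steady_state(I, sigma_D, sigma_E, A=1.0, B=1.0, C=None):
    """Equilibrium of dx_i/dt = -A x_i + (B - x_i) exc_i - (x_i + C) inh_i.

    exc_i and inh_i are the cross-correlations of the input pattern with the
    excitatory kernel D and the inhibitory kernel E (assumed Gaussian here).
    """
    n = I.size
    D = gaussian_kernel_matrix(n, sigma_D)
    E = gaussian_kernel_matrix(n, sigma_E)
    if C is None:
        # Assumption: balance the kernels so that B * sum(D) = C * sum(E),
        # which makes a spatially uniform region produce zero net drive.
        mid = n // 2
        C = B * D[:, mid].sum() / E[:, mid].sum()
    exc = D.T @ I   # sum_k I_k D_ki
    inh = E.T @ I   # sum_k I_k E_ki
    # Setting dx_i/dt = 0 and solving for x_i:
    return (B * exc - C * inh) / (A + exc + inh)

# Rectangular input pattern (illustrative): cells 50..150 receive intensity 1.
I = np.zeros(200)
I[50:151] = 1.0
x = feedforward_steady_state(I, sigma_D=3.0, sigma_E=8.0)
```

With these assumed settings the response is concentrated near the two edges of the rectangle, while the interior, which is statistically uniform with respect to both kernels, is suppressed. Replacing the rectangle by superimposed bar patterns and widening `sigma_D` and `sigma_E` is one way to explore the kind of transitions described for Figure 12.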
26. Correlation of Monocular Scaling With Binocular Fusion
The sequence of activity patterns in Figures 12b, 12c, and 12d is reversed when an observer steadily approaches the picture in Figure 12a. Then the spot in Figure 12d bifurcates into two boundary responses, which in turn bifurcate into a regular pattern of smoothed bumps, which finally bifurcate once again to reveal the high frequency components within each bump. If the picture starts out sufficiently far away from the observer, then the first response in each of the observer's spatial scales is a spot, and the bifurcations in the spot will occur in the same order. However, the distance at which a given bifurcation occurs depends on the spatial scale in question. Other things being equal, a prescribed bifurcation will occur at a greater distance if the excitatory bandwidth of the spatial scale is narrower (high spatial frequency). Furthermore, the registration of multiple spatial frequencies (or even of multiple spots) in the picture will not occur in a spatial scale whose excitatory bandwidth is too broad (low spatial frequency).
The same sequence of bifurcations can occur within the multiple spatial scales corresponding to each eye. If the picture is simultaneously viewed by both eyes, the question naturally arises: How do the two activity patterns within each monocular scale binocularly interact at each distance? Let us assume for the moment, as in the Kaufman (1974) and Kulikowski (1978) experiments, that as the disparity of two monocular patterns increases, it becomes harder for the high spatial frequency scales to fuse them. Since disparity decreases with increasing distance, the high spatial frequency scales can fuse the two monocular patterns (assuming they are detectable at all) when the distance is great enough, but the lower spatial frequency scales can maintain fusion over a broader range of decreasing distances than can the higher spatial frequency scales.
Other things being equal, the scales which can most easily binocularly fuse their two monocular representations of a picture at a given distance are the scales which average away the finer features in the picture. It therefore seems natural to ask: Does the broad spatial smoothing within low spatial frequency scales enhance their ability to binocularly fuse disparate monocular activity patterns? Having arrived at this issue, we now need to study those properties of feedback competitive shunting networks that will be needed to design scale-sensitive binocular resonances in which the fusion event is only one of a constellation of interrelated depth, length, and lightness properties.
Figure 12. The response of a network to a pattern (a) with multiple spatial frequencies progressively alters from (b) through (d) as the structural scales of the network expand.
27. Noise Suppression in Feedback Competitive Networks
The noise-saturation dilemma confronts all cellular tissues which process input patterns, whether the cells exist in a feedforward or in a feedback anatomy. As part of the mathematical classification theory, I will therefore consider shunting interactions in a feedback network wherein excitatory signals are balanced by inhibitory ones. Together, these feedback signals are capable of retuning network sensitivity in response to fluctuating background activity levels. The feedback analog of the distance-dependent feedforward network (22) is
$$\frac{d}{dt}x_i = -A x_i + (B - x_i)\left[J_i + \sum_{k=1}^{n} f(x_k) D_{ki}\right] - (x_i + C)\left[K_i + \sum_{k=1}^{n} g(x_k) E_{ki}\right], \tag{27}$$
$i = 1, 2, \ldots, n$. As in (22), term $-A x_i$ describes the spontaneous decay of activity at rate $-A$. Term $(B - x_i)J_i$ describes the excitatory effect of the feedforward excitatory input $J_i$, which was chosen equal to $\sum_{k=1}^{n} I_k D_{ki}$ in (22). Term $-(x_i + C)K_i$ is also a feedforward term due to inhibition of activity by the feedforward inhibitory input $K_i$, which was chosen equal to $\sum_{k=1}^{n} I_k E_{ki}$ in (22). The new excitatory feedback term $\sum_{k=1}^{n} f(x_k) D_{ki}$ describes the total effect of all the excitatory feedback signals $f(x_k) D_{ki}$ from the cells $v_k$ to $v_i$. The function $f(x_i)$ transmutes the activity, or potential, $x_i$ of $v_i$ into a feedback signal $f(x_i)$, which can be interpreted either as a density of spikes per unit time interval or as an electrotonic influence, depending on the situation. The inhibitory feedback term $\sum_{k=1}^{n} g(x_k) E_{ki}$ determines the total effect of all the inhibitory feedback signals $g(x_k) E_{ki}$ from the cells $v_k$ to $v_i$. As in (22), the interaction coefficients $D_{ki}$ and $E_{ki}$ are often defined by kernels $D(j)$ and $E(j)$, such that $E(j)$ decreases more slowly than $D(j)$ as a function of increasing values of $j$. The problem of noise suppression is just as basic in feedback networks as in feedforward networks. Suppose, for example, that the feedforward inputs and the feedback signals both use the same interneurons and the same statistics of feedback signaling ($f(x_i) = g(x_i)$) to distribute their values across the network. Then (27) becomes
$$\frac{d}{dt}x_i = -A x_i + (B - x_i)\sum_{k=1}^{n}\left[I_k + f(x_k)\right] D_{ki} - (x_i + C)\sum_{k=1}^{n}\left[I_k + f(x_k)\right] E_{ki}, \tag{28}$$
i = 1,2,...,n. In such a network, the same criterion of uniformity is applied both to feedforward and to feedback signals. Both processes share the same structural scales. Correspondingly, in (28) as in (22) the single inequality
suffices to suppress both uniform feedforward patterns and uniform feedback patterns.
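To make the relation between the two feedback equations concrete, the following sketch evaluates the right-hand side of a network with separate feedforward inputs $J_i$, $K_i$ and feedback signals $f$, $g$, and checks numerically that it coincides with the shared-statistics form when $f = g$, $J_i = \sum_k I_k D_{ki}$, and $K_i = \sum_k I_k E_{ki}$. The Gaussian kernels, the particular sigmoid, the parameter values, and all function names are illustrative assumptions rather than choices made in the text.

```python
import numpy as np

def kernel(n, sigma):
    """Assumed Gaussian kernel matrix with entries K[k, i] = exp(-(k-i)^2 / (2 sigma^2))."""
    idx = np.arange(n)
    return np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2.0 * sigma ** 2))

def rhs_general(x, J, K, D, E, f, g, A, B, C):
    """Right-hand side with separate feedforward inputs J, K and signals f, g."""
    exc = J + D.T @ f(x)   # J_i + sum_k f(x_k) D_ki
    inh = K + E.T @ g(x)   # K_i + sum_k g(x_k) E_ki
    return -A * x + (B - x) * exc - (x + C) * inh

def rhs_shared(x, I, D, E, f, A, B, C):
    """Right-hand side when inputs and feedback share kernels and f = g."""
    drive = I + f(x)       # I_k + f(x_k)
    return -A * x + (B - x) * (D.T @ drive) - (x + C) * (E.T @ drive)

def sigmoid(w):
    """Illustrative sigmoid signal w^2 / (0.25 + w^2)."""
    return w ** 2 / (0.25 + w ** 2)

rng = np.random.default_rng(0)
n = 30
D, E = kernel(n, 1.5), kernel(n, 4.0)
I = rng.uniform(0.0, 1.0, n)   # arbitrary input pattern
x = rng.uniform(0.0, 1.0, n)   # arbitrary activity pattern
A, B, C = 1.0, 1.0, 0.3

general = rhs_general(x, D.T @ I, E.T @ I, D, E, sigmoid, sigmoid, A, B, C)
shared = rhs_shared(x, I, D, E, sigmoid, A, B, C)
```

The two vectors agree because both kernels act linearly on the sum of input and feedback signal; this is only a consistency check on the algebra, not a simulation of the network.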
28. Sigmoid Feedback Signals and Tuning
Another type of noise suppression, called signal noise suppression, is also needed for a feedback network to function properly. This is true because certain positive feedback functions $f(w)$ can amplify even very small activities $w$ into large activities. Noise amplification due to positive feedback signaling can flood the network with internally generated noise capable of massively distorting the processing of feedforward inputs. Pathologies of feedback signaling have been suggested to cause certain seizures and hallucinations (Ellias and Grossberg, 1975; Grossberg, 1973; Kaczmarek and Babloyantz, 1977).
Figure 13. A sigmoid signal $f(w)$ of cell activity $w$ can suppress noise, contrast enhance suprathreshold activities, normalize total activity, and store the contrast enhanced and normalized pattern in short term memory within a suitably designed feedback competitive network. (Horizontal axis: activity.)
In Grossberg (1973), I proved as part of the mathematical classification theory that the simplest physically plausible feedback signal which is capable of attenuating, rather than amplifying, small activities is a sigmoid, or S-shaped, signal function (Figure 13). Several remarks should be made about this result. The comment is sometimes made that you only need a signal threshold to prevent noise amplification (Figure 13). This is true, but insufficient, because a threshold signal function does not perform the same pattern transformation as a sigmoid signal function. For example, in a shunting network with a narrow on-center and a broad off-surround, a threshold signal chooses the population that receives the largest input for activity storage and suppresses the activities of all other populations. By contrast, a sigmoid signal implies the existence of a quenching threshold (QT). This means that the activities of populations whose initial activation is less than the QT are suppressed, whereas the activity pattern of populations whose initial activities exceed the QT is contrast enhanced before being stored. I identify this storage process with storage in short term memory (STM). In a network that possesses a QT, any operation which alters the QT can sensitize or desensitize the network's ability to store input data (Figure 14). This tuning property is trivialized in a network that chooses the population which receives the largest input for STM storage. In either case, a nonlinear signal function is needed to prevent noise amplification in a feedback network. This fact presents a serious challenge to all linear feedforward models, such as Fourier and Gaussian models.
Figure 14. In (a) and (b), the same input pattern is differently transformed and stored in short term memory due to different settings of the network quenching threshold. (Panel titles: pattern before storage; pattern after storage.)
A proper choice of signal function can be made by mathematically classifying how different signal functions transduce input patterns before they are stored in STM. Consider, for example, the following special case of (28):
$$\frac{d}{dt}x_i = -A x_i + (B - x_i)\left[I_i + f(x_i)\right] - x_i\left[J_i + \sum_{k \neq i} f(x_k)\right], \tag{29}$$
$i = 1, 2, \ldots, n$. In (29), the competitive feedback term $\sum_{k \neq i} f(x_k)$ describes long-range lateral inhibition, just like term $\sum_{k \neq i} I_k$ in the feedforward network (1). Network (29) strips away all extraneous factors to focus on the following issue. After an input pattern $(I_1, I_2, \ldots, I_n, J_1, J_2, \ldots, J_n)$ delivered before time $t = 0$ establishes an initial pattern $(x_1(0), x_2(0), \ldots, x_n(0))$ in the network's activities, how does feedback signaling within the network transform the initial pattern before it is stored in STM? This problem was solved in Grossberg (1973). Table 1 summarizes the main features of the solution. The function $g(w) = w^{-1} f(w)$ is graphed in Table 1 because the property that determines the pattern transformation is whether $g(w)$ is an increasing, constant, or decreasing function at prescribed activities $w$. For example, a linear $f(w) = aw$ determines a constant $g(w) = a$; a slower-than-linear $f(w) = aw(b + w)^{-1}$ determines a decreasing $g(w) = a(b + w)^{-1}$; a faster-than-linear $f(w) = aw^n$, $n > 1$, determines an increasing $g(w) = aw^{n-1}$; and a sigmoid signal function $f(w) = aw^2(b + w^2)^{-1}$ determines a concave $g(w) = aw(b + w^2)^{-1}$. Both linear and slower-than-linear signal functions amplify noise, and are therefore unsatisfactory. Faster-than-linear signal functions, such as power laws with powers greater than one, or threshold rules, suppress noise so vigorously that they make a choice. Sigmoid signal functions determine a QT by mixing together properties of the other types of signal functions.
Another important point is that the QT does not equal the turning point, or manifest threshold, of the sigmoid signal function. The QT depends on all of the parameters of the network. This fact must be understood to argue effectively that the breakdown of any of several mechanisms can induce pathological network properties, such as seizures or hallucinations, by causing the QT to assume abnormally small values. Similarly, an understanding of the factors that control the QT is needed to analyse possible attentional and cognitive mechanisms that can modulate how precise a binocular or bottom-up and top-down match has to be in order to generate fusion and resonance.
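The classification by the shape of $g(w) = w^{-1} f(w)$ can be checked numerically. The sketch below, under assumed parameter values $a = 1$, $b = 0.5$, $n = 2$ and an assumed activity grid, labels $g$ for the four signal functions named above; the helper names are hypothetical. For the sigmoid, the check reports that $g$ first rises and then falls over the sampled range.

```python
import numpy as np

def g_of(f, w):
    """g(w) = f(w) / w for a signal function f."""
    return f(w) / w

def classify(g_values, tol=1e-12):
    """Label a sampled function by the signs of its successive differences."""
    steps = np.diff(g_values)
    signs = np.sign(steps[np.abs(steps) > tol])
    if signs.size == 0:
        return "constant"
    if np.all(signs > 0):
        return "increasing"
    if np.all(signs < 0):
        return "decreasing"
    if signs[0] > 0 and np.count_nonzero(np.diff(signs)) == 1:
        return "increasing then decreasing"
    return "other"

# Assumed illustrative parameters and activity grid (not from the text).
a, b, n = 1.0, 0.5, 2
w = np.linspace(0.01, 2.0, 400)

signal_functions = {
    "linear": lambda w: a * w,
    "slower-than-linear": lambda w: a * w / (b + w),
    "faster-than-linear": lambda w: a * w ** n,
    "sigmoid": lambda w: a * w ** 2 / (b + w ** 2),
}

shapes = {name: classify(g_of(f, w)) for name, f in signal_functions.items()}
```

The sigmoid case is the only one whose $g$ changes direction, which is one way to read the remark that it mixes properties of the other signal functions.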
A formula for the QT of (29) has been computed when this network is in its short term memory mode (set all inputs $I_i = J_i = 0$). Let the feedback signal function $f(w)$ satisfy
$$f(w) = C w g(w), \tag{30}$$
where $C \geq 0$, $g(w)$ is increasing if $0 \leq w \leq x^{(1)}$, and $g(w) = 1$ if $x^{(1)} \leq w \leq B$. Thus $f(w)$ grows faster-than-linearly if $0 \leq w \leq x^{(1)}$, linearly if $x^{(1)} \leq w \leq B$, and attains a maximum value of $BC$ at $w = B$ within the activity interval from $0$ to $B$. The values of $f(w)$ at activities $w \geq B$ do not affect network dynamics because each $x_i \leq B$ in (29). It was proved in Grossberg (1973, pp. 355-359) that the QT of (29) is
$$\mathrm{QT} = \frac{x^{(1)}}{B - A C^{-1}}. \tag{31}$$
By (31), the QT is not the manifest threshold of $f(w)$, which occurs where $g(w)$ is increasing. Rather, the QT depends on the transition activity $x^{(1)}$ where $f(w)$ changes from faster-than-linear to linear, upon the overall slope $C$ of the signal function in the physiological range, upon the number $B$ of excitable sites in each population, and upon the decay rate $A$. By (31), an increase in $C$ causes a decrease in the QT. Increasing a shunting signal $C$ that nonspecifically gates all the network's feedback signals can thereby facilitate STM storage. Such a decrease in the QT can facilitate binocular matching by weakening the criterion of how well matched two input patterns need to be in order for some network nodes to supraliminally reverberate in STM. It cannot be overemphasized that this and other desirable tuning properties of competitive feedback networks depend upon the existence of a nonlinear signal function $f(w)$. For example, if $f(w)$ is linear, then $x^{(1)} = 0$ in (30) and the QT = 0 by (31). Then all positive network activities, no matter how small, can be amplified and stored in STM, including activities due to internal cellular noise.
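The dependence of the QT on the network parameters can be tabulated with a few lines of code. The sketch assumes that the QT has the form $x^{(1)} / (B - A C^{-1})$, which is my reading of (31); the function name and the numerical values are hypothetical.

```python
def quenching_threshold(x1, A, B, C):
    """Assumed QT formula: x1 / (B - A / C).

    x1: transition activity where f changes from faster-than-linear to linear
    A:  decay rate, B: number of excitable sites, C: slope of the signal function
    """
    if C <= 0:
        raise ValueError("C must be positive")
    denominator = B - A / C
    if denominator <= 0:
        raise ValueError("B - A/C must be positive")
    return x1 / denominator

# Illustrative values only: raising the gain C lowers the QT.
qt_low_gain = quenching_threshold(x1=0.2, A=0.1, B=1.0, C=0.5)
qt_high_gain = quenching_threshold(x1=0.2, A=0.1, B=1.0, C=2.0)
# A linear signal function corresponds to x1 = 0, so the QT vanishes.
qt_linear = quenching_threshold(x1=0.0, A=0.1, B=1.0, C=1.0)
```

Under this assumed formula, the comparison reproduces the two qualitative statements in the text: a larger $C$ gives a smaller QT, and a linear signal function gives QT = 0.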
Table 1. Influence of signal function $f(w)$ on input pattern transformation and short term memory storage. (Outcome labels, in row order: amplifies noise; amplifies noise; quenches noise; quenches noise.)
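Two of the outcomes summarized in Table 1 can be reproduced with a small simulation. The sketch below integrates a network of the form $\dot{x}_i = -A x_i + (B - x_i) f(x_i) - x_i \sum_{k \neq i} f(x_k)$, which is how I read (29) in its short term memory mode, using a forward Euler scheme. The step size, duration, parameters $A = 0.1$, $B = 1$, the initial pattern, and the function names are all assumptions.

```python
import numpy as np

def simulate_stm(x0, f, A=0.1, B=1.0, dt=0.01, steps=20000):
    """Forward Euler integration of the assumed STM-mode network."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        fx = f(x)
        total = fx.sum()
        # Self-excitation (B - x_i) f(x_i); inhibition from all other cells.
        dx = -A * x + (B - x) * fx - x * (total - fx)
        x = x + dt * dx
    return x

x0 = np.array([0.2, 0.5, 0.3])   # illustrative initial pattern

stored_linear = simulate_stm(x0, lambda w: w)        # linear signal
stored_choice = simulate_stm(x0, lambda w: w ** 2)   # faster-than-linear signal
```

In this sketch the linear signal stores the initial relative activities unchanged while the total activity settles near $B - A$, and the quadratic signal suppresses all but the initially largest activity. Because a linear signal preserves every ratio, arbitrarily small initial activities would be stored as well, which is the noise amplification problem noted above.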
29. The Interdependence of Contrast Enhancement and Tuning
The existence of a QT suggests that the contrast enhancement of input patterns that is ubiquitous in the nervous system is not an end in itself (Ratliff, 1965). In feedback competitive shunting networks, contrast enhancement is a mathematical consequence of the signal noise suppression property. This fact is emphasized by the observation that linear feedback signals can perfectly store an input pattern's reflectances (in particular, they do not enhance the pattern), but only at the price of amplifying network noise (Table 1). Contrast enhancement by a feedback network in its suprathreshold activity range follows from noise suppression by the network in its subthreshold activity range. Contrast enhancement can intuitively be understood if a feedback competitive network possesses a normalization property like that of a feedforward competitive network (Section 21). If small activities are attenuated by noise suppression and total activity is approximately conserved due to normalization, then large activities will be enhanced.
The simplest example of total activity normalization in a feedback competitive network follows. Consider network (29) in its short term memory mode (all inputs $I_i = J_i = 0$). Let $x = \sum_{i=1}^{n} x_i$ be the total STM activity and let $F = \sum_{i=1}^{n} f(x_i)$ be the total feedback signal. Sum over the index $i$ in (29) to find that
$$\frac{d}{dt}x = -Ax + (B - x)F. \tag{32}$$
To solve for the possible equilibrium activities of $x(t)$, let $dx/dt = 0$ in (32). Then
$$\frac{Ax}{B - x} = F. \tag{33}$$
By Table 1, a network with a faster-than-linear signal function chooses just one activity, say $x_i$, for storage in STM. Hence only one summand in $F$ remains positive as time goes on, and its $x_i(t)$ value approaches that of $x(t)$. Thus (33) can be rewritten as
$$\frac{Ax}{B - x} = f(x), \tag{34}$$
or equivalently
$$\frac{A}{B - x} = g(x). \tag{35}$$
Equation (35) is independent of the number of active cells. Hence the total stored STM activity is independent of the number of active cells. The limiting equation (33) is analysed for other choices of signal function in Grossberg (1973).
30. Normalization and Multistability in a Feedback Competitive Network: A Limited Capacity Short Term Memory System
Thus suitably designed feedback competitive networks do possess a normalization property. Recall from Section 21 that in a feedforward competitive network, the total activity can increase with the total input intensity but is independent of the number of active cells. This is true only if the inhibitory feedforward interaction $\sum_{k \neq i} I_k$ in (1) is of long range across the network cells. If the strengths of the inhibitory pathways are weakened or fall off rapidly with distance, then the normalization property is weakened also, and saturation can set in at high input intensities. The same property tends to hold for the feedforward terms $(B - x_i)J_i$ and $-(x_i + C)K_i$ of (27).
The normalization property of a feedback competitive network is more subtle (Grossberg, 1973, 1981). If such a network is excited to suprathreshold activities and if the exciting inputs are then terminated, then the total activity of the network can approach one of perhaps several positive equilibrium values, all of which tend to be independent of the number of active cells. Thus if the activity of one cell is for some reason increased, then the activities of other cells will decrease to satisfy the normalization constraint unless the system as a whole is attracted to a different equilibrium value. This limited capacity constraint on short term memory is an automatic property in our setting. It is postulated without a mechanistic explanation in various other accounts of short term memory processing (Raaijmakers and Shiffrin, 1981, p. 126). The existence of multistable equilibria in a competitive feedback network is illustrated by equation (35). When $f(w)$ is a faster-than-linear signal function, both $A(B - x)^{-1}$ and $g(x)$ in (35) are increasing functions of $x$, $0 \leq x \leq B$, and $g(x)$ may be chosen so that these functions intersect at arbitrarily many values $E_1, E_2, \ldots$ of $x$. Every other value in such a sequence is a possible stable equilibrium point of $x$, and the remaining values are unstable equilibrium points of $x$. By contrast, if $g(w)$ is a concave function of $w$, as when $f(w)$ is a sigmoid signal function, a tendency exists for the suprathreshold equilibria of $x$ to be unique or closely clustered together. These assertions are mathematically characterized in Grossberg (1973).
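The alternation of stable and unstable equilibria can be illustrated by locating the intersections of $A(B - x)^{-1}$ and $g(x)$ on a grid. The sketch assumes the single-active-cell dynamics $\dot{x} = -Ax + (B - x)f(x) = x(B - x)\left[g(x) - A(B - x)^{-1}\right]$, so that the sign of the bracket gives the direction of flow; the choice $f(w) = w^2$ (hence $g(x) = x$), the parameters $A = 0.1$, $B = 1$, and the function name are assumptions.

```python
import numpy as np

def total_activity_equilibria(g, A, B, num=200001):
    """Positive roots of g(x) = A / (B - x) on (0, B), with a stability flag.

    Returns a list of (root, stable) pairs. A root is flagged stable when
    h(x) = g(x) - A / (B - x) changes sign from positive to negative there,
    because dx/dt has the same sign as h under the assumed dynamics.
    """
    x = np.linspace(1e-6, B - 1e-6, num)
    h = g(x) - A / (B - x)
    roots = []
    for i in np.nonzero(np.sign(h[:-1]) * np.sign(h[1:]) < 0)[0]:
        # Linear interpolation of the zero crossing between grid points.
        root = x[i] - h[i] * (x[i + 1] - x[i]) / (h[i + 1] - h[i])
        roots.append((float(root), bool(h[i] > 0)))
    return roots

# Faster-than-linear example f(w) = w^2, so that g(x) = x.
equilibria = total_activity_equilibria(lambda x: x, A=0.1, B=1.0)
```

For this example the two intersections are the roots of $x(1 - x) = 0.1$; the smaller one is unstable and the larger one is stable, in line with the every-other-value pattern described above. A signal function whose $g$ crosses $A(B - x)^{-1}$ more often would yield a longer alternating sequence.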
31. Propagation of Normalized Disinhibitory Cues
Just as in feedforward networks, the feedback normalization property is weakened if the inhibitory path strengths are chosen to decrease more rapidly with distance. Then the normalization property tends to hold among subsets of cells that lie within one bandwidth of the network's inhibitory structural scale. In particular, if some cell activities are enhanced by a given amount, then their neighbors will tend to be suppressed by a comparable amount. The neighbors of these neighbors will then be enhanced by a similar amount, and so on. In this way, a disinhibitory wave can propagate across a network in such a way that each crest of the wave inherits, or "remembers," the activity of the previous crest. This implication of the normalization property in a feedback network with finite structural scales will be important in my account of filling-in. Normalization within a structural scale also endows the network's activity patterns with constancy and contrast patterns, as in the case of feedforward competitive networks (Section 24). In a feedback context, however, constancy and contrast properties can propagate far beyond the confines of a single structural scale because of normalized disinhibitory properties such as those Figure 15 depicts.
32. Structural versus Functional Scales
The propagation process depicted in Figure 15 needs to be understood in greater detail because it will be fundamental in all that follows. A good way to approach this understanding is to compare the reactions of competitive feedforward networks with those of competitive feedback networks to the same input patterns. Let us start with the simplest case. Choose $C = 0$ in (22) and (27). This prevents the noise suppression inequalities (26) from holding. Although feedforward and feedback inhibition are still operative, activities cannot be inhibited below zero in this case. Consequently, a uniform input pattern can be attenuated but not entirely suppressed. Choose a sigmoidal feedback signal function to prevent noise amplification, and thus to contrast-enhance the pattern of suprathreshold activities. These hypotheses enable us to study the main effects of feedback signaling unconfounded by the effect of noise suppression. What happens when we present a rectangular input pattern (Figure 15a) to both networks? Due to the feedforward inhibition in (22), the feedforward network enhances the edges of the rectangle and attenuates its interior (Figure 15b). By contrast, the feedback network elicits a regularly spaced series of excitatory peaks across the cells that receive the rectangular input (Figure 15c). This type of reaction occurs even if the input pattern is not contrast-enhanced by a feedforward inhibitory stage, as in Figure 15b, before feedback inhibition can act on the contrast-enhanced pattern. The pattern of Figure 15c is elicited even if the feedback acts directly on the rectangular input pattern. Parametric numerical studies of this type of disinhibitory feedback reaction are found in Ellias and Grossberg (1975).
The spatial bandwidth between successive peaks in Figure 15c is called the functional scale of the feedback network. My first robust points are that a functional scale can exist in a feedback network but not in a feedforward network, and that, although the functional scale is related to the structural scale of a feedback network, the two scales are not identical. I will discuss the functional scale given $C = 0$ before reinstating the noise suppression inequalities (26) because the interaction between contrast enhancement and noise suppression in a feedback network is a much more subtle issue.
Figure 15. Reaction of a feedforward competitive network (b) and a feedback competitive network (c) to the same input pattern (a). Only the feedback network can activate the interior of the region which receives the input pattern with unattenuated activity.
33. Disinhibitory Propagation of Functional Scaling From Boundaries to Interiors
To see how a functional scale develops, let us consider the network's response to the rectangular input pattern on a moment-to-moment basis. All the populations $v_i$ that are excited by the rectangle initially receive equal inputs. All the activities $x_i$ of these populations therefore start to grow at the same rate. This growth process continues until the feedback signals $f(x_m)D_{mi}$ and $g(x_m)E_{mi}$ can be registered by the other populations $v_i$. Populations $v_i$ which are near the rectangle's boundary receive smaller total inhibitory signals $\sum_{m=1}^{n} g(x_m)E_{mi}$ than populations which lie nearer to the rectangle's center, even when all the rectangle-excited activities $x_m$ are equal. This is because the interaction strengths $E_{mi} = E(|m - i|)$ are distance-dependent, and the boundary populations receive no inhibition from contiguous populations that lie outside the rectangle.
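The initial boundary bias can be computed directly. The sketch below sums an assumed Gaussian inhibitory kernel over a block of equally active cells and returns the total inhibitory signal $\sum_m g(x_m)E_{mi}$ received at every position; the kernel width, the rectangle location, the common signal value, and the function name are illustrative assumptions.

```python
import numpy as np

def total_inhibition(active, sigma_E, signal=1.0):
    """Total inhibitory feedback at each cell when all active cells emit `signal`.

    active:  boolean mask of the rectangle-excited cells
    sigma_E: width of the assumed Gaussian inhibitory kernel E(|m - i|)
    """
    idx = np.arange(active.size)
    E = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2.0 * sigma_E ** 2))
    return signal * (E.T @ active.astype(float))   # sum_m g(x_m) E_mi

# Rectangle occupying cells 20..80 of a 100-cell network (illustrative).
active = np.zeros(100, dtype=bool)
active[20:81] = True
inhibition = total_inhibition(active, sigma_E=5.0)
```

With equal activities everywhere inside the rectangle, the cell at the boundary receives roughly half the inhibition received at the center, and the inhibition rises monotonically from the boundary toward the center. This is the asymmetry from which the argument in this section starts.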
As a result of this inhibitory asymmetry, the activities $x_i$ near the boundary start to grow faster than contiguous activities $x_j$ nearer to the center. The inhibitory feedback signal $g(x_i)E_{ij}$ from $v_i$ to $v_j$ begins to exceed the inhibitory feedback signal $g(x_j)E_{ji}$ from $v_j$ to $v_i$, because $x_i > x_j$ and $E_{ij} = E_{ji}$. Thus although all individual feedback signals among rectangle-excited populations start out equal, they are soon differentiated due to a second-order effect whereby the boundary bias in the spatial distribution of the total inhibitory feedback signals is mediated by the activities of individual populations. As the interior activities $x_j$ get differentially inhibited, their inhibitory signals $g(x_j)E_{jk}$ to populations $v_k$ which lie even deeper within the rectangle's interior become smaller. Now the total pattern of inputs plus feedback signals is no longer uniform across the populations $v_j$ and $v_k$. The populations $v_k$ are favored. Contrast enhancement bootstraps their activities $x_k$ into larger values. Now these populations can more strongly inhibit neighboring populations that lie even deeper into the rectangle's interior, and the process continues in this fashion. The boundary asymmetry in the total inhibitory feedback signals thereby propagates ever deeper into the rectangle's interior by a process of distance-dependent disinhibition and contrast enhancement until all the rectangle-excited populations are filled-in by a series of regularly spaced activity peaks as in Figure 15c.
34. Quantization of Functional Scales: Hysteresis and Uncertainty
As I mentioned in Section 32, two distinct types of spatial scales can be distinguished in a feedback network. The structural scales $D(j)$ and $E(j)$ describe how rapidly the network's feedback interaction coefficients decrease as a function of distance. The functional scale describes the spatial wavelength of the disinhibitory peaks that arise in response to prescribed input patterns. Although these two types of scale are related, they differ in fundamental ways.
They are related because an increase in a network's structural scales can cause an increase in the functional scale with which it fills-in a given input pattern, as in the numerical studies of Ellias and Grossberg (1975). This is due to two effects acting together. A slower decrease of $D(j)$ with increasing distance $j$ can increase the number of contiguous populations that pool excitatory feedback. This effect can broaden the peaks in the activity pattern. A slower decrease of $E(j)$ with increasing distance $j$ can increase the number of contiguous populations which can be inhibited by an activity peak. This effect can broaden the troughs in the activity pattern. This relationship between structural and functional scales partially supports the intuition that visual processing includes a spatial frequency analysis of visual data (Graham, 1981; Robson, 1975), because if several feedback networks with distinct structural scales received the same input pattern, then they would each generate distinct functional scales such that smaller structural scales tended to generate smaller functional scales. However, the functional scale does not equal the structural scale, and its properties represent a radical departure from feedforward linear ideas. The most important of these differences can be summarized as follows. The functional scale is a quantized property of the interaction between the network and global features of an input pattern, such as its length. Unlike a structural scale, a functional scale is not just a property of the network. Nor is it just a property of the input pattern. The interaction between pattern and network literally creates the functional scale. The quantized nature of this interaction is easy to state because it is so fundamental. (The reader who knows some quantum theory, notably Bohr's original model of the hydrogen atom, might find it instructive to compare the two types of quantization.)
The length $L$ of a rectangular input pattern might equal a nonintegral multiple of a network's structural scales, but obviously there can only exist an integral number of disinhibitory peaks in the activity pattern induced by the rectangle. The feedback network therefore quantizes its activity in a way that depends on the global structure of the input pattern. The functional scales must change to satisfy the quantum property as distinct patterns perturb the network, even though the network's structural scales remain fixed. For example, rectangular inputs of length $L, L + \Delta L, L + 2\Delta L, \ldots, L + w\Delta L$ might all induce $M_L$ peaks in the network's activity pattern. Not until a rectangle of length $L + (w + 1)\Delta L$ is presented might the network respond with $M_L + 1$ peaks. This length quantization property suggests a new reason why a network, and perception, can exhibit hysteresis as an input pattern is slowly deformed through time. This hysteresis property can contribute to, but is not identical with, the hysteresis that is due to persistent binocular matching as a result of positive feedback signaling when two monocular patterns are slowly deformed after first being binocularly matched (Fender and Julesz, 1967; Grossberg, 1980b). Another consequence of the quantization property is that the network cannot distinguish certain differences between input patterns. Quantization implies a certain degree of perceptual uncertainty.
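The bookkeeping behind this example can be made explicit with a deliberately crude toy. The sketch assumes that the number of peaks is the integer part of the pattern length divided by a fixed nominal wavelength. That rule is an assumption introduced only for illustration: the text does not state it, and in the network the peak count arises from the feedback dynamics rather than from a formula. The function name and the numbers are hypothetical.

```python
def toy_peak_count(length, wavelength):
    """Toy rule (assumption): integer number of peaks that fit in `length`."""
    return max(1, int(length // wavelength))

wavelength = 2.5                      # nominal spacing between peaks (illustrative)
lengths = [10.0, 10.5, 11.0, 12.0]    # L, L + dL, ... below the next multiple
counts = [toy_peak_count(L, wavelength) for L in lengths]
next_count = toy_peak_count(12.5, wavelength)
```

Under this toy rule, all four lengths yield the same peak count, so they are indistinguishable by peak count alone, and the count increases by one only when the length reaches the next multiple of the wavelength. The toy does not model hysteresis, which depends on the history of the activity pattern.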
35. Phantoms
The reader might by now have entertained the following objection to these ideas. If percepts really involve spatially regular patterned responses even to uniform input regions, then why don't we easily see these patterns? I suggest that we sometimes do, as when spatially periodic visual phantoms can be seen superimposed upon otherwise uniform, and surprisingly large, regions (Smith and Over, 1979; Tynan and Sekuler, 1975; Weisstein, Maguire, and Berbaum, 1976). The disinhibitory filling-in process clarifies how these phantoms can cover regions which excite a retinal area much larger than a single structural scale. I suggest that we do not see phantoms more often for three related reasons. During day-to-day visual experience, several functional scales are often simultaneously active. The peaks of higher spatial frequency functional scales can overlay the spaces between lower spatial frequency functional scales. Retinal tremor and other eye movements can randomize the spatial phases of, and thereby spatially smooth, the higher frequency scales across the lower frequency scales through time. Even within a single structural scale, if the boundary of an input pattern curves in two dimensions, then the disinhibitory wavelets can cause interference patterns as they propagate into the interior of the activity pattern along rays perpendicular to each boundary element. These interference patterns can also obscure the visibility of a functional scale. Such considerations clarify why experiments in which visual phantoms are easily seen usually use patterns that selectively resonate with a low spatial frequency structural scale that varies in only one spatial dimension. This suggestion that filling-in by functional scales may subserve phantoms does not imply that the perceived wavelength of a phantom is commensurate with any structural scale of the underlying network. Rather I suggest that once a pattern of functional wavelets is established by a boundary figure, it can quickly propagate by a resonant filling-in reaction into the interior of the figure if the shape of the interior does not define functional barriers to filling-in (Section 40). An important issue concerning the perception of phantoms is whether they are, of necessity, perceivable only if moving displays are used, or whether the primary effect of moving a properly chosen spatial frequency at a properly chosen velocity is to selectively suppress all but the perceived spatial wavelength via noise suppression. The latter interpretation is compatible with an explanation of spatial frequency adaptation using properties of shunting feedback networks (Grossberg, 1980b, Section 12). A possible experimental approach to seeing functional scales using a stationary display takes the form of a two-stage experiment.
First adapt out the high spatial frequencies using a spatial frequency adaptation paradigm. Then fixate a bounded display which is large enough and is shaped properly to strongly activate a low spatial frequency scale in one dimension, and which possesses a uniform interior that can energize periodic network activity.
36. Functional Length and Emmert’s Law
Two more important properties of functional scales are related to length and lightness estimates. The functional wavelength defines a length scale. To understand what I mean by this, let a rectangular input pattern of fixed length L excite networks with different structural scales. I hypothesize that the apparent length of the rectangle in each network will depend on the functional scale generated therein. Since a broader structural scale induces a broader functional scale, the activity pattern in such a network will contain fewer active functional wavelengths. I suggest that this property is associated with an impression of a shorter object, despite the fact that L is fixed.
The reader might object that this property implies too much. Why can a monocularly viewed object have ambiguous length if it can excite a functional scale? I suggest that under certain, but not all, monocular viewing conditions, an object may excite all the structural scales of the observer. When this happens, the object’s length may seem ambiguous. I will also suggest in Section 39 how binocular viewing of a nearby object can selectively excite structural scales which subserve large functional scales, thereby making the object look shorter. By contrast, binocular viewing of a far-away object can selectively excite structural scales which subserve small functional scales, thereby making the object look longer.
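The length hypothesis above can be illustrated with a minimal sketch that counts how many functional wavelengths fit across a rectangle of fixed length L. The linear map from structural scale to functional wavelength, its `gain` parameter, and both function names are assumptions introduced only for illustration; the text claims only that a broader structural scale induces a broader functional scale.

```python
def functional_wavelength(structural_scale, gain=2.0):
    """Hypothetical monotone map from structural scale to functional wavelength."""
    return gain * structural_scale


def wavelengths_spanned(length, structural_scale, gain=2.0):
    """Number of functional wavelengths that fit across an input of fixed length."""
    return length / functional_wavelength(structural_scale, gain)


L = 12.0  # fixed physical length of the rectangle (arbitrary units)
for scale in (1.0, 2.0, 3.0):
    # Broader structural scales span the same rectangle with fewer wavelengths.
    print(scale, wavelengths_spanned(L, scale))
```

Under these assumptions the same rectangle spans fewer functional wavelengths in a network with a broader structural scale, which is the property the text associates with an impression of a shorter object.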
Thus the combination of binocular selection of structural scales that vary inversely with an object’s distance, along with the inverse variation of length estimates with functional scales, may contribute to an explanation of Emmert’s law. This view of the correlation between perceived length and perceived distance does not imply that the relationship should be veridical, and indeed sometimes it is not (Hagen and Teghtsoonian, 1981), for the following reasons. The functional scale is a quantized collective property of a nonlinear feedback network rather than a linear ruler. The selection of which structural scales will resonate to a given object and of
which functional scales will be generated within these structural scales depends on the interaction with the object in different ways; for one, the choice of structural scale does not depend on a filling-in reaction. These remarks indicate a sense in which functional scales define an “intrinsic metric,” which is independent of cognitive influences but on whose shoulders correlations with motor maps, adaptive chunking, and learned feedback expectancy computations can build (Grossberg, 1978e, 1980b). This intrinsic metric helps to explain how monocular scaling effects, such as those described in Section 5, can occur.
Once the relevance of the functional scale concept to metrical estimates is broached, one can begin to appreciate how a dynamic “tension” or “force field” or “curved metric” can be generated whereby objects which excite one part of the visual field can influence the perception of objects at distant visual positions (Koffka, 1935; Watson, 1978). I believe that the functional scale concept explicates a notion of dynamic field interactions that escapes the difficulties faced by the Gestaltists in their pioneering efforts to explain global visual interactions.
37. Functional Lightness and the Cornsweet Effect
The functional scale concept clarifies how object boundaries can determine the lightness of object interiors, as in the Cornsweet effect. Other things being equal, a more intense pattern edge will cause larger inhibitory troughs around itself. The inhibitory trough which is interior to the pattern will thereby create a larger disinhibitory peak due to pattern normalization within the structural scale. This disinhibitory process continues to penetrate the pattern in such a way that all the interior peak heights are influenced by the boundary peak height because each inhibitory trough “remembers” the previous peak height. The sensitivity of filled-in interior peak size to boundary peak size helps to explain the Cornsweet effect (Section 11).
Crucial to this type of explanation is the idea that the disinhibitory filling-in process feeds off the input intensity within the object interior. The reader can now better appreciate why I set C = 0 to start off my exposition. Suppose that a feedforward inhibitory stage acts on an input pattern before the feedback network responds to the transformed pattern. Let the feedforward stage use its noise suppression property to convert a rectangular input pattern into an edge reaction that suppresses the rectangle’s interior (Figure 15b). Then let the feedback network transform the edge-enhanced pattern. Where does the feedback network get the input energy to fill-in off the edge reactions into the pattern’s interior if the interior activities have already been suppressed? How does the feedback network know that the original input pattern had an interior at all? This is the technical version of the “To Have Your Edge and Fill-In Too” dilemma that I raised in Section 17. We are now much closer to an answer.
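The dilemma can be made concrete with a minimal numerical sketch of a feedforward shunting on-center off-surround stage at equilibrium. The steady-state formula used below, the Gaussian kernels, and all parameter values are assumptions chosen for illustration (the chapter's own equations are not reproduced in this passage). The kernels are normalized so that excitation and inhibition balance exactly for uniform inputs, which is one way to obtain noise suppression.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """Discrete Gaussian on offsets -radius..radius, normalized to unit sum."""
    offsets = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (offsets / sigma) ** 2)
    return k / k.sum()


def feedforward_edges(inputs, A=1.0, B=1.0, C=1.0,
                      sigma_on=2.0, sigma_off=6.0, radius=30):
    """Rectified equilibrium of an assumed feedforward shunting network:

        x_i = sum_k I_k (B*D_ki - C*E_ki) / (A + sum_k I_k (D_ki + E_ki))

    D is a narrow on-center kernel and E a broad off-surround kernel.
    With B*sum(D) == C*sum(E), spatially uniform input yields zero output.
    """
    D = gaussian_kernel(sigma_on, radius)
    E = gaussian_kernel(sigma_off, radius)
    excite = np.convolve(inputs, D, mode="same")
    inhibit = np.convolve(inputs, E, mode="same")
    x = (B * excite - C * inhibit) / (A + excite + inhibit)
    return np.maximum(x, 0.0)  # only suprathreshold activity is passed on


rect = np.zeros(300)
rect[100:200] = 5.0            # rectangular input on a dark background
edges = feedforward_edges(rect)

print("interior response:", edges[150])
print("boundary responses:", edges[100], edges[199])
```

In this sketch the interior of the rectangle is driven to zero while the boundaries survive, so a later feedback stage that sees only `edges` has no interior activity left to feed on. That is the situation the dilemma describes.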
38. The Monocular Length-Luminance Effect
Before suggesting a resolution of this dilemma, I will note a property of functional scales which seems to be reflected in various data, such as the Wallach and Adams (1954) experiment, but seems not to have been studied directly. This property concerns changes in functional scaling that are due to changes in the luminance of an input pattern. To illustrate the phenomenon in its simplest form, I will consider qualitatively the response of a competitive feedback network such as (27) to a rectangular input pattern of increasing luminance. In Figure 16a the rectangle intensity is too low to elicit any suprathreshold reaction. In Figure 16b a higher rectangle intensity fills-in the region with a single interior peak and two boundary peaks. At the still higher intensity of Figure 16c, two interior peaks emerge. At successively higher intensities, more peaks emerge until the intensity gets so high that a smaller number of peaks again occurs (Figure 16d). This progressive increase followed by a progressive decrease in the number of interior peaks has been found in many computer runs (Cohen and Grossberg,
1983a; Ellias and Grossberg, 1975). It reflects the network’s increasing sensitivity at higher input intensities until such high intensities are reached that the network starts to saturate and is gradually desensitized. The quantitative change in the relative number of peaks is not so dramatic as Figure 16 suggests. If we assume that the total area under an activity pattern within a unit spatial region estimates the lightness of the pattern, then it is tempting to interpret the above result as a perceived lightness change when the luminance of an object, but not of its background, is parametrically increased. This interpretation cannot be made without extreme caution, however, because the functional scaling change within one monocular representation may alter the ability of this representation to match the other monocular representation within a given structural scale. In other words, by replacing spatially homogeneous regions in a figure by spatially patterned functional scales, we can think about whether these patterns match or mismatch under prescribed conditions. A change in the scales which are capable of binocular matching implies a change in the scales which can energetically resonate. A complex change in perceived brightness, depth, and length may hereby be caused. Even during conditions of monocular viewing, the phenomenon depicted by Figure 16 has challenging implications. Consider an input pattern which is a figure against a ground with nonzero reflectance. Let the entire pattern be illuminated at successively higher luminances. Within the energy region of brightness constancy, the balance between the functional scales of figure and ground can be maintained. At extreme luminances, however, the sensitivity changes illustrated in Figure 16 can take effect and may cause a coordinated change in both perceived brightness and perceived length. 
If the functional wavelength, as opposed to a more global estimate of the total activated region within a structural scale, influences length judgments, then a small length reduction may be detectable at both low and high luminances. This effect should at the present time be thought of as an intriguing possibility rather than as a necessary prediction of the theory because, in realistic binocular networks, interactive effects between monocular and binocular cells and between multiple structural scales may alter the properties of Figure 16.
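The two readouts discussed in this section, the number of peaks in an activity pattern and the total area under the pattern per unit spatial region, can be written down as a short sketch. The cosine test patterns below are synthetic stand-ins and are not outputs of the network in (27); the function names are hypothetical.

```python
import numpy as np


def count_peaks(activity, threshold=0.0):
    """Count strict interior local maxima whose height exceeds threshold."""
    a = np.asarray(activity, dtype=float)
    mid = a[1:-1]
    is_peak = (mid > a[:-2]) & (mid > a[2:]) & (mid > threshold)
    return int(is_peak.sum())


def lightness_estimate(activity, dx=1.0):
    """Area under the rectified pattern divided by its spatial extent."""
    a = np.maximum(np.asarray(activity, dtype=float), 0.0)
    area = a.sum() * dx
    extent = len(a) * dx
    return float(area / extent)


x = np.linspace(0.0, 1.0, 201)
two_peaks = 0.5 * (1.0 - np.cos(2.0 * np.pi * 2 * x))    # synthetic pattern
three_peaks = 0.5 * (1.0 - np.cos(2.0 * np.pi * 3 * x))  # synthetic pattern

for name, pattern in (("two", two_peaks), ("three", three_peaks)):
    print(name, count_peaks(pattern), lightness_estimate(pattern))
```

For these synthetic patterns the peak count changes while the area per unit length stays nearly the same. This shows only that the two readouts are separable in principle. It does not show how they covary in the network.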
39. Spreading FIRE: Pooled Binocular Edges, False Matches, Allelotropia, Binocular Brightness Summation, and Binocular Length Scaling
Now that the concept of a functional scale in a competitive feedback network is clearly in view, I can reintroduce the noise suppression inequalities (26) to show how the joint action of noise suppression and functional scaling can generate a filling-in resonant exchange (FIRE) that is sensitive to binocular properties such as disparity. Within the framework I have built up, starting a FIRE capable of global effects on perceived depth, form, and lightness is intuitively simple. I will nonetheless describe the main ideas in mechanistic terms, since if certain constraints are not obeyed, the FIRE will not ignite (Cohen and Grossberg, 1983a). I will also restrict my attention to the simplest, or minimal, network which exhibits the properties that I seek. It will be apparent that the same types of properties can be obtained in a wide variety of related network designs. The equations that have been used to simulate such a FIRE numerically are described in the Appendix.
First I will restrict attention to the case of a single structural scale, which is defined by excitatory and inhibitory kernels D(j) and E(j), respectively. Three main intuitions go into the construction.
Proposition I: Only input pattern data which are spatially nonuniform with respect to a structural scale are informative (Section 18).
Proposition II: The ease with which two monocular input patterns of fixed disparity can be binocularly fused depends on the spatial frequencies in the patterns (Sections 6 and 8). This dependence is not, however, a direct one. It is mediated by statistical
Figure 16. Response of a feedback competitive network to a rectangle of increasing luminance on a black background.
preprocessing of the input patterns using nonlinear cross-correlations, as in Section 25. Henceforth when I discuss an “edge,” I will mean a statistical edge rather than an edge within the input pattern itself.
Proposition III: Filling-in a functional scale can only be achieved if there exists an input source on which the FIRE can feed (Section 33).
To fix ideas, let a rectangular input pattern idealize a preprocessed segment of a scene. The interior of the rectangle idealizes an ambiguous region and the boundaries of the rectangle idealize informative regions of the scene with respect to the structural scale in question. A copy of the rectangular input pattern is processed by each monocular representation. Since the scene is viewed from a distance, the two rectangular inputs will excite disparate positions within their respective monocular representations (Figure 17a). In general, the more peripheral boundary with respect to the foveal fixation point will correspond to a larger disparity.
Proposition I suggests that the rectangles are passed through a feedforward competitive network capable of noise suppression to extract their statistical edges (Figure 17b). Keep in mind that these edges are not zero-crossings. Rather, their breadth is commensurate with the bandwidth of the excitatory kernel D(j) (Section 25). This property is used to realize Proposition II as follows. Suppose that the edge-enhanced monocular patterns are matched at binocular cells, where I mean matching in the sense of Sections 22 and 24. Because these networks possess distance-dependent structural scales, the suppressive effects of mismatch are restricted to the spatial wavelength of an inhibitory scale, E(j), rather than involving the entire network. Because the edges are statistically defined, the concepts of match and mismatch refer to the degree of coherence between monocular statistics rather than to comparisons of individual edges. Three possible cases can occur.
The case of primary interest is the one in which the two monocular edge reactions overlap enough to fall within each other’s excitatory on-center D(j). This will happen, for example, if the disparity between the edge centers does not exceed half the width of the excitatory on-center. Marr and Poggio (1979) have pointed out that, within this range, the probability of false matches is very small, in fact less than 5%. Within the zero-crossing formalism of Marr and Poggio (1979), however, the decision to restrict matches to this distance is not part of their definition of an edge. In a theory in which the edge computation retains its spatial scale at a topographically organized binocular matching interface, this restriction is automatic. If this matching constraint is satisfied, then a pooled binocular edge is formed that is centered between the loci of the monocular edges (Figure 17c). See Ellias and Grossberg (1975, Figure 25) for an example of this shift phenomenon. The shift in position of a pooled binocular edge also has no analog in the Marr and Poggio (1979) theory. I suggest that this binocularly-driven shift is the basis for allelotropia (Section 10).
If the two distal edges fall outside their respective on-centers, but within their off-surrounds, then they will annihilate each other if they enjoy identical parameters, or one will suppress the other by contrast enhancement if it has a sufficient energetic advantage. This unstable competition will be used to suggest an explanation of binocular rivalry in Section 44.
Finally, the two edges might fall entirely outside each other’s receptive fields. Then each can be registered at the binocular cells, albeit with less intensity than a pooled binocular edge, due to equations (2) and (4). A double image can then occur. I consider the dependence of intensity on matching to be the basis for binocular brightness summation (Section 13).
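The three cases can be summarized as a small decision rule. The sketch below is a caricature: in the network the outcome depends on graded overlap of the kernels D(j) and E(j), whereas the hard thresholds, the `off_surround_reach` parameter, and the function name used here are assumptions introduced only to make the case analysis explicit.

```python
def binocular_edge_outcome(left_pos, right_pos, on_center_width, off_surround_reach):
    """Classify how two monocular edges interact at a binocular matching stage.

    left_pos, right_pos: positions of the two monocular edge centers.
    on_center_width: full width of the excitatory on-center.
    off_surround_reach: hypothetical largest separation at which one edge
        still falls inside the other's inhibitory off-surround.
    """
    disparity = abs(left_pos - right_pos)
    if disparity <= on_center_width / 2:
        # Case 1: a pooled binocular edge forms midway between the two loci.
        return ("pooled", (left_pos + right_pos) / 2)
    if disparity <= off_surround_reach:
        # Case 2: mutual suppression, or one edge wins by contrast enhancement.
        return ("competition", None)
    # Case 3: the edges do not interact and both are registered.
    return ("double_image", (left_pos, right_pos))


print(binocular_edge_outcome(10.0, 12.0, on_center_width=6.0, off_surround_reach=10.0))
print(binocular_edge_outcome(10.0, 16.0, on_center_width=6.0, off_surround_reach=10.0))
print(binocular_edge_outcome(10.0, 30.0, on_center_width=6.0, off_surround_reach=10.0))
```

The midpoint returned in the first case corresponds to the shift of the pooled binocular edge toward a location between the two monocular edges.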
The net effect of the above operations is to generate two amplified pooled binocular edges at the boundaries of an ambiguous region if the spatial scale of the network can match the boundary disparities of the region. Networks which cannot make this match are energetically attenuated. Having used disparity (and thus depth) information to
Figure 17. After the two monocular patterns (a) are passed through a feedforward competitive network to extract their nonuniform data with respect to the network’s structural scales (b), the filtered patterns are topographically matched to allow pooled binocular edges to form (c) if the relationship between disparity and monocular functional scaling is favorable.
Figure 18. Monocular processing of patterns through feedforward competitive networks is followed by binocular matching of the two transformed monocular patterns. The pooled binocular edges are then fed back to both monocular representations at a processing stage where they can feed off monocular activity to start a FIRE.
select suitable scales and to amplify the informative data within these scales, we must face the filling-in dilemma posed by Proposition III. How do the binocular cells know how to fill-in between the pooled binocular edges to recover a binocular representation of the entire pattern? Where do these cells get the input energy to spread the FIRE? In other words, having used noise suppression to achieve selective binocular matching, how do we bypass noise suppression to recover the form of the object?
If we restrict ourselves to the minimal solution of this problem, then one answer is strongly suggested. Signals from the pooled binocular edge are topographically fed back to the processing stage at which the rectangular input is registered. This is the stage just before the feedforward competitive step that extracts the monocular edges (Figure 18). Several important conclusions follow immediately from this suggestion:
1) The network becomes a feedback competitive network in which binocular matching modulates the patterning of monocular representations.
2) If filling-in can occur, a functional scale is defined within this feedback competitive network. A larger disparity between monocular patterns resonates best with a larger structural scale, which generates a larger functional scale. Thus perceived length depends on perceived depth.
3) The activity pattern across the functional scale is constrained by the network's normalization property. Thus perceived depth influences perceived brightness, notably the lightnesses of objects which seem to lie at the same depth.
In short, if we can overcome the filling-in dilemma at all within feedback competitive shunting networks, then known dependencies between perceived depth, length, form, and lightness begin to emerge as natural consequences. I know of no other theoretical approach in which this is true.
It remains to indicate how the FIRE can spread despite the action of the noise suppression inequalities (26). The main problem to avoid is summarized in Figure 19. Figure 19a depicts a pooled binocular edge. When this edge adds onto the rectangular pattern, we find Figure 19b. Here there is a hump on the rectangle. If this pattern is then fed through the feedforward competitive network, a pattern such as that in Figure 19c is produced. In other words, the FIRE is quenched. This is because the noise suppression property of feedforward competition drives all activities outside the hump to subthreshold values before the positive feedback loops in the total network can enhance any of these activities. I have exposed the reader to this difficulty to emphasize a crucial property of pooled binocular edges. If C > 0 in (27), then an inhibitory trough surrounds the edge (Figure 19d). (If C is too small to yield a significant trough, then the pooled edge must be passed through another stage of feedforward competition.)
When the edge in Figure 19d is added to the rectangular input by a competitive interaction, the pattern in Figure 19e is generated. The region of the hump is no longer uniform. The uniform region is separated from the hump by a trough whose width is commensurate with the inhibitory scale E(j). When this pattern is passed through the feedforward competition, Figure 19f is generated. The non-uniform region has been contrast-enhanced into a second hump, whereas the remaining uniform region has been annihilated by noise suppression. Now the pattern is fed back to the rectangular pattern stage and the cycle repeats itself. A third hump is thereby generated, and the FIRE rapidly spreads, or "develops," across the entire rectangular region at a rate commensurate with the time it takes to feed a signal through the feedback loop. Since the cells which are excited by the rectangle are already processing the input pattern when the FIRE begins, it can now spread very quickly.
Some further remarks need to be made to clarify how the edge in Figure 19d adds to the rectangular input pattern. The inhibited regions in the edge can generate signals only if they excite off-cells whose signals have a net inhibitory effect on the rectangle. This option is not acceptable because mismatched patterns at the binocular matching cells would then elicit FIREs via off-cell signaling. Rather, the edge activities in Figure 19d are rectified when they generate output signals. These signals are distributed by a competitive (on-center off-surround) anatomy whose net effect is to add a signal pattern of the shape in Figure 19d to the rectangular input pattern. In other words, if all signaling stages of Figure 18 are chosen to be competitive to overcome the noise-saturation dilemma (Section 21), then the desired pattern transformations are achieved. This hypothesis does not necessarily imply that the pathways between the processing stages are both excitatory and inhibitory.
Purely excitatory pathways can activate each level's internal on-center off-surround interneurons to achieve the desired effect. From this perspective, one can see that the two monocular edge-extraction stages and the binocular matching stage at the top of Figure 18 can all be lumped into a single binocular edge matching stage. If this is done, then the mechanism for generating FIREs seems elementary indeed. If competitive signaling is used to binocularly match monocular
Figure 19. The FIRE is quenched in (a)-(c) because there exists no nonuniform region off the pooled binocular edge which can be amplified by the feedback exchange. In (d)-(f), the inhibitory troughs of the edges enable the FIRE to propagate.
Figure 20. An antagonistic rebound, or off-reaction, in a gated dipole can be caused either by rapid offset of a phasic input or rapid onset of a nonspecific arousal input. As in Figure 21, function J(t) represents a phasic input, function I(t) represents a nonspecific arousal input, function x5(t) represents the potential, or activity, of the on-channel’s final stage, and function x6(t) represents the potential, or activity, of the off-channel’s final stage. (From Grossberg, 1982c.)
representations, then a filling-in reaction will spontaneously occur within the matched scales.
40. Figure-Ground Separation by Filling-In Barriers
Now that we have seen how a FIRE can spread, it remains to say how it can be prevented from inappropriately covering the entire visual field. A case in point is the Julesz (1971) 5% solution of dots on a white background in the stereogram of Section 9. How do the different binocular disparities of the dots in the “figure” and “ground” regions impart distinct depths to the white backgrounds of these two regions? This is an issue because the same ambiguous white background fills both regions. I suggest that the boundary disparities of the “figure” dots can form pooled binocular edges in a spatial scale different from the one that best pools binocular edges in the “ground” scale. At the binocular cells of the “ground” scale, mismatch of the monocular edges of the “figure” can produce an inhibitory trough whose breadth is commensurate with two inhibitory structural wavelengths. The spreading FIRE cannot cross a filling-in barrier (FIB) any more than a forest fire can cross a sufficiently broad trench. Thus, within a scale whose pooled binocular edges can feed off the ambiguous background activity, FIREs can spread in all directions until they run into FIBs.
This mechanism does not imply that a FIRE can rush through all spaces between adjacent FIBs, because the functional scale is a coherent dynamic entity that will collapse if the spaces between FIBs, relative to the functional scale, are sufficiently small. Thus a random placement of dots may, other things being equal, form better FIBs than a deterministic placement which permits a coherent flow of FIRE to run between rows of FIBs. A rigorous study of the interaction between (passive) texture statistics and (coherent) functional scaling may shed further light on the discriminability of figure-ground separation. The important pioneering studies of Julesz (1978) and his colleagues on texture statistics have thus far been restricted to conclusions which can be drawn from (passive) correlational estimates.
41. The Principle of Scale Equivalence and the Curvature of Activity-Scale Correlations: Fechner’s Paradox, Equidistance Tendency, and Depth Without Disparity
My description of how a FIRE can be spread and blocked sheds light on several types of data from a unified perspective. Suppose that, as in Section 36, an ambiguous monocular view of an object excites all structural scales due to self-matching of the monocular data at each scale’s binocular cells. Suppose that a binocular view of an object can selectively excite some structural scales more than others due to the relationship between matching and activity amplification (Section 22). These assumptions are compatible with data concerning the simultaneous activation of several spatial scales at each position in the visual field during binocular viewing (Graham, Robson, and Nachmias, 1978; Robson and Graham, 1981), with data on binocular brightness summation (Blake, Sloane, and Fox, 1981; Cogan, Silverman, and Sekuler, 1982), and with data concerning the simultaneous visibility of rivalrous patterns and a depth percept (Kaufman, 1974; Kulikowski, 1978). The suggestion that a depth percept can be generated by a selective amplification of activity in some scales above others also allows us to understand: (1) why a monocular view does not lose its filling-in capability or other resonant properties (since it can excite some structural scales via self-matches); (2) why a monocular view need not have greater visual sensitivity than a binocular view, despite the possibility of activating several scales due to self-matches (since a binocular view may excite its scales more selectively and with greater intensity due to binocular brightness summation); (3) why a monocular view may look brighter than a binocular view (Fechner’s paradox) (since although the matched scales during a binocular view are amplified, so that activity lost by binocular mismatch in some scales is partially gained by binocular
summation in other scales, the monocular view may excite more scales by self-matches); and (4) yet why a monocular view may have a more ambiguous depth than a binocular view (since a given scene may fail to selectively amplify some scales more than others due to its lack of spatial gradients (Gibson, 1950)).
The selective amplification that enhances a depth percept is sometimes due to the selectivity of disparity matches, but it need not be. The experiment of Kaufman, Bacon, and Barroso (1973) shows that depth can be altered, even when no absolute disparities exist, by varying the relative brightnesses of monocular pattern features. The present framework interprets this result as an external manipulation of the energies that cause selective amplification of certain scales above others, and as one that does so in such a way that the preferred scales are altered as the experimental inputs are varied. The same ideas indicate how a combination of monocular motion cues and/or motion-dependent input energy changes can enhance a depth percept. Motions that selectively enhance delayed self-matches in certain scales above others can contribute to a depth percept.
All of these remarks need quantitative implementation via a major program of computer simulations. The simulations that have already been completed do, however, support the mathematical, numerical, and qualitative results on which the theory is founded (Cohen and Grossberg, 1983a). Although this program is not yet complete, the qualitative concepts indicate how to proceed and how various data may be explained in a unified fashion that are not discussed in a unified way by competing theories.
The idea that depth can be controlled by the energy balance across several active scales overcomes a problem in Sperling-Dev models. Due to the competition between depth planes in these models, only one depth plane at a time can be active in each spatial location.
However, there can exist only finitely many depth planes, both on general grounds due to the finite dimension of neural networks, and on specific grounds due to inferences from spatial frequency data wherein only a few scales are needed to interpret the data (Graham, 1981; Wilson and Bergen, 1979). Why, then, do we not perceive just three or four different depths, one depth corresponding to activity in each depth plane? Why does the depth not seem to jump discretely from scale to scale as an object approaches us? Depth seems to change continuously as an object approaches us despite the existence of only a few structural scales. The idea that the energy balance across functional scales changes continuously as the object approaches, and thereby continuously alters the depth percept, provides an intuitively appealing answer. This idea also mechanistically explicates the popular thesis that the workings of spatial scales may be analogous to the workings of color vision, wherein the pattern of activity across a few cone receptor types forms the substrate for color percepts.
The present framework suggests an explanation of Gogel’s equidistance tendency (Section 4). Suppose that a monocularly viewed object of ambiguous depth excites most, or all, of its structural scales through self-matches. Let a nearby binocularly viewed object selectively amplify the scales with which it forms the best pooled binocular edges. Let a FIRE spread with the greatest vigor through these amplified scales. When the FIRE reaches the monocular self-matches within its scale, it can amplify the activity of these matches, much as occurs during binocular brightness summation. This shift in the energy balance across the scales which represent the monocularly viewed object imparts it with depthfulness. This conclusion follows (and this is the crucial point) even though no new disparity information is produced within the self-matches by the FIRE. Only an energy shift occurs.
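A minimal sketch shows how a handful of discrete scales can support a continuously graded depth readout, and how a pure energy shift can change that readout. The activity-weighted average used below, the preferred depth assigned to each scale, and all numerical values are assumptions of this sketch and are not taken from the text, which claims only that the energy balance across scales shapes the depth percept.

```python
def depth_estimate(scale_activity, scale_depth):
    """Activity-weighted average of the depths preferred by each scale.

    scale_activity: nonnegative resonant activity in each structural scale.
    scale_depth: hypothetical depth value associated with each scale.
    """
    total = sum(scale_activity)
    if total <= 0:
        raise ValueError("at least one scale must be active")
    return sum(a * d for a, d in zip(scale_activity, scale_depth)) / total


depths = [1.0, 2.0, 4.0]  # three scales only

# Continuous change: activity drifts gradually from the first scale to the second.
for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(w, depth_estimate([1.0 - w, w, 0.0], depths))

# Energy shift without new disparity information: a monocular object first
# excites all scales equally, then one scale is amplified.
print(depth_estimate([1.0, 1.0, 1.0], depths))
print(depth_estimate([3.0, 1.0, 1.0], depths))
```

The readout passes smoothly through intermediate values although only three scales exist, in the spirit of the color vision analogy. In the last two lines, amplifying one scale moves the estimate toward that scale's depth without changing which scales are active.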
Thus, although disparities may be sufficient to produce a depth percept, they may not be necessary to produce one. I suggest instead that suitable correlations between activity and scaling across the network loci that represent different spatial positions produce a depth percept. Depth is perceived whenever the resonant activity distribution is “curved” among several structural scales as representational space is traversed, no matter how (monocularly or binocularly) the activity distribution achieves its curvature. This conclusion may be restated as a deceptively simple proposition: An object in the outside world is perceived to be curved if it induces a curvature in the abstract representational space of activity-scale correlations.
Such a conclusion seems to smack of naive realism, but it is saved from the perils of naive realism by the highly nonlinear and nonlocal nature of the shunting network representation of input patterns. This conclusion does, however, provide a scientific rationale for the temptations of naive realism, and points the way to a form of neorealism if one entertains the quantum-mechanical proposition that the curvature of an object in the outside world is also due to curved activity-scaling correlations in an abstract representational scale. Such considerations lead beyond the scope of this article.
The view that all external operations that use equivalent activity-scaling correlations generate equivalent depth percepts liberates our thinking from the current addiction to disparity computations and suggests how monocular gradients, monocular motion cues, and learned cognitive feedback signals can all contribute to a depth percept. Because of the importance of this conception to my theory, I give it a name: the principle of scale equivalence.
42. Reflectance Rivalry and Spatial Frequency Detection
The same ideas suggest an explanation of the Wallach and Adams (1954) data on rivalry between two central figures of different lightness (Section 13). Suppose that each monocular pattern generates a different functional scale when it is viewed monocularly (Section 38). Suppose, moreover, that the monocular input intensities are chosen so that the functional scales are spatially out of phase with each other. Then when a different input pattern is presented to each eye, the feedback exchange between monocular and binocular cells, being out of phase, can become rivalrous. This explanation leads to a fascinating experimental possibility: Given an input of fixed size, test a series of lightness differences to the two eyes. Can one find ranges of lightness where the functional scales are rivalrous followed by ranges of lightness in which the functional scales can match? If this is possible, then it is probably due to the fact that only certain peaks in the two scales match binocularly. The extra peaks self-match. Should this happen, it may be possible to detect small spatial periodicities in lightness such that binocular matches are brighter than self-matches. I am not certain that these differences will be visible, because the filling-in process from the locations of amplified binocular matches across the regions of monocular self-matches may totally obscure the lightness differences of the two types of matches. Such a filling-in process may be interpreted as a type of brightness summation.
Another summation phenomenon which may reflect the activation of a functional scale is the decrease in threshold contrast needed to detect an extended grating pattern as the number of cycles in the pattern is increased. Robson and Graham (1981) explain this phenomenon quantitatively “by assuming that an extended grating pattern will be detected if any of the independently perturbed detectors on whose receptive field the stimulus falls signals its presence” (p.409).
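The quoted detection rule is a probability summation rule, and a short sketch shows why it lowers threshold as cycles are added. The Weibull form of the single-detector psychometric function, the values of `ALPHA` and `BETA`, and the simplification of one independent detector per grating cycle are assumptions made here for illustration; they are not taken from Robson and Graham's fits.

```python
import math

ALPHA = 1.0  # assumed contrast scale of a single detector
BETA = 3.5   # assumed steepness of the single-detector psychometric function


def single_detector_probability(contrast):
    """Assumed Weibull psychometric function for one detector."""
    return 1.0 - math.exp(-((contrast / ALPHA) ** BETA))


def detection_probability(contrast, n_detectors):
    """The grating is detected if ANY of n independent detectors signals it."""
    p_all_miss = (1.0 - single_detector_probability(contrast)) ** n_detectors
    return 1.0 - p_all_miss


def threshold_contrast(n_detectors, criterion=0.5):
    """Contrast at which detection probability reaches the criterion (bisection)."""
    low, high = 0.0, 10.0 * ALPHA
    for _ in range(100):
        mid = 0.5 * (low + high)
        if detection_probability(mid, n_detectors) < criterion:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# One hypothetical detector per cycle, from 1 to 64 cycles.
cycle_counts = [1, 2, 4, 8, 16, 32, 64]
thresholds = [threshold_contrast(n) for n in cycle_counts]
```

Under these assumptions the threshold falls as the number of cycles raised to the power -1/`BETA`, so it keeps decreasing out to 64 cycles without any single detector spanning the whole pattern.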
What is perplexing about this phenomenon is that “some kind of summation process takes place over at least something approaching 64 cycles of our patterns ... it is stretching credulity rather far to suppose that the visual system contains detectors with receptive fields having as many as 64 pairs of excitatory and inhibitory regions” (p.413). This phenomenon seems less paradoxical if we suppose that a single suprathreshold peak within a structural scale can drive contiguous subthreshold peaks within that scale to suprathreshold values via a disinhibitory action. Suppose, moreover, that increasing the number of cycles increases the expected number of suprathreshold peaks that will occur at a fixed contrast. Then a summation effect across 64 structural wavelengths is not paradoxical if it is viewed as a filling-in reaction from suprathreshold peaks to subthreshold peaks, much like the filling-in reaction that may occur between binocular matches and self-matches in the Wallach and Adams (1954) paradigm.
Due to the large number of phenomena which become intuitively more plausible using this type of filling-in idea, I believe that quantitative studies of how to vary input brightnesses to change the functional scales generated by complex visual stimuli deserve more experimental and theoretical attention. One challenge is to find new ways to selectively increase or decrease the activity within one structural scale without inadvertently increasing or decreasing the activities within other active scales as well. In meeting this challenge, possible effects of brightness changes on perceived length are no less interesting than their effects on perceived depth. For example, suppose that an increase in input contrast decreases the functional scale within a prescribed structural scale. Even if the individual peaks in the several functional scales retain approximately the same height, a lightness difference may occur due to the increased density of peaks within a unit cellular region. This lightness difference will alter length scaling in the limited sense that it can alter the ease with which matching can occur between monocular signals at their binocular interface, as I have just argued. It remains quite obscure, however, how such a functional length change in a network’s perceptual representation is related to the genesis of motor actions, or whether motor commands are synthesized from more global properties of the regions in which activity is concentrated across all scales. To the extent that motor consequences help to shape the synthesis of perceptual invariants, no more than a qualitative appreciation of how functional length changes can influence effects like Emmert’s law may be possible until quantitative sensory-motor models are defined and simulated.
43. Resonance in a Feedback Dipole Field: Binocular Development and Figure-Ground Completion
My discussions of how a FIRE spreads (Section 39) and of figure-ground completion (Section 40) tacitly used properties that require another design principle to be realized. This design suggests how visual networks are organized into dipole fields consisting of subfields of on-cells and subfields of off-cells with the on-cells joined together and the off-cells joined together by competitive interactions. Because this concept has been extensively discussed elsewhere (Grossberg, 1980b, 1982c, 1982d), I will only sketch the properties which I need here. I will start with a disclaimer to emphasize that I have a very specific concept in mind. My dipoles are not the classical dipoles which Julesz (1971b) used to build an analog model of stereopsis. My dipoles are on-cell off-cell pairs such that a sudden offset of a previously sustained input to the on-cell can elicit a transient antagonistic rebound, or off-reaction, in the activity of the off-cell. Similarly, a sudden and equal arousal increment to both the on-cell and the off-cell can elicit a transient antagonistic rebound in off-cell activity if the arousal increment occurs while the on-cell is active (Figure 20). Thus my notion of dipole describes how STM can be rapidly reset, either by temporal fluctuations in specific visual cues or by unexpected events, not necessarily visual at all, which are capable of triggering an arousal increment at visually responsive cells. In my theory, such an unexpected event is hypothesized to elicit the mismatch negativity component of the N200 evoked potential, and such an antagonistic rebound, or STM reset, is hypothesized to elicit the P300 evoked potential. These reactions to specific and nonspecific inputs are suggested to be mediated by slowly varying transmitter substances (notably catecholamines like norepinephrine) which multiplicatively gate, and thereby habituate to, input signals on their way to the on-cells and the off-cells.
The outputs of these cells thereupon compete before eliciting net on-reactions and off-reactions, respectively, from the dipole (Figure 21).
[Figure 21 diagram; surviving labels: OFF, COMPETITION, GATE, SIGNAL, AROUSAL, INPUT.]
Figure 21. In the simplest example of a gated dipole, phasic input $J$ and arousal input $I$ add in the on-channel to activate the potential $x_1$. The arousal input alone activates $x_2$. Signals $S_1 = f(x_1)$ and $S_2 = f(x_2)$ such that $S_1 > S_2$ are thereby generated. In the square synapses, transmitters $z_1$ and $z_2$ slowly accumulate to a target level. Transmitter is also released at a rate proportional to $S_1 z_1$ in the on-channel and $S_2 z_2$ in the off-channel. This is the transmitter gating step. These signals perturb the potentials $x_3$ and $x_4$, which thereupon compete to elicit the net on-reaction $x_5$ and off-reaction $x_6$. See Grossberg (1980b, 1982d) for a mathematical analysis of gated dipole properties. (From Grossberg 1982c.)
In a dipole field, the on-cells are hypothesized to interact via a shunting on-center off-surround network. The off-cells are also hypothesized to interact via a shunting on-center off-surround network. These shunting networks normalize and tune the STM activity within the on-subfield and the off-subfield of the total dipole field network. The dipole interactions between on-cells and off-cells enable an on-cell onset to cause a complementary off-cell suppression, and an on-cell offset to cause a complementary off-cell enhancement. This duality of reactions makes sense of structural neural arrangements such as on-center off-surround networks juxtaposed against off-center on-surround networks and uses this unified processing framework to qualitatively explain visual phenomena such as positive and negative after-effects, the McCollough effect, spatial frequency adaptation, monocular rivalry, and Gestalt switching between ambiguous figures (Grossberg, 1980b). The new features that justify mentioning dipole fields here are that the on-fields and off-fields can interact to generate functional scales, and that the signals which regulate the balance of activity between on-cells and off-cells can habituate as the transmitter substances that gate these signals are progressively depleted. These facts will now be used to clarify how figure-ground completion and binocular rivalry might occur. I wish to emphasize, however, that dipole fields were not invented to explain such visual effects. Rather, they were invented to explain how internal representations which self-organize (e.g., develop, learn) as a result of experience can be stabilized against the erosive effects of later environmental fluctuations. My adaptive resonance theory suggests how learning can occur in response to resonant activity patterns, yet is prevented from occurring when rapid STM reset and memory search routines are triggered by unexpected events.
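The antagonistic rebound attributed to the gated dipole of Figure 21 can be reproduced in a few lines. The sketch below is a minimal reading of the caption, not the analysis of Grossberg (1980b, 1982d): it assumes a linear signal function, potentials that track their inputs instantly, and a transmitter law in which each gate accumulates toward a target at a constant rate and is released at a rate proportional to signal times gate. All parameter names and values are illustrative.

```python
def simulate_gated_dipole(steps_on=4000, steps_total=8000, dt=0.01,
                          arousal=1.0, phasic=1.0,
                          accumulation_rate=0.1, target=1.0, release_rate=1.0):
    """Euler simulation of a two-channel gated dipole (assumed linear signals)."""
    # Both gates start at the resting level reached under arousal alone.
    z_rest = accumulation_rate * target / (accumulation_rate + release_rate * arousal)
    z1 = z2 = z_rest
    on_output, off_output = [], []
    for step in range(steps_total):
        j = phasic if step < steps_on else 0.0   # phasic input J, then sudden offset
        s1 = arousal + j                          # on-channel signal S1
        s2 = arousal                              # off-channel signal S2
        t1, t2 = s1 * z1, s2 * z2                 # transmitter-gated signals
        on_output.append(max(t1 - t2, 0.0))       # net on-reaction after competition
        off_output.append(max(t2 - t1, 0.0))      # net off-reaction after competition
        # Slow accumulation toward the target minus signal-dependent release.
        z1 += dt * (accumulation_rate * (target - z1) - release_rate * s1 * z1)
        z2 += dt * (accumulation_rate * (target - z2) - release_rate * s2 * z2)
    return on_output, off_output


on_response, off_response = simulate_gated_dipole()
```

In this run the on-reaction overshoots at input onset and then habituates to a smaller sustained level; at input offset the two signals are equal but the on-channel gate is the more depleted one, so the off-channel transiently wins and then decays as the gates re-equilibrate.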
In the present instance, if LTM traces are placed in the feedforward and feedback pathways that subserve binocular resonances, then the theory suggests that binocular development will occur only in response to resonant data patterns, notably to objects to which attention is paid (Grossberg, 1976b, 1978e, 1980b; Singer, 1982). Because the mechanistic substrates needed for the stable self-organization of perceptual and cognitive codes are not peculiar to visual data, one can immediately understand why so many visual effects have analogs in other modalities.
An instructive instance of figure-ground completion is Beck’s phantom letter E (Section 6). To fully explain this percept, one needs a good model of competition between orientation sensitive dipole fields; in particular, a good physiological model of cortical hypercolumn organization (Hubel and Wiesel, 1977). Some observations can be made about the relevance of dipole field organization in the absence of a complete model. Suppose that the regularly spaced vertical dark lines of the “ground” are sufficiently dense to create a statistically smoothed pattern when they are preprocessed by the nonlinear cross-correlators of some structural scales (Glass and Switkes, 1976). When such a smoothed pattern undergoes noise suppression within a structural scale, it generates statistical edges at the boundary of the “ground” region due to the sudden change in input statistics at this boundary. These edges of the (black) off-field generate complementary edges of the (white) on-field due to dipole inhibition within this structural scale. These complementary edges can use the ambiguous (preprocessed) white as an energy source to generate a FIRE that fills-in the interior of the “ground.” This FIRE defines the ground as a coherent entity. The “ground” does not penetrate the “figure” because FIBs are generated by the competition which exists between orientation detectors of sufficiently different orientation.
A “figure” percept can arise in this situation as the complement of the coherently filled-in “ground,” which creates a large shift in activity-scale correlations at the representational loci corresponding to the “ground” region. In order for the “figure” to achieve a unitary existence other than as the complement of the “ground,” a mechanism must operate on a broader structural scale than that of the variously oriented lines that fill the figure. For example, suppose that, due to the greater spatial extent of vertical ground lines than nonvertical figure lines, the smoothed vertical edges can almost completely inhibit all smoothed nonvertical edges near the figure-ground boundary. Then the “figure” can be completed as a disinhibitory filling-in reaction among all the smoothed nonvertical orientations of this structural scale. Thus, according to this view, “figure” and “ground” fill-in due to disinhibitory reactions among different subsets of cells. A lightness difference may be produced between such a “figure” and a “ground” (Dodwell, 1975).
A similar argument sharpens the description of how figure-ground completion occurs during viewing of the Julesz 5% stereogram (Section 40). In this situation, black dots that can be fused by one structural scale may nonetheless form FIBs in other structural scales. A FIRE is triggered in the structural scales with fused black dots by the disinhibitory edges which flank the dots in the scale’s white off-field. This FIRE propagates until it reaches FIBs that are generated by the nonfused dots corresponding to an input region of different disparity. The same thing happens in all structural scales which can fuse some of the dots. The figure-ground percept is a statistical property of all the FIREs that occur across scales.
44. Binocular Rivalry
Binocular rivalry can occur in a feedback dipole field. The dynamics of a dipole field also explain why sustained monocular viewing of a scene does not routinely cause a perceived waxing and waning of the scene at the frequency of binocular rivalry, but may nonetheless cause monocular rivalry in response to suitably constructed pictures at a rate that depends on the juxtaposition of features in the picture (Grossberg, 1980b, Section 12). I will here focus on how the slowly habituating transmitter gates in the dipole field could cause binocular rivalry without necessarily causing monocular waxing and waning. Let a pair of smoothed monocular edges mismatch at the binocular matching cells. Also suppose that one edge momentarily enjoys a sufficient energetic advantage over the other to be amplified by contrast enhancement as the other is completely suppressed. This suppression can be mediated by the competition between the off-cells that correspond to the rivalrous edges. In particular, the on-cells of the enhanced edge inhibit the off-cells via dipole competition. Due to the tonic activation of off-cells, the off-cells of the other edge are disinhibited via the shunting competition that normalizes and tunes the off-field. The on-cells of these disinhibited off-cells are thereupon inhibited via dipole competition. As this is going on, the winning edge at the binocular matching cells elicits the feedback signals that ignite whatever FIREs can be supported by the monocular data. This resonant activity gradually depletes the transmitters which gate the resonating pathways. As the habituation of transmitter progresses, the net sizes of the gated signals decrease. The inhibited monocular representation does not suffer this disadvantage because its signals, having been suppressed, do not habituate the transmitter gates in their pathways. 
Finally, a time may be reached when the winning monocular representation loses its competitive advantage due to progressive habituation of its transmitter gates. As soon as the binocular competition favors the other monocular representation, contrast enhancement bootstraps it into a winning position and a rivalrous cycle is initiated. A monocularly viewed scene would not inevitably wax and wane, for the following reason. Other things being equal, its transmitter gates habituate to a steady level such that the habituated gated signals are an increasing function of their input sizes (Grossberg, 1968, 1981, 1982e). Rivalry occurs only when competitive feedback signaling, by rapidly suppressing some populations but not others, sets the stage for the competitive balance to slowly reverse as the active pathways that sustain the suppression habituate faster than the inactive pathways. The same mechanism can cause a percept of monocular rivalry to occur when the monocular input pattern contains a suitable spatial juxtaposition of mutually competitive features (Rauschecker, Campbell, and Atkinson, 1973).
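The rivalrous cycle described in this section can be caricatured by two channels that share one habituating-gate law. The sketch below is a toy, not the feedback dipole field: the winner-take-all rule, the multiplicative advantage `winner_gain` standing in for contrast enhancement, and all rate constants are assumptions chosen only to show that a slowly depleting gate on the active pathway is enough to reverse a competition that equal inputs could never reverse.

```python
def simulate_rivalry(total_time=50.0, dt=0.001, signal=1.0,
                     accumulation_rate=0.1, target=1.0, release_rate=1.0,
                     winner_gain=2.0):
    """Return the times at which dominance switches between two equal inputs."""
    gates = [target, target]   # transmitter gates of the two monocular pathways
    winner = 0                 # channel 0 is given the initial advantage
    switch_times = []
    for step in range(int(total_time / dt)):
        loser = 1 - winner
        # The suppressed channel takes over only once its gated signal exceeds
        # the winner's gated signal boosted by the assumed enhancement factor.
        if signal * gates[loser] > winner_gain * signal * gates[winner]:
            winner, loser = loser, winner
            switch_times.append(step * dt)
        # The active pathway releases transmitter and therefore habituates.
        gates[winner] += dt * (accumulation_rate * (target - gates[winner])
                               - release_rate * signal * gates[winner])
        # The suppressed pathway carries no signal, so its gate only recovers.
        gates[loser] += dt * accumulation_rate * (target - gates[loser])
    return switch_times


switch_times = simulate_rivalry()
```

With these values dominance alternates repeatedly. If `winner_gain` is raised beyond the largest ratio the two gates can reach (a recovered gate near `target` against a fully habituated one), no switch occurs at all, which mirrors the point that habituation by itself settles to a steady level and that alternation needs the competitive suppression to be reversible.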
45. Concluding Remarks About Filling-In and Quantization
The quantized dynamic geometry of FIRE provides a mechanistic framework in which the experimental interdependence of many visual properties may be discussed in a unified fashion. Of course, a great deal of theoretical work remains to be done (even assuming all the concepts are correct), not only in working out the physiological designs in which these dynamic transactions take place but also in subjecting the numerical and mathematical properties of these designs to a confrontation with quantitative data. Also, the discussion of disinhibitory filling-in needs to be complemented by a discussion of how hierarchical feedback interactions between the feedforward adaptive filters (features) and feedback adaptive templates (expectancies) that define and stabilize a developing code can generate pattern completion effects, which are another form of filling-in (Dodwell, 1975; Grossberg, 1978e, Sections 21-22, 1980b, Section 17; Lanze, Weisstein, and Harris, 1982).
Despite the incompleteness of this program, the very existence of such a quantization scheme suggests an answer to some fundamental questions. Many scientists have, for example, realized that since the brain is a universal measurement device acting on the quantum level, its dynamics should in some sense be quantized. This article suggests a new sense in which this is true by explicating some quantized properties of binocular resonances. One can press this question further by asking why binocular resonances are nonlinear phenomena that do not take the form of classical linear quantum theory. I have elsewhere argued that this is because of the crucial role which resonance plays in stabilizing the brain’s self-organization (Grossberg, 1976, 1978e, 1980b).
The traditional quantum theory is not derived from principles of self-organization, despite the fact that the evolution of physical matter is as much a fundamental problem of self-organization on the quantum level as are the problems of brain development, perception, and learning. It will be interesting to see, as the years go by, whether traditional quantum theory looks more like an adaptive resonance theory as it too incorporates self-organizing principles into its computational structure.
APPENDIX
The following system of equations defines a binocular interaction capable of supporting a filling-in resonant exchange (Cohen and Grossberg, 1983a).
Monocular Representations
[Equations (A1) and (A2) are not recoverable from this copy.]
Binocular Matching
[Equation (A3) is not recoverable from this copy.]
Binocular-to-Monocular Feedback
[The feedback equation for $z_k$ is not recoverable from this copy.]
where
$$F_{ki} = B' C'_{ki} - D' E'_{ki}$$
and
$$G_{ki} = C'_{ki} + E'_{ki}.$$
Equation (A1) describes the response of the activities $x_{iL}$, $i = 1, 2, \ldots, n$, in the left monocular representation. Each $x_{iL}$ obeys a shunting equation in which both the excitatory interaction coefficients $C_{ki}$ and the inhibitory interaction coefficients $E_{ki}$ are Gaussian functions of the distance between $v_k$ and $v_i$. Two types of simulations have been studied:
Additive inputs: All $I_{kL}$ are chosen equal. The terms $J_{kL}$ register the input pattern and summate with the binocular-to-monocular feedback functions $z_k$.
Shunting inputs: All $J_{kL}$ are chosen equal. The terms $I_{kL}$ register the input pattern. The binocular-to-monocular feedback functions $z_k$ modulate the system’s sensitivity to the inputs $I_{kL}$ in the form of gain control signals.
Equation (A2) for the activities $x_{iR}$, $i = 1, 2, \ldots, n$, in the right monocular representation has a similar interpretation. Note that the same binocular-to-monocular feedback functions $z_k$ are fed back to the left and right monocular representations.
The binocular matching stage (A3) obeys an algebraic equation rather than a differential equation due to the simplifying assumption that the differential equation for the matching activities $y_i$ reacts quickly to the monocular signals $f(x_{iL})$ and $f(x_{iR})$. Consequently, $y_i$ is always in an approximate equilibrium with respect to its input signals. This equilibrium equation says that the monocular inputs $f(x_{kL})$ and $f(x_{kR})$ are added before being matched by the shunting interaction. The signal functions $f(w)$ are chosen to be sigmoid functions of activity $w$. The excitatory and inhibitory interaction coefficients of the matching stage are chosen to be Gaussian functions of distance. The spatial decay rates of the excitatory coefficients of the monocular, matching, and feedback stages are chosen equal. The spatial decay rates of the corresponding inhibitory coefficients are chosen equal. The on-center is chosen narrower than the off-surround.
After monocular signal patterns $(f(x_{1L}), f(x_{2L}), \ldots, f(x_{nL}))$ and $(f(x_{1R}), f(x_{2R}), \ldots, f(x_{nR}))$ are matched at the binocular matching stage, the binocular activities $y_k$ are rectified by the output signal function $g(y_k)$, which is typically chosen to be a sigmoid function of $y_k$. Then these rectified output signals are distributed back to the monocular representations via competitive signals (A6) with the same spatial bandwidths as are used throughout the computation. Numerical studies have been undertaken with the following types of results (Cohen and Grossberg, 1983a).
An “edgeless blob,” or Gaussianly smoothed rectangular input, does not supraliminally excite the network at any input intensity. By contrast, when a rectangle is added to the blob input, the network generates a FIRE that globally fills-in the “figure” defined by the rectangle and uses the rectangle’s edges to generate a globally structured “ground” (Figure 22). Despite the fact that the network is totally insensitive to the blob’s intensity in the absence of the rectangle, the rectangle’s presence in the blob sensitizes the network to the ratio of rectangle-plus-blob to blob intensities, and globally fills-in these figure and ground lightness estimates. Parametric input series have been done with rectangles on rectangles, rectangles on blobs, triangles on rectangles, and so forth to study how the network estimates and globally fills-in lightness estimates that are sensitive to the figure-to-ground intensity ratio. Monocular patterns that are mismatched relative to a prescribed structural scale do not activate a FIRE at input intensities that are suprathreshold for matched monocular patterns. Thus, different structural scales selectively resonate to the patterns that they can match. Different structural scales also generate different functional scales, other things being equal. Matched monocular patterns such as those described above have been shown to elicit only subliminal feedforward edge reactions until their intensities exceed the network’s quenching threshold, whereupon a full-blown global resonance is initiated which reflects disparity, length, and lightness data in the manner previously described.
[Figure 22 plots; surviving panel labels: RECTANGLE ON BLOB, SUPRATHRESHOLD, LEFT FIELD, INPUT.]
Figure 22. Figure-ground filling-in due to a rectangle on an “edgeless blob”: By itself, the blob elicits no suprathreshold reaction in the binocular matching field at any input intensity. By itself, in a network without feedback from the matching field, the rectangle elicits only a pair of boundary edges at any input intensity. Given a fixed ratio of rectangle to blob intensity in the full network, as the background input intensity is parametrically increased, the network first elicits subthreshold reactions to the edges of the rectangle. Once the quenching threshold is exceeded, a full blown global resonance is triggered. Then the rectangle fills-in an intensity estimate between its edges (the “figure”) and structures the blob so that it fills-in an intensity estimate across the entire blob (“ground”). The two intensity estimates reflect the ratio of rectangle-to-blob input intensities. (From Cohen and Grossberg 1982.)
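The feedforward part of this description (a featureless input elicits nothing, and without feedback a rectangle elicits only boundary edges) can be checked with a single shunting on-center off-surround stage at equilibrium. The sketch below is not the binocular system of the Appendix: it has no matching field and no feedback, so it cannot fill-in. It uses the standard equilibrium of a feedforward shunting equation with Gaussian coefficients; the ring topology, the kernel widths, the uniform (rather than Gaussianly smoothed) background, and the choice B = D that balances excitation against inhibition are assumptions made here.

```python
import math


def gaussian_kernel(n, sigma):
    """Row-normalized Gaussian weights on a ring of n cells (wrap-around distance)."""
    kernel = []
    for i in range(n):
        row = []
        for k in range(n):
            d = min(abs(i - k), n - abs(i - k))
            row.append(math.exp(-d * d / (2.0 * sigma * sigma)))
        total = sum(row)
        kernel.append([w / total for w in row])
    return kernel


N_CELLS = 200
CENTER = gaussian_kernel(N_CELLS, 1.5)    # narrow on-center (assumed width)
SURROUND = gaussian_kernel(N_CELLS, 4.0)  # broad off-surround (assumed width)


def shunting_equilibrium(inputs, A=1.0, B=1.0, D=1.0):
    """x_i = (B*sum_k I_k C_ki - D*sum_k I_k E_ki) / (A + sum_k I_k (C_ki + E_ki))."""
    activities = []
    for i in range(N_CELLS):
        excite = sum(w * s for w, s in zip(CENTER[i], inputs))
        inhibit = sum(w * s for w, s in zip(SURROUND[i], inputs))
        activities.append((B * excite - D * inhibit) / (A + excite + inhibit))
    return activities


uniform = [1.0] * N_CELLS
rectangle = [1.0 + (1.0 if 70 <= k <= 130 else 0.0) for k in range(N_CELLS)]

flat_response = shunting_equilibrium(uniform)          # featureless background
edge_response = shunting_equilibrium(rectangle)        # rectangle on background
bright_100 = shunting_equilibrium([100.0 * s for s in rectangle])
bright_1000 = shunting_equilibrium([1000.0 * s for s in rectangle])
```

Here `flat_response` is zero everywhere, `edge_response` is nonzero only near cells 70 and 130, and the two bright versions nearly coincide because the shunting denominator makes the response depend on input ratios rather than absolute intensity. The interior of the rectangle stays at zero, which is the gap that the feedback-driven filling-in of the full network is meant to close.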
REFERENCES
Amari, S., Dynamics of pattern formation in lateral-inhibition type neural fields. Biological Cybernetics, 1977, 27, 77-87.
Amari, S., Competitive and cooperative aspects in dynamics of neural excitation and self-organization. In S. Amari and M. Arbib (Eds.), Competition and cooperation in neural networks. Berlin: Springer-Verlag, 1982, 1-28.
Amari, S. and Arbib, M.A., Competition and cooperation in neural nets. In J. Metzler (Ed.), Systems neuroscience. New York: Academic Press, 1977.
Arend, L.E., Spatial differential and integral operations in human vision: Implications of stabilized retinal image fading. Psychological Review, 1973, 80, 374-395.
Arend, L.E., Buehler, J.N., and Lockhead, G.R., Difference information in brightness perception. Perception and Psychophysics, 1971, 9, 367-370.
Arend, L.E., Lange, R.V., and Sandick, B.L., Nonlocal determination of brightness in spatially periodic patterns. Perception and Psychophysics, 1981, 29, 310-316.
Attneave, F., Some informational aspects of visual perception. Psychological Review, 1954, 61, 183-193.
Barlow, H.B., Optic nerve impulses and Weber’s Law. In W.R. Uttal (Ed.), Sensory coding. Boston: Little, Brown, and Co., 1972.
Barlow, H.B. and Levick, W.R., The mechanism of directionally selective units in rabbit’s retina. Journal of Physiology, 1965, 178, 447-504.
Baylor, D.A. and Hodgkin, A.L., Changes in time scale and sensitivity in turtle photoreceptors. Journal of Physiology, 1974, 242, 729-758.
Baylor, D.A., Hodgkin, A.L., and Lamb, T.D., The electrical response of turtle cones to flashes and steps of light. Journal of Physiology, 1974, 242, 685-727 (a).
Baylor, D.A., Hodgkin, A.L., and Lamb, T.D., Reconstruction of the electrical responses of turtle cones to flashes and steps of light. Journal of Physiology, 1974, 242, 759-791 (b).
Beck, J., Surface color perception. Ithaca, NY: Cornell University Press, 1972.
Bergstrom, S.S., A paradox in the perception of luminance gradients, I. Scandinavian Journal of Psychology, 1966, 7, 209-224.
Bergstrom, S.S., A paradox in the perception of luminance gradients, II. Scandinavian Journal of Psychology, 1967, 8, 25-32 (a).
Bergstrom, S.S., A paradox in the perception of luminance gradients, III. Scandinavian Journal of Psychology, 1967, 8, 33-37 (b).
Bergstrom, S.S., A note on the neural unit model for contrast phenomena. Vision Research, 1973, 13, 2087-2092.
Blake, R. and Fox, R., The psychophysical inquiry into binocular summation. Perception and Psychophysics, 1973, 14, 161-185.
Blake, R., Sloane, M., and Fox, R., Further developments in binocular summation. Perception and Psychophysics, 1981, 30, 266-276.
Blakemore, C., Carpenter, R.H., and Georgeson, M.A., Lateral inhibition between orientation detectors in the human visual system. Nature, 1970, 228, 37-39.
Blank, A.A., Metric geometry in human binocular perception: Theory and fact. In E.L.J. Leeuwenberg and H.F.J.M. Buffart (Eds.), Formal theories of visual perception. New York: Wiley and Sons, 1978.
Boynton, R.M., The psychophysics of vision. In R.N. Haber (Ed.), Contemporary theory and research in visual perception. New York: Holt, Rinehart, and Winston, 1968.
Bridgeman, B., Metacontrast and lateral inhibition. Psychological Review, 1971, 78, 528-539.
Bridgeman, B., A correlational model applied to metacontrast: Reply to Weisstein, Ozog, and Szoc. Bulletin of the Psychonomic Society, 1977, 10, 85-88.
Bridgeman, B., Distributed sensory coding applied to simulations of iconic storage and metacontrast. Bulletin of Mathematical Biology, 1978, 40, 605-623.
Buffart, H., Brightness and contrast. In E.L.J. Leeuwenberg and H.F.J.M. Buffart (Eds.), Formal theories of visual perception. New York: Wiley and Sons, 1978.
Buffart, H., A theory of cyclopean perception. Nijmegen: University, 1981.
Buffart, H., Brightness estimation: A transducer function. In H.-G. Geissler, H.F.J.M. Buffart, P. Petzoldt, and Y.M. Zabrodin (Eds.), Psychophysical judgment and the process of perception. Amsterdam: North-Holland, 1982.
Buffart, H., Leeuwenberg, E., and Restle, F., Coding theory of visual pattern completion. Journal of Experimental Psychology, 1981, 7, 241-274.
Caelli, T.M., Visual perception: Theory and practice. Oxford: Pergamon Press, 1982.
Caelli, T.M., Hoffman, W.C., and Lindman, H., Apparent motion: Self-excited oscillation induced by retarded neuronal flows. In E.L.J. Leeuwenberg and H.F.J.M. Buffart (Eds.), Formal theories of visual perception. New York: Wiley and Sons, 1978.
Carpenter, G.A. and Grossberg, S., Adaptation and transmitter gating in vertebrate photoreceptors. Journal of Theoretical Neurobiology, 1981, 1, 1-42.
Carpenter, G.A. and Grossberg, S., Dynamic models of neural systems: Propagated signals, photoreceptor transduction, and circadian rhythms. In J.P.E. Hodgson (Ed.), Oscillations in mathematical biology. New York: Springer-Verlag, 1983.
Cogan, A.L., Silverman, G., and Sekuler, R., Binocular summation in detection of contrast flashes. Perception and Psychophysics, 1982, 31, 330-338.
Cohen, M.A. and Grossberg, S., Some global properties of binocular resonances: Disparity matching, filling-in, and figure-ground synthesis. In P. Dodwell and T. Caelli (Eds.), Figural synthesis. Hillsdale, NJ: Erlbaum, 1983 (a).
Cohen, M.A. and Grossberg, S., The dynamics of brightness perception. In preparation, 1983 (b).
Cohen, M.A. and Grossberg, S., Absolute stability of global pattern formation and parallel memory storage in competitive neural networks. Transactions IEEE, in press, 1983 (c).
Coren, S., Brightness contrast as a function of figure-ground relations. Journal of Experimental Psychology, 1969, 80, 517-524.
Coren, S., Subjective contours and apparent depth. Psychological Review, 1972, 79, 359-367.
Coren, S., Porac, C., and Ward, L.M., Sensation and perception. New York: Academic Press, 1979.
Cornsweet, T.N., Visual perception. New York: Academic Press, 1970.
Crick, F.H.C., Marr, D., and Poggio, T., An information processing approach to understanding the visual cortex. In The cerebral cortex: Neurosciences research program, 1980.
Curtis, D.W. and Rule, S.J., Binocular processing of brightness information: A vector-sum model. Journal of Experimental Psychology: Human Perception and Performance, 1978, 4, 132-143.
Dalenoort, G.J., In search of the conditions for the genesis of cell assemblies: A study in self-organization. Journal of Social and Biological Structures, 1982, 5, 161-187 (a).
Dalenoort, G.J., Modelling cognitive processes in self-organizing neural networks, an exercise in scientific reduction. In L.M. Ricciardi and A.C. Scott (Eds.), Biomathematics in 1980. Amsterdam: North-Holland, 1982, 133-144 (b).
Day, R.H., Visual spatial illusions: A general explanation. Science, 1972, 175, 1335-1340.
DeLange, H., Attenuation characteristics and phase-shift characteristics of the human fovea-cortex systems in relation to flicker-fusion phenomena. Delft: Technical University, 1957.
Deregowski, J.B., Illusion and culture. In R.L. Gregory and G.H. Gombrich (Eds.), Illusions in nature and art. New York: Scribner’s, 1973, 161-192.
Dev, P., Perception of depth surfaces in random-dot stereograms: A neural model. International Journal of Man-Machine Studies, 1975, 7, 511-528.
DeWeert, Ch.M.M. and Levelt, W.J.M., Binocular brightness combinations: Additive and nonadditive aspects. Perception and Psychophysics, 1974, 15, 551-562.
Diner, D., Hysteresis in human binocular fusion: A second look. Ph.D. Thesis, California Institute of Technology, Pasadena, 1978.
Dodwell, P.C., Pattern and object perception. In E.C. Carterette and M.P. Friedman (Eds.), Handbook of perception, Vol. 5: Seeing. New York: Academic Press, 1975.
Eijkman, E.G.J., Jongsma, H.J., and Vincent, J., Two-dimensional filtering, oriented line detectors, and figural aspects as determinants of visual illusions. Perception and Psychophysics, 1981, 29, 352-358.
Ellias, S. and Grossberg, S., Pattern formation, contrast control, and oscillations in the short term memory of shunting on-center off-surround networks. Biological Cybernetics, 1975, 20, 69-98.
Emmert, E., Grossenverhaltnisse der Nachbilder. Klinische Monatsblatt der Augenheilkunde, 1881, 19, 442-450.
Engel, G.R., The visual processes underlying binocular brightness summation. Vision Research, 1967, 7, 753-767.
Engel, G.R., The autocorrelation function and binocular brightness mixing. Vision Research, 1969, 9, 1111-1130.
Enroth-Cugell, C. and Robson, J.G., The contrast sensitivity of retinal ganglion cells of the cat. Journal of Physiology, 1966, 187, 517-552.
Fender, D. and Julesz, B., Extension of Panum’s fusional area in binocularly stabilized vision. Journal of the Optical Society of America, 1967, 57, 819-830.
Festinger, L., Coren, S., and Rivers, G., The effect of attention on brightness contrast and assimilation. American Journal of Psychology, 1970, 83, 189-207.
Foley, J.M., Depth, size, and distance in stereoscopic vision. Perception and Psychophysics, 1968, 3, 265-274.
Foley, J.M., Binocular depth mixture. Vision Research, 1976, 16, 1263-1267.
Foley, J.M., Binocular distance perception. Psychological Review, 1980, 87, 411-434.
Foster, D.H., Visual apparent motion and the calculus of variations. In E.L.J. Leeuwenberg and H.F.J.M. Buffart (Eds.), Formal theories of visual perception. New York: Wiley and Sons, 1978.
Foster, D.H., A spatial perturbation technique for the investigation of discrete internal representations of visual patterns. Biological Cybernetics, 1980, 38, 159-169.
Fox, R. and McIntyre, C., Suppression during binocular fusion of complex targets. Psychonomic Science, 1967, 8, 143-144.
Chapter 1
70
Freeman, W.J., Cinematic display of spatial structure of EEG and averaged evoked potentials (AEPs) of olfactory bulb and cortex. Electroencephalography and Clinical Neurophysiology, 1973, S7, 199. Freeman, W.J., Mass action in the nervous system. New York: Academic Press, 1975.
Freeman, W.J., EEG analysis gives model of neuronal template matching mechanism for sensory search with olfactory bulb. Biological Cybernetics, 1979, 35, 221-234 ( a ) . Freeman, W .J., Nonlinear dynamics of paleocortex manifested in the olfactory EEG. Biological Cybernetics, 1979, 55, 21-37 (b). Freeman, W.J., Nonlinear gain mediating cortical stimulus response relations. Biological Cybernetics, 1979, 55, 237-247 (c). Freeman, W.J., A physiological hypothesis of perception. Perspectives in Biology and Medicine, 1981, 24, 561-592. Freeman, W.J. and Schneider, W., Changes in spatial patterns of rabbit olfactory EEG with conditioning to odors. Psychophysiology, 1982, 19, 44-56. Frisby, J.P., Seeing. Oxford: Oxford University Press, 1979. Frisby, J.P. and Julesz, B., Depth reduction effects in random line stereograms. Perception, 1975,4, 151-158. Gerrits, H.J.M., deHaan, B.,and Vendrick. A.J.H., Experiments with retinal stabilized images: Relations between the observations and neural data. Vision Research, 1966, 6, 427-440.
Gerrits, H.J.M. and Timmermann, J.G.M.E.N., The filling-in process in patients with retinal scotomata. Vision Research, 1969, 9, 439-442. Gerrits, H.J.M. and Vendrick, A.J.H., Artificial movementsof a stabilized image. Vision Research, 1970, 10, 1443-1456 (a). Gerrits, H.J.M. and Vendrick, A.J.H., Simultaneous contrast, filling-in process and information processing in man’s visual system. Experimental Brain Research, 1970, 11, 411-430 (b). Gerrits, H.J.M. and Vendrick, A.J.H., Eye movements necessary for continuous perception during stablization of retinal images. Bibliotheca Ophthalmologica, 1972, 82, 339-347.
Gerrits, H.J.M. and Vendrick, A.J.H., The influence of simultaneous movements on perception in parafoveal stabilized vision. Vision Research, 1974, 14, 175-180. Gibson, J., Perception of the visual world. Boston: Houghton Mifflin, 1950. Gilchrist, A.L., Perceived lightness depends on perceived spatial arrangement. Science, 1977, 195, 185-187.
Gilchrist, A.L., The perception of surface blacks and whites. Scientific American, 1979, 240, 112-124.
Glass, L., Effect of blurring on perception of a simple geometric pattern. Nature, 1970, 228, 1341-1342.
Glass,L. and Switkes, E., Pattern recognition in humans: Correlations which cannot be perceived. Perception, 1976, 5, 67-72. Gogel, W.C., The tendency to see objects as equidistant and its reverse relations to lateral separation. Psychological Monograph 70 (whole no. 411), 1966. Gogel, W.C.,Equidistance tendency and its consequences. Psychological Bulletin, 1965,64, 153-163.
Gogel, W.C., The adjacency principle and three-dimensional visual illusions. Psyche nomic Monograph, Supplement 3 (whole no. 45), 153-169, 1970. Gonzales-Estrada, M.T.and Freeman, W.J., Effects of carnosine on olfactory bulb EEG, evoked potentials and DC potentials. Brain Research, 1980, 202, 373-386.
The Quantized Geometry of VisualSpace
71
Graham, N., The visual system does a crude Fourier analysis of patterns. In S. Grossberg (Ed.), Mathematical psychology a n d psychophysiology. Providence, RI: American Mat,hematical Society, 1981. Graham, N. and Nachmias, J., Detection of grating patterns containing two spatial frequencies: A test of single-channel and multiple channel models. Vision Research, 1971, 11, 251-259. Graham, N., Robson, J.G., and Nachmias, J., Grating summation in fovea and periphery. Vision Research, 1978, 18, 816-825. Gregory, R.L., Eye and brain. New York: McGraw-Hill, 1966. Grimson, W.E.L., A computer implementation of a theory of human stereo vision. Philosophical Transactions of the Royal Society of London B, 1981, 292, 217-253. Grimson, W.E.L., A computational theory of visual surface interpolation. Philosophical Transactions of the Royal Society of London B, 1982, 298, 395-427 (a). Grimson, W.E.L., From images to surfaces: A computational study of the human early visual system. Cambridge, MA: MIT Press, 1982 (b). Grimson, W.E.L., Surface consistency constraints in vision. Computer Graphics and Image Processing, in press, 1983. Grossberg, S., Some physiological and biochemical consequences of psychological postulates. Proceedings of the National Academy of Sciences, 1968, 60,758-765. Grossberg, S., On learning and energy-entropy dependence in recurrent and nonrecurrent signed networks. Journal of Statistical Physics, 1969,1, 319-350(a). Grossberg, S.,On the serial learning of lists. Mathematical Biosciences, 1969,4,201253 (b). Grossberg, S.,Neural pattern discrimination. Journal of Theoretical Biology, 1970,27, 291-337 (a). Grossberg, S., Some networks that can learn, remember, and reproduce any number of complicated space-time patterns, 11. Studies in Applied Mathematics, 1970, 49, 135-166 (b). Grossberg, S., On the dynamics of operant conditioning. Journal of Theoretical Biology, 1971, 33,225-255 (a). Grossberg, S., Pavlovian pattern learning by nonlinear neural networks. 
Proceedings of the National Academy of Sciences, 1971,68,828-831 (b). Grossberg, S., A neural theory of punishment and avoidance, I: Qualitative theory. Mathematical Biosciencs, 1972, 16, 39-67 (a). Grossberg, S.,A neural theory of punishment and avoidance, 11: Quantitative theory. Mathematical Biosciences, 1972, 15, 253-285 (b). Grossberg, S., Pattern learning by functional-differential neural networks with arbitrary path weights. In K. Schmitt (Ed , Delay and functional-differential equations and their applications. New ork: Academic Press, 1972 (c). Grossberg, S., Neural expectation: Cerebellar and retinal analogs of cells fired by learnable or unlearned pattern classes. Kybernetik, 1972,10, 49-57 (d). Grossberg, S., Contour enhancement, short-term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 1973, 52, 217-257. Grossberg, S., Classical and instrumental learning by neural networks. In R. Rosen and F. Snell (Eds.), Progress in theoretical biology, Vol. 3. New York: Academic Press, 1974. Grossberg, S., A neural model of attention, reinforcement, and discrimination learning. International Review of Neurobiology, 1975,18,263-327.
;I
12
Chapter 1
Grossberg, S., Adaptive pattern classification and universal recoding, I: Parallel development and coding of neural feature detectors. Biological Cybernetics, 1976,23, 121-134 (a). Grossberg, S., Adaptive pattern classification and universal recoding, 11: Feedback, expectation, olfaction, and illusions. Biological Cybernetics, 1976,23, 187-202 (b). Grossberg, S.,On the development of feature detectors in the visual cortex with applications to learning and reaction-diffusion systems. Biological Cybernetics, 1976,21, 145-159 (c). Grossberg, S., Behavioral contrast in short-term memory: Serial binary memory models or parallel continuous memory models? Journal of Mathematical Psychology, 1978, 17, 199-219 (a). Grossberg, S., Communication, memory, and development. In R. Rosen and F. Snell (Eds.), Progress in theoretical biology, Vol. 5. New York: Academic Press, 1978 (b). Grossberg, S., Competition, decision, and consensus. Journal ofMathematical Analysis and Applications, 1978,66,470-493 (c). Grossberg, S., Decisions, patterns, and oscillations in the dynamics of competitive systems with applications to Volterra-Lotka systems. Journal of Theoretical Biology, 1978, 7S, 101-130 (a). Grossberg, S., A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans. In R. Rosen and F. Snell (Eds.), Progress in theoretical biology, Vol. 5. New York: Academic Press, 1978 (e). Grossberg, S., Biological competition: Decision rules, pattern formation, and oscillations. Proceedings of the National Academy of Sciences, 1980,77,2338-2342 (a). Grossberg, S., How does a brain build a cognitive code? Psychological Review, 1980, 87,1-51 (b). Grossberg, S., Adaptive resonance in development, perception, and cognition. In S. Grossberg (Ed.), Mathematical psychology a n d psychophysiology. Providence, RI: American Mathematical Society, 1981. 
Grossberg, S., Associative and competitive principles of learning and development: The temporal unfolding and stability of STM and LTM patterns. In S.I. Amari and M. Arbib (Eds.), Competition and cooperation in neural networks. New York: Springer-Verlag, 1982 (a), Grossberg, S., A psychophysiological theory of reinforcement, drive, motivation, and attention. Journal of Theoretical Neurobiology, 1982,1,286-369 (b). Grossberg, S.,The processing of expected and unexpected events during conditioning and attention: A psychophysiological theory. Psychological Review, 1982, 89,529572 (c). Grossberg, S., Some psychophysiological and pharmacological correlates of a developmental, cognitive, and motivational theory. In R. Karrer, J. Cohen, and P. Tueting (Eds.), Brain and information: Event related potentials. New York: New York Academy of Sciences, 1982 (a). Grossberg, S., Studies of mind a n d brain: Neural principles of learning, perception, development, cognition, and motor control. Boston: Reidel Press, 1982 (e). Grossberg, S., The adaptive self-organization of serial order in behavior: Speech and motor control. In E.C. Schwab and H.C. Nusbaum (Eds.), Perception of speech and visual form: Theoretical issues, models, and research. New York: Academic Press, 1983. Grossberg, S. and Kuperstein, M., Adaptive dynamics of the saccadic eye movement system. In preparation, 1983.
The Quaritized Geonteny of Visual Space
73
Grossberg, S. and Levine, D., Some developmental and at,tentional biases in the contrast enhancement and short term memory of recurrent neural networks. Journal of Theoretical Biology, 1975, 53, 341-380. Grossberg, S. and Pepe, J., Schizophrenia: Possible dependence of associational span, bowing, and primacy versus recency on spiking threshold. Behavioral Science, 1970, 15, 359-362. Grossberg, S. and Pepe, J., Spiking threshold and overarousal effects in serial learning. Journal of Statistical Physics, 1971, 3, 95-125. Griinau, M.W. von, The involvement of illusory contours in stroboscopic motion. Perception and Psychophysics, 1979, 25, 205-208. Hagen, M.A. and Teghtsoonian, M., The effects of binocular and motion-generated information on the perception of depth and height. Perception and Psychophysics, 1981, SO, 257-265. Hamada, J., A mathematical model for brightness and contour perception. Hokkaido Report of Psychology, 1976, HRP-11-76-17. Hamada, J., Antagonistic and non-antagonistic processes in the lightness perception. Proceedings of the XXII International Congress of Psychology, Leipzig, July 6 - 12, 1980. Hebb, D.O., The organieation of behavior. New York: Wiley and Sons, 1949. Hecht, S., Vision 11: The nature of the photoreceptor process. In C. Murchison (Ed.), A handbook of general experinlent a1 psychology. Worcester, MA: Clark University Press, 1934. Helmholtz, H.L.F. von, Treatise on physiological optics, J.P.C. Southall (Trans.). New York: Dover, 1962. Hepler, N., Color: A motion-contingent after-effect. Science, 1968, 162, 376-377. Hering, E., Outlines of a theory of the light sense. Cambridge, MA: Harvard University Press, 1964. Hermann, A., T h e genesis of q u a n t u m theory (1899-1913), C.W. Nash (Trans.). Cambridge, MA: MIT Press, 1971. Hildreth, E.C., Implementation of a theory of edge detection. MIT Artificial Intelligence Laboratory Technical Report TR-579, 1980. Hochberg, J., Contralateral suppressive fields of binocular combination. 
Psychonomic Science, 1964, 1, 157-158. Hochberg, J. and Beck, J., Apparent spatial arrangement and perceived brightness. American Journal of Psychology, 1954, 47, 263-266. Holway, A.F. and Boring, E.G., Determinants of apparent visual size with distance variant. American Journal of Psychology, 1941, 54, 21-37. Horn, B.K.P., Determining lightness from an image. Computer Graphics and Image Processing, 1974, 3, 277-299. Horn, B.K.P., Understanding image intensities. Artificial Intelligence, 1977,8, 201-231. Hubel, D.H. and Wiesel, T.N., Functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London (B),1977, 198, 1-59. Hurvich, L.M. and Jameson, D., Some quantitative aspects of an opponent-color theory, 11: Brightness, saturation, and hue in normal and dichromatic vision. Journal of the Optical Society of America, 1955,45, 602-616. Indow, T., Alleys in visual space. Journal of Mathematical Psychology, 1979, 19, 221-258. Indow, T., An approach to geometry of visual space with no a priori mapping functions. Journal of Mathematical Psychology, in press, 1983.
74
Chapter 1
Johansson, G., About the geometry underlying spontaneous visual decoding of the optical message. In E.L.J. Leeuwenberg and H.F.J.M. Buffart (Eds.), Formal theories of visual perception. New York: Wiley and Sons, 1978. Julesz, B., Binocular depth perception of computer-generated patterns. Bell System Technical Journal, 1960,59, 1125-1162. Julesz, B., Towards the automation of binocular depth perception (AUTOMAP). Praceedings of the IFIP Congress 62, 27 Aug-1 Sep 1962. Amsterdam: NorthHolland, 1962,439-444. Julesz, B., Binocular depth perception without familiarity cues. Science, 1964, 145, 356-362. Julesz, B., Binocular depth perception in mm-a cooperative model of stereopsis. In 0.-J. Grusser and R. Klinke (Eds. P a t t e r n recognition in biological and technical systems, Proceedings of t e German Cybernetic Society, Berlin, April 6-9, 1970. Berlin: Springer-Verlag, 1971,300-315 (a). Julesz, B.,Foundations of cyclopean perception. Chicago: University of Chicago Press, 1971 (b). Julesz, B., Cooperative phenomena in binocular depth perception. American Scientist, 62, 32-43. Reprinted in I.L. Janis (Ed.), Current trends in psychology: Readings from American Scientist. Los Altos, CA: W. Kaufmann, 1974. Julesz, B., Global stereopsis: Cooperative phenomena in stereoscopic depth percep tion. In R. Held, H.W. Leibowitz, and H.-L. Teuber (Eds.), Handbook of sensory physiology, Vol. 8: Perception. Berlin: Springer-Verlag, 1978,215-256 (a). Julesz, B., Perceptual limits of texture discrimination and their implications to figuground separation. In E.L.J. Leeuwenberg and H.F.J.M. Buffart Eds.), Formal theories of visual perception. New York: Wiley and Sons, 1978 b). Julesz, B. and Chang, J.J., Interaction between pools of binocular disparity detectors tuned to different disparities. Biological Cybernetics, 1976,22, 107-119. Just, M.A. and Carpenter, P.A., Eye fixations and cognitive processes. Cognitive Psychology, 1976,8,441-480. Kaczmarek, L.K. 
and Babloyantz, A., Spatiotemporal patterns in epileptic seizures. Biological Cybernetics, 1977,26, 199-208. Kaufman, L., Sight and mind: A n introduction to visual perception. New York: Oxford University Press, 1974. Kaufman, L., Bacon, J., and Barroso, F., Stereopsis without image segregation. Vision Research, 1973,19, 137-147. Klatt, D.H., Speech perception: A model of acoustic-phonetic analysis and lexical access. In R.A. Cole (Ed.), Perception and production of fluent speech. Hillsdale, NJ: Erlbaum, 1980. Konig, A. and Brodhun, E., Experimentelle Untersuchungen Uber die psychophysische Fundamentalformel in Bezug auf den Gesichtssinn. Siteungsberichte der preussischen Akademie der Wissenschaften, Berlin, 1889, 27, 641-644. Koffka, K., Principles of gestalt psychology. New York: Harcourt and Brace, 1935. Kulikowski, J.J., Limit of single vision in stereopsis depends on contour sharpness. Nature, 1978,275, 126-127. Laming, D.R.J., Mathematical psychology. London: Academic Press, 1973. Land, E.H., The retinex theory of color vision. Scientific American, 1977,2S7,108-128. Land, E.H. and McCann, J.J., Lightness and retinex theory. Journal of the Optical Society of America, 1971,61, 1-11. Lanze, M.,Weisstein, N.,and Harris, J.R., Perceived depth versus structural r e l e k c e in the object-superiority effect. Perception and Psychophysics, 1982,S1, 376-382.
k
t
The Quantized Georneny of VisualSpace
I5
Leake, B. and Annines, P., Effects of connectivity on the activity of neural net models. Journal of Theoretical Biology, 1976, 58,337-363. Leeuwenberg, E., The perception of assimilation and brightness contrast. Perception and Psychophysics, 1982,32, 345-352. Legge, G.E. and Foley, J.M., Contrast masking in human vision. Journal of the Optical Society of America, 1980, 70, 1458-1471. Legge, G.E.and Rubin, G.S.,Binocular interactions in suprathreshold contrast perception. Perception and Psychophysics, 1981,30, 49-61. LeGrand, Y., Light, colour, and vision. New York: Dover Press, 1957. Leshowitz, B., Taub, H.B., and Raab, D.H., Visual detection of signals in the presence of continuous and pulsed backgrounds. Perception and Psychophysics 1968,4, 207213. Lettvin, J.Y., “Filling out the forms”: An appreciation of Hubel and Weisel. Science, 1981, 214, 518-520. Levelt, W.J.M., O n binocular rivalry. Soesterberg, The Netherlands: Institute for Perception, RVO-TNO, 1965. Levine, D.S. and Grossberg, S., Visual illusions in neural networks: Line neutralization, tilt aftereffect, and angle expansion. Journal of Theoretical Biology, 1976, 61,477504. Logan, B.F. Jr., Information in the zero-crossings of bandpass signals. Bell System Technical Journal, 1977, 56, 487-510. Luneberg, R.K., Mathematical analysis of binocular vision. Princeton, NJ.: Princeton University Press, 1947. Luneberg, R.K., The metric of binocular visual space. Journal of the Optical Society of America, 1950,60,637-642. McCourt, M.E., A spatial frequency dependent grating-induction effect. Vision Research, 1982, 22, 119-134. Marr, D.,The computation of lightness by the primate retina. Vision Research, 1974, 14, 1377. Marr, D., Early processing of visual information. Philosophical Transactions of the Royal Society of London B, 1976, 275, 483-524. Marr, D., Artificial intelligence-a personal view. Artificial Intelligence, 1977,9,37-48. Marr, D., Representing visual information. 
Lectures on Mathematics in the Life Sciences, 1978, 10, 101-180. Marr, D., Vision: A computational investigation into the h u m a n representation and processing of visual information. San Francisco: W.H. Freeman, 1982. Marr, D. and Hildreth, E., Theory of edge detection. Proceedings of the Royal Society of London (B), 1980, 207, 187-217. Marr, D. and Poggio, T., Cooperative computation of stereo disparity. Science, 1976, 194, 283-287. Marr, D. and Poggio, T., From understanding computation to understanding neural circuitry. Neurosciences Research Progress Bulletin, 1977,15, 470-488. Marr, D. and Poggio, T., A computational theory of human stereo vision. Proceedings of the Royal Society of London B, 1979, 204, 301-328. Maudarbocus, A.Y. and Ruddock, K.H.,Non-linearity of visual signals in relation to shape-sensitive adaptation processes. Vision Research, 1973, 13, 1713-1737. Mayhew, J.E.W. and Frisby, J.P., Psychophysical and computational studies towards a theory of human stereopsis. Artificial Intelligence, 1981,17, 349-385.
Chopter I
16
Miller, R.F., The neuronal basis of ganglion-re11 receptive-field organization and the physiology of amacrine cells. In F.O. Schmitt (Ed.), The neuroscience fourth study program. Cambridge, MA: MIT Press, 1979. Minor, A.V., Flerova, G.I., and Byzov, A.L., Integral evoked potentials of single neurons in the frog olfactory blub (in Russian). Neurophysiologica, 1969, 1, 269-278. Mori, T., Apparent motion path composed of a serial concatenation of translations and rotations. Biological Cybernetics, 1982, 44, 31-34. Nachmias, J. and Kocher, E.C., Visual detection and discrimination of luminance increments. Journal of the Optical Society of America, 1970, 00, 382-389. Newell, A., Harpy, production systems, and human cognition. In R. Cole (Ed.), Perception and production of fluent speech. Hillsdale, NJ: Erlbaum, 1980. O’Brien, V., Contour perception, illusion and reality. Journal of the Optical Society of America, 1958, 48, 112-119. Osgood, C.E., Suci, G.J., and Tannenbaum, P.H., T h e measurement of meaning. Urbana: University of Illinois, 1957. Poggio, T., Neurons sensitive to random-dot stereograms in areas 17 and 18 of the rhesus monkey. Society for Neuroscience Abstracts, 1980, 0. Poggio, T., Trigger features or Fourier analysis in early vision: A new point of view. In D. Albrecht (Ed.), The recognition of p a t t e r n a n d form, Lecture Notes in Biomathematics. New York: Springer-Verlag, 1982, 44, 88 -99. Pollen, D.A. and Ronner, S.F., Phase relationships between adjacent simple cells in the visual cortex. Science, 1981, 212, 1409-1411. Pollen, D.A., Spatial computation performed by simple and complex cells in the visual cortex of the cat. Vision Research, 1982, 22, 101-118. Pulliam, K., Spatial frequency analysis of three-dimensional vision. Proceedings of the Society of Photo-Optical Instrumentation Engineers, 1981, 303, 71-77. Raaijmakers, J.G.W. and Shiffrin, R.M., Search of associative memory. Psychological Review, 1981, 88,93-134. 
Rall, W., Core conductor theory and cable properties of neurons. In E.R. Kandel (Ed.), Handbook of physiology: T h e nervous system, Vol. 1, Part 1. Bethesda, MD: American Physiological Society, 1977. Rashevsky, N., Mathematical biophysics. Chicago: University of Chicago Press, 1968.
Ratliff, F., Mach bands: Quantitative studies on neural networks in the retina. New York: Holden-Day, 1965. Rauschecker, J.P.J., Campbell, F.W., and Atkinson, J., Colour opponent neurones in the human visual system. Nature, 1973, 245, 42-45. Restle, F., Mathematical models in psychology. Baltimore. MD: Penguin Books, 1971.
Richards, W., Visual space perception. In E.C. Carterette and M.P. Friedman (Eds.), Handbook of perception, Vol. 5: Seeing. New York: Academic Press, 1975. Richards, W. and Marr, D., Computational algorithms for visual processing. MIT Artificial Intelligence Lab, 1981. Richards, W. and Miller, J.F. Jr., The corridor illusion. Perception and Psychophysics, 1971, 9,421-423.
Richter, J. and Ullman, S., A model for the temporal organization of X- and Y-type receptive fields in the primate retina. Biological Cybernetics, 1982, 43, 127-145. Robson, J.G., Receptive fields: Neural representation of the spatial and intensive attributes of the visual image. In E.C. Carterette and M.P. Friedman (Eds.), Handbook of perception, Vol. 5: Seeing. New York: Academic Press, 1975.
n2e Quantized Geomeuy of Visual Space
77
Robson, J.G. and Graham, N., Probability slimmation and regional variation in contrast sensitivity across the visual field. Vision Research, 1981,21, 409-418. Rock, I., In defense of unconscious inference. In W. Epstein (Ed.), Stability and constancy in visual perception. New York: Wiley and Sons, 1977. Rodicck, R.W. and Stone, J., Analysis of receptive fields of cat retinal ganglion cells. Journal of Neurophysiology, 1965,28, 833-849. Rozental, S. (Ed.), Niels Bohr. New York: Wiley and Sons, 1967. Rushton, W.A., Visual adaptation: The Ferrier lecture, 1962. Proceedings of the Royal Society of London B, 1965,162,20-46. Sakata, H., Mechanism of Craik-O’Brien effect. Vision Research, 1981,21, 693-699. Schriever, W.,Experimentelle studien uber stereokopische sehen. Zeitschrift fuer Psychologie, 1925,96, 113-170. Schrijdinger, E., Miiller-Pouillets Lehrbuch d e r Physik 11. Auflage, Zweiter Band. Braunschweig. Schwartz, E.L.,Computational anatomy and functional architecture of striate cortex: A spatial mapping approach to perceptual coding. Vision Research, 1980,20, 645669. Sekuler, R., Visual motion perception. In E.C. Carterette and M.P. Friedman (Eds.), H a n d b o o k of perception, Vol. 5: Seeing. New York: Academic Press, 1975. Shepard, R.N., Multidimensional scaling, tree-fitting, and clustering. Science, 1980, 210, 390-398. Shepard, R.N. and Chipman, S., Second-order isomorphism of internal representations: Shapes of states. Cognitive Psychology, 1970,1, 1-17. Shepard, R.N. and Metzler, J., Mental rotation of three-dimensional objects. Science, 1971, 171, 701-703. Shepherd, G.M., Synaptic organization of the mammalian olfactory bulb. Physiological Review, 1972,52, 864-917. Shipley, T., Visual contours in homogeneous space. Science, 1965,150, 348-350. Singer, W., The role of attention in developmental plasticity. Human Neurobiology, 1982,1, 41-43. Smith, A.T. and Over, R.,Motion aftereffect with subjective contours. Perception and Psychophysics, 1979,25, 95-98. 
Sperling, G., Binocular vision: A physical and a neural theory. American Journal of Psychology, 1970,85, 461-534. Sperling, G., Mathematical models of binocular vision. In S. Grossberg (Ed.), Mathematical psychology and psychophysiology. Providence, RI: American Mathematical Society, 1981. Sperling, G. and Sondhi, M.M., Model for visual luminance discrimination and flicker detection. Journal of the Optical Society of America, 1968,58, 1133-1145. Stevens, S.S.,The quantification of sensation. Daedalus, 1959,88,606-621. Stromeyer, C.F. I11 and Mansfield, R.J.W., Colored after-effects produced with moving edges. Perception and Psychophysics, 1970, 7,108-114. Swets, J.A., Is there a sensory threshold? Science, 1961,134, 168-177. Tschermak-Seysenegg, A. von, Introduction to physiological optics, P. Boeder (Trans.). Springfield, IL: C.C. Thomas, 1952. Tynan, P. and Sekuler, R., Moving visual phantoms: A new contour completion effect. Science, 1975,188, 951-952. Uttal, W., The psychobiology of sensory coding. New York: Harper and Row, 1973.
78
Chapter I
van den Brink, G. and Keemink, C.J., Luminance gradients and edge effects. Vision Research, 1976,16,155-159. van Nes, F.L., Experimental studies i n spatio-temporal contrast transfer by the human eye. Utrecht: University, 1968. van Nes, F.L. and Bouman, M.A., The effects of wavelength and luminance on visual modulation transfer. Excerpta Medica International Congress Series, 1965, 125, 183-192. van Tuijl, H.F.J.M. and Leeuwenberg, E.L.J., Neon color spreading and structural information measures. Perception and Psychophysics, 1979,25, 269-284. von BCkCsy, G., Mach-and Hering-type lateral inhibition in vision. Vision Research, 1968,8,1483-1499. Wallach, H. and Adams, P.A., Binocular rivalry of achromatic colors. American Journal of Psychofogy, 1954,07, 513-516. Watson, A.S., A Riemann geometric explanation of the visual illusions and figural aftereffects. In E.C.J. Leeuwenberg and H.F.J.M. Buffart (Eds.), Formal theories of visual perception. New York: Wiley and Sons, 1978. Weisstein, N., The joy of Fourier analysis. In C.S. Harris (Ed.), Visual coding and adaptability. Hillsdale, NJ: Erlbaum, 1980. Weisstein, N. and Harris, C.S., Masking and the unmasking of distributed representations in the visual system In C.S. Harris (Ed.), Visual coding and adaptability. Hillsdale, NJ: Erlbaum, 1980. Weisstein, N., Harris, C.S., Berbaum, K., Tangney, J., and Williams, A., Contrast reduction by small localized stimuli: Extensive spatial spread of above-threshold orientation-selective masking. Vision Research, 1977, 17, 341-350. Weisstein, N. and Maguire, W., Computing the next step: Psychophysical measures of representation and interpretation. In E. Riseman and A. Hanson (Eds.), Computer vision systems. New York: Academic Press, 1978. Weisstein, N., Maguire, W., and Berbaum, K., Visual phantoms produced by moving subjective contours generate a motion aftereffect. Bulletin of the Psychonomic Society, 1976, 8,240 (abstract). 
Weisstein, N., Maguire, W., and Berbaum, K., A phantom-motion aftereffect. Science, 1977,198, 955-998. Weisstein, N., Maguire, W., and Williams, M.C., Moving phantom contours and the phantom-motion aftereffect vary with perceived depth. Bulletin of the Psychonomic Society, 1978,12,248 (abstract). Weisstein, N., Matthews, M., and Berbaum, K., Illusory contours can mask real contours. Bulletin of the Psychonornic Society, 1974,4,266 (abstract). Werblin, F.S., Adaptation in a vertebrate retina: Intracellular recordings in Necturus. Journal of Neurophysiology, 1971,34,228-241. Werner, H., Dynamics in binocular depth perception. Psychological Monograph (whole no. 218), 1937. Wilson, H.R., A transducer function for threshold and suprathreshold human vision. Biological Cybernetics, 1980,38, 171-178. Wilson, H.R.and Bergen, J.R.,A four-mechanism model for spatial vision. Vision Research, 1979,19, 19-32. Wilson, H.R. and Cowan, J.D., Excitatory and inhibitory interactions in localized populations of model neurons. Biophysical Journal, 1972,12, 1-24. Wilson, H.R. and Cowan, J.D., A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue. Kybernetik, 1973,13, 55-80.
The Quantized Geometry of Visual Space
79
Winston, P.H., MIT Progress in understanding images. Proceedings: Trnnge TJnderstanding Workshop, Palo Alto, California, 1979,25--36. Wyatt, H.J.and Daw, N.W., Directionally sensitive ganglion cells in the rabbit retina: Specificity for stimulus direction, size, and speed. Journal of -Veurophysiology,1975, 38,613-626. Zucker, S.W., Motion and the Mueller-Lyer illusion. McGili University Department of Electrical Engineering Technical Report 80-2R,1980.
Chapter 2
NEURAL DYNAMICS OF FORM PERCEPTION: BOUNDARY COMPLETION, ILLUSORY FIGURES, AND NEON COLOR SPREADING

Preface

This Chapter illustrates our belief that the rules for form and color processing can best be understood by considering how these two types of processes interact. We suggest that form and color are handled by two parallel contour-extracting systems: The Boundary Contour System detects, sharpens, and completes boundaries. The Feature Contour System generates the color and brightness signals which elicit featural filling-in within these boundaries.

Our analysis of these systems leads to several revolutionary conclusions, whose paradoxical nature is most clearly perceived when they are expressed unabashedly without technical caveats or interpretations. These conclusions include: All boundaries are invisible. All line ends are illusory. Boundaries are formed discontinuously.

Such conclusions arise from an analysis of visual perception which provides simple, if as yet incomplete, answers to the following types of questions: How do we recognize emergent groupings without necessarily seeing contrasts that correspond to these groupings? How can boundaries be formed preattentively, yet be influenced by attention and learned information? How can local features initiate the organization of a percept, yet often be overruled by global configural properties that determine the final percept? How can early stages in boundary formation be sensitive to local image contrasts, yet the final boundary configuration possess structural, coherent, and hysteretic properties which can persist despite significant changes in local image contrasts?

In order to understand such issues, we have come to realize that the visual system trades off several problems against one another.
Indeed, the visual system provides excellent examples of how individual neural subsystems, by needing to be specialized to deal with part of an adaptive problem, cannot have complete information about the problem as a whole, yet the interactions between these subsystems are so cleverly contrived that the system as a whole can synthesize a globally consistent solution to the problem. We call one of the key trade-offs the Boundary-Feature Trade-Off. A study of this trade-off reveals that several basic uncertainty principles limit the information which particular visual processing stages can, in principle, compute. The visual system does not, however, succumb to these uncertainties. Instead, later processing stages are designed to overcome them. One such uncertainty principle concerns how the visual system discounts the illuminant. In order to do so, it extracts color edges at an early processing stage. To recapture the veridical colors that lie between these color edges, it uses the color edges to fill-in color interiors at a later processing stage. In order to contain this featural filling-in process, the visual system uses cells with oriented receptive fields to detect local boundary contrasts. Such oriented cells cannot, however, detect line ends and corners. “Orientational” certainty thus implies a type of “positional” uncertainty. A later processing stage completes the boundaries at line ends and corners to prevent colors from flowing out of them. Often boundaries need to be completed over scenic regions that do not contain local image contrasts. Fuzzy bands of orientations cooperate across these regions to initiate the completion of these intervening boundaries. The final perceptual boundary is, however, sharp, not fuzzy. We explain how feedback interactions with the next level of processing eliminate this type of orientational uncertainty.
The circuit within the Boundary Contour System which completes sharp and coherent boundaries is a specialized type of cooperative-competitive feedback loop, which we have named the CC Loop. The featural filling-in process within the Feature Contour System does not possess coherent properties of this kind. Rather, it obeys a system of nonlinear diffusion equations which are capable of averaging featural qualities within each boundary compartment. Thus, unlike the FIRE theory of Chapter 1, in which a single edge-driven process controls form and featural filling-in, the present theory suggests that a pair of parallel edge-driven processes exist, and only the boundary completion process is a cooperative-competitive feedback network. The successes of these two theories in explaining their respective data bases therefore raise the burning question: How can they be unified into a single visual theory?
When the present theory was being constructed from an analysis of perceptual data, the relevant neural data base was spotty at best. Within a year of our first publications in 1983 and 1984, striking support for the theory was reported in both neural and further perceptual experiments. We consider the 1984 data of von der Heydt, Peterhans, and Baumgartner, which we summarize herein, to be particularly important, because it seems to confirm that the visual system compensates for the positional uncertainty caused by orientational tuning in area 17 of the visual cortex by completing line ends at the next processing stage in area 18 of the visual cortex.
Psychological Review 92, 173-211 (1985) © 1985 American Psychological Association, Inc. Reprinted by permission of the publisher
NEURAL DYNAMICS OF FORM PERCEPTION: BOUNDARY COMPLETION, ILLUSORY FIGURES, AND NEON COLOR SPREADING
Stephen Grossberg† and Ennio Mingolla‡
Abstract
A real-time visual processing theory is used to analyse real and illusory contour formation, contour and brightness interactions, neon color spreading, complementary color induction, and filling-in of discounted illuminants and scotomas. The theory also physically interprets and generalizes Land’s retinex theory. These phenomena are traced to adaptive processes that overcome limitations of visual uptake to synthesize informative visual representations of the external world. Two parallel contour sensitive processes interact to generate the theory’s brightness, color, and form estimates. A boundary contour process is sensitive to orientation and amount of contrast but not to direction of contrast in scenic edges. It synthesizes boundaries sensitive to the global configuration of scenic elements. A feature contour process is insensitive to orientation but sensitive to both amount of contrast and to direction of contrast in scenic edges. It triggers a diffusive filling-in of featural quality within perceptual domains whose boundaries are determined by completed boundary contours. The boundary contour process is hypothesized to include cortical interactions initiated by hypercolumns in Area 17 of the visual cortex. The feature contour process is hypothesized to include cortical interactions initiated by the cytochrome oxydase staining blobs in Area 17. Relevant data from striate and prestriate visual cortex, including data that support two predictions, are reviewed. Implications for other perceptual theories and axioms of geometry are discussed.
† Supported in part by the Air Force Office of Scientific Research (AFOSR 82-0148) and the Office of Naval Research (ONR N00014-83-K0337).
‡ Supported in part by the Air Force Office of Scientific Research (AFOSR 82-0148).
1. Illusions as a Probe of Adaptive Visual Mechanisms
A fundamental goal of visual science is to explain how an unambiguous global visual representation is synthesized in response to ambiguous local visual cues. The difficulty of this problem is illustrated by two recurrent themes in visual perception: Human observers often do not see images that are retinally present, and they often do see images that are not retinally present. A huge data base concerning visual illusions amply illustrates the complex and often paradoxical relationship between scenic image and visual percept. That paradoxical data abound in the field of visual perception becomes more understandable through a consideration of how visual information is acquired. For example, light passes through retinal veins before it reaches retinal photoreceptors, and light does not influence the retinal regions corresponding to the blind spot or retinal scotomas. The percepts of human observers are not distorted, however, by their retinal veins or blind spots during normal viewing conditions. Thus some images that are retinally present are not perceived because our visual processes are adaptively designed to free our percepts from imperfections of the visual uptake process. The same adaptive mechanisms that can free our percepts from images of retinal veins can also generate paradoxical percepts, as during the perception of stabilized images (Krauskopf, 1963; Pritchard, 1961; Pritchard, Heron, and Hebb, 1970; Riggs, Ratliff, Cornsweet, and Cornsweet, 1953; Yarbus, 1967).
The same adaptive mechanisms that can compensate for the blind spot and certain scotomas can also generate paradoxical percepts, as during filling-in reactions of one sort or another (Arend, Buehler, and Lockhead, 1971; Gellatly, 1980; Gerrits, de Hann, and Vendrick, 1966; Gerrits and Timmermann, 1969; Gerrits and Vendrick, 1970; Kanizsa, 1974; Kennedy, 1978, 1979, 1981; Redies and Spillmann, 1981; van Tuijl, 1975; van Tuijl and de Weert, 1979; van Tuijl and Leeuwenberg, 1979; Yarbus, 1967). These examples illustrate the general theme that many paradoxical percepts may be expressions of adaptive brain designs aimed at achieving informative visual representations of the external world. For this reason, paradoxical percepts may be used as probes and tests of the mechanisms that are hypothesized to instantiate these adaptive brain designs. The present article makes particular use of data about illusory figures (Gellatly, 1980; Kanizsa, 1974; Kennedy, 1978, 1979, 1981; Parks, 1980; Parks and Marks, 1983; Petry, Harbeck, Conway, and Levey, 1983) and about neon color spreading (Redies and Spillmann, 1981; van Tuijl, 1975; van Tuijl and de Weert, 1979; van Tuijl and Leeuwenberg, 1979) to refine the adaptive designs and mechanisms of a real-time visual processing theory that is aimed at predicting and explaining data about depth, brightness, color, and form perception (Carpenter and Grossberg, 1981, 1983; Cohen and Grossberg, 1983, 1984a, 1984b; Grossberg, 1981, 1983a, 1983b, 1984a; Grossberg and Cohen, 1984; Mingolla and Grossberg, 1984). As in every theory about adaptive behavior, it is necessary to specify precisely the sense in which its targeted data are adaptive without falling into logically circular arguments. In the present work, this specification takes the form of a new perceptual processing principle, which we call the boundary-feature trade-off.
The need for such a principle can begin to be seen by considering how the perceptual system can generate behaviorally effective internal representations that compensate for several imperfections of the retinal image.
2. From Noisy Retina to Coherent Percept
Suppressing the percept of stabilized retinal veins is far from sufficient to generate a usable percept. The veins may occlude and segment scenic images in several places. Even a single scenic edge can be broken into several disjoint components. Somehow in the final percept, broken retinal edges are completed and occluded retinal color and brightness signals are filled-in. These completed and filled-in percepts are, in a strict mechanistic sense, illusory percepts.
Observers are often not aware of which parts of a perceived edge are “real” and which are “illusory.” This fact clarifies why data about illusory figures are so important for discovering the mechanisms of form perception. This fact also points to one of the most fascinating properties of visual percepts. Although many percepts are, in a strict mechanistic sense, “illusory” percepts, they are often much more veridical, or “real,” than the retinal data from which they are synthesized. This observation clarifies a sense in which each of the antipodal philosophical positions of realism and idealism is both correct and incorrect, as is often the case with deep but partial insights. The example of the retinal veins suggests that two types of perceptual process, boundary completion and featural filling-in, work together to synthesize a final percept. In such a vague form, this distinction generates little conceptual momentum with which to build a theory. Data about the perception of artificially stabilized images provide further clues. The classical experiments of Krauskopf (1963) and Yarbus (1967) show that if certain scenic edges are artificially stabilized with respect to the retina, then colors and brightnesses that were previously bounded by these edges are seen to flow across, or fill-in, the percept until they are contained by the next scenic boundary. Such data suggest that the processes of boundary completion and featural filling-in can be dissociated. The boundary-feature trade-off makes precise the sense in which either of these processes, by itself, is insufficient to generate a final percept. Boundary-feature trade-off also suggests that the rules governing either process can only be discovered by studying how the two processes interact. This is true because each system is designed to offset insufficiencies of the other system. In particular, the process of boundary completion, by itself, could at best generate a world of outlines or cartoons.
The process of featural filling-in, by itself, could at best generate a world of formless brightness and color qualities. Our theory goes further to suggest the more radical conclusion that the process of boundary completion, by itself, would generate a world of invisible outlines, and the process of featural filling-in, by itself, would generate a world of invisible featural qualities. This conclusion follows from the realization that an early stage of both boundary processing and of feature processing consists of the extraction of different types of contour information. These two contour-extracting processes take place in parallel, before their results are reintegrated at a later processing stage. Previous perceptual theories have not clearly separated these two contour-extracting systems. One reason for this omission is that, although each scenic edge can activate both the boundary contour system and the feature contour system, only the net effect of their interaction at a later stage is perceived. Another reason is that the completed boundaries, by themselves, are not visible. They gain visibility by restricting featural filling-in and thereby causing featural contrast differences across the perceptual space. The ecological basis for these conclusions becomes clearer by considering data about stabilized images (Yarbus, 1967) alongside data about brightness and color perception (Land, 1977). These latter data can be approached by considering another ambiguity in the optical input to the retina. The visual world is typically viewed in inhomogeneous lighting conditions. The scenic luminances that reach the retina thus confound variable lighting conditions with invariant object colors. It has long been known that the brain somehow “discounts the illuminant” in order to generate percepts whose colors are more veridical than those in the retinal image (Helmholtz, 1962).
The studies of Land (1977) have refined this insight by showing that the perceived colors within a picture constructed from overlapping patches of color are determined by the relative contrasts at the edges between successive patches. Lighting conditions can differ considerably as one moves across each colored patch. At each patch boundary, lighting conditions typically change very little. A measure of relative featural contrast across such a boundary therefore provides a good local estimate of object reflectances. Land’s results about discounting the illuminant suggest that an early stage of the
featural extraction process consists in computing featural contrasts at scenic edges. Data such as that of Yarbus (1967), which show that boundaries and features can be dissociated, then suggest that the extraction of feature contour and boundary contour information are two separate processes. The Land (1977) data also support the concept of a featural filling-in process. Discounting the illuminant amounts to suppressing the color signals from within the color patches. All that remains are nondiscounted feature contrasts at the patch boundaries. Without featural filling-in, we would perceive a world of colored edges, instead of a world of extended forms. The present theory provides a physical interpretation and generalization of the Land retinex theory of brightness and color perception (Grossberg, 1984a), including an explanation of how we can see extended color domains. This explanation is summarized in Section 18. Our theory can be understood entirely as a perceptual processing theory. As its perceptual constructs developed, however, they began to exhibit striking formal similarities with recent neural data. Some of these neural analogs are summarized in Table 1 below. Moreover, two of the theory’s predictions about the process of boundary completion have recently received experimental support from recordings by von der Heydt, Peterhans, and Baumgartner (1984) on cells in Area 18 of the monkey visual cortex. Neurophysiological linkages and predictions of the theory are more completely described in Section 20. Due to the existence of this neural interpretation, the formal nodes in the model network are called cells throughout the article.
3. Boundary Contour System and Feature Contour System
Our theory claims that two distinct types of edge, or contour, computations are carried out within parallel systems during brightness, color, and form perception (Grossberg, 1983a, 1983b, 1984a). These systems are called the boundary contour system (BCS) and the feature contour system (FCS). Boundary contour signals are used to generate perceptual boundaries, both “real” and “illusory.” Feature contour signals trigger the filling-in processes whereby brightnesses and colors spread until they either hit their first boundary contours or are attenuated due to their spatial spread. Boundary contours are not, in isolation, visible. They gain visibility by restricting the filling-in that is triggered by feature contour signals and thereby causing featural contrasts across perceptual space. These two systems obey different rules. We will summarize the main rules before using them to explain paradoxical visual data. Then we will explain how these rules can be understood as consequences of boundary-feature trade-off.
4. Boundary Contours and Boundary Completion
The process whereby boundary contours are built up is initiated by the activation of oriented masks, or elongated receptive fields, at each position of perceptual space (Hubel and Wiesel, 1977). An oriented mask is a cell, or cell population, that is selectively responsive to scenic edges. Each mask is sensitive to scenic edges that activate a prescribed small region of the retina, if the edge orientations lie within a prescribed band of orientations with respect to the retina. A family of such oriented masks exists at every network position, such that each mask is sensitive to a different band of edge orientations within its prescribed small region of the scene.
Orientation and Contrast
The output signals from the oriented masks are sensitive to the orientation and to the amount of contrast, but not to the direction of contrast, at an edge of a visual scene. A vertical boundary contour can thus be activated by either a close-to-vertical dark-light edge or a close-to-vertical light-dark edge at a fixed scenic position. The process whereby two like-oriented masks that are sensitive to direction of contrast at the same
perceptual location give rise to an output signal that is not sensitive to direction of contrast is designated by a plus sign in Figure 1a.
Short-Range Competition
The outputs from these masks activate two successive stages of short-range competition that obey different rules of interaction.
1. The cells that react to output signals due to like-oriented masks compete between nearby perceptual locations (Figure 1b). Thus, a mask of fixed orientation excites the like-oriented cells at its location and inhibits the like-oriented cells at nearby locations. In other words, an on-center off-surround organization of like-oriented cell interactions exists around each perceptual location. It may be that these spatial interactions form part of the network whereby the masks acquire their orientational specificity during development. This possibility is not considered in this article.
2. The outputs from this competitive stage input to the next competitive stage. Here, cells compete that represent perpendicular orientations at the same perceptual location (Figure 1c). This competition defines a push-pull opponent process. If a given orientation is inhibited, then its perpendicular orientation is disinhibited.
In summary, a stage of competition between like orientations at different, but nearby, positions is followed by a stage of competition between perpendicular orientations at the same position.
Long-Range Oriented Cooperation and Boundary Completion
The outputs from the second competitive stage input to a spatially long-range cooperative process. We call this process the boundary completion process. Outputs due to like-oriented masks that are approximately aligned across perceptual space can cooperate via this process to synthesize an intervening boundary. We show how both “real” and “illusory” boundaries can be generated by this boundary completion process.
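The three local stages just summarized (pooling of opposite contrast polarities, competition between like orientations at nearby positions, and competition between perpendicular orientations at one position) can be illustrated with a minimal numerical sketch. It is not the network defined later in the article: the one-dimensional row of positions, the restriction to two orientations, the half-wave rectification, the subtractive inhibition, the surround weight, and all function names are assumptions made only for illustration. A negative value after the first stage stands for net inhibition. The long-range cooperative stage is not modeled.

```python
def rectify(x):
    """Half-wave rectification: negative values become zero."""
    return max(x, 0.0)

def mask_output(contrast):
    """Figure 1a: pool two like-oriented masks of opposite contrast polarity.

    `contrast` is the signed luminance step across an edge. The pooled
    output depends on the amount of contrast but not on its sign.
    """
    return rectify(contrast) + rectify(-contrast)

def first_competition(row, surround=0.5):
    """Figure 1b: like-oriented cells compete across nearby positions.

    Each position is excited by its own input and inhibited by its two
    immediate neighbors (on-center off-surround). The result may be
    negative, which this sketch reads as net inhibition.
    """
    out = []
    for i, center in enumerate(row):
        left = row[i - 1] if i > 0 else 0.0
        right = row[i + 1] if i < len(row) - 1 else 0.0
        out.append(center - surround * (left + right))
    return out

def second_competition(vertical, horizontal):
    """Figure 1c: perpendicular orientations compete at each position.

    Push-pull: a net-inhibited orientation disinhibits its perpendicular
    competitor at the same position.
    """
    v_out = [rectify(v - h) for v, h in zip(vertical, horizontal)]
    h_out = [rectify(h - v) for v, h in zip(vertical, horizontal)]
    return v_out, h_out

# Signed contrasts of vertical edges along a row of positions (assumed
# values): a strong edge at position 2 next to a weaker edge of opposite
# polarity at position 3. No horizontal edges are present.
contrasts = [0.0, 0.0, -1.0, 0.3, 0.0, 0.0]
vertical = first_competition([mask_output(c) for c in contrasts])
horizontal = first_competition([0.0] * len(contrasts))
v_out, h_out = second_competition(vertical, horizontal)

for i, (v, h) in enumerate(zip(v_out, h_out)):
    print(f"position {i}: vertical {v:.2f}, horizontal {h:.2f}")
```

In this sketch the strong vertical edge survives at its own position, the weaker neighboring vertical edge is suppressed, and the horizontal orientation becomes active at the suppressed position through disinhibition, even though no horizontal edge is present in the input.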
The following two demonstrations illustrate a boundary completion process with the above properties of orientation and contrast, short-range competition, and long-range cooperation and boundary completion. In Figure 2a, four black pac-man figures are arranged at the vertices of an imaginary square on a white background. The famous illusory Kanizsa (1974) square can then be seen. The same is true when two pac-man figures are black, the other two are white, and the background is grey, as in Figure 2b. The black pac-man figures form dark-light edges with respect to the grey background. The white pac-man figures form light-dark edges with the grey background. The visibility of illusory edges around the illusory square shows that a process exists that is capable of completing boundaries between edges with opposite directions of contrast. The boundary completion process is thus sensitive to orientational alignment across perceptual space and to amount of contrast, but not to direction of contrast. Another simple demonstration of these boundary completing properties can be constructed as follows. Divide a square into two equal rectangles along an imaginary boundary. Color one rectangle a uniform shade of grey. Color the other rectangle in shades of grey that progress from light to dark as one moves from end 1 of the rectangle to end 2 of the rectangle. Color end 1 a lighter shade than the uniform grey of the other rectangle, and color end 2 a darker shade than the uniform grey of the other rectangle. As one moves from end 1 to end 2, an intermediate grey region is passed whose luminance approximately equals that of the uniform rectangle. At end 1, a light-dark edge exists from the nonuniform rectangle to the uniform rectangle. At end 2, a dark-light edge exists from the nonuniform rectangle to the uniform rectangle. An observer can see an illusory edge that joins the two edges of opposite contrast and separates the intermediate rectangle region of equal luminance.
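The second demonstration can be checked with a few numbers. The sketch below samples the shared border between the two rectangles and compares the signed contrast with the amount of contrast. The luminance values, the linear gradient, and the number of samples are assumptions chosen for illustration; they are not taken from the article.

```python
UNIFORM_GREY = 0.5         # luminance of the uniform rectangle (assumed)
END_1, END_2 = 0.8, 0.2    # graded rectangle: lighter at end 1, darker at end 2
N_SAMPLES = 7              # sample points along the shared border

def gradient_luminance(k, n=N_SAMPLES):
    """Luminance of the graded rectangle at sample k, linear from end 1 to end 2."""
    return END_1 + (END_2 - END_1) * k / (n - 1)

# Signed contrast across the shared border: graded side minus uniform side.
signed = [gradient_luminance(k) - UNIFORM_GREY for k in range(N_SAMPLES)]
# Amount of contrast, the quantity to which boundary contours are sensitive.
amount = [abs(c) for c in signed]

for k, (s, a) in enumerate(zip(signed, amount)):
    print(f"sample {k}: signed contrast {s:+.2f}, amount {a:.2f}")
```

Under these assumptions the signed contrast reverses between the two ends, the amount of contrast is the same at both ends, and the amount falls to approximately zero near the middle. The middle of the border therefore offers no local contrast at all, so a boundary seen there has to be supplied by cooperation between the aligned signals at the two ends, which is possible only if that cooperation ignores direction of contrast.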
Although this boundary completion process may seem paradoxical when its effects are seen in Kanizsa squares, we hypothesize that this process is also used to complete boundaries across retinal scotomas, across the faded images of stabilized retinal veins,
Figure 1. (a) Boundary contour signals sensitive to the orientation and amount of contrast at a scenic edge, but not to its direction of contrast. (b) Like orientations compete at nearby perceptual locations. (c) Different orientations compete at each perceptual location. (d) Once activated, aligned orientations can cooperate across a larger visual domain to form “real” and “illusory” contours.
Figure 2. (a) Illusory Kanizsa square induced by four black pac-man figures. (From “Subjective Contours” by G. Kanizsa, 1976, Scientific American, 234, p. 51. Copyright 1976 by Scientific American, Inc. Adapted by permission.) (b) An illusory square induced by two black and two white pac-man figures on a grey background. Illusory contours can thus join edges with opposite directions of contrast. (This effect may be weakened by the photographic reproduction process.)
and between all perceptual domains that are separated by sharp brightness or color differences.
Binocular Matching
A monocular boundary contour can be generated when a single eye views a scene. When two eyes view a scene, a binocular interaction can occur between outputs from oriented masks that respond to the same retinal positions of the two eyes. This interaction leads to binocular competition between perpendicular orientations at each position. This competition takes place at, or before, the competitive stage. Although binocular interactions occur within the boundary contour system, they will not be needed to explain this article’s targeted data. Boundary contours are like frames without pictures. The pictorial data themselves are derived from the feature contour system. We suggest that the same visual source inputs in parallel to both the boundary contour system and the feature contour system, and that the outputs of both types of processes interact in a context-sensitive way at a later stage.
5. Feature Contours and Diffusive Filling-In
The feature contour process obeys different rules of contrast than does the boundary contour process.
Contrast
The feature contour process is insensitive to the orientation of contrast in a scenic edge, but it is sensitive to both the direction of contrast and the amount of contrast, unlike the boundary contour process. Speaking intuitively, in order to compute the relative brightness across a scenic boundary, it is necessary to keep track of which side of the scenic boundary has a larger reflectance. Sensitivity to direction of contrast is also used to determine which side of a red-green scenic boundary is red and which is green. Due to its sensitivity to the amount of contrast, feature contour signals discount the illuminant. We envision that three parallel channels of double-opponent feature contour signals exist: light-dark, red-green, and blue-yellow (Boynton, 1975; DeValois and DeValois, 1975; Mollon and Sharpe, 1983). These double-opponent cells are replicated in multiple cellular fields that are maximally sensitive to different spatial frequencies (Graham, 1981; Graham and Nachmias, 1971). Both of these processing requirements are satisfied in a network that is called a gated dipole field (Grossberg, 1980, 1982). The detailed properties of double-opponent gated dipole fields are not needed in this article. Hence they are not discussed further. A variant of the gated dipole field design is, however, used to instantiate the boundary contour system in Section 15. The feature contour process also obeys different rules of spatial interaction than those governing the boundary contour process.
Diffusive Filling-In
Boundary contours activate a boundary completion process that synthesizes the boundaries which define monocular perceptual domains. Feature contours activate a diffusive filling-in process that spreads featural qualities, such as brightness or color, across these perceptual domains.
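The claim that a signed contrast measured at an edge discounts the illuminant can be illustrated with a short calculation. The sketch below assumes that luminance is the product of reflectance and illumination, that the illuminant varies slowly across the scene, and that contrast is measured as a log ratio across the edge. The log-ratio measure, the reflectance values, and the linear illumination gradient are assumptions of this example, not quantities specified in this section.

```python
import math

REFLECTANCE_LEFT, REFLECTANCE_RIGHT = 0.2, 0.6   # two adjacent patches (assumed)
N = 20                                           # samples; the edge sits at N // 2

def illumination(i, gradient):
    """Slowly varying illuminant; `gradient` is the rise per sample."""
    return 1.0 + gradient * i

def luminance(i, gradient):
    """Luminance reaching the retina: reflectance times illumination."""
    reflectance = REFLECTANCE_LEFT if i < N // 2 else REFLECTANCE_RIGHT
    return reflectance * illumination(i, gradient)

def edge_contrast(gradient):
    """Signed log-ratio contrast across the edge (right side over left side)."""
    left = luminance(N // 2 - 1, gradient)
    right = luminance(N // 2, gradient)
    return math.log(right / left)

true_ratio = math.log(REFLECTANCE_RIGHT / REFLECTANCE_LEFT)
flat = edge_contrast(gradient=0.0)      # even lighting
tilted = edge_contrast(gradient=0.02)   # lighting brightens toward the right

# Comparing luminances far from the edge confounds lighting with reflectance.
far_ratio = math.log(luminance(N - 1, 0.02) / luminance(0, 0.02))

print(f"reflectance log ratio : {true_ratio:.3f}")
print(f"edge contrast, flat   : {flat:.3f}")
print(f"edge contrast, tilted : {tilted:.3f}")
print(f"far-apart comparison  : {far_ratio:.3f}")
```

Because the illuminant changes little across the edge itself, the edge measurement stays close to the reflectance ratio under both lighting conditions, whereas the comparison between distant samples is inflated by the lighting gradient. The sign of the edge contrast records which side is lighter, which is the direction-of-contrast information that the feature contour process is said to retain.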
Figure 3 depicts the main properties of this filling-in process. We assume that featural filling-in occurs within a syncytium of cell compartments. By a syncytium of cells, we mean a regular array of intimately connected cells such that contiguous cells can easily pass signals between each other’s compartment membranes. A feature contour input signal to a cell of the syncytium activates that cell. Due to the syncytial coupling of this cell with its neighbors, the activity can rapidly spread to neighboring cells, then to neighbors of the neighbors, and so on. Because the spreading occurs via a diffusion of activity (Cohen and Grossberg, 1984b; Grossberg, 1984a), it
tends to average the activity that is triggered by a feature contour input signal across the cells that receive this spreading activity. This averaging of activity spreads across the syncytium with a space constant that depends on the electrical properties of both the cell interiors and their membranes. The electrical properties of the cell membranes can be altered by boundary contour signals in the following way. A boundary contour signal is assumed to decrease the diffusion constant of its target cell membranes within the cell syncytium. It does so by acting as an inhibitory gating signal that causes an increase in cell membrane resistance. A boundary contour signal hereby creates a barrier to the filling-in process at its target cells. This diffusive filling-in reaction is hypothesized to instantiate featural filling-in over retinal scotomas, over the faded images of stabilized retinal veins, and over the illuminants that are discounted by feature contour preprocessing.
Figure 3. Monocular brightness and color stage domain (MBC). Monocular feature contour signals activate cell compartments that permit rapid lateral diffusion of activity, or potential, across their compartmental boundaries, except at those compartment boundaries that receive boundary contour signals from the BCS stage of Figure 4. Consequently, the feature contour signals are smoothed except at boundaries that are completed within the BCS stage. (Labels in the figure: boundary contour signals, compartment, diffusion, feature contour signals.)
Three types of spatial interaction are implied by this description of the feature contour system: (a) Spatial frequency preprocessing: feature contour signals arise as the outputs of several double-opponent networks whose different receptive field sizes make them maximally sensitive to different spatial frequencies. (b) Diffusive filling-in: feature contour signals within each spatial scale then cause activity to spread across
the scale cell's syncytium. This filling-in process has its own diffusive bandwidth. (c) Figural boundaries: boundary contour signals define the limits of featural filling-in. Boundary contours are sensitive to the configuration of all edges in a scene, rather than to any single receptive field size. Previous perceptual theories have tended to focus on one or another of these factors, but not on their interactive properties.
6. Macrocircuit of Processing Stages
Figure 4 describes a macrocircuit of processing stages into which the microstages of the boundary contour system and feature contour system can be embedded. The processes described by this macrocircuit were introduced to explain how global properties of depth, brightness, and form information can be generated from monocularly and binocularly viewed patterns (Grossberg, 1983b, 1984a). Table 1 lists the full names of the abbreviated macrocircuit stages, as well as their neural interpretation. Each monocular preprocessing (MP) stage MPL and MPR can generate inputs, in parallel, to its boundary contour system and its feature contour system. The pathway MPL → BCS carries inputs to the left-monocular boundary contour system. The pathway MPL → MBCL carries inputs to the left-monocular feature contour system. Only after all the stages of scale-specific, orientation-specific, contrast-specific, competitive, and cooperative interactions take place within the BCS stage, as in Section 4, does this stage give rise to boundary contour signals BCS → MBCL that act as barriers to the diffusive filling-in triggered by MPL → MBCL feature contour signals, as in Section 5. The divergence of the pathways MPL → MBCL and MPL → BCS allows the boundary contour system and the feature contour system to be processed according to their different rules before their signals recombine within the cell syncytia.
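The recombination of the two pathways within a cell syncytium, as described in Section 5, can be sketched as a boundary-gated diffusion along a one-dimensional chain of compartments. This is a simplified illustration and not the system of nonlinear diffusion equations used in the theory: the Euler time stepping, the passive decay term, the rule permeability = 1 / (1 + gain × boundary signal), and all parameter values and names are assumptions.

```python
def fill_in(features, boundaries, steps=4000, dt=0.1, decay=0.01, gain=1000.0):
    """Diffuse feature contour inputs along a 1-D chain of cell compartments.

    features[i]   : sustained feature contour input to compartment i
    boundaries[i] : boundary contour signal on the membrane between
                    compartments i and i + 1 (len(features) - 1 entries)
    A boundary signal lowers the permeability of its membrane, which is how
    this sketch models an increase in cell membrane resistance.
    """
    n = len(features)
    permeability = [1.0 / (1.0 + gain * b) for b in boundaries]
    activity = [0.0] * n
    for _ in range(steps):
        # flow[i] is the activity passing into compartment i from i + 1.
        flow = [permeability[i] * (activity[i + 1] - activity[i])
                for i in range(n - 1)]
        updated = []
        for i in range(n):
            inflow = (flow[i] if i < n - 1 else 0.0) - (flow[i - 1] if i > 0 else 0.0)
            updated.append(activity[i] + dt * (features[i] - decay * activity[i] + inflow))
        activity = updated
    return activity

# Eight compartments with a scenic edge between compartments 3 and 4.
# Feature contour inputs arrive only next to the edge: strong on the
# lighter side, weak on the darker side (assumed values).
features = [0.0, 0.0, 0.0, 1.0, 0.2, 0.0, 0.0, 0.0]
closed = fill_in(features, [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])  # boundary present
opened = fill_in(features, [0.0] * 7)                            # boundary absent

print("with boundary   :", [round(a, 1) for a in closed])
print("without boundary:", [round(a, 1) for a in opened])
```

With the boundary signal present, each side of the edge fills in to a nearly uniform level of its own, although the inputs arrive only at the edge. With the boundary signal absent, the same inputs spread across the whole chain and the two sides become almost indistinguishable, which is the kind of flow reported when a scenic edge is stabilized on the retina.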
7. Neon Color Spreading and Complementary Color Induction
The phenomenon of neon color spreading illustrates the existence of boundary contours and of feature contours in a vivid way. Redies and Spillmann (1981), for example, reported an experiment using a solid red cross and an Ehrenstein figure. When the solid red cross is perceived in isolation, it looks quite uninteresting (Figure 5a). When an Ehrenstein figure is perceived in isolation, it generates an illusory contour whose shape (e.g., circle or diamond) depends on the viewing distance. When the red cross is placed inside the Ehrenstein figure, the red color flows out of its containing contours and tends to fill the illusory figure (Figure 5b). Our explanation of this percept uses all of the rules that we listed. We suggest that vertical boundary contours of the Ehrenstein figure inhibit contiguous boundary contours of like orientation within the red cross. This property uses the orientation and contrast sensitivity of boundary masks (Figure 1a) and their ability to inhibit like-oriented nearby cells, irrespective of direction of contrast (Figures 1a and 1b). This inhibitory action within the BCS does not prevent the processing of feature contour signals from stage MPL to stage MBCL and from stage MPR to stage MBCR, because boundary contour signals and feature contour signals are received by MBCL and MBCR despite the fact that some of their corresponding boundary contour signals are inhibited within the BCS stage. The inhibition of these boundary contour signals within the BCS stage allows the red featural activity to diffuse outside of the red cross. The illusory boundary contour that is induced by the Ehrenstein figure restricts the diffusion of this red-labeled activation.
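This account of neon spreading can be caricatured in a few lines. The sketch below idealizes filling-in as exact averaging of the red feature input within each compartment, where a compartment is a run of cells not separated by an active boundary. The one-dimensional layout, the cell positions, the assumption that the inhibited cross boundaries are removed completely, and the function names are all hypothetical simplifications.

```python
def compartments(n_cells, active_boundaries):
    """Split cells 0..n_cells-1 into runs separated by active boundaries.

    A boundary index b denotes the membrane between cells b and b + 1.
    """
    runs, start = [], 0
    for b in sorted(set(active_boundaries)):
        runs.append(range(start, b + 1))
        start = b + 1
    runs.append(range(start, n_cells))
    return runs

def fill(features, active_boundaries):
    """Idealized filling-in: average the feature input within each compartment."""
    filled = [0.0] * len(features)
    for run in compartments(len(features), active_boundaries):
        mean = sum(features[i] for i in run) / len(run)
        for i in run:
            filled[i] = mean
    return filled

N_CELLS = 12
red = [0.0] * N_CELLS
red[4] = red[7] = 1.0        # red feature contour signals just inside the cross edges
CROSS_EDGES = {3, 7}         # "real" boundaries of the red cross (cells 4 to 7)
ILLUSORY_CONTOUR = {1, 9}    # illusory boundary induced by the Ehrenstein figure

isolated = fill(red, CROSS_EDGES)     # cross alone: its own boundaries are intact
neon = fill(red, ILLUSORY_CONTOUR)    # cross boundaries inhibited, illusory contour active

print("cross in isolation:", isolated)
print("neon spreading    :", neon)
```

In the first case the red quality stays inside the cross. In the second it spreads out to the illusory contour at a diluted level and stops there, so the region between the cross and the illusory contour acquires a red tinge while the region outside the illusory contour does not.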
Thus during neon color spreading, one can "see" the difference between boundary contours and feature contours, as well as the role of illusory boundary contours in restricting the diffusion of featural activity. In Figure 5b, the illusory boundary induced
Figure 4. Macrocircuit of processing stages. Table 1 lists the functional names of the abbreviated stages and indicates a plausible neural interpretation of these stages. Boundary contour formation is assumed to occur within the BCS stage. Its output signals to the monocular MBCL and MBCR stages define boundaries within which feature contour signals from MPL and MPR, respectively, can trigger the spreading, or diffusion, of featural quality.
Neural Dynamics of Form Perception
TABLE 1
Summary of Neural Analogs

Abbreviation | Full Name | Neural Interpretation
MPL | Left monocular preprocessing stage | Lateral geniculate nucleus
MPR | Right monocular preprocessing stage | Lateral geniculate nucleus
BCS | Boundary contour synthesis stage | Interactions initiated by the hypercolumns in striate cortex, Area 17 (Hubel and Wiesel, 1977)
MBCL | Left monocular brightness and color stage | Interactions initiated by the cytochrome oxidase staining blobs, Area 17 (Hendrickson, Hunt, and Wu, 1981; Horton and Hubel, 1981; Hubel and Livingstone, 1981; Livingstone and Hubel, 1982)
MBCR | Right monocular brightness and color stage | Interactions initiated by the cytochrome oxidase staining blobs, Area 17
BP | Binocular percept stage | Area V4 of the prestriate cortex (Zeki, 1983a, 1983b)
by the Ehrenstein figure restricts the flow of red featural quality, but the “real” boundary of the cross does not. This percept illustrates that boundary contours, both “real” and “illusory,” are generated by the same process. The illusory contour in Figure 5b tends to be perpendicular to its inducing Ehrenstein figures. Thus, the Ehrenstein figure generates two simultaneous effects. It inhibits like-oriented boundary contours at nearby positions, and it excites perpendicularly oriented boundary contours at the same nearby positions. We explain this effect as follows. The boundary contours of the Ehrenstein figure inhibit contiguous like-oriented boundary contours of the red cross, as in Figure 1b. By Figure 1c, perpendicular boundary contours at each perceptual position compete as part of a push-pull opponent process. When the like-oriented boundary contours of the red cross are inhibited, perpendicularly oriented boundary contours at the corresponding positions are activated due to disinhibition. These disinhibited boundary contours can then cooperate with other approximately aligned boundary contours to form an illusory contour, as in Figure 1d. This cooperative process further weakens the inhibited boundary contours of the red cross, as in Figure 1c, thereby indicating why a strong neon effect depends on the percept of the illusory figure. Redies and Spillmann (1981) systematically varied the distance of the red cross from the Ehrenstein figure, their relative orientations, their relative sizes, and so forth, to study how the strength of the spreading effect changes with scenic parameters. They report that “thin [red] flanks running alongside the red connecting lines” (Redies and Spillmann, 1981) can occur if the Ehrenstein figure is slightly separated from the cross or if the orientations of the cross and the Ehrenstein figure differ. In our theory, the orientation specificity (Figure 1a) and distance dependence (Figure 1b) of the inhibitory
Chapter 2
Figure 5. Neon color spreading. (a) A red cross in isolation appears unremarkable. (b) When the cross is surrounded by an Ehrenstein figure, the red color can flow out of the cross until it hits the illusory contour induced by the Ehrenstein figure.
process among like-oriented cells suggest why these manipulations weaken the inhibitory effect of Ehrenstein boundary contours on the boundary contours of the cross. When the boundary contours of the cross are less inhibited, they can better restrict the diffusion of red-labeled activation. Then the red color can only bleed outside the contours of the cross. One might ask why the ability of the Ehrenstein boundary contours to inhibit the boundary contours of the cross does not also imply that Ehrenstein boundary contours inhibit contiguous Ehrenstein boundary contours. If they do, then how do any boundary contours survive this process of mutual inhibition? If they do not, then is this explanation of neon color spreading fallacious? Our explanation survives this challenge because the boundary contour process is sensitive to the amount of contrast, even though it is insensitive to the direction of contrast, as in Figure 1a. Contiguous boundary contours do mutually inhibit one another, but this inhibition is a type of shunting lateral inhibition (Appendix) such that equally strong inhibitory contour signals can remain positive and balanced (Grossberg, 1983a). If, however, the Ehrenstein boundary contour signals are stronger than the boundary contour signals of the cross by a sufficient amount, then the latter signals can be inhibited. This formal property provides an explanation of the empirical fact that neon color spreading is enhanced when the contrast of a figure (e.g., the cross) relative to the background illumination is less than the contrast of the bounding contours (e.g., the Ehrenstein figure) relative to the background illumination (van Tuijl and de Weert, 1979). This last point emphasizes one of the paradoxical properties of the boundary contour system that may have delayed its discovery. In order to work properly, boundary contour responses need to be sensitive to the amount of contrast in scenic edges.
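The balance property of shunting inhibition described above can be checked with a short calculation. The sketch below assumes a generic shunting on-center off-surround law, dx_i/dt = -A x_i + (B - x_i) I_i - (x_i + C) sum_{k != i} I_k, whose equilibrium is x_i = (B I_i - C sum_{k != i} I_k) / (A + sum_k I_k). This particular form and the parameter values A, B, and C are assumptions chosen for illustration; they are not taken from the Appendix.

```python
def shunting_equilibrium(inputs, A=1.0, B=1.0, C=0.5):
    """Equilibrium activities of an assumed shunting on-center off-surround network.

    Each cell i is excited by its own input and inhibited by all other inputs:
        x_i = (B * I_i - C * sum_{k != i} I_k) / (A + sum_k I_k)
    A (decay), B (excitatory bound), and C (inhibitory bound) are illustrative.
    """
    total = sum(inputs)
    return [(B * I - C * (total - I)) / (A + total) for I in inputs]


# Two contiguous boundary signals of equal strength: both stay positive.
print(shunting_equilibrium([2.0, 2.0]))

# A modest imbalance: the weaker signal is reduced but survives.
print(shunting_equilibrium([2.0, 3.0]))

# A large imbalance: the weaker signal is driven below zero, that is, inhibited.
print(shunting_equilibrium([1.0, 4.0]))
```

Under these assumptions a signal is suppressed only when its neighbor exceeds it by more than the ratio B / C, which is one way to read the phrase "by a sufficient amount."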
Despite this contrast sensitivity, boundary contours can be invisible if they do not cause featural contrasts to occur. A large cellular activation does not necessarily have any perceptual effects within the boundary contour system. Although the rules of the boundary contour system and the feature contour system may prove sufficient to explain neon color spreading, this explanation, in itself, does not reveal the adaptive role of these rules in visual perception. The adaptive role of these rules will become apparent when we ask the following questions: Why does not color spread more often? How does the visual system succeed as well as it does in preventing featural filling-in from flooding every scene? In Section 13, we show how these rules prevent a massive flow of featural quality in response to such simple images as individual lines and corners, not just in response to carefully constructed images like red crosses within Ehrenstein figures. We will now build up to this insight in stages. The same concepts also help to explain the complementary color induction that van Tuijl (1975) reported in his original article about the neon effect (Grossberg, 1984a). To see this, draw on white paper a regular grid of horizontal and vertical black lines that form 5 mm squares. Replace a subset of black lines by blue lines. Let the replaced lines be those lying within the smallest imaginary diamond shape that includes complete vertical or horizontal line segments of the grid (Figure 6). When an observer inspects this pattern, the blue color of the lines appears to spread around the blue line segments until it reaches the subjective contours of the diamond shape. This percept has the same explanation as the percept in Figure 5b. Next replace the black lines by blue lines and the blue lines by black lines. Then the illusory diamond looks yellow rather than blue. Let us suppose that the yellow color in the diamond is induced by the blue lines in the background matrix.
Then why in the previous display is not a yellow color in the background induced by the blue lines in the diamond? Why is the complementary color yellow perceived when the background contains blue lines, whereas the original color blue is perceived when the diamond contains blue lines? What is the reason for this asymmetry? This asymmetry can be explained in the following way. When the diamond is
Figure 6. Neon color spreading and complementary color induction. When the lattice in (a) is composed of black lines and the contour in (b) composed of blue lines is inserted within its diamond-shaped space, then blue color flows within the illusory diamond induced by the black lines. When the lattice in (a) is blue and the contour in (b) is black, then yellow color can flow within the illusory diamond. (From “A New Visual Illusion: Neonlike Color Spreading and Complementary Color Induction between Subjective Contours” by H.F.J.M. van Tuijl, 1975, Acta Psychologica, 39, pp. 441-445. Copyright 1975 by North-Holland. Adapted by permission.)
composed of blue lines, then double-opponent color processing enables the blue lines to induce contiguous yellow feature contour signals in the background. These yellow feature contour signals are constrained by the boundary contour signals of the black lines to remain within a spatial domain that also receives feature contour signals from the black lines. The yellow color is thus not seen in the background. By contrast, the boundary contour signals of the black lines in the background inhibit the contiguous boundary contour signals of the blue lines in the diamond. The blue feature contour signals of the blue lines can thus flow within the diamond. When blue lines form the background, they have two effects on the diamond. They induce yellow feature contour signals via double-opponent processing. They also inhibit the boundary contour signals of the contiguous black lines. Hence the yellow color can flow within the diamond. To carry out this explanation quantitatively, we need to study how double-opponent color processes (light-dark, red-green, yellow-blue) preprocess the feature contour signals from stage MPL to stage MBCL and from stage MPR to stage MBCR. Double-opponent color processes with the requisite properties can be defined using gated dipole fields (Grossberg, 1980). We also need to quantitatively specify the rules whereby the boundary completion process responds to complex spatial patterns such as grids and Ehrenstein figures. We now approach this task by considering properties of illusory figures.
8. Contrast, Assimilation, and Grouping
The theoretical approach closest in spirit to ours is perhaps that of Kennedy (1979). We agree with many of Kennedy’s theoretical conclusions, such as
Some kind of brightness manipulation ... acts on certain kinds of inducing elements but in a way which is related to aspects of form.... Changes in the luminance of the display have different effects on standard brightness contrast and subjective contour effects.... Something over and beyond simple brightness contrast is called for. (p.176)
Grouping factors have to be an essential part of any discussion of subjective contours. (p.185)
Contrast and grouping factors produce a percept that has some characteristics of a percept of an environmental origin. (p.189)
Speaking intuitively, Kennedy’s remarks about contrast can be compared with properties of our feature contour system, and his remarks about grouping can be compared with properties of our boundary contour system. Once these comparisons are made, however, our theory diverges significantly from that of Kennedy, in part because his theory does not probe the mechanistic level. For example, Kennedy (1979) invoked two complementary processes to predict brightness changes: contrast and assimilation. Figure 7 describes the assimilation and contrast that are hypothesized to be induced by three shapes. Contrast is assumed to induce a brightening effect and assimilation is assumed to induce a darkening effect. This concept of assimilation is often used to explain how darkness or color can spread throughout an illusory figure (Ware, 1980). In our theory, local brightening and darkening effects are both consequences of a unified feature contour process. The fact that different parts of a figure induce different relative contrast effects does not imply that different levels of relative contrast are due to different processes.
Also, in our theory a darkening effect throughout an illusory figure is not due to a lower relative contrast per se, but to inhibition of a boundary contour leading to diffusion of a darker featural quality throughout the figure. Our theory thus supports the conclusion that perception of relative brightening and darkening effects
Figure 7. Three shapes redrawn from Kennedy (1979). Regions of contrast are indicated by [-] signs. Regions of assimilation are indicated by [+] signs. Our theory suggests that the net brightening (contrast) or darkening (assimilation) that occurs between two figures depends not only on figurally induced feature contour signals of variable contrast, but also on the configurationally sensitive boundary contours within which the featurally induced activations [...] (From Perception and Pictorial Representation, C.F. Nodine and D.F. Fisher (Eds.), New York: Praeger. Copyright 1979 by Praeger Publishers.)
cannot be explained just using locally defined scenic properties. The global configuration of all scenic elements determines where and how strongly boundary contours will be generated. Only after these boundary contours are completed can one determine whether the spatial distribution and intensity of all feature contour signals within these boundary contours will have a relative brightening or darkening effect. The theory of Kennedy (1979) comes close to this realization in terms of his distinction between brightness and grouping processes. Kennedy suggested, however, that these processes are computed in serial stages, whereas we suggest that they are computed in parallel stages before being joined together (Figure 4). Thus Kennedy (1979, p.191) wrote
First, there are properties that are dealt with in perception of their brightness characteristics.... Once this kind of processing is complete, a copy is handed on to a more global processing system. Second, there are properties that allow them to be treated globally and grouped.
Although our work has required new concepts, distinctions, and mechanisms beyond those considered by Kennedy, we find in his work a seminal precursor of our own.
9. Boundary Completion: Positive Feedback Between Local Competition and Long-Range Cooperation of Oriented Boundary Contour Segments
The following discussion employs a series of pictures that elicit illusory contour percepts to suggest more detailed properties of the cooperative boundary completion process of Figure 1d. One or even several randomly juxtaposed black lines on white paper need not induce an illusory contour. By contrast, a series of radially directed black lines can induce an easily perceived circular contour (Figure 8a). This illusory contour is perpendicular to each of the inducing lines. The perpendicular orientation of this illusory contour reflects a degree of orientational specificity in the boundary completion process. For example, the illusory contour becomes progressively less vivid as the lines are tilted to assume more acute angles with respect to the illusory circle (Figure 8b). We explain this tendency to induce illusory contours in the perpendicular direction by combining properties of the competitive interactions depicted in Figures 1b and 1c with properties of the cooperative process depicted in Figure 1d, just as we did to explain Figure 5b. It would be mistaken, however, to conclude that illusory contour induction can take place only in the direction perpendicular to the inducing lines. The perpendicular direction is favored, as a comparison between Figure 8a and Figure 9a shows. Figure 9a differs from Figure 8a only in terms of the orientations of the lines; the interior endpoints of the lines are the same. An illusory square is generated by Figure 9a to keep the illusory contour perpendicular to all the inducing lines. Not all configurations of inducing lines can, however, be resolved by a perpendicular illusory contour. Figure 9b induces the same illusory square as Figure 9a, but the square is no longer perpendicular to any of the inducing lines. Figures 8 and 9 illustrate several important points, which we now summarize in more mechanistic terms.
At the end of each inducing line exists a weak tendency for several approximately perpendicular illusory line segments to be induced (Figure 10a). In isolation, these local reactions usually do not generate a percept of a line, if only because they do not define a closed boundary contour that can separate two regions of different relative brightness. Under certain circumstances, these local line segments can interact via the spatially long-range boundary completion process. This cooperative process can be activated by two spatially separated illusory line segments only if their orientations approximately line up across the intervening perceptual space. In Figure 8b, the local illusory line segments cannot line up. Hence no closed illusory contour is generated. In Figure 9b, the local illusory line segments can line up, but only in
Figure 8. (a) Bright illusory circle induced perpendicular to the ends of the radial lines. (b) Illusory circle becomes less vivid as line orientations are chosen more parallel to the illusory contour. Thus illusory induction is strongest in an orientation perpendicular to the ends of the lines, and its strength depends on the global configuration of the lines relative to one another. (From Perception and Pictorial Representation, C.F. Nodine and D.F.Fisher (Eds.), p.182, New York: Praeger. Copyright 1979 by Praeger Publishers. Adapted by permission.)
Figure 9. (a) Illusory square generated by changing the orientations, but not the end-points, of the lines in Figure 8a. (b) Illusory square also generated by lines with orientations that are not exactly perpendicular to the illusory contour. (From Perception and Pictorial Representation, C.F. Nodine and D.F. Fisher (Eds.), p.186, New York: Praeger. Copyright 1979 by Praeger Publishers. Adapted by permission.)
directions that are not exactly perpendicular to the inducing lines. Thus the long-range cooperative process is orientation-specific across perceptual space (Figure 1d). Boundary completion can be triggered only when pairs of sufficiently strong boundary contour segments are aligned within the spatial bandwidth of the cooperative interaction (Figure 10b). An important property of Figures 8 and 9 can easily go unnoticed. Before boundary completion occurs, each scenic line can induce a band of almost perpendicular boundary contour reactions. This property can be inferred from the fact that each line can generate illusory contours in any of several orientations. Which orientation is chosen depends on the global configuration of the other lines, as in Figures 9a and 9b. An adaptive function of such a band of orientations is clear. If only a single orientation were activated, the probability that several such orientations could be exactly aligned across the perceptual space would be slim. Boundary completion could rarely occur under such demanding conditions. By contrast, after boundary completion occurs, one and only one illusory contour is perceived. What prevents all of the orientations in each band from simultaneously cooperating to form a band of illusory contours? Why is not a fuzzy region of illusory contours generated, instead of the unique and sharp illusory contour that is perceived? Somehow the global cooperative process chooses one boundary orientation from among the band of possible orientations at the end of each inducing line. An adaptive function of this process is also clear. It offsets the fuzzy percepts that might otherwise occur in order to build boundaries at all. How can the coexistence of inducing bands and the percept of sharp boundaries be explained? Given the boundary contour rules depicted in Figure 1, a simple solution is suggested. Suppose that the long-range cooperative process feeds back to its generative boundary contour signals.
The several active boundary contour signals at the end of each inducing line are mutually competitive. When positive feedback from the global cooperative process favors a particular boundary contour, then this boundary contour wins the local competition with the other active boundary contour signals. The positive feedback from the global cooperative process to the local competitive process must therefore be strong relative to the mask inputs that induce the band of weak boundary contour reactions at each inducing line end. Another important property can be inferred from the hypothesis that the boundary completion process feeds back an excitatory signal that helps to choose its own line orientation. How is this positive feedback process organized? At least two local boundary contour signals need to cooperate in order to trigger boundary completion between them. Otherwise, a single inducing line could trigger approximately perpendicular illusory lines that span the entire visual field, which is absurd. Given that two or more active boundary contour signals are needed to trigger the intervening cooperative process, as in Figure 11a, how does the cooperative process span widely separated positions yet generate boundaries with sharp endpoints? Why does not the broad spatial range of the process cause fuzzy line endings to occur, as would a low spatial frequency detector? Figure 11b suggests a simple solution. First, the two illusory contours generate positive signals along the pathways labeled 1. These orientationally aligned signals supraliminally excite the corresponding cooperative process, whose nodes trigger positive feedback via pathways such as pathway 2. Pathway 2 delivers its positive feedback to a position that is intermediate between the inducing line segments. Then, pathways such as 1 and 3 excite positive feedback from intervening pathways such as pathway 4.
The result is a rapid positive feedback exchange between all similarly oriented cooperative processes that lie between the generative boundary contour signals. An illusory line segment is hereby generated between the inducing line segments, but not beyond them.
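The inward-filling sequence just described (pathways 1, then 2, then 3 and 4) can be sketched for a single orientation along one row of positions. The code is a minimal illustration, not the model's equations: it assumes a hypothetical cooperative cell at each position that fires only when both its left and right receptive branches receive enough like-oriented activity. The names complete_boundary, reach, threshold, and feedback are introduced here for illustration.

```python
def complete_boundary(signal, reach=4, threshold=0.5, feedback=1.0, max_iters=50):
    """Iterative boundary completion along a 1-D row for one orientation.

    signal[i] : initial boundary contour activity at position i
    reach     : length of each receptive branch (assumed spatial bandwidth)
    A cooperative cell at position i fires only if BOTH branches are active,
    and then feeds back excitation to position i.
    """
    activity = list(signal)
    n = len(activity)
    for _ in range(max_iters):
        updated = activity[:]
        for i in range(n):
            left = sum(activity[max(0, i - reach):i])
            right = sum(activity[i + 1:i + 1 + reach])
            if left >= threshold and right >= threshold:
                updated[i] = max(updated[i], feedback)
        if updated == activity:      # no further change: completion has settled
            break
        activity = updated
    return activity


two_inducers = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
one_inducer = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
too_far = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

print(complete_boundary(two_inducers))
print(complete_boundary(one_inducer))
print(complete_boundary(too_far))
```

In this sketch the intermediate positions are activated first and the remaining gap positions follow on the next pass, while positions outside the inducers never satisfy the two-branch condition. A single inducer, or a pair separated by more than the assumed bandwidth, triggers no completion.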
Figure 10. Perpendicular induction. (a) The end of a scenic line (dark edge) activates a local tendency (dashed lines) to induce contours in an approximately perpendicular direction. (b) If two such local tendencies are sufficiently strong, if they approximately line up across perceptual space, and if they lie within a critical spatial bandwidth, then an illusory contour may be initiated between them.
Figure 11. Boundary completion. (a) Local competition occurs between different orientations at each spatial location. A cooperative boundary completion process can be activated by pairs of aligned orientations that survive their local competitions. (b) The pair of pathways 1 activate positive boundary completion feedback along pathway 2. Then pathways such as 3 activate positive feedback along pathways such as 4. Rapid completion of a sharp boundary between pathways 1 can hereby be generated.
10. Boundary Completion as a Statistical Process: Textural Grouping and Object Recognition
Figure 11 shows that the boundary completion process can be profitably thought of as a type of statistical grouping process. In response to a textured scene, many boundary contour segments simultaneously attempt to enhance their local competitive advantage by engaging the positive feedback from all possible cooperative processes that share their spatial position and orientational alignment. As shown in Figure 11b, there exist cooperative processes with multiple spatial bandwidths in order to fill-in boundary contours between perceptual locations that are separated by variable distances. The most favorable combination of all positive feedback signals to the competing local boundary contour segments will win the orientational competition (Figure 12), as is illustrated by our simulations below. The statistical nature of the boundary completion process sheds light on how figures made up of closely spaced dots can be used to induce illusory contours (Kennedy, 1979; Kennedy and Ware, 1978). We also suggest that the orientational tuning and spatially distributed nature of this statistical process contributes to the coherent cross correlations that are perceived using Julesz stereograms (Glass and Switkes, 1976; Julesz, 1971). These properties of the boundary completion process have been suggested by consideration of illusory contours. Clearly, however, the process itself cannot distinguish the illusory from the real. The same properties are generated by any boundary contour signals that can win the cooperative-competitive struggle. The ability of the boundary contour process to form illusory groupings enables our theory to begin explaining data from the Beck school (Beck, Prazdny, and Rosenfeld, 1983) on textural grouping, and data of workers like Biederman (1984) and Leeper (1935) concerning how colinear illusory groupings can facilitate or impair recognition of partially degraded visual images (Grossberg and Mingolla, 1985).
One of the most important issues concerning the effects of illusory groupings on texture separation and object recognition is the following one. If illusory groupings can be so important, then why are they often invisible? Our theory’s distinction between boundary contours and feature contours provides a simple, but radical, answer. Boundary contours, in themselves, are always invisible. Perceptual invisibility does not, however, prevent boundary contours from sending large bottom-up signals directly to the object recognition system, and from receiving top-down boundary completion signals from the object recognition system (Grossberg, 1980). Our theory hereby makes a sharp distinction between the elaboration of a visible form percept at the binocular percept (BP) stage (Figure 4) and the activation of object recognition mechanisms. We suggest that these two systems are activated in parallel by the BCS stage. The above discussion suggests some of the properties whereby cooperative interactions can sharpen the orientations of boundary contour segments as they span ambiguous perceptual regions. This discussion does not, however, explain why illusory contour segments are activated in bands of nearly perpendicular orientations at the ends of lines. The next section supplies some further information about the process of illusory induction. The properties of this induction process will again hold for both illusory and real contours, which exist on an equal mechanistic footing in the network.
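The way cooperative feedback can sharpen a band of candidate orientations into a single choice can be sketched as follows. This is a toy illustration under stated assumptions, not the simulations referred to in the text: two line ends each carry a band of three candidate orientations, cooperation is modeled as the product of like-oriented activities at the two ends, and local competition is approximated by squaring and normalizing. The names choose_orientation, normalize, and gain are hypothetical.

```python
def normalize(values):
    """Divide by the total so that activities within a band compete for a fixed sum."""
    total = sum(values)
    return [v / total for v in values]


def choose_orientation(mask_a, mask_b, gain=5.0, iters=30):
    """Toy cooperative-competitive loop between two line ends.

    mask_a[k], mask_b[k] : weak mask input to orientation k at ends A and B
    gain                 : strength of cooperative feedback (assumed large
                           relative to the mask inputs)
    """
    a = normalize(mask_a)
    b = normalize(mask_b)
    for _ in range(iters):
        # Cooperation requires like-oriented activity at BOTH ends.
        coop = [x * y for x, y in zip(a, b)]
        # Squaring then normalizing is a stand-in for contrast-enhancing competition.
        a = normalize([(m + gain * c) ** 2 for m, c in zip(mask_a, coop)])
        b = normalize([(m + gain * c) ** 2 for m, c in zip(mask_b, coop)])
    return a, b


mask_a = [0.2, 0.5, 0.3]
mask_b = [0.1, 0.4, 0.5]   # on its own, end B would favor orientation 2

with_feedback = choose_orientation(mask_a, mask_b, gain=5.0)
without_feedback = choose_orientation(mask_a, mask_b, gain=0.0)
print([round(v, 3) for v in with_feedback[1]])
print([round(v, 3) for v in without_feedback[1]])
```

Without feedback, each end simply favors its strongest mask input, so the two ends disagree. With feedback, both ends settle on the orientation that has the best joint support, which is the sense in which the global configuration selects the local winner.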
11. Perpendicular versus Parallel Contour Completion
The special status of line endings is highlighted by consideration of Figure 2a. In this famous figure, four black pac-man forms generate an illusory Kanizsa square. The illusory edges of the Kanizsa square are completed in a direction parallel to the dark-light inner edges of the pac-man forms. Why are parallel orientations favored when black pac-man forms are used, whereas perpendicular orientations are favored when the ends of black lines are used? Figure 13a emphasizes this distinction by replacing the
Figure 12. Interactions between an oriented line element and its boundary completion process. (a) Output from a single oriented competitive element subliminally excites several cooperative processes of like orientation but variable spatial scale. (b) Several cooperative processes of variable spatial scale can simultaneously excite a single oriented competitive element of like orientation.
Figure 13. Open versus closed scenic contours. (a) If the black pac-man figures of Figure 2 are replaced by black lines of perpendicular orientation, then a bright illusory square is seen. (b) If line ends are joined together by black lines and the resultant closed figures are colored black, then a bright illusory square is again seen. These figures illustrate how perpendicular contour induction by open line ends can be replaced by parallel contour induction by closed edges.
black pac-man forms with black lines whose endpoints are perpendicular to the illusory contour. Again the illusory square is easily seen, but is now due to perpendicular induction rather than to parallel induction. An analysis of spatial scale is needed to understand the distinction between perpendicular induction and parallel induction. For example, join together the line endpoints in Figure 13a and color the interiors of the resultant closed contours black. Then an illusory square is again seen (Figure 13b). In Figure 13b, however, the illusory contours are parallel to the black closed edges of the bounding forms, rather than perpendicular to the ends of lines, as in Figure 13a. The black forms in Figure 13b can be thought of as thick lines. This raises the question: How thick must a line become before perpendicular induction is replaced by parallel induction? How thick must a line become before its “open” end becomes a “closed” edge? In our networks, the measure of thickness is calibrated in terms of several interacting parameter choices: the number of degrees spanned by an image on the retina, the mapping from retinal cells to oriented masks within the boundary contour system, the spatial extent of each oriented mask, and the spatial extent of the competitive interactions that are triggered by outputs from the oriented masks. The subtlety of this calibration issue is illustrated by Figure 14.
In Figure 14, the black interiors of the inducing forms in Figure 13b are eliminated, but their boundaries are retained. The black contours in Figure 13b remain closed, in a geometrical sense, but the illusory square vanishes. Does this mean that these black contours can no longer induce an illusory square boundary contour? Does it mean that an illusory boundary contour does exist, but that the change in total patterning of feature contour signals no longer differentially brightens the inside or outside of this square? Or is a combination of these two types of effects simultaneously at work? Several spatial scales are simultaneously involved in both the boundary contour process and the feature
Figure 14. Influence of figural contrast on illusory brightness. When the black interiors of Figure 13b are colored white, the illusory square is no longer perceived.
contour process. A quantitative analysis of multiple scale interactions goes beyond the scope of this article. The following discussion outlines some factors that are operative within each spatial scale of the model. Section 13 suggests that both perpendicular induction and parallel induction are properties of the same boundary completion process. The different induction properties are traced to different reactions of the boundary completion process to different visual patterns. Before exploring these points, the following section clarifies how removal of the black interiors in Figure 14 eliminates the percept of an illusory Kanizsa square.
12. Spatial Scales and Brightness Contrast
Figure 15 uses pac-man forms instead of the forms in Figure 14 due to their greater simplicity. In Figure 15 the interiors of the upper two pac-man forms are black, but the interiors of the bottom two pac-man forms are white. When all four pac-man forms are colored white, an illusory square is not visible, just as in Figure 14. In Figure 15, by contrast, two vertical illusory contours can be perceived between the black pac-man forms and the pac-man forms with white interiors. The existence of these vertical contours suggests that the vertical black lines in the bottom two pac-man figures can cooperate with the vertical black lines in the top two pac-man figures to induce boundary contours in a direction parallel to their orientation. When all the pac-man forms have white interiors, however, the interior contrast generated by these forms by the feature contour process does not differ significantly from the exterior contrast that is generated by these forms. By using two pac-man forms with black interiors, the interior contrast is enhanced relative to the exterior contrast. This enhanced interior brightness flows downward within the illusory vertical contours, thereby enhancing their visibility. Why does coloring the interiors of two pac-man figures black enhance their interior contrastive effect? This property can be better understood by comparing it with classical demonstrations of brightness contrast. This comparison shows that the property in question is not peculiar to illusory figures. It is the same property as the brightness contrast that is due to "real" figures. Figure 16 compares a thin letter O with a thick letter O. The brightness levels interior to and exterior to the thin letter O are not obviously different. A sufficiently thick letter O can generate a different percept, however. If the letter O is made sufficiently thick, then it becomes a black annulus surrounding a white circle. It is well-known from
Neural Dynamics of Form Perception
Figure 15. Influence of figural contrast on illusory brightness. If only two pac-man forms in Figure 2 are colored black, and the other two forms have white interiors, then an illusory contour can be seen between contiguous black and white forms. This percept suggests that some illusory boundary contour induction may occur in response to Figure 14, but that not enough differential feature contour contrast is generated inside and outside the boundary contour to make the boundary contour visible.
Figure 16. Effects of spatial scale on perceived contrast. (a) No obvious brightness difference occurs between the inside and the outside of the circle. (b) By thickening the circle sufficiently, it becomes a background annulus. The interior of the circle can then be brightened by classical brightness contrast.
classical studies of brightness contrast that darkening an annulus around an interior circle can make the circle look brighter (Cornsweet, 1970). We suggest that the difference between a thin letter O and a brightness contrast demonstration reflects the same process of lateral inhibition (Grossberg, 1981) as the difference between a pac-man form with white interior and a pac-man form with black interior.
13. Boundary-Feature Trade-off: Orientational Uncertainty and Perpendicular End Cutting
We are now ready to consider the boundary-feature trade-off and to show how it explains the paradoxical percepts above as consequences of an adaptive process of fundamental importance.
Chapter 2
The theory's rules begin to seem natural when one acknowledges that the rules of each contour system are designed to offset insufficiencies of the other contour system. The boundary contour system, by itself, could at best generate a perceptual world of outlines. The feature contour system, by itself, could at best generate a world of formless qualities. Let us accept that these deficiencies are, in part, overcome by letting featural filling-in spread over perceptually ambiguous regions until reaching a boundary contour. Then it becomes a critical task to synthesize boundary contours that are capable of restraining the featural flow at perceptually important scenic edges. Orientationally tuned input masks, or receptive fields, are needed to initiate the process of building up these boundary contours (Figure 1). If the directions in which the boundaries are to point were not constrained by orientational tuning, then the process of boundary completion would become hopelessly noisy. We now show that orientationally tuned input masks are insensitive to orientation at the ends of scenic lines and corners. A compensatory process is thus needed to prevent featural quality from flowing out of the percepts of all line endings and corners. Without this compensatory process, filling-in anomalies like neon color spreading would be ubiquitous. This compensatory process is called the end-cutting process. The end-cutting process is the net effect of the competitive interactions described in Figures 1b and 1c. Thus the rules of the boundary contour system take on adaptive meaning when they are understood from the viewpoint of how boundary contours restrict featural filling-in. This section discusses how this end-cutting process, whose function is to build up "real" boundary contours with sharply defined endpoints, can also sometimes generate illusory boundary contours through its interaction with the cooperative boundary completion process of Figure 1d and Figure 11.
The need for an end-cutting process can be seen by considering Figure 17. Figure 17 describes a magnified view of a black vertical line against a white background. Consider Position A along the right edge of the scenic line. A vertically oriented input mask is drawn surrounding Position A. This mask is sensitive to the relative contrast of line edges that fall within its elongated shape. The mask has been drawn with a rectangular shape for simplicity. The rectangular shape idealizes an orientationally sensitive receptive field (Hubel and Wiesel, 1977). The theory assumes that a sufficiently contrastive vertical dark-light edge or a sufficiently contrastive light-dark edge falling within the mask area can activate the vertically tuned nodes, or cells, that respond to the mask at Position A. These cells are thus sensitive both to orientation and to the amount of contrast, but not to the direction of contrast (Figure 1a). A set of masks of varying orientations is assumed to exist at each position of the field. Each mask is assumed to have an excitatory effect on cells that are tuned to the same orientation and an inhibitory effect on cells that are tuned to the other orientations at its spatial position (Figure 1c). At a position, such as A, which lies along a vertical edge of the line far from its end, the rules for activating the oriented masks imply that the vertical orientation is strongly favored in the orientational competition. A tacit hypothesis is needed to draw this conclusion: The oriented masks are elongated enough to sense the spatially anisotropic distribution of scenic contrast near Position A. Were all the masks circularly symmetric, no mask would receive a larger input than any other. When oriented masks are activated at a position such as B, a difficulty becomes apparent. Position B lies outside the black line, but its vertical mask still overlaps the black inducing line well enough to differentially activate its vertically tuned cells.
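The difficulty at Position B can be illustrated with a minimal numerical sketch. It is not the mask equation of the Appendix: the image size, the line geometry, the function names `make_line_image` and `mask_response`, and the rule "absolute difference between the mean luminances of the two mask halves" are all assumptions chosen for illustration. The only properties borrowed from the text are that masks are elongated (here 16 x 8 units) and respond to the amount of contrast but not to its direction.

```python
import numpy as np


def make_line_image(size=40, col_start=18, col_stop=22, row_stop=25):
    """White background (1.0) with a black vertical line (0.0).

    The line occupies columns col_start..col_stop-1 and runs from the top
    of the image down to row row_stop-1 (hypothetical geometry).
    """
    image = np.ones((size, size))
    image[:row_stop, col_start:col_stop] = 0.0
    return image


def mask_response(image, row, col, orientation, length=16, width=8):
    """Response of an elongated mask centered at (row, col).

    The mask is split along its long axis into two halves. The response is
    the absolute difference of the mean luminances of the halves, so it
    depends on the amount of contrast but not on its direction.
    """
    half_len, half_wid = length // 2, width // 2
    if orientation == "vertical":
        side_a = image[row - half_len:row + half_len, col - half_wid:col]
        side_b = image[row - half_len:row + half_len, col:col + half_wid]
    elif orientation == "horizontal":
        side_a = image[row - half_wid:row, col - half_len:col + half_len]
        side_b = image[row:row + half_wid, col - half_len:col + half_len]
    else:
        raise ValueError("orientation must be 'vertical' or 'horizontal'")
    return abs(side_a.mean() - side_b.mean())


image = make_line_image()
positions = {
    "A (on the right edge, far from the end)": (10, 22),
    "B (below the end of the line)": (29, 22),
}
for name, (row, col) in positions.items():
    vertical = mask_response(image, row, col, "vertical")
    horizontal = mask_response(image, row, col, "horizontal")
    print(f"{name}: vertical={vertical:.2f}, horizontal={horizontal:.2f}")
```

With these assumed numbers the vertical mask responds fully at Position A (1.00) and still responds partially at Position B (0.25), although B lies outside the line, while the horizontal mask is silent at both positions. The residual vertical activation at B is the activation that a compensatory process must suppress.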
Thus the possibility of selectively registering orientations carries with it the danger of generating boundary contours that extend beyond their inducing edges. Suppose that the vertically oriented cells at positions such as B were allowed to cooperate with vertically oriented cells at positions such as A. Then a vertical boundary contour could form that would enable featural quality to flow out of the line. We now show that the end-cutting process that prevents this from happening also has properties of illusory
Figure 17. Orientational specificity at figural edges, corners, and exteriors. (a) At positions such as A that are along a figural edge, but not at a figural corner, the oriented mask parallel to the edge is highly favored. At positions beyond the edge, such as B, masks of the same orientation are still partially activated. This tendency can, in the absence of compensatory mechanisms, support a flow of dark featural activity down and out of the black figure. (b) A line is thin, functionally speaking, when at positions near a corner, such as C, many masks of different orientations are all weakly activated or not activated at all.
induction that have been described above. Suppose that inhibitory signals can be generated from positions such as A to positions such as B that lie beyond the end of the line. Because the position of the line relative to the network can change unpredictably through time, these signals need to be characterized in terms of the internal network geometry rather than with respect to any particular line. To prevent featural flow, the vertical activation at Position A needs to inhibit the vertical activation at Position B, but not all activations at Position B. Thus the inhibitory process is orientationally selective across perceptual space (Figure 1b). The spatial range of the inhibitory process must also be broad enough for vertical activations at line positions such as A to inhibit vertical activations at positions such as B that lie outside the line. Otherwise expressed, the spatial range of these orientationally selective inhibitory signals must increase with the spatial scale of the masks. Once the need for an inhibitory end-cutting process is recognized, several paradoxical types of data immediately become more plausible. Consider, for example, Figure 5b in which the vertical boundary contours of the Ehrenstein figure inhibit the vertical boundary contours of the contiguous red cross. The orientational specificity and limited spatial bandwidth of the inhibition that are needed to prevent featural flow also explain why increasing the relative orientation or spatial separation of the cross and Ehrenstein figure weakens the neon spreading effect (Redies and Spillmann, 1981). The inhibitory end-cutting process explains how a vertical orientation of large contrast at a position such as A in Figure 17a can inhibit a vertical orientation of lesser contrast, as at Position B. More than this inhibitory effect is needed to prevent featural activity from flowing outside of the line. Horizontally oriented boundary contours must also be activated at the end of the line.
These horizontal boundary contours are not activated, however, without further network machinery. To understand why this is so, consider Position C in Figure 17b. Position C lies at the end of a narrow black line. Due to the thinness of the line relative to the spatial scale of the oriented input masks, several oriented masks of differing orientations at Position C can all register small and similar amounts of activation, as in the computer simulations of Section 17. Orientational selectivity breaks down at the ends of lines, even though there may exist a weak vertical preference. After the strongly favored vertical orientation at Position A inhibits the weakly activated vertical orientation at positions such as B or C, the mask inputs themselves do not provide the strong activations of horizontal orientations that are needed to prevent featural flow. Further processing is needed. The strong vertical inhibition from Position A must also disinhibit horizontal, or close-to-horizontal, orientations at positions such as B and C. This property follows from the postulate that perpendicular orientations compete at each perceptual position, as in Figure 1c. Thus the same competitive mechanisms in Figures 1b and 1c that explain how end cutting, with its manifestly adaptive function, occurs, also explain how red color can paradoxically flow out of a red cross when it is surrounded by an Ehrenstein figure (Figure 5). As the thickness of the black line in Figure 17 is increased, the horizontal bottom positions of the line begin to favor horizontal orientations for the same reason that the vertical side positions of the line favor vertical orientations. When this occurs, the horizontal orientations along the thickened bottom of the line can cooperate better via the boundary completion process to directly form a horizontal boundary contour at the bottom of the figure.
Parallel induction by a thick black form hereby replaces perpendicular induction by a thin black line as the thickness of the line is increased.
14. Induction of “Real” Contours Using “Illusory” Contour Mechanisms
Some readers might still be concerned by the following issues. Does not the end-cutting process, by preventing the vertical boundary contour from extending beyond
Position C in Figure 17b, create an even worse property: the induction of horizontal illusory contours? Due to the importance of this issue in our theory, we summarize the adaptive value of this property using properties of the cooperative boundary completion process of Figure 1d and Figure 11. Suppose that inhibition from Position A to Position B does not occur in Figure 17a. Then vertical activations can occur at both positions. By Figure 11, an illusory vertical boundary contour may be generated beyond the “real” end of the line. The same is true at the left vertical edge of the line. Due to the existence of ambiguous boundary contour orientations between these vertical boundary contours, featural quality can freely flow between the dark interior of the line and the white background below. The end-cutting process prevents featural flow from occurring at line ends. It does so by generating a strong horizontal activation near corner positions such as C in Figure 17b. In the same way, it generates a strong horizontal activation near the bottom left corner of the line. Using the cooperative process in Figure 11, these two horizontal activations can activate a horizontal boundary contour across the bottom of the line. Although this horizontal boundary contour is “illusory,” it prevents the downward flow of dark featural quality beyond the confines of the inducing line, and thereby enables the network to perceive the line’s “real” endpoint. Thus the “real” line end of a thin line is, strictly speaking, an “illusory” contour. “Real” and “illusory” contours exist on an equal ontological footing in our theory. In the light of this adaptive interaction between the competitive end-cutting process and the cooperative boundary completion process in the perception of “real” scenic contours, the fact that occasional juxtapositions of “real” scenic contours also generate boundary contours that are judged to be “illusory” seems to be a small price to pay.
The remaining sections of this article describe a real-time network that is capable of computing these formal properties.
15. Gated Dipole Fields
We assume that the competitive end-cutting and cooperative boundary completion processes are mediated by interactions between on-cells and off-cells that form opponent processes called gated dipoles. Specialized networks, or fields, of gated dipoles have been used to suggest explanations of many visual phenomena, such as monocular and binocular rivalry, spatial frequency adaptation, Gestalt switching between ambiguous figures, color-contingent and orientation-contingent after-effects, and attentional and norepinephrine influences on visual critical period termination and reversal (Grossberg, 1976, 1980, 1982, 1983a, 1984a). The gating properties of these fields are described here only in passing. Before describing the details of the gated dipole fields that will be used, we qualitatively summarize how they can mediate the competitive end-cutting process. Several closely related variations of this design can generate the desired properties. We develop one scheme that incorporates the main ideas. Suppose that an input mask at position (i,j) is preferentially tuned to respond to an edge of orientation k. Denote the input generated by this mask by $J_{ijk}$. Suppose that this input activates the potential $x_{ijk}$ of the corresponding on-cell population. Also suppose that the variously oriented inputs $J_{ijk}$ at a fixed position (i,j) cause a competition to occur among the corresponding on-cell potentials $x_{ijk}$. In the present scheme, we suppose that each orientation k preferentially inhibits the perpendicular orientation K at the same position (i,j). In this sense, the on-potential $x_{ijK}$ is the off-potential of the input $J_{ijk}$, and the on-potential $x_{ijk}$ is the off-potential of the input $J_{ijK}$. These pairs of competing potentials define the dipoles of the field. One consequence of dipole competition is that at most one potential $x_{ijk}$ or $x_{ijK}$ of a dipole pair can become supraliminally active at any time. Furthermore, if both inputs
$J_{ijk}$ and $J_{ijK}$ are equally large, then, other things being equal, neither potential $x_{ijk}$ nor $x_{ijK}$ can become supraliminally active. Dipole competition between perpendicular orientations activates a potential $x_{ijk}$ or $x_{ijK}$ only if it receives a larger net input than its perpendicularly tuned competitor. The amount of activation is, moreover, sensitive to the relative contrast of these antagonistic inputs. An oriented input $J_{ijk}$ excites its own potential $x_{ijk}$ and inhibits similarly oriented potentials $x_{pqk}$ at nearby positions (p,q), and conversely. The input masks are thus organized as part of an on-center off-surround anatomy of short spatial range (Figure 18). Due to this convergence of excitatory and inhibitory inputs at each orientation and position, the net input to a potential $x_{ijk}$ may be excitatory or inhibitory. This situation creates a new possibility. Suppose that $x_{ijk}$ receives a net inhibitory input, whereas $x_{ijK}$ receives no external input. Then $x_{ijk}$ is inhibited and $x_{ijK}$ is supraliminally excited. This activation of $x_{ijK}$ is due to a disinhibitory action that is mediated by dipole competition. In order for $x_{ijK}$ to be excited in the absence of an excitatory input $J_{ijK}$, a persistently active, or tonic, internal input must exist. This is another well-known property of gated dipoles (Grossberg, 1982). By symmetry, the same tonic input influences each pair of potentials $x_{ijk}$ and $x_{ijK}$. When transmitter gates are placed in specialized dipole pathways (hence the name gated dipole), properties like negative after-effects, spatial frequency adaptation, and binocular rivalry are generated (Grossberg, 1980, 1983a, 1983b). Transmitter gates are not further discussed here. We now apply the properties of dipole competition to explain the inhibitory end-cutting process in more quantitative detail. Suppose that vertical input masks $J_{pqk}$ are preferentially activated at positions such as A in Figure 17a.
These input masks succeed in activating their corresponding potentials $x_{pqk}$, which can then cooperate to generate a vertically oriented boundary contour. By contrast, positions such as B and C in Figure 17 receive orientationally ambiguous inputs due to the thinness of the black bar relative to the length of the oriented masks. Consequently, the inputs $J_{ijk}$ to these positions near the end of the bar are small, and several mask orientations generate inputs of comparable size. Without compensatory mechanisms, featural quality would therefore flow from the end of the bar. This is prevented from happening by the vertically oriented input masks $J_{pqk}$ at positions such as A. These input masks generate large off-surround inhibitory signals to $x_{ijk}$ at positions (i,j) at the end of the bar. Due to dipole competition, the horizontally tuned potentials $x_{ijK}$ are disinhibited. The horizontally tuned potentials of several horizontally aligned positions at the end of the bar can then cooperate to generate a horizontally oriented boundary contour that prevents featural quality from flowing beyond the end of the bar.
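The role of the tonic input in this disinhibition can be made concrete with a small sketch. It is a deliberately simplified, algebraic stand-in for the dipole competition, not the network equations of the Appendix: the function names `net_input` and `dipole_outputs`, the rectified-difference rule, and all numerical values are assumptions introduced only for illustration, and transmitter gating is omitted.

```python
def net_input(own_input, surround_input, inhibition_weight):
    """Net input to one orientation at one position: excitation from its own
    mask minus off-surround inhibition from like-oriented masks nearby."""
    return own_input - inhibition_weight * surround_input


def dipole_outputs(net_k, net_perp, tonic):
    """Rectified opponent competition between orientation k and the
    perpendicular orientation K at one position.

    Each channel carries the tonic input plus its net phasic input and
    cannot go below zero. Each output is the rectified difference between
    its own channel and the opposing channel.
    """
    channel_k = max(tonic + net_k, 0.0)
    channel_perp = max(tonic + net_perp, 0.0)
    output_k = max(channel_k - channel_perp, 0.0)
    output_perp = max(channel_perp - channel_k, 0.0)
    return output_k, output_perp


# Hypothetical numbers for a position at the end of a thin vertical bar:
# a weak vertical mask input there, a strong vertical input at Position A.
vertical_net = net_input(own_input=0.25, surround_input=1.0,
                         inhibition_weight=0.75)
horizontal_net = net_input(own_input=0.0, surround_input=0.0,
                           inhibition_weight=0.75)

print("with tonic input:   ",
      dipole_outputs(vertical_net, horizontal_net, tonic=1.0))
print("without tonic input:",
      dipole_outputs(vertical_net, horizontal_net, tonic=0.0))
```

In this sketch the vertical channel receives a net inhibitory input and the horizontal channel receives no external input. With a positive tonic input the horizontal output becomes active although it was never directly excited; with the tonic input set to zero both outputs stay at zero, because there is no baseline activity for the inhibition to unbalance. Equal net inputs to both channels likewise yield no output from either.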
16. Boundary Completion: Oriented Cooperation Among Multiple Spatial Scales
The stage of dipole competition between perpendicular orientations is followed by a stage of shunting competition among all the orientations corresponding to a fixed position (i,j). The stage of shunting competition possesses several important properties. For one, the shunting competition tends to conserve, or normalize, the total activity of the potentials $y_{ijk}$ at the final stage of competitive processing,
$$\sum_{k=1}^{n} y_{ijk}$$
(Figure 18). This limited capacity property converts the activities $(y_{ij1}, y_{ij2}, \ldots, y_{ijn})$ of the final stage into a ratio scale. See the Appendix for mathematical details.
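The normalization and ratio-scale properties can be sketched with a generic feedforward shunting on-center off-surround network. The equation used below is an assumed textbook form, $dy_k/dt = -A y_k + (B - y_k) I_k - y_k \sum_{m \neq k} I_m$, and is not claimed to be the equation of the Appendix; the parameter names `decay` (A) and `ceiling` (B) and the input values are hypothetical.

```python
import numpy as np


def shunting_steady_state(inputs, decay=1.0, ceiling=1.0):
    """Closed-form equilibrium y_k = B * I_k / (A + sum_m I_m)."""
    inputs = np.asarray(inputs, dtype=float)
    return ceiling * inputs / (decay + inputs.sum())


def simulate_shunting(inputs, decay=1.0, ceiling=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of the assumed shunting equation,
    starting from zero activity."""
    inputs = np.asarray(inputs, dtype=float)
    total = inputs.sum()
    y = np.zeros_like(inputs)
    for _ in range(steps):
        dy = -decay * y + (ceiling - y) * inputs - y * (total - inputs)
        y = y + dt * dy
    return y


weak = shunting_steady_state([1.0, 2.0, 5.0])
strong = shunting_steady_state([10.0, 20.0, 50.0])

print("weak inputs:   total =", weak.sum(), " ratios =", weak / weak.sum())
print("strong inputs: total =", strong.sum(), " ratios =", strong / strong.sum())
print("simulation agrees with closed form:",
      np.allclose(simulate_shunting([1.0, 2.0, 5.0]), weak))
```

In this sketch a tenfold increase in all inputs leaves the relative activities unchanged and moves the total activity only from 8/9 to 80/81 of the ceiling. The total is bounded and approximately conserved, and the pattern of activities encodes input ratios rather than input magnitudes.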
Figure 18. Orientationally tuned competitive interactions. A shunting on-center off-surround interaction within each orientation and between different positions is followed by a push-pull dipole competition between orientations and within each position. The different orientations also compete to normalize the total activity within each position before eliciting output signals to the cooperative boundary completion process that exists between positions whose orientations are approximately aligned.
An equally important property of the shunting competition at each position (i,j) becomes apparent when several positions cooperate to complete boundary contours. Figure 19 depicts how two properly aligned potentials, $y_{ijk}$ and $y_{uvk}$, of orientation k at different positions (i,j) and (u,v) cooperate to activate the potential $z_{pqk}$ at an intervening position (p,q). Potential $z_{pqk}$, in turn, excites the potential $x_{pqk}$ of the same orientation k and at the same position (p,q). As in Figure 11, this positive feedback process rapidly propagates to the potentials of orientation k corresponding to all positions between (i,j) and (u,v). To generate a sharp contour (Section 9), a single orientation k needs to be chosen from among several partially activated orientations at each position (p,q). Such a choice is achieved through an interaction between the oriented cooperation and the shunting competition. In particular, in Figure 19, the positive feedback from $z_{pqk}$ to $x_{pqk}$ enhances the relative size of $y_{pqk}$ compared to its competitors $y_{pqr}$ at position (p,q). In order for the positive feedback signals $h(z_{pqk})$ from $z_{pqk}$ to $x_{pqk}$ to achieve a definite choice, the form of the signal function $h(w)$ must be correctly chosen. It was proved in Grossberg (1973) that a signal function $h(w)$ that is faster-than-linear at attainable activities $w = z_{pqk}$ is needed to accomplish this task. A faster-than-linear signal function sharply contrast-enhances the activity patterns that reverberate in its positive feedback loops (Grossberg, 1983a). Examples of faster-than-linear signal functions are power laws such as $h(w) = Aw^{n}$, $A > 0$, $n > 1$; threshold laws such as $h(w) = A\max(w - B, 0)$, $A > 0$, $B > 0$; and exponential laws such as $h(w) = Ae^{Bw}$, $A > 0$, $B > 0$. The opponent competition among the potentials $x_{ijk}$ and the normalizing competition among the potentials $y_{ijk}$ may be lumped into a single process (Grossberg, 1983a).
They have been separated herein to achieve greater conceptual clarity.
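The way a faster-than-linear signal function turns positive feedback into a choice can be sketched with a small recurrent shunting network. The equation below, $dx_i/dt = -A x_i + (B - x_i) f(x_i) - x_i \sum_{m \neq i} f(x_m)$, is an assumed generic form of a recurrent on-center off-surround network, not the boundary contour equations of the Appendix; the function name `simulate`, the parameters, and the initial pattern are hypothetical.

```python
import numpy as np


def simulate(initial, signal, decay=1.0, ceiling=10.0, dt=0.001, steps=20000):
    """Forward-Euler integration of a recurrent shunting network in which
    each cell excites itself and inhibits the others through signal()."""
    x = np.asarray(initial, dtype=float).copy()
    for _ in range(steps):
        f = signal(x)
        dx = -decay * x + (ceiling - x) * f - x * (f.sum() - f)
        x = x + dt * dx
    return x


initial = [1.0, 1.2, 0.8]  # three partially activated orientations

linear = simulate(initial, lambda w: w)        # linear signal function
power = simulate(initial, lambda w: w ** 2)    # power law with n = 2

print("linear signal:", np.round(linear, 3))
print("power law    :", np.round(power, 3))
```

Under these assumptions the linear signal function amplifies the pattern while preserving its ratios, so all three orientations remain active, whereas the quadratic power law drives the initially largest activity to a high level and suppresses the other two. A single orientation is thereby chosen from several partially activated ones.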
17. Computer Simulations
This section describes some of the simulations that have been done in our ongoing program of quantitative model testing and refinement. The equations that govern the simulations are defined in the Appendix. Figure 20 describes a simulation of boundary completion. In this simulation, the potentials of gated dipoles at positions 15 and 25 receive positive inputs. The potential of the gated dipole at position i is denoted by $y_i(t)$ in Figure 20. A single positional index i is sufficient because the simulation is carried out on a one-dimensional array of cells. The potential of the boundary completion cell at position i is denoted by $z_i(t)$. Figure 20 provides a complete summary of how the boundary completion process unfolds through time. Each successive increasing curve in the figure describes the spatial pattern of activities $y_i(T)$ or $z_i(T)$ across positions i at successive times $t = T$. Note that the input to the two gated dipole positions causes a rapid activation of gated dipole positions that lie midway between them via cooperative feedback signals. Then these three positions rapidly fill-in the positions between them. The final pattern of $y_i$ activities defines a uniformly active boundary that ends sharply at the inducing positions 15 and 25. By contrast, the final pattern of $z_i$ values extends beyond the inducing positions due to subliminal activation of these positions by the interactions depicted in Figure 12a. Figure 21 illustrates how the boundary completion process attenuates scenic noise and sharpens fuzzy orientation bands. Each column of the figure describes a different time during the simulation. The original input is a pattern of two noisy but vertically biased inducing sources and a horizontally oriented noise element. Horizontally biased end cuts are momentarily induced before the oriented cooperation rapidly attenuates all nonvertical elements to complete a vertical boundary contour.
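The qualitative sequence reported for Figure 20 (midpoint first, then fill-in, a sharp end at the inducers, and a subliminal fringe in the cooperative field) can be reproduced by a toy discrete model. This is a sketch under strong simplifying assumptions and does not use the Appendix equations: activities are binary, a cooperative cell at position i receives one unit of support from each flank that contains an active cell within a fixed `reach`, and a position becomes active only when both flanks contribute. The names `cooperative_input`, `complete_boundary`, and `reach` are hypothetical.

```python
import numpy as np


def cooperative_input(y, reach):
    """Support received by the cooperative cell at each position: one unit
    for an active cell within `reach` on the left, one for the right."""
    n = len(y)
    z = np.zeros(n)
    for i in range(n):
        left = y[max(0, i - reach):i].any()
        right = y[i + 1:i + 1 + reach].any()
        z[i] = float(left) + float(right)
    return z


def complete_boundary(inducers, n=41, reach=5, steps=5):
    """Iterate the cooperative feedback. A position becomes active only
    when it is supported from both sides (z >= 2)."""
    y = np.zeros(n, dtype=bool)
    y[list(inducers)] = True
    history = [y.copy()]
    for _ in range(steps):
        z = cooperative_input(y, reach)
        y = y | (z >= 2.0)
        history.append(y.copy())
    return history, cooperative_input(y, reach)


history, final_z = complete_boundary(inducers=(15, 25))
for step, y in enumerate(history[:3]):
    print("step", step, "active positions:", np.flatnonzero(y).tolist())
print("one-sided (subliminal) support at:",
      np.flatnonzero(final_z == 1.0).tolist())
```

In this toy model the first iteration activates only the midpoint position 20, the second iteration fills in all positions from 15 to 25, and no position outside the inducers ever becomes active, because such positions are supported from one side only. The positions with one-sided support just beyond the inducers play the role of the subliminal fringe of the z field.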
Figures 22a and 22b illustrate how a field of oriented masks, such as those depicted in Figure 17, react to the sharp changes in direction at the end of a narrow input bar. These figures encode the activation level of each mask by the length of the line having
Figure 19. Excitatory boundary completion feedback between different positions. Outputs triggered by aligned dipole on-potentials $y_{ijk}$ and $y_{uvk}$ can activate intervening boundary completion potentials $z_{pqk}$. The potentials $z_{pqk}$, in turn, deliver strong positive feedback to the corresponding potentials $w_{pqk}$, which thereupon excite the potentials $x_{pqk}$ and inhibit the potentials $x_{pqK}$.
[Figure 20a plot. Panel title: Y FIELD. Horizontal axis: position, with ticks at 0, 10, 20, 30, 40.]
Figure 20a. Computer simulation of boundary completion in a one-dimensional array of cells. Two sustained inputs to positions 15 and 25 of the y field trigger a rapid filling-in. Activity levels at five successive time periods are superimposed, with activity levels growing to a saturation level. (a) Sharp boundary in y field of Figure 19.
Z FIELD
Figure 20b. Fringe of subliminal activity flanks suprathreshold activity pattern in z field of Figure 19.
[Figure 21 plot. Panel title: REAL TIME BOUNDARY COMPLETION. Columns: 1. input; 2. to 6. y field at successive times.]
Figure 21. Each column depicts a different time during the boundary completion process. The input consists of two noisy but vertically biased inducing line elements and an intervening horizontal line element. The competitive-cooperative exchange triggers transient perpendicular end cuts before attenuating all nonvertical elements as it completes the vertical boundary.
the same orientation as the mask at the position. We call such a display an orientation field. A position at which only one line appears is sensitive only to the orientation of that line. A position at which several lines of equal length appear is equally sensitive to all these computed orientations. The relative lengths of lines across positions encode the relative mask activations due to different parts of the input pattern. Figure 22a shows that a strong vertical preference exists at positions along a vertical edge that are sufficiently far from an endpoint (e.g., positions such as A in Figure 17a). Masks with close-to-vertical orientations can also be significantly activated at such positions. Thus there exists a strong tendency for parallel induction of contours to occur along long scenic edges, as in the illusory Kanizsa square of Figure 2. This tendency for strong parallel induction to occur depends on the length of the figural edge relative to the length of the input masks. Consider, for example, positions along the bottom of the figure, such as Position C in Figure 17b. Because the figure is narrow relative to the mask size, the orientational preferences are much weaker and more uniformly distributed, hence more ambiguous, at the ends of narrow lines. Figure 22b illustrates how different values of mask parameters can generate different orientational fields in response to the same input pattern. The dark-light and light-dark contrast that is needed to activate a mask (parameter $\alpha$ in the Appendix, equation (A1)) is higher in Figure 22b than in Figure 22a. Consequently the positions that respond to scenic edges are clustered closer to these edges in Figure 22b, and edge positions near the line end are not activated. In both Figures 22a and 22b, the input activations near the line end are weak, orientationally ambiguous, or nonexistent.
In Figures 23a and 23b, the orientation fields of Figures 22a and 22b are transformed by the competitive interactions within a dipole field. The functional unit of this field again consists of a complete set of orientations at each perceptual location. At each position (i,j), the value $y_{ijk}$ of the final competitive stage (Figure 18) is described by a line of orientation k whose length is proportional to $y_{ijk}$. In response to the orientation field of Figure 22a, the dipole field generates a strong horizontal end cut in Figure 23a at the perceptual positions corresponding to the end of the line. These horizontal activations can cooperate to generate a boundary contour capable of preventing featural flow from the end of the line. Oblique activations are also generated near the line end as part of this complementary induction process. These oblique activations can induce nonperpendicular illusory contours, as in Figure 9b. In Figure 23b, “illusory” horizontal end cuts are generated at the locations where the vertically oriented inputs of Figure 22b terminate, despite the fact that the locations do not coincide with the end of the line. Comparison of Figures 23a and 23b shows that the horizontal end cuts in both examples exist on a similar ontological footing, thereby clarifying the sense in which even the percepts of “real” line ends are “illusory” and the percepts of “illusory” line ends are “real.” This conclusion does not imply that human observers are unable to say when certain illusory boundaries seem to be “unreal.” We trace this capability to the different ways in which some scenes coactivate the feature contour system and the boundary contour system, rather than to different boundary completion mechanisms within the boundary contour system for “real” and “illusory” line percepts.
18. Brightness Paradoxes and the Land Retinex Theory
This article has focused on the process whereby both real and illusory visual contours are formed. From the perspective of this process, the distinction between a real contour and an illusory contour is highly ambiguous. The role of end cutting in defining sharp “illusory” boundary contours at the “real” ends of narrow lines is a case in point (Section 14). To quantitatively understand illusory brightness effects in the theory, it is necessary to analyse how feature contour signals combine with boundary contour signals within
Figure 22a. Orientation field. Lengths and orientations of lines encode relative sizes of activations and orientations of the input masks at the corresponding positions. The input pattern corresponds to the shaded area. Each mask has total exterior dimensions of 16 x 8 units, with a unit length being the distance between two adjacent lattice positions.
Figure 22b. Orientational field whose masks respond to higher contrasts than those in Figure 22a.
Figure 23a. Response of the potentials $y_{ijk}$ of a dipole field to the orientation field of Figure 22a. End cutting generates horizontal activations at line end locations that receive small and orientationally ambiguous input activations. The oblique activations that occur at the line end can induce nonperpendicular illusory contours, as in Figure 9b.
Figure 23b. Response of the potentials $y_{ijk}$ of a dipole field to the orientation field of Figure 22b. End cutting generates "illusory" horizontal activations at the locations where vertically oriented inputs terminate.
the monocular brightness and color stages MBCL and MBCR of Figure 4, and the manner in which these processing stages interact to generate a binocular percept at the BP stage of Figure 4. This analysis of brightness extends beyond the scope of this article. Cohen and Grossberg (1984b) simulated a number of paradoxical brightness percepts that arise when observers inspect certain contoured images, such as the Craik-O’Brien effect (Arend et al., 1971; O’Brien, 1958) and its exceptions (Coren, 1983; Heggelund and Krekling, 1976; Todorović, 1983; van den Brink and Keemink, 1976); the Bergström (1966, 1967a, 1967b) demonstrations comparing the brightnesses of smoothly modulated and step-like luminance profiles; Hamada’s (1980) demonstrations of nonclassical differences between the perception of luminance decrements and increments; and Fechner’s paradox, binocular brightness averaging, and binocular brightness summation (Blake, Sloane, and Fox, 1981; Cogan, 1982; Cogan, Silverman, and Sekuler, 1982; Curtis and Rule, 1980; Legge and Rubin, 1981; Levelt, 1965). Classical concepts such as spatial frequency analysis, Mach bands, and edge contrast are insufficient by themselves to explain the totality of these data. Because the monocular brightness domains do not know whether a boundary contour signal from the BCS stage is due to a “real” scenic contour or an “imaginary” scenic contour, these brightness simulations support our theory of boundary-feature interactions. Cohen and Grossberg (1984a) and Grossberg (1983a) showed through mathematical derivations and computer simulations how the binocular visual representations at the BP stage combine aspects of global depth, brightness, and form information. Grossberg (1980, 1983a, 1984a) used the theory to discuss the dynamics of monocular and binocular rivalry (Kaufman, 1974; Kulikowski, 1978; Rauschecker, Campbell, and Atkinson, 1973).
Grossberg (1984a) indicated how the theory can be used to explain the fading of stabilized images (Yarbus, 1967). Grossberg (1984a) also suggested how the theory can be extended to include color interactions. This extension provides a physical interpretation of the Land (1977) retinex theory. In this interpretation, a simultaneous parallel computation of contrast-sensitive feature contour signals occurs within double-opponent color processes (light-dark, red-green, yellow-blue). This parallel computation replaces Land’s serial computation of edge contrasts along sampling paths that cross an entire visual scene. Despite Land’s remarkable formal successes using this serial scanning procedure, it has not found a physical interpretation until the present time. One reason for this delay has been the absence of an explanation of why gradual changes in illumination between successive scenic contours are not perceived. The diffusive filling-in of feature contour signals within domains defined by boundary contour signals provides an explanation of this fundamental fact, as well as of Land’s procedure of averaging the outcomes of many serial scans. In addition to physically interpreting the Land retinex theory, the present theory also substantially generalizes the Land theory. The Land theory cannot, for example, explain an illusory brightness change that is due to the global configuration of the inducing elements, as in Figure 8a. The illusory circle in Figure 8a encloses a region of enhanced illusory brightness. No matter how many radially oriented serial scans of the Land theory are made between the radial lines, they will compute a total contrast change of zero, because there is no luminance difference between these lines. If one includes the black radial lines within the serial scans, then one still gets the wrong answer. This is seen by comparing Figures 8a and 8b. In these two figures, the number, length, contrast, and endpoints of the lines are the same.
Yet Figure 8a generates a strong brightness difference, whereas Figure 8b does not. This difference cannot be explained by any theory that depends only on averages of local contrast changes. The brightness effects are clearly due to the global configuration of the lines. A similar limitation of the Land theory is seen by comparing Figures 8 and 9, where rearranging the orientation of the line ends can alter the shape of the perceived region where enhanced brightness obtains.
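The argument about serial scans can be made concrete with a toy computation. The sketch below is not Land's published procedure; it assumes a simplified retinex-style scan that sums thresholded log-luminance ratios between successive samples along a path. The function name `scan_contrast`, the threshold, and the luminance values are illustrative assumptions.

```python
import math

def scan_contrast(luminances, threshold=0.05):
    """Sum thresholded log-luminance ratios along one serial scan path.

    Illustrative assumption: steps whose absolute log ratio falls below
    `threshold` are treated as gradual illumination changes and ignored.
    """
    total = 0.0
    for prev, curr in zip(luminances, luminances[1:]):
        step = math.log(curr / prev)
        if abs(step) >= threshold:
            total += step
    return total

# A path between two radial lines: uniform white paper only.
between_lines = [0.9] * 20
# A path that crosses one black line and returns to white paper.
across_line = [0.9] * 8 + [0.1] * 4 + [0.9] * 8

print(scan_contrast(between_lines))  # no edges are crossed
print(scan_contrast(across_line))    # the two edge steps cancel
```

In both cases the scan reports essentially no net contrast between its start and end points, whatever the global arrangement of the lines. This is the sense in which a scheme built only from accumulated local contrast changes cannot separate Figure 8a from Figure 8b.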
Neural Dynamics of Form Perception
Although the present theory physically interprets the Land retinex theory, it does not by any means provide a complete description of color processing by the nervous system. Much further work needs to be done, for example, to characterize how visual preprocessing generates color-specific, as opposed to merely wavelength-sensitive, feature contour inputs into the featural filling-in syncytium (Zeki, 1983a, 1983b).
19. Related Data and Concepts About Illusory Contours

A variety of other workers have developed concepts based on their data that support our conception of boundary completion, although no one of them has explicitly posited the properties of the feature contour and boundary contour processes. Petry et al. (1983) wrote, for example, that “apparent brightness is influenced more by number of inducing elements, whereas apparent sharpness increases more with inducing element width.... Theoretical accounts of subjective contours must address both perceptual attributes” (p.169), in support of our discussion in Sections 11 and 12. Day (1983) wrote that “illusory contours . . . are due primarily to the spread of induced contrast to partially delineated borders” (p.488), in support of our concept of diffusive filling-in (Section 5), but he did not describe either how the borders are completed or how the featural induction and spread are accomplished. Prazdny (1983) studied variants of the illusion in Figure 8a. He concluded that “simultaneous brightness contrast is not a cause of the illusion” (p.404) by replacing the black lines with alternating black and white rectangles on a grey background. In this way, he also demonstrated that illusory contours can be completed between scenic contours of opposite direction of contrast, as in Figure 2b, but he did not conclude from this that distinct boundary contour and feature contour processes exist. Instead, he concluded that “It remains to be determined which of the competing ‘cognitive’ theories offers the best explanation ... of subjective contours” (p.404). Our results suggest that a cognitive theory is not necessary to explain the basic phenomena about subjective contours, unless one reinterprets cognitive to mean any network computation whose results are sensitive to the global patterning of all inducing elements.

20. Cortical Data and Predictions
Although the analysis that led to the boundary contour system and feature contour system was fueled by perceptual data, it has gradually become clear that a natural neural interpretation can be given to the processing stages of these systems. This linkage is suggested herein to predict unknown but testable neurophysiological properties, to provide a perceptual interpretation of known neural data, and to enable future data about visual cortex to more sharply constrain the development of perceptual theories. We associate the early stages of left-monocular (MPL) and right-monocular (MPR) preprocessing in Figure 4 with the dynamics of the lateral geniculate nucleus, the first stages in the boundary contour system with the hypercolumns in striate cortex (Hubel and Wiesel, 1977), and the first stages in the feature contour system with the blobs in striate cortex (Hendrickson, Hunt, and Wu, 1981; Horton and Hubel, 1981). This interpretation is compatible with recent cortical data: The LGN projects directly to the hypercolumns as well as to the blobs (Livingstone and Hubel, 1982). The blobs are sensitive to color but not to orientation (Livingstone and Hubel, 1984), whereas the hypercolumns are sensitive to orientation but not to color (Hubel and Wiesel, 1977). Given this neural labeling, the theory predicts that the blobs and the hypercolumns activate testably different types of cortical interactions. These interactions do not necessarily occur within the striate cortex, although they must be triggered by signals from the blobs and hypercolumns. The blobs are predicted to initiate featural filling-in. Hence, a single blob should be able to elicit a spreading effect among cells encoding the same featural quality (Figure 3). By contrast, the hypercolumns are predicted to elicit boundary completion. Hence,
pairs of similarly oriented and aligned hypercolumns must be activated before boundary completion over intervening boundary-sensitive cells can be activated (Figure 11). In other words, blobs are predicted to cause an outwardly directed featural spreading, whereas hypercolumns are predicted to cause an inwardly directed boundary completion. Neural data that support our conception of how these interactions work are summarized below. Cells at an early stage in the boundary contour system are required to be sensitive to orientation and amount of contrast, but not to direction of contrast. Such contour-sensitive cells have been found in Area 17 of monkeys (Gouras and Krüger, 1979; Tanaka, Lee, and Creutzfeldt, 1983) as well as cats (Heggelund, 1981). These contour-sensitive cells are predicted to activate several stages of competition and cooperation that together contribute to the boundary completion process. The boundary completion process is predicted to be accomplished by a positive feedback exchange between cells reacting to long-range cooperation within an orientation and cells reacting to short-range competition between orientations (Figure 1). The competitive cells are predicted to occur at an earlier stage of cortical processing than the cooperative cells (Figure 18). These competitive cells are instrumental in generating a perpendicular end cut at the ends of lines (Figures 24 and 25). The cooperative cells are predicted to be segregated, possibly in distinct cortical lamina, according to the spatial range of their cooperative bandwidths (Figure 12). The recent data of von der Heydt et al. (1984) support two of these predictions. These authors have reported the existence of cells in Area 18 of the visual cortex that help to “extrapolate lines to connect parts of the stimulus which might belong to the same object” (p.1261).
These investigators found these cells by using visual images that induce a percept of illusory figures in humans, as in Figures 2 and 8. Concerning the existence of a cooperative boundary completion process between similarly oriented and spatially aligned cells, they write: Responses of cells in area 18 that required appropriately positioned and oriented luminance gradients when conventional stimuli were used could often be evoked also by the corresponding illusory contour stimuli ....The way widely separated picture elements contribute to a response resembles the function of logical gates (pp.1261-1262). By logical gates they mean that two or more appropriately positioned and oriented scenic contours are needed to activate a response from an intervening cell, as in Figure 11. Concerning the existence of a competitive end-cutting process, they write “The responses to stimuli with lines perpendicular to the cell’s preferred orientation reveal an unexpected new receptive field property” (p.1262). The deep issue raised by these data can be expressed as follows. Why do cells that usually react to scenic edges parallel to their orientational preference also react to line ends that are perpendicular to their orientational preference? We provide an explanation of this property in Sections 11 and 13. If we put these two types of experimental evidence together, the theory suggests that the contour-sensitive cells in Area 17 input to the cells that von der Heydt et al. (1984) have discovered in Area 18. A large number of physiological experiments can be designed to test this hypothesis, using stimuli such as those in Figure 2. For example, suppose that the contour-sensitive cells that would stimulate one end of the boundary completion process in response to a Kanizsa square are destroyed. Then the Area 18 cells that would normally be activated where the illusory boundary lies should remain silent. 
If these contour-sensitive cells could be reversibly inhibited, then the Area 18 cells should fire only when their triggering contour-sensitive cells in Area 17 are uninhibited. Informative experiments can also be done by selectively inhibiting boundary contour signals using stabilized image techniques. Suppose, for example, that the large circular boundary and the vertical boundary in Figure 24 are stabilized on the retina of a monkey. Then the cells that von der Heydt et al. discovered should stop firing at the corresponding Area 18 locations. This effect should also be reversible
when image stabilization is terminated. The net impact of the experiments of von der Heydt et al. is thus to provide strong support for the concept of an inwardly directed boundary completion process and an orthogonally oriented end-cutting process at the ends of lines, as well as a well-defined experimental methodology for testing finer aspects of these processes. Concerning the outwardly directed featural filling-in process, a number of predictions can be made. The cellular syncytium that subserves the featural spreading is predicted to possess membranes whose ability to passively, or electrotonically, spread activation can be gated shut by boundary contour signals (Figure 3). The syncytium is hypothesized to be an evolutionary homolog of the intercellular interactions that occur among the retinal horizontal layers of certain fish (Usui, Mitarai, and Sakakibara, 1983). A possible cortical mechanism of this feature contour syncytium is some form of dendrodendritic coupling. Any manipulation that inhibits signals from the boundary contour system to the feature contour system (pathways BCS → MBCL and BCS → MBCR of Figure 4) is predicted to release the syncytial flow, as well as to generate a percept of featural flow of colors and brightnesses. If all boundary contour signals are inhibited, so that no boundary restrictions of featural flow occur, then a functional ganzfeld exists within the feature contour system. A dramatic reduction in visual sensitivity should occur, even if the feature contour system is otherwise intact. An indirect behavioral test of how boundary contour signals restrict featural flow can be done using a stabilized image technique (Figure 24). Suppose that the large circular boundary and the vertical boundary in Figure 24 can be stabilized on the retina of a monkey.
Train a monkey to press the first lever for food when it sees the unstabilized figure, and to press the second lever to escape shock when it sees a figure with a red background containing two small red circles of different shades of red, as in the stabilized percept. Then stabilize the relevant contours of Figure 24 and test which lever the monkey presses. If it presses the second lever with greater frequency than in the unstabilized condition, then one has behavioral evidence that the monkey perceives the stabilized image much as humans do. Also carry out electrode recordings of von der Heydt et al. (1984) cells at Area 18 locations corresponding to the stabilized image contours. If these cells stop firing during stabilization and if the monkey presses the second lever more at these times, then a featural flow that is contained by boundary contour signals is strongly indicated. Figure 25 depicts a schematic top-down view of how boundary contour signals elicited by cortical hypercolumns could restrict the syncytial flow of featural quality elicited by cortical blobs. This flow does not necessarily occur among the blobs themselves. Figure 25 indicates, however, that the topographies of blobs and hypercolumns are well suited to serve as inputs to the cell syncytium. We suggest that the cell syncytium occurs somewhere between the blobs in Area 17 (also called V1) and the cells in Area V4 of the prestriate cortex (Zeki, 1983a, 1983b). The theory suggests that the cells of von der Heydt et al. (1984) project to the cell syncytium. Hence staining or electrophysiological techniques that reveal the projections of these cells may be used to locate the syncytium. These experiments are illustrative rather than exhaustive of the many that are suggested by the theory.
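The predicted gating of featural spreading by boundary signals can be illustrated with a one-dimensional toy syncytium. The sketch below is an assumption-laden illustration and not the model itself, whose equations are given in Cohen and Grossberg (1984b): each cell decays passively, receives a feature contour input, and exchanges activity with its nearest neighbors through a permeability that is reduced by boundary contour signals. The names `fill_in`, `delta`, and `epsilon`, and all parameter values, are hypothetical.

```python
def fill_in(features, boundaries, delta=5.0, epsilon=1000.0,
            dt=0.01, steps=5000):
    """Euler-integrate a 1D boundary-gated diffusion (illustrative only).

    dx_i/dt = -x_i + features[i] + sum_j P_ij * (x_j - x_i),
    where j ranges over the nearest neighbors of i and the permeability is
    P_ij = delta / (1 + epsilon * (boundaries[i] + boundaries[j])).
    """
    n = len(features)
    x = [0.0] * n
    for _ in range(steps):
        new_x = []
        for i in range(n):
            flow = 0.0
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    p = delta / (1.0 + epsilon * (boundaries[i] + boundaries[j]))
                    flow += p * (x[j] - x[i])
            new_x.append(x[i] + dt * (-x[i] + features[i] + flow))
        x = new_x
    return x

features = [0.0] * 10
features[2] = 1.0          # a single feature contour input
open_field = [0.0] * 10    # no boundary signals anywhere
walled = [0.0] * 10
walled[5] = 1.0            # one active boundary signal

spread = fill_in(features, open_field)
contained = fill_in(features, walled)
print([round(v, 3) for v in spread])
print([round(v, 3) for v in contained])
```

Without boundary signals the single input spreads across the whole row; with one active boundary signal the spread is confined almost entirely to the side of the boundary that contains the input. Removing all boundary signals corresponds to the functional ganzfeld described above.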
21. Concluding Remarks

By articulating the boundary-feature trade-off, our theory shows that a sharp distinction between the boundary contour system and the feature contour system is needed to discover the rules that govern either system. Paradoxical percepts like neon color spreading can then be explained as consequences of adaptive mechanisms that prevent observers from perceiving a flow of featural quality from all line ends and corners due to orientational uncertainty. The theory’s instantiation of featural filling-in, in turn, arises from an analysis of how the nervous system compensates for having discounted
Figure 24. Contour stabilization leads to filling-in of color. When the edges of the large circle and the vertical line are stabilized on the retina, the red color (dots) outside the large circle envelops the black and white hemi-disks except within the small red circles whose edges are not stabilized (Yarbus, 1967). The red inside the left circle looks brighter and the red inside the right circle looks darker than the enveloping red.
Figure 25. Predicted interactions due to signals from blobs and hypercolumns. (a) In the absence of boundary contour signals, each blob can initiate featural spreading to blob-activated cells of like featural quality in a light-dark, red-green, blue-yellow double-opponent system. The symbols L and R signify signals initiated with the left and right ocular dominance columns, respectively. The symbols r and g designate two different color systems; for example, the red and green double-opponent systems. The arrows indicate possible directions of featural filling-in. (b) An oriented boundary contour signal can be initiated from orientations at left-eye positions, right-eye positions, or both. The rectangular regions depict different orientationally tuned cells within a hypercolumn (Hubel and Wiesel, 1977). The shaded region is active. (c) These boundary contour signals are well positioned to attenuate the electrotonic flow of featural quality between contiguous perceptual positions. The shaded blob and hypercolumn regions are activated in the left figure. The arrows in the right figure illustrate how featural filling-in is restricted by the active boundary contour signal.
spurious illuminants, stabilized retinal veins, scotomas, and other imperfections of the retinal image. Once one accepts the fact that featural qualities can fill-in over discounted inputs, then the need for another contour system to restrict the featural flow seems inevitable. A careful study of these contour systems reveals that they imply a strong statement about both the computational units and the types of visual representations that are used in other approaches to visual perception. We claim that local computations of scenic luminances, although useful for understanding some aspects of early visual processing, cannot provide an adequate understanding of visual perception because most scenic luminances are discounted as spurious by the human visual system. We also posit that physical processes of featural filling-in and boundary completion occur, as opposed to merely formal correspondences between external scenes and internal representations. Many contemporary contributors to perception eschew such physical approaches in order to avoid the pitfalls of naive realism. Despite the physical concreteness of the contour system processes, these processes do not support a philosophy of naive realism. This can be seen most easily by considering how the activity patterns within the contour systems are related to the “conscious percepts” of the theory. For example, many perpendicular end cuts due to scenic line endings never reach consciousness in the theory. This property reflects the fact that the theory does not just rebuild the edges that exist “out there.” Instead, the theory makes a radical break with classical notions of geometry by suggesting that a line is not even a collection of points. A line is, at least in part, the equilibrium set of a nonlinear cooperative-competitive dynamical feedback process. A line in the theory need not even form a connected set until it dynamically equilibrates, as Figures 20 and 21 demonstrate.
This property may have perceptual significance, because a boundary contour cannot effectively restrict featural filling-in to become visible until it can separate two regions of different featural contrast. Initial surges of boundary completion may thus be competitively squelched before they reach consciousness, as in metacontrast phenomena. In a similar vein, featural filling-in within a cell syncytium does not merely establish a point-to-point correspondence between the reflectances of a scene and corresponding positions within the cell syncytium. Until a boundary contour pattern is set up within the syncytium, the spatial domain within which featural contour inputs interact to influence prescribed syncytial cells is not even defined, let alone conscious. Perhaps the strongest disclaimer to a naive realism viewpoint derives from the fact that none of the contour system interactions that have been discussed in this article are assumed to correspond to conscious percepts. All of these interactions are assumed to be preprocessing stages that may or may not lead to a conscious color-and-form-in-depth percept at the binocular percept stage of Figure 4. As during binocular rivalry (Kaufman, 1974; Kulikowski, 1978), a contoured scene that is easily perceived during monocular viewing is not always perceived when it is binocularly viewed along with a discordant scene to the other eye. A conscious percept is synthesized at the theory’s BP stage using output signals from the two pairs of monocular contour systems (Cohen and Grossberg, 1984a, 1984b; Grossberg, 1983a). The formal cells within the BP stage are sensitive to spatial scale, orientation, binocular disparity, and the spatial distribution of featural quality. Many BP cells that receive inputs from the MBCL and MBCR stages are not active in the BP percept.
Although the BP stage instantiates a physical process, this process represents an abstract context-sensitive representation of a scenic environment, not merely an environmental isomorphism. We believe that Area V4 of the prestriate cortex fulfills a similar function in vivo (Zeki, 1983a, 1983b). Even when a conscious representation is established at the BP stage, the information that is represented in this way is quite limited. For example, the process of seeing a form at the BP stage does not imply that we can recognize the objects within that form. We hypothesize that the boundary contour system sends signals in parallel to the monocular brightness and color stages (MBCL and MBCR in Figure 4) as well as
to an object recognition system. The top-down feedback from the object recognition system to the boundary contour system can provide “cognitive contour” signals that are capable of modulating the boundary completions that occur within the boundary contour system (Gregory, 1966; Grossberg, 1980, 1982, 1984b). Thus we envisage that two types of cooperative feedback (boundary completion signals and learned top-down expectancies) can monitor the synthesis of monocular boundary contours. For the same reasons that not all bottom-up activations of boundary contours become visible, not all top-down activations of boundary contours become visible. A boundary contour that is invisible at the BP stage can, however, have a strong effect on the object recognition system. “Seeing” a BP form percept does not imply a knowledge of where an object is in space, any more than it implies a knowledge of which object is being seen. Nonetheless, just as the same network laws are being used to derive networks for color and form perception and for object recognition, so too are these laws being used to analyse how observers learn to generate accurate movements in response to visual cues (Grossberg, 1978, 1985, in press; Grossberg and Kuperstein, 1985). This work on sensory-motor control suggests how a neural network as a whole can accurately learn to synthesize and calibrate sensory-motor transformations in real-time even though its individual cells cannot do so, and even if the cellular parameters from which these networks are built may be different across individuals, may change during development, and may be altered by partial injuries throughout life. Our most sweeping reply to the criticism of naive realism is thus that a single set of dynamical laws can be used, albeit in specialized wiring diagrams, for the explanation of data that, on the level of naive experience, could not seem to be more different.
Using such laws, the present theory promises to provide a significant synthesis of perceptual and neural data and theories. Spatial frequencies and oriented receptive fields are both necessary but not sufficient. The perceptual interpretation of the blobs and hypercolumns strengthens the arguments for parallel cortical processing, but the need for several stages of processing leading to a unitary percept also strengthens the arguments for hierarchical cortical processing. A role for propagated action potentials in the boundary contour system is balanced by a role for electrotonic processing in the feature contour system. Relatively local cortical processing is needed to compute receptive field properties, but relatively global cortical interactions are needed to generate unambiguous global percepts, such as those of perceptual boundaries, from ambiguous local cues. The deepest conceptual issue raised by the present results concerns the choice of perceptual units and neural design principles. The impoverished nature of the retinal image and a huge perceptual data base about visual illusions show that local computations of pointwise scenic luminances cannot provide an adequate understanding of visual perception. The boundary-feature trade-off suggests that the visual system is designed in a way that is quite different from any possible local computational theory. This insight promises to be as important for the design of future computer vision and robotics algorithms as it may be for progress in perceptual and neural theory.
APPENDIX

Dynamics of Boundary Formation
A network that instantiates the qualitative requirements described in the text will now be defined in stages, so that the basic properties of each stage can be easily understood. At each stage, we chose the simplest instantiation of the computational idea.

Oriented Masks

To define a mask centered at position $(i,j)$ with orientation $k$, divide the rectangular receptive field of the mask into a left-rectangle $L_{ijk}$ and a right-rectangle $R_{ijk}$. Suppose that all the masks sample a field of preprocessed inputs. Let $S_{pq}$ equal the preprocessed input to the position $(p,q)$ of this field. The output $J_{ijk}$ from the mask at position $(i,j)$ with orientation $k$ is then defined by

$$J_{ijk} = \frac{[U_{ijk} - \alpha V_{ijk}]^+ + [V_{ijk} - \alpha U_{ijk}]^+}{1 + \beta (U_{ijk} + V_{ijk})}, \tag{A1}$$

where

$$U_{ijk} = \sum_{(p,q) \in L_{ijk}} S_{pq}, \tag{A2}$$

$$V_{ijk} = \sum_{(p,q) \in R_{ijk}} S_{pq}, \tag{A3}$$

and the notation $[p]^+ = \max(p, 0)$. In (A1), term

$$[U_{ijk} - \alpha V_{ijk}]^+ > 0 \tag{A4}$$

only if $U_{ijk}/V_{ijk} > \alpha$. Because $U_{ijk}$ measures the total input to the left rectangle $L_{ijk}$ and $V_{ijk}$ measures the total input to the right rectangle $R_{ijk}$, inequality (A4) says that the input to $L_{ijk}$ exceeds that to $R_{ijk}$ by the factor $\alpha$. Parameter $\alpha$ ($\geq 1$) thus measures the relative contrast between the left and right halves of the receptive field. The sum of two terms in the numerator of (A1) says that $J_{ijk}$ is sensitive to the amount of contrast, but not to the direction of contrast, received by $L_{ijk}$ and $R_{ijk}$. The denominator term in (A1) enables $J_{ijk}$ to compute a ratio scale in the limit where $\beta(U_{ijk} + V_{ijk})$ is much greater than 1.

Intraorientational Competition Between Positions

As in Figure 18, inputs $J_{ijk}$ with a fixed orientation $k$ activate potentials $w_{ijk}$ with the same orientation via on-center off-surround interactions. To achieve a disinhibitory capability, all potentials $w_{ijk}$ are also excited by the same tonically active input $I$. Suppose that the excitatory inputs are not large enough to saturate their potentials, but that the inhibitory inputs can shunt their potentials toward small values. Then

$$\frac{d}{dt} w_{ijk} = -w_{ijk} + I + f(J_{ijk}) - w_{ijk} \sum_{(p,q)} f(J_{pqk}) D_{pqij}, \tag{A5}$$

where $D_{pqij}$ is the inhibitory interaction strength between positions $(p,q)$ and $(i,j)$, and $f(J_{ijk})$ is the input signal generated by $J_{ijk}$. Suppose, for simplicity, that

$$f(J_{ijk}) = \gamma J_{ijk}, \tag{A6}$$

where $\gamma$ is a positive constant. Also suppose that $w_{ijk}$ equilibrates rapidly to its inputs through time and is thus always approximately at equilibrium. Setting $\frac{d}{dt} w_{ijk} = 0$ in (A5), we find that

$$w_{ijk} = \frac{I + \gamma J_{ijk}}{1 + \gamma \sum_{(p,q)} J_{pqk} D_{pqij}}. \tag{A7}$$
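The first two stages can be sketched numerically. The code below is a minimal illustration under stated assumptions, not the simulation used for the figures: it evaluates the mask output (A1), a rectified left-versus-right contrast divided by $1 + \beta(U + V)$, and the equilibrium potential (A7), $w = (I + \gamma J)/(1 + \gamma \sum J D)$, for a vertical orientation only. The mask geometry, the nearest-neighbor kernel standing in for $D_{pqij}$, and all parameter values are hypothetical choices.

```python
def pos(p):
    """Half-wave rectification [p]^+ = max(p, 0)."""
    return max(p, 0.0)

def vertical_mask_sums(S, i, j, half_width=2, half_height=1):
    """Sums (A2)-(A3) for a vertical mask centered at row i, column j.

    Assumed geometry: the left rectangle spans columns j-half_width..j-1,
    the right rectangle spans columns j..j+half_width-1, and both span
    rows i-half_height..i+half_height. The mask must fit inside S.
    """
    rows = range(i - half_height, i + half_height + 1)
    u = sum(S[p][q] for p in rows for q in range(j - half_width, j))
    v = sum(S[p][q] for p in rows for q in range(j, j + half_width))
    return u, v

def mask_output(u, v, alpha=1.5, beta=1.0):
    """Mask output (A1): sensitive to amount, not direction, of contrast."""
    return (pos(u - alpha * v) + pos(v - alpha * u)) / (1.0 + beta * (u + v))

def equilibrium_w(J, n, tonic=0.5, gamma=1.0, d=1.0):
    """Equilibrium potential (A7) at index n of a row of mask outputs J.

    Assumed kernel: D equals d for the two nearest neighbors, 0 elsewhere.
    """
    inhibition = sum(d * J[m] for m in (n - 1, n + 1) if 0 <= m < len(J))
    return (tonic + gamma * J[n]) / (1.0 + gamma * inhibition)

# A 3 x 8 input field with a vertical edge between columns 3 and 4.
edge = [[1.0] * 4 + [0.2] * 4 for _ in range(3)]

columns = range(2, 7)  # column positions where the whole mask fits
J_edge = [mask_output(*vertical_mask_sums(edge, 1, j)) for j in columns]
w_edge = [equilibrium_w(J_edge, n) for n in range(len(J_edge))]

print([round(v, 3) for v in J_edge])
print([round(v, 3) for v in w_edge])
print(mask_output(6.0, 1.2), mask_output(1.2, 6.0))  # same amount of contrast
```

In this toy run the mask centered on the edge responds most strongly, swapping the left and right sums leaves the output unchanged, and positions next to an active mask settle below the tonic level, which is the condition that later allows disinhibition of a perpendicular orientation.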
Dipole Competition Between Perpendicular Orientations

Perpendicular potentials $w_{ijk}$ and $w_{ijK}$ elicit output signals that compete at their target potentials $x_{ijk}$ and $x_{ijK}$, respectively (Figure 18). Assume that these output signals equal the potentials $w_{ijk}$ and $w_{ijK}$, which are always nonnegative by (A7), and that $x_{ijk}$ and $x_{ijK}$ respond quickly to these signals within their linear dynamical range. Then

$$x_{ijk} = w_{ijk} - w_{ijK} \tag{A8}$$

and

$$x_{ijK} = w_{ijK} - w_{ijk}. \tag{A9}$$

Output signals are, in turn, generated by $x_{ijk}$ and $x_{ijK}$ when they exceed a nonnegative threshold. Let this threshold equal zero and suppose that the output signals $O_{ijk} = O(x_{ijk})$ and $O_{ijK} = O(x_{ijK})$ grow linearly above threshold. Then

$$O_{ijk} = C[w_{ijk} - w_{ijK}]^+ \tag{A10}$$

and

$$O_{ijK} = C[w_{ijK} - w_{ijk}]^+, \tag{A11}$$

where $C$ is a positive constant and $[p]^+ = \max(p, 0)$.

Interorientational Competition Within a Position

Let the outputs $O_{ijk}$, $k = 1, 2, \ldots, n$, be the inputs to an orientationally tuned on-center off-surround competition within each position. The potential $y_{ijk}$ is excited by $O_{ijk}$ and inhibited by all $O_{ijm}$, $m \neq k$. Potential $y_{ijk}$ therefore obeys the shunting on-center off-surround equation (Grossberg, 1983a)

$$\frac{d}{dt} y_{ijk} = -A y_{ijk} + (B - y_{ijk}) O_{ijk} - y_{ijk} \sum_{m \neq k} O_{ijm}. \tag{A12}$$

Suppose that $y_{ijk}$ also equilibrates rapidly to its inputs. Setting $\frac{d}{dt} y_{ijk} = 0$ in (A12) implies that

$$y_{ijk} = \frac{B O_{ijk}}{A + O_{ij}}, \tag{A13}$$

where

$$O_{ij} = \sum_{m=1}^{n} O_{ijm}. \tag{A14}$$

By equation (A13), the total activity

$$y_{ij} = \sum_{k=1}^{n} y_{ijk} \tag{A15}$$

tends to be conserved because

$$y_{ij} = \frac{B O_{ij}}{A + O_{ij}}. \tag{A16}$$

Thus if $A$ is small compared to $O_{ij}$, then $y_{ij} \cong B$.

Oriented Cooperation

As in Figure 19, if two (sets of) output signals $f(y_{ijk})$ and $f(y_{uvk})$ can trigger supraliminal activation of an intervening boundary completion potential $z_{pqk}$, then positive feedback from $z_{pqk}$ to $w_{pqk}$ can initiate a rapid completion of a boundary with orientation $k$ between positions $(i,j)$ and $(u,v)$. The following equation illustrates a rule for activating a boundary completion potential $z_{ijk}$ due to properly aligned pairs of outputs:

$$\frac{d}{dt} z_{ijk} = -z_{ijk} + g\Big(\sum_{(p,q)} f(y_{pqk}) E^{(k)}_{pqij}\Big) + g\Big(\sum_{(p,q)} f(y_{pqk}) F^{(k)}_{pqij}\Big). \tag{A17}$$
In (A17), $g(s)$ is a signal function that becomes positive only when $s$ is positive, and has a finite maximum value. A sum of two sufficiently positive $g(s)$ terms in (A17) is needed to activate $z_{ijk}$ above the firing threshold of its output signal $h(z_{ijk})$. The output signal function $h(s)$ is chosen faster-than-linear, and with a large slope to help choose orientation $k$ in position $(i,j)$. Each sum

$$\sum_{(p,q)} f(y_{pqk}) E^{(k)}_{pqij}$$

and

$$\sum_{(p,q)} f(y_{pqk}) F^{(k)}_{pqij}$$

adds up outputs from a strip with orientation $k$ that lies to one side or the other of position $(i,j)$, as in Figure 11. The oriented kernels $E^{(k)}_{pqij}$ and $F^{(k)}_{pqij}$ accomplish this process of anisotropic averaging. A set of modestly large $f(y_{pqk})$ outputs within the bandwidth of $E^{(k)}_{pqij}$ or $F^{(k)}_{pqij}$ can thus have as much of an effect on $z_{ijk}$ as a single larger $f(y_{pqk})$ output. This property contributes to the statistical nature of the boundary completion process. An equation in which the sum of $g(w)$ terms in (A17) is replaced by a product of $g(w)$ terms works just as well formally. At equilibrium, (A17) implies that

$$z_{ijk} = g\Big(\sum_{(p,q)} f(y_{pqk}) E^{(k)}_{pqij}\Big) + g\Big(\sum_{(p,q)} f(y_{pqk}) F^{(k)}_{pqij}\Big). \tag{A18}$$
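The dipole and cooperative stages can be illustrated in the same spirit. The sketch below assumes particular forms that the text leaves open: $g(s) = \min([s]^+, 1)$, $h(z) = ([z - 1]^+)^2$, $f(y) = y$, and flat one-dimensional kernels standing in for $E^{(k)}_{pqij}$ and $F^{(k)}_{pqij}$. With these hypothetical choices a single $g$ term can never exceed the firing threshold of $h$, so the cooperative cell behaves like the logical gate discussed in Section 20.

```python
def pos(p):
    """Half-wave rectification [p]^+ = max(p, 0)."""
    return max(p, 0.0)

def dipole_outputs(w_k, w_K, C=1.0):
    """Outputs (A10)-(A11) of two perpendicular orientations k and K."""
    return C * pos(w_k - w_K), C * pos(w_K - w_k)

def g(s):
    """Assumed signal function: positive only for s > 0, maximum value 1."""
    return min(pos(s), 1.0)

def h(z, threshold=1.0):
    """Assumed output signal: zero up to threshold, then faster than linear.

    Because g never exceeds 1, one g term alone cannot pass threshold 1.
    """
    return pos(z - threshold) ** 2

def completion_potential(y_row, j, reach=3):
    """Equilibrium z (A18) at column j for the orientation along the row.

    Assumed kernels: E weights the `reach` cells to the left of j equally,
    F weights the `reach` cells to the right equally, and f(y) = y.
    """
    left = sum(y_row[q] for q in range(max(j - reach, 0), j))
    right = sum(y_row[q] for q in range(j + 1, min(j + reach + 1, len(y_row))))
    return g(left) + g(right)

# End cut: orientation k (say vertical, w = 0.3) has been inhibited below
# the perpendicular orientation K (w = 0.5), so only K emits an output.
print(dipole_outputs(0.3, 0.5))

two_inducers = [0, 0, 1.0, 0, 0, 0, 1.0, 0, 0]
one_inducer = [0, 0, 1.0, 0, 0, 0, 0, 0, 0]
scattered = [0, 0.4, 0.4, 0.4, 0, 0.4, 0.4, 0.4, 0]

for row in (two_inducers, one_inducer, scattered):
    z = completion_potential(row, 4)
    print(z, h(z))
```

The middle cell fires when inducers lie on both sides and stays silent when only one side is active. The `scattered` row shows the statistical property noted above: several modest outputs within a kernel's bandwidth act like a single larger one.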
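The effect of boundary completion feedback signals $h(z_{ijk})$ on the $(i,j)$ position is described by changing the equation (A7) to

$$w_{ijk} = \frac{I + \gamma J_{ijk} + h(z_{ijk})}{1 + \gamma \sum_{(p,q)} J_{pqk} D_{pqij}}. \tag{A19}$$

Equations (A1), (A19), (A10), (A13), and (A18), respectively, define the equilibrium of the network, up to parameter choices. This system is summarized below for completeness.

$$J_{ijk} = \frac{[U_{ijk} - \alpha V_{ijk}]^+ + [V_{ijk} - \alpha U_{ijk}]^+}{1 + \beta (U_{ijk} + V_{ijk})}, \tag{A1}$$

$$w_{ijk} = \frac{I + \gamma J_{ijk} + h(z_{ijk})}{1 + \gamma \sum_{(p,q)} J_{pqk} D_{pqij}}, \tag{A19}$$

$$O_{ijk} = C[w_{ijk} - w_{ijK}]^+, \tag{A10}$$

$$y_{ijk} = \frac{B O_{ijk}}{A + O_{ij}}, \tag{A13}$$

and

$$z_{ijk} = g\Big(\sum_{(p,q)} f(y_{pqk}) E^{(k)}_{pqij}\Big) + g\Big(\sum_{(p,q)} f(y_{pqk}) F^{(k)}_{pqij}\Big). \tag{A18}$$

Although these equilibrium equations compactly summarize the computational logic of competitive-cooperative boundary contour interactions, a full understanding of the information processing capabilities of this network requires a study of the corresponding differential equations, not just their equilibrium values. The equations for feature contour signals and diffusive filling-in are described in Cohen and Grossberg (1984b).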
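As a small illustration of how a differential equation of the network relates to its equilibrium value, the sketch below integrates the shunting competition within one position, $\frac{d}{dt} y_k = -A y_k + (B - y_k) O_k - y_k \sum_{m \neq k} O_m$, by forward Euler steps and compares the result with the closed form $y_k = B O_k / (A + \sum_m O_m)$. The step size, parameter values, and input pattern are arbitrary assumptions chosen only for the demonstration.

```python
def integrate_y(O, A=0.1, B=1.0, dt=0.001, steps=20000):
    """Forward-Euler integration of the shunting competition at one position."""
    y = [0.0] * len(O)
    total = sum(O)
    for _ in range(steps):
        y = [yk + dt * (-A * yk + (B - yk) * Ok - yk * (total - Ok))
             for yk, Ok in zip(y, O)]
    return y

def equilibrium_y(O, A=0.1, B=1.0):
    """Closed-form equilibrium: each y_k is B * O_k / (A + sum of all O)."""
    total = sum(O)
    return [B * Ok / (A + total) for Ok in O]

# Hypothetical outputs of four orientations at one position.
O = [2.0, 1.0, 0.5, 0.0]
y_dynamic = integrate_y(O)
y_static = equilibrium_y(O)

print([round(v, 4) for v in y_dynamic])
print([round(v, 4) for v in y_static])
print(round(sum(y_static), 4))  # total activity stays just below B
```

Here the integrated potentials settle onto the equilibrium values, the relative sizes of the inputs are preserved, and the total activity stays close to $B$ because $A$ is small compared with the summed input. The transient itself, which the equilibrium equations hide, is what a study of the full differential equations would examine.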
REFERENCES
Arend, L.E., Buehler, J.N., and Lockhead, G.R., Difference information in brightness perception. Perception and Psychophysics, 1971, 9, 367-370.
Beck, J., Prazdny, K., and Rosenfeld, A., A theory of textural segmentation. In J. Beck, B. Hope, and A. Rosenfeld (Eds.), Human and machine vision. New York: Academic Press, 1983, pp.1-38.
Bergström, S.S., A paradox in the perception of luminance gradients, I. Scandinavian Journal of Psychology, 1966, 7, 209-224.
Bergström, S.S., A paradox in the perception of luminance gradients, II. Scandinavian Journal of Psychology, 1967, 8, 25-32 (a).
Bergström, S.S., A paradox in the perception of luminance gradients, III. Scandinavian Journal of Psychology, 1967, 8, 33-37 (b).
Biederman, I., Personal communication, 1984.
Blake, R., Sloane, M., and Fox, R., Further developments in binocular summation. Perception and Psychophysics, 1981, 30, 266-276.
Boynton, R.M., Color, hue, and wavelength. In E.C. Carterette and M.P. Friedman (Eds.), Handbook of perception: Seeing, Vol. 5. New York: Academic Press, 1975, pp.301-347.
Carpenter, G.A. and Grossberg, S., Adaptation and transmitter gating in vertebrate photoreceptors. Journal of Theoretical Neurobiology, 1981, 1, 1-42.
Carpenter, G.A. and Grossberg, S., Dynamic models of neural systems: Propagated signals, photoreceptor transduction, and circadian rhythms. In J.P.E. Hodgson (Ed.), Oscillations in mathematical biology. New York: Springer-Verlag, 1983, pp.102-196.
Cogan, A.L., Monocular sensitivity during binocular viewing. Vision Research, 1982, 22, 1-16.
Cogan, A.L., Silverman, G., and Sekuler, R., Binocular summation in detection of contrast flashes. Perception and Psychophysics, 1982, 31, 330-338.
Cohen, M.A. and Grossberg, S., Neural dynamics of binocular form perception. Neuroscience Abstracts, 1983, 13, No. 353.8.
Cohen, M.A. and Grossberg, S., Some global properties of binocular resonances: Disparity matching, filling-in, and figure-ground synthesis. In P. Dodwell and T. Caelli (Eds.), Figural synthesis. Hillsdale, NJ: Erlbaum, 1984 (a).
Cohen, M.A. and Grossberg, S., Neural dynamics of brightness perception: Features, boundaries, diffusion, and resonance. Perception and Psychophysics, 1984, 36, 428-456 (b).
Coren, S., When “filling-in” fails. Behavioral and Brain Sciences, 1983, 6, 661-662.
Cornsweet, T.N., Visual perception. New York: Academic Press, 1970.
Curtis, D.W. and Rule, S.J., Fechner’s paradox reflects a nonmonotone relation between binocular brightness and luminance. Perception and Psychophysics, 1980, 27, 263-266.
Day, R.H., Neon color spreading, partially delineated borders, and the formation of illusory contours. Perception and Psychophysics, 1983, 34, 488-490.
DeValois, R.L. and DeValois, K.K., Neural coding of color. In E.C. Carterette and M.P. Friedman (Eds.), Handbook of perception: Seeing, Vol. 5. New York: Academic Press, 1975, pp.117-166.
Gellatly, A.R.H., Perception of an illusory triangle with masked inducing figure. Perception, 1980, 9, 599-602.
Gerrits, H.J.M., deHann, B., and Vendrick, A.J.H., Experiments with retinal stabilized images: Relations between the observations and neural data. Vision Research, 1966, 6, 427-440.
Gerrits, H.J.M. and Timmermann, J.G.M.E.N., The filling-in process in patients with retinal scotomata. Vision Research, 1969, 9, 439-442.
Gerrits, H.J.M. and Vendrick, A.J.H., Simultaneous contrast, filling-in process and information processing in man's visual system. Experimental Brain Research, 1970, 11, 411-430.
Glass, L. and Switkes, E., Pattern recognition in humans: Correlations which cannot be perceived. Perception, 1976, 5, 67-72.
Gouras, P. and Krüger, J., Responses of cells in foveal visual cortex of the monkey to pure color contrast. Journal of Neurophysiology, 1979, 42, 850-860.
Graham, N., The visual system does a crude Fourier analysis of patterns. In S. Grossberg (Ed.), Mathematical psychology and psychophysiology. Providence, RI: American Mathematical Society, 1981, pp.1-16.
Graham, N. and Nachmias, J., Detection of grating patterns containing two spatial frequencies: A test of single-channel and multiple-channel models. Vision Research, 1971, 11, 251-259.
Gregory, R.L., Eye and brain. New York: McGraw-Hill, 1966.
Grossberg, S., Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 1973, 52, 217-257.
Grossberg, S., Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction, and illusions. Biological Cybernetics, 1976, 23, 187-202.
Grossberg, S., A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans. In R. Rosen and F. Snell (Eds.), Progress in theoretical biology, Vol. 5. New York: Academic Press, 1978, pp.233-374.
Grossberg, S., How does a brain build a cognitive code? Psychological Review, 1980, 87, 1-51.
Grossberg, S., Adaptive resonance in development, perception, and cognition. In S. Grossberg (Ed.), Mathematical psychology and psychophysiology. Providence, RI: American Mathematical Society, 1981, pp.107-156.
Grossberg, S., Studies of mind and brain: Neural principles of learning, perception, development, cognition, and motor control. Boston: Reidel Press, 1982.
Grossberg, S., The quantized geometry of visual space: The coherent computation of depth, form, and lightness. Behavioral and Brain Sciences, 1983, 6, 625-692 (a).
Grossberg, S., Neural substrates of binocular form perception: Filtering, matching, diffusion, and resonance. In E. Basar, H. Flohr, H. Haken, and A.J. Mandell (Eds.), Synergetics of the brain. New York: Springer-Verlag, 1983 (b), pp.274-298.
Grossberg, S., Outline of a theory of brightness, color, and form perception. In E. Degreef and J. van Buggenhaut (Eds.), Trends in mathematical psychology. Amsterdam: North-Holland, 1984 (a), pp.59-86.
Grossberg, S., Some psychophysiological and pharmacological correlates of a developmental, cognitive, and motivational theory. In R. Karrer, J. Cohen, and P. Tueting (Eds.), Brain and information: Event related potentials. New York: New York Academy of Sciences, 1984 (b), pp.58-151.
Grossberg, S., The adaptive self-organization of serial order in behavior: Speech, language, and motor control. In E.C. Schwab and H.C. Nusbaum (Eds.), TITLE???. New York: Academic Press, 1985.
Grossberg, S., The role of learning in sensory-motor control. Behavioral and Brain Sciences, in press.
Grossberg, S. and Cohen, M., Dynamics of brightness and contour perception. Supplement to Investigative Ophthalmology and Visual Science, 1984, 25, 71.
Grossberg, S. and Kuperstein, M., Neural dynamics of adaptive sensory-motor control: Ballistic eye movements. Amsterdam: North-Holland, 1985.
Grossberg, S. and Mingolla, E., Neural dynamics of perceptual grouping: Textures, boundaries, and emergent segmentations. Perception and Psychophysics, 1985, 38, 141-171.
Hamada, J., Antagonistic and non-antagonistic processes in the lightness perception. Proceedings of the XXII international congress of psychology, Leipzig, July, 1980.
Heggelund, P., Receptive field organization of complex cells in cat striate cortex. Experimental Brain Research, 1981, 42, 99-107.
Heggelund, P. and Krekling, S., Edge dependent lightness distributions at different adaptation levels. Vision Research, 1976, 16, 493-496.
Helmholtz, H.L.F. von, Treatise on physiological optics, J.P.C. Southall (Translator and Editor). New York: Dover, 1962.
Hendrickson, A.E., Hunt, S.P., and Wu, J.-Y., Immunocytochemical localization of glutamic acid decarboxylase in monkey striate cortex. Nature, 1981, 292, 605-607.
Horton, J.C. and Hubel, D.H., Regular patchy distribution of cytochrome oxidase staining in primary visual cortex of macaque monkey. Nature, 1981, 292, 762-764.
Hubel, D.H. and Livingstone, M.S., Regions of poor orientation tuning coincide with patches of cytochrome oxidase staining in monkey striate cortex. Neuroscience Abstracts, 1981, 118.12.
Hubel, D.H. and Wiesel, T.N., Functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London (B), 1977, 198, 1-59.
Julesz, B., Foundations of cyclopean perception. Chicago: University of Chicago Press, 1971.
Kanizsa, G., Contours without gradients or cognitive contours? Italian Journal of Psychology, 1974, 1, 93-113.
Kanizsa, G., Subjective contours. Scientific American, 1976, 234, 48-64.
Kaufman, L., Sight and mind: An introduction to visual perception. New York: Oxford University Press, 1974.
Kennedy, J.M., Illusory contours and the ends of lines. Perception, 1978, 7, 605-607.
Kennedy, J.M., Subjective contours, contrast, and assimilation. In C.F. Nodine and D.F. Fisher (Eds.), Perception and pictorial representation. New York: Praeger Press, 1979.
Kennedy, J.M., Illusory brightness and the ends of petals: Changes in brightness without aid of stratification or assimilation effects. Perception, 1981, 10, 583-585.
Kennedy, J.M. and Ware, C., Illusory contours can arise in dot figures. Perception, 1978, 7, 191-194.
Krauskopf, J., Effect of retinal image stabilization on the appearance of heterochromatic targets. Journal of the Optical Society of America, 1963, 53, 741-744.
Kulikowski, J.J., Limit of single vision in stereopsis depends on contour sharpness. Nature, 1978, 275, 126-127.
Land, E.H., The retinex theory of color vision. Scientific American, 1977, 237, 108-128.
Leeper, R., A study of a neglected portion of the field of learning - the development of sensory organization. Journal of Genetic Psychology, 1935, 46, 41-75.
Legge, G.E. and Rubin, G.S., Binocular interactions in suprathreshold contrast perception. Perception and Psychophysics, 1981, 30, 49-61.
Levelt, W.J.M., On binocular rivalry. Soesterberg: Institute for Perception, 1965, RVO-TNO.
Livingstone, M.S. and Hubel, D.H., Thalamic inputs to cytochrome oxidase-rich regions in monkey visual cortex. Proceedings of the National Academy of Sciences, 1982, 79, 6098-6101.
Livingstone, M.S. and Hubel, D.H., Anatomy and physiology of a color system in the primate visual cortex. Journal of Neuroscience, 1984, 4, 309-356.
Mingolla, E. and Grossberg, S., Dynamics of contour completion: Illusory figures and neon color spreading. Supplement to Investigative Ophthalmology and Visual Science, 1984, 25, 71.
Mollon, J.D. and Sharpe, L.T. (Eds.), Colour vision. New York: Academic Press, 1983.
O’Brien, V., Contour perception, illusion, and reality. Journal of the Optical Society of America, 1958, 48, 112-119.
Parks, T.E., Subjective figures: Some unusual concomitant brightness effects. Perception, 1980, 9, 239-241.
Parks, T.E. and Marks, W., Sharp-edged versus diffuse illusory circles: The effects of varying luminance. Perception and Psychophysics, 1983, 33, 172-176.
Petry, S., Harbeck, A., Conway, J., and Levey, J., Stimulus determinants of brightness and distinctions of subjective contours. Perception and Psychophysics, 1983, 34, 169-174.
Prazdny, K., Illusory contours are not caused by simultaneous brightness contrast. Perception and Psychophysics, 1983, 34, 403-404.
Pritchard, R.M., Stabilized images on the retina. Scientific American, 1961, 204, 72-78.
Pritchard, R.M., Heron, W., and Hebb, D.O., Visual perception approached by the method of stabilized images. Canadian Journal of Psychology, 1960, 14, 67-77.
Rauschecker, J.P.J., Campbell, F.W., and Atkinson, J., Colour opponent neurones in the human visual system. Nature, 1973, 245, 42-45.
Redies, C. and Spillmann, L., The neon color effect in the Ehrenstein illusion. Perception, 1981, 10, 667-681.
Riggs, L.A., Ratliff, F., Cornsweet, J.C., and Cornsweet, T.N., The disappearance of steadily fixated visual test objects. Journal of the Optical Society of America, 1953, 43, 495-501.
Tanaka, M., Lee, B.B., and Creutzfeldt, O.D., Spectral tuning and contour representation in area 17 of the awake monkey. In J.D. Mollon and L.T. Sharpe (Eds.), Colour vision. New York: Academic Press, 1983, pp.269-276.
Todorović, D., Brightness perception and the Craik-O’Brien-Cornsweet effect. Unpublished M.A. Thesis. Storrs: University of Connecticut, 1983.
Usui, S., Mitarai, G., and Sakakibara, M., Discrete nonlinear reduction model for horizontal cell response in the carp retina. Vision Research, 1983, 23, 413-420.
Van den Brink, G. and Keemink, C.J., Luminance gradients and edge effects. Vision Research, 1976, 16, 155-159.
Van Tuijl, H.F.J.M., A new visual illusion: Neonlike color spreading and complementary color induction between subjective contours. Acta Psychologica, 1975, 39, 441-445.
Van Tuijl, H.F.J.M. and de Weert, C.M.M., Sensory conditions for the occurrence of the neon spreading illusion. Perception, 1979, 8, 211-215.
Van Tuijl, H.F.J.M. and Leeuwenberg, E.L.J., Neon color spreading and structural information measures. Perception and Psychophysics, 1979, 25, 269-284.
Von der Heydt, R., Peterhans, E., and Baumgartner, G., Illusory contours and cortical neuron responses. Science, 1984, 224, 1260-1262.
Ware, C., Coloured illusory triangles due to assimilation. Perception, 1980, 9, 103-107.
Yarbus, A.L., Eye movements and vision. New York: Plenum Press, 1967.
Zeki, S., Colour coding in the cerebral cortex: The reaction of cells in monkey visual cortex to wavelengths and colours. Neuroscience, 1983, 9, 741-765 (a).
Zeki, S., Colour coding in the cerebral cortex: The responses of wavelength-selective and colour coded cells in monkey visual cortex to changes in wavelength composition. Neuroscience, 1983, 9, 767-791 (b).
Chapter 3
NEURAL DYNAMICS OF PERCEPTUAL GROUPING: TEXTURES, BOUNDARIES, AND EMERGENT SEGMENTATIONS

Preface

This Chapter illustrates our belief that, once a mind-brain theory has probed sufficiently deeply, its further development may proceed in an evolutionary, rather than a revolutionary, way. Although the theory described in Chapter 2 was derived to deal with issues and data concerning boundary formation and featural filling-in, a modest refinement of the theory also deals with many other phenomena about perceptual grouping and textural segmentation. Several of our analyses of these grouping phenomena are contained in this Chapter. Thus unlike traditional artificial intelligence models, each of whose steps forward requires another clever trick by a programmer in an endless series of tricks that never seem to add up to a theory, in the present type of analysis, once a theory has been derived, its emergent properties continue to teach us surprising new things.

In addition to its competence in textural grouping, the present theory is also competent to provide insights into surface perception, including shape-from-shading. These applications make critical use of our revolutionary claim that all boundaries are invisible, and that they gain visibility by supporting filled-in featural contrast differences within the compartments which boundaries form within the Feature Contour System.

The present article documents our belief that the hypercolumns in visual cortex should not be viewed as part of an orientation system, as many visual neurophysiologists are wont to do. We argue, instead, that the hypercolumns form part of a boundary completion and segmentation system. This is not a minor change in emphasis, because many emergent boundaries span regions of a scenic image which do not contain any oriented contrasts whatsoever.

A number of popular models of visual sharpening and recognition, including the Boltzmann machine and various associative learning machines, assume the existence of a cost function which the system acts to minimize. In contrast, we believe that many neural systems do not attempt to minimize a cost function. (See, however, Chapter 5 of Volume I.) Instead, a circuit like the CC Loop spontaneously discovers a coherent segmentation of a scene by closing its own internal cooperative-competitive feedback loops. A circuit like an ART machine discovers and manipulates for itself those “costs” which are appropriate to a particular input environment in the form of its top-down templates, or critical feature patterns (Volume I and Chapters 6 and 7). Although models which utilize explicit cost functions may have useful applications in technology, as models of brain processes, we consider them to be an inappropriate application of 19th century linear physical Hamiltonian thinking. We advocate instead the use of 20th century nonlinear biological dissipative systems, derived and developed on their own terms as a direct expression of a truly biological intuition.
Perception and Psychophysics 38, 141-171 (1985)
© 1985 The Psychonomic Society, Inc.
Reprinted by permission of the publisher
NEURAL DYNAMICS OF PERCEPTUAL GROUPING: TEXTURES, BOUNDARIES, AND EMERGENT SEGMENTATIONS
Stephen Grossberg† and Ennio Mingolla‡
Abstract
A real-time visual processing theory is used to analyse and explain a wide variety of perceptual grouping and segmentation phenomena, including the grouping of textured images, randomly defined images, and images built up from periodic scenic elements. The theory explains how “local” feature processing and “emergent” features work together to segment a scene, how segmentations may arise across image regions which do not contain any luminance differences, how segmentations may override local image properties in favor of global statistical factors, and why segmentations that powerfully influence object recognition may be barely visible or totally invisible. Network interactions within a Boundary Contour System (BCS), a Feature Contour System (FCS), and an Object Recognition System (ORS) are used to explain these phenomena. The BCS is defined by a hierarchy of orientationally tuned interactions, which can be divided into two successive subsystems, called the OC Filter and the CC Loop. The OC Filter contains two successive stages of oriented receptive fields which are sensitive to different properties of image contrasts. The OC Filter generates inputs to the CC Loop, which contains successive stages of spatially short-range competitive interactions and spatially long-range cooperative interactions. Feedback between the competitive and cooperative stages synthesizes a global context-sensitive segmentation from among the many possible groupings of local featural elements. The properties of the BCS provide a unified explanation of several ostensibly different Gestalt rules. The BCS also suggests explanations and predictions concerning the architecture of the striate and prestriate visual cortices. The BCS embodies new ideas concerning the foundations of geometry, on-line statistical decision theory, and the resolution of uncertainty in quantum measurement systems. Computer simulations establish the formal competence of the BCS as a perceptual grouping system.
The properties of the BCS are compared with probabilistic and artificial intelligence models of segmentation. The total network suggests a new approach to the design of computer vision systems, and promises to provide a universal set of rules for perceptual grouping of scenic edges, textures, and smoothly shaded regions.
† Supported in part by the Air Force Office of Scientific Research (AFOSR 85-0149) and the Army Research Office (DAAG-29-85-K-0095).
‡ Supported in part by the Air Force Office of Scientific Research (AFOSR 85-0149).
1. Introduction: Towards A Universal Set of Rules for Perceptual Grouping
The visual system segments optical input into regions that are separated by perceived contours or boundaries. This rapid, seemingly automatic, early step in visual processing is difficult to characterize, largely because many perceived contours have no obvious correlates in the optical input. A contour in a pattern of luminances is generally defined as a spatial discontinuity in luminance. While usually sufficient, however, such discontinuities are by no means necessary for sustaining perceived contours. Regions separated by visual contours also occur in the presence of: statistical differences in textural qualities such as orientation, shape, density, or color (Beck, 1966a, 1966b, 1972, 1982, 1983; Beck, Prazdny, and Rosenfeld, 1983), binocular matching of elements of differing disparities (Julesz, 1960), accretion and deletion of texture elements in moving displays (Kaplan, 1969), and in classical “subjective contours” (Kanizsa, 1955). The extent to which the types of perceived contours just named involve the same visual processes as those triggered by luminance contours is not obvious, although the former are certainly as perceptually real and generally as vivid as the latter.

Perceptual contours arising at boundaries of regions with differing statistical distributions of featural qualities have been studied in great detail (Beck, 1966a, 1966b, 1972, 1982, 1983; Beck, Prazdny, and Rosenfeld, 1983; Caelli, 1982, 1983; Caelli and Julesz, 1979). Two findings of this research are especially salient. First, the visual system’s segmentation of the scenic input occurs rapidly throughout all regions of that input, in a manner often described as “preattentive.” That is, subjects generally describe boundaries in a consistent manner when exposure times are short (under 200 msec) and without prior knowledge of the regions in a display at which boundaries are likely to occur. Thus any theoretical account of boundary extraction for such displays must explain how early “data driven” processes rapidly converge on boundaries wherever they occur.

The second finding of the experimental work on textures complicates the implications of the first, however: the textural segmentation process is exquisitely context-sensitive. That is, a given texture element at a given location can be part of a variety of larger groupings, depending on what surrounds it. Indeed, the precise determination even of what acts as an element at a given location can depend on patterns at nearby locations. One of the greatest sources of difficulty in understanding visual perception and in designing fast object recognition systems is such context-sensitivity of perceptual units.

Since the work of the Gestaltists (Wertheimer, 1923), it has been widely recognized that local features of a scene, such as edge positions, disparities, lengths, orientations, and contrasts, are perceptually ambiguous, but that combinations of these features can be quickly grouped by a perceiver to generate a clear separation between figures, and between figure and ground. Indeed, a figure within a textured scene often seems to “pop out” from the ground (Neisser, 1967). The “emergent” features by which an observer perceptually groups the “local” features within a scene are sensitive to the global structuring of textural elements within the scene. The fact that these emergent perceptual units, rather than local features, are used to group a scene carries with it the possibility of scientific chaos. If every scene can define its own context-sensitive units, then perhaps object perception can only be described in terms of an unwieldy taxonomy of scenes and their unique perceptual units.

One of the great accomplishments of the Gestaltists was to suggest a short list of rules for perceptual grouping that helped to organize many interesting examples. As is often the case in pioneering work, the rules were neither always obeyed nor exhaustive. No justification for the rules was given other than their evident plausibility. More seriously for practical applications, no effective computational algorithms were given to instantiate the rules.

Many workers since the Gestaltists have made important progress in advancing our understanding of perceptual grouping processes. For example, Sperling (1970), Julesz
(1971), and Dev (1975) introduced algorithms for using disparity cues to coherently separate figure from ground in random dot stereograms. Later workers such as Marr and Poggio (1976) have studied similar algorithms. Caelli (1982, 1983) has emphasized the importance of the conjoint action of orientation and spatial frequency tuning in the filtering operations that preprocess textured images. Caelli and Dodwell (1982), Dodwell (1983), and Hoffman (1970) have recommended the use of Lie group vector fields as a tool for grouping together orientational cues across perceptual space. Caelli and Julesz (1979) have presented evidence that “first order statistics of textons” are used to group textural elements. The term “textons” designates the features that are to be statistically grouped. This view supports a large body of work by Beck and his colleagues (Beck, 1966a, 1966b, 1972, 1982, 1983; Beck, Prazdny, and Rosenfeld, 1983), who have introduced a remarkable collection of ingenious textural displays that they have used to determine some of the factors that control textural grouping properties.

The collective effect of these and other contributions has been to provide a sophisticated experimental literature about textural grouping that has identified the main properties that need to be considered. What has not been achieved is a deep analysis of the design principles and mechanisms that lie behind the properties of perceptual grouping. Expressed in another way, what is missing is the raison d’être for textural grouping and a computational framework that dynamically explains how textural elements are grouped in real-time into easily separated figures and ground.

One manifestation of this gap in contemporary understanding can be found in the image processing models that have been developed by workers in artificial intelligence. In this approach, curves are analysed using different models from those that are used to analyse textures, and textures are analysed using different models from the ones used to analyse surfaces (Horn, 1977; Marr and Hildreth, 1980). All of these models are built up using geometrical ideas (such as surface normal, curvature, and Laplacian) that were used to study visual perception during the nineteenth century (Ratliff, 1965). These geometrical ideas were originally developed to analyse local properties of physical processes. By contrast, the visual system’s context-sensitive mechanisms routinely synthesize figural percepts that are not reducible to local luminance differences within a scenic image. Such emergent properties are not just the effect of local geometrical transformations.

Our recent work suggests that nineteenth century geometrical ideas are fundamentally inadequate to characterize the designs that make biological visual systems so efficient (Carpenter and Grossberg, 1981, 1983; Cohen and Grossberg, 1984a, 1984b; Grossberg, 1983a, 1983b, 1984a, 1985; Grossberg and Mingolla, 1985, 1986). This claim arises from the discovery of new mechanisms that are not designed to compute local geometrical properties of a scenic image. These mechanisms are defined by parallel and hierarchical interactions within very large networks of interacting neurons. The visual properties that these equations compute emerge from network interactions, rather than from local transformations. A surprising consequence of our analysis is that the same mechanisms which are needed to achieve a biologically relevant understanding of how scenic edges are internally represented also respond intelligently to textured images, smoothly shaded images, and combinations thereof. These new designs thus promise to provide a universal set of rules for the pre-attentive perceptual grouping processes that feed into depthful form percept and object recognition processes.

The complete development of these designs will require a major scientific effort. The present article makes two steps in that direction. The first goal of the article is to indicate how these new designs render transparent properties of perceptual grouping which previously were effectively manipulated by a small number of scientists, notably Jacob Beck. A primary goal of this article is thus to provide a dynamical explanation of recent textural displays from the Beck school. Beck and his colleagues have gone far in determining which aspects of textures tend to group and under what conditions. Our
work sheds light on how such segmentation may be implemented by the visual system. The results of Glass and Switkes (1976) on grouping of statistically defined percepts and of Gregory and Heard (1979) on border locking during the café wall illusion will also be analysed using the same ideas.

The second goal of the article is to report computer simulations that illustrate the theory’s formal competence for generating perceptual groupings that strikingly resemble human grouping properties.

Our theory first introduced the distinction between the Boundary Contour System and the Feature Contour System to deal with paradoxical data concerning brightness, color, and form perception. These two systems extract two different types of contour-sensitive information, called Boundary Contour signals and Feature Contour signals, at an early processing stage. The Boundary Contour signals are transformed through successive processing stages within the Boundary Contour System into coherent boundary structures. These boundary structures give rise to topographically organized output signals to the Feature Contour System (Figure 1). Feature Contour signals are sensitive to luminance and hue differences within a scenic image. These signals activate the same processing stage within the Feature Contour System that receives boundary signals from the Boundary Contour System. The feature contour signals here initiate the filling-in processes whereby brightnesses and colors spread until they either hit their first Boundary Contour or are attenuated by their spatial spread.

While earlier work examined the role of the Boundary Contour System in the synthesis of individual contours, whether “real” or “illusory,” its rules also account for much of the segmentation of textured scenes into grouped regions separated by perceived contours. Accordingly, Sections 2-9 of this paper review the main points of the theory with respect to their implications for perceptual grouping.
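As a rough illustration of the filling-in rule just described, in which feature signals spread until they meet a Boundary Contour or fade with distance, the following Python sketch runs a one-dimensional diffusion whose cell-to-cell coupling is shut down wherever a boundary signal is present. It is a minimal sketch under stated assumptions and not the model's own equations, which are given in the Appendix: the function name `fill_in`, the parameters `decay` and `gate`, the permeability formula, and the example display are all hypothetical choices made for illustration.

```python
import numpy as np

def fill_in(feature_input, boundary, steps=20000, dt=0.2, decay=0.002, gate=1e4):
    """Toy 1-D filling-in (hypothetical stand-in for the model's diffusion).

    feature_input : input to each of n cells (feature contour signals)
    boundary      : n-1 boundary signals, one per junction between cells i and i+1
    """
    x = np.zeros(len(feature_input))
    # Coupling across each junction; a boundary signal drives it toward zero.
    permeability = 1.0 / (1.0 + gate * boundary)
    for _ in range(steps):
        # Flow across junction i, positive when cell i+1 is more active than cell i.
        flow = permeability * (x[1:] - x[:-1])
        dx = -decay * x + feature_input   # passive decay plus input
        dx[:-1] += flow                   # cell i gains what crosses junction i
        dx[1:] -= flow                    # cell i+1 loses the same amount
        x += dt * dx                      # explicit Euler step
    return x

n = 40
feature_input = np.zeros(n)
feature_input[[10, 29]] = 1.0   # feature signals just inside two edges
boundary = np.zeros(n - 1)
boundary[[9, 29]] = 1.0         # boundaries between cells 9|10 and 29|30

bounded = fill_in(feature_input, boundary)
unbounded = fill_in(feature_input, np.zeros(n - 1))

print("inside, with boundaries: ", bounded[10:30].min(), bounded[10:30].max())
print("outside, with boundaries:", bounded[:10].max(), bounded[30:].max())
print("outside, no boundaries:  ", unbounded[:10].max(), unbounded[30:].max())
```

With the boundary signals in place, activity stays largely confined to the cells between the two boundaries and is nearly uniform there. With the boundaries removed, the same two inputs spread across the whole array, attenuated only by the decay term.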
Sections 10-15 and 17-19 then examine in detail the major issues in grouping research to date and describe our solutions qualitatively. Section 16 presents computer simulations showing how our model synthesizes context-sensitive perceptual groupings. The model is described in more mechanistic detail in Section 20. Mathematical equations of the model are contained in the Appendix.

2. The Role of Illusory Contours

One of the main themes in our discussion is the role of illusory contours in perceptual grouping processes. Our results make precise the sense in which percepts of “illusory contours” (or contour percepts that do not correspond to one-dimensional luminance differences in a scenic image) and percepts of “real contours” are both synthesized by the same mechanisms. This discussion clarifies why, despite the visual system’s manifestly adaptive design, illusory contours are so abundant in visual percepts. We also suggest how illusory contours that are at best marginally visible can have powerful effects on perceptual grouping and object recognition processes.

Some of the new designs of our theory can be motivated by contrasting the noisy visual signals that reach the retina with the coherence of conscious visual percepts. In humans, for example, light passes through a thicket of retinal veins before it reaches retinal photoreceptors. The percepts of human observers are fortunately not distorted by their retinal veins during normal vision. This is due, in part, to the action of mechanisms which attenuate the perception of images that are stabilized with respect to the retina as the eye jiggles in its orbit with respect to the outside world. Suppressing the percept of the stabilized veins does not, in itself, complete the percept of retinal images that are occluded and segmented by the veins. Boundaries need to be completed and colors and brightnesses filled-in in order to compensate for the image degradation that is caused by the retinal veins. A similar discussion follows from a consideration of why human observers do not typically notice their blind spots (Kawabata, 1984).

Observers are not able to distinguish which parts of such a completed percept are derived directly from retinal signals and which parts are due to boundary completion and featural filling-in. The completed and filled-in percepts are called, in the usual jargon,
Figure 1. A macrocircuit of processing stages: Monocular preprocessed signals (MP) are sent independently to both the Boundary Contour System (BCS) and the Feature Contour System (FCS). The BCS pre-attentively generates coherent boundary structures from these MP signals. These structures send outputs to both the FCS and the Object Recognition System (ORS). The ORS, in turn, rapidly sends top-down learned template signals to the BCS. These template signals can modify the pre-attentively completed boundary structures using learned information. The BCS passes these modifications along to the FCS. The signals from the BCS organize the FCS into perceptual regions wherein filling-in of visible brightnesses and colors can occur. This filling-in process is activated by signals from the MP stage.
“illusory” figures. These examples suggest that both “real” and “illusory” figures are generated by the same perceptual mechanisms, and suggest why “illusory” figures are so important in perceptual grouping processes. Once this is understood, the need for a perceptual theory that treats “real” and “illusory” percepts on an equal footing also becomes apparent.

A central issue in such a theory concerns whether boundary completion and featural filling-in are the same or distinct processes. One of our theory’s primary contributions is to show that these processes are different by characterizing the different processing rules that they obey. At our present stage of understanding, many perceptual phenomena can be used to make this point. We find the following three phenomena to be particularly useful: the Land (1977) color and brightness experiments; the Yarbus (1967) stabilized image experiments; and the reverse-contrast Kanizsa square (Grossberg and Mingolla, 1985).

3. Discounting the Illuminant: Color Edges and Featural Filling-In

The visual world is typically viewed under inhomogeneous lighting conditions. The scenic luminances that reach the retina thus confound fluctuating lighting conditions with invariant object colors and lightnesses. Helmholtz (1962) already knew that the brain somehow “discounts the illuminant” to generate color and lightness percepts that are more veridical than those in the retinal image. Land (1977) has clarified this process in a series of striking experiments wherein color percepts within a picture constructed from overlapping patches of colored paper are determined under a variety of lighting conditions. These experiments show that color signals corresponding to the interior of each patch are suppressed. The chromatic contrasts across the edges between adjacent patches are used to generate the final percept.
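A small numerical sketch may help show why contrasts measured at edges are less contaminated by illumination than signals from patch interiors. The Python example below rests on assumptions that are not part of the experiments themselves: a one-dimensional display of three patches, a linearly varying illuminant, a single achromatic channel, and a luminance ratio across each edge standing in for the relative contrast registered by the visual system. The names `reflectance`, `illuminant`, and `edge_ratio` are hypothetical.

```python
import numpy as np

# Hypothetical planar display: three paper patches with fixed reflectances,
# lit by an illuminant that varies smoothly (here linearly) across the display.
n = 300
position = np.arange(n)
reflectance = np.where(position < 100, 0.2, np.where(position < 200, 0.8, 0.4))
illuminant = np.linspace(1.0, 3.0, n)
luminance = reflectance * illuminant   # the signal that reaches the retina

def edge_ratio(signal, edge):
    """Ratio of the signal just right of an edge to the signal just left of it."""
    return signal[edge] / signal[edge - 1]

for edge in (100, 200):
    print("edge at", edge,
          "| luminance ratio:", edge_ratio(luminance, edge),
          "| reflectance ratio:", edge_ratio(reflectance, edge))

# Variation of luminance inside the first patch, where reflectance is constant.
first_patch = luminance[:100]
print("within-patch luminance max/min:", first_patch.max() / first_patch.min())
```

Under these assumptions the ratio measured across each edge stays within about one percent of the true reflectance ratio, because the illuminant barely changes over the width of an edge, whereas the luminance within the first patch alone rises by more than half from one side to the other.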
It is easy to see how such a scheme “discounts the illuminant.” Large differences in illumination can exist within any patch. On the other hand, differences in illumination are small across an edge on such a planar display. Hence the relative chromatic contrasts across edges, assumed to be registered by Black-White, Red-Green, and Blue-Yellow double opponent systems, are good estimates of the object reflectances near the edge. Just as suppressing the percept of stabilized veins is insufficient to generate an adequate percept, so too is discounting the illuminant within each color patch. Without further processing, we could at best perceive a world of colored edges. Featural filling-in is needed to recover estimates of brightness and color within the interior of each patch. Thus extraction of color edges and featural filling-in are both necessary in order to perceive a color field or a continuously shaded surface.
4. Featural Filling-In Over Stabilized Scenic Edges
Many images can be used to firmly establish that a featural filling-in process exists. The recent thesis of Todorović (1983) provides a nice set of examples that one can construct with modest computer graphics equipment. Vivid classical examples of featural filling-in were discovered by artificially stabilizing certain image contours of a scene (Krauskopf, 1963; Yarbus, 1967). Consider, for example, the image schematized in Figure 2. After the edges of the large circle and the vertical line are stabilized on the retina, the red color (dots) outside the large circle fills-in the black and white hemi-discs except within the small red circles whose edges are not stabilized (Yarbus, 1967). The red inside the left circle looks brighter and the red inside the right circle looks darker than the uniform red that envelops the remainder of the percept. When the Land (1977) and Yarbus (1967) experiments are considered side-by-side, one can recognize that the brain extracts two different types of contour information from scenic images. Feature Contours, including “color edges,” give rise to the signals which generate visible brightness and color percepts at a later processing stage. Feature Contours encode this information as a contour-sensitive process in order to discount
Figure 2. A classical example of featural filling-in: When the edges of the large circle and the vertical line are stabilized on the retina, the red color (dots) outside the large circle envelops the black and white hemi-discs except within the small red circles whose edges are not stabilized (Yarbus, 1967). The red inside the left circle looks brighter and the red inside the right circle looks darker than the enveloping red.
the illuminant. Boundary Contours are extracted in order to define the perceptual boundaries, groupings or forms within which featural estimates derived from the Feature Contours can fill-in at a later processing stage. In the Yarbus (1967) experiments, once a stabilized scenic edge can no longer generate a Boundary Contour, featural signals can flow across the locations corresponding to the stabilized scenic edge until they reach the next Boundary Contour. The phenomenon of neon color spreading also illustrates the dissociation of Boundary Contour and Feature Contour processing (Ejima, Redies, Takahashi, and Akita, 1984; Redies and Spillmann, 1981; Redies, Spillmann, and Kunz, 1984; van Tuijl, 1975; van Tuijl and de Weert, 1979; van Tuijl and Leeuwenberg, 1979). An explanation of neon color spreading is suggested in Grossberg (1984a) and Grossberg and Mingolla (1985).
5. Different Rules for Boundary Contours and Feature Contours
Some of the rules that distinguish the Boundary Contour System from the Feature Contour System can be inferred from the percept generated by the reverse contrast Kanizsa square image in Figure 3 (Cohen and Grossberg, 1984b; Grossberg and Mingolla, 1985). Prazdny (1983, 1985) and Shapley and Gordon (1985) have also used reverse contrast images in their discussions of form perception. Consider the vertical boundaries in the perceived Kanizsa square. In this percept, a vertical boundary connects a pair of vertical scenic edges with opposite direction-of-contrast. In other words: The black pac-man figure causes a dark-light vertical edge with respect to the grey background. The white pac-man figure causes a light-dark vertical edge with respect to the grey background. The process of boundary completion whereby a Boundary Contour is synthesized between these inducing stimuli is thus indifferent to direction-of-contrast. The boundary completion process is, however, sensitive to the orientation and amount of contrast of the inducing stimuli. The Feature Contours extracted from a scene are, by contrast, exquisitely sensitive to direction-of-contrast. Were this not the case, we could never tell the difference between a dark-light and a light-dark percept. We would be blind. Another difference between Boundary Contour and Feature Contour rules can be inferred from Figures 2 and 3. In Figure 3, a boundary forms inward in an oriented way between a pair of inducing scenic edges. In Figure 2, featural filling-in is due to an outward and unoriented spreading of featural quality from individual Feature Contour signals that continues until the spreading signals either hit a Boundary Contour or are attenuated by their own spatial spread (Figure 4). The remainder of the article develops these and deeper properties of the Boundary Contour System to explain segmentation data. Certain crucial points may profitably be emphasized now.
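The polarity rule of this section can be illustrated with a minimal sketch. It assumes, purely for illustration, that a local contrast is a signed difference of neighboring luminance samples, that a Feature Contour signal keeps the sign, and that a Boundary Contour signal keeps only the magnitude; the function names and the use of an absolute value are assumptions, not the mask equations of the Appendix.

```python
# Illustrative assumption: contrast is a signed difference of adjacent samples.

def local_contrast(luminance, i):
    """Signed contrast between sample i+1 and sample i (positive = dark-to-light)."""
    return luminance[i + 1] - luminance[i]


def feature_contour_signal(luminance, i):
    """Keeps the sign, so dark-light and light-dark edges remain distinguishable."""
    return local_contrast(luminance, i)


def boundary_contour_signal(luminance, i):
    """Discards the sign but keeps the amount of contrast."""
    return abs(local_contrast(luminance, i))


grey, black, white = 0.5, 0.0, 1.0
black_on_grey = [black, black, grey, grey]   # dark-light edge at i = 1
white_on_grey = [white, white, grey, grey]   # light-dark edge at i = 1

for name, row in (("black inducer", black_on_grey), ("white inducer", white_on_grey)):
    print(name, feature_contour_signal(row, 1), boundary_contour_signal(row, 1))
```

Under these assumptions the two inducers of a reverse contrast display yield equal boundary signals, which could therefore be joined, while their feature signals have opposite signs.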
Boundaries may emerge corresponding to image regions in which no contrast differences whatsoever exist. The Boundary Contour System is sensitive to statistical differences in the distribution of scenic elements, not merely to individual image contrasts. In particular, the oriented receptive fields, or masks, which initiate boundary processing are not edge detectors; rather, they are local contrast detectors which can respond to statistical differences in the spatial distribution of image contrasts, including but not restricted to edges. These receptive fields are organized into multiple subsystems, such that the oriented receptive fields within each subsystem are sensitive to oriented contrasts over spatial domains of different sizes. These subsystems can therefore respond differently to spatial frequency information within the scenic image. Since all these oriented receptive fields are also sensitive to amount of contrast, the Boundary Contour System registers statistical differences in luminance, orientation, and spatial frequency even at its earliest stages of processing. Later stages of Boundary Contour System processing are also sensitive to these factors, but in a different way. Their inputs from earlier stages are already sensitive to these factors. They then actively transform these inputs using competitive-cooperative feedback interactions. The Boundary Contour System may hereby process statistical
Figure 3. A reverse contrast Kanizsa square: An illusory square is induced by two black and two white pac-man figures on a grey background. Illusory contours can thus join edges with opposite directions-of-contrast. (This effect may be weakened by the photographic reproduction process.)
differences in luminance, orientation, and spatial frequency within a scenic image in multiple ways. We wish also to dispel misconceptions that a comparison between the names Boundary Contour System and Feature Contour System may engender. As indicated above, the Boundary Contour System does generate perceptual boundaries, but neither the data nor our theory permit the conclusion that these boundaries must coincide with the edges in scenic images. The Feature Contour System does lead to visible percepts, such as organized brightness and color differences, and such percepts contain the elements that are often called features. On the other hand, both the Boundary Contour System and the Feature Contour System contain “feature detectors” which are sensitive to luminance or hue differences within scenic images. Although both systems contain “feature detectors,” these detectors are used within the Boundary Contour System to generate boundaries, not visible “features.” In fact, within the Boundary Contour System, all boundaries are perceptually invisible. Boundary Contours do, however, contribute to visible percepts, but only indirectly. All visible percepts arise within the Feature Contour System. Completed Boundary Contours help to generate visible percepts within the Feature Contour System by defining the perceptual regions within which activations due to Feature Contour signals can fill-in. Our names for these two systems emphasize that conventional usage of the terms
[Figure 4 diagram labels: Boundary Contour signals; Feature Contour signals]
Figure 4. A monocular brightness and color stage domain within the Feature Contour System: Monocular Feature Contour signals activate cell compartments which permit rapid lateral diffusion of activity, or potential, across their compartment boundaries, except at those compartment boundaries which receive Boundary Contour signals from the BCS. Consequently the Feature Contour signals are smoothed except at boundaries that are completed within the BCS stage.
boundary and feature needs modification to explain data about form and color perception. Our usage of these important terms captures the spirit of their conventional meaning, but also refines this meaning to be consistent within a mechanistic analysis of the interactions leading to form and color percepts.
6. Boundary-Feature Trade-off: Every Line End Is Illusory
The rules obeyed by the Boundary Contour System can be fully understood only by considering how they interact with the rules of the Feature Contour System. Each contour system is designed to offset insufficiencies of the other. The most paradoxical properties of the Boundary Contour System can be traced to its role in defining the perceptual domains that restrict featural filling-in. These also turn out to be the properties that are most important in the regulation of perceptual grouping. The inability of previous perceptual theories to provide a transparent analysis of perceptual grouping can be traced to the fact that they did not clearly distinguish Boundary Contours from Feature Contours; hence they could not adequately understand the rules whereby Boundary Contours generate perceptual groupings to define perceptual domains adequate to contain featural filling-in.
When one frontally assaults the problem of designing Boundary Contours to contain featural filling-in, one is led to many remarkable conclusions. One conclusion is that the end of every line is an “illusory” contour. We now summarize what we mean by this assertion. An early stage of Boundary Contour processing needs to determine the orientations in which scenic edges are pointing. This is accomplished by elongated receptive fields, or orientationally tuned input masks (Hubel and Wiesel, 1977). Elongated receptive fields are, however, insensitive to orientation at the ends of thin lines and at object corners (Grossberg and Mingolla, 1985). This breakdown is illustrated by the computer simulation summarized in Figure 5a, which depicts the reaction of a lattice of orientationally tuned cells to a thin vertical line. Figure 5a shows that in order to achieve some measure of orientational certainty along scenic edges, the cells sacrifice their ability to determine either position or orientation at the end of a line. In other words, Figure 5a summarizes the effects of an “uncertainty principle” whereby “orientational certainty” along scenic edges implies “positional uncertainty” at line ends and corners. Stated in a vacuum, this breakdown does not seem to be particularly interesting. Stated in the shadow of the featural filling-in process, it has momentous implications. Without further processing that is capable of compensating for this breakdown, the Boundary Contour System could not generate boundaries corresponding to scenic line ends and corners. Consequently, within the Feature Contour System, boundary signals would not exist at positions corresponding to line ends (Figure 6). The Feature Contour signals generated by the interior of each line could then initiate spreading of featural quality to perceptual regions beyond the location of the line end. In short, the failure of boundary detection at line ends could enable colors to flow out of every line end!
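The consequence of a missing boundary at a line end can be sketched with a simple simulation of the compartmental diffusion described in the caption of Figure 4. The sketch below is an assumption-laden stand-in, not the filling-in equations of the theory: it uses a hypothetical one-dimensional row of compartments, a passive decay term, and a permeability between neighbors that is divided down by any Boundary Contour signal on the shared wall. The function name `fill_in` and all parameter values are illustrative.

```python
# Hypothetical 1D filling-in: diffusion between neighboring compartments,
# gated by boundary signals on the walls between them (illustrative only).

def fill_in(feature_input, boundary, decay=0.05, gain=1000.0, dt=0.1, steps=5000):
    """Integrate dS_i/dt = -decay*S_i + sum_j P_ij*(S_j - S_i) + F_i by Euler steps.

    boundary[i] is the boundary signal on the wall between cells i and i+1;
    the wall permeability is 1 / (1 + gain * boundary[i]).
    """
    n = len(feature_input)
    perm = [1.0 / (1.0 + gain * b) for b in boundary]
    s = [0.0] * n
    for _ in range(steps):
        new = s[:]
        for i in range(n):
            flow = 0.0
            if i > 0:
                flow += perm[i - 1] * (s[i - 1] - s[i])
            if i < n - 1:
                flow += perm[i] * (s[i + 1] - s[i])
            new[i] = s[i] + dt * (-decay * s[i] + flow + feature_input[i])
        s = new
    return s


n = 12
# Feature Contour input from the interior of a "line" occupying cells 2..5.
feature_input = [1.0 if 2 <= i <= 5 else 0.0 for i in range(n)]

closed = [0.0] * (n - 1)
closed[1] = 1.0            # boundary at one end of the line
closed[5] = 1.0            # boundary (end cut) at the other end
open_end = list(closed)
open_end[5] = 0.0          # boundary detection fails at this line end

leak_closed = sum(fill_in(feature_input, closed)[6:])
leak_open = sum(fill_in(feature_input, open_end)[6:])
print(leak_closed, leak_open)
```

With these assumed parameters, activity beyond the line end stays small when the end is capped by a boundary signal and becomes much larger when that signal is absent.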
In order to prevent this perceptual catastrophe, orientational tuning, just like discounting the illuminant, must be followed by a hierarchy of compensatory processing stages in order to gain full effectiveness. To offset this breakdown under normal circumstances, we have hypothesized that outputs from the cells with oriented receptive fields input to two successive stages of competitive interaction (Grossberg, 1984a; Grossberg and Mingolla, 1985), which are described in greater detail in Section 20 and the Appendix. These stages are designed to compensate for orientational insensitivity at the ends of lines and corners. Figure 5b shows how these competitive interactions generate horizontal Boundary Contour signals at the end of a vertical line. These “illusory” Boundary Contours help to prevent the flow of featural contrast from the line end. Such horizontal Boundary Contours induced by a vertical line end are said to be generated by end cutting, or orthogonal induction. The circle illusion that is perceived by glancing at Figure 7 can now be understood. The Boundary Contour end cuts at the line ends can cooperate with other end cuts of similar orientation that are approximately aligned across perceptual space, just as Boundary Contours do to generate the percept of a Kanizsa square in Figure 3. These Boundary Contours group “illusory” figures for the same reason that they complete figures across retinal veins and blind spots. Within the Boundary Contour System, both “real” and “illusory” contours are generated by the same dynamical laws.
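End cutting can be sketched in a few lines under strong simplifying assumptions. The equations of the two competitive stages are given in Section 20 and the Appendix, which are not reproduced here, so the sketch below substitutes a hypothetical form: a first stage in which cells of the same orientation at neighboring positions inhibit one another against a tonic baseline, followed by a second stage in which perpendicular orientations at the same position compete. The function `end_cut`, the tonic level, and the surround weight are all illustrative assumptions.

```python
# Hypothetical two-stage competition along the axis of a thin vertical line.
# Position index increases toward and past the line end.

def end_cut(vertical_in, horizontal_in, tonic=1.0, surround=0.3):
    """Return (vertical, horizontal) outputs of the second competitive stage."""
    n = len(vertical_in)

    def first_stage(mask_out):
        # Like-oriented cells at adjacent positions inhibit each other;
        # a tonic term keeps unstimulated cells at a nonzero baseline.
        result = []
        for p in range(n):
            neighbours = sum(mask_out[q] for q in (p - 1, p + 1) if 0 <= q < n)
            result.append(max(tonic + mask_out[p] - surround * neighbours, 0.0))
        return result

    w_v = first_stage(vertical_in)
    w_h = first_stage(horizontal_in)
    # Second stage: perpendicular orientations compete at each position.
    y_v = [max(v - h, 0.0) for v, h in zip(w_v, w_h)]
    y_h = [max(h - v, 0.0) for v, h in zip(w_v, w_h)]
    return y_v, y_h


# Strong vertical mask responses along the line body, a weak and ambiguous
# response at the line end (index 4), and nothing beyond it.
vertical_in = [1.0, 1.0, 1.0, 1.0, 0.2, 0.0, 0.0, 0.0]
horizontal_in = [0.0] * 8

y_v, y_h = end_cut(vertical_in, horizontal_in)
print(y_v)
print(y_h)
```

In this sketch the vertical cells along the line body remain dominant, while at the line end the vertical cells are pushed below baseline by their strongly activated neighbors, so the perpendicular (horizontal) cells win the second competition there.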
7. Parallel Induction by Edges versus Perpendicular Induction by Line Ends
Knowing the directions in which Boundary Contours will form is obviously essential to understanding perceptual grouping. Why does a boundary form parallel to the inducing edges in Figure 3 but perpendicular to the line ends in Figure 7? This is clearly a question about spatial scale, since thickening a line until its end becomes an edge will cause induction to switch from being perpendicular to the line to being parallel to the edge. An answer to this question can be seen by inspecting Figure 5. In Figure 5a, strong vertical reactions occur in response to the long vertical edge of the line. Figure 5b shows
[Figure 5a panel title: Output of oriented masks]
Figure 5a. An orientation field: Lengths and orientations of lines encode the relative sizes of the activations and orientations of the input masks at the corresponding positions. The input pattern, which is a vertical line end as seen by the receptive fields, corresponds to the shaded area. Each mask has total exterior dimensions of 16 x 8 units, with a unit length being the distance between two adjacent lattice positions.
[Figure 5b panel title: Output of competition]
Figure 5b. Response of the potentials y_ijk of the dipole field defined in the Appendix to the orientation field of Figure 5a: End cutting generates horizontal activations at line end locations that receive small and orientationally ambiguous input activations.
Figure 6. Possible spurious flow within the Feature Contour System of featural quality from line ends: Labels ABCD outline the positions corresponding to the tip of a vertically oriented thin line. The black areas from A to B and from C to D indicate regions of the Feature Contour System which receive signals due to direct image-induced activation of vertically oriented receptive fields within the Boundary Contour System. The stippled areas indicate regions of the Feature Contour System which receive Feature Contour signals from the interior of the line image. Feature Contour System receptive fields, being small and unoriented, may be excited at line ends, even if the oriented receptive fields of the Boundary Contour System are not. The arrows indicate that filling-in due to these Feature Contour signals can spread outside the putative boundary ABCD of the line end.
Figure 7. Cooperation among end cut signals: A bright illusory circle is induced perpendicular to the ends of the radial lines.
that these vertical reactions remain vertical when they pass through the competitive stages. This is analogous to a parallel induction, since the vertical reactions in Figure 5b will generate a completed vertical Boundary Contour that is parallel to its corresponding scenic edge. By contrast, the ambiguous reaction at the line end in Figure 5a generates a horizontal end cut in Figure 5b that is perpendicular to the line. If we thicken the line into a bar, it will eventually become wide enough to enable the horizontally oriented receptive fields at the bar end to generate strong reactions, in just the same way as the vertically oriented receptive fields along the side of the line generated strong vertical reactions there. The transition from ambiguous to strong horizontal reactions as the line end is thickened corresponds to the transition between perpendicular and parallel Boundary Contour induction. This predicted transition has been discovered in electrophysiological recordings from cells in the monkey visual cortex (von der Heydt, Peterhans, and Baumgartner, 1984). The pattern of cell responding in Figure 5a is similar to the data which von der Heydt et al. recorded in area 17 of the striate cortex, whereas the pattern of cell responding in Figure 5b is similar to the data which von der Heydt et al. recorded in area 18 of the prestriate cortex. See Grossberg (1985) and Grossberg and Mingolla (1985) for a further discussion of these and other supportive neural data.
8. Boundary Completion via Cooperative-Competitive Feedback Signaling: CC Loops and the Statistics of Grouping
Another mechanism important in determining the directions in which perceptual groupings occur will now be summarized. As in Figure 5b, the outputs of the competitive stages can generate bands of oriented responses. These bands enable cells sensitive to similar orientations at approximately aligned positions to begin cooperating to form the
final Boundary Contour percept. These bands play a useful role, because they increase the probability that spatially separated Boundary Contour fragments will be aligned well enough to cooperate. Figure 8 provides visible evidence of the existence of these bands. In Figure 8a, the end cuts that are exactly perpendicular to their inducing line ends can group to form a square boundary. In Figure 8b, the end cuts that are exactly perpendicular to the line ends cannot group, but end cuts that are almost perpendicular to the line ends can. Figure 8 also raises the following issue. If bands of end cuts exist at every line end, then why cannot all of them group to form bands of different orientations, which might sum to create fuzzy boundaries? How is a single sharp global boundary selected from among all of the possible local bands of orientations? We suggest that this process is accomplished by the type of feedback exchange between competitive and cooperative processes that is depicted in Figure 9. We call such a competitive-cooperative feedback exchange a CC Loop. Figure 9a shows that the competitive and cooperative processes occur at different network stages, with the competitive stage generating the end cuts depicted in Figure 5b. Thus the outcome of the competitive stage serves as a source of inputs to the cooperative stage and receives feedback signals from the cooperative stage. Each cell in the cooperative process can generate output signals only if it receives a sufficient number and intensity of inputs within both of its input-collecting branches. Thus the cell acts like a type of logical gate, or statistical dipole. The inputs to each branch come from cells of the competitive process that have an orientation and position that are similar to the spatial alignment of the cooperative cell’s branches.
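The gating property of a cooperative cell can be written down as a minimal sketch. It assumes, for illustration only, that each branch simply sums its inputs and that the cell fires when both sums reach a common threshold; the function name, the threshold value, and the additive output are hypothetical choices rather than the Appendix equations.

```python
# Hypothetical cooperative cell: an AND-like gate over two summed branches.

def cooperative_output(left_inputs, right_inputs, threshold=1.0):
    """Fire only if BOTH branches collect enough total input."""
    left = sum(left_inputs)
    right = sum(right_inputs)
    if left < threshold or right < threshold:
        return 0.0
    return left + right


# Enough input in both branches: the cell fires.
print(cooperative_output([0.6, 0.6], [0.7, 0.5]))
# Very strong input in one branch only: the cell stays silent.
print(cooperative_output([3.0, 3.0], [0.0, 0.0]))
# Many weak inputs in both branches can also reach threshold ("statistical").
print(cooperative_output([0.3] * 4, [0.3] * 4))
```

Because each branch sums over several competitive cells, the same gate can be opened either by a few strong inputs or by many weak ones, which is one way to read the term "statistical dipole."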
When such a cell is activated, say by the conjoint action of both input pathways labeled 1 in Figure 9b, it sends excitatory feedback signals along the pathways labeled 2. These feedback signals activate cells within the competitive stage which code a similar orientation and spatial position. The cells at the competitive stage cannot distinguish whether they are activated by bottom-up signals from oriented receptive fields or by top-down signals from the cooperative stage. Either source of activation can cause them to generate bottom-up competitive-to-cooperative signals. Thus new cells at the cooperative stage may now be activated by the conjoint action of both the input pathways labeled 3 in Figure 9b. These newly activated cooperative cells can then generate feedback signals along the pathway labeled 4. In this way, a rapid exchange of signals between the competitive and cooperative stages may occur. These signals can propagate inwards between pairs of inducing Boundary Contour inputs, as in the Kanizsa square of Figure 3, and can thereby complete boundaries across regions which receive no bottom-up inputs from oriented receptive fields. The process of boundary completion occurs discontinuously across space by using the gating properties of the cooperative cells (Figure 9b) to successively interpolate boundaries within progressively finer intervals. This type of boundary completion process is capable of generating sharp boundaries, with sharp endpoints, across large spatial domains (Grossberg and Mingolla, 1985). Unlike a low spatial frequency filter, the boundary completion process does not sacrifice fine spatial resolution to achieve a broad spatial range. Quite the contrary is true, since the CC Loop sharpens, or contrast-enhances, the input patterns which it receives from oriented receptive fields. This process of contrast enhancement is due to the fact that the cooperative stage feeds its excitatory signals back into the competitive stage.
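The inward completion just described can be sketched as an iteration over a single row of like-oriented cells. The sketch is a deliberately crude, binary stand-in for the feedback loop: it assumes a hypothetical cooperative cell at every position whose two branches each span `reach` neighboring positions, and it treats feedback as simply switching the corresponding competitive cell on. The function name and all parameters are illustrative assumptions.

```python
# Hypothetical binary CC Loop on one row of cells sharing one orientation.

def complete_boundary(bottom_up, reach=5, threshold=0.5, max_rounds=20):
    """Iterate competitive-to-cooperative signals and cooperative feedback.

    Returns (final activity, number of rounds in which activity changed).
    """
    n = len(bottom_up)
    activity = list(bottom_up)
    rounds = 0
    for _ in range(max_rounds):
        feedback = [0.0] * n
        for p in range(n):
            left = sum(activity[max(0, p - reach):p])      # left branch
            right = sum(activity[p + 1:p + 1 + reach])     # right branch
            if left >= threshold and right >= threshold:   # both branches needed
                feedback[p] = 1.0
        # A competitive cell is on if driven bottom-up or by feedback.
        updated = [max(b, f) for b, f in zip(bottom_up, feedback)]
        if updated == activity:
            break
        activity = updated
        rounds += 1
    return activity, rounds


# Two pairs of inducers separated by a gap that receives no bottom-up input.
inducers = [0.0] * 16
for p in (2, 3, 12, 13):
    inducers[p] = 1.0
completed, rounds_used = complete_boundary(inducers)
print(completed, rounds_used)

# The same inducers placed too far apart for any cell's branches to bridge.
far_apart = [0.0] * 24
for p in (2, 3, 20, 21):
    far_apart[p] = 1.0
unchanged, rounds_far = complete_boundary(far_apart)
print(unchanged, rounds_far)
```

In this sketch the middle of the gap is activated first and the remaining positions are interpolated on the next round, while positions outside the outermost inducers never receive input in both branches, so the completed boundary keeps sharp endpoints.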
Thus the competitive stage does double duty: it helps to complete line ends that oriented receptive fields cannot detect, and it helps to complete boundaries across regions which may receive no inputs whatsoever from oriented receptive fields. In particular, the excitatory signals from the cooperative stage enhance the competitive advantage of cells with the same orientation and position at the competitive stage (Figure 9b). As the competitive-cooperative feedback process unfolds rapidly
Figure 8. Evidence for bands of orientation responses: In a), an illusory square is generated with sides perpendicular to the inducing lines. In b), an illusory square is generated by lines with orientations that are not exactly perpendicular to the illusory contour. Redrawn from Kennedy (1979).
Figure 9. Boundary completion in a cooperative-competitive feedback exchange (CC Loop): (a) Local competition occurs between different orientations at each spatial location. A cooperative boundary completion process can be activated by pairs of aligned orientations that survive their local competitions. This cooperative activation initiates the feedback to the competitive stage that is detailed in Figure 9b. (b) The pair of pathways 1 activate positive boundary completion feedback along pathway 2. Then pathways such as 3 activate positive feedback along pathways such as 4. Rapid completion of a sharp boundary between pathways 1 can hereby be generated. See text for details.
through time, these local competitive advantages are synthesized into a global boundary grouping which can best reconcile all these local tendencies. In the most extreme version of this contrast-enhancement process, only one orientation at each position can survive the competition. That is, the network makes an orientational choice at each active position. The design of the CC Loop is based upon theorems which characterize the factors that enable contrast-enhancement and choices to occur within nonlinear cooperative-competitive feedback networks (Ellias and Grossberg, 1975; Grossberg, 1973; Grossberg and Levine, 1975). As this choice process proceeds, it completes a boundary between some, but not all, of the similarly oriented and spatially aligned cells within the active bands of the competitive process (Figure 8). This interaction embodies a type of real-time statistical decision process whereby the most favorable groupings of cells at the competitive stage struggle to win over other possible groupings by initiating advantageous positive feedback from the cooperative stage. As Figure 8b illustrates, the orientation of the grouping that finally wins is not determined entirely by local factors. This grouping reflects global cooperative interactions that can override the most highly favored local tendencies, in this case the strong perpendicular end cuts. The experiments of von der Heydt, Peterhans, and Baumgartner (1984) also reported the existence of area 18 cells that act like logical gates. These experiments therefore suggest that either the second stage of competition, or the cooperative stage, or both, occur within area 18. Thus, although these Boundary Contour System properties were originally derived from an analysis of perceptual data, they have successfully predicted recent neurophysiological data concerning the organization of mammalian prestriate cortex.
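The difference between preserving a pattern and making a choice can be illustrated with a small numerical sketch. It does not implement the differential equations treated in the cited theorems; as a loudly simplified stand-in, it repeatedly passes the activities of the orientations at one position through a power-law feedback signal and renormalizes them. The function name, the power-law form, and the number of rounds are illustrative assumptions.

```python
# Simplified discrete stand-in for recurrent competition at one position:
# apply a feedback signal function, then normalize total activity.

def contrast_enhance(activities, power=2.0, rounds=30):
    """Iterate x_i <- f(x_i) / sum_j f(x_j) with f(x) = x ** power."""
    x = list(activities)
    for _ in range(rounds):
        signals = [v ** power for v in x]
        total = sum(signals)
        x = [s / total for s in signals]
    return x


# Initial activities of three orientations at one position (hypothetical values).
initial = [0.30, 0.36, 0.34]

chosen = contrast_enhance(initial, power=2.0)     # faster-than-linear signal
preserved = contrast_enhance(initial, power=1.0)  # linear signal

print(chosen)
print(preserved)
```

In this sketch a faster-than-linear signal drives the network toward a choice of the initially favored orientation, whereas a linear signal leaves the relative pattern unchanged.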
9. Form Perception versus Object Recognition: Invisible but Potent Boundaries
One final remark needs to be made before turning to a consideration of textured scenes. Boundary Contours in themselves are invisible. Boundary Contours gain visibility by separating Feature Contour signals into two or more domains whose featural contrasts, after filling-in takes place, turn out to be different. (See Cohen and Grossberg, 1984b and Grossberg, 1985 for a discussion of how these and later stages of processing help to explain monocular and binocular brightness data.) We distinguish this role of Boundary Contours in generating visible form percepts from the role played by Boundary Contours in object recognition. We claim that completed Boundary Contour signals project directly to the Object Recognition System (Figure 1). Boundary Contours thus need not be visible in order to strongly influence object recognition. An “illusory” Boundary Contour grouping that is caused by a textured scene can have a much more powerful effect on scene recognition than the poor visibility of the grouping might indicate. We also claim that the object recognition system sends learned top-down template, or expectancy, signals back to the Boundary Contour System (Carpenter and Grossberg, 1985a, 1985b; Grossberg, 1980, 1982a, 1984b). Our theory hereby both agrees with and disagrees with the seminal idea of Gregory (1966) that “cognitive contours” are critical in boundary completion and object recognition. Our theory suggests that Boundary Contours are completed by a rapid, pre-attentive, automatic process as they activate the bottom-up adaptive filtering operations that activate the Object Recognition System. The reaction within the Object Recognition System determines which top-down visual templates to the Boundary Contour System will secondarily complete the Boundary Contour grouping based upon learned “cognitive” factors. 
These “doubly completed” Boundary Contours send signals to the Feature Contour System to determine the perceptual domains within which featural filling-in will take place. We consider the most likely location of the boundary completion process to be area 18 (or V2) of the prestriate cortex (von der Heydt, Peterhans, and Baumgartner, 1984),
the most likely location of the final stages of color and form perception to be area V4 of the prestriate cortex (Desimone, Schein, Moran, and Ungerleider, 1985; Zeki, 1983a, 1983b), and the most likely location of some aspects of object recognition to be the inferotemporal cortex (Schwartz, Desimone, Albright, and Gross, 1983). These anatomical interpretations have been chosen by a comparison between theoretical properties and known neural data (Grossberg and Mingolla, 1985). They also provide markers for performing neurophysiological experiments to further test the theory’s mechanistic predictions.
10. Analysis of the Beck Theory of Textural Segmentation: Invisible Colinear Cooperation
We now begin a dynamical explanation and refinement of the main properties of Beck’s important theory of textural segmentation (Beck, Prazdny, and Rosenfeld, 1983). One of the central hypotheses of the Beck theory is that “local linking operations form higher-order textural elements” (p.2). “Textural elements are hypothesized to be formed by proximity, certain kinds of similarity, and good continuation. Others of the Gestalt rules of grouping may play a role in the formation of texture... There is an encoding of the brightness, color, size, slope, and the location of each textural element and its parts” (p.31). We will show that the properties of these “textural elements” are remarkably similar to the properties of the completed boundaries that are formed by the Boundary Contour System. To explain this insight, we will analyse various of the images used by Beck, Prazdny, and Rosenfeld (1983) in the light of Boundary Contour System properties. Figure 10 provides a simple example of what the Beck school means by a “textural element.” Beck, Prazdny, and Rosenfeld (1983) write: “The short vertical lines are linked to form long lines. The length of the long lines is an ‘emergent feature’ which makes them stand out from the surrounding short lines” (p.5). The linking per se is explained by our theory in terms of the process whereby similarly oriented and spatially aligned outputs from the second competitive stage can cooperate to complete a colinear intervening Boundary Contour. One of the most remarkable aspects of this “emergent feature” is not analysed by Beck et al. Why do we continue to see a series of short lines if long lines are the emergent features which control perceptual grouping? In our theory, the answer to this question is as follows. Within the Boundary Contour System, a boundary structure emerges corresponding to the long lines described by Beck et al.
This structure includes a long vertical component as well as short horizontal end cuts near the endpoints of the short scenic lines. The output of this Boundary Contour structure to the Feature Contour System prevents featural filling-in of dark and light contrasts from crossing the boundaries corresponding to the short lines. On the other hand, the output from the Boundary Contour System to the Object Recognition System reads out a long line structure without regard to which subsets of this structure will be perceived as dark or light. This example points to a possible source of confusion in the Beck model. Beck et al. (1983) claim that “There is an encoding of the brightness, color, size, slope, and the location of each textural element and its parts” (p.31). Figure 10 illustrates a sense in which this assertion is false. The long Boundary Contour structure can have a powerful effect on textural segmentation even if it has only a minor effect on the brightness percepts corresponding to the short lines in the image, because an emergent Boundary Contour can generate a large input to the Object Recognition System without generating a large brightness difference. The Beck model does not adequately distinguish between the contrast sensitivity that is needed to activate elongated receptive fields at an early stage of boundary formation and the effects of completed boundaries on featural filling-in. The outcome of featural filling-in, rather than the contrast sensitivity of the
Figure 10. Emergent features: The colinear linking of short line segments into longer segments is an “emergent feature” which sustains textural grouping. Our theory explains how such emergent features can contribute to perceptual grouping even if they are not visible. (Reprinted from Beck, Prazdny, and Rosenfeld, 1983.)
Boundary Contour System’s elongated receptive fields, helps to determine a brightness or color percept (Cohen and Grossberg, 1984b; Grossberg and Mingolla, 1985). A related source of ambiguity in the Beck model arises from the fact that the strength of an emergent Boundary Contour does not even depend on image contrasts, let alone brightness percepts, in a simple way. The Beck model does not adequately distinguish between the ability of elongated receptive fields to activate a Boundary Contour in regions where image contrast differences do exist and the cooperative interactions that complete the Boundary Contour in regions where image contrast differences may or may not exist. The cooperative interaction may, for example, alter Boundary Contours at positions which lie within the receptive fields of the initiating orientation-sensitive cells, as in Figure 8b. The final percept even at positions which directly receive image contrasts may be strongly influenced by cooperative interactions that reach these positions by spanning positions which do not directly receive image contrasts. This property is particularly important in situations where a spatial distribution of statistically determined image contrasts, such as dot or letter densities, form the image that excites the orientation-sensitive cells.
11. The Primacy of Slope
Figure 11 illustrates this type of interaction between bottom-up direct activation of orientationally tuned cells and top-down cooperative interaction of such cells. Beck and his colleagues have constructed many images of this type to demonstrate that orientation or “slope is the most important of the variables associated with shape for producing textural segmentation... A tilted T is judged to be more similar to an upright T than is an L. When these figures are repeated to form textures... the texture made up of Ls is more similar to the texture made up of upright Ts than to the texture made up of tilted Ts” (Beck, Prazdny, and Rosenfeld, 1983, p.7). In our theory, this fact follows from several properties acting together: The elongated receptive fields in the Boundary Contour System are orientationally tuned. This property provides the basis for the system’s sensitivity to slope. As colinear boundary completion takes place due to cooperative-competitive feedback (Figure 9), it can group together approximately colinear Boundary Contours that arise from contrast differences due to the different letters. Colinear components of different letters are grouped just as the Boundary Contour System groups image contrasts due to a single scenic edge that excites the retina on opposite sides of a retinal vein. The number and density of inducing elements of similar slope can influence the strength of the final set of Boundary Contours pointing in the same direction. Both Ls and Ts generate many horizontal and vertical boundary inductions, whereas tilted Ts generate diagonal boundary inductions. The main paradoxical issue underlying the percept of Figure 11 concerns how the visual system overrides the perceptually vivid individual letters.
Once one understands mechanistically the difference between boundary completion and visibility, and the role of boundary completion in forming even individual edge segments without regard to their ultimate visibility, this paradox is resolved.
12. Statistical Properties of Oriented Receptive Fields: OC Filters
Variations on Figure 11 can also be understood by refining the above argument.
In Beck (1966), it is shown that X’s in a background of T’s produce weaker textural segmentation than a tilted T in a background of upright T’s, even though both images contain the same orientations. We agree with Beck, Prazdny, and Rosenfeld (1983) that “what is important is not the orientation of lines per se but whether the change in orientation causes feature detectors to be differentially stimulated” (p.9). An X and a T have a centrally symmetric shape that weakens the activation of elongated receptive fields. A similar observation was made by Schatz (1977), who showed that changing the slope of a single line from vertical to diagonal led to stronger textural segmentation than changing the slope of three parallel lines from vertical to diagonal.
Figure 11. The primacy of slope: In this classic figure, textural segmentation between the tilted and upright T’s is far stronger than between the upright T’s and L’s. The figure illustrates that grouping of disconnected segments of similar slope is a powerful basis for textural segmentation. (Reprinted from Beck, Prazdny, and Rosenfeld, 1983.)
Both of these examples are compatible with the fact that orientationally tuned cells measure the statistical distribution of contrasts within their receptive fields. They do not respond only to a template of an edge, bar, or other definite image. They are sensitive to the relative contrast of light and dark on either side of their axis of preferred orientation (Appendix, equation A1). Each receptive field at the first stage of Boundary Contour processing is divided into two halves along an oriented axis. Each half of the receptive field sums the image-induced inputs which it receives. The integrated activation from one of the half fields inhibits the integrated activation from the other half field. A net output signal is generated by the cell if the net activation is sufficiently positive. This output signal grows with the size of the net activation. Thus each such oriented cell is sensitive to amount-of-contrast (size of the net activation) and to direction-of-contrast (only one half field inhibits the other half field), in addition to being sensitive to factors like orientation, position, and spatial frequency. A pair of such oriented cells corresponding to the same position and orientation, but opposite directions-of-contrast, send converging excitatory pathways to cells defining the next stage in the network. These latter cells are therefore sensitive to factors like orientation, position, spatial frequency, and amount-of-contrast, but they are insensitive to direction-of-contrast. Together, the two successive stages of oriented cells define a filter that is sensitive to properties concerned with orientation and contrast. We therefore call this filter an OC Filter. The OC Filter inputs to the CC Loop. The Boundary Contour System network is a composite of OC Filter and CC Loop. The output cells of the OC Filter, being insensitive to direction-of-contrast, are the ones which respond to the relative contrast of light and dark on either side of their axis of preferred orientation. Both the X’s studied by Beck (1966) and the multiple parallel lines studied by Schatz (1977) reduce this relative contrast. These images therefore weaken the relative and absolute sizes of the input to any particular orientation. Thus even the “front end” of the Boundary Contour System begins to regroup the spatial arrangement of contrast differences that is found within the scenic image.
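The two-stage computation just described can be sketched in a few lines of Python. This is only an illustrative caricature, not the model's equations (those are defined in the Appendix): the one-dimensional half fields, the threshold value, and the function names `oriented_cell` and `oc_filter` are assumptions introduced for this example.

```python
def oriented_cell(half_a, half_b, threshold=0.0):
    """First-stage cell (hypothetical sketch).

    Each half field sums its inputs; half_b inhibits half_a. The output is
    the rectified net activation, so it depends on both amount-of-contrast
    and direction-of-contrast.
    """
    net = sum(half_a) - sum(half_b)
    return max(net - threshold, 0.0)


def oc_filter(half_a, half_b, threshold=0.0):
    """Second-stage cell: pools two first-stage cells of opposite polarity,
    so the result no longer depends on direction-of-contrast."""
    light_dark = oriented_cell(half_a, half_b, threshold)
    dark_light = oriented_cell(half_b, half_a, threshold)
    return light_dark + dark_light


edge = ([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])    # light on one side, dark on the other
reversed_edge = (edge[1], edge[0])           # same edge, opposite polarity
balanced = ([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])  # equal input to both halves

print(oriented_cell(*edge), oriented_cell(*reversed_edge))  # 3.0 0.0
print(oc_filter(*edge), oc_filter(*reversed_edge))          # 3.0 3.0
print(oc_filter(*balanced))                                 # 0.0
```

The `balanced` case is a crude stand-in for images that distribute contrast evenly across both half fields: the pooled output falls to zero even though the image contains plenty of contrast.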
13. Competition Between Perpendicular Subjective Contours
A hallmark of the Beck approach has been the use of carefully chosen but simple figural elements in arrays whose spatial parameters can be easily manipulated. Arrays built up from U shapes have provided a particularly rich source of information about textural grouping. In the bottom half of Figure 12, for example, the line ends of the U’s and of the inverted U’s line up in a horizontal direction. Their perpendicular end cuts can therefore cooperate, just as in Figures 7 and 8, to form long horizontal Boundary Contours. These long Boundary Contours enable the bottom half of the figure to be preattentively distinguished from the top half. Beck et al. (1983) note that segmentation of this image is controlled by “subjective contours” (p.2). They do not use this phrase to analyse their other displays, possibly because the “subjective” Boundary Contours in other displays are not as visible. The uncertainty within Beck, Prazdny, and Rosenfeld (1983) concerning the relationship between “linking operations” and “subjective contours” is illustrated by their analysis of Figure 13. In Figure 13a, vertical and diagonal lines alternate. In Figure 13b, horizontal and diagonal lines alternate. The middle third of Figure 13a is preattentively segmented better than the middle third of Figure 13b. Beck et al. (1983) explain this effect by saying that “The linking of the lines into chains also occurred more strongly when the lines were colinear than when they were parallel, i.e., the linking of horizontal lines to form vertical columns” (p.21). “The horizontal lines tend to link in the direction in which they point. The linking into long horizontal lines competes with the linking of the lines into vertical columns and interferes with textural segmentation” (p.22). Our theory supports the spirit of this analysis.
Both the direct outputs from horizontally oriented receptive fields and the vertical end cuts induced by competitive processing at horizontal line ends can feed into the colinear boundary completion process. The boundary completion process, in turn, feeds its signals back to a competitive stage where perpendicular orientations compete (Figure 9). Hence direct horizontal activations and indirect vertical end cuts can compete at positions which receive both influences due to cooperative feedback. Beck et al. (1983) do not, however, comment upon an important difference between Figures 13a and 13b that is noticed when one realizes that linking operations may generate both visible and invisible subjective contours. We claim that, in Figure 13b, the end cuts of horizontal and diagonal line ends can cooperate to form long vertical Boundary Contours that run from the top to the bottom of the figure. As in Figure 8b, global cooperative factors can override local orientational preferences to choose end cuts that are not perpendicular to their inducing line ends. We suggest that this happens with respect to the diagonal line ends in Figure 13b due to the cooperative influence of the vertical end cuts that are generated by colinear horizontal line ends. The long vertical Boundary Contours that are hereby generated interfere with textural segmentation by passing through the entire figure. This observation, by itself, is not enough to explain the better segmentation of Figure 13a. Due to the horizontal alignment of vertical and diagonal line ends in Figure 13a, horizontal Boundary Contours could cross this entire figure. In Figure 13a, however, vertical lines within the top and bottom thirds of the picture are contiguous to other vertical lines. In Figure 13b diagonal lines are juxtaposed between every pair of horizontal lines. Thus in Figure 13a, a strong tendency exists to form vertical Boundary Contours in the top and bottom thirds of the picture due both to the distance dependence of colinear cooperation and to the absence of competing intervening orientations. These strong vertical Boundary Contours can successfully compete with the tendency to form horizontal Boundary Contours that cross the figure. In Figure 13b, the tendencies to form vertical and horizontal Boundary Contours are more uniformly distributed across the figure. Thus the disadvantage of Figure 13b may not just be due to the fact that “linking into long horizontal lines competes with the linking of the lines into vertical columns,” as Beck et al. (1983, p.22) suggest. We suggest that, even in Figure 13a, strong competition from horizontal linkages occurs throughout the figure. These horizontal linkages do not prevent preattentive grouping because strong vertical linkages exist at the top and bottom thirds of the figure and these vertical groupings cannot bridge the middle third of the figure. In Figure 13b, by contrast, the competing horizontal linkages in the top and bottom thirds of the figure are weaker than in Figure 13a. Despite this, the relative strengths of emerging groupings corresponding to different parts of a scene, rather than the strengths of oriented activations at individual scenic positions, determine how well a region of the scene can be segmented.
Figure 12. Textural grouping supported by subjective contours: Cooperation among end cuts generates horizontal subjective contours in the bottom half of this figure. (Reprinted from Beck, Prazdny, and Rosenfeld, 1983.)
Figure 13. Effects of distance, perpendicular orientations, and colinearity on perceptual grouping: In both (a) and (b), vertical and horizontal subjective boundaries are generated. The text explains how the groupings in (a) better segregate the middle third of the figure.
14. Multiple Distance-Dependent Boundary Contour Interactions: Explaining Gestalt Rules
Figure 14 illustrates how changing the spatial separation of figural elements, without changing their relative positions, can alter interaction strengths at different stages of the Boundary Contour System; different rearrangements of the same scenic elements can differentially probe the hierarchical organization of boundary processing. This type of insight leads us to suggest how different Gestalt rules are realized by a unified system of Boundary Contour System interactions. In the top half of Figure 14a, horizontal Boundary Contours that cross the entire figure are generated by horizontal end cuts at the tips of the inverted U’s. These long Boundary Contours help to segregate the top half of the figure from its bottom, just as they do in Figure 12. This figure thus reaffirms that colinear cooperative interactions can span a broad spatial range. Some horizontal Boundary Contour formation may also be caused by cooperation between the bottoms of the U’s. We consider this process to be weaker in Figure 14a for the same reason that it is weaker in Figure 12: the vertical sides of the U’s weaken it via competition between perpendicular orientations. Beck et al. (1983, p.23), by contrast, assert that “The bottom lines of the U’s link on the basis of colinearity (a special case of good continuation)”, and say nothing about the horizontal Boundary Contours induced by the horizontal end cuts. In Figure 14b, the U and inverted U images are placed more closely together without otherwise changing their relative spatial arrangement. End cuts at the tips of the inverted U’s again induce horizontal Boundary Contours across the top half of the figure. New types of grouping are also induced by this change in the density of the U’s. The nature of these new groupings can most easily be understood by considering the bottom of Figure 14b. At a suitable viewing distance, one can now see diagonal groupings that run at 45° and 135° angles through the bases of the U’s and inverted U’s.
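One way to see why element density matters for a receptive field of fixed size is the following toy calculation. It is a sketch under strong simplifying assumptions: the scenic elements are reduced to single active pixels on a square lattice, and the "diagonal receptive field" is just a sum along a 45° line of pixels rather than a contrast-sensitive oriented mask. The names `dot_lattice` and `diagonal_field_input`, and all sizes, are hypothetical.

```python
def dot_lattice(size, spacing):
    """Binary image with one active pixel every `spacing` rows and columns."""
    return [[1 if (r % spacing == 0 and c % spacing == 0) else 0
             for c in range(size)] for r in range(size)]


def diagonal_field_input(image, row, col, half_length):
    """Total input along a 45-degree line of pixels centred on (row, col)."""
    total = 0
    for k in range(-half_length, half_length + 1):
        r, c = row + k, col + k
        if 0 <= r < len(image) and 0 <= c < len(image[0]):
            total += image[r][c]
    return total


SIZE, CENTRE, HALF_LENGTH = 25, 12, 4      # same receptive field in both cases
sparse = dot_lattice(SIZE, spacing=12)     # elements far apart
dense = dot_lattice(SIZE, spacing=4)       # same arrangement, closer together

print(diagonal_field_input(sparse, CENTRE, CENTRE, HALF_LENGTH))  # 1
print(diagonal_field_input(dense, CENTRE, CENTRE, HALF_LENGTH))   # 3
```

With the sparse lattice the field covers a single element; with the dense lattice the same field covers three elements, none of which is itself diagonal.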
Figure 14. The importance of spatial scale: These three figures probe the subtle effects on textural grouping of varying spatial scale. For example, the diagonal grouping at the bottom of (b) is initiated by differential activation of diagonally oriented masks, despite the absence of any diagonal edges in the image. See the text for extended discussion.
We claim that these diagonal groupings are initiated when the density gets sufficiently high to enable diagonally oriented receptive fields to record relatively large image contrasts. In other words, at a low density of scenic elements, orientationally tuned receptive fields can be stimulated only by one U or inverted U at a time. At a sufficiently high density of scenic elements, each receptive field can be stimulated by parts of different scenic elements that fall within that receptive field. Once the diagonal receptive fields get activated, they can trigger diagonally oriented boundary completions. A similar possibility holds in the top half of Figure 14b. Horizontally and vertically tuned receptive fields can begin to be excited by more than one U or inverted U. Thus the transition from Figure 14a to 14b preserves long-range horizontal cooperation based on competitive end cuts and other colinear horizontal interactions, and enables the earlier stage of oriented receptive fields to create new scenic groupings, notably in diagonal directions. Beck et al. (1983) analyse Figures 14a and 14b using Gestalt terminology. They say that segmentation in Figure 14a is due to “linking based on the colinearity of the base lines of the U’s” (p.24). Segmentation in Figure 14b is attributed to “linking based on closure and good continuation” (p.25). We suggest that both segmentations are due to the same Boundary Contour System interactions, but that the scale change in Figure 14b enables oriented receptive fields and cooperative interactions to respond to new local groupings of image components. In Figure 14c, the relative positions of U’s and inverted U’s are again preserved, but they are arranged to be closer together in the vertical than in the horizontal direction. These new columnar relationships prevent the image from segmenting into top and bottom halves. Beck et al. (1983) write that “Strong vertical linking based on proximity interferes with textural segmentation” (p.28). We agree with this emphasis on proximity, but prefer a description which emphasizes that the vertical linking process uses the same textural segmentation mechanisms as are needed to explain all of their displays. We attribute the strong vertical linking to the interaction of five effects within the Boundary Contour System. The higher relative density of vertically arranged U’s and inverted U’s provides a relatively strong activation of vertically oriented receptive fields.
The higher density and stronger activation of vertically oriented receptive fields generate larger inputs to the vertically oriented long-range cooperative process, which enhances the vertical advantage by generating strong top-down positive feedback. The smaller relative density of horizontally arranged U’s and inverted U’s provides a relatively weak activation of horizontally oriented receptive fields. The lower density and smaller activation of these horizontally oriented receptive fields generate a smaller input to the horizontally oriented cooperative process. The horizontally oriented cooperation consequently cannot offset the strength of the vertically oriented cooperation. Although the horizontal end cuts can be generated by individual line ends, the reduction in density of these line ends in the horizontal direction reduces the total input to the corresponding horizontally oriented cooperative cells. All of these factors favor the ultimate dominance of vertically oriented long-range Boundary Contour structures. Beck et al. (1983) analyse the different figures in Figure 14 using different combinations of classical Gestalt rules. We analyse these figures by showing how they differentially stimulate the same set of Boundary Contour System rules. This type of mechanistic synthesis leads to the suggestion that the Boundary Contour System embodies a universal set of rules for textural grouping.
15. Image Contrasts and Neon Color Spreading
Beck et al. (1983) used regular arrays of black and grey squares on a white background and of white and grey squares on a black background with the same incisiveness as they used U displays. All of the corresponding perceptual groupings can be qualitatively explained in terms of the contrast-sensitivity of Boundary Contour System responses to these images. The most difficult new property of these percepts can be seen by looking at Figure 15. Diagonal grey bands can be seen joining the grey squares in the middle third of the figure. We interpret this effect to be a type of neon color spreading (van Tuijl, 1975). This interpretation is supported by the percept that obtains when the grey squares are replaced by red squares of similar contrast, as we have done using our computer graphics system. Then diagonal red bands can be seen joining the red squares in the middle of the figure. Neither these red diagonal bands, nor by extension the grey bands seen upon inspection of Figure 15, can be interpreted as being merely a classical contrast effect due to the black squares. The percept of these diagonal bands can be explained using the same type of analysis that Grossberg (1984a) and Grossberg and Mingolla (1985) have used to explain the neon color spreading that is induced by a black Ehrenstein figure surrounding a red cross (Figure 16; Redies and Spillmann, 1981) and the complementary color induction and spreading that is induced when parts of an image grating are achromatic and complementary parts are colored (van Tuijl, 1975). These explanations indicate how segmentation within the Boundary Contour System can sometimes induce visible contrasts at locations where no luminance contrasts exist in the scenic image. Neon spreading phenomena occur only when some scenic elements have greater relative contrasts with respect to the background than do the complementary scenic elements (van Tuijl and de Weert, 1979). This prerequisite is satisfied by Figure 15.
The black squares are much more contrastive relative to the white ground than are the grey squares. Thus the black-to-white contrasts can excite oriented receptive fields within the Boundary Contour System much more than can the grey-to-white contrasts. As in our other explanations of neon color spreading, we trace the initiation of this neon effect to two properties of the Boundary Contour System: the contrast-sensitivity of the oriented receptive fields, and the lateral inhibition within the first competitive stage among like-oriented cells at nearby positions (Section 20 and Appendix). Due to contrast-sensitivity, each light grey square activates oriented receptive fields less than each black square. The activated orientations are, by and large, vertical and horizontal, at least on a sufficiently small spatial scale. At the first competitive stage, each strongly activated vertically tuned cell inhibits nearby weakly activated vertically tuned cells,
and each strongly activated horizontally tuned cell inhibits nearby weakly activated horizontally tuned cells (Figure 17). In all, each light grey square’s Boundary Contours receive strong inhibition both from the vertical and the horizontal direction. This conjoint vertical and horizontal inhibition generates a gap within the Boundary Contours at each corner of every light grey square and a net tendency to generate a diagonal Boundary Contour via disinhibition at the second competitive stage. These diagonal Boundary Contours can then link up via colinear cooperation to further weaken the vertical and horizontal Boundary Contours as they build completed diagonal Boundary Contours between the light grey squares. This lattice of diagonal Boundary Contours enables grey featural quality to flow out of the squares and fill-in the positions bounded by the lattice within the Feature Contour System. In the top and bottom thirds of Figure 15, on the other hand, only the horizontal Boundary Contours of the grey squares are significantly inhibited. Such inhibitions tend to be compensated at the cooperative stage by colinear horizontal boundary completion. Thus the integrity of the horizontal Boundary Contours near such a grey square’s corner tends to be preserved. It is worth emphasizing a similarity and a difference between the percepts in Figures 14b and 15. In both percepts, diagonal Boundary Contours help to segment the images. However, in Figure 14b, the diagonals are activated directly at the stage of the oriented receptive fields, whereas in Figure 15, the diagonals are activated indirectly via disinhibition at the second competitive stage. We suggest that similar global factors may partially determine the Hermann grid illusion. Spillmann (1985) has reviewed evidence that suggests a role for central factors in generating this illusion, notably the work of Preyer (1897/98) and Prandtl (1927) showing that when a white grid is presented on a colored background, the illusory spots have the same color as the surrounding squares.
Figure 15. Textural segmentation and neon color spreading: The middle third of this figure is easily segmented from the rest. Diagonal flow of grey featural quality between the grey squares of the middle segment is an example of neon color spreading. See also Figures 16 and 17. (Reprinted from Beck, Prazdny, and Rosenfeld, 1983. We are grateful to Jacob Beck for providing the original of this figure.)
Figure 16. Neon color spreading: (a) A red cross in isolation appears unremarkable. (b) When the cross is surrounded by an Ehrenstein figure, the red color can flow out of the cross until it hits the illusory contour induced by the Ehrenstein figure.
Figure 17. Boundary Contour disinhibition and neon color spreading: This figure illustrates how the neon spreading evident in Figure 16 can occur. If grey squares are much lighter than black squares and the squares are sufficiently close, the net effect of strong inhibitory boundary signals from the black squares to the weakly activated grey square boundaries leads to disinhibition of diagonal Boundary Contours. Cooperation between these diagonal boundaries enables diagonal featural flow to occur between the grey squares.
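The first step of this account, lateral inhibition among like-oriented cells at nearby positions, can be illustrated with a small sketch. It covers only the first competitive stage and does not model the disinhibition of diagonals at the second competitive stage. The subtractive form of the inhibition, the nearest-neighbour range, the inhibition weight, and the function name `first_competitive_stage` are all assumptions made for the example; the model's own definitions are given in Section 20 and the Appendix.

```python
def first_competitive_stage(inputs, inhibition=0.25):
    """Each cell is inhibited by its immediate like-oriented neighbours.

    Hypothetical simplification: inhibition is subtractive and the output
    is rectified at zero.
    """
    outputs = []
    for i, value in enumerate(inputs):
        neighbours = inputs[max(i - 1, 0):i] + inputs[i + 1:i + 2]
        outputs.append(max(value - inhibition * sum(neighbours), 0.0))
    return outputs


# Activations of vertically tuned cells along a row of squares.
black_grey_black = [1.0, 0.25, 1.0]   # weak grey edge flanked by strong black edges
all_black = [1.0, 1.0, 1.0]           # equally strong edges

print(first_competitive_stage(black_grey_black))  # [0.9375, 0.0, 0.9375]
print(first_competitive_stage(all_black))         # [0.75, 0.5, 0.75]
```

Equally strong neighbours only attenuate one another, whereas the weakly activated grey-square cell is driven to zero by its strongly activated neighbours, leaving a gap in its boundary.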
Wolfe (1984) has presented additional evidence that global factors contribute to this illusion. Although we expect our theory to be progressively refined as it achieves a greater behavioral and neural explanatory range, we believe that the types of explanation suggested above will continue to integrate the several classical Gestaltist laws into a unified neo-Gestaltist mechanistic understanding. In this new framework, instead of invoking different Gestalt laws to explain different percepts, one analyses how different images probe the same laws in context-sensitive ways.
16. Computer Simulations of Perceptual Grouping
In this section, we summarize computer simulations that illustrate the Boundary Contour System's ability to generate perceptual groupings akin to those in the Beck et al. displays. In the light of these results, we then analyse data of Glass and Switkes (1976) about random dot percepts and of Gregory and Heard (1979) about border locking during the Café wall illusion before defining rigorously the model neuron interactions that define the Boundary Contour System (BCS). Numerical parameters were held fixed for all of the simulations; only the input patterns were varied. As the input patterns were moved about, the BCS sensed relationships among the inducing elements and generated emergent boundary groupings among them. In all of the simulations, we defined the input patterns to be the output patterns of the oriented receptive fields, as in Figure 18a, since our primary objective was to study the CC Loop, or cooperative-competitive feedback exchange. This step reduced the computer time needed to generate the simulations. If the BCS is ever realized in parallel hardware, rather than by simulation on a traditional computer, it will run in real-time. In all of Figures 18-25, we have displayed network activities after the CC Loop converges to an equilibrium state. These simulations used only a single cooperative bandwidth. They thus illustrate how well the BCS can segment images using a single "spatial frequency" scale. Multiple scales are, however, needed to generate three-dimensional form percepts (Grossberg, 1983a, 1985; Grossberg and Mingolla, 1986). Figure 18a depicts an array of four vertically oriented input clusters. We call each cluster a Line because it represents a caricature of an orientation field's response to a vertical line (Figure 5a). In Figures 18b, c, and d, we display the equilibrium activities of the cells at three successive CC Loop stages: the first competitive stage, the second competitive stage, and the cooperative stage.
The length of an oriented line at each position is proportional to the equilibrium activity of a cell whose receptive field is centered at that position with the prescribed orientation. We will focus upon the activity pattern within the y-field, or second competitive stage, of each simulation (Figure 18c). This is the final competitive stage that inputs to the cooperative stage (Section 8). The w-field (first competitive stage) and z-field (cooperative stage) activity patterns are also displayed to enable the reader to achieve a better intuition after considering the definitions of these fields in Section 20 and the Appendix. The input pattern in Figure 18a possesses a manifest vertical symmetry: Pairs of vertical Lines are colinear in the vertical direction, whereas they are spatially out-of-phase in the horizontal direction. The BCS senses this vertical symmetry, and generates emergent vertical lines in Figure 18c, in addition to horizontal end cuts at the ends of each Line, as suggested by Figure 10. In Figure 19a, the input pattern shown in Figure 18a has been altered, so that the first column of vertical Lines is shifted upward relative to the second column of vertical Lines. Figure 19c shows that the BCS begins to sense the horizontal symmetry within the input configuration. In addition to the emergent vertical grouping and horizontal end cuts like those of Figure 18c, an approximately horizontal grouping has appeared. In Figure 20, the input Lines are moved so that pairs of Lines are colinear in the
[Figure 18 panels: (a) Input to Competition I; (b) Competition I; (c) Competition II; (d) Cooperation.]
Figure 18. Computer simulation of processes underlying textural grouping: The length of each line segment in this figure and Figures 19-25 is proportional to the activation of a network node responsive to one of twelve possible orientations. The dots indicate the positions of inactive cells. In Figures 18-25, part (a) displays the results of input masks which sense the amount of contrast at a given orientation of visual input, as in Figure 5a. Parts (b)-(d) show equilibrium activities of oriented cells at the competitive and cooperative layers. A comparison of (a) and (b) indicates the major groupings sensed by the network. Here only the vertical alignment of the two left and two right Lines is registered. See text for detailed discussion.
[Figure 19 panels: (a) Input to Competition I; (b) Competition I; (c) Competition II; (d) Cooperation.]
Figure 19. The emergence of nearly horizontal grouping: The only difference between the input for this figure and that of Figure 18 is that the right column of lines has been shifted upward by one lattice location. The vertical grouping of Figure 18 is preserved as the horizontal grouping emerges. The horizontal groupings are due to cooperation between end cuts at the Line ends.
vertical direction and their Line ends are lined up in the horizontal direction. Now both vertical and horizontal groupings are generated in Figure 20c, as in Figure 13. In Figure 21a, the input lines are shifted so that they become non-colinear in a vertical direction, but pairs of their Line ends remain aligned. The vertical symmetry of Figure 20a is hereby broken. Thus in Figure 21c, the BCS groups the horizontal Line ends, but not the vertical Lines. Figure 22 depicts a more demanding phenomenon: the emergence of diagonal groupings where no diagonals whatsoever exist in the input pattern. Figure 22a is generated by bringing the two horizontal rows of vertical Lines closer together until their ends lie within the spatial bandwidth of the cooperative interaction. Figure 22c shows that the BCS senses diagonal groupings of the Lines, as in Figure 14b. It is remarkable that these diagonal groupings emerge both on a microscopic scale and a macroscopic scale. Thus diagonally oriented receptive fields are activated in the emergent boundaries, and these activations, as a whole, group into diagonal bands. In Figure 23c, another shift of the inputs induces internal diagonal bands while enabling the exterior grouping into horizontal and diagonal boundaries to persist. In Figure 24a, one of the vertical Lines is removed. The BCS now senses the remaining horizontal and diagonal symmetries (Figure 24c). In Figure 25a, the lower Line is moved further away from the upper pair of Lines until the cooperation can no longer support the diagonal groupings. The diagonal groupings break apart, leaving the remaining horizontal groupings intact (Figure 25c).
17. On-Line Statistical Decision Theory and Stochastic Relaxation
These figures illustrate the fact that the BCS behaves like an on-line statistical decision theory in response to its input patterns.
The BCS can sense only those groupings of perceptual elements which possess enough "statistical inertia" to drive its cooperative-competitive feedback exchanges towards a non-zero stable equilibrium configuration. The emergent patterns in Figures 18-25 are thus as important for what they do not show as for what they do show. All possible groupings of the oriented input elements could, in principle, have been generated, since all possible groupings of the cooperative-competitive interaction were capable of receiving inputs. In order to compare and contrast BCS properties with other approaches, one can interpret the distribution of oriented activities at each input position as being analogous to a local probability distribution, and the final BCS pattern as being the global decision that the system reaches and stores based upon all of its local data. The figures show that the BCS regards many of the possible groupings of these local data as spurious, and suppresses them as being functional noise. Some popular approaches to boundary segmentation and noise suppression do adopt a frankly probabilistic framework. For example, in a stochastic relaxation approach based upon statistical physics, Geman and Geman (1984) slowly decrease a formal temperature parameter that drives their system towards a minimal energy configuration with boundary enhancing properties. Zucker (1985) has also suggested a minimization algorithm to determine the best segmentation. Such algorithms provide one way, indeed a classical way, to realize coherent properties within a many body system. These algorithms define open loop procedures in which external agents manipulate the parameters leading to coherence. In the BCS, by contrast, the only "external parameters" are the input patterns themselves. Each input pattern defines a different set of boundary conditions for the BCS, and this difference, in itself, generates different segmentations.
The BCS does not need extra external parameters because it contains a closed loop process, the CC Loop, which regulates its own convergence to a symmetric and coherent configuration via its real-time competitive-cooperative feedback exchanges. The BCS differs in other major ways from alternative models. Geman and Geman (1984), for example, build into the probability distributions of their algorithm information about the images to be processed.
[Figures 20-25 each have four panels: (a) Input to Competition I; (b) Competition I; (c) Competition II; (d) Cooperation.]
Figure 20. Coexistence of vertical and horizontal grouping: Here both horizontal and vertical groupings are completed at all Line ends.
Figure 21. Horizontal grouping by end cuts: A horizontal shift of the lower two Lines in Figure 20 breaks the vertical groupings but preserves the horizontal groupings.
Figure 22. The emergence of diagonal groupings: The Boundary Contour System (BCS) is capable of generating groupings along orientations which have no activity in the oriented mask responses. Individual diagonally oriented cells are activated within the diagonally oriented groupings.
Figure 23. Multiple diagonal groupings: A new diagonal grouping emerges as a result of shifting the input Lines. As in Figure 20, grouping in one orientation does not preclude grouping in an (almost) perpendicular orientation at the same Line end.
Figure 24. Global restructuring due to removal of local features: The inputs of this figure and Figure 23 are identical, except that the lower right Line has been removed. A comparison of Figure 24b with Figure 23b shows that, although gross aspects of the shared grouping are similar, removal of one Line can affect groupings among other Lines.
Figure 25. Distance-dependence of grouping: Relative to the inputs of Figure 24, the bottom Line has moved outside of the cooperative bandwidth that supported diagonal grouping. Although the diagonal grouping vanishes, the horizontal grouping at the bottom of the top Lines persists.
The dynamics of the BCS clarify the relevance of probabilistic concepts to the segmentation process. In particular, the distributions of oriented activities at each input position (Figure 5) play the role of local probability distributions. On the other hand, within the BCS, these distributions emerge as part of a real-time reaction to input patterns, rather than according to predetermined constraints on probabilities. The BCS does not incorporate hypotheses about which images will be processed into its probability distributions. Such knowledge is not needed to achieve rapid pre-attentive segmentation. The Object Recognition System (ORS) does encode information about which images are familiar (Figure 1). Feedback interactions between the BCS and the ORS can rapidly supplement a pre-attentive segmentation using the templates read-out from the ORS in response to BCS signals. Within our theory, however, these templates are not built into the ORS. Rather, we suggest how they are learned, in real-time, as the ORS self-organizes its recognition code in response to the pre-attentively completed output patterns from the BCS (Carpenter and Grossberg, 1985a, 1985b; Grossberg, 1980, 1984b). Thus the present theory sharply distinguishes between the processes of pre-attentive segmentation and of learned object recognition. By explicating the intimate interaction between the BCS and the ORS, the present theory also clarifies why these distinct processes are often treated as a single process. In particular, the degree to which top-down learned templates can deform a pre-attentively completed BCS activity pattern will depend upon the particular images being processed and the past experiences of the ORS. Thus by carefully selecting visual images, one can always argue that one or the other process is rate-limiting.
Furthermore, both the pre-attentive BCS interactions and the top-down learned ORS interactions are processes of completion which enhance the coherence of BCS output patterns. They can thus easily be mistaken for one another.
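The contrast drawn earlier in this section between open loop stochastic relaxation and the closed loop CC Loop can be made concrete with a small sketch. The code below is not the Geman and Geman (1984) algorithm. It is a minimal Metropolis-style relaxation on a one-dimensional binary labeling in which an externally supplied schedule lowers a formal temperature. The energy function, the logarithmic form of the schedule, and all names (`energy`, `temperature`, `anneal`) are illustrative assumptions.

```python
import math
import random


def energy(labels, data, coupling=1.0):
    """Hypothetical energy: data mismatch plus a penalty for label breaks."""
    mismatch = sum(1 for l, d in zip(labels, data) if l != d)
    breaks = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return mismatch + coupling * breaks


def temperature(step, c=2.0):
    """Externally imposed cooling schedule (assumed logarithmic form)."""
    return c / math.log(2.0 + step)


def anneal(data, sweeps=200, seed=0):
    """Metropolis relaxation in which the schedule steers convergence."""
    rng = random.Random(seed)
    labels = list(data)
    for step in range(sweeps):
        t = temperature(step)  # the "external agent" lowers the temperature
        for i in range(len(labels)):
            trial = labels[:]
            trial[i] = 1 - trial[i]  # propose flipping one binary label
            delta = energy(trial, data) - energy(labels, data)
            # Accept downhill moves always, uphill moves with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                labels = trial
    return labels
```

In the BCS account there is no counterpart of `temperature`: only the input pattern changes from one segmentation to the next.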
18. Correlations Which Cannot Be Perceived: Simple Cells, Complex Cells, and Cooperation
Glass and Switkes (1976) described a series of striking displays which they partially explained using the properties of cortical simple cells. Herein we suggest a more complete explanation of their results using properties of the BCS. In their basic display (Figure 26), “a random pattern of dots is superimposed on itself and rotated slightly ... a circular pattern is immediately perceived ... If the same pattern is superimposed on a negative of itself in which the background is a halftone gray and is rotated as before ... it is impossible to perceive the circular Moiré. In this case spiral petal-like patterns can be seen” (p.67). The circular pattern in Figure 26 is not “perceived” in an obvious sense. All that an observer can “see” are black dots on white paper. We suggest that the percept of circular structure is recognized by the Object Recognition System, whereas the Feature Contour System, wherein percepts of brightness and color are seen, generates the filled-in contrast differences that distinguish the black dots from the white background (Figure 1). A similar issue is raised by Figure 10, in which short vertical lines are seen even though emergent long vertical lines influence perceptual grouping. Thus, in the Glass and Switkes (1976) displays, no less than in the Beck, Prazdny, and Rosenfeld (1983) displays, one must sharply distinguish the recognition of perceptual groupings from the percepts that are seen. These recognition events always have properties of “coherence,” whether or not they can support visible contrast differences. It then remains to explain why inverting the contrast of one of the images can alter what is recognized as well as what is seen. We agree with part of the Glass and Switkes (1976) explanation.
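The construction quoted above can be sketched in a few lines. The following is a hypothetical generator, not the procedure used by Glass and Switkes: it superimposes a random dot set on a copy rotated about the origin and reports the orientation of each dot pair with respect to the horizontal. The function names, dot count, and rotation angle are assumptions chosen for illustration.

```python
import math
import random


def glass_pattern(n_dots=200, angle_deg=3.0, seed=0):
    """Return (original, rotated) dot lists; their union is a rotational Glass pattern."""
    rng = random.Random(seed)
    a = math.radians(angle_deg)
    original = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_dots)]
    # Rotate every dot about the origin by the same small angle.
    rotated = [(x * math.cos(a) - y * math.sin(a),
                x * math.sin(a) + y * math.cos(a)) for x, y in original]
    return original, rotated


def pair_orientation_deg(p, q):
    """Orientation of the dot pair (p, q) relative to the horizontal, modulo 180."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0


original, rotated = glass_pattern()
orientations = [pair_orientation_deg(p, q) for p, q in zip(original, rotated)]
```

Because a rotation preserves distance from the center, each pair is perpendicular to the radius through its midpoint, so the pair orientations are tangent to circles about the center. This is the local orientation signal discussed in the text.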
Figure 26. A Glass pattern: The emergent circular pattern is "recognized" although it is not "seen" as a pattern of differing contrasts. The text suggests how this can happen. (Reprinted from Glass and Switkes, 1976.)
Consider a pair of black dots in Figure 26 that arises by rotating one image with respect to the other. Let the orientation of the pair with respect to the horizontal be θ°. Since the dots are close to one another, they can activate receptive fields that have an orientation approximately equal to θ°. This is due to the fact that an oriented receptive field is not an edge detector per se, but rather is sensitive to relative contrast differences across its medial axis. Only one of the two types of receptive fields at each position and orientation will be strongly activated, depending on the direction-of-contrast in the image. Each receptive field is sensitive to direction-of-contrast, even though pairs of these fields corresponding to like positions and orientations pool their activities at the next processing stage to generate an output that is insensitive to direction-of-contrast. We identify cells whose receptive fields are sensitive to direction-of-contrast with simple cells and the cells at the next stage which are insensitive to direction-of-contrast with complex cells of the striate cortex (DeValois, Albrecht, and Thorell, 1982; Gouras and Krüger, 1979; Heggelund, 1981; Hubel and Wiesel, 1962, 1968; Schiller, Finlay, and Volman, 1976; Tanaka, Lee, and Creutzfeldt, 1983). Glass and Switkes (1976) did not proceed beyond this fact. We suggest, in addition, that long-range cooperation within the BCS also plays a crucial role in grouping Glass images. To see how cooperation is engaged, consider two or more pairs of black dots that satisfy the following conditions: Each pair arises by rotating one image with respect to the other. The orientation of all pairs with respect to the horizontal is approximately θ°. All pairs are approximately colinear and do not lie too far apart. Such combinations of dots can more strongly activate the corresponding cooperative cells than can random combinations of dots. Each cooperative cell sends positive feedback to cells at the competitive stages with the same position and orientation.
The competing cells which receive the largest cooperative signals gain an advantage over cells with different orientations. After competition among all possible cooperative groupings takes place, the favored groupings win and generate the large circular Boundary Contour structure that is recognized but not seen. Small circular boundaries are also generated around each dot and support the visible percept of dots on a white background within the Feature Contour System. Thus the orientation θ° of a pair of rotated black dots engages the BCS in two fundamentally different ways. First, it preferentially activates some oriented receptive fields above others. Second, it preferentially activates some cooperative cells above others due to combinations of inputs from preferentially activated receptive fields. As in the displays of Beck, Prazdny, and Rosenfeld (1983), the Glass images probe multiple levels of the BCS. The other Glass images probe different levels of the BCS, notably the way in which simple cells activate complex cells which, in turn, activate the competitive layers. These images are constructed by reversing the contrast of one of the two images before they are superimposed. Then an observer sees black and white dots on a grey background. The recognition of circular macrostructure is, however, replaced by recognition of a more amorphous spiral petal-like pattern. Glass and Switkes (1976) noted that their “hypothesized neural mechanism does not appear to explain the observation of spirallike patterns” (p.71). To explain this recognition, we first note that the black dots on the grey background generate dark-light contrasts, whereas the white dots on the grey background generate light-dark contrasts. Hence the simple cells which responded to pairs of rotated black dots in Figure 26 are now stimulated by only one dot in each pair. Two or more randomly distributed black dots may be close enough to stimulate individual simple cells, but the cells favored by stimulation by two or more random dots will have different orientations than the cells stimulated by two or more rotated black dots in Figure 26. In addition, simple cells that are sensitive to the opposite direction-of-contrast can respond to the white dots on the grey background.
These cells will be spatially rotated with respect to the cells responding to the black dots. Moreover, since the black-to-grey contrast is greater than the white-to-grey contrast, the cells which respond to the black dots will fire more vigorously than the cells which respond to the white dots. Thus although both classes of simple cells feed into the corresponding complex cells, the complex cells which respond to the black dots will be more vigorously activated than the complex cells responding to the white dots. The cooperative stage will favor the most active combinations of complex cells whose orientations are approximately colinear and which are not too far apart. Due to the differences in spatial position and orientation of the most favored competitive cells, a different boundary grouping is generated than in Figure 26. A similar analysis can be given to the Glass and Switkes displays that use complementary colors. In summary, the Glass and Switkes (1976) data emphasize three main points: Although simple cells sensitive to the same orientation and opposite direction-of-contrast feed into complex cells that are insensitive to direction-of-contrast, reversing the direction-of-contrast of some inputs can alter the positions and the orientations of the complex cells that are most vigorously activated. Although many possible groupings of cells can initially activate the cooperative stage, only the most favored groupings can survive the cooperative-competitive feedback exchange, as in Figures 18-25. Although all emergent Boundary Contours can influence the Object Recognition System, not all of these Boundary Contours can support visible filled-in contrast differences within the Feature Contour System. Prazdny (1984) has presented an extensive set of Glass-type displays, which have led him to conclude “that the mechanisms responsible for our perception of Glass patterns are also responsible for the detection of extended contours” (p.476).
Our theory provides a quantitative implementation of this assertion.
19. Border Locking: The Café Wall Illusion
A remarkable percept which is rendered plausible by BCS properties is the café wall illusion (Gregory and Heard, 1979). This illusion is important because it clarifies the conditions under which the spatial alignment of colinear image contours with different contrasts is normally maintained. The illusion is illustrated in Figure 27.
Figure 27. The café wall illusion: Although only horizontal and vertical luminance contours exist in this image, strong diagonal groupings are perceived. (Reprinted from Gregory and Heard, 1979.)
The illusion occurs only if the luminance of the “mortar” within the horizontal strips lies between, or is not far outside, the luminances of the dark and light tiles, as in Figure 27. The illusion occurs, for example, in the limiting case of the Münsterberg figure, in which black and white tiles are separated by a black mortar. Gregory and Heard (1979) have also reported that the tile boundaries appear to “creep across the mortar during luminance changes” (p.368). Using a computer graphics system, we have generated a dynamic display in which the mortar luminance changes continuously through time. The perceived transitions from parallel tiles to wedge-shaped tiles and back are dramatic, if not stunning, using such a dynamic display. Some of the BCS mechanisms that help to clarify this illusion can be inferred from Figure 28. This figure depicts a computer simulation of an orientation field that was generated in response to alternating black and white tiles surrounding a black strip of mortar. Figure 29 schematizes the main properties of Figure 28. The hatched areas in Figure 29a depict the regions in which the greatest activations of oriented receptive fields occur. Due to the approximately horizontal orientations of the activated receptive fields in Figure 29a, diagonal cooperative groupings between positions such as A and B can be initiated, as in Figures 23-25. Figure 28 thus indicates that a macroscopic spatial asymmetry in the activation of oriented receptive fields can contribute to the shifting of borders which leads to the wedge-shaped percepts. Figure 29b schematizes the fact that the microstructure of the orientation field is also skewed in Figure 28. Diagonal orientations tend to point into the black regions at the corners of the white tiles.
Diagonal end cuts induced near positions A and B (Section 6) can thus cooperate between A and B in approximately the same direction as the macrostructure between A and C can cooperate with the macrostructure between B and E (Figure 29a). Diagonal activations near positions C and D can cooperate with each other in a direction almost parallel to the cooperation between A and B. These microscopic and macroscopic cooperative effects can help to make the boundaries at the top of the mortar seem to tilt diagonally downwards.
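A dynamic display of the kind mentioned above needs only a pattern generator whose mortar luminance is a free parameter. The sketch below is a hypothetical reconstruction, not the authors' graphics program: it builds a café wall luminance array in which alternate tile rows are offset by half a tile width and separated by mortar strips. Tile size, mortar thickness, and the luminance values are illustrative assumptions.

```python
def cafe_wall(rows=4, tiles_per_row=6, tile=8, mortar=2,
              dark=0.0, light=1.0, mortar_lum=0.5):
    """Return a 2-D list of luminances for a café wall pattern."""
    width = tiles_per_row * tile
    image = []
    for r in range(rows):
        shift = (tile // 2) if r % 2 else 0  # offset alternate rows by half a tile
        tile_row = [light if ((x + shift) // tile) % 2 else dark
                    for x in range(width)]
        image.extend([tile_row[:] for _ in range(tile)])
        if r < rows - 1:
            # Horizontal mortar strip between successive rows of tiles.
            image.extend([[mortar_lum] * width for _ in range(mortar)])
    return image


def mortar_sweep(steps=5, **kwargs):
    """Frames with mortar luminance stepped from 0 to 1 (steps must be >= 2)."""
    return [cafe_wall(mortar_lum=i / (steps - 1), **kwargs) for i in range(steps)]
```

Setting `mortar_lum` equal to `dark` gives the Münsterberg limiting case, and stepping through the frames of `mortar_sweep` approximates a display in which the mortar luminance changes through time.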
Figure 28. Simulation of the responses of a field of oriented masks to the luminance pattern near the mortar of the café wall illusion: The right of the bottom row joins to the left of the top row. The relative size of the masks used to generate the figure is indicated by the oblong shape in the center.
Figure 29. A schematic depiction of the simulation in Figure 28: (a) shows the region of strong horizontal activity and indicates a possible diagonal grouping between positions A and B. (b) suggests that cooperation may occur in response to direct activations of oriented masks at positions C and D, as well as in response to end cuts at positions A and B. See text for additional discussion.
Several finer points are clarified by the combination of these macroscale and microscale properties. By themselves, the microscale properties do not provide a sufficient explanation of why, for example, an end cut at position D cannot cooperate with direct diagonal activations at A. The macroscale interactions tilt the balance in favor of cooperation between A and B. In the Münsterberg figure, the black mortar under a white tile may seem to glow, whereas the black mortar under a black tile does not. Using a dark grey mortar, the grey mortar under a white tile may seem brighter, whereas the grey mortar under a black tile may better preserve its grey appearance. McCourt (1983) has also called attention to the relevance of brightness induction to explaining the café wall illusion. A partial explanation of these brightness percepts can be inferred from Figure 29. End cuts and diagonal groupings near position A may partially inhibit the parallel boundary between A and C. Brightness can then flow from the white tile downwards, as during neon color spreading (Figure 16). The more vigorous boundary activations above positions such as D and E (Figure 29a) may better contain local featural contrasts within a tighter web of Boundary Contours. This property also helps to explain the observation of Gregory and Heard (1979) that the white tiles seem to be pulled more into the black at positions such as A than at positions such as C. Our analysis of the café wall illusion, although not based on a complete computer simulation, suggests that the same three factors which play an important role in generating the Glass and Switkes (1976) data also play an important role in generating the Gregory and Heard (1979) data. In addition, perpendicular end cuts and multiple spatial scales seem to play a role in generating the Gregory and Heard (1979) data, with different combinations of scales acting between positions such as A-B than positions such as C-D. This last property may explain why opposite sides A and C of an apparently wedge-shaped tile sometimes seem to lie at different depths from an observer (Grossberg, 1983a).
20. Boundary Contour System Stages: Predictions About Cortical Architectures
This section outlines in greater detail the network interactions that we have used to characterize the BCS. Several of these interactions suggest anatomical and physiological predictions about the visual cortex. These predictions refine our earlier predictions that the data of von der Heydt, Peterhans, and Baumgartner (1984) have since supported (Grossberg and Mingolla, 1985). Figure 30 summarizes the proposed BCS interactions. The process whereby Boundary Contours are built up is initiated by the activation of oriented masks, or elongated receptive fields, at each position of perceptual space (Hubel and Wiesel, 1977). An oriented mask is a cell, or cell population, that is selectively responsive to oriented scenic contrast differences. In particular, each mask is sensitive to scenic edges that activate a prescribed small region of the retina, and whose orientations lie within a prescribed band of orientations with respect to the retina. A family of such oriented masks lies at every network position such that each mask is sensitive to a different band of edge orientations within its prescribed small region of the scene.
A. Position, Orientation, Amount-of-Contrast, and Direction-of-Contrast
The first stage of oriented masks is sensitive to the position, orientation, amount-of-contrast, and direction-of-contrast at an edge of a visual scene. Thus two subsets of masks exist corresponding to each position and orientation. One subset responds only to light-dark contrasts and the other subset responds to dark-light contrasts. Such oriented masks do not, however, respond only to scenic edges. They can also respond to any image which generates a sufficiently large net contrast with the correct position, orientation, and direction-of-contrast within their receptive fields, as in Figures 14b and 26.
We identify these cells with the simple cells of striate cortex (DeValois, Albrecht, and Thorell, 1982; Hubel and Wiesel, 1962, 1968; Schiller, Finlay, and Volman, 1976). Pairs of oriented masks which are sensitive to similar positions and orientations but to opposite directions-of-contrast excite the next BCS stage. The output from this stage is thus sensitive to position, orientation, and amount-of-contrast, but is insensitive to direction-of-contrast. A vertical Boundary Contour can thus be activated by either a close-to-vertical light-dark edge or a close-to-vertical dark-light edge at a fixed scenic position, as in Figure 2. The activities of these cells define the orientation field in Figure 5a. We identify the cells at this stage with the complex cells of striate cortex (DeValois, Albrecht, and Thorell, 1982; Gouras and Krüger, 1979; Heggelund, 1981; Hubel and Wiesel, 1962, 1968; Schiller, Finlay, and Volman, 1976; Tanaka, Lee, and Creutzfeldt, 1983). Spitzer and Hochstein (1985) have independently developed an essentially identical model of complex cell receptive fields to explain parametric properties of their cortical data.
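The two-stage scheme just described can be illustrated with a one-dimensional sketch. It is not the mask equation used in the simulations. It assumes a mask that compares summed luminance in its two halves, half-wave rectifies the difference to obtain two responses of opposite direction-of-contrast, and sums them at the next stage. All function names are hypothetical.

```python
def simple_cell_pair(left, right):
    """Two rectified responses with opposite direction-of-contrast preference."""
    diff = sum(left) - sum(right)
    light_dark = max(diff, 0.0)   # fires when the left half is brighter
    dark_light = max(-diff, 0.0)  # fires when the right half is brighter
    return light_dark, dark_light


def complex_cell(left, right):
    """Pool the pair: keeps amount-of-contrast, discards direction-of-contrast."""
    return sum(simple_cell_pair(left, right))


edge = complex_cell([1.0, 1.0], [0.0, 0.0])      # light-dark edge
reversed_edge = complex_cell([0.0, 0.0], [1.0, 1.0])  # dark-light edge
```

Here `edge` and `reversed_edge` are equal, although each is driven by a different member of the simple cell pair, while a weaker contrast such as `complex_cell([1.0, 1.0], [0.5, 0.5])` gives a smaller pooled response.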
[Figure 30 diagram labels: Oriented Cooperation; Dipole Field; CC Loop.]
Figure 30. Circuit diagram of the Boundary Contour System: Inputs activate oriented masks which cooperate at each position and orientation before feeding into an on-center off-surround interaction. This interaction excites like-orientations at the same position and inhibits like-orientations at nearby positions. The affected cells are on-cells within a dipole field. On-cells at a fixed position compete among orientations. On-cells also inhibit off-cells which represent the same position and orientation. Off-cells at each position, in turn, compete among orientations. Both on-cells and off-cells are tonically active. Net excitation (inhibition) of an on-cell (off-cell) excites (inhibits) a cooperative receptive field corresponding to the same position and orientation. Sufficiently strong net positive activation of both receptive fields of a cooperative cell enables it to generate feedback via an on-center off-surround interaction among like-oriented cells. Dipole on-cells which receive the most favorable combination of bottom-up signals and top-down signals generate the emergent perceptual grouping.
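The requirement that both receptive fields of a cooperative cell be activated can be sketched for a single row of like-oriented cells. This is a simplified illustration rather than the cooperative equation of the model: each receptive field (lobe) sums activity within an assumed bandwidth, and the cell fires only when both sums reach an assumed threshold. The names `lobe_sum`, `cooperative_output`, `bandwidth`, and `threshold` are hypothetical.

```python
def lobe_sum(activity, center, direction, bandwidth):
    """Sum like-oriented activity on one side of `center`, out to `bandwidth` cells."""
    total = 0.0
    for d in range(1, bandwidth + 1):
        j = center + direction * d
        if 0 <= j < len(activity):
            total += activity[j]
    return total


def cooperative_output(activity, bandwidth=4, threshold=1.0):
    """Feedback signal per position: nonzero only if both lobes are driven."""
    out = []
    for i in range(len(activity)):
        left = lobe_sum(activity, i, -1, bandwidth)
        right = lobe_sum(activity, i, +1, bandwidth)
        fires = left >= threshold and right >= threshold
        out.append(min(left, right) if fires else 0.0)
    return out


# Two colinear segments separated by a gap of three inactive positions.
row = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
feedback = cooperative_output(row)
```

With these values `feedback` is positive across the gap (positions 3 to 5) and zero at positions 10 and 11, where only one lobe is driven, so the boundary is completed between the segments but does not extend beyond them. Widening the gap beyond the bandwidth, as in `cooperative_output([1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1], bandwidth=3)`, yields no feedback at all, which is a toy analogue of the distance-dependence shown in Figure 25.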
B. On-Center Off-Surround Interaction within Each Orientation
The outputs from these cells activate the first of two successive stages of short-range competition, which are denoted by Competition (I) and Competition (II) in Figures 18-25. At the first competitive stage, a mask of fixed orientation excites the like-oriented cells at its position and inhibits the like-oriented cells at nearby positions. Thus an on-center off-surround interaction between like-oriented cells occurs around each perceptual location. This interaction predicts that a stage subsequent to striate complex cells organizes cells sensitive to like orientations at different positions so that they can engage in the required on-center off-surround interaction.
C. Push-Pull Competition between Orientations at Each Position
The inputs to the second competitive stage are the outputs from the first competitive stage. At the second competitive stage, competition occurs between different orientations at each position. Thus a stage of competition between like orientations at different, but nearby, positions (Competition I) is followed by a stage of competition between different orientations at the same position (Competition II). This second competitive stage is tonically active. Thus inhibition of a vertical orientation excites the horizontal orientation at the same position via disinhibition of its tonic activity. The combined action of the two competitive stages generates the perpendicular end cuts in Figure 5b that we have used to explain the percepts in Figures 7, 8, 12, and 13. Conjoint inhibition of vertical and horizontal orientations by the first competitive stage leading to disinhibition of diagonal orientations at the second competitive stage (Figure 17) was also used to explain the diagonal groupings in Figure 15. A similar interaction was used to help explain the neon color spreading phenomenon described in Figure 16 (Grossberg and Mingolla, 1985).
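The way the two competitive stages combine to produce an end cut can be sketched along the axis of a single vertical Line. The code is a deliberately simplified, additive stand-in for the model's competitive equations, with only two orientations. Competition I applies an on-center off-surround interaction within each orientation, and Competition II lets the tonically active vertical and horizontal cells at each position inhibit one another. The parameter values and function names are assumptions.

```python
def competition_one(inputs, tonic=1.0, surround=0.25):
    """On-center off-surround among like-oriented cells at neighboring positions."""
    n = len(inputs)
    out = []
    for i in range(n):
        neighbors = sum(inputs[j] for j in (i - 1, i + 1) if 0 <= j < n)
        out.append(tonic + inputs[i] - surround * neighbors)
    return out


def competition_two(w_vertical, w_horizontal):
    """Push-pull between perpendicular orientations at each position."""
    y_v = [max(v - h, 0.0) for v, h in zip(w_vertical, w_horizontal)]
    y_h = [max(h - v, 0.0) for v, h in zip(w_vertical, w_horizontal)]
    return y_v, y_h


# Positions 0-3 lie on a vertical Line; positions 4-5 lie just beyond its end.
vertical_input = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
horizontal_input = [0.0] * 6
y_v, y_h = competition_two(competition_one(vertical_input),
                           competition_one(horizontal_input))
```

With these values `y_h` is nonzero only at position 4. The vertical cell just beyond the Line end is pushed below its tonic level by the off-surround, so the horizontal cell at that position is disinhibited, which corresponds to a perpendicular end cut.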
Thus the interactions of the first and second competitive stages help to explain a wide variety of seemingly unrelated perceptual groupings, color percepts, and illusory figures.
D. Dipole Field: Spatial Impenetrability
The process described in this section refines the BCS model that was used in Grossberg and Mingolla (1985). This process incorporates a principle of cortical design which has been used to carry out related functional tasks in Grossberg (1980, 1983b). The functional role played by this process in the BCS can be understood by considering Figure 18c. At the second competitive stage of this figure, horizontal end cuts border the vertical responses to the inducing input Lines. What prevents the end cuts at both sides of each Line from cooperating? If these end cuts could cooperate, then each Line could activate one of a cooperative cell's pair of receptive fields (Figure 9). As a result, horizontal Boundary Contours could be generated throughout the region between pairs of vertical Lines in Figure 18d, even though these Lines are spatially out-of-phase. The problem can thus be summarized as follows: Given the need for a long-range cooperative process to complete boundaries over retinal veins, the blind spot, etc., what prevents this cooperative process from leaping over intervening images and grouping together inappropriate combinations of inputs? In situations wherein no image-induced obstructions prevent such grouping, it can in fact occur, as in Figures 8 and 9. If, however, cooperative grouping could penetrate all perceived objects, then many spurious groupings would occur across every Line. The perceptual space would be transparent with respect to the cooperative process. To prevent this catastrophe, we propose a Postulate of Spatial Impenetrability. This postulate suggests that mechanisms exist which prevent the cooperative process from grouping across all intervening percepts.
Inspection of Figure 18c discloses the primary computational properties that such a process must realize. It must not prevent like-oriented responses from cooperating in a spatially aligned position, because that is the primary functional role of cooperation. It need only prevent like-oriented responses
Chapter 3
(such as the horizontal end cuts in Figure 18a) from cooperating across a region of perpendicularly oriented responses (such as the vertical responses to the vertical Lines in Figure 18c). We therefore hypothesize that the vertical responses to the Lines generate inhibitory inputs to horizontally oriented receptive fields of the cooperative process (Figure 31). The net input due to both horizontal end cuts and vertical Lines at the horizontally oriented cooperative cells is thus very small or negative. As a result, neither receptive field of a horizontally oriented cooperative cell between the vertical Lines can be supraliminally excited. That is why the cooperative responses in Figure 18d ignore the horizontal end cuts.
It remains to say how both excitatory and inhibitory inputs are generated from the second competitive stage to the cooperative stage. We hypothesize that the second competitive stage is a dipole field (Grossberg, 1980, 1983b) and that inputs from the first competitive stage activate the on-cells of this dipole field. Suppose, for example, that an input excites vertically oriented on-cells, which inhibit horizontally oriented on-cells at the same position, as we have proposed in Section 20C. We assume, in addition, that inhibition of the horizontal on-cells excites the horizontal off-cells via disinhibition. The excited vertically oriented on-cells send excitatory inputs to the receptive fields of vertically oriented cooperative cells, whereas the excited horizontally oriented off-cells send inhibitory inputs to the receptive fields of horizontally oriented cooperative cells (Figure 30).
Two new cortical predictions are implied by this dipole field hypothesis:
Both the on-cell subfield and the off-cell subfield of the dipole field are tonically active, thereby enabling their cells to be activated due to disinhibition.
Excitation of on-cells generates excitatory inputs to like-oriented cooperative receptive fields, whereas excitation of off-cells generates inhibitory inputs to like-oriented cooperative receptive fields.
The tonic activity of the on-cell subfield helps to generate perpendicular end cuts, thereby preventing color flow from line ends. The tonic activity of the off-cell subfield helps to inhibit like-oriented cooperative cells, thereby augmenting spatial impenetrability.
E. Long-Range Oriented Cooperation between Like-Oriented Pairs of Input Groupings
The outputs from the dipole field input to a spatially long-range cooperative process. We call this process the boundary completion process. Outputs due to like-oriented dipole field cells that are approximately aligned across perceptual space can cooperate via this process to synthesize an intervening boundary, as in Figures 18-25. A cooperative cell can be activated only if it receives a sufficiently positive net input at both of its orientationally tuned receptive fields (Figure 9). Two types of parameters must be specified to characterize these receptive fields: macroscale parameters which determine the gross shape of each receptive field; and microscale parameters which determine how effectively a dipole field input of prescribed orientation can excite or inhibit a cooperative receptive field. Figure 32 describes a computer simulation of the cooperative receptive field that we used to generate Figures 18-25. The cooperative out-field, or projection field, in Figure 32a describes the interaction strengths, or path weights, from a single horizontally oriented dipole field on-cell to all cells within the cooperative stage. The length of each line is proportional to the size of the interaction strength to on-cells with the depicted positions and orientations. The cooperative in-field, or receptive field, in Figure 32b describes the path weights from all dipole field on-cells with the depicted positions and preferred orientations to a single cooperative cell with a horizontally oriented receptive field. The length of each line is thus proportional to the sensitivity of the receptive field to inputs received from cells coding the depicted positions and orientations. The cell in Figure 32b is most sensitive to horizontally oriented inputs that fall along a horizontal axis passing through the cell. Close-to-horizontal orientations and close-to-horizontal positions can also help to excite the cell, but they are less effective.
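The requirement that both receptive fields of a cooperative cell receive sufficiently positive input can be illustrated with a minimal Python sketch. It assumes the slower-than-linear branch signal g and the threshold-linear output signal h described in the Appendix, and it further assumes that the cell's equilibrium activity is simply the sum of its two branch signals. The names H, K, L, and M and their values are illustrative choices, not the values used in the simulations; the only constraint imposed here is H < M < 2H, so that one saturated branch cannot fire the cell by itself.

```python
def g(s, H=1.0, K=1.0):
    """Slower-than-linear branch signal: zero for s <= 0, saturating below H."""
    s_pos = max(s, 0.0)
    return H * s_pos / (K + s_pos)


def h(z, L=1.0, M=1.2):
    """Threshold-linear output signal: zero until z exceeds the threshold M."""
    return L * max(z - M, 0.0)


def cooperative_output(left_input, right_input):
    """Output of a cooperative cell whose activity is assumed to be g(left) + g(right)."""
    z = g(left_input) + g(right_input)
    return h(z)


# One branch alone, however strongly driven, stays below threshold.
one_sided = cooperative_output(100.0, 0.0)
# Moderate input to both branches exceeds threshold.
two_sided = cooperative_output(5.0, 5.0)
# Net inhibition of one branch (negative input) silences that branch.
blocked = cooperative_output(5.0, -3.0)
```

Because g saturates, no amount of excitation on one side can substitute for the missing side, which is the gating property that the text attributes to the cooperative cell.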
Figures 32a and 32b describe the same information, but from different perspectives of a single dipole field on-cell source (Figure 32a) and a
Figure 31. A mechanism to implement the postulate of spatial impenetrability: The left receptive fields of two horizontally tuned cooperative cells are crossed by a thin vertical Line. Although horizontal end cut signals can excite the upper receptive field, these are cancelled by the greater number of inhibitory inputs due to the vertical Line inputs. Within the lower receptive field, the excitatory inputs due to end cuts prevail.
single cooperative cell sink (Figure 32b). Figure 33 depicts a cooperative out-field (Figure 33a) and in-field (Figure 33b) due to a different choice of numerical parameters. In Figure 33a, a single dipole field on-cell can spray inputs over a spatially broad region, but the orientations that it can excite are narrowly tuned at each position. From the perspective of a cooperative cell's receptive fields, the out-field in Figure 33a generates an in-field which is spatially narrow, but the orientations that can excite it are broadly tuned. Figures 32 and 33 illustrate a duality between in-fields and out-fields that is made rigorous by the equations in the Appendix.
F. On-Center Off-Surround Feedback within Each Orientation
This process refines the BCS system that was described in Grossberg and Mingolla (1985). In Section 8, we suggested that excitatory feedback from the cooperative stage to the second competitive stage (more precisely, to the on-cells of the dipole field) can help to eliminate fuzzy bands of boundaries by providing some orientations with a competitive advantage over other orientations. It is also necessary to provide some positions with a competitive advantage over other positions, so that only the favored orientations and positions will group to form a unique global boundary. Topographically organized excitatory feedback from a cooperative cell to a competitive cell is insufficient, because the spatial fuzziness of the cooperative process (Figure 32) then favors the same orientation at multiple non-colinear positions. Sharp orientational tuning but fuzzy spatial tuning of the resultant boundaries can then occur. We suggest that the cooperative-to-competitive feedback process realizes a Postulate of Spatial Sharpening in the following way. An active cooperative cell can excite like-oriented on-cells at the same position (Figure 30). An active cooperative cell can also inhibit like-oriented on-cells at nearby positions.
Then both orientations and positions which are favored by cooperative groupings gain a competitive advantage within the on-cells of the dipole field. Figures 18-25 show that the emergent groupings tend to be no thicker than the inducing input Lines due to this mechanism. Figure 30 shows that both the bottom-up inputs and the top-down inputs to the dipole field are organized as on-center off-surround interactions among like orientations. The net top-down input is, however, always nonnegative due to the fact that excitatory interneurons are interpolated between the on-center off-surround interaction and the dipole field. If this on-center off-surround interaction were allowed to directly input to the dipole field, then a single Line could generate a spatially expanding lattice of mutually perpendicular secondary, tertiary, and higher-order end cuts via the cooperative-competitive feedback loop. This completes our description of BCS interactions.
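The on-center off-surround feedback described above can be sketched in a few lines of Python. The sketch assumes that the feedback signal is the equilibrium of a shunting interaction among like-oriented cells, v = h / (1 + W * sum of nearby h), evaluated along one spatial dimension; the function name, the kernel strength W, and the neighborhood radius are hypothetical choices made only for illustration. It computes a single top-down pass and does not simulate the full cooperative-competitive loop through which the sharpening of Figures 18-25 emerges.

```python
def feedback_signal(h, W=1.0, radius=1):
    """One top-down pass: each cooperative output h[i] is divided by
    1 + W * (sum of outputs within `radius` positions, itself included)."""
    v = []
    for i, h_i in enumerate(h):
        lo = max(0, i - radius)
        hi = min(len(h), i + radius + 1)
        surround = W * sum(h[lo:hi])
        v.append(h_i / (1.0 + surround))
    return v


# A single active cooperative cell versus a fuzzy band of active cells
# centered on the same position (hypothetical output values).
isolated = feedback_signal([0.0, 0.0, 1.0, 0.0, 0.0])
band = feedback_signal([0.2, 0.6, 1.0, 0.6, 0.2])
```

In this sketch the feedback is never negative, in keeping with the statement that the net top-down input is nonnegative, and the central cell of the fuzzy band receives less feedback than an isolated cell with the same output because its neighbors inhibit it.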
21. Concluding Remarks: Universality of the Boundary Contour System
The Boundary Contour System and Feature Contour System interactions of our theory have suggested quantitative explanations and predictions for a large perceptual and neural data base, including data about perceptual grouping of textures and borders, illusory figures, monocular and binocular brightness percepts, monocular and binocular rivalry, the Land retinex demonstrations, neon color spreading and related filling-in phenomena, complementary color induction, fading of stabilized images, multiple scale interactions, shape-from-shading, metacontrast, hyperacuity, and various other global interactions between depth, lightness, length, and form properties (Cohen and Grossberg, 1984b; Grossberg, 1980, 1983b, 1984a, 1985; Grossberg and Mingolla, 1985). This expanded explanatory and predictive range is due, we believe, to the introduction and quantitative analysis of several fundamental new principles and mechanisms to the perceptual literature, notably the principle of Boundary-Feature Trade-off and the mechanisms governing Boundary Contour System and Feature Contour System interactions.
[Figure 32 panels: (a) OUT FIELD; (b) IN FIELD]
Figure 32. Cooperative in-field and out-field: Line lengths are proportional to the strengths of signals from a horizontally tuned competitive cell to cooperative cells of various orientations at nearby positions. Thus in (a) strong signals are sent to horizontal cooperative cells 5 units to the left or the right of the competitive cell (center circle), but signal strength drops off with distance and change of orientation. (b) shows the dual perspective of weights assigned to incoming signals by the receptive field of a horizontal cooperative cell. (Note that only excitatory signal strengths are indicated in this figure.) The parameters used to generate these fields are the identical ones used in Figures 18-25.
[Figure 33 panels: (a) OUT FIELD; (b) IN FIELD]
Figure 33. Extreme cooperative in-field and out-field: This figure employs more extreme parameter choices than were used in the simulations of Figures 18-25. Greater orientational uncertainty at one location of the in-field corresponds to greater positional uncertainty in the out-field, thereby illustrating the duality between in-field and out-field.
The present article has refined the mechanisms of the Boundary Contour System by using this system to quantitatively simulate emergent perceptual grouping properties that are found in the data of workers like Beck, Prazdny, and Rosenfeld (1983), Glass and Switkes (1976), and Gregory and Heard (1979). We have hereby been led to articulate and instantiate the postulates of spatial impenetrability and of spatial sharpening, and to thereby make some new predictions about prestriate cortical interactions. These results have also shown that several apparently different Gestalt rules can be analysed using the context-sensitive reactions of a single Boundary Contour System. Taken together, these results suggest that a universal set of rules for perceptual grouping of scenic edges, textures, and smoothly shaded regions is well on the way to being characterized.
APPENDIX
Boundary Contour System Equations
The network which we used to define the Boundary Contour System (BCS) is defined in stages below. This network further develops the BCS that was described in Grossberg and Mingolla (1985).
A. Oriented Masks
To define a mask, or oriented receptive field, centered at position $(i,j)$ with orientation $k$, divide the elongated receptive field of the mask into a left-half $L_{ijk}$ and a right-half $R_{ijk}$. Let all the masks sample a field of preprocessed inputs. If $S_{pq}$ equals the preprocessed input to position $(p,q)$ of this field, then the output $J_{ijk}$ from the mask at position $(i,j)$ with orientation $k$ is

$$J_{ijk} = \frac{[U_{ijk} - \alpha V_{ijk}]^+ + [V_{ijk} - \alpha U_{ijk}]^+}{1 + \beta (U_{ijk} + V_{ijk})}, \tag{A1}$$

where

$$U_{ijk} = \sum_{(p,q) \in L_{ijk}} S_{pq}, \tag{A2}$$

$$V_{ijk} = \sum_{(p,q) \in R_{ijk}} S_{pq}, \tag{A3}$$

and the notation $[p]^+ = \max(p, 0)$. The sum of the two terms in the numerator of (A1) says that $J_{ijk}$ is sensitive to the orientation and amount-of-contrast, but not to the direction-of-contrast, received by $L_{ijk}$ and $R_{ijk}$. The denominator term in (A1) enables $J_{ijk}$ to compute a ratio scale in the limit where $\beta (U_{ijk} + V_{ijk})$ is much greater than 1. In all of our simulations, we have chosen $\beta = 0$.
B. On-Center Off-Surround Interaction within Each Orientation (Competition I)
Inputs $J_{ijk}$ with a fixed orientation $k$ activate potentials $w_{ijk}$ at the first competitive stage via on-center off-surround interactions: each $J_{ijk}$ excites $w_{ijk}$ and inhibits $w_{pqk}$ if $|p-i|^2 + |q-j|^2$ is sufficiently small. All the potentials $w_{ijk}$ are also excited by the same tonic input $I$, which supports disinhibitory activations at the next competitive stage. Thus

$$\frac{d}{dt} w_{ijk} = -w_{ijk} + I + f(J_{ijk}) - w_{ijk} \sum_{(p,q)} f(J_{pqk}) A_{pqij}, \tag{A4}$$

where $A_{pqij}$ is the inhibitory interaction strength between positions $(p,q)$ and $(i,j)$ and $f(J_{ijk})$ is the input signal generated by $J_{ijk}$. In our runs, we chose

$$f(J_{ijk}) = B J_{ijk}. \tag{A5}$$
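The oriented mask output and the first competitive stage can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the simulation code: it treats only a vertically oriented mask, it uses rectangular half-fields rather than the elongated fields used in the simulations, and it evaluates the first competitive stage at the equilibrium of a shunting interaction, w = (I + B J) / (1 + B * sum of nearby J * A). The function names, the contrast parameter alpha = 1, and the values I = B = A = 1 are hypothetical; beta = 0 follows the choice reported for the simulations.

```python
def pos(x):
    """Half-wave rectification [x]^+ = max(x, 0)."""
    return max(x, 0.0)


def mask_output(S, i, j, alpha=1.0, beta=0.0, half_len=1, half_width=1):
    """Output J of a vertically oriented mask centered at row i, column j."""
    rows, cols = len(S), len(S[0])
    U = V = 0.0
    for p in range(i - half_len, i + half_len + 1):
        if not 0 <= p < rows:
            continue  # pixels outside the image contribute nothing
        for d in range(1, half_width + 1):
            if j - d >= 0:
                U += S[p][j - d]  # left half of the mask
            if j + d < cols:
                V += S[p][j + d]  # right half of the mask
    return (pos(U - alpha * V) + pos(V - alpha * U)) / (1.0 + beta * (U + V))


def competition_one(J, I=1.0, B=1.0, A=1.0, radius_sq=1):
    """Equilibrium potentials w of the first competitive stage (assumed shunting form)."""
    rows, cols = len(J), len(J[0])
    w = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            inhibition = sum(
                A * B * J[p][q]
                for p in range(rows)
                for q in range(cols)
                if (p - i) ** 2 + (q - j) ** 2 <= radius_sq
            )
            w[i][j] = (I + B * J[i][j]) / (1.0 + inhibition)
    return w


# A vertical luminance edge and its contrast-reversed version.
edge = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
flipped = [[1 - s for s in row] for row in edge]
J = [[mask_output(edge, i, j) for j in range(6)] for i in range(5)]
w = competition_one(J)
```

In this example the mask responds equally to the edge and to its contrast reversal, and it is silent inside the uniform bright region. A cell with no input of its own rests at the tonic level I when its neighbors are silent, and falls below I when a like-oriented neighbor is active, which is the property that supports disinhibition at the next stage.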
Sections (C) and (D) together define the on-cell subfield of the dipole field described in Section 20.
C. Push-Pull Opponent Processes between Orientations at each Position
Perpendicular potentials $w_{ijk}$ and $w_{ijK}$ elicit output signals that compete at their target potentials $x_{ijk}$ and $x_{ijK}$, respectively. For simplicity, we assume that these output signals equal the potentials $w_{ijk}$ and $w_{ijK}$, which are always nonnegative. We also assume that $x_{ijk}$ and $x_{ijK}$ respond quickly and linearly to these signals. Thus

$$x_{ijk} = w_{ijk} - w_{ijK} \tag{A6}$$

and

$$x_{ijK} = w_{ijK} - w_{ijk}. \tag{A7}$$
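The disinhibition performed by the push-pull stage can be illustrated with a few numbers. In this Python sketch the potential values are hypothetical: a tonic level of 1.0 stands for an orientation that receives no net inhibition from the first competitive stage, and 0.25 stands for a vertical potential that has been inhibited just beyond the end of a vertical Line.

```python
def push_pull(w_k, w_K):
    """Opponent potentials for an orientation k and its perpendicular K:
    each potential is its own input minus the perpendicular input."""
    return w_k - w_K, w_K - w_k


TONIC = 1.0  # assumed resting level of every first-stage potential

# No input at this position: both orientations rest at the tonic level.
x_vert_rest, x_horiz_rest = push_pull(TONIC, TONIC)

# Just beyond a vertical Line end: the vertical potential is inhibited
# below the tonic level while the horizontal potential stays at rest.
x_vert_end, x_horiz_end = push_pull(0.25, TONIC)
```

At rest the two potentials cancel. Beyond the Line end the horizontal opponent potential becomes positive although no horizontal input is present, which is how inhibition of one orientation yields a perpendicular end cut in this scheme.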
D. Normalization at each Position
We also assume that, as part of this push-pull opponent process, the outputs $y_{ijk}$ of the second competitive stage become normalized. Several ways exist for achieving this property (Grossberg, 1983a). We have used the following approach. The potentials $x_{ijk}$ interact when they become positive. Thus we let the output $O_{ijk} = O(x_{ijk})$ from $x_{ijk}$ equal

$$O_{ijk} = C [w_{ijk} - w_{ijK}]^+, \tag{A8}$$

where $C$ is a positive constant and $[p]^+ = \max(p, 0)$. All these outputs at each position interact via a shunting on-center off-surround network whose potentials $y_{ijk}$ satisfy

$$\frac{d}{dt} y_{ijk} = -D y_{ijk} + (E - y_{ijk}) O_{ijk} - y_{ijk} \sum_{m \neq k} O_{ijm}. \tag{A9}$$

Each potential $y_{ijk}$ equilibrates rapidly to its input. Setting $\frac{d}{dt} y_{ijk} = 0$ in (A9) implies that

$$y_{ijk} = \frac{E O_{ijk}}{D + O_{ij}}, \tag{A10}$$

where

$$O_{ij} = \sum_{m=1}^{n} O_{ijm}. \tag{A11}$$

Thus if $D$ is small compared to $O_{ij}$, then $\sum_{m=1}^{n} y_{ijm} \cong E$.
E. Opponent Inputs to the Cooperative Stage
The next process refines the BCS model used in Grossberg and Mingolla (1985). It helps to realize the Postulate of Spatial Impenetrability that was described in Section 20. The $w_{ijk}$, $x_{ijk}$, and $y_{ijk}$ potentials are all assumed to be part of the on-cell subfield of a dipole field. If $y_{ijk}$ is excited, an excitatory signal $f(y_{ijk})$ is generated at the cooperative stage. When potential $y_{ijk}$ is excited, the potential $y_{ijK}$ corresponding to the perpendicular orientation is inhibited. Both of these potentials form part of the on-cell subfield of a dipole field. Inhibition of an on-cell potential $y_{ijK}$ disinhibits the corresponding off-cell potential $\bar{y}_{ijK}$, which sends an inhibitory signal $-f(\bar{y}_{ijK})$ to the cooperative level. The signals $f(y_{ijk})$ and $-f(\bar{y}_{ijK})$ thus occur together. In order to instantiate these properties, we made the simplest hypothesis, namely that

$$\bar{y}_{ijK} = y_{ijk}. \tag{A12}$$
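The normalization at each position can be checked numerically with a short Python sketch of the equilibrium outputs, y = E * O / (D + total O) with O = C * [x]^+. The function name and the parameter values C = 1, D = 0.01, and E = 1 are hypothetical choices for illustration, and the list x holds made-up opponent potentials for four orientations at one position.

```python
def normalize(x, C=1.0, D=0.01, E=1.0):
    """Equilibrium outputs of the shunting normalization at one position.

    x: opponent potentials, one per orientation (may be negative).
    Returns y, where y[k] = E * O[k] / (D + sum(O)) and O[k] = C * max(x[k], 0).
    """
    O = [C * max(x_k, 0.0) for x_k in x]
    total = sum(O)
    return [E * O_k / (D + total) for O_k in O]


weak = normalize([0.75, -0.75, 0.25, -0.25])
strong = normalize([7.5, -7.5, 2.5, -2.5])  # same pattern, ten times larger
```

Both calls return outputs whose sum is close to E, and the ratio between orientations is preserved, so the stage passes on the relative pattern of orientational activity rather than its absolute size.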
F. Oriented Cooperation: Statistical Gates
The cooperative potential $z_{ijk}$ can be supraliminally activated only if both of its cooperative input branches receive enough net positive excitation from similarly aligned competitive potentials (Figure 9). Thus

$$\frac{d}{dt} z_{ijk} = -z_{ijk} + g\Big( \sum_{(p,q,r)} [f(y_{pqr}) - f(\bar{y}_{pqr})] F_{pqij}^{(r,k)} \Big) + g\Big( \sum_{(p,q,r)} [f(y_{pqr}) - f(\bar{y}_{pqr})] G_{pqij}^{(r,k)} \Big). \tag{A13}$$

In (A13), $g(s)$ is a signal function that becomes positive only when $s$ is positive, and has a finite maximum value. A slower-than-linear function

$$g(s) = \frac{H [s]^+}{K + [s]^+} \tag{A14}$$

was used in our simulations. A sum of two sufficiently positive $g(s)$ terms in (A13) is needed to activate $z_{ijk}$ above the firing threshold of its output signal $h(z_{ijk})$. A threshold-linear signal function

$$h(z) = L [z - M]^+ \tag{A15}$$

was used. Each sum such as

$$\sum_{(p,q,r)} f(y_{pqr}) F_{pqij}^{(r,k)} \tag{A16}$$

and

$$\sum_{(p,q,r)} f(y_{pqr}) G_{pqij}^{(r,k)} \tag{A17}$$

is a spatial cross-correlation that adds up inputs from a strip with orientation (approximately equal to) $k$ that lies to one side or the other of position $(i,j)$, as in Figures 31 and 32. The orientations $r$ that contribute to the spatial kernels $F_{pqij}^{(r,k)}$ and $G_{pqij}^{(r,k)}$ also approximately equal $k$. The kernels $F_{pqij}^{(r,k)}$ and $G_{pqij}^{(r,k)}$ are defined by

$$F_{pqij}^{(r,k)} = \Big[ \exp\big[-2 (N_{pqij}/P - 1)^2\big] \, |\cos(Q_{pqij} - r)|^R \, [\cos(Q_{pqij} - k)]^T \Big]^+ \tag{A18}$$

and

$$G_{pqij}^{(r,k)} = \Big[ -\exp\big[-2 (N_{pqij}/P - 1)^2\big] \, |\cos(Q_{pqij} - r)|^R \, [\cos(Q_{pqij} - k)]^T \Big]^+, \tag{A19}$$

where

$$N_{pqij} = \sqrt{(p-i)^2 + (q-j)^2}, \tag{A20}$$

$$Q_{pqij} = \arctan\Big( \frac{q-j}{p-i} \Big), \tag{A21}$$

and $P$, $R$, and $T$ are positive constants. In particular, $R$ and $T$ are odd integers. Kernels $F$ and $G$ differ only by a minus sign under the $[\cdots]^+$ sign. This minus sign determines the polarity of the kernel; namely, whether it collects inputs for $z_{ijk}$ from one side or the other of position $(i,j)$. Term $\exp[-2(N_{pqij}/P - 1)^2]$ determines the optimal distance $P$ from $(i,j)$ at which each kernel collects its inputs. The kernel decays in a Gaussian fashion as a function of $N_{pqij}/P$, where $N_{pqij}$ in (A20) is the distance between $(p,q)$ and $(i,j)$. The cosine terms in (A18) and (A19) determine the orientational tuning of the kernels. By (A21), $Q_{pqij}$ is the direction of position $(p,q)$ with respect to the position of the cooperative cell $(i,j)$ in (A13). Term $|\cos(Q_{pqij} - r)|$ in (A18) and (A19) computes how parallel $Q_{pqij}$ is to the receptive field orientation $r$ at position $(p,q)$. By (A21), term $|\cos(Q_{pqij} - r)|$ is maximal when the orientation $r$ equals the orientation of $(p,q)$ with respect to $(i,j)$. The absolute value sign around this term prevents it from becoming negative. Term $\cos(Q_{pqij} - k)$ in (A18) and (A19) computes how parallel $Q_{pqij}$ is to the orientation $k$ of the receptive field of the cooperative cell $(i,j)$ in (A13). By (A21), term $\cos(Q_{pqij} - k)$ is maximal when the orientation $k$ equals the orientation of $(p,q)$ with respect to $(i,j)$. Positions $(p,q)$ such that $\cos(Q_{pqij} - k) < 0$ do not input to $z_{ijk}$ via kernel $F$ because the $[\cdots]^+$ of a negative number equals zero. On the other hand, such positions $(p,q)$ may input to $z_{ijk}$ via kernel $G$ due to the extra minus sign in the definition of kernel $G$. The extra minus sign in (A19) flips the preferred axis of orientation of kernel $G_{pqij}^{(r,k)}$ with respect to the kernel $F_{pqij}^{(r,k)}$ in order to define the two input-collecting branches of each cooperative cell, as in Figures 8 and 30. The product terms $|\cos(Q_{pqij} - r)|^R [\cos(Q_{pqij} - k)]^T$ in (A18) and (A19) thus determine larger path weights from dipole field on-cells whose positions and orientations are nearly parallel to the preferred orientation $k$ of the cooperative cell $(i,j)$, and larger path weights from dipole field off-cells whose positions and orientations are nearly perpendicular to the preferred orientation $k$ of the cooperative cell $(i,j)$. The powers $R$ and $T$ determine the sharpness of orientational tuning: Higher powers enforce sharper tuning.
G. On-Center Off-Surround Feedback within Each Orientation
The next process refines the BCS model used in Grossberg and Mingolla (1985). It helps to realize the Postulate of Spatial Sharpening that was described in Section 20. We assume that each $z_{ijk}$ activates a shunting on-center off-surround interaction within each orientation $k$. The target potentials $v_{ijk}$ therefore obey an equation of the form

$$\frac{d}{dt} v_{ijk} = -v_{ijk} + h(z_{ijk}) - v_{ijk} \sum_{(p,q)} h(z_{pqk}) W_{pqij}. \tag{A22}$$
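The bottom-up transformation $J_{ijk} \to w_{ijk}$ in (A4) is thus similar to the top-down transformation $z_{ijk} \to v_{ijk}$ in (A22). Functionally, the $z_{ijk} \to v_{ijk}$ transformation enables the most favored cooperations to enhance their preferred positions and orientation as they suppress nearby positions with the same orientation. The signals $v_{ijk}$ take effect by inputting to the $w_{ijk}$ opponent process. Equation (A4) is thus changed to

$$\frac{d}{dt} w_{ijk} = -w_{ijk} + I + f(J_{ijk}) + v_{ijk} - w_{ijk} \sum_{(p,q)} f(J_{pqk}) A_{pqij}. \tag{A23}$$

At equilibrium, the computational logic of the BCS is determined, up to parameter choices, by the equations

$$J_{ijk} = \frac{[U_{ijk} - \alpha V_{ijk}]^+ + [V_{ijk} - \alpha U_{ijk}]^+}{1 + \beta (U_{ijk} + V_{ijk})}, \tag{A24}$$

$$w_{ijk} = \frac{I + B J_{ijk} + v_{ijk}}{1 + B \sum_{(p,q)} J_{pqk} A_{pqij}}, \tag{A25}$$

$$O_{ijk} = C [w_{ijk} - w_{ijK}]^+, \tag{A26}$$

$$y_{ijk} = \frac{E O_{ijk}}{D + O_{ij}}, \tag{A27}$$

$$z_{ijk} = g\Big( \sum_{(p,q,r)} [f(y_{pqr}) - f(\bar{y}_{pqr})] F_{pqij}^{(r,k)} \Big) + g\Big( \sum_{(p,q,r)} [f(y_{pqr}) - f(\bar{y}_{pqr})] G_{pqij}^{(r,k)} \Big), \tag{A28}$$

and

$$v_{ijk} = \frac{h(z_{ijk})}{1 + \sum_{(p,q)} h(z_{pqk}) W_{pqij}}. \tag{A29}$$

Wherever possible, simple spatial kernels were used. For example, the kernels $W_{pqij}$ in (A22) and $A_{pqij}$ in (A23) were both chosen to be constant within a circular receptive field:

$$A_{pqij} = \begin{cases} A & \text{if } (p-i)^2 + (q-j)^2 \le A_0 \\ 0 & \text{otherwise} \end{cases} \tag{A30}$$

and

$$W_{pqij} = \begin{cases} W & \text{if } (p-i)^2 + (q-j)^2 \le W_0 \\ 0 & \text{otherwise.} \end{cases} \tag{A31}$$

The oriented receptive fields $L_{ijk} \cup R_{ijk}$ in (A2) and (A3) were chosen to have parallel linear sides with hemicircular ends.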
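The cooperative kernels F and G discussed in Section F can be explored with the following Python sketch. It follows the verbal description of the kernel terms: a Gaussian-like distance term peaked at distance P, an absolute-value cosine term raised to the power R for the input orientation r, and a signed cosine term raised to the odd power T for the cooperative cell's orientation k. Three choices are assumptions of the sketch rather than statements from the text: the direction Q is computed with atan2 so that it covers the full circle (a plain arctangent would not distinguish the two sides of the cell), the kernels are set to zero at the cell's own position where the direction is undefined, and the values P = 5, R = 3, T = 3 are illustrative.

```python
import math


def kernels(p, q, i, j, r, k, P=5.0, R=3, T=3):
    """Return (F, G): the two branch weights from an input at position (p, q)
    with orientation r to a cooperative cell at (i, j) with orientation k.
    Orientations are in radians; p is the horizontal coordinate."""
    dx, dy = p - i, q - j
    N = math.hypot(dx, dy)          # distance between (p, q) and (i, j)
    if N == 0.0:
        return 0.0, 0.0             # direction undefined (sketch choice)
    Q_dir = math.atan2(dy, dx)      # direction of (p, q) as seen from (i, j)
    core = (math.exp(-2.0 * (N / P - 1.0) ** 2)
            * abs(math.cos(Q_dir - r)) ** R
            * math.cos(Q_dir - k) ** T)
    return max(core, 0.0), max(-core, 0.0)


# Horizontal cooperative cell (k = 0) at the origin.
F_right, G_right = kernels(5, 0, 0, 0, r=0.0, k=0.0)    # aligned input, right side
F_left, G_left = kernels(-5, 0, 0, 0, r=0.0, k=0.0)     # aligned input, left side
F_far, _ = kernels(10, 0, 0, 0, r=0.0, k=0.0)           # aligned but twice as far
F_perp, _ = kernels(5, 0, 0, 0, r=math.pi / 2, k=0.0)   # perpendicular input
```

With these choices an aligned input at distance P to the right drives only the F branch, the mirror-image input on the left drives only the G branch, and inputs that are farther away or perpendicular in orientation contribute much less.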
REFERENCES
Beck, J., Perceptual grouping produced by changes in orientation and shape. Science, 1966, 154, 538-540 (a).
Beck, J., Effect of orientation and of shape similarity on perceptual grouping. Perception and Psychophysics, 1966, 1, 300-302 (b).
Beck, J., Similarity grouping and peripheral discriminability under uncertainty. American Journal of Psychology, 1972, 85, 1-19.
Beck, J., Textural segmentation. In J. Beck (Ed.), Organization and representation in perception. Hillsdale, NJ: Erlbaum, 1982.
Beck, J., Textural segmentation, second-order statistics, and textural elements. Biological Cybernetics, 1983, 48, 125-130.
Beck, J., Prazdny, K., and Rosenfeld, A., A theory of textural segmentation. In J. Beck, B. Hope, and A. Rosenfeld (Eds.), Human and machine vision. New York: Academic Press, 1983.
Caelli, T., On discriminating visual textures and images. Perception and Psychophysics, 1982, 31, 149-159.
Caelli, T., Energy processing and coding factors in texture discrimination and image processing. Perception and Psychophysics, 1983, 34, 349-355.
Caelli, T. and Dodwell, P.C., The discrimination of structure in vectorgraphs: Local and global effects. Perception and Psychophysics, 1982, 32, 314-326.
Caelli, T. and Julesz, B., Psychophysical evidence for global feature processing in visual texture discrimination. Journal of the Optical Society of America, 1979, 69, 675-677.
Carpenter, G.A. and Grossberg, S., Adaptation and transmitter gating in vertebrate photoreceptors. Journal of Theoretical Neurobiology, 1981, 1, 1-42.
Carpenter, G.A. and Grossberg, S., Dynamic models of neural systems: Propagated signals, photoreceptor transduction, and circadian rhythms. In J.P.E. Hodgson (Ed.), Oscillations in mathematical biology. New York: Springer-Verlag, 1983.
Carpenter, G.A. and Grossberg, S., Neural dynamics of category learning and recognition: Attention, memory consolidation, and amnesia. In J. Davis, W. Newburgh, and E. Wegman (Eds.), Brain structure, learning, and memory. AAAS Symposium Series, in press, 1985 (a).
Carpenter, G.A. and Grossberg, S., Neural dynamics of category learning and recognition: Structural invariants, evoked potentials, and reinforcement. In M. Commons, R. Herrnstein, and S. Kosslyn (Eds.), Pattern recognition and concepts in animals, people, and machines. Hillsdale, NJ: Erlbaum, 1985 (b).
Cohen, M.A. and Grossberg, S., Some global properties of binocular resonances: Disparity matching, filling-in, and figure-ground synthesis. In P. Dodwell and T. Caelli (Eds.), Figural synthesis. Hillsdale, NJ: Erlbaum, 1984 (a).
Cohen, M.A. and Grossberg, S., Neural dynamics of brightness perception: Features, boundaries, diffusion, and resonance. Perception and Psychophysics, 1984, 36, 428-456 (b).
Cohen, M.A. and Grossberg, S., Neural dynamics of speech and language coding: Developmental programs, perceptual grouping, and competition for short term memory. Human Neurobiology, in press, 1985.
Desimone, R., Schein, S.J., Moran, J., and Ungerleider, L.G., Contour, color, and shape analysis beyond the striate cortex. Vision Research, 1985, 25, 441-452.
Dev, P., Perception of depth surfaces in random-dot stereograms: A neural model. International Journal of Man-Machine Studies, 1975, 7, 511-528.
DeValois, R.L., Albrecht, D.G., and Thorell, L.G., Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 1982, 22, 545-559.
Dodwell, P.C., The Lie transformation group model of visual perception. Perception and Psychophysics, 1983, 34, 1-16.
Ejima, Y., Redies, C., Takahashi, S., and Akita, M., The neon color effect in the Ehrenstein pattern: Dependence on wavelength and illuminance. Vision Research, 1984, 24, 1719-1726.
Ellias, S. and Grossberg, S., Pattern formation, contrast control, and oscillations in the short term memory of shunting on-center off-surround networks. Biological Cybernetics, 1975, 20, 69-98.
Geman, S. and Geman, D., Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984, 6, 721-741.
Glass, L. and Switkes, E., Pattern recognition in humans: Correlations which cannot be perceived. Perception, 1976, 5, 67-72.
Gouras, P. and Krüger, J., Responses of cells in foveal visual cortex of the monkey to pure color contrast. Journal of Neurophysiology, 1979, 42, 850-860.
Gregory, R.L., Eye and brain. New York: McGraw-Hill, 1966.
Gregory, R.L. and Heard, P., Border locking and the Café Wall illusion. Perception, 1979, 8, 365-380.
Grossberg, S., Contour enhancement, short-term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 1973, 52, 217-257.
Grossberg, S., How does a brain build a cognitive code? Psychological Review, 1980, 87, 1-51.
Grossberg, S., Studies of mind and brain: Neural principles of learning, perception, development, cognition, and motor control. Boston: Reidel Press, 1982.
Grossberg, S., The quantized geometry of visual space: The coherent computation of depth, form, and lightness. Behavioral and Brain Sciences, 1983, 6, 625-692 (a).
Grossberg, S., Neural substrates of binocular form perception: Filtering, matching, diffusion, and resonance. In E. Basar, H. Flohr, H. Haken, and A.J. Mandell (Eds.), Synergetics of the brain. New York: Springer-Verlag, 1983 (b).
Grossberg, S., Outline of a theory of brightness, color, and form perception. In E. Degreef and J. van Buggenhaut (Eds.), Trends in mathematical psychology. Amsterdam: North-Holland, 1984 (a).
Grossberg, S., Some psychophysiological and pharmacological correlates of a developmental, cognitive, and motivational theory. In R. Karrer, J. Cohen, and P. Tueting (Eds.), Brain and information: Event related potentials. New York: New York Academy of Sciences, 1984 (b).
Grossberg, S., Cortical dynamics of depth, brightness, color, and form perception: A predictive synthesis. Submitted for publication, 1985.
Grossberg, S. and Levine, D., Some developmental and attentional biases in the contrast enhancement and short term memory of recurrent neural networks. Journal of Theoretical Biology, 1975, 53, 341-380.
Grossberg, S. and Mingolla, E., Neural dynamics of form perception: Boundary completion, illusory figures, and neon color spreading. Psychological Review, 1985, 92, 173-211.
Grossberg, S. and Mingolla, E., Neural dynamics of surface perception: Boundary webs, illuminants, and shape-from-shading. In preparation, 1986.
Heggelund, P., Receptive field organisation of complex cells in cat striate cortex. Experimental Brain Research, 1981, 42, 99-107.
Helmholtz, H.L.F. von, Treatise on physiological optics, J.P.C. Southall (Trans.). New York: Dover, 1962.
Hoffman, W.C., Higher visual perception as prolongation of the basic Lie transformation group. Mathematical Biosciences, 1970, 6, 437-471.
Horn, B.K.P., Understanding image intensities. Artificial Intelligence, 1977, 8, 201-231.
Hubel, D.H. and Wiesel, T.N., Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology, 1962, 160, 106-154.
Hubel, D.H. and Wiesel, T.N., Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 1968, 195, 215-243.
Hubel, D.H. and Wiesel, T.N., Functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London (B), 1977, 198, 1-59.
Julesz, B., Binocular depth perception of computer-generated patterns. Bell System Technical Journal, 1960, 39, 1125-1162.
Julesz, B., Foundations of cyclopean perception. Chicago: University of Chicago Press, 1971.
Kanizsa, G., Margini quasi-percettivi in campi con stimolazione omogenea. Revista di Psicologia, 1955, 49, 7-30.
Kaplan, G.A., Kinetic disruption of optical texture: The perception of depth at an edge. Perception and Psychophysics, 1969, 6, 193-198.
Kawabata, N., Perception at the blind spot and similarity grouping. Perception and Psychophysics, 1984, 36, 151-158.
Krauskopf, J., Effect of retinal image stabilization on the appearance of heterochromatic targets. Journal of the Optical Society of America, 1963, 53, 741-744.
Land, E.H., The retinex theory of color vision. Scientific American, 1977, 237, 108-128.
Marr, D. and Hildreth, E., Theory of edge detection. Proceedings of the Royal Society of London (B), 1980, 207, 187-217.
Marr, D. and Poggio, T., Cooperative computation of stereo disparity. Science, 1976, 194, 283-287.
McCourt, M.E., Brightness induction and the café wall illusion. Perception, 1983, 12, 131-142.
Neisser, U., Cognitive psychology. New York: Appleton-Century-Crofts, 1967.
Prandtl, A., Über gleichsinnige induktion und die lichtverteilung in gitterartigen mustern. Zeitschrift für Sinnesphysiologie, 1927, 58, 263-307.
Prazdny, K., Illusory contours are not caused by simultaneous brightness contrast. Perception and Psychophysics, 1983, 34, 403-404.
Prazdny, K., On the perception of Glass patterns. Perception, 1984, 13, 469-478.
Prazdny, K., On the nature of inducing forms generating perception of illusory contours. Perception and Psychophysics, 1985, 37, 237-242.
Preyer, W., On certain optical phenomena: Letter to Professor E.C. Sanford. American Journal of Psychology, 1897/98, 9, 42-44.
Ratliff, F., Mach bands: Quantitative studies on neural networks in the retina. New York: Holden-Day, 1965.
Redies, C. and Spillmann, L., The neon color effect in the Ehrenstein illusion. Perception, 1981, 10, 667-681.
Redies, C., Spillmann, L., and Kunz, K., Colored neon flanks and line gap enhancement. Vision Research, 1984, 24, 1301-1309.
Schatz, B.R., The computation of immediate texture discrimination. MIT AI Memo 426, 1977.
Schiller, P.H., Finlay, B.L., and Volman, S.F., Quantitative studies of single-cell properties in monkey striate cortex, I: Spatiotemporal organization of receptive fields. Journal of Neurophysiology, 1976, 39, 1288-1319.
Schwartz, E.L., Desimone, R., Albright, T., and Gross, C.G., Shape recognition and inferior temporal neurons. Proceedings of the National Academy of Sciences, 1983, 80, 5776-5778.
Shapley, R. and Gordon, J., Nonlinearity in the perception of form. Perception and Psychophysics, 1985, 37, 84-88.
Sperling, G., Binocular vision: A physical and a neural theory. American Journal of Psychology, 1970, 83, 461-534.
Spillmann, L., Illusory brightness and contour perception: Current status and unresolved problems. Submitted for publication, 1985.
Spitzer, H. and Hochstein, S., A complex-cell receptive field model. Journal of Neurophysiology, 1985, 53, 1266-1286.
Tanaka, M., Lee, B.B., and Creutzfeldt, O.D., Spectral tuning and contour representation in area 17 of the awake monkey. In J.D. Mollon and L.T. Sharpe (Eds.), Colour vision. New York: Academic Press, 1983.
Todorović, D., Brightness perception and the Craik-O’Brien-Cornsweet effect. Unpublished M.A. Thesis. Storrs, CT: University of Connecticut, 1983.
van Tuijl, H.F.J.M., A new visual illusion: Neonlike color spreading and complementary color induction between subjective contours. Acta Psychologica, 1975, 39, 441-445.
van Tuijl, H.F.J.M. and de Weert, C.M.M., Sensory conditions for the occurrence of the neon spreading illusion. Perception, 1979, 8, 211-215.
van Tuijl, H.F.J.M. and Leeuwenberg, E.L.J., Neon color spreading and structural information measures. Perception and Psychophysics, 1979, 25, 269-284.
von der Heydt, R., Peterhans, E., and Baumgartner, G., Illusory contours and cortical neuron responses. Science, 1984, 224, 1260-1262.
Wertheimer, M., Untersuchungen zur Lehre von der Gestalt, II. Psychologische Forschung, 1923, 4, 301-350.
Wolfe, J.M., Global factors in the Hermann grid illusion. Perception, 1984, 13, 33-40.
Yarbus, A.L., Eye movements and vision. New York: Plenum Press, 1967.
Zeki, S., Colour coding in the cerebral cortex: The reaction of cells in monkey visual cortex to wavelengths and colours. Neuroscience, 1983, 9, 741-765 (a).
Zeki, S., Colour coding in the cerebral cortex: The responses of wavelength-selective and colour coded cells in monkey visual cortex to changes in wavelength composition. Neuroscience, 1983, 9, 767-791 (b).
Zucker, S.W., Early orientation selection: Tangent fields and the dimensionality of their support. Technical Report 85-13-R, McGill University, Montreal, 1985.
Chapter 4

NEURAL DYNAMICS OF BRIGHTNESS PERCEPTION: FEATURES, BOUNDARIES, DIFFUSION, AND RESONANCE

Preface

This Chapter describes some of the simulations of paradoxical brightness data which led us to realize that a diffusive filling-in process exists which is contained by boundary contours. These computer simulations also enabled us to mathematically characterize interactions between brightness contours, boundary contours, and featural filling-in that are capable of quantitatively simulating the targeted brightness data. At the time that these simulations were being completed, we knew of no direct neurophysiological evidence for a system with these detailed properties, although we qualitatively interpreted the cortical filling-in system to be functionally homologous to the horizontal cell layers in fish and higher organisms. After completing our simulations, we were gratified to read the 1984 data of Piccolino, Neyton, and Gerschenfeld on the H1 horizontal cells of the turtle retina. These data reported a formally identical type of filling-in interaction among these horizontal cells. If a functional homolog does exist between retinal filling-in and cortical filling-in, then we may infer from the Piccolino et al. data that the chemical transmitter which delivers boundary contour signals to the cortical syncytium may be a catecholamine.

We chose to simulate a set of monocular brightness data which no single previous theory had been able to explain. The multiple constraints which these data imposed upon us guided us to our filling-in model. As the Chapter describes, these brightness data, taken together, challenge classical concepts of edge processing, and force one to consider how interactions between brightness and form processing govern the brightness profiles that we perceive.
In addition to simulating monocular brightness data using Boundary Contour System and Feature Contour System interactions, the Chapter also describes computer simulations of a demanding set of binocular brightness data which were performed using the FIRE theory. Both types of theory thus help to explain their targeted data quite well, and both do so better than rival theories. On the other hand, the theory as a whole then uses two different types of filling-in, one monocular (diffusion) and the other binocular (FIRE), and two different types of cooperative-competitive interactions, one monocular (CC Loop) and the other binocular (FIRE). In addition, as the FIRE theory began to be used to analyse 3-dimensional percepts of complex 2-dimensional images, the analysis seemed to become unnecessarily complicated. These inelegances ultimately focused my attention upon the following demanding problem: How can the FIRE theory be replaced by a binocular theory of the Boundary Contour System and Feature Contour System while preserving all of the good properties of the FIRE process within a theory with an expanded predictive range? Why did the FIRE theory work so well if it could be replaced in such a fashion?

I have recently completed a theory of 3-dimensional form, color, and brightness perception which satisfies all of these concerns. In this theory, many of the key concepts from the FIRE process, such as filling-in generator and filling-in barrier, play a basic role. In addition, the monocular rules for the Boundary Contour and Feature Contour Systems generalize, indeed form the foundation for the 3-D form theory. Thus the theory has again developed in an evolutionary way. The range of perceptual and neural data which can now be analysed due to this synthesis is much larger, although I am now surer than ever that the process of evolutionary theoretical refinement is as yet far from over.
Perception and Psychophysics 30, 428-450 (1984) © 1985 The Psychonomic Society, Inc. Reprinted by permission of the publisher
NEURAL DYNAMICS OF BRIGHTNESS PERCEPTION: FEATURES, BOUNDARIES, DIFFUSION, AND RESONANCE
Michael A. Cohen† and Stephen Grossberg‡
Abstract
A real-time visual processing theory is used to unify the explanation of monocular and binocular brightness data. This theory describes adaptive processes which overcome limitations of the visual uptake process to synthesize informative visual representations of the external world. The brightness data include versions of the Craik-O’Brien-Cornsweet effect and its exceptions, Bergström’s demonstrations comparing the brightnesses of smoothly modulated and step-like luminance profiles, Hamada’s demonstrations of nonclassical differences between the perception of luminance decrements and increments, Fechner’s paradox, binocular brightness averaging, binocular brightness summation, binocular rivalry, and fading of stabilized images and ganzfelds. Familiar concepts such as spatial frequency analysis, Mach bands, and edge contrast are relevant but insufficient to explain the totality of these data. Two parallel contour-sensitive processes interact to generate the theory’s brightness, color, and form explanations. A boundary-contour process is sensitive to the orientation and amount of contrast but not to the direction of contrast in scenic edges. It generates contours that form the boundaries of monocular perceptual domains. The spatial patterning of these contours is sensitive to the global configuration of scenic elements. A feature-contour process is insensitive to the orientation of contrast, but is sensitive to both the amount of contrast and to the direction of contrast in scenic edges. It triggers a diffusive filling-in reaction of featural quality within perceptual domains whose boundaries are dynamically defined by boundary contours. The boundary-contour system is hypothesized to include the hypercolumns in visual striate cortex. The feature-contour system is hypothesized to include the blobs in visual striate cortex.
These preprocessed monocular activity patterns enter consciousness in the theory via a process of resonant binocular matching that is capable of selectively lifting whole monocular patterns into a binocular representation of form-and-color-in-depth. This binocular process is hypothesized to occur in area V4 of the visual prestriate cortex.
† Supported in part by the National Science Foundation (NSF IST-80-00257) and the Office of Naval Research (ONR N00014-83-K0337).
‡ Supported in part by the Air Force Office of Scientific Research (AFOSR 82-0148) and the Office of Naval Research (ONR N00014-83-K0337).
1. Paradoxical Percepts as Probes of Adaptive Processes
This article describes quantitative simulations of monocular and binocular brightness data to illustrate and support a real-time perceptual processing theory. This theory introduces new concepts and mechanisms concerning how human observers achieve informative perceptual representations of the external world that overcome limitations of the sensory uptake process, notably of how distributed patterns of locally ambiguous visual features can be used to generate unambiguous global percepts. For example, light passes through retinal veins before it reaches retinal photoreceptors. Human observers do not perceive their retinal veins due to the action of mechanisms that attenuate the perception of images that are stabilized with respect to the retina. Mechanisms capable of generating this adaptive property of visual percepts can also generate paradoxical percepts, as during the perception of stabilized images or ganzfelds (Pritchard, 1961; Pritchard, Heron, and Hebb, 1960; Riggs, Ratliff, Cornsweet, and Cornsweet, 1953; Yarbus, 1967). Once such paradoxical percepts are traced to an adaptive perceptual process, they can be used as probes to discover the rules governing this process. This type of approach has been used throughout the research program on perception (Carpenter and Grossberg, 1981; Cohen and Grossberg, 1984; Grossberg, 1980, 1982a, 1983a, 1983b; Grossberg and Mingolla, 1985b) of which this work forms a part.

Suppressing the perception of stabilized veins is insufficient to generate an adequate percept. The images that reach the retina can be occluded and segmented by the veins in several places. Somehow, broken retinal contours need to be completed, and occluded retinal color and brightness signals need to be filled in. Holes in the retina, such as the blind spot or certain scotomas, are also not visually perceived (Gerrits, deHaan, and Vendrick, 1966; Gerrits and Timmermann, 1969; Gerrits and Vendrick, 1970) due to some sort of filling-in process.
These completed boundaries and filled-in colors are illusory percepts, albeit illusory percepts with an important adaptive value. The large literature on illusory figures and filling-in can thus be used as probes of this adaptive process (Arend, Buehler, and Lockhead, 1971; Day, 1983; Gellatly, 1980; Kanizsa, 1974; Kennedy, 1978, 1979, 1981; Parks, 1980; Parks and Marks, 1983; Petry, Harbeck, Conway, and Levey, 1983; Redies and Spillmann, 1981; van Tuijl, 1975; van Tuijl and de Weert, 1979; Yarbus, 1967). The brightness simulations that we report herein illustrate our theory’s proposal for how real and illusory boundaries are completed and features are filled-in.

Retinal veins and the blind spot are not the only blemishes of the retinal image. The luminances that reach the retina confound inhomogeneous lighting conditions with invariant object reflectances. Workers since the time of Helmholtz (Helmholtz, 1962) have realized that the brain somehow “discounts the illuminant” to generate color and brightness percepts that are more accurate than the retinal data. Land (1977) has shown, for example, that the perceived colors within a picture constructed from overlapping colored patches are determined by the relative contrasts at the edges between the patches. The luminances within the patches are somehow discounted. These data also point to the existence of a filling-in process. Were it not possible to fill-in colors to replace the discounted illuminants, we would perceive a world of boundaries rather than one of extended forms. Since edges are used to generate the filled-in percepts, an adequate perceptual theory must define edges in a way that can accomplish this goal. We suggest that the edge computations whereby boundaries are completed are fundamentally different (in particular, they obey different rules) from the edge computations leading to color and brightness signals.
We claim that both types of edges are computed in parallel before being recombined to generate filled-in percepts. Our theory hereby suggests that the fundamental question “What is an edge, perceptually speaking?” has not adequately been answered by previous theories. One consequence of our answer is a physical explanation and generalization of the retinex theory (Grossberg, 1985), which Land (1977)
has developed to explain his experiments. The present article further supports this conception of how edges are computed by qualitatively explaining, and quantitatively simulating on the computer, such paradoxical brightness data as versions of the Craik-O’Brien-Cornsweet effect (Arend et al., 1971; Cornsweet, 1970; O’Brien, 1958) and its exceptions (Coren, 1983; Heggelund and Krekling, 1976; van den Brink and Keemink, 1976; Todorović, 1983), the Bergström demonstrations comparing brightnesses of smoothly modulated and step-like luminance profiles (Bergström, 1966, 1967a, 1967b), and the demonstrations of Hamada (1980) showing nonclassical differences between the perception of luminance decrements and increments. These percepts can all be seen with one eye. Our theory links these phenomena to the visual mechanisms that are capable of preventing perception of retinal veins and the blind spot, and that fill-in over discounted illuminants, which also operate when only one eye is open.

Due to the action of binocular visual mechanisms that generate a self-consistent percept of depthful forms, some visual images that can be monocularly perceived may not be perceived during binocular viewing conditions. Binocular rivalry provides a classical example of this fact (Blake and Fox, 1974; Cogan, 1982; Kaufman, 1974; Kulikowski, 1978). To support the theory’s conception of how depthful form percepts are generated (Cohen and Grossberg, 1984; Grossberg, 1983a, 1983b), we suggest explanations and provide simulations of data concerning inherently binocular brightness interactions. These data include results on Fechner’s paradox, binocular brightness summation, binocular brightness averaging, and binocular rivalry (Blake, Sloane, and Fox, 1981; Cogan, 1982; Cogan, Silverman, and Sekuler, 1982; Curtis and Rule, 1980; Legge and Rubin, 1981; Levelt, 1965). These simulations do not, of course, begin to exhaust the richness of the perceptual literature.
They are meant to be illustrative, rather than exhaustive, of a perceptual theory that is still undergoing development. On the other hand, this incomplete theory already reveals the perhaps even more serious incompleteness of rival theories by suggesting concepts and explaining data that are outside the range of these rival theories. The article also illustrates the theory’s burgeoning capacity to integrate the explanation of perceptual data by providing simulations of data about Fechner’s paradox, binocular brightness averaging, binocular brightness summation, and binocular rivalry using the same model parameters that were established to simulate disparity matching, filling-in, and figure-ground synthesis (Cohen and Grossberg, 1984).

Although our theory was derived from perceptual data and concepts, after it reached a certain state in its development, striking formal similarities with recent neurophysiological data could not fail to be noticed. Some of these relationships are briefly summarized in Table 1 below. Although the perceptual theory can be understood without considering its neurophysiological interpretation, if one is willing to pursue this interpretation, then the perceptual theory implies a number of neurophysiological and anatomical predictions. Such predictions enable yet another data base to be used for the further development and possible disconfirmation of the theory. A search through the neurophysiological literature has revealed that some of these predictions were already supported by known neural data, albeit data that took on new meaning in the light of the perceptual theory. Not all of the predictions were known, however. In fact, two of its predictions about the process of boundary completion have recently received experimental support from recordings by von der Heydt, Peterhans, and Baumgartner (1984) on cells in area 18 of the monkey visual cortex.
Neurophysiological interpretations and predictions of the theory are described in Grossberg and Mingolla (1985b). Due to the existence of this neural interpretation, we will take the liberty of calling the formal nodes in our network “cells” throughout the article. The next sections summarize the concepts that we use to explain the brightness data.
TABLE 1
NAMES OF MACROCIRCUIT STAGES

Abbreviation    Full Name
MPL             Left Monocular Preprocessing Stage (Lateral geniculate nucleus)
MPR             Right Monocular Preprocessing Stage (Lateral geniculate nucleus)
BCS             Boundary Contour Synthesis Stage [Interactions initiated by the hypercolumns in striate cortex, Area 17 (Hubel and Wiesel, 1977)]
MBCL            Left Monocular Brightness and Color Stage [Interactions initiated by the cytochrome oxidase staining blobs, Area 17 (Hendrickson, Hunt, and Wu, 1981; Horton and Hubel, 1981; Hubel and Livingstone, 1981; Livingstone and Hubel, 1982)]
MBCR            Right Monocular Brightness and Color Stage [Interactions initiated by the cytochrome oxidase staining blobs, Area 17]
BP              Binocular Percept Stage [Area V4 of the prestriate cortex (Zeki, 1983a, 1983b)]
2. The Boundary-Contour System and the Feature-Contour System
The theory asserts that two distinct types of edge, or contour, computations are carried out within two parallel systems. We call these systems the boundary-contour system and the feature-contour system. Boundary-contour signals are used to synthesize the boundaries, whether "real" or "illusory," that the perceptual process generates. Feature-contour signals initiate the filling-in processes whereby brightnesses and colors spread until they either hit their first boundary contour or are attenuated due to their spatial spread. Boundary contours are not, in themselves, visible. They gain visibility by restricting the filling-in that is triggered by feature-contour signals. These two systems obey different rules. The main rules can be summarized as follows.

3. Boundary Contours and Boundary Completion
The process whereby boundary contours are built up is initiated by the activation of oriented masks, or elongated receptive fields, at each position of perceptual space (Hubel and Wiesel, 1977). Our perceptual analysis leads to the following hypotheses about how these masks activate their target cells, and about how these cells interact to generate boundary contours.

(a) Orientation and contrast. The output signals from the oriented masks are sensitive to the orientation and to the amount of contrast, but not to the direction of contrast, at an edge of a visual scene. Thus, a vertical boundary contour can be activated by either a close-to-vertical dark-light edge or a close-to-vertical light-dark edge at a fixed scenic position. The process whereby two like-oriented masks that are sensitive to direction of contrast at the same perceptual location give rise to an output signal that is not sensitive to direction of contrast is designated by a plus sign in Figure 1a.
Figure 1. (a) Boundary-contour inputs are sensitive to the orientation and amount of contrast at a scenic edge, but not to its direction of contrast. (b) Like orientations compete at nearby perceptual locations. (c) Different orientations compete at each perceptual location. (d) Once activated, aligned orientations can cooperate across a larger visual domain to form real or illusory contours.
(b) Short-range competition. (i) The cells that react to output signals due to like-oriented masks compete between nearby perceptual locations (Figure 1b). Thus, a mask of fixed orientation excites the like-oriented cells at its location and inhibits the like-oriented cells at nearby locations. In other words, an on-center off-surround organization of like-oriented cell interactions exists around each perceptual location. (ii) The outputs from this competitive stage input to the next competitive stage. At this stage, cells compete that represent perpendicular orientations at the same perceptual location (Figure 1c). This competition defines a push-pull opponent process. If a given orientation is inhibited, then its perpendicular orientation is disinhibited. In all, a stage of competition between like orientations at different, but nearby, positions is followed by a stage of competition between perpendicular orientations at the same position.

(c) Long-range oriented cooperation and boundary completion. The outputs from the last competitive stage input to a spatially long-range cooperative process that is called the “boundary-completion” process. Outputs due to like-oriented masks that are approximately aligned across perceptual space can cooperate via this process to synthesize an intervening boundary. The boundary-completion process is capable of synthesizing global visual boundaries from local scenic contours (Grossberg and Mingolla, 1985b). Both “real” and “illusory” boundaries are assumed to be generated by this boundary-completion process.

Two simple demonstrations of a boundary-completion process with properties (a)-(c) can be made as follows. In Figure 2a, four pac-man figures are arranged at the vertices of an imaginary rectangle. It is a familiar fact that an illusory Kanizsa (1974) square can be seen when all four pac-man figures are black against a white background.
The same is true when two pac-man figures are black, the other two are white, and the background is grey, as in Figure 2b. The black pac-man figures form dark-light edges with respect to the grey background. The white pac-man figures form light-dark edges with the grey background. The visibility of illusory edges around the illusory square shows that a process exists that is capable of completing the contours between edges with opposite directions of contrast. This contour-completion process is thus sensitive to amount of contrast but not to direction of contrast.

Another simple demonstration of these contour-completing properties can be constructed as follows. Divide a square into two equal rectangles along an imaginary boundary. Color one rectangle a uniform shade of grey. Color the other rectangle in shades of grey that progress from light to dark as one moves from end 1 of the rectangle to end 2 of the rectangle. Color end 1 a lighter shade than the uniform grey of the other rectangle, and color end 2 a darker shade than the uniform grey of the other rectangle. Then, as one moves from end 1 to end 2, an intermediate grey region is passed whose luminance approximately equals that of the uniform rectangle. At end 1, a light-dark edge exists from the nonuniform rectangle to the uniform rectangle. At end 2, a dark-light edge exists from the nonuniform rectangle to the uniform rectangle. Despite this reversal in the direction of contrast from end 1 to end 2, an observer can see an illusory edge that joins the two edges of opposite contrast and separates the intermediate rectangle region of equal luminance. This boundary completion process, which seems so paradoxical when its effects are seen in Kanizsa squares, is also hypothesized to complete boundaries across the blind spot, across the faded images of stabilized retinal veins, and between all perceptual domains that are separated by sharp brightness or color differences.

(d) Binocular matching.
A monocular boundary contour can be generated when a single eye views a scene. When two eyes view a scene, a binocular interaction can occur between outputs from oriented masks that respond to the same retinal positions of the two eyes. This interaction leads to binocular competition between perpendicular orientations at each position. This competition takes place at, or before, the competitive stage (b ii).
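Hypotheses (a) and (b) can be made concrete with a small numerical sketch. The Python fragment below is an illustration only: the one-dimensional row of positions, the function names, the tonic level, and the subtractive form of both competitive stages are assumptions introduced here, not equations taken from the theory. It shows that summing two rectified masks of opposite contrast polarity gives the same output for a dark-light and a light-dark edge, and that inhibiting an orientation at a position next to an active edge disinhibits the perpendicular orientation there. The long-range cooperation of hypothesis (c) and the binocular matching of hypothesis (d) are not modeled.

```python
def rectify(x):
    """Half-wave rectification: pass positive values, clip the rest to zero."""
    return max(x, 0.0)


def mask_output(luminance, i):
    """Hypothesis (a): pool two like-oriented masks of opposite polarity."""
    contrast = luminance[i + 1] - luminance[i - 1]
    dark_light = rectify(contrast)    # responds to a dark-to-light edge
    light_dark = rectify(-contrast)   # responds to a light-to-dark edge
    return dark_light + light_dark    # amount of contrast, sign discarded


def boundary_inputs(luminance):
    """Mask outputs at every interior position of a one-dimensional row."""
    return [mask_output(luminance, i) for i in range(1, len(luminance) - 1)]


def first_competition(inputs, tonic=1.0, inhibition=0.5):
    """Hypothesis (b.i): like-oriented cells, on-center off-surround.

    The tonic term and the subtractive form are assumptions of this sketch.
    """
    n = len(inputs)
    out = []
    for i in range(n):
        surround = sum(inputs[j] for j in (i - 1, i + 1) if 0 <= j < n)
        out.append(rectify(tonic + inputs[i] - inhibition * surround))
    return out


def second_competition(w_vertical, w_horizontal):
    """Hypothesis (b.ii): push-pull between perpendicular orientations."""
    vertical = [rectify(v - h) for v, h in zip(w_vertical, w_horizontal)]
    horizontal = [rectify(h - v) for v, h in zip(w_vertical, w_horizontal)]
    return vertical, horizontal


# A row of positions crossing a vertical edge, in both contrast polarities.
step_up = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
step_down = list(reversed(step_up))
step_up_input = boundary_inputs(step_up)
step_down_input = boundary_inputs(step_down)

# Horizontally oriented masks receive no input along this row.
no_input = [0.0] * len(step_up_input)
vertical_out, horizontal_out = second_competition(
    first_competition(step_up_input), first_competition(no_input))

print(step_up_input, step_down_input)
print(vertical_out, horizontal_out)
```

With these assumed parameters, the two contrast polarities give identical boundary inputs, the vertical outputs are confined to the edge positions, and the horizontal outputs appear only at the two flanking positions where the vertical cells were inhibited below their tonic level.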
Figure 2. (a) An illusory Kanizsa square is induced by four black pac-man figures. (b) An illusory square is induced by two black and two white pac-man figures on a grey background. Illusory contours can thus join edges with opposite directions of contrast (the effect may be weakened by the photographic reproduction process).
4. Feature Contours and Diffusive Filling-In
The rules of contrast obeyed by the feature-contour process are different from those obeyed by the boundary-contour process.

(a) Contrast. The feature-contour process is insensitive to the orientation of contrast in a scenic edge, but it is sensitive to both the direction of contrast and the amount of contrast, unlike the boundary-contour process. For example, to compute the relative brightness across a scenic boundary, it is obviously important to keep track of which side of the scenic boundary has a larger reflectance. Sensitivity to direction of contrast is also needed to determine which side of a red-green scenic boundary is red and which is green. Due to its sensitivity to the amount of contrast, feature-contour signals “discount the illuminant.” In the simulations in this article, only one type of feature-contour signal is considered, namely, achromatic or light-dark signals. In the simulations of chromatic percepts, three parallel channels of double-opponent feature-contour signals are used: light-dark, red-green, and blue-yellow. The simulations in this article consider only how input patterns are processed by a single network channel whose on-center off-surround spatial filter plays the role of a single spatial frequency channel (Grossberg, 1983b). We often call such a network a “spatial scale” for short. From our analysis of the dynamics of individual spatial scales, one can readily infer how multiple spatial scales, acting in parallel, transform the same input patterns.

The rules of spatial interaction that govern the feature-contour process are also different from those that govern the boundary-contour process.

(b) Diffusive filling-in. Boundary contours activate a boundary-completion process that synthesizes the boundaries that define monocular perceptual domains. Feature contours activate a diffusive filling-in process that spreads featural qualities, such as brightness or color, across these perceptual domains.
Figure 3 depicts the main properties of this filling-in process. It is assumed that featural filling-in occurs within a syncytium of cell compartments. By a syncytium of cells, we mean a regular array of cells in such an intimate relationship to one another that contiguous cells can easily pass signals between each other’s compartment membranes. In the present instance, a feature-contour input signal to a cell of the syncytium activates that cell. Due to the syncytial coupling of this cell with its neighbors, the activity can rapidly spread to neighboring cells, then to neighbors of neighbors, and so on. Since the spreading occurs via a diffusion of activity (Appendix A), it tends to average the activity that was triggered by the feature-contour input signal across the cells that receive this spreading activity. This averaging of activity spreads across the syncytium with a space constant that depends upon the electrical properties of both the cell interiors and their membranes. The electrical properties of the cell membranes can be altered by boundary-contour signals in the following way. A boundary-contour signal is assumed to decrease the diffusion constant of its target cell membranes within the cell syncytium. It does so by acting as an inhibitory gating signal that causes an increase in cell membrane resistance (Appendix A). At the same time that a boundary-contour signal creates a barrier to the filling-in process at its target cells, it also acts to inhibit the activity of these cells. Thus, due to the physical process whereby a boundary contour limits featural spreading across the syncytium, a boundary-contour input also acts as a feature-contour input to its target syncytial cells. Such a diffusive filling-in reaction is hypothesized to instantiate featural filling-in over the blind spot, over the faded images of stabilized retinal veins, and over the illuminants that are discounted by feature-contour preprocessing.
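The filling-in process described above can be sketched as a discrete diffusion on a one-dimensional chain of syncytial cells. The Python fragment below is a minimal illustration under stated assumptions: Appendix A is not reproduced in this section, so the specific equation (passive decay, nearest-neighbor diffusion, and a junction permeability that is divided down by boundary-contour signals), the explicit Euler integration, and all parameter values and names are choices made here for illustration. The direct inhibition of target cells by the boundary-contour signal is left out. A single feature-contour input is delivered to one cell, and a boundary-contour signal is placed at another cell.

```python
def fill_in(feature, boundary, decay=0.01, coupling=10.0, gate=1.0e4,
            dt=0.025, steps=40000):
    """Integrate an assumed boundary-gated diffusion on a chain of cells.

    dS_i/dt = -decay * S_i + sum over neighbors j of P_ij * (S_j - S_i) + F_i
    P_ij    = coupling / (1 + gate * (B_i + B_j))

    F is the feature-contour input and B the boundary-contour signal.
    The step size satisfies dt * (4 * coupling + decay) < 2, which keeps
    the explicit Euler scheme stable for the default parameters.
    """
    n = len(feature)
    # Permeability of the junction between cell i and cell i + 1.
    perm = [coupling / (1.0 + gate * (boundary[i] + boundary[i + 1]))
            for i in range(n - 1)]
    s = [0.0] * n
    for _ in range(steps):
        # flow[i] is the activity passing from cell i + 1 into cell i.
        flow = [perm[i] * (s[i + 1] - s[i]) for i in range(n - 1)]
        updated = []
        for i in range(n):
            net = feature[i] - decay * s[i]
            if i > 0:
                net -= flow[i - 1]
            if i < n - 1:
                net += flow[i]
            updated.append(s[i] + dt * net)
        s = updated
    return s


n_cells = 20
feature = [0.0] * n_cells
feature[3] = 1.0       # one feature-contour input
boundary = [0.0] * n_cells
boundary[10] = 1.0     # one boundary-contour signal

activity = fill_in(feature, boundary)
left, right = activity[:10], activity[11:]
print("input side:", round(min(left), 3), "to", round(max(left), 3))
print("far side max:", round(max(right), 5))
```

In this sketch the activity levels out across the cells on the input side of the boundary and remains close to zero on the far side; removing the boundary signal lets the same input spread over the whole chain.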
Three distinguishable types of spatial interaction are implied by this description of the feature-contour system: (i) Spatial frequency preprocessing: Feature-contour signals arise as the outputs of several distinct on-center off-surround networks with different receptive field sizes, or spatial scales. (ii) Diffusive filling-in: The feature-contour signals within each spatial scale then cause activity to spread across the scale cell's syncytium. This filling-in process has its own diffusive bandwidth. (iii) Figural boundaries: The boundary-contour signals define the limits of featural filling-in. Boundary contours are sensitive to the configuration of all edges in a scene, rather than to any single receptive field size. The interplay of these three types of spatial interaction will be essential in our explanations of brightness data.

Figure 3. A monocular brightness and color stage (MBC): Monocular feature-contour signals activate cell compartments that permit rapid lateral diffusion of activity, or potential, across their boundaries, except at the boundaries that receive boundary-contour signals from the BCS stage of Figure 4. Consequently, the feature-contour signals are smoothed except at boundaries that are synthesized within the BCS stage.

5. Macrocircuit of Processing Stages
Figure 4 describes a macrocircuit of processing stages into which the microstages of the boundary-contour system and feature-contour system can be embedded. The processes described by this macrocircuit are capable of synthesizing global properties of depth, brightness, and form information from monocularly and binocularly viewed patterns (Grossberg, 1983a, 1984). Table 1 lists the full names of the abbreviated macrocircuit stages, as well as the neural structures that seem most likely to execute analogous processes.
Figure 4. A macrocircuit of processing stages: Table 1 lists the functional names of the abbreviated stages and indicates a neural interpretation of these stages. Boundary-contour formation is assumed to occur within the BCS stage. Its output signals to the monocular MBCL and MBCR stages define boundaries within which feature-contour signals from MPL or MPR can trigger the spreading, or diffusion, of featural quality.
Each monocular preprocessing stage MPL and MPR can generate inputs to a boundary-contour system and a feature-contour system. The pathway MPL → BCS carries inputs to the left-monocular boundary-contour system. The pathway MPL → MBCL carries inputs to the left-monocular feature-contour system. Only after all the microstages of scale-specific, orientation-specific, contrast-specific, competitive, and cooperative interactions (Section 3) take place within the BCS stage does this stage give rise to boundary-contour signals BCS → MBCL that act as barriers to the diffusive filling-in triggered by MPL → MBCL feature-contour signals (Section 4). Thus, the divergence of the pathways MPL → MBCL and MPL → BCS allows the boundary-contour system and the feature-contour system to undergo significant processing according to different rules before their signals recombine within the cell syncytia.
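The pathways named in this section can be written down as a small directed graph. The Python fragment below is only a bookkeeping sketch: the stage abbreviations come from Table 1, the right-eye pathways are filled in by symmetry with the left-eye pathways stated above, the BP stage is omitted, and the dictionary and function names are introduced here for illustration. Enumerating the routes from MPL to MBCL makes the divergence and later recombination of feature-contour and boundary-contour signals explicit.

```python
# Feedforward pathways of Section 5 (BP stage omitted in this sketch).
PATHWAYS = {
    "MPL": ["BCS", "MBCL"],   # left eye: boundary and feature pathways
    "MPR": ["BCS", "MBCR"],   # right eye: assumed symmetric to the left
    "BCS": ["MBCL", "MBCR"],  # boundary-contour signals to both syncytia
    "MBCL": [],
    "MBCR": [],
}


def routes(graph, start, goal, path=None):
    """Return every directed route from start to goal in an acyclic graph."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for following in graph[start]:
        found.extend(routes(graph, following, goal, path))
    return found


left_routes = routes(PATHWAYS, "MPL", "MBCL")
for route in left_routes:
    print(" -> ".join(route))
```

The two routes printed are the indirect boundary-contour route through BCS and the direct feature-contour route, which meet again at the MBCL stage.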
6. FIRE: Resonant Lifting of Preperceptual Data into a Form-in-Depth Percept

The activity patterns generated by feature-boundary interactions at the monocular brightness and color stages MBCL and MBCR must undergo further processing before they can be perceived. This property is analogous to the fact that a contoured monocular image is not always perceived. It can, for example, be suppressed by a discordant image to the other eye during binocular rivalry. Only activity patterns at the binocular percept (BP) stage of Figure 4 are perceived. Signals from stage MBCL and/or stage MBCR that are capable of activating the BP stage are said to “lift” the preprocessed monocular patterns into the perceptual domain (Cohen and Grossberg, 1984; Grossberg, 1983b). We use the word “lift” instead of a word like “search” because the process occurs directly via a single parallel processing step, rather than by some type of serial algorithm.

This lifting process works as follows. Monocular arrays of cells in MBCL and MBCR send topographically organized pathways to BP and receive topographically organized pathways from BP. A monocular activity pattern across MBCL can elicit output signals in the MBCL → BP pathway only from positions that are near contours, or edges, of the MBCL activity pattern (Figure 5). Contours of an MBCL pattern must not be confused with edges of an external scene. They are due to boundary-contour signals in the BCS → MBCL pathway, which themselves are the result of a great deal of preprocessing. Thus, no contour signals are initially elicited from the MBCL stage to the BP stage at positions within the interiors of filled-in regions. Similar remarks hold for contour signals from the MBCR stage to the BP stage.

Pairs of contour signals from MBCL and MBCR that correspond to similar perceptual locations are binocularly matched at the BP stage. If both contour signals overlap sufficiently, then they can form a fused binocular contour within the BP stage.
If their positions mismatch by a larger amount, then both contours can mutually inhibit each other, or the stronger contour can suppress the weaker contour. If their positions are even more disparate, then a pair, or “double image,” of contours can be activated at the BP stage. These possibilities are due to the fact that contour signals from MBCL and MBCR to BP possess an excitatory peak surrounded by a pair of inhibitory troughs. Under conditions of monocular viewing, the contour signals from (say) MBCL to BP are always registered, or “self-matched,” at BP because no contours exist from MBCR that are capable of suppressing them.

Contours at the BP stage that survive this binocular matching process can send topographic contour signals back to MBCL and MBCR along the feedback pathways (Figure 5). Remarkably, feedback exchange of such local contour signals can trigger a rapid filling-in reaction across thousands of cells. This filling-in reaction is due to the form of the contour signals that are fed back from BP to MBCL and MBCR. These signals also possess an excitatory peak surrounded by a pair of inhibitory troughs. The inhibitory troughs cause local nonuniformities in the activity pattern near the original MBCL or MBCR contour. These local nonuniformities are seen by the MBCL → BP
Figure 5. Binocular representation of MBC patterns at the BP stage: Each MBCL and MBCR activity pattern is filtered in such a way that its contours generate topographically organized inputs to the BP stage. At the BP stage, these contour signals undergo a process of binocular matching. This matching process takes place simultaneously across several on-center off-surround networks, each with a different spatial interaction bandwidth. Contours capable of matching at the BP stage send feedback signals to their respective MBCL or MBCR patterns. Closing this feedback loop of local edge signals initiates the rapid spreading of a standing wave that resonantly “lifts” a binocular representation of the matched monocular patterns into the BP stage. This standing wave, or filling-in resonant exchange (FIRE), spreads until it hits the first binocular mismatch within its spatial scale. The ensemble of all resonant standing waves across the multiple spatial scales of the BP constitutes the network percept. If all MBCL or MBCR contour inputs are suppressed by binocular matching at a spatial scale of the BP stage, then their respective monocular activity patterns cannot be lifted into resonant activity within this BP spatial scale. The BP spatial scale selectively resonates with some, but not all, monocular patterns within the MBCL and MBCR stages.
and MBCR → BP pathways as new contiguous contours, which can thus send signals to BP. In this way, a matched contour at BP can trigger a standing wave of activity that can rapidly spread, or fill-in, across BP until it hits the first pair of mismatched contours. Such a mismatch creates a barrier to filling-in. As a result of this filling-in process across BP, the activities at interior positions of filled-in regions of MBCL and MBCR can be lifted into perception within BP. Although such an interior cell in MBCL sends topographic signals to BP, these signals are not topographically related to MPL in a simple way, due to syncytial filling-in within MBCL.

The properties of the resonant filling-in reaction imply that MBCL or MBCR activity patterns that do not emit any contour signals to BP cannot enter perception. Activity patterns, all of whose contour signals are inhibited within BP due to binocular mismatch, also cannot enter perception. Only activity patterns that lie between a contour match and its nearest contour mismatch can enter perception. Such a filling-in reaction, unlike diffusive filling-in (Section 4), is a type of nonlinear resonance phenomenon, which we call a "filling-in resonant exchange" (FIRE).

In the full theory, multiple networks within MBCL and MBCR that are sensitive to different spatial frequencies and disparities are topographically matched within multiple networks of BP. The ensemble of all such resonant standing waves constitutes the network's percept. Cohen and Grossberg (1984) and Grossberg (1983b) describe how these ensembles encode global aspects of depth, brightness, and form information. In this article, we show that these ensembles also mimic data about Fechner's paradox, binocular brightness summation, and binocular brightness averaging (Sections 13-15). The fact that a single process exhibits all of these properties enhances the plausibility of the rules whereby FIRE contours are computed and matched within BP.
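The binocular matching outcomes described in this section (self-matching under monocular viewing, fusion, suppression, and double images) can be illustrated with a minimal sketch. It assumes that each contour signal is a difference of Gaussians, that is, an excitatory peak flanked by inhibitory troughs, and that a contour survives at BP wherever the summed signal exceeds a fixed threshold. The kernel widths, the threshold, and the names `contour_kernel` and `surviving_contours` are assumptions made for illustration, not the matching rules of the full theory.

```python
import math


def contour_kernel(u, sigma_e=1.0, sigma_i=3.0, trough=0.5):
    """Excitatory peak flanked by inhibitory troughs (difference of Gaussians)."""
    return (math.exp(-u * u / (2 * sigma_e ** 2))
            - trough * math.exp(-u * u / (2 * sigma_i ** 2)))


def surviving_contours(contours, size=61, threshold=0.4):
    """Count separate suprathreshold regions of the summed contour signals.

    `contours` is a list of (position, strength) pairs, one per monocular
    contour signal arriving at the hypothetical BP array.
    """
    total = [sum(s * contour_kernel(p - pos) for pos, s in contours)
             for p in range(size)]
    regions, inside = 0, False
    for value in total:
        above = value > threshold
        if above and not inside:
            regions += 1          # entering a new suprathreshold region
        inside = above
    return regions


monocular = surviving_contours([(30, 1.0)])               # one eye only
fused = surviving_contours([(30, 1.0), (31, 1.0)])        # small disparity
suppressed = surviving_contours([(30, 1.0), (33, 1.0)])   # peak meets trough
dominance = surviving_contours([(30, 1.5), (33, 1.0)])    # unequal strengths
double_image = surviving_contours([(30, 1.0), (42, 1.0)]) # large disparity
print(monocular, fused, suppressed, dominance, double_image)
```

With these assumed parameters, a lone contour is registered, nearly coincident contours merge into one region, moderately disparate contours of equal strength cancel, a stronger contour outlasts a weaker one, and widely separated contours both survive.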
The standing waves in the BP stage may themselves be further transformed, say by a local smoothing operation. This type of refinement does not alter our discussion of binocular brightness data; hence, it will not be further discussed.

7. Binocular Rivalry, Stabilized Images, and the Ganzfeld
The following qualitative properties of the FIRE process illustrate how binocular rivalry and the fading of ganzfelds and stabilized images can occur within the network of Figure 4. Suppose that, due to binocular matching of perpendicular orientations, as in Section 3d, some left-monocular boundary contours are suppressed within the BCS stage. Then these boundary contours cannot send boundary contour signals to the corresponding region of stage MBCL. Featural activity thus quickly diffuses across the network positions corresponding to these suppressed contours (Gerrits and Vendrick, 1970). Consequently, no contour output signals can be emitted from these positions within the MBCL stage to the BP stage. No edge matches within the BP stage can occur at these positions, so no effective feedback signals are returned to the MBCL stage at these positions to lift the corresponding monocular subdomain into perception. Thus, the subdomains whose boundary contours are suppressed within the BCS stage are not perceived. As soon as these boundary contours win the BCS binocular competition, their subdomain contours can again rapidly support the resonant lifting of the subdomain activity pattern into perception at the BP stage. During binocular rivalry, an interaction between rapidly competing short-term memory traces and slowly habituating transmitter gates can cause oscillatory switching between left and right BCS contours (Grossberg, 1980, 1983a). The same argument shows that a subdomain is not perceived if its boundary edges are suppressed by binocular rivalry within the BCS stage or by image stabilization, or if they simply do not exist, as in a ganzfeld.
8. The Interplay of Controlled and Automatic Processes
The most significant technical insights that our theory introduces concern the manner in which local computations can rapidly generate global context-sensitive representations via hierarchically organized networks whose individual stages undergo parallel processing. Using these insights, one can also begin to understand how internally generated “cognitive” feature-contour signals or “cognitive” boundary-contour signals can modify the global representations generated within the network of Figure 4 (Gregory, 1966; Grossberg, 1980). Indeed, the network does not know which of its contour signals are generated internally and which are generated externally. One can also now begin to understand how state-dependent nonspecific changes in sensitivity at the various network stages (e.g., attentional shifts) can modify the network’s global representations. For example, the contrast sensitivity of feature-contour signals can change as a function of background input intensity or internal nonspecific arousal (Grossberg, 1983b, Sections 24-28). The balance between direct feature-contour signals and diffusive filling-in signals can thus be altered by changes in input luminance or arousal parameters, and can thereby influence how well filling-in can overcome feature-contour contrast effects during the Craik-O’Brien illusion (Section 9). Once such internally or externally controlled factors are specified, however, the network automatically generates its global representations using the intrinsic structure of its circuitry. In all aspects of our theoretical work, controlled and automatic factors participate in an integrated network design (Grossberg, 1982a), rather than forming two computationally disjoint serial and parallel subsystems, as Schneider and Shiffrin (1977) have suggested.
Even the complementary attentional and orienting subsystems that have been hypothesized to regulate the stability and plasticity of long-term memory encoding processes in response to expected and unexpected events (Grossberg, 1975, 1982a, 1982b) both utilize parallel mechanisms that are not well captured by the controlled versus automatic processing dichotomy.

9. Craik-O’Brien Luminance Profiles and Multiple Step Illusions
Arend et al. (1971) have studied the perceived brightness of a variety of luminance profiles. The construction of these profiles was suggested by the seminal article of O’Brien (1958). Each of the luminance profiles was produced by placing appropriately cut sectors of black and white paper on a disk. The disk was rotated at a rate much faster than that required for flicker fusion. The luminances thereby generated were then independently calibrated. The subjects were asked to describe the relative brightness distribution by describing the locations and directions of all brightness changes, and by ordering the brightnesses of regions that appeared uniform. Ordinal, rather than absolute, brightness differences were thereby determined.

One of their important results is schematized in Figure 6. Figure 6a describes a luminance profile in which two Craik-O’Brien luminance cusps are joined to a uniform background luminance. The luminances to the left and to the right of the cusps are equal, and the average luminance across the cusps equals the background luminance. Figure 6b shows that this luminance profile is perceived as (approximate) steps of increasing brightness. In particular, the perceived brightnesses of the left and right backgrounds are significantly different, despite the fact that their luminances are equal.

This type of result led Arend et al. (1971, p.369) to conclude that “the brightness information generated by moving contours is difference information only, and the absolute information hypothesis is rejected.” In other words, the nonuniform luminances between successive edges are discounted, and only the luminance differences of the edges determine the percept. Similar concepts were developed by Land (1977). This conclusion does not explain how the luminance differences at the edges are computed, or how the edges determine the subjective appearance of the perceptual domains that exist between the edges. The incomplete nature of the conclusions does not,
Figure 6. (a) A one-dimensional slice across a two-dimensional Craik-O’Brien luminance profile. The background luminances at the left and right sides of the profile are equal. (b) This luminance profile appears like a series of two (approximate) steps in increasing brightness.
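The construction of such a profile can be sketched as follows. The sketch assumes that each cusp is antisymmetric about its edge, dipping below the background on the left and rising above it on the right, with exponential flanks; the exponential shape, the function name `craik_obrien_profile`, and all parameter values are illustrative assumptions rather than the calibrated stimuli of Arend et al. (1971).

```python
import math


def craik_obrien_profile(size=400, edges=(150, 250), background=1.0,
                         amplitude=0.3, decay=15.0):
    """Uniform background plus one antisymmetric cusp per edge position.

    Left of each edge the luminance dips below the background, right of
    it the luminance rises above, and both flanks decay exponentially
    back toward the background level (an assumed cusp shape).
    """
    profile = []
    for p in range(size):
        value = background
        for edge in edges:
            d = p + 0.5 - edge  # signed distance of the cell centre from the edge
            value += math.copysign(amplitude * math.exp(-abs(d) / decay), d)
        profile.append(value)
    return profile


profile = craik_obrien_profile()
mean_luminance = sum(profile) / len(profile)
print(round(profile[0], 3), round(profile[-1], 3), round(mean_luminance, 3))
```

Because each assumed cusp is antisymmetric, its contribution averages out, so the mean luminance stays at the background level while the far-left and far-right luminances remain nearly equal, matching the two properties stated for Figure 6a.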
however, limit their usefulness as a working hypothesis. This hypothesis must, however, be tempered by the fact that it is not universally true. For example, the hypothesis does not explain illusory brightness differences that can exist along illusory contours that cross regions of uniform luminance (Kanizsa, 1974; Kaufman, 1974; Kennedy, 1979). It does not explain how Craik-O’Brien filling-in can improve or deteriorate as the balance between background illumination and edge contrast is varied (Heggelund and Krekling, 1976; van den Brink and Keemink, 1976). It does not explain why a strong Craik-O’Brien effect is seen when a vertical computer-generated luminance cusp on a uniform background is enclosed by a black border that touches the two ends of the cusp, yet vanishes completely when the black border is removed and the cusp is viewed within a uniform background on all sides (Todorović, 1983). It does not explain why, in response to five cusps rather than two, subjects may see a flattened percept rather than five rising steps (Coren, 1983). The present theory suggests an explanation of all these properties. The illusory brightness properties are discussed in Grossberg (1984) and Grossberg and Mingolla (1985b). The remaining issues are clarified below.

Figure 7 describes the results of a computer simulation of the two-step brightness illusion that is described in Figure 6. The networks of differential equations on which the simulation is based are summarized in Appendix A. Figure 7 depicts the equilibrium solutions to which these networks of differential equations rapidly converge. All of the simulation results reported herein are equilibrium solutions of such networks. These networks define one-dimensional arrays of cells due to the one-dimensional symmetry in the luminance profiles. Figure 7a describes the input pattern to the network.
The double cusps are surrounded by a uniform luminance level that is Gaussianly smoothed at its edges to minimize spurious edge effects. Figure 7b shows that each of the two luminance cusps in the input pattern generates a narrow boundary-contour signal. Each boundary-contour signal causes a reduction in the rate of diffusion across the membranes of its target cells at the MBCL or MBCR stage. A reduced rate of diffusion prevents the lateral spread of featural activity across the membranes of the affected cells. A reduced diffusion rate thereby dynamically generates boundary contours within the cell syncytium (Figure 3). Successive boundary contours determine the spatial domains within which featural activity can spread.

The feature-contour process attenuates the background luminance of the input pattern and computes the relative contrasts of the cusps. It does this by letting the individual inputs interact within a shunting on-center off-surround network (Grossberg, 1983b). Such a network is defined in Appendix A, equation (1). The resultant feature-contour activity pattern is an input pattern to a cell syncytium. The boundary-contour signals from the BCS stage also contribute to this input pattern. Boundary-contour signals generate feature-contour signals as well as boundary-contour signals because they increase cell membrane resistances in order to decrease the cells’ diffusion constants, as described in Section 4b. Due to this effect on cell-membrane resistances, boundary-contour signals are a source of inhibitory feature-contour signals. These inhibitory signals act on a narrower spatial scale than the feature-contour signals from the MPL and MPR stages. The total feature-contour input pattern received by MBCL is the sum of the feature-contour patterns from the MPL and BCS stages. This total feature-contour input pattern is depicted in Figure 7c.
(The flanks of this pattern were artificially extended to the left and to the right to avoid spurious boundary effects and to simulate the output when the input pattern is placed on an indefinitely large field.) When the feature-contour input pattern of Figure 7c is allowed to diffuse within the perceptual domains defined by the boundary-contour pattern of Figure 7b, the step-like activity pattern of Figure 7d is the result.

Figure 8 simulates a luminance profile with five cusps, using the same equations and parameters that generated Figure 7. The activity pattern in Figure 8d is much flatter than one might expect from the step-like pattern in Figure 7d. Coren (1983)
[Figure 7 panels, "Two Step Illusion": (a) input pattern; (b) boundary contour pattern; (c) feature contour pattern; (d) monocular brightness pattern. Horizontal axis: POS.]
Figure 7. Simulation of the two-step illusion: (a) Input luminance pattern. (b) The pattern of diffusion coefficients that is induced by boundary contours. This pattern determines the limits of featural spreading across the cell syncytium. The two luminance cusps in (a) determine a pair of boundary contours at which the diffusion coefficients are small in (b). (c) The feature-contour pattern induced by (a). The background luminance is attenuated, and the relative contrasts of the luminance cusps are accentuated. (d) When pattern (c) diffuses within the syncytial domains determined by (b), a series of two approximate steps of activity results.
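The background attenuation and relative-contrast computation noted for panel (c) can be illustrated with a minimal sketch of a shunting on-center off-surround network at equilibrium. Since Appendix A is not reproduced in this section, the sketch assumes a commonly used shunting form whose steady state is x_i = Σ_k I_k (B C_ki − D E_ki) / (A + Σ_k I_k (C_ki + E_ki)), with a narrow Gaussian on-center kernel C and a broad Gaussian off-surround kernel E. The kernels, the parameter values, and the function names are hypothetical.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized Gaussian weights on the offsets -radius..radius."""
    w = [math.exp(-(k * k) / (2 * sigma ** 2))
         for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


def shunting_equilibrium(inputs, A=1.0, B=1.0, D=1.0,
                         sigma_c=1.0, sigma_e=4.0, radius=12):
    """Steady state of an assumed shunting on-center off-surround network."""
    C = gaussian_kernel(radius, sigma_c)   # narrow excitatory on-center
    E = gaussian_kernel(radius, sigma_e)   # broad inhibitory off-surround
    n = len(inputs)
    out = []
    for i in range(n):
        excite = inhibit = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), n - 1)  # replicate the edge values
            excite += inputs[j] * C[k + radius]
            inhibit += inputs[j] * E[k + radius]
        # Inputs appear in the denominator: this is the shunting term
        # that makes the response depend on ratios, not absolute levels.
        out.append((B * excite - D * inhibit) / (A + excite + inhibit))
    return out


uniform = shunting_equilibrium([50.0] * 40)
dim = shunting_equilibrium([10.0] * 20 + [20.0] * 20)
bright = shunting_equilibrium([100.0] * 20 + [200.0] * 20)
print(round(max(dim), 4), round(max(bright), 4))
```

In this sketch a uniform input produces essentially no response, a luminance step produces a negative response on its dark side and a positive response on its bright side, and multiplying the whole input by ten changes the peak response by only a few percent.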
found a similar result with this type of stimulus. Figure 8 suggests that the result of Coren (1983), which he attributes to cognitive factors, may be partially explained by feature-contour and boundary-contour interactions due to a single spatial scale. Such a single-scale reaction does not, however, exhaust even the noncognitive monocular interactions that are hypothesized to occur within our theory. The existence of multiple spatial scales has been justified from several points of view (Graham, 1981; Graham and Nachmias, 1971; Grossberg, 1983b; Kaufman, 1974; Kulikowski, 1978). The influence of these multiple scale reactions is also suggested by some displays of Arend et al. (1971). One such display is redrawn in Figure 9. The transformation of cusp in Figure 9a into step in Figure 9b and the computation of the relative contrast of the increments on their background are easy for the single-scale network that simulates Figures 7 and 8. This network cannot, however, generate the same brightness on both sides of the increments in Figure 9b, because the boundary-contour signals due to the increments prevent the feature-contour signals due to the cusps from diffusing across the increments. Thus, to a single-scale network, the left and right distal brightnesses appear more equal than the brightnesses on both sides of the cusps. This difficulty is partially overcome when multiple spatial scales (viz., separate shunting on-center off-surround networks with different intercellular interaction coefficients) process the same input pattern, and the perceived brightness is derived from the average of all the resultant activity patterns across their respective syncytia. In this setting, a low-frequency spatial scale may generate a boundary contour in response to the cusp, but not in response to the increments (Grossberg, 1983b). The monocular brightness pattern generated by such a scale is thus a single step centered at the position of the cusp.
When this step is averaged with the monocular brightness pattern of a high-spatial-frequency scale, the difference between proximal and distal background brightness estimates becomes small relative to the difference between step and background brightness. This explanation of Figure 9 may be testable by selectively adapting out the high- or low-spatial frequency scales. The action of low-spatial-frequency scales can also contribute to the flattening of the perceived brightnesses induced by a five-cusp display. Five cusps activate a broader network domain than do two cusps of equal size. Low-spatial-frequency scales that do not significantly react to two cusps may generate a blob-like reaction to five cusps. When such a reaction is averaged in with the already flattened high-spatial-frequency reaction, an even flatter percept can result.
10. Smoothly Varying Luminance Contours versus Steps of Luminance Change

Bergström (1966, 1967a, 1967b) has collected data that restrict the generality of the conclusion that sharp edges control the perception of brightness. In those experiments, he compared the relative brightness of several luminance displays. Some of the displays possessed no sharp luminance edges within their interiors. Other displays did possess sharp luminance edges. Bergström used a variant of the rotating prism method to construct two-dimensional luminance distributions in which the luminance changed in the horizontal direction but was constant in each narrow vertical strip. The horizontal changes in two such luminance distributions are shown in Figure 10.

Figure 10a depicts a luminance profile wherein the luminance continuously decreases from left to right. Bergström constructed this profile to quantitatively test the theory of Mach (1866) that attributes brightness changes to the second derivative $d^2L(x)/dx^2$ with respect to the spatial variable $x$ of the luminance profile $L(x)$ (see Ratliff, 1965). Mach (1866) concluded that, if two adjacent points $x_1$ and $x_3$ have similar luminances [$L(x_1) \approx L(x_3)$], then the point $x_3$ at which the second derivative is negative {$[d^2L(x_3)/dx^2] < 0$} looks brighter than the point $x_1$ at which the second derivative is positive {$[d^2L(x_1)/dx^2] > 0$}, and that a transition between a darker and
[Figure 8 panels, "Five Step Illusion": (a) input pattern; (b) boundary contour pattern; (c) feature contour pattern; (d) monocular brightness pattern. Horizontal axis: POS.]
Figure 8. Simulation of the five-step illusion: The main difference between Figures 7b and 8b is that Figure 8b contains six syncytial domains whereas Figure 7b contains only three. Each domain averages only the part of the feature-contour pattern that it receives. The result in Figure 8d is a much flatter pattern than one might expect from Figure 7d.
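The domain-averaging statement in this caption can be illustrated with a toy sketch. It assumes that diffusion within a closed syncytial domain ends at the mean of the feature-contour activity that the domain receives, and it uses a deliberately crude feature pattern in which each cusp leaves one negative cell just left of its edge and one positive cell just right of it. Both the pattern and the function names are hypothetical; the simulations in the text use the network equations of Appendix A instead.

```python
def average_within_domains(feature, boundaries):
    """Replace each domain by the mean of the feature pattern inside it.

    `boundaries` lists the indices at which a new domain begins.
    """
    cuts = [0] + sorted(boundaries) + [len(feature)]
    brightness = []
    for start, stop in zip(cuts, cuts[1:]):
        segment = feature[start:stop]
        mean = sum(segment) / len(segment)
        brightness.extend([mean] * len(segment))
    return brightness


def toy_cusp_features(size, edges, lobe=1.0):
    """Hypothetical feature pattern: a -lobe/+lobe pair straddling each edge."""
    feature = [0.0] * size
    for e in edges:
        feature[e - 1] -= lobe
        feature[e] += lobe
    return feature


two_edges = [20, 40]
five_edges = [10, 20, 30, 40, 50]
two = average_within_domains(toy_cusp_features(60, two_edges), two_edges)
five = average_within_domains(toy_cusp_features(60, five_edges), five_edges)
print(sorted(set(two)), sorted(set(five)))
```

Under these assumptions the two-cusp pattern yields three domains with rising levels, whereas every interior domain of the five-cusp pattern receives cancelling lobes from its two bounding cusps and settles at the same level, so the interior is flat rather than stepped.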
Figure 9. The luminance profile in (a) generates the brightness profile in (b). (Redrawn with permission from Arend, Buehler, and Lockhead, 1971.)
Figure 10. Two luminance profiles studied by Bergström. Position $x_3$ of (a) looks brighter than position $x_3'$ of (b). Also, position $x_3$ looks brighter than position $x_1$ in (a), and position $x_3'$ looks somewhat brighter than position $x_1'$ in (b). These data challenge the hypothesis that sharp edges determine the level of brightness. They also challenge the hypothesis that a sum of spatial-frequency-filtered patterns determines the level of brightness.
a lighter percept occurs at the intervening inflection point $x_2$ {$[d^2L(x_2)/dx^2] = 0$}. In Figure 11a, as Mach would predict, the position $x_3$ to the right of $x_2$ looks brighter than the position $x_1$ to the left of $x_2$. Figure 11a describes the results of a magnitude-estimation procedure that was used to determine the brightnesses of different positions along the luminance profile. For details of this procedure, Bergström’s original articles should be consulted.

Figure 11a challenges the hypothesis that brightness perception depends exclusively upon difference estimates at sharp luminance edges. No edge exists at the inflection point $x_2$, yet a significant brightness difference is generated around position $x_2$. Moreover, the brightness difference inverts the luminance gradient, since $x_1$ is more luminous than $x_3$, yet $x_3$ looks brighter than $x_1$. One might attempt to escape this problem by claiming that, although the luminance profile in Figure 10a contains no manifest edges, the luminance changes sufficiently rapidly across space to be edge-like with respect to some spatial scale. This hypothesis collapses when the luminance profile of Figure 10b is considered.

The luminance profile of Figure 10b is constructed from the luminance profile of Figure 10a as follows. The luminance in each rectangle of Figure 10b is the average luminance taken across the corresponding positions of Figure 10a. Unlike Figure 10a, however, Figure 10b possesses several sharp edges. If the hypothesis of Arend et al. (1971) is taken at face value, then position $x_3'$ of Figure 10b should look brighter than position $x_3$ of Figure 10a. This is because mean luminances are preserved between the two figures and Figure 10b has sharp edges, whereas Figure 10a has no interior edges whatsoever. A magnitude estimation procedure yielded the data shown in Figure 11b. Comparison of Figures 11a and 11b shows that position $x_3'$ looks darker, not brighter, than position $x_3$.
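The relation between the two profiles can be made concrete with a short sketch. It assumes a hypothetical cubic profile that decreases from left to right with an inflection at $x_2$ (this is not Bergström's actual stimulus), checks the signs of the discrete second derivative at two illustrative positions standing in for $x_1$ and $x_3$, and then builds the stepped profile by averaging the smooth one within equal-width rectangles. All names and parameter values are assumptions.

```python
def smooth_profile(n=120, x2=60.0, base=50.0, k=1e-4):
    """Hypothetical decreasing luminance profile with an inflection at x2."""
    return [base - k * (x + 0.5 - x2) ** 3 for x in range(n)]


def second_difference(values, i):
    """Discrete estimate of the second spatial derivative at index i."""
    return values[i - 1] - 2 * values[i] + values[i + 1]


def block_average(values, width):
    """Replace each block of `width` samples by its mean luminance."""
    out = []
    for start in range(0, len(values), width):
        block = values[start:start + width]
        out.extend([sum(block) / len(block)] * len(block))
    return out


smooth = smooth_profile()
steps = block_average(smooth, 20)
x1, x3 = 40, 80   # illustrative positions left and right of the inflection
interior_edges = sum(1 for i in range(1, len(steps))
                     if steps[i] != steps[i - 1])
print(second_difference(smooth, x1) > 0, second_difference(smooth, x3) < 0,
      interior_edges)
```

In this sketch the second derivative is positive at the more luminous left position and negative at the less luminous right position, as in Mach's analysis, while the stepped profile preserves the total luminance of the smooth one and introduces sharp interior edges that the smooth profile lacks.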
These data cast doubt on the conclusion of Arend et al. (1971), just as the data of Arend et al. cast doubt on the conclusion of Mach (1866).

Our numerical simulations reproduce the main effects summarized in Figures 10 and 11. The critical feature of these simulations is that the two luminance profiles in Figure 10 generate different boundary-contour patterns as well as different feature-contour patterns. The luminance profile of Figure 12a generates boundary contours only at the exterior edges of the luminance profile (Figure 12b). By contrast, each interior step of luminance of Figure 13a also generates a boundary contour (Figure 13b). Thus, the monocular perceptual domains that are defined by the two luminance profiles are entirely different. In this sense, the two profiles induce, and are processed by, different perceptual spaces. These different parsings of the cell syncytium not only define different numbers of spatial domains, but also different sizes of domains over which featural quality can spread.

In addition, the smooth versus sharp contours in the two luminance profiles generate different feature-contour patterns (Figures 12c and 13c). The differences between the feature-contour patterns do not, however, explain Bergström’s data, because the feature-contour pattern at position $x_3'$ in Figure 13c is more intense than the feature-contour pattern at position $x_3$ in Figure 12c. This is the result one would expect from classical analyses of contrast enhancement. By contrast, when these feature-contour patterns are diffusively averaged between their respective boundary contours, the result of Bergström is obtained. The monocular brightness pattern at position $x_3$ in Figure 12d is more intense than the monocular brightness pattern at position $x_3'$ in Figure 13d. We therefore concur with Bergström in his claim that these results are paradoxical from the viewpoint of classical notions of brightness contrast.
We know of no other brightness theory that can provide a principled explanation of both the Arend et al. (1971) data and the Bergström (1966, 1967a, 1967b) data. In particular, both types of data cause difficulties for the Fourier theory of visual pattern perception as an adequate framework with which to explain brightness percepts. For example, the low-frequency spatial components in the two Bergström profiles in Figure 10 are similar, whereas the step-like contour in Figure 10b also contains high-
[Figure 11 plot: horizontal axis SPACE (positions 1 to 12); vertical axis scale 20 to 80.]
Figure 11. Magnitude estimates of brightness in response to the luminance profiles of Figure 10. (Redrawn from Bergström, 1966.)
[Figure 12 panels, "Bergström Brightness Paradox (1)": (a) input pattern; (b) boundary contour pattern; (c) feature contour pattern; (d) monocular brightness pattern.]
Figure 12. Simulation of a Bergström (1966) brightness experiment. The input pattern (a) generates boundary contours in (b) only around the luminance profile as a whole. By contrast, the input pattern in Figure 13a generates boundary contours around each step in luminance (Figure 13b). The input patterns in Figures 12a and 13a thus determine different syncytial domains within which featural filling-in can occur. The input patterns in Figures 12a and 13a also determine different feature-contour patterns (Figures 12c and 13c). The feature-contour pattern in Figure 13c is more active at position $x_3'$ than is the feature-contour pattern of Figure 12c at the corresponding position $x_3$. (See Figure 10 for definitions of $x_3$ and $x_3'$.) The feature-contour pattern of Figure 12c diffuses within the syncytial domains of Figure 12b, and the feature-contour pattern of Figure 13c diffuses within the syncytial domains of Figure 13b. The resultant brightness pattern of Figure 12d is more active at position $x_3$ than is the brightness pattern of Figure 13d at position $x_3'$. This feature-to-brightness reversal is due to the fact that the boundary-contour patterns and feature-contour patterns induced by the two input patterns are different. The global structuring of each feature-contour pattern within each syncytial domain determines the ultimate brightness pattern.
[Figure 13 panels, "Bergström Brightness Paradox (2)": (a) input pattern; (b) boundary contour pattern; (c) feature contour pattern; (d) monocular brightness pattern. Horizontal axis: POS.]
Figure 13. Simulation of a Bergström (1966) brightness experiment. See caption of Figure 12.
spatial-frequency components. One might therefore expect position $x_3'$ to look brighter than position $x_3$, whereas the reverse is true. In a similar fashion, when a rectangular luminance profile is Fourier analysed using the human modulation transfer function (MTF), it comes out looking like a Craik-O’Brien contour (Cornsweet, 1970). A Craik-O’Brien contour also comes out looking like a Craik-O’Brien contour. Our explanation, by contrast, shows why both Craik-O’Brien contours and rectangular contours look rectangular. Some advocates of the Fourier approach have responded to this embarrassment by saying that what the outputs of the MTF look like is irrelevant, since only the identity of these outputs is of interest. This argument has carefully selected its data. It does not deal with the problem that the interior and exterior activities of a Craik-O’Brien contour are the same and differ from the activities of the cusp boundary, whereas the interior and boundary activities of a rectangle are the same and differ from the activities of the rectangle exterior. The problem is not merely one of equivalence between two patterns. It is also one of the recognition of an individual pattern.

These difficulties of the Fourier approach do not imply that multiple spatial scales are unimportant during visual pattern perception. Multiple scale processing does not, however, provide a complete explanation. Moreover, the feature-contour processing within each scale needs to use shunting interactions, rather than additive interactions of the Fourier theory, in order to extract the relative contrasts of the feature-contour pattern (Appendices A and B; Grossberg, 1983b).

11. The Asymmetry Between Brightness Contrast and Darkness Contrast
In the absence of a theory to explain the Arend et al. and Bergström data, one might have hoped that a more classical explanation of these effects could be discovered by a more sophisticated analysis of the role of contrast enhancement in brightness perception. In both paradigms, it might at first seem that contrast enhancement around edges or inflection points could explain both phenomena in a unified way, if only a proper definition of contrast enhancement could be found. The following data of Hamada (1980) indicate, in a particularly vivid way, that more than a proper definition of contrast enhancement is needed to explain brightness data. Figure 14 depicts three luminance profiles. In Figure 14a, a uniform background luminance is depicted. (Although the background luminance is uniform, it is not, strictly speaking, a ganzfeld, for it is viewed within a perceptual frame.) In Figure 14b, a brighter Craik-O’Brien luminance profile is added to the background luminance. In Figure 14c, a darker Craik-O’Brien luminance profile is subtracted from the background luminance. The purity of this paradigm derives from the facts that its two Craik-O’Brien displays are equally long and that the background luminance is constant in all the displays. Thus, brightening and darkening effects can be studied uncontaminated by other variables. The classical theory of brightness contrast predicts that the more luminous edges in Figure 14b will look brighter than the background in Figure 14a and that, due to brightness contrast, the background around the more luminous edges in Figure 14b will look darker than the uniform pattern in Figure 14a. This is, in fact, what Hamada found. The classical theory of brightness contrast also predicts that the less luminous edges in Figure 14c will look darker than the background in Figure 14a and that, due to brightness contrast, the background around the less luminous edges in Figure 14c will look brighter than the background in Figure 14a.
Figure 14. The luminance contours studied by Hamada (1980). All backgrounds in (a)-(c) have the same luminance.
Hamada (1980) found, contrary to classical theory, that both the dark edges and the background in Figure 14c look darker than the background in Figure 14a. These data are paradoxical because they show that brighter edges and darker edges are, in some sense, asymmetrically processed, with brighter edges eliciting less paradoxical brightness effects than darker edges. Hamada (1976, 1978) developed a multistage mathematical model to attempt to deal with his challenging data. This model is remarkable for its clear recognition that a “nonopponent” type of brightness processing is needed in addition to a contrastive, or edge-extracting, type of brightness processing. Hamada did not define boundary contours or diffusive filling-in between these contours, but his important model should nonetheless be better known. Figures 15 and 16 depict a simulation of the Hamada data using our theory. As desired, classical brightness contrast occurs in Figure 15, whereas a nonclassical darkening of both figure and ground occurs in Figure 16. The dual action of signals from the BCS stage to the MBC stages as boundary-contour signals and as inhibitory feature-contour signals contributes to this result in our simulations. All of the results described up to now consider how activity patterns are generated within the MBCL and MBCR stages. In order to be perceived, these patterns must activate the BP stage. In the experiments already discussed, the transfer of patterned activity to the BP stage does not introduce any serious constraints on the brightness properties of the FIRE model. This is because all the experiments that we have thus far considered present the same image to both eyes. The experiments that we now discuss present different combinations of images to the two eyes. Thus they directly probe the process whereby monocular brightness domains interact to generate a binocular brightness percept.
12. Simulations of FIRE
In the remaining sections of the article, we describe computer simulations using the simplest version of the FIRE process and the same model parameters that were used in Cohen and Grossberg (1984). We show that this model qualitatively reproduces the main properties of Fechner’s paradox (Levelt, 1965), binocular brightness summation and averaging (Blake, Sloane, and Fox, 1981; Curtis and Rule, 1980), and a parametric brightness study of Cogan (1982) on the effects of rivalry, nonrivalry suppression, fusion, and contour-free images. Thus, although the model was not constructed to simulate these brightness data and does not incorporate many known theoretical refinements, it performs in a manner that closely resembles difficult data. We believe that these simulations place the following quotation from a recent publication into a new perspective: “The emerging picture is not simple.... Levelt’s theory ... works for binocular brightness perception, but not for sensitivity to a contrast probe.... It seems unlikely that any single mechanism can account for binocular interactions.... The theory of binocular vision is essentially incomplete” (Cogan, 1982, pp. 14-15). Before reporting simulations of brightness experiments, we review a few basic properties of this FIRE model. All the simulations were done on one-dimensional arrays of cells, for simplicity. All the simulations use pairs of input patterns that have zero disparity with respect to each other. The reaction of a single spatial scale to these input patterns will be reported. Effects using nonzero disparities and multiple spatial scales are described in Cohen and Grossberg (1984) and Grossberg (1983b). The input patterns should be interpreted as monocular patterns across MBCL and MBCR, rather than the scenic images themselves.
(a) Insensitivity to functional ganzfelds. In Figure 17, two identical input patterns exist at the MBCL and MBCR stages (Figure 17a).
Both input patterns are generated by putting a rectangular pattern through a Gaussian filter. This smoothing operation was sufficient to prevent the pathways MBCL → BP and MBCR → BP in Figure 4 from detecting suprathreshold contours in the input patterns. We call an input pattern that has no contours that are detectable by these pathways a “functional ganzfeld.” The FIRE process does not lift functional ganzfelds at any input intensity. The simulation illustrates that the BP stage is insensitive to input patterns that include no boundary contours detectable by its filtering operations.
(b) Figure-ground synthesis: Ratio scale and power law. Figure 18 describes the FIRE reaction that is triggered when a rectangular input pattern is superimposed
[Figure 15 panels, titled "Hamada brightness paradox (1)": (a) input pattern; boundary contour pattern; panels (c) and (d). Horizontal axis: position.]
Figure 15. Simulation of the Hamada (1980) brightness experiment. The dotted line in (d) describes the brightness level of the background in Figure 13a. Classical contrast enhancement is obtained in (d).
[Figure 16 panels, titled "Hamada brightness paradox (2)": input pattern; (b) boundary contour pattern; feature contour pattern; monocular brightness pattern. Horizontal axis: position.]
Figure 16. Simulation of the Hamada (1980) brightness experiment. The dotted line in (d) describes the brightness level of the background in Figure 13a. Both background and cusp of (a) look darker than this reference level.
0 on the next nerve cell v_2. Such a law would permit unbiased transmission of signals from one cell to another. We are faced with a dilemma, however, if the signal from v_1 to v_2 is due to the release of a chemical z(t) from v_1 that activates v_2. If such a chemical transmitter is persistently released when S is large, what keeps the net signal T from getting smaller and smaller as v_1 runs out of transmitter? Some means of replenishing, or accumulating, the transmitter must exist to counterbalance its depletion due to release from v_1. Based on this discussion, we can rewrite (1) in the form
$$T = Sz \tag{2}$$
and ask how the system can keep z replenished so that
$$z(t) \cong B \tag{3}$$
at all times t. This is a question about the sensitivity of v_2 to signals from v_1, since if z could decrease to very small values, even large signals S would have only a small effect on T. Equation (2) has the following interpretation. The signal S causes the transmitter z to be released at a rate T = Sz. Whenever two processes, such as S and z, are multiplied, we say that they interact by mass action, or that z gates S. Thus (2) says that z gates S to release a net signal T, and (3) says that the cell tries to replenish z to maintain the system’s sensitivity to S. Data concerning the gating action of transmitters in several neural preparations have been collected by Capek et al. (1971), Esplin and Zablocka-Esplin (1971), and Zablocka-Esplin and Esplin (1971). What is the simplest law that joins together both (2) and (3)? It is the following differential equation for the net rate of change dz/dt of z:
$$\frac{dz}{dt} = A(B - z) - Sz. \tag{4}$$
Equation (4) describes the following four processes going on simultaneously.
I and II. Accumulation and Production and Feedback Inhibition
The term A(B - z) enjoys two possible interpretations, depending on whether it represents a passive accumulation process or an active production process. In the former interpretation, there exist B sites to which transmitter can be bound, z sites are bound at time t, and B - z sites are unbound. Then term A(B - z) says simply that transmitter is bound at a rate proportional to the number of unbound sites. In the latter interpretation, two processes go on simultaneously. Term AB on the right-hand side of (4) says that z is produced at a rate AB. Term -Az says that once z is produced, it inhibits the production rate by an amount proportional to z’s concentration. In biochemistry, such an inhibitory effect is called feedback inhibition by the end product of a reaction. Without feedback inhibition, the constant rate AB of production would eventually cause the cell to burst. With feedback inhibition, the net production rate is A(B - z), which causes z(t) to approach the finite amount B, as we desire by (3). The term A(B - z) thus enables the cell to accumulate a target level B of transmitter.
III and IV. Gating and Release
Term -Sz in (4) says that z is released at a rate Sz, as we desire by (2). As in (2), release of z is due to mass action activation of z by S, or to gating of S by z (Figure 1). The two equations (2) and (4) describe the simplest dynamic law that “corresponds” to the constraints (2) and (3). Equations (2) and (4) hereby begin to reconcile the two constraints of unbiased signal transmission and maintenance of sensitivity when the signals are due to release of transmitter. All later refinements of the theory describe variations on this robust design theme.
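The transmitter law (4) and the gating rule (2) can be explored numerically. The following is a minimal sketch, assuming a forward-Euler integration scheme and illustrative parameter values that are not taken from the text; the function name `simulate_transmitter` is hypothetical. It shows z relaxing from the full level B toward the level at which accumulation balances release.

```python
def simulate_transmitter(A, B, S, z_init, dt=1e-3, steps=20000):
    """Forward-Euler integration of dz/dt = A*(B - z) - S*z, equation (4).

    The step size and step count are assumptions chosen for illustration.
    """
    z = z_init
    for _ in range(steps):
        z += dt * (A * (B - z) - S * z)
    return z


# Illustrative values only; they do not come from the text.
A, B, S = 1.0, 2.0, 3.0

z_final = simulate_transmitter(A, B, S, z_init=B)
z_equilibrium = A * B / (A + S)   # found by setting dz/dt = 0 in (4)
gated_signal = S * z_final        # T = S*z, equation (2)
```

With a sustained signal, `z_final` settles close to `z_equilibrium`, which lies below B: the gate remains open, but the released signal T = Sz is smaller than the undepleted value SB.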
3. Intracellular Adaptation and Overshoot
Before describing these variations, let us first note that equations (2) and (4) already imply important qualitative features of photoreceptor dynamics; namely, adaptation to maintained signal levels, and overshoot in response to sudden changes of signal level. Suppose for definiteness that S(t) = S_0 for all times t ≤ t_0 and that at time t = t_0, S(t) suddenly increases to S_1. By (4), z(t) reacts to the constant level S(t) = S_0 by approaching an equilibrium value z_0. This equilibrium value is found by setting dz/dt = 0 in (4) and solving to get
$$z_0 = \frac{AB}{A + S_0}. \tag{5}$$
By (2), the net signal T_0 to v_2 at time t = t_0 is
$$S_0 z_0 = \frac{ABS_0}{A + S_0}. \tag{6}$$
Now let S(t) switch to the value S_1 > S_0. Because z(t) is slowly varying, z(t) approximately equals z_0 for some time after t = t_0. Thus the net signal to v_2 during these times is approximately equal to
$$S_1 z_0 = \frac{ABS_1}{A + S_0}. \tag{7}$$
Equation (7) has the same form as a Weber law J(A + I)^{-1}. The signal S_1 is evaluated relative to the baseline S_0 just as J is evaluated relative to I. The Weber law in (7) is due to slow intracellular adaptation of the transmitter to the sustained signal level. A Weber law can also be caused by fast intercellular lateral inhibition across space, but the mechanisms underlying these two adaptive processes are entirely different (Grossberg, 1973, 1980). The capability for intracellular adaptation can be destroyed by matching the reaction rate of the transmitter to the fluctuation rate in S(t). For example, if z(t) reacts as quickly as S(t), then at all times t,
$$T(t) \cong \frac{ABS(t)}{A + S(t)} \tag{8}$$
no matter what values S(t) attains, so that the adaptational baseline, or memory of prior input levels, is destroyed. A basis for overshoot behavior can also be traced to z’s slow reaction rate. If z(t) in (4) reacts slowly to the new signal level S = S_1, it gradually approaches the new equilibrium point that is determined by S = S_1, namely
$$z_1 = \frac{AB}{A + S_1}, \tag{9}$$
as the net signal decays to the asymptote
$$S_1 z_1 = \frac{ABS_1}{A + S_1}. \tag{10}$$
Figure 1. (a) Production, feedback inhibition, gating, and release of a transmitter z by a signal S. (b) Mass action transmitter accumulation at unoccupied sites has the same formal properties as production and feedback inhibition.
Figure 2. Overshoot and habituation of the gated signal T = Sz due to a sudden increment in signal S.
Thus after S(t) switches from S_0 to S_1, the net signal T = Sz jumps from (6) to (7) and then gradually decays to (10) (Figure 2). The exact course of this overshoot and decay is described by the equation
$$S_1 z(t) = \frac{ABS_1}{A + S_0}\exp\{-(A + S_1)(t - t_0)\} + \frac{ABS_1}{A + S_1}\bigl(1 - \exp\{-(A + S_1)(t - t_0)\}\bigr) \tag{11}$$
for t ≥ t_0. Equation (11) shows that the gain or averaging rate A + S_1 of T through time increases with the size of the signal S_1. The transmitter law (4) is thus capable of “automatic gain control” by the signal. The sudden increment followed by slow decay of T is called “overshoot” in a photoreceptor and “habituation” in various other neural preparations.
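The overshoot described by (6), (7), (10), and (11) can be traced with a short calculation. The sketch below assumes the closed-form solution of (4) for a step from S_0 to S_1 at time t_0, starting from the adapted level z_0 of (5); the function name `gated_signal_after_step` and the parameter values are illustrative assumptions, not taken from the text.

```python
import math


def gated_signal_after_step(t, A, B, S0, S1, t0=0.0):
    """Gated signal T(t) = S1 * z(t) for t >= t0, following equation (11)."""
    rate = A + S1                          # gain of T after the step
    decay = math.exp(-rate * (t - t0))
    peak = A * B * S1 / (A + S0)           # value just after the step, equation (7)
    asymptote = A * B * S1 / (A + S1)      # habituated level, equation (10)
    return peak * decay + asymptote * (1.0 - decay)


# Illustrative values only.
A, B, S0, S1 = 1.0, 1.0, 1.0, 4.0

before_step = A * B * S0 / (A + S0)                       # equation (6)
at_step = gated_signal_after_step(0.0, A, B, S0, S1)      # equation (7)
long_after = gated_signal_after_step(50.0, A, B, S0, S1)  # approaches (10)
```

With these values the signal jumps from 0.5 to 2.0 at the step and then habituates toward 0.8, which is still above the pre-step level. Raising S_1 raises the decay rate A + S_1, which is the automatic gain control noted above.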
4. Monotonic Increments and Nonmonotonic Overshoots to Flashes on Variable Background
The minimal transmitter model implies more subtle properties as well. Some of these properties figure prominently in our explanation of the BHL data. Others stand as experimental predictions. Baylor et al. (1974b) found that, in response to a flash of fixed size superimposed on a succession of increasing background intensities, the cone potential V reacts with a progressive decrease in the size of its transient response. By contrast, V’s transient response reaches its peak at successively earlier times until a sufficiently high background intensity is reached. In response to even higher background intensities, the potential reaches its peak at successively later times (Figure 3). This is a highly nonlinear effect. In our theory, T is the input to the photoreceptor’s potential V. We study T in its own right to provide a better understanding of V’s behavior in the full theory. Simple approximations make possible analytic estimates that qualitatively explain the behavior in Figure 3. Since a flash sets off a chain reaction in the cone, and the chain reaction lasts for some time after the flash terminates, we approximate the chain reaction by a rectangular step of fixed size δ. When a flash occurs at a succession of background intensities, we superimpose the step δ on a succession of background intensities S (Figure 4). We estimate the effect of the flash on the potential peak by computing the initial change in T due to the change in S by δ. We also estimate a possible initial “hump” in the potential through time by measuring the height and the area of the overshoot created by prescribed background levels S (Figure 4). The initial change in T to a change in S by δ is found to be a decreasing function of S. This result is analogous to the decreasing size of the potential change caused by a fixed flash at successively higher background intensities. However, the size of the overshoot, or “hump,” need not be a decreasing function of S.
If δ is sufficiently small, then the overshoot size can increase before it decreases as a function of S. In other words, a more noticeable hump can appear at large background intensities S, but it can eventually shrink as the background intensity is increased even further. Baylor et al. (1974b) report humps at high background intensities as well as their shrinkage at very high background intensities. To estimate the change ΔT due to a step size of δ, we subtract (6) from (7) to find
$$\Delta T = \frac{AB(S_1 - S_0)}{A + S_0}.$$
Let S_1 - S_0 = δ, corresponding to a step of fixed size δ superimposed after S(t) equilibrates to a background intensity S_0 = S. Then
$$\Delta T = \frac{AB\delta}{A + S},$$
which is a decreasing function of S. To estimate the overshoot size Ω, we subtract (10) from (7) to find
$$\Omega = \frac{ABS_1(S_1 - S_0)}{(A + S_0)(A + S_1)}.$$
Again setting S_1 - S_0 = δ and S_0 = S, we express Ω as the function of S
$$\Omega(S) = \frac{AB(S + \delta)\delta}{(A + S)(A + S + \delta)}. \tag{13}$$
Figure 3. The transient reactions of a cone potential to a fixed flash superimposed on a succession of increasing background levels. The potential peaks decrease, whereas the times of maximal potential first decrease and then increase, as the background parametrically increases. Effect of increasing intensity of conditioning step on response to 11 msec flash applied 1.1 sec after beginning of a step lasting 1.7 sec. The abscissa is the time after the middle of the flash; and the ordinate is U(t)/(IΔt), where U(t) is the hyperpolarization, Δt is the pulse duration, and I is proportional to flash intensity. The numbers against the curves give the logarithm of the conditioning light expressed in photoisomerizations cone^-1 sec^-1. Redrawn from Figure 3 (Baylor and Hodgkin, 1974, p.734).
Figure 4. An input step of fixed size δ on a background S causes a transient change in T of size ΔT and an overshoot of size Ω. How does Ω(S) change as a function of S?
To test whether Ω(S) increases or decreases as a function of S, we compute whether dΩ/dS is positive or negative. One readily proves that dΩ/dS > 0 at S = 0 if A > ½(1 + √5)δ, and that dΩ/dS < 0 if S > -δ + √(A(A - δ)), or if A < δ. In other words, the size of the overshoot always decreases as a function of S if S is chosen sufficiently large, but the overshoot size increases at small S values if the increment δ is sufficiently small. A similar type of non-monotonic behavior describes the total area of the overshoot.
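The non-monotonic dependence of the overshoot on the background level can be checked numerically. The sketch below assumes the overshoot formula (13) and the turning point S = -δ + √(A(A - δ)) that follows from setting dΩ/dS = 0; the function names and parameter values are illustrative assumptions, not taken from the text.

```python
import math


def overshoot(S, A, B, delta):
    """Overshoot size Omega(S) for a step delta on background S, equation (13)."""
    return A * B * (S + delta) * delta / ((A + S) * (A + S + delta))


def overshoot_peak_background(A, delta):
    """Background S at which Omega(S) is largest; 0.0 if Omega only decreases."""
    if A <= delta:
        return 0.0
    return max(0.0, -delta + math.sqrt(A * (A - delta)))


# Illustrative values only.
A, B = 1.0, 1.0
small_delta = 0.1   # A > (1 + sqrt(5))/2 * delta: overshoot first grows with S
large_delta = 0.9   # A < (1 + sqrt(5))/2 * delta: overshoot decreases from S = 0

S_peak = overshoot_peak_background(A, small_delta)
```

For the small step the overshoot grows until the background reaches `S_peak` and shrinks beyond it; for the large step `overshoot_peak_background` returns 0.0, so the overshoot shrinks at every background level.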
5. Miniaturized Transducers and Enzymatic Activation of Transmitter Production
We will now discuss how the time at which the potential reaches its peak can first decrease and then increase as a function of background intensity. Our discussion again centers on the design theme of ensuring the transducer’s sensitivity. By proceeding in this principled fashion, we can explain more than the “turn-around” of the potential peaks. We can also explain why the steady-state of T as a function of S can obey a law of the form
$$T = \frac{PS(1 + QS)}{1 + RS + US^2} \tag{16}$$
with P, Q, R, and U constants, rather than a law of the form
$$T = \frac{PS}{1 + RS} \tag{17}$$
as in (6).
Equation (16) is the analog within our theory of the BHL equation
$$\frac{z_1}{K} = \frac{PS(1 + QS)}{1 + RS} \tag{18}$$
for the steady-state level of their “blocking” variable z_1. Equation (18) cannot be valid at very large S values because it predicts that z_1 can become arbitrarily large, which is physically meaningless. This does not happen in (16). The appearance of term US^2 in (16) allows us to fit BHL’s steady-state data better than they could using (18). More important than this quantitative detail is the qualitative fact that the mechanism which replaces (17) by (16) also causes the turn-around in the peak potential. We now suggest that this mechanism is a light-induced enzymatic modulation of transmitter production and/or mobilization rates. Thus we predict that selective poisoning of this enzymatic mechanism can simultaneously abolish the turn-around in the potential peak and reduce (16) to (17). The need for enzymatic modulation can be motivated by the following considerations. Despite the transmitter accumulation term A(B - z) in equation (4), habituation to a large signal S can substantially deplete z, as in (5). What compensatory mechanism can counteract this depletion as S increases? Can a mechanism be found that maintains the sensitivity of the transmitter gate even at large S values? One possibility is to store an enormous amount of transmitter, just in case; that is, choose a huge constant B in (4). This strategy has the fatal flaw that a very large storage depot takes up a lot of space. If each photoreceptor is large, then the number of photoreceptors that can be packed into a unit retinal area will be small. Consequently the spatial resolution of the retina will be poor in order to make its resolution of individual input intensities good. This solution is unsatisfactory. Given this insight, our design problem can be stated in a more refined fashion as follows: How can a miniaturized receptor maintain its sensitivity at large input values? An answer is suggested by inspection of equation (4).
In equation (4), the transmitter depletion rate -Sz increases as S increases, but the transmitter production rate A is constant. If the production rate keeps up with the depletion rate, then transmitter can be made continuously available even if B is not huge. The marriage of miniaturization to sensitivity hereby suggests that the coefficient A is enzymatically activated by the signal S. Let us suppose that this enzymatic step obeys the simplest mass action equation,
$$\frac{dA}{dt} = -C(A - A_0) + D[E - (A - A_0)]S. \tag{19}$$
In (19), A(t) has a baseline level A_0 in the dark (S = 0). Turning light on makes S positive and drives A(t) towards its maximum value A_0 + E. Rewriting (19) as
$$\frac{dA}{dt} = -(C + DS)(A - A_0) + DES \tag{20}$$
shows that the activation of A(t) by a constant signal S increases the gain C + DS as well as the asymptote
$$A = A_0 + \frac{DES}{C + DS} \tag{21}$$
of A(t). This asymptote can be rewritten in the convenient form
$$A = A_0\left(\frac{1 + FS}{1 + GS}\right) \tag{22}$$
by using the notation
$$F = (A_0 + E)DA_0^{-1}C^{-1} \tag{23}$$
and
$$G = DC^{-1}. \tag{24}$$
To make our main qualitative points, let us assume for the moment that the enzymatic activation of A by S proceeds much more rapidly than the release of z by S. Then A(t) approximately equals its asymptote in (22) at all times t. Equation (4) can then be replaced by the equation
$$\frac{dz}{dt} = A_0\left(\frac{1 + FS}{1 + GS}\right)(B - z) - Sz. \tag{25}$$
Let us use (25) to compute the steady-state response T = Sz to a sustained signal S. We find that
$$T = \frac{PS(1 + QS)}{1 + RS + US^2} \tag{16}$$
where
$$P = B, \tag{26}$$
$$Q = F = (A_0 + E)DA_0^{-1}C^{-1}, \tag{27}$$
$$R = A_0^{-1} + F = A_0^{-1}[1 + (A_0 + E)DC^{-1}], \tag{28}$$
and
$$U = GA_0^{-1} = DA_0^{-1}C^{-1}. \tag{29}$$
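The reduction of the enzymatically modulated gate to the rational form (16) can be verified numerically. The sketch below assumes the fast-enzyme approximation behind (25) and compares the coefficients (26)-(29) against a direct steady-state calculation from the asymptote (21); the function names and parameter values are illustrative assumptions, not taken from the text.

```python
def steady_state_coefficients(A0, B, C, D, E):
    """Coefficients P, Q, R, U of equation (16), following (23)-(24) and (26)-(29)."""
    F = (A0 + E) * D / (A0 * C)   # equation (23)
    G = D / C                     # equation (24)
    return B, F, 1.0 / A0 + F, G / A0


def steady_state_T(S, P, Q, R, U):
    """Steady-state gated signal from the rational law (16)."""
    return P * S * (1.0 + Q * S) / (1.0 + R * S + U * S * S)


def steady_state_T_direct(S, A0, B, C, D, E):
    """Same quantity computed directly: A from (21), then z from dz/dt = 0 in (4)."""
    A = A0 + D * E * S / (C + D * S)
    z = A * B / (A + S)
    return S * z


# Illustrative values only.
A0, B, C, D, E = 0.5, 2.0, 1.0, 0.3, 4.0
P, Q, R, U = steady_state_coefficients(A0, B, C, D, E)
signals = [0.1, 1.0, 10.0, 100.0]
from_formula = [steady_state_T(S, P, Q, R, U) for S in signals]
from_direct = [steady_state_T_direct(S, A0, B, C, D, E) for S in signals]
```

The two lists agree to rounding error, and the coefficients satisfy Q < R and QR > U for any positive parameter choice, since F ≥ G whenever E ≥ 0.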
Note that the form of (16) does not change if S is related to light intensity I by a law of the form
$$S(I) = \frac{\mu I}{1 + \nu I}. \tag{30}$$
Only the coefficients P, Q, R, and U change.
6. Turn-Around of Potential Peaks at High Background Intensities
Despite the assumption that A depends on S, all of our explanations thus far use a single differential equation (25). We will qualitatively explain the turn-around of potential peaks, the quenching of a second overshoot in double flash experiments, and the existence of rebound hyperpolarization when a depolarizing current is shut off during a hyperpolarizing light using only this differential equation. In the BHL theory, by contrast, a substantial number of auxiliary differential equations are needed to explain all of these phenomena at once. Moreover, we can quantitatively fit the data using only equation (25) better than BHL can fit the data using all their auxiliary variables. Our full theory provides an even better fit. More importantly, equation (25) suggests that all these phenomena are properties of a transmitter gate. To qualitatively explain the turn-around of peak potential as background intensity increases, we consider Figures 5 and 6. In Figure 5, S starts out at a steady-state value S_0. Then a flash causes a chain reaction which creates a gradual rise and then fall in S. Function S reaches its maximum at the time t = t_S when dS/dt = 0. The transmitter z responds to the increase in S by gradually being depleted. As the chain reaction wears off, z gradually accumulates
again. Function z reaches its minimum at the time t = t_z when dz/dt = 0. From Figure 5, we can conclude that the gated signal T = Sz reaches a maximum at a time t = t_T before S reaches its maximum. This is because
$$\frac{dT}{dt} = \frac{dS}{dt}z + S\frac{dz}{dt}. \tag{31}$$
After time t = t_S, both dS/dt and dz/dt are negative until the chain reaction wears off. Thus dT/dt is also negative during these times. Consequently dT/dt = 0 at a time t_T < t_S. Figure 6 explains the turn-around by plotting the times when dS/dt = 0, dz/dt = 0, and dT/dt = 0 as a function of the background level S_0. In Figure 6, we think of t_S(S_0), t_z(S_0), and t_T(S_0) as functions of S_0. Two properties control the turn-around: (a) the function t_S(S_0) might or might not decrease as S_0 increases, but eventually it must become approximately constant at large S_0 values; (b) the function t_z(S_0) decreases faster as S_0 increases until t_z(S_0) approximately equals t_S(S_0) at large S_0 values. Property (a) is due to the fact that the photoreceptor has a finite capacity for reacting to photons in a unit time interval. After this capacity is exceeded, higher photon intensities cannot be registered. Property (b) is due to the light-induced increase of z’s reaction rate to higher S_0 levels. Light speeds up z’s reaction rate, so that at higher S_0 values z can equilibrate faster to the chain reaction S. In particular, t_z(S_0) approaches t_S(S_0) as S_0 increases. Using properties (a) and (b), we will now explain the turn-around. When S_0 is small, dz/dt is also small. By (31) this means that dT/dt = 0 almost when dS/dt = 0, or that t_T ≅ t_S. As S_0 increases to intermediate values, the chain reaction S also increases. Consequently dz/dt becomes more negative and makes z smaller. Also z’s gain is sped up, so that t_z approaches closer to t_S. In (31), this means that S dz/dt will be large and negative at times when z is small. To achieve dT/dt = 0, we therefore need dS/dt to be large and positive. In other words, T reaches its peak while S is still growing rapidly. Hence t_T occurs considerably earlier than t_S. This argument shows why the peak of T occurs earlier as S_0 increases. Why does turn-around occur? Here properties (a) and (b) are fully used.
By property (b), z reaches its minimum right after S reaches its maximum if S_0 is large. In other words, t_z approaches t_S as S_0 becomes large. This means that both dS/dt = 0 and dz/dt = 0 at almost the same time. By (31), also dT/dt = 0 at about this time. In all, t_T ≅ t_S ≅ t_z if S_0 ≫ 0. Now we use property (a). Since t_S(S_0) is approximately constant at large S_0 values, t_T(S_0) must bend backwards from its position much earlier than t_S(S_0) at intermediate S_0 values to a position closer to t_S(S_0) at large S_0 values. This is the turn-around that we seek.
7. Double Flash Experiments
In BHL’s double flash experiments, a bright flash causes the potential to overshoot. A second bright flash that occurs while the potential is reacting to the chain reaction caused by the first flash does not cause an overshoot even though it extends the duration of the chain reaction. This effect can be explained as follows (Figure 7). The first bright flash causes an overshoot due to the slow reaction of z to the onset of the chain reaction, as in Section 4. For definiteness suppose that z(0) = B at time t = 0 and that the chain reaction starts rising at time t = 0 to a maintained intensity of approximately S. By (25), z(t) decreases from B to approximately
$$z = \frac{P(1 + QS)}{1 + RS + US^2}. \tag{32}$$
Figure 5. Signal S(t) peaks at time t = t_S before transmitter z(t) reaches its minimum at time t = t_z. Consequently, the gated signal T = Sz peaks at a time t = t_T earlier than t = t_S.
Figure 6. As t_z(S_0) is drawn closer to t_S(S_0) at large S_0 values due to enzymatic activation of transmitter accumulation rate, t_T(S_0) reaches a minimum and begins to increase again.
This decrease in z(t) causes the overshoot, since the product Sz(t) first increases due to the fast increase in S(t) and then decreases due to the slower decrease in z(t). Once z(t) equilibrates at the level (32), it thereafter maintains this level until the chain reaction decays. In a double flash experiment, the second flash occurs before the chain reaction can decay. The second flash maintains the chain reaction a while longer at the level S. No second overshoot in z occurs simply because z has already equilibrated at the level (32) by the time the second flash occurs. When T is coupled to the potential V, the overshoot in T also causes a gain change in V’s reaction rate. BHL noticed this gain change and introduced another conductance into their model whose properties were tailored to explain the double flash experiment. In the BHL model, this second conductance is a rather mysterious quantity (see Section 15). In our model, it follows directly from the slow fluctuation rate of the transmitter gate (Section 9). Our model’s predictions can be differentiated from those of the BHL model because they all depend on the slow rate of the transmitter gate. Speeding up the transmitter’s reaction rates should eliminate not only the overshoot and the second conductance, but also the photoreceptor’s ability to remember an adaptational baseline (Section 3).
Figure 7. Effect of a bright conditioning flash on the response to a subsequent bright test flash. (a) Response to test flash alone. (b) Response to conditioning flash alone. (c) Response to both flashes, with the upper two responses dotted. Redrawn from Figure 15 of Baylor et al. (1974a, p.716).
8. Antagonistic Rebound by an Intracellular Dipole: Rebound Hyperpolarization Due to Current Offset
The ubiquity of the gating design in neural systems is illustrated in a striking way by the following data. Baylor et al. (1974a) showed that offset of a rectangular pulse of depolarizing current during a cone's response to light causes a rebound hyperpolarization of the cone's potential. By contrast, offset of a depolarizing current in the absence of light does not cause a rebound hyperpolarization (Figure 8). In other words, an antagonistic rebound in potential, from depolarization to hyperpolarization, can sometimes occur.
Figure 8. Changes in potential produced by current in darkness (a), and during the response to light (b), superimposed tracings. Between arrows, a rectangular pulse of depolarizing current (strength 1.5x lo-'') was passed through the microelectrode. (c) is the response to light without current. Redrawn from Figure 10 of Baylor et al. (1974a, p.706).
One of the most important properties of a slow gate is its antagonistic rebound property. This property was first derived to explain data about reinforcement and attention in Grossberg (1972a, 1972b, 1975) and was later used to explain data about perception and cognitive development in Grossberg (1976, 1980). These results show how antagonistic rebounds can be caused when the signals to one or both of two parallel channels are gated before the gated signals compete to elicit net outputs from the channels (Figure 9). In reinforcement and cognitive examples, the two competing channels have typically been interpreted to be due to intercellular interactions. The competing channels implicated by the BHL data are, by contrast, intracellular. They are the depolarizing and hyperpolarizing voltage-conductance terms in the membrane equation for the cone potential (Section 9). In the remainder of this section, we will review how slow gates can cause antagonistic rebounds. Then we will have reached the point where the gated signal must be coupled to the potential in order to derive further insights. This coupling is, however, quite standard in keeping with our claim that most of the interesting properties of the BHL data are controlled by the fluctuations of T under particular circumstances. To explain the main idea behind antagonistic rebound, suppose that one channel receives input S_1 and that the other channel receives input S_2 = S_1 + ε, ε > 0. Let the first channel possess a slow gate z_1 and the second channel possess a slow gate z_2. Suppose for definiteness that each gate satisfies
$$z_i = \frac{P(1 + QS_i)}{1 + RS_i + US_i^2}, \tag{33}$$

$i = 1, 2$, as in (32). The explicit form of (33) is irrelevant. All we need is the property that $z_i$ is a decreasing function of $S_i$. In other words, larger signals can deplete more transmitter. This is true in (33) because, by (27) and (28), $Q < R$. However, the opposite is true for the gated signals $T_1 = S_1z_1$ and $T_2 = S_2z_2$. The function

$$T = \frac{PS(1 + QS)}{1 + RS + US^2} \tag{16}$$

is an increasing function of $S$ because, by (27)-(29), $QR > U$. In other words, a larger $S$ signal produces a larger output $T$ even though it depletes more $z$. This simple yet subtle fact about gates lies at the heart of our explanation of antagonistic rebound. The property was first derived in Grossberg (1968, 1969). The lack of widespread knowledge of this property among experimentalists has caused much unnecessary confusion about
the dynamics of transmitters in various neural systems. Because this fact was not known to BHL, they found an ingenious, albeit unintuitive, way to explain the rebound in terms of their second conductance. Our theory differs from theirs strongly on this point. The steady-state equation (18) does not embody either the intuitive meaning or the mathematical properties of our steady-state equation (16).

In our theory, antagonistic rebound can be trivially proved as follows. When $\epsilon$ is on, $S_2 > S_1$. Consequently, despite the fact that $z_2 < z_1$, it follows that $T_2 > T_1$. After competition acts, the net output $T_2 - T_1$ of the on-channel is positive. To see how rebound occurs, shut $\epsilon$ off. Then $S_2$ and $S_1$ rapidly equalize at the value $S_1$. However, $z_2$ and $z_1$ change more slowly. Thus the inequality $z_2 < z_1$ persists for some time. Consequently the net output reverses sign because

$$T_2 - T_1 \cong S_1(z_2 - z_1) < 0 \tag{34}$$

and an antagonistic rebound occurs. The rebound is transient due to the fact that $z_2$ and $z_1$ gradually equilibrate to the same input $S_1$ at a common value $z_1$, and thus

$$T_2 - T_1 = 0 \tag{35}$$

after equilibration occurs. A similar argument shows how antagonistic rebound can occur if only the channel whose input is perturbed contains a slow gate.

Figure 9. A gated dipole. Signals $S_1$ and $S_2$ are gated by the slow transmitters $z_1$ and $z_2$, respectively, before the gated signals $T_1 = S_1z_1$ and $T_2 = S_2z_2$ compete to generate a net reaction.
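The rebound argument can be checked numerically. The following is a minimal sketch rather than the chapter's own simulation: it assumes that each gate obeys the simple depletion law (4), $dz/dt = A(B - z) - Sz$, and the function name `simulate_gated_dipole` and the values of `a`, `b`, `s1`, and `eps` are illustrative assumptions.

```python
def simulate_gated_dipole(s1=1.0, eps=1.0, a=0.1, b=1.0,
                          t_on=50.0, t_off=50.0, dt=0.01):
    """Euler-integrate two slow gates and return the net output T2 - T1.

    Assumed gate law (equation (4) style): dz/dt = a*(b - z) - s*z.
    Channel 1 always receives s1; channel 2 receives s1 + eps while the
    increment is on and s1 after it is shut off.
    """
    z1 = a * b / (a + s1)   # both gates start equilibrated to the input s1
    z2 = z1
    steps_on = int(round(t_on / dt))
    steps_off = int(round(t_off / dt))
    times, net = [], []
    for k in range(steps_on + steps_off):
        s2 = s1 + eps if k < steps_on else s1
        times.append(k * dt)
        net.append(s2 * z2 - s1 * z1)          # gated signals compete
        z1 += dt * (a * (b - z1) - s1 * z1)    # slow gate, channel 1
        z2 += dt * (a * (b - z2) - s2 * z2)    # slow gate, channel 2
    return times, net, steps_on


if __name__ == "__main__":
    times, net, k_off = simulate_gated_dipole()
    print("net output just before offset:", net[k_off - 1])
    print("net output just after offset: ", net[k_off])
    print("net output at end of run:     ", net[-1])
```

Under these assumed values the net output is positive while the increment is on, becomes negative immediately after offset because $z_2 < z_1$ persists, and then decays toward zero as the two gates equilibrate to the same input.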
Figure 10. Shift of dynamic range to increments in log $S$ after transmitter equilibrates to different background intensities $S_0$, $S_1$, $S_2$, .... (Axes: gated signal versus log intensity.)

9. Coupling of Gated Input to the Photoreceptor Potential
The photoreceptor potential $V$ is assumed to obey the standard membrane equation

$$C_0 \frac{dV}{dt} = (V^+ - V)g^+ + (V^- - V)g^- + (V^p - V)g^p \tag{36}$$

where $V(t)$ is a variable voltage; $C_0$ is a capacitance; $V^+$, $V^-$, and $V^p$ are excitatory, inhibitory, and passive saturation points, respectively; $g^+$, $g^-$, and $g^p$ are excitatory, inhibitory, and passive conductances, respectively; and

$$V^- \le V^p < V^+. \tag{37}$$

Then $V^- \le V(t) \le V^+$ for all $t \ge 0$ if $V^- \le V(0) \le V^+$. By rewriting (36) as

$$C_0 \frac{dV}{dt} = -(g^+ + g^- + g^p)V + V^+g^+ + V^-g^- + V^pg^p \tag{38}$$

we notice that the total gain of $V$ is $g^+ + g^- + g^p$ and the asymptote of $V$ in response to constant conductances is

$$\frac{V^+g^+ + V^-g^- + V^pg^p}{g^+ + g^- + g^p}. \tag{39}$$
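As a rough numerical illustration of (36)-(39), the sketch below computes the gain and asymptote for constant conductances and checks the asymptote by integrating the membrane equation. The function names and all numerical values are hypothetical choices made for the example, not values taken from the turtle cone data.

```python
def membrane_gain_and_asymptote(g_exc, g_inh, g_pas, v_exc, v_inh, v_pas):
    """Return the total gain and the asymptote of V, as in (38)-(39)."""
    gain = g_exc + g_inh + g_pas
    asymptote = (v_exc * g_exc + v_inh * g_inh + v_pas * g_pas) / gain
    return gain, asymptote


def integrate_membrane(v_start, g_exc, g_inh, g_pas, v_exc, v_inh, v_pas,
                       c0=1.0, t_end=20.0, dt=0.001):
    """Euler-integrate the membrane equation (36) with constant conductances."""
    v = v_start
    for _ in range(int(round(t_end / dt))):
        dv = ((v_exc - v) * g_exc + (v_inh - v) * g_inh
              + (v_pas - v) * g_pas) / c0
        v += dt * dv
    return v


if __name__ == "__main__":
    # Hypothetical saturation points and conductances (arbitrary units).
    sat = dict(v_exc=1.0, v_inh=-1.0, v_pas=0.0)
    for g_exc in (2.0, 1.0):   # lowering g_exc mimics the action of light
        gain, asym = membrane_gain_and_asymptote(g_exc, 1.0, 0.5, **sat)
        v_end = integrate_membrane(0.0, g_exc, 1.0, 0.5, **sat)
        print(g_exc, gain, asym, v_end)
```

In this example, lowering the excitatory conductance reduces both the gain and the asymptote, which is the combination of slowing and hyperpolarization that the text attributes to light.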
Both the gain and the asymptote are altered by changing the conductances. In the special case of the turtle cone, light acts by decreasing the excitatory conductance g+ (Baylor et al., 1974b). We will assume below that the gated signal T causes this change in g+. Light hereby slows down the cone’s reaction rate as it hyperpolarizes the cone (driving V towards V-). We wish to emphasize at the outset that similar results would hold if we assumed that T increased, rather than decreased, g+. The main difference
would be a speeding up of the potential change rather than its slowing down by inputs $T$. In all situations wherein $V$ can react more quickly than $T$ can fluctuate, differences in the gain of $V$ do not imply new qualitative properties, although they can imply quantitative differences. One of these differences is that a decrease of $V$'s gain as $T$ increases prolongs the duration of $V$'s reaction to light.

We will couple $T$ to $g^+$ using a simple mass action law. Suppose that there exist $g_0$ membrane "pores" of which $g^+ = g(t)$ pores are open and $g_0 - g(t)$ are closed at any time $t$. Suppose that $T$ closes open pores by mass action, so that $g_0$ pores will open after $T$ shuts off. Then

$$\frac{dg}{dt} = H(g_0 - g) - JgT \tag{40}$$

where $H$ and $J$ are positive constants. Suppose also that this process is rapid compared to $V$'s reaction rate to changes in $g$. We can then assume that $g$ is always in approximate equilibrium with $T$. Setting $dg/dt = 0$, we find

$$g^+ = \frac{g_0}{1 + KT} \tag{41}$$

where $K = JH^{-1}$. To achieve a more symmetric notation, we write $g^- = g_1$ and for simplicity set $g^p = 0$. We also rescale the time variable so that $C_0 = 1$ in (36). Then equation (36) takes the form

$$\frac{dV}{dt} = (V^+ - V)\frac{g_0}{1 + KT} - (V - V^-)g_1. \tag{42}$$
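Our next steps are to compute the equilibrium potential $V_0$ that occurs when $T = 0$, and to write an equation for the amount of hyperpolarization

$$x = V_0 - V \tag{43}$$

that occurs in response to an arbitrary function $T$. We find

$$V_0 = \frac{V^+g_0 + V^-g_1}{g_0 + g_1} \tag{44}$$

and

$$\frac{dx}{dt} = \frac{-(g_0 + g_1 + g_1KT)x + LT}{1 + KT} \tag{45}$$

where

$$L = Kg_1(V_0 - V^-). \tag{46}$$

The steady-state value $x_\infty$ of $x$ in response to a constant or slowly varying $T$ is found by (45) to be

$$x_\infty = \frac{MT}{N + T} \tag{47}$$

where

$$M = V_0 - V^- > 0 \tag{48}$$

and

$$N = \frac{g_0 + g_1}{Kg_1}. \tag{49}$$

From (47), it follows that

$$\frac{Nx_\infty}{M - x_\infty} = T, \tag{50}$$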
where $M$ is the maximum possible level of hyperpolarization. This equation is formally identical to the BHL equation (51) (in their notation), except that their blocking variable $z_1$ is replaced by our gated signal $T = Sz$ (Baylor et al., 1974b). The formal similarity of (50) to (51) is one cornerstone on which our fit to the BHL data is based. Another cornerstone is the fact that $T$ satisfies the equation

$$T = \frac{PS(1 + QS)}{1 + RS + US^2} \tag{16}$$

whereas $z_1$ satisfies

$$\frac{z_1}{K} = \frac{PS(1 + QS)}{1 + RS}. \tag{18}$$

BHL relate data about $U$ to data about $S$ via the hypothetical process $z_1$ using (18) and (51) just as we related data about $x$ to data about $S$ via the hypothetical process $T$ using (16) and (50). Despite these formal similarities, the substantial differences between other aspects of the two theories show how basic the gating concept is in transmitter dynamics.
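The steady-state relation (47) can be checked against the potential equation (42) numerically. The sketch below is illustrative only: the function names and parameter values are hypothetical, $M = V_0 - V^-$ is taken from (48), and the expression used for $N$, namely $(g_0 + g_1)/(Kg_1)$, is an assumption obtained here by setting $dV/dt = 0$ in (42) with $g^+ = g_0/(1 + KT)$.

```python
def hyperpolarization_constants(v_exc, v_inh, g0, g1, k):
    """Return V0 (equilibrium potential at T = 0), M, and N."""
    v0 = (v_exc * g0 + v_inh * g1) / (g0 + g1)
    m = v0 - v_inh                 # maximum possible hyperpolarization
    n = (g0 + g1) / (k * g1)       # assumed form, derived from (42)
    return v0, m, n


def integrate_hyperpolarization(t_signal, v_exc, v_inh, g0, g1, k,
                                t_end=50.0, dt=0.001):
    """Integrate (42) at constant gated signal T and return x = V0 - V."""
    v0, _, _ = hyperpolarization_constants(v_exc, v_inh, g0, g1, k)
    g_exc = g0 / (1.0 + k * t_signal)   # pores closed by T, as in (41)
    v = v0
    for _ in range(int(round(t_end / dt))):
        v += dt * ((v_exc - v) * g_exc - (v - v_inh) * g1)
    return v0 - v


if __name__ == "__main__":
    params = dict(v_exc=1.0, v_inh=-1.0, g0=1.0, g1=0.5, k=2.0)  # hypothetical
    _, m, n = hyperpolarization_constants(**params)
    for t_signal in (0.1, 1.0, 10.0):
        x_num = integrate_hyperpolarization(t_signal, **params)
        print(t_signal, x_num, m * t_signal / (n + t_signal))
```

For each constant $T$, the integrated hyperpolarization should agree with $MT/(N + T)$ and stay below $M$.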
10. “Extra” Slow Conductance During Overshoot and Double Flash Experiments

Baylor et al. (1974a) found that a bright flash causes an overshoot in hyperpolarization followed by a plateau phase before the potential returned to its baseline level. They also found that an extra conductance accompanies the overshoot. Because their blocking and unblocking variables could not explain these overshoot and conductance properties, they added a new conductance term, denoted by $G_f$, to their voltage equation and defined its properties to fit the data. Baylor et al. (1974b) also defined the properties of $G_f$ to explain double flash experiments. If a second bright flash occurs during the plateau phase of the response to the first flash, then the plateau phase is prolonged, but a second overshoot does not occur (Figure 7).

We will argue that such an “extra” conductance follows directly from the coupling of the gated signal $T$ to the potential $V$. In other words, an extra conductance can be measured without postulating the existence of an extra membrane channel to subserve this conductance. To qualitatively understand this property, note that the gain of $x$ in (45) is

$$\Gamma = \frac{g_0 + g_1 + g_1KT}{1 + KT}. \tag{52}$$
Approximate the chain reaction that is elicited by a bright flash with a rectangular step [...]

To see how this mechanism works, suppose that $S(t)$ is a rectangular step with onset time $t = 0$ and intensity $S$. After the light and the depolarizing pulse are both turned on, $z(t)$ will approach the asymptote
rather than the smaller asymptote
that would have been approached in the absence of the depolarizing current. If $\theta$ is small, the asymptote of $V$ with and without current will be similar because the gated signal

approaches zero as $\theta$ does. If the pulse is shut off at time $t = t_0$, $\theta$ rapidly returns to the value 1, so that $S$ can bind transmitter with its usual strength. Hence shortly after time $t = t_0$, the gated signal will approximately equal
by (68), rather than the smaller value

$$T_1 = \frac{SP(1 + QS)}{1 + RS + US^2} \tag{71}$$

that it would have attained by (68), had the depolarizing pulse never occurred. By (70), (71), and (42), more hyperpolarization occurs after the current is shut off than would have occurred in response to the light alone.

This explanation of rebound hyperpolarization can be tested by doing parametric studies in which the asymptote of $V$ in response to a series of $J$ values is used to estimate $\theta(J)$ from (42) and (69). When this $\theta(J)$ function is substituted in (70), a predicted rebound hyperpolarization can be estimated by letting $T = T_0$ in (42).

A related rebound hyperpolarization effect can be achieved if, after the photoreceptor equilibrates to a fixed background level $S$, a step of additional input intensity is imposed for a while, after which the input is returned to the level $S$. An overshoot in potential to step onset, and an undershoot in potential to step offset, as well as a slowing down of the potential gain, can all be explained using (42) augmented by a transmitter gating law. Kleinschmidt and Dowling (1975) have measured such an effect in the Gekko gekko rod. It can be explained using Figure 11. Figure 11a depicts the (idealized) temporal changes in the input signal $S(t)$, Figure 11b depicts the corresponding depletion and recovery of $z(t)$, and Figure 11c depicts the consequent overshoot and undershoot
of the gated signal $T(t)$, which has corresponding effects on the asymptote and gain of the potential $V(t)$.

Baylor et al. (1974a, p.714) did a related experiment when they either interrupted or brightened a steady background light. In particular, they first exposed the turtle eye to a light equivalent to $3.7 \times 10^4$ photons $\mu$m$^{-2}$ sec$^{-1}$ for one second. Then the light intensity was either doubled or reduced to zero for 40 msec. The net effect is to add or subtract the same light intensity from a steady background. The depolarization resulting from the offset of light is larger than the hyperpolarization resulting from doubling the light. This follows from (42) by showing that the equilibrium hyperpolarization achieved by setting $S = S_0$ is greater than the change in hyperpolarization achieved right after switching $S$ to $2S_0$, given that the transmitter has equilibrated to $S = S_0$. In other words,

[...] (72)

where [...] and [...]. Inequality (72) can be reduced to the inequality $V^+ > V^-$, and is therefore true. Another inequality follows from $V^+ > V^-$ and is stated as a prediction. Twice the equilibrium hyperpolarization achieved by setting $S = S_0$ exceeds the total hyperpolarization achieved right after switching $S$ to $2S_0$, given that the transmitter has equilibrated to $S = S_0$. In other words, [...]
13. Transmitter Mobilization
Baylor et al. (1974a) found that very strong flashes or steps of light introduce extra components into the response curves of the cone potential. These components led BHL to postulate the existence of more slow processes $z_3$, $z_4$, and $z_5$, in addition to their blocking and unblocking variables $z_1$ and $z_2$. The time scales which BHL ascribed to this augmented chain reaction of slow processes are depicted in Figure 12. Below we will indicate how transduction processes that are familiar in other transmitter systems, say in the mobilization of acetylcholine at neuromuscular junctions (Eccles, 1964, p.W) or of calcium in the sarcoplasmic reticulum of skeletal muscles (Caldwell, 1971), can account for the existence of extra components. We will also indicate how these processes can cause very small correction terms to occur in the steady-state relationship (16) between the gated signal $T$ and the signal intensity $S$.

Let us distinguish between transmitter that is in bound, or storage, form and transmitter that is in available, or mobilized, form, as in Figure 13. Let the amount of storage transmitter at time $t$ be $w(t)$ and the amount of mobilized transmitter at time $t$ be $z(t)$. We must subdivide the processes defining (4) among the components $w(t)$ and $z(t)$, and allow storage transmitter to be mobilized and conversely. Then (4) is replaced by the system

$$\frac{dw}{dt} = K(L - w) - (Mw - Nz) \tag{78}$$
and

$$\frac{dz}{dt} = (Mw - Nz) - Sz. \tag{79}$$

Figure 11. (a) Rectangular step in $S(t)$ causes (b) gradual depletion-then-accumulation of $z(t)$. The combined effect is (c) overshoot and undershoot of $T(t)$.

Figure 12. Order of magnitude of the time constants of the $z_i$ processes in seconds. Backward reactions are all small compared to forward reactions. Redrawn from Baylor and Hodgkin (1974, p.757).

Figure 13. Transmitter $w$ accumulates until a target level is reached. Accumulated transmitter is mobilized until an equilibrium between mobilized and unmobilized transmitter fractions is attained. The signal $S$ is gated by mobilized transmitter which is released by mass action. The signal also modulates the accumulation and/or mobilization process. (Diagram labels: accumulation, mobilization, gating, release.)
Term $K(L - w)$ in (78) says that $w(t)$ tries to maintain a level $L$ via transmitter accumulation (or production and feedback inhibition). Term $-(Mw - Nz)$ in (78) says that storage transmitter $w$ is mobilized at a rate $M$ whereas mobilized transmitter $z$ is demobilized and restored at a rate $N$ until the two processes equilibrate. Term $Mw - Nz$ in (79) says that $w$'s loss is $z$'s gain. Term $-Sz$ in (79) says that mobilized transmitter is released at rate $Sz$ as it couples to the signal $S$ by mass action. In all, equations (78) and (79) are the minimal system wherein transmitter accumulation, gating, and release can occur given that transmitter must be mobilized before it can be released. Once this system is defined, we must again face the habituation dilemma that was discussed in Section 5. Should not some or all of the production and mobilization
terms be enzymatically activated by light to prevent the mobilized transmitter from being rapidly depleted by high intensity lights? The terms which are candidates for enzymatic activation in (78) and (79) are $K$, $M$, and $N$, as in the equations

[...]

and

$$\frac{dN}{dt} = -\alpha_N(N - N_0) + \beta_N[\gamma_N - (N - N_0)]S. \tag{82}$$
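To make the accumulation, mobilization, and release terms of (78)-(79) concrete, the following sketch integrates the two equations with constant coefficients (no enzymatic activation) and compares the result with the steady state obtained by setting both derivatives to zero. The function names, argument names, and numerical values are hypothetical; the closed-form steady state is derived here for the check and is not quoted from the text.

```python
def mobilization_steady_state(s, k_acc, level, m_mob, n_demob):
    """Steady state of (78)-(79) for constant signal s and constant K, L, M, N.

    Setting dw/dt = dz/dt = 0 gives K(L - w) = Mw - Nz = Sz.
    """
    z = k_acc * level * m_mob / (s * m_mob + k_acc * (n_demob + s))
    w = (n_demob + s) * z / m_mob
    return w, z


def integrate_mobilization(s, k_acc, level, m_mob, n_demob,
                           t_end=40.0, dt=0.001):
    """Euler-integrate (78)-(79), starting from the state adapted to s = 0."""
    w, z = mobilization_steady_state(0.0, k_acc, level, m_mob, n_demob)
    for _ in range(int(round(t_end / dt))):
        exchange = m_mob * w - n_demob * z      # net mobilization, Mw - Nz
        dw = k_acc * (level - w) - exchange     # accumulation minus mobilization
        dz = exchange - s * z                   # mobilization minus release
        w += dt * dw
        z += dt * dz
    return w, z


if __name__ == "__main__":
    coeffs = dict(k_acc=0.5, level=1.0, m_mob=1.0, n_demob=1.0)  # hypothetical
    for s in (0.5, 2.0, 8.0):
        w_num, z_num = integrate_mobilization(s, **coeffs)
        w_ss, z_ss = mobilization_steady_state(s, **coeffs)
        print(s, (w_num, z_num), (w_ss, z_ss), "T =", s * z_ss)
```

With constant coefficients the gated signal $T = Sz$ in this sketch increases with $S$ but saturates, which is the depletion problem that motivates asking whether $K$, $M$, or $N$ are activated by light.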
The BHL data are insufficient to conclude whether all the terms $K$, $M$, and $N$ can vary due to light activation. A possible empirical test of how many terms are activated will be suggested below. Before this test is described, however, we note an interesting analogy between the five slow variables $z_1$, $z_2$, $z_3$, $z_4$, and $z_5$ that BHL defined to meet their data and the five slow variables $w$, $z$, $K$, $M$, and $N$. BHL needed the two slow variables $z_1$ and $z_2$ to fit their data in moderate light intensities, and the three extra variables $z_3$, $z_4$, and $z_5$ to describe components at very high light intensities. By comparison, the variables $w$, $z$, $K$, $M$, and $N$ are five slow variables with $w$ and $z$ the dominant variables at intermediate light intensities, and $K$, $M$, and $N$ possibly being slowly activated at high light intensities. Apart from the similarity in the numbers of slow variables in the two models, their dynamics and intuitive justification differ markedly, since our variables have an interpretation in general transmitter systems, whereas the BHL variables were formally defined to fit their data.

A possible test of the number of enzymatically activated coefficients is the following. Recall that enzymatic activation of transmitter production changed the steady-state law relating $T$ to $S$ from (17) to (16). In other words, enzymatically activating one coefficient adds one power of $S$ to both the numerator and the denominator of the law for $T$. Analogously, enzymatically activating $n$ coefficients adds $n$ powers of $S$ to the numerator and denominator of this law. When $n = 3$, the law takes the form

$$T = \frac{P'S(1 + Q'S + R'S^2 + U'S^3)}{1 + V'S + W'S^2 + X'S^3 + Y'S^4}. \tag{83}$$
The higher-order coefficients $R'$, $U'$, $X'$, and $Y'$ are very small compared to the other coefficients $P'$, $Q'$, $V'$, and $W'$. Thus the enzymatic activation terms add very small corrections to the high intensity values of $T$, and thus to the corresponding values of $x_\infty$ via (50). If these high-intensity corrections could be measured, we would have an experimental test of how many terms $K$, $M$, and $N$ are enzymatically activated. These higher powers do not alter the asymptotic shift X in (M), but they do alter the rate with which the asymptotic shift is approached as a function of increasing light intensity.

We have hereby qualitatively explained all the main features of the BHL data using a minimal model of a miniaturized chemical transmitter. It remains to comment more completely on the form of the chain reaction which we used to convert light intensity $I(t)$ into the signal $S(t)$ and to display quantitative data fits. The simplest chain reaction is the one used by BHL:

$$\begin{aligned}
\frac{dy_1}{dt} + \gamma_1 y_1 &= I(t) \\
\frac{dy_2}{dt} + \gamma_2 y_2 &= \gamma_1 y_1 \\
&\;\;\vdots \\
\frac{dy_n}{dt} + \gamma_n y_n &= \gamma_{n-1} y_{n-1}
\end{aligned} \tag{84}$$

and $S(t) = y_n(t)$.
We have used this chain reaction with good results. However, this law possesses the physically implausible property that $y_n \to \infty$ as $I \to \infty$. Only finite responses are possible in vivo. A related chain reaction avoids this difficulty and also fits the data well. This modified chain reaction approximates (84) at small $I(t)$ values. It is

$$\begin{aligned}
\frac{dy_1}{dt} + \gamma y_1 &= (\delta - \epsilon y_1) I \\
\frac{dy_2}{dt} + \gamma y_2 &= (\delta - \epsilon y_2) y_1 \\
&\;\;\vdots \\
\frac{dy_n}{dt} + \gamma y_n &= (\delta - \epsilon y_n) y_{n-1}
\end{aligned} \tag{85}$$

and $S(t) = y_n(t)$.

It is easily checked that in response to a step of light intensity $I$, all the asymptotes $y_i(\infty)$ in (85) have the form $\mu I(1 + \nu I)^{-1}$, as in (30). The possibility exists that each step in the chain reaction is gated by a slow transducer. This would help to explain why so many slow variables appear at high light intensities even if not all the rates $K$, $M$, and $N$ are enzymatically activated. Such a complication of the model adds no new conceptual insights and will remain unwarranted until more precise biochemical data are available.
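The contrast between the two chain reactions can be illustrated by computing their steady-state outputs for a constant light intensity. This is a sketch under stated assumptions: the function names and the values of `gamma`, `delta`, and `eps` are hypothetical, and the steady states are obtained simply by setting each time derivative in (84) and (85) to zero.

```python
def linear_chain_steady_state(intensity, gammas):
    """Steady-state output y_n of the linear chain (84) for constant input."""
    y = intensity / gammas[0]
    for prev_rate, rate in zip(gammas, gammas[1:]):
        y = prev_rate * y / rate          # gamma_k y_k = gamma_{k-1} y_{k-1}
    return y


def saturating_chain_steady_state(intensity, n, gamma, delta, eps):
    """Steady-state output y_n of the modified chain (85) for constant input."""
    u = intensity
    for _ in range(n):
        u = delta * u / (gamma + eps * u)  # each stage has the form mu*u/(1 + nu*u)
    return u


if __name__ == "__main__":
    gammas = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]   # hypothetical rate constants
    for intensity in (1.0, 1e3, 1e6):
        lin = linear_chain_steady_state(intensity, gammas)
        sat = saturating_chain_steady_state(intensity, 6, 1.0, 1.0, 0.1)
        print(intensity, lin, sat)
```

In this example the linear output grows in proportion to the intensity, whereas the modified output stays below `delta / eps` however bright the light.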
14. Quantitative Analysis of Models

In this section we will compare the experimental measurements of Baylor and Hodgkin (1974) with the predictions of their model (Baylor et al., 1974b) and our Models I [equation (25)] and II [equations (4) and (19)]. The BHL model is outlined in Section 15. For each of Models I [...] and II:

$$\frac{dz}{dt} = A(B - z) - Sz \tag{4}$$

$$\frac{dA}{dt} = -C(A - A_0) + D[E - (A - A_0)]S, \tag{19}$$

we will examine the properties of the gated signal

$$T = Sz. \tag{2}$$

That is, we present a model in which the amount of hyperpolarization, $x$, is directly proportional to $Sz - S_0z_0$, where $T_0 = S_0z_0$ is the steady-state level. Similar results, with better quantitative fits, are obtained when the potential obeys the equation

$$C_0\frac{dV}{dt} = (V^+ - V)\frac{g_0}{1 + KT} - (V - V^-)g_1. \tag{42}$$

Recall that, if the potential obeys equation (42), then the amount of hyperpolarization is given by the equation

$$\frac{dx}{dt} = \frac{-(g_0 + g_1 + g_1KT)x + LT}{1 + KT} \tag{45}$$
and, if $x$ equilibrates quickly relative to $z$,

$$x \cong \frac{MT}{N + T}. \tag{47}$$

Equation (47) says that $x$ is approximately proportional to $T$ if $N$ is large relative to $T$.

For the rest of the section we will consider the experiment (Section 4) in which a short flash of fixed intensity is superimposed on ever-increasing levels of background light. Let $x$ be the amount of hyperpolarization and $x_0$ the equilibrium level for a fixed background intensity. As presented in Figure 3, the peak $x - x_0$ decreases as background intensity increases, but the time at which the peak occurs first decreases and then increases as the intensity increases.

Figures 14-17 show that, of the three models under consideration, Model II provides the best fit to the data and BHL the poorest. Figure 14a presents the results of the intracellular recordings of Baylor and Hodgkin (1974); Figure 14b gives the predictions of the BHL model; and Figures 14c and 14d give predictions of Models I and II. In each case, the peak potential in the dark is scaled to the value 25. The minimal background intensity is calibrated by finding that level at which the peak potential is 12.5, or half the peak in the dark. Thus each model fits the peak data exactly in the dark and with the lowest positive background intensity. Note that the vertical scale in Figure 14d (Model II) is the same as that of Figure 14a, which depicts the data. By contrast, the scales of Figure 14b (BHL) and Figure 14c (Model I) have been adjusted to accommodate the poorer match between the data and BHL and between the data and Model I.

These results on peak potential are summarized in Figure 15. Figure 16 indicates the time of peak hyperpolarization as a function of background intensity. Here, BHL gives a poor fit to the data; Model I gives a much better fit; and Model II, with the slow enzymatic activation, gives the best fit of all. Figure 17 shows the fit of the steady-state data (equilibrium levels of $x_0$) for the parameters chosen in each model.
The Chain Reaction

In Models I and II, the signal $S(t)$ is given by a chain reaction described by equation (84) (Baylor et al., 1974a). The constants $n$ and $\gamma_1, \ldots, \gamma_n$ are chosen so that, when the light stimulus $I(t)$ is a flash in the dark, $S(t)$ matches the experimental dark response (top curve of Figure 14a). Since equation (84) is linear, $S(t)$ is equal to the sum of the dark response curve plus a constant which is proportional to the background intensity. Consequently, in this paradigm, any choice of chain reaction constant which provides a good fit to the dark curve will fit the data as well as any other choice. A simple function form which provides an adequate fit in the dark is

[...]

A suitable choice of constant $J$ makes $f(t)$ equal to $S(t)$ in the dark when $n = 6$ and

$$\gamma_1 = 6\gamma, \quad \gamma_2 = 5\gamma, \quad \ldots, \quad \gamma_6 = \gamma. \tag{87}$$

This is the “independent activation” form of Baylor et al. (1974a). This form is used in Model I ($\gamma = 17.3$) and Model II ($\gamma = 17.0$). Other chain reactions give similar results. In the BHL model, a similar chain reaction is used, except that the last step is modified to incorporate the unblocking variable $z_2$ and the slow process $z_3$ (Sections 13 and 15).
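The dark flash response implied by (84) with the rate constants (87) can be computed directly. The sketch below is an illustration under assumptions, not the authors' fitting procedure: it idealizes the flash as a unit impulse (so that $y_1(0) = 1$ and $I = 0$ afterwards), takes $\gamma = 17.3$ in units of sec$^{-1}$ (the units are assumed), and compares a Runge-Kutta integration with the closed form $n\,e^{-\gamma t}(1 - e^{-\gamma t})^{n-1}$, which is derived here for this normalization rather than quoted from the text. All function names are hypothetical.

```python
import math


def chain_impulse_response(gamma, n=6, t_end=0.3, dt=1e-4):
    """RK4-integrate the linear chain (84) after a unit impulse of light.

    Stage k (k = 1..n) has rate constant (n - k + 1) * gamma, as in (87).
    """
    rates = [(n - k) * gamma for k in range(n)]   # n*gamma, ..., gamma

    def deriv(y):
        d = [-rates[0] * y[0]]
        for k in range(1, n):
            d.append(rates[k - 1] * y[k - 1] - rates[k] * y[k])
        return d

    y = [1.0] + [0.0] * (n - 1)   # the impulse loads the first stage
    times, output = [0.0], [y[-1]]
    for i in range(int(round(t_end / dt))):
        k1 = deriv(y)
        k2 = deriv([a + 0.5 * dt * b for a, b in zip(y, k1)])
        k3 = deriv([a + 0.5 * dt * b for a, b in zip(y, k2)])
        k4 = deriv([a + dt * b for a, b in zip(y, k3)])
        y = [a + dt * (p + 2 * q + 2 * r + s) / 6.0
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
        times.append((i + 1) * dt)
        output.append(y[-1])
    return times, output


def independent_activation(t, gamma, n=6):
    """Closed-form impulse response of the last stage (derived for this sketch)."""
    u = math.exp(-gamma * t)
    return n * u * (1.0 - u) ** (n - 1)


if __name__ == "__main__":
    gamma = 17.3   # value quoted for Model I; sec^-1 is an assumed unit
    times, output = chain_impulse_response(gamma)
    peak_time = times[output.index(max(output))]
    print("numerical peak time (sec):", peak_time)
    print("ln(6)/gamma (sec):        ", math.log(6) / gamma)
```

Under these assumptions the response peaks at $t = \ln 6/\gamma$, that is, roughly 0.1 sec after the flash when $\gamma = 17.3$ sec$^{-1}$.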
Figure 14. Intracellular response curves $x(t) - x_0$ showing the effect of a flash superimposed on a background light of fixed intensity. Each horizontal axis represents the time since the middle of the flash, which lasts 11 msec. The vertical axis is scaled so that the peak value of $x(t) - x_0 = x(t)$ in the dark is equal to 25. The number above each curve is $\log_{10}$ of the background light intensity $I_0$, which is calibrated so that when $\log_{10} I_0 = 3.26$, the peak of $x(t) - x_0$ is equal to 12.5. (a) The Baylor-Hodgkin (1974) data. (b) The BHL model (redrawn from Baylor et al., 1974b, p.785). (c) Model I. (d) Model II. Note that the vertical scales are not all the same. (Horizontal axes: 0 to 300 msec.)
Figure 15. The size of the peak hyperpolarization, as a function of $\log_{10} I_0$, for the Baylor-Hodgkin data and the three models. Note that at high input intensities, BHL differs from the data and Model II by a factor of 10. (Vertical axis: $\log |x - x_0|$.)

Parameter Values of Model I and Model II

Equation (88) contains the parameter values chosen for Model I in Figures 14-17. Equation (89) contains the parameter values for Model II. We wish to emphasize, however, that the model properties described in this section are robust over a wide range of parameter values and are not particular to the choices listed below.
Model I

$$A_0 = 1.8, \quad F = 0.00333, \quad G = 0.00179. \tag{88}$$

Model II

$$\frac{dz}{dt} = A(B - z) - Sz \tag{4}$$

$$\frac{dA}{dt} = -C(A - A_0) + D[E - (A - A_0)]S \tag{19}$$

$$A_0 = 0.5, \quad C = 0.2, \quad D = 0.0047, \quad E = 18.830. \tag{89}$$

Figure 16. Times at which the peak hyperpolarization occurs for the Baylor-Hodgkin data and the three models. Note that the input intensity at which the turn-around occurs and the dynamic range of peak times are much too small in the BHL model. BHL consider this the most serious defect of their model. (Vertical axis: time of peak in msec.)
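For readers who wish to experiment with Model II, the sketch below evaluates the steady state of (4) and (19) with the parameter values (89) and checks it by direct integration at a constant signal $S$. It is a minimal illustration, not the simulation that produced Figures 14-17: the function names are hypothetical, $B = 1$ is an arbitrary normalization, the time step and duration are assumed, and the steady-state formulas are obtained here by setting both derivatives to zero.

```python
def model2_steady_state(s, b=1.0, a0=0.5, c=0.2, d=0.0047, e=18.830):
    """Steady state of Model II, equations (4) and (19), at constant signal s."""
    a = a0 + d * e * s / (c + d * s)   # from dA/dt = 0 in (19)
    z = a * b / (a + s)                # from dz/dt = 0 in (4)
    return a, z, s * z                 # accumulation rate, gate, gated signal T


def integrate_model2(s, b=1.0, a0=0.5, c=0.2, d=0.0047, e=18.830,
                     t_end=80.0, dt=0.002):
    """Euler-integrate (4) and (19) from the dark-adapted state (A = A0, z = B)."""
    a, z = a0, b
    for _ in range(int(round(t_end / dt))):
        da = -c * (a - a0) + d * (e - (a - a0)) * s   # slow enzymatic activation
        dz = a * (b - z) - s * z                      # accumulation minus release
        a += dt * da
        z += dt * dz
    return a, z


if __name__ == "__main__":
    for s in (0.1, 1.0, 10.0, 100.0):
        a_ss, z_ss, t_ss = model2_steady_state(s)
        print(s, a_ss, z_ss, t_ss)
```

Because $z$ scales linearly with $B$ in (4), doubling `b` in this sketch simply doubles the gated signal.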
For each model, $B$ is arbitrary, since the gated signal is assumed to be proportional to $Sz$: a particular $B$ would just multiply $Sz$ by a constant factor.

15. Comparison with the Baylor, Hodgkin, Lamb Model
BHL first observed that the voltage response of a turtle cone to a weak and brief flash of light (e.g., 11 msec) can rise for over 100 milliseconds before it slowly decays over a period of several hundred milliseconds. In order to maintain a prolonged response after the flash terminates, they assume that light sets off a chain reaction (84). They also assume that, in response to small light signals, the change $V(t)$ in membrane potential is proportional to $y_n(t)$.

Figure 17. Graphs of the steady-state hyperpolarization $x_0$ in response to the constant light intensity $I_0$ for the Baylor-Hodgkin data and the three models.

The main physical idea is suggested by the fact that, in response to higher input intensities, the chain reaction fits the data during the rising portion of the potential, but yields higher values than the potential during its falling phase. Because this effect occurs at later times, it is ascribed to a process that is triggered at the end of the chain reaction. Because the potential undershoots the chain reaction, it is assumed that this later process interferes with the potential. Such considerations led BHL to argue that the chain reaction activates a process which elicits the initial hyperpolarization of the potential. Thus it is assumed that "after a certain delay, light liberates a substance, possibly calcium ions, which blocks sodium channels in the outer segments" (Baylor et al., 1974b, p.760) and thereby tends to hyperpolarize the cone. The larger hyperpolarization produced by the chain reaction than the data is then ascribed to subsequent processes that interfere with ("unblock") the blocking substance during the falling phase of the potential. Their entire model is based on this assumption.

Denote the concentration of blocking substance by $z_1(t)$ and the concentration of the unblocking substances by $z_2(t)$ and $z_3(t)$. Function $z_1(t)$ replaces the last stage $y_n(t)$ of the chain reaction in (84). Baylor et al. (1974b) choose the equations for $z_1$ to fit the data in Figure 3. To explain these and related data, it is assumed that the $z_j$ act on each other via a nonlinear feedback process that is defined as follows:

$$[\ldots] = E - V(1 + G_f + G_l) \tag{90}$$

[...]

and

[...]
to make V , an equilibrium point of (90) in the absence of light. The potential V is related to the hyperpolarization U via the equation U = V - V,. Equation (90) describes how the potential V is hyperpolarized by changes in the conductances G f and G1. Equation (91) shows that GI is a decreasing function of zI. Equations (92) and (93) say that Gf time-averages a logistic function of V. Equation (94) describes the chain reaction with end product h - 1 and light input I ( t ) . Equations (95)-(98)describe the nonlinear chain reaction of blocking and unblocking variables zl, 22, and 23 that is driven by the output yn-l of the chain reaction. Equation (99) defines parameters to make Vg the equilibrium point of (90) when I ( t ) = 0. The equations (90)-(99) are an ingenious interpretation of the data, but their main features, such as the chain reaction of blocking and unblocking variables in (95 -(98), the non-linear dependence of the blocking and unblocking rates on these varia les in (98), and the existence of the voltage-dependent conductance GI in (92)-(93) are hard to interpret as logical consequences of a well-designed transducer, and have difficulties meeting the data quantitatively, as shown in Section 14. To explain the turn-around of potential peaks, BHL use the nonlinear feedbaek process between blocking and unblocking variables in (go), (91), (95)-(98) to argue that “the shortening of the time to peak occurs because the concentration of 22 increases and speeds up the conversion of 21 to 22’’ (Baylor et al., 1974b, p.784). To explain the eventual slowdown of response to high background intensities, the parameters are chosen (e.g., A > 1in (98)) so that “at very high levels of .z2 the reaction is so fast that there is no initial peak and the reaction is in equilibrium throughout the whole response. 
This results in an increase in the time to peak because the rate of destruction of 22 at a high intensity is less than the rate of destruction of 21 at some lower intensity” (Baylor et al., 1974b, p.784). Thus, the existence of process z2 and its properties are postulated to fit these data rather than to satisfy fundamental design constraints. Of great qualitative importance is the fact that this explanation of the turn-around in potential peak implies
the nonexistence of overshoots at high flash intensities. This implication does not hold in our gating model. It forces the following auxiliary hypotheses in the BHL model.

Baylor, Hodgkin, and Lamb note that the above mechanisms do not suffice to explain certain phenomena that occur after a strong flash. In particular, the potential transiently overshoots its plateau, achieving a peak change of 15-25 mV, before it settles to a plateau of 12-20 mV. They did their double flash experiments (Figure 7) to study this phenomenon. In Figure 7, the second flash does not elicit a second overshoot, but rather merely prolongs the plateau phase. They need two variable conductances $G_l$ and $G_f$ to account for these data. The light-sensitive conductance $G_l$ in (91) is a decreasing function of $z_1$, which is, in turn, an increasing function of light intensity due to (94) and (95). The conductance $G_f$ in (92) depends on light only through changes in potential. In particular, $G_f$ is a time-average of a logistic function (93) of the potential. The main idea is that the light-sensitive conductance $G_l$ is shut off by the first flash. This leads to an initial hyperpolarization which changes $G_f$. This latter change decreases the potential at which the cell saturates from 30 to 20 mV, and causes the potential to return toward its plateau value. At the plateau value, $G_f$ is insensitive to a new flash, so a second overshoot does not occur, but the newly reactivated chain reaction does prolong the plateau phase.

Even without the extra conductance $G_f$, some overshoot can be achieved in the model in response to weaker lights which hyperpolarize $V(t)$ by 5-10 mV. These overshoots are due to delayed desensitization, but they disappear when strong lights perturb the BHL model, unlike the situation in real cones; hence the need for $G_f$.
The authors also use the conductance Gf to explain why offset of a rectangular pulse of depolarizing current that is applied during a cone’s response to light does cause a rebound hyperpolarization, whereas a depolarizing current in the absence of light does not (Figure 8).
16. Conclusion We have indicated how a minimal .model for a miniaturized unbiased transducer that is realized by a depletable chemical transmitter provides a conceptually simple and quantitatively accurate description of parametric turtle cone data. These improvements on the classical studies of Baylor, Hodgkin, and Lamb are, at bottom, due to the use of a ‘gating” rather than an “unblocking” concept to describe the transmitter’s action. Having related the experiments on turtle cone to a general principle of neural design, we can recognize the great interest of testing whether analogous parametric experiments performed on nonvisual cells wherein slowly varying transmitters are suspected to act will also produce similar reactions in cell potential. Where the answer is “no,” can we attribute this fact to specialized differences in the enzymatic modulation of photoreceptor transmitters that enable them to cope with the wide dynamic range of light intensity fluctuations?
ADDENDUM
The anatomical site is presently uncertain at which the transmitter gating stage described herein may take place. Earlier experimental studies suggested a site in the outer segment. More recent work using suction electrodes suggests that this site is unlikely. The model’s validity is not dependent on this issue. Rather, we provide parametric tests of the gate’s existence, wherever it might be spatially located.
REFERENCES
Arden, G.B. and Low, J.C., Changes in pigeon cone photocurrent caused by reduction in extracellular calcium activity. Journal of Physiology, 1978, 280, 55-76.
Bäckström, A.-C. and Hemilä, S.O., Dark-adaptation in frog rods: Changes in the stimulus-response function. Journal of Physiology, 1979, 287, 107-125.
Baylor, D.A. and Hodgkin, A.L., Changes in time scale and sensitivity in turtle photoreceptors. Journal of Physiology, 1974, 242, 729-758.
Baylor, D.A., Hodgkin, A.L., and Lamb, T.D., The electrical response of turtle cones to flashes and steps of light. Journal of Physiology, 1974, 242, 685-727 (a).
Baylor, D.A., Hodgkin, A.L., and Lamb, T.D., Reconstruction of the electrical responses of turtle cones to flashes and steps of light. Journal of Physiology, 1974, 243, 759-791 (b).
Boynton, R.M. and Whitten, D.N., Visual adaptation in monkey cones: Recordings of late receptor potentials. Science, 1970, 170, 1423-1426.
Caldwell, P.C., Calcium movements in muscle. In R.J. Podolsky (Ed.), Contractility of muscle cells and related processes. Englewood Cliffs, NJ: Prentice-Hall, 1971, pp.105-114.
Čapek, R., Esplin, D.W., and Salehmoghaddam, S., Rates of transmitter turnover at the frog neuromuscular junction estimated by electrophysiological techniques. Journal of Neurophysiology, 1971, 34, 831-841.
Dowling, J.E. and Ripps, H., S-potentials in the skate retina: Intracellular recordings during light and dark adaptation. Journal of General Physiology, 1971, 58, 163-189.
Dowling, J.E. and Ripps, H., Adaptation in skate photoreceptors. Journal of General Physiology, 1972, 60, 698-719.
Eccles, J.C., The physiology of synapses. New York: Springer-Verlag, 1964.
Esplin, D.W. and Zablocka-Esplin, B., Rates of transmitter turnover in spinal monosynaptic pathway investigated by neurophysiological techniques. Journal of Neurophysiology, 1971, 34, 842-859.
Grabowski, S.R., Pinto, L.H., and Pak, W.L., Adaptation in retinal rods of axolotl: Intracellular recordings. Science, 1972, 176, 1240-1243.
Grossberg, S., Some physiological and biochemical consequences of psychological postulates. Proceedings of the National Academy of Sciences, 1968, 60, 758-765.
Grossberg, S., On the production and release of chemical transmitters and related topics in cellular control. Journal of Theoretical Biology, 1969, 22, 325-364.
Grossberg, S., A neural theory of punishment and avoidance, I: Qualitative theory. Mathematical Biosciences, 1972, 15, 39-67 (a).
Grossberg, S., A neural theory of punishment and avoidance, II: Quantitative theory. Mathematical Biosciences, 1972, 25, 253-285 (b).
Grossberg, S., A neural model of attention, reinforcement, and discrimination learning. International Review of Neurobiology, 1975, 18, 263-327.
Grossberg, S., Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction, and illusions. Biological Cybernetics, 1976, 25, 187-202.
Grossberg, S., A theory of human memory: Self-organization and performance of sensory-motor codes, maps, and plans. In R. Rosen and F. Snell (Eds.), Progress in theoretical biology, Vol. 5. New York: Academic Press, 1978 (a).
Grossberg, S., Behavioral contrast in short-term memory: Serial binary memory models or parallel continuous memory models? Journal of Mathematical Psychology, 1978, 17, 199-219 (b).
Grossberg, S., How does a brain build a cognitive code? Psychological Review, 1980, 87, 1-51.
Grossberg, S., Psychophysiological substrates of schedule interactions and behavioral contrast. In S. Grossberg (Ed.), Mathematical psychology and psychophysiology. Providence, RI: American Mathematical Society, 1981 (a).
Grossberg, S., Some psychophysiological and pharmacological correlates of a developmental, cognitive, and motivational theory. In J. Cohen, R. Karrer, and P. Tueting (Eds.), The proceedings of the sixth international conference on evoked potentials, 1981 (b).
Hemilä, S., Background adaptation in the rods of the frog’s retina. Journal of Physiology, 1977, 266, 721-741.
Hemilä, S., An analysis of rod outer segment adaptation based on a simple equivalent circuit. Biophysical and Structural Mechanism, 1978, 4, 115-128.
Kleinschmidt, J., Adaptation properties of intracellularly recorded gekko photoreceptor potentials. In H. Langer (Ed.), Biochemistry and physiology of visual pigments. New York: Springer-Verlag, 1973.
Kleinschmidt, J. and Dowling, J.E., Intracellular recordings from gekko photoreceptors during light and dark adaptation. Journal of General Physiology, 1975, 66, 617-648.
Norman, R.A. and Werblin, F.S., Control of retinal sensitivity, I: Light and dark adaptation of vertebrate rods and cones. Journal of General Physiology, 1974, 63, 37-61.
Zablocka-Esplin, B. and Esplin, D.W., Persistent changes in transmission in spinal monosynaptic pathway after prolonged tetanization. Journal of Neurophysiology, 1971, 34, 860-867.
Chapter 6
THE ADAPTIVE SELF-ORGANIZATION OF SERIAL ORDER IN BEHAVIOR: SPEECH, LANGUAGE, AND MOTOR CONTROL
Preface
This Chapter describes several of the general organizational principles, network modules, and neural mechanisms that our group has used to analyse and predict data about speech, language, and motor control. The Chapter describes a progression of concepts and mechanisms, leading from simple to more complex, that we have used to analyse such temporally ordered behaviors. In this way, the reader can discern how relatively simple and pre-wired circuits for the control of temporal order, as are found, say, in the command cell networks of invertebrates, are functionally related to much more context-sensitive and adaptive circuits for the control of speech and motor planning. These latter circuits are specialized versions of adaptive resonance theory (ART) architectures (Volume I). A comparison of the temporal order mechanisms described herein with the ART circuits that were used to analyse data about conditioning and cognitive recognition learning in Volume I enables the reader to better appreciate how evolution may specialize a fundamental network module to accomplish a surprising diversity of behavioral tasks. All of the results in this Chapter are predicated upon a proper choice of functional units. These functional units are spatial patterns of short term memory (STM) activation across ensembles of cells and spatial patterns of associative long term memory (LTM) traces across ensembles of cell pathways. Many scientists now realize that distributed processing is a characteristic feature of neural dynamics. A precise mathematical understanding is needed to confidently design distributed processes, because improperly designed distributed processors can become wildly unstable in any but the most artificially constrained environments.
The processes described in this and the subsequent Chapters were derived from an analysis of how systems with desirable emergent behavioral properties can stably self-organize in complex environments. The Chapter begins by noting that a number of popular models cannot withstand an analysis of their internal structure, let alone of their ability to stably self-organize. All too little modeling expertise is currently being devoted to building principled theories which can last, or to analysing the computational basis for a model’s partial successes. A heavy price is often paid by these models. Their internal flaws are mirrored by their narrow explanatory range. Three examples of how concepts about self-organization can guide, and even drastically change, one’s thinking are highlighted here. Many people consider the process of storing temporal order information in STM, or working memory, to be primarily a performance issue that is not strongly linked to learning problems. Indeed, the shift from studies of verbal learning to studies of free recall in the late 1960’s may partly be understood as an effort to avoid perplexing issues about context-sensitive changes in the learned units, or chunks, that emerge during verbal learning experiments. In contrast, I suggest that the laws which govern the storage of temporal order information in STM are designed to ensure that STM is updated in a way that enables temporally stable list learning in LTM to occur. In other words, I relate the laws which store individual items in STM to the LTM laws which group, or chunk, these items into unitized lists. I derive laws for storing temporal order information in STM from the hypothesis that STM storage enables LTM to form unitized list codes in a temporally stable way. Such laws show how to alter the STM activities of previous items in response to the presentation of new items so that the repatterning of STM
activities that is caused by the new items does not inadvertently obliterate the LTM codes for old item groupings. Remarkably, despite these adaptive constraints, and often because of them, temporal order information is not always stored veridically, as is true during free recall. This analysis clarifies how the entire spatial pattern of temporal order information over item representations can act as a working memory representation of a speech stream. This representation helps to explain a variety of data concerning the LTM encoding of speech presented at different rates, notably how shortening or lengthening of a later vowel can alter the perception of a prior consonant, or how varying the durations of silence and fricative noise that follow a syllable can influence the perception of the syllable. Another issue that is clarified by an analysis of stable unitization concerns the functional units that are processed at successive stages of a language or visual recognition system. From the perspective of self-organization, it seems wrong to assume that a letter level exists that is followed by a word level. Although an extensive analysis of more appropriate levels was already published in 1978, the hypothesis that letter and word levels exist seems to be dying hard. This is true, I believe, because people confuse their lay experiences using letters and words with the abstract unitization operations which subserve their lay experiences. I suggest that, instead of a letter level and a word level, there exists an item level and a list level, or more precisely a "temporal order information over item representations in STM" level and a "sublist chunks in STM" level. These alternative levels differ in many ways from letter and word levels. For example, a familiar letter that is not a word cannot be represented on the word level. In contrast, a familiar letter that is or is not a word can be represented on both the item and the list level.
A familiar phoneme can also be represented on the item and list levels, and can thus, under certain circumstances, interact with semantic information. The analysis of how to design the list level leads to a multiple grouping circuit called a masking field. Computer simulations of a masking field architecture are described in Chapter 8. The present Chapter describes some of the data which a masking field list level has successfully predicted (such as a word length effect in word superiority studies) and can help to explain (such as how attentional processes can prevent word superiority or phonemic restoration effects from occurring, and how the Magic Number 7 of G.A. Miller is dynamically generated). A third self-organization theme concerns the manner in which temporal order information is encoded in LTM. I suggest that mechanisms for explaining data about one of my first loves, serial verbal learning, are also used to learn cognitive list chunks and predictive motor plans. I also show how top-down mechanisms for learning temporal order information can supplement, and often supplant, more classical notions such as associative chaining. Indeed, bottom-up adaptive filtering of temporal order information in STM can encode unitized speech or motor planning nodes, whose top-down read-out of learned templates can encode temporal order information in LTM. This circuit as a whole is a specialized ART architecture. These results provide a general foundation for the extensive computer simulation studies of specialized neural circuits governing adaptive sensory-motor control which Michael Kuperstein and I have published as a separate volume in this series.
Pattern Recognition by Humans and Machines, Vol. 1: Speech Perception. E.C. Schwab and H.C. Nusbaum (Eds.) © 1986 Academic Press, Inc. Reprinted by permission of the publisher
THE ADAPTIVE SELF-ORGANIZATION OF SERIAL ORDER IN BEHAVIOR: SPEECH, LANGUAGE, AND MOTOR CONTROL
Stephen Grossberg†
1. Introduction: Principles of Self-Organization in Models of Serial Order: Performance Models versus Self-Organizing Models
The problem of serial order in behavior is one of the most difficult and far-reaching problems in psychology (Lashley, 1951). Speech and language, skilled motor control, and goal-oriented behavior generally are all instances of this profound issue. This chapter describes principles and mechanisms that have been used to unify a variety of data and models, as well as to generate new predictions concerning the problem of serial order. The present approach differs from many alternative contemporary approaches by deriving its conclusions from concepts concerning the adaptive self-organization (e.g., the development, chunking, and learning) of serial behavior in response to environmental pressures. Most other approaches to the problem, notably the familiar information processing and artificial intelligence approaches, use performance models for which questions of self-organization are raised peripherally if at all. Some models discuss adaptive issues but do not consider them in a real-time context. A homunculus is often used either implicitly or explicitly to make the model work. Where a homunculus is not employed, models are often tested numerically in such an impoverished learning environment that their instability in a more realistic environment is not noticed. These limitations in modeling approaches have given rise to unnecessary internal paradoxes and predictive limitations within the modeling literature. I suggest that such difficulties are due to the facts that principles and laws of self-organization are rate-limiting in determining the design of neural processes, and that problems of self-organization are the core issues that distinguish psychology from other natural sciences such as traditional physics.
In light of these assertions, it is perhaps more understandable why a change of terminology or usage of the same concepts and mechanisms to discuss a new experiment can be hailed as a new model. The shared self-organizing principles that bind the ideas in one model to the ideas in other models are frequently not recognized. This style of model building tends to perpetuate the fragmentation of the psychological community into noninteracting specialities, rather than foster the unifying impact whereby modeling has transformed other fields.
† Supported in part by the National Science Foundation (NSF IST-80-00257 and NSF IST-84-17756) and the Office of Naval Research (ONR N00014-83-K0337).
The burgeoning literature on network and activation models in psychology has, for example, routinely introduced as new ideas concepts that were previously developed to explain psychological phenomena in the neural modeling literature. Such concepts as unitized nodes, the priming of short-term memory, probes of long-term memory, automatic processing, spreading activation, distinctiveness, lateral inhibition, hierarchical cascades, and feedback were all quantitatively used in the neural modeling literature before being used by experimental psychologists. Moreover, the later users have often ignored the hard-won lessons to be found in the neural modeling literature. The next section illustrates some characteristic difficulties of models and how they can be overcome by the present approach (this discussion can be skipped on a first reading).
2. Models of Lateral Inhibition, Temporal Order, Letter Recognition, Spreading Activation, Associative Learning, Categorical Perception, and Memory Search: Some Problem Areas
A. Lateral Inhibition and the Suffix Effect
From a mathematical perspective, a model that uses lateral inhibition is a competitive dynamical system (Grossberg, 1980a). Smale (1976) has proven that the class of competitive dynamical systems contains systems capable of exhibiting arbitrary dynamical behavior. Thus to merely say that lateral inhibition is at work is, in a literal mathematical sense, a vacuous statement. One needs to define precisely the dynamics that one has in mind before anything of scientific value can be gleaned. Even going so far as to say that the inhibitory feedback between nearby populations is linear says nothing of interest, because linear feedback can cause such varied phenomena as oscillations that never die out or the persistent storage of short-term memory patterns, depending on the anatomy of the network as a whole (Cohen and Grossberg, 1983; Grossberg, 1978c, 1980a). An imprecise definition of inhibitory dynamics will therefore inevitably produce unnecessary controversies, as has already occurred. For example, Crowder’s (1978) explanation of the suffix effect (Dallett, 1965) and Watkins and Watkins’s (1982) critique of the Crowder theory both focus on the purported property of recurrent lateral inhibition that an extra suffix should weaken the suffix effect due to disinhibition. However, this claim does not necessarily hold in certain shunting models of recurrent lateral inhibition that are compatible with the suffix effect (Grossberg, 1978a, 1978e). This controversy concerning the relevance of lateral inhibition to the suffix effect cannot be decided until the models of lateral inhibition used to analyse that effect are determined with complete mathematical precision. A type of lateral inhibition that avoids the controversy is derived from a rule of self-organization that guarantees the stable transfer of temporal order information from short-term memory to long-term memory as new items continually perturb a network (Section 33).
B. Temporal Order Information in Long-Term Memory
A more subtle problem arises in Estes’s (1972) influential model of temporal order information in long-term memory. Estes (1972, p.183) writes: “The inhibitory tendencies which are required to properly shape the response output become established in memory and account for the long term preservation of order information.” Estes goes on to say that inhibitory connections form from the representations of earlier items in the list to the representations of later list items. Consequently, earlier items will be less inhibited than later items on recall trials and will therefore be performed earlier. Despite the apparent plausibility of this idea, a serious problem emerges when one writes down dynamical equations for how these inhibitory interactions might be learned in real-time. One then discovers that learning by this mechanism is unstable because, as Estes realized, the joint activation of two successive network nodes is needed for the network to know which inhibitory pathway should be strengthened. As such an inhibitory pathway is strengthened, it can more strongly inhibit its receptive node, which
is the main idea of the Estes model. However, when this inhibitory action inhibits the receptive node, it undermines the joint excitation that is needed to learn and remember the strong inhibitory connection. The inhibitory connection then weakens, the receptive node is disinhibited, and the learning process is initiated anew. An unstable cycle of learning and forgetting order information is thus elicited through time. Notwithstanding the heuristic appeal of Estes’s mechanism, it cannot be correct in its present form. All conclusions that use this mechanism therefore need revision, such as Rumelhart and Norman’s (1982) discussion of typing and MacKay’s (1982) discussion of syntax. One might try to escape the instability problem that arises in Estes’s (1972) theory of temporal order information by claiming that inhibitory connections are prewired into a sequential buffer and that many different lists can be performed from this buffer. Unfortunately, traditional buffer concepts (e.g., Atkinson and Shiffrin, 1968; Raaijmakers and Shiffrin, 1981) face design problems that are as serious as the instability criticism (Grossberg, 1978a). In this way, the important design problem of how to represent temporal order information in short-term and long-term memory without using either a traditional buffer or conditioned inhibitory connections is vividly raised. Solutions of these problems are suggested in Sections 12-19 and Section 34.
C. Letter and Word Recognition
A similar instability problem occurs in the work on letter perception of Rumelhart and McClelland (1982). They write: “Each letter node is assumed to activate all of those word nodes consistent with it and inhibit all other word nodes. Each active word node competes with all other word nodes . . .” (Rumelhart and McClelland, 1982, p.61). Obviously, the selective connections between letter nodes and word nodes are not prewired into such a network at birth.
Otherwise all possible letter-word connections for all languages would exist in every mind, which is absurd. Some of these connections are therefore learned. If the inhibitory connections are learned, then the model faces the same instability criticism that was applied to Estes’s (1972) model. Grossberg (1984b) shows, in addition, that if the excitatory connections are learned, then learning cannot get started. The connections hypothesized by Rumelhart and McClelland also face another type of challenge from a self-organization critique. How does the network learn the difference between a letter and a word? Indeed, some letters are words, and both letters and words are pronounced using a temporal series of motor commands. Thus many properties of letters and words are functionally equivalent. Why, then, should each word compete with all other words, whereas no letter competes with all other words? An alternative approach is suggested in Section 37, where it is suggested that the levels used in the Rumelhart and McClelland model are insufficient. The McClelland and Rumelhart model faces such difficulties because it considers only performance issues concerning the processing of four-letter words. In contrast, the present approach considers learning and performance issues concerning the processing of words of any length. Its analysis of how a letter stream of arbitrary length is organized during real-time presentation leads to a process that predicts, among other properties, a word-length effect in word superiority studies (Grossberg, 1978e, Section 41; reprinted in Grossberg, 1982d, p.595). Subsequent data have supported this prediction (Matthei, 1983; Samuel, van Santen, and Johnston, 1982, 1983). No such prediction could be made using Rumelhart and McClelland’s (1982) model, since it is defined only for four-letter words.
Moreover, the theoretical ideas leading to predictions such as the word-length effect are derived from an analysis of how letter and word representations are learned. An analysis of performance issues per se provides insufficient constraints on processing design.
D. Spreading Activation
Similar difficulties arise from some usages of ideas like spreading activation in network memory models. In Anderson (1976) and Collins and Loftus (1975), the amount of activation arriving at a network node is a decreasing function of the number of links
the activation has traversed, and the time for activation to spread is significant (about 50-100 msec per link). By contrast, there is overwhelming neural evidence of activations that do not pass passively through nerve cells and that are not carried slowly and decrementally across nerve pathways (Eccles, 1952; Kuffler and Nicholls, 1976; Stevens, 1966). Rather, activation often cannot be triggered at nerve cells unless proper combinations of input signals are received, and when a signal is elicited, it is carried rapidly and nondecrementally along nerve pathways. Although these ideas have been used in many neural network analyses of psychological data, their unfamiliarity to many psychologists is still a source of unnecessary controversy (Ratcliff and McKoon, 1981). Most spreading activation models are weakened by their insufficient concern for which nodes have a physical existence and which dynamical transactions occur within and between nodes. Both of these issues are special cases of the general question of how a node can be self-organized through experience. Anderson's (1983) concept of a fan effect in spreading activation illustrates these difficulties. Anderson proposes that if more pathways lead away from a concept node, each pathway can carry less activation. In this view, activation behaves like a conserved fluid that flows through pipe-like pathways. Hence the activation of more pathways will slow reaction time, other things being equal. The number of pathways to which a concept node leads, however, is a learned property of a self-organizing network. The pathways that are strengthened by learning are a subset of all the pathways that lead away from the concept node. At the concept node itself, no evidence is available to label which of these pathways was strengthened by learning (Section 3). The knowledge of which pathways are learned is only available by testing how effectively the learned signals can activate their recipient nodes.
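The conserved-fluid reading of the fan effect can be compared numerically with a reading in which nothing is conserved at the source node and the slowing arises instead from competition among the recipient nodes. The following sketch is illustrative only. It assumes a leaky integrator for the conserved-fluid case and a feedforward shunting competition among n equally driven recipients for the alternative; the function names, parameters, and threshold are hypothetical choices, not quantities taken from Anderson's theory or from the sections cited here.

```python
import math


def rt_conserved_fan(n, total=1.0, decay=1.0, threshold=0.05):
    """Time for dx/dt = -decay*x + total/n to reach threshold from x = 0.

    The source's activation `total` is split evenly among n pathways.
    """
    asymptote = (total / n) / decay
    if threshold >= asymptote:
        return math.inf  # the recipient never reaches threshold
    return -math.log(1.0 - threshold / asymptote) / decay


def rt_shunting_competition(n, inp=1.0, decay=1.0, ceiling=1.0, threshold=0.05):
    """Time to threshold when each of n recipients receives the full input `inp`.

    Each recipient obeys dx/dt = -decay*x + (ceiling - x)*inp - x*(n - 1)*inp,
    so the other n - 1 inputs act as shunting inhibition.
    """
    rate = decay + n * inp
    asymptote = ceiling * inp / rate
    if threshold >= asymptote:
        return math.inf
    return -math.log(1.0 - threshold / asymptote) / rate


for fan in (1, 2, 4, 8):
    print(fan, rt_conserved_fan(fan), rt_shunting_competition(fan))
```

In both cases the time to threshold grows with the number of activated pathways, but only the first case divides a fixed quantity of activation among them. In the second case every pathway carries an undiminished signal, and the slowing follows from the inhibitory interaction together with the threshold.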
It is not possible, in principle, to make this decision at the activating node itself. Since many nodes may be activated by signals from a single node, the network decides which nodes will control observable behavior by restricting the number of activated nodes. Inhibitory interactions among the nodes help to accomplish this task. Inhibitory interactions are not used in Anderson's (1983) theory, although it is known that purely excitatory feedback networks are unstable unless artificially narrow choices of parameters are made. Without postulating that activation behaves like a conserved fluid, a combination of thresholds and inhibitory interactions can generate a slowing of reaction time as the number of activated pathways is increased. In fact, the transition from a fan concept (associative normalization) to inhibitory interactions and threshold was explicitly carried out and applied to the study of reaction time (Grossberg, 1968b, 1969c). This theoretical step gradually led to the realization that inhibitory interactions cause limited capacity properties as a manifestation of a fundamental principle of network design (Section 19). Anderson (1983) intuitively justifies his fan concept in terms of a limited capacity for spreading activation, but he does not relate the limited capacity property to inhibitory processes.
E. Associative Learning and Categorical Perception
In the literature on associative learning, confusion has arisen due to an insufficient comparative analysis of the adaptive models that are available. For example, some authors erroneously claim that all modern associative models use "Hebbian synapses" (Anderson, Silverstein, Ritz, and Jones, 1977) and thus go on to equate important differences in processing capabilities that exist among different associative models. For example, in their discussion of long-term memory, Anderson et al.
(1977) claim that the change in synaptic weight $z_{ij}$ from a node $v_i$ to a node $v_j$ equals the product of the activity $f_i$ of $v_i$ with the activity $g_j$ of $v_j$, where $f_i$ and $g_j$ may be positive or negative. If both $f_i$ and $g_j$ are negative, two inhibited nodes can generate a positive increment in memory, which is neurally unprecedented. Also, if $f_i$ is positive and $g_j$ is negative, a negative memory trace $z_{ij}$ can occur. Later, if $f_i$ is negative, its interaction with negative memory $z_{ij}$ causes a positive activation of $g_j$. Thus an inhibited node $v_i$ can, via a negative memory trace $z_{ij}$, excite a node $v_j$. This property is also neurally
unprecedented. Both of these properties follow from the desire of Anderson et al. (1977) to apply ideas from linear system theory to neural networks. These problems do not arise in suitably designed nonlinear associative networks (Section 3). The desire to preserve the framework of linear system theory also led Anderson et al. (1977) to employ a homunculus in their model of categorical perception, which cannot adequately be explained by a linear model. To start their discussion of categorical perception, they allowed some of their short-term memory activities to become amplified by positive linear feedback. Left unchecked in a linear model, the positive feedback would force the activities to become infinite, which is physically impossible. To avoid this property, the authors imposed a rule that stops the growth of each activity when it reaches a predetermined maximal or minimal size, and thereafter stores this extremal value in memory. The tendency of all variables to reach a maximal or minimal value is then used to discuss data about categorical perception. No physical process is defined to justify the discontinuous change in the slope of each variable when it reaches an extreme of activity, or to explain the subsequent storage of these activities. The model thus invokes a homunculus to explain both categorical perception and short-term memory storage. If the discontinuous saturation rule is replaced by a continuous saturation rule, and if the dynamics of short-term memory storage are explicitly defined, then positive linear feedback can compress the stored activity pattern, rather than contrast enhance it, as one desires to explain categorical perception (Grossberg, 1973, 1978d). This example illustrates how perilous it is to substitute formal algebraic rules, such as those of linear system theory, for dynamical rules in the explication of a psychological process.
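Both criticisms can be checked with a few lines of arithmetic. The sketch below is not the model of Anderson et al. (1977). It is a minimal caricature that assumes a signed product rule for the memory increment and a linear positive feedback step followed by a hard limit. The function names, the feedback matrix, the gain, and the limit are all hypothetical values chosen for illustration.

```python
def product_rule_increment(f_i, g_j):
    """Signed product rule: memory increment = presynaptic times postsynaptic activity."""
    return f_i * g_j


def clipped_linear_feedback(x, weights, gain=0.2, limit=1.0, steps=50):
    """Iterate x <- clip(x + gain * W x), an assumed linear feedback with a hard limit."""
    x = list(x)
    for _ in range(steps):
        fed_back = [sum(w * xk for w, xk in zip(row, x)) for row in weights]
        # The hard limit is imposed by fiat, not generated by the dynamics.
        x = [max(-limit, min(limit, xi + gain * fb)) for xi, fb in zip(x, fed_back)]
    return x


# Two inhibited nodes yield a positive memory increment.
both_inhibited = product_rule_increment(-1.0, -1.0)
# An excited sender and an inhibited receiver yield a negative trace ...
negative_trace = product_rule_increment(1.0, -1.0)
# ... through which a later inhibited sender delivers a positive signal.
signal_from_inhibited_node = -1.0 * negative_trace

feedback_weights = [[1.0, 0.5], [0.5, 1.0]]  # hypothetical positive feedback matrix
final_state = clipped_linear_feedback([0.1, -0.05], feedback_weights)
print(both_inhibited, negative_trace, signal_from_inhibited_node, final_state)
```

In this caricature every activity ends at one of the imposed extreme values, and the relative sizes of the initial activities are lost. The extreme values are stored only because the clipping rule says so, which is the step that the text describes as a homunculus.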
Even in cases where the algebraic rule seems to express an intuitive property of the psychological process (such as the tendency to saturate), the algebraic rule may also suggest the use of other rules (such as linear positive feedback) that produce diametrically opposed results when they are used in a dynamical description of the process. No homunculus is needed to explain categorical perception in suitably designed nonlinear neural networks (Sections 18 and 22). Indeed, nonlinear network mechanisms are designed to avoid the types of instabilities and interpretive anomalies that a linear feedback system approach often generates in a neural network context.
F. Classical Conditioning and Attentional Processing
Much as Anderson et al. (1977) improperly lumped all associative models into a Hebbian category, so Sutton and Barto (1981) have incorrectly claimed that associative models other than their own use Hebbian synapses. They go on to reject all Hebbian models in favor of their own non-Hebbian associative model. Given the apparent importance of the Hebbian distinction, it is necessary to define a Hebbian synapse and to analyse why it is being embraced or rejected. Sutton and Barto (1981, p.135) follow Hebb to define a Hebbian synapse as follows: “when a cell A repeatedly and persistently takes part in firing another cell B, then A’s efficiency in firing B is increased.” However, in my associative theory, which Sutton and Barto classify as a Hebbian theory, repeated and persistent associative pairing between A and B can yield conditioned decreases, as well as increases, in synaptic strength (Grossberg, 1969b, 1970b, 1972c). This is not a minor property, since it is needed to assert that the unit of long-term memory is a spatial pattern of synaptic strengths (Section 4). Hebb’s law, by contrast, is consistent with the assumption that the unit of long-term memory is a single synaptic strength.
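The possibility of conditioned decreases under persistent pairing can be illustrated with a small sketch. It assumes a gated learning law of the form $dz_{ij}/dt = f(x_i)(-z_{ij} + x_j)$, written here in discrete time; this form, the function name, the learning rate, and the sample patterns are assumptions made for illustration rather than equations quoted from this section.

```python
def gated_pattern_learning(weights, sampling_signal, pattern, rate=0.1, trials=200):
    """Discrete-time sketch of z_j <- z_j + rate * f * (x_j - z_j).

    sampling_signal plays the role of f(x_i), the sending cell's signal;
    pattern holds the recipient activities x_j (assumed normalized).
    """
    z = list(weights)
    for _ in range(trials):
        z = [zj + rate * sampling_signal * (xj - zj) for zj, xj in zip(z, pattern)]
    return z


initial_traces = [0.9, 0.1, 0.5]      # hypothetical starting synaptic strengths
recipient_pattern = [0.2, 0.6, 0.2]   # hypothetical spatial pattern across recipients
learned = gated_pattern_learning(initial_traces, 1.0, recipient_pattern)
print(learned)
```

With the sending cell and all recipient cells active on every trial, the first trace decreases while the second increases, and the set of traces as a whole converges toward the spatial pattern. When the sampling signal is zero the traces do not change at all, so learning and forgetting are both gated by the sending cell.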
This property does not satisfy the definition of a Hebbian synapse; hence my associative laws are not Hebbian, contrary to Sutton and Barto’s claim. Moreover, the associative component of these laws is only one of several interesting factors that control their mathematical and behavioral properties. None of these factors was considered by Hebb. Notwithstanding these important details, we still need to ask why Sutton and Barto attack “Hebbian” models. The reason is that Hebbian theories are purported to be unable (1) to recall a conditioned response with a shorter time lag after the presentation
of a conditioned stimulus (CS) than was required for efficient learning to occur between the CS and the unconditioned stimulus (UCS), or (2) to explain the inverted U in learning efficacy that occurs as a function of the time lag between a CS and UCS on learning trials. Indeed, Sutton and Barto (1981, p.142) confidently assert: “not one of the adaptive element models currently in the literature is capable of producing behavior whose temporal structure is in agreement with that observed in animal learning as described above.” Unfortunately, this assertion is false. In fact, Sutton and Barto refer to the article by Grossberg (1974) which reviews a conditioning theory that can explain these phenomena (Grossberg, 1971, 1972, 1972b, 1975), as well as a variety of other phenomena that Sutton and Barto cannot explain due to their model’s formal kinship with the Rescorla-Wagner model (Grossberg, 1982b; Rescorla and Wagner, 1972). Moreover, my explanation does not depend on the non-Hebbian nature of my associative laws, but rather on the global anatomy of the networks that I derive to explain conditioning data. This anatomy includes network regions, called drive representations, at which the reinforcing properties of external cues join together with internal drive inputs to compute motivational decisions that modulate the attentional processing of external cues. No such concept is postulated in Sutton and Barto’s (1981) model. Thus the fact that a pair of simultaneous CS’s can be processed, yet a CS that is simultaneous with a UCS is not processed, does not depend on the elaboration of the UCS’s motivational and attentional properties in the Sutton and Barto model, despite the fact that the UCS might have been a CS just hours before. Sutton and Barto’s model of classical conditioning excludes motivational and attentional factors, instead seeking all explanations of classical conditioning data in the properties of a single synapse.
Such an approach cannot explain the large data base concerning network interactions between neocortex, hypothalamus, septum, hippocampus, and reticular formation in the control of stimulus-reinforcer properties (Berger and Thompson, 1978; Deadwyler, West, and Robinson, 1981; DeFrance, 1976; Gabriel, Foster, Orona, Saltwick, and Stanton, 1980; Haymaker, Anderson, and Nauta, 1969; MacLean, 1970; O’Keefe and Nadel, 1978; Olds, 1977; Stein, 1958; West, Christian, Robinson, and Deadwyler, 1981) and leads its authors to overlook the fact that such interactions are interpreted and predicted by alternative models (Grossberg, 1975). The present chapter also focuses on behavioral properties that are emergent properties of network interactions, rather than of single cells, and illustrates that single cell and network laws must both be carefully chosen to generate desirable emergent properties.
G. Search of Associative Memory
The Anderson et al. (1977) model provides one example of a psychological model whose intuitive basis is not adequately instantiated by its formal operations. Such a disparity between intuition and formalism causes internal weaknesses that limit the explanatory and predictive power of many psychological models. These weaknesses can coexist with a model’s ability to achieve good data fits on a limited number of experiments. Unfortunately, good curve fits have tended to inhibit serious analysis of the internal structure of psychological models. Another example of this type of model is Raaijmakers and Shiffrin’s (1981) model of associative memory search. The data fits of this model are remarkably good. One reason for its internal difficulties is viewed by the authors as one of its strengths: “Because our main interest lies in the development of a retrieval theory, very few assumptions will be stated concerning the interimage structure” (Raaijmakers and Shiffrin, 1981, p.123).
To characterize this retrieval theory, the model defines learning rules that are analogous to laws of associative learning. However, in information processing models of this kind, terminology like short-term memory (STM) and long-term memory (LTM) is often used instead of terminology like CS, UCS, and conditioning. These differences of terminology seem to have sustained the separate development of models that describe mechanistically related processes.
Although Raaijmakers and Shiffrin’s (1981) model intuitively discusses STM and LTM, no STM variables are formally defined; only LTM strengths are defined. This omission forces compensatory assumptions to be made through the remaining theoretical structure. In particular, the LTM strength $S(W_{iT}, W_{jS})$ between the $i$th word at test (T) and the $j$th stored (S) word is made a linear function
$$S(W_{iT}, W_{jS}) = b\,t_{ij} \tag{1}$$
of the time $t_{ij}$ during which both words are in the STM buffer. Thus there is no forgetting, the LTM strength grows linearly to infinity on successive trials, and although both words are supposedly in the buffer when LTM strength is growing, strength is assumed to grow between $W_{iT}$ and $W_{jS}$ rather than between $W_{iS}$ and $W_{jS}$. A more subtle difficulty is that time per se should not explicitly determine a dynamical process, as it does in (1), unless it parameterizes an external input. All of these problems arise because the theory does not define STM activities which can mediate the formation of long-term memories. Instead of using STM activities as the variables that control performance, the theory defines sampling and recovery probabilities directly in terms of LTM traces. The sampling probabilities are built up out of products of LTM traces, as in the formula
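$$P_S(W_{iS} \mid C_T, W_{kT}) = \frac{S(C_T, W_{iS})\, S(W_{kT}, W_{iS})}{\sum_{j=1}^{n} S(C_T, W_{jS})\, S(W_{kT}, W_{jS})} \tag{2}$$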
for the probability of sampling the $i$th word $W_{iS}$ given a probe consisting of a context cue $C_T$ and the $k$th word $W_{kT}$ at test (T). This formula formally compensates for the problem of steadily increasing strengths by balancing numerator strengths against denominator strengths. It also formally achieves selectivity in sampling by multiplying strengths together. The theory does not, however, explain how or why these operations might occur in vivo. The context cue $C_T$ is of particular importance because the relative strength of context-to-word associations is used to explain the theory’s proudest achievement: the part-list cuing effect. However, the context cue is just an extra parameter in the theory because no explanation is given of how a context representation arises or is modified due to experimental manipulations. In other words, because recall theory says nothing about chunking or recognition, the context cue plays a role akin to that played by the “fixed stars” in classical explanations of centrifugal force. In addition to the continuous rule for strength increase (equation (1)), the theory defines a discrete rule for strength increase
$$S'(W_{iT}, W_{jS}) = S(W_{iT}, W_{jS}) + g \tag{3}$$
which also leads to unbounded strengths as trials proceed. The incrementing rule (equation (3)) is applied only after a successful recall. Although this rule helps to fit some data, it is not yet explained why two such different strengthening rules should coexist. The authors represent the limited capacity of STM by appending a normalization constraint onto their sampling probability rule. They generalize equation (2) with the sampling rule
$$P_S(I_i \mid Q_1, Q_2, \ldots, Q_m) = \frac{\prod_{j=1}^{m} S(Q_j, I_i)^{w_j}}{\sum_{k} \prod_{j=1}^{m} S(Q_j, I_k)^{w_j}} \tag{4}$$
where the weights $w_j$ satisfy
$$\sum_{j=1}^{m} w_j \leq W. \tag{5}$$
Equation (4) defines the probability of sampling the $i$th image $I_i$, given the set of probe cues $Q_1, Q_2, \ldots, Q_m$. Why these normalization weights, which are intended to represent the limited capacity of STM, should appear in a sampling rule defined by LTM traces is unexplained in the theory. The properties that the formalism of Raaijmakers and Shiffrin (1981) attempts to capture have also arisen within my own work on human memory (Grossberg, 1978a, 1978b). Because this theory describes the self-organization of both recognition and recall using real-time operations on STM and LTM traces, it exhibits these properties in a different light. Its analog of the product rule (equation (2)) is due to properties of temporal order information in STM derived from a principle that guarantees the stable transfer of temporal order information from STM to LTM (Section 34). Its analog of the continuous strengthening rule (equation (1)) is found in the chunking process whereby recognition chunks are formed (Section 21). Its analog of the discrete strengthening rule (equation (3)) is due to the process whereby associations from recognition chunks to recall commands are learned (Section 6). Its analog of the normalization rule (equation (5)) is a normalization property of competitive STM networks that are capable of retuning their sensitivity in response to variable operating loads (Section 17). Not surprisingly, the part-list cuing effect poses no problem for this theory, which also suggests how contextual representations are learned. In light of these remarks, I suggest that Raaijmakers and Shiffrin (1981) have not realized how much the data they wish to explain depends on the “interimage structure” that their theory does not consider. A few principles and mechanisms based on ideas about self-organization have, in fact, been the vantage point for recognizing and avoiding internal difficulties within psychological models of cognition, perception, conditioning, attention, and information processing (Grossberg, 1978a, 1978e, 1980b, 1980d, 1981b, 1982b, 1982d, 1983, 1984a, 1984b).
Some of these principles and mechanisms of self-organization are defined below and used to discuss issues and data concerning the functional units of speech, language, and motor control. This foundation was originally built up for this purpose in Grossberg (1978e). That article, as well as others that derive the concepts on which it is based, are reprinted in Grossberg (1982d).
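The sampling rules of the Raaijmakers and Shiffrin model discussed above can be made concrete with a minimal sketch. It assumes that the sampling probability of an image is proportional to the product of the cue-to-image LTM strengths, each raised to a weight, normalized over all images. The function name, the nested-list layout of `strengths`, and the treatment of the weights as exponents are assumptions of this illustration rather than details given in the text. The example checks the compensation property noted above: multiplying every strength by a common factor leaves the sampling probabilities unchanged.

```python
from math import prod

def sampling_probabilities(strengths, weights=None):
    """Return the probability of sampling each image.

    strengths[j][i] is the (hypothetical) LTM strength from probe cue j to
    image i. weights[j] is the exponent applied to cue j (assumed form);
    with all weights equal to 1 this reduces to a plain product rule.
    """
    n_cues = len(strengths)
    n_images = len(strengths[0])
    if weights is None:
        weights = [1.0] * n_cues
    # Multiply the weighted strengths of all cues for each image.
    products = [
        prod(strengths[j][i] ** weights[j] for j in range(n_cues))
        for i in range(n_images)
    ]
    total = sum(products)
    # Normalize: each numerator is balanced against the summed denominator.
    return [p / total for p in products]

# Illustrative strengths: a context cue and one word cue probing three images.
context_cue = [1.0, 1.0, 1.0]
word_cue = [4.0, 1.0, 1.0]
probs = sampling_probabilities([context_cue, word_cue])

# Strengths that have grown tenfold yield the same probabilities.
grown = sampling_probabilities(
    [[10.0 * s for s in context_cue], [10.0 * s for s in word_cue]]
)
```

The normalization hides the unbounded growth of the strengths but, as the text argues, it does not say which process carries out the multiplication or the division.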
3. Associative Learning by Neural Networks: Interactions Between STM and LTM
The foundation of the theory rests on laws for associative learning in a neural network, which I call the embedding field equations (Grossberg, 1964). These laws are derived from psychological principles and have been physiologically interpreted in many places (e.g., Grossberg, 1964, 1967, 1968b, 1969b, 1970b, 1972c, 1974). They are reviewed herein insofar as their properties shed light on the problem of serial order. The associative equations describe interactions among unitized nodes $v_i$ that are connected by directed pathways, or axons, $e_{ij}$. These interactions are defined in terms of STM traces $x_i(t)$ computed at the nodes $v_i$ and LTM traces $z_{ij}$ computed at the endpoints, or synaptic knobs, $S_{ij}$ of the directed pathways $e_{ij}$ (Figure 1). The simplest realization of these interactions among $n$ nodes $v_1, v_2, \ldots, v_n$ is given by the system of differential equations
$$\frac{d}{dt} x_i = -A_i x_i + \sum_{k=1}^{n} B_{ki} z_{ki} - \sum_{k=1}^{n} C_{ki} + I_i(t) \tag{6}$$
and
$$\frac{d}{dt} z_{ij} = -D_{ij} z_{ij} + E_{ij} x_j \tag{7}$$
where $i, j = 1, 2, \ldots, n$; $\frac{d}{dt}$ denotes the rate of change of the continuous variable, $x_i$ or $z_{ij}$, as the case might be; and the notation $[\xi]^+ = \max(\xi, 0)$ defines a threshold.
Figure 1. An STM trace $x_i$ fluctuates at each node $v_i$, and an LTM trace $z_{ij}$ fluctuates at the end (synaptic knob) $S_{ij}$ of each conditionable pathway $e_{ij}$. The performance signal $B_{ij}$ is generated in $e_{ij}$ by $x_i$ and travels at a finite velocity until it reaches $S_{ij}$. The LTM trace $z_{ij}$ computes a time average of the contiguous trace $x_j$ multiplied by a sampling signal $E_{ij}$ that is derived from $B_{ij}$. The performance signal $B_{ij}$ is gated by $z_{ij}$ before the gated signal $B_{ij} z_{ij}$ perturbs $x_j$.
The terms in equations (6) and (7) have the following interpretations.
A. STM Decay
Function $A_i$ in equation (6) is the decay rate of the STM trace $x_i$. This rate can, in principle, depend on all the unknowns of the system, as in the competitive interaction
which I describe more fully in Section 18. Equation (8) illustrates that STM decay need not be a passive process. Active processes of competitive signaling, as in this equation, or other feedback interactions, can be absorbed into the seemingly innocuous term $A_i x_i$ in equation (6).
B. Spreading Activation
Function $B_{ki}$ in equation (6) is a performance signal from node $v_k$ to the synaptic knob(s) $S_{ki}$ of pathway $e_{ki}$. Activation “spreads” along $e_{ki}$ via the signal $B_{ki}$. Two typical choices of $B_{ki}$ are
$$B_{ki}(t) = b_{ki}\,[x_k(t - \tau_{ki}) - \Gamma_{ki}]^+ \tag{9}$$
or
$$B_{ki}(t) = f(x_k(t - \tau_{ki}))\, b_{ki} \tag{10}$$
where $f(\xi)$ is a sigmoid, or S-shaped, function of $\xi$ with $f(0) = 0$. In equation (9), a signal leaves $v_k$ only if $x_k$ exceeds the signal threshold $\Gamma_{ki}$ (Figure 2a). The signal moves along $e_{ki}$ at a finite velocity (“activation spreads”) and reaches $S_{ki}$ after $\tau_{ki}$ time units. Typically, $\tau_{ki}$ is a short time compared to the time it takes $v_k$ to exceed threshold $\Gamma_{ki}$ in response to signals. Parameter $b_{ki}$ measures the strength of the pathway $e_{ki}$ from $v_k$ to $v_i$. If $b_{ki} = 0$, no pathway exists. In equation (10), the signal threshold $\Gamma_{ki}$ is replaced by attenuation of the signal at small $x_k$ values and saturation of the signal at large $x_k$ values (Figure 2b). The S-shaped
Figure 2. (a) A threshold signal: $B_{ij}(t)$ is positive only if $x_i(t - \tau_{ij})$ exceeds the signal threshold $\Gamma_{ij}$. $B_{ij}$ is a linear function of $x_i(t - \tau_{ij})$ above this threshold. (b) A sigmoid signal: $B_{ij}(t)$ is attenuated at small values of $x_i(t - \tau_{ij})$, much as in the threshold case, and levels off at large values of $x_i(t - \tau_{ij})$ after all signaling sites are turned on.
signal function is the simplest physical signal function that can prevent noise amplification from occurring due to reverberatory signaling in a feedback network (Section 18).
C. Probed Read-Out of LTM: Gating of Performance Signals
Term $B_{ki} z_{ki}$ in equation (6) says that the signal $B_{ki}$ from $v_k$ to $S_{ki}$ interacts with the LTM trace $z_{ki}$ at $S_{ki}$. This interaction can be intuitively described in several ways. For one, $B_{ki}$ is a probe signal, activated by STM at $v_k$, that reads out the LTM trace $z_{ki}$ into the STM trace $x_i$ of $v_i$. For another, $z_{ki}$ gates signal $B_{ki}$ before it reaches $v_i$ from $v_k$, so that the signal strength that perturbs $x_i$ at $v_i$ is $B_{ki} z_{ki}$ rather than $B_{ki}$. Thus even if an input to $v_k$ excites equal signals $B_{ki}$ in all the pathways $e_{ki}$, only those $v_i$ abutted by large LTM traces $z_{ki}$ will be appreciably activated by $v_k$. Activation does not merely “spread” from $v_k$ to other nodes; it can be transformed into propagated signals ($x_k$ into $B_{ki}$) and gated by LTM traces ($B_{ki}$ into $B_{ki} z_{ki}$) before it reaches these nodes.
D. Adaptive Filtering
The gated signals from all the nodes $v_k$ combine additively at $v_i$ to form the total signal $T_i = \sum_{k=1}^{n} B_{ki} z_{ki}$ of equation (6). Speaking mathematically, $T_i$ is the dot product, or inner product, of the vectors $B_i = (B_{1i}, B_{2i}, \ldots, B_{ni})$ and $z_i = (z_{1i}, z_{2i}, \ldots, z_{ni})$ of probe signals and LTM traces, respectively. Such a dot product is often written as
$$T_i = B_i \cdot z_i. \tag{11}$$
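The gating and filtering steps described above can be sketched numerically. The following minimal example assumes a threshold-linear performance signal, ignores the transmission delay, and uses hypothetical names (`threshold_signal`, `adaptive_filter`) and illustrative parameter values. It is a sketch of the dot-product computation, not a definitive implementation.

```python
def threshold_signal(x_k, gamma, b_ki):
    """Threshold-linear performance signal (time lag omitted as a simplification)."""
    return b_ki * max(x_k - gamma, 0.0)

def adaptive_filter(x, z, b, gamma):
    """Return T[i] = sum over k of B[k][i] * z[k][i] for every target node i."""
    n = len(x)
    # B[k][i]: signal sent from node k toward node i.
    B = [[threshold_signal(x[k], gamma, b[k][i]) for i in range(n)]
         for k in range(n)]
    # Each total signal is the dot product of a column of B with a column of z.
    return [sum(B[k][i] * z[k][i] for k in range(n)) for i in range(n)]

# Three nodes; only node 0 is active enough to exceed the threshold.
x = [1.0, 0.2, 0.0]
b = [[1.0] * 3 for _ in range(3)]   # equal path strengths (illustrative)
z = [[0.0, 0.9, 0.1],               # LTM traces on the pathways leaving node 0
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
T = adaptive_filter(x, z, b, gamma=0.5)
```

Node 0 sends the same signal along every pathway, yet the node abutted by the large LTM trace receives most of the gated input (here `T` is approximately `[0, 0.45, 0.05]`), which is the selectivity described in subsection C.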
The transformation of the vector $x^* = (x_1, x_2, \ldots, x_n)$ of all STM traces into the vector $T^* = (T_1, T_2, \ldots, T_n)$ of all dot products, specifically
$$x^* \rightarrow T^*, \tag{12}$$
completely describes how STM traces generate feedback signals within the network. A transformation by dot products as in equation (12) is said to define a filter. Because the LTM traces $z_i$ that gate the signals $B_i$ can be changed by experience, the transformation (12) is said to define an adaptive filter. Thus the concepts of feedback signaling and adaptive filtering are identical in equation (6).
E. Lateral Inhibition
Term $\sum_{k=1}^{n} C_{ki}$ in equation (6) describes the total inhibitory signal from all nodes $v_k$ to $v_i$. An illustrative choice of the inhibitory signal from $v_k$ to $v_i$ is
$$C_{ki}(t) = g(x_k(t - \sigma_{ki}))\, c_{ki} \tag{13}$$
where $g(\xi)$ is a sigmoid signal function, $\sigma_{ki}$ is the time lag for a signal to be transmitted (“spread”) between $v_k$ and $v_i$, and $c_{ki}$ describes the strength of the inhibitory path from $v_k$ to $v_i$.
F. Automatic Activation of Content-Addressable Nodes
Function $I_i(t)$ in equation (6) is an input corresponding to presentation of the $i$th event through time. The input $I_i(t)$ can be large during and shortly after the event and otherwise equals zero. The input automatically excites $v_i$ in the sense that the input has a direct effect on the STM activity of its target node. In all, each STM trace can decay, can be activated by external stimuli, and can interact with other nodes via sums of gated excitatory signals and inhibitory signals. These equations can be generalized in several ways (Grossberg, 1974, 1982d). For example, LTM traces for inhibitory pathways can also be defined (Grossberg, 1969b) and in a way that avoids the difficulties of Estes’s (1972) theory in Section 1. The Appendix describes a more general version of the equations that includes stable conditionable inhibitory pathways.
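The STM equation summarized above (decay, gated excitatory signals, inhibitory signals, and an external input) can be integrated numerically. The sketch below is an illustration under stated assumptions: forward Euler integration, a constant decay rate, threshold-linear signals for both excitation and inhibition, no transmission delays, and fixed LTM traces. The function names and all parameter values are hypothetical.

```python
def simulate_stm(input_fn, z, b, c, decay=1.0, gamma=0.1, dt=0.01, steps=1000):
    """Euler-integrate dx_i/dt = -decay*x_i + sum_k B_ki*z_ki - sum_k C_ki + I_i(t)."""
    n = len(z)
    x = [0.0] * n
    history = []
    for step in range(steps):
        inputs = input_fn(step * dt)
        # Threshold-linear output of each node (assumed signal function).
        signal = [max(x_k - gamma, 0.0) for x_k in x]
        new_x = []
        for i in range(n):
            excitation = sum(b[k][i] * signal[k] * z[k][i] for k in range(n))
            inhibition = sum(c[k][i] * signal[k] for k in range(n))
            dx = -decay * x[i] + excitation - inhibition + inputs[i]
            new_x.append(x[i] + dt * dx)
        x = new_x
        history.append(list(x))
    return history

def brief_event(t):
    """External input: node 0 is driven for the first 2 time units only."""
    return [1.0 if t < 2.0 else 0.0, 0.0]

b = [[0.0, 1.0], [0.0, 0.0]]   # one excitatory pathway from node 0 to node 1
c = [[0.0, 0.0], [0.0, 0.0]]   # no lateral inhibition in this example
z = [[0.0, 0.5], [0.0, 0.0]]   # fixed LTM trace gating that pathway
history = simulate_stm(brief_event, z, b, c)
peak_0 = max(state[0] for state in history)
peak_1 = max(state[1] for state in history)
final_0 = history[-1][0]
```

In this run the STM trace of node 0 rises while the event is present and decays afterwards, while node 1 is activated only through the LTM-gated signal from node 0.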
G. LTM Decay
Function $D_{ij}$ in equation (7) is the decay rate of the LTM trace $z_{ij}$. The LTM decay rate, like the STM decay rate, can depend on the state of the system as a whole. For example, in principle it can be changed by attentional signals, probe signals, slow threshold fluctuations, and the like without destroying the invariants of associative learning that I need to carry out my argument (Grossberg, 1972c, 1974, 1982d).
H. Read-In of STM into LTM: Stimulus Sampling
Function $E_{ij}$ in equation (7) describes a learning signal from $v_i$ to $S_{ij}$ that drives the LTM changes in $z_{ij}$ at $S_{ij}$. In other words, $v_i$ samples $v_j$ by turning on $E_{ij}$. Otherwise expressed, the STM trace $x_j$ is read into the LTM trace $z_{ij}$ by turning on the sampling signal $E_{ij}$. In the simplest case, $E_{ij}$ is proportional to $B_{ij}$. By setting both $D_{ij}$ and $E_{ij}$ equal to zero in equation (7), a pathway $e_{ij}$ can be converted from a conditionable pathway to a prewired pathway that is incapable of learning. An important technical issue concerns the most general relationship that can exist between $B_{ij}$ and $E_{ij}$. It has been proven that, in a precise mathematical sense, unbiased learning occurs if “$B_{ij}$ is large only if $E_{ij}$ is large” (Grossberg, 1972c, 1982d). This condition, called a local flow condition, is interpreted physically as follows. After the sampling signal $E_{ij}$ reaches $S_{ij}$, it influences learning by $z_{ij}$ within $S_{ij}$. The sampling signal $E_{ij}$ is also averaged, delayed, or otherwise transformed within $S_{ij}$ to give rise to the performance signal $B_{ij}$. This signal acts at a “later stage” within $S_{ij}$ than $E_{ij}$ because $B_{ij}$ energizes the net effect $B_{ij} z_{ij}$ of $v_i$ upon $v_j$. The mathematical local flow condition shows that this physical interpretation of the relationship between $E_{ij}$ and $B_{ij}$ is sufficient to guarantee unbiased learning.
I. Mutual Interaction of STM and LTM
By joining together terms $D_{ij} z_{ij}$ and $E_{ij} x_j$, it follows from equation (7) that the LTM trace $z_{ij}$ is a time average of the product of learning signals $E_{ij}$ from $v_i$ to $S_{ij}$ with STM traces at $v_j$. When $z_{ij}$ changes in size, it alters the gated signals from $v_i$ to $v_j$ via term $B_{ij} z_{ij}$, and thus the value of the STM trace $x_j$. In this way the STM and LTM traces mutually influence each other, albeit on different spatial and temporal scales.
4. LTM Unit is a Spatial Pattern: Sampling and Factorization
To understand the functional units of goal-oriented behavior, it is necessary to characterize the functional unit of long-term memory in an associative network. This problem was approached by first analysing what the minimal anatomy capable of associative learning can actually learn (Grossberg, 1967, 1968a, 1968g, 1970b) and then proving that the same functional unit of memory is computed in much more general anatomies (Grossberg, 1969b, 1972c, 1974). Three properties that were discovered by these investigations will be needed here:
1) The functional unit of LTM is a spatial pattern of activity.
2) A spatial pattern is encoded in LTM by a process of stimulus sampling.
3) The learning process factorizes the input properties which energize learning and performance from the spatial patterns to be learned and performed.
Each of these abstract properties is a computational universal that appears under different names in ostensibly unrelated concrete applications. Henceforth in the chapter, an abstract property will be described before it is applied to concrete examples.
5. Outstar Learning: Factorizing Coherent Pattern From Chaotic Activity
The minimal anatomy capable of associative learning is depicted in Figure 3a. A single node, or population, $v_0$, is activated by an external event via an input function $I_0(t)$. This event is called the sampling event. For example, in studies of classical conditioning, the sampling event is the conditioned stimulus (CS). If the sampling event causes the signal thresholds of node $v_0$ to be exceeded by its STM trace $x_0$, then learning signals $E_{0i}$ propagate along the pathways $e_{0i}$ toward a certain number of nodes $v_i$, $i = 1, 2, \ldots, n$. The same analysis of learning applies no matter how many nodes $v_i$ exist, provided that at least two nodes exist ($n \geq 2$) to permit some learning to occur. The learning signals $E_{0i}$ are also called sampling signals because their size influences the learning rate, with no learning occurring when all signals $E_{0i}$ are equal to zero. The sampling signals $E_{0i}$ from $v_0$ do not activate the nodes $v_i$ directly. In contrast, the LTM-gated performance signals $B_{0i} z_{0i}$ directly influence the nodes $v_i$ by activating their STM traces $x_i$. The nodes $v_i$ can also be activated directly by the events to be learned. These events are represented by the inputs $I_i(t)$ which activate the STM traces $x_i$ of the nodes $v_i$, $i = 1, 2, \ldots, n$. Because the signals $E_{0i}$ enable the $z_{0i}$ to sample STM traces, the inputs $(I_1(t), I_2(t), \ldots, I_n(t))$ are called the sampled event. In studies of classical conditioning, the sampled event is the unconditioned stimulus (UCS). The output signals from the nodes $v_i$ that are caused by the UCS control the network’s unconditioned response (UCR). The sampling signals $E_{0i}$ directly activate the performance signals $B_{0i}$ and the LTM traces $z_{0i}$ rather than the STM traces $x_i$. These LTM traces are computed at the synaptic knobs $S_{0i}$ that abut the nodes $v_i$. This location permits the LTM traces $z_{0i}$ to sample the STM traces $x_i$ when they are activated by the sampling signals $E_{0i}$.
Such a minimal network is called an outstar because it can be redrawn as in Figure 3b. Mathematical analysis of an outstar reveals that it can learn a spatial pattern, which is a sampled event to the nodes $v_1, v_2, \ldots, v_n$ whose inputs $I_i$ have a fixed relative size while $v_0$’s sampling signals are active. If the inputs $I_i$ have a fixed relative size, they can be rewritten in the form
$$I_i(t) = \theta_i I(t) \tag{14}$$
where $\theta_i$ is the constant relative input size, or “reflectance,” and the function $I(t)$ is the fluctuating total activity, or “background” input, of the sampled event. The convention that $\sum_{i=1}^{n} \theta_i = 1$ guarantees that $I(t)$ represents the total sampled input to the outstar, specifically $I(t) = \sum_{i=1}^{n} I_i(t)$. The pattern weights of the sampled event form the vector
$$\theta = (\theta_1, \theta_2, \ldots, \theta_n) \tag{15}$$
of constant relative input sizes. The outstar learns this vector. The assertion that an outstar can learn a vector $\theta$ means the following. During learning trials, the sampling event is followed a number of times by the sampled event. Thus the inputs $I_0(t)$ and $I(t)$ can oscillate wildly through time. Despite these wild oscillations, however, learning in an outstar does not oscillate. Rather, the outstar can progressively, or monotonically, learn the invariant spatial pattern $\theta$ across trials, corresponding to the intuitive notion that “practice makes perfect” (Figure 4). The outstar does this by using the fluctuating inputs $I_0(t)$ and $I(t)$ as energy to drive its encoding of the pattern $\theta$. The fluctuating inputs $I_0(t)$ and $I(t)$ determine the rate of learning but not the pattern $\theta$ that is learned. This is the property of factorization: fluctuating input energy determines the learning rate, while the invariant input pattern determines what is learned. The factorization property shows that the outstar can detect and encode temporally coherent relationships among the inputs that represent the sampled event.
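The factorization property can be illustrated with a small simulation. The sketch below is a simplified outstar under stated assumptions: forward Euler integration, a threshold-linear sampling signal, constant decay rates, no performance-signal feedback onto the sampled nodes, and arbitrary oscillating input functions. Every name and parameter value is hypothetical. The inputs fluctuate strongly, yet the relative LTM traces move steadily toward the pattern weights.

```python
from math import sin, cos

def train_outstar(theta, z_init, dt=0.01, steps=3000,
                  stm_decay=1.0, ltm_decay=0.1, gamma=0.2):
    """Return the history of relative LTM traces z_i / sum(z) during sampling."""
    n = len(theta)
    x0 = 0.0          # STM trace of the sampling node
    x = [0.0] * n     # STM traces of the sampled nodes
    z = list(z_init)  # LTM traces at the synaptic knobs
    ratios = []
    for step in range(steps):
        t = step * dt
        sampling_input = 1.0 + sin(3.0 * t)     # fluctuating sampling input >= 0
        total_input = 1.0 + cos(5.0 * t)        # fluctuating total sampled input >= 0
        sampling_signal = max(x0 - gamma, 0.0)  # assumed threshold-linear signal
        # LTM: passive decay plus sampling signal times the sampled STM trace.
        z = [z_i + dt * (-ltm_decay * z_i + sampling_signal * x_i)
             for z_i, x_i in zip(z, x)]
        # STM: each sampled node receives its fixed share of the total input.
        x = [x_i + dt * (-stm_decay * x_i + th * total_input)
             for x_i, th in zip(x, theta)]
        x0 += dt * (-stm_decay * x0 + sampling_input)
        total = sum(z)
        ratios.append([z_i / total for z_i in z])
    return ratios

theta = [0.6, 0.3, 0.1]
ratios = train_outstar(theta, z_init=[0.1, 0.1, 0.1])
# Distance of each relative LTM trace from its target weight at every step.
errors = [[abs(r - th) for r, th in zip(row, theta)] for row in ratios]
```

In this simplified setting the oscillating inputs only rescale how fast the LTM ratios move; the target they approach is fixed by `theta`, so the error sequence for each trace never increases.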
Figure 3. The minimal network capable of associative pattern learning: (a) A conditioned stimulus (CS) activates a single node, or cell population, $v_0$, which sends sampling signals to a set of nodes $v_1, v_2, \ldots, v_n$. An input pattern representing an unconditioned stimulus (UCS) activates the nodes $v_1, v_2, \ldots, v_n$, which elicit output signals that contribute to the unconditioned response (UCR). The sampling signals from $v_0$ activate the LTM traces $z_{0i}$ that are computed at the synaptic knobs $S_{0i}$, $i = 1, 2, \ldots, n$. The activated LTM traces can learn the activity pattern across $v_1, v_2, \ldots, v_n$ that represents the UCS. (b) When the sampling network in (a) is drawn to emphasize its symmetry, the result is an outstar wherein $v_0$ is the sampling source and the set $\{v_1, v_2, \ldots, v_n\}$ is the sampled border.
Figure 4. Oscillatory inputs due to repetitive A-then-B presentations are translated into a monotonic learned reaction of the corresponding stimulus sampling probabilities. In the text, fluctuations in the sampling input $I_0(t)$ and total sampled input $I(t)$, as well as the monotonic reactions of the relative LTM traces $Z_i(t)$, generalize the A-then-B interpretation. (Panels: $x_i(t)$ plotted against $t$; amount learned on successive trials.)
In mathematical terms, factorization implies that the relative LTM traces
$$Z_i = \frac{z_{0i}}{\sum_{k=1}^{n} z_{0k}} \tag{16}$$
are drawn monotonically toward the target ratios $\theta_i$. Stimulus sampling means that the LTM ratios $Z_i$ change only when the sampling signals from $v_0$ to the synaptic knobs $S_{0i}$ are positive. Because the LTM ratios form a probability distribution (each $Z_i \geq 0$ and $\sum_{i=1}^{n} Z_i = 1$) and change only when sampling signals are emitted, I call them the stimulus sampling probabilities of an outstar. The behavior of these quantities explicates the probabilistic intuitions underlying stimulus sampling theory (Neimark and Estes, 1967) in terms of the deterministic learning dynamics of a neural network. In particular, the factorization property dynamically explains various properties that are assumed in a stimulus sampling model; for example, why learning curves should be monotonic in response to wildly oscillating inputs (Figure 4). The property of factorization also has an important meaning during performance trials. Both sampling signals and performance signals are released during performance trials (Grossberg, 1972c). The property of factorization means that the performance signal may be chosen to be any nonnegative and continuous function of time without destroying the outstar’s memory of the spatial pattern that was encoded in LTM on learning trials. The main constraint is that the pattern weights $\theta_i$ be read out synchronously from all the nodes $v_i$. What happens if the sampled event to an outstar is not a spatial pattern, as in the case when a series of sampled events occur, rather than a single event? Such an event series can be represented by a vector input
$$(I_1(t), I_2(t), \ldots, I_n(t)), \quad t \geq 0, \tag{17}$$
where each input $I_i(t)$ is a nonnegative and continuous function of time. Because
each input $I_i(t)$ is continuous, the relative pattern weights
$$\theta_i(t) = \frac{I_i(t)}{\sum_{k=1}^{n} I_k(t)} \tag{18}$$
are also continuous functions of time, as in the vector function
$$\theta(t) = (\theta_1(t), \theta_2(t), \ldots, \theta_n(t)) \tag{19}$$
of pattern weights. Mathematical analysis of the outstar reveals that its LTM traces learn a spatial pattern even if the weights $\theta(t)$ vary through time. The spatial pattern that is encoded in LTM is a weighted average of all the spatial patterns $\theta(t)$ that are registered at the nodes $v_i$ while sampling signals from $v_0$ are active. This result raises the question: How can each of the patterns $\theta(t)$ be encoded in LTM, rather than an average of them all? The properties of outstar learning (Section 4) readily suggest an answer to this question. This answer propelled the theory on one of its roads toward a heightened understanding of the serial order problem. Before following this road, some applications of outstar learning are now summarized.
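The averaging result can be checked with a short sketch. It assumes, for illustration only, that the STM traces track their inputs instantly, that the sampling signal is constant while active, and that the LTM traces start at zero with no decay, so that each LTM trace simply accumulates its sampled activity. The function and variable names are hypothetical.

```python
def learn_pattern_series(series, dt=0.001, sampling_signal=1.0):
    """Accumulate LTM traces over a series of (duration, total_input, weights)."""
    n = len(series[0][2])
    z = [0.0] * n
    for duration, total_input, weights in series:
        for _ in range(round(duration / dt)):
            # Sampled STM activity is assumed to equal the current input pattern.
            z = [z_i + dt * sampling_signal * total_input * w
                 for z_i, w in zip(z, weights)]
    total = sum(z)
    return [z_i / total for z_i in z]

# Two spatial patterns presented one after the other while sampling is active.
series = [
    (1.0, 1.0, [0.8, 0.2]),   # pattern A for 1 time unit
    (3.0, 1.0, [0.2, 0.8]),   # pattern B for 3 time units
]
learned = learn_pattern_series(series)
```

The learned ratios are approximately `[0.35, 0.65]`: a duration-weighted average of the two patterns that equals neither of them, which is why a single sampling source cannot store each pattern of a series separately.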
6. Sensory Expectations, Motor Synergies, and Temporal Order Information
The fact that associative networks encode spatial patterns in LTM suggests that the brain's sensory, motor, and cognitive computations are all pattern transformations. This expectation arises from the fact that computations which cannot in principle be encoded in LTM can have no adaptive value and thus would presumably atrophy during evolution. Examples of spatial patterns as functional units of sensory processing include the reflectance patterns of visual processing (Cornsweet, 1970), the sound spectrograms of speech processing (Cole, Rudnicky, Zue, and Reddy, 1980; Klatt, 1980), the smell-induced patterns of olfactory bulb processing (Freeman, 1975), and the taste-induced patterns of thalamic processing (Erickson, 1963). More central types of pattern processing are also needed to understand the self-organization of serial order.
A. Sensory Expectancies
Suppose that the cells $v_1, v_2, \ldots, v_n$ are sensory feature detectors in a network's sensory cortex. A spatial pattern across these feature detectors may encode a visual or auditory event. The relative activation of each $v_i$ then determines the relative importance of each feature in the global STM representation of the event across the cortex. Such a spatial pattern code can effectively represent an event even if the individual feature detectors $v_i$ are broadly tuned. Using outstar dynamics, even a single command node $v_0$ can learn and perform an arbitrary sensory representation of this sort. The pattern read out by $v_0$ is often interpreted as the representation that $v_0$ expects to find across the field $v_1, v_2, \ldots, v_n$ due to prior experience. In this context, outstar pattern learning illustrates top-down expectancy learning (Section 25). The expectancy controlled by a given node $v_0$ is a time-average of all the spatial patterns that it ever sampled. Thus, it need not equal any one of these patterns.
B. Motor Synergies
Suppose that the cells $v_1, v_2, \ldots, v_n$ are motor control cells such that each $v_i$ can excite a particular group of muscles. A larger signal from each $v_i$ then causes a faster contraction of its target muscles. Spatial pattern learning in this context means that an outstar command node $v_0$ can learn and perform fixed relative rates of contraction across all the motor control cells $v_1, v_2, \ldots, v_n$. Such a spatial pattern can control a motor synergy, such as playing a chord on the piano with prescribed fingers; making a
synchronous motion of the wrist, arm, and shoulder: or actinting a prescribed target configuration of lips and tongue nhile iittering a speech sound (Section 32). Because outstar memory is not disturbed when the performance signal from vo is increased or derreased, such a motor synergy, once learned, can be performed at a variety of synchronous rates without requiring the motor pattern to be relearned at each new rate (Kelso, Southard, and Goodman, 1979; Soerhting and Laquaniti, 1981). In other words, the factorization of pattern and energy provides a basis for independently processing the command needed to reach a terminal motor target and the velocity with which the target will be approached. This property may be better understood through the following example. When I look at a nearby object, I can choose to touch it with my left hand, my right hand, my nose, and so on. Several terminal motor maps are simultaneously available to move their corresponding motor organs towards the object. “Willing” one of these acts releases the corresponding terminal motor map, but not the others. The chosen motor organ ran, moreover, be moved towards the invariant goal at a wide range of velocities. The distinction between the invariant terminal motor map and the flexibly programmable performance signal illustrates how factorization prominently enters the problems of learned motor control. C. Temporal Order Information Over Item Representations Suppose that a sequence of item representations is activated in a prescribed order during perception of a list. At any given moment, a spatial pattern of STM activity exists across the excited populations. Were the same items excited in a different order by a different list, a different spatial pattern of STM activity would be elicited. Thus the spatial pattern reflects temporal order information as well as item information. An outstar sampling source can encode this spatial pattern as easily as any other spatial pattern. 
Thus although an outstar can encode only a spatial pattern, this pattern can represent temporal properties of external events. Such a spatial encoding of temporal order information is perhaps the example par excellence of a network’s parallel processing capabilities. How a network can encode temporal order information in STM without falling into the difficulties mentioned in Section 2 will be described in Section 34.
7. Ritualistic Learning of Serial Behavior: Avalanches

The following sections approach the problem of serial order in stages. These stages mark different levels of sophistication in a network’s ability to react adaptively to environmental feedback. The stages represent a form of conceptual evolution in the theory reflecting the different levels of behavioral evolution that exist across phylogeny.

The first stage shows how outstar learning capabilities can be used to design a minimal network capable of associatively learning and/or performing an arbitrary sequence of events, such as a piano sonata or a dance. This construction is called an avalanche (Grossberg, 1969g, 1970a, 1970b) because its sampling signal traverses a long axon that activates regularly spaced cells (Figure 5) in a manner reminiscent of how avalanche conduction along the parallel fibers in the cerebellum activates regularly spaced Purkinje cells (Eccles, Ito, and Szentagothai, 1967; Grossberg, 1969d). The simplest avalanche requires only one node to encode the memory of the entire sequence of events. Thus, the construction shows that complex performance per se is easily achieved by a small and simple neural network.

The simplest avalanche also exhibits several disadvantages stemming from the fact that its performance is ritualistic in several senses. Each of these disadvantages has a remedy that propels the theory forward. Performance is temporally ritualistic because once performance has been initiated, it cannot be rhythmically modified by the performer or by conflicting environmental demands. Performance is spatially ritualistic in the sense that the motor patterns to be performed do not have learned sensory referents.
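The ritualistic character of the simplest avalanche can be illustrated with a short Python sketch. It is only a toy under stated assumptions, not the embedding field equations: the function names `learn_avalanche` and `perform_avalanche`, the sampling `stride`, the linear interpolation, and the example inputs are all hypothetical choices introduced here. Each stored row stands in for the spatial pattern learned by one outstar along the sampling axon, and performance replays the rows in a fixed order.

```python
import numpy as np

def learn_avalanche(inputs, stride=1):
    """Store one normalized spatial pattern for every `stride`-th input frame.

    Assumption: each stored row stands in for the pattern learned by one
    outstar along the avalanche's sampling axon.
    """
    frames = np.asarray(inputs, dtype=float)[::stride]
    totals = frames.sum(axis=1, keepdims=True)
    return frames / totals  # relative pattern weights; each row sums to 1

def perform_avalanche(weights, steps_per_frame=1):
    """Replay the stored patterns in their learned order.

    Successive patterns are linearly interpolated (an assumed stand-in for
    smooth interpolation). The order of the patterns cannot be changed.
    """
    output = []
    for current, following in zip(weights[:-1], weights[1:]):
        for step in range(steps_per_frame):
            mix = step / steps_per_frame
            output.append((1.0 - mix) * current + mix * following)
    output.append(weights[-1])
    return np.array(output)

# Hypothetical two-node command sequence with five input frames.
inputs = [[1, 3], [2, 2], [3, 1], [1, 1], [4, 0]]
weights = learn_avalanche(inputs, stride=2)
movie = perform_avalanche(weights, steps_per_frame=2)
```

In this sketch the whole sequence is released by a single call, which mirrors the claim that one sampling node suffices, and nothing in `perform_avalanche` lets feedback alter the order once performance has begun.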
Figure 6. An avalanche is the minimal network that can associatively learn and ritualistically perform any space-time pattern. The sampling node $v_1^{(2)}$ emits a brief sampling pulse that serially excites the outstar sampling bouquets that converge on the sampled field $F^{(1)} = \{v_1^{(1)}, v_2^{(1)}, \ldots, v_n^{(1)}\}$. On performance trials, a sampling pulse resynthesizes the space-time pattern as a series of smoothly interpolated spatial patterns.
The first modification of the avalanche enables performance speed to be continuously modulated, or even terminated, in response to performer or environmental signals. This flexibility can be achieved on performance trials without any further learning of the ordered patterns themselves. The construction thus provides a starting point for analysing how order information and rhythm can be decoupled in more complex learning situations. The construction is not of merely formal interest, however, since it shares many properties with the command cell anatomies of invertebrates (Dethier, 1968; Hoyle, 1977; Kennedy, 1968; Stein, 1971; Willows, 1968). With the modified avalanche construction before us, some design issues become evident concerning how to overcome the network’s spatially ritualistic properties. The
pursuit of these issues leads to a study of serial learning and chunking that, in turn, provides concepts for building a theory of recognition and recall. The needed serial learning and chunking properties are also properties of the embedding field equations, albeit in a different processing context than that of outstar learning.

Because the avalanche constructions require a hierarchy of network stages, superscripts are used on the following variables. Suppose that the act to be learned is controlled by a set of nodes $v_1^{(1)}, v_2^{(1)}, \ldots, v_n^{(1)}$, henceforth called the field of cells $F^{(1)}$. This field replaces the nodes $v_1, v_2, \ldots, v_n$ of an outstar. Let each node receive a nonnegative and continuous input $I_i(t)$, $t \geq 0$, $i = 1, 2, \ldots, n$. The set of inputs $I_i(t)$ collectively form a vector input

$$J(t) = (I_1(t), I_2(t), \ldots, I_n(t)), \quad t \geq 0, \tag{17}$$

that characterizes the commands controlling the sequence of events. At the end of Section 4 I raised the question of how such a vector input could be learned despite the outstar's ability to learn only one spatial pattern. An avalanche can accomplish this task using a single encoding cell in the following way. Speaking intuitively, $J(t)$ describes a moving picture playing through time on the "screen" of nodes $F^{(1)}$. An avalanche can learn and perform such a "movie" as a sequence of still pictures that are smoothly interpolated through time. Because each input $I_i(t)$ is continuous, the pattern weights $\theta_i(t)$ are also continuous and can therefore be arbitrarily closely approximated by a sequence of values $\theta_i(0), \ldots$

[...]

$$z_{11} > z_{12} > z_{13} > \cdots > z_{1m} \tag{24}$$
due to the fact that the list of items $r_1, r_2, \ldots, r_m$ was previously presented to $F^{(2)}$. Consequently, when a performance signal from $v_1^{(3)}$ is gated by these LTM traces, an STM pattern across $F^{(2)}$ is generated such that

$$x_1^{(2)} > x_2^{(2)} > \cdots > x_m^{(2)}. \tag{25}$$
A reaction time rule such as equation (23) initiates an output signal faster from a node with a large STM activity than from a node with a small STM activity. The chain of STM inequalities (25) can thus be translated into the correct order of performance using such a reaction time rule if the following problem of perseveration can be prevented. After the first item $r_1$ is performed, the output signal from $v_1^{(2)}$ must shut off to prevent a sustained output signal from interfering with the performance of later items. A specific inhibitory feedback pathway thus inhibits $x_1^{(2)}$ after a signal is emitted from $v_1^{(2)}$ (Figure 10). The same perseveration problem then faces the remaining active nodes $v_2^{(2)}, v_3^{(2)}, \ldots, v_m^{(2)}$. Hence every output pathway from $F^{(2)}$ can activate a specific inhibitory feedback pathway whose activation can self-inhibit the corresponding STM trace (Grossberg, 1978a, 1978e; Rumelhart and Norman, 1982). With this performance mechanism in hand, we now consider the more difficult problem of how the chain of LTM inequalities (24) can be learned during presentation of a list of items $r_1, r_2, \ldots, r_m$.
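The read-out scheme just described can be caricatured in a few lines of Python. This is a discrete sketch under stated assumptions rather than the reaction time rule of equation (23): the function name `read_out`, the `threshold` parameter, and the resetting of a performed trace to zero are hypothetical simplifications introduced here. The node with the largest STM activity emits its output first, and output-contingent self-inhibition then removes it from the competition.

```python
def read_out(stm, threshold=0.0):
    """Return item indices in performance order, largest STM activity first."""
    if threshold < 0:
        raise ValueError("threshold must be nonnegative")
    traces = list(stm)          # copy so the caller's STM pattern is untouched
    order = []
    while traces and max(traces) > threshold:
        winner = traces.index(max(traces))  # largest activity fires first
        order.append(winner)
        traces[winner] = 0.0    # self-inhibition after output prevents perseveration
    return order

primacy_gradient = [0.9, 0.7, 0.5, 0.3]   # hypothetical STM pattern, cf. (25)
performance_order = read_out(primacy_gradient)
```

Without the line that resets the performed trace, the first item would win every round, which is the perseveration problem named in the text.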
14. The Problem of STM-LTM Order Reversal

The following example illustrates the problem in its most severe form. The STM properties that I now consider will, however, have to be generalized in Section 34. Suppose that each node $v_i^{(2)}$ is excited by a fixed amount when the $i$th list item $r_i$ is presented. Suppose also that, as time goes on, the STM trace $x_i^{(2)}$ gets smaller due either to internodal competition or passive trace decay. Which of the two decay mechanisms is used does not affect the basic result, although different mechanisms will cause testable secondary differences in the order information to be encoded in LTM. Whichever decay mechanism is used, in response to serially presented lists, the last item to have occurred always has the largest STM trace. In other words, a recency effect exists at each time in STM (Figure 9b). Given this property, how is the chain of LTM inequalities learned? In other words, how does a sequence of recency gradients in STM get translated into a primacy gradient in LTM? I call this issue the STM-LTM order reversal problem (Grossberg, 1978e).

The same problem arises during serial verbal learning, but in a manner that disguises its relevance to planned serial behavior. In this task, the generalization gradients of errors at each list position have the qualitative form depicted in Figure 11. A gradient of anticipatory (forward) errors occurs at the beginning of the list, a two-sided gradient of anticipatory and perseverative (backward) errors near the middle of the list, and a gradient of perseverative errors at the end of the list (Osgood, 1953). I suggest that the gradient of anticipatory errors at the beginning of the list is learned in the same way as a primacy gradient in LTM. I have shown (Grossberg, 1969f) that the same associative laws also generate the other position-sensitive error gradients.
Thus a command node that is activated after the entire list is presented encodes a recency gradient in LTM rather than the primacy gradient that is encoded by a command node activated before (or when) the first list item is presented. The same laws also provide an explanation
Figure 9. Simultaneous encoding of context and temporal order by top-down STM-LTM order reversal: (a) The context node $v_1^{(3)}$ reads out a primacy gradient across the item representations of $F^{(2)}$. (b) The context node $v_1^{(3)}$ can learn a primacy gradient in LTM by multiplicatively sampling and additively storing a temporal series of STM recency gradients across $F^{(2)}$.
[Figure 10 labels: STM pattern; STM competition; self-inhibitory; rehearsal wave; threshold]
Figure 10. A reaction time rule translates larger STM activities into faster output onsets. Output-contingent STM self-inhibition prevents item perseveration.

of why the curve of cumulative errors versus list position is bowed and skewed toward the end of the list, and of why associations at the beginning of the list are often, but not always, learned faster than associations at the end of the list (Grossberg and Pepe, 1970, 1971).

From the perspective of planned serial behavior, these results show how the activation of a command node at different times during list presentation causes the node to encode totally different patterns of order information in LTM. Thus the learning of order information is highly context-sensitive. If a command node is activated by a prescribed list subsequence via $F^{(2)} \to F^{(3)}$ signals that subserve chunking and recognition, then this subsequence constrains the order information that its command node will encode by determining the time at which the command node is activated. Moreover, this context-sensitivity is not just a matter of determining which item representations will be sampled, as the issue of STM-LTM order reversal clearly shows.

An important conclusion of this analysis is that the same sort of context-sensitive LTM gradients are learned on a single trial regardless of whether command nodes sample item representations at different times, or if the item representations sample each other
Figure 11. Each sampling node $v_j$ learns a different LTM pattern $z_j = (z_{j1}, z_{j2}, \ldots, z_{jn})$ if it samples at different times. In a list of length $n = L$ whose intertrial interval is sufficiently long, a node that starts sampling at the list beginning ($j \cong 1$) learns a primacy gradient in LTM. At the list end ($j \cong L$), a recency gradient in LTM is learned. Near the list middle ($j \cong L/2$), a two-sided LTM gradient is learned. When STM probes read different LTM patterns $z_j$ into STM, the different patterns generate different error gradients due to the action of internal noise, the simultaneous read-out by other probes, and the STM competition that acts after LTM read-out.

through time. Although the order information that is encoded by the sampling nodes is the same, the two situations are otherwise wholly distinct. In the former case, list subsequences are the functional units that control learned performance, and many lists can be learned and performed over the same set of item representations. In the latter case, individual list items are the functional units that control learned performance, and once a given chain of associations is learned among the item representations, it will interfere with the learning of any other list ordering that is built up from the same item representations (Dixon and Horton, 1968; Lenneberg, 1967; Murdock, 1974).

A third option is also available. It arises by considering a context-modulated avalanche whose serial ordering and context nodes are both self-organized by associative processes (Figure 12). In such a network, each of the item nodes can be associated with any of several other item nodes. In the absence of contextual support, activating any one of these item nodes causes only subliminal activation of its set of target item nodes, while activating a particular context node sensitizes a subset of associatively linked item nodes. A serial ordering of supraliminally activated item nodes can thus be generated.
Such an adaptive context-modulated avalanche possesses many useful properties. For example, item nodes are no longer bound to each other via rigid associative chains. A given item can activate different items in different contexts. The inhibition of a given context node can rapidly prevent the continued performance of the item ordering that it controls, while the activation of different context nodes can rapidly instate a new performance ordering among the same items or different items. In Figure 12, the item nodes are called command nodes. This change of terminology is intended to emphasize the fact that, in order for this design to be useful, the items must represent chunks on a rather high level of network processing. The number of transitions from each command node to its successors must be reasonably small in order to achieve the type of unambiguous serial ordering that the context chunk is supposed to guarantee. The sequence chunks within the masking field discussed in
[Figure 12 labels: context nodes; serial links among command nodes]
Figure 12. In an adaptive context-modulated avalanche, each command node can be associated with a set of command nodes that it subliminally activates. Learned top-down signals from a context node can also sensitize a set of command nodes. The convergent effects of top-down and internodal signals cause the supraliminal activation of command nodes in a prescribed serial order. Different context nodes can generate different serial orders.

Section 38 are prime candidates for command nodes in an adaptive context-mediated avalanche. The ability of top-down and serial associative signals to activate ordered STM traces supraliminally without also unselectively activating a broad field of STM traces is facilitated by balancing these excitatory associative signals against inhibitory signals, notably inhibitory masking signals (see Section 38; Grossberg, 1978e, Sections 41-46).

15. Serial Learning

This section indicates how the context-sensitive LTM gradients in Figure 11 are learned. Why the same rules imply that the cumulative error curve of serial learning is bowed and skewed is reviewed from a recent perspective in Grossberg (1982a). First I consider how a primacy gradient (equation (24)) is encoded by the LTM traces $(z_{11}, z_{12}, \ldots, z_{1m})$ of a node $v_1^{(3)}$ that is first activated before, or when, the first list item is presented. I then show how a recency gradient

$$z_{n1} < z_{n2} < \cdots < z_{nm} \tag{26}$$

is encoded by the LTM traces $(z_{n1}, z_{n2}, \ldots, z_{nm})$ of a node $v_n^{(3)}$ that is first activated after the whole list is presented. A two-sided gradient

$$z_{k1} < z_{k2} < \cdots < z_{kr} > z_{k,r+1} > \cdots > z_{km} \tag{27}$$
encoded by a node $v_k^{(3)}$ that is activated during the midst of the list presentation can then be understood as a combination of these effects.

Let node $v_1^{(3)}$ start sending out a sampling signal $E_1$ at about the time that $r_1$ is being presented. After rapidly reaching peak size, the signal $E_1$ gradually decays through time with its STM trace $x_1^{(3)}$ as future list items $r_2, r_3, \ldots$ are presented. Thus $E_1$ is largest when STM trace $x_1^{(2)}$ is maximal, smaller when both traces $x_1^{(2)}$ and $x_2^{(2)}$ are active, smaller still when traces $x_1^{(2)}$, $x_2^{(2)}$, and $x_3^{(2)}$ are active, and so on. Consequently, the product $E_1 x_1^{(2)}$ in row 1 of Figure 9b exceeds the product $E_1 x_2^{(2)}$ in row 2 of Figure 9b, which in turn exceeds the product $E_1 x_3^{(2)}$ in row 3, and so on. Due to the slow decay of each LTM trace $z_{1i}$ on each learning trial, $z_{11}$ adds up the products $E_1 x_1^{(2)}$ in successive rows of column 1, $z_{12}$ adds up the products $E_1 x_2^{(2)}$ in successive rows of column 2, and so on. An LTM primacy gradient (equation (24)) is thus generated due to the way in which $E_1$ samples the successive STM recency gradients, and to the fact that the LTM traces $z_{1i}$ add up the sampled STM gradients $E_1 x_i^{(2)}$.

By contrast, the sampling signal $E_n$ emitted by node $v_n^{(3)}$ samples a different set of STM gradients because $v_n^{(3)}$ starts to sample only after all the item representations $v_1^{(2)}, v_2^{(2)}, \ldots, v_m^{(2)}$ have already been activated on a given learning trial. Consequently, when the sampling signal $E_n$ does turn on, it sees the already active STM recency gradient

$$x_1^{(2)} < x_2^{(2)} < \cdots < x_m^{(2)} \tag{28}$$

of the entire list. Moreover, the ordering (28) persists for a while because no new items are presented until the next learning trial. Thus signal $E_n$ samples an STM recency gradient at every time. When all sampled recency gradients are added up through time, they generate a recency gradient (equation (26)) in the LTM traces of $v_n^{(3)}$. In summary, command nodes that are activated at the beginning, middle, or end of a list encode different LTM gradients because they multiplicatively sample STM patterns at different times and summate these products through time.
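The argument of this section can be checked numerically with a small Python sketch. It rests on assumptions that are not in the text: STM traces jump to 1 at item onset and decay exponentially (the passive decay option), the sampling signal is zero before its onset and decays exponentially afterwards, LTM decay is ignored, and all names and parameter values (`ltm_gradient`, `stm_decay`, `sampling_decay`, and so on) are hypothetical. The LTM trace of each item simply accumulates the product of the sampling signal and that item's STM trace.

```python
import numpy as np

def ltm_gradient(sampling_onset, n_items=5, item_interval=1.0,
                 stm_decay=0.5, sampling_decay=0.5, t_end=15.0, dt=0.01):
    """LTM traces learned by a command node that starts sampling at `sampling_onset`."""
    item_onsets = item_interval * np.arange(n_items)
    z = np.zeros(n_items)
    for t in np.arange(0.0, t_end, dt):
        # STM: each presented item decays passively, so the most recent
        # item always has the largest trace (a recency gradient).
        elapsed = np.maximum(t - item_onsets, 0.0)
        x = np.where(t >= item_onsets, np.exp(-stm_decay * elapsed), 0.0)
        # Sampling signal: silent before onset, then decaying.
        if t >= sampling_onset:
            E = np.exp(-sampling_decay * (t - sampling_onset))
        else:
            E = 0.0
        z += dt * E * x   # multiplicative sampling, additive storage
    return z

early = ltm_gradient(sampling_onset=0.0)    # node active from the first item
middle = ltm_gradient(sampling_onset=2.0)   # node activated mid-list
late = ltm_gradient(sampling_onset=5.0)     # node activated after the list
```

Under these assumptions `early` decreases across list positions (a primacy gradient), `late` increases (a recency gradient), and `middle` peaks at the item being presented when sampling begins (a two-sided gradient), even though the STM pattern is a recency gradient at every instant.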
"A3).
16. Rhythm Generators and Rehearsal Waves

The previous discussion forces two refinements in our ideas about how nonspecific arousal is regulated. In a context-modulated avalanche, the nonspecific arousal node $v_1^{(3)}$ both selects the set of nodes $v_i^{(2)}$ that it will control and continuously modulates performance velocity across these nodes. A command node that reads out temporal order information as in Figure 9a can no longer fulfill both roles. Increasing or decreasing the
command node’s activity while it reads out its LTM pattern proportionally amplifies the STM of all its item representations. Arbitrary performance rhythms are no longer attainable, because the relative reaction times of individual item representations are constrained by the pattern of STM order information. Nor is a sustained but continuously modulated supraliminal read-out from the command node permissible, because item representations that were already performed could then be reexcited, leading to a serious perseveration problem. Thus if a nonspecific arousal source dedicated to rhythmic control is desired, it must be distinguished from the planning nodes. Only then can order information and rhythm information remain decoupled. The reader should not confuse this idea of rhythm with the performance timing that occurs when item representations are read out as fast as possible (Sternberg, Monsell, Knoll, and Wright, 1978; Sternberg, Wright, Knoll, and Monsell, 1980). Properties of such a performance can, in fact, be inferred from the mechanism for read-out of temporal order information per se (Section 47).

Another type of nonspecific arousal is also needed. If read-out of LTM order information is achieved by activating the item representations across $F^{(2)}$, what prevents these item representations from being uncontrollably rehearsed, and thereby self-inhibited, while the list is being presented? To prevent this from happening, it is necessary to distinguish between STM activation of an item representation and output signal generation by an active item representation. This distinction is mechanized by assuming the existence of a nonspecific rehearsal wave capable of shunting the output pathways of the item representations. When the rehearsal wave is off, the item representations can blithely reverberate their order information in STM without generating self-destructive inhibitory feedback. Only when the rehearsal wave turns on does the read-out of order information begin.

The distinction between STM storage and rehearsal has major implications for which planning nodes in $F^{(3)}$ will be activated and what they will learn. This is due to two facts working together: The rehearsal wave can determine which item subsequences will be active at any moment by rehearsing, and thereby inhibiting, one group of item representations before the next group of items is presented. Each active subsequence of item representations can, in turn, chunk its own planning node. The rehearsal wave thus mediates a subtle interaction between the item sequences that occur and the chunks that form to control future performance (Section 37).
17. Shunting Competitive Dynamics in Pattern Processing and STM: Automatic Self-Tuning by Parallel Interactions

This analysis of associative mechanisms suggests that the unit of LTM is a spatial pattern. This result raises the question of how cellular tissues can accurately register input patterns in STM so that LTM mechanisms may encode them. This is a critical issue in cells because the range over which cell potentials, or STM traces, can fluctuate is finite and often narrow compared to the range over which cellular inputs can fluctuate. What prevents cells from routinely turning on all their excitable sites in response to intense input patterns, thereby becoming desensitized by saturation before they can even register the patterns to be learned? Furthermore, if small input patterns are chosen to avoid saturation, what prevents the internal noise of the cells from distorting pattern registration? This noise-saturation dilemma shows that cells are caught between two potentially devastating extremes. How do they achieve a golden mean of sensitivity that balances between these extremes?

I have shown (Grossberg, 1973) that mass action competitive networks can automatically retune their sensitivity as inputs fluctuate to register input differences without being desensitized by either noise or saturation. In a neural context, these systems are called shunting on-center off-surround networks. The shunting, or mass action, dynamics are obeyed by the familiar membrane equations of neurophysiology; the automatic
retuning is due to automatic gain control by the inhibitory signals. The fixed operating range of cells should not be viewed as an unmitigated disadvantage. By fixing their operating range once and for all, cells can also define fixed output threshold and other decision criteria with respect to this operating range. By maintaining sensitivity within this operating range despite fluctuations in total input load, cells can achieve an impressive parallel processing capability. Even if parallel input sources to the cells switch on or off unpredictably through time, thereby changing the total input to each cell, the automatic gain control mechanism can recalibrate the operating level of total STM activity to bring it into the range of the cells’ fixed decision criteria. Additive models, by contrast, do not have this capability. These properties are mathematically described in Grossberg (1983, Sections 21-23).

Because the need to accurately register input patterns by cells is ubiquitous in the nervous system, competitive interactions are found at all levels of neural interactions and of my models thereof. A great deal of what is called “information processing” in other approaches to intelligence reduces in the present approach to a study of how to design a competitive, or close-to-competitive, network to carry out a particular class of computations. Several types of specialized competitive networks will be needed. As I mentioned in Section 1, the class of competitive systems includes examples which exhibit arbitrary dynamical behavior. Computer simulations that yield an interesting phenomenon without attempting to characterize which competitive parameters control the phenomenon teach us very little, because a small adjustment of parameters could, in principle, generate the opposite phenomenon. To quantitatively classify the parameters that control biologically important competitive networks is therefore a major problem for theorists of mind. Grossberg (1981a, Sections 10-27) and Cohen and Grossberg (1983) review some results of this ongoing classification.
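The automatic gain control property can be made concrete with a small Python sketch. It assumes the feedforward shunting on-center off-surround equation $dx_i/dt = -Ax_i + (B - x_i)I_i - x_i \sum_{k \neq i} I_k$, which is not written out in this section and is used here only as an illustrative special case; setting the derivative to zero gives the steady state $x_i = BI_i/(A + \sum_k I_k)$. The clipped additive unit used for contrast, the function names, and the parameter values are likewise hypothetical.

```python
import numpy as np

def shunting_steady_state(inputs, A=1.0, B=1.0):
    """Steady state of dx_i/dt = -A*x_i + (B - x_i)*I_i - x_i*sum_{k != i} I_k."""
    I = np.asarray(inputs, dtype=float)
    return B * I / (A + I.sum())

def additive_steady_state(inputs, A=1.0, B=1.0):
    """Steady state of dx_i/dt = -A*x_i + I_i, clipped to the finite range [0, B]."""
    I = np.asarray(inputs, dtype=float)
    return np.clip(I / A, 0.0, B)

pattern = np.array([0.1, 0.3, 0.6])      # relative input sizes
for intensity in (1.0, 10.0, 100.0):
    x = shunting_steady_state(intensity * pattern)
    y = additive_steady_state(intensity * pattern)
    print(intensity, np.round(x / x.sum(), 3), np.round(y, 3))
```

In this sketch the shunting activities keep the relative sizes of the inputs at every intensity while their total stays below `B`, whereas the clipped additive unit saturates once the inputs exceed its operating range.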
18. Choice, Contrast Enhancement, Limited STM Capacity, and Quenching Threshold

Some of the properties that I use can be illustrated by the simplest type of competitive feedback network:

$$\frac{d}{dt} x_i = -A x_i + (B - x_i)[I_i + f(x_i)] - x_i \Big[J_i + \sum_{k \neq i} f(x_k)\Big], \tag{29}$$

where $i = 1, 2, \ldots, n$. In equation (29), term $-Ax_i$ describes the passive decay of the STM trace $x_i$ at rate $-A$. The excitatory term $(B - x_i)[I_i + f(x_i)]$ describes how an excitatory input $I_i$ and an excitatory feedback signal $f(x_i)$ from $v_i$ to itself excites by mass action the unexcited sites $(B - x_i)$ of the total number of sites $B$ at each node $v_i$. The inhibitory term $-x_i[J_i + \sum_{k \neq i} f(x_k)]$ describes how the inhibitory input $J_i$ and the inhibitory, or competitive, feedback signals $f(x_k)$ from all $v_k$, $k \neq i$, turn off the $x_i$ excited sites of $v_i$ by mass action.

Equation (29) strips away all extraneous factors to focus on the following issue. How does the choice of the feedback signal function $f(w)$ influence the transformation and storage of input patterns in STM? To discuss this question, I assume that inputs $(I_1, I_2, \ldots, I_n, J_1, J_2, \ldots, J_n)$ are delivered before time $t = 0$ and switch off at time $t = 0$ after having instated an initial pattern $x(0) = (x_1(0), x_2(0), \ldots, x_n(0))$ in the network’s STM traces. Our task is to understand how the choice of $f(w)$ influences the transformation of $x(0)$ into the stored pattern $x(\infty) = (x_1(\infty), x_2(\infty), \ldots, x_n(\infty))$ as time increases.

Figure 13 shows that different choices of $f(w)$ generate markedly different storage modes. The function $g(w) = w^{-1} f(w)$ is also graphed in Figure 13 because the property that determines the type of storage is whether $g(w)$ is an increasing, constant, or decreasing function at prescribed values of the activity $w$. For example, as in the
four rows of Figure 13, a linear $f(w) = aw$ generates a constant $g(w) = a$; a slower-than-linear $f(w) = aw(b + w)^{-1}$ generates a decreasing $g(w) = a(b + w)^{-1}$; a faster-than-linear $f(w) = aw^n$, $n > 1$, generates an increasing $g(w) = aw^{n-1}$; and a sigmoid $f(w) = aw^2(b + w^2)^{-1}$ generates a concave $g(w) = aw(b + w^2)^{-1}$. Both linear and slower-than-linear signal functions amplify noise. Even tiny activities are bootstrapped into large activities by the network’s positive feedback loops. This fact represents a serious challenge to linear feedback models (Grossberg, 1978d). A faster-than-linear signal function can tell the difference between small and large inputs by amplifying and storing only sufficiently large activities. Such a signal function amplifies the large activities so much more than the smaller activities that it makes a choice. Only the largest initial activity is stored in STM. A sigmoid signal function can also suppress noise, although it does so less vigorously than a faster-than-linear signal function. Consequently, activities less than a criterion level, or quenching threshold (QT), are suppressed, whereas the pattern of activities that exceeds the QT is contrast enhanced before being stored in STM. Any network that possesses a QT can be tuned. By increasing or decreasing the QT, the criterion of which activities represent functional signals (and hence should be processed and stored in STM) and of which activities represent functional noise (and hence should be suppressed) can be flexibly modified through time. An increase in the QT can cause all but the largest activities to be quenched. Thus the network can behave like a choice machine if its storage criteria are made sufficiently strict. A sudden decrease in the QT can cause all recently presented patterns to be stored.
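The dependence of storage on the signal function can be explored with a short Python simulation of equation (29) with all inputs switched off. It is a sketch under stated assumptions: the forward Euler scheme, the parameter values $A = 0.01$ and $B = 1$, the initial pattern, and the function name `store` are arbitrary choices made here for illustration, and the sigmoid case is omitted for brevity.

```python
import numpy as np

def store(f, x0, A=0.01, B=1.0, t_end=200.0, dt=0.01):
    """Euler-integrate equation (29) with all inputs off and return x(t_end)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(int(round(t_end / dt))):
        s = f(x)
        others = s.sum() - s          # sum of f(x_k) over k != i
        x += dt * (-A * x + (B - x) * s - x * others)
    return x

x0 = [0.1, 0.2, 0.3]                             # hypothetical initial STM pattern
linear = store(lambda w: w, x0)                  # constant g(w)
slower = store(lambda w: w / (1.0 + w), x0)      # decreasing g(w)
faster = store(lambda w: w ** 2, x0)             # increasing g(w)
```

With these choices the linear signal preserves the relative sizes of the initial activities, the slower-than-linear signal drives all activities toward a common value, and the faster-than-linear signal stores only the largest initial activity, in line with the storage modes listed above.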
If a novel or unexpected event suddenly decreases the QT, all relevant data can be stored in STM until the cause of the unexpected event is learned (Grossberg, 1975, 1982b). It cannot be overemphasized that the existence of the QT and its desirable tuning properties all follow from the use of a nonlinear signal function.

To illustrate the QT concept concretely, consider a sigmoid signal function $f(w)$ that is faster-than-linear for $0 \leq w \leq x^{(1)}$ and linear for $x^{(1)} \leq w \leq B$. The slower-than-linear part of $f(w)$ does not affect network dynamics because each $x_i \leq B$ by equation (29). More precisely, let $f(w) = Cwg(w)$ where $C \geq 0$, $g(w)$ is increasing for $0 \leq w \leq x^{(1)}$, and $g(w) = 1$ for $x^{(1)} \leq w \leq B$. Grossberg (1973, pp. 355-359) has demonstrated that the QT of this network is