ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOLUME 152

Edited by
PETER W. HAWKES, CEMES-CNRS, Toulouse, France

Honorary Associate Editors
TOM MULVEY and BENJAMIN KAZAN

Academic Press is an imprint of Elsevier
Academic Press is an imprint of Elsevier
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
84 Theobald's Road, London WC1X 8RR, UK

This book is printed on acid-free paper.

Copyright © 2008, Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

The appearance of the code at the bottom of the first page of a chapter in this book indicates the publisher's consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per-copy fee through the Copyright Clearance Center, Inc. (www.copyright.com), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2008 chapters are as shown on the title pages. If no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/2008 $35.00

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting "Support & Contact," then "Copyright and Permission," and then "Obtaining Permissions."

For information on all Elsevier Academic Press publications visit our Web site at www.books.elsevier.com

ISBN-13: 978-0-12-374219-3

Printed in the United States of America
08 09 10 11   9 8 7 6 5 4 3 2 1
CONTENTS

Preface
Contributors
Future Contributions

1. Stack Filters: From Definition to Design Algorithms
Nina S. T. Hirata
I. Introduction
II. Stack Filters
III. Optimal Stack Filters
IV. Stack Filter Design Approaches
V. Application Examples
VI. Conclusion
Acknowledgments
References

2. The Foldy–Wouthuysen Transformation Technique in Optics
Sameen Ahmed Khan
I. Introduction
II. The Foldy–Wouthuysen Transformation
III. Quantum Formalism of Charged-Particle Beam Optics
IV. Quantum Methodologies in Light Beam Optics
V. Conclusion
Appendix A
Appendix B
Acknowledgments
References

3. Nonlinear Systems for Image Processing
Saverio Morfu, Patrick Marquié, Brice Nofiélé, and Dominique Ginhac
I. Introduction
II. Mechanical Analogy
III. Inertial Systems
IV. Reaction–Diffusion Systems
V. Conclusion
VI. Outlooks
Acknowledgments
Appendix A
Appendix B
Appendix C
Appendix D
References

4. Complex-Valued Neural Network and Complex-Valued Backpropagation Learning Algorithm
Tohru Nitta
I. Introduction
II. The Complex-Valued Neural Network
III. Complex-Valued Backpropagation Learning Algorithm
IV. Learning Speed
V. Generalization Ability
VI. Transforming Geometric Figures
VII. Orthogonality of Decision Boundaries in the Complex-Valued Neuron
VIII. Conclusions
References

5. Blind Source Separation: The Sparsity Revolution
J. Bobin, J.-L. Starck, Y. Moudden, and M. J. Fadili
I. Introduction
II. Blind Source Separation: A Strenuous Inverse Problem
III. Sparse Multichannel Signal Representation
IV. Morphological Component Analysis for Multichannel Data
V. Morphological Diversity and Blind Source Separation
VI. Dealing With Hyperspectral Data
VII. Applications
VIII. Conclusion
References

6. "Disorder": Structured Diffuse Scattering and Local Crystal Chemistry
Ray L. Withers
I. Introduction
II. The Modulation Wave Approach
III. Applications of the Modulation Wave Approach
IV. Selected Case Studies
V. Conclusions
Acknowledgments
References

Contents of Volume 151
Index
Corrigendum
Color Plate Section
PREFACE
Six chapters make up this new volume, with contributions on electron microscopy, neural networks, stack filters, blind source separation and, a very novel topic, the Foldy–Wouthuysen transformation in optics.

Stack filters, of which median filters are the best known in practice, have a large literature, some abstrusely mathematical, some experimental. N. S. T. Hirata takes us systematically through the subject, with sections on the relation between these and morphological filters, the design of optimal filters, and examples of such designs. This very clear and methodical account will, I am sure, be found helpful.

This is followed by an account of the Foldy–Wouthuysen transformation as applied to optics by S. A. Khan, who has already contributed to these Advances with R. Jagannathan on a related subject, the study of electron optics via quantum mechanics. First, the transformation is described and the necessary mathematics recapitulated. The quantum approach to charged-particle optics is then introduced, and the chapter concludes with an examination of the same approach in connection with light optics. I am delighted to include here this very novel work, which sheds new light on the foundations of electron wave optics.

The third chapter likewise deals with a very novel theme, the role of nonlinear systems and tools in image processing. Here, S. Morfu, P. Marquié, B. Nofiélé, and D. Ginhac explain how nonlinearity extends the types of processing of interest and discuss in detail their implementation on cellular neural networks. Many of these ideas were completely new to me, and I hope that readers will find them as stimulating as I did.

The values of the elements of neural networks need not be real, as T. Nitta explains in a chapter on complex-valued networks. After an introduction to the notion of a complex-valued neuron, T. Nitta introduces complex-valued back-propagation learning algorithms and then considers many practical aspects of the procedure in a long and lucid presentation.

The following chapter brings us back to one of the perennial problems of image processing, source separation in the absence of any detailed information about the system response. J. Bobin, J.-L. Starck, Y. Moudden, and M. J. Fadili give an account of the progress that is being made thanks to sparsity and morphological diversity. All aspects of the method are presented in detail, and this long chapter, too, is itself a short monograph on the topic.
The volume concludes with a contribution by R. L. Withers on the problem of imaging disordered, or rather, locally ordered crystal phases, which generate highly structured diffuse intensity distributions around the strong Bragg reflections of the average structure. This clear analysis of a complex subject will certainly be frequently consulted.

All these contributions contain much novel or very recent material and I am extremely pleased to include such studies in these Advances. The authors are warmly thanked for taking so much trouble to make this work accessible to a wide audience.

I am delighted to report that the whole series of Advances has now been made available by Elsevier on their ScienceDirect database, right back to volume I, when the title was Advances in Electronics and the editor was the late Ladislaus (Bill) Marton.

Peter W. Hawkes
CONTRIBUTORS
Tohru Nitta
National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan

Ray L. Withers
Research School of Chemistry, Australian National University, Canberra, A.C.T. 0200, Australia

S. Morfu, P. Marquié, B. Nofiélé and D. Ginhac
Laboratoire LE2I UMR 5158, Aile des sciences de l'ingénieur, BP 47870, 21078 Dijon Cedex, France

Sameen Ahmed Khan
Engineering Department, Salalah College of Technology, Post Box No. 608, 211 Salalah, Sultanate of Oman

Nina S. T. Hirata
Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010, 05508-090 São Paulo, SP, Brazil

J. Bobin, J.-L. Starck and Y. Moudden
Laboratoire AIM, CEA/DSM-CNRS-Université Paris Diderot, CEA Saclay, IRFU/SEDI-SAP, Service d'Astrophysique, Orme des Merisiers, 91191 Gif-sur-Yvette, France

M. J. Fadili
GREYC CNRS UMR 6072, Image Processing Group, ENSICAEN, 14050 Caen Cedex, France
FUTURE CONTRIBUTIONS
S. Ando
Gradient operators and edge and corner detection

W. Bacsa
Optical interference near surfaces, sub-wavelength microscopy and spectroscopic sensors

P. E. Batson (vol. 153)
First results using the Nion third-order STEM corrector

C. Beeli
Structure and microscopy of quasicrystals

A. B. Bleloch (vol. 153)
STEM and EELS: mapping materials atom by atom

C. Bobisch and R. Möller
Ballistic electron microscopy

G. Borgefors
Distance transforms

Z. Bouchal
Non-diffracting optical beams

F. Brackx, N. de Schepper and F. Sommen
The Fourier transform in Clifford analysis

A. Buchau
Boundary element or integral equation methods for static and time-dependent problems

B. Buchberger
Gröbner bases

T. Cremer
Neutron microscopy

N. de Jonge and E. C. Heeres
Electron emission from carbon nanotubes

A. X. Falcão
The image foresting transform

R. G. Forbes
Liquid metal ion sources

B. J. Ford
The earliest microscopical research

C. Fredembach
Eigenregions for image classification
A. Gölzhäuser
Recent advances in electron holography with point sources

D. Greenfield and M. Monastyrskii (vol. 155)
Selected problems of computational charged particle optics

M. Haider, H. Müller and S. Uhlemann (vol. 153)
Present and future hexapole correctors for high resolution electron microscopes

H. F. Harmuth and B. Meffert (vol. 154)
Dirac's difference equation and the physics of finite differences

M. I. Herrera
The development of electron microscopy in Spain

F. Houdellier, M. Hÿtch, F. Hüe and E. Snoeck (vol. 153)
Aberration correction with the SACTEM–Toulouse: from imaging to diffraction

J. Isenberg
Imaging IR-techniques for the characterization of solar cells

K. Ishizuka
Contrast transfer and crystal images

A. Jacobo
Intracavity type II second-harmonic generation for image processing

B. Kabius and H. Rose (vol. 153)
Novel aberration correction concepts

L. Kipp
Photon sieves

A. Kirkland, P. D. Nellist, L.-Y. Chang and S. J. Haigh (vol. 153)
Aberration-corrected imaging in CTEM and STEM

G. Kögel
Positron microscopy

T. Kohashi
Spin-polarized scanning electron microscopy

O. L. Krivanek, N. Dellby, R. J. Keyse, M. F. Murfitt, C. S. Own and Z. S. Szilagyi (vol. 153)
Aberration correction and STEM

R. Leitgeb
Fourier domain and time domain optical coherence tomography

B. Lencová
Modern developments in electron optical calculations

H. Lichte
New developments in electron holography

M. Matsuya
Calculation of aberration coefficients using Lie algebra
S. McVitie
Microscopy of magnetic specimens

P. G. Merli and V. Morandi
Scanning electron microscopy of thin films

M. A. O'Keefe
Electron image simulation

D. Oulton and H. Owens
Colorimetric imaging

N. Papamarkos and A. Kesidis
The inverse Hough transform

K. S. Pedersen, A. Lee and M. Nielsen
The scale-space properties of natural images

S. J. Pennycook (vol. 153)
Some applications of aberration-corrected electron microscopy

E. Rau
Energy analysers for electron microscopes

E. Recami
Superluminal solutions to wave equations

H. Rose (vol. 153)
History of direct aberration correction

G. Schmahl
X-ray microscopy

R. Shimizu, T. Ikuta and Y. Takai
Defocus image modulation processing in real time

S. Shirai
CRT gun design methods

T. Soma
Focus-deflection systems and their applications

I. Talmon
Study of complex fluids by transmission electron microscopy

N. Tanaka (vol. 153)
Aberration-corrected microscopy in Japan

M. E. Testorf and M. Fiddy
Imaging from scattered electromagnetic fields, investigations into an unsolved problem

N. M. Towghi
lp norm optimal filters

E. Twerdowski
Defocused acoustic transmission microscopy
Y. Uchikawa
Electron gun optics

K. Urban (vol. 153)
Aberration correction in practice

K. Vaeth and G. Rajeswaran
Organic light-emitting arrays

M. van Droogenbroeck and M. Buckley
Anchors in mathematical morphology

M. Yavor
Optics of charged particle analysers

Y. Zhu and J. Wall (vol. 153)
Aberration-corrected electron microscopes at Brookhaven National Laboratory
CHAPTER 1

Stack Filters: From Definition to Design Algorithms

Nina S. T. Hirata*

Contents
I. Introduction
II. Stack Filters
   A. Preliminaries
   B. Stack Filters: Definition and Properties
   C. Subclasses of Stack Filters
   D. Relation to Morphological Filters
III. Optimal Stack Filters
   A. Mean Absolute Error Optimal Stack Filters
   B. Equivalent Optimality in the Binary Domain
   C. Formulation as a Linear Programming Problem
IV. Stack Filter Design Approaches
   A. Overview
   B. Heuristic Solutions
   C. Optimal Solution
V. Application Examples
   A. Design Procedure
   B. Examples
VI. Conclusion
Acknowledgments
References
I. INTRODUCTION

Many nonlinear filters such as the median, rank-order, order statistic, and morphological filters became known in the 1980s (Bovik et al., 1983; Brownrigg, 1984; Haralick et al., 1987; Heygster, 1982; Huang, 1981; Justusson, 1981; Lee and Kassam, 1985; Maragos and Schafer, 1987a, b; Pitas and Venetsanopoulos, 1990; Prasad and Lee, 1989; Serra, 1982, 1988; Serra and Vincent, 1992; Wendt et al., 1986). The state of the art in the area of nonlinear filters at the end of the 1980s is compiled in one of the

* Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, Rua do Matão, 1010, 05508-090 São Paulo, SP, Brazil

Advances in Imaging and Electron Physics, Volume 152, ISSN 1076-5670, DOI: 10.1016/S1076-5670(08)00601-0. Copyright © 2008 Elsevier Inc. All rights reserved.
first books on that subject (Pitas and Venetsanopoulos, 1990). Since then, several other books on nonlinear filters have been published (Dougherty and Astola, 1999; Heijmans, 1994; Marshall and Sicuranza, 2006; Mitra and Sicuranza, 2000; Soille, 2003). Many of the nonlinear filters are derived from order statistics (Pitas and Venetsanopoulos, 1992). Median filters are the best known among those based on order statistics, and they are the root of other classes of filters in the sense that many classes have been derived as generalizations of the median filter. Two findings played key roles in the development of new classes of nonlinear filters from median filters: (1) the "threshold decomposition structure" first observed in median filters (Fitch et al., 1984), which allows multilevel signal filtering to be reduced to binary signal filtering, and (2) the possibility of choosing an arbitrary rank element rather than the median as the output of the filter. The first finding led to the introduction of a general class known as stack filters (Wendt et al., 1986), the subject of this chapter, and the second to the development of rank-order (Heygster, 1982; Justusson, 1981; Nodes and Gallagher, 1982) and order statistic filters (Bovik et al., 1983). Both stack filters and order statistic filters include the median and the rank-order filters as particular cases. Median filters were initially considered an alternative to linear filters because they have, for instance, better edge-preservation capabilities. However, compared with stack filters, median filters applied to images tend to produce blurred results, destroying details. An example of the effects of median and stack filters is shown in Figure 1. The stack filter does not suppress all the noise as the median filter does; however, its output is much sharper than that of the median filter.
Another major class of nonlinear filters that became known around the same time is morphological filters (Haralick et al., 1987; Serra, 1982, 1988; Serra and Vincent, 1992). They include very popular filters such as openings and closings. Although developed independently, morphological operators are strongly related to stack filters. Maragos and Schafer (1987b) have shown the connections between stack filters and morphological operators. In fact, they have shown that stack filters correspond to increasing morphological operators with flat structuring elements. This chapter provides an overview of stack filters. The previous text briefly contextualizes stack filters within the scope of nonlinear filters. The remainder of this text is written to answer the following three questions:

1. What are stack filters?
2. How do stack filters relate to other classes of filters?
3. How can stack filters be designed from training data?

In order to answer these questions, this chapter presents an extensive account of stack filters, divided into four major sections. Section II introduces basic definitions and notations, followed by a definition of
FIGURE 1 From left to right: original, corrupted, median-filtered, and stack-filtered images.
stack filters and some of their properties, such as their equivalence to positive Boolean functions. The section also presents median and rank-order filters, and their generalizations, viewed as subclasses of the stack filters. Section II ends with a brief explanation of the relation between stack filters and morphological operators. Section III formally characterizes the notion of optimal stack filters in the context of statistical estimation. Optimality is considered with respect to the mean absolute error (MAE) criterion because there is a linear relation between the MAE of a stack filter (relative to multilevel signals) and the MAEs relative to its binary cross sections. The formulation, based on costs derived from the joint distribution of the processes corresponding to the images to be processed and the respective ideal output images, allows a clear delineation between the theoretical formulation of the design problem and the process of estimating costs from data. Section IV presents an overview of the main stack filter design approaches. In particular, heuristic algorithms that provide suboptimal solutions and a recent algorithm that provides an optimal solution are described. All these algorithms use training data to explicitly or implicitly estimate the costs involved in the theoretical formulation of the design problem. Section V presents examples of images processed by stack filters designed using the exact solution algorithm. The last
section highlights some important issues reported throughout the text and discusses some of the remaining challenges.
II. STACK FILTERS

A. Preliminaries

1. Signals and Operators
Formally, a digital signal defined on a certain domain E is a mapping f: E → K, where K = {0, 1, . . . , k}, with 0 < k ∈ N, is the set of intensities or gray levels.¹ Given a signal definition domain E and a set of levels K, the set of all signals defined on E with levels in K will be denoted K^E. In particular, if k = 1, the signals are binary and they can be equivalently represented by subsets of E. The set of all binary signals defined on E is denoted {0, 1}^E or, equivalently, P(E) (the collection of all subsets of E). If k > 1, then the signals are multilevel. The translation of a signal f ∈ K^E by q ∈ E is denoted f_q and defined, for any p ∈ E, by f_q(p) = f(p − q). Analogously, given a set X ⊆ E, its translation by q ∈ E, denoted X_q, is defined as X_q = {p + q | p ∈ X}. The transpose of X, denoted X̌, is defined as X̌ = {−p | p ∈ X}. Signal processing may be performed by operators of the form Ψ: K^E → K^E. Binary signal operators can also be represented by set operators, that is, mappings of the form Ψ: P(E) → P(E).
a. W-Operators. Of particular interest are operators that are locally defined. The notion of local definition is characterized by a neighborhood window in the following manner. Let W ⊆ E be a finite set, to be called a window. Usually, window W is a connected set of points in E, containing the origin of E. The origin of E will be denoted o. An operator Ψ: K^E → K^E is locally defined within W if, for any p ∈ E,

[Ψ(f)](p) = [Ψ(f|_{W_p})](p),   (1)

where f|_{W_p} corresponds to the input signal f restricted to W around p. This is equivalent to saying that, for any p ∈ E, there exists ψ_p: K^W → K such that

[Ψ(f)](p) = ψ_p(f_{−p}|_W),   (2)

where f_{−p}|_W is just to guarantee that the domain of function ψ_p is W.

¹ We consider E = Z for one-dimensional signals and E = Z² for two-dimensional signals (or images).
Operator Ψ is translation invariant if, for any p ∈ E,

[Ψ(f)]_p = Ψ(f_p),   (3)

that is, if applying the operator and then translating the output signal is equivalent to first translating the input signal and then applying the operator. An operator Ψ that is both translation invariant and locally defined within W can thus be characterized by a single function ψ: K^W → K (i.e., ψ_p = ψ for all p ∈ E). More precisely, the output of Ψ, for a given input signal f, at any location p ∈ E, is given by

[Ψ(f)](p) = ψ(f_{−p}|_W).   (4)

These operators will be called W-operators. The function ψ is called the characteristic function of Ψ. If Ψ is a binary operator (i.e., Ψ: {0, 1}^E → {0, 1}^E), then its characteristic function ψ can be seen as a Boolean function on d = |W| Boolean variables x_1, x_2, . . . , x_d. More specifically, supposing W = {w_1, w_2, . . . , w_d}, w_i ∈ E, i = 1, 2, . . . , d, for any f ∈ {0, 1}^E, ψ(f_{−p}|_W) corresponds to the Boolean function ψ evaluated at x_i = f_{−p}(w_i), i = 1, . . . , d.
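A binary W-operator is thus fully specified by a window and a characteristic function. The following minimal sketch (not from the text; the function names and the zero-padding border convention are my own) slides a window over a 1D binary signal and applies a Boolean ψ to each observed pattern:

```python
# Sketch of a binary W-operator on a 1D signal: slide the window W
# (given as offsets) and apply the characteristic function psi to the
# observed pattern. Out-of-range samples are taken as 0 here, a border
# convention chosen only for simplicity.

def w_operator(f, W, psi):
    """Apply the W-operator with window offsets W and Boolean function psi."""
    n = len(f)
    out = []
    for p in range(n):
        pattern = tuple(f[p + w] if 0 <= p + w < n else 0 for w in W)
        out.append(psi(pattern))
    return out

# Example: W = {-1, 0, 1} and psi = logical OR of the three variables
# (a dilation by a flat 3-point structuring element).
W = (-1, 0, 1)
psi_or = lambda x: int(any(x))
print(w_operator([0, 1, 0, 0, 0], W, psi_or))  # -> [1, 1, 1, 0, 0]
```

Translation invariance is built in because the same ψ is applied at every location p.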
b. Increasing Operators. For any f_1, f_2 ∈ K^E, f_1 ≤ f_2 ⇐⇒ f_1(p) ≤ f_2(p) for all p ∈ E. An operator Ψ is increasing if, for any f_1, f_2 ∈ K^E, f_1 ≤ f_2 implies Ψ(f_1) ≤ Ψ(f_2). In set notation, Ψ is increasing if, for any S_1, S_2 ⊆ E, S_1 ⊆ S_2 implies Ψ(S_1) ⊆ Ψ(S_2).

c. Diagram Representation of W-Operators. A visual representation of operators is useful to illustrate some concepts. Here the diagram representation of binary W-operators used henceforth is introduced. Given W, binary signals in {0, 1}^W can be represented as subsets of W or, equivalently, as elements of {0, 1}^d. Supposing W = {w_1, w_2, w_3}, the binary signal g ∈ {0, 1}^W with g(w_1) = 0, g(w_2) = 0, and g(w_3) = 0 corresponds to the element 000 ∈ {0, 1}^3, and so on. The set {0, 1}^d with the usual ≤ relation (i.e., for any u = (u_1, u_2, . . . , u_d), v = (v_1, v_2, . . . , v_d) ∈ {0, 1}^d, u ≤ v if and only if u_i ≤ v_i, i = 1, 2, . . . , d) is a partially ordered set. Together with the usual logical operations (OR +, AND ·, and NEGATION), it forms a Boolean lattice. Partially ordered sets can be depicted by Hasse diagrams. The diagram at the left side of Figure 2 corresponds to the representation of {0, 1}^3. Each element of the lattice is represented by a vertex, and two vertices corresponding to elements u and v, with u < v, are linked if and only if there is no other element w such that u < w < v. The diagram at the right side
FIGURE 2 Left: representation of {0, 1}^3. Right: representation of ψ: {0, 1}^3 → {0, 1} with ψ(111) = ψ(011) = ψ(101) = ψ(001) = 1 and ψ(110) = ψ(010) = ψ(100) = ψ(000) = 0.
corresponds to the representation of the function ψ: {0, 1}3 → {0, 1} with ψ(111) = ψ(011) = ψ(101) = ψ(001) = 1 and ψ(110) = ψ(010) = ψ(100) = ψ(000) = 0. Elements in {0, 1}3 mapped to 1 are depicted by solid circles, whereas those mapped to 0 are depicted by open circles. In particular, in this example ψ is increasing (i.e., if ψ(u) = 1, then ψ(v) = 1 for any v > u). If a W-operator is increasing, whenever an element is solid, all elements above it (according to the partial order relation) are necessarily solid in its Hasse diagram representation.
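Increasingness of a characteristic function can be checked mechanically by enumerating the lattice. The sketch below (the function name is mine) tests whether a Boolean function on {0, 1}^d is positive, and confirms it for the function of Figure 2, which outputs 1 exactly when the last component is 1:

```python
from itertools import product

# Sketch: test whether a Boolean function psi on {0,1}^d is increasing
# (positive), i.e., u <= v componentwise implies psi(u) <= psi(v).
# Brute-force enumeration of all pairs -- fine for small d.

def is_positive(psi, d):
    for u in product((0, 1), repeat=d):
        for v in product((0, 1), repeat=d):
            if all(a <= b for a, b in zip(u, v)) and psi(u) > psi(v):
                return False
    return True

# The function of Figure 2: psi(x1, x2, x3) = x3.
psi_fig2 = lambda x: x[2]
print(is_positive(psi_fig2, 3))        # -> True

# A non-increasing counterexample: psi = NOT x1.
print(is_positive(lambda x: 1 - x[0], 3))  # -> False
```

In the Hasse diagram, `is_positive` returning True corresponds exactly to the rule stated above: every element above a solid vertex is also solid.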
2. Thresholding and Stacking

a. Thresholding. The threshold of a value y ∈ K at a level t ∈ K is given by

T_t(y) = 1, if y ≥ t;   T_t(y) = 0, if y < t.   (5)

The mapping T_t from K^E to {0, 1}^E defined, for any f ∈ K^E and t ∈ K, by

(T_t[f])(p) = T_t(f(p)), p ∈ E,   (6)

is called the threshold of f at level t. The binary signal T_t[f] defines a subset of E, called the cross section of f at level t. Notice that T_t(·) denotes a single value, whereas T_t[·] denotes a signal.
b. Threshold Decomposition Structure. According to the threshold decomposition structure of a signal, any signal f ∈ K^E can be expressed as

f(p) = Σ_{t=1}^{k} T_t(f(p)), p ∈ E.   (7)
FIGURE 3 Threshold decomposition structure of a 1D signal (gray-level signal and its cross sections at threshold levels 1, 2, and 3, which add back to the original).
Figure 3 shows a one-dimensional (1D) signal of length 11 with k = 3 and its threshold decomposition structure. By summing the cross sections, the original multilevel signal can be retrieved.
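Equation (7) is easy to verify numerically: thresholding at each level t = 1, . . . , k and summing the resulting cross sections recovers the signal. A small sketch (variable and function names are mine):

```python
# Sketch of Eq. (7): decompose a multilevel signal into its k binary
# cross sections and recover it by summation (stacking).

def cross_sections(f, k):
    """Return the binary cross sections T_t[f] for t = 1, ..., k."""
    return [[1 if v >= t else 0 for v in f] for t in range(1, k + 1)]

f = [1, 3, 0, 2, 3]               # signal with levels in {0, ..., 3}, k = 3
sections = cross_sections(f, 3)
recovered = [sum(col) for col in zip(*sections)]
print(recovered)                  # -> [1, 3, 0, 2, 3]
```

Note that the cross sections are nested: each level-t section is pointwise greater than or equal to the level-(t+1) section, which is the "stack" in stack filters.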
c. Operators That Commute With Thresholding. Hereafter, operators Ψ: K^E → K^E are assumed to satisfy Ψ({0, 1}^E) ⊆ {0, 1}^E (that is, all binary signals are mapped to binary signals).

Definition 1. Let Ψ: K^E → K^E. Then Ψ commutes with the threshold operation if and only if T_t[Ψ(f)] = Ψ(T_t[f]) for all t ∈ K and f ∈ K^E.

In other words, applying Ψ to a signal f and then thresholding the resulting signal at any level t yields a binary signal that is exactly the same as the one obtained by first thresholding f at level t and then applying Ψ.

Theorem 1 (Maragos and Schafer, 1987a). An operator Ψ commutes with thresholding if and only if it is increasing.

If Ψ commutes with thresholding, it is an immediate consequence that its characteristic function ψ also commutes with thresholding (i.e., T_t(ψ(g)) = ψ(T_t[g]) for all g ∈ K^W). Moreover, it is not difficult to see that ψ is also increasing.
B. Stack Filters: Definition and Properties

Before defining stack filters, it is expedient to understand median filters and their threshold decomposition structure. Median filters are parameterized by a sliding window of odd size d. For each location, the output is the median of the d observations under the window. Figure 4 shows the characteristic function of the 3-point window binary median filter. For each element in {0, 1}^3, the output is 1 only if at least two components have value 1. The threshold decomposition structure of median filters (Fitch et al., 1984) is illustrated in Figure 5. At the top left is the input signal, and at the top right is the median-filtered signal. In the lower part, at the left side are
FIGURE 4 Three-point width median filter. Output for an element in {0, 1}^3 is 1 (solid circles) if and only if at least two components have value 1.
FIGURE 5 Threshold decomposition structure of median filters (the input is thresholded, each cross section is filtered by a binary median, and the outputs are stacked).
the five binary signals obtained by thresholding the input signal; the right side shows the respective median filtered signals. Their addition equals the output signal. The fact that median filters possess the threshold decomposition structure implies that the median filtered output for a multilevel signal can be obtained as a sum of the (binary) median filtered outputs of its cross sections. Note that the median of binary observations can be computed based only on counting (see details later). Hence, sorting of d observations required by multilevel median filters may be avoided. From a practical point of view, simplicity of the counting circuitry over the sorting circuitry was an important issue for hardware implementation and it has prompted investigations to find other filters with the same decomposition structure. These investigations led to the introduction of the stack filters. Stack filters are commonly defined as the filters that “obey a weak superposition property known as the threshold decomposition and an ordering property known as the stacking property” (Coyle and Lin, 1988; Wendt et al., 1986).
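The decomposition of Figure 5 is easy to verify numerically. In the sketch below (the helper names are my own; borders are handled by keeping only positions with a full window), the binary median reduces to a majority count, and the sum of the filtered cross sections reproduces the multilevel 3-point median:

```python
from statistics import median

# Sketch of the threshold decomposition of the median (Figure 5):
# the multilevel median equals the sum of binary medians of the
# cross sections, and the binary median needs only counting.

def binary_median3(b):
    # Median of 3 binary samples = majority vote.
    return [1 if b[i - 1] + b[i] + b[i + 1] >= 2 else 0
            for i in range(1, len(b) - 1)]

def multilevel_median3(f):
    # Direct multilevel median over a 3-point sliding window.
    return [int(median(f[i - 1:i + 2])) for i in range(1, len(f) - 1)]

f = [2, 0, 3, 1, 2, 2, 0]        # levels in {0, ..., 3}, so k = 3
k = 3
sections = [[1 if v >= t else 0 for v in f] for t in range(1, k + 1)]
via_stack = [sum(col) for col in zip(*(binary_median3(s) for s in sections))]
print(via_stack == multilevel_median3(f))  # -> True
```

The binary branch uses only additions and a comparison, which is precisely the counting-versus-sorting advantage mentioned above.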
An operator Ψ, characterized by function ψ, obeys the threshold decomposition property if

[Ψ(f)](p) = Σ_{t=1}^{k} ψ(T_t[f_{−p}|_W]),   (8)

and it obeys the stacking property if, for all 1 ≤ t < k,

Ψ(T_t[f]) ≥ Ψ(T_{t+1}[f]).   (9)
Notice that the stacking property is nothing more than increasingness. It is possible to obtain different filters by considering different binary (Boolean) functions at the right side of Eq. (8). Provided the chosen function obeys the stacking property, the resulting operator is a stack filter. Gilbert (1954) showed that the Boolean functions obeying the stacking property are the monotone (positive) ones (Coyle and Lin, 1988). Thus, a stack filter can be built simply by choosing a positive Boolean function (PBF) for the right side of Eq. (8).

Example 1. A 3-point width window binary median filter outputs 1 if and only if at least two components in the input have value 1. By assigning Boolean variables x_1, x_2, and x_3, respectively, to the three input components, the filter can be characterized by the Boolean function ψ(x_1, x_2, x_3) = x_1x_2 + x_1x_3 + x_2x_3. The + sign corresponds to the logical OR, while x_ix_j expresses the logical AND of x_i and x_j. The corresponding multilevel median is obtained by replacing the logical AND by min and the logical OR by max. Thus, given v_1, v_2, v_3 ∈ K,

med(v_1, v_2, v_3) = max{min{v_1, v_2}, min{v_1, v_3}, min{v_2, v_3}}.

Another characterization of stack filters is as operators that commute with thresholding (see Definition 1). If Ψ commutes with thresholding, then it can be expressed as
[Ψ(f)](p) = Σ_{t=1}^{k} T_t([Ψ(f)](p))
          = Σ_{t=1}^{k} [Ψ(T_t[f])](p)
          = Σ_{t=1}^{k} ψ(T_t[f_{−p}|_W])
          = max{t ∈ K | ψ(T_t[f_{−p}|_W]) = 1}.   (10)
The first equality is simply the threshold decomposition of Ψ(f); the second one holds because Ψ commutes with thresholding; the third one rewrites the second in terms of the characteristic function; and, since ψ is increasing, if ψ(T_t[f_{−p}|_W]) = 1 for a given t, then ψ(T_{t′}[f_{−p}|_W]) = 1 for all t′ < t (and, equivalently, if ψ(T_t[f_{−p}|_W]) = 0 for a given t, then ψ(T_{t′}[f_{−p}|_W]) = 0 for all t′ > t), which implies the last equality. From Eq. (10) it follows that operators that commute with thresholding obey the threshold decomposition [Eq. (8)] and, from Theorem 1, it follows that they obey the stacking property [Eq. (9)]. Conversely, operators that obey the threshold decomposition and the stacking property commute with thresholding. To see that, let Ψ be an operator that obeys the threshold decomposition and the stacking property. From the threshold decomposition structure of Ψ(f) and Eq. (8),
Ψ(f) = Σ_{t=1}^{k} T_t[Ψ(f)] = Σ_{t=1}^{k} Ψ(T_t[f]).   (11)
Moreover, T_1[Ψ(f)] ≥ T_2[Ψ(f)] ≥ . . . ≥ T_k[Ψ(f)] because thresholding generates a non-increasing sequence of cross sections, and Ψ(T_1[f]) ≥ Ψ(T_2[f]) ≥ . . . ≥ Ψ(T_k[f]) because Ψ obeys the stacking property. Hence, it can be concluded that T_t[Ψ(f)] = Ψ(T_t[f]) for any t = 1, 2, . . . , k. In summary, stack filters can be characterized as those operators that (1) possess the threshold decomposition and the stacking properties, (2) correspond to positive Boolean functions when their domain is restricted to binary signals, or (3) commute with thresholding.
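The characterization via Eq. (10) gives a direct, if inefficient, implementation of a stack filter from any PBF: the output at a point is the largest level whose thresholded window pattern is mapped to 1. A sketch (the names and the zero-padding border convention are my own), using the median PBF of Example 1:

```python
# Sketch of Eq. (10): a stack filter realized from a positive Boolean
# function psi as the largest level t whose thresholded window pattern
# is mapped to 1 (max over the empty set taken as 0).

def stack_filter(f, k, W, psi):
    n = len(f)
    out = []
    for p in range(n):
        pattern = [f[p + w] if 0 <= p + w < n else 0 for w in W]
        levels = [t for t in range(1, k + 1)
                  if psi(tuple(1 if v >= t else 0 for v in pattern))]
        out.append(max(levels, default=0))
    return out

# PBF of the 3-point median: x1*x2 + x1*x3 + x2*x3.
psi_med = lambda x: int(x[0] * x[1] + x[0] * x[2] + x[1] * x[2] >= 1)
print(stack_filter([2, 0, 3, 1, 2], 3, (-1, 0, 1), psi_med))
# -> [0, 2, 1, 2, 1]
```

Because psi_med is positive, the list of levels passing the test is always a prefix 1, 2, . . . , t of K, which is why taking the maximum (or, equivalently, the sum over levels) is well defined.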
C. Subclasses of Stack Filters

The best-known subclass of the stack filters is the median filters. A natural extension of the median filters is the rank-order filters. These two classes of filters and their respective weighted versions are reviewed in this section.
1. Median Filters The use of the median as a filter was first proposed in the early 1970s by Tukey for time series analysis (Tukey, 1974, 1977). Median filters were soon extended to two-dimensional (2D) signals (images) (Pratt, 1978) and became very popular due to their simplicity, their ability to preserve edges better than linear filters (Pitas and Venetsanopoulos, 1990), and their effectiveness in removing impulse noise. Since then, several works on this subject have been published dealing with their properties (Gallagher and Wise, 1981; Nodes and Gallagher, 1982) or their applications (Narendra, 1978; Schmitt et al., 1984; Scollar et al., 1984; Tyan, 1982). However, it has been observed
Stack Filters: From Definition to Design Algorithms
that median filters may cause edge jitter (Bovik et al., 1987) or streaking (Bovik, 1987), may destroy fine details (Nieminen et al., 1987), and cannot be tuned to remove or retain some predefined set of feature types (Brownrigg, 1984). To overcome these drawbacks, one modification of median filters resulted in the class of weighted median filters (Justusson, 1981). They act basically in the same manner as median filters, except that weighted median filters assign a weight to each point in the sliding window; the median is then taken after duplicating each sample in the input by its corresponding weight. The use of the weighted median to filter particular structural patterns from images has been investigated by Brownrigg (1984). Figure 6 shows the action of the weighted median filter based on a 5-point window, with weights (1, 2, 3, 2, 1). Many other variations of the median filter (a compilation may be found in Pitas and Venetsanopoulos (1990)), as well as studies of their deterministic and statistical properties (Gallagher and Wise, 1981; Justusson, 1981; Ko and Lee, 1991; Nodes and Gallagher, 1982; Prasad and Lee, 1989; Sun et al., 1994; Tyan, 1982; Yin et al., 1996), have been reported. Applications of median filters reported in the literature include a varying set of problems in signal and image processing, such as the elimination of pitches in digital speech signals (Rabiner et al., 1975), correction of transmission errors in digital speech signals (Jayant, 1976), the correction of scanner noise by removing salt-and-pepper artifacts (Wecksung and Campbell, 1974), enhancement of edge gradients by elimination of spurious oscillations (Frieden, 1976), image enhancement (Huang, 1981; Loupas et al., 1987; Narendra, 1981; Pratt, 1978; Rosenfeld and Kak, 1982; Scollar et al., 1984), satellite image processing (Carla et al., 1986), and biological/biomedical image processing (Grochulski et al., 1985; Schmitt et al., 1984).
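The duplication step of the weighted median can be sketched concretely (an illustration; the sample signal is made up) for the configuration of Figure 6, a 5-point window with weights (1, 2, 3, 2, 1):

```python
def weighted_median(window, weights):
    # Duplicate each sample by its weight, then take the median of the
    # expanded sequence (total weight 9 here, so the middle element).
    expanded = []
    for v, w in zip(window, weights):
        expanded.extend([v] * w)          # duplication step
    expanded.sort()
    return expanded[len(expanded) // 2]

weights = (1, 2, 3, 2, 1)
signal = [2, 7, 1, 9, 4, 0, 8, 3]
filtered = [weighted_median(signal[i:i + 5], weights)
            for i in range(len(signal) - 4)]
print(filtered)
```

Note that an isolated impulse is removed even when it sits under the largest weight, e.g., `weighted_median([0, 0, 9, 0, 0], (1, 2, 3, 2, 1))` returns 0.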
FIGURE 6 Weighted median filter: window of size 5 and weight vector (1, 2, 3, 2, 1).

As mentioned previously, median filters possess the threshold decomposition structure (Fitch et al., 1984). Figure 7 shows the equivalence of computing the median on multilevel data or on their cross sections. The shaded five columns in the signal correspond to the 5-point neighborhood considered for the median computation. As mentioned, the median of multilevel data requires sorting (Figure 7a), whereas the median of binary values requires only counting (Figure 7b).
2. Rank-Order Filters A straightforward extension of the median filters is the rank-order filter (Justusson, 1981; Heygster, 1982; Nodes and Gallagher, 1982), based on order statistics. Given realizations u1, u2, …, ud of d random variables, with d ∈ N, d > 0, and r ∈ N, 1 ≤ r ≤ d, the r-th smallest² element in the samples
FIGURE 7 The threshold decomposition structure guarantees computation of the median based only on counting (no sorting is needed). (a) Median computation via sorting. (b) Median computation via counting (threshold, count, and test whether the count reaches 3).
² Instead of the r-th smallest element, a common practice is to consider the r-th largest element. This issue is clarified later.
u1, u2, …, ud is called the r-th order statistic and is denoted u(r). Thus, u(1) ≤ u(2) ≤ … ≤ u(d). If d is odd, then u((d+1)/2) is the median. The order statistics u(1) and u(d) are, respectively, the minimum and the maximum. By assigning a random variable to each point in the window and positioning it at any location of the input signal domain, the values of the signal under the window can be seen as realizations of those random variables. The rank-order filters are those filters that, instead of the median, output the r-th order statistic among the observations, 1 ≤ r ≤ d. This class includes the median filter, r = (d + 1)/2, as a particular case. Applications of rank-order filters include filtering of cell pictures (Heygster, 1982), detection of narrow-band signals (Wong and Chen, 1987), and document image analysis (Ma et al., 2002). The weighted versions of the rank-order filters are termed generalized rank-order filters (Wendt et al., 1986) and weighted order statistic (WOS) filters (Yli-Harja et al., 1991). They work as follows: let u1, u2, …, ud be realizations of d random variables, d ∈ N, d > 0, let ω = (ω1, ω2, …, ωd), ωi ∈ N and ωi > 0 for all i, and let r ∈ N, 1 ≤ r ≤ ∑i ωi. Each sample ui is duplicated by its respective weight ωi to obtain a sequence of ∑i ωi elements. The filter that outputs the element of rank r from this sequence is the WOS filter with weight vector ω and rank r. Note that the term order statistic filters is more commonly used to refer to filters defined by y = ∑_{j=1}^{d} aj u(j), where the aj are real coefficients. They are also known as L-filters. They generalize the rank-order, moving average, and other filters (see Bovik et al., 1983; Pitas and Venetsanopoulos, 1992). The two basic differences of these filters from WOS filters are: (1) WOS filters first duplicate each observation by the corresponding weight and then compute the order statistics, whereas order statistic filters do the inverse, and (2) the weights of WOS filters are positive integers, whereas the coefficients of order statistic filters are real numbers. Figure 8 shows an example of a WOS filter. Duplication by a given weight vector can be understood as a mapping to a space of larger dimension. If all elements of the same Hamming weight are depicted horizontally side by side, then a WOS filter corresponds to tracing a horizontal line in the expanded lattice diagram and mapping all elements above that line to 1 and all elements below it to 0. This fact is precisely what defines the characterization of WOS filters as a counting (threshold) function, as explained in the following text. In general, determination of the element at a given rank requires that the elements be first sorted. Most sorting algorithms have computational complexity O(d log d). However, for binary variables the element at a given rank can be determined by counting the number of samples with value 1 (or 0). For instance, given the samples 101101, there are four 1s (and, therefore, two 0s). Thus, considering descending order, it can be
FIGURE 8 Weighted order statistic filter (duplication by the weight vector (1, 1, 3), followed by thresholding at 3).
easily inferred that element 1 occupies the first four ranks and that the last two ranks are occupied by element 0. In other terms, for binary inputs u = (u1, u2, …, ud) ∈ {0, 1}^d, the rank function for a given rank r (considering descending order) can be expressed by a counting function as follows:
ψr(u) = 1 ⟺ |u| ≥ r,  (12)
where |u| denotes the Hamming weight of u (i.e., the number of components equal to 1 in vector u). According to this equation, for binary inputs, the median is given by
ψ(d+1)/2(u) = 1 ⟺ |u| ≥ (d + 1)/2.  (13)
With regard to WOS filters, in the binary domain they also can be expressed as a counting-based function. Let ω = (ω1, ω2, …, ωd) be a weight vector, a vector of positive integers. Denote d* = ∑i ωi and let 0 ≤ r* ≤ d*. Define the function ψω,r* by, for any u ∈ {0, 1}^d,

ψω,r*(u) = 1 ⟺ ∑i ωi ui ≥ r*.  (14)

According to this, Eq. (12) is the particular case where ω = (1, 1, …, 1).
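Eqs. (12) and (14) reduce binary rank computation to counting. The sketch below (illustrative; the function names are mine) encodes both and checks, on the sample 101101 from the text, that the unweighted rank function is the special case of unit weights:

```python
def psi_rank(u, r):
    # Eq. (12): binary rank function, descending order.
    # Outputs 1 iff the Hamming weight |u| is at least r.
    return int(sum(u) >= r)

def psi_wos(u, weights, r_star):
    # Eq. (14): weighted order statistic (threshold) function.
    # Outputs 1 iff the weighted sum of the inputs reaches the rank r*.
    return int(sum(w * x for w, x in zip(weights, u)) >= r_star)

u = (1, 0, 1, 1, 0, 1)        # the sample 101101: |u| = 4
assert psi_rank(u, 4) == 1    # element 1 occupies the first four ranks
assert psi_rank(u, 5) == 0    # ranks 5 and 6 are occupied by element 0
# Eq. (12) as the particular WOS case with unit weights:
for r in range(1, 7):
    assert psi_wos(u, (1,) * 6, r) == psi_rank(u, r)
print("counting-based rank functions verified")
```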
Binary functions that can be expressed in the form of Eq. (14) with arbitrary (not necessarily positive) integer weights are called linearly separable Boolean functions. If both the weights and the threshold (rank) are positive, then they are linearly separable positive Boolean functions (Muroga, 1971). Thus, while stack filters correspond to PBFs, WOS filters correspond to threshold functions (with positive weights and threshold). In addition, as a subclass of the stack filters, WOS (and thus median and rank-order) filters possess the threshold decomposition structure. For a fixed weight vector, different WOS filters may be obtained by varying the threshold. Figure 9 shows five WOS filters generated by the weight vector ω = (1, 1, 3). An interesting question is to determine whether two filters ψω1,r1* and ψω2,r2* are identical (Astola et al., 1994). Of more interest might be whether two weight vectors are equivalent in the sense that they generate the same set of filters. Note also that some authors define WOS filters as the ones that duplicate the d input samples by their respective weights and output the r-th largest
FIGURE 9 WOS filters generated by the weight vector ω = (1, 1, 3).
element of the sequence. This implies descending order. However, other authors define the output of the filter as the r-th smallest element, which implies ascending order. This difference may generate some confusion. In general, descending order is adopted because of the convenience of having the threshold value of the threshold function equal to the desired rank. Since WOS filters are a subclass of the stack filters, not all PBFs can be expressed in the form of Eq. (14). Figure 10 illustrates a positive Boolean function with d = 4 variables that does not correspond to any WOS filter (it is not linearly separable). To see that, consider the six elements with Hamming weight 2 (0011, 0101, 1010, 1100, 0110, 1001). There must be a weight vector (a, b, c, d) such that the first four, when expanded by the weight vector, result in elements with Hamming weight larger than the weight of the two others. More specifically, the following eight inequalities must be satisfied:

c + d > b + c    b + d > b + c
c + d > a + d    b + d > a + d
a + c > b + c    a + b > b + c
a + c > a + d    a + b > a + d

It is easy to verify that there are no positive integers that satisfy the above inequalities. Therefore, the filter shown above is not a WOS filter. The number of WOS filters, as well as of stack filters, is not known for a general
FIGURE 10 A positive Boolean function that is not WOS.
dimension d. Finding the number of monotone Boolean functions on d variables is an open problem known as Dedekind’s problem (Kleitman, 1969; Kleitman and Markowsky, 1975).
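Both observations admit small brute-force checks (an illustrative sketch; the function names are mine). The first confirms that the eight inequalities for the Figure 10 function have no positive integer solution: indeed a + c > a + d forces c > d, while b + d > b + c forces d > c. The second enumerates monotone Boolean functions for small d, reproducing the known Dedekind counts 6 (d = 2) and 20 (d = 3).

```python
from itertools import product

def separable_weights_exist(limit=12):
    # Search positive weights (a, b, c, d) satisfying all eight inequalities.
    for a, b, c, d in product(range(1, limit + 1), repeat=4):
        if (c + d > b + c and c + d > a + d and a + c > b + c and
                a + c > a + d and b + d > b + c and b + d > a + d and
                a + b > b + c and a + b > a + d):
            return True
    return False

def count_monotone(d):
    # Count Boolean functions on {0,1}^d obeying the stacking property
    # (u <= v componentwise implies psi(u) <= psi(v)).
    points = list(product((0, 1), repeat=d))
    count = 0
    for values in product((0, 1), repeat=len(points)):
        psi = dict(zip(points, values))
        if all(psi[u] <= psi[v] for u in points for v in points
               if all(x <= y for x, y in zip(u, v))):
            count += 1
    return count

assert not separable_weights_exist()
assert count_monotone(2) == 6 and count_monotone(3) == 20
print("the Figure 10 PBF is not linearly separable; Dedekind counts agree")
```

The bounded search over weights is only an illustration; the contradiction c > d and d > c rules out all positive weights, not just those below the search limit.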
D. Relation to Morphological Filters While stack filters were initially investigated predominantly in the context of 1D signal processing, mathematical morphology has its origin in the study of binary images and their processing, modeled respectively as sets and set operators (Serra, 1982). Mathematical morphology is a discipline that, from a practical point of view, is concerned with the development and application of operators that identify and modify particular structural (geometrical) information in images. Such information is identified by probing the image to be processed with structuring elements of different shapes and sizes (Serra, 1982; Soille, 2003). From a theoretical point of view, one of the main concerns is the study of algebraic representation and properties of the operators in the context of lattice theory. Lattice theory is an appropriate framework for the formal study of morphological operators since images can be modeled as elements of complete lattices (Heijmans, 1994; Matheron, 1975; Serra, 1988). Many morphological operators are obtained by composing two basic operators, the erosion and the dilation. In fact, it can be shown that any translation-invariant image operator can be expressed as a supremum of interval operators, which can be expressed in terms of these two operators. The first decomposition results in terms of the basic operators are credited to Matheron (1975), who showed that any increasing operator can be expressed as a supremum of erosions by structuring elements in the kernel of the operator. Maragos (1989) showed the necessary conditions for the existence of a more compact sup-representation, namely, the minimal decomposition as a supremum of erosions by structuring elements in the basis of the operator. These results have been extended to not-necessarily-increasing operators by Banon and Barrera (1991, 1993).
Although these results hold for any translation-invariant mappings between two complete lattices, hereafter the scope is restricted to binary and gray-level image operators. The specialization of the results mentioned above for binary W-operators (Banon and Barrera, 1991) is described next. Binary morphology is based on set operators. Let S ⊆ E denote a binary image and B ⊆ E be a subset called a structuring element. The erosion of S by B is defined, ∀S ∈ P(E), as
εB(S) = {p ∈ E | Bp ⊆ S} = ∩_{b∈B} S−b,  (15)

where Bp denotes the set B translated by p.
The dilation of S by B is defined, ∀S ∈ P(E), as

δB(S) = {p ∈ E | B̌p ∩ S ≠ ∅} = ∪_{b∈B} Sb,  (16)
where B̌ denotes the transpose of set B. Given A ⊆ B ⊆ E, [A, B] = {X ⊆ E | A ⊆ X ⊆ B} denotes the interval with extremities A and B. The interval operator, parameterized by an interval [A, B], is defined, ∀S ∈ P(E), as
λ[A,B](S) = {p ∈ E | Ap ⊆ S ⊆ Bp}.  (17)
They are equivalent to the hit-or-miss operators, denoted H(U,V), U, V ∈ P(E), and defined as H(U,V)(S) = {p ∈ E : Up ⊆ S and Vp ⊆ S^c}, for any S ∈ P(E). Equivalence is given by the equality λ[A,B] = H(A,B^c). An interval operator λ[A,B] can be expressed, ∀S ∈ P(E), as
λ[A,B](S) = εA(S) ∩ [δ_{B^c}(S)]^c.  (18)
The kernel of a W-operator Ψ: P(E) → P(E) is defined as

KW(Ψ) = {X ∈ P(W) | o ∈ Ψ(X)}.  (19)

Note that if W = E, then KW(Ψ) = K(Ψ) = {X ∈ P(E) | o ∈ Ψ(X)}, the original definition of the kernel (see, for instance, Banon and Barrera, 1991). The basis of Ψ is denoted BW(Ψ) and defined as the set of all maximal intervals contained in the kernel; that is, [A, B] ⊆ KW(Ψ) is maximal if, for every [A′, B′] ⊆ KW(Ψ) such that [A, B] ⊆ [A′, B′], we have [A, B] = [A′, B′]. Theorem 2 (Banon and Barrera, 1991). Any W-operator Ψ can be expressed uniquely as a union of interval operators, characterized by intervals in its kernel; that is,
Ψ = ∪{λ[A,B] | [A,B] ⊆ KW(Ψ)}.  (20)
In terms of its basis, Ψ can be expressed as

Ψ = ∪{λ[A,B] | [A,B] ∈ BW(Ψ)}.  (21)
In fact, Eq. (20) can be simplified to Ψ = ∪{λ[A,A] | A ∈ KW(Ψ)}. A simple proof of this equality is provided by Heijmans (1994). If Ψ is increasing, then all maximal intervals contained in KW(Ψ) are of the form [A, E], and hence [δ_{E^c}(S)]^c = [δ_∅(S)]^c = ∅^c = E, ∀S ∈ P(E). Thus, εA(S) ∩ [δ_{E^c}(S)]^c = εA(S), resulting in the decomposition of Ψ as a supremum (union) of erosions. Recalling that p ∈ Ψ(S) ⟺ ψ(S−p ∩ W) = 1 [the set operator version of Eq. (4)], and thus o ∈ Ψ(X) ⟺ ψ(X) = 1 for all X ∈ P(W), Eq. (19) can be rewritten as KW(Ψ) = {X ∈ P(W) | ψ(X) = 1}. This characterization of the kernel in terms of the characteristic function ψ establishes the connection between binary W-operators and Boolean functions. In fact, the canonical decomposition in terms of the kernel corresponds to the canonical sum-of-products form of the corresponding Boolean function, and the minimal decomposition in terms of the basis corresponds to a minimal sum-of-products form of the Boolean function. An interval operator corresponds to a logic product term. The connection between mathematical morphology and stack filters was discussed by Maragos and Schafer (1987b). Since erosion is an increasing operator, it commutes with thresholding. The erosion of gray-level images by a flat structuring element B can be defined by simply replacing ∩ with min; that is,
[εB(f)](p) = min{f(q) | q ∈ Bp}.  (22)
Similarly, dilation is given by
[δB(f)](p) = max{f(q) | q ∈ B̌p}.  (23)
Considering descending ordering, gray-level erosion and dilation correspond, respectively, to ψd and ψ1 (rank filters of ranks d and 1). As mentioned previously, stack filters are operators that commute with thresholding or, equivalently, increasing operators defined by flat structuring elements. They are also known as flat filters (Heijmans, 1994). Thus, as morphological operators, stack filters can be expressed in the binary domain as a union of binary erosions by structuring elements that are subsets of W or, in the gray-level domain, as the maximum of gray-level erosions with the same structuring elements (Maragos and Schafer, 1987b; Soille, 2002).
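That flat erosion commutes with thresholding is easy to verify numerically. The sketch below uses an assumed 1D window of radius 1 and a made-up signal, so it is an illustration rather than code from the text:

```python
def erode(f, radius):
    # Flat erosion of a 1D signal by the window {-radius, ..., radius} (Eq. 22):
    # the output at p is the minimum of f over the translated window.
    n = len(f)
    return [min(f[q] for q in range(max(0, p - radius), min(n, p + radius + 1)))
            for p in range(n)]

def threshold(f, t):
    # Cross section at level t: 1 where f(p) >= t, 0 elsewhere.
    return [int(v >= t) for v in f]

f = [3, 1, 4, 4, 2, 5, 0, 3]
for t in range(1, 6):
    # Thresholding the eroded signal == eroding the thresholded (binary) signal.
    assert threshold(erode(f, 1), t) == erode(threshold(f, t), 1)
print("flat erosion commutes with thresholding")
```

On a binary signal the same `erode` (a min over the window) reduces to the binary erosion, which is why one function serves on both sides of the check.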
III. OPTIMAL STACK FILTERS One of the main concerns when designing a filter is to find filters that have good filtering performance on signals of a given domain. The goodness of a filter may be stated in statistical terms by assuming that the signals to be processed, as well as their respective ideal filtered signals, are modeled by random processes. It will be assumed that the input (observed, to be processed) signals and the corresponding ideal (desired output) signals are modeled by stationary random processes fi and fo, respectively. More strictly, it is assumed that they form a jointly stationary random process (fi, fo) with joint distribution P(fi, fo). An optimal filter is one that, given fi, best estimates fo according to some performance measure. Let M be a statistical measure that describes the closeness of Ψ(fi) to fo. Then, a filter Ψopt is optimal with respect to measure M and process (fi, fo) if M(Ψopt) ≤ M(Ψ) for any filter Ψ. In the case of stack filters, it is well known that the MAE (mean absolute error) of a stack filter can be expressed as a linear combination of the MAEs of the corresponding binary filter (Coyle and Lin, 1988). Thus, optimal MAE stack filters can be expressed in terms of optimal MAE PBFs with respect to the cross sections of the multilevel signals. The MAE of stack filters, its relation to the MAE of the corresponding PBF, and the integer linear programming formulation of the problem of finding an optimal MAE stack filter are presented in the subsequent sections.
A. Mean Absolute Error Optimal Stack Filters Let Ψ be a stack filter with characteristic function ψ: K^W → K and let (fi, fo) be a pair of observed-ideal jointly stationary random processes with joint distribution P(fi, fo). The MAE of Ψ at a given location p ∈ E with respect to these processes is defined as

MAEp = E[|ψ(fi−p|W) − fo(p)|],  (24)
where E[·] denotes the expected value of its argument. Clearly, fi−p|W is a random process with realizations in K^W, and fo(p) is a random variable with realizations in K. Due to stationarity, location p is arbitrary. Thus, p may be dropped from fi−p|W and from fo(p), resulting,
respectively, in a multivariate random variable g with realizations in K^W and a random variable y with realizations in K. The process (g, y) is the local process of (fi, fo), and its joint distribution is denoted P(g, y). Thus, considering the joint stationarity of (fi, fo) and the local definition of Ψ, the MAE can be rewritten as
MAE = E[|ψ(g) − y|].  (25)
The expected value in Eq. (25) is with respect to the joint distribution P(g, y). The next two propositions establish the linear relation between the MAE of Ψ on multilevel signals and the MAE of ψ on binary signals (obtained by thresholding the multilevel ones). Proposition 1. Let ψ: K^W → K, g ∈ K^W, and y ∈ K. Then,

|∑_{t=1}^{k} [Tt(ψ(g)) − Tt(y)]| = ∑_{t=1}^{k} |Tt(ψ(g)) − Tt(y)|.  (26)
Proof. If ψ(g) > y, then

∑_{t=1}^{k} [Tt(ψ(g)) − Tt(y)] = ∑_{t=1}^{y} (1 − 1) + ∑_{t=y+1}^{k} [Tt(ψ(g)) − Tt(y)],

where Tt(y) = 0 in the second sum. Since all terms in the second sum on the right side are non-negative, and because |∑i ai| = ∑i |ai| if ai ≥ 0 for all i,

|∑_{t=1}^{k} [Tt(ψ(g)) − Tt(y)]| = ∑_{t=1}^{y} |Tt(ψ(g)) − Tt(y)| + ∑_{t=y+1}^{k} |Tt(ψ(g)) − Tt(y)|
                                = ∑_{t=1}^{k} |Tt(ψ(g)) − Tt(y)|.

Similarly, if ψ(g) ≤ y, then

∑_{t=1}^{k} [Tt(ψ(g)) − Tt(y)] = ∑_{t=1}^{ψ(g)} (1 − 1) + ∑_{t=ψ(g)+1}^{k} [Tt(ψ(g)) − Tt(y)],

where Tt(ψ(g)) = 0 in the second sum,
and since all the terms in the second sum on the right side are non-positive,

|∑_{t=1}^{k} [Tt(ψ(g)) − Tt(y)]| = ∑_{t=1}^{ψ(g)} |Tt(ψ(g)) − Tt(y)| + ∑_{t=ψ(g)+1}^{k} |Tt(ψ(g)) − Tt(y)|
                                = ∑_{t=1}^{k} |Tt(ψ(g)) − Tt(y)|.
Given a process (g, y) as defined above, let g and y denote realizations of g and y, respectively. The binary signal Tt[g] can be regarded as a realization of a binary random vector denoted by Ut, and the binary value Tt(y) as a realization of a binary random variable denoted by bt. Proposition 2. Let Ψ: K^E → K^E be a W-operator that commutes with thresholding (hence, a stack filter), characterized by a function ψ: K^W → K. Let also (g, y) be as defined above and let (Ut, bt), t = 1, 2, …, k, be the processes corresponding to the cross sections of (g, y). Then
MAE = ∑_{t=1}^{k} MAEt,  (27)
where MAEt corresponds to the mean absolute error of ψ with respect to the process (Ut, bt).

Proof.
MAE = E[|ψ(g) − y|]  [Eq. (25)]
    = E[|∑_{t=1}^{k} Tt(ψ(g)) − ∑_{t=1}^{k} Tt(y)|]  (threshold decomposition)
    = E[|∑_{t=1}^{k} [Tt(ψ(g)) − Tt(y)]|]  (rearranging the sums)
    = E[∑_{t=1}^{k} |Tt(ψ(g)) − Tt(y)|]  (Proposition 1)
    = ∑_{t=1}^{k} E[|Tt(ψ(g)) − Tt(y)|]  (expected value commutes with sum)
    = ∑_{t=1}^{k} E[|ψ(Tt[g]) − Tt(y)|]  (ψ commutes with thresholding)
    = ∑_{t=1}^{k} E[|ψ(Ut) − bt|]  (rewriting in terms of (Ut, bt))
    = ∑_{t=1}^{k} MAEt.
Notice that the first five equalities hold for any W-operator Ψ, not only for stack filters. This proposition shows that the MAE of a stack filter Ψ with respect to a random process (fi, fo) (or, equivalently, to its corresponding local process (g, y)) can be expressed as a linear combination (summation) of the MAEs of the filter with respect to each of the binary processes (Ut, bt) corresponding to the cross sections of (fi, fo).
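Proposition 2 can be illustrated with a small simulation (a sketch; the signal pair and the choice of a 3-point median stack filter are mine): the multilevel MAE computed directly equals the sum, over gray levels, of the binary MAEs computed on the cross sections.

```python
# Observed/ideal signal pair; window W = {-1, 0, 1}, gray levels 0..k.
fi = [3, 1, 4, 4, 2, 0, 5, 3, 3, 1]
fo = [3, 3, 4, 4, 2, 2, 5, 3, 3, 3]
k = 5

def med3(a, b, c):
    return sorted((a, b, c))[1]

def T(f, t):
    return [int(v >= t) for v in f]

# Direct multilevel MAE of the median filter (interior positions only).
positions = range(1, len(fi) - 1)
mae = sum(abs(med3(*fi[p - 1:p + 2]) - fo[p]) for p in positions) / len(positions)

# Sum over levels of the binary MAEs of the cross sections.
mae_levels = 0.0
for t in range(1, k + 1):
    u, b = T(fi, t), T(fo, t)
    mae_levels += sum(abs(med3(*u[p - 1:p + 2]) - b[p])
                      for p in positions) / len(positions)

assert abs(mae - mae_levels) < 1e-12
print("multilevel MAE equals the sum of per-level binary MAEs")
```

Here the expectations are replaced by empirical averages over the window positions, which suffices to exhibit the identity of Proposition 2 on a concrete pair of signals.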
B. Equivalent Optimality in the Binary Domain This section shows the characterization of optimal MAE stack filters in terms of the MAE optimality of the corresponding PBFs. Recalling that (Ut, bt) denotes the binary joint random process corresponding to the cross sections of (g, y) (the local process of (fi, fo)) at level t, and that MAEt denotes the MAE of ψ with respect to this process, let C(ψ) = ∑_{t=1}^{k} MAEt and let Pt(u, b) denote the probability of Ut = u and bt = b (that is, Pt(u, b) = P(Ut = u, bt = b)), where u ∈ {0, 1}^d and b ∈ {0, 1}. Using this notation,
C(ψ) = ∑_{t=1}^{k} E[|ψ(Ut) − bt|]
     = ∑_{t=1}^{k} ∑_u ∑_b |ψ(u) − b| Pt(u, b)
     = ∑_u [∑_{t=1}^{k} ∑_b |ψ(u) − b| Pt(u, b)].  (28)

The bracketed term, denoted Cu(ψ), is the amount u contributes to C(ψ) = ∑_{t=1}^{k} MAEt.
Since b ∈ {0, 1}, Cu(ψ) can be rewritten, for any Boolean function ψ and u ∈ {0, 1}^d, as

Cu(ψ) = ψ(u) ∑_{t=1}^{k} Pt(u, 0) + (1 − ψ(u)) ∑_{t=1}^{k} Pt(u, 1).  (29)
Thus,

C(ψ) = ∑_u Cu(ψ) = ∑_{u | ψ(u)=1} ∑_{t=1}^{k} Pt(u, 0) + ∑_{u | ψ(u)=0} ∑_{t=1}^{k} Pt(u, 1).  (30)
The Boolean function that minimizes C(ψ) is obtained by minimizing Cu for each u, that is, by the Boolean function

ψopt(u) = 1, if ∑_{t=1}^{k} Pt(u, 1) > ∑_{t=1}^{k} Pt(u, 0),
ψopt(u) = 0, if ∑_{t=1}^{k} Pt(u, 1) ≤ ∑_{t=1}^{k} Pt(u, 0).  (31)
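With the per-level probabilities estimated by raw counts from (hypothetical) thresholded training pairs, Eq. (31) is a pointwise comparison. A minimal sketch, with made-up training data:

```python
from collections import defaultdict

# Hypothetical counts of (u, b) observations, already summed over the
# k threshold levels; in practice these come from thresholded pairs of
# training signals.
counts = defaultdict(lambda: [0, 0])        # u -> [count(u, 0), count(u, 1)]
training = [((0, 0, 0), 0), ((0, 1, 0), 0), ((0, 1, 0), 1),
            ((1, 1, 0), 1), ((1, 1, 1), 1), ((1, 0, 1), 1), ((1, 0, 1), 0)]
for u, b in training:
    counts[u][b] += 1

def psi_opt(u):
    # Eq. (31): output 1 iff the (estimated) mass of (u, 1) exceeds that of (u, 0).
    n0, n1 = counts[u]
    return 1 if n1 > n0 else 0

for u in sorted(counts):
    print(u, psi_opt(u))
```

As the text notes next, this pointwise optimum need not be increasing; the positivity (monotonicity) requirement is what makes the stack-filter design problem nontrivial.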
However, our aim is to minimize the MAE. Notice that the equality MAE = C(ψ) holds if ψ is a PBF, but ψopt may not be a PBF. An optimal PBF (that characterizes the optimal stack filter) is the one with the smallest value of C among all PBFs. Denoting P(u, b) = ∑_{t=1}^{k} Pt(u, b), b ∈ {0, 1}, and rewriting C(ψ) in terms of Cu(ψ) given in Eq. (29), it follows that

C(ψ) = ∑_u [ψ(u)P(u, 0) + (1 − ψ(u))P(u, 1)]
     = ∑_u [ψ(u)P(u, 0) + P(u, 1) − ψ(u)P(u, 1)]
     = ∑_u P(u, 1) + ∑_u (P(u, 0) − P(u, 1)) ψ(u).  (32)
Thus, since the first sum in the last equality does not depend on ψ, finding a PBF ψ that minimizes C(ψ) is equivalent to finding a PBF ψ that minimizes

C′(ψ) = ∑_u (P(u, 0) − P(u, 1)) ψ(u) = ∑_u cu ψ(u),  (33)

where cu = P(u, 0) − P(u, 1).
C. Formulation as a Linear Programming Problem Optimal MAE stack filters can be computed by finding a PBF ψ that minimizes the cost C(ψ) defined in Eq. (30) or, equivalently, the cost C′(ψ) defined in Eq. (33). As mentioned previously, finding a Boolean function (not necessarily positive) that minimizes those costs is straightforward. However, to guarantee positiveness of the Boolean function, monotonicity constraints must be imposed: the relation ψ(u1) ≤ ψ(u2) must hold for each pair (u1, u2) ∈ {0, 1}^d × {0, 1}^d such that u1 < u2. To simplify notation, let xu ∈ {0, 1} be a variable corresponding to the value of the Boolean function at u (i.e., xu = ψ(u)), for each element u ∈ {0, 1}^d. Using this notation, x corresponds to a vector with 2^d components. Consider also cu = P(u, 0) − P(u, 1), the costs relative to individual elements in Eq. (33). Then the MAE stack-filter problem can be formulated as the following integer linear programming (ILP) problem (Coyle and Lin, 1988). Problem 1 (ILP formulation of the optimal MAE stack-filter problem).
min ∑_{u=0}^{2^d − 1} cu xu

subject to
xu ≤ xv, if u ≤ v
xu ≥ 0
xu ≤ 1
xu integer  (34)

The constraint matrix of Problem 1 is totally unimodular and, since all components on the right side of the inequalities are integers, all basic feasible solutions of Problem 1 are integral. Thus the integrality constraint in Problem 1 can be dropped (see, for instance, Cook et al. [1998]), resulting in: Problem 2 (Relaxation of the ILP in Problem 1).
min ∑_{u=0}^{2^d − 1} cu xu

subject to
xu ≤ xv, if u ≤ v
xu ≥ 0
−xu ≥ −1  (35)
The number of constraints of the form xu ≤ xv can be reduced by considering the transitivity of the partial-order relation. More specifically, u < w and w < v imply u < v, and therefore the constraint corresponding to the pair (u, v) is redundant. Thus, a constraint for u < v should be included in the ILP formulation above if and only if there is no w such that u < w < v.
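After this reduction, the constraints correspond exactly to the covering pairs, i.e., pairs differing in a single component, so the constraint set has d·2^(d−1) elements. A small illustrative sketch that builds it:

```python
from itertools import product

def covering_constraints(d):
    # One constraint x_u <= x_v per covering pair: v equals u with a single
    # 0 flipped to 1. Transitivity makes all other comparability
    # constraints redundant.
    pairs = []
    for u in product((0, 1), repeat=d):
        for i in range(d):
            if u[i] == 0:
                v = u[:i] + (1,) + u[i + 1:]
                pairs.append((u, v))
    return pairs

for d in (2, 3, 4):
    assert len(covering_constraints(d)) == d * 2 ** (d - 1)
print("constraint counts:", [len(covering_constraints(d)) for d in (2, 3, 4)])
```

The count follows because each element u contributes one constraint per zero component, and by symmetry the total number of zeros over {0, 1}^d is d·2^(d−1).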
IV. STACK FILTER DESIGN APPROACHES A. Overview Given the joint distribution of the cross sections of input-output signals, an optimal MAE stack filter can be computed by solving the linear programming (LP) problem presented in the previous section. However, the numbers of variables and constraints in the LP are, respectively, 2^d and O(d·2^(d−1)), increasing exponentially with the window size d. Therefore, for windows of moderate size, solution of the LP problem by naive approaches becomes infeasible. To overcome this limitation, some heuristic approaches that result in suboptimal solutions were proposed in the 1990s. Joint probabilities are estimated from training data (sample pairs of input-output signals). Two major classes of heuristic approaches exist for stack-filter design. The first one, called adaptive algorithms, consists of repeatedly scanning the input data, updating a counting vector, and enforcing monotonicity (Lin et al., 1990; Lin and Kim, 1994; Yoo et al., 1999). The second approach estimates the costs given in Eq. (30) and searches for the optimal solution directly on the Boolean lattice (Han and Fan, 1997; Hirata et al., 2000; Lee et al., 1999; T˘abus et al., 1996; T˘abus and Dumitrescu, 1999). Recently, Dellamonica et al. (2007) proposed an algorithm for computing an exact solution of the LP problem. They were able to compute the exact solution for problems with window size up to 25. Their approach considers the network flow problem associated with the dual of the LP and strategies to decompose it into smaller subproblems that are solved efficiently. However, it requires a large amount of computer memory. In addition to these two methods, other approaches have also been proposed.
Among them are those based on genetic algorithms (Doval et al., 1998; Undrill et al., 1997), neural networks (Zeng, 1996), sample selection probabilities (Doval et al., 1998; Prasad and Lee, 1994; Prasad, 2005; Shmulevich, et al., 2000), and structural approaches (Coyle et al., 1989; Gabbouj and Coyle, 1990; Yin, 1995). Figure 11 shows a taxonomy of the major approaches for stack-filter design. Notice, however, that other approaches for PBF design are not included in the diagram because they do not appear related to stack-filter design in the literature.
FIGURE 11 Taxonomy of major stack-filter design approaches: statistical approaches (based on training data) split into heuristic methods (adaptive and graph-search algorithms) and exact methods (LP via minimum cost network flow); structural and other approaches complete the taxonomy.
The following text sections describe the primary ideas of the two classes of heuristic approaches previously mentioned, and the algorithm for exact solution of the ILP problem.
B. Heuristic Solutions The adaptive and the lattice search–based heuristic algorithms that generate suboptimal solutions are described in this section.
1. Adaptive Algorithms The first algorithm of this class was proposed by Lin et al. (1990). The algorithm starts with the null function (which is a PBF). Iterative scanning of the training data, eventually with several passes over the training data collection, sequentially updates the initial function in such a manner that it converges to the optimal PBF. More specifically, an array D with 2^d positions is kept in memory. This array is indexed by the elements of {0, 1}^d. For each element u observed in the training data collection, D[u] is incremented (or decremented) depending on the corresponding output value b. Values in any position of D are allowed to vary from 0 to N, where N is some positive integer. At any time, a Boolean function can be obtained from D by setting ψ(u) = T_{N/2}(D[u]), for u ∈ {0, 1}^d. If D[u] ≥ D[v] whenever u ≥ v, then ψ is a PBF. Let (ui, bi) ∈ {0, 1}^d × {0, 1}, i = 1, …, m, denote the collection of training data. Note that each pair (ui, bi) is obtained by thresholding a d-point observation of a multilevel signal. Figure 12 shows the algorithm.
FIGURE 12 The adaptive algorithm proposed by Lin et al. (1990):
1: D[u] = N/2, for all u ∈ {0, 1}^d
2: i = 1
3: repeat
4:   if bi == 1 then
5:     D[ui] = min{D[ui] + 1, N}
6:   else
7:     D[ui] = max{D[ui] − 1, 0}
8:   end if
9:   Check and, if necessary, enforce monotonicity
10:  i = (i mod m) + 1
11: until convergence
12: Return T_{N/2}[D]
Monotonicity enforcement must consider two cases: those in which bi = 1 and those in which bi = 0. When bi = 1, D[ui] is incremented, and it is necessary to check whether this increment violates monotonicity. A violation in this case corresponds to D[ui] becoming larger than D[v] for some v such that |v| = |ui| + 1. If that happens, then D[ui] and D[v] are swapped, resulting in D[ui] < D[v]. However, after such a swap, there may exist w such that |w| = |v| + 1 and D[v] > D[w], configuring another monotonicity violation. Thus, monotonicity checking followed by swapping must be carried out sequentially toward the largest element in the lattice until no violation exists. Since the longest path in the lattice {0, 1}^d has length d, in the worst case d swaps will be necessary. The process of monotonicity enforcement is similar when bi = 0; in this case, swapping advances toward the smallest element in the lattice. Other adaptive algorithms are improvements of the first one. Lin and Kim (1994) proposed a modification to reduce the number of iterations. The modification is based on the observation that, when a multilevel sample is thresholded at the k levels, there are at most d + 2 distinct binary observations. Then, instead of k iterations, at most d + 2 iterations are necessary. The amount of increment/decrement is directly related to the number of occurrences of each distinct binary signal. With this modification of the original algorithm, they report a speedup by a factor of 20. The most recent improvement of the algorithm was proposed by Yoo et al. (1999). They introduce a parameter L that corresponds to the enforcement period; that is, enforcements are done every L increment/decrement iterations. They also make D[u] vary between −N/2 and N/2 and initialize D with zeros. Another significant modification is in the form in which the enforcements are performed. Since several increments/decrements may have been performed in L iterations, there may exist more than one
Stack Filters: From Definition to Design Algorithms
FIGURE 13 The three rounds of enforcements as proposed by Yoo et al. (1999) for a lattice of dimension d = 3.
local monotonicity violation. Their approach is highly parallel, allowing a simple parallel implementation. It consists of d rounds of (possibly parallel) enforcements, covering all pairs of elements in the lattice that have Hamming distance equal to 1. Figure 13 shows the three rounds of enforcement for d = 3. The pairs of elements that are compared with each other in each round are highlighted by bold arcs linking them. After the three rounds, every pair whose Hamming distance equals 1 has been checked. The updating to enforce monotonicity does not consist of swapping as in the previous two algorithms. Instead, if u < v with D[u] > D[v], then the update performed is

D[u] = ⌊(D[u] + D[v]) / 2⌋,    D[v] = ⌈(D[u] + D[v]) / 2⌉,

where ⌊·⌋ denotes the greatest integer smaller than or equal to its argument, and ⌈·⌉ denotes the smallest integer greater than or equal to its argument. Proofs that the enforcement strategy generates PBFs that converge to the optimal PBF as the number of iterations grows are provided in the respective works.
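The floor/ceiling update above can be sketched as follows. This is a minimal illustration, not the authors' implementation; representing lattice elements as bit tuples and scheduling one round per bit position are assumptions made here.

```python
from itertools import product
from math import ceil, floor

def enforcement_round(D, d, bit):
    # Compare every pair (u, v) that differs only at position `bit`
    # (all such pairs have Hamming distance 1 and satisfy u < v).
    for u in product([0, 1], repeat=d):
        if u[bit] == 0:
            v = tuple(1 if i == bit else x for i, x in enumerate(u))
            if D[u] > D[v]:  # local monotonicity violation
                s = D[u] + D[v]
                D[u], D[v] = floor(s / 2), ceil(s / 2)

def enforce(D, d):
    # d rounds cover all pairs at Hamming distance 1; one pass does not
    # necessarily remove every violation, but each update preserves the
    # total of the counters and repeated passes settle.
    for bit in range(d):
        enforcement_round(D, d, bit)

D = {(0,): 5, (1,): 1}   # d = 1: a single violating pair
enforce(D, 1)            # both counters become 3
```

Note that ⌊s/2⌋ + ⌈s/2⌉ = s, so every update conserves the sum of the two counters involved.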
2. Graph Search-Based Algorithms

Algorithms in this class perform searches over the graph that corresponds to the Boolean lattice {0, 1}d. A formulation proposed by Hirata et al. (2000) provides a unifying framework for these algorithms. This section describes the unifying formulation and how other approaches in this class fit into it.
Nina S. T. Hirata
Given a function ψ: {0, 1}d → {0, 1}, let L(ψ) = {u ∈ {0, 1}d | ψ(u) = 0} and U(ψ) = {u ∈ {0, 1}d | ψ(u) = 1}. Obviously, L(ψ) ∪ U(ψ) = {0, 1}d and L(ψ) ∩ U(ψ) = ∅. Thus, {L(ψ), U(ψ)} is a partition of {0, 1}d. A partition {L(ψ), U(ψ)} of {0, 1}d is an (L, U) partition of {0, 1}d if for all u ∈ L(ψ) it satisfies {v ∈ {0, 1}d | v ≤ u} ⊆ L(ψ) (or, equivalently, if for all u ∈ U(ψ) it satisfies {v ∈ {0, 1}d | u ≤ v} ⊆ U(ψ)). It is easy to see that an (L, U) partition of {0, 1}d defines a PBF and, conversely, a PBF defines an (L, U) partition of {0, 1}d. Figure 14 shows an (L, U) partition of {0, 1}4.

Because of the relationship between PBFs and (L, U) partitions, the problem of designing an optimal PBF can be viewed as the problem of finding an optimal (L, U) partition of the lattice. An optimal (L, U) partition of {0, 1}d is one that minimizes

C(ψ) = ∑_{u∈U} P(u, 0) + ∑_{u∈L} P(u, 1),

which is simply the cost of Eq. (30) rewritten in terms of (L, U).
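As a small worked illustration, the cost of a candidate partition can be computed directly from this expression. The probability values below are hypothetical, chosen only for the example.

```python
def partition_cost(U, L, P):
    # C(psi) = sum_{u in U} P(u, 0) + sum_{u in L} P(u, 1)
    return (sum(P.get((u, 0), 0.0) for u in U) +
            sum(P.get((u, 1), 0.0) for u in L))

# hypothetical joint probabilities for a window of size d = 1
P = {((0,), 0): 0.4, ((0,), 1): 0.1,
     ((1,), 0): 0.2, ((1,), 1): 0.3}

# the partition L = {(0,)}, U = {(1,)} corresponds to the identity PBF
cost = partition_cost(U={(1,)}, L={(0,)}, P=P)   # 0.2 + 0.1 = 0.3
```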
FIGURE 14 A PBF partitions the lattice in two subsets: the L (lower) and U (upper) sets. Elements in the U set (shaded ones) are those mapped to 1, whereas elements in the L set are those mapped to 0. Positiveness implies that no element in the U set lies below any element in the L set (and, equivalently, no element in the L set lies above any element in the U set).
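The positiveness condition stated in the caption is easy to verify computationally; a minimal sketch (function names are illustrative):

```python
def leq(u, v):
    # componentwise order on {0,1}^d
    return all(a <= b for a, b in zip(u, v))

def is_valid_LU(L, U):
    # positiveness: no element of the U set lies below any element of the L set
    return not any(leq(u, l) for u in U for l in L)

assert is_valid_LU(L={(0, 0), (1, 0)}, U={(0, 1), (1, 1)})
assert not is_valid_LU(L={(1, 1)}, U={(0, 0)})   # (0,0) <= (1,1): violation
```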
If just a small portion of the lattice is analyzed at a time, it may be possible to decide in which region of the optimal partition it belongs. Let Z ⊆ {0, 1}d and let c(Z) = ∑_{u∈Z} (P(u, 0) − P(u, 1)). Since C(ψ) = ∑_{u} (P(u, 0) − P(u, 1)) ψ(u) plus a term that does not depend on ψ, it makes sense to set ψ(Z) = 1 (or, equivalently, to place Z in U) if c(Z) < 0, and to set ψ(Z) = 0 (or, equivalently, to place Z in L) if c(Z) > 0. If c(Z) = 0, it does not matter.

Following this reasoning, an (L, U) partition may be built incrementally by finding subsets Z with the above characteristics and placing them in the upper part U or in the lower part L. However, since an element in U implies that any other element lying above it in the lattice must also be in U (and, similarly, an element in L implies that any other lying below it must also be in L), these subsets Z cannot be arbitrary. They must be chosen to maintain the validity of the partition being built. Also, if arbitrary subsets are chosen, some elements may be placed alternately in both parts several times; thus, it is necessary to guarantee that the process finishes. The subsets that can be placed in U or L will be called feasible sets (they are defined next). The operator \ is the usual set subtraction.

Let Q ⊆ {0, 1}d be such that for any u ∈ Qᶜ, either {v ∈ {0, 1}d | u ≤ v} ⊆ Qᶜ or {v ∈ {0, 1}d | v ≤ u} ⊆ Qᶜ (that is, Q is a subset of {0, 1}d obtained by removing some elements from the top and others from the bottom, but none from the "middle"). An (L, U) partition of Q may be defined in a similar way as above.

Definition 2. Let FU be the class of non-empty subsets F of Q such that (Q\F, F) is a valid partition of Q, and c(F) < 0. A subset F ∈ FU is U feasible if and only if F is minimal in FU relative to ⊆.

Definition 3. Let FL be the class of non-empty subsets F of Q such that (F, Q\F) is a valid partition of Q, and c(F) ≥ 0. A subset F ∈ FL is L feasible if and only if F is minimal in FL relative to ⊆.
The next theorem states that an optimal partition can be built by successively moving small subsets of Q to one of the regions.

Theorem 3. Let F be a feasible set of Q, and let (L′, U′) be an optimal partition of Q\F. Then,
(a) if F is U feasible, then (L′, U′ ∪ F) is an optimal partition of Q, and
(b) if F is L feasible, then (L′ ∪ F, U′) is an optimal partition of Q.
Proof. See Hirata et al. (2000).

The algorithm (see Figure 15) builds the optimal (L, U) partition by iteratively moving feasible sets from Q to one of the parts. It starts with empty upper and lower sets, and then sequentially moves feasible sets from the
1: Q = {0, 1}d
2: L = U = ∅
3: while Q is not empty do
4:    Search for a feasible set F in Q. If F is U feasible, then do U ← U ∪ F; if F is L feasible, then do L ← L ∪ F. Do Q ← Q\F.
5: end while
6: Return (L, U)
FIGURE 15 The lattice search algorithm proposed by Hirata et al. (2000).
remainder of Q to one of the regions. The strategy is greedy in the sense that once a subset is moved, it is never put back into Q; the algorithm finishes when Q is empty. It can be shown that a non-empty set Q always contains at least one feasible set.

It is interesting to notice that only elements with non-null cost need to be considered in the process, resulting in a sparse graph. This may be advantageous for relatively large windows, because the number of such elements is likely to be much smaller than the number of nodes in the entire lattice. However, adequate data structures are needed to build and traverse the graph efficiently.

Notice that, given the costs, there is an inherent optimal BF that is not necessarily a PBF [see Eq. (31)]. If the optimal BF ψopt is not positive, then there are at least two elements u and v such that u < v and ψopt(u) > ψopt(v) (i.e., ψopt(u) = 1 and ψopt(v) = 0). These elements are said to be in the inversion set. There are two possibilities to "fix" ψopt in order to make it positive: (1) switch ψopt(u) from 1 to 0, or (2) switch ψopt(v) from 0 to 1. If there are more than two elements in the inversion set, the number of possible switchings is usually much larger. Finding an optimal (L, U) partition can be understood as finding the best set of switchings: the one that results in a PBF with the smallest overall cost. Any set of valid switchings (i.e., one that results in a positive function) determines a valid (L, U) partition of the lattice. Similarly, a valid (L, U) partition determines a set of switchings. It is clear that, in considering how to switch values of ψopt at different elements in the lattice, only elements in the inversion set need to be processed. Therefore, to find an optimal (L, U) partition, the set Q in Line 1 of the algorithm in Figure 15 can be initialized with only those elements in the inversion set.
In this case, in the second line of the algorithm, all lattice elements lying above some element of Q must be placed in U, and all elements lying below some element of Q must be placed in L.

In practice, finding feasible sets is not a trivial task. Hirata et al. (2000) propose searching first for feasible sets with one minimal/maximal element and, in case none is found, searching for feasible sets with two minimal/maximal elements, and so on. The maximum number of minimal/maximal elements in the feasible sets is a parameter of the
algorithm. Thus, by not searching for feasible sets with more minimal/maximal elements than this parameter allows, the algorithm may miss the optimal solution.

Other lattice search algorithms fit into the above formulation, as discussed next. The approach proposed by Lee et al. (1999) starts from an empty upper region. At each iteration, the smallest subset with "negative cost that obeys the stacking property" is moved to the upper region, until no such subset is found. The smallest subsets with negative cost are equivalent to the U-feasible sets defined above. L-feasible sets are not considered in their work.

Tabus et al. (1996) propose an approach in which the inversion set is computed first (their inversion sets are called undecided sets) and then the LP restricted to the inversion set is solved. However, if the inversion set is relatively large, solving the associated LP problem becomes computationally infeasible. To address large inversion sets, the size of the inversion set (and thus of the associated LP) is reduced by removing from it some easily detectable feasible sets (for instance, the feasible sets with one minimal/maximal element) (Tăbuş and Dumitrescu, 1999).

Another approach that fits into this formulation is the one proposed by Han and Fan (1997). In their approach, only one element is moved to the upper region at a time. Among the elements that can be moved to the upper region (to preserve the validity of the partition), the one with the largest negative cost is preferred (this corresponds to a unitary U-feasible set). If no such candidate exists, then all candidates are added to a queue and processed afterward (an element can be moved to the upper region only if all elements larger than it have already been moved). Every time an element is moved to the upper region, a new valid partition is configured. In particular, every time a negative-cost element is moved to the upper region, the new partition may correspond to the optimal PBF, and thus its MAE should be compared to the minimum found so far. The process is repeated until no negative-cost elements are left in the unprocessed part of the diagram.
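To make the greedy loop of Figure 15 concrete, the following sketch restricts the search to unitary feasible sets, in the spirit of Han and Fan's element-by-element moves: a singleton {u} is U feasible when u is maximal in Q and c({u}) < 0, and L feasible when u is minimal in Q and c({u}) ≥ 0. With this restriction the loop may stall on instances that require larger feasible sets, in which case the unresolved remainder of Q is returned. The probability values are hypothetical.

```python
def leq(u, v):
    return all(a <= b for a, b in zip(u, v))

def cost(u, P):
    # c({u}) = P(u, 0) - P(u, 1)
    return P.get((u, 0), 0.0) - P.get((u, 1), 0.0)

def lattice_search(Q, P):
    Q, L, U = set(Q), set(), set()
    progress = True
    while Q and progress:
        progress = False
        for u in sorted(Q):
            is_max = not any(u != v and leq(u, v) for v in Q)
            is_min = not any(v != u and leq(v, u) for v in Q)
            if is_max and cost(u, P) < 0:       # unitary U-feasible set
                U.add(u); Q.remove(u); progress = True
            elif is_min and cost(u, P) >= 0:    # unitary L-feasible set
                L.add(u); Q.remove(u); progress = True
    return L, U, Q   # non-empty Q means larger feasible sets are needed

# hypothetical probabilities for d = 2
P = {((0, 0), 0): 0.2, ((0, 1), 1): 0.1,
     ((1, 0), 0): 0.1, ((1, 1), 1): 0.3}
L, U, rest = lattice_search([(0, 0), (0, 1), (1, 0), (1, 1)], P)
```

On this instance the loop resolves every element: the resulting U set is upward closed and the L set is downward closed, as Theorem 3 guarantees for moves of feasible sets.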
C. Optimal Solution

Recall that the problem of designing an optimal MAE stack filter can be formulated as an LP problem (see Section III.C). The solution of the dual of an LP problem yields a solution of the original problem; thus, a common practice for solving LP problems is to solve their respective duals. The LP formulation of the optimal MAE stack-filter design problem is closely related to flow models (a connection first suggested by Gabbouj and Coyle, 1991). Recently, Dellamonica et al. (2007) showed that the dual of the LP relaxation (Problem 2) corresponds to the LP formulation of a
minimum-cost network flow (MCNF) problem. In an MCNF problem, networks are modeled as directed graphs with costs associated with the arcs and demands associated with the vertices. A feasible flow in the network is an assignment of values to the arcs that satisfies the demands; that is, for every vertex, the amount of flow in the arcs entering the vertex minus the amount of flow in the arcs leaving the vertex must equal the vertex demand. The total cost of a flow is the sum of the flow in the arcs multiplied by their respective costs. The MCNF is a feasible flow with minimum cost.

An MCNF problem can be solved by the network simplex algorithm, an efficient specialization of the original simplex algorithm. It can be shown that there is always a tree solution to an MCNF problem. The network simplex algorithm starts with an initial tree solution and, at each iteration, finds an improved tree solution by adding a new arc and removing another in such a way as not to increase the cost of the solution.

However, the graph associated with the MCNF problem may be very large. To overcome this difficulty, Dellamonica et al. (2007) propose a strategy that decomposes the problem into smaller subproblems. According to the proposed decomposition principle, once an optimal solution is found for a subproblem (defined on a subset of the whole lattice), it partially defines an optimal solution for the entire lattice. In other words, there exists an optimal solution for the entire lattice that, when restricted to the domain of the subproblem, exactly matches the solution of the subproblem. The subproblems in the proposed decomposition strategy correspond to solving the MCNF restricted to subsets that are ideals of the lattice. According to Dellamonica et al., a subset I ⊆ {0, 1}d is an ideal³ of the lattice {0, 1}d if for all u ∈ I the relation {v ∈ {0, 1}d | v ≤ u} ⊆ I holds.
Thus, the main steps of the algorithm are as follows: (1) generate an ideal, (2) solve the associated MCNF problem, (3) fix the values of the solution for the elements in the ideal, and (4) consider a larger ideal, until the whole lattice is covered. A key point exploited during these iterations is a simple extension of the tree solution corresponding to the smaller ideal to a feasible solution of the larger ideal. Details may be found in their work (Dellamonica et al., 2007).

³ Lattice ideals are usually defined as subsets that satisfy the property described in the text and that are also closed under the supremum operation: if u, v ∈ I, then u + v ∈ I. The + operation in this case is the logical bitwise OR.

In order to find an optimal solution for an MCNF problem, given a feasible tree solution, the algorithm must find an arc in the graph to enter the solution in such a way as to decrease the total cost. Since the graph associated with the MCNF problem may be huge, it is not feasible to store the entire graph in memory. A solution to this difficulty consists of keeping only the tree solution and generating the candidate arcs only when they
are needed. It is shown that some particularities of the problem allow a simple characterization of the candidate arcs. Again, details may be found in their work (Dellamonica et al., 2007).

This algorithm has some similarities to the graph search algorithm based on feasible sets described previously. A first similarity is that subproblems may be related to ideals at the bottom or at the top of the lattice (in the latter case called sup-ideals, defined similarly to ideals), allowing the optimal solution to be defined gradually for elements at the top and bottom parts of the lattice. A second similarity is the decomposition principle: in the graph search algorithm, once a feasible set is moved, the final solution for the elements in that set is fixed; the same happens to the elements in an ideal once the associated MCNF problem is solved. The code of the algorithm is available at the web page http://www.vision.ime.usp.br/nonlinear/stackfd.
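The downward-closure property that defines an ideal can be tested with a short sketch; checking only the immediate predecessors of each element suffices, by induction on the number of ones. The function name is illustrative.

```python
def is_ideal(I, d):
    # I is an ideal of {0,1}^d if u in I implies every v <= u is in I;
    # it is enough to check the covered predecessors (one 1-bit flipped to 0).
    I = set(I)
    for u in I:
        for bit in range(d):
            if u[bit] == 1:
                v = tuple(0 if i == bit else x for i, x in enumerate(u))
                if v not in I:
                    return False
    return True

assert is_ideal({(0, 0), (0, 1)}, 2)
assert not is_ideal({(0, 1), (1, 1)}, 2)   # (0,0) is missing below (0,1)
```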
V. APPLICATION EXAMPLES

A. Design Procedure

This section describes a procedure for optimal MAE stack-filter design from training data. Consider given a window W of size d and a set of training data {(fi1, fo1), (fi2, fo2), . . . , (fim, fom)} with m pairs of observed-ideal signals. Then, the design procedure consists of the following three steps:

1. Estimate P(u, b) = ∑_{t=1}^{k} Pt(u, b), u ∈ {0, 1}d, from the training data.
2. Compute cu for each observed pattern u from the probabilities estimated in Step (1). If u has not been observed, then set cu = 0.
3. Apply an algorithm that finds a PBF ψ that minimizes C(ψ) [see Eq. (33)].

The probabilities in Step (1) involve probabilities for each threshold level. To estimate them, let
• Nt be the number of observations through W in the cross sections at level t of the observed images,
• Nt(u, 1) be the number of times u is observed in the cross sections at level t with ideal value b = 1, and
• Nt(u, 0) be the number of times u is observed in the cross sections at level t with ideal value b = 0.
The probabilities Pt(u, 1) and Pt(u, 0) are estimated, respectively, by
P̂t(u, 1) = Nt(u, 1) / Nt

and

P̂t(u, 0) = Nt(u, 0) / Nt.

Since Nt = N for all t ∈ K (N being a positive integer), ∑_{t=1}^{k} Nt(u, 1)/Nt = (1/N) ∑_{t=1}^{k} Nt(u, 1), and the constant 1/N can be dropped. As a consequence, there is no need to estimate the probabilities Pt(u, b) separately for each of the threshold levels. This explains why occurrences of (u, b) on different cross sections can be pooled and just a single value P̂(u, b) computed.
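Step (1), with the pooling just described, can be sketched for 1-D signals. The window geometry and signal representation below are illustrative assumptions, not the chapter's image-based setting.

```python
from collections import Counter

def pooled_counts(observed, ideal, d, k):
    # Count occurrences of (u, b) pooled over all threshold levels t = 1..k,
    # sliding a 1-D window of size d (d odd) over the observed signal.
    counts = Counter()
    half = d // 2
    for t in range(1, k + 1):
        xs = [1 if x >= t else 0 for x in observed]   # cross section at level t
        ys = [1 if y >= t else 0 for y in ideal]
        for p in range(half, len(xs) - half):
            u = tuple(xs[p - half: p + half + 1])
            counts[(u, ys[p])] += 1
    return counts

counts = pooled_counts(observed=[0, 2, 0], ideal=[0, 2, 0], d=3, k=2)
# the pattern (0, 1, 0) is seen with ideal value 1 at both levels
```

The pooled counts play the role of the unnormalized P̂(u, b); the cost of pattern u is then proportional to counts[(u, 0)] − counts[(u, 1)].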
B. Examples

Three application examples are presented. The filters have been designed using the stackfd algorithm (see Section IV.C and Dellamonica et al., 2007), whose implementation is available at http://www.vision.ime.usp.br/nonlinear/stackfd. This section presents some examples of filters that can be obtained by training algorithms and does not evaluate the performance of the algorithms. Performance details of the main design algorithms can be found in the corresponding papers (optimal solution algorithm: Dellamonica et al., 2007; best heuristic algorithm: Yoo et al., 1999).

The first example considers images corrupted with salt-and-pepper and dropout noise. Figure 16 shows an image without noise, whereas
FIGURE 16 Gray-level (ideal) image “boat.”
FIGURE 17 Test image “boat” (MAE = 14.5995).
Figure 17 shows a corrupted image. The noise consists of 5% additive and 5% subtractive impulse noise (both with amplitude 200, with maximum and minimum saturated at 255 and 0, respectively) plus horizontal line segments of intensity 255 with probability of occurrence 0.35% and length following a normal distribution with mean 5 and variance 49 pixels. One pair of noisy-ideal images has been used to compute a 3 × 3 window stack filter and a 21-point (5 × 5 without the four corner points) window stack filter. The test image is an independent realization of the same noise type.

Figure 18 shows the output of the optimal 21-point window stack filter for the test image shown in Figure 17. Pixels at positions where the window does not fit entirely in the image domain have not been processed by the filter; the output value for these pixels has been set to 0. The MAE values were computed disregarding these pixels. Figure 19 shows the output of the 3 × 3 window stack filter for the same test image. To contrast with the effects of the median filter, Figure 20 shows the output of the 3 × 3 window median filter for the same test image. Observe that the median tends to blur more than the optimal stack filter.

For the same type of noise, stack filters trained with a given image tend to work for other images, not necessarily similar to the ones used to design the filter. Figure 21 shows the effect of the previous 21-point window stack filter on another image, with an independent realization of the same type of noise.
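Applying a designed stack filter follows the definition used throughout the chapter: threshold the window content at each level, evaluate the PBF on each cross section, and add up the binary outputs. A minimal 1-D sketch is given below; since a trained ψ is not reproduced here, the 3-point median (a PBF) stands in for it.

```python
def stack_filter(signal, psi, d, k):
    # Apply the stack filter defined by the PBF `psi` to a multilevel signal
    # with values in {0, ..., k}: sum the binary outputs of psi over the k
    # cross sections (threshold decomposition).
    half = d // 2
    out = []
    for p in range(half, len(signal) - half):
        window = signal[p - half: p + half + 1]
        out.append(sum(psi(tuple(1 if x >= t else 0 for x in window))
                       for t in range(1, k + 1)))
    return out

# the 3-point median is a PBF: psi(u) = 1 iff at least two inputs are 1
median3 = lambda u: 1 if sum(u) >= 2 else 0
y = stack_filter([0, 5, 0, 5, 5, 0], median3, d=3, k=5)   # [0, 5, 5, 5]
```

With this PBF the output coincides with direct 3-point median filtering, as the threshold decomposition property guarantees.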
FIGURE 18 Output of the d = 21 optimal stack filter (MAE = 2.4411).
FIGURE 19 Output of the optimal 3×3 stack filter (MAE = 3.0499).
FIGURE 20 Output of the 3 × 3 median filter (MAE = 5.1266).
The second example considers salt-and-pepper noise. Figure 22 shows the filtering effect of the 5 × 3 window optimal stack filter computed from one pair of training images. The noise consists of 3% salt-and-pepper noise (3% additive and 3% subtractive impulse noise, both with amplitude 200, with maximum and minimum saturated at 255 and 0, respectively).

The third example considers the effect of increasing window sizes. Binary images are considered to allow better visual perception. Figure 23 shows a noisy image, the respective ideal output, and a sequence of outputs obtained from filters based on increasing window sizes. As can be seen, as the window size increases, the respective MAE decreases. In fact, if actual costs were used, this behavior would always hold, with the MAE eventually becoming constant but never increasing. However, in practice, since filters are computed from estimated costs, from some window size on the MAE starts to increase due to estimation imprecision (this is known as the curse of dimensionality).
FIGURE 21 Effect of the 21-point window optimal stack filter trained with the “boat” images on a different image with same type of noise. (a) Ideal image. (b) Test (MAE = 14.2919). (c) Filtered (MAE = 2.3418).

VI. CONCLUSION

An extensive overview of stack filters, including their definition, some of their properties, their relation to morphological operators, the fact that they
are a generalization of median and rank-order filters and their variations, the characterization of the MAE on multilevel signals in terms of the MAEs of binary cross sections of those signals, the main design approaches from training data, and some application examples, has been presented. One of the most important properties of the class of stack filters is its equivalence to the class of positive Boolean functions. Another important result is the MAE theorem, which relates the MAE of a stack filter with respect to multilevel signals to a linear combination of the MAEs of the corresponding PBF with respect to the binary cross sections of these multilevel signals.
FIGURE 22 Filtering of salt-and-pepper noise: effect of the 5 × 3 window optimal stack filter. (a) Ideal image. (b) Test (MAE = 7.2031). (c) Filtered (MAE = 2.2941).

FIGURE 23 Binary image filtering: effect of increasing window sizes. (a) Test (MAE = 0.1051). (b) Ideal image. (c) d = 9 (MAE = 0.0259). (d) d = 15 (MAE = 0.0129). (e) d = 21 (MAE = 0.0087). (f) d = 24 (MAE = 0.0083).
These two results allow the reduction of the problem of designing stack filters to the problem of designing PBFs. Design of PBFs can be modeled as a linear programming problem. However, the number of variables and constraints in the problem is exponential
Stack Filters: From Definition to Design Algorithms
43
in the window size. This fact kept the problem unsolvable on conventional computers until recently, and several heuristic solutions have been proposed to overcome this limitation. The heuristic design approaches covered in this chapter are adaptive algorithms and those based on graph search techniques, both suboptimal approaches. A recently proposed algorithm that provides an exact optimal solution has also been described.

The algorithm that produces an optimal solution was reported to have solved problem instances for d = 25 as fast as, or even faster than, the fastest heuristic algorithms. The optimality and speed are achieved at the expense of significantly higher memory requirements: solving an instance for d = 25 requires ∼3.5 GB of memory, whereas the fastest heuristic algorithms require ∼250 KB. This fact makes heuristic solutions still very attractive. However, although heuristic solutions do converge to the optimal solution, there is no knowledge of how fast they converge. From a practical perspective, it is necessary to fix the number of iterations or to stop iterating when the decrease in error between two successive iterations becomes negligible. The existence of an efficient algorithm for the computation of optimal solutions makes possible experimental research to investigate the convergence behavior of the iterative heuristic algorithms (how fast they converge, how the type of noise affects convergence speed, etc.).

As mentioned, the computation of the optimal solution requires a large memory space. The memory requirement will probably be satisfied by technological advances in hardware components, allowing larger instances to be solved. This may give the false impression that there are no challenges left for the design of stack filters. However, the main issue in the process of designing image operators from training data that still needs to be addressed is the difficulty in obtaining training data.
This limitation affects the precision of the statistical estimation and, consequently, the performance of the designed filter. While it is relatively easy to edit images and produce the ideal outputs for binary images, the same task is much more complex for gray-level images. In practice, given a fixed amount of training data, there is a maximum window size that corresponds to the minimum error; operators designed for windows larger than that will present a larger error. This phenomenon is known as the curse of dimensionality, and it is due to overfitting (excessive adjustment to training data that do not reflect the true distribution with high precision). A possible means of improving the error performance of the designed filters, for a fixed amount of training data, is to consider multilevel training. At each training level, filters are designed on distinct windows of moderate size and, at the last level, these filters are composed, resulting in a multilevel filter that ultimately depends on a larger window.
Aside from the issue related to the precision of estimations from training data, knowledge on stack filters has already reached a very mature stage, including efficient and interesting design algorithms. By reporting a broad overview of this knowledge, this work may contribute to the dissemination of the use of stack filters together with these algorithms, and thereby promote advances both on the precision issue above and in the development of new classes of nonlinear filters and design algorithms.
ACKNOWLEDGMENTS N. S. T. Hirata acknowledges partial support from CNPq (Brazil), grant 312482/2006-0.
REFERENCES

Astola, J. T., Alaya-Cheikh, F., and Gabbouj, M. (1994). When are two weighted order statistic filters identical? In “Nonlinear Image Processing V” (E. R. Dougherty, J. T. Astola, and H. G. Longbotham, eds.), Proc. IS&T/SPIE Symposium on Electronic Imaging Science & Technology, vol. 2180, pp. 45–54, San Jose, California, February 6–10, 1994.
Banon, G. J. F., and Barrera, J. (1991). Minimal representations for translation-invariant set mappings by mathematical morphology. SIAM J. Appl. Math. 51, 1782–1798.
Banon, G. J. F., and Barrera, J. (1993). Decomposition of mappings between complete lattices by mathematical morphology, part I. General lattices. Signal Process. 30, 299–327.
Bovik, A. C. (1987). Streaking in median filtered images. IEEE Trans. Acoust. Speech Signal Process. ASSP-35, 493–503.
Bovik, A. C., Huang, T. S., and Munson, D. C. Jr. (1983). A generalization of median filtering using linear combinations of order statistics. IEEE Trans. Acoust. Speech Signal Process. 31, 1342–1350.
Bovik, A. C., Huang, T. S., and Munson, D. C. Jr. (1987). The effect of median filtering on edge estimation and detection. IEEE Trans. Pattern Anal. 9, 191–194.
Brownrigg, D. R. K. (1984). The weighted median filter. Commun. ACM 27, 807–818.
Carla, R., Sacco, V. M., and Baronti, S. (1986). Digital techniques for noise reduction in APT NOAA satellite images. In Proc. IGARSS 86, 995–1000.
Cook, W. J., Cunningham, W. H., Pulleyblank, W. R., and Schrijver, A. (1998). Combinatorial Optimization. John Wiley & Sons, New York.
Coyle, E. J., and Lin, J.-H. (1988). Stack filters and the mean absolute error criterion. IEEE Trans. Acoust. Speech Signal Process. 36, 1244–1254.
Coyle, E. J., Lin, J.-H., and Gabbouj, M. (1989). Optimal stack filtering and the estimation and structural approaches to image processing. IEEE Trans. Acoust. Speech Signal Process. 37, 2037–2066.
Dellamonica, D. Jr., Silva, P. J. S., Humes, C. Jr., Hirata, N. S. T., and Barrera, J. (2007). An exact algorithm for optimal MAE stack filter design. IEEE Trans. Image Process. 16, 453–462.
Dougherty, E. R., and Astola, J. T., eds. (1999). Nonlinear Filters for Image Processing. The International Society for Optical Engineering and IEEE Press, New York.
Doval, A. B. G., Mohan, A. K., and Prasad, M. K. (1998). Evolutionary algorithm for the design of stack filters specified using selection probabilities. In Adaptive Computing in Design and Manufacture.
Fitch, J. P., Coyle, E. J., and Gallagher, N. C. Jr. (1984). Median filtering by threshold decomposition. IEEE Trans. Acoust. Speech Signal Process. ASSP-32, 1183–1188.
Frieden, B. (1976). A new restoring algorithm for the preferential enhancement of edge gradients. J. Opt. Soc. Am. 66, 280–283.
Gabbouj, M., and Coyle, E. J. (1990). Minimum mean absolute error stack filtering with structural constraints and goals. IEEE Trans. Acoust. Speech Signal Process. 38, 955–968.
Gabbouj, M., and Coyle, E. J. (1991). On the LP which finds a MMAE stack filter. IEEE Trans. Signal Process. 39, 2419–2424.
Gallagher, N. C. Jr., and Wise, G. L. (1981). A theoretical analysis of the properties of median filters. IEEE Trans. Acoust. Speech Signal Process. ASSP-29, 1136–1141.
Gilbert, E. N. (1954). Lattice-theoretic properties of frontal switching functions. J. Math. Phys. 33, 57–67.
Grochulski, W., Mitraszewski, P., and Penczek, P. (1985). Application of combined median-averaging filters to scintigraphic image processing. Nucl. Med. 24, 164–168.
Han, C.-C., and Fan, K.-C. (1997). Finding of optimal stack filter by graphic searching methods. IEEE Trans. Signal Process. 45, 1857–1862.
Haralick, R. M., Sternberg, S. R., and Zhuang, X. (1987). Image analysis using mathematical morphology. IEEE Trans. Pattern Anal. PAMI-9, 532–550.
Heijmans, H. J. A. M. (1994). Morphological Image Operators. Academic Press, Boston.
Heygster, G. (1982). Rank filters in digital image processing. Comput. Graph. Image Process. 19, 148–164.
Hirata, N. S. T., Dougherty, E. R., and Barrera, J. (2000). A switching algorithm for design of optimal increasing binary filters over large windows. Pattern Recogn. 33, 1059–1081.
Huang, T. S., ed. (1981). Two-Dimensional Digital Signal Processing II: Transforms and Median Filters. Springer-Verlag, New York.
Jayant, N. (1976). Average and median-based smoothing techniques for improving digital speech quality in the presence of transmission errors. IEEE Trans. Commun. 24, 1043–1045.
Justusson, B. I. (1981). Median filtering: Statistical properties. In “Topics in Applied Physics, Two-Dimensional Digital Signal Processing II” (Huang, T. S., ed.). Springer-Verlag, New York.
Kleitman, D. J. (1969). On Dedekind’s problem: The number of monotone Boolean functions. Proc. Am. Math. Soc. 21, 677–682.
Kleitman, D. J., and Markowsky, G. (1975). On Dedekind’s problem: The number of isotone Boolean functions II. Trans. Am. Math. Soc. 213, 373–390.
Ko, S.-J., and Lee, Y. H. (1991). Center weighted median filters and their applications to image enhancement. IEEE Trans. Circuits Syst. 38, 984–993.
Lee, W.-L., Fan, K.-C., and Chen, Z.-M. (1999). Design of optimal stack filters under the MAE criterion. IEEE Trans. Signal Process. 47, 3345–3355.
Lee, Y. H., and Kassam, S. A. (1985). Generalized median filtering and related nonlinear filtering techniques. IEEE Trans. Acoust. Speech Signal Process. ASSP-33, 672–683.
Lin, J.-H., and Kim, Y.-T. (1994). Fast algorithms for training stack filters. IEEE Trans. Signal Process. 42, 772–781.
Lin, J.-H., Sellke, T. M., and Coyle, E. J. (1990). Adaptive stack filtering under the mean absolute error criterion. IEEE Trans. Acoust. Speech Signal Process. 38, 938–954.
Loupas, T., McDicken, N., and Allan, P. I. (1987). Noise reduction in ultrasonic images by digital filtering. Br. J. Radiol. 60, 389–392.
Ma, H., Zhou, J., Ma, L., and Tang, Y. Y. (2002). Order statistic filters (OSF): A novel approach to document analysis. Int. J. Pattern Recog. 16, 551–571.
Maragos, P. (1989). A representation theory for morphological image and signal processing. IEEE Trans. Pattern Anal. 11, 586–599.
Maragos, P., and Schafer, R. W. (1987a). Morphological filters: Part I: Their set-theoretic analysis and relations to linear shift-invariant filters. IEEE Trans. Acoust. Speech Signal Process. 35, 1153–1169.
46
Nina S. T. Hirata
Maragos, P., and Schafer, R. W. (1987 b). Morphological filters: Part II: Their relations to median, order statistic, and stack-filters. IEEE Trans. Acoust. Speech Signal Process. 35, 1170–1184 (corrections in ASSP 37, April 1989, p. 597 ). Marshall, S. and Sicuranza, G. L., eds. (2006). Advances in Nonlinear Signal and Image Processing. EURASIP Book Series on SP&C. Hindawi Publishing Corporation, New York. Matheron, G. (1975). Random Sets and Integral Geometry. John Wiley, New York. Mitra, S. and Sicuranza, G., eds. (2000). Nonlinear Image Processing. Academic Press, New York. Muroga, S. (1971). Threshold Logic and Its Applications. Wiley, New York. Narendra, P. M. (1981). A separable median filter for image noise smoothing. IEEE Trans. on pattern Analysis and machine Intelligence, 3(1), 20–29. Nieminen, A., Heinonen, P., and Neuvo, Y. (1987). A new class of detail-preserving filters for image processing. IEEE Trans. Pattern Anal. 9, 74–90. Nodes, T. A., and Gallagher, N. C. (1982). Median filters: Some modifications and their properties. IEEE Trans. Acoust. Speech Signal Process. 30, 739–746. Pitas, I., and Venetsanopoulos, A. N. (1990). Nonlinear Digital Filters–Principles and Applications. Kluwer Academic Publishers, Amsterdam. Pitas, I., and Venetsanopoulos, A. N. (1992). Order statistics in digital image processing. Proc. IEEE 80, 1893–1192. Prasad, M. K. (2005). Stack filter design using selection probabilities. IEEE Trans. Signal Process. 53, 1025–1037. Prasad, M. K., and Lee, Y. H. (1989). Weighted median filters: Generation and properties. In IEEE International Symposium on Circuits and Systems 1, pp. 425–428. Prasad, M. K., and Lee, Y. H. (1994). Stack filters and selection probabilities. IEEE Trans. Signal Process. 42, 2628–2643. Pratt, W. K. (1978). Digital Image Processing. Wiley Interscience, New York. Rabiner, L. R., Sambur, M. R., and Schmidt, C. E. (1975). Applications of nonlinear smoothing algorithm to speech processing. IEEE Trans. Acoust. 
Speech Signal Process. 23, 552–557. Rosenfeld, A., and Kak, A. C. (1982). Digital Picture Processing, vol. 2. Academic Press, New York. Schmitt, R. M., Meyer, C. R., Carson, P. L., and Samuels, B. I. (1984). Median and spatial low-pass filtering in ultrasonic computed tomography. Med. Phys. 11, 767–771. Scollar, I., Weidner, B., and Huang, T. S. (1984). Image enhancement using the median and the interquantile distance. Comput. Vision Graphics Image Processing 25, 236–251. Serra, J. (1982). Image Analysis and Mathematical Morphology. Academic Press, New York. Serra, J. (1988). Image Analysis and Mathematical Morphology. vol 2: Theoretical Advances. Academic Press, New York. Serra, J., and Vincent, L. (1992). An overview of morphological filtering. Circuits Systems Signal Process. 11, 47–108. Shmulevich, I., Melnik, V., and Egiazarian, K. (2000). The use of sample selection probabilities for stack filter design. IEEE Signal Process. Lett. 7, 189–192. Soille, P. (2002). On morphological operators based on rank filters. Pattern Recogn. 35, 527–535. Soille, P. (2003). Morphological Image Analysis, 2nd ed. Springer-Verlag, Berlin. Sun, T., Gabbouj, M., and Neuvo, Y. (1994). Center weighted median filters: Some properties and their applications in image processing. Signal Process. 35, 213–229. T˘abu¸s, I., and Dumitrescu, B. (1999). A new fast method for training stack filters (Çetin, ¨ un, ¨ A., Gurcan, M. N., and Yardimci, Y., eds.), pp. 511–515. In IEEEA. E., Akarun, L., Ertuz EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP’99), Antalya, Turkey. T˘abu¸s, I., Petrescu, D., and Gabbouj, M. (1996). A training framework for stack and Boolean filtering—fast optimal design procedures and robustness case study. IEEE Trans. Image Process. 5, 809–826. Tukey, J. (1974). Nonlinear (nonsuperposable) methods for smoothing data. In Cong. Rec., EASCON, page 673. Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley, Reading, MA.
Stack Filters: From Definition to Design Algorithms
47
Tyan, S. G. (1982). Median filtering: Deterministic properties. In Digital Signal Processing II. Transforms and Median Filters (Huang, T. S., ed.) Springer-Verlag, New York. Undrill, P. E., Delibasis, K., and Cameron, G. G. (1997). Stack filter design using a distributed pararell implementation of genetic algorithms. J. UCS 3, 821–834. Wecksung, G., and Campbell, K. (1974). Digital image processing at EC&G. Computer 7, 63–71. Wendt, P. D., Coyle, E. J., and Gallagher, N. C. Jr. (1986). Stack filters. IEEE Trans. Acoust. Speech Signal Process. 34, 898–911. Wong, K. M., and Chen, S. (1987). Detection of narrow-band sonar signals using order statistical filters. IEEE Trans. Acoust. Speech Signal Process. 35, 597–613. Yin, L. (1995). Stack filter design: A structural approach. IEEE Trans. Signal Process. 43, 831–840. Yin, L., Yang, R., Gabbouj, M., and Neuvo, Y. (1996). Weighted median filters: A tutorial. IEEE Trans. Circuits Syst. II: Analog and Digital Signal Processing 43, 157–192. Yli-Harja, O., Astola, J. T., and Neuvo, Y. (1991). Analysis of the properties of median and weighted median filters using threshold logic and stack filter representation. IEEE Trans. Signal Process. 39, 395–410. Yoo, J., Fong, K. L., Huang, J. -J., Coyle, E. J., and Adams, G. B. III. (1999). A fast algorithm for designing stack filters. IEEE Trans. Image Process. 8, 1014–1028. Zeng, B. (1996). Design of optimal stack filters: a neural net approach with bp algorithm. In IEEE International Conference on Systems, Man, and Cybernetics 4, 2762–2767.
This page intentionally left blank
CHAPTER 2

The Foldy–Wouthuysen Transformation Technique in Optics

Sameen Ahmed Khan*

Contents

I. Introduction
II. The Foldy–Wouthuysen Transformation
III. Quantum Formalism of Charged-Particle Beam Optics
IV. Quantum Methodologies in Light Beam Optics
V. Conclusion
Appendix A
Appendix B
Acknowledgments
References
* Engineering Department, Salalah College of Technology, Salalah, Sultanate of Oman

Advances in Imaging and Electron Physics, Volume 152, ISSN 1076-5670, DOI: 10.1016/S1076-5670(08)00602-2. Copyright © 2008 Elsevier Inc. All rights reserved.

I. INTRODUCTION

The Foldy–Wouthuysen transform is widely used in high-energy physics. It was formulated by Leslie Lawrence Foldy and Siegfried Adolf Wouthuysen in 1949 to understand the nonrelativistic limit of the Dirac equation, the equation for spin-1/2 particles (Foldy and Wouthuysen, 1950; Foldy, 1952; see also Pryce, 1948; Tani, 1951; see Acharya and Sudarshan, 1960, for a detailed general discussion of Foldy–Wouthuysen-type transformations in the particle interpretation of relativistic wave equations). The approach of Foldy and Wouthuysen used a canonical transform that has come to be known as the Foldy–Wouthuysen transformation (a brief account of the history of the transformation is found in the obituaries of Foldy and Wouthuysen [Brown et al., 2001; Leopold, 1997] and the biographical memoir of Foldy [2006]). Before their work, there was some difficulty in understanding and gathering all the interaction terms of a
given order, such as those for a Dirac particle immersed in an external field. Their procedure clarified the physical interpretation of the terms, and it became possible to apply their work systematically to a number of problems that had previously defied solution (see Bjorken and Drell, 1964; Costella and McKellar, 1995, for technical details). The Foldy–Wouthuysen transform was extended to the physically important cases of spin-0 and spin-1 particles (Case, 1954) and was even generalized to the case of arbitrary spins (Jayaraman, 1975). The powerful machinery of the Foldy–Wouthuysen transform has found applications in very diverse areas, such as atomic systems (Asaga et al., 2000; Pachucki, 2004), synchrotron radiation (Lippert et al., 1994), and the derivation of the Bloch equation for polarized beams (Heinemann and Barber, 1999). The application of the Foldy–Wouthuysen transformation in acoustics is very natural; comprehensive and mathematically rigorous accounts can be found in Fishman (1994, 2004), Orris and Wurmser (1995), and Wurmser (2001, 2004). For ocean acoustics, see Patton (1986).

In the traditional scheme, the purpose of expanding the light optics Hamiltonian \(\hat{H} = -\left(n^2(\mathbf{r}) - \hat{p}_\perp^2\right)^{1/2}\) in a series, using \(\frac{1}{2n_0}\hat{p}_\perp^2\) as the expansion parameter, is to understand the propagation of the quasiparaxial beam in terms of a series of approximations (paraxial + nonparaxial). A similar situation holds in charged-particle optics. In relativistic quantum mechanics there exists a similar problem of understanding the relativistic wave equations as the nonrelativistic approximation plus relativistic correction terms in the quasirelativistic regime. For the Dirac equation (which is first order in time) this is done most conveniently by using the Foldy–Wouthuysen transformation, leading to an iterative diagonalization technique. The main framework of the newly developed formalisms of optics (both light optics and charged-particle optics) is based on the transformation technique of the Foldy–Wouthuysen theory, which casts the Dirac equation in a form displaying the different interaction terms between the Dirac particle and an applied electromagnetic field in a nonrelativistic and easily interpretable form. In the Foldy–Wouthuysen theory the Dirac equation is decoupled through a canonical transformation into two two-component equations: one reduces to the Pauli equation (Osche, 1977) in the nonrelativistic limit, and the other describes the negative-energy states. There is a close algebraic analogy between (1) the Helmholtz equation (governing scalar optics) and the Klein–Gordon equation and (2) the matrix form of Maxwell's equations (governing vector optics) and the Dirac equation. Thus, it is logical to use the powerful machinery of standard quantum mechanics (particularly the Foldy–Wouthuysen transform) in analyzing these systems. The suggestion to use the Foldy–Wouthuysen transformation technique in the Helmholtz equation was first mentioned in the literature as a remark
(Fishman and McCoy, 1984). The same idea was independently outlined by Jagannathan and Khan (see p. 277 of Jagannathan and Khan, 1996). Only in recent works has this idea been exploited to analyze the quasiparaxial approximations for specific beam optical systems (Khan, 2005a; Khan et al., 2002). The Foldy–Wouthuysen technique is ideally suited for the Lie algebraic approach to optics. Despite all these positive features, namely its power and its ambiguity-free expansion, the Foldy–Wouthuysen transformation is still seldom used in optics. The Foldy–Wouthuysen transformation results in nontraditional prescriptions of Helmholtz optics (Khan, 2005a) and Maxwell optics (Khan, 2006b), respectively. The nontraditional approaches give rise to interesting wavelength-dependent modifications of the paraxial and aberrating behavior. The nontraditional formalism of Maxwell optics provides a unified framework of light beam optics and polarization. The nontraditional prescriptions of light optics are in close analogy with the quantum theory of charged-particle beam optics (Conte et al., 1996; Jagannathan et al., 1989; Jagannathan, 1990, 1993, 1999, 2002, 2003; Jagannathan and Khan, 1995, 1996, 1997; Khan and Jagannathan, 1993, 1994, 1995; Khan, 1997, 1999a, 1999b, 2001, 2002a, 2002b, 2002c). The following sections provide details of the standard Foldy–Wouthuysen transform, together with an outline of the quantum theory of charged-particle beam optics and the nontraditional prescriptions of light optics. A comprehensive account can be found in the references. The Feshbach–Villars technique, adapted from quantum mechanics to linearize the Klein–Gordon equation, is described in Appendix A. An exact matrix representation of the Maxwell equations is presented in Appendix B.
II. THE FOLDY–WOUTHUYSEN TRANSFORMATION

The standard Foldy–Wouthuysen theory is described briefly to clarify its use for the purposes of the above studies in optics. Let us consider a charged particle of rest mass m₀ and charge q in the presence of an electromagnetic field characterized by E = −∇φ − ∂A/∂t and B = ∇ × A. Then the Dirac equation is
\[
i\hbar\,\frac{\partial}{\partial t}\Psi(\mathbf{r}, t) = \hat{H}_D\,\Psi(\mathbf{r}, t), \tag{1}
\]
\[
\hat{H}_D = m_0 c^2\beta + q\phi + c\,\boldsymbol{\alpha}\cdot\hat{\boldsymbol{\pi}}
= m_0 c^2\beta + \hat{\mathcal{E}} + \hat{\mathcal{O}}, \qquad
\hat{\mathcal{E}} = q\phi, \qquad \hat{\mathcal{O}} = c\,\boldsymbol{\alpha}\cdot\hat{\boldsymbol{\pi}}, \tag{2}
\]
where
\[
\boldsymbol{\alpha} = \begin{pmatrix} 0 & \boldsymbol{\sigma} \\ \boldsymbol{\sigma} & 0 \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \mathbb{1} & 0 \\ 0 & -\mathbb{1} \end{pmatrix}, \qquad
\mathbb{1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\]
\[
\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad
\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{3}
\]

with π̂ = p̂ − qA, p̂ = −iℏ∇, and π̂² = π̂ₓ² + π̂ᵧ² + π̂_z². In the nonrelativistic situation the upper pair of components of the Dirac spinor Ψ are large compared to the lower pair. The operator Ê, which does not couple the large and small components of Ψ, is called "even," and Ô, which couples the large components to the small ones, is called "odd." Note that
\[
\beta\hat{\mathcal{O}} = -\hat{\mathcal{O}}\beta, \qquad \beta\hat{\mathcal{E}} = \hat{\mathcal{E}}\beta. \tag{4}
\]
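As a quick sanity check, the algebra in Eqs. (3) and (4) can be verified numerically. The sketch below is purely illustrative (it assumes units with ℏ = c = 1, an arbitrarily chosen kinetic momentum, and a constant scalar potential standing in for qφ); it builds the 4 × 4 matrices and confirms the anticommutation relations and the even/odd properties:

```python
import numpy as np

# Dirac matrices in the standard representation, built from the Pauli matrices
I2, Z2 = np.eye(2), np.zeros((2, 2))
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
alpha = [np.block([[Z2, s], [s, Z2]]) for s in (sx, sy, sz)]
beta = np.block([[I2, Z2], [Z2, -I2]])

anti = lambda A, B: A @ B + B @ A

# alpha_i alpha_j + alpha_j alpha_i = 2 delta_ij, and each alpha_i anticommutes with beta
for i in range(3):
    assert np.allclose(anti(alpha[i], beta), np.zeros((4, 4)))
    for j in range(3):
        assert np.allclose(anti(alpha[i], alpha[j]), 2 * (i == j) * np.eye(4))

# Even/odd relations of Eq. (4): beta O = -O beta, beta E = E beta
pi_vec = np.array([0.1, 0.2, 0.3])             # illustrative kinetic momentum (c = 1)
O = sum(p * a for p, a in zip(pi_vec, alpha))  # odd operator, O = c alpha.pi
E = 0.5 * np.eye(4)                            # even operator, q*phi as a scalar
assert np.allclose(beta @ O, -(O @ beta))
assert np.allclose(beta @ E, E @ beta)
```

The values of `pi_vec` and the stand-in potential are arbitrary; any choice leaves the (anti)commutation structure unchanged.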
The search is for a unitary transformation, Ψ → Ψ′ = ÛΨ, such that the equation for Ψ′ does not contain any odd operator. In the free-particle case (with φ = 0 and π̂ = p̂) such a Foldy–Wouthuysen transformation is denoted by

\[
\Psi \longrightarrow \Psi' = \hat{U}_F\,\Psi, \qquad
\hat{U}_F = e^{i\hat{S}} = e^{\beta\,\boldsymbol{\alpha}\cdot\hat{\mathbf{p}}\,\theta}, \qquad
\tan 2|\hat{\mathbf{p}}|\theta = \frac{|\hat{\mathbf{p}}|}{m_0 c}. \tag{5}
\]
This transformation eliminates the odd part completely from the free-particle Dirac Hamiltonian, reducing it to the diagonal form:
\[
i\hbar\,\frac{\partial \Psi'}{\partial t}
= e^{i\hat{S}}\left(m_0 c^2\beta + c\,\boldsymbol{\alpha}\cdot\hat{\mathbf{p}}\right)e^{-i\hat{S}}\,\Psi'
\]
\[
= \left(\cos|\hat{\mathbf{p}}|\theta + \frac{\beta\,\boldsymbol{\alpha}\cdot\hat{\mathbf{p}}}{|\hat{\mathbf{p}}|}\sin|\hat{\mathbf{p}}|\theta\right)
\left(m_0 c^2\beta + c\,\boldsymbol{\alpha}\cdot\hat{\mathbf{p}}\right)
\left(\cos|\hat{\mathbf{p}}|\theta - \frac{\beta\,\boldsymbol{\alpha}\cdot\hat{\mathbf{p}}}{|\hat{\mathbf{p}}|}\sin|\hat{\mathbf{p}}|\theta\right)\Psi'
\]
\[
= \left(m_0 c^2\cos 2|\hat{\mathbf{p}}|\theta + c|\hat{\mathbf{p}}|\sin 2|\hat{\mathbf{p}}|\theta\right)\beta\,\Psi'
= \sqrt{m_0^2 c^4 + c^2\hat{\mathbf{p}}^2}\;\beta\,\Psi'. \tag{6}
\]
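The free-particle diagonalization of Eqs. (5) and (6) can likewise be checked numerically for a fixed momentum eigenvalue. The sketch below is illustrative (the unit choice m₀ = c = 1 and the particular p are assumptions); it uses the closed form e^{βα·pθ} = cos(|p|θ) + (βα·p/|p|) sin(|p|θ), which follows from (βα·p)² = −p²:

```python
import numpy as np

# Dirac matrices in the standard representation
I2, Z2 = np.eye(2), np.zeros((2, 2))
sigmas = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]
alpha = [np.block([[Z2, s], [s, Z2]]) for s in sigmas]
beta = np.block([[I2, Z2], [Z2, -I2]])

m0, c = 1.0, 1.0                      # illustrative units
p = np.array([0.3, -0.2, 0.5])        # fixed momentum eigenvalue (arbitrary choice)
pmag = np.linalg.norm(p)
alpha_p = sum(pi * a for pi, a in zip(p, alpha))

H_free = m0 * c**2 * beta + c * alpha_p

# Foldy-Wouthuysen angle: tan(2|p| theta) = |p| / (m0 c), Eq. (5)
theta = np.arctan(pmag / (m0 * c)) / (2 * pmag)
# U_F = exp(beta alpha.p theta) in closed form, since (beta alpha.p)^2 = -p^2
U = np.cos(pmag * theta) * np.eye(4) + (beta @ alpha_p / pmag) * np.sin(pmag * theta)

# Conjugation diagonalizes H_free to sqrt(m0^2 c^4 + c^2 p^2) * beta, Eq. (6)
H_fw = U @ H_free @ U.conj().T
E_p = np.sqrt(m0**2 * c**4 + c**2 * pmag**2)
```

Since the exponent βα·pθ is anti-Hermitian, U is unitary and its inverse is simply the conjugate transpose.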
Generally, when the electron is in a time-dependent electromagnetic field, it is not possible to construct an exp(iŜ) that removes the odd
operators from the transformed Hamiltonian completely. This necessitates a nonrelativistic expansion of the transformed Hamiltonian in a power series in 1/m₀c², keeping terms through any desired order. Note that in the nonrelativistic case, when |p̂| ≪ m₀c, the transformation operator is Û_F = exp(iŜ) with Ŝ ≈ −iβÔ/(2m₀c²), where Ô = cα·p̂ is the odd part of the free Hamiltonian. So, in the general case we can start with the transformation
\[
\Psi^{(1)} = e^{i\hat{S}_1}\,\Psi, \qquad
\hat{S}_1 = -\frac{i\beta\hat{\mathcal{O}}}{2m_0 c^2} = -\frac{i\beta\,\boldsymbol{\alpha}\cdot\hat{\boldsymbol{\pi}}}{2m_0 c}. \tag{7}
\]
Then the equation for Ψ^{(1)} is

\[
i\hbar\,\frac{\partial}{\partial t}\Psi^{(1)}
= i\hbar\,\frac{\partial}{\partial t}\left(e^{i\hat{S}_1}\Psi\right)
= i\hbar\left(\frac{\partial}{\partial t}e^{i\hat{S}_1}\right)\Psi + e^{i\hat{S}_1}\,i\hbar\,\frac{\partial}{\partial t}\Psi
\]
\[
= \left[i\hbar\left(\frac{\partial}{\partial t}e^{i\hat{S}_1}\right)e^{-i\hat{S}_1} + e^{i\hat{S}_1}\hat{H}_D\,e^{-i\hat{S}_1}\right]\Psi^{(1)}
= \left[e^{i\hat{S}_1}\hat{H}_D\,e^{-i\hat{S}_1} - i\hbar\,e^{i\hat{S}_1}\frac{\partial}{\partial t}\left(e^{-i\hat{S}_1}\right)\right]\Psi^{(1)}
= \hat{H}_D^{(1)}\Psi^{(1)}, \tag{8}
\]

where we have used the identity \(\left(\frac{\partial}{\partial t}e^{\hat{A}}\right)e^{-\hat{A}} + e^{\hat{A}}\,\frac{\partial}{\partial t}e^{-\hat{A}} = \frac{\partial}{\partial t}\hat{I} = 0\). Now, using the two identities
\[
e^{\hat{A}}\,\hat{B}\,e^{-\hat{A}}
= \hat{B} + [\hat{A}, \hat{B}] + \frac{1}{2!}\,[\hat{A}, [\hat{A}, \hat{B}]] + \frac{1}{3!}\,[\hat{A}, [\hat{A}, [\hat{A}, \hat{B}]]] + \cdots,
\]
\[
e^{\hat{A}(t)}\,\frac{\partial}{\partial t}\,e^{-\hat{A}(t)}
\approx -\frac{\partial\hat{A}(t)}{\partial t}
- \frac{1}{2!}\left[\hat{A}(t), \frac{\partial\hat{A}(t)}{\partial t}\right]
- \frac{1}{3!}\left[\hat{A}(t), \left[\hat{A}(t), \frac{\partial\hat{A}(t)}{\partial t}\right]\right]
- \frac{1}{4!}\left[\hat{A}(t), \left[\hat{A}(t), \left[\hat{A}(t), \frac{\partial\hat{A}(t)}{\partial t}\right]\right]\right], \tag{9}
\]

with Â = iŜ₁, we find

\[
\hat{H}_D^{(1)} \approx \hat{H}_D - \hbar\,\frac{\partial\hat{S}_1}{\partial t}
+ i\left[\hat{S}_1,\; \hat{H}_D - \frac{\hbar}{2}\,\frac{\partial\hat{S}_1}{\partial t}\right]
- \frac{1}{2!}\left[\hat{S}_1, \left[\hat{S}_1,\; \hat{H}_D - \frac{\hbar}{3}\,\frac{\partial\hat{S}_1}{\partial t}\right]\right]
- \frac{i}{3!}\left[\hat{S}_1, \left[\hat{S}_1, \left[\hat{S}_1,\; \hat{H}_D - \frac{\hbar}{4}\,\frac{\partial\hat{S}_1}{\partial t}\right]\right]\right]. \tag{10}
\]
Substituting Ĥ_D = m₀c²β + Ê + Ô in Eq. (10), simplifying the right-hand side using the relations βÔ = −Ôβ and βÊ = Êβ, and collecting the terms together yields
\[
\hat{H}_D^{(1)} \approx m_0 c^2\beta + \hat{\mathcal{E}}_1 + \hat{\mathcal{O}}_1,
\]
\[
\hat{\mathcal{E}}_1 \approx \hat{\mathcal{E}} + \frac{1}{2m_0 c^2}\,\beta\hat{\mathcal{O}}^2
- \frac{1}{8m_0^2 c^4}\left[\hat{\mathcal{O}},\; [\hat{\mathcal{O}}, \hat{\mathcal{E}}] + i\hbar\,\frac{\partial\hat{\mathcal{O}}}{\partial t}\right]
- \frac{1}{8m_0^3 c^6}\,\beta\hat{\mathcal{O}}^4,
\]
\[
\hat{\mathcal{O}}_1 \approx \frac{\beta}{2m_0 c^2}\left([\hat{\mathcal{O}}, \hat{\mathcal{E}}] + i\hbar\,\frac{\partial\hat{\mathcal{O}}}{\partial t}\right)
- \frac{1}{3m_0^2 c^4}\,\hat{\mathcal{O}}^3, \tag{11}
\]
with Ê₁ and Ô₁ obeying the relations βÔ₁ = −Ô₁β and βÊ₁ = Ê₁β, exactly like Ê and Ô. Whereas the term Ô in Ĥ_D is of order zero with respect to the expansion parameter 1/m₀c² [i.e., Ô = O((1/m₀c²)⁰)], the odd part of Ĥ_D^{(1)}, namely Ô₁, contains only terms of order 1/m₀c² and higher powers of 1/m₀c² [i.e., Ô₁ = O((1/m₀c²)¹)]. A second Foldy–Wouthuysen transformation is applied with the same prescription to reduce the strength of the odd terms further in the transformed Hamiltonian:
\[
\Psi^{(2)} = e^{i\hat{S}_2}\,\Psi^{(1)}, \qquad
\hat{S}_2 = -\frac{i\beta\hat{\mathcal{O}}_1}{2m_0 c^2}
= -\frac{i\beta}{2m_0 c^2}\left\{\frac{\beta}{2m_0 c^2}\left([\hat{\mathcal{O}}, \hat{\mathcal{E}}] + i\hbar\,\frac{\partial\hat{\mathcal{O}}}{\partial t}\right) - \frac{1}{3m_0^2 c^4}\,\hat{\mathcal{O}}^3\right\}. \tag{12}
\]
After this transformation,
\[
i\hbar\,\frac{\partial}{\partial t}\Psi^{(2)} = \hat{H}_D^{(2)}\,\Psi^{(2)}, \qquad
\hat{H}_D^{(2)} = m_0 c^2\beta + \hat{\mathcal{E}}_2 + \hat{\mathcal{O}}_2,
\]
\[
\hat{\mathcal{E}}_2 \approx \hat{\mathcal{E}}_1, \qquad
\hat{\mathcal{O}}_2 \approx \frac{\beta}{2m_0 c^2}\left([\hat{\mathcal{O}}_1, \hat{\mathcal{E}}_1] + i\hbar\,\frac{\partial\hat{\mathcal{O}}_1}{\partial t}\right), \tag{13}
\]
where, now, Ô₂ = O((1/m₀c²)²). After the third transformation,
\[
\Psi^{(3)} = e^{i\hat{S}_3}\,\Psi^{(2)}, \qquad
\hat{S}_3 = -\frac{i\beta\hat{\mathcal{O}}_2}{2m_0 c^2}, \tag{14}
\]
we have
\[
i\hbar\,\frac{\partial}{\partial t}\Psi^{(3)} = \hat{H}_D^{(3)}\,\Psi^{(3)}, \qquad
\hat{H}_D^{(3)} = m_0 c^2\beta + \hat{\mathcal{E}}_3 + \hat{\mathcal{O}}_3,
\]
\[
\hat{\mathcal{E}}_3 \approx \hat{\mathcal{E}}_2 \approx \hat{\mathcal{E}}_1, \qquad
\hat{\mathcal{O}}_3 \approx \frac{\beta}{2m_0 c^2}\left([\hat{\mathcal{O}}_2, \hat{\mathcal{E}}_2] + i\hbar\,\frac{\partial\hat{\mathcal{O}}_2}{\partial t}\right), \tag{15}
\]

where Ô₃ = O((1/m₀c²)³). So, neglecting Ô₃,
\[
\hat{H}_D^{(3)} \approx m_0 c^2\beta + \hat{\mathcal{E}} + \frac{1}{2m_0 c^2}\,\beta\hat{\mathcal{O}}^2
- \frac{1}{8m_0^2 c^4}\left[\hat{\mathcal{O}},\; [\hat{\mathcal{O}}, \hat{\mathcal{E}}] + i\hbar\,\frac{\partial\hat{\mathcal{O}}}{\partial t}\right]
- \frac{1}{8m_0^3 c^6}\,\beta\left\{\hat{\mathcal{O}}^4 + \left([\hat{\mathcal{O}}, \hat{\mathcal{E}}] + i\hbar\,\frac{\partial\hat{\mathcal{O}}}{\partial t}\right)^2\right\}. \tag{16}
\]
By starting with the second transformation, successive (Ê, Ô) pairs can be obtained recursively using the rule

\[
\hat{\mathcal{E}}_j = \hat{\mathcal{E}}_1\left(\hat{\mathcal{E}} \to \hat{\mathcal{E}}_{j-1},\; \hat{\mathcal{O}} \to \hat{\mathcal{O}}_{j-1}\right), \qquad
\hat{\mathcal{O}}_j = \hat{\mathcal{O}}_1\left(\hat{\mathcal{E}} \to \hat{\mathcal{E}}_{j-1},\; \hat{\mathcal{O}} \to \hat{\mathcal{O}}_{j-1}\right), \qquad j > 1, \tag{17}
\]
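One pass of this recursion is easy to simulate. The sketch below is illustrative only (ℏ = c = 1, a random time-independent Hermitian perturbation split into even and odd parts, and a deliberately large mass m are all assumptions); it applies the transformation of Eqs. (7) and (8) to a generic 4 × 4 Dirac-like Hamiltonian and shows the odd part shrinking from O(1) to O(1/m), as Eq. (11) predicts:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 50.0                                    # large mass => small parameter 1/m
beta = np.diag([1.0, 1.0, -1.0, -1.0]).astype(complex)

A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
X = (A + A.conj().T) / 2                    # random Hermitian perturbation

even = lambda M: (M + beta @ M @ beta) / 2  # part commuting with beta
odd = lambda M: (M - beta @ M @ beta) / 2   # part anticommuting with beta
H = m * beta + even(X) + odd(X)

# Generator of Eq. (7): S1 = -i beta O / (2m); it is Hermitian, so exp(iS1)
# can be formed from its eigendecomposition.
S1 = -1j * beta @ odd(X) / (2 * m)
w, V = np.linalg.eigh(S1)
U1 = V @ np.diag(np.exp(1j * w)) @ V.conj().T   # unitary exp(i S1)

H1 = U1 @ H @ U1.conj().T                   # transformed Hamiltonian, Eq. (8)
ratio = np.linalg.norm(odd(H1)) / np.linalg.norm(odd(H))
# ratio is of order 1/m; repeating the step on H1 suppresses the odd part further
```

Because the step is a unitary conjugation, the spectrum of H is preserved exactly; only the even/odd split changes.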
and retaining only the relevant terms of the desired order at each step. With Ê = qφ and Ô = cα·π̂, the final reduced Hamiltonian [Eq. (16)] is, to the order calculated,

\[
\hat{H}_D^{(3)} = \beta\left(m_0 c^2 + \frac{\hat{\boldsymbol{\pi}}^2}{2m_0} - \frac{\hat{\mathbf{p}}^4}{8m_0^3 c^2}\right) + q\phi
- \frac{q\hbar}{2m_0 c}\,\beta\,\boldsymbol{\Sigma}\cdot\mathbf{B}
- \frac{q\hbar}{4m_0^2 c^2}\,\boldsymbol{\Sigma}\cdot\left(\mathbf{E}\times\hat{\mathbf{p}}\right)
- \frac{iq\hbar^2}{8m_0^2 c^2}\,\boldsymbol{\Sigma}\cdot\operatorname{curl}\mathbf{E}
- \frac{q\hbar^2}{8m_0^2 c^2}\,\operatorname{div}\mathbf{E}, \tag{18}
\]
with the individual terms having direct physical interpretations. The terms in the first set of parentheses result from the expansion of \(\sqrt{m_0^2 c^4 + c^2\hat{\boldsymbol{\pi}}^2}\), showing the effect of the relativistic mass increase. The second and third terms are the electrostatic and magnetic dipole energies. The next two terms, taken together (for hermiticity), contain the spin-orbit interaction. The last term, the so-called Darwin term, is attributed to the zitterbewegung (trembling motion) of the Dirac particle: because of the rapid coordinate fluctuations over distances of the order of the Compton wavelength (2πℏ/m₀c), the particle sees a somewhat smeared-out electric potential. The Foldy–Wouthuysen transformation technique clearly expands the Dirac Hamiltonian as a power series in the parameter 1/m₀c², thus
enabling the use of a systematic approximation procedure to study the deviations from the nonrelativistic situation. The similarities between nonrelativistic particle dynamics and paraxial optics, for light beams and particle beams, respectively, are noted in the charts below.

Standard Dirac Equation              | Light Beam Optical Form
-------------------------------------|--------------------------------
m₀c²β + Ê_D + Ô_D                    | −n₀β + Ê + Ô
m₀c²                                 | −n₀
Positive energy                      | Forward propagation
Nonrelativistic, |π̂| ≪ m₀c          | Paraxial beam, |p̂⊥| ≪ n₀
Nonrelativistic motion               | Paraxial behavior
  + Relativistic corrections         |   + Aberration corrections

Standard Dirac Equation              | Particle-Beam Optical Form
-------------------------------------|--------------------------------
m₀c²β + Ê_D + Ô_D                    | −p₀β + Ê + Ô
m₀c²                                 | −p₀
iℏ ∂/∂t                              | iℏ ∂/∂z
Positive energy                      | Forward propagation
Nonrelativistic, |π̂| ≪ m₀c          | Paraxial beam, |π̂⊥| ≪ p₀
Nonrelativistic motion               | Paraxial behavior
  + Relativistic corrections         |   + Aberration corrections
Noting the above similarities, the concept of the Foldy–Wouthuysen form of the Dirac theory has been adopted to study paraxial optics and its deviations. The Helmholtz equation governing scalar optics is first linearized, in a procedure similar to the manner in which the Klein–Gordon equation (quadratic in ∂/∂t) is brought to the Feshbach–Villars form (linear in ∂/∂t). This enables use of the Foldy–Wouthuysen transformation technique. (See Appendix A for the Feshbach–Villars form of the Klein–Gordon equation.) In the case of vector optics, Maxwell's equations are cast in a spinor form resembling exactly the Dirac equation [Eqs. (1) and (2)] in all respects: a multicomponent Ψ with the upper half of its components large compared to the lower components; a Hamiltonian with an even part (Ê) and an odd part (Ô); a suitable expansion parameter (|p̂⊥|/n₀ ≪ 1) characterizing the dominant forward propagation; and a leading term with a β coefficient, commuting with Ê and anticommuting with Ô. It is important to note that the Dirac field and the electromagnetic field are two distinct entities. However, their striking resemblance in the underlying algebraic structure can be exploited to perform some useful calculations with meaningful results. (See Appendix B for the derivation of an exact matrix representation of Maxwell's equations and differences from other representations.)
The additional feature of our formalism is to return finally to the original representation after making an extra approximation, dropping β from the final reduced optical Hamiltonian, taking into account the fact that our primary interest is in only the forward-propagating beam. The Foldy–Wouthuysen transformation has allowed entirely new approaches to light optics and charged-particle optics, respectively.
III. QUANTUM FORMALISM OF CHARGED-PARTICLE BEAM OPTICS

The classical treatment of charged-particle beam optics has been very successful in the design and functioning of numerous optical devices, from electron microscopes to very large particle accelerators. It is natural, however, to look for a prescription based on the quantum theory, since any physical system is quantum mechanical at the fundamental level. Such a prescription is sure to explain the grand success of the classical theories. It is certain to be of assistance in a deeper understanding and better design of charged-particle beam devices. The starting point of the quantum prescription of charged-particle beam optics is building a theory based on the basic equations of quantum mechanics (Dirac, Klein–Gordon, Schrödinger) appropriate to the situation under study. In order to analyze the evolution of the beam parameters through the various individual beam optical elements (quadrupoles, bending magnets, and so on) along the optic axis of the system, the first step is starting with the basic time-dependent equations of quantum mechanics, followed by obtaining an equation of the form

\[
i\hbar\,\frac{\partial}{\partial s}\,\psi(x, y; s) = \hat{\mathcal{H}}(x, y; s)\,\psi(x, y; s), \tag{19}
\]
where (x, y; s) constitutes a curvilinear coordinate system adapted to the geometry of the system. Equation (19) is the basic equation of the quantum formalism, known as the beam-optical equation, with Ĥ and ψ the beam-optical Hamiltonian and the beam wavefunction, respectively. The second step requires obtaining a relationship between any relevant observable {⟨O⟩(s)} at the transverse plane at s and the observable {⟨O⟩(s_in)} at the transverse plane at s_in, where s_in is some input reference point. This is achieved by integrating the beam-optical equation (19):
\[
\psi(x, y; s) = \hat{U}(s, s_{\mathrm{in}})\,\psi(x, y; s_{\mathrm{in}}), \tag{20}
\]

which provides the required transfer maps

\[
\langle O\rangle(s_{\mathrm{in}}) \longrightarrow \langle O\rangle(s)
= \left\langle \psi(x, y; s)\,\middle|\,\hat{O}\,\middle|\,\psi(x, y; s)\right\rangle
= \left\langle \psi(x, y; s_{\mathrm{in}})\,\middle|\,\hat{U}^{\dagger}\hat{O}\hat{U}\,\middle|\,\psi(x, y; s_{\mathrm{in}})\right\rangle. \tag{21}
\]
This two-step algorithm is an oversimplified picture of the quantum formalism. Several crucial points should be noted. The first step in the algorithm, obtaining the beam-optical equation, is not to be treated as a mere transformation that eliminates t in preference to a variable s along the optic axis. A clever set of transforms is required, not only to eliminate the variable t in preference to s but also to yield an s-dependent equation that has a close physical and mathematical correspondence with the original t-dependent equation of standard time-dependent quantum mechanics. The imposition of this stringent requirement on the construction of the beam-optical equation ensures the execution of the second step of the algorithm. The beam-optical equation is such that all the required rich machinery of quantum mechanics becomes applicable to the computation of the transfer maps that characterize the optical system. This describes the essential scheme of obtaining the quantum formalism. The remainder is mostly mathematical detail built into the powerful algebraic machinery of the algorithm, accompanied by some reasonable assumptions and approximations dictated by physical considerations. The nature of these approximations is best summarized in optical terminology as a systematic procedure of expanding the beam-optical Hamiltonian in a power series of |π̂⊥|/p₀, where p₀ is the design (or average) momentum of beam particles moving predominantly along the direction of the optic axis, and π̂⊥ is the small transverse kinetic momentum. The required expansion is obtained using the ambiguity-free procedure of the Foldy–Wouthuysen transformation. The Feshbach–Villars procedure (Feshbach and Villars, 1958) brings the Schrödinger and the Klein–Gordon equations to a two-component form, facilitating the application of the Foldy–Wouthuysen expansion.
The leading-order approximation, along with |π̂⊥|/p₀ ≪ 1, constitutes the paraxial or ideal behavior, and higher-order terms in the expansion give rise to the nonlinear or aberrating behavior. The paraxial and aberrating behavior is modified by the quantum contributions, which are in powers of the de Broglie wavelength (ƛ₀ = ℏ/p₀). The classical limit of the quantum formalism reproduces the well-known Lie algebraic formalism of charged-particle beam optics (e.g., see Dragt and Forest, 1986; Dragt et al., 1988; Rangarajan et al., 1990; Ryne and Dragt, 1991; see also Forest et al., 1989; Forest and Hirata, 1992). The Hamiltonian description allows us to relate our formalism to other traditional prescriptions, such as the quantum-like approach (Fedele and Man'ko, 1999). A complete coverage of the new field of quantum aspects of beam physics (QABP) can be found in the proceedings of the series of meetings under the same name (Chen, 1999, 2002; Chen and Reil, 2003) and their reports (Chen, 1998, 2000, 2003a, 2003b).
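A toy finite-dimensional example illustrates the two-step scheme of Eqs. (19)-(21). Everything below is an assumption made for illustration only: a random 4 × 4 Hermitian matrix stands in for the beam-optical Hamiltonian, the system is taken to be s-independent so that Û(s, s_in) = exp(−iĤ(s − s_in)/ℏ), and ℏ = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
hbar, ds = 1.0, 0.7                   # illustrative units and step along the optic axis

def hermitian(n):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 2

H = hermitian(4)                      # stand-in beam-optical Hamiltonian
Obs = hermitian(4)                    # some observable O

# U(s, s_in) = exp(-i H ds / hbar), built by eigendecomposition
w, V = np.linalg.eigh(H)
U = V @ np.diag(np.exp(-1j * w * ds / hbar)) @ V.conj().T

psi_in = rng.normal(size=4) + 1j * rng.normal(size=4)
psi_in /= np.linalg.norm(psi_in)      # normalized input beam wavefunction
psi_s = U @ psi_in                    # transported state, Eq. (20)

# Transfer map for expectation values, Eq. (21): both sides agree
lhs = psi_s.conj() @ Obs @ psi_s
rhs = psi_in.conj() @ (U.conj().T @ Obs @ U) @ psi_in
```

The point of the sketch is only the structural identity ⟨ψ(s)|O|ψ(s)⟩ = ⟨ψ(s_in)|Û†OÛ|ψ(s_in)⟩; a realistic beam-optical Hamiltonian would of course act on transverse wavefunctions rather than a 4-dimensional toy space.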
IV. QUANTUM METHODOLOGIES IN LIGHT BEAM OPTICS

Historically, the scalar wave theory of optics (including aberrations to all orders) is based on Fermat's principle of least time. In this approach, the beam-optical Hamiltonian is derived using Fermat's principle. This approach is purely geometrical and works adequately in the scalar regime. All the laws of geometrical optics can be deduced from Maxwell's equations (e.g., see Born and Wolf, 1999). This deduction is traditionally done using the Helmholtz equation, which is derived from Maxwell's equations. In this approach, one takes the square root of the Helmholtz operator followed by an expansion of the radical (Dragt, 1982, 1988; Dragt et al., 1986). It should be noted that the square-root approach reduces the original boundary value problem to a first-order initial value problem. This reduction has great practical value, since it leads to the powerful system, or Fourier optic, approach (Goodman, 1996). However, the beam-optical Hamiltonian in the square-root approach is no different from that of the geometrical approach of Fermat's principle. Moreover, the reduction process itself can never be claimed to be rigorous or exact. The Helmholtz equation governing scalar optics is algebraically very similar to the Klein–Gordon equation for a spin-0 particle. Exploiting this similarity, the Helmholtz equation is linearized in a procedure very similar to the one used by Feshbach and Villars (1958) to linearize the Klein–Gordon equation. This brings the Helmholtz equation to a Dirac-like form, allowing the Foldy–Wouthuysen expansion used in the Dirac electron theory. This formalism gives rise to wavelength-dependent contributions modifying the paraxial behavior (Khan et al., 2002) and the aberration coefficients (Khan, 2005a). This is the nontraditional prescription of scalar optics.
In regard to polarization, a systematic procedure for the passage from scalar to vector wave optics to handle paraxial beam propagation problems, completely taking into account the manner in which Maxwell's equations couple the spatial variation and polarization of light waves, has been formulated by analyzing the basic Poincaré invariance of the system. This procedure has been successfully used to clarify several issues in Maxwell optics (Mukunda et al., 1983a, 1983b, 1985; Simon et al., 1986, 1987). In all the aforementioned approaches, the beam optics and the polarization are studied separately, using very different procedures. The derivation of the Helmholtz equation from Maxwell's equations is an approximation, since the spatial and temporal derivatives of the permittivity and permeability of the medium are neglected. It is logical to seek a prescription based fully on Maxwell's equations. The starting point for such a prescription is the exact matrix representation of the Maxwell equations, taking into account the spatial and temporal variations of the permittivity and permeability (Khan, 2005b). It is necessary and sufficient to
use 8 × 8 matrices for such an exact representation (Khan, 2005b). This representation uses the Riemann–Silberstein vector (Silberstein, 1907a, 1907b). For a detailed discussion of the Riemann–Silberstein complex vector, see Bialynicki-Birula (1994, 1996a, 1996b). The derivation of the required matrix representation, and how it differs from numerous others, is presented in Appendix B. The derived representation using 8 × 8 matrices has a close algebraic similarity to the Dirac equation, enabling the use of the Foldy–Wouthuysen transform. The beam-optical Hamiltonian derived from this representation reproduces the Hamiltonians obtained in the traditional prescription, along with wavelength-dependent matrix terms, which we have called polarization terms. These polarization terms are very similar to the spin terms in the Dirac electron theory and the spin-precession terms in the beam-optical version of the Thomas–BMT equation (Conte et al., 1996). The matrix formulation provides a unified treatment of beam optics and light polarization. Some well-known results of light polarization are obtained as the paraxial limit of the matrix formulation (Mukunda et al., 1983a, 1983b, 1985; Simon et al., 1986, 1987). Results from the specific example of the graded-index medium considered in the nontraditional prescription of Maxwell optics (Khan, 2006b) are worth noting. First, it predicts an image rotation (proportional to the wavelength), and its magnitude is explicitly given. Second, it provides all nine aberrations permitted by the axial symmetry. (The traditional approaches give six aberrations. The exact treatment of Maxwell optics modifies the six aberration coefficients by wavelength-dependent contributions and also gives rise to the remaining three permitted by the axial symmetry.) The existence of the nine aberrations and image rotation is well known in axially symmetric magnetic lenses, even when treated classically.
The quantum treatment of the same system leads to the wavelength-dependent modifications (Jagannathan and Khan, 1996). The alternate procedure for Helmholtz optics (Khan, 2005a) gives the usual six aberrations (though modified by the wavelength-dependent contributions) and does not provide any image rotation. These extra aberrations and the image rotation are the exclusive outcome of the fact that the formalism is based on a treatment starting with an exact matrix representation of the Maxwell equations. The traditional beam optics (in particular, the Lie algebraic formalism of light beam optics; Dragt, 1982, 1988; Dragt et al., 1986) is completely obtained from the nontraditional prescriptions in the limit of vanishing wavelength, ƛ → 0, termed the traditional limit of our formalism. This is analogous to the classical limit obtained by taking ℏ → 0 in quantum prescriptions. The use of the Foldy–Wouthuysen machinery in the nontraditional prescriptions of Helmholtz optics and Maxwell optics is very similar to the one used in the quantum theory of charged-particle beam optics developed by Jagannathan et al. There, too, the classical prescriptions are recovered (the Lie algebraic formalism of charged-particle beam optics; Todesco, 1999; Turchetti, 1989) in the limit ƛ₀ → 0, where ƛ₀ = ℏ/p₀ is the de Broglie
62
Sameen Ahmed Khan
wavelength and p₀ is the design momentum of the system under study. The Foldy–Wouthuysen transformation has thus enabled novel approaches to both light optics and charged-particle optics.
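As a quick numerical illustration of λ̄₀ = ℏ/p₀ (this sketch is an addition, not part of the original text; the 100 keV design energy is an assumed, typical electron-microscope value):

```python
import math

# Reduced de Broglie wavelength lambda-bar_0 = hbar/p0 for an electron
# accelerated through 100 kV. Relativistic kinematics:
#   (p0 c)^2 = E_k^2 + 2 E_k m0 c^2.
hbar_c_eV_nm = 197.3269804     # hbar*c in eV*nm
m0c2_eV = 0.51099895e6         # electron rest energy in eV
Ek_eV = 100e3                  # kinetic energy in eV (assumed example)

p0c_eV = math.sqrt(Ek_eV**2 + 2.0 * Ek_eV * m0c2_eV)
lambda_bar_nm = hbar_c_eV_nm / p0c_eV          # lambda-bar_0 = hbar/p0
lambda_nm = 2.0 * math.pi * lambda_bar_nm      # full de Broglie wavelength

print(f"p0 c          = {p0c_eV / 1e3:.1f} keV")
print(f"lambda-bar_0  = {lambda_bar_nm * 1e3:.4f} pm")
print(f"lambda        = {lambda_nm * 1e3:.2f} pm")

# Textbook benchmark: a 100 keV electron has lambda of about 3.7 pm.
assert abs(lambda_nm * 1e3 - 3.70) < 0.05
```

The picometer-scale result makes concrete why the wavelength-dependent corrections discussed above are tiny for typical design momenta, and why they vanish in the traditional limit λ̄₀ → 0.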
V. CONCLUSION

The use of the Foldy–Wouthuysen transformation technique in optics (Khan, 2006a) has shed light on the deeper connections, in the wavelength-dependent regime, between light optics and charged-particle optics (Khan, 2002b). The beginning of the analogy between geometrical optics and mechanics is usually attributed to Descartes (1637 CE), but it can actually be traced back to Ibn al-Haitham (Alhazen, c. 965–1037 CE) (Ambrosini et al., 1997; see also Khan, 2007, and the references therein for the "Medieval Arab Contributions to Optics"; Rashed, 1990, 1993). Historically, variational principles played a fundamental role in the evolution of mathematical models in classical physics, and many equations were derived from them; the relevant examples here are Fermat's principle in optics and Maupertuis' principle in mechanics. The analogy between the trajectory of material particles in potential fields and the path of light rays in media with a continuously variable refractive index was formalized by Hamilton in 1833. This Hamiltonian analogy led to the development of electron optics in the 1920s, when Busch derived the lens-like focusing action of the axially symmetric magnetic field using the methodology of geometrical optics. Around the same time, Louis de Broglie associated his now-famous wavelength with moving particles. Schrödinger extended the analogy by passing from geometrical optics to wave optics through his wave equation incorporating the de Broglie wavelength. This analogy played a fundamental role in the early development of quantum mechanics. The analogy also led to the development of practical electron optics, one of the early inventions being the electron microscope of Ernst Ruska. A detailed account of Hamilton's analogy is available in Hawkes and Kasper (1989a, 1989b, 1994), Born and Wolf (1999), and Forbes (2001).
Until very recently, this analogy could be recognized only between geometrical optics and the classical prescriptions of electron optics, since the quantum theories of charged-particle beam optics have been under development only for about a decade (Conte et al., 1996; Jagannathan et al., 1989; Jagannathan, 1990, 1993, 1999, 2002, 2003; Khan and Jagannathan, 1993, 1994, 1995; Jagannathan and Khan, 1995, 1996, 1997; Khan, 1997, 1999a, 1999b, 2001, 2002a, 2002b, 2002c). The quantum prescriptions have the expected wavelength-dependent effects, which have no analogue in the traditional descriptions of light beam optics. With the recent development of the nontraditional prescriptions of Helmholtz optics (Khan, 2002b, 2005a; Khan et al., 2002) and the matrix formulation of Maxwell optics (Khan, 2006b), accompanied by wavelength-dependent
The Foldy–Wouthuysen Transformation Technique in Optics
63
effects, it is seen that the analogy between the two systems persists. The nontraditional prescription of Helmholtz optics closely resembles the quantum theory of charged-particle beam optics based on the Klein–Gordon equation, and the matrix formulation of Maxwell optics closely resembles the quantum theory of charged-particle beam optics based on the Dirac equation. The Table summarizes the Hamiltonians in the different prescriptions of light beam optics and charged-particle beam optics for magnetic systems; Ĥ₀,ₚ are the paraxial Hamiltonians, with the lowest-order wavelength-dependent contributions. From the Hamiltonians in the Table, the following observations are made. The classical/traditional Hamiltonians of particle/light optics are modified by wavelength-dependent contributions in the quantum/nontraditional prescriptions, respectively; the algebraic forms of these modifications in each row are very similar. The starting equations have a one-to-one algebraic correspondence: Helmholtz ↔ Klein–Gordon; matrix form of Maxwell ↔ Dirac equation. Finally, the de Broglie wavelength λ̄₀ and λ̄ have an analogous status, and the classical/traditional limits are obtained by taking λ̄₀ → 0 and λ̄ → 0, respectively. The parallels between the two systems are certain to provide more insights. If not for the Foldy–Wouthuysen transformation, it would not have been possible to recognize these new aspects of the similarities between light optics and charged-particle optics (Khan, 2002b).
TABLE
Hamiltonians in Different Prescriptions

Light beam optics:

Fermat's principle
    H = −[n²(r) − p⊥²]^{1/2}

Nontraditional Helmholtz
    Ĥ₀,ₚ = −n(r) + (1/2n₀) p̂⊥² − (iλ̄/16n₀³)[p̂⊥², ∂n(r)/∂z]

Maxwell, matrix formulation
    Ĥ₀,ₚ = −n(r) + (1/2n₀) p̂⊥² − iλ̄ β(Σ · u) + (1/2n₀) λ̄² w² β

Charged-particle beam optics:

Maupertuis' principle
    H = −[p₀² − π⊥²]^{1/2} − qA_z

Klein–Gordon formalism
    Ĥ₀,ₚ = −p₀ − qA_z + (1/2p₀) π̂⊥² + (iℏ/16p₀³)[π̂⊥², ∂π̂⊥²/∂z]

Dirac formalism
    Ĥ₀,ₚ = −p₀ − qA_z + (1/2p₀) π̂⊥² − (ℏ/2p₀){μγ Σ⊥ · B⊥ + (q + μ) Σ_z B_z + i ε m₀c B_z}

Notation:
    π̂⊥ = p̂⊥ − qA⊥
    Refractive index: n(r) = c√(ε(r)μ(r))
    Resistance: h(r) = √(μ(r)/ε(r))
    u(r) = −(1/2n(r)) ∇n(r),  w(r) = (1/2h(r)) ∇h(r)
    Anomalous magnetic moment μₐ; anomalous electric moment εₐ
    μ = 2m₀μₐ/ℏ,  ε = 2m₀εₐ/ℏ,  γ = E/m₀c²
    Σ and β are the Dirac matrices.
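The leading (paraxial) structure of the Fermat row can be checked numerically. The sketch below is an added illustration (the index value n₀ = 1.5 is assumed): it compares the exact Hamiltonian H = −[n₀² − p⊥²]^{1/2} with the quadratic form −n₀ + p⊥²/2n₀ appearing in the table, and confirms that the error of the paraxial form is fourth order in p⊥.

```python
import math

n0 = 1.5  # illustrative refractive index

def H_exact(p_perp):
    # Exact Fermat optical Hamiltonian for a homogeneous medium.
    return -math.sqrt(n0**2 - p_perp**2)

def H_paraxial(p_perp):
    # Leading paraxial expansion, as in the table.
    return -n0 + p_perp**2 / (2.0 * n0)

# Error of the paraxial form is O(p_perp**4): reducing p_perp by a
# factor of 4 should shrink the error by roughly 4**4 = 256.
e1 = abs(H_exact(0.4) - H_paraxial(0.4))
e2 = abs(H_exact(0.1) - H_paraxial(0.1))
print(e1, e2, e1 / e2)
assert e1 / e2 > 200  # consistent with fourth-order scaling
```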
64
Sameen Ahmed Khan
APPENDIX A
The Feshbach–Villars Form of the Klein–Gordon Equation

The method used to cast the time-independent Klein–Gordon equation into a beam-optical form linear in ∂/∂z, suitable for a systematic study through successive approximations using the Foldy–Wouthuysen-like transformation technique borrowed from the Dirac theory, is similar to the manner in which the time-dependent Klein–Gordon equation is transformed (Feshbach and Villars, 1958) to the Schrödinger form, containing only a first-order time derivative, in order to study its nonrelativistic limit using the Foldy–Wouthuysen technique (see, e.g., Bjorken and Drell, 1964). Defining
    Θ = ∂Ψ/∂t,    (A1)
the free-particle Klein–Gordon equation is written as

    ∂Θ/∂t = [c²∇² − m₀²c⁴/ℏ²] Ψ.    (A2)
Introducing the linear combinations

    Ψ₊ = (1/2)[Ψ + (iℏ/m₀c²)Θ],    Ψ₋ = (1/2)[Ψ − (iℏ/m₀c²)Θ],    (A3)
the Klein–Gordon equation is seen to be equivalent to a pair of coupled differential equations:
    iℏ ∂Ψ₊/∂t = −(ℏ²∇²/2m₀)(Ψ₊ + Ψ₋) + m₀c² Ψ₊,
    iℏ ∂Ψ₋/∂t = +(ℏ²∇²/2m₀)(Ψ₊ + Ψ₋) − m₀c² Ψ₋.    (A4)
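The pair (A4) can be verified directly on a free plane wave. The following sketch is an added illustration (units ℏ = m₀ = c = 1; the derivatives of the plane wave are applied analytically, so no numerical differentiation is involved):

```python
import cmath
import math

# Free Klein-Gordon plane wave: Psi(x,t) = exp(i(kx - wt)), w = sqrt(k^2 + 1),
# with Theta = dPsi/dt = -i w Psi and, per Eq. (A3) with hbar/(m0 c^2) = 1,
#   Psi_pm = (1/2)(Psi +/- i Theta).
k = 0.7
w = math.sqrt(k * k + 1.0)
x, t = 0.3, 1.1  # arbitrary spacetime point

psi = cmath.exp(1j * (k * x - w * t))
theta = -1j * w * psi
psi_p = 0.5 * (psi + 1j * theta)   # Psi_+
psi_m = 0.5 * (psi - 1j * theta)   # Psi_-

# Analytic derivatives of the plane wave: i d/dt -> w, nabla^2 -> -k^2.
lhs_p = w * psi_p                              # i dPsi_+/dt
rhs_p = 0.5 * k * k * (psi_p + psi_m) + psi_p  # -(1/2) nabla^2 (...) + Psi_+
lhs_m = w * psi_m                              # i dPsi_-/dt
rhs_m = -0.5 * k * k * (psi_p + psi_m) - psi_m # +(1/2) nabla^2 (...) - Psi_-

assert abs(lhs_p - rhs_p) < 1e-12
assert abs(lhs_m - rhs_m) < 1e-12
```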
Equation (A4) can be written in a two-component language as

    iℏ (∂/∂t) (Ψ₊, Ψ₋)ᵀ = Ĥ₀^FV (Ψ₊, Ψ₋)ᵀ,    (A5)
The Foldy–Wouthuysen Transformation Technique in Optics
65
with the Feshbach–Villars Hamiltonian for the free particle, Ĥ₀^FV, given by

    Ĥ₀^FV = [ m₀c² + p̂²/2m₀    p̂²/2m₀ ;  −p̂²/2m₀    −m₀c² − p̂²/2m₀ ]
          = m₀c² σ_z + (p̂²/2m₀) σ_z + i (p̂²/2m₀) σ_y.    (A6)
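The decomposition in Eq. (A6) is a finite 2 × 2 matrix identity and can be checked mechanically. In this added sketch the scalars m₀c² and p̂²/2m₀ are replaced by illustrative numbers:

```python
# Check Eq. (A6): the explicit Feshbach-Villars free Hamiltonian equals
# m0c2*sigma_z + T*sigma_z + i*T*sigma_y, with T standing for p^2/(2 m0).
m0c2, T = 0.511, 0.030  # illustrative values

sigma_y = [[0, -1j], [1j, 0]]
sigma_z = [[1, 0], [0, -1]]

def add(*ms):
    return [[sum(m[i][j] for m in ms) for j in range(2)] for i in range(2)]

def scale(c, m):
    return [[c * m[i][j] for j in range(2)] for i in range(2)]

H_explicit = [[m0c2 + T, T],
              [-T, -m0c2 - T]]
H_pauli = add(scale(m0c2, sigma_z), scale(T, sigma_z), scale(1j * T, sigma_y))

assert all(abs(H_explicit[i][j] - H_pauli[i][j]) < 1e-15
           for i in range(2) for j in range(2))
```

Note that i·T·σ_y = [[0, T], [−T, 0]] supplies exactly the off-diagonal block of the explicit matrix.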
For a free nonrelativistic particle, with kinetic energy small compared to m₀c², it is seen that Ψ₊ is large compared to Ψ₋. In the presence of an electromagnetic field, the interaction is introduced through the minimal coupling

    p̂ → π̂ = p̂ − qA,    iℏ ∂/∂t → iℏ ∂/∂t − qφ.    (A7)
The corresponding Feshbach–Villars form of the Klein–Gordon equation becomes

    iℏ (∂/∂t) (Ψ₊, Ψ₋)ᵀ = Ĥ^FV (Ψ₊, Ψ₋)ᵀ,

    Ψ± = (1/2)[Ψ ± (1/m₀c²)(iℏ ∂/∂t − qφ)Ψ],

    Ĥ^FV = m₀c² σ_z + Ê + Ô,    Ê = qφ + (π̂²/2m₀) σ_z,    Ô = i (π̂²/2m₀) σ_y.    (A8)
As in the free-particle case, in the nonrelativistic situation Ψ₊ is large compared to Ψ₋. The even term Ê does not couple Ψ₊ and Ψ₋, whereas Ô is odd and does couple them. Starting from Eq. (A8), the nonrelativistic limit of the Klein–Gordon equation, with various correction terms, can be understood using the Foldy–Wouthuysen technique (see, e.g., Bjorken and Drell, 1964). It is this technique that we have adopted for studying the z-evolution of the Helmholtz wave equation in an optical system with a spatially varying refractive index. The additional feature of our formalism is the extra approximation of dropping the σ_z term at an intermediate stage, to take into account that we are interested only in the beam propagating forward along the z-direction.
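That Ê is even and Ô is odd amounts to Ê commuting, and Ô anticommuting, with σ_z. A small added sketch, with illustrative scalar values standing in for qφ and π̂²/2m₀:

```python
# Even/odd structure of Eq. (A8):
#   E_hat = q*phi*1l + T*sigma_z  commutes with sigma_z (even),
#   O_hat = i*T*sigma_y           anticommutes with sigma_z (odd).
qphi, T = 0.2, 0.05  # illustrative scalars

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

sigma_z = [[1, 0], [0, -1]]
E_hat = [[qphi + T, 0], [0, qphi - T]]   # q*phi*1l + T*sigma_z
O_hat = [[0, T], [-T, 0]]                # i*T*sigma_y

comm = [[mul(sigma_z, E_hat)[i][j] - mul(E_hat, sigma_z)[i][j]
         for j in range(2)] for i in range(2)]
anti = [[mul(sigma_z, O_hat)[i][j] + mul(O_hat, sigma_z)[i][j]
         for j in range(2)] for i in range(2)]

assert all(abs(comm[i][j]) < 1e-15 for i in range(2) for j in range(2))
assert all(abs(anti[i][j]) < 1e-15 for i in range(2) for j in range(2))
```

It is the odd part Ô, coupling the large and small components, that the Foldy–Wouthuysen iteration removes order by order.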
APPENDIX B
An Exact Matrix Representation of the Maxwell Equations in a Medium

Matrix representations of the Maxwell equations are well known (Laporte and Uhlenbeck, 1931; Moses, 1959; Majorana, 1974). However, these representations either lack exactness or are expressed as a pair of matrix equations (Bialynicki-Birula, 1994, 1996a, 1996b). Some of these representations hold only in free space; such a representation is an approximation in a medium with space- and time-dependent permittivity ε(r, t) and permeability μ(r, t). Even this approximation is often expressed through a pair of equations using 3 × 3 matrices: one for the curl equations and one for the divergence conditions that occur in the Maxwell equations. The practice of writing the divergence conditions separately is completely avoidable by using 4 × 4 matrices for the Maxwell equations in free space (Moses, 1959). A single equation using 4 × 4 matrices is necessary and sufficient when ε(r, t) and μ(r, t) are treated as "local" constants (Bialynicki-Birula, 1996b; Moses, 1959). A treatment that takes into account the variations of ε(r, t) and μ(r, t) has been presented in Bialynicki-Birula (1996b). This treatment uses the Riemann–Silberstein vectors F±(r, t) to re-express the Maxwell equations as four equations: two for the curls and two for the divergences, with mixing between F+(r, t) and F−(r, t). This mixing is precisely expressed through two functions derived from ε(r, t) and μ(r, t). These four equations are then expressed as a pair of matrix equations using 6 × 6 matrices: again, one for the curl and one for the divergence. Even though this treatment is exact, it involves a pair of matrix equations. We present a treatment that expresses the Maxwell equations in a single matrix equation instead of a pair. This approach is a logical continuation of the treatment in Bialynicki-Birula (1996b).
We use linear combinations of the components of the Riemann–Silberstein vectors F±(r, t), and the final matrix representation is a single equation using 8 × 8 matrices. This representation contains all four Maxwell equations, taking into account the spatial and temporal variations of the permittivity ε(r, t) and the permeability μ(r, t). Section 1 summarizes the treatment for a homogeneous medium and introduces the required functions and notation. Section 2 presents the matrix representation in an inhomogeneous medium with sources.
1. HOMOGENEOUS MEDIUM

Begin with the Maxwell equations (Jackson, 1998; Panofsky and Phillips, 1962) in an inhomogeneous medium with sources:

    ∇ · D(r, t) = ρ,
    ∇ × H(r, t) − ∂D(r, t)/∂t = J,
    ∇ × E(r, t) + ∂B(r, t)/∂t = 0,
    ∇ · B(r, t) = 0.    (B1)
The media are assumed to be linear, that is, D = εE and B = μH, where ε is the permittivity and μ is the permeability of the medium. In general, ε = ε(r, t) and μ = μ(r, t). This section treats them as "local" constants in the various derivations. The magnitude of the velocity of light in the medium is given by v(r, t) = |v(r, t)| = 1/√(ε(r, t)μ(r, t)). In vacuum, ε₀ = 8.85 × 10⁻¹² C²/N·m² and μ₀ = 4π × 10⁻⁷ N/A². One possible way to obtain the required matrix representation is to use the Riemann–Silberstein vector (Bialynicki-Birula, 1996b), given by
    F+(r, t) = (1/√2)[√ε(r, t) E(r, t) + i (1/√μ(r, t)) B(r, t)],
    F−(r, t) = (1/√2)[√ε(r, t) E(r, t) − i (1/√μ(r, t)) B(r, t)].    (B2)
For any homogeneous medium it is equivalent to use either F+(r, t) or F−(r, t). The two differ by the sign before "i" and are not complex conjugates of one another. No form is assumed for E(r, t) and B(r, t). Both vectors will be needed in an inhomogeneous medium (see Section 2). If for a certain medium ε(r, t) and μ(r, t) are constants (or can be treated as local constants under certain approximations), then the vectors F±(r, t) satisfy
    i ∂F±(r, t)/∂t = ±v ∇ × F±(r, t) − (1/√(2ε)) (iJ),
    ∇ · F±(r, t) = (1/√(2ε)) ρ.    (B3)
Thus, by using the Riemann–Silberstein vector it has been possible to re-express the four Maxwell equations (for a medium with constant ε and μ) as two equations: the first contains the two Maxwell equations with curls, and the second contains the two Maxwell equations with divergences. The first of the two equations in Eq. (B3) can be immediately converted into a 3 × 3 matrix representation. However, this representation does not contain the divergence conditions (the first and fourth Maxwell equations) contained in the second equation in Eq. (B3). A further compactification is possible only by expressing the Maxwell equations in a 4 × 4 matrix representation. To this end, using the components of the Riemann–Silberstein vector, we define
    Ψ+(r, t) = [−F+_x + iF+_y,  F+_z,  F+_z,  F+_x + iF+_y]ᵀ,
    Ψ−(r, t) = [−F−_x − iF−_y,  F−_z,  F−_z,  F−_x − iF−_y]ᵀ.    (B4)
The vectors for the sources are

    W+ = (1/√(2ε)) [−J_x + iJ_y,  J_z − vρ,  J_z + vρ,  J_x + iJ_y]ᵀ,
    W− = (1/√(2ε)) [−J_x − iJ_y,  J_z − vρ,  J_z + vρ,  J_x − iJ_y]ᵀ.    (B5)
Then we obtain

    ∂Ψ+/∂t = −v {M · ∇} Ψ+ − W+,
    ∂Ψ−/∂t = −v {M* · ∇} Ψ− − W−,    (B6)
where (·)* denotes complex conjugation and the triplet M = (Mx, My, Mz) is expressed in terms of

    Ω = [0  −1l ;  1l  0],    β = [1l  0 ;  0  −1l],    1l = [1  0 ;  0  1].    (B7)
Alternatively, the matrix J = −Ω can be used. The two differ only by a sign, but they have different meanings: J is contravariant and Ω is covariant; Ω corresponds to the Lagrange brackets of classical mechanics, and J corresponds to the Poisson brackets. An important relation is Ω = J⁻¹. The M matrices are
    Mx = [0 0 1 0 ;  0 0 0 1 ;  1 0 0 0 ;  0 1 0 0] = −βΩ,
    My = [0 0 −i 0 ;  0 0 0 −i ;  i 0 0 0 ;  0 i 0 0] = iΩ,
    Mz = [1 0 0 0 ;  0 1 0 0 ;  0 0 −1 0 ;  0 0 0 −1] = β.    (B8)
Each of the four Maxwell equations is easily obtained from the matrix representation in Eq. (B6). This is done by taking the sums and differences of row I with row IV and of row II with row III, respectively. The first three give the y, x, and z components of the curl equations; the last one gives the divergence conditions present in Eq. (B3). Note that the matrices M are all nonsingular and all are Hermitian. Moreover, they satisfy the usual algebra of the Dirac matrices, including
    Mx β = −β Mx,    My β = −β My,
    Mx² = My² = Mz² = I,
    Mx My = −My Mx = iMz,    My Mz = −Mz My = iMx,    Mz Mx = −Mx Mz = iMy.    (B9)
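The relations in Eq. (B9), together with Ω² = −I and Ω = J⁻¹ for J = −Ω, are finite matrix identities. The following added sketch verifies them for the explicit representation of Eqs. (B7) and (B8), using plain Python lists:

```python
# Verify the Dirac-like algebra (B9) of the 4x4 matrices built from
# Omega and beta of Eq. (B7): Mx = Omega*beta (= -beta*Omega),
# My = i*Omega, Mz = beta.
def mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(c, m):
    return [[c * x for x in row] for row in m]

def eq(a, b, tol=1e-15):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(len(a)) for j in range(len(a)))

I4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
Omega = [[0, 0, -1, 0], [0, 0, 0, -1], [1, 0, 0, 0], [0, 1, 0, 0]]
beta = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

Mx = mul(Omega, beta)
My = scale(1j, Omega)
Mz = beta

assert eq(mul(Omega, Omega), scale(-1, I4))         # Omega^2 = -I
assert eq(mul(Omega, scale(-1, Omega)), I4)         # Omega * J = I, J = -Omega
assert eq(mul(Mx, Mx), I4) and eq(mul(My, My), I4) and eq(mul(Mz, Mz), I4)
assert eq(mul(Mx, My), scale(1j, Mz))               # Mx My = i Mz
assert eq(mul(My, Mz), scale(1j, Mx))               # My Mz = i Mx
assert eq(mul(Mz, Mx), scale(1j, My))               # Mz Mx = i My
assert eq(mul(Mx, beta), scale(-1, mul(beta, Mx)))  # Mx beta = -beta Mx
assert eq(mul(My, beta), scale(-1, mul(beta, My)))  # My beta = -beta My
```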
Before proceeding further, note that the pair (Ψ±, M) is not unique. Different choices of Ψ± would give rise to different M, such that the triplet M continues to satisfy the algebra of the Dirac matrices in Eq. (B9). We have preferred the Ψ± built via the Riemann–Silberstein vector [Eq. (B2)] of Bialynicki-Birula (1996b). This vector is well known in classical electrodynamics and has certain interesting properties and uses (Bialynicki-Birula, 1996b), which give it advantages over the other possible choices. In deriving the above 4 × 4 matrix representation of the Maxwell equations, we ignored the spatial and temporal derivatives of ε(r, t) and μ(r, t) in the first two Maxwell equations; that is, we treated ε and μ as local constants.
2. INHOMOGENEOUS MEDIUM

The previous section provided the evolution equations for the Riemann–Silberstein vectors in Eq. (B3), treating ε(r, t) and μ(r, t) as local constants, and from that pair of equations derived the matrix form of the Maxwell equations. This section provides the exact equations, taking into account the spatial and temporal variations of ε(r, t) and μ(r, t). It is possible to write the required evolution equations directly in terms of ε(r, t) and μ(r, t), but we follow the procedure of Bialynicki-Birula (1996b) and use the two derived laboratory functions
    Velocity function:    v(r, t) = 1/√(ε(r, t)μ(r, t)),
    Resistance function:  h(r, t) = √(μ(r, t)/ε(r, t)).    (B10)
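For the vacuum values ε₀ and μ₀, the two functions of Eq. (B10) reduce to the speed of light and the impedance of free space. A quick numerical sketch (an added illustration):

```python
import math

# Vacuum check of Eq. (B10): v = 1/sqrt(eps*mu) is the speed of light,
# h = sqrt(mu/eps) is the impedance ("resistance") of free space.
eps0 = 8.854187817e-12      # F/m
mu0 = 4.0e-7 * math.pi      # H/m

v = 1.0 / math.sqrt(eps0 * mu0)
h = math.sqrt(mu0 / eps0)

print(f"v = {v:.6e} m/s")   # close to 2.998e8 m/s
print(f"h = {h:.3f} ohm")   # close to 376.730 ohm

assert abs(v - 2.99792458e8) < 1e3
assert abs(h - 376.730) < 0.01

# Consistency with eps = 1/(v h) and mu = h/v:
assert abs(1.0 / (v * h) - eps0) / eps0 < 1e-12
assert abs(h / v - mu0) / mu0 < 1e-12
```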
The function v(r, t) has the dimensions of velocity, and the function h(r, t) has the dimensions of resistance (measured in ohms). We can equivalently use the conductance function, κ(r, t) = 1/h(r, t) = √(ε(r, t)/μ(r, t)) (measured in ohms⁻¹, or mhos!), in place of the resistance function h(r, t). These derived functions allow a more transparent understanding of the dependence on the variations (Bialynicki-Birula, 1996b). Moreover, the derived functions are the ones that are measured experimentally. In terms of these functions, ε = 1/(vh) and μ = h/v. Using these functions, the exact equations satisfied by F±(r, t) are
    i ∂F+(r, t)/∂t = v(r, t) [∇ × F+(r, t)] + (1/2)[∇v(r, t)] × F+(r, t)
        + (v(r, t)/2h(r, t)) [∇h(r, t)] × F−(r, t) − (i/√2) √(v(r, t)h(r, t)) J
        + (i/2)(v̇(r, t)/v(r, t)) F+(r, t) + (i/2)(ḣ(r, t)/h(r, t)) F−(r, t),

    i ∂F−(r, t)/∂t = −v(r, t) [∇ × F−(r, t)] − (1/2)[∇v(r, t)] × F−(r, t)
        − (v(r, t)/2h(r, t)) [∇h(r, t)] × F+(r, t) − (i/√2) √(v(r, t)h(r, t)) J
        + (i/2)(v̇(r, t)/v(r, t)) F−(r, t) + (i/2)(ḣ(r, t)/h(r, t)) F+(r, t),

    ∇ · F+(r, t) = (1/2v(r, t)) [∇v(r, t)] · F+(r, t) + (1/2h(r, t)) [∇h(r, t)] · F−(r, t)
        + (1/√2) √(v(r, t)h(r, t)) ρ,

    ∇ · F−(r, t) = (1/2v(r, t)) [∇v(r, t)] · F−(r, t) + (1/2h(r, t)) [∇h(r, t)] · F+(r, t)
        + (1/√2) √(v(r, t)h(r, t)) ρ,    (B11)
where v̇ = ∂v/∂t and ḣ = ∂h/∂t. The evolution equations in Eq. (B11) are exact (for linear media), and the dependence on the variations of ε(r, t) and μ(r, t) has been neatly expressed through the two derived functions. The coupling between F+(r, t) and F−(r, t) is via the gradient and time derivative of only one derived function, namely h(r, t), or equivalently κ(r, t); either can be used, and both are directly measured quantities. We further note that the dependence of the coupling is logarithmic:
    (1/h(r, t)) ∇h(r, t) = ∇[ln h(r, t)],    (1/h(r, t)) ∂h(r, t)/∂t = ∂[ln h(r, t)]/∂t,    (B12)
where ln is the natural logarithm. The coupling can be best summarized by expressing the equations in Eq. (B11) in a (block) matrix form. For this we introduce the following logarithmic function:

    L(r, t) = (1/2){1l ln[v(r, t)] + σ_x ln[h(r, t)]},    (B13)
where σ_x is from the triplet of the Pauli matrices

    σ = ( σ_x = [0 1 ; 1 0],  σ_y = [0 −i ; i 0],  σ_z = [1 0 ; 0 −1] ).    (B14)
Using the above notation, the matrix form of the equations in Eq. (B11) is

    i {1l ∂/∂t − (∂L/∂t)} (F+(r, t), F−(r, t))ᵀ
        = v(r, t) σ_z {1l ∇ + (∇L)} × (F+(r, t), F−(r, t))ᵀ − (i/√2) √(v(r, t)h(r, t)) J,

    {1l ∇ − (∇L)} · (F+(r, t), F−(r, t))ᵀ = +(1/√2) √(v(r, t)h(r, t)) ρ,    (B15)
where the dot-product and the cross-product are to be understood as
A C
A C
B u · D v B u × D v
= =
A·u+B·v C·u+D·v A×u+B×v . C×u+D×v
(B16)
Note that the 6 × 6 matrices in the evolution equations in Eq. (B15) are either Hermitian or antihermitian, and any dependence on the variations of ε(r, t) and μ(r, t) is at best weak. Further note that ∇[ln v(r, t)] = −∇[ln n(r, t)] and ∂[ln v(r, t)]/∂t = −∂[ln n(r, t)]/∂t. In some media the coupling may vanish (∇h(r, t) = 0 and ḣ(r, t) = 0), while in the same media the refractive index n(r, t) = c/v(r, t) may vary (∇n(r, t) ≠ 0 and/or ṅ(r, t) ≠ 0). It may further be possible to use the approximations ∇[ln h(r, t)] ≈ 0 and ∂[ln h(r, t)]/∂t ≈ 0. We use the following matrices to express the exact representation:
    Σ = [σ 0 ; 0 σ],    α = [0 σ ; σ 0],    I = [1l 0 ; 0 1l],    (B17)
where Σ are the Dirac spin matrices and α are the matrices used in the Dirac equation. Then,
    { [I 0 ; 0 I] ∂/∂t − (v̇(r, t)/2v(r, t)) [I 0 ; 0 I] + (ḣ(r, t)/2h(r, t)) [0 iβα_y ; iβα_y 0] } (Ψ+, Ψ−)ᵀ
        = −v(r, t) [ M·∇ + Σ·u    −iβ(Σ*·w)α_y ;  −iβ(Σ·w)α_y    M*·∇ + Σ*·u ] (Ψ+, Ψ−)ᵀ − (W+, W−)ᵀ,    (B18)
where

    u(r, t) = (1/2v(r, t)) ∇v(r, t) = (1/2) ∇[ln v(r, t)] = −(1/2) ∇[ln n(r, t)],
    w(r, t) = (1/2h(r, t)) ∇h(r, t) = (1/2) ∇[ln h(r, t)].    (B19)
The above representation contains thirteen 8 × 8 matrices! Ten of these are Hermitian. The exceptions are the three matrices containing the components of w(r, t), the logarithmic gradient of the resistance function; these three are antihermitian. We have expressed the Maxwell equations in a matrix form, in a medium with varying permittivity ε(r, t) and permeability μ(r, t), in the presence of sources. We have been able to do so using a single equation rather than a pair of matrix equations. We have used 8 × 8 matrices and have been able to separate the dependence of the coupling between the upper components (Ψ+) and the lower components (Ψ−) through the two laboratory functions. Moreover, the exact matrix representation has an algebraic structure very similar to that of the Dirac equation. It is interesting to note that the Maxwell equations can be derived from the Fermat principle of geometrical optics by a process of wavization analogous to the quantization of classical mechanics (Pradhan, 1987). We believe that this representation will be more suitable for some of the studies related to the photon wave function (Bialynicki-Birula, 1996b).
ACKNOWLEDGMENTS

I am grateful to Professor Ramaswamy Jagannathan for my training in the field of the quantum theory of charged-particle beam optics, the topic of my doctoral thesis, which he so elegantly supervised. Naturally, we were dealing with the relativistic wave equations, and he taught me the related techniques, including the Foldy–Wouthuysen transformation. I am thankful to him for suggesting the novel project of investigating light beam optics, initially scalar optics (Helmholtz optics), using the Foldy–Wouthuysen transformation. I also thankfully acknowledge the collaboration, during the initial work, with Professor Rajiah Simon. Later, Professor Jagannathan guided me in the logical continuation from scalar optics to vector optics, leading to the matrix formulation of Maxwell optics, again using the Foldy–Wouthuysen transformation. I had the benefit of many discussions with Professor Simon. During the course of my investigations, I had the privilege of enjoying the hospitality of the Institute of Mathematical Sciences (MatScience/IMSc) in Chennai (Madras),
India. I also thank Professor Hawkes for showing a keen interest in the quantum theory of charged-particle beam optics, which led Jagannathan and me to write a long and comprehensive chapter a decade ago. Professor Hawkes' continued encouragement has resulted in this chapter.
REFERENCES

Acharya, R., and Sudarshan, E. C. G. (1960). Front description in relativistic quantum mechanics. J. Math. Phys. 1, 532–536.
Ambrosini, D., Ponticiello, A., Schirripa Spagnolo, G., Borghi, R., and Gori, F. (1997). Bouncing light beams and the Hamiltonian analogy. Eur. J. Phys. 18, 284–289.
Asaga, T., Fujita, T., and Hiramoto, M. (2000). EDM operators free from Schiff's theorem. arXiv: hep-ph/0005314.
Bialynicki-Birula, I. (1994). On the wave function of the photon. Acta Phys. Polonica A 86, 97–116.
Bialynicki-Birula, I. (1996a). The photon wave function. In "Coherence and Quantum Optics VII" (J. H. Eberly, L. Mandel, and E. Wolf, eds.), pp. 313–322. Plenum Press, New York.
Bialynicki-Birula, I. (1996b). Photon wave function. In "Progress in Optics," Vol. XXXVI (E. Wolf, ed.), pp. 245–294. Elsevier, Amsterdam.
Bjorken, J. D., and Drell, S. D. (1964). Relativistic Quantum Mechanics. McGraw-Hill, New York.
Born, M., and Wolf, E. (1999). Principles of Optics. Cambridge University Press, Cambridge, UK.
Brown, R. W., Krauss, L. M., and Taylor, P. L. (2001). Obituary of Leslie Lawrence Foldy. Physics Today 54(12), 75–76.
Case, K. M. (1954). Some generalizations of the Foldy–Wouthuysen transformation. Phys. Rev. 95, 1323–1328.
Chen, P. (ed.). (1999). Proceedings of the 15th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics, January 4–9, 1998, Monterey, California. World Scientific, Singapore.
Chen, P. (ed.). (2002). Proceedings of the 18th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics, October 15–20, 2000, Capri, Italy. World Scientific, Singapore.
Chen, P., and Reil, K. (eds.).
(2003). Proceedings of the Joint 28th ICFA Advanced Beam Dynamics and Advanced & Novel Accelerators Workshop on Quantum Aspects of Beam Physics and Other Critical Issues of Beams in Physics and Astrophysics, January 7–11, 2003, Hiroshima University, Japan. World Scientific, Singapore. http://home.hiroshima-u.ac.jp/ogata/qabp/home.html; http://www.slac.stanford.edu/pubs/slacreports/slacr-630.html.
Chen, P. (1998). Workshop reports. ICFA Beam Dynamics Newsletter 16, 22–25.
Chen, P. (2000). Workshop reports. ICFA Beam Dynamics Newsletter 23, 13–14.
Chen, P. (2003a). Workshop reports. ICFA Beam Dynamics Newsletter 30, 72–75.
Chen, P. (2003b). Workshop reports. Bulletin of the Association of Asia Pacific Physical Societies 13(1), 34–37.
Conte, M., Jagannathan, R., Khan, S. A., and Pusterla, M. (1996). Beam optics of the Dirac particle with anomalous magnetic moment. Particle Accelerators 56, 99–126.
Costella, J. P., and McKellar, B. H. J. (1995). The Foldy–Wouthuysen transformation. arXiv: hep-ph/9503416. American Journal of Physics 63, 1119–1121.
Dragt, A. J. (1982). A Lie algebraic theory of geometrical optics and optical aberrations. J. Opt. Soc. Am. 72, 372–379.
Dragt, A. J. (1988). Lie Algebraic Method for Ray and Wave Optics. University of Maryland Physics Department Report.
Dragt, A. J., and Forest, E. (1986). Advances in Imaging and Electron Physics, Vol. 67, pp. 65–120. Academic Press, San Diego.
Dragt, A. J., Forest, E., and Wolf, K. B. (1986). Foundations of a Lie algebraic theory of geometrical optics. In "Lie Methods in Optics," Lecture Notes in Physics, Vol. 250, pp. 105–157. Springer-Verlag, Berlin.
Dragt, A. J., Neri, F., Rangarajan, G., Douglas, D. R., Healy, L. M., and Ryne, R. D. (1988). Lie algebraic treatment of linear and nonlinear beam dynamics. Ann. Rev. Nucl. Part. Sci. 38, 455–496.
Fedele, R., and Man'ko, V. I. (1999). The role of semiclassical description in the quantum-like theory of light rays. Phys. Rev. E 60, 6042–6050.
Feshbach, H., and Villars, F. M. H. (1958). Elementary relativistic wave mechanics of spin 0 and spin 1/2 particles. Rev. Mod. Phys. 30, 24–45.
Fishman, L. (1992). Exact and operator rational approximate solutions of the Helmholtz, Weyl composition equation in underwater acoustics-the quadratic profile. J. Math. Phys. 33(5), 1887–1914.
Fishman, L. (2004). One-way wave equation modeling in two-way wave propagation problems. In "Mathematical Modelling of Wave Phenomena 2002," Mathematical Modelling in Physics, Engineering and Cognitive Sciences, Vol. 7 (B. Nilsson and L. Fishman, eds.), pp. 91–111. Växjö University Press, Växjö, Sweden.
Fishman, L., and McCoy, J. J. (1984). Derivation and application of extended parabolic wave theories. Part I. The factored Helmholtz equation. J. Math. Phys. 25, 285–296.
Foldy, L. L. (1952). The electromagnetic properties of the Dirac particles. Phys. Rev. 87(5), 682–693.
Foldy, L. L. (2006). Origins of the FW transformation: A memoir. Appendix G in "Physics at a Research University, Case Western Reserve University 1830–1990" (W. Fickinger, ed.), pp. 347–351. http://www.phys.cwru.edu/history
Foldy, L. L., and Wouthuysen, S. A. (1950). On the Dirac theory of spin 1/2 particles and its non-relativistic limit. Phys. Rev. 78, 29–36.
Forbes, G. W. (2001). Hamilton's optics: Characterizing ray mapping and opening a link to waves. Optics & Photonics News 12(11), 34–38.
Forest, E., and Hirata, K. (1992). A Contemporary Guide to Beam Dynamics. KEK Report 92-12. National Laboratory for High Energy Physics, Tsukuba, Japan.
Forest, E., Berz, M., and Irwin, J. (1989). Part. Accel. 24, 91–97.
Goodman, J. W. (1996). Introduction to Fourier Optics, 2nd ed. McGraw-Hill, New York.
Hawkes, P. W., and Kasper, E. (1989a). "Principles of Electron Optics," Vol. I, "Basic Geometrical Optics." Academic Press, London.
Hawkes, P. W., and Kasper, E. (1989b). "Principles of Electron Optics," Vol. II, "Applied Geometrical Optics." Academic Press, London.
Hawkes, P. W., and Kasper, E. (1994). "Principles of Electron Optics," Vol. III, "Wave Optics." Academic Press, London.
Heinemann, K., and Barber, D. P. (1999). The semiclassical Foldy–Wouthuysen transformation and the derivation of the Bloch equation for spin-1/2 polarised beams using Wigner functions. arXiv: physics/9901044. In Proceedings of the 15th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics (P. Chen, ed.), January 4–9, 1998, Monterey, California. World Scientific, Singapore.
Jackson, J. D. (1998). Classical Electrodynamics, 3rd ed. John Wiley & Sons, New York.
Jagannathan, R. (1990). Quantum theory of electron lenses based on the Dirac equation. Phys. Rev. A 42, 6674–6689.
Jagannathan, R. (1993). Dirac equation and electron optics. In "Dirac and Feynman: Pioneers in Quantum Mechanics" (R. Dutt and A. K. Ray, eds.), pp. 75–82. Wiley Eastern, New Delhi, India.
Jagannathan, R. (1999). The Dirac equation approach to spin-1/2 particle beam optics. arXiv: physics/9803042, pp. 670–681. In Proceedings of the 15th Advanced ICFA Beam Dynamics
Workshop on Quantum Aspects of Beam Physics (P. Chen, ed.), January 4–9, 1998, Monterey, California. World Scientific, Singapore.
Jagannathan, R. (2002). Quantum mechanics of Dirac particle beam optics: Single-particle theory. arXiv: physics/0101060, pp. 568–577. In Proceedings of the 18th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics (P. Chen, ed.), October 15–20, 2000, Capri, Italy. World Scientific, Singapore.
Jagannathan, R. (2003). Quantum mechanics of Dirac particle beam transport through optical elements with straight and curved axes. arXiv: physics/0304099, pp. 13–21. In Proceedings of the 28th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics (P. Chen and K. Reil, eds.), January 2003, Hiroshima, Japan. World Scientific, Singapore.
Jagannathan, R., and Khan, S. A. (1995). Wigner functions in charged particle optics. In "Selected Topics in Mathematical Physics-Professor R. Vasudevan Memorial Volume" (R. Sridhar, K. Srinivasa Rao, and V. Lakshminarayanan, eds.), pp. 308–321. Allied Publishers, Delhi, India.
Jagannathan, R., and Khan, S. A. (1996). Quantum theory of the optics of charged particles. In "Advances in Imaging and Electron Physics," Vol. 97 (P. W. Hawkes, ed.), pp. 257–358. Academic Press, San Diego.
Jagannathan, R., and Khan, S. A. (1997). Quantum mechanics of accelerator optics. ICFA Beam Dynamics Newsletter 13, 21–27.
Jagannathan, R., Simon, R., Sudarshan, E. C. G., and Mukunda, N. (1989). Quantum theory of magnetic electron lenses based on the Dirac equation. Phys. Lett. A 134, 457–464.
Jayaraman, J. (1975). A note on the recent Foldy–Wouthuysen transformations for particles of arbitrary spin. J. Phys. A: Math. Gen. 8, L1–L4.
Khan, S. A. (1997). Quantum Theory of Charged-Particle Beam Optics. Ph.D. thesis, University of Madras, Chennai, India.
Khan, S. A. (1999a). Quantum theory of magnetic quadrupole lenses for spin-1/2 particles. arXiv: physics/9809032, pp. 682–694. In Proceedings of the 15th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics (P. Chen, ed.), January 4–9, 1998, Monterey, California. World Scientific, Singapore.
Khan, S. A. (1999b). Quantum aspects of accelerator optics. arXiv: physics/9904063. In Proceedings of the 1999 Particle Accelerator Conference (PAC99) (A. Luccio and W. MacKay, eds.), pp. 2817–2819, March 29–April 2, 1999, New York City, NY. (IEEE Catalogue Number: 99CH36366.)
Khan, S. A. (2001). The world of synchrotrons. arXiv: physics/0112086. Resonance Journal of Science Education 6(11), 77–84. (Monthly publication of the Indian Academy of Sciences.)
Khan, S. A. (2002a). Quantum formalism of beam optics. arXiv: physics/0112085, pp. 517–526. In Proceedings of the 18th Advanced ICFA Beam Dynamics Workshop on Quantum Aspects of Beam Physics (P. Chen, ed.), October 15–20, 2000, Capri, Italy. World Scientific, Singapore.
Khan, S. A. (2002b). Analogies between light optics and charged-particle optics. arXiv: physics/0210028. ICFA Beam Dynamics Newsletter 27, 42–48.
Khan, S. A. (2002c). Introduction to synchrotron radiation. arXiv: physics/0112086. Bulletin of the IAPT 19(5), 149–153. (IAPT: Indian Association of Physics Teachers.)
Khan, S. A. (2005a). Wavelength-dependent modifications in Helmholtz optics. Int. J. Theoret. Phys. 44(1), 95–125.
Khan, S. A. (2005b). An exact matrix representation of Maxwell's equations. Phys. Scripta 71(5), 440–442.
Khan, S. A. (2006a). The Foldy–Wouthuysen transformation technique in optics. Optik-International Journal for Light and Electron Optics 117(10), 481–488.
Khan, S. A. (2006b). Wavelength-dependent effects in light optics. In "New Topics in Quantum Physics Research" (V. Krasnoholovets and F. Columbus, eds.), pp. 163–204. Nova Science Publishers, New York.
The Foldy–Wouthuysen Transformation Technique in Optics
77
Khan, S. A. (2007). Arab origins of the discovery of the refraction of light: Roshdi Hifni Rashed awarded the 2007 King Faisal International Prize. Optics Photonics News 18(10), 22–23. Khan, S. A., and Jagannathan, R. (1993). Theory of relativistic electron beam transport based on the Dirac equation. pp. 102–107. In Proceedings of the 3rd National Seminar on Physics and Technology of Particle Accelerators and their Applications, PATPAA-93, (S. N. Chintalapudi, ed) November 25–27, 1993, Kolkata (Calcutta), IUC-DAEF, Kolkata (Calcutta), India. Khan, S. A., and Jagannathan, R. (1994). Quantum mechanics of charged-particle beam optics: An operator approach. Presented at the JSPS-KEK International Spring School on High Energy Ion Beams—Novel Beam Techniques and Their Applications, March 1994, Tsukuba, Japan. Khan, S. A., and Jagannathan, R. (1995). On the quantum mechanics of charged particle beam transport through magnetic lenses. Phys. Rev. E 51, 2510–2515. Khan, S. A., Jagannathan, R., and Simon, R. (2002). Foldy-Wouthuysen transformation and a quasiparaxial approximation scheme for the scalar wave theory of light beams. arXiv: physics/0209082. Laporte, O., and Uhlenbeck, G. E. (1931). Applications of spinor analysis to the Maxwell and Dirac equations. Phys. Rev., 37, 1380–1397. Leopold, H. (1997). Obituary of Siegfried A Wouthuysen. Physics Today 50(11), 89. Lippert, M., Brückel, Th., Köhler, Th., and Schneider, J. R. (1994). High-resolution bulk magnetic scattering of high-energy synchrotron radiation. Europhys. Lett. 27(7), 537–541. Majorana, E. (1974). (unpublished notes), quoted after Mignani, R., Recami, E., and Baldo, M. About a Diraclike equation for the Photon, according to Ettore Majorana. Lett. Nuovo Cimento 11, 568–572. Moses, E. (1959). Solutions of Maxwell’s equations in terms of a spinor notation: the direct and inverse problems. Phys. Rev. 113(6), 1670–1679. Mukunda, N., Simon, R., and Sudarshan, E. C. G. (1983a). 
Paraxial-wave optics and relativistic front description. I. The scalar theory. Phys. Rev. A 28, 2921–2932. Mukunda, N., Simon, R., and Sudarshan, E. C. G. (1983b). Paraxial-wave optics and relativistic front description. II. The vector theory. Phys. Rev. A 28, 2933–2942. Mukunda, N., Simon, R., and Sudarshan, E. C. G. (1985). Fourier optics for the Maxwell field: Formalism and applications. J. Opti. Soc. Am. A 2(3), 416–426. Orris, G. J., and Wurmser, D. (1995). Applications of the Foldy-Wouthuysen transformation to acoustic modeling using the parabolic equation method. J Acoust. Soc. Am. 98, 2870. Osche, G. R. (1977). Dirac and Dirac-Pauli equation in the Foldy-Wouthuysen representation, Phys. Rev. D 15(8), 2181–2185. Pachucki, K. (2004). Higher-order effective Hamiltonian for light atomic systems. arXiv: physics/0411168. Panofsky, W. K. H., and Phillips, M. (1962). Classical Electricity and Magnetics. Addison-Wesley Publishing Company. Reading, Massachusetts, USA. Patton, R. S. (1986). In “Path Integrals from meV to MeV” (M. Gutzwiller, A. Inomata, J. R. Klauder, and L. Streit, Eds.), pp. 98–115. World Scientific, Singapore. Pradhan, T. (1987). Maxwell’s equations from geometrical optics. Phys. Lett. A 122(8), 397–398. Pryce, M. H. L. (1948). The mass-centre in the restricted theory of relativity and its connexion with the quantum theory of elementary particles. Proc. R. Soc. London A Math Phys. Sci. A 195, 62–81. Rangarajan, G., Dragt, A. J., and Neri, F. (1990). Solvable map representation of a nonlinear symplectic map. Part. Accel. 28, 119–124. Rashed, R. (1990). A Pioneer in Anaclastics—Ibn Sahl on Burning Mirrors and Lenses. ISIS, 81 464–491. Rashed, R. (1993). Géométrie et Dioptrique au Xe siècle: Ibn Sahl, al-Quhî et Ibn al-Haytham. Collection Sciences et Philosophie Arabes, Textes et Études, Les Belles Lettres, Paris. France.
78
Sameen Ahmed Khan
Ryne, R. D., and Dragt, A. J. (1991). Magnetic optics calculations for cylindrically symmetric beams. Part. Accel. 35, 129–165. Silberstein, L. (1907a). Elektromagnetische Grundgleichungen in bivektorieller Behandlung, Ann. Phys. (Leipzig) 22, 579–586. Silberstein, L. (1907b). Nachtrag zur Abhandlung über Elektromagnetische Grundgleichungen in bivektorieller Behandlung. Ann. Phys. (Leipzig) 24, 783–784. Simon R., Sudarshan, E. C. G., and Mukunda, N. (1986). Gaussian-Maxwell beams. J. Optic. Soc. Am. A 3(4), 536–540. Simon R., Sudarshan, E. C. G., and Mukunda, N. (1987). Cross polarization in laser beams. Appl. Optics 26(9), 1589–1593. Tani, S. (1951). Connection between particle models and field theories. I. The case spin 1/2. Prog. Theoret. Phys. 6, 267–285. Todesco, E. (1999). Overview of single-particle nonlinear dynamics. Presented at 16th ICFA Beam Dynamics Workshop on Nonlinear and Collective Phenomena in Beam Physics, Arcidosso, Italy, Sept 1–5, 1998. AIP Conf. Proc. 468, 157–172. Turchetti, G., Bazzani, A., Giovannozzi, M., Servizi, G., and Todesco, E. (1989). Normal forms for symplectic maps and stability of beams in particle accelerators, pp. 203–231. Proceedings of the Dynamical Symmetries and Chaotic Behaviour in Physical Systems, Bologna, Italy. Wurmser, D. (2001). A new strategy for applying the parabolic equation to a penetrable rough surface. J. Acoust. Soc. Am. 109(5), 2300. Wurmser, D. (2004). A parabolic equation for penetrable rough surfaces: using the FoldyWouthuysen transformation to buffer density jumps. Ann. Phys. 311, 53–80.
CHAPTER 3

Nonlinear Systems for Image Processing

Saverio Morfu*, Patrick Marquié*, Brice Nofiélé*, and Dominique Ginhac*

Contents

I Introduction
II Mechanical Analogy
  A Overdamped Case
  B Inertial Systems
III Inertial Systems
  A Image Processing
  B Electronic Implementation
IV Reaction–Diffusion Systems
  A One-Dimensional Lattice
  B Noise Filtering of a One-Dimensional Signal
  C Two-Dimensional Filtering: Image Processing
V Conclusion
VI Outlooks
  A Outlooks on Microelectronic Implementation
  B Future Processing Applications
Acknowledgments
Appendix A
Appendix B
Appendix C
Appendix D
References
I. INTRODUCTION

For almost 100 years, nonlinear science has attracted the attention of researchers seeking to circumvent the limitations of linear theories in the explanation of natural phenomena. Indeed, nonlinear differential equations can model the behavior of ocean surfaces (Scott, 1999), the recurrence of ice ages (Benzi et al., 1982), the transport mechanisms in living cells (Murray, 1989), the information transmission in neural networks (Izhikevich, 2007; Nagumo et al., 1962; Scott, 1999), the blood pressure propagation in arteries (Paquerot and Remoissenet, 1994), or the excitability of cardiac tissues (Beeler and Reuter, 1977; Keener, 1987). Nonlinear science therefore appears as the most important frontier for a better understanding of nature (Remoissenet, 1999). In the recent field of engineering science (Agrawal, 2002; Zakharov and Wabnitz, 1998), considering nonlinearity has allowed spectacular progress in the transmission capacity of optical fibers via the concept of the soliton (Remoissenet, 1999). More recently, nonlinear differential equations from many areas of physics, biology, chemistry, and ecology have inspired unconventional methods of processing that transcend the limitations of classical linear methods (Teuscher and Adamatzky, 2005). This growing interest in processing applications based on the properties of nonlinear systems can be explained by the observation that fundamental progress in several fields of computer science sometimes seems to stagnate. Novel ideas derived from interdisciplinary fields often open new directions of research with unsuspected applications (Teuscher and Adamatzky, 2005). On the other hand, complex processing tasks require intelligent systems capable of adapting and learning by mimicking the behavior of the human brain. Biologically inspired systems, most often described by nonlinear reaction–diffusion equations, have been proposed as convenient solutions to very complicated problems inaccessible to modern von Neumann computers. It was in this context that the concept of the cellular neural network (CNN) was introduced by Chua and Yang as a novel class of information-processing systems with potential applications in areas such as image processing and pattern recognition (Chua and Yang, 1988a, b).

* Laboratoire LE2I UMR 5158, Aile des sciences de l'ingénieur, BP 47870, 21078 Dijon Cedex, France

Advances in Imaging and Electron Physics, Volume 152, ISSN 1076-5670, DOI: 10.1016/S1076-5670(08)00603-4. Copyright © 2008 Elsevier Inc. All rights reserved.
In fact, CNNs are used both in the context of brain science and in that of emergence and complexity (Chua, 1998). Since the pioneering work of Chua, the CNN paradigm has rapidly evolved to cover a wide range of applications drawn from numerous disciplines, including artificial life, biology, chemistry, physics, information science, nonconventional methods of computing (Holden et al., 1991), video coding (Arena et al., 2003; Venetianer et al., 1995), quality control by visual inspection (Occhipinti et al., 2001), cryptography (Caponetto et al., 2003; Yu and Cao, 2006), signal and image processing (Julián and Dogaru, 2002), and so on (see Tetzlaff, 2002, for an overview of the applications). In summary, the past two decades devoted to the study of CNNs have led scientists to solve problems of artificial intelligence by combining the highly parallel multiprocessor architecture of CNNs with the properties inherited from nonlinear bio-inspired systems. Among the tasks of high computational complexity routinely performed with nonlinear systems are finding the optimal path in a two-dimensional (2D) vector field (Agladze et al., 1997), image skeletonization (Chua, 1998), finding
the shortest path in a labyrinth (Chua, 1998; Rambidi and Yakovenchuk, 2001), or controlling mobile robots (Adamatzky et al., 2004). However, the efficiency of these nonlinear systems for signal and image processing or pattern recognition does not come only from their biological background. Indeed, the nonlinearity offers an additional dimension, lying in the signal amplitude, which gives rise to novel properties not shared by linear systems. Noise removal with a nonlinear dissipative lattice (Comte et al., 1998; Marquié et al., 1998), contrast enhancement based on the properties of nonlinear oscillators (Morfu and Comte, 2004), edge detection exploiting vibration noise (Hongler et al., 2003), and optimization of nonoptimum problems or signal detection aided by noise via the stochastic resonance phenomenon (Chapeau-Blondeau, 2000; Comte and Morfu, 2003; Gammaitoni et al., 1998) constitute a nonrestrictive list of examples in which the properties of nonlinear systems have allowed overcoming the limitations of classical linear approaches. Owing to the rich variety of potential applications inspired by nonlinear systems, the efforts of researchers have focused on the experimental realization of such efficient information-processing devices. Two different strategies were introduced (Chua and Yang, 1988a; Kuhnert, 1986), and today the fascinating challenge of implementing artificial intelligence with CNNs is still being investigated. The first technique dates from the late 1980s with the works of Kuhnert, who proposed taking advantage of the properties of Belousov–Zhabotinsky-type media for image-processing purposes (Kuhnert, 1986; Kuhnert et al., 1989). The primary concept is that each micro-volume of the active photosensitive chemical medium acts as a one-bit processor corresponding to the reduced/oxidized state of the catalyst (Agladze et al., 1997). This feature of chemical photosensitive nonlinear media has allowed implementation of numerous tools for image processing.
Edge enhancement, classical operations of mathematical morphology, the restoration of individual components of an image with overlapped components (Rambidi et al., 2002), image skeletonization (Adamatzky et al., 2002), the detection of urban roads, and the analysis of medical images (Teuscher and Adamatzky, 2005) represent a brief overview of the processing tasks computed by chemical nonlinear media. However, even considering the large number of chemical “processors,” the very low velocity of trigger waves in chemical media is sometimes incompatible with the real-time processing constraints imposed by practical applications (Agladze et al., 1997). Nevertheless, the limitations of these unconventional methods of computing in no way dismiss the efficiency and high prospects of the processing developed with active chemical media (Adamatzky and de Lacy Costello, 2003). By contrast, analog circuits do not share the weakness of the previous strategy of integration. Therefore, because of their real-time processing
capability, electronic hardware devices constitute the most common way to implement CNNs (Chua and Yang, 1988a). The first step in electronically developing a CNN for image-processing purposes consists of designing an elementary cell. More precisely, this basic unit of CNNs usually contains linear capacitors, linear resistors, and linear and nonlinear controlled sources (Chua and Yang, 1988b; Comte and Marquié, 2003). Next, to complete the description of the network, a coupling law between cells is introduced. Owing to the propagation mechanism inherited from the continuous-time dynamics of the network, the cells interact not only with their nearest neighbors but also with cells that are not directly connected. Among the applications that can be electronically realized are character recognition (Chua and Yang, 1988), edge filtering (Chen et al., 2006; Comte et al., 2001), noise filtering (Comte et al., 1998; Julián and Dogaru, 2002; Marquié et al., 1998), contrast enhancement, and gray-level extraction with a network of nonlinear oscillators (Morfu, 2005; Morfu et al., 2007). The principle of CNN integration with discrete electronic components is closely related to the development of nonlinear electrical transmission lines (NLTLs) (Remoissenet, 1999). Indeed, under certain conditions (Chua, 1998), the parallel processing of information can be ruled by nonlinear differential equations that also describe the evolution of the voltage at the nodes of an electrical lattice. It is then clear that considering a one-dimensional (1D) lattice allows signal filtering, while extending the concept to a 2D network can provide image-processing applications. The development of NLTLs was motivated mainly by the fact that these systems are quite simple and relatively inexpensive experimental devices that allow a quantitative study of the properties of nonlinear waves (Scott, 1970).
In particular, since the pioneering works of Hirota and Suzuki (1970) and Nagashima and Amagishi (1978) on electrical lines simulating the Toda lattice (Toda, 1967), these NLTLs, which can be considered as analog simulators, provide a useful way to determine the behavior of excitations inside the nonlinear medium (Jäger, 1985; Kuusela, 1995; Marquié et al., 1995; Yamgoué et al., 2007). This chapter is devoted primarily to the presentation of a few particular nonlinear processing tools and discusses their electronic implementation with discrete components. After a brief mechanical description of nonlinear systems, we present a review of the properties of both purely inertial systems and overdamped systems. The following sections show how taking advantage of these properties allows the development of unconventional processing methods. In particular, considering the features of purely inertial systems, we show how it is possible to perform various image-processing tasks, such as contrast enhancement of a weakly contrasted picture, extraction of gray levels, or encryption of an image. The electronic sketch of the elementary cell of this
inertial CNN is proposed, and the nonlinear properties that allow the previous image-processing tasks are experimentally investigated. The third part of this chapter is devoted exclusively to the filtering applications inspired by reaction–diffusion media (for example, noise filtering, edge detection, or extraction of regions of interest in a weakly noisy contrasted picture). In each case, the elementary cell of the electronic CNN is developed, and we experimentally investigate its behavior in the specific context of signal and image processing. We conclude by discussing the possible microelectronic implementations of the previous nonlinear systems. In addition, the last section contains some perspectives for future developments inspired by recent properties of nonlinear systems. In particular, we present a paradoxical nonlinear effect known as stochastic resonance (Benzi et al., 1982; Chapeau-Blondeau, 1999; Gammaitoni et al., 1998), which is purported to have potential applications in visual perception (Simonotto et al., 1997). We trust that the multiple topics in this contribution will assist readers in better understanding the potential applications based on the properties of nonlinear systems. Moreover, the various electronic realizations presented constitute a serious background for future experiments and studies devoted to nonlinear phenomena. As this chapter is written for an interdisciplinary readership of physicists and engineers, it is our hope that it will encourage readers to perform their own experiments.
II. MECHANICAL ANALOGY

In order to understand the image-processing tools inspired by the properties of nonlinear systems, we present a mechanical analogy of these nonlinear systems. From a mechanical point of view, we consider a chain of particles of mass M subjected to a nonlinear force f deriving from a potential \Phi and coupled by springs of strength D. If W_n represents the displacement of particle n, the fundamental principle of mechanics (Newton's second law) is written as
M \frac{d^2 W_n}{dt^2} = -\frac{d\Phi}{dW_n} - \lambda \frac{dW_n}{dt} + R_n, \qquad (1)

where M \frac{d^2 W}{dt^2} represents the inertia term and \lambda \frac{dW}{dt} corresponds to a friction force. Furthermore, the resulting elastic force R_n applied to the nth particle by its neighbors can be defined by

R_n = D \sum_{j \in N_r} \left( W_j - W_n \right), \qquad (2)
where N_r is the neighborhood, namely, N_r = \{n-1, n+1\} in the case of a 1D chain. We propose to investigate separately the purely inertial case, that is, M\,d^2W/dt^2 \gg \lambda\,dW/dt, and the overdamped one, deduced when M\,d^2W/dt^2 \ll \lambda\,dW/dt.
A. Overdamped Case In this section, an overdamped system is presented by neglecting the inertia term of Eq. (1) compared to the friction force. We specifically consider λ = 1 and the case of a cubic nonlinear force
f(W) = -W(W - \alpha)(W - 1), \qquad (3)

deriving from the double-well potential \Phi(W) = -\int_0^W f(u)\,du, as represented in Figure 1 for different values of α. The roots 0 and 1 of the nonlinear force correspond to the positions of the local minima of the potential, namely, the well bottoms, whereas the root α represents the position of the potential maximum. The nonlinearity threshold α defines the potential barrier between the potential minimum with the highest energy and the potential maximum. To explain the propagation mechanism in this chain, it is convenient to define the excited state as the position of the potential minimum with the highest energy, and the rest state as the position corresponding to the minimum of the potential energy. As shown in Figure 1a,
FIGURE 1 Double-well potential deduced from the nonlinear force (3). (a) For α < 1/2, the well bottom with highest energy is located at W = 0; the potential barrier is given by \Delta\Phi = -\int_0^{\alpha} f(u)\,du = \Phi(\alpha) - \Phi(0). (b) For α > 1/2, the symmetry of the potential is reversed: W = 1 becomes the position of the well bottom of highest energy, and the potential barrier is \Delta\Phi = -\int_1^{\alpha} f(u)\,du = \Phi(\alpha) - \Phi(1).
the excited state is 0 and the rest state is 1 when the nonlinearity threshold α < 1/2. In the case α > 1/2, since the potential symmetry is reversed, the excited state becomes 1 and the rest state is 0 (Figure 1b). The equation that rules this overdamped nonlinear system can be deduced from Eq. (1). Indeed, when the second derivative versus time is neglected compared with the first derivative and when λ = 1, Eq. (1) reduces to the discrete version of Fisher's equation, introduced in the 1930s as a model for genetic diffusion (Fisher, 1937):
\frac{dW_n}{dt} = D(W_{n+1} + W_{n-1} - 2W_n) + f(W_n). \qquad (4)
1. Uncoupled Case

We first investigate the uncoupled case, that is, D = 0 in Eq. (4), to determine the bistability of the system. The behavior of a single particle of displacement W and initial position W^0 obeys

\frac{dW}{dt} = -W(W - \alpha)(W - 1). \qquad (5)
The zeros of the nonlinear force f, W = 1 and W = 0, correspond to stable steady states, whereas the state W = α is unstable. The stability analysis can be carried out by solving Eq. (5) with the nonlinear force f = -W(W - \alpha)(W - 1) replaced by its linearized expression near the considered steady state W^* \in \{0, 1, \alpha\}. If f_W(W^*) denotes the derivative of the nonlinear force with respect to W at W = W^*, we are led to solve

\frac{dW}{dt} = f_W(W^*)(W - W^*) + f(W^*). \qquad (6)
The solution of Eq. (6) can then be easily expressed as
W(t) = W^* + C\,e^{f_W(W^*)\,t} - \frac{f(W^*)}{f_W(W^*)}, \qquad (7)
where C is a constant depending on the initial condition, that is, the initial position of the particle. The solution in Eq. (7), obtained with a linear approximation of the nonlinear force f, shows that the stability is set by the sign of the argument of the exponential function. Indeed, for W^* = 0 and W^* = 1, the sign of f_W(W^*) is negative, implying that W(t \to \infty) tends to a constant. Therefore, the two points W^* = 0 and W^* = 1 are stable steady states. Conversely, for W^* = α, f_W(W^*) is
positive, inducing a divergence for W(t \to \infty); W^* = α is an unstable steady state. We now focus our attention on the particular case α = 1/2 since it allows interesting applications in signal and image processing. This case is extensively developed in Appendix A, where it is shown that the displacement of a particle with initial position W^0 can be expressed as
W(t) = \frac{1}{2}\left(1 + \frac{W^0 - \frac{1}{2}}{\sqrt{\left(W^0 - \frac{1}{2}\right)^2 - W^0\left(W^0 - 1\right)e^{-\frac{t}{2}}}}\right). \qquad (8)
This theoretical expression is compared in Figure 2 with the numerical results obtained by solving Eq. (5) using a fourth-order Runge–Kutta algorithm with integration time step dt = 10^{-3}. As shown in Figure 2, when the initial condition W^0 is below the unstable state α = 1/2, the particle evolves toward the steady state 0. Otherwise, if the initial condition W^0 exceeds the unstable state α = 1/2, the particle evolves toward the other steady state, 1. Therefore, the unstable state α = 1/2 acts as a threshold, and the system exhibits a bistable behavior.
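This bistable behavior is easy to reproduce numerically. The sketch below (plain Python; the helper names `f`, `rk4_step`, `integrate`, and `analytic` are ours for illustration, not from the chapter) integrates Eq. (5) with a fourth-order Runge–Kutta scheme and time step dt = 10^{-3}, as in the text, and compares the trajectory with the closed-form solution (8) for α = 1/2:

```python
import math

ALPHA = 0.5
DT = 1e-3

def f(w, alpha=ALPHA):
    """Cubic nonlinear force of Eq. (3)."""
    return -w * (w - alpha) * (w - 1.0)

def rk4_step(w, dt=DT):
    """One fourth-order Runge-Kutta step for dW/dt = f(W)."""
    k1 = f(w)
    k2 = f(w + 0.5 * dt * k1)
    k3 = f(w + 0.5 * dt * k2)
    k4 = f(w + dt * k3)
    return w + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def analytic(w0, t):
    """Closed-form solution (8), valid for alpha = 1/2."""
    d = math.sqrt((w0 - 0.5) ** 2 - w0 * (w0 - 1.0) * math.exp(-t / 2.0))
    return 0.5 * (1.0 + (w0 - 0.5) / d)

def integrate(w0, t_end):
    """Integrate Eq. (5) from W(0) = w0 up to t_end."""
    w = w0
    for _ in range(int(round(t_end / DT))):
        w = rk4_step(w)
    return w

# Initial conditions below the threshold alpha relax to 0, above to 1,
# and the numerical trajectory matches the analytical expression (8).
print(integrate(0.4, 20.0), integrate(0.6, 20.0))
print(abs(integrate(0.6, 5.0) - analytic(0.6, 5.0)))
```

The two printed trajectories end near the stable states 0 and 1, and the numerical and analytical solutions coincide to within the integration error.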
FIGURE 2 Bistable behavior of the overdamped system in the case α = 1/2. Left: evolution of a particle for different initial conditions in the range [0; 1]. The solid lines are plotted with the analytical expression (8), whereas the (o) signs correspond to the numerical solution of Eq. (5) for different initial conditions W^0 ∈ [0; 1]. The potential \Phi obtained by integrating the nonlinear force (3) is represented at the right as a reference.
2. Coupled Case

We now consider the coupled case (D ≠ 0). In such systems, ruled by Eq. (4), the balance between dissipation and nonlinearity gives rise to the propagation of a kink (a localized wave), called a diffusive soliton, that propagates with constant velocity and profile (Remoissenet, 1999). To understand the propagation mechanism, we first consider the weak coupling limit and the case α < 1/2. The case of strong coupling, which corresponds to a continuous medium, is discussed later since it allows a theoretical characterization of the waves propagating in the medium.
a. Weak Coupling Limit. As shown in Figure 3a, initially all the particles of the chain are located at the position 0, the excited state. To initiate a kink, an external forcing allows the first particle to cross the potential barrier at W = α and to fall into the right well, at the rest state defined by the position W = 1. Thanks to the spring coupling the first particle to the second one, but despite the second spring, the second particle attempts to cross the potential barrier with height \Delta\Phi(\alpha) = \frac{\alpha^3}{6} - \frac{\alpha^4}{12} (Morfu, 2003) (see Figure 3b).
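The barrier height ΔΦ(α) follows from integrating the nonlinear force, \Delta\Phi(\alpha) = -\int_0^{\alpha} f(u)\,du. As a quick cross-check (a sketch in plain Python; the function names are ours), the closed form α³/6 − α⁴/12 can be compared with a direct numerical quadrature. Simpson's rule is exact here because the integrand is a cubic polynomial:

```python
def f(w, alpha):
    """Cubic nonlinear force of Eq. (3)."""
    return -w * (w - alpha) * (w - 1.0)

def barrier_numeric(alpha, n=200):
    """Potential barrier Phi(alpha) - Phi(0) = -int_0^alpha f(u) du,
    computed with the composite Simpson rule (n must be even)."""
    h = alpha / n
    s = -f(0.0, alpha) - f(alpha, alpha)
    for i in range(1, n):
        s -= (4 if i % 2 else 2) * f(i * h, alpha)
    return s * h / 3.0

def barrier_closed(alpha):
    """Closed form Delta Phi(alpha) = alpha^3/6 - alpha^4/12."""
    return alpha ** 3 / 6.0 - alpha ** 4 / 12.0

# For alpha = 0.3 both evaluations agree (0.003825 up to rounding).
print(barrier_numeric(0.3), barrier_closed(0.3))
```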
FIGURE 3 Propagation mechanism. (a) Initially, all the particles of the chain are in the excited state 0, that is, at the bottom of the well with highest energy. (b) State of the chain for t > 0. The first particle has crossed the potential barrier and attempts to pull the second particle down in its fall.
According to the value of the resulting force applied to the second particle by the two springs, compared with the nonlinear force f on [0, α[, two behaviors may occur:

1. If the resulting elastic force is large enough to allow the second particle to cross the potential barrier \Delta\Phi(\alpha), then this particle falls into the right well and pulls the next particle down in its fall. Since each particle of the chain successively undergoes a transition from the excited state 0 to the rest state 1, a kink propagates in the medium. Moreover, its velocity increases with the coupling and as the barrier decreases (namely, as α decreases).
2. Otherwise, if the resulting force does not exceed a critical value (i.e., if D < D^*(α)), the second particle cannot cross the potential barrier and thus stays pinned at a position w ∈ [0; α[: this is the well-known propagation failure effect (Comte et al., 2001; Erneux and Nicolis, 1993; Keener, 1987; Kladko et al., 2000).

The mechanical model associated with Eq. (4) shows that, in the weak coupling limit, the characteristics of the nonlinear system are ruled by the coupling D and the nonlinearity threshold α. Moreover, the propagation of a kink is due to the transition from the excited state to the rest state and is only possible when the coupling D exceeds a critical value D^*(α).
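Both regimes can be observed by integrating the lattice Equation (4) directly. The sketch below (plain Python, explicit Euler stepping; the chain length, the integration time, and the couplings D = 1 and D = 0.001, chosen on either side of the critical value for α = 0.3, are illustrative assumptions, not values from the chapter) starts from a step-like initial condition and counts how many cells have switched to the rest state 1:

```python
def f(w, alpha=0.3):
    """Cubic nonlinear force of Eq. (3)."""
    return -w * (w - alpha) * (w - 1.0)

def front_position(d, n=40, t_end=40.0, dt=0.01):
    """Integrate Eq. (4) on an n-cell chain (zero-flux ends) and return
    the number of cells that have switched to the rest state 1."""
    w = [1.0] * 5 + [0.0] * (n - 5)   # step-like (kink) initial condition
    for _ in range(int(t_end / dt)):
        lap = [w[min(i + 1, n - 1)] + w[max(i - 1, 0)] - 2.0 * w[i]
               for i in range(n)]
        w = [w[i] + dt * (d * lap[i] + f(w[i])) for i in range(n)]
    return sum(1 for x in w if x > 0.5)

# Strong coupling: the kink travels along the chain.
# Weak coupling: the front stays pinned (propagation failure).
print(front_position(1.0), front_position(0.001))
```

With D = 1 the switched region grows as the kink advances, whereas with D = 0.001 the front remains pinned near its initial position.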
b. Limit of Continuous Media. The velocity of the kink and its profile can be obtained theoretically in the limit of continuous media, that is, when the coupling D is large enough compared with the nonlinear strength. In the continuous limit, the discrete Laplacian of Eq. (4) can be replaced by a second derivative with respect to the space variable z:

\frac{\partial W}{\partial t} = D \frac{\partial^2 W}{\partial z^2} + f(W). \qquad (9)
This equation, introduced by Nagumo in the 1940s as an elementary representation of conduction along an active nerve fiber, has an important meaning for understanding transport mechanisms in biological systems (Murray, 1989; Nagumo et al., 1962). Unlike the discrete Equation (4), the continuous Equation (9) admits propagative kink solutions; the kink is static only if \int_0^1 f(u)\,du = 0, which reduces to α = 1/2 in the case of the cubic force (3) (Scott, 1999). Introducing the propagative variable ξ = z − ct, these kinks and antikinks have the form (Fife, 1979; Henry, 1981)

W(\xi) = \frac{1}{2}\left[1 \pm \tanh\left(\frac{\xi - \xi_0}{2\sqrt{2D}}\right)\right], \qquad (10)
where ξ_0 is the initial position of the kink for t = 0 and where the kink velocity is defined by c = \pm\sqrt{D/2}\,(1 - 2\alpha). When α < 1/2, the excited state is 0 and the rest state is 1. Therefore, the rest state 1 spreads in the chain, and the sign of the velocity is set by the profile of the kink initiated in the nonlinear system:

1. If the profile is given by W(\xi) = \frac{1}{2}\left[1 - \tanh\left(\frac{\xi - \xi_0}{2\sqrt{2D}}\right)\right], a kink propagates from left to right with a positive velocity c = \sqrt{D/2}\,(1 - 2\alpha) (Figure 4a, left).
2. Otherwise, if the profile is set by W(\xi) = \frac{1}{2}\left[1 + \tanh\left(\frac{\xi - \xi_0}{2\sqrt{2D}}\right)\right], a kink propagates from right to left with a negative velocity c = -\sqrt{D/2}\,(1 - 2\alpha) (Figure 4a, right).

When α > 1/2, since the symmetry of the potential is reversed, the excited state becomes 1 and the rest state is 0. The propagation is then due to a transition between 1 and 0, which provides the following behavior:

1. If W(\xi) = \frac{1}{2}\left[1 - \tanh\left(\frac{\xi - \xi_0}{2\sqrt{2D}}\right)\right], a kink propagates from right to left with a negative velocity c = \sqrt{D/2}\,(1 - 2\alpha) (Figure 4b, left).
2. Otherwise, if W(\xi) = \frac{1}{2}\left[1 + \tanh\left(\frac{\xi - \xi_0}{2\sqrt{2D}}\right)\right], a kink propagates from left to right with a positive velocity c = -\sqrt{D/2}\,(1 - 2\alpha) (Figure 4b, right).

FIGURE 4 Propagative solutions of the continuous Nagumo Equation (9) with D = 1. Spatial representation of the kink for t = 0 (dotted line) and for t = 20 (solid line). The arrow indicates the propagation direction; the corresponding potential is represented at the right end to provide a reference. (a) α = 0.3. (b) α = 0.7.
B. Inertial Systems

In this section, we neglect the dissipative term of Eq. (1) compared with the inertia term, and we restrict our study to the uncoupled case. Moreover, in an image-processing context, it is convenient to introduce a nonlinear force f of the form

f(W) = -\omega_0^2 (W - m)(W - m - \alpha)(W - m + \alpha), \qquad (11)

where m and α < m are two parameters that allow adjusting the width and the height \Delta E = \omega_0^2 \alpha^4 / 4 of the potential (Figure 5)

E(W) = -\int_0^W f(u)\,du. \qquad (12)
The nonlinear differential equation that rules the uncoupled chain can be deduced by inserting the nonlinear force (11) into Eq. (1) with D = 0.

FIGURE 5 Double-well potential deduced from the nonlinear force (11), represented for m = 2.58, α = 1.02, and ω_0 = 1. A particle with an initial condition W_i^0 < m - \alpha\sqrt{2} evolves with an initial potential energy above the barrier \Delta E.
Neglecting the dissipative term, the particles of unitary mass are then ruled by the following nonlinear oscillator equations:
\frac{d^2 W_i}{dt^2} = f(W_i). \qquad (13)
1. Theoretical Analysis

We propose here to determine analytically the dynamics of the nonlinear oscillators obeying Eq. (13) (Morfu and Comte, 2004; Morfu et al., 2006). Setting x_i = W_i - m, Eq. (13) can be rewritten as

\frac{d^2 x_i}{dt^2} = -\omega_0^2\, x_i (x_i - \alpha)(x_i + \alpha). \qquad (14)

Denoting by x_i^0 the initial position of particle i and considering that all the particles initially have a null velocity, the solutions of Eq. (14) can be expressed with the Jacobian elliptic functions as

x_i(t) = x_i^0 \,\mathrm{cn}(\omega_i t, k_i), \qquad (15)
where ω_i and 0 ≤ k_i ≤ 1 represent, respectively, the pulsation and the modulus of the cn function (see the recall of the properties of the Jacobian elliptic functions in Appendix B). Differentiating Eq. (15) twice and using the properties in Eq. (B3) yields

\frac{dx_i}{dt} = -x_i^0 \omega_i \,\mathrm{sn}(\omega_i t, k_i)\,\mathrm{dn}(\omega_i t, k_i),
\frac{d^2 x_i}{dt^2} = -x_i^0 \omega_i^2 \,\mathrm{cn}(\omega_i t, k_i)\left[\mathrm{dn}^2(\omega_i t, k_i) - k_i^2\,\mathrm{sn}^2(\omega_i t, k_i)\right]. \qquad (16)

Using the identities in Eqs. (B4) and (B5), Eq. (16) can be rewritten as

\frac{d^2 x_i}{dt^2} = -\frac{2 k_i^2 \omega_i^2}{x_i^{0\,2}}\, x_i \left[ x_i^2 - \frac{2 k_i^2 - 1}{2 k_i^2}\, x_i^{0\,2} \right]. \qquad (17)
Identifying this last expression with Eq. (14), we derive the pulsation of the Jacobian elliptic function,

\omega_i = \omega_0 \sqrt{x_i^{0\,2} - \alpha^2}, \qquad (18)

and its modulus,

k_i^2 = \frac{1}{2}\,\frac{x_i^{0\,2}}{x_i^{0\,2} - \alpha^2}. \qquad (19)
Finally, introducing the initial condition Wi⁰ = xi⁰ + m, the solution of Eq. (13) can be straightforwardly deduced from Eqs. (15), (18), and (19):

Wi(t) = m + (Wi⁰ − m) cn(ωi t, ki),   (20)
with

ωi(Wi⁰) = ω0 √((Wi⁰ − m)² − α²)   and   ki²(Wi⁰) = (1/2) (Wi⁰ − m)² / ((Wi⁰ − m)² − α²).   (21)
Both the modulus and the pulsation are driven by the initial condition Wi⁰. Moreover, the constraints ensuring the existence of the pulsation ωi and of the modulus are written, respectively, as (Wi⁰ − m)² − α² ≥ 0 and 0 ≤ ki ≤ 1. These two conditions restrict the range of allowed initial conditions Wi⁰ to ]−∞; m − α√2] ∪ [m + α√2; +∞[, as shown in Figure 6, where the pulsation and the modulus are represented versus the initial condition Wi⁰. Note that this allowed range of initial conditions also corresponds to a particle with an initial potential energy exceeding the barrier between the potential extrema (see Figure 5).
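For readers who wish to evaluate the closed-form solution, Eqs. (20) and (21) can be computed with SciPy. The sketch below is ours (the function name and defaults are illustrative); note that `scipy.special.ellipj` takes the parameter m = k² as its second argument, not the modulus k.

```python
import numpy as np
from scipy.special import ellipj

def oscillator_solution(t, W0, m=2.58, alpha=1.02, omega0=1.0):
    """Evaluate Wi(t) of Eqs. (20)-(21) for a particle released at rest from W0."""
    x0 = W0 - m
    if x0**2 < 2.0 * alpha**2:
        # existence of the modulus (k <= 1) requires |W0 - m| >= alpha*sqrt(2)
        raise ValueError("W0 lies in the forbidden range "
                         "]m - alpha*sqrt(2); m + alpha*sqrt(2)[")
    omega_i = omega0 * np.sqrt(x0**2 - alpha**2)   # pulsation, Eq. (21)
    k2 = 0.5 * x0**2 / (x0**2 - alpha**2)          # modulus squared, Eq. (21)
    sn, cn, dn, ph = ellipj(omega_i * np.asarray(t), k2)
    return m + x0 * cn                             # Eq. (20)
```

For an allowed initial condition such as W0 = 0, the returned trajectory oscillates in the range [W0; 2m − W0], in agreement with the analysis of the following subsection.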
2. Nonlinear Oscillator Properties

To illustrate the properties of nonlinear oscillators, we consider a chain of N = 2 particles with a weak difference of initial conditions and with a null initial velocity. The dynamics of these two oscillators are ruled by Eq. (20), where the pulsation and the modulus of both oscillators are driven by their respective initial conditions. Moreover, we have restricted our study to the case of the following nonlinearity parameters: m = 2.58, α = 1.02, ω0 = 10⁴. We have applied the initial condition W1⁰ = 0 to the first oscillator, while the initial condition of the second oscillator is set to W2⁰ = 0.2, which corresponds to the situation of Figure 5. Figure 7a shows that the oscillations of both particles take place in the range [Wi⁰; 2m − Wi⁰], as predicted by Eq. (20) (that is, [0; 5.16] for the first oscillator and [0.2; 4.96] for the second one). Moreover, owing to their difference of initial amplitude and to the nonlinear behavior of the system, the two oscillators quickly attain a phase opposition for the first time at t = topt = 1.64 × 10⁻³. This phase opposition corresponds to the
FIGURE 6 (a) Normalized pulsation ω/ω0 versus the initial condition Wi⁰. (b) Modulus k versus Wi⁰. The parameters of the nonlinearity, m = 2.58 and α = 1.02, impose the allowed amplitude range ]−∞; 1.137] ∪ [4.023; +∞[.
FIGURE 7 (a) Temporal evolution of the two oscillators. Top panel: evolution of the first oscillator with initial condition W10 = 0. Bottom panel: evolution of the second oscillator with initial condition W20 = 0.2. (b) Temporal evolution of the displacement difference δ between the two oscillators. Parameters: m = 2.58, α = 1.02, and ω0 = 1.
situation where the first oscillator has reached its minimum W1(topt) = 0, whereas the second oscillator has attained its maximum W2(topt) = 4.96. As shown in Figure 7b, the displacement difference δ(t) = W2(t) − W1(t) is then maximum for t = topt and becomes δ(topt) = 4.96. For this optimal time, a “contrast enhancement” of the weak difference of initial conditions is realized, since initially the displacement difference was δ(t = 0) = 0.2. Note that in Figure 7b, the displacement difference between the two oscillators also presents a periodic behavior with local minima and local maxima. In particular, the difference δ(t) is null for t = 3.96 × 10⁻⁵, t = 1.81 × 10⁻⁴, t = 3.5 × 10⁻⁴, and t = 5.21 × 10⁻⁴; minimum for t = 1.4 × 10⁻⁴, t = 4.64 × 10⁻⁴, and t = 1.47 × 10⁻³; and maximum for t = 3 × 10⁻⁴, t = 6.29 × 10⁻⁴, and t = 1.64 × 10⁻³. These characteristic times will be of crucial interest in the image-processing context to define the filtering tasks performed by the nonlinear oscillator network. Figure 6a reveals that the maximum variation of the pulsation with the amplitude Wi⁰ (the steepest part of the curve ω/ω0 versus Wi⁰) is reached for Wi⁰ = m − α√2, that is, for a particle with an initial potential energy near the barrier. Therefore, to quickly realize a great amplitude contrast between the two oscillators, it could be interesting either to launch them with an initial amplitude near m − α√2, or to increase the potential barrier height. We chose to investigate this latter solution by tuning the nonlinearity parameter α, while the initial amplitudes of both oscillators remain W1⁰ = 0 and W2⁰ = 0.2. The results are reported in Figure 8, where we present the evolution of the difference δ(t) for different values of α. As expected, when the nonlinearity parameter α increases, the optimal time is significantly reduced.
However, when α is adjusted near the critical value (m − W2⁰)/√2, as in Figure 8d, the optimum reached by the difference δ(t) is reduced to 4.517 for α = 1.63, instead of 4.96 for α = 1.02. Even if this is not the best contrast enhancement that the system can perform, the weak difference of initial conditions between the two oscillators is nevertheless strongly enhanced for α = 1.63. To highlight the efficiency of nonlinear systems, let us consider the case of a linear force f(W) = −ω0² W in Eq. (13). In the linear case, the displacement difference δ(t) between two harmonic oscillators can be straightforwardly expressed as
δ(t) = ε cos(ω0 t),   (22)
where ε represents the slight difference of initial conditions between the oscillators. This last expression shows that it is impossible to increase the weak difference of initial conditions, since the difference δ(t) always remains in the range [−ε; ε]. Therefore, nonlinearity is a convenient way to overcome this limitation of linear systems and to enhance a weak amplitude contrast.
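The characteristic times and the contrast gain discussed above can be checked numerically from the closed-form solution (20)–(21). The script below is a sketch of ours (the sampling grid and variable names are arbitrary); it evaluates δ(t) = W2(t) − W1(t) on a fine grid, with ω0 = 10⁴ as in the text, and locates its maximum.

```python
import numpy as np
from scipy.special import ellipj

m, alpha, omega0 = 2.58, 1.02, 1.0e4         # nonlinearity parameters of the text

def W(t, W0):
    """Trajectory of Eq. (20) for a particle released at rest from W0."""
    x0 = W0 - m
    wi = omega0 * np.sqrt(x0**2 - alpha**2)   # pulsation, Eq. (21)
    k2 = 0.5 * x0**2 / (x0**2 - alpha**2)     # ellipj expects the parameter k^2
    return m + x0 * ellipj(wi * t, k2)[1]     # ellipj returns (sn, cn, dn, ph)

t = np.linspace(0.0, 3.0e-3, 300001)
delta = W(t, 0.2) - W(t, 0.0)                 # displacement difference delta(t)
t_opt = t[np.argmax(delta)]
# the text reports t_opt ~ 1.64e-3 with delta(t_opt) ~ 4.96 (phase opposition)
```

The sampled maximum of δ(t) approaches the ideal value W2(topt) − W1(topt) = 4.96 − 0 quoted in the text.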
FIGURE 8 Influence of the nonlinearity parameter α on the displacement difference δ between the two oscillators of respective initial conditions 0 and 0.2. Parameters: m = 2.58 and ω0 = 1. (a) topt = 1.75 × 10⁻³, α = 0.4. (b) topt = 1.66 × 10⁻³, α = 1.05. (c) topt = 1.25 × 10⁻³, α = 1.5. (d) topt = 0.95 × 10⁻³, α = 1.63.
III. INERTIAL SYSTEMS

This section presents different image-processing tasks inspired by the properties of the nonlinear oscillators presented in Section II.B. Their electronic implementation is also discussed.
A. Image Processing

By analogy with a particle experiencing a double-well potential, pixel (i, j) is analogous to a particle (oscillator) whose initial position
corresponds to the initial gray level Wi,j⁰ of this pixel. Therefore, if N × M denotes the image size, we are led to consider a two-dimensional (2D) network, or CNN, consisting of uncoupled nonlinear oscillators. The node (i, j) of this CNN obeys
d²Wi,j/dt² = −ω0² (Wi,j − m − α)(Wi,j − m + α)(Wi,j − m),   (23)
with i = 1, 2, …, N and j = 1, 2, …, M. Note that we take into account the range of oscillations [Wi,j⁰; 2m − Wi,j⁰] predicted in Section II.B.2 to define the gray scale of the images, namely, 0 for the black level and 2m = 5.16 for the white level. The image to be processed is first loaded as the initial condition at the nodes of the CNN. Next, the filtered image for a processing time t can be deduced by noting the position reached by all the oscillators of the network at this specific time t. More precisely, the state of the network at a processing time t is obtained by solving Eq. (23) numerically with a fourth-order Runge–Kutta algorithm with integration time step dt = 10⁻⁶.
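The numerical scheme described above can be sketched as follows (our code; the image array and parameter defaults are assumptions, with ω0 = 10⁴ as in Section II.B.2): a classical fourth-order Runge–Kutta integration of Eq. (23), applied elementwise to the whole image with zero initial velocities.

```python
import numpy as np

def f(W, m=2.58, alpha=1.02, omega0=1.0e4):
    """Nonlinear force of Eq. (23): derivative of the double-well potential."""
    return -omega0**2 * (W - m - alpha) * (W - m + alpha) * (W - m)

def process_image(W0, t_proc, dt=1.0e-6):
    """Integrate the uncoupled oscillator network (23) with RK4.

    W0     : 2D array of initial gray levels (initial positions, zero velocity)
    t_proc : processing time; returns the positions W(t_proc)."""
    W = np.array(W0, dtype=float)
    V = np.zeros_like(W)                      # initial velocities are null
    for _ in range(int(round(t_proc / dt))):
        k1W, k1V = V, f(W)
        k2W, k2V = V + 0.5 * dt * k1V, f(W + 0.5 * dt * k1W)
        k3W, k3V = V + 0.5 * dt * k2V, f(W + 0.5 * dt * k2W)
        k4W, k4V = V + dt * k3V, f(W + dt * k3W)
        W = W + dt * (k1W + 2 * k2W + 2 * k3W + k4W) / 6
        V = V + dt * (k1V + 2 * k2V + 2 * k3V + k4V) / 6
    return W
```

Loading a weakly contrasted image (gray levels in [0; 0.2]) and reading the state at t = topt = 1.64 × 10⁻³ then yields the contrast enhancement discussed in the next subsection.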
1. Contrast Enhancement and Image Inversion

The image to be processed with the nonlinear oscillator network is the weakly contrasted image of Figure 9a. Its histogram is restricted to the range [0; 0.2], which means that the maximum gray level of the image (0.2) is the initial condition of at least one oscillator of the network, while the minimum gray level of the image (0) is also the initial condition of at least one oscillator. Therefore, the pixels with initial gray levels 0 and 0.2 oscillate with the difference δ(t) predicted by Figure 7b. In particular, as explained in Section II.B.2, their difference δ(t) is null for the processing times t = 3.96 × 10⁻⁵, 1.81 × 10⁻⁴, 3.5 × 10⁻⁴, and 5.21 × 10⁻⁴; minimum for t = 1.4 × 10⁻⁴, 4.64 × 10⁻⁴, and 1.47 × 10⁻³; and maximum for t = 3 × 10⁻⁴, 6.29 × 10⁻⁴, and 1.64 × 10⁻³. As shown in Figures 9b, 9d, 9f, and 9h, the image goes through local minima of contrast at the processing times corresponding to the zeros of δ(t). Furthermore, the processing times providing the local minima of δ(t) realize an image inversion with a growing contrast enhancement (Figures 9c, 9g, and 9j). Indeed, since the minima of δ(t) are negative, for these processing times the minimum of the initial image becomes the maximum of the filtered image and vice versa. Finally, the local maxima of δ(t) achieve local maxima of contrast at the corresponding processing times (Figures 9e, 9i, and 9k). Note that the best contrast enhancement is attained at the processing time topt for which δ(t) is maximum. The histogram of each filtered image in Figure 9 also reveals the temporal dynamics of the network. Indeed, the width of the image histogram is periodically increased and decreased, which indicates that the
FIGURE 9 Filtered images and their corresponding histogram obtained with the nonlinear oscillators network (23) for different processing times. (a) Initial image (t = 0). (b) t = 3.96 × 10−5 . (c) t = 1.4 × 10−4 . (d) t = 1.81 × 10−4 . (e) t = 3 × 10−4 . (f) t = 3.5 × 10−4 . (g) t = 4.64 × 10−4 . (h) t = 5.21 × 10−4 . (i) t = 6.29 × 10−4 . (j) t = 1.47 × 10−3 . (k) t = topt = 1.64 × 10−3 . Parameters: m = 2.58, α = 1.02, ω0 = 1.
contrast of the corresponding filtered image is periodically enhanced or reduced. Another interesting feature of the realized contrast is given by the plot of the network response at the processing time topt (Morfu, 2005). This curve represents the gray level of the pixels of the filtered image versus their initial gray level: the horizontal axis corresponds to the initial gray scale, namely [0; 0.2], whereas the vertical axis represents the gray scale of the processed image. Such curves are plotted in Figure 10 for different values of the nonlinearity parameter α, at the optimal time defined by the maximum of δ(t); these times were established in Section II.B.2 and in Figure 8. Moreover, to compare the nonlinear contrast enhancement to a uniform one, we have superimposed (dotted line) the curve resulting from a simple multiplication of the initial gray scale by a scale factor. In Figure 10a, since the response of the system for the lowest value of α is most often above the dotted line, the filtered image at the processing time topt = 1.75 × 10⁻³ for α = 0.4 will be brighter than the image obtained with a simple rescaling.
FIGURE 10 Response of the nonlinear system for different nonlinearity parameters α at the corresponding optimal time topt (solid line) compared to a uniform rescaling (dotted line). The curves are obtained with Eqs. (20) and (21) setting the time to the optimum value defined by the maximum of δ(t) (see Figure 8). In addition, we let the initial conditions Wi0 vary in the range [0; 0.2] in Eqs. (20) and (21). (a): (topt = 1.75 × 10−3 ; α = 0.4). (b): (topt = 1.66 × 10−3 ; α = 1.05). (c): (topt = 1.25 × 10−3 ; α = 1.5). (d): (topt = 0.95 × 10−3 ; α = 1.63), ω0 = 1.
As shown in Figure 10b, increasing the nonlinearity parameter α to 1.05 yields an optimal time of 1.66 × 10⁻³ and symmetrically enhances the light and dark gray levels. When the nonlinearity parameter is adjusted to provide the greatest potential barrier (Figure 10c and 10d), the contrast of the medium gray levels is unchanged compared to a simple rescaling, while the dark and light grays are strongly enhanced, with a greater distortion when the potential barrier is maximum, that is, for the greatest value of α (Figure 10d).
2. Gray-Level Extraction

Considering processing times exceeding the optimal time topt, we propose to perform a gray-level extraction of the continuous gray scale represented in Figure 11a (Morfu, 2005). For the sake of clarity, it is convenient to redefine the white level as 0.2, whereas the black level remains 0. For the nine specific times presented in Figure 11, the response of the system displays a minimum that is successively reached for each level of the initial gray scale. Therefore, with time acting as a discriminating parameter, an appropriate threshold filtering allows extraction of all pixels with a gray level in a given range. Indeed, in Figure 11, the simplest case of a constant threshold Vth = 0.25 provides nine ranges of gray at nine closely spaced processing times, which constitutes a gray-level extraction. Moreover, owing to the response of the system, the width of the extracted gray-level ranges is narrower in the light grays. Indeed, the range extracted in the dark grays for the processing time t = 3.33 × 10⁻³ (Figure 11c) is approximately twice as wide as the range extracted in the light grays for t = 3.51 × 10⁻³ (Figure 11i). To perform a perfect gray-level extraction, the threshold must track, with a slight offset, the temporal evolution of the minimum attained by the response of the system. Under these conditions, the width of the extracted gray range is set by the value of this offset. Note that the response of the system after the optimal processing time also allows consecutive enhancement of fragments of the image with different levels of brightness, which is also an important feature of image processing. For instance, in Belousov–Zhabotinsky-type media this property of the system enabled Rambidi et al. (2002) to restore individual components of the picture when the components overlap.
Therefore, we believe that considering the temporal evolution of the image loaded in our network could give rise to other interesting image-processing operations.
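The constant-threshold filtering of Figure 11 reduces to a single comparison per pixel. The helper below is a hypothetical sketch (names are ours) of the rule given in the caption of Figure 11.

```python
import numpy as np

def extract_gray_levels(W_t, v_th=0.25, white=0.2):
    """Threshold filtering of Figure 11: a processed pixel becomes white (0.2)
    if its value exceeds v_th, and black (0) otherwise."""
    return np.where(np.asarray(W_t) > v_th, white, 0.0)
```

Applied to the network state at each of the nine processing times of Figure 11, this comparison extracts one gray-level range per time.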
3. Image Encryption

Encryption is another field of application of nonlinear systems. In fact, the chaotic behavior of nonlinear systems can produce chaotic waveforms that can be used to encrypt signals for secure communications (Cuomo and Oppenheim, 1993; Dedieu et al., 1993). Even if
FIGURE 11 Gray-level extraction. The response of the system is represented at the top of each panel. At the bottom of each panel, a threshold filtering of the filtered image is realized by replacing the pixel gray level with 0.2 (white) if it exceeds the threshold Vth = 0.25, and with 0 (black) otherwise. (a) Initial gray scale (t = 0). (b) t = 3.3 × 10⁻³. (c) t = 3.33 × 10⁻³. (d) t = 3.36 × 10⁻³. (e) t = 3.39 × 10⁻³. (f) t = 3.42 × 10⁻³. (g) t = 3.45 × 10⁻³. (h) t = 3.48 × 10⁻³. (i) t = 3.51 × 10⁻³. (j) t = 3.54 × 10⁻³. Nonlinearity parameters: m = 2.58, α = 1.02, and ω0 = 1.
many attempts to break the encryption key of these cryptosystems and to retrieve the information have been reported (Short and Parker, 1998; Udaltsov et al., 2003), cryptography based on the properties of chaotic oscillators still attracts the attention of researchers because of the promising applications of chaos in the data-transmission field (Kwok and Tang, 2007). Contrary to most studies, in which the dynamics of a single element are usually considered, we propose here a strategy of encryption based on the dynamics of a chain of nonlinear oscillators. More precisely, we consider the case of a noisy image loaded as the initial condition in the inertial network introduced in Section II.B. Specifically, we add a uniform noise over [−0.1; 0.1] to the weakly contrasted picture of the Coliseum represented in Figure 9a. Since the pixels of the noisy image assume gray levels in the range [−0.1; 0.3], an appropriate change of scale is realized to reset the dynamics of the gray levels to [0; 0.2]. The resulting image is then loaded as the initial condition in the network. For the sake of clarity, the filtered images are presented at different processing times with the corresponding system response in Figure 12. Before the optimal time, we observe the behavior described in Section III.A.1: the image goes through local minima and maxima of contrast until the optimal time topt = 1.64 × 10⁻³, where the best contrast enhancement is realized (Figure 12a). Next, for processing times exceeding topt, the noisy part of the image is amplified while the coherent part of the image becomes increasingly less perceptible (see Figure 12b and 12c, obtained for t = 3.28 × 10⁻³ and t = 6.56 × 10⁻³). Finally, for longer processing times, namely t = 8.24 × 10⁻³ and t = 9.84 × 10⁻³, the noise background has completely hidden the Coliseum, which constitutes an image encryption.
Note that this behavior can be explained with the response of the system, represented below each filtered image in Figure 12. Indeed, as long as the response of the system versus the initial condition does not display a “periodic-like” behavior, the coherent part of the image remains perceptible (Figure 12a and 12b). By contrast, as soon as a “periodicity” appears in the system response, the coherent image begins to disappear (Figure 12c). Indeed, the response in Figure 12c shows that four pixels of the initial image with four different gray levels take the same final value in the encrypted image (see the arrows). Therefore, the details of the initial image, which correspond to the quasi-uniform areas of the coherent image, are merged and thus disappear in the encrypted image. Despite this merging of gray levels, since noise induces sudden changes in the gray levels of the initial image, the noise conserves its random features in the encrypted image. Moreover, since the system tends to enlarge the range of amplitudes, the weak initial amount of noise is strongly amplified whenever the processing time exceeds topt. The periodicity of
FIGURE 12 Encrypted image and the corresponding response of the nonlinear oscillator network for different times exceeding topt. (a) Enhancement of contrast of the initial image for t = topt = 1.64 × 10⁻³. (b) t = 3.28 × 10⁻³. (c) t = 6.56 × 10⁻³. (d) t = 8.24 × 10⁻³. (e) t = 9.84 × 10⁻³. Parameters: m = 2.58, α = 1.02, ω0 = 1.
the system response can then be increased for longer processing times until only the noisy part of the image is perceptible (Figure 12d and 12e). A perfect image encryption is then realized. To take advantage of this phenomenon for image encryption, the coherent information (the enhanced image in Figure 12a) must be restored using the encrypted image of Figure 12e. Fortunately, owing to the absence of dissipation, the nonlinear system is conservative and reversible. It is thus
possible to revert to the optimal time, when the information was the most perceptible. However, knowledge of the encrypted image is not sufficient to completely restore the coherent information, since at the time of encryption the velocities of the oscillators were not null. Consequently, it is necessary to know both the position and the velocity of every particle of the network at the time of encryption. The information can then be restored by solving Eq. (23) numerically with a negative integration time step dt = −10⁻⁶. Under these conditions, the time of encryption constitutes the encryption key.
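Since the network is conservative, decryption amounts to running the same integrator backward from the positions and the velocities reached at the encryption time. The sketch below is ours (a 1D row of pixels stands in for the image; parameters follow the text): it encrypts by integrating Eq. (23) forward with RK4 and dt = 10⁻⁶, then recovers the initial image with dt = −10⁻⁶.

```python
import numpy as np

M, ALPHA, OMEGA0 = 2.58, 1.02, 1.0e4

def force(W):
    """Cubic force of Eq. (23)."""
    return -OMEGA0**2 * (W - M - ALPHA) * (W - M + ALPHA) * (W - M)

def rk4_run(W, V, dt, n_steps):
    """RK4 integration of W'' = f(W); a negative dt integrates backward."""
    W, V = np.array(W, dtype=float), np.array(V, dtype=float)
    for _ in range(n_steps):
        k1W, k1V = V, force(W)
        k2W, k2V = V + 0.5 * dt * k1V, force(W + 0.5 * dt * k1W)
        k3W, k3V = V + 0.5 * dt * k2V, force(W + 0.5 * dt * k2W)
        k4W, k4V = V + dt * k3V, force(W + dt * k3W)
        W = W + dt * (k1W + 2 * k2W + 2 * k3W + k4W) / 6
        V = V + dt * (k1V + 2 * k2V + 2 * k3V + k4V) / 6
    return W, V

# encrypt: run forward to the secret time t = 9.84e-3 (9840 steps of dt = 1e-6)
W0 = np.random.default_rng(1).uniform(0.0, 0.2, 16)   # a row of noisy pixels
We, Ve = rk4_run(W0, np.zeros_like(W0), 1.0e-6, 9840)
# decrypt: integrate back with dt = -1e-6, starting from BOTH We and Ve
Wr, _ = rk4_run(We, Ve, -1.0e-6, 9840)
```

Restoration fails if only the positions We are kept: the velocities Ve at the encryption time are part of the secret.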
B. Electronic Implementation

The elementary cell of the purely inertial system can be developed according to the principle of Figure 13 (Morfu et al., 2007). First, a polynomial source is realized with analog AD633JNZ multipliers and a classical inverting amplifier with gain −K. Taking into account the scale factor 1/10 V⁻¹ of the multipliers, the response of the nonlinear circuit to an input voltage
FIGURE 13 Sketch of the elementary cell of the inertial system. m and α are adjusted with external direct current sources, whereas −K is the inverting amplifier gain obtained using a TL081CN operational amplifier. The 1N4148 diode allows introduction of the initial condition Wi⁰.
Wi is given by
P(Wi) = (K²/100) (Wi − m)(Wi − m − α)(Wi − m + α),   (24)
where the roots m, m − α, and m + α of the polynomial circuit are set with three different external direct current (DC) sources. As shown in Figure 14, the experimental characteristic of the nonlinear source is in perfect agreement with its theoretical cubic law [Eq. (24)]. Next, a feedback between the input and the output of the nonlinear circuit is ensured by a double integrator with time constant RC, such that
Wi = −(K²/(100 R²C²)) ∬ (Wi − m + α)(Wi − m − α)(Wi − m) dt dt.   (25)
Differentiating Eq. (25) twice, the voltage Wi at the input of the nonlinear circuit is written as
d²Wi/dt² = −(K²/(100 R²C²)) (Wi − m + α)(Wi − m − α)(Wi − m),   (26)
which corresponds exactly to the equation of the purely inertial system (13) with
ω0 = K/(10 R C).   (27)
FIGURE 14 Theoretical cubic law of Eq. (24) (solid line) compared to the experimental characteristic plotted with crosses. Parameters: m = 2.58 V, α = 1.02 V, K = 10.
Finally, the initial condition Wi⁰ is applied to the elementary cell via a 1N4148 diode with threshold voltage VT = 0.7 V. We adjust the diode anode potential to Wi⁰ + VT with an external DC source, with the diode cathode potential initially set to Wi⁰. Then, according to Section III, the circuit begins to oscillate in the range [Wi⁰; 2m − Wi⁰], while the potential of the diode anode remains VT + Wi⁰. Assuming that m > Wi⁰/2, which is the case in our experiments, the diode is instantaneously blocked once the initial condition is introduced. Note that using a diode to set the initial condition has the main advantage of “balancing” the effect of the dissipation inherent in electronic devices. Indeed, the intrinsic dissipation of the experiment tends to reduce the amplitude of the oscillations. As soon as the potential of the diode cathode falls below Wi⁰, the diode conducts instantaneously, periodically reintroducing the same initial condition in the elementary cell. Therefore, the switching between the two states of the diode has the advantage of refreshing the oscillation amplitude to its natural value, as in the absence of dissipation. In summary, the oscillations are available at the diode cathode and are represented in Figure 15a for two different initial conditions, namely, W1⁰ = 0 V (top panel) and W2⁰ = 0.2 V (bottom panel). As previously explained, this way of introducing the initial condition allows balancing the dissipative effects, since the oscillations remain within the same amplitude range, namely [0 V; 5.34 V] for the first oscillator with initial condition 0 V, and [0.2 V; 5.1 V] for the second one. Moreover, these ranges match with fairly good agreement the theoretical predictions presented in Section II.B.2, that is, [0 V; 5.16 V] for the first oscillator and [0.2 V; 4.96 V] for the second one.
Figure 15a also reveals that the two oscillators quickly achieve a phase opposition at the optimal time topt = 1.46 ms, instead of 1.64 ms as theoretically established in Section II.B.2. The voltage difference between the two oscillators in Figure 15b reaches local minima and maxima, in agreement with the theoretical behavior observed in Section III. A maximum of 5.1 V is obtained, corresponding to the phase opposition W1(topt) = 0 V and W2(topt) = 5.1 V. Therefore, the weak difference of initial conditions between the oscillators is strongly increased at the optimal time topt. Despite a slight discrepancy of 11% for the optimal time, mainly imputable to component uncertainties, a purely inertial nonlinear system is thus implemented with the properties of Section III. To fully characterize the experimental device, we now focus on the response of the nonlinear system to different initial conditions in the range [0 V; 0.2 V]. The plot of the voltage reached at the optimal time topt = 1.46 ms versus the initial condition is compared in Figure 16 to the theoretical curve obtained for the optimal time defined in Section II.B.2, namely, 1.64 ms. The experimental response of the system is
FIGURE 15 (a) Temporal evolution of two elementary cells of the chain with respective initial conditions W1⁰ = 0 V (top panel) and W2⁰ = 0.2 V (bottom panel). (b) Evolution of the voltage difference between the two oscillators. Parameters: K = 10, R = 10 kΩ, C = 10 nF, m = 2.58 V, α = 1.02 V, topt = 1.46 ms.
FIGURE 16 Response of the system to a set of initial conditions Wi⁰ ∈ [0; 0.2 V] at the optimal time. The solid line is obtained with Eqs. (20), (21), and (27), setting the time to the theoretical optimal value 1.64 ms and letting the initial condition vary in [0; 0.2 V]. The crosses are obtained experimentally for the corresponding optimal time 1.46 ms. Parameters: R = 10 kΩ, C = 10 nF, m = 2.58 V, α = 1.02 V, K = 10.
then qualitatively confirmed by the theoretical predictions, which establishes the validity of the experimental elementary cell for the contrast enhancement presented in Section III.A.1. Finally, we also investigate the response of the system after the optimal time, since it allows the extraction of gray levels. In order to enhance the measurement accuracy, we extend the range of initial conditions to [0; 0.5 V] instead of [0; 0.2 V]. The corresponding experimental optimal time becomes topt = 564 μs, whereas the theoretical one, deduced with the methodology of Section II.B.2, is 610 μs. The resulting theoretical and experimental responses are plotted in Figure 17a, where a better agreement is effectively observed compared to Figure 16.
FIGURE 17 Theoretical response of the purely inertial system (solid line) compared to the experimental one (crosses) for four different times and for a range of initial conditions [0; 0.5 V]. Parameters: R = 10 kΩ, C = 10 nF, m = 2.58 V, α = 1.02 V, K = 10. (a) Experimental time t = 564 μs, corresponding to the theoretical time t = 610 μs. (b) Experimental time t = 610 μs, theoretical time 713 μs. (c) Experimental time t = 675 μs, theoretical time 789 μs. (d) Experimental time t = 720 μs, theoretical time 841 μs.
We have also reported the experimental device response for three different times beyond the optimal time topt = 564 μs in Figure 17b, c, and d, namely, for the experimental times t = 610 μs, t = 675 μs, and t = 720 μs. Since a time scale factor of 1.1684 exists between the theoretical and the experimental times, we apply this scale factor to the three previous experimental times, which provides the theoretical times 713 μs, 789 μs, and 841 μs. For each of these three times, we can then compare the experimental response to the theoretical one deduced by letting the initial condition vary in [0; 0.5 V] in Eqs. (20), (21), and (27). Despite some slight discrepancies, the behavior of the experimental device is in good agreement with the theoretical response of the system for the three processing times exceeding the optimal time. Therefore, the extraction of gray levels presented in Section III.A.2 is electronically implemented with this elementary cell.
IV. REACTION–DIFFUSION SYSTEMS

A. One-Dimensional Lattice

The motion equation (4) of the nonlinear mechanical chain can also describe the evolution of the voltage at the nodes of a nonlinear electrical lattice; this section is devoted to the presentation of this lattice. The nonlinear lattice is realized by coupling elementary cells with linear resistors R according to the principle of Figure 18a. Each elementary cell consists of a linear capacitor C in parallel with a nonlinear resistor whose current–voltage characteristic obeys the cubic law
INL(u) = β u (u − Va)(u − Vb)/(R0 Va Vb),   (28)
where 0 < Va < Vb are two voltages, β is a constant, and R0 is analogous to a weighting resistor.
FIGURE 18 (a) Nonlinear electrical lattice. (b) The nonlinear resistor RNL.
FIGURE 19 Current–voltage characteristic of the nonlinear resistor. The theoretical law [Eq. (28)] (solid line) is compared to the experimental data plotted with crosses. The dotted lines represent the asymptotic behavior of the nonlinear resistor. Parameters: R0 = 3.078 kΩ, Vb = 1.12 V, Va = 0.545 V, β = 1.
The nonlinear resistor can be developed using two different methods. The first method to obtain a cubic current is to consider the circuit of Figure 18b with three branches (Binczak et al., 1998; Comte, 1996). A linear resistor R3, a negative resistor, and another linear resistor R1 are successively added in parallel thanks to 1N4148 diodes. Due to the switching of the diodes, the experimental current–voltage characteristic of Figure 19 asymptotically displays a piecewise-linear behavior with, successively, a positive slope, a negative one, and finally a positive one. This piecewise-linear characteristic is compared to the cubic law [Eq. (28)] that presents the same roots 0, Va, and Vb, but also the same area below the characteristic between 0 and Va. This last condition leads to β = 1 and R0 = 3.078 kΩ (Morfu, 2002c). An alternative way to realize a perfect cubic nonlinear current is to use a nonlinear voltage source that provides a polynomial voltage P(u) = βu(u − Va)(u − Vb)/(Va Vb) + u, as shown in Figure 20 (Comte and Marquié, 2003). This polynomial voltage is realized with AD633JNZ multipliers and classical TL081CN operational amplifiers. A resistor R0 ensures a feedback between the input and the output of the nonlinear source, such that Ohm's law applied to R0 yields the cubic current of Eq. (28):
(P(u) − u)/R0 = INL(u).   (29)
As shown in Figure 21, this second method gives a better agreement with the theoretical cubic law [Eq. (28)].
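The algebraic equivalence (29) between the polynomial source and the cubic current can be verified numerically. The snippet below is ours, using the values quoted with Figure 19 (β = 1, R0 = 3.078 kΩ, Va = 0.545 V, Vb = 1.12 V).

```python
import numpy as np

R0, Va, Vb, beta = 3.078e3, 0.545, 1.12, 1.0   # values of Figure 19

def I_NL(u):
    """Cubic current-voltage law, Eq. (28)."""
    return beta * u * (u - Va) * (u - Vb) / (R0 * Va * Vb)

def P(u):
    """Polynomial voltage source of Figure 20 (taking beta = 1 for simplicity)."""
    return beta * u * (u - Va) * (u - Vb) / (Va * Vb) + u

u = np.linspace(-0.5, 1.5, 101)
assert np.allclose((P(u) - u) / R0, I_NL(u))   # Ohm's law across R0, Eq. (29)
```

The current vanishes exactly at the three roots 0, Va, and Vb, which is the signature of the bistable nonlinearity used throughout this section.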
Saverio Morfu et al.
FIGURE 20 Realization of a nonlinear resistor with a polynomial generation circuit. β = 10Va Vb .
FIGURE 21 Current-voltage characteristic of the nonlinear resistor of Figure 20. Parameters: β = −10 Va Vb, Va = −2 V, Vb = 2 V.
Applying Kirchhoff’s laws, the voltage Un at the nth node of the lattice can be written as
C dUn/dτ = (1/R)(Un+1 + Un−1 − 2Un) − INL(Un),   (30)
where τ denotes the experimental time and n = 1 . . . N is the node number in the lattice. Moreover, we assume zero-flux (Neumann) boundary conditions, which impose for n = 1 and n = N, respectively,
C dU1/dτ = (1/R)(U2 − U1) − INL(U1),   (31)
C dUN/dτ = (1/R)(UN−1 − UN) − INL(UN).   (32)
Next, introducing the transformations
Wn = Un/Vb,   D = (R0/R) αβ,   t = τ/(R0 αCβ),   (33)
yields the discrete Nagumo equation in its normalized form,
dWn/dt = D (Wn+1 + Wn−1 − 2Wn) + f(Wn).   (34)
Therefore, an electronic implementation of the overdamped network presented in Section II.A is realized.
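The normalized dynamics of Eq. (34) can be sketched in a few lines of Python. The snippet below is a minimal illustration (not the electronic implementation itself), assuming the cubic nonlinearity f(W) = −W(W − α)(W − 1) used throughout the chapter and an explicit Euler integration; all function names are ours.

```python
import numpy as np

# Minimal sketch of the normalized overdamped network, Eq. (34), with the
# zero-flux boundary conditions of Eqs. (31)-(32), integrated by an explicit
# Euler scheme. f is the cubic nonlinearity f(W) = -W(W - alpha)(W - 1).

def f(w, alpha=0.5):
    return -w * (w - alpha) * (w - 1.0)

def nagumo_step(w, d, dt, alpha=0.5):
    """One Euler step of dWn/dt = D(W_{n+1} + W_{n-1} - 2 W_n) + f(W_n)."""
    lap = np.empty_like(w)
    lap[1:-1] = w[2:] + w[:-2] - 2.0 * w[1:-1]
    lap[0] = w[1] - w[0]      # zero-flux boundary, Eq. (31)
    lap[-1] = w[-2] - w[-1]   # zero-flux boundary, Eq. (32)
    return w + dt * (d * lap + f(w, alpha))

# A uniform state below the threshold alpha is attracted by the stable state 0.
w = np.full(48, 0.4)
for _ in range(20000):        # normalized time t = 20
    w = nagumo_step(w, d=1.0, dt=1e-3)
print(w.max())
```

With a uniform initial state, the coupling term vanishes and the lattice relaxes as a single bistable cell, which anticipates the behavior of the constant background studied in the next section.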
B. Noise Filtering of a One-Dimensional Signal
One of the most important problems in signal and image processing is the removal of noise from the coherent information. In this section, we develop the principle of nonlinear noise filtering inspired by overdamped systems (Marquié et al., 1998). In addition, using the nonlinear electrical network introduced in Section IV.A, we also present an electronic implementation of this filtering task.
1. Theoretical Analysis
To investigate the response of the overdamped network to a noisy signal loaded as an initial condition, we first consider the simple case of a constant signal with a sudden change of amplitude. Therefore, we study the discrete normalized Nagumo equation
dWn/dt = D (Wn+1 + Wn−1 − 2Wn) + f(Wn),   (35)
with f (Wn ) = −Wn (Wn − α)(Wn − 1) in the specific case α = 1/2. Furthermore, the initial condition applied to the cell n is assumed to be uniform for all cells, except for the cell N/2, where a constant perturbation b0 is added; namely:
Wn(t = 0) = V0   ∀ n ≠ N/2,
WN/2(t = 0) = V0 + b0.   (36)
The solution of Eq. (35) with the initial condition in Eq. (36) can be sought in the following form:
Wn (t) = Vn (t) + bn (t).
(37)
Inserting Eq. (37) into Eq. (35) and collecting the terms of order 0 and 1 with the reductive perturbation method, we obtain the set of differential equations (Taniuti and Wei, 1968; Taniuti and Yajima, 1969):
dVn/dt = D(Vn+1 + Vn−1 − 2Vn) + f(Vn),   (38)
dbn/dt = D(bn+1 + bn−1 − 2bn) − (3Vn² − 2Vn(1 + α) + α) bn.   (39)
Assuming that Vn is a slow variable, Eq. (38) reduces to
dVn/dt = f(Vn),   (40)
which provides the response of the system to a uniform initial condition V 0 (see details in Appendix A):
V(t) = (1/2) [ 1 + (V0 − 1/2) / √( (V0 − 1/2)² − V0(V0 − 1) e^(−t/2) ) ].   (41)
Next, to determine the evolution of the additive perturbation, it is convenient to consider a perturbation under the following form:
bn (t) = In (2Dt)g(t),
(42)
where In is the modified Bessel function of order n (Abramowitz and Stegun, 1970). Substituting Eq. (42) in Eq. (39), and using the property of the modified Bessel function,
dIn(2Dt)/dt = D (In+1(2Dt) + In−1(2Dt)),   (43)
we obtain straightforwardly
dg/dt = −2Dg − (3Vn² − 2Vn(1 + α) + α) g,   (44)

that is,

dg/g = −2D dt − (3Vn² − 2Vn(1 + α) + α) dt.   (45)
Noting that
df(Vn)/dt = −(3Vn² − 2Vn(1 + α) + α) dVn/dt,   (46)
and differentiating Eq. (40) with respect to time, we obtain
Vn″/Vn′ = −(3Vn² − 2Vn(1 + α) + α),   (47)

where Vn′ and Vn″ denote the first and second derivatives of Vn with respect to time. Combining Eqs. (47) and (45) allows g(t) to be expressed as
g(t) = K e^(−2Dt) dVn/dt,   (48)
where K is an integration constant. Differentiating Eq. (41), we obtain g(t) and thus the evolution of the perturbation:
bn(t) = (K/8) In(2Dt) e^(−2Dt) e^(−t/2) V0 (V0 − 1/2)(V0 − 1) / [ (V0 − 1/2)² − V0(V0 − 1) e^(−t/2) ]^(3/2).   (49)
Writing bn(t = 0) = b0n provides the value of the integration constant K. The evolution of the perturbation bn(t) is then ruled by
bn(t) = (b0n/8) In(2Dt) e^(−2Dt) e^(−t/2) / [ (V0 − 1/2)² − V0(V0 − 1) e^(−t/2) ]^(3/2).   (50)
Finally, in the case of multiple perturbations, the perturbation at the nth node of the lattice follows as

bn(t) = ∑n′ (b0n′/8) In′−n(2Dt) e^(−2Dt) e^(−t/2) / [ (V0 − 1/2)² − V0(V0 − 1) e^(−t/2) ]^(3/2),   (51)

where In′−n is the modified Bessel function of order n′ − n.
Eq. (41) shows that the evolution of the constant background does not depend on the coupling D. By contrast, Eq. (51) shows that the coupling D can be tuned to speed up the diffusion of the perturbation without affecting the constant background. In a signal-processing context, this property can therefore be used to develop a noise filtering tool.
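The decay law of Eq. (50) can be evaluated directly. The following sketch (ours, not part of the original analysis) computes the perturbation amplitude at the perturbed cell, using a power-series evaluation of the modified Bessel function, to confirm that a larger coupling D accelerates the decay:

```python
import math

# Sketch: evaluating the decay law of Eq. (50) to check that the coupling D
# speeds up the diffusion of the perturbation, while the background [Eq. (41)]
# does not depend on D. The modified Bessel function I_n is computed from its
# power series; function names are ours.

def bessel_i(n, x, terms=60):
    """Modified Bessel function of the first kind, I_n(x), via its power series."""
    return sum((x / 2.0) ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

def perturbation(t, d, b0=0.2, v0=0.4):
    """Amplitude of the perturbation at the perturbed cell, Eq. (50) with n = 0."""
    x = (v0 - 0.5) ** 2 - v0 * (v0 - 1.0) * math.exp(-t / 2.0)
    return (b0 / 8.0) * bessel_i(0, 2.0 * d * t) * math.exp(-2.0 * d * t) \
        * math.exp(-t / 2.0) / x ** 1.5

print(perturbation(0.4, d=0.5))   # slow decay for weak coupling
print(perturbation(0.4, d=5.0))   # much faster decay for strong coupling
```

At t = 0 the law reduces to b0, as required by the initial condition, and for any t > 0 the amplitude obtained with D = 5 is well below that obtained with D = 0.5.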
2. Theoretical and Numerical Results
To validate the theoretical analysis developed in Section IV.B.1, we have solved Eq. (35) numerically using a fourth-order Runge–Kutta algorithm with an integration time step dt = 10⁻³. A uniform initial condition V0 = 0.4 is loaded for all N = 48 cells of the network except the 24th cell, where an additive perturbation b0 = 0.2 is superimposed onto the constant background V0 so as to match exactly the initial condition [Eq. (36)] considered in Section IV.B.1. We have investigated the evolution of both the constant background and the perturbation versus time. In Figure 22, the numerical results plotted with (•) signs are in excellent agreement with the theoretical predictions of Eqs. (41) and (51). Moreover, curve (a) of Figure 22 shows that the constant background given by Eq. (41) is unaffected by the nonlinear system regardless of the coupling value D. By contrast, the behavior of the system for the
FIGURE 22 (a) Temporal evolution of a uniform initial condition V0 = 0.4 applied to the entire network. (b) Temporal evolution of the perturbation applied to the cell n = 24 for D = 0.5 and b0 = 0.2. (c) Temporal evolution of the perturbation applied to the cell n = 24 for D = 5 and b0 = 0.2. Solid line: theoretical expressions of Eqs. (41) and (51); (•) signs: numerical results.
additive perturbation b0 depends on the coupling parameter D (curves (b) and (c) in Figure 22). Indeed, for a weak coupling value, namely D = 0.5, the perturbation decreases slowly and remains almost unchanged, whereas for D = 5 the curve (c) exhibits a much faster decay: after the time t = 0.4, the perturbation is significantly reduced for D = 5. Therefore, the coupling parameter D can be tuned to speed up the diffusion of the perturbation without disturbing the constant background. Furthermore, the time acts as a parameter that adjusts the filtering of the perturbation. The state of the lattice at two different processing times is shown in Figure 23a and b for the previous coupling values, that is, D = 5 and D = 0.5, respectively. The initial perturbation, represented by the dotted line (curve (I)), has almost disappeared for the specific value of the coupling D = 5 and a processing time t = 2 (Figure 23a, curve (III)). As expected, curve (III) of Figure 23b shows that the perturbation is not filtered for D = 0.5 at the same processing time t = 2. Furthermore, in both cases the constant background is slowly attracted by the nearest stable state (0 in our case). Note that the spatiotemporal views of Figure 24 also reveal that the noise filtering is achieved for D = 5 and a processing time t = 2. Finally, to validate the processing task realized by the overdamped system, we propose to remove the noise from a more complex signal, a noisy sinusoidal signal. The signal is first sampled with a total number of samples corresponding to the size of the overdamped network, namely N. Next, a serial-to-parallel conversion is performed to load the N samples at the nodes of the 1D lattice. Therefore, we are led to consider the distribution of initial
FIGURE 23 Response of the lattice to a uniform initial condition corrupted by a constant perturbation at two different processing times. (•) signs: numerical results; solid line: theoretical expression in Eq. (51). (a): D = 5; (b): D = 0.5. (I) initial condition for t = 0, (II) state of the lattice for t = 1, (III) state of the lattice for t = 2.
FIGURE 24 Spatiotemporal view of the response of the lattice to the previous initial condition. (a): D = 5; (b): D = 0.5.
conditions of Figure 25a in relation to

xn = A cos(2π (2n/N)) + 1/2 + ηn,   (52)
where ηn is a discrete white Gaussian noise of root mean square (RMS) amplitude σ = 0.15, and A and 2/N represent, respectively, the amplitude and the frequency of the coherent signal. First, we numerically investigate the response of the network with the coupling D = 0.5. As in the case of a constant background corrupted by a local perturbation, the system is unable to remove the noise from the sinusoidal signal at either of the processing times presented in Figure 25b and c. By contrast, for the favorable value of the coupling D = 5, the noise is completely filtered at the processing time t = 1, as shown in Figure 25e.
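This numerical experiment is easy to reproduce. The sketch below (ours) loads the noisy sinusoid of Eq. (52) into the lattice of Eq. (35) and integrates it with a simple explicit Euler scheme instead of the fourth-order Runge–Kutta used by the authors; the random seed is arbitrary, so the figures are indicative only.

```python
import numpy as np

# Numerical sketch of the 1D noise-filtering experiment: the noisy sinusoid
# of Eq. (52) is loaded as the initial condition of the lattice, Eq. (35),
# then integrated with an explicit Euler scheme (zero-flux boundaries,
# alpha = 1/2). The seed and the Euler scheme are our own choices.

rng = np.random.default_rng(0)
N, A, sigma = 48, 0.264, 0.15
n = np.arange(N)
clean = A * np.cos(2.0 * np.pi * 2.0 * n / N) + 0.5
noisy = clean + sigma * rng.standard_normal(N)

def evolve(w, d, t_final, dt=1e-3, alpha=0.5):
    w = w.copy()
    for _ in range(int(round(t_final / dt))):
        lap = np.empty_like(w)
        lap[1:-1] = w[2:] + w[:-2] - 2.0 * w[1:-1]
        lap[0], lap[-1] = w[1] - w[0], w[-2] - w[-1]
        w += dt * (d * lap - w * (w - alpha) * (w - 1.0))
    return w

# At t = 1, the strong coupling D = 5 removes the noise much more
# efficiently than D = 0.5.
for d in (0.5, 5.0):
    err = np.abs(evolve(noisy, d, 1.0) - clean).mean()
    print(f"D = {d}: mean deviation from the clean signal = {err:.3f}")
```

The deviation from the clean sinusoid after processing is smaller for D = 5 than for D = 0.5, in line with Figure 25c and e.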
3. Experimental Results
To validate the electronic implementation of the nonlinear noise filtering tool, we consider the nonlinear electrical lattice introduced in Section IV.A with the nonlinear resistor of Figure 18b. To match the coupling values D = 5 and D = 0.5, the coupling resistor R is set to R = 300 Ω and R = 3 kΩ, respectively. Moreover, all results are presented in normalized units using the transformations in Eq. (33) to allow direct comparison with the theoretical analysis of Section IV.B.2. First, we report in Figure 26 the experimental temporal evolution of the set of initial conditions consisting of a constant signal locally corrupted by a perturbation. As predicted in the theoretical section, the constant background is unaffected regardless of the coupling value (curve (a)), whereas when the coupling is adjusted
FIGURE 25 Noise filtering of a one-dimensional signal with an overdamped nonlinear network. (a): Noisy sinusoidal signal sampled and loaded as the initial condition at the nodes of the lattice. σ = 0.15, N = 48, and A = 0.264. (b), (c), (d), and (e) correspond to the filtered signal obtained for the following pairs of processing time t and coupling D: (b) (t = 0.4, D = 0.5); (c) (t = 1, D = 0.5); (d) (t = 0.4, D = 5); (e) (t = 1, D = 5).
to its favorable value D = 5, the perturbation can be removed after a normalized processing time t = 0.4 (curve (c)). This result is also confirmed by the spatial response of the system at two different processing times. Indeed, as shown in Figure 27, the states of the lattice for t = 2 and t = 4 provide the signal without the perturbation only if the coupling D is set to 5. Finally, we propose to filter the noisy sinusoidal signal of Figure 28a. After a processing time t = 0.6, the noise is completely removed for the coupling D = 5 (Figure 28c), which is not the case if the coupling is set to D = 0.5 (Figure 28b). Therefore, with a suitable choice of both the processing time and the coupling resistor, a noise filtering tool inspired by the
FIGURE 26 (a) Temporal evolution in normalized units of a uniform initial condition W0 = 0.4 applied to the network. (b) Temporal evolution of the perturbation applied to the cell n = 24 for b0 = 0.2 and D = 0.5, corresponding to a coupling resistor R = 3 kΩ. (c) Temporal evolution of the perturbation applied to the cell n = 24 for b0 = 0.2 and D = 5, corresponding to a coupling resistor R = 300 Ω. C = 33 nF. Nonlinearity parameters: β = 1, Vb = 1.12 V, Va = 0.545 V, yielding α = 0.49.
FIGURE 27 Response of the lattice to a uniform initial condition corrupted by a constant perturbation, at two different processing times. Parameters: C = 33 nF, Vb = 1.12 V, Va = 0.545 V, α = 0.49. (a): R = 300 Ω, that is, D = 5; (b): R = 3 kΩ, that is, D = 0.5. (I) initial condition for t = 0, (II) state of the lattice for t = 2 (τ = 0.1 ms), (III) state of the lattice for t = 4 (τ = 0.2 ms).
properties of the nonlinear overdamped network is electronically implemented. Moreover, according to Eq. (33), the processing time could be adjusted by the value of the capacitor C to match real-time processing constraints.
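The normalized-to-experimental time conversion can be checked against the quoted component values. The sketch below (ours) assumes the transformation of Eq. (33) reads τ = t · R0 · α · C · β, with the values given for Figures 26 and 27; small differences with the captions come from rounding α.

```python
# Sketch: converting a normalized processing time t into the experimental
# time tau via Eq. (33), tau = t * R0 * alpha * C * beta. Component values
# follow Figures 26 and 27 (R0 = 3.078 kOhm, alpha = 0.49, beta = 1,
# C = 33 nF); the function name is ours.

def experimental_time(t, r0=3078.0, alpha=0.49, c=33e-9, beta=1.0):
    """Experimental time tau (in seconds) for a normalized processing time t."""
    return t * r0 * alpha * c * beta

# t = 2 corresponds to roughly 0.1 ms, in agreement with the Figure 27 caption.
print(experimental_time(2.0) * 1e3, "ms")
```

As the text notes, scaling the capacitor C rescales τ proportionally, which is how real-time constraints can be met.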
FIGURE 28 Noise filtering of a one-dimensional signal with an electrical nonlinear lattice. (a): Normalized noisy sinusoidal signal given by Eq. (52), loaded as the initial condition at the nodes of the lattice. σ = 0.15, N = 48, and A = 0.264. (b): Filtered signal obtained for a processing time t = 0.6 (τ = 92.3 μs) and a coupling D = 0.5 (that is, R = 3 kΩ). (c): Filtered signal obtained for a processing time t = 0.6 (τ = 92.3 μs) and a coupling D = 5 (that is, R = 300 Ω). Parameters: C = 100 nF, β = 1, Vb = 1.12 V, Va = 0.545 V.
C. Two-Dimensional Filtering: Image Processing
We now numerically extend the properties of the 1D lattice to a 2D network. Consider a CNN whose cell state Wi,j, representing the gray level of pixel (i, j), obeys the following set of equations:
dWi,j/dt = f(Wi,j) + D ∑(k,l)∈Nr (Wk,l − Wi,j),   i = 2 . . . N − 1,  j = 2 . . . M − 1,   (53)

where Nr = {(i − 1, j), (i + 1, j), (i, j + 1), (i, j − 1)} is the set of the four nearest neighbors, N × M is the image size, and f(Wi,j) represents the nonlinearity. The boundary conditions for the edges of the image read
dW1,j/dt = f(W1,j) + D (W1,j−1 + W2,j + W1,j+1 − 3W1,j),   j = 2 . . . M − 1,
dWN,j/dt = f(WN,j) + D (WN,j−1 + WN−1,j + WN,j+1 − 3WN,j),   j = 2 . . . M − 1,
dWi,1/dt = f(Wi,1) + D (Wi−1,1 + Wi+1,1 + Wi,2 − 3Wi,1),   i = 2 . . . N − 1,
dWi,M/dt = f(Wi,M) + D (Wi−1,M + Wi+1,M + Wi,M−1 − 3Wi,M),   i = 2 . . . N − 1,

while for the image corners, we consider the two nearest neighbors, that is,
dW1,1/dt = f(W1,1) + D (W2,1 + W1,2 − 2W1,1),
dWN,M/dt = f(WN,M) + D (WN,M−1 + WN−1,M − 2WN,M),
dWN,1/dt = f(WN,1) + D (WN−1,1 + WN,2 − 2WN,1),
dW1,M/dt = f(W1,M) + D (W2,M + W1,M−1 − 2W1,M).
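The edge and corner equations above amount to a zero-flux Laplacian, which can be obtained compactly by edge-padding the image: the padded copy of a boundary pixel cancels one term of the 4-point Laplacian, leaving 3 neighbors on the edges and 2 in the corners. The sketch below (ours, with our own test image and seed) applies this to Eq. (53):

```python
import numpy as np

# Sketch of the 2D network of Eq. (53). Edge-padding reproduces the
# 3-neighbor edge equations and 2-neighbor corner equations above.

def step_2d(w, d, dt, f):
    p = np.pad(w, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * w
    return w + dt * (f(w) + d * lap)

# Denoising a synthetic black-and-white image with the cubic nonlinearity.
rng = np.random.default_rng(1)
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
noisy = np.clip(img + 0.15 * rng.standard_normal(img.shape), 0.0, 1.0)

f = lambda w: -w * (w - 0.5) * (w - 1.0)
w = noisy.copy()
for _ in range(3000):             # processing time t = 3
    w = step_2d(w, d=0.1, dt=1e-3, f=f)
print(np.abs(w - img).mean())     # below the initial deviation from img
```

With D = 0.1 and t = 3 (one of the favorable settings reported below for Figure 30), the mean deviation from the clean image is reduced with respect to the noisy input.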
1. Noise Filtering
The initial condition applied to the cell (i, j) of the network corresponds to the initial gray level W0i,j of the noisy image shown in Figure 29. The image after a processing time t is obtained by recording the state Wi,j(t) of all cells of the network at this specific time (Comte et al., 1998). Figure 30 shows the filtered images obtained at the processing times t = 1, t = 3, t = 6, and t = 9 for the coupling values D = 0.075, D = 0.1, D = 0.2, and D = 0.3. The bistable behavior of the system, established in Section II.A.1, involves a natural evolution of the image toward the two stable states of the system, 0 and 1. Thus, as time increases, the image evolves into a black-and-white pattern. Therefore, to achieve correct noise filtering, both the coupling parameter and the processing time must be adjusted.
FIGURE 29 Noisy image of the Coliseum.
FIGURE 30 Noise filtering of the image represented in Figure 29. (a)−(d): Filtered images obtained for D = 0.075 and for the respective processing times t = 1, t = 3, t = 6, and t = 9. (e)−(h): Filtered images obtained for D = 0.1 and for the respective processing times t = 1, t = 3, t = 6, and t = 9. (i)−(l): Filtered images obtained for D = 0.2 and for the respective processing times t = 1, t = 3, t = 6, and t = 9. (m)−(p): Filtered images obtained for D = 0.3 and for the respective processing times t = 1, t = 3, t = 6, and t = 9.
For the lowest coupling value, D = 0.075, Figure 30 shows that the noise is not removed before the image is binarized. For the coupling parameters D = 0.2 and D = 0.3, even if the noise is quickly removed, the filtered image becomes blurred for t = 6 and t = 9 (Figure 30k, l, o, p). These settings of the coupling parameter are therefore inappropriate. In fact, Figure 30f and g show the filtered images with the best settings of the coupling and processing time: a coupling D = 0.1 and the processing times t = 3 or t = 6. Indeed, the filtered images are neither blurred nor binarized. Moreover, the system not only removes the noise, it also enhances the contrast of the initial image.
2. Edge Filtering
Because of the strong relationship between edges and object recognition, edge detection constitutes one of the most important steps in image recognition. Indeed, scene information can often be interpreted from the edges alone. Classical edge detection algorithms are based on a second-order local derivative operator (Gonzalez and Wintz, 1987), whereas nonlinear techniques of edge enhancement are inspired mainly by the properties of reaction-diffusion media (Chua and Yang, 1988; Rambidi et al., 2002). We propose a strategy of edge detection based on the propagation properties of the nonlinear diffusive medium (Comte et al., 2001). The image loaded into the 2D network is the black-and-white picture of Figure 31a. We established in Section II.A.2 that a 1D lattice modeled by the Nagumo equation supports kink and anti-kink propagation owing to the bistable nature of the nonlinearity. Indeed, if the nonlinearity threshold parameter α < 1/2, the stable state 1 propagates, while if α > 1/2, the stable state 0 propagates. Extending this property to a 2D network therefore allows calculation of either erosion for α > 1/2 or dilation for α < 1/2, which are basic mathematical morphology operations commonly performed in image processing (Serra, 1986). Moreover, if the initial image is subtracted from the image obtained with the network obeying Eq. (53), we can deduce the contours of the image after a processing time t. Figure 31b shows the contour of a black-and-white image and its profile obtained with this method. The profile of the contour shows that its resolution is ∼10 pixels, which is insufficient to allow good edge enhancement of a more complex image. This poor resolution is mainly attributable to the spatial expansion of the kink that results from the initial condition loaded in the lattice. Since the kink expansion reduces with the coupling, a natural solution consists of lowering the coupling.
Unfortunately, the existence of the propagation failure effect provides a lower bound D∗ on the coupling and thus hinders contour detection with good resolution. An alternative solution can be developed by using a nonlinearity that eliminates the propagation failure effect. Indeed, it has been shown for dissipative media (Bressloff and
FIGURE 31 Contour detection of a black square on a white background. (a) Initial image and its profile. (b) Edge detection of the object and its profile obtained with the standard cubic nonlinearity [Eq. (5)] with threshold α = 1/3. Processing time t = 4, D = 1. (c) Contour and the corresponding profile obtained with the nonlinearity of Eq. (54). Processing time t = 4, D = 1.
Rowlands, 1997) or for systems where both inertia and dissipation are taken into account (Comte et al., 1999) that an inverse method allows the definition of a nonlinear function for which exact discrete propagating kinks exist. In the purely dissipative case, such a function is expressed as
f(Wi,j) = D [(1 − a2/2) − (a0 Wi,j + a1)²] − D a2 (a0 Wi,j + a1) √(1 − (a0 Wi,j + a1)²) + 2D (a0 Wi,j + a1),   (54)
where ε = 0.5, a2 = 0.9, a0 = 1.483, and a1 = −0.742 ensure that the zeros of the nonlinearity remain 0, 1/3, and 1. As expected, when this new nonlinearity is numerically implemented, the resolution of the detected contour in Figure 31c is reduced to 3 pixels. Note that edge enhancement with the nonlinear overdamped network is not restricted to black-and-white images. Indeed, the concept is based on the propagation properties of the system and can be extended to the case of an image with 256 gray levels. For instance, we show numerically the contour enhancement of the image of Figure 32a using the methodology applied to the edge detection of the black-and-white picture. The simulation results are summarized in Figure 32 for different processing times in the favorable case of the nonlinear function [Eq. (54)]. It is clear that, again, the time allows adjustment of the quality of the processing. Indeed, for processing times below t = 1 the edges of the image details are not revealed, whereas for processing times exceeding 1.33 the details begin to disappear. Furthermore, as time increases, the contours of the image become increasingly thinner owing to the propagation mechanism. The best contour enhancement is thus performed when the image details have not yet disappeared and the enhanced contours remain sufficiently thin. This situation corresponds to the intermediate processing time t = 1.33 (Figure 32e).
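The propagation-based edge detector is straightforward to sketch numerically. The example below (ours) uses the standard cubic nonlinearity with threshold α = 1/3 rather than the discrete-kink nonlinearity of Eq. (54), so the detected ring is wider than in Figure 31c; the test image is a synthetic black square.

```python
import numpy as np

# Sketch of the propagation-based edge detector: run the bistable network
# with alpha = 1/3 (the state 1 propagates, dilating the white background),
# then subtract the initial image; the difference is nonzero only near the
# contours. Eq. (54) would sharpen the result; here we use the cubic law.

def evolve_2d(w, d, t_final, alpha, dt=1e-3):
    w = w.copy()
    for _ in range(int(round(t_final / dt))):
        p = np.pad(w, 1, mode="edge")
        lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * w
        w += dt * (d * lap - w * (w - alpha) * (w - 1.0))
    return w

img = np.ones((32, 32))           # white background...
img[10:22, 10:22] = 0.0           # ...with a black square

edges = np.abs(evolve_2d(img, d=1.0, t_final=4.0, alpha=1.0 / 3.0) - img)
# The difference concentrates in a thin ring around the square's contour,
# while the centers of the object and of the background stay unchanged.
print(edges.max(), edges[16, 16], edges[2, 2])
```

The processing time t = 4 and coupling D = 1 follow the caption of Figure 31; longer times thicken the ring as the kink keeps propagating.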
3. Extraction of Regions of Interest
As explained in the previous subsections, in the case of the cubic nonlinearity, a nonlinearity threshold α = 0.5 allows noise filtering, while a threshold α ≠ 0.5 provides the contour of an image, albeit with poor resolution. Moreover, the nonlinearity f(W) can be determined using an inverse method to optimize the filtering task. The choice of the nonlinearity is therefore of crucial interest in developing powerful image-processing tools. In this section, we go one step further by proposing a new nonlinearity to extract the regions of interest of an image representing the soldering between two rods of metal (Morfu et al., 2007). The noisy and weakly contrasted image of Figure 33 presents four regions of interest:
• the two rods of metal, which constitute the background of the image in light gray;
• the stripe in medium gray at the center of the image, which represents the "soldered joint";
• a white spot, corresponding to a "projection" of metal occurring during the soldering of the two rods;
• a dark gray spot, representing a gaseous inclusion inside the soldered joint.
FIGURE 32 Contour enhancement of an image with 256 gray levels realized with the modified nonlinearity in Eq. (54). (a) Initial image. (b)−(j): Filtered images for the respective processing times t = 0.33, t = 0.66, t = 1, t = 1.33, t = 1.66, t = 2, t = 2.33, t = 2.66, and t = 3.
FIGURE 33 Noisy and weakly contrasted image of soldering between two rods of metal. The image histogram is represented at the right.
FIGURE 34 Mechanical point of view of the bistable overdamped network used for image processing. (a) The pixel with coordinates (i, j) and gray level Wi,j is analogous to an overdamped particle coupled by springs of strength D to its four nearest neighbors. (b) The particle is attracted into one of the two wells of the bistable potential according to the resulting elastic force applied by the four coupled particles.
a. Limit of the Bistable Network. We first discuss the inability of the bistable overdamped network ruled by Eq. (53) to extract the four objects of the image. As explained in Section II.A.1, the bistability is ensured by using the cubic nonlinearity in Eq. (3). According to the mechanical description of the bistable system presented in Section II, a pixel of the image is analogous to a particle experiencing a double-well potential φ(W) = −∫0^W f(u) du and coupled to its four nearest neighbors by springs of strength D. As schematically shown in Figure 34, the particle with initial position W0i,j is attracted into one of the two wells of the potential, depending on the competition between the nonlinear force f(Wi,j) and the resulting elastic force

D ∑(k,l)∈Nr (Wk,l − Wi,j).   (55)
FIGURE 35 Filtered images obtained with the bistable overdamped network described by Eqs. (3) and (53) in the case α = 1/2. Coupling parameter: D = 0.05. Processing times: (a) t = 4; (b) t = 10; (c) t = 3000.
Owing to this property of the system, after a sufficiently long time the network settles near the two stable states set by the nonlinearity, namely, 0 and 1. In an image-processing context, this means that the resulting filtered image tends to an almost black-and-white pattern. Figure 35 confirms this evolution of the filtered image versus the processing time: when a cubic nonlinearity is considered, a quasi-black-and-white image is obtained at the time t = 3000 (Figure 35c). Note that for none of the proposed processing times was the bistable system able to properly remove the noise and enhance the contrast of the regions of interest. Indeed, for t = 4 the noise is reduced but the details of the image begin to disappear (Figure 35a). In particular, the projection is merged into the background for t = 10, indicating that the bistable nature of the system destroys the coherent information of the initial image (Figure 35b). Therefore, the inability of the overdamped system to extract the regions of interest is directly related to the bistable nonlinear force f(W).
b. The Multistable Network. To solve this problem and to maintain the coherent structure of the image, we introduce a nonlinearity with a multistable behavior. For instance, the following nonlinear force
f(W) = −β(n − 1) sin[2π(n − 1)W]   (56)
derives from a potential φ(W) = −∫0^W f(u) du, which presents n wells, with a potential barrier of height β/π between two consecutive extrema. This potential is represented in Figure 36 in the case of n = 5 wells. The multistable behavior of the network obeying Eq. (53) with the sinusoidal force in Eq. (56) can be established by considering the uncoupled case.
FIGURE 36 Multistable potential represented for β = 9.82 × 10−2 and n = 5. The potential barrier between two consecutive extrema is β/π.
Setting D = 0 in Eq. (53), we obtain
dWi,j/dt = −β(n − 1) sin[2π(n − 1)Wi,j].   (57)
The stability analysis of the system can be performed with the methodology developed in Section II.A.1 by considering the roots of the sinusoidal force in Eq. (56). According to the sign of the derivative of the sinusoidal force, we can straightforwardly deduce that the unstable steady states of the system are given by
Wthk = (2k + 1)/(2(n − 1))   with k ∈ Z,   (58)
while the stable steady states are defined by
Wk* = k/(n − 1)   with k ∈ Z.   (59)
Eq. (57) is solved in Appendix C to provide the temporal evolution of an overdamped particle experiencing the multistable potential of Figure 36 in the uncoupled case. If k denotes the nearest integer to (n − 1)W0i,j, where W0i,j is the initial position of the particle, the displacement Wi,j(t) of the particle is expressed as

Wi,j(t) = (1/(π(n − 1))) arctan[ tan(π(n − 1)W0i,j) e^(−2πβ(n−1)²t) ] + k/(n − 1).   (60)

The multistable behavior of the system is illustrated in Figure 37, which shows the temporal evolution of a particle subjected to different initial conditions in the range [0; 1]. It is clear that the unstable steady states Wthk act as thresholds, while the stable steady states Wk* correspond to attractors. Indeed, the final state of the particle
FIGURE 37 Temporal evolution of an overdamped particle experiencing the multistable potential. Parameters: n = 5 and β = 0.25. Solid line: theoretical expression of Eq. (60); open circles: numerical results obtained solving Eq. (57).
depends on the value of the initial condition compared with the thresholds Wthk. In particular, if we neglect the transient regime, the asymptotic behavior of the uncoupled network reduces to the following rule:
if   (2k − 1)/(2(n − 1)) < W0i,j < (2k + 1)/(2(n − 1)),   then   Wi,j(t → +∞) = k/(n − 1).   (61)
Therefore, the asymptotic behavior [Eq. (61)] of the uncoupled network establishes the multistability of the system. We now use this multistable feature numerically to extract the regions of interest of the image. In the coupled case, a pixel with initial gray level W0i,j can reach one of the n possible stable states according to the competition between the sinusoidal force and the resulting elastic force. The specific case n = 5 is shown numerically in Figure 38. Unlike the bistable network, the noise is quickly removed without disturbing the coherent structure of the image consisting of the "projection," the "gaseous inclusion," the "background," and the "soldered joint" (Figure 38a for t = 0.2 and b for t = 2). Next, for a sufficiently long time, namely t = 5000, the image no longer evolves and each defect of the soldering appears with a different mean gray level corresponding to one of the five stable steady states of the system (Figure 38c). The extraction of the regions of interest of the image is thus performed with this overdamped network.
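The closed-form solution of Eq. (60) and the asymptotic rule of Eq. (61) can be checked against a direct integration of Eq. (57). The sketch below (ours) uses the parameters of Figure 37 and a plain Euler scheme; the chosen initial conditions are arbitrary but avoid the thresholds Wthk.

```python
import math

# Sketch: in the uncoupled case, a cell obeying Eq. (57) relaxes toward the
# attractor k/(n - 1) nearest to its initial state, in agreement with the
# closed-form solution of Eq. (60) and the asymptotic rule of Eq. (61).
# Parameters follow Figure 37 (beta = 0.25, n = 5); names are ours.

def closed_form(w0, t, beta=0.25, n=5):
    """Eq. (60): displacement of the overdamped particle at time t."""
    k = round((n - 1) * w0)
    decay = math.exp(-2.0 * math.pi * beta * (n - 1) ** 2 * t)
    return (math.atan(math.tan(math.pi * (n - 1) * w0) * decay)
            / (math.pi * (n - 1)) + k / (n - 1))

def euler(w0, t, beta=0.25, n=5, dt=1e-4):
    """Direct Euler integration of Eq. (57)."""
    w = w0
    for _ in range(int(round(t / dt))):
        w += dt * (-beta * (n - 1) * math.sin(2.0 * math.pi * (n - 1) * w))
    return w

for w0 in (0.1, 0.3, 0.55, 0.9):
    print(w0, "->", closed_form(w0, 0.25), euler(w0, 0.25))
```

Each initial condition converges to the attractor k/4 of its basin (0, 0.25, 0.5, and 1, respectively), reproducing the staircase behavior of Figure 37.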
FIGURE 38 Filtered images obtained with the multistable overdamped network described by Eqs. (53) and (56). Nonlinearity parameters: β = 9.82 × 10⁻², n = 5. Coupling parameter: D = 1.6. Processing times: (a) t = 0.2; (b) t = 2; (c) t = 5000.
FIGURE 39 Electronic sketch of the multistable nonlinear network. R represents the coupling resistor, C a capacitor, and RNL a nonlinear resistor. INL denotes the nonlinear current and Ui,j the voltage of the cell with coordinates (i, j).
c. Electronic Implementation of the Multistable Network. The electronic implementation of the multistable network is realized according to the methodology of Figure 39 by coupling elementary cells with linear resistors. Each elementary cell includes a capacitor in parallel with a nonlinear resistor whose current-voltage characteristic can be approximated on the range [−2 V; 2 V] by the sinusoidal law

INL(U) ≈ IM sin(2πU).
(62)
The methodology of Section IV.A used to realize the cubic nonlinearity with a polynomial source can also be applied to obtain the sinusoidal law in Eq. (62). First, a least-squares method of order 15 allows us to fit the
131
Nonlinear Systems for Image Processing
sinusoidal expression in Eq. (62) by a polynomial law P(U) in the range [−2V; 2V]. This provides the coefficients of the polynomial source P(U) by generating the sinusoidal current
INL(U) = P(U)/R0.   (63)
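The fit itself can be sketched as follows. The authors use an ordinary least-squares fit at order 15; as a better-conditioned variant of the same idea, the snippet below builds a degree-15 polynomial approximation of sin(2πU) on [−2 V; 2 V] in the Chebyshev basis (the basis choice and the number of sample nodes are our assumptions, not the authors' procedure):

```python
import math

DEG, N = 15, 200          # polynomial degree and number of sample nodes
# sin(2*pi*U) with the rescaling U = 2x, so that x lies in [-1, 1]
f = lambda x: math.sin(2 * math.pi * (2 * x))

# Discrete least-squares coefficients in the Chebyshev basis
nodes = [math.cos(math.pi * (j + 0.5) / N) for j in range(N)]
c = []
for k in range(DEG + 1):
    s = sum(f(x) * math.cos(k * math.acos(x)) for x in nodes)
    c.append((1 if k == 0 else 2) * s / N)

def P(U):
    """Evaluate the fitted polynomial P(U) ~ sin(2*pi*U) on [-2, 2] (Clenshaw)."""
    x = U / 2.0
    b1 = b2 = 0.0
    for k in range(DEG, 0, -1):
        b1, b2 = 2 * x * b1 - b2 + c[k], b1
    return x * b1 - b2 + c[0]

# Worst-case deviation from the sinusoidal law over [-2, 2]
err = max(abs(P(u / 50.0) - math.sin(2 * math.pi * u / 50.0))
          for u in range(-100, 101))
```

Dividing the evaluated polynomial by R0, as in Eq. (63), then yields the synthesized nonlinear current; increasing DEG shrinks the residual error, at the cost of more electronic components in the hardware realization.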
The experimental current-voltage characteristic is compared in Figure 40b to the theoretical expression in Eq. (62). The weak discrepancies observed between the theoretical and experimental laws can be reduced by increasing the order of the least-squares method. However, enhancing the agreement with the sinusoidal law has the main disadvantage of considerably increasing the number of electronic components used for the realization of the nonlinear resistor. Nevertheless, at order 15, the experimental nonlinear current presents nine zeros, and the sign of its derivative at these zeros ensures the existence of five stable steady states and four unstable steady states. It is thus not of crucial interest to increase the order of the approximation, provided that the nonlinear resistor exhibits the multistability.
FIGURE 40 (a) Response of an elementary cell of the multistable network to different initial conditions in the uncoupled case. The theoretical expression [Eq. (60)] (solid line) is compared to the experimental results (crosses). (b) Nonlinear current-voltage characteristic. The sinusoidal law [Eq. (62)] (solid line) matches the experimental characteristic shown by plus signs (+). The component values are R0 = 2 kΩ, C = 390 nF, IM = 2 mA. The zeros of the sinusoidal current define the four unstable states Uth1, Uth2, Uth3, and Uth4, which correspond to thresholds, and the five stable steady states U1*, U2*, U3*, U4*, and U5*, which correspond to attractors.
Applying the Kirchhoff laws to the electrical network of Figure 39, we deduce the differential equation that governs the evolution of the voltage Ui,j at node (i, j):

C dUi,j/dτ = −INL(Ui,j) + (1/R) ∑(k,l)∈Nr (Uk,l − Ui,j).   (64)
In Eq. (64), Nr = {(i; j − 1), (i; j + 1), (i − 1; j), (i + 1; j)} denotes the neighborhood and τ represents the experimental time. Next, the transformations
τ = t R0C,   β = IM R0/(n − 1)²,   Ui,j = Wi,j (n − 1) − 2,   and   D = R0/R   (65)
lead to the normalized equation
dWi,j/dt = P(Wi,j (n − 1) − 2)/(n − 1) + D ∑(k,l)∈Nr (Wk,l − Wi,j).   (66)
The normalization is completed by noting that, for Wi,j ∈ [0; 1], that is, for Ui,j ∈ [−2 V; 2 V],

P(Wi,j (n − 1) − 2) = −R0 INL(Wi,j (n − 1) − 2) ≃ −β(n − 1)² sin(2π(n − 1)Wi,j).   (67)
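As a quick sanity check (ours, using the component values quoted in Figure 40), the transformations of Eq. (65) indeed map the sinusoidal current of Eq. (62) onto the normalized force of Eq. (67), since the two sine arguments differ by exactly 4π:

```python
import math

# Component values from Figure 40; beta follows from Eq. (65)
n, IM, R0 = 5, 2e-3, 2e3
beta = IM * R0 / (n - 1) ** 2        # with these component values beta = 0.25

worst = 0.0
for i in range(101):
    W = i / 100.0                    # normalized gray level in [0, 1]
    U = W * (n - 1) - 2              # corresponding voltage in [-2 V, 2 V], Eq. (65)
    lhs = -R0 * IM * math.sin(2 * math.pi * U)                        # -R0*INL(U)
    rhs = -beta * (n - 1) ** 2 * math.sin(2 * math.pi * (n - 1) * W)  # Eq. (67)
    worst = max(worst, abs(lhs - rhs))
```

The identity holds exactly (up to floating-point rounding) because sin(2πU) = sin(2π(n − 1)W − 4π) = sin(2π(n − 1)W).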
The experimental network described by Eq. (66) thus appears as an analog simulation of the normalized multistable network used for image processing. Let us finally reveal the multistable behavior of the elementary cell of the experimental network by investigating its response to different initial conditions in the uncoupled case. To allow a direct comparison with the theoretical expression (60), all the results are presented in normalized units in Figure 40a. First, we note that the component uncertainties do not explain the observed discrepancies. The poor correlation between the experimental results and the theoretical prediction is attributed to the nonlinearity provided by the nonlinear resistor, which does not exactly follow the sinusoidal law in Eq. (62). Nevertheless, the multistable property of the system is experimentally established. Indeed, there exist four threshold values, Uth1, Uth2, Uth3, and Uth4, that determine the final state of the elementary cell among the five possible stable steady states,
U1∗ , U2∗ , U3∗ , U4∗ , and U5∗ . Therefore, the image-processing task inspired by the multistable property of the system is implemented with the electronic device in Figure 39.
V. CONCLUSION

This chapter has reported a variety of image-processing operations inspired by the properties of nonlinear systems. Considering a mechanical analogy, we have split the class of nonlinear systems into purely inertial systems and overdamped systems. Using this original description, we have established the properties of nonlinear systems in the context of image processing. For purely inertial systems, image-processing tasks such as contrast enhancement, image inversion, gray-level extraction, or image encryption can be performed. The applications of the nonlinear techniques presented herein are similar to those developed by means of chemical active media (Teuscher and Adamatzky, 2005), even if those media are overdamped rather than inertial. In particular, the dynamics of the nonlinear oscillator network, which enables contrast enhancement, can also be used to reveal “hidden images.” Indeed, “hidden images” are defined as fragments of a picture with brightness very close to that of the image background. Despite a weak difference of brightness between the hidden image and the image background, our nonlinear oscillator network takes advantage of its properties to reveal the hidden image. Another interesting property of this network is that it consecutively reveals regions of the image with increasing or decreasing brightness at different processing times. We trust that this feature, also shared by Belousov–Zhabotinsky chemical media, may have potential applications in image analysis in medicine (Teuscher and Adamatzky, 2005). Finally, the noise effects in this purely inertial network lead to cryptography applications. Unlike classical cryptographic devices built with chaotic oscillators, we have proposed an encryption scheme based on the reversibility of our inertial system. Moreover, the encryption key, which ensures the restoration of the initial data, is the time of evolution of the data loaded in the nonlinear network.
Therefore, the main advantage of our device is that it allows an easy change of the encryption key. The properties of strongly dissipative or overdamped systems can also give rise to novel image-processing tools. For instance, we have shown the possibility of achieving noise filtering, edge detection, or extraction of regions of interest of a weakly contrasted picture. For noise-filtering applications based on reaction-diffusion media, the processing relies on the transient behavior of the network, since the filtered image depends on the processing time. By contrast, the extraction of regions of
interest presents the main advantage of independence from the processing time since the filtering is realized when the network reaches a stationary pattern. Therefore, this feature can allow an automatic implementation of the processing task.
VI. OUTLOOKS

A. Outlooks on Microelectronic Implementation

For each nonlinear processing example, we have attempted to propose an electronic implementation using discrete electronic components. Even if these macroscopic realizations are far from real practical applications, they present the primary advantage of validating the concept of integration of CNNs for future development in microelectronics. Indeed, in recent years the market for solid-state image sensors has experienced explosive growth due to the increasing demands for mobile imaging systems, video cameras, surveillance, or biometrics. Improvements in this growing digital world continue with two primary image sensor technologies: charge-coupled devices (CCDs) and complementary metal oxide semiconductor (CMOS) sensors. The continuous advances in CMOS technology for processors and dynamic random access memories (DRAMs) have made CMOS sensor arrays a viable alternative to the popular CCD sensors. New technologies provide the potential for integrating a significant amount of very large scale integration (VLSI) electronics into a single chip, greatly reducing the cost, power consumption, and size of the camera (Fossum, 1993; Fossum, 1997; Litwiller, 2001; Seitz, 2000). In past years, most research on complex CMOS systems has dealt with the integration of sensors providing a processing unit at chip level (system-on-chip approach) or at column level by integrating an array of processing elements dedicated to one or more columns (Acosta et al., 2004; Kozlowski et al., 2005; Sakakibara, 2005; Yadid-Pecht and Belenky, 2003). Indeed, pixel-level processing is generally dismissed because pixel sizes are often too large to be of practical use. However, as CMOS image sensors scale to 0.18-μm processes and below, integrating a processing element at each pixel or group of neighboring pixels becomes feasible.
More significantly, using a processing element per pixel offers the opportunity to achieve massively parallel computations and thus the ability to implement full-image systems requiring significant processing, such as digital cameras and computational sensors (El-Gamal et al., 1999; Loinaz et al., 1998; Smith et al., 1998). The latest significant progress in CMOS technologies has made possible the realization of vision systems on chip (VSoCs). Such VSoCs are eventually targeted to integrate within a semiconductor substrate the functions of optical sensing, image processing in space and time, high-level processing, and the control of actuators. These chips consist of arrays of mixed-signal processing elements (PEs), which operate
in accordance with single-instruction multiple-data (SIMD) computing architectures. The main challenge in designing a SIMD pixel-parallel sensor array is the design of a compact, low-power, but versatile and fully programmable processing element. For this purpose, the processing function can be based on the paradigm of CNNs. CNNs can be viewed as a very suitable framework for the systematic design of image-processing chips (Roska and Rodriguez-Vazquez, 2000). The complete programmability of the interconnection strengths, the internal image memories, and other additional features make this paradigm a powerful basis for the realization of simple and medium-complexity artificial vision tasks (Espejo et al., 1996). Some proof-of-concept chips operating on preloaded images have been designed (Czuni et al., 2001; Rekeczky et al., 1999). Only a few researchers have integrated CNNs on real vision chips. As an example, Espejo et al. (1998) report a 64 × 64-pixel programmable computational sensor based on a CNN. This chip is the first fully operational CNN vision chip reported in the literature that combines the capabilities of image transduction, programmable image processing, and algorithmic control on a common silicon substrate. It has successfully demonstrated operations such as low-pass image filtering, corner and border extraction, and motion detection. More recently, other studies have focused on the development of CMOS sensors including the CNN paradigm (Carmona et al., 2003; Petras et al., 2003). The latter chip consists of 1024 processing units arranged into a 32 × 32 grid and contains approximately 500,000 transistors in a standard 0.5-μm CMOS technology. However, in these pioneering vision chips, the pixel size is often greater than 100 μm × 100 μm. Obviously, these dimensions cannot be considered realistic for a real vision chip. A major part of this crucial problem should be resolved in future years by using the newly emerging CMOS technologies.
Indeed, CMOS image sensors directly benefit from technology scaling by reducing pixel size, increasing resolution, and integrating more analog and digital functionalities on the same chip with the sensor. We expect that further scaling of CMOS image sensor technology and improvement in their imaging performances will eventually allow the implementation of efficient CNNs dedicated to nonlinear image processing.
B. Future Processing Applications

The nonlinear processing tools developed in this chapter are inherited from the properties of homogeneous media. In the case of applications based on the properties of reaction-diffusion media, it is interesting to consider the effects of both nonlinearity and structural inhomogeneities. Indeed, novel properties inspired by biological systems, which are inhomogeneous rather
than homogeneous (Keener, 2000; Morfu et al., 2002a; Morfu et al., 2002b; Morfu, 2003), could allow optimizing the filtering tools developed in this chapter. For instance, in Section IV.C.1, the noise removal method based on the homogeneous Nagumo equation provides a blurry filtered image. In addition, it is difficult to extract the edge of the image with an accurate location. Indeed, noting that the contours of the image correspond to steplike profiles, the diffusive process increases the spatial expansion of the contours. To avoid this problem, anisotropic diffusion has been introduced to reduce the diffusive effect across the image contour. This method has been proposed by Perona and Malik (1990) to encourage intraregion smoothing in preference to interregion smoothing. To obtain this property, Perona and Malik replaced the classical linear isotropic diffusion equation
∂I(x, y, t)/∂t = div(∇I),   (68)

by

∂I(x, y, t)/∂t = div(g(‖∇I‖)∇I),   (69)
to adapt the diffusion to the image gradient. In Eqs. (68) and (69), I(x, y, t) represents the brightness of the pixel located at the spatial position (x, y) for a processing time t, while ‖∇I‖ is the gradient amplitude. Moreover, the anisotropy is ensured by the function g(‖∇I‖), which “stops” the diffusion across the edges. For instance, Perona and Malik considered the function
g(x) = 1/(1 + x²/K²),   (70)
where K is a positive parameter. Noting that g(x) → 0 as x → ∞, the effect of anisotropic diffusion is to smooth the original image while the contours are preserved. Indeed, the edges of the image correspond to brightness discontinuities that lead to strong values of the image gradient (Black et al., 1998). This interesting property of anisotropic diffusion is illustrated in Figure 41. For the sake of clarity, the algorithm developed by Perona and Malik is detailed in Appendix D; here we discuss only the results obtained by filtering the noisy picture in Figure 41a. Contrary to the isotropic nonlinear diffusion based on the Nagumo equation, the edge of the image remains well localized for all the processing times presented
FIGURE 41 Noise filtering based on anisotropic diffusion. The filtered images are obtained using the algorithm detailed in Appendix D with the parameters dt = 0.01 and K = 0.09. (a) Initial image; (b)−(i) images for the respective processing times t = 1, t = 2, t = 3, t = 4, t = 5, t = 6, t = 7, and t = 8.
in Figure 41. However, although the noise seems removed for processing times exceeding t = 5, the contrast of the image is never enhanced. Therefore, anisotropic diffusion and nonlinear diffusion do not share the same weaknesses, and it could be interesting to attempt to circumvent the limitations of these two techniques. For instance, if we compare the continuous Eq. (9) of nonlinear diffusion with the anisotropic Eq. (69) proposed by Perona and Malik, it is clear that the anisotropy can be introduced into our system via the coupling parameter D. Moreover, with Perona and Malik's method, the pixel brightness does not directly experience the nonlinearity as in our method. Therefore, the nonlinear noise-filtering tool presented in Section IV.C.1 could be more efficient if the interesting properties of anisotropic
diffusion were also considered by introducing a coupling law. In particular, we expect that the anisotropy preserves the location of the image edges, while the nonlinearity enhances the image contrast and removes the noise at the same time. Finally, we close this chapter by presenting another interesting and nonintuitive phenomenon that occurs in nonlinear systems under certain conditions. This effect, known as stochastic resonance (SR), was introduced in the 1980s to account for the periodicity of ice ages (Benzi et al., 1982). Since then, the SR effect has been widely reported in a growing variety of nonlinear systems (Gammaitoni et al., 1998), where it has been shown that adding an appropriate amount of noise to a coherent signal at a nonlinear system input enhances the response of the system. Detection of subthreshold signals using noise has been demonstrated in neural information processing (Longtin, 1993; Nozaki et al., 1999; Stocks and Mannella, 2001) and in data transmission (Barbay et al., 2001; Comte and Morfu, 2003; Duan and Abbott, 2005; Morfu et al., 2003; Zozor and Amblard, 2003), as well as in information transmission in arrays acting as stochastic resonators (Báscones et al., 2002; Chapeau-Blondeau, 1999; Lindner et al., 1998; Morfu, 2003). Recent studies have also shown that noise can enhance image perception (Moss et al., 2004; Simonotto et al., 1997), autostereogram interpretation (Ditzinger et al., 2000), human visual perception by microsaccades in the retina (Hongler et al., 2003), and image processing (Vaudelle et al., 1998; Chapeau-Blondeau, 2000; Histace and Rousseau, 2006; Blanchard et al., 2007). The investigation of noise effects in nonlinear systems is thus undoubtedly of great interest in the context of nonlinear signal and image processing (Zozor and Amblard, 1999, 2005).
FIGURE 42 (a) Initial black-and-white image with p1 = 0.437. (b) Similarity measures from Eqs. (72) and (73) versus the noise RMS amplitude σ for Vth = 1.1.
We thus present the phenomenon of SR using the methodology proposed by Chapeau-Blondeau (2000). Moreover, to show a visual perception of the SR effect, we consider the black-and-white image of Figure 42a, where p1 denotes the probability of a white pixel and p0 = 1 − p1 the probability of a black one. A Gaussian white spatial noise ηi,j with RMS amplitude σ is added to each pixel Ii,j of the initial image. The resulting noisy image is then threshold filtered with a threshold Vth to obtain the image Ib, according to the following threshold filtering rule:
if Ii,j + ηi,j > Vth, then Ibi,j = 1; else Ibi,j = 0.   (71)
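This threshold rule is easy to reproduce numerically. The sketch below applies Eq. (71) to a synthetic random binary image (a stand-in for Figure 42a; the image and sample size are our assumptions) and estimates the cross-covariance of Eq. (72) empirically:

```python
import random

random.seed(1)
p1, Vth, npix = 0.437, 1.1, 20000

# Synthetic binary image: 1 (white) with probability p1, else 0 (black)
I = [1 if random.random() < p1 else 0 for _ in range(npix)]

def cross_cov(sigma):
    """Empirical cross-covariance between I and its noisy thresholded
    version Ib, following the rule of Eq. (71)."""
    Ib = [1 if x + random.gauss(0.0, sigma) > Vth else 0 for x in I]
    mI, mb = sum(I) / npix, sum(Ib) / npix
    num = sum((x - mI) * (y - mb) for x, y in zip(I, Ib)) / npix
    vI = sum((x - mI) ** 2 for x in I) / npix
    vb = sum((y - mb) ** 2 for y in Ib) / npix
    return num / (vI * vb) ** 0.5 if vI * vb > 0 else 0.0

c_low, c_opt, c_high = cross_cov(0.05), cross_cov(0.4), cross_cov(2.5)
```

The covariance is small for both weak and strong noise and peaks near σ ≈ 0.4, which is the resonance visible in Figure 42b.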
The similarity between the two images I and Ib can then be quantified by the cross-covariance (Chapeau-Blondeau, 2000)

CIIb = ⟨(I − ⟨I⟩)(Ib − ⟨Ib⟩)⟩ / √(⟨(I − ⟨I⟩)²⟩ ⟨(Ib − ⟨Ib⟩)²⟩),   (72)

or by

RIIb = ⟨I Ib⟩ / √(⟨I²⟩ ⟨Ib²⟩),   (73)

where ⟨·⟩ corresponds to an average over the images. These two similarity measures can be expressed as

RIIb = p1 (1 − Fη(Vth − 1)) / √(p1 [p1 (1 − Fη(Vth − 1)) + (1 − p1)(1 − Fη(Vth))])

and

CIIb = [p1 (1 − Fη(Vth − 1)) − p1 q1] / √((p1 − p1²)(q1 − q1²)),
with q1 = p1 (1 − Fη(Vth − 1)) + (1 − p1)(1 − Fη(Vth)), where Fη is the cumulative distribution function of the noise. In the case of a Gaussian white noise of RMS amplitude σ, the cumulative distribution function can be expressed as
Fη(u) = 1/2 + (1/2) erf(u/(√2 σ)).   (74)
In Eq. (74), the error function is defined by erf(u) = (2/√π) ∫₀ᵘ exp(−t²) dt. The two quantities expressed in Eqs. (72) and (73) are plotted versus the RMS noise amplitude σ in Figure 42b, where a resonant-like behavior reveals the standard stochastic resonance signature. Indeed, there exists an optimum amount of noise that maximizes the similarity measures in Eqs. (72) and (73). According to Figure 42b, this optimal noise RMS value is σ = 0.4. To validate the similarity measures, we qualitatively analyze the pictures obtained for different noise amplitudes. Figure 43 confirms that the optimal noise value σ = 0.4 allows the best visual perception of the Coliseum through the nonlinear system. Even if the model of human visual perception is more complex than a standard threshold filtering (Bálya et al., 2002), this simple representation is convenient to determine analytically the optimum amount of noise that provides the best visual perception of images via SR. Moreover, the SR phenomenon is shared by a wide class of nonlinear systems, including the neural networks that intervene in the process of image perception. Since neurons are basically threshold devices that are supposed to work in a noisy environment, considering noise effects seems of crucial importance in developing artificial intelligence applications that mimic the real behavior of nature. Therefore, for the next few decades, we trust that one of the most interesting challenges could be completing the description of nonlinear models by including the contribution of noise effects.
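The analytic similarity measure can be scanned directly for the optimum noise level; the sketch below evaluates CIIb from Eqs. (72) and (74) (the scan grid is our choice) and locates its maximum:

```python
import math

p1, Vth = 0.437, 1.1

def F(u, sigma):
    """Gaussian cumulative distribution function, Eq. (74)."""
    return 0.5 + 0.5 * math.erf(u / (math.sqrt(2.0) * sigma))

def C(sigma):
    """Analytic cross-covariance C_IIb for the threshold rule of Eq. (71)."""
    a = 1.0 - F(Vth - 1.0, sigma)      # probability a white pixel crosses Vth
    q1 = p1 * a + (1.0 - p1) * (1.0 - F(Vth, sigma))
    return (p1 * a - p1 * q1) / math.sqrt((p1 - p1 ** 2) * (q1 - q1 ** 2))

# Scan the noise RMS amplitude and locate the resonance
sigmas = [0.02 * k for k in range(1, 101)]        # 0.02 ... 2.0
best = max(sigmas, key=C)
```

The maximum is found near σ ≈ 0.4, in agreement with Figure 42b.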
FIGURE 43 (a)–(d) Threshold-filtered images obtained with the rule in Eq. (71), a threshold Vth = 1.1, and white Gaussian noise with respective RMS amplitudes σ = 0.1, σ = 0.4, σ = 0.8, and σ = 1.4.
ACKNOWLEDGMENTS S. Morfu thanks J.M. Bilbault and O. Michel for giving him the opportunity to evolve in this fascinating scientific world and extends his appreciation to P.O. Amblard, F. Chapeau-Blondeau, and D. Rousseau who have brought specific references to his attention. He is also grateful to his colleague Julien Dubois for useful advice. P. Marquié dedicates this chapter to Arnaud and Julie and thanks J.M. Bilbault and M. Remoissenet. The authors take this opportunity to highlight the technical assistance of M. Rossé for the design of the sinusoidal nonlinear resistor. Finally, S. Morfu warmly dedicates this chapter to Giovanni and Grazia Morfu.
APPENDIX A Response of a Cell of an Overdamped Network

In the uncoupled case, and for α = 1/2, a particle of displacement W follows

dW/dt = −W(W − 1/2)(W − 1).   (A1)
Separating the variables in Eq. (A1) yields

2dW/W − 4dW/(W − 1/2) + 2dW/(W − 1) = −dt,   (A2)
which can be integrated to obtain

W(W − 1)/(W − 1/2)² = K e^(−t/2),   (A3)
where K is an integration constant. Equation (A3) can be arranged as a second-order equation in W:

W²(1 − K e^(−t/2)) − W(1 − K e^(−t/2)) − (1/4) K e^(−t/2) = 0.   (A4)
Provided that the discriminant is positive, the solutions are given by

W(t) = 1/2 ± (1/2) · 1/√(1 − K e^(−t/2)).   (A5)
Assuming that initially the position of the particle is W(t = 0) = W0, the integration constant K can be expressed in the form

K = W0(W0 − 1)/(W0 − 1/2)².   (A6)
Inserting the constant of Eq. (A6) into the solution of Eq. (A5) leads to the following expression of the displacement:

W(t) = (1/2) [1 ± |W0 − 1/2| / √((W0 − 1/2)² − W0(W0 − 1) e^(−t/2))].   (A7)
Assuming that when t → +∞ the particle evolves to the steady states W = 0 for W0 < 1/2 and W = 1 for W0 > 1/2, we finally obtain the displacement of the particle with initial position W0 as

W(t) = (1/2) [1 + (W0 − 1/2) / √((W0 − 1/2)² − W0(W0 − 1) e^(−t/2))].   (A8)
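The closed-form solution (A8) can be checked against a direct numerical integration of Eq. (A1); the Runge-Kutta scheme and tolerances below are our own choices:

```python
import math

def f(W):
    """Bistable force of Eq. (A1) with alpha = 1/2."""
    return -W * (W - 0.5) * (W - 1.0)

def W_exact(t, W0):
    """Closed-form displacement of Eq. (A8)."""
    d = (W0 - 0.5) ** 2 - W0 * (W0 - 1.0) * math.exp(-0.5 * t)
    return 0.5 * (1.0 + (W0 - 0.5) / math.sqrt(d))

def W_rk4(t, W0, dt=1e-3):
    """Fourth-order Runge-Kutta integration of Eq. (A1) for comparison."""
    W = W0
    for _ in range(int(round(t / dt))):
        k1 = f(W)
        k2 = f(W + 0.5 * dt * k1)
        k3 = f(W + 0.5 * dt * k2)
        k4 = f(W + dt * k3)
        W += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return W

err = max(abs(W_exact(t, W0) - W_rk4(t, W0))
          for t in (1.0, 5.0, 20.0) for W0 in (0.1, 0.4, 0.6, 0.9))
```

Note that the radicand d stays positive for W0 in (0, 1), since W0(W0 − 1) is then negative, and that W(t) tends to 0 or 1 according to the sign of W0 − 1/2.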
APPENDIX B Recall of Jacobian Elliptic Functions

We recall here the properties of the Jacobian elliptic functions used in Section II.B. These three basic functions—cn(u, k), sn(u, k), and dn(u, k)—play an important role in nonlinear evolution equations and arise from the inversion of the elliptic integral of the first kind (Abramowitz and Stegun, 1970):
u(ψ, k) = ∫₀^ψ dz/√(1 − k sin² z),   (B1)
where k ∈ [0; 1] is the elliptic modulus. The Jacobian elliptic functions are defined by

sn(u, k) = sin(ψ),   cn(u, k) = cos(ψ),   dn(u, k) = √(1 − k sin²(ψ)).   (B2)

This definition involves the following properties for the derivatives:
d/du sn(u, k) = cn(u, k) dn(u, k),
d/du cn(u, k) = −sn(u, k) dn(u, k),
d/du dn(u, k) = −k sn(u, k) cn(u, k).   (B3)
Considering the circular function properties, we also have
sn²(u, k) + cn²(u, k) = 1.   (B4)
Moreover, using the result in Eq. (B2), we obtain the following identity:
dn²(u, k) + k sn²(u, k) = 1.   (B5)
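These definitions can be verified numerically by integrating the derivative system (B3) from the initial values sn(0) = 0, cn(0) = dn(0) = 1 and checking the identities (B4) and (B5) along the solution. The integration scheme below is our own sketch; note that the book writes the modulus as k where many references use m = k²:

```python
import math

def jacobi(u, k, dt=1e-4):
    """Integrate the derivative system of Eq. (B3) from sn(0) = 0,
    cn(0) = dn(0) = 1 with a fourth-order Runge-Kutta scheme."""
    def deriv(s, c, d):
        return c * d, -s * d, -k * s * c
    s, c, d = 0.0, 1.0, 1.0
    for _ in range(int(round(u / dt))):
        k1 = deriv(s, c, d)
        k2 = deriv(s + 0.5*dt*k1[0], c + 0.5*dt*k1[1], d + 0.5*dt*k1[2])
        k3 = deriv(s + 0.5*dt*k2[0], c + 0.5*dt*k2[1], d + 0.5*dt*k2[2])
        k4 = deriv(s + dt*k3[0], c + dt*k3[1], d + dt*k3[2])
        s += dt * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0]) / 6.0
        c += dt * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1]) / 6.0
        d += dt * (k1[2] + 2*k2[2] + 2*k3[2] + k4[2]) / 6.0
    return s, c, d

sn, cn, dn = jacobi(1.3, 0.5)
sn0, cn0, _ = jacobi(1.3, 0.0)   # k = 0 reduces to the circular functions
```

For k = 0, Eq. (B1) gives u = ψ, so the system collapses to sn(u, 0) = sin u and cn(u, 0) = cos u, which the last line checks.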
APPENDIX C Evolution of an Overdamped Particle Experiencing a Multistable Potential

The equation of motion of an overdamped particle submitted to the sinusoidal force in Eq. (56) can be expressed as

dW/dt = −β(n − 1) sin[2π(n − 1)W],   (C1)
where W represents the particle displacement. The steady states of the system are deduced from the zeros of the nonlinear force. Using the methodology exposed in Section II.A.1, we can establish that the roots of the nonlinear force correspond alternately to unstable and stable steady states. If k is an integer, the unstable and stable states of the system are written, respectively, as follows:
Wthk = (2k + 1)/(2(n − 1))   and   Wk* = k/(n − 1),   k ∈ Z.   (C2)
Separating the variables of Eq. (C1), we obtain

dW/sin[2π(n − 1)W] = −β(n − 1) dt.   (C3)

Using the identity sin(2a) = 2 sin a cos a, Eq. (C3) becomes
dW/(tan[π(n − 1)W] cos²[π(n − 1)W]) = −2β(n − 1) dt.   (C4)
Next, considering the derivative of the tangent function in Eq. (C4) yields

(1/(π(n − 1))) ∫₀ᵗ d tan[π(n − 1)W] / tan[π(n − 1)W] = −∫₀ᵗ 2β(n − 1) dt.   (C5)
A direct integration of Eq. (C5) gives

tan[π(n − 1)W] = tan[π(n − 1)W0] e^(−2πβ(n − 1)²t),   (C6)
where W0 denotes the initial position of the particle. Inverting the tangent function straightforwardly provides the solution of Eq. (C1) in the form

W(t) = (1/(π(n − 1))) arctan[tan(π(n − 1)W0) e^(−2πβ(n − 1)²t)] + k/(n − 1),   (C7)
where k is an integer coming from the tangent inversion. Note that, from a physical point of view, k must ensure that the particle position evolves toward one of the stable states of the system for a sufficiently long time, that is, when t → +∞. Indeed, for an initial condition between two consecutive unstable steady states, the asymptotic behavior of the uncoupled network reduces to the following rule:
if (2k − 1)/(2(n − 1)) < W0 < (2k + 1)/(2(n − 1)),   then   W(t → +∞) = k/(n − 1).   (C8)
This rule can be transformed to yield

if k − 1/2 < (n − 1)W0 < k + 1/2,   then   W(t → +∞) = k/(n − 1).   (C9)
Finally, identifying Eq. (C7) with Eq. (C9) when t → +∞, we deduce that k must be the nearest integer to W0(n − 1).
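The closed-form solution (C7), together with the nearest-integer rule for k, can be verified against a direct integration of Eq. (C1) (parameter values from Figure 38; the integration scheme is our own sketch):

```python
import math

beta, n = 9.82e-2, 5     # nonlinearity parameters of Figure 38

def W_exact(t, W0):
    """Closed-form displacement of Eq. (C7); k is the nearest integer to
    W0*(n-1), as required by the asymptotic rule of Eq. (C9)."""
    k = round(W0 * (n - 1))
    decay = math.exp(-2.0 * math.pi * beta * (n - 1) ** 2 * t)
    return (math.atan(math.tan(math.pi * (n - 1) * W0) * decay)
            / (math.pi * (n - 1)) + k / (n - 1))

def W_num(t, W0, dt=1e-4):
    """Fourth-order Runge-Kutta integration of Eq. (C1) for comparison."""
    g = lambda W: -beta * (n - 1) * math.sin(2 * math.pi * (n - 1) * W)
    W = W0
    for _ in range(int(round(t / dt))):
        k1 = g(W); k2 = g(W + 0.5 * dt * k1)
        k3 = g(W + 0.5 * dt * k2); k4 = g(W + dt * k3)
        W += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return W

err = max(abs(W_exact(t, W0) - W_num(t, W0))
          for t in (0.2, 1.0) for W0 in (0.1, 0.3, 0.55, 0.9))
```

Because arctan returns its principal value, the additive term k/(n − 1) recenters the solution on the correct attractor for every initial condition away from a threshold.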
APPENDIX D Perona and Malik Anisotropic Diffusion Algorithm

We recall here the algorithm introduced by Perona and Malik to implement their anisotropic diffusion method. The anisotropic diffusion equation (69) can be discretized with the time step dt
to obtain

Is^(t+1) = Is^t + (dt/ηs) ∑p∈Nr g(∇Is,p) ∇Is,p.   (D1)
In Eq. (D1), Is^t represents the brightness of the pixel located at position s in a discrete 2D grid, and corresponds to the filtered image after a processing time t. ηs is the number of neighbors of pixel s, that is, 4, except at the image edges, where ηs = 3, and at the image corners, where ηs = 2. The spatial neighborhood of pixel s is denoted Nr. The local gradient ∇Is,p can be estimated by the difference of brightness between the considered pixel s and its neighbor p:
∇Is,p = Ip − Is^t,   p ∈ Nr.   (D2)
Finally, the description of the system is completed by defining the edge-stopping function g(x) as the Lorentzian function

g(x) = 1/(1 + x²/K²),   (D3)
where K is a positive parameter.
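A compact implementation of Eqs. (D1)–(D3) might look as follows (the test image, step count, and boundary handling via the actual neighbor count are our own choices, with the parameters dt = 0.01 and K = 0.09 of Figure 41):

```python
def perona_malik(I, K=0.09, dt=0.01, steps=500):
    """Anisotropic diffusion following Eqs. (D1)-(D3); eta_s counts the
    actual neighbors (4 inside, 3 on edges, 2 at corners)."""
    rows, cols = len(I), len(I[0])
    g = lambda x: 1.0 / (1.0 + (x / K) ** 2)     # Lorentzian stopping, Eq. (D3)
    I = [row[:] for row in I]
    for _ in range(steps):
        new = [row[:] for row in I]
        for i in range(rows):
            for j in range(cols):
                nbrs = [(k, l) for k, l in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                        if 0 <= k < rows and 0 <= l < cols]
                # Local gradients of Eq. (D2), weighted by the stopping function
                flux = sum(g(abs(I[k][l] - I[i][j])) * (I[k][l] - I[i][j])
                           for k, l in nbrs)
                new[i][j] = I[i][j] + dt * flux / len(nbrs)   # Eq. (D1)
        I = new
    return I

# Step edge (0 | 1) with one perturbed pixel inside the dark region
img = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
img[3][1] = 0.05
out = perona_malik(img)
```

On a step edge, g(1) ≈ 0.008 nearly stops the flux across the contour, while small intraregion fluctuations (gradients of order K) are smoothed away, which is the behavior shown in Figure 41.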
REFERENCES

Abramowitz, M., and Stegun, I. A. (1970). Handbook of Mathematical Functions. Dover, New York, p. 569.
Acosta-Serafini, P., Masaki, I., and Sodini, C. (2004). A linear wide dynamic range CMOS image sensor implementing a predictive multiple sampling algorithm with overlapping integration intervals. IEEE J. Solid-State Circuits 39, 1487–1496.
Adamatzky, A., and de Lacy Costello, B. (2003). On some limitations of reaction-diffusion chemical computers in relation to Voronoi diagram and its inversion. Phys. Lett. A 309, 397–406.
Adamatzky, A., de Lacy Costello, B., Melhuish, C., and Ratcliffe, N. (2004). Experimental implementation of mobile robot taxis with onboard Belousov-Zhabotinsky chemical medium. Mater. Sci. Eng. C 24, 541–548.
Adamatzky, A., de Lacy Costello, B., and Ratcliffe, N. (2002). Experimental reaction-diffusion preprocessor for shape recognition. Phys. Lett. A 297, 344–352.
Agladze, K., Magone, N., Aliev, R., Yamaguchi, T., and Yoshikawa, K. (1997). Finding the optimal path with the aid of chemical wave. Physica D 106, 247–254.
Agrawal, G. P. (2002). Fiber-Optic Communication Systems, 3rd ed. Wiley Inter-Science, New York.
Arena, P., Basile, A., Bucolo, M., and Fortuna, L. (2003). An object oriented segmentation on analog CNN chip. IEEE Trans. Circ. Syst. I 50, 837–846.
Bálya, D., Roska, B., Roska, T., and Werblin, F. S. (2002). A CNN framework for modeling parallel processing in a mammalian retina. Int. J. Circ. Theor. Appl. 30, 363–393.
Barbay, S., Giacomelli, G., and Marin, F. (2001). Noise-assisted transmission of binary information: theory and experiment. Phys. Rev. E 63, 051110/1–9.
Báscones, R., García-Ojalvo, J., and Sancho, J. M. (2002). Pulse propagation sustained by noise in arrays of bistable electronic circuits. Phys. Rev. E 65, 061108/1–5.
Beeler, G. W., and Reuter, H. (1977). Reconstruction of the action potentials of ventricular myocardial fibers. J. Physiol. 268, 177–210.
Benzi, R., Parisi, G., Sutera, A., and Vulpiani, A. (1982). Stochastic resonance in climatic change. Tellus 34, 10–16.
Binczak, S., Comte, J. C., Michaux, B., Marquié, P., and Bilbault, J. M. (1998). Experiments on a nonlinear electrical reaction-diffusion line. Electron. Lett. 34, 1061–1062.
Black, M. J., Sapiro, G., Marimont, D. H., and Heeger, D. (1998). Robust anisotropic diffusion. IEEE Trans. Image Processing 7, 421–432.
Blanchard, S., Rousseau, D., Gindre, D., and Chapeau-Blondeau, F. (2007). Constructive action of the speckle noise in a coherent imaging system. Optics Lett. 32, 1983–1985.
Bressloff, P. C., and Rowlands, G. (1997). Exact travelling wave solutions of an “integrable” discrete reaction-diffusion equation. Physica D 106, 255–269.
Caponetto, R., Fortuna, L., Occhipinti, L., and Xibilia, M. G. (2003). Sc-CNNs for chaotic signal applications in secure communication systems. Int. J. Bifurcat. Chaos 13, 461–468.
Carmona Galan, R., Jimenez-Garrido, F., Dominguez-Castro, R., Espejo, S., Roska, T., Rekeczky, C., Petras, I., and Rodriguez-Vazquez, A. (2003). A bio-inspired two-layer mixed-signal flexible programmable chip for early vision. IEEE Trans. Neural Networ. 14, 1313–1336.
Chapeau-Blondeau, F. (1999). Noise-assisted propagation over a nonlinear line of threshold elements. Electr. Lett. 35, 1055–1056.
Chapeau-Blondeau, F. (2000).
Stochastic resonance and the benefit of noise in nonlinear systems. Lect. Notes Phys. 550, 137–155.
Chen, H-C., Hung, Y-C., Chen, C-K., Liao, T-L., and Chen, C-K. (2006). Image processing algorithms realized by discrete-time cellular neural networks and their circuit implementation. Chaos Soliton Fract. 29, 1100–1108.
Chua, L. O., and Yang, L. (1988a). Cellular neural networks: Theory. IEEE Trans. Circ. Syst. 35, 1257–1272.
Chua, L. O., and Yang, L. (1988b). Cellular neural networks: Applications. IEEE Trans. Circ. Syst. 35, 1273–1290.
Chua, L. O. (1998). CNN: A Paradigm for Complexity (World Scientific Series on Nonlinear Science, Series A, vol. 31). World Scientific Publishing, Singapore.
Comte, J. C. (1996). Etude d'une ligne non linéaire de type Nagumo-Neumann. DEA Laboratory report LIESIB, Dijon, France.
Comte, J. C., and Marquié, P. (2002). Generation of nonlinear current-voltage characteristics. A general method. Int. J. Bifurc. Chaos 12, 447–449.
Comte, J. C., Marquié, P., and Bilbault, J. M. (2001). Contour detection based on nonlinear discrete diffusion in a cellular nonlinear network. Int. J. Bifurc. Chaos 11, 179–183.
Comte, J. C., Marquié, P., Bilbault, J. M., and Binczak, S. (1998). Noise removal using a two-dimensional diffusion network. Ann. Telecom. 53, 483–487.
Comte, J. C., Marquié, P., and Remoissenet, M. (1999). Dissipative lattice model with exact travelling discrete kink-soliton solutions: Discrete breather generation and reaction diffusion regime. Phys. Rev. E 60, 7484–7489.
Comte, J. C., and Morfu, S. (2003). Stochastic resonance: Another way to retrieve subthreshold digital data. Phys. Lett. A 309, 39–43.
Comte, J. C., Morfu, S., and Marquié, P. (2001). Propagation failure in discrete bistable reaction-diffusion systems: Theory and experiments. Phys. Rev. E 64, 027102/1–4.
148
Saverio Morfu et al.
Cuomo, K. M., and Oppenheim, A. V. (1993). Circuit implementation of synchronized chaos with applications to communications. Phys. Rev. Lett. 71, 65–68.
Czuni, L., and Sziranyi, T. (2001). Motion segmentation and tracking with edge relaxation and optimization using fully parallel methods in the cellular nonlinear network architecture. Real-Time Imaging 7, 77–95.
Dedieu, H., Kennedy, M. P., and Hasler, M. (1993). Chaos shift keying: Modulation and demodulation of a chaotic carrier using self-synchronizing Chua's circuits. IEEE Trans. Circ. Syst. Part II 40, 634–642.
Ditzinger, T., Stadler, M., Strüber, D., and Kelso, J. A. S. (2000). Noise improves three-dimensional perception: Stochastic resonance and other impacts of noise to the perception of autostereograms. Phys. Rev. E 62, 2566–2575.
Duan, F., and Abbott, D. (2005). Signal detection for frequency-shift keying via short-time stochastic resonance. Phys. Lett. A 344, 401–410.
Dudek, P. (2006). Adaptive sensing and image processing with a general-purpose pixel-parallel sensor/processor array integrated circuit. International Workshop on Computer Architecture for Machine Perception and Sensing, Montreal, 2006, pp. 1–6.
El-Gamal, A., Yang, D., and Fowler, B. (1999). Pixel level processing—Why, what and how? Proc. SPIE Electr. Imaging '99 Conference 3650, 2–13.
Erneux, T., and Nicolis, G. (1993). Propagating waves in discrete bistable reaction-diffusion systems. Physica D 67, 237–244.
Espejo, S., Rodriguez-Vázquez, A., Carmona, R., and Dominguez-Castro, R. (1996). A 0.8-μm CMOS programmable analog-array-processing vision-chip with local logic and image-memory. European Solid-State Devices and Reliability Conference, Neuchatel, 1996, pp. 276–279.
Espejo, S., Dominguez-Castro, R., Linan, G., and Rodriguez-Vázquez, A. (1998). A 64 × 64 CNN universal chip with analog and digital I/O. Proc. 5th IEEE Int. Conf. Electronics, Circuits and Systems, 1998, pp. 203–206.
Fife, P. C. (1979).
Mathematical Aspects of Reacting and Diffusing Systems (Lecture Notes in Biomathematics, vol. 28). Springer-Verlag, New York.
Fisher, R. A. (1937). The wave of advance of advantageous genes. Ann. Eugen. 7, 355–369.
Fossum, E. (1993). Active pixel sensors: Are CCDs dinosaurs? Int. Soc. Opt. Eng. (SPIE) 1900, 2–14.
Fossum, E. (1997). CMOS image sensor: Electronic camera on a chip. IEEE Trans. Electr. Dev. 44, 1689–1698.
Gammaitoni, L., Hänggi, P., Jung, P., and Marchesoni, F. (1998). Stochastic resonance. Rev. Mod. Phys. 70, 223–282.
Gonzalez, R. C., and Wintz, P. (1997). Digital Image Processing, 2nd ed. Addison-Wesley.
Henry, D. (1981). Geometric Theory of Semilinear Parabolic Equations. Springer-Verlag, New York.
Hirota, R., and Suzuki, K. (1970). Studies on lattice solitons by using electrical networks. J. Phys. Soc. Jpn. 28, 1366–1367.
Histace, A., and Rousseau, D. (2006). Constructive action of noise for impulsive noise removal in scalar images. Electr. Lett. 42, 393–395.
Holden, A. V., Tucker, J. V., and Thompson, B. C. (1991). Can excitable media be considered as computational systems? Physica D 49, 240–246.
Hongler, M., De Meneses, Y., Beyeler, A., and Jacquot, J. (2003). The resonant retina: Exploiting vibrational noise to optimally detect edges in an image. IEEE Trans. Patt. Anal. Machine Intell. 25, 1051–1062.
Izhikevich, E. M. (2007). Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting. MIT Press, Cambridge, Massachusetts.
Jäger, D. (1985). Characteristics of travelling waves along the nonlinear transmission line for monolithic integrated circuits: A review. Int. J. Electron. 58, 649–669.
Nonlinear Systems for Image Processing (Saverio Morfu et al.)
Julián, P., and Dogaru, R. (2002). A piecewise-linear simplicial coupling cell for CNN gray-level image processing. IEEE Trans. Circ. Syst. I 49, 904–913.
Keener, J. P. (1987). Propagation and its failure in coupled systems of discrete excitable cells. SIAM J. Appl. Math. 47, 556–572.
Keener, J. P. (2000). Homogenization and propagation in the bistable equation. Physica D 136, 1–17.
Kladko, K. (2000). Universal scaling of wave propagation failure in arrays of coupled nonlinear cells. Phys. Rev. Lett. 84, 4505–4508.
Kozlowski, L., Rossi, G., Blanquart, L., Marchesini, R., Huang, Y., Chow, G., Richardson, J., and Standley, D. (2005). Pixel noise suppression via SoC management of target reset in a 1920 × 1080 CMOS image sensor. IEEE J. Solid-State Circ. 40, 2766–2776.
Kuhnert, L. (1986). A new optical photochemical device in a light-sensitive chemical active medium. Nature 319, 393–394.
Kuhnert, L., Agladze, K. I., and Krinsky, V. I. (1989). Image processing using light-sensitive chemical waves. Nature 337, 244–247.
Kuusela, T. (1995). Soliton experiments in transmission lines. Chaos Solitons Fract. 5, 2419–2462.
Kwok, H. S., and Tang, W. K. S. (2007). A fast image encryption system based on chaotic maps with finite precision representation. Chaos Solitons Fract. 32, 1518–1529.
Lindner, J. F., Chandramouli, S., Bulsara, A. R., Löcher, M., and Ditto, W. L. (1998). Noise enhanced propagation. Phys. Rev. Lett. 81, 5048–5051.
Litwiller, D. (2001). CCD vs. CMOS: Facts and fiction. Photon. Spectra (Special Issue), 154–158.
Loinaz, M., Singh, K., Blanksby, A., Inglis, D., Azadet, K., and Ackland, B. (1998). A 200-mW, 3.3-V CMOS color camera IC producing 352 × 288 24-b video at 30 frames/s. IEEE J. Solid-State Circ. 33, 2092–2103.
Longtin, A. (1993). Stochastic resonance in neuron models. J. Stat. Phys. 70, 309–327.
Marquié, P., Bilbault, J. M., and Remoissenet, M. (1995). Observation of nonlinear localized modes in an electrical lattice. Phys. Rev. E 51, 6127–6133.
Marquié, P., Binczak, S., Comte, J. C., Michaux, B., and Bilbault, J. M. (1998). Diffusion effects in a nonlinear electrical lattice. Phys. Rev. E 57, 6075–6078.
Morfu, S. (2002c). Etude des défauts et perturbations dans les réseaux électroniques dissipatifs non linéaires: Applications à la transmission et au traitement du signal. Ph.D. thesis, Laboratory LE2I, Dijon, France.
Morfu, S. (2003). Propagation failure reduction in a Nagumo chain. Phys. Lett. A 317, 73–79.
Morfu, S. (2005). Image processing with a cellular nonlinear network. Phys. Lett. A 343, 281–292.
Morfu, S., and Comte, J. C. (2004). A nonlinear oscillators network devoted to image processing. Int. J. Bifurc. Chaos 14, 1385–1394.
Morfu, S., Bossu, J., and Marquié, P. (2007). Experiments on an electrical nonlinear oscillators network. Int. J. Bifurcat. Chaos 17, 3535–3538.
Morfu, S., Bossu, J., Marquié, P., and Bilbault, J. M. (2006). Contrast enhancement with a nonlinear oscillators network. Nonlinear Dynamics 44, 173–180.
Morfu, S., Comte, J. C., and Bilbault, J. M. (2003). Digital information receiver based on stochastic resonance. Int. J. Bifurc. Chaos 13, 233–236.
Morfu, S., Comte, J. C., Marquié, P., and Bilbault, J. M. (2002). Propagation failure induced by coupling inhomogeneities in a nonlinear diffusive medium. Phys. Lett. A 294, 304–307.
Morfu, S., Nekorkin, V. B., Bilbault, J. M., and Marquié, P. (2002). The wave front propagation failure in an inhomogeneous discrete Nagumo chain: Theory and experiments. Phys. Rev. E 66, 046127/1–8.
Morfu, S., Nofiélé, B., and Marquié, P. (2007). On the use of multistability for image processing. Phys. Lett. A 367, 192–198.
Moss, F., Ward, L. M., and Sannita, W. G. (2004). Stochastic resonance and sensory information processing: A tutorial and review of application. Clin. Neurophysiol. 115, 267–281.
Murray, J. D. (1989). Mathematical Biology. Springer-Verlag, Berlin.
Nagashima, H., and Amagishi, Y. (1978). Experiments on the Toda lattice using nonlinear transmission line. J. Phys. Soc. Jpn. 45, 680–688.
Nagumo, J., Arimoto, S., and Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE 50, 2061–2070.
Nozaki, D., Mar, D. J., Grigg, P., and Collins, J. J. (1999). Effects of colored noise on stochastic resonance in sensory neurons. Phys. Rev. Lett. 82, 2402–2405.
Occhipinti, L., Spoto, G., Branciforte, M., and Doddo, F. (2001). Defects detection and characterization by using cellular neural networks. IEEE Int. Symposium on Circuits and Systems ISCAS 3, Sydney, Australia, May 6–9, 2001, pp. 481–484.
Paquerot, J. F., and Remoissenet, M. (1994). Dynamics of nonlinear blood pressure waves in large arteries. Phys. Lett. A 194, 77–82.
Perona, P., and Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Patt. Anal. Machine Intell. 12, 629–639.
Petras, I., Rekeczky, C., Roska, T., Carmona, R., Jimenez-Garrido, F., and Rodriguez-Vazquez, A. (2003). Exploration of spatial-temporal dynamic phenomena in a 32 × 32-cell stored program 2-layer CNN universal machine chip prototype. J. Circ. Syst. Computers 12, 691–710.
Rambidi, N. G., Shamayaev, K. E., and Peshkov, G. Y. (2002). Image processing using light-sensitive chemical waves. Phys. Lett. A 298, 375–382.
Rambidi, N. G., and Yakovenchuk, D. (2001). Chemical reaction-diffusion implementation of finding the shortest paths in a labyrinth. Phys. Rev. E 63, 026607/1–6.
Rekeczky, C., Tahy, A., Vegh, Z., and Roska, T. (1999). CNN-based spatiotemporal nonlinear filtering and endocardial boundary detection in echocardiography. Int. J. Circuit Theory Applicat. 27, 171–207.
Remoissenet, M. (1999). Waves Called Solitons: Concepts and Experiments, 3rd rev. enlarged ed. Springer-Verlag, Berlin, p. 284.
Roska, T., and Rodriguez-Vazquez, A. (2000).
Review of CMOS implementations of the CNN universal machine-type visual microprocessors. ISCAS 2000 IEEE International Symposium on Circuits and Systems, Geneva, 2000, pp. 120–123.
Sakakibara, M., Kawahito, S., Handoko, D., Nakamura, N., Higashi, M., Mabuchi, K., and Sumi, H. (2005). A high-sensitivity CMOS image sensor with gain-adaptive column amplifiers. IEEE J. Solid-State Circ. 40, 1147–1156.
Scott, A. C. (1970). Active and Nonlinear Wave Propagation in Electronics. Wiley Interscience, New York.
Scott, A. (1999). Nonlinear Science: Emergence and Dynamics of Coherent Structures (Oxford Applied and Engineering Mathematics, 8). Oxford University Press, New York.
Seitz, P. (2000). Solid-state image sensing. Handbook of Computer Vision and Applications 1, 165–222.
Serra, J. (1986). Introduction to mathematical morphology. Comput. Vision Graph. 35, 283–305.
Short, K. M., and Parker, A. T. (1998). Unmasking a hyperchaotic communication scheme. Phys. Rev. E 58, 1159–1162.
Simonotto, E., Riani, M., Seife, C., Roberts, M., Twitty, J., and Moss, F. (1997). Visual perception of stochastic resonance. Phys. Rev. Lett. 78, 1186–1189.
Smith, S., Hurwitz, J., Torrie, M., Baxter, D., Holmes, A., Panaghiston, M., Henderson, R., Murray, A., Anderson, S., and Denyer, P. (1998). A single-chip 306 × 244-pixel CMOS NTSC video camera. In "Solid-State Circuits Conference, 1998. Digest of Technical Papers. 45th ISSCC 1998 IEEE International," pp. 170–171, 432, San Francisco.
Stocks, N. G., and Mannella, R. (2001). Generic noise-enhanced coding in neuronal arrays. Phys. Rev. E 64, 030902/1–4.
Taniuti, T., and Wei, C. C. (1968). Reductive perturbation method in nonlinear wave propagation. J. Phys. Soc. Jpn. 24, 941–946.
Taniuti, T., and Yajima, N. (1969). Perturbation method for a nonlinear wave modulation. J. Math. Phys. 10, 1369–1372.
Tetzlaff, R. (ed.) (2002). Cellular Neural Networks and Their Applications. World Scientific Publishing, Singapore.
Teuscher, C., and Adamatzky, A. (eds.) (2005). Proceedings of the 2005 Workshop on Unconventional Computing, From Cellular Automata to Wetware. Luniver Press, Lightning Source.
Toda, M. (1967). Wave propagation in anharmonic lattices. J. Phys. Soc. Jpn. 23, 501–506.
Udaltsov, V. S., Goedgebuer, J. P., Larger, L., Cuenot, J. B., Levy, P., and Rhodes, W. T. (2003). Cracking chaos-based encryption systems ruled by nonlinear time delay differential equations. Phys. Lett. A 308, 54–60.
Vaudelle, F., Gazengel, J., Rivoire, G., Godivier, X., and Chapeau-Blondeau, F. (1998). Stochastic resonance and noise-enhanced transmission of spatial signals in optics: The case of scattering. J. Opt. Soc. Am. B 15, 2674–2680.
Venetianer, P. L., Werblin, F., Roska, T., and Chua, L. O. (1995). Analogic CNN algorithms for some image compression and restoration tasks. IEEE Trans. Circ. Syst. I 42, 278–284.
Yadid-Pecht, O., and Belenky, A. (2003). In-pixel autoexposure CMOS APS. IEEE J. Solid-State Circ. 38, 1425–1428.
Yamgoué, S. B., Morfu, S., and Marquié, P. (2007). Noise effects on gap wave propagation in a nonlinear discrete LC transmission line. Phys. Rev. E 75, 036211/1–7.
Yu, W., and Cao, J. (2006). Cryptography based on delayed chaotic neural networks. Phys. Lett. A 356, 333–338.
Zakharov, V. E., and Wabnitz, S. (1998). Optical Solitons: Theoretical Challenges and Industrial Perspectives. Springer-Verlag, Berlin.
Zozor, S., and Amblard, P. O. (1999). Stochastic resonance in discrete time nonlinear AR(1) models. IEEE Trans. Signal Proc. 49, 109–120.
Zozor, S., and Amblard, P. O. (2003). Stochastic resonance in locally optimal detectors. IEEE Trans. Signal Proc. 51, 3177–3181.
Zozor, S., and Amblard, P. O. (2005). Noise-aided processing: Revisiting dithering in a Sigma-Delta quantizer. IEEE Trans. Signal Proc. 53, 3202–3210.
CHAPTER 4

Complex-Valued Neural Network and Complex-Valued Backpropagation Learning Algorithm

Tohru Nitta*
Contents

I Introduction 154
II The Complex-Valued Neural Network 155
  A The Complex-Valued Neuron 155
  B Multilayered Complex-Valued Neural Network 162
III Complex-Valued Backpropagation Learning Algorithm 162
  A Complex-Valued Adaptive Pattern Classifier 162
  B Learning Convergence Theorem 163
  C Learning Rule 164
IV Learning Speed 169
  A Experiments 169
  B Factors to Improve Learning Speed 174
  C Discussion 175
V Generalization Ability 175
VI Transforming Geometric Figures 181
  A Examples 182
  B Systematic Evaluation 191
  C Mathematical Analysis 194
  D Discussion 203
VII Orthogonality of Decision Boundaries in the Complex-Valued Neuron 209
  A Mathematical Analysis 210
  B Utility of the Orthogonal Decision Boundaries 211
VIII Conclusions 217
References 218
* National Institute of Advanced Industrial Science and Technology, Tsukuba Central 2, 1-1-1 Umezono, Tsukuba, Ibaraki, 305-8568 Japan

Advances in Imaging and Electron Physics, Volume 152, ISSN 1076-5670, DOI: 10.1016/S1076-5670(08)00604-6. Copyright © 2008 Elsevier Inc. All rights reserved.
I. INTRODUCTION

In recent years there has been much interest in complex-valued neural networks, whose parameters (weights and threshold values) are all complex numbers, and in their applications (Hirose, 2003; ICANN, 2007; ICANN/ICONIP, 2003; ICONIP, 2002; IJCNN, 2006; KES, 2001, 2002, 2003). The applications of complex-valued neural networks cover various fields, including telecommunications, image processing, computer vision, and independent component analysis. Many types of information can be represented naturally in terms of angular or directional variables that can be represented by complex numbers. For example, in computer vision, optical flow fields are represented as fields of oriented line segments, each of which can be described by a magnitude and direction, or a complex number. Thus, it is natural to choose complex-valued neural networks to process information expressed by complex numbers. One of the most popular neural network models is the multilayer neural network and the related backpropagation training algorithm (called the Real-BP here in the sense of treating real-valued signals) (Rumelhart et al., 1986a,b). The Real-BP is an adaptive procedure widely used in training a multilayer perceptron for a number of classification applications in areas such as speech and image recognition. The Complex-BP algorithm is a complex-valued version of the Real-BP, which was proposed by several researchers independently in the early 1990s (Benvenuto and Piazza, 1992; Georgiou and Koutsougeras, 1992; Kim and Guest, 1990; Nitta, 1993, 1997; Nitta and Furuya, 1991). The Complex-BP algorithm can be applied to multilayered neural networks whose weights, threshold values, and input and output signals are all complex numbers. This algorithm enables the network to learn complex-valued patterns naturally and has several inherent properties.
This chapter elucidates the Complex-BP proposed in Nitta and Furuya (1991) and Nitta (1993, 1997) and its properties (Nitta, 1997, 2000, 2003, 2004). The primary contents are as follows. (1) The multilayered complex-valued neural network model and the derivation of the related Complex-BP algorithm are described. The learning convergence theorem for the Complex-BP can be obtained by extending the theory of adaptive pattern classifiers (Amari, 1967) to complex numbers. (2) The average convergence speed of the Complex-BP is superior to that of the Real-BP, whereas the generalization performance remains unchanged. In addition, the required number of learnable parameters is only about half of that of the Real-BP, where a complex-valued parameter $z = x + iy$ (where $i = \sqrt{-1}$) is counted as two because it consists of a real part $x$ and an imaginary part $y$. (3) The Complex-BP can transform geometric figures (e.g., rotation, similarity transformation, and parallel displacement of straight lines, circles), whereas the Real-BP cannot. Numerical experiments suggest that the
behavior of a Complex-BP network that has learned the transformation of geometric figures is related to the identity theorem in complex analysis. Mathematical analysis indicates that a Complex-BP network that has learned a rotation, a similarity transformation, or a parallel displacement has the ability to generalize the transformation with an error represented by the sine. (4) Weight parameters of a complex-valued neuron have a restriction related to two-dimensional (2D) motion, and learning proceeds under this restriction. (5) The decision boundary of a complex-valued neuron consists of two hypersurfaces that intersect orthogonally and divide a decision region into four equal sections. The Exclusive OR (XOR) problem and the detection of symmetry problem, neither of which can be solved with a single real-valued neuron, can both be solved by a single complex-valued neuron with the orthogonal decision boundaries, revealing the potent computational power of complex-valued neurons. Furthermore, the fading equalization problem can be successfully solved by a single complex-valued neuron with the highest generalization ability. This chapter is organized as follows. Section II describes the multilayered complex-valued neural network model. Section III presents the derivation of the Complex-BP algorithm. Section IV experimentally investigates the learning speed. Section V deals with the generalization ability of the Complex-BP network. Section VI shows how the Complex-BP algorithm can be applied to the transformation of geometric figures and analyzes the behavior of the Complex-BP network mathematically. Section VII is devoted to a theoretical analysis of decision boundaries of the complex-valued neuron model and the constructive proof that some problems can be solved with a single complex-valued neuron with the orthogonal decision boundaries. Section VIII follows with our conclusions.
II. THE COMPLEX-VALUED NEURAL NETWORK

This section describes the complex-valued neural network proposed by Nitta and Furuya (1991) and Nitta (1993, 1997).
A. The Complex-Valued Neuron

1. The Model

The complex-valued neuron is defined as follows (Figure 1). The input signals, weights, thresholds, and output signals are all complex numbers. The net input $Y_n$ to a complex-valued neuron $n$ is defined as
$$Y_n = \sum_{m} W_{nm} X_m + V_n, \qquad (1)$$
FIGURE 1 A model neuron used in the Complex-BP algorithm. Xm, Yn, Vn, Wnm, z, and fC (z) are all complex numbers. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
where $W_{nm}$ is the complex-valued weight connecting complex-valued neurons $n$ and $m$, $X_m$ is the complex-valued input signal from complex-valued neuron $m$, and $V_n$ is the complex-valued threshold value of neuron $n$. To obtain the complex-valued output signal, convert the net input $Y_n$ into its real and imaginary parts as follows: $Y_n = x + iy = z$, where $i$ denotes $\sqrt{-1}$. The complex-valued output signal is defined as
$$f_C(z) = f_R(x) + i f_R(y), \qquad (2)$$
where $f_R(u) = 1/(1 + \exp(-u))$, $u \in \mathbf{R}$ ($\mathbf{R}$ denotes the set of real numbers); that is, the real and imaginary parts of an output of a neuron are the sigmoid functions of the real part $x$ and imaginary part $y$ of the net input $z$ to the neuron, respectively. It is obvious that $0 < \mathrm{Re}[f_C], \mathrm{Im}[f_C] < 1$ and $|f_C(z)| < \sqrt{2}$. Note that the activation function $f_C$ is not a regular complex-valued function because the Cauchy–Riemann equations do not hold: $\partial f_C(z)/\partial x + i\,\partial f_C(z)/\partial y = (1 - f_R(x))f_R(x) + i(1 - f_R(y))f_R(y) \neq 0$, where $z = x + iy$.
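To make the model concrete, the neuron of Eqs. (1) and (2) can be sketched in a few lines of Python. The function and variable names, and the sample values, are illustrative assumptions, not part of the original formulation:

```python
import math

def fR(u):
    """Real sigmoid fR(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def fC(z):
    """Split-sigmoid activation of Eq. (2): fC(z) = fR(Re z) + i fR(Im z)."""
    return complex(fR(z.real), fR(z.imag))

def complex_neuron(weights, inputs, threshold):
    """Output of one complex-valued neuron, Eqs. (1)-(2)."""
    y = sum(w * x for w, x in zip(weights, inputs)) + threshold  # net input Yn
    return fC(y)

# Illustrative values:
out = complex_neuron([1 + 2j, -0.5j], [0.3 - 0.1j, 1 + 1j], 0.2 + 0.1j)
assert 0 < out.real < 1 and 0 < out.imag < 1   # 0 < Re[fC], Im[fC] < 1
assert abs(out) < math.sqrt(2)                 # hence |fC(z)| < sqrt(2)
```

The assertions check the bounds stated above: both components of the output lie in (0, 1), so the modulus is always below √2.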
2. The Activation Function

In the case of complex-valued neural networks, careful attention should be paid to the choice of activation functions. In the case of real-valued neural networks, an activation function of real-valued neurons is usually chosen to be a smooth (continuously differentiable) and bounded function such as a sigmoidal function. As some researchers have noted (Georgiou and Koutsougeras, 1992; Kim and Adali, 2003; Kuroe et al., 2002; Nitta, 1997), in the complex region we should recall Liouville's theorem (e.g., Derrick, 1984), which states that if a function $G$ is regular at all $z \in \mathbf{C}$ and bounded, then $G$ is a constant function, where $\mathbf{C}$ denotes the set of complex numbers. That is, we need to choose either the regularity or the boundedness for an activation function of complex-valued neurons. In the
literature (Nitta and Furuya, 1991; Nitta, 1993, 1997), Eq. (2) is adopted as an activation function of complex-valued neurons, which is bounded but nonregular (that is, the boundedness is chosen). A first attempt to extend the real-valued neural network to complex numbers in 1990 was formulated as a complex-valued neuron with a regular complex function
$$f_C(z) = \frac{1}{1 + \exp(-z)}, \qquad (3)$$
where $z = x + iy$, because the regularity of a complex function seemed to be natural and to produce many interesting results. Kim and Guest (1990) independently proposed the complex-valued neuron with this activation function [Eq. (3)]. However, the complex-valued backpropagation algorithm with this regular complex-valued activation function never converged in our experiments. We considered that the cause was the nonboundedness of the complex function [Eq. (3)] and decided to adopt bounded functions. Actually, Eq. (3) has periodic poles on the imaginary axis. The complex function defined in Eq. (2) is valid as an activation function of complex-valued neurons for several reasons. First, although Eq. (2) is not regular (i.e., the Cauchy–Riemann equations do not hold), there is a strong relationship between its real and imaginary parts. Consider a complex-valued neuron with n-inputs, weights $w_k = w_k^r + i w_k^i \in \mathbf{C}$ ($1 \le k \le n$), and a threshold value $\theta = \theta^r + i\theta^i \in \mathbf{C}$. Then, for n input signals $x_k + i y_k \in \mathbf{C}$ ($1 \le k \le n$), the complex-valued neuron generates
$$X + iY = f_C\left( \sum_{k=1}^{n} (w_k^r + i w_k^i)(x_k + i y_k) + (\theta^r + i\theta^i) \right)$$

$$= f_R\left( \sum_{k=1}^{n} (w_k^r x_k - w_k^i y_k) + \theta^r \right) + i f_R\left( \sum_{k=1}^{n} (w_k^i x_k + w_k^r y_k) + \theta^i \right) \qquad (4)$$
as an output. Both the real and imaginary parts of the right-hand side of Eq. (4) contain the common variables $\{x_k, y_k, w_k^r, w_k^i\}_{k=1}^{n}$, and influence each other via those $4n$ variables. Moreover, the activation function [Eq. (2)] can process complex-valued signals properly through its amplitude-phase relationship. As described in Nitta (1997), the 1-n-1 Complex-BP network with the activation function Eq. (2) can transform
geometric figures (e.g., rotation, similarity transformation, and parallel displacement of straight lines, circles). This cannot be done without the ability to process signals properly through the amplitude-phase relationship in the activation function. Section VI is devoted to this topic. Second, it has been proved that the complex-valued neural network with the activation function Eq. (2) can approximate any continuous complex-valued function, whereas the one with a regular activation function (for example, $f_C(z) = 1/(1 + \exp(-z))$, proposed by Kim and Guest, 1990, and $f_C(z) = \tanh(z)$ by Kim and Adali, 2003) cannot approximate any nonregular complex-valued function (Arena et al., 1993, 1998). That is, the complex-valued neural network with the activation function Eq. (2) is a universal approximator, and the one with a regular activation function is not a universal approximator. It should be noted that the complex-valued neural network with a regular complex-valued activation function such as $f_C(z) = \tanh(z)$, which has poles, can be a universal approximator on compact subsets of the deleted neighborhood of the poles (Kim and Adali, 2003). This fact is very important theoretically; unfortunately, the complex-valued neural network used in that analysis is not usual; that is, the output of the hidden neuron is defined as the product of several activation functions. Thus, the statement seems to be insufficient compared with the case of a nonregular complex-valued activation function. Therefore, the ability of complex-valued neural networks to approximate complex-valued functions relies heavily on the regularity of the activation functions used. Third, the stability of the learning of the complex-valued neural network with the activation function Eq. (2) has been confirmed through some computer simulations (Arena et al., 1993; Nitta, 1993, 1997; Nitta and Furuya, 1991). Finally, the activation function Eq.
(2) clearly satisfies the following five properties that an activation function $H(z) = u(x, y) + iv(x, y)$, $z = x + iy$, should possess, as Georgiou and Koutsougeras (1992) pointed out:

1. $H$ is nonlinear in $x$ and $y$.
2. $H$ is bounded.
3. The partial derivatives $u_x$, $u_y$, $v_x$, and $v_y$ exist and are bounded.
4. $H(z)$ is not entire. That is, there exists some $z_0 \in \mathbf{C}$ such that $H(z_0)$ is not regular.
5. $u_x v_y$ is not identically equal to $v_x u_y$.
The activation function Eq. (2) is valid for the above reasons.
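The boundedness argument can also be checked numerically: the regular activation of Eq. (3) grows without bound near its poles on the imaginary axis (where exp(−z) = −1, i.e., z = i(2k + 1)π), while the split sigmoid of Eq. (2) stays bounded everywhere. A small sketch, with illustrative function names:

```python
import cmath
import math

def fC_regular(z):
    """Regular activation of Eq. (3); has poles where exp(-z) = -1."""
    return 1.0 / (1.0 + cmath.exp(-z))

def fC_split(z):
    """Bounded, nonregular activation of Eq. (2)."""
    s = lambda u: 1.0 / (1.0 + math.exp(-u))
    return complex(s(z.real), s(z.imag))

# Approach the pole at z = i*pi along the imaginary axis:
for eps in (1e-1, 1e-3, 1e-6):
    z = complex(0.0, math.pi - eps)
    # |fC_regular| grows without bound as eps -> 0;
    # |fC_split| always stays below sqrt(2).
    print(abs(fC_regular(z)), abs(fC_split(z)))
```

This is only a numerical illustration of the choice between regularity and boundedness discussed above, not a proof.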
3. Relationship With the Real-Valued Neuron

The following question arises in encountering the complex-valued neuron: Is a complex-valued neuron simply equivalent to two real-valued neurons? The answer is no.
We first confirm the basic structure of the weights of a real-valued neuron. Consider a real-valued neuron with n-inputs, weights wk ∈ R (1 ≤ k ≤ n), and a threshold value θ ∈ R. Let an output function fR : R → R of the neuron be fR (u) = 1/(1 + exp(−u)). Then, for n input signals xk ∈ R (1 ≤ k ≤ n), the real-valued neuron generates
$$f_R\left( \sum_{k=1}^{n} w_k x_k + \theta \right) \qquad (5)$$
as an output. This may be interpreted as follows: a real-valued neuron moves a point $x_k$ on a real line (one dimension) to another point $w_k x_k$ whose distance from the origin is $w_k$ times as long as that of the point $x_k$ ($1 \le k \le n$); regarding $w_1 x_1, \ldots, w_n x_n$ as vectors, the real-valued neuron adds them, resulting in a one-dimensional (1D) real-valued vector $\sum_{k=1}^{n} w_k x_k$; and finally, it moves the end point of the vector $\sum_{k=1}^{n} w_k x_k$ to another point $\sum_{k=1}^{n} w_k x_k + \theta$ (Figure 2). The output value of the real-valued neuron can be obtained by applying the nonlinear transformation $f_R$ to the value $\sum_{k=1}^{n} w_k x_k + \theta$. Thus, the real-valued neuron basically administers the movement of points on a real line (1D), and its weight parameters $w_1, \ldots, w_n$ are completely independent of one another. Next, we examine the basic structure of the weights of the complex-valued neuron. Consider a complex-valued neuron with n-inputs, weights $w_k = w_k^r + i w_k^i \in \mathbf{C}$ ($1 \le k \le n$), and a threshold value $\theta = \theta^r + i\theta^i \in \mathbf{C}$. Then, for n input signals $x_k + i y_k \in \mathbf{C}$ ($1 \le k \le n$), the complex-valued
FIGURE 2 An image of the processing in a real-valued neuron. From Nitta, T. (2000). Figure 1 used by permission from Springer Science and Business Media.
neuron generates

$$X + iY = f_C\left( \sum_{k=1}^{n} (w_k^r + i w_k^i)(x_k + i y_k) + (\theta^r + i\theta^i) \right)$$

$$= f_R\left( \sum_{k=1}^{n} (w_k^r x_k - w_k^i y_k) + \theta^r \right) + i f_R\left( \sum_{k=1}^{n} (w_k^i x_k + w_k^r y_k) + \theta^i \right) \qquad (6)$$

as an output. Hence, a single complex-valued neuron with n-inputs is equivalent to two real-valued neurons with 2n-inputs (Figure 3). We shall refer to a real-valued neuron corresponding to the real part $X$ of an output of a complex-valued neuron as a real-part neuron and a real-valued neuron corresponding to the imaginary part $Y$ as an imaginary-part neuron. Note here that
$$\begin{bmatrix} X \\ Y \end{bmatrix} = F\left( \begin{bmatrix} w_1^r & -w_1^i & \cdots & w_n^r & -w_n^i \\ w_1^i & w_1^r & \cdots & w_n^i & w_n^r \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ \vdots \\ x_n \\ y_n \end{bmatrix} + \begin{bmatrix} \theta^r \\ \theta^i \end{bmatrix} \right)$$

$$= F\left( |w_1| \begin{bmatrix} \cos\alpha_1 & -\sin\alpha_1 \\ \sin\alpha_1 & \cos\alpha_1 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} + \cdots + |w_n| \begin{bmatrix} \cos\alpha_n & -\sin\alpha_n \\ \sin\alpha_n & \cos\alpha_n \end{bmatrix} \begin{bmatrix} x_n \\ y_n \end{bmatrix} + \begin{bmatrix} \theta^r \\ \theta^i \end{bmatrix} \right), \qquad (7)$$

FIGURE 3 Two real-valued neurons which are equivalent to a complex-valued neuron. From Nitta, T. (2000). Figure 2 used by permission from Springer Science and Business Media.
FIGURE 4 An image of the two-dimensional motion for complex-valued signals. From Nitta, T. (2000). Figure 3 used by permission from Springer Science and Business Media.
where $F({}^t[x\;\; y]) = {}^t[\,f_R(x)\;\; f_R(y)\,]$ and $\alpha_k = \arctan(w_k^i / w_k^r)$ ($1 \le k \le n$). In Eq. (7), $|w_k|$ means reduction or magnification of the distance between a point $(x_k, y_k)$ and the origin in the complex plane, the matrix $\begin{bmatrix} \cos\alpha_k & -\sin\alpha_k \\ \sin\alpha_k & \cos\alpha_k \end{bmatrix}$ means the counterclockwise rotation by $\alpha_k$ radians about the origin, and ${}^t[\theta^r\;\; \theta^i]$ means translation. Thus, we find that a single complex-valued neuron with n-inputs applies a linear transformation called 2D motion to each input signal (complex number); that is, Eq. (7) basically involves 2D motion (Figure 4). As seen above, a real-valued neuron basically administers the movement of points on a real line (1D), and its weight parameters are completely independent of one another. Conversely, a complex-valued neuron basically administers 2D motion on the complex plane, and we may also interpret learning as adjusting 2D motion. This structure imposes the following restrictions on a set of weight parameters of a complex-valued neuron (Figure 3).
(Weight for the real part $x_k$ of an input signal to the real-part neuron)
= (Weight for the imaginary part $y_k$ of an input signal to the imaginary-part neuron), (8)

(Weight for the imaginary part $y_k$ of an input signal to the real-part neuron)
= −(Weight for the real part $x_k$ of an input signal to the imaginary-part neuron). (9)

Learning is carried out under these restrictions. From a different angle, we can find that the real-part neuron and the imaginary-part neuron influence each other via their weights.
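The 2D-motion reading can be verified directly: multiplying one input xk + iyk by a weight wk is exactly a scaling by |wk| followed by a counterclockwise rotation by αk, which is what the tied weight pattern of Eqs. (8) and (9) encodes. A sketch with illustrative values:

```python
import cmath
import math

def rotate_scale(point, w):
    """Apply |w| times the rotation-by-arg(w) matrix to a 2D point,
    i.e., the 2D motion appearing in Eq. (7)."""
    x, y = point
    r, a = abs(w), cmath.phase(w)  # magnitude |w_k| and angle alpha_k
    return (r * (math.cos(a) * x - math.sin(a) * y),
            r * (math.sin(a) * x + math.cos(a) * y))

w = 0.8 + 0.6j                              # illustrative weight w_k
z = 1.5 - 2.0j                              # illustrative input x_k + i y_k
prod = w * z                                # complex multiplication ...
moved = rotate_scale((z.real, z.imag), w)   # ... equals scale + rotate
assert abs(prod.real - moved[0]) < 1e-12
assert abs(prod.imag - moved[1]) < 1e-12
```

Here `cmath.phase` is used for the angle, which agrees with arctan(wk^i / wk^r) when wk^r > 0 and handles the other quadrants as well.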
B. Multilayered Complex-Valued Neural Network

A complex-valued neural network consists of the complex-valued neurons described in Section II.A.1. For the sake of simplicity, the networks used in the analysis and experiments of this chapter have three layers. We use $w_{ml}$ for the weight between the input neuron $l$ and the hidden neuron $m$, $v_{nm}$ for the weight between the hidden neuron $m$ and the output neuron $n$, $\theta_m$ for the threshold of the hidden neuron $m$, and $\gamma_n$ for the threshold of the output neuron $n$. Let $I_l$, $H_m$, and $O_n$ denote the output values of the input neuron $l$, the hidden neuron $m$, and the output neuron $n$, respectively. Let also $U_m$ and $S_n$ denote the internal potentials of the hidden neuron $m$ and the output neuron $n$, respectively. That is, $U_m = \sum_l w_{ml} I_l + \theta_m$, $S_n = \sum_m v_{nm} H_m + \gamma_n$, $H_m = f_C(U_m)$, and $O_n = f_C(S_n)$. Let $\delta_n = T_n - O_n$ denote the error between the actual pattern $O_n$ and the target pattern $T_n$ of output neuron $n$. We define the square error for the pattern $p$ as $E_p = (1/2)\sum_{n=1}^{N} |T_n - O_n|^2$, where $N$ is the number of output neurons.
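Under this notation, a forward pass of the three-layer network and the error Ep can be sketched as follows. The parameter values are illustrative, not a trained network:

```python
import math

def fR(u):
    return 1.0 / (1.0 + math.exp(-u))

def fC(z):
    return complex(fR(z.real), fR(z.imag))

def forward(I, w, theta, v, gamma):
    """Three-layer forward pass: U_m, H_m, S_n, O_n as in Section II.B."""
    H = [fC(sum(w[m][l] * I[l] for l in range(len(I))) + theta[m])
         for m in range(len(theta))]                      # hidden layer
    O = [fC(sum(v[n][m] * H[m] for m in range(len(H))) + gamma[n])
         for n in range(len(gamma))]                      # output layer
    return O

def square_error(O, T):
    """E_p = (1/2) * sum_n |T_n - O_n|^2."""
    return 0.5 * sum(abs(t - o) ** 2 for o, t in zip(O, T))

# Illustrative 2-3-1 network:
I = [0.2 + 0.4j, -0.1 + 0.3j]
w = [[0.5 + 0.1j, -0.2j], [0.3, 0.1 + 0.1j], [-0.4 + 0.2j, 0.6]]
theta = [0.1, -0.1j, 0.05 + 0.05j]
v = [[0.7 - 0.3j, 0.2, -0.1 + 0.4j]]
gamma = [0.1 + 0.1j]
O = forward(I, w, theta, v, gamma)
Ep = square_error(O, [0.8 + 0.2j])  # error against one target pattern
```

The backpropagation update rule itself is derived later in Section III; this sketch only fixes the forward quantities the derivation refers to.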
III. COMPLEX-VALUED BACKPROPAGATION LEARNING ALGORITHM A. Complex-Valued Adaptive Pattern Classifier The Real-BP is based on the adaptive pattern classifier model, or APCM (Amari, 1967), which guarantees that the Real-BP converges. This section formulates a complex-valued version of the APCM (called the complex APCM) for the Complex-BP by introducing complex numbers into the APCM, which in turn guarantees that the Complex-BP converges. Let us consider two information sources of complex-valued patterns. Complex-valued patterns x ∈ C^n and y ∈ C^m occur from information sources 1 and 2, respectively, with the unknown joint probability P(x, y). We assume that the number of patterns is finite. Note that the set of pairs of complex-valued patterns {(x, y)} corresponds to the set of learning patterns in neural networks. The purpose of learning is to estimate the complex-valued pattern y that occurs from information source 2, given a complex-valued pattern x that occurred from information source 1. Let z(w, x) : C^p × C^n → C^m be a complex function that provides an estimate of y, where w ∈ C^p is a parameter that corresponds to all the weights and thresholds of a neural network, and z(w, x) corresponds to the actual output pattern of the network. Let r(y′, y) : C^m × C^m → R_+ be an error function that represents the error incurred when we give an estimate y′ for the true complex-valued pattern y (R_+ denotes the set of nonnegative real numbers). Note that r is a nonnegative real function and not a complex
Complex-Valued Neural Network and Complex-Valued Backpropagation
163
function. We define the average error R(w) as

R(w) \stackrel{\text{def}}{=} \sum_x \sum_y r(z(w, x), y) P(x, y).   (10)
R(w) corresponds to the error between the actual output pattern and the target output pattern of neural networks, and the smaller R(w) is, the better the estimation.
B. Learning Convergence Theorem This section presents a learning algorithm for the complex APCM described in Section III.A and proves that it converges. The algorithm is a complex-valued version of the probabilistic descent method (Amari, 1967). We introduce a parameter n for discrete time. Let (x_n, y_n) be a complex-valued pattern that occurs at time n. Moreover, we assume that the complex-valued parameter w is modified by
w_{n+1} = w_n + \Delta w_n,   (11)

where \Delta w_n denotes the magnitude of change of the complex-valued parameter at time n. Equation (11) can be rewritten as follows:
\mathrm{Re}[w_{n+1}] = \mathrm{Re}[w_n] + \mathrm{Re}[\Delta w_n],   (12)

\mathrm{Im}[w_{n+1}] = \mathrm{Im}[w_n] + \mathrm{Im}[\Delta w_n],   (13)
where Re[z], Im[z] denote the real and imaginary parts of a complex number z, respectively. By definition, a parameter w is optimal if and only if the average error R(w) attains a local or global minimum. Then, the following theorem holds. Theorem 1. Let A be a positive definite matrix. Then, by using the update rules
\mathrm{Re}[\Delta w_n] = -\varepsilon A \nabla^{\mathrm{Re}} r(z(w_n, x_n), y_n),   (14)

\mathrm{Im}[\Delta w_n] = -\varepsilon A \nabla^{\mathrm{Im}} r(z(w_n, x_n), y_n), \quad n = 0, 1, \ldots,   (15)
the complex-valued parameter w approaches the optimum as near as desired by choosing a sufficiently small learning constant ε > 0 (∇ Re is a gradient operator with respect to the real part of w, and ∇ Im with respect to the imaginary part). Proof. The theory of APCM (Amari, 1967) is applicable to this case. The differences are that w ∈ Rp , x ∈ Rn and y ∈ Rm are real-valued variables in the APCM, whereas w ∈ C p , x ∈ C n , and y ∈ C m are complex-valued
variables in the complex APCM, appearing in z(w, x) : C^p × C^n → C^m, r(y′, y) : C^m × C^m → R_+, and R(w) : C^p → R^1. In one training step of the complex APCM, the real part Re[w] ∈ R^p and the imaginary part Im[w] ∈ R^p of the complex-valued parameter w are independently changed according to Eqs. (14) and (15). Thus, the manner of changing the parameter w is identical in both models in the sense that real quantities are updated. Hence, there is no need to take into account the change of w from a real-valued variable to a complex-valued variable. Next, x ∈ C^n and y ∈ C^m appear in the functions z(w, x) and r(y′, y), which are manipulated only in the form of mathematical expectations with respect to the complex-valued random variables (x, y); for example, E_{(x,y)}[z(w, x)] and E_{(x,y)}[r(y′, y)]. Generally, a complex-valued random variable can be manipulated in the same manner as a real-valued random variable. Hence, we can manipulate the functions z(w, x) and r(y′, y) just like the corresponding real functions in the APCM. Therefore, there is no need to change the logic of the proof in Amari's APCM theory. For reference, we state below the learning convergence theorem of the APCM (Amari, 1967), whose parameters and functions assume real values: x ∈ R^n, y ∈ R^m, w ∈ R^p, z(w, x) : R^p × R^n → R^m, r(y′, y) : R^m × R^m → R_+, and R(w) : R^p → R^1. Theorem 2 (Convergence Theorem for the APCM (Amari, 1967)). Let A be a positive definite matrix. Then, by using the update rule
\Delta w_n = -\varepsilon A \nabla r(z(w_n, x_n), y_n), \quad n = 0, 1, \ldots,   (16)
the real-valued parameter w approaches the optimum as near as desired by choosing a sufficiently small learning constant ε > 0 (∇ is the gradient operator with respect to w). Amari (1967) proved that the performance of the classifier in the APCM depends on the constant ε and the components of the positive definite matrix A. Similarly, it is assumed that the constant ε and the components of A influence the performance of the classifier in the complex APCM (a rigorous proof remains open).
C. Learning Rule 1. Generalization of the Real-Valued Backpropagation Learning Algorithm This section applies the theory of the complex APCM to the three-layered complex-valued neural network defined in Section II.B to derive the updating rule of the Complex-BP learning algorithm.
For a sufficiently small learning constant (learning rate) ε > 0 and a unit matrix A, using Theorem 1, it can be shown that the weights and the thresholds should be modified according to the following equations:
\Delta v_{nm} = -\varepsilon \frac{\partial E_p}{\partial \mathrm{Re}[v_{nm}]} - i\varepsilon \frac{\partial E_p}{\partial \mathrm{Im}[v_{nm}]},   (17)

\Delta\gamma_n = -\varepsilon \frac{\partial E_p}{\partial \mathrm{Re}[\gamma_n]} - i\varepsilon \frac{\partial E_p}{\partial \mathrm{Im}[\gamma_n]},   (18)

\Delta w_{ml} = -\varepsilon \frac{\partial E_p}{\partial \mathrm{Re}[w_{ml}]} - i\varepsilon \frac{\partial E_p}{\partial \mathrm{Im}[w_{ml}]},   (19)

\Delta\theta_m = -\varepsilon \frac{\partial E_p}{\partial \mathrm{Re}[\theta_m]} - i\varepsilon \frac{\partial E_p}{\partial \mathrm{Im}[\theta_m]}.   (20)
Equations (17)–(20) can be expressed as:
\Delta v_{nm} = \bar{H}_m \, \Delta\gamma_n,   (21)

\Delta\gamma_n = \varepsilon \bigl( \mathrm{Re}[\delta_n](1 - \mathrm{Re}[O_n])\mathrm{Re}[O_n] + i\,\mathrm{Im}[\delta_n](1 - \mathrm{Im}[O_n])\mathrm{Im}[O_n] \bigr),   (22)

\Delta w_{ml} = \bar{I}_l \, \Delta\theta_m,   (23)

\Delta\theta_m = \varepsilon \Bigl( (1 - \mathrm{Re}[H_m])\mathrm{Re}[H_m] \sum_n \bigl( \mathrm{Re}[\delta_n](1 - \mathrm{Re}[O_n])\mathrm{Re}[O_n]\mathrm{Re}[v_{nm}] + \mathrm{Im}[\delta_n](1 - \mathrm{Im}[O_n])\mathrm{Im}[O_n]\mathrm{Im}[v_{nm}] \bigr) - i\,(1 - \mathrm{Im}[H_m])\mathrm{Im}[H_m] \sum_n \bigl( \mathrm{Re}[\delta_n](1 - \mathrm{Re}[O_n])\mathrm{Re}[O_n]\mathrm{Im}[v_{nm}] - \mathrm{Im}[\delta_n](1 - \mathrm{Im}[O_n])\mathrm{Im}[O_n]\mathrm{Re}[v_{nm}] \bigr) \Bigr),   (24)

where \bar{z} denotes the complex conjugate of a complex number z. In this connection, the updating rule of the Real-BP is as follows:

\Delta v_{nm} = H_m \, \Delta\gamma_n,   (25)

\Delta\gamma_n = \varepsilon (1 - O_n) O_n \delta_n,   (26)

\Delta w_{ml} = I_l \, \Delta\theta_m,   (27)

\Delta\theta_m = (1 - H_m) H_m \sum_n v_{nm} \, \Delta\gamma_n,   (28)
where δ_n, I_l, H_m, O_n, Δv_nm, Δγ_n, Δw_ml, Δθ_m are all real numbers. Equations (21)–(24) resemble those of the Real-BP.
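The update rules of Eqs. (21)–(24) can be sketched in NumPy as below, trained online on the Table I patterns of Section IV. The split sigmoid f_C, the 1-3-1 architecture, the learning rate of 0.5, the epoch count, and all names are my assumptions for illustration, not prescriptions from the chapter.

```python
import numpy as np

def f_C(z):
    f = lambda u: 1.0 / (1.0 + np.exp(-u))
    return f(z.real) + 1j * f(z.imag)   # split-type sigmoid activation

def step(I, T, w, theta, v, gamma, eps):
    """One online Complex-BP update [Eqs. (21)-(24)] for a pattern (I, T)."""
    H = f_C(w @ I + theta)
    O = f_C(v @ H + gamma)
    delta = T - O
    A = (1 - O.real) * O.real             # derivative of the real part
    B = (1 - O.imag) * O.imag             # derivative of the imaginary part
    P, Q = delta.real * A, delta.imag * B
    dgamma = eps * (P + 1j * Q)                                # Eq. (22)
    dv = np.outer(dgamma, H.conj())                            # Eq. (21)
    C = (1 - H.real) * H.real
    D = (1 - H.imag) * H.imag
    dtheta = eps * (C * (v.real.T @ P + v.imag.T @ Q)
                    - 1j * D * (v.imag.T @ P - v.real.T @ Q))  # Eq. (24)
    dw = np.outer(dtheta, I.conj())                            # Eq. (23)
    return dw, dtheta, dv, dgamma, 0.5 * np.sum(np.abs(delta) ** 2)

rng = np.random.default_rng(1)
def init(shape):
    return rng.uniform(-0.3, 0.3, shape) + 1j * rng.uniform(-0.3, 0.3, shape)

w, theta, v, gamma = init((3, 1)), init(3), init((1, 3)), init(1)
patterns = [(0j, 0j), (1j, 1 + 0j), (1 + 0j, 1 + 1j), (1 + 1j, 1j)]  # Table I
errs = []
for epoch in range(2000):
    total = 0.0
    for x, t in patterns:
        dw, dth, dv, dg, e = step(np.array([x]), np.array([t]),
                                  w, theta, v, gamma, 0.5)
        w, theta, v, gamma = w + dw, theta + dth, v + dv, gamma + dg
        total += e
    errs.append(total)
```

With these settings the total squared error should decrease over training, though the exact number of cycles to converge depends on the random initialization.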
2. Geometry of Learning This section clarifies the structure of the learning rule of the Complex-BP algorithm, using the three-layered complex-valued neural network defined in Section II.B as an example. Let Δz^R, Δz^I denote the real and imaginary parts of the magnitude of change Δz of a learnable parameter z; that is, Δz^R = Re[Δz], Δz^I = Im[Δz]. Then, the learning rule of Eqs. (21)–(24) can be expressed as:
\begin{pmatrix} \Delta v_{nm}^R \\ \Delta v_{nm}^I \end{pmatrix} = \begin{pmatrix} \mathrm{Re}[H_m] & \mathrm{Im}[H_m] \\ -\mathrm{Im}[H_m] & \mathrm{Re}[H_m] \end{pmatrix} \begin{pmatrix} \Delta\gamma_n^R \\ \Delta\gamma_n^I \end{pmatrix} = |H_m| \begin{pmatrix} \cos\beta_m & \sin\beta_m \\ -\sin\beta_m & \cos\beta_m \end{pmatrix} \begin{pmatrix} \Delta\gamma_n^R \\ \Delta\gamma_n^I \end{pmatrix},   (29)

\Delta v_{nm} = |H_m| e^{-i\beta_m} \Delta\gamma_n,   (30)

\begin{pmatrix} \Delta\gamma_n^R \\ \Delta\gamma_n^I \end{pmatrix} = \varepsilon \begin{pmatrix} A_n & 0 \\ 0 & B_n \end{pmatrix} \begin{pmatrix} \mathrm{Re}[\delta_n] \\ \mathrm{Im}[\delta_n] \end{pmatrix},   (31)

\begin{pmatrix} \Delta w_{ml}^R \\ \Delta w_{ml}^I \end{pmatrix} = \begin{pmatrix} \mathrm{Re}[I_l] & \mathrm{Im}[I_l] \\ -\mathrm{Im}[I_l] & \mathrm{Re}[I_l] \end{pmatrix} \begin{pmatrix} \Delta\theta_m^R \\ \Delta\theta_m^I \end{pmatrix} = |I_l| \begin{pmatrix} \cos\phi_l & \sin\phi_l \\ -\sin\phi_l & \cos\phi_l \end{pmatrix} \begin{pmatrix} \Delta\theta_m^R \\ \Delta\theta_m^I \end{pmatrix},   (32)

\begin{pmatrix} \Delta\theta_m^R \\ \Delta\theta_m^I \end{pmatrix} = \begin{pmatrix} C_m & 0 \\ 0 & D_m \end{pmatrix} \sum_n \begin{pmatrix} \mathrm{Re}[v_{nm}] & \mathrm{Im}[v_{nm}] \\ -\mathrm{Im}[v_{nm}] & \mathrm{Re}[v_{nm}] \end{pmatrix} \begin{pmatrix} \Delta\gamma_n^R \\ \Delta\gamma_n^I \end{pmatrix} = \begin{pmatrix} C_m & 0 \\ 0 & D_m \end{pmatrix} \sum_n |v_{nm}| \begin{pmatrix} \cos\varphi_{nm} & \sin\varphi_{nm} \\ -\sin\varphi_{nm} & \cos\varphi_{nm} \end{pmatrix} \begin{pmatrix} \Delta\gamma_n^R \\ \Delta\gamma_n^I \end{pmatrix},   (33)
where An = (1 − Re[On ])Re[On ], Bn = (1 − Im[On ])Im[On ], Cm = (1 − Re[Hm ])Re[Hm ], Dm = (1 − Im[Hm ])Im[Hm ], βm = arctan(Im[Hm ]/Re[Hm ]), φl = arctan(Im[Il ]/Re[Il ]), and ϕnm = arctan(Im[vnm ]/Re[vnm ]).
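Note that the matrix in Eq. (29) acts on (Δγ_n^R, Δγ_n^I) exactly as multiplication by the conjugate of H_m acts on Δγ_n, which is how Eqs. (29) and (30) reduce to Eq. (21). A quick numerical check of this equivalence (H and the update vector are arbitrary sample values of mine):

```python
import numpy as np

H = 0.6 + 0.8j                   # an arbitrary hidden-neuron output
g = np.array([0.3, -0.5])        # (Delta gamma^R, Delta gamma^I), arbitrary

# |H_m| times a clockwise rotation by beta_m = arctan(Im[H]/Re[H]), as in Eq. (29)
M = np.array([[ H.real, H.imag],
              [-H.imag, H.real]])
v2d = M @ g

# The same 2D motion written as one complex multiplication, as in Eqs. (30)/(21)
v_complex = np.conj(H) * (g[0] + 1j * g[1])
# v2d equals (Re, Im) of v_complex
```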
In Eq. (29), |H_m| is a similarity transformation (reduction or magnification), and \begin{pmatrix} \cos\beta_m & \sin\beta_m \\ -\sin\beta_m & \cos\beta_m \end{pmatrix} is a clockwise rotation by β_m radians about the origin. Thus, Eq. (29) performs the linear transformation called 2D motion. Hence, the magnitude of change in the weight between the hidden and output neurons, (Δv_nm^R, Δv_nm^I), is obtained via the above linear transformation (2D motion) of (Δγ_n^R, Δγ_n^I), the magnitude of change in the threshold of the output neuron (Figure 5). Similarly, the magnitude of change in the threshold of the hidden neuron, (Δθ_m^R, Δθ_m^I), is obtained by applying the 2D motion concerning v_nm (the weight between the hidden and output neurons) to (Δγ_n^R, Δγ_n^I), the magnitude of change in the threshold of the output neuron [Eq. (33)]. Finally, (Δw_ml^R, Δw_ml^I) is obtained by applying the 2D motion concerning I_l to (Δθ_m^R, Δθ_m^I) [Eq. (32)]. Thus, the error propagation in the Complex-BP is based on 2D motion. Let us now contrast this with the geometry of the Real-BP. Representing the magnitude of a learnable parameter update (a real number) as a point on a real line (1D), we can interpret Δv_nm as the product of Δγ_n and H_m [Eq. (25)] (Figure 6). Similarly, the product of Δγ_n and v_nm produces Δθ_m
FIGURE 5 An image of the error backpropagation in the Complex-BP. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.

FIGURE 6 An image of the error backpropagation in the Real-BP. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
[Eq. (28)], and the product of Δθ_m and I_l leads to Δw_ml [Eq. (27)]. Hence, the error propagation in the Real-BP is based on 1D motion. Therefore, extending the Real-BP to complex numbers changes the structure of the error propagation from one dimension to two. The 2D structure of the error propagation just described also means that the units of learning in the Complex-BP algorithm are the complex-valued signals flowing through the network. For example, both Δv_nm^R and Δv_nm^I are functions of the real parts (Re[H_m], Re[O_n]) and the imaginary parts (Im[H_m], Im[O_n]) of the complex-valued signals (H_m, O_n) flowing through the network [Eq. (29)]. That is, there is a relation between Δv_nm^R and Δv_nm^I through (Re[H_m], Re[O_n]) and (Im[H_m], Im[O_n]). Similarly, there are relations between Δw_ml^R and Δw_ml^I [Eq. (32)] and between Δθ_m^R and Δθ_m^I [Eq. (33)]. Equation (31) indicates no direct relation between Δγ_n^R and Δγ_n^I. However, one can be found, since Re[O_n] is a function of Re[H_m] and Im[H_m], because
\mathrm{Re}[O_n] = f_R(\mathrm{Re}[S_n]),   (34)

where

\mathrm{Re}[S_n] = \sum_m \bigl( \mathrm{Re}[v_{nm}]\mathrm{Re}[H_m] - \mathrm{Im}[v_{nm}]\mathrm{Im}[H_m] \bigr) + \mathrm{Re}[\gamma_n].   (35)

Similarly, \mathrm{Im}[O_n] is also a function of \mathrm{Re}[H_m] and \mathrm{Im}[H_m], because

\mathrm{Im}[O_n] = f_R(\mathrm{Im}[S_n]),   (36)

where

\mathrm{Im}[S_n] = \sum_m \bigl( \mathrm{Re}[v_{nm}]\mathrm{Im}[H_m] + \mathrm{Im}[v_{nm}]\mathrm{Re}[H_m] \bigr) + \mathrm{Im}[\gamma_n].   (37)
Thus, both Δγ_n^R and Δγ_n^I are functions of Re[H_m] and Im[H_m]. Hence, there is also a relation between Δγ_n^R and Δγ_n^I through Re[H_m] and Im[H_m]. Therefore, in the Complex-BP algorithm, both the real part and the imaginary part of each learnable parameter are modified as functions of both the real part and the imaginary part of the complex-valued signals (Figure 7). From these facts, we can conclude that the complex-valued signals flowing through the network are the unit of learning in the Complex-BP algorithm.
FIGURE 7 Factors determining the magnitude of change of learnable parameters. The starting point of an arrow refers to a determination factor of the endpoint. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
IV. LEARNING SPEED This section discusses the learning performance of the Complex-BP algorithm presented in Section III.C.
A. Experiments First, we study the learning speed of the Complex-BP algorithm on a number of examples using complex-valued patterns, and then compare its performance with that of the Real-BP. We investigate the learning speed from a computational-complexity perspective (i.e., time and space complexity). Here, time complexity means the number of the four arithmetic operations on real numbers performed per learning cycle, and space complexity the number of learnable parameters (weights and thresholds), where a complex-valued parameter w = w^R + iw^I is counted as two because it consists of a real part w^R and an imaginary part w^I. The average number of learning cycles needed for the Complex-BP algorithm to converge was compared with that of the Real-BP algorithm. In the comparison, we used network structures such that the time complexity per learning cycle of the Complex-BP was almost equal to that of the Real-BP. In addition, the space complexity was also examined. In the experiments, the initial real and imaginary components of the weights and thresholds were chosen to be random numbers between −0.3 and +0.3. The stopping criterion used for learning was
\sqrt{ \sum_p \sum_{n=1}^{N} \bigl| T_n^{(p)} - O_n^{(p)} \bigr|^2 } = 0.10,   (38)
where T_n^{(p)}, O_n^{(p)} ∈ C denote the desired output value and the actual output value of neuron n for pattern p [i.e., the left side of Eq. (38) denotes the error between the desired output pattern and the actual output pattern], and N denotes the number of neurons in the output layer. The presentation of one set of learning patterns to the neural network was regarded as one learning cycle.
1. Experiment 1 First, the set of simple complex-valued learning patterns shown in Table I was used to compare the performance of the Complex-BP algorithm with that of the Real-BP algorithm. We used a 1-3-1 three-layered network for the Complex-BP and a 2-7-2 three-layered network for the Real-BP because their time complexities per learning cycle were almost equal, as shown in Table II. In the experiment with the Real-BP, the real component of a complex number was input into the first input neuron, and the imaginary component was input into the second input neuron. The output from the first output neuron was interpreted as the real component of a complex number;
TABLE I  Learning Patterns (Experiment 1)

Input pattern    Output pattern
0                0
i                1
1                1 + i
1 + i            i

i = √−1. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
TABLE II  Computational Complexity of the Complex-BP and the Real-BP (Experiment 1)

                    Time complexity              Space complexity
Network             × and ÷   + and −   Sum      Weights   Thresholds   Sum
Complex-BP 1-3-1    78        52        130      12        8            20
Real-BP 2-7-2       90        46        136      28        9            37

Time complexity is the sum of the four arithmetic operations performed per learning cycle. Space complexity is the sum of the learnable parameters (weights and thresholds), where a complex-valued parameter z = x + iy (where i = √−1) is counted as two because it consists of a real part x and an imaginary part y. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
the output from the second output neuron was interpreted as the imaginary component. The average convergence of 50 trials for each of the 6 learning rates (0.1, 0.2, . . . , 0.6) was used as the evaluation criterion. Although we stopped learning at the 50,000th iteration, all trials succeeded in converging. The result of the experiments is shown in Figure 8.
2. Experiment 2 Next, we conducted an experiment using the set of complex-valued learning patterns shown in Table III. The learning patterns were defined according to the following two rules:
1. The real part of complex number 3 (the output) is 1 if complex number 1 (an input) is equal to complex number 2 (an input); otherwise it is 0.
2. The imaginary part of complex number 3 is 1 if complex number 2 is equal to either 1 or i; otherwise it is 0.

FIGURE 8 Average learning speed: a comparison between the Complex-BP and the Real-BP (Experiment 1). Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
TABLE III  Learning Patterns (Experiment 2)

Input pattern                          Output pattern
Complex number 1   Complex number 2    Complex number 3
0                  0                   1
0                  i                   i
i                  i                   1 + i
i                  1                   i
1                  1                   1 + i
i                  0                   0
1 + i              1 + i               1
1 + i              i                   i

i = √−1. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
TABLE IV  Computational Complexity of the Complex-BP and the Real-BP (Experiment 2)

                    Time complexity              Space complexity
Network             × and ÷   + and −   Sum      Weights   Thresholds   Sum
Complex-BP 2-4-1    134       92        226      24        10           34
Real-BP 4-9-2       150       76        226      54        11           65

Time complexity is the sum of the four arithmetic operations performed per learning cycle. Space complexity is the sum of the learnable parameters (weights and thresholds), where a complex-valued parameter z = x + iy (where i = √−1) is counted as two because it consists of a real part x and an imaginary part y. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
The experimental task was the same as in Experiment 1 except for the layered network structure; a 2-4-1 three-layered network was used for the Complex-BP, and a 4-9-2 three-layered network was used for the Real-BP because their time complexities per learning cycle were equal, as shown in Table IV. In the experiment with the Real-BP, the real and imaginary components of complex number 1 and the real and imaginary components of complex number 2 were input into the first, second, third, and fourth input neurons, respectively. The output from the first output neuron was interpreted to be the real component of a complex number; the output from the second output neuron was interpreted to be the imaginary component. We stopped learning at the 100,000th iteration. The results of the experiments are shown in Figure 9. For reference, the rate of convergence is shown in Table V.
FIGURE 9 Average learning speed: a comparison between the Complex-BP and the Real-BP (Experiment 2). Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
TABLE V  Rate of Convergence (Experiment 2)

                    Learning rate
Network             0.1   0.2   0.3   0.4   0.5   0.6
Complex-BP 2-4-1    100   96    88    92    90    98
Real-BP 4-9-2       0     22    64    78    90    100

Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
We can conclude from these experiments that the Complex-BP exhibits the following characteristics when learning complex-valued patterns: its learning speed is several times faster than that of the conventional technique (see Figures 8 and 9), while its space complexity (i.e., the number of learnable parameters) is only about half that of the Real-BP (see Tables II and IV).
B. Factors to Improve Learning Speed This section shows how the structure of the error propagation of the Complex-BP algorithm described in Section III.C.2 improves the learning speed. In the learning rule of the real-valued backpropagation [Eqs. (25)–(28)], (1 − H_m)H_m ∈ R and (1 − O_n)O_n ∈ R are instances of the derivative [1 − f_R(u)]f_R(u) of the sigmoid function f_R(u) = 1/(1 + exp(−u)), which is the activation function of each neuron. The value of the derivative asymptotically approaches 0 as the absolute value of the net input u to a neuron increases (Figure 10). Hence, as |u| increases so that the output value of a neuron approaches exactly 0.0 or 1.0, the derivative [1 − f_R(u)]f_R(u) takes a small value, which causes a standstill in learning. This phenomenon is called getting stuck in a local minimum if it continues for a considerable length of time while the error between the actual output value and the desired output value remains large. As is generally known, this is the mechanism of standstill in learning in the Real-BP. On the other hand, two types of derivatives of the sigmoid function appear in the learning rule of the Complex-BP algorithm [Eqs. (21)–(24)]: one is the derivative of the real part of the output function ((1 − Re[O_n])Re[O_n], (1 − Re[H_m])Re[H_m]); the other is that of the imaginary part ((1 − Im[O_n])Im[O_n], (1 − Im[H_m])Im[H_m]). The learning rule of the Complex-BP algorithm basically consists of the two linear combinations
FIGURE 10 The derivative [1 − f_R(u)]f_R(u) of the sigmoid function f_R(u). Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
Complex-Valued Neural Network and Complex-Valued Backpropagation
175
of these derivatives:
α1 (1 − Re[On ])Re[On ] + β1 (1 − Im[On ])Im[On ],
(39)
α2 (1 − Re[Hm ])Re[Hm ] + β2 (1 − Im[Hm ])Im[Hm ],
(40)
where α_k, β_k ∈ R (k = 1, 2). Note that Eq. (39) takes a very small value only when both (1 − Re[O_n])Re[O_n] and (1 − Im[O_n])Im[O_n] are very small. Hence, Eq. (39) does not take an extremely small value even if (1 − Re[O_n])Re[O_n] is very small, because (1 − Im[O_n])Im[O_n] is not always small in the Complex-BP algorithm (whereas in the Real-BP algorithm [Eqs. (25)–(28)] the magnitude of a learnable parameter update inevitably becomes quite small whenever (1 − O_n)O_n ∈ R is quite small). In this sense, the real factor ((1 − Re[O_n])Re[O_n], (1 − Re[H_m])Re[H_m]) compensates when the imaginary factor ((1 − Im[O_n])Im[O_n], (1 − Im[H_m])Im[H_m]) takes an abnormally small value, and vice versa. Thus, compared with the updating rule of the Real-BP, the Complex-BP reduces the probability of a standstill in learning, which indicates that the learning speed of the Complex-BP is faster than that of the Real-BP. We may attribute the learning speed of the Complex-BP algorithm observed on the complex-valued pattern examples of Section IV.A to this structure, in which the linear combinations [Eqs. (39) and (40)] of the real and imaginary components of the derivative of the output function reduce standstill in learning.
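This compensation can be seen numerically: when one part of a split-sigmoid neuron saturates, its derivative collapses, but the combination of Eq. (39) with, say, α_1 = β_1 = 1 stays useful as long as the other part is unsaturated. The sample inputs below are mine.

```python
import math

def f_R(u):
    return 1.0 / (1.0 + math.exp(-u))

def d(u):
    # the derivative (1 - f_R(u)) f_R(u); small when |u| is large
    return (1.0 - f_R(u)) * f_R(u)

saturated = d(10.0)          # a saturated (real) part: nearly zero derivative
combo = d(10.0) + d(0.0)     # Eq. (39) with alpha_1 = beta_1 = 1: still large,
                             # because the unsaturated (imaginary) part at
                             # u = 0 compensates
```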
C. Discussion In Section IV.A we conducted experiments on the learning characteristics using a comparatively small number of learning patterns and comparatively small networks, and showed the superiority of the Complex-BP algorithm from a computational-complexity perspective. We believe that the Complex-BP algorithm can become increasingly superior to the Real-BP algorithm on larger problems with larger networks, such as massive real-world applications. This is because the experimental results in Section IV.A suggest that the difference in learning speed between the Complex-BP and the Real-BP is larger in Experiment 2 than in Experiment 1, and the network size and the number of learning patterns used in Experiment 2 are larger than those used in Experiment 1 (see Figures 8 and 9, Tables I and III). Systematic experiments are needed to confirm this conjecture.
V. GENERALIZATION ABILITY This section describes research results on the usual generalization ability of the Real-BP and Complex-BP networks and
algorithms. In this connection, the inherent generalization ability of the complex-valued neural network is described in Section VI. The learning patterns and the networks of Experiments 1 and 2 in Section IV.A were also used to test the generalization ability on unseen inputs. The learning constant used in these experiments was 0.5. Experiments 1 and 2 below correspond to those in Section IV.A, respectively.
1. Experiment 1 After training a 1-3-1 Complex-BP network with the four training points (see Table I, Figures 11a and b), we presented the 12 test points shown in Figure 11c; the network generated the points shown in Figure 12a. Figure 12b shows the corresponding case for the 2-7-2 Real-BP network.
2. Experiment 2 After training with the eight training points shown in Table III and Figures 13a and b, the 2-4-1 Complex-BP network formed the set of points shown in Figure 14a for the eight test points (Figure 13c). The results for the 4-9-2 Real-BP network appear in Figure 14b. Here, we need to know the distances between the input training points and the test points to evaluate the generalization performance of the Real-BP and the Complex-BP. However, Figures 13a and c do not always express the exact distances between the input training points and the test points. To clarify this, for any input training point x = (x_1, x_2) ∈ C² and test point y = (y_1, y_2) ∈ C², we define a distance measure as

\|x - y\|^2 \stackrel{\text{def}}{=} |x_1 - y_1|^2 + |x_2 - y_2|^2 = (\mathrm{Re}[x_1 - y_1])^2 + (\mathrm{Im}[x_1 - y_1])^2 + (\mathrm{Re}[x_2 - y_2])^2 + (\mathrm{Im}[x_2 - y_2])^2   (41)

and show in Table VI the distances between the input training points and the test points under this measure. For example, the closest input training point to test point 6 is input training point 5 (Table VI). The above simulation results clearly suggest that the Complex-BP algorithm has the same degree of generalization performance as the Real-BP. It may be stated that the generalization performance does not change with the network size and the number of learning patterns (see Figures 12 and 14).
FIGURE 11 Learning and test patterns for the comparison of the generalization performance of the Complex-BP and the Real-BP (Experiment 1). A solid circle denotes an input training point, an open circle an output training point, and a solid triangle an input test point. (a) Input training points. (b) Output training points. (c) Input test points. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 12 Result of the comparison of the generalization performance of the Complex-BP and the Real-BP (Experiment 1). An open square denotes an output test point generated by the Complex-BP, and a solid square an output test point generated by the Real-BP. (a) Complex-BP. (b) Real-BP. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 13 Learning and test patterns for the comparison of the generalization performance of the Complex-BP and the Real-BP (Experiment 2). A solid circle denotes an input training point, an open circle an output training point, and a solid triangle an input test point. (a) Input training points. (b) Output training points. (c) Input test points. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 14 Result of the comparison of the generalization performance of the Complex-BP and the Real-BP (Experiment 2). An open square denotes an output test point generated by the Complex-BP, and a solid square an output test point generated by the Real-BP. (a) Complex-BP. (b) Real-BP. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
TABLE VI  Distances between the Input Training Points and the Test Points

              Input training point
Test point    1   2   3   4   5   6   7   8
1             1   1   1   2   2   2   2   3
2             2   1   2   2   3   2   1   2
3             1   2   1   1   1   2   3   3
4             3   2   1   1   3   2   1   1
5             2   2   1   2   1   1   1   2
6             2   3   2   2   1   2   3   2
7             3   3   1   2   2   2   2   1
8             3   2   2   1   2   1   2   1

The distance measure ‖x − y‖², defined in Eq. (41), is used for an input training point x ∈ C² and a test point y ∈ C². Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
VI. TRANSFORMING GEOMETRIC FIGURES This section shows that the Complex-BP can transform geometric figures in a natural way. We used a 1-6-1 three-layered network, which transformed a point (x, y) into (x′, y′) in the complex plane. Although the Complex-BP network generates a value z within the range 0 ≤ Re[z], Im[z] ≤ 1, for the sake of convenience we present it in the figures below as a transformed value within the range −1 ≤ Re[z], Im[z] ≤ 1. We also conducted some experiments with a 2-12-2 network with real-valued weights and thresholds to compare the Complex-BP with the Real-BP. The real component of a complex number was input into the first input neuron, and the imaginary component into the second input neuron. The output from the first output neuron was interpreted as the real component of a complex number; the output from the second output neuron as the imaginary component. The learning constant used in these experiments was 0.5. The initial real and imaginary components of the weights and thresholds were chosen to be random real numbers between 0 and 1. The experiments described in this section consisted of two parts: a training step followed by a test step.
A. Examples 1. Simple Transformation This section presents the results of the experiments on simple transformation. The training input and output pairs were presented 1,000 times in the training step.
Rotation. In the first experiment (using a 1-6-1 network and the Complex-BP), the training step consisted of learning a set of complex-valued weights and thresholds such that the input set of (straight-line) points (indicated by solid circles in Figure 15a) gave as output the (straight-line) points
FIGURE 15 Rotation of a straight line. A solid circle denotes an input training point, an open circle an output training point, a solid triangle an input test point, an open triangle a desired output test point, a solid square an output test point generated by the Real-BP, and an open square an output test point generated by the Complex-BP. (a) Case 1. (b) Case 2. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
(indicated by open circles) rotated counterclockwise over π/2 radians around the origin. These complex-valued weights and thresholds were then used in a second (test) step, in which the input points lying on two straight lines (indicated by solid triangles in Figures 15a and b) would hopefully be mapped to an output set of points lying on the straight lines (indicated by open triangles) rotated counterclockwise over π/2 radians around the origin. The actual output test points for the Complex-BP did, indeed, lie on those straight lines (indicated by open squares). It appears that the complex-valued network learned to generalize the transformation of each point Z_k (= r_k exp[iθ_k]) into Z_k exp[iα] (= r_k exp[i(θ_k + α)]); that is, the angle of each complex-valued point is updated by a complex-valued factor exp[iα], while the absolute value of each input point is preserved. To compare with the performance of a real-valued network, the 2-12-2 (real-valued) network mentioned above was trained using the same pairs of points, that is, the (input) solid circles and (desired output) open circles of Figure 15. The solid triangle points of Figure 15 were then input to this real-valued network. The outputs were the solid squares. Obviously, the Real-BP did not preserve each input point's absolute value. All points were mapped onto straight lines, as shown in Figure 15. In the above experiments, the 11 training input points lay on the line y = −x + 1 (0 ≤ x ≤ 1) and the 11 training output points lay on the line y = x + 1 (−1 ≤ x ≤ 0). The 13 test input points lay on the lines y = 0.2 (−0.9 ≤ x ≤ 0.3) (Figure 15a) and y = −x + 0.5 (0 ≤ x ≤ 0.5) (Figure 15b). The desired output test points should lie on the lines x = −0.2 and y = x + 0.5. Next, we performed an experiment on rotation of the word ISO, which consists of three characters (Figure 16).
The training set of points was as follows: the input set of points lay on the slanted character I (indicated by solid circles in Figure 16a), and the output set of points lay on the vertical (straight) character I (indicated by open circles). The angle between the input points and the output points was π/4 radians. In a test step, we gave the network some points (indicated by solid triangles in Figures 16b and c) on two slanted characters S and O as the test input points. The Complex-BP rotated the slanted characters S and O counterclockwise over π/4 radians around the origin, whereas the Real-BP destroyed them (see Figure 16).
Similarity Transformation. We examined a similarity transformation with scaling factor α = 1/2 from the circle x² + y² = 1 to the circle x² + y² = 0.5² (Figure 17a). The training step consisted of learning a set of complex-valued weights and thresholds such that the input set of (straight-line) points (indicated by solid circles in Figure 17a) yielded as output the half-scaled straight-line points (indicated by open circles). In a second (test) step, the input points lying on a circle (indicated by solid triangles) would hopefully be mapped to an output set of points (indicated by open
Tohru Nitta
FIGURE 16 Rotation of the word ISO. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. (a) Learning pattern I. (b) Test pattern S. (c) Test pattern O. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 17 Similarity transformation. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. (a) Reduction of a circle. (b) Reduction of a curved line. (c) Magnification of a square. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
triangles) lying on a half-scaled circle. The actual output test points generated by the Complex-BP did, indeed, lie on that circle (indicated by open squares). It appears that the complex-valued network has learned to generalize the transformation of each point Zk (= rk exp[iθk]) into αZk (= αrk exp[iθk]); that is, the absolute value of each complex-valued point is shrunk by the real-valued factor α, while the angle of each input point is preserved.

To compare with the performance of a real-valued network, the real-valued network was trained using the same pairs of points, that is, the (input) solid circles and (desired output) open circles of Figure 17a. The solid triangle points of Figure 17a were then presented to this real-valued network. The outputs were the solid squares. Clearly, the Real-BP did not preserve each input point's angle: all points were mapped onto a straight line, as shown in Figure 17a.
We also conducted an additional experiment. Figure 17b shows the responses of the networks to presentation of the points on an arbitrary curved line. The Complex-BP halved the curved line while holding its shape, as in the case of the circle, whereas the Real-BP produced no such response.

In these two experiments, the 11 training input points lay on the line y = x (0 ≤ x ≤ 1), and the 11 training output points lay on the line y = x (0 ≤ x ≤ 0.5). In the case of Figure 17a, the 12 test input points lay on the circle x² + y² = 1, and the desired output test points should lie on the circle x² + y² = 0.5².

In addition, we conducted an experiment on the magnification of a square. The 11 training input points (indicated by solid circles in Figure 17c) lay on the line y = x (0 ≤ x ≤ 0.3), and the training output points (indicated by open circles) lay on the line y = x (0 ≤ x ≤ 0.99), which can be generated by magnifying the line y = x (0 ≤ x ≤ 0.3) with a magnification factor of 3.3. For a square whose side was 0.3 (indicated by solid triangles), the Complex-BP generated a square whose side was nearly 1.0 (indicated by open squares), whereas the Real-BP generated points (indicated by solid squares) on the straight line y = x.
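The corresponding ideal map for the similarity transformation is z ↦ αz with real α, which scales every modulus by α while leaving every argument unchanged. A small NumPy sketch (ours, for illustration only) checks this against the coordinates quoted above:

```python
import numpy as np

# Reduction: training inputs on y = x (0 <= x <= 1) map to y = x (0 <= x <= 0.5),
# i.e. z -> 0.5 * z.
x = np.linspace(0.0, 1.0, 11)
z_in = x + 1j * x
z_out = 0.5 * z_in
assert np.allclose(np.angle(z_out[1:]), np.angle(z_in[1:]))  # angles preserved
assert np.allclose(np.abs(z_out), 0.5 * np.abs(z_in))        # moduli halved

# Magnification: the square experiment uses the factor 0.99 / 0.3 = 3.3.
assert np.isclose(0.99 / 0.3, 3.3)

# Test set: points on the unit circle should map onto the circle of radius 0.5.
theta = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
circle = np.exp(1j * theta)
assert np.allclose(np.abs(0.5 * circle), 0.5)
```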
Parallel Displacement. Figure 18a shows the results of an experiment on parallel displacement of a straight line. The training points used in the experiment were as follows: the input set of (straight-line) points (indicated by solid circles in Figure 18a) yielded as output the straight-line points displaced in parallel (indicated by open circles). The distance of the parallel displacement was 1/√2, and the direction was at an angle of −π/4 radians.
FIGURE 18 Parallel displacement. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. (a) Straight line. (b) Curved line. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
In a test step, the input points lying on a straight line (indicated by solid triangles in Figure 18a) would hopefully be mapped to an output set of points (indicated by open triangles) lying on a straight line displaced in parallel. The actual output test points generated by the Complex-BP did, indeed, lie on the straight line (indicated by open squares). It appears that the complex-valued network has learned to generalize the transformation of each point Zk into Zk + α, where α is a complex number.

To compare with the performance of a real-valued network, the 2-12-2 real-valued network was trained using the same pairs of points, that is, the (input) solid circles and (desired output) open circles of Figure 18a. The solid triangle points of Figure 18a were then presented to this real-valued network. The outputs were the solid squares. Clearly, the Real-BP did not displace them in parallel.

In the above experiments, the 11 training input points lay on the line y = x + 1 (−1 ≤ x ≤ 0), and the 11 training output points lay on the line y = x (−0.5 ≤ x ≤ 0.5). The 11 test input points lay on the straight line y = x (−0.5 ≤ x ≤ 0.5). The desired output test points should lie on the straight line y = x − 1 (0 ≤ x ≤ 1). We also conducted an experiment on parallel displacement of an arbitrary curved line. As shown in Figure 18b, only the Complex-BP moved it in parallel.
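The ideal map here is the pure translation z ↦ z + γ, where the displacement γ has modulus 1/√2 and argument −π/4, i.e. γ = 0.5 − 0.5i. A brief NumPy check (again ours, not the original code) confirms that the quoted training lines are related by exactly this translation:

```python
import numpy as np

# Displacement vector: length 1/sqrt(2) in the direction of -pi/4 radians.
gamma = (1.0 / np.sqrt(2.0)) * np.exp(-1j * np.pi / 4.0)
assert np.allclose(gamma, 0.5 - 0.5j)

# Training inputs on y = x + 1 (-1 <= x <= 0) ...
x = np.linspace(-1.0, 0.0, 11)
z_in = x + 1j * (x + 1.0)

# ... map onto outputs on y = x (-0.5 <= x <= 0.5): a pure translation z -> z + gamma.
xo = np.linspace(-0.5, 0.5, 11)
z_out = xo + 1j * xo
assert np.allclose(z_out, z_in + gamma)
```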
2. Complex Transformation

This section shows that the Complex-BP can also perform more complicated transformations. The following experiments were conducted under the same conditions as the experiments in Section VI.A.1, except that the training input and output pairs were presented 7,000 times in the training step.
Rotation. First, we conducted an experiment using two rotation angles: π/4 and π/2 radians. Figure 19 shows how the training points mapped onto each other. The (input) points lying along the straight line indicated by solid circles with subscript 1 mapped onto points lying along the straight line indicated by open circles with subscript 1 (denoted as Learning Pattern 1), and the (input) points lying along the straight line indicated by solid circles with subscript 2 mapped onto points lying along the straight line indicated by open circles with subscript 2 (denoted as Learning Pattern 2). In the test step, by presenting the points lying along the six straight lines (indicated by solid triangles with subscripts or superscripts 1–6), the actual output points (indicated by open squares with subscripts or superscripts 1–6) yielded the pattern shown in Figure 19, where the subscript
FIGURE 19 Two rotations: π/4 and π/2 radians. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15.
or superscript denoted a (test) pattern number; for example, a (test) point with subscript 6 belonged to Test Pattern 6. It appears that this complex-valued network has learned to generalize the rotation factor α as a function of the angle θ in the second and third quadrants: a point Zk (= rk exp[iθk]) is transformed into another point Zk exp[iα(θk)] (= rk exp[i(θk + α(θk))]), where α(θk) ≈ π/2 for θk in the second quadrant, and α(θk) ≈ π/4 for θk in the third quadrant. Paradoxically, however, the opposite holds true in the first and fourth quadrants. Note that the (input) points (Test Patterns 2 and 5) lying along the borderline of the two input learning patterns were rotated counterclockwise by approximately 3π/8 = (π/2 + π/4)/2 radians about the origin.

In the above experiment, the 11 training input points lay along the line x = 0 (0 ≤ y ≤ 1) and the 11 training output points lay along the line y = 0 (−1 ≤ x ≤ 0) (Learning Pattern 1); the 11 training input points lay along the line x = 0 (−1 ≤ y ≤ 0) and the 11 training output points lay along the line y = −x (0 ≤ x ≤ 1/√2) (Learning Pattern 2). The 66 test input points lay along the six lines y = x (0 ≤ x ≤ 0.5) (Test Pattern 1), y = 0 (0 ≤ x ≤ 1) (Test Pattern 2), y = −x (0 ≤ x ≤ 0.5) (Test Pattern 3), y = x (−0.5 ≤ x ≤ 0) (Test Pattern 4), y = 0 (−1 ≤ x ≤ 0) (Test Pattern 5), and y = −x (−0.5 ≤ x ≤ 0) (Test Pattern 6).
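The behavior observed in this experiment can be restated as a piecewise rotation factor α(θ) of the input argument θ. The sketch below is a hypothetical summary of that observed rule, not the network itself; the first and fourth quadrants are deliberately left out because the text characterizes them only loosely:

```python
import numpy as np

def alpha(theta: float) -> float:
    """Hypothetical restatement of the rotation factor the network appears to
    have generalized (second and third quadrants plus the borderline only)."""
    if np.isclose(abs(theta), np.pi) or np.isclose(theta, 0.0):
        return 3.0 * np.pi / 8.0   # borderline of the two learning patterns
    if np.pi / 2 <= theta < np.pi:
        return np.pi / 2.0         # second quadrant (incl. Learning Pattern 1 input)
    if -np.pi < theta <= -np.pi / 2:
        return np.pi / 4.0         # third quadrant (incl. Learning Pattern 2 input)
    raise ValueError("first/fourth quadrant behavior is not modeled here")

def transform(z: complex) -> complex:
    return z * np.exp(1j * alpha(float(np.angle(z))))

# Learning Pattern 1: a point on x = 0 (0 <= y <= 1) lands on y = 0 (-1 <= x <= 0).
z1 = transform(0.5j)
assert np.isclose(z1.real, -0.5) and np.isclose(z1.imag, 0.0)

# Learning Pattern 2: a point on x = 0 (-1 <= y <= 0) lands on y = -x.
z2 = transform(-0.5j)
assert np.isclose(z2.real, 0.5 / np.sqrt(2)) and np.isclose(z2.imag, -0.5 / np.sqrt(2))
```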
FIGURE 20 Two similarity transformations: 0.1 and 0.5 similitude ratios. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15.
Similarity Transformation. Next, we examined how a complex-valued neural network that had learned two similitude ratios performed. Figure 20 shows how the training points mapped onto each other. The points lying northeast of the borderline mapped onto points along the same line, but with a scale reduction factor of 2; the points lying southwest of the borderline mapped onto points along the same line, but with a scale reduction factor of 10. In the test step, by presenting the points lying on the outer circle (indicated by solid triangles in Figure 20), the actual output points (indicated by open squares) formed the pattern shown in the figure. It appears that this complex-valued network has learned to generalize the reduction factor α as a function of the angle θ: Zk (= rk exp[iθk]) is transformed into α(θk)Zk (= α(θk)rk exp[iθk]), where α(θk) ≈ 0.5 for θk northeast of the borderline, and α(θk) ≈ 0.1 for θk southwest of the borderline. The angle of each input point, however, is preserved.
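Analogously, the learned scaling can be summarized as an angle-dependent ratio that depends only on which side of the border line y = −x the input lies. The following sketch is a hypothetical restatement of the observed behavior; points exactly on the border are rejected, since the text does not report them:

```python
import numpy as np

def ratio(z: complex) -> float:
    """Hypothetical restatement of the learned similitude ratio: 0.5 for points
    northeast of the border line y = -x, 0.1 for points southwest of it."""
    s = z.real + z.imag          # sign of x + y locates z relative to y = -x
    if s > 0:
        return 0.5
    if s < 0:
        return 0.1
    raise ValueError("behavior on the border line is not reported")

def transform(z: complex) -> complex:
    return ratio(z) * z          # angle preserved, modulus scaled

# Learning Pattern 1 inputs (y = x, first quadrant) shrink to half their length:
z = transform(1.0 + 1.0j)
assert np.isclose(abs(z), 0.5 * abs(1.0 + 1.0j))
assert np.isclose(np.angle(z), np.angle(1.0 + 1.0j))   # angle preserved

# Learning Pattern 2 inputs (y = x, third quadrant) shrink to one tenth:
assert np.isclose(abs(transform(-1.0 - 1.0j)), 0.1 * abs(-1.0 - 1.0j))
```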
In the above experiment, the 11 training input points lay along the line y = x (0 ≤ y ≤ 1) and the 11 training output points lay along the line y = x (0 ≤ x ≤ 0.5) (Learning Pattern 1); the 11 training input points lay along the line y = x (−1 ≤ y ≤ 0) and the 11 training output points lay along the line y = x (−0.1 ≤ x ≤ 0) (Learning Pattern 2). The 24 test input points lay on the circle x² + y² = 1.
Parallel Displacement. Finally, we performed an experiment on downward parallel displacement. Figure 21 shows how the training points mapped onto each other. The (input) points lying along the straight line indicated by solid circles with superscript 1 mapped onto points lying along the straight line indicated by open circles with superscript 1, where the distance the network should learn was 0.4 (Learning Pattern 1). The input points lying along the straight line indicated by solid circles with subscript 2 mapped onto points lying along the straight line indicated by open circles with subscript 2, where the distance the network should learn was 0.8 (Learning Pattern 2). In the test step, by presenting the points lying along the two straight lines (indicated by solid triangles with superscripts 1 and 2), the actual output points (indicated by open squares with superscripts 1 and 2) took the
FIGURE 21 Two parallel displacements. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15.
pattern as shown, where the superscript denoted a (test) pattern number; for example, a (test) point with superscript 2 belonged to Test Pattern 2. It appears that this complex-valued network has learned to generalize the parallel displacement factor α as a function of the distance d from the real axis: a point Zk is transformed into another point Zk + α(dk), where α(dk) ≈ −0.4i for dk ≈ 1.0, and α(dk) ≈ −0.8i for dk ≈ 0.2.

In the above experiment, the 21 training input points lay along the line y = 1 (−1 ≤ x ≤ 1) and the 21 training output points lay along the line y = 0.6 (−1 ≤ x ≤ 1) (Learning Pattern 1); the 21 training input points lay along the line y = 0.2 (−1 ≤ x ≤ 1) and the 21 training output points lay along the line y = −0.6 (−1 ≤ x ≤ 1) (Learning Pattern 2). The 42 test input points lay along the two lines y = 0.8 (−1 ≤ x ≤ 1) (Test Pattern 1) and y = 0.2 (−1 ≤ x ≤ 1) (Test Pattern 2).
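The two learned displacements can be checked directly from the quoted line coordinates, and the distance-dependent factor α(d) can be summarized as a simple rule. The threshold used below to separate the two regimes is our assumption; the text only reports the values at d ≈ 1.0 and d ≈ 0.2:

```python
import numpy as np

# Learning Pattern 1: y = 1 -> y = 0.6, a displacement of -0.4i.
# Learning Pattern 2: y = 0.2 -> y = -0.6, a displacement of -0.8i.
x = np.linspace(-1.0, 1.0, 21)
in1, out1 = x + 1.0j, x + 0.6j
in2, out2 = x + 0.2j, x - 0.6j
assert np.allclose(out1, in1 - 0.4j)
assert np.allclose(out2, in2 - 0.8j)

def alpha(d: float) -> complex:
    """Hypothetical restatement of the displacement factor alpha(d) the network
    appears to have generalized from the distance d to the real axis.
    The cutoff 0.6 (midway between the two training distances) is an assumption."""
    return -0.4j if d >= 0.6 else -0.8j

assert alpha(1.0) == -0.4j and alpha(0.2) == -0.8j
```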
B. Systematic Evaluation

This section reports a systematic investigation of the generalization ability of the Complex-BP algorithm on the transformation of geometric figures. In the experiments (using 1-1-1 and 1-6-1 networks, and the Complex-BP), training input and output pairs were as follows:

1. Rotation. The input set of (straight-line) points (indicated by solid circles in Figure 22a) gave as output the (straight-line) points (indicated by open circles) rotated counterclockwise over π/2 radians.

2. Similarity transformation. The input set of (straight-line) points (indicated by solid circles in Figure 22b) gave as output the half-scaled straight-line points (indicated by open circles).

3. Parallel displacement. The input set of (straight-line) points (indicated by solid circles in Figure 22c) gave as output the straight-line points displaced in parallel (indicated by open circles), where the distance of the parallel displacement was 1/√2, and the direction was at a −π/4-radian angle.

Note that the purpose of using a 1-1-1 network was to investigate the degree of approximation of the results of the mathematical analysis presented in Section VI.C. For each of the above three cases, the input (test) points lying on a circle (indicated by solid triangles in Figure 22) were presented in a second (test) step. We then evaluated the generalization ability of the Complex-BP on a rotation angle, a similitude ratio, and a parallel displacement vector. The evaluation results are shown in Figure 23. The vertical axis of Figure 23 denotes the generalization performance, and the horizontal axis the difference φ between the argument of a test point and
FIGURE 22 Learning and test patterns for the systematic investigation of the generalization ability of the Complex-BP. The circles and triangles (solid or open) have the same meanings as in Figure 15. (a) Rotation. (b) Similarity transformation. (c) Parallel displacement. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
that of an input training point. Error in Figure 23 refers to the left side of Eq. (38), that is, the error between the desired output patterns and the actual output patterns in the training step. As evaluation criteria of the generalization performance, we used |ER(φ)|, |ES(φ)|, and |EP(φ)|, which denote the Euclidean distances between the actual output test points and the expected output test points (see Figure 23). As shown in Figure 23, the generalization error increased as the distance between the test point and the input training point became larger (i.e., as φ became larger), and it reached its maximum value around the point that yielded the largest distance (φ ≈ 180 degrees). Furthermore, it decreased again as the test point approached the input training point. Figure 23 also suggests
FIGURE 23 Results of the evaluation of the generalization ability of the Complex-BP. |ER(φ)|, |ES(φ)|, and |EP(φ)| denote the Euclidean distances between the actual output test point and the expected output test point in the test step; φ denotes the difference between the argument of a test point and that of an input training point. Error refers to the left side of Eq. (38), i.e., the error between the desired output patterns and the actual output patterns in the training step. (a) Rotation, 1-1-1 network. (b) Similarity transformation, 1-1-1 network. (c) Parallel displacement, 1-1-1 network. (d) Rotation, 1-6-1 network. (e) Similarity transformation, 1-6-1 network. (f) Parallel displacement, 1-6-1 network. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
that the generalization error on the transformation of geometric figures decreases as the number of hidden neurons increases: only one hidden neuron was used in the three experiments of Figures 23a–c, whereas six hidden neurons were used in the three experiments of Figures 23d–f.

In the above experiments, the 11 training input points lay on the line x = 0 (0 ≤ y ≤ 1) and the 11 training output points lay on the line y = 0 (−1 ≤ x ≤ 0) for the rotation; the 11 training input points lay on the line x = 0 (0 ≤ y ≤ 1) and the 11 training output points lay on the line x = 0 (0 ≤ y ≤ 0.5) for the similarity transformation; and the 11 training input points lay on the line x = 0 (0 ≤ y ≤ 1) and the 11 training output points lay on the line x = 0.5 (−0.5 ≤ y ≤ 0.5) for the parallel displacement. The eight test input points lay on the circle x² + y² = 0.5² for all three cases.
C. Mathematical Analysis

This section presents a mathematical analysis of the behavior of a complex-valued neural network that has learned the concept of rotation, similarity transformation, or parallel displacement using the Complex-BP algorithm. We introduce a simple 1-1-1 three-layered complex-valued network for the analysis. We use v exp[iw] ∈ C for the weight between the input and hidden neurons, c exp[id] ∈ C for the weight between the hidden and output neurons, s exp[it] ∈ C for the threshold of the hidden neuron, and r exp[il] ∈ C for the threshold of the output neuron. Let v⁰ exp[iw⁰], c⁰ exp[id⁰], s⁰ exp[it⁰], and r⁰ exp[il⁰] denote the learned values of the learnable parameters. We define the following constants in advance:
$$
K = \frac{1}{(1+\sqrt{2})c^0 + 2r^0}, \qquad
G = \frac{kac^0v^0}{2(kav^0 + s^0)}, \qquad
A = \frac{c^0s^0}{2(kav^0 + s^0)}, \qquad
B = \frac{c^0}{\sqrt{2}}, \qquad
C = r^0,
$$
$$
H_R = A\cos(t^0 + d^0) + B\cos\!\left(d^0 + \frac{\pi}{4}\right) + C\cos l^0,
$$
$$
H_I = A\sin(t^0 + d^0) + B\sin\!\left(d^0 + \frac{\pi}{4}\right) + C\sin l^0,
$$
$$
M = 2K\sqrt{H_R^2 + H_I^2},
$$
$$
M' = 2\Bigl\{ K^2\Bigl[ A^2 + B^2 + C^2 + 2AB\cos\!\left(t^0 - \frac{\pi}{4}\right) + 2AC\cos(t^0 + d^0 - l^0) + 2BC\cos\!\left(d^0 + \frac{\pi}{4} - l^0\right) + \frac{\tau^2}{4} \Bigr]
- \tau K\Bigl[ A\cos(t^0 + d^0 - \omega) + B\cos\!\left(d^0 + \frac{\pi}{4} - \omega\right) + C\cos(l^0 - \omega) \Bigr] \Bigr\}^{1/2},
\tag{42}
$$
where τ and ω denote the modulus and argument of the parallel displacement vector γ = τ exp[iω] introduced in the analysis of parallel displacement below.
FIGURE 24 Learning and test patterns used in the mathematical analysis of the behavior of a Complex-BP network that has learned the counterclockwise rotation of the points in the complex plane by α radians around the origin. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
First, we investigate the behavior of the Complex-BP network that has learned rotation. Learning patterns are as follows: p points with equal intervals on a straight line that forms an angle of x radians with the real axis are rotated counterclockwise over α radians in the complex plane (Figure 24). That is, there are p training points such that for any 1 ≤ k ≤ p, an input point can be expressed as
ka exp[ix],
(43)
and the corresponding output point as
$$
\frac{1}{2}ka\exp[i(x+\alpha)] + \frac{1}{\sqrt{2}}\exp\!\left[i\frac{\pi}{4}\right],
\tag{44}
$$
where k, p ∈ N (N denotes the set of natural numbers), a, x, α ∈ R+, and 0 < pa ≤ 1, which limits the values of the learning patterns to the range from −1 to 1 in the complex plane; the constant a denotes the interval
between points. Note that although the output points take a value z within the range −1 ≤ Re[z], Im[z] ≤ 1, we transformed them to take a value z within the range 0 ≤ Re[z], Im[z] ≤ 1, because the Complex-BP network generates values z within the range 0 ≤ Re[z], Im[z] ≤ 1. For this reason, Eq. (44) appears somewhat complicated. The following theorem explains the qualitative properties of the generalization ability of the Complex-BP on a rotation angle.

Theorem 3. Fix 1 ≤ k ≤ p arbitrarily. To the Complex-BP network that has learned the training points [Eqs. (43) and (44)], a test point ka exp[i(x + φ)] is given, which can be obtained by a counterclockwise rotation of an input training point ka exp[ix] by an arbitrary φ radians around the origin (Figure 24). Then, the network generates the following value:
$$
\frac{1}{2}ka\exp[i(x+\phi+\alpha)] + \frac{1}{\sqrt{2}}\exp\!\left[i\frac{\pi}{4}\right] + E^R(\phi) \in \mathbf{C}.
\tag{45}
$$
The first term of Eq. (45) refers to the point that can be obtained by the counterclockwise rotation of the test point ka exp[i(x + φ)] by α radians around the origin (Figure 24). Note that α is the angle that the network has learned. The second term ER(φ) is a complex number that denotes the error, and its absolute value, called the generalization error on angle, is given by the following expression:
$$
|E^R(\phi)| = M\left|\sin\frac{\phi}{2}\right|,
\tag{46}
$$
where M is a constant.

A technical result is needed to prove Theorem 3.

Lemma 1. For any 1 ≤ k ≤ p, the following approximate equations hold:
$$
K\bigl[G\cos(x + w^0 + d^0) + H_R\bigr] + \frac{1}{2} = \frac{1}{2}ka\cos(x+\alpha) + \frac{1}{2},
\tag{47}
$$
$$
K\bigl[G\sin(x + w^0 + d^0) + H_I\bigr] + \frac{1}{2} = \frac{1}{2}ka\sin(x+\alpha) + \frac{1}{2}.
\tag{48}
$$
Proof. For any 1 ≤ k ≤ p, by computing the output value of the Complex-BP network for the input training point ka exp[ix], we find that the real part of the output value is equal to the left side of Eq. (47) and the imaginary part is equal to the left side of Eq. (48). In the above computations, the sigmoid function in the activation function [Eq. (2)] of each neuron was
approximated by the following piecewise linear functions:
$$
g(x) =
\begin{cases}
\dfrac{x}{2(kav^0 + s^0)} + \dfrac{1}{2} & \bigl(-(kav^0 + s^0) \le x \le kav^0 + s^0\bigr) \\[4pt]
1 & \bigl(kav^0 + s^0 < x\bigr) \\[2pt]
0 & \bigl(x < -(kav^0 + s^0)\bigr)
\end{cases}
\tag{49}
$$
for the hidden neuron, and
$$
h(x) =
\begin{cases}
\dfrac{x}{(1+\sqrt{2})c^0 + 2r^0} + \dfrac{1}{2} & \Bigl(-\bigl(\tfrac{1+\sqrt{2}}{2}c^0 + r^0\bigr) \le x \le \tfrac{1+\sqrt{2}}{2}c^0 + r^0\Bigr) \\[4pt]
1 & \Bigl(\tfrac{1+\sqrt{2}}{2}c^0 + r^0 < x\Bigr) \\[2pt]
0 & \Bigl(x < -\bigl(\tfrac{1+\sqrt{2}}{2}c^0 + r^0\bigr)\Bigr)
\end{cases}
\tag{50}
$$
for the output neuron. Conversely, the real part and the imaginary part of the output value of the complex-valued neural network for the input training point should be equal to the real part and the imaginary part of the output training point (1/2)ka exp[i(x + α)] + (1/√2) exp[i(π/4)], respectively. This concludes the proof of Lemma 1.

Proof of Theorem 3. Theorem 3 is proved according to the following policy. Using Eqs. (47) and (48) in Lemma 1, we compute the output value of the Complex-BP network for the test point ka exp[i(x + φ)], and transform it into [the point generated by rotating the test point counterclockwise over α radians] + [the error]. First, we compute the real part of the output value when the test point ka exp[i(x + φ)] is fed into the Complex-BP network. Using the equation
$$
\cos\theta - \lambda\sin\theta = \sqrt{1+\lambda^2}\,\cos(\theta+\phi)
\tag{51}
$$
for any θ, where λ = tan φ, and by computing [ Eq. (47) ] − λ·[ Eq. (48)], we derive
$$
K\bigl[G\cos(x + \phi + w^0 + d^0) + H_R\bigr] + \frac{1}{2}
= \frac{1}{2}ka\cos(x+\phi+\alpha) + \frac{1}{2} + E^R_{re}(\phi),
\tag{52}
$$
where
$$
E^R_{re}(\phi) = 2K\sin\frac{\phi}{2}\,\Bigl[ A\sin\Bigl(t^0 + d^0 + \frac{\phi}{2}\Bigr) + B\sin\Bigl(d^0 + \frac{\pi}{4} + \frac{\phi}{2}\Bigr) + C\sin\Bigl(l^0 + \frac{\phi}{2}\Bigr) \Bigr].
\tag{53}
$$
Note that the left side of Eq. (52) refers to the real part of the output value of the Complex-BP network for the test point, and the first term of the right side of Eq. (52) refers to the real part of the point ka exp[i(x + φ + α)] generated by the counterclockwise rotation of the test point over α radians. Finally, ER_re(φ) refers to the real part of a complex number that denotes the error. Similarly, using the equation
$$
\lambda\cos\theta + \sin\theta = \sqrt{1+\lambda^2}\,\sin(\theta+\phi)
\tag{54}
$$
for any θ, and by computing λ·[ Eq. (47) ] + [ Eq. (48) ], we derive
$$
K\bigl[G\sin(x + \phi + w^0 + d^0) + H_I\bigr] + \frac{1}{2}
= \frac{1}{2}ka\sin(x+\phi+\alpha) + \frac{1}{2} + E^R_{im}(\phi),
\tag{55}
$$
where
$$
E^R_{im}(\phi) = -2K\sin\frac{\phi}{2}\,\Bigl[ A\cos\Bigl(t^0 + d^0 + \frac{\phi}{2}\Bigr) + B\cos\Bigl(d^0 + \frac{\pi}{4} + \frac{\phi}{2}\Bigr) + C\cos\Bigl(l^0 + \frac{\phi}{2}\Bigr) \Bigr].
\tag{56}
$$
Note that the left side of Eq. (55) refers to the imaginary part of the output value of the Complex-BP network for the test point, and the first term of the right side of Eq. (55) refers to the imaginary part of the point ka exp[i(x + φ + α)] generated by the counterclockwise rotation of the test point over α radians. Finally, ER_im(φ) refers to the imaginary part of a complex number that denotes the error. Hence, it follows from Eqs. (52) and (55) that the output value of the Complex-BP network for the test point can be expressed as
$$
\frac{1}{2}ka\exp[i(x+\phi+\alpha)] + \frac{1}{\sqrt{2}}\exp\!\left[i\frac{\pi}{4}\right] + E^R(\phi),
\tag{57}
$$
where
$$
E^R(\phi) \stackrel{\mathrm{def}}{=} E^R_{re}(\phi) + iE^R_{im}(\phi);
\tag{58}
$$
that is, the Complex-BP network rotates the test point ka exp[i(x + φ)] counterclockwise over α radians with the error ER (φ), and Eq. (46) follows from Equations (53) and (56).
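Equation (46) can also be verified numerically from Eqs. (53) and (56): up to the common factor 2K sin(φ/2), ER_re and ER_im are the imaginary and negated real parts of a single phasor sum whose real and imaginary parts are H_R and H_I, so the magnitude of the error collapses to M|sin(φ/2)|. The following sketch (ours, with arbitrary stand-in values for the learned constants) checks this identity:

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary stand-ins for the learned constants; any values exercise the identity.
K, A, B, C = rng.uniform(0.1, 1.0, size=4)
t0, d0, l0 = rng.uniform(-np.pi, np.pi, size=3)

# H_R and H_I are the real and imaginary parts of one phasor sum S.
S = A * np.exp(1j * (t0 + d0)) + B * np.exp(1j * (d0 + np.pi / 4)) + C * np.exp(1j * l0)
H_R, H_I = S.real, S.imag
M = 2.0 * K * np.hypot(H_R, H_I)

phi = np.linspace(0.0, 2.0 * np.pi, 361)
h = phi / 2.0
# Eq. (53): real part of the error.
E_re = 2.0 * K * np.sin(h) * (A * np.sin(t0 + d0 + h)
                              + B * np.sin(d0 + np.pi / 4 + h)
                              + C * np.sin(l0 + h))
# Eq. (56): imaginary part of the error.
E_im = -2.0 * K * np.sin(h) * (A * np.cos(t0 + d0 + h)
                               + B * np.cos(d0 + np.pi / 4 + h)
                               + C * np.cos(l0 + h))

# Eq. (46): |E^R(phi)| = M |sin(phi/2)|, maximal at phi = pi (180 degrees).
assert np.allclose(np.hypot(E_re, E_im), M * np.abs(np.sin(h)))
```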
The above theorem tells us that the generalization error on angle |ER(φ)| increases as the distance between the test point and the input training point increases (i.e., as φ becomes larger), and it takes the maximum value M at the point that gives the largest distance (φ = 180 degrees). Furthermore, it decreases as the test point comes closer to the input training point.

Remark. The value of M differs with each learning run because M depends on the values of the learnable parameters after learning; that is, M is a constant in the world after learning, in which the learnable parameters are fixed. Therefore, M is a constant, not a function of α, in Theorem 3, where a situation after one learning run with a fixed value of α is assumed.

Next, we explain the behavior of the Complex-BP network that has learned a similarity transformation. We use the following learning patterns: p points with equal intervals a on a straight line that forms an angle of x radians with the real axis are transformed into the points obtained by the similarity transformation with the similitude ratio β in the complex plane (Figure 25). That is, there are p training points; for any 1 ≤ k ≤ p, an input point can be expressed as
ka exp[ix],
(59)
and the corresponding output point as
$$
\frac{1}{2}ka\beta\exp[ix] + \frac{1}{\sqrt{2}}\exp\!\left[i\frac{\pi}{4}\right],
\tag{60}
$$
where k, p ∈ N, a, x, β ∈ R+, 0 < paβ ≤ 1, which limits the values of learning patterns to the range from −1 to 1 in the complex plane. Note that although the output points take a value z within the range −1 ≤ Re[z], Im[z] ≤ 1, we transformed them as having a value z within the range 0 ≤ Re[z], Im[z] ≤ 1 as in the case of rotation. The following theorem shows the qualitative property of the generalization ability of the Complex-BP on a similitude ratio.

Theorem 4. Fix 1 ≤ k ≤ p arbitrarily. To the Complex-BP network that has learned the training points [Eqs. (59) and (60)], a test point ka exp[i(x + φ)] is given that can be obtained by a counterclockwise rotation of an input training point ka exp[ix] by an arbitrary φ radians around the origin (Figure 25). Then, the network generates the following value:
$$
\frac{1}{2}ka\beta\exp[i(x+\phi)] + \frac{1}{\sqrt{2}}\exp\!\left[i\frac{\pi}{4}\right] + E^S(\phi) \in \mathbf{C}.
\tag{61}
$$
FIGURE 25 Learning and test patterns used in the mathematical analysis of the behavior of a Complex-BP network that has learned the similarity transformation with the similitude ratio β in the complex plane. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
The first term of Eq. (61) refers to the point that can be obtained by the similarity transformation of the test point ka exp[i(x + φ)] on the distance from the origin with the similitude ratio β in the complex plane (Figure 25). Note that β is the similitude ratio that the network has learned. The second term ES(φ) is a complex number that denotes the error, and its absolute value, called the generalization error on similitude ratio, is given by the following expression:
$$
|E^S(\phi)| = M\left|\sin\frac{\phi}{2}\right|.
\tag{62}
$$
Remark. For the same reason as Theorem 3, M is a constant, not a function of β, in Theorem 4 where a situation after one learning with a fixed value of β is assumed. The proof is omitted, because it can be done in the same way as Theorem 3. We derive from Theorem 4 that the generalization error on similitude ratio |ES (φ)| increases as the distance between the test point and the input training point increases (i.e., φ becomes larger), and it takes the
maximum value M at the point that gives the largest distance (φ = 180 degrees). Furthermore, it decreases as the test point approaches the input training point.

Finally, we show the behavior of the Complex-BP network that has learned parallel displacement. We use the following learning patterns: p points with equal intervals a on a straight line that forms an angle of x radians with the real axis are transformed into the points that can be obtained by the parallel displacement with a complex number γ = τ exp[iω] (called the parallel displacement vector here) determining the direction and distance in the complex plane (Figure 26). That is, there are p training points; for any 1 ≤ k ≤ p, an input point can be expressed as
ka exp[ix],
(63)
and the corresponding output point as
$$
\frac{1}{2}\bigl(ka\exp[ix] + \gamma\bigr) + \frac{1}{\sqrt{2}}\exp\!\left[i\frac{\pi}{4}\right],
\tag{64}
$$
FIGURE 26 Learning and test patterns used in the mathematical analysis of the behavior of a Complex-BP network that has learned the parallel displacement of the points with the parallel displacement vector γ in the complex plane. The circles, triangles, and squares (solid or open) have the same meanings as in Figure 15. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
202
Tohru Nitta
where k, p ∈ N, a, x ∈ R+ , γ ∈ C and
−1 ≤ Re[pa exp[ix] + γ], Im[pa exp[ix] + γ] ≤ 1,
(65)
which limits the values of the learning patterns to the range from −1 to 1 in the complex plane. Note that although the output points take a value z within the range −1 ≤ Re[z], Im[z] ≤ 1, we transformed them to take a value z within the range 0 ≤ Re[z], Im[z] ≤ 1, as in the previous cases. We can obtain the following theorem, which clarifies the qualitative property of the generalization ability of the Complex-BP on a parallel displacement vector.

Theorem 5. Fix 1 ≤ k ≤ p arbitrarily. To the Complex-BP network that has learned the training points [Equations (63) and (64)], a test point ka exp[i(x + φ)] is given that can be obtained by a counterclockwise rotation of an input training point ka exp[ix] by arbitrary φ radians around the origin (Figure 26). Then, the network generates the following value:
(1/2)(ka exp[i(x + φ)] + γ) + (1/√2) exp[iπ/4] + EP(φ) ∈ C.   (66)
The first term of Eq. (66) refers to the point that can be obtained by the parallel displacement of the test point ka exp[i(x + φ)] with the parallel displacement vector γ (Figure 26). Note that γ is the parallel displacement vector that the network has learned. Also, the second term EP (φ) is a complex number that denotes the error, and the absolute value called the generalization error on parallel displacement vector is given in the following expression:
|EP(φ)| = M sin(φ/2).   (67)
Remark. For the same reason as in Theorem 3, M is a constant, not a function of γ, in Theorem 5, where a situation after one learning run with a fixed value of γ is assumed. This theorem can be obtained in the same manner as Theorem 3; the proof is therefore omitted. Theorem 5 indicates that the generalization error on the parallel displacement vector |EP(φ)| increases as the distance between the test point and the input training point becomes larger (i.e., as φ becomes larger), and it attains the maximum value M at the point that gives the largest distance (φ = 180°). Furthermore, it decreases as the test point approaches the input training point.
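As a quick illustration, the common error law of Theorems 3, 4, and 5 can be tabulated numerically. This is only a sketch: the constant M = 0.49 is borrowed from the theoretical parallel-displacement value in Table VII, and any positive M gives the same shape.

```python
import math

# Illustrative sketch of the generalization-error curve shared by
# Theorems 3, 4, and 5: |E(phi)| = M * sin(phi / 2).
# M = 0.49 is the theoretical parallel-displacement value of Table VII;
# the shape of the curve does not depend on this choice.
M = 0.49

for deg in range(0, 361, 45):
    phi = math.radians(deg)
    print(f"phi = {deg:3d} deg,  |E| = {M * math.sin(phi / 2):.3f}")
# The error grows monotonically from 0 at phi = 0 to its maximum M at
# phi = 180 degrees, then shrinks again as the test point returns
# toward the training point.
```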
D. Discussion

This section discusses the experimental and mathematical results described in Sections VI.A, VI.B, and VI.C on the ability of the Complex-BP algorithm to generalize transformations. As seen in Section VI.A, 1-n-1 Complex-BP networks have the ability to generalize the transformation of geometric figures. This brings us to a second question: do 1-n-1 Complex-BP networks also have the usual generalization ability that Real-BP networks have? At first glance, the Real-BP and Complex-BP algorithms appear to have two different generalization abilities. To determine whether this is true, we tested the generalization ability of the 1-n-1 Complex-BP network on a continuous mapping task that 2-m-2 Real-BP networks could solve, which appeared in Tsutsumi (1989). In the experiments, the set of 25 training points shown in Figure 27 was used for a 1-6-1 Complex-BP network and a 2-12-2 Real-BP network. After sufficient training, when the 252 test points on the 12 dotted lines shown in Figure 28 were presented, the actual output points formed the solid lines shown in Figures 28a and b. Figure 29 shows the case in which the input training points are those of Figure 27b and the target training points are those of Figure 27a. Figures 28 and 29 suggest that the 1-6-1 Complex-BP network and the 2-12-2 Real-BP network achieve the same degree of generalization.

We next discuss some examples of transformations described in Section VI.A.1. The counterclockwise rotation of a point (x, y) in the complex plane by θ radians around the origin corresponds to multiplying the complex number z1 = x + iy by the complex number z2 = exp[iθ], which has a radius of 1 and an argument of θ radians. That is, z1 z2 denotes the point generated by the counterclockwise rotation of the point (x, y) by θ radians around the origin.
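This correspondence is easy to verify numerically. The following sketch rotates a point by θ = π/2 (the rotation of Figure 15); the point 0.3 + 0.4i is an arbitrary illustrative choice.

```python
import cmath

# Rotation as complex multiplication: multiplying z1 = x + iy by
# z2 = exp(i*theta) rotates the point (x, y) counterclockwise by
# theta radians about the origin.
theta = cmath.pi / 2               # the rotation learned in Figure 15
z1 = 0.3 + 0.4j                    # the point (x, y) = (0.3, 0.4)
z2 = cmath.exp(1j * theta)         # radius 1, argument theta
print(z1 * z2)                     # approximately -0.4 + 0.3j
```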
Furthermore, the similarity transformations (reduction and magnification) and the parallel displacement of a point (x, y) in the complex plane correspond, respectively, to (1) multiplying the complex number z1 = x + iy by a real number α, and (2) adding a complex number w to z1 = x + iy. We therefore believe that the complex-valued neural network has learned the complex function g(z) = z exp[iθ], g(z) = αz, or g(z) = z + w (for rotation, similarity transformation, and translation, respectively) in the experiments of Section VI.A.1. For example, θ = π/2 in Figure 15 (rotation), α = 0.5 in Figure 17a (reduction), and w = 0.5 − 0.5i in Figure 18a (parallel displacement). It should be noted that although the domain of the complex function g is [−1, 1] × [−1, 1], the neural network has learned nothing but some points on a certain straight line in that domain. The neural network was presented with a sequence of points on that line; nevertheless, its responses to the presentation of all points in the domain were nearly the values of the learned complex function g. This behavior of complex-valued neural networks closely resembles the identity theorem in complex analysis, which we now state.
FIGURE 27 Learning pattern for the comparison of the usual generalization performance of the 1-n-1 Complex-BP network and the 2-m-2 Real-BP network. The solid circles and the open circles denote the training points. (a) Learning pattern #1. (b) Learning pattern #2. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 28 Results of the comparison of the usual generalization performance of the 1-n-1 Complex-BP network and the 2-m-2 Real-BP network (Input: Learning pattern #1 in Figure 27a; Target: Learning pattern #2 in Figure 27b). The 12 dotted lines denote the input test pattern, and the solid lines the output test pattern. (a) Network output by the Complex-BP. (b) Network output by the Real-BP. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
FIGURE 29 Results of the comparison of the usual generalization performance of the 1-n-1 Complex-BP network and the 2-m-2 Real-BP network (Input: Learning pattern #2 in Figure 27b; Target: Learning pattern #1 in Figure 27a). The 12 dotted lines denote the input test pattern, and the solid lines the output test pattern. (a) Network output by the Complex-BP. (b) Network output by the Real-BP. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
Theorem 6 (The Identity Theorem). Let F and G be regular functions on a complex domain D. If F(z) = G(z) on a given line in D, then F(z) = G(z) identically on D.

This theorem describes a phenomenon found in complex analysis that has no counterpart in real analysis. We interpret the behavior of the Complex-BP, as shown in Section VI.A.1, in terms of the identity theorem. We assume that the training points are obtained from the points on a given (straight) line in the domain of the true complex function F : [−1, 1] × [−1, 1] → [−1, 1] × [−1, 1]. The neural network approximates F on the basis of the given training points, resulting in a complex function G (where G(z) should equal F(z), at least at the training points). Then, for all z in the complex domain, the neural network generates a point G(z) that is close to F(z), as though it satisfied the identity theorem. We believe that Complex-BP networks satisfy the identity theorem in this sense; that is, Complex-BP networks can approximate complex functions by being trained on only part of the domain of those functions. On the other hand, as seen in Section VI.C, the generalization error of the Complex-BP on the transformation of geometric figures can be represented by the sine of the difference between the argument of the test point and that of the input training point. These mathematical results agree qualitatively with the simulation results in Section VI.B: the generalization error increases as the distance between the test point and the input training point increases, assumes the maximum value M around the point that gives the largest distance, and decreases as the test point approaches the input training point. We also compared the theoretical and the experimental values of M based on the simulation results (using 1-1-1 networks) in Section VI.B. Table VII shows that there are some discrepancies between the theoretical values and the experimental values of M.
It is assumed that the cause of such errors is that Theorems 3, 4, and 5 are approximations (i.e., the sigmoid function in the output function of each neuron was approximated by the piecewise linear function).

TABLE VII Comparison of the Theoretical Values and the Experimental Values of M

Type of transformation       Theoretical value   Experimental value
Rotation                     0.19                0.35
Similarity transformation    0.02                0.03
Parallel displacement        0.49                0.27

Learning was stopped when the error between the desired output pattern and the actual output pattern was equal to 0.06 in the case of rotation, and 0.02 in the cases of similarity transformation and parallel displacement. Reprinted from Nitta, T. (1997). © 1997, with permission from Elsevier.
Thus, we can conclude that although there are approximation errors, Theorems 3, 4, and 5 clarify the qualitative property of the generalization ability of the Complex-BP on the transformation of geometric figures. The mathematical analysis in Section VI.C is restricted to a class of transformations of geometric figures such that the training points lie on a line starting from the origin, and the test points are obtained by an arbitrary counterclockwise rotation of the input training points. Thus, Theorems 3, 4, and 5 are applicable only to Figures 16 and 17 of the simulation results (Figures 15–18) presented in Section VI.A.1 (note that the training points in Figure 16 are not precise). Judging from the method of the analysis, it seems reasonable to suppose that similar theorems, not only for Figures 15 and 18 but also for more general networks such as 1-n-1 and 1-n1-n2-···-nk-1 networks, can be proved using the approach of Theorems 3, 4, and 5. However, the following problems may occur: (1) unlike in Theorems 3, 4, and 5, the generalization error on the transformation of geometric figures may not be simply representable by the sine; and (2) the approximation errors increase as the number of neurons increases, because the sigmoid function in the output function of each neuron is approximated by piecewise linear functions. Another approach is needed to solve these problems.

The ability to transform geometric figures in a way that the Real-BP cannot is an inherent property of the Complex-BP algorithm. On the other hand, we have experimentally clarified in Section V and at the beginning of this section that the Complex-BP algorithm has the same degree of usual generalization performance as the Real-BP. The question now arises: if the Complex-BP has this inherent generalization ability (such as the ability to transform geometric figures), why does it have the same degree of usual generalization performance as the Real-BP?
Our answer to this question is given below. The learning patterns used in the experiments on transforming geometric figures (in Sections VI.A.1 and VI.B) were all very specific, in the sense that some ten learning patterns with narrow intervals (i.e., high density) were massed on part of the plane (see Figures 15–18 and 22). However, the learning patterns used in the experiments on the learning speed in Section IV and on the usual generalization ability in Section V and at the beginning of this section all had only low specificity, as detailed below:

1. Only four learning patterns with wide intervals (i.e., low density) were scattered on the plane in Experiment 1 of Sections IV and V; that is, only a few learning patterns were used, and the distances between them were large (see Table I and Figures 11a and b).

2. The learning patterns were evenly distributed on the plane in the experiment of this section, although many learning patterns with high density were used (see Figure 27).
In addition, we cannot examine the reasons why the Complex-BP algorithm achieved the same degree of usual generalization performance as the Real-BP algorithm in Experiment 2 of Section V, because the 2-4-1 Complex-BP network used there was substantially different from the 1-n-1 Complex-BP network used for transforming geometric figures. It is fairly certain that the inherent generalization ability to transform geometric figures is unique to the 1-n-1 network structure. Thus, we believe that the structures of the network and of the learning patterns caused the experimental result that the Complex-BP achieved the same degree of usual generalization performance as the Real-BP. In other words, the use of very specific learning patterns in the particular 1-n-1 network made the inherent ability to generalize emerge.

There have been two applications of the ability of the Complex-BP network to transform geometric figures. One concerns optical flows, and the other relates to fractal images. Watanabe et al. (1994) applied the Complex-BP network in the computer vision field. They successfully used the ability of the Complex-BP network to transform geometric figures to complement the optical flow (the 2D velocity vector field on an image). The ability of the Complex-BP to transform geometric figures can also be used to generate fractal images. Miura and Aiyoshi (2003) applied the Complex-BP to the generation of fractal images and showed in computer simulations that some fractal images (such as the snow crystal) could be obtained with high accuracy, where the iterated function systems (IFS) were constructed using the ability of the Complex-BP to transform geometric figures.
The search for various other tasks in which the behavior of the Complex-BP network is dramatically different from that of the Real-BP network (for example, something the Real-BP can do that the Complex-BP cannot) is a future research topic, which will greatly expand the world of complex-valued neural networks. It should be noted that it cannot always be said that other inherent abilities of the Complex-BP algorithm, which may be discovered in the future, are superior to those of the Real-BP, because the superiority of the Complex-BP over the Real-BP depends on the specific problems and the manner in which the algorithms are applied.
VII. ORTHOGONALITY OF DECISION BOUNDARIES IN THE COMPLEX-VALUED NEURON

A decision boundary is a boundary by which a pattern classifier (such as the Real-BP) classifies input patterns; it generally consists of hypersurfaces. The decision boundaries of real-valued neural networks have been examined empirically by Lippmann (1987). This section mathematically analyzes
the decision boundaries of the complex-valued neuron and presents their utility.
A. Mathematical Analysis

We analyze the decision boundary of a single complex-valued neuron. Let the weights be denoted by w = t[w1 · · · wn] = w^r + iw^i, w^r = t[w^r_1 · · · w^r_n], w^i = t[w^i_1 · · · w^i_n], and let the threshold be denoted by θ = θ^r + iθ^i. Then, for n input signals (complex numbers) z = t[z1 · · · zn] = x + iy, x = t[x1 · · · xn], y = t[y1 · · · yn], the complex-valued neuron generates
X + iY = fR(t w^r x − t w^i y + θ^r) + i fR(t w^i x + t w^r y + θ^i)   (68)

as an output. Here, for any two constants CR, CI ∈ (0, 1), let

X(x, y) = fR(t w^r x − t w^i y + θ^r) = CR,   (69)

Y(x, y) = fR(t w^i x + t w^r y + θ^i) = CI.   (70)
Note that Eq. (69) is the decision boundary for the real part of an output of the complex-valued neuron with n-inputs. That is, input signals (x, y) ∈ R2n are classified into two decision regions {(x, y) ∈ R2n |X(x, y) ≥ CR } and {(x, y) ∈ R2n |X(x, y) < CR } by the hypersurface given by Eq. (69). Similarly, Eq. (70) is the decision boundary for the imaginary part. The normal vectors QR (x, y) and QI (x, y) of the decision boundaries [Equations (69) and (70)] are given by
QR(x, y) = (∂X/∂x1 · · · ∂X/∂xn  ∂X/∂y1 · · · ∂X/∂yn)
         = fR′(t w^r x − t w^i y + θ^r) · [t w^r  −t w^i],   (71)

QI(x, y) = (∂Y/∂x1 · · · ∂Y/∂xn  ∂Y/∂y1 · · · ∂Y/∂yn)
         = fR′(t w^i x + t w^r y + θ^i) · [t w^i  t w^r].   (72)
Noting that the inner product of the vectors in Equations (71) and (72) is 0, we find that the decision boundary for the real part of the output of a complex-valued neuron and that for the imaginary part intersect orthogonally. It can easily be shown that this orthogonality also holds for the other types of complex-valued neurons proposed in Kim and Guest (1990), Benvenuto and Piazza (1992), and Georgiou and Koutsougeras (1992). Generally, a real-valued neuron classifies an input real-valued signal into two classes (0, 1). In contrast, a complex-valued neuron classifies an input complex-valued signal into four classes (0, 1, i, 1 + i). As described previously, the decision boundary of a complex-valued neuron consists of two hypersurfaces that intersect orthogonally and divide a decision region into four equal sections. Thus, a complex-valued neuron can be considered to have a natural decision boundary for complex-valued patterns.
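The orthogonality argument can be checked numerically. In the following sketch the weight values are arbitrary illustrative choices, and the scalar factors fR′ are dropped since they only rescale the normal vectors without changing their directions.

```python
# Numerical check of the orthogonality of the two decision boundaries of
# a complex-valued neuron. The normal of the real-part boundary (Eq. (71))
# is proportional to (w^r, -w^i); that of the imaginary-part boundary
# (Eq. (72)) to (w^i, w^r). Their inner product vanishes for ANY weights:
#   w^r . w^i + (-w^i) . w^r = 0.
w_r = [0.7, -1.2, 0.4]   # real parts of arbitrary weights (illustrative)
w_i = [-0.3, 0.5, 2.1]   # imaginary parts of the same weights

q_real = w_r + [-v for v in w_i]   # (w^r, -w^i)
q_imag = w_i + w_r                 # (w^i,  w^r)

inner = sum(a * b for a, b in zip(q_real, q_imag))
print(inner)   # ~0.0 up to floating-point rounding
```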
B. Utility of the Orthogonal Decision Boundaries

This section shows the utility of the property of the decision boundary described in the previous section. Minsky and Papert (1969) clarified the limitations of a single real-valued neuron; in many cases, a single real-valued neuron is incapable of solving a given problem. A classic example is the XOR problem, with its long history in the study of neural networks; many other difficult problems also involve the XOR as a subproblem. Another example is the detection of symmetry problem. Rumelhart et al. (1986a, b) showed that a three-layered real-valued neural network (i.e., one with a single hidden layer) can solve such problems, including the XOR problem and the detection of symmetry problem, and that interesting internal representations can be constructed in the weight space. The XOR problem and the detection of symmetry problem cannot be solved with a single real-valued neuron. In the following text, contrary to expectation, it is proved that such problems can be solved by a single complex-valued neuron with orthogonal decision boundaries, which reveals the potent computational power of complex-valued neurons. In addition, it is shown, as an application of this computational power, that the fading equalization problem can be successfully solved by a single complex-valued neuron with the highest generalization ability. Rumelhart et al. (1986a, b) showed that increasing the number of layers raises the computational power of neural networks. This section shows that extending the dimensionality of neural networks to complex numbers has a similar effect. This may be a new direction for enhancing the ability of neural networks.
1. The XOR Problem

This section proves that the XOR problem can be solved by a single complex-valued neuron with orthogonal decision boundaries. The input-output mapping of the XOR problem is shown in Table VIII. In order to solve the XOR problem with a single complex-valued neuron, the input-output mapping is encoded as shown in Table IX, where the outputs 1 and i are interpreted as 0, and the outputs 0 and 1 + i are interpreted as 1, of the original XOR problem (Table VIII). We use a single complex-valued neuron with only one input and a weight w = u + iv ∈ C (assuming that it has no threshold parameters). The activation function is defined as
1C (z) = 1R (x) + i1R (y), z = x + iy,
(73)
where 1R is a real-valued step function defined on R; that is, 1R(u) = 1 if u ≥ 0 and 0 otherwise, for any u ∈ R.

TABLE VIII The XOR Problem

Input        Output
x1   x2      y
0    0       0
0    1       1
1    0       1
1    1       0

Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.

TABLE IX An Encoded XOR Problem for a Single Complex-Valued Neuron

Input          Output
z = x + iy     Z = X + iY
−1 − i         1
−1 + i         0
1 − i          1 + i
1 + i          i

Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.

The decision boundary of a single
FIGURE 30 The decision boundary in the input space of the complex-valued neuron that solves the XOR problem. The solid circle means that the output in the XOR problem is 1, and the open one 0. Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
complex-valued neuron described above consists of the following two straight lines that intersect orthogonally:
[u  −v] · t[x y] = 0,   (74)

[v  u] · t[x y] = 0   (75)
for any input signal z = x + iy ∈ C, where u and v are the real and imaginary parts of the weight parameter w = u + iv, respectively. Equations (74) and (75) are the decision boundaries for the real and imaginary parts of a single complex-valued neuron, respectively. Letting u = 0 and v = 1 (i.e., w = i) provides the decision boundary shown in Figure 30, which divides the input space (the decision region) into four equal sections and has the highest generalization ability for the XOR problem. On the other hand, the decision boundary of the three-layered real-valued neural network for the XOR problem does not always have the highest generalization ability (Lippmann, 1987). In addition, the required number of learnable parameters is only two (i.e., only w = u + iv), whereas at least nine parameters are needed for the three-layered real-valued neural network to solve the XOR problem (Rumelhart et al., 1986a, b); here a complex-valued parameter z = x + iy (where i = √−1) is counted as two because it consists of a real part x and an imaginary part y. This solution to the XOR problem uses the orthogonality property of a single complex-valued neuron. Note that several researchers have solved the XOR problem with a single complex-valued neuron in different ways (Nemoto and Kono, 1991; Igelnik et al., 2001; Aizenberg, 2006).
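The whole construction fits in a few lines. The following sketch, in which the function names are ours rather than the text's, verifies that the weight w = i reproduces the encoded mapping of Table IX exactly.

```python
# Single complex-valued neuron that solves the encoded XOR problem of
# Table IX with the weight w = i (u = 0, v = 1) and the step activation
# of Eq. (73). No threshold parameter is used.

def step(u):
    return 1.0 if u >= 0 else 0.0

def c_step(z):
    # Eq. (73): apply the real-valued step function componentwise
    return complex(step(z.real), step(z.imag))

w = 1j  # u = 0, v = 1

# Encoded mapping of Table IX: input -> target output
table_ix = {-1 - 1j: 1, -1 + 1j: 0, 1 - 1j: 1 + 1j, 1 + 1j: 1j}

for z, target in table_ix.items():
    assert c_step(w * z) == target

# Decoding back to the original XOR of Table VIII: the outputs 1 and i
# mean 0, and the outputs 0 and 1 + i mean 1.
```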
2. The Detection of Symmetry

Another interesting task that cannot be solved by a single real-valued neuron is the detection of symmetry problem (Minsky and Papert, 1969). This section offers a solution to this problem using a single complex-valued neuron with orthogonal decision boundaries. The problem is to detect whether the binary activity levels of a one-dimensional array of input neurons are symmetrical about the center point. For example, the input-output mapping in the case of three inputs is shown in Table X. We used patterns of various lengths (from 2 to 6) and could solve all the cases with single complex-valued neurons. Only a solution to the case with six inputs is presented here, because the other cases can be solved similarly. We use a complex-valued neuron with six inputs, with weight wk = uk + ivk ∈ C for input signal k (1 ≤ k ≤ 6) (we assume that it has no threshold parameters). In order to solve the problem with the complex-valued neuron, the input-output mapping is encoded as follows. An input xk ∈ R is encoded as an input xk + iyk ∈ C to input neuron k, where yk = 0 (1 ≤ k ≤ 6); the output 1 ∈ R is encoded as 1 + i ∈ C; and the output 0 ∈ R is encoded as 1 or i ∈ C, which is determined according to the inputs (for example, the output corresponding to the input t[0 0 0 0 1 0] is i). The activation function is the same as in Eq. (73). The decision boundary of the complex-valued neuron with six inputs as just described consists of the following two straight lines that
TABLE X Detection of Symmetry Problem with Three Inputs

Input            Output
x1   x2   x3     y
0    0    0      1
0    0    1      0
0    1    0      1
1    0    0      0
0    1    1      0
1    0    1      1
1    1    0      0
1    1    1      1

Output 1 means that the corresponding input is symmetric, and 0 asymmetric. Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
intersect orthogonally:
[u1 · · · u6  −v1 · · · −v6] · t[x1 · · · x6  y1 · · · y6] = 0,   (76)

[v1 · · · v6  u1 · · · u6] · t[x1 · · · x6  y1 · · · y6] = 0   (77)
for any input signal zk = xk + iyk ∈ C, where uk and vk are the real and imaginary parts of the weight parameter wk = uk + ivk, respectively (1 ≤ k ≤ 6). Equations (76) and (77) are the decision boundaries for the real and imaginary parts of the complex-valued neuron with six inputs, respectively. Letting t[u1 · · · u6] = t[−1 2 −4 4 −2 1] and t[v1 · · · v6] = t[1 −2 4 −4 2 −1] (i.e., w1 = −1 + i, w2 = 2 − 2i, w3 = −4 + 4i, w4 = 4 − 4i, w5 = −2 + 2i, and w6 = 1 − i) yields the orthogonal decision boundaries shown in Figure 31, which successfully detect the symmetry of the 2^6 (= 64) input patterns. In addition, the required number of learnable parameters is 12 (i.e., six complex-valued weights), whereas at least 17 parameters are needed for the three-layered real-valued neural network to solve the detection of symmetry problem (Rumelhart et al., 1986a, b); here a complex-valued parameter z = x + iy (where i = √−1) is counted as two, as in Section VII.B.1.
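An exhaustive check over all 64 patterns can be sketched as follows; the function names are ours, and the weights are the values given above. With these weights the net input of a symmetric pattern is exactly 0, so the step activation of Eq. (73) outputs 1 + i, while every asymmetric pattern lands in a region whose output is 1 or i (both decoded as "asymmetric").

```python
from itertools import product

# Weights w_k = u_k + i*v_k given in the text for the six-input
# symmetry detector (note v_k = -u_k).
w = [-1 + 1j, 2 - 2j, -4 + 4j, 4 - 4j, -2 + 2j, 1 - 1j]

def c_step(z):
    # Eq. (73): componentwise real-valued step function
    return complex(1.0 if z.real >= 0 else 0.0, 1.0 if z.imag >= 0 else 0.0)

for x in product([0, 1], repeat=6):
    net = sum(wk * xk for wk, xk in zip(w, x))
    symmetric = (x == x[::-1])
    # Output 1 + i exactly when the pattern is symmetric about the center
    assert (c_step(net) == 1 + 1j) == symmetric
```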
FIGURE 31 The decision boundary in the net-input space of the complex-valued neuron with six inputs that solves the detection of symmetry problem. Note that the plane is not the input space but the net-input space because the dimension of the input space is 6 and the input space cannot be written in a two-dimensional plane. The solid circle indicates a net-input for a symmetric input and the open one asymmetric. There is only one solid circle at the origin. The four circled complex numbers represent the output values of the complex-valued neuron in their regions, respectively. Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
3. The Fading Equalization Technology

This section shows that single complex-valued neurons with orthogonal decision boundaries can be successfully applied to fading equalization technology (Lathi, 1998). Channel equalization in a digital communication system can be viewed as a pattern classification problem. The digital communication system receives a transmitted signal sequence with additive noise and attempts to estimate the true transmitted sequence. A transmitted signal can assume one of the following four possible complex values: −1 − i, −1 + i, 1 − i, and 1 + i (i = √−1). Thus, the received signal takes values around −1 − i, −1 + i, 1 − i, and 1 + i (for example, −0.9 − 1.2i, 1.1 + 0.84i, or similar values, because noise is added). We need to estimate the true complex values from such noisy complex values; thus, a method with excellent generalization ability is needed for the estimation. The input-output mapping of the problem is shown in Table XI. We use the same complex-valued neuron with only one input as in Section VII.B.1.

TABLE XI Input-Output Mapping in the Fading Equalization Problem

Input     Output
−1 − i    −1 − i
−1 + i    −1 + i
1 − i     1 − i
1 + i     1 + i

Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.

TABLE XII An Encoded Fading Equalization Problem for Complex-Valued Neurons

Input     Output
−1 − i    0
−1 + i    i
1 − i     1
1 + i     1 + i

Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.

To solve the problem with the complex-valued neuron, the
FIGURE 32 The decision boundary in the input space of the complex-valued neuron with one input that solves the fading equalization problem. The solid circle indicates an input in the fading equalization problem. The four circled complex numbers represent the output values of the complex-valued neuron in their regions, respectively. Reprinted from Nitta, T. (2003). © 2003, with permission from Elsevier.
input-output mapping in Table XI is encoded as shown in Table XII. Letting u = 1 and v = 0 (i.e., w = 1), we obtain the orthogonal decision boundary shown in Figure 32, which has the highest generalization ability for the fading equalization problem and can estimate the true signals without errors. In addition, the required number of learnable parameters is only two (i.e., only w = u + iv).
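A minimal sketch of this equalizer follows. The helper names are ours; the noisy sample values −0.9 − 1.2i and 1.1 + 0.84i come from the text, and the decoding table simply inverts the encoding of Table XII.

```python
# Single complex-valued neuron equalizer with w = 1 (u = 1, v = 0):
# the step activation of Eq. (73) maps a noisy received symbol to the
# encoded output of Table XII, which is then decoded back to the
# transmitted symbol of Table XI.

def c_step(z):
    # Eq. (73): componentwise real-valued step function
    return complex(1.0 if z.real >= 0 else 0.0, 1.0 if z.imag >= 0 else 0.0)

# Inverse of the encoding in Table XII: neuron output -> transmitted symbol
DECODE = {0: -1 - 1j, 1j: -1 + 1j, 1: 1 - 1j, 1 + 1j: 1 + 1j}

def equalize(received, w=1):
    return DECODE[c_step(w * received)]

# Noisy received symbols mentioned in the text
print(equalize(-0.9 - 1.2j))   # (-1-1j)
print(equalize(1.1 + 0.84j))   # (1+1j)
```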
VIII. CONCLUSIONS

We have described the multilayered complex-valued neural network model, in which the input signals, weights, thresholds, and output signals are all complex numbers, and the related Complex-BP, a complex-valued version of the backpropagation learning algorithm. Furthermore, we have elucidated their inherent properties. The error backpropagation has a structure that concerns 2D motion; a unit of learning consists of complex-valued signals flowing through the neural network. Compared with the updating rule of the Real-BP, the Complex-BP updating rule reduces the probability of a standstill in learning. As a result, the average convergence speed is superior to that of the Real-BP (whereas the generalization performance remains unchanged). In addition, the number of learnable parameters needed is almost half that of the Real-BP, where a complex-valued parameter z = x + iy was counted as two because it consists of a real part x and an imaginary
part y. Thus, the Complex-BP algorithm appears to be well suited for learning complex-valued patterns. Of note, the Complex-BP can transform geometric figures in a way that the Real-BP cannot. Numerical experiments suggest that the behavior of a Complex-BP network that has learned the transformation of geometric figures is related to the identity theorem in complex analysis. Mathematical analysis indicates that a Complex-BP network that has learned a transformation has the ability to generalize that transformation, with an error represented by the sine of the difference between the argument of the test point and that of the training point. This mathematical result agrees qualitatively with the simulation results. Furthermore, the 1-n-1 type Complex-BP network, which has the ability to transform geometric figures, can also solve a continuous mapping task requiring the usual generalization ability very well, as can the 2-m-2 type Real-BP network. We believe that the structure of the learning patterns caused the successful experimental result that the 1-n-1 type Complex-BP network can solve the continuous mapping task.

A decision boundary of a single complex-valued neuron consists of two hypersurfaces that intersect orthogonally and divide a decision region into four equal sections. The XOR problem and the detection of symmetry problem, which cannot be solved with a single real-valued neuron, can be solved by a single complex-valued neuron with orthogonal decision boundaries, which reveals the potent computational power of complex-valued neurons. Furthermore, the fading equalization problem can be successfully solved by a single complex-valued neuron with the highest generalization ability. The work presented in this chapter probably represents just the beginning of the possible extension of the backpropagation algorithm to the complex number domain.
REFERENCES Aizenberg, I. (2006). Solving the parity n problem and other nonlinearly separable problems using a single universal binary neuron. In “Advances in Soft Computing. Springer Series. Computational Intelligence, Theory and Application” (B. Reusch, ed.), pp. 457–471. Springer, New York. Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Trans. Electr. Comput. 16(3), 299–307. Arena, P., Fortuna, L., Re, R., and Xibilia, M. G. (1993). On the capability of neural networks with complex neurons in complex valued functions approximation. Proc. IEEE Int. Symposium. Circ. Syst. 4, 2168–2171. Arena, P., Fortuna, L., Muscato, G., and Xibilia, M. G. (1998). Neural networks in multidimensional domains. Lect Notes Contr. Inf. Sci. 234, 1–165. Benvenuto, N., and Piazza, F. (1992). On the complex backpropagation algorithm. IEEE Trans. Signal Proc. 40(4), 967–969. Derrick, W. R. (1984). “Complex Analysis and Applications.” Wadsworth, New York.
Georgiou, G. M., and Koutsougeras, C. (1992). Complex domain backpropagation. IEEE Trans. Circ. Syst.–II 39(5), 330–334. Hirose, A. (ed.). (2003). “Complex-Valued Neural Networks—Theories and Applications.” World Scientific Publishing, Singapore. ICANN/ICONIP. (2003). Seven papers in the special session: Complex-valued neural networks. In “Artificial Neural Networks and Neural Information Processing.” Lect. Notes Comput Sci. 2714, 943–1002 (Proceedings of International Conference on Artificial Neural Networks/International Conference on Neural Information Processing, ICANN/ICONIP ’03–Istanbul, June 26–29). ICANN. (2007). Six papers in the special session: Complex-valued neural networks. In “Artificial Neural Networks and Neural Information Processing.” Lect. Notes Comput. Sci. 4668, 838–893 (Proceedings of International Conference on Artificial Neural Networks, ICANN ’07–Portugal, September 9–13). ICONIP. (2002). Six papers in the special session: Complex-valued neural networks. Proc. Int. Conf. Neural Inf. Proc. 3, 1074–1103. Igelnik, B., Tbib-Azar, M., and LeClair, S. R. (2001). A net with complex weights. IEEE Trans. Neural Networks 12(2), 236–249. IJCNN. (2006). Twelve papers in the special session: Complex-valued neural networks. Proc. Int. Joint Conf. Neural Netw. Vancouver, BC, Canada, pp. 595–626, 1186–1224. KES. (2001). Six papers in the special session: Complex-valued neural networks and their applications. In “Knowledge-based Intelligent Information Engineering Systems and Allied Technologies” (N. Baba, L. C. Jain, and R. J. Howlett, eds.), Part I, pp. 550–580. IOS Press, Tokyo. KES. (2002). Five papers in the special session: Complex-valued neural networks. In “Knowledge-Based Intelligent Information Engineering Systems and Allied Technologies” (E. Damiani, R. J. Howlett, L. C. Jain, and N. Ichalkaranje, eds.), Part I, pp. 623–647. IOS Press, Amsterdam. KES. (2003). Seven papers in the special session: Complex-valued neural networks. 
In “Knowledge-based Intelligent Information and Engineering Systems.” Lect. Notes Comput. Sci. 2774, 304–357. Kim, M. S., and Guest, C. C. (1990). Modification of backpropagation networks for complex-valued signal processing in frequency domain. Proc. Int. Joint Conf. Neural Netw. 3, 27–31. Kim, T., and Adali, T. (2003). Approximation by fully complex multilayer perceptrons. Neural Computation 15(7), 1641–1666. Kuroe, Y., Hashimoto, N., and Mori, T. (2002). On energy function for complex-valued neural networks and its applications. Proc. Int. Conf. Neural Inf. Proc. 3, 1079–1083. Lathi, B. P. (1998). “Modern Digital and Analog Communication Systems,” 3rd ed. Oxford University Press, New York. Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE Acoustic Speech Signal Proc. 4, April, 4–22. Minsky, M. L., and Papert, S. A. (1969). “Perceptrons.” MIT Press, Cambridge. Miura, M., and Aiyoshi, E. (2003). Approximation and designing of fractal images by complex neural networks. IEEJ Trans. Electron. Inf. Syst. 123(8), 1465–1472 (in Japanese). Nemoto, I., and Kono, T. (1991). Complex-valued neural networks. Trans. Inst. Electron. Inf. Comm. Eng. J74–D-II, 1282–1288 (in Japanese). Nitta, T., and Furuya, T. (1991). A complex back-propagation learning. Trans. Inf. Proc. Soc. Jp. 32(10), 1319–1329 (in Japanese). Nitta, T. (1993). A complex numbered version of the back-propagation algorithm. Proc. World Congress Neur. Netw. 3, 576–579. Nitta, T. (1997). An extension of the back-propagation algorithm to complex numbers. Neur. Netw. 10(8), 1392–1415.
Nitta, T. (2000). An analysis of the fundamental structure of complex-valued neurons. Neur. Proc. Lett. 12(3), 239–246. Nitta, T. (2003). Solving the XOR problem and the detection of symmetry using a single complex-valued neuron. Neur. Netw. 16(8), 1101–1105. Nitta, T. (2004). Orthogonality of decision boundaries in complex-valued neural networks. Neur. Computation 16(1), 73–97. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986a). Learning internal representations by error propagation. In “Parallel Distributed Processing: Explorations in the microstructures of cognition” (D. E. Rumelhart and J. L. McClelland, eds.), vol. 1, pp. 318–362. Cambridge, MA: MIT Press. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986b). Learning representations by back-propagating errors. Nature 323, 533–536. Tsutsumi, K. (1989). A multi-layered neural network composed of backpropagation and Hopfield nets and internal space representation. Proc. Int. Joint Conf. Neural Netw. Washington, D.C., June 1, pp. 365–371. Watanabe, A., Yazawa, N., Miyauchi, A., and Miyauchi, M. (1994). A method to interpret 3D motions using neural networks. IEICE Trans. Fund. Electr. Comm. Comput. Sci. E77-A(8), 1363–1370.
CHAPTER 5
Blind Source Separation: The Sparsity Revolution
Jerome Bobin*, Jean-Luc Starck*, Yassir Moudden*, and Mohamed Jalal Fadili†
Contents
I Introduction 222
 A Organization 223
 B Definitions and Notations 223
II Blind Source Separation: A Strenuous Inverse Problem 224
 A Modeling Multichannel Data 224
 B Independent Component Analysis 226
 C The Algorithmic Viewpoint 227
III Sparse Multichannel Signal Representation 231
 A The Blessing of Sparsity and Overcomplete Signal Representations 231
 B The Sparse Decomposition Issue 233
 C Overcomplete Multichannel Representations 234
IV Morphological Component Analysis for Multichannel Data 237
 A Morphological Diversity and Morphological Component Analysis 237
 B Multichannel Overcomplete Sparse Recovery 239
 C Multichannel Morphological Component Analysis 239
 D Recovering Sparse Multichannel Decompositions Using mMCA 242
 E Handling Bounded Noise With mMCA 243
 F Choosing the Overcomplete Dictionary 243
V Morphological Diversity and Blind Source Separation 244
 A Generalized Morphological Component Analysis 245
 B Results 249
 C Speeding Up Blind GMCA 250
 D Unknown Number of Sources 257
 E Variations on Sparsity and Independence 263
 F Results 265
* Laboratoire AIM, CEA/DSM-CNRS-Université Paris Diderot, CEA Saclay, IRFU/SEDI-SAP, Service d’Astrophysique, Orme des Merisiers, 91191, Gif-sur-Yvette, France † GREYC CNRS UMR 6072, Image Processing Group, ENSICAEN 14050, Caen Cedex, France
Advances in Imaging and Electron Physics,Volume 152, ISSN 1076-5670, DOI: 10.1016/S1076-5670(08)00605-8. Copyright © 2008 Elsevier Inc. All rights reserved.
VI Dealing With Hyperspectral Data 275
 A Specificity of Hyperspectral Data 275
 B GMCA for Hyperspectral BSS 276
 C Comparison With GMCA 280
VII Applications 284
 A Application to Multivalued Data Restoration 284
 B Application to the Planck Data 292
 C Software 296
VIII Conclusion 296
References 298
I. INTRODUCTION
Finding a suitable representation of multivariate data is a longstanding problem in statistics and related areas. A good representation means that the data are transformed so that their essential structure is made more visible or more easily accessible. This problem is encountered, for instance, in unsupervised learning, exploratory data analysis, and signal processing. In the latter, a typical field where the good-representation problem arises is source separation. Over the past few years, the development of multichannel sensors has motivated interest in such methods for the coherent processing of multivariate data. Areas of application include biomedical engineering, medical imaging, speech processing, astronomical imaging, remote sensing, communication systems, seismology, geophysics, and econometrics. Consider a situation in which a collection of signals is emitted by some physical objects or sources. These physical sources could be, for example, different brain areas emitting electrical signals; people speaking in the same room (the classical cocktail party problem), thus emitting speech signals; or radiation sources emitting electromagnetic waves. Assume further that there are several sensors or receivers in different positions, so that each sensor records a mixture of the original source signals with different weights. The mixing weights are assumed to be unknown, since knowing them would entail knowing all the properties of the physical mixing system, which are not accessible in general. Of course, the source signals are unknown as well, since the primary problem is that they cannot be recorded directly. The blind source separation (BSS) problem is to find the original signals from their observed mixtures, without prior knowledge of the mixing weights and with very little knowledge of the original sources.
In the classical cocktail party example, the BSS problem amounts to recovering the voices of the different speakers from the mixtures recorded at several microphones. A flurry of research activity has focused on BSS, which is one of the hottest areas in the signal-processing community. Some specific
issues have already been addressed using a blend of heuristic ideas and rigorous derivations, as indicated by the extensive literature on the subject. As clearly emphasized by previous work, it is fundamental that the sources to be retrieved present some quantitatively measurable diversity (e.g., decorrelation, independence, morphological diversity). Recently sparsity and morphological diversity have emerged as a novel and effective source of diversity for BSS. This chapter provides new and essential insights into the use of sparsity in source separation and outlines the fundamental role of morphological diversity as a source of diversity or contrast between the sources. This chapter describes a BSS method, and more generally a multichannel sparsity-based data analysis framework, termed generalized morphological component analysis (GMCA), which is fast, efficient, and robust to noise. GMCA takes advantage of both morphological diversity and sparsity, using recent sparse overcomplete signal representations. Theoretical arguments and numerical experiments in multivariate image processing are reported to characterize and illustrate the good performance of GMCA for BSS.
A. Organization
Section II formally states the BSS problem and surveys the current state of the art in the field of BSS. Section III provides the necessary background on sparse overcomplete representation and decomposition, with extensions to the multichannel setting. Section IV describes the multichannel extension of the morphological component analysis (MCA) algorithm and states some of its theoretical properties. Section V presents a new way of thinking about sparsity in BSS: all the ingredients introduced in the previous sections are combined, and the GMCA algorithm for BSS is provided. The extension of GMCA to hyperspectral data and its application to multichannel data restoration are reported in Sections VI and VII.A. We also discuss an application of the GMCA BSS algorithm to an astronomical imaging experiment.
B. Definitions and Notations
Unless stated otherwise, a vector $x$ will be a row vector $x = [x[1], \ldots, x[t]]$. We equip the vector space $\mathbb{R}^t$ with the scalar product $\langle x, y \rangle = x y^T$. The $\ell_p$-norm of a vector $x$ is defined by $\|x\|_p = \left(\sum_i |x[i]|^p\right)^{1/p}$, with the usual notation $\|x\|_\infty = \max_i |x[i]|$. The notation $\|x\|_0$ defines the $\ell_0$ quasi-norm of $x$ (i.e., the number of nonzero elements in $x$). Bold symbols represent matrices, and $X^T$ is the transpose of $X$. The $i$-th entry of $x_p$ is $x_p[i]$, $x_p$ is the $p$-th row, and $x^q$ the $q$-th column of $X$. The “entrywise” $\ell_p$-norm of a matrix $X$ is defined by $\|X\|_p = \left(\sum_{i,j} |x_i[j]|^p\right)^{1/p}$,
not to be confused with matrix-induced $\ell_p$-norms. The Frobenius norm of $X$ is obtained for $p = 2$: $\|X\|_F^2 = \mathrm{Trace}\left(X^T X\right)$. Similar to vectors, $\|X\|_\infty$ and $\|X\|_0$, respectively, denote the maximum in magnitude and the number of nonzero entries in the matrix $X$. In the proposed iterative algorithms, $\tilde{x}^{(h)}$ will be the estimate of $x$ at iteration $h$. $\Phi = [\phi_1^T, \ldots, \phi_T^T]^T$ defines a $T \times t$ dictionary whose rows are unit $\ell_2$-norm atoms $\{\phi_i\}_i$. The mutual coherence of $\Phi$ (see Tropp, 2004, and references therein) is $\mu_\Phi = \max_{i \neq j} \left|\langle \phi_i, \phi_j \rangle\right|$. When $T > t$, this dictionary is said to be redundant or overcomplete. In the next section, we will be interested in the decomposition of a signal $x$ in $\Phi$. We thus define $\mathcal{S}_0(x)$ (respectively, $\mathcal{S}_1(x)$) as the set of solutions to the minimization problem $\min_c \|c\|_0$ s.t. $x = c\Phi$ (respectively, $\min_c \|c\|_1$ s.t. $x = c\Phi$). When the $\ell_0$-sparse decomposition of a given signal $x$ has a unique solution, we denote it by $\alpha$, so that $x = \alpha\Phi$. Finally, we define $\Delta_\lambda(\cdot)$ to be a thresholding operator with threshold $\lambda$ (hard thresholding or soft thresholding; this will be specified when needed). The support $\Lambda(x)$ of a row vector $x$ is $\Lambda(x) = \{k;\ |x[k]| > 0\}$. Note that the notion of support is well adapted to $\ell_0$-sparse signals, as these are synthesized from a few nonzero dictionary elements. Similarly, we define the $\delta$-support of $x$ as $\Lambda_\delta(x) = \{k;\ |x[k]| > \delta \|x\|_\infty\}$.
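As a small numerical illustration of these definitions (our sketch, not part of the chapter; dimensions are arbitrary), the following Python code builds a random overcomplete dictionary with unit $\ell_2$-norm atoms, in the row-vector convention used here, and computes its mutual coherence:

```python
import numpy as np

rng = np.random.default_rng(5)
t, T = 8, 16                          # T > t: a redundant dictionary
Phi = rng.standard_normal((T, t))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)  # unit l2-norm atoms

# Mutual coherence: largest off-diagonal |<phi_i, phi_j>| of the Gram matrix.
G = np.abs(Phi @ Phi.T)
np.fill_diagonal(G, 0.0)
mu = G.max()

# Unit-norm atoms: Cauchy-Schwarz bounds the coherence by 1.
assert 0.0 < mu < 1.0
```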
II. BLIND SOURCE SEPARATION: A STRENUOUS INVERSE PROBLEM
A. Modeling Multichannel Data
1. The Blind Source Separation Model
In a source separation setting, the observed data are composed of $m$ distinct monochannel data $\{x_i\}_{i=1,\ldots,m}$. Each datum could be a $\sqrt{t} \times \sqrt{t}$ image or a one-dimensional signal with $t$ samples. In the following, we assume that each observation $\{x_i\}_{i=1,\ldots,m}$ is a row vector of size $t$. The classical instantaneous linear mixture model states that each datum is a linear combination of $n$ so-called sources $\{s_j\}_{j=1,\ldots,n}$ such that:
$$\forall i = 1, \ldots, m; \qquad x_i = \sum_{j=1}^{n} a_{ij} s_j, \qquad (1)$$
where the set of scalar values {aij }i=1,...,m; j=1,...,n models the “weight” of each source in the composition of each datum. For convenience, the mixing model with additive noise can be rewritten in matrix form:
$$X = AS + N, \qquad (2)$$
where $X$ is the $m \times t$ measurement matrix, $S$ is the $n \times t$ source matrix, and $A$ is the $m \times n$ mixing matrix, which defines the contribution of each source to each measurement. An $m \times t$ matrix $N$ is added to account for instrumental noise or model imperfections. In this chapter, we study only the overdetermined case, $m \geq n$; the converse underdetermined case ($m < n$) is a more difficult problem (see Georgiev et al., 2005; Jourjine et al., 2000 for further details) that is left to future work. In the BSS problem, both the mixing matrix $A$ and the sources $S$ are unknown and must be estimated jointly. In general, without further a priori knowledge, decomposing a rectangular matrix $X$ into a linear combination of $n$ rank-1 matrices is clearly ill posed. The goal of BSS is to understand which additional prior constraints make the inverse problem well posed and to devise separation methods that can handle the resulting models.
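Equations (1) and (2) are easy to instantiate numerically. The sketch below (illustrative only; the dimensions, source distribution, and noise level are our own choices) draws random sources and a random mixing matrix, and forms the noisy multichannel data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, t = 2, 3, 1000           # sources, channels, samples (m >= n)

S = rng.laplace(size=(n, t))                # two leptokurtic sources
A = rng.standard_normal((m, n))             # unknown mixing matrix
N = 0.05 * rng.standard_normal((m, t))      # instrumental noise
X = A @ S + N                               # Eq. (2): X = AS + N

assert X.shape == (m, t)
# Each channel is a noisy weighted sum of the n sources, as in Eq. (1).
assert np.allclose(X[0], A[0, 0] * S[0] + A[0, 1] * S[1] + N[0])
```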
2. A Question of Diversity Note that the mixture model in Eq. (1) is equivalent to the following one:
X=
n
ai s i ,
(3)
i=1
where $a^i$ is the $i$-th column of $A$. BSS is thus equivalent to decomposing the data $X$ into a sum of $n$ rank-1 matrices $\{X_i = a^i s_i\}_{i=1,\ldots,n}$. Obviously, there are infinitely many ways of decomposing a given matrix of rank $n$ into a linear combination of $n$ rank-1 matrices; further information is required to disentangle the sources. Let us assume that the sources are random vectors. These may be known a priori to be different in the sense of being simply decorrelated. A separation scheme then looks for sources $S$ such that their covariance matrix $R_S$ is diagonal. Unfortunately, the covariance matrix $R_S$ is invariant under orthonormal transformations such as rotations. Therefore, an effective BSS method must go beyond decorrelation (see Cardoso, 1998, 2001 for further reflections on the need for stronger a priori constraints). The next sections emphasize different sets of a priori constraints and different methods to handle them. Section II.B gives an overview of BSS methods that use statistical independence as the key assumption for separation. Recently, sparsity has emerged as a very effective way to distinguish the sources; these new approaches are introduced in Section II.C.
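The rotation invariance just mentioned is easy to check numerically. In this sketch (ours; the distribution and angle are arbitrary, and equal source variances are assumed so that the rotated covariance stays diagonal), two independent sources mixed by a rotation remain decorrelated, so second-order statistics alone cannot undo the rotation:

```python
import numpy as np

rng = np.random.default_rng(2)
t = 100000
S = rng.uniform(-1, 1, size=(2, t))    # independent, equal-variance sources

theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S_rot = R @ S                          # an orthonormal mixture of the sources

# Both covariance matrices are (near-)diagonal: decorrelation alone
# cannot distinguish S from R @ S.
C, C_rot = np.cov(S), np.cov(S_rot)
assert abs(C[0, 1]) < 1e-2 and abs(C_rot[0, 1]) < 1e-2
```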
B. Independent Component Analysis
1. Generalities
The previous section emphasized the need for further a priori assumptions to bring BSS into the realm of well-posed inverse problems. This section deals with noiseless mixtures, assuming that $X = AS$; the case where the data are perturbed by additive noise is discussed at the end of this section. The seminal work by Comon (1994) paved the way for the development of independent component analysis (ICA). In the celebrated ICA framework, the sources are assumed to be independent random variables with joint probability density function $f_S$ such that:
$$f_S(s_1, \ldots, s_n) = \prod_{i=1}^{n} f_{s_i}(s_i). \qquad (4)$$
Disentangling the sources requires a means of measuring how far the estimated sources are from being separable. Since statistical independence is a property of the probability density function (pdf) of the sources, devising a good “measure” of independence is not trivial. In that setting, ICA reduces to finding a multichannel representation/basis on which the estimated sources $\tilde{S}$ are as “independent as possible.” Equivalently, ICA looks for a separating/demixing matrix $B$ such that the estimated sources $\tilde{S} = BAS$ are independent. Until the end of the section devoted to ICA, we will assume that the mixing matrix $A$ is a square invertible matrix ($m = n$ and $\det(A) \neq 0$). First, one may wonder whether independence makes the sources identifiable. Under mild conditions, the Darmois theorem (Darmois, 1953) shows that statistical independence entails separability (Comon, 1994). It states that, provided at most one of the sources is generated from a Gaussian distribution, if the entries of $\tilde{S} = BAS$ are independent, then $B$ is a separating matrix and $\tilde{S}$ is equal to $S$ up to a scale factor (multiplication by a diagonal matrix with strictly positive diagonal entries) and a permutation. As a consequence, if at most one source is Gaussian, maximizing independence between the estimated sources leads to perfect estimation of $S$ and $A = B^{-1}$. The Darmois theorem thus motivates the use of independence in BSS; it paved the way for the popular ICA framework.
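The scale and permutation indeterminacy in the Darmois theorem is intrinsic to the mixture model, as this sketch illustrates (our construction, with arbitrary values): rescaled and permuted sources, paired with a compensating mixing matrix, explain the data exactly as well as the true pair does:

```python
import numpy as np

rng = np.random.default_rng(8)
S = rng.laplace(size=(2, 1000))          # "true" sources
A = rng.standard_normal((2, 2))          # "true" mixing matrix

D = np.diag([2.0, 0.5])                  # scaling ambiguity
P = np.array([[0.0, 1.0], [1.0, 0.0]])   # permutation ambiguity

S2 = D @ P @ S                           # rescaled, permuted sources
A2 = A @ np.linalg.inv(D @ P)            # compensating mixing matrix

# Two different (mixing, sources) pairs produce the very same data,
# which is why separability holds only up to scale and permutation.
assert np.allclose(A @ S, A2 @ S2)
```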
a. Independence and Gaussianity. The Kullback–Leibler (KL) divergence from the joint density $f_S(s_1, \ldots, s_n)$ to the product of its marginal densities is
a popular measure of statistical independence:
$$\mathcal{J}(S) = \mathcal{K}\left(f_S(s_1, \ldots, s_n), \prod_{i=1}^{n} f_S(s_i)\right) \qquad (5)$$
$$= \int f_S(s_1, \ldots, s_n) \log \frac{f_S(s_1, \ldots, s_n)}{\prod_{i=1}^{n} f_S(s_i)}\, ds_1 \cdots ds_n. \qquad (6)$$
Interestingly (see Cardoso, 2003), the KL can be decomposed into two terms as follows:
$$\mathcal{J}(S) = \mathcal{C}(S) - \sum_{i=1}^{n} \mathcal{G}(s_i) + K, \qquad (7)$$
where $\mathcal{C}(S) = \mathcal{K}\left(\mathcal{N}(E\{S\}, R_S),\ \mathcal{N}\left(E\{S\}, \mathrm{diag}(R_S)\right)\right)$ and $\mathcal{G}(s_i) = \mathcal{K}\left(f(s_i),\ \mathcal{N}(E\{s_i\}, \sigma_{s_i}^2)\right)$, where $\sigma_{s_i}^2$ is the variance of $s_i$ and $\mathcal{N}(m, \Sigma)$ is the normal probability density function with mean $m$ and covariance $\Sigma$. In Eq. (7), $K$ is a constant. The first term in Eq. (7) vanishes when the sources are decorrelated. The second term measures the marginal non-Gaussianity of the sources. This decomposition of the KL divergence entails that maximizing independence is equivalent to minimizing the correlation between the sources while maximizing their non-Gaussianity. Note that, owing to the central limit theorem, intuition tells us that mixing independent signals should lead to a kind of Gaussianization; it then seems natural that demixing leads to processes that deviate from Gaussian processes.
C. The Algorithmic Viewpoint
a. Approximating Independence. In the ICA setting, the mixing matrix is square and invertible. Solving a BSS problem is then equivalent to looking for a demixing matrix $B$ that maximizes the independence of the estimated sources $\tilde{S} = BX$. In that setting, maximizing the independence of the sources (with respect to the KL divergence) is equivalent to maximizing their non-Gaussianity. Since the seminal article by Comon (1994), a variety of ICA algorithms have been proposed. They differ mainly in the way they devise assessable quantitative measures of independence. Some popular approaches are presented below: • Information maximization (see Bell and Sejnowski, 1995; Nadal and Parga, 1994): Bell and Sejnowski showed that maximizing the information of the sources is equivalent to minimizing the measure of independence based on the KL divergence in Eq. (5).
• Maximum likelihood: Maximum likelihood (ML) has also been proposed to solve the BSS problem. The ML approach (Cardoso, 1997; Parra and Pearlmutter, 1997; Pham et al., 1992) has been shown to be equivalent to information maximization (InfoMax) in the ICA framework. • Higher-order statistics: As noted previously, maximizing the independence of the sources is equivalent to maximizing their non-Gaussianity under a strict decorrelation constraint. Because Gaussian random variables have vanishing higher-order cumulants, devising a separation algorithm based on higher-order cumulants provides a way of accounting for the non-Gaussianity of the sources. A wide range of algorithms based on higher-order statistics have been proposed (Hyvarinen et al., 2001; Belouchrani et al., 1997; Cardoso, 1999, and references therein). Historical papers (see Comon, 1994) proposed ICA algorithms that use approximations of the KL divergence (based on truncated Edgeworth expansions); interestingly, those approximations explicitly involve higher-order statistics. Lee et al. (1998) showed that most ICA-based algorithms are similar in theory and in practice.
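The higher-order-statistics argument can be made concrete with the excess kurtosis, a fourth-order cumulant that vanishes for Gaussian variables. The sketch below (ours; sample sizes and thresholds are arbitrary) also illustrates the Gaussianization effect of mixing, since a sum of two non-Gaussian sources has a kurtosis closer to the Gaussian value:

```python
import numpy as np

rng = np.random.default_rng(3)
t = 200000

def kurtosis(x):
    """Excess kurtosis: E[z**4] - 3 for the standardized variable z."""
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

s1 = rng.laplace(size=t)           # leptokurtic: excess kurtosis +3
s2 = rng.uniform(-1, 1, size=t)    # platykurtic: excess kurtosis -1.2
g = rng.standard_normal(t)         # Gaussian: vanishing excess kurtosis

assert kurtosis(s1) > 1.0
assert kurtosis(s2) < -0.5
assert abs(kurtosis(g)) < 0.1

# Mixing the two non-Gaussian sources drives the kurtosis toward the
# Gaussian value, as the central limit theorem suggests.
mix = s1 + s2
assert abs(kurtosis(mix)) < kurtosis(s1)
```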
b. Limits of ICA. Despite its theoretical strength and elegance, ICA has several limitations: • Probability density assumption: Even if only implicitly, ICA algorithms require information on the source distributions. As stated in Lee et al. (1998), whatever the contrast function to be minimized (mutual information, ML, higher-order statistics), most ICA algorithms can be equivalently restated in a natural gradient form (Amari, 1999; Amari and Cardoso, 1996). In such a setting, the “demixing” matrix $B$ is estimated iteratively, $B \leftarrow B + \mu \nabla B$, where the natural gradient $\nabla B$ is given by:
$$\nabla B \propto \left(I - h(\tilde{S}) \tilde{S}^T\right) B, \qquad (8)$$
where the function $h$ is applied elementwise, $\left[h(\tilde{S})\right]_{ij} = h(\tilde{s}_{ij})$, and $\tilde{S}$ is the current estimate of $S$: $\tilde{S} = BX$. Interestingly, the so-called score function $h$ in Eq. (8) is closely related to the assumed pdf of the sources (see Amari and Cardoso, 1996; Amari and Cichocki, 2002). Assuming that all the sources are generated from the same probability density function $f_S$, the score function $h$ is defined as follows:
$$h(\tilde{S}) = -\frac{\partial \log f_S(\tilde{S})}{\partial \tilde{S}}. \qquad (9)$$
As expected, the way the “demixing” matrix (and thus the sources) is estimated depends closely on the way the sources are modeled from a statistical point of view. For instance, separating platykurtic (negative kurtosis) and leptokurtic (positive kurtosis) sources requires completely different score functions. Even if ICA is shown in Amari and Cardoso (1996) to be quite robust to “mismodeling,” the choice of the score function is crucial for the convergence (and rate of convergence) of ICA algorithms. Some ICA-based techniques (see Koldovsky and Oja, 2006) have adapted the popular FastICA algorithm so as to adjust the score function to the distribution of the sources. They particularly emphasize modeling sources whose distribution belongs to specific parametric classes, such as the generalized Gaussians: $f_S(S) \propto \prod_{ij} \exp(-\mu |s_{ij}|^\theta)$.¹ • Noisy ICA: Only a few works have investigated the problem of noisy ICA (see Davies, 2004; Koldovsky and Tichavsky, 2006). As pointed out by Davies (2004), noise clearly degenerates the ICA model: it is no longer fully identifiable. In the case of additive Gaussian noise, as stated in Eq. (2), using higher-order statistics yields an efficient estimate of the mixing matrix $A = B^{-1}$ (higher-order statistics are blind to additive Gaussian noise; this property does not hold for non-Gaussian noise). However, in the noisy ICA setting, applying the demixing matrix to the data does not yield an efficient estimate of the sources. Furthermore, most ICA algorithms assume the mixing matrix $A$ to be square: when there are more observations than sources ($m > n$), a dimension-reduction step is applied first. When noise perturbs the data, this subspace projection step can dramatically deteriorate the performance of the separation stage. The next section introduces a new way of modeling the data that avoids most of the aforementioned limitations of ICA.
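For the generalized Gaussian family just mentioned, the score function of Eq. (9) has a simple closed form that can be checked numerically. The sketch below (ours; μ and θ are arbitrary illustrative values) verifies $h(s) = -\,d \log f_S(s)/ds$ against a central finite difference:

```python
import numpy as np

mu, theta = 1.5, 2.0   # illustrative parameters; theta = 2 is the Gaussian case

def log_f(s):
    """log f_S(s), up to an additive constant, for f_S(s) ∝ exp(-mu*|s|**theta)."""
    return -mu * np.abs(s) ** theta

def score(s):
    """Closed-form score: h(s) = mu * theta * |s|**(theta-1) * sign(s)."""
    return mu * theta * np.abs(s) ** (theta - 1) * np.sign(s)

s = np.array([-1.3, -0.2, 0.4, 2.0])
eps = 1e-6
numeric = -(log_f(s + eps) - log_f(s - eps)) / (2 * eps)
assert np.allclose(numeric, score(s), atol=1e-6)
```

Setting θ = 1 (the Laplacian case) gives the sign nonlinearity, which is why leptokurtic and platykurtic sources call for very different score functions.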
1. Sparsity in Blind Source Separation
As pointed out above, BSS is overwhelmingly a question of contrast and diversity. Indeed, devising a source separation technique consists of finding an effective way of disentangling the sources. From this viewpoint, statistical independence is one kind of “measure” of diversity between signals. Within this paradigm, we may wonder whether independence is a natural way of differentiating between signals. As a statistical property, independence is meaningless in a nonasymptotic setting: in practice, one must deal with finite-length signals, sometimes with only a few samples. Furthermore, most real-world data are
¹ Note that the class of generalized Gaussians contains well-known distributions: the Gaussian ($\theta = 2$) and
the Laplacian (θ = 1) distributions.
FIGURE 1 Examples of natural images.
modeled by stationary stochastic processes. Let us consider the images in Figure 1. Natural pictures are clearly nonstationary, and as these pictures are slightly correlated, independence fails to differentiate between them. Fortunately, the human eye (more precisely, the different levels of the human visual cortex) is able to distinguish between those two images. What, then, makes the eye so effective at discerning between visual “signals”? The answer may come from neuroscience. Indeed, for decades many researchers in this field (Barlow, 1961; Field, 1999; Hubel and Wiesel, 1981;² Olshausen and Field, 2006; Simoncelli and Olshausen, 2001, and references therein) have endeavored to provide some exciting answers: the mammalian visual cortex seems to have learned, via natural selection, an effective way of coding the information in natural scenes. Indeed, the first level of the mammalian visual cortex (termed V1) seems to verify several interesting properties: (1) it tends to “decorrelate” the responses of visual receptive fields (following Simoncelli and Olshausen, 2001, an efficient coding cannot duplicate information in more than one neuron), and (2) owing to a kind of “economy/compression” principle that saves neuronal activity, a given stimulus yields a sparse activation of neurons (this property can be considered a way of compressing information). Furthermore, the primary visual cortex is sensitive to particular stimuli (visual features) that surprisingly look like oriented Gabor-like wavelets (see Field, 1999). This supports the crucial part played by contours in natural scenes. Each stimulus thus tends to be coded by a few neurons; such a way of coding information is often referred to as sparse coding. These few elements of neuroscience motivate the use of sparsity
² Hubel and Wiesel were awarded the Nobel Prize in Medicine in 1981.
as an effective way of compressing a signal’s information, thus extracting its very essence. Inspired by the behavior of our visual cortex, seeking a sparse code may provide an effective way of differentiating between “different” signals. Here, “different” signals are signals with different sparse representations.
a. A Pioneering Work in Sparse BSS. The seminal paper of Zibulevsky and Pearlmutter (2001) introduced sparsity as an alternative to standard contrast functions in ICA. In this work, the authors proposed to estimate the mixing matrix $A$ and the sources $S$ in a fully Bayesian framework. Each source $\{s_i\}_{i=1,\ldots,n}$ is assumed to be sparsely represented in the basis $\Phi$:
$$\forall i = 1, \ldots, n; \qquad s_i = \sum_{k=1}^{t} \alpha_i[k] \phi_k. \qquad (10)$$
As the sources are assumed to be sparse, the distribution of their coefficients in $\Phi$ is a “sparse” (i.e., leptokurtic) prior distribution:
$$f_S(\alpha_i[k]) \propto e^{-\mu_i g_\gamma(\alpha_i[k])}, \qquad (11)$$
where $g_\gamma(\alpha_i[k]) = |\alpha_i[k]|^\gamma$ with $\gamma \leq 1$.³ Zibulevsky proposed to estimate $A$ and $S$ via a maximum a posteriori (MAP) estimator. The optimization task is then carried out using a Newton-like method, the relative Newton algorithm (RNA; see Zibulevski, 2003 for more details). This sparsity-based method paved the way for the use of sparsity in BSS. Note that several other works emphasized the use of sparsity in a parametric Bayesian approach (Hyvarinen et al., 2001 and references therein). Recently, sparsity has emerged as an effective tool for solving underdetermined source separation problems (Bronstein et al., 2005; Georgiev et al., 2005; Li et al., 2006; Vincent, 2007 and references therein). This chapter concentrates on overdetermined BSS ($m \geq n$). Inspired by the work of Zibulevsky, we present a novel sparsity-based source separation framework providing new insights into BSS.
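The thresholding operators invoked throughout this chapter have simple closed forms. As an aside (our sketch, not the chapter's code), soft thresholding is the MAP estimate of a single coefficient under the Laplacian prior (the case γ = 1 of Eq. (11)) with Gaussian noise, while hard thresholding simply keeps the entries above the threshold:

```python
import numpy as np

def soft_threshold(alpha, lam):
    """Soft thresholding: shrink toward zero by lam, then clip at zero.
    MAP estimate under a Laplacian (gamma = 1) prior with Gaussian noise."""
    return np.sign(alpha) * np.maximum(np.abs(alpha) - lam, 0.0)

def hard_threshold(alpha, lam):
    """Hard thresholding: keep entries with magnitude above lam, zero the rest."""
    return alpha * (np.abs(alpha) > lam)

a = np.array([-3.0, -0.5, 0.2, 1.5])
assert np.allclose(soft_threshold(a, 1.0), [-2.0, 0.0, 0.0, 0.5])
assert np.allclose(hard_threshold(a, 1.0), [-3.0, 0.0, 0.0, 1.5])
```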
III. SPARSE MULTICHANNEL SIGNAL REPRESENTATION
A. The Blessing of Sparsity and Overcomplete Signal Representations
The last section emphasized the crucial role played by sparsity in BSS. Indeed, sparse representations provide an effective way to “compress”
³ Applying $g_\gamma(\cdot)$ pointwise to a vector $\alpha_i$ is equivalent to computing its $\ell_\gamma$ norm.
signals into a few very significant coefficients. In previous work (see Bobin et al., 2006, 2007), we claimed that the sparser the signals are, the better the separation is. Therefore, the first step toward separation consists in finding an effective sparse representation, where effective means very sparse. Owing to its essential role in BSS, this section is particularly concerned with the quest for sparse representations.
1. What’s at Stake?
In the past decade, sparsity has emerged as one of the leading concepts in a wide range of signal-processing applications (restoration (Starck et al., 2002), feature extraction (Starck et al., 2005), source separation (Bobin et al., 2006; Li et al., 2006; Zibulevsky and Pearlmutter, 2001), and compression (Vetterli, 2001), to name only a few). Sparsity has long been an attractive signal property, both theoretically and practically, in many areas of applied mathematics (computational harmonic analysis (Donoho et al., 1998), statistical estimation (Donoho, 1993; Donoho and Johnstone, 1995)). Very recently, researchers have advocated the use of overcomplete signal representations. Indeed, the attractiveness of redundant signal representations relies on their ability to sparsely represent a large class of signals. Furthermore, handling very sparse signal representations allows more flexibility and entails effectiveness in many signal-processing tasks (restoration, separation, compression, estimation). Neuroscience has also underlined the role of overcompleteness: the mammalian visual system probably requires overcomplete representations (Olshausen and Field, 2006), and in that setting overcomplete sparse coding may lead to more effective (sparser) codes. In signal processing, both theoretical and practical arguments (Starck et al., 2002, 2007) have supported the use of overcompleteness: it entails more flexibility in representation and effectiveness in many image-processing tasks. In the general sparse representation framework, a row-vector signal $x \in \mathbb{R}^t$ is modeled as the linear combination of $T$ elementary waveforms $\{\phi_i\}_{i=1,\ldots,T}$, the so-called signal atoms:
x = Σ_{k=1}^{T} α[k] φ_k,    (12)
where α[k] = ⟨x, φ_k⟩ are called the decomposition coefficients of x in the dictionary Φ = [φ_1^T, ..., φ_T^T]^T (the T × t matrix whose rows are the atoms, normalized to unit ℓ2-norm). In the case of overcomplete representations, the number of waveforms {φ_k} that compose the dictionary is larger than the dimension of the space in which x lies: T > t. In practice, the dimensionality of the sparse decomposition (i.e., of the coefficient vector α) can be very high: T ≫ t.
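As a toy illustration of Eq. (12), the following numpy sketch (variable names are ours, for illustration only) builds a sparse signal in an orthonormal dictionary whose rows are the atoms, and recovers its coefficients via α[k] = ⟨x, φ_k⟩:

```python
import numpy as np

rng = np.random.default_rng(0)
t = 64

# Orthonormal dictionary: rows are unit-norm atoms, obtained here
# from a QR factorization of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((t, t)))
Phi = Q.T  # T = t in the orthonormal (non-overcomplete) case

# Build a 5-sparse signal as a combination of 5 atoms (Eq. 12).
alpha_true = np.zeros(t)
alpha_true[rng.choice(t, 5, replace=False)] = rng.standard_normal(5)
x = alpha_true @ Phi

# Analysis coefficients: alpha[k] = <x, phi_k>.
alpha = x @ Phi.T
assert np.allclose(alpha, alpha_true)  # exact recovery in the orthonormal case
```

In the overcomplete case (T > t), this simple correlation no longer recovers a unique α, which is precisely the ill-posedness discussed next.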
Blind Source Separation: The Sparsity Revolution
Nonetheless, handling overcomplete representations is an ill-posed problem, as elementary linear algebra shows. Indeed, decomposing a signal in an overcomplete representation requires solving an underdetermined linear system with more unknowns than data: T > t. Linear algebra tells us that the problem x = αΦ then has no unique solution. The next section provides solutions to this puzzling issue.
B. The Sparse Decomposition Issue

The transition from ill-posedness to well-posedness in the sparse decomposition framework is often achieved by reducing the space of candidate solutions to those satisfying some side constraint. Researchers have emphasized adding a sparsity constraint to the previous ill-posed problem: among all the solutions of x = αΦ, the sparsest one (with the fewest nonzero coefficients α[i]) is preferred. Donoho and Huo (2001) proposed to solve the following minimization problem:
min_α ‖α‖_0   s.t.   x = αΦ.    (13)
Clearly, this is a combinatorial optimization problem that requires enumerating all the combinations of atoms {φ_i}_{i=1,...,T} that synthesize x. This nondeterministic polynomial-time (NP)-hard problem thus appears hopeless. Donoho and Huo (2001) proposed to relax the nonconvex ℓ0 sparsity measure by substituting for the problem in Eq. (13) the following convex problem:
min_α ‖α‖_1   s.t.   x = αΦ.    (14)
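The convex problem in Eq. (14) can be recast as a linear program by splitting α into its positive and negative parts. The sketch below (our illustration, not the authors' code) solves it with scipy's off-the-shelf LP solver on a small random overcomplete dictionary:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
t, T = 20, 50                      # overcomplete: T > t

# Random dictionary whose rows are unit-norm atoms.
Phi = rng.standard_normal((T, t))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)

# 3-sparse ground truth and the observed signal x = alpha Phi.
alpha0 = np.zeros(T)
alpha0[rng.choice(T, 3, replace=False)] = [1.5, -2.0, 1.0]
x = alpha0 @ Phi

# l1 minimization: split alpha = u - v with u, v >= 0 and solve
#   min 1^T (u + v)   s.t.   Phi^T (u - v) = x^T.
A_eq = np.hstack([Phi.T, -Phi.T])          # shape (t, 2T)
res = linprog(c=np.ones(2 * T), A_eq=A_eq, b_eq=x, bounds=(0, None))
alpha_bp = res.x[:T] - res.x[T:]           # candidate sparse decomposition
```

Because the true coefficient vector is feasible, the solver's optimum never exceeds ‖α0‖_1; in this regime (3-sparse, t = 20, random Gaussian atoms), the ℓ1 solution typically coincides with the ℓ0 one.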
The problem in Eq. (14) is known as basis pursuit (see Chen et al., 1998). However, the solutions to the ℓ0 and ℓ1 problems are not equivalent in general. Extensive work (Bruckstein and Elad, 2002; Donoho and Elad, 2003; Donoho and Huo, 2001; Fuchs, 2004; Feuer and Nemirovsky, 2003; Gribonval and Nielsen, 2003; Tropp, 2004) has focused on conditions under which the problems in Eqs. (13) and (14) are equivalent. Writing x = Σ_{k∈Λ(x)} α[k] φ_k, we recall that Λ(x) is the support of x in Φ and K = Card(Λ(x)); the signal x is then said to be K-sparse in Φ. Interestingly, the first seminal works addressing the uniqueness and equivalence of the solutions to the ℓ0 and ℓ1 sparse decomposition problems essentially emphasized the structure of the overcomplete dictionary Φ. One quantitative measure of the structure of an overcomplete dictionary is its mutual coherence μ_Φ (see also Section I.B):
μ_Φ = max_{i≠j} |⟨φ_i, φ_j⟩|.    (15)
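The mutual coherence in Eq. (15) is just the largest off-diagonal entry (in absolute value) of the Gram matrix of the atoms. A minimal numpy sketch (our illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
T, t = 128, 64

# Random overcomplete dictionary with unit-norm rows (atoms).
Phi = rng.standard_normal((T, t))
Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)

# Mutual coherence: largest |<phi_i, phi_j>| over distinct atom pairs.
G = np.abs(Phi @ Phi.T)      # Gram matrix of the atoms
np.fill_diagonal(G, 0.0)     # discard <phi_i, phi_i> = 1
mu = G.max()
```

For random unit-norm atoms in this regime, mu lies strictly between 0 and 1; the smaller it is, the less any two atoms resemble each other.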
Jerome Bobin et al.
This parameter can be viewed as a worst-case measure of resemblance between all pairs of atoms. Interestingly, Donoho and Huo (2001) showed that if a vector x* with Card(Λ(x*)) = K is sufficiently sparse and verifies K < (1 + μ_Φ^{-1})/2, then the ℓ0 and ℓ1 problems have the same unique solution.

In the multichannel setting, there may be more sources than channels, n > m, and Φ is a T × t overcomplete dictionary with T > t. Let us first consider the noiseless case. The multichannel extension of Eq. (13) is written as follows:
min_α ‖α‖_0   s.t.   X = αΦ,    (28)
where α is an M × T matrix [see also Eq. (20)]. Arguing as in the monochannel case, the convex ℓ1 minimization problem in Eq. (14) can also be rewritten in the multichannel setting:
min_α ‖α‖_1   s.t.   X = αΦ;    (29)
see also Eq. (21).
C. Multichannel Morphological Component Analysis

The problem at stake in Eq. (27) can be solved by extending well-known sparse decomposition algorithms, as reviewed in subsection 3.3.2, to the multichannel case. The extension of matching pursuit (MP) and orthogonal matching pursuit (OMP) to the multichannel case has been proposed by Gribonval and Nielsen (2006). The
aforementioned greedy methods iteratively select one dictionary atom at a time. Unfortunately, this stepwise selection of active atoms is burdensome, and the process may be sped up as in Donoho et al. (submitted), where a faster stagewise orthogonal matching pursuit (StOMP) is introduced; it is shown to solve the ℓ0 sparse recovery problem in Eq. (13) with random dictionaries under mild conditions. Because of the particular structure of the problem in Eq. (27), extending the MCA algorithm (Starck et al., 2005) to the multichannel case leads to faster and still effective decompositions. Recall that in the mMCA setting, the data X are assumed to be the linear combination of D × D morphological components {X_jk}_{j=1,...,D; k=1,...,D}, where Λ(X_jk) is the support of X_jk in the subdictionary Ψ_jk = Ξ_k ⊗ Φ_j. As X is K-sparse in the whole dictionary, Σ_{j,k} Card(Λ(X_jk)) = K. The data can be decomposed as follows:
X = Σ_{j=1}^{D} Σ_{k=1}^{D} X_jk = Σ_{j=1}^{D} Σ_{k=1}^{D} Σ_{i∈Λ(X_jk)} α_jk[i] ψ_jk[i].    (30)
Substituting Eq. (30) into Eq. (27), the mMCA algorithm approaches the solution to Eq. (27) by iteratively and alternately estimating each morphological component X_jk in a block-coordinate relaxation fashion (see Sardy et al., 2000). Each matrix of coefficients α_jk is then updated as follows:
α_jk = argmin_{α_jk} ‖R_jk − Ξ_k α_jk Φ_j‖_F² + 2λ ‖α_jk‖_1,    (31)
where R_jk = X − Σ_{(p,q)≠(j,k)} Ξ_q α_pq Φ_p is a residual term. Since the subdictionaries {Φ_j}_j and {Ξ_k}_k are assumed orthonormal, the update rule in Eq. (31) is equivalent to the following:
α_jk = argmin_{α_jk} ‖Ξ_k^T R_jk Φ_j^T − α_jk‖_F² + 2λ ‖α_jk‖_1,    (32)
which has the unique solution α_jk = Δ_λ(Ξ_k^T R_jk Φ_j^T), known as soft thresholding with threshold λ:
Δ_λ(u[i]) = 0                       if |u[i]| < λ,
Δ_λ(u[i]) = u[i] − λ sign(u[i])     if |u[i]| ≥ λ.    (33)
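The entrywise soft-thresholding operator of Eq. (33) is one line of numpy (our sketch; the function name is ours):

```python
import numpy as np

def soft_threshold(u, lam):
    """Entrywise soft thresholding, Delta_lambda in Eq. (33)."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

soft_threshold(np.array([-2.0, 0.5, 3.0]), 1.0)   # → array([-1.,  0.,  2.])
```

Entries smaller than λ in magnitude are set to zero; the others are shrunk toward zero by λ.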
For a fixed λ, mMCA selects groups of atoms based on their scalar products with the residual R_jk. If only the most coherent atom (the one with the largest scalar product with the residual R_jk) were selected, then
one mMCA iteration would reduce to a stepwise multichannel matching pursuit (mMP) step. In contrast to mMP, the mMCA algorithm is allowed to select several atoms at each iteration; thus, when hard thresholding is used instead of soft thresholding, mMCA is equivalent to a stagewise mMP algorithm. New atoms are brought in by decreasing the threshold λ at each iteration. The mMCA algorithm is summarized as follows:

1. Set the number of iterations Imax and the initial threshold λ(0).
2. While λ(h) is higher than a given lower bound λmin (which can depend, e.g., on the noise variance; see Section IV.E), for j = 1, ..., D and k = 1, ..., D:
• Compute the residual term R_jk^(h), assuming the current estimates X̃_pq^(h−1) of the other morphological components, (p, q) ≠ (j, k), are fixed:
R_jk^(h) = X − Σ_{(p,q)≠(j,k)} X̃_pq^(h−1).
• Estimate the current coefficients of X̃_jk^(h) by thresholding with threshold λ(h):
α̃_jk^(h) = Δ_{λ(h)} (Ξ_k^T R_jk^(h) Φ_j^T).
• Get the new estimate of X_jk by reconstructing from the selected coefficients α̃_jk^(h):
X̃_jk^(h) = Ξ_k α̃_jk^(h) Φ_j.
3. Decrease the threshold λ(h) following a given strategy.
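The loop above can be sketched compactly in numpy. This is a minimal illustration under our assumptions (orthonormal square subdictionaries on each side, a simple geometric threshold decay as one possible strategy); all names are ours:

```python
import numpy as np

def soft(u, lam):
    # Entrywise soft thresholding, Eq. (33).
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

def mmca(X, Xis, Phis, lam0, lam_min, n_iter=50):
    """Sketch of mMCA with orthonormal subdictionaries.

    Xis  : spectral subdictionaries Xi_k, each (m, m) with orthonormal columns
    Phis : spatial subdictionaries Phi_j, each (t, t) with orthonormal rows
    """
    Dj, Dk = len(Phis), len(Xis)
    comps = [[np.zeros_like(X) for _ in range(Dk)] for _ in range(Dj)]
    lam = lam0
    decay = (lam_min / lam0) ** (1.0 / n_iter)   # geometric schedule (one option)
    for _ in range(n_iter):
        for j in range(Dj):
            for k in range(Dk):
                # Residual w.r.t. all other morphological components.
                R = X - sum(comps[p][q] for p in range(Dj) for q in range(Dk)
                            if (p, q) != (j, k))
                # Transform, soft-threshold, reconstruct (Eqs. 31-33).
                a = soft(Xis[k].T @ R @ Phis[j].T, lam)
                comps[j][k] = Xis[k] @ a @ Phis[j]
        lam *= decay   # step 3: decrease the threshold
    return comps
```

With a single identity subdictionary on each side, the final component approaches X as the threshold decays toward λmin, which is a quick sanity check of the update rule.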
1. The Thresholding Strategy

In previous work (Bobin et al., 2007), we proposed a thresholding strategy that is likely to provide the solution to the ℓ0-sparse monochannel problem. This strategy, termed MOM (mean of max), can be extended to the multichannel case. At each iteration h, the residual is projected onto each subdictionary, and we define:

m_jk^(h−1) = ‖ Ξ_k^T ( X − Σ_{p,q} Ξ_q α̃_pq^(h−1) Φ_p ) Φ_j^T ‖_∞.    (34)
The multichannel MOM (mMOM) threshold is then computed as the mean of the two largest values m_{j0 k0}^(h−1) and m_{j1 k1}^(h−1) in the set {m_jk^(h−1)}_{j=1,...,D; k=1,...,D}:

λ(h) = (1/2) ( m_{j0 k0}^(h−1) + m_{j1 k1}^(h−1) ).    (35)
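Eqs. (34) and (35) translate directly into numpy; the sketch below (our illustration, with our naming and the same orthonormal-subdictionary assumption as above) computes the mMOM threshold from the current component estimates:

```python
import numpy as np

def mmom_threshold(X, comps, Xis, Phis):
    """mMOM: mean of the two largest sup-norm projections, Eqs. (34)-(35)."""
    Dj, Dk = len(Phis), len(Xis)
    # Global residual given the current morphological component estimates.
    R = X - sum(comps[j][k] for j in range(Dj) for k in range(Dk))
    # m_jk: largest coefficient magnitude in each subdictionary, Eq. (34).
    m = sorted(np.abs(Xis[k].T @ R @ Phis[j].T).max()
               for j in range(Dj) for k in range(Dk))
    # Mean of the two largest values, Eq. (35).
    return 0.5 * (m[-1] + m[-2])
```

The returned threshold always lies between the largest and second-largest subdictionary maxima, so it admits exactly the atoms dominating the current residual.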
The next section gives conditions under which mMCA/mMOM selects atoms without error and converges asymptotically to the solution of the multichannel ℓ0-sparse recovery problem in Eq. (20).
D. Recovering Sparse Multichannel Decompositions Using mMCA

The mMOM rule defined in Eqs. (34) and (35) is such that mMCA selects, at each iteration, atoms belonging to the same subdictionary Ψ_jk = Ξ_k ⊗ Φ_j. Although it seems more computationally demanding, the mMOM strategy has several nice properties. We give sufficient conditions under which (1) mMCA/mMOM selects atoms belonging to the active atom set of the solution of the ℓ0-sparse recovery problem (exact selection property), and (2) mMCA/mMOM converges exponentially to X and its sparsest representation in Ψ. mMCA/mMOM exhibits an auto-stopping behavior and requires only one parameter, λmin, whose choice is easy and is discussed in Section IV.E. The next proposition states that mMCA/mMOM verifies the exact selection property at each iteration.

Proposition 1 (Exact Selection Property). Suppose that X is K-sparse such that:
X = Σ_{j=1}^{D} Σ_{k=1}^{D} Σ_{i∈Λ(X_jk)} α_jk[i] ψ_jk[i],
where K = Σ_{j,k} Card(Λ(X_jk)) satisfying K