ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 109
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in Imaging and Electron Physics
EDITED BY
PETER W. HAWKES CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
VOLUME 109
ACADEMIC PRESS A Harcourt Science and Technology Company
San Diego San Francisco New York Boston London Sydney Tokyo
This book is printed on acid-free paper.

Copyright © 1999 by Academic Press

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher's consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per-copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-1998 chapters are as shown on the title pages: if no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/99 $30.00

ACADEMIC PRESS
A Harcourt Science and Technology Company
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
http://www.apnet.com

Academic Press
24-28 Oval Road, London NW1 7DX, UK
http://www.hbuk.co.uk/ap/

International Standard Serial Number: 1076-5670
International Standard Book Number: 0-12-014751-3
Typeset by Laser Words, Madras, India
Printed in the United States of America
99 00 01 02 03 BB 9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS
PREFACE
FORTHCOMING CONTRIBUTIONS

Development and Applications of a New Deep Level Transient Spectroscopy Method and New Averaging Techniques
PLAMEN V. KOLEV AND M. JAMAL DEEN
  I. Introduction
  II. Review of the Deep-Level Transient Spectroscopy Method
  III. Averaging and Recording of Digital DLTS Transient Signals
  IV. Feedback Circuits and Experimental Setup for CC-DLTS and CR-DLTS
  V. Constant-Resistance DLTS in Enhancement Mode MOSFETs
  VI. Constant-Resistance DLTS in Depletion Mode MOSFETs
  VII. Constant-Resistance DLTS in Junction Field-Effect Transistors
  VIII. Conclusions and Areas for Future Research
  References
  Appendix A: Magnitude Errors
  Appendix B: Time Constant Errors
  Appendix C: Noise Sources and Signal-to-Noise Ratio (SNR) in the DLTS Transients
  Appendix D: Electrical Circuit of the Pseudo-Logarithmic Generator
  Appendix E: Electrical Circuits of the Feedback Circuit
  Appendix F: Listing of a Template for a DLTS Measurement Program
  Appendix G: Listing of a Template for a DLTS Analysis Program
  Appendix H: Radiation-Induced Defects in Silicon
  List of Acronyms
  List of Symbols

Complex Dyadic Multiresolution Analyses
J.-M. LINA, P. TURCOTTE, AND B. GOULARD
  I. Introduction
  II. The Spline Example
  III. Multiresolution and Wavelet
  IV. Daubechies' Wavelets
  V. Symmetric Daubechies Wavelets
  VI. The Phase of SDW Scaling Function
  VII. The Mallat Algorithm with Complex Filters
  VIII. Restoration from the Phase
  IX. Image Enhancement
  X. Complex Shrinkage
  XI. Conclusion
  References
  Reading List

Lattice Vector Quantization for Wavelet-Based Image Coding
MIKHAIL SHNAIDER AND ANDREW P. PAPLINSKI
  I. Introduction
  II. Quantization of Wavelet Coefficients
  III. Lattice Quantization Fundamentals
  IV. Lattices
  V. Quantization Algorithms for Selected Lattices
  VI. Counting the Lattice Points
  VII. Scaling Algorithm
  VIII. Selecting a Lattice for Quantization
  IX. Entropy Coding of Lattice Vectors
  X. Experimental Results
  XI. Conclusions
  Appendix A: Cartan Matrices of Some Root Systems
  References

Fuzzy Cellular Neural Networks and Their Applications to Image Processing
TAO YANG
  I. Introduction
  II. Fuzzy Cellular Neural Networks
  III. Theory of Fuzzy Cellular Neural Networks
  IV. FCNN as Computational Arrays
  V. Embed Linguistic Statements into FCNN
  VI. Learning Algorithms of FCNN
  VII. Generic Algorithm for FCNN
  VIII. Applications of Discrete-Time FCNN
  IX. Conclusions and Future Work
  References

Index
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the author’s contribution begins
M. JAMAL DEEN (1), School of Engineering Science, Simon Fraser University, Vancouver, British Columbia, Canada V5A 1S6; Department of Electrical and Computer Engineering, McMaster University, Hamilton, Ontario, Canada L8S 4K1

BERNARD GOULARD (163), Network for Computing and Mathematical Modeling, Centre de Recherches Mathematiques, Univ. de Montreal C.P. 6128 Succ. Centre-Ville, Montreal, Quebec H3C 3J7, Canada

PLAMEN KOLEV (1), School of Engineering Science, Simon Fraser University, Vancouver, British Columbia, Canada V5A 1S6

JEAN-MARC LINA (163), Network for Computing and Mathematical Modeling, Centre de Recherches Mathematiques, Univ. de Montreal C.P. 6128 Succ. Centre-Ville, Montreal, Quebec H3C 3J7, Canada

ANDREW P. PAPLINSKI (199), School of Computer Science and Software Engineering, Monash University, Australia

MIKHAIL SHNAIDER (199), Motorola Australian Research Centre

PAUL TURCOTTE (163), Network for Computing and Mathematical Modeling, Centre de Recherches Mathematiques, Univ. de Montreal C.P. 6128 Succ. Centre-Ville, Montreal, Quebec H3C 3J7, Canada

TAO YANG (265), Electronics Research Laboratory and Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA 94720, U.S.A.
PREFACE

The themes of the four substantial contributions to this volume are all new to the series, though some aspects of image coding, which is the subject of the third chapter, have been examined here in the past.

We begin with an account of a new method of deep-level transient spectroscopy and a related averaging technique developed by P.V. Kolev and M.J. Deen. These are extremely important for the design of semiconductor devices, the properties of which are influenced by deep-seated imperfections in the crystal structure: impurities, lattice defects and interactions between these in particular. Deep-level transient spectroscopy is a sensitive method for investigating these deep-lying defects. In the past, it has not been universally accepted but the authors have made significant progress and they argue convincingly that results obtained by this technique are reliable. Their chapter, which is long enough to be regarded as a monograph on the subject, covers the subject very fully, starting with a review of the technique and continuing with a detailed description of signal processing and the new approach. It is then applied to enhancement-mode and depletion-mode MOSFETs and to junction field-effect transistors. Specialized topics are examined in eight appendixes.

The next two contributions are concerned with different aspects of wavelets. The first, by J.-M. Lina, P. Turcotte, and B. Goulard, considers certain questions of the highest interest, both practical and intellectual: What use are complex wavelets and what is the role of symmetry? Wavelets have been known and studied (though not under that name) for more than a century, but for many decades they were little more than a mathematical curiosity; the earlier books on image processing pointed out the attractive features of the Haar functions, for example, but they were rarely used in practice. With the work of Ingrid Daubechies in particular, the utility of a whole class of such functions was finally recognized and this chapter records and explains an extension to the theory, in which compactness of the support, orthogonality and symmetry are rendered compatible by the introduction of complex-valued scaling functions. The role of the phase thereby introduced is examined carefully. This is a very fascinating development and I am delighted that the authors agreed to explain it in the pages of AIEP.

The second contribution on wavelets, by M. Shnaider and A.P. Paplinski, deals with the use of wavelets in vector coding. This type of coding, in which each codeword corresponds to a set of signal values (or image grey-levels) and not to a single value or grey-level, suffers from the rapid growth of the code-book as the size of the set increases. Such coding is, however, extremely efficient and a considerable amount of research continues to be devoted to finding ways of circumventing this handicap. Lattice theory is a very promising approach, and in this chapter the authors present very fully the necessary background knowledge and show that the combination of lattice quantization and wavelets is indeed highly effective.

The final contribution, which is again long enough to be regarded as a monograph on its subject, describes fuzzy cellular neural networks and their use in image processing. The controlled imprecision of fuzzy set theory, even if some probability theorists deny the need for a new terminology, has generated much new thinking, in control engineering for example, and also in signal and image processing. Here, the theory of fuzzy cellular neural networks is presented in great detail. The connection between such ideas and mathematical morphology is explored and applications in image processing are presented. This very full account of a fairly new aspect of image processing will, I hope, be widely used and, in so fast-moving a research area, will doubtless need to be complemented by a new contribution on the subject in a few years' time.

As usual, I thank the authors very sincerely, on behalf of all readers of these volumes, for agreeing to present their ideas in a way that makes them accessible to a wide range of non-specialists and I conclude with a list of chapters promised for future volumes.

Peter W. Hawkes
FORTHCOMING CONTRIBUTIONS

L. Alvarez Leon and J.-M. Morel (vol. 111)
    Mathematical models for natural images
D. Antzoulatos
    Use of the hypermatrix
W. Bacsa (vol. 110)
    Interference scanning optical probe microscopy
N. D. Black, R. Millar, M. Kunt, F. Ziliani and M. Reid
    Second generation image coding
N. Bonnet
    Artificial intelligence and pattern recognition in microscope image processing
G. Borgefors
    Distance transforms
A. van den Bos and A. Dekker
    Resolution
O. Bostanjoglo (vol. 110)
    High-speed electron microscopy
S. Boussakta and A. G. J. Holt (vol. 111)
    Number-theoretic transforms and image processing
P. G. Casazza
    Frames
J. A. Dayton
    Microwave tubes in space
E. R. Dougherty and D. Sinha
    Fuzzy morphology
J. M. H. Du Buf
    Gabor filters and texture analysis
R. G. Forbes
    Liquid metal ion sources
E. Forster and F. N. Chukhovsky
    X-ray optics
A. Fox
    The critical-voltage effect
M. J. Fransen (vol. 111)
    The ZrO/W Schottky emitter
M. Gabbouj
    Stack filtering
A. Gasteratos and I. Andreadis (vol. 110)
    Soft morphology
W. C. Henneberger (vol. 112)
    The Aharonov-Bohm effect
M. I. Herrera and L. Bru
    The development of electron microscopy in Spain
K. Ishizuka
    Contrast transfer and crystal images
C. Jeffries
    Conservation laws in electromagnetics
M. Jourlin and J.-C. Pinoli
    Logarithmic image processing
E. Kasper
    Numerical methods in particle optics
A. Khursheed
    Scanning electron microscope design
G. Kogel
    Positron microscopy
K. Koike
    Spin-polarized SEM
W. Krakow
    Sideband imaging
A. van de Laak-Tijssen, E. Coets and T. Mulvey
    Memoir of J. B. Le Poole
L. J. Latecki
    Well-composed sets
C. Mattiussi
    The finite volume, finite element and finite difference methods
S. Mikoshiba and F. L. Curzon
    Plasma displays
R. L. Morris
    Electronic tools in parapsychology
J. G. Nagy
    Restoration of images with space-variant blur
P. D. Nellist and S. J. Pennycook
    Z-contrast in the STEM and its applications
M. A. O'Keefe
    Electron image simulation
G. Nemes
    Phase-space treatment of photon beams
B. Olstad
    Representation of image operators
M. Omote and S. Sakoda (vol. 110)
    Aharonov-Bohm scattering
C. Passow
    Geometric methods of treating energy transport phenomena
E. Petajan
    HDTV
F. A. Ponce
    Nitride semiconductors for high-brightness blue and green light emission
J. W. Rabalais
    Scattering and recoil imaging and spectrometry
H. Rauch
    The wave-particle dualism
D. Saldin
    Electron holography
G. E. Sarty (vol. 111)
    Reconstruction from non-Cartesian grids
G. Schmahl
    X-ray microscopy
J. P. F. Sellschop
    Accelerator mass spectroscopy
S. Shirai
    CRT gun design methods
T. Soma
    Focus-deflection systems and their applications
I. Talmon
    Study of complex fluids by transmission electron microscopy
S. Tari (vol. 111)
    Shape skeletons and greyscale images
J. Toulouse
    New developments in ferroelectrics
T. Tsutsui and Z. Dechun
    Organic electroluminescence, materials and devices
Y. Uchikawa
    Electron gun optics
D. van Dyck
    Very high resolution electron microscopy
J. S. Villarrubia
    Mathematical morphology and scanned probe microscopy
L. Vincent
    Morphology on graphs
N. White
    Multi-photon microscopy
J. B. Wilburn
    Generalized ranked-order filters
C. D. Wright and E. W. Hill
    Magnetic force microscopy
Development and Applications of a New Deep Level Transient Spectroscopy Method and New Averaging Techniques

PLAMEN V. KOLEV AND M. JAMAL DEEN

School of Engineering Science, Simon Fraser University, Vancouver, British Columbia, Canada V5A 1S6
Department of Electrical and Computer Engineering, McMaster University, Hamilton, Ontario, Canada L8S 4K1
I. Introduction
    A. Importance of Impurity Characterization
    B. Deep-Level Transient Spectroscopy
    C. Goals of This Article
    D. Organization of the Article
II. Review of the Deep-Level Transient Spectroscopy Method
    A. Introduction
    B. Defects, Impurities and Energy Levels
    C. Generation-Recombination Statistics
    D. Detection of the Emission of the Trapped Charge
    E. Determination of the Deep Level Parameters
    F. Conventional Deep-Level Transient Spectroscopy (DLTS)
    G. Main Stages in the DLTS Experiment
    H. A New Classification Scheme for DLTS Methods
    I. Other Methods and Comparisons with DLTS
III. Averaging and Recording of Digital DLTS Transient Signals
    A. Introduction
    B. Technical Overview of the DLTS Experiment
    C. Pseudo-Logarithmic Averaging
    D. Continuous Time Averaging
    E. Applications
    F. Conclusions
IV. Feedback Circuits and Experimental Setup for CC-DLTS and CR-DLTS
    A. Introduction
    B. Feedback Circuit and Details on the Setup for CC-DLTS
    C. Experimental Setup for CR-DLTS
    D. Illustrations
    E. Conclusions
V. Constant-Resistance DLTS in Enhancement Mode MOSFETs
    A. Introduction
    B. Overview of the DLTS Techniques Applied to FETs
    C. Theoretical Background
    D. Experimental Results
    E. Conclusions
VI. Constant-Resistance DLTS in Depletion Mode MOSFETs
    A. Introduction
    B. Theoretical Background
    C. Experimental Results and Discussions
    D. Conclusions
VII. Constant-Resistance DLTS in Junction Field-Effect Transistors
    A. Introduction
    B. Theoretical Background
    C. Experimental Results and Discussion
    D. Conclusions
VIII. Conclusions and Areas for Future Research
    A. Conclusions
    B. Areas for Future Research
References
Appendix A: Magnitude Errors
Appendix B: Time Constant Errors
Appendix C: Noise Sources and Signal-to-Noise Ratio (SNR) in the DLTS Transients
    C.1. Noise Sources
    C.2. Noise Sources in CT-, CC-, and CR-DLTS Transients
    C.3. The Role of the Averaging Techniques for SNR Improvement
Appendix D: Electrical Circuit of the Pseudo-Logarithmic Generator
Appendix E: Electrical Circuits of the Feedback Circuit
Appendix F: Listing of a Template for a DLTS Measurement Program
Appendix G: Listing of a Template for a DLTS Analysis Program
Appendix H: Radiation-Induced Defects in Silicon
List of Acronyms
List of Symbols
I. INTRODUCTION

A. Importance of Impurity Characterization
The rapid advances in semiconductor technology during the last few decades are closely related to the success in achieving a significant increase in semiconductor material purity (Sze, 1983).¹ The ability to detect and measure the properties of very small amounts of impurity atoms or structural defects in the semiconductor material and in the active regions of semiconductor devices is of fundamental importance for this progress. The electrical properties of these impurities or defects are of particular interest for both the performance (Kwan and Deen, 1998; Raychaudhuri et al., 1996a, b; Kwan et al., 1997) and reliability (Kwan et al., 1996; Raychaudhuri et al., 1996b; Deen and Raychaudhuri, 1994; Deen and Quon, 1991) of semiconductor devices (Graff, 1995; Pantelides, 1992; Deen and Zhu, 1993e; Sze, 1983; Zhu et al., 1992).

¹ References are listed at the end of this article in alphabetical order.

Shallow impurities in semiconductors generally contribute extra free carriers, electrons or holes. By intentionally incorporating shallow impurities in the semiconductor material, the type and magnitude of the conductivity of the material are controlled. The properties of the shallow impurities in relation to the host semiconductor material are considered to be well understood. Still, even for the best-known semiconductor, silicon, some details of the interaction of the shallow impurities with the host atoms were only recently found (Karasyuk et al., 1994).

Other imperfections in the crystal structure, such as other impurity atoms, lattice point defects and impurity-defect complexes, are referred to as deep centers. Their role is primarily to control the generation, recombination, and lifetime of the current carriers. Despite the significant progress in the last two decades, deep centers have proven to be far more difficult to investigate than shallow impurities. In many cases, the physical nature of the center causing the appearance of a deep level is poorly understood or unknown (Pantelides, 1992).
B. Deep-Level Transient Spectroscopy
Deep-level transient spectroscopy (DLTS) (Lang, 1974a) is a well-established research technique for characterization of electrically active centers deep in the semiconductor bandgap (Blood and Orton, 1992; Schroder, 1990). It is known for its high sensitivity and direct relation to the measured properties of the defects. In the last two decades, many variations of the method were developed and adapted for studying the defect properties of a variety of materials and devices. Still, new modifications and further improvements of the already existing DLTS techniques continue to be reported (Anand et al., 1992; Bosetti et al., 1995; Chretien et al., 1995; Doolittle and Rohatgi, 1992; Hacke and Okushi, 1997; Istratov and Vyvenko, 1995; Lossen et al., 1996; Martin, 1995; Ozder et al., 1996; Rancour, 1995; Shaban, 1996).

Despite the large variety of modifications, the method is not yet accepted as a standard characterization technique in the semiconductor industry. There are many reasons for this limited acceptance. First, in order to extract the properties of the traps, there is a need for a wide variation in sample temperature. Second, a standard describing the settings and parameters of the measurement instrumentation has not been established. Third, the wide variety of test structures and the dependence of the signal magnitude on the size of the test device prevent the establishment of a standardized approach that is convenient for industrial applications.

With the continued scaling-down in device dimensions in semiconductor integrated circuits and the emergence of new technologies, such as SiGe heterojunction bipolar technology, silicon-on-insulator (SOI), porous silicon, thin-film transistors, or copper metallization for VLSI and ULSI, the importance of accurate measurement and control of the defects that introduce deep levels will progressively increase. Therefore, steps toward further improvement and refinement of the DLTS method and instrumentation have important practical applications.
C. Goals of This Chapter

This chapter first describes new and improved digital techniques for transient data processing that offer better sensitivity and more effective data storage. This approach is applicable to all variations of DLTS and can be easily adapted to other experiments involving recording and analysis of transient signals. It also opens up opportunities for further development of the group of isothermal DLTS techniques that rely on the analysis of the transient decay, and not on the thermal scan, for extraction of the characteristic time constant.

Second, a new feedback circuit is described that allows for fast and sensitive operation in one very attractive and technically challenging variation of the method, constant-capacitance (CC-)DLTS. It is important to note that this variation produces a signal with a magnitude that is independent of the area of the device and, therefore, the method is better suited for routine parametric control in the semiconductor industry.

Third, a new technique, termed constant-resistance (CR-)DLTS, is presented. This new technique is well suited for measurements of field-effect transistors (FETs) regardless of their size and without compromising the sensitivity of the measurement. Unlike in CC-DLTS, the sensitivity depends on the gain of the transistor (thus, on the channel width-to-length ratio and not on the active area), so the technique allows for sensitive measurement of very small, deep-submicron devices, which are the basic transistors in advanced microelectronic circuits and systems. For corroboration of this technique, the results are compared to those obtained from conventional DLTS and CC-DLTS measurements. While the method has been applied to three different types of field-effect transistors, it can be easily used for a wider range of FETs. Illustrations are made with measurements of proton- and neutron-induced damage in metal-oxide-semiconductor (MOS) FETs and in silicon and germanium junction FETs. The possibilities for measurement of interface trap density in the active interface of regular MOSFETs and for defect profiling using the new technique are also demonstrated.

In this chapter the emphasis is on the development of semiconductor metrology and instrumentation using mixed analog, digital and software engineering. The experimental results are used mainly for illustration of the new techniques and are not self-contained and complete studies.

D. Organization of the Chapter
In Sect. II, the basics of the DLTS method are introduced and various techniques and instrumentation are described. The existing large variety of DLTS techniques makes any attempt at classification a very complex task. Nevertheless, an attempt is made to define the criteria that can be used for classification of the DLTS techniques. Following these criteria, a classification scheme is demonstrated with examples using well-known and less popular DLTS techniques. One potential benefit of this classification is the identification of techniques and conditions that allow the reader to quickly tailor the setup to fit the specific properties of the device or material under investigation.

In Sect. III, two complementary digital signal processing techniques that are well suited for DLTS applications are presented. These techniques are new in the processing of digitized DLTS transients and can be used in virtually any DLTS experiment. Furthermore, their application can be easily extended to the data processing in other physical experiments that produce a transient signal. A mathematical model is presented and an error analysis is made of pseudo-logarithmic averaging, a technique which is less well known in digital signal processing. Because of the substantial increase in signal-to-noise ratio (SNR) and efficient data reduction, these techniques may increase interest in the isothermal DLTS techniques (Akita et al., 1993; Kim et al., 1993; Kiyota et al., 1992; Okushi and Tokumaru, 1980, 1981; Yoshida et al., 1993).

The latest developments in the feedback circuit used for CC-DLTS are presented in Sect. IV. This improved feedback circuit is essential for the successful implementation of the new CR-DLTS technique. The speed of the feedback is demonstrated by comparison of recorded traces from standard capacitance-transient DLTS, CC-DLTS and CR-DLTS. Guidelines are given for design of the feedback circuit and its setup during CC- and CR-DLTS measurements. The sensitivity achieved using the feedback circuit is demonstrated with measurements of interface-trap density of submicron MOSFETs.

A description of the new CR-DLTS technique is given in Sect. V. This new technique is similar to conductance DLTS, but it is more sensitive and does not require simultaneous measurement of the transconductance $g_m$ or surface mobility $\mu_s$ of the transistor for calculation of the trap concentrations. An important advantage is that the DLTS signal is independent of the transistor size, which allows for measurements of very small transistors. In this chapter, the technique is demonstrated with measurements of submicron enhancement-mode MOSFETs. The body effect on CR-DLTS is demonstrated and CR-DLTS is compared with CC-DLTS by using a multi-transistor structure containing 400 transistors connected in parallel.

In Sect. VI, the new CR-DLTS technique is demonstrated with measurements of radiation-induced traps in buried-channel MOSFETs, which are used as CCD output amplifiers. These devices exhibit a unique structure that offers extended opportunities for studying the spatial distribution of the defects. In addition to the normal front-gate mode of operation, the back-gate mode of operation is demonstrated, and this mode is applicable for studying the channel-substrate p-n junction. The results are compared with CC-DLTS data. Complementary measurements using front-gate and back-gate operation of CR-DLTS can be useful to resolve the difficulties in the analysis of DLTS data from structures with symmetrical p-n junctions.

In Sect. VII of this article, the CR-DLTS technique is applied to study virgin and fast-neutron-irradiated junction field-effect transistors (JFETs). The technique is demonstrated with measurements of three groups of devices: commercially available discrete silicon JFETs; virgin and high-level neutron-irradiated silicon JFETs made by a specific monolithic technology (Citterio et al., 1995); and commercially available discrete germanium p-channel JFETs. CR-DLTS is demonstrated to be a very sensitive and area-independent technique applicable for measurement of a wide range of deep-level defect concentrations. Comparisons are made with CC-DLTS and standard capacitance DLTS. Possibilities for defect profiling in the channel are also demonstrated.

Section VIII summarizes the accomplishments described in the chapter and recommends directions for future work. It also gives references to follow in future research in this field and proposes some attractive applications of the developed system. At the end of the chapter, an extended list of references and appendixes, which include schematics of electronic blocks, program listings and mathematical transformations, is provided for readers.
II. REVIEW OF THE DEEP-LEVEL TRANSIENT SPECTROSCOPY METHOD
A. Introduction
In this section, we review the basics of deep-level transient spectroscopy (DLTS), including methods for DLTS signal detection and data processing. First, the kinds of imperfections that can be studied with DLTS are described
DEVELOPMENT AND APPLICATIONS OF A NEW DLTS METHOD
and some important results of the generation-recombination statistics are reviewed. Then, several detection techniques, based on the effect of trapped charge on measurable electrical parameters, are presented. Next, the standard capacitance-transient DLTS method is introduced. This is followed by a general description of the main stages in DLTS experiments. This definition of the stages in DLTS helps the reader to understand better the options for design of an experiment and serves as the basis for a classification scheme. Section II concludes with a demonstration of how a given technique can be classified according to this new classification scheme; a few examples are given for illustration.
B. Defects, Impurities and Energy Levels
1. Shallow Impurities
The explosive growth of the electronic industry during the last few decades is based on a significant increase in accumulated experience in manipulating the properties of crystalline semiconductors. This manipulation is done by tight control of the purity and perfection of the crystal lattice and by intentional incorporation of atoms extrinsic to the host semiconductor material. Shallow impurities in semiconductors introduce minor perturbations in the lattice, creating bound states in the bandgap of the host material very close to the band edges. They generally contribute extra charge carriers, either electrons or holes. The primary role of intentionally incorporating shallow impurities in the semiconductor material is to control the type and magnitude of the conductivity in the material. The ionization energy $E_i$ is the amount of energy that the foreign atom needs to release a free current carrier in the host material. It is clear that for conductivity due primarily to the extrinsic atoms, this ionization energy must be less than $kT$ ($k$ is the Boltzmann constant and $T$ the absolute temperature) or much smaller than the bandgap energy $E_g$ of the host material, which determines the amount of intrinsic free carriers at a given temperature. For example, if electron conduction is required, the extrinsic atom must "donate" an electron and therefore is called a donor. Similarly, if hole conduction is required, the extrinsic atom must "accept" an electron from the host material, thus creating a free hole, and the extrinsic atom is called an acceptor. Table I lists some experimentally determined (using thermal and optical methods) ionization energies of shallow impurities in silicon and germanium (Kohn, 1957).
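The comparison of the ionization energy with $kT$ and $E_g$ can be checked with a few lines of arithmetic. The sketch below uses the thermal ionization energies of P, As, and Sb in Si from Table I; the temperature of 300 K and the silicon bandgap of 1.12 eV are assumptions added for illustration, and the function names are hypothetical.

```python
# Boltzmann constant in eV/K (CODATA value, rounded).
K_B_EV = 8.617333e-5

# Assumed room-temperature bandgap of silicon [eV]; not taken from the text.
SI_BANDGAP_EV = 1.12

# Thermal ionization energies of donors in Si [eV], as listed in Table I.
SI_DONOR_THERMAL_EV = {"P": 0.044, "As": 0.049, "Sb": 0.039}


def thermal_energy_ev(temperature_k):
    """Return kT in eV at the given absolute temperature."""
    return K_B_EV * temperature_k


def ratio_to_kt(energy_ev, temperature_k):
    """Return the ionization energy expressed in units of kT."""
    return energy_ev / thermal_energy_ev(temperature_k)


def ratio_to_bandgap(energy_ev, bandgap_ev=SI_BANDGAP_EV):
    """Return the ionization energy as a fraction of the bandgap."""
    return energy_ev / bandgap_ev


if __name__ == "__main__":
    for element, e_i in SI_DONOR_THERMAL_EV.items():
        print(element,
              round(ratio_to_kt(e_i, 300.0), 2),
              round(ratio_to_bandgap(e_i), 3))
```

Under these assumptions the three donor energies are roughly 1.5 to 1.9 times $kT$ at 300 K and less than 5% of the bandgap, which is why such levels are largely ionized at room temperature.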
As Si and Ge are in group IV of the periodic table, most donors are from group V, which substitute for the Si or Ge atoms and have one remaining electron in the upper valence shell from which an electron can be easily released to contribute to electron conduction. Similarly, the elements from group III are acceptors, as they require an extra electron to
PLAMEN V. KOLEV AND M. JAMAL DEEN

TABLE I
IONIZATION ENERGIES OF SHALLOW IMPURITIES IN Si AND Ge

                      E_i in Si [eV]         E_i in Ge [eV]
Impurity element      Thermal    Optical     Thermal
Donors
  Li                  0.033      -           -
  P                   0.044      0.045       0.0120
  As                  0.049      0.053       0.0127
  Sb                  0.039      0.043       0.0096
  Bi                  0.069      -           -
Acceptors
  B                   0.045      0.046       0.0104
  Al                  0.057      0.067       0.0102
  Ga                  0.065      0.071       0.0108
  In                  0.16       0.154       0.0112
complete the upper valence shell and, taking this electron from the lattice, they produce a free hole. In general, the presence of shallow-level impurities is well understood: the potential in the crystal lattice is essentially Coulomb-like and the impurity states are very closely related to the states of the hydrogen atom (Milnes, 1973; Pantelides, 1992). The properties of the shallow impurities related to the host semiconductor material were essentially determined both theoretically (effective-mass theory) and experimentally, using mainly optical absorption techniques, by the end of the 1950s. However, even for the best-known semiconductor, silicon, details are still being revealed (Karasyuk, 1994).
2. Deep-Level States: Electron, Hole Traps and Recombination Centers
Other impurities and point lattice defects such as vacancies, self-interstitials, anti-site defects, etc., induce more severe local perturbations of the potential in the crystal lattice, creating bound states that are more localized and deeper in the bandgap. Imperfections in the crystal structure, such as impurity atoms, lattice point defects and impurity-defect complexes, are referred to as deep centers (Jaros, 1982; Milnes, 1973; Pantelides, 1992; Sze, 1983). As opposed to shallow impurities, they act primarily as charge carrier traps or recombination centers and, thus, they control the lifetime of current carriers. When a center in the forbidden energy gap interacts mainly with electrons from the conduction band, it is defined as an electron trap; when the center interacts mainly with holes in the valence band, it is defined as a hole trap; when both types of carriers can interact with the center, it is called a recombination center (see Fig. 1 and Miller et al., 1977; Sah et al., 1970; Schroder, 1990; Sze, 1983). Since the probability for electron or hole emission
FIGURE 1. Illustration of relaxation processes: a) direct bandgap recombination; b) relaxation processes involving a deep level.
is exponentially dependent on the energy separation to the corresponding conduction or valence band, the most effective recombination centers are those with energy levels around the midgap that have nearly equal probabilities for electron or hole emission. For the same reason, at relatively low temperatures, deep levels above the midgap usually act as electron traps (or donor-like) and those below the midgap as hole traps (acceptor-like) (Jaros, 1982; Milnes, 1973; Schroder, 1990; Sze, 1983). However, there are many exceptions to this rule at higher temperatures. Deep centers have proven to be far more difficult to investigate than shallow impurities. In many cases the physical nature of the center causing the appearance of a deep level is poorly understood or unknown (Pantelides, 1992). As a common characteristic, bulk-induced defects exhibit one or more well-defined discrete energy levels and often are called deep levels² (Jaros, 1982; Li and Sah, 1982b; Milnes, 1973; Pantelides, 1992; Pearton et al., 1992; Schroder, 1990; Sze, 1983). In contrast, surface-related defects usually generate a continuum of energy levels spread over the entire bandgap (Blood and Orton, 1992; Nicolian and Brews, 1982; Klausmann, 1981; Sze, 1983) and are commonly referred to as interface traps. In addition to the shallow impurities and deep centers, the semiconductor lattice may contain extended defects, such as dislocations, stacking faults, precipitates, and grain boundaries. In most cases, they behave electrically as large-concentration deep centers and, in general, crystals free of dislocations are desired for electronic and optoelectronic applications (Pantelides, 1992).

² Here and elsewhere, we use the short form deep levels to stand for deep-level centers.
3. Effects of the Deep Centers on Device Performance
As previously mentioned, deep levels control primarily the lifetime of the excess charge carriers. In devices where long carrier lifetime and stability are desired, deep centers usually are unintentionally incorporated and have a negative effect. All of the following may be attributed to deep centers: leakage currents in p-n junctions (Chen et al., 1984); Schottky barriers and related devices (Milnes, 1973); minority carrier lifetime degradation (Hamilton et al., 1979); charge losses in CCD cells (Murowinski et al., 1995) or DRAM capacitors; reduced efficiency and degradation of solar cells (Rohatgi, 1991; Rohatgi et al., 1993; Schott et al., 1980) and high electron mobility transistors (HEMTs) (Meneghesso et al., 1996); contact quality (Pantelides, 1992); performance degradation in avalanche photodiodes (Zhao et al., 1996; Ma et al., 1994); resonant tunneling diodes (Deen, 1993a, b; Ng et al., 1991; Ma et al., 1992); and bipolar transistors and low-frequency noise (Chen et al., 1998a, b; Deen et al., 1995a, b; Ng et al., 1992; Murowinski et al., 1993a, b; Doan et al., 1997). In VLSI technology, as the dimensions shrink and die area increases, defect density must be decreased appropriately and, thus, very low deep-center concentrations must be detected and eliminated (Rancour, 1995). Furthermore, the new technology trends are for using deep trenches filled with a dielectric that has quite different mechanical properties from those of the host semiconductor. As a result, subsequent thermal treatment may generate stress-induced lattice defects (Pantelides, 1992). Annealing of these defects (Johnson and Herring, 1991) is not a simple task, because another important requirement, the need to keep the p-n junctions shallow, calls for lowering the temperature and shortening the time of the thermal processing.
In a number of recent models, the Fermi-level pinning observed at Schottky contacts has been attributed to native defects at the interface (Pantelides, 1992). For optoelectronic devices, nonradiative transitions through deep levels may substitute for the radiative transitions, thus degrading photon yield by up to a few orders of magnitude (Pearton et al., 1992). Therefore, deep-level control is of particular importance for III-V devices and especially for solid-state
lasers, such as GaAs/AlGaAs lasers (Lang, 1989). Recent success in the development of SiGe heterojunction devices is particularly exciting for the future of silicon-based electronics, because it may provide two properties missing in crystalline silicon microelectronics but inherent in III-V devices: bandgap engineering and light emission. Because of the lattice mismatch between Si and Ge, thin $\mathrm{Ge}_x\mathrm{Si}_{1-x}$ layers may generate a significant amount of deep levels, which can lead to a significant deterioration of some desired device properties, mainly carrier mobility (Lang, 1989). Studying deep levels is also important for optimization of III-V devices (Gotz et al., 1994; Gotz et al., 1996a, b, c). On the other hand, in devices where short lifetime is beneficial or recombination is light emitting, such as in fast switches or light-emitting diodes (LEDs), respectively, deep centers can be introduced intentionally. For example, nitrogen and oxygen introduced in GaP are responsible for the radiative recombination in commercial LEDs, and gold doping is used to reduce the switching time in fast Si-based bipolar devices (Pantelides, 1992). Another example of the possible benefits of deep levels is that of Cr doping used to reduce the conductivity and obtain semi-insulating GaAs substrates. Nevertheless, in these technologies control of non-intentional contaminants is required, as they can impede the role of the desired impurity. In all cases, the ability to identify the deep centers and to measure their concentration, and sometimes their spatial distribution (impurity profile), is a necessary requirement.
C. Generation-Recombination Statistics
1. Phenomenological Overview
In an ideal semiconductor material, there are no allowed energy states inside the bandgap. As already outlined, such states are created by incorporation of shallow impurities and deep defect states. The latter are also referred to as traps, generation-recombination (G-R) centers, deep levels, deep centers, deep impurities, deep imperfections, etc. Although in many cases the use of a specific term is justified, this abundance of names can be confusing. A further complication arises from the fact that the same defect state can be a trap or a G-R center, depending on the temperature and the energy level in the bandgap. Figure 2 defines electron and hole traps and a recombination center (Miller et al., 1977). The probability for a given transition is illustrated by the width of the arrow defining this process. From Fig. 2 it is clear that the complete description of a particular defect does not define it only as a trap for electrons or holes, but as a deep level with specific capture cross sections for electrons $\sigma_n$ and for holes $\sigma_p$. Then the capture rate coefficients $c_n$ and $c_p$ are defined as (Hall, 1952; Shockley and Read, 1952)
\[ c_n = \sigma_n \langle v_n \rangle \quad \text{and} \quad c_p = \sigma_p \langle v_p \rangle \tag{1} \]
FIGURE 2. Definition of electron trap, hole trap, and recombination center.
where $\langle v_n \rangle$ and $\langle v_p \rangle$ denote the average thermal velocities for electrons and holes, respectively. Considering that the capture rate itself equals the capture coefficient multiplied by the concentration of the free carriers, that is, $c_n n$ for electrons, it is obvious that a given defect with specific capture cross sections $\sigma_n$ and $\sigma_p$ can be a trap or a recombination center depending on the free carrier concentrations.
2. Some Results from Shockley-Read-Hall (SRH) Theory
Let us consider the simple case in which only one kind of deep center exists in the material. A deep center may be occupied by an electron or hole. The
concentrations of deep centers occupied by electrons $n_T$ and those occupied by holes $p_T$ must equal the total concentration of the deep centers $N_T$, that is, $N_T = n_T + p_T$. When free electrons and holes are generated and trapped, the electron concentration in the conduction band $n$, the hole concentration in the valence band $p$, $n_T$, and $p_T$ are functions of time. The time rate of change of these concentrations is described by Shockley-Read-Hall (SRH) theory (Hall, 1952; Shockley and Read, 1952). The emission rate for electrons $e_n$ represents the number of electrons emitted from charged deep centers per second. The emission rate for holes $e_p$ is defined similarly. The capture rate coefficients $c_n$ and $c_p$ are defined in the preceding by Eq. (1). In general, the rate of change of $n_T$ or $p_T$ is described by nonlinear differential equations. Either of these equations can be linearized and solved in two cases: 1) in the reverse-biased space-charge region, where the free carrier concentrations $n$ and $p$ are small and can be neglected; and 2) in the quasi-neutral region, where $n$ and $p$ are essentially constant. For the last case, the solution for $n_T(t)$ gives (Schroder, 1990)
\[ n_T(t) = n_T(0)\exp\left(-\frac{t}{\tau}\right) + \frac{(e_p + c_n n)\,N_T}{e_n + c_n n + e_p + c_p p}\left[1 - \exp\left(-\frac{t}{\tau}\right)\right] \tag{2} \]
where $n_T(0)$ is the concentration of the deep centers occupied by electrons at $t = 0$, $e_n$ and $e_p$ are the emission rates for electrons and holes, $c_n n$ and $c_p p$ are their respective capture rates, and $\tau = 1/(e_n + c_n n + e_p + c_p p)$. Equation (2) is difficult to solve without additional simplifications because $e_n$, $e_p$, $c_n$, and $c_p$ are not known, and $n$ and $p$ vary with time and distance in the material. However, some important simplifications can be made for extrinsic semiconductors in the case of a one-sided p-n junction or Schottky barrier (Miller et al., 1977; Schroder, 1990). In an n-type semiconductor, to a first approximation, $p$ can be neglected. If we consider a Schottky diode on an n-type substrate with electron traps in the active region, then at zero bias the capture rate dominates the emission rate and the steady-state concentration is $n_T = N_T$. After applying a reverse bias pulse, within several picoseconds, the free electrons are swept out of the space-charge region (SCR) and emission dominates since $c_n n \approx 0$. Then, with $\tau_e = 1/e_n$, Eq. (2) reduces to
\[ n_T(t) = n_T(0)\exp\left(-\frac{t}{\tau_e}\right) \tag{3} \]
However, at the edge of the space-charge region (the so-called Debye tail) the electron concentration is not negligible, $c_n n$ is not zero, $e_n$ is not constant, and so the time dependence of $n_T(t)$ can be non-exponential if the contribution of that effect to the total transient is not small enough. When the diode is pulsed back to zero bias, the free electrons flow back into the space-charge region, and if the previous emission pulse was long enough to empty most of the electron traps, then capture dominates, and the concentration $n_T(t)$ is given by
\[ n_T(t) = N_T - \left[N_T - n_T(0)\right]\exp\left(-\frac{t}{\tau_c}\right) \tag{4} \]
where $\tau_c = 1/(c_n n)$ and $n_T(0)$ is the initial steady-state concentration. Considering equilibrium conditions, the principle of detailed balance requires the rates of emission and capture processes to be balanced. Thus (Schroder, 1990),
\[ e_{n0}\, n_{T0} = c_{n0}\, n_0\, (N_T - n_{T0}) \tag{5} \]
where $n_0$ (the index 0 denotes equilibrium values) is
\[ n_0 = N_c \exp\left(-\frac{E_c - E_F}{kT}\right) \tag{6} \]
and
\[ n_{T0} = \frac{N_T}{1 + \exp\left[(E_T - E_F)/kT\right]} \tag{7} \]
where $E_c$ is the conduction band energy level, $E_F$ is the Fermi level, $E_T$ is the deep-center level energy, $T$ is the absolute temperature and $k$ is Boltzmann's constant. Using Eq. (7) and Eq. (5) we obtain
\[ e_{n0} = c_{n0}\, n_0 \exp\left(\frac{E_T - E_F}{kT}\right) \tag{8} \]
The emission rate for holes $e_{p0}$ can be similarly obtained. An important assumption is then made. The deviation from the equilibrium state is considered small, the new state is referred to as quasi-equilibrium, and the new non-equilibrium emission and capture rates are considered equal to their equilibrium values. This is expressed as
\[ e_n = c_n n_1 \quad \text{and} \quad e_p = c_p p_1 \tag{9} \]
where
\[ n_1 = N_c \exp\left(-\frac{E_c - E_T}{kT}\right) \tag{10} \]
and
\[ p_1 = N_v \exp\left(-\frac{E_T - E_v}{kT}\right) \tag{11} \]
In the reverse-biased junction where a strong electric field exists, this is certainly a poor approximation and capture cross sections determined under these conditions generally do not give reliable results (Schroder, 1990). Nevertheless, the assumption is commonly made and the accuracy of all results is contingent on the limits of this uncertainty. Considering Eqs. (9) and (10), the expression for the emission rate $e_n$ of electrons trapped in centers with energy level $E_T$ below the conduction band is
\[ e_n = \sigma_n v_n N_c \exp\left(-\frac{\Delta E_n}{kT}\right) \tag{12} \]
where $N_c$ is the effective density of electron states in the conduction band and $\Delta E_n = E_c - E_T$. Here and later in the text we omit the brackets around $\langle v_{n,p} \rangle$ for brevity. Similarly, for hole traps in p-type semiconductors, we obtain
\[ e_p = \sigma_p v_p N_v \exp\left(-\frac{\Delta E_p}{kT}\right) \tag{13} \]
where $N_v$ is the effective density of hole states in the valence band and $\Delta E_p = E_T - E_v$. The expressions for the emission rate, Eqs. (12) and (13), are the fundamental equations of the DLTS method. As the sample material is known, $v_n$ and $N_c$ (or $v_p$ and $N_v$, respectively) can be easily calculated over a wide range of temperatures. It then remains to determine experimentally the characteristic defect parameters $\sigma_n$ (or $\sigma_p$) and $\Delta E_n$ (or $\Delta E_p$). Each one of these equations links two unknown parameters of a deep center with two measurable variables: emission rate and temperature. Therefore, one has to measure the dependence of the emission rate on the temperature in order to obtain the deep-center parameters. Alternatively, the capture cross section can be determined from the capture process kinetics (Henry et al., 1973; Pals, 1974; Wang and Sah, 1985), the emission rate can be measured, and the activation energy determined at a constant temperature (isothermal measurement). However, for a wide range of deep levels, the emission rate at room temperature is too fast or too slow to be measured accurately and conveniently with sensitive instrumentation. Considering the exponential dependence of the emission rate on the temperature, as indicated in Eqs. (12) and (13), the relative distance from the corresponding band edge, and the speed of the most frequently used instrumentation, the useful temperature range can be estimated to be 50-450 K. This means that control of the sample temperature and, most likely, cooling well below 0°C might be required even for isothermal measurement of the deep-level parameters.
D. Detection of the Emission of the Trapped Charge
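Before turning to the individual detection methods, the Arrhenius analysis implied by Eqs. (12) and (13) can be sketched numerically. The example below is a minimal illustration, not the authors' procedure: it assumes the usual temperature scaling in which the product $v_n N_c$ varies as $T^2$, lumps the material constants into a hypothetical prefactor `gamma`, and recovers the activation energy and capture cross section from a straight-line fit of $\ln(e_n/T^2)$ against $1/T$. The numerical value of `gamma` and all function names are assumptions.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant [eV/K]


def emission_rate(temp_k, delta_e_ev, sigma_cm2, gamma=1.0e21):
    """Eq. (12) with v_n * N_c written as gamma * T^2.

    gamma [cm^-2 s^-1 K^-2] is an assumed, material-dependent constant.
    """
    return (sigma_cm2 * gamma * temp_k ** 2
            * math.exp(-delta_e_ev / (K_B_EV * temp_k)))


def arrhenius_fit(temps_k, rates):
    """Least-squares line through ln(e/T^2) versus 1/T.

    Returns (activation energy in eV, intercept of the line).
    """
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(e / t ** 2) for t, e in zip(temps_k, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # slope = -delta_E / k, so the activation energy is -slope * k.
    return -slope * K_B_EV, intercept


def cross_section_from_intercept(intercept, gamma=1.0e21):
    """The intercept equals ln(sigma * gamma); invert it for sigma [cm^2]."""
    return math.exp(intercept) / gamma


if __name__ == "__main__":
    temps = [200.0, 220.0, 240.0, 260.0]
    rates = [emission_rate(t, 0.45, 1.0e-15) for t in temps]
    energy, intercept = arrhenius_fit(temps, rates)
    print(energy, cross_section_from_intercept(intercept))
```

With measured data the points scatter around the line, and the cross section extracted from the intercept inherits both that scatter and the uncertainty in the assumed prefactor.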
1. Capacitance Transient
For simplicity, the case of a one-sided abrupt p-n junction or Schottky barrier will be considered in the following sections. The specifics of the MOS structure will also be discussed where applicable. Let us consider the space-charge region (SCR) of a reverse-biased p+-n junction or a Schottky barrier on an n-type semiconductor (Miller et al., 1977; Stolt and Bohlin, 1983). Figure 3 shows only the SCR in the lightly doped n-type material, because the SCR spreads mainly on this side of the junction. In ideal n-type material, the SCR contains only the charge of the ionized donors. Let us assume that donor-like deep levels are present in the SCR and that they are charged with trapped electrons, for
[Figure 3: band diagram of the junction, with the space-charge, transition, and quasi-neutral regions and the SCR width w marked.]
example, after a pulse to zero or a slight forward bias. Applying reverse bias again will sweep out the free electrons in a few picoseconds and, for $0 < x < y$ in the SCR, the charge trapped in the deep levels will be added to that of the ionized donors. Within that part of the SCR the electronic transition will be entirely an emission process, as there are no free electrons and the capture rate will be zero (Sze, 1983). The situation is more complicated in the transition region at $y < x < w$, where both generation and recombination processes contribute to the equilibrium occupation of the deep-level traps (Miller et al., 1977). One should also remember that the transition to the quasi-neutral region in the bulk is not abrupt, but has some width determined by the diffusion of free electrons into the SCR. This transition is also known as the Debye tail region (Miller et al., 1977; Rockett and Peaker, 1981; Sze, 1983). When the width of the SCR is much larger than the width of the transition region, the emission process described by Eq. (12) will dominate and the junction capacitance can be used to monitor this process. With a reverse bias, the traps inside $0 < x < y$ cannot be charged with electrons, so it must be alternated with a bias at which electron capture will dominate. This is usually accomplished by applying for some short period of time a slight forward or zero bias, which is commonly referred to as a trap filling pulse (Schroder, 1990). During this filling pulse, the capture rate dominates because of the large concentration of free electrons. In Fig. 4, this is the second event in the sequence. After this pulse (event 3 in Fig. 4), the bias is set back at $V_R$ but the capacitance is below its quiescent value (event 1 in Fig. 4) because of the compensation effect of the trapped majority carriers, electrons in this case. Next, at the moment $t = 0^+$ these trapped electrons will be emitted at a rate defined by Eq. (12) and after a sufficiently long time the system will be restored to its quiescent state
[Figure 4: reverse bias; thermal emission for t > 0.]
... the emitted charge can be detected as a voltage transient (Farmer et al., 1982; Schroder, 1990), which is given by
Comparing Eq. (16) with Eq. (15), we see that $\tau_e$ is eliminated from the denominator in the exponential prefactor, thus simplifying the analysis (Thurzo and Lalinsky, 1982). Another advantage is the simpler instrumentation needed to measure fast decaying transients.
4. Voltage Transient
When the volume of the space-charge region can be held constant by applying a voltage to compensate for the emitted charge, this voltage is directly proportional to the emission of the trapped charge. The simplest method to keep the volume of the SCR constant is to maintain the capacitance $C$ of the diode constant by adjusting the reverse bias (DeJule et al., 1985; Goto et al., 1973; Johnson, 1979; Johnson et al., 1979; Miller, 1972; Shiau et al., 1987a; Kolev, 1992). The compensation voltage needed to maintain the capacitance constant
is a voltage transient (DeJule et al., 1985; Schroder, 1990), and is given by
where $\varepsilon_s$ is the dielectric permittivity of the semiconductor, $A$ is the diode's area, and $V_{bi}$ is the built-in junction potential. Unlike the expression for the capacitance transient, Eq. (17) is valid even when the condition $N_T$ ... 1,000,000), the actual capacitance change needed to produce the compensation voltage is negligibly small and the term “constant-capacitance” is completely justified. In Fig. 10, a setup for constant-capacitance measurements is shown. One reference capacitor is set equal to the sample’s capacitance during the filling pulse, and the other is set to balance the sample capacitance during the emission pulse. Thus, when the switch SW alternates between the two reference capacitors connected to the differential terminals of the capacitance meter, the feedback amplifier OA produces a bias applied to the sample, which is precisely the one needed to balance the corresponding reference capacitor. For dc and low frequencies, the feedback in this loop is negative, but for high frequencies the delay introduced primarily by the capacitance meter makes the feedback positive, causing large oscillations in the feedback loop. In order to avoid these oscillations, a lowpass RC filter is integrated into the feedback amplifier. This filter is adjustable, because the loop behavior depends on the effective gain of the sample, dV/dC, which may vary with the temperature during the measurement. During the filling or emission pulse, the capacitance is largely stable. If for some reason it changes, a compensation voltage is immediately produced by the feedback amplifier to compensate for the change. Thus, the charge emitted from the deep levels is precisely balanced by the voltage applied to the structure. This constitutes the voltage transient. Probably the most important consequence of maintaining a constant volume of the depleted region is the avoidance of the requirement that $N_T$ ... $0.1 N_d$) and limited range ($\Delta E > 0.3$ eV), they have been largely replaced with DLTS (Schroder, 1990). For the special case of measurement of charge-coupled devices (CCDs), charge transfer
efficiency can provide valuable information about the deep levels (Hardy et al., 1998a, b; Murowinski et al., 1995). Optical methods such as photoluminescence (PL) can be applied only to those impurities for which radiative recombination has been observed (Schroder, 1990). Because of the high sensitivity of detection instrumentation, PL can be used even when recombination is dominated by another mechanism. Compared to DLTS, PL allows easier impurity identification, but makes concentration measurement more difficult. Nonradiative bulk and surface recombination complicate correlation of a given PL spectral line with the concentration of the impurity corresponding to that line. The sample volume probed by PL is determined by the absorption depth of the exciting laser light and the diffusion length of the minority carriers, and this does not allow for deep-level profiling. In some cases, PL can complement DLTS as a tool for investigation of the shallow levels, but that is complicated by the different test structures used for PL and DLTS. PL has the advantage of being a nondestructive, contactless method, which allows deep-level mapping on large-area samples. However, it requires complex and very expensive instruments such as an argon ion laser, a helium refrigerator, a spectrometer, and a photon counting and detection system (Schroder, 1990). Recently, low-frequency noise measurements of the current or voltage noise spectrum have also been used for characterization of deep levels (Citterio et al., 1996; Deen, 1993b; Deen and Raychaudhuri, 1994; Deen and Zhu, 1993; Deen et al., 1993; Deen et al., 1995a, b, c, d; Deen and Quon, 1991; Jones, 1994; Kolev et al., 1997c; Murowinski et al., 1993a, b, 1995; van der Ziel, 1986). The total noise is regarded as a superposition of noises that originate from several noise sources (Citterio et al., 1996; Deen, 1993b).
The thermal and shot noise are considered to be fundamental sources whose contribution can be minimized by appropriate device design and operating conditions such as bias and temperature. Added to those sources are excess noise components such as flicker (1/f), generation-recombination (G-R), and random telegraph signal (RTS) noises. These three sources are common to many devices, and their spectral densities and amplitude distributions are well defined (van der Ziel, 1986). The intensity of the G-R noise is proportional to the concentration of the electrically active deep levels and, at a given temperature, it peaks at some characteristic frequency that depends on the activation energy of the deep level. The variation of the peak with the temperature then gives the Arrhenius plot. At some other temperature, the actual peak may be above the observation limit of the measuring instrument, and the results may then be incorrectly interpreted. This is a Fermi level probe method and can be compared to the small-signal variations of DLTS. Table II summarizes some of the most popular techniques for interface and bulk deep-level measurements.
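The statement that the G-R noise peaks at a characteristic frequency can be illustrated with a short numerical sketch. It assumes a single-time-constant Lorentzian spectrum, $S(f) \propto \tau / [1 + (2\pi f \tau)^2]$, and the common practice of plotting $f \cdot S(f)$, which has a maximum at $f = 1/(2\pi\tau)$; the function names, the frequency grid, and the choice of $\tau$ are hypothetical and are not taken from the chapter.

```python
import math


def gr_noise_psd(freq_hz, tau_s, amplitude=1.0):
    """Lorentzian G-R spectrum for a single time constant (assumed form)."""
    omega_tau = 2.0 * math.pi * freq_hz * tau_s
    return amplitude * tau_s / (1.0 + omega_tau ** 2)


def peak_frequency(tau_s, f_min=1.0e-1, f_max=1.0e6, points=2001):
    """Locate the maximum of f * S(f) on a logarithmic frequency grid."""
    best_f = f_min
    best_value = -1.0
    for i in range(points):
        f = f_min * (f_max / f_min) ** (i / (points - 1))
        value = f * gr_noise_psd(f, tau_s)
        if value > best_value:
            best_f, best_value = f, value
    return best_f


if __name__ == "__main__":
    tau = 1.0e-3  # assumed trap time constant [s]
    print(peak_frequency(tau), 1.0 / (2.0 * math.pi * tau))
```

Repeating the search at several temperatures gives the time constant as a function of temperature, which can then be fed into the same Arrhenius analysis as the DLTS emission rates.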
TABLE II
COMPARISON OF DLTS WITH DIFFERENT METHODS
Methods (rows): DLTS methods; C-V methods; charge pumping; G-V method; low-frequency noise method; CCD's charge; TSCAP; TSC.
Properties (columns): bulk sensitivity; surface sensitivity; energy range; temperature scan needed; test device; deep levels; interface states; optical properties; data for σ.
Test devices listed: space-charge region; resistor; MOS structure; MOS capacitor; MOS transistor; CCD.
III. AVERAGING AND RECORDING OF DIGITAL DLTS TRANSIENT SIGNALS
1. Introduction
Since the introduction of deep-level transient spectroscopy (DLTS) in 1974 by Lang, the method has been refined many times with new techniques for improving sensitivity and accuracy, adapting DLTS to the specific properties of new types of samples, and simplifying procedures for measurement and data analysis. At present, DLTS is composed of a large group of different measurement techniques applied to a variety of test devices and materials, and is considered to be among the most accurate and reliable tools for investigating the properties of electrically active point defects in semiconductors (Benton, 1990; Schroder, 1990). Because of the great variety of DLTS techniques and numerous possibilities for their combination, any attempt at a detailed classification easily becomes a formidable task. Nevertheless, all these techniques share several common features that identify a technique as being of DLTS type. First, it is a time domain measurement of a relaxation process, most often the emission of charges trapped in electrically active centers. Second, the process is considered to be thermally activated and the validity of the Shockley-Read-Hall theory (Shockley and Read, 1952; Hall, 1952) is assumed. Third, there is periodic alternation between filling of the traps with charge (filling pulse) and emission of the trapped charges (emission pulse). This last feature is used for synchronized measurement and integration, a basic method used for signal recovery from noise in electronic instrumentation (Wilmshurst, 1990). The increased sensitivity, allowing detection of a signal immersed in noise, is the major advantage of DLTS over the previously used thermally stimulated current (TSC) (Driver and Wright, 1963), or thermally stimulated capacitance (TSCAP) (Carballes and Lebailly, 1968) methods for the study of deep levels in semiconductors (Sah et al., 1969; 1970; Sah, 1976).
B. Technical Overview of the DLTS Experiment

The DLTS experiment can be regarded as a sequence of several steps. The first step is to populate the traps in the investigated volume with charges. The second step is to detect the change of the trap occupancy with electrons as a measurable change in some electrical parameter of the test structure. In classical DLTS, this is the small-signal high-frequency capacitance. The next steps are synchronous detection and averaging of this signal in order to improve the SNR. There are two main approaches at this stage of the experiment: the use of either analog or digital signal processing methods (Fig. 13).
PLAMEN V. KOLEV AND M. JAMAL DEEN
FIGURE 13. Classification scheme of the signal processing of DLTS transients. The shadowed areas indicate where the averaging techniques presented in this chapter belong. (Diagram labels: integrator, amplifier, correlator, digital methods.)
1. Analog Methods

The noisy transient signal can be measured and integrated over many pulses using analog instruments such as a boxcar averager (Kosai, 1982; Lang, 1974a), a lock-in amplifier (Kimerling, 1976), or an exponential correlator (Miller et al., 1975). In the classical setup, two boxcar channels are used to measure the signal at two different times in the transient, and these times define a time constant window of the instrumentation (Lang, 1974a; Miller et al., 1977). The averaging is performed by the two boxcar channels, and the difference of the signals from these channels vs the sample temperature is recorded, giving a DLTS spectrum. When the emission time constant of a deep level coincides with that of the instrumentation while scanning the temperature of the sample, the output signal indicates this coincidence with a peak in the spectrum. To apply the Arrhenius relationship to determine the energy level and the capture cross section of a trap, several temperature scans are needed using different settings of the time constant window of the instrumentation. Similarly, it is necessary to change the pulse frequency when using a lock-in amplifier, or the reference time constant when using an exponential correlator, and to repeat the temperature scan in order to obtain enough data for the Arrhenius plot technique. In all these cases, the averaging technique simultaneously achieves two different goals: increasing the SNR and analyzing the transient parameters. One serious disadvantage of the analog techniques is the need to perform more than one temperature scan. This practical difficulty can be eliminated, for example, by using more complex (and more expensive) instrumentation, or by analyzing the shape of the DLTS spectrum (Hjalmarson and Samara, 1988; Steele, 1986; Su and Farmer, 1990), instead of just finding the peak
temperature. Other disadvantages of the analog methods are their inherent limitations in resolving closely spaced energy levels of defects, and difficulties associated with the analysis of nonexponential transients. Nevertheless, the analog methods are simple to implement, they produce an analog signal in real time during the experiment, and plotting this signal vs the temperature gives spectra that are available for immediate interpretation.

2. Digital Methods

While the analog methods produce an integrated response to the transient signal, digital methods record the whole transient as a set of data points. The noisy signal is first digitized and then processed using various digital techniques (Asada and Sugano, 1982; Chang et al., 1989; Doolittle and Rohatgi, 1992; Hanak et al., 1990; Henini et al., 1985; Holzlein et al., 1986; Ikossi-Anastasiou and Roenker, 1987; Ikeda and Takaoka, 1982; Jack et al., 1980; Jervis et al., 1982; Kirchner et al., 1981; Losson et al., 1996; Morimoto et al., 1987, 1988; Okuyama et al., 1983; Shapiro et al., 1984; Valeur, 1978; Weiss and Kassing, 1988; Zitti et al., 1989). Analysis of the transient parameters can take place during the experiment, or data can be stored for analysis later. In common with the analog methods described here, the transient can be digitally correlated with a boxcar, lock-in or exponentially decaying function, and used to build a set of DLTS scans (Doolittle and Rohatgi, 1992).
Alternatively, the transient can be analyzed using spectral analysis DLTS (SADLTS) (Morimoto et al., 1987, 1988), Fourier transform analysis (Ikeda and Takaoka, 1982; Okuyama et al., 1983; Weiss and Kassing, 1988), nonlinear least square fitting (Hanak et al., 1990), the modulation function method (Valeur, 1978), the method of moments (Kirchner et al., 1981; Ikossi-Anastasiou and Roenker, 1987), the correlation method of linear predictive modeling (Shapiro et al., 1984), or other digital methods described, for example, in Bevington and Robinson (1992). As the whole transient is recorded, it is necessary to perform only one experimental temperature scan, and this greatly reduces the time needed to perform the experiment. This is a major advantage over the analog methods, in addition to the extensive possibilities for analysis. However, digital methods have disadvantages as well. Because the whole set of data points representing the DLTS transient is recorded for each temperature, the total number of stored data points can be very large compared to that recorded using analog methods. Consider, for example, a digitizer fast enough to study transients with time constants below 0.5 ms. Commonly available 12-bit analog-to-digital converters (ADC) with conversion times below 10 µs, which interface directly to a PC, are suitable for that purpose and their cost is below ten dollars. If operated at a rate of 33.3 kHz, the time spacing between the data points would be 30 µs and, for simultaneous observation of slow transients with time constants in the tens of milliseconds range, there will be several
thousand samples for one transient recording. For example, if the emission pulse is 90 ms, there will be 3000 samples for each transient recording. If the transient is recorded in 1 K intervals over a temperature range of 50-350 K, there will be a total of 900,000 samples, which occupy 1.8 Mbytes of disk space (each 12-bit sample occupies two bytes). The problem, however, is not so much the required disk space, but rather the difficulties in processing this large amount of data, as many of the digital methods involve iterative numerical calculations. The need to transfer 6 kbytes of data in just 90 ms is one more obstacle when the system is based on an IBM PC and the processor operates in real mode, also known as MS-DOS mode. Another disadvantage of the digital methods, as compared to analog methods, is the decreased SNR for long delay times. Assuming constant noise, as the transient progresses, the signal decay leads to decreased SNR at the input of the signal processing apparatus. The analog methods have various ways of dealing with this problem. The boxcar technique enables one to compensate for the decreased SNR by increasing the sampling aperture of the channel that is recording the second time delay point (Kirchner et al., 1981; Day et al., 1979b). In the lock-in technique, an integral of the transient is processed and the high SNR at the beginning of the transient is averaged with the low SNR at the end. The correlator reference function, which multiplies the noisy signal transient before integration, also decays along with the signal, thus minimizing the influence of the low SNR at the end of the transient. However, in a digital system, the sampling aperture of the ADC remains constant, and so the SNR decrease in the tail of the transient cannot be compensated for.
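The storage figures quoted in this example follow from simple arithmetic. The short Python sketch below reproduces them; the variable names are illustrative only, and the two bytes per sample and the one-recording-per-kelvin step are taken from the text.

```python
# Storage estimate for recording whole transients at a fixed sampling rate.
SAMPLE_SPACING_US = 30            # 33.3 kHz sampling, spacing in microseconds
EMISSION_PULSE_US = 90_000        # 90 ms emission pulse
BYTES_PER_SAMPLE = 2              # one 12-bit sample stored in two bytes
TEMPERATURE_STEPS = 350 - 50      # one recording per kelvin from 50 K to 350 K

samples_per_transient = EMISSION_PULSE_US // SAMPLE_SPACING_US
bytes_per_transient = samples_per_transient * BYTES_PER_SAMPLE
total_samples = samples_per_transient * TEMPERATURE_STEPS
total_bytes = total_samples * BYTES_PER_SAMPLE

print(samples_per_transient)      # samples in one 90 ms recording
print(bytes_per_transient)        # bytes to transfer per emission pulse
print(total_samples)              # samples in one full temperature scan
print(total_bytes / 1e6)          # scan size in Mbytes
```

Under these assumptions the script gives 3000 samples and 6000 bytes per transient, and 900,000 samples or 1.8 Mbytes for the whole scan, in agreement with the numbers above.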
In general, both analog and digital DLTS signal processing techniques are mainly focused on the transient parameter extraction and on the ability of the method to resolve closely spaced energy levels of defects. Averaging to improve the SNR is considered important, but is a secondary task. While in the analog techniques these two tasks are merged, in the digital techniques the transient analysis is separate and usually performed after some simple digital averaging is done. Most often, this is the multiple time averaging of a selected number of successive transient recordings (Doolittle and Rohatgi, 1992; Morimoto et al., 1987, 1988; Okuyama et al., 1983; Hanak et al., 1990; Kirchner et al., 1981) and this number is usually in the range of 100-300. This technique has three disadvantages. First, there is a need to allocate a memory buffer large enough to accommodate the selected number of transients. In the preceding example it takes more than 300 kbytes, or approximately half of the conventional memory of an IBM PC. Second, after averaging, the buffer is cleared for processing of the new transient, and the old information is lost, thus making it more difficult to maintain high SNR. Third, the progress of the summed transient is not monitored as it accumulates in the allocated buffer and, if anything goes wrong and some adjustment and a restart are needed,
one is not aware of the problem until the full sequence of the selected number of transients has been accumulated. In our approach (Kolev et al., 1997a), we consider averaging of the noisy DLTS signal as a separate task from the extraction of the transient parameters. We present two different, but complementary averaging techniques offering an increased time resolution for short delay times, improved SNR for long delay times, and more efficient data storage. In addition, the transient and the DLTS scan are available in real time for continuous observation and data analysis after the acquisition of each single transient.
C. Pseudo-Logarithmic Averaging

1. Theory
For detailed observation of an exponentially decaying signal, short sampling intervals are needed at the beginning of the transient, while much longer intervals are sufficient at the end. At the same time, high SNR of the processing instrumentation is much more important at the end of the transient, where the signal is weak, as compared to the beginning of the transient, where the signal is strong. One way to fulfill these requirements partly is to use a high sampling rate and short emission pulse at high temperatures, and to lower the rate and increase the emission pulse width at low temperatures (Okuyama et al., 1983; Hanak et al., 1990). This is easily done by using a programmable digital scope as a digitizer. Although this approach does not allow the simultaneous monitoring of both fast and slow transients, it is still possible to optimize the time resolution setting of the measurement system for a given time constant of interest. A much better way to satisfy the time spacing requirements is to use sampling with a logarithmic time base. This approach is not new in DLTS experiments. Morimoto et al. (1987, 1988; Ikeda and Takaoka, 1982) described selecting from the measured 512 data points a set of 195 data points having almost logarithmic time distribution. The data points in Wang et al. (1993) also appear to have a logarithmic time spacing. To our knowledge, the best published work on this subject is described in Doolittle and Rohatgi (1992), where a high, but constant ADC sampling rate is used, all sampling results are transferred to the computer memory, and pseudo-logarithmically spaced data points are selected by specially written software. The pseudo-logarithmic storage scheme in Doolittle and Rohatgi (1992) uses a logarithm with base 2 because of the ease of unsigned integer division by $2^n$ in binary arithmetic (Fernandez and Ashley, 1990). This simplicity is equally advantageous in hardware logic circuits and is used in our scheme as well. Another, probably less obvious advantage is the fact that the function $2^n$ is closer to $\exp(n)$
than the commonly used $10^n$. In our system, the sampling rate is 100 kHz and, therefore, the sampling interval $\Delta t$ is 10 µs. If the data is averaged over time intervals $t_n$, with each one twice as long as the preceding one, the second averaging interval is $2\Delta t$, and the $n$th averaging interval starts at

$$t_n = (2^n - 1)\,\Delta t. \tag{22}$$

If, for convenience, we number the intervals starting from zero, which is a commonly used standard in programming and digital electronics, for $2^n \gg 1$ the index $n$ can be expressed as

$$n \approx \log_2 (t_n / \Delta t). \tag{23}$$
This represents a logarithmic dependence of the interval index on the interval time length, as the sampling interval $\Delta t$ is fixed by the constant sampling rate of the ADC. Unfortunately, the pseudo-logarithmic storage scheme proposed in Doolittle and Rohatgi (1992) has several disadvantages. First, it is still necessary to allocate very large memory buffers, as the data points are selected after the full set of transient sampling points is recorded into the computer memory. Sampling each 10 µs, there are 131,072 sampling points that are stored in a 256-kbyte buffer. From this large set of data, only 768 data points are selected and used, while the remaining 130,304 points are discarded. Second, more buffers are needed for the multiple time averaging of successive transients as described previously, and the result is available for observation only after the selected number of transients is averaged. Third and most important, there is a substantial decrease in the SNR, especially in the tail of long transients, where the data point intervals are in the millisecond range, because the data points are selected from sampling intervals that are just 10 µs long. In addition, recording of 768 data points for each degree in the 50-350 K range still requires almost 0.5 Mbytes of disk space. Also, direct memory access (DMA) transfer into memory blocks larger than 64 kbytes is complicated, as it requires continuous initialization of the DMA controller for crossing the memory page boundary (Royer, 1987). The main difference between our pseudo-logarithmic averaging scheme and that described in Doolittle and Rohatgi (1992) is the method of obtaining the data points. In Doolittle and Rohatgi (1992), the data points are selected from the large set of sampling points, but in our system, the data points are averages of the sampling points inside the averaging intervals, which increase in a pseudo-logarithmic manner (Austin et al., 1976). In this way, all measured samples are used.
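The relation between the interval index and the interval start time can be checked numerically. The sketch below evaluates Eq. (22) for the 10 µs sampling interval quoted in the text and compares the index with the logarithmic estimate of Eq. (23); the function names are hypothetical and serve only this illustration.

```python
import math

DT_US = 10  # sampling interval in microseconds (100 kHz ADC)

def interval_start_us(n, dt_us=DT_US):
    """Start time of averaging interval n, Eq. (22): t_n = (2**n - 1) * dt."""
    return (2 ** n - 1) * dt_us

def interval_index_estimate(t_us, dt_us=DT_US):
    """Approximate interval index, Eq. (23): n ~ log2(t_n / dt)."""
    return math.log2(t_us / dt_us)

starts = [interval_start_us(n) for n in range(18)]
# Length of interval n is the distance to the start of interval n + 1.
lengths = [later - earlier for earlier, later in zip(starts, starts[1:])]

for n in (1, 4, 10):
    t_n = interval_start_us(n)
    print(n, t_n, round(interval_index_estimate(t_n), 4))
```

Each interval length comes out as twice the preceding one, and the estimate of Eq. (23) approaches the true index as $2^n$ grows: it is exact only in the limit, and already within about 0.002 of $n$ at $n = 10$.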
Although our electronic circuit implementation is different, the averaging principle (Fig. 14) is the same as in Austin et al. (1976). The
FIGURE 14. Pseudo-logarithmic time averaging scheme. After each 5 data points the number $2^n$ of averaged samples doubles as $n$ increases by 1. (Plot of the sampled transient vs time in sampling intervals, 0 to 50; legend: $M = 5$ for all groups, $n$ = group number, ADC sampling, data point.)
logarithmic time spacing algorithm already described gives only about three data points per decade. To increase this number, the length of the averaging interval in the pseudo-logarithmic averaging scheme is doubled after a preselected number $M$ of averaging operations, thus producing a linearly spaced train of $M$ averaging intervals. In this way, the pure logarithmic time spacing is mixed with linear time spacing, and hence the name pseudo-logarithmic. The sampling interval remains the same, 10 µs in our system, and the result is a sequence containing $(N + 1)$ groups of averaging intervals $m_n$. Inside the group $n$, each interval $m_n$ has the same length, but it is twice as long as an interval from the preceding group $m_{n-1}$, or half as long as an interval from the succeeding group $m_{n+1}$. Because the maximum number $M$ of averaging intervals $m_n$ inside each group is the same for all the groups, the time lengths of the groups change in the same way. Therefore, we have $(N + 1)$ groups, each one containing $M$ equal averaging intervals $m_n$, but with the time length of the groups increasing as $2^n$.
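The grouping described above can be sketched in software. The following Python function is only an illustration of the averaging principle, not the hardware circuit discussed in this chapter: it assumes a uniformly sampled transient held in a list, and the name `pseudo_log_average` and its arguments are hypothetical. The argument `n_groups` corresponds to $N + 1$ and `m_per_group` to $M$.

```python
import math

def pseudo_log_average(samples, n_groups, m_per_group):
    """Average uniformly spaced samples in pseudo-logarithmic intervals.

    Group n (n = 0 .. n_groups - 1) holds m_per_group intervals, each the
    mean of 2**n consecutive samples. Returns (starts, means), where starts
    are the indices of the first sample of every averaging interval.
    """
    starts, means = [], []
    pos = 0
    for n in range(n_groups):
        width = 2 ** n                      # samples averaged per data point
        for _ in range(m_per_group):
            block = samples[pos:pos + width]
            if len(block) < width:          # transient too short: stop cleanly
                return starts, means
            starts.append(pos)
            means.append(sum(block) / width)
            pos += width
    return starts, means

# Example: M = 5 and four groups need 5 * (2**4 - 1) = 75 samples.
tau = 200.0                                 # time constant in sampling intervals
transient = [math.exp(-i / tau) for i in range(75)]
starts, means = pseudo_log_average(transient, n_groups=4, m_per_group=5)
print(len(means), starts[:7], starts[10])

# Data points per decade for M intervals per octave: M / log10(2).
print(round(1 / math.log10(2), 2), round(5 / math.log10(2), 1))
```

With $M = 1$ the scheme reduces to pure doubling, which yields $1/\log_{10} 2 \approx 3.3$ points per decade, the "about three" quoted above; with $M = 5$ the density rises to roughly 16.6 points per decade.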
2. Error Analysis

At first glance, this scheme for averaging of the sampling points may be understood as an approximation of the exponentially decaying function with a set of straight lines connecting the obtained data points (piecewise linear approximation). Indeed, the obtained set of data points does not contain explicitly any information about the time intervals between the data points; this information remains hidden in the summing results from which the data points were obtained. However, the data points themselves were obtained by summing of
the true experimental signal, sampled each $\Delta t$, and dividing the result by the number of the samples in the averaged interval. For a given group with an index $n$, the largest error is always expected to occur in the first of the averaging intervals $m_n$ with $m = 1$, where the signal varies more rapidly as compared to its variation during the subsequent intervals from the same group with $m > 1$. Therefore we need to consider only the worst case when the averaging is done from $t_{s,n,M} = \Delta t\,M\,(2^n - 1)$ to $t_{e,n,M} = \Delta t\,M\,(2^n - 1) + \Delta t\,2^n$. The result of the averaging process is assigned to the middle of the averaging interval defined as $t_{m,n,M} = \Delta t\,M\,(2^n - 1) + \Delta t\,2^{n-1}$. Thus, the error function $\mathrm{Err}(\tau)$ can be obtained by subtracting the magnitude obtained by averaging of the exponentially decaying function in this interval from the exact magnitude in the midpoint $t_{m,n,M}$ as

$$\mathrm{Err}(\tau) = C \exp\!\left[-\frac{\Delta t\,M\,(2^n - 1)}{\tau}\right] \left\{ \exp\!\left[-\frac{\Delta t\,2^{n-1}}{\tau}\right] - \frac{1}{2^n} \sum_{i=1}^{2^n} \exp\!\left[-\frac{\Delta t}{\tau}\, i\right] \right\} \tag{24}$$
where $C$ is the transient magnitude at the beginning of the transient. When $\mathrm{Err}(\tau)$ is plotted vs $\tau$ for a given set of $\Delta t$ and $M$, this function exhibits one sharp maximum at some $\tau_{m+}$, which dominates at short time constants for small $n$. At long time constants, this maximum is compensated for by another sharp minimum that begins to dominate for large $n$ at some $\tau_{m-} < \tau_{m+}$. Away from these peaks, the error function is close to zero. We find these particular $\tau_{m+}$ and $\tau_{m-}$ by setting the first derivative vs $\tau$ equal to zero and solving numerically (Appendix A) with a convenient choice of the initial guess value for $\tau$. From Eq. (24), this condition reads

$$-\frac{C\,\Delta t}{\tau^2} \left\{ \frac{1}{2^n} \sum_{i=1}^{2^n} (\gamma_{n,M} + i) \exp\!\left[-\frac{\Delta t}{\tau}(\gamma_{n,M} + i)\right] - (\gamma_{n,M} + 2^{n-1}) \exp\!\left[-\frac{\Delta t}{\tau}(\gamma_{n,M} + 2^{n-1})\right] \right\} = 0 \tag{25}$$

where $\gamma_{n,M}$ is $M(2^n - 1)$. Next, we substitute the values obtained for $\tau_{m+}$ and $\tau_{m-}$ in Eq. (24), which gives the maximal errors $\mathrm{Err}(\tau_{m+})$ and $\mathrm{Err}(\tau_{m-})$ when using the averaged values in the $m_n$ interval with $m = 1$. This function was normalized relative to the initial magnitude $C$ and plotted against $\tau_{m+}$ and $\tau_{m-}$ in Fig. 15 with parameters $n$ and $M$ for two values of $\Delta t$. There are two distinct reasons for the errors in this pseudo-logarithmic scheme.
FIGURE 15. Normalized absolute magnitude and time constant errors plotted vs the time constant $\tau_{max}$ [ms] at which these errors occur. The straight lines indicate independence from the group number $n$ or the number of equal averaging intervals inside the group $M$, and dependence only on the sampling rate. Inverted triangles indicate sampling with 1 MHz rate and $M = 10$. The right line with hollow symbols gives the error in determining the time constant when sampling at 100 kHz.
First, the limited speed of the ADC gives an error that is strictly dependent on the sampling interval $\Delta t$, in our case 10 µs, and it does not depend on the number of the averaged samples $2^n$ or the number of the equal intervals inside the group $M$. This gives the straight line, which demonstrates increased error for short time constants. Obviously, this error is not inherent to the pseudo-logarithmic averaging, and can be decreased only by using faster ADCs. The second type of error is more pronounced for longer time constants and it depends much more on $M$ than on $2^n$. Figure 15 demonstrates that for time constants longer than 1 ms the errors are below 0.1% if the groups are divided into five or more averaging intervals. Calculations with a 10 times faster ADC sampling rate show a 10 times reduction of the errors for fast time constants. For transient analysis, it is attractive to use a boxcar function obtained simply by subtracting two data points with time delays $t_1$ and $t_2$. Because each data point represents an average with time width $W$, it is interesting to evaluate the error in defining the rate window. We follow the procedure outlined in Day et al. (1979b) with a slight modification. Instead of integral averages, we use summing averages in order to model the actual function of our circuit accurately, and to account for the limited rate of the ADC. Furthermore, we subtract only the first subintervals in two adjacent groups where the maximal error is expected to occur. The normalized output $S(\tau)$ is
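then given by

$$S(\tau) = \frac{1}{2^n} \sum_{i=1}^{2^n} \exp\!\left[-\frac{\Delta t}{\tau}(\gamma_{n,M} + i)\right] - \frac{1}{2^{n+1}} \sum_{i=1}^{2^{n+1}} \exp\!\left[-\frac{\Delta t}{\tau}(\gamma_{n+1,M} + i)\right] \tag{26}$$

with $\gamma_{n,M} = M(2^n - 1)$. We find the rate window by differentiating $S(\tau)$ with respect to $\tau$ and setting the result equal to zero. The solution can be found numerically (Appendix B) to give $\tau_{max}$. We calculated $\tau_{max}$ for variations of $n$ and $M$ and compared the values with those obtained using Lang's expression evaluated at the midpoints $t_1$ and $t_2$ of the averaged intervals, which is

$$\tau'_{max} = \frac{t_2 - t_1}{\ln(t_2 / t_1)}. \tag{27}$$

The normalized error in determining $\tau$, $(\tau'_{max} - \tau_{max})/\tau_{max}$, is also shown in Fig. 15. It should be noted that, because the Arrhenius plot depends logarithmically on $\tau$, errors in $\tau$ of the order of 1% can be ignored (Day et al., 1979b).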
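The error estimates of this subsection can be explored numerically. The sketch below is written under stated assumptions rather than as a transcription of the Appendix A and B procedures: the samples of the first interval of group $n$ are taken at $t_s + i\,\Delta t$ for $i = 1 \ldots 2^n$, the boxcar output is modeled as the difference of the first-interval averages of two adjacent groups, and a plain ternary search stands in for the numerical solution. All function names are hypothetical.

```python
import math

def group_offset(n, m_per_group):
    """gamma_{n,M} = M * (2**n - 1): start of group n in sampling intervals."""
    return m_per_group * (2 ** n - 1)

def gate_mean(tau, n, m_per_group, dt):
    """Summing average of exp(-t/tau) over the first interval of group n."""
    width = 2 ** n
    gamma = group_offset(n, m_per_group)
    return sum(math.exp(-dt * (gamma + i) / tau)
               for i in range(1, width + 1)) / width

def midpoint(n, m_per_group, dt):
    """Midpoint assigned to the first interval of group n."""
    return dt * (group_offset(n, m_per_group) + 2 ** n / 2)

def averaging_error(tau, n, m_per_group, dt, c=1.0):
    """Exact midpoint value minus the summing average (form of Eq. (24))."""
    exact = math.exp(-midpoint(n, m_per_group, dt) / tau)
    return c * (exact - gate_mean(tau, n, m_per_group, dt))

def boxcar_output(tau, n, m_per_group, dt):
    """Difference of first-interval averages of groups n and n + 1."""
    return (gate_mean(tau, n, m_per_group, dt)
            - gate_mean(tau, n + 1, m_per_group, dt))

def numeric_rate_window(n, m_per_group, dt, iterations=200):
    """Time constant that maximizes boxcar_output, found by ternary search."""
    t1 = midpoint(n, m_per_group, dt)
    t2 = midpoint(n + 1, m_per_group, dt)
    lo, hi = t1 / 10, 10 * t2               # bracket around the single maximum
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if boxcar_output(a, n, m_per_group, dt) < boxcar_output(b, n, m_per_group, dt):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2, t1, t2

def lang_rate_window(t1, t2):
    """Lang's two-gate expression: tau = (t2 - t1) / ln(t2 / t1)."""
    return (t2 - t1) / math.log(t2 / t1)

tau_num, t1, t2 = numeric_rate_window(n=3, m_per_group=5, dt=10e-6)
tau_lang = lang_rate_window(t1, t2)
relative_error = (tau_lang - tau_num) / tau_num
print(f"t1 = {t1 * 1e6:.0f} us, t2 = {t2 * 1e6:.0f} us")
print(f"numeric = {tau_num * 1e6:.1f} us, Lang = {tau_lang * 1e6:.1f} us")
print(f"relative difference = {relative_error:.4f}")
print(f"Err at 1 ms, n = 3: {averaging_error(1e-3, 3, 5, 10e-6):.5f}")
```

The ternary search relies on the boxcar output having a single maximum between the chosen bounds, which holds for a difference of decaying exponentials with well-separated delays; a root finder applied to the derivative would serve equally well.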
3. Implementation

There are two possible ways to implement the averaging scheme described here. It might be possible to use the same hardware as described in Doolittle and Rohatgi (1992). Instead of selecting only 768 data points from the whole set of sampled points, fast software could be used to change the length of the averaging interval, perform the actual averaging of the sampled points inside the interval, and store the corresponding data points. Equation (24) could give the starting point for such an implementation. There are two potential problems with this approach. First, processing may be a problem during the first three groups, when there is no averaging or only 2 or 4 samples are averaged per data point. Second, in processor real mode, fast data transfer can only be done with direct memory access (DMA). As the DMA process actually "steals" cycles from the PC processor, an intensive DMA transfer can considerably slow down program execution (Royer, 1987). Thus, the DMA transfer can cause a timing conflict with the real-time averaging routines. Therefore, we give preference to the hardware implementation shown in Fig. 16. In our circuit, one can distinguish four functional blocks: an ADC, a pseudo-logarithmic pulse generator, an averager, and an interfacing block. The ADC
FIGURE 16. Block diagram representation of the pseudo-logarithmic time base averaging circuit.
operates continuously at a fixed rate of 100 kHz. The pseudo-logarithmic pulse generator (Appendix C) is constructed from programmable counters and shift registers. It incorporates a programmable divider of the clock frequency, which is set at the beginning of the transient to divide by $2^0$, because for the first group $n = 0$. Of course, in practice this means no division at all, because $2^0 = 1$. When the counter of linear pulses $m$ reaches a predetermined number $M$, it is cleared to start counting again, and this produces a logarithmic pulse $n$, thus changing the setup of the programmable divider to $2^1$. This doubles the time between each one of the next $M$ pulses. When the linear pulse counter reaches $M$ again, $n$ increments by one, and this sets up the divider to divide the clock pulses by $2^2$. This sequence is repeated until the logarithmic pulse counter reaches the software-programmed number of groups $N$. The averager is made by combining an accumulator, a shift register, and a programmable down counter. The end-of-conversion (EOC) signal from the ADC triggers an adding operation that adds the result from the current conversion to the stored sum of the previous conversions. This accumulation of conversion results is repeated over the whole length of the averaging interval. When the new linear pulse arrives, it stores the accumulated result into the shift register, which must be large enough to accommodate the whole sum. Then the accumulator is cleared for storing the new sum while the result in the register is divided by $2^n$ by being shifted right $n$ bits by the down counter. As the down counter is programmed by the same logarithmic pulse counter $n$, which is programming the clock frequency divider to divide by $2^n$, the accumulated results are always correctly divided.
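The counter and accumulator logic described above can be mimicked with integer arithmetic. The following Python sketch is a behavioral illustration only; it is not the circuit of Appendix C, and the function name `hardware_average` and its arguments are hypothetical. It assumes unsigned 12-bit ADC codes and models the division by $2^n$ as a right shift, so the result is truncated exactly as a shift register would truncate it.

```python
def hardware_average(adc_codes, n_max, m_per_group):
    """Emulate the accumulate-and-shift averager with integer arithmetic.

    adc_codes   : sequence of unsigned ADC conversion results
    n_max       : last group index N (groups run from 0 to N)
    m_per_group : number M of linear pulses per group
    """
    needed = m_per_group * (2 ** (n_max + 1) - 1)
    if len(adc_codes) < needed:
        raise ValueError(f"need {needed} conversions, got {len(adc_codes)}")
    results = []
    pos = 0
    for n in range(n_max + 1):               # logarithmic pulse counter
        for _ in range(m_per_group):         # linear pulse counter
            accumulator = 0
            for _ in range(2 ** n):          # clock divided by 2**n
                accumulator += adc_codes[pos]    # add on each end of conversion
                pos += 1
            results.append(accumulator >> n)     # divide by 2**n by shifting
    return results

# A constant mid-scale input must come out unchanged in every data point.
codes = [2048] * (5 * (2 ** 4 - 1))
print(hardware_average(codes, n_max=3, m_per_group=5))

# Width needed for the sum of 4096 full-scale 12-bit codes (last group, N = 12).
print((4095 * 4096).bit_length())
```

The last line indicates why the register receiving the sum must be wider than the ADC word: with 12-bit codes and 4096 samples per interval, the accumulated sum needs 24 bits.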
After the averaging operation is over, an interfacing circuit signal requests the DMA controller of the PC to transfer the averaged result to the specified memory location. After the last result from the last octave is transferred to the PC memory, the logarithmic pulse counter is cleared and ready for the next transient. Simultaneously, the interfacing circuit produces a signal that triggers a hardware interrupt routine for performing the second averaging technique that will be described in this section. The operation of the whole circuit is synchronized by a 16 MHz system clock, and the ADC sampling frequency is obtained by dividing the system clock frequency by 160. The maximum allowed numbers for $N$ and $M$ depend on the hardware circuit. In our system, $N_{max} = 12$ and $M_{max} = 16$, which gives a total of 208 pulses (13 x 16, because $N$ starts from zero). This is enough to store transients of up to 1.3 s long, with 10 µs and 20 µs resolution and 32 data points in the first 0.5 ms. In addition, we have a programmable delay of up to 160 µs in order to compensate for possible slow response of the capacitance meter. The intervals in the last group are 40.96 ms long, each one averaging 4096 samples. Therefore, the expected SNR improvement is $\sqrt{4096}$, or 64 times (Wilmshurst, 1990). However, this improvement refers to points inside the averaging interval and not to the whole transient, which implies that the SNR improvement will be less for low-frequency noise. If we use the averaging scheme proposed in Doolittle and Rohatgi (1992), then each data point from the last group (the second half of the transient) would be just one selected ADC sampling result out of 4096 ADC samples. In our averaging scheme that same data point represents an average of 4096 ADC samples. This gives a major advantage of our averaging technique for long delay times over the previously used technique. Another advantage is storage efficiency.
Because of significant SNR improvement, even long transients can be recorded with just 208 points and the disk storage space for a DLTS scan in the range 50-350 K occupies less than 125 kbytes. Other advantages are the reduced size of the required buffers (in our system we allocate just two 420-byte buffers) and the large amount of available time for the microprocessor to process and display the results while the measurement is in progress.
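The figures quoted for this configuration can be cross-checked with a few lines of arithmetic. In the sketch below the constants are taken from the text, while the two bytes per data point and the one-recording-per-kelvin step are assumptions carried over from the earlier storage example; the square-root rule for the SNR gain presumes uncorrelated noise.

```python
import math

DT_US = 10                 # ADC sampling interval, microseconds
N_MAX, M_MAX = 12, 16      # last group index and intervals per group

n_points = (N_MAX + 1) * M_MAX                          # data points per transient
total_us = M_MAX * DT_US * (2 ** (N_MAX + 1) - 1)       # longest recordable transient
first_two_groups_us = M_MAX * DT_US * (2 ** 2 - 1)      # span of the first 32 points
last_interval_us = DT_US * 2 ** N_MAX                   # interval length in last group
snr_gain = math.sqrt(2 ** N_MAX)                        # sqrt of averaged sample count
scan_bytes = n_points * 2 * (350 - 50)                  # 2 bytes/point, 300 recordings

print(n_points, total_us / 1e6, first_two_groups_us)
print(last_interval_us / 1e3, snr_gain, scan_bytes / 1e3)
```

These values reproduce the 208 points, the roughly 1.3 s maximum transient length, the 32 points in about the first 0.5 ms, the 40.96 ms intervals of the last group, the factor of 64 in SNR, and a scan size just under 125 kbytes.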
4. Demonstration

In order to demonstrate the benefits of using the proposed pseudo-logarithmic averaging scheme, we recorded a real DLTS signal when our system was operated in constant-capacitance mode. We used an analog memory feedback (Kolev, 1992), which eliminates the distortion of the voltage transient introduced by the integration of the voltage step from the filling to the emission pulse, and provides a stable zero baseline (Shiau et al., 1987a). This feedback was used in combination with a Boonton 7200 capacitance meter. The measured sample is a p-Si MOS capacitor with insufficient annealing after
FIGURE 17. (a) Original signal sampled each 0.2 ms with 12-bit resolution; (b) same signal after multiple time averaging of 32 consecutive transients. (Horizontal axis: delay time [ms], 0 to 600.)
the ion implantation for correction of the threshold voltage. More details can be found elsewhere (Kolev, 1992). The traces in Fig. 17 were recorded using DMA transfer and a commercially available data acquisition board. Trace a) appears in the same way as the signal seen on the oscilloscope screen, and trace b) was obtained with multiple time averaging of 32 consecutive transients. This trace gives an idea of the expected improvement in SNR using a simple averaging technique. It should be noted that real-time observation of these traces on the computer screen was impossible. Also, we demonstrate our system mainly with a relatively weak DLTS signal and with 60 Hz electromagnetic interference present, in order to demonstrate better the complementary action of both averaging techniques presented in this section. For the same reason, we intentionally recorded transients longer than those normally used in DLTS. Four traces are shown in Fig. 18. The trace denoted with k = 0 (k will be defined later) was obtained using only pseudo-logarithmic averaging. At first glance, the advantage of using this technique alone is not obvious. Although the noise magnitude inside the averaging intervals is significantly reduced, the trace shape is substantially different from the expected exponentially decaying transient. The reason for this unsatisfactory result is that the pseudo-logarithmic averaging technique acts only inside the averaging intervals and, therefore, it cannot suppress low-frequency noise components. However, note the transition to longer averaging intervals around 320 ms and its effect on the curve shape. In order for the aforementioned digital averaging circuit to operate correctly, it must be assured that essentially all the averaged
transients are inside the limits of the ADC. We use a digital-to-analog converter to display the averaged data points on the oscilloscope screen, which helps us to detect ADC input overloading.
D. Continuous Time Averaging

1. Theory
The pseudo-logarithmic averaging proposed here is a very efficient technique for reducing the number of data points and for SNR improvement at relatively high frequencies and large delay times. However, it needs to be complemented with another averaging technique that can suppress low-frequency noise components and improve the SNR at the beginning of the transient. We found that continuous time averaging (Wilmshurst, 1990) is a very convenient technique for these purposes. In addition, it has the advantage of allowing continuous transient display after each pulse, and the size of the allocated memory buffer is independent of the number of averaged transients. The continuous time averaging mode is similar to the running average formed by a low-pass filter. In this mode, the result of the last $n_l$ transients is available at any time for display and for other data processing. Unfortunately, it is not convenient to apply the low-pass filter directly to the multiple time averaging scheme, because the data points that a low-pass filter averages are consecutive in time. In contrast, multiple time averaging processes the whole set of transient data points in parallel, that is, each data
FIGURE 18. The signal of trace a) in Fig. 17 after only pseudo-logarithmic time averaging for k = 0. For k > 0, pseudo-logarithmic time averaging is combined with continuous time averaging with time constants $2^k$. The inset shows good time resolution during the first 2 ms with k = 3. (Horizontal axis: delay time [ms], 0 to 600.)
point is averaged with corresponding data points from the other transients that are at the same delay point from the start of the transient. In this way, the data points to be averaged are separated by one or more pulse periods and are not consecutive in time. Therefore, we need a method to adapt the low-pass filter function to be applied separately for each data point of the transient. For a simple asymmetrical first order low-pass filter consisting only of a resistor $R$ and capacitor $C$, the voltage increment of $v_o$ is (Wilmshurst, 1990)

$$dv_o = \frac{v_{in} - v_o}{T}\,dt \tag{28}$$

where $T = RC$ is the filter time constant. Let us consider $v_{in}$ digitized at short sample intervals $\delta t$. Then in discrete form, Eq. (28) becomes

$$\delta v_o = \frac{v_{in} - v_o}{n_l} \tag{29}$$

where $n_l\,\delta t$ is the digital time constant corresponding to the analog time constant $RC$. Therefore, in order to implement a low-pass filter function on each data point of the transient, we have to replace the simple summing and dividing algorithm with

$$v_o \rightarrow v_o + (v_{in} - v_o)/n_l \tag{30}$$
which means that the value $v_o$ of the data point stored in the computer memory buffer is updated after each new transient with a fraction of its difference from the corresponding data point $v_{in}$ of the new transient. If the starting value stored in the buffer is zero, then at the beginning of the averaging process the difference is large, and the value in the buffer grows quickly as a new value is added to it after each new transient. As the value stored in the buffer approaches that of the new transient, the growth rate decreases, and eventually, when the stored value is nearly equal to that of the new transient, the growth terminates. After this, the value in the buffer becomes largely stable and reflects only changes in the incoming transient that are sustained long enough to be comparable to the digital time constant. This evolution of the transient stored in the computer memory buffer is continuously monitored on the computer screen, where each data point looks as if it were produced by a separate “virtual” boxcar channel. This is not a surprise, as a real boxcar channel performs exactly the same averaging operation. Of course, our “virtual” boxcar channel differs from the real one. Each “virtual channel” has a fixed time delay and aperture. Fortunately, the pseudo-logarithmically averaged interval length, or the “virtual aperture,” is conveniently self-adjusted depending on the time delay, new “channels” can be easily created, and selecting the averaging time constant $n_t\,\delta t$ can be fully automated and varied throughout the experiment. Because $\delta t$ remains constant, only $n_t$ is varied. For convenience, in our system $n_t$ is represented as $2^k$ and the actual change
PLAMEN V. KOLEV AND M. JAMAL DEEN
of the digital time constant is made by selecting k. The SNR increases with the square root of the number of averaged transients, so the SNR is expected to grow as $2^{k/2}$, that is, linearly in k on a logarithmic scale.

2. Demonstration
The results of the combined action of both techniques (pseudo-logarithmic and continuous time averaging with variation of k) are shown in Figs. 18 and 19. A high time resolution at the beginning of the transient is seen in the inset in Fig. 18, and a very good SNR is demonstrated in the inset in Fig. 19. The trace with k = 2 seems almost identical to that with k = 3, but in fact it contains more noise at low frequencies comparable to the inverse of the pulse period. Note the difference in the second half of the transient. Also, the trace with k = 4 in Fig. 19, compared to the trace with k = 3 in Fig. 18, appears noisier because the scale of the vertical axes is different. The noise fluctuations in the inset in Fig. 19 can be estimated to be in the range of 10-30 μV peak-to-peak. Compared with the noise magnitude seen in Fig. 17a, which is about 10 mV peak-to-peak, the SNR improvement is substantial. The 12-bit resolution of the ADC (or that of the pseudo-logarithmic circuit) is always much better than the vertical resolution of the computer screen; therefore, quantization errors are not visible. In addition, the 12-bit resolution can be effectively boosted by the averaging process to 16-bit resolution, because the computer operates with 16-bit “words” (Wilmshurst, 1990; Doolittle and Rohatgi, 1992).
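The update rule of Eq. (30) lends itself to a short numerical illustration. The following Python sketch is not the authors' routine: it assumes NumPy, a synthetic 200-ms exponential transient with Gaussian noise, and hypothetical function names (`update_float`, `update_fixed_point`, `residual_noise`). The fixed-point variant assumes 12-bit samples scaled into the 16-bit range by a 4-bit left shift, with $n_t = 2^k$ so that the division becomes a right shift.

```python
import numpy as np

def update_float(buffer, transient, k):
    """One step of Eq. (30) with n_t = 2**k, applied to all data points at once."""
    return buffer + (transient - buffer) / (2 ** k)

def update_fixed_point(buffer16, samples12, k):
    """Integer variant (assumed layout): 12-bit samples, 16-bit buffer values.

    int32 arrays are used so the intermediate difference cannot overflow.
    """
    scaled = samples12.astype(np.int32) << 4      # multiply by 16
    return buffer16 + ((scaled - buffer16) >> k)  # arithmetic shift: floor-divide by 2**k

def residual_noise(k, n_transients=2000, n_points=400, sigma=1.0, seed=0):
    """Standard deviation of the buffer error after averaging noisy transients."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 0.6, n_points)           # delay time in seconds (synthetic)
    clean = np.exp(-t / 0.2)                      # synthetic 200 ms transient
    buffer = clean.copy()                         # start settled to isolate the noise
    for _ in range(n_transients):
        noisy = clean + rng.normal(0.0, sigma, n_points)
        buffer = update_float(buffer, noisy, k)
    return float(np.std(buffer - clean))

if __name__ == "__main__":
    for k in (0, 2, 4, 6):
        print(k, residual_noise(k))
```

For uncorrelated noise, the steady-state residual of this recursion is the input noise divided by $\sqrt{2 n_t - 1}$, which is consistent with the square-root dependence on the number of averaged transients. In the integer variant, truncation in the right shift can leave the buffer up to $2^k - 1$ counts below a constant input.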
FIGURE 19. Combined pseudo-logarithmic and continuous time averaging with different time constants $2^k$. The inset shows very good SNR at the end of the transient with k = 10. Input signal is seen in Fig. 17. (Trace label: k = 4; horizontal axis: delay time [ms].)
3. Implementation
The continuous time averaging routine¹ is activated by a hardware interrupt immediately after the transfer of the last pseudo-logarithmically averaged data point of each transient. For each data point of the received transient, the subprogram shifts the value left by 4 bits, which is equivalent to multiplying by 16. Next, it subtracts the corresponding 16-bit value stored in the reference memory buffer, then shifts the difference right by k bits, and finally adds the result to the old buffer value, which is stored as the new value. This is exactly the low-pass filter algorithm implied in Eq. (30). Because the averaging is performed over 16-bit words, the digital time constant can be as large as $2^{15} - 1$, or 32,767, which means the maximum increment or decrement to the 16-bit value stored in the buffer is just one bit. However, in practice this is a very inconvenient choice. Consider, for example, a relatively fast transient recorded during a 20-ms-long emission pulse period. There will be just 50 averaging steps per second, and to change the stored value e times will require 32,767/50 s, or roughly 11 min. To reach the new value within 1% error, one has to wait almost 1 hr. Besides, a 20-ms-long emission pulse period does not allow the pseudo-logarithmic averaging technique to be used efficiently.

E. Applications
1. Long Transients
The averaging techniques suggested here are not limited only to DLTS measurements. The main goals achieved by these techniques are substantial SNR improvement and the efficient reduction of the number of data points. In DLTS, these techniques should be considered as preprocessing steps followed by the actual transient parameter analysis, which can use all of the digital processing methods listed earlier (Doolittle and Rohatgi, 1992; Morimoto et al., 1987, 1988; Ikeda and Takaoka, 1982; Okuyama et al., 1983; Weiss and Kassing, 1988; Hanak et al., 1990; Valeur, 1978; Kirchner et al., 1981; Ikossi-Anastasiou and Roenker, 1987; Shapiro et al., 1984). Furthermore, because of the reduced number of data points and increased SNR, it can be expected that these techniques will give better results. Alternatively, as demonstrated in Doolittle and Rohatgi (1992), weight functions well known in analog methods, such as boxcar, lock-in amplifier, and exponential correlator, can also be used for transient analysis. When SNR is not of concern, the boxcar function is particularly attractive because it gives higher resolution of the peaks.

¹ Free for download at http://www.ensc.sfu.ca/GradStudents/kolev/DLTS.html or http://ww~.GeoCities.~~n~SiliconValley/Piiie~/~X~9.

In our system, we rely mostly on the use of the boxcar function with time delays $t_2 = 2t_1$ by simply subtracting two selected data points with the same index m but from different groups n and n + 1. The corresponding DLTS scan is displayed in an inset on the computer screen during the thermal scan, overlapping the transient display. The time delays can be varied and the resulting DLTS scans can be seen in real time. Thus both the transient signal and the DLTS scan are updated on the computer monitor after each pulse, similar to what would be seen on an oscilloscope screen. Because the data points for the Arrhenius plots can be obtained during the experiment, the analysis of a given trap can start as soon as the scan of the temperature interval corresponding to the measurable trap time constants is completed.

Figure 20 shows several transients recorded for each degree of temperature change. The magnitudes of the traces were adjusted to fit on the same chart and are not to scale. This adjustment was made in order to demonstrate the benefits of the very high SNR, which makes the observed time constant changes obvious. These results suggest the option of using isothermal analysis DLTS (Okushi and Tokumaru, 1980; Yoshida et al., 1993), which can minimize the effects of thermal dependencies of the capture cross section and transient magnitude. For this purpose, we need to determine the emission rate without relying on its temperature dependence, for example, by displaying the transient on a logarithmic scale. Figure 21 shows several transients at the same temperature (47 K), but with different settings of the reference capacitor and the corresponding different reverse steady-state bias voltage across the sample. For comparison, we
FIGURE 20. Temperature variation of the transient time constant as observed during measurements. The curves demonstrate the need for better temperature resolution at low temperatures. Magnitude is not to scale. The temperature was maintained with accuracy better than ±0.1 K. (Panel label: ion-implanted; horizontal axis: delay time [ms].)
FIGURE 21. Time constant variation at a fixed temperature of 47 K and variable reverse bias of 0 to 8 V. The time constants calculated from the slope of the trace vary from 200 ms at zero bias to 158 ms at 4 V. Note the departure from the exponential behavior at 8 V bias during the first 50 ms. For comparison, a transient obtained by differentiating a square wave with an RC high-pass filter is also shown. (Panel title: ion-implanted p-MOS capacitor at 47 K; trace labels: 0 V, 4 V, 8 V, RC; horizontal axis: delay time [ms].)
also recorded a true exponential transient with the magnitude and the time constant adjusted to be close to the real DLTS signal. This true exponential transient was obtained by differentiating a square wave with an appropriate RC filter. The slope of the traces gives time constants decreasing from 200 ms to 158 ms. The decreased transient time constants, corresponding to low values of the reference capacitor or higher reverse bias, demonstrate the effect of field-enhanced emission (Couturier et al., 1989). It is important to note here that the change in the slope, and the corresponding change in the transient time constant, can be easily monitored at a fixed temperature. This allows for appropriate adjustment of the measurement setup during the experiment. For example, by adjusting the voltage and monitoring the slope of the trace, we can avoid the conditions of field-enhanced emission and, at the same time, obtain a large signal that improves measurement sensitivity.
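Reading a time constant from the slope of a trace on a logarithmic scale amounts to a straight-line fit of ln v against the delay time. The following Python sketch illustrates this on synthetic data; it assumes NumPy, a purely exponential transient with its baseline already removed, and a hypothetical function name (`time_constant_from_slope`). It is not the authors' analysis code.

```python
import numpy as np

def time_constant_from_slope(t, v):
    """Fit ln(v) = ln(v0) - t/tau by least squares and return (tau, v0)."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    mask = v > 0                       # the logarithm needs positive samples
    slope, intercept = np.polyfit(t[mask], np.log(v[mask]), 1)
    return -1.0 / slope, float(np.exp(intercept))

if __name__ == "__main__":
    t = np.linspace(0.0, 0.6, 90)      # delay times in seconds (synthetic)
    for tau_true in (0.200, 0.158):    # values quoted in the text for 0 V and 4 V
        tau, v0 = time_constant_from_slope(t, 2.0 * np.exp(-t / tau_true))
        print(f"true {tau_true:.3f} s, fitted {tau:.3f} s")
```

A departure from a straight line on this scale, such as the one noted for the 8 V trace, shows up as a fit residual that changes systematically with delay time.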
2. DLTS Spectra
In the measurements in Figs. 18-21, the pseudo-logarithmic settings were N = 11 and M = 16, the temperature variation was limited to between 40 and 50 K, and all transients were associated with just one trap. Indeed, with fewer than two transient recordings per second and the digital time constant set to $2^9$, about 600 s were needed to obtain good averaging results. Clearly, these settings are not very convenient for a full-range thermal scan, and this setup is
better suited for isothermal measurements when a significant noise reduction is essential. When the signal level is high compared to the noise, we set the digital time constant to $2^4$ through 2', and then we can store transients up to 1.3 s long for the full range of the thermal scan. For scanning temperature measurements of noisy transients, we use a setup of N = 8 and M = 10, which is enough to record a 50-ms-long transient with just 90 points. With about 20 continuous time averages per second, this setup allows us to use digital time constants of up to 2"', depending on the temperature scanning rate. For example, if the setup is 2", then to change the values of the stored transient e times takes less than a minute. Recording the transient at each degree of temperature, and with a temperature scanning rate of 0.8 K/min or less, there is still enough time for averaging, because the time constant of the transient varies far less than e times for each degree of temperature change. This is especially true at temperatures above 100 K, where the DLTS signal varies slowly with the temperature. Using this setup and the simple boxcar technique already described, a series of six DLTS spectra were obtained (Fig. 22). There are five well-defined peaks present. The shape variation of the large peak at the highest temperature suggests the existence of a sixth peak, which is dominated by the large peak. This peak is more apparent on the curves corresponding to a short time constant setup, which is a rather strange result because it is well known that the DLTS peak resolution improves for setups with long time constant windows (Doolittle and Rohatgi, 1992). However, further analysis shows that this false peak can
FIGURE 22. Series of six boxcar DLTS spectra with channel delay ratio $t_2 = 2t_1$ obtained from one thermal scan. The change in the background level seen at 100 K is for τ = 1 ms and τ = 2 ms. The change in the magnitude for longer time constants is probably due to temperature variation of the capture cross section. (Panel title: p-MOS capacitor; curve labels: τ = 1; 2; 5; 10; 20; 37 ms; horizontal axis: temperature [K].)
be associated with signal overloading, which affects the first half of the stored transient more than the second half. Figure 23 shows the Arrhenius plots obtained using the same data file. As outlined in Benton (1990), obtaining the energy level positions and capture cross sections is only the first step toward impurity or defect identification. We limit the scope of this section to demonstration of the averaging techniques. The trace in Fig. 21 with the lowest influence of field-enhanced emission is fitted well by an exponentially decaying signal with a time constant of about 200 ms. In the rightmost Arrhenius plot in Fig. 23, for the trap around 50 K, this time constant corresponds to 47.6 K. This again supports the need to record more than one transient for each degree of temperature at low temperatures. Using a setup of $t_1$ = 10.3 ms and $t_2$ = 20.4 ms for the rate window, we have found the peak temperatures of the traps appearing in Fig. 22. The recorded transients at these temperatures are shown in Fig. 24. Signal overloading of the transient recorded at 251 K is seen at the beginning of the trace for delay times of less than 5 ms. This is the reason for the false peak appearance on the low-temperature side of the large peak in Fig. 22. Therefore, the observation of the whole transient can easily prevent the incorrect interpretation of the shape variation mentioned here. The inset of Fig. 24 shows details of the same transients during the first 2 ms. This can further improve the rejection of incorrect data for transient analysis. One simple way to check for distortions at the beginning of the transient that are difficult to see is to display the transient on a logarithmic time scale.
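The boxcar rate window mentioned above can be illustrated numerically. The sketch below assumes an ideal exponential transient v(t) = A exp(-t/τ) and uses the standard boxcar relation τ = (t2 - t1)/ln(t2/t1) for the time constant at which the difference v(t1) - v(t2) peaks. The function names are hypothetical and the code is an illustration under these assumptions, not the authors' program.

```python
import math

def rate_window_tau(t1, t2):
    """Time constant at which S = v(t1) - v(t2) peaks for an exponential transient."""
    return (t2 - t1) / math.log(t2 / t1)

def boxcar_signal(tau, t1, t2, amplitude=1.0):
    """Boxcar DLTS signal for v(t) = amplitude * exp(-t / tau)."""
    return amplitude * (math.exp(-t1 / tau) - math.exp(-t2 / tau))

if __name__ == "__main__":
    t1, t2 = 10.3e-3, 20.4e-3          # delays quoted in the text, in seconds
    tau_peak = rate_window_tau(t1, t2)
    print(f"rate-window time constant: {tau_peak * 1e3:.2f} ms")
    # The signal is smaller on either side of the rate-window time constant.
    for factor in (0.5, 1.0, 2.0):
        print(factor, boxcar_signal(factor * tau_peak, t1, t2))
```

Under these assumptions, the delays of 10.3 ms and 20.4 ms correspond to a rate-window time constant of about 14.8 ms, and for the special case t2 = 2 t1 the relation reduces to τ = t1/ln 2.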
FIGURE 23. Arrhenius plots obtained from the scan data file used in Fig. 22. The energy level positions of the hole traps are in eV above the valence band as indicated. Note the good linearity of the data points. (Horizontal axis: 1/kT [eV⁻¹].)
FIGURE 24. Series of transients at the peak temperatures for setup $t_1$ = 10.3 ms and $t_2$ = 20.4 ms of the traps in Fig. 22 (same data file). Details of the first 2 ms are shown in the inset. Note the overloading at 251 K. (Horizontal axis: delay time [ms].)
Figure 25 shows the same series of transients as in Fig. 24, but on a logarithmic time scale. Here one considers the data above 0.4 ms because below this limit the recorded data points reflect the capacitance meter recovery process. The switching time of the feedback circuit is below 1 μs and it does not introduce any significant delay. Another possible source of relatively slow recovery time is high series resistance of the sample. From the point of view of demonstrating the averaging techniques, it is interesting to note the smooth shape of the curves during instrument recovery. This is possible only with very high time resolution and a good SNR at the beginning of the transient capture. One of the curves in Fig. 25 (at 53 K) is presented as a set of data points in order to demonstrate their even spacing on a logarithmic time scale. For comparison, another CC-DLTS trace recorded with the same setup is also shown. The measured sample was a junction field-effect transistor subjected to neutron irradiation. This trace proves that the feedback and the biasing circuit are not the source of the distortion in the remaining traces.

F. Conclusions
In this section, a new approach to the digital signal processing of DLTS signals by separating the noise and data reduction from the transient analysis was introduced. The combined action of two complementary digital averaging techniques to improve the DLTS digital signal processing was demonstrated. Pseudo-logarithmic time averaging is efficient in reducing the number of
FIGURE 25. The data of Fig. 24 plotted on a semilogarithmic scale to display details at the beginning of the transient. The signal before 0.4 ms is distorted because of the instrument recovery after the filling pulse. The dots in the trace at 53 K show the data points evenly spaced on a logarithmic scale. For comparison, a CC-DLTS trace obtained from neutron-irradiated junction field-effect transistors and reduced 25 times in magnitude is shown. (Horizontal axis: delay time [ms].)
processed data points and improving the SNR for high-frequency noise components and for the transient tail. Continuous time averaging is well suited for improving overall SNR and for continuous data display and processing, and it is more efficient in using the computer resources. The described combination of hardware and software tools for implementation of these techniques continuously supplies fresh data and does not require any synchronization with the main computer program. The proposed techniques allow one to combine the powerful transient analysis of the digital DLTS methods with the sensitivity and convenience of the analog methods. Furthermore, the developed averaging techniques can be easily adapted to data processing in many other experiments where the sampling is performed at a fixed rate but the results can be adequately presented on a logarithmic scale.
IV. FEEDBACK CIRCUITS AND EXPERIMENTAL SETUP FOR CC-DLTS AND CR-DLTS
A. Introduction
Since the introduction of DLTS by Lang (1974a), a large variety of modifications and improvements have been reported. When a capacitance meter is
used, there are two possible modes of operation. In constant-voltage mode, one can measure the capacitance transient (CVCT). The second way is to keep the capacitance constant by using a feedback circuit (Pals, 1974; Goto et al., 1973; Johnson, 1982; Miller, 1972; Li and Sah, 1982a; DeJule et al., 1985; Klausmann, 1986; Shiau et al., 1987a; Kolev, 1992; Kolev and Deen, 1998) and to measure the voltage transient (CCVT). Combining DLTS and CCVT modes has many advantages. As Johnson (1979) has shown, because the capacitance is constant, the depletion layer width also remains constant and, therefore, the change of the net charge trapped in interface states is directly proportional to the measured voltage transient for any interface trap density. Furthermore, the proportionality factor does not depend on the temperature, doping concentration, or doping profile. With the double correlation technique, CCVT can be used for both deep imperfection profile measurements (Lefèvre and Schulz, 1977a) and energy-resolved interface trap measurements (Johnson, 1979). In the classical setup, shown in Fig. 26, the capacitance of the DUT is forced to be equal to that of the reference capacitor connected at the time. This is done by using a large-gain compensation amplifier OA in the feedback loop to adjust the bias voltage ΔV across the sample. In practice, an integrator with a sufficiently long time constant $\tau = R_1C_1$ is used for this purpose to prevent any oscillation in the system by reducing the total gain in the feedback loop at high frequencies. The cut-off frequency of the integrator is set depending on the properties of the sample at the quiescent point. However, the reduced speed of the feedback amplifier causes the voltage transitions between the filling and emission pulses to be integrated, thus distorting the correct signal. The small
FIGURE 26. Classical setup for constant-capacitance measurements. The feedback loop contains: DUT, capacitance meter, and a high-gain compensation amplifier OA with cut-off gain for high frequencies set by $R_1C_1$. Signal P alternates the connection between the reference capacitor for the filling pulse $C_f$ and that for the emission pulse $C_e$. Also shown is the voltage applied to the DUT, containing fast (before moment t) and slow transients.
voltage transient, which compensates for the change in the trap occupancy with charge, is outweighed by a large transient caused by the integration of the voltage step. In Fig. 26 this occurs mostly before the moment t. The distortion is significant for low-density impurities and for measurements with short delay times (before t in Fig. 26). Taking into account that most often the voltage step is in the range of hundreds of millivolts or even several volts, while the voltage transient caused by charge emission from the traps is usually below 1 mV, the distortion is a serious problem that significantly reduces the useful part of the recorded transient. The large transient is difficult to account for because the effective gain dC/dV of the DUT is generally unknown and temperature dependent. A very good solution to this problem was proposed by Shiau et al. (1987a). In addition to the integrator in DeJule et al. (1985), an additional local feedback loop, which the authors call a “memory circuit,” was introduced. Its purpose is to store the bias applied on the sample at the end of each pulse and to apply this bias as a baseline at the beginning of the next cycle. Thus the slow integrator needs to follow only the transient caused by the traps, and the summing circuit is used to combine the total bias voltage. Unfortunately, the sensitivity reported in Shiau et al. (1987a) (about 1/500 of the dopant density) is too low for measurement of interface state densities at the Si/SiO2 interface. Following the basic idea proposed in Shiau et al. (1987a), a new “memory circuit” has been created with some improvements. The sensitivity has been increased up to $10^{-5}$ of the dopant density, and measurements with delay times as short as 50/100 μs are still possible.
Furthermore, the concept of alternating the feedback circuit configuration synchronously with the sequence of filling and emission pulses is the technical basis for the new variation of the DLTS technique that we called constant-resistance DLTS.

B. Feedback Circuit and Details on the Setup for CC-DLTS
The feedback circuit is shown in Fig. 27 and it is similar to the circuit in Shiau et al. (1987a). It consists of an integrator OA, a summing amplifier Σ, and four sample and hold amplifiers S&H-1 through S&H-4. Using these sample and hold amplifiers, two parallel analog memories are built, each one providing the corresponding baseline voltage for filling or emission pulses at the input of the summing amplifier. We call them “emission pulses memory” (S&H-1 and S&H-3 in Fig. 27) and “filling pulses memory” (S&H-2 and S&H-4 in Fig. 27). Only the noise introduced by the “emission pulses memory” is important for DLTS applications. In order to reduce this noise, two RC low-pass filters $R_2C_2$ and $R_3C_3$ are applied in the “emission pulses memory.” The time constants of the integrator $R_1C_1$ and the low-pass filters are adjusted with respect to the boxcar averager delay times. Further noise reduction is achieved
FIGURE 27. Improved feedback circuit: details. The shadowed area denotes the “filling pulse memory,” which can be omitted when the overloading during the filling pulse can be ignored. (Block label: switching control.)
by using commercially available sample and hold amplifiers. They are used in unity gain mode and are connected with large storage capacitors $C_4$ and $C_5$ (~2 μF, high-quality type, e.g., polystyrene). These capacitors are connected via ~1 kΩ resistors (not shown in Fig. 27) to further increase acquisition time.² As a result, the acquisition time rises to several hundred milliseconds or more, and it can be adjusted by varying the resistance in series with the storage capacitors $C_4$ and $C_5$. In the hold mode, because the input current of the reading amplifier in the sample and hold circuits is very low, the voltage drop across these resistors can be neglected. The rise of the acquisition time does not affect the system performance, but the noise reduction at low frequencies is significant. An additional benefit is the reduced distortion when large time constant windows are used. In Shiau et al. (1987a) the distortion is reported to be 1 mV per second, while in our circuit it is less than 7 μV/s and can be neglected in most cases. With this control of acquisition time, an ordinary S&H circuit behaves much like a boxcar channel. Alternatively, a boxcar channel could be used to provide the baseline for the summing amplifier (Vitanov and Kolev, 1986). As was explained in the previous section, the purpose of the feedback circuit is to bias the DUT in such a way that its capacitance is equal to the corresponding emission reference capacitance $C_e$. A miniature electromechanical relay or a fast, low-capacitance diode is used for alternating $C_e$ and $C_f$ at the reference terminals. Another solution is to replace these reference capacitors with a varactor diode, biased separately from the DUT. The varactor diode replaces both capacitors, and its capacitance is alternated simply by applying voltage pulses to the external bias input of the capacitance meter or to an external biasing circuit.³ The varactor diode is easy to switch and adjust, but it is neither an ideal switch nor a high-quality capacitor. As an alternative, the reference capacitors can be selected to be high-quality variable air capacitors, but the switching device is slow (reed relay or mercury relay), or non-ideal, and operates over a limited voltage range (a diode or an analog semiconductor switch). Additional complications arise when the variation of the reference capacitor used for defect profiling is automated, because then a bank of reference capacitors combined frequently by a slow relay matrix is used. The problem is that the relays in this matrix have their own parasitic capacitances. Inevitably, some compromises are needed and the choice is made on a case-by-case basis.

² Analog Devices Data-Acquisition Databook, Vol. 1, pp. 14-31 (1985).
C. Experimental Setup for CR-DLTS
In the setup for CR-DLTS shown in Fig. 28, the sample transistor is connected as a voltage-controlled resistor between the test terminals of a Boonton 7200 capacitance meter. On the differential terminals a reference resistor $R_{ref}$ is connected that has a low conductance at the 1 MHz test signal frequency. In our case, this resistor is about 1 MΩ. The capacitance meter is used as a high-sensitivity amplifier of the difference in the channel conductance of the FET sample transistor
FIGURE 28. a) Block diagram of the setup for CR-DLTS; b) connection for back-gate driving available for four-terminal devices; c) connection of two devices for simultaneous measurements (expandable to more).
³ Application Note IM-001, Boonton Electronics Corporation, 25 Eastmans Road, Parsippany, NJ 07054-0465, USA; Instruction Manual for Model 72B Capacitance Meter, Boonton Electronics Corporation, Parsippany, NJ 97520.
and the conductance of the reference resistor. This difference is detected by a phase-sensitive detector (PSD) and, through the conductance analog output provided on the rear panel of the capacitance meter, is then applied to a feedback circuit. An important property of the circuit is its dynamic configuration in response to the driving pulses. Essentially, the feedback output is connected to the gate of the sample FET by the switch SW only during the emission pulse. During the filling pulse, the feedback loop is open and a constant bias from an external source is applied at the FET gate. During the emission pulse, with the feedback loop closed, a high-gain compensation amplifier OA provides for almost exact matching of the channel conductance of the tested transistor to that of the reference resistor $R_{ref}$ by continuously adjusting the gate voltage $V_G$. As the value of $R_{ref}$
FIGURE 56. CR-DLTS spectra of general purpose silicon JFETs 2N5459. These devices were tested “as received” without any intentional damage. Note the y-axis scale and the sensitivity of the measurement. (Horizontal axis: temperature [K].)
Fig. 56 the CR-DLTS spectra of these devices are shown displaying two deep levels with low concentrations. As the transconductance of the samples is in the range of several milliamperes per volt (which contributes to the total feedback gain), we were able to perform very sensitive measurements. The Arrhenius plots of the deep levels are shown in Fig. 57. Notice the overlapping of the data for defect E1 in both samples. The difference in the slope of the plots for E1 is less than 1 meV. As these devices are from the same distribution set and most likely have the same production history, the coincidence of the fitting lines is not surprising. However, the overlapping of the Arrhenius plots demonstrates the high precision of our DLTS system (Kolev et al., 199Xa) because the data are for two different devices measured under identical conditions. When the source of defect generation is not known, obtaining accurate activation energy and capture cross section is important, but this is only part of the process of trap identification, as outlined in Benton (1990). Without technological data, we limit our comments only to the demonstration of the new CR-DLTS technique as a potential tool for routine defect analysis and possible control in industrial environments. With a combination of current-voltage (I-V) and capacitance-voltage (C-V) measurements, and with the aid of the design catalog of Siliconix Inc., the doping level was estimated to be around $5 \times 10^{15}$ cm$^{-3}$. Then, the trap concentrations were calculated using Eq. (54). One possible application of CR-DLTS is to study the defect distribution in the bulk of the channel, for example, by varying the filling pulse level in order
FIGURE 57. Arrhenius plots of the traps in the CR-DLTS spectra in Fig. 56. Note the minor difference in the energy level and capture cross section of E1 in both samples. (Horizontal axis: 1/kT [eV⁻¹].)
to selectively populate only fractions of the traps (Lang, 1974a). In Fig. 58, we demonstrate several CR-DLTS scans of sample 2N5459 #1 with variation of the magnitude of the filling pulse relative to the pinch-off voltage $V_p$ (about -5.2 V in this case). There is an obvious change in the peak magnitudes vs filling pulse level, particularly around -3 V. Again, we emphasize the aspects of possible applications of the new technique rather than the final result of the defect profiling.
2. Germanium JFETs
We also demonstrate the new technique with p-channel germanium JFETs TIXM12 produced by Texas Instruments. As in the previous case, these samples were neither used in an electronic circuit nor damaged with radiation. Figure 59 shows CR-DLTS spectra of one device. Unlike the spectra of the previous devices, the spectra in Fig. 59 show bipolar signals resulting from the measurement of two-sided p-n junctions. We obtained the fairly good Arrhenius plots shown in Fig. 60, and from these the energy levels were found to be 0.12 eV, 0.173 eV, and 0.283 eV above the valence band edge for H1, H2, and H3, respectively, and -0.268 eV for E1 below the conduction band. The capture cross sections determined from the intercept with the y-axis were $7.4 \times 10^{-15}$ cm², 4 . IO-"cm?, $8 \times 10^{-13}$ cm², and 7 . IOp'"cm', respectively. However, these data should be treated with caution because the results for the hole traps H1, H2, and H3 could be affected by the electron
[Plot: Si JFET 2N5459 #1 at τ = 16 ms; horizontal axis Temperature [K]]
FIGURE 58. CR-DLTS scans with variable filling pulse level. The change in signal magnitude with the filling pulse level can be used for defect profiling.
[Plot: Ge JFET TIXM12 #3; curves for several rate windows τ [ms]; horizontal axis Temperature [K]]
FIGURE 59. CR-DLTS spectra of germanium p-channel JFET TIXM12 #3. The sign reversal is probably caused by emission from an electron trap inside the gate material.
FIGURE 60. Arrhenius plots of the germanium transistor in Fig. 59. Note the fairly good linearity and small data spread despite the low signal level shown in Fig. 59.
trap E1, or by field-enhanced emission. This may explain the difference between our results and those obtained in high-purity germanium by other researchers (see Blondeel et al., 1997; Evwaraye et al., 1979; Haller et al., 1979; and the references therein). Therefore, we limit our discussion to an illustration of CR-DLTS in general-purpose, commercially available transistors. From the magnitude of the signal, it can be estimated that the sensitivity of the concentration measurement should be comparable to that of the silicon devices, in the 10¹¹ cm⁻³ range.
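The extraction of an energy level from the slope and a capture cross section from the intercept of an Arrhenius plot can be sketched as follows. This is a minimal illustration, not the analysis code used for Figs. 57 and 60: it assumes the standard emission-rate relation e = σγT² exp(-E/kT), the function name `arrhenius_fit` is hypothetical, and the material constant `GAMMA` is a placeholder value that must be replaced by the appropriate constant for the carrier type and semiconductor. The "true" trap parameters below are used only to generate synthetic data.

```python
import numpy as np

K_B = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_fit(temps_k, emission_rates, gamma):
    """Fit ln(e/T^2) = ln(sigma*gamma) - E/(kT).

    Returns (activation energy in eV, capture cross section in cm^2).
    gamma is the material constant in cm^-2 s^-1 K^-2 (an input assumption).
    """
    temps_k = np.asarray(temps_k, dtype=float)
    rates = np.asarray(emission_rates, dtype=float)
    x = 1.0 / (K_B * temps_k)           # abscissa of the Arrhenius plot, 1/kT [1/eV]
    y = np.log(rates / temps_k**2)      # ordinate, ln(e/T^2)
    slope, intercept = np.polyfit(x, y, 1)
    return -slope, np.exp(intercept) / gamma


# Synthetic data: all numbers below are illustrative assumptions.
GAMMA = 1.0e21                          # hypothetical material constant
E_TRUE, SIGMA_TRUE = 0.283, 8e-13       # eV, cm^2
T = np.linspace(120.0, 160.0, 9)        # peak temperatures for nine rate windows
e = SIGMA_TRUE * GAMMA * T**2 * np.exp(-E_TRUE / (K_B * T))

e_a, sigma = arrhenius_fit(T, e, GAMMA)
print(f"E = {e_a:.3f} eV, sigma = {sigma:.1e} cm^2")
```

Because the cross section comes from an exponentiated intercept, a small error in the fitted slope translates into a large error in σ, which is one reason the cross sections quoted above should be treated with caution.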
3. Comparison of CR- with CC- and CT-DLTS
To explain the similarities and differences between constant-capacitance (CC-) and CR-DLTS, we need to review some specifics of the C-V curves of a JFET. We note that there are three capacitors connected in parallel: the gate-source, gate-drain, and gate-channel capacitors. This model is further complicated by the internal connection of the top and bottom gates. Figure 61 shows typical C-V curves of a monolithic silicon JFET. Let us consider the possible configurations for C-V measurements. If the p-n junction capacitance of the gate versus source or drain is measured, then we obtain a curve similar to curve a). One can repeat this measurement with source and drain connected together externally, in order to eliminate the isolation of the two areas around and below the pinch-off voltage. Curve b) was obtained from this configuration. The source-drain symmetry of the
[Plot: curves (a) and (b); horizontal axis Bias [V]]
FIGURE 61. Capacitance-voltage (C-V) curves of JFET J1: a) gate to source or gate to drain capacitance; b) gate to source and drain, which are connected in parallel. The aspect ratio is 11,400 µm to 5 µm.
device can be verified by multiplying the part of curve a) around and below the pinch-off voltage by a factor of two, and then comparing the result with the corresponding part of curve b). As seen in Fig. 61, the measured device is fairly symmetrical. In Sze (1983), the expression for the pinch-off voltage (51) was derived using the assumption of abrupt edges of the depleted regions extending into the channel from the top and bottom gates (see Fig. 51). When these regions merge, the contact between source and drain no longer exists, and this should lead to a sudden drop in the gate capacitance. However, Fig. 61 shows that this drop in the capacitance extends from -0.8 to -1.2 V, which reflects a gradual merging of the ends of the depleted regions (Debye tails). In this region of the gate voltage, the channel is depleted of free carriers. In Fig. 61, this region can be defined as extending from the bias point where curve a) and curve b) split to the bias point where curve a) multiplied by 2 deviates sharply from curve b). Above this region, the voltage variation corresponds to variation of the depletion width (and thus the capacitance) of the p-n junctions extending from source to drain along the top and bottom gates. Below this "channel depletion" region, in the pinch-off region, there are just two separate capacitors of the source and drain versus (mainly) the bottom gate. The relative contribution of the gate-channel capacitance can be estimated from the smooth step in curve b) during the transition from the linear to the pinch-off region. From Fig. 61 it is clear that the total capacitance is dominated by the capacitance of the gate-source and gate-drain p-n junctions and the associated edge capacitances.
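The symmetry check described above (doubling curve a) and comparing it with curve b) at and below pinch-off) can be sketched numerically. This is a hedged illustration on made-up data: the function name `symmetry_mismatch`, the capacitance values, and the chosen pinch-off bias are assumptions, not measurements from Fig. 61.

```python
import numpy as np


def symmetry_mismatch(bias, c_single, c_both, v_pinch_off):
    """Largest relative mismatch between 2 x curve a) and curve b).

    bias        : gate bias points [V]
    c_single    : curve a), gate to source (or drain) capacitance
    c_both      : curve b), gate to source and drain connected together
    v_pinch_off : only points with bias <= v_pinch_off are compared
    """
    bias = np.asarray(bias, dtype=float)
    c_single = np.asarray(c_single, dtype=float)
    c_both = np.asarray(c_both, dtype=float)
    mask = bias <= v_pinch_off
    rel = np.abs(2.0 * c_single[mask] - c_both[mask]) / c_both[mask]
    return float(rel.max())


# Synthetic example: curve b) is 2% larger than twice curve a).
bias = np.linspace(0.0, -1.5, 16)
c_a = 10.0 + 2.0 * (bias + 1.5)      # made-up capacitance values [pF]
c_b = 2.0 * c_a * 1.02
mismatch = symmetry_mismatch(bias, c_a, c_b, v_pinch_off=-1.2)
print(f"worst-case mismatch: {100 * mismatch:.2f} %")
```

A small mismatch in the pinch-off region indicates a symmetrical device; the same comparison applied over the full bias range would locate the point where doubled curve a) departs from curve b), that is, the upper edge of the channel depletion region.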
In the constant-capacitance mode of operation, the volume of the depleted region is fixed by selecting the value of the reference capacitor. In order to compare this mode with CR-DLTS, one has to select an appropriate value of the reference capacitor, so that the baseline in CC-DLTS (see Fig. 28) during the emission pulse equals that in the CR mode. This will ensure complete depletion of the channel and emission from all charged traps in the channel. However, in CR-DLTS, the baseline drives the transistor into the subthreshold region. Therefore, matching this requirement in CC-DLTS mode effectively eliminates the response of the traps inside the channel region, because the channel will be completely depleted. As a result, the gate-channel capacitance will be independent of the gate voltage. As the gate-channel capacitance is connected in parallel with the gate-source and gate-drain capacitors, the observed voltage transient in CC-DLTS mode will compensate for the variations in these two capacitors. This means that the CC-DLTS signal will originate from the traps located inside the source and drain depleted regions. In contrast, in CR-DLTS mode, the voltage transient compensates only the charge emitted from the traps localized inside the channel, because the charge trapped in the source and drain depletion regions cannot affect the pinch-off voltage and the channel conductance. Of course, one can select a value of the reference capacitor convenient to bias the transistor in the linear region (Fig. 61); in this way the traps inside the channel can be measured. However, the existence of a conductive layer in the middle of the channel during the emission pulse will reduce the total number of responding traps. Even if this biasing is matched in CR mode by reducing the value of the reference resistor, and Eq. (54) is appropriately changed, there will still be a difference in the area and volume tested by the two techniques.
Thus, an accurate comparison of CR- with CC-DLTS is not possible in this case. Although the two techniques use very similar instrumentation, they rely on entirely different physical mechanisms for compensation of the charges emitted from the traps. Fortunately, both techniques are area independent and, for a uniform defect distribution, should ideally produce identical voltage transients. In practice, there will be some difference in the magnitude because of edge capacitance effects: the effective defect concentration will appear different even for a completely uniform defect distribution. More significant differences will appear for small-area devices. In CC-DLTS, the differential capacitance versus voltage (dC/dV) plays a critical role in the total gain of the feedback loop as an "internal gain" of the test structure, and it strongly affects the SNR. Small-area devices will have low dC/dV and, if measurement is possible at all, the CC-DLTS measurements will have severely limited sensitivity. In CR-DLTS, however, the parameter corresponding to the "internal gain" of the test structure is the transconductance of the transistor, which depends on the aspect ratio W/L, and not on the active device
area. Thus, unlike the constant-capacitance technique, the constant-resistance technique is completely area independent and allows for very sensitive DLTS measurement of deep-submicron devices. This is demonstrated by sensitive DLTS measurements made on 2 µm × 0.2 µm MOSFETs (Kolev and Deen, 1998). Figure 62 shows several curves demonstrating the dependence of the DLTS signal magnitude on the filling pulse level. This is a classical method for obtaining the trap distribution in the depleted region by partially filling the defects and monitoring the emission from only a fraction of them (Lang, 1974a; Akita et al., 1993; Deen, 1993b). As seen in Fig. 62, to first order, the trap E3 is uniformly distributed (Lang, 1974a). For our illustration of the potential for defect profiling using CR-DLTS, the precise overlapping and smooth nature of the curves are more important. For a nonuniform trap distribution, the analysis would be complicated by the internal, nonremovable interconnection between the top and bottom gates. Thus, for low-level filling pulses (in our case around -1 V), the traps in the middle of the channel will be populated first (Fig. 51). As the filling pulse level goes to zero, more traps will be charged and the signal magnitude will increase. We see this behavior in Fig. 62 down to zero bias, and slightly into the forward bias direction, where the signal saturates. The major difference between CR- and CC-DLTS is in the forward direction. In addition, the difference between the two modes of DLTS is more
[Plot: legend ΔVg CR J5, ΔVg CR J1, ΔVg CC J5, ΔVg CC J1, ΔCg J1; annotation "channel depletion"; horizontal axis Bias [V]]
FIGURE 62. Profiling curves of J1 (11,400 µm × 5 µm) and J5 (400 µm × 7 µm). Circles denote CR-DLTS data, triangles denote CC-DLTS, and crosses show the data obtained from CT-DLTS after appropriate magnitude adjustment.
pronounced for J1 (11,400 µm × 5 µm), a device with a very large periphery compared to that of J5 (400 µm × 7 µm). At about -1.3 V, where the filling pulse magnitude and the DLTS signal are close to zero, the extrapolated curves cross the x-axis at different points. This corresponds to a difference in the pinch-off voltages of the devices. After appropriate scaling, the magnitude dependence of the standard capacitance-transient DLTS signal completely coincides with the curves obtained with CR- or CC-DLTS. Figure 63 shows CC-DLTS spectra of device J1. The main difference between the results in Figs. 63 and 52 is the appearance of a partly resolved peak below 200 K. We attribute this peak to a local fluctuation of the Ps-Ci defect E4 located at 0.29 eV (Asom et al., 1987). In view of the difference in the two modes of operation, we believe that this peak is related to defects concentrated mainly in the source and drain regions. As the CR-DLTS signal originates entirely from the channel region, it may be less influenced by this type of defect. The activation energies and capture cross sections of the traps are shown in Fig. 64 and are similar to those obtained by CR-DLTS. It was also possible to perform sensitive CC-DLTS measurements using three other transistors in parallel, J3, J4, and J7. The spectra are shown in Fig. 65 and the Arrhenius plots in Fig. 66. Comparison with the results from measurement of the single device J1 demonstrates the area independence of the CC-DLTS mode of operation. The minor differences can be attributed to the different area-to-periphery ratios.
[Plot: horizontal axis Temperature [K]]
FIGURE 63. CC-DLTS scans of JFET J1. Below 200 K a weak peak E4 appears, but it is dominated by E3. E4 is tentatively assigned to a Ps-Ci complex at 0.29 eV. Note the wide range of rate windows used.
[Plot: horizontal axis 1/kT [eV⁻¹]]
FIGURE 64. Arrhenius plots from CC-DLTS data. The device measured is J1.
[Plot: horizontal axis Temperature [K]]
FIGURE 65. CC-DLTS spectra obtained from measurement of three JFETs connected in parallel. Note the similarity with the spectra in Fig. 63.
The thermal dependence of the baseline voltage for one CR-DLTS and two CC-DLTS spectra is shown in Fig. 67. The darker line corresponds to the constant-resistance mode. Around 200 K, this darker line displays a step that is related to E3: for a given pulse period and below a certain temperature, emission from E3 is slow and the traps remain charged all the time.
[Plot: horizontal axis 1/kT [eV⁻¹]]
FIGURE 66. Arrhenius plots of the defects in Fig. 65.
[Plot: Baselines of JFET J1, W = 11000 µm, L = 7 µm; horizontal axis Temperature [K]]
FIGURE 67. Thermal dependence of the baseline voltage is very similar to the thermal dependence of the pinch-off voltage. The darker line was recorded from a CR-DLTS run, CC-DLTS #1 was recorded after adjustment at room temperature, and CC-DLTS #2 was recorded after adjustment at 50 K, the low-temperature end of the thermal scan.
This explains the similarity of the curve with thermally stimulated capacitance curves (Walker and Sah, 1973). The remaining two lines are recordings of the baseline voltage during operation in constant-capacitance mode. Line #1 was obtained after the reference capacitor was adjusted at room temperature so the baseline was the same as in CR mode. Note the different effect that carrier
freezing has on the CR and CC baseline curves at low temperatures. This effect is more pronounced in CC curve #2, which was recorded after the adjustment was performed at 50 K. One should note that CC curve #1 actually follows the temperature variation of the Fermi level in the source and drain depletion regions, while the CR curve is related to the Fermi level variation in the channel region. Both CC curves have slope changes around 80 K that correspond to defect E1, but curve #1 has the change associated with E3 at significantly lower temperatures. This may be related to the specific balance between carrier freezing and the shift of the Fermi level inside the source and drain depleted regions with temperature. Figure 68 compares several CR-DLTS spectra of devices with different aspect ratios. Also included is a CC-DLTS spectrum of three of the devices connected in parallel. The fluctuations in the magnitude are insignificant and can be attributed to variation of the defect concentration. This suggestion is supported by the appearance of a defect E4, which we have assigned to the Ps-Ci complex (Asom et al., 1987). Notice the complete independence of the magnitude of the DLTS signal from the size of the device. This is also true for the constant-capacitance spectrum, which is influenced by the trap E4 in the same way as the spectrum of J7, one of the devices connected for the CC-DLTS measurement. The largest available device in the set, J1, displays the strongest signal for defect E3, and in the same spectrum the magnitude of E2 is almost 60% of that in the other CR
[Plot: horizontal axis Temperature [K]]
FIGURE 68. Comparison of CR-DLTS spectra of transistors of different size and with a CC-DLTS spectrum. The small difference in the magnitude can be attributed to variations in the trap concentrations and to edge capacitance effects. CC-DLTS spectra were obtained from JFETs J3, J4, and J7 connected in parallel.
spectra. This can be regarded as evidence for the independence of the DLTS signal magnitude from the device size or the mode chosen for the experiment, CR or CC. The advanced digital signal processing implemented in our system (Kolev et al., 1998a) allows us to record 1.3-s-long transients with just 208 data points and with 10 µs resolution at the beginning of the transient. Figure 69 shows several transients recorded at temperatures where the time constant is roughly equal to 1/5 of the duration of the emission pulse. In this way, we could record most of the decaying curve. Next, we display the traces on a semilogarithmic scale, which allows for immediate checking for nonexponential behavior. As we see in Fig. 69, all the traces deviate from pure exponentials during the first 25% of the time. This is the reason for the magnitude variation in Figs. 52, 54, 63, and 65. Less deviation is present in the transient of E1, and this corresponds to reduced variation of the peak magnitude. Such nonexponential behavior is frequently attributed to field-enhanced emission, distortion caused by a large trap concentration, or the presence of multiple traps. However, more plausible is the cause suggested in Rockett and Peaker (1981): variation of the Debye tail with both temperature and bias. In our case this effect is enhanced by the overlapping of the Debye tails from both the top and bottom gates. Also, the bias is fixed at relatively low values near the pinch-off voltage. The difference in the magnitude between CR and CC
[Plot: horizontal axis Delay time [ms]]
FIGURE 69. CR and CC traces of JFETs J1 and J5 recorded at appropriate peak temperatures. The deviation from pure exponential decay is the reason for the magnitude variation in Figs. 52, 54, 63, and 65. CT-DLTS denotes a capacitance transient trace after appropriate scaling and shifting in order to be distinguished from the CR-DLTS trace. CR-DLTS J5-low was recorded with a small magnitude of the filling pulse referred to the baseline and scaled until the tail coincides with the capacitance transient trace of the same transistor.
mode of operation is attributed to the edge capacitance. We find evidence to support this suggestion in the differences between the two modes for J1 and J5: device J5, with the smaller periphery, has its CC trace closer to the CR trace than the much larger device J1. For sensitive capacitance-transient DLTS measurements, the capacitance meter should be operated in its most sensitive range. This requires the quiescent sample capacitance to be balanced by a reference capacitor during the thermal scan (Doolittle and Rohatgi, 1992). Because we did not have a computer-controlled reference capacitor (Doolittle and Rohatgi, 1992), we were unable to perform sensitive capacitance-transient DLTS measurements. However, our reference capacitor for CC-DLTS measurements could be used temporarily for balancing the sample capacitance at a fixed temperature. This allowed us to record the capacitance transient CT-DLTS J5 in Fig. 69. After appropriate scaling, it coincided completely with the other transients, and this proves that the nonexponential behavior is not related to a particular DLTS mode of operation: they produce identical results. In order to verify the reason for the nonexponential behavior, we also recorded a transient denoted as CR-DLTS J5-low (a.u.). It was recorded after a low-level filling pulse, just 180 mV above the baseline maintained by the feedback circuit. With this low filling pulse magnitude, only those traps localized near the edge of the depletion regions of the top and bottom gates (in the Debye tails) were populated. According to the bias dependence demonstrated in Rockett and Peaker (1981), this should increase the nonexponential behavior. Next, we adjusted the tail of the recorded transient by scaling it to coincide with the tail of the capacitance transient, the uppermost trace in Fig. 69.
The difference between the two curves during the first 200 ms gives strong support to the idea that the cause of the nonexponential behavior (as suggested in Rockett and Peaker (1981) and Zhao et al. (1987)) is charge capture in the Debye tail region. Figure 70 shows CR- and CC-DLTS spectra of the germanium JFET. The area of transistor 2N5459 was too small to perform the constant-capacitance measurement with the required sensitivity. Again, there is a remarkable overlapping of both spectra in Fig. 70, particularly below 120 K. The CR curve also demonstrates a much better SNR than the CC curve. For small devices, CR-DLTS is clearly superior to CC-DLTS. We have observed an interesting difference between CR- and CC-DLTS spectra at high temperatures when the gates of several monolithic JFETs were connected in parallel. Figures 71 and 72 show CR- and CC-DLTS spectra at high temperatures. Not only is the magnitude substantially different, but the sign of the signal is also reversed. As the devices were isolated on a common, floating substrate, the minority carriers generated at high temperatures could accumulate on the external gate, and this affects the measurement in a similar way
[Plot: Ge JFET #3, τ = 14.4 ms; legend CR-DLTS, CC-DLTS; horizontal axis Temperature [K]]
FIGURE 70. Comparison of CR-DLTS and CC-DLTS spectra of a germanium JFET. Note the increased SNR of the CR versus the CC spectrum.
[Plot: JFETs J3, J4, and J7; bottom: CR-DLTS J4; horizontal axis Temperature [K]]
FIGURE 71. Effect of minority carrier generation on CR- and CC-DLTS spectra of several devices with gates connected in parallel. The CC-DLTS magnitude difference depends on the size of the external device connected to the measured one.
[Plot: horizontal axis Temperature [K]]
FIGURE 72. Same experiment as in Fig. 71 but measuring only the largest available device from the set. Compare the magnitudes with those in Fig. 71.
to the situation in CC-DLTS of a MOS capacitor (Johnson et al., 1978). In this case, the ratio of the external gate area to that of the measured device was important. While this is obviously a system effect, it still clearly demonstrates the different physical nature of the CR- and CC-DLTS methods despite the apparent similarity of their principles of operation. When the external gate was disconnected, the strong CC-DLTS signal disappeared. Arrhenius plots of these peaks yielded almost the same activation energy as that of the E-center (in our notation E3), about 0.66 eV above the valence band.
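The semilogarithmic check for nonexponential behavior used in the discussion of Fig. 69 can be sketched in a few lines. This is an illustrative example on synthetic data, not the authors' procedure: a straight line is fitted to the logarithm of the tail of a baseline-subtracted transient, the fitted exponential is extrapolated back to short delay times, and the relative deviation of the data from it is reported. The function name `tail_fit_deviation`, the tail fraction, and the two-component test signal are all assumptions.

```python
import numpy as np


def tail_fit_deviation(t, signal, tail_fraction=0.5):
    """Fit a single exponential to the tail and compare it with the whole trace.

    t, signal     : delay times and baseline-subtracted, strictly positive transient
    tail_fraction : fraction of the time span (at the end) used for the fit
    Returns (relative deviation at every point, fitted time constant).
    """
    t = np.asarray(t, dtype=float)
    signal = np.asarray(signal, dtype=float)
    tail = t >= t[0] + (1.0 - tail_fraction) * (t[-1] - t[0])
    # On a semilog scale a pure exponential is a straight line.
    slope, intercept = np.polyfit(t[tail], np.log(signal[tail]), 1)
    model = np.exp(intercept + slope * t)
    return signal / model - 1.0, -1.0 / slope


# Synthetic 1.3 s transient: a slow component with a time constant of 1/5 of
# the record length plus a faster early component (illustrative values only).
t = np.linspace(0.0, 1.3, 1301)
trace = np.exp(-t / 0.26) + 0.3 * np.exp(-t / 0.02)

deviation, tau = tail_fit_deviation(t, trace)
print(f"tail time constant: {tau:.3f} s")
print(f"deviation at t = 0: {100 * deviation[0]:.0f} %")
```

A deviation that is confined to the early part of the record and vanishes in the tail is the signature discussed above; with noisy data the tail would need averaging first, since the logarithm is undefined for non-positive samples.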
D. Conclusions
We have applied the new constant-resistance DLTS technique to both custom-made and commercially available silicon and germanium JFETs. We have demonstrated that CR-DLTS is a reliable and very sensitive tool for investigation of electrically active point defects located in the channel of the JFET. The trap concentration was calculated without knowledge of the transconductance g_m or the mobility µ of the test device. The new technique was compared in detail to constant-capacitance DLTS. The CR-DLTS signal magnitude was demonstrated to be independent of the device size, and it was shown that, for small devices, CR-DLTS is more sensitive than CC-DLTS. Comparisons have also been made with the standard capacitance-transient DLTS. The observed nonexponential behavior was attributed to the complex generation-recombination processes at the edge of the depletion regions in the Debye tail. In addition, the possibility of defect profiling using CR-DLTS was illustrated.
VIII. CONCLUSIONS AND AREAS FOR FUTURE RESEARCH
A. Conclusions
In this chapter we introduced a new approach to signal processing in digital DLTS systems. The problems of signal recovery from noise and efficient data storage are addressed separately from the transient signal analysis. As a result of this approach, an improved digital averaging scheme for DLTS signal recovery from noise and for transient data storage has been proposed. We have shown that the combined action of two complementary digital averaging techniques can improve DLTS digital signal processing. Pseudo-logarithmic time averaging is efficient in reducing the number of processed data points and in improving the SNR for the high-frequency noise components and long delay times. We demonstrated that the normalized errors in the magnitude measurements introduced by this type of averaging remain below 1% for pure logarithmic averaging, and below 0.1% when the logarithmic averaging intervals are further divided into five or more equal parts. Continuous time averaging is well suited for improving the overall SNR, for continuous display and processing of data, and for more efficient use of the computer resources. The described combination of hardware and software tools for the implementation of these techniques continuously supplies fresh data and does not require any synchronization with the main computer program. Compared to other digital DLTS systems, this new approach offers improved short-delay-time resolution, improved SNR, and more efficient data storage. At the same time it offers real-time observation of essentially noise-free transients and, like analog systems, a real-time display of the DLTS scan. The real-time display means that the displayed transient and DLTS scan can be updated on the computer screen after each pulse, even for a very large number of averaged pulses, and the result can be predicted well before the acquisition process reaches this number.
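The two averaging steps summarized above can be illustrated with a short sketch. It is a simplified reading of the idea, not the implementation described earlier in the chapter: the bin layout (bin width doubling after every `n_sub` bins, so that each roughly logarithmic interval is divided into `n_sub` equal parts), the running-mean update, and the names `pseudo_log_average` and `update_running_mean` are all assumptions made for illustration.

```python
import numpy as np


def pseudo_log_average(samples, dt, n_sub=5):
    """Reduce uniformly sampled data to bins whose width doubles every n_sub bins.

    samples : transient sampled at a fixed interval dt
    n_sub   : number of equal parts per (roughly logarithmic) interval
    Returns (bin-center times, bin averages).
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    times, values = [], []
    start, width = 0, 1
    while start < n:
        for _ in range(n_sub):
            stop = min(start + width, n)
            if start >= stop:
                break
            values.append(samples[start:stop].mean())      # average inside the bin
            times.append(dt * (start + stop - 1) / 2.0)    # mean sample time of the bin
            start = stop
        width *= 2                                          # next interval: doubled bin width
    return np.array(times), np.array(values)


def update_running_mean(mean, new_transient, k):
    """Mean of k transients, given the mean of the first k - 1 and the k-th one."""
    return mean + (np.asarray(new_transient, dtype=float) - mean) / k


# Usage on synthetic noisy pulses (all numbers are illustrative).
rng = np.random.default_rng(0)
t_raw = np.arange(130_000) * 1e-5                 # 1.3 s sampled every 10 microseconds
clean = np.exp(-t_raw / 0.26)
mean = np.zeros_like(clean)
for k in range(1, 4):                             # the mean is usable after every pulse
    mean = update_running_mean(mean, clean + rng.normal(0.0, 0.05, clean.size), k)
t_red, v_red = pseudo_log_average(mean, 1e-5)
print(len(t_raw), "->", len(t_red), "points")
```

The early bins keep the full time resolution where the transient changes quickly, while the wide late bins average many samples and therefore suppress high-frequency noise at long delay times; the running mean makes an updated transient available after every pulse without storing the individual records.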
The proposed techniques allow one to combine the powerful transient analysis of the digital DLTS methods with the sensitivity and convenience of analog methods. The described averaging and data reduction techniques are intended primarily for DLTS data processing, but the same principles can be useful for many other physical experiments involving transient data analysis. We also described setup configurations for CC- and CR-DLTS. Details of the novel feedback circuit, which solves the most critical technical problems in the implementation of CC-DLTS, were also given. After slight modification, the same circuit makes CR-DLTS possible. We also discussed the most important technical parameters this feedback circuit should have in order to allow for fast and sensitive DLTS measurements. When these requirements are met, the speed and sensitivity of CC-DLTS are almost the same as in the
conventional constant-voltage capacitance-transient DLTS. Also, guidelines for using the feedback circuit were presented in order to provide practical guidance for running CC- and CR-DLTS experiments. The application of the feedback circuit was demonstrated with recorded traces and experimental results of interface trap density measurements. We presented a new variation of the DLTS technique that is convenient for measurement of submicron field-effect transistors, where standard capacitance DLTS cannot be used. Constant-resistance DLTS is similar to conductance DLTS, but it is more sensitive and does not require simultaneous measurement of the transconductance or surface mobility for calculation of the trap concentrations. In addition, the DLTS signal is largely independent of transistor size, thus allowing measurements of very small transistors. The proposed technique is not restricted to metal-oxide-semiconductor field-effect transistors (MOSFETs), but can also be used to study other field-effect transistors. In another application, CR-DLTS was demonstrated with measurements of radiation-induced traps in buried-channel MOSFETs, which are used as CCD output amplifiers. The unique structure of these devices offers extended opportunities for studying the spatial distribution of the radiation-induced defects. Still, most of the results are also valid for ordinary depletion-mode MOSFETs. In addition, we showed a variation of the CR-DLTS technique using back-gate driving, which is applicable to studying the channel-substrate p-n junction, and the results were compared with those obtained from constant-capacitance (CC-)DLTS measurements. Complementary measurements using front-gate and back-gate operation of CR-DLTS can help to resolve the ambiguities usually associated with DLTS measurements of symmetrical p-n diodes. CR-DLTS has been successfully applied to study virgin and radiation-damaged junction field-effect transistors (JFETs).
We have described results from three groups of devices: commercially available discrete silicon JFETs; custom-made monolithic silicon JFETs, both virgin and exposed to high-level neutron radiation; and commercially available discrete germanium p-channel JFETs. CR-DLTS was found to be a simple, very sensitive, and area-independent technique that is well suited for measurement of a wide range of deep-level concentrations. Comparisons have been made with CC-DLTS and standard capacitance DLTS. In addition, possibilities for defect profiling in the channel have been demonstrated.
B. Areas for Future Research
1. Development of the System
Hardware Development. There are several areas for improvement. First, it is useful to replace the low-noise amplifier with a programmable-gain amplifier.
This would allow the program to select a convenient gain according to the signal and to avoid overload and loss of data. Another, simpler improvement is to connect a second reference resistor and to alternate between the two resistors. Thus, when measuring transistors with high conductance during the filling pulse, it would be possible to avoid overloading the capacitance meter. As a result, the recovery time would be decreased severalfold and the fast portion of the transient would not be distorted. Another important improvement would be the possibility of controlling the values of the reference resistor and capacitor from the computer. This would offer opportunities for automated spatial profiling of the defects. It would also be convenient to build a small, automated relay matrix in order to change the configuration from CC-DLTS to CR-DLTS automatically or to change the measured device. A fixture to apply short filling pulses would be useful for implementation of alternative techniques for determination of the capture cross section, as explained in Sect. II.
Software Development. Hardware development can be efficient only when it is complemented by corresponding software development. This is especially true when the goal is to enable gain adjustments during thermal scans, or setup, bias, and test device changes between the scans. This includes routines for estimating the prevailing magnitude and for gain adjustments based on these estimates. These changes would require a new format of recorded data files, which would allow for automated recognition of gain changes. Major improvements can be made in the transient data analysis. With superior raw data, all known digital techniques for transient analysis should work better. Many of them could be implemented to operate in real time and to display the trap parameters during the measurement. The references in Sect. III provide a good starting point.
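A gain-adjustment routine of the kind suggested above might look like the following sketch. It is purely hypothetical: the function name `select_gain`, the list of available gains, the full-scale value, and the headroom factor are illustrative assumptions and do not describe any existing part of the system.

```python
def select_gain(peak_estimate, gains, full_scale, headroom=0.8):
    """Pick the largest gain that keeps the amplified peak within the input range.

    peak_estimate : estimated prevailing signal magnitude before amplification
    gains         : gains offered by the (hypothetical) programmable amplifier
    full_scale    : full-scale input of the digitizer, same units as the signal
    headroom      : fraction of full scale the amplified peak may occupy
    Falls back to the smallest gain if even that one would overload.
    """
    limit = headroom * full_scale
    usable = [g for g in sorted(gains) if abs(peak_estimate) * g <= limit]
    return usable[-1] if usable else min(gains)


# Each stored point carries the gain in force, so that later analysis can
# recognize gain changes automatically (record layout is an assumption).
record = []
for temperature, peak in [(100.0, 0.012), (150.0, 0.3), (200.0, 10.0)]:
    gain = select_gain(peak, gains=[1, 10, 100, 1000], full_scale=5.0)
    record.append({"T": temperature, "gain": gain, "value": peak * gain})
print([r["gain"] for r in record])
```

Storing the gain alongside every data point is one simple way to meet the requirement that recorded files allow automated recognition of gain changes.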
2. Applications
The new CR-DLTS method can be tested with new types of FETs such as silicon-on-insulator (SOI), silicon-on-sapphire (SOS), III-V MESFETs, or high-electron-mobility transistors (HEMTs). Hot-carrier degradation in submicron MOSFETs can also be studied. By independent variation of the source-substrate and drain-substrate biases, the spatial distribution of the interface and bulk traps can be investigated. Unlike the capacitance measurements, CR-DLTS uses the channel conductance as a probe. It is also very tolerant of changes in the gain of the test structure and allows measurement of very small devices. Therefore, it makes it possible to route the CR-DLTS test signal to a single device in a large set of devices connected together, provided that the channel conductance of each device can be tested independently. Combined with other functional tests, this may allow us to identify the type of defect causing malfunction of a particular device out of many similar devices.
ACKNOWLEDGMENTS
P. V. Kolev expresses his deep gratitude to the following persons: B. Z. Antov, A. Y. Mladenov, and P. K. Vitanov (Institute of Microelectronics, Sofia, Bulgaria) and R. Attanasov (Faculty of Physics, University of Sofia, Bulgaria), who inspired, guided, and supported his interest in the DLTS techniques during 1983-1990. The support received in 1992-1993 from P. Clauws and F. Cardon (Dept. of Solid-State Physics and Crystallography, Royal University of Gent, Belgium) is also greatly appreciated. M. J. Deen is deeply indebted to the students and researchers in his research group for their valuable contributions, comments, and suggestions in his research work on characterization and parameter extraction of semiconductor materials and devices. Both authors are pleased to acknowledge the fruitful collaboration of M. Citterio and J. Kierstead (Brookhaven National Laboratory, USA), T. Hardy and R. Murowinski (National Research Council, Victoria B.C.), and N. Alberding (Dept. of Physics, Simon Fraser University). We also express our gratitude for the comments and suggestions of E. Haller (University of California at Berkeley, USA), which improved the final version of the manuscript. We also thank the members of our Integrated Devices and Circuits Research Group and the staff members of the School of Engineering Science at SFU; B. Woods and C. Cheng are thanked for their support, comments, and assistance during the course of this research. This work was supported by: Crystar Inc.; the Science Council of British Columbia; Micronet; the Federal Center of Excellence in Microelectronics; the Natural Sciences and Engineering Research Council (NSERC) of Canada; Canadian Microelectronics Corporation; and Simon Fraser University.
REFERENCES
Abele, J. C., Kremer, R. E., and Blakemore, J. S. (1987). Transient photoconductivity measurements in semi-insulating GaAs. II. A digital approach, Jour. Appl. Phys., 62: 2432.
Akita, C., Fujimoto, M., and Ito, K. (1993). Isothermal capacitance transient spectroscopy of grain-boundary interfacial states in Bi-doped SrTiO3 ceramics, Jour. Appl. Phys., 74: 2669.
Anand, S., Subramanian, S., and Arora, B. M. (1992). Use of low-frequency capacitance in deep-level transient spectroscopy measurements to reduce series resistance effects, Jour. Appl. Phys., 72: 3535.
Asada, K. and Sugano, T. (1982). Simple microcomputer-based apparatus for combined DLTS-C-V measurement, Rev. Sci. Instrum., 53: 1001.
Asom, M. T., Benton, J. L., Sauer, R., and Kimerling, L. C. (1987). Interstitial defect reactions in silicon, Appl. Phys. Lett., 51: 256.
Atanassov, R. D. (1983). Spectrum analyzer of exponentially decaying transients, Appl. Phys. Lett., 43: 1361.
DEVELOPMENT AND APPLICATIONS OF A NEW DLTS METHOD
... 10, higher derivative terms in $g(x)$ become nonnegligible. Let us note that $h(x)$ is further endowed with interesting vanishing moments:
$$\int dx\, h(x) = 1 \quad \text{and} \quad \int dx\, h(x)\,\left(x - \tfrac{1}{2}\right)^m = 0 \quad \text{for } m = 1, 2 \text{ and } 3. \tag{53}$$
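A short worked consequence of these vanishing moments, sketched under the assumption that $f$ is smooth enough to be Taylor-expanded about the centre $x_0 = \tfrac{1}{2}$ read from Eq. (53). The symbol $\mu_4$ is introduced here for illustration only and is not part of the original notation:

```latex
\int dx\, h(x)\, f(x)
  = \sum_{m \ge 0} \frac{f^{(m)}(x_0)}{m!} \int dx\, h(x)\,(x - x_0)^m
  = f(x_0) + \frac{\mu_4}{4!}\, f^{(4)}(x_0) + \cdots,
\qquad
\mu_4 = \int dx\, h(x)\,(x - x_0)^4 .
```

In words, sampling a smooth field through $h$ reproduces its value at $x_0$ up to a fourth-order correction, which is one way to read the usefulness of the moments in Eq. (53).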
In the early days of the Daubechies wavelets, the introduction of vanishing moments for the scaling function led to the construction of the well-known coiflets. The parameter $\alpha$ in Eq. (52) can be directly computed from the filter coefficients $a_n$ by using the first nonvanishing moment of $\varphi(x)$: writing (54) we have
J.-M. LINA, P. TURCOTTE and B. GOULARD
FIGURE 12. Left: $s(\omega)$ for $J$ = 2, 4, 6 and 8, plotted against $\log(\omega)$; log-log scale. The slope of the straight lines is $J + 1$.
Straightforward integrations by parts lead to
For $J = 2$ and 4, $\alpha$ is respectively equal to $-0.164$ and $-0.089$. We observe that Figs. 7 and 10 are consistent with Eq. (52). Another interesting relationship can be stated between $\psi$ and $\varphi$. Writing $\psi(x) = w(x) + i\,v(x)$, we look at the relationship between the real functions $w$ and $h$ (the real part of the scaling function $\varphi$). As seen in Fig. 12, it is found numerically that, at least for $J$ up to 8, the ratio
is real and behaves as $\omega^{J+1}$. Such a relationship certainly does not occur in the real Daubechies cases. However, it is worth recalling that continuous wavelets are usually generated by taking successive derivatives of some scaling function such as the Gaussian. The famous “Mexican hat” wavelet is the second derivative of the Gaussian function. Here, we obtain compactly supported orthogonal complex wavelets whose real part is close to being the derivative of a smooth function, the real part of the corresponding complex scaling function.
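The remark about the Mexican hat can be checked numerically. The sketch below is illustrative only: the function names, the unit-width Gaussian, and the sign convention (the hat is taken as the negative second derivative, up to normalization) are assumptions made here, not the chapter's definitions.

```python
import numpy as np


def gaussian(x, sigma=1.0):
    """Unnormalized Gaussian exp(-x^2 / (2 sigma^2))."""
    return np.exp(-x**2 / (2.0 * sigma**2))


def mexican_hat(x, sigma=1.0):
    """Closed form of minus the second derivative of gaussian(x, sigma)."""
    return (1.0 - x**2 / sigma**2) * gaussian(x, sigma) / sigma**2


# Compare the closed form with a centered finite-difference second derivative.
x = np.linspace(-5.0, 5.0, 2001)
dx = x[1] - x[0]
g = gaussian(x)
second_diff = (g[2:] - 2.0 * g[1:-1] + g[:-2]) / dx**2
max_error = np.max(np.abs(-second_diff - mexican_hat(x[1:-1])))
# The hat integrates to (almost) zero, as expected of a wavelet.
integral = np.sum(mexican_hat(x)) * dx
```

Both `max_error` and `integral` should be small, limited by the grid spacing and by the truncation of the interval to [-5, 5].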
VII. THE MALLAT ALGORITHM WITH COMPLEX FILTERS
The discrete multiresolution analysis of f consists of the computation of the coefficients of the expansion
COMPLEX DYADIC MULTIRESOLUTION ANALYSES
where $j_0$ is a given scale (low resolution). In practice, the sum over $j$ (the details at finest scales) is finite and $f$ is projected onto some approximation space $V_{j_{\max}}$:
$$P_{V_{j_{\max}}} f(x) = \sum_k c^{j_{\max}}_k \, \varphi_{j_{\max},k}(x). \tag{59}$$
The coefficients in the expansion equations (46) and (47) are computed through the orthogonal projection of the field over the multiresolution basis:
Starting with $P_{V_{j_{\max}}} f$, the wavelet coefficients are computed with the fast wavelet decomposition algorithm $W$ composed with the low-pass projection $V_j \to V_{j-1}$ and the high-pass projection $V_j \to W_{j-1}$:
Conversely, any elements of $V_{j-1}$ and of $W_{j-1}$ can combine to give a unique vector in $V_j$; this reconstruction (denoted by $W^{-1}$) is expressed by the inverse fast wavelet transform:
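The decomposition and reconstruction steps can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the chapter: periodic boundary handling, filters normalized so that the squared moduli sum to 1, the mirror-filter convention used to build the high-pass filter, and the six-tap complex filter itself, which is quoted from memory as a symmetric complex Daubechies-type filter and is used here only because it is orthonormal. All function names are hypothetical.

```python
import numpy as np

# Six-tap symmetric complex low-pass filter (assumption, quoted from memory).
# Scaled so that sum(|a_k|^2) = 1; the chapter's normalization may differ.
S15 = np.sqrt(15.0)
A_LOW = np.sqrt(2.0) * np.array(
    [-3 - 1j * S15, 5 - 1j * S15, 30 + 2j * S15,
     30 + 2j * S15, 5 - 1j * S15, -3 - 1j * S15]) / 64.0


def highpass_from_lowpass(a):
    """Mirror filter b_k = (-1)^k conj(a_{L-1-k}) (one common convention)."""
    a = np.asarray(a, dtype=complex)
    signs = (-1.0) ** np.arange(len(a))
    return signs * np.conj(a[::-1])


def analysis_step(c, a, b):
    """One level V_j -> (V_{j-1}, W_{j-1}) with periodic boundary handling."""
    c = np.asarray(c, dtype=complex)
    n_out = len(c) // 2
    low = np.zeros(n_out, dtype=complex)
    high = np.zeros(n_out, dtype=complex)
    for n in range(n_out):
        for m in range(len(a)):
            k = (2 * n + m) % len(c)
            low[n] += np.conj(a[m]) * c[k]    # low-pass projection
            high[n] += np.conj(b[m]) * c[k]   # high-pass projection
    return low, high


def synthesis_step(low, high, a, b):
    """Inverse step (V_{j-1}, W_{j-1}) -> V_j, the adjoint of analysis_step."""
    n_full = 2 * len(low)
    c = np.zeros(n_full, dtype=complex)
    for n in range(len(low)):
        for m in range(len(a)):
            k = (2 * n + m) % n_full
            c[k] += a[m] * low[n] + b[m] * high[n]
    return c
```

Applied to a real-valued input of even length, `analysis_step` returns complex coefficients, and `synthesis_step` recovers the input because the periodized filter bank is unitary.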
In most applications, the signal to be analyzed is real-valued: the complex wavelet representation provides a redundant description of the signal. Equation (52) helps in interpreting this redundancy because, using the Taylor expansion of a one-dimensional field, we can estimate the real and imaginary parts of the coefficients $c^j_k$ as
FIGURE 13. Projection onto $V_{j_{\max}}$: real part (left) and imaginary part (right) of $P_{V_{j_{\max}}} I$. (The original image is the right image displayed in Fig. 10.)
Let us consider the estimate of the finest scale approximation of $f$, that is, $P_{V_{j_{\max}}}(f)$, given a sampled function $f_k$. A crude approximation is simply given by $c^{j_{\max}}_k = 2^{-j_{\max}/2} f_k$. Denoting by $\mathfrak{R}_j \subset V_j$ the set of all functions in $V_j$ with real-valued modes, this approximation is nothing but the orthogonal projection (denoted by $P_{\mathfrak{R}_{j_{\max}}}$) of $f$ onto $\mathfrak{R}_{j_{\max}}$. This corresponds to “Mallat’s initial conditions” for the fast wavelet transform. A more accurate estimate of $P_{V_{j_{\max}}}(f)$ is obtained by using the operator
This projection gives a nontrivial imaginary part for the $c^{j_{\max}}_k$. As expected, it corresponds to the Laplacian of the estimated real part. This is illustrated nicely in the 2-D example displayed in Fig. 13.
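The two initializations can be sketched for a 1-D sampled signal. The sketch assumes that Eq. (52) states that the imaginary part of the scaling function is approximately $\alpha$ times the second derivative of its real part, and it replaces that derivative by a periodic second difference. The function names are hypothetical, and the sign of the imaginary part depends on the conjugation convention of the inner product, which is not fixed here.

```python
import numpy as np


def crude_coefficients(f, j):
    """Crude initialization: c_k = 2^(-j/2) f_k (real-valued modes only)."""
    return 2.0 ** (-j / 2.0) * np.asarray(f, dtype=complex)


def refined_coefficients(f, j, alpha):
    """Adds an imaginary part proportional to the discrete Laplacian of f.

    Assumes imaginary part ~ alpha * (second derivative of the real part);
    the second derivative is approximated by a periodic second difference.
    """
    f = np.asarray(f, dtype=float)
    laplacian = np.roll(f, -1) - 2.0 * f + np.roll(f, 1)
    return 2.0 ** (-j / 2.0) * (f + 1j * alpha * laplacian)
```

With `alpha = -0.164` (the value quoted for $J = 2$), a quadratic ramp `f_k = k**2` gives a constant imaginary part away from the periodic wrap-around, while a linear ramp gives none.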
VIII. RESTORATION FROM THE PHASE
The issue discussed here is meant to facilitate understanding of the redundancy of the complex wavelet representation of a real signal. In other words, we want to understand the “role of the phase of a complex wavelet coefficient $d_{j,k}$.” For simplicity of notation, the discussion is done for 1-D signals but the simulation is presented in 2-D. Let us first define two projectors, $P_{\mathfrak{R}}$ and $P_\Gamma$. The projector $P_{\mathfrak{R}}$ extracts the first-order approximation of the scaling coefficient of the expansion equation (59) at the finest resolution ($\mathfrak{R}$ denotes the real part):
Let us now consider the wavelet expansion equation (58) of a given field $f_0$ and define the phase of the wavelet coefficients $\theta_{j,k} = \mathrm{Arg}(d_{j,k})$. We observe that the new set of functions $\Psi_{j,k}(x) = e^{i\theta_{j,k}} \psi_{j,k}(x)$ is also an orthonormal basis of $L^2(\mathbb{R})$. This “local rotation” of the wavelet basis leads to a multiwavelet basis adapted to the signal. Indeed, we define the isophase space $\Gamma$ by the set of all expansions
where the coefficients $r_{j,k}$ are now positive real numbers. The projector $P_\Gamma$ is the orthogonal projector on this space that depends on the phase of the wavelet coefficients of the original field with which we start. Given an arbitrary wavelet expansion of the form (2) with $d_{j,k} = w_{j,k} + i\,v_{j,k}$, the projection on the isophase space is defined by the closest point on $\Gamma$, that is,
with
We further observe that both $P_{\mathfrak{R}}$ and $P_\Gamma$ project onto convex sets (POCS). Considering an arbitrary point $\tilde{f}_0$ in $\Gamma$, a well-known theorem states that the sequence of alternate projections shown in Fig. 14, that is,
converges and, in the present case, the limit point is the original real signal $f_0$ from which we defined $\Gamma$. The 2-D generalization of this algorithm is straightforward using the usual cross-product of the 1-D multiresolution basis.
FIGURE 14. Phase reconstruction by alternating projections on affine spaces $\Gamma_{j_{\max}}$ and $\mathfrak{R}_{j_{\max}}$.
For the sake of illustration of the “phase reconstruction algorithm,” Fig. 15 displays the original picture $f_0$, the initial point $P_{\mathfrak{R}} f_0$ (obtained by killing all the modulus of the wavelet coefficients of the four-level decomposition, that is, $j_0 = j_{\max} - 4$, with SDW $J = 2$) and the POCS reconstructions $f_{n=100}$ and $f_{n=1000}$. We first notice that the POCS gradually restores the details of the image from coarse to fine. As we notice in Eq. (68), the projector $P_\Gamma$ “shrinks” the modulus of the wavelet coefficients, even to 0. It is worth recalling that shrinkage techniques are nowadays an efficient tool for denoising. Phases thus encode the "coherent" structures of the signal and the POCS algorithm reconstructs the original image through the coherency of the encoded information. The restoration of the modulus of the wavelet coefficients is illustrated in Fig. 17: coefficients of the coarser level $j = j_0 = j_{\max} - 4$ and those of the finest scale $j = j_{\max} - 1$, both for $f_{n=1000}$. We can observe the resulting shrinkage of the wavelet coefficients that depends on the scale of the details. Let us further mention the significant speed-up of the POCS algorithm by using a relaxation parameter in the isophase projector. This is done by redefining $P_\Gamma$ as follows
In place of Eq. (69), we now consider the new sequence of projections (see Fig. 16),
$$f_n = \left(P_{\mathfrak{R}}\, P_\Gamma(\lambda_n)\right)^n P_{\mathfrak{R}}(f_0) \tag{71}$$
where $\lambda_n$ is computed in order to minimize the quadratic error $\|f_n - f\|^2$. The iterative algorithm is obtained in the form
FIGURE 16. Phase reconstruction by alternating projections with relaxation parameter $\lambda$.
FIGURE 17. (a) Coarse scale wavelet modulus of $f_{1000}$ vs original wavelet modulus; (b) finest scale wavelet modulus of $f_{1000}$ vs original wavelet modulus.
with $\delta_n = d_n / r_n$, where $d_n$ is the distance between the two convex sets (i.e., the energy of the imaginary part killed in the projection $P_{\mathfrak{R}}$),
In Fig. 17 we have displayed the amplitude of the reconstructed wavelet coefficient versus the original true value. In this example, all the modulus of the wavelet coefficients have been set to zero at the beginning of the POCS algorithm.
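The ingredients of this section can be illustrated on a toy problem. The sketch below is not the chapter's algorithm: it works on a single complex coefficient, uses the affine set of numbers with a prescribed real part as a stand-in for the second convex set, and uses the standard relaxed projector $(1 - \lambda) I + \lambda P$ as an assumed form of the relaxed isophase projector. All names are hypothetical.

```python
import numpy as np


def project_isophase(d, theta):
    """Closest point to each d_k on the ray {r * exp(i theta_k), r >= 0}.

    The modulus is shrunk, possibly to 0, and the phase is forced to theta_k.
    """
    d = np.asarray(d, dtype=complex)
    r = np.maximum(0.0, np.real(d * np.exp(-1j * theta)))
    return r * np.exp(1j * theta)


def project_affine(d, target_real):
    """Toy convex set: complex numbers whose real part equals target_real."""
    d = np.asarray(d, dtype=complex)
    return target_real + 1j * np.imag(d)


def relaxed(projector, lam):
    """Relaxed projector d + lam * (P(d) - d); lam = 1 is the plain projection."""
    def apply(d, *args):
        return d + lam * (projector(d, *args) - d)
    return apply


def alternate(theta, target_real, n_iter, lam=1.0):
    """Alternating projections between the ray set and the toy affine set."""
    p_gamma = relaxed(project_isophase, lam)
    d = project_affine(np.zeros_like(theta, dtype=complex), target_real)
    for _ in range(n_iter):
        d = project_affine(p_gamma(d, theta), target_real)
    return d
```

For a ray at 45 degrees and a prescribed real part of 1, the two sets meet at $1 + i$. Calling `alternate(np.array([np.pi / 4]), np.array([1.0]), 10)` approaches that point, and passing `lam=1.5` gets closer in the same number of iterations, which mirrors the speed-up attributed to the relaxation parameter.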
IX. IMAGE ENHANCEMENT
The bidimensional multiresolution analysis is built from the product of two multiresolution spaces $V$. The scaling function $\Phi(x, y) = \varphi(x)\varphi(y)$ generates $V_0$ and, complemented with three wavelets $\Psi^0(x, y) = \psi(x)\psi(y)$, $\Psi^1(x, y) = \psi(x)\varphi(y)$, $\Psi^2(x, y) = \varphi(x)\psi(y)$, it spans $V_1$. The functions are complex-valued. In particular, we have
and Eq. (58) or Eq. (59) now generalizes in two dimensions as
span the spaces $V_j$ and $W_j$, respectively, so that
In the sequel, we denote by $W_N$ the $N$-level wavelet transform
where $d = (d^0, d^1, d^2)$. Figure 13 shows an example of the projection $P_{V_{j_{\max}}} I$ and Fig. 18 displays the modulus of the complex wavelet coefficients (and the $c^{j_0}_{m,n}$'s in the upper left corner). The real and imaginary parts of the scaling function are
where G(x, y ) denotes the real smoothing kernel h ( x ) h ( y ) .On one hand, the real part of the 2-D scaling function is close (because a2 < < 1) to the smoothing kernel G(x, y ) while, on the other hand, the imaginary part is proportional
COMPLEX DYADIC MULTIRESOLUTION ANALYSES
I85
I’ICiLJKE 18. Modulus of the coniplex wavclet coefficients (SDW4, N = 3 ) . The top leli sector is the matrix of coefficients ri:,,. The other sparse matrices are the wavelets coefticicnts when the scale j increases from .jo to ,j,,,L,x = j~ -1- 3 (3 directional wavelets).
to the Laplacian of $G(x, y)$: $\operatorname{Im}\Phi(x, y)$ is thus the "Marr wavelet" associated with $\operatorname{Re}\Phi(x, y) \simeq G(x, y)$.
As the real and imaginary parts of the wavelet transform coefficients of a real image correspond to the convolution of the original field with the real part and the imaginary part of $\Phi_{j,k_1,k_2}(x, y)$, respectively, we have access to the (multiscaled) smoothed Laplacian of the image:
The simultaneous presence of a smoothing kernel and its Laplacian in the complex scaling function can be exploited to define some elementary operations on the wavelet coefficients. In other words, we use this information to synthesize a new image that corresponds to some prescribed operation. A typical example is de-noising, which is among the most successful applications of wavelets. Here, we investigate the edge enhancement that is usually implemented through the sharpening operator
Starting with an expansion of the form given by Eq. (76), we synthesize a new image
defined by
where the scale $l$ runs from $j_{\max}$ to the coarse scale $j_0$. The $\lambda_l$ are real-valued and the $\delta_l(\cdot)$ are some functions of the complex wavelet coefficients. Our intention is to extract various kinds of details at different dyadic scales of the image and to add them to the original with appropriate weights. Some choices of the $\lambda_l$'s and $\delta_l$'s are particularly worthy of mention. For example, let us consider:
We observe that $\lambda_l = 0$ leads to the identity, that is, the synthesized image coincides with the original $I$. Introducing nonvanishing (but small) $\lambda_l$ amounts to adding scale-dependent details that are similar to the multiscale Laplacian. Figure 19 shows an example of such processing with SDW4 and $N = 3$. In comparison with the original image, the local contrast has been improved significantly: artefacts from the scanner acquisition are now apparent in the processed lady image. This "anomaly detection" is one of the most promising applications of multiscale representations. Let us mention that other efficient multiscale sharpening transformations have been proposed in the recent past. The main difference in the current work is the orthogonality property of the SDW transform and the use of the phase of a complex basis. We recall that the SDW bases are not derived from a representation that allows specific enhancements. On the contrary, the Laplacian has been shown to be inherent to this particular orthogonal basis.
FIGURE 19. Image enhancement: the test image at left. The right image has been synthesized using nonvanishing $\lambda$'s and $r$ in Eqs. (80) and (82).
X. COMPLEX SHRINKAGE
Let us now consider an image corrupted with additive Gaussian noise (denoted by $N$) projected into the approximation space of highest resolution
We want to estimate a real signal
from the observed image $\tilde I = I_0 + N$. The wavelet shrinkage technique amounts to computing estimates from the wavelet representation of the observed signal. Let
We have to solve the following variational problem: Given a positive Lagrange parameter $\lambda$, find an image $I^*$ that minimizes the functional
$L(I) \equiv E(\tilde I, I) + \lambda S(I)$.    (91)
Here $E(\tilde I, I)$ is the root-mean-square (rms) error between the observed and test images:
whereas $S(I)$ represents some constraint on the regularity of the optimal solution; in fact, it regularizes the ill-posed problem of minimizing $E$ alone. The choice of this constraint involves some a priori knowledge about the true signal we aim to restore. The parameter $\lambda$ controls the trade-off between goodness of fit and a priori smoothness. This latter property, given by $S(I)$, can be quantified by using a norm in some Besov space (see De Vore and Lucier, 1992). An important result in wavelet theory is the definition of such a norm in the wavelet representation. Considering the Besov space $B^{\alpha}_{q}(L^{p})$ (i.e., the space of functions "with $\alpha$ derivatives in $L^p$"), it can be shown that
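$S(I) = \left( \sum_{j=j_0}^{j_{\max}-1} 2^{j s' q} \Big( \sum_{l=0}^{2} \sum_{k_1,k_2} |d^{l,j}_{k_1,k_2}|^{p} \Big)^{q/p} \right)^{1/q}, \qquad s' = \alpha + 1 - \frac{2}{p}$    (93)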
defines a norm in this space. For the sake of simplicity, we consider here $p = q = 1$ and $\alpha = 1$; then
and the functional defined in Eq. (91) can be read as
FIGURE 20. Solution of Argmin $F(d, \tilde d)$.
Equation (94) illustrates the efficiency of the wavelet representation for solving the preceding optimization problem because it "diagonalizes" the functional over each coefficient of the decomposition. As illustrated in Fig. 20, the solution $d^*$ of our variational problem
$d^* = \operatorname{Argmin}_{d} F(d, \tilde d)$    (97)
is clearly a point lying on the ray defined by the phase of $\tilde d$. In other words, the phase is preserved and plays a role in this problem rather like a parameter. Let us rewrite $F(d, \tilde d)$ in terms of the amplitudes,
The solution of Argmin $F(r, \tilde r)$ is obviously given by
$r^* = \left(\tilde r - \tfrac{\lambda}{2}\right)_+$    (99)
where $(x)_+$ is equal to $x$ for $x \ge 0$ and 0 elsewhere (see dashed line in Fig. 21). We thus obtain a soft shrinkage of the amplitude of the wavelet coefficients with a threshold defined by $\lambda$. Denoting by $\mathcal{T}$ this shrinkage operator of the wavelet coefficients, the resulting estimate is given by
FIGURE 21. Shrinkage function $s(x)$.
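The amplitude-only shrinkage described here can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `shrink_complex` and the explicit `threshold` argument are assumptions, and the threshold is left as a free parameter rather than tied to a particular value of the Lagrange parameter.

```python
import cmath

def shrink_complex(d, threshold):
    """Soft-shrink the modulus of a complex coefficient; the phase is kept."""
    r = abs(d)
    if r <= threshold:
        return 0j  # (r - threshold)_+ vanishes
    # Reduce the modulus by the threshold and keep the unit phasor d / r.
    return (r - threshold) * (d / r)

# Hypothetical coefficients, chosen only for illustration.
coeffs = [3 + 4j, 0.1 - 0.2j, -2j]
shrunk = [shrink_complex(d, 1.0) for d in coeffs]
phase_before = cmath.phase(coeffs[0])
phase_after = cmath.phase(shrunk[0])
```

Coefficients whose modulus falls below the threshold are set to zero, and the surviving ones keep their original phase, which is the property the text relies on.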
Let us emphasize that this approximation is obtained by modifying only the amplitude of the wavelet coefficients: the phases are preserved. This fact will be explicitly used in the Bayesian approach considered in the sequel.
Let us consider the general case of a zero-mean complex Gaussian random variable $\eta$ with a normal distribution (see, for instance, Miller, 1974). Following the recent work of Picinbono (1996), we define the variance $\sigma^2$ and the "relation" $c$ by
$\sigma^2 \stackrel{\mathrm{def}}{=} E(\eta\bar\eta), \qquad c \stackrel{\mathrm{def}}{=} E(\eta^2).$    (101)
The complex Gaussian distribution is then described by the density function    (102)
with
Straightforward computation yields the following expression for Q:
where n = EelLYand
Let us consider the following expression for the wavelet coefficients computed from the observed image (we omit the indices $j, k$):
using the polar representation of the wavelet coefficients, = pelt. First we obtain the following likelihood function:
2 = reiN and
do
Second, as we are interested in an estimator that preserves the phase of the wavelet coefficient, that is, an estimator of the amplitude only, we define a likelihood on the amplitude $r_{j,k}$ by
h, the procedure of calculating the basis vectors can be generalized in the form of the following algorithm (Sayood et al., 1984):
1. Set $i = 1$ and calculate $b_{11}$.
2. Set $i = i + 1$ and calculate the off-diagonal elements
3. Calculate the diagonal element
4. If $i < n$, go to 2.
C. Laminated Lattices
In this section we examine another class of lattices known as laminated lattices. One of the classical mathematical problems is that of packing an n-dimensional container with identical spheres as densely as possible. This problem is completely solved for the 1- and 2-D spaces. In such spaces, the centers of the spheres must coincide with points belonging to the following lattices:
where $\Lambda$ denotes a laminated lattice. The 1-D laminated lattice $\Lambda_1$ consists of all even integral points and is equivalent to the lattices $Z_1$, $A_1$ and $A_1^*$. In a 2-D space the lattices $\Lambda_2$, $A_2$ and $A_2^*$ are equivalent. If we draw unit radius circles around each point, then by placing exact copies of the resulting row of circles next to each other as closely
LATTICE VECTOR QUANTIZATION FOR WAVELET-BASED IMAGE CODING
as possible we form the 2-D laminated lattice $\Lambda_2$. This lattice, equivalent to the so-called hexagonal lattice $A_2$, is shown in Fig. 16. For dimensions from 2 to 8 the densest sphere packings are known only among lattices. The lattices that form the basis for the densest sphere packings in the spaces of 2, ..., 8 dimensions are the following (Conway and Sloane, 1993):
where $A_n$, $D_n$ and $E_n$ are the root lattices discussed in the previous sections. The laminated lattices are known to be the densest sphere packings in dimensions up to 8. This is also true up to the 29-dimensional space with the exception of 10- to 13-dimensional spaces where the so-called K-type lattices give better results. Half of the minimal distance between two distinct lattice points, $\rho$ (see Fig. 16), is called the packing radius. It can also be defined as the largest number $\rho$ such that spheres of radius $\rho$ centered at the lattice points do not overlap. The packing radius of laminated lattices equals unity. The points on the plane farthest from the lattice points are called deep holes, and $R$ is called the covering radius, which is equal to the distance from a lattice point to a deep hole. For the lattice in Fig. 16 the covering radius is $R = 2\rho/\sqrt{3}$. The covering radius is related to the covering problem, which is to find the least dense covering of the space $\mathbb{R}^n$ by overlapping spheres. The covering problem is the dual of the packing problem defined at the
FIGURE 16. The laminated lattice $\Lambda_2$.
MIKHAIL SHNAIDER AND ANDREW P. PAPLINSKI
beginning of this section. The covering radius $R$ is the smallest number $\rho$ such that spheres of radius $\rho$ centered at the lattice points cover the whole of $\mathbb{R}^n$. The 2-D construction of sphere packing illustrated in Fig. 16 can be extended further into the 3-D space by replacing the circles with 3-D spheres of the same radius and stacking the resulting layers of the $\Lambda_2$ laminated lattice as densely as possible in the third dimension. In this way we obtain the $\Lambda_3$ lattice, which is equivalent to the $A_3$ and $D_3$ lattices. The $\Lambda_3$ lattice is known as the face-centered cubic lattice. Similarly, the $\Lambda_n$ lattice can be recursively constructed from the $\Lambda_{n-1}$ lattices. The $\Lambda_n$ lattice is obtained by placing the $\Lambda_{n-1}$ lattices as close as possible to each other. In this way every $\Lambda_n$ lattice includes a number of $\Lambda_{n-1}$ lattices. This relationship is depicted in Fig. 17. It can be seen in this figure that for spaces of some dimensionalities the result of the foregoing construction is not unique. That is, more than one laminated lattice can be generated in such spaces. From the point of view of image compression the most interesting of the high-dimensional laminated lattices is the $\Lambda_{16}$ lattice with the generator matrix
- 4 2 2 2 2 0 2 0 0 0 0 2 0 0 0 2 0 0 2 0 1 MA16
=
-
1/2
0
2 0 0 0 0 2 0
2 0 0 0 0 0 2 0
2 0 0 0 0 0 0 2 0
2 0 0 0 0 0 0 0 2 0
2 0 0 0 0 0 0 0 0 2 0
2 0 0 0 0 0 0 0 0 0 2 0
1 1 1 1 0 1 0 1 1 0 0 1 0
0 1 1 1 1 0 1 0 1 1 0 0 1 0
0 0 1 1 1 1 0 1 0 1 1 0 0 1 0
-
0 0 0 1 1 1 1 0 1 0 1 1 0 0
I1 1 1 1 1 1 1 I ’ 1 1 1 1 1
1
1 1-
0
FIGURE 17. Laminated lattices.
V. QUANTIZATION ALGORITHMS FOR SELECTED LATTICES
Quantization can be viewed as a mapping of an input set onto a precalculated set known as a codebook. For each input sample, or batch of samples, the quantizer finds the corresponding sample, or batch of samples, in the codebook according to the minimal distance criterion and sends its index to the output stream. Let us assume that the codebook consists of points belonging to a certain lattice $L_n$. The dimensionality $n$ of the lattice corresponds to the dimensionality of the input vectors of the quantizer. Because the codewords are the lattice points, each lattice requires an algorithm that finds the closest lattice point for every input vector. In this section we examine some fast quantization algorithms (Conway and Sloane, 1982; Gibson and Sayood, 1988) for the lattices discussed in the previous sections. Before we proceed further with the presentation, let us first introduce a few "utility" functions used in quantization algorithms.
• Let $v(x)$ denote the closest integer to $x$. In the case of a tie, when $x$ is equidistant from both neighboring integers, $v(x)$ is equal to the integer with the smaller absolute value.
• Let $w(x)$ denote the second closest integer to $x$, distinct from $v(x)$.
• If the input is an $n$-dimensional vector, that is, $x = (x_1, \ldots, x_n)$, the functions $v(x)$ and $w(x)$ are applied to each vector component separately, that is, $v(x) = (v(x_1), \ldots, v(x_n))$ and $w(x) = (w(x_1), \ldots, w(x_n))$.
• Let us also define the round-off residue function, $d(\cdot)$, as $d(x) = x - v(x)$.
• Let us also define a coordinate index $k$ in the following way: $|d(x_k)| \ge |d(x_i)|$ for all $1 \le i \le n$, and $|d(x_k)| = |d(x_i)|$ implies $k \le i$.
In words, the coordinate $x_k$ has the largest absolute value of $d(x_k)$ among all coordinates $x_i$ of the vector $x$. If several coordinates of $x$ share the same maximum absolute value of $d(\cdot)$, then $k$ is the index of the leftmost such coordinate in the vector $x$. Finally, given that $x_k$ is known, we define a function $g(x)$ as
$g(x) = (v(x_1), \ldots, v(x_{k-1}), w(x_k), v(x_{k+1}), \ldots, v(x_n))$.
The function $g(x)$ thus equals $v(x)$ with the coordinate $v(x_k)$ replaced by $w(x_k)$. The index $k$ is defined as in the foregoing. The preceding functions are illustrated in Fig. 18.
If, for example, $x = (3.6, -2.5, -1.8)$, then the functions $v(\cdot)$, $w(\cdot)$, $d(\cdot)$ and $g(\cdot)$ return the following values:
$v(x) = (4, -2, -2)$,
$w(x) = (3, -3, -1)$,
$d(x) = (-0.4, -0.5, 0.2)$,
$g(x) = (4, -3, -2)$.
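The utility functions can be sketched directly from their definitions. The following is a minimal illustration rather than the authors' code; the helper names (`v_vec`, `index_k`, and so on) are assumptions, as is the convention used by `w` when its argument is already an integer, a case the text does not specify.

```python
import math

def v(x):
    """Closest integer to x; a tie goes to the integer of smaller absolute value."""
    lo = math.floor(x)
    frac = x - lo
    if frac < 0.5:
        return lo
    if frac > 0.5:
        return lo + 1
    return min(lo, lo + 1, key=abs)  # exact tie

def w(x):
    """Second closest integer to x, i.e. the neighbour on the other side of x."""
    nearest = v(x)
    if x > nearest:
        return nearest + 1
    if x < nearest:
        return nearest - 1
    # x is an integer: both neighbours are equally close (assumed convention).
    return nearest - 1 if nearest > 0 else nearest + 1

def d(x):
    """Round-off residue x - v(x)."""
    return x - v(x)

def v_vec(xs):
    return [v(x) for x in xs]

def w_vec(xs):
    return [w(x) for x in xs]

def d_vec(xs):
    return [d(x) for x in xs]

def index_k(xs):
    """Leftmost index whose residue has the largest absolute value."""
    return max(range(len(xs)), key=lambda i: abs(d(xs[i])))

def g(xs):
    """v(x) with the k-th coordinate replaced by w(x_k)."""
    out = v_vec(xs)
    k = index_k(xs)
    out[k] = w(xs[k])
    return out

x = [3.6, -2.5, -1.8]
```

With `x = [3.6, -2.5, -1.8]` the functions reproduce the values listed in the text, up to floating-point rounding in the residues.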
A. The Closest Point of a Dual Lattice
As mentioned in Sect. IV, for each lattice $L$ there exists a dual lattice $L^*$ that can be specified in the following way:
$L^* = \bigcup_{i=0}^{k-1} (r_i + L)$    (59)
where $r_i$ are the glue vectors. Let us assume we have a quantization algorithm for the lattice $L$ that assigns a point $y$ from the lattice $L$ to a point $x$ being quantized, that is, $y = \Phi(x)$, $y \in L$.
Then, the quantization procedure for the corresponding dual lattice, $L^*$, can be defined as follows (Conway and Sloane, 1982):
• Calculate all prospective dual vectors
$y_i^* = \Phi(x - r_i) + r_i, \quad \forall\, 0 \le i \le k - 1$.    (60)
• Determine the glue vector $r_j$ for which the distance $\mathrm{dist}(x, y_j^*)$ attains its minimum.
• Assign $y^* = \Phi(x - r_j) + r_j$,
where $y^* \in L^*$. In other words, $y^*$ is the one closest to the point $x$ being quantized among all the $y_i^*$.
Apart from the dual lattices this procedure can be used to find the closest points for any lattice that can be represented as a union of cosets of the form given by Eq. (59).
B. $Z_n$ Lattice
We know that the $Z_n$ lattice consists of all integer points in an $n$-dimensional space. Therefore, for $x \in \mathbb{R}^n$ the point $y$ of the lattice $Z_n$ closest to $x$ is given by $v(x)$, that is, $y = v(x)$.
C. $D_n$ Lattice and Its Dual
The quantization algorithm for the $D_n$ lattice follows directly from the definition of the corresponding root system. We know that the $D_n$ root system is obtained by taking the ambient space $\mathbb{R}^n$ and $R = \{r \in Z_n : (r, r) = 2\} = \{\pm(e_i \pm e_j),\ i \ne j\}$. Therefore, the lattice $D_n$ is the set of integer points in the $n$-dimensional space with an even sum of coordinates. For an arbitrary point $x$ the closest point belonging to the lattice $D_n$ is given by either $v(x)$ or $g(x)$, depending on which result has an even sum of coordinates, that is,
$\Phi(x) = \begin{cases} v(x) & \text{if } \sum_i v(x_i) \text{ is even} \\ g(x) & \text{otherwise } \left(\sum_i g(x_i) \text{ is even}\right) \end{cases}$
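The rule above translates into a short closest-point routine for $D_n$. This is a sketch under the definitions of $v$, $w$ and $g$ given earlier, not the authors' implementation; the function names are assumptions, and the rounding helpers are repeated so that the block is self-contained.

```python
import math

def closest_int(x):
    """v(x): closest integer, ties resolved toward the smaller absolute value."""
    lo = math.floor(x)
    frac = x - lo
    if frac < 0.5:
        return lo
    if frac > 0.5:
        return lo + 1
    return min(lo, lo + 1, key=abs)

def second_closest_int(x):
    """w(x): the integer neighbour on the other side of x."""
    nearest = closest_int(x)
    if x > nearest:
        return nearest + 1
    if x < nearest:
        return nearest - 1
    return nearest - 1 if nearest > 0 else nearest + 1  # assumed convention

def quantize_dn(xs):
    """Closest point of D_n (integer points with an even coordinate sum)."""
    vs = [closest_int(x) for x in xs]
    if sum(vs) % 2 == 0:
        return vs  # v(x) already lies in D_n
    # Otherwise re-round the worst coordinate the other way, which gives g(x).
    k = max(range(len(xs)), key=lambda i: abs(xs[i] - vs[i]))
    vs[k] = second_closest_int(xs[k])
    return vs

point = quantize_dn([2, -3.4, 0.7, 6.1])
```

Note that `g(x)` is only evaluated when the coordinate sum of `v(x)` is odd, so the common case costs a single rounding pass.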
For example, for a four-dimensional (4-D) point $x = (2, -3.4, 0.7, 6.1)$, we find that $v(x) = (2, -3, 1, 6)$ and that $g(x) = (2, -4, 1, 6)$. As the sum of the components of $v(x)$ is even while the sum of the components of $g(x)$ is odd, the $D_4$ lattice point that is closest to $x$ is given by $v(x)$. In general, the function $g(x)$ needs to be calculated only when $\sum_i v(x_i)$ is odd. Otherwise, it is sufficient to calculate only $v(x)$.
The dual lattice $D_n^*$ can be defined as
$D_n^* = \bigcup_{i=0}^{3} (r_i + D_n)$    (62)
where $r_i$ are the glue vectors specified as follows:
Alternatively, the lattice $D_n^*$ can be defined in terms of the lattice $Z_n$ in the following way:
$D_n^* = \bigcup_{i=0}^{1} (r_i + Z_n)$    (63)
with the glue vectors being
$r_0 = (0^n)$ and $r_1 = \left((\tfrac{1}{2})^n\right)$.
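The coset procedure of Eq. (60) can be sketched generically and then specialized to the two-coset description of the dual lattice. This is an illustrative sketch: the function names are assumptions, and the glue vectors used are the all-zero vector and the all-one-half vector, which is the standard description of $D_n^*$ as a union of two cosets of $Z_n$.

```python
import math

def closest_int(x):
    """Closest integer, ties resolved toward the smaller absolute value."""
    lo = math.floor(x)
    frac = x - lo
    if frac < 0.5:
        return lo
    if frac > 0.5:
        return lo + 1
    return min(lo, lo + 1, key=abs)

def quantize_zn(xs):
    """Closest point of Z_n: round every coordinate."""
    return [closest_int(x) for x in xs]

def quantize_coset_union(xs, glue_vectors, quantize_base):
    """Closest point in the union of cosets r + L, given a quantizer for L."""
    best, best_dist = None, math.inf
    for r in glue_vectors:
        shifted = [x - ri for x, ri in zip(xs, r)]
        candidate = [q + ri for q, ri in zip(quantize_base(shifted), r)]
        dist = sum((x - c) ** 2 for x, c in zip(xs, candidate))
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best

def quantize_dn_dual(xs):
    """Closest point of D_n^* seen as Z_n together with Z_n shifted by (1/2)^n."""
    n = len(xs)
    glue = [[0.0] * n, [0.5] * n]
    return quantize_coset_union(xs, glue, quantize_zn)

near_half = quantize_dn_dual([0.4, 0.6, 0.45])
near_int = quantize_dn_dual([2.1, -0.9, 3.0])
```

Only two candidate points are compared per input vector, and each candidate needs nothing more than coordinate-wise rounding.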
As both Eqs. (62) and (63) define the same dual lattice $D_n^*$, either definition can be used in conjunction with Eq. (60), depending on the specific application. However, the latter definition of the lattice $D_n^*$ clearly results in a faster algorithm for obtaining the closest point because it uses two cosets instead of the four in Eq. (62). Apart from the smaller number of cosets, an algorithm based on Eq. (63) is advantageous because quantization of the lattice $Z_n$ is, on average, faster than quantization of the lattice $D_n$.
D. The Laminated $\Lambda_{16}$ Lattice
As already mentioned, this lattice can be constructed from the first-order Reed-Muller code of length 16. The quantization procedure relies on this fact. Using the algorithm for the $D_{16}$ lattice we calculate 32 lattice points, taking the codewords of the Reed-Muller code as the coset representatives. The required point among the 32 lattice points obtained is the one at minimal distance from the point being quantized.
VI. COUNTING THE LATTICE POINTS
A problem that seems especially important in the context of lattice quantization is that of counting the lattice points located within a given distance from the origin. In terms of quantization, the number of lattice points corresponds to the size of the codebook used for quantization. Therefore, we need to know how many points of a lattice $L_n$ are located on the surface of an $n$-dimensional sphere centered at the origin. We address first the problem of counting lattice points contained inside a sphere of a given radius.
A. Estimation of the Number of the Lattice Points within a Sphere
In this section we estimate the number of lattice points contained inside a sphere of a given radius. We first consider the problem of counting the points of the 2-D lattice $Z_2$ (Krätzel, 1988). The lattice $Z_2$ is the set of all integral points in a 2-D space. We would like to estimate the number of lattice points $C_m$ lying within the circle of radius $\sqrt{m}$ with $m \ge 1$ (Fig. 19):
With each lattice point $P_i = (\beta_i, \gamma_i)$ we can associate a square $S_i$ that has the point $P_i$ located at its center. Formally, the square $S_i$ can be defined as
Now we can draw a circle of radius $\sqrt{m} + \sqrt{2}/2$ centered at the origin. All squares $S_i$ that lie within this circle satisfy the following inequality:
$\beta^2 + \gamma^2 \le \left(\sqrt{m} + \tfrac{\sqrt{2}}{2}\right)^2$
FIGURE 19. Counting the points of the lattice $Z_2$.
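The counting problem for $Z_2$ can be checked numerically. The sketch below counts the integer points inside a circle of squared radius $m$ and compares the count with the areas of two circles whose radii differ from $\sqrt{m}$ by half the diagonal of a unit square. The function names are assumptions, and the exact form of the lower bound (radius $\sqrt{m} - \sqrt{2}/2$) is assumed from the standard unit-square argument rather than taken from the text.

```python
import math

def count_points_in_circle(m):
    """Number of integer points (b, c) with b^2 + c^2 <= m."""
    radius = math.isqrt(m)
    count = 0
    for b in range(-radius, radius + 1):
        c_max = math.isqrt(m - b * b)  # largest |c| allowed for this b
        count += 2 * c_max + 1
    return count

def area_bounds(m):
    """Areas of the circles of radius sqrt(m) -/+ sqrt(2)/2 (assumed bounds)."""
    half_diag = math.sqrt(2) / 2
    lower = math.pi * (math.sqrt(m) - half_diag) ** 2
    upper = math.pi * (math.sqrt(m) + half_diag) ** 2
    return lower, upper

c_100 = count_points_in_circle(100)
lower_100, upper_100 = area_bounds(100)
```

The count grows like the area $\pi m$ of the circle, with an error controlled by the width of the ring between the two bounding circles.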
Therefore, the number of lattice points $C_m$ that are contained inside the circle of radius $\sqrt{m}$ can be estimated in the following way:
which is true for any $m \ge 1$ (as has been assumed here). In order to obtain the lower bound of $C_m$, we draw a smaller inner circle defined by
for which the corresponding estimate of the number of lattice points inside this circle is
Hence -
or
I c,,,- 751121
Letting $q = e^{\pi i z}$, we can rewrite Eq. (72) as
$\Theta_L(z) = \sum_{x \in L} q^{\|x\|^2}$.    (73)
After the introduction of a new summation variable $m = \|x\|^2$, we can express the theta series as
$\Theta_L(z) = \sum_{m=0}^{\infty} A_m q^m$    (74)
where $A_m$ is the number of lattice points with squared norm $m$. In other words, assuming that the vector $x$ is of dimension $n$, the summation coefficient $A_m$ gives the number of vectors in the lattice $L$ located on the surface of the $n$-dimensional sphere of radius $\sqrt{m}$ centered at the origin. The theta functions examined in this study originate from the Jacobi theta function given by
$\Theta_{D_n}(z) = \frac{1}{2}\left(\theta_3^n(z) + \theta_4^n(z)\right)$.    (79)
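The coefficients $A_m$ of a theta series can be verified by brute force for small norms. The following sketch enumerates the points of $D_n$ (integer points with an even coordinate sum) and counts them by squared norm. The function name and the enumeration strategy are assumptions made for illustration, and the approach is only practical for small $n$ and small norms.

```python
import math
from collections import Counter
from itertools import product

def shell_counts_dn(n, max_norm):
    """Count the points of D_n on each shell of squared norm <= max_norm."""
    bound = math.isqrt(max_norm)  # no coordinate can exceed this in magnitude
    counts = Counter()
    for point in product(range(-bound, bound + 1), repeat=n):
        if sum(point) % 2:
            continue  # odd coordinate sum: not a point of D_n
        norm = sum(c * c for c in point)
        if norm <= max_norm:
            counts[norm] += 1
    return counts

d4 = shell_counts_dn(4, 8)
```

For $D_4$ the enumeration gives 24 points of squared norm 2, 24 of squared norm 4, 96 of squared norm 6 and 24 of squared norm 8, with no points on odd shells.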
Among the D-type lattices one of the most interesting is the $D_4$ lattice. The theta series of the $D_4$ lattice is specified in Table III. The dual lattice $D_n^*$ has the following theta series:
The last lattice, which is also important in the context of wavelet-based image compression, is the laminated lattice $\Lambda_{16}$. The theta series of this lattice is
with the coefficients given in Table IV.
C. Relationship between Lattices and Codes
As already mentioned, the laminated lattice $\Lambda_{16}$ can be generated from the first-order Reed-Muller code. This is an example of the relationships between lattices and codes, which are examined in this section (MacWilliams and Sloane, 1977; Conway and Sloane, 1993; Leech and Sloane, 1971; Sloane, 1977). Let $F_2$ denote a finite field with two elements: $F_2 = \{0, 1\}$. A binary linear code $C$ over the field $F_2$ of length $n$ is a subset of $F_2^n$. In other words, the code $C$ is a set of binary vectors of length $n$. The Hamming distance between two vectors $c_1$ and $c_2$ is equal to the number of positions where these two vectors differ, and is denoted by $\mathrm{dist}(c_1, c_2)$. For instance, if $c_1 = 10111$ and $c_2 = 00101$ then $\mathrm{dist}(c_1, c_2) = 2$. The Hamming weight of a vector is the number of nonzero components in it, and is denoted by $\mathrm{wt}(c)$. For the $c_1$ specified here we have $\mathrm{wt}(c_1) = 4$. It is easy to derive the following relation between the Hamming distance and Hamming weight: $\mathrm{dist}(c_1, c_2) = \mathrm{wt}(c_1 - c_2)$.
A code $C$ can also be characterized by the minimum distance between any two of its codewords:
$d(C) = \min \mathrm{dist}(c_1, c_2), \quad \forall\, c_1, c_2 \in C \text{ and } c_1 \ne c_2$.
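These definitions are easy to check on the example vectors. The sketch below represents codewords as strings of the characters 0 and 1; the function names and the string representation are assumptions made for illustration.

```python
from itertools import combinations

def hamming_distance(c1, c2):
    """Number of positions in which two equal-length binary words differ."""
    if len(c1) != len(c2):
        raise ValueError("codewords must have the same length")
    return sum(a != b for a, b in zip(c1, c2))

def hamming_weight(c):
    """Number of nonzero components of a binary word."""
    return sum(ch != "0" for ch in c)

def minimum_distance(code):
    """Smallest Hamming distance between two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))

c1, c2 = "10111", "00101"
repetition_code = ["00000000", "11111111"]
d_min = minimum_distance(repetition_code)
```

For the length-8 repetition code the minimum distance is 8, since its two codewords differ in every position.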
A code with a minimum distance $d$ can correct $\lfloor (d - 1)/2 \rfloor$ errors, where $\lfloor a \rfloor$ denotes the usual floor function giving the greatest integer less than or equal to $a$. Typically, an $[n, k, d]$ binary linear code refers to a code with $2^k$ codewords of length $n$ that differ in at least $d$ places. The first-order Reed-Muller code related to the $\Lambda_{16}$ laminated lattice is denoted by [16, 5, 8] (Vetterli, 1984; DeVore et al., 1992; Barlaud et al., 1994).

TABLE IV
NUMBERS OF POINTS ON THE SURFACE OF A SPHERE OF A GIVEN RADIUS FOR THE LATTICE $\Lambda_{16}$

m      A_m
0      1
2      0
4      4320
6      61440
8      522720
10     2211840
12     8960640
14     23224320
16     67154400
For an $[n, k, d]$ linear code $C$ let $A_i$ be the number of codewords with Hamming weight $i$. Thus, $\sum_{i=0}^{n} A_i = 2^k$. The numbers $A_i$ are called the weight distribution of $C$. Now, we can associate with the code $C$ a homogeneous polynomial (one whose terms are all of the same degree) in the following way:
$W_C(x, y) = \sum_{i=0}^{n} A_i x^{n-i} y^{i}$.    (82)
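The weight distribution of a small linear code can be obtained by enumerating its codewords. The sketch below does this for a code of length 8 and dimension 4; the generator matrix `G` is an assumed example (one common choice of generator for an [8, 4, 4] code) and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def codewords(generator_rows):
    """All binary linear combinations of the generator rows."""
    n = len(generator_rows[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(generator_rows)):
        word = [0] * n
        for coeff, row in zip(coeffs, generator_rows):
            if coeff:
                word = [(a + b) % 2 for a, b in zip(word, row)]
        words.add(tuple(word))
    return words

def weight_distribution(words):
    """Map each Hamming weight i to the number A_i of codewords of that weight."""
    return Counter(sum(word) for word in words)

# Assumed generator matrix of an [8, 4, 4] binary code.
G = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
dist = weight_distribution(codewords(G))
```

The nonzero counts are the coefficients of the homogeneous polynomial: a count $A_i$ at weight $i$ contributes the term $A_i x^{n-i} y^i$.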
This polynomial is called the weight enumerator of the code $C$. In the weight enumerator the variable $x$ effectively indicates the number of zeros in a codeword, whereas $y$ indicates the number of ones. With the preceding fundamentals in mind it is now possible to draw a parallel between the weight enumerators of codes and the theta series of lattices. Consider the coefficients $A_m$ of the theta series given by Eq. (74) and the coefficients $A_i$ of the weight enumerators in Eq. (82). In both cases they specify the number of points, or codewords, at a certain distance from the origin. Therefore, these polynomials contain the essential information about the distribution of vectors in the subspace under consideration. Extending the connection established here between lattices and codes, we can observe that there exist a number of methods of constructing sphere packings from codes. It can be shown that a sphere packing constructed from a code is a lattice packing if and only if the code is linear (MacWilliams and Sloane, 1977). Therefore, by choosing a linear code as a basis for any of the available constructions we obtain a lattice corresponding to this code. Three of the most popular construction methods, referred to as Constructions A, B and C, are outlined in what follows.
1. Construction A
This construction yields the simplest way to associate a lattice with a code. Let $C$ be a binary code. The centers $x$ of spheres from the sphere packing corresponding to the code $C$ in $\mathbb{R}^n$ are those that are congruent (modulo 2) to codewords of $C$:
$\sqrt{2}\,x \pmod 2 \in C$.    (83)
In this way the centers of spheres are obtained by adding even numbers to the codewords of $C$ and then dividing the result by $\sqrt{2}$. The multiplication by $\sqrt{2}$ is simply a scaling operation and, as we shall see later, it does not alter the enumeration of lattice points and is often omitted. As we have already mentioned, the code must be linear in order to obtain a corresponding lattice packing $L(C)$. Let us consider a codeword $c = (c_1, c_2, \ldots, c_n)$ in a linear code $C$. Following Eq. (83) the corresponding lattice points $l = (l_1, l_2, \ldots, l_n) \in L(C)$ can be expressed as
$L(c) = (l_1, l_2, \ldots, l_n)$ with $l_i \in (c_i + 2Z)\frac{1}{\sqrt{2}} = \frac{1}{\sqrt{2}}c_i + \sqrt{2}Z$
where $L(c) \in L(C)$ denotes a lattice point constructed from a codeword $c \in C$. As $c_i$ is defined on $F_2 = \{0, 1\}$, we need to consider two cases, namely, $c_i = 0$ and $c_i = 1$. From Eq. (78) we can recall that $\Theta_{Z_n}(z) = \theta_3^n(z)$. Thus, $\Theta_{Z}(z) = \theta_3(z)$, and we can deduce the form of $\Theta_{\sqrt{2}Z}$ as follows:
Similarly, for $\Theta_{1/\sqrt{2} + \sqrt{2}Z}$ we have
Now we can write
where $\mathrm{wt}(c)$ is the number of 1s and $n - \mathrm{wt}(c)$ is the number of 0s in $c$. Consequently,
In summary, if we assume that $C$ is a linear code with weight enumerator $W_C(x, y)$ given by Eq. (82), then the theta function of the corresponding lattice $L(C)$ is given by Eq. (84). For example, using Construction A we can generate the lattice $E_8$ from the [8, 4, 4] extended Hamming code $H_8$ (Sloane, 1977) with weight enumerator
2. Construction B
Consider an $[n, k, 8]$ binary linear code $C$ with codeword weights divisible by 4. The centers $x$ of the corresponding sphere packing are points that satisfy the following properties:
1. $\sqrt{2}\,x \pmod 2 \in C$;    (85)
2. $4 \mid \sqrt{2}\sum_{i=1}^{n} x_i$.    (86)
It can be shown (Conway and Sloane, 1993) that the lattice sphere packing $L(C)$ obtained by Construction B has the following theta series:
$\Theta_{L(C)}(z) = \frac{1}{2}W_C\left(\theta_3(2z), \theta_2(2z)\right) + \frac{1}{2}\theta_4^n(2z)$.
For example, by applying Construction B the lattice $E_8$ can be generated from the repetition code [8, 1, 8] consisting of the codewords $\{0^8, 1^8\}$. The procedures given by Constructions A and B can be simplified by introducing a coordinate array for each lattice point, defined as follows (Conway and Sloane, 1993; Sloane, 1977). The number of columns in this array equals the dimensionality of the generated lattice. Each column is the binary representation of the corresponding coordinate of the point. For negative numbers the 2's complement notation is used. For example, the coordinate array of $x = (3, 2, 1, 0, -1, -2, -3)$ is

           3    2    1    0   -1   -2   -3
1's row    1    0    1    0    1    0    1
2's row    1    1    0    0    1    1    0
4's row    0    0    0    0    1    1    1
8's row    0    0    0    0    1    1    1
Now, if the factor $\sqrt{2}$ in Constructions A and B specified by Eqs. (83), (85) and (86) is omitted, we have the following simplified algorithms. Construction A is effectively reduced to finding the centers (points) whose coordinate arrays have a top row that is a codeword of $C$. Construction B can be redefined as a method of finding the centers whose coordinate arrays have a top row that is a codeword of $C$ and a second row of even weight.
3. Construction C
This construction is based on a linear code $C$ with codewords of length $n$. The corresponding lattice packing consists of points for which the $2^i$'s rows ($i = 0, 1, 2, \ldots$) of their coordinate arrays are in $C$.
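The coordinate-array view of Constructions A and B can be sketched with integer bit operations. Python integers behave as 2's complement numbers under `>>` and `&`, which matches the notation used for negative coordinates. The function names are assumptions, and the membership tests follow the simplified (unscaled) descriptions given above.

```python
def coordinate_array(point, rows=4):
    """Rows of bits: row i holds bit 2^i of every coordinate (2's complement)."""
    return [[(x >> i) & 1 for x in point] for i in range(rows)]

def in_construction_a(point, code):
    """Simplified Construction A: the top (1's) row must be a codeword."""
    ones_row = tuple(x & 1 for x in point)
    return ones_row in code

def in_construction_b(point, code):
    """Simplified Construction B: top row in the code, second row of even weight."""
    twos_row = [(x >> 1) & 1 for x in point]
    return in_construction_a(point, code) and sum(twos_row) % 2 == 0

array = coordinate_array((3, 2, 1, 0, -1, -2, -3))

# Length-8 repetition code, as in the E_8 example of Construction B.
repetition = {(0,) * 8, (1,) * 8}
inside = in_construction_b((1, 1, 1, 1, 1, 1, 1, 1), repetition)
outside = in_construction_b((2, 0, 0, 0, 0, 0, 0, 0), repetition)
```

For the repetition code the even-weight condition on the second row agrees with requiring the coordinate sum to be divisible by 4, which can be checked on the two sample points.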
In our discussion of lattice quantization we have covered definitions from lattice theory (Sect. IV), types of lattices (Sects. IV.A and C), and construction of lattices (Sects. IV.B and C). Fast quantization algorithms for lattices can be found in Conway and Sloane (1982, 1983). From the point of view of lattice quantization one problem remains open: we need a scaling algorithm that can be applied to wavelet coefficients as a preprocessing step before the fast quantization algorithms are used. Such a scaling algorithm is discussed in the next section.
VII. SCALING ALGORITHM
Although the probability distribution functions of the wavelet coefficients studied in Sect. II.B.2 have similar shapes for each block, for example, (LL), (LH), (HL), and (HH), some statistical parameters, such as minimum and maximum values, standard deviation, and so on, are expected to vary from block to block. Therefore, for the purpose of quantization each block of the wavelet coefficient matrix should be treated separately. According to the results presented in Sect. III.C an optimal quantization of wavelet coefficients can be achieved by uniform placement of the codevectors on the surfaces of concentric spheres with centers at the origin. It has been shown in the previous section that each lattice actually provides a set of points located on the surfaces of spheres centered at the origin. The number of such points for each sphere is defined by the theta series of Eq. (74). Assuming that each lattice point is a codeword, we need to know how many codewords are required for quantization of a given block, or alternatively, what the size of the codebook should be. One possible solution to the problem of determining the number of codewords was presented in Sect. II.C in the form of an optimal bit allocation routine. The output of this routine is the number of bits per pixel or, in other words, the compression ratio for each block.
Now, from the target compression ratio we derive the size of the goal codebook (the number of codewords required for quantization), $N$. By summing the coefficients of the theta series, the number of lattice points corresponding to that size can be obtained. Because the theta series gives only the numbers of lattice points (see Tables II, III, and IV) lying on the surfaces of the spheres around the origin, the size of the codebook actually used is often an approximation of the one generated by the optimal bit allocation procedure. The available size of the codebook consisting of the lattice points is given by $N = \sum_{i=0}^{m} A_i$. Equivalently, given the appropriately chosen radius $\sqrt{m}$ of the largest sphere, which accommodates vectors of squared norm (or energy) $m$, we obtain a set of codevectors belonging to the selected lattice. Now we require that the wavelet coefficients (collected into vectors, since we use vector quantization) with a certain norm $E$ be scaled to the norm $m$ of the surface
of the outer sphere. $E$ is selected so that a prescribed number of vectors, say 70-80%, have their norm not greater than $E$ (Fig. 20). The scaling factor between the $m$ and $E$ shells is given by
$s = \frac{m}{E}$.    (87)
After quantization this scaling factor is included in the compressed data stream (Barlaud rt al., 1994). There are a number of possible ways of handling vectors with squared norm greater than E . The simplest method is to truncate them to the surface of the outer sphere E . However, it results in the introduction of some additional distortion of the high frequency edges in an image. The approach we have adopted and tested in this work is to set a separate scaling factor s, for each
where El is the squared norm of the i-th vector. Thus, the high frequency information remains preserved at the expense of a slight increase in the bit ratc, which depends on the value of the threshold E l . The waling factors s ,
P
0
,,’ E shcll
. o -0.
Wavelet coefficicnts
Fic;~irct;20. Laltice quanriLaiion scheme.
244
MIKHAIL SHNAIDER A N D ANDREW P. PAPLlNSKl
are to be transmitted together with the corresponding vectors. Thus, at the receiving end, dequantization is followed by scaling each vector by the factor $1/s$ for the vectors with norm not exceeding $E$, and by the factor $1/s_i$ for the vectors with norm greater than $E$.

Now, let us summarize the considerations presented in this section in the form of the following scaling algorithm to be used as a preprocessing step in lattice quantization of wavelet coefficients:

1. Compute the squared norm of each vector of wavelet coefficients and determine the norm $E$.
2. Determine the size of the codebook for a chosen lattice as well as the corresponding norm $m$ (see Sect. II.C).
3. Determine the scaling factor as in Eq. (87) and include $s$ in the bit stream.
4. For each vector (norm $E_i$):
   (a) if $E_i \le E$, scale the vector by a factor of $s$; otherwise determine $s_i$ as in Eq. (88), and scale the vector by a factor of $s_i$;
   (b) encode the vector with a fast quantization algorithm (see Sect. V);
   (c) if $E_i > E$, include $s_i$ in the bit stream.
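The scaling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the factor that maps the $E$ shell onto the $m$ shell is taken here as $\sqrt{m/E}$ (and $\sqrt{m/E_i}$ for vectors beyond the threshold), the threshold is picked as an empirical quantile, and plain rounding to the integer lattice stands in for the fast quantization algorithm of Sect. V. The names `scale_and_quantize` and `coverage` are hypothetical.

```python
import math

def scale_and_quantize(vectors, m, coverage=0.75):
    """Sketch of the scaling steps; Z^n rounding stands in for the lattice quantizer."""
    # Step 1: squared norm E_i of every vector, and the threshold E chosen so
    # that roughly `coverage` of the vectors have squared norm <= E.
    energies = [sum(c * c for c in v) for v in vectors]
    ranked = sorted(energies)
    E = ranked[min(len(ranked) - 1, int(coverage * len(ranked)))]

    # Step 3 (assumed form): common factor mapping the E shell onto the m shell.
    scale = math.sqrt(m / E)

    codes, side_info = [], []
    for v, e in zip(vectors, energies):
        if e <= E:
            f = scale                      # step 4(a): common factor
        else:
            f = math.sqrt(m / e)           # step 4(a): assumed per-vector factor
            side_info.append(f)            # step 4(c): goes into the bit stream
        # Step 4(b): stand-in quantizer, nearest point of the integer lattice.
        codes.append(tuple(round(f * c) for c in v))
    return codes, scale, side_info
```

With this convention every scaled vector lies on or inside the sphere of squared norm $m$ before rounding, so only the shells counted by the theta series up to $m$ (plus rounding spill-over at the boundary) are used.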
In this section, we have examined the problem of lattice quantization. As has been shown, there exist a variety of lattices that can be used for this purpose. The final choice depends on the particular application and relies on the evaluation of a number of criteria, such as the fidelity of the system, the time of encoding, software or hardware implementation, and so on. This issue, namely the choice of a suitable lattice, is addressed in the following section.
VIII. SELECTING A LATTICE FOR QUANTIZATION
From Section IV we know that every lattice is determined by its generator matrix $M$ and the points that comprise a lattice are specified by Eq. (30). We can choose these points as codevectors for quantization of an $n$-dimensional input sequence. Let $w_i$ denote the $i$-th codevector and $V_i$ be the quantization region around it. As a result of quantization, every entry in the input sequence located within the region $V_i$ is represented by the codeword $w_i$. The distortion introduced during quantization can be measured as the mean squared error (MSE) between the input and output sequences of the quantizer. Here we use a dimensionless quantity $D$ known as MSE per dimension given by
where $x$ is the input sequence and $p(x)$ is the probability distribution function of $x$. For many applications, an optimization of quantization means an appropriate selection of codevectors $w_i$ with the aim of minimizing distortion. Thus, it is important to determine the minimum distortion $D^*$ subject to the set of codevectors $w_i$ $(0 \le i \le N - 1)$:
$$D^* = \inf_{w_i} D.$$
Solving this equation for each lattice will enable the choice of an appropriate lattice for quantization of the source $x$. It is evident that the distortion depends not only on the position of the codevectors in the quantization space but also on their quantity. Therefore, to allow a correct comparison of the performance of various lattices, the number of lattice points $N$ used for quantization should be fixed for every lattice. We know that the quantization regions $V_i$ formed by a lattice are congruent to its fundamental region $V$. Often, in order to fit $N$ quantization regions in a quantization space $\Omega$, quantization regions are scaled
where q is the scaling factor. It was shown by Zador (1982) that for large N the following equality is satisfied:
where $r$ is set to 2 for the MSE measure, and $G_P(n, r)$ is known as the dimensionless second moment of a polytope $P$. In the case of lattice quantization, the polytope $P$ becomes a quantization region of a lattice. After setting $r$ to 2, the moment $G_V(n, 2)$ depends only on $n$, which is the dimensionality of the input $x$. Assume that $w_i$ are selected so that the distortion $D^*$ is minimized. Thus, $G_V(n, 2)$ can be calculated as
In the preceding equation $\int_{\Omega} p(x)^{n/(n+2)}\,dx$ has been replaced by $\sum_{i} \int_{V_i} p(x)^{n/(n+2)}\,dx$, because the quantization space is entirely covered with the quantization regions. The values of the second moment $G_V(n, 2)$ can be tabulated for each lattice by letting $x$ be uniformly distributed over the quantization space $\Omega$, that is, $p(x) = \text{const}$. After setting $p(x)$ to a constant, Eq. (93) becomes
Because all quantization regions $V_i$ are congruent to the fundamental region of a lattice we have
$$\int_{V_i} dx = \mathrm{vol}(V_i) = \mathrm{vol}(V) \qquad (95)$$
where $V$ is the fundamental region of a lattice. Also, provided that $w_i$ are located at the centroids of the corresponding quantization regions, the distortions produced by quantization of the uniformly distributed source $x$ within all quantization regions are equivalent. Finally, assuming that the centroid $w$ of the fundamental region is located at the origin, Eq. (94) can be simplified as follows:
Table V, reproduced from Conway and Sloane (1993), shows the dimensionless second moment $G_V(n, 2)$ for various popular lattices. Equation (92) can now be rewritten as
TABLE V
DIMENSIONLESS SECOND MOMENT $G_V(n, 2)$ FOR SELECTED LATTICES. (Source: Conway and Sloane, 1993. Reprinted with permission.)

n     Lattice               G_V(n, 2)
5     $A_5$                 .077647
5     $A_5^*$               .076922
5     $D_5$                 .075786
5     $D_5^*$               .075625
6     $E_6$                 .074347
6     $E_6^*$               .074244
7     $E_7$                 .073231
7     $E_7^*$               .073116
8     $A_8$                 .077391
8     $A_8^*$               .075972
8     $D_8$                 .075914
8     $D_8^*$               .074735
8     $E_8$, $E_8^*$        .071682
12    $K_{12}$              .070100
16    $\Lambda_{16}$        .068299
24    $\Lambda_{24}$        .065771
According to the scaling algorithm presented in Sect. VII, every input vector $x$ is multiplied by the scaling factor $1/s$ before it is fed into the quantizer.
Thus, for the minimum distortion, D*, we have
Suppose that the lattice points cover the quantization space densely enough to assume that the PDF of the input is uniform within each quantization region. With this assumption, Eq. (98) can be simplified further as follows:
where $p(w_i)$ is the probability of the source at the centroid of the $i$-th quantization region. In the preceding derivation we have assumed that all quantization regions are congruent and the PDF within each of them is constant; thus, Eq. (95) is satisfied. Using Eq. (95), we can rewrite Eq. (99) in the following form:
It is easy to see that the value of $s^n \mathrm{vol}(V)$ in the foregoing equation is, effectively, the normalized volume of the fundamental region of a lattice, and the scaling factor $s$ corresponds to the scaling factor $q$ in Eq. (91). An
appropriate selection of the factor $s$ will ensure that the normalized volumes of the fundamental regions of all lattices are equal to each other, and that a collection of $N$ lattice points completely covers the quantization space $\Omega$. A connection between $s$ and $N$ can also be seen from the fact that $s$ is given by the scaling equation (87) and $m$ in Eq. (87) is chosen so that $N = \sum_{i=0}^{m} A_i$. Here $N$ denotes the number of points that belong to a lattice. For quantization, lattice points become codevectors. In Eq. (100) the codevectors are denoted by $w_i$ with $i$ between 1 and $N$. As mentioned in Sect. VI, the number of lattice points is given by the theta series equation (74). Therefore, Eq. (100) can be modified in the following way:
(101)

with $w_k^T w_k = k$ and $A_k$ being the coefficients of the theta series. Equation (101) gives a good approximation of the distortion measure provided that the following assumptions are fulfilled:

• quantization codevectors are densely distributed in the quantization space; and
• the probability distribution within each quantization region is almost uniform.
In order to select an optimal lattice for quantization of a source, one should fix the value of $N$ and then solve Eq. (100) for every lattice with $p(\cdot)$ set to the PDF of the source. The final selection of a lattice for quantization does not depend on the actual values of the distortion measure $D^*$ across the range of lattices, but rather on which lattice attains the smallest $D^*$. In this context, it appears valuable to note that after fixing $N$ for some $n$-dimensional quantization space $\Omega$, the term $N^{-2/n}\,(s^n \mathrm{vol}(V))^{(n+2)/n}$ becomes constant:
$$C = N^{-2/n}\,\bigl(s^n \mathrm{vol}(V)\bigr)^{(n+2)/n} \qquad (102)$$
Thus, we have
$$D^*(N, \Omega) = C\, G_V(n, 2) \left( \sum_{k=0}^{m} A_k\, p(s w_k)^{n/(n+2)} \right)^{(n+2)/n} \qquad (103)$$
As our primary concern here is quantization of wavelet coefficients, the PDF that must be used in Eq. (103) is the PDF of wavelet coefficients. Recall from Sect. I1 that the PDF of wavelet coefficients can be modeled by the Gaussian-type function. Naturally, we can use this model in Eq. (103).
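As an illustration of how the shell-wise sum in Eq. (103) might be tabulated, the following Python sketch evaluates $D^*/C$ from a list of theta series coefficients. It assumes a spherically symmetric PDF supplied as a function of the radius, so that every codevector on the shell of squared norm $k$ contributes $p(s\sqrt{k})$; the function name and its parameters are hypothetical, and the sketch is not claimed to reproduce the numerical values quoted later in this section.

```python
def distortion_over_c(theta, pdf_at_radius, s, n, g):
    """Evaluate G * (sum_k A_k * p(s*sqrt(k))**(n/(n+2)))**((n+2)/n).

    theta         -- theta series coefficients A_0, ..., A_m (shell populations)
    pdf_at_radius -- assumed radially symmetric PDF, as a function of the radius
    s             -- scaling factor between lattice and source coordinates
    n             -- lattice dimension
    g             -- dimensionless second moment G_V(n, 2) of the lattice
    """
    exponent = n / (n + 2)
    total = sum(a_k * pdf_at_radius(s * k ** 0.5) ** exponent
                for k, a_k in enumerate(theta) if a_k)
    return g * total ** ((n + 2) / n)
```

Because $C$ is the same for every lattice once $N$ and $\Omega$ are fixed, comparing the returned values across lattices is enough to rank them.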
Consider the following example of selecting a 2-D lattice for quantization of the lowest frequency band of the wavelet coefficients. Assume that we have a bank of two lattices, namely lattices $Z_2$ and $A_2$. As shown in Eq. (7), for the lowest frequency band of the wavelet coefficient matrix (texture) the Generalized Gaussian function (GGF) is reduced to the Gaussian PDF. Assuming that the Differential Pulse Code Modulation (DPCM) predictor used in the lowest frequency band removes correlation between neighboring samples, we have (104)
given that the standard deviation $\sigma = 1$. For the preceding Gaussian PDF, the distortion measure can be expressed in the following way:
where $p(s\sqrt{j})$ is the probability of the source vector having the squared norm of $s^2 j$:
where $j$ is the squared norm of $x$. For an estimation of the distortion measure of Eq. (105) it is required to select the scaling factor $s$. In Sect. VII, $s$ was defined as a scaling factor for mapping the outer sphere of the quantization space $\Omega$ onto the outer sphere composed of the lattice points to be used for quantization. After testing, it was found that a good choice of the outer sphere of the quantization space for wavelet coefficients is one that includes between 70% and 80% of the wavelet coefficients. This corresponds to selecting the value of $E$ in Eq. (87) to be approximately $(1.3\sigma)^2$, where $\sigma$ is the standard deviation. Recall that $\sigma$ was chosen to be 1 in Eq. (104). In order to calculate $s$ specified in Eq. (87), it is required to select the number of lattice points/codevectors used for quantization; this number is denoted by $N$. As $N = \sum_{i=0}^{m} A_i$, from $N$ we can find $m$, which is needed in Eq. (87). As already mentioned, $A_i$ are coefficients of the theta series, which, for the lattices $Z_2$ and $A_2$ used in this example, are given in Table VI. Let us set $N = 45$, for example. Thus, we have $\sum_{i=0}^{13} A_i = 45$ for the lattice $Z_2$ and $\sum_{i=0}^{12} A_i = 43 \approx 45$ for the lattice $A_2$. Using this, the scaling factors for the lattices $Z_2$ and $A_2$ become $s = 1.3/\sqrt{13}$ and $s = 1.3/\sqrt{12}$, respectively.
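The step of finding $m$ from $N$ amounts to accumulating theta series coefficients until the running total is as close as possible to the target codebook size. A minimal Python sketch follows; the helper name `shells_for_codebook` is hypothetical, and the two coefficient lists are the shell populations of $Z_2$ and $A_2$ for squared norms 0 to 16.

```python
def shells_for_codebook(theta, target):
    """Return (m, N) with N = A_0 + ... + A_m the cumulative count closest to target.

    Ties and empty shells resolve to the smallest m that attains the count.
    """
    best_m, best_n, running = 0, theta[0], 0
    for m, a_m in enumerate(theta):
        running += a_m
        if abs(running - target) < abs(best_n - target):
            best_m, best_n = m, running
    return best_m, best_n

# Numbers of lattice points on the shells of squared norm 0, 1, ..., 16.
THETA_Z2 = [1, 4, 4, 0, 4, 8, 0, 0, 4, 4, 8, 0, 0, 8, 0, 0, 4]
THETA_A2 = [1, 6, 0, 6, 6, 0, 0, 12, 0, 6, 0, 0, 6, 12, 0, 0, 6]

print(shells_for_codebook(THETA_Z2, 45))  # (13, 45)
print(shells_for_codebook(THETA_A2, 45))  # (12, 43)
```

For $N = 45$ this gives the outer shells $m = 13$ for $Z_2$ and $m = 12$ for $A_2$, with 45 and 43 codevectors respectively, which are the counts used in the example.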
TABLE VI
NUMBERS OF POINTS OF THE LATTICES $Z_2$ AND $A_2$

m     $Z_2$   $A_2$
0     1       1
1     4       6
2     4       0
3     0       6
4     4       6
5     8       0
6     0       0
7     0       12
8     4       0
9     4       6
10    8       0
11    0       0
12    0       6
13    8       12
14    0       0
15    0       0
16    4       6
Now, Eq. (105) can be tabulated for the test lattices. The results are
$$\frac{D^*_{Z_2}(45, \Omega)}{C} = 17.2136$$
and
$$\frac{D^*_{A_2}(45, \Omega)}{C} = 15.7095.$$
In conclusion, by quantization of the wavelet coefficients with the $A_2$ lattice one may expect to obtain approximately 10% improvement in the mean-squared error compared with quantization of the same source with the $Z_2$ lattice. Unsurprisingly, this result coincides with the result obtained by Conway and Sloane (1993) for a uniformly distributed source. For a uniformly distributed source, the benefit of using $A_2$ over $Z_2$ is approximately 4%.
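The two percentages can be checked with a few lines of arithmetic. The sketch below uses the tabulated values 17.2136 and 15.7095 from the example, and, for the uniform source, the standard closed-form second moments of the square cell ($1/12$) and of the regular hexagonal cell ($5/(36\sqrt{3})$); these two closed forms are supplied here as known values and are not taken from the text above.

```python
import math

G_Z2 = 1 / 12                      # second moment of the square cell (integer lattice)
G_A2 = 5 / (36 * math.sqrt(3))     # second moment of the regular hexagonal cell

def relative_gain(d_reference, d_new):
    """Fractional reduction in distortion when moving from the reference to the new lattice."""
    return 1 - d_new / d_reference

uniform_gain = relative_gain(G_Z2, G_A2)            # uniform source
gaussian_gain = relative_gain(17.2136, 15.7095)     # Gaussian example, N = 45

print(f"uniform source:  {100 * uniform_gain:.1f}%")
print(f"Gaussian source: {100 * gaussian_gain:.1f}%")
```

The uniform-source gain comes out just under 4%, and the Gaussian-source gain close to 9%, which the text rounds to approximately 10%.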
IX. ENTROPY CODING OF LATTICE VECTORS
In an image compression system lattice quantization is typically followed by an entropy coder, which exploits the unevenness of the PDF of lattice vectors. Therefore, an entropy coder requires knowledge of the probability of each codevector before encoding. In many systems, the codevectors together with their probabilities are included in the output bitstream. This results in an undesirable increase in the bit rate. As the lattice quantizer has a regular structure, all codevectors can be generated by the decoder from the fundamental region of the lattice. What remains to be included in the bitstream is the probability for each vector. However, as we know that the probability distribution of the wavelet coefficients has the form of the generalized Gaussian function, the probability of a codevector $w_i$ can be approximated as
$$P(w_i) = s^n\,\mathrm{vol}(V)\,p(w_i), \quad \forall\, 1 \le i \le N \qquad (107)$$
where $N$ is the number of lattice points used for quantization and $p(\cdot)$ is the generalized Gaussian function with a mean of 0 and a standard deviation equal to that of the wavelet coefficients. This estimation can be done independently at the encoder and decoder using the standard deviation of the coefficients, which can be included in the bitstream without noticeable overhead.
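A small Python sketch of how encoder and decoder could both rebuild the codevector probabilities from the standard deviation alone is given below. It is an illustration under assumptions: the lattice is the integer lattice in two dimensions (so $\mathrm{vol}(V) = 1$), the PDF is the Gaussian special case, the density is evaluated at the codevector mapped back to source coordinates, $s\,w_i$, and the values are renormalized so that they sum to one over the finite codebook. The function names are hypothetical.

```python
import math

def z2_codebook(m):
    """All points of the integer lattice Z^2 with squared norm <= m."""
    r = math.isqrt(m)
    return [(x, y)
            for x in range(-r, r + 1)
            for y in range(-r, r + 1)
            if x * x + y * y <= m]

def codevector_probabilities(points, s, sigma):
    """Approximate P(w_i) as (cell volume) * (Gaussian density at s * w_i), renormalized."""
    cell = s ** 2                                   # s^n * vol(V) with n = 2, vol(V) = 1
    norm = 2 * math.pi * sigma ** 2
    raw = [cell * math.exp(-(s * s * (x * x + y * y)) / (2 * sigma ** 2)) / norm
           for x, y in points]
    total = sum(raw)
    return [p / total for p in raw]

points = z2_codebook(13)
probs = codevector_probabilities(points, s=1.3 / math.sqrt(13), sigma=1.0)
```

Since both sides can run the same computation from `sigma` and `s`, no per-codevector probability table needs to be transmitted; the resulting values can drive any table-based entropy coder.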
Although the preceding approach of tabulating the probability of each codevector is correct, it appears rather demanding to enumerate all lattice points with their probabilities. It was shown in the previous sections that the generalized Gaussian function with an appropriately selected parameter $\zeta$ closely resembles the PDF of the wavelet coefficients: $\zeta$ is set to 2 for the lowest frequency band and $\zeta = 0.7$ for the remaining bands. When $\zeta = 2$ the function becomes the well-known Gaussian and, consequently, the regions of equal probability are located on spheres centered around the origin, which is 0 in the case of wavelet coefficients. If $\zeta$ equals 0.7 the equiprobable surfaces become somewhat distorted spheres. Obviously, the farther the value of $\zeta$ departs from 2, the more distant from a spherical shape the equiprobable surfaces become. For instance, when $\zeta$ is 1, that is, the Laplacian PDF, the equiprobable surfaces are pyramids (Fischer, 1986). The number of lattice points on each equiprobable surface is given by the theta series specified in Eq. (74). Provided that we have a theta series for each lattice, instead of enumerating the lattice points as in the foregoing, we can enumerate spheres and calculate the probability of any vector, say a representative, located on each sphere. The remaining vectors have the same probability as the representatives corresponding to them. In order to accomplish the preceding enumeration we need a theta series for each lattice. Let us define a $\zeta$-norm of an $M$-dimensional vector $x$ as
then it defuzzifies all states into classical logic variables.

4. $F_A(\cdot, \cdot)$ and $F_B(\cdot, \cdot)$ can be simple operations on fuzzy sets, for example, union, intersection, algebraic product, algebraic sum, bounded sum, and bounded difference. They may also be complicated operations, for example, similarity between two fuzzy numbers. $F_A(\cdot, \cdot)$ and $F_B(\cdot, \cdot)$ may also be any combination of the foregoing operations. They can use any number of entries in $A_f$ and $B_f$ and use any $y_{kl}$ and $u_{kl}$ in $N_r(i, j)$. It is not necessary to keep the tradition of the conventional CNN, where every entry in $A$ or $B$ can multiply only a $y_{kl}$ or a $u_{kl}$, because the relation between the (local) structure (synaptic weights) and the (local) information flow (inputs, state variables, and outputs) is the only thing we are concerned about in an FCNN.
5. The general frame given by Eqs. (26) and (27) can be easily generalized to delay-type FCNN (DFCNN) and discrete-time FCNN (DTFCNN).
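The simple fuzzy-set operations named in item 4 can be written out pointwise on membership grades in $[0, 1]$. The sketch below uses the standard textbook definitions of these operations; the function names are hypothetical and are not taken from the text.

```python
def fuzzy_union(a, b):
    """Standard fuzzy union (logical OR): pointwise maximum."""
    return max(a, b)

def fuzzy_intersection(a, b):
    """Standard fuzzy intersection (logical AND): pointwise minimum."""
    return min(a, b)

def algebraic_product(a, b):
    return a * b

def algebraic_sum(a, b):
    return a + b - a * b

def bounded_sum(a, b):
    return min(1.0, a + b)

def bounded_difference(a, b):
    return max(0.0, a - b)
```

Each function maps two membership grades to a membership grade, so any of them, or any composition of them, can play the role of a local operator acting on the values found in a neighborhood.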
TAO YANG
C. Classification of FCNN

Fortunately, from 1974 on there has been much literature on fuzzy (artificial) neural networks (FNN) (Lee and Lee, 1974). According to the method presented by Buckley and Hayashi (1996), we can lump FNN into 3 types: 1) Type-I FNN, which has real signal and fuzzy weight (Yamakawa, 1990; Yamakawa and Furukawa, 1992); 2) Type-II FNN, which has fuzzy signal and real weight (Jang and Sun, 1995; Shann and Fu, 1995); and 3) Type-III FNN, which has fuzzy signal and fuzzy weight (Ishibuchi, 1993; Hayashi et al., 1993; Blanco et al., 1995a,b; Pedrycz, 1991; Furukawa and Yamakawa, 1995). This classification can also be applied to FCNN. As FCNN is a generalization of CNN from classical sets to fuzzy sets, it is not surprising that almost all conclusions and applications of the conventional CNN can be generalized. Corresponding to the classification presented in Sect. I.B.2, the FCNN structures can also be classified in that way. With the ability to interpret linguistic statements, fuzzy set theory embeds the ability to process linguistic inputs into FCNN. From this point of view, FCNN can model not only structures of neural systems, as the conventional CNN does, but also the behaviors and functions of neural systems, namely, cognitive processes. From these statements one can see that the classification of FCNN can be done in both the CNN and the FNN direction. There exist some differences between FNN and FCNN. In an FCNN, the cell property is not necessarily space-invariant, which means that FCNN has more possibilities than FNN. For example, we can define a type-IV FCNN while there does not exist a corresponding type-IV FNN. A type-IV FCNN may contain all kinds of cells that belong to any type-I, -II, and -III FCNN. This structure gives the type-IV FCNN the ability to process both signals (real numbers) within some neighborhood systems and more general information flow (e.g., conceptual or linguistic variables) within some other neighborhood systems. The type-II FCNN is the closest one to the conventional CNN because it has real weights. Moreover, the type-II FCNN has the simplest structure for VLSI implementation, which is why most FCNN results are focused on this type.

D. Different Structures of FCNN
Since it is impossible to study all kinds of FCNNs (as defined in Eqs. (26) and (27)) in a single section, we will focus on some simple cases in which only fuzzy logical OR ($\vee$ or MAX) and fuzzy logical AND ($\wedge$ or MIN) are integrated. On the other hand, MAX and MIN are the simplest fuzzy union and intersection operations that can be implemented by using VLSI technologies.
FUZZY CELLULAR NEURAL NETWORKS AND THEIR APPLICATIONS
The structure of FCNN is a tradeoff between VLSI implementation and general function. For the purpose of VLSI implementation, the FCNN proposed here integrates the fuzzifier, the defuzzifier, and the fuzzy inference engine into a planar structure. The nonlinear dynamics of the conventional CNN are kept in the FCNN structure. In this section, we give structures of type-I and type-II FCNNs. Also, only 2D cases are given.

1. Type-I FCNN

A cell $C_{ij}$ in an $M \times N$ type-I FCNN is defined by:

State equation

Output equation

Input equation
$$u_{ij} = E_{ij}, \quad 1 \le i \le M,\ 1 \le j \le N.$$
Constraint conditions
Parameter assumptions
Boundary conditions and initial conditions.
(33)
In a type-I FCNN, there exist fuzzy synaptic weights $\tilde{A}_f(i, j; k, l)$ and $\tilde{B}_f(i, j; k, l)$. (In this section, we use the symbol "~" over a character to denote a fuzzy number.) The relation between a fuzzy feedback synaptic weight and an output is defined by the membership function $\mu_{\tilde{A}_f(i,j;k,l)}(y_{kl})$. The relation between a fuzzy feedforward synaptic weight and an input is defined by the membership function $\mu_{\tilde{B}_f(i,j;k,l)}(u_{kl})$. The inputs and outputs are crisp variables in a type-I FCNN.

Remark: Fuzzy synaptic weights introduce a set of nonlinear synaptic laws into a type-I FCNN. In the general case, the concept of a "template," which is very useful in the conventional CNN (Chua and Yang, 1988a), is not suitable for describing fuzzy synaptic laws.
2. Type-II FCNN

A cell $C_{ij}$ in an $M \times N$ type-II FCNN is defined by:
State equation
Output equation
Input equation
Constraint conditions
Parameter assumptions
Boundary conditions and initial conditions.
where $\mu_u(\cdot)$ and $\mu_y(\cdot)$ are two membership functions. In a type-II FCNN, all synaptic weights are crisp. Inputs and outputs are supposed to be fuzzy. They are described by the membership functions $\mu_u(\cdot)$ and $\mu_y(\cdot)$. Observe that $\mu_y(\cdot)$ corresponds to the output function in a conventional CNN. The preceding type-II FCNN is sometimes called the multiplicative type-II FCNN. Correspondingly, there exists an additive type-II FCNN whose state equation is given by:
A subclass of the additive type-I1 FCNN has been found to be a universal paradigm for implementing mathematical morphology operations (Yang and Yang, 1997d,e).
3. Simple Cases of Type-I and Type-II FCNN

In type-I and type-II FCNNs, $F_A(\cdot)$ and $F_B(\cdot)$ denote two fuzzy local operators defined in $N_r(i, j)$, which may be any fuzzy logical expression combined by fuzzy OR "$\vee$" and fuzzy AND "$\wedge$". For example, suppose $F_A(\cdot)$ denotes the following fuzzy logical expression in a 1-neighborhood system:
where $x_{ij}$ denotes a fuzzy variable (for example, the gray value of a pixel $(i, j)$ in an image). Then we have
A simple and most commonly used type-I FCNN is given by
$$\cdots + \bigwedge_{C_{kl} \in N_r(i,j)} \mu_{\tilde{A}_{f\min}(i,j;k,l)}(y_{kl}) + \bigvee_{C_{kl} \in N_r(i,j)} \mu_{\tilde{A}_{f\max}(i,j;k,l)}(y_{kl}) + \bigwedge_{C_{kl} \in N_r(i,j)} \mu_{\tilde{B}_{f\min}(i,j;k,l)}(u_{kl}) + \bigvee_{C_{kl} \in N_r(i,j)} \mu_{\tilde{B}_{f\max}(i,j;k,l)}(u_{kl}), \quad 1 \le i \le M,\ 1 \le j \le N \qquad (45)$$
If FCNN is space-invariant, then in view of the method used in conventional CNN (Chua and Yang, 1988a), Eq. (45) can be rewritten as the following 2D convolution form:
$$\cdots + \tilde{A}_{f\min}\, O_{\min}\, y_{ij} + \tilde{A}_{f\max}\, O_{\max}\, y_{ij} + \tilde{B}_{f\min}\, O_{\min}\, u_{ij} + \tilde{B}_{f\max}\, O_{\max}\, u_{ij}, \quad 1 \le i \le M,\ 1 \le j \le N \qquad (46)$$
where $*$ denotes a 2D convolution. $A$ and $B$ are the feedback and feedforward templates, respectively, and $\tilde{A}_{f\min}$, $\tilde{A}_{f\max}$, $\tilde{B}_{f\min}$, and $\tilde{B}_{f\max}$ are the fuzzy feedback MIN template, fuzzy feedback MAX template, fuzzy feedforward MIN template, and fuzzy feedforward MAX template, respectively. $O_{\max}$ denotes a 2D operation as shown in the following example:
$$\tilde{A}_{f\max}\, O_{\max}\, y_{ij} = \bigvee_{C_{kl} \in N_r(i,j)} \mu_{\tilde{A}_{f\max}(i,j;k,l)}(y_{kl}) \qquad (47)$$
and $O_{\min}$ denotes a 2D operation as shown by:
$$\tilde{A}_{f\min}\, O_{\min}\, y_{ij} = \bigwedge_{C_{kl} \in N_r(i,j)} \mu_{\tilde{A}_{f\min}(i,j;k,l)}(y_{kl}). \qquad (48)$$
From the preceding one can see that the fuzzy templates $\tilde{A}_f$ and $\tilde{B}_f$ are local fuzzy patterns that are fuzzified for the purpose of fitting a looser relation between fuzzy templates and signal patterns. A simple and most commonly used type-II FCNN is given by
where $A_{f\min}$, $A_{f\max}$, $B_{f\min}$, and $B_{f\max}$ are the feedback MIN template, feedback MAX template, feedforward MIN template, and feedforward MAX template, respectively. $O_{\max}$ denotes a 2D operation as shown in the following example:
$$A_{f\max}\, O_{\max}\, y_{ij} = \bigvee_{C_{kl} \in N_r(i,j)} A_{f\max}(i, j; k, l)\, y_{kl} \qquad (51)$$
and $O_{\min}$ denotes a 2D operation as shown in the following example:
$$A_{f\min}\, O_{\min}\, y_{ij} = \bigwedge_{C_{kl} \in N_r(i,j)} A_{f\min}(i, j; k, l)\, y_{kl}. \qquad (52)$$
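The MAX and MIN template operations of the type-II FCNN can be illustrated on a small grid. The Python sketch below assumes a space-invariant template stored as a $(2r+1) \times (2r+1)$ array and simply skips neighbors that fall outside the grid; both choices, and the function names, are assumptions made for the illustration rather than details given in the text.

```python
def _weighted_neighbors(template, y, i, j, r):
    """Yield template(i, j; k, l) * y[k][l] for every cell C_kl of N_r(i, j) inside the grid."""
    rows, cols = len(y), len(y[0])
    for k in range(max(0, i - r), min(rows, i + r + 1)):
        for l in range(max(0, j - r), min(cols, j + r + 1)):
            yield template[k - i + r][l - j + r] * y[k][l]

def fuzzy_max_op(template, y, i, j, r=1):
    """Fuzzy OR (MAX) of the weighted outputs in the r-neighborhood of cell (i, j)."""
    return max(_weighted_neighbors(template, y, i, j, r))

def fuzzy_min_op(template, y, i, j, r=1):
    """Fuzzy AND (MIN) of the weighted outputs in the r-neighborhood of cell (i, j)."""
    return min(_weighted_neighbors(template, y, i, j, r))

outputs = [[0.1, 0.5, 0.2],
           [0.9, 0.3, 0.4],
           [0.0, 0.6, 0.8]]
flat = [[1.0] * 3 for _ in range(3)]      # a flat template with all weights equal to 1

print(fuzzy_max_op(flat, outputs, 1, 1))  # 0.9
print(fuzzy_min_op(flat, outputs, 1, 1))  # 0.0
```

With a flat unit template the two operations reduce to a local maximum and a local minimum over the neighborhood, which is the familiar gray-scale dilation and erosion with a flat structuring element.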
E. Differences between FCNN and FNN

The FCNN structure differs significantly from other FNN structures. In an FCNN, the fuzzifier layer and the defuzzifier layer, which always appear in a standard FNN structure, are embedded into a single layer. This planar structure is well suited to 2D VLSI implementation because links between two chips are avoided. Thus an FCNN universal cell should be programmable for different membership functions and allow some basic programs of fuzzy operations (relational computation). The most significant characteristic of FCNN is the local connectedness of cells. Any FNN structure that is based on local connectedness will fall in the range of the concept of FCNN. To understand the differences between FCNN and FNN, we first show how we can get FCNN structures from the crossover of FNN and CNN. A typical FNN has three layers as shown in Fig. 4a. The first layer is used to give crisp
FIGURE 4. The FCNN structure is a crossover of the conventional CNN and FNN. (a) Typical FNN structure. (b) Typical conventional CNN structure. (c) The structure of FCNN. (d) The planar topological structure of a cell in FCNN for easy VLSI implementation.
inputs some fuzzy measurements. The nonlinearities in the neurons of this layer are some membership functions. The second layer is used to calculate the relationship between different fuzzy variables from the first layer. Fuzzy computations are embedded into the nonlinearities of this layer. The third layer is used to give some crisp forms of outputs. If the inputs are already fuzzy
variables (e.g., the gray value of a pixel in a digital image), then the first layer can be eliminated. Also, if we need some fuzzy outputs (e.g., outputs that are used by a high-level AI system), then the third layer can be eliminated. Figure 4b shows the typical structure of a single-layer conventional CNN. Observe that this single-layer CNN contains an input function sublayer, a cell dynamics sublayer, and an output function sublayer. In a conventional CNN, the input function is usually a linear function and the output function is a piecewise linear function. Of course, we can use more complicated nonlinearities as input and output functions. We can now show how to combine the structures in Figs. 4a and b into FCNN, which is shown in Fig. 4c. By using different membership functions as input functions, we embed the fuzzifier layer of FNN into the input function sublayer of FCNN. We combine the cell dynamics of CNN and the fuzzy operations of FNN into a fuzzy-crisp mixed dynamical sublayer that constitutes the FCNN cell dynamics. The fuzzy-crisp mixed dynamics enable our FCNN structure to solve both crisp and fuzzy problems. Finally, we choose the nonlinearities of the defuzzifier layer of FNN as the output functions of FCNN. The output function sublayer is equivalent to the defuzzifier layer of FNN. Of course, FCNN structures inherit the property of local connectedness from CNN. Figure 4d shows that the sublayers in Fig. 4c can actually be fabricated onto a planar silicon chip. In this structure a region in each cell is used to fabricate a photodiode as an embedded sensor. Then the fuzzifier, the fuzzy inference engine, and the defuzzifier are fabricated in different subregions within the cell.
III. THEORY OF FUZZY CELLULAR NEURAL NETWORKS
In this section, we study the elementary theory of different kinds of FCNN. We always present the results for type-I1 FCNN first and then present the corresponding results of type-I FCNN.
A. Elementary Theory
In this section, we study the dynamical range of type-I and -II FCNN. The dynamical range is important to a physically implemented FCNN because once we know the dynamical range we can choose the power supply and physical structures of the FCNN. Also, the existence of an equilibrium point for type-I and -II FCNN is needed to guarantee correct operation of the FCNN.
1. Dynamical Range of Type-II FCNN

To guarantee that FCNNs can be implemented by physical systems, we should study the dynamic range. In this section, we study the dynamic range of the type-II FCNN in Eq. (49). First, we need the following definition.
Definition 7. Dissipative FCNN: Let $E$ be a compact set in $\mathbb{R}^{MN}$; if all solutions of an FCNN finally fall into $E$ and stay in $E$, then this FCNN is called a dissipative FCNN.

Remark: The dynamical range of a dissipative FCNN is $E$.

Let
i
when(k, I ) = (i, j ) and A(i, j ; k, 1 ) 5 0 when(k, I ) = (i, j ) and A(;, j; k, I ) > 0, or, (k, 1 ) # ( i , j ) (53) then we have the following theorem. ''I
=
0, I,
Theorem 1. The type-II FCNN in Eq. (49) is a dissipative FCNN and all its solutions with any initial conditions $x_{ij}(0)$ finally fall into the following compact set:

where
where $x = \mathrm{col}(x_{11}, \ldots, x_{MN})$.
Proof ( 1 ) We construct a radially unbounded positive definite function VI =
. 1
c M
N
Differentiating $V_1$ along the solution of Eq. (49), and as $|y_{ij}| \le 1$ and $|\mu_u(u_{ij})| \le 1$, we have
1.I
J
The last inequality is satisfied when $x \in \Omega_1$. (2) We then construct $MN$ radially unbounded Lyapunov functions with respect to the state variable $x_{ij}$ as follows:
where $\mathrm{sgn}(\cdot)$ is the signum function. Along the solution of Eq. (49), we calculate the Dini upper-right differential as
the last inequality is satisfied when $x \in \Omega_2$. Thus, the solution of Eq. (49) will fall in $\mathbb{R}^{MN}/\Omega_2$, and fall in $E$ and stay in $E$. If $x(0) \in E$, then $x(t) \in E$, $\forall t > 0$. So, $E$ is an $\omega$-invariant set; in $\mathbb{R}^{MN}/E$ there exists no stable equilibrium point of the FCNN in Eq. (49). □
Remark: This theorem gives the dynamical range of an FCNN; in practical circuit design, we can choose the correct parameters to guarantee that the FCNN can work in the typical voltage range of power supply in IC circuits. The following theorem guarantees that the FCNN in Eq. (49) has at least one equilibrium point.
Theorem 2. The FCNN in Eq. (49) has at least one equilibrium point.

Consider the following vector operator:
where
Q
z
I+ v max
+
R.,
i/
IBf m,tn(i.j ; k , 111 Q,
1)
IBj rn,x(i,j ; k , l)1
Ci/ E N , ( , . , 1
CIlEN,(I J )
(65)

Then the vector operator $\Phi$ maps the following set
$$S = \{x \mid |x_{ij}| \le Q,\ 1 \le i \le M,\ 1 \le j \le N\} \qquad (66)$$
into itself. As $S$ is a convex compact set, from Brouwer's fixed-point theorem, we know that $\Phi : S \mapsto S$ has at least one fixed point $x = x^*$, and $x^*$ is an equilibrium point of the FCNN in Eq. (49). □

2. Dynamical Range of Type-I FCNN
Theorem 3. The type-I FCNN in Eq. (45) is a dissipative FCNN and any of its solutions with any initial condition $x_{ij}(0)$ will fall into the following compact set:
$$\mathbb{R}^{MN}/\Omega_1 \cap \mathbb{R}^{MN}/\Omega_2 \qquad (67)$$
where
where $x = \mathrm{col}(x_{11}, \ldots, x_{MN})$.
Proof (1) We construct a radially unbounded positive definite Lyapunov function l
v ,= 2c
-
cc M
N
.
l=l
XI',
J=I
(70)
< 0.
(71)
The last inequality is satisfied when $x \in \Omega_1$. (2) We then construct $MN$ radially unbounded Lyapunov functions with respect to the state variable
where $\mathrm{sgn}(\cdot)$ is the signum function. Along the solution of Eq. (45), we calculate the Dini upper-right differential as
the last inequality is satisfied when $x \in \Omega_2$. So, the solution of Eq. (45) will fall in $\mathbb{R}^{MN}/\Omega_2$, and fall in $E$ and stay in $E$. If $x(0) \in E$, then $x(t) \in E$, $\forall t > 0$. So, $E$ is an $\omega$-invariant set; in $\mathbb{R}^{MN}/E$ there exists no stable equilibrium point of the FCNN in Eq. (45). The following theorem guarantees that the type-I FCNN has at least one equilibrium point.
Theorem 4. The FCNN in Eq. (45) has at least one equilibrium point. Proof
Letting the right-hand side of Eq. (45) be 0, then we have
Consider the following vector operator:
where
and $x = \mathrm{col}(x_{11}, \ldots, x_{MN})$,
Let
Then the vector operator $\Phi$ maps the following set

into itself. Since $S$ is a convex compact set, from Brouwer's fixed-point theorem, we know that $\Phi : S \mapsto S$ has at least one fixed point $x = x^*$, and $x^*$ is an equilibrium point of the FCNN in Eq. (45). □

B. Global Stability
1. Results for Type-II FCNN

As each pixel of an input image can be viewed as a fuzzy singleton, we can choose $\mu_u(\cdot)$ as $\mu_u(x) = x$; then the type-II FCNN in Eq. (49) can be rewritten as

The output equation of $C_{ij}$ is given by
Constraint conditions are given by
Parameter assumptions are as follows:
In state equation (79), if no fuzzy logical relation exists between two cells $C_{ij}$ and $C_{kl}$, then we say that the fuzzy connections between them are nonexistent; otherwise we say that the fuzzy connections between them are existent. We only study the FCNN with flat fuzzy feedback MIN templates and flat fuzzy feedback MAX templates. A flat fuzzy feedback MIN template is defined as
$$A_{f\min}(i, j; k, l) = \alpha, \quad \forall C_{kl} \in N_r(i, j) \text{ and } A_{f\min}(i, j; k, l) \text{ are existent} \qquad (83)$$
where $\alpha$ is a constant. A flat fuzzy feedback MAX template is defined as
$$A_{f\max}(i, j; k, l) = \beta, \quad \forall C_{kl} \in N_r(i, j) \text{ and } A_{f\max}(i, j; k, l) \text{ are existent} \qquad (84)$$
where $\beta$ is a constant. Then Eq. (79) can be rewritten as
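where
$$\alpha_{ij} = \begin{cases} \alpha, & \text{if the corresponding } A_{f\min}(i, j; k, l) \text{ is existent,} \\ \text{undefined}, & \text{if the corresponding } A_{f\min}(i, j; k, l) \text{ is nonexistent,} \end{cases} \qquad (86)$$
$$\beta_{ij} = \begin{cases} \beta, & \text{if the corresponding } A_{f\max}(i, j; k, l) \text{ is existent,} \\ \text{undefined}, & \text{if the corresponding } A_{f\max}(i, j; k, l) \text{ is nonexistent.} \end{cases} \qquad (87)$$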
From parameter assumptions in Eq. (82), we have
We then have the following proposition:
Proposition 1. Suppose that $x$ and $x'$ are two states of the FCNN in Eq. (85); then we have (1)
then we have the following theorem.
Theorem 5. Suppose that the spectral radius of the matrix $R_x|A|$ satisfies $\rho(R_x|A|) < 1$; then the type-II FCNN in Eq. (85) has only one equilibrium point, and this equilibrium point is globally stable.
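As a hedged numerical illustration of the condition in Theorem 5, the following sketch checks $\rho(B) < 1$ for a nonnegative matrix $B$ standing in for $R_x|A|$ and, when the condition holds, computes one positive weight vector $p$ from $(E - B)^T p = \mathbf{1}$. The function names, the example matrix, and the choice of the transposed system are assumptions made for illustration; they are not taken from the chapter.

```python
import numpy as np

def spectral_radius(b):
    """Largest eigenvalue modulus of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(b))))

def positive_weights(b):
    """For nonnegative b with rho(b) < 1, return p > 0 solving (E - b)^T p = 1.

    Returns None when the spectral-radius condition fails.
    """
    b = np.abs(np.asarray(b, dtype=float))   # stands in for R_x|A| (assumption)
    if spectral_radius(b) >= 1.0:
        return None
    e = np.eye(b.shape[0])                   # identity matrix E
    # (E - b)^{-1} = sum_k b^k is entrywise nonnegative, so p >= 1 > 0.
    return np.linalg.solve((e - b).T, np.ones(b.shape[0]))

B = np.array([[0.2, 0.3],
              [0.1, 0.4]])                   # hypothetical example values
p = positive_weights(B)
```

The positivity of `p` follows from the Neumann series of $(E - B)^{-1}$, which is why the spectral-radius condition is enough to guarantee the existence of such weights.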
Proof The existence of an equilibrium point of the FCNN in Eq. (85) is guaranteed by Theorem 2. Now we only need to prove that the FCNN has fewer than two equilibrium points. Setting the right-hand side of Eq. (85) to 0, we have
Let $x^{(1)} = \mathrm{col}(x_1^{(1)}, \ldots, x_{MN}^{(1)})$ and $x^{(2)} = \mathrm{col}(x_1^{(2)}, \ldots, x_{MN}^{(2)})$ be two solutions of Eq. (85); then we have
The second inequality is in view of Proposition 1.
We then rewrite Eq. (99) into a vector form as follows:
As $\rho(R_x|A|) < 1$, we have
We then have $x^{(1)} = x^{(2)}$,
which yields that the FCNN in Eq. (85) has only one equilibrium point, $x^*$. As $x^*$ is the only equilibrium point, following Eq. (85) we have
As $\rho(R_x|A|) < 1$, $(E - R_x|A|)$ is an M-matrix, where $E$ is the identity matrix. So, there exists a group of positive constants $p_i > 0$, $i = 1, 2, \ldots, MN$, such that
$j = 1, 2, \ldots, MN$. We construct the following Lyapunov function:
$V(x) = C \sum_{i=1}^{MN} p_i |x_i - x_i^*|$.
When $x = x^*$, we have $V(x) = 0$. When $|x_i - x_i^*| \to +\infty$, we have $V(x) \to +\infty$.
Along the solution of Eq. (103), we calculate the Dini upper-right differential of $V(x)$ as
$D^+ V(x)|_{\text{Eq. (103)}} < 0$. (106)
The second inequality is in view of Proposition 1. The second equality is in view of the parameter assumption in Eq. (88). The last inequality is satisfied when $x \neq x^*$. Similarly, we can have the following theorem.
Theorem 6. Suppose that the following matrix
is a Hurwitz matrix; then the equilibrium point $x = x^*$ is globally stable. Here
$\delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$ (107)
Proof Since the matrix of Theorem 6, whose off-diagonal entries are $(1 - \delta_{ij})(|a_{ij}| + |\alpha_{ij}| + |\beta_{ij}|)$, is a Hurwitz matrix, we have that its negative is an M-matrix. Then, from the properties of M-matrices, there exists a group of positive constants $p_i > 0$, $i = 1, 2, \ldots, MN$, such that the $p_i$-weighted sum over $i$ of the entries in column $j$ of that matrix is negative for $j = 1, 2, \ldots, MN$. (108)
We construct the following Lyapunov function:
(109)
where sgn(·) is the signum function. When $x = x^*$, we have $V(x) = 0$. When $|x_i - x_i^*| \to +\infty$, we have $V(x) \to +\infty$. Along the solution of Eq. (103), we calculate the Dini upper-right differential of $V(x)$ as
$D^+ V(x)|_{\text{Eq. (103)}} < 0$. (110)
To get the first inequality, observe that
$a_{jj}(f(x_j) - f(x_j^*))\,\mathrm{sgn}(x_j - x_j^*) = a_{jj}|f(x_j) - f(x_j^*)|$. (111)
The second inequality is in view of Proposition 1. To get the third inequality, observe that
$-|x_j - x_j^*| \le -|f(x_j) - f(x_j^*)|$. (112)
The second equality is from the parameter assumption in Eq. (88). The last inequality is satisfied when $x \neq x^*$.
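The two observations used in this proof, Eqs. (111) and (112), can be checked numerically. The sketch below assumes the standard piecewise-linear CNN output function $f(x) = \frac{1}{2}(|x + 1| - |x - 1|)$, which matches the three output regimes ($-1$, $x_i$, and $1$) described in the local stability discussion; the sampling range and the variable names are illustrative assumptions.

```python
import numpy as np

def f(x):
    """Piecewise-linear cell output: -1 for x <= -1, x for |x| < 1, 1 for x >= 1."""
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=1000)        # sampled states (illustrative)
x_star = rng.uniform(-3.0, 3.0, size=1000)   # sampled reference states

diff_f = f(x) - f(x_star)
# Pattern of Eq. (111): f is nondecreasing, so the sign factor turns the
# difference of outputs into its absolute value.
sign_identity_holds = np.allclose(diff_f * np.sign(x - x_star), np.abs(diff_f))
# Pattern of Eq. (112): f is 1-Lipschitz, so |f(x) - f(x*)| <= |x - x*|.
lipschitz_holds = bool(np.all(np.abs(diff_f) <= np.abs(x - x_star) + 1e-12))
```

A random check of this kind does not prove the inequalities, but it shows which two properties of $f$ (monotonicity and unit slope bound) the Lyapunov argument relies on.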
2. Results for Type-I FCNN
We recast the type-I FCNN in Eq. (45) into
(113)
We need the following proposition.
Proposition 2. Assume that for two points $x^{(1)}$ and $x^{(2)}$ there exists a $k$ such that
then we have
Proof
1. Assume that there exists an h such that
We have
2. Assume that there exists an h such that
We have
then we have the following theorem.
Theorem 7. Suppose that the spectral radius of the matrix $R_x|A|$ satisfies $\rho(R_x|A|) < 1$; then the type-I FCNN in Eq. (113) has only one equilibrium point, and this equilibrium point is globally stable.
Proof The existence of an equilibrium point of the FCNN in Eq. (113) is guaranteed by Theorem 4. Now we only need to prove that the FCNN has fewer than two equilibrium points. Setting the right-hand side of Eq. (113) to 0, we have
Let $x^{(1)} = \mathrm{col}(x_1^{(1)}, \ldots, x_{MN}^{(1)})$ and $x^{(2)} = \mathrm{col}(x_1^{(2)}, \ldots, x_{MN}^{(2)})$ be two solutions of Eq. (113); then we have
The second inequality is in view of Proposition 2. The rest of the proof is similar to that of Theorem 5. Similarly to Theorem 6, we have the following theorem.
Theorem 8. Suppose that the following matrix
is a Hurwitz matrix; then the equilibrium point $x = x^*$ is globally stable. Here
$\delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$
C. Local Stability
1. Results for Type-II FCNN
In the FCNN as in Eq. (85), the state variable of each cell, $x_i$ ($i = 1, \ldots, MN$), can stay in three different intervals: $(-\infty, -1]$, $(-1, 1)$, and $[1, \infty)$, which correspond to three different cell outputs: $-1$, $x_i$, and $1$. So, the state space of the FCNN can be divided into $3^{MN}$ separated regions, $D_i$ ($i = 1, \ldots, 3^{MN}$). Each $D_i$ is an $MN$-dimensional hypercube. Suppose that $x^*$ is an equilibrium point of the FCNN in Eq. (85), and it is an inner point of $D_k$. Then there exists a neighborhood of $x^*$, which is in the interior of $D_k$. Let
be the biggest hyperball in $D_k$. We can rewrite Eq. (85) in $G$ as follows:
then we have the following theorem.
Theorem 9. Suppose that the following matrix
is a Hurwitz matrix; then the equilibrium point $x = x^*$ is asymptotically stable in the basin of attraction $G$. Here
$\delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$ (131)
and
Proof
Since
is a Hurwitz matrix, we know that
is an M-matrix. Then, from the properties of M-matrices, there exists a group of positive constants $p_i > 0$, $i = 1, 2, \ldots, MN$, such that
We construct the following Lyapunov function:
where sgn(·) is the signum function. Along the solution of Eq. (130), we calculate the Dini upper-right differential of $V(x)$ as
$D^+ V(x)|_{\text{Eq. (130)}} < 0$. (135)
The second inequality is in view of Proposition 1. The last equality is from the parameter assumption in Eq. (88). The last inequality is satisfied when $x \neq x^*$ and $x$ is in the hyperball $G$. So, $G$ is the basin of attraction of the equilibrium point $x^*$.
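The partition of the state space into $3^{MN}$ regions used in this subsection can be made concrete with a small sketch. It labels each cell state by the interval it lies in and combines the labels into a base-3 index; the digit convention (0, 1, 2) and the function names are assumptions made for illustration, not notation from the chapter.

```python
import numpy as np

def region_digits(x):
    """Per-cell label: 0 if x_i <= -1, 1 if -1 < x_i < 1, 2 if x_i >= 1."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= -1.0, 0, np.where(x >= 1.0, 2, 1))

def region_index(x):
    """Base-3 index in [0, 3**MN) of the region containing the state x."""
    idx = 0
    for d in region_digits(x).ravel():
        idx = 3 * idx + int(d)
    return idx

# All regions of a two-cell network (MN = 2), one sample state per region.
samples = [(a, b) for a in (-2.0, 0.0, 2.0) for b in (-2.0, 0.0, 2.0)]
indices = sorted(region_index(s) for s in samples)
```

Two states share a region index exactly when every cell output is in the same regime, which is the situation in which the local analysis of Theorem 9 treats the output slopes as fixed.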
2. Results for Type-I FCNN
Similarly to Theorem 9, we can get the following theorem for type-I FCNN.
Theorem 10. Suppose that the following matrix
$+ \big((1 - \delta_{ij}) f'(x_j^*)(|a_{ij}| + \alpha_{ij} + \beta_{ij})\big)_{MN \times MN}$ (137)
is a Hurwitz matrix; then the equilibrium point $x = x^*$ is asymptotically stable in the basin of attraction $G$. Here
and
D. Type-II Delay-Type FCNN
An $M \times N$ type-II DFCNN is described by the following state equation:
We repack the state variables $x_{ij}$ into a vector $x$ of size $n = MN$. Similarly, the input and output variables $u_{ij}$ and $y_{ij}$ are repacked into $u$ and $y$ using the same labeling order. The initial conditions for a DFCNN are given by
We assume that $x_{0ij}(t)$ is a continuous function. Then we recast the state equations (140) into the following functional differential equations (FDE):
$C\dot{x} = F(t, x_t)$
where $x_t \in \mathcal{C}$ is defined as $x_t(\theta) = x(t + \theta)$, $\theta \in [-\tau, 0]$.
1. Existence and Uniqueness of Solutions
Proposition 3. Given the initial condition
$x_0(t) = \phi(t)$, $\phi(t) \in \mathcal{C}$, (144)
then the DFCNN in Eq. (142) has a unique continuous solution for $t \in [0, \infty)$.
Proof We need to show that Eq. (142) has a unique solution. First we show that $F(t, x_t)$ is globally Lipschitzian, that is,
$|F(t, \psi) - F(t, \phi)| \le L|\psi - \phi|$ for all $\psi, \phi \in \mathcal{C}$ and all $t$ (145)
for some constant $L$. If we define
then $L$ qualifies as our Lipschitz constant. As the input is continuous, $F(t, \psi)$ is continuous with respect to $t$ for all $\psi$. The conclusion then follows from work by Driver (1977, pp. 308-309).
Proposition 4. If the initial conditions are bounded by $K > 0$, then all states $x_{ij}$ of the type-II DFCNN in Eq. (140) are bounded for all time in absolute value by the sum
$M = K + R_x|I| + R_x \max_{i,j} \sum_{C_{kl} \in N_r(i,j)} \big(|A^{\tau}(i, j; k, l)| + |A(i, j; k, l)| + |B(i, j; k, l)|\big)$
Proof It is sufficient to follow the proof of Theorem 1 by Chua and Yang (1988b) to see that in this case it is also possible to recast the equations of the network in the same form as that of their Eq. (4a), as follows here:
(148)
where $f_{ij}$ depends only on $y_{kl}(t)$ and $y_{kl}(t - \tau)$ and $g_{ij}$ depends only on the inputs and the bias, and for both it is possible to compute an upper bound in the same way as was done by Chua and Yang (1988b).
2. Stability Results
Given two points $x, x^* \in \mathbb{R}^{MN}$, define the function $\Sigma: \mathbb{R}^{MN} \mapsto \mathbb{R}^{MN}$ such that $\Sigma(x - x^*) = (\sigma(x_i - x_i^*))_{MN \times 1}$, where $\sigma$ is defined by
$\sigma(x_i - x_i^*) \triangleq f(x_i) - f(x_i^*)$.
Suppose $x^*$ is an equilibrium point and let $w = \{w_i\}_{MN \times 1} \triangleq x - x^*$; then Eq. (140) can be rewritten into
Also, we study the stability of the type-II DFCNN with flat fuzzy templates. Similar to that in Proposition 1, we have the following proposition.
Proposition 5. Suppose $x$ and $x'$ are two solutions of the type-II DFCNN in Eq. (150); then we have (1)
Proof Let the right-hand side of Eq. (150) be zero, then we have the corresponding equilibrium equation as
At the equilibrium point, we have
then we can recast Eq. (154) into
then we have
In view of $|\sigma(x) - \sigma(y)| \le |x - y|$, and using a process similar to that in the proof of Theorem 5, we complete the proof.
Let $H = (h_{ij})_{MN \times MN}$ satisfy
Then we have the following theorem.
Theorem 12. The origin of Eq. (150) is globally asymptotically stable if $H$ is a nonsingular M-matrix.
Proof
We construct the following Lyapunov function:
where $p_i > 0$, $i = 1, 2, \ldots, MN$, are constants. Along the solution of Eq. (150), we calculate the Dini upper-right differential of $V(w(t))$ as
(160)
In view of $|\sigma(w_i)| \le |w_i|$ and using a process similar to that in the proof of Theorem 5 and in view of the parameter assumptions in Eq. (124), we have
The last inequality is satisfied when w # 0.
E. Type-I DFCNN
An $M \times N$ type-I DFCNN is described by the following state equation:
Similarly we have the following two propositions:
Proposition 6. Given the initial condition
then the DFCNN in Eq. (162) has a unique continuous solution for $t \in [0, \infty)$.
Proposition 7. If the initial conditions are bounded by $K > 0$, then all states $x_{ij}$ of the type-I DFCNN in Eq. (162) are bounded for all time in absolute value by the sum
$M = K + R_x|I| + 6R_x$ (164)
and the $\omega$-limit points of $x_{ij}(t)$ are bounded in absolute value by $(M - K)$.
By repacking $x = \{x_{ij}\}_{M \times N}$ into a 1D vector $x = \{x_i\}_{MN \times 1}$, we can rewrite Eq. (162) as
Suppose $x^*$ is an equilibrium point and let $w = \{w_i\}_{MN \times 1} \triangleq x - x^*$; then Eq. (165) can be rewritten into
Similar to that in Proposition 2, we have the following proposition.
Proposition 8. Assume that for two points $x^{(1)}$ and $x^{(2)}$ there exists a $k$ such that
then
Similarly, assume that there exists an $l$ such that
then we have
Proof It is the same as those in Theorems 7 and 11.
Letting $H_M = (h_{ij})_{MN \times MN}$ satisfy
then we have the following theorem.
Theorem 14. The origin of Eq. (166) is globally asymptotically stable if $H_M$ is a nonsingular M-matrix.
Proof Similar to that of Theorem 12.
F. Stability of Discrete-Time FCNN
The discrete-time fuzzy cellular neural network (DTFCNN) is a very important branch of FCNN. The DTFCNN is governed by a set of difference equations. In this section, we present the structures of DTFCNN and provide some stability criteria for them. The dynamics of a cell $C_{ij}$ in an $M \times N$ DTFCNN is given by:
1. State equation
where $t \in \mathbb{N}$ is the discrete time; $u_{ij}$, $x_{ij}$, and $y_{ij}$ are input, state, and output, respectively.
2. Output equation
$y_{ij}(t) = f(x_{ij}(t))$. (179)
3. Parameter assumptions
$A_1(i, j; k, l) = A_1(k, l; i, j)$, $A_2(i, j; k, l) = A_2(k, l; i, j)$. (180)
Let $x_{ij}^*$ be an equilibrium point of the DTFCNN; letting $e_{ij}(t) = x_{ij}(t) - x_{ij}^*$, we can recast Eq. (178) into
Similar to Corollary 1 in Yang and Yang (1996), we need the following proposition.
Proposition 9. Let $\{x_{ij}\}$ and $\{x'_{ij}\}$ be two states of the DTFCNN in Eq. (178); then we have
(183)
Proof
Similar to that of Corollary 1 from Yang and Yang (1996).
Theorem 15. The equilibrium point of the DTFCNN in Eq. (178), $\{x_{ij}^*\}$, is asymptotically stable if
where $L > 0$ is a constant.
Proof
We define a Lyapunov function as
Taking the forward difference of V along the solution of Eq. (181) we get
In view of Proposition 9, we have
Observe that if
then $\Delta V$ is negative, which implies the asymptotic stability of the equilibrium point $\{x_{ij}^*\}$. With the symmetric property of the neighborhood system and the parameter assumptions, we can recast Eq. (190) into
DTFCNN can be used in some typical applications of FCNN. Furthermore, DTFCNN can also be used to model some discrete-time and spatially distributed phenomena such as highway traffic flow (Yang et al., 1998).
IV. FCNN AS COMPUTATIONAL ARRAYS
In this section, we present those FCNN structures that function as computational arrays. As the most commonly used interpretations of fuzzy AND and fuzzy OR are minimum and maximum calculations and as mathematical morphology (Serra, 1982, 1988; Heijmans, 1992; Haralick et al., 1987) is closely connected with fuzzy logic, FCNN is a paradigm for implementing morphological operators. The applications of FCNN as computational min/max networks are presented in this section. We show that FCNN can function as low-level computational structures just as conventional CNN does. The advantage of applying FCNN to image processing problems is that type-II FCNN can implement max and min operations in a more natural and efficient way than the conventional CNN does.
A. Basic Knowledge of Mathematical Morphology
Mathematical morphology (Serra, 1982, 1988; Heijmans, 1992) is a theory that deals with the processing and analysis of images, using operators and functionals based on topological and geometrical concepts. During the last decade, it has become a cornerstone of image processing. Morphological operations have been widely used for object recognition (Shih and Mitchell, 1988 a, b), edge detection (Lee et al., 1987), shape analysis (Pitas and Venetsanopoulos, 1990), thinning (Jang and Chin, 1990), image coding (Goutsias and Schonfeld, 1989; Maragos and Schafer, 1986), and smoothing (Jang and Chin, 1989). The four basic transformations in mathematical morphology are dilation, erosion, opening, and closing. These basic transformations permit extraction
of contours and skeletons, separation of close objects, and computation of geodesic distances, etc. (Serra, 1982, 1988; Heijmans, 1992). The basic idea of mathematical morphology is to probe an image with a structuring element and to quantify the manner in which the structuring element fits (or does not fit) within the image. In general, the structuring element has a simple shape and is very small compared to the image being investigated. We let $f: X \mapsto E$ and $s: S \mapsto E$ be maps representing the image and the structuring element, respectively, where $E$ is the range of gray values, $X$ is a gray-scale image, and $S$ is a weighted structuring element. Then the basic morphological operations of erosion and dilation for gray-scale images are given by Haralick et al. (1987):
Gray-scale Erosion
$(X \ominus S)(x) = \min\{f(x + z) - s(z)\}$ for all $z \in S$ and $x + z \in X$.
Gray-scale Dilation
$(X \oplus S)(x) = \max\{f(x - z) + s(z)\}$ for all $z \in S$ and $x - z \in X$.
With the definitions of gray-scale erosion and gray-scale dilation, gray-scale opening and gray-scale closing are given by
Gray-scale Opening
$X \circ S = (X \ominus S) \oplus S$. (194)
Gray-scale Closing
$X \bullet S = (X \oplus S) \ominus S$. (195)
For the purposes of implementing gray-scale morphological operations by FCNN, $E$ is normalized within $[0, 1]$. For example, let $S$ be within a 3 × 3 square with its origin located at the center as follows:
$S = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}$ (196)
where $h_1$ to $h_9$ denote the gray values of the corresponding entries of the structuring element. An entry in a structuring element is defined if there exists an operation and is undefined if there exists no operation (or simply set an undefined entry as $+\infty$ in erosion and as $-\infty$ in dilation). A structuring element whose defined entries have the same gray value is called a flat structuring element.
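The four basic gray-scale operations can be sketched directly in software, independently of any FCNN implementation. The sketch below assumes the Haralick-style min/max definitions, restricts the min and max to positions inside the image, and marks undefined structuring-element entries with NaN so that they are skipped; the function names and the random test image are illustrative assumptions.

```python
import numpy as np

def _window(padded, r, dy, dx, shape):
    """View of the padded image whose (i, j) entry is the original pixel (i+dy, j+dx)."""
    h, w = shape
    return padded[r + dy:r + dy + h, r + dx:r + dx + w]

def erode(img, se):
    """Gray-scale erosion: min over defined z of f(x + z) - s(z)."""
    img = np.asarray(img, dtype=float)
    r = se.shape[0] // 2
    padded = np.pad(img, r, mode="constant", constant_values=np.inf)
    out = np.full(img.shape, np.inf)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            s = se[dy + r, dx + r]
            if np.isnan(s):                  # undefined entry: no operation
                continue
            out = np.minimum(out, _window(padded, r, dy, dx, img.shape) - s)
    return out

def dilate(img, se):
    """Gray-scale dilation: max over defined z of f(x - z) + s(z)."""
    img = np.asarray(img, dtype=float)
    r = se.shape[0] // 2
    padded = np.pad(img, r, mode="constant", constant_values=-np.inf)
    out = np.full(img.shape, -np.inf)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            s = se[dy + r, dx + r]
            if np.isnan(s):
                continue
            out = np.maximum(out, _window(padded, r, -dy, -dx, img.shape) + s)
    return out

def opening(img, se):
    return dilate(erode(img, se), se)        # erosion followed by dilation

def closing(img, se):
    return erode(dilate(img, se), se)        # dilation followed by erosion

S = np.array([[0.02, 0.1, 0.02],
              [0.1,  0.2, 0.1],
              [0.02, 0.1, 0.02]])            # example non-flat structuring element
image = np.random.default_rng(1).random((6, 6))   # gray values in [0, 1)
```

With these definitions the opening never exceeds the image and the closing never falls below it, which is a convenient sanity check for any implementation, including a network-based one.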
B. Implementation of Morphological Operations
As type-II FCNN is a combination of min and max operations with parallel dynamics, it is very convenient to implement morphological operations in type-II FCNN. Another reason for using FCNN in mathematical morphology is that, given a relatively small structuring element, the morphological operations have strong local properties. Moreover, a big structuring element can sometimes be decomposed into a set of smaller structuring elements. This makes possible the application of FCNNs with 3 × 3 or 5 × 5 neighborhoods to image processing problems where large structuring elements are needed. The FCNN is found to be a universal parallel array to implement morphological operations for processing both binary and gray-scale images (Yang and Yang, 1997 d, e). In this section, we use different FCNN structures to implement the basic morphological operations. Although the results presented in this section are based on FCNN, the corresponding DTFCNN structures can also be used.
1. Using Multiplicative Type-II FCNN
The following multiplicative FCNN (Yang and Yang, 1996) is used to implement a morphological operator with a flat structuring element:
The parameters for implementing erosion with a flat structuring element are given by
$R_x = 1$, $I = -h$, $B_{f\max} = 0$, $B_{f\min} = S'$ (198)
where $h$ is the height of the flat structuring element $S$, and $S'$ is given by substituting the defined entries in $S$ by 1s. The parameters for implementing dilation with a flat structuring element are given by
$R_x = 1$, $I = h$, $B_{f\max} = S'_D$, $B_{f\min} = 0$ (199)
where $S'_D$ is given by substituting the defined entries in $S_D$ by 1s and $S_D = \{-x : x \in S\}$.
An FCNN is called multiplicative if it has multiplicative fuzzy synaptic laws. An FCNN is called additive if it has additive fuzzy synaptic laws.
For example, letting $S$ be that in Eq. (196), then $S'_D$ is given by
As an example, Fig. 5 shows output results of the aforementioned two FCNNs with the following zero-height flat structuring element:
$S' = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$, $h = 0$.
The parameters for the erosion FCNN are given by
The output of this erosion FCNN is shown in Fig. 5b. The parameters for the dilation FCNN are given by
The output of this dilation FCNN is shown in Fig. 5c. As defined, a gray-scale opening can be implemented by an erosion FCNN followed by a dilation FCNN. The result of FCNN-based flat opening is shown in Fig. 5d. A gray-scale closing can be implemented by a dilation FCNN followed by an erosion FCNN. The result of FCNN-based flat closing is shown in Fig. 5e.
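The template construction described for Eqs. (198) and (199) can be sketched as two small array operations: replacing defined entries by 1s to obtain $S'$, and reflecting the structuring element through its origin to obtain $S_D$. The sketch assumes that undefined entries are stored as NaN and that the origin is the center of a square array; both conventions and the function names are assumptions for illustration.

```python
import numpy as np

def flat_template(se):
    """S': defined entries become 1; undefined (NaN) entries stay undefined."""
    se = np.asarray(se, dtype=float)
    return np.where(np.isnan(se), np.nan, 1.0)

def reflect(se):
    """S_D = {-x : x in S}: reflect entry positions through the central origin."""
    return np.asarray(se, dtype=float)[::-1, ::-1]

nan = np.nan
S_example = np.array([[nan, 0.5, nan],
                      [0.5, 0.5, nan],
                      [nan, nan, nan]])      # hypothetical flat element of height 0.5
S_prime = flat_template(S_example)           # template used for erosion
S_prime_D = flat_template(reflect(S_example))  # template used for dilation
```

For a symmetric structuring element such as the full 3 × 3 square used for Fig. 5, the reflection leaves the template unchanged, so $S'$ and $S'_D$ coincide.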
2. Using Additive Type-II FCNN
The following additive FCNN is used to implement erosion and dilation with any structuring element besides flat ones (Yang and Yang, 1997 d, e; Yang et al., 1996 d):
The additive FCNN for implementing erosion has the following parameters:
$R_x = 1$, $B_{f\max}$ = undefined, $B_{f\min} = -S$. (205)
FIGURE 5. FCNN-based mathematical morphological operations with a zero-height 3 × 3 full-scale flat structuring element. (a) Original image. (b) Output of erosion FCNN. (c) Output of dilation FCNN. (d) Output of FCNN-based opening. (e) Output of FCNN-based closing.
We call the preceding FCNN an erosion FCNN. Image $X$ is its input and its initial state is arbitrary. When we say a template is "undefined," it means that the template is not used by the FCNN. The additive FCNN for implementing dilation is given by
$R_x = 1$, $B_{f\min}$ = undefined, $B_{f\max} = S_D$. (206)
FIGURE 6. Implementation of gray-scale morphological operations using additive type-II FCNN. (a) Output of erosion FCNN. (b) Output of dilation FCNN. (c) Output of gray-scale opening. (d) Output of gray-scale closing.
We call this FCNN a dilation FCNN. Image $X$ is its input and its initial state is arbitrary. Figure 6 shows examples of implementing basic morphological operations using additive type-II FCNN. The structuring element is given by
$S = \begin{bmatrix} 0.02 & 0.1 & 0.02 \\ 0.1 & 0.2 & 0.1 \\ 0.02 & 0.1 & 0.02 \end{bmatrix}$
The original image is the same as that in Fig. 5a. Figure 6a shows the output of the erosion FCNN. Figure 6b shows the output of the dilation FCNN. Figure 6c shows the output of the dilation FCNN when the image in Fig. 6a is the input. Thus, it is the opening of the original image. Figure 6d shows the output of the erosion FCNN when the image in Fig. 6b is the input. Thus, it is the closing of the original image.
Shortly after its invention (Yang and Yang, 1996), many applications of FCNN-based mathematical morphology were found in image processing. Some typical applications include gray-scale reconstruction (Yang and Yang, 1997e), Euclidean distance transformation (Yang and Yang, 1997d), fuzzy shrinking and expanding (Yang et al., 1998g), medial axis transformation (Yang et al., 1998g), and edge detection under low SNR conditions (Yang and Yang, 1997f). The FCNN-based mathematical morphological operators were also proved to be more robust and reliable than those based on conventional CNNs (Yang et al., 1998f). A comprehensive survey on applications of FCNN to image processing can be found in a technical report by Yang et al. (1997i).
C. MIN/MAX CNN
MIN/MAX CNN, which consists only of local MIN and MAX operations, is a special case of type-II FCNN obtained by eliminating either the multiplication or addition operation between the weights and the inputs (respectively, outputs). The state equation of a discrete-time MIN/MAX CNN (MMCNN) is given by
$x(i, j) = k_1 \min_{(k,l) \in N_1} u(i + k, j + l) + k_2 \max_{(k,l) \in N_2} u(i + k, j + l)$, $1 \le i \le M$, $1 \le j \le N$. (208)
$N_1$ and $N_2$ in Eq. (208) are two spheres of influence, henceforth called neighborhood patterns, which give the arguments of the MIN and MAX operators, respectively. $N_1$ and $N_2$ may be the same or different. Figure 7 shows three typical neighborhood patterns, which can also be respectively represented in the following forms
(-1, -1) (1, -1) (-1, 0) (1, 0) (-1, 1) (1, 1)
where (·,·) denotes the Cartesian coordinate (with respect to the center) of the artificial synapse, and the symbol "0" denotes that the weight (a PN junction) does not exist (for a nonprogrammable chip) or is off (for a programmable chip) at the indicated position. Because comparing operations are much easier to implement than arithmetic operations given the same accuracy level, and as local MIN and MAX
FIGURE 7. Some typical neighborhood patterns used in MMCNN.
operators are widely used in image processing based on gray-scale mathematical morphology operations, MIN/MAX CNN has a very high silicon area efficiency and yet performs many primary image preprocessing tasks. A schematic circuit implementation of the local MIN and MAX operations is shown in Fig. 8a and b, respectively. A possible CMOS current-mode MAX circuit is given in Fig. 8c (Baturone et al., 1997), which shows that far fewer transistors are needed. As the MMCNN chip is technically simple yet highly efficient, its functions should be specially designed for different image processing tasks. In this section we design some typical image processing tasks for MMCNN chips. There are two kinds of design methods. The first is an exact design method, which we apply for mathematical morphology operations, rank filters, and range operations. All of these operations use MIN/MAX operators. The second is an approximate design method, which we apply for Laplacian kernels, averaging operations, and orientation derivatives. The nature of these operations is totally different from the MIN/MAX operators. The additional optimization needed in the second class is implemented by a learning algorithm presented in Sect. IV. C. 6. By choosing $k_1 = 1$ and $k_2 = 0$, an MMCNN becomes an erosion network, whose structuring element is given by $N_1$ (the domain). Similarly, by choosing $k_1 = 0$ and $k_2 = 1$, an MMCNN becomes a dilation network, whose structuring element is decided by $N_2$ (reflected with respect to the origin).
FIGURE 8. Circuit implementation of the MIN and MAX operations. (a) The schematic circuit of the MAX operation. (b) The schematic circuit of the MIN operation. (c) A practical CMOS current-mode MAX circuit.
1. Designing MMCNN for Approximating Laplacian Operator
A 3 × 3 Laplacian operator is given by
$\begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}$
The corresponding neighborhood patterns of the MIN and MAX operators in the corresponding MMCNN, henceforth called a full-scale neighborhood pattern, are given by
$N_1 = N_2 = \begin{Bmatrix} (-1,-1) & (0,-1) & (1,-1) \\ (-1,0) & (0,0) & (1,0) \\ (-1,1) & (0,1) & (1,1) \end{Bmatrix}$. (210)
We search the $k_1$-$k_2$ plane for the regions that minimize the error between the output of the MMCNN and that of the Laplacian operator. From prior knowledge, we know that we should search in regions where $k_1 < 0$ and $k_2 > 0$. Figure 9 shows the search results. The corresponding output of the Laplacian operator is shown in Fig. 9a. Figure 9c shows the error surface with $k_1 \in (-4, 0)$ and $k_2 \in (0, 4)$. This region is divided into 50 × 50 small rectangular regions. The minimum value is found at the point $(k_1, k_2) = (-0.72, 0.72)$ with $E = 505.946899$. The corresponding output of the MMCNN is shown in Fig. 9b. Comparing the results in Figs. 9a and b, we find that the latter is a "blurred" (lowpass filtered) version of the former. This is consistent with the rather big error $E$ in this case.
2. Designing MMCNN for Approximating Averaging Operator
A 3 × 3 averaging operator is given by
$\begin{bmatrix} 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \end{bmatrix}$
FIGURE 9. The distribution of the error between the output of the MMCNN and the Laplacian operator with different $k_1$ and $k_2$ values. All images in this figure are shown in negative exposure. (a) The output of the Laplacian operator. (b) The output of the MMCNN with minimum error. (c) The error distribution on the $k_1$-$k_2$ parameter plane.
The corresponding neighborhood patterns of the MIN and MAX operators in the corresponding MMCNN are given by
Figure 10 shows the search result. The corresponding output of the averaging operator is shown in Fig. 10a. Figure 10c shows the error surface with $k_1 \in (0, 1)$ and $k_2 \in (0, 1)$. This region is divided into 30 × 30 small rectangular regions. The minimum value is found at the point $(k_1, k_2) = (0.5, 0.5)$ with $E = 13.7573$. The corresponding output of the MMCNN is shown in Fig. 10b. As we can expect from the very small $E$, the two images in Figs. 10a and b are almost the same. Of course, this result is image-dependent. For images having very few "smooth" segments, the error $E$ will become larger.
FIGURE 10. Error distribution between the output of the MMCNN and the averaging operator with different $k_1$ and $k_2$ values. (a) The output of the averaging operator. (b) The output of the MMCNN with minimum error. (c) The error distribution on the $k_1$-$k_2$ parameter plane.
3. Designing MMCNN for Approximating Horizontal Derivative
A horizontal derivative with a 3 × 3 kernel is given by
(213)
The neighborhood patterns for the MIN and MAX operators in the corresponding MMCNN are given by
We then search for the minimum point on the $k_1$-$k_2$ plane. Figure 11 shows the search results. The corresponding output of the horizontal derivative is shown in Fig. 11a. Figure 11c shows the error surface with $k_1 \in (-3, 0)$ and $k_2 \in (0, 3)$. This region is divided into 30 × 30 small rectangular regions. The minimum value is found at the point $(k_1, k_2) = (-1.5, 1.5)$ with $E = 391.307098$. The corresponding output of the MMCNN is shown in Fig. 11b. Here we present only the derivative kernel for the horizontal derivative. The other MIN/MAX derivative kernels are given in Sect. IV. C. 4.
FIGURE 11. Error distribution between the output of the MMCNN and the horizontal derivative operator with different $k_1$ and $k_2$ values. All images in this figure are shown in negative exposure. (a) The output of the horizontal derivative operator. (b) The output of the MMCNN with minimum error. (c) The error distribution on the $k_1$-$k_2$ parameter plane.
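The parameter searches described for the Laplacian, averaging, and horizontal derivative operators can be sketched as a brute-force grid search over $(k_1, k_2)$. The sketch assumes full-scale 3 × 3 neighborhoods, an edge-replicated border, and a sum-of-squared-differences error; the definition of $E$ is not given in this section, so that choice, like the function names, is an assumption for illustration only.

```python
import numpy as np

def windows3x3(u):
    """Stack the nine 3x3-neighborhood shifts of u (edge-replicated border)."""
    p = np.pad(np.asarray(u, dtype=float), 1, mode="edge")
    h, w = np.shape(u)
    return np.stack([p[a:a + h, b:b + w] for a in range(3) for b in range(3)])

def mmcnn(u, k1, k2):
    """k1 * local MIN + k2 * local MAX over the full-scale neighborhood."""
    w = windows3x3(u)
    return k1 * w.min(axis=0) + k2 * w.max(axis=0)

def grid_search(u, target, k1_values, k2_values):
    """Return (error, k1, k2) minimizing the assumed squared error on the grid."""
    w = windows3x3(u)
    lo, hi = w.min(axis=0), w.max(axis=0)    # computed once, reused for every pair
    best = (np.inf, None, None)
    for k1 in k1_values:
        for k2 in k2_values:
            err = float(np.sum((k1 * lo + k2 * hi - target) ** 2))
            if err < best[0]:
                best = (err, float(k1), float(k2))
    return best

u = np.random.default_rng(2).random((8, 8))  # stand-in test image
target = mmcnn(u, 0.5, 0.5)                  # synthetic target with a known answer
ks = np.linspace(0.0, 1.0, 11)
best_err, best_k1, best_k2 = grid_search(u, target, ks, ks)
```

Because the local minimum and maximum images do not depend on $k_1$ and $k_2$, they are computed once and only the weighted combination is re-evaluated at each grid point, which keeps a 30 × 30 or 50 × 50 search inexpensive.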
4. Designing MMCNN for Approximating Orientation Derivatives
The orientation derivatives are collections of orientation-dependent high-pass filters. A combination of different kinds of orientation derivatives usually gives a much better edge detector than a single orientation-independent high-pass filter, such as the Laplacian operator. In this section we show the results of different orientation derivatives using different neighborhood patterns. The neighborhood patterns, each corresponding to one of the 8 compass directions, are listed here:
1.
2.
4.
5.
6.
$N_1 = \begin{Bmatrix} (-1,-1) & (0,-1) & (1,-1) \\ (-1,0) & 0 & 0 \\ (-1,1) & 0 & 0 \end{Bmatrix}$, $N_2 = \begin{Bmatrix} 0 & 0 & (1,-1) \\ 0 & 0 & (1,0) \\ (-1,1) & (0,1) & (1,1) \end{Bmatrix}$. (220)
There are many other neighborhood patterns, including combinations of these patterns. Some simulation results are shown in Fig. 12. Observe that the details of all of the results differ from each other even though the corresponding neighborhood patterns differ only slightly. Although the orientation derivatives are standard image operations, implementing them in real time is not a trivial problem; MMCNN provides a possible real-time solution to this problem.
5. Universe of MMCNN Functions in the $k_1$-$k_2$ Plane
In this section we show that the two tunable parameters kl and k2 can generate many different image processing functions by introducing competition and cooperation between the local MIN and the local MAX operations. The firnctional universe of MMCNN with full scale N , and N 2 is depicted in Fig. 13. We call the positive half of the horizontal axis the erosion axis and the positive half of the vertical axis the diliitioii axis. Observe that the functional universe of MMCNNs includes generalized erosion and dilation as two special cases. Moreover, the functional universe also includes many other tasks resulting from competition and cooperation between the MIN and MAX operations. To study the universe qualitatively, it suffices to consider only the upper half because the lower half represents only corresponding “negative” tasks. However, as the nonlinear output function of the neurons is not symmetric, the tasks irnplemented by MMCNNs in the lower half plane are very different from those corresponding to the negative of the upper half plane. These differences are illustrated in the examples provided in this section. We also labeled two lines indicating the central regions of competition and cooperation. Cooperation results in a lowpass filtering operation while competition results in a high-pass filtering operation. Perforinmice of H i g h - p a s s Filteririg via MMCNN. Figure I4 shows different results of high-pass filtering operations resulting from competitions between erosion and dilation. The parameters for Figs. 14b and e are the negative of those in Figs. 14c and d. As the nonlinearity of the output function is not symmetric, the output of the parameters in the lower half plane is not negatively symmetric to those from the upper plane. Observe that k l and k? can tune the high-pass performances in a very different manner. Figures 14a, b, and c show typical Laplacian performances. However, the result in Fig. 14d
FIGURE 12. The results of the orientation derivatives computed by the MMCNN; $k_1 = -1.4$ and $k_2 = 1.4$ for all cases. All images are shown in negative exposure. (a) The output as in Eq. (215). (b) The output as in Eq. (216). (c) The output as in Eq. (217). (d) The output as in Eq. (218). (e) The output as in Eq. (219). (f) The output as in Eq. (220).
TAO YANG
FIGURE 13. The universe of functions of the MMCNN in the $k_1$-$k_2$ plane.
looks more like image enhancement than high-pass filtering alone, whereas the result in Fig. 14e is more similar to image segmentation. In the functional universe of MMCNNs the point $k_1 = -1$, $k_2 = 1$ corresponds to the so-called range operator, which computes the difference between the maximum value and the minimum value of the pixels within the neighborhood. The range operator responds to the boundaries between regions having different average brightness and is sometimes used as an edge-defining algorithm (Russ, 1992). Our MMCNN represents a generalization of the range operator by also scaling the effects of MIN and MAX using different weights.

Performance of Lowpass Filtering via MMCNN. In this section we give the results of cooperation between dilation and erosion. The simulation results are shown in Fig. 15. Although the effect of different $k_1$ and $k_2$ on the performance is somewhat similar to that of the radius of different Gaussian kernels, the MMCNN has a very strong nonlinearity. For example, in Fig. 15a, although some regions are very similar to the output of a Gaussian kernel with a large radius, inspecting the position of the hair between the hat and the face reveals a thin black vertical line, which would not exist with a similar Gaussian kernel. Hence, the MMCNN not only provides reasonable approximations of various existing image processing operations, but also presents other unique functions that can be exploited in some special applications.
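The competition and cooperation between MIN and MAX can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the settled MMCNN response is the linear combination $k_1 \cdot \mathrm{MIN} + k_2 \cdot \mathrm{MAX}$ over a full 3 × 3 neighborhood; the nonlinear output function of the neurons is not modeled, and the function names `local_min_max` and `mmcnn_steady_state` are hypothetical.

```python
import numpy as np

def local_min_max(u):
    """Return the 3x3 local minimum and maximum of a 2-D array (edges replicated)."""
    p = np.pad(u, 1, mode="edge")
    h, w = u.shape
    # Stack the nine shifted views that make up the full 3x3 neighborhood.
    views = np.stack([p[r:r + h, c:c + w] for r in range(3) for c in range(3)])
    return views.min(axis=0), views.max(axis=0)

def mmcnn_steady_state(u, k1, k2):
    """Assumed linear response: k1 * local MIN + k2 * local MAX (no output nonlinearity)."""
    mn, mx = local_min_max(u)
    return k1 * mn + k2 * mx

# A vertical step edge: dark left half, bright right half.
u = np.zeros((5, 6))
u[:, 3:] = 1.0

range_image = mmcnn_steady_state(u, k1=-1.0, k2=1.0)  # competition: range operator
smoothed = mmcnn_steady_state(u, k1=0.5, k2=0.5)      # cooperation: lowpass-like
```

Under this assumption the range image is nonzero only in the two columns that straddle the edge, while the cooperative setting replaces those columns by the midpoint of the local extremes.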
6. Learning Algorithms

Many image processing operations cannot be implemented exactly using an MMCNN because they are by nature very different from MIN/MAX operations.
FIGURE 14. The results of high-pass performances resulting from the competition between erosion and dilation. All images are shown in negative exposure unless stated otherwise. (a) $k_1 = -4$ and $k_2 = 4$. (b) $k_1 = -1$ and $k_2 = 0.75$. (c) $k_1 = -1$ and $k_2 = 3$. (d) $k_1 = 4$ and $k_2 = -3$ (shown in positive). (e) $k_1 = 1$ and $k_2 = 0.75$ (shown in positive).
FIGURE 15. The results of lowpass performances given by an MMCNN. (a) $k_1 = 4$ and $k_2 = 4$. (b) $k_1 = 2$ and $k_2 = 2$. (c) $k_1 = 1$ and $k_2 = 1$. (d) $k_1 = 0.25$ and $k_2 = 0.75$. (e) $k_1 = 0.75$ and $k_2 = 0.25$.
They are convolutions whose implementation requires many multiplications. Unfortunately, current CMOS implementation of a multiplier requires at least 10 transistors, occupying 10 times more silicon area than an MMCNN implementation. This 10-fold reduction in silicon area is the motivation for developing MMCNNs for approximating the functions of classical image operators. Clearly, there is a tradeoff between functionality and implementability. We present a learning method to optimize an MMCNN for different image processing tasks. Some examples are also presented in this section. In this section we use the following input and output functions of MMCNN:
We then study how an MMCNN can learn the optimum parameters when only the example sets $\{(u(i,j), o(i,j))\}$ are available, where $\{u(i,j)\}$ is the input image set and $\{o(i,j)\}$ is the desired output image set. The learning process is driven by minimizing the square of the difference between the
FUZZY CELLULAR NEURAL NETWORKS AND THEIR APPLICATIONS
desired output $\{o(i,j)\}$ and actual output $\{y(i,j)\}$

$$E = \frac{1}{2} \sum_{(i,j)} \bigl(o(i,j) - y(i,j)\bigr)^2$$
for all the training samples to be learned. We call this function a cost function. We then have
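$$\frac{\partial E}{\partial k_1} = \sum_{(i,j)} \frac{\partial E}{\partial y(i,j)} \, \frac{\partial y(i,j)}{\partial x(i,j)} \, \frac{\partial x(i,j)}{\partial k_1}$$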
where
Then the δ-learning rule for updating $k_1$ is given by
where p is the learning rate. Similarly, the δ-learning rule for updating $k_2$ is given by
Remarks:
The learning algorithms provided in this section can only converge to local minima of the surface of the error function E. However, if the training examples are smooth enough, we found in most cases that E has only a single global minimum. Thus, the choice of the initial condition serves only to speed up the search process in the learning algorithm.
The size and statistical characteristics of the image samples should represent typical ones for the applications of the trained MMCNN. When the size of
the image sample is too small, more than one image sample should be used because the training algorithm will be too sensitive if only a very small number of pixels are used in the training iterations.
p cannot be chosen too large, or the training algorithm may become unstable. However, a very small p will slow down the convergence significantly.
More advanced learning algorithms should be developed for training neighborhood patterns. This is a kind of “structure” training problem, which is much more complicated than a parameter training problem. However, as image processing is a well-developed field, in most cases we can use our experience to choose neighborhood patterns efficiently.
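The parameter training described above can be sketched as plain gradient descent on the cost function E. The sketch is illustrative only: it assumes a linear output $y = k_1 \cdot \mathrm{MIN} + k_2 \cdot \mathrm{MAX}$ in place of the MMCNN input and output functions, so that $\partial E / \partial k_1 = -\sum (o - y)\,\mathrm{MIN}$ and $\partial E / \partial k_2 = -\sum (o - y)\,\mathrm{MAX}$. The names `train_k` and `rate` and the synthetic target are hypothetical.

```python
import numpy as np

def local_min_max(u):
    """Return the 3x3 local minimum and maximum of a 2-D array (edges replicated)."""
    p = np.pad(u, 1, mode="edge")
    h, w = u.shape
    views = np.stack([p[r:r + h, c:c + w] for r in range(3) for c in range(3)])
    return views.min(axis=0), views.max(axis=0)

def train_k(u, target, k1, k2, rate, iterations):
    """Gradient descent on E = 0.5 * sum((target - y)**2), assuming y = k1*MIN + k2*MAX."""
    mn, mx = local_min_max(u)
    history = []
    for _ in range(iterations):
        err = target - (k1 * mn + k2 * mx)
        history.append(0.5 * np.sum(err ** 2))
        # Gradients of E for the assumed linear output.
        grad_k1 = -np.sum(err * mn)
        grad_k2 = -np.sum(err * mx)
        k1, k2 = k1 - rate * grad_k1, k2 - rate * grad_k2
    return k1, k2, history

rng = np.random.default_rng(0)
u = rng.random((16, 16))
mn, mx = local_min_max(u)
target = -0.65 * mn + 0.65 * mx          # synthetic target built from known parameters
rate = 1.0 / np.sum(mn ** 2 + mx ** 2)   # conservative step size that keeps descent stable
k1, k2, history = train_k(u, target, k1=-1.0, k2=1.0, rate=rate, iterations=5000)
```

With a target that the model can represent exactly, the cost decreases at every iteration and the parameters return to the values used to build the target. A step size much larger than the one chosen here makes the iteration diverge, which mirrors the remark on the learning rate.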
Typical Examples. In this section we train an MMCNN to perform three typical image processing operations: averaging, calculating the Laplacian, and calculating the horizontal derivative. The averaging operator is a typical lowpass operation, whereas the Laplacian operator is a typical high-pass operation. While the preceding two operations are orientation-independent, the horizontal derivative is a typical orientation-dependent operation.
Averaging Operator. The output shown in Fig. 10a for the averaging operator in Eq. (211) is the desired output image. To train the MMCNN we choose $p = 10^{-5}$. The initial values for both $k_1$ and $k_2$ are chosen equal to unity. The learning process is shown in Figs. 16b and c with 1000 training iterations. Figure 16b shows that the error E decreases monotonically, and Fig. 16c shows the corresponding learning dynamics of $k_1$ and $k_2$, which converge to $k_1 = 0.481469$ and $k_2 = 0.5193549$, respectively. The corresponding output of the MMCNN is shown in Fig. 16a. As we can expect from the very small error E, the two images in Figs. 10a and 16a are almost the same. This result is image-dependent. For images having very few “smooth” segments, the error E will become larger.

Horizontal Derivative. The output shown in Fig. 11a of the horizontal derivative in Eq. (213) is the desired output image. To train the MMCNN we choose $p = 4 \times 10^{-5}$. The initial values are chosen equal to $k_1 = -1$ and $k_2 = 1$. The learning process is shown in Figs. 17b and c with 800 training iterations. Figure 17b shows the minimization of the error E, and Fig. 17c shows the learning dynamics of $k_1$ and $k_2$, which converge to $k_1 = -1.556571$ and $k_2 = 1.533475$, respectively. The corresponding output of the MMCNN is shown in Fig. 17a.

Laplacian Operator. The output shown in Fig. 9a of the Laplacian operator in Eq. (209) is the desired output image. To train the MMCNN we choose
FIGURE 16. The learning process of an MMCNN for approximating the averaging operator. (a) The output of the MMCNN with learned parameters. (b) The dynamics of the cost function E. (c) The convergence processes of $k_1$ (solid line) and $k_2$ (dashed line).
p = IO-’. The initial values are chosen to be $k_1 = -1$ and $k_2 = 1$. The learning process is shown in Figs. 18b and c with 800 training iterations. Figure 18b shows the minimization of the error E. Figure 18c shows the learning dynamics of $k_1$ and $k_2$, which converge to $k_1 = -0.6467125$ and $k_2 = 0.6626593$, respectively. The corresponding output of the MMCNN is shown in Fig. 18a. Comparing the results in Figs. 9a and 18a, we find that the latter is a “blurred” (lowpass filtered) version of the former. This can be predicted from the rather large error E at the equilibrium point found by the learning algorithm.
7. Examples of Applications

To demonstrate the capability of the MMCNN, we present here some real-life applications of the MMCNN to image processing. As the performance of any image processing structure is evaluated by humans and not by machines, we present here some simulation results for the readers to judge the performance of the MMCNN.

Filtering out Small Particles. In Fig. 19a we show the image of a microscope view containing particles of different sizes. The task is to count only the big particles. Although humans can perform this task very easily, for a
FIGURE 17. The learning process of an MMCNN for approximating the horizontal derivative. All images in this figure are shown in negative exposure. (a) The output of the MMCNN with learned parameters. (b) The dynamics of the cost function E (horizontal axis: iterations). (c) The convergence processes of $k_1$ (solid line) and $k_2$ (dashed line) (horizontal axis: iterations).

FIGURE 18. Training an MMCNN for approximating the Laplacian operator. All images in this figure are shown in negative exposure. (a) The output of the MMCNN with learned parameters. (b) The dynamics of the cost function E. (c) The convergence processes of $k_1$ (solid line) and $k_2$ (dashed line).
program to do it, it is first necessary to delete the small particles. This problem can be solved by applying four erosions on a 3 × 3 8-neighbor sphere of influence $N_r$ (the result is shown in Fig. 19b), followed by applying four dilations, as shown in Fig. 19c. Observe that the result is not very good because of the strong background at the upper-left corner in the original image. To overcome this problem we should first eliminate the big difference in contrast between different portions of the background; this can be done by applying an MMCNN with the following sphere of influence:
$$N_1 = N_2 = \left\{ \begin{matrix} (-1,-1) & (0,-1) & (1,-1) \\ (-1,0) & (0,0) & (1,0) \\ (-1,1) & (0,1) & (1,1) \end{matrix} \right\} \tag{227}$$
and with $k_1 = 0.2$, $k_2 = 0.2$, which is a scaled smoothing MMCNN. When we subtract the output of this MMCNN from the original image, we obtain the image shown in Fig. 19d. Observe that the background becomes much more homogeneous. The image in Fig. 19d is then processed by applying the erosion MMCNN twice (shown in Fig. 19e), followed by applying the
FIGURE 19. Sequence of operations to filter out small particles. All images in this figure are shown in negative exposure. (a) Original image. (b) After four erosions of (a). (c) After four dilations of (b). (d) Homogenizing the background of (a). (e) After two erosions of (d). (f) After two dilations of (e).

FIGURE 20. Image enhancement. (a) The output of the MMCNN in Eq. (228). (b) Subtraction of (a) from the original image.
dilation MMCNN twice. The final result shown in Fig. 19f is significantly better than that of Fig. 19c.

Image Enhancement. One way to enhance an image is to locally increase the contrast at discontinuities by subtracting a high-pass version from the original image. For this example, the high-pass version is obtained by applying the following MMCNN:
$$N_1 = N_2 = \left\{ \begin{matrix} (-1,-1) & (0,-1) & (1,-1) \\ (-1,0) & (0,0) & (1,0) \\ (-1,1) & (0,1) & (1,1) \end{matrix} \right\}, \quad k_1 = -0.65, \quad k_2 = 0.65. \tag{228}$$

The high-pass output image is shown in Fig. 20a. This result is then subtracted from the original image shown in Fig. 5a, and the enhanced image is shown in Fig. 20b.

Our next example shows improvement in the fine details of an almost homogeneous image that resulted from the small dynamic range of the imaging camera. The simulation results are shown in Fig. 21. Figure 21a shows the original image of a tissue sample. Since the camera mounted on the microscope has a very limited dynamic range, details related to the fiber orientations within the tissue sample are very difficult to distinguish. To make the fiber orientations visible, we use an MMCNN to enhance the original image. In the first step a “smoothing” MMCNN with full-scale neighborhood patterns $N_1$ and $N_2$ and with synaptic weights $k_1 = k_2 = 0.5$ is used to smooth the original image. The result is shown in Fig. 21b. The smoothed version is then subtracted from the original image to give the result shown in Fig. 21c. Subtracting the result in Fig. 21c from the original image in Fig. 21a, we obtain the enhanced image in Fig. 21d, where the fiber orientations are clearly visible.
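The smooth-and-subtract steps used for background homogenization (with $k_1 = k_2 = 0.2$) and for detail enhancement (with $k_1 = k_2 = 0.5$ and the 20-fold amplification quoted in the caption of Fig. 21) can be sketched as follows. This is a rough illustration that assumes the linear response $k_1 \cdot \mathrm{MIN} + k_2 \cdot \mathrm{MAX}$ over a full 3 × 3 neighborhood and ignores the output nonlinearity and any clipping of gray levels; all function names are hypothetical.

```python
import numpy as np

def local_min_max(u):
    """Return the 3x3 local minimum and maximum of a 2-D array (edges replicated)."""
    p = np.pad(u, 1, mode="edge")
    h, w = u.shape
    views = np.stack([p[r:r + h, c:c + w] for r in range(3) for c in range(3)])
    return views.min(axis=0), views.max(axis=0)

def smooth(u, k1, k2):
    """Assumed linear MMCNN response with cooperating MIN and MAX."""
    mn, mx = local_min_max(u)
    return k1 * mn + k2 * mx

def homogenize_background(u, k=0.2):
    """Subtract the scaled smoothing output (k1 = k2 = k) from the original image."""
    return u - smooth(u, k, k)

def enhance_details(u, gain=20.0):
    """Smooth with k1 = k2 = 0.5, amplify the residual, then subtract it from the original."""
    residual = gain * (u - smooth(u, 0.5, 0.5))
    return u - residual

flat = np.full((4, 4), 0.5)
step = np.zeros((4, 6))
step[:, 3:] = 1.0

flat_enhanced = enhance_details(flat)          # a flat region has no residual
flat_background = homogenize_background(flat)  # 0.5 - 0.4 * 0.5 everywhere
step_enhanced = enhance_details(step, gain=2.0)
```

In flat regions the residual vanishes, so only pixels near brightness changes are altered; the sign of the change follows the literal order of the subtractions described in the text.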
FIGURE 21. Image enhancement: enhancing the details in the original image. (a) The original image. (b) The smoothed version of (a) using an MMCNN with $k_1 = 0.5$, $k_2 = 0.5$. (c) Subtract (b) from (a) and amplify 20 times (shown in negative exposure). (d) Subtract (c) from (a).
Texture Processing: Alloy Quality Control. Many images consist of regions with repeated variations in brightness called textures. Rank operations are often used to detect textures in images. One of the simplest (but least versatile) texture operators is to find the range of difference between the maximum and the minimum brightness values in a neighborhood. For a flat or uniform region, the range is small. Large values of the range correspond to a surface with a large roughness. The size of the neighborhood should be large enough to include both dark and light pixels, which generally means that it should be larger than the small uniform details in the images. However, current VLSI technology makes it impractical to choose a sphere of influence
larger than a 3 × 3 neighborhood. In view of such practical considerations, an MMCNN can only process relatively fine textures. For images with coarse texture characteristics, we need to reduce the resolution before an MMCNN can process them.

Consider a real-life application of an MMCNN for quality control of alloys. The results are shown in Fig. 22. Figure 22a shows an original image of the broken face of an alloy test sample. Observe that the upper left corner is much smoother than the lower right corner. The smooth portion is due to long-term fatigue. The rough portion is due to terminal tearing failures. If the alloy has no defect, then the boundary between these two kinds of textures should be regular. As fatigue failures are more likely to develop along defects in the alloy, any defect will make the boundaries irregular. However, with the naked eye it is very difficult to see the boundaries between the smooth portion and the rough portion in Fig. 22a. Let us use an MMCNN to segment the smooth texture and the rough texture. Figure 22b shows the range image provided by an MMCNN with full-scale 3 × 3 neighborhood patterns $N_1$ and $N_2$ and with $k_1 = -1$, $k_2 = 1$. In this image the original feature brightness is gone and brightness represents only texture. However, as the 3 × 3 neighborhood is too small compared with the characteristic regions of the texture, the difference between textures is covered by noise. To overcome this problem we use the same MMCNN to process the output image in Fig. 22b, enhancing the difference as shown in Fig. 22c. We then smooth the image in Fig. 22c by an MMCNN with parameters $k_1 = 0.25$, $k_2 = 0.75$. The result is shown in Fig. 22d. As this result favors the MAX effect, we process Fig. 22d by an MMCNN with parameters $k_1 = 0.75$, $k_2 = 0.25$. The result is shown in Fig. 22e. The results in Figs. 22d and e are then added to give the comprehensive result shown in Fig. 22f. The result in Fig. 22g combines a thresholded version of Fig. 22f (with threshold equal to 0.85) and the original image. Figure 22h is the compensated image of Fig. 22g. Figure 22g shows the smooth portion of the broken end. This portion is formed by long-term fatigue. From Fig. 22h we can see that all small grains with a small bright center region are identified. This texture corresponds to the rougher portion of the original image. This kind of texture is formed by tearing failures. We can therefore conclude that a big defect exists in the lower left corner. This defect induced a fatigue failure along the horizontal direction from the lower left corner.
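The chain of operations that leads from the original image to the sum image of Fig. 22f can be sketched in Python. The sketch assumes the linear response $k_1 \cdot \mathrm{MIN} + k_2 \cdot \mathrm{MAX}$ over a full 3 × 3 neighborhood and uses a synthetic test image; the thresholding and the combination with the original image are omitted, and the names `mmcnn` and `texture_map` are hypothetical.

```python
import numpy as np

def local_min_max(u):
    """Return the 3x3 local minimum and maximum of a 2-D array (edges replicated)."""
    p = np.pad(u, 1, mode="edge")
    h, w = u.shape
    views = np.stack([p[r:r + h, c:c + w] for r in range(3) for c in range(3)])
    return views.min(axis=0), views.max(axis=0)

def mmcnn(u, k1, k2):
    """Assumed linear MMCNN response (output nonlinearity not modeled)."""
    mn, mx = local_min_max(u)
    return k1 * mn + k2 * mx

def texture_map(u):
    r1 = mmcnn(u, -1.0, 1.0)     # range image, as for Fig. 22b
    r2 = mmcnn(r1, -1.0, 1.0)    # range of the range image, as for Fig. 22c
    d = mmcnn(r2, 0.25, 0.75)    # smoothing that favors MAX, as for Fig. 22d
    e = mmcnn(d, 0.75, 0.25)     # smoothing that favors MIN, as for Fig. 22e
    return d + e                 # sum image, as for Fig. 22f

rng = np.random.default_rng(1)
u = np.full((12, 16), 0.5)       # left half: perfectly smooth surface
u[:, 8:] = rng.random((12, 8))   # right half: rough, noise-like texture
f = texture_map(u)
```

On this synthetic image the sum image is exactly zero well inside the smooth half and positive on average in the rough half, which is the contrast that a subsequent threshold would exploit.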
Intelligent Camera. As an MMCNN chip has a very high pixel density, it can be fabricated in a very small size that fits into a camera. A small-scale MMCNN chip can be used to control the self-focusing system of an automatic camera for dynamic object tracing. The self-focusing system is an
FIGURE 22. Texture detection. (a) The original image. (b) The range image of (a) provided by an MMCNN with $k_1 = -1$, $k_2 = 1$. (c) The range image of (b) provided by an MMCNN with $k_1 = -1$, $k_2 = 1$. (d) The smoothed image of (c) provided by an MMCNN with $k_1 = 0.25$, $k_2 = 0.75$. (e) The smoothed image of (d) provided by an MMCNN with $k_1 = 0.75$, $k_2 = 0.25$. (f) The sum image of (d) and (e). (g) The combination of the threshold version of (f) and the original image in (a) reveals the locations of the smooth texture. (h) The compensation of (g) reveals the locations of the rough texture.
intelligent feedback control system consisting of a sensor, an algorithm, and an actuator. The sensor is embedded into the MMCNN array. The actuator is driven by a step motor. The algorithm is a simple digital feedback controller. The whole system is depicted in Fig. 23. Figure 23a shows the block diagram of the entire self-focused camera. Whenever a captured image is off-focus,
FIGURE 23. Applications of an MMCNN to a self-focused camera. (a) The block diagram of the entire optical and electronic system. (b) The varying neuron density on the MMCNN chip. Every box denotes a neuron.
the MMCNN gives a signal to the controller. The controller drives the lenses forward or backward. The feedback loop is closed when the MMCNN detects the adjusted image and gives a signal determined by the degree of off-focus. From linear optics we know that the more the camera is off-focus, the more blurred is the captured image. The off-focus effect can be approximately modeled as a lowpass filtering process with a different neighborhood size (X.P. Yang et al., 1994). The goal of the feedback controller is to minimize the sum of the outputs of all neurons on the following MMCNN chip:
$$N_1 = N_2 = \left\{ \begin{matrix} (-1,-1) & (0,-1) & (1,-1) \\ (-1,0) & (0,0) & (1,0) \\ (-1,1) & (0,1) & (1,1) \end{matrix} \right\}, \quad k_1 = -1, \quad k_2 = 1. \tag{229}$$
This MMCNN can calculate the sharpness of an image by finding the difference between the local minimum and the local maximum. As any lowpass filter will reduce the original local maximum and increase the original local minimum, the minimum of the output sum of the foregoing MMCNN corresponds to the “sharpest” image.
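The effect of defocus on the local range can be illustrated with a short Python sketch. It is not a model of the chip: defocus is imitated by repeated 3 × 3 averaging (a hypothetical stand-in for the lowpass defocus model), the response for $k_1 = -1$, $k_2 = 1$ is taken to be the raw difference MAX − MIN, and the names `range_sum`, `box_blur`, and `defocus_per_step` are assumptions. Because the raw range is summed here, the sharpest frame appears as the largest value in this sketch; the polarity of the actual chip output is not modeled.

```python
import numpy as np

def shifted_views(u):
    """Stack the nine shifted copies that form the full 3x3 neighborhood (edges replicated)."""
    p = np.pad(u, 1, mode="edge")
    h, w = u.shape
    return np.stack([p[r:r + h, c:c + w] for r in range(3) for c in range(3)])

def range_sum(u):
    """Sum over all pixels of local MAX minus local MIN (k1 = -1, k2 = 1, linear response)."""
    v = shifted_views(u)
    return float(np.sum(v.max(axis=0) - v.min(axis=0)))

def box_blur(u, passes):
    """Hypothetical defocus model: repeated 3x3 averaging."""
    for _ in range(passes):
        u = shifted_views(u).mean(axis=0)
    return u

rng = np.random.default_rng(2)
scene = rng.random((32, 32))
defocus_per_step = [3, 2, 1, 0, 1, 2]   # blur passes at each lens position; 0 is in focus
signals = [range_sum(box_blur(scene, n)) for n in defocus_per_step]
best_step = int(np.argmax(signals))      # lens position with the extremal signal
```

Each extra averaging pass lowers local maxima and raises local minima, so the summed range shrinks steadily with defocus and the controller only needs to search for the extremum of a single scalar signal.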
The MMCNN chip used in the camera has a space-varying neuron density just as the human retina does. As shown in Fig. 23b, the center region of the MMCNN chip has the highest neuron density because when we shoot a picture, we usually put the objects we find interesting in the center region. On the other hand, it is more convenient to implement this structure using a VLSI chip because we can put the control and addition circuits in the margin regions. In this MMCNN chip, only 40 neurons are fabricated. The captured images of a controlling sequence are shown in Fig. 24. Figures 24a to f are captured images in steps 1 to 6. The output signal given by the MMCNN chip is shown in Fig. 24g. Following the output of the MMCNN, we see that the minimum is found at the sixth step. The seventh step gives an overshoot, which is suppressed at the eighth step.

D. Face Image Processing Using Type-I FCNN

For the purpose of face recognition (Chellappa et al., 1995), facial expression animation (Yang et al., 19960) and synthesis (Sato et al., 1995), and compression of face images (Clarke, 1995; Torres and Kunt, 1996), we need to model face images. Two important steps in modeling face images are face image segmentation and feature extraction. The structural features of a face image include eyebrows, eyes, nose, and mouth. For facial expression animation, the deformation of cheeks is of interest. Although Yang et al. (19960) used a 2-layer conventional CNN to animate facial expressions, they did not provide a method to locate the key points, which specify the locations of eyes (eyebrows) and mouth. Of course, we can locate eyebrows, eyes, nose, and mouth by using a series of templates that perform the corresponding digital image algorithms as surveyed in Chellappa et al. (1995). We show here how the following two simple type-I FCNNs can be used to solve this problem. One of them is given by

(230)

2. Output equation

In the preceding type-I FCNN, the template $\tilde{B}_{f\min}$ consists of fuzzy numbers.
FIGURE 24. Self-focused intelligent camera. (a) The captured image at step 1. (b) The captured image at step 2. (c) The captured image at step 3. (d) The captured image at step 4. (e) The captured image at step 5. (f) The captured image at step 6. (g) The output of the MMCNN chip (horizontal axis: control step).
The other is given by the following state equation:
The output equation is the same as that in Eq. (231). Similarly, the template $\tilde{B}_{f\max}$ consists of fuzzy numbers.

1. Locating Low Boundaries of Eyebrows, Eyes, Nose, and Mouth
In this example, the templates $\tilde{B}_{f\min}$ and $\tilde{B}_{f\max}$ in Eqs. (230) and (232) are given by
where “−” is an algebraic negative and “$\bar{A}$” denotes the fuzzy complement of $A$, and

$$\tilde{B}_{f\max} = \begin{bmatrix} A & -C & A \\ -C & B & -C \\ A & -C & A \end{bmatrix} \tag{234}$$

where the fuzzy number A is given by

(235)

the fuzzy number B is given by

(236)

the fuzzy number C is given by

$$\mu_C(x) = \begin{cases} \max(0, -mx + 1), & x \le 0.5 \\ \max(0, m(x - 1) + 1), & 0.5 < x \le 1 \end{cases} \tag{237}$$
Observe that the fuzzy numbers A, B, and C are piecewise linear for the purposes of easy VLSI implementation. To show how the operator “$\tilde{B}_{f\min}$” works, we rewrite Eq. (230) into an explicit form as

$$\dot{x}_{ij} = -x_{ij} + \bigwedge \begin{pmatrix} 1 - \mu_A(u_{i-1,j-1}), & 1 + \mu_C(u_{i-1,j}), & 1 - \mu_A(u_{i-1,j+1}), \\ 1 + \mu_C(u_{i,j-1}), & 1 - \mu_B(u_{i,j}), & 1 + \mu_C(u_{i,j+1}), \\ 1 - \mu_A(u_{i+1,j-1}), & 1 + \mu_C(u_{i+1,j}), & 1 - \mu_A(u_{i+1,j+1}) \end{pmatrix}. \tag{238}$$

Also, to show how the operator “$\tilde{B}_{f\max}$” works, we rewrite Eq. (232) into an explicit form as

$$\dot{x}_{ij} = -x_{ij} + \bigvee \begin{pmatrix} \mu_A(u_{i-1,j-1}), & -\mu_C(u_{i-1,j}), & \mu_A(u_{i-1,j+1}), \\ -\mu_C(u_{i,j-1}), & \mu_B(u_{i,j}), & -\mu_C(u_{i,j+1}), \\ \mu_A(u_{i+1,j-1}), & -\mu_C(u_{i+1,j}), & \mu_A(u_{i+1,j+1}) \end{pmatrix}. \tag{239}$$
The simulation results are shown in Fig. 25. Figure 25a shows the facial image of a Chinese girl of size 63 × 63 with 256 gray levels (normalized within [0, 1]). This image is fed into the input of the FCNN in Eq. (232); the output image is shown in Fig. 25b. Then the image in Fig. 25b is fed into the input of the FCNN in Eq. (230); the output image is shown in Fig. 25c. From Fig. 25c one can
FIGURE 25. Locating structural features of a facial image using type-I FCNN. (a) A gray-scale facial image of a Chinese girl. (b) Output image of the FCNN in Eq. (232). Input is the image in (a). (c) Output image of the FCNN in Eq. (230). Input is the image in (b).
see that the low boundaries of eyebrows, eyes, nose, and mouth are marked by black lines. In this simulation, we choose $k = 2$ and $m = 4$. However, different values of $k$ and $m$ should be chosen for different facial images.
2. Segmentation of Face

In this example, we show the segmentation of “flat” parts of a facial image that contains cheeks, chin, and forehead. The two type-I FCNNs in Eqs. (230)
FIGURE 26. Segmentation of a facial image. (a) Output image of the FCNN in Eq. (230). Input is the image in Fig. 25(a). (b) Output image of the FCNN in Eq. (232). Input is the image in (a).
and (232) are used. The templates $\tilde{B}_{f\min}$ and $\tilde{B}_{f\max}$ are given by
The fuzzy numbers A and B are the same as those in Eqs. (235) and (236). The simulation results are shown in Fig. 26. Figure 26a shows the output of the FCNN in Eq. (230). The input image is that shown in Fig. 25a. Then the image in Fig. 26a is fed into the input of the FCNN in Eq. (232); the output image is shown in Fig. 26b. From Fig. 26b one can see that all the flat regions of the face (e.g., forehead, cheeks, and chin) are marked by light regions. We choose $k = 2$.
V. EMBED LINGUISTIC STATEMENTS INTO FCNN
In this section we present some inner properties of FCNN that allow it to function as an intelligent component in the CNN universe, namely as an interpreter between high-level knowledge expressions and low-level hardware structures. Compared with the results provided in Sect. IV, this section emphasizes the high-level ability of
FCNN structures. This high-level ability is the significant difference between conventional CNN and FCNN.

A. FCNN: Interfaces Between Human Experts and CNN
In an artificially intelligent system, the motivation is to make a brain model the core of the system. A conventional CNNUM cannot function as this core even though it has already been proved to be a Turing machine (Crounse and Chua, 1995) (Turing machines cannot answer a simple question: “How are you feeling?”) and even though it can be used to explain many visual phenomena (Yang et al., 1996f; Csapodi and Roska, 1996; Werblin et al., 1994) (seeing is not thinking). On the other hand, the FCNN structure can be used as an interface between a human expert and a conventional CNN. In this sense, the input of an FCNN is the knowledge of a human expert, which is described by linguistic statements, and the outputs are sets of “templates.” In other words, FCNN is used to translate linguistic or higher-level statements, which are expressed as fuzzy rules, into CNN structures.

1. Fuzzy Set Theory and Fuzzy Properties of Images
In each phase of image processing, there exist many uncertainties (Yager and Zadeh, 1992), for example, additive and nonadditive noise in the sensing and transmission processes; the loss of information while 3D shapes or scenes are projected into 2D images; lack of quantitative measurement of image quality and imprecision in computations; and ambiguity and vagueness in representations, definitions, and interpretation of complex scenes. Fuzzy set theory (Zadeh, 1965) provides the mathematical strength to capture these uncertainties (Kandel, 1982; Marks, 1994). It has found wide applications in image processing (Yager and Zadeh, 1992; Zadeh et al., 1975; Pal and King, 1981; Kandel, 1982; Marks, 1994; Peleg and Rosenfeld, 1989; Nakagawa and Rosenfeld, 1978) such as image modeling, preprocessing, segmentation, object/region recognition, and reasoning aspects of image processing problems. While fuzzy set theory provides an inference mechanism under cognitive uncertainty, the CNN (Chua and Yang, 1988a,b) offers advantages such as learning, adaptation, fault-tolerance, parallelism, and generalization. Although fuzzy logic is a natural mechanism for modeling cognitive uncertainty, it may involve an increase in the amount of computation required (compared with a system using digital logic). This can be readily offset by using FCNN, which has the potential for parallel computation with high flexibility.

A fuzzy set $A$ with a finite number of supports $x_i$, $i = 1, \ldots, n$, is defined as a set of ordered pairs

$$A = \{(\mu_A(x_i), x_i)\} \tag{242}$$
or, in a union form,

$$A = \bigcup_i \mu_i / x_i, \quad i = 1, \ldots, n \tag{243}$$
where the membership function $\mu_A(x_i)$ in the interval [0,1] denotes the degree to which an event $x_i$ may be a member of $A$. Here $\mu_A = 0$ represents no membership and $\mu_A = 1$ represents full membership. This characteristic function can be viewed as a weighting coefficient that reflects the ambiguity in $A$. A fuzzy singleton is a fuzzy set that has only one supporting point. In digital images this concept is very useful because a pixel can be viewed as a fuzzy singleton. The operations on fuzzy sets are extensions of those used for traditional sets. Some of the common operations include comparison, containment, intersection, union, and complement. Assuming $U$ to be the universe of discourse, $A \in U$ and $B \in U$, these operations are defined as follows:

Comparison: is $A = B$?

$$A = B \ \text{iff} \ \mu_A(x) = \mu_B(x), \quad \forall x \in U. \tag{244}$$

Containment: is $A \subset B$?

$$A \subset B \ \text{iff} \ \mu_A(x) \le \mu_B(x), \quad \forall x \in U. \tag{245}$$

Union: The union of two fuzzy sets $A$ and $B$, $A \vee B$, is given by combining the membership functions of $A$ and $B$. Although several different union operations have been defined (Yager, 1979), the most common, and so far the simplest, union is defined as

$$\mu_{A \vee B}(x) = \max\{\mu_A(x), \mu_B(x)\}, \quad \forall x \in U. \tag{246}$$

Intersection: Like the union, the intersection of two fuzzy sets $A$ and $B$, $A \wedge B$, is given by combining the membership functions of $A$ and $B$ and is defined as

$$\mu_{A \wedge B}(x) = \min\{\mu_A(x), \mu_B(x)\}, \quad \forall x \in U. \tag{247}$$

Complement: The complement of the fuzzy set $A$, $\bar{A}$, is defined as

$$\mu_{\bar{A}}(x) = 1 - \mu_A(x), \quad \forall x \in U. \tag{248}$$

In addition to these operations, De Morgan’s law, the distributive laws, algebraic operations such as addition and multiplication, and the notion of convexity have fuzzy set equivalents (Zadeh, 1965).
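The set operations listed above act pointwise on membership values, which a short Python sketch makes concrete. It represents a fuzzy set over a finite universe by an array of membership degrees in [0, 1], uses the standard max, min, and one-minus definitions, and checks one De Morgan identity; the function names are illustrative only.

```python
import numpy as np

def fuzzy_union(mu_a, mu_b):
    """Pointwise maximum of the two membership functions."""
    return np.maximum(mu_a, mu_b)

def fuzzy_intersection(mu_a, mu_b):
    """Pointwise minimum of the two membership functions."""
    return np.minimum(mu_a, mu_b)

def fuzzy_complement(mu_a):
    """One minus the membership function."""
    return 1.0 - mu_a

def fuzzy_equal(mu_a, mu_b):
    """Comparison: equal membership at every point of the universe."""
    return bool(np.allclose(mu_a, mu_b))

def fuzzy_contained(mu_a, mu_b):
    """Containment: membership in A never exceeds membership in B."""
    return bool(np.all(mu_a <= mu_b))

# Membership degrees of two fuzzy sets over the same four-point universe.
mu_a = np.array([0.0, 0.2, 0.7, 1.0])
mu_b = np.array([0.1, 0.5, 0.4, 1.0])

union = fuzzy_union(mu_a, mu_b)
intersection = fuzzy_intersection(mu_a, mu_b)

# De Morgan: the complement of the union equals the intersection of the complements.
de_morgan_holds = fuzzy_equal(
    fuzzy_complement(union),
    fuzzy_intersection(fuzzy_complement(mu_a), fuzzy_complement(mu_b)),
)
```

If the gray levels of an image are scaled to [0, 1], the same functions can be applied to whole images, since each pixel then carries a membership degree.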
Remark: The traditional CNN can implement the following fuzzy operations: Comparison can be implemented by using a minus operation template and then checking whether the output is zero or not. Containment can be implemented by using a minus operation template and then checking the sign of the output. Complement can also be implemented by using a minus operation template. However, intersection and union cannot be implemented by using the traditional CNN, whereas they can be implemented by using FCNN.

A gray-scale image possesses ambiguity with each pixel because of the possible multivalued levels of brightness. If the gray levels are scaled to lie in the region [0,1], we can regard the gray level of a pixel as its degree of membership in the set of high-valued “bright” pixels; thus a gray image can be viewed as a fuzzy set. Regions, features, primitives, properties, and relations among pixels that are not crisply defined can similarly be viewed as fuzzy subsets of images (Zadeh et al., 1975; Pal and King, 1981). With the concept of fuzzy set, a gray-scale image $X$ of $M \times N$ pixels with gray levels belonging to [0,1] can be considered as an array of fuzzy singletons, each with a value of membership function denoting the degree of brightness relative to some brightness level in [0,1]. On the other hand, the fuzzy property of an image also comes from the uncertainty of the relationship between different pixels. This is the basis for application of FCNN to image processing.
2. FCNN as an Interpreter

Human experts usually use linguistic statements to evaluate and describe images. When we take a picture we are most likely to say “got a little blurred” instead of “filtered by a lowpass filter.” Or we may say “there are some black dots!” instead of “there were impulsive noises at pixels (3,4), (44,94) and (123,321).” Based on the fuzzy description of images, many fuzzy methods have been developed to deal with image processing problems. It is very difficult to embed linguistic if-then rule-based fuzzy image processing techniques into a conventional CNN chip. To overcome this problem, we need an interpreter between the human experts and the conventional CNN. From many previous works (Yang and Yang, 1996, 1997d,e,f; Yang et al., 1996d; Yang et al., 1996e, 1997i, 1998g; Yang et al., 1998, 1998), we find that FCNN can efficiently embed fuzzy rules into its structure. This capability of FCNN is very useful in interpreting the accumulated linguistic knowledge from the field of fuzzy image processing. On the other hand, the learning ability of FCNN also provides us with the possibility of “teaching” FCNN using linguistic statements and making the design of CNN structures for special image processing tasks much easier. If we consider that the only way a human expert communicates with a conventional CNN is through a template design or by collecting a huge body of data to train the CNN structure, we find that teaching an FCNN using our language
may be more direct and easier than the methods used so far. In this section we present only the method for embedding linguistic statements into the FCNN structure. We will present learning algorithms of FCNN in Sect. VI.
B. Embedding Fuzzy Inference into FCNN
In research by Yang et al. (1998g), the FCNN structure for implementing the following fuzzy IF-THEN rule was presented. The rule is given by:
where $F^{i}_{C_{kl} \in N_r(i,j)}(x_{kl})$, $1 \le i \le M$, is an algebraic or fuzzy local operation defined in $N_r(i,j)$. Here $A_i$, $1 \le i \le M$, is a fuzzy variable and $B$ is a fuzzy variable and the consequent. The corresponding FCNN structure for implementing the above IF-THEN rule is given by
1. State equation
2. Output equation
$$y_{ij} = \mu_B(x_{ij}) \qquad (250)$$
where $\mu_{A_i}(\cdot)$, $i = 1, 2, \ldots, M$, and $\mu_B(\cdot)$ are membership functions of $A_i$ and $B$, respectively. In Yang and Yang (1997e) and Yang et al. (1998g) we presented the FCNN structure for embedding the fuzzy inference ruled by else action (FIRE) operators, which are fuzzy operators for image processing (Russo, 1992; Russo and Ramponi, 1994b). The FIRE operators are based on fuzzy IF-THEN-ELSE architectures to perform many important image processing tasks, for example, image enhancement (Russo and Ramponi, 1994a, 1995) and edge detection (Russo, 1992; Russo and Ramponi, 1994b). First, we introduce FIRE operators briefly. Consider an L-level gray-scale image U. Suppose $u_{ij}$ is a pixel in U and $u_{kl}$ is a pixel in the neighborhood of $u_{ij}$; then we define $x_{kl} = |u_{kl} - u_{ij}|$ as the “gray-value difference.” We also
need the membership function of linguistic variable ZERO (ZE), $\mu_{ZE}(x)$, the membership function of linguistic variable WHITE (WH), $\mu_{WH}(x)$, and that of linguistic variable BLACK (BL), $\mu_{BL}(x)$. In general, a FIRE operator consists of a group of N IF-THEN rules and one ELSE rule as
Rule 2:
IF $x_1$ is $A_{11}$ and . . . and $x_M$ is $A_{1M}$ THEN y is $B_T$
. . .
IF $x_1$ is $A_{N1}$ and . . . and $x_M$ is $A_{NM}$ THEN y is $B_T$
ELSE y is $B_E$
where M is the number of input variables. $A_{ij}$, $i = 1, \ldots, N$; $j = 1, \ldots, M$, is the fuzzy set corresponding to the j-th input variable in the i-th THEN-rule. $B_T$ is the common consequent set of the group of THEN-rules and $B_E$ is the consequent set of the ELSE-rule. Observe that every THEN-rule in Rule 2 has the same structure as in Rule 1. If $B_T$ is different for every THEN-rule, we can use N layers of FCNN in Eq. (249) to implement Rule 2. As all THEN-rules have the same consequent, we can use a simpler FCNN structure to implement Rule 2. Letting $\lambda_i$ be the strength of the i-th THEN-rule in Rule 2, we have
where $\mu_{A_{ij}}(\cdot)$ is the membership function of $A_{ij}$. Letting $\lambda_T$ be the strength of the THEN-sub-rule in Rule 2, we have
$$\lambda_T = \bigvee_{i=1,\ldots,N} \lambda_i \qquad (252)$$
Letting $\lambda_E$ be the strength of the ELSE-rule in Rule 2, we have
$$\lambda_E = 1 - \lambda_T. \qquad (253)$$
Finally, the output y is given by a tradeoff between $\lambda_T$ and $\lambda_E$ by using a proper defuzzifier. There is no unique way to perform defuzzification, and several considerations enter into the choice of a defuzzifier. Several existing methods for defuzzification take into consideration the shape of the clipped fuzzy numbers (Bojadziev and Bojadziev, 1995). The complexity of computations and the possibility of VLSI implementation are also taken into account. According to the principles proposed in Yang et al. (1998g), the defuzzifier is
implemented by the so-called output function in the conventional CNN (in our FCNN, it is called an output membership function or defuzzifier function). The simplest defuzzifier is a threshold function when only binary output is needed.
C. Application to Image Processing
1. Fuzzy Inference Edge Detector
We then give the FCNN structure that can be used to embed the FIRE edge extractor (Russo and Ramponi, 1994b). A FIRE edge extractor is illustrated in Fig. 27. In Fig. 27, the membership functions of ZE, BL, and WH are chosen to be trapezoidal. To simplify the structure of FCNN, we use only $\lambda_T$ to defuzzify the output. We use only one single-layer FCNN to implement all THEN-rules shown in Fig. 27. The state equation of this FCNN is given by
(254)
It should be noticed that a don’t care pixel in Fig. 27 introduces no template relation. The equilibrium point of the state variable gives the strength of the THEN-rules, $\lambda_T$. The output equation of the foregoing FCNN functions as a defuzzifier and is given by
FIGURE 27. Rules for FIRE edge extractor (THEN-rules 1 to 4).
where $y_{ij}$ denotes the classical truth value of the pixel being an edge pixel and $h > 0$ is a threshold. The simulation results are shown in Fig. 28. Figure 28a shows the state variables of FCNN in Eq. (254), which is the result of fuzzy inference. Figure 28b is the corresponding output, which is a thresholded (defuzzified) version of Fig. 28a.
2. Impulsive Noise Removal via Fuzzy Inference
In this section, we consider the problem of impulsive noise removal. Median filters are usually used to remove impulsive noise (Arakawa, 1996; Yin et al., 1996). Unfortunately, a median filter blurs fine structures of an image and causes edge jitter and streaking. To overcome this problem, an efficient method is one in which the median filter filters only those pixels where impulsive noise exists and keeps the other pixels unchanged (Arakawa, 1996). To do this, the first step is to identify impulsive noise. In our experience, an impulsive noise almost always introduces a significant gray-value difference with respect to its neighbors. This experience can be expressed by the following fuzzy
FIGURE 28. FCNN-based fuzzy inference for edge detection. (a) Output of FCNN-based fuzzy inference. (b) Result of thresholding the image in (a).
IF-THEN rule:
IF $\mu_F(u_{i,j-1} - u_{i,j})$ is big and $\mu_F(u_{i,j+1} - u_{i,j})$ is big and $\mu_F(u_{i-1,j} - u_{i,j})$ is big and $\mu_F(u_{i+1,j} - u_{i,j})$ is big and $\mu_F(u_{i-1,j-1} - u_{i,j})$ is big and $\mu_F(u_{i+1,j+1} - u_{i,j})$ is big and $\mu_F(u_{i-1,j+1} - u_{i,j})$ is big and $\mu_F(u_{i+1,j-1} - u_{i,j})$ is big, THEN $u_{ij}$ is an impulsive noise.
In the preceding fuzzy rule the membership function $\mu_F(\cdot)$ functions as a fuzzifier. We choose $\mu_F(\cdot)$ as the following piecewise linear function:
where $k > 0$ is a constant. If the input of an FCNN is normalized within [0,1] and $k = 1$, then Eq. (256) can be rewritten as
The membership function of linguistic variable big, $\mu_{big}(\cdot)$, can be obtained by using the training method proposed in Arakawa (1996). We use the membership function $\mu_{noise}(\cdot)$ to denote the degree of truth of the sentence “there
is an impulsive noise.” So, $\mu_{noise}(\cdot) = 1$ denotes “there is (exactly) an impulsive noise;” $\mu_{noise}(\cdot) = 0$ denotes “there isn’t an impulsive noise.” To facilitate VLSI implementation of FCNN, we usually choose $\mu_{big}(\cdot)$ and $\mu_{noise}(\cdot)$ as piecewise linear functions. Then the following FCNN can be used to identify impulsive noises:
1. State equation
$$\dot{x}_{ij} = -x_{ij} + \bigwedge_{C_{kl} \in N_1(i,j)/C_{ij}} \mu_{big}(|u_{kl} - u_{ij}|) \qquad (258)$$
where $N_1(i,j)/C_{ij}$ denotes all cells within neighborhood system $N_1(i,j)$ except for $C_{ij}$.
2. Output equation
$$y_{ij} = \mu_{noise}(x_{ij}). \qquad (259)$$
Generally, finding the corresponding template of an FCNN structure as in Eq. (249) is very difficult and unnecessary. Fortunately, as the relation between synaptic weights and inputs in Eq. (258) is not very complicated, the FCNN in Eq. (258) has the following space-varying nonlinear $B_{f\min}$ template, in which the center entry is left empty because $C_{ij}$ itself is excluded from the minimum in Eq. (258):
$$B_{f\min}(ij) \circ u_{ij} = \begin{bmatrix} \mu_{big}(|u_{i-1,j-1} - u_{i,j}|) & \mu_{big}(|u_{i-1,j} - u_{i,j}|) & \mu_{big}(|u_{i-1,j+1} - u_{i,j}|) \\ \mu_{big}(|u_{i,j-1} - u_{i,j}|) & & \mu_{big}(|u_{i,j+1} - u_{i,j}|) \\ \mu_{big}(|u_{i+1,j-1} - u_{i,j}|) & \mu_{big}(|u_{i+1,j} - u_{i,j}|) & \mu_{big}(|u_{i+1,j+1} - u_{i,j}|) \end{bmatrix}. \qquad (260)$$
However, even in this simple case, use of a template seems to complicate the expression. The preceding FCNN identifies impulsive noise in a parallel way over the whole image. Here, $\mu_{big}(\cdot)$ and $\mu_{noise}(\cdot)$ are usually difficult to choose and depend on the statistical properties of the impulsive noise. To overcome this problem, we can train these membership functions by a 3 x 3 FCNN. Some efficient learning algorithms for this kind of FCNN have been developed for learning $\mu_{big}(\cdot)$ and $\mu_{noise}(\cdot)$ from examples; these results will be presented in Sec. VI. Simulation results are shown in Fig. 29. Figure 29a shows a facial image of size 63 x 63 and 256 gray levels. In Fig. 29b, impulsive noise with a specified mean value and deviation is added. This image is denoted by $\{u^{b}_{ij}\}$. Figure 29c shows the output result of a median filter. This image is denoted by $\{u^{c}_{ij}\}$. Observe that the fine structures of the image in Fig. 29a are totally destroyed because Fig. 29a has low resolution and many details have a characteristic width of 1 pixel. Figure 29d shows the output of the FCNN in Eqs. (258)
FIGURE 29. Using FCNN to remove impulsive noise. (a) The facial image of a Chinese girl. (b) Impulsive noises are added. (c) Output of a median filter. (d) Output of FCNN shows the degree of impulsive noise. (e) Output of FCNN-based median filter.
and (259). In this image, the gray value of each pixel denotes the degree of being an impulsive noise. A black pixel means that it is an impulsive noise (or, $\mu_{noise} = 1$). A white pixel means that it is not an impulsive noise (or, $\mu_{noise} = 0$). A gray pixel, which occurs at boundaries or edges, means that it is suspected of being an impulsive noise. This image is denoted by $\{\mu^{d}_{noise}\}$. In this simulation, we normalize gray values within the interval [0,1]; then $\mu_{big}(\cdot)$ is given by
$$\mu_{big}(x) = x, \quad 0 \le x \le 1 \qquad (261)$$
and $\mu_{noise}(\cdot)$ is given by
where $p > s > 0$ are two constants, $p = 40/256$, $s = 20/256$.
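The detection step can be sketched as follows. The equilibrium of Eq. (258) is the minimum of $\mu_{big}$ over the eight neighbors, and $\mu_{big}(x) = x$ follows Eq. (261). Everything else is an assumption made for illustration: $\mu_{noise}$ is taken as a linear ramp from 0 at $s$ to 1 at $p$, border pixels use only the neighbors that exist, and the final blend between the local median and the original pixel is one plausible way to let the median filter act only where noise is detected, not necessarily the expression used for Fig. 29e.

```python
from statistics import median

def mu_big(x):
    """Eq. (261): mu_big(x) = x on [0, 1] (clipped for safety)."""
    return min(max(x, 0.0), 1.0)

def mu_noise(x, s=20 / 256, p=40 / 256):
    """Assumed piecewise linear ramp: 0 below s, 1 above p."""
    if x <= s:
        return 0.0
    if x >= p:
        return 1.0
    return (x - s) / (p - s)

def neighbors(u, i, j):
    """Values of the cells in N_1(i, j) except C_ij (in-bounds cells only)."""
    rows, cols = len(u), len(u[0])
    return [u[k][l]
            for k in range(max(i - 1, 0), min(i + 2, rows))
            for l in range(max(j - 1, 0), min(j + 2, cols))
            if (k, l) != (i, j)]

def noise_degree(u, i, j):
    """Output of Eq. (259) at the equilibrium of Eq. (258)."""
    x_eq = min(mu_big(abs(v - u[i][j])) for v in neighbors(u, i, j))
    return mu_noise(x_eq)

def selective_median(u):
    """Hypothetical blend: median where noise is detected, original elsewhere."""
    out = []
    for i, row in enumerate(u):
        new_row = []
        for j, center in enumerate(row):
            d = noise_degree(u, i, j)
            m = median(neighbors(u, i, j) + [center])
            new_row.append(d * m + (1.0 - d) * center)
        out.append(new_row)
    return out

image = [[0.5, 0.5, 0.5],
         [0.5, 1.0, 0.5],
         [0.5, 0.5, 0.5]]
print(noise_degree(image, 1, 1), noise_degree(image, 0, 0))
print(selective_median(image))
```

Because the minimum over all eight differences must be large, a pixel on an edge keeps at least one similar neighbor and receives a low noise degree, whereas an isolated outlier is flagged.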
We are then ready to give the output result of our FCNN-based median filter $\{u^{e}_{ij}\}$. We have
(263)
The output result is shown in Fig. 29e. Observe that this result is much better than that of Fig. 29c. The median filter can be implemented by some conventional CNNs. So far, there exist three kinds of CNN-based median filters. The first one (Paul et al., 1992, 1993a) needs n cells to sort n samples. The second one (Shi, 1994) reduces the cell number to one and needs a neighborhood with an odd number of cells. The third one (Rekeczky et al., 1995a; Roska and Kek, 1995), which is supposed to be an improved version of the second one, needs a neighborhood with an even number of cells. In this section, some analysis of the second and the third CNN-based median filters is presented. For simplicity, we consider only those CNNs with neighborhood size of 3 x 3 (i.e., $N_1(i,j)$); the analysis of other cases is similar. The median filter given in Shi (1994) has the following state equation:
where the function sgn(·) should be defined by
$$\mathrm{sgn}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases}$$
If the input set $\{u_{kl} \mid C_{kl} \in N_1(i,j)\}$ is sorted into a nondecreasing order as $\{u_1, u_2, \ldots, u_5, \ldots, u_8, u_9\}$, then the median value should be $u_5$. Let $n_+$ denote the number of elements in the set $\{u_k \mid u_k = u_5, k > 5\}$ and $n_-$ denote the number of elements in the set $\{u_k \mid u_k = u_5, k < 5\}$. If $n_+ = n_-$, then $u_5$ is the only equilibrium point of the cell in Eq. (264). In this case, we study the global stability of this equilibrium point. We have the following theorem.
Theorem 16. Given $n_+ = n_-$, then $u_5$ is asymptotically stable in the basin of attraction $(-\infty, \infty)$.
Proof We construct the following Lyapunov function:
$V_{ij}(t) > 0$ for any $x_{ij}(t) \ne u_5$. Differentiating $V_{ij}(t)$ following the solutions of Eq. (264), we have:
where
Observe that when $x_{ij} < u_5$, $\alpha > 0$ and when $x_{ij} > u_5$, $\alpha < 0$. Hence, we have $(x_{ij}(t) - u_5)\alpha < 0$ for any $x_{ij} \ne u_5$. We then have
The equality is satisfied only when $x_{ij} = u_5$. So, the median value is asymptotically stable in the basin of attraction $(-\infty, \infty)$. □
However, if $n_+ \ne n_-$ then the cell in Eq. (264) has no equilibrium point. In this case, this median filter has no stable output in the usual sense. In simulations we found that the output of the cell fluctuated around a certain value with a small deviation as time became sufficiently large. To describe this fact, we need the following definition (Yang, 1994):
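The stability claim of Theorem 16 can be checked numerically with a small sketch. It assumes that the cell of Eq. (264) is driven by the sum of sgn(u_kl - x_ij) over the nine inputs, a form chosen here because it is consistent with the equilibrium statements above; the forward Euler step size and the function names are likewise illustrative choices.

```python
def sgn(x):
    """Sign function with sgn(0) = 0."""
    return (x > 0) - (x < 0)

def simulate_cell(samples, x0, dt=1e-3, steps=5000):
    """Forward Euler integration of dx/dt = sum_k sgn(u_k - x) (assumed form)."""
    x = x0
    for _ in range(steps):
        x += dt * sum(sgn(u - x) for u in samples)
    return x

# Nine distinct inputs, so n+ = n- = 0 and the median is 0.5.
window = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
for start in (-1.0, 0.0, 1.0):
    print(start, simulate_cell(window, start))
```

Whatever the initial state, the state ends within a few step sizes of the median; the residual chattering is an artifact of the discrete step.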
Definition 8. Equilibrium point for the mean (m-equilibrium point): $x^*_{ij}$ is said to be an m-equilibrium point of Eq. (264) if
$$\lim_{t \to \infty} E[x_{ij}(t)] = x^*_{ij} \quad \text{and} \quad \lim_{t \to \infty} E[\dot{x}_{ij}(t)] = 0. \qquad (270)$$
Then we have the following theorem:
Theorem 17. If
$$E[\mathrm{sgn}(d^*(t))] = \frac{n_- - n_+}{n_+ + n_- + 1}$$
then $u_5$ is an m-equilibrium point of Eq. (264) when $n_+ \ne n_-$.
Proof
□
The fourth equality is in view of $d^*(t) \in (-\delta, \delta)$.
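The condition on the mean sign can be illustrated numerically. The sketch below takes a nine-sample window with $u_5 = 0.5$, $n_+ = 0$ and $n_- = 3$, assumes the cell dynamics dx/dt = sum of sgn(u_k - x), and interprets $d^*(t)$ as the deviation $x_{ij}(t) - u_5$. These are assumptions made for the illustration; the Euler step and the function names are hypothetical.

```python
def sgn(x):
    return (x > 0) - (x < 0)

def multiplicities(samples):
    """Return (u5, n_plus, n_minus) for a nine-sample window."""
    s = sorted(samples)
    u5 = s[4]
    n_minus = sum(1 for v in s[:4] if v == u5)
    n_plus = sum(1 for v in s[5:] if v == u5)
    return u5, n_plus, n_minus

def required_mean_sign(n_plus, n_minus):
    """Right-hand side of the condition in the theorem."""
    return (n_minus - n_plus) / (n_plus + n_minus + 1)

def simulated_mean_sign(samples, x0=0.0, dt=1e-3, steps=20000, burn_in=4000):
    """Time average of sgn(x - u5) along an Euler trajectory (assumed dynamics)."""
    u5 = sorted(samples)[4]
    x, total = x0, 0
    for step in range(steps):
        x += dt * sum(sgn(u - x) for u in samples)
        if step >= burn_in:
            total += sgn(x - u5)
    return total / (steps - burn_in)

window = [0.4, 0.5, 0.5, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9]
u5, n_plus, n_minus = multiplicities(window)
print(u5, n_plus, n_minus, required_mean_sign(n_plus, n_minus))
print(simulated_mean_sign(window))
```

In the simulation the state hovers around $u_5$, spending more time on one side than the other; the time-averaged sign comes out close to the value required by the theorem.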
Remark: We do not argue that $u_5$ is the only m-equilibrium point. In fact, there are infinitely many m-equilibrium points, $x^*_{ij}$, given different $E[\mathrm{sgn}(x_{ij}(t) - x^*_{ij})]$. This is possible if noise exists. Figure 30 shows the effect of noise on m-equilibrium points. In this simulation, we let $\{u_1, u_2, \ldots, u_5, \ldots, u_8, u_9\}$ =
FIGURE 30. (a) m-equilibrium points with different dc biases in $x_{ij}(t)$. (b) m-equilibrium point with different uniformly distributed noise in $x_{ij}(t)$.
{0.4, 0.5, 0.5, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9}, that is, $u_5 = 0.5$, $n_+ = 0$, $n_- = 3$. $v_{ij}(t)$ denotes additive noise in $x_{ij}(t)$. Figure 30a shows cases when $v_{ij}(t) = K_v$ is a dc bias. Observe that the m-equilibrium points are changed by different dc biases. For purpose of comparison, Fig. 30b shows cases when $v_{ij}(t)$ is uniformly distributed between $-K_v$ and $K_v$. The change of m-equilibrium point is similar to that in Fig. 30a. As these results are independent of the initial state, we find the CNN-based median filter in Eq. (264) is robust enough. Given a low level of additive noise, the m-equilibrium point of Eq. (264) is very close to the median value when $n_+ \ne n_-$. In most cases, this CNN-based median filter can output a satisfactory result. When this median filter is used to process a 256 gray-scale image, it can always output the correct median value because the offset of an m-equilibrium point from a real median value is 3 times less than the value corresponding to 1 bit (we normalized 256 gray levels in [-1, 1]). Rekeczky et al. (1995a) proposed a CNN-based median filter as follows:
1. State equation:
$$\dot{x}_{ij}(t) = -x_{ij} + f(x_{ij}) + \sum_{C_{kl} \in N_1(i,j)/C_{ij}} \mathrm{sgn}(x_{ij}(t) - u_{kl}). \qquad (274)$$
2. Output equation:⁴
$$f(x_{ij}(t)) = \tfrac{1}{2}\left(|x_{ij}(t) + 1| - |x_{ij}(t) - 1|\right). \qquad (275)$$
Then Rekeczky et al. (1995a) argued that “General rank order filters can be implemented simply by changing the bias value of the template (e.g., MIN filter: I = 8; MAX filter: I = -8)”. However, Yin et al. (1996) give the following description of a median filter: “To compute the output of a median filter, an odd number of sample values are sorted, and the middle or median value is used as the filter output.” Since the CNN-based median filter in Eq. (274) uses only eight sample values in $N_1(i,j)$, it is not a median filter and its result is very sensitive to initial conditions. For example, suppose these eight values are sorted into a nondecreasing order as $\{u_1, \ldots, u_4, u_5, \ldots, u_8\}$ and $u_4 \ne u_5$. If $x_{ij}(0) < u_4$, then $x_{ij}(\infty) = u_4 + \delta$, $\delta \to 0^+$. If $x_{ij}(0) > u_5$, then $x_{ij}(\infty) = u_5 + \delta$, $\delta \to 0^-$. If $x_{ij}(0)$ is a random number such that $u_4 < x_{ij}(0) < u_5$, then $x_{ij}(\infty) = x_{ij}(0)$ is also a random result. When $(u_5 - u_4)$ is big (e.g., there exists an edge in $N_1(i,j)$), this CNN-based median filter seems to output a random result when noise exists in $x_{ij}(0)$. So, this CNN-based median filter is much worse than that in Eq. (264). Also, it cannot be a rank-order filter with different biases.
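The sensitivity to initial conditions comes from a simple counting fact, sketched below: with an even number of samples, the sum of signs vanishes for every state strictly between $u_4$ and $u_5$, so nothing drives the state toward a unique value. The sketch uses the drive sum of sgn(u_k - x); this sign convention is an assumption, but the width of the zero-drive interval does not depend on it. The sample values are made up.

```python
def sgn(x):
    return (x > 0) - (x < 0)

def net_drive(x, samples):
    """Sum of sgn(u_k - x) over the samples (assumed sign convention)."""
    return sum(sgn(u - x) for u in samples)

# Eight sorted samples with a large gap (an "edge") between u4 and u5.
eight = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9, 0.95, 1.0]
for x in (0.35, 0.45, 0.6, 0.75, 0.85):
    print(x, net_drive(x, eight))
```

Every state in the gap between 0.4 and 0.8 is left where it started, which is the random result described above; with nine samples the same count is nonzero everywhere except at the median.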
⁴ Rekeczky et al. (1995a) and Roska and Kek (1995) did not give the explicit expression of the output function but described it as (Rekeczky et al., 1995a, p. 684) “a sigmoid-type piecewise linear function.”
FIGURE 31. (a) The difference of gray value between a median filter and the CNN in Eq. (274) with initial condition $x_{ij}(0) = -1$. (b) The difference of gray value between a median filter and the CNN in Eq. (274) with initial condition $x_{ij}(0) = 1$. (c) The difference of gray value between a median filter and the CNN in Eq. (274) with initial condition $x_{ij}(0) = u_{ij}$.
To show this kind of randomness in a real image processing problem, we use the CNN-based median filter in Eq. (274) to process the gray-scale image shown in Fig. 25(a). Figure 31a shows the difference between the median filtering result and the output of CNN in Eq. (274) with initial condition $x_{ij}(0) = -1$. Figure 31b shows the difference between the median filtering result and the output of CNN in Eq. (274) with initial condition $x_{ij}(0) = 1$. Figure 31c shows the difference between the median filtering result and the output of CNN in Eq. (274) with initial condition $x_{ij}(0) = u_{ij}$.
VI. LEARNING ALGORITHMS OF FCNN
In this section we present the results that distinguish an FCNN from a computational array: an FCNN can learn its weights from examples or from existing knowledge and the experience of human experts.
There are many references on learning algorithms of different CNN structures (Harrer et al., 1991a,b; Brucoli et al., 1995a,b; Nossek, 1994, 1996; Tetzlaff and Wolf, 1996a,b; Gunsel and Guzelis, 1995; He and Ushida, 1996; Aizenberg and Aizenberg, 1994; Utschick and Nossek, 1994; Tzionas, 1996; Magnussen and Nossek, 1992, 1994b; Magnussen et al., 1994; Guzelis and Karamahmut, 1994; Schuler et al., 1992, 1994; Mizutani, 1994; Balsi, 1994; Hansen, 1992; Guzelis, 1992; Szolgay et al., 1992; Puffer et al., 1995; Van Dam et al., 1994; Aizenberg et al., 1996; Roska, 1992; Kozek et al., 1993; Sziranyi and Csapodi, 1994; Zou et al., 1990a,b; Vandenberghe et al., 1990; Pelayo et al., 1990). Learning is one of the promising properties of CNN that distinguishes a CNN structure from a parallel computational array. On the other hand, the FNN literature also provides us with many special learning algorithms that address the high nonlinearity of FNN (Yamakawa and Furukawa, 1992; Ishibuchi, 1993; Blanco et al., 1995a). Nourished by these two fields, the learning algorithms of FCNN were developed. One difference between the learning algorithms for FCNN and those for conventional CNN is that the learning algorithms of FCNN may have linguistic variables as their examples (input-output pairs). In some cases, when the experience of a human expert is easily obtained and the measuring data is difficult to obtain, this learning ability is very useful. In Sect. IV, we have shown that additive FCNN is a universal framework for implementing different kinds of mathematical morphology operators. Although, as shown in Sect. IV, mathematical morphology is very useful in signal processing, one key problem is the choice of structuring elements for different tasks. Normally, structuring elements are chosen by trial-and-error methods. Recently, some morphological (neural) networks with learning ability were presented (Davidson, 1992; Davidson and Hummer, 1993; Davidson and Ritter, 1990; Araujo and Ritter, 1992).
But the structure of a morphology (neural) network is too complicated to be implemented with state-of-the-art VLSI techniques. On the other hand, we find that when FCNN are used as computational arrays, some of them (see examples in Sect. IV.B) are in fact morphological networks. Thus we can train FCNN with examples and find structuring elements from the training results. Because FCNN is a combination of two mature fields, fuzzy set theory and CNN, many regions are waiting to be explored. At the very beginning of our attempts to set up the framework of this brand new field, there were two basic motivations. One involved mathematical morphology (Serra, 1982, 1988; Heijmans, 1992), which is a very elegant framework for signal processing from the geometrical point of view. We found that both FCNN and mathematical morphological operators share two elementary features, local connectedness and max/min operations. The other motivation involved the necessity of developing an interface between human experts (users) and low-level conventional CNN structures.
An FCNN structure can be used as either a computational array or a learning array. In Sects. IV and V, FCNNs were used as computational arrays. However, the learning ability of FCNN is also a very important aspect because only when an FCNN can learn its parameters from both real number examples and linguistic statements can it actually perform as an “intelligent” interface between human experts and low-level CNNs (e.g., the conventional CNN).
A. Learning Structuring Elements
In this section, we present some learning algorithms for additive FCNNs. The learning algorithms are based on the DTFCNN structure. Although DTFCNN can be viewed as a corresponding concept of DTCNN (Harrer and Nossek, 1992b), it is not necessary to obey the tradition of standard DTCNN in which the output should be binary. The learning algorithms presented in this section are used to learn structuring elements from examples. In this view, these DTFCNN structures are mathematical morphology networks with learning ability. A general framework of type-II DTFCNN is given by
where $F_A(\cdot)$ and $F_B(\cdot)$ are two local fuzzy operations defined in $N_r(i,j)$, and $A_f(i,j;k,l)$ and $B_f(i,j;k,l)$ are the fuzzy feedback synaptic weight and the fuzzy feedforward synaptic weight, respectively. In this section, we study the learning algorithm of the following type-II DTFCNN:
$$x_{ij}(k+1) = F^{B}_{C_{kl} \in N_r(i,j)}\left(B_f(i,j;k,l),\, u_{kl}\right). \qquad (277)$$
This DTFCNN is a kind of uncoupled DTCNN. It maps its input to its output by a single iteration. This computational structure is very useful in the implementation of mathematical morphology operators (Heijmans, 1992).
1. Learning Algorithm of Additive Type-II DTFCNN
In Yang and Yang (1997d, e), we have shown that the following DTFCNN is very useful to implement gray-scale mathematical morphology transformations:
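State equation:
$$x_{ij}(k+1) = \bigwedge_{C_{kl} \in N_r(i,j)} \left(B_{f\min}(i,j;k,l) + u_{kl}\right) + \bigvee_{C_{kl} \in N_r(i,j)} \left(B_{f\max}(i,j;k,l) + u_{kl}\right), \quad 1 \le i \le M,\ 1 \le j \le N. \qquad (278)$$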
As the operations between fuzzy feedforward synaptic weights and inputs are additions, the foregoing DTFCNN is called additive DTFCNN. The output equation is given by
$$y_{ij}(k) = f(x_{ij}(k)) = \tfrac{1}{2}\left(|x_{ij}(k) + 1| - |x_{ij}(k) - 1|\right), \quad 1 \le i \le M,\ 1 \le j \le N. \qquad (279)$$
The parameters of DTFCNN in Eq. (278) for implementing erosion are given by
$$B_{f\max} = \text{undefined}, \quad B_{f\min} = -S \qquad (280)$$
where S is the structuring element. The parameters of DTFCNN in Eq. (278) for implementing dilation are given by
$$B_{f\max} = S_D, \quad B_{f\min} = \text{undefined} \qquad (281)$$
where $S_D$ is given by
$$S_D = \{-x : x \in S\}.$$
In this section, we study how an additive DTFCNN can learn the structuring element when only a set of examples $\{(u_{ij}, O_{ij})\}$ is available. The quantity $\{u_{ij}\}$ is the input set and $\{O_{ij}\}$ is the output set. As structuring elements are embedded in the feedforward templates of the DTFCNN, the objective of training the network is to adjust the weights so that a set of inputs produces the desired set of outputs. This is driven by minimizing the square of the difference between the desired output $\{O_{ij}\}$ and the actual output $\{y_{ij}\}$, for all the samples to be learned
(283)
It is well known that
Let us expand the first terms in the right-hand side of Eqs. (284) and (285) as

and expand the second terms in the right-hand side of Eqs. (284) and (285) as

Then let us expand the third term in the right-hand side of Eq. (284) as

Then we consider the so-called “smooth derivative” (Blanco et al., 1995a) of min(y, x). In the classical sense, min(y, x) is differentiable in the open intervals y < x and y > x, but the derivative is not defined at y = x, that is,
From Eqs. (284) and (291) we know that the DTFCNN will stop learning when y > x. This makes the learning process very slow. In the worst case, this
can even make the learning process impossible. To overcome this problem, we notice that Eq. (291) only gives the crisp truth value of the statement “y is less than x.” In this view, we can fuzzify Eq. (291) using different schemes to make the DTFCNN learn in a fuzzified way. One method can be found in Blanco et al. (1995a). However, this method cannot be used here; instead, we fuzzify our “smooth derivative” as
where $y \in [-1, 1]$, $x \in [-1, 1]$. Similarly, the third term in the right-hand side of Eq. (285) can be expanded as

and

Similarly, we use the following “smooth derivative” to guarantee the learning process of DTFCNN in Eq. (285)
We denote $\dfrac{\partial E}{\partial y_{ij}}$ by $\delta_{ij}$, therefore
$$\frac{\partial E}{\partial w(i,j;k,l)} = \delta_{ij}\,\phi_{ij}\,\frac{\partial x_{ij}}{\partial w(i,j;k,l)} \qquad (297)$$
where $w(i,j;k,l)$ denotes $B_{f\min}(i,j;k,l)$ or $B_{f\max}(i,j;k,l)$. Finally, the changes for the weights will be obtained from a δ-rule with expression

where $w(i,j;k,l)$ is the same as that in Eq. (312); $\mu$ is a positive constant.
1. Examples
Here the learning algorithms of DTFCNN are used to learn structuring elements from examples. We use two examples to show the usefulness of the learning algorithms for structuring element learning. Let the structuring element S be
$$S = \begin{bmatrix} 0.11 & 0.15 & 0.13 \\ 0.16 & 0.19 & 0.18 \\ 0.12 & 0.17 & 0.14 \end{bmatrix}. \qquad (299)$$
Then we have
$$B_{f\max} = S_D = \begin{bmatrix} 0.14 & 0.17 & 0.12 \\ 0.18 & 0.19 & 0.16 \\ 0.13 & 0.15 & 0.11 \end{bmatrix} \qquad (300)$$
and
$$B_{f\min} = -S = \begin{bmatrix} -0.11 & -0.15 & -0.13 \\ -0.16 & -0.19 & -0.18 \\ -0.12 & -0.17 & -0.14 \end{bmatrix}. \qquad (301)$$
Then we use the dilation operator to generate 2000 samples $\{(u_{ij}, O_{ij})\}$ as the training data to train a dilation DTFCNN. The learning process of the $B_{f\max}$ template is shown in Fig. 32. Observe that the elements of $B_{f\max}$ approached the correct values (see Eq. (300)). The initial condition for the $B_{f\max}$ template is 0 and $\mu = 1$. As the $B_{f\max}$ template is of size 3 x 3, we only need a 3 x 3 DTFCNN to learn the structuring element. Next we use the erosion operator to generate 2000 samples $\{(u_{ij}, O_{ij})\}$ as the training data to train an erosion DTFCNN. The learning process of the $B_{f\min}$ template is shown in Fig. 33. Observe that the elements of $B_{f\min}$ approached the correct values (see Eq. (301)). The initial condition for the $B_{f\min}$ template is 0 and $\mu = 1$. These examples show that our DTFCNN learning algorithms work well.
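The dilation example can be imitated with a simplified δ-rule. This sketch is not the smooth-derivative algorithm of this section: it uses the classical (crisp) derivative of the max operator, so only the weight of the maximizing neighbor is updated for each sample. It also assumes a single cell with a full 3 x 3 neighborhood, inputs drawn uniformly from [0, 0.8] so that the output function of Eq. (279) stays in its linear region, and several passes over the 2000 samples. All names and the random data are hypothetical.

```python
import random

S = [[0.11, 0.15, 0.13],
     [0.16, 0.19, 0.18],
     [0.12, 0.17, 0.14]]

def reflect(s):
    """S_D: the structuring element reflected through its center."""
    return [row[::-1] for row in s[::-1]]

def flatten(m):
    return [v for row in m for v in row]

def make_samples(true_w, n=2000, seed=0):
    """Random 3x3 input patches with their dilation outputs max_k(w_k + u_k)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = [rng.uniform(0.0, 0.8) for _ in range(9)]
        samples.append((u, max(t + x for t, x in zip(true_w, u))))
    return samples

def train(samples, mu=1.0, epochs=5):
    """Delta rule with the crisp derivative of max (simplifying assumption)."""
    w = [0.0] * 9                      # initial template is 0, as in the text
    for _ in range(epochs):
        for u, target in samples:
            sums = [wk + uk for wk, uk in zip(w, u)]
            k = max(range(9), key=sums.__getitem__)  # maximizing neighbor
            w[k] += mu * (target - sums[k])          # update only that weight
    return w

true_w = flatten(reflect(S))           # B_fmax = S_D, Eq. (300)
learned = train(make_samples(true_w))
print([round(v, 3) for v in learned])
```

With the crisp derivative each sample corrects only one weight, which illustrates why a fuzzified derivative that spreads the correction over several weights is attractive.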
FIGURE 32. Learning process of dilation DTFCNN. (Horizontal axis: number of training examples; curves: elements of the $B_{f\max}$ template.)
FIGURE 33. Learning process of erosion DTFCNN. (Horizontal axis: number of training examples; curves: elements of the $B_{f\min}$ template.)
B. Advanced Learning Algorithm of Additive Discrete-Time FCNN
The breakpoints of min and max operators pose a big problem regarding finding their derivatives. In practice there exist two kinds of methods to overcome this problem. The first one uses bounded-addition and multiplication to replace the min and max operators. Although this method bypasses the problem of derivatives, the trained network may be functionally very different from the original one. The second one involves developing a rigorous and systematic
theory for the differentiation of min and max functions by means of step functions (Marks et al., 1992), functional analysis (Zhang et al., 1994), and some special functions (Zhang et al., 1996). For the purpose of deriving the δ-learning law for DTFCNN, we have to cope with the partial differentiation of E with respect to $B_{f\min}(i,j;k,l)$ and $B_{f\max}(i,j;k,l)$; such a differentiation cannot be given in a conventional sense. By Theorem 8 of Zhang et al. (1996) we know that the following two expressions are satisfied almost everywhere in the real field $\Re$:
where one side uses the partial derivative in the conventional sense ($\partial$) and the other uses the partial derivative presented in Zhang et al. (1996). Let us expand the first term in the right-hand sides of Eqs. (302) and (303) as
(304)
and expand the second term in the right-hand sides of Eqs. (302) and (303) as
(305)
Then let us expand the third term in the right-hand side of Eq. (302) as

where $y = B_{f\min}(i,j;p,q) + u_{pq}$ and $x = \bigwedge_{C_{kl} \in N_r(i,j),\,(k,l) \ne (p,q)} \left(B_{f\min}(i,j;k,l) + u_{kl}\right)$. Because min(·,·) and max(·,·) are not differentiable functions in the conventional sense, we need to show that under certain conditions all min-max functions are continuously differentiable almost everywhere in the real number field $\Re$. Fortunately, a rigorous theory on this problem was presented in Zhang et al. (1996). To make this section self-contained, we need the following definition and proposition:
Definition 9 (Definition 1, p. 1141, Zhang et al., 1996). A function lor: $\Re \to \Re$ on the real number field $\Re$ is defined as
(307)
Proposition 10 (Corollary 1, p. 1143, Zhang et al., 1996). Suppose $a$ is a real number and $f(x)$, $h_1(x) = a \vee f(x)$, and $h_2(x) = a \wedge f(x)$ are real variable functions. If they are all differentiable at point x, then
(308)
It follows from Proposition 10 that
(310)
Similarly, the third term in the right-hand side of Eq. (303) can be expanded as
(311)
$$\frac{\partial E}{\partial w(i,j;k,l)} = \delta_{ij}\,\phi_{ij}\,\frac{\partial x_{ij}}{\partial w(i,j;k,l)} \qquad (312)$$
where $w(i,j;k,l)$ denotes $B_{f\min}(i,j;k,l)$ or $B_{f\max}(i,j;k,l)$. Finally, the changes of weights will be obtained from a δ-rule with expression
(313)
where $w(i,j;k,l)$ is the same as that in Eq. (312); $\mu$ is a positive constant. The following two theorems guarantee that the learning algorithm in Eq. (313) makes sense almost everywhere in $\Re$ and that the learning result will be a local minimum of the cost function E.
Theorem 18. For the erosion DTFCNN in Eq. (280) and the dilation DTFCNN in Eq. (281), and the cost function in Eq. (283), the partial differentials in Eqs. (306) and (311) exist almost everywhere in $\Re$.
Proof Because the $x_{ij}$ in both the erosion DTFCNN and the dilation DTFCNN are min-max functions, that is, functions containing $\vee$ and/or $\wedge$ operations, it follows from Corollary 4 of Zhang et al. (1996) that the partial differentials in Eqs. (306) and (311) exist almost everywhere in $\Re$. □
Theorem 19. The δ-rule given in Eq. (313) guarantees that the erosion DTFCNN in Eq. (280) and the dilation DTFCNN in Eq. (281) will converge to a local minimum of E in Eq. (283) with Probability 1 with increasing iteration index.
Proof Similar to the proof of Theorem 10 of Zhang et al. (1996), let us prove the theorem in two steps. First, using a process similar to that in the proof of Theorem 10 of Zhang et al. (1996), we immediately know that E in Eq. (283) is differentiable with respect to discrete time with Probability 1. Then, as the second part of the proof, we show that E always decreases whenever it is differentiable. Suppose E is differentiable at time t, then
1. Examples
In this section, the advanced learning algorithms of DTFCNN are used to learn structuring elements from examples. We use two examples to show the usefulness of the learning algorithms for structuring element learning. Letting the structuring element S be the same as in Eq. (299), we then use the dilation operator to generate 2000 samples $\{(u_{ij}, O_{ij})\}$ as training data to train a dilation DTFCNN. The learning process of the $B_{f\max}$ template is shown in Fig. 34a. Observe that elements of $B_{f\max}$ approach correct values (see Eq. (300)) within 300 iterations. The initial condition for the $B_{f\max}$ template is 0 and $\mu = 1$. As the $B_{f\max}$ template is of size 3 x 3, we only need a 3 x 3 DTFCNN to learn the structuring element. Next we use the erosion operator to generate 2000 samples $\{(u_{ij}, O_{ij})\}$ as training data to train an erosion DTFCNN. The learning process of the $B_{f\min}$ template is shown in Fig. 34b. Observe that elements of $B_{f\min}$ approach correct values (see Eq. (301)) within 400 iterations. The initial condition for the $B_{f\min}$ template is 0. We choose $\mu = 1$. Comparing the results in Fig. 34 with those in Figs. 32 and 33, we find that the learning time of the learning algorithms presented in this section is much shorter than that of the algorithms presented in the previous section.
FIGURE 34. Learning process of dilation DTFCNN and erosion DTFCNN by using advanced learning algorithms. (a) Training dilation DTFCNN. (b) Training erosion DTFCNN. (Horizontal axes: number of training examples.)
To show that the learning algorithm can obtain correct learning results, we also show the learning results for a different type of structuring element. The next one is the so-called flat structuring element, which has all its elements at the same gray-scale value. We choose the flat structuring element as
\[
S = \begin{pmatrix} 0.2 & 0.2 & 0.2 \\ 0.2 & 0.2 & 0.2 \\ 0.2 & 0.2 & 0.2 \end{pmatrix}
\]
The learning process of the $B_{f\max}$ template is shown in Fig. 35a. Observe that the entries of $B_{f\max}$ approach the correct values within 900 iterations. The initial condition for the $B_{f\max}$ template is 0. The learning process of the $B_{f\min}$ template is shown in Fig. 35b. Observe that the elements of $B_{f\min}$ approach the correct values within 1400 iterations. The initial condition for the $B_{f\min}$ template is 0. For comparison, we also present the results of the learning algorithms presented in the previous section. The learning processes of the $B_{f\max}$ template and the $B_{f\min}$ template are shown in Figs. 35c and 35d, respectively. The results in Figs. 35b and 35d are somewhat misleading due to low printer resolution. After 1400 iterations, the learning errors in Fig. 35b are much smaller than those in Fig. 35d.

We have performed extensive simulations using different templates, and it is difficult to find cases in which the learning algorithms did not converge to the correct results (global minimum). As the surface of E is very complex, the choice of initial conditions and μ is very important for reaching a global minimum. Zhang et al. (1996) proposed a method to reach a global minimum by randomly choosing many groups of initial conditions and then choosing the best one from these training results. However, the local minimum problem is still an open problem for almost all of the existing learning algorithms.

FIGURE 35. Learning process of dilation DTFCNN and erosion DTFCNN for flat structuring element. (a) Training dilation DTFCNN using advanced learning algorithm. (b) Training erosion DTFCNN using advanced learning algorithm. (c) Training dilation DTFCNN using the old learning algorithm of Sect. VI.A. (d) Training erosion DTFCNN using the old learning algorithm of Sect. VI.A. (Horizontal axes: number of training examples.)

C. Learning from Linguistic Inputs

In this section, a learning algorithm for a type-IV DTFCNN is presented. Unlike the FCNNs we proposed before (Yang and Yang 1996, 1997d, e, f; Yang et al., 1996d, e, 1998g), this type-IV DTFCNN can process fuzzy number inputs and real number inputs. Its learning algorithm is based on fuzzy number inputs.

1. Structure of Type-IV DTFCNN

Generally, an FCNN can be used as a computational array or a learning array. As a computational array, with synaptic weights being predesigned and fixed, FCNN is a universal framework of mathematical morphology networks (Yang and Yang, 1997d, e) and a paradigm for processing local linguistic statements (Yang et al., 1996d, 1998g). As a learning array, FCNN should organize its own knowledge by learning from examples that may be related to crisp numbers or fuzzy numbers. As the conventional CNN can only process numerical information from sensors (e.g., a camera), it cannot learn from linguistic information. However, in a hybrid system, the knowledge represented by fuzzy if-then rules usually plays an important role in high-level image processing and understanding. The learning ability of FCNN can bridge the gap between linguistic knowledge and the low-level image processing ability of the conventional CNN (or the conventional CNNUM).

When we train a CNN, we need a set of examples that consists of a set of phenomena and a set of corresponding results. Therefore, we have to collect enough data. In some cases the CNN is so limited that it can only learn from data that may be very expensive or very time-consuming to collect. However, human experts have accumulated a huge body of knowledge and experience that cannot be expressed by data but only by linguistic statements. If we can make a CNN-based hybrid system that is smart enough to "understand" and "learn" knowledge from a human expert, we may save both money and time. In this view, FCNN functions as the interface between the human expert and the low-level conventional CNN.

In this section, we propose a DTFCNN structure that can be trained by fuzzy numbers (i.e., convex and normal fuzzy sets on the real line (Kaufmann and Gupta, 1995)). This DTFCNN has a crisp structure, which allows fuzzy number information to flow through it. Therefore, the synaptic weights in this DTFCNN are crisp, while the inputs, states, and outputs are fuzzy numbers. This DTFCNN structure can process the knowledge of a human expert. This DTFCNN is a type-IV FCNN. In particular, we teach this DTFCNN how to remove impulsive noise in an image using linguistic variables.
To remove impulsive noise in images, median filters are usually used (Arakawa, 1996; Mancuso et al., 1996; Pitas and Venetsanopoulos, 1991). Although median filters have some edge-preserving capabilities, they distort the fine structures of images (thin lines in the image may disappear and the image becomes slightly blurred). One can use weighted median filters (Yli-Harja et al., 1991) or conditional median filters (Arakawa, 1996; Mancuso et al., 1996) to improve performance. However, setting the weights of a weighted median filter is very difficult, so we do not discuss this kind of median filter. A conditional median filter outputs the median value if an impulsive noise is identified and keeps the input value unchanged if no impulsive noise is identified. Identification of impulsive noise thus plays the most important role in a conditional median filter. In Arakawa (1996) and Mancuso et al. (1996), fuzzy rule-based methods are used to identify impulsive noise and achieve high performance. However, it is difficult to design the fuzzy rules and choose the membership functions. To overcome this problem, the DTFCNN
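learning algorithm is used to learn these fuzzy rules from linguistic examples that are based on our experience.

In this section we use the symbol "~" over a character to denote a fuzzy number. To reduce the computational complexity, the LR-type fuzzy number (Dubois and Prade, 1980) is used. A fuzzy number $\tilde{x}$ is said to be LR-type if and only if

\[
\mu_{\tilde{x}}(x) = \begin{cases} g_L\!\left(\dfrac{c - x}{a}\right), & x \le c,\ a > 0 \\[2mm] g_R\!\left(\dfrac{x - c}{b}\right), & x \ge c,\ b > 0 \end{cases} \qquad (316)
\]

where $\mu_{\tilde{x}}$ is the membership function of $\tilde{x}$. $g_L(\cdot)$ and $g_R(\cdot)$ are the reference functions for the left and right references. The c denotes the mean value of $\tilde{x}$, and a and b denote the left and right references, respectively. If a and b are both zero then $\tilde{x}$ degenerates to a crisp number. We define addition of two fuzzy numbers $\tilde{x}$ and $\tilde{y}$ as

\[
\mu_{\tilde{x} + \tilde{y}}(z) = \max\{\mu_{\tilde{x}}(x) \wedge \mu_{\tilde{y}}(y) \mid z = x + y\} \qquad (317)
\]

and define multiplication of a real number k and a fuzzy number $\tilde{x}$ as

\[
\mu_{k\tilde{x}}(y) = \max\{\mu_{\tilde{x}}(x) \mid y = kx\}. \qquad (318)
\]

For a monotonically increasing function $f(\cdot)$, we define $f(\tilde{x})$ as

\[
\mu_{f(\tilde{x})}(y) = \max\{\mu_{\tilde{x}}(x) \mid y = f(x)\}. \qquad (319)
\]

The h-level set $\tilde{x}^h$ of $\tilde{x}$ is defined by

\[
\tilde{x}^h = \{x \mid \mu_{\tilde{x}}(x) \ge h,\ h \in (0, 1]\}. \qquad (320)
\]

So, $\tilde{x}^h$ is a closed interval denoted by

\[
\tilde{x}^h = [x_L^h, x_U^h] \qquad (321)
\]

where the subscripts "L" and "U" denote the lower limit and the upper limit, respectively. We define addition of two intervals $[x_L, x_U]$ and $[y_L, y_U]$ as (Alefeld and Herzberger, 1983)

\[
[x_L, x_U] + [y_L, y_U] = [x_L + y_L,\ x_U + y_U] \qquad (322)
\]

and define multiplication between a real number k and an interval $[x_L, x_U]$ as (Alefeld and Herzberger, 1983)

\[
k\,[x_L, x_U] = \begin{cases} [k x_L,\ k x_U], & k \ge 0 \\ [k x_U,\ k x_L], & k < 0 \end{cases} \qquad (323)
\]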
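The h-level sets and interval operations of Eqs. (320) to (323) are easy to try out numerically. The following Python sketch is only an illustration: the function names are hypothetical, and the h-level set is computed under the assumption that both reference functions are linear, g(z) = max(0, 1 - z), which turns the LR-type number into a triangular one.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower limit, upper limit)


def h_level_set(c: float, a: float, b: float, h: float) -> Interval:
    """h-level set of an LR-type fuzzy number with mean c and references a, b.

    Assumption: both reference functions are linear, g(z) = max(0, 1 - z),
    so the number is triangular. Other reference functions give other cuts.
    """
    if not 0.0 < h <= 1.0:
        raise ValueError("h must lie in (0, 1]")
    return (c - (1.0 - h) * a, c + (1.0 - h) * b)


def interval_add(x: Interval, y: Interval) -> Interval:
    """Interval addition: lower limits and upper limits are added separately."""
    return (x[0] + y[0], x[1] + y[1])


def interval_scale(k: float, x: Interval) -> Interval:
    """Product of a real number k and an interval; a negative k swaps the limits."""
    lo, hi = k * x[0], k * x[1]
    return (lo, hi) if k >= 0 else (hi, lo)


if __name__ == "__main__":
    cut = h_level_set(c=0.8, a=0.3, b=0.2, h=0.5)
    print(cut)
    print(interval_add(cut, (0.1, 0.2)))
    print(interval_scale(-1.0, cut))
```

With h = 1 the cut collapses to the single point c, which matches the remark that a fuzzy number with zero references degenerates to a crisp number.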
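The max(·, ·) and min(·, ·) operations are defined by

\[
\max([x_L, x_U], [y_L, y_U]) = [\max(x_L, y_L),\ \max(x_U, y_U)]
\]

\[
\min([x_L, x_U], [y_L, y_U]) = [\min(x_L, y_L),\ \min(x_U, y_U)]
\]

For a monotonically increasing function $f(\cdot)$ we define $f([x_L, x_U])$ as

\[
f([x_L, x_U]) = [f(x_L),\ f(x_U)]
\]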
(327) (328) (329) (330) (331)

A cell $C_{ij}$ in an M × N DTFCNN used in this section is defined by the state equation

(332)

where $\tilde{x}_{ij}(t+1)$ is the state of $C_{ij}$ at discrete time t + 1 and $\tilde{x}_{ij}(t+1)$ is a fuzzy number. The F(·) denotes a fuzzy local operator defined in the r-neighborhood $N_r(i, j)$, and $\tilde{u}_{kl}$ is the input of $C_{kl}$ and a fuzzy number. Because the preceding DTFCNN does not have feedback synaptic weights, its output equation is given by

(333)

where f(·) is a monotonically increasing nonlinear function given by

(334)

The conventional DTCNN has a sgn(·) function as its f(·) (Harrer and Nossek, 1992b). However, when a DTFCNN is subjected to a learning process, an f(·) with a continuous first-order derivative should be used. The f(·) defined in Eq. (334) satisfies this condition.
Remarks: The DTFCNN structure in Eq. (332) is completely different from the structures we proposed before. In our previous FCNN structures the membership values were mapped to either crisp values or other membership values, which are real numbers, that is, only real numbers are propagated through these FCNN structures. The DTFCNN structure in Eq. (332) can map fuzzy numbers to crisp values or fuzzy numbers and allows fuzzy numbers to propagate through it. Although we demonstrated that the general FCNN structure is not a kind of conventional NCNN, we did not present examples of FCNN structures that cannot be included in the classical CNN with nonlinear synaptic laws. However, the DTFCNN in Eq. (332) is totally different from any kind of conventional NCNN because the fuzzy number can flow through this structure. This DTFCNN structure is very useful to classification problems where input patterns are fuzzy numbers. As the structure in Eq. (332) is very general, we would like to study the learning algorithm, using one of its simple forms as follows:
This type-IV DTFCNN shares the same mathematical form as the simple min/max FCNN we proposed before. However, as its inputs and states are totally different from those we proposed before, it is a new FCNN structure.

2. Learning Algorithm of Type-IV DTFCNN

In this section we propose the learning algorithm of the type-IV DTFCNN for two-class classification problems. Assume that we have the following example set:

\[
\{(\tilde{u}_{ij}, O_{ij})\} \qquad (336)
\]

where $\{\tilde{u}_{ij}\}$ is a set of fuzzy numbers given by

(337)

and $O_{ij}$ is a classification result given by

\[
O_{ij} = \begin{cases} 1, & \text{if } C_{ij} \text{ belongs to class 1} \\ 0, & \text{if } C_{ij} \text{ belongs to class 2} \end{cases} \qquad (338)
\]
If we use the output of $C_{ij}$ to denote the classification result of $C_{ij}$, we have

\[
\tilde{y}_{ij}(t) = \begin{cases} 1, & \text{if } C_{ij} \text{ belongs to class 1} \\ 0, & \text{if } C_{ij} \text{ belongs to class 2} \end{cases} \qquad (339)
\]

From the foregoing one can see that $\tilde{y}_{ij}(t)$ degenerates to a crisp number. This is because the nonlinear output function f(·) functions as a defuzzifier. An explicit expression of this kind of defuzzifier can be given by

(340)

where h ∈ (0, 1]. Then, given an h-level set of $\tilde{u}_{ij}$, our training objective is to minimize the following cost function:

(341)

where
From this one can see that we should train the DTFCNN using different h-level sets. Increasing the number of h-level sets improves the training results but also increases the training time. Therefore, there is a tradeoff between the number of h-level sets and the performance of the training results. To train the DTFCNN, we use the following learning rules to update the two kinds of feedforward synaptic weights $B_1(i, j; p, q)$ and $B_2(i, j; p, q)$, respectively:
where t is the learning iteration, $C_{pq} \in N_r(i, j)$, and α and β are the learning rate and momentum rate, respectively. On the right-hand side of Eq. (343), $\partial E / \partial B_1(i, j; p, q)$ is given by

\[
\frac{\partial E}{\partial B_1(i, j; p, q)} = \frac{\partial E}{\partial y_{ij}} \, \frac{\partial y_{ij}}{\partial x_{ij}} \, \frac{\partial x_{ij}}{\partial B_1(i, j; p, q)}
\]

and

(347)
As $\tilde{x}_{ij}$ is a fuzzy number, we can train the DTFCNN using the h-level interval numbers as

(348)

where
Then we consider the so-called "smooth derivative" (Blanco et al., 1995a) of max($u_1$, E). In the classical sense, max($u_1$, E) is differentiable in the open intervals $u_1 < E$ and $u_1 > E$, but the derivative is not defined at $u_1 = E$, that is,

\[
\frac{\partial \max(u_1, E)}{\partial u_1} = \begin{cases} 1, & \text{if } u_1 > E \\ 0, & \text{if } u_1 < E \end{cases} \qquad (350)
\]

From Eq. (350) we know that the DTFCNN will stop learning when $u_1 < E$. This makes the learning process of the DTFCNN very slow. In the worst case, this can even make the learning process impossible. To overcome this problem, we notice that Eq. (350) only gives the crisp truth value of the statement "$u_1$ is greater than E." In this sense, we can fuzzify Eq. (350) using different methods. One example can be found in Blanco et al. (1995a). However, the method used in Blanco et al. (1995a) cannot be used here; we fuzzify our "smooth derivative" as

\[
\frac{\partial \max(u_1, E)}{\partial u_1} = 1, \quad \text{if } u_1 > E \qquad (351)
\]
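The stalling effect described for the classical derivative can be seen in a few lines of Python. This is a toy sketch and not the learning rule of Eq. (343): the names `dmax_du1` and `weight_step`, the product `w * x` standing in for $u_1$, and the plain gradient step without momentum are all assumptions made for illustration.

```python
def dmax_du1(u1: float, other: float) -> float:
    """Classical derivative of max(u1, other) with respect to u1."""
    if u1 == other:
        raise ValueError("the derivative is not defined where the arguments are equal")
    return 1.0 if u1 > other else 0.0


def weight_step(w: float, x: float, other: float, dE_dout: float, rate: float = 0.5) -> float:
    """One plain gradient step for out = max(w * x, other).

    Hypothetical toy setting: u1 = w * x, and dE_dout is the error signal
    arriving from whatever follows the max operation.
    """
    grad_w = dE_dout * dmax_du1(w * x, other) * x
    return w - rate * grad_w


if __name__ == "__main__":
    # u1 = 0.2 is below the other argument 0.5: the weight does not move.
    print(weight_step(w=0.2, x=1.0, other=0.5, dE_dout=1.0))
    # u1 = 0.8 is above it: the weight is updated.
    print(weight_step(w=0.8, x=1.0, other=0.5, dE_dout=1.0))
```

Whenever $u_1$ is the smaller argument, the gradient is exactly zero regardless of the error, which is the situation the fuzzified derivative is meant to avoid.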
Then we have
Similarly, on the right-hand side of Eq. (344), $\partial E / \partial B_2(i, j; p, q)$ is given by
(353)
(354)
Similarly, the “smooth derivative” of min($u_2$, y) is given by

(356)

Then we have
3. Application to Impulsive Noise Identification

Impulsive noise in an image can be removed by using a nonlinear filter, such as a median filter, a rank-order filter, or a mathematical morphology operator. However, almost all of the foregoing filtering methods blur fine structures in the parts of the image where impulsive noise does not exist. Thus there exists a kind of expert knowledge-based method that can remove impulsive noise while keeping regions without impulsive noise unchanged (Mancuso et al., 1996; Arakawa, 1996). The first step in this kind of method is to identify the location of impulsive noise based on linguistic statements of knowledge about impulsive noise. If we assume that an image is smooth enough, then impulsive noise will introduce a significant difference in gray value from its neighbors. Our visual system has the experience that if a pixel has a gray value that differs significantly from all its neighbors, it should be impulsive noise. To make this experience understandable to a DTFCNN, we first translate it into a set of fuzzy if-then rules (Mancuso et al., 1996). Considering a 3 × 3 neighborhood ($N_1(i, j)$) and using $u_{ij}$ to denote the gray value of pixel (i, j) in the image, we have

IF $|u_{i,j-1} - u_{ij}|$ is big and $|u_{i,j+1} - u_{ij}|$ is big and
   $|u_{i-1,j} - u_{ij}|$ is big and $|u_{i+1,j} - u_{ij}|$ is big and
   $|u_{i-1,j-1} - u_{ij}|$ is big and $|u_{i+1,j+1} - u_{ij}|$ is big and
   $|u_{i-1,j+1} - u_{ij}|$ is big and $|u_{i+1,j-1} - u_{ij}|$ is big
THEN $u_{ij}$ is an impulsive noise.

where "big" is a fuzzy number. As the characteristics of impulsive noise change from one image to another, human experts will have different qualitative statements for "big." The DTFCNN can make a tradeoff between the judgments of human experts by learning from different linguistic examples (knowledge from different human experts). To train the DTFCNN, we define the fuzzy numbers big, middle, and small as shown in Fig. 36.

FIGURE 36. Membership functions of three fuzzy numbers: small (S), middle (M), and big (B).

From Fig. 36 one can see that the membership functions of the fuzzy numbers small, middle, and big can be expressed by
From the preceding we know that we can use the following DTFCNN to identify impulsive noise:
where $B_{\max} = \{B_{\max}(k, l)\}_{3 \times 3}$ and $B_{\min} = \{B_{\min}(k, l)\}_{3 \times 3}$ are two feedforward templates. The quantity $\tilde{u}_{i+k,j+l}$ denotes the fuzzy number that is used to describe the uncertainty of $|u_{i+k,j+l} - u_{ij}|$. Since $|u_{ij} - u_{ij}| = 0$ is always true, $B_{\min}(0, 0)$ and $B_{\max}(0, 0)$ are don't care entries. In this section, we let $B_{\min}(0, 0) = 0$ and $B_{\max}(0, 0) = 0$. As the DTFCNN in Eq. (361) is space-invariant, we only need a 3 × 3 DTFCNN to learn a 3 × 3 template. And as the training process only needs knowledge from a human expert, we can generate the training examples as shown in Fig. 37.

FIGURE 37. Illustrations of patterns of training examples in two classes. Class 1 denotes that an impulsive noise exists. Class 2 denotes that an impulsive noise does not exist. (Legend: S, M, and B denote small, middle, and big; a further symbol denotes any of S, M, and B, that is, don't care; the output symbols denote "impulsive noise" and "not an impulsive noise.")

In Fig. 37, each input pattern denotes a possible configuration of the $\tilde{u}_{i+k,j+l}$'s in $N_1(i, j)$. Two classes of examples are illustrated. There is only one pattern in class 1, which has output 1 (impulsive noise), while all 8 patterns in class 2 have output 0 (not an impulsive noise). We train the DTFCNN using examples chosen randomly from class 1 and class 2. During the first 2000 examples, we chose 80% of the examples from class 1 and the rest from class 2. This makes the learning process faster. After that, we chose only 8% of the examples from class 1. This makes the learning process slower and smoother. Figure 38 shows the learning curves of $B_{\min}(1, 1)$ and $B_{\max}(1, 1)$ with parameters α = 0.5 and β(t) = 0.5 × (0.999)^t. The initial values of the entries of templates $B_{\min}$ and $B_{\max}$ are chosen randomly in the interval (0, 1). As the ranges of inputs and outputs are in [0, 1], in the learning process we restrict the dynamical ranges of $B_{\min}(k, l)$ and $B_{\max}(k, l)$ to the interval [-1, 1]. From Fig. 38 one can see that $B_{\min}(1, 1)$ approaches 1 while $B_{\max}(1, 1)$ approaches 0. In this simulation, we use 3 level sets (h = 1 and two lower levels) of each fuzzy number input pattern to train the DTFCNN. After being trained by 40,000 examples, the DTFCNN learns the following templates:
\[
B_{\min} = \begin{pmatrix} 0.996746 & 0.996926 & 0.994743 \\ 0.996385 & 0.000000 & 0.997409 \\ 0.997262 & 0.997369 & 0.996796 \end{pmatrix}, \quad
B_{\max} = \begin{pmatrix} -0.000003 & -0.000001 & -0.000002 \\ -0.000001 & 0.000000 & -0.000002 \\ -0.000002 & -0.000002 & -0.000003 \end{pmatrix} \qquad (362)
\]

FIGURE 38. Learning curves of $B_{\min}(1, 1)$ and $B_{\max}(1, 1)$. (Horizontal axis: number of training examples, × 10^4.)
Observe that every entry in $B_{\min}$ is very close to 1 and every entry in $B_{\max}$ is very close to 0 (one should notice that the central entries of both templates are don't care entries). We then use the templates in Eq. (362) to process a 63 × 63 gray-scale image with 256 gray levels, which contains impulsive noise of mean value 220 and deviation 35, shown in Fig. 39a. The image in Fig. 39a is used as $u_{ij}$, $1 \le i, j \le 63$; $u_{ij}$ is normalized such that the condition

(363)

is satisfied. After training, the synaptic weights of the DTFCNN can be fixed, and the DTFCNN degenerates into a computational array whose inputs and outputs are crisp values. Thus the crisp form of the trained DTFCNN used in this simulation can be written as

(364)
FIGURE 39. Computer simulation results of impulsive noise identified using the trained DTFCNN. (a) Image containing impulsive noise. (b) The output of the trained DTFCNN. (c) The threshold result of (b).
Figure 39b shows the output of the foregoing DTFCNN, from which one can see that every impulsive noise is identified except those in the first and last rows and those in the first and last columns. This is because we used these cells as dumb cells (boundary cells) for 3 × 3 templates in our simulation. Figure 39c shows the threshold result of Fig. 39b, from which one can see that all impulsive noise is identified. We never used crisp examples to train the DTFCNN, but it works well when it processes crisp inputs.
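The idea behind the rule "every neighbor difference is big" can be mimicked on crisp data with a few lines of Python. The sketch below is not Eq. (364): it assumes, for illustration only, that the detector score is the smallest absolute gray-level difference between a pixel and its eight neighbors, that a hypothetical threshold of 0.5 on normalized gray values decides what counts as impulsive noise, and that flagged pixels are then replaced as in a conditional median filter. Boundary pixels are left untouched, in line with the boundary cells mentioned above.

```python
from statistics import median

OFFSETS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]


def impulse_score(img, i, j):
    """Smallest absolute difference between pixel (i, j) and its 8 neighbors.

    The score is large only if every neighbor differs strongly from the
    center pixel, mirroring the conjunction in the fuzzy rule.
    """
    center = img[i][j]
    return min(
        abs(img[i + di][j + dj] - center)
        for di, dj in OFFSETS
        if (di, dj) != (0, 0)
    )


def conditional_median(img, threshold=0.5):
    """Replace flagged interior pixels by the 3 x 3 median; keep all others."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            if impulse_score(img, i, j) > threshold:  # threshold is a hypothetical choice
                window = [img[i + di][j + dj] for di, dj in OFFSETS]
                out[i][j] = median(window)
    return out


if __name__ == "__main__":
    noisy = [[0.1, 0.1, 0.1],
             [0.1, 0.9, 0.1],
             [0.1, 0.1, 0.1]]
    print(conditional_median(noisy))
```

Pixels that are not flagged keep their input value, so fine structures outside the noisy locations are not blurred.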
Because fuzzy numbers can be propagated through this DTFCNN structure, this DTFCNN can learn its templates from linguistic knowledge. In cases where training examples are difficult to collect and the knowledge of a human expert is available, this structure is very useful. On the other hand, this structure can also be used as an interface between the conventional CNN and the human experts, designers, and users of a CNN-based hybrid image processing system. Therefore, this structure will extend the CNN concept from low-level image processing to high-level image processing and from structure-based image processing to knowledge-based image processing.

FCNN structures can be used effectively to solve fuzzy IF-THEN-ELSE rule-based image processing problems. Given a set of local fuzzy rules, a systematic method was presented for selecting the corresponding FCNN structures in Sect. V. The membership functions of the linguistic variables used in the fuzzy rules should be chosen according to the different rules. In a fuzzy IF-THEN-ELSE rule the membership functions, whose choice is usually a very difficult and time-consuming process, play very important roles. In this section, a real-coded genetic algorithm (GA) is used to optimize the membership functions of the chosen FCNN structure. The corresponding crossover and mutation operations are presented. The crossover operation consists of three schemes that are a tradeoff between the evolution of the best individual and that of the rest of the population. The mutation operation consists of a local one and a global one. The local one makes the evolution search the local basin of the best individual, while the global one makes the evolution search the global problem space to escape the trap of a local optimum. Then the GA is used to optimize the membership functions for solving the edge extraction problem with ill-conditioned examples.

A. Genetic Algorithm for Optimizing FCNN
GAs are optimization approaches motivated by the evolution of living creatures. They combine robustness with the ability to explore a huge search space quickly. The basic knowledge of GA can be found in Davis (1991) and Goldberg (1989). The GA exploits the collective learning process within a population of individuals, and each individual represents a search point in the space of potential solutions to a given problem. The applications of GA to fuzzy logic (Hanebeck and Schmidt, 1996; Back and Kursawe, 1995; Lozano et al., 1995; Tryba et al., 1995) can be lumped roughly into two categories: 1) optimization of the membership functions of fuzzy sets; and 2) automatic learning of fuzzy rules.
We use GA to optimize the membership functions of FCNN. The correct choice of the membership functions plays an important role in the design of FCNN. There are some applications (Hanebeck and Schmidt, 1996; Back and Kursawe, 1995; Lozano et al., 1995; Tryba et al., 1995) that show that GAs are capable of optimizing membership functions. The basic idea is to represent the complete set of membership functions by an individual and to evolve the shapes of the membership functions. We use GA only to optimize the normalized trapezoidal membership functions, which can be represented by a 4-tuple $(a^{(1)}, a^{(2)}, a^{(3)}, a^{(4)})$ as follows (Bojadziev and Bojadziev, 1995):

\[
\mu_A(x) = \begin{cases}
\dfrac{x - a^{(1)}}{a^{(2)} - a^{(1)}}, & a^{(1)} \le x < a^{(2)} \\[2mm]
1, & a^{(2)} \le x \le a^{(3)} \\[2mm]
\dfrac{a^{(4)} - x}{a^{(4)} - a^{(3)}}, & a^{(3)} < x \le a^{(4)} \\[2mm]
0, & \text{otherwise}
\end{cases} \qquad (365)
\]
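A normalized trapezoidal membership function described by a 4-tuple can be evaluated as in the following Python sketch. The function name and argument names are hypothetical, and the handling of degenerate tuples (for example $a^{(1)} = a^{(2)}$, which gives a vertical left edge) is an implementation choice rather than something stated in the text.

```python
def trapezoid(x: float, a1: float, a2: float, a3: float, a4: float) -> float:
    """Normalized trapezoidal membership value of x for the 4-tuple (a1, a2, a3, a4).

    The plateau [a2, a3] has membership 1; the membership rises linearly on
    (a1, a2), falls linearly on (a3, a4), and is 0 elsewhere.
    """
    if not a1 <= a2 <= a3 <= a4:
        raise ValueError("the 4-tuple must be nondecreasing")
    if a2 <= x <= a3:
        return 1.0
    if a1 < x < a2:
        return (x - a1) / (a2 - a1)
    if a3 < x < a4:
        return (a4 - x) / (a4 - a3)
    return 0.0


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.5, 2.5, 3.5):
        print(x, trapezoid(x, 0.0, 1.0, 2.0, 3.0))
```

Checking the plateau first means that a tuple with coinciding corner points never leads to a division by zero.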
In Eq. (365), A is a trapezoidal fuzzy variable.

A typical GA starts with a randomly chosen population of individuals. This population then undergoes evolution in the form of natural selection. In each generation, relatively good individuals are reproduced, providing offspring that replace the relatively bad individuals, which are eliminated. An evaluation or fitness function is used to distinguish good and bad individuals. A typical GA consists of three basic operations: 1) evaluation of individual fitness; 2) formation of a gene pool; and 3) recombination using two basic genetic operators, crossover and mutation. The GA used in this section is as follows:
    /* initialize */
    generation t = 0;
    initialize the gene pool GP(0);
    while (not termination-condition) do
        generation t = t + 1;
        select individuals C(t - 1) = {c_i} ∈ GP(t - 1);
        crossover c_i, c_j ∈ C(t - 1) and get C(t);
        evaluation and selection of C(t) and get GP(t);
        mutation of GP(t);
    end
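The loop above can be turned into a small runnable skeleton. The Python sketch below is a simplified illustration, not the GA of this section: it uses a single crossover scheme (best with worst) instead of three probabilistic ones, keeps the best individual unmutated, and takes the operators as arguments. The toy operators and the toy error function at the bottom are hypothetical placeholders for the operators and the error measure defined in the text.

```python
import random


def run_ga(init, error, crossover, mutate, pool_size=10, generations=100, seed=0):
    """Evolve a gene pool and return the individual with the smallest error."""
    rng = random.Random(seed)
    pool = [init(rng) for _ in range(pool_size)]            # initialize GP(0)
    for _ in range(generations):
        pool.sort(key=error)                                # evaluate and rank
        offspring = crossover(pool[0], pool[-1])            # best crossed with worst
        pool = sorted(pool + offspring, key=error)[:pool_size]   # selection
        pool = [pool[0]] + [mutate(c, rng) for c in pool[1:]]    # mutation, best kept
    return min(pool, key=error)


# Toy operators, purely illustrative: minimize the sum of squares of 8 genes.
def toy_init(rng):
    return [rng.random() for _ in range(8)]


def toy_error(c):
    return sum(v * v for v in c)


def toy_crossover(c1, c2):
    return [[(a + b) / 2.0 for a, b in zip(c1, c2)]]


def toy_mutate(c, rng):
    out = list(c)
    out[rng.randrange(len(out))] = rng.random()
    return out


if __name__ == "__main__":
    best = run_ga(toy_init, toy_error, toy_crossover, toy_mutate)
    print(toy_error(best))
```

Because the best individual is never mutated and selection keeps the lowest errors, the best error in the pool cannot increase from one generation to the next in this sketch.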
Since in an FCNN-based FIRE edge extractor only two fuzzy variables, ZERO and WHITE, are used, the kth individual in the gene pool, $c_k$, can be represented by

(366)

where $(a_k^{(1)}, a_k^{(2)}, a_k^{(3)}, a_k^{(4)})$ is a 4-tuple for determining the trapezoidal membership function of ZERO and $(b_k^{(1)}, b_k^{(2)}, b_k^{(3)}, b_k^{(4)})$ is that for WHITE. The initialization of the gene pool is given by the following process. Suppose ξ is a pseudo-random number distributed in (0, 1). It is easy to see that the only choice of $a_k^{(1)}$ for $\mu_{ZE}(x)$ is $a_k^{(1)} = 0$. Then $a_k^{(4)}$ is chosen by

\[
a_k^{(4)} = \frac{\xi}{2} \qquad (367)
\]

and $a_k^{(2)}$ and $a_k^{(3)}$, respectively, are chosen by

(368)

(369)

Observe that $b_k^{(4)}$ for $\mu_{WH}(x)$ has a best choice of $b_k^{(4)} = 1$. Then $b_k^{(1)}$ is given by

\[
b_k^{(1)} = 0.5 + \frac{\xi}{3} \qquad (370)
\]

and then $b_k^{(2)}$ and $b_k^{(3)}$, respectively, are chosen by

(371)

(372)
Crossover is given by the max-min-arithmetical algorithm presented in Lozano et al. (1995). Assuming that $C_1 = \{c_1(1), c_1(2), \ldots, c_1(8)\}$ and $C_2 = \{c_2(1), c_2(2), \ldots, c_2(8)\}$ are two individuals to be crossed, the four offspring are given by

\[
C_1' = \{c_1'(i) \mid c_1'(i) = \alpha c_1(i) + (1 - \alpha) c_2(i),\ i = 1, 2, \ldots, 8\} \qquad (373)
\]

\[
C_2' = \{c_2'(i) \mid c_2'(i) = (1 - \alpha) c_1(i) + \alpha c_2(i),\ i = 1, 2, \ldots, 8\} \qquad (374)
\]

\[
C_3' = \{c_3'(i) \mid c_3'(i) = \max(c_1(i), c_2(i)),\ i = 1, 2, \ldots, 8\} \qquad (375)
\]

\[
C_4' = \{c_4'(i) \mid c_4'(i) = \min(c_1(i), c_2(i)),\ i = 1, 2, \ldots, 8\} \qquad (376)
\]
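The four offspring of the max-min-arithmetical crossover can be computed as in the following Python sketch. The function name is hypothetical, the default α = 0.618 is the value reported for the simulation later in this section, and the code works for individuals of any common length, not only 8.

```python
from typing import List, Sequence


def max_min_arithmetical(c1: Sequence[float], c2: Sequence[float],
                         alpha: float = 0.618) -> List[List[float]]:
    """Return the four offspring of two real-coded individuals."""
    if len(c1) != len(c2):
        raise ValueError("both individuals must have the same length")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    pairs = list(zip(c1, c2))
    first = [alpha * a + (1.0 - alpha) * b for a, b in pairs]    # weighted toward c1
    second = [(1.0 - alpha) * a + alpha * b for a, b in pairs]   # weighted toward c2
    largest = [max(a, b) for a, b in pairs]                      # element-wise maximum
    smallest = [min(a, b) for a, b in pairs]                     # element-wise minimum
    return [first, second, largest, smallest]


if __name__ == "__main__":
    for child in max_min_arithmetical([0.0, 0.2, 0.4], [0.1, 0.1, 0.9]):
        print(child)
```

Every gene of every offspring lies between the corresponding genes of the two parents, so the crossover never leaves the range spanned by the parents.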
In Eqs. (373) to (376), α ∈ (0, 1) is a constant. The best offspring are then selected. There are three crossover schemes used in our GA. The first one, which occurs with a probability $p_{c1}$, is the crossover between the best individual and the worst one; the best offspring is then substituted for the worst one. The second one, which occurs with a probability $p_{c2}$, is the crossover between the best individual and any of the subworst ones; the best offspring is then substituted for the subworst one. The third one, which occurs with a probability $p_{c3}$, is the crossover between any two of the subbest ones; the two best offspring are then substituted for the two subbest ones.

Mutations consist of a local mutation scheme and a global mutation scheme. The local mutation, which occurs with a probability $p_{m1}$, is given by the following process. Assuming that an element $c_k(i)$, $i = 1, \ldots, 8$, of an individual $c_k = \{c_k(1), \ldots, c_k(8)\}$ is chosen for local mutation and that the domain of $c_k(i)$ is $[d^l, d^r]$, the result is a new individual $c_k' = \{c_k(1), \ldots, c_k'(i), \ldots, c_k(8)\}$, where $c_k'(i)$ is given by
\[
c_k'(i) = \begin{cases}
c_k(i) - \xi^* (c_k(i) - d^l), & \text{for } \xi > 0.5 \\
c_k(i) + \xi^* (d^r - c_k(i)), & \text{else}
\end{cases} \qquad (377)
\]
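A local mutation of a single gene in the spirit of Eq. (377) can be sketched in Python as follows. The sketch assumes that two independent pseudo-random numbers are drawn, one to choose the direction and one to choose the step size; the function name and the use of `random.Random` are illustrative choices.

```python
import random


def local_mutation(value: float, low: float, high: float, rng: random.Random) -> float:
    """Move one gene toward the lower or upper end of its domain [low, high]."""
    if not low <= value <= high:
        raise ValueError("the gene must lie inside its domain")
    xi = rng.random()        # decides the direction of the move
    xi_star = rng.random()   # decides how far the gene moves
    if xi > 0.5:
        return value - xi_star * (value - low)
    return value + xi_star * (high - value)


rng = random.Random(1)
samples = [local_mutation(0.3, 0.0, 0.5, rng) for _ in range(1000)]

if __name__ == "__main__":
    print(min(samples), max(samples))
```

Since the step is a fraction of the distance to the chosen end of the domain, the mutated gene always stays inside the domain.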
In Eq. (377), $\xi^*$ is a pseudo-random number uniformly distributed in (0, 1). The $[d^l, d^r]$ for each element of individual $c_k$ is given by

\[
\{([0, 0],\ [0, a^{(3)}],\ [a^{(2)}, a^{(4)}],\ [a^{(3)}, 1]),\ ([0, b^{(2)}],\ [b^{(1)}, b^{(3)}],\ [b^{(2)}, b^{(4)}],\ [1, 1])\}
\]

The global mutation, which occurs with a probability $p_{m2}$, is the same as initialization. Thus, the local mutation can be used to improve the existing individuals, while the global mutation continuously adds new types of individuals to the gene pool during the evolving process.

To evaluate the performance of an individual $c_k$, the output of FCNN in Eqs. (254) and (255), $y_{ij}$, is compared with the ideal output ($o_{ij}$) by using an error function

(378)

where

\[
\phi_{ij}(t) = \begin{cases} \alpha(t), & \text{for } o_{ij} = 0 \\ \beta(t), & \text{otherwise} \end{cases} \qquad (379)
\]

is the evaluation weight. In our simulations, we let α(t) = 1 and β(t) = 0.5 + 0.5ξ. We define the global fitness of the tth generation gene pool GP(t), $E_{\min}$, as

\[
E_{\min}(t) = \min_{k = 1, \ldots, n} E_k(t) \qquad (380)
\]

where n is the number of individuals in the gene pool.

B. Application to Image Processing
In this section, computer simulation results are provided. We use the GA to choose $\mu_{ZE}(\cdot)$ and $\mu_{WH}(\cdot)$. Figure 40a shows the original gray-scale image of size 63 × 63 with 256 gray levels. The image is normalized to [0, 1]. Figure 40b shows a bad version of an edge detection result. Observe that much noise exists in this result and the edge is almost indistinguishable. Then we use the GA to learn $\mu_{ZE}(\cdot)$ and $\mu_{WH}(\cdot)$ from this ill-conditioned example. The parameters of the GA used in this simulation are chosen as:
population size: n = 10.
probability of crossover: $p_{c1}$ = 0.1, $p_{c2}$ = 0.1, $p_{c3}$ = 0.2.
Max-Min-Arithmetical crossover parameter: α = 0.618.
probability of mutation: $p_{m1}$ = 0.2, $p_{m2}$ = 0.5.
stop condition: $E_{\min}$ is trapped in a local minimum for more than 45 generations.
After 100 generations, we get the following individuals:

\[
\begin{aligned}
(a^{(1)}, a^{(2)}, a^{(3)}, a^{(4)}) &= (0.000000,\ 0.004611,\ 0.007106,\ 0.043000) \\
(b^{(1)}, b^{(2)}, b^{(3)}, b^{(4)}) &= (0.543300,\ 0.675663,\ 0.689058,\ 1.000000)
\end{aligned} \qquad (381)
\]
The corresponding output is shown in Fig. 40c. Observe that the edge characteristics are enhanced while noise is suppressed. In this simulation, h in Eq. (255) is chosen as h = 0. One should notice that the contour of the entire face, which is almost diffused by noise in Fig. 40b, is perfectly recovered by the FCNN and GA as shown in Fig. 40c. As well, the contours of both eyes and the mouth are significantly recovered and filled into closed curves, while those in the example in Fig. 40b remain broken lines. From the crisp or classical image processing point of view, the learned result in Fig. 40c is "terrible" because there exists a big error from the original example. However, from a human expert's point of view (i.e., from our cognitive point of view), the result in Fig. 40c is much better than its original example. At first sight, this kind of improvement is unbelievable because our common sense is that any learning algorithm of an artificial neural network (ANN) is an approximation to its supervisor examples. The usual example that an ANN learning simulation can give is to use a perfect crisp algorithm to generate some input-output examples and then use an ANN to learn the known crisp algorithm from the input-output pairs. In this view, the trained ANN should not be better than the crisp algorithm. Why does the FCNN in the simulation perform better than its supervisor examples? In this case, the improvement comes from the structure of the FCNN. When we come back to Sect. V.C. 5.3 and Fig. 27, we can find that the human expert's intuition (or experience, knowledge) about the concept of "edge" has been embedded into the FCNN structure as shown in Eq. (254). While the learning ability of our brain is overemphasized these days, we should also remember that our brain has a unique structure that took nature millions of years to evolve. The structure of our brain is also a kind of knowledge. How can this kind of structural knowledge be made useful?
The answer resides in learning. If we return to the example shown in Fig. 40, one can see that in the evolution of learning, the knowledge of edge detection embedded in an FCNN structure will gradually correct errors from the supervisor examples. Our simulation demonstrated that any distortion of the knowledge structure shown in Eq. (254) or Fig. 27 gave a much worse result. From this example, one can also find that FCNN is a high-level CNN structure capable of embedding human experts' knowledge in a very efficient way and of performing some intelligent behaviors.
Of course, if we can acquire perfect examples, the conventional CNN may be a better choice to learn from these examples. In fact, an excellent example of a DTCNN learning algorithm for edge detection had been proposed by Harrer et al. (1991a). The question that remains, however, is how the conventional CNN performs when only ill-conditioned examples are available. Without prior knowledge of the task embedded in its structure, the CNN will be puzzled, wander in the problem space, and settle down to an arbitrary local minimum in the vicinity of its initial condition. Why not use both ill-conditioned examples and prior knowledge (usually represented by a set of rules or linguistic statements) to train our model? If we can use this method, we can let our model use its "knowledge" to judge whether an example is good or bad. Then big weights are automatically assigned to good examples and small weights are automatically assigned to bad ones. Structural representation of knowledge plays a very important role in human intelligence. We keep emphasizing that there are two motivations for inventing FCNN: one is mathematical morphology and the other is to embed human knowledge into a CNN structure. This section provides another result motivated by the second of the two.
VIII. APPLICATIONS OF DISCRETE-TIME FCNN
Although in most applications both continuous FCNN and DTFCNN can be used, DTFCNN has some unique applications that continuous FCNN cannot perform. In this section we present some of these unique applications.
A. Implementing Nonlinear Fuzzy Operators for Image Processing
The fuzzy operators we discuss in this section are based on fuzzy IF-THEN-ELSE rule bases. The parallel computation mechanics of DTFCNN are used to offset the computational complexity of fuzzy image processing problems.
1. The Structure of DTFCNN
A DTFCNN is described by the following equations:
1. Input function sublayer (= fuzzifier layer)
TAO YANG
where $E_{kl}(t)$ is the detected signal, for example, the output of a camera. The quantity $t$ denotes discrete-time iteration; the function $(\cdot)$ taken over $C_{kl} \in N_r(ij)$ is the membership function of the fuzzy variable embedded in cell $C_{ij}$, and it is used by the feedforward synaptic law. Finally, $\{\cdot\}$ denotes a set.
2. Cell dynamics sublayer (= fuzzy inference engine layer)
(383)
where the outer function $(\cdot)$ taken over $C_{kl} \in N_r(ij)$ denotes a fuzzy inference process in $N_r(ij)$. Here the inner function $(\cdot)$ is the membership function of the fuzzy variable embedded in cell $C_{ij}$, and it is used by the feedback synaptic law. The $y_{kl}(t)$ is the output, which is given by:
3. Output function sublayer (= defuzzifier layer)
(384)
where the function $(\cdot)$ taken over $C_{kl} \in N_r(ij)$ denotes a defuzzifier function defined in $N_r(ij)$.
2. Embedding Fuzzy IF-THEN-ELSE Rules into DTFCNN
Fuzzy local image operations were developed as new image processing tools (Russo, 1992; Russo and Ramponi, 1994a, b, c, 1995) because there exist different kinds of uncertainties in image processing and image understanding. Some simple local fuzzy operators such as fuzzy shrinking and fuzzy expanding may be considered as types of mathematical morphological operations and can be readily implemented by type-II FCNNs (Yang and Yang, 1996; 1997d, e; Yang et al., 1996d). Here, we use type-II DTFCNN to implement fuzzy IF-THEN rule-based image operators. One simple fuzzy operator is given by the following fuzzy rule:
Rule One
IF ($\{E_{kl}\}^1$ is $A^1$) AND ($\{E_{kl}\}^2$ is $A^2$) ... AND ($\{E_{kl}\}^M$ is $A^M$),
THEN ($y_{ij}$ is $B_1(x; c_1, w_1)$),
ELSE ($y_{ij}$ is $B_0(x; c_0, w_0)$), for $C_{kl} \in N_r(ij)$
where $B_0(x; c_0, w_0)$ and $B_1(x; c_1, w_1)$ are two triangularly shaped fuzzy sets, which are defined by (Bojadziev and Bojadziev, 1995)
(385)
$A^p$, $p = 1, 2, \ldots, M$, are $M$ fuzzy variables and $y_{ij}$ is a crisp output of Rule One. Then we can use the following DTFCNN to implement Rule One.
1. Input sublayer for implementing ($\{E_{kl}\}^p$ is $A^p$), $p = 1, 2, \ldots, M$:
(386)
2. Cell dynamics sublayer for implementing IF parts is
(387)
By adopting correlation-product inference (Kosko, 1992), we get the third sublayer as follows.
3. Output sublayer for implementing THEN-ELSE part
For comparison, the preceding DTFCNN structure can be summarized into a form similar to that of a conventional CNN as follows:
1. State equation
Of course, one set of fuzzy IF-THEN rules may be too simple to solve a practical problem. We usually need more than one set of IF-THEN rules in a fuzzy image operator. For example, Russo and Ramponi (1995) use a fuzzy rule set that contains 32 IF-THEN rules and one ELSE rule. Usually we should consider the following fuzzy rule:
Rule Two
IF ($\{E_{kl}\}^{11}$ is $A^{11}$) AND ($\{E_{kl}\}^{12}$ is $A^{12}$) ... AND ($\{E_{kl}\}^{1M_1}$ is $A^{1M_1}$), THEN ($y_{ij}$ is $B_1(x; c_1, w_1)$),
...
IF ($\{E_{kl}\}^{N1}$ is $A^{N1}$) AND ($\{E_{kl}\}^{N2}$ is $A^{N2}$) ... AND ($\{E_{kl}\}^{NM_N}$ is $A^{NM_N}$), THEN ($y_{ij}$ is $B_N(x; c_N, w_N)$),
where $B_p(x; c_p, w_p)$, $p = 1, 2, \ldots, N$, are $N$ triangularly shaped fuzzy sets as defined in Eq. (385). The $\{E_{kl}\}^{pq}$, $p = 1, 2, \ldots, N$, $q = 1, 2, \ldots, M_p$, are fuzzy variables. This rule base consists of $N$ IF-THEN rules and one ELSE rule. Similar to the implementation of Rule One, the IF part of the p-th IF-THEN rule can be implemented by the following DTFCNN:
Therefore, we need p layers of DTFCNN to implement p IF-parts. The ELSE rule is implemented by the following DTFCNN:
Then the whole rule base is finished by a common output layer
(393)
In conclusion, to implement Rule Two, we need an $(N + 1)$-layer DTFCNN and a common output layer.
3. Implementing Fuzzy Inference Sharpener
We then show how a fuzzy IF-THEN-ELSE rule base for image processing can be embedded into a DTFCNN structure. Consider a basic fuzzy sharpener presented by Russo and Ramponi (1994c). The rule shown in Fig. 41 is applied to a 256-gray-level digital image. It should be noted that all inputs in the rules are gray-value differences between each pixel in the neighborhood system and the central pixel. This is the so-called “relative in the antecedents” approach (Russo and Ramponi, 1994c). The rulebase in Fig. 41 consists of two IF-THEN rules and one ELSE rule. We can express the rulebase by the following equivalent statement:
IF ($E_{i-1,j-1} - E_{ij}$ is P) AND ($E_{i-1,j} - E_{ij}$ is P) AND ($E_{i-1,j+1} - E_{ij}$ is P) AND ($E_{i,j-1} - E_{ij}$ is P) AND ($E_{i,j+1} - E_{ij}$ is P) AND ($E_{i+1,j-1} - E_{ij}$ is P) AND ($E_{i+1,j} - E_{ij}$ is P) AND ($E_{i+1,j+1} - E_{ij}$ is P) THEN ($y_{ij}$ is N),
IF ($E_{i-1,j-1} - E_{ij}$ is N) AND ($E_{i-1,j} - E_{ij}$ is N) AND ($E_{i-1,j+1} - E_{ij}$ is N) AND ($E_{i,j-1} - E_{ij}$ is N) AND ($E_{i,j+1} - E_{ij}$ is N) AND ($E_{i+1,j-1} - E_{ij}$ is N) AND ($E_{i+1,j} - E_{ij}$ is N) AND ($E_{i+1,j+1} - E_{ij}$ is N) THEN ($y_{ij}$ is P),
ELSE ($y_{ij}$ is Z)
To implement this fuzzy inference sharpener, let $E_{ij}$ denote the gray value of pixel $(i, j)$; then we have the following multilayer DTFCNN structure. The state equation of DTFCNN #1, which is used to implement the 1st rule in Fig. 41, is given by
where $\mu_P(\cdot)$ is the membership function of fuzzy set P as shown in Fig. 41. The state equation of DTFCNN #2, which is used to implement the 2nd rule in Fig. 41, is given by:
(395)
where $\mu_N(\cdot)$ is the membership function of fuzzy set N as shown in Fig. 41.
FIGURE 41. The rulebase for fuzzy inference sharpener. [Panels: definitions of fuzzy sets, with $c_N$, 0, $c_P$, and $w_Z$ marked on the gray-value difference axis; rulebase for sharpener, showing the 1st rule, the 2nd rule, and the ELSE rule.]
The state equation of DTFCNN #0, which is used to implement the ELSE rule in Fig. 41, is given by
$$x^0_{ij}(t) = \min\bigl(1 - x^1_{ij}(t-1),\; 1 - x^2_{ij}(t-1)\bigr). \qquad (396)$$
The foregoing three layers share a single output layer as
p=o
where $c_p$ and $w_p$, $p = 0, 1, 2$, are the centers and widths of the triangularly shaped fuzzy variables Z, N, and P as shown in Fig. 41, respectively.
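The three rule layers and a shared output layer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the chapter's exact equations: the triangular membership shape `tri`, the product-weighted centroid used for the output layer, and the wrap-around border handling are all assumptions, and the function names are hypothetical. Only the ELSE layer follows Eq. (396) directly.

```python
import numpy as np

def tri(x, c, w):
    """Assumed symmetric triangular membership: 1 at c, falling to 0 at c +/- w."""
    return np.maximum(0.0, 1.0 - np.abs(x - c) / w)

def neighbor_differences(E):
    """Stack of E_kl - E_ij over the 8 neighbors (wrap-around borders assumed)."""
    shifts = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    return np.stack([np.roll(E, s, axis=(0, 1)) - E for s in shifts])

def sharpener_correction(E, cZ=0.0, wZ=50.0, cP=255.0, wP=300.0, cN=-255.0, wN=300.0):
    """One pass of the two IF-THEN rules and the ELSE rule of Fig. 41."""
    d = neighbor_differences(np.asarray(E, dtype=float))
    x1 = tri(d, cP, wP).min(axis=0)        # rule 1: all differences are P -> output is N
    x2 = tri(d, cN, wN).min(axis=0)        # rule 2: all differences are N -> output is P
    x0 = np.minimum(1.0 - x1, 1.0 - x2)    # ELSE rule, as in Eq. (396)
    # Assumed output layer: activations scale each output set, then take the centroid.
    num = x0 * cZ * wZ + x1 * cN * wN + x2 * cP * wP
    den = x0 * wZ + x1 * wN + x2 * wP
    return num / den
```

With the parameter values quoted for Fig. 42, a flat image yields a zero correction, while a pixel brighter than all of its neighbors receives a positive one, which is the qualitative behavior expected of a sharpener.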
Similarly to that in Russo and Ramponi (1994c), if we change the output layer of the foregoing multilayer DTFCNN into the following form:
$$y_{ij}(t) = \gamma f\bigl(x^0_{ij}(t-1),\, x^1_{ij}(t-1),\, x^2_{ij}(t-1)\bigr) + I \qquad (398)$$
where $I$ is a fixed bias for each cell, then we get a kind of fuzzy high-pass filter. The output of a DTFCNN-based fuzzy inference sharpener is shown in Fig. 42. Figure 42a shows the original image. Figure 42b shows the output of the DTFCNN-based fuzzy inference sharpener. The parameters are chosen as: $\gamma = 1$, $c_Z = 0$, $w_Z = 50$, $c_P = 255$, $w_P = 300$, $c_N = -255$, and $w_N = 300$. When we choose $I = 128$ and keep all other parameters unchanged, the output result of the DTFCNN-based fuzzy high-pass filter is shown in Fig. 42c.
B. Embedding Local Fuzzy Relation Equations
FCNN is the only existing high-level CNN structure in the CNN universe. Here, the phrase “high-level” means the ability to process conceptual variables, for example, linguistic variables. We present here a DTFCNN structure for embedding local fuzzy relation equations. Fuzzy relation equations were first recognized and studied by Sanchez (1976). Fuzzy relation equations play an important role in areas such as fuzzy system analysis, design of fuzzy controllers, decision-making processes, and fuzzy pattern recognition. Fuzzy relation equations are associated with the concept of composition of binary fuzzy relations, which includes both set-relation composition and relation-relation composition. We only use max-min composition because it has been studied extensively and utilized in numerous applications. Embedding fuzzy relation equations into artificial neural networks (ANN) is not new; there are many references (Blanco et al., 1995a,b; Hirota and Pedrycz, 1996; Nola et al., 1995; Pedrycz, 1991) on this topic. Thanks to these references, we can combine DTFCNN and fuzzy relational neurocomputations very easily.
1. Local Fuzzy Relation Equation and Its Implementation
Let $A_{ij}$ be a fuzzy set in $N_r(ij)$ and $R_{ij}(N_r(ij), \{\phi_{ij}\})$ be a binary fuzzy relation in $N_r(ij) \times \{\phi_{ij}\}$, where set $\{\phi_{ij}\} = \{\phi^1_{ij}, \ldots, \phi^m_{ij}\}$; then the set-relation
FIGURE 42. The simulation result of the DTFCNN-based fuzzy inference sharpener and fuzzy high-pass filter. (a) Original image. (b) The output of the DTFCNN-based fuzzy inference sharpener. (c) The output of the DTFCNN-based fuzzy high-pass filter.
composition of $A_{ij}$ and $R_{ij}$, $A_{ij} \circ R_{ij}$, results in a fuzzy set in $\{\phi_{ij}\}$. Let us denote the resulting fuzzy set as $B_{ij}$; then we have
$$A_{ij} \circ R_{ij} = B_{ij}. \qquad (399)$$
The preceding equation is a fuzzy relation equation. The membership function of Bij is given by:
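For max-min composition, the membership of the result in each output element is the largest, over the neighborhood, of the smaller of the input membership and the relation entry. A minimal sketch follows; the array layout (one row of `R` per cell of the neighborhood, one column per output element) and the function name are assumptions made for illustration.

```python
import numpy as np

def max_min_composition(A, R):
    """Return B with B[p] = max_k min(A[k], R[k, p]).

    A: memberships of the K neighborhood cells, shape (K,).
    R: fuzzy relation, shape (K, m), one column per output element.
    """
    A = np.asarray(A, dtype=float)
    R = np.asarray(R, dtype=float)
    return np.minimum(A[:, None], R).max(axis=0)

A = [0.2, 0.9, 0.5]
R = [[1.0, 0.3],
     [0.4, 0.8],
     [0.6, 0.1]]
B = max_min_composition(A, R)   # one membership value per column of R
```

Each column of `R` can be evaluated independently of the others, which is why one DTFCNN layer per output element suffices.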
If we view $R_{ij}$ as a local fuzzy system, $A_{ij}$ as a local fuzzy input, and $B_{ij}$ as a fuzzy output, then Eq. (399) describes the characteristics of a fuzzy system via a fuzzy input-output relation. From Eq. (400) we can see that an m-layer DTFCNN can implement the fuzzy relation equation in Eq. (399). The p-th layer DTFCNN is given by
1. Input equation
2. Cell dynamics
is the synaptic weight.
3. Output equation
$y^p_{ij}(t) = x^p_{ij}(t - 1)$.
2. An Example
To show how fuzzy relation equations can be embedded into a multilayer DTFCNN, we define the following fuzzy relation equation:
where $U = \{u_1, u_2, \ldots, u_8, u_9\}$ is numbered according to Fig. 43 (lower part labeled “numbering order”); U can also be expressed by the following pattern:
and C is given by $C = \{c_1, c_2\}$; R is given by
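$$R = \begin{array}{c|cc} & c_1 & c_2 \\ \hline u_1 & 0.5 & 0.75 \\ u_2 & 0.5 & 0.75 \\ u_3 & 0.5 & 0.75 \\ u_4 & 0.5 & 0.75 \\ u_5 & 1 & 0.5 \\ u_6 & 1 & 0.5 \\ u_7 & 1 & 0.5 \\ u_8 & 1 & 0.5 \\ u_9 & 0.2 & 1 \end{array} \qquad (408)$$
FIGURE 43. The fuzzy set R and the numbering order of cells in $N_1(ij)$. [Numbering order of the 3 × 3 neighborhood, row by row: 1 5 2; 6 9 7; 3 8 4. Plot axis tick: 255.]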
The first-layer DTFCNN is used to implement the first column of R in Eq. (408), which is given by:
1. Input equation
$$u_{kl} = \mu_B(E_{kl}), \quad C_{kl} \in N_1(ij). \qquad (409)$$
2. Cell dynamics
where $B^{(1)}(i, j; k, l)$ is given by the following fuzzy set defined in the $Z^2(N_1(ij))$ grid:
$$\{B^{(1)}(i, j; k, l)\} = \begin{array}{c|ccc} & j-1 & j & j+1 \\ \hline i-1 & 0.5 & 1 & 0.5 \\ i & 1 & 0.2 & 1 \\ i+1 & 0.5 & 1 & 0.5 \end{array} \qquad (411)$$
which corresponds to the first column of R in Eq. (408).
3. Output equation
$y^{(1)}_{ij}(t) = x^{(1)}_{ij}(t - 1)$.
The second layer is used to implement the second column in Eq. (408), which is given by:
1. Input equation
2. Cell dynamics
$$x^{(2)}_{ij}(t) = \max_{C_{kl} \in N_1(ij)} \min\bigl[u_{kl},\, B^{(2)}(i, j; k, l)\bigr] \qquad (414)$$
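where $B^{(2)}(i, j; k, l)$ is given by the following fuzzy set defined in the $Z^2(N_1(ij))$ grid
$$\{B^{(2)}(i, j; k, l)\} = \begin{array}{c|ccc} & j-1 & j & j+1 \\ \hline i-1 & 0.75 & 0.5 & 0.75 \\ i & 0.5 & 1 & 0.5 \\ i+1 & 0.75 & 0.5 & 0.75 \end{array} \qquad (415)$$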
3. Output equation
$y^{(2)}_{ij}(t) = x^{(2)}_{ij}(t - 1)$.
Figure 44 shows the simulation results. In this simulation, we choose the fuzzy set B in Fig. 43 as
$$\mu_B(x) = \begin{cases} 0, & x < 50 \\ \dfrac{x - 50}{150}, & 50 \le x < 200 \\ 1, & \text{else.} \end{cases} \qquad (417)$$
Figure 44(a) shows the output of the first layer. The gray value of each pixel corresponds to a membership value. Figure 44(b) shows the output of the second layer; again, the gray value of each pixel corresponds to a membership value. Observe that the two aforementioned results are two kinds of image segmentation. In both simulations, wrap-up boundary conditions are used. Because fuzzy relation equations can be viewed as a description of fuzzy systems that have fuzzy input and fuzzy output, we find that DTFCNN functions as a parallel implementation of this kind of fuzzy system. The immediate applications of this kind of fuzzy relational DTFCNN structure are image processing and pattern recognition.
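The two-layer example can be reproduced with a short sketch. The templates below are the two columns of R in Eq. (408) laid out in the numbering order of Fig. 43; the ramp used for $\mu_B$ (rising linearly from 0 at gray value 50 to 1 at gray value 200) and the reading of the boundary condition as periodic wrap-around are assumptions, and the function names are hypothetical.

```python
import numpy as np

# Columns c1 and c2 of R arranged on the 3 x 3 neighborhood grid.
B1 = np.array([[0.5, 1.0, 0.5],
               [1.0, 0.2, 1.0],
               [0.5, 1.0, 0.5]])
B2 = np.array([[0.75, 0.5, 0.75],
               [0.5, 1.0, 0.5],
               [0.75, 0.5, 0.75]])

def mu_B(E):
    """Assumed ramp membership: 0 below 50, 1 above 200, linear in between."""
    return np.clip((np.asarray(E, dtype=float) - 50.0) / 150.0, 0.0, 1.0)

def relation_layer(E, B):
    """x_ij = max over the 3 x 3 neighborhood of min(u_kl, B(i, j; k, l))."""
    u = mu_B(E)
    terms = []
    for a, di in enumerate((-1, 0, 1)):
        for b, dj in enumerate((-1, 0, 1)):
            # After this roll, position (i, j) holds u at (i + di, j + dj).
            u_kl = np.roll(u, (-di, -dj), axis=(0, 1))
            terms.append(np.minimum(u_kl, B[a, b]))
    return np.max(terms, axis=0)

image = np.zeros((5, 5))
image[2, 2] = 255.0                      # a single bright pixel
layer1 = relation_layer(image, B1)
layer2 = relation_layer(image, B2)
```

For the single bright pixel, the first layer responds weakly at the pixel itself and strongly at its four edge neighbors, while the second layer does the opposite, which illustrates how the two columns of R give two different segmentations of the same input.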
FIGURE 44. Simulation results of DTFCNN-based local fuzzy relation equation. (a) The output of the first DTFCNN layer. (b) The output of the second DTFCNN layer.
3. Detecting Impulsive Noise
As we have shown, the first step for removing impulse noise is to identify its position. In this section we will show how a DTFCNN-based local fuzzy relation equation can perform this task. We choose the input equation as
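and BIG is given by
$$\mu_{BIG}(x) = \begin{cases} 1, & x \le -200 \\ \dfrac{-x - 100}{100}, & -200 < x \le -100 \\ 0, & -100 < x < 100 \\ \dfrac{x - 100}{100}, & 100 \le x < 200 \\ 1, & x \ge 200. \end{cases} \qquad (419)$$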
The cell dynamics are given by
where $B(i, j; k, l)$ is given by the following fuzzy set defined in the $Z^2(N_1(ij))$ grid:
$\{B(i, j; k, l)\}$ = [3 × 3 grid over rows $i-1$, $i$, $i+1$ and columns $j-1$, $j$, $j+1$; the six entries that survive in the source are all 0.5] (421)
FIGURE 45. Simulation results of DTFCNN-based local fuzzy relation equation for impulse noise detection. (a) Image with impulse noise. (b) The output of the DTFCNN-based local fuzzy relation equation.
The simulation result is shown in Fig. 45. Figure 45a shows an image with impulse noise. Figure 45b shows the output of the DTFCNN. Observe that all impulse noise is found.
4. Fuzzy Orientation Derivatives
Fuzzy local relation systems can also function as high-pass filters. In this section we show DTFCNN-based fuzzy orientation derivatives. We choose the input equation as
and BIG is given by
(423)
The cell dynamics are given by
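For the horizontal fuzzy orientation derivative, $B(i, j; k, l)$ is given by the following fuzzy set defined in the $Z^2(N_1(ij))$ grid:
$\{B(i, j; k, l)\}$ = [3 × 3 grid over rows $i-1$, $i$, $i+1$ and columns $j-1$, $j$, $j+1$; the only entry that survives in the source is 0.4] (425)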
For the diagonal fuzzy orientation derivative, $B(i, j; k, l)$ is given by the following fuzzy set defined in the $Z^2(N_1(ij))$ grid:
$\{B(i, j; k, l)\}$ = [3 × 3 grid over rows $i-1$, $i$, $i+1$ and columns $j-1$, $j$, $j+1$; the six entries that survive in the source are all 0.4] (426)
Simulation results are shown in Fig. 46. Figure 46a shows the result of horizontal fuzzy orientation derivative. Figure 46b shows the result of diagonal fuzzy orientation derivative.
IX. CONCLUSIONS AND FUTURE WORK
The invention of FCNN comes in part from AI. The idea of embedding fuzzy set theory into a CNN framework is partially motivated by the inner connection
FIGURE 46. Simulation results of DTFCNN-based fuzzy orientation derivatives. (a) Horizontal fuzzy orientation derivative. (b) Diagonal fuzzy orientation derivative.
of mathematical morphology and fuzzy logic. Because the substantial basis of FCNN is FCA, which is a model for modeling and coping with complexity, the invention of FCNN can also be used to model the complexity from a high-level local activity. Another motivation comes from the demand to use the huge body of knowledge that comes from human experts. Encountering the complexity of the outer and inner world, a human individual uses a systematic but general description to model and handle the behaviors required for survival. If we only model the complexity itself, that does not help much in surviving it. What fuzzy theory contributes to the sciences is to model the survival strategy of the human individual directly. In FCNN, we model the systematic behavior of handling complexities due to local activity, but not the complexities themselves.
A. The CNN Universe
From the standpoint of AI, the existing CNN universe is something like that shown in Fig. 47. In this figure, we do not include CNNUM because it is not a CNN class but a platform of CNN. The CNNUM emphasizes implementation and integration of different CNN structures. Although more than 400 papers in this field have been written since 1988, CNN is far from mature because we have found only the slightest bit of the CNN universe. The implementation of CNN using different techniques deserves further investigation because the simple structure of CNN provides us with the possibility of implementing it. Applications of CNN to signal processing, in particular, image processing, also need further study because it seems that CNN is a very promising candidate for the next generation of parallel image processing engines. On the other hand, the CNN paradigm can be used to animate many biological, chemical, and physical processes where dynamics are governed by local coupling of simple units. However, from Fig. 47 one can see that most parts of the high-level CNN are unknown. In fact, the only high-level CNN we know so far is FCNN. We can imagine that the high-level CNN should include some paradigms that can be used to model the dynamics of human society when nonphysical factors such as emotions, feelings, and intuitions are used as local couplings along with physical factors that include food, money, dwellings, and work opportunities. Although today we cannot imagine how to embed nonphysical things into the structure of CNN, we know human society has used them efficiently to organize itself for thousands of years. We believe that the future CNN model should be something like those we have predicted in Sect. II.A. We are always very careful to avoid giving the reader an impression that CNN can do everything. On the contrary, we restrict the range of CNN to a class of problems that can be decomposed into local components. Because the
FIGURE 47. The map of the CNN universe. [Recoverable labels: continuous CNN; CNN UNIVERSE.]
top-down process, that is, decomposing a global problem into local components, sometimes is very difficult, bottom-up processes, that is, using relatively simple elements and local rules to generate some global behaviors, are also employed. So far, almost all CNN applications to signal processing and biological modeling employ the top-down method, and some applications of CNN to pattern formation and spatiotemporal process modeling employ the bottom-up method. However, when the bottom-up method is used, the emergent behavior of CNN may be very difficult to interpret. In general, this is not a problem of CNN itself, but the elementary problem of emergent computation. We should come back to FCNN. Type-II FCNN is the most studied and best understood. In particular, we have presented an entire set of methods to exploit the world of some type-II FCNN that are used as computational CNN. However, when fuzzy numbers flow through type-II FCNN structures, the problem becomes very complicated. This makes FCNN totally different from
conventional CNN. Research in this field has just begun. Type-I and -III FCNN provide both more possibility and flexibility in the modeling of local coupling processes. Sometimes, type-I FCNN is more complicated than type-II FCNN because its fuzzy structure may introduce more complexity. On the other hand, from the examples in Sect. IV.D one can see that the potential application of type-I FCNN is beyond the range of linear signal processing. It will most likely provide new methods for nonlinear signal processing. So far, we know of very few type-III FCNN because they are too complicated to analyze. The results in Yang et al. (19981) show that type-III FCNN can be used to model very complex systems where linguistic flows are used as state variables. Most of the world of type-III and even type-IV FCNN still awaits exploration. Although we emphasize FCNN applications to image processing, FCNN can also be used to model complex processes such as spatiotemporal chaos. As an example, we present a type-II DTFCNN structure to implement fuzzy spatial Dilemmas. The fuzzy spatial Dilemma is a new concept we have generated from the conventional spatial Dilemma (Nowak and May, 1993).
B. Implementing Fuzzy Spatial Dilemmas Using Type-II DTFCNN
The conventional spatial Dilemma (Nowak and May, 1993) is defined by a game played between two types of players: the defector (denoted by D) and the cooperator (denoted by C). The interaction between a cooperator and a defector is described by the following payoff matrix:
$$\begin{array}{c|cc} & C & D \\ \hline C & 1 & 0 \\ D & 2 & 0 \end{array} \qquad (427)$$
In this matrix, we show only the payoff of a player. If two cooperators interact, both receive 1 point. If a defector meets a cooperator, the defector receives 2 points and the cooperator receives 0 points. If two defectors interact, both receive 0 points. The fuzzy generalization of the spatial Dilemmas is along two directions. The first one fuzzifies the payoff. As uncertainties exist in the payoff, we should describe the payoff by fuzzy variables such as “high” or “low.” The second one fuzzifies the property of a player. Because a player can be a very complex system such as an animal or even a human individual, we cannot absolutely define which one is a defector or a cooperator. A better way to describe the property of a player is to assign a degree of being a defector (or being a cooperator). By using the fuzzy property of a player, we can say a player is low defection, middle defection, or high defection.
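Before fuzzifying, the crisp game can be sketched as a cellular update. The payoffs follow the values just quoted; everything else is an assumption made for illustration: each cell plays only its 8 nearest neighbors, borders wrap around, a cell adopts the type of the highest-scoring cell in its 3 × 3 neighborhood (itself included), and ties go to the first candidate found. The function names are hypothetical.

```python
import numpy as np

SHIFTS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]

def payoffs(coop, b=2.0):
    """Total payoff of every cell; coop holds 1 for cooperators, 0 for defectors."""
    coop = np.asarray(coop, dtype=float)
    # Number of cooperating neighbors among the 8 nearest cells.
    n_coop = sum(np.roll(coop, s, axis=(0, 1)) for s in SHIFTS if s != (0, 0))
    # A cooperator earns 1 per cooperating neighbor, a defector earns b.
    return np.where(coop == 1, 1.0, b) * n_coop

def step(coop, b=2.0):
    """Each cell copies the type of the best-scoring cell in its neighborhood."""
    coop = np.asarray(coop)
    score = payoffs(coop, b)
    scores = np.stack([np.roll(score, s, axis=(0, 1)) for s in SHIFTS])
    types = np.stack([np.roll(coop, s, axis=(0, 1)) for s in SHIFTS])
    best = scores.argmax(axis=0)
    return np.take_along_axis(types, best[None], axis=0)[0]

grid = np.ones((5, 5), dtype=int)
grid[2, 2] = 0                 # one defector among cooperators
after = step(grid)
```

In this small case the lone defector outscores its neighbors, so after one step the defector and its eight neighbors are all defectors.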
Here, we use two kinds of fuzzy descriptions to describe uncertainties in both the payoff and the property of a player. We use the membership function $\mu_D(\cdot)$ to denote the degree of the player $x$ being a defector. We use a fuzzy set $P$ to denote the payoff. We use triangularly shaped fuzzy sets. We denote a triangularly shaped fuzzy set by $A(x; c, w)$, which is given by:
The mathematical operations between two triangularly shaped fuzzy sets $A(x; c_a, w_a)$ and $B(x; c_b, w_b)$ are defined by (Bojadziev and Bojadziev, 1995)
where $\alpha > 0$ is a scalar. In this case, when player $x$ plays with player $y$, the payoff map $T(x, y)$ for player $x$ is a 2D fuzzy set $T: \mathbb{R} \times \mathbb{R} \mapsto [0, 1]$, which is defined by
We use $\mu_T(x, y)$ to measure the degree of player $x$ obtaining a “high” payoff. The payoff is also described by a fuzzy set. There are different fuzzy functions to combine the foregoing two kinds of fuzziness; here we use the following fuzzy function to denote the payoff $P(x, y)$ for $x$ when it plays with $y$:
where $S(x; c_s, w_s)$ is the standard fuzzy set that corresponds to the payoff for a defector when it meets a cooperator. We then use a DTFCNN to model fuzzy spatial Dilemmas. The dynamics of this DTFCNN are given by:
1. Fuzzy state equation
Notice that the state variable is a fuzzy set (usually a fuzzy number) instead of a signal (real number). This is a type-II FCNN.
2. Output equation
$$y_{ij}(t) = f_d\bigl(x_{ij}(t - 1)\bigr) \qquad (434)$$
where $f_d(\cdot)$ is a defuzzifier function. Here, we choose $f_d(\cdot)$ as (Sanchez, 1976)
$$f_d(\tilde{A}) = \frac{\int_{-\infty}^{\infty} y\,\mu_{\tilde{A}}(y)\,dy}{\int_{-\infty}^{\infty} \mu_{\tilde{A}}(y)\,dy} \qquad (435)$$
If $\tilde{A}$ is a triangularly shaped fuzzy set, then we have
$$f_d\bigl(\tilde{A}(x; c, w)\bigr) = c + \tfrac{1}{3} w \qquad (436)$$
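A centroid defuzzifier of this kind can be checked numerically. The sketch below integrates with the trapezoidal rule over a finite interval that must cover the support of the fuzzy set; the helper `triangle`, which builds a generic triangular membership from its three vertices, is a hypothetical illustration and is not claimed to be the chapter's definition of $A(x; c, w)$.

```python
import numpy as np

def centroid(mu, lo, hi, n=20001):
    """Approximate (integral of y*mu(y)) / (integral of mu(y)) on [lo, hi]."""
    y = np.linspace(lo, hi, n)
    m = mu(y)
    dy = np.diff(y)
    num = np.sum(0.5 * (y[1:] * m[1:] + y[:-1] * m[:-1]) * dy)
    den = np.sum(0.5 * (m[1:] + m[:-1]) * dy)
    return num / den

def triangle(a, b, c):
    """Membership rising linearly from a to the peak at b, then falling to c (a < b < c)."""
    def mu(y):
        return np.maximum(0.0, np.minimum((y - a) / (b - a), (c - y) / (c - b)))
    return mu

value = centroid(triangle(0.0, 1.0, 3.0), -1.0, 4.0)
```

For a triangle the centroid is the mean of its three vertices, so the value computed here should be close to 4/3; a symmetric triangle defuzzifies to its peak.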
At each iteration $t$, a center cell $C_{ij}$ is replaced by a cell $C_{kl} \in N_r(ij)$ whose output is the maximum in $N_r(ij)$. In the following simulations, a wrap-up boundary condition is used. Grayscale images with 256 gray levels are used to denote spatial patterns. The degree of whiteness denotes the membership value of a cell being a cooperator. Figure 48 shows cases when $S(x; c_s, w_s) = S(x; 5, 4)$. Figure 48a shows the initial condition, which is of size 63 × 63 cells. Figure 48b shows the snapshot of the 20th iteration. Figure 48c shows the snapshot of the 150th iteration.
FIGURE 48. Evolving process of fuzzy spatial dilemmas with $S(x; c_s, w_s) = S(x; 5, 4)$. (a) Initial condition. (b) Output at t = 20. (c) Output at t = 150. (d) Output at t = 316. (e) Output at t = 317. (f) Output at t = 318.
FIGURE 49. Evolving process of fuzzy spatial dilemmas with $S(x; c_s, w_s) = S(x; 10, 10)$. (a) Output at t = 455. (b) Output at t = 630. (c) Output at t = 845. (d) Output at t = 1000.
Finally, the pattern goes to a period-3 solution as shown in Fig. 48d, e, and f for the 316th, 317th, and 318th iterations, respectively. Figure 49 shows cases with $S(x; c_s, w_s) = S(x; 10, 10)$. The initial condition is the same as that in Fig. 48a. Figures 49a, b, c, and d show snapshots of the 455th, 630th, 845th, and 1000th iterations, respectively. The evolution becomes chaotic. Because the payoff of cooperation becomes higher, the number of cells with a high degree of cooperation increases. A very interesting phenomenon is that a large cluster of defectors cannot exist for a long time because the high payoff of cooperation will soon change some of them into cooperators. This cannot be observed in Fig. 48, where the payoff of cooperation is relatively low. Observe that the aforementioned type-II DTFCNN functions as a kind of fuzzy cellular automata (FCA) (Adamatzky, 1994). Although conventional cellular automata (CA) are widely used in simulations of many natural phenomena such as fluid dynamics, diffusion, reaction-diffusion systems, populations, epidemics, etc., there are very few applications of FCA. The lack of a proper platform for studying FCA is the main reason for the lack of FCA applications. The DTFCNN is a very promising platform for FCA as shown in this section.
REFERENCES
Adamatzky, A. I. (1994). Hierarchy of fuzzy cellular automata. Fuzzy Sets and Systems, 62:167-174.
Aizenberg, N. N. and Aizenberg, I. N. (1994). CNN-like networks based on multi-valued and universal binary neurons: learning and application to image processing. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), pp. 153-158.
Aizenberg, N. N., Aizenberg, I. N., and Krivosheev, G. A. (1996). CNN based on universal binary neurons: learning algorithm with error-correction and application to impulsive-noise filtering on gray-scale images. In Proc. Fourth IEEE International Workshop on Cellular Neural Networks and Their Applications, Seville, Spain, June 24-26, pp. 309-314.
Alefeld, G. and Herzberger, J. (1983). Introduction to Interval Computations. New York: Academic Press.
Anguita, M., Pelayo, F. J., Fernandez, F. J., and Prieto, A. (1995). A low-power analog implementation of cellular neural networks. In From Natural to Artificial Neural Computation. International Workshop on Artificial Neural Networks. Proceedings, pp. 736-743.
Anguita, M., Pelayo, F. J., Prieto, A., and Ortega, J. (1993). Analog CMOS implementation of a discrete time CNN with programmable cloning templates. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 40(3):215-218.
Anguita, M., Pelayo, F. J., Ros, E., Palomar, D., and Prieto, A. (1996). VLSI implementations of CNNs for image processing and vision tasks: single and multiple chip approach. In Proc. Fourth IEEE International Workshop on Cellular Neural Networks and Their Applications, Seville, Spain, June 24-26, pp. 479-484.
Anguita, M., Prieto, A., Pelayo, F. J., Ortega, J., et al. (1991). CMOS implementation of a cellular neural network with dynamically alterable cloning templates. In Artificial Neural Networks. International Workshop IWANN '91 Proceedings, pp. 260-267.
Arakawa, K. (1996). Median filter based on fuzzy rules and its application to image restoration. Fuzzy Sets and Systems, 77:3-13.
Araujo, C. P. S. and Ritter, G. (1992). Morphological neural networks and image algebra in artificial perception systems. Image Algebra and Morphological Image Processing III. Proc. SPIE - The International Society for Optical Engineering, San Diego, CA, USA, 1769(20-22):128-142.
Arena, P., Baglio, S., Fortuna, L., and Manganaro, G. (1995a). Chua's circuit can be generated by CNN cells. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 42(2):123-125.
Arena, P., Baglio, S., Fortuna, L., and Manganaro, G. (1995b). Hyperchaos from cellular neural networks. Electronics Letters, 31(4):250-251.
Arena, P., Baglio, S., Fortuna, L., and Manganaro, G. (1996). Generation of n-double scrolls via cellular neural networks. International Jour. Circuit Theory and Applications, 24(3):241-252.
Artificial Life Workshop. (1994). Artificial Life III: Proc. Workshop on Artificial Life. Reading, MA: Addison-Wesley.
Back, T. and Kursawe, F. (1995). Evolutionary Algorithms for Fuzzy Logic: A Brief Overview, pp. 21-28, River Edge, NJ: World Scientific.
Baktir, I. A. and Tan, M. A. (1993). Analog CMOS implementation of cellular neural networks. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 40(3):200-206.
Baktir, I. A. and Tan, M. A. (1991). Analog CMOS implementation of cellular neural networks. In Computer and Information Sciences VI. Proc. 1991 International Symposium, Oct. 30-Nov. 2, pp. 825-834.
Balsi, M. (1994). Hardware supervised learning for cellular and Hopfield neural networks. In World Congress on Neural Networks-San Diego. 1994 International Neural Network Society Annual Meeting, pp. III/451-456.
Balsi, M., Ciancaglioni, I., Cimagalli, V., and Galluzzi, F. (1994). Optoelectronic cellular neural networks based on amorphous silicon thin film technology. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 399-403.
Bang, S. H., Sheu, B. J., and Wu, T. H.-Y. (1994). Paralleled hardware annealing of cellular neural networks for optimal solutions. In 1994 IEEE International Conference on Neural Networks. IEEE World Congress on Computational Intelligence, June 27-July 2, 4:2046-2051.
Baturone, I., Sanchez-Solano, S., Barriga, A., and Huertas, J. L. (1997). Implementation of CMOS fuzzy controllers as mixed-signal integrated circuits. IEEE Trans. Fuzzy Systems, 5(1):1-19.
Beccherelli, R., de Cesare, G., and Palma, F. (1994). Towards an hydrogenated amorphous silicon photo-transistor cellular neural network. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 351-362.
Berger, T. W., Sheu, B. J., and Tsai, R. H.-J. (1994). Analog VLSI implementation of a nonlinear systems model of the hippocampal brain region. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 47-51.
Shi, B. E. (1994). Order statistic filtering with cellular neural networks. In Proc. Third IEEE Int. Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 441-444.
Betta, G. F. D., Graffi, S., Kovacs, Z. M., and Masetti, G. (1993). CMOS implementation of an analogically programmable cellular neural network. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 40(3):206-215.
Bey, Jr., P. P., Yonce, D. J., and Newcomb, R. W. (1993). Investigation of contrast enhancement by numerical methods for an optical cellular neural network. In Proc. 36th Midwest Symposium on Circuits and Systems, pp. 582-583.
Blanco, A., Delgado, M., and Requena, I. (1995a). Identification of fuzzy relational equations by fuzzy neural networks. Fuzzy Sets and Systems, 71:215-226.
Blanco, A., Delgado, M., and Requena, I. (1995b). Improved fuzzy neural networks for solving relational equations. Fuzzy Sets and Systems, 71:311-322.
Bojadziev, G. and Bojadziev, M. (1995). Fuzzy Sets, Fuzzy Logic, Applications. Singapore: World Scientific.
Braspenning, P. J., Thuijsman, F., and Weijters, A. J. M. M. (Eds.). (1995). Artificial Neural Networks: An Introduction to ANN Theory and Practice. New York: Springer.
Brucoli, M., Carnimeo, L., and Grassi, G. (1995a). Discrete-time cellular neural networks for associative memories: a new design method via iterative learning and forgetting algorithms. In 38th Midwest Symposium on Circuits and Systems. Proceedings, pp. 542-545.
Brucoli, M., Carnimeo, L., and Grassi, G. (1995b). Discrete-time cellular neural networks for associative memories with learning and forgetting capabilities. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 42(7):396-399.
Brucoli, M., Carnimeo, L., and Grassi, G. (1996). A global approach to the design of discrete-time cellular neural networks for associative memories. International Jour. Circuit Theory and Applications, 24(4):489-510.
Buckley, J. J. and Hayashi, Y. (1996). Fuzzy Neural Networks, Chapter 11, pp. 233-249, New York: McGraw-Hill.
Cardarilli, G. C., Lojacono, R., Salerno, M., and Sargeni, F. (1993). VLSI implementation of a cellular neural network with programmable control operator. In Proc. 36th Midwest Symposium on Circuits and Systems, pp. 1089-1092.
Cardarilli, G. C., Lojacono, R., Salerno, M., and Sargeni, F. (1992). A VLSI implementation of programmable cellular neural networks. In Artificial Neural Networks, 2. Proc. 1992 International Conference (ICANN-92), Sept. 1-7, 2: pp. 1491-1494.
Cardarilli, G. C. and Sargeni, F. (1995). Very efficient VLSI implementation of CNN with discrete templates. Electronics Letters, 29(14):1286-1287.
Chellappa, R., Wilson, C. L., and Sirohey, S. (1995). Human and machine recognition of faces: a survey. Proc. IEEE, 83:705-740.
Chua, L. O. (1998). CNN: A Vision of Complexity. Singapore/River Edge, NJ: World Scientific.
Chua, L. O. (1992). CNN. II. applications and VLSI circuit realizations. In Proc. 35th Midwest Symposium on Circuits and Systems, Aug. 9-12, 1:146-149.
Chua, L. O. and Goras, L. (1995). Turing patterns in cellular neural networks. International Jour. Electronics, 79(6):719-736.
Chua, L. O., Hasler, M., Moschytz, G. S., and Neirynck, J. (1995). Autonomous cellular neural networks: a unified paradigm for pattern formation and active wave propagation. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 42(10):559-577.
Chua, L. O. and Roska, T. (1993). The CNN paradigm. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 40(3):147-156.
Chua, L. O., Roska, T., Kozek, T., and Zarandy, A. (1996a). CNN universal chips crank up the computing power. IEEE Circuits and Devices Magazine, 12(4):18-28.
Chua, L. O. and Yang, L. (1988a). Cellular neural networks: Applications. IEEE Trans. Circuits and Systems, 35(10):1273-1290.
Chua, L. O. and Yang, L. (1988b). Cellular neural networks: Theory. IEEE Trans. Circuits and Systems, 35(10):1257-1272.
Chua, L. O., Yang, T., Zhong, G. Q., and Wu, C. W. (1996b). Adaptive synchronization of Chua's oscillators. International Jour. Bifurcation and Chaos, 6(1):189-201.
Chua, L. O., Yang, T., Zhong, G. Q., and Wu, C. W. (1996c). Synchronization of Chua's circuits with time-varying channels and parameters. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 43(10):862-868.
Civalleri, P. P. and Gilli, M. (1992). Some stability properties of CNN's with delay. In CNNA '92 Proceedings. Second International Workshop on Cellular Neural Networks and Their Applications, pp. 94-99.
Civalleri, P. P. and Gilli, M. (1994). Some dynamic phenomena in delayed cellular neural networks. International Jour. Circuit Theory and Applications, 22(2):77-105.
Civalleri, P. P., Gilli, M., and Pandolfi, L. (1993). On stability of cellular neural networks with delay. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 40(3):157-165.
Clarke, R. J. (1995). Digital Compression of Still Images and Video. New York: Academic Press.
Coli, M., Palazzari, P., and Rughi, R. (1995). Design of dynamic evolution of discrete-time continuous-output cellular neural networks. In ICANN '95. International Conference on Artificial Neural Networks. Neuronimes '95 Scientific Conference, Oct. 9-13, 2:419-424.
Crounse, K. R. and Chua, L. O. (1995). Methods for image processing and pattern formation in cellular neural networks: a tutorial. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 42(10):583-601.
Crounse, K. R. and Chua, L. O. (1996). The CNN universal machine is as universal as a Turing machine. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 43(3):353-355.
Crounse, K. R., Roska, T., and Chua, L. O. (1993). Image halftoning with cellular neural networks. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 40(4):267-283.
Cruz, J. M. and Chua, L. O. (1991). A CNN chip for connected component detection. IEEE Trans. Circuits and Systems, 38(7):812-817.
Cruz, J. M. and Chua, L. O. (1995). Application of cellular neural networks to model population dynamics. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 42(10):715-720.
Cruz, J. M., Chua, L. O., and Roska, T. (1994). A fast, complex and efficient test implementation of the CNN universal machine. In Proc. Third International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 61-66.
Csapodi, M. and Roska, T. (1996). Dynamic analogic CNN algorithms for a complex recognition task - a first step towards a bionic eyeglass. International Jour. Circuit Theory and Applications, 24(1):127-144.
Dalla Betta, G. F., Graffi, S., Masetti, G., and Kovacs, Z. M. (1992). Design of a CMOS analog programmable cellular neural network. In CNNA '92 Proceedings. Second International Workshop on Cellular Neural Networks and Their Applications, Oct. 14-16, pp. 151-156.
Davidson, J. L. (1992). Simulated annealing and morphology neural networks. Image Algebra and Morphological Image Processing III, Proc. SPIE - The International Society for Optical Engineering, San Diego, CA, USA, 1769(20-22):119-127.
Davidson, J. L. and Hummer, F. (1993). Morphology neural networks: an introduction with applications. Circuits, Systems, and Signal Processing, 12(2):177-210.
Davidson, J. L. and Ritter, G. X. (1990). A theory of morphological neural networks. Digital Optical Computing II, Proc. SPIE - The International Society for Optical Engineering, 1215(17-19):378-388.
Davis, L. (1991). Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold.
Destri, G. and Marenzoni, P. (1996). Cellular neural networks as a general massively parallel computational paradigm. International Jour. Circuit Theory and Applications, 24(3):397-407.
Diederich, J. (Ed.). (1990). Artificial Neural Networks: Concept Learning. Los Alamitos, CA: IEEE Computer Society Press.
Doan, M.-D., Glesner, M., Chakrabaty, R., Heidenreich, M., and others. (1994). Realisation of a digital cellular neural network for image processing. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 85-90.
Dominguez-Castro, R., Espejo, S., Rodriguez-Vazquez, A., and Carmona, R. (1994b). A CNN universal chip in CMOS technology. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 91-96.
Dominguez-Castro, R., Espejo, S., Rodriguez-Vazquez, A., Garcia-Vargas, I., and others. (1994). Sirena: a simulation environment for CNNs. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 417-422.
Driver, R. D. (1977). Ordinary and Delay Differential Equations. Berlin: Springer-Verlag.
Dubois, D. and Prade, H. (1980). Fuzzy Sets and Systems: Theory and Applications. New York: Academic Press.
Espejo, S., Carmona, R., Dominguez-Castro, R., and Rodriguez-Vazquez, A. (1996a). A CNN universal chip in CMOS technology. International Jour. Circuit Theory and Applications, 24(1):93-109.
Espejo, S., Carmona, R., Dominguez-Castro, R., and Rodriguez-Vazquez, A. (1996b). A VLSI-oriented continuous-time CNN model. International Jour. Circuit Theory and Applications, 24(3):341-356.
Espejo, S., Dominguez-Castro, R., Carmona, R., and Rodriguez-Vazquez, A. (1994). A continuous-time cellular neural network chip for direction-selectable connected component detection with optical image acquisition. In Proc. Fourth International Conference on Microelectronics for Neural Networks and Fuzzy Systems, Sept. 26-28, pp. 383-391.
Espejo, S., Dominguez-Castro, R., Carmona, R., and Rodriguez-Vazquez, A. (1994d). Cellular neural network chips with optical image acquisition. In 1994 IEEE International Conference on Neural Networks. IEEE World Congress on Computational Intelligence, June 27-July 2, pp. 1877-1882.
Espejo, S., Dominguez-Castro, R., Rodriguez-Vazquez, A., and Carmona, R. (1994e). Weight-control strategy for programmable CNN chips. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 405-410.
Espejo, S., Rodriguez-Vazquez, A., Dominguez-Castro, R., Huertas, J. L., and others. (1994). Smart-pixel cellular neural networks in analog current-mode CMOS technology. IEEE Jour. Solid-State Circuits, 29(8):895-905.
Espejo, S., Rodriguez-Vazquez, A., and Huertas, J. L. (1992). Design and testing issues in current-mode cellular neural networks. In CNNA '92 Proceedings. Second International Workshop on Cellular Neural Networks and Their Applications, Oct. 14-16, pp. 169-174.
Espejo, S., Rodriguez-Vazquez, A., Dominguez-Castro, R., Linares, B., and others. (1993). A model for VLSI implementation of CNN image processing chips using current-mode techniques. In (Proceedings) 1993 IEEE International Symposium on Circuits and Systems, 2:970-973.
Farmer, D., Toffoli, T., and Wolfram, S. (Eds.). (1984). Cellular Automata: Proceedings of an Interdisciplinary Workshop. New York: North-Holland Physics Pub.
Faure, B. and Mazare, G. (1990). A VLSI cellular array for the processing of back-propagation neural networks. In Algorithms and Parallel VLSI Architectures. Lectures and Tutorials Presented at the International Workshop, June 10-16, pp. 193-202.
Finocchiaro, M. and Perfetti, R. (1995). Relation between template spectrum and stability of cellular neural networks with delay. Electronics Letters, 31(23):2024-2026.
Forrest, S. (Ed.). (1991). Emergent Computation. Cambridge, MA: MIT Press.
Fruehauf, N., Chua, L. O., and Lueder, E. (1992). Convergence of reciprocal time-discrete cellular neural networks with continuous nonlinearities. In CNNA '92 Proceedings. Second International Workshop on Cellular Neural Networks and Their Applications, Oct. 14-16, pp. 106-111.
Fruehauf, N., Lueder, E., and Bader, G. (1993). Fourier optical realization of cellular neural networks. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 40(3):156-162.
Fruehauf, N. and Lueder, E. (1990). Realization of CNNs by optical parallel processing with spatial light valves. In 1990 IEEE International Workshop on Cellular Neural Networks and Their Applications, CNNA-90, Dec. 16-19, pp. 281-290.
Furukawa, M. and Yamakawa, T. (1995). The design algorithms of membership functions for a fuzzy neuron. Fuzzy Sets and Systems, 71:329-343.
Galias, Z. (1992). Designing discrete-time cellular neural networks for the evaluation of local Boolean functions. In CNNA '92 Proceedings. Second International Workshop on Cellular Neural Networks and Their Applications, Oct. 14-16, pp. 23-28.
Galias, Z. (1993). Designing cellular neural networks for the evaluation of local Boolean functions. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 40(4):267-283.
Galias, Z. and Nossek, J. A. (1994). Control of a real chaotic cellular neural network. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), p. 345.
Gilli, M. (1993). Strange attractors in delayed cellular neural networks. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 40(11):849-853.
Gilli, M. (1994). Stability of cellular neural networks and delayed cellular neural networks with nonpositive templates and nonmonotonic output functions. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 41(8):518-528.
Gilli, M. (1995). A spectral approach for chaos prediction in delayed cellular neural networks. International Jour. Bifurcation and Chaos in Applied Sciences and Engineering, 5(3):869-875.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading: Addison-Wesley.
Goles, E. and Martinez, S. (Eds.). (1994). Cellular Automata, Dynamical Systems, and Neural Networks. Boston: Kluwer Academic Publishers.
Goutsias, J. and Schonfeld, D. (1989). Image coding via morphological transformation: a general theory. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, June.
Gunsel, B. and Guzelis, C. (1995). Supervised learning of smoothing parameters in image restoration by regularization under cellular neural networks framework. In Proceedings. International Conference on Image Processing, pp. 470-473.
Gutowitz, H. (Ed.). (1991). Cellular Automata: Theory and Experiment. Cambridge, MA: MIT Press.
Guzelis, C. (1992). Supervised learning of the steady-state outputs in generalized cellular networks. In CNNA '92 Proceedings. Second International Workshop on Cellular Neural Networks and Their Applications, pp. 74-79.
Guzelis, C. (1993). Chaotic cellular neural networks made of Chua's circuits. Jour. Circuits, Systems and Computers, 3(2):603-612.
Guzelis, C. and Karamahmut, S. (1994). Recurrent perceptron learning algorithm for completely stable cellular neural networks. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), pp. 177-182.
Halonen, K., Porra, V., Roska, T., and Chua, L. (1991c). Programmable analog VLSI CNN chip with local digital logic. In 1991 IEEE International Symposium on Circuits and Systems, June 11-14, pp. 1291-1294.
Halonen, K., Porra, V., Roska, T., and Chua, L. (1990). VLSI implementation of a reconfigurable cellular neural network containing local logic (CNNL). In 1990 IEEE International Workshop on Cellular Neural Networks and Their Applications, CNNA-90, Dec. 16-19, pp. 206-215.
Halonen, K., Radvanyi, A., and Roska, T. (1991a). The control strategy of a dual (programmable analog/logical) cellular neural network chip. In Proc. Second International Conference on Microelectronics for Neural Networks, p. 251.
Halonen, K. and Vaananen, J. (1990). The non-idealities of the IC-realization and the stability of CNN-networks. In 1990 IEEE International Workshop on Cellular Neural Networks and Their Applications, CNNA-90, Dec. 16-19, pp. 226-234.
Halonen, K., Vaananen, J., Porra, V., Roska, T., et al. (1991b). VLSI-implementation of a programmable dual computing cellular neural network processor. In Artificial Neural Networks. Proc. 1991 International Conference, pp. 1581-1584.
Hanebeck, U. D. and Schmidt, G. K. (1996). Genetic optimization of fuzzy networks. Fuzzy Sets and Systems, 79:59-68.
Hansen, L. K. (1992). Boltzmann learning of parameters in cellular neural networks. In CNNA '92 Proceedings. Second International Workshop on Cellular Neural Networks and Their Applications, pp. 62-67.
Haralick, R. M., Sternberg, S. R., and Zhuang, X. (1987). Image analysis using mathematical morphology. IEEE Trans. Pattern Anal. Machine Intell., PAMI-9:532-550.
Harrer, H. (1993). Multiple layer discrete-time cellular neural networks using time-variant templates. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 40(3):191-199.
Harrer, H. and Nossek, J. A. (1992a). New test results of a 4 by 4 discrete-time cellular neural network chip. In CNNA '92 Proceedings. Second International Workshop on Cellular Neural Networks and Their Applications, Oct. 14-16, pp. 163-168.
Harrer, H. and Nossek, J. A. (1992b). Discrete-time cellular neural networks. International Jour. Circuit Theory and Applications, 20(5):453-467.
Harrer, H., Nossek, J. A., and Stelzl, R. (1992). An analog implementation of discrete-time cellular neural networks. IEEE Trans. Neural Networks, 3(3):466-476.
Harrer, H., Nossek, J. A., and Zou, F. (1991a). A learning algorithm for discrete-time cellular neural networks. In Proc. IJCNN'91, Singapore, pp. 717-722.
Harrer, H., Nossek, J. A., and Zou, F. (1991b). A learning algorithm for time-discrete cellular neural networks. In 1991 IEEE International Joint Conference on Neural Networks, pp. 717-722.
Harrer, H., Venetianer, P. L., Nossek, J. A., Roska, T., and others. (1994). Some examples of preprocessing analog images with discrete-time cellular neural networks. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 201-206.
Hassoun, M. H. (1995). Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press.
Hayashi, Y., Buckley, J. J., and Czogala, E. (1993). Fuzzy neural network with fuzzy signals and weights. Int. Jour. Intelligent Syst., 8:527-537.
He, C. and Ushida, A. (1994). Iterative middle mapping learning algorithm for cellular neural networks. IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences. [...]
[...] Dynamical Systems Approach to Machine Intelligence. Englewood Cliffs, NJ: Prentice-Hall.
Kowalski, J., Slot, K., and Kacprzak, T. (1994). A CMOS current-mode VLSI implementation of cellular neural network for an image objects area estimation. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, p. 351.
Kozek, T. and Roska, T. (1996). A double time-scale CNN for solving two-dimensional Navier-Stokes equations. International Jour. Circuit Theory and Applications, 24(1):49-55.
Kozek, T., Roska, T., and Chua, L. O. (1993). Genetic algorithm for CNN template learning. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 40(6):392-402.
Krieg, K. R. and Chua, L. O. (1990). Hardware and algorithms for the functional evaluation of cellular neural networks and analog arrays. In 1990 IEEE International Workshop on Cellular Neural Networks and Their Applications, CNNA-90, Dec. 16-19, pp. 169-171.
Kuffler, S. W., Nicholls, J. G., and Martin, A. R. (1984). From Neuron to Brain: A Cellular Approach to the Function of the Nervous System (Second Edition). Sunderland, MA: Sinauer Associates.
Kulkarni, A. D. (1994). Artificial Neural Networks for Image Understanding. New York: Van Nostrand Reinhold.
Lai, K. K. and Leong, P. H. W. (1996). Implementation of time-multiplexed CNN building block cell. In Proc. Fifth International Conference on Microelectronics for Neural Networks and Fuzzy Systems. MicroNeuro'96, Feb. 12-14, pp. 80-85.
Lai, K. K. and Leong, P. H. W. (1995). An area efficient implementation of a cellular neural network. In Proceedings. 1995 Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems, Nov. 20-23, pp. 51-54.
Lai, K. K., Leong, P. H. W., and Jabri, M. A. (1995). Analogue CMOS VLSI implementation of cellular neural networks. In Proc. Sixth Australian Conference on Neural Networks (ACNN'95), Feb. 6-8, pp. 17-20.
Langton, C. G. (1995). Artificial Life: An Overview. Cambridge, MA: MIT Press.
Lee, C.-C. and de Gyvez, J. P. (1994a). Single-layer CNN simulator. In 1994 Symposium on Circuits and Systems, May 30-June 2, 6:217-220.
Lee, C.-C. and Pineda de Gyvez, J. (1994b). Time-multiplexing CNN simulator. In 1994 IEEE International Symposium on Circuits and Systems, May 30-June 2, 6:407-410.
Lee, J. S. J., Haralick, R. M., and Shapiro, L. G. (1987). Morphological edge detection. IEEE Jour. Robotics and Automat., 3(2).
Lee, S. C. and Lee, E. T. (1974). Fuzzy sets and neural networks. Jour. Cybernet., 4:83-103.
Lim, D. and Moschytz, G. S. (1994). A programmable, modular CNN cell. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 79-84.
Lozano, M., Herrera, F., and Verdegay, J. L. (1995). Generating Fuzzy Rules from Examples Using Genetic Algorithms. In Yager, R. R. and Zadeh, L. A. (Eds.). Fuzzy Logic and Soft Computing. pp. 21-28, River Edge, NJ: World Scientific.
Magnussen, H. and Nossek, J. A. (1992). Towards a learning algorithm for discrete-time cellular neural networks. In CNNA '92 Proceedings. Second International Workshop on Cellular Neural Networks and Their Applications, pp. 80-85.
Magnussen, H. and Nossek, J. A. (1994a). A geometric approach to properties of the discrete-time cellular neural network. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 41(10):625-634.
Magnussen, H. and Nossek, J. A. (1994b). Global learning algorithms for discrete-time cellular neural networks. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), pp. 165-170.
Magnussen, H., Papoutsis, G., and Nossek, J. A. (1994). Continuation-based learning algorithm for discrete-time cellular neural networks. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), pp. 171-176.
Mancuso, M., Luca, R. D., Poluzzi, R., and Rizzotto, G. G. (1996). A fuzzy decision directed filter for impulsive noise reduction. Fuzzy Sets and Systems, 77:111-116.
Maragos, P. A. and Schafer, R. W. (1986). Morphological skeleton representation and coding of binary images. IEEE Trans. Acoust. Speech Signal Processing, 34(5):1228-1244.
Marks II, R. J., Oh, S., Arabshahi, P., Caudell, T. P., Choi, J. J., and Song, B. G. (1992). Steepest descent adaptation of min-max fuzzy if-then rules. In Proc. IJCNN, 3:471-477, Beijing, China.
Marks II, R. J. (Ed.). (1994). Fuzzy Logic Technology and Applications. Piscataway, NJ: IEEE Press.
Mezard, M., Parisi, G., and Virasoro, M. A. (Eds.). (1987). Spin Glass Theory and Beyond. Singapore: World Scientific.
Mizutani, H. (1994). A new learning method for multilayered cellular neural networks. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), pp. 195-200.
Morgan, N. (Ed.). (1990). Artificial Neural Networks: Electronic Implementations. Los Alamitos, CA: IEEE Computer Society Press.
Nakagawa, Y. and Rosenfeld, A. (1978). A note on the use of local min and max operations in digital picture processing. IEEE Trans. Syst., Man, Cybern., SMC-8:632-635.
Nemes, L. and Roska, T. (1995). A CNN model of oscillation and chaos in ant colonies: a case study. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 42(10):741-745.
Nemes, L., Toth, G., Roska, T., and Radvanyi, A. (1996). Analogic CNN algorithms for 3D interpolation-approximation and object rotation using controlled switched templates. International Jour. of Circuit Theory and Applications, 24(3):283-300.
Nola, A. D., Pedrycz, W., and Sessa, S. (1995). Fuzzy relational structures: The state-of-art. Fuzzy Sets and Systems, 75:241-262.
Nossek, J. A. (1994). Design and learning with cellular neural networks. In Proc. [...]
[...] Systems I: Fundamental Theory and Applications, 40(3):174-181.
Perfetti, R. (1993a). On the convergence of reciprocal discrete-time cellular neural networks. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 40(4):286-287.
Perfetti, R. (1993b). Relation between template spectrum and convergence of discrete-time cellular neural networks. Electronics Letters, 29(25):2208-2209.
Perfetti, R. (1994). On the Op-Amp based circuit design of cellular neural networks. International Jour. Circuit Theory and Applications, 22(5):425-430.
Perfetti, R. (1995). Some properties of the attractors of discrete-time cellular neural networks. International Jour. Circuit Theory and Applications, 23(5):485-499.
Pham, C.-K., Ikegami, M., and Tanaka, M. (1995a). Discrete time cellular neural networks with two types of neuron circuits for image coding and their VLSI implementations. IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences, E78-A(8):978-988.
Pham, C.-K., Kimura, T., Ikegami, M., and Tanaka, M. (1995b). Pulse coded cellular neural network and its hardware implementation. In 1995 IEEE International Conference on Neural Networks Proceedings, Nov. 27-Dec. 1, 4:1590-1594.
Pineda de Gyvez, J. (1994). XCNN: a software package for color image processing. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, pp. 219-224.
Piovaccari, A. and Setti, G. (1994). A versatile CMOS building block for fully analogically-programmable VLSI cellular neural networks. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), Dec. 18-21, p. 347.
Pitas, I. and Venetsanopoulos, A. N. (1991). Nonlinear Digital Filters: Principles and Applications. Dordrecht: Kluwer Academic Publishers.
Pitas, I. and Venetsanopoulos, A. N. (1990). Morphological shape decomposition. IEEE Trans. Pattern Anal. Machine Intell., 12(1):38-45.
Pol, S. A. and King, R. A. (1981). Image enhancement using smoothing with fuzzy sets. IEEE Trans. Syst., Man, Cybern., SMC-11:494-501.
Poliakov, G. I. (1972). Neuron Structure of the Brain. Cambridge, MA: Harvard University Press.
Proceedings. (1990). 1990 IEEE International Workshop on Cellular Neural Networks and Their Applications, CNNA-90: Proceedings. Piscataway, NJ: IEEE Service Center.
Proceedings. (1992). CNNA '92: Proc. Second International Workshop on Cellular Neural Networks and Their Applications. Piscataway, NJ: IEEE Service Center.
Proceedings. (1994). Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94). Piscataway, NJ: IEEE Service Center.
Proceedings. (1996). Proc. Fourth IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-96). Piscataway, NJ: IEEE Service Center.
Proceedings. (1998). Proc. Fifth IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-98). Piscataway, NJ: IEEE Service Center.
Puffer, F., Tetzlaff, R., and Wolf, D. (1995). A learning algorithm for cellular neural networks (CNN) solving nonlinear partial differential equations. In Proc. 1995 URSI International Symposium on Signals, Systems, and Electronics. ISSSE '95, pp. 501-504.
Raffo, L., Sabatini, S. P., and Bisio, G. M. (1996). A programmable VLSI architecture based on multilayer CNN paradigms for real-time visual processing. International Jour. Circuit Theory and Applications, 24(3):357-367.
Rekeczky, C., Nishio, Y., Ushida, A., and Roska, T. (1995a). CNN based adaptive smoothing and some novel types of nonlinear operators for grey-scale image processing. In Proc. 1995 International Symposium on Nonlinear Theory and Its Applications (NOLTA '95).
Rekeczky, C., Nishio, Y., Ushida, A., and Roska, T. (1995b). CNN based adaptive smoothing and some novel types of nonlinear operators for grey-scale image processing. In 1995 International Symposium on Nonlinear Theory and Its Applications (NOLTA '95), pp. 683-688.
Rekeczky, C., Ushida, A., and Roska, T. (1995c). Rotation invariant detection of moving and standing objects using analogic cellular neural network algorithms based on ring-codes. IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences, E78-A(10):1316-1330.
Reljin, B., Serdar, T., Kostic, P., and Pavasovic, A. (1995). CMOS VLSI realization of voltage-mode programmable analog cellular neural network. In 1995 20th International Conference on Microelectronics. Proceedings, Sept. 12-14, 2:497-505.
Rodriguez-Vazquez, A., Espejo, S., Dominguez-Castro, R., Huertas, J. L., et al. (1993). Current-mode techniques for the implementation of continuous- and discrete-time cellular neural networks. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 40(3):132-146.
Rodriguez-Vazquez, A., Dominguez-Castro, R., and Huertas, J. L. (1990). Accurate design of analog CNN in CMOS digital technologies. In 1990 IEEE International Workshop on Cellular Neural Networks and Their Applications, CNNA-90, Dec. 16-19, pp. 273-280.
Roska, T. (1992). Programming CNN: a hardware accelerator for simulation, learning, and real-time applications. In Proc. 35th Midwest Symposium on Circuits and Systems, pp. 437-440.
Roska, T. (1994a). Analogic algorithms running on the CNN universal machine. In Proc. Third IEEE International Workshop on Cellular Neural Networks and Their Applications (CNNA-94), pp. 3-8.
Roska, T. (1994b). The CNN universal machine - a summary of an analogic supercomputer chip architecture for very high speed visual processing. In 1994 CERN School of Computing Proceedings (CERN 95-01). 1994 CERN School of Computing Proceedings, pp. 295-297.
Roska, T., Bartfai, G., Szolgay, P., Sziranyi, T., and others. (1990). A hardware accelerator board for cellular neural networks: CNN-HAC. In 1990 IEEE International Workshop on Cellular Neural Networks and Their Applications, CNNA-90, Dec. 16-19, pp. 160-168.
Roska, T., Bartfai, G., Szolgay, P., Sziranyi, T., and others. (1992a). A digital multiprocessor hardware accelerator board for cellular neural networks: CNN-HAC. International Jour. Circuit Theory and Applications, 20(5):589-599.
Roska, T. and Bartfai, G. (1990). CNN-HAC: a digital multiprocessor hardware accelerator for general cellular neural networks. In Hungarian Acad. Sci., Budapest, Hungary (Technical report).
Roska, T., Boros, T., Radvanyi, A., Thiran, P., and others. (1992b). Detecting moving and standing objects using cellular neural networks. International Jour. Circuit Theory and Applications, 20(5):613-628.
Roska, T. and Chua, L. O. (1990). Cellular neural networks with nonlinear and delay-type template elements. In 1990 IEEE International Workshop on Cellular Neural Networks and Their Applications, CNNA-90, pp. 12-25.
Roska, T. and Chua, L. O. (1992c). The CNN universal machine. I. the architecture. In CNNA '92 Proceedings. Second International Workshop on Cellular Neural Networks and Their Applications, pp. 1-10.
Roska, T. and Chua, L. O. (1992d). The CNN universal machine. II. programmability and applications. In CNNA '92 Proceedings. Second International Workshop on Cellular Neural Networks and Their Applications, pp. 181-190.
Roska, T. and Chua, L. O. (1993a). Cellular neural networks with nonlinear and delay-type template elements and nonuniform grids. International Jour. Circuit Theory and Applications, 20(5):469-481.
Roska, T. and Chua, L. O. (1993b). The CNN universal machine: an analogic array computer. IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, 40(3):163-173.
Roska, T., Chua, L. O., Wolf, D., Kozek, T., and others. (1995). Simulating nonlinear waves and partial differential equations via CNN. I. basic techniques. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 42(10):807-815.
Roska, T. and Kek, L. (Eds.). (1995). Median-removes impulsive noise from a grey-scale image. p. 34 of CSL-CNN software library: Templates and algorithms (version 6.4). Technical report, Computer and Automation Institute (MTA SzTAKI) of The Hungarian Academy of Sciences, Budapest.
Roska, T. and Radvanyi, A. (1990). CNND simulator. cellular neural network embedded in a simple dual computing structure. user's guide version 3.01. In Hungarian Acad. Sci., Budapest, Hungary.
Roska, T. and Vandewalle, J. (Eds.). (1993). Cellular Neural Networks. New York: Wiley.
Roska, T., Wu, C. W., Balsi, M., and Chua, L. O. (1992e). Stability and dynamics of delay-type general and cellular neural networks. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 39(6):487-490.
Roska, T., Wu, C. W., and Chua, L. O. (1993). Stability of cellular neural networks with dominant nonlinear and delay-type templates. IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, 40(4):270-272.
Rueda, A. and Huertas, J. L. (1992). Testability in analogue cellular neural networks. International Jour. Circuit Theory and Applications, 20(5):583-587.
Russ, J. C. (1992). The Image Processing Handbook. Boca Raton, FL: CRC Press.
Russo, F. (1992). A user-friendly research tool for image processing with fuzzy rules. In Proc. First IEEE Int. Conf. on Fuzzy Systems, FUZZ-IEEE '92, pp. 561-568.
Russo, F. and Ramponi, G. (1994a). Combined FIRE filter for image enhancement. In Proc. Third IEEE Conference on Fuzzy Systems. IEEE World Congress on Computational Intelligence, pp. 249-253.
Russo, F. and Ramponi, G. (1994b). Edge extraction by FIRE operators. In Proc. Third IEEE Conference on Fuzzy Systems. IEEE World Congress on Computational Intelligence, pp. 249-253.
Russo, F. and Ramponi, G. (1994c). Nonlinear fuzzy operators for image processing. Signal Processing, pp. 429-440.
Russo, F. and Ramponi, G. (1995). A fuzzy operator for the enhancement of blurred and noisy images. IEEE Trans. Image Processing, 4(8):1169-1174.
Salerno, M., Sargeni, F., and Bonaiuto, V. (1996). 6x6 DPCNN: a programmable mixed analogue-digital chip for cellular neural networks. In Proc. Fourth IEEE International Workshop on Cellular Neural Networks and Their Applications, Seville, Spain, June 24-26, pp. 451-456.
Salerno, M., Sargeni, F., and Bonaiuto, V. (1995). DPCNN: a modular chip for large CNN arrays. In 1995 IEEE Symposium on Circuits and Systems (Cat. No. 95CH35771). 1995 IEEE Symposium on Circuits and Systems, April 28-May 3, 1:417-420.
Sanchez, E. (1976). Resolution of composite fuzzy relation equations. Inf. Control, 30:38-48.
FUZZY CELLULAR NEURAL NETWORKS AND THEIR APPLICATIONS
441
Snni, A., Graffi, S., Masetti, G . , and Setti, G. (1994). Design of CMOS cellular neural nelworks . IEEE Iiiternritioiuil Worksliop o i i Cellulnr operating at several supply voltages. In P r ~ c Third Nerrrul Networks uric1 Their App/icutions (CNNA-941, Dec. IH-21, pp. 363-368. Sargeni, F. ( 1994). Digitally programmable transconductance amplifier for CNN applications. Electronics Letters, 30( I I ):870-872. Sargeni. F., and Bonaiuto, V. ( 1994). High perfomlance digitally programmable CNN chip with discrete templates. In Pro(,. Third IEEE liiterricftioiid Work.shop o i i Cellular Neirrrrl Networh uric1 Their App1iccitioti.s (CNNA-Y4), Dec. 18-21. pp. 67-72. Sargeni, F., and Bonaiuto, V. (1995). A fully digitally programmable CNN chip. fEEE Trcins. Circuits orid ~ v s t c n iIf: . ~ Anei/og cfrrd f>igitci/ S i g r i d Processing, 42( 1 1 ):741 -745. Sargeni, F., and Bonaiuto, V. (1996). A 3*3 digitally prograrnmable CNN chip. Intrriicitioiiul JOLW.Circuit Theory curd Applications, 24(3):369- 379. Sato, T., Ushida, H., and Yaniaguchi. T. (1995). Retrieval system to generate facial expressions fiiteriiritiontil Coi!ference on Fuzzy Systcins, pp. 148')- 1494. using chaos. In Proc. 1995 I Schulcr, A. J . , Brabec, M., Schubel, D., and Nossek, J. A. (1994). Hardware-oriented learning ior cellular neural networks. In Proc. Tliirtl IEEE Intertirrtioriul Workshop 011 Cellulcir Neurcil Nehvorks mid Their Applictrtions (CNNA-94), pp. 183- 188. Schuler, A. J.. Nachbar, P., Nossek, J. A,, and Chua, L. 0. (1992). Learning state space irajectories in cellular neural networks. In CNNA '92 Proceediiigs. Second fnrerriationczl Work.rhop on Celblcir Nourol Nehr'orks nnd Their App1icution.s. pp. 68 -73. Serra. J. (1982). Irncigr Arrci1wi.s rlzcnioticol Morphology. New York: Academic. Serra, J. (Ed.). ( 1988). lnrtrge Aii el Meith~~i~icfticrd Morphology (vol. 2): T I i r o i ~Advtmces. New York: Academic. Scthi, I. K., and Jain. A. K . (Eds.). ( 1991). 
Artificier1 Neurul network.^ eind Strrtistical Patterii Recognition: Old mid New Coniwctioii.~.New York: North-Holland. Shnnn, J . J., and Fu, H. C. (1995). A f u z ~ yneural network for rule acquiring on fuizy control systems. Fuzzy Sets rriid Sy.stems, 71:345-357. Shcu, B. J., Bang, S. H., and Fang, Wai-Chi (1994). Analog VLSI design of cellular neural networks with annealing agility. In Proc. Third IEEE Int~~rtiutioiwlworks hey^ on Cellular Neirrcrl Networks ernel Their App/iccr/ions (CNNA-94). Dec. IK-21, pp. 387-392. Shcu, B. J., Bang, S . H., and Fang, Wai-Chi (199Sa). VLSI design of cellular neural networks Circuirs ruid Sy,~terri.s with annealing and optical input capabilities. In 199.5 IEEE Synzpo.siitm (Cot. No. 9SCH.35771), April 2K-Mtry 3, 199s IEEE LSyt~iposiirrno i i Circrrits oriel Sy.stei?i.s, 1:653-656. Sheu, B. J., Chang, R. C., Wu, T. H., and Bang, S. H . (1995b). VLSI-compatible cellular neural networks with optimal solution capability for optiinir.ation. In 1995 fEEE Svinposiuni (ni Circuits titid S j w o i i i s (Celt. No.95CH.35771). April 2 8 - May 3. 1995 IEEE Sytnpo.sium on Circuits and Systenis, 2: 1 165 - I 168. Shi, B. E., and Chua, L. 0. (1992). Resistive grid image filtering: input/output analysis via the CNN framework. IEEE Puns. Circuits rind Systrms I: Fuiiiiuinentnl Theory niid Applicci/ions. 39(7):531-548. Shi, B. E., Roska, T., and Chua, L. 0. (199.3). Design of linear cellular neural networks for motion sensitive filtering. IEEE Priiis. Circuits (2nd Systeins f I : Ancilog cuid Digitcil Signed Proce.vsiiig, 40(5):320- 33 I . Shih, F. Y., and Mitchell, 0. R. (1988a). Automated fast recognition and location of arbitary shaped objects by image morphology. In Proc,. IEEE Conj; Cornpurer Visrori rind Perttern Kwognitinn, Jun. 5-9, pp. 774-779. Shih, F. Y . , and Mitchell, 0. R. (1988b). Industrial parts recognition and inspection by image morphology. In Pmc. l9XK fEEE fnterncitionnl Car$ O I I Kohotics and Autoincctiori, Apr. 
2 4 - 2 9 , 3: 1764- 1766.
442
TAO YANG
Shimizu. N., Cheng, G.- X., Ikegami, M., Nakamura. Y.. and others. ( 1994). Pipelining GaussSeidel method for analysis of discrete time cellular neural networks. l€/C€ Traus.Fitridrii~i~rit d s of' Electronic.\, Conimcrriications mid Coriiputer Sciences, E77-A(8):1396- 1403. Slavova, A. ( 1995). Cellular neural networks with nonlinear dynamics. Neurtrl, Ptimllel u i i d Scierit$c Coniprrtatioii, 3(3):369- 377. Slot. K. ( 1992). Optically realized feedback cellular neural networks. In CNNA 'Y2 Procrrdin,qs. Second Iiiteniatioriul Workshop on Cellulur Ncwral Nenvorks uncl Their Applicufims, Ocr. 1 4 - 1 6 . pp. 175-180. Slot, K. (1994). Large-neighborhood templates implementation in discrete-time CNN universal machine with a nearest-neighbor connection pattern. In Proc. Third I€€€ Interi~citioiitrl Workshup o r 1 Crllirlar Neitrul Netbvorks ~ i i t Their l Applicutions (CNNA-Y4).pp. 2 I3 - 2 18. Slot, K., Roska, T., and Chua, L. 0 . (1992). Optically realized feedforward-only cellular neural networks. Arcliiv jirr Elektronik untl liehrrtrrrguiig.stcc~iiiik,46(3): 158- 167. Special Issue on Cellular Neural Networks. (1996). Itit. Jour. Cirririf Tlieoiy mid A p p l x , 24:( 1 ). Special Issue on Cellular Neural Networks. ( I093a). I€€€ Truris. Circwirs (uid Sy.ctrms-1: Funclunrenttrl T h e o y rind Applicutions. 40:(3). Special Issue on Cellular Neural Networks. ( 1993b). I€€€Trans. Circuits uritl Systrm-11: Aiiulofi n r i d Digitrrl Sipirrl Processing. 40:(3). Special Issue on Cellular Neural Networks. ( 1095). Iiit. Jour. Circuit Theor)) mid App/.s.. 24:(3). Special Issue on Cellular Neural Networks. (1992). Int. J . Circuit Theor?:and Applx, 20:(5). Sullivan. G. O., Horan, P., Hegarty, J., Kakizaki, S., Kelly, B., and McCabe, E. (1906). A fully optically addressable connected component detector i n CMOS. In Proc. Fourth /EE€ Intcvutrtiorrul Workshop oii Cellulur Neural Networks unrl Their Ap/ictrtioi?s, Sevillc.. S p i i u , June. 2 4 - 2 6 , pp. 439-444. 
Suykens, J. A. K.. Yang, T., and Chua, L. 0. (1098). Impulsive synchronization of chaotic Lur'e systems by measurement feedback. Intemationul Jour. Btfifilrcufionurzd Chuos, S(6). pp. I37 1 1381. Suzuki. H., Matstunoto, T., and Chua. L. 0. (1992). A CNN handwritten character recognizer. Inferriatioiiul Jour. Circuit 7/ieoi:v and App/ic.nrioii.s. 20(5):601 -61 2. Sziranyi, T. (1996). Robustness of' cellular neural networks in image deblurring and texture segmentation. Iriterncitional Jour. Circuit T h e o y and Applicntioiis, 24(3):38 I - 396. SLiranyi, T., and Csapodi, M. (1994). Texture classilication by cellular neural network and genetic learning. I n Proc. 12th IAPR Iiiterntrtionul Conference 017 Puttem Recognition, pp. 381-383. S~olgay,P., Katona, A,, Eross, G., and Kiss, A. (1994). An experimental system for path tracking of a robot using a 16*16 connected component detector CNN chip with direct optical input. In Proc Third l E € € Irilernational Worksliop oii Cellular Neurd Networks a r i d Their App1icntion.s (CNNA-94). Dec. I N - 2 / , pp. 261 -266. Szolgay. P., Kispal, I., and Kozek, T. (1992). An experimental system for optical detection of layout errors of printed circuit boards using learned CNN templates. I n CNNA '92 Prot.eedirigs. Second Internutioncil Workshop on Cellulur Neurcil Netww-ks und Their AppIi,lic.citiori.\, pp. 203-209. Tanaka. M., Crounse, K. R., and Roska, T. (1994). Parallel analog image coding and decoding of Electronics, Cortimuiiiccitioii.r by using cellular neural networks. IEICE Truns. F~r~idarneritrrls arid Computer Scirnces, E77-A (8): 1387- 1395. Tao. L. H., Xi, Y. L.. Yun, W. B., and Ya. H. Z. (1995). A new type of chaotic attractor with cellular neural networks. In Proc. ISCAS'YS - lrrtenintional Syinpo.siimrr~oil Circuits uritl Systcwis, pp. 997 - IOOO. Tetzlaff, R.. and Wolf, D. (l996a). A learning algorithm for the dynamics of CNN with nonlinear templates-part I: Discrete-time case. 
In Procecdings ofthe Fourth I€€€lnteriiutioiiul W O ~ ~ . ~ / I O [ J or1 Cdlular Neurul Networks mid Their Aplic~cctioris.Seville, Spuin, June 24-26, pp. 46 I -466.
FUZZY CELLULAR NEURAL NETWORKS AND THEIR APPLICATIONS
443
Tetilaff, R.. and Wolf, D. (1996h). A learning algorithm for the dynamics of CNN with nonlinear templates. part 11: Continuous-time case. I n Pro(.. Fourth IEEE Inrernrrtioncil Work.~IiopO N Cellular Neirrrrl Networks and Their Aplicutiow Seville. S p i n , June 24-26, pp. 467 -412. Thalmann. N . M., and Thalinann, D. (1994). ArtiJicicd Lrfe unrl Virtuul Reuli/y. New York: Wiley. Thiran, P., Crounse, K. R., Chua, L. O., and Hasler, M . (1995). Pattern formation properties of autonomous cellular ncural networks. IEEE Trtrns. Circuits rind Systems I: Fun~l~imentul Tliroiv trnd Applic(rtions, 42( IO):757-774. Jiir Morleling. Cambridge: Toffoli. T . ( 1987). Cellulrrr Autonintn Mrichines: A New Eni~ir~m~nent MIT Press (Series in Scientific Computation). Torres, L., and Kunt, M. ( 1996). Video Cotling: Tlie Src~ond Gc,nerci/ion Approtrch. Boston: Kluwcr Academic Publishers. Toth, G., Lent, C. S., Tougaw, P. D., Brazhnik, Y., c’t trl. (1996). Quantum cellular neural networks. Siipr1ortice.s mid Microstructure.s, 20(4):473 -478. Tryba, V. Heider, H., and Muhlenfeld, E. (1995). Auronratic Design qf’Fuz:y Systems liy Genetic AI,~prithm.s,pp. 21 -28. River Edge, NJ: World Scientitic. Tiionas, P. (1996). A cellular neural network learning the pseudorandom behaviour of a complex system. Internutiontrl .lour. electronic.^, 80(3):405-41 3 . Utxliick, W., and Nossek, J . A. (1994). Computational learning theory applied to discrete-time cellular neural networks. In Proc. Third IEEE lnterncrtionrrl Workshop on Cellnlur Neurcrl Networks and Their Applications (CNNA-Y4), pp. 159- 164. Van Dam, J. W. M., Krose, €3. J . A., and Groen, F. C. A. (1994). CNN: a neural architecture that learns multiple transformations of spatial representations. In ICANN ‘Y4. Proc. Inrr~rnrrrioncd L‘onjerence on Artijiciul Neurrrl Network.v, pp. 1420- 1423. Vandenberghe, L., Tan, S.. and Vandewalle, J. ( 1990). Cellular neural networks: dynamic properties and adaptive learning algorithm. 
In Neurcil Nerwork.s. EURASIP Workshop 1990 Prongs. Neural Networks. EURASIP Workshop 1990 Proceedings, pp. 141 - I SO. Varrientos. J. E., Ramirez-Angulo, J., and Sanchez-Sinencio, E. (1990b). Cellular neural network implementations: A current mode approach. In I Y Y O IEEE Inrrrncitional Workshop of1 Cc4lulrtr Nourrtl Networks mid Their App1icatioii.r. CNNA-YO, Dec. 16- 19, pp. 216-225. Varrientos, J. E., Ramirez-Angulo, J., and Sanchez-Sinencio, E. (1990a). A current-mode CMOS cellular neural network. In Proc. 33rd Midwest Symposiuni 017 Circuits rind Sy.steni.s, pp. 12- 14. Varrientos, J. E., and Sanchez-Sinencio. E. (1992). CELLSIM: a cellular neural network simulator for the personal computer. In Proc. 35th Midwest Syrnpo.siuni on Circuits on(/ Swstems, pp. 1384- 1387. Varrientos, J . E., Sanchez-Sinencio, E., and Ramirez-Angulo. J . (1993). A current-mode cellular ~. m i l Systems 11: Analog mid Digital neural network implementation. IEEE T r ~ r i . Circuit.s S i g r i d Proces.sing. 40(3): 147- 15.5. Venctianer, P. L., Werblin, F., Roska. T., and Chua, L. 0. (1995). Analogic CNN algorithms for sonie image compression and restoration tasks. IEEE Truns. Circ~uirsand Sy.strinu I: I~’uriclametiralT1ieor.v and Applicciriorzs, 42(5):278 - 284. Wen, K.- A,, Su, J.- Y., and Lu, C.- Y. (1994). VLSI design of digital cellular neural networks for image processing. Jour. visuul Communiccrtion arid ImnRe Representation, 5(2) Werhlin, F., Roska, T., and Chua, L. 0. (1994). The analogic cellular ncural network d Circuit Tlieory and Appliintion.r, 23(6):S41 -569. eye. I n t ~ ~ r n u t i n nJour. Wu, C. W. Xing, T. and Chua, L. 0. (1996). On adaptive synchronization and control of nonlinear dynamical systems. Internationnl .lour. R[frrrcution und Chaos. 6(3):4SS-47 1. Yager, R. (1979). A measurement informational discussion on fuzzy union and intersection. Int. Jorrr. Mrtn-Machine Studie.s, 1 1 : 189-200, 1979. Yagcr, R. R. and L. A. Zadeh (Ed ( 1992). 
An Introtlucriori to Fuz7Y Logic Applicrttion in lrrt~lligrrztSystems. Boston: Kluwer Academic Publishers.
444
TAO YANG
Yaniakawa, T. ( 1990). Pattern recognition hardware system employing a fuzzy neuron. I n Proc. Int. Car$ F w ; y Logic. pp. 934-938. Yamakawa T. and Furukawa, M. ( 1992). A design algorithm of membership function for a fuzzy neuron using example-based learning. In Pro(,. IEEE Inr. Con$ Fnzzy Sysr. (FUZZY-IEEE'92). pp. 943-948. Yang. C. M., Yang. T., and Zhang, K. Y. (1994). Chaos in the discrete time cellular neural networks. I n Proc. Third IEEE Internutioncrl Workshop on Celldur Neurul Nehtwrk.\ mid Their App/iccrtiorr.s (CNNA-94), pp. 297 - 302. Yang, H.-K., Yakout, M. A,. and El-Masry, E. I . (1994). Current-mode implementation of discrete-time cellular neural networks using the pulse width modulation technique. In Proc.. 37th M i d w w t Symposium on Circuirs arid Systems, Aug. 3 - 5, 1:4.57-460. Yang. L., Chua. L. 0. and Krieg. K. R. (1990). VLSl implementation ofcellular neural networks. I n I990 IEEE International Svinposiurn on Circuits cord Sysrerns, pp. 2425 -2427. Yang, T. ( 1994). Blind signal separation using cellular neurnl networks. Internorioritrl Jortr. Circuit Theory arid App/ic.utions, 22(5):399-408. Yang, T. and Chua, L. 0. (1998a). Applications of chaotic digital code-division multiple access(CDMA) to cable communication systems. Irltr~rnntiontrl Joirr. Rfurc.ntion m c l Clitros, 8(8):1657- 1669. Yang, T. and Chua, L. 0. ( I998b). Error performance for chaotic digital code-division multiple access(CDMA). /ntrrnutiorr~r/Jour. B$urcation trnd Chaos. 8( 10):2047- 2059. Yang. T. and Chua, L. 0. ( I 996a). Channel-independent chaotic secure communication. InternrrtionuI Jour. Rfurcatiori nrid Chaos, 6( 12Bj:26.53- 2660. , and Chua, L. 0. (199613). Secure communication via chaotic parameter modulation. Trum. Circuir.r nnd Sy.sr~'m.s-I: Fundamental Tlieorv c m d App/ic-arion.s. 43(9):8 I7 - 8 19. Yang, T. and Chua, L. 0. (1997a). Chaotic digital code-division multiple access (CDMAj systems. International Jour. 
Bifurcation arid Chaos, 7( 12):2789- 2805. Yang, T.. Chua, L. 0..and Crounse, K. R. ( 1 9 9 6 ~ ) .Application of discrete-time cellular neural networks to image copyright labeling. I n P roc. Fourth IEEE Intrriurtionul Work.shop on C'rllrtltrr N m r d Netviwrks (2nd Their Aplicariorrs (CNNA-96), pp. 19 -24. Yang. T., Wu, C. W., and Chua, L. 0. (1997g). Cryptography based on chaotic systems. IEEE T ~ / I I . Circuits v. c/11t1Sv.\ferll.S -I: F/ttIdCrmetitcl/ The()? U t I d Applicati0Il.S. 44(5):469 -472. Ymg, T.. Yang, C.-M.. and Yang, L.-B. (1998d). A detailed study of adaptive control of chaotic systems with unknown parameters. Dyncmrics und Conrrol, 8(3):2.5.5-267. Yang, T., Yang, C. M.. and Yang, L. B. (1998e). Break chaotic switching using generalized synchronization: Examples. lEEE Trrms. Circuits und S y s t r m - I: F~trrdumentc~lTheory u ~ d Applictrtioiis, 45( 10): 1062- 1067. Yang. T., Yang, C. M., and Yang, L. B. (1998f). The differences between cellular neural network based and f~izzycellular neural network based mathematical morphological operalions Irrti~rnutioncr/Jour. Circuir T h e o n und App/ictrtion.s, 26( 1 ): 13 - 2.5. Yang, T., and Yang, L. B. (1996). The global stability of f u u y cellular neural network. IEEE Trum. Circuirs rind System.s-I: Fundmwitcrl Theory urid Applicatiorr.v, 43( 10):880- 883. Yang. T., and Yang, L. B. (1997d). Application of f u z y cellular neural networks to Euclidean distance transformation, IEEE Truns. Circuits and Svstetrrs- 1: Fundanrentul Theoiy and Allplicutions, 44(3):242-246. Yang. T., and Yang, L. B. (1997e). Application of fuzzy cellular neural networks to moi-phological gray-scale reconstruction. Internationcil Jour. Circuir Theory rind A[J[dii~utions, 25(3):153- 165. Yang, T., and Yang, L. B. (1997f). Fuzzy cellular neural network: A new paradigm for image processing. Inrernationnl Jour. Circuit Theoiy trnd App1iccztion.i. 25(6):469-48 I .
FUZZY CELLULAR NEURAL NETWORKS AND THEIR APPLICATIONS
445
Xing, T., Yang, L. B., Wu, C. W., and Chua. L. 0. (1996d). Fuzzy cellular neural networks: Applications. In Proc. 4//7 IEEE 1171. Workshop on Cellulnr Neurrrl Networks nrid Tl7eirA~~plic.c~rioirs (CNNA ‘96).pp. 225 - 230. Yang, T., Yang, L. B.. Wu, C. W., and Chua, L. 0. (1996e). F u z ~ ycellular neural networks: ‘Theory. In Proc. 4/11 /LEE In/. Work.shopori Ct~llulrrrNeural Network.s eincl Their App1icutiori.s (CNNA‘96).pp. I8 1 - 186. Yiing, T., Yang, L. B., and Yang, C. M. (1997i). Fuzzy cellular neural network. Technical Report Memorandum No. UCB/ERL M97/6 I , Electronics Research Laboratory, College of Engineering, University of California, Berkeley, 3 Sept. 1997. pp. 1 - 196. Yang, T.. Yang, L. B. and Yang. C. M. (19988). Fuzzy cellular neural networks and their appli, 1):78- 8.5. cations. Cl7ine.w Jour. ElechUiiic.\ (Erigli.sh k r ~ s i o r i )7( Yung, T., Yang, L. B., and Yang, X. P. (1996f). Application of cellular neural network to facial cxpression animation and high-level image processing. Interntrtioriol Joirr. Circuit Tl7eorv rrritf Applicc/ti/lrl.s,24(3):425- 450. Yrung, T. ( 1995a). Application of cellular neural network to map recognition. Jorrr. Torigji Uniwr,c.ir\..23( l):107- 112. (In Chinese). Yang, T. (1995b). Recovery of digital signals from chaotic switching. Iiitrmnrioiinl Jour. Circuit Theory cud App1icrrtioii.s. 23(6):61 1-615. Yang, T.. and Chua, L. 0. (l997b). lrnpulsive control and synchronization of nonlinear dynanio ~ i ~ Bifirccitiori rl ond ical syslems and application to secure communication. l ~ r r ~ r ~ ~ t r / iJaw. C’htro.~,7(3):64S-664. Yang, T., and Chua, L. 0. ( 1 9 9 7 ~ ) .Impulsive stabilimtion for control and synchronization of chaotic systems: Theory and application to secure conimunication. l E E E Trclrrs. Circuits trnrl .Syvsterr~.s-I: F ‘ u ~ i ~ / nTheorv ~ ~ ~ urid t ~ ~Applicatioii.s, ~ / ~ ~ ~ 44( 10):976-988. Yang, T., Suykens, J.A.K., and Chua, L. 0. 
( 1 9 9 8 ~ ) .Impulsive control of nonautonomous chaot~ .~Bifilrcrrtioii d rmcl Chrros. 8(7):1557ic systems using practical stabilization. l ~ ~ / r r ~ r u /,i/ o w 1564. Yang, T., Yang. C. -M., and Yang, L. -B. (I997g). Control of Riissler system to periodic motions using inipulsive control method. Physics Lt,t/er.s A, 232(5):356-361. Yang. T., Yang, C. M.. and Ymg, L.-B. (1998). Genetic optimization of fuzzy cellular neural Jour. Circuit Thron. networks -get knowledge from both learning and structures. lrr/errr~~/ioritd rirrrl App1iccirion.s. Subniitted. Yang. T., Yang, C. M.. and Yang, L.-B. (1998). Learning algorithm of fuzzy discrete-time cellular neural networks. IEEE Trcrii.suc/inrisoil Circuits crrrtl Sy,srerri.s-l: Furrc/rirnentnl the on^ errit/ Applictrtiorrs. Submitted. Yang, T., Yang, L.-B.. Yang, C.-M., Yang. X.-P., and Yang, H.-N. (199Xi). Linguistic flow in furzy discrete-time cellular neural networks and its stability. IEEE Trtr17.s. Circuirs ~ i i i r l .S\,.S~CI~I.S -I: ~ ‘ J / I I L ~ O I I I ~ I I ~The/)r\> (I/ C / I I ~ /AP/J/~CO/~OIIS, 45(9):869 - 878. Yanp, T., Yang, L. B., and Xing, C. M. (1998h). Theory of control of chaos using sampled data. physic.^ Letters A, 246(3 - 4):284 - 288. Yang, T., Yang, L.-B., and Yang, C.- M. ( 1997.1). Impulsive control of Lorenz system. Physicrr I), 100:18-24. Yang. T., Yang, L.- B.. and Yang. C.- M., (lW7k). Impulsive synchronization of Lorenz systems. Physics Lettrr.~A , 226(6):349- 354. Yanp, T., Yang, L.- B.. and Yang, G . (1994a). On unconditional stability of the general delayed cellular neural networks. I n Proc. Third IEEE Ii~terriutionnl Workshop OIZ Collrtltrr N c w w l Nomvrks trrid Their Applicrrtions (CNNA-94). pp. 9 - 14. Yang. X . P.. Yang, T., and Yang, L. B. (1994b). Extracting focused object from defocused hackground using cellular neural networks. In Pro(,. Third /EEE Iriterrin/ioricrl Workshop ori Crllultrr Nritrrrl Netawrks c i t d Their App1icdori.s (CNNA-94), pp. 45 I -455.
446
TAO YANG
Yin, L., Yang, R., and Neuvo. Y. (1996). Weighted median filters: A tutorial. IEEE Truns. Circuits Systenix - I / , 43( 3): 157 - 192. Yli-Harja, O., Astala, J . , and Neuvo, Y. (1991). Analysis of the properties of median and weighted median filters using threshold logic and stack filter representation. IEEE Truns. Signtrl Processing, 39: 395 -4 10. Zadeh. L. (1965). Fuzzy sets. lnfoi-ni. t r n d Coritrol, 8: 338-353. Zadeh. L. A,, Fu, K. S., Tanaka, K. and Shimura M. (Eds.). (1975). Fuzzy Sets trnd T/wir A/'p1icution.s to Cognitive und Deci.sion Processes. London: Academic. Zarandy, A,, Werblin, F., Roska, T., and Chua, L. 0 .(1996). Spatial logic algorithms using basic morphological analogic CNN operations. Intemntioriul Joitr. Circuit Theor? urrd A ~ i p l i ~ ~ u t i ~ ~ n s . 24(3):283-300. Zhang, X.. Hang, C. C., Tan, S., and Wang, P.Z. (1996). The min-max fiinction differentiation and training of fuzzy neural networks. IEEE Trciris. N e u r d Networks, 7(5): 1 139- I 150. Zhang, X. H., Hang. C. C., Tan, S.. and Wang, P.Z. (1994). The delta rule and learning for min-max neural networks. In Proc. IEEE-ICNN'Y4, 1: 38-43. Orlando, FL. Zhou, Yi-Tong ( 1992). Artijciul Neurul N e t w o k s ,for Compufer Vision. New York: SpringerVerlag. Zou. F.. Katerle, A,, and Nossek, J . A. (1993). Hoinoclinic and heteroclinic orbits of the threeTl,roi-\. rrntl cell cellular neural networks. IEEE Trcrri.~.Circuits rind Systems /: F~rnr/rrrnen~r~l App1iccition.s. 40( I I ): 843 -848. Zou, F., and Nossek, J. A. (1991). A chaotic attractor with cellular neural networks. lEEE Truris. Circuits crud Svstrms, 38(7):8 I I - 8 12. Zou F.. and Nossek, J . A. (1992). Double scroll and cellular neural networks. In I9Y2 IEEE Internrrtionul Symposium on Circuits rind Sys~e~ns. 1:320- 323. Zou, F., and Nossek, J. A. ( 1 9 9 3 ~ ) .An autonoinous chaotic cellular neural network and Chua's circuit. Jour. Circuits, Sv.sten7.s nrrd Cotnputer.~,3(2):59 I -601, Zou, F., and Nossek, J . A. (1993h). 
Bifurcation and chaos in cellular neural networks. lEEE Truns. Circuits trnd Systems I : Arndumenrul Tlieor?;trrid Applicntions. 40(3): 166- 173. Zou, F., and Nossek, J. A. (1993a). Hopf-like hifiircation in cellular neural networks. In (Proceediirp) I993 IEEE Infernnfioncrl Syn/iosiuni on Cirwits trnd Systems., 4: 239 I 2394. Zou F., Schwarz, S., and Nossek. J . A. (1990a). Cellular neural network dcsign using a learning algorithm. In 1990 IEEE Intertitrtioncrl Workshop on Ce/lulur Neural Networks r i n d Thrir A/J/J/iUitiOn.S, CNNA-90, pp. 73 - 8 I . Zou. F., Schwari, S., and Nossek, J . A. (l99Ub). Cellular neural network design using a learning s Tlieir algorithm. In IYYU lEEE Inrernufioncil Workshop on Cellultrr Neurcrl N e t ~ ~ o r kwid Applicntion.s, CNNA-90, pp. 73- 8 I . -
INDEX

A

Acceptor, 7
Analog-to-digital converters (ADCs), 41, 44, 47, 48-49
Approximation space, 164-65
Arrhenius plot, 22
Artificial neural networks (ANNs), 266
Averaging process, 30-31, 144
    continuous time, 52-55
    pseudo-logarithmic, 43-52

B

Barnes-Wall lattice, 228
Berry-Hannay phase, 164
Binary linear code, 238
Blocking effect, 200
Boxcar function, 24-25, 32, 40, 42, 55-56

C

Capacitance transient, 15-18
Capacitance transient deep-level transient spectroscopy (CT-DLTS), 68, 143
Capacitance-voltage (C-V) methods
    high-low frequency, 35-36
    quasi-static, 35-36
Capture rates, 12, 14
Cartan matrices, 224, 260-1
Cell, 269, 277-78
Cellular automata (CA), 266
Cellular neural networks (CNNs)
    See also Fuzzy cellular neural networks (FCNNs); MIN/MAX CNN (MMCNN)
    chaotic (CCNN), 272
    delay-type (DCNN), 271
    discrete-time (DTCNN), 273-75
    future work and conclusions, 421-23
    local connectedness, 266, 277
    multilayer (MCNN), 272
    nature of, 266-68
    nonlinear synaptic laws (NCNN), 271-72
    structures of conventional, 269-75
    unified structure, 276-79
    universal machine (CNNUM), 272-73
Chaotic CNN (CCNN), 272
Charge-coupled devices (CCDs)
    measurement of, 36-37
    MOSFETs and, 83-84
Charge pumping, 35
Charge transfer efficiency, 36-37
Charge transient, 19
Conductance-voltage (G-V) method, 36
Constant capacitance deep-level transient spectroscopy (CC-DLTS)
    conclusions, 72
    feedback circuit and setup for, 63-65
    illustrations, 67-72
    noise sources in, 143-44
Constant-capacitance voltage transient (CCVT), 29-30, 62
Constant resistance deep-level transient spectroscopy (CR-DLTS)
    conclusions, 72
    feedback circuit and setup for, 65-67
    illustrations, 67-72
    noise sources in, 143-44
    use of term, 74
Constant resistance deep-level transient spectroscopy (CR-DLTS), JFETs and
    applications for JFETs, 96-97
    comparison of CC-DLTS and CT-DLTS with, 107-18
    conclusions, 118
    germanium JFETs, results in, 105-7
    silicon, radiation-induced defects in, 155-56
    silicon JFETs, results in, 99-105
    theoretical background, 97-98
Constant resistance deep-level transient spectroscopy (CR-DLTS), MOSFETs and
    experimental results, 77-83
    theoretical background, 75-77
Constant resistance deep-level transient spectroscopy (CR-DLTS), MOSFETs (depletion mode) and
    benefits of, 84
    charge-coupled devices and reading, 83-84
    conclusions, 95-96
    experimental results and discussion, 87-95
    theoretical background, 84-87
Continuous time averaging, 52-55
Covering radius, 227-28
Current DLTS, 74
Current transient, 18-19
Current transient spectroscopy (CTS), 27-28

D

Daubechies wavelets, 170-5
    symmetric, 175-76
Debye tail, 13, 16
Deep centers
    defined, 3, 8
    effect of, on device performance, 10-11
    role of, 9-10
Deep holes, 227
Deep-level parameters, determining, 21-24
Deep levels, 9
Deep-level transient spectroscopy (DLTS)
    See also Constant capacitance deep-level transient spectroscopy (CC-DLTS); Constant resistance deep-level transient spectroscopy (CR-DLTS)
    areas for future research, 120-1
    capacitance transient, 15-18
    charge transient, 19
    classification scheme for, 34
    conclusions, 119-20
    conventional, 24-25
    current transient, 18-19
    defects, impurities, and energy levels, 7-11
    detection of emission of trapped charge, 15-21
    determination of deep-level parameters, 21-24
    field effect transistors and, 73-75
    generation-recombination statistics, 11-15
    magnitude errors, 135-40
    noise sources and signal-to-noise ratio, 141-44
    other methods compared with, 35-38
    recombination centers, 8-9
    role of, 3-4
    shallow impurities, 7-8
    template for analysis program, 152-55
    template for measurement program, 148-51
    time constant errors, 140-1
    traps, electron and hole, 8-9, 11-12
    traps, interface, 10, 20
    voltage transient, 19-21
Deep-level transient spectroscopy (DLTS), averaging and recording of digital
    analog signal processing methods, 40-41
    applications, 55-60
    conclusions, 60-61
    continuous time averaging, 52-55
    digital signal processing methods, 41-43
    pseudo-logarithmic averaging, 43-52
    technical overview, 39-43
Deep-level transient spectroscopy (DLTS), main stages in
    averaging, 30-31
    detection (emission pulse), 27-30
    digital methods, 33
    excitation (filling pulse), 25-27
    transient analysis, 31-34
Delay-type (DCNN), 271
Differential Pulse Code Modulation (DPCM), 249
Direct memory access (DMA), 44, 48, 50
Discrete-time CNN (DTCNN), 273-75
Discrete-time fuzzy CNN (DTFCNN)
    advanced learning algorithms of additive, 382-87
    applications of, 407-20
    embedding fuzzy IF-THEN-ELSE rules into, 408-11
    embedding local fuzzy relation equations, 413-20
    implementing fuzzy inference sharpener, 411-13
    implementing fuzzy spatial dilemmas using Type-II, 423-27
    learning algorithms of additive Type-II, 377-82
    learning algorithms of Type-IV, 392-401
    structure of, 407-8
    structure of Type-IV, 387-92
Discrete wavelet transform (DWT), 202-3
Distortion function, 209-10
    minimization of, 212-13
DLTS spectrum, 23, 40, 57-60
D_n lattice, 232-33
Donor, 7
Double-correlation or DDLTS, 26, 62
Dual lattices, 219, 231-32

E

Electron trap, 8-9, 11-12
Emission pulse, 27-30
Emission pulses memory, 63
Emission rates, 12, 14
Entropy coding of lattice vectors, 250-4
Equal slope algorithm, 204
Excitation (filling pulse), 25-27
Exponential correlator, 32, 40, 42

F

Face image processing, Type-I FCNN and, 356-60
Factorizations, 166
Feedback and feedback template, 270, 271
Feedback circuit, electrical circuits of, 146-47
Feedforward and feedforward template, 270, 271
Field effect transistors (FETs), DLTS techniques and, 73-75
Filling pulse, trap, 16, 22, 25-27
Filling pulses memory, 63
Filter, scaling function, 167-68
FIRE edge extractor, 366-67
FIRE operators, 364-65
Flat-band voltage, 86
Flat structuring element, 328
Fourier domain, 165-66
Fuzzy cellular neural networks (FCNNs)
    See also Discrete-time fuzzy CNN (DTFCNN)
    applications of discrete-time, 407-20
    applications to image processing, 366-75
    cell, 277-78
    classification of, 282
    decentralization, 277
    differences between FNN and, 287-90
    different structures of, 282-87
    dynamics, 277
    future work for, 420-7
    genetic algorithm applications to image processing, 404-7
    genetic algorithms for optimizing, 401-4
    local connectedness, 266, 277
    principles and definitions of general, 279-81
    unified CNN structure, 276-79
Fuzzy cellular neural networks, computational arrays and
    implementation of morphological operations, 329-33
    mathematical morphology, basic, 327-28
    MIN/MAX CNN, 333-56
Fuzzy cellular neural networks, learning algorithms and
    of additive Type-II discrete-time, 377-82
    advanced, of additive discrete-time, 382-87
    from linguistic inputs, 387-401
    of Type-IV discrete-time, 392-401
Fuzzy cellular neural networks, theory of
    delay-type, 313-24
    elementary theory, 290-9
    global stability, 299-310
    local stability, 310-3
    stability of discrete-time, 324-27
Fuzzy cellular neural networks, Type-I
    delay-type, 320-4
    description of, 283-84, 285-87
    dynamical range of, 295-99
    face image processing using, 356-60
    global stability, 307-10
    local stability, 313
Fuzzy cellular neural networks, Type-II
    additive, 330-3
    delay-type, 313-20
    description of, 284-87
    dynamical range of, 291-95
    embedding fuzzy inference into, 364-66
    fuzzy operations implemented by, 363
    fuzzy set theory and properties of images, 361-63
    global stability, 299-306
    its interpreter, 363-64
    local stability, 310-3
    MIN/MAX, 333-56
    multiplicative, 329-30
Fuzzy set theory, 276, 361-63
Fuzzy singleton, 362

G

Generalized Gaussian function (GGF), 210-2, 217, 249, 251
Generation-recombination (G-R) centers, 11-15
Genetic algorithm (GA)
    applications to image processing, 404-7
    for optimizing FCNN, 401-4
Glue vectors, 219
Gram matrix
    of lattices, 219
    of root, 220
Gray-scale closing, 328
Gray-scale dilation, 328
Gray-scale erosion, 328
Gray-scale opening, 328

H

Hall effect, 21, 34
Hamming distance, 238
Hamming weight, 238
Hole trap, 8-9, 11-12
Human visual perception, 204
I IF-THEN-ELSE rules, 364-66, 407, 408-13 Image processing FCNN applications to, 366-75 genetic algorithm applications to, 404-7 Impedance spectroscopy, 36 Impurities, shallow, 7-8 Impurity characterization, importance of, 2-3 Interface traps, 10, 20 Ionization energy, 7 Isophase space, 181
J Jacobi theta function, 236 Junction field-effect transistors. See Constant resistance deep-level transient spectroscopy (CR-DLTS), JFETs and
L Lagrange multiplier technique, 212 Laminated lattices, 226-29, 233 Laplace transform, inverse, 33-34 Laplacian PDF, 251 Lattice points, counting, 233-42 Lattices construction of root, 224-26 defined, 218-19 description of root, 219-24 Dn, 232-33 dual, 219, 231-32 glue vectors, 219 Gram matrix of, 219 laminated, 226-29, 233 quadratic form, 219 relationship between codes and, 238-42 Zn, 232 Lattice vector quantization, 213 See also Wavelet coefficients, quantization of Cartan matrices, 224, 260-1 codebook for, 214 conclusions, 2 3 - 6 0 counting lattice points, 233-42 distortion measure and quantization regions, 214-16 entropy coding of vectors, 250-4 experimental results, 254-58 optimal quantizer for wavelet coefficients, 217-18 quantization algorithms for selected lattices, 229-33 scaling algorithm, 242-44 selecting lattices for quantization, 244-50 Learning algorithms. See Fuzzy cellular neural networks, learning algorithms of Lightly doped drain (LDD) devices, 84 Linear-spline function, 164-66 Local connectedness, 266, 277 Lock-in amplifier, 32, 40, 42 Low-frequency noise measurements, 37
M Mallat algorithm with complex filters, 178-80 Mathematical morphology, basic, 327-28 Mean squared error (MSE), 216, 244-45 Memory circuit, 63 Metal-oxide-semiconductor FETs. See Constant resistance deep-level transient spectroscopy (CR-DLTS), MOSFETs and MIN/MAX CNN (MMCNN) applications, 3 4 7 4 6 for averaging operator, 335-37, 346 for horizontal derivative, 337-39, 346 for Laplacian operator, 335, 346-47 learning algorithms, 342-46 neighborhood patterns, 333 for orientation derivatives, 339-40 universal functions, 340-2 Multilayer CNN (MCNN), 272 Multiresolution analysis conclusions, 193-96 image enhancement, 183-87 Mallat algorithm with complex filters, 178-80 wavelet shrinkage technique, 183-87 Multiresolution wavelets, 167-70
N Noise removal, fuzzy inference and impulsive, 367-75 Noise sources and signal-to-noise ratio, 141-44 Nonlinear synaptic laws (NCNN), 271-72
O Optimal bit allocation, 208-13 Orthonormalization, 166, 168 Daubechies, 171
P Packing radius, 227 Phase, role of reconstruction, 180-3 in signal processing, 163-64 in spline wavelet bases, 164-66 Phase-sensitive detector (PSD), 66, 72, 74-75 Photoluminescence (PL), 37 Pinch-off voltage, 86, 97-98 Pixels, 267 p-n junction, 15-18, 72 Principle of detailed balance, 13 Probability distribution function (PDF), 209-12, 217, 247, 248, 249, 251, 254-55 Project onto convex spaces (POCS), 181-83 Pseudo-logarithmic averaging demonstration, 50-52 electrical circuit of, 145 error analysis, 45-48
implementation, 48-50 theory, 43-45
Q Quantization. See Lattice vector quantization; Wavelet coefficients, quantization of Quantization algorithms for selected lattices, 229-33 Quasi-equilibrium, 14
R Random telegraph signal (RTS), 94 Rate window concept, 22-23, 99 Recombination centers, 8-9 Reed-Muller code, 228, 233, 238 Refinement equation, 167 Ridge and skeleton algorithm, 164 Riesz basis, 164-65, 166 Root lattices Cartan matrices, 224, 260-1 construction of, 224-26 description of, 219-24 Gram matrix, 220 root systems, 220-3
S Scaling algorithm, 242-44 Scaling function Daubechies, 170 defined, 165, 166 multiresolution, 167 symmetric, 169 symmetric complex-valued, 174 Shannon lower bound, 215, 216 Shockley-Read-Hall (SRH) theory, 12-15, 39, 76, 86, 98 Signal-to-noise ratio (SNR), 141-44, 206, 207 Space-charge region (SCR), 13, 15-18, 20 Spectral analysis DLTS (SADLTS), 41 Spline wavelet bases, 164-66 Strang-Fix condition, 170-1 Symmetric Daubechies wavelets, 175-76
Symmetric Daubechies wavelets, (cont.) phase of, and scaling function, 176-78 Symmetric scaling function, 169 Symmetry, 166
T Thermally stimulated capacitance (TSCAP), 36, 39 Thermally stimulated current (TSC), 36, 39 Theta functions, 236-38 Time-delay, 271 Transient analysis, 31-34 Transient spectroscopy, use of term, 23 Traps filling pulse, 16, 22, 25-27 hole and electron, 8-9, 11-12 interface, 10, 20
U Uniform vector quantizer, 216
V Vanishing moments, 170 Vector quantization (VQ) See also Lattice vector quantization disadvantages of, 200 Voltage transient, 19-21
W Wavelet coefficients, quantization of, 201-2 distortion function, 209-10 distortion function, minimization of, 212-13 fundamentals of quantization process, 203-4 high-frequency coefficients, 210-1 information distribution across coefficient matrix, 204-8 low-frequency coefficients, 211-12 optimal bit allocation, 208-13 statistical model of, 212-19 Wavelets Daubechies, 170-5 image enhancement, 183-87 Mallat algorithm with complex filters, 178-80 multiresolution, 167-70 shrinkage technique, 183-87 spline, 164-66 symmetric Daubechies, 175-76 Weight distribution of C, 239 Weight enumerator, 239 Weight function, 32
Z Zn lattice, 232
ISBN 0-12-014751-3