FLSI Soft Computing Series, Volume 6
Brainware: Bio-Inspired Architecture and its Hardware Implementation
Editor: Tsutomu Miki
Fuzzy Logic Systems Institute (FLSI)
Fuzzy Logic Systems Institute (FLSI) Soft Computing Series
Series Editor: Takeshi Yamakawa (Fuzzy Logic Systems Institute, Japan)

Vol. 1: Advanced Signal Processing Technology by Soft Computing, edited by Charles Hsu (Trident Systems Inc., USA)
Vol. 2: Pattern Recognition in Soft Computing Paradigm, edited by Nikhil R. Pal (Indian Statistical Institute, Calcutta)
Vol. 3: What Should be Computed to Understand and Model Brain Function? From Robotics, Soft Computing, Biology and Neuroscience to Cognitive Philosophy, edited by Tadashi Kitamura (Kyushu Institute of Technology, Japan)
Vol. 4: Practical Applications of Soft Computing in Engineering, edited by Sung-Bae Cho (Yonsei University, Korea)
Vol. 5: A New Paradigm of Knowledge Engineering by Soft Computing, edited by Liya Ding (National University of Singapore)
FLSI Soft Computing Series, Volume 6
Brainware: Bio-Inspired Architecture and its Hardware Implementation
Editor
Tsutomu Miki
Kyushu Institute of Technology, Japan
World Scientific
Singapore • New Jersey • London • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
BRAINWARE: BIO-INSPIRED ARCHITECTURE AND ITS HARDWARE IMPLEMENTATION
FLSI Soft Computing Series, Volume 6
Copyright © 2001 by World Scientific Publishing Co. Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-02-4547-5
Printed in Singapore by Fulsland Offset Printing
Series Editor's Preface
The IIZUKA conference originated from the Workshop on Fuzzy Systems Application, held in 1988 in a small city located in the center of Fukuoka prefecture on Kyushu, the southernmost main island of Japan. The city was very famous for coal mining until forty years ago, and Iizuka has since been renewed as a science research park. The first IIZUKA conference was held in 1990, and the conference has been held every two years since then. This series of conferences has played an important role in modern artificial intelligence. The 1988 workshop proposed the fusion of fuzzy concepts and neuroscience, and this proposal encouraged research on neuro-fuzzy systems and fuzzy neural systems, which has produced significant results. The 1990 conference was dedicated to the special topic of chaos, and nonlinear dynamical systems attracted the interest of researchers in the field of fuzzy systems. By the 1992 conference, the fusion of fuzzy, neural and chaotic systems was familiar to the participants. This new paradigm of information processing, which also includes genetic algorithms and fractals, has spread around the world as "Soft Computing". The Fuzzy Logic Systems Institute (FLSI) was established in 1989, under the supervision of the Ministry of Education, Science and Sports (MOMBUSHOU) and the Ministry of International Trade and Industry (MITI), for the purpose of proposing brand-new technologies, collaborating with companies and universities, educating university students in soft computing, and so on. FLSI is the major organization promoting the so-called IIZUKA Conference, so this series of books edited from the IIZUKA Conference is named the FLSI Soft Computing Series. The Soft Computing Series covers a variety of topics in Soft Computing and will propose the emergence of post-digital intelligent systems.
Takeshi Yamakawa, Ph.D. Chairman, IIZUKA 2000 Chairman, Fuzzy Logic Systems Institute
Volume Editor's Preface

The brain is an ultimate intelligent processor: it processes a huge amount of information acquired from sensory systems in a flash and comes up with the most appropriate answer immediately. Furthermore, a human brain can handle ambiguous and uncertain information adequately. The implementation of such human-brain architecture and function is called "Brainware". Brainware is a candidate for a new tool that realizes a human-friendly computer society, not a computer-friendly human society. To make this tool practical, silicon implementation of Brainware is indispensable. The enormous hardware capacity of silicon VLSI technology offers the potential to implement very complex systems on a chip. However, even a microprocessor based on today's VLSI technology is insufficient for silicon implementation of Brainware, because its architecture is inherently different from that of bio-systems. To realize Brainware on a chip, a new hardware paradigm is needed. One of the new streams is so-called "bio-inspired" hardware, to which some principles and mechanisms of bio-systems have been applied. In this book, hardware implementing bio-inspired systems is discussed in terms of devices, architecture and systems.

This book consists of eight enriched versions of papers selected from IIZUKA'98. First, several silicon realizations of nerve function are introduced. Chap. 1 describes a new functional device with multiple inputs, the neuron MOS (vMOS). The vMOS works like the McCulloch and Pitts model and turns on when the weighted sum of the input signals exceeds the threshold voltage of the device. In this chapter, an association processor architecture based on a psychological brain model is proposed and implemented in a vMOS hardware-computing scheme. Chap. 2 presents the realization of nerve function using new device technology. The neuron circuit is composed of a metal-ferroelectric-semiconductor FET (MFSFET) and a complementary unijunction circuit. It realizes an adaptive learning function by modulating the output frequency. Chap. 3 introduces a PWM approach as an effective signal transformation in neural systems. Signal transformation, simple weight circuits and nonlinear function circuits are discussed from the viewpoint of designing large networks.

Since visual information processing has to deal with enormous amounts of data, it is difficult to perform in real time using ordinary computing methods. The following three chapters therefore discuss neuromorphic vision systems. Chap. 4 introduces methodologies for application-specific design of vision circuits based on bio-inspired architecture. Complex visual processing, such as image processing, eye-tracking and visual inspection, is described. Chap. 5 presents simple MOS circuits with a correlation model based on insect motion detectors, aimed at realizing motion sensing. This model offers real-time computation of the optical flow. Chap. 6 focuses on morphological picture processing, which takes a long time to compute with ordinary methods. A cellular automaton using vMOS is proposed for real-time processing. In this chapter, noise reduction, edge detection, thinning and shrinking are explained in detail.

From a different viewpoint, the human brain can be regarded as a huge-scale dynamical complex system. One of the keys to explaining its function is to investigate the chaotic behavior found in bio-systems in order to implement it in hardware. Thus, Chap. 7 describes how to create such chaotic dynamics using simple circuitry. According to recent neuroscience research, a neuron is expected to process information efficiently by using the complex physiological properties of its dendrites. In the last chapter, the computational function of neuronal dendrites is explored based on a mathematical model. This model describes the essential responsiveness of dendrites and offers an effective design of hierarchical large-scale neural networks.

I hope that this book will arouse some interest and be of help to researchers working in the field of Brainware. I would like to express sincere thanks to Prof. Tadashi Shibata for his contribution in organizing the "Bio-inspired Hardware" sessions at IIZUKA'98.
Tsutomu Miki Volume Editor Iizuka, Japan March, 2000
Contents

Series Editor's Preface
Volume Editor's Preface
Chapter 1  Neuron MOS Transistor: The Concept and Its Application
    Tadashi Shibata
Chapter 2  Adaptive Learning Neuron Integrated Circuits Using Ferroelectric-Gate FETs
    Sung-Min Yoon, Eisuke Tokumitsu, Hiroshi Ishiwara
Chapter 3  An Analog-digital Merged Circuit Architecture Using PWM Techniques for Bio-Inspired Nonlinear Dynamical Systems
    Takashi Morie, Makoto Nagata, Atsushi Iwata
Chapter 4  Application-Driven Design of Bio-Inspired Low-Power Vision Circuits & Systems
    Andreas Konig, Jan Skribanowitz, Michael Eberhardt, Jens Doge, Thomas Knobloch
Chapter 5  Motion Detection with Bio-Inspired Analog MOS Circuits
    Hiroo Yonezu, Tetsuya Asai, Masahiro Otani, Naoki Ohshima
Chapter 6  vMOS Cellular-Automaton Circuit for Picture Processing
    Masayuki Ikebe, Yoshihito Amemiya
Chapter 7  Semiconductor Chaos-Generating Elements of Simple Structure and Their Integration
    Koichiro Hoh, Tatsuo Tsujita, Takahiro Irita, Yuichiro Aihara, Jun-ya Irisawa, Akira Imamura, Minoru Fujishima
Chapter 8  Computation in Single Neuron with Dendritic Trees
    Norihiro Katayama, Mitsuyuki Nakao, Mitsuaki Yamamoto
Appendix A
About the Authors
Keyword Index
Chapter 1
Neuron MOS Transistor: The Concept and Its Application
Tadashi Shibata
The University of Tokyo
Abstract

A multiple-input transistor has been developed by a simple modification of the regular MOSFET structure. The transistor turns on when the weighted sum of the input signals exceeds the threshold voltage of the device. Because of its functional similarity to the McCulloch and Pitts model of a neuron [1], it is named the neuron MOS transistor, or vMOS (neuMOS) for short. Such functionality enhancement in the elementary device has had a great impact on the way circuits and systems are constructed. In applications to binary logic circuits, the number of transistors and interconnects has been remarkably reduced. The concept of soft hardware, i.e., real-time reconfigurable logic gates, has also been developed using vMOS. Several hardware-computing schemes have been developed in which algorithms are carried out directly in circuits using vMOS transistors as key elements. This allows real-time response of a system in real-world data processing. To implement intelligent systems on silicon, the association processor architecture has been developed based on a psychological brain model and implemented in the vMOS hardware-computing scheme. Applications of the association processor architecture to recognition problems as well as to practical problems are presented.

Keywords: neuron MOS transistor, soft hardware logic, hardware algorithm, analog/digital merged processing, psychological brain model, association processor, winner take all, analog EEPROM, vector quantization
1.1 Introduction

Over the past several decades, the progress of semiconductor technology, as represented by the number of components on a chip, has been exponential, following Moore's Law [2]. It has had a dramatic impact on digital computer technology, and we can now enjoy the supercomputer performance of the eighties on our laptop PCs. Present-day computers are dedicated machines for ultra-fast numerical calculations. Although their computing power is enormous, they are not very good at tasks such as seeing, recognizing, and taking immediate action, which are effortless for humans and for biological systems in general. The question arises whether this performance gap is going to be narrowed just by increasing the clock frequencies of MPUs and the integration densities of memories, and by further sophistication of software. The scaling of device dimensions and the resulting enhancement in integration density and speed have been the sole success scenario of silicon technology. However, this scenario is encountering the fundamental limitations of material properties and device physics [3], as well as severe economic issues. New paradigms in computing are now in critical demand.

This article proposes an approach to the problem based on functionality enhancement in an elementary device. This allows us to conduct some elemental information processing at the very hardware level, thus lessening the burden on software in a total system. In addition, the hardware cost is greatly reduced because elemental computations are carried out in very simple electronic circuits. The concept of such a hardware computation scheme has been extended to a hardware-intensive recognition system based on a psychological brain model. The association processor architecture, a hardware maximum-likelihood search engine, has been developed for such recognition systems.

In §1.2, a naive comparison between electronic systems and biological systems is made. The concept of vMOS is introduced in §1.3 and its application to binary logic circuits is presented in §1.4. As an example of hardware computation using vMOS, a center-of-mass tracker circuit is described in §1.5. In §1.6, a psychological brain model is presented and the association processor architecture is described as its prototype hardware implementation.
Applications of the architecture to some practical problems are presented in §1.7 and the concluding remarks are given in §1.8.

1.2 Bio Processing vs. Electronic Processing

A comparison between biological computing systems and electronic systems is given in Fig. 1.1. A frog finds a fly passing by and catches it in a moment. This action is produced by a series of information processing steps carried out within this small creature. The processing would include the capture of the fly's image on the retina, identification of the object as food, and computation of its expected motion, followed by the activation of motor neurons to catch the fly. Such real-time action, however, is impossible even with the most advanced supercomputers. The switching speed of a short-channel transistor, the very basic element of a computer, is about 10^9 times faster than that of its biological counterpart. Signals on metal interconnects travel at the speed of light, while nerve impulses propagate at only 2 to 3 m/sec in the brain. Why, with such an overwhelming speed advantage, is real-time response not possible in electronic systems? One of the major reasons, we believe, is the difference in the functionality of an elementary
[Fig. 1.1 labels: See and Catch!!; Real-Time Response]
[Fig. 1.5 labels: Pre Inverter; Main Inverter; (X1, X2) = (0,0), (0,1), (1,0), (1,1); VDD; VP (Principal Variable)]

Fig. 1.5 vMOS logic gate representing Exclusive OR. VP (principal variable) takes the multilevel values shown in the insert according to the binary inputs X1 and X2. The ratios of the coupling capacitors are indicated by fractional numbers as 1/2, 3/8, and 1/8. On the right is a floating gate potential diagram representing φF (floating gate potential) as a function of VP. Here the effect of the floating gate-to-ground capacitance C0 is neglected for simplicity. To include the effect of C0, the ordinate needs to be multiplied by γ, as in γVDD and (1/2)γVDD, where γ = (C1 + C2 + C3) / (C0 + C1 + C2 + C3), with C1, C2, C3 being the three input gate capacitors.
figure. If we apply a large positive voltage to the control terminal, for instance, it is easy to turn on the transistor by the gate terminal, because the floating gate potential is already boosted by the control terminal, making the device behave as a depletion-mode transistor. If a negative bias is applied, it behaves as a high-threshold-voltage enhancement-mode transistor. The variable-threshold nature of vMOS plays an essential role in introducing flexibility to the function of electronic circuits.

A typical form of a binary logic gate implemented in vMOS technology is illustrated in Fig. 1.5. VP (called the principal variable) is an input variable that takes multiple-level values according to the binary inputs X1 and X2, as indicated in the figure. It should be noted that the conversion of X1 and X2 to VP does not need any special circuitry, because it is easily done by directly transferring X1 and X2 to the vMOS floating gate via coupling capacitors having a ratio of 2:1, as is the case in Fig. 1.6 (for design details, see Refs. [5, 6]). However, the explanation below is given using VP for simplicity of discussion.

[Fig. 1.6 label: vMOS Inverters]
Fig. 1.6 Soft Hardware logic circuit for two binary inputs X1 and X2, composed of three pre-inverters and a main inverter. Any of the 16 possible Boolean functions can be specified by three control signals VA, VB, VC [5, 6].

The circuit in Fig. 1.5 is an Exclusive OR gate, and the diagram on the right represents the floating gate potential φF as a function of VP increasing from 0 V to VDD. Assume for the moment that the input terminal of "3/8" is grounded and only VP on the "1/2" terminal is increasing. The contribution of VP to φF via the "1/2" input terminal is represented by the shaded triangle in the diagram on the right. Since the coupling capacitance of the gate is 1/2 of the total capacitance, φF never exceeds VDD/2, the CMOS inverter threshold. Namely, the direct VP input alone cannot upset the inverter, and the contribution from the indirect input via the pre-inverter is essential for logic operation. If the pre-inverter has an inversion threshold of (3/4)VDD, the overall variation of φF would be the one shown in the diagram. (The contribution from the pre-inverter output, indicated by the parallelepiped, is superimposed on the triangular contribution from the direct VP input.)
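The floating-gate diagram discussed above can be reproduced numerically. The following is only a minimal sketch, not the circuit of Refs. [5, 6]: it assumes an ideal step-like pre-inverter, neglects C0, and treats the voltage on the "1/8" terminal as a hypothetical parameter `v_bias` (0 V by default), since its connection is not stated in the text. All function names are illustrative.

```python
VDD = 1.0  # normalized supply voltage


def pre_inverter(vp, threshold=0.75 * VDD):
    """Idealized pre-inverter: VDD below its inversion threshold, 0 V at or above it."""
    return VDD if vp < threshold else 0.0


def direct_contribution(vp):
    """Contribution of VP alone through the 1/2 coupling capacitor."""
    return 0.5 * vp


def floating_gate_potential(vp, v_bias=0.0):
    """Capacitively weighted sum seen by the floating gate (C0 neglected).

    The weights are the 1/2, 3/8 and 1/8 coupling ratios of Fig. 1.5.
    v_bias (voltage on the 1/8 terminal) is an assumption of this sketch.
    """
    return direct_contribution(vp) + 0.375 * pre_inverter(vp) + 0.125 * v_bias


def main_inverter_upset(vp, v_bias=0.0):
    """True when phi_F is above the VDD/2 threshold of the main inverter."""
    return floating_gate_potential(vp, v_bias) > VDD / 2


if __name__ == "__main__":
    for step in range(11):
        vp = step / 10 * VDD
        print(f"VP={vp:.1f}  phi_F={floating_gate_potential(vp):.3f}  "
              f"upset={main_inverter_upset(vp)}")
```

Under these assumptions φF lies above VDD/2 only while VP is between VDD/4 and (3/4)VDD, which is the window-shaped response that the pre-inverter adds to the direct VP contribution.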
[Fig. 1.12 labels: Sensor with vMOS Processor; Association Processor; Winner-Take-All; Maximum-Likelihood Event]

Fig. 1.12 Hardware recognition system based on the psychological brain model in Fig. 1.11.
template vectors. The association is conducted by calculating the distances between the input code vector and the stored template vectors and searching for the minimum distance vector by a winner-take-all (WTA) circuitry [23]. In building such systems, the analog/digital merged computation scheme using vMOS circuits is employed as a guiding principle.
1.6.2 Non-Volatile Vast Memory Technology
In conducting the analog/digital-merged computation, storage of analog or multivalued data is essential. In our recognition system, the mass storage of knowledge in the form of analog template vectors is particularly important. For this reason, a high-precision analog EEPROM technology [24] has been developed. The chip does not require time-consuming write/verify cycles [25], thus being compatible with real time knowledge capture. The memory cell structure is shown in Fig. 1.13(a). It is a regular floatinggate EEPROM cell, but the tunneling electrode was made floating and the programming voltage is applied through a capacitor, while the main control electrode coupled to the floating gate being grounded. By making the control-
Programming Voltage
VD0
Input
(b) Fig. 1.13 Analog EEPROM cell with real-time writing control.
14 T. Shibata
.,10 h V 5
v~
IS2
\i •1
§5 O
0 500 Time [usee]
Fig. 1.14 Measured operation of the analog EEPROM cell (in Fig. 1.13) during data writing. electrode capacitance large enough as compared to the tunnel-oxide capacitance, die floating-gate potential becomes only dependent upon the net charge on die floating gate. The memory content (the floating gate potential) during data writing is real-time monitored through the source follower action of the memory transistor. When the memory cell content reaches the target value, the vMOS comparator turns the control transistor on and terminates data writing. The comparator is composed of multiple inverters (Fig. 1.13(b)). The target value is memorized in the vMOS inverter during auto zeroing. Positive feedback to one of thevMOS input terminal stabilizes the operation. Measured waveforms of the memory cell during writing is demonstrated in Fig. 1.14. As a high programming voltage is applied to the tunneling electrode, the memory value increases due to electron extraction from the floating gate. When the memory content arrives at the target value, the comparator turns on and terminates data writing. Improvements were made in the cell structure shown in Fig. 1.13 (a) to enhance the accuracy of data read/write. The source-follower readout was replaced by an op-amp voltage-follower circuitry in which the memory transistor is incorporated as one of the pair transistors in the differential pair. A tight cell layout resulted in a higher cell density. Furthermore merging analog EEPROM transistors into the matching cell circuitry (absolute value circuit) is
under study.

1.6.3 vMOS Association Processor

The architecture of the vMOS association processor is shown in Fig. 1.15, where X is an input vector and A to Z are template vectors downloaded from the vast memory. At each matching cell, the absolute value of the difference |Xi - Zi| is calculated, transferred to the floating gate of a vMOS source follower, and accumulated. Therefore the output of the vMOS source follower yields the Manhattan distance, the dissimilarity measure between the input vector and the template vector. The WTA is composed of vMOS inverters having two equally weighted inputs. At time t = 0, all vMOS inverters are in the on state. This is because VDD is fed to one of the inputs and a non-zero distance value to the other, thus biasing the inverter above the threshold of VDD/2. When the common voltage is ramped down, the vMOS inverter receiving the smallest distance value turns off first. At this moment, the feedback loop in each inverter is closed and the state of the inverter is frozen. The location of the
smallest distance vector is identified by a flag appearing at the off-state inverter. Substantial computation is conducted by analog processing, which is immediately followed by a binary decision. This analog/digital-merged decision-making operation is an essential feature of the vMOS circuitry.

[Fig. 1.15 labels: Matching Cell |Xi - Zi|; vMOS Source Follower; Σ|Xi - Zi|; WTA]
Fig. 1.15 Architecture of vMOS association processor. Manhattan distances are calculated by the matching cell array and the minimum distance vector is searched for by a winner-take-all circuit.

The operation of the absolute value circuit [26] is explained in Fig. 1.16. The circuit is composed of two floating-gate NMOS transistors connected at their source terminals. While V1 and V2 are fed to the input terminals, the floating gates are first grounded (step 1) and then disconnected from the ground to make them electrically floating (step 2). Then the input voltages are exchanged as shown in step 3, resulting in floating gate voltages of V2 - V1 on the left and V1 - V2 on the right. Here the floating-gate-to-ground capacitance is assumed to be negligibly small compared to the input-gate-to-floating-gate capacitance, for simplicity of explanation. When the source-follower operation is activated as shown in step 4, the output follows the larger of V2 - V1 and V1 - V2, namely |V1 - V2|. (Here it is also assumed that the NMOS threshold is approximately zero.)

[Fig. 1.16 labels: Reset Operation; Exchange of Input Voltage; Source Follower Activation; |V1 - V2|]
Fig. 1.16 Operation principle of absolute value circuit.

Fig. 1.17 shows a photomicrograph of a test circuit and the measurement results. An input vector of three components was compared with eight template vectors A to H. Although there is no template pattern exactly matching the input, the circuit automatically recalls pattern C as the most similar to the input.

[Fig. 1.17 labels: Input Vector; Maximum Likelihood Pattern; Time [nsec]]
Fig. 1.17 (a) Photomicrograph of a test circuit of vMOS association processor; (b) measurement results of the test circuit.
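The association flow described in this subsection (absolute-value matching cells, accumulation into a Manhattan distance, and a ramp-down winner-take-all) can be imitated in software. The sketch below is a behavioral model only, under the same idealizations as the text (zero NMOS threshold, negligible stray capacitance). The distance normalization, the discrete ramp, and the three-component input with eight templates are hypothetical illustration choices, not the measured data of Fig. 1.17.

```python
def absolute_value_cell(v1, v2):
    """Idealized absolute-value circuit.

    After the reset and input-exchange steps the two floating gates sit at
    v2 - v1 and v1 - v2; the shared source follower outputs the larger one.
    """
    left_gate = v2 - v1
    right_gate = v1 - v2
    return max(left_gate, right_gate)


def manhattan_distance(x, template):
    """Accumulate the matching-cell outputs over all vector components."""
    return sum(absolute_value_cell(xi, zi) for xi, zi in zip(x, template))


def ramp_down_wta(distances, vdd=1.0, steps=1000):
    """Ramp-down winner-take-all with two equally weighted inputs per inverter.

    Each inverter stays on while the average of the common ramp voltage and
    its own (normalized) distance exceeds VDD/2. The first inverter to turn
    off holds the smallest distance; the states are then frozen and returned
    as flags (True marks an off-state inverter).
    """
    largest = max(distances)
    scale = largest if largest > 0 else 1.0  # normalization is an assumption
    inputs = [vdd * d / scale for d in distances]
    for step in range(steps + 1):
        ramp = vdd * (1.0 - step / steps)
        on = [(ramp + d) / 2.0 > vdd / 2.0 for d in inputs]
        if not all(on):
            return [not state for state in on]
    return [False] * len(distances)


# Hypothetical three-component input and eight templates (arbitrary units).
INPUT_VECTOR = [4, 5, 6]
TEMPLATES = {
    "A": [0, 9, 2], "B": [8, 1, 7], "C": [5, 4, 6], "D": [9, 9, 9],
    "E": [1, 1, 1], "F": [7, 7, 3], "G": [2, 8, 8], "H": [6, 2, 9],
}

if __name__ == "__main__":
    names = list(TEMPLATES)
    distances = [manhattan_distance(INPUT_VECTOR, TEMPLATES[n]) for n in names]
    flags = ramp_down_wta(distances)
    print(dict(zip(names, distances)))
    print("winner:", [n for n, flag in zip(names, flags) if flag])
```

With a finite ramp resolution, two templates whose distances differ by less than one ramp step would both be flagged; the sketch makes no attempt to model how the real circuit resolves such near-ties.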
[Fig. 1.18 labels: Analog EEPROM; Input Vector; (a) 13 Trs; (b) 5 Trs]
Fig. 1.18 Two types of absolute value circuit for the matching cell: (a) data are downloaded from EEPROM; (b) data are embedded in the vMOS transistors in the cell.
The matching cell (the absolute value circuit of Fig. 1.16) consumes a large chip area due to the crossbar switches that exchange V1 and V2 and the interconnects for downloading template data from the analog EEPROM. This is illustrated in Fig. 1.18(a). In Fig. 1.18(b), a new ROM-version cell is presented [27] in which the template data are merged into the matching cell using the concept of vMOS multivalued ROM technology [28]. The analog memory value is represented by the ratio of the two input terminal capacitances of a vMOS, and the memory content is recalled by applying 0 V and VDD to the respective gates. If the template data are established by off-line computation, this new cell yields a higher integration density of matching cells, further enhancing the association capability of the chip.

1.7 Applications of Association Processor Architecture

1.7.1 Vector Quantization Processor for Motion Picture Compression

As a straightforward application of the association processor architecture, vector quantization (VQ) chips have been developed for motion picture compression, and about three orders of magnitude faster performance has been demonstrated compared to typical CISC processors. The VQ chips were implemented in conventional CMOS digital circuitry employing a fully parallel SIMD architecture [29, 30] as well as in vMOS circuitry [31], with the vMOS implementation achieving eight times higher integration density. This is briefly described in the following.

The vector quantization (VQ) [32] algorithm employed in the system is explained in Fig. 1.19. A fragment taken from the original picture (4×4 pixels, for instance) is an abstract pattern of gray patches, which can be approximated by one of the template patterns stored in the code book. Thus the pixel data are compressed to the code number of the template. Although the algorithm is straightforward, the template matching is an extremely expensive computation.
[Fig. 1.19 label: Reconstructed]
Fig. 1.19 Vector quantization (VQ) algorithm for image compression.
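The VQ algorithm of Fig. 1.19 can be written out in a few lines of software, which also makes the cost of the exhaustive template matching visible. This is a minimal sketch under stated assumptions: the Manhattan distance is used as the dissimilarity measure (as in the vMOS association processor), the image dimensions are multiples of the block size, and the three-entry code book and the small test image are hypothetical.

```python
def manhattan(a, b):
    """Sum of absolute differences between two flattened blocks."""
    return sum(abs(p - q) for p, q in zip(a, b))


def split_blocks(image, size=4):
    """Cut a 2-D list of gray levels into flattened size x size blocks (row-major)."""
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            blocks.append([image[top + r][left + c]
                           for r in range(size) for c in range(size)])
    return blocks


def encode_block(block, codebook):
    """Return the code number of the template closest to the block."""
    return min(range(len(codebook)), key=lambda k: manhattan(block, codebook[k]))


def compress(image, codebook, size=4):
    """Replace every block by the code number of its best-matching template."""
    return [encode_block(block, codebook) for block in split_blocks(image, size)]


def reconstruct(codes, codebook, width, size=4):
    """Rebuild an approximate image from the code numbers."""
    per_row = width // size
    image = []
    for first in range(0, len(codes), per_row):
        row_codes = codes[first:first + per_row]
        for r in range(size):
            line = []
            for code in row_codes:
                line.extend(codebook[code][r * size:(r + 1) * size])
            image.append(line)
    return image


# Hypothetical code book: flat dark, flat bright, and a vertical edge.
CODEBOOK = [[0] * 16, [255] * 16, [0, 0, 255, 255] * 4]
# Hypothetical 4 x 8 image: a dark block next to a noisy vertical edge.
IMAGE = [[10, 5, 0, 12, 3, 8, 250, 240] for _ in range(4)]

if __name__ == "__main__":
    codes = compress(IMAGE, CODEBOOK)
    print(codes)
    for line in reconstruct(codes, CODEBOOK, width=8):
        print(line)
```

Every block is compared with every template, so the work grows as (number of blocks) × (code book size) × (elements per block).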
However, this is the task that the association processor can carry out most efficiently.

1.7.1(a) Digital VQ Processor

To prove that the VQ algorithm is effective for motion picture compression, we first implemented a VQ processor in pure digital CMOS technology. The most important concern of the system is the real-time encoding of motion pictures. To encode a 640×480 full-color picture in a 4:1:1 format within 33 msec, a single VQ operation must be completed within 1.1 µsec. Our strategy toward this end is as follows. First, a fully parallel SIMD architecture is employed. Second, a single VQ operation is conducted in two pipeline stages, each pipeline segment consisting of 19 cycles. As a result, a single VQ operation is finished every 1.1 µsec at a clock frequency of 17 MHz. Third, the chip is extendible to an 8-chip master-slave configuration, enabling a fully parallel search over a maximum of 2048 template vectors in 1.1 µsec.

[Fig. 1.20 labels: INPUT VECTOR (16 elements, 8 bit/element); SLAVE; GLOBAL WINNER CODE]
Fig. 1.20 Organization of digital VQ system composed of eight VQ processor chips. The shortest distance vector (winner) is searched for in three steps of competition.

Fig. 1.20 shows the block diagram of the VQ chip module, which is composed of eight VQ chips, namely one master chip and seven slave chips. Each VQ chip stores 256 template vectors in the embedded SRAM. The input vector is given to all the chips at the same time and stored in the input pipo buffers. The template vector having the minimum distance to the input vector is searched for in three stages of competition using digital winner-take-all (WTA) circuits. The first stage is performed in each 64-vector matching block, where the distances between the input vector and 64 template vectors are calculated and the winner (the shortest-distance vector) is selected in each block. The second stage is conducted on each chip, and the chip winner is selected by the 2nd
WTA. The distance of each chip winner is sent to the master chip, where the final competition is carried out to find the global winner.

[Fig. 1.21 label: 7.98 mm]
Fig. 1.21 Photomicrograph of digital VQ processor chip fabricated in 0.6-µm single-polysilicon triple-metal CMOS technology.

Fig. 1.21 shows a photomicrograph of the chip fabricated in a 0.6-µm single-poly triple-metal CMOS technology. The search time for 2K template vectors in the eight-chip master-slave configuration is 1.1 µsec at a clock frequency of 17 MHz, and the power dissipation of a chip is 0.29 W under a 3.3 V power supply. A single VQ operation for 2K template vectors on typical CISC processors requires roughly 1.2M operations. This number was derived from the estimate: (38 operations/element) × (16 elements/vector) × (2048 vectors/VQ) = 1.2M operations/VQ. The present VQ system in the eight-chip configuration can do this job in 1.1 µsec, which is equivalent to a CISC processor performance of about 1000 GOPS (1.2M operations / 1.1 µsec).

1.7.1(b) vMOS VQ Processor

An analog vector quantization processor has also been developed using neuron-MOS (vMOS) technology [31]. To achieve a high integration density, the template-merged matching cell shown in Fig. 1.18(b) is employed in
the absolute value circuitry. A new-architecture vMOS winner-take-all (WTA) circuit has been developed to resolve the trade-off between search speed and discrimination accuracy. The WTA architecture is illustrated in Fig. 1.22. All 256 comparator outputs are fed to an OR gate, and its output is fed back to the reference voltage terminal of each comparator, thus forming a multiple-loop ring oscillator. The loop gain is controlled by the variable resistance inserted in the loop. At the start of WTA activation, all the vMOS comparators turn on and the OR output starts a 1-to-0 transition. This transition is fed back to all comparators and provides them with a descending reference voltage. If one of the comparators upsets, the OR gate also upsets and starts a 0-to-1 transition. Detecting this transition, the controller increases the value of the variable resistance. In this manner the feedback gain is reduced step by step, and the winner search accuracy is gradually increased from the coarse search with a low scan rate to the fine search with a high scan rate. The fixed-value resistances were made of MOS transistors, and the value was altered by changing the resistor connections. In this manner, the new WTA performs a multi-resolution winner search under automatic control. The circuit was designed to achieve a discrimination accuracy of 5 mV after five scan steps.

[Fig. 1.22 labels: Reset; Matching Degree; Resistance Control; Floating Gate; 256-Vector Matching Block; Neuron-MOS Comparator; High Gain Amplifier; Controller; Latch; Winner Observer; Winner Code = "00000001"]
Fig. 1.22 Self-convergent vMOS WTA circuit employed in vMOS VQ processor.

A photomicrograph of the analog VQ processor chip is shown in Fig. 1.23. The chip was built in a 1.5-µm double-polysilicon CMOS technology and has a chip size of 7.2 mm × 7.2 mm. A single chip contains 256 16-element template vectors. This is equivalent to one eighth of the chip size of the digital CMOS implementation (built in a 0.6-µm CMOS technology) if it is assumed that the chip size scales with the minimum feature size of the technology.

Fig. 1.23 vMOS VQ processor chip fabricated in 1.5-µm double-poly CMOS technology.
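The multi-resolution winner search of the self-convergent WTA can be pictured with a simple behavioral model. The sketch below is an abstraction and not a circuit simulation: it assumes that a comparator upsets when its input level exceeds the common reference, that the reference reverses direction each time the OR output flips, and that the step of the reference shrinks by a fixed factor at each reversal. The starting voltage, the step sizes (chosen so that the fifth scan uses 5 mV steps) and the input levels are hypothetical.

```python
def self_convergent_wta(levels, v_start=1.0, first_step=0.08, shrink=0.5, scans=5):
    """Coarse-to-fine search for the largest input level.

    The reference first descends until at least one comparator upsets, then
    ascends with a smaller step until none is upset, and so on. With an odd
    number of scans the last scan is a descending one, so at least one
    comparator is upset when the search stops.
    Returns (flags, final reference voltage).
    """
    v_ref = v_start
    step = first_step
    direction = -1  # the reference descends first
    for _ in range(scans):
        while True:
            v_ref += direction * step
            any_upset = any(level > v_ref for level in levels)
            # A descending scan ends when some comparator upsets;
            # an ascending scan ends when no comparator is upset.
            if any_upset == (direction < 0):
                break
        direction = -direction
        step *= shrink
    flags = [level > v_ref for level in levels]
    return flags, v_ref


if __name__ == "__main__":
    # Hypothetical comparator inputs in volts; the two best differ by 8 mV.
    levels = [0.500, 0.623, 0.615, 0.300]
    flags, v_ref = self_convergent_wta(levels)
    print(flags, round(v_ref, 3))
```

In this model, inputs that lie within the final step size of the largest one may be flagged together with it, so the final step plays the role of the discrimination accuracy.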
Neuron MOS Transistor: The Concept and Its Application
1.7.2. Fully Parallel Motion-Vector Detection Circuitry [33, 22]
The basic architecture of the vector matching circuitry in the νMOS association processor (Fig. 1.15) has been applied to motion vector detection, or motion compensation, the most time-consuming processing in MPEG-2 coding. The motion of an object in two successive frames is obtained from the image data projected onto the x- and y-axes. The circuit configuration is shown in Fig. 1.24. The x-projection data at time t are intentionally shifted by up to ±4 pixels and matched with the data at time t + Δt. The absolute value of the difference is calculated and summed for each shift, and the best match is searched for by the WTA. The x-component of the motion vector is identified by a flag raised at one of the WTA outputs. In this manner, the computationally expensive motion compensation can be conducted in a very short time on very simple hardware. The HSPICE simulation results are shown in Fig. 1.25. A photomicrograph of the test circuit designed for ±2 pixel shifts is shown in Fig. 1.26, and the measured data are presented in Fig. 1.27. In this manner, the basic operation of the circuit has been experimentally verified (the chip was built using a 3-µm process at a Tohoku University laboratory).
Fig. 1.24 νMOS motion-vector detection circuit composed of matching cell array and WTA.
Fig. 1.25 HSPICE simulation results for the circuit of Fig. 1.24 with test input data shown in the figure. The circuit was simulated assuming 0.5-µm technology. (Plot labels: Winner Cell (+3), Losers, LATCH; horizontal axis: TIME [sec].)
Fig. 1.26 A photomicrograph of the test circuit designed for ±2 pixel shifts (fabricated by a CMOS process with 3-µm layout rules).
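The projection-matching procedure of Section 1.7.2 can be sketched in software as follows. This is only an illustration of the arithmetic (shift, absolute difference, sum, pick the minimum), not of the analog circuit; the function names, the sign convention for the shift, and the restriction of the sum to the samples that overlap for every shift are assumptions.

```python
def project_columns(frame):
    """x-projection: sum the pixel values in each column of a 2-D frame."""
    return [sum(column) for column in zip(*frame)]


def best_shift(proj_t, proj_t_dt, max_shift=4):
    """Return (shift with the smallest summed absolute difference, all sums).

    A positive shift means the pattern moved toward larger x between the
    two frames (assumed convention).
    """
    n = len(proj_t)
    if len(proj_t_dt) != n or n <= 2 * max_shift:
        raise ValueError("projections must match and exceed 2 * max_shift")
    sad = {}
    for shift in range(-max_shift, max_shift + 1):
        # Only indices valid for every shift are compared.
        sad[shift] = sum(abs(proj_t_dt[x] - proj_t[x - shift])
                         for x in range(max_shift, n - max_shift))
    # The software counterpart of the WTA: the best match wins.
    winner = min(sad, key=sad.get)
    return winner, sad
```

The y-component would be obtained the same way from the row sums of the two frames.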
Fig. 1.27 Measured waveforms of the circuit in Fig. 1.18, showing that only the output of the +1 pixel shift is falling. The data were monitored by direct probing of the negative-logic outputs. (Trace labels: CONTROL, RAMP, OUTPUT +1 PIXEL SHIFT; time base: 2 µsec/div.)
1.7.3 CDMA Matched Filter [34]
The self-correlation matching technique developed for motion vector detection has been extended to build a matched filter, one of the key components in the next-generation WB-CDMA wireless communication systems. In this application, the templates are binary vectors representing the short PN (pseudorandom noise) codes with varying phase shifts. The chip architecture is shown in Fig. 1.28. An input signal train captured by sample-and-hold circuits is simultaneously matched with a group of templates having all possible shifts in the phase of an identical PN code. The maximum correlation is detected by fully parallel matching using the binary-search νMOS winner-take-all circuit. Such a parallel architecture enables very fast peak detection as well as the detection of second and third correlation peaks arising from multi-path delays. The matching cell used in the matched filter is given in Fig. 1.29. Since a template vector is a certain length of a PN code composed of ±1, it is easily implemented as a pattern of each switching state, either to VREF or to Vi, in the reset and evaluate cycles. A photomicrograph of the test chip fabricated in a 0.6-µm double-poly triple-metal CMOS technology is shown in Fig. 1.30, and the fundamental operation of the system has been experimentally demonstrated [34].
Fig. 1.28 Block diagram of νMOS matched filter. (Block labels: Input DATA, Sample/Hold, νMOS Source Follower, WTA Controller, Degree of Matching in Binary Code, Winner Code (Phase).)
Fig. 1.29 Binary matching cell used in νMOS CDMA matched filter. The PN code of ±1 is determined by the switching pattern in the reset and evaluate cycles. (Diagram labels: EVALUATE, RESET, PN Code, Floating Gate, VREF; output expression: -Σ(Vi - VREF)·(PN)i.)
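As a rough software analogue of the matched filter, the sketch below correlates one block of samples with every cyclic phase of a ±1 code and ranks the phases by correlation. The 7-chip code `PN7`, the function names, and the use of a plain sort in place of the winner-take-all circuit are assumptions for illustration; the codes and lengths used on the chip are not given in this section.

```python
# Hypothetical 7-chip maximal-length sequence in +1/-1 form (illustration only).
PN7 = [1, 1, 1, -1, -1, 1, -1]


def correlate_all_phases(samples, pn_code, v_ref=0.0):
    """Correlate the samples with every cyclic phase shift of pn_code.

    Each score is sum((V_i - V_REF) * PN_i), the quantity formed in the
    matching cell up to sign and scale.
    """
    if len(samples) != len(pn_code):
        raise ValueError("samples and code must have the same length")
    scores = []
    for phase in range(len(pn_code)):
        template = pn_code[phase:] + pn_code[:phase]  # cyclic shift
        scores.append(sum((v - v_ref) * c for v, c in zip(samples, template)))
    return scores


def ranked_peaks(scores, count=3):
    """Phases ordered from the largest correlation downward."""
    order = sorted(range(len(scores)), key=lambda p: scores[p], reverse=True)
    return order[:count]
```

With a direct path at one phase and a weaker echo at another, the first two entries of `ranked_peaks` recover both phases, which is the multi-path case mentioned in the text.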
Fig. 1.30 Photomicrograph of a test chip of the νMOS matched filter fabricated in a 0.6-µm double-polysilicon triple-metal CMOS technology.
1.8 Conclusions
We have discussed how functionality enhancement in an elementary device has a number of impacts on circuits and systems. The concept of the neuron MOS transistor and the analog/digital merged hardware computation scheme implemented by νMOS circuits have shown a number of interesting features. These include a remarkable simplification of the circuit configuration as well as the introduction of flexibility in the functionality when applied to binary logic circuits. The scheme has also been successfully applied to motion detection as well as to association processor implementation. Various other interesting applications have been explored, such as fuzzy processors [35], a fingerprint identification chip [14], a differential-of-Gaussian filtering chip [36], and so forth. Aiming at building real-time recognition systems, a psychological brain model has been proposed, and the νMOS association processor architecture has been developed as its hardware implementation model. In order to apply the association processor architecture to real recognition problems, how to represent the input image by a characteristic vector is of primary importance. Namely, reducing the dimensionality of the input image data while retaining its essential features is crucial. A hardware-friendly vector representation algorithm has been developed, and its versatile characteristics have been proven by simulation in applications to handwritten character and hand-drawn pattern recognition and medical radiograph analysis [37]. The study on hardware implementation of the characteristic vector extraction algorithm as well as the total system integration on silicon is now in progress.
Acknowledgment
The major part of the work presented in this article was done in collaboration with Prof. T. Ohmi at Tohoku University when the author was at Tohoku University. The contributions of Prof. K. Kotani, Ning Mei Yu, Y. Yamashita, M. Konda, and T. Nakai of Tohoku University and A. Nakada, formerly at The University of Tokyo, are acknowledged, and the author would like to express his sincere thanks to all of these people. This work was partially supported by the Ministry of Education, Science, Sports, and Culture under Grant-in-Aid for Scientific Research on Priority Areas, "Ultimate Integration of Intelligence on Silicon Electronic Systems" (1995-98), and also by the Semiconductor Technology Academic Research Center (STARC) under the research project "Right-Brain Computing Integrated Circuits and Their Application to Real-Time Image Processing" (1997-1998). Some of the chips presented here were fabricated in the chip fabrication program of the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Nippon Motorola LTD., Dai Nippon Printing Corporation, and KYOCERA Corporation, and also in collaboration with Rohm Corporation and Toppan Printing Corporation.
References
[1] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," Bull. Math. Biophys., Vol. 5, p. 115 (1943).
[2] G. Moore, "Progress in digital integrated electronics," IEDM Tech. Dig., 1975, pp. 11-13.
[3] J. D. Meindl, "Low power microelectronics: Retrospect and prospect," Proc. IEEE, Vol. 83, No. 4, pp. 619-635 (1995).
[4] T. Shibata and T. Ohmi, "A functional MOS transistor featuring gate-level weighted sum and threshold operations," IEEE Trans. Electron Devices, Vol. 39, No. 6, pp. 1444-1455 (1992).
[5] T. Shibata and T. Ohmi, "Neuron MOS binary-logic integrated circuits: Part I, Design fundamentals and soft-hardware-logic circuit implementation," IEEE Trans. Electron Devices, Vol. 40, No. 3, pp. 570-576 (1993).
[6] T. Shibata and T. Ohmi, "Neuron MOS binary-logic integrated circuits: Part II, Simplifying techniques of circuit configuration and their practical applications," IEEE Trans. Electron Devices, Vol. 40, No. 5, pp. 974-979 (1993).
[7] T. Shibata, K. Kotani, and T. Ohmi, "Real-time reconfigurable logic circuits using neuron MOS transistors," in ISSCC Dig. Technical Papers, Feb. 1993, FA 15.3, pp. 238-239.
[8] W. Weber, S. J. Prange, R. Thewes, E. Wohlrab, and Andreas Luck, "On the application of the neuron MOS transistor principle for modern VLSI design," IEEE Trans. Electron Devices, Vol. 43, No. 10, pp. 1700-1708 (1996).
[9] K. Ike, K. Hirose, and H. Yasuura, "A module generator of 2-level neuron MOS circuits," in Proceedings of the 4th International Conference on Soft Computing, Methodologies for the Conception, Design, and Application of Intelligent Systems (World Scientific, Singapore, 1996) pp. 109-112.
[10] T. Shibata, H. Kosaka, H. Ishii, and T. Ohmi, "A neuron MOS neural network using self-learning-compatible synapse circuits," IEEE J. Solid-State Circuits, Vol. 30, No. 8, pp. 913-922 (1995).
[11] H. Kosaka, T. Shibata, H. Ishii, and T. Ohmi, "An excellent weight-updating-linearity EEPROM synapse memory cell for self-learning neuron-MOS neural networks," IEEE Trans. Electron Devices, Vol. 42, No. 1, pp. 135-143 (1995).
[12] Jun-ichi Nakamura and E. R. Fossum, "Image sensor with image smoothing capability using a Neuron MOSFET," in Charge-Coupled Devices and Solid State Optical Sensors IV, Proc. SPIE Vol. 2172, pp. 30-37 (1994).
[13] M. Ikebe, M. Akazawa, and Y. Amemiya, "νMOS Cellular-Automaton Devices for Intelligent Image Sensors," Proceedings of the 5th International Conference on Soft Computing and Information/Intelligent Systems, 16-20 October, 1998, Iizuka, Fukuoka, "Methodologies for the Conception, Design and Applications of Soft Computing," Vol. 1 (T. Yamakawa and G. Matsumoto, Eds.) pp. 113-117.
[14] S. Jung, R. Thewes, T. Scheiter, K. F. Groser, and W. Weber, "A Low-Power and High-Performance MOS Fingerprint Sensing and Encoding Architecture," IEEE Journal of Solid-State Circuits, Vol. 34, No. 7, pp. 978-984 (1999).
[15] H. R. Mehrvarz and C. Y. Kwok, "A large-input-dynamic-range multi-input floating-gate MOS four-quadrant analog multiplier," IEEE Journal of Solid-State Circuits, Vol. 31, No. 8, pp. 1123-1131, August 1996.
[16] K. Kotani, T. Shibata, M. Imai, and T. Ohmi, "Clocked-neuron-MOS logic circuits employing auto-threshold-adjustment," in Digest of Technical Papers, 1995 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, FP 19.5, pp. 320-321 (1995).
[17] K. Kotani, T. Shibata, and T. Ohmi, "DC-Current-Free Low-Power A/D Converter Circuitry Using Dynamic Latch Comparators with Divided-Capacitance Voltage Reference," 1996 IEEE International Symposium on Circuits and Systems (ISCAS 96), Vol. 4, Atlanta, pp. 205-208, May (1996).
[18] K. Kotani, T. Shibata, and T. Ohmi, "CMOS Charge-Transfer Preamplifier for Offset-Fluctuation Cancellation in Low-Power, High-Accuracy Comparators," Digest of Technical Papers, 1997 VLSI Circuit Symposium, Kyoto, June, pp. 21-22 (1997).
[19] Ho-Yup Kwon, K. Kotani, T. Shibata, and T. Ohmi, "Low Power Neuron MOS Technology for High-Functionality Logic Gate Synthesis," IEICE Trans. Electronics, Vol. E80-C, No. 7, pp. 924-930 (July, 1997).
[20] Ning Mei Yu, Tadashi Shibata, and Tadahiro Ohmi, "A Real-Time Center-Of-Mass Tracker Circuit Implemented by Neuron MOS Technology," IEEE Transactions on Circuits and Systems II, Vol. 45, No. 4, pp. 495-503 (1998).
[21] T. Shibata and T. Ohmi, "Neural Microelectronics," Technical Digest, International Electron Devices Meeting (IEDM) 1997, Washington, D.C., pp. 337-342.
[22] T. Shibata, T. Nakai, N. M. Yu, Y. Yamashita, M. Konda, and T. Ohmi, "Advances in neuron-MOS applications," in ISSCC Dig. Technical Papers, Feb. 1996, SA 18.4, pp. 304-305.
[23] T. Yamashita, T. Shibata, and T. Ohmi, "Neuron MOS winner-take-all circuit and its application to associative memory," in ISSCC Dig. Technical Papers, Feb. 1993, FA 15.2, pp. 236-237.
[24] Y. Yamashita, T. Shibata, and T. Ohmi, "Write/Verify Free Analog Non-Volatile Memory Using a Neuron-MOS Comparator," 1996 IEEE International Symposium on Circuits and Systems (ISCAS 96), Vol. 4, Atlanta, pp. 229-232, May (1996).
[25] J. Hemink, T. Tanaka, T. Endo, S. Aritome, and R. Shirota, "Fast and accurate programming method for multi-level NAND EEPROMs," in 1995 Symp. VLSI Technology, Kyoto, Dig. Technical Papers, pp. 129-130.
[26] M. Konda, T. Shibata, and T. Ohmi, "Neuron-MOS Correlator based on Manhattan distance computation for event recognition hardware," 1996 IEEE International Symposium on Circuits and Systems (ISCAS 96), Vol. 4, Atlanta, pp. 217-220, May (1996).
[27] M. Konda, T. Shibata, and T. Ohmi, "A Compact Memory-Merged Vector-Matching Circuitry for Neuron-MOS Associative Processor," IEICE Transactions on Electronics, Vol. E82-C, No. 9, pp. 1715-1721 (1999).
[28] A. Rita, T. Yamashita, T. Shibata, and T. Ohmi, "Neuron-MOS multiple-valued memory technology for intelligent data processing," in ISSCC Dig. Technical Papers, Feb. 1994, FA 16.3, pp. 270-271 (1994).
[29] T. Shibata, A. Nakada, M. Konda, T. Morimoto, T. Ohmi, H. Akutsu, A. Kawamura, and K. Marumoto, "A fully-parallel vector quantization processor for real-time motion picture compression," in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 236-237.
[30] A. Nakada, T. Shibata, M. Konda, T. Morimoto, and T. Ohmi, "A fully-parallel vector quantization processor for real-time motion picture compression," IEEE Journal of Solid-State Circuits, Vol. 34, No. 6, pp. 822-830, June 1999.
[31] A. Nakada, M. Konda, T. Morimoto, T. Yonezawa, T. Shibata, and T. Ohmi, "Fully-Parallel VLSI Implementation of Vector Quantization Processor using Neuron-MOS Technology," Vol. E82-C, No. 9, pp. 1730-1737 (1999).
[32] A. Gersho and R. M. Gray, "Vector quantization and signal compression," Kluwer Academic Publishers, Boston, 1992.
[33] T. Nakai, T. Shibata, and T. Ohmi, "Neuron-MOS Quasi-Two-Dimensional Image Processor for Real-Time Motion Vector Detection," in Proceedings of the 4th International Conference on Soft Computing, Methodologies for the Conception, Design, and Application of Intelligent Systems (World Scientific, Singapore, 1996) pp. 833-836.
[34] A. Okada and T. Shibata, "A Neuron-MOS Parallel Associator for High-Speed CDMA Matched Filter," The 1999 IEEE International Symposium on Circuits and Systems (ISCAS '99), Vol. 2, Orlando, Florida, May 30 - June 2, 1999, pp. II-392-395.
[35] Ning Mei Yu, Tadashi Shibata, and Tadahiro Ohmi, "An Analog Fuzzy Processor Using Neuron-MOS Center-of-Mass Detector," Proceedings of the 6th International Conference on Microelectronics for Neural Networks, Evolutionary & Fuzzy Systems (MicroNeuro'97), 24-26 September, 1997, Dresden, pp. 121-128.
[36] T. Sunayama, M. Ikebe, and Y. Amemiya, "A νMOS Cellular-Automaton Device for Differential-of-Gaussian Filtering," Extended Abstracts of the 1999 International Conference on Solid State Devices and Materials, Tokyo, 1999, pp. 110-111.
[37] T. Shibata, M. Yagi, and M. Adachi, "Soft-Computing Integrated Circuits for Intelligent Information Processing," Proceedings of The Second International Conference on Information Fusion, Vol. 1, pp. 648-656, Sunnyvale, California, July 6-8, 1999.
Chapter 2
Adaptive Learning Neuron Integrated Circuits Using Ferroelectric-Gate FETs
Sung-Min Yoon, Eisuke Tokumitsu, and Hiroshi Ishiwara
Tokyo Institute of Technology
Abstract
An adaptive-learning neuron circuit composed of an MFSFET and a complementary unijunction transistor (CUJT) oscillation circuit was fabricated on an SOI (silicon-on-insulator) structure as a first step toward next-generation neural networks. SrBi2Ta2O9 (SBT) was selected as the ferroelectric gate material and patterned by a newly developed selective etchant, NH4F:HCl. It was demonstrated that the fabricated MFSFET showed good memory operations and a gradual learning effect in which the drain current was changed gradually by applying a number of input pulses with a sufficiently short duration. It was also demonstrated that the output pulse frequency of the neuron circuit increased gradually as the number of input pulses was increased. Finally, the problem of the small output pulse height of the neuron circuit was solved by replacing the CUJT oscillation circuit with a CMOS Schmitt-trigger circuit.
Keywords: adaptive-learning, neuron circuit, MFSFET (metal-ferroelectric-semiconductor field effect transistor), synapse array, PFM (pulse frequency modulation), SOI (silicon-on-insulator), SrBi2Ta2O9, selective etchant, ferroelectric film, CMOS Schmitt trigger, complementary unijunction transistor (CUJT)
2.1 Introduction
Artificial neural networks, which perform distributed parallel information processing and have an adaptive-learning function, have attracted much attention for the future highly developed information-oriented society. In the human brain, a huge quantity of information is processed in parallel and stored as one's past experiences. In this system, neurons accept many weighted input signals and generate output pulses when the total value of the input signals exceeds a threshold value. The weighting operation for input signals is conducted by synapses, which are attached to the neurons. Thus, synapses and neurons can be realized using memory devices and processors in an artificial neural network. However, the hardware implementation of a large-scale network is rather difficult, since the number of synaptic connections becomes huge as the number of neurons increases. One possible solution to this hardware problem is to use electrically rewritable, nonvolatile analog memories, by which an electrically modifiable synapse array can be implemented in a small size. Indeed, floating-gate MOS devices are used for this purpose [1]-[4]. In these devices, the data are stored as an amount of electrical charge injected through a tunnel oxide into the floating gate [5]. However, precise control of the quantity of injected carriers is rather difficult unless a well-designed control circuit is used [3]. We have proposed a new concept of synaptic connection which is composed of an array of MFSFETs (metal-ferroelectric-semiconductor field effect transistors) and is applicable to an adaptive-learning neural network [6]. In an MFSFET, the gate dielectric film of a MOSFET is replaced with a ferroelectric film, and it is used as an analog memory device for storing past experiences through partial polarization of the ferroelectric film. The MFSFET array has the following merits over the floating-gate MOS device. First, the control circuit for modifying the synaptic weight values is much simpler in the MFSFET neuron circuit, since the polarization reversal phenomenon in a ferroelectric film is not as nonlinear as the electron tunneling phenomenon to a floating gate, which reduces the synaptic connection area greatly. Second, the endurance for "rewrite" cycles in a ferroelectric film is more than 10^12, which is much higher than the value (about 10^6) in a floating-gate FET.
For these reasons, an MFSFET array is very promising for hardware implementation of neural networks. However, it is difficult, at present, to obtain a good interface between a ferroelectric film and a Si substrate in the fabrication of MFSFETs, and it is even more difficult to integrate MFSFETs with Si circuitry using conventional LSI technology without degrading device performance. In this paper, we discuss some problems and solutions in fabricating PFM (pulse frequency modulation)-type neuron circuits integrated with MFSFETs and demonstrate the adaptive-learning function of the neuron circuit. In these circuits, a CUJT (complementary unijunction transistor) and a CMOS Schmitt-trigger circuit are used as the switching components for the oscillation circuits.
2.2 Operation Principles of Adaptive-Learning Neuron Circuits
2.2.1
In our neuron circuit, the key device for realizing the adaptive-learning function is an MFSFET. Recently, the MFSFET has attracted considerable attention as a promising device for nonvolatile memory applications, because it has crucial advantages over ferroelectric random access memories (FeRAMs) using ferroelectric capacitors [7]-[10]. Since MFSFETs exploit the ferroelectric field effect, which is the modulation of conductivity by the electrostatic charges induced by ferroelectric polarization, the stored data can be read out nondestructively. Moreover, since MFSFETs obey the scaling rule for device miniaturization, unlike DRAMs and capacitor-type FeRAMs, they are a very desirable configuration for single-transistor-cell-type high-density nonvolatile memory. Indeed, we have proposed a prototype of a single-transistor-cell-type digital memory in which MFSFETs are arranged in a matrix form on an SOI (silicon-on-insulator) structure [11]. However, fabrication of MFSFETs is very difficult. When a ferroelectric film is deposited on a Si substrate, generation of interfacial traps and interdiffusion of constituent elements occur easily, and the electrical properties as an FET become very poor. That is the main reason why no commercial nonvolatile memories using MFSFETs have yet appeared, although the original idea of the MFSFET dates back to the 1950s [12] and several prototype devices have been fabricated so far [13]-[15]. In order to realize MFSFETs, various ferroelectric materials and structures have been researched. In this sub-section, we review the MFS-related structures briefly. Although PbZrxTi1-xO3 (PZT) is a typical ferroelectric material with a high remnant polarization (Pr), it is well known that the PZT/Si structure causes severe interdiffusion of Pb and Si atoms, even if the annealing temperature is as low as 500°C.
Thus, various buffer materials (SrTiO3 [16], CeO2 [17], Y2O3 [18], YSZ [19], MgO [20], etc.) to prevent the interdiffusion are being investigated. One of the most successful fabrications using a PZT-related material as the ferroelectric gate is the Ir/IrO2/PZT/Ir/IrO2/poly-Si/SiO2/Si structure, in which a memory window (the shift of threshold voltage) of 3.3 V was achieved in the ID-VG (drain current vs. gate voltage) characteristics for a bias sweep of ±15 V [21]. However, in these MFIS or MFMIS (I: insulator) structures, a high operating voltage is required to provide sufficient voltage to the PZT gate film, since PZT has a relatively high dielectric constant. SrBi2Ta2O9 (SBT) is another important material in nonvolatile memory applications because of its reasonably large remnant polarization and superior fatigue-free properties [22]. Since a sol-gel derived SBT film with a polycrystalline structure can be deposited directly on Si without significant degradation of the interface, the SBT/Si structure can be used for MFSFET applications. We have confirmed the memory operations of an MFSFET using this structure [23]. However, the memory window width is much narrower than the expected value, which is ascribed to the existence of a transition layer with a low dielectric constant, such as SiO2, formed at the interface. Although CeO2 and Y2O3 have also been selected as buffer layers for forming an MFIS-FET [24]-[25], they are not very effective in mitigating this problem. Another Bi-layered ferroelectric material is Bi4Ti3O12 (BiTO). This material was often used in the early studies on the MFSFET [13] and MFISFET [14]. Recently, excellent interface properties and data retention characteristics were obtained in MFIS diodes in which Bi2SiO5 (BSO) was used as an interfacial buffer layer and both BSO and BiTO were epitaxially grown on a Si (100) substrate [26]. Other interesting materials are Sr2(Ta,Nb)2O7 and BaMgF4. The Pt/Sr2(Ta,Nb)2O7/Pt/IrO2/poly-Si/SiO2/Si structure showed excellent FET characteristics, in which a memory window of 3.6 V was obtained for a bias sweep of ±5 V [27]. Fluoride ferroelectric materials such as BaMgF4 (BMF) have low dielectric constants, and they can be deposited directly on Si substrates without formation of unintended transition layers, such as SiOx, at the interface [28]-[29].
Furthermore, the interface state density of MFS diodes formed in a simple process seems to be relatively small [30]. However, the ferroelectricity of BMF is easily degraded through the fabrication process, especially by both dry and wet etching processes, and the data retention time of MFSFETs using a BMF film is still very short. In these MFS-related devices, one of the most important properties for realizing nonvolatile memories with nondestructive "read-out" operation is the data retention characteristics. Ideally, the data stored in an MFSFET through a "write" operation must be retained for years. In practice, however, the remnant polarization of the ferroelectric film tends to decrease with time due to the depolarization field and the leakage current through the ferroelectric film. Therefore, in order to improve the device performance of MFSFETs, it is very important to establish a better ferroelectric/semiconductor interface and to develop a device structure that is more robust against retention failure. Recently, it was found that FETs with the gate structure of Pt/SBT/Pt/SrTa2O6/SiON/Si showed good data retention characteristics when the area ratio of the top electrode and the floating gate electrode was optimized [31].
2.2.2
Figure 2.1 shows a basic neuron circuit proposed as an elementary component of pulse frequency modulation (PFM) type adaptive-learning neural networks. The term "adaptive-learning" means a function in which the electrical properties of a device are changed partially or totally by applying a certain number of usual signals to the device. In this circuit, the MFSFETs correspond to the synapses, and the other devices (C, R and CUJT) form a neuron part which generates output pulses when the total input charge exceeds a threshold value. In order to realize the adaptive-learning function, the polarization state of the ferroelectric gate in the MFSFET is partially reversed by applying input pulses to the gate terminal, and thus the channel resistance of the MFSFET is gradually changed according to the polarization state. In other words, the synaptic values stored in the MFSFETs are gradually changed by applying an adequate number of input signals. For this reason, the duration of the input pulses must be sufficiently shorter than the switching time for polarization reversal of the ferroelectric film. This is the main reason why the PFM system is used in the proposed neuron circuit. In the neuron part, the CUJT is used as a switching component to discharge the capacitor C, which corresponds to the threshold processing in a neuron. Since the output pulse interval of the circuit is proportional to C times the channel resistance of the MFSFET, the output pulse frequency can be gradually changed as the number of input pulses is increased. This operation is similar to the information processing in a human brain, in which current pulses generated in neurons propagate through nerve membranes and axons.
Fig. 2.1 A basic neuron circuit for an adaptive-learning neural network. (Circuit labels: Vcc, CUJT.)
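The statement that the pulse interval is proportional to C times the channel resistance can be illustrated with an idealized relaxation-oscillator sketch. Everything beyond that proportionality is an assumption: the RC charging law with a fixed firing voltage, the supply and firing voltages, and especially the exponential "learning curve" for the channel resistance, which is a hypothetical stand-in for the partial polarization reversal and not a measured characteristic. The 10 pF default is one of the capacitor values quoted later in the chapter.

```python
import math


def channel_resistance(n_pulses, r_start=1.0e6, r_end=1.0e5, n_scale=20.0):
    """Hypothetical learning curve: the channel resistance relaxes from
    r_start toward r_end as input pulses accumulate (illustrative only)."""
    return r_end + (r_start - r_end) * math.exp(-n_pulses / n_scale)


def pulse_frequency(r_channel, c=10e-12, v_cc=5.0, v_fire=3.0):
    """Output frequency of an idealized RC relaxation oscillator.

    The capacitor charges through r_channel toward v_cc and is discharged
    when it reaches v_fire, so the interval is proportional to r_channel * c.
    """
    if not 0.0 < v_fire < v_cc:
        raise ValueError("v_fire must lie between 0 and v_cc")
    interval = r_channel * c * math.log(v_cc / (v_cc - v_fire))
    return 1.0 / interval


if __name__ == "__main__":
    for n in (0, 10, 20, 50, 100):
        print(n, pulse_frequency(channel_resistance(n)))
```

In this sketch, halving the channel resistance doubles the output frequency, so a resistance that falls gradually with the number of input pulses gives a gradually rising pulse frequency.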
2.2.3
In neural networks, each neuron has many synapses, and they are connected to the neurons in the previous layer. Figure 2.2 shows the schematic diagram of a two-layered neural network, in which the outputs of m neurons are fully connected to the n neurons in the next layer. In this neural network, m×n synapses are required, which can be realized by parallel connection of the MFSFETs, as shown in Fig. 2.1. In this structure, each MFSFET is differently polarized and accepts pulse signals from a different neuron. Therefore, the total drain current summed over all MFSFETs determines the output behavior of the neuron circuit. The "weighted-sum" operation of synaptic values in a neuron is performed in this way. The prototype layout of the synapse array fabricated on an SOI structure is shown in Fig. 2.3, where Si stripes with a lateral npn structure are placed on an insulating layer and then covered with a ferroelectric film, and common metal stripes for gate electrodes are placed on the film perpendicular to the Si stripes. Since there are no via holes across the ferroelectric film in this structure, the packing density of synapses is expected to be very high. Furthermore, the synapses of an array fabricated on an SOI structure can be electrically isolated completely from one another, which enables us to give different weight values to the individual synapses with ease. We have demonstrated the "weighted-sum" operations of an electrically modifiable synapse array using MFSFETs with a 3×3 array structure [32].
Fig. 2.2 Schematic diagram of a two-layered neural network. (Diagram label: m-neurons.)
2.3 Neuron Integrated Circuits Composed of MFSFETs and CUJT Oscillation Circuits
We selected the SBT/Si structure from among the various MFS-related structures discussed above for fabricating the synapse device using an MFSFET. Although the interface properties of this structure are not perfect, the FET behaviors are sufficiently good for synapse device applications [22]. Furthermore, the simplicity of this structure is expected to increase the yield of the circuit after the full fabrication processes. It was found that parasitic ferroelectric effects of the unnecessary SBT film, which was deposited on the whole area of the substrate, prevented the normal oscillation operation of the circuit, although the individual devices in the circuit operated normally. The parasitic capacitors seem to be formed particularly in the areas of the metal interconnections and the electrode pads and have a significant effect on the normal operation of the neuron circuit. In order to solve this problem, the unnecessary SBT film must be selectively etched. Therefore, we developed a new selective etchant for SBT films for integrating MFSFETs with the other components of the circuit. The fabrication procedure and the operations of the circuit will be explained in detail.
Fig. 2.3 Synapse array using MFSFET matrix fabricated on an SOI structure. (Diagram labels: ferroelectric film, gate electrode.)
2.3.1
All devices of the neuron circuit were designed with a 5-µm design rule and fabricated on an SOI structure with a 3-µm-thick p-type Si layer. The channel length and width of the MFSFET are 5 µm and 50 µm, respectively. The device structure and electrical characteristics of the CUJT used in the neuron circuit were described in our previous paper [33]. The capacitors were designed to be 3 pF, 10 pF and 30 pF and fabricated with the structure of Al/SiO2/n+-Si. RL was designed to be 60 Ω-80 Ω. The fabrication procedures are as follows. First, the device region was separated into rectangular islands using a plasma etching system. The reaction gas was CF4:O2 at a ratio of 45:5. Then, the ion implantation processes for forming the active regions of the devices and the contact regions were performed based on the optimum conditions examined in Ref. [33]. After the Si islands were oxidized by dry oxidation for passivation, gate windows for the deposition of SBT films were formed by wet chemical etching.
The thickness of the passivating SiO2 layer was 50 nm. SBT films were deposited using liquid source misted chemical deposition (LSMCD), in which the same type of sol-gel precursors was used as in the spin-coating method. Better coverage at surface steps and good thickness uniformity are expected with the LSMCD method [34]. Figure 2.4 shows the schematic diagram of the LSMCD apparatus used in this study. The deposited SBT films were dried at 150°C for 5 min and prefired at 500°C for 20 min to remove residual organics. The deposition process by LSMCD was repeated until the desired film thickness was obtained, and the films were annealed for crystallization at 750°C for 30 min in an O2 atmosphere using a rapid thermal anneal (RTA) system. The final thickness of the SBT gate film was about 150 nm. Then, a Pt film was deposited by e-gun evaporation to form the gate electrode, and it was patterned by a lift-off process. To obtain good ferroelectricity of SBT, it is generally desirable to use a Pt electrode. We can also expect that the Pt gate electrode formed right after the deposition of SBT acts as a protection layer for the SBT gate during the subsequent fabrication processes.
Fig. 2.4 Schematic diagram of LSMCD apparatus used in this study.
As mentioned above, removing the unnecessary SBT film is essential for obtaining the normal oscillation operation of the circuit. To remove it, various etching methods that might give a sufficient etching selectivity between SBT and the underlying SiO2 film were attempted. In a reactive ion etching (RIE) process, it is very difficult to etch only the SBT film because the etching rates of SBT and SiO2 are similar, either in a gas mixture of Ar:Cl2 or in CF4-based gas mixtures. On the other hand, in wet chemical etching using HF:HCl solution, SBT was etched off very quickly compared with SiO2; the etching rate for SBT was about 10 times faster than that for SiO2. However, the absolute etching rate for SiO2, about 70 nm/min, was too high even when the HF concentration was decreased to 2.5 %, and hence the remaining SiO2 was seriously damaged.
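The trade-off between selectivity and absolute etching rate can be made concrete with a little arithmetic. The following is only a rough sketch, not the authors' procedure: it assumes uniform etching rates, takes the 150 nm SBT thickness, the 10:1 selectivity and the 70 nm/min SiO2 rate quoted above, and uses the 14:1 selectivity reported for the NH4F:HCl etchant. The function names and the 50 % over-etch figure are hypothetical choices made for illustration.

```python
def clearing_time_s(film_nm, oxide_rate_nm_per_min, selectivity):
    """Time to etch through the film, assuming a constant rate.

    selectivity = (film etch rate) / (oxide etch rate).
    """
    film_rate_nm_per_min = oxide_rate_nm_per_min * selectivity
    return 60.0 * film_nm / film_rate_nm_per_min


def oxide_loss_nm(film_nm, selectivity, exposed_fraction):
    """Oxide removed while it is exposed to the etchant.

    exposed_fraction is the oxide exposure time divided by the nominal
    film clearing time (hypothetical parameter, e.g. 0.5 for 50 % over-etch).
    """
    if selectivity <= 0:
        raise ValueError("selectivity must be positive")
    return film_nm * exposed_fraction / selectivity


SBT_NM = 150.0      # final SBT thickness quoted in the text
OVER_ETCH = 0.5     # assumed over-etch, not a value from the text

# HF:HCl: about 10:1 selectivity, SiO2 rate about 70 nm/min.
t_clear = clearing_time_s(SBT_NM, 70.0, 10.0)
print(f"HF:HCl clears the SBT film in about {t_clear:.1f} s")

for name, selectivity in (("HF:HCl", 10.0), ("NH4F:HCl", 14.0)):
    loss = oxide_loss_nm(SBT_NM, selectivity, OVER_ETCH)
    print(f"{name}: about {loss:.1f} nm of SiO2 lost at 50 % over-etch")
```

Under these assumptions the HF:HCl etch is over in a little more than ten seconds, which leaves very little timing margin next to a 50 nm passivation oxide; this is consistent with the remark that the absolute rate, rather than the selectivity alone, was the limiting factor.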
Fig. 2.5 Etching characteristics of NH4F:HCl: (a) dependence of etching rates on the concentration of NH4F; (b) dependence on the crystallization temperature of SBT.
After many trials, we developed a new selective wet etchant for SBT films, an NH4F:HCl solution. Figure 2.5(a) compares the etching rates of SBT and SiO2 in NH4F:HCl solution as a function of the NH4F concentration and shows a good etching selectivity between SBT and SiO2. When the concentration of NH4F was 0.7 M/l, an etching selectivity of about 14:1 was obtained. It was also found that the etching rate of the SBT film in this etchant depended on the crystallization temperature of SBT, as shown in Fig. 2.5(b). Using this etchant, the SBT film deposited on the entire area of the circuit was removed thoroughly, leaving only the gate area of the MFSFET. Contact holes were easily formed by wet etching using BHF solution, since the unnecessary SBT film no longer existed. Finally, Al interconnections and electrode pads were formed by a lift-off process, which was essential for successful Al patterning, since wet chemical etching of Al in a hot H3PO4 solution degraded the SBT film severely. Ten photomasks were used in the fabrication of the neuron circuit. A photograph of the integrated neuron circuit is shown in Fig. 2.6.
Fig. 2.6 A photograph of integrated MFSFET neuron circuit.
2.3.2
Figure 2.7 shows the drain current (ID) - gate voltage (VG) characteristics of the fabricated MFSFET. A counterclockwise hysteresis was obtained, as indicated by arrows, and the memory window was about 0.57 V for a VG sweep from 0 V to 6 V. To confirm that this shift of threshold voltage is attributed to the ferroelectric nature of the SBT film, the dependence of the memory window width on the sweep rate of the gate voltage was measured. In Fig. 2.8(a), the ID - VG characteristic for the fastest sweep rate (6×10^4 V/s) is compared with that for the normal case shown in Fig. 2.7 (0.5 V/s); the fastest sweep rate is about 10^5 times faster than the normal one. In this measurement, the gate voltage was applied using a 5 kHz triangular wave of 0 to 6 V in the virtually grounded circuit shown in Fig. 2.8(b). Although it is the case that the existence of mobile ions in the gate film may cause the hysteretic behavior
Fig. 2.7 ID - VG characteristics of the fabricated MFSFET.
Fig. 2.17 A photograph of CMOS Schmitt-trigger oscillation circuit using fixed value resistors and MOSFETs as RE.
Fig. 2.18 Output pulse waveforms of CMOS Schmitt-trigger oscillation circuit using fixed value resistors; (a) 50 kΩ, (b) 100 kΩ, and (c) 500 kΩ.
pulse frequency is well modulated by the variation of the S-D channel resistance of the MOSFET. However, the hysteretic behavior shown in Fig. 2.12 was not obtained, since an MFSFET was not used in this circuit. It is concluded from these results that the output pulse signals generated by the CMOS Schmitt-trigger circuit can be used as input signals for neurons in the next layer without connecting an additional amplifier.
2.4.4
In the previous subsection, it was demonstrated that the small output pulse height problem of the neuron circuit using the CUJT oscillator could be solved by replacing it with the CMOS Schmitt-trigger oscillator. Thus, in this subsection, the CMOS Schmitt-trigger oscillator is integrated with an MFSFET, and the adaptive-learning characteristics of the neuron circuit with an improved pulse height are demonstrated. In the fabrication of the modified neuron circuit, the oscillation part was fabricated using the same process as that of the CMOS Schmitt-trigger oscillator described in Sec. 2.4.2, and then the fabrication process of the MFSFET was incorporated, which was almost the same as that used for the CUJT neuron circuit discussed in Sec. 2.3. A photograph of the fabricated circuit is shown in Fig. 2.21, in which the capacitance value is fixed at 10 pF. All the fabricated devices, including the MFSFET and the MOSFETs composing the inverter circuits, were confirmed to operate normally after optimizing the fabrication conditions, especially the etching condition of SBT. To examine the improved behavior of the modified neuron circuit, similar measurements were performed. The power supply voltage (VDD) was 5 V. Figure 2.22 shows the output pulse waveforms after a single pulse (20 ns, 6 V) and after 60 pulses were applied to the gate terminal of the MFSFET. As can be seen in this figure, the adaptive-learning characteristics were similarly obtained in this modified neuron circuit. Furthermore, as expected, the height of the output pulses was almost the same as VDD. The output pulse frequency was also successfully modulated with the number of input pulses, as shown in Fig. 2.23. It is concluded from these results that the implementation of the neuron circuit using a CMOS Schmitt-trigger as the oscillation component is a very desirable solution to the small pulse height problem of the neuron circuit using a CUJT oscillator.
Fig. 2.19 Output pulse waveforms of CMOS Schmitt-trigger oscillator using MOSFET as RB.
Fig. 2.20 Variation of output pulse frequency with the value of applied gate voltage.
Fig. 2.21 A photograph of the modified neuron circuit.
2.5 Conclusions
A novel PFM-type adaptive-learning neuron circuit using an MFSFET as a synapse device was successfully fabricated on an SOI structure after optimizing the fabrication process. The main results are summarized as follows.
(1) MFSFETs, which act as analog memories storing the synaptic weight in the neuron circuit, were fabricated using the SBT/Si structure, and good nonvolatile memory operations were demonstrated.
(2) Gradual change of the drain current of the MFSFET due to the partial polarization reversal of the SBT gate film was demonstrated by applying a number of input pulses with a sufficiently short duration of 20 ns.
(3) In the integrated neuron circuit using an MFSFET and a CUJT oscillation circuit, the output pulse frequency changed gradually as the number of input pulses applied to the MFSFET was increased.
(4) The problem of small output pulse height of the CUJT neuron circuit was solved by replacing the CUJT oscillation circuit with the CMOS Schmitt-trigger circuit.
Fig. 2.22 Output pulse waveforms in adaptive-learning function of the modified neuron circuit (input pulses: 6 V, 20 ns; output after one pulse: f = 64.7 kHz; output after sixty pulses: f = 154.2 kHz).
Fig. 2.23 Gradual change of output pulse frequency in the modified neuron circuit (number of pulses 0 to 60, at a constant gate voltage of 1.85 V; adaptive-learning pulse frequency modulation 64.7-154.2 kHz).
However, there are still problems that must be solved in future work. First, it has been found that the retention time of the stored synaptic weight is not sufficiently long for use in practical systems. In the retention measurement of the circuit, the output pulse frequency obtained by application of input signals decreased to about one-half of the initial value after 500 s, which is much shorter than the memory retention time measured in individual MFSFETs (typically 5000 s). Secondly, the modulation range of the output pulse frequency is still too narrow. To carry out the learning operation in a neural network using the back-propagation method, it is generally said that the weight value in a synaptic connection requires a resolution of at least 10 bits. In other words, the output frequency must be modulated over a range of three orders of magnitude. Therefore, further research on improving the overall performance of the oscillation circuit as well as the device characteristics of the MFSFET must be continued. Since the retention time of an MFSFET is expected to be improved by optimizing the device structure, the next version of the neuron circuit with a better retention property is under fabrication. Finally, we conclude that this novel neuron circuit with an adaptive-learning function is very promising for large-scale neural networks in the next generation, particularly when the above-mentioned problems are solved.
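To see how far the measured modulation range is from this target, the frequencies read from Figs. 2.22 and 2.23 (64.7 kHz and 154.2 kHz) can be compared with the three-orders-of-magnitude requirement. The snippet below is only an illustrative calculation; the function name is hypothetical, and treating a frequency ratio of 2^10 as equivalent to 10-bit resolution simply follows the equivalence stated in the text.

```python
import math


def modulation_range(f_min_khz, f_max_khz):
    """Return the frequency ratio and its size in decades."""
    ratio = f_max_khz / f_min_khz
    return ratio, math.log10(ratio)


# Frequencies after one and after sixty input pulses (Figs. 2.22 and 2.23).
ratio, decades = modulation_range(64.7, 154.2)

# Target implied by the text: 10-bit resolution, i.e. a ratio of 2**10.
target_ratio = 2 ** 10
target_decades = math.log10(target_ratio)

print(f"measured ratio : {ratio:.2f} ({decades:.2f} decades)")
print(f"target ratio   : {target_ratio} ({target_decades:.2f} decades)")
```

The measured ratio is between 2 and 3, well under half a decade, whereas the target is about three decades, which is why the modulation range is described as still too narrow.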
References
[1] O. Fujita and Y. Amemiya, "A floating-gate analog memory device for neural networks," IEEE Trans. Electron Devices, 40, pp.2029-2035, 1993.
[2] K. Nakajima, S. Sato, T. Kitaura, J. Murota, and Y. Sawada, "Hardware implementation of new analog memory for neural networks," IEICE Trans. Electron., E78-C, pp.101-105, 1995.
[3] T. Shibata, H. Kosaka, H. Ishii, and T. Ohmi, "A neuron-MOS neural network using self-learning-compatible synapse circuits," IEEE J. Solid-State Circuits, 30, pp.913-922, 1995.
[4] C. Diorio, P. Hasler, B. A. Minch, and C. A. Mead, "A single-transistor silicon synapse," IEEE Trans. Electron Devices, 43, pp.1972-1980, 1996.
[5] S. M. Sze, Physics of Semiconductor Devices, Wiley, New York, 1981.
[6] H. Ishiwara, "Proposal of Adaptive-Learning Neuron Circuit with Ferroelectric Analog-Memory Weights," Jpn. J. Appl. Phys., 32, pp.442-446, 1993.
[7] T. Fukushima, A. Kawahara, T. Nanba, M. Matsumoto, T. Nishimoto, N. Ikeda, Y. Judai, T. Sumi, K. Arita, and T. Otsuki, "A Microcontroller Embedded with 4Kbit Ferroelectric Non-Volatile Memory," 1996 Symp. on VLSI Circuits Tech. Dig., pp.46-47, 1996.
[8] D. J. Jung, N. S. Kang, S. Y. Lee, B. J. Koo, J. W. Lee, J. H. Park, Y. S. Chun, M. H. Lee, B. G. Jeon, S. I. Lee, T. E. Shim, and C. G. Hwang, "A 1T/1C Ferroelectric RAM using a Double-level Metal Process for Highly Scalable Nonvolatile Memory," 1997 Symp. on VLSI Technol. Tech. Dig., pp.139-140, 1997.
[9] K. Amanuma, T. Tatsumi, Y. Maejima, S. Takahashi, H. Hada, H. Okizaki, and T. Kunio, "Capacitor-on-Metal/Via-stacked-Plug (CMVP) Memory Cell for 0.25 μm CMOS Embedded FeRAM," IEDM Tech. Dig., pp.363-366, 1998.
[10] S. Tanaka, R. Ogiwara, Y. Itoh, T. Miyakawa, Y. Takeuchi, S. Doumae, H. Takenaka, and H. Kamata, "FRAM Cell Design with High Immunity to Fatigue and Imprint for 0.5 μm 3 V 1T/1C 1M bit FRAM," IEDM Tech. Dig., pp.359-362, 1998.
[11] H. Ishiwara, T. Shimamura, and E. Tokumitsu, "Proposal of a single-transistor-cell-type ferroelectric memory using an SOI structure and experimental study on the interference problem in the write operation," Jpn. J. Appl. Phys., 36, pp.1655-1658, 1997.
[12] W. L. Brown, US Patent 2791759, and I. M. Ross, US Patent 2791760, 1957.
[13] S. Y. Wu, "A New Ferroelectric Memory Device, Metal-Ferroelectric-Semiconductor Transistor," IEEE Trans. Electron Devices, 21, pp.499-505, 1974.
[14] K. Sugibachi, Y. Kurogi, and N. Endo, "Ferroelectric field-effect memory device using Bi4Ti3O12 film," J. Appl. Phys., 46, pp.2877-2881, 1975.
[15] Y. Higuma, Y. Matsui, M. Okuyama, T. Nakagawa, and Y. Hamakawa, "MFSFET-A new type of nonvolatile memory switching using PLZT film," Proc. 19th Conf. Solid State Devices, Tokyo, 1997, Jpn. J. Appl. Phys., Suppl. 17-1, pp.209-214, 1977.
[16] E. Tokumitsu, R. Nakamura, and H. Ishiwara, "Nonvolatile memory operations of metal-ferroelectric-insulator-semiconductor (MFIS) FETs using PLZT/STO/Si(100) structures," IEEE Electron Device Lett., 18, pp.160-162, 1997.
[17] B. E. Park, I. Sakai, E. Tokumitsu, and H. Ishiwara, "Hysteresis characteristics of vacuum-evaporated ferroelectric PbZr0.4Ti0.6O3 films on Si (111) substrates using CeO2 buffer layers," Appl. Surf. Sci., 117/118, pp.423-428, 1997.
[18] B. E. Park, E. Tokumitsu, and H. Ishiwara, "Fabrication of PbZrxTi1-xO3 Films on Si Structures Using Y2O3 Buffer Layers," Jpn. J. Appl. Phys., 37, pp.5145-5148, 1998.
[19] S. Horita, S. Horii, and S. Uemoto, "Material Properties of Heteroepitaxial Ir and Pb(ZrxTi1-x)O3 Films on (100)(ZrO2)1-x(Y2O3)x/(100)Si Structure Prepared by Sputtering," Jpn. J. Appl. Phys., 37, pp.5141-5144, 1998.
[20] J. Senzaki, K. Kurihara, N. Nomura, O. Mitsunaga, Y. Iwasaki, and T. Ueno, "Characterization of Pb(Zr,Ti)O3 Thin Films on Si Substrates Using MgO Intermediate Layer for Metal/Ferroelectric/Insulator/Semiconductor Field Effect Transistor Devices," Jpn. J. Appl. Phys., 37, pp.5150-5153, 1998.
[21] T. Nakamura, Y. Nakao, A. Kamisawa, and H. Takasu, "Ferroelectric memory FET with Ir/IrO2 Electrodes," Integrated Ferroelectrics, 9, pp.179-187, 1995.
[22] C. A. Paz de Araujo, J. D. Cuchiaro, L. D. McMillan, M. C. Scott, and J. F. Scott, "Fatigue-free ferroelectric capacitors with platinum electrodes," Nature, 374, pp.627-629, 1995.
[23] E. Tokumitsu, G. Fujii, and H. Ishiwara, "Electrical properties of MFS-FETs using SrBi2Ta2O9 films directly grown on Si substrates by sol-gel method," Proc. Mat. Res. Soc. Symp., 493, pp.459-464, 1998.
[24] T. Hirai, Y. Fujisaki, K. Nagashima, H. Koike, and Y. Tarui, "Preparation of SrBi2Ta2O9 Films at Low Temperatures and Fabrication of a Metal/Ferroelectric/Insulator/Semiconductor Field Effect Transistor Using Al/SrBi2Ta2O9/CeO2/Si(100) Structures," Jpn. J. Appl. Phys., 36, pp.5908-5911, 1997.
[25] H. N. Lee, M. H. Lim, Y. T. Kim, T. S. Kalkur, and S. H. Choh, "Characteristics of Metal/Ferroelectric/Semiconductor Field Effect Transistors Using a Pt/SrBi2Ta2O9/Y2O3/Si Structure," Jpn. J. Appl. Phys., 37, pp.1107-1109, 1998.
[26] T. Kijima and H. Matsunaga, "Preparation of Bi4Ti3O12 Thin Film on Si (100) Substrate Using Bi2SiO5 Buffer Layer and Its Electric Characterization," Jpn. J. Appl. Phys., 37, pp.5171-5173, 1998.
[27] Y. Fujimori, N. Izumi, T. Nakamura, and A. Kamisawa, "Application of Sr2Nb2O7 Family Ferroelectric Films for Ferroelectric Memory Field Effect Transistor," Jpn. J. Appl. Phys., 37, pp.5207-5210, 1998.
[28] D. R. Lampe, D. A. Adams, M. Austin, M. Polinski, J. Dzimianski, and S. Sinharoy, "Process Integration of the Ferroelectric Memory FET for NDRO FeRAMs," Ferroelectrics, 133, pp.61-72, 1992.
[29] K. Aizawa, T. Okamoto, E. Tokumitsu, and H. Ishiwara, "Fabrication and Characterization of Metal-Ferroelectric-Semiconductor Field Effect Transistors Using Epitaxial BaMgF4 Films on Si (111) Substrate," Integrated Ferroelectrics, 15, pp.245-252, 1997.
[30] K. Aizawa, T. Ichiki, T. Okamoto, E. Tokumitsu, and H. Ishiwara, "Ferroelectric Properties of BaMgF4 Films Grown on Si(100), (111), and Pt(111)/SiO2/Si(100) Structures," Jpn. J. Appl. Phys., 35, pp.1525-1530, 1996.
[31] E. Tokumitsu, G. Fujii, and H. Ishiwara, "Nonvolatile ferroelectric-gate field-effect transistors using SrBi2Ta2O9/Pt/SrTa2O6/SiON/Si Structure," Appl. Phys. Lett., pp.575-577, 1999.
[32] S. M. Yoon, E. Tokumitsu, and H. Ishiwara, "An Electrically Modifiable Synapse Array Composed of Metal-Ferroelectric-Semiconductor (MFS) FETs Using SrBi2Ta2O9 Thin Films," IEEE Electron Device Lett., 20, pp.229-231, 1999.
[33] S. M. Yoon, Y. Kurita, E. Tokumitsu, and H. Ishiwara, "Electrical Characteristics of Neuron Oscillation Circuits Composed of MOSFETs and Complementary Unijunction Transistors," Jpn. J. Appl. Phys., 37, pp.1110-1115, 1998.
[34] M. Huffman, "Liquid source misted chemical deposition (LSMCD) - A critical review," Integrated Ferroelectrics, 10, pp.39-53, 1995.
Chapter 3
An Analog-digital Merged Circuit Architecture Using PWM Techniques for Bio-inspired Nonlinear Dynamical Systems
Takashi Morie, Makoto Nagata, and Atsushi Iwata
Hiroshima University
Abstract
This chapter presents an analog-digital merged neural circuit architecture using pulse width modulation (PWM) signals. In particular, circuits implementing bipolar-weighted summation and arbitrary nonlinear transformation are described. The weighted summation circuit attains 8-bit precision in SPICE simulation by compensating for parasitic capacitance effects. Measurement results of a prototype chip fabricated using a 0.6 μm CMOS process demonstrate that the overall precision is 5 bits. A neural network has been constructed using the prototype chips, and the experimental results for realizing the XOR function have successfully verified the basic neural operation. The arbitrary nonlinear nonmonotone transformation is achieved in the conversion from analog voltage to PWM signals using plural comparators. Using this technique, we have fabricated a CMOS chaos chip and have succeeded in generating chaotic signals that exhibit bifurcation behaviors closely similar to those predicted by numerical simulation.
Keywords: analog-digital merged architecture, VLSI implementation, nonlinear dynamical system, pulse width modulation, PWM, switched-current source, bipolar-weighted summation, nonlinear transformation, neural networks, chaos, bifurcation
3.1 Introduction
Neural networks, an information processing paradigm inspired by biological nervous systems, have been recognized as a useful approach in such applications as vision, acoustics, robotics or control systems. However, the conventional neural network models used in many applications have only simple dynamics. The backpropagation networks, which are the most famous model, have a layer-type feed-forward structure and have no dynamics. The Boltzmann machines and Hopfield networks, which are also well-known models, have only symmetrical connections; thus their dynamics always leads to fixed-point steady states, and chaotic behavior or oscillation is never observed.
However, many recent studies in brain physiology and artificial neural network theories have revealed that nonlinear analog dynamics plays an important role in intelligent information processing. Chaotic neural networks [1; 2; 3], associative memory with nonmonotone dynamics [4; 5; 6], and nonlinear oscillator networks [7] are the typical models. To use such models in real-time real-world applications, massively parallel nonlinear dynamical systems have to be constructed. Thus, their VLSI implementation is essential. There have been many reports of VLSI implementation of neural networks. However, the conventional VLSI implementation approaches can hardly realize such nonlinear dynamical systems.
The aim of this chapter is to propose a VLSI circuit architecture and some related circuit techniques for constructing massively parallel nonlinear dynamical systems. We have developed an analog-digital merged architecture using pulse width modulation (PWM) approaches, which are different from conventional ones. We evaluate the performance of our architecture and circuits by using circuit (SPICE) simulation and measurement results of fabricated prototype VLSI chips.
This chapter is organized as follows. In Sec. 3.2, the features of the VLSI implementation approaches of neural networks are compared, and the advantages of our new approach using PWM signals are clarified. Next, our basic circuit architecture is proposed. In Sec. 3.3, a neural circuit based on the PWM approach is described [8]. A new bipolar weighted summation method using PWM signals is proposed, and circuit techniques that achieve high calculation precision are introduced. The performance of the circuit is evaluated using SPICE simulation and measurement results of a prototype chip. In Sec. 3.4, a new circuit technique for arbitrary nonlinear transformation using PWM signals is proposed [9]. A chip that can generate arbitrary one-dimensional chaos is presented, and its measurement results are shown. Finally, we give the conclusion in Sec. 3.5.
3.2 A New VLSI Implementation Approach Using PWM Signals
3.2.1 Comparison between Various Implementation Approaches
VLSI implementations of neural networks are mainly classified into digital, analog and pulse modulation approaches. The digital approach has high controllability and expandability. Digital systems are stable and robust against the various disturbances arising in real VLSI systems. Recently, some practical high-performance digital neural VLSI chips have been developed [10; 11; 12], which can implement large-scale neural networks. The digital approach, however, essentially cannot implement analog dynamics, although it can obtain high calculation precision at the expense of a large circuit area. Because the circuit components occupy a large area, massively parallel operation is difficult. Instead, time-sharing operation is performed. This feature is suitable for implementing simple feedforward networks such as backpropagation networks, but is not suitable for realizing massively parallel analog dynamical neural systems. On the other hand, the analog approach is obviously suitable for implementing analog dynamics. It is very powerful and effective for implementing recurrent networks that have analog dynamics. In addition, the circuit size can be reduced drastically compared with the digital approach. Thus, there have been many attempts to develop analog neural VLSI chips [13; 14; 15]. However, the calculation precision is limited by various non-idealities in circuit components, noise and crosstalk [16; 17]. Moreover, it is not easy to perform arbitrary nonlinear, nonmonotone transformation. Therefore, analog VLSI chips are designed for specific dynamics. The third approach, the pulse modulation approach, achieves time-domain analog information processing using pulse signals. It includes several information representation methods, such as pulse density (pulse frequency) modulation (PDM), pulse width modulation (PWM), and pulse phase modulation (PPM). The pulse modulation approach has almost the same advantages as the digital approach.
The PDM approach has often been used because of its similarity to the behavior of biological neurons. It can approximately perform continuous-time continuous-state dynamics. A digital system using the PDM approach has been developed for large-scale neural network implementation [18].
However, the PDM approach has the drawback of large power consumption because of its large transition rate. This chapter focuses on the PWM approach. A PWM signal represents information by its pulse width. The PWM approach is used in an analog-digital merged circuit architecture, where signals have digital values in the voltage domain and analog values in the time domain [19; 20]. PWM circuits mainly consist of digital circuit components; thus they match the scaling trend in Si CMOS technology and low-voltage operation. They operate with lower power consumption than traditional digital or PDM circuits because each datum is represented by only one state transition in the PWM approach. This is an important advantage over the PDM approach in VLSI systems. Thus, the PWM approach seems suitable for constructing bio-inspired VLSI systems. The PWM approach implements continuous-state discrete-time dynamics. Obviously, discrete-time dynamics is not used in biological systems, but it has been thoroughly examined, and equivalent functions can be achieved in most cases. The PPM approach has the same features as the PWM approach. However, it requires a reference (clock) signal defining the start time for measuring the phase, whereas a PWM signal includes all of the information in itself. Therefore, PWM signals can be transmitted efficiently, whereas PPM methods may be used effectively in local circuits.
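The difference in transition rate between the PDM and PWM representations can be illustrated with a small sketch. This is only a toy model under stated assumptions: time is divided into discrete slots, the same integer value is encoded in both schemes, and the function names and the accumulator-based PDM pulse placement are hypothetical choices made for illustration. Note also that the sketch counts both the rising and the falling edge of a pulse, so a single PWM pulse shows up as two edges.

```python
def pwm_wave(level, window):
    """One pulse whose width (in slots) encodes `level`."""
    return [1] * level + [0] * (window - level)


def pdm_wave(level, window):
    """`level` unit-width pulses spread over the window (accumulator method).

    Pulses stay isolated as long as level <= window // 2.
    """
    wave, acc = [], 0
    for _ in range(window):
        acc += level
        if acc >= window:
            acc -= window
            wave.append(1)
        else:
            wave.append(0)
    return wave


def count_edges(wave):
    """Number of 0->1 and 1->0 transitions, with the line idle at 0."""
    padded = [0] + list(wave) + [0]
    return sum(a != b for a, b in zip(padded, padded[1:]))


WINDOW = 16
for level in (2, 4, 6):
    pwm_edges = count_edges(pwm_wave(level, WINDOW))
    pdm_edges = count_edges(pdm_wave(level, WINDOW))
    print(f"value {level}: PWM edges = {pwm_edges}, PDM edges = {pdm_edges}")
```

In this model the number of PWM edges stays constant while the number of PDM edges grows in proportion to the encoded value, which is the qualitative reason given above for the lower switching power of PWM circuits.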
3.2.2 Basic Architecture Using PWM Signals
A basic neural architecture using PWM signals is shown in Fig. 3.1 [8]. The operation is as follows:
(1) PWM signals are transmitted from other neurons.
(2) Weighted summations are performed by converting the PWM signals into charges stored in a capacitor using switched-current sources (SCSs).
(3) The voltage between the nodes of the capacitor, Vout, is compared with the reference signal, and is transformed into a PWM signal.
In neural circuits, bipolar (positive and negative) weights are required corresponding to excitatory and inhibitory synapses. Therefore, this PWM neural architecture must be expanded in order to perform bipolar weighted summation. This is described in Sec. 3.3 [8].
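Steps (2) and (3) can be sketched numerically. The snippet below is a minimal idealized model, not the circuit itself: it assumes ideal switched-current sources, an ideal capacitor and a linear reference ramp, and all component values and function names are hypothetical. Each input pulse of width T_i switches a current I_i onto the capacitor C, so that Vout = (sum of I_i T_i) / C, and the comparator then converts Vout back into a pulse width.

```python
def summed_voltage(currents_a, widths_s, capacitance_f):
    """Capacitor voltage after all pulses: Vout = sum(I_i * T_i) / C."""
    if len(currents_a) != len(widths_s):
        raise ValueError("one pulse width is needed per current source")
    charge_c = sum(i * t for i, t in zip(currents_a, widths_s))
    return charge_c / capacitance_f


def output_pulse_width(v_out, ramp_v_per_s):
    """Time at which a linear ramp starting from 0 V reaches Vout."""
    return v_out / ramp_v_per_s


# Hypothetical values: three synapses, 10 pF summation capacitor.
currents = [1e-6, 2e-6, 4e-6]        # weights as currents (A)
widths = [100e-9, 50e-9, 25e-9]      # input PWM pulse widths (s)
v_out = summed_voltage(currents, widths, 10e-12)
t_out = output_pulse_width(v_out, ramp_v_per_s=1e5)

print(f"Vout = {v_out * 1e3:.1f} mV, Tout = {t_out * 1e9:.0f} ns")
```

With a linear ramp the output pulse width is simply proportional to the weighted sum; a nonlinear reference waveform changes this relation.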
Fig. 3.1 Basic neural architecture using PWM signals. (Figure labels: PWM input signals; switched current sources (SCS) with current Ii (weight); Vout = Σ Ii Ti / C; comparator performing the nonlinear transformation with Vref = f(t), Vout = f(Tout), Tout = f^-1(Vout); PWM output Tout.)
Nonlinear transformation is another key point in neural circuits. The PWM output signal is made by comparing Vout with a ramped reference signal. A nonlinear transformation can be performed in this comparing process by supplying a nonlinear reference waveform. If the reference signal voltage Vref varies nonlinearly in the time domain, i.e. Vref = f(t), where f is a nonlinear function, the pulse width of the output signal, Tout, is given by Tout = f^-1(Vout), where f^-1 is the inverse function of f. In this method, although f is limited to a monotone function, a sigmoidal function, which is often used in many neural network models, is easily generated. However, arbitrary nonlinear nonmonotone transformation is required for constructing general nonlinear dynamical systems. A new approach for arbitrary nonlinear transformation is described in Sec. 3.4 [9].
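The relation Tout = f^-1(Vout) can be checked with a small numerical sketch. The code below is an idealized illustration only: time is normalized to the interval (0, 1), the voltage values and the names `reference` and `pulse_width` are hypothetical, and the comparator is modeled by a bisection search for the instant at which the monotone reference crosses Vout. The reference is chosen as a logit-shaped ramp, so that the resulting pulse width is a sigmoidal function of Vout.

```python
import math


def reference(t, v_mid=2.5, scale=0.5):
    """Monotone reference Vref = f(t) for normalized time 0 < t < 1."""
    return v_mid + scale * math.log(t / (1.0 - t))


def pulse_width(v_out, f, t_lo=1e-9, t_hi=1.0 - 1e-9, iterations=60):
    """Crossing time of f(t) and v_out, i.e. Tout = f^-1(Vout).

    Bisection is valid because f is monotonically increasing.
    """
    for _ in range(iterations):
        t_mid = 0.5 * (t_lo + t_hi)
        if f(t_mid) < v_out:
            t_lo = t_mid      # reference still below Vout: keep waiting
        else:
            t_hi = t_mid      # reference has already crossed Vout
    return 0.5 * (t_lo + t_hi)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


for v_out in (1.5, 2.5, 3.0):
    t_out = pulse_width(v_out, reference)
    expected = sigmoid((v_out - 2.5) / 0.5)
    print(f"Vout = {v_out:.1f} V -> Tout = {t_out:.4f} (sigmoid: {expected:.4f})")
```

Because the search relies on f being monotone, this sketch also shows why the single-reference method cannot produce a nonmonotone transformation.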
Fig. 3.2 PWM neuron circuit with four synapses.
Since the PWM-voltage-PWM transformations are analog operations, much attention should be paid in designing the corresponding circuits. However, establishing the design criteria is easier than in the pure analog approach because the analog parts in PWM circuits are localized.
3.3 A Neural Circuit Using PWM Signals
A neuron circuit based on the above PWM method is shown in Fig. 3.2, where a neuron with four synapses is assumed. PWM input pulses are fed into the synapse circuits Si, i = 1, ..., 4, weighted by the preset synaptic weights, and summed in the bipolar summation circuit SUM. The summation result is obtained as the voltage Vout. The reason why this summation circuit configuration is chosen is described below. Here, we assumed digital memory (4 bits and a sign bit) as the weight memory because it can easily be fabricated using ordinary VLSI fabrication technology. A synapse circuit configuration is shown in Fig. 3.3. However, analog memory is desirable in practical neural chips. One of the authors has developed a practical analog memory device and applied it to an analog neural VLSI chip [15]. Thus, PWM neural chips with analog synaptic memory can be realized.
3.3.1 Bipolar Weighting Methods
PWM signals turn on the current sources and the capacitor is charged up. The synaptic weights are expressed as the current values of the currentsources. The bipolar weighting is achieved by the following two methods as
An Analog-Digital
Merged Circuit Architecture
...
weight sign bit load j w e i g h t - JJ
^
ilCQN CLK-1
PWM
P>>
signal
Vbias
\s°
|[*
* j | °NJ
Isum(+). Isum(-) Fig. 3.3
Synapse circuit with 4-bit digital memory.
illustrated in Fig. 3.4: (A) charging or discharging the single capacitor, and (B) preparing identical two capacitors, charging the corresponding capacitors with absolute values of positive and negative inputs, and subtracting the charges of the negative part from those of the positive part. Method A is simple and it is based on the same idea as Kirchhoff's current law that has been used in analog neural circuits [14]. PWM neural chips that have already been reported [21; 22] use this method without any evaluation about its effectiveness in the PWM approach. However, in this method, it is difficult to attain symmetric charging and discharging operation because PMOS and NMOS FETs are used as current sources, respectively. In addition, the linear summation voltage range is smaller than in method B because of non-ideal characteristics of both FETs. This small summation range is not so serious in analog neural circuits using voltage or current domain. The reason is that threshold operation near zero value is most important while large input values are less important because of saturation characteristics of sigmoidal transfer functions. However, this is not the case in PWM neural circuits. As shown in Fig. 3.5, when unbalanced
68
T. Morie, M. Nagata & A. Iwata (B) positive
negative
LJ
LJ
positive
is negative J Fig. 3.4
, ''"J" -'-c
- J -c
}
~J~
~J~
P W M methods for bipolar-weighted summation. •
«-•
T+ T~=nT*|
,+ r l + 2
I
l
• • •
•
time
,. _
-
h
Vout +
(n-l)T .
T
1 * ••time
without saturation with saturation time
Fig. 3.5
Saturation effect in method A when unbalanced inputs are applied.
inputs are applied, a saturation is caused in bipolar weighted summation operation by the small summation range, which leads to an error. In method B, it is rather easy to obtain high calculation precision because the identical MOSFETs and capacitors are used for positive and negative summations and the relative precision between the identical devices on a chip is very high. There exists the upper limit in summation
of absolute values, as in method A, but the linear range is larger than in method A. As a result, we adopt method B to obtain high accuracy.

3.3.2 Subtraction Operation

Fig. 3.6 Two ways for subtraction operation: (a) serial connection, (b) parallel connection. The figure labels the output in the subtraction mode as Vout = Vc1 - Vc2 + Vr.

In method B, there are two ways of subtracting the charge of the negative part from that of the positive part, as shown in Fig. 3.6: (a) serial connection and (b) parallel connection. If two identical capacitors C are used, the maximum voltage integrable in a capacitor for the parallel connection is twice as large as that for the serial connection. Therefore, the calculation precision of the parallel connection is higher than that of the serial connection. However, the parallel connection is slower than the serial connection because charge redistribution occurs in the former case. For C = 10 pF, SPICE simulation using 0.6 µm CMOS parameters at a 5 V supply voltage showed an operation time of 20-30 ns for the serial connection and 50-60 ns for the parallel connection. Because we consider the calculation precision more important than the operation speed, we adopt the parallel connection.

3.3.3 Summation Accuracy

Fig. 3.7 Parasitic capacitance effect on summation operation.

Even when method B is used, a calculation error arises due to parasitic capacitance. As shown in Fig. 3.7, the charge ΔQn is stored in the parasitic source/drain junction capacitance in the absolute-value summation mode, while it is discarded in the subtraction mode. Because of this effect, the

Fig. 3.21 Bifurcation diagrams observed on the oscilloscope screen: (a) tent map, (b) logistic map. Axis label: Vbias (V).

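As a purely numerical illustration of the two maps named in Fig. 3.21, the following Python sketch iterates the textbook tent map and logistic map and counts how many distinct values remain after a transient. It is not a model of the chip: the control parameter `a` merely stands in for the bias voltage swept in the measurement, and the map definitions, parameter values, and helper names are assumptions chosen for illustration.

```python
def logistic_map(x, a):
    """Textbook logistic map x -> a*x*(1-x); 'a' is an assumed control parameter."""
    return a * x * (1.0 - x)


def tent_map(x, a):
    """Textbook tent map x -> a*min(x, 1-x); 'a' is an assumed control parameter."""
    return a * min(x, 1.0 - x)


def attractor_samples(f, a, x0=0.3, transient=500, keep=64):
    """Iterate map f, discard the transient, and return the next 'keep' values."""
    x = x0
    for _ in range(transient):
        x = f(x, a)
    samples = []
    for _ in range(keep):
        x = f(x, a)
        samples.append(x)
    return samples


def count_distinct(samples, tol=1e-6):
    """Count values that differ by more than tol (one column of a bifurcation diagram)."""
    ordered = sorted(samples)
    count = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > tol:
            count += 1
    return count


if __name__ == "__main__":
    # Sweep the control parameter and report how many branches each column has.
    for a in (2.8, 3.2, 3.9):
        n = count_distinct(attractor_samples(logistic_map, a))
        print(f"logistic a={a}: {n} distinct value(s)")
    for a in (0.5, 1.9):
        n = count_distinct(attractor_samples(tent_map, a))
        print(f"tent     a={a}: {n} distinct value(s)")
```

Plotting the returned samples against `a` for a fine sweep gives the usual bifurcation diagram; a single value corresponds to a fixed point, two values to a period-2 orbit, and many values to the chaotic regime.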
using a 0.6 µm CMOS process. The measurement results demonstrated that the overall precision of the weighted summation and the sigmoidal transformation is 5 bits. Although the precision achieved in the prototype chip is lower than expected from SPICE simulation, our analysis indicates that the precision can be improved to around 8 bits by optimizing the circuit design. We have also attained arbitrary nonlinear analog transformation using PWM signals, which cannot be realized by either the ordinary analog approach or the digital approach. We fabricated a CMOS chaos generator chip using a 0.4 µm CMOS process. This chip exhibited chaotic behavior as predicted by numerical simulation. In the future, practical VLSI systems implementing arbitrary analog nonlinear dynamics will be constructed using this architecture. It will provide new hardware for various bio-inspired models such as advanced associative memory, chaotic neural networks, and nonlinear oscillator networks. Moreover, it can also provide new hardware that can dynamically change its analog dynamics. We expect that such hardware will lead to proposals of innovative information processing models.

Acknowledgments

The authors would like to thank Jun Funakoshi and Souta Sakabayashi for their contributions to this work. This work was supported by the Ministry of Education, Science, Sports, and Culture under Grant-in-Aid for Scientific Research on Priority Areas, "Ultimate Integration of Intelligence on Silicon Electronic Systems" (Head Investigator: Tadahiro Ohmi, Tohoku University). This work was also supported in part by The Mazda Foundation's Research Grant.
References

[1] K. Aihara, T. Takabe, and M. Toyoda, "Chaotic Neural Networks," Phys. Lett. A, 144, pp. 333-340, 1990.
[2] H. Nozawa, "A Neural Network Model as a Globally Coupled Map and Applications Based on Chaos," Chaos, 2, pp. 377-386, 1992.
[3] S. Ishii, K. Fukumizu, and S. Watanabe, "A Network of Chaotic Elements for Information Processing," Neural Networks, 9, pp. 25-40, 1996.
[4] M. Morita, "Associative Memory with Nonmonotone Dynamics," Neural Networks, 6, pp. 115-126, 1993.
[5] H. Kakeya and T. Kindo, "Hierarchical Concept Formation in Associative Memory Composed of Neuro-window Elements," Neural Networks, 9, pp. 1095-1098, 1996.
[6] T. Miki, M. Shimono, and T. Yamakawa, "A Chaos Hardware Unit Employing the Peak Point Modulation," Proc. Int. Symp. Nonlinear Theory and its Applications, pp. 25-30, 1995.
[7] D. L. Wang and D. Terman, "Image Segmentation Based on Oscillatory Correlation," Neural Computation, 9, pp. 805-836, 1997.
[8] T. Morie, J. Funakoshi, M. Nagata, and A. Iwata, "An Analog-Digital Merged Neural Circuit Using Pulse Width Modulation Technique," IEICE Trans. Fundamentals, E82-A, pp. 356-363, 1999.
[9] T. Morie, S. Sakabayashi, M. Nagata, and A. Iwata, "Nonlinear Dynamical Systems Utilizing Pulse Modulation Signals and a CMOS Chip Generating Arbitrary Chaos," Proc. 7th Int. Conf. on Microelectronics for Neural, Fuzzy and Bio-inspired Systems (MicroNeuro '99), pp. 254-260, Granada, 1999.
[10] C. Park, K. Buckmann, J. Diamond, U. Santoni, S. The, M. Holler, M. Glier, C. Scofield, and L. Nunez, "A Radial Basis Function Neural Network with On-chip Learning," Proc. Int. Joint Conf. on Neural Networks, pp. 3035-3038, 1993.
[11] Y. Kondo, Y. Koshiba, Y. Arima, M. Murasaki, T. Yamada, H. Amishiro, H. Shinohara, and H. Mori, "A 1.2GFLOPS Neural Network Chip Exhibiting Fast Convergence," IEEE Int. Solid-State Circuits Conf. Dig., pp. 218-219, 1994.
[12] O. Saito, K. Aihara, O. Fujita, and K. Uchimura, "A 1M Synapse Self-Learning Digital Neural Network Chip," IEEE Int. Solid-State Circuits Conf. Dig., pp. 94-95, 1998.
[13] C. R. Schneider and H. C. Card, "Analog CMOS Deterministic Boltzmann Circuits," IEEE J. Solid-State Circuits, 28, pp. 907-914, 1993.
[14] T. Morie and Y. Amemiya, "An All-analog Expandable Neural Network LSI with On-chip Backpropagation Learning," IEEE J. Solid-State Circuits, 29, pp. 1086-1093, 1994.
[15] T. Morie, O. Fujita, and K. Uchimura, "Self-Learning Analog Neural Network LSI with High-Resolution Non-Volatile Analog Memory and a Partially-Serial Weight-Update Architecture," IEICE Trans. Electron., E80-C, pp. 990-995, 1997.
[16] R. C. Frye, E. A. Rietman, and C. C. Wong, "Back-Propagation Learning and Nonidealities in Analog Neural Network Hardware," IEEE Trans. Neural Networks, 2, pp. 110-117, 1991.
[17] T. Morie, O. Fujita, and Y. Amemiya, "Analog VLSI Implementation of Adaptive Algorithms by an Extended Hebbian Synapse Circuit," IEICE Trans. Electron., E75-C, pp. 303-311, 1992.
[18] Y. Hirai and M. Yasunaga, "A PDM Digital Neural Network System with 1,000 Neurons Fully Interconnected via 1,000,000 6-bit Synapses," Proc. ICONIP, pp. 1251-1256, 1996.
[19] A. Iwata and M. Nagata, "A Concept of Analog-Digital Merged Circuit Architecture for Future VLSI's," IEICE Trans. Fundamentals, E79-A, pp. 145-157, 1996.
[20] M. Nagata, J. Funakoshi, and A. Iwata, "A PWM Signal Processing Core Circuit Based on a Switched Current Integration Technique," IEEE J. Solid-State Circuits, 33, pp. 53-60, 1998.
[21] E. I. El-Masry, H. K. Yang, and M. A. Yakout, "Implementations of Artificial Neural Networks Using Current-Mode Pulse Width Modulation Technique," IEEE Trans. Neural Networks, 8, pp. 532-548, 1997.
[22] J. C. Bor and C. Y. Wu, "Realization of the CMOS Pulsewidth-Modulation (PWM) Neural Network with On-Chip Learning," IEEE Trans. Circuits & Syst. II, 45, pp. 96-107, 1998.
[23] P. W. Hollis, J. S. Harper, and J. J. Paulos, "The Effect of Precision Constraints in a Backpropagation Learning Network," Neural Computation, 2, pp. 363-373, 1990.
[24] D. D. Caviglia, M. Valle, and G. M. Bisio, "Effects of Weight Discretization on the Back Propagation Learning Method: Algorithm Design and Hardware Realization," Proc. Int. Joint Conf. on Neural Networks, pp. II-631-637, 1990.
[25] J. L. Holt and J. Hwang, "Finite Precision Error Analysis of Neural Network Electronic Hardware Implementations," Proc. Int. Joint Conf. on Neural Networks, pp. I-519-525, Seattle, 1991.
[26] B. W. Lee and S. W. Kim, "Required Dynamic Range and Accuracy of Electronic Synapses for Character Recognition Applications," IEEE Proc. of Int. Symp. Circuits and Systems, pp. 1545-1548, San Diego, 1992.
[27] K. Eguchi and T. Inoue, "A Current-Mode Analog Chaos Circuit Realizing a Henon Map," IEICE Trans. Electron., E80-C, pp. 1063-1066, 1997.
[28] T. Morie, S. Sakabayashi, M. Nagata, and A. Iwata, "Nonlinear Function Generators and Chaotic Signal Generators Using a Pulse-Width Modulation Method," Electron. Lett., 33, pp. 1351-1352, 1997.
Chapter 4

Application-Driven Design of Bio-Inspired Low-Power Vision Circuits & Systems

Andreas König, Jan Skribanowitz, Jens Döge, Michael Eberhardt, and Thomas Knobloch

Dresden University of Technology
Abstract

Natural vision systems are still unrivaled with regard to performance, size, and power consumption in comparison to today's technical, predominantly digital implementations. This holds especially for complex vision tasks, e.g., image sequence analysis. Application-specific constraints imposed by many real-time vision tasks can be met by an opportunistic design of bio-inspired circuits and systems employing analog and mixed-signal design techniques. Consequently, a plethora of vision chips exploiting basic principles have been designed, but only a few can actually serve in real applications. Today, the modeling of complete application systems requires a hybrid approach and an appropriate design methodology to assure the viability of the resulting integrated system. This paper reports on a research activity that tackles the development of a corresponding design methodology. Several application projects, e.g., OCR, automotive image processing, eye tracking, and visual inspection, will be introduced that were subject to this design methodology and gave feedback to advance the methodology for the systematic design of integrated cognitive systems.

Keywords: bio-inspired VLSI systems, systematic low-power mixed-signal design, design methodology, CMOS image sensors, vision chips, automotive applications, overtake monitoring, 3D-displays, eye-trackers, image coding, OCR
4.1 Introduction
Numerous machine vision problems, e.g., complex surveillance tasks, automotive applications, or automated visual inspection and visual process control, impose high demands on viable solutions in terms of size, speed, performance, and power consumption. Furthermore, cost and turnaround time are critical factors. Today's predominantly digital systems cannot always provide an adequate solution with available state-of-the-art hardware under all of the constraints specified above. In contrast, biological systems frequently surpass man-made structures with respect to these requirements. Therefore, the systematic technological exploitation of salient features from the wealth of biological and physiological evidence by bio-inspired algorithms and circuit implementations is relevant for advanced microelectronic application solutions. This especially holds for issues of power dissipation and the related high input currents and heat dissipation. The SIA roadmap [30] points out that with ongoing feature size reduction and technological advance, power consumption increases or, at best, stagnates. In addition to benefits in power consumption, bio-inspired systems that mimic the fault tolerance by graceful degradation and the adaptation and learning capability found in biology can alleviate problems met in the design, yield, and test of today's complex integrated circuits and systems.

However, the lessons learned from neural network hardware design versus the development and applicability of general purpose (GP) hardware, e.g., from the surging communication market and the respective low-voltage and low-power implementations, have to be taken into account to achieve dedicated system solutions that are technically and economically sound and viable with regard to competing GP solutions. In particular, systems that employ complex spatio-temporal processing principles observed in biological systems ([10], [9]) are still not within reach of GP digital hardware under the constraints of size, power consumption, and cost. Complex problems that benefit from such principles and their respective implementation are thus the most attractive candidates for dedicated implementation efforts. In conjunction with an opportunistic design style [36], biological principles and bio-inspired circuits employing analog and mixed-signal design techniques in a potentially massively parallel architecture allow computationally burdensome tasks to be handled very efficiently. In particular, the combination of image acquisition and early vision processing is attractive for system solutions. Salient yet complex phenomena such as recurrent feature maps, selective attention, temporal binding of features, as well as habituation and adaptation processes can thus be efficiently implemented and exploited in technical applications.

Generic system solutions as well as complete bio-inspired systems in the described domain are still out of the question because, with today's technology, the implementation of the required complexity under the given constraints is not yet within reach. Nevertheless, hybrid systems in CMOS technology with advanced bio-inspired spatio-temporal processing and dynamics in analog technology, including CMOS-compatible sensors, can be combined with dedicated digital processing to form competitive, dedicated, integrated system solutions. Remarkable examples are reported in [23], [15], and [2] on implementations of an "artificial retina" chip and related systems for 3D human motion recognition as well as general image preprocessing, or CSEM's motion detector chip for pointing devices [1], which is used in Logitech's Marble trackballs [25] as part of a commercial product.

To widen the scope for additional application domains, to exploit the described potential, and to meet constraints of turnaround time, development and design costs, as well as overall system validity and performance, an efficient design methodology is required. In our work, we introduce such a methodology and enhance the standard design flow by a level for fast behavioral modeling of the vision task. Our objective is to ease and advance the design of hybrid application-specific vision systems incorporating bio-inspired algorithms in an opportunistic, low-power design style, so that today's industrial application needs, constraints, and requirements are met. Special emphasis is put on the realization of high-performance yet extremely power-conserving circuits and systems with optimum exploitation of today's microelectronics potential.

In the following section, our design methodology is presented. Then, examples of application-specific integrated vision-based recognition systems, their modeling, and their implementation are described. In conclusion, we assess the current state and future aims of our design methodology.
4.2 Methodology for application-specific design of vision circuits and systems
The introduction pointed out the need for an efficient top-down design methodology tailored to the design flow of vision and recognition systems. Typically, the design of such a general intelligent system starts with a coarse specification of the problem and the aspired solution. Based on available examples and/or knowledge, a first-cut reference system must be designed to assure the viability of the solution by simulations. After system optimization and robustness tests, the VLSI design effort can be started using the simulation system as a reference and its results as benchmarks. Evidently, the flexibility of the design platform as well as the availability of suitable system performance measures, e.g., in terms of discriminance and recognition ability, are crucial for the success and rapid advance of the design effort.

Fig. 4.1 Enhanced Y diagram dedicated to mixed-signal cognitive systems design.

Employing the simulation system as the baseline together with a behavioral description, the design process now advances by repeated partitioning and elaboration of building blocks of lower complexity. In the process, the design description is both detailed and advanced from behavioral to functional and, finally, to geometrical representation. Design decisions and compromises take place, e.g., the choice of a certain fixed-point computational accuracy or the selection of a specific circuit technology. These design options, though they might bring benefits concerning area or power consumption, can be extremely detrimental to overall system performance and, thus, can put the viability of
the overall design into question. Following the basic idea and methodology applied in the systematic design of neural network hardware, neurochips, and neurocomputers [17], and extending this experience to the system level, a methodology for the systematic and optimized design of integrated cognitive systems can be introduced here. Figure 4.1 visualizes the approach, which extends the well-known Y-diagram of Gajski and Kuhn [7] to the issues implied by the mixed-signal implementation of cognitive systems. The top-down design process described above, from the concept level to algorithmic representations in both C/C++ and hardware description languages such as Verilog/Verilog-A and VHDL/VHDL-A, and the conversion into structural and, finally, geometrical representations, is complemented by feedback and assessment paths from the various levels of description and representation to the reference system. Thus, in principle, chosen design options can be validated systematically and rapidly, and the viability of the chip design can be assured while minimizing design time, effort, and related costs. It is obvious from this discussion that the properties of the tool for reference system modeling and design state assessment are crucial for the overall success of the proposed methodology. For this purpose, the QuickCog system has been devised in a concurrent research project (cf., e.g., [21], [20], [19]). The general architecture of the adaptive QuickCog system is given in Fig. 4.2. It meets the needs of rapid reference system modeling by providing the following key features:

(1) Visual programming of block diagrams for system modeling.
(2) Sample set oriented processing.
(3) A large collection of significant and proven methods for image processing, pattern recognition, and artificial neural networks. Currently, bio-inspired information processing methods are included in the system.
(4) A convenient and intuitive graphical user interface (GUI) that supports data acquisition (e.g., images or image sequences), sample set creation for learning from examples, region of interest (ROI) definition, and object partitioning as well as preclassification.
(5) Feature space visualization based on multivariate data projection and interactive visualization techniques. This visualization gives insight into the current problem characteristics, e.g., feature discriminance, separability, class overlap, or the number of modes per class. Gradual degradation in system performance due to chosen design options can be detected in the feature space visualization.
(6) Assessment functions related to feature space visualization. These measures can also serve to detect and assess degradation in system performance due to chosen design options.
(7) Automatic feature selection, method selection, and method parameter optimization.
(8) A comprehensive classifier toolbox, from simple centroid to powerful nonparametric classifiers, comprising statistical approaches as well as neural networks.

Fig. 4.2 QuickCog adaptive system architecture.

Thus, QuickCog provides a platform for fast and efficient modeling of integrated cognitive systems from reference system modeling to implementation
evaluation by using QuickCog's unique and powerful modules for the assessment of a system's efficacy. The adaptive features of the architecture, which considerably facilitate and accelerate reference system modeling and reduce overall turnaround time, can also be exploited for pure software system design. Therefore, QuickCog also serves as a commercial tool for general visual inspection tasks. One example is given in Fig. 4.3.

Fig. 4.3 QuickCog applied in an electronics manufacturing task.

In the following, the methodology is elucidated by a very simple design example. For the well-known Iris data, a classifier was designed in CMOS technology operating in subthreshold mode. The classifier implements the recall structure required for learning vector quantization (LVQ) [22] or nearest-neighbor techniques (kNN) [6] (cf. Fig. 4.4). The training reference system for this simple example is given in Fig. 4.5. The training system is complemented by tools for feature space visualization and assessment [18]. In an actual application, the feature input of the classifier would stem from an image processing and feature extraction hierarchy. Figure 4.6 shows the reference test system and the modified test system, which incorporates the hardware model of the classifier. In the block Stimuli In/Out, stimuli are converted and handed down the design hierarchy for simulations, and the simulation results are fed back to the system for further processing as well as result analysis and assessment.

Fig. 4.4 LVQ and kNN classifier recall architecture.

Fig. 4.5 LVQ and kNN classifier training in QuickCog.

Figure 4.7 shows the comparison of the reference and the hardware test system, employing confusion matrices for the analysis of classifier recall results. Evidently, the classifier's performance deteriorated from 93% to 73% due to design and circuit imperfections. For directed optimization, which is a substantial part of our design methodology, the statistical performance on its own does not provide sufficient information. In addition to the overall classification rate and the confusion rates between individual classes, the location of misclassified patterns in feature space as well as the respective feature values are of significant interest. For this purpose, our methodology uses multivariate data
projection and advanced interactive feature space visualization offered by the QuickCog system [18]. These can be employed to understand the problem and optimize the circuit.

Fig. 4.6 LVQ and kNN classifier recall by reference and hardware model.

Fig. 4.7 Confusion matrices of classification results: (a) reference, (b) hardware model.

Figure 4.8 shows the feature space with the imposed class labels of the reference and the hardware model, respectively. At each projection point, the feature values are plotted in a radial representation. From the feature space visualization, it becomes obvious that misclassifications strongly correlate with very large feature values. Thus, the current circuit and its dimensioning still suffer from a saturation problem for large feature and metric values.

Fig. 4.8 Feature space with classification results: (a) reference, (b) hardware model.

Fig. 4.9 Classifier circuits: (a) subtraction, (b) absolute value computation.

This system-oriented analysis sustains
the consistency and information processing properties of the design. It can be employed on the behavioral level, e.g., in Verilog or VHDL, on the functional level, as given in Fig. 4.9 for two standard subcircuits of the classifier, as well as on the geometrical level, e.g., in simulations based on the extracted layout of the circuits in question (cf. Fig. 4.10).

Fig. 4.10 Classifier circuit layout: (a) subtraction, (b) absolute value computation.

The simple classifier example gives an idea of the design methodology and the assessment functions provided in QuickCog. More visual and numeric assessment functions are available, e.g., for image comparison and assessment, feature space assessment, and classifier performance assessment based on estimated a posteriori values. These make it possible to detect failures as well as to disclose gradual degradation caused by design decisions, and they provide a means to systematically correct and optimize the design.

In ongoing extensions of our methodology, we have begun to exploit adaptation and learning mechanisms for the compensation of circuit imperfections. Similar to the standard approach used for training analog neural network chips, e.g., the Intel ETANN chip, which is known as hardware-in-the-loop learning, models of the imperfect circuits will be generated and incorporated in the system configuration and learning phase. Extending QuickCog to a true self-learning system, imperfections of innovative devices, circuits, and resulting structures can be overcome by:

(1) Compensation employing learning of the following stages.
(2) Learning in the same stage with a model of the imperfect circuit, e.g., optimizing design parameters, coefficients, or degree of parallelism.
(3) Learning of all stages for optimization and compensation.

So far, our methodology provides a way of systematic, consistent system design. The issue of design automation with a focus on synthesis techniques has not yet been tackled in our project work. However, many research activities on analog and mixed-signal design synthesis can be observed that can be exploited and integrated with our work in the future.

In the next section, several chip and system design examples are presented that profited from the described design methodology. These implementations were not only subject to the design methodology but actively contributed to it by feeding back experience, algorithms, methods, and simulation techniques. The salient essence of the realized project work was extracted and integrated into the QuickCog system to ease and advance the design of future application-specific integrated cognitive systems.
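The comparison of a reference classifier with a hardware model by means of confusion matrices, as used in the classifier example of this section, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the QuickCog implementation: the toy data set (not the Iris data), the nearest-prototype recall with Manhattan distance, and the hardware model that simply clips input features at a `saturation` level are all hypothetical choices made to reproduce the kind of effect described (misclassifications concentrated at large feature values).

```python
def manhattan(a, b):
    """Sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(sample, prototypes, saturation=None):
    """Nearest-prototype recall. If 'saturation' is given, input features are
    clipped first (assumed, simplified model of a saturating input stage)."""
    if saturation is not None:
        sample = [min(v, saturation) for v in sample]
    return min(prototypes, key=lambda label: manhattan(sample, prototypes[label]))


def confusion_matrix(samples, labels, prototypes, saturation=None):
    """Rows: true class, columns: predicted class."""
    n = len(prototypes)
    matrix = [[0] * n for _ in range(n)]
    for sample, true_label in zip(samples, labels):
        matrix[true_label][classify(sample, prototypes, saturation)] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Hypothetical toy data: three classes, two features each.
PROTOTYPES = {0: [1.0, 1.0], 1: [4.0, 4.0], 2: [7.0, 7.0]}
SAMPLES = [[1.0, 1.2], [0.8, 0.9], [1.5, 1.1],
           [4.0, 3.8], [3.6, 4.3], [4.4, 4.1],
           [6.5, 7.2], [7.1, 6.8], [5.8, 6.1]]
LABELS = [0, 0, 0, 1, 1, 1, 2, 2, 2]

if __name__ == "__main__":
    reference = confusion_matrix(SAMPLES, LABELS, PROTOTYPES)
    hardware = confusion_matrix(SAMPLES, LABELS, PROTOTYPES, saturation=5.0)
    print("reference:", reference, accuracy(reference))
    print("hardware: ", hardware, accuracy(hardware))
```

In this toy setting only the class with the largest feature values is affected by the clipping, which is exactly the kind of information that the overall classification rate alone would hide and that the confusion matrix and feature values reveal.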
4.3 Design examples of integrated low-power vision systems

4.3.1 OCR chip for consumption meter read-out
In cooperation with an industrial partner, we developed a reference system and an embedded image sensor for automated visual consumption meter read-out [31]. Such a device is commercially interesting for utility companies because a large number of mechanical meters still have to be read out manually. Our approach consists of a smart sensor system snap-attached to conventional mechanical meters. The OCR chip features bio-inspired algorithms in the digital part and was our first choice for the further development of our design methodology. Processing methods as well as design parameters, e.g., sensor size and resolution, have been determined systematically. Starting with a clear system specification, an algorithm was derived, validated, and subsequently optimized. Because errors can be detected at an early stage of the design hierarchy, time-consuming and expensive redesigns are avoided. Due to the moderate processing speed requirements, it was possible to optimize the OCR algorithm for low complexity and high discriminance. Exploiting the salient properties of QuickCog, an appropriate system configuration that copes with partly occluded characters, caused by gradual digit transition in the meters, could be determined rapidly. A template matching approach as shown in Fig. 4.11 was successfully employed. For the software prototype, the Euclidean distance was initially used as the similarity measure. However, following our design methodology, systematic simulations showed that the Manhattan distance could be used equally well, and it is much more convenient for VLSI implementation. Our system is capable of providing both a classification of the dominantly visible digit and the exact meter wheel position. The achieved performance was only one error in 70,000 digit images. A prototype of the recognition system has been developed, comprising a PC and a CCD camera connected to a commercial frame grabber.
The camera can be attached to consumption meters by means of an adapter that also contains the LED illumination. This setup has been tested successfully by our industrial partner. One drawback of the system is the large memory required for storing the templates, which have to be defined for each new meter and the respective font type. Therefore, a multistage neuro-inspired architecture (Fig. 4.11) was proposed. This hierarchical, stroke-based recognition approach is described in detail in [8].

Fig. 4.11 Recognition system: (a) template matching, (b) neuro-inspired approach.

The architecture of our OCR sensor chip is depicted in Fig. 4.12 [8]. Behavioral simulations of the recognition system showed that a resolution of six bits is sufficient for the system requirements. In order to widen the field of application, local storage has been integrated into each pixel cell, allowing random-access read-out. Each pixel cell contains an active, integrating core cell employing a diffusion-substrate diode as the photosensitive element. A switched-capacitor (SC) amplifier allows both simple read-out and correlated double sampling (CDS). Due to the moderate speed requirements, a two-step flash architecture with a resolution of three bits has been chosen for the A/D converter. The on-chip finite state machine generates all internal clock signals and is controlled by five input signals. Figure 4.13 shows the pixel cell layout, which measures (31.6 µm)². The photosensitive area amounts to (15.0 µm)². By predominantly using minimum-size structures, a fill factor of 22.5% could be achieved. The source follower and reset transistors have been designed with twice the minimum structure size in order
102 A. Konig, J. Skribanowitz, M. Eberhardt, J. Doge & T. Knobloch
Fig. 4.12 Architecture of the simple CMOS OCR sensor. (Recoverable block labels: Control-Unit, Pixel Array, x-Address Decoder, Address Decoder, Address-Latch, Analog Output, Digital Output; legend: digital, mixed, analog.)
to achieve better matching and to reduce fixed-pattern noise. The chip
Fig. 4.13 OCR sensor chip, (a) Layout of pixel cell, (b) Chip photograph.
has been fabricated and successfully tested. Figure 4.13 (b) depicts a chip photograph. The sensor features a frame rate of up to 20 images per second. A QuickCog-based system for the read-out task with a captured sample image and matching templates is shown in Fig. 4.14. As our OCR algorithm has to be able to deal with poor illumination conditions, the local adaptation concept, discussed in Section 4.3.4, has been used as an improvement. Summarizing, a QuickCog-based reference system has been developed and
Fig. 4.14 QuickCog OCR system employing CMOS sensor images.
assessed. Crucial VLSI design parameters, e.g., the sensor's spatial resolution and pixel value resolution as well as the metric for the recognition process, have been determined by systematic simulations and used in a mixed-signal design effort. The image sensor's operation could be verified, and acquired meter images were correctly classified by the reference system, thus proving the viability of the design approach as well as the validity of the design itself. The neuro-inspired OCR algorithm itself was modeled in Verilog and simulated [8], but due to funding limitations it has not been manufactured so far.
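As a quick cross-check of the sensor figures quoted in this section, the following minimal Python sketch recomputes the fill factor from the stated pixel and photodiode side lengths and models a two-step flash conversion. The function names, the normalized reference voltage `v_ref`, and the assumption that each of the two flash steps resolves three bits (so that together they reach the six-bit resolution found sufficient in simulation) are illustrative choices, not details taken from the chip design.

```python
def fill_factor(photo_side_um: float, pixel_side_um: float) -> float:
    """Ratio of photosensitive area to total pixel area (square geometries)."""
    return (photo_side_um / pixel_side_um) ** 2


def two_step_flash(v: float, v_ref: float = 1.0, bits_per_step: int = 3) -> int:
    """Idealized two-step flash conversion of v in [0, v_ref].

    Assumption (not from the source): a coarse flash step resolves the upper
    `bits_per_step` bits, the residue is then resolved by a fine flash step
    with the same number of bits.
    """
    levels = 1 << bits_per_step            # comparator levels per step
    v = min(max(v, 0.0), v_ref)            # clamp to the input range
    lsb_coarse = v_ref / levels            # width of one coarse bin

    # Coarse step: which coarse bin does the input fall into?
    coarse = min(int(v / lsb_coarse), levels - 1)

    # Fine step: quantize the residue inside that coarse bin.
    residue = v - coarse * lsb_coarse
    fine = min(int(residue / lsb_coarse * levels), levels - 1)

    return coarse * levels + fine          # combined output code


if __name__ == "__main__":
    print(f"fill factor: {100 * fill_factor(15.0, 31.6):.1f} %")
    print("code for mid-scale input:", two_step_flash(0.5))
```

Under these assumptions, `fill_factor(15.0, 31.6)` evaluates to roughly 0.225, in line with the quoted 22.5%, and the combined output code spans 0 to 63, i.e. six bits.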
4.3.2 Overtake monitor and eye-tracker
As outlined in the introduction, vision problems requiring complex tasks such as spatio-temporal image sequence processing impose high demands on a viable solution and are thus first-choice candidates for a dedicated VLSI implementation. For instance, in the automotive area, autonomous vehicle guidance, driver assistance (collision avoidance, control of the distance to vehicles in front, detection of drowsy drivers, overtake monitoring) as well as the surveillance of the car interior are such challenging tasks, which also have significant economic relevance. For our design activities, we regarded overtake monitoring as especially suited. Passing vehicles are to be monitored by the system, and the driver should be warned not to change lanes in dangerous situations. After a comprehensive study of motion detection and tracking methods (such as optical flow, [12] and [29]), we focused on feature-based schemes (using corners and edges) with bio-inspired preprocessing and developed a prototype system simulator for overtake monitoring (OTM). Inspired by the ASSET-2 system introduced by Smith [33], we developed a smart, hardware-oriented algorithm that tracks and clusters these features as well as the resulting objects. The system simulator carries out a risk assessment for changing lanes and computes the driver warning in dangerous situations. Figure 4.15 (a) shows the implemented processing steps, while Fig. 4.15 (b) gives a demonstration of the OTM simulator capabilities, based on a two-lane highway scene. Spatio-temporal smoothing as preprocessing, as used by Nagel in his work on optical flow [29], was found to be important for stable, reliable corner detection. Processing steps that are shown gray-colored in Fig. 4.15 (a) are suitable for an implementation in analog hardware. Furthermore, the corner detection stage can also be integrated on the CMOS image sensor, reducing the required complexity and data throughput of the digital part.
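The preprocessing chain described above (spatio-temporal smoothing followed by corner detection) can be illustrated with a small pure-Python sketch. The chapter does not specify the smoothing kernels or the corner detector, so everything concrete here is an assumption: a 3x3 binomial kernel for spatial smoothing, a first-order recursive filter with weight `alpha` for temporal smoothing, and a Harris-style corner response used as a stand-in for the actual corner stage. All function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def _at(a: Image, y: int, x: int) -> float:
    """Pixel access with edge replication (indices clamped to the image)."""
    h, w = len(a), len(a[0])
    return a[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def smooth_spatial(img: Image) -> Image:
    """3x3 binomial smoothing (separable 1-2-1 weights, normalized by 16)."""
    k = (1, 2, 1)
    return [[sum(k[dy + 1] * k[dx + 1] * _at(img, y + dy, x + dx)
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 16.0
             for x in range(len(img[0]))]
            for y in range(len(img))]


def smooth_temporal(prev: Image, cur: Image, alpha: float = 0.5) -> Image:
    """First-order recursive smoothing along time: alpha*cur + (1-alpha)*prev."""
    return [[alpha * c + (1.0 - alpha) * p for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, cur)]


def corner_response(img: Image, k: float = 0.04) -> Image:
    """Harris-style response det(M) - k*trace(M)^2 over a 3x3 window.

    Stand-in detector (assumption): large positive values indicate corners,
    negative values indicate straight edges, zero indicates flat regions.
    """
    h, w = len(img), len(img[0])
    # Central-difference gradients.
    ix = [[(_at(img, y, x + 1) - _at(img, y, x - 1)) / 2 for x in range(w)]
          for y in range(h)]
    iy = [[(_at(img, y + 1, x) - _at(img, y - 1, x)) / 2 for x in range(w)]
          for y in range(h)]
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx = _at(ix, y + dy, x + dx)
                    gy = _at(iy, y + dy, x + dx)
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            row.append(sxx * syy - sxy * sxy - k * (sxx + syy) ** 2)
        out.append(row)
    return out


if __name__ == "__main__":
    # Synthetic frame: bright lower-right quadrant with one corner at (4, 4).
    frame = [[1.0 if y >= 4 and x >= 4 else 0.0 for x in range(9)]
             for y in range(9)]
    smoothed = smooth_spatial(smooth_temporal(frame, frame))
    resp = corner_response(smoothed)
    peak = max((resp[y][x], y, x) for y in range(9) for x in range(9))
    print("strongest response (value, row, column):", peak)
```

In a tracking system, local maxima of the response above a threshold would be passed on as corner features to the clustering stage; that step is omitted here.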
Detection and tracking of faces in image sequences, e.g., of user eye pairs for the control of three-dimensional displays and graphical user interfaces, is an excellent application in the field of multimedia and surveillance. In particular, autostereoscopic 3D displays are the subject of intensive research and development due to their significant market potential. For a correct stereoscopic impression, the exact pupil positions of the observers' eyes
Fig. 4.15 Overtake monitor, (a) Flow diagram, (b) Simulation results. (Recoverable block label: Risk Assessment and Warning of Driver.)
must be known in order to control the 3D display hardware and software. In contrast to [5], we chose a monofocal approach, which places higher demands on the vision system in terms of processing power. System integration into one single device, together with image plane processing and advanced techniques such as local adaptation (cf. Section 4.3.4), has the potential to compensate for this, resulting in a low-cost, compact system. The prototype system has been developed using the same design approach as for the overtake monitor [32]. The implemented processing steps of our hardware-friendly algorithm are shown in Fig. 4.16 (a). Again, the gray-colored boxes mark the processing steps that are the most attractive candidates for an analog, bio-inspired, mixed-signal VLSI implementation. As in the overtake monitor, the image sequence is first spatio-temporally smoothed. Next, contour maps are derived using a Difference-of-Gaussian (DoG) filter and zero-crossing evaluation. Approximations to DoG filters can conveniently be implemented in analog hardware. Their successful application in modeling the receptive fields of the human visual system [37] made them even more appealing. The identified contours are represented by polylines and polygons for easier and faster processing. Eye
region candidates are detected and tracked by a rule-based approach. The algorithm employs pairwise matching of eye region candidates and filter-based extraction of pupils within the eye shape regions to determine and output the pupil coordinates of valid eye pairs. Figure 4.16 (b) demonstrates our current eye-tracker system simulator. Figure 4.17 shows the modular QuickCog implementation of the eye-tracker system, which will serve for step-by-step hardware modeling according to our design methodology. The salient image preprocessing features common to both applica-