ADVANCED TOPICS IN SCIENCE AND TECHNOLOGY IN CHINA
ADVANCED TOPICS IN SCIENCE AND TECHNOLOGY IN CHINA Zhejiang University is one of the leading universities in China. In Advanced Topics in Science and Technology in China, Zhejiang University Press and Springer jointly publish monographs by Chinese scholars and professors, as well as invited authors and editors from abroad who are outstanding experts and scholars in their fields. This series will be of interest to researchers, lecturers, and graduate students alike. Advanced Topics in Science and Technology in China aims to present the latest and most cutting-edge theories, techniques, and methodologies in various research areas in China. It covers all disciplines in the fields of natural science and technology, including but not limited to, computer science, materials science, life sciences, engineering, environmental sciences, mathematics, and physics.
Xingui He  Shaohua Xu
Process Neural Networks Theory and Applications
With 78 figures
ZHEJIANG UNIVERSITY PRESS
Springer
Authors
Prof. Xingui He
School of Electronic Engineering and Computer Science
Peking University
100871, Beijing, China
E-mail: hexg@cae.cn
Prof. Shaohua Xu
School of Electronic Engineering and Computer Science
Peking University
100871, Beijing, China
E-mail: xush62@163.com
Based on the original Chinese edition: Guocheng Shenjing Yuan Wangluo (Process Neural Networks), Science Press, 2007.
ISSN 1995-6819 e-ISSN 1995-6827 Advanced Topics in Science and Technology in China ISBN 978-7-308-05511-6 Zhejiang University Press, Hangzhou ISBN 978-3-540-73761-2 e-ISBN 978-3-540-73762-9 Springer Dordrecht Heidelberg London New York Library of Congress Control Number: 2008935452
© Zhejiang University Press, Hangzhou and Springer-Verlag Berlin Heidelberg 2009 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. springer.com
Cover design: Frido Steinen-Broo, eStudio Calamar, Spain
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The original idea for this book came from a conference on applications of agricultural expert systems, which may not seem obvious. During the conference, the ceaseless reports and repetitious content made me think that the problems the attendees discussed so intensely, no matter which kind of crop planting was involved, could be thought of as the same problem, i.e. a "functional problem" from the viewpoint of a mathematical expert. To achieve some planting indexes, e.g. output or quality, whatever the crop grown, the different means of control performed by the farmers, e.g. reasonable fertilization and control of illumination, temperature, humidity, concentration of CO2, etc., can all be seen as diversified time-varying control processes starting from sowing and ending at harvest. They could just as easily be seen as the inputs for the whole crop growth process. The yield or quality index of the plant can then be considered as a functional dependent on these time-varying processes, and the pursuit of high quantity and high quality becomes the problem of finding an extremum of that functional. At that time, my research interest focused on computational intelligence, mainly including fuzzy computing, neural computing, and evolutionary computing, so I thought of neural networks immediately. I asked myself why not study neural networks whose inputs and outputs could both be time-varying processes, and why not study more general neural networks whose inputs and outputs could be multivariate functions, or even points in some functional space. Traditional neural networks only describe the instantaneous mapping relationship between input values and output values, whereas these new neural networks can describe the accumulation or aggregation effect of the inputs on the outputs along the time axis.
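The functional viewpoint above can be made concrete with a toy sketch: treat each cultivation control as a time-varying function over the growing season, and the yield index as a scalar functional of those functions. All names, weights, and the form of the functional below are illustrative assumptions, not a model from this book.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal quadrature, kept explicit for portability."""
    return float(np.sum((y[1:] + y[:-1]) * 0.5 * np.diff(x)))

def yield_functional(fertilization, temperature, t_grid):
    """Toy functional: a yield index as a weighted accumulation of
    time-varying control processes over the season (hypothetical form)."""
    fert_effect = trapezoid(fertilization(t_grid), t_grid)
    # Hypothetical preference for temperatures near 25 degrees C
    temp_effect = trapezoid(np.exp(-(temperature(t_grid) - 25.0) ** 2 / 50.0), t_grid)
    return 0.6 * fert_effect + 0.4 * temp_effect

t = np.linspace(0.0, 1.0, 101)                    # normalized growing season
fert = lambda s: 0.5 + 0.5 * np.sin(np.pi * s)    # a fertilization schedule
temp = lambda s: 20.0 + 10.0 * s                  # a warming trend
y = yield_functional(fert, temp, t)
```

Maximizing y over admissible schedules fert and temp is then a functional extremum problem, which is exactly the formulation described here.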
This new ability is very useful for solving many problems, including high-tech applications in agriculture, and for the elaborate description of the behavior of a biological neuron. The problems that traditional neural networks solved are function approximation and function optimization; the problems we need to solve now are functional approximation and functional optimization, which are more complicated. However, as a mathematician, my intuition told me that there existed the possibility of resolving these problems under certain definite constraints, and that there might be the prospect of broader applications in the future. In the research of the following years, I was attracted by these issues. In addition to numerous engineering tasks (e.g. I had assumed responsibility in China for manned airship engineering), almost all the rest of my time was spent on this study. I presented the
concept of the "Process Neural Network (PNN)", which would be elaborated in this book. In recent years, we have done some further work on the theories, algorithms, and applications of process neural networks, and we have solved some basic theory issues, including the existence of solutions under certain conditions, continuity of the process neural network models, several approximation theorems (which are the theoretical foundations on which process neural network models can be applied to various practical problems), and we have investigated PNN's computational capability. We have also put forward some useful learning algorithms for process neural networks, and achieved some preliminary applications including process control of chemical reactions, oil recovery, dynamic fault inspection, and communication alert and prediction. It is so gratifying to obtain these results in just a few years. However, the research is arduous and there is a long way to go. Besides summarizing the aforementioned preliminary achievements, this monograph will highlight some issues that need to be solved. At the time of completing this book, I would like to express my sincere thanks to my many students for their hard work and contributions throughout these studies. Furthermore, I also wish to thank those institutes and persons who generously provided precious data and supported the actual applications.
Xingui He
Peking University, Beijing
April 2009
Contents
1 Introduction .......... 1
1.1 Development of Artificial Intelligence .......... 1
1.2 Characteristics of Artificial Intelligent System .......... 5
1.3 Computational Intelligence .......... 9
1.3.1 Fuzzy Computing .......... 9
1.3.2 Neural Computing .......... 12
1.3.3 Evolutionary Computing .......... 12
1.3.4 Combination of the Three Branches .......... 15
1.4 Process Neural Networks .......... 16
References .......... 17
2 Artificial Neural Networks .......... 20
2.1 Biological Neuron .......... 21
2.2 Mathematical Model of a Neuron .......... 22
2.3 Feedforward/Feedback Neural Networks .......... 23
2.3.1 Feedforward/Feedback Neural Network Model .......... 23
2.3.2 Function Approximation Capability of Feedforward Neural Networks .......... 25
2.3.3 Computing Capability of Feedforward Neural Networks .......... 27
2.3.4 Learning Algorithm for Feedforward Neural Networks .......... 28
2.3.5 Generalization Problem for Feedforward Neural Networks .......... 28
2.3.6 Applications of Feedforward Neural Networks .......... 30
2.4 Fuzzy Neural Networks .......... 32
2.4.1 Fuzzy Neurons .......... 32
2.4.2 Fuzzy Neural Networks .......... 33
2.5 Nonlinear Aggregation Artificial Neural Networks .......... 35
2.5.1 Structural Formula Aggregation Artificial Neural Networks .......... 35
2.5.2 Maximum (or Minimum) Aggregation Artificial Neural Networks .......... 35
2.5.3 Other Nonlinear Aggregation Artificial Neural Networks .......... 36
2.6 Spatio-temporal Aggregation and Process Neural Networks .......... 37
2.7 Classification of Artificial Neural Networks .......... 39
References .......... 40
3 Process Neurons .......... 43
3.1 Revelation of Biological Neurons .......... 43
3.2 Definition of Process Neurons .......... 44
3.3 Process Neurons and Functionals .......... 47
3.4 Fuzzy Process Neurons .......... 48
3.4.1 Process Neuron Fuzziness .......... 49
3.4.2 Fuzzy Process Neurons Constructed using Fuzzy Weighted Reasoning Rule .......... 50
3.5 Process Neurons and Compound Functions .......... 51
References .......... 52
4 Feedforward Process Neural Networks .......... 53
4.1 Simple Model of a Feedforward Process Neural Network .......... 53
4.2 A General Model of a Feedforward Process Neural Network .......... 55
4.3 A Process Neural Network Model Based on Weight Function Basis Expansion .......... 56
4.4 Basic Theorems of Feedforward Process Neural Networks .......... 58
4.4.1 Existence of Solutions .......... 59
4.4.2 Continuity .......... 62
4.4.3 Functional Approximation Property .......... 64
4.4.4 Computing Capability .......... 67
4.5 Structural Formula Feedforward Process Neural Networks .......... 67
4.5.1 Structural Formula Process Neurons .......... 68
4.5.2 Structural Formula Process Neural Network Model .......... 69
4.6 Process Neural Networks with Time-varying Functions as Inputs and Outputs .......... 71
4.6.1 Network Structure .......... 71
4.6.2 Continuity and Approximation Capability of the Model .......... 73
4.7 Continuous Process Neural Networks .......... 75
4.7.1 Continuous Process Neurons .......... 76
4.7.2 Continuous Process Neural Network Model .......... 77
4.7.3 Continuity, Approximation Capability, and Computing Capability of the Model .......... 78
4.8 Functional Neural Network .......... 83
4.8.1 Functional Neuron .......... 84
4.8.2 Feedforward Functional Neural Network Model .......... 85
4.9 Epilogue .......... 86
References .......... 87
5 Learning Algorithms for Process Neural Networks .......... 88
5.1 Learning Algorithms Based on the Gradient Descent Method and Newton Descent Method .......... 89
5.1.1 A General Learning Algorithm Based on Gradient Descent .......... 89
5.1.2 Learning Algorithm Based on Gradient-Newton Combination .......... 91
5.1.3 Learning Algorithm Based on the Newton Descent Method .......... 93
5.2 Learning Algorithm Based on Orthogonal Basis Expansion .......... 93
5.2.1 Orthogonal Basis Expansion of Input Functions .......... 94
5.2.2 Learning Algorithm Derivation .......... 95
5.2.3 Algorithm Description and Complexity Analysis .......... 96
5.3 Learning Algorithm Based on the Fourier Function Transformation .......... 97
5.3.1 Fourier Orthogonal Basis Expansion of the Function in L2[0,2π] .......... 97
5.3.2 Learning Algorithm Derivation .......... 99
5.4 Learning Algorithm Based on the Walsh Function Transformation .......... 101
5.4.1 Learning Algorithm Based on Discrete Walsh Function Transformation .......... 101
5.4.2 Learning Algorithm Based on Continuous Walsh Function Transformation .......... 105
5.5 Learning Algorithm Based on Spline Function Fitting .......... 108
5.5.1 Spline Function .......... 108
5.5.2 Learning Algorithm Derivation .......... 109
5.5.3 Analysis of the Adaptability and Complexity of a Learning Algorithm .......... 111
5.6 Learning Algorithm Based on Rational Square Approximation and Optimal Piecewise Approximation .......... 112
5.6.1 Learning Algorithm Based on Rational Square Approximation .......... 112
5.6.2 Learning Algorithm Based on Optimal Piecewise Approximation .......... 119
5.7 Epilogue .......... 126
References .......... 126
6 Feedback Process Neural Networks .......... 128
6.1 A Three-Layer Feedback Process Neural Network .......... 129
6.1.1 Network Structure .......... 129
6.1.2 Learning Algorithm .......... 130
6.1.3 Stability Analysis .......... 132
6.2 Other Feedback Process Neural Networks .......... 135
6.2.1 Feedback Process Neural Network with Time-varying Functions as Inputs and Outputs .......... 135
6.2.2 Feedback Process Neural Network for Pattern Classification .......... 136
6.2.3 Feedback Process Neural Network for Associative Memory Storage .......... 137
6.3 Application Examples .......... 138
References .......... 142
7 Multi-aggregation Process Neural Networks .......... 143
7.1 Multi-aggregation Process Neuron .......... 143
7.2 Multi-aggregation Process Neural Network Model .......... 145
7.2.1 A General Model of Multi-aggregation Process Neural Network .......... 145
7.2.2 Multi-aggregation Process Neural Network Model with Multivariate Process Functions as Inputs and Outputs .......... 147
7.3 Learning Algorithm .......... 148
7.3.1 Learning Algorithm of General Models of Multi-aggregation Process Neural Networks .......... 148
7.3.2 Learning Algorithm of Multi-aggregation Process Neural Networks with Multivariate Functions as Inputs and Outputs .......... 152
7.4 Application Examples .......... 155
7.5 Epilogue .......... 159
References .......... 160
8 Design and Construction of Process Neural Networks .......... 161
8.1 Process Neural Networks with Double Hidden Layers .......... 161
8.1.1 Network Structure .......... 162
8.1.2 Learning Algorithm .......... 163
8.1.3 Application Examples .......... 165
8.2 Discrete Process Neural Network .......... 166
8.2.1 Discrete Process Neuron .......... 167
8.2.2 Discrete Process Neural Network .......... 168
8.2.3 Learning Algorithm .......... 169
8.2.4 Application Examples .......... 170
8.3 Cascade Process Neural Network .......... 172
8.3.1 Network Structure .......... 173
8.3.2 Learning Algorithm .......... 175
8.3.3 Application Examples .......... 176
8.4 Self-organizing Process Neural Network .......... 178
8.4.1 Network Structure .......... 178
8.4.2 Learning Algorithm .......... 179
8.4.3 Application Examples .......... 182
8.5 Counter Propagation Process Neural Network .......... 184
8.5.1 Network Structure .......... 185
8.5.2 Learning Algorithm .......... 185
8.5.3 Determination of the Number of Pattern Classifications .......... 186
8.5.4 Application Examples .......... 187
8.6 Radial-Basis Function Process Neural Network .......... 188
8.6.1 Radial-Basis Process Neuron .......... 188
8.6.2 Network Structure .......... 189
8.6.3 Learning Algorithm .......... 190
8.6.4 Application Examples .......... 192
8.7 Epilogue .......... 193
References .......... 193
9 Application of Process Neural Networks .......... 195
9.1 Application in Process Modeling .......... 195
9.2 Application in Nonlinear System Identification .......... 198
9.2.1 The Principle of Nonlinear System Identification .......... 199
9.2.2 The Process Neural Network for System Identification .......... 200
9.2.3 Nonlinear System Identification Process .......... 201
9.3 Application in Process Control .......... 203
9.3.1 Process Control of Nonlinear System .......... 204
9.3.2 Design and Solving of Process Controller .......... 204
9.3.3 Simulation Experiment .......... 208
9.4 Application in Clustering and Classification .......... 210
9.5 Application in Process Optimization .......... 215
9.6 Application in Forecast and Prediction .......... 216
9.7 Application in Evaluation and Decision .......... 224
9.8 Application in Macro Control .......... 226
9.9 Other Applications .......... 227
References .......... 231

Postscript .......... 233

Index .......... 238
1 Introduction
As an introduction to this book, we will review the development history of artificial intelligence and neural networks, and then give a brief introduction to and analysis of some important problems in the fields of current artificial intelligence and intelligent information processing. This book will begin with the broad topic of "artificial intelligence", next examine "computational intelligence", then gradually turn to "neural computing", namely "artificial neural networks", and finally explain "process neural networks", whose theories and applications will be discussed in detail.
1.1 Development of Artificial Intelligence

The origins of artificial intelligence (AI) date back to the 1930s-1940s. For more than half a century, it can be said that the field of artificial intelligence has made remarkable achievements, but at the same time has experienced many difficulties. To give a brief description of artificial intelligence development, most events and achievements (except for artificial neural networks) are listed in Table 1.1. The main purpose of AI research is to use computer models to simulate the intelligent behavior of humans and even animals, to simulate brain structures and their functions, and the human thinking process and its methods. Therefore, an AI system generally should be able to accomplish three tasks: (a) to represent and store knowledge; (b) to solve various problems with the stored knowledge; (c) to acquire new knowledge while the system is running (that is, the system has the capability of learning or knowledge acquisition). AI has been developing rapidly over the past 50 years. It has been widely and successfully applied in many fields, such as machine learning, natural language comprehension, logic reasoning, theorem proving, expert systems, etc. Along with the continuous extension of AI application fields, and with the problems to be solved becoming more and more complex, traditional AI methods based on a symbol processing mechanism encountered more and more difficulties
Table 1.1 The milestones of artificial intelligence

1930s-1940s  Frege, Whitehead, and Russell: Established a mathematical logic system and gave us new ideas about computation.
1936  Turing: Established automata theory, promoted research on "thinking machine" theory, and proposed the recursive function based on discrete quantities as the basis of intelligent description.
1946  Turing: Pointed out the essence of the theory "thinking is computing" and presented formal reasoning in the process of symbolic reasoning.
1948  Shannon: Established information theory, which held that human psychological activities can be researched in the form of information, and proposed some mathematical models to describe human psychological activities.
1956  McCarthy et al.: Proposed the terminology "artificial intelligence" (AI) for the first time, which marks the birth of AI based on the symbol processing mechanism.
1960  McCarthy: Developed the list processing language LISP, which could deal with symbols conveniently and was later applied widely in many research fields of AI.
1964  Robinson: Proposed the resolution principle, which marks the beginning of research into machine proving of theorems in AI.
1965  Zadeh: Proposed the fuzzy set, and pointed out that the membership function can describe fuzzy sets, which marked the beginning of fuzzy mathematics research; binary Boolean logic in particular was extended to fuzzy logic.
1965  Feigenbaum: Proposed an expert system which used a normative logical structure to represent expert knowledge with enlightenment, transparency, and flexibility, and which was widely applied in many fields.
1977  Feigenbaum: Proposed knowledge engineering, which used the principles and methods of AI to solve application problems, and established expert systems by developing intelligent software based on knowledge.
with artificial intelligence technology when solving problems such as knowledge representation, pattern information processing, the combinatorial explosion, etc. Therefore, it is of practical significance to seek a theory and method that have intelligent characteristics such as self-organization, self-adaptation, self-learning, etc., and which are suitable for large-scale parallel computation. Almost at the same time as the above research activities, some scientists were also seeking methods of representing and processing information and knowledge from different viewpoints and research domains. In 1943, the physiologist
McCulloch and the mathematician Pitts abstracted the first mathematical model of artificial neurons [1] by imitating the information processing mechanism of biological neurons, which marked the beginning of artificial neural network research based on connectionism. In 1949, the psychologist Hebb proposed the Hebb rule [2], which achieves learning by modifying the connection intensity among neurons, and gives the neuron the ability to learn from its environment. In 1958, Rosenblatt introduced the concept of the perceptron [3]. From the viewpoint of engineering, this was the first time that an artificial neural network model was applied in information processing. Although the perceptron model is simple, it has characteristics such as distributed storage, parallel processing, learning ability, continuous computation, etc. In 1962, Widrow proposed an adaptive linear element model (Adaline) [4] that was successfully applied to adaptive signal processing. In 1967, Amari implemented adaptive pattern classification [5] using gradient methods. The period from 1943 to 1968 can be considered the first flowering of artificial neural network research. In this period there were many more important research achievements, but we have not listed all of them here. In 1969, Minsky and Papert published Perceptrons [6], which indicated the limitations of the function and processing ability of the perceptron: it cannot solve even simple problems such as "XOR". The academic reputation of Minsky and the rigorous discussion in the book led their viewpoints to be accepted by many people, and this made some scholars who had engaged in artificial neural networks earlier turn to other research fields. Research in artificial neural networks came into a dormant period that lasted from 1969 to 1982. Although research in neural networks encountered a cold reception, many scholars still devoted themselves to theoretical research.
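The "XOR" limitation noted above can be checked directly: no single linear threshold unit reproduces XOR, while one hidden layer of threshold units does. The weights below are chosen by hand purely for illustration.

```python
import itertools

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]                      # XOR truth table

def step(s):
    return 1 if s > 0 else 0

# Exhaustive search over a coarse weight grid finds no single-layer
# solution (the impossibility in fact holds for all real weights).
grid = [k / 2 for k in range(-8, 9)]
single_layer_solves_xor = any(
    all(step(w1 * a + w2 * b + c) == y for (a, b), y in zip(X, Y))
    for w1, w2, c in itertools.product(grid, repeat=3)
)

# Two layers suffice: XOR(a, b) = AND(OR(a, b), NAND(a, b))
def two_layer_xor(a, b):
    h1 = step(a + b - 0.5)            # OR unit
    h2 = step(-a - b + 1.5)           # NAND unit
    return step(h1 + h2 - 1.5)        # AND unit

outputs = [two_layer_xor(a, b) for a, b in X]
```

Here single_layer_solves_xor comes out False while outputs matches [0, 1, 1, 0], mirroring Minsky and Papert's point and its later resolution by multi-layer networks.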
They proposed many significant models and methods, such as Amari's neural network mathematical theory [7] (in 1972), Anderson et al.'s BSB (Brain-State-in-a-Box) model [8] (in 1972), Grossberg's adaptive resonance theory [9] (in 1976), etc. In the early 1980s, the physicist Hopfield proposed a feedback neural network (the HNN model) [10] (in 1982) and successfully solved the TSP (Traveling Salesman Problem) by introducing an energy function. Rumelhart et al. proposed the BP algorithm in 1986, which satisfactorily solved the adaptive learning problem [11] of feedforward neural networks. From 1987 to 1990, Hinton [12], Hecht-Nielsen [13], Funahashi [14] and Hornik et al. [15] separately presented the approximation capability theorem of the multi-layer BP network, which proved that multi-layer feedforward neural networks can approximate any L2 function. This theorem established the theoretical basis for the practical application of neural networks, and helped the theory and application of neural networks to mature gradually. Artificial neural networks came into a second flowering of research and development. In 1988, Linsker proposed a new self-organizing theory [16] based on perceptron networks, and formed the maximum mutual information theory based on Shannon's information theory. In the 1990s, Vapnik and his collaborators proposed a network model called the Support Vector Machine (SVM) [17-19] according to the structural risk minimization principle based on learning theory with limited samples, and it was widely applied
to many problems such as pattern recognition, regression, density estimation, etc. In recent years, many novel artificial neural network models have been established and broadly applied in many areas such as dynamic system modeling [20,21], system identification [22], adaptive control of nonlinear dynamic systems [23,24], time series forecasting [25], fault diagnosis [26], etc. In 2000, we published process neuron and process neural network (PNN) models after years of intensive study [27,28]. The input signals, connection weights, and activation thresholds of process neurons can be time-varying functions, or even multivariate functions. On the basis of the spatial weighted aggregation of traditional neurons, an aggregation operator on time (or even more factors) is added to give the process neuron the ability to process space-time multidimensional information. This expands the input-output mapping relationship of neural networks from function mapping to functional mapping, and greatly improves the expression capability of neural networks. A series of basic theorems (including the existence theorem, approximation theorem, etc.) of process neural networks have been proved and some related theoretical problems have been solved. Practice shows that PNN models have broad applications in many actual signal processing problems relating to processes. These will be the core content of this book. At present, there are thousands of artificial neural network models, of which more than 40 are primary ones. The application scope of these models covers various fields including scientific computation, system simulation, automatic control, engineering applications, economics, etc., and they show the tremendous potential and development trends of artificial neural networks. However, most present neural networks are traditional neural networks with spatial aggregation and have no relation with time.
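A minimal numerical sketch of the process neuron idea described above: the spatial weighted sum of a classical neuron is replaced by a weighted aggregation of a time-varying input over an interval, roughly y = f(∫ w(t)x(t)dt − θ). The quadrature scheme, activation choice, and signals below are our own illustrative assumptions, not the book's exact formulation.

```python
import math

def process_neuron(x, w, theta, T=1.0, n=1000, f=math.tanh):
    """Sketch of y = f( integral_0^T w(t) x(t) dt - theta ):
    a time-aggregation operator replaces the spatial weighted sum."""
    h = T / n
    ts = [i * h for i in range(n + 1)]
    vals = [w(t) * x(t) for t in ts]
    # Composite trapezoidal rule over [0, T]
    integral = h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])
    return f(integral - theta)

# A time-varying input signal and a time-varying connection weight
x = lambda t: math.sin(2 * math.pi * t)
w = lambda t: t                        # weight function favoring late inputs
y = process_neuron(x, w, theta=0.0)    # close to tanh(-1 / (2*pi))
```

A traditional neuron would see only an instantaneous value x(t0); the process neuron's output instead depends on the whole course of x over [0, T], which is the accumulation effect the text describes.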
Traditional AI methods based on symbol processing mechanisms and neural networks based on connectionism are two aspects of AI research, and each of them has its own advantages and limitations. We believe that a combination of both methods can draw on the strengths of each to offset the weaknesses of the other. For example, the setting and connection mode of neural network nodes (neurons) can definitely connect the solving goal with the input variables. We once observed that specific reasoning rules can be considered as network nodes (neurons) and "reasoning" can be converted into "computing". At the same time, according to the rules described by knowledge in the practical field, the connection mode and activation thresholds among the network nodes can be properly chosen and modified to express more reasonable logical relationships among the described problems, and the corresponding expert system can be designed in terms of the structure of a neural network. The term AI, as its name suggests, involves making "intelligence" artificially, or even making an intelligent system. Its short-term goal is to implement intelligence simulation on an existing computer and endow the computer with some intelligent behavior, while its long-term goal is to manufacture an intelligent system and endow it with intelligence similar to (or perhaps exceeding in some aspects) that of animals or human beings. Using AI to study autocorrelation problems in the human brain seems to be a paradox in logic and involves complex recursive processes in
mathematics, and is highly difficult. The problem is that how the brain works might never be understood in some sense, because the brain itself is also changing and developing while people are studying it. If some aspects of the brain at some time were studied clearly, the brain function at that time might develop further, the former state might change again, and this would no longer be the same as the original research objective. However, such a spiral research result is still very significant and can be applied to various practical problems. Therefore, we think that, on the one hand, AI should have a long-term research goal that can be gradually approximated; on the other hand, we still need to propose various short-term goals, and these goals should not deviate from practical applications to reach for that which is beyond our grasp. The development history of AI in this respect has already given us many lessons, which are worth remembering by AI researchers. In short, the development of artificial intelligence has experienced ups and downs during the past 60 years. Because of the increased demands in scientific fields and practical applications, we believe that AI will undergo further development, play a more important role in the advancement of science and technology through its role in tackling human and other problems that are difficult to solve with traditional methods at present, and that it will also make great contributions to producing intelligent systems for human beings in the future.
1.2 Characteristics of Artificial Intelligent System

What system can be called an intelligent system? This is a question that we should answer before setting about researching intelligent systems. It can be said that we should set up a research goal. Of course, the understanding of this question changes dynamically, and we cannot answer it clearly in a moment. In fact, we can first find some rough answers from analysis of the intelligent behavior of biological systems.
(1) An intelligent system is a memory system

From the perspective of neurophysiology, what is called memory is the storage capacity and the processing procedure for information obtained from outside or produced internally. A large amount of information comes from the outside world, through the sense organs, inwards to the brain. The brain does not store all the information that the sensory organs directly receive, but only stores the information obtained through learning or that is of some significance. Therefore, an intelligent system must have memory storage capacity; otherwise, it will lose its object and cannot store processing results, just as a person who has completely lost his memory will no longer have intelligence. In addition, an artificial intelligent system is not completely identical with the human brain; the latter has a powerful memory ability which decreases gradually, so the former should simulate the latter in aspects of memory and forgetting in some way.
(2) An intelligent system is a computation system

Cognitive science considers that "cognition is computing": it combines intelligence with computation closely and forms a new concept, computational intelligence. What is called computation refers to the process by which we carry out various operations and combinations (digital or analog) repeatedly on a certain symbol set according to some rules. The acquisition, representation, and processing of knowledge can all come down to a computation process. Therefore, an artificial intelligence system should also have this computing capability to accomplish the corresponding functions. In the Chinese language, there is an alias "electronic brain" for the computer, which is of great significance. Carrying out various digital or analog operations fleetingly is the strong point of the computer, so a computer is quite suitable for simulating some intelligent behaviors. However, there are troubles and problems when we directly use current digital machines or analog machines to handle fuzzy information or qualitative data, and indeed sometimes they cannot handle it at all, so we expect to use a digital machine that has an analog operation component. Such a machine is different from a general digital/analog mixed machine: it should have a uniform digital/analog mixed memory in which to deposit the processing object, and its processor should possess a uniform mixed processing ability for this mixed information. We believe that research on the computing capability of an intelligent system is very important and worth strengthening, and that research and development of the computing capability of an intelligent system (such as fuzzy neural computation) will greatly promote basic research on intelligence, and even the whole development of computer science.
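The contrast between binary logic and the continuous-valued handling of fuzzy or qualitative data mentioned above can be shown with Zadeh's min/max connectives; the "warm" membership function below is a made-up illustration, not a model from this book.

```python
# Truth values in [0, 1] with Zadeh's min/max/complement connectives
def fuzzy_and(a, b):
    return min(a, b)

def fuzzy_or(a, b):
    return max(a, b)

def fuzzy_not(a):
    return 1.0 - a

def warm(temp_c):
    """Illustrative membership function for the qualitative concept 'warm'."""
    if temp_c <= 15.0:
        return 0.0
    if temp_c >= 25.0:
        return 1.0
    return (temp_c - 15.0) / 10.0    # linear ramp between 15 and 25 degrees C

mu = warm(21.0)                              # 0.6: partially 'warm'
contradiction = fuzzy_and(mu, fuzzy_not(mu)) # 0.4, not forced to 0
```

In binary logic, A AND NOT A is always false; in fuzzy logic it can take intermediate values. This is exactly the kind of qualitative gradation that, as the text notes, current digital machines handle poorly.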
(3) An intelligent system is a logical system

Traditional logic is binary logic, and it is fully utilized in the von Neumann computer, but in fact the reasoning logic of humans does not strictly abide by binary logic. Especially when the cognition of something is unclear or not completely clear, we can only describe it by a qualitative or fuzzy concept, and handle it with a qualitative method or by fuzzy logic. Therefore, an artificial intelligent system should be able not only to carry out routine logical reasoning, but also to represent and process various qualitative and fuzzy concepts that are described by natural language, and then execute the corresponding qualitative or fuzzy reasoning. Consequently, an artificial intelligent system becomes a strong logical processing system. In addition to logical reasoning, the system should also be able to execute complex logical judgments, and adopt appropriate actions or reactions according to the judgments. The current computer is competent for binary logic or finite multi-valued logic, but is helpless when it comes to continuous-valued logic (for example, fuzzy logic) and qualitative logical reasoning. We need the above digital/analog unified processing hybrid computer to meet these demands.

(4) An intelligent system is a perceptive system

An important characteristic of a biological system is that it can perceive the outside
Introduction
7
environment through various sensory organs and acquire various bits of information, and make responses based on the information received. Many researches on artificial intelligent systems have been done to apperceive the outside environment by various sensors, e.g. a variety of robot systems . It should be said that this perception not only acquires information from the outside environment by sensors, but also pretreats the information. An artificial neural network perceptron, especially a multi-layer perceptron, has strong processing ability (for instance, the BP network with a single-hidden layer can approximate any functions on ~), and can complete this pretreatment. Perception is a "black box" problem and belongs at the bottom level of cognitive behavior. A neural network provides an effective approach to solving such a black box problem. Perception is the basis of an intelligent system to understand the outside environment, so its simulation system should also have this ability. (5) An intelligent system is an interactive system Biological systems need to interact with the outside environment. Here we do not consider physical interactions, but only discuss information and knowledge communication. Commonly , a biological system cannot complete the acquisition and processing of knowledge at one time; it often needs to supplement and continuously modify the acquired knowledge according to outside circumstances, and verify the correctness of the knowledge obtained from the outside environment to perfect itself. Thereby, in principle , interactivity of an artificial intelligent system is a necessary function. In seeking self-improvement according to a change in the environmental conditions or the practical requirements of users, the system must interact with the outside environment. 
When using the system, a user may want to control the behavior of the system or give it necessary information at any moment, which demands that the intelligent system have its own interactive ability and convenient man-machine (and machine-machine) interactive interfaces and means.

(6) An intelligent system is a learning system

Learning is a process by which a biological system acquires knowledge through interaction with the outside environment, and learning ability is an important factor in intelligence. There are different levels of learning ability, ranging from low-level conditioned reflexes to the high-level imparting of language knowledge, so an artificial intelligent system should also be divided into different levels to simulate the learning process of a biological system. Learning and memory are interdependent: the learning result needs to be memorized, while meaningful memory is acquired by learning from training samples repeatedly. In the process of learning, knowledge can be acquired in two ways. One is to learn knowledge from teachers, or to judge concepts by specific hint information, so as to accumulate and update knowledge. The other way needs no teacher's guidance; it is "independent", meaning that the system can modify the knowledge stored in its neural system according to observation of, and learning from, the environment, so as to better accord with the inherent rules and essential characteristics of the outside environment. A system without learning ability cannot be called an intelligent system, but only a memorizer. It is because an intelligent system has learning ability that it can acquire knowledge from the outside constantly, just like a biological system. In addition, it can process acquired knowledge: reject useless or outdated knowledge, modify old knowledge, add new knowledge, and constantly improve its own intelligence level. Because of its learning ability, the system can show strong adaptability and fault tolerance. At the same time, the system will not be paralyzed by a local breakdown or error, and will not suffer large deviations due to interference from the outside environment. Consequently, it can improve its ability to adjust to changes in the environment by learning constantly.

(7) An intelligent system is a self-organizing system

Self-organization, self-adaptation, and self-perfection are important characteristics of a biological system. From a macroscopic perspective, the nervous system in the brain of a biological system can not only memorize various acquired knowledge, but also understand new, unknown information by self-learning, and adapt itself to various complex environments. From a microscopic perspective, the brain's neural network can reconstruct and reform itself in the process of adapting to the environment. Therefore, an artificial intelligent system should also have the characteristics of self-organization and self-adaptability, so that it can learn from an unknown environment or independently simulate some learning mechanism, such as competition, and properly adjust and reorganize its system structure.

(8) An intelligent system is an evolutionary system

Learning and evolution are two concepts that are interrelated but different from each other, because learning is an individual behavior, while evolution is a sort of group behavior.
Because an intelligent system has learning ability, each individual in an intelligent system can acquire experience and knowledge through interaction with the constantly changing environment so as to adapt to the changes. However, the learning abilities of various individuals are different. As a biological group, they also adjust themselves constantly to changes in the environment and change their functions from simple to complex and from lower class to higher class. This development process is just the so-called "evolutionary process". Similarly, self-organization is also individual behavior, but together with learning ability it supports the evolution of the whole species. The group in an artificial intelligent system should have the ability to simulate the process of biological evolution; therefore, an intelligent system is an evolutionary system. By virtue of its evolutionary ability, an intelligent system group can constantly improve its adaptation to the environment, and thus it has strong competitive ability.

(9) An intelligent system is a thinking system

Thinking is a brain function unique to primates, and only human beings have real thinking ability. Thinking is generally divided into logical thinking and image-based thinking, which are controlled by the two hemispheres of the brain respectively. In a narrow sense, thinking is often equated with association; in a broad sense, thinking can be considered as the various activities and abilities of the brain. The characteristics of an intelligent system outlined above, such as memory ability, computing capability, logical reasoning ability, perception ability, interaction ability, learning ability, self-organizing ability, the evolutionary characteristic, etc., can all be considered as the basis of the brain's more advanced thinking activity. It is the ideal and aim of artificial intelligence scholars to achieve an intelligent system that can think. Though this aim is grand, many difficulties will be encountered, and there is still a long way to go, we believe that as long as we propose reasonable intermediate targets and search persistently for the correct way, the great aim will be realized gradually.
1.3 Computational Intelligence

Biological species progress and are optimized through natural competition; how artificial intelligence should simulate this evolutionary process is worth studying. For example, evolutionary computation simulates the process of biological evolution in nature with highly parallel, multi-directional optimization algorithms that can overcome the fatal weakness of single-point descent algorithms, namely that they easily fall into a local extremum. In recent years, research and application results for various genetic algorithms and evolutionary algorithms have attracted great attention in the artificial intelligence field. Computational intelligence is a quite active and relatively successful branch of artificial intelligence at present. Computational intelligence is the discipline of acquiring and expressing knowledge and of simulating and implementing intelligent behavior by means of computing. At present, the three most active fields in computational intelligence are fuzzy computing, neural computing, and evolutionary computing, together with their combination and mutual mingling.
1.3.1 Fuzzy Computing

Fuzzy computing is based on fuzzy set theory. It takes a universe of discourse as its starting point and carries out various fuzzy operations according to certain fuzzy logic and reasoning rules.

(1) Fuzzy set and fuzzy logic

In 1965, while researching the problem that the objective world contains many fuzzy concepts and fuzzy phenomena that are difficult to describe by classic binary logic or finite multi-valued logic, Zadeh proposed fuzzy set theory [29], which provided a cogent descriptive and analytical tool and opened a scientific way forward for solving fuzzy problems. In fact, fuzzy logic is a method for analyzing and solving problems involving inaccurate and incomplete information. Using fuzzy sets, human thinking and reasoning activities can, to a certain extent, be simulated more naturally. A fuzzy set has flexible membership relations and allows an element to belong partly to the set, which means that the membership of an element in a fuzzy set can be any value from 0 to 1. In this way, some fuzzy concepts and fuzzy problems can be expressed easily and reasonably by fuzzy sets.

Logic is the theoretical basis of human thinking and reasoning, and is the science of the relationship between antecedent and conclusion. In practice, people often handle logical reasoning in which the relationship between the antecedent and the conclusion is not clear but involves various kinds of fuzziness. Therefore, logic is divided into precise logic and fuzzy logic. Abstractly speaking, any logic can be regarded as an algebra whose elements are conjunctive logical formulas with certain truth values and whose operations are composed of some logical operations (such as "and", "or", "not") and reasoning rules (such as syllogism). Each logic has some axioms by which to determine whether a conjunctive logical formula is a theorem of the logic or not. In artificial intelligence, we often adopt rules that express the relationship between antecedent and conclusion to describe certain knowledge, and then adopt logical reasoning or computing to solve problems.

Fuzzy computing generally refers to various computing and reasoning methods involving fuzzy concepts. For example, suppose that there are K fuzzy if-then rules, and that rule k has the following form:

If x_1 is A_k1, x_2 is A_k2, ..., and x_n is A_kn, then y_1 is B_k1, y_2 is B_k2, ..., and y_m is B_km,

where A_ki and B_kj are fuzzy sets in the universes of discourse U_i and V_j respectively, and X = (x_1, x_2, ..., x_n)^T ∈ U_1×U_2×...×U_n and Y = (y_1, y_2, ..., y_m)^T ∈ V_1×V_2×...×V_m are respectively the inputs and outputs of the fuzzy logical system. The above reasoning process can be completed by fuzzy computing.

(2) Weighted fuzzy logic

In traditional fuzzy logic, if there are multiple antecedents, the truth value of the antecedent conjunction is generally defined as the minimum of the truth values of all sub-formulas. Although this fuzzy logic reflects some objective principles to a certain degree, sometimes it does not correspond with the practical situation. Often in the reasoning process, the degree of importance of each antecedent to a conclusion is different, and traditional fuzzy logic cannot embody the relative importance of each sub-condition. To solve this problem, we proposed weighted fuzzy logic in 1989 [30]. A weighted fuzzy logic can be denoted by a 4-tuple WFL = {E, A, O, R}, where E denotes a set of atomic logical formulas; O = {negation, weighted conjunction, implication}, where a weighted conjunctive logical formula is a formula that starts from E and executes the operations in O finitely many times; A denotes a set made up of some weighted conjunctive logical formulas and is called the axiom set; and R = {the first syllogism, the second syllogism}. A theorem in weighted fuzzy logic is a weighted conjunctive logical formula obtained by repeatedly carrying out the reasoning rules in R finitely many times starting from A. The reasoning rules of syllogism are described as follows.

The first syllogism reasoning rule: given that the truth degree of the logical formula x_i is T(x_i) (-1 ≤ T(x_i) ≤ 1; i = 1, 2, ..., n) and that the truth degree of the weighted implication ∧_{i=1}^{n} w_i x_i → y is T(∧_{i=1}^{n} w_i x_i → y), where Σ_{j=1}^{n} w_j = 1, then the truth degree of the logical formula y is

T(y) = T(∧_{i=1}^{n} w_i x_i → y) × Σ_{j=1}^{n} w_j × T(x_j).    (1.1)

The second syllogism reasoning rule: when T(∧_{j=1}^{n} w_j x_j → y) + Σ_{j=1}^{n} w_j × T(x_j) ≥ 1, the truth degree of the logical formula y is

T(y) = T(∧_{j=1}^{n} w_j x_j → y) + Σ_{j=1}^{n} w_j × T(x_j) − 1,

but when T(∧_{j=1}^{n} w_j x_j → y) + Σ_{j=1}^{n} w_j × T(x_j) < 1, the truth degree of y is taken as 0.

(3) Fuzzy computational logic

In 1990 we further proposed fuzzy computational reasoning [31]. The corresponding fuzzy computational logic can likewise be denoted by a 4-tuple in which E is a set of atomic logical formulas, A is an axiom set, O is a set of logical operations, and R is a set of reasoning rules such as fuzzy syllogism. We can obtain all the logical conjunctive formulas of fuzzy computational logic if we start from E and execute the operations in O finitely many times, and we get the theorems of this logic by using syllogism to reason repeatedly starting from A. The expression ability of fuzzy computational logic is very strong, and it can be used to describe and denote various kinds of fuzzy knowledge.
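The two syllogism rules above are simple to compute. The following sketch (Python; the function names are ours, and the zero truth degree returned in the sub-unity case of the second rule reflects our reading of the convention that 0 denotes an indeterminate value on the [-1, 1] truth scale) evaluates both rules:

```python
def first_syllogism(t_impl, weights, truths):
    """First rule (Eq. 1.1): T(y) = T(/\ w_i x_i -> y) * sum_j w_j * T(x_j).
    Truth degrees lie in [-1, 1]; the weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return t_impl * sum(w * t for w, t in zip(weights, truths))

def second_syllogism(t_impl, weights, truths):
    """Second rule: T(y) = a + b - 1 when a + b >= 1, else 0,
    where a = T(/\ w_j x_j -> y) and b = sum_j w_j * T(x_j)."""
    s = t_impl + sum(w * t for w, t in zip(weights, truths))
    return s - 1.0 if s >= 1.0 else 0.0

# Two antecedents of unequal importance (w1 = 0.7, w2 = 0.3):
t1 = first_syllogism(0.9, [0.7, 0.3], [0.8, 0.4])   # 0.9 * 0.68 = 0.612
t2 = second_syllogism(0.9, [0.7, 0.3], [0.8, 0.4])  # 0.9 + 0.68 - 1 = 0.58
```

Unlike the min-based conjunction of traditional fuzzy logic, the weighted sum lets the more important antecedent (here weighted 0.7) dominate the conclusion's truth degree.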
1.3.2 Neural Computing

Neural computing is biologically inspired: it is a parallel, non-algorithmic information processing model established by imitating the information processing mechanism of a biological neural system. Neural computing presents the human brain model as a non-linear dynamic system with an interconnected structure, i.e. an artificial neural network simulates the mechanism of the human brain to implement computing behavior. In this interconnection mechanism, it is unnecessary to establish an accurate mathematical model in advance. The problem-solving knowledge of an artificial neural network is represented by the distributed storage of connection weights among a great many interconnected artificial neurons, and the input-output mapping relationship is established by learning from given sample sets. At present, various artificial neurons and artificial neural networks can be used as models for neural computing, such as the MP neuron model, the process neuron model, the BP neural network, the process neural network, etc. In neural computing there are two key steps, namely constructing a proper neural network model and designing a corresponding learning algorithm according to the practical application. It has already been proved that any finite problem (a problem that can be solved by a finite automaton) can be solved by a neural network and vice versa, so the solving capacity of a neural network is equal to that of a finite automaton. In the continuous case, a multi-layer feedforward neural network can approximate any multivariate function f: R^n → R^m (where R^n is n-dimensional real number space). The neural computing problem will be expounded in detail later in this book.
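As a small concrete instance of the neural computing model just described, the following sketch (Python; the layer sizes and weight values are arbitrary illustrative choices of ours) runs one forward pass of a single-hidden-layer feedforward network, the architecture to which the approximation result above applies:

```python
import math

def forward(x, W1, b1, W2, b2):
    """One forward pass of a single-hidden-layer feedforward network:
    hidden = sigmoid(W1 x + b1), output = W2 hidden + b2.
    Knowledge is stored distributively in the weights W1, W2."""
    hidden = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

# A 2-input, 3-hidden-unit, 1-output network with hand-picked weights:
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
b1 = [0.0, 0.1, 0.0]
W2 = [[1.0, -2.0, 1.0]]
b2 = [0.5]
y = forward([0.2, 0.7], W1, b1, W2, b2)
```

A learning algorithm such as BP would adjust W1, b1, W2, b2 from samples; here the point is only the instantaneous input-to-output mapping, which Section 1.4 contrasts with process (time-varying) inputs.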
1.3.3 Evolutionary Computing

Many phenomena in nature or in the objective world can profoundly enlighten our research; a very good example is the simulation of the law of biological evolution to solve relatively complex practical problems. Here, better solutions are gradually produced by simulating the natural law, without needing to describe all the characteristics of the problem explicitly. Evolutionary computing is just such a generalized solving method based on this thinking: it adopts simple coding technology to express complex structures, and guides the system to learn or determine the search direction through simple genetic operations on a group of codes and natural selection of the fittest. Because evolutionary computing organizes its search by a population, it can search many regions of the solution space at the same time, and it has intelligent characteristics such as self-organization, self-adaptability, and self-learning, as well as the characteristic of parallel processing. These characteristics mean that evolutionary computing has not only high learning efficiency but also simplicity, ease of operation, and generality; hence it has earned attention from a broad range of people. An evolutionary algorithm is a class of random search algorithms learned from natural selection and genetic mechanisms in the biological world. Evolutionary algorithms mainly comprise three branches, namely the genetic algorithm (GA), evolutionary programming (EP), and evolutionary strategy (ES), and they can be used to solve
such problems as optimization and machine learning. Two primary characteristics of evolutionary computing are the population search strategy and information exchange among individuals in a population. Because of the universality of evolutionary algorithms, they have broad applications and are especially suitable for handling complex, non-linear problems that are difficult to solve by traditional search algorithms. Next, we briefly introduce GA, EP, and ES.

(1) Genetic algorithm

The genetic algorithm (GA) [32] is a computing model simulating the biological genetic process. As a global optimization search algorithm, it has many remarkable characteristics, including simplicity and easy generalization, great robustness, suitability for parallel processing, a wide application scope, and so on. GA is a population operation that takes all individuals in the population as its objects. Selection, crossover, and mutation are the three main operators of GA; they constitute the so-called genetic operations that other traditional algorithms do not possess. GA mainly involves five basic elements: (a) the coding of individual parameters; (b) the setting of the initial population; (c) the design of the fitness function; (d) the design of the genetic operations; (e) the setting of the control parameters (mainly referring to the scale of the population, the probabilities of genetic operations on individuals in the population, etc.). These five elements constitute the core content of GA. In nature the evolutionary and genetic process is infinite and endless, but a termination criterion must be given to a learning algorithm, at which point the individual with the maximal fitness value in the population serves as the solution to the problem. In GA, the execution sequence for the operations of selection, crossover, and mutation can be serial or parallel. The flow chart is shown in Fig. 1.1.

Many researchers have improved and extended Holland's basic GA according to practical application requirements. GA has been broadly applied in many fields, such as function optimization, automatic control, image recognition, and machine learning [33-37], and has become one of the common algorithms in computational intelligence technology.

Fig. 1.1 The GA flow chart
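The five basic elements and the three genetic operators can be seen working together in a minimal sketch (Python; the parameter values, tournament selection, elitism, and the "one-max" fitness function are our illustrative choices, not part of the original text):

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=30, p_cross=0.8,
                      p_mut=0.02, generations=60, seed=1):
    rng = random.Random(seed)
    # (a)+(b) binary coding and the initial population
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):              # fixed termination criterion
        elite = max(pop, key=fitness)         # keep the best individual
        next_pop = [elite[:]]
        while len(next_pop) < pop_size:
            # (d) selection: 3-way tournament picks each parent
            a = max(rng.sample(pop, 3), key=fitness)[:]
            b = max(rng.sample(pop, 3), key=fitness)
            # (d) one-point crossover with probability p_cross
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_bits)
                a = a[:cut] + b[cut:]
            # (d) bit-flip mutation with probability p_mut per bit
            next_pop.append([bit ^ (rng.random() < p_mut) for bit in a])
        pop = next_pop
    return max(pop, key=fitness)

# (c) fitness: "one-max", the number of 1 bits in the chromosome
best = genetic_algorithm(sum)
```

The (e) control parameters (population scale, crossover and mutation probabilities) appear as keyword arguments; in practice they strongly affect convergence speed and must be tuned to the problem.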
(2) Evolutionary programming
The evolutionary programming (EP) method was first proposed by Fogel et al. in the 1960s [38]. They held that intelligent behavior should include the ability to predict the surrounding state and make a proper response in terms of a given goal. In their research, they described the simulated environment as a sequence composed of symbols from a finite character set, and expected a response that evaluates the current symbol sequence so as to obtain the maximum payoff, where the payoff is determined by the next symbol to appear in the environment and a predefined benefit target. In EP, a finite state machine (FSM) is often used to implement such a strategy, and a group of FSMs evolves to give more effective FSMs. At present, EP has been applied in many fields, such as data diagnosis, pattern recognition, numerical optimization, control system design, neural network training, etc., and has achieved good results.

EP is a structured description method, and its essence is to describe problems by a generalized hierarchical computing program. This generalized computing program can dynamically change its structure and size in response to the surrounding state, and has the following characteristics when solving problems: (a) the results are hierarchical; (b) as evolution continues, the individuals constantly develop dynamically towards the answers; (c) the structure and size of the final answers need not be determined or limited in advance, because EP automatically determines them according to the practical environment; (d) the inputs, intermediate results, and outputs are natural descriptions of the problem, so preprocessing of input data and post-processing of output results are needed less, or not at all. Many engineering problems come down to computer programs producing corresponding outputs for given inputs, so EP has important applications in practical engineering fields [39-45].

(3) Evolutionary strategy
In the early 1960s, when Rechenberg and Schwefel carried out wind tunnel experiments, the parameters describing the shape of the test object were difficult to optimize by traditional design methods, so they adopted the idea of biological mutation to change the parameter values randomly, and obtained ideal results. Thereafter, they studied and developed this method in depth, forming another branch of evolutionary computing, the evolutionary strategy (ES) [46]. Currently, ES has two main forms: (μ+λ) selection and (μ,λ) selection. A (μ+λ)-ES produces λ offspring from the μ individuals in the population by means of mutation and crossover, and then compares all μ+λ individuals so as to select the μ best; a (μ,λ)-ES selects the μ best individuals directly from the newly produced λ (λ>μ) offspring. In contrast to GA, ES operates directly in the solution space, emphasizes self-adaptability and diversity of behavior from parents to offspring in the evolution process, and adjusts the search direction and step length adaptively.
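A minimal (μ+λ) selection scheme can be sketched as follows (Python; the sphere objective, the parameter values, and the mutation-only variation are our illustrative assumptions, and the fixed step size omits the self-adaptation mentioned above):

```python
import random

def mu_plus_lambda_es(f, dim=5, mu=5, lam=20, sigma=0.3,
                      generations=200, seed=2):
    """(mu+lambda)-ES: each generation, lam offspring are produced from the
    mu parents by Gaussian mutation, and the best mu of all mu+lam
    individuals survive. Selection operates directly in R^dim."""
    rng = random.Random(seed)
    parents = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(mu)]
    for _ in range(generations):
        offspring = [[x + rng.gauss(0, sigma) for x in rng.choice(parents)]
                     for _ in range(lam)]
        # minimization: sort all mu+lam individuals by objective value
        parents = sorted(parents + offspring, key=f)[:mu]
    return parents[0]

best = mu_plus_lambda_es(lambda v: sum(x * x for x in v))  # sphere function
```

Because the parents compete with their own offspring, a (μ+λ)-ES is elitist: the best objective value never worsens from one generation to the next, whereas a (μ,λ)-ES can temporarily lose the best solution, which helps it escape stale step-size settings.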
1.3.4 Combination of the Three Branches

Fuzzy systems, neural networks, and evolutionary algorithms are considered the three most important and leading-edge areas within the field of artificial intelligence in the 21st century; together they constitute so-called intelligent computing, or soft computing. All of them are theories and methods that imitate biological information processing patterns in order to acquire intelligent information processing ability. Here, a fuzzy system stresses the brain's macro functions, such as language and concept, and logically processes semantic information, including fuzziness, according to membership functions and serial and parallel rules defined by humans. A neural network emphasizes the micro network structure of the brain, and adopts a bottom-up method to deal with pattern information that is difficult to endow with semantics, using complex connections among large numbers of neurons according to a parallel distributed pattern formed by learning, self-organization, and non-linear dynamics. An evolutionary algorithm is a probabilistic search algorithm that simulates the evolutionary phenomena of biology (natural selection, crossover, mutation, etc.); it adopts a natural evolutionary mechanism to perform a complex optimization process, and can solve various difficult problems quickly and effectively. It can be said that fuzzy systems, neural networks, and evolutionary algorithms have similar goals but different methods. Therefore, combining these methods can draw on their individual strengths to offset their weaknesses and form new processing patterns. For example, the learning process of a neural network requires a search in a large space in which many local optima exist, so it is sometimes difficult to solve a large-scale training problem for a neural network.
Meanwhile, a genetic algorithm is very suitable for carrying out large-scale parallel searches and can find a global optimal solution with high probability. Thus, we can improve the performance of the learning algorithm of a neural network by combining it with a genetic algorithm. Combining fuzzy logic with a neural network, we can construct various fuzzy neural network models that not only mimic human logical thinking but also have the ability to learn. For example, the fuzzy computing (reasoning) network we proposed in 1994 can execute a fuzzy semantic network and soakage computing [47]. Furthermore, combining a neural network with a genetic algorithm, we can construct a neural network whose connection weights evolve continually with changes in the environment, simulating biological neural networks much more vividly. This continually evolving neural network can do various things in operation: (a) perceive changes in the environment, change its network parameters correspondingly via evolution (e.g. by adopting an evolutionary algorithm), and find a new network structure and learning algorithm (the key lies in giving the algorithm or structure a proper coding (gene), as well as in the evaluation method for network performance); (b) when the network performance cannot meet demand, automatically start some learning algorithm, improve the parameters or structure of the network, and enhance its self-adaptability.

Interdisciplinary crossing or combination can often lead to the discovery of new technologies and methods and lead to innovation. For example, we can combine fuzzy systems, neural networks, and evolutionary algorithms to establish a fuzzy neural network with evolutionary capability that implements and expresses human intelligent behavior effectively.
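As a toy illustration of the neural network/genetic algorithm combination described above, the following sketch evolves the real-coded connection weights of a tiny feedforward network by elitist selection and Gaussian mutation (Python; the XOR task, the network size, and all parameter values are our illustrative choices, and crossover is omitted for brevity):

```python
import math
import random

def net(x, w):
    """A 2-2-1 sigmoid network; w packs its 9 weights (6 hidden + 3 output)."""
    s = lambda z: 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))
    h1 = s(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = s(w[3] * x[0] + w[4] * x[1] + w[5])
    return s(w[6] * h1 + w[7] * h2 + w[8])

XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def error(w):
    """Squared output error over the training set: the (inverse) fitness."""
    return sum((net(x, w) - t) ** 2 for x, t in XOR)

rng = random.Random(0)
# real coding: each chromosome is the network's weight vector
pop = [[rng.uniform(-3, 3) for _ in range(9)] for _ in range(40)]
for _ in range(300):
    pop.sort(key=error)                       # selection pressure
    pop = pop[:10] + [[g + rng.gauss(0, 0.4) for g in p]
                      for p in pop[:10] for _ in range(3)]
best = min(pop, key=error)
```

No gradient is computed anywhere: the population search replaces backpropagation, which is exactly why such hybrids can escape local optima that trap gradient descent, at the price of many more fitness evaluations.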
1.4 Process Neural Networks

At present, most real-valued artificial neural network models are constructed on the basis of the MP neuron model, and the system inputs are time-independent constants, i.e. the relationship between the inputs and outputs of the network is an instantaneous, point-to-point correspondence. However, research in neurobiology indicates that the output of a synapse is affected by the relative timing of input pulses in a biological neuron and depends on an input process lasting for a certain time. Furthermore, in some practical problems the inputs of many systems are also a process, or functions depending on spatio-temporal change, or even multivariate functions relying on multiple factors; the system outputs are related not only to the current inputs, but also to a cumulative effect over a period of time. When we use a traditional neural network model to deal with the inputs and outputs of a time-varying system, the common method is to convert the time relation into a spatial relation (a time series) before processing. However, this results in rapid expansion of the network size, and traditional neural networks still have difficulty solving the learning and generalization problems posed by large numbers of samples. At the same time, this approach makes it hard to satisfy the real-time demands of the system and to reflect the cumulative effect of time-varying input information on the output. With these problems in mind, we proposed and established a new artificial neural network model, the process neural network (PNN), by extending traditional neural networks to the time domain or even the multi-factor domain. A PNN can directly process time-varying input functions and adapts naturally to many practical process-related problems.
In this monograph, we will discuss the process neural network in depth, studying its theory, related algorithms, and various applications, and addressing a variety of unresolved issues that need further research. Finally, we will extend the neural network to a generalized abstract space, i.e. regard a neural network as a special mapping between points in different (or the same) abstract spaces, and consequently unify all kinds of neural network models proposed by mathematicians in the past.
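The cumulative, time-dependent input-output relation described above can be made concrete: a process neuron replaces the weighted sum of a traditional neuron with a weighted integral of the input function over time. The following sketch (Python; the discretization scheme, the sample input and weight functions, and all names are our illustrative choices) approximates y = f(∫ w(t)x(t)dt − θ) by a Riemann sum:

```python
import math

def process_neuron(x, w, dt, theta=0.0):
    """Discretized process neuron: y = f( integral of w(t)*x(t) dt - theta ),
    with the integral over [0, T] approximated by a Riemann sum over the
    sampled values of the input process x(t) and weight function w(t)."""
    integral = sum(wi * xi for wi, xi in zip(w, x)) * dt
    return 1.0 / (1.0 + math.exp(-(integral - theta)))  # sigmoid activation f

# Input process x(t) = sin(t) and weight function w(t) = e^{-t} on [0, 2*pi]:
n, T = 1000, 2 * math.pi
ts = [i * T / n for i in range(n)]
x = [math.sin(t) for t in ts]
w = [math.exp(-t) for t in ts]
y = process_neuron(x, w, T / n)
```

The whole trajectory of x(t) contributes to the single output y through the integral, so the cumulative effect of the time-varying input is captured by one neuron instead of by the explosion of inputs that a time-series encoding of a traditional network would require.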
References

[1] McCulloch W.S., Pitts W.H. (1943) A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5:115-133
[2] Hebb D.O. (1949) The Organization of Behavior: A Neuropsychological Theory. Wiley, New York
[3] Rosenblatt F. (1958) Principles of Neuro-Dynamics. Spartan Books, New York
[4] Widrow B. (1962) Generalization and information storage in networks of Adaline neurons. In: Self-Organizing Systems. Spartan, Washington DC, pp.435-461
[5] Amari S.A. (1967) Theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers 16(3):299-307
[6] Minsky M.L., Papert S.A. (1969) Perceptrons. MIT Press, Cambridge MA
[7] Amari S. (1972) Characteristics of random nets of analog neuron-like elements. IEEE Transactions on Systems, Man, and Cybernetics 5(2):643-657
[8] Anderson J.A. (1972) A simple neural network generating interactive memory. Mathematical Biosciences 14:197-220
[9] Grossberg S. (1976) Adaptive pattern classification and universal recoding. I: Parallel development and coding of neural feature detectors. Biological Cybernetics 23(3):121-134
[10] Hopfield J.J. (1982) Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, U.S.A. 79:2554-2558
[11] Rumelhart D.E., Hinton G.E., Williams R.J. (1986) Learning representations by back-propagating errors. Nature 323(9):533-536
[12] Hinton G.E., Nowlan S.J. (1987) How learning can guide evolution. Complex Systems 1(3):495-502
[13] Hecht-Nielsen R. (1989) Theory of the back-propagation neural network. Proceedings of the International Joint Conference on Neural Networks 1:593-605
[14] Funahashi K. (1989) On the approximate realization of continuous mappings by neural networks. Neural Networks 2(3):183-192
[15] Hornik K., Stinchcombe M., White H. (1990) Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks 3(5):551-560
[16] Linsker R. (1988) Towards an organizing principle for a layered perceptual network. Neural Information Processing Systems 21(3):485-494
[17] Boser B.E., Guyon I.M., Vapnik V.N. (1992) A training algorithm for optimal margin classifiers. In: Haussler D., Ed. Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory. ACM Press, Pittsburgh, PA, pp.144-152
[18] Vapnik V.N. (1995) The Nature of Statistical Learning Theory. Springer, New York
[19] Vapnik V.N. (1998) Statistical Learning Theory. Wiley, New York
[20] Han M., Wang Y. (2009) Analysis and modeling of multivariate chaotic time series based on neural network. Expert Systems with Applications 36(2):1280-1290
[21] Abdelhakim H., Mohamed E.H.B., Demba D., et al. (2008) Modeling, analysis, and neural network control of an EV electrical differential. IEEE Transactions on Industrial Electronics 55(6):2286-2294
[22] Al Seyab R.K., Cao Y. (2008) Nonlinear system identification for predictive control using continuous time recurrent neural networks and automatic differentiation. Journal of Process Control 18:568-581
[23] Tomohisa H., Wassim M.H., Naira H., et al. (2005) Neural network adaptive control for nonlinear nonnegative dynamical systems. IEEE Transactions on Neural Networks 16(2):399-413
[24] Tomohisa H., Wassim M.H., Naira H. (2005) Neural network adaptive control for nonlinear uncertain dynamical systems with asymptotic stability guarantees. In: 2005 American Control Conference, pp.1301-1306
[25] Ghiassi M., Saidane H., Zimbra D.K. (2005) A dynamic artificial neural network model for forecasting time series events. International Journal of Forecasting 21(2):341-362
[26] Tan Y.H., He Y.G., Cui C., Qiu G.Y. (2008) A novel method for analog fault diagnosis based on neural networks and genetic algorithms. IEEE Transactions on Instrumentation and Measurement 57(11):1221-1227
[27] He X.G., Liang J.Z. (2000) Process neural networks. In: World Computer Congress 2000, Proceedings of Conference on Intelligent Information Processing. Tsinghua University Press, Beijing, pp.143-146
[28] He X.G., Liang J.Z. (2000) Some theoretical issues on procedure neural networks. Engineering Science 2(12):40-44 (in Chinese)
[29] Zadeh L.A. (1965) Fuzzy sets. Information and Control 8:338-353
[30] He X.G. (1989) Weighted fuzzy logic and wide application. Chinese Journal of Computer 12(6):458-464 (in Chinese)
[31] He X.G. (1990) Fuzzy computational reasoning and neural networks. Proceedings of the Second International Conference on Tools for Artificial Intelligence. Herndon, VA, pp.706-711
[32] Holland J. (1975) Adaptation in Natural and Artificial Systems. Univ. of Michigan Press, Ann Arbor
[33] Malheiros-Silveira G.N., Rodriguez-Esquerre V.F. (2007) Photonic crystal band gap optimization by genetic algorithms. Microwave and Optoelectronics Conference, SBMO/IEEE MTT-S International, pp.734-737
[34] Feng X.Y., Jia J.B., Li Z. (2000) The research of fuzzy predicting and its application in train's automatic control. Proceedings of the 13th International Conference on Pattern Recognition, pp.82-86
[35] Gofman Y., Kiryati N. (1996) Detecting symmetry in grey level images: the global optimization approach. Proceedings of 2000 International Workshop on Autonomous Decentralized Systems 1:889-894
[36] Fogarty T.C. (1989) The machine learning of rules for combustion control in multiple burner installations. Proceedings of Fifth Conference on Artificial Intelligence Applications, pp.215-221
[37] Matuki T., Kudo T., Kondo T. (2007) Three dimensional medical images of the lungs and brain recognized by artificial neural networks. SICE Annual Conference, pp.1117-1121
[38] Fogel L.J., Owens A.J., Walsh M.J. (1966) Artificial Intelligence Through Simulated Evolution. Wiley, New York
[39] Swain A.K., Morris A.S. (2000) A novel hybrid evolutionary programming method for function optimization. Proceedings of the 2000 Congress on Evolutionary Computation 1:699-705
[40] Dehghan M., Faez K., Ahmadi M. (2000) A hybrid handwritten word recognition using self-organizing feature map, discrete HMM, and evolutionary programming. Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks 5:515-520
[41] Li X.L., He X.D., Yuan S.M. (2005) Learning Bayesian networks structures from incomplete data based on extending evolutionary programming. Proceedings of 2005 International Conference on Machine Learning and Cybernetics 4:2039-2043
[42] Lieslehto J. (2001) PID controller tuning using evolutionary programming. Proceedings of the 2001 American Control Conference 4:2828-2833
[43] Li Y. (2006) Secondary pendulum control system based on genetic algorithm and neural network. IEEE Control Conference, pp.1152-1155
[44] Jose J.T., Reyes-Rico C., Ramirez J. (2006) Automatic behavior generation in a multi-agent system through evolutionary programming. Robotics Symposium, IEEE 3rd Latin American, pp.2-9
[45] Gao W. (2004) Comparison study of genetic algorithm and evolutionary programming.
Proceedings of 2004 International Conference on Machine Learning and Cybernetics 1:204-209 [46] Back T., Hoffmeister F., Schefel H.P. (1991) A survey of evolution strategies. Proceedings of the Fourth ICGA. Morgan Kaufmann Publishers, Los Altos, CA, pp.2-9 [47] He X.G. (1996) Fuzzy reasoning network and calculation inference . Journal of Software (10):282-287 (in Chinese)
2 Artificial Neural Networks
The modern computer has strong computing and information-processing capabilities, and in some respects it already exceeds the capabilities of the human brain. It plays an important role in human society in daily life, production, and scientific research. However, current computer hardware and software systems are still based on the von Neumann architecture. They can only mechanically solve actual problems by executing predefined programs, and their capability is inferior to that of humans on certain problems, such as adaptive pattern recognition, behavior perception, logical thinking, analysis and processing of incomplete and fuzzy information, and independent decision-making in a complex environment. What is more, they lack the mechanism and capability for adaptive learning from the environment and active adaptation to it. Neurological research indicates that the human brain is an information-processing network formed by the complex interconnection of a huge number of basic units (biological neurons); this network is highly complex, nonlinear, and uncertain, and has a highly parallel processing mechanism. Each neuron cell is a simple information-processing unit whose state is determined by its own conditions and the external environment, and it has a definite input-output transformation mechanism. The human brain has capabilities such as memorizing, computing, logical reasoning and thinking, perceiving and learning from the environment, and evolving with the environment. Therefore, by imitating the organizational structure and the running mechanism of the human brain, we seek new information representation, storage, and processing methods, and construct a new information-processing system, closer to human intelligence, to solve problems that are difficult to solve using traditional methods. This will greatly extend the application areas of computers and promote the advancement of science. It also provides a tentative way to explore a completely new computer architecture.
2.1 Biological Neuron

The biological brain is a complex interconnected network made up of billions of nerve cells (neurons). The human brain has approximately 10^10-10^11 neurons, and each neuron is interconnected with 10^3-10^5 other neurons (including itself). The brain is thus a huge and complex network system. In general, the structure of the neuron can be divided into three parts: soma, dendrites, and axon, as depicted in Fig. 2.1 [1,2].
Fig. 2.1 Biological neuron (labeled parts: dendrites, synapses, soma, axon)
To one side of the soma, many dendrites form a tree shape; on the other side of the soma is the axon. Many branches of the axon connect with dendrites of other neurons. The junction between an axon branch and a dendrite is called a synapse. A neuron accepts electrical or biochemical transmissions from the axon branches of other neurons via its dendrites (input). After weighted processing by the corresponding synapses, the input signals undergo aggregation, superposition, and non-linear activation at the axon hillock at the back of the soma. Under certain conditions (for example, when the intensity of the aggregated signal exceeds a certain threshold value), an output signal is generated by activation. This signal is transferred via the axon branches to the other neurons connected to it, and the next stage of information processing begins. The synapses of the neuron are the key units in neural information processing; they not only transform an input pulse signal into a potential signal, but also have an experience-memory function, and can carry out weighted processing on the input signal according to that memory. The differences in information-processing methods between the brain and the von Neumann architecture are as follows: (a) Their information storage modes are different. The biological brain does not have a separate, centralized storage or arithmetic unit; each neuron combines the functions of storage and computing. Various kinds of information are distributed and stored in the synapses of different neurons, and fine-grained distributed information processing is completed by numerous neurons. (b) The biological brain does not need a program for solving problems; that is, it does not create a model in advance when solving practical problems, but directly changes the memory parameters (connection weights) of the synapses of a neuron to
acquire the knowledge for solving certain problems by learning. (c) The information (the processing object) handled by the biological brain is not completely certain and accurate, but has obvious fuzziness and randomness. The processing object can be either a discrete quantity or a continuous quantity. (d) The processing method used by the biological brain can be a digital method, an analog method, a digital/analog (D/A) organic mixed method, or even a random processing method. Therefore, the brain and the current computer differ greatly in their information-processing methods. With the addition of random processing and D/A mixed processing, the whole process becomes complex, and it is usually non-repeatable. (e) The switching time of a brain neuron is several milliseconds (of the order of 10^-3 s), which is millions of times longer than that of a current computer (of the order of 10^-10 s). However, the human brain can produce an accurate response to a complex stimulus in less than one second. This indicates that although the processing and transmission speed of a single neuron is rather slow, the brain can respond quickly due to its high parallelism. The brain is made up of many simple neurons and is very simple in microstructure, but it can solve very complex problems. More incredibly, the brain has stupendous creativity, and this is worth noting by students of artificial intelligence. It is certain that we can learn much from research on the structure of the brain, e.g. for artificial neural networks. Now let us start by providing a mathematical model of a neuron.
2.2 Mathematical Model of a Neuron

In the above, we have briefly analyzed the structure and information-processing mechanism of the biological neuron, to provide a biological basis for constructing the mathematical model of an artificial neuron. Obviously, it is impossible to simulate faithfully all the characteristics of the biological neuron in a current computer, and we must make various reasonable simplifications. In current research on neural networks, the neuron is the most essential information-processing unit of the network. Generally, its mathematical model can be depicted as in Fig. 2.2.

Fig. 2.2 Artificial neuron model

In Fig. 2.2, x_i (i=1,2,...,n) is the input signal from the ith of n external neurons to neuron j; w_ij is the connection weight between the ith external neuron and neuron j; θ_j is
the activation threshold of neuron j; f is the activation function (also called the effect function, generally non-linear); y_j is the output of this neuron. The relationship between the inputs and the output of a neuron is

y_j = f( Σ_{i=1}^{n} w_ij x_i − θ_j )    (2.1)

where f can be a non-linear activation function, such as a Sign function or a continuous Sigmoid function. It can be seen from the above that the mathematical model of a neuron approximates the information processing of a biological neuron to a certain extent, but it has two disadvantages: (a) The information processing does not involve time. There is no time delay between the inputs and the outputs; the relationship between them is an instantaneous correspondence. (b) The accumulation effect of past inputs on the outputs is not taken into consideration: the output at a given moment depends only on the current inputs, without reference to earlier inputs. Nevertheless, for convenience of discussion, we first consider this simple neuron model and its corresponding neural network.
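As a concrete illustration of Eq. (2.1), the sketch below evaluates a single neuron with both activation choices mentioned above; the weights, inputs, and threshold are arbitrary example values, not taken from the text.

```python
import math

def neuron_output(x, w, theta, f):
    """Single artificial neuron: y = f(sum_i w_i * x_i - theta), Eq. (2.1)."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return f(s)

def sign(u):
    """Hard-limit (Sign) activation."""
    return 1.0 if u >= 0 else -1.0

def sigmoid(u):
    """Continuous Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-u))

# Arbitrary example: three inputs, aggregated signal s = 0.1
x = [0.5, -1.0, 2.0]
w = [0.8, 0.2, 0.1]
theta = 0.3
y_hard = neuron_output(x, w, theta, sign)     # 1.0, since s >= 0
y_soft = neuron_output(x, w, theta, sigmoid)  # about 0.525, in (0, 1)
```

With a Sign activation the neuron acts as a hard classifier of its aggregated input; with a Sigmoid it produces a graded response, which is what makes the gradient-based learning discussed later in this chapter possible.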
2.3 Feedforward/Feedback Neural Networks

Various artificial neural networks are constructed by connecting together several artificial neurons according to a particular topological structure. At present, there are tens of primary neural network models. According to the connection method among the neurons and the direction of information flow in the network, neural network models can be divided into two kinds. One is the feedforward neural network, which has only forward information transfer and no feedback. The other is the feedback neural network, which has not only forward transfer of information, but also reverse (feedback) transfer.
2.3.1 Feedforward/Feedback Neural Network Model

A feedforward neural network is made up of one input layer, several middle layers (hidden layers), and one output layer. A typical structure with a single hidden layer is shown in Fig. 2.3. A feedforward neural network may contain several hidden layers; the neurons of each layer accept output information only from the neurons of the preceding layer.
Fig. 2.3 A feedforward neural network with a single hidden layer (input layer, hidden layer, output layer)
Each directed connection line among the neurons has one connection weight. The connection weight can be zero, which means that there is no connection. For simplicity and uniformity, in the diagram of a feedforward neural network, the neurons of the previous layer are connected with all the neurons of the following layer. Any two neurons in a feedback neural network can be connected, including self-feedback of neurons. A typical structure is shown in Fig. 2.4.

Fig. 2.4 Feedback neural network
In Fig. 2.4, w_ij (solid lines) are the connection weights for forward information transfer between network nodes, and v_ji (dashed lines) are the connection weights for feedback transfer. In this network, not every neuron has an initial input, and the connections between neurons are not complete. In a feedback neural network, the input signal is repeatedly transferred among the neurons from a certain initial state and, after a number of transformations, gradually tends to either a particular steady state or a periodic oscillation state. In current neural network research, the most popular and effective model is the feedforward neural network. It has been quite successful in many domains, such as pattern recognition, classification and clustering, adaptive control and learning, etc. In research on networks combining feedforward and feedback,
due to the complexity of the structure, the problem of feedback information processing must be considered in the operation mode, and in some cases time must even be quantized. There are, therefore, many difficulties but few achievements. However, the information-processing mode of animal brains belongs to this type, and various applications create a strong demand for research on feedback neural networks, so this research is imperative.
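A minimal sketch of the "settling to a steady state" behavior described above is the discrete Hopfield-style feedback network below, with sign units and symmetric weights; the stored pattern and the outer-product weight rule are illustrative choices, not a model from this section.

```python
def sign(u):
    return 1 if u >= 0 else -1

def step(W, s):
    """One synchronous update of a feedback network of sign units."""
    n = len(s)
    return [sign(sum(W[i][j] * s[j] for j in range(n))) for i in range(n)]

def run_to_steady_state(W, s0, max_iters=50):
    """Iterate the feedback dynamics until the state stops changing."""
    s = s0
    for _ in range(max_iters):
        s_next = step(W, s)
        if s_next == s:
            return s
        s = s_next
    return s

# Store one pattern p with the outer-product (Hebbian) rule, zero diagonal
p = [1, -1, 1, -1]
n = len(p)
W = [[0 if i == j else p[i] * p[j] for j in range(n)] for i in range(n)]

# A state with one flipped component relaxes back to the stored pattern
restored = run_to_steady_state(W, [1, 1, 1, -1])
```

Starting from a corrupted state, the feedback dynamics return to the stored pattern and then remain there, i.e. a steady state of the kind mentioned above.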
2.3.2 Function Approximation Capability of Feedforward Neural Networks

When an artificial neural network is applied as a computing model, its computing capability, and what sort of problems it can solve, should be considered first. Second, as learning by a neural network can be regarded as a special process of function fitting or approximation, and a neural network's solution of a problem is generally inexact, the precision of its solution and its function approximation capability should be considered. An example of a MISO (multi-input-single-output) feedforward neural network with a single hidden layer is shown in Fig. 2.5.
Fig. 2.5 MISO feedforward neural network with a single hidden layer
The relationship between the inputs and the outputs from the input layer to the hidden layer is

o_j = f( Σ_{i=1}^{n} w_ij x_i − θ_j ),  j = 1, 2, ..., m    (2.2)
The input-output relationship from the hidden layer to the output layer is

y = g( Σ_{j=1}^{m} v_j o_j − θ )    (2.3)

where v_j is the connection weight from the jth hidden neuron to the output neuron.
Integrating Eqs. (2.2) and (2.3), the mapping relationship between the inputs and output of a feedforward neural network is
y = g( Σ_{j=1}^{m} v_j f( Σ_{i=1}^{n} w_ij x_i − θ_j ) − θ )    (2.4)
In Eqs. (2.2)-(2.4), x_1, x_2, ..., x_n are the multidimensional inputs of the system; o_j (j=1,2,...,m) is the output of the jth neuron in the hidden layer; f is the activation function of the hidden layer; θ_j is the activation threshold of the jth neuron in the hidden layer; θ is the activation threshold of the output neuron; g is the activation function of the output neuron. Obviously, the input-output relationship of a feedforward neural network can be considered as a mathematical function, and the problem of learning can be considered as a special problem of function approximation or fitting, with the class of approximating functions being the set composed of the above networks. Therefore, in order to show that neural network models can solve various application problems, it should be demonstrated in theory that the above models can approximate input-output relationships (mathematical function relationships); otherwise, there is no universality in problem solving. Hitherto, under certain conditions, many approximation theorems for neural networks have already been proved. We now quote some of the famous theorems.

(1) Hecht-Nielsen Approximation Theorem [3]
Suppose that Q is a bounded closed set in R^n. For any ε > 0 and any continuous function f: R^n→R^m (R is the real number set) defined on Q, there exists a feedforward neural network with double hidden layers (shown in Fig. 2.6) whose output y satisfies ||f − y|| < ε.

Fig. 2.6 A feedforward neural network with double hidden layers used for R^n→R^m continuous function approximation

(2) Hornik Approximation Theorems
Hornik Theorem 2: Suppose that the activation function g(·) of the hidden nodes is any continuous non-constant function; then a three-layer feedforward neural network with sufficiently many hidden-layer nodes can approximate any measurable function on R^n with any precision.

(3) Funahashi Approximation Theorem [5]

Suppose that g(·) is a bounded, monotonically increasing, continuous function, D is a compact subset (bounded closed set) of R^n, and F: D→R^m is a continuous mapping. Then for any F and any ε > 0, there is a feedforward neural network f with k (k ≥ 3) layers and hidden-layer activation function g(·) such that

max_{x∈D} ||f(x) − F(x)|| < ε,
where ||·|| is any norm on R^m. The structure of the network is shown in Fig. 2.8.

Fig. 2.8 A feedforward neural network with multiple hidden layers used for D→R^m approximation (outputs y_1, y_2, ..., y_m)
2.3.3 Computing Capability of Feedforward Neural Networks

Computing Capability Theorem: The computing capability of a feedforward neural network is equivalent to that of a Turing machine.

In 1995, Liu and Dai proved that the computing capability of the linear threshold unit neural network is equivalent to that of a Turing machine [6]. As a linear threshold
unit neural network is a quite simple feedforward neural network model, the computing capability of a feedforward neural network whose activation function is a Sigmoid function, a Gauss function, etc. will not be smaller than that of a Turing machine. On the other hand, the operations used in a feedforward neural network are addition, multiplication, and activation-function evaluation, together with their compositions, all of which can be carried out by a Turing machine. Therefore, the computing capability of a feedforward neural network will not be greater than that of a Turing machine. Hence, the computing capability of a feedforward neural network is equivalent to that of a Turing machine.
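One half of this equivalence is easy to see concretely: a single linear threshold unit can realize NAND, which is functionally complete for Boolean logic, so networks of threshold units can implement any finite logic circuit. The weight and threshold values below are one conventional choice, not those used in the cited proof.

```python
def threshold_unit(x, w, theta):
    """Linear threshold unit: outputs 1 iff sum_i w_i * x_i >= theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

def NAND(a, b):
    # One threshold unit: -a - b >= -1.5 fails only when a = b = 1
    return threshold_unit([a, b], [-1, -1], -1.5)

# NAND is functionally complete, so threshold units give all Boolean logic
def NOT(a):    return NAND(a, a)
def AND(a, b): return NAND(NAND(a, b), NAND(a, b))
def OR(a, b):  return NAND(NOT(a), NOT(b))

table = [(a, b, NAND(a, b)) for a in (0, 1) for b in (0, 1)]
# table == [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

Composing such units into layered circuits is exactly what a feedforward network of threshold neurons does, which is why its power is at least that of Boolean circuitry.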
2.3.4 Learning Algorithm for Feedforward Neural Networks

Learning (or training) for a neural network is not simply a matter of memorizing the mapping relationship between the inputs and outputs of the learning samples, but of extracting, from finite sample data, the internal rules about the environment that are hidden in the samples. At present, there are many learning algorithms for feedforward neural networks, among which the error back-propagation algorithm (BP algorithm) and its various improved variants are the most extensively and effectively applied. A multi-layer feedforward neural network that adopts the BP algorithm is generally called a BP network, and its learning process is made up of two parts: forward propagation of input information and error back-propagation. In forward propagation, input information is transferred from the input layer, through processing in the hidden layers, to the output layer; the state of the neurons in each layer influences only the state of the neurons in the next layer. If the expected output is not obtained at the output layer, the process shifts to back-propagation, and error signals are returned along the original connection pathways; on the way back, the connection weights of each layer are modified one by one. Through successive iterations, the error between the expected output signals and the practical output signals of the network is brought within an allowable range. A learning algorithm for a neural network is often related to a function approximation algorithm, especially an iterative algorithm that makes the approximation error gradually smaller. In fact, the BP algorithm corresponds to a gradient descent algorithm in function approximation. Once we know this principle, we can construct various learning algorithms for neural networks according to different function approximation algorithms.
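The two-phase procedure just described can be sketched for a single-hidden-layer network with sigmoid units and squared error; activation thresholds, momentum, batching, and stopping criteria are omitted for brevity, and the network sizes and learning rate are arbitrary illustrative choices.

```python
import math, random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

class TinyBP:
    """Single-hidden-layer network trained by error back-propagation:
    forward pass, then gradient descent on the squared output error.
    Activation thresholds are omitted for brevity."""

    def __init__(self, n_in, n_hid, seed=0):
        rnd = random.Random(seed)
        self.W = [[rnd.uniform(-1, 1) for _ in range(n_in)]
                  for _ in range(n_hid)]                      # input -> hidden
        self.v = [rnd.uniform(-1, 1) for _ in range(n_hid)]   # hidden -> output

    def forward(self, x):
        self.h = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
                  for row in self.W]
        self.y = sigmoid(sum(vj * hj for vj, hj in zip(self.v, self.h)))
        return self.y

    def backward(self, x, target, lr=0.2):
        # Output-layer error signal, then propagate it back to the hidden layer
        dy = (self.y - target) * self.y * (1.0 - self.y)
        dh = [dy * vj * hj * (1.0 - hj) for vj, hj in zip(self.v, self.h)]
        for j in range(len(self.v)):
            self.v[j] -= lr * dy * self.h[j]
            for i in range(len(x)):
                self.W[j][i] -= lr * dh[j] * x[i]

net = TinyBP(n_in=2, n_hid=3)
x, t = [1.0, 0.5], 1.0
err_before = (net.forward(x) - t) ** 2
net.backward(x, t)                     # one forward/backward cycle
err_after = (net.forward(x) - t) ** 2
```

With a suitably small learning rate, one forward/backward pass reduces the squared error on that sample, reflecting the gradient-descent interpretation noted above.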
2.3.5 Generalization Problem for Feedforward Neural Networks Generally , when modeling a certain object using a neural network, the input and output data samples of this object are divided into two groups: one group is called a learning sample; the other group is called a test sample. The learning sample is used for obtaining models by learning and training ; the test sample is used for testing the "generalization error" of the model by testing and learning . If the generalization
error of the model is small, the generalization capability of the model is strong; on the contrary, if the generalization error is large, the generalization capability is weak. The "approximation error" between the practical object and the model is described by the learning error and the generalization error. Strictly, the generalization error of the model should refer to the error between the practical object and all possible input/output samples. Therefore, when the neural network is trained, the reasonable selection of the learning sample has a great influence on the generalization capability of the model. Analyzing the model structure, the generalization capability of a neural network, especially a multi-layer feedforward neural network, is closely related to many factors, such as the degree of complexity of the actual data source, the number and distribution of the learning samples, the structure and scale of the network, the learning algorithm, etc. In conclusion, the generalization capability of a neural network can be improved through two aspects: the network structure and the learning sample set. The network structure mainly concerns how to improve the robustness and fault-tolerance of the network and how to ascertain the proper information capacity of the network, considering the network model, the connection structure of the neurons, the number of hidden layers and of neurons in each hidden layer, the learning algorithm, etc. The learning sample set should be examined for whether it covers all the different situations in the research objectives, whether the distribution of inputs is reasonable, and how many samples are needed to ensure that the generalization error satisfies the demands.
For instance, the following problems are worth studying: (a) If the research object (system) is complex, non-linear, and highly uncertain, and different individuals of a class of objects show obvious differences, we can design a dedicated sampling experiment, enlarge the coverage and density of the sample, and express the non-linear dynamic characteristics of the research object as completely as possible. Thus, we can improve the generalization effect, i.e. diminish the approximation error on the test sample set. (b) As neural network modeling is black-box modeling that depends completely on input and output data, the quality and distribution of the learning sample set are important to the generalization capability of the network. Since in practice we can only obtain a finite data sample under given scope and conditions, and noise pollution and analysis errors reduce the quality of the sample data, we should construct a complete data collection and analysis mechanism when selecting the learning sample, to improve confidence in it. (c) The mismatch between the network scale and the degree of complexity (information capacity) of the practical system is another important factor influencing the generalization capability of the network. At present, the structure and scale of a neural network cannot be ascertained by any mature theory, but have to be decided by experience and repeated experiments. Although neural networks have a general approximation property, the proof of this conclusion is based on the premise of an unrestricted network scale and sample size. If the network scale is too small, the information capacity is low, and the network cannot fully approximate complex objects. If the scale is too large, it will induce over-fitting and reduce the
robustness and fault-tolerance of the network. In some cases, a fuzzy logic system is equivalent to a neural network. Accordingly, in practical applications, we can first obtain the fuzzy relationship between the inputs and outputs of the research object according to prior knowledge and understanding of the practical system. Based on this relationship, the neural network structure can be defined initially, then gradually modified and completed by validation against the sampled data. In this way, the structural and property parameters of the neural network model can correspond well with the system characteristics of the research object. (d) The essence of neural network training is to simulate the mapping relationship between the inputs and the outputs of a practical system in a certain data environment. For a trained network, if the data environment changes markedly, we must retrain the network, i.e. redetermine the new mapping relationship of the research object, to preserve the generalization capability of the network. (e) For a group of given sample data, we should research how to properly divide it into the learning sample and the test sample so as to obtain the minimum approximation error of the neural network through learning on the whole sample set.
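The learning-sample/test-sample bookkeeping described above can be sketched as follows. A least-squares line fit stands in for network training, and the noisy linear data source is an invented toy; only the split-and-estimate workflow is the point.

```python
import random

def make_samples(n, noise, rnd):
    """Toy data source: y = 2x + 1 plus bounded observation noise."""
    return [(x, 2 * x + 1 + rnd.uniform(-noise, noise))
            for x in (rnd.uniform(0, 1) for _ in range(n))]

def fit_line(samples):
    """Least-squares line fit, standing in for network training."""
    n = len(samples)
    sx = sum(x for x, _ in samples); sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples); sxy = sum(x * y for x, y in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def mean_sq_error(model, samples):
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in samples) / len(samples)

rnd = random.Random(1)
data = make_samples(200, noise=0.1, rnd=rnd)
learn, test = data[:150], data[150:]    # learning sample vs. test sample
model = fit_line(learn)
gen_error = mean_sq_error(model, test)  # generalization-error estimate
```

The held-out error estimates how the model behaves on data it never saw; comparing it with the learning error is the basic diagnostic for the over-fitting discussed above.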
2.3.6 Applications of Feedforward Neural Networks

As neural networks need not build accurate mathematical or physical models in advance in order to solve a problem, they are broadly applied in fields that lack prior theory and knowledge, or where it is difficult to build accurate mathematical or physical models, such as scientific research, engineering computing, and other facets of daily life. Thanks to the characteristics of its information-processing mechanism, the feedforward neural network has the following important and successful applications.
(1) Pattern recognition

Pattern recognition is one of the earliest and most successful applications of feedforward neural networks. A neural network can automatically extract and memorize the essential characteristics of various sample patterns by learning the training sample set, form a discriminant function by adaptive combination of multiple characteristics, and solve various complex pattern recognition problems, such as automatic diagnosis of mechanical failure [7], script character recognition [8], discrimination of sedimentary microfacies in petroleum geology [9], and phoneme recognition [10].
(2) Classification and clustering

Classification and clustering are common problems in signal processing and combinatorial analysis. When the classes are known in advance, assigning samples to them is called "classification"; when the number of classes is unknown, merging samples into classes as reasonably as possible is called "clustering". For classification, the BP
network acts as a classifier with a learning and adaptive mechanism, learning and extracting the pattern features of the various classes. For clustering, the classification structure of the research objects need not be known beforehand; objects are grouped according to their similarities, unconstrained by the current level of study of the research objects and prior knowledge. The feedforward neural network adopting a self-organizing competitive learning algorithm (such as the self-organizing mapping neural network, trained without a teacher) is an excellent clusterer, broadly applied in many fields including data mining, association analysis, etc. [11-13]

(3) Forecasting and decision-making
As a feedforward neural network has a learning mechanism with respect to the environment, adaptive capability, and continuity, a network that has learned some knowledge about the related domain acts as a prediction model that can analyze the development trend of an object according to changes in its external conditions. At the same time, a neural network model is based on case learning, and can convert the knowledge and information acquired from learning into facts and rules in the process of reasoning; therefore, it can be used for decision-making. At present, neural networks have been applied to trend prediction in economic development [14], environmental prediction [15,16], intelligent decision support [17], stock market trend prediction and analysis [18,19], earthquake prediction [20], performance forecasting of refrigeration systems [21], etc.

(4) System identification and adaptive control
System identification and adaptive control are other important applications of feedforward neural networks. System identification based on a neural network uses the nonlinear transformation mechanism and adaptability of the network, regarding the network as a model equivalent to the identified system, such that, based on the system's input and output data, the practical system and the identification model produce the same output under the same initial conditions and given inputs. Moreover, a feedforward neural network can serve as the controller of a practical system, exercising effective adaptive control under system uncertainty or disturbance and making the control system achieve the required dynamic and static characteristics [22-24].

(5) Modeling and optimizing
Feedforward neural networks have good learning capability and nonlinear transformation mechanisms. They can effectively accomplish simulation modeling for problems, including sensing systems and automatic production processes, where it is difficult to build accurate models from mathematical formulas. Moreover, they can also be applied to system structure design, optimization, etc. [25,26]. As the feedforward neural network has good function approximation and computing capability, it has also been broadly applied in other practical fields such as scientific computing, image processing [27,28], etc.
2.4 Fuzzy Neural Networks

The signals processed by the biological nervous system are, to some extent, hybrids of fuzzy and qualitative quantities. Their processing is not a simple numerical calculation, but a combination of environmental stimulus signals with the knowledge already existing in the nervous system; the information-processing mechanism of the neural network realizes logical reasoning and computing. A fuzzy neural network can integrate fuzzy logical reasoning with the nonlinear transformation mechanism and learning capability of a neural network, to simulate the information-processing mechanism and process of the biological neural network more closely.
2.4.1 Fuzzy Neurons

There are two kinds of fuzzy neuron models. Model I is obtained by directly fuzzifying a non-fuzzy neuron; Model II is described by fuzzy rules. The structure of Model I, obtained by directly fuzzifying or generalizing the non-fuzzy neuron, is shown in Fig. 2.9.
Fig. 2.9 Structure of fuzzy neuron Model I
In Model I, the inputs, the connection weights, the activation thresholds, the aggregation operation, and the nonlinear activation function (also called the effect function) are all fuzzified, and can be various fuzzy numbers, fuzzy operations, or fuzzy functions, respectively. Therefore, the output of the neuron is fuzzy too. As with the non-fuzzy neuron, this fuzzy neuron performs an aggregation operation on the (fuzzy or precise) inputs after the weighting operation, and then computes the output of the neuron according to the activation threshold and the activation function. Fuzzy neuron Model II is designed according to the weighted fuzzy logic proposed by the authors. Semantically, it denotes a weighted fuzzy logical rule whose premise and conclusion are fuzzy predicates taking fuzzy sets as arguments. In this fuzzy neuron, the input information (fuzzy or precise) is related to the output by a weighted fuzzy logical rule. The reasoning rule denoted by the fuzzy neuron is stored in the structural connection parameters of the neuron and the
Artificial Neural Networks
aggregation operation mechanism. The output predication is composed of the current input predication and the past experience weights according to a certain rule. The structure of Model II is shown in Fig. 2.10.
Fig. 2.10 Structure of fuzzy neuron Model II
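Before moving on, the fuzzification in Model I can be made concrete with a small sketch in which inputs, weights, and threshold are interval-valued stand-ins for fuzzy numbers. The interval arithmetic, the sigmoid activation, and all sample values below are our illustrative assumptions, not the authors' formulation.

```python
import math

def interval_mul(a, b):
    """Product of two intervals (a_lo, a_hi) * (b_lo, b_hi)."""
    products = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    return (min(products), max(products))

def interval_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def fuzzy_neuron_model1(inputs, weights, theta=(0.0, 0.0)):
    """Model I sketch: fuzzified weighted sum, threshold, and activation.
    inputs/weights are lists of intervals (lo, hi) standing in for fuzzy numbers."""
    acc = (0.0, 0.0)
    for x, w in zip(inputs, weights):
        acc = interval_add(acc, interval_mul(w, x))
    # subtract the (interval-valued) activation threshold
    net = (acc[0] - theta[1], acc[1] - theta[0])
    # a monotone activation applied to both bounds yields a fuzzy (interval) output
    sig = lambda u: 1.0 / (1.0 + math.exp(-u))
    return (sig(net[0]), sig(net[1]))
```

When both interval endpoints coincide, the computation degenerates to an ordinary crisp neuron, which mirrors the remark that Model I generalizes the non-fuzzy neuron.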
2.4.2 Fuzzy Neural Networks

Obviously, fuzzy logic has an outstanding feature: it can naturally and directly express the logical meanings habitually used by humans, so it is suitable for direct or high-level knowledge representation. On the other hand, it is difficult for fuzzy logic to express the complex nonlinear transformation relationship between quantitative data and process variation. A neural network can achieve adaptability through a learning mechanism, and automatically acquire knowledge expressed by the available data (accurate or fuzzy). However, since this knowledge is expressed indirectly by the "connection weights" or "activation thresholds" of the network, it is difficult to determine its meaning directly, and semantic interpretation is not straightforward. Clearly, both fuzzy logic and neural networks have advantages and disadvantages, yet the advantages and disadvantages of fuzzy logic and neural computing are complementary in a certain sense. Fuzzy logic is better suited to the top-down analysis and design process when designing intelligent systems, while a neural network is better suited to improving and consummating the performance of an intelligent system from the bottom up after it has been initially designed. Therefore, if fuzzy logic and a neural network can be combined harmoniously, their complementary advantages mean that the inherent disadvantages of one field can be compensated for by the other. Adopting the fuzzy neurons described in the above section to construct neural networks gives one good combination: a knowledge base expressed by fuzzy rules can be conveniently expressed by a network composed of one or more of these fuzzy neurons. Another combination is to adopt fuzzy logical rules to control the structure and the values of the property parameters of a fuzzy neural network.
For example, some learning parameters may change according to fuzzy reasoning rules during the learning or running process of a fuzzy neural network. The parameters u and d in the RPROP algorithm are originally fixed constants. The original algorithm is greatly
Process Neural Networks
improved by adopting a fuzzy control method that lets the parameters change during the run. In fact, the fuzzy control method can be extended to continuously control and modify other components of the neural network, including the connection weights, the activation threshold, the aggregation method, or even the dynamic adjustment of the activation function, etc. From here on, the key is to design, acquire, and ascertain the fuzzy control rules, which is a design problem dependent on the actual application. For instance, in the learning course of a general fuzzy neural network, a method of modifying the fuzzy connection weights that adopts not fuzzy computing but fuzzy logical rules is vital and worth researching. The main difficulty lies in how to produce appropriate fuzzy modification rules according to the semantics of the problem. There are also other methods for combining fuzzy logic and neural networks, for example:
(a) Fuzzy operator neural network [29]. This is a fuzzy neural network model whose neuron aggregation operator is a fuzzy operator satisfying the commutative law, associative law, and zero law, and which uniformly approximates continuous functions;
(b) Monolithic fuzzy neural network [30]. This is a fuzzy neural network model which replaces the operators of the traditional neural network with the operators
⟨∨, ∧⟩;
(c) Simplex and mixed fuzzy neural network [31]. This includes both traditional neurons and fuzzy neurons, and has both accurate and fuzzy information-processing capability;
(d) Fuzzy max-min operator neural network [32]. This is composed of fuzzy max-min operator neurons. The fuzzy max-min operator neuron refers to the following memory storage system

y = ∨_{i=1}^{n} (w_i ∧ x_i), (2.5)

where ⟨∨, ∧⟩ satisfy: for any a, b ∈ A ⊆ [-1, 1],

a ∧ b = sgn(ab)·min(|a|, |b|), a ∨ b = sgn(ab)·max(|a|, |b|),

where

sgn(x) = 1 if x > 0; 0 if x = 0; -1 if x < 0;

x_1, x_2, ..., x_n are the n inputs, x_i ∈ [0, 1]; w_1, w_2, ..., w_n are the connection weights corresponding to the above n input channels, w_i ∈ [-1, 1].
Different combination modes can give rise to different fuzzy neural networks, but there are two main methods according to function, i.e. the combining pattern based on "differentia" and the integration pattern based on "sameness". The former integrates the advantages of both fuzzy logic and neural networks, and makes the fuzzy system or the neural network extend to extra special functions based on the
original function. The latter integrates them based on the similarity between fuzzy systems and neural networks.
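The signed max-min operations behind variant (d) can be written down directly from their definitions; the final max-of-(weight ∧ input) aggregation below is our reading of the neuron model and should be taken as a sketch.

```python
def sgn(x):
    """Sign function: 1 for x > 0, 0 for x = 0, -1 for x < 0."""
    return 1 if x > 0 else (0 if x == 0 else -1)

def fuzzy_and(a, b):
    # a ∧ b = sgn(ab) * min(|a|, |b|)
    return sgn(a * b) * min(abs(a), abs(b))

def fuzzy_or(a, b):
    # a ∨ b = sgn(ab) * max(|a|, |b|)
    return sgn(a * b) * max(abs(a), abs(b))

def max_min_neuron(xs, ws):
    """Aggregate inputs x_i in [0,1] with weights w_i in [-1,1] as
    (w_1 ∧ x_1) ∨ ... ∨ (w_n ∧ x_n)  -- our reading of the model."""
    terms = [fuzzy_and(w, x) for w, x in zip(ws, xs)]
    out = terms[0]
    for t in terms[1:]:
        out = fuzzy_or(out, t)
    return out
```

Note how the sign factor lets a negative weight carry an inhibitory meaning even though the min/max operations themselves act only on magnitudes.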
2.5 Nonlinear Aggregation Artificial Neural Networks

In the aggregation operation of the traditional neuron, the aggregation operator generally takes the linear weighted sum of the input signals. In fact, in the information processing of a biological neuron, the effect of an external perception signal, or of a signal transferred from other neurons, is not completely a linear weighted aggregation, but often a particular nonlinear aggregation. We now consider several effective nonlinear aggregation artificial neural network models.
2.5.1 Structural Formula Aggregation Artificial Neural Networks

In a biological neuron, some input signals produce activation, while others produce inhibition. Consequently, we naturally construct the following artificial neuron mathematical model with structural formula aggregation.
y = f( (Σ_i w_i x_i) / (Σ_i v_i x_i) − θ ), (2.6)
where the numerator Σw_i x_i denotes the activation effect of the input signals on the neuron, and the denominator Σv_i x_i denotes their inhibition effect; the two effects can be adjusted through the connection weight coefficients. When the external input signals only activate and do not inhibit the neuron, Σv_i x_i = 1, and the structural formula aggregation neuron reduces to the traditional neuron model, i.e. the traditional neuron can be regarded as a special case of the structural formula aggregation neuron. The structure of a structural formula aggregation artificial neural network is similar to that of the traditional feedforward neural network; the difference is that the neurons in the network are structural formula aggregation neurons. This network model fits objects whose outputs contain singular values with higher efficiency and delicacy than the general neural network does.
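A numerical sketch of the structural formula neuron of Eq. (2.6); the sigmoid activation and the sample values are illustrative choices of ours.

```python
import math

def structural_neuron(x, w, v, theta=0.0):
    """Structural formula aggregation, Eq. (2.6):
    y = f( sum(w_i x_i) / sum(v_i x_i) - theta ),
    where the numerator carries the excitatory effect and
    the denominator the inhibitory effect."""
    excite = sum(wi * xi for wi, xi in zip(w, x))
    inhibit = sum(vi * xi for vi, xi in zip(v, x))
    return 1.0 / (1.0 + math.exp(-(excite / inhibit - theta)))
```

When the inhibitory sum is forced to 1, the expression collapses to the ordinary weighted-sum neuron, matching the remark above that the traditional neuron is a special case.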
2.5.2 Maximum (or Minimum) Aggregation Artificial Neural Networks

The importance of the external factors that stimulate and influence the neuron is generally different. Under some conditions, a certain important factor may determine the output of the neuron, and thus we can use the following maximum (or
minimum) aggregation artificial neural network model to express this information-processing mechanism.
The maximum aggregation artificial neural network model:

y = f( max_i(w_i x_i) − θ ). (2.7)

The minimum aggregation artificial neural network model:

y = f( min_i(w_i x_i) − θ ). (2.8)
A neural network composed of maximum (or minimum) aggregation neurons is called a maximum (or minimum) aggregation artificial neural network. This model is particularly suited for decision support, sensitive factor analysis, etc.
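Under an (illustrative) choice of a sigmoid for f, Eqs. (2.7) and (2.8) become one-liners; which variant to use depends on whether the dominant factor or the bottleneck factor should drive the decision.

```python
import math

def max_agg_neuron(x, w, theta=0.0):
    """Eq. (2.7): the single strongest weighted input decides the response."""
    return 1.0 / (1.0 + math.exp(-(max(wi * xi for wi, xi in zip(w, x)) - theta)))

def min_agg_neuron(x, w, theta=0.0):
    """Eq. (2.8): the weakest weighted input (the bottleneck) decides."""
    return 1.0 / (1.0 + math.exp(-(min(wi * xi for wi, xi in zip(w, x)) - theta)))
```

For decision-support problems this makes the sensitivity analysis mentioned above direct: only one input channel is active in the gradient at any time.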
2.5.3 Other Nonlinear Aggregation Artificial Neural Networks

In fact, we can construct multiform nonlinear aggregation artificial neural models according to the actual demands of practical problems and the constitutive principles of artificial neural networks. For example
y = f( Σ_i(w_i x_i) / max_i(w_i x_i) − θ ), (2.9)

y = f( Σ_i(w_i x_i) / min_i(w_i x_i) − θ ), (2.10)

y = f( min_i(w_i x_i) / max_i(w_i x_i) − θ ), (2.11)

y = f( max_i(w_i x_i) / min_i(w_i x_i) − θ ), (2.12)

y = f( Π_i(w_i x_i) − θ ), (2.13)

y = f( exp(Π_i(w_i x_i)) − θ ). (2.14)
Different types of aggregation artificial neurons have different information-processing mechanisms for the external input signals. A neural network consisting of the above neurons, or of several different types of neurons arranged in a certain hierarchical structure, can emphasize the different characters of the different neurons in information processing. This is to a certain extent similar to a basis composed of different types of functions in function approximation, and can advance the flexibility and the adaptability of neural networks in solving practical problems.
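The variants of Eqs. (2.9)-(2.14) differ only in the expression inside the activation, so they can be collected behind one interface. The sigmoid, the lazy evaluation, and the small epsilon guarding the denominators are our additions for the sketch.

```python
import math
from functools import reduce

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def nonlinear_agg_neuron(x, w, mode, theta=0.0, eps=1e-12):
    """Nonlinear aggregation variants of Eqs. (2.9)-(2.14) over z_i = w_i x_i."""
    z = [wi * xi for wi, xi in zip(w, x)]
    prod = lambda: reduce(lambda a, b: a * b, z)
    nets = {
        "sum/max": lambda: sum(z) / (max(z) + eps),   # Eq. (2.9)
        "sum/min": lambda: sum(z) / (min(z) + eps),   # Eq. (2.10)
        "min/max": lambda: min(z) / (max(z) + eps),   # Eq. (2.11)
        "max/min": lambda: max(z) / (min(z) + eps),   # Eq. (2.12)
        "product": prod,                              # Eq. (2.13)
        "exp-product": lambda: math.exp(prod()),      # Eq. (2.14)
    }
    return sigmoid(nets[mode]() - theta)
```

Mixing modes across the neurons of one layer is one way to realize the "basis of different function types" analogy drawn above.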
2.6 Spatio-temporal Aggregation and Process Neural Networks
As mentioned above, the artificial neural network (ANN) models that have been researched so far, or are being researched, are mostly based on the theoretical framework of PDP (Parallel Distributed Processing). The inputs of such ANNs are constants independent of time; that is, the inputs at any moment are merely instantaneous, point-type inputs (a value or a vector). However, neurophysiological experiments and biological research indicate that the variation of the output of a synapse is related to the relative timing of the input pulses in a biological neuron, i.e. the output of a neuron depends on an input process that lasts for some time. The output of the neuron is not only related to the spatial aggregation and activation threshold function of the input signals, but also depends on a cumulative effect of the input process over time. Moreover, in practical problems, the inputs of many systems are also processes or functions changing with time. For example, in a real-time control system, the inputs are continuous signals changing with time, and the outputs not only depend on the spatial weighted aggregation of the input signals, but also relate to the temporal cumulative effect over the input process interval. For variational problems, the domain of definition of the functional is generally a process interval related to time. For optimization problems, multifactor optimization that depends on time can also be classified as a case with process inputs.
It can be said that the traditional artificial neuron M-P model preferably simulates the spatial weight aggregation effect and the activation threshold function of biological neurons in information processing, but it lacks another important character of the biological neuron: the temporal cumulative effect [33]. In order to solve problems like dynamic signal processing and nonlinear continuous system control, many scholars have presented neural network models that can process time-varying information, such as delay unit networks [34], spatio-temporal neural models [35], recurrent networks [36], and partial feedback networks [37]. When handling procedural inputs and the time-order dependency of the system, these models usually implement the delay between inputs and outputs by an external time-delay link, i.e. a time-discretized loop network is constructed. However, this complicates the system structure and brings many problems that are difficult to foresee concerning the structure of the learning algorithm of the networks, and the convergence and stability of the algorithm. At the same time, the models and learning algorithms listed above are in essence still based on traditional neural networks, and do not change the information-processing mechanism of the artificial neuron. Therefore, we simulate the processing method of the biological neural system for external input information, and extend the aggregation operation mechanism and the activation mode of the neuron to the time domain. It is of important practical significance to give the artificial neuron the ability to process spatio-temporal 2-D information at one time. In the 1990s, the authors started to research neural networks whose
inputs/outputs are all time-varying processes, and in 2000 the concept and the model of the process neuron and of process neural networks were published for the first time. A process neuron works by simulating the dynamic principle that the external stimulation of a biological neural system may last for some time, and that the biological neuron processes information through the synthesis, coordination, and accumulation of many time-varying input signals over time-delay intervals. The inputs and the weights of the process neuron can both be functions of time (process functions). It adds a temporal cumulative aggregation operator to the spatial aggregation operation of the traditional neuron. Its aggregation operation and activation can simultaneously reflect the spatial aggregation function and the temporal cumulative effect of time-varying input signals, i.e. the process neuron can process spatio-temporal 2-D information at one time. The basic information-processing units composing an ANN system are neurons. The information-processing mechanism of the neuron is the key to the character and the information-processing capability of the neural network. The connection weights of the network can only be adjustable parameters or functions, while the aggregation operations (spatial, temporal) and the activation effect of the activation threshold should be completed inside a neuron. From this point of view, the process neuron preferably simulates the information-processing mechanism of a biological neuron. A process neural network is a network model composed of process neurons and general non-time-varying neurons according to a certain topological structure. Like the traditional neural network, a process neural network can be divided into feedforward and feedback neural networks according to the connection mode and the existence of feedback in the information transfer among neurons.
In fact, according to differences in the topological structure of the network, the mapping relationship between inputs and outputs, the connection weights, the activation threshold styles, and the learning algorithms, we can construct multiform process neural network models to fit different practical problems. The process neural network breaks the synchronous instantaneous limitation that the traditional neural network model imposes on inputs/outputs, which makes the problem more general and the application fields of artificial neural networks broader. Indeed, many practical applications can be classified into these kinds of issues, such as the simulation modeling of nonlinear dynamic systems, nonlinear system identification, control process optimization, classification and clustering of continuous signals, the simulation and control of a polymerization chemical reaction process, fault diagnosis of continuous systems (analysis of fault causes), factor analysis (determination of the primary and secondary factors or causes, also called reverse reasoning), and function fitting and process approximation. The neural network with process inputs is an extension of the traditional artificial neural network into the time domain, and is a generalized artificial neural network model. The traditional artificial neural network can be regarded as a special case of the process neural network, which has broad adaptability for solving the multitudinous problems related to inputs/outputs and processes in practice.
2.7 Classification of Artificial Neural Networks

So far, many kinds of artificial neural network models have been proposed, and each of them has its own structural character and information-processing method. According to the construction elements of the neural network, artificial neural networks can be classified along the following nine dimensions; it can be said that all existing neural networks are covered by these nine dimensions.
(a) Input type. Inputs can be divided into a simple type (integer, real, string, etc.), a structure type (complex number, tuple, etc.), a predication, a function (especially a time-varying function or a multivariate function), or even a point in some functional space or abstract space. Moreover, the above inputs can further be divided into accurate, fuzzy, uncertain, or incomplete inputs, etc.
(b) Output type. Outputs can be divided into a simple type (integer, real, string, etc.), a structure type (complex number, tuple, etc.), a predication, a function (especially a time-varying function or a multivariate function), or even a point in some functional space or abstract space. Moreover, the above outputs can further be divided into accurate, fuzzy, uncertain, or incomplete outputs, etc.
(c) Connection weight type. Connection weights can be divided into a simple type, a structure type, a function (especially a time-varying function or a multivariable function), or even a functional, etc. Moreover, the above connection weights can further be divided into accurate, fuzzy, uncertain, or incomplete connection weights, etc.
(d) Activation threshold type. Activation thresholds can be divided into a simple type, a structure type, a function (especially a time-varying function or a multivariable function), or even a functional, etc. Moreover, the above activation thresholds can further be divided into accurate, fuzzy, uncertain, or incomplete thresholds, etc.
(e) Aggregation function type. Aggregation functions can be divided into arithmetical (further divided into linear and nonlinear), logical, compound, and even functional types, etc. Moreover, the above aggregation functions can further be divided into accurate and fuzzy aggregation functions, etc., including the various aggregation functions built from the T-operator and S-operator in fuzzy mathematics. The whole aggregation process of the neuron on the input signals can be divided into spatial aggregation, multi-factor aggregation, temporal accumulation, etc.
(f) Activation function type. There are many types of activation functions. Generally, they are nonlinear functions or functionals, and can further be divided into accurate and fuzzy activation functions; they can also be time-varying functions.
(g) Connection structure type. Connection structures are generally divided into two classes, i.e. pure feedforward and feedback.
(h) Learning algorithm type. There are many kinds of learning algorithms. According to the operation types they adopt, they can be divided into three kinds: computing (including functionals or computation in abstract spaces), logic, and reasoning.
(i) Processing pattern of time. The processing pattern of time can be divided into a continuous class and a discrete class (also called quantization).
There are two aims of this classification. One is to summarize the existing research production, standardize and systematize it, and at the same time make the understanding of the problems clearer; the other is to highlight, using permutations and combinations of the possible values of the classification factors (a multi-dimensional array composed of the classification factors), those neural network models with significant factor permutations that have not so far been studied or applied. For this purpose there are nine classification factors with thousands of combinations in all, among which many significant combinations exist. We believe that many neural networks corresponding to these combinations have not yet been researched thoroughly and are worth the attention of researchers. We especially point out that proposing this classification of neural networks, which covers the various existing neural networks, is a main contribution of this book. The subject of this book, the "process neural network", is just one kind among these numerous networks. Certainly, it has great importance and significance.
References
[1] Shepherd G.M. (1994) Neurobiology, 2nd Ed. Oxford University Press, New York
[2] Longstaff A. (2004) Instant Notes in Neuroscience, 1st Ed. Bios Scientific Publishers, Oxford
[3] Hecht-Nielsen R. (1989) Theory of the backpropagation neural network. Proceedings of the International Joint Conference on Neural Networks 1:593-605
[4] Hornik K. (1991) Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2):251-257
[5] Funahashi K., Nakamura Y. (1993) Approximation of dynamical systems by continuous time recurrent neural networks. Neural Networks 6(6):801-806
[6] Liu X.H., Dai R.W. (1995) Turing equivalence of neural networks of linear-threshold-logic units. Chinese Journal of Computers 18(6):438-442
[7] Mohamed A., Mazumder M.D.A. (1999) A neural network approach to fault diagnosis in a distribution system. International Journal of Power and Energy Systems 19(2):696-703
[8] Garris M.D., Wilson C.L. (1998) Neural network-based systems for handprinted OCR applications. IEEE Trans Image Processing 7(8):1097-1112
[9] Ran Q.Q., Li S.L., Li Y.Y. (1995) Identification of sedimentary microfacies with an artificial neural network. Petroleum Exploration and Development 22(2):59-63 (in
Chinese)
[10] Schwarz P., Matejka P., Cernocky J. (2006) Hierarchical structures of neural networks for phoneme recognition. ICASSP 2006 Proceedings, IEEE International Conference on Acoustics, Speech and Signal Processing 1:325-328
[11] Wang S.L. (2008) Research on a new effective data mining method based on neural networks. International Symposium on Electronic Commerce and Security, 2008, pp.195-198
[12] Wu X.D. (2004) Data mining: artificial intelligence in data analysis. IEEE/WIC/ACM International Conference on Intelligent Agent Technology 1:569-575
[13] Curtis D. (2007) Comparison of artificial neural network analysis with other multimarker methods for detecting genetic association. BMC Genetics 8(1):49
[14] Wang W., Zhang C. (2000) Applying artificial neural network to the prediction of nonlinear economy. Journal of Systems Engineering 15(2):202-207 (in Chinese)
[15] Zhu C.J., Chen S.J. (2008) Prediction of river water quality using organic gray neural network. Control and Decision Conference, pp.2481-2484
[16] Zhu C.J., Zhou J.H., Ju Q. (2008) Prediction of groundwater quality using organic grey neural network model. The 2nd International Conference on Bioinformatics and Biomedical Engineering, pp.3168-3171
[17] Kuo R.J., Chi S.C. (2002) A decision support system for selecting convenience store location through integration of fuzzy AHP and artificial neural network. Computers in Industry 47(2):199-214
[18] Ye Q., Liang B., Li Y.J. (2005) Amnestic neural network for classification: application on stock trend prediction. Proceedings of 2005 International Conference on Services Systems and Services Management 2:1031-1034
[19] Khoa N.L.D., Sakakibara K., Nishikawa I. (2006) Stock price forecasting using back propagation neural networks with time and profit based adjusted weight factors. International Joint Conference SICE-ICASE, pp.5484-5488
[20] Liu Y., Liu H., Zhang B.F. (2004) Extraction of if-then rules from trained neural network and its application to earthquake prediction. Proceedings of the Third IEEE International Conference on Cognitive Informatics, pp.109-115
[21] Ertunc H.M., Hosoz M. (2006) Artificial neural network analysis of a refrigeration system with an evaporative condenser. Applied Thermal Engineering 26(5-6):627-635
[22] Xia C.L., Qi W.Y., Yang R., Shi T.N. (2004) Identification and model reference adaptive control for ultrasonic motor based on RBF neural network. Proceedings of the CSEE 24(7):117-121 (in Chinese)
[23] Ge S.S., Hong F., Lee T.H. (2003) Adaptive neural network control of nonlinear systems with unknown time-delays. IEEE Trans Automatic Control 48(11):2004-2010
[24] Pajchrowski T., Zawirski K. (2007) Application of artificial neural network to robust
speed control of servodrive. IEEE Transactions on Industrial Electronics 54(1):200-207
[25] Ciuprina G., Ioan D., Munteanu I. (2002) Use of intelligent particle swarm optimization in electromagnetics. IEEE Transactions on Magnetics 38(2):1037-1040
[26] Niu Y.G., Yang C.W. (2001) Mode control for nonlinear uncertainty system of neural network. Information and Control 30(2):139-142 (in Chinese)
[27] Liu B., Brun J. (1999) Solving ordinary differential equations by neural network. Modeling and Simulation: A Tool for the Next Millennium. Proceedings of the 13th European Simulation Multi-conference, Warsaw, Poland 11:437-441
[28] Feng Y., Chen Y.M. (2005) The application of self-organizing neural network in image processing. Process Automation Instrumentation 26(8):32-34 (in Chinese)
[29] He X.G. (1990) Fuzzy computational logic and neural networks. Advancement of Fuzzy Theory and Systems. International Academic Publishers, Beijing D14:1-8
[30] Liang J.Z., He X.G. (2000) Function approximation capabilities of monolithic fuzzy neural networks. Journal of Computer Research and Development 37(9):1045-1049 (in Chinese)
[31] He X.G. (1998) The Theory and Techniques of Fuzzy Knowledge. National Defense Industry Press, Beijing (in Chinese)
[32] Liang J.Z., He X.G. (2001) Turing equivalence of fuzzy max-min operator neural networks. Journal of Beijing University of Aeronautics and Astronautics 14(1):82-85 (in Chinese)
[33] Ou Y.K., Liu W.F. (1997) Theoretical frame based on neural network of biometric-model of nerve cells. Beijing Biomedical Engineering 16(2):93-101 (in Chinese)
[34] Waibel A., Hanazawa T., Hinton G.E., Shikano K., Lang K.J. (1989) Phoneme recognition using time-delay neural networks. IEEE Transactions on ASSP 37(3):328-338
[35] Tsoi A.C. (1994) Locally recurrent globally feedforward networks: a critical review of architectures. IEEE Transactions on Neural Networks 5(2):229-239
[36] Draye J.S., Pavisic D.A., Cheron G.A., Libert G.A. (1996) Dynamic recurrent neural networks: a dynamical analysis. IEEE Trans SMC(B) 26(5):692-706
[37] Hertz J., Krogh A., Palmer R.G. (1991) Introduction to the Theory of Neural Computation. Addison-Wesley Longman Publishing Co., Inc., Boston, MA
3 Process Neurons
In this chapter we begin to discuss in detail the process neural network (PNN), which is the subject of this book. First, the concept of the process neuron is introduced. The process neuron is the basic information-processing unit that constitutes the PNN, and its model and operating mechanism determine the properties and information-processing ability of the PNN. In this chapter we mainly introduce a general definition and the basic properties of the process neuron, and the relationship between the process neuron and mathematical concepts such as compound functions, functionals, etc.
3.1 Revelation of Biological Neurons

Neurophysiological experiments and research in biology indicate that the information-processing characteristics of the biological neural system include the following main aspects: the spatial aggregation function, the multi-factor aggregation function, the temporal accumulation effect, the activation threshold characteristic, self-adaptability, excitation and inhibition characteristics, delay characteristics, and conduction and output characteristics [1-3]. From the definition of the M-P neuron model, we know that the traditional ANN simulates such characteristics of biological neurons as spatial weight aggregation, self-adaptability, and conduction and output, but that it lacks a description of the time delay, the accumulation effect, and the multi-factor aggregation function. In the practical information processing of the biological neural system, the memory and the output of the biological neurons not only depend on the spatial aggregation function of each piece of input information, but are also related to time-delay and accumulation effects, and even to other multi-factor aggregation functions. Therefore, the process neuron model we want to construct should simulate these important information-processing characteristics of biological neurons.
3.2 Definition of Process Neurons

In this section we first define a simple process neuron, which temporarily excludes the multi-factor aggregation ability. This process neuron is made up of four operations: time-varying process (or function) signal input, spatial weighted aggregation, time-effect accumulation, and threshold-activated output. It differs from the traditional M-P neuron model in two ways. First, the inputs, connection weights, and activation threshold of the process neuron can be time-varying functions; second, the process neuron has an accumulation operator, which lets its aggregation operation express both the spatial aggregation of the input signals and the cumulative process of the time effect. The structure of the process neuron model is shown in Fig. 3.1.
Fig. 3.1 A general model of the process neuron
In Fig. 3.1, x1(t), x2(t), ..., xn(t) are the time-varying input functions of the process neuron; w1(t), w2(t), ..., wn(t) are the corresponding weight functions; K(·) is the aggregation kernel function of the process neuron, which can transform and process the input signals according to the inherent character of the actual system; f(·) is the activation function, which is usually a linear function, a Sigmoid function, a Gaussian function, etc. The process neuron can be divided into two basic mathematical models according to the order of the spatial aggregation and temporal accumulation operations. The relationship between the inputs and the output of the process neuron is described below.
Model I:
y = f( Σ( ∫( K(W(t), X(t)) ) ) − θ ). (3.1)
In Eq. (3.1), X(t) is the input function vector, W(t) is the corresponding connection weight function vector, y is the output, θ is the activation threshold (which can also be time-varying), "Σ" denotes some spatial aggregation operation (such as a weighted sum, Max, or Min), and "∫" denotes some temporal accumulation operation (such as the integral over t). The process neuron described by Eq. (3.1) first performs temporal weighted accumulation on the external time-varying input signals, i.e. implements the weighted temporal accumulation of the system output for each time-varying input signal,
then performs spatial aggregation on the temporal accumulation effects, and finally outputs the result through the activation function f. Its structure is shown in Fig. 3.2.
Fig. 3.2 Process neuron model I
Model II:
y = f( ∫( Σ( K(W(t), X(t)) ) ) − θ ). (3.2)
The process neuron denoted by Eq. (3.2) first performs spatial weighted aggregation when carrying out the temporal-spatial aggregation operation, i.e. implements the spatial aggregation of the multiple input signals at each time point, then performs temporal accumulation on the spatial aggregation results, and finally outputs the result through the activation function f. This process neuron is the one more often used in applications. Its structure is shown in Fig. 3.3.
Fig. 3.3 Process neuron model II
It should also be noted that f, K, Σ and ∫ can be diversified operators, and that they are not always exchangeable. Therefore, Model I is not equivalent to Model II. For instance, if we suppose that Σ = weighted sum, ∫ = integral, f = sign, and K(u, v) = u·v, then Eq. (3.1) becomes
y = sign( Σ( ∫ W(t)·X(t) dt ) − θ ), (3.3)

and Eq. (3.2) becomes

y = sign( ∫ Σ( W(t)·X(t) ) dt − θ ). (3.4)
Further, the process neuron can be extended to the case where its inputs and outputs are all time-varying process functions, for example
y(τ) = f( ∫τ( Σ( K(W(t), X(t)) ) ) − θ ), (3.5)

or

y(τ) = f( Σ( ∫τ( K(W(t), X(t)) ) ) − θ ), (3.6)
where "∫τ" is a temporal accumulation operator depending on τ, for instance the integral over the time interval [0, τ] or [τ−k, τ]. This kind of process neuron can be used to constitute complex process neural networks with multiple hidden layers. For brevity, we now use "⊕" and "⊗" to denote respectively the spatial aggregation operator and the temporal accumulation operator in Eqs. (3.1) and (3.2); then the mapping relationship between the inputs and the output of the process neuron denoted by Fig. 3.2 is
y = f((W(t) ⊕ X(t)) ⊗ K(·) − θ),   (3.7)
and the relationship between the inputs and output of a process neuron denoted by Fig. 3.3 is
y = f((W(t) ⊗ X(t)) ⊕ K(·) − θ).   (3.8)
For instance,

W(t) ⊕ X(t) = Σ_{i=1}^n w_i(t)x_i(t),   (3.9)

A(t) ⊗ K(·) = ∫₀ᵀ A(t)K(t)dt,   (3.10)

where [0, T] is the input process interval of the time-varying signals, and K(·) is an integrable function over the interval [0, T]; or, more generally, suppose that K(·) is a functional, and define

A(t) ⊗ K(·) = K(A(t)).   (3.11)
Generally, the weight function W(t) = (w₁(t), w₂(t), ..., w_n(t)) and the temporal weighted kernel function (functional) K(·) are both supposed to be continuous, and actually are in most applications.
In Eq. (3.7), if the spatial aggregation operation is taken as a weighted sum, the temporal (process) accumulation operation is taken as the integral, and K(·) = 1, then the formula can be rewritten as

y = f(∫₀ᵀ Σ_{i=1}^n w_i(t)x_i(t)dt − θ).   (3.12)
The process neuron described by Eq. (3.12) is called a special process neuron, whose operation consists of weighted multiplication, summation, integration, and an activation function. In fact, the spatial aggregation operator "⊕" and the temporal accumulation operator "⊗" can take other operations of various forms. For example, "⊕" can be "max" or "min", or a "T-operator" or "S-operator"; "⊗" can be a convolution, a parameter-varying integration, etc.; the activation function f can be any bounded function. Thus, the process neuron described by Eq. (3.7) or Eq. (3.8) is a very broad class of process neurons and is called the generalized process neuron. The adaptability and the information-processing capability of the process neuron for handling different practical problems mainly depend on the forms of the spatial-temporal aggregation and accumulation operators, which should be carefully selected in practical applications. The process neuron can produce a process memory of the characteristics of the time-varying input signals by learning from the training samples. This process memory is reflected in the connection weight functions of the process neuron.
In Eq. (3.12), if T = 1, x_i(t) = x_i, and w_i(t) = w_i, then it can be simplified as

y = f(Σ_{i=1}^n w_i x_i − θ).   (3.13)

This is a non-time-varying traditional neuron. It is obvious that the traditional neuron is a special case of the process neuron. Next, we will discuss the process neuron and some interrelated mathematical concepts, such as the relationship between neurons, functionals, and multivariate functions.
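As a concrete illustration, the special process neuron of Eq. (3.12) can be evaluated numerically by sampling the input and weight functions on a time grid. The following sketch assumes NumPy; the function name, grid, and test signals are illustrative choices, not from the text.

```python
import numpy as np

def process_neuron(x, w, theta, ts, f=np.tanh):
    """Discretized special process neuron of Eq. (3.12):
    y = f( integral_0^T sum_i w_i(t) x_i(t) dt - theta ).
    x, w : arrays of shape (n, len(ts)) sampling x_i(t) and w_i(t) on the grid ts."""
    integrand = np.sum(w * x, axis=0)   # spatial aggregation at each time point
    u = np.trapz(integrand, ts)         # temporal accumulation over [0, T]
    return f(u - theta)

ts = np.linspace(0.0, 1.0, 201)
x = np.vstack([np.sin(np.pi * ts), np.cos(np.pi * ts)])    # two input functions
w = np.vstack([np.ones_like(ts), 2.0 * np.ones_like(ts)])  # weight functions
y = process_neuron(x, w, theta=0.0, ts=ts)
```

Here ∫₀¹ sin(πt)dt = 2/π and ∫₀¹ 2cos(πt)dt = 0, so y is approximately tanh(2/π).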
3.3 Process Neurons and Functionals

From the definition of the special process neuron, we know that the input of the process neuron is a time-varying function (or function vector), and that the output is a real value. Therefore, from the mathematical perspective the process neuron is actually a kind of functional. Furthermore, in Eq. (3.12), if the activation function f is a linear function and the activation threshold θ = 0, then the process neuron is a linear functional. If we use F to denote the functional relationship represented by the process neuron, we obtain

F(a₁X₁(t) + a₂X₂(t) + ... + a_K X_K(t))
= ∫₀ᵀ W(t)·(a₁(X₁(t))ᵀ + a₂(X₂(t))ᵀ + ... + a_K(X_K(t))ᵀ)dt
= a₁∫₀ᵀ W(t)·(X₁(t))ᵀdt + a₂∫₀ᵀ W(t)·(X₂(t))ᵀdt + ... + a_K∫₀ᵀ W(t)·(X_K(t))ᵀdt
= a₁F(X₁(t)) + a₂F(X₂(t)) + ... + a_K F(X_K(t)),
where X_k(t) = (x_{k1}(t), x_{k2}(t), ..., x_{kn}(t)) is an n-dimensional vector of input functions, W(t) = (w₁(t), w₂(t), ..., w_n(t)) is an n-dimensional vector of weight functions, and a_k is a real constant.
In fact, the process neuron defined by Eq. (3.2) can also be directly extended to the condition of time-varying inputs and outputs, for example

y(t) = f(∫₀ᵗ Σ_{i=1}^n w_i(τ)x_i(τ)dτ − θ(t)).   (3.14)
Then the inputs and outputs of the process neuron are all time-varying functions, i.e. the process neuron denoted by Eq. (3.14) is a functional with variable parameters.
The mapping mechanism of the traditional artificial neuron is a kind of function relationship. Function theory and function approximation methods have greatly advanced research on traditional artificial neural networks. The mapping mechanism of the process neuron is a kind of functional relationship, so we can likewise discuss properties of process neural networks in detail by virtue of functional theory, and study the learning and generalization problems of PNNs by virtue of functional approximation ideas. This is of great significance for research on the mapping mechanisms and applicability of the process neuron.
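The linearity argument above can be checked numerically. This sketch assumes NumPy; the weight vector W(t), the two input function vectors, and the coefficients are arbitrary illustrative choices (identity activation, zero threshold).

```python
import numpy as np

ts = np.linspace(0.0, 1.0, 401)
W = np.vstack([ts, 1.0 - ts])  # an arbitrary 2-dimensional weight function vector

def F(X):
    """Linear functional F(X) = integral_0^1 W(t).X(t) dt (f = identity, theta = 0)."""
    return np.trapz(np.sum(W * X, axis=0), ts)

X1 = np.vstack([np.sin(ts), np.cos(ts)])
X2 = np.vstack([ts**2, np.exp(-ts)])
a1, a2 = 0.7, -1.3
lhs = F(a1 * X1 + a2 * X2)      # F applied to a linear combination of inputs
rhs = a1 * F(X1) + a2 * F(X2)   # the same combination of functional values
```

Because both the weighted sum and the integral are linear, the two sides agree to rounding error.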
3.4 Fuzzy Process Neurons

In practice, we often meet processing problems involving process fuzzy information, such as ECDM process control [4], grinding process fuzzy control system design [5], steam temperature regulation in coal-fired power plants [6], machining process modeling [7], etc. If we define a kind of fuzzy process neuron by combining the information processing method of the process neuron with fuzzy reasoning rules, it will improve the information processing ability of artificial neurons. Two methods can be used to construct a fuzzy process neuron. One is to fuzzify the process neuron directly, combining the nonlinear transformation mechanism that the process neuron applies to time-varying information with fuzzy logical reasoning methods, and establish a fuzzy computing model that can deal with process information. The other is to represent a fuzzy reasoning rule over process information as a fuzzy process neuron, i.e. each fuzzy process neuron denotes one fuzzy process-reasoning rule, so that multiple fuzzy process neurons can constitute a fuzzy process neural network according to a certain structure, i.e. construct a fuzzy process logical reasoning system (rule set). The following discussion focuses on domains with process fuzzy information (fuzzy time-varying systems); a non-fuzzy system can be regarded as a special case of a fuzzy system.
3.4.1 Process Neuron Fuzziness

Suppose that Ã₁, Ã₂, ..., Ã_K are fuzzy sets in a domain U, and the membership functions in the acceptance domain are μ_Ã₁(·), μ_Ã₂(·), ..., μ_Ã_K(·) respectively. The fuzzy process neuron is made up of weighted inputs of fuzzy process signals, a fuzzy aggregation operation, and a fuzzy activation output. Its structure is shown in Fig. 3.4.
Fig. 3.4 Fuzzy process neuron
In Fig. 3.4, the neuron input X̃(t) = (x̃₁(t), x̃₂(t), ..., x̃_n(t)), t ∈ [0, T], can be time-varying functions or process fuzzy information; the connection weights of the fuzzy process neuron W̃(t) = (w̃₁(t), w̃₂(t), ..., w̃_n(t)) can be used to denote membership functions or belief functions; "⊗̃" and "⊕̃" are two fuzzy dual aggregation operators corresponding to spatial aggregation and temporal accumulation respectively, such as max and min, or an S-operator and a T-operator; f is the fuzzy activation function, and ỹ is the output of the fuzzy process neuron. According to Fig. 3.4, the relationship between the inputs and the output of this fuzzy process neuron is

ỹ = f(⊕̃(X̃(t) ⊗̃ W̃(t)) − θ̃(t)).   (3.15)

In Eq. (3.15), θ̃(t) is the fuzzy activation threshold of the fuzzy process neuron, and it can also be a time-varying fuzzy function. As the inputs, connection weights, activation threshold, aggregation/accumulation operations, and activation function of the process neuron are all fuzzified (they can respectively be fuzzy sets, fuzzy operations, and fuzzy functions), the output of the process neuron can be a fuzzy numerical value or a fuzzy function. Similarly to the information processing mechanism of the non-fuzzy process neuron, all the input functions (fuzzy or crisp) of this fuzzy process neuron are correspondingly aggregated/accumulated after weighting, and the output of the neuron is obtained according to the activation threshold and the activation function.
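As one concrete instantiation of Eq. (3.15), the sketch below (assuming NumPy) takes min as the weighting operation, max over the inputs as the spatial aggregation, max over the time grid as the temporal accumulation, and a clip to [0, 1] as the activation. All names and the particular choice of operators are illustrative, not prescribed by the text.

```python
import numpy as np

def fuzzy_process_neuron(x, w, theta=0.0):
    """Illustrative fuzzy process neuron (cf. Eq. 3.15) built from max/min operators:
    weighting  : min(x_i(t), w_i(t)), a fuzzy 'and' of input and weight membership
    spatial    : max over the inputs i at each time point
    temporal   : max over the time grid
    activation : clip the result to [0, 1]."""
    weighted = np.minimum(x, w)      # (n, T) weighted membership degrees
    spatial = weighted.max(axis=0)   # spatial aggregation at each t
    temporal = spatial.max()         # temporal accumulation
    return float(np.clip(temporal - theta, 0.0, 1.0))

ts = np.linspace(0.0, 1.0, 101)
x = np.vstack([0.5 * np.ones_like(ts),        # constant membership degree
               np.abs(np.sin(np.pi * ts))])   # time-varying membership degree
w = np.vstack([0.8 * np.ones_like(ts), 0.6 * np.ones_like(ts)])
y = fuzzy_process_neuron(x, w, theta=0.1)
```

With these inputs the strongest weighted membership is min(1.0, 0.6) = 0.6 at t = 0.5, so the output is 0.6 − 0.1 = 0.5.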
3.4.2 Fuzzy Process Neurons Constructed Using the Fuzzy Weighted Reasoning Rule [8-10]

Denote the process neuron as a weighted fuzzy logical rule in semantics. Its precondition and conclusion include fuzzy predicates over process information. In this fuzzy process neuron, the inputs and output carrying process fuzzy information are connected by a weighted fuzzy logical rule. The knowledge and rules of the domain are stored in the fuzzy connection weights and the aggregation operator, and the output predicates are made up of the combination of the current input predicates and the existing experience weights according to certain rules. One fuzzy process neuron corresponds to one weighted fuzzy logical rule with process information. Its structure is shown in Fig. 3.5.

Fig. 3.5 Fuzzy reasoning process neuron
The process neuron denoted by Fig. 3.5 corresponds to a fuzzy reasoning rule that contains process information, and is denoted as

IF P₁(t)(w₁) ∧ P₂(t)(w₂) ∧ ... ∧ Pₙ(t)(wₙ) THEN Q(t),   (3.16)

where Pᵢ(t), Q(t) (t ∈ [0, T]) are fuzzy logical predicates whose truth values are taken in the interval [−1, 1]; the fuzzy connection weight wᵢ ≥ 0 (which can be a function dependent on time), and Σ_{i=1}^n wᵢ = 1.

The global optimal solution given by the damped Newton method of Eq. (4.27) is the result of Eq. (4.15). Thus, the proof is completed.
4.4.2 Continuity

To show the continuity of the process neural network is to answer whether the mapping relationship of the process neural network is continuous; in other words, whether, when the variation of the network inputs is very small, the variation of the outputs is also very small.

Theorem 4.2 Suppose that the two inputs of a process neural network defined by Eq. (4.1) are respectively X(t), X*(t) ∈ U ⊂ (C[0,T])ⁿ, and the corresponding outputs are y, y* ∈ V ⊂ R. If f, g are continuous, then for any ε > 0 there exists δ > 0 such that when ‖X(t) − X*(t)‖ < δ, |y − y*| < ε holds.
Proof In Eq. (4.1), denote W = max_{i,j} sup_{0≤t≤T} |w_ij(t)| and

u_j = ∫₀ᵀ Σ_{i=1}^n w_ij(t)x_i(t)dt − θ_j.

As g is continuous, for any ε > 0 there exists δ₁ > 0 such that when |Σ_{j=1}^m v_j(f(u_j) − f(u_j*))| < δ₁, |y − y*| < ε holds. In the following, we will prove that for δ₁ > 0 there exists δ > 0 such that when ‖X(t) − X*(t)‖ < δ, this condition is satisfied. Because f is continuous, for δ₁ > 0 there exists δ₂ > 0 such that when

|u_j − u_j*| < δ₂, j = 1, 2, ..., m,   (4.29)

we have

|f(u_j) − f(u_j*)| < δ₁/(m·V), j = 1, 2, ..., m,   (4.30)

where V = max(1, max_j|v_j|). Since |u_j − u_j*| ≤ n·T·W·‖X(t) − X*(t)‖, whenever X(t), X*(t) and the selected δ > 0 satisfy

‖X(t) − X*(t)‖ < δ < δ₂/(n·T·W),   (4.31)

we have |y − y*| < ε. Thus, the proof is completed.
As we all know, a traditional neural network is a continuous model. Actually, a traditional neural network is a special case of the process neural network.
Theorem 4.3 The traditional neural network is a special case of the process neural network.

Proof In

y = g(Σ_{j=1}^m v_j f(∫₀ᵀ Σ_{i=1}^n w_ij(t)x_i(t)dt − θ_j) − θ),

if we let T = 1, x_i(t) = x_i, and w_ij(t) = w_ij, then this can be simplified as

y = g(Σ_{j=1}^m v_j f(Σ_{i=1}^n w_ij x_i − θ_j) − θ).

This is a time-invariant traditional feedforward neural network with a single hidden layer. Thus, the proof is completed.
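The reduction in Theorem 4.3 can be verified numerically: with constant inputs and constant weight functions on [0, 1], the process network's temporal integral collapses to the ordinary weighted sum. A sketch assuming NumPy; all sizes, the seed, and tanh standing in for f are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4
x_const = rng.normal(size=n)        # constant inputs x_i(t) = x_i
W = rng.normal(size=(m, n))         # constant weight functions w_ij(t) = w_ij
v = rng.normal(size=m)
theta = rng.normal(size=m)

def traditional_forward(x):
    """Ordinary single-hidden-layer network: sum_j v_j f(sum_i w_ij x_i - theta_j)."""
    return v @ np.tanh(W @ x - theta)

def process_forward(x_vals, ts):
    """Process network on [0, 1]: the integral of a constant integrand over [0, 1]
    equals the integrand itself (the T = 1 case used in the reduction)."""
    integrand = W @ x_vals                          # shape (m, len(ts))
    u = np.trapz(integrand, ts, axis=1) - theta     # temporal accumulation
    return v @ np.tanh(u)

ts = np.linspace(0.0, 1.0, 101)
x_vals = np.repeat(x_const[:, None], ts.size, axis=1)   # x_i(t) = x_i on the grid
y_proc = process_forward(x_vals, ts)
y_trad = traditional_forward(x_const)
```

The two outputs coincide up to floating-point rounding, as the theorem predicts.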
4.4.3 Functional Approximation Property

The functional approximation capability is an important property of a process neural network, and it determines the applicability and the modeling capability of the process neural network for solving problems. In order to discuss the functional approximation property of the process neural network, two definitions are given as follows.

Definition 4.1 Suppose that K(·): Rⁿ → V ⊂ R is an arbitrary continuous function from Rⁿ to R, denoted K ∈ C(Rⁿ). Define the functional class

Σₙ(K) = {f: U → V | f(x(t)) = ∫₀ᵀ K(x(t))dt, x(t) ∈ U ⊂ Rⁿ, f(x) ∈ V ⊂ R}.

Definition 4.2 Suppose that X(t) = (x₁(t), x₂(t), ..., xₙ(t))ᵀ, where xᵢ(t) ∈ C[0,T], i = 1, 2, ..., n. If |xᵢ(t₁) − xᵢ(t₂)| ≤ Lₓ|t₁ − t₂| with Lₓ ≥ 0 for any t₁, t₂ ∈ [0,T], then xᵢ(t) is said to satisfy the Lipschitz condition; if ‖X(t₁) − X(t₂)‖ ≤ L_X|t₁ − t₂| with L_X ≥ 0, then X(t) is said to satisfy the Lipschitz condition; if ‖K(X(t₁)) − K(X(t₂))‖ ≤ L_K‖X(t₁) − X(t₂)‖, then K(·) ∈ C(Rⁿ) is said to satisfy the Lipschitz condition.

Research on the traditional neural network has already proved the following well-known approximation theorem.

Lemma 4.1 [4] For any continuous function g ∈ C(Rⁿ) there exists a feedforward neural network with only one hidden layer which can approximate g with any chosen accuracy.

Theorem 4.4 (Approximation Theorem 1) For any continuous functional G(x(t)) ∈ Σₙ(K) defined by Definition 4.1 and any ε > 0, if G(x(t)) satisfies the Lipschitz condition, then there exists a process neural network P such that ‖G(x(t)) − P(x(t))‖ < ε.

Proof For any G ∈ Σₙ(K), that is,

G(x(t)) = ∫₀ᵀ K(x(t))dt.   (4.34)

Without loss of generality, let T = 1. K is regarded as a composite function with respect to t, and the integral interval is divided into N equal parts, where tᵢ = i/N (i = 1, 2, ..., N) are the partition points; then
G(x(t)) = Σ_{i=1}^N ∫_{t_{i-1}}^{t_i} K(x(t))dt.   (4.35)

Let the functional G̃(x(t)) = (1/N)Σ_{i=1}^N K(x(tᵢ)) be the approximation of G(x(t)); then

|G(x(t)) − G̃(x(t))| = |Σ_{i=1}^N ∫_{t_{i-1}}^{t_i} K(x(t))dt − (1/N)Σ_{i=1}^N K(x(tᵢ))| ≤ Σ_{i=1}^N |∫_{t_{i-1}}^{t_i} K(x(t))dt − (1/N)K(x(tᵢ))|.   (4.36)

Because K(x(t)) is continuous with respect to t, by the integral mean value theorem there exists ξᵢ ∈ [(i−1)/N, i/N] such that

∫_{t_{i-1}}^{t_i} K(x(t))dt = (1/N)K(x(ξᵢ)).   (4.37)

Therefore,

|G(x(t)) − G̃(x(t))| ≤ (1/N)Σ_{i=1}^N |K(x(ξᵢ)) − K(x(tᵢ))| ≤ (1/N)Σ_{i=1}^N L_K·Lₓ·|ξᵢ − tᵢ| ≤ L_K·Lₓ/N,   (4.38)

where L_K and Lₓ are respectively the Lipschitz constants of K(x) with respect to x and of x(t) with respect to t. Therefore,

G(x(t)) = ∫₀¹ K(x(t))dt = (1/N)Σ_{i=1}^N K(x(tᵢ)) + O(1/N).   (4.39)

Denote x(tᵢ) = x(i). Because K(x(i)): Rⁿ → V is a continuous function in C(Rⁿ), according to Lemma 4.1 it can be approximated by a traditional neural network, and based on Theorem 4.3 this traditional feedforward neural network can certainly be replaced by a process neural network Pᵢ, i.e.

|K(x(i)) − Pᵢ(x(i))| < εᵢ,   (4.40)

where εᵢ > 0 is an arbitrarily small value, i = 1, 2, ..., N. We might as well let εᵢ = ε₀ for all i, and we have
for any ε₁ > 0, there exists a process neural network P₁ which satisfies ‖G₁(X(t)) − P₁(X(t))‖ < ε₁. Theorem 4.5 can be extended to the situation of the process neural network with multiple inputs and multiple outputs: for any continuous functional G₂: U₂ ⊂ C(Rⁿ) → V₂ ⊂ R^L and ε₁ > 0, there exists a process neural network P₂ satisfying ‖G₂(X(t)) − P₂(X(t))‖ < ε₁.
Due to C = (c₁, c₂, ..., c_L) ∈ V₂, there exists P₂ which satisfies ‖C − P₂(X(t))‖ < ε₁.
Define a process neural network P:

P(X(t)) = P₂(X(t)) * B(t),

in which B(t) = (b₁(t), b₂(t), ..., b_L(t)), and "*" denotes the inner product operation. Denote B = max sup_{0≤t≤T}{b₁(t), b₂(t), ..., b_L(t)}. From the definition of G(X(t)) and P, we have

‖G(X(t)) − P(X(t))‖ = ‖y(t) − P(X(t))‖ = ‖y(t) − C*B(t) + C*B(t) − P(X(t))‖
≤ ‖y(t) − C*B(t)‖ + ‖C*B(t) − P(X(t))‖ ≤ ε/2 + ‖C*B(t) − P₂(X(t))*B(t)‖
= ε/2 + ‖(C − P₂(X(t)))*B(t)‖ ≤ ε/2 + ‖C − P₂(X(t))‖·B ≤ ε/2 + ε₁·B.

Here, let ε₁ = ε/(2B); we have ‖G(X(t)) − P(X(t))‖ < ε. Thus, the proof is completed.
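The discretization at the heart of these approximation proofs (Eq. (4.39)) is easy to check numerically. A sketch assuming NumPy, with the illustrative choices K(u) = u² and x(t) = t, for which G(x) = ∫₀¹ t²dt = 1/3 exactly.

```python
import numpy as np

def G_riemann(N):
    """(1/N) sum_{i=1}^N K(x(t_i)) with t_i = i/N, K(u) = u**2, x(t) = t (cf. Eq. 4.39)."""
    ti = np.arange(1, N + 1) / N
    return np.mean(ti**2)

exact = 1.0 / 3.0   # integral_0^1 t**2 dt
errs = {N: abs(G_riemann(N) - exact) for N in (10, 100, 1000)}
# the error shrinks roughly like 1/(2N), matching the O(1/N) bound of Eq. (4.38)
```

For this choice the error is exactly 1/(2N) + 1/(6N²), so doubling N roughly halves the error.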
4.7 Continuous Process Neural Networks

In this kind of process neural network, a process neuron with continuous time functions as its inputs and outputs is first defined. Its spatial aggregation operator is still defined as the spatial weighted sum of the multiple time-varying input signals, and the temporal accumulation operator is taken as a parameter-varying integral over time. In this way, the aggregation/accumulation operations and the activation mode of the process neuron can simultaneously reflect the spatial aggregation effect of the external time-varying input signals and the staged time accumulation effect during the input course, and can also implement a nonlinear real-time (or some-time-unit-delayed) mapping relationship between inputs and outputs [12]. These process neurons can constitute a complex process neural network with multiple hidden layers conforming to certain topological structures. Using the nonlinear transformation mechanism of the artificial neural network to establish the mapping relationship between time-varying inputs and outputs of a complex nonlinear continuous system directly gives it broad adaptability for many practical problems whose inputs and outputs are both continuous time functions.
4.7.1 Continuous Process Neurons

A continuous process neuron is defined as a process neuron with continuous time functions as its inputs and outputs. This process neuron is composed of the operations of time-varying input signal weighting, spatial aggregation, temporal accumulation, and activation output. The spatial aggregation operator adopts the weighted sum of the multiple input signals, and the temporal accumulation operator adopts a parameter-varying integral over time. The structure of the continuous process neuron is shown in Fig. 4.8.
Fig. 4.8 Continuous process neuron

In Fig. 4.8, x₁(t), x₂(t), ..., xₙ(t) are the continuous time-varying input functions of the process neuron; w₁(t), w₂(t), ..., wₙ(t) are the corresponding connection weight functions; "Σ" is the spatial aggregation operator of the process neuron, taken as the weighted sum of the multiple input signals; "∫" is the temporal accumulation operator of the process neuron, which adopts a parameter-varying integral over time; f(·) is the activation function, which can be a Sigmoid function, a Gauss function, or any other form of bounded function. In Fig. 4.8, the mapping relationship between the inputs and the output of the continuous process neuron is

y(t) = f(∫₀ᵗ Σ_{i=1}^n wᵢ(τ)xᵢ(τ)dτ − θ(t)),   (4.58)
where θ(t) is the activation threshold of the process neuron and is a time-dependent function. In fact, the spatio-temporal aggregation operators of the continuous process neuron can also take other forms, e.g. corresponding to the input time point t, the spatial aggregation operation can adopt a maximum or a minimum operation, or an S-operator or a T-operator; the temporal accumulation operator can be a convolution, a maximum or a minimum operation, etc. over the interval [0, t]. As seen from Eq. (4.58), the continuous process neuron model expresses both the spatial weighted aggregation of the time-varying input signals and the accumulation of the staged time effect of the inputted time-varying signals before time t, and can realize a synchronous mapping relationship between the inputs and the output. Taking account of spatio-temporal aggregation with several time units of delay, Eq. (4.58) can be extended and rewritten as
y(t) = f(∫₀^{t−kδ} Σ_{i=1}^n wᵢ(τ)xᵢ(τ)dτ − θ(t)),   (4.59)

where δ is the time granularity, k is a non-negative integer, and t − kδ ≥ 0. The process neurons defined by Eqs. (4.58) and (4.59) can be used to establish a complex process neural network model with multiple hidden layers, in which the time-varying information flow transfers in a real-time or a delayed mode in each layer of the network.
4.7.2 Continuous Process Neural Network Model

According to a certain topological structure, process neurons defined by Eq. (4.58) or Eq. (4.59) and other types of neurons can constitute process neural networks with continuous time functions as inputs and outputs. Neurons of the same type have the same structure, share the same learning algorithm, and carry out the same aggregation/accumulation operations in the network. At the same time, the information transferred between hidden layers of neurons should meet the input/output signal type required by each kind of neuron in the network model. In order to simplify the discussion, consider a feedforward continuous process neural network model with one hidden layer of process neurons defined by Eq. (4.58), whose output-layer activation function is linear. Fig. 4.9 shows the topological structure of the network.
Fig. 4.9 Continuous process neural network

In Fig. 4.9, x₁(t), x₂(t), ..., xₙ(t) are the continuous input functions of the process neural network; w_ij(t) (i = 1,2,...,n; j = 1,2,...,m) is the connection weight function between input layer node i and hidden node j; vⱼ(t) (j = 1,2,...,m) is the connection weight function from hidden node j to the output node, which can also be a time-invariant adjustable parameter; y(t) is the output of the system. According to Fig. 4.9, the mapping relationship between the inputs and the output of the network is

y(t) = Σ_{j=1}^m vⱼ(t) f(∫₀ᵗ Σ_{i=1}^n w_ij(τ)xᵢ(τ)dτ − θⱼ(t)),   (4.60)
where [0, T] is the input process interval of the time-varying signals, f is the uniform activation function of the hidden process neurons, and θⱼ(t) is the activation threshold function of hidden process neuron node j.
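A discretized forward pass of the network in Eq. (4.60) can be sketched as follows, assuming NumPy; the cumtrapz helper implements the parameter-varying integral ∫₀ᵗ, and all shapes, weights, and test signals are illustrative.

```python
import numpy as np

def cumtrapz(y, ts):
    """Cumulative trapezoidal integral: integral_0^t y(tau) dtau on the grid ts."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(ts))
    return out

def continuous_pnn(x, W, v, theta, ts, f=np.tanh):
    """Continuous process network of Eq. (4.60):
    y(t) = sum_j v_j(t) f( integral_0^t sum_i w_ij(tau) x_i(tau) dtau - theta_j(t) ).
    x: (n, T) input samples; W: (m, n, T); v, theta: (m, T) on the grid ts."""
    y = np.zeros_like(ts)
    for j in range(W.shape[0]):
        integrand = np.sum(W[j] * x, axis=0)    # sum_i w_ij(tau) x_i(tau)
        u = cumtrapz(integrand, ts) - theta[j]  # parameter-varying integral
        y += v[j] * f(u)
    return y

ts = np.linspace(0.0, 1.0, 201)
x = np.vstack([np.sin(2 * np.pi * ts), ts])
W = np.ones((2, 2, ts.size))    # constant weight functions, m = n = 2
v = np.ones((2, ts.size))
theta = np.zeros((2, ts.size))
y = continuous_pnn(x, W, v, theta, ts)
```

The output is itself a function on the grid; at t = 0 the integral vanishes, so y(0) = 0 here.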
4.7.3 Continuity, Approximation Capability, and Computing Capability of the Model
Theorem 4.10 (Continuity Theorem) Suppose that two inputs to the continuous process neural network defined by Eq. (4.60) are X(t), X*(t) ∈ U ⊂ (C[0,T])ⁿ and the corresponding outputs are respectively y(t), y*(t) ∈ Z ⊂ C[0,T]. If f is continuous, then for any ε > 0 there exists δ > 0 such that when ‖X(t) − X*(t)‖ < δ, ‖y(t) − y*(t)‖ < ε holds.

Proof Denote W = max_{i,j} sup_{0≤t≤T}|w_ij(t)| and uⱼ(t) = ∫₀ᵗ Σ_{i=1}^n w_ij(τ)xᵢ(τ)dτ − θⱼ(t), with uⱼ*(t) defined correspondingly for X*(t). For any δ₁ > 0, as long as X(t), X*(t) satisfy

‖X(t) − X*(t)‖ < δ₁,   (4.66)

then

‖uⱼ(t) − uⱼ*(t)‖ ≤ n·T·W·δ₁, j = 1, 2, ..., m.   (4.67)

Because f is continuous, for any δ₂ > 0, from the arbitrariness of δ₁, there exists δ₁ > 0 such that
‖f(uⱼ(t)) − f(uⱼ*(t))‖ < δ₂, j = 1, 2, ..., m.   (4.68)

Therefore,

‖y(t) − y*(t)‖ = ‖Σ_{j=1}^m vⱼ(t)f(∫₀ᵗ Σ_{i=1}^n w_ij(τ)xᵢ(τ)dτ − θⱼ(t)) − Σ_{j=1}^m vⱼ(t)f(∫₀ᵗ Σ_{i=1}^n w_ij(τ)xᵢ*(τ)dτ − θⱼ(t))‖
= ‖Σ_{j=1}^m vⱼ(t)(f(uⱼ(t)) − f(uⱼ*(t)))‖ ≤ V·Σ_{j=1}^m ‖f(uⱼ(t)) − f(uⱼ*(t))‖ < V·m·δ₂,   (4.69)

where V = max_j sup_{0≤t≤T}|vⱼ(t)|. So for any ε > 0, the conclusion holds if the selected δ > 0 makes δ₂ ≤ ε/(V·m).

Its network structure, connection weight functions, and activation thresholds are the same as those of Pₖ*. Obviously, P̃ₖ is a process neural network (P̃ₖ: (C[t_{k-1}, t_k])ⁿ → C[t_{k-1}, t_k]) defined on [t_{k-1}, t_k] with continuous time functions as its inputs and outputs. From Eq. (4.60) and the integral mean value theorem, we have
P̃ₖ(X^(k)(t)) = Σ_{j=1}^m vⱼ^(k) f(∫_{t_{k-1}}^t Σ_{i=1}^n w_ij^(k)(τ)xᵢ^(k)(τ)dτ − θⱼ^(k))
= Σ_{j=1}^m vⱼ^(k) f(Σ_{i=1}^n w_ij^(k)(ξₖ)xᵢ^(k)(ξₖ)(t − t_{k-1}) − θⱼ^(k)), ξₖ ∈ [t_{k-1}, t_k],   (4.79)

where vⱼ^(k) and w_ij^(k)(t) are the connection weights (functions) and θⱼ^(k) is the activation threshold of the process neural network P̃ₖ. Moreover,

Gₖ*(X^(k)(t)) = Pₖ*(X^(k)(t)) + εₖ = Σ_{j=1}^m vⱼ^(k) f(∫_{t_{k-1}}^{t_k} Σ_{i=1}^n w_ij^(k)(τ)xᵢ^(k)(τ)dτ − θⱼ^(k)) + εₖ
= Σ_{j=1}^m vⱼ^(k) f(Σ_{i=1}^n w_ij^(k)(ξ̄ₖ)xᵢ^(k)(ξ̄ₖ)(t_k − t_{k-1}) − θⱼ^(k)) + εₖ, ξ̄ₖ ∈ [t_{k-1}, t_k],   (4.80)

so

‖Gₖ*(X^(k)(t)) − P̃ₖ(X^(k)(t))‖
= ‖Σ_{j=1}^m vⱼ^(k) f(Σ_{i=1}^n w_ij^(k)(ξ̄ₖ)xᵢ^(k)(ξ̄ₖ)(t_k − t_{k-1}) − θⱼ^(k)) − Σ_{j=1}^m vⱼ^(k) f(Σ_{i=1}^n w_ij^(k)(ξₖ)xᵢ^(k)(ξₖ)(t − t_{k-1}) − θⱼ^(k))‖ + εₖ
≤ ‖Σ_{j=1}^m vⱼ^(k) (f(Σ_{i=1}^n w_ij^(k)(ξ̄ₖ)xᵢ^(k)(ξ̄ₖ)(t_k − t_{k-1}) − θⱼ^(k)) − f(Σ_{i=1}^n w_ij^(k)(ξₖ)xᵢ^(k)(ξₖ)(t − t_{k-1}) − θⱼ^(k)))‖ + εₖ.   (4.81)
As f is continuous, for the given ε/(2N) there exists δ₁ > 0 such that whenever

|Σ_{i=1}^n w_ij^(k)(ξ̄ₖ)xᵢ^(k)(ξ̄ₖ)(t_k − t_{k-1}) − θⱼ^(k) − (Σ_{i=1}^n w_ij^(k)(ξₖ)xᵢ^(k)(ξₖ)(t − t_{k-1}) − θⱼ^(k))|
≤ |Σ_{i=1}^n (w_ij^(k)(ξ̄ₖ)xᵢ^(k)(ξ̄ₖ) − w_ij^(k)(ξₖ)xᵢ^(k)(ξₖ))|·|t_k − t_{k-1}| < δ₁,

there is

‖Gₖ*(X^(k)(t)) − P̃ₖ(X^(k)(t))‖ ≤ ε/(2N);   (4.82)

here we only need |t_k − t_{k-1}| to be small enough.
Next, a process neural network P with continuous functions as its inputs and outputs is constructed on the interval [0, T]. Denote

G*(X(t)) = Σ_{k=1}^N Gₖ*(X^(k)(t)).
Define an activation function g(t, Λ) as follows: for any t ∈ [0, T], when t ∈ (t_{k-1}, t_k], g = f(Λₖ), where Λₖ is the spatio-temporal aggregation result for the input signals in the interval [t_{k-1}, t] of the process neuron defined by Eq. (4.58) without an activation threshold; f is the Sigmoid function, the Gauss function, or any form of bounded function; k = 1, 2, ..., N. A process neural network P is defined like this: w_ij(t) is the connection weight function between an input node and a hidden layer node; vⱼ(t) is the connection weight function between a hidden layer node and the output node; the activation threshold function is θⱼ(t); and the activation function is g(t, Λ); that is,

P(X(t)) = Σ_{k=1}^N P̃ₖ(X^(k)(t)),
and then

‖G(X(t)) − P(X(t))‖ = ‖G(X(t)) − G*(X(t)) + G*(X(t)) − P(X(t))‖
= ‖G(X(t)) − Σ_{k=1}^N Gₖ*(X^(k)(t)) + Σ_{k=1}^N Gₖ*(X^(k)(t)) − Σ_{k=1}^N P̃ₖ(X^(k)(t))‖
≤ ‖G(X(t)) − Σ_{k=1}^N Gₖ*(X^(k)(t))‖ + ‖Σ_{k=1}^N Gₖ*(X^(k)(t)) − Σ_{k=1}^N P̃ₖ(X^(k)(t))‖
≤ ε/2 + Σ_{k=1}^N ‖Gₖ*(X^(k)(t)) − P̃ₖ(X^(k)(t))‖ ≤ ε/2 + N·(ε/(2N)) = ε.
Theorem 5.1 Suppose that {e_l(t)} (l = 0, 1, 2, ...) is an orthonormal basis function system, and that

x(t) = Σ_{l=0}^∞ x_l e_l(t),  w(t) = Σ_{l=0}^∞ w_l e_l(t);

then the following integral formula holds:

∫ x(t)w(t)dt = Σ_{l=0}^∞ x_l w_l.   (5.36)
Proof By the orthonormality of {e_l(t)},

∫ x(t)w(t)dt = ∫ (Σ_{l=0}^∞ x_l e_l(t))(Σ_{s=0}^∞ w_s e_s(t))dt = Σ_{l=0}^∞ Σ_{s=0}^∞ x_l w_s ∫ e_l(t)e_s(t)dt = Σ_{l=0}^∞ x_l w_l.

Thus, the proof is completed.
Suppose the input process interval of the process neural network is [0,1]. Through variable substitution, all variables in the input functions and connection weight functions of the network can be transformed into variables in [0, 2π], so the orthogonality and the completeness of the Fourier function system can be used directly. Consider the process neural network shown in Fig. 5.1.
Fig. 5.1 Narrow-sense process neural network
The input-output mapping relationship of the network is

y = Σ_{j=1}^m vⱼ f(∫₀^{2π} Σ_{i=1}^n w_ij(t)xᵢ(t)dt − θⱼ).
Given K learning samples (x₁ᵏ(t), x₂ᵏ(t), ..., xₙᵏ(t), d_k), k = 1, 2, ..., K, where d_k is the expected output of the system when the inputs are x₁ᵏ(t), x₂ᵏ(t), ..., xₙᵏ(t), and supposing that the actual output of the network corresponding to the kth sample input is y_k (k = 1, 2, ..., K), the error function can be defined as

E = Σ_{k=1}^K (y_k − d_k)².   (5.37)
A finite Fourier orthogonal basis expansion is implemented for the sample functions x₁ᵏ(t), x₂ᵏ(t), ..., xₙᵏ(t) to yield

xᵢᵏ(t) = Σ_{l=0}^L x_{il}ᵏ e_l(t), i = 1, 2, ..., n,   (5.38)

where L is the number of Fourier basis function items, which satisfies the precision requirement for the input functions. The network connection weight function w_ij(t) (i = 1, 2, ..., n; j = 1, 2, ..., m) is also expressed as an expansion over the finite Fourier basis functions:

w_ij(t) = Σ_{l=0}^L w_ij^(l) e_l(t).   (5.39)
Substitute Eqs . (5.38) and (5.39) into Eq. (5.37), and according to the conclusion in Theorem 5.1, the error function can be simplified as:
(5.40)
The connection weight parameters of the network can be determined by adopting a learning algorithm similar to the one described in Section 5.2 .
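The reduction that makes Eq. (5.40) work, replacing the integral ∫x(t)w(t)dt by the coefficient inner product Σ_l x_l w_l of Theorem 5.1, can be verified with an orthonormal Fourier basis on [0, 2π]. A sketch assuming NumPy; the two band-limited test functions are arbitrary choices made so the check is exact up to quadrature error.

```python
import numpy as np

def fourier_basis(ts, L):
    """Orthonormal Fourier basis on [0, 2*pi]:
    1/sqrt(2*pi), cos(k t)/sqrt(pi), sin(k t)/sqrt(pi), k = 1..L."""
    rows = [np.ones_like(ts) / np.sqrt(2 * np.pi)]
    for k in range(1, L + 1):
        rows.append(np.cos(k * ts) / np.sqrt(np.pi))
        rows.append(np.sin(k * ts) / np.sqrt(np.pi))
    return np.vstack(rows)

ts = np.linspace(0.0, 2 * np.pi, 4001)
E = fourier_basis(ts, L=5)
x = 1.0 + np.cos(ts) - 0.5 * np.sin(3 * ts)   # band-limited 'input function'
w = 0.3 - np.sin(ts) + np.cos(2 * ts)         # band-limited 'weight function'

cx = np.trapz(E * x, ts, axis=1)   # expansion coefficients x_l
cw = np.trapz(E * w, ts, axis=1)   # expansion coefficients w_l
integral = np.trapz(x * w, ts)     # left side of Eq. (5.36)
series = float(cx @ cw)            # right side of Eq. (5.36)
```

For these functions the only surviving cross term is the constant one, so both sides equal 0.6π.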
5.4 Learning Algorithm Based on the Walsh Function Transformation

The Walsh function system is a complete and normalized orthogonal function system, and it has two forms, namely continuous transformation and discrete transformation [11]. Therefore, if the Walsh function system is selected as the basis functions, the learning algorithm introduced in Section 5.2 adapts well to systems whose inputs are analytic functions or discrete time sequences.
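For reference, discrete Walsh functions can be generated as the rows of a Hadamard matrix built by the Sylvester recursion (this gives Hadamard ordering; the sequency ordering used in the text is a row permutation of it). A sketch assuming NumPy.

```python
import numpy as np

def walsh_matrix(p):
    """N x N matrix of +/-1 entries, N = 2**p, built by the Sylvester recursion;
    row k sampled at i/N plays the role of wal(k, i/N), up to row ordering."""
    H = np.array([[1]])
    for _ in range(p):
        H = np.block([[H, H], [H, -H]])
    return H

H = walsh_matrix(3)   # N = 8 discrete Walsh functions of length 8
N = H.shape[0]
gram = H @ H.T        # pairwise inner products of the rows
```

The Gram matrix equals N times the identity, which restates the orthogonality facts proved below as Lemmas 5.1 and 5.2: distinct rows have inner product 0, and each row has squared norm N.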
5.4.1 Learning Algorithm Based on Discrete Walsh Function Transformation

(1) Discrete Walsh transformation

When there are N discrete sample data in the interval [0,1] (generally N = 2^p, where p is a positive integer), the discrete Walsh transformation pair is

Xᵢ = Σ_{k=0}^{N-1} x_k wal(k, i/N), i = 0, 1, ..., N−1, N = 2^p,   (5.41)

x_k = (1/N) Σ_{i=0}^{N-1} Xᵢ wal(k, i/N), k = 0, 1, ..., N−1, N = 2^p,   (5.42)

where wal(k, i/N) is the Walsh basis function, whose value domain is {−1, +1}; k is the sequency, i is the discrete normalized time variable, x_k is the original data, and Xᵢ is the transformed data.

Lemma 5.1 In the interval [0,1], the inner product of two discrete Walsh functions with different sequencies is 0, that is,
Σ_{i=0}^{N-1} wal(j, i/N)·wal(k, i/N) = 0, j ≠ k, N = 2^p.   (5.43)
Proof According to the definition of the discrete Walsh function, wal(j,t)·wal(k,t) = wal(j⊕k, t), where ⊕ is the Xor operator. Because each Walsh function can be denoted as a linear combination of finitely many Haar functions, and the sum of each Haar function other than har(0,0,t) over the N discrete points in [0,1] is 0, the lemma holds.

Lemma 5.2 In the interval [0,1], the inner product of two discrete Walsh functions with the same sequency is equal to N, that is,

Σ_{i=0}^{N-1} wal²(j, i/N) = N, j < N, N = 2^p.   (5.44)
Proof According to the definition, the value of any discrete Walsh function at i/N (i = 0, 1, ..., N−1) in [0,1] is 1 or −1, and becomes 1 after being squared. Therefore, the inner product of a discrete Walsh function with itself over the N discrete points is equal to N.

Theorem 5.2 For any two continuous functions x(t), w(t), suppose that their sequence values at the N = 2^p uniform discrete points in [0,1] are respectively xᵢ, wᵢ (i = 0, 1, ..., 2^p − 1); then the following integral formula holds:

∫₀¹ x(t)w(t)dt = lim_{N→∞} Σ_{i=0}^{N-1} wal(xᵢ)·wal(wᵢ), N = 2^p,   (5.45)

where xᵢ = x(tᵢ), wᵢ = w(tᵢ), and wal(xᵢ) and x(tᵢ) form the discrete Walsh transformation pair, i = 0, 1, ..., N−1.

Proof Suppose tᵢ = i/N (i = 0, 1, ..., N−1) are the N = 2^p equal division points in [0,1]. According to the definition of an integral, we have

∫₀¹ x(t)w(t)dt = lim_{N→∞} Σ_{i=0}^{N-1} x(tᵢ)w(tᵢ)Δtᵢ.   (5.46)
In the following, Eq. (5.46) will be proved correct. According to the definition of the discrete Walsh transformation,

Σ_{i=0}^{N-1} wal(xᵢ)·wal(wᵢ) = Σ_{i=0}^{N-1} wal(x(tᵢ))·wal(w(tᵢ))
= Σ_{i=0}^{N-1} ((1/N)Σ_{j=0}^{N-1} x(tⱼ)wal(j, i/N))((1/N)Σ_{k=0}^{N-1} w(t_k)wal(k, i/N))
= (1/N²)Σ_{i=0}^{N-1}Σ_{j=0}^{N-1} x(tⱼ)w(tⱼ)wal²(j, i/N) + (1/N²)Σ_{i=0}^{N-1}Σ_{j=0}^{N-1}Σ_{k=0,k≠j}^{N-1} x(tⱼ)w(t_k)wal(j, i/N)wal(k, i/N).

From Lemma 5.1, we have

(1/N²)Σ_{j=0}^{N-1}Σ_{k=0,k≠j}^{N-1} x(tⱼ)w(t_k)(Σ_{i=0}^{N-1} wal(j, i/N)wal(k, i/N)) = 0,

and also from Lemma 5.2, we have

Σ_{i=0}^{N-1} wal(xᵢ)·wal(wᵢ) = (1/N²)Σ_{j=0}^{N-1} x(tⱼ)w(tⱼ)·N = Σ_{j=0}^{N-1} x(tⱼ)w(tⱼ)Δtⱼ,
Next, we will derive the learning algorithm based on the discrete Walsh transformation for the process neural network using the conclusion of Theorem 5.2. As the input process interval [0,71 can be converted into [0,1] through variable substitution, we will only discuss the situation when the input process interval is [0,1]. When the input functions of the network are analytic functions, the input functions are discretized into the sequence whose length is 2P within the interpolation precision. When the input functions are discrete time data, if the length of the sequence is not 2P, the corresponding length of the sequence can be obtained by smooth interpolation . In the interval [0,1], give K learning samples with sequence length of 2P
where tl=lIN, and dk is the expected output of the system corresponding to the inputs Xkl(t/), Xk2(tD, ... , Xkn(t/) (1=0,1, . . ., 2P-I). Implementing the discrete Walsh transformation on the learning sample, we have
Corresponding to the system inputs Xkl(t/), xdt/), ... , xdtD (1=0,1 , ... , 2P-I) , the input-output relationship of the process neural network corresponding to Eq. (5.1) is
L wjjl)bl(t), L
where wij(t) =
and bl(t), b 2(t), ... , bL(t) are a group of finite basis
1=1
functions in space C[O,n Let b1(t), b2(t), ... , bL(t) be Walsh basis functions, then from Theorem 5.2, we have
y_k = Σ_{j=1}^m vⱼ f(Σ_{i=1}^n Σ_{l=0}^{N-1} wal(x_{ki}(t_l))·w_ij^(l) − θⱼ),   (5.49)

where y_k is the actual output corresponding to the kth learning sample. The error function is defined as

E = Σ_{k=1}^K (Σ_{j=1}^m vⱼ f(Σ_{i=1}^n Σ_{l=0}^{N-1} wal(x_{ki}(t_l))·w_ij^(l) − θⱼ) − d_k)².   (5.50)
where wal(x_{ki}(t_l)) (l = 0, 1, ..., 2^p − 1) is the Walsh transformation sequence of the ith component of the kth learning sample. The learning rules for the network connection weights and the activation thresholds using the gradient descent algorithm are

vⱼ = vⱼ + αΔvⱼ, j = 1, 2, ..., m;   (5.51)

w_ij^(l) = w_ij^(l) + βΔw_ij^(l), i = 1, 2, ..., n; j = 1, 2, ..., m; l = 0, 1, ..., N−1;   (5.52)

θⱼ = θⱼ + γΔθⱼ, j = 1, 2, ..., m,   (5.53)

where w_ij^(l) is the coefficient of w_ij(t) corresponding to the basis function wal(t_l) in the discrete Walsh basis function expansion, and α, β, γ are learning rate constants. For convenience, denote u_kj = Σ_{i=1}^n Σ_{l=0}^{N-1} wal(x_{ki}(t_l))·w_ij^(l) − θⱼ, and then

Δvⱼ = −∂E/∂vⱼ = −2 Σ_{k=1}^K (y_k − d_k) f(u_kj),   (5.54)

Δw_ij^(l) = −∂E/∂w_ij^(l) = −2 Σ_{k=1}^K (y_k − d_k) vⱼ f′(u_kj) wal(x_{ki}(t_l)),   (5.55)

Δθⱼ = −∂E/∂θⱼ = 2 Σ_{k=1}^K (y_k − d_k) vⱼ f′(u_kj).   (5.56)
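The update rules (5.51)-(5.56) can be put together into a complete, if schematic, training loop. The sketch below assumes NumPy, uses tanh for f, a Hadamard-ordered Walsh matrix, and random synthetic data; every name, size, and constant here is an illustrative assumption, not part of the text's algorithm statement.

```python
import numpy as np

def walsh_matrix(p):
    """2**p x 2**p matrix of +/-1 rows (Hadamard-ordered discrete Walsh functions)."""
    H = np.array([[1.0]])
    for _ in range(p):
        H = np.block([[H, H], [H, -H]])
    return H

rng = np.random.default_rng(1)
p, n, m, K = 3, 2, 4, 20
N = 2**p
H = walsh_matrix(p)

X = rng.normal(size=(K, n, N))   # sampled input functions x_ki(t_l)
d = rng.normal(size=K)           # expected outputs d_k
Xw = X @ H.T / N                 # discrete Walsh coefficients, cf. Eq. (5.42)

Wc = 0.1 * rng.normal(size=(m, n, N))   # coefficients w_ij^(l)
v = 0.1 * rng.normal(size=m)
theta = np.zeros(m)

def forward(xw):
    u = np.einsum('mnl,nl->m', Wc, xw) - theta   # u_kj of the text
    h = np.tanh(u)
    return h, v @ h

def sse():
    return sum((forward(Xw[k])[1] - d[k]) ** 2 for k in range(K))

lr = 0.02
e0 = sse()
for _ in range(300):             # batch gradient descent, Eqs. (5.51)-(5.56)
    gv = np.zeros_like(v); gW = np.zeros_like(Wc); gt = np.zeros_like(theta)
    for k in range(K):
        h, y = forward(Xw[k])
        err = y - d[k]
        gv += 2.0 * err * h                         # dE/dv_j
        delta = 2.0 * err * v * (1.0 - h ** 2)      # back-propagated through tanh
        gW += np.einsum('m,nl->mnl', delta, Xw[k])  # dE/dw_ij^(l)
        gt += -delta                                # dE/dtheta_j
    v -= lr * gv; Wc -= lr * gW; theta -= lr * gt
e1 = sse()
```

After training, the sum-of-squares error e1 is smaller than the initial error e0, which is all this sketch is meant to show.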
The corresponding learning algorithm is described as follows.
Step 1 Convert the input functions (analytic functions or discrete sample data) into discrete time sequences of length N = 2^p, and implement the discrete Walsh transformation on the input sequences according to Eqs. (5.41) and (5.42);
Step 2 Denote the network learning error precision by ε, the accumulated learning iteration count by s = 0, and the maximal number of learning iterations by M;
Step 3 Initialize the connection weights and activation thresholds of the
network: vⱼ, w_ij^(l), θⱼ, i = 1, 2, ..., n; j = 1, 2, ..., m; l = 0, 1, ..., N−1;
Step 4 Calculate the error function E according to Eq. (5.50); if E < ε or s > M, stop, otherwise continue.

∫₀¹ wal²(l,t)dt = 1, l = 0, 1, 2, ...,   (5.61)

wal(l,t)·wal(s,t) = wal(l ⊕ s, t),   (5.62)
where l ⊕ s denotes the Xor operation of two nonnegative integers.

Theorem 5.3 For any two continuous functions x(t), w(t), the following integral formula holds:

∫₀¹ x(t)w(t)dt = Σ_{l=0}^∞ (∫₀¹ x(t)wal(l,t)dt)(∫₀¹ w(t)wal(l,t)dt).   (5.63)

Proof Let p = l ⊕ s; then according to the operation property of Xor, we have
p = 0 when l = s; p ≠ 0 when l ≠ s. From the Walsh function integral properties, Eqs. (5.59) and (5.60), we have

∫₀¹ wal(l ⊕ s, t)dt = 1 if l = s, and 0 if l ≠ s.
Thus, according to the definition of a continuous Walsh transformation,

$$\begin{aligned}
\int_0^1 x(t)w(t)\,dt &= \int_0^1\left(\sum_{l=0}^{\infty}\left(\int_0^1 x(t)\mathrm{wal}(l,t)\,dt\right)\mathrm{wal}(l,t)\right)\left(\sum_{s=0}^{\infty}\left(\int_0^1 w(t)\mathrm{wal}(s,t)\,dt\right)\mathrm{wal}(s,t)\right)dt\\
&= \int_0^1\sum_{l=0}^{\infty}\sum_{s=0}^{\infty}\left(\int_0^1 x(t)\mathrm{wal}(l,t)\,dt\right)\left(\int_0^1 w(t)\mathrm{wal}(s,t)\,dt\right)\mathrm{wal}(l,t)\,\mathrm{wal}(s,t)\,dt\\
&= \sum_{l=0}^{\infty}\sum_{s=0}^{\infty}\left(\int_0^1 x(t)\mathrm{wal}(l,t)\,dt\right)\left(\int_0^1 w(t)\mathrm{wal}(s,t)\,dt\right)\int_0^1\mathrm{wal}(l\oplus s,t)\,dt\\
&= \sum_{l=0}^{\infty}\left(\int_0^1 x(t)\mathrm{wal}(l,t)\,dt\right)\left(\int_0^1 w(t)\mathrm{wal}(l,t)\,dt\right).
\end{aligned}$$

Thus, the proof is completed.
(2) Learning algorithm
Next, we will derive a learning algorithm for a process neural network based on the continuous Walsh function transform, using Theorem 5.3. Assume $K$ learning samples $(x_{k1}(t),x_{k2}(t),\ldots,x_{kn}(t),d_k)$, where $k=1,2,\ldots,K$ and $d_k$ is the expected output. Transforming the input sample functions $x_{k1}(t),x_{k2}(t),\ldots,x_{kn}(t)$ by a continuous Walsh transformation gives

$$\left(\sum_{l=0}^{N}a_{1l}^{k}\mathrm{wal}(l,t),\ \sum_{l=0}^{N}a_{2l}^{k}\mathrm{wal}(l,t),\ \ldots,\ \sum_{l=0}^{N}a_{nl}^{k}\mathrm{wal}(l,t)\right), \qquad(5.64)$$

where $N$ is a positive integer satisfying the precision requirement of the continuous Walsh basis function expansion, and $a_{il}^{k}$ is the Walsh basis function expansion coefficient of $x_{ki}(t)$ determined by Eq. (5.58). Suppose the continuous Walsh transform of the connection weight function $w_{ij}(t)$ is
$$\left(\sum_{l=0}^{N}w_{1j}^{(l)}\mathrm{wal}(l,t),\ \sum_{l=0}^{N}w_{2j}^{(l)}\mathrm{wal}(l,t),\ \ldots,\ \sum_{l=0}^{N}w_{nj}^{(l)}\mathrm{wal}(l,t)\right),\quad j=1,2,\ldots,m, \qquad(5.65)$$

where $w_{ij}^{(l)}$ is the expansion coefficient of $w_{ij}(t)$ corresponding to $\mathrm{wal}(l,t)$. Consider a process neural network with a single process neuron hidden layer and a linear activation function in the output layer. By Theorem 5.3 and the orthogonality of the Walsh basis functions, when the input functions are $x_{k1}(t),x_{k2}(t),\ldots,x_{kn}(t)$, the input-output relationship of the process neural network described by Eq. (5.1) is

$$y_k=\sum_{j=1}^{m}v_j f\!\left(\sum_{i=1}^{n}\sum_{l=0}^{N}a_{il}^{k}w_{ij}^{(l)}-\theta_j\right). \qquad(5.66)$$
Define the error function

$$E=\sum_{k=1}^{K}\left(y_k-d_k\right)^{2}. \qquad(5.67)$$
In a way similar to the training of process neural networks based on the discrete Walsh function transformation, the network connection parameters $w_{ij}^{(l)}$, $v_j$ and the activation thresholds $\theta_j$ can be determined by the gradient descent algorithm. The corresponding learning algorithm is described as follows.
Step 1 Determine the number $N$ of Walsh basis functions according to the input-function fitting precision required by the learning sample set, and transform the input functions by a continuous Walsh transformation according to Eqs. (5.57) and (5.58);
Step 2 Denote the network learning error precision by $\varepsilon$, set the accumulative learning iteration count $s=0$, and denote the maximal number of learning iterations by $M$;
Step 3 Initialize the connection weights and activation thresholds $v_j$, $w_{ij}^{(l)}$, $\theta_j$, $i=1,2,\ldots,n$; $j=1,2,\ldots,m$; $l=0,1,\ldots,N-1$;
Step 4 Calculate the error function $E$ according to Eq. (5.67). If $E<\varepsilon$ or $s\ge M$, go to Step 6;
Step 5 Modify the connection weights and activation thresholds according to Eqs. (5.51)-(5.56); $s+1\to s$; go to Step 4;
Step 6 Output the learning result and stop.
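The Parseval-type identity of Theorem 5.3, which underlies Eq. (5.66), can be checked numerically on a dyadic grid. The sketch below is ours, not the book's: it builds a Hadamard-ordered Walsh matrix (ordering does not matter for the identity) and compares the grid inner product of two sampled functions with the sum of products of their Walsh coefficients.

```python
import numpy as np

def walsh_matrix(p):
    """Rows are the 2**p Hadamard-ordered Walsh functions sampled on a dyadic grid."""
    H = np.array([[1.0]])
    for _ in range(p):
        H = np.block([[H, H], [H, -H]])
    return H

# Sample two "continuous" functions on a dyadic grid of [0, 1).
p = 6
N = 2 ** p
t = (np.arange(N) + 0.5) / N
x = np.exp(-t)
w = t ** 2

H = walsh_matrix(p)
a = H @ x / N                 # Walsh coefficients of x (cf. Eq. (5.58))
b = H @ w / N                 # Walsh coefficients of w

lhs = np.mean(x * w)          # grid approximation of the integral of x(t)w(t)
rhs = np.sum(a * b)           # the coefficient sum on the right of Eq. (5.63)
```

Since $H H^{\mathsf T} = N I$, the two quantities agree exactly on the grid, mirroring how the double sum collapses to a single sum in the proof of Theorem 5.3.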
5.5 Learning Algorithm Based on Spline Function Fitting

Spline function fitting is a piecewise polynomial interpolation method proposed by Schoenberg in 1946 [12]. The spline function has a simple structure, good flexibility and smoothness, and favorable approximation properties for both analytic functions and discrete time sequences. Therefore, the connection weight functions of process neural networks can be represented as spline functions. During network training, by learning the time-invariant parameters of the spline functions, a process neural network whose connection weights are spline functions can gradually approximate the input-output mapping relationship of the real system, completing the training of the process neural network.
5.5.1 Spline Function

Suppose there are $N+1$ time partition points $t_0,t_1,t_2,\ldots,t_N$ in the input process interval $[0,T]$, where $t_0=0$, $t_N=T$. Let $x(t)$ be a time-varying function defined on $[0,T]$ whose values at the partition points are $x(t_0),x(t_1),x(t_2),\ldots,x(t_N)$. Then, on the interpolation interval $[t_{l-1},t_l]$, the spline function is defined as follows.

Definition of linear spline function

$$S_l(t)=\frac{x(t_l)-x(t_{l-1})}{h_l}\,(t-t_{l-1})+x(t_{l-1}),\quad l=1,2,\ldots,N. \qquad(5.68)$$

Definition of quadratic spline function (5.69)

Definition of cubic spline function (5.70)

where $M_l=S_l''(t_l)$ and $h_l=t_l-t_{l-1}$, with the remaining constants fixed by the interpolation conditions $S_l(t_l)=x(t_l)$ at the partition points. The spline functions of Eqs. (5.68)-(5.70) can be rearranged by powers of $t$ and expressed as polynomials, giving the following forms.

Form of linear spline (5.71)
Form of quadratic spline (5.72)

Form of cubic spline (5.73)

where the polynomial coefficients in Eqs. (5.71)-(5.73) satisfy the continuity and an appropriate degree of smoothness of the spline functions at the interpolation points (i.e., continuity of derivatives up to a certain order).
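The linear spline of Eq. (5.68) can be evaluated directly from the partition points. The following sketch (function name ours, sample data assumed) locates the interval $[t_{l-1},t_l]$ containing each query point and evaluates the corresponding segment formula:

```python
import numpy as np

def linear_spline(t, knots, values):
    """Evaluate the linear spline of Eq. (5.68) at points t.

    knots: t_0 < t_1 < ... < t_N; values: x(t_0), ..., x(t_N).
    On [t_{l-1}, t_l]: S(t) = (x(t_l)-x(t_{l-1}))/h_l * (t - t_{l-1}) + x(t_{l-1}).
    """
    knots = np.asarray(knots, dtype=float)
    values = np.asarray(values, dtype=float)
    # index l-1 of the interval containing each t (clipped at the ends)
    idx = np.clip(np.searchsorted(knots, t, side="right") - 1, 0, len(knots) - 2)
    h = knots[idx + 1] - knots[idx]
    return (values[idx + 1] - values[idx]) / h * (t - knots[idx]) + values[idx]

# Hypothetical input function x(t) = sin(pi*t) sampled at N = 8 equal intervals
knots = np.linspace(0.0, 1.0, 9)
values = np.sin(np.pi * knots)
tt = np.linspace(0.0, 1.0, 101)
approx = linear_spline(tt, knots, values)
```

The maximum deviation of the piecewise-linear approximant from the smooth target shrinks quadratically with the interval length $h_l$, which is why even the lowest-degree spline can meet a fitting precision requirement if $N$ is large enough.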
5.5.2 Learning Algorithm Derivation

Consider a process neural network with a single process neuron hidden layer and a linear activation function in the output layer. Suppose $t_0,t_1,\ldots,t_N$ are $N+1$ interpolation points satisfying the precision requirement of the spline interpolation fitting of the input functions $x_1(t),x_2(t),\ldots,x_n(t)$ on the system input process interval $[0,T]$. Because $x_i(t)$ is known, each $x_i(t)$ $(i=1,2,\ldots,n)$ can be denoted by piecewise spline functions as in Eqs. (5.68)-(5.70) (the degree of the spline may be chosen according to the complexity of the input function) using mature spline fitting methods [13] from numerical analysis, and can further be rewritten as piecewise interpolation polynomials as in Eqs. (5.71)-(5.73). The connection weight functions in the network training are also expressed as piecewise spline interpolation polynomials. Accordingly, when the input functions and the connection weight functions are both denoted by piecewise spline functions, the input-output relationship of the network is

$$y=\sum_{j=1}^{m}v_j f\!\left(\sum_{l=1}^{N}\sum_{i=1}^{n}\int_{t_{l-1}}^{t_l}w_{ijl}^{(s)}(t)\,x_{il}^{(s)}(t)\,dt-\theta_j\right), \qquad(5.74)$$

where $w_{ijl}^{(s)}(t)$ and $x_{il}^{(s)}(t)$ are, respectively, the spline functions of the network connection weight function $w_{ij}(t)$ and of the input function $x_i(t)$ on the interpolation interval $[t_{l-1},t_l]$, and $s$ is the degree of the spline functions. Given $K$ learning sample functions $(x_1^{k}(t),x_2^{k}(t),\ldots,x_n^{k}(t),d_k)$, where $k=1,2,\ldots,K$ and $d_k$ is the expected output of the system when the inputs are $x_1^{k}(t),x_2^{k}(t),\ldots,x_n^{k}(t)$. The input functions $x_1^{k}(t),x_2^{k}(t),\ldots,x_n^{k}(t)$ and the connection weight functions $w_{ij}(t)$ of the network are denoted as spline fitting functions (spline interpolation polynomials) on each interpolation interval $[t_{l-1},t_l]$; then, on $[t_{l-1},t_l]$, the spline interpolation polynomial forms of $x_i^{k}(t)$ and $w_{ij}(t)$ are as
follows.
For linear spline fitting,

$$x_{il}^{k}(t)=a_{1il}^{k}t+a_{0il}^{k},\quad l=1,2,\ldots,N;\ k=1,2,\ldots,K;\ i=1,2,\ldots,n, \qquad(5.75)$$

$$w_{ijl}(t)=w_{1ijl}t+w_{0ijl},\quad l=1,2,\ldots,N;\ i=1,2,\ldots,n;\ j=1,2,\ldots,m. \qquad(5.76)$$

For quadratic spline fitting,

$$x_{il}^{k}(t)=a_{2il}^{k}t^{2}+a_{1il}^{k}t+a_{0il}^{k},\quad l=1,2,\ldots,N;\ k=1,2,\ldots,K;\ i=1,2,\ldots,n, \qquad(5.77)$$

$$w_{ijl}(t)=w_{2ijl}t^{2}+w_{1ijl}t+w_{0ijl},\quad l=1,2,\ldots,N;\ i=1,2,\ldots,n;\ j=1,2,\ldots,m. \qquad(5.78)$$

For cubic spline fitting,

$$x_{il}^{k}(t)=a_{3il}^{k}t^{3}+a_{2il}^{k}t^{2}+a_{1il}^{k}t+a_{0il}^{k},\quad l=1,2,\ldots,N;\ k=1,2,\ldots,K;\ i=1,2,\ldots,n, \qquad(5.79)$$

$$w_{ijl}(t)=w_{3ijl}t^{3}+w_{2ijl}t^{2}+w_{1ijl}t+w_{0ijl},\quad l=1,2,\ldots,N;\ i=1,2,\ldots,n;\ j=1,2,\ldots,m. \qquad(5.80)$$

In the above equations, $w_{rijl}$ denotes the coefficient of $t^{r}$ $(r=0,1,2,3)$ in the spline interpolation polynomial of $w_{ij}(t)$: the first subscript indicates the $t^{r}$ term of the interpolation polynomial, the second the serial number of the network input node, the third the serial number of the hidden-layer node, and the fourth the interpolation interval $[t_{l-1},t_l]$. Similarly, $a_{ril}^{k}$ denotes the coefficient of $t^{r}$ in the interpolation polynomial of $x_i^{k}(t)$: the first subscript indicates the $t^{r}$ term, the second the serial number of the network input node, and the third the interpolation interval $[t_{l-1},t_l]$. As $x_i^{k}(t)$ is known, the piecewise spline fitting form of $x_i^{k}(t)$ is
determinate, and the connection weight functions are denoted as piecewise interpolation polynomials during network training. The network error function is defined as follows:

$$E=\sum_{k=1}^{K}\left(\sum_{j=1}^{m}v_j f\!\left(\sum_{l=1}^{N}\sum_{i=1}^{n}\int_{t_{l-1}}^{t_l}w_{ijl}(t)\,x_{il}^{k}(t)\,dt-\theta_j\right)-d_k\right)^{2}. \qquad(5.81)$$

In the following, only the case $s=2$ (quadratic spline functions) is derived; the cases $s=1$ and $s=3$ are similar. Substituting the quadratic forms of Eqs. (5.77) and (5.78) into Eq. (5.81) expands each integral into a polynomial in the spline coefficients (5.82); collecting terms, Eq. (5.82) can be reformulated as (5.83). It can be seen from Eq. (5.83) that the error function is a function only of the network parameters $v_j$, $\theta_j$ and $w_{rijl}$, so the network training can be accomplished by a method such as the gradient descent algorithm. The specific algorithm steps are not repeated here.
5.5.3 Analysis of the Adaptability and Complexity of the Learning Algorithm

In the learning algorithm based on piecewise spline fitting, since the spline function has good flexibility and smoothness, using piecewise spline functions as the network connection weight functions improves the nonlinear mapping ability of the input-output relationship of a process neural network [14]. However, this learning algorithm first needs to determine a proper number of subintervals of the input process interval and the degree of the spline function according to the complexity of the input functions (or of the real system). At the same time, the input functions themselves need piecewise spline fitting, which adds a pretreatment step before network training in actual applications. In addition, the number of parameters to be adjusted grows with both the number of interpolation intervals and the degree of the spline function. If the number of network input nodes is $n$, the number of hidden-layer nodes is $m$, the number of interpolation partition points is $N$, and the degree of the spline function is $s$, then the number of parameters to be determined is $n\times m\times N\times(s+1)+2m$, so the computation during network training grows multiplicatively with $n$, $m$, $N$ and $s$. Therefore, it is important to choose the number of interpolation points and the degree of the spline properly. Nevertheless, simulation experiments show that in practical applications this algorithm adapts broadly and is an effective method for training process neural networks.
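The parameter count $n\times m\times N\times(s+1)+2m$ can be tabulated to see how quickly the training problem grows; a trivial sketch (function name and example sizes are ours):

```python
def spline_network_params(n, m, N, s):
    """Free parameters in the spline-weight network: n*m*N*(s+1) spline
    coefficients of w_ij(t) plus the v_j and theta_j (2m in total)."""
    return n * m * N * (s + 1) + 2 * m

# Growth with the number of interpolation intervals N
# (assumed n = 5 inputs, m = 10 hidden nodes, cubic splines s = 3):
counts = [spline_network_params(5, 10, N, 3) for N in (4, 8, 16, 32)]
```

Doubling $N$ roughly doubles the parameter count, so refining the partition trades fitting precision directly against training cost.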
5.6 Learning Algorithm Based on Rational Square Approximation and Optimal Piecewise Approximation

In actual signal processing, a great deal of experimental sample data usually needs handling, and some specific type of function is required to express it approximately. In system modeling based on process neural networks, the types of the system input functions and connection weight functions strongly influence the computational complexity and the function approximation precision of network training. Therefore, choosing a proper approximation (or fitting) function form for the network input functions and connection weight functions matters both for the design of the network structure and for reducing the complexity of the learning algorithm. In the learning algorithms discussed previously, the input functions and connection weight functions of a process neural network were handled by basis expansion. To achieve high fitting precision to the original curve, especially for functions with sharp changes, the number of basis function terms is usually large. In this section, exploiting the favorable approximation properties of rational functions and of optimal piecewise functions, learning algorithms based on rational square approximation [15,16] and on optimal piecewise approximation [17,18] are studied.
5.6.1 Learning Algorithm Based on Rational Square Approximation

When the deviation is measured in the sense of Chebyshev, a rational function of
lower order has high approximation precision when used to approximate a known function (discrete or analytic), especially a function with sharp changes. On the other hand, when approximating by a polynomial, even one of high order, a satisfactory approximate expression can seldom be obtained. Moreover, the rational function has a compact form, and there are mature algorithms for implementing the approximation process.

(1) Rational square approximation of the function

Denote the rational function set by $\Re_{m,n}$. An element $R(x)$ of $\Re_{m,n}$ has the form

$$R(x)=P(x)/Q(x). \qquad(5.84)$$

Now consider the square deviation of the approximated function $f(x)\in C[a,b]$ from the rational function $R(x)$. The following two situations are considered.

(a) Continuous situation (interval approximation): Suppose $C[a,b]$ is the set of continuous real functions on the interval $[a,b]$, and $\Re_{m,n}$ is the set of all rational functions whose numerator is a polynomial of degree $\le m$ and whose denominator is a polynomial of degree $\le n$:

$$P(x)=\sum_{i=0}^{m}a_i x^{i},\qquad Q(x)=\sum_{j=0}^{n}b_j x^{j}. \qquad(5.85)$$

Let

$$\rho(f,R)=\|f-R\|^{2}=\int_a^b\left(f(x)-R(x)\right)^{2}dx. \qquad(5.86)$$

$\rho(f,R)$ is referred to as the square deviation of $f$ and $R$. Obviously, $\rho(f,R)$ is a non-negative real number; thus let

$$\rho^{*}(f)=\inf_{R\in\Re_{m,n}}\rho(f,R); \qquad(5.87)$$

then $\rho^{*}(f)$ is referred to as the minimal (rational) square deviation of $f$. If there exists

$$R^{*}(x)=P^{*}(x)/Q^{*}(x)\in\Re_{m,n} \qquad(5.88)$$

such that $\rho^{*}(f)=\rho(f,R^{*})$, then $R^{*}(x)$ is referred to as the optimal square approximation rational expression of $f(x)$.

(b) Discrete situation (point approximation): Suppose $X=\{x_h\mid h=1,2,\ldots,N\}$ is a
point set on the real axis, and the function $f(x)$ is defined on $X$; i.e., a string of real numbers $f_h$ $(h=1,2,\ldots,N)$ is given such that $f(x_h)=f_h$ $(h=1,2,\ldots,N)$. For $R(x)\in\Re_{m,n}$, let

$$\rho_X(f,R)=\|f-R\|_X^{2}=\sum_{h=1}^{N}\left(f(x_h)-R(x_h)\right)^{2}, \qquad(5.89)$$

and

$$\rho_X^{*}(f)=\inf_{R\in\Re_{m,n}}\rho_X(f,R). \qquad(5.90)$$

Then $\rho_X^{*}(f)$ is referred to as the minimal square deviation of $f(x)$ on $X$. If there exists

$$R^{*}(x)=P^{*}(x)/Q^{*}(x)=\sum_{i=0}^{m}a_i^{*}x^{i}\Big/\sum_{j=0}^{n}b_j^{*}x^{j}\in\Re_{m,n}$$
such that $\rho_X^{*}(f)=\rho_X(f,R^{*})$, then $R^{*}(x)$ is referred to as the optimal square approximation rational expression of the function $f(x)$ on $X$.

Theorem 5.4 (Existence Theorem 1) Suppose that $f(x)$ is continuous on $[a,b]$; then there exists $R^{*}(x)\in\Re_{m,n}$ such that

$$\rho(f,R^{*})=\rho^{*}(f)=\inf_{R\in\Re_{m,n}}\rho(f,R),$$

where $\rho(f,R)=\sup|f-R|$.

Theorem 5.5 (Existence Theorem 2) Suppose that $f(x)\in L^{2}[a,b]$; then there exists $R^{*}(x)\in\Re_{m,n}$ such that

$$\rho(f,R^{*})=\rho^{*}(f)=\inf_{R\in\Re_{m,n}}\rho(f,R),$$

where $\rho(f,R)=\|f-R\|^{2}=\int_a^b\left(f(x)-R(x)\right)^{2}dx$.

Theorem 5.6 (Existence Theorem 3) Suppose that $f(x)\in L^{2}[a,b]$; then there exists $R^{*}(x)\in\Re_{m,n}$ such that

$$\rho(f,R^{*})=\rho^{*}(f)=\inf_{R\in\Re_{m,n}}\rho(f,R),$$

where $\rho(f,R)$ is the weighted square approximation distance

$$\rho(f,R)=\|W(f-R)\|^{2}=\int_a^b w(x)\left(f(x)-R(x)\right)^{2}dx,$$

with $w(x)$ a continuous positive function. For proofs of these theorems please refer to the related references.
Next, a numerical method (the Newton method) is adopted to solve for the optimal rational approximation of the function. The optimal rational approximations of Eqs. (5.86) and (5.89) are equivalent to solving the minimization problem

$$\rho=\rho(\tau)=\rho(\alpha_0,\alpha_1,\ldots,\alpha_m,\beta_1,\beta_2,\ldots,\beta_n)=\int_a^b\left(f(x)-\sum_{i=0}^{m}\alpha_i x^{i}\Big/\Big(1+\sum_{j=1}^{n}\beta_j x^{j}\Big)\right)^{2}dx. \qquad(5.91)$$
The basic idea of the Newton method lies in converting this minimization into minimizing a series of linearized problems

$$\rho_k(\varepsilon)=\int_a^b\left(f(x)-R(\tau^{(k)},x)-\sum_{j=0}^{m+n}\varepsilon_j\,\frac{\partial R(\tau^{(k)},x)}{\partial\tau_j}\right)^{2}dx, \qquad(5.92)$$

where $\varepsilon_j=\tau_j-\tau_j^{(k)}$.
Because the rational function is a ratio of two functions each linear in the coefficients $\tau_j$ $(j=0,1,\ldots,m+n)$, it is proper to approximate it by the linear terms of a Taylor series. The computational steps for solving the optimal rational expression approximation are as follows.
Step 1 Choose a group of initial values $\tau^{(0)}=(\tau_0^{(0)},\tau_1^{(0)},\ldots,\tau_{m+n}^{(0)})$;
Step 2 $R(\tau,x)$, as a function of the parameters $\tau$, is approximated by the linear terms of its Taylor series expanded at $\tau^{(0)}$:

$$R(\tau,x)\approx R(\tau^{(0)},x)+\sum_{j=0}^{m+n}\left(\tau_j-\tau_j^{(0)}\right)\frac{\partial R(\tau^{(0)},x)}{\partial\tau_j}=R(\tau^{(0)},x)+\sum_{j=0}^{m+n}\varepsilon_j\,\frac{\partial R(\tau^{(0)},x)}{\partial\tau_j};$$

then, by the least squares method, obtain the $\varepsilon$ that minimizes

$$\int_a^b\left(f(x)-R(\tau^{(0)},x)-\sum_{j=0}^{m+n}\varepsilon_j\,\frac{\partial R(\tau^{(0)},x)}{\partial\tau_j}\right)^{2}dx.$$

The necessary condition is that the partial derivative of this expression
with respect to $\varepsilon$ is zero, i.e., to solve the normal equation system

$$\sum_{j=0}^{m+n}\varepsilon_j\int_a^b\frac{\partial R(\tau^{(0)},x)}{\partial\tau_j}\,\frac{\partial R(\tau^{(0)},x)}{\partial\tau_i}\,dx=\int_a^b\left(f(x)-R(\tau^{(0)},x)\right)\frac{\partial R(\tau^{(0)},x)}{\partial\tau_i}\,dx,\quad i=0,1,\ldots,m+n. \qquad(5.93)$$
Step 3 Modify $\tau^{(0)}$ using the obtained $\varepsilon$: $\tau^{(1)}=\tau^{(0)}+\varepsilon$. Replace $\tau^{(0)}$ with $\tau^{(1)}$, repeat the computation from Step 2, and iterate until the correction $\varepsilon$ is small enough (according to the required precision).
It can be seen from Eq. (5.92) that if the above iteration converges (i.e., $\varepsilon$ ultimately tends to the zero vector), the limit $R(\tau^{*},x)$ obtained does satisfy the necessary stationarity equation

(5.94)

Furthermore, from the computational point of view, this stationary point is certainly a minimum, since at a maximum point or a saddle point there are always directions along which $\rho(\tau)$ descends, and owing to rounding errors the iteration cannot in fact remain steady at such a point. To make the solving of the linear system Eq. (5.93) in Step 2 feasible, its coefficient determinant must be nonzero, so we only need to prove the linear independence of the partial derivatives of $R(\tau,x)$ with respect to $\tau_k$:

$$R_k(\tau,x)=\frac{\partial R(\tau,x)}{\partial\tau_k},\quad k=0,1,\ldots,m+n.$$
Because the Gram determinant of a linearly independent function group is always greater than zero, the coefficient determinant is not equal to zero. In the above algorithm the coefficient matrix needs recalculating at every iteration, so the computational load is very large. In the following, we introduce a variant of the algorithm, a simplified Newton method. For convenience, Eq. (5.93) is rewritten as

$$\sum_{j=0}^{m+n}h_{ij}(\tau^{(0)})\,\varepsilon_j=g_i(\tau^{(0)}),\quad i=0,1,\ldots,m+n,$$

i.e., let

$$h_{ij}(\tau^{(0)})=\int_a^b\frac{\partial R(\tau^{(0)},x)}{\partial\tau_i}\,\frac{\partial R(\tau^{(0)},x)}{\partial\tau_j}\,dx,$$

$$g_i(\tau^{(0)})=\int_a^b\left(f(x)-R(\tau^{(0)},x)\right)\frac{\partial R(\tau^{(0)},x)}{\partial\tau_i}\,dx,$$
or, written in matrix form,

$$H(\tau^{(0)})\,\varepsilon=g(\tau^{(0)}). \qquad(5.95)$$

Obviously, in the above algorithm the coefficient matrix $H$ solved in Step 2 via Eq. (5.95) changes at every iteration, i.e., a Gram matrix must be recalculated each time, and this accounts for most of the computational load of an iteration. Actually, if the initial value $\tau^{(0)}$ is properly chosen, the coefficient matrix need only be calculated in the first iteration, with $H$ kept unchanged in the following iterations (either $H$ is frozen once the iteration has progressed far enough, or $H$ is updated in stages). In this way a so-called simplified algorithm is obtained, written in iteration form as

$$\tau^{(k+1)}=\tau^{(k)}+H^{-1}(\tau^{(0)})\,g(\tau^{(k)}),$$

where $H^{-1}(\tau)$ denotes the inverse of the coefficient matrix $H(\tau)$ at $\tau$, and $g(\tau)$ denotes the gradient direction at $\tau$. In this notation, the iteration format of the aforementioned unsimplified algorithm may be represented as

$$\tau^{(k+1)}=\tau^{(k)}+H^{-1}(\tau^{(k)})\,g(\tau^{(k)}),$$

where $H^{-1}(\tau^{(k)})$ changes with $k$. Obviously, compared with the former algorithm, the computational cost of each iteration is greatly reduced after the simplification.
(2) Learning algorithm of a process neural network with rational expression inputs

Consider a multi-input, single-output system with a single process neuron hidden layer and a linear activation function in the output layer; the input-output mapping relationship of the network may be denoted as

$$y=\sum_{j=1}^{m}v_j f\!\left(\sum_{i=1}^{n}\int_0^{T}w_{ij}(t)\,x_i(t)\,dt-\theta_j\right). \qquad(5.96)$$

According to the complexity of the signals in the system input function (discrete or analytic) space, choose a proper rational function set $\Re_{L,P}$, and express the input functions as rational functions in the sense of optimal square approximation with the required fitting precision. Meanwhile, the connection weight functions of the network are also denoted by rational functions. For this process neural network with rational functions as inputs, training can adopt the gradient descent algorithm. Next, a specific training procedure is introduced. Denote $K$ learning sample functions $(x_1^{k}(t),x_2^{k}(t),\ldots,x_n^{k}(t),d_k)$ $(k=1,2,\ldots,K)$, where $d_k$ is the system expected output corresponding to the input
$x_1^{k}(t),x_2^{k}(t),\ldots,x_n^{k}(t)$. The input functions $x_1^{k}(t),x_2^{k}(t),\ldots,x_n^{k}(t)$ and the network connection weight functions $w_{ij}(t)$ are both denoted by rational functions:

$$x_i^{k}(t)=\sum_{l=0}^{L}a_{il}^{k}t^{l}\Big/\sum_{p=0}^{P}b_{ip}^{k}t^{p},\quad k=1,2,\ldots,K;\ i=1,2,\ldots,n, \qquad(5.97)$$

$$w_{ij}(t)=\sum_{l=0}^{L}w_{ij}^{(l)}t^{l}\Big/\sum_{p=0}^{P}u_{ij}^{(p)}t^{p},\quad i=1,2,\ldots,n;\ j=1,2,\ldots,m. \qquad(5.98)$$
The error function of the network is defined as

$$E=\sum_{k=1}^{K}\left(\sum_{j=1}^{m}v_j f\!\left(\sum_{i=1}^{n}\int_0^{T}w_{ij}(t)\,x_i^{k}(t)\,dt-\theta_j\right)-d_k\right)^{2}, \qquad(5.99)$$

where $a_{il}^{k}$ is the polynomial expansion coefficient of the numerator part of the rational expression of $x_i^{k}(t)$, and $b_{ip}^{k}$ is the polynomial expansion coefficient of the rational
expression denominator part of $x_i^{k}(t)$. The learning rules for the connection weights and activation thresholds of the network according to the gradient descent algorithm are as follows:

$$v_j=v_j+\alpha\Delta v_j,\quad j=1,2,\ldots,m, \qquad(5.100)$$

$$w_{ij}^{(l)}=w_{ij}^{(l)}+\beta\Delta w_{ij}^{(l)},\quad i=1,2,\ldots,n;\ j=1,2,\ldots,m;\ l=1,2,\ldots,L, \qquad(5.101)$$

$$u_{ij}^{(p)}=u_{ij}^{(p)}+\lambda\Delta u_{ij}^{(p)},\quad i=1,2,\ldots,n;\ j=1,2,\ldots,m;\ p=1,2,\ldots,P, \qquad(5.102)$$

$$\theta_j=\theta_j+\gamma\Delta\theta_j,\quad j=1,2,\ldots,m, \qquad(5.103)$$

where $\alpha$, $\beta$, $\lambda$, $\gamma$ are the learning rate constants. For convenience, denote

$$z_{kj}=\sum_{i=1}^{n}\int_0^{T}w_{ij}(t)\,x_i^{k}(t)\,dt-\theta_j,\qquad y_k=\sum_{j=1}^{m}v_j f(z_{kj}).$$

Then

$$\Delta v_j=-\frac{\partial E}{\partial v_j}=-2\sum_{k=1}^{K}(y_k-d_k)\,f(z_{kj}), \qquad(5.104)$$

$$\Delta w_{ij}^{(l)}=-\frac{\partial E}{\partial w_{ij}^{(l)}}=-2\sum_{k=1}^{K}(y_k-d_k)\,v_j f'(z_{kj})\int_0^{T}\frac{t^{l}}{\sum_{p=0}^{P}u_{ij}^{(p)}t^{p}}\,x_i^{k}(t)\,dt, \qquad(5.105)$$

$$\Delta u_{ij}^{(p)}=-\frac{\partial E}{\partial u_{ij}^{(p)}}=2\sum_{k=1}^{K}(y_k-d_k)\,v_j f'(z_{kj})\int_0^{T}\frac{t^{p}\,w_{ij}(t)}{\sum_{q=0}^{P}u_{ij}^{(q)}t^{q}}\,x_i^{k}(t)\,dt, \qquad(5.106)$$

$$\Delta\theta_j=-\frac{\partial E}{\partial\theta_j}=2\sum_{k=1}^{K}(y_k-d_k)\,v_j f'(z_{kj}). \qquad(5.107)$$
The learning steps in network training are as follows:
Step 1 Choose a proper rational function set $\Re_{L,P}$ as the input function space of the network, and denote the input functions and connection weight functions in the rational forms of Eqs. (5.97) and (5.98);
Step 2 Denote the network learning error precision by $\varepsilon$, set the accumulative learning iteration count $s=0$, and denote the maximal number of learning iterations by $M$;
Step 3 Initialize the connection weights and activation thresholds $v_j$, $w_{ij}^{(l)}$, $u_{ij}^{(p)}$, $\theta_j$, $i=1,2,\ldots,n$; $j=1,2,\ldots,m$; $l=1,2,\ldots,L$; $p=1,2,\ldots,P$;
Step 4 Calculate the error function $E$ from Eq. (5.99); if $E<\varepsilon$ or $s\ge M$, go to Step 6;
Step 5 Modify the connection weights and activation thresholds according to Eqs. (5.100)-(5.107); $s+1\to s$; go to Step 4;
Step 6 Output the learning result and stop.
The computation of Eqs. (5.104)-(5.107) is rather involved, so it is better to write a pretreatment program (function) for these formulas using symbolic or numerical integration. After the initial values of the connection weights and activation thresholds are given, the modification values of the various parameters can be calculated at each iteration by calling this pretreatment program, and then substituted into the network for training.
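The pretreatment computations just mentioned reduce to integrals of products of rational functions, which can be evaluated numerically. A minimal sketch (function name and all coefficient values are assumptions for illustration) using a trapezoidal rule:

```python
import numpy as np

def rational(num, den, t):
    """P(t)/Q(t); `num` and `den` list coefficients from low to high order,
    matching the rational forms of Eqs. (5.97)-(5.98)."""
    return np.polyval(num[::-1], t) / np.polyval(den[::-1], t)

# Hypothetical input function x(t) and weight function w(t) on [0, T]:
T = 1.0
t = np.linspace(0.0, T, 2001)
x = rational([1.0, 2.0], [1.0, 0.5], t)     # (1 + 2t) / (1 + 0.5t)
w = rational([0.5, -1.0], [1.0, 1.0], t)    # (0.5 - t) / (1 + t)

# Trapezoidal approximation of the integral of x(t)*w(t) over [0, T]
prod = x * w
inner = float(np.sum(prod[:-1] + prod[1:]) * (t[1] - t[0]) / 2.0)
```

Precomputing such integrals once per iteration for each $(i,j)$ pair is exactly the role of the pretreatment program: the training loop itself then only combines scalar values.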
5.6.2 Learning Algorithm Based on Optimal Piecewise Approximation

Generally, function approximation or fitting can be described as follows: suppose that $D$ is a point set in a space of any dimension, and $g(P;\lambda_1,\lambda_2,\ldots,\lambda_n)$ is a parameter-dependent function defined on $D$ (i.e., $P\in D$) depending on a group of parameters $\lambda_1,\lambda_2,\ldots,\lambda_n$ (when the parameter values range over a subset of $n$-dimensional space, it actually denotes a family of functions, or function class, defined on $D$, referred to as the approximation function). To solve the function approximation problem is to find, for a function $f(P)$ $(P\in D)$ defined on $D$, whether there exists a group of parameter values $\lambda_i=\lambda_i^{0}$ $(i=1,2,\ldots,n)$ such that the distance (or deviation) between $g(P;\lambda_1^{0},\lambda_2^{0},\ldots,\lambda_n^{0})$ and $f(P)$ achieves the minimum.
At present, most research on function approximation discusses approximation by functions from a family defined by a single analytic expression over the whole domain, with all the parameters of the approximating function embodied in the formula itself. In practical applications, however, the measured signal may change sharply under some conditions. When a single functional form is used for approximation, there are usually many difficulties in selecting or constructing the approximating form, and large deviations may arise during fitting. We have studied the "optimal piecewise approximation" method [17,18], i.e., the approximating function is defined piecewise by several analytic expressions, and some of its parameters express how the domain is divided. The discussion of optimal piecewise approximation not only has special meaning for engineering technology, but also has great value from the viewpoint of reducing computational complexity. Since piecewise approximation is adopted, to achieve the same fitting precision the expression defining the approximating function on each subsection may be simpler, possibly even linear. When calculating a function value, once the variable is located in its subinterval, the evaluation is easier than applying one complex formula uniformly over the whole interval; as a rule, the logic operations used for this judgment are much quicker in a program than extra arithmetic operations. Therefore, when the input signals are complicated, applying the optimal piecewise approximation method to the construction and training of a process neural network can greatly reduce the computational complexity of network learning.

(1) Piecewise approximation of the function

Suppose $\varphi_1(x),\varphi_2(x),\ldots,\varphi_m(x),\ldots$ is a sequence of continuous functions defined on $[a,b]$ that is linearly independent on any subinterval $(\alpha,\beta)\subseteq[a,b]$; i.e., for any positive integer $m$, if $\sum_{j=1}^{m}c_j\varphi_j(x)=0$ for $x\in(\alpha,\beta)$ with real $c_j$, then necessarily $c_j=0$ $(j=1,2,\ldots,m)$. Any linear combination

$$P(x)=\sum_{j=1}^{m}c_j\varphi_j(x)\quad(c_j\ \text{real})$$

is referred to as an $m$-order generalized polynomial, and the set $\left\{\sum_{j=1}^{m}c_j\varphi_j(x)\,\middle|\,c_j\in\mathbb{R}\right\}$ is denoted $H_m$.
For a fixed positive integer $n$, consider functions of the following form:

$$\psi(x)=P_i(x)=\sum_{j=1}^{m}c_{ij}\varphi_j(x),\quad \text{when } x\in(x_{i-1},x_i),\ i=1,2,\ldots,n, \qquad(5.108)$$

where $P_i(x)\in H_m$; $x_0<x_1<\cdots<x_{n-1}<x_n$, $x_0=a$, $x_n=b$; and the intervals $(x_{i-1},x_i)$ do not intersect one another. A function $\psi(x)$ of this form is referred to as a function representable in $n$ segments by $m$-order generalized polynomials on $[a,b]$, and the set of all such functions is denoted $H_m(n)$; thus $H_m(1)=H_m$. In the special case $H_m=P_m$, $H_m(n)$ is denoted $P_m(n)$ (the class of functions defined piecewise on $n$ segments by polynomials of degree $m-1$). Next, we discuss optimal approximation in the function class $H_m(n)$. It can be seen from the definition that $\psi(x)$ takes the interval partition points $x_i$ $(i=1,2,\ldots,n-1)$ and the coefficients $c_{ij}$ of $P_i(x)$ $(i=1,2,\ldots,n;\ j=1,2,\ldots,m)$ as parameters. So, unlike an approximating function considered over the whole interval, the parameters of the function are embodied not only in the formula (as the coefficients of $P_i(x)$), but also in the subdivision of the approximation interval.

Definition 1 Suppose that $f(x)$ is an arbitrary function defined on $[a,b]$ and $\psi(x)\in H_m(n)$; then

$$\Delta(f,\psi)=\max_{x\in[a,b]}\,|f(x)-\psi(x)| \qquad(5.109)$$
is referred to as the deviation between $f(x)$ and $\psi(x)$ in the sense of Chebyshev.
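The Chebyshev deviation of Eq. (5.109) can be estimated on a dense grid. A small sketch (all names and data are our own assumptions) with a piecewise-linear $\psi\in H_2(4)$ built by interpolation at fixed partition points, where `np.interp` plays the role of the segment polynomials $P_i$:

```python
import numpy as np

def chebyshev_deviation(f, psi, xs):
    """Discrete estimate of Eq. (5.109): max |f(x) - psi(x)| over sample points."""
    return float(np.max(np.abs(f(xs) - psi(xs))))

# Piecewise-linear psi in H_2(4): four 1st-degree segments on [0, 1].
knots = np.linspace(0.0, 1.0, 5)
f = lambda x: np.sin(np.pi * x)
psi = lambda x: np.interp(x, knots, f(knots))   # interpolating piecewise approximant

xs = np.linspace(0.0, 1.0, 1001)
dev = chebyshev_deviation(f, psi, xs)
```

An *optimal* piecewise approximation would also move the interior partition points to reduce `dev` further, which is exactly the extra freedom that distinguishes $H_m(n)$ from a fixed-partition spline.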
Definition 2 If there exists $\psi^{*}(x)\in H_m(n)$, $\psi^{*}(x)=P_i^{*}(x)$ when $x\in(x_{i-1},x_i)$ $(i=1,2,\ldots,n)$, such that the relation of Eq. (5.109) holds, then $\psi^{*}(x)$ is referred to as the optimal piecewise approximation of $f(x)$ on $[a,b]$ in $H_m(n)$ (in the sense of Chebyshev), and $E_n(a,b)$ is referred to as the optimal piecewise approximation deviation of $f(x)$ in $H_m(n)$. The partition of $[a,b]$ corresponding to $\psi^{*}(x)$ is $a=x_0<x_1<\cdots<x_n=b$.

is an arbitrary positive number; then, when the selection of the connection weight basis functions satisfies Eq. (6.29), Eq. (6.2) is integrable. At the same time, Eq. (6.20) is asymptotically stable, that is,
Feedback Process Neural Networks
135

$$\lim_{t\to\infty}u_j(t)=0.$$

In Theorem 6.2, it is easy to see that $|x_j(t)|$ must be bounded, but the assumption that $|u_j(t)|$ is bounded depends on the condition of the system. In fact, this assumption is generally satisfied in practical applications.
6.2 Other Feedback Process Neural Networks

According to the spatio-temporal aggregation mechanism of process neurons and to the structures and information transfer modes of feedback process neural networks, different feedback process neural network models can be constructed to satisfy different demands.
6.2.1 Feedback Process Neural Network with Time-varying Functions as Inputs and Outputs [6]

The structure, spatio-temporal aggregation mechanism and information transfer flow of this feedback process neural network are shown in Fig. 6.2.
Fig. 6.2 Feedback process neural network with time-varying functions as inputs and an output
The hidden-layer process neurons in Fig. 6.2 process the input signals through weighted aggregation, integration from 0 to $t$, activation output, etc. The output neuron is a process neuron that performs only spatial weighted aggregation. The transfer relationship between the input and output signals of the nodes in each layer of the network is as follows. The input to the system is $X(t)=(x_1(t),x_2(t),\ldots,x_n(t))$ for $t\in[0,T]$, where $[0,T]$ is the input process interval of the system. The output of a hidden-layer process neuron node is
$$y_j(t)=f\!\left(\sum_{i=1}^{n}\int_0^{t}w_{ij}(\tau)\,x_i(\tau)\,d\tau-\theta_j\right),\quad j=1,2,\ldots,m.$$

Denote the $K$ learning samples by $(x_{k1}(t_1,t_2,\ldots,t_p),x_{k2}(t_1,t_2,\ldots,t_p),\ldots,x_{kn}(t_1,t_2,\ldots,t_p),d_k)$ for $k=1,2,\ldots,K$. Assume that the actual output corresponding to the $k$th learning sample of the input system is $y_k$. The error function of the network is defined as
Multi-aggregation Process Neural Networks
151
$$E=\sum_{k=1}^{K}\left(g\!\left(\sum_{j=1}^{m}v_j f\!\left(\sum_{i=1}^{n}\sum_{l=1}^{L}a_{il}^{(k)}w_{ij}^{(l)}-\theta_j\right)-\theta\right)-d_k\right)^{2}, \qquad(7.13)$$

where $a_{il}^{(k)}$ is the coefficient of $x_{ki}(t_1,t_2,\ldots,t_p)$ corresponding to $b_l(t_1,t_2,\ldots,t_p)$ in the basis function expansion. According to the gradient descent algorithm, the learning rules for the network connection weights and the activation thresholds are

$$v_j=v_j+\alpha\Delta v_j,\quad j=1,2,\ldots,m, \qquad(7.14)$$

$$w_{ij}^{(l)}=w_{ij}^{(l)}+\beta\Delta w_{ij}^{(l)},\quad i=1,2,\ldots,n;\ j=1,2,\ldots,m;\ l=1,2,\ldots,L, \qquad(7.15)$$

$$\theta_j=\theta_j+\gamma\Delta\theta_j,\quad j=1,2,\ldots,m, \qquad(7.16)$$

$$\theta=\theta+\eta\Delta\theta, \qquad(7.17)$$

where $\alpha$, $\beta$, $\gamma$ and $\eta$ are learning rate constants. For convenience, denote

$$u_{kj}=\sum_{i=1}^{n}\sum_{l=1}^{L}a_{il}^{(k)}w_{ij}^{(l)}-\theta_j,\qquad z_k=\sum_{j=1}^{m}v_j f(u_{kj})-\theta;$$
then

$$\Delta v_j=-\frac{\partial E}{\partial v_j}=-2\sum_{k=1}^{K}\left(g(z_k)-d_k\right)g'(z_k)\,f(u_{kj}), \qquad(7.18)$$

$$\Delta w_{ij}^{(l)}=-\frac{\partial E}{\partial w_{ij}^{(l)}}=-2\sum_{k=1}^{K}\left(g(z_k)-d_k\right)g'(z_k)\,v_j f'(u_{kj})\,a_{il}^{(k)}, \qquad(7.19)$$

$$\Delta\theta_j=-\frac{\partial E}{\partial\theta_j}=-2\sum_{k=1}^{K}\left(g(z_k)-d_k\right)g'(z_k)\,v_j f'(u_{kj})\,(-1), \qquad(7.20)$$

$$\Delta\theta=-\frac{\partial E}{\partial\theta}=-2\sum_{k=1}^{K}\left(g(z_k)-d_k\right)g'(z_k)\,(-1). \qquad(7.21)$$

If the activation functions $f$ and $g$ are both Sigmoid functions, then $f'(u)=f(u)(1-f(u))$.
The specific learning algorithm is described as follows.
Step 1 Choose basis functions $b_1(t_1,t_2,\ldots,t_p)$, $b_2(t_1,t_2,\ldots,t_p)$, $\ldots$, $b_L(t_1,t_2,\ldots,t_p)$ of the input function space, and express the input functions and the network connection weight functions in expansions over this group of basis functions;
Step 2 Give the error precision $\varepsilon$, set the accumulative learning iteration count $s=0$, and set the maximal number of learning iterations $M$;
Step 3 Initialize the network connection weights and activation thresholds $v_j$, $w_{ij}^{(l)}$, $\theta_j$, $\theta$ $(i=1,2,\ldots,n;\ j=1,2,\ldots,m;\ l=1,2,\ldots,L)$;
Step 4 Calculate the error function $E$ according to Eq. (7.13), and if $E<\varepsilon$ or
$s\ge M$, go to Step 6;
Step 5 Modify the connection weights and activation thresholds according to Eqs. (7.14)-(7.21); $s+1\to s$; go to Step 4;
Step 6 Output the learning result and stop.
The input of a multi-aggregation process neural network may be multivariate analytic functions or discrete sample data depending on a multidimensional process. Therefore, the basis functions for the multivariate input functions can be chosen as functions, such as multivariate polynomials, that suit both analytic function expansion and discrete process data fitting.
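The training loop above can be sketched with array operations. All sizes and sample data below are toy assumptions; $f$ and $g$ are both taken as Sigmoid so that $f'(u)=f(u)(1-f(u))$, and the four updates implement Eqs. (7.18)-(7.21).

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, m, L = 4, 3, 5, 6
a = rng.normal(size=(K, n, L))           # basis coefficients a_il^(k) of the inputs
d = rng.uniform(size=K)                  # expected outputs d_k
W = rng.normal(scale=0.1, size=(n, m, L))
v = rng.normal(scale=0.1, size=m)
theta_j = np.zeros(m)
theta = 0.0
alpha = beta = gamma = eta = 0.05        # learning rate constants

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(200):
    u = np.einsum('kil,ijl->kj', a, W) - theta_j     # u_kj
    fu = sigmoid(u)
    z = fu @ v - theta                               # z_k
    g = sigmoid(z)
    err, gz, fup = g - d, g * (1 - g), fu * (1 - fu)
    dv = -2 * (err * gz) @ fu                                        # Eq. (7.18)
    dW = -2 * np.einsum('k,kj,kil->ijl', err * gz, v * fup, a)       # Eq. (7.19)
    dtj = 2 * (err * gz) @ (v * fup)                                 # Eq. (7.20)
    dt = 2 * np.sum(err * gz)                                        # Eq. (7.21)
    v, W = v + alpha * dv, W + beta * dW
    theta_j, theta = theta_j + gamma * dtj, theta + eta * dt

u = np.einsum('kil,ijl->kj', a, W) - theta_j
E = np.sum((sigmoid(sigmoid(u) @ v - theta) - d) ** 2)
```

Because the basis expansion turns every function-valued quantity into a finite coefficient array, the whole algorithm reduces to ordinary tensor contractions, which is the practical payoff of Steps 1-3.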
7.3.2 Learning Algorithm of Multi-aggregation Process Neural Networks with Multivariate Functions as Inputs and Outputs

In Eq. (7.6), the multivariate connection weight function $v_j(t_1,t_2,\ldots,t_P)$ and the activation threshold function $\theta_j(t_1,t_2,\ldots,t_P)$ are expressed as expansions in the basis functions $b_1(t_1,t_2,\ldots,t_P)$, $b_2(t_1,t_2,\ldots,t_P)$, ..., $b_L(t_1,t_2,\ldots,t_P)$:

$$v_j(t_1,t_2,\ldots,t_P) = \sum_{l=1}^{L} v_j^{(l)}\, b_l(t_1,t_2,\ldots,t_P), \tag{7.22}$$

$$\theta_j(t_1,t_2,\ldots,t_P) = \sum_{l=1}^{L} \theta_j^{(l)}\, b_l(t_1,t_2,\ldots,t_P). \tag{7.23}$$
Substitute the basis function expansions of $x_i(t_1,\ldots,t_P)$, $w_{ij}(t_1,\ldots,t_P)$, $v_j(t_1,\ldots,t_P)$ and $\theta_j(t_1,\ldots,t_P)$ into Eq. (7.6); then the input-output mapping relationship of the network can be expressed as

$$y(t_1,\ldots,t_P) = \sum_{j=1}^{m}\Bigl(\sum_{l=1}^{L} v_j^{(l)} b_l(t_1,\ldots,t_P)\Bigr) f\Bigl(\int_0^{t_1}\!\!\cdots\!\!\int_0^{t_P} \sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}\, b_l(\tau_1,\ldots,\tau_P)\sum_{s=1}^{L} w_{ij}^{(s)}\, b_s(\tau_1,\ldots,\tau_P)\, d\tau_1\cdots d\tau_P - \sum_{l=1}^{L}\theta_j^{(l)} b_l(t_1,\ldots,t_P)\Bigr), \tag{7.24}$$

which can be simplified to yield

$$y(t_1,\ldots,t_P) = \sum_{j=1}^{m}\Bigl(\sum_{l=1}^{L} v_j^{(l)} b_l(t_1,\ldots,t_P)\Bigr) f\Bigl(\sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{il}\, w_{ij}^{(s)} \int_0^{t_1}\!\!\cdots\!\!\int_0^{t_P} b_l(\tau_1,\ldots,\tau_P)\, b_s(\tau_1,\ldots,\tau_P)\, d\tau_1\cdots d\tau_P - \sum_{l=1}^{L}\theta_j^{(l)} b_l(t_1,\ldots,t_P)\Bigr). \tag{7.25}$$

Denote

$$B_{ls}(t_1,\ldots,t_P) = \int_0^{t_1}\!\!\cdots\!\!\int_0^{t_P} b_l(\tau_1,\ldots,\tau_P)\, b_s(\tau_1,\ldots,\tau_P)\, d\tau_1\cdots d\tau_P; \tag{7.26}$$

then Eq. (7.25) can be simplified as

$$y(t_1,\ldots,t_P) = \sum_{j=1}^{m}\Bigl(\sum_{l=1}^{L} v_j^{(l)} b_l(t_1,\ldots,t_P)\Bigr) f\Bigl(\sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{il}\, w_{ij}^{(s)}\, B_{ls}(t_1,\ldots,t_P) - \sum_{l=1}^{L}\theta_j^{(l)} b_l(t_1,\ldots,t_P)\Bigr). \tag{7.27}$$
Give $K$ learning samples $(x_{k1}(t_1,\ldots,t_P), x_{k2}(t_1,\ldots,t_P), \ldots, x_{kn}(t_1,\ldots,t_P);\ d_k(t_1,\ldots,t_P))$ for $k=1,2,\ldots,K$. Suppose that the real output of the network corresponding to the $k$th learning sample of the system is $y_k(t_1,\ldots,t_P)$; the error function of the network is defined as

$$E = \frac{1}{K}\sum_{k=1}^{K}\bigl\|\, y_k(t_1,\ldots,t_P) - d_k(t_1,\ldots,t_P)\,\bigr\|^2 = \frac{1}{K}\sum_{k=1}^{K}\int_0^{T_1}\!\!\cdots\!\!\int_0^{T_P}\Bigl[\sum_{j=1}^{m}\Bigl(\sum_{l=1}^{L} v_j^{(l)} b_l(t_1,\ldots,t_P)\Bigr) f\Bigl(\sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{il}^{(k)} w_{ij}^{(s)} B_{ls}(t_1,\ldots,t_P) - \sum_{l=1}^{L}\theta_j^{(l)} b_l(t_1,\ldots,t_P)\Bigr) - d_k(t_1,\ldots,t_P)\Bigr]^2 dt_1\cdots dt_P, \tag{7.28}$$
where $a_{il}^{(k)}$ is the coefficient of $x_{ki}(t_1,\ldots,t_P)$ corresponding to $b_l(t_1,\ldots,t_P)$ in the basis function expansion.

Divide each input process interval $[0,T_p]$ into $K_p$ equal parts, and denote the interval division points as $t_p^1, t_p^2, \ldots, t_p^{K_p}$ ($p=1,2,\ldots,P$). Choose an arbitrary $P$-variable division point $(t_1^{l_1}, t_2^{l_2}, \ldots, t_P^{l_P})$ in $[0,T_1]\times[0,T_2]\times\cdots\times[0,T_P]$, where $0\le l_p\le K_p$.

Reformulate Eqs. (7.7), (7.8), (7.22) and (7.23) as
$$x_i(t_1^{l_1}, t_2^{l_2}, \ldots, t_P^{l_P}) = \sum_{l=1}^{L} a_{il}\, b_l(t_1^{l_1}, t_2^{l_2}, \ldots, t_P^{l_P}),\quad i=1,2,\ldots,n, \tag{7.29}$$

$$w_{ij}(t_1^{l_1}, t_2^{l_2}, \ldots, t_P^{l_P}) = \sum_{l=1}^{L} w_{ij}^{(l)}\, b_l(t_1^{l_1}, t_2^{l_2}, \ldots, t_P^{l_P}),\quad i=1,2,\ldots,n;\ j=1,2,\ldots,m, \tag{7.30}$$

$$v_j(t_1^{l_1}, t_2^{l_2}, \ldots, t_P^{l_P}) = \sum_{l=1}^{L} v_j^{(l)}\, b_l(t_1^{l_1}, t_2^{l_2}, \ldots, t_P^{l_P}),\quad j=1,2,\ldots,m, \tag{7.31}$$

$$\theta_j(t_1^{l_1}, t_2^{l_2}, \ldots, t_P^{l_P}) = \sum_{l=1}^{L} \theta_j^{(l)}\, b_l(t_1^{l_1}, t_2^{l_2}, \ldots, t_P^{l_P}),\quad j=1,2,\ldots,m. \tag{7.32}$$
Substitute Eqs. (7.29)-(7.32) into Eq. (7.28); then the error function of the network is

$$E = \frac{1}{K}\sum_{k=1}^{K}\sum_{l_1=0}^{K_1}\cdots\sum_{l_P=0}^{K_P}\Bigl[\sum_{j=1}^{m}\Bigl(\sum_{l=1}^{L} v_j^{(l)} b_l(t_1^{l_1},\ldots,t_P^{l_P})\Bigr) f\Bigl(\sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{il}^{(k)} w_{ij}^{(s)} B_{ls}(t_1^{l_1},\ldots,t_P^{l_P}) - \sum_{l=1}^{L}\theta_j^{(l)} b_l(t_1^{l_1},\ldots,t_P^{l_P})\Bigr) - d_k(t_1^{l_1},\ldots,t_P^{l_P})\Bigr]^2 \prod_{p=1}^{P}\frac{T_p}{K_p}. \tag{7.33}$$
The basis function $b_l(t_1,\ldots,t_P)$ is a known function on the multidimensional process interval $[0,T_1]\times[0,T_2]\times\cdots\times[0,T_P]$, so the value of $B_{ls}(t_1,\ldots,t_P)$ at each multidimensional division point in $[0,T_1]\times[0,T_2]\times\cdots\times[0,T_P]$ can be figured out by integration or by numerical computation methods. By the gradient descent algorithm, the modification rule of the network connection weights and the activation thresholds is the same as in Eqs. (7.14)-(7.16). For convenience of illustration, denote
$$u_{kj}(t_1^{l_1},\ldots,t_P^{l_P}) = \sum_{i=1}^{n}\sum_{l=1}^{L}\sum_{s=1}^{L} a_{il}^{(k)} w_{ij}^{(s)}\, B_{ls}(t_1^{l_1},\ldots,t_P^{l_P}) - \sum_{l=1}^{L}\theta_j^{(l)} b_l(t_1^{l_1},\ldots,t_P^{l_P}),$$

and let $\varphi_k(\cdot) = \sum_{j=1}^{m}\bigl(\sum_{l=1}^{L} v_j^{(l)} b_l(\cdot)\bigr) f\bigl(u_{kj}(\cdot)\bigr) - d_k(\cdot)$ denote the residual at a division point, the arguments $(t_1^{l_1},\ldots,t_P^{l_P})$ being abbreviated as $(\cdot)$. From Eq. (7.33), we have

$$\Delta v_j^{(l)} = -\frac{\partial E}{\partial v_j^{(l)}} = -\frac{2}{K}\sum_{k=1}^{K}\sum_{l_1=0}^{K_1}\cdots\sum_{l_P=0}^{K_P}\varphi_k(\cdot)\, b_l(\cdot)\, f\bigl(u_{kj}(\cdot)\bigr)\prod_{p=1}^{P}\frac{T_p}{K_p}, \tag{7.34}$$

$$\Delta w_{ij}^{(s)} = -\frac{\partial E}{\partial w_{ij}^{(s)}} = -\frac{2}{K}\sum_{k=1}^{K}\sum_{l_1=0}^{K_1}\cdots\sum_{l_P=0}^{K_P}\varphi_k(\cdot)\Bigl(\sum_{l=1}^{L} v_j^{(l)} b_l(\cdot)\Bigr) f'\bigl(u_{kj}(\cdot)\bigr)\sum_{l=1}^{L} a_{il}^{(k)} B_{ls}(\cdot)\prod_{p=1}^{P}\frac{T_p}{K_p}, \tag{7.35}$$

$$\Delta\theta_j^{(l)} = -\frac{\partial E}{\partial\theta_j^{(l)}} = -\frac{2}{K}\sum_{k=1}^{K}\sum_{l_1=0}^{K_1}\cdots\sum_{l_P=0}^{K_P}\varphi_k(\cdot)\Bigl(\sum_{l'=1}^{L} v_j^{(l')} b_{l'}(\cdot)\Bigr) f'\bigl(u_{kj}(\cdot)\bigr)\bigl(-b_l(\cdot)\bigr)\prod_{p=1}^{P}\frac{T_p}{K_p}. \tag{7.36}$$
During the training of multi-aggregation process neural networks whose inputs and outputs are both multivariate process functions, the computation course is very complex. When the division points of the multivariate process interval are determined, the values of $b_l(t_1^{l_1},\ldots,t_P^{l_P})$, $B_{ls}(t_1^{l_1},\ldots,t_P^{l_P})$, $\prod_{p=1}^{P} T_p/K_p$, etc. can be predetermined by integral or numerical computation. Then the network is trained according to Eqs. (7.14)-(7.16) and Eqs. (7.34)-(7.36).
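As an illustration of this pre-computation step, the sketch below evaluates $B_{ls}$ at every division point of a 2-D process interval $[0,T_1]\times[0,T_2]$ by cumulative trapezoid integration. The bivariate monomial basis, the grid sizes and all names are our own assumptions, not the book's.

```python
import numpy as np

T1 = T2 = 1.0
K1 = K2 = 20
t1 = np.linspace(0.0, T1, K1 + 1)       # division points of [0, T1]
t2 = np.linspace(0.0, T2, K2 + 1)       # division points of [0, T2]
g1, g2 = np.meshgrid(t1, t2, indexing='ij')

# b_l(t1, t2): low-order bivariate monomials (an assumed basis)
basis = [lambda a, b: np.ones_like(a),
         lambda a, b: a,
         lambda a, b: b,
         lambda a, b: a * b]
L = len(basis)

def cumtrapz(F, dt, axis):
    """Cumulative trapezoid integral of grid samples F along one axis."""
    n = F.shape[axis]
    avg = 0.5 * (np.take(F, range(n - 1), axis=axis)
                 + np.take(F, range(1, n), axis=axis))
    cs = np.cumsum(avg, axis=axis) * dt
    pad = [(0, 0)] * F.ndim
    pad[axis] = (1, 0)                   # the integral from 0 starts at 0
    return np.pad(cs, pad)

# B[l, s, i1, i2] approximates B_ls at the division point (t1[i1], t2[i2])
B = np.empty((L, L, K1 + 1, K2 + 1))
for l in range(L):
    for s in range(L):
        F = basis[l](g1, g2) * basis[s](g1, g2)
        B[l, s] = cumtrapz(cumtrapz(F, T1 / K1, 0), T2 / K2, 1)
```

For instance, with $b_2=t_1$ and $b_3=t_2$ the entry `B[1, 2]` at the far corner approximates $\int_0^1\!\int_0^1 \tau_1\tau_2\,d\tau_1 d\tau_2 = 1/4$, and the trapezoid rule is exact for that bilinear integrand.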
7.4 Application Examples

Example 7.1 Classification problem of binary process signals

Consider the following classification problem for three classes of binary process signals with 2-dimensional input.

The first class of process signal is $(t_1\sin((a_1t_1+a_2t_2)\pi),\ t_2\cos(a_3t_1t_2))$, with $a_1,a_2\in[0.5,0.7]$, $a_3\in[1.0,1.2]$;

The second class of process signal is $(t_1\sin((b_1t_1+b_2t_2)\pi),\ t_2\cos(b_3t_1t_2))$, with $b_1,b_2\in[0.75,0.95]$, $b_3\in[1.2,1.3]$;

The third class of process signal is $(t_1\sin((c_1t_1+c_2t_2)\pi),\ t_2\cos(c_3t_1t_2\pi))$, with $c_1,c_2\in[1.0,1.2]$, $c_3\in[1.3,1.5]$.

Here the binary process variables $(t_1,t_2)\in[0,1]\times[0,1]$, and $a_i$, $b_i$, $c_i$ ($i=1,2,3$) are the signal parameters.

In the first class of signals, arbitrarily choose 15 parameter triples, with $a_1,a_2$ in the interval [0.5,0.7] and $a_3$ in [1.0,1.2], to constitute 15 sample functions, of which 10 are used as training set samples and 5 as test set samples. Similarly, generate 10 training sample functions and 5 test sample functions for each of the second and third classes of signal sets. Thirty 2-D binary functions constitute the network training set and 15 constitute the test set. Suppose that the expected output of the first class of signals is 0.33, that of the second is 0.67 and that of the third is 1.0.

The multi-aggregation process neural network denoted by Eq. (7.5) is used for the binary process signal classification. The structure of the network is 2-5-1, i.e. 2 input nodes, 5 multi-aggregation process neuron hidden nodes and 1 output node. The basis functions adopt 5-order binary polynomial functions. The learning rate constants $\alpha$, $\beta$, $\gamma$ and $\eta$ of the network are respectively 0.50, 0.63, 0.60 and 0.50. The learning error precision is 0.05, and the maximal iteration number is 5000. The network is trained 20 times, the weights being reinitialized before each run; it converges after 359 iterations on average, and Fig. 7.4 shows the iteration error curve of one learning course.

The 15 function samples of the test set are classified and recognized by each of the 20 training results (each "model" being the training result of the multi-aggregation process neural network for one run): 5 models judge all 15 samples correctly, 5 models have 14 well-judged results, 7 models have 13, and 3 models have 12. The mean correct recognition rate is 90.67%. The result of the experiment shows that multi-aggregation process neural networks have good adaptability for the classification of multivariate process signals.

[Figure] Fig. 7.4 Iteration error function curve (learning error vs. iterations, 0-600)
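The three signal classes defined above can be sampled directly. The sketch below generates the 30 training and 15 test sample functions on a 32x32 grid; the grid resolution, RNG seeding and names are our choices, and the placement of the factor $\pi$ follows the formulas as printed.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 32)
t1, t2 = np.meshgrid(t, t, indexing='ij')   # (t1, t2) in [0,1] x [0,1]

def first_class():
    a1, a2 = rng.uniform(0.5, 0.7, 2)
    a3 = rng.uniform(1.0, 1.2)
    return (t1 * np.sin((a1 * t1 + a2 * t2) * np.pi),
            t2 * np.cos(a3 * t1 * t2))

def second_class():
    b1, b2 = rng.uniform(0.75, 0.95, 2)
    b3 = rng.uniform(1.2, 1.3)
    return (t1 * np.sin((b1 * t1 + b2 * t2) * np.pi),
            t2 * np.cos(b3 * t1 * t2))

def third_class():
    c1, c2 = rng.uniform(1.0, 1.2, 2)
    c3 = rng.uniform(1.3, 1.5)
    return (t1 * np.sin((c1 * t1 + c2 * t2) * np.pi),
            t2 * np.cos(c3 * t1 * t2 * np.pi))

classes = [[first_class() for _ in range(15)],
           [second_class() for _ in range(15)],
           [third_class() for _ in range(15)]]
train = [s for c in classes for s in c[:10]]   # 30 training samples
test  = [s for c in classes for s in c[10:]]   # 15 test samples
```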
Example 7.2 Dynamic process simulation of primary oil recovery in oilfield exploitation

In the course of oilfield exploitation, oil production during the phase of primary oil recovery (natural exploitation depending on the original reservoir energy) depends on the dynamic distribution $p(x,y,z,t)$ of the reservoir pressure around the well bore.
Here, $(x,y,z)$ is the space coordinate of an arbitrary point in the reservoir, with the coordinate origin at the midpoint of the thickness of the reservoir, and $t$ is the reservoir exploitation (perforation oil recovery) time. Oilfield development is a nonlinear dynamic system that obeys non-Newtonian fluid percolation laws. After the oil well is perforated to produce oil, the oil well becomes the center of a pressure funnel that forms in the reservoir: the nearer to the well bore, the smaller the reservoir pressure; the further from the well bore, the higher the reservoir pressure and the closer it is to the original reservoir pressure. The fluid (oil, gas and water) in the reservoir pores flows into the well bore and is recovered under the effect of the pressure difference. With the extension of exploitation time, the reservoir producing region (the oil drainage radius) is enlarged gradually, and the reservoir pressure falls continuously at the same time. In the situation of a heterogeneous reservoir (where geology, physical properties, oil-bearing characteristics and thickness distribution change little), the reservoir pressure is distributed radially with the well bore as the center [6,7]. Here, $p(x,y,z,t)$ can be written as $p(t,r)$, where $p(t,r)$ is the reservoir pressure at a point a distance $r$ from the well bore at time $t$, and it changes dynamically with the oil well exploitation time and the well spacing $r$.

The North Saertu development experimental area of the Daqing oilfield is a large channel sand body sediment, and the PH reservoir in the western area is an approximately heterogeneous reservoir. In the period of early oilfield development, 13 wells with 2500 m well spacing were placed in the PH reservoir. The wells did not affect one another during the primary oil recovery stage.
Because this region was an oilfield under development as an experimental area, the changes in reservoir pressure around every well and the oil production were tested and recorded in detail by the well test method during exploitation. Cumulative oil production and pressure variations over a 15-day period were recorded from one of the oil wells and are shown in Table 7.1. The actual measured data of the above 13 oil wells are fitted with a precision of 0.01 by adopting 3-order binary polynomial functions. The actual measured data of 11 wells constitute the training set, and the actual measured data of the other two wells constitute the test set.

A multi-aggregation process neural network with multivariate process functions as inputs and outputs, represented by Eq. (7.6), is used to simulate the primary oil recovery dynamic process of the oil well. The topological structure of the network is 1-5-1, i.e. 1 input node, 5 multi-aggregation process neuron hidden layer nodes and 1 output node. The basis function is chosen as a 3-order binary polynomial function. The input of the network is the reservoir pressure $p(t,r)$ and the output is the cumulative oil production $Q(t)$ changing with time. The oil drainage radius $R$ can be calculated in the laboratory from core analysis data and reservoir parameters such as porosity, permeability, pore structure, etc. according to theoretical formulas. The oil drainage radius of the PH reservoir in the western area is 750 meters. Eq. (7.6) can be specifically reformulated as
Table 7.1 Variations of formation pressure (MPa) around the oil well and oil production over time

Time(d)   0     50    100   150   200   250   300   350   400   450   500   550   600   650   700   750   Q(t)*
 0      12.75 12.75 12.75 12.75 12.75 12.75 12.75 12.75 12.75 12.75 12.75 12.75 12.75 12.75 12.75 12.75    0.00
 1      10.26 10.97 11.33 11.56 11.71 11.83 12.13 12.32 12.45 12.61 12.71 12.75 12.75 12.75 12.75 12.75   31.20
 2       9.32 10.14 10.42 11.13 11.32 11.66 11.87 12.05 12.31 12.46 12.65 12.72 12.75 12.75 12.75 12.75   57.62
 3       8.75  9.55 10.13 10.65 11.00 11.23 11.47 11.71 11.97 12.23 12.46 12.60 12.70 12.75 12.75 12.75   80.70
 4       8.21  9.07  9.89 10.23 10.68 11.01 11.20 11.43 11.61 11.82 12.01 12.27 12.38 12.55 12.61 12.75  101.46
 5       7.79  8.67  9.33  9.89 10.27 10.65 10.85 11.21 11.49 11.67 11.86 12.03 12.22 12.41 12.58 12.67  116.35
 6       7.45  8.28  9.05  9.77 10.03 10.26 10.51 10.73 10.94 11.15 11.37 11.51 12.00 12.21 12.36 12.58  131.60
 7       7.21  8.13  8.73  9.38  9.87 10.16 10.32 10.51 10.70 10.92 11.21 11.32 11.73 11.89 12.27 12.51  145.78
 8       6.87  7.79  8.65  9.17  9.64 10.00 10.25 10.43 10.57 10.71 10.93 11.21 11.50 11.67 12.12 12.48  159.26
 9       6.49  7.55  8.31  8.75  9.25  9.70 10.00 10.24 10.47 10.63 10.82 11.05 11.39 11.62 12.05 12.43  172.10
10       6.23  7.27  8.10  8.58  9.10  9.54  9.71 10.15 10.32 10.50 10.63 10.81 11.27 11.53 11.90 12.36  184.36
11       6.05  7.00  7.75  8.37  8.95  9.30  9.55  9.80 10.15 10.35 10.52 10.70 11.03 11.45 11.80 12.30  196.85
12       5.81  6.75  7.60  8.21  8.75  9.12  9.47  9.72 10.00 10.20 10.35 10.56 11.00 11.32 11.67 12.25  208.30
13       5.63  6.59  7.41  8.05  8.60  9.07  9.23  9.47  9.73 10.03 10.21 10.42 10.78 11.25 11.76 12.21  219.06
14       5.47  6.50  7.35  7.75  8.70  8.87  9.15  9.31  9.65  9.90 10.15 10.35 10.70 11.19 11.70 12.17  229.97
15       5.33  6.43  7.31  7.52  8.61  8.65  8.97  9.25  9.40  9.71  9.90 10.18 10.34 11.03 11.53 12.15  240.35

*Column headings 0-750 give the spacing from the well bore in meters; Q(t) is the cumulative oil production of the oil well (m³).
$$Q(t) = \sum_{j=1}^{5} v_j(t)\, f\Bigl(\int_0^{t}\!\!\int_0^{750} w_j(\tau,r)\, p(\tau,r)\, dr\, d\tau - \theta_j(t)\Bigr),\quad t\in[0,15]. \tag{7.37}$$
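Evaluating a response of the form of Eq. (7.37) requires a double quadrature over $(\tau, r)$ up to each time $t$. The sketch below does this with trapezoid rules on a grid; the toy pressure field $p(t,r)$, the constant weight functions and all names are synthetic stand-ins, not the Daqing field data.

```python
import numpy as np

t = np.linspace(0.0, 15.0, 31)       # exploitation time (days)
r = np.linspace(0.0, 750.0, 76)      # distance from the well bore (m)
tt, rr = np.meshgrid(t, r, indexing='ij')

# toy pressure field: pressure falls near the bore and as time goes on
p = 12.75 - 5.0 * np.exp(-rr / 300.0) * (1.0 - np.exp(-tt / 5.0))

sig = lambda u: 1.0 / (1.0 + np.exp(-u))
m = 5
rng = np.random.default_rng(2)
w = rng.normal(scale=1e-4, size=m)   # w_j(tau, r) taken constant (assumption)
v = rng.normal(size=(m, t.size))     # v_j(t) sampled on the time grid
theta = np.zeros(m)

# inner trapezoid integral over r at each tau, then cumulative integral over tau
over_r = ((p[:, :-1] + p[:, 1:]) * 0.5 * np.diff(r)).sum(axis=1)
cum = np.concatenate(([0.0],
                      np.cumsum(0.5 * (over_r[:-1] + over_r[1:]) * np.diff(t))))
Q = sum(v[j] * sig(w[j] * cum - theta[j]) for j in range(m))  # Q(t) on the grid
```

With real data, `p` would be replaced by the fitted pressure surfaces of Table 7.1 and the weight and threshold functions by the trained basis-expansion coefficients.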
The learning rate constants $\alpha$, $\beta$, $\gamma$ of the network are respectively 0.45, 0.53 and 0.50. The learning error precision is 0.05, and the maximal iteration number is 3000. For the 11 learning samples, the network converges after 273 iterations. The oil production of the 2 test wells is predicted from their reservoir pressure; Table 7.2 shows the 15-day prediction result, which satisfies the analysis requirements of actual problems.

The dynamic model of primary oil recovery in oilfield development established by multi-aggregation process neural networks retains the various geological properties and percolation rules of the reservoir in the structure and property parameters of the multi-aggregation process neural network. It can be applied to other oilfields of the same type and can guide oilfield production and the formulation of oilfield development schemes.

Table 7.2 Oil production prediction result of the oil wells (unit: m³)

                 Test well 1                       Test well 2
Time(d)   Real    Predicted  Abs. error    Real    Predicted  Abs. error
  1       35.5      34.1        1.4        32.6      31.1        1.5
  2       61.3      59.5        1.8        60.3      59.5        0.8
  3       83.7      81.6        2.1        82.1      81.6        0.5
  4      103.3     101.7        1.6       103.2     101.7        1.5
  5      119.2     121.0        2.2       117.9     121.0        3.1
  6      134.5     135.3        1.2       135.4     138.3        2.9
  7      157.3     158.4        1.1       158.2     159.4        1.2
  8      169.1     172.3        3.2       170.3     172.4        2.1
  9      181.4     183.9        2.6       184.6     183.9        0.7
 10      193.1     194.6        1.5       196.2     195.6        0.6
 11      203.8     204.7        0.9       207.9     206.7        1.2
 12      214.3     214.8        0.5       218.7     216.8        1.9
 13      225.2     224.5        0.7       230.5     228.7        1.8
 14      235.5     235.1        0.4       241.3     239.1        2.2
 15      246.3     245.7        0.6       252.0     249.7        2.3
7.5 Epilogue

In this chapter, aiming at information processing problems in which the system input is a multivariate process function, i.e. a multidimensional process signal, we establish a general model of multi-aggregation process neural networks and a multi-aggregation process neural network model whose inputs and outputs are both multivariate process functions. Multi-aggregation process neural networks can simultaneously consider the influence of the joint action and the multivariate process effect accumulation of multiple process factors in complex systems, and have direct modeling and information processing ability for complex systems in multidimensional process space. Therefore, multi-aggregation process neural networks have good adaptability for many actual signal processing and nonlinear system modeling problems related to multivariate process factors. The computational results for actual application problems confirm the aforementioned conclusions.

The information mapping mechanism and the training process of multi-aggregation process neural networks are complex, so it is necessary to continue to study them using highly efficient and steady learning algorithms. In addition, the authors think
that, similar to univariate process neural networks, solutions to theoretical problems of multi-aggregation process neural networks, such as functional approximation ability, computing capability, continuity, etc., ought to be attainable, and this should in theory support the effectiveness of multi-aggregation process neural networks in practical applications.
References

[1] Chang T.C., Chao R.J. (2007) Application of back-propagation networks in debris flow prediction. Engineering Geology 85(3-4):270-280
[2] Mo X., Liu S., Lin Z., Xu Y., Xiang Y., McVicar T.R. (2004) Prediction of crop yield, water consumption and water use efficiency with a SVAT-crop growth model using remotely sensed data on the North China Plain. Ecological Modelling 183(2-3):301-322
[3] Panakkat A., Adeli H. (2007) Neural network models for earthquake magnitude prediction using multiple seismicity indicators. International Journal of Neural Systems 17(1):13-33
[4] Peng X., Wang W.C., Huang S.P. (2005) Monte Carlo simulation for chemical reaction equilibrium of ammonia synthesis in MCM-41 pores and pillared clays. Fluid Phase Equilibria 231(2):138-149
[5] Xu S.H., He X.G. (2007) The multi-aggregation process neural networks and learning algorithm. Chinese Journal of Computers 30(1):48-56 (in Chinese)
[6] Song K.P. (1996) Oil Reservoir Numerical Simulation. Petroleum Industry Press, Beijing (in Chinese)
[7] Wang K.J., He B., Chen R.L. (2007) Predicting parameters of nature oil reservoir using general regression neural network. In: International Conference on Mechatronics and Automation, pp. 822-826
8 Design and Construction of Process Neural Networks
As a kind of functional approximator, process pattern associative memory machine and time-varying signal classifier, process neural networks have broad applications in modeling and solving various practical problems related to time processes or multivariate processes. For example, Ding and Zhong used a wavelet process neural network to solve time series prediction problems [1,2], and used a parallel process neural network to solve the problem of aircraft engine health condition monitoring [3]. Zhong et al. used a continuous wavelet process neural network to solve the problem of monitoring an aero-engine lubricating oil system [4]. Xu et al. used a process neural network and a quantum genetic algorithm to solve for oil recovery ratios [5]; Song et al. used a mixed process neural network to predict churn in mobile communications [6].

In order to solve practical application problems, we must design and construct corresponding process neural networks in terms of the concrete problem, including the choice of the network model, the determination of the number of hidden layers and hidden nodes, the selection or design of the neuron type in each node layer (including the choice of activation function, etc.), and the design of corresponding learning algorithms and parameters. In this chapter, according to the application background and demands of different practical problems, some practical process neural network models with different mapping mechanisms are constructed, such as process neural networks with double hidden layers, discrete process neural networks, cascade process neural networks, feedback process neural networks, self-organizing process neural networks, etc. The learning algorithms and corresponding application examples for these models are also provided in this chapter.
8.1 Process Neural Networks with Double Hidden Layers

Considering the adaptability of process neural networks and their nonlinear transform capability for handling complex time-varying systems, we can construct a process neural network with double hidden layers, which combines process neurons with common time-invariant neurons, for many practical application problems [7]. The model consists of four layers, i.e. an input layer, a process neuron hidden layer, a time-invariant neuron hidden layer, and an output layer. The process neuron hidden layer accomplishes the extraction of the procedural pattern characteristics of time-varying input signals, the spatio-temporal 2-dimensional aggregation, etc. The time-invariant neuron hidden layer is mainly used to improve the mapping ability for the complex input-output relationship of the system and to enhance the flexibility and knowledge memory capability of the network.

8.1.1 Network Structure

For convenience of discussion, suppose that the process neural network with double hidden layers is a multi-input-single-output system and its topological structure is n-m-K-1. In fact, it can be easily extended to the multi-input-multi-output situation. The input layer of the network consists of $n$ node units for inputting $n$ time-varying functions $x_1(t), x_2(t), \ldots, x_n(t)$ to the network. The first hidden layer consists of $m$ process neuron nodes to complete the spatial weighted aggregation and the time process effect accumulation of the $n$ input functions and the extraction of the procedural pattern features and the transform relationship of function samples. The second hidden layer consists of $K$ common time-invariant neuron nodes to improve the nonlinear mapping capability for the complex input-output relationship of the system. The fourth layer is the output layer, which includes a time-invariant neuron node to complete the system output. The topological structure of the network is shown in Fig. 8.1.
[Figure] Fig. 8.1 Process neural network with double hidden layers

The input-output relationship among the layers of the network is as follows. The system input is

$$X(t) = (x_1(t), x_2(t), \ldots, x_n(t)),\quad t\in[0,T].$$
The output of the first hidden layer is

$$y_j^{(1)} = f\Bigl(\sum_{i=1}^{n}\int_0^{T} w_{ij}(t)\, x_i(t)\, dt - \theta_j^{(1)}\Bigr),\quad j=1,2,\ldots,m, \tag{8.1}$$

where $y_j^{(1)}$ is the output of the $j$th process neuron in the first hidden layer; $w_{ij}(t)$ is the connection weight function between input node $i$ and first hidden layer node $j$; $\theta_j^{(1)}$ is the output activation threshold of the $j$th process neuron in the first hidden layer; $[0,T]$ is the system input process interval, and $f$ is the activation function in the first hidden layer.

The output of the second hidden layer is

$$y_k^{(2)} = g\Bigl(\sum_{j=1}^{m} v_{jk}\, y_j^{(1)} - \theta_k^{(2)}\Bigr),\quad k=1,2,\ldots,K, \tag{8.2}$$

where $y_k^{(2)}$ is the output of the $k$th neuron in the second hidden layer; $v_{jk}$ is the connection weight between the first hidden layer and the second hidden layer; $\theta_k^{(2)}$ is the output activation threshold of the $k$th neuron in the second hidden layer; and $g$ is the activation function in the second hidden layer.

The system output of the output layer is

$$y = \sum_{k=1}^{K} \mu_k\, y_k^{(2)}, \tag{8.3}$$

where $y$ is the output of the process neural network and $\mu_k$ is the connection weight from the second hidden layer to the output layer. Combining Eqs. (8.1)-(8.3), the input-output relationship of the system is

$$y = \sum_{k=1}^{K} \mu_k\, g\Bigl(\sum_{j=1}^{m} v_{jk}\, f\Bigl(\sum_{i=1}^{n}\int_0^{T} w_{ij}(t)\, x_i(t)\, dt - \theta_j^{(1)}\Bigr) - \theta_k^{(2)}\Bigr). \tag{8.4}$$
8.1.2 Learning Algorithm

Suppose that $b_1(t), b_2(t), \ldots, b_L(t)$ are a group of standard orthogonal basis functions satisfying the required input function fitting precision. The input function $x_i(t)$ and the weight function $w_{ij}(t)$ are expanded with $b_1(t), b_2(t), \ldots, b_L(t)$ in the following forms:

$$x_i(t) = \sum_{l=1}^{L} a_{il}\, b_l(t),\qquad w_{ij}(t) = \sum_{l=1}^{L} w_{ij}^{(l)}\, b_l(t).$$

Substitute these expansions into Eq. (8.4); then, according to the orthonormality of the basis functions, the input-output relationship of the system can be simplified to

$$y = \sum_{k=1}^{K} \mu_k\, g\Bigl(\sum_{j=1}^{m} v_{jk}\, f\Bigl(\sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}\, w_{ij}^{(l)} - \theta_j^{(1)}\Bigr) - \theta_k^{(2)}\Bigr). \tag{8.5}$$
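The step from the integral in Eq. (8.4) to the coefficient sum in Eq. (8.5) rests on the orthonormality of the basis. A quick numerical check, using an orthonormal trigonometric basis on $[0, 2\pi]$ and coefficient vectors of our own choosing:

```python
import numpy as np

T = 2.0 * np.pi
t = np.linspace(0.0, T, 4001)
# standard orthonormal trigonometric basis on [0, T]
B = np.array([np.full_like(t, 1.0 / np.sqrt(T))]
             + [np.sin(k * t) / np.sqrt(np.pi) for k in (1, 2)]
             + [np.cos(k * t) / np.sqrt(np.pi) for k in (1, 2)])

a = np.array([0.3, -1.2, 0.5, 2.0, -0.7])   # coefficients a_il of x_i(t)
w = np.array([1.0, 0.4, -0.3, 0.1, 0.9])    # coefficients w_ij^(l) of w_ij(t)
x_t, w_t = a @ B, w @ B                     # the two expanded functions

# trapezoid approximation of the aggregation integral in Eq. (8.4)
f = x_t * w_t
integral = ((f[:-1] + f[1:]) * 0.5 * np.diff(t)).sum()
# integral should match a . w, the sum that appears inside f in Eq. (8.5)
```

The computed `integral` agrees with `a @ w` to numerical quadrature accuracy, which is exactly the simplification the learning algorithm exploits.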
For convenience, suppose that the activation functions of the various layers are all Sigmoid functions, i.e. $f(u)=g(u)=1/(1+e^{-u})$. Give $P$ learning sample functions $(x_{p1}(t), x_{p2}(t), \ldots, x_{pn}(t);\ d_p)$ for $p=1,2,\ldots,P$, in which the first subscript of $x_{pi}(t)$ denotes the serial number of the learning sample and the second denotes the serial number of the input function vector component; $d_p$ is the expected output of the network corresponding to the input $x_{p1}(t), x_{p2}(t), \ldots, x_{pn}(t)$. Suppose that $y_p$ is the actual output of the network corresponding to this input; the learning error function is defined as

$$E = \sum_{p=1}^{P}(y_p - d_p)^2 = \sum_{p=1}^{P}\Bigl[\sum_{k=1}^{K}\mu_k\, g\Bigl(\sum_{j=1}^{m} v_{jk}\, f\Bigl(\sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}^{(p)}\, w_{ij}^{(l)} - \theta_j^{(1)}\Bigr) - \theta_k^{(2)}\Bigr) - d_p\Bigr]^2, \tag{8.6}$$

where $a_{il}^{(p)}$ is the coefficient in the expansion of the function $x_{pi}(t)$ corresponding to the basis function $b_l(t)$. By adopting the gradient descent algorithm, the modification formulas for the connection weights and activation thresholds of the process neural network are
$$\mu_k = \mu_k + \alpha\Delta\mu_k, \tag{8.7}$$

$$v_{jk} = v_{jk} + \beta\Delta v_{jk}, \tag{8.8}$$

$$w_{ij}^{(l)} = w_{ij}^{(l)} + \gamma\Delta w_{ij}^{(l)}, \tag{8.9}$$

$$\theta_j^{(1)} = \theta_j^{(1)} + \eta\Delta\theta_j^{(1)}, \tag{8.10}$$

$$\theta_k^{(2)} = \theta_k^{(2)} + \lambda\Delta\theta_k^{(2)}, \tag{8.11}$$
where $\alpha$, $\beta$, $\gamma$, $\eta$ and $\lambda$ are the learning rate constants of the network. Denote

$$u_{jp} = \sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}^{(p)}\, w_{ij}^{(l)} - \theta_j^{(1)},\qquad z_{kp} = \sum_{j=1}^{m} v_{jk}\, f(u_{jp}) - \theta_k^{(2)};$$

then

$$\Delta\mu_k = -\frac{\partial E}{\partial \mu_k} = -2\sum_{p=1}^{P}\bigl(\mu_k g(z_{kp}) - d_p\bigr)\, g(z_{kp}), \tag{8.12}$$

$$\Delta v_{jk} = -\frac{\partial E}{\partial v_{jk}} = -2\sum_{p=1}^{P}\bigl(\mu_k g(z_{kp}) - d_p\bigr)\mu_k\, g'(z_{kp})\, f(u_{jp}), \tag{8.13}$$

$$\Delta w_{ij}^{(l)} = -\frac{\partial E}{\partial w_{ij}^{(l)}} = -2\sum_{p=1}^{P}\sum_{k=1}^{K}\bigl(\mu_k g(z_{kp}) - d_p\bigr)\mu_k\, g'(z_{kp})\, v_{jk}\, f'(u_{jp})\, a_{il}^{(p)}, \tag{8.14}$$

$$\Delta\theta_j^{(1)} = -\frac{\partial E}{\partial \theta_j^{(1)}} = -2\sum_{p=1}^{P}\sum_{k=1}^{K}\bigl(\mu_k g(z_{kp}) - d_p\bigr)\mu_k\, g'(z_{kp})\, v_{jk}\, f'(u_{jp})(-1), \tag{8.15}$$

$$\Delta\theta_k^{(2)} = -\frac{\partial E}{\partial \theta_k^{(2)}} = -2\sum_{p=1}^{P}\bigl(\mu_k g(z_{kp}) - d_p\bigr)\mu_k\, g'(z_{kp})(-1). \tag{8.16}$$
The training course for the network is described as follows.

Step 1 Choose standard orthogonal basis functions $b_1(t), b_2(t), \ldots, b_L(t)$ in the input space. The number of basis functions $L$ should be selected so that the basis function expansions satisfy the required precision. The input functions and connection weight functions are denoted as expansions in the basis functions;

Step 2 Give the learning error precision of the network $\varepsilon$, set the accumulated number of learning iterations to $s=0$, and give the maximal number of learning iterations $M$;

Step 3 Initialize the network connection weights and the activation thresholds $\mu_k$, $v_{jk}$, $w_{ij}^{(l)}$, $\theta_j^{(1)}$, $\theta_k^{(2)}$ ($i=1,2,\ldots,n$; $j=1,2,\ldots,m$; $l=1,2,\ldots,L$; $k=1,2,\ldots,K$);

Step 4 Calculate the error function $E$ according to Eq. (8.6); if $E<\varepsilon$ or $s\ge M$, go to Step 6;

Step 5 Modify the connection weights and the activation thresholds according to Eqs. (8.7)-(8.16); $s+1\to s$; go to Step 4;

Step 6 Output the learning result and stop.
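A forward pass through the four layers, after reducing the inputs to basis coefficients as in Eq. (8.5), can be sketched as follows. The shapes and names are our own convention, not the book's code:

```python
import numpy as np

sig = lambda u: 1.0 / (1.0 + np.exp(-u))

def forward(A, W, V, mu, th1, th2):
    """A: (n, L) input coefficients a_il; W: (n, m, L) process-neuron weights;
    V: (m, K) second-hidden-layer weights; mu: (K,) output weights."""
    u = np.einsum('il,iml->m', A, W) - th1  # process-neuron layer, Eq. (8.1)
    h2 = sig(V.T @ sig(u) - th2)            # time-invariant layer, Eq. (8.2)
    return float(mu @ h2)                   # output node, Eq. (8.3)

rng = np.random.default_rng(4)
n, m, Kh, L = 3, 5, 4, 8
y = forward(rng.normal(size=(n, L)),
            rng.normal(scale=0.1, size=(n, m, L)),
            rng.normal(scale=0.1, size=(m, Kh)),
            rng.normal(size=Kh),
            np.zeros(m), np.zeros(Kh))
```

The training loop of Steps 4-5 would repeatedly call such a forward pass and apply the updates of Eqs. (8.12)-(8.16).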
8.1.3 Application Examples

Example 8.1 Application in rotating machine failure diagnosis [8]
Looking at the problem of rotating machine failure diagnosis described in Example 6.2 in Chapter 6, we now adopt a process neural network with double hidden layers as the recognizer of a rotating machine failure automatic diagnosis system. Process neural networks can automatically extract a variety of process pattern characteristics
from continuous input signals and generate memory through learning. Aiming at the four failure modes stated in Example 6.2 in Chapter 6, 22 curves of real measured signals are chosen in all, i.e. 5 of axis misalignment, 6 of eccentricity, 7 of abrasion, and 4 normal, to constitute the learning sample set, and the test set is made up of another 8 samples.

The structure parameters of the network are chosen as follows: 1 input node, 20 process neuron nodes in the first hidden layer, 10 time-invariant neuron nodes in the second hidden layer, and 1 output node. A Sigmoid function is adopted as the activation function. Because the network input signals change periodically, a trigonometric function system is chosen as the orthogonal basis, and the number of basis functions is 50 (determined via experiment from the 22 learning samples and the 8 test samples with fitting precision 0.001). The learning rate constants are $\alpha=0.50$, $\beta=0.45$, $\gamma=0.65$, $\eta=0.55$ and $\lambda=0.50$, the maximal number of learning iterations is $N=10{,}000$, and the learning precision is $\varepsilon=0.05$. The network converges after 5937 iterations. The error function curve during the iteration is shown in Fig. 8.2. The 8 test samples are recognized and 7 of them are well-judged; the correct recognition rate is 87.5%. This is a better result compared to some existing methods of rotating machine failure automatic diagnosis.

[Figure] Fig. 8.2 Iteration error function curve (learning error vs. iterations, 0-8000)
8.2 Discrete Process Neural Network

When process neural networks are adopted to solve practical problems related to a time process, such as system simulation modeling, signal processing, time series comparative analysis, etc., the situation where the system inputs are discrete time sample data sequences is often encountered, i.e. information processing for a discrete time process. To solve this problem, we can construct a discrete process neural network model to directly process discrete time series. Discrete process neural networks are actually a special case of process neural networks with continuous-time inputs.
8.2.1 Discrete Process Neuron

A discrete process neuron is also made up of weighted input signals, spatio-temporal 2-dimensional aggregation, activation output, etc. Differing from a continuous process neuron, its inputs and connection weights are both discrete time sequences. If the spatial aggregation of the input signals of a discrete process neuron is a kind of weighted summation, and the accumulation operation for the time process is a kind of time effect accumulation for the input time series, then the structure of a discrete process neuron is shown in Fig. 8.3.

[Figure] Fig. 8.3 Discrete process neuron model

In Fig. 8.3, $x_1(t_l), x_2(t_l), \ldots, x_n(t_l)$ for $l=1,2,\ldots$ are the $n$ discrete time input sequences of a discrete process neuron; $w_1(t_l), w_2(t_l), \ldots, w_n(t_l)$ for $l=1,2,\ldots$ are the corresponding connection weight sequences; "$\oplus$" is the spatial aggregation operator for discrete input signals, "$\otimes$" is a discrete time (process) accumulation operator, and $f(\cdot)$ is the activation function. The input-output relationship of a discrete process neuron can be expressed as

$$y = f\bigl((W(t)\oplus X(t))\otimes K(\cdot) - \theta\bigr), \tag{8.17}$$

where $X(t)$ is the input matrix of the discrete process neuron; $W(t)$ is the corresponding connection weight matrix; $K(t)$ is the time accumulation kernel function of the discrete process neuron; $\theta$ is the activation threshold of the neuron. If "$\oplus$" and "$\otimes$" respectively adopt the spatial weighted summation and the accumulation of the temporal effect on the discrete time input signals of the system, and the kernel function $K(t)=1$, then Eq. (8.17) can be rewritten as

$$y = f\Bigl(\sum_{i=1}^{n}\sum_{l=1}^{\infty} w_i(t_l)\, x_i(t_l)\, \Delta t_l - \theta\Bigr), \tag{8.18}$$

where $\Delta t_l = t_{l+1} - t_l$. For the condition of finite time series (as in most practical applications), Eq. (8.18) can be denoted as
$$y = f\Bigl(\sum_{i=1}^{n}\sum_{l=1}^{T} w_i(t_l)\, x_i(t_l)\, \Delta t_l - \theta\Bigr), \tag{8.19}$$

where $T$ is the length of the time series.
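Eq. (8.19) can be read off directly as code. A minimal sketch, in which the Sigmoid activation and uniform $\Delta t_l$ are our assumptions:

```python
import numpy as np

def discrete_process_neuron(X, W, dt, theta,
                            f=lambda u: 1.0 / (1.0 + np.exp(-u))):
    """X, W: (n, T) arrays holding x_i(t_l) and w_i(t_l);
    dt: (T,) interval lengths; theta: activation threshold."""
    u = (W * X * dt).sum()      # sum over i and l of w_i(t_l) x_i(t_l) dt_l
    return f(u - theta)

X = np.array([[0.2, 0.4, 0.6],
              [1.0, 0.8, 0.5]])           # n = 2 inputs, T = 3 time points
W = np.ones_like(X)
y = discrete_process_neuron(X, W, np.full(3, 0.1), theta=0.0)
```

With these toy values the aggregation is $3.5\times 0.1 = 0.35$, so the output is the Sigmoid of 0.35.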
8.2.2 Discrete Process Neural Network

Consider a multi-input-single-output system with only one discrete process neuron hidden layer, whose topological structure is shown in Fig. 8.4.

[Figure] Fig. 8.4 Discrete process neural network

In Fig. 8.4, the input layer has $n$ nodes and the inputs are discrete time sequences. The hidden layer of the discrete process neural network has $m$ nodes, which complete the spatial weighted summation of the input time series, the accumulation of the time effect of the discrete input signals, and the feature extraction and memory for discrete time signals. The output node is a time-invariant neuron that completes the system output. The input-output mapping relationship of the discrete process neural network is

$$y = g\Bigl(\sum_{j=1}^{m} v_j\, f\Bigl(\sum_{i=1}^{n}\sum_{l=1}^{\infty} w_{ij}(t_l)\, x_i(t_l)\, \Delta t_l - \theta_j^{(1)}\Bigr) - \theta\Bigr), \tag{8.20}$$

where $w_{ij}(t_l)$ ($l=1,2,\ldots$) is the connection weight from input node $i$ to hidden node $j$ at time $t_l$; $v_j$ is the connection weight from hidden node $j$ to the output node; $\theta_j^{(1)}$ is the activation threshold of hidden node $j$; $f$ is the activation function of the hidden neurons; $g$ is the activation function of the output node; $\theta$ is the activation threshold of the output node. If the inputs of the discrete process neural network are finite time series of length $T$, then Eq. (8.20) can be reformulated as

$$y = g\Bigl(\sum_{j=1}^{m} v_j\, f\Bigl(\sum_{i=1}^{n}\sum_{l=1}^{T} w_{ij}(t_l)\, x_i(t_l)\, \Delta t_l - \theta_j^{(1)}\Bigr) - \theta\Bigr). \tag{8.21}$$
8.2.3 Learning Algorithm

For the training of the discrete process neural network denoted by Eq. (8.21), what need to be determined are the connection weights of the network $w_{ij}(t_l)$ ($i=1,2,\ldots,n$; $j=1,2,\ldots,m$; $l=1,2,\ldots,T$), $v_1, v_2, \ldots, v_m$ and the activation thresholds. In the following, a training method for the network based on a gradient descent algorithm is presented.

Given $K$ learning samples $(x_{k1}(t_l), x_{k2}(t_l), \ldots, x_{kn}(t_l);\ d_k)$ for $k=1,2,\ldots,K$, where $d_k$ is the expected output, suppose that the actual output corresponding to the $k$th learning sample input is $y_k$; the error function of the network can be defined as

$$E = \sum_{k=1}^{K}(y_k - d_k)^2. \tag{8.22}$$

According to the gradient descent algorithm, the learning rules for the connection weights and activation thresholds of the network are

$$v_j = v_j + \alpha\Delta v_j,\quad j=1,2,\ldots,m, \tag{8.23}$$

$$w_{ij}(t_l) = w_{ij}(t_l) + \beta\Delta w_{ij}(t_l),\quad i=1,2,\ldots,n;\ j=1,2,\ldots,m;\ l=1,2,\ldots,T, \tag{8.24}$$

$$\theta_j^{(1)} = \theta_j^{(1)} + \gamma\Delta\theta_j^{(1)},\quad j=1,2,\ldots,m, \tag{8.25}$$

$$\theta = \theta + \eta\Delta\theta, \tag{8.26}$$
where a, 13, rand I] are the learning rate constants. Denote
L L wyCt/ )x; (t/ )!J..t/ - ()?) =Ukj, n
T
(8.27)
;=1 /=1
(8.28) then (8.29)
170
ProcessNeural Networks
$$ \Delta w_{ij}(t_l) = -\frac{\partial E}{\partial w_{ij}(t_l)} = -\sum_{k=1}^{K}(y_k - d_k)\,g'(\cdot)\,v_j f'(u_{kj})\,x_i^k(t_l)\,\Delta t_l, \qquad (8.30) $$
$$ \Delta\theta_j^{(1)} = \sum_{k=1}^{K}(y_k - d_k)\,g'(\cdot)\,v_j f'(u_{kj}), \qquad (8.31) $$
$$ \Delta\theta = \sum_{k=1}^{K}(y_k - d_k)\,g'(\cdot). \qquad (8.32) $$
The learning algorithm is described as follows.

Step 1 Give the error precision $\varepsilon>0$, the accumulated learning iteration count $s=0$, and the maximal number of learning iterations $M$;

Step 2 Initialize the connection weights and the activation thresholds $v_j$, $w_{ij}(t_l)$, $\theta_j^{(1)}$, $\theta$ ($i=1,2,\ldots,n$; $j=1,2,\ldots,m$; $l=1,2,\ldots,T$);

Step 3 Calculate the error function $E$ according to Eq. (8.22). If $E<\varepsilon$ or $s\ge M$, go to Step 5;

Step 4 Modify the connection weights and the activation thresholds according to Eqs. (8.23)-(8.32); $s+1\to s$; go to Step 3;

Step 5 Output the learning result and stop.
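The five steps above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the class and function names are ours, and sigmoid activations for f and g, a single shared learning rate, and unit time steps Δt_l = 1 are assumptions.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

class DiscretePNN:
    """Sketch of the discrete process neural network of Eq. (8.21):
    n inputs, m discrete process neuron hidden nodes, one output."""

    def __init__(self, n, m, T, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.uniform(-0.5, 0.5, (n, m, T))  # w_ij(t_l)
        self.v = rng.uniform(-0.5, 0.5, m)          # v_j
        self.theta1 = np.zeros(m)                   # theta_j^(1)
        self.theta = 0.0                            # theta

    def forward(self, X):
        """X: (n, T) discrete input series; u_j follows Eq. (8.27) with dt = 1."""
        u = np.einsum('it,ijt->j', X, self.W) - self.theta1
        h = sigmoid(u)
        y = sigmoid(self.v @ h - self.theta)
        return y, u, h

    def train(self, samples, lr=0.5, eps=0.05, max_iter=5000):
        """Gradient descent over (X, d) pairs until E < eps or s >= max_iter."""
        E, s = np.inf, 0
        for s in range(max_iter):
            E = 0.0
            for X, d in samples:
                y, u, h = self.forward(X)
                E += 0.5 * (y - d) ** 2
                g = (y - d) * y * (1.0 - y)        # output-node error term
                gh = g * self.v * h * (1.0 - h)    # hidden-node error terms
                self.v -= lr * g * h
                self.theta += lr * g
                self.W -= lr * np.einsum('j,it->ijt', gh, X)
                self.theta1 += lr * gh
            if E < eps:
                break
        return E, s
```

For instance, `net = DiscretePNN(n=2, m=3, T=4)` followed by `net.train(samples)` drives the error of Eq. (8.22) below the preset precision or stops after `max_iter` epochs.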
8.2.4 Application Examples

Example 8.2 Water-flooded layer identification in an oil reservoir

Identifying the water-flooded status of an oil reservoir is an important as well as complex job in oilfield development, especially when an oilfield is in its late development stage. Based on log data, a water-flooded layer is mainly identified from the morphological and amplitude characteristics, and the combined relationships, of various well-log curves that change with depth and reflect the geophysical properties of the formation. The output is a water-flooded level. Therefore, the key to automatically recognizing a water-flooded layer is to build an identification model that faithfully reflects the correspondence between the reservoir water-flooded status and the well-log response curves in the research region, and that extracts the morphological characteristics of the well-log curves integrally and objectively.

During actual measurement, a logging tool samples multi-parameter data at 8 points per meter in the well, i.e. the sampled information is actually discrete data changing with depth, and well-log curves are obtained by fitting the discrete well-log data. Therefore, discrete process neural networks are well suited to identifying reservoir water-flooded status from well-log data. In actual data processing, the water-flooding level is divided into strong water flooding, middle water flooding, weak water flooding, and no water flooding according to the degree of reservoir water flooding in an oil layer. Five variables are
chosen as the parameters of water-flooded layer identification, i.e. spontaneous potential SP, interval transit time AC, deep lateral resistivity RLLD, shallow lateral resistivity RLLS, and reservoir effective thickness h. As different reservoirs have different thicknesses, when the reservoir log data are input into the network the input process intervals may differ, so the reservoir thickness should be normalized in advance. By a rarefying or densifying method, the thickness of the reservoir is normalized into the interval [0,1]. Sixteen sample points are chosen in the interval [0,1] from each reservoir, and the change in the valid thickness of the reservoir is denoted by h. At the same time, because each log variable has a different dimension and there are great differences between the log data of different log variables, the feature parameters should also be standardized. Suppose that $x_{ijl}$ is the $l$th original measured value of the $j$th log parameter of the $i$th formation; the standardized data $x'_{ijl}$ is

$$ x'_{ijl} = \frac{x_{ijl} - \min_{i,l} x_{ijl}}{\max_{i,l} x_{ijl} - \min_{i,l} x_{ijl}}. \qquad (8.33) $$
Discrete process neural networks described by Eq. (8.21) are used to identify the water-flooded status according to the discrete log data changing with depth. By comparative experimental analysis, the topological structure of the network is chosen as 5-30-1, i.e. 5 input nodes, 30 discrete process neuron hidden nodes, and 1 time-invariant neuron output node. The learning sample set is made up of 10 typical reservoir samples from each of the strong, middle, weak, and no water flooding classes. The test set consists of 15 water-flooded reservoir samples. The expected output of the network for no water flooding corresponds to 0.25, weak water flooding to 0.50, middle water flooding to 0.75, and strong water flooding to 1.0. The learning error precision is 0.05. The network converges after 5371 iterations. Of the 15 tested samples, 11 are well-judged; the correct recognition rate is 73.3%, a good result. The error function curve during the iteration is shown in Fig. 8.5.
Fig. 8.5 Iteration error function curve
Example 8.3 Approximation of discrete trigonometric function samples

In this example, discrete process neural networks are used to approximate 10 groups of discrete time function sample pairs. Suppose that the input interval of the discrete process is [0,1] and the input sample functions are $\{\sin(2k\pi t),\cos(2k\pi t),k\}$ for $k=1,2,\ldots,10$. The sample functions are discretized as $\{\sin(2k\pi t_i),\cos(2k\pi t_i),k\}$, where $t_i=i/128$ for $i=0,1,\ldots,127$. The network structure and parameters are chosen as follows: 2 input nodes, 15 hidden nodes, and 1 output node; the error precision $\varepsilon=0.001$; the learning rate constants $\alpha=0.5$, $\beta=0.80$, $\gamma=0.65$; the maximal number of learning iterations $M=5000$. A Walsh transform is applied to the discrete data and the transformed data are submitted to the network for training. The network converges after 283 iterations. The approximation error is 0.0009. The approximation result is shown in Table 8.1.

Table 8.1 Approximation result of 10 groups of discrete trigonometric function samples

Expected output   Actual output   Absolute error
0.1000            0.0997          0.0003
0.2000            0.1996          0.0004
0.3000            0.3004          0.0004
0.4000            0.4008          0.0008
0.5000            0.5006          0.0006
0.6000            0.5995          0.0005
0.7000            0.6991          0.0009
0.8000            0.8002          0.0002
0.9000            0.9003          0.0003
1.0000            0.9992          0.0008
The experimental results show that discrete process neural networks, together with the training algorithm based on a discrete Walsh transform, have powerful approximation ability for discrete function sequences. This indicates that process neural networks are very appropriate for modeling real-time discrete systems.
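The discrete Walsh transform used as preprocessing can be computed with the fast Walsh-Hadamard butterfly in O(n log n). The 1/n normalization below is our convention, and sequence lengths must be powers of two (128 in this example):

```python
import numpy as np

def walsh_hadamard(x):
    """In-order fast Walsh-Hadamard transform of a length-2^p sequence,
    normalized by 1/n (our convention)."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b          # butterfly: sums
            x[i + h:i + 2 * h] = a - b  # butterfly: differences
        h *= 2
    return x / n
```

Each sampled input sequence (here of length 128) would be transformed once and the resulting coefficients fed to the network for training.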
8.3 Cascade Process Neural Network [9]

In practical problems, the inputs of some time-varying systems may be divided into several time phases, and the system in each phase may have its own special changing rules and characteristics. For instance, a crop planting cycle can be divided into sowing, seedling growth, blossoming and fructification, growing, ripening, etc. Although each growth phase is related to external environmental factors such as temperature, humidity, fertilizer, illumination, etc., the crop has its own growth rule in each phase, and the influence of the various environmental factors on the crop in each
growth phase is different. At the same time, the growth state of the crop in each phase also affects the growth state in the next phase. With this problem in mind, a process neural network model with continuous phase inputs and discrete process outputs is considered. The model can have a cascade structure; the input-output relationship of the system in different phases can be described by different models. As the complexity of the input-output relationship in each time phase may differ, the structures of the process neural sub-networks realizing the mapping relationships can be the same or different. The input of each sub-network is the state output of the system in the previous phase together with the time-varying input signal in the current time phase, and the output is the aggregation and accumulation result of the system in that phase.
8.3.1 Network Structure

Suppose that the system input process interval [0,T] can be divided into N time phases, with the phase intervals denoted $[T_0,T_1],[T_1,T_2],\ldots,[T_{N-1},T_N]$, where $T_0=0<T_1<\cdots<T_N=T$.

$$ r_j^k = \frac{(X^k(t),\,W_j(t))}{\|X^k(t)\|\cdot\|W_j(t)\|}, \qquad (8.47) $$
where $W_j(t)=(w_{1j}(t),w_{2j}(t),\ldots,w_{nj}(t))$ for $j=1,2,\ldots,m$. Here, the node $j^*$ with the maximal similarity coefficient wins the competition, i.e. $j^*$ satisfies
$$ r_{j^*}^k = \max_{j\in\{1,2,\ldots,m\}}\{r_j^k\}. \qquad (8.48) $$
For the input sample vector $X^k(t)$, if node $j^*$ wins the competition, then the weights are adjusted according to the following rule: when the network again encounters the input $X^k(t)$, or an input sample vector similar to $X^k(t)$, the winning probability of node $j^*$ is increased, i.e. $w_{ij}(t)$ ($i=1,2,\ldots,n$; $j=1,2,\ldots,m$) is adjusted so as to make the weight function $W_{j^*}(t)$ move toward the sample $X^k(t)$, and finally make the output of the winning neuron $j^*$ represent the pattern class that $X^k(t)$ represents.

(b) Function orthogonal basis expansion

The computation and training of self-organizing process neural networks include the accumulative operation (for instance, an integral operation) of process neurons over
time, so a learning algorithm based on orthogonal basis function expansion can be adopted. Suppose that $b_1(t),b_2(t),\ldots,b_L(t)$ are a group of standard orthogonal basis functions in $C[0,T]$, and $X(t)=(x_1(t),x_2(t),\ldots,x_n(t))$ is a function in the input space. Under the given fitting precision, $x_i(t)$ is expressed in the finite expansion form of the basis functions, i.e.

$$ x_i(t) = \sum_{l=1}^{L} a_{il}\,b_l(t), \quad i=1,2,\ldots,n. \qquad (8.49) $$

In addition, the weight function $w_{ij}(t)$ is also expanded in terms of $b_1(t),b_2(t),\ldots,b_L(t)$:

$$ w_{ij}(t) = \sum_{l=1}^{L} w_{ij}^{(l)}\,b_l(t), \quad i=1,2,\ldots,n;\ j=1,2,\ldots,m, \qquad (8.50) $$

where $w_{ij}^{(l)}$ is the connection weight between the input layer and the competitive layer corresponding to the component of $w_{ij}(t)$ along $b_l(t)$. Substituting Eqs. (8.49) and (8.50) into Eq. (8.47) and using the orthogonality of the basis functions, we have

$$ r_j^k = \frac{\sum_{i=1}^{n}\sum_{l=1}^{L} a_{il}^k\,w_{ij}^{(l)}}{\|X^k(t)\|\cdot\|W_j(t)\|}. \qquad (8.51) $$
(c) Description of the algorithm

Step 1 Randomly generate initial values of $w_{ij}^{(l)}$ ($j=1,2,\ldots,m$; $i=1,2,\ldots,n$; $l=1,2,\ldots,L$) in the interval [0,1];

Step 2 Expand the input training sample $X^k(t)=(x_1^k(t),x_2^k(t),\ldots,x_n^k(t))$ by the orthogonal basis functions $b_1(t),b_2(t),\ldots,b_L(t)$ according to Eq. (8.49):

$$ x_i^k(t) = \sum_{l=1}^{L} a_{il}^k\,b_l(t); $$

Step 3 Calculate $r_j^k$ according to Eq. (8.51);

Step 4 Determine the winning process neuron $j^*$ according to Eq. (8.48);

Step 5 Modify the connection weights linked with the winning process neuron $j^*$ as follows, while the other weight functions remain unchanged:

$$ \Delta w_{ij}^{(l)} = \eta\,(a_{il}^k - w_{ij}^{(l)}), \quad j=j^*;\ i=1,2,\ldots,n;\ l=1,2,\ldots,L, \qquad (8.52) $$

where $\eta$ is the learning rate constant.
When the maximum $\Delta w_{ij}^{(l)}$ is small enough (less than a preset small value), the training ends; otherwise let

$$ w_{ij}^{(l)} = w_{ij}^{(l)} + \Delta w_{ij}^{(l)}, \quad j=j^*;\ i=1,2,\ldots,n;\ l=1,2,\ldots,L; \qquad (8.53) $$

Step 6 Choose another training sample, return to Step 2 and continue with the modified connection weights.

After the network training finishes, for any pattern sample $X(t)$ awaiting identification, calculate $r_j = \dfrac{(X(t),\,W_j(t))}{\|X(t)\|\cdot\|W_j(t)\|}$ for $j=1,2,\ldots,m$. If $j^*$ satisfies $r_{j^*} = \max_{j\in\{1,2,\ldots,m\}}\{r_j\}$, then the pattern class which process neuron $j^*$ stands for is the pattern class of the sample $X(t)$.
the pattern class of the sample X(t) . It is obvious that the above algorithm is used to cluster input samples. A group of samples are clustered into several sub-classes through the above algorithm. (2) Supervised learning algorithm For every learning sample )(t), whose pattern class is known, we can adopt a supervised learning algorithm. After the input pattern has been inputted into the network, rjk is calculated by Eq. (8.51) and the winning neuron / is selected
r
according to Eq. (8.48). If the winning process neuron is the proper classification of )(t), then the connection weight of the corresponding proces s neuron is adjusted in the direction to )(t). Its modification formula is A (I) - , 1 2, ..., L . uWij - T/ ( ailk - wij(I) ) , J• -- J" ,. 1• -- 1, 2, .. . , n,. I -
(8.54)
When/ is an improper classification of )(t), its modification formula is A (I) --T/ail-W (k (I» ,J-J, . - " . 1• - I, 2 , ... ,n,. uW jj ij
I- , I 2, ..., L .
(8.55)
As seen from the above algorithm, it is a classification algorithm with teacher demonstration.
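Once the samples and weights are represented by their basis coefficients, the unsupervised procedure reduces to cosine-similarity competition (Eqs. (8.48) and (8.51)) plus a winner-moves-toward-input update (Eqs. (8.52)-(8.53)). A minimal sketch, with the class name, initialization range, and learning rate chosen by us:

```python
import numpy as np

class SelfOrganizingPNN:
    """Sketch of the competitive learning of Section 8.4.2 in coefficient
    space: each input function is its coefficient matrix A (n x L) and each
    competitive node j holds weights Wj (n x L)."""

    def __init__(self, n, m, L, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.uniform(0.0, 1.0, (m, n, L))   # w_ij^(l), Step 1

    def similarity(self, A):
        """r_j of Eq. (8.51): cosine of the angle between A and each Wj."""
        num = np.einsum('il,jil->j', A, self.W)
        return num / (np.linalg.norm(A) * np.linalg.norm(self.W, axis=(1, 2)))

    def train_step(self, A, eta=0.1):
        """One pass of Steps 3-5: pick the winner, pull it toward A."""
        j = int(np.argmax(self.similarity(A)))      # winner, Eq. (8.48)
        self.W[j] += eta * (A - self.W[j])          # Eqs. (8.52)-(8.53)
        return j
```

Repeatedly presenting a sample pulls the winning node's weight matrix onto the sample's direction, so its similarity coefficient approaches 1.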
8.4.3 Application Examples

Example 8.5 Sedimentary microfacies recognition in oil geology research

Sedimentary microfacies recognition is an important as well as complex basic job in oil geology research. Subzone sedimentary microfacies are identified from the morphological and amplitude characteristics of a group of continuous well-log curves that change with depth and reflect stratigraphic and geophysical properties.
Traditional methods are carried out manually under the guidance of facies model and facies sequence laws. Some researchers have already addressed this problem using neural network methods. However, the existing methods need to convert the well-log curves into a group of discrete vectors capable of embodying the morphologically changing characteristics of the continuous curves. In fact, well-log curves can be considered as continuous process signals changing with depth. When process neural networks are used for sedimentary microfacies recognition, the networks can directly capture the morphological and amplitude changing characteristics of the subzone continuous well-log curves. In oil geology research, sedimentary microfacies recognition mainly relies on four variables, i.e. three well-log curves (spontaneous potential, electric resistivity, micronormal) and subzone thickness. Microfacies types are divided into channel sand, abandoned channel sand, interchannel lamellate sand, and interchannel mud. Typical sample well-log curves of the sedimentary microfacies are shown in Fig. 8.8. According to data from core wells and expert interpretation, 23 subzone microfacies samples from the four classes are chosen in the experiment to form the training set, which includes typical samples of the four different microfacies types. After the sample set is classified via self-organizing process neural networks, the class of the typical sample stands for the grouping class. If a resulting class includes two or more typical samples, the classification fails.
Fig. 8.8 Typical samples of sedimentary microfacies: (a) Channel sand; (b) Abandoned channel sand; (c) Interchannel lamellate sand; (d) Interstream mud
During the processing of real data, as different subzones have different thicknesses and the input process interval is not unified, the subzone thickness should be normalized in advance. The maximum thickness of all subzones is
rounded off and 1 is added so as to construct a unified process interval. A baseline value of 0.2 is assigned to the part of a curve that falls short of this interval for layers of smaller thickness. In this way, the subzone thickness variable is incorporated in the other input functions, so the subzone thickness parameter can be dropped. For the competitive learning algorithm without teacher demonstration, the structural parameters of the network are chosen as follows: the input layer has 3 nodes and the competitive layer has 4 nodes; the orthogonal basis functions are chosen as trigonometric functions; the number of basis functions is 50; the maximal number of cycles is N=5000. The classification result and the network weight coefficients no longer change after the network has learned for 375 iterations. Of the 23 samples, the classification results of 20 samples are correct; the separable rate is 86.96%. For the situation with teacher demonstration, 16 subzone samples belonging to the 4 classes are chosen to form the learning sample set and the test set has 9 samples. The network converges after 563 iterations. The samples of the test set are recognized and all are well-judged.
8.5 Counter Propagation Process Neural Network [11]

Counter propagation neural networks are a kind of three-layer heterogeneous feedforward neural network model proposed by Robert Hecht-Nielsen in 1987. The model consists of the input layer, the competitive layer, and the output layer, in which the competitive layer and the output layer carry out a self-organizing mapping algorithm and the Grossberg learning rule, respectively. Compared with homogeneous networks, the heterogeneity of counter propagation neural networks makes them closer to the information processing mechanism of the biological cerebral nervous system. They have significant applications in pattern recognition, pattern completion, signal enhancement, etc., and show high learning efficiency and adaptive capability.

Counter propagation process neural networks can be constructed by extending traditional counter propagation neural networks in the time domain. The competitive layer of counter propagation process neural networks consists of process neurons. The system inputs and the connection weights between the input layer nodes and the competitive layer nodes may be time-varying functions; the output layer consists of common time-invariant neurons; the connection weights between the competitive layer and the output layer are time-invariant adjustable parameters. The competitive layer performs a generalized self-organizing mapping algorithm (a self-organizing mapping algorithm for time-varying functions) that includes a comparison mechanism. The network in this layer implements adaptive classification of process input signals, and at the same time the connection weight functions complete the extraction of the pattern classification information contained in the time-varying input signals. The output layer performs the Grossberg learning rule, implements the classification representation, and gives the expected outputs according to system requirements.
Counter propagation process neural networks show good adaptability to practical problems such as pattern classification of time-varying signals, continuous system signal processing, aircraft engine rotor simulated fault diagnosis [12], etc.
8.5.1 Network Structure

Counter propagation process neural networks are a feedforward network model with a three-layer structure consisting of the input layer, the competitive layer, and the output layer. Neuron nodes in adjacent layers are fully connected to each other. Suppose that the input layer has n nodes to complete the input of n time-varying functions to the network; the competitive layer has H nodes composed of process neurons, and this layer carries out a generalized self-organizing mapping algorithm to complete adaptive competitive classification of the input patterns. The output layer is made up of m common time-invariant neuron nodes, carries out the Grossberg learning rule, and gives the expected outputs according to the system requirements. If the spatial aggregation operation adopts weighted summation and the temporal accumulation operation adopts an integral, the topological structure of the network is shown in Fig. 8.9.
Fig. 8.9 Counter propagation process neural network model
In Fig. 8.9, $x_1(t),x_2(t),\ldots,x_n(t)$ ($t\in[0,T]$) are the input functions of the network; $w_{ij}(t)$ ($i=1,2,\ldots,n$; $j=1,2,\ldots,H$) is the connection weight function from input layer node $i$ to competitive layer node $j$; $v_{jk}$ ($j=1,2,\ldots,H$; $k=1,2,\ldots,m$) is the connection weight between the competitive layer and the output layer; $y_k$ ($k=1,2,\ldots,m$) is the output of the network; $[0,T]$ is the input process interval; $f$ is the activation function of a process neuron.
8.5.2 Learning Algorithm

The learning course of counter propagation process neural networks involves two algorithms. The self-organizing mapping algorithm is used between the input layer
and the competitive layer so as to complete the training of $w_{ij}(t)$ and the adaptive pattern classification of the input functions. The Grossberg learning rule is used between the competitive layer and the output layer to adjust the time-invariant connection parameters $v_{jk}$ and give the system outputs according to requirements. The algorithm is briefly described as follows: first, determine the winning process neuron $j^*$ in the competitive layer according to the self-organizing competitive mapping algorithm described in Section 8.4; second, adjust the connection weight functions from the input layer nodes to node $j^*$ according to Eqs. (8.54) and (8.55), while the other weight functions remain unchanged; then compute the outputs of the network, compare them with the expected outputs, and adjust the connection weights between the competitive layer and the output layer according to the Grossberg learning rule. The modification formula is

$$ v_{jk} = v_{jk} + \eta\, y_j^{(1)}\,(d_k - y_k), \qquad (8.56) $$

where $y_j^{(1)}$ is the output of process neuron $j$ in the competitive layer (a similarity degree); $y_k$ is the actual output of output layer node $k$; and $d_k$ is the expected output. Repeat the above steps until the error precision requirement is satisfied, completing the network training.
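The output-layer adjustment can be sketched as follows; restricting the update to the weights leaving the winning node, with step η·y_{j*}^{(1)}·(d_k − y_k), is our reading of the Grossberg rule here, and all names are ours:

```python
import numpy as np

def grossberg_update(V, y1, j_star, d, y, eta=0.2):
    """Sketch of the Grossberg-rule output-layer update.

    V: (H, m) weights from competitive to output layer;
    y1: (H,) competitive-layer outputs (similarity degrees);
    j_star: index of the winning process neuron;
    d, y: (m,) expected and actual outputs.
    Only the winner's outgoing weights are driven toward d.
    """
    V = V.copy()
    V[j_star] += eta * y1[j_star] * (d - y)
    return V
```

The step is proportional to the winner's similarity degree, so confidently classified inputs adjust the output mapping more strongly.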
8.5.3 Determination of the Number of Pattern Classifications

In the structural design of counter propagation process neural networks, it is important to determine the number of nodes in the competitive layer. The outputs of the nodes in the competitive layer represent the pattern classes of the input function samples, so whether the number of nodes in the competitive layer is chosen properly directly affects the execution efficiency of the network and the correctness of the solution of practical problems. If the number of real pattern classes of the samples is known in advance, then the number of nodes in the competitive layer can be given directly; if it is unknown, the number can be determined by the following dynamic clustering method. Suppose that the real problem domain includes $K$ function samples $\{X^1(t),X^2(t),\ldots,X^K(t)\}$, $X^k(t)\in(C[0,T])^n$, and that these $K$ samples already cover all the modes in the real problem. Three clustering parameters are set: the number of initial classes $H_0$; the similarity coefficient threshold $\varepsilon$ (suppose that the bigger the similarity coefficient is, the more similar the samples are); and the between-class distance threshold $R$. The reciprocal of the similarity coefficient defined by Eq. (8.47) is taken as the distance between two input function samples, and the minimal distance between pairs of input function samples drawn from two classes is taken as the between-class distance. The steps of the dynamic classification are as follows.

Step 1 In the input function sample set, choose $H_0$ ($H_0\le K$) samples as the
delegates of $H_0$ pattern classes and construct $H_0$ classes.

Step 2 Compute the similarity coefficients between each remaining function sample in the input function sample set and every existing pattern class delegate in turn. If the maximal similarity coefficient is less than $\varepsilon$, the function sample forms a new class and serves as the delegate of the new pattern class, $H_0+1\to H_0$. If the maximal similarity coefficient is not less than $\varepsilon$, then the function sample is attributed to the class with the maximal similarity coefficient, and the mean of this function sample and the delegate function sample of the original class is used as the delegate of the new class obtained after merging.

Step 3 Compute the between-class distance between each pair of the $H_0$ classes. If the between-class distance between two classes is less than $R$, the two classes are merged and the mean of the class delegate function samples of the two classes is used as the delegate of the new class; if the between-class distance is not less than $R$, the two input function sample classes remain unchanged.

Step 4 After Step 3 is carried out, the number of classes may change. Replace $H_0$ with the number of new classes. If the classification result (including the number of classes and the specific classification of the function samples) changes, go to Step 3; if the classification result no longer changes, the classification finishes. The final number of classes $H_0$ can then be used as the reference number of nodes in the competitive layer of counter propagation process neural networks.
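Run in coefficient space, where every function sample is an ordinary vector, the four steps can be sketched as below. Using the cosine as the similarity coefficient and its reciprocal as the distance follows the text; the function names and the assumption of non-negative data (so the cosine stays positive) are ours:

```python
import numpy as np

def cos_sim(a, b):
    """Similarity coefficient of Eq. (8.47) for vector-represented samples."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def dynamic_cluster(samples, H0, eps, R):
    """Dynamic clustering of Section 8.5.3; returns (class count, delegates)."""
    reps = [samples[i].astype(float) for i in range(H0)]       # Step 1
    for x in samples[H0:]:                                     # Step 2
        sims = [cos_sim(x, r) for r in reps]
        j = int(np.argmax(sims))
        if sims[j] < eps:
            reps.append(x.astype(float))                       # new class
        else:
            reps[j] = (reps[j] + x) / 2                        # merge, average delegate
    merged = True                                              # Steps 3-4
    while merged:
        merged = False
        for a in range(len(reps)):
            for b in range(a + 1, len(reps)):
                if 1.0 / cos_sim(reps[a], reps[b]) < R:        # between-class distance
                    reps[a] = (reps[a] + reps[b]) / 2
                    del reps[b]
                    merged = True
                    break
            if merged:
                break
    return len(reps), reps
```

The returned class count is the suggested number of competitive-layer nodes.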
8.5.4 Application Examples

Example 8.6 Water-flooding status identification of an oil layer in oil exploitation

The problem of water-flooding status identification of an oil layer has been described in Section 8.2 of this chapter. Here, we adopt counter propagation process neural networks to recognize a well-log water-flooded layer. Using abundant water-flooded oil layer analysis data from core wells, 80 representative water-flooded oil layer samples are chosen to constitute the training set and 40 oil layer samples are chosen to constitute the test set. The water-flooding level of the oil layer is divided into strong water flooding, middle water flooding, weak water flooding, and no water flooding. The topological structure of the counter propagation process neural network is determined as 5-8-1. The 80 training samples (which are not classified in advance) are fed into the network for training. The learning precision is 0.05, and the maximal number of learning iterations is 5000. In the experiment, the network converges after 1319 iterations. The well-trained network is used to identify the samples in the training sample set; 74 of them are well-judged, and the correct recognition rate is 92.5%. Of the 40 samples in the test set, 31 are well-judged, and the correct recognition rate is 77.5%. This is a better result compared with the correct rate of 67% achieved by the current automatic water-flooded layer identification method.
8.6 Radial-Basis Function Process Neural Network

In 1985, Powell proposed a radial-basis function (RBF) method for multivariate interpolation. In 1988, Broomhead and Lowe first applied RBFs in the design of neural networks, and accordingly RBF neural networks were built. Such a network is a three-layer feedforward neural network model that realizes nonlinear mapping by changing the parameters of the nonlinear transform functions of the neurons, and improves the learning rate through the linearization of the connection weight adjustment. In recent years, it has been broadly applied to problems such as pattern recognition, association rule mining, signal processing, etc.

Traditional radial-basis function neural networks can be extended in the time domain to construct a radial-basis function process neural network model [13]. The model has a three-layer feedforward structure: the first layer is the input layer composed of signal source (time-dependent process function) nodes; the middle hidden layer units are radial-basis process neurons, whose transformation function is the radial-basis kernel function, and the center of the kernel function can also be a time-dependent function; the third layer is the output layer that responds to the network input pattern.
8.6.1 Radial-Basis Process Neuron

A radial-basis process neuron is composed of spatio-temporal two-dimensional aggregation, radial-basis kernel function transformation, etc., and its structure is shown in Fig. 8.10.
Fig. 8.10 Radial-basis process neuron
In Fig. 8.10, $x_1(t),x_2(t),\ldots,x_n(t)$ are the input functions of the radial-basis process neuron in the time interval $[0,T]$; "$\int$" is a temporal accumulation operator (for example, an integral operation with respect to time); $K(\cdot)$ is the kernel function of the radial-basis process neuron. The input-output relationship of the radial-basis process neuron is

$$ \omega = K\Big(\int_0^T \|X(t)-X'(t)\|\,\mathrm{d}t\Big), \qquad (8.57) $$

or
$$ \omega = \int_0^T K\big(\|X(t)-X'(t)\|\big)\,\mathrm{d}t, \qquad (8.58) $$

where $X(t)=(x_1(t),x_2(t),\ldots,x_n(t))$ is the input of the network; $X'(t)=(x'_1(t),x'_2(t),\ldots,x'_n(t))$ is the kernel center function of the radial-basis process neuron; $\|\cdot\|$ is a norm; $\omega$ is the output of the radial-basis process neuron.
8.6.2 Network Structure

Suppose that the input layer of the radial-basis process neural network has n nodes that complete the input of the time-varying functions to the network; the middle radial-basis process neuron hidden layer has m nodes, and the transformation function of each unit is the radial-basis kernel function; the network output is the linear weighted summation of the output signals of the hidden layer nodes. The topological structure of the network is shown in Fig. 8.11. In Fig. 8.11, $w_j$ ($j=1,2,\ldots,m$) is the weight coefficient of the output layer and is an adjustable parameter of the network. Suppose that $X(t)=(x_1(t),x_2(t),\ldots,x_n(t))$ is the input function of the network and $X_j(t)$ is the kernel center function of the $j$th radial-basis process neuron, where $t\in[0,T]$ and "$\int$" denotes the integral over $[0,T]$; then the input-output relationship of the radial-basis process neural network is

$$ F(X(t)) = \sum_{j=1}^{m} w_j\, K\Big(\int_0^T \|X(t)-X_j(t)\|\,\mathrm{d}t\Big), \qquad (8.59) $$

or

$$ F(X(t)) = \sum_{j=1}^{m} w_j \int_0^T K\big(\|X(t)-X_j(t)\|\big)\,\mathrm{d}t. \qquad (8.60) $$
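Replacing the integral in Eq. (8.60) by a sum over sampled time points gives a directly computable output; the Gaussian kernel and the step size dt are our assumptions:

```python
import numpy as np

def rbf_pnn_output(X, centers, w, sigma=1.0, dt=1.0):
    """Numerical sketch of Eq. (8.60).

    X: (n, T) input functions sampled at T time points;
    centers: (m, n, T) kernel center functions X_j(t);
    w: (m,) output-layer weights.
    The time integral is approximated by a sum with step dt.
    """
    dist = np.linalg.norm(X[None, :, :] - centers, axis=1)   # (m, T): ||X(t)-X_j(t)||
    k = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))            # Gaussian kernel per (j, t)
    return float(w @ (k.sum(axis=1) * dt))                   # sum_j w_j * approx. integral
```

When the input coincides with a center function, that neuron's kernel equals 1 at every time point, so its accumulated contribution is simply the length of the process interval.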
Fig. 8.11 Radial-basis process neural network
51.703  12625  9.958  0.1989  22.654  197.3569  3.842752  1.720410  197.3226  3.847345  1.719169
51.611  12625  9.949  0.1985  22.185  197.3627  3.833413  1.722878  197.2940  3.842506  1.720708
52.496  12625  9.930  0.1993  22.084  197.2416  3.791197  1.721718  197.2623  3.798063  1.719753
51.375  12625  9.820  0.1993  22.218  197.0514  3.819826  1.716119  197.1374  3.795799  1.721422
51.324  12625  9.769  0.1980  22.523  196.8669  3.772746  1.721066  196.8737  3.775503  1.721050
51.724  12625  9.772  0.1980  22.320  197.0514  3.812689  1.719888  197.0325  3.816259  1.720573
51.423  12625  9.772  0.1985  22.202  197.4607  3.812768  1.724559  197.4881  3.811220  1.724457
51.632  12625  9.785  0.1985  22.298  197.4204  3.751421  1.730540  197.4055  3.738469  1.728948
52.378  12625  9.758  0.1980  22.058  197.3281  3.718828  1.728733  197.2353  3.734223  1.728826
52.259  12875  9.790  0.1980  22.489  196.9187  3.729149  1.727716  197.0095  3.725853  1.728824
50.893  12875  9.792  0.1990  22.554  196.8957  3.745962  1.719202  196.8948  3.745363  1.719254
52.053  12875  9.801  0.1983  22.263  196.7400  3.827659  1.711979  196.7389  3.827601  1.711786
51.216  12875  9.855  0.1979  22.188  196.8092  3.842712  1.713227  196.8095  3.842876  1.713309
51.094  12875  9.813  0.1998  22.632  196.9880  3.936727  1.713239  196.9877  3.93687   1.713229
51.440  12875  9.780  0.2006  22.595  197.0456  3.919298  1.710837  197.0461  3.919102  1.710668
51.680  12875  9.812  0.2002  22.298  196.8381  3.997875  1.712953  196.8056  4.002178  1.712016
51.508  12875  9.804  0.1998  22.557  196.9015  4.031108  1.713683  196.8500  4.037784  1.712030
51.484  12875  9.816  0.2006  22.313  196.6882  4.134916  1.715818  196.6438  4.140297  1.714054
52.274  12875  9.824  0.1998  22.224  196.5096  4.131458  1.711781  196.5479  4.137560  1.711203
52.058  12875  9.808  0.1996  22.286  196.5787  4.160254  1.704013  196.6682  4.137916  1.709078
52.754  12875  9.823  0.2001  22.344  196.5267  4.183951  1.704395  196.5138  4.183955  1.704763
51.816  12625  9.805  0.2001  22.309  196.4288  4.241944  1.698021  196.4883  4.233400  1.695516
52.584  12625  9.783  0.1998  22.428  196.4922  4.216187  1.689842  196.4455  4.227890  1.690182
51.343  12625  9.764  0.1993  22.549  196.4979  4.2592    1.680353  196.4655  4.258337  1.680526
50.966  12625  9.788  0.1988  22.597  196.4057  4.210317  1.681163  196.4732  4.209069  1.681546
51.400  12625  9.781  0.1985  22.316  196.2674  4.126871  1.680918  196.3352  4.125143  1.681713
50.993  12625  9.763  0.1990  22.264  196.2097  4.097279  1.680659  196.1657  4.094817  1.683129
52.487  12625  9.763  0.1993  22.347  196.2963  3.996106  1.684845  196.2998  3.998278  1.684304
51.600  12625  9.789  0.2006  22.391  196.5844  4.070610  1.680332  196.5606  4.066148  1.682184
51.676  12625  9.754  0.2001  22.476  196.4807  4.035893  1.683330  196.5291  4.031475  1.682652
so as to achieve the preset goal. On the other hand, we can also build a prediction model according to practical economic operation data and predict the future economic development situation. This is just the control and prediction model required for the "accurate economy" that some economists dream about. We believe that so long as economists cooperate intimately with mathematicians and IT experts, the above goals are achievable.
Application of Process Neural Networks
Fig. 9.26 The curves of 2TI1302_CV and 2TI1302_CV_REAL (unit: 10 s)