ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS
IFIP–The International Federation for Information Processing
IFIP was founded in 1960 under the auspices of UNESCO, following the First World Computer Congress held in Paris the previous year. An umbrella organization for societies working in information processing, IFIP’s aim is two-fold: to support information processing within its member countries and to encourage technology transfer to developing nations. As its mission statement clearly states, IFIP’s mission is to be the leading, truly international, apolitical organization which encourages and assists in the development, exploitation and application of information technology for the benefit of all people. IFIP is a non-profitmaking organization, run almost solely by 2500 volunteers. It operates through a number of technical committees, which organize events and publications. IFIP’s events range from an international congress to local seminars, but the most important are: The IFIP World Computer Congress, held every second year; Open conferences; Working conferences. The flagship event is the IFIP World Computer Congress, at which both invited and contributed papers are presented. Contributed papers are rigorously refereed and the rejection rate is high. As with the Congress, participation in the open conferences is open to all and papers may be invited or submitted. Again, submitted papers are stringently refereed. The working conferences are structured differently. They are usually run by a working group and attendance is small and by invitation only. Their purpose is to create an atmosphere conducive to innovation and development. Refereeing is less rigorous and papers are subjected to extensive group discussion. Publications arising from IFIP events vary. The papers presented at the IFIP World Computer Congress and at open conferences are published as conference proceedings, while the results of the working conferences are often published as collections of selected and edited papers. Any national society whose primary activity is in information may apply to become a full member of IFIP, although full membership is restricted to one society per country. Full members are entitled to vote at the annual General Assembly, National societies preferring a less committed involvement may apply for associate or corresponding membership. Associate members enjoy the same benefits as full members, but without voting rights. Corresponding members are not represented in IFIP bodies. Affiliated membership is open to non-national societies, and individual and honorary membership schemes are also offered.
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS IFIP 18th World Computer Congress TC12 First International Conference on Artificial Intelligence Applications and Innovations (AIAI-2004) 22–27 August 2004 Toulouse, France
Edited by
Max Bramer University of Portsmouth, UK
Vladan Devedzic University of Belgrade, Serbia and Montenegro
KLUWER ACADEMIC PUBLISHERS NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: Print ISBN:
1-4020-8151-0 1-4020-8150-2
©2004 Springer Science + Business Media, Inc. Print ©2004 by International Federation for Information Processing. Boston All rights reserved No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher Created in the United States of America
Visit Springer's eBookstore at: and the Springer Global Website Online at:
http://www.ebooks.kluweronline.com http://www.springeronline.com
Contents
Foreword Acknowledgments
xi
xiii
Paper Sessions Applications 1 Artificial Intelligence Systems in Micromechanics
1
FELIPE LARA-ROSANO, ERNST KUSSUL, TATIANA BAIDYK, LEOPOLDO RUIZ, ALBERTO CABALLERO AND GRACIELA VELASCO
Integrating Two Artificial Intelligence Theories in a Medical Diagnosis Application HADRIAN PETER AND WAYNE GOODRIDGE
11
Artificial Intelligence and Law HUGO C. HOESCHL AND VÂNIA BARCELLOS
25
Virtual Market Environment for Trade PAUL BOGG AND PETER DALMARIS
35
vi
AI Applications and Innovations
Neural Networks and Fuzzy Systems An Artificial Neural Networks Approach to the Estimation of Physical Stellar Parameters A.RODRIGUEZ, I.CARRICAJO, C.DAFONTE,B.ARCAY AND M.MANTEIGA Evolutionary Robot Behaviors Based on Natural Selection and Neural Network JINGAN YANG, YANBIN ZHUANG AND HONGYAN WANG
45
55
Control of Overhead Crane by Fuzzy-PID with Genetic Optimisation A. SOUKKOU, A. KHELLAF AND S. LEULMI
67
Creative Design of Fuzzy Logic Controller LOTFI HAMROUNI AND ADEL M. ALIMI
81
On-Line Extraction of Fuzzy Rules in a Wastewater Treatment Plant J. VICTOR RAMOS, C. GONÇALVES AND A. DOURADO
87
Agents An Autonomous Intelligent Agent Architecture Based on Constructivist AI FILIPO STUDZINSKI PEROTTO, ROSA VICARI AND LUÍS OTÁVIO ALVARES Finding Manufacturing Expertise Using Ontologies and Cooperative Agents OLGA NABUCO, MAURO KOYAMA, FRANCISCO PEREIRA AND KHALIL DRIRA Using Agents in the Exchange of Product Data UDO KANNENGIESSER AND JOHN S. GERO
103
117
129
Applications 2 A Pervasive Identification and Adaptation System for the Smart House PAULO F. F. ROSA, SANDRO S. LIMA, WAGNER T. BOTELHO, ALEXANDRE F. NASCIMENTO AND MAX SILVA ALALUNA
141
vii
Deductive Diagnosis of Digital Circuits J. J. ALFERES, F. AZEVEDO, P. BARAHONA, C. V. DAMÁSIO AND T. SWIFT
155
Verification of Nasa Emergent Systems CHRISTOPHER ROUFF, AMY VANDERBILT, WALT TRUSZKOWSKI, JAMES RASH AND MIKE HINCHEY
167
Theory Knowledge Base Structure: Understanding Maintenance JOHN DEBENHAM
177
Learning Bayesian Metanetworks from Data with Multilevel Uncertainty VAGAN TERZIYAN AND OLEKSANDRA VITKO
187
Using Organisational Structures Emergence for Maintaining Functional Integrity In Embedded Systems Networks JEAN-PAUL JAMONT AND MICHEL OCCELLO
197
Efficient Attribute Reduction Algorithm ZHONGZHI SHI, SHAOHUI LIU AND ZHENG ZHENG
211
Using Relative Logic for Pattern Recognition JULIUSZ L. KULIKOWSKI
223
Intelligent Tutoring and Collaboration MathTutor: A Multi-Agent Intelligent Tutoring System JANETTE CARDOSO, GUILHERME BITTENCOURT, LUCIANA FRIGO, ELIANE POZZEBON AND ADRIANA POSTAL Analysis and Intelligent Support of Learning Communities in Semi-structured Discussion Environments ANDREAS HARRER An Adaptive Assessment System to Evaluate Student Ability Level ANTONELLA CARBONARO, GIORGIO CASADEI AND SIMONE RICCUCCI
231
243
257
viii
AI Applications and Innovations
Forming the Optimal Team of Experts for Collaborative Work ACHIM KARDUCK AND AMADOU SIENOU
267
Internet Impact on Performance of Hypertext Classification of Selective Rich HTML Capture HOUDA BENBRAHIM AND MAX BRAMER Introducing a Star Topology into Latent Class Models for Collaborative Filtering
279
293
AND
Dialoguing with an Online Assistant in a Financial Domain: The VIP-Advisor Approach JOSEFA Z. HERNANDEZ, ANA GARCIA-SERRANO AND JAVIER CALLE An Agency for Semantic-Based Automatic Discovery of Web Services SIMONA COLUCCI, TOMMASO DI NOIA, EUGENIO DI SCIASCIO, FRANCESCO M.DONINI, MARINA MONGIELLO, GIACOMO PISCITELLI AND GIANVITO ROSSI
305
315
Genetic Algorithms GESOS: A Multi-Objective Genetic Tool for Project Management Considering Technical and Non-Technical Constraints CLAUDE BARON, SAMUEL ROCHET AND DANIEL ESTEVE Using Genetic Algorithms and Tabu Search Parallel Models to Solve the Scheduling Problem PEDRO PINACHO, MAURICIO SOLAR, MARIO INOSTROZA AND ROSA MUÑOZ Modelling Document Categories by Evolutionary Learning of Text Centroids J.I. SERRANO AND M.D. DEL CASTILLO
329
343
359
ix
Ontologies and Data Mining ODEVAL: A Tool for Evaluating RDF(S), DAML+OIL and OWL Concept Taxonomies ÓSCAR CORCHO, ASUNCIÓN GÓMEZ-PÉREZ, RAFAEL GONZÁLEZ-CABERO AND M. CARMEN SUÁREZ-FIGUEROA
369
AIR - A Platform for Intelligent Systems DRAGAN DJURIC, DRAGAN GASEVIC AND VIOLETA DAMJANOVIC
383
SwissAnalyst O. POVEL AND C. GIRAUD-CARRIER
393
Data Mining by MOUCLAS Algorithm for Petroleum Reservoir Characterization from Well Logging Data YALEI HAO, MARKUS STUMPTNER, GERALD QUIRCHMAYR AND QING HE
407
Reasoning and Scheduling Verification of Procedural Reasoning Systems (PRS) Programs Using Coloured Petri Nets (CPN) RICARDO WAGNER DE ARAÚJO AND ADELARDO ADELINO DE MEDEIROS Online Possibilistic Diagnosis Based on Expert Knowledge for Engine Dyno Test Benches O. DE MOUZON, X. GUÉRANDEL, D. DUBOIS, H.PRADE AND S. BOVERIE CBR and Micro-Architecture Anti-Patterns Based Software Design Improvement TIE FENG, JIACHEN ZHANG, HONGYUAN WANG AND XIAN WANG A Decision Support System (DSS) for the Railway Scheduling Problem L. INGOLOTTI, P. TORMOS, A. LOVA, F. BARBER, M.A. SALIDO AND M.ABRIL An Interactive Multicriteria Optimisation Approach to Scheduling MARTIN JOSEF GEIGER AND SANJA PETROVIC
421
435
449
465
475
This page intentionally left blank
Foreword
The papers in this volume comprise the refereed proceedings of the First International Conference on Artificial Intelligence Applications and Innovations (AIAI-2004), which formed part of the 18th World Computer Congress of IFIP, the International Federation for Information Processing (WCC-2004), in Toulouse, France in August 2004. The conference is organised by the IFIP Technical Committee on Artificial Intelligence (Technical Committee 12) and its Working Group 12.5 (Artificial Intelligence Applications). Further information about both can be found on the website at http://www.ifiptc12.org. A very promising sign of the growing importance of Artificial Intelligence techniques in practical applications is the large number of submissions received this time - more than twice the number for the Artificial Intelligence stream of the last World Computer Congress two years ago. All papers were reviewed by at least three members of the Programme Committee. The best 40 were selected for the conference and are included in this volume. The international nature of IFIP is amply reflected in the large number of countries represented here. The conference also featured an invited talk by Eunika Mercier-Laurent and a Symposium on Professional Practice in Artificial Intelligence, which ran alongside the refereed papers. I should like to thank the joint conference chairs, Professor John Debenham and Dr. Eunika Mercier-Laurent and my co-program chair Dr. Vladan
xii
AI Applications and Innovations
Devedzic for all their efforts in organising the conference and the members of our programme committee for reviewing an unexpectedly large number of papers to a tight deadline. I should also like to thank my wife Dawn for her help in editing this volume of proceedings. This is the first in a new series of conferences dedicated to real-world applications of AI around the world. The wide range and importance of these applications is clearly indicated by the papers in this volume. Both are likely to increase still further as time goes by and we intend to reflect these developments in our future conferences. Max Bramer Chair, IFIP Technical Committee on Artificial Intelligence
Acknowledgments
Conference Organising Committee Conference General Chairs John Debenham (University of Technology, Sydney, Australia) Eunika Mercier-Laurent (Association Francaise pour l’Intelligence Artificielle, France) Conference Program Chairs Max Bramer (University of Portsmouth, United Kingdom) Vladan Devedzic (University of Belgrade, Serbia and Montenegro)
Programme Committee Agnar Aamodt (Norway) Luigia Carlucci Aiello (Italy) Adel Alimi (Tunisia) Lora Aroyo (The Netherlands) Max Bramer (United Kingdom) Zdzislaw Bubnicki (Poland) Weiqin Chen (Norway) Monica Crubezy (USA) John Debenham (Australia)
xiv
Yves Demazeau (France) Vladan Devedzic (Yugoslavia) Rose Dieng (France) Henrik Eriksson (Sweden) Ana Garcia-Serrano (Spain) Nicola Guarino (Italy) Andreas Harrer (Germany) Jean-Paul Haton (France) Timo Honkela (Finland) Kostas Karpouzis (Greece) Dusko Katic (Serbia and Montenegro) Ray Kemp (New Zealand) Kinshuk (New Zealand) Piet Kommers (The Netherlands) Jasna Kuljis (United Kingdom) Ilias Maglogiannis (Greece) Eunika Mercier-Laurent (France) Antonija Mitrovic (New Zealand) Riichiro Mizoguchi (Japan) Enrico Motta (United Kingdom) Wolfgang Nejdl (Germany) Erich Neuhold (Germany) Bernd Neumann (Germany) Natasha Noy (USA) Zeljko Obrenovic (Serbia and Montenegro) Mihaela Oprea (Romania) Petra Perner (Germany) Alun Preece (United Kingdom) Abdel-Badeeh Salem (Egypt) Demetrios Sampson (Greece) Pierre-Yves Schobbens (Belgium) Yuval Shahar (Israel) Stuart Shapiro (USA) Derek Sleeman (United Kingdom) Constantine Spyropoulos (Greece) Steffen Staab (Germany) Mia Stern (USA) Gerd Stumme (Germany) Valentina Tamma (United Kingdom) Vagan Terziyan (Finland)
ARTIFICIAL INTELLIGENCE SYSTEMS IN MICROMECHANICS Felipe Lara-Rosano, Ernst Kussul, Tatiana Baidyk, Leopoldo Ruiz, Alberto Caballero, Graciela Velasco CCADET, UNAM
Abstract:
Some of the artificial intelligence (AI) methods could be used to improve the automation system performance in manufacturing processes. However, the implementation of these AI methods in the industry is rather slow, because of the high cost of the experiments with the conventional manufacturing and AI systems. To lower the experiment cost in this field, we have developed a special micromechanical equipment, similar to conventional mechanical equipment, but of much smaller size and therefore of lower cost. This equipment could be used for evaluation of different AI methods in an easy and inexpensive way. The proved methods could be transferred to the industry through appropriate scaling. In this paper we describe the prototypes of low cost microequipment for manufacturing processes and some AI method implementations to increase its precision, like computer vision systems based on neural networks for microdevice assembly, and genetic algorithms for microequipment characterization and microequipment precision increase.
Key words:
artificial intelligence, micromechanics, computer vision, genetic algorithms
1.
INTRODUCTION
The development of AI technologies opens an opportunity to use them not only for conventional applications (expert systems, intelligent data bases [1], technical diagnostics [2,3] etc.), but also for total automation of mechanical manufacturing. Such AI methods as adaptive critic design [4,5], adaptive fuzzy Petri networks [6,7], neural network based computer vision systems [8-12], etc. could be used to solve the automation problems. To check this opportunity up, it is necessary to create an experimental factory
2
F.Lara-Rosano, E.Kussul, T.Baidyk, L.Ruiz, A.Caballero, G.Velasco
with fully automated manufacturing processes. This is a very difficult and expensive task.
2.
MICROEQUIPMENT TECHNOLOGY
To make a very small mechanical microequipment, a new technology was proposed [13,14]. This technology is based on micromachine tools and microassembly devices, which can be produced as sequential generations of microequipment. Each generation should include equipment (machine-tools, manipulators, assembly devices, measuring instruments, etc.) sufficient for manufacturing an identical equipment set of smaller size. Each subsequent equipment generation could be produced by the preceding one. The equipment size of each subsequent generation is smaller than the overall size of preceding generation. The first-generation microequipment can be produced by conventional large-scale equipment. Using microequipment of this first generation, a second microequipment generation having smaller overall sizes can be produced. We call this approach to mechanical microdevices manufacturing MicroEquipment Technology (MET) [15]. The proposed MET technology has many advantages: (1) The equipment miniaturization leads to decreasing the occupied space as well as energy consumption, and, therefore, the cost of the products. (2) The labor costs are bound to decrease due to the reduction of maintenance costs and a higher level of automation expected in MET. (3) Miniaturization of equipment by MET results in a decrease of its cost. This is a consequence of the fact that microequipment itself becomes the object of MET. The realization of universal microequipment that is capable of extended reproduction of itself will allow the manufacture of low-cost microequipment in a few reproductive acts because of the lower consumption of materials, energy, labor, and space in MET. Thus the miniaturization of equipment opens the way to a drastic decrease in the unit cost of individual processing. At a lower unit cost of individual micromachining, the most natural way to achieve high throughput is to parallelize the processes of individual machining by concurrent use of a great quantity of microequipment of the same kind. Exploitation of that great number of microsized machine-tools is only feasible with their automatic operation and a highly automated control of the microfactory as a whole. We expect that many useful and proved concepts, ideas and techniques of automation can be borrowed from mechanical engineering. They vary from the principles of factory automation
Artificial Intelligence Systems In Micromechanics
3
(FMS and CAM) to the ideas of unified containers and clamping devices and techniques of numerical control. However automation of micromanufacturing has peculiarities that will require the special methods of artificial intelligence.
3.
AI BASED CONTROL SYSTEM FOR MICROMECHANICAL FACTORY
Let us consider a general hierarchical structure of the automatic control system for a micromechanical factory. The lowest (first) level of the system controls the micromechanical equipment (the micro machine-tools and assembly manipulators), provides the simplest microequipment diagnostics and the final measurement and testing of production. The second level of the control system controls the devices that transport workpieces, tools, parts, and the whole equipment items; coordinates the operation of the lowest level devices; provides the intermediate quality inspection of production and the more advanced diagnostics of equipment condition. The third control level contains the system for the automatic choice of process modes and routes for parts machining. The top (fourth) level of the control system performs detecting of non-standard and alarm situations and decision making, including communication with the operator. We proceed from the assumption that no more than one operator will manage the microfactory. It means that almost all the problems arising at any control level during the production process should be solved automatically and that operator must solve only a few problems, that are too complex or unusual to be solved automatically. Since any production process is affected by various disturbances, the control system should be an adaptive one. Moreover, it should be selflearning, because it is impossible to foresee all kinds of disturbances in advance. AI that is able to construct the self-learning algorithms and to minimize the participation of operator, seem to be especially useful for this task. AI includes different methods for creating autonomous control systems. The neural classifiers will be particularly useful at the lowest level of the control system. They could be used for the selection of treatment modes, checking of cutting tool conditions, control of the assembly processes, etc. They allow to make the control system more flexible. The system will automatically compensate for small deviations of production conditions, such as the change of cutting tool shape or external environment parameters, variations in the structure of workpiece materials, etc. AI will permit to design self-learning classifiers and should provide the opportunity to exclude the participation of human operator at this level of control.
4
F.Lara-Rosano, E.Kussul, T.Baidyk, L.Ruiz, A.Caballero, G.Velasco
At the second control level, the AI system should detect all deviations from the normal production process and make decisions about how to modify the process to compensate for the deviation. The compensation should be made by tuning the parameters of the lower level control systems. The examples of such deviations are the deviations from the production schedule, failures in some devices, off-standard production, etc. At this level the AI system should contain the structures in which the interrelations of production process constituents are represented. As in the previous case, it is desirable to have the algorithms working without the supervisor. The third control level is connected basically with the change of nomenclature or volume of the production manufactured by the factory. It is convenient to develop such a system so that the set-up costs for a new production or the costs to change the production volume should be minimal. The self-learning AI structures formed at the lowest level could provide the basis for such changes of set-up by selection of the process parameters, the choice of equipment configuration for machining and assembly, etc. At the third control level the AI structures should detect the similarity of new products with the products which were manufactured in the past. On the basis of this similarity, the proposals about the manufacturing schedule, process modes, routing, etc. will be automatically formed. Then they will be checked up by the usual computational methods of computer aided manufacturing (CAM). The results of the check, as well as the subsequent information about the efficiency of decisions made at this level, may be used for improving the AI system. The most complicated AI structures should be applied at the top control level. This AI system level must have the ability to reveal the recent unusual features in the production process, to make the evaluation of possible influence of these new features on the production process, and to make decisions for changing the control system parameters at the various hierarchical levels or for calling for the operator’s help. At this level, the control system should contain the intelligence knowledge base, which can be created using the results of the operation of the lower level control systems and the expert knowledge. At the beginning, the expert knowledge of macromechanics may be used. At present many methods of AI are successfully used in the industry [16,17]. They could be used also for micromechanics. But the problems of fully automated microfactory creation can not be investigated experimentally in conventional industry because of the high cost of the experiments. Here we propose to develop low cost micromechanical test bed to solve these problems. The prototypes of the first generation microequipment are designed and examined in the Laboratory of Micromechanics and Mechatronics,
Artificial Intelligence Systems In Micromechanics
5
CCADET, UNAM. The prototypes use adaptive algorithms of the lowest level. At present more sophisticated algorithms based on neural networks and genetic algorithms are being developed. Below we describe our experiments in the area of such algorithms development and applications.
4.
DEVELOPMENT OF MICROEQUIPMENT PROTOTYPES AND ADAPTIVE ALGORITHMS
4.1
Micromachine Tools
The developed prototype of the first generation micromachine tool is shown in Fig. 1. We have been exploiting this prototype for approximately four years for experimental work and student training.
Figure 1. The developed second prototype of the first generation of micromachine tool.
This prototype of the micromachine tool has the size and is controlled by a PC. The axes X and Z have 20 mm of displacement and the Y -axis has 35 mm of displacement; all have the same configuration. The resolution is per motor step.
4.2
Micromanipulators
At present, in the Laboratory of Micromechanics and Mechatronics, CCADET, UNAM the principles, designs and methods of manufacture of micromachine tools and micromanipulators corresponding to the first
6
F.Lara-Rosano, E.Kussul, T.Baidyk, L.Ruiz, A.Caballero, G.Velasco
microequipment generation are developed. All these works are accompanied with of the prototypes development (Fig.2).
Figure 2. Sequential micromanipulator prototype
4.3
Computer vision system
To obtain a low cost microequipment it is necessary to use low cost components. Low cost components do not permit us to obtain high absolute accuracy of the assembly devices. To avoid this drawback we have developed an adaptive algorithm for microassembly using a technical vision system (Fig. 3).
Figure 3. The prototype of visual controlled assembly system
The main idea of this approach is to replace the stereovision system, which demands two video cameras, for the system with one TV camera for teleconferences, with a cost of 40 dollars, and four light sources. The shadows from the light sources permit us to obtain the 3-D position of the
Artificial Intelligence Systems In Micromechanics
7
needle with the microring relative to the hole. The microring is to be inserted into the hole. We use a neural classifier to recognize the relative position. The problem of automatic microdevices assembly is very important in mechatronics and micromechanics. To obtain the high precision, it is necessary to use adaptive algorithms on the base of technical vision systems. We proposed an approach, that permits us to develop the adaptive algorithms based on neural networks. We consider the conventional pin-hole task. It is necessary to insert the pin into the hole using a low cost technical vision system. For this purpose it is necessary to know the displacements (dx, dy, dz) of the pin tip relative to the hole. It is possible to evaluate these displacements with a stereovision system, which resolves 3D problems. The stereovision system demands two TV cameras. To simplify the control system we propose the transformation of 3D into 2D images preserving all the information about mutual location of the pin and the hole. This approach makes it possible to use only one TV camera. Four light sources are used to obtain pin shadows. Mutual location of these shadows and the hole contains all the information about the displacements of the pin relative to the hole. The displacements in the horizontal plane (dx, dy) could be obtained directly by displacements of shadows center points relative to the hole center. Vertical displacement of the pin may be obtained from the distance between the shadows. To calculate the displacements it is necessary to have all the shadows in one image. We capture four images corresponding to each light source sequentially, and then we extract contours and superpose four contour images. We use the resulting image to recognize the position of the pin relative to the hole. We developed the neural network system which permits us to recognize the pin-hole displacements with errors less than 1 pixel [11,12].
4.4
Adaptive Algorithm of the Lowest Level
To compensate for the machine tool errors we have developed a special algorithm for the workpiece diameter measurement using the electrical contact of the workpiece with the measurement disk (Fig. 4). This measurement allows us to develop the algorithm for brass needle cutting. We obtained a brass needle with a diameter of and a length of (Fig. 5) almost equal to the Japanese needle [18].
8
F.Lara-Rosano, E.Kussul, T.Baidyk, L.Ruiz, A.Caballero, G.Velasco
Figure 4. The workpiece with measurement disk
Figure 5. The brass needle with
4.5
diameter
Genetic Algorithm for Micromachine Tool Characterization
To improve the micromachine tool precision it is necessary to correct its errors. To obtain the information about the micromachine tools errors, we use a two balls scheme for machine tool parameters measurement. One ball is fixed on the special tool support, which is inserted to the chuck. The second ball is fixed on the machine tool carriage (Fig. 6). By moving the carriage with the second ball up to the contact with the first ball in different positions it is possible to obtain all the needed information about the geometrical properties of the machine tool. But the geometrical parameters depend on the contact positions in a very complicated manner. To resolve the system of nonlinear equations which represent the mentioned dependence we use a genetic algorithm. This approach permits us to reduce to one third the micromachine tools errors.
Artificial Intelligence Systems In Micromechanics
5.
9
CONCLUSIONS
AI algorithms could be used to increase the level of manufacturing processes automatization. The experiments with AI algorithms in real industry factories are too expensive. In this article a low cost test bed for AI method examinations is proposed. This test bed is composed of the micromechanical models of conventional industry devices. The prototypes of micromachine tools and micromanipulators were developed and examined with some AI algorithms. The test bed examination results show that AI systems could be proved with low expenses.
Figure 6. Ball location in the micromachine tool
ACKNOWLEDGEMENTS This work was supported by CCADET, UNAM, projects CONACYT 33944-U, PAPIIT 1112102, NSF-CONACYT 39395-A.
REFERENCES 1. Eberhart, R.: Overview of computational intelligence [and biomedical engineering applications]. Proceedings od the 20-th Annual International Conference of the IEEE Engineering in Medicine and Biology Society 3 (1998) 1125-1129
10
F.Lara-Rosano, E.Kussul, T.Baidyk, L.Ruiz, A.Caballero, G.Velasco
2. Hui, T., Brown, D., Haynes, B., Xinxian Wang: Embedded e-diagnostic for distributed industrial machinery. IEEE International Symposium on Computational Intelligence for Measurement Systems and Applications (2003) 156-161 3. Awadallah, M., Morcos, M.: Application of AI tools in fault diagnosis of electrical machines and drives-an overview. IEEE Transactions on Energy Conversion 18, Issue 2 (2003) 245-251 4. Werbos, P.: Advanced Forecasting Methods for Global Crisis Warning and Models of Intelligence. In: General Systems Yearbook 22 (1977) 25-38 5. Prokhorov, D., Wunsch, D.: Adaptive Critic Designs. IEEE Trans. on Neural Networks 8, N 5 (1997) 997-1007 6. Xiaoou Li, Lara-Rosano, F.: A weighted fuzzy Petri net model for knowledge learning and reasoning. International Joint Conference on Neural Networks, IJCNN ’99 4 2368 –2372 7. Xiaoou Li, Wen Yu, Lara-Rosano, F.: Dynamic Knowledge Inference and Learning under Adaptive Fuzzy Petri Net Framework. IEEE Transactions on Systems, Man, and Cybernetics 39, N4 (2000) 442-450 8. Bottou, L., Cortes, C., Denker, J., Drucker, H., Guyon L., Jackel L., LeCun J., Muller U., Sackinger E., Simard P., Vapnik V.: Comparison of Classifier Methods: a Case Study in Handwritten Digit Recognition. In: Proceedings of IAPR International Conference on Pattern Recognition 2 (1994) 77-82 9. Fukushima, K. Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Networks 1 (1988) 119-130 10. Roska, T., Rodriguez-Vazquez, A.: Toward visual microprocessors. Proceedings of the IEEE 90, Issue 7 (July 2002) 1244-1257 11. Baidyk, T.: Application of Flat Image Recognition Technique for Automation of Micro Device Production. Proceedings of the International Conference on Advanced Intelligent Mechatronics “AIM’01”, Italy (2001) 488-494 12. Baidyk, T., Kussul, E., Makeyev, O., Caballero, A., Ruiz, L., Carrera, G., Velasco, G.: Flat image recognition in the process of microdevice assembly. Pattern Recognition Letters 25/1 (2004) 107-118 13. Kussul, E.: Micromechanics and the perspectives of neurocomputing. In: Neuron-like Networks and Neurocomputers, Kiev, Ukraine (1993) 76-82 (in Russian) 14. Kussul, E., Rachkovskij, D., Baidyk, T., Talayev, S.: Micromechanical Engineering: a Basis for the Low-cost Manufacturing of Mechanical Microdevices Using Microequipment. Journal of Micromechanics and Microengineering 6 (1996) 410-425 15. Kussul, E., Baidyk, T., Ruiz-Huerta, L., Caballero, A., Velasco, G., Kasatkina, L.: Development of Micromachine Tool Prototypes for Microfactories. Journal of Micromechanics and Microengineering 12 (2002) 795-812 16. Wenxin Liu, Venayagamoorthy, G., Wunsch, D.: A heuristic dynamic programming based power system stabilizer for a turbogenerator in a single machine power system. IAS Annual Meeting on Industry Applications 1 (2003) 270-276 17. Croce, F., Delfino, B., et al.: Operations and management of the electric system for industrial plants: an expert system prototype for load-shedding operator assistance. IEEE Transactions on Industry Applications 37, Issue 3 (2001) 701-708 18. Okazaki, Yu., Kitahara T.: Micro-Lathe Equipped with Closed-Loop Numerical Control. Proceedings of the 2-nd International Workshop on Microfactories, Switzerland (2000) 8790
INTEGRATING TWO ARTIFICIAL INTELLIGENCE THEORIES IN A MEDICAL DIAGNOSIS APPLICATION Hadrian Peter1, Wayne Goodridge2 1
University of the West Indies, Barbados; 2Dalhousie University, Canada
Abstract:
Reasoning Systems (Inference Mechanisms) and Neural Networks are two major areas of Artificial Intelligence (AI). The use of case-based reasoning in Artificial Intelligence systems is well known. Similarly, the AI literature is replete with papers on neural networks. However, there is relatively little research in which the theories of case-based reasoning and neural networks are combined. In this paper we integrate the two theories and show how the resulting model is used in a medical diagnosis application. An implementation of our model provides a valuable prototype for medical experts and medical students alike.
Key words:
Medical diagnosis, neural networks, case-based reasoning, reasoning system.
1.
INTRODUCTION
Research in Artificial Intelligence (AI) in medicine has relied on progress in research in knowledge bases and reasoning systems (inference mechanisms). Over the years many medical diagnosis systems – MYCIN, Iliad, DXplain, CADIAG-II, INTERNIST, QMR, and MDDB, to name a few – of which MYCIN [1, 2] is arguably the most popular, have been developed. The predominant form of knowledge in MYCIN is represented as a set of rules, and the reasoning system used is backward chaining. Iliad and DXplain [3] both use Bayesian reasoning to calculate probabilities of various diagnoses. CADIAG-II [4] – Computer-Assisted DIAGnosis – is a computer-assisted consultation system to support the differential diagnostic process in internal medicine. CADIAG-II uses fuzzy-based reasoning, however, the underlying knowledge base used is not explicitly described in
Hadrian Peter, Wayne Goodridge
12
the literature. The knowledge base of INTERNIST, and the strategy used by INTERNIST to address the diagnosis of patients, are described in [5, 6,7, 8]. QMR (Quick Medical Reference) [9, 10, 11], a reengineering of INTERNIST, is an in-depth information resource that helps physicians to diagnose adult disease. However, again, the underlying reasoning and knowledge systems employed in this diagnosis system are not readily available in the literature. Although MDDB [12] uses case-based reasoning, it uses simple lists as its knowledge base. A disadvantage of most of these methods is that, although they exhibit the capability of making differential diagnoses1, they do not offer definitive medical consultation2. A few earlier attempts at combining the theories of neural networks and case-based reasoning are found in [13,14,15]. In this paper we attempt to correct the shortcomings of the above methods by presenting a new approach to medical diagnosis in which we combine a knowledge base, whose underlying structure is the neural network [16, 17,18], with a Case-Based Reasoning system [19,20,21, 22]. We begin by reviewing case-based reasoning (CBR), we identify problems with such reasoning when used in the medical domain, and provide the motivation for our approach. We then examine neural networks, in particular the mathematical underpinnings of heteroassociative memory neural networks [23], and how they are incorporated in our model. The architecture of our model – the Case-based Memory Network (CBMN) – is introduced in the next section. We then present the medical diagnosis process in our model, followed by the operational model, a short simulation, and a consultation session. The paper ends with a brief evaluation of the model.
2.
MATERIALS AND METHODS
2.1
Case-Based Reasoning
Case-based Reasoning (CBR) [19] is an inference mechanism that has found increasing use in expert systems. It consists of the following four stages: retrieve the most similar case or cases; reuse the retrieved case or cases to solve the problem by analogical reasoning; revise the proposed
1
A differential diagnosis results when 2 or more diagnoses are possible. These diagnoses are prioritized. 2 Definitive medical consultation between the user and the system leads to a concrete diagnosis.
Integrating Two AI Theories In A Medical Diagnosis Application
13
solution; retain the parts of this experience which are likely to be useful for future problem solving. When CBR is applied to medical diagnosis systems, the following problems are usually identified: There is a concentration on reference rather than on diagnosis. There is a lack of intelligent dialog. This may result in “missing information” and therefore a decrease of the accuracy of the diagnosis. Inability of most similarity algorithms to handle attributes whose values are unknown. If the case base contains cases with attributes that take on multiple (rather than just binary) values, then the case base will be quite complex – requiring large numbers of predicates, relations, constraints, and operators [24]. Updating (revision) of the case base requires complex algorithms and/or highly skilled users. To overcome these problems, therefore, we developed a variation to the CBR technique called the Case-Based Memory Network (CBMN) model [25]. It was primarily developed to solve medical diagnostic problems and not “pure” classification problems [22]. To simulate the CBMN model we have also designed and implemented an expert system prototype called CaseB-Pro - an interactive system that accepts observed findings, generates appropriate questions, and makes conclusions based on the observed findings.
2.2
The role of Neural Networks
The attraction of neural networks in our model is that they have the ability to tolerate noisy inputs and to learn – features which are very desirable in a medical diagnosis system. The CBMN uses a special type of neural network called a heteroassociative neural network [23]. This neural network provides a mechanism for learning, recording what has been learnt, and identifying stored knowledge. The network stores disease patterns associated with cases, and also recalls cases from memory based on the similarity of those cases to the symptoms of the current case. This technique is different from the similarity measure and retrieval techniques such as kdtrees and Case Retrieval Nets (CRNs) [22] employed in CBR. Related classical works in the field of associative memories are found in [26, 27].
Hadrian Peter, Wayne Goodridge
14
Let the findings associated with a case be represented by a vector s(p), where p = 1,2, ...,P. Each vector s(p) is an n-tuple. Let the case associated with findings, s(p), be represented by a vector t(p). Each t(p) is an m-tuple. In our model we store (findings, case) pairs – that is, p= 1,....,P. Here, a “case” is an actual patient, and a “finding” is a symptom, sign, or an investigation. “P” is the maximum number of cases in the database, where and
We also define a weight matrix
where
The heteroassociative neural network can be described as a discrete network where the input and output nodes take values from the set {-1, 0,1}. We interpret the values as follows: -1 represents the findings that are absent, 0 represents the unknown findings, and 1 represents the findings that are present. Now (observed findings) can be represented as an n-tuple input vector, say k. Vector k will then be mapped to the domain by the matrix – findings will be mapped onto cases. That is,
or
where Example If the output layer of the network contains 3 nodes, then some of the following mappings are possible: map 1 map 2 map 3
Whenever new findings are presented to the current case, k is changed and, when multiplied with the weight matrix the vector t(p) is determined. This value of t(p) is then used to determine a set of actual cases from the case base that matches the observed findings.
3
The term training set is sometimes used to describe the vector t(p).
Integrating Two AI Theories In A Medical Diagnosis Application
15
If a node in vector t has a positive value, then this node represents a case in which the disorder associated with that case matches the current observed findings. For example, in map 1 the disorder associated with case 1 is a possible candidate. If a mapped vector t contains nodes with varying positive values, then the node with the largest positive value is most likely to be the case that has the most likely associated disorder for the observed findings. For example, if t = (3,1,-1) then the disorders associated with cases 1 and 2 are likely. However, the disorder associated with case 1 is the more likely candidate. A disorder, say k, is a part of a definitive diagnosis only if the available findings that will lead to a diagnosis of k exceed the findings that are known. This serves as the point at which we stop posing questions to the system. Let unknown input nodes not included and only unknown nodes included If then k can be a part of the definitive diagnosis.
2.3
Architecture of the CBMN
In its simplest form the CBMN structure consists of input information entity (IE) nodes and output (case) nodes. The design goal of the CBMN model is to ensure that a knowledge base, and learning and reasoning mechanisms can be incorporated in the same data structure and be used for diagnostic problem solving. In diagnosing a patient the physician utilizes information from past typical or known exceptional cases that are usually described by a list of symptoms. To design a medical case base used for diagnostic purposes it is necessary to have two types of cases in the case base [28]: 1. Case Generalizations called prototypes (pure cases) - these are the “classical” (textbook) cases as viewed by the medical expert. 2. General domain cases – these are actual cases. The features of disorders – in the input layer of the network - are mapped onto case prototypes (in the hidden layer) which represent the “text book” view of disorders in terms of its identifying features. A case – representing the output layer of the network - is an instance of a prototype, in the same way that an object is an instance of a class in the object oriented programming paradigm [29, 30]. The arrows in the diagram denote weighted links from the input nodes to the case nodes, and the calculation, and adjustment, of these weights is known as training the network. Cases are actual examples of patients and, in the CBMN model, cannot exist without prototypes, which are the physician’s representation of
16
Hadrian Peter, Wayne Goodridge
disorders. That is, every case in the case-base must be associated with one and only one known classical medical case (prototype).
Figure 1. The CBMN Architecture with Prototypes
2.4
The Medical Diagnostic Process in the CBMN
A medical consultation consists of the following stages: Recording of symptoms and patient history. Elicitation / Identification of signs. Formulation of notion of diagnosis [31] leading to a hypothesis and differential diagnosis. Investigations to narrow down or confirm the diagnosis. Medical diagnosis depends heavily on known facts about the case in question. The facts are used to form a notion of diagnosis [31], which results in a hypothesis. This hypothesis is strengthened or weakened by discovering more facts about the current case, which in turn invokes a different notion of diagnosis. This process is continued until a definitive diagnosis is found. So,
Integrating Two AI Theories In A Medical Diagnosis Application
17
again, making a definitive diagnosis is one of the essential differences between CBMN and many extant medical diagnosis systems. The new approach to the CBMN model includes a certainty factor4 [18] and a prevalence factor for each prototype in the case base. A certainty factor is a number ranging from 0 to 10 that represents the physician’s impression of the significance of the presence of a feature with respect to the prototype. A prevalence factor is a number ranging from 0 to 10 that expresses the physician’s impression that a patient will have the disorder associated with a given prototype. The certainty factor and prevalence factor may or may not be a scientific measurement since it represents only the physician's notion of the disorder. The presence or absence of features will affect the physician’s belief or disbelief in his hypothesis. Hence the concept of the belief factor is used in the CBMN to “balance” a physician’s belief and disbelief in a hypothesis. We now present the algorithm to find the next best question to be asked at a given stage of the medical consultation process. The system cannot reach any definitive conclusions until it has exhausted each stage.
1. Find the set, say S, of prototypes that match the observed findings. 2. Find the prototype, say k, with the highest belief factor in the set S. 3. Use the unknown feature, say f, of prototype k with the highest certainty factor to generate the next question. 4. If the certainty factor of f is greater than a question threshold value set by the experimenter), then the system moves to the next consultation stage until the investigation stage is reached. 5. When a diagnostic stage is finished the system lists all the prototypes with a confidence measure greater that a threshold value as candidates for the diagnosis of the presented features. 6. Repeat steps 1-5 above until the investigation stage is completed. The main objective of this algorithm is to find the optimum diagnostic path that will: (a) Get the correct diagnosis by asking the minimum number of questions and (b) exhaust each diagnostic stage before moving on to the next.
2.4.1
Simulation
The simulation of the diagnostic process is implemented in Delphi using a prototype called CaseB-Pro. Its functional specifications can be divided into
4
This model is an example of a truth-functional system for uncertain reasoning.
18
Hadrian Peter, Wayne Goodridge
the following two steps: training the system to identify new cases and using the case base to gain a computer-generated diagnosis. The two main design goals of the CaseB-Pro prototype - an expert system that combines the theories of neural networks and case-based reasoning – are to implement and test the CBMN model and to develop a computer system that can assist medical students and physicians with the diagnosing of patients. In section 4 we provide an evaluation of the model and an assessment of its “success”.
Figure 2. Creating Prototype for CML
The CBMN model uses three types of data structure to represent knowledge. These include: feature information entity data structures, prototype information entity data structures and case information entity data structures. The training of the model involves adding prototypes to the case base and then, if desired, adding sub-prototypes and actual cases associated with those prototypes. Training also involves the assignment of symptoms, signs, investigations, and exclusions. Training is conducted by using data from classical and actual cases of the disorder in question. The network is trained each time a new case is added to the database, or an existing case is modified. The neural network is used to store and recall cases. Each
Integrating Two AI Theories In A Medical Diagnosis Application
19
prototype, sub-prototype, and case, of the case base, has in common a node identification number that uniquely identifies the network node in question. Figure 2 illustrates how the prototype for the Chronic Myeloid Leukaemia (CML) disorder can be added to the system. Other prototypes can be added in a similar manner.
2.4.2
Interacting with the System
Figure 3 provides the interface through which users – namely, medical students, physicians, or other medical experts - interact with the CaseB-Pro system. A typical session (consultation) is invoked when a user types the “consult” command. A list of symptoms is then shown where the user can select major presenting symptoms in the case under consideration. The system then allows the user to enter examination results of the case in question. When the examination stage is exhausted the consultation enters into the investigation stage. An example consultation is provided for illustration.
Figure 3. Example of part of consultation session
Hadrian Peter, Wayne Goodridge
20
3.
RESULTS
For purposes of testing the system ten cases of Haematological conditions and related disorders were added to the case base. In order to make a preliminary evaluation of CaseB-Pro’s diagnostic capabilities, two medical experts who specialize in Haemoncological disorders independently simulated 18 classical Haematological cases within the scope of the system.
Key of symbols/abbreviations used in the table: MM = Multiple Myeloma DD = Differential Diagnosis NHL = Non Hodgkins Lymphoma +++ = DD Relevant (Related Disorders) CML = Chronic Myeloid Leukaemia ++ = DD Relevant (Unrelated Disorders) MF = Myelofibrosis + = DD Possibly Relevant PRV = Polycythaemia Rubra Vera = DD Irrelevant TB = Tuberculosis ALL = Acute Lymphoblastic Leukaemia AML = Acute Myeloid Leukamia AA = Aplastic Anemia
Table 1 shows the results of five out of 18 randomly selected simulated interactions. It is, however, important to note that although only 5 of the 18 cases are included in the table, in none of the 18 cases did the medical expert and CaseB-Pro arrive at totally different diagnoses. More specifically, in 9 of the 18 (50%) of the cases the diagnoses were the same at the end of stage 2 (see section 2.4) of the medical consultation. Of the 9 cases in which the diagnoses did not match exactly at the end of stage 2, 7 (77.8%) of them resulted in a match after the differential diagnosis (third) stage. The remaining 2 cases, for which concrete results were not possible at the end of stage 3, produced results in concurrence with the medical expert’s diagnoses after further investigations were conducted (in stage 4). As indicated in section 2.4.1 CaseB-Pro was also used as a teaching tool. To test this feature of CaseB-Pro, two students in the advanced stage of their medical studies were encouraged to interact with the system. Initially the students complained that they felt intimidated by the system and feared that
Integrating Two AI Theories In A Medical Diagnosis Application
21
the system would expose their lack of knowledge. However, after being persuaded by their medical instructor to use the system as a classroom exercise, the students were noticeably more relaxed. Under the guidance of their medical professor, the students were allowed to conduct an extensive interaction with the system in an effort to diagnose MM, MF, PRV, ALL, and AA (please see the table below). In 80% of the cases the students were able to arrive at the correct diagnoses largely due, they reported, to the ease with which they were able to follow the “trend of thought” used by CaseBPro.
4.
EVALUATION AND CONCLUSION
Many medical diagnosis systems have been designed and are currently in use. The originality of our approach, however, is that we have designed and implemented a system that combines case-based reasoning and artificial neural networks. Because of the restriction placed on the length of our paper, we were unable to provide a more detailed comparison with other approaches. Consistent with our goals we were able to (a) implement and test our model, and (b) to develop a computer system that can assist medical students and physicians with the diagnosing of patients. We have been able to develop a prototype, the CaseB-Pro, based on our new approach, whose authenticity medical experts and medical students were able to test. It may be too early to make strong pronouncements about the success of our model because it was tested on a small domain. Therefore more research using our approach should be conducted using larger domains and different evaluation strategies. Thus far the feedback from persons who have interacted with our prototype has been encouraging, and therefore we are confident that with further development and testing our prototype can evolve into a useful, full-fledged system.
REFERENCES 1. Buchanan, B.G., Shortliffe, E.H. (eds.): Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Addison-Wesley, Reading, Mass. (1984) 2. Shortliffe, B.G. Computer-Based Medical Consultants: MYCIN. Elsevier/North-Holland, Amsterdam (1976) 3. Bouhaddou, O., Sorenson, D.: Iliad V4.4. Journal of Medical Systems, 15(1) (1991) 93-110
22
Hadrian Peter, Wayne Goodridge
4. Adlassnig, K. –P., Kolarz, G. : Representation and semiautomatic acquisition of medical knowledge in CADIAG-1 and CADIAG-2. Computers and Biomedical Research, 19 (1986) 63-79 5. Miller, R. A., Pople, Jr., H. E., and Myers, J. D. INTERNIST-I, an Experimental ComputerBased Diagnostic Consultant for General Internal Medicine, New England J. Med., 307 (August 1982) 468-476. 6. Pople, Jr., H. E. The Formation of Composite Hypotheses in Diagnostic Problem Solving: An Exercise in Synthetic Reasoning, IJCAI 5 (1977) 7. Pople, Jr., H. E. Heuristic Methods for Imposing Structure on Ill-Structured Problems: The Structuring of Medical Diagnostics, in P. Szolovits (ed.), Artificial Intelligence in Medicine, Boulder, CO (1982) 119-185 8. Masserie, F. E., Miller, R.A., and Myers, J.D. INTERNIST-I properties: Representing common sense and good medical practice in a computerized medical knowledge base. Computers and Biomedical Research, 18 (1985) 458-479 9. Arene, I., et al : Evaluation of quick medical reference (QMR) as a teaching tool. MD Comput., 15 (1998) 323-326 10. Aliferis, C.F., et al: A temporal analysis of QMR. J. Am. Med. Inform. Assoc., 3 (1996) 79-91 11. Miller, R.A., Masarie, F.E.,Jr. : Use of the Quick Medical Reference (QMR) program as a tool for medical education, Methods Inf. Med., 28 (1989) 340-345 12.Gierl, L., Stengel-Rutkowski, S. Integrating Consultation and Semi-automatic Knowledge Acquisition in a Prototype-based Architecture: Experiences with Dysmorphic Syndromes. Artificial Intelligence in Med., 6 (1994) 29-49 13. Myllymaki, P., Tirri, H. Bayesian case-based reasoning with neural networks. In Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, 1 (March 1993) 422-427 14. Honkela, T. Self-Organizing Maps and Case-based Reasoning. [Electronic Version] retrieved April, 2004 from http://www.mlab.uiah.fi/networks99/honkela.html 15. Agre, G. Koprinska, I. Case-based Refinement of Knowledge-Based Neural Networks. In: Proceedings of International Conference Intelligent Systems: A Semiotic Perspective, Gaithersburg, MD (Oct. 20-23, 1996) 16. Gallant, S.I.: Neural Network Learning and Expert Systems. MIT, Mass. (1993) 17. Goodridge, W. : The Case-Based Memory Network Model and its Use in Medical Expert Systems: MPhil Thesis, University of the West Indies, Barbados (2000) 18. Russell, S.J., Norvig, P. : Artificial Intelligence : A Modern Approach, ed. Pearson Education, Inc., New Jersey (2003) 19. Aamodt, A., Plaza, P. Case-Based Reasoning: Foundations Issues, Methodological Variations, and Systems Approaches. AI Communications, 7(1) (1994) 39-59 20. Kolodner, J.L. Case-Based Reasoning. Morgan Kaufmann, New York (1993) 21. Koton, P. Evaluating Case-Based Problem Solving. In: Hammond, K.J. (Ed.). Proceedings of the DARPA Workshop on Case-Based Reasoning, Morgan Kaufmann, CA (1989) 26017022. Lenz, M., et al, (eds.). Case-Based Reasoning Technology: from foundations to
Integrating Two AI Theories In A Medical Diagnosis Application
23
application. In: Lecture Notes in Computer Science, Vol. 1400. Springer-Verlag, New York (1998) 23. Fausett, L. Fundamentals of Neural Networks. Architectures, Algorithms, and Applications. Prentice-Hall International Editions, New Jersey (1994 24. Bergmann, R., Wilke, W. Case-Based Reasoning: Foundations Issues, Methodological Variations, and Systems Approaches. AI Communications, 3 (1995) 53-118 25. Goodridge, W., Peter, H., Abayomi, A. The Case-Based Neural Network Model and Its Use in Medical Expert Systems. In: Lecture Notes in Artificial Intelligence, Vol.1620. Springer-Verlag, Denmark (1999) 26. Anderson, J. A. A simple neural network generating an interactive memory. Mathematical Biosciences, 14 (1972) 197-220. 27. Kohonen, T. Correlation matrix memories. IEEE Transactions on Computers, 21 (1972) 353--359 28. Rainer, S., Pollwein, B., Gierl, L. Experiences with Case-Based Reasoning Methods and Prototypes for Medical Knowledge-Based Systems. In: Lecture Notes in Artificial Intelligence, Vol. 1620. Springer-Verlag, Denmark (1999) 29. Schach, S.R. Object-Oriented and Classical Software Engineering, ed. McGraw-Hill, New York (2002) 30. Booch, G. Object-Oriented Analysis and Design with Applications. Addison-Wesley Publishing Company, California (1994) 31. Lucas, P. A Theory of Medical Diagnosis as Hypothesis Refinement. Conference on Artificial Intelligence, Springer-Verlag, New York (1997) 169-180
This page intentionally left blank
ARTIFICIAL INTELLIGENCE AND LAW
Hugo C. Hoeschl;Vânia Barcellos E-Gov, Juridical Intelligence and Systems Institute – Ijuris
Keywords:
Artificial Intelligence (AI); Natural Intelligence (NI), Law; Information of Technology
Abstract:
This article intends to make an analysis of the intersection between Artificial Intelligence (AI) and Natural Intelligence (NI) and its application in the scope of Right. The impact caused by the Information Technology, methodologies and techniques used by the main systems developed in the last years and which the elements for the development of intelligent applications in the legal domain, with the aim of demonstrating the capacity to manipulate the knowledge properly and, being so, systemizing its relations, clarifying its bonds, to evaluate the results and applications. There is a real need of new tools that conciliate the best of AI and NI techniques, generating methods and techniques of storage and manipulation of information, what will reflect on law and justice.
1.
INTRODUCTION
According to Carnelutti [7] “to discover the rule of legal constructing, science does not have, of course, other ways beyond the senses and intelligence. Intelligence is nothing but the capacity to learn, to apprehend or to understand, to interpret and, mainly to adapt the factual situations. On the one hand, we have all this systematization of Law, using NI, its evolution, its technical, historical and social conditioning; on the other, we have the vertiginous evolution of the technology of the computer sciences, which have an search field dedicated of to the reproduction of human abilities, handicrafts as well as intellectual capacities, which is AI.
26
H.Hoeschl, V.Barcellos
The present study intends to start with the elements for the development of intelligent applications in the legal domain (section 2), demonstrating the intersection between AI and the NI, applied to Law (section 3). It intends to evaluate how the research is getting on, in a world-wide context (section 4), we will illustrate (section 5) the application of AI in Law through some empirical procedures adopted by the author and his team (section 5).
2.
ELEMENTS FOR THE APPLICATION OF AI TO THE LAW SPHERE
The AI sphere has been studied academically since the 1950s, but it has generated an increasing interest nowadays because of the arising of practical commercial applications. Researches in AI are related to areas of application that involve human reasoning, trying to imitate it and performing inferences. According to Savory[22], these areas of application that are generally enclosed in the definitions of AI include, among others: specialized systems or systems based on knowledge; intelligent / learning systems; understanding / translation from natural language; understanding / voice generating; analysis of image, scene in real time and automatic programming. Notice, then, an introductory and superficial vision, about how artificial intelligence can be defined [18]: “Artificial intelligence - The field of the computer sciences that intends to perfect the computers endowing them with some peculiar characteristics of human human intelligence, as the capacity to understand natural language and to simulate the reasoning in uncertainty conditions”. The following are important aspects of AI, as Rabuske [21], among others: development of heuristical methods for the solution of problems; representation of knowledge; treatment of natural language; acquisition of knowledge; artificial reasoning and logics and tools. Amongst its main applications, we have the following: mastering systems; processing of natural language; recognition of standards; robotics; intelligent databases; test of theorems and games. To make use of intelligent techniques and to try to develop computational tools endowed with logic or structured in cases, in order to assist in the study of legal data, involves another difficult task, which is, to analyze the form chosen by men to communicate and to materialize its norms: the codification of the word in abstract symbols and rigorous grammar rules. The study and development of any system of automatic and intelligent treatment of the legal information involves, basically, two tasks: the
Artificial Intelligence And Law
27
treatment of the natural language and the search of new techniques of storage. In the first one, the structure of a mechanism, which reads texts and, properly guided, identifies a series of relevant characteristics for the user, in some specific stages. It must search superficial and static references, as dates, names, numbers, etc; to identify subjects, subjects and sub-subjects and, finally, to detect conclusions and lessons, highlighting them, obviously, from other functions. In the second task, it is good to inquire about the return to the origins of the language. i.e.: the first forms of writing were pictographic, and, in the computational scope, the development of languages and interfaces are in allowing the use of icons (pictographic forms) [12], a more comfortable and practical means of communication than orthography. This is allowing to idealize a significant advance in communication, according to which “written texts will give place to mental images that present real as well as symbolic objects and emphasize the interaction and the experience in detriment of the passive learning” [12].
3.
ARTIFICIAL INTELLIGENCE vs. NATURAL INTELLIGENCE
In the sense of searching, in the practical sphere, this evolution announced, has a powerful referential: the intersection between NI and AI, where it is possible to try to conciliate the processing speed of the second and sophistication of the first one, as pointed by Epstein [11]. Artificial intelligence, in the delimited context - without damage of the already already presented definition - can also be understood, in a still very primary vision, as “the set of techniques used to try to carry through automatons adopting behaviors similar to the ones of human thought”, as pointed by Morvan (apud Epstein) [11]. We know that NI is inferior to AI in the capacity of search and the examination of options, but superior in refined and percipient tasks, such as making analogies and to creating metaphors. Thus, a mechanism that combines NI and AI techniques, intending an adequate manipulation of natural language, allows the identification of ideas of a legal text. However, it is important to emphasize that a step in the searching direction, the body of a writing piece, in what a person “thought”, or either, theirs ideas and conclusions, is teleologically linked to the desire of searching what a person really “felt” when analyzing the subject on which they wrote.
H.Hoeschl, V.Barcellos
28
3.1
Reasoning’s figures
Is important to emphasize that AI it is a figure typical of the information technology, practically molded by it. For the delineation of the intersection pointed, we will soon devote attention to some figures linked to NI, as the analogical reasoning, existing before computers [18]: Analogical reasoning - A form of knowledge in which the dynamics of a phenomenon of the real world – such as aerodynamics of an airplane which is intended to make - is understood from the study of a model of the phenomenon. One of the biggest contributions of the computer sciences was to reduce costs (and to increase convenience) of analogical reasoning. The reasoning based on some case is something almost as old as the human habit of “to walking forward”. However, we are dealing with a tool of artificial intelligence that uses such nomenclature, being able to be defined as a “methodology”, which has as a basic characteristic to search the best solution for a current situation in previous experiences in previous experiences, applying the already consolidated knowledge and whose effectiveness was already validated. Such procedures, derived from the information technology, has an evident similarity to a traditional figure of the legal reasoning, analogy, one of the most efficient and pertinent instruments for the integration of the commands of law. According to Bobbio, the analogy can be defined as follows [3]: “we understood by “analogy” the procedure by which a not-regulated case is attributed the same discipline than to a similar regulated case.... The analogy is certainly more typical and the most important interpretative procedures of a certain normative system,: it is the procedure by means of which the called trend of each legal system to become is explained to enlarge itself beyond the cases expressly regulated.” According to Bobbio, “To be able to come to a conclusion, that is, to make the attribution to the non-regulated case of the same legal consequences attributed to the similar regulated case, there is a need of not any similarity between the two cases, but a relevant similarity, is necessary to ascend from the two cases to a quality that both have in common, that is at the same time the very reason by which those and not other consequences had been attributed to the regulated case”. Other figures resemble the presented context, as the extensive interpretation and the syllogism, with which the analogy cannot be confused. The syllogism has a vertical mechanism of drawing conclusions, while the analogy and the extensive interpretation have a horizontal resource. But, even though analogy and extensive interpretation are next to one another and
Artificial Intelligence And Law
29
horizontalized, they are significantly different from each other, pointed also by the very author [3]. This difference causes a strong impact on the construction and modeling of intelligent systems in the legal area, since the proposal is not the construction of systems that generate norms, but that facilitate its application (at least for the time being). The comparison between institutes demonstrates the importance of the analysis of the structuralized logical processes around the reasoning of a specific area, and demonstrates also, that the logics has much to contribute with artificial intelligence – even in systems based on cases -, being exactly this one of the most favorable aspects of the intersection pointed between AI and NI. This comparison has the purpose of demonstrating that the approach of these institutes tends to produce good results, as well as the viability of the intersections, of AI with the NI, as well as of the figures of reasoning derived from the technology of information with those particular to the legal universe.
4.
STATE OF ART IN THE INTERNATIONAL SCENERY
We focus on the scientific production in the area of artificial intelligence and law in the International Conference of Artificial Intelligence and Law ICAIL, edition carried through in Oslo, Norway. In this context, the following works are distinguished: Towards adding knowledge to learning algorithms for indexing legal cases, Bruninghaus[6]. The authors mainly ratify important basic concepts in the scope of the development of intelligent systems for the legal domain those structured in cases - as, for example the high cost and the slowness of the manual representation of concrete situations. A classification based on boarding was used in the Smile System, constructed by the authors, to locate prompt situations in legal texts, in a automatic way. A controlled vocabulary and linguistic information were integrated to the system with an algorithm of machine learning, important allies in the task of surpassing the traditional difficulties of language. Bankruptcy case law: a hybrid IR-CBR approach, Elhadi and Tibor [10] describe a work of combination between retrieval of information (information retrieval, IR) and CBR, with a modeling based on the procedures according to which the legal operators effect its research techniques as a parcel of the procedural dialectic, suggesting, specifically, that the experiences accumulated by the professionals of the legal domain are used in the modeling of the system.
30
H.Hoeschl, V.Barcellos
Some observations on modeling case based reasoning with formal argument models, Bench-Capon [2] warns the international community about the importance of conciliating cases and norms in the modeling of a system for this type of application, affirming that “the truth on the subject is that both are essential”. The evaluation of legal knowledge based system, Stranieri and Zeleznikow [23] perform important premises on the evaluation of systems based on the legal knowledge, affirming that strategical evaluations of systems that operate in a so complex domain as the legal one are difficult because of the specificity of the systems, considering a model of evaluation with neural nets, referenced by the binoms “structural/functional” and “ qualitative/ quantitative” to evaluate the answers emitted by a system. Dialectic semantic for argumentation frameworks. Jakobovits and Vernier [15] proposed a formalist study of dialogues, where a dialogue is seen with a shock between two people, initiated by the proponent who defends the proposed thesis. Carrying through an examination of the argumentative criteria and the different types of dialogues, the authors consider to provide a semantics correspondence allusive to the blocked dialogue. Precedent, deontic logic and inheritance. Horty [14] considers the establishment of some connections between reasoning based on precedents and deontic and monotonic logics He supports that deontic logics acts as a prioritary sensitive reasoning in a norms conflict, reformulating and simplifying a reasoning model based in Ashley’s precedents, according to the deontic logics. AI techniques for modeling legal negotiation. Bellucci and Zeleznikow [1], the authors centralize the study in the introduction of the development of an algorithm that uses methods of AI to support the production of a business decision, using a form of cognitive mapping called “bi directed fuzzy cognitive maps”. The integration of retrieval, reasoning and drafting for refugee law: a third generation legal knowledge based system. Yeardwood and Stranieri [25]. The authors had developed a structure for the construction of arguments that includes more than 200 arguments, in contribution with the Refugee Review Court of Australia. The process of construction of these arguments requires the integration of retrieval of literal information (IR) with reasoning. Justice: A judicial search tool using intelligent concept extraction. Osborn and Sterling present the modeling of a system based on legal knowledge called “Justice”, whose target is to retrieve previous cases. It uses concepts of conceptual retrieval of information, conceptual summarization, analysis automatized statistics and informal document conversion for formalized semistructured representations.
Artificial Intelligence And Law
31
A demonstration of a legal reasoning system based on teleological analogies, text in the which Kakuta and Haraguchi [16], using a structure called GDA (Goal-Dependent Abstraction), the authors analyze an exemplary situation judged by the Supreme Court of Japan, ahead of two consistent doctrinal trends, demonstrating how the system can become an important aid in the task of evaluating and to revise interpretations of norms. Agents and norms: how to fill the gap?, Falcone and Sartor [9] affirm that two specific structural approaches are important to the work described: 1. That of the legal theory and related computational applications, especially in the areas of specialist legal systems, normative reasoning and juridical diagnostic; e 2. That of the theory of the multiagent systems (MAS) and the related computational applications, especially in the areas of computerized cooperative work (to computer supported cooperative work, CSCW). The most important aspects are the following: language and its formalism; reference theories (legal philosophy and deontic philosophy, reverenced by the theory of the agents); targets (models of legal institutions, systems of legal information); rhetoric philosophy and the norm concept. Norms in artificial decision making, Boman [5] effected a study on independent artificial agents programmed to represent and to evaluate vacant or inexact information, with the characterization of agents with a production method of decision in real time, at risk or uncertainty, finishing for presenting a model of“contraining action” using norms. Prescribed mental attitudes in goal-adoption and norm-adoption. In this paper, with a theoretical approach, Castelfranchi [8] affirms that the norms do not only want behavioral conformity, but produces culturing and reassures the authority of the norms, considering that they require submission. Approximate syllogisms, on the logic of everyday life, Philips [19] affirms that, since Aristotle, it is recognized that a valid syllogism cannot have two specific premises, claiming that some rules can be established to similar syllogisms with particular premises, affirming that the lay are correct if it is considered that that these syllogisms do not have strict validity, but similar, having in mind that the premises available in daily life are typically particular.
5.
APPLICATION OF AI TO LAW
We will illustrate the application of AI to law through some empirical procedures adopted by the author and his team. Called methodology CBR is used in parts with techniques of retrieval of literal information, presenting a superior performance to the traditional data bases. For in such a way, had been developed two new technologies for the
32
H.Hoeschl, V.Barcellos
team the Context Structured Search– CSS and the Dynamically Contextualised Knowledge Representation (DCKR). CSS® is a methodology that allows the search in natural language through the context of the information contained in the knowledge base, thus breaching, the search paradigm by means of key words and connectors, making it possible for the user to describe a number of characters presented by each consultation, allowing thus, a more elaborated conception of the search. The research is considered ‘ contextual ’ and ‘ structured ’ because of the following reasons: 1. We take into consideration the context of documents stored at the formation of the rhetorical structure of the system 2. This context guides the process of adjustment of the entrance as well as the comparison and election of documents; 3. At the moment of the elaboration of the consultation, the entrance is not limited to a set of words, or the indication of attributes, being able to assume the format of a question structured by the set of a long text is added to the possibility of operating dynamic weights on specific attributes, that work as ‘ filters ’ and make a preliminary election of documents to be analyzed. DCKR® consists of a dynamic process of analysis of the general context that involves the problem focused. It makes comparisons between the context of documents, enabling the accomplishment of a more precise search and with a better quality. Moreover, the documents are retrieved through pre-determined indexes, that can be valuated by the user when consulting. This technique implies a significant increment in the performance in knowledge structured systems The Group of intelligence applied to law has been developing researches and implementing systems, involving technology of the legal information. Among the already developed systems, the following ones are distinguished among others: Alpha Themis® - Intelligent software for the retrieval of the knowledge contained in the “resolutions” of the national courts. It is a system of legal technology, one of the first ones in Brazil to unite Artificial Intelligence and Law. It uses techniques of textual Data base and CBR. Jurisconsulto® - Innovative system to retrieve sentences in computerized data bases through CBR. It uses techniques of textual Data Base and CBR. Jurisconsulto® – Sistema inovador para recuperar decisões judiciais em bancos de dados informatizados através de CBR. Utiliza técnicas de Bando de Dados textuais e CBR. Olimpo®- The system has its performance centered in the combination of aspects derived from CBR and from the representation of added literal information to an suitable organization of knowledge the referring to the resolutions of the Security Council of the UN, what allows the retrieval of texts with characteristics similar to the information supplied by the user in
Artificial Intelligence And Law
33
natural language. New documents are automatically enclosed in the knowledge base through the extraction of relevant information through a technique called DCKR®. Concepts of CBR and techniques of information retrieval have been applied for a better performance of the system, resulting in the methodology called CSS®. Such techniques are consolidated through the application in systems and, mainly, by the approval of papers in International Congresses, such as ICAIL (International Conference on Artificial Intelligence and Law), ICEIS (International Conference on Enterprise Information Systems) among others.
6.
CONCLUSION
The intention is in this work, yet in a synthetic way, to discourse on the intersection between the NI and AI, trying to conciliate them, considering their specificities, in the scope of law. We present some summaries of the activities of international scientific community of artificial intelligence and right, mainly in the development of applications. The comparison of the institutes demonstrates the importance of the analysis of the logical processes structuralized around the reasoning of a specific area, even in systems based on knowledge, having exactly there, one the strongest favorable aspects of the pointed intersection between AI and NI. It is possible to conclude that there is a real necessity of attention to the production of tools that emphasize such activity, because, thus, new methods and techniques of storage and manipulation of information will be generated, what it will reflect in a strong way on law and justice, as writing has done.
REFERENCES 1.
2.
3. 4. 5. 6.
7.
BELLUCCI, Emilia, ZELEZNIKOV, John. AI techniques for modeling legal negotiation. In proceedings of the seventh international conference on artificial intelligence and law, p. 108-116, Oslo: Norway, June, 14-18, 1999. 220 p. BENCH-CAPON, T. J. M. Some observations on modeling case based reasoning with formal argument models. In proceedings of the seventh international conference on artificial intelligence and law, p. 36-42, Oslo: Norway, June, 14-18, 1999. 220 p. BOBBIO, Norberto. Teoria do ordenamento jurídico. 4 ed., São Paulo: Unb, 1994. BOBBIO, Norberto. Teoria general del derecho. Bogotá: Temis, 1987. BOMAN, Magnus. Norms in artificial decision making. Artificial intelligence and law, Dordrecht: Netherlands, v 7, n. 1, p. 17-35, march, 1999. BRUNINGHAUS, Stefanie, ASHLEY, Kevin D. Toward adding knowledge to learning algorithms for indexing legal cases. In proceedings of the seventh international conference on artificial intelligence and law, p. 9-17, Oslo: Norway, June, 14-18, 1999. 220 p. CARNELUTTI, Francesco. Metodologia do Direito, Campinas:Bookseller, 2002.
34 8.
9.
10.
11. 12. 13. 14.
15.
16.
17.
18. 19. 20.
21. 22.
23.
24. 25.
H.Hoeschl, V.Barcellos CASTELFRANCHI, Cristiano. Prescribed mental attitudes in goal-adoption and normadoption. Artificial intelligence and law, Dordrecht: Netherlands, v 7, n. 1, p. 37-50, march, 1999. CONTE, Rosaria, FALCONE, Rino, SARTOR, Giovanni. Introduction: agents and norms: how to fill the gap-. Artificial intelligence and law, Dordrecht: Netherlands, v 7, n. 1, p. 115, march, 1999. ELHADI, Mohamed T., VAMOS, Tibor. Bankruptcy case law: a hybrid 1R-CBR approach. In proceedings of the seventh international conference on artificial intelligence and law, p. 134-135, Oslo: Norway, June, 14-18, 1999. 220 p. EPSTEIN, Isaac. Cibernética. São Paulo: Ática, 1986. HAMIT, Francis. A realidade virtual e a exploração do espaço cibernético. Rio de Janerio: Berkley, 1993. HOESHL, Hugo Cesar. Tecnologia da Informação Jurídica para o Conselho de Segurança da UNO. Ed. Papel Virtual; Rio de Janeiro; 2002. HORTY, John F. Precedent, deontic logic and inheritance. In proceedings of the seventh international conference on artificial intelligence and law, p. 63-72, Oslo: Norway, June, 14-18, 1999. 220 p. JAKOBOVITS, H., VERMEIR, D.. Dialectic semantic for argumentation frameworks. In proceedings of the seventh international conference on artificial intelligence and law, p. 5362, Oslo: Norway, June, 14-18, 1999. 220 p. KAKUTA, Tokuyasu, HARAGUCHI, Makoto. A demonstration of a legal reasoning system based on teleological analogies. In proceedings of the seventh international conference on artificial intelligence and law, p. 196-205, Oslo: Norway, June, 14-18, 1999. 220 p. OSBORN, James, STERLING, Leon. Justice. A judicial search tool using intelligent concept extraction. In proceedings of the seventh international conference on artificial intelligence and law, p. 173-181, Oslo: Norway, June, 14-18, 1999. 220 p. PFAFFENBERGER, Bryan. Dicionário dos usuários de micro computadores. Rio de Janeiro: Campus, 1993. PHILIPS, Lothar. Approximate syllogisms – on the logic of everyday life. Artificial intelligence and law, Dordrecht: Netherlands, v 7, ns. 2-3, p. 227-234, march, 1999. PLAZA, Enric (Eds.). Case-based reasoning research and development. Berlin; Heidelberg; New York; Barcelona; Budapest; Hong Kong; London; Milan; Paris; Santa Clara; Singapore; Tokyo: Springer, 1997. RABUSKE, Renato Antonio. Inteligência Artificial. Florianópolis: Ed. Ufsc, 1995. SAVORY, S. E.(editor), “Some Views on the State of Art in Artificial Intelligence” em “Artificial Intelligence and Expert Systems”, Ellis Horwood Limited, 1988, pp. 21-34, Inglaterra. STRANIERI, Andrew, ZELEZNIKOV, John. The evaluation of legal knowledge based system. In proceedings of the seventh international conference on artificial intelligence and law, p. 18-24, Oslo: Norway, June, 14-18, 1999. 220 p. WOLKMER, Antônio Carlos. Fundamentos da História do direito. YEARDWOOD, John, STRANIERI, Andrew. The integration of retrieval, reasoning and drafting for refugee law: a third generation legal knowledge based system. In proceedings of the seventh international conference on artificial intelligence and law, p. 117-137, Oslo: Norway, June, 14-18, 1999. 220 p.
VIRTUAL MARKET ENVIRONMENT FOR TRADE
Paul Bogg and Peter Dalmaris Faculty of Information Technology, University of Technology, Sydney, Australia
Abstract:
In today’s society, there is a need for greater understanding of activities within electronic market environments and the role information plays in negotiation. Fuelled by greater communication reach and information flow, electronic markets are ever increasingly becoming the primary domain for trade. This paper details a collaborative virtual market environment where participants engaging in trading activities may exploit information refined from both external Internet sources and internal market signals. That information provided is retrieved and distilled using time constrained, data mining techniques.
Key words:
Virtual market place, virtual emarket environment,
1.
INTRODUCTION
What is the future of online trading? What kind of information will a trader require in order to complete a deal? What electronic environment best suits the social dynamic that is the market place of tomorrow? In an information based society, with an increasingly improving technological infrastructure, there is a need for research into how these new factors will influence electronic markets of the future. Presently, few online environments in a narrow range of online market applications, supply the necessary trade information usually required in order for a buyer and seller to complete a deal. Indeed, what is required in order for a deal to be completed can be seen as an constraint satisfaction problem[1] where objectives are represented as constraints and the strategies used to fulfill them are information based.
36
Paul Bogg, Peter Dalmaris
It is therefore necessary to investigate environments that enhance the traders and entrepreneurs. If the fulfillment of objectives belonging to these actors is achieved from appropriate information, then environments which facilitate this kind of information in a timely and constrained manner will conduit market stimulation. Information here includes both external (internet) and internal market signals. There are a number of existing models of electronic market environments around. EBay and ETrade are both examples of present day e-marking trading environments which have been around for a significant number of years. Both provide sufficient trading mechanisms in order to complete deals; however they are not reaching far enough. Their paradigm is suited to well regulated, but simplified buy/sell versions of traditional markets. This paper proposes an alternative: an online collaborative virtual environment, or “virtual e-market place”. Virtual environments are quickly becoming widespread in conjunction with rapidly improving technological infrastructure. Indeed technology seems to be moving so quickly, the potential for virtual environments appearing on hand-held devices, including 3-G mobile devices, in the not too distant future is very real [2]! In an important way, virtual environments complement market environments – market activity is intrinsically social by nature and relies on this communal “look and feel” to generate a buoyant atmosphere on the virtual trading floor. By definition, collaborative virtual environments are social. Figure 1 details the overall conceptual framework of the e-market virtual environment. This environment incorporates the data mining, e-market and agent activities within a single system. At the e-market place layer, basic market transactions are managed as industry processes using a multi-agent business process management system. These transactions include the creation of particular markets, the buying and selling of goods, and the packaging of goods, services, needs and requirements. The basic market transactions take place between a set of ‘major actor classes’ in the market— the decision-makers. These transactions are monitored by ‘supporting actor classes’ that include data mining actors. At the market evolution layer, the entrepreneurial transactions are also managed by business process management technology. This includes the timely extraction of reliable information and knowledge, as well as entrepreneurial intervention. At the market information layer, atomic signals from information bots and other sources feed into the data mining actors, the reliability of these atomic signals is evaluated before the signals are combined by the process management system at the market evolution layer.
Virtual Market Environment For Trade
37
Figure 1. Conceptual framework integrating the domains involved
Section 2 will discuss more about present and future e-markets. Section 3 will discuss data mining methodologies and the relevancies in an e-market environment. Section 4 will deal with a brief overview of a process management system for integrating e-market, data mining, intelligent agent and actor activities. Section 5 will cover an exemplar collaborative virtual environment that has been built at UTS, combining the features of constrained information mining with trading.
2.
ELECTRONIC MARKETS
E-markets are a rich domain of trade and entrepreneurial activity. There is a vast amount of information which resides within the e-markets and the Internet generally which assists market forces and traders alike. Little has been done to harness the rich information flowing through e-markets in order to facilitate negotiation and trade. “[T]he value of the Internet and IT lies in their capacity to store, analyse and communicate information instantly, anywhere at negligible cost” [Economist, 27 Sep. 2000]. The environment of electronic markets is quite different to traditional market environments, and so the scope for trade activity in e-markets cannot be assumed to be the same as their traditional counterpart. Substantial amounts of data may be acquired quickly and cheaply from the electronic markets themselves. Further, valuable signals may be observed by searching the Internet, by reading news feeds, by watching corporate pages, and by reading background market data such as stock exchange data. If all of these individual signals can be distilled into meaningful information, then it may
38
Paul Bogg, Peter Dalmaris
Figure 2. Actors interacting in an emerging market landscape
be useful to the trader who relies on “the right information, at the right granularity, at the right time”. The scope of possible buying and selling is determined by the range of the support provided by the environment. Trading and the provision of information are intimately interrelated. In the example model constructed at UTS, the basic market transactions take place in the “e-market place” between a set of major actor classes; these are based on an extension of the model given in [3]. There are eight major actor classes—see Figure 2. A central logical component is an “e-exchange” where individual markets are created, within which, subsequent deals are done. This class is essentially an “empty virtual box” for holding markets and for advertising their existence. It contains a collection of pre-packaged market protocols that may be invoked by the creation of a market. The buyer and seller classes include those wishing to buy or sell a good or service, or any combination of goods or services, in a market. The remaining actor classes are all entrepreneurial in nature, and are instrumental in market evolution. Members of the content aggregator actor class act as forward aggregators, they package goods and services and place these packages in markets in the e-exchange. Members of the solution providers class act as intermediaries in the negotiation of contracts and the development of long-term business relationships. E-speculators take shortterm positions in markets in the e-exchange. Sell-side asset exchanges exchange or share assets between sellers. Specialist originators act as reverse aggregators, they coordinate and package orders for goods and services on behalf of various buyers. The major actors, shown in Figure 2 are all provided with distilled information by the supporting actors, which utilise a variety of data mining and information discovery methods. The buyers, sellers and entrepreneurs are all realised as avatars in the virtual market place.
Virtual Market Environment For Trade
3.
39
DATA MINING TECHNIQUES
Data mining techniques have successfully been applied to stock, financial market analysis applications [4] and research has been conducted into ecommercial applications of data mining [13]. A variety of structured and unstructured techniques have been developed to mine data from a variety of sources, such as relational databases, web pages and text files. Presently in the e-market environment the main methods have been primarily limited to the B2C framework. This paper describes set of classes which utilise both “classical” and e-commerce based mining methods [5] in order to present timely and essential information to the actors. A set of concurrent mining processes are supported by bots, supported in turn by specialised agents. The structure for the data-mining task is outlined in Figure 3. An actor in the marketplace requests some kind of information. Information bots scour the market place and Internet for relevant information ranging from text in Internet news feeds, to latest stock reports, to price updates and so forth. The information retrieved is stored in a database in an unstructured format. Pre-processors are separated processes which engage in parsing raw data into formats for data-mining. The data mining bots mine the pre-processed data for information discoveries as required by the actor. Process Agents, on behalf of the actor, retrieves the information mined by the data mining bots in a constraint enforced manner. Information returned to the agents finally undergoes a visualisation transformation enacted by the Information Visualiser in order for the actor to understand its importance. Delivering relevant information at the right granularity and within tight time constraints are important features of the data mining interactions model. One final feature exists which is important to the information – its visualisation. This will be outlined in section 5. Information is returned in the form most easily digestible by the actor (for humans, natural language, for agent, statistics and numbers are more easily crunched).
Figure 3. Data mining interaction model
40
4.
Paul Bogg, Peter Dalmaris
INDUSTRY PROCESS MANAGEMENT
The term industry processes is used here to refer to processes that are triggered by both e-market transactions and entrepreneurial intervention. This is in-line with the term industry process re-engineering which concerns the re-engineering of trans-corporate processes as electronically managed processes. Industry process re-engineering addresses four issues: complexity, interoperability, communication and management. Of these four issues, complexity, communication and management are particularly relevant here. Communication and management are dealt with by an agent architecture described below. All e-market transactions are managed as industry processes. The problem of delivering information to both traders and entrepreneurs at the right granularity and at the right time is also a process management problem, as is the problem of effecting any intervention that may be triggered indirectly by that information. A single business process system manages the e-market transactions, the entrepreneurial intervention, and its delivery of information. The complexity in managing e-market transactions stems from the distributed nature of e-markets and from the complex time constraints that govern the operation of e-markets. The complexity in information delivery here stems from the unreliable and unpredictable nature of the Internet, and the problem of delivering something useful and reliable in the required time. This means that a powerful process management system is required, particularly in its capacity to manage heavily constrained and possibly interdependent processes of high value. The system used is a multi-agent system that is based on a three-layer, BDI (Belief-DesireIntention) hybrid agent architecture. Multi-agent technology is an attractive basis for industry process reengineering [6] [7]. A multi-agent system consists of autonomous components that interact with messages. The scalability issue is “solved”— in theory—by establishing a common understanding for inter-agent communication and interaction. KQML (Knowledge Query and Manipulation Language) is used for inter-agent communication [8]. Specifying an inter-agent communication protocol may be tedious but is not technically complex. Standard XML-based ontologies will enable data to be communicated freely [9] but much work has yet to be done on standards for communicating expertise. Any process management system should take account of the “process knowledge” and the “performance knowledge”. Process knowledge is the information that has been accumulated, particularly that which is relevant to the process instance at hand. Performance knowledge is knowledge of how effective people, agents, methods and plans are at achieving various things.
Virtual Market Environment For Trade
5.
41
E-MARKET AS A COLLABORATIVE VIRTUAL PLACE
The collaborative market environment should seamlessly integrate the mining and process models with the underlying technology and support both social and negotiation interactions. Market activities are distributed by nature, so there’s also a need to be able to handle all interactions in a single ‘logical’ location, which in turn allows convenience for research observation and collection of strategies and behaviours within the market place. Much research and development involved with collaborative virtual environments (CVEs) [10] has been towards the implementation of ‘virtual place’ models offering ways to handle information, communication and additional functionality [11] [12]. Our approach describes a ‘virtual emarket place’ that provides ‘virtual e-markets’ with an environment in which the complete range of market and trade activities are naturally represented. The model for the e-market place is shown in Figure 4. The Virtual environment is at the centre of all activity. It is a distributed collaborative environment implemented using Adobe Atmosphere technology and facilitates the 3D representations of actors and markets over the Internet. Within this environment we have built an e-market place. The e-market place is a centralised location for the populations of e-markets. It is essentially the hub for existing markets and for the creation of emerging new markets alike. An e-market is a market area designated for trade and activities. Markets are split into market areas and each e-market describes a set of goods and/or services which are traded within the market. Fig. 4 shows a screenshot of the implementation of the virtual UTS environment. The implementation of this virtual environment is divided to three areas. These areas are marked as 1, 2 and 3 in Figure 5. Area 1 displays the 3-D environment in which the user moves and interacts with the various agents. The movement of the user’s avatar can be controlled through the keyboard and mouse, and the interaction with the various environmental objects, such as walls, desks etc., is as one would expect in the natural environment. Area 2 is used to display general information about a product (for example, its 3dimensional representation), comparisons between products in tabular format, and tools that can facilitate the creation of new markets by the user. Area 3 is used for facilitating the actor communications and dialogue and displays the history of the dialog as it continues between a user and a virtual agent. It is through this that the actors can converse with one another.
42
Paul Bogg, Peter Dalmaris
Figure 4. E-market model for basic trade in 3 classes of goods
Another aspect of the virtual environment model is establishing a 3D representation of the actors (including traders, etc). In the context of an emarket, the representation should be understandable to all other actors, thus a human who has entered the virtual environment should have an avatar model assigned which shows where they are, the direction they are facing, and any gestures communicated. Avatar models may also be customised to signify specific traits, which may be used to identify individuals from others.
Figure 5. The virtual environment market place
Virtual Market Environment For Trade
43
Finally, the virtual environment enables agent interaction in order to interface data-mining information. The process agents described here are the same as those described earlier in the data mining section and have representations in the 3D context. Actors may utilise the mining methods provided by the agents (via an agent avatar visualisation) in order to search for relevant information from many sources across the Internet. They do this in one of the two ways outlined earlier: data driven and hypothesis driven. Agents will seek out information based on the goal and retrieve what is available in a time constrained manner. The results are displayed in a manner that is appropriate to the information to be visualised. In order to fulfill these, further information will be required. How one may determine whether a product is reasonably priced in this instance might be a combination of comparing Internet prices, market trends for similar products, and being able to negotiate a better price with the seller. The former two would be something an agent might be able to provide visually with data mining techniques, the latter is a social feature inherently tied with markets. How one may determine whether a product is of good quality in this instance might be a combination of Internet published reviews about the product, technical forum views about the seller, and a visual 3D inspection of the product itself. Precisely what information is required and by when is another matter for further research. Furthermore, there are a wide variety of other e-market scenarios for which the constraints and the satisfying of the constraints become considerably more complex. An example is for a multi-tier auction for e-procurement over the Internet involving steel. In the e-market context, it becomes a much more complex problem of how one would best go about to achieve the best price (as a buyer and seller), however, it is reasonable to suspect that it certainly involves timely, accurate and reliable information, visualisation, and the ability to interact socially with those involved.
6.
CONCLUSION
This research has described an implementation of a collaborative virtual environment for e-markets permitting market activities and trade between actors represented as virtual avatars. By enabling a singular virtual location for activity, a smarter way of business is engaged by reducing information overload to well organized visualisations and an environment for interactive decision making and data mining. The merger of e-business with the feeling of ‘place’ supplied by the 3D virtual context provides for a more coherent and effective way of doing business.
44
Paul Bogg, Peter Dalmaris
By constructing e-markets in the virtual world context, we are also facilitating further research into what drives trade in the e-market context. Having a centralised market activity centre allows unobtrusive observations to be made about the way business is done, allowing discoveries about the emerging processes between actors to be uncovered. The reader may enter the UTS e-market place from the following URL (you will need Internet Explorer 6 or newer, and the Adobe Atmosphere plug-in): http://research.it.uts.edu.au/emarkets/tech_vw/demo/demo.html. The e-market is still in prototypical form and consequently no formal evaluations have been made at the time of writing.
REFERENCES [1] Sathi, A. and Fox, M., 1989, Constraint-directed negotiation of resource reallocation. In L. Gasser and M. Huhns, editors, Distributed Artificial Intelligence Volume II, San Mateo, California, 1989. Morgan Kaufmann. [2] Rakkolainen, I., Vainio, T., A 3D City Info for Mobile Users, Tampere University of Technology, Finland [3] Wise, R. and Morrison, D., 2000. Beyond the Exchange; The Future of B2B. Harvard Business review Nov-Dec 2000, pp86-96. [4] Kovalerchuk, B. and Vityaev E., 2000, Data Mining in Finance: Advances in Relational and Hybrid Methods. Kluwer, 2000. [5] Han, J. and Kamber, M., 2001, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, 2001. [6] Jain, A.K., Aparicio, M. and Singh, M.P., 1999 “Agents for Process Coherence in Virtual Enterprises” in Communications of the ACM, Volume 42, No 3, March 1999, pp62—69. [7] Jennings, N.R., Faratin, P., Norman, T.J., O’Brien, P. & Odgers, B. Autonomous Agents for Business Process Management. Int. Journal of Applied Artificial Intelligence 14 (2) 145—189, 2000. [8] Finin, F. Labrou, Y., and Mayfield, J. “KQML as an agent communication language.” In Jeff Bradshaw (Ed.) Software Agents. MIT Press (1997). [9] Robert Skinstad, R. “Business process integration through XML”. In proceedings XML Europe 2000, Paris, 12-16 June 2000. [10] Capin, T.K., Pandzic, I. S., Magnenat-Thalmann, N. and Thalmann, D. Avatars in Networked Virtual Environments. John Wiley and Sons, Chichester, 1999. [11] J.A. Waterworth. Spaces, places, landscapes and views: experiential design of shared information spaces. In A. J. Munro, K. Hook and D. Benyon (eds), Social Navigation of Information Space, Springer, London, pp. 132-154, 1999. [12] T. Manninen. Interaction in Networked Virtual Environments as Communicative Action – Social Theory and Multi-player Games. CRIWG2000 Workshop, Madeira, Portugal, IEEE Computer Society Press, October 18-20 2000. [13] Jaideep Srivastava and Robert Cooley, Web Mining for E-Commerce: Concepts and Case Studies J.Stefan Institute, Slovenia and Carnegie Mellon University USA
AN ARTIFICIAL NEURAL NETWORKS APPROACH TO THE ESTIMATION OF PHYSICAL STELLAR PARAMETERS Rodriguez, Alejandra Department of Information and Communication Technologies University of A Coruña. A Coruña. 15071. Spain
[email protected] Carricajo, Iciar Department of Navigation and Earth Sciences University of A Coruña. A Coruña. 15011. Spain
[email protected] Dafonte, Carlos Department of Information and Communication Technologies University of A Coruña. A Coruña. 15071. Spain
[email protected] Arcay, Bernardino Department of Information and Communication Technologies University of A Coruña. A Coruña. 15071. Spain
[email protected] Manteiga, Minia Department of Navigation and Earth Sciences University of A Coruña. A Coruña. 15011. Spain
[email protected] Abstract This paper presents an artificial neural networks approach to the estimation of effective stellar temperatures by means of optical spectroscopy.
46 The present work is included in a global project, whose final objective is the development of an automatic system for the determination of the physical and chemical parameters of stars. In previous works, we designed a hybrid system that integrated neural networks, fuzzy logic and expert systems to obtain the stellar spectral type and luminosity in the MK standard system. Considering those results, we now propose the design of several neural networks for the calculation of stellar temperatures. The proposed networks have been trained with synthetic spectra that were previously contrasted with statistical clustering techniques. The final networks obtained a success rate of 88% for public catalogue spectra. Our final objective is to calibrate the MK classification system, obtaining thus a new relation between the temperature and the spectral type.
Keywords: Neural Networks, Clustering Techniques, Stellar Physical Parameters, Optical Spectroscopy
1.
Introduction
This work is part of a global project that studies the last phases of stellar evolution. Our main objective is to provide an automatic system, based on artificial intelligence techniques, that contributes to the determination of chemical and physical stellar parameters by means of optical spectroscopy. Spectroscopy is among the most powerful currently available techniques for the study of stars and, in particular, their physical conditions (temperature, pressure, density, etc.) and chemical components (H, He, Ca, K, etc.). In general terms, a stellar spectrum consists of a black body continuum light distribution, distorted by the interstellar absorption and reemission of light, and by the presence of absorption lines, emission lines and molecular bands [Zombeck, 1990]. The stellar spectra are collected from telescopes with appropriate spectrographs and detectors. Observers collect the flux distribution of each object and reduce these data to obtain a one-dimensional spectrum calibrated in energy flux and wavelength (Å). As part of the above-mentioned global project, we have collected a sample of approximately 400 low-resolution stellar spectra from astronomical observations that were carried out at several telescopes. In order to extract useful information from the individual spectra and to study the evolution in the whole sample, we must determine the main stellar parameters, such as spectral type, luminosity, temperature, surface gravity, metallicity, etc.
ANNs for the Estimation of Stellar Parameters
47
The estimation of the stellar parameters is often carried out by human experts, who analyse the spectra by hand, with no more help than their own experience. These manual analyses usually lead to a MK classification of the spectra. The MK classification system was firstly proposed in 1943 by Morgan, Keenan & Kellman [Morgan et al., 1943] and has experienced many revisions ever since [Kurtz, 1984]. This bidimensional system is the only one that is widely used for stellar classification. One of its main advantages is that MK classifications are often static, because they are based on the visual study of the spectra and on a set of standard criteria. However, the same spectra can be classified differently by different experts and even differently by the same person at different times. Any classification system should hold a compromise between maintaining the full information of the spectra and the need for a compact summary of this information. An optimal summary is obviously obtained by a study of the physical parameters. The manual techniques that are currently used to estimate stellar parameters are very time-consuming and involve a great amount of human resources. It would be highly advisable to dispose of automatic, fast and efficient computational techniques that allow the experts to classify a large number of spectra according to their physical parameters. In previous works, we developed an expert system for the classification of the stellar spectra of Post-AGB stars in the visible electromagnetic spectral range. The obtained results led us to extend this system to stars of different luminosities and to add new computational techniques, such as fuzzy logic, in order to refine the automatic processing of spectra. We have also tested feed-forward, radial-basis functions (RBF) and self-organising neural networks in order to obtain the spectral type and luminosity of stars through the analysis of their optical spectra. Combining both techniques, we formalised a hybrid system that obtains MK classifications and is able to determine the most appropriate method for each spectrum type. A complete description of these works can be found in [Rodriguez et al., 2003]. Our previous developments have proven that artificial techniques allow researchers to obtain the spectral type and luminosity of stars in a fast, easy and accurate way. Neural networks in particular have given excellent results, which is why this paper proposes the design and implementation of several artificial neural networks to estimate the effective stellar temperature. By carrying out a sensibility analysis of this technique in the estimation of the physical parameters of stars, we can determine the best learning algorithm and network structure for this specific problem.
48
2.
Methodology
As mentioned in Sect. 1, we have implemented artificial techniques in order to obtain MK classifications. That first approach combined signal processing [Kalouptsidis, 1997], knowledge-based systems [Buchanan and Shortliffe, 1984], fuzzy logic techniques [Bouchon-Meunier et al., 1991] and artificial neural networks models [Arbib, 1995], integrated by a relational database that stores and structures all the stellar information. Table 1 contrasts the behaviour of the artificial techniques and that of two human experts who collaborated on this project, applied to 100 unclassified spectra belonging to the standard atlas of spectra from [Pickles, 1998].
These results led us to develop a final hybrid system based on an expert system that determines the global type of each star and, according to the type, sends the spectra to different neural networks in order to obtain their spectral type as well as their luminosity level. The success rate of that system was very similar to the agreement percentage between experts in the field (about 80%). This paper does not describe the development in detail: a more complete description of the models and stellar parameters used can be found in [Rodriguez et al., 2003]. We only include a brief explanation of the morphological algorithms so as to clarify how the spectral parameters, used as the input layer of most of the proposed neural networks models, are obtained and measured.
2.1
Morphological Analysis
Before presenting the spectra to the neural networks, we carry out a morphological analysis of all the spectra in order to obtain the values of the parameters that characterise each spectrum separately. These spectral parameters include the measurement of 25 spectral features that can be divided into three types:
ANNs for the Estimation of Stellar Parameters
49
Absorption and emission lines: including hydrogen, helium and metallic lines (Ca, K, etc.). Molecular bands: hydrogen and carbon absorption bands. Rates between lines: CH-K rates, He-H rates. From a morphological point of view, an absorption line is a descending (ascending for emission) deep peak that appears in an established wavelength zone [Zwicky, 1957]. To accurately calculate the intensity of each line, we carry out an estimation of the local spectral continuum. That is
where is the estimation of the continuum for sample is the flux in sample N is the number of values used to calculate the local spectral continuum, and X is a binary vector that indicates the representative fluxes of the spectral continuum in the zone. A molecular band is a spectral zone in which the flux suddenly decreases from the local continuum during a wide lambda interval [Zwicky, 1957]. For the molecular bands, this means that we only have to measure their energy to decide if they are significant enough. That is
where is the flux of the band between the samples and L is the projection line, E is the flux function, the wavelength, the left limit of the band and the right limit.
2.2
Training and Validation Spectra
In order to build the training set of the artificial neural networks that will be applied to the problem of estimation of stellar effective temperatures, we have chosen a complete and consistent set of synthetic optical spectra. The selected spectra were generated using the SPECTRUM v. 2.56. software written by Richard O. Gray. This software is a stellar spectral synthesis program that allows us to obtain prototype spectra of a chosen effective temperature and surface gravity. SPECTRUM is distributed with an atomic and molecular line list for the optical spectral region 3500 Å to 6800 Å, called luke.lst, suitable for computing synthetic spectra with temperatures between 4000K and 20000K. The details on the physics included in this sofware can be found in [Gray, 2003].
50
For the generation of synthetic spectra we also used the models calculated by Robert Kurucz [Kurucz, 1979]. The atmospheric model is a tabulation of temperature and pressure at range of mass depths in the stellar photosphere, calculated on the basis of a variety of opacity sources. Each model is characterised by four parameters: effective temperature metallicity ([M/H]), microturbulence velocity and surface gravity We generated a total of 350 solar metallicity spectra with effective temperatures between 4000K and 20000K and surface gravity from 0.5 to 5.0. This set of synthetic spectra covers the spectral range K-B with luminosity levels I to V (in the MK system). The spectra were generated in the 3500-6800 Å range, which is the wavelength interval where the software is capable of correctly obtaining synthetic spectra. Before training the neural networks, the synthetic spectra were contrasted with statistical clustering techniques in order to verify their suitability and to avoid dispersions in the whole sample. Grouping techniques are often used as a tool to verify and test the quality of an established clustering of data: the grouping algorithms make their own division into classes of the input data. Our purpose was to try to discard those spectra that were not similar to any others. We implemented the max-min and K-clustering algorithms [Kaufman and Rousseuw, 1990]. The data groups obtained by these two techniques prevent us from using some of the initial spectra that could not be included in any of the clusters. We also observed that the algorithms were able to accurately group the spectra of low and medium temperatures (4000K-8000K), although both techniques stored all the spectra with temperatures above 8000K in the same category. Obviously, they are not capable of separating spectra with temperatures higher than 8000K. Finally, we built the training set with 348 synthetic spectra, excluding those that were discarded by the clustering techniques. As for the test set of spectra, we used the public catalogue of Pickles [Pickles, 1998], choosing only the 79 spectra that match our temperature range.
2.3
Artificial Neural Networks Implementation
The neural networks used in this experimentation are based on supervised learning models and backpropagation networks [Rojas, 1996] in particular. In our experiments, we have made use of three different learning algorithms: Standard backpropagation: as the most common learning algorithm, it updates the weights after each training pattern.
ANNs for the Estimation of Stellar Parameters
51
Enhanced backpropagation: it uses a momentum term that introduces the old weight change as a parameter for the computation of the new weight change. Batch Backpropagation: in standard backpropagation, an update step is performed after each single pattern; in batch backpropagation, all the weight changes are summed over a full presentation of all the training patterns (one epoch). Only then, an update with the accumulated weight changes is performed. As for the topology, the different networks that were implemented are shown in Table 2. These topologies have been tested for the three backpropagation learning algorithms. We built the input patterns for the nets with 659 flux values (from 3510 Å to 6800 Å), as well as with the 25 spectral features obtained by means of the morphological algorithms described in Sect. 2.1. The output of all networks is a continuous function that obtains the effective temperature in the form
We used the Stuttgart Neural Network Simulator (SNNS v.4.1) [Zell, 2002] to implement the above described networks and to transform them into C code, so they can be integrated into the previously developed tool. In the training phase, the weights were updated in a topological order and initiated randomly with values in the [-1, 1] interval. During our experiments, we found that the best topology corresponds to an enhanced backpropagation network 659:10:5:3:1; that is, a network with 659 flux values in the input layer, three hidden layers with 10, 5 and 3 neurons respectively, and 1 neuron for the effective temperature in the output layer.
52
2.4
Enhanced Techniques
While analysing the success rates of the implemented neural networks for each different temperature, we observed that all the networks have a higher performance for temperatures between 4000K-8000K (less than 200K mean error). Considering this fact, we separated the input spectra into two sets (4000K-8000K and 8000K-20000K) and trained the best performance network once again, but only with the patterns that correspond to spectra with temperatures in the interval 8000K-20000K. This “overtrained” network resulted in a better performance for all the test spectra. An additional research consisted in analysing the weights of the output layer units of the best network so as to determine, for each temperature interval, which input parameters have more influence on the output. We were able to reduce the inputs of the network because we considered only the sufficiently significant flux values. This enhanced strategy allowed us to obtain networks that converge sooner, and has resulted in a significant improvement of the performance of the original artificial neural networks (around 2%).
3.
Discussion and Results
The best implemented neural networks are able to determine the temperature of the 79 selected spectra from the test catalogue with an error rate below 15% and a mean deviation of 300 K, according to the estimation made by Pickles [Pickles, 1998]. The experts in this field usually work with a mean error of 10% of the correct temperature. Taking this error rate into account, our neural networks approach is able to correctly estimate 88% of the effective temperatures with a minimal mean deviation of 3% (for the coolest stars) and a maximum mean deviation of 7% (for the hottest stars). The final performance is shown in Table 3. As shown in the results, the best temperature estimations are reached for cooler stars (4000K-8000K), whereas the stars with higher temperatures present a higher error rate; one explanation of this particularity could be that the training set includes few stars of this temperature range, because the models on which we based the generation of the synthetic spectra (Kurucz) can generate only 1 “hot spectrum” for every 4 “cool spectra”. Including more spectra for higher temperatures is not meaningful, because the error bar for this interval is established in 1000K. So, for higher temperatures, we can only include one sample spectrum for every 1000K, whereas for cool stars, where the error bar is 250K, we can include four samples. In future extensions, we hope
ANNs for the Estimation of Stellar Parameters
53
to increase the success rate for hot stars by designing a net of neural networks that includes different trained networks for hot and cool stars.
4.
Conclusions
This paper has presented a computational approach to the estimation of the physical parameters of stars and, in particular, the calculation of effective temperature by means of artificial neural networks. We have described several models of neural networks and analysed their performance and results to discover the best approach to the estimation of each temperature range. Backpropagation networks were trained with approximately 350 synthetic spectra that were previously contrasted with statistical clustering techniques. In order to obtain the input patterns of the neural networks and extract and measure spectral features, we used the morphological analysis algorithms that were developed in the expert systems approach. Several networks were trained with this parameterisation, and other networks with full flux values; the second strategy yielded the best performance. The best networks have reached a success rate of approximately 88% for a sample of 79 testing spectra from public catalogues. For the development and evaluation of the system, we can count on the essential collaboration of experts from the area of Astronomy and Astrophysics of the University of A Coruña. As an additional study, the best implemented networks were analysed to determine the flux values that are more influential in the output for each temperature interval; by training the networks with these small input patterns, we increased the performance of the original networks. At present, we are analysing other types of neural structures and learning algorithms in order to improve the classification properties. We are also studying the influence of differences in signal to noise ratio in the resulting classification, since our final objective is to obtain a calibration that can determine our own relation between temperature
54
and MK classifications. We are also completing the development of our stellar database, STARMIND (http://starmind.tic.udc.es), to make it accessible through the Internet.
Acknowledgments The authors acknowledge support from grants AYA2000-1691 and AYA2003-09499, financed by the Spanish Ministerio de Ciencia y Tecnología.
References [Arbib, 1995] Arbib, M. A. (1995). The Handbook of Brain Theory and Neural Networks. MIT Press, Cambridge, Massachusetts. [Bouchon-Meunier et al., 1991] Bouchon-Meunier, R., Yager, R., and Zadeh, L. A. (1991). Uncertainty in Knowledge Bases. Springer-Verlag, Berlin. [Buchanan and Shortliffe, 1984] Buchanan, B. and Shortliffe, E. (1984). Ruled-based Expert Systems. Reading MA: Adisson-Wesley. [Gray, 2003] Gray, R. (2003). Spectrum: A stellar spectra synthesis program. http://www1.appstate.edu/dept/physics/spectrum. [Kalouptsidis, 1997] Kalouptsidis, N. (1997). Signal Processing Systems: Theory and Design. Wiley, New York. [Kaufman and Rousseuw, 1990] Kaufman, L. and Rousseuw, P. J. (1990). Finding Groups in Data. Wiley, New York. [Kurtz, 1984] Kurtz, M. J. (1984). The MK Process and Stellar Classification of Stars. David Dunlap Observatory, Toronto. [Kurucz, 1979] Kurucz, R. L. (1979). Model atmospheres for G, F, A, B and O stars. The Astrophysical Journal Suppl., 40. [Morgan et al., 1943] Morgan, W. W., Keenan, P. C., and Kellman, E. (1943). An Atlas of Stellar Spectra with an Outline of Spectral Classification. University of Chicago Press, Chicago. [Pickles, 1998] Pickles, A. J. (1998). A stellar spectral flux library. 1150-25000 å. Publications of the Astronomical Society of the Pacific, 110:863–878. [Rodriguez et al., 2003] Rodriguez, A., Dafonte, C., Arcay, B. and Manteiga, M. (2003). An Artificial Neural Network Approach to Automatic Classification of Stellar Spectra. In Proc. 7TH. Int. Work Conf. on Artificial and Natural Neural Networks (IWANN’03), 2: 639–646, Menorca, Spain. [Rojas, 1996] Rojas, R. (1996). Neural Networks. A Systematic Introduction. Springer-Verlag, Berlin. [Zell, 2002] Zell, A. (2002). Stuttgart neural network simulator. http://wwwra.informatik.uni-tuebingen.de/SNNS/. [Zombeck, 1990] Zombeck, M. V. (1990). Handbook of Astronomy and Astrophysics. Cambridge University Press, Cambridge, second edition. [Zwicky, 1957] Zwicky, F. (1957). Morphological Astronomy. Springer, Berlin.
EVOLUTIONARY ROBOT BEHAVIORS BASED ON NATURAL SELECTION AND NEURAL NETWORK* Jingan Yang1,2, Yanbin Zhuang2, Hongyan Wang1 1
Institute of Artificial Intelligence, Hefei University of Technology Hefei 230009, Anhui, P. R. China
[email protected] 2
School of Computer Science & Engineering, Changzhou Institute of Technology Changzhou 213002, Jiangsu Province, P. R. China
[email protected] Abstract
The methodology of artificial evolution based on the traditional fitness function is argued to be inadequate for constructing the entities with behaviors novel to their designers. Evolutionary emergence via natural selection(without an explicit fitness function) is a promising way. This paper primarily considers the question of what to evolve, and focuses on the principles of developmental modularity based on neural networks. The connection weight values of this neural network are encoded as genes, and the fitness individuals are determined using a genetic algorithm. In paper we has created and described an artificial world containing autonomous organisms for developing and testing some novel ideas. Experimental results through simulation have demonstrated that the developmental system is well suited to long-term incremental evolution. Novel emergent strategies are identified both from an observer’s perspective and in terms of their neural mechanisms.
Keywords:
Natural selection, development neural network, evolutionary algorithm, novel behavior.
Introduction The goal of artificial life presents us with the problem that we do not understand natural life well enough to specify it to a machine. Therefore we must either increase our understanding of it until we can, or create a system which outperforms the specifications we can give it. The first possibility includes the traditional top-down methodology, which is clearly as inappropriate for ALife as it has been proved to be for artificial intelligence. It also includes man* Partial
funding provided by the Internatioanl Centre for Theoretical Physics, Trieste, Italy
56
ual incremental(from bottom to up) construction of autonomous systems with the aim of increasing our understanding and ability to model life by building increasingly impressive systems, retaining functional validity by testing them within their destination environments. The second option is to create systems which outperform the specifications given them and which are open to produce behaviors comparable with those of (albeit simple) natural life. Evolution in nature has no(explicit) evaluation function. Through organism-environment interactions, including interactions between similarly-capable organisms, certain behaviors fare better than others. This is how the non-random cumulative selection works without any long-term goal. It is why novel structures and behaviors emerge. As artificial evolution is applied to increasingly complex problems, the difficulty in specifying satisfactory evaluation functions is becoming apparent–see [1], for example. At the same time, the power of natural selection is being demonstrated in prototypal systems such as Tierra and Poly-World[2]. Artificial selection involves the imposition of an artifice crafted for some cause external to a system beneath it while natural selection does not. Natural selection is necessary for evolutionary emergence but does not imply sustained emergence (evermore new emergent phenomena) and the question “what should we evolve?” needs to be answered with that in mind[3]. This paper sets out to answer that question. Further discussion concerning evolutionary emergence can be found in [4], along with evaluations of other natural selection systems. Note that an explicit fitness landscape is not a requirement for artificial selection and so an implicit fitness landscape does not imply natural selection. General issues concerning long-term evolution have been addressed by Harvey’s Species Adaptation Genetic Algorithm’(SAGA) theory[5]. He demonstrates that changes in genotype length should take place much more slowly than crossover’s fast mixing of chromosomes. The population should be nearlyconverged, evolving as species. Therefore the fitness landscape (actual or implicit) must be sufficiently correlated for mutation to be possible without dispersing the species in genotype space or hindering the assimilation by crossover of beneficial mutations into the species.
1.
Designing Evolutionary Robot
Neural networks are the straightforward choice because of their graceful degradation (high degree of neutrality). But how should the network structure be specified? The evolutionary emergence of novel behaviors requires new neural structures. We can expect most to be descended from neural structures that once have different functions. There are many known examples of neural structures that serve a purpose different from a previous use. Evidence from gene theory tells us that genes are used like a recipe, not a blueprint. In any cell,
57
at any stage of development, only a tiny proportion of the genes will be in use. Further, the effect that a gene has depends upon the cell’s local environment–its neighbors. Two sections above are related: for a type of module to be used for a novel function(and then to continue to evolve from there), without loss of current function, either an extra module must be created or there must be one spare’ (to alter). Either way, a duplication system is required. This could be either by gene duplication or as part of a developmental process. Gene duplication can be rejected as a sole source of neural structure duplication, because the capacity required to store all connections in a large network without a modular coding is genetically infeasible. Therefore, for the effective evolutionary emergence of complex behaviors, a modular developmental process is called for. For the sake of research validity (regarding long-term goals), this should be included from the outset. Gruau’s cellular encoding: Gruau used genetic programming(GP) to evolve his cellular programming language’ code for developing modular artificial neural networks[6, 7]. The programs used are trees of graph-rewrite rules whose main points are cell division and iteration. The crucial shortcoming is that modularity can only come from either gene duplication (see objections above) or iteration. But iteration is not a powerful enough developmental backbone. Consider, for example, the cerebral cortex’s macro-modules of hundreds of mini-columns. These are complicated structures that cannot be generated with a repeat one hundred times: mini-column’ rule. There are variations between modules. Cellular automata: Many researchers have used conventional cellular automata (CA) for the construction of neural networks. However, such work is more at the level of neuron growth than the development of whole networks. Although CA rules are suited to the evolution of network development in principle, the amount of work remaining makes this a major research hurdle. Diffusion models: while there are a number of examples of work involving the evolution of neural networks whose development is determined by diffusion along concentration gradients, the resulting network structures have (to date) been only basic. So as to concentrate on the intended area of research, these models have also been passed over. Lindenmayer systems: Kitano used a context-free L-system[8, 9] to evolve connectivity matrices. The number of rules in the genotype was variable. Boers and Kuiper used a context-sensitive L-system to evolve modular feed-forward network architecture[10, 11]. Both these works used backpropagation to train the evolved networks. Also, the resulting structures were fully-connected clusters of unconnected nodes(i.e. no links within clusters and if one node in cluster A is linked to one node in cluster B then all nodes in A are linked to all in B). It may be that the results achieved reflect the workings of backpropaga-
58
Figure 1.
Schematic block diagram of a neuron (from Cliff, Harvey and Husbands, 1992).
tion more than evolution. However, these works demonstrated the suitability of L-systems to ‘non-iterative modular’ network development.
2.
The Neural and Development Systems
The artificial neural networks used here are recurrent networks[12, 13]. Let the current node be and excitatory output at time is inhibitory output is Let the node which links to the current node be and the value of noise be then
Where
59
Developmental system: A context-free L-system was designed for the evolution of networks of these neurons. Specific attention was paid to producing a system in which children’s networks resemble aspects of their parents’. Each node has a bit-string ‘character’(label) associated with it, initialized at construction and modifiable during development. These characters may be of any non-zero length. A node may be network input, network output, or neither, as determined by an axiom(birth) network and development. A production rule matches a node if its predecessor is the start of the node’s character. The longer the matching predecessor, the better the match; the best matching rule (if any) is applied. Thus ever more specific rules can evolve from those that have already been successful. The production rules have the following form:
Where: P Predecessor (initial bits of node’s character) Successor 1: replacement node’s character Successor 2: new node’s character bits link details [0=no,1=yes] reverse types [inhibitory/excitatory] of (input, output) links on (inhibitory, excitatory) link from to (inhibitory, excitatory) link from to If a successor has no character (0 length) then that node is not created. Thus this predecessor node may be replaced by 0, 1 or 2 nodes. The ‘replacement’ successor (if it exists) is just the old (predecessor) node, with the same links but a different character. The new successor (if it exists) is a new node. It inherits a copy of the old node’s input links unless it has a link from the old node or It inherits a copy of the old node’s output links unless it has a link to the old node or New network input nodes are (only) produced from network input nodes and new network output nodes are (only) produced from network output nodes. Character-based matching of network inputs and outputs ensures that the addition or removal of nodes later in development or evolution will not damage the relationships of previously adapted network inputs and outputs. Genetic decoding of production rules: The genetic decoding is loosely similar to that in [10]. For every bit of the genotype, an attempt is made to read a rule that starts on that bit. A valid rule is one that starts with 11 and has enough bits after it to complete a rule. To read a rule, the system uses the idea of ‘segments’. A segment is a bit string with its odd-numbered bits all 0. Thus the reading of a segment is as follows: read the current bit; if it is a 1 then stop; else read the next bit–this is the next information bit of the segment; now start over, keeping track of the information bits of the segment. Note that a segment can
60
be empty (have 0 information bits). The full procedure to (try to) read a rule begins with reading a segment for each of the predecessor, the first successor (replacement node) and the second successor (new node). Then the six linkdetails bits represented by the binary strings are read, if possible. This example is represented in table 1.
3.
Experimental Scheme
To develop and validate the method above, a simple ALife system has been created. ‘Life’ is a two-dimensional world of artificial organisms each of which controlled by a neural network using the developmental system above. Evolution is strictly controlled by natural selection. There are no global system rules that delete organisms; this is under their own control. The experimental world (Figure 2) is divided into a grid of squares; usually 20 × 20 of them. No two individuals may be within the same square at any time. This gives the organisms a ‘size’ and puts a limit on their number. They are otherwise free to move around the world, within and between squares. As well as a position, each organism has a forward (facing) direction, set randomly at birth.
Algorithm Initialization Every square in the world has an individual with a single-bit genotype ‘0’ born into it. Main Loop In each time step (loop), every individual alive at the start of the cycle is processed once. The order in which the individuals are processed is otherwise random. These are the steps involved for each individual: Network inputs are updated. Development–one iteration. Update all neural activations including network outputs.
61
Figure 2. The experimental world (a two-dimensional toroidal virtual world containing autonomous organisms).
Actions associated with certain network outputs are carried out according to those outputs. These actions are: “reproduce”, “fight”, “turn clockwise”, “turn anti-clockwise”, and “moveforward”.
Neural Network Details The axiom network consists of three nodes with two excitatory links:
The network output node’s character (01) matches reproduction. The network input node’s characters(left input 01) match this, without matching any of the other action characters. Finally, the hidden node’s characters neither match other node’s characters or the action characters nor are matched by the other nodes’ or the action characters. Development takes place throughout the individual’s life, although necessary limits on the number of nodes and links are imposed.
Interactions Between Organism and Environments Five built-in actions are available to each organism. Each is associated with network output nodes whose characters start with a particular bit-string:
01* Try to reproduce with organism in front 100* Fight: Kill organism in front (if there is one) 101* Turn anti-clockwise 110* Turn clockwise
62
111* Move forward (if nothing in the way) For example, if a network output node has the character string 1101001, the organism will turn clockwise by an angle proportional to the node’s excitatory output. If an action has more than one matching network output node then the relevant output is the sum of these nodes’ excitatory outputs, bounded by unity as within any node. If an action has no output node with a matching character, then the relevant output is noise, at the same level as in the (other) nodes. Both reproduce and fight are binary actions. They are applied if the relevant output exceeds a threshold and have no effect if the square in front is empty. To turn and move forward are done in proportion to output. When an organism reproduce with a mate in front of it, the child is placed in the square beyond the mate if that square is empty. If it is not, the child replaces the mate. An organism cannot reproduce with an individual that is fighting if this would involve replacing that individual. Reproduction involves crossover and mutation. Life’s crossover always offsets the cut point in the second individual by one gene, with equal probability either way–which is why the genotype lengths vary. Mutation at reproduction is a single gene-flip(bitflip). An organism’s network input nodes have their excitatory inputs set to the weighted sum of ‘ matching’ output nodes’ excitatory outputs from other individuals in the neighborhood. If the first bit of an input node’s character is 1 then the node takes its input from individuals to the right hand side (including forward-and back-right), otherwise from individuals to the left. An input node ‘matches’ an output node if the rest of the input node’s character is the same as the start of the character of the output node. For example, an input node with character 10011 matches(only) output nodes with character’s starting with 0011 in the networks of individuals to the right. Weighting is inversely proportional to the Euclidean distances between individuals. Currently the input neighborhood is a 5 × 5 area centered on the relevant organism.
4.
Experimental Results
Kin similarity and convergence: When two Life organisms (with networks developed from more than just a couple of production rules each) reproduce, the child’s network almost always resembles a combination of the parents’ networks. Examination of networks from Life’s population at any time shows similarities between many of them. The population remains nearly converged, in small numbers of species, throughout the evolution. The criterion of a sufficiently correlated (implicit) fitness landscape has been met by the developmental system, making it suitable for long-term evolution. Emergence of increasing complexity: Once Life has started, there is a short period while genotype lengths increase until capable of containing a production rule. For the next ten to twenty thousand time steps (in typical runs),
63
Figure 3.
A neural network of a dominant organism.
networks resulting in very simple strategies such as ‘do everything’ and ‘always go forwards and kill’ dominate the population. Some networks do better than others but not sufficiently well for them to display a dominating effects. In every run to date, the first dominant species that emerges has been one whose individuals turn in one direction while trying to fight and reproduce at the same time. Figure 3 shows an example of such an individual, after the user dragged the nodes apart to make detailed examination possible. Note the outputs o101, o01 and o100 (turn anti-clockwise, reproduce and fight). Note also the large number of links necessary to pass from inputs to outputs, and the input characters which match non-action output characters of the same network (o000, o00). Individuals of this species use nearby members, who are also turning in circles, as sources of activation (so keeping each other going). Although this is a very simple strategy, watching it in action makes its success understandable. Imagine running around in a small circle stabbing the air in front of you. Anyone trying to attack would either have to get his timing exactly right or approach in a faster spiral–both relatively advanced strategies. These individuals also mate just before killing. The offspring (normally) appear beyond the individual being killed, away from the killer’s path. Because of the success of this first dominant species, the world always has enough space for other organisms to exist. Such organisms tend not to last long; almost any movement will bring them into contact with one of the dominant organisms. Hence these organisms share some of the network morphology of the dominant species. However, they can make some progress: Individuals have emerged that are successful at turning to face the dominant species and holding their direction while trying to kill and reproduce. An example of such a ‘rebel’ (from the same run as Figure 3 is shown in Figure 4. Running averages of the number of organisms reproducing and killing (Figure 5) suggest that further species emerge. However, organisms have proved
64
Figure 4.
Figure 5.
A neural network of a rebel organism.
Typical running averages of the number of organisms reproducing and killing.
difficult to analyze beyond the above, even at the behavioral level. All that can currently be said is that they share characteristics of the previous species but are different.
5.
Conclusions
The algorithm proposed in this paper is novel, feasible. Although the behaviors that emerged are very low-level, they are encouraging nonetheless, for the increases in complexity were in ways not specified by the design. It is difficult to evaluate any ongoing emergence because of the difficulty in analyzing later
65
organisms. Either tools to aid in such analysis will have to be constructed, or a more transparent system should be created. In work involving pure natural selection, the organisms’ developmental and interaction systems are analogous to the fitness functions of conventional genetic algorithms. While the general aim involves moving away from such comparisons, the analogy is useful for recognizing how the epistasis of fitness landscape issue transfers across: Certain ontogenetic and interaction systems can result in individuals with similar genotypes but very different phenotypes. The results show that Life’s developmental system does not suffer from this problem, making it suitable for long-term incremental evolution. This is a significant result for a modular developmental system. This work has made it clear that the specification of actions, even at a lowlevel, results in the organisms being constrained around these actions and limits evolution. Alternatives in which the embodiment of organisms is linked to their actions that need to be further investigated. 1 The evolutionary robots can learn and integrate the visual behaviors and attention autonomously with biologically motivated visual behaviors cooperated as parallel, complementary and “highly coupled processes” in the tracking systems, simplifying the acquisition of perceptual information and system modeling and control; 2 Combining competitive coevolutionary genetic programming with evolutionary learning, and creating evolutionary robotics with advanced complex vision systems, performing more complex obstacle avoidance behavior and target approach activity; 3 Evolutionary robots can solve out problems on obstacle avoidance and target approach autonomously by self-learning environments knowledge based on competitive coevolutionary genetic programming.
Acknowledgements Authors would like to thank the reviewers for their helpful revising suggestions and comments.
References [1] Zaera N., Cliff D., and Bruten J.(1996). Evolving collective behaviours in synthetic fish. Proceedings of SAB96, MIT Press Bradford Books, 635–644. [2] Ray, T. S.(1991). An approach to the synthesis of life. In Langton, C.; Taylor, C.; Farmer, J.; and Rasmussen, S. eds., Artificial Life II, 371–408. Redwood City, CA: Addison-Wesley. [3] Yaeger, L.(1993). Computational genetics, physiology, metabolism, neural systems, learning, vision, and behavior or polyworld: Life in a new context. In Langton, C. G., ed., Artificial Life III, 263–298.
66 [4] Malrey Lee(2003). Evolution of behaviors in autonomous robot using artificial neural network and genetic algorithm. International Journal of Information Sciences, Vol.155, Issue 1-2, pp.43-60. [5] Harvey, I.(1993). Evolutionary robotics and SAGA: the case for hill crawling and tournament selection. In Langton, C. G., ed., Artificial Life III. [6] Koza, J. R.(1992). Genetic Programming. Cambridge, MA: MIT Press/Bradford Books. [7] Gruau, F.(1996). Artificial cellular development in optimization and compilation. Technical Report, Psychology Department, Stanford University, Palo Alto, CA. [8] Lindenmayer, A.(1968). Mathematical models for cellular interaction in development. Journal of Theoretical Biology (Parts I and II), 18: 280–315. [9] Kitano, H. (1990). Designing neural networks using genetic algorithms with graph generation system. Complex Systems, 4: 461–476.
[10] Boers, E. J. and Kuiper, H.(1992). Biological metaphors and the design of modular artificial neural networks. Master’s thesis, Departments of Computer Science and Experimental Psychology, Leiden University. [11] Yang, J. A. and Wang, H. Y.(2002). Research on combined behavior method for evolutionary robot based on neural network. Journal of Shanghai Jiao Tong University, Vol.36, Sup. pp.89–95. [12] Wang, H. Y. and Yang, J. A.(2000). Research on evolutionary robot behavior using developmental network, Acta Electronica Sinica, 28(12): 41–45. [13] Wang, H. Y., Yang, J. A. and Jiang, P.(2000). Research on evolutionary robot behavior using developmental network [J]. Journal of Computer & Development, 37(12): 1457–1465.
CONTROL OF OVERHEAD CRANE BY FUZZYPID WITH GENETIC OPTIMISATION
A. Soukkoul, A. Khellaf2, S. Leulmil 1Department of Electrotechnics, University of Skikda 21000, Algeria. 2Department of electronics, University Ferhat Abbas-Sétif 19000, Algeria
[email protected];
[email protected];
[email protected] Abstract:
A fuzzy logic controller with the fuzzy knowledge base: scaling factors of the input/output variables, membership functions and the rules are optimized by the use of the genetic algorithms, is presented in this work, and its application in the highly nonlinear systems. The fuzzy structure is specified by a combination of the mixed Sugeno’s and Mamdani’s fuzzy reasoning. The mixed, binary-integer, coding is utilized to construct the chromosomes, which define the set of necessary prevailing parameters for the conception of the desired controller. This new controller stands out by a non standard gain (output scaling factor) which varies linearly with the fuzzy inputs. Under certain conditions, it becomes similar to the conventional PID controller with non-linearly variable coefficients. The results of simulation show, well, the efficiency of the proposed controller.
Key words:
Genetic algorithms, fuzzy controller, PID
1. INTRODUCTION The control of complex systems, that are nonlinear or contains an unknown dynamics, is a difficult task. In addition, the desired performances become more and more severe. It is one of the raisons that lead to the apparition of new control methods that are more sophisticated. The apparition of the fuzzy logic formalism [1] gave birth of a new approach, where the control laws are replaced by a set of linguistic rule, determines the control strategies and objectives. This approach requires the knowledge of the human appraisal (experience). The fuzzy systems
68
A. Soukkou, A. Khellaf, S. Leulmi
performances depend on the exactness of the obtained knowledge’s. These problems have lead to the development of hybrid systems that could unite the training capacities of the neural networks and the decision capacities of fuzzy systems. The main objective of hybrid systems is the design of fuzzy controllers with optimum parameters. In this hybridization, the network topology is specified by the type of the inference and by the fuzzy rules used (Sugeno’s or Mamdani reasoning). The training allows us to adjust the weights of the network, that constitute the base of the fuzzy knowledge, by the minimization of a given cost function that depends on the performances and the desired control strategy. The backpropagation, based upon the usual gradient method, suffers from convergence problems. This constitutes a constraint that degrades the efficiency degrees of the fuzzy neural controllers. The genetic algorithms (GA), inspired from the natural genetics [2], become a powerful competitor in the parameter optimisation domain necessary for the design of optimal fuzzy controllers. Designing an optimal fuzzy controller is, then, a selection of an optimal vector of parameters or variables that will minimize a certain cost function subject to some constraints. Then, we can expect that simultaneous use of the combined fuzzy-genetic approaches lead to the construction of a fuzzy inference system whose structural parameters (its fuzzy knowledge base) are optimized by the GA. The main objective is to exploit the effectiveness of each to build more flexible and efficient controllers. This article is divided in two main parts: Description of the methodology of conception of the fuzzy neural controller. Simulation and results of the application of the new controller on an example of nonlinear systems.
2.
METHODOLOGY OF CONCEPTION
Mamdani’s and Sugeno’s models are the basis of the majority of the fuzzy controllers used in the literature of the domain of the fuzzy logic control. The main difference between the two reasonings resides in the way of assessment of the consequences of the fuzzy rules. In the method of Mamdani [3-4], every rule is represented by a conjunction (MIN) and the aggregation of the rules by a disjunction (MAX). In this model, the consequences are independent of the controller’s entries. The link between the conditions and the actions is achieved, solely, by operators.
Control Of Overhead Crane By Fuzzy-PID
69
Takagi-Sugeno and Kang [5] proposed a model to develop a systematic approach for a generation of fuzzy rules from the data set of input-output (I/O). A rule of this type has the following shape:
Generally
possesses a polynomial form of
and
This form is called the model of Sugeno of order one. In some cases, must be constant (Sugeno of order zero). The determination of the coefficients of the consequences must be made by a parametric identification method [6]. The methods of defuzzification adapted to this form of rules are based on the approach: Defuzzify Without Combining (DWC). The more used is the one weighted-average defuzzification, given by:
Where M is the number of the rules and is the degree of activation of the ith rule calculated either by the operator of conjunction (MIN) or the arithmetic product.
The simplicity and the reduced execution time present two advantages of the Sugeno’s approach. Its simplicity of implementation has attractes the attention of several researchers [7-8]. The main goal is to develop new technics to reduce the number of parameters of the consequences that burdens the Sugeno’s method. Hao Ying [9] proposed an analytical method of treatment of the Sugeno controller’s. The output polynomials are proportional one of the other. While introducing some modifications to the level of the form of the rules, one gets:
70
A. Soukkou, A. Khellaf, S. Leulmi
Where,
Using the weighted-average defuzzification method, the digital output for a set of entries and is given by:
Where M is the number of the rules and is the degree of activation of the rule calculated either by the operator of conjunction (MIN) or the arithmetic product, represents the center of gravity of the fuzzy set output (singleton) of the rule. Under the vectorial form:
The expression (7) becomes:
Where,
Control Of Overhead Crane By Fuzzy-PID
71
The expression represents an association of the linear controllers, which contain the linear PID controller as a special case The following three-input variables are used by the discrete-time PID control in incremental form:
Where and y(n) are a set-point/reference signal for process output and the process output at sampling time n, respectively. According to data of the problem to treat, the output variable can be the action of control or its variation. It is sometimes discriminating to formulate the rules of inference so that the response signal intervenes like increment of the control signal between the instants n and n-1:
With respects of the saturation regions:
The increment permits to introduce an integral behaviour to the global controller level. In the rest of this paper, we denote the output and the change in output as U(n) and at time n, respectively. In the case of the direct action control, the controller’s output form will be given by:
The law of PD controller in its incremental or discreet form is given by:
Where, and are constant gains named proportional-gain, and derivative-gain, respectively. It returns the controller equivalent to a PD with gains and variables and non-linear, since the output of a fuzzy system is non-linear in relation to these entries.
In the case of the control variation, the output controller’s will be:
A. Soukkou, A. Khellaf, S. Leulmi
72
By analogy with a PI and a PID in the discrete-time form, the control variation is:
The relation (14) can write itself as follows:
This last is similar to the one of the expression (16), only the constant gains and are replaced by nonlinear factors: et respectively. Generally, the control of the complex systems by a PID presents a delicate task. The solution of the problem comes back to choose or to develop an efficient method for the determination of the gains of different actions, or the combination of the PID with other more sophisticated methods very adapted to the complexity and to the nature of the industrial processes. The controller so developed presents a way of this hybridization, whose goal is to take advantage on the one hand of the precision of the mathematical model of the PID, on the other hand, of the suppleness of the elements manipulated by the fuzzy logic. What drives to the realization of a fuzzy controller to one degree of autonomy.
3.
CONTROL STRUCTURE
Figure 1 shows the structure of the command system that contains four blocks: Optimization block characterized by GA. Structural block representing the fuzzy controller. Decisions block defining the performances criteria. System block to be controlled.
Control Of Overhead Crane By Fuzzy-PID
73
Figure 1. Control structure.
The interaction between these four blocks is summarized by the flow chart represented by figure 2.
Figure 2. Interaction between the different blocks of the control structure.
74
4.
A. Soukkou, A. Khellaf, S. Leulmi
PARAMETERS CODING
The difference between the genetic algorithms used in the literature resides to the levels of the coding (binary, integer or real) and the genetic operators adapted to every coding type [13]. The introduction of the coding mixed, binary-integer, multiparametered and concatenated permits to construct the chromosome of the GA. This technique allows or encourages the coding and the successive juxtaposition of the different parameters. The parameters to optimize are: The widths of the fuzzy subsets associated to the spaces of I/O, binary-coded. The rules or the centers of the fuzzy subsets of the consequences are coded in integer. Every center is coded by a number included between 1 and the maximal number N of the fuzzy subsets of the output space [10-12]. The coefficients associated to the scaling factors of I/O are binary-coded. Every chromosome can represent a solution of the problem, that is, a fuzzy optimal knowledge base. Its structure is represented by the figure 3. and are the codes of the widths of the universes of discourses of the fuzzy subsets (ZE) of the entries and of the output U, respectively. The are the codes of the conclusions characterizing the centers of the output fuzzy subsets and and are the codes of the coefficients of the scaling factors of the I/O variables.
5.
MECHANISM OF THE GA
The genetic algorithms are stochastic procedures of research and exploration, inspired from the natural genetics. The figure 4 summarizes the principle of the standard GA. The GA uses the selection by wheel of lotteries for the reproduction, two points crossover and the mutation operators.
Control Of Overhead Crane By Fuzzy-PID
Figure 3. Structure of the chromosome.
Figure 4. Execution process of the GA.
75
A. Soukkou, A. Khellaf, S. Leulmi
76
If the mutation, takes place in the binary-coded part, the operator of mutation has the same principle that the one of the standard mutation. If the mutation takes place in the integer part, instead of changing a 1 in 0 or 0 in 1, we change the allele by a random from value 1 to N different from its actual value.
Figure 5. Mutation.
Finally, to put a limitation to the genetic divergence, one of the elitism strategies has been introduced. This last, based on the technic of the steadystate selection, permits to construct a new more effective generation than the previous. The criterion of stop of the GA is fixed by a, well determined, number of generations (Max-Gen). The validity of the methodology and that of the form of command (PD non linear) are tested on the overhead crane considered as a non-linear, single input / multi-output system (SIMO).
6.
SIMULATION
Figure 6 shows the schematic diagram of the crane system [14], as well as its structure. The dynamics of the system are characterized by:
Figure 6. Overhead crane structure.
Control Of Overhead Crane By Fuzzy-PID
77
Where, and F are the mass of the wagon, load mass, crane position, the angle of the segment in respect to the vertical axis, and driving force, respectively. The numerical used data in the simulation are: L = 1m, The objective of the control is to generate a sequences of actions F(in Newton) permitting to consolidate the system in a desired state from a predetermined initial state. The overhead crane system has four state variables If the fuzzy controller has N inputs, there will be items of rules even if every variable has M fuzzy subsets. It becomes difficult to design the fuzzy controller. The solution is to divide the system in sub-systems. Liu Diantong in [14] proposed one controller with two fuzzy subcontrollers: displacement fuzzy controller is used to control load displacement and anti-swing fuzzy controller is used to damp load swing. Figure 7 is the schematic diagram of the control system. The adapted chromosome structure for this system has the same form of the figure 3, but doubled. A part for and the other for The control system will be the superposition of the two outlays of the two controllers. In this application the objective function responsible for the ordering of the chromosomes in the population is:
The controller thus developed for this application is characterized by: (5×5×5, 5×5×5), i.e., five fuzzy subsets for each I/O variables. What gives to the total 25×2 rules. Every chain of the population will contain 95×2 alleles (10 bits for each of the widths of the universe of discourses, 25×2 integers for the rules and 10 bits for every coefficient of the gains). The parameters of the GA are presented in table 1.
A. Soukkou, A. Khellaf, S. Leulmi
78
The
The parameters, and initial conditions used
and in
this
simulation
are:
Figure 8 indicates the evolution of the objective function. The rule base gotten at the end of execution of the GA is:
Figure 7. Schematic diagram of the control
Figure 8. Evolution of the objective function.
Simulation results from initial state (10, 0, 30, 0) to target state (10, 0, 0, 0) are shown in figure 9. The generated force is presented by the figure 10.
Control Of Overhead Crane By Fuzzy-PID
Figure 9. Evolution of the state variables.
Figure 10. Output control.
79
A. Soukkou, A. Khellaf, S. Leulmi
80
7.
CONCLUSION
In this work, the proposed controller has been applied to control the overhead crane system. The efficiency of this approach is measured by the controller’s capacity to bring back the crane to its desired state. One can say that this controller manages to achieve the desired task, which justifies the efficiency of the proposed conception method. The association of the fuzzy reasoning and the genetic algorithms constitute an efficient means to exploit the power and the suppleness of the elements manipulated by these different procedures and approaches intervening in the phase of the conception and add to the resulted controller an