Lecture Notes in Artificial Intelligence Edited by J. G. Carbonell and J. Siekmann
Subseries of Lecture Notes in Computer Science
4031
Moonis Ali Richard Dapoigny (Eds.)
Advances in Applied Artificial Intelligence 19th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2006 Annecy, France, June 27-30, 2006 Proceedings
Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany

Volume Editors
Moonis Ali
Texas State University-San Marcos, Department of Computer Science
601 University Drive, San Marcos, TX 78666-4616, USA
E-mail: [email protected]

Richard Dapoigny
Université de Savoie, LISTIC/ESIA
Domaine Universitaire, BP 806, 74016 Annecy Cedex, France
E-mail: [email protected]

Library of Congress Control Number: 2006927485
CR Subject Classification (1998): I.2, F.1, F.2, I.5, F.4.1, D.2, H.4, H.2.8, H.5.2
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-540-35453-0 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-35453-6 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2006 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 11779568 06/3142 543210
Preface
“Intelligent Design and complex problem solving are twined like wife and husband.”

In the current competitive global industrial environment there are many problems which need intelligent systems technology for optimal solutions. The central theme of the 19th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2006) is to focus on the research methodologies and the practical implementation of these methodologies for intelligent solutions to problems in real-world applications. We are pleased to present the papers in these proceedings, which cover various aspects of applications of applied intelligent systems.

We received more than 330 papers from many countries and each paper was reviewed by at least two reviewers. Only 134 papers were selected for presentation in the normal and special sessions. The normal sessions cover the following topics: planning and scheduling, multi-agent systems, fuzzy logic, data mining and knowledge discovery, genetic algorithms, decision support, expert systems, neural networks, computer vision, speech recognition, systems for real-life applications, machine learning, model-based reasoning, heuristic search, and knowledge engineering. We also organized several special sessions in the areas of bioinformatics, ontology, knowledge discovery, intelligent control systems, intelligent industrial systems, and applications of data mining. The conference program also included four invited lectures, given by Fausto Giunchiglia, Erik Sandewall, Sylviane Gentil and Trevor Martin.

Many people have contributed in various ways to the successful organization of this conference. We would like to express our sincere thanks to the authors, Program Committee members and chairs, special session chairs, reviewers and organizers for their hard work. We would also like to thank Ms. Valerie Braesch and Ms. Joelle Pellet for their efficiency in dealing with several issues related to conference organization and management. There are many other participants whose role was of crucial importance in the organization of the conference. The present conference would not have been possible without their valuable support.

Annecy, June 2006
Moonis Ali Richard Dapoigny
Organization
IEA/AIE 2006 was organized by the department of Computer Science, University of Savoie and ISAI (International Society of Applied Intelligence) in cooperation with LISTIC/ESIA, AAAI, ACM/SIGART, AFIA, CSCSI/SCEIO, ECCAI, ENNS, INNS, JSAI, TAAI and Texas State University - San Marcos.
Conference Organization

General Chair: Moonis Ali (Texas State University - San Marcos, USA)
Program Chair: Richard Dapoigny (University of Savoie, France)
Program Co-chairs: Patrick Brézillon (Paris VI University, France), Laurent Foulloy (University of Savoie, France)
Organizing Committee: Patrick Barlatier (University of Savoie, France), Sylvie Galichet (University of Savoie, France), Gilles Mauris (University of Savoie, France), Eric Benoit (University of Savoie, France)
Conference Secretariat: Valérie Braesch (University of Savoie, France), Joelle Pellet (University of Savoie, France)
Program Committee F. Alexandre V. Aubergé P. Barlatier F. Belli E. Benoit L. Berrah L. Borzemski P. Brézillon
E.K. Burke B. Chaib-draa C.W. Chan J. Chen S.M. Chen P. Chung F.S. Correa da Silva R. Desimone
G. Dreyfus F. Esposito L. Foulloy C. Freksa S. Galichet H.W. Guesgen M.T. Harandi H. Hattori
T. Hendtlass G. Horvath M.P. Huget T. Ito S. Iwata F. Jacquenet L. Jain C. Jonker K. Kaikhah F. Kabanza Y.L. Karnavas L. Kerschberg T. Kinosihta A. Kumar R.L. Loganantharaj K. Madani M.M. Matthews G. Mauris
J. Mira L. Monostori K. Morik H. Munoz-Avila Y.L. Murphey B. Neumann N.T. Nguyen T. Nishida H.G. Okuno B. Orchard C. Pellegrini S. Pesty F.G. Pin W. Don Potter S. Ramaswamy M.C. Randall R. Ranon V.J. Rayward-Smith
C. Roche L. Saitta M. Sánchez-Marré K. Suzuki T. Tanaka J. Treur S. Vadera P. Valckenaers M. Valtorta J. Vancza X.Z. Wang S. Watanabe G. Williams T. Wittig C. Yang Y. Yang
Additional Reviewers S. Sakurai D. Jiang C.S. Lee F. Honghai E. Gutieriez L. Wang C.S. Ho B.C. Chien T.P. Hong A. Khashman A. Pfeiffer L.H. Wang H.M. Lee C.H. Cheng M. Nakamura D.L. Yang M. Mejia-Lavalle N.C. Wei F. Liu S. Li J.S. Aguilar-Ruiz A. Lim
A. Garrido B.C. Csaji B. Sekeroglu B. Kadar C. Araz C.C. Wang C.H. Chang C.Y. Chang C. Nguyen D. Krol D.A. Clifton E. Onaindia G.S. Kukla H. Selim H. Rau H. Hesuan H.M. Chiang I. Ozkarahan J. Kang J. Rezaei M. Jerzy J. Jianyuan
J.L. Lin J. Liu J. Sun K.W. Chau K. Zhang K. Dimililer K.H. Kim K. Rim K.R. Ryu L. Tarassenko M.G. Garcia-Hernandez M. Galetakis M. Shan M.S. Oh P.R. Bannister P.J. Chen R. Garcia-Martinez S. Lee T. Arredondo T.Y. Lee W. Weidong W.A. Tan
W. Xu Y. Li Y.S. Huang Y.M. Chiang Y. Geng Z.J. Viharos
Z. Kemeny G. Li M. Debasis J. Kalita J. Clifford F. Lang
D. Gilbert G. Yuriy N. Zhong S. Jennings
Table of Contents
Invited Contributions
Managing Diversity in Knowledge Fausto Giunchiglia . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Artificial Intelligence for Industrial Process Supervision Sylviane Gentil . . . . . . . . . . . . . . . 2
Fuzzy Ambient Intelligence in Home Telecare Trevor Martin . . . . . . . . . . . . . . . . . . . . . . 12
Multi-agent Systems
Modeling and Multi-agent Specification of IF-Based Distributed Goal Ontologies Nacima Mellal, Richard Dapoigny, Patrick Barlatier, Laurent Foulloy . . . . . . . . . . . . . . . . . . . . . . . . . 14
Agent-Based Approach to Solving Difficult Scheduling Problems Joanna Jędrzejowicz, Piotr Jędrzejowicz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Development of the Multiple Robot Fish Cooperation System Jinyan Shao, Long Wang, Junzhi Yu . . . . 34
Introducing Social Investors into Multi-Agent Models of Financial Markets Stephen Chen, Brenda Spotton Visano, Ying Kong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Cross-Organisational Workflow Enactment Via Progressive Linking by Run-Time Agents Xi Chen, Paul Chung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Comparison and Analysis of Expertness Measure in Knowledge Sharing Among Robots Panrasee Ritthipravat, Thavida Maneewarn, Jeremy Wyatt, Djitt Laowattana . . . . . . . . . . . . . . . . . . . . . . . . 60
Multiagent Realization of Prediction-Based Diagnosis and Loss Prevention Rozália Lakner, Erzsébet Németh, Katalin M. Hangos, Ian T. Cameron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Emergence of Cooperation Through Mutual Preference Revision Pedro Santana, Luís Moniz Pereira . . . . 81
Running Contracts with Defeasible Commitment Ioan Alfred Letia, Adrian Groza . . . . . . . . . . . . . 91
A Self-organized Energetic Constraints Based Approach for Modelling Communication in Wireless Systems Jean-Paul Jamont, Michel Occello . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Decision-Support Evaluation of Several Algorithms in Forecasting Flood C.L. Wu, K.W. Chau . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 Simulation Analysis for On-Demand Transport Vehicles Based on Game Theory Naoto Mukai, Jun Feng, Toyohide Watanabe . . . . . . . . . . . . . . . . . . . . . . 117 A Set Theoretic View of the ISA Hierarchy Yee Chung Cheung, Paul Wai Hing Chung, Ana Sălăgean . . . . . . . . . . . 127 Tale of Two Context-Based Formalisms for Representing Human Knowledge Patrick Brézillon, Avelino J. Gonzalez . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 Some Characteristics of Context Patrick Brézillon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 Signal and Image Representations Based Hybrid Intelligent Diagnosis Approach for a Biomedicine Application Amine Chohra, Nadia Kanaoui, Véronique Amarger, Kurosh Madani . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 Handling Airport Ground Processes Based on Resource-Constrained Project Scheduling Jürgen Kuster, Dietmar Jannach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 Distribution System Evaluation Algorithm Using Analytic Hierarchy Process Buhm Lee, Chang-Ho Choi, Nam-Sup Choi, Kyoung Min Kim, Yong-Ha Kim, Sang-Kyu Choi, Sakis A. Meliopoulos . . . . . . . . . . . . . . . 177
Genetic Algorithms A Hybrid Robot Control System Based on Soft Computing Techniques Alfons Schuster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 A Combination Genetic Algorithm with Applications on Portfolio Optimization Jiah-Shing Chen, Jia-Leh Hou . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 Genetic Algorithm-Based Improvement of Robot Hearing Capabilities in Separating and Recognizing Simultaneous Speech Signals Shun’ichi Yamamoto, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean-Marc Valin, Ryu Takeda, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno . . . . . . . . . . . . . 207 A Hybrid Genetic Algorithm for the Flow-Shop Scheduling Problem Lin-Yu Tseng, Ya-Tai Lin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 Solving a Large-Scaled Crew Pairing Problem by Using a Genetic Algorithm Taejin Park, Kwang Ryel Ryu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228 Population Structure of Heuristic Search Algorithm Based on Adaptive Partitioning Chang-Wook Han, Jung-Il Park . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238 Generating Guitar Tablature with LHF Notation Via DGA and ANN Daniel R. Tuohy, W.D. Potter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
Heuristic Search Search Space Reduction as a Tool for Achieving Intensification and Diversification in Ant Colony Optimisation Marcus Randall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254 Truck Dock Assignment Problem with Operational Time Constraint Within Crossdocks Andrew Lim, Hong Ma, Zhaowei Miao . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262 A Hybrid Genetic Algorithm for Solving the Length-Balanced Two Arc-Disjoint Shortest Paths Problem Yanzhi Li, Andrew Lim, Hong Ma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
A Fast and Effective Insertion Algorithm for Multi-depot Vehicle Routing Problem with Fixed Distribution of Vehicles and a New Simulated Annealing Approach Andrew Lim, Wenbin Zhu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282 On the Behaviour of Extremal Optimisation When Solving Problems with Hidden Dynamics Irene Moser, Tim Hendtlass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292 Local Search Algorithm for Unicost Set Covering Problem Nysret Musliu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
Data-Mining and Knowledge Discovery Evaluating Interestingness Measures with Linear Correlation Graph Xuan-Hiep Huynh, Fabrice Guillet, Henri Briand . . . . . . . . . . . . . . . . . . 312 Extended Another Memory: Understanding Everyday Lives in Ubiquitous Sensor Environments Masakatsu Ohta, Sun Yong Kim, Miyuki Imada . . . . . . . . . . . . . . . . . . . 322 Incremental Clustering of Newsgroup Articles Sascha Hennig, Michael Wurst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332 Topic Detection Using MFSs Ivan Yap, Han Tong Loh, Lixiang Shen, Ying Liu . . . . . . . . . . . . . . . . . . 342 A Rule Sets Ensemble for Predicting MHC II-Binding Peptides Zeng An, Pan Dan, He Jian-bin, Zheng Qi-lun, Yu Yong-quan . . . . . . . 353 Constructing Complete FP-Tree for Incremental Mining of Frequent Patterns in Dynamic Databases Muhaimenul Adnan, Reda Alhajj, Ken Barker . . . . . . . . . . . . . . . . . . . . . 363
Planning and Scheduling An Optimal Method for Multiple Observers Sitting on Terrain Based on Improved Simulated Annealing Techniques Pin Lv, Jin-fang Zhang, Min Lu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373 An On-Line Approach for Planning in Time-Limited Situations Oscar Sapena, Eva Onaindía . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
Priority-Constrained Task Sequencing for Heterogeneous Mobile Robots Metin Ozkan, Inci Saricicek, Osman Parlaktuna, Servet Hasgul . . . . . . 393 New Heuristics to Solve the “CSOP" Railway Timetabling Problem Laura Ingolotti, Antonio Lova, Federico Barber, Pilar Tormos, Miguel Angel Salido, Montserrat Abril . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400 SEaM: Analyzing Schedule Executability Through Simulation Riccardo Rasconi, Nicola Policella, Amedeo Cesta . . . . . . . . . . . . . . . . . . 410 From Demo to Practice the MEXAR Path to Space Operations Amedeo Cesta, Gabriella Cortellessa, Simone Fratini, Angelo Oddi, Nicola Policella . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
Fuzzy Logic A New Method for Appraising the Performance of High School Teachers Based on Fuzzy Number Arithmetic Operations Chih-Huang Wang, Shyi-Ming Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432 New Methods for Evaluating the Answerscripts of Students Using Fuzzy Sets Hui-Yu Wang, Shyi-Ming Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442 Genetic Lateral and Amplitude Tuning with Rule Selection for Fuzzy Control of Heating, Ventilating and Air Conditioning Systems Rafael Alcalá, Jesús Alcalá-Fdez, Francisco José Berlanga, María José Gacto, Francisco Herrera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452 Fuzzy Motivations for Evolutionary Behavior Learning by a Mobile Robot Tomás Arredondo V., Wolfgang Freund, Cesar Muñoz, Nicolas Navarro, Fernando Quirós . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462 Optimization of Self-organizing Fuzzy Polynomial Neural Networks with the Aid of Granular Computing and Evolutionary Algorithm Ho-Sung Park, Sung-Kwun Oh, Tae-Chon Ahn . . . . . . . . . . . . . . . . . . . . 472 Fuzzy Clustering-Based on Aggregate Attribute Method Jia-Wen Wang, Ching-Hsue Cheng . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 478
Computer Vision Recurrent Neural Network Verifier for Face Detection and Tracking Sung H. Yoon, Gi T. Hur, Jung H. Kim . . . . . . . . . . . . . . . . . . . . . . . . . . 488
Automatic Gait Recognition by Multi-projection Analysis Murat Ekinci . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500 A Novel Image Retrieval Approach Combining Multiple Features of Color-Connected Regions Yubin Yang, Shifu Chen, Yao Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510 An Application of Random and Hammersley Sampling Methods to Iris Recognition Luis E. Garza Castañón, Saúl Montes de Oca, Rubén Morales-Menéndez . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520 Biometric-Iris Random Key Generator Using Generalized Regression Neural Networks Luis E. Garza Castañón, MariCarmen Pérez Reigosa, Juan A. Nolazco-Flores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530 Head Detection and Tracking for the Car Occupant’s Pose Recognition Jeong-Eom Lee, Yong-Guk Kim, Sang-Jun Kim, Min-Soo Jang, Seok-Joo Lee, Min Chul Park, Gwi-Tae Park . . . . . . . . . . . . . . . . . . . . . . 540
Case-Based Reasoning Prediction of Construction Litigation Outcome – A Case-Based Reasoning Approach K.W. Chau . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548 Component Retrieval Using Knowledge-Intensive Conversational CBR Mingyang Gu, Ketil Bø . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554 Identification of Characteristics After Soft Breakdown with GA-Based Neural Networks Hsing-Wen Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564
Knowledge Engineering Integrating Organizational Knowledge into Search Engine Hiroshi Tsuji, Ryosuke Saga, Jugo Noda . . . . . . . . . . . . . . . . . . . . . . . . . . 573 Ontology for Long-Term Knowledge Anne Dourgnon-Hanoune, Patrick Salaün, Christophe Roche . . . . . . . . 583 Introducing Graph-Based Reasoning into a Knowledge Management Tool: An Industrial Case Study Olivier Carloni, Michel Leclère, Marie-Laure Mugnier . . . . . . . . . . . . . . 590
Retaining Consistency in Temporal Knowledge Bases Franz Wotawa, Bibiane Angerer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
Machine Learning Locality-Convolution Kernel and Its Application to Dependency Parse Ranking Evgeni Tsivtsivadze, Tapio Pahikkala, Jorma Boberg, Tapio Salakoski . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610 Intrusion Detection Based on Behavior Mining and Machine Learning Techniques Srinivas Mukkamala, Dennis Xu, Andrew H. Sung . . . . . . . . . . . . . . . . . 619 Tractable Feature Generation Through Description Logics with Value and Number Restrictions Nicola Fanizzi, Luigi Iannone, Nicola Di Mauro, Floriana Esposito . . . 629
Model-Based Reasoning Diagnosing Program Errors with Light-Weighted Specifications Rong Chen, Franz Wotawa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639 Diagnosis of Power System Protection Rui D. Jorge, Carlos V. Damásio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650 Towards Lightweight Fault Localization in Procedural Programs Bernhard Peischl, Safeeullah Soomro, Franz Wotawa . . . . . . . . . . . . . . . 660
Speech Recognition On Adaptively Learning HMM-Based Classifiers Using Split-Merge Operations Sang-Woon Kim, Soo-Hwan Oh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668 Comparative Study: HMM and SVM for Automatic Articulatory Feature Extraction Supphanat Kanokphara, Jan Macek, Julie Carson-Berndsen . . . . . . . . . 674 A Study on High-Order Hidden Markov Models and Applications to Speech Recognition Lee-Min Lee, Jia-Chien Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682
Diagnostic Evaluation of Phonetic Feature Extraction Engines: A Case Study with the Time Map Model Daniel Aioanei, Julie Carson-Berndsen, Supphanat Kanokphara . . . . . 691
Systems for Real Life Applications Soft Computing for Assessing the Quality of Colour Prints Antanas Verikas, Marija Bacauskiene, Carl-Magnus Nilsson . . . . . . . . . 701 An Efficient Shortest Path Computation System for Real Road Networks Zhenyu Wang, Oscar Che, Lijuan Chen, Andrew Lim . . . . . . . . . . . . . . . 711 Automatic Topics Identification for Reviewer Assignment Stefano Ferilli, Nicola Di Mauro, Teresa Maria Altomare Basile, Floriana Esposito, Marenglen Biba . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721 A Decentralized Calendar System Featuring Sharing, Trusting and Negotiating Yves Demazeau, Dimitri Melaye, Marie-Hélène Verrons . . . . . . . . . . . . 731
Applications Unidirectional Loop Layout Problem with Balanced Flow Feristah Ozcelik, A. Attila Islier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741 A Heuristic Load Balancing Scheduling Method for Dedicated Machine Constraint Arthur M.D. Shr, Alan Liu, Peter P. Chen . . . . . . . . . . . . . . . . . . . . . . . . 750 An Adaptive Control Using Multiple Neural Networks for the Variable Displacement Pump Ming-Hui Chu, Yuan Kang, Yuan-Liang Liu, Yi-Wei Chen, Yeon-Pung Chang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 760 An Elaborated Goal Production Module for Implementing a Virtual Inhabitant Se-Jin Ji, Jung-Woo Kwon, Jong-Hee Park . . . . . . . . . . . . . . . . . . . . . . . 770 Agent-Based Prototyping of Web-Based Systems Aneesh Krishna, Ying Guan, Chattrakul Sombattheera, Aditya K. Ghose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780 High-Dimensional Micro-array Data Classification Using Minimum Description Length and Domain Expert Knowledge Andrea Bosin, Nicoletta Dessì, Barbara Pes . . . . . . . . . . . . . . . . . . . . . . . 790
On Solving Edge Detection by Emergence Mohamed Batouche, Souham Meshoul, Ali Abbassene . . . . . . . . . . . . . . . 800 Clustering Microarray Data Within Amorphous Computing Paradigm and Growing Neural Gas Algorithm Samia Chelloug, Souham Meshoul, Mohamed Batouche . . . . . . . . . . . . . 809 Conflict-Directed Relaxation of Constraints in Content-Based Recommender Systems Dietmar Jannach, Johannes Liegl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 819 Modeling pH Neutralization Process Via Support Vector Machines Dongwon Kim, Gwi-Tae Park . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 830 Generating Tutoring Feedback in an Intelligent Training System on a Robotic Simulator Roger Nkambou, Khaled Belghith, Froduald Kabanza . . . . . . . . . . . . . . . . 838 Elaborating the Context of Interactions in a Tutorial Dialog Josephine Pelle, Roger Nkambou . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848 Static Clonal Selection Algorithm Based on Match Range Model Jungan Chen, Dongyong Yang, Feng Liang . . . . . . . . . . . . . . . . . . . . . . . . 859 Diagnosing Faulty Transitions in Recommender User Interface Descriptions Alexander Felfernig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 869 An Unsupervised Method for Ranking Translation Words Using a Bilingual Dictionary and WordNet Kweon Yang Kim, Se Young Park, Dong Kwon Hong . . . . . . . . . . . . . . . 879 Neuro-fuzzy Learning for Automated Incident Detection M. Viswanathan, S.H. Lee, Y.K. Yang . . . . . . . . . . . . . . . . . . . . . . . . . . . 889 Intelligent GIS: Automatic Generation of Qualitative Spatial Information Jimmy A. Lee, Jane Brennan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 898 On-Line Learning of a Time Variant System Fernando Morgado Dias, Ana Antunes, José Vieira, Alexandre Manuel Mota . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 908
Special Session on Bioinformatics Bioinformatics Integration Framework for Metabolic Pathway Data-Mining Tomás Arredondo V., Michael Seeger P., Lioubov Dombrovskaia, Jorge Avarias A., Felipe Calderón B., Diego Candel C., Freddy Muñoz R., Valeria Latorre R., Loreine Agulló, Macarena Cordova H., Luis Gómez . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 917 The Probability Distribution of Distance TSS-TLS Is Organism Characteristic and Can Be Used for Promoter Prediction Yun Dai, Ren Zhang, Yan-Xia Lin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 927 Protein Stability Engineering in Staphylococcal Nuclease Using an AI-Neural Network Hybrid System and a Genetic Algorithm Christopher M. Frenz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 935 Identification of Over and Under Expressed Genes Mediating Allergic Asthma Rajat K. De, Anindya Bhattacharya . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 943 Correlogram-Based Method for Comparing Biological Sequences Debasis Mitra, Gandhali Samant, Kuntal Sengupta . . . . . . . . . . . . . . . . . 953 Learning Genetic Regulatory Network Connectivity from Time Series Data Nathan Barker, Chris Myers, Hiroyuki Kuwahara . . . . . . . . . . . . . . . . . . 962 On Clustering of Genes Raja Loganantharaj, Satish Cheepala, John Clifford . . . . . . . . . . . . . . . . 972
Special Session on Ontology and Text Towards Automatic Concept Hierarchy Generation for Specific Knowledge Network Jian-hua Yeh, Shun-hong Sie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 982 An Open and Scalable Framework for Enriching Ontologies with Natural Language Content Maria Teresa Pazienza, Armando Stellato . . . . . . . . . . . . . . . . . . . . . . . . . 990 Acquiring an Ontology from the Text Núria Casellas, Aleks Jakulin, Joan-Josep Vallbé, Pompeu Casanovas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1000
Terminae Method and Integration Process for Legal Ontology Building Sylvie Despres, Sylvie Szulman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1014 An Approach to Automatic Ontology-Based Annotation of Biomedical Texts Gayo Diallo, Michel Simonet, Ana Simonet . . . . . . . . . . . . . . . . . . . . . . . 1024 Lexical and Conceptual Structures in Ontology Christophe Roche . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1034 Discovering Verb Relations in Corpora: Distributional Versus Non-distributional Approaches Maria Teresa Pazienza, Marco Pennacchiotti, Fabio Massimo Zanzotto . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1042 Modelling Knowledge with ZDoc for the Purposes of Information Retrieval Henri Zinglé . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1053
Special Session on Data for Discovery in Engineering Partially Ordered Template-Based Matching Algorithm for Financial Time Series Yin Tang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1059 Model and Algebra for Genetic Information of Data Deyou Tang, Jianqing Xi, Yubin Guo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1071 Forecasting Intermittent Demand by Fuzzy Support Vector Machines Yukun Bao, Hua Zou, Zhitao Liu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1080
Special Session on Intelligent Control Systems Nonlinear Discrete System Stabilisation by an Evolutionary Neural Network Wasan Srikasam, Nachol Chaiyaratana, Suwat Kuntanapreeda . . . . . . . 1090
Special Session on Intelligent Systems for Industry Genetic Algorithm for Inventory Lot-Sizing with Supplier Selection Under Fuzzy Demand and Costs Jafar Rezaei, Mansoor Davoodi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1100 A Self-tuning Emergency Model of Home Network Environment Huey-Ming Lee, Shih-Feng Liao, Tsang-Yean Lee, Mu-Hsiu Hsu, Jin-Shieh Su . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1111
Simulation Studies of Two-Layer Hopfield Neural Networks for Automatic Wafer Defect Inspection Chuan-Yu Chang, Hung-Jen Wang, Si-Yan Lin . . . . . . . . . . . . . . . . . . . . 1119 Supporting Dynamic Supply Networks with Agent-Based Coalitions Chattrakul Sombattheera, Aditya Ghose . . . . . . . . . . . . . . . . . . . . . . . . . . . 1127 Reducing Transportation Costs in Distribution Networks Xi Li, Andrew Lim, Zhaowei Miao, Brian Rodrigues . . . . . . . . . . . . . . . 1138 Application of an Intuitive Novelty Metric for Jet Engine Condition Monitoring David A. Clifton, Peter R. Bannister, Lionel Tarassenko . . . . . . . . . . . . 1149 Determination of Storage Locations for Incoming Containers of Uncertain Weight Jaeho Kang, Kwang Ryel Ryu, Kap Hwan Kim . . . . . . . . . . . . . . . . . . . . 1159 Fault Diagnostics in Electric Drives Using Machine Learning Yi L. Murphey, M. Abul Masrur, ZhiHang Chen . . . . . . . . . . . . . . . . . . . 1169 An Integrated and Flexible Architecture for Planning and Scheduling Antonio Garrido, Eva Onaindía, Ma. de Guadalupe García-Hernández . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1179 A Robust RFID-Based Method for Precise Indoor Positioning Andrew Lim, Kaicheng Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1189 A Study of Optimal System for Multiple-Constraint Multiple-Container Packing Problems Jin-Ling Lin, Chir-Ho Chang, Jia-Yan Yang . . . . . . . . . . . . . . . . . . . . . . 1200 Planning for Intra-block Remarshalling in a Container Terminal Jaeho Kang, Myung-Seob Oh, Eun Yeong Ahn, Kwang Ryel Ryu, Kap Hwan Kim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1211 Develop Acceleration Strategy and Estimation Mechanism for Multi-issue Negotiation Hsin Rau, Chao-Wen Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1221 Least Squares Support Vector Machines for Bandwidth Reservation in Wireless IP Networks Jerzy Martyna . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1231
Special Session on Applications of Data Mining An Ontology-Based Intelligent Agent for Respiratory Waveform Classification Chang-Shing Lee, Mei-Hui Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1240 A New Inductive Learning Method for Multilabel Text Categorization Yu-Chuan Chang, Shyi-Ming Chen, Churn-Jung Liau . . . . . . . . . . . . . . . 1249 An Intelligent Customer Retention System Bong-Horng Chu, Kai-Chung Hsiao, Cheng-Seen Ho . . . . . . . . . . . . . . . 1259 Software Diagnosis Using Fuzzified Attribute Base on Modified MEPA Jr-Shian Chen, Ching-Hsue Cheng . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1270 New Methods for Text Categorization Based on a New Feature Selection Method and a New Similarity Measure Between Documents Li-Wei Lee, Shyi-Ming Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1280 Using Positive Region to Reduce the Computational Complexity of Discernibility Matrix Method Feng Honghai, Zhao Shuo, Liu Baoyan, He LiYun, Yang Bingru, Li Yueli . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1290 A Novel Mining Algorithm for Periodic Clustering Sequential Patterns Che-Lun Hung, Don-Lin Yang, Yeh-Ching Chung, Ming-Chuan Hung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1299 Text Mining with Application to Engineering Diagnostics Liping Huang, Yi Lu Murphey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1309 Handling Incomplete Categorical Data for Supervised Learning Been-Chian Chien, Cheng-Feng Lu, Steen J. Hsu . . . . . . . . . . . . . . . . . . 1318 Mining Multiple-Level Association Rules Under the Maximum Constraint of Multiple Minimum Supports Yeong-Chyi Lee, Tzung-Pei Hong, Tien-Chin Wang . . . . . . . . . . . . . . . . 1329 A Measure for Data Set Editing by Ordered Projections Jesús S. Aguilar-Ruiz, Juan A. Nepomuceno, Norberto Díaz-Díaz, Isabel Nepomuceno . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1339 Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1349
Managing Diversity in Knowledge Fausto Giunchiglia Department of Information and Communication Technology University of Trento, 38050 Povo di Trento, Italy
We are facing an unforeseen growth of the complexity of data, content and knowledge. Here we talk of complexity meaning the size, the sheer numbers, the spatial and temporal pervasiveness of knowledge, and the unpredictable dynamics of knowledge change, unknown at design time but also at run time.

In knowledge engineering and management the "usual" approach is to take into account, at design time, all the possible future dynamics. The key idea is to design a "general enough" reference representation model, expressive enough to incorporate all the possible future variations of knowledge. The approach proposed here is somewhat the opposite. Instead of taking a top-down approach, where all the knowledge is designed in an integrated way, with a purely a-priori effort, we propose a bottom-up approach where the different knowledge parts are kept distinct and designed independently. The key idea is to consider diversity as a feature which must be maintained and exploited, and not as a defect that must be cancelled or absorbed into some general "universal-looking" schema. People, organizations, communities, populations and cultures build diverse representations of the world for a reason, and this reason lies in the local context. What context exactly is hard to say. However, it can safely be stated that context has many dimensions: time, space, contingent goals, short-term or long-term goals, personal or community bias, environmental conditions, and so on.

We will present and discuss the ideas above by comparing, as an example, how the notions of context and of ontology have been applied in the formalization of knowledge (for instance in the Semantic Web). We will then argue that a context-based approach to managing diversity in knowledge must be studied at three different levels:

1. Representation level, dealing with all the issues related to how local and global knowledge are represented, to their semantics, and to the definition of the operations which allow them to be manipulated.
2. Organization level, dealing with the organization and interaction of interconnected knowledge parts and of the systems manipulating them.
3. Social level, dealing with the problem of how systems (incrementally) reach agreement, thus creating (sub)communities of shared or common knowledge.
Artificial Intelligence for Industrial Process Supervision

S. Gentil
Laboratoire d'Automatique de Grenoble, UMR 5528 CNRS-INPG-UJF
BP 46, F-38402 Saint Martin d'Hères Cedex
[email protected]

Abstract. This paper presents some difficulties of complex industrial process supervision and explains why artificial intelligence may help to solve some of these problems. Qualitative or semi-qualitative trend extraction is mentioned first. Fault detection and fault isolation are then discussed. The necessity for intelligent interfaces is explained next, and distributed supervision is finally mentioned.

Keywords: Supervision, Model-based Diagnosis, Prognosis, Trend extraction, Causal Reasoning, Multi-agent systems, Decision Support, Fuzzy Logic.
1 Introduction

Nowadays, automation is the most important factor in the development of complex industrial plants. A complex process (transportation system, plane, thermal or nuclear power plant, electrical network, water treatment plant, ...) is a process in an open environment, where uncertainty and dynamical phenomena play an important role and decision making is difficult. These processes involve many measured variables. Their automation requires managing a large amount of heterogeneous information, distributed among interconnected sub-systems.

Controlling such plants is done with a SCADA (Supervisory Control and Data Acquisition) system. It is a reactive system connected on line to the process, with real-time constraints. It may have various objectives. A first set of objectives concerns production performance: product quality, deadlines and manufacturing costs. A second set of objectives concerns safety. Industrial accidents resulting in human victims or environmental damage are no longer accepted. Moreover, cost reduction requires permanent availability of the production tool. Guaranteeing plant safety has long relied only on hardware redundancy; some of this functionality is now replaced by analytical redundancy.

Automation concerns a plant with various functioning modes: the normal operating mode, start-up and shut-down modes, and failure modes, corresponding to the different system states when the plant suffers a failure or a malfunction. These modes are quite difficult to predict. They may require either stopping the process immediately or changing the control laws (changing control loop set points or even the control architecture). The objective of supervision is to manage the effect of control on the process. This means verifying that the process is in its normal mode. If it is not the case, a reaction has to be found, at least to avoid damage and limit shutdowns, and if possible to
guarantee production continuity. Depending on circumstances, multiple tasks may be envisaged [1].

The supervision vocabulary related to failure modes is now well established. A failure is a permanent interruption of the system or of a component function, while a malfunction is an intermittent one. A fault is a deviation of at least one characteristic feature of the system behaviour from the normal condition. A fault can be observed thanks to a fault indicator, computed from measured variables. Monitoring refers to the ability to recognize a fault mode. In its basic version, it consists in checking variables against thresholds, generally set empirically by process engineers, and in generating alarms if necessary. In case of danger, the supervisor triggers an appropriate counteraction automatically. In more advanced versions of monitoring, a mathematical model can be used to generate fault indicators, or raw measurements can be processed to obtain features relevant for classification. In the classical diagnostic procedure, two steps are recognized. Detection consists in using a set of fault indicators to generate qualitative symptoms (Boolean or fuzzy values). Isolation or classification consists in deducing from a set of symptoms a minimal set of physical components whose malfunction or failure is sufficient to explain the symptoms. Isolation is difficult because secondary faults may be observed that are only consequences of primary faults related to a component failure. The abnormal component set must be precise (i.e. contain the right abnormal components) and as small as possible (i.e. not contain many normal components). When the faulty components are isolated, the plant has to be stopped if safety is at stake, or a redundant component can be started, or a different control strategy has to be used (fault-tolerant control).

Nevertheless, it is impossible to anticipate every situation in a complex system and to plan the corresponding counteraction. One has to bring into play many varied procedures that are not linked in a deterministic manner but as a function of the context. This explains why human operators are still in charge of large industrial installations. A human's important quality is to know how to manage unforeseen, uncertain situations and to design relevant strategies when dysfunction occurs. Excluding human operators from control rooms is not planned, even for highly automated processes. Thus supervision must be designed with a man/machine cooperation point of view. The human operator is always the ultimate link in the decisional chain. The supervisory system must justify its choices, explain them to the human operator and support human reasoning in the final decision-making. This is where artificial intelligence may play an important role. Considering present supervision tools, it is to be noted that operators are provided with large quantities of data, but with no analysis of these data and their relations. In a cascading faulty situation, the flow of alarms no longer matches the time available to the operator to analyze it online. Supervision support functionalities are mainly concerned with man-machine interface ergonomics, which is obviously a major step but not sufficient. It is necessary to conceive on-line systems able to manage the combinatorial growth of information, to link and to structure data—to sum up, to achieve all the tasks for which human reasoning is rapidly limited [2].
Monitoring, alarm filtering, fault detection and isolation, diagnosis and action advice are features to be provided to operators. In the following sections, solutions to some of the above-mentioned problems are presented, inspired by AI techniques. The present paper does not pretend to be a survey of AI for supervision [3], but rather presents some realizations from an AI user's
point of view. Fuzzy clustering or expert systems for diagnosis will not be described. We have rather chosen to describe applications where control theory and artificial intelligence are intimately mixed.
2 Semi-qualitative Trend Extraction for Monitoring or Prognosis

Trend analysis is a useful approach to extract information from numerical data and represent it symbolically, in a qualitative or semi-qualitative way. A qualitative trend is a description of the evolution of the qualitative state of a variable, in a time interval, using a set of symbols called primitives. Generally, a small set of primitives is used to create episodes: an episode is an interval of uniform behaviour. A qualitative trend is consequently a qualitative history represented by a sequence of consecutive (contiguous and non-overlapping) episodes. The trend can be further interpreted by the operator as a function of the specific system under study, thanks to his/her expert knowledge. Process trends have been considered in particular in the chemical process industry [4] and in medical applications. Often, a semi-qualitative trend is preferred to a purely qualitative one, to allow numerical data to be represented quite precisely.

2.1 Semi-qualitative Trend Extraction

This section presents, as an example, a methodology for semi-qualitative trend extraction [5]. Data acquired on line from the process are first described by successive linear segments. The segmentation algorithm determines on line the moment when the current linear approximation is no longer acceptable and a new segment should be computed. Two consecutive segments may be discontinuous. Segment i ends at time te(i), one sampling period before the next one starts at t0(i+1). The corresponding data amplitude is denoted by y.

The new segment is used together with the preceding one to form a shape. Seven temporal primitives are used for this classification: Increasing, Decreasing, Steady, Positive Step, Negative Step, Increasing/Decreasing Transient, Decreasing/Increasing Transient. The shape associated with segment i starts at time tb(i) = t0(i) - 1. The classification of the segment into a temporal shape provides symbolic information meaningful to the operator. A semi-quantitative episode is described by [Primitive, tb(i), yb(i), te(i), ye(i)]. To obtain an easily understandable trend, only three basic primitives are used for the episodes: {Steady, Increasing, Decreasing}. If a shape is in {Steady, Increasing, Decreasing}, the symbolic information is directly converted into one of these primitives. If the shape is a Step or a Transient, it is split into two parts. A Positive Step, for instance, becomes [Increasing, te(i-1), ye(i-1), t0(i), y0(i)], [Steady, t0(i), y0(i), te(i), ye(i)]. The aggregation of episodes consists in associating the most recent episode with the former ones to form the longest possible episode. In the case of two Steady episodes, the result can be Steady, Increasing or Decreasing, depending on the overall slope of the combined sequence.
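As an illustration only (this is not the implementation of [5]), the shape classification step could be sketched as follows; the slope and step thresholds are hypothetical tuning values.

```python
# Sketch of the classification of two consecutive linear segments into one of
# the seven temporal primitives. Thresholds are illustrative tuning parameters.

def classify_shape(prev_seg, curr_seg, slope_tol=0.01, step_tol=0.5):
    """Each segment is (t_start, y_start, t_end, y_end)."""
    def slope(seg):
        t0, y0, t1, y1 = seg
        return (y1 - y0) / (t1 - t0) if t1 != t0 else 0.0

    s_prev, s_curr = slope(prev_seg), slope(curr_seg)
    jump = curr_seg[1] - prev_seg[3]          # discontinuity between segments

    if abs(jump) > step_tol:                  # a vertical discontinuity dominates
        return "Positive Step" if jump > 0 else "Negative Step"
    prev_dir = 0 if abs(s_prev) < slope_tol else (1 if s_prev > 0 else -1)
    curr_dir = 0 if abs(s_curr) < slope_tol else (1 if s_curr > 0 else -1)
    if prev_dir > 0 and curr_dir < 0:
        return "Increasing/Decreasing Transient"
    if prev_dir < 0 and curr_dir > 0:
        return "Decreasing/Increasing Transient"
    if curr_dir > 0:
        return "Increasing"
    if curr_dir < 0:
        return "Decreasing"
    return "Steady"

# Example: a steady segment followed by an upward jump to a new steady level
print(classify_shape((0, 1.0, 10, 1.0), (11, 3.0, 20, 3.0)))  # -> "Positive Step"
```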
On line, the four steps are achieved using the current segment i to extract a temporary trend. Later, when a new linear approximation has been calculated, the current segment becomes the previous segment and is then completely defined. It is aggregated with the definitive trend, and a temporary trend is calculated using the new current segment.

2.2 Trend Analysis for Prognosis

Trend analysis can be used in different ways to inform the operator about the future behaviour of a given signal. The qualitative information, such as a variable being Steady or Increasing, is in itself a useful tool. Moreover, the semi-quantitative episodes can be used to predict the value of the monitored variables over a time window. The segmentation algorithm provides the linear model that best describes the present behaviour of the signal. If an alarm threshold is known, this model can be used to answer the question: how much time is needed for the signal to reach this threshold? A two-tank system is used to illustrate the method (Fig. 1). The system has been modelled using physical relations implemented with Simulink. The simulator allows
Fig. 1. The two-tank system example
the simulation in normal as well as in faulty conditions. Noise is added to simulated variables. The lower tank level is controlled by the servo-valve Vk1. Measurements available to the control and supervision systems are the water levels h1, h2 and output flow q1, q2 of each tank. The fault scenario described in Fig. 1 starts with an h2 level set-point change at time 0, followed by a failure at time 6000 sec. A ramp leakage is introduced in tank 1, simulating a progressive failure. The control system compensates the leakage, maintaining the tank levels equal to their reference values by increasing the valve opening. The operator can see on the interface the variable trends and their prediction (grey background). This prediction is based on the last episode, which is extrapolated on a long time horizon. The prediction allows seeing level h1 remaining constant for a long time thanks to the regulation loop; but this will no longer be the case when the valve reaches its saturation level (100%). The operator has an idea about the time left before this saturation and can thus perform a counteraction (a reduction of the set point value for level h2, or a programmed shutdown, or a call to the maintenance service).
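The prognosis idea described above (extrapolating the last linear episode to estimate when a variable will reach a known threshold) can be sketched as follows; the numerical values (valve opening trend, 100% saturation) are illustrative only and are not taken from the simulation.

```python
# Minimal sketch: extrapolate the current linear episode to an alarm threshold.

def time_to_threshold(y_now, slope, threshold):
    """Return the extrapolated time before y reaches the threshold,
    or None if the current trend never reaches it."""
    if slope == 0:
        return None
    remaining = (threshold - y_now) / slope
    return remaining if remaining > 0 else None

# Valve opening at 82%, increasing by 0.3% per minute, saturating at 100%
print(time_to_threshold(82.0, 0.3, 100.0))  # -> 60.0 minutes left before saturation
```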
3 Fuzzy Reasoning for Fault Detection

The classical control approach to fault detection is purely numerical: static or dynamic relations among the various process measurements are used for early detection of abnormal behaviour. These relations are deduced from a mathematical dynamic model, with the implicit assumption that no phenomenon has been ignored, that all the data are quantitatively known, and that the parameters and measurements are accurate.

Basic fault indicator generation is quite simple. A numerical model is fed on line with the same inputs as the process. In the simplest version, fault indicators (or residuals) r(t) are the difference between computed and measured outputs. In the following, a linear dynamic model is assumed to simplify the notation. Let q^-1 be the shift operator of one sampling period (q^-1 y(t) = y(t-1)) and F(q) the filter representing the model, whose input is u and whose output is the model output y*(t):

r(t) = y(t) - y*(t) = y(t) - F(q) u(t)   (1)

This expression can be generalised (Frank, 1996):

r(t) = H(q) [ y(t) - F(q) u(t) ]   (2)

where H(q) is a filter that gives good properties to the residuals (sensitivity, robustness, ...). When r(t) is large, something abnormal is happening, while if it is zero, the components whose model is used to generate the residual are behaving normally. In practice, the detection system needs to be robust to modelling errors and measurement noise. Consequently, tolerance thresholds are introduced. Their value can be chosen empirically or from statistical studies. The diagnostic result could be very different depending on whether the fault indicator equals a specified threshold plus or minus epsilon, regardless of the amplitude of epsilon, which can lead to unstable detection. This is why many people choose to adopt fuzzy reasoning for detection.
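The following minimal sketch illustrates residual generation in the spirit of Eq. (1), with a simple first-order discrete-time recursion standing in for F(q); the model coefficients and the data are assumptions chosen only for illustration.

```python
# Sketch of residual generation: compare measured outputs with the outputs of a
# nominal model y*(t) = a*y*(t-1) + b*u(t-1) fed with the same inputs u.

def residuals(u, y, a=0.9, b=0.1):
    """Return the fault indicators r(t) = y(t) - y*(t)."""
    y_model, r = [y[0]], [0.0]
    for t in range(1, len(y)):
        y_model.append(a * y_model[-1] + b * u[t - 1])
        r.append(y[t] - y_model[-1])
    return r

u = [1.0] * 10
y_normal = [0.0, 0.1, 0.19, 0.27, 0.34, 0.41, 0.47, 0.52, 0.57, 0.61]
print(residuals(u, y_normal))  # residuals stay close to zero if the model fits
```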
Fuzzy reasoning allows a gradual detection decision. Moreover, it allows taking into account more information than a single numerical value, for instance the residual variation. Fuzzification with symmetric trapezoidal membership functions and traditional notations leads to fuzzy descriptions of the residual and of its variation, which are used in the following inference rule:

if ri(k) is A and Δri(k) is B then the variable state is S   (3)

The conclusion S ∈ {OK, AL}; OK means that the variable behaviour is normal, while AL means that an alarm must be triggered. The symbolic rule base is presented in Table 1. Choosing product and sum as the fuzzy operators expressing this rule base, the inference reduces to a simple matrix product, which results in short real-time processing [6].

Table 1. Fuzzy inference for detection
              Δri(k) = ri(k) - ri(k-1)
 ri(k)        N                Z                P
 NN           0/OK+1/AL        0/OK+1/AL        0.2/OK+0.8/AL
 N            0/OK+1/AL        0.4/OK+0.6/AL    0.6/OK+0.4/AL
 Z            0.8/OK+0.2/AL    1/OK+0/AL        0.8/OK+0.2/AL
 P            0.6/OK+0.4/AL    0.4/OK+0.6/AL    0/OK+1/AL
 PP           0.2/OK+0.8/AL    0/OK+1/AL        0/OK+1/AL
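The sketch below evaluates the rule base of Table 1 with product and sum as fuzzy operators, written with explicit sums rather than the matrix product mentioned above. The fuzzification of the residual and of its variation is assumed to be given (here as normalized membership degrees); this is an illustration, not the implementation of [6].

```python
# Sketch of the detection inference of Table 1 with product/sum operators.

RULES = {  # (residual label, variation label) -> degree of AL in the conclusion
    ("NN","N"):1.0, ("NN","Z"):1.0, ("NN","P"):0.8,
    ("N","N"):1.0,  ("N","Z"):0.6,  ("N","P"):0.4,
    ("Z","N"):0.2,  ("Z","Z"):0.0,  ("Z","P"):0.2,
    ("P","N"):0.4,  ("P","Z"):0.6,  ("P","P"):1.0,
    ("PP","N"):0.8, ("PP","Z"):1.0, ("PP","P"):1.0,
}

def detect(mu_r, mu_dr):
    """mu_r: membership degrees of r(k) on {NN,N,Z,P,PP};
    mu_dr: membership degrees of Δr(k) on {N,Z,P}.
    Returns the aggregated (OK, AL) conclusion using product and sum."""
    al = sum(mu_r[i] * mu_dr[j] * RULES[(i, j)] for i in mu_r for j in mu_dr)
    ok = sum(mu_r[i] * mu_dr[j] * (1.0 - RULES[(i, j)]) for i in mu_r for j in mu_dr)
    return ok, al

# Residual mostly positive, variation mostly positive -> high alarm degree
print(detect({"NN":0,"N":0,"Z":0.2,"P":0.8,"PP":0}, {"N":0,"Z":0.3,"P":0.7}))
```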
Two kinds of decisions have been proposed to aggregate the partial conclusions S_1, ..., S_M over a time window of M sampling periods: a robust decision D^- and a sensitive decision D^+:

D^-(OK) = H^+(S_1(OK), ..., S_M(OK))   (4)
D^-(AL) = H^-(S_1(AL), ..., S_M(AL))   (5)
D^+(OK) = H^-(S_1(OK), ..., S_M(OK))   (6)
D^+(AL) = H^+(S_1(AL), ..., S_M(AL))   (7)

where H^+ is a disjunction (max for instance) and H^- is a conjunction (min for instance). D^-(OK) (respectively D^+(OK)) represents the maximal (minimal) bound for the normal state, and D^+(AL) (respectively D^-(AL)) represents the maximal (minimal) bound for the alarm state.
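A possible reading of Equations (4)-(7) as code: over the last M partial conclusions, the robust decision aggregates with max on OK and min on AL, while the sensitive decision does the converse. The max/min assignment below follows the reconstruction above and should be taken as a sketch, not as the authors' implementation.

```python
# Sketch of the two window aggregations over the last M detection conclusions.
# S is a list of (OK, AL) pairs produced by the detection step.

def aggregate(S):
    ok_vals = [ok for ok, _ in S]
    al_vals = [al for _, al in S]
    robust    = (max(ok_vals), min(al_vals))   # D-: slow to raise an alarm
    sensitive = (min(ok_vals), max(al_vals))   # D+: quick to raise an alarm
    return robust, sensitive

window = [(0.9, 0.1), (0.4, 0.6), (0.2, 0.8)]
print(aggregate(window))  # -> ((0.9, 0.1), (0.2, 0.8))
```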
4 Causal Reasoning for Fault Isolation

After fault detection, potential causes have to be isolated. Causality occupies a central position in human cognition. Informal descriptions of the real world of the form "A causes B" are exceedingly common, and the AI community has been working for a long time on representations of causality. Causal descriptions are the source of various reasoning modes: B can be predicted or explained using A, and A can be deduced from B. Causality plays an essential role in human decision-making [7]. Diagnosis is also a causal process, because it consists in designating the faulty components that have caused, and can explain, the observed malfunctions.

A causal structure is a qualitative description of the effect or influence that system entities (variables, faults, etc.) have on other entities. It may be represented by a directed graph (digraph). A causal graph represents a process at a high level of abstraction. In the logical theory of abductive diagnosis, diagnosis is formalized as reasoning from effects to causes. Causal knowledge is represented as logical implications of the form causes → effects, where the causes are usually abnormalities or faults. The pieces of causal knowledge can be organized in a directed graph. This abductive type of reasoning contrasts with deductive reasoning from causes to effects. Causality, assimilated to calculability, has also been used to represent the physical behaviour of a system.

Influence graphs are another type of causal approach to diagnosis. The graph nodes represent the system variables; the directed arcs symbolize the normal relations among them (see Fig. 2 for an example). Influence graphs avoid fault modelling, which could be unfeasible in the case of a complex system. They provide a tool for reasoning about the way in which normal or abnormal changes propagate, and they are suitable for physical explanations of the dynamical evolution of variables, whether normal or abnormal. No a priori assumption is made about the type of relations labelling each arc: they can be qualitative or quantitative. The simplest influence graph structure is the signed digraph (SDG), whose branches are labelled by signs: "+" (or "-") when the variables at each end of the arc have the same (or opposite) trends.

All influence-graph-based diagnostic methods implement the same basic principle. The objective is to account for the deviations detected in the evolution of the variables with respect to the normal behaviour, using a minimum of malfunctions at the source. Malfunctions can be related to physical components, so as to obtain a minimal diagnosis. If significant deviations are detected, primary faults, directly attributable to a failure or an unmeasured disturbance, are hypothesized. The propagation paths in the directed graph are analyzed to determine whether each fault hypothesis is sufficient to account for the secondary faults resulting from its propagation in the process over time. The algorithm is a backward/forward procedure starting from an inconsistent variable. The backward search bounds the fault space by eliminating the normal measurements causally upstream. Then each possible primary deviation generates a hypothesis, which is forward tested using the states of the variables and the functions of the arcs. Arcs can be labelled with dynamic quantitative relations, which is justified by the diagnostic needs of industrial plants [8].
This leads to a combined method for diagnosis that takes advantage of the precision of FDI fault indicators because it uses a quantitative model. Simultaneously, it benefits from the logical soundness of AI
approaches through the use of a causal structure that supports the diagnostic reasoning [9,10]. This method has been applied successfully to an industrial process during the European project CHEM [11].
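The backward/forward isolation principle can be sketched on a toy influence graph; the graph, the variable names and the observations below are invented for illustration and are far simpler than the quantitative, dynamic causal models used in [8-11].

```python
# Toy sketch: candidate primary faults are deviated variables with no deviated
# causal ancestor (backward step); a candidate is kept only if its forward
# propagation covers all observed deviations (forward step).

GRAPH = {            # edges: cause -> effects
    "valve": ["level1"],
    "level1": ["flow1", "level2"],
    "level2": ["flow2"],
    "flow1": [], "flow2": [],
}
PARENTS = {v: [u for u, succ in GRAPH.items() if v in succ] for v in GRAPH}

def reachable(node):
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        for m in GRAPH[n]:
            if m not in seen:
                seen.add(m); stack.append(m)
    return seen

def isolate(deviated):
    primaries = [v for v in deviated if not any(p in deviated for p in PARENTS[v])]
    return [v for v in primaries if deviated <= reachable(v) | {v}]

print(isolate({"level1", "flow1", "level2", "flow2"}))  # -> ['level1']
```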
5 Interface

Skilled operators generally take the proper decision when the failure is clearly identified. With current control systems, the execution of actions is not problematic. Their main problem comes from the difficulty of linking the behaviour of correlated variables (e.g. in control loops, or when the temporal delays between the deviations of two behaviours are significant). In addition, when a failure mode corresponds not to an abrupt change but to a slow drift, they can hardly detect the fault. The several kinds of reasoning presented in the previous sections are well adapted to a cooperative interface allowing quick understanding of the process state.
[Fig. 2 screenshot: interface views labelled Synoptic, Graph, Detection, Isolation, Propagation]
Fig. 2. Interface: detection (upper part); causal graph for isolation (lower part)
In the interface screen presented in Fig. 2, each line is associated with a variable, and each column is associated with a sampling time. Columns close to the left side represent earlier time samples. The information display is inspired by the concept of mass data display. Each decision (detection, isolation) is associated with a colour. The results of fuzzy detection and isolation reasoning are defuzzified into an index associated with a colour shade. In the detection view, the detection decisions are defuzzified into an index of a colour map from green (normal state) to red (faulty state). In the isolation view, the decisions are defuzzified into an index of a colour map from blue (upstream fault) to red (local fault). The causal graph representation is also well adapted to an interface explaining the source of the observed faults (Fig. 2). The output of the isolation reasoning is displayed in the graph in the following way. If the node is not red, the variable is in a normal state. If the contour of the node is red, the alarm corresponds to a secondary fault. If the interior of the node is red then the variable is isolated as a primary fault. The fault path is also shown in red. The fault path in the causal graph can be seen as a justification module that explains the alarm chain to the operators.
6 Distributed Diagnosis for Complex Systems

Diagnosis of complex systems, controlled through networks of controllers or automata, poses interesting new problems. Each automaton or controller is connected to a particular subsystem and must communicate with the others because the components are physically connected, and the control system must take these interconnections into account. This is especially important when a fault occurs, because the fault will propagate along the process and a purely local diagnosis could lead to a wrong result. The theoretical problem is related to the distribution of diagnosis between several diagnosers and to their communication (which information must they share?). A solution has been studied in the framework of the European project MAGIC [12]. MAGIC proposes a general-purpose architecture and a set of tools to be used for the detection and diagnosis of incipient or slowly developing faults, for early identification of faulty conditions, for predictive maintenance and for system reconfiguration. MAGIC relies on a distributed architecture based on a Multi-Agent Multi-Level (MAML) concept. Agents run on the same or on separate computer platforms (semantically and spatially distributed elements). Various agents, each responsible for a specific task, interact while diagnosing a plant. Agents are dedicated to data acquisition, symptom generation for various components, global diagnostic decision and operator support. Communication between different agents and between different agent layers is based on CORBA client-server technology. Faults can be identified by applying different fault detection algorithms, implemented in different diagnostic agents [13]. One diagnostic agent is able to apply the same detection algorithm to different components, each configured individually. A diagnostic agent provides a symptom (value in [0..1] where 0 stands for "behaviour similar to the model") and an evaluation of the reliability of the model used for detection (value in [0..1] where 1 stands for fully valid). A diagnostic decision agent centralizes all the symptoms. Its objective is to provide a final decision about the components' state, integrating symptoms covering different overlapping subsystems
or merging different symptoms about the same sub-system, using a signature table or fuzzy logic. This project is just a first step towards a generic distributed diagnostic solution. Industrial solutions should adopt standardized agent systems.
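A diagnostic decision agent of the kind just described could merge symptoms weighted by model reliability along the following lines (a hedged sketch: the weighted-average rule, the threshold and the component names are illustrative assumptions, not the MAGIC implementation).

# Hedged sketch of a diagnostic decision agent merging symptoms from several
# detection agents.  Symptom and reliability conventions follow the text
# (symptom 0 = behaviour similar to the model; reliability 1 = fully valid model);
# the merge rule itself is an illustrative assumption.

def merge(symptoms):
    """symptoms: list of (symptom, reliability) pairs for one component.
    Returns a single confidence that the component is faulty."""
    weighted = [s * r for s, r in symptoms]
    support = sum(r for _, r in symptoms)
    return sum(weighted) / support if support else 0.0

readings = {
    "pump":  [(0.9, 0.8), (0.7, 0.5)],   # two agents suspect the pump
    "valve": [(0.1, 0.9), (0.2, 0.6)],   # the valve looks healthy
}

for component, obs in readings.items():
    fault_degree = merge(obs)
    state = "faulty" if fault_degree > 0.5 else "normal"
    print(f"{component}: {fault_degree:.2f} -> {state}")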
References
1. Isermann R., Fault-Diagnosis Systems, Springer, Berlin Heidelberg, 2006.
2. Gentil S., Montmain J., Hierarchical representation of complex systems for supporting human decision making, Advanced Engineering Informatics, 18/3 (2004) 143-159.
3. de Kleer J., Kurien J., Fundamentals of Model-based Diagnosis, IFAC Symposium Safeprocess 2003, Washington (USA).
4. Venkatasubramanian V., Process Fault Detection and Diagnosis: Past, Present and Future, 4th Workshop On-Line Fault Detection and Supervision in the Chemical Process Industries 2001, Seoul (Korea).
5. Charbonnier S., Garcia-Beltran C., Cadet C., Gentil S., Trends extraction and analysis for complex system monitoring and decision support, Engineering Applications of Artificial Intelligence, 18/1 (2004) 21-36.
6. Evsukoff A., Gentil S., Montmain J., Fuzzy Reasoning in Co-operative Supervision Systems, Control Engineering Practice 8 (2000) 389-407.
7. Montmain J., Gentil S., Causal modelling for supervision, 14th IEEE International Symposium on Intelligent Control/Intelligent Systems and Semiotics, ISIC 1999, Cambridge (USA).
8. Montmain J., Gentil S., Dynamic causal model diagnostic reasoning for on-line technical process supervision, Automatica 36 (2000) 1137-1152.
9. Gentil S., Montmain J., Combastel C., Combining FDI and AI Approaches within Causal-Model-based Diagnosis, IEEE Transactions on Systems, Man and Cybernetics - Part B, 34 (5) (2004) 2207-2221.
10. Cordier M.-O., Dague P., Lévy F., Montmain J., Staroswiecki M., Travé-Massuyès L., Conflicts versus Analytical Redundancy Relations: A comparative analysis of the model-based diagnostic approach from the artificial intelligence and automatic control perspectives, IEEE Transactions on Systems, Man and Cybernetics - Part B, 34 (5) (2004) 1992-2206.
11. Heim, Gentil S., Cauvin S., Travé-Massuyès L., Braunschweig B., Fault diagnosis of a chemical process using causal uncertain model, Workshop PAIS Prestigious Applications of Intelligent Systems, 15th European Conference on Artificial Intelligence ECAI 2002, Lyon (France).
12. Köppen-Seliger B., Marcu T., Capobianco M., Gentil S., Albert M., Latzel S., MAGIC: an integrated approach for diagnostic data management and operator support, IFAC Symposium Safeprocess 2003, Washington (USA).
13. Lesecq S., Gentil S., Exel M., Garcia-Beltran C., Diagnostic Tools for a Multi-agent Monitoring System, IMACS IEEE Multi-Conference CESA 2003, Lille (France).
Fuzzy Ambient Intelligence in Home Telecare
Trevor Martin
Department of Engineering Mathematics, University of Bristol, Bristol BS8 1TR, UK
Telecare is the use of communication and/or sensor technologies to detect remotely the requirements of people in need of medical care or other assistance. Typically, but not exclusively, the users of telecare systems are elderly people who would otherwise need residential or nursing support. There is growing interest in the use of telecare, particularly in countries facing growth in the proportion of elderly people in the population (with consequent increases in care requirements). Both socially and financially, it is generally preferable for the elderly to remain in their own homes for as long as possible. A number of research projects have looked into home-based telecare and telemedicine systems as a way of increasing quality of life for the elderly as well as reducing the cost of care. We can distinguish telemedicine as a subfield of telecare, where the specific aim is to remotely monitor physiological parameters of a person (such as blood sugar levels, blood pressure, etc), whereas telecare is a less specific form of monitoring looking to generate alerts in emergency situations.

Telecare is not a new idea - simple alarms operated by pull-cords or pendants have been available for 30 or more years. We refer to these as first generation systems, typically used as panic-alarms to summon help in the case of a fall or other emergency. Whilst such systems have obvious benefits, they become useless when the user is unable to raise the alarm (e.g. because of unconsciousness) or does not recognise the need to signal an alarm. Second generation telecare systems make use of sensors to detect emergency situations. These sensors can be worn on the body, measuring factors such as respiration, pulse, etc, or can be sited around the home, detecting movement, possible falls, etc. Second generation systems are obviously far more sophisticated than the first generation and may require substantial computation and a degree of ambient intelligence to establish when an emergency situation arises. As with the first generation, an alarm can be triggered to summon help. In both first and second generation systems, the aim is to detect an emergency and react to it as quickly as possible.

Third generation telecare systems adopt a more pro-active approach, giving early warning of possible emergency situations. In order to do this, it is necessary to monitor the well-being of a person, defined in terms of their physical, mental, social and environmental status. By detecting changes in the daily activities of the person, it is possible to detect changes in their well-being which may not be immediately observable, but can be detected over a longer period of time. Advances in sensor design and the continuing increase in processing power make it possible to implement such an ambient intelligence system, and we will describe the implementation of a third generation telecare system which has been tested in homes of elderly clients over a long term (6 to 18 months).
The system is installed within a home as a customised sensor network, able to detect a person's movements and their use of furniture and household items. The sensors are designed to operate discreetly, such that the occupant need not interact directly with any component. We will focus particularly on the intelligent processing which enables the system to take low level data (e.g. kitchen sensor activated; cold water run for 20 seconds; kettle switched on for 60 seconds; fridge door opened) and answer questions such as: is the occupant eating regularly, is the occupant's social interaction increasing or decreasing, has the occupant's sleep pattern changed significantly in the past few months, etc. This is a very substantial inference and learning task, and the nature of the data and queries makes a soft computing approach a natural choice. Initial trials of the system indicate that intelligent data analysis and reasoning enables us to make plausible inferences of this type with a high degree of accuracy.
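As an illustration of the kind of inference involved, daily kitchen events might be aggregated into a fuzzy degree of "eating regularly" roughly as follows (a toy sketch: the membership function, its thresholds and the event counts are invented for illustration and are not parameters of the deployed system).

# Toy sketch only: fuzzy assessment of "eating regularly" from daily counts of
# meal-preparation events derived from kitchen sensors.

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 below a and above d, 1 between b and c."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def eating_regularly(daily_meal_events):
    """Average the daily membership of 'normal number of meal preparations'."""
    memberships = [trapezoid(n, a=1, b=2, c=4, d=6) for n in daily_meal_events]
    return sum(memberships) / len(memberships)

# One week of counts of meal-preparation episodes (kettle + fridge + cooker use).
week = [3, 2, 1, 3, 0, 2, 3]
print(f"degree of 'eating regularly': {eating_regularly(week):.2f}")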
Modeling and Multi-agent Specification of IF-Based Distributed Goal Ontologies
Nacima Mellal, Richard Dapoigny, Patrick Barlatier, and Laurent Foulloy
LISTIC-ESIA, BP 806, University of Savoie, 74016 Annecy, France
{nacima.mellal, richard.dapoigny, patrick.barlatier, Laurent.foulloy}@univ-savoie.fr
Abstract. The concept of service is central in the design of distributed systems. Following this approach, for example, the web is developing web services and grid services. Nowadays, it is essential to take into account the crucial aspects of dynamic services, that is to say their ability to adapt and to be composed in order to complete their task. To this end, the first part of the present paper describes the implementation of a methodology which deals with the automatic composition of services in distributed systems. Each service is related to a goal and is represented by a functional model called an ontology. The model relies on a core reasoning process between interacting functional components of the complex system following the Information Flow (IF) approach. In the second part, we propose an algorithm describing the mechanism of the dynamic composition, building on the first part and using a Multi-Agent System (MAS), where the agents support the functional components of the complex systems.
1 Introduction

Embedded computing systems must offer autonomous capabilities in the often hostile environment in which they operate. Moreover, these systems are becoming more and more complex, involving distributed intelligent components which are able to communicate and to reason about actions. Such systems include command and control systems, industrial process-control plants, automotive systems and data-acquisition and measurement systems, just to name a few. Although these systems are dedicated to design-defined tasks, they must offer some level of intelligence to cope with the dynamic and unpredictable environment with which they interact. A computational model powerful enough to capture the intensional knowledge within distributed systems and expressive enough to serve as a basis for reasoning and planning is required. Building on the teleological assumption, a formal model based on goal structures has been proposed [4]. These structures are elaborated with the mechanisms of Formal Concept Analysis (FCA) during the design phase and result in domain ontologies able to be coordinated at run-time within an Information Flow (IF)-based framework. Web Services foster an environment where complex services that are composed of a number of tasks can be provided. With its modularity and its ability to
support communication standards, the concept of service seems appropriate for the present model, each service being related to a high-level goal. The inter-dependencies between goal ontologies related to each service are deduced from an IF-based process implemented in a group of autonomous agent programs. These agents communicate asynchronously over a network to compose the desired high level goal. In the second section, the semantics for goal fusion is presented through the goal ontology and the process of fusion. The third section describes the major features of the MAS implementation. The fourth section is dedicated to future works and conclusion.
2 The Semantics for Goal Fusion

2.1 The Goal Ontology

In the design of distributed intelligent systems, as well as in the run-time process, the service-based model plays a central role. The proposed approach relates each service with a top-level goal. This approach allows dividing complex system functionality into elementary functional components (externally, a service is seen as a group of sub-services). The design process involves the composition of sub-services, the fusion of services and checking whether the composition and/or fusion fulfill the requirements. Architectures using services are widely applied in the domain of web services or telecommunications and, more recently, in the engineering domain (e.g., automotive applications) [13]. For many engineering applications, the results must be abstracted in terms of the functions (teleological approach). The functional knowledge is an expression of the teleological knowledge on which the system can reason. Any functional concept is described by a goal definition which is related to the intentional aspect of function [11] and some possible actions (at least one) in order to fulfill the intended goal [8] [10]. Inside a given service, the causal decomposition of a high-level goal is exclusively an AND decomposition. It relies on dependencies in the lambda-calculus style. However, the OR decomposition can be expressed at several abstraction levels. The alternative dependencies of a given goal (OR-type goal) are either expressed at the highest level (i.e., the service level, since a single goal is active at a given time inside a given process) or at a finer one by splitting the goal into several actions guarded by pre-conditions (way of achievement). The teleological basis introduced in [3] relates the structure and behavior of the designed system to its goals. The goal modeling requires describing a goal representation (i.e., a conceptual structure) and defining how these concepts are related. For our purpose, a classification of goals is built on the basis of two sorts: i) the goal type, which requires an action verb, a physical role with a set of location types, and ii) a physical context including the physical role with the location where the role has to be considered.

Definition 1. Given R, a finite set of physical roles, and Ψ, the finite set of physical entity types, a physical context type is a tuple ξi = (r, µ(r), ψ1, ψ2, ..., ψµ(r)), where r ∈ R denotes its physical role (e.g., a physical quantity), µ: R → Nat is a function assigning to each role its arity (i.e., the number of physical entity types related to a
given role) and {ψ1, ψ2, ..., ψµ(r)} ⊆ Ψ is a set of entity types describing the spatial location types where the role has to be taken. A similar definition holds for physical context tokens ci by replacing physical entity types with physical entity tokens.

Definition 2. A goal type is a pair (A, Ξ), where A is an action symbol and Ξ is a nonempty set of physical context types.

In [5], since the goal types appear to be components of a hierarchical structure, it is worth describing them by means of a subsumption hierarchy (i.e., a concept lattice). A formal context is built, from which a concept lattice is extracted using the Formal Concept Analysis (FCA) tool. Pruning techniques based both on standard FCA-based techniques [7] and applicative works [3] extract the goal hierarchy in which each node is a concept labeled with the appropriate goal. Using the results of FCA about hierarchies of conceptual scales together with a definition of ontology, we produce a domain ontology of goal types. This ontology reflects the intentional knowledge concerning the interaction of a given service with the physical system. Unlike hierarchies, ontologies have a richer conceptual structure and can be accessed through well-defined standard languages such as OWL-S. The nature of a primitive goal relationship is inferable from the physical or control equations where variables vi are inter-connected through their physical contexts ξi.
Definition 3. A goal γi influences functionally γj iff the only way to achieve γj is to have already achieved γi, with the notation:

γi < γj    (1)
An important consequence is that the different levels of abstraction are able to cope with the complexity of the system. Let us develop an example extracted from [5] which represents the concept lattice of a system controlling the velocity and the level of a water channel and having two input sensors for pressure measurement and an actuator able to regulate the water level. In addition, the system is able to communicate through a network with other similar systems. The atomic goal types for this system are:

γ1 = ({to acquire}, {(pressure, 1, liquid volume)})
γ2 = ({to compute}, {(velocity, 1, channel part)})
γ3 = ({to compute}, {(level, 1, channel part)})
γ4 = ({to send}, {(velocity, 1, channel part), (level, 1, channel part)})
γ5 = ({to receive}, {(velocity, 1, channel_part), (level, 1, channel_part)})
γ6 = ({to compute}, {(level, 2, {channel part, channel part})})
γ7 = ({to compute}, {(offset, 1, actuator)})
γ8 = ({to receive}, {(offset, 1, actuator)})
γ9 = ({to act upon}, {(position, 1, actuator)})
The three goal types (static services) related to the services (i.e., the hierarchies), namely Γ1, Γ2 and Γ3, respectively describe a measurement service, a control service and a manual service. For each goal, the user must specify in a design step the appropriate service, including local influences (see Fig. 1).
Γ1 =({to measure},{(velocity,1,channel_part),(level,1,channel_part)}) Γ2 =({to control},{(level, 1, channel part)}) Γ3 =({to manually-Move},{(position, 1, actuator)})
The ontological definition includes concepts (goal types) together with basic ontological axioms. Goal types are the basic concepts and their relations are defined through three core notions: the functional part-of ordering relation, the overlap between goals, and the sum (fusion) of goal types. The first relation concerns the partial order notion. In the spirit of the typed λ-calculus we will consider only the indirect functional part-of, for which transitivity holds. Therefore, the functional part-of relation on concepts, denoted as ⊆Γ, is reflexive, anti-symmetric and transitive. The overlap is referred to by the binary relation O between goal types and specified with the following ontological axiom:
γi O γj ≡ ∃γk (γk ⊆ γi ∧ γk ⊆ γj)    (2)
Definition 4. A functional part-of hierarchy F is described by the following tuple: F = (Γ, ⊆ Γ, O, +), where Γ is a finite set of goal types, ⊆ Γ is a partial order on Γ, O is the overlap relation, and +, the fusion relation on Γ.
Fig. 1. The Goal Ontologies
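Before turning to the fusion process, a minimal data-structure sketch may help fix ideas about Definitions 1-4 (purely illustrative: the Python representation, the frozenset encoding of physical contexts and the explicitly declared part-of pairs are our assumptions, not the authors' formalisation).

# Illustrative sketch only: goal types as (action, physical contexts) pairs, with an
# explicitly declared functional part-of relation from which overlap is derived.

from itertools import product

# Physical context type: (role, arity, entity types); goal type: (action, contexts).
g_acquire = ("to acquire", frozenset({("pressure", 1, ("liquid volume",))}))
g_vel     = ("to compute", frozenset({("velocity", 1, ("channel part",))}))
g_level   = ("to compute", frozenset({("level", 1, ("channel part",))}))
g_measure = ("to measure", frozenset({("velocity", 1, ("channel part",)),
                                      ("level", 1, ("channel part",))}))

# Declared functional part-of pairs (part, whole); the closure adds reflexivity and
# transitivity so that the relation behaves as the partial order of Definition 4.
declared = {(g_acquire, g_vel), (g_acquire, g_level),
            (g_vel, g_measure), (g_level, g_measure)}
goals = {g_acquire, g_vel, g_level, g_measure}

def part_of_closure(pairs, universe):
    rel = set(pairs) | {(g, g) for g in universe}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(rel, rel):
            if b == c and (a, d) not in rel:
                rel.add((a, d))
                changed = True
    return rel

PART_OF = part_of_closure(declared, goals)

def overlaps(gi, gj):
    """Overlap O: some goal type is a functional part of both gi and gj."""
    return any((g, gi) in PART_OF and (g, gj) in PART_OF for g in goals)

print(overlaps(g_vel, g_level))           # True: both share 'to acquire' as a part
print((g_acquire, g_measure) in PART_OF)  # True by transitivity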
2.2 The Fusion Process of Goal Ontologies

As a crucial topic, information exchange between functional hierarchies must occur in a semantically sound manner. Major works stem from the idea that a classification of information (types versus tokens) must exist in each of the components of a distributed system [1], [9]. Of particular relevance for us is the work in Information Flow (IF) and its application to context-dependent reasoning. Therefore, we follow the IF mathematical model which describes the information flow in a distributed system. It is based on the understanding that the information flow results from spatio-temporal connections of event tokens in the system. Classifications are connected via infomorphisms. The basic construct of channel theory is that of an IF channel
between two IF classifications, which models the information flow between components. Local logics express physical constraints through Gentzen sequents. Therefore there is a need to consider distributed IF logics of IF channels. Recent IF-based works such as [14], which tackle the ontology coordination problem, have demonstrated that IF channels can accommodate ontological descriptions. IF theory describes how information can flow through channels to convey new information under first order logic. Reasoning about goals is based on their extensional properties, i.e., on their physical context tokens, namely ci. The semantic integration of goal types from separate systems is achieved through a process including several steps. This process, described in [8], uses IF classifications where the classification relation is in fact a subsumption relation. We have extended this work to goal hierarchies where the classification relation expresses the functional dependency. The reader is supposed to be familiar with the IF formalism and notations (for a deeper understanding of channel theory, see [1]). Given multiple functional hierarchies Fi(0) ... Fj(k) in a distributed application, where 0 stands for the requesting computing system, k for the remote one, and i, j, for the respective services Si, Sj on these systems, an IF theory can be computed on the core resulting from the disjoint union of hierarchies. The semantic interoperability of hierarchies relies on the IF theory expressing how the different types from classifications are logically related to each other. For example, a sequent like γq(k)├ γp(0) with γp(0) ∈ Fi(0) and γq(k) ∈ Fj(k) represents the functional dependency between goal types which reflects how the tokens of different hierarchies (i.e., contexts) are interconnected. The associated core logic is distributed on the sum of each node (co-product) and extracts all goal sequents between distributed services provided that they share some physical context(s) [6]. Let us begin with a service Si on system 0 which must achieve the purpose referred to by its related goal γi and its functional hierarchy Fi(0). Unfortunately, to complete this goal, the service requires other goal(s) able to produce a set of physical contexts ci, with 1 ≤ i ≤ n. In addition, the remote service on system k we are looking for must include goal type(s) such as γq(k) = ({to_send}, ξi) where ci is of type ξi, and γr(k)├ ... since it must satisfy additional local constraints. The objective of the composition process is to demonstrate that the requesting system (i.e., Fi(0)) is able to automatically select the appropriate goal from the remote system (i.e., Fj(k)) to complete its task (in the case of different ontologies, it requires first an ontology alignment). From the goal hierarchies, we can derive the IF theories for each potential service. The theory for each service is divided into two parts, a set Σh covering syntactical constraints with respect to the ontological relations (see Def. 4) and a set Σs describing semantic constraints within a given application. The goal types are classified on the basis of their physical context (extensional). In order to construct the IF channel, we must first extract from each potential hierarchy the physical context that matches the required pattern. This equivalence can be formalized with a classification M such that |typ(M)| = n.
The types {α, β, ...} of classification M represent the common domain of physical contexts (i.e., the partial alignment) and all their possible tokens. To relate the physical context classification with the goal classification it is useful to introduce
the flip of goal classifications, where flipping amounts to interchanging rows and columns. In order to satisfy the context correspondences, the flips of classifications Fi(0)⊥, ..., Fj(k)⊥ are introduced, which give rise to the respective couples of infomorphisms ζi(0), ..., ζj(k). Each pair of channels Fi(0)⊥ - M - Fj(k)⊥ captures the information flow of the context alignment. The partial alignment of physical contexts is performed through the classification M, where relations such as ζi(0)∧(α) = cp(0) and ζj(k)∧(α) = cq(k) mean that cp(0) and cq(k) represent identical types in the respective classifications Fi(0)⊥ and Fj(k)⊥. Given the flips of classifications and the type functions ζi(0)∧ and ζj(k)∧, we are able to express the token functions ζi(0)∨, ζj(k)∨ since they must verify the fundamental property of infomorphisms (see [1]). To express how the high level goal type from the classification Fi(0) can be related to a classification Fj(k), we introduce the conjunctive power for the flips Fi(0)⊥ and Fj(k)⊥, namely ∧Fi(0)⊥ and ∧Fj(k)⊥. The conjunctive power classifies goal types to sets of physical contexts whenever their entire context is in the set. This gives rise to the conjunction infomorphisms:

κi(0): Fi(0)⊥ → ∧Fi(0)⊥ and κj(k): Fj(k)⊥ → ∧Fj(k)⊥

Finally, the information flow between the requesting classification (i.e., Fi(0)) and potential candidate classifications (i.e., Fj(k)) is captured by the colimit of the distributed system, that is the resulting classification C with the pair of infomorphisms:

gi(0): ∧Fi(0)⊥ → C and gj(k): ∧Fj(k)⊥ → C

The types of the classification in C are elements of the disjoint union of types from ∧Fi(0)⊥ and ∧Fj(k)⊥ and the tokens are pairs of goal types (γp(0), γq(k)). A token of ∧Fi(0)⊥ is connected to a token of ∧Fj(k)⊥ to form the pair (γp(0), γq(k)) iff ζi(0)∨(γp(0)) and ζj(k)∨(γq(k)) are of the same type in M. The IF classification on the core results from the distributed classifications. It captures both goal capabilities and the identification of some physical contexts. The sequents of the IF theory on the core are restricted to those sequents which respect
Fig. 2. Construction of the core channel from classifications
goal capabilities and physical contexts identifications. Given the logic Log(C)=L on the core C, the distributed logic DLogC (L) on the sum of goal hierarchies Fi(0) + Fj(k) is the inverse image of Log(C) on this sum. The logic is guaranteed to be sound on those tokens of the sum that are sequences of projections of a normal token of the logic in C. In other words, the inverse image of IF logic in C is the result of the coproduct of Fi(0) and Fj(k) with the morphism [fi(0) ° gi(0), fj (k) ° gj (k) ]-1. We obtain sequents like γp(0) , γq(k) relating goal(s) on remote systems to the local goal(s). From here it is straightforward to extend goal dependencies to dependencies between higher-level goal, and finally between distributed services. A scenario-based example with a full description of this process is available in [5].
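For readers less familiar with channel theory, the fundamental property of an infomorphism that the token functions above must verify can be illustrated on a toy example (the classifications and mappings below are invented for illustration and are unrelated to the paper's goal hierarchies).

# Toy sketch of an IF classification and the fundamental property of an
# infomorphism f = (f_up, f_down): f_down(b) |=_A t  iff  b |=_B f_up(t).

# A classification is (tokens, types, incidence) with incidence a set of (token, type) pairs.
A = ({"c1", "c2"}, {"level", "velocity"},
     {("c1", "level"), ("c2", "velocity")})
B = ({"g1", "g2"}, {"alpha", "beta"},
     {("g1", "alpha"), ("g2", "beta")})

f_up = {"level": "alpha", "velocity": "beta"}   # types of A -> types of B
f_down = {"g1": "c1", "g2": "c2"}               # tokens of B -> tokens of A

def is_infomorphism(A, B, f_up, f_down):
    tok_a, typ_a, inc_a = A
    tok_b, typ_b, inc_b = B
    return all(((f_down[b], t) in inc_a) == ((b, f_up[t]) in inc_b)
               for b in tok_b for t in typ_a)

print(is_infomorphism(A, B, f_up, f_down))   # True for this toy example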
3 The Goal Fusion with MAS

3.1 The MAS Features

Agent-based systems are of increasing importance. They are regarded as a new paradigm enabling an important step forward in empirical sciences, technology and theory. Cooperative agents can be equipped with different properties which do not all appear to be necessary or useful in every application, such as autonomous behavior, cooperative ability, and intelligent or emergent behavior. Multi-Agent System (MAS) models are considered as programming paradigms as well as implementation models for complex information processing systems. More precisely, in distributed systems the MAS paradigm is often useful, since each agent is able to locally process data and exchange only high-level information with other parts. Savings in bandwidth and transmission times are important features of such an architecture. In this paper, we will exploit the high-level and dynamic nature of multi-agent interactions, which is appropriate for open systems where the constituent components and their interaction patterns are continuously changing. The IF-based mechanism of searching for goal dependencies is typically that of distributed problem solving systems, where the component agents are explicitly designed to cooperatively achieve a given goal. Modularity is achieved by delegating control to autonomous components which can better handle the dynamics of the complex environment and can reduce their interdependencies.

3.2 The MAS Implementation

A variety of architectures are available for multi-agent systems [16]. The method described in §2.2 is implemented through the BDI formalism [12], each agent having its own knowledge base. This architecture exhibits deliberative reasoning (feed forward, planning), while the version used here includes a modified loop reflecting the teleological nature of the formal model. Unlike the classical BDI model where goals are deduced from beliefs, we derive beliefs from goals. More precisely, the desires are represented by the global goal that the agent is committed to achieve, that is, its related goal ontology. The IF-based process acquires goal dependencies from other agents. As a consequence, the belief set is represented by the set of influencing goals, since it is an expression of the teleological knowledge. The ongoing work aims at mapping goals onto computational resources and we must consider the temporal
constraints, which is not the case in most planning models, and the problem becomes more complex. This knowledge reflects the intentions, that is the intended sequences of actions in order to achieve the above sequence of goals. The agent implementation of the dependency search is detailed below in the algorithm and summarized in Fig. 3.
Begin: 'to achieve service i' in System (0)
If (i is not achievable locally), Then
    Agent0 (A0) gets the ontology (Fi(0)) of i.
    A0 derives the IF theory of i.
    A0 forms the IF classification of i.
    A0 extracts the physical context ci and generates M.
    A0 broadcasts a request to all agents working with typ(ci).
    If (no agent answers) Then service aborted
    Else
        A0 identifies the candidate agents Ak.
        A0 builds a classification.
        A0 computes the flip of Fi(0): Fi(0)⊥, and ∧Fi(0)⊥.
        Ak computes the flip of Fj(k): Fj(k)⊥ and ∧Fj(k)⊥.
        A0 merges the results of Ak, builds the classification C.
        // Tokens of C are pairs of goal types (γp(0), γq(k)) and types are elements of the
        // disjoint union of types from ∧Fi(0)⊥ and ∧Fj(k)⊥
        If γp(0), γq(k) are of the same type in M, Then
            A0 computes Log(C), connects (γp(0), γq(k)), and deduces the distributed logic.
            A0 applies the constraints Σs for goal selection.
End: Fusion of the ontologies of the local goal with the remote one.
Fig. 3. The Fusion Algorithm
The algorithm complexity depends on the significant phases. The context analysis and alignment is not significant. The computation of classifications is O(n·K(p)), with n the number of goal types and p the number of physical contexts in a service. The value of K(p) results from the conjunctive power of classifications and we obviously have

K(p) = Σ_{p'=1}^{p} C(p, p'),

where C(p, p') denotes the binomial coefficient. It is also trivial to see that

K(p) = Σ_{p'=1}^{p} C(p, p') < Σ_{p'=0}^{p} C(p, p') = 2^p.

If we consider the same size orders for the classifications, the core classification computation requires in the worst case O(n² × 2^(p+1)). This term is the upper limit of the computation time, therefore the overall complexity can be reduced to O(n² × 2^p). Since the number p of physical contexts in a service is the dominant factor (O(n²) ⊂ O(2^n)), the algorithm can be optimized (in complexity) by dividing big services into smaller ones and processing each of them on different computing systems (parallelization).
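A quick numerical check of this bound (purely illustrative):

# Numerical illustration of the bound K(p) = sum_{p'=1}^{p} C(p, p') < 2^p.
from math import comb

for p in range(1, 8):
    K = sum(comb(p, k) for k in range(1, p + 1))
    print(f"p = {p}:  K(p) = {K}  <  2^p = {2 ** p}")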
4 Conclusion

This paper presented a formal process to compose goal ontologies in a sound manner, with major guidelines for the implementation of related teleological agents. The goal composition can occur either at run-time between distributed systems, in order to find goal dependencies, or in the design step as a modular tool where the user composes high-level goals from primitive goals. While other models, such as qualitative ones (either causal models or abstraction hierarchies), generate spurious solutions, the IF-based approach avoids this problem through the use of sound distributed logics. Such a process must not be considered as a simple pattern matching process, since the resulting goal dependencies must respect the sum of the local logics on both the syntactic and the semantic level. The agent specification is meant to be simple enough to be used in the development of reusable and maintainable agent architectures for agent-oriented systems in engineering environments. The notion of dependence between agents is a challenging problem [2]. Some authors have proposed a graph structure to formalize the relationships between agents [15]. In this work, the IF-based approach tackles the problem of building these dependencies from distributed logics. The dependencies are dynamic and their types are AND-based. Concerning the ongoing work, the behavioral model based on the Event Calculus maps each goal to possible actions guarded with pre-conditions (we clearly separate goals from their way of achievement). The most suitable architecture for this step is a hybrid architecture combining deliberative and reactive reasoning. The reactive part is necessary since it allows re-planning in case of failure during the execution phase. We plan the MAS implementation with the Cognitive Agent Architecture (Cougaar), which offers mechanisms for building distributed agent systems and provides a rich variety of common services to simplify agent development and deployment. Cougaar seems a good compromise.
References
1. Barwise J., Seligman J.: Information Flow. Cambridge Tracts in Theoretical Computer Science, 44, Cambridge University Press (1997)
2. Castelfranchi C., Cesta A., Miceli M.: Dependence Relations in Multi-agent Systems. In: Y. Demazeau and E. Werner (eds.), Decentralized AI, Elsevier (1992)
3. Dapoigny R., Benoit E., Foulloy L.: Functional Ontology for Intelligent Instruments. Foundations of Intelligent Systems, LNAI 2871, Springer (2003) pp. 88-92
4. Dapoigny R., Barlatier P., Benoit E., Foulloy L.: Formal Goal Generation for Intelligent Control Systems. 18th International Conference on Industrial & Engineering Applications of Artificial Intelligence & Expert Systems, LNAI 3533, Springer (2005) pp. 712-721
5. Dapoigny R., Barlatier P., Mellal N., Benoit E., Foulloy L.: Inferential Knowledge Sharing with Goal Hierarchies in Distributed Engineering Systems. Procs. of IIAI'05, Pune (India) (2005)
6. Dapoigny R., Barlatier P., Benoit E., Foulloy L.: Formal Goal Generation for Intelligent Control Systems. 18th International Conference on Industrial & Engineering Applications of Artificial Intelligence & Expert Systems, LNAI 3533, Springer (2005) pp. 712-721
7. Dapoigny R., Barlatier P., Mellal N., Benoit E., Foulloy L.: Goal Integration for Service Inter-operability of Engineering Systems. Int. Conf. on Conceptual Structures (ICCS 2005), Kassel, Germany, July 2005, pp. 201-202
8. Ganter B., Wille R.: Formal Concept Analysis - Mathematical Foundations. Springer (1999)
9. Hertzberg J., Thiebaux S.: Turning an Action Formalism into a Planner: A Case Study. Journal of Logic and Computation, 4 (1994) pp. 617-654
10. Kent R.E.: Distributed Conceptual Structures. The Relational Methods in Computer Science, LNCS 2561 (2002) pp. 104-123
11. Lifschitz V.: A Theory of Actions. Procs. of the Tenth International Joint Conference on Artificial Intelligence, Morgan Kaufmann (1993) pp. 432-437
12. Lind M.: Modeling Goals and Functions of Complex Industrial Plant. Journal of Applied Artificial Intelligence 8 (1994) pp. 259-283
13. Rao A.S., Georgeff M.P.: BDI Agents: From Theory to Practice. In: Procs. of the 1st Int. Conf. on Multi-agent Systems (ICMAS'95) (1995) pp. 312-319
14. Schätz B.: Towards Service-based Systems Engineering: Formalizing and Checking Service Specifications. Tech. Report TUMI-0602, München (2002)
15. Schorlemmer M., Kalfoglou Y.: Using Information-flow Theory to Enable Semantic Interoperability. In: 6e Congres Catala en Intelligencia Artificial, Palma de Mallorca, Spain (2003)
16. Sichman J.S., Conte R.: Multi-Agent Dependence by Dependence Graphs. Procs. of AAMAS'02 (2002) pp. 483-490
17. Zambonelli F., Jennings N.R., Wooldridge M.: Developing MultiAgent Systems: The Gaia Methodology. ACM Transactions on Software Engineering and Methodology, 12(3) (2003) pp. 317-370
Agent-Based Approach to Solving Difficult Scheduling Problems
Joanna Jędrzejowicz 1 and Piotr Jędrzejowicz 2
1 Institute of Mathematics, Gdańsk University, Wita Stwosza 57, 80-952 Gdańsk, Poland
[email protected]
2 Department of Information Systems, Gdynia Maritime University, Morska 83, 81-225 Gdynia, Poland
[email protected]

Abstract. The paper proposes a variant of the A-Team architecture called PLA-Team. An A-Team is a problem solving architecture in which the agents are autonomous and co-operate by modifying one another's trial solutions. A PLA-Team differs from other A-Teams with respect to the strategy of generating and destroying solutions kept in the common memory. The proposed PLA-Team performance is evaluated through computational experiments involving benchmark instances of two well-known combinatorial optimization problems - flow shop and job-shop scheduling. Solutions generated by the PLA-Team are compared with those produced by state-of-the-art algorithms.
1 Introduction
Recently, a number of agent-based approaches have been proposed to solve different types of optimization problems [8]. One of the successful approaches to agent-based optimization is the concept of A-Teams. An A-Team is composed of simple agents that demonstrate complex collective behavior. The A-Team architecture was originally developed by Talukdar [13]. A-Teams have proven to be successful in addressing hard optimization problems where no dominant algorithm exists. Within the A-Team multiple agents achieve an implicit cooperation by sharing a population of solutions. The design of the A-Team architecture was motivated by other architectures used for optimization including blackboard systems and genetic algorithms. In fact, the A-Team infrastructure could be used to implement most aspects of these other architectures. The advantage of the A-Team architecture is that it combines a population of solutions with domain specific algorithms and limited agent interaction. In addition, rich solution evaluation metrics tend to result in a more diverse set of solutions [9]. According to [13] an A-Team is a problem solving architecture in which the agents are autonomous and co-operate by modifying one another's trial solutions. These solutions circulate continually. An A-Team can be also defined as a set of
agents and a set of memories, forming a network in which every agent is in a closed loop. An asynchronous team (A-Team) is a strongly cyclic computational network. Results are circulated through this network by software agents. The number of agents can be arbitrarily large and the agents may be distributed over an arbitrarily wide area. Agents cooperate by working on one another’s results. Each agent is completely autonomous (it decides which results it is going to work on and when). Results that are not being worked on accumulate in common memories to form populations. Randomization (the effects of chance) and destruction (the elimination of weak results) play key roles in determining what happens to the populations. A-Team architecture allows a lot of freedom with respect to designing procedures for communication between agents and shared memories as well as creation and removal of individuals (solution) from common memories. In this paper an approach to construct a team of agents, further on referred to as a PLA-Team, is proposed and validated. The approach makes use of the idea which has been conceived for the population learning algorithm. An effective search for a solution of computationally hard problems requires a cocktail of methods applied to a population of solutions, with more advanced procedures being applied only to more promising population members. Designing and implementing a PLA-Team is seen as an extension of the range of available agent-based optimization tools. In Section 2 of the paper, a PLA-Team architecture is described and some details of the proposed approach are given. In Sections 3 and 4 experiment results are reported. The approach has been used to solve benchmark instances of the two well known and difficult combinatorial problems: the flow shop and job-shop scheduling. The solutions obtained by the proposed PLA-Team are compared with the performance of state-of-the-art algorithms including parallel ones. The conclusions contain an evaluation of the proposed approach and some suggestions for future research.
2 The Proposed Agent-Based Approach
The proposed agent-based approach is a variant of the A-Team architecture. Its origin can be traced to the idea of the population learning algorithm (PLA) and hence it is further on called the PLA-Team. The population learning algorithm was proposed in [4] as yet another population-based method, which can be applied to support solving difficult decision-making and optimization problems. Thus far PLA has been effectively applied to solving a variety of difficult scheduling problems. The algorithm has proven successful in finding better lower bounds than previously known for numerous instances of benchmark problems maintained in the OR-LIBRARY (http://people.brunel.ac.uk/~mastjjb/jeb/info.html) including, for example, permutation flowshop scheduling, due-date scheduling or task scheduling on a single machine with total weighted tardiness as a criterion [5], [6]. The algorithm also proved successful in the domain of ANN training [3], producing high quality neural networks in a competitive time. The computational
intelligence embedded within a population learning algorithm scheme is based on the following heuristic rules:
– To solve difficult computational problems use agents applying a cocktail of methods and techniques including random and local search techniques, greedy and construction algorithms, etc., building upon their strengths and masking weaknesses.
– To escape getting trapped in a local optimum generate or construct an initial population of solutions called individuals, which in the following stages will be improved, thus increasing chances for reaching a global optimum. Another means of avoiding getting trapped in local optima is to apply, at various stages of the search for a global optimum, some random diversification algorithms.
– To increase effectiveness of searching for a global optimum divide the process into stages, retaining after each stage only a part of the population consisting of "better" or "more promising" individuals. Another means for increasing effectiveness is to use at early stages of the search improvement algorithms with lower computational complexity as compared to those used at final stages.
Although PLA has been producing high quality solutions, its main disadvantage is, in most cases, an extensive demand for computational power resulting in long computation times. To increase efficiency of the approach a parallel version of the PLA was proposed [3]. The parallel PLA is based on the co-operation between the master agent (server), whose task is to manage computations, and a number of agents - autonomous PLA processes - which act in parallel. The following set of rules governs the parallel PLA:
– Master agent defines the number of working agents and the size of the initial population for each of them.
– Each working agent uses the same, described earlier, learning and improvement procedures.
– Master agent activates parallel processing.
– After completing each stage agents inform the master agent about the best solution found so far.
– Master agent compares the received values and sends out the best solution to all agents, replacing their current worst solution.
– Master agent can stop computations if the desired quality level of the objective function has been achieved. Alternatively, computations are stopped after the predefined number of iterations at each PLA stage has been executed.
– Each working agent can also stop computations if the above condition has been met.
The approach suggested in this paper uses a similar parallel architecture with different rules for communication between a variety of learning and improvement
procedures employed to find a solution. The idea is to use identical procedures as in the case of the "classic" PLA, with different agent roles and functionalities. This time an agent is not necessarily a complete PLA implementation but rather a simple local search procedure, a metaheuristic, or even a set of agents, aiming to find a solution to a problem at hand and representing one of the original PLA procedures. The proposed team of agents consists of a number of autonomous and asynchronous agents of n kinds, where n is the number of learning and improvement procedures employed. The number of agents of each kind is not limited and depends on the availability of computational resources. There is, however, an identical number of agents of each kind within a population of agents. An additional element of the proposed system is a common memory which contains a number of individuals representing feasible solutions to the problem at hand. Each agent acts independently executing a sequence of computation steps. A single computation step for an agent includes the following:
– Copying a number of individuals from the common memory, which is represented as a list of randomly generated individuals (feasible solutions). Individuals in the discussed examples are either permutations of jobs or permutations of tasks.
– Executing learning and improvement procedures upon the drawn individuals.
– Returning the improved individuals to the common memory through discarding the worst individuals in the common memory and replacing them with the improved ones.
There is no master agent nor any centralized control. The stopping condition for a group of agents working as a PLA-Team is set beforehand by the user. It could be either an amount of time the agents are allowed to work or a number of common memory calls performed by a selected agent. It should be noted that within the proposed approach, the main idea of PLA, which is using more computationally complex learning/improvement procedures as the number of individuals within the population of solutions decreases, is indirectly preserved. This is achieved through setting agent properties during fine-tuning. Each agent, notwithstanding which local search procedure or metaheuristic it uses, is set to execute a similar number of evaluations of the goal function and hence performs a single computation step in a comparable time. The number of evaluations of the goal function is controlled by allocating to an agent a number of individuals (solutions), which are drawn from the common memory and which an agent tries to improve at a given computation step. Consequently, agents executing more computationally complex learning and improvement procedures would be drawing, at a single step, fewer individuals than agents executing less complex procedures. The general scheme of the PLA-Team is shown in the following pseudo-code:
PLA-Team
begin
    Initialize the common memory by using a random mechanism to produce P individuals (here feasible solutions of the problem at hand);
    Set within a parallel and distributed environment n×m agents, where n is a number of improvement procedures employed and m is a number of agents of each kind;
    For each agent kind set a number of individuals that an agent can draw in a single step from the common memory to assure a comparable number of the goal function evaluations in a single computation step;
    repeat
        for each agent
            Draw randomly the allowed number of individuals from P and copy them into working memory;
            Improve individuals in the working memory by executing learning and improvement procedures;
            Replace worse individuals in P by comparing pair-wise the best one currently produced with the currently worst one in P, the second best one produced with the currently second worst one in P, etc. After each comparison an individual from P is replaced by the improved one if the latter is better;
    until stopping criterion is met
    Select best individual from P as a solution
end
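The replacement step of the above scheme can be sketched in Python as follows (an illustrative sketch only, assuming a minimisation objective; the function and variable names are ours, not part of the original implementation).

# Illustrative sketch of the pair-wise replacement rule used when a PLA-Team
# agent returns improved individuals to the common memory P (minimisation assumed).

def return_to_memory(P, improved, fitness):
    """Compare the best improved individual with the worst one in P, the second
    best with the second worst, etc.; replace whenever the improved one is better."""
    improved = sorted(improved, key=fitness)                                  # best first
    order = sorted(range(len(P)), key=lambda i: fitness(P[i]), reverse=True)  # worst first
    for ind, pos in zip(improved, order):
        if fitness(ind) < fitness(P[pos]):
            P[pos] = ind

# Tiny usage example with integer "solutions" and identity fitness.
P = [9, 4, 7, 1]
return_to_memory(P, improved=[2, 8], fitness=lambda x: x)
print(P)   # 9 (worst) is replaced by 2; 8 is compared with 7 and rejected -> [2, 4, 7, 1]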
3 Flowshop Scheduling Using a PLA-Team Architecture
In the permutation flowshop scheduling problem (PFSP) there is a set of n jobs. Each of the n jobs has to be processed on m machines in the order 1 . . . m. The processing time of job i on machine j is pij, where the pij are fixed and nonnegative. At any time, each job can be processed on at most one machine, and each machine can process at most one job. The jobs are available at time 0 and the processing of a job may not be interrupted. In the PFSP the job order is the same on every machine. The objective is to find a job sequence minimizing the schedule makespan (i.e., the completion time of the last job).

The PLA-Team applied to solving the PFSP instances makes use of the standard evolutionary algorithm with cross-over and mutation, tabu search and simulated annealing. The above learning and improvement procedures have been previously used by the authors within the PLA implementation. A detailed description of these procedures can be found in [6]. Problem instances are represented as text files in exactly the same format as they appear in the OR-LIBRARY.

The computational experiment was designed to compare the performance of the PLA-Team with other state-of-the-art techniques. It has been decided to follow the experiment plan of [11] to assure comparability. The experiment involving PLA has been carried out on a PC computer with a 2.4 GHz Pentium 4 processor and 512 MB RAM. The PLA-Team consisted of 3 agents working in a parallel environment on a cluster of 3 PC computers with P4 1.9 GHz processors with 512 MB RAM.
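For reference, the makespan objective used throughout this section can be computed with the standard flow shop completion-time recursion; the sketch below is illustrative only (the processing-time matrix is invented, not an OR-LIBRARY instance), together with the relative-deviation metric used in Tables 1 and 2.

# Minimal sketch of the permutation flow shop makespan computation.
# The processing times below are invented for illustration.

def makespan(permutation, p):
    """p[j][k] = processing time of job j on machine k; machines are visited in order 0..m-1."""
    m = len(p[0])
    completion = [0] * m                      # completion times of the last scheduled job
    for job in permutation:
        for k in range(m):
            ready = completion[k - 1] if k > 0 else 0
            completion[k] = max(completion[k], ready) + p[job][k]
    return completion[-1]

def deviation_from_ub(value, upper_bound):
    """Relative deviation (%) from the currently known upper bound."""
    return 100.0 * (value - upper_bound) / upper_bound

p = [[3, 2, 4],    # job 0 on machines 0, 1, 2
     [1, 4, 3],    # job 1
     [2, 2, 2]]    # job 2
print(makespan([1, 2, 0], p))                  # makespan of the sequence job1, job2, job0 -> 14
print(round(deviation_from_ub(14, 13), 2))     # deviation if the known upper bound were 13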
Table 1. The average deviation from the currently known upper bound (%)

instance   NEHT   GA     HGA    SAOP   SPIRIT   GAR    GAMIT   PLA    PLA-Team
20 × 5     3.35   0.29   0.20   1.47   5.22     0.71   3.28    0.03   0.00
20 × 10    5.02   0.95   0.55   2.57   5.86     1.97   5.53    0.58   0.32
20 × 20    3.73   0.56   0.39   2.22   4.58     1.48   4.33    0.42   0.26
50 × 5     0.84   0.07   0.06   0.52   2.03     0.23   1.96    0.07   0.03
50 × 10    5.12   1.91   1.72   3.65   5.88     2.47   6.25    0.77   0.71
50 × 20    6.20   3.05   2.64   4.97   7.21     3.89   7.53    1.67   1.62
100 × 5    0.46   0.10   0.08   0.42   1.06     0.18   1.33    0.03   0.01
100 × 10   2.13   0.84   0.70   1.73   5.07     1.06   3.66    0.72   0.52
100 × 20   5.11   3.12   2.75   4.90   10.15    3.84   9.70    1.09   1.35
200 × 10   1.43   0.54   0.50   1.33   9.03     0.85   6.47    0.56   0.41
200 × 20   4.37   2.88   2.59   4.40   16.17    3.47   14.56   1.05   1.04
500 × 20   2.24   1.65   1.56   3.48   13.57    1.98   12.47   1.13   1.09
Average    3.33   1.33   1.15   2.64   7.15     1.84   6.42    0.68   0.61
The results obtained by applying the PLA and the PLA-Team are compared with the following results reported in [11]: NEHT - the NEH heuristic with the enhancements, GA - the genetic algorithm, HGA - the hybrid genetic algorithm, SAOP - the simulated annealing algorithm, SPIRIT - the tabu search, GAR - another genetic algorithm and GAMIT - the hybrid genetic algorithm. For evaluating the different algorithms the average deviation from the currently known upper bound is used. Every algorithm has been run to solve all 120 benchmark instances from the OR-LIBRARY and the data from a total of 10 independent runs have been finally averaged. As a stopping criterion, all algorithms have been allocated 30 seconds for instances with 500 jobs, 12 seconds for instances with 200 jobs, 6 seconds for instances with 100 jobs, 3 seconds for instances with 50 jobs and 1.2 seconds for instances with 20 jobs. In case of the PLA-Team the common memory included 100 individuals, the evolutionary algorithm was allowed to draw 50 instances in each computation step, the tabu search algorithm 20 instances and the simulated annealing 10 instances. The PLA-Team was iterating until the allowed time elapsed. The results obtained for all 120 instances from the OR-LIBRARY benchmark sets, averaged over 10 runs, are shown in Table 1.

The PLA-Team also outperforms the state-of-the-art ant-colony algorithms reported in [10]. These authors propose two ant colony algorithms, M-MMAS and PACO. Both generate 40 ant-sequences and use some local search algorithms to improve solutions. M-MMAS reportedly required less than one hour of computational time on a Pentium 3 computer with 800 MHz for solving all 20, 50 and 100 job instances from the OR-LIBRARY. It is worth noting that the PLA-Team required 306 seconds for the same task, albeit using the above described cluster of 3 Pentium 4 computers. The comparison of the ant-colony algorithms and the PLA-Team is shown in Table 2. The PLA-Team worked using the same settings as previously reported. Data for M-MMAS and PACO are taken from [10].
Table 2. The average deviation from the currently known upper bound (%)

instance   M-MMAS   PACO   PLA-Team
20 × 5     3.35     0.29   0.00
20 × 10    5.02     0.95   0.32
20 × 20    3.73     0.56   0.26
50 × 5     0.84     0.07   0.03
50 × 10    5.12     1.91   0.71
50 × 20    6.20     3.05   1.62
100 × 5    0.46     0.10   0.01
100 × 10   2.13     0.84   0.52
100 × 20   5.11     3.12   1.35
From the experiment results it can be easily observed that the PLA-Team can be considered a useful state-of-the-art tool for permutation flowshop scheduling.
4 Job-Shop Scheduling Using a PLA-Team Architecture
An instance of the job-shop scheduling problem consists of a set of n jobs and m machines. Each job consists of a sequence of m activities, so there are n × m activities in total. Each activity has a duration and requires a single machine for its entire duration. The activities within a single job all require a different machine. An activity must be scheduled before every activity following it in its job. Two activities cannot be scheduled at the same time if they both require the same machine. The objective is to find a schedule that minimizes the overall completion time of all the activities.

In this paper a permutation version of the job-shop scheduling problem is used. That is, given an instance of the job-shop scheduling problem, a solution is a permutation of jobs for each machine, defining in a unique manner a sequence of activities to be processed on this machine. For a problem consisting of a set of n jobs and m machines a solution is a set of m permutations of n elements each. A feasible solution obeys all the problem constraints including precedence constraints.

The PLA-Team applied to solving the job-shop scheduling problem instances makes use of the standard evolutionary algorithm with cross-over and mutation, the tabu search and the simulated annealing. The above learning and improvement procedures are exactly the same as in the case of the permutation flow shop described in the previous section. The computational experiment carried out was designed to compare the performance of the PLA-Team with other approaches, including agent-based or distributed algorithms. The PLA-Team again consisted of 3 agents, one for each of the learning and improvement procedures used, working in a parallel environment on a cluster of 3 PC computers with P4 1.9 GHz processors with 512 MB RAM. In Table 3 the results obtained by the PLA-Team are compared with the A-Team results of [1] on a set of 10 × 10 instances from the OR-LIBRARY.
Table 3. PLA-Team versus A-Team

           A-Team                                   PLA-Team
instance   Av. result   St. dev.   Av. deviat.      Av. result   St. dev.   Av. deviat.
FT10       958.40       12.35      3.05%            952.00       0.00       2.37%
ABZ5       1238.00      0.00       0.00%            1221.00      0.00       -1.05%
ABZ6       948.00       0.00       0.53%            943.00       0.00       0.00%
LA16       949.75       6.50       0.50%            964.20       4.66       2.03%
LA17       787.00       0.00       0.38%            792.00       0.00       1.02%
LA18       856.67       6.13       1.02%            854.75       2.25       0.79%
LA19       852.00       9.09       1.19%            850.50       1.50       1.01%
LA20       906.67       3.68       0.52%            907.00       0.00       0.55%
ORB1       1108.00      11.43      4.63%            1080.00      0.00       1.98%
Average                 5.46       1.31%                         0.93       0.96%
These results have been generated by a team consisting of agents using two variants of the simulated annealing, three variants of the tabu search, two variants of the hill climbing algorithm and a genetic algorithm, plus a destroying server. The A-Team was allowed to work until 10,000,000 evaluations with a population size of 500. In case of the PLA-Team, the common memory included 100 individuals, the evolutionary algorithm was allowed to draw 30 instances in each computation step, the tabu search algorithm 10 instances and the simulated annealing 5 instances. The PLA-Team was allowed to iterate for 30 computation steps of the simulated annealing agent. All results are averaged over 5 independent runs.

A further experiment aimed at comparing the PLA-Team performance with state-of-the-art algorithms. In this experiment the common memory included 50 individuals, the evolutionary algorithm was allowed to draw 15 instances in each computation step, the tabu search algorithm 10 instances and the simulated annealing 5 instances. The PLA-Team was allowed to iterate for 10 computation steps of the simulated annealing agent. All results were averaged over 10 independent runs.

Table 4. Average deviation from the optimum (%) and average computation time (s)

           SAT              KTM              KOL              MSA              PLA-Team
Instan.    Av.dv   Av.t     Av.dv   Av.t     Av.dv   Av.t     Av.dv   Av.t     Av.dv   Av.t
ABZ7       4.55    5991     x       x        x       x        3.08    1445     3.34    521
ABZ8       9.40    5905     x       x        x       x        7.71    1902     10.97   732
ABZ9       7.71    5328     x       x        x       x        7.16    1578     7.68    555
LA21       1.53    1516     0.38    1720     0.48    594      0.23    838      0.57    213
LA24       1.50    1422     0.75    1170     0.58    509      0.49    570      1.08    196
LA25       1.29    1605     0.36    1182     0.20    644      0.08    1035     0.46    387
LA27       2.28    3761     1.00    919      0.76    3650     0.84    982      1.34    396
LA29       5.88    4028     3.83    3042     3.47    4496     4.65    1147     3.98    487
LA38       1.87    3004     1.21    3044     0.54    5049     1.56    1143     1.28    465
LA40       1.72    2812     0.96    6692     0.59    4544     0.59    1894     1.34    630
32
J. Jędrzejowicz and P. Jędrzejowicz
runs. The PLA-Team performance was compared with a rescheduling based simulated annealing and a tabu search based approach denoted, respectively as SAT and KTM and reported in [12], the hybrid genetic algorithm simulated annealing of [7] denoted KOL, and a parallel modular simulated annealing [1]. Comparison results are shown in Table 4. It can be observed that the PLA-Team, while not being the best performer among the compared approaches, achieves reasonably good or even good solutions in a competitive time. On the 10 × 10 set of benchmark instances the PLA-Team has produced consistently very good results.
5 Conclusions
The PLA-Team architecture seems to be a useful approach which can produce effective solutions to many computationally hard combinatorial optimization problems. Extensive experiments that have been carried out show that:
– A team of agents sharing a common memory and solving problem instances using different learning and improvement strategies achieves a synergic effect, producing better results than each of the individual procedures could obtain in a comparable time.
– The PLA-Team can be considered a competitive approach compared with many alternative algorithms.
– Agents co-operating within a PLA-Team demonstrate a complex collective behavior.
– The effectiveness of the proposed approach can be attributed to the diversity of learning and improvement procedures, the diversification introduced by the population-based approach, parallel computations and a unique strategy of controlling computation cycle time through differentiation of the number of individuals each kind of agent works with.
The approach needs further investigation. An open question is how a PLA-Team will perform with an increased number of replicated agents and how such systems should be designed. It would also be interesting to introduce some advanced mechanisms allowing a team of agents to escape from local optima. At present the fine-tuning phase of the PLA-Team elements is rather intuitive and should, in future, be based on scientifically established rules.
References
1. Aydin, M.E., Fogarty, T.C.: Teams of autonomous agents for job-shop scheduling problems: An Experimental Study, Journal of Intelligent Manufacturing, 15(4), (2004), 455–462
2. Aydin, M.E., Fogarty, T.C.: A simulated annealing algorithm for multi-agent systems: a job-shop scheduling application, Journal of Intelligent Manufacturing, 15(6), (2004), 805–814
3. Czarnowski, I., Jędrzejowicz, P.: Application of the Parallel Population Learning Algorithm to Training Feed-forward ANN. In: P. Sincak (ed.) Intelligent Technologies – Theory and Applications, IOS Press, Amsterdam, (2002) 10–16
4. Jędrzejowicz, P.: Social Learning Algorithm as a Tool for Solving Some Difficult Scheduling Problems, Foundations of Computing and Decision Sciences, 24 (1999) 51–66
5. Jędrzejowicz, J., Jędrzejowicz, P.: PLA-Based Permutation Scheduling, Foundations of Computing and Decision Sciences 28(3) (2003) 159–177
6. Jędrzejowicz, J., Jędrzejowicz, P.: New Upper Bounds for the Flowshop Scheduling Problem, In: M. Ali, F. Esposito (eds.) Innovations and Applied Artificial Intelligence, LNAI 3533 (2005), 232–235
7. Kolonko, M.: Some new results on simulated annealing applied to the job shop scheduling problem, European Journal of Operational Research, 113, (1999), 123–136
8. Parunak, H. V. D.: Agents in Overalls: Experiences and Issues in the Development and Deployment of Industrial Agent-Based Systems, Intern. J. of Cooperative Information Systems, 9(3) (2000), 209–228
9. Rachlin, J., Goodwin, R., Murthy, S., Akkiraju, R., Wu, F., Kumaran, S., Das, R.: A-Teams: An Agent Architecture for Optimization and Decision-Support, In: J.P. Muller et al. (eds.): ATAL'98, LNAI 1555 (1999), pp. 261–276
10. Rajendran, Ch., Ziegler, H.: Ant-colony algorithms for permutation flowshop scheduling to minimize makespan/total flowtime of jobs, European Journal of Operational Research 155 (2004) 426–438
11. Ruiz, R., Maroto, C., Alcaraz, J.: New Genetic Algorithms for the Permutation Flowshop Scheduling Problems, Proc. The Fifth Metaheuristic International Conference, Kyoto, (2003) 63-1–63-8
12. Satake, T., Morikawa, K., Takahashi, K., Nakamura, N.: Simulated annealing approach for minimizing the makespan of the general job-shop, International Journal of Production Economics, 60-61, (1999) 515–522
13. Talukdar, S., Baerentzen, L., Gove, A., de Souza, P.: Asynchronous Teams: Cooperation Schemes for Autonomous, Computer-Based Agents, Technical Report EDRC 18-59-96, Carnegie Mellon University, Pittsburgh, (1996)
Development of the Multiple Robot Fish Cooperation System Jinyan Shao, Long Wang, and Junzhi Yu Center for Systems and Control, Department of Mechanics and Engineering Science Peking University, Beijing 100871, P.R. China
[email protected] Abstract. In this paper, we present the development of the Multiple Robotic Fish cooperation System (MRFS), which is built on the basis of a series of radio-controlled, multi-link biomimetic fish-like robots designed in our lab. The motivation of this work is that the capability of a single fish robot is often limited, while there are many complex missions that should be accomplished by effective cooperation of multiple fish robots. MRFS, as a novel test bed for multiple robotic fish cooperation, can be applied to different types of complex tasks. More importantly, MRFS provides a platform to test and verify the algorithms and strategies for cooperation of multiple underwater mobile robots. We use a disk-pushing task as an example to demonstrate the performance of MRFS.
1 Introduction
In recent years, biomimetic robotics has emerged as a challenging new research topic, which combines bioscience and engineering technology, aiming at developing new classes of robots that will be substantially more compliant and stable than current robots. Taking advantage of new developments in materials, fabrication technologies, sensors and actuators, more and more biologically inspired robots have been developed. As one of the hot topics, the robotic fish has received considerable attention during the last decade [1]-[7]. Fish, after a long history of natural selection, have evolved to become the best swimmers in nature. They can achieve tremendous propulsive efficiency and excellent maneuverability with little loss of stability by coordinating their bodies, fins and tails properly. Researchers believe that these remarkable abilities of fish can inspire innovative designs to improve the performance, especially the maneuverability and stabilization, of underwater robots. To date, research on robotic fish has mainly focused on the design and analysis of individual robot fish prototypes, while little attention has been paid to cooperative behaviors among the fish. As we know, fish in nature often swim in schools to strive against the harsh circumstances in the sea. Similarly, in practice, the capability of a single robot fish is limited and it will be incompetent for achieving complex missions in dynamic environments. Thus, for real-world (ocean-based) applications, a cooperative multiple robot fish system is required, which is the motivation of this work. In this paper, we present the development of the Multiple Robot Fish cooperation System (MRFS), which is built on the basis of a series of radio-controlled, multi-link biomimetic robot fish. There are two features in MRFS: first, this cooperative platform
is general and can be applied to different types of cooperative tasks; second, high-level tasks are eventually decomposed into two reactive motion controllers, which are designed with full consideration of the inertia of the fish and the hydrodynamic forces of the surrounding water. The remainder of the paper is organized as follows. In Section 2 we present the establishment of the platform for MRFS. Based on the hardware and software platform, a four-level hierarchical control architecture is provided in Section 3. In Section 4, a cooperative disk-pushing task is used as an example to demonstrate the performance of MRFS and corresponding experimental results are shown. Finally, we conclude the paper and outline some future work in Section 5.
2 Design and Implementation of MRFS Platform
Before introducing MRFS, we first present the robotic fish prototypes developed in our laboratory. Figure 1 shows some of them. For technical details on the design and implementation of the robotic fish, the readers are referred to [1] [2]. We cannot elaborate on them here due to space limitations.
Fig. 1. Prototypes of different robot fishes. (a) Up-down motioned, 3-link robot fish, 380 mm in length. (b) 4-link robot fish with infrared sensors, 450 mm in length. (c) 4-link, 2-D motioned robot fish, 400 mm in length. (d) 3-link, 2-D motioned robot fish, 380 mm in length. (e) 2-link robot fish with infrared sensors, 280 mm in length. (f) 3-link robot fish with miniature wireless camera, 400 mm in length.
As mentioned above, a single fish is often limited both in capabilities and movement range. It will be incompetent for many complex tasks in dynamic environments. In this case, a multi-robot fish cooperative system becomes a desired solution. Inspired by the technology of multi-agent system and the approaches developed for cooperation of ground mobile robots, we establish the hardware platform of MRFS as depicted in Figure 2. The whole system can be decomposed into four subsystems: robot fish subsystem, image capturing subsystem, decision making and control subsystem and wireless communication subsystem.
Fig. 2. Hardware platform for MRFS
Fig. 3. Architecture of software platform
The information about the fish and their surroundings is captured by an overhead camera and, after being effectively processed, it is sent to the decision making and control subsystem as input. Then, based on the input signals and specific control strategies for different tasks, the decision making subsystem produces corresponding control commands and transmits them to every robotic fish through the wireless communication subsystem. Since global vision is adopted, MRFS should basically be categorized as a centralized control system, and so global planning and optimization can be obtained as a result. Based on the hardware architecture, we also develop a task-oriented software platform, on which we can implement various functions associated with cooperative tasks, such as task selection, environmental parameter setting, real-time display, image processing, control algorithm loading and command execution. Figure 3 shows the schematic diagram of the software system architecture. It consists of the GUI (Graphical User Interface), the image processing module, the algorithm module, the communication module and the fish module. Through the GUI, users can choose different tasks and set parameters of the environment (goals, obstacles, etc.). In the image processing module, global image information is captured and processed. After that, useful information is extracted and used for decision making. The algorithm module contains the algorithms
and strategies, determining how the fish cooperate with each other. The communication module transmits every control command to the fish modules, which are the actuators of MRFS.
3 Hierarchical Control Algorithm for Cooperative Applications
In this section, we propose a hierarchical control algorithm for cooperative applications on MRFS. A four-level hierarchical architecture is developed. The first level is the task planner level. In this level, the required task is decomposed into different roles. During the decomposition, it should be guaranteed that these roles are necessary and sufficient for achieving the task. The second level is the role assignment level: after producing the different roles, we select the most qualified robotic fish candidate for each role according to proper rules. Aiming at the requirements of different tasks, we introduce both static and dynamic role assignment mechanisms. In static assignment, once roles are determined at the beginning of the task, they will not change during the task; in the dynamic mechanism, the fish may exchange their roles according to the progress of the task. The third level is the action level. In this level, a sequence of actions is designed for each role. By an action, we mean an intended movement of the fish, such as turning, advancing, and so on. The fourth level, called the controller level, is the lowest one. In this controller level, we give sufficient consideration to some unfavorable factors in control that arise from the special characteristics of the fish:
– When the fish swims, the interaction between it and its surrounding water will result in resonance at a certain frequency. Moreover, the fish cannot stop immediately even if the oscillating frequency is set to zero. Hydrodynamic forces and the fish's inertia will make it drift a short distance along its advancing direction.
– In our design, the fish's orientation is controlled by modulating the first two joints' deflection (φ1, φ2). However, it is quite difficult to adjust the deflection accurately, because the drag force produced by the surrounding water is an unstructured disturbance and we cannot get its precise model.
Fig. 4. The block diagram of the hierarchical architecture
Based on the above conditions, we adopt a PID controller for piecewise speed control and a fuzzy logic controller for orientation control, which are presented in detail in our previous work [3]. Figure 4 illustrates the block diagram of the cooperative control architecture consisting of four levels.
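As a rough illustration of the speed-control side only (the actual controllers are those of [3]; the gains, output limits, the omission of the piecewise scheduling, and the reading of the output as a tail-beat frequency command are all assumptions of ours), one discrete PID step might look like this:

```python
class SpeedPID:
    """Minimal discrete PID speed controller sketch; gains and limits are illustrative only."""
    def __init__(self, kp=0.8, ki=0.05, kd=0.2, u_min=0.0, u_max=3.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0
        self.prev_error = None

    def step(self, target_speed, measured_speed, dt):
        error = target_speed - measured_speed
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Saturate and interpret u as the commanded oscillating (tail-beat) frequency.
        return min(self.u_max, max(self.u_min, u))
```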
4 Application: Three-Fish Cooperative Disk-Pushing
In this section, we use a cooperative disk-pushing task to demonstrate the performance of MRFS and the hierarchical control architecture. We conduct the experiments in a swimming tank with three biomimetic robotic fish, as shown in Figure 5 (a). There is a disk 25 cm in diameter floating on the water and a 30 cm wide gate on the right side of the tank. The objective for the three fish is to push the disk into the gate. It may seem that this task is quite simple and that one fish would be enough to achieve it. In fact, the fish's head is only 40 mm wide and the disk drifts with the fluctuating water, so it is difficult for a fish to touch and push the disk exactly at the expected point. Moreover, as we mentioned in Section 2, because of its inertia and hydrodynamic forces, the fish cannot stop immediately even if the oscillating frequency is set to zero. It will drift along its current direction and thus overshoot occurs. When this happens, the fish must turn back and re-adjust its position relative to the disk, which takes a long time. Additionally, while the fish swims back and adjusts its attitude, it will inevitably disturb the surrounding water. As a result, the disk may float away before the fish approaches it, which adds to the difficulty of the task.
4.1 Cooperative Strategies
In order to push the disk more stably and precisely, the fish are expected to help each other and work cooperatively. However, as the number of fish increases, more problems may occur. First, because of space limitations, the fish may collide with each other. In order to deal with such circumstances, the fish must adopt collision-avoidance strategies while pushing the disk, which will degrade the performance of the cooperation. Moreover, when multiple fish swim in the same place, the surrounding water will be disturbed violently. The disk will float everywhere and become more difficult to track and push. Additionally, the water waves produced by the fish may cause uncertainty and inaccuracy in the image information captured by the overhead camera. Considering the characteristics of the fish's motion and the special nature of the hydro-environment, we adopt relatively conservative strategies to achieve this task.
Strategy I: As shown in Figure 5 (b), we divide the space (tank) into two parts along the X coordinate: Part A and Part B. Part B is closer to the gate and is called the Attacking Region; in this region, the fish aim to push or hit the disk into the gate precisely. Part A is called the Pushing Region, in which the fish try to push the disk in the direction of the gate, preparing for attacking. Along the Y coordinate, we divide the tank into five parts, I–V, representing the different regions that the three fish take charge of. We use this kind of region-responsibility strategy mainly to avoid collisions between the fish.
Fig. 5. The experiment setting and region division
For each fish, if the disk is not in the region that it is responsible for, it will still swim restrictively within its region and cannot invade other regions to disturb the other fish.
Strategy II: We prefer slowing down to overshooting. When controlling the fish, we intend to slow the fish down to some extent, especially when it approaches the disk.
Strategy III: As we mentioned, the fish cannot stop immediately even if a stop command is sent. It will drift a short distance out of control. So, in practice, even when the fish is idle, we let it wander or move very slowly instead of using a stop command to halt it.
4.2 Task Decomposition and Responsibility Assignments
Based on the region division shown in Figure 5 (b), we define three roles: one is the main attacker (FishC), and the other two are the (left and right) assistant attackers (FishL and FishR). The role assignments are static, and they will not change during the task. When the disk is not in the region that a fish takes charge of, it will swim restrictively in its own region and cannot invade other regions to disturb the other fish. Figure 5 (b) indicates the region division and the allocation of responsibility for the three roles.
4.3 Actions Design
When a fish pushes the disk, it may use its head, middle body or even its tail. It can also disturb the surrounding water in a proper way and let the disk drift closer to the gate. Hence, we designed the following primitive actions for individual fish:
Action 1: As depicted in Figure 6, the first action for the fish is to swim towards the disk and hit it exactly along the direction from the disk to the gate, where (xF, yF, α) denotes the pose of the fish, (xD, yD) and (xC, yC) stand for the center of the disk and the position of the gate, β is the expected direction from the disk to the gate, and l1 indicates the expected moving direction of the disk. Considering the fish's body length and its inertia, we choose a point G (xG, yG), which lies on the extended line of l1, as the pushing-point. l2 is the segment connecting the fish to G, and l3 represents the perpendicular to l1 passing through point G. As illustrated in Figure 6 (a), if the pushing-point G lies between the fish and the disk, i.e. (xD − xF) × (xC − xD) > 0, we define the perpendicular bisector l4 of segment l2.
Fig. 6. The action of pushing disk along the exact direction to the gate
Then, using r as the radius, we construct a circle C that is tangent to l1 at point G. If the circle intersects l4 at one point, we choose this point as a temporary goal for the fish; if there are two intersections, we choose the point with the smaller x-coordinate, namely T, as the temporary goal. If the fish is far away from the disk and there is no intersection between C and l4, G will be the temporary goal point for the fish. As the fish moves, a series of temporary goals is obtained, which leads the fish gradually to the pushing-point. After some geometrical analysis, the positions of the intersection points can be calculated from the following equations:

\[
\begin{cases}
(x - x_D - \rho\cos\beta - r\sin\alpha)^2 + (y - y_D - \rho\sin\beta + r\cos\alpha)^2 = r^2 \\[4pt]
y - \tfrac{1}{2}(y_G + y_F) = -\dfrac{x_G - x_F}{y_G - y_F}\,\bigl(x - \tfrac{1}{2}(x_G + x_F)\bigr)
\end{cases}
\tag{1}
\]
Figure 6 (b) depicts the case when the fish is between the disk and the goal, i.e. (xD − xF) × (xC − xD) < 0.
Action 2: Although, when determining the pushing-point, we give sufficient consideration to the dynamics of the fish and the difficulty of controlling it, we still cannot guarantee that the fish reaches its destination with the expected attitude, especially its orientation. Once it gets to the pushing-point with a large orientation error, it may miss the disk. In this case, we design the following action, which allows the fish to push the disk by throwing its head. As shown in Figure 7 (a), if the fish approaches the pushing-point (within a small neighborhood) and its orientation satisfies the following condition, it will take a sharp turn towards the disk:

\[
(x_F, y_F) \in \bigl\{\, \|(x_G, y_G) - (x_F, y_F)\| \le \delta \,\bigr\}, \qquad
\alpha \in \bigl\{\, |\alpha - \beta| \ge \zeta \,\bigr\}
\tag{2}
\]

where δ and ζ are the bounds for the position error and the orientation error, which are determined empirically through experiments. In our experiments, we choose δ = 5 cm and ζ = π/15.
Action 3: In particular, this action takes full advantage of the agility of the fish's tail. Figure 7 (b) shows the fish patting the disk with its tail.
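The temporary-goal construction of Action 1 can be sketched as follows (our illustration, not the authors' code; the offset ρ, the radius r, the placement of G a distance ρ behind the disk, and the choice of which side of l1 the circle centre lies on are all assumptions made for the sketch):

```python
import math

def temporary_goal(fish, disk, gate, rho=0.08, r=0.15):
    """Return the next temporary goal for Action 1 (distances are illustrative)."""
    xF, yF, alpha = fish          # fish pose (alpha is not used in this simplified sketch)
    xD, yD = disk                 # disk centre
    xC, yC = gate                 # gate position

    # beta: desired pushing direction, from the disk centre towards the gate (line l1).
    beta = math.atan2(yC - yD, xC - xD)

    # Pushing point G: assumed to lie a distance rho behind the disk on the extended
    # line l1, so that striking the disk at G drives it towards the gate.
    xG, yG = xD - rho * math.cos(beta), yD - rho * math.sin(beta)

    # Circle C tangent to l1 at G: centre offset by r along a normal of l1; here we
    # take the side facing the fish (the paper's exact choice may differ).
    nx, ny = -math.sin(beta), math.cos(beta)
    if (xF - xG) * nx + (yF - yG) * ny < 0:
        nx, ny = -nx, -ny
    cx, cy = xG + r * nx, yG + r * ny

    # Perpendicular bisector l4 of segment fish-G, parameterised as m + t*d with |d| = 1.
    mx, my = (xF + xG) / 2.0, (yF + yG) / 2.0
    dx, dy = -(yG - yF), (xG - xF)
    norm = math.hypot(dx, dy) or 1.0
    dx, dy = dx / norm, dy / norm

    # Intersect l4 with circle C: |m + t*d - c|^2 = r^2 gives a quadratic in t.
    ex, ey = mx - cx, my - cy
    b = ex * dx + ey * dy
    c0 = ex * ex + ey * ey - r * r
    disc = b * b - c0
    if disc < 0:                              # fish far away: no intersection, head for G
        return (xG, yG)
    t1, t2 = -b - math.sqrt(disc), -b + math.sqrt(disc)
    points = [(mx + t * dx, my + t * dy) for t in (t1, t2)]
    return min(points, key=lambda p: p[0])    # point T with the smaller x-coordinate
```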
Fig. 7. Three actions: a) pushing the disk by shaking (throwing) the head, b) pushing the disk by the tail, and c) swimming towards a virtual pushing-point
Fig. 8. Cooperative actions: a) two fish both use Action 1, b) one takes Action 1 and the other uses Action 3, and c) one uses Action 1 and the other two use Action 4
Action 4: Action 4 is implemented when the disk floats very close to the gate and in the corner of the tank. This action allows the fish to move the disk slowly towards the goal by oscillating. As shown in Figure 7 (c), in this action we designate a point outside the tank as a virtual pushing-point. Thus the fish will always try to swim to that virtual destination, although it can never reach it. While moving (or struggling), its body, especially the posterior body, oscillates continually, which disturbs the water and makes the disk float towards the gate. In addition, based on the above four basic actions, some cooperative actions can be obtained naturally. Figure 8 presents three of them. According to the above assignments of responsibility and the cooperative actions designed, we sum up in Table 1 the corresponding strategies for each individual fish when the disk is in different situations. Figure 9 describes the scenario of one experiment. We have conducted extensive experiments for this task on MRFS. The results are not perfect but they are successful and promising. For more pictures and videos of MRFS and the experimental results, please visit http://www.mech.pku.edu.cn/robot/MRFS.html. We are now carrying out more experiments and searching for better methods to reduce the impact of the fish's inertia and the disturbance from the surrounding water more effectively. Also, our ongoing work includes adopting other strategies, comparing them using some criteria and finally choosing the most efficient one. Although the above cooperative task may be quite simple, it is very instructive for further research on more complex cooperation of multiple fish. In fact, this study involves
Table 1. Assignments of responsibility and action selection

                        Multi-robot fish
Disk location      FishL          FishC          FishR
Part A   I         Action 1,2     Idle           Idle
         II        Action 1,3     Action 1,2     Idle
         III       Idle           Action 1,2     Idle
         IV        Idle           Action 1,2     Action 1,3
         V         Idle           Idle           Action 1,2
Part B   I         Action 4       Idle           Idle
         II        Action 1,3     Action 1,2     Idle
         III       Action 4       Action 1,2     Action 4
         IV        Idle           Action 1,2     Action 1,3
         V         Idle           Idle           Action 4
Fig. 9. Sequences of overhead images; the order is from left to right and top to bottom
a spectrum of problems, including cooperative strategies, self-organizing mechanisms, task allocation, collective swarming behavior, control laws, and so on. In our experiments, only three fish are used, because the tank is not large enough to accommodate more. When a larger space is available, more fish will swim freely in the tank, and other complicated tasks such as self-organized schooling and cooperative formation can be designed and tested.
5 Conclusions and Future Work
This paper concentrates on the multi-robot fish cooperation problem. It describes the design and implementation of MRFS and evaluates its performance using a cooperative disk-pushing task. The corresponding experimental results show that higher efficiency and much greater capabilities can be obtained when the fish cooperate with each other properly. It is observed that MRFS provides a useful and effective platform to design and achieve cooperative tasks for multiple robot fish. Of course, there is still a long way to go before the fish can be employed for practical tasks in the sea. Future work will focus on the following two aspects. On
the one hand, we plan to add some sensors such as ultrasonic and infrared detectors to the fish body to improve its autonomy and efficiency. On the other hand, more complex tasks will be designed and tested on MRFS. Next, we will attempt to organize competitive games. Similar to RoboCup, we can expect water polo matches between seven autonomous robotic fish and seven manually controlled fish, imitating the real polo game by following the same rules. The results of such attempts will be presented soon.
References
1. Yu, J., Wang, L., Tan, M.: A framework for biomimetic robot fish's design and its realization, in Proc. American Control Conf., Portland, USA, (2005) 1593-1598
2. Yu, J., Wang, L.: Parameter optimization of simplified propulsive model for biomimetic robot fish, in Proc. IEEE Int. Conf. Robotics and Automation, Barcelona, Spain, (2005) 3317-3322
3. Yu, J., Wang, S., Tan, M.: Basic motion control of a free-swimming biomimetic robot fish, in Proc. IEEE Conf. Decision and Control, Maui, Hawaii, USA, (2003) 1268-1273
4. Triantafyllou, M. S., Triantafyllou, G. S.: An efficient swimming machine. Sci. Amer. 272 (1995) 64-70
5. Lighthill, M. J.: Note on the swimming of slender fish. J. Fluid Mech. 9 (1960) 305–317
6. Barrett, D., Grosenbaugh, M., Triantafyllou, M.: The optimal control of a flexible hull robotic undersea vehicle propelled by an oscillating foil, in Proc. IEEE AUV Symp. 24 (1999) 1-9
7. Terada, Y.: A trial for animatronic system including aquatic robots, J. Robot. Soc. Jpn. 18 (2000) 195-197
8. Domenici, P., Blake, R. W.: The kinematics and performance of fish fast-start swimming, J. Exper. Biol., 200 (1997) 1165-1178
Introducing Social Investors into Multi-Agent Models of Financial Markets Stephen Chen1, Brenda Spotton Visano2, and Ying Kong2 1
Information Technology Program, Atkinson Faculty, York University 4700 Keele Street, Toronto, Ontario M3J 1P3
[email protected] 2 Economics Program, Atkinson Faculty, York University 4700 Keele Street, Toronto, Ontario M3J 1P3
[email protected],
[email protected] Abstract. Existing models of financial market prices typically assume that investors are informed with economic data and that wealth maximization motivates them. This paper considers the social dimensions of investing and the effect that this additional motivation has on the evolution of prices in a multi-agent model of an equity market. Agents in this model represent both economically informed investors and socially motivated investors who base their decision to invest solely on the popularity of the investment activity itself. The new model captures in a primitive but important way the notion of frenzy associated with speculative manias and panics, and it offers further insight into such anomalies as market bubbles and crashes. Keywords: Multi-Agent Model, Financial Market Modelling, Social Investors, Herding, Signalling Games.
1 Introduction
Recent attempts to model complex financial market dynamics through simulation by multi-agent models have focused primarily on investors who base their decisions to trade on economic information alone [2][4][18]. While we too are interested in the effects of differing but interdependent agent motivations on the time path of market prices, we will also consider the social motivations of investors. The actions of social investors can add a unique and highly destabilizing influence to the dynamics of a financial market. One of the goals of this study is to examine how the corridor of stability between efficient markets and markets that lead to bubbles/crashes is affected by the introduction of social investors. A multi-agent model (MAM) of a financial market with two classes of agents, economically informed (traditional) investors and socially motivated investors, has been developed. The economic data informing the traditional investors include the current market price of the stock, the expected future value of the stock, and the recent rate of change in the observed market price. Therefore, the agents which represent the traditional investors in our MAM will capture the behavioural effects of both the fundamentals traders (who trade on fundamental value only) and the noise traders (who trade on price trends). A socially motivated investor will base his or her
trading decisions on the popularity of the investment activity alone. If a social investor sees a large number of other investors buying, then a social investor will buy as well. Conversely and symmetrically, a social investor seeing a large number of sellers will sell. It is useful to highlight the differences between noise traders and social investors. Noise traders are usually very sophisticated investors who may use charts and technical analysis to exploit (short-term) price trends. In exploiting (i.e. following) price trends, the actions of noise traders tend to exaggerate price fluctuations in a financial market [2][4][18]. Social investors are typically poorly informed investors – their investment decisions are not based on any financial information at all. The effects of social investors on a financial market are less predictable, and these effects will be studied in our paper. Compared to existing financial market MAMs that are designed to reproduce aggregate market data [2][4][18], the purpose of the new MAM is to reproduce singularities in market dynamics. Existing models of event singularities are static and thus can neither expose conditions that may aid prediction of a market bubble nor inform any intervention strategies designed to mitigate its effects. The newly developed multi-agent model is a first attempt to address these shortcomings – it promises a tool that will enable our observation of the market dynamics that characterize periods of instability. Although primitive in its characterization of equity markets, we have been able to replicate the basic results of the extant models, and thus provide a basis for comparison of price dynamics with and without the social investor. We explore the sensitivity of the price dynamics to agent behaviours by altering the number of social investors and tracking the time path of prices under different assumptions about the behaviour of the economically informed investors. For example, the developed MAMs are able to meaningfully model the situation where the fundamental value of the underlying asset is known with certainty by the economically informed investor. In this situation, the system remains stable with even a relatively large number of social investors. When the fundamental value is not known with certainty and economically informed investors adjust their estimate of the true market price by including recent price movements (in a manner that reflects noise trading behaviour), the threshold level of social investors required to drive the system unstable declines. With only a small number of social investors, the stable dimensions of the first two models reproduce the efficient market hypothesis and mean reversion dynamics of extant models respectively. The introduction of “noise” trading in the second model causes temporary overshooting, but trading guided by estimates of fundamentals operates to dampen the deviations over time such that the price again converges on the fundamental value. Trading by a dominance of fundamentals-informed speculators thus ensures that speculation will stabilize markets in the way analysed by Friedman [9]. In the presence of social investing behaviour, the stability of the MAM is conditional on the number of social investors, and the stability threshold is sensitive to both the trading behaviour of the economically informed investor and the size of any shock to fundamental values. 
With a “small” number of social investors (where “small” varies depending on the behaviour of the economically informed investor), the system is stable. As the number of social investors increases, the system hits a threshold beyond which the system is unstable in the sense that deviations from the fundamental
increase, rather than decrease, over time. As such, the model captures the notion of a corridor of stability outside of which the model becomes dynamically unstable [3][15]. We are also able to demonstrate the manner in which the threshold level of stability determined by the degree of social investing is also sensitive to the size of any shock to fundamental values through the new MAMs. One of the economic and financial questions that the developed MAM allows us to address is the determination of the threshold size of social investing relative to the innovation (shock to the fundamental value) that pushes the model outside of its corridor of stability. The MAM also supports the analysis of how sensitive this threshold is to the type of economically informed trading behaviours, and to the size of a shock to the underlying fundamental value of the asset. The economic background for these explorations is developed in the following section before the basic multi-agent model is developed in section 3. Sections 4 through 6 examine the model parameters corresponding to efficient markets, noise trading, and price shocks. Section 7 discusses the results and directions for future research, and section 8 summarizes.
2 Background In conventional models of prices in competitive financial markets, the price of equities reflects fully and accurately the existing information on the income earning potential of an asset. This “efficient market” outcome as explored by Fama [5][6][7][8] suggests that the present discounted value of the expected future income over the life of the asset—its “fundamental value”—will ultimately govern the asset’s market price. Deviations from this so-called “fundamental value” will only be temporary – speculators capable of estimating the true fundamental value will quickly arbitrage away any implicit capital gains. Deviations may appear as the result of new information about future profitability and, as such, represent the disequilibrium adjustment to a new equilibrium with prices again equal to the now altered fundamental value. Although intuitively appealing and consistent with a long-standing tradition in finance that acknowledges the importance of “value investing” [26], actual price movements and the resulting distribution of returns do not appear to adhere to the strict predictions of the efficient markets hypothesis. Explanations for persistent deviations from estimated fundamental values include various explanations for a “bubble” in stock prices. A “bubble” occurs when competitive bidding, motivated by repetitive and self-fulfilling expectations of capital gains, drives up a given asset’s price in excess of what would otherwise be warranted by a fundamental value. The bubble may be driven by the presence of “noise” traders. Noise traders attempt to exploit short-term momentum in the movement of stock prices, and their actions (e.g. buying when prices are rising and selling when prices are falling) can exaggerate any movement in prices. The presence of noise traders alters, however, neither the ultimate equilibrium market price for stocks nor the fact that the market will eventually reach it. In the extant literature, the formal introduction of “noise” traders creates a mean-reverting market dynamic to explain temporary deviations from fundamentals [16][17]. The presence of noise traders can confound market dynamics to such an extent that under some conditions or for some time, it is profitable for the more sophisticated traders to
disregard the intrinsic value of the asset, follow the herd, and thus contribute to the asset bubble that results [10]. It has also been suggested that herding may explain the excess kurtosis observable in high-frequency market data [2]. Both the “fundamentals” trader and the “noise” trader base their decision to trade on economic information alone. The fundamentals trader estimates expected future profitability of the underlying firm and extracts capital gains by trading when current market prices deviate from the estimate. The noise trader will “chart” the past history of prices to predict future movements (assuming repetitive time trends). These charts allow noise traders to indirectly capture a “herding” market psychology that may move markets independently, if temporarily, away from fundamental values. Since these traders base their decisions solely on objective market information, the more traditional financial models exclude by assumption the possibility that the investment activity may also be a social activity. In situations where individuals are motivated to belong to a group, the possibility of fads, fashions, and other forms of collective behaviour can exist. Visano [25] suggests that investing in equity markets is not immune from social influences, especially when investors face true uncertainty. Consistent with the early views of financial markets as “voting machines” when the future is uncertain [13][25], Visano’s result explains the fad and contagion dimensions of investing which relate to Lynch’s [19] explanations of the recent internet “bubble”. When objective information is incomplete and individuals base investment decisions on social rather than economic information, outcomes become contingent on the collective assessment of the objective situation, and these outcomes are no longer uniquely identifiable independent of this collective opinion. Attempts to model this heterogeneity of investment behaviour and multiplicity of interdependent outcomes render the mathematics so complex as to threaten the tractability of the typical highly aggregated dynamic model. By presenting an opportunity to analyze the effects of different agent behaviours, there are significant potential benefits in using a multi-agent model to simulate the actions of a financial market.
3 Model Assumptions All of the following multi-agent models have the same basic parameters. Ten thousand agents each have the opportunity to choose to own or not own a stock. A decision to change from owning to not owning will cause the agent to sell, and a decision to change from not owning to owning will cause the agent to buy. During each time period of the simulation, each agent will be activated in a different random order. When activated, an agent will observe the current conditions and determine if it should own or not own the stock. Informed investors will base their decisions on the price of the stock and a calculated value of the stock. Depending on the model, this calculation may include the fundamental value of the stock, the current price of the stock, price trends, and derivatives of price trends. (See Appendix A for calculation details.) Social investors will base their decisions on the actions of other investors. Examining a random sample of
100 other investors, seeing a large number invested will lead to a decision to own, seeing a small number invested will lead to a decision to not own, and seeing a number in between will lead to no change in their previous decision. A decision by an agent to buy or sell will affect the market price of the stock. In the following multi-agent models, the market price is determined by assuming a primitive linear supply and demand function. (Each unit of demand causes the price to increase by one unit such that the price is equal to the number of agents who own the stock.) At the beginning of the model, 50% of each class of investor own the stock and the remaining agents do not own the stock. The initial market price is thus 5000, and the fundamental price of the stock is set at 5100.
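A minimal sketch of these model assumptions (ours, not the authors' code) is given below; the sample size and the buy/sell thresholds are taken from Appendix A, while the informed investor's own/not-own rule is our reading of Model 1 and is flagged as such in the comments:

```python
import random

N_AGENTS, SAMPLE, BUY_AT, SELL_AT = 10_000, 100, 60, 42   # sample and thresholds from Appendix A
FUNDAMENTAL = 5100

def simulate(share_social=0.10, steps=50, seed=1):
    rng = random.Random(seed)
    n_social = int(share_social * N_AGENTS)
    social = [i < n_social for i in range(N_AGENTS)]
    owns = [i % 2 == 0 for i in range(N_AGENTS)]   # 50% of each class start as owners
    owners = sum(owns)                             # price = number of owning agents (5000 at start)
    path = []
    for _ in range(steps):
        order = list(range(N_AGENTS))
        rng.shuffle(order)                         # each period, agents act in a random order
        for i in order:
            if social[i]:
                # Sample 100 investors (self-inclusion ignored for simplicity).
                invested = sum(owns[j] for j in rng.sample(range(N_AGENTS), SAMPLE))
                decision = True if invested >= BUY_AT else (False if invested <= SELL_AT else owns[i])
            else:
                # Model 1 informed investor: the calculated value is the fundamental itself;
                # owning whenever that value exceeds the current price is our assumed rule.
                decision = FUNDAMENTAL > owners
            owners += int(decision) - int(owns[i])
            owns[i] = decision
        path.append(owners)
    return path
```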
4 Model 1 – Efficient Markets
The efficient market hypothesis (which assumes there are only economically informed investors with perfect information) predicts a market price equal to the fundamental value of the stock. Further, the introduction of social investors should have no effect since any deviation between the market price and the fundamental value will cause the informed investors to engage in offsetting trades until the market price again matches the fundamental value. We reproduce these results in our base-line system below in Figure 1.
[Figure 1: price against time for 0, 10% and 40% social investors]
Fig. 1. A market with investors who have perfect information is inherently stable. This stability is not affected by the introduction of social investors.
5 Model 2 – Noise Trading
The assumptions underlying the base-line model are stringent – estimates of underlying fundamental asset values are rarely identifiable as uniquely and unambiguously as the efficient market hypothesis assumes. When the possibility is introduced that an investor knows her or his information is less than perfect and that others may possess
better information, an incentive is created for the investor to follow the herd. Following the herd can help investors achieve capital gains when other investors are also accumulating capital gains (i.e. not selling). However, herding can also drive prices away from the true underlying fundamental value, so the risk of capital loss becomes possible. To capture some of these strategic investment considerations, we adapt the base-line model by adding the first and second price derivative to the investment decisions of the economically informed traders. The second multi-agent model is again stable if it has only informed investors who base their future price expectations on both the first and second derivative of a stock’s price (see Figure 2). This MAM is also stable if a small number of social investors is added to the system. Although there is greater price volatility due to the price momentum created by the noise trading behaviour, this price momentum does not cause instability for a relatively small number of social investors (see Figure 2). However, this price momentum does lead to market instability if the market influence of social investors is large enough.
[Figure 2: price against time for 0, 10% and 40% social investors]
Fig. 2. A market with informed investors who balance the rewards of short-term capital gains with the risks of returning to the fundamental value should be stable. However, this balance can be easily affected by the addition of social investors.
6 Model 3 – Price Shocks
Historically, the onset of a market bubble is stimulated by a shock to the asset’s underlying fundamental value. For example, a new technology like the internet increases the future profitability of a stock by an unknown amount. In this situation, informed investors become more reliant on market signals to determine the future fundamental value. With only informed investors participating in the market, this determination can eventually be made (see Figure 3). However, the addition of even a small number of social investors can distort the price signals enough to cause instability.
[Figure 3: price against time for 0 and 10% social investors]
Fig. 3. A price shock occurs at the tick mark on the Time axis. Before this time, the fundamental price is 5100, and it is 5600 after. A market with only informed investors can handle this price shock, but the introduction of even a relatively small number of social investors greatly reduces the market’s ability to remain stable after a price shock.
7 Discussion
The traditional analysis of financial crises focuses on highly aggregated dynamic models. These models help describe the potential conditions under which a speculative bubble or a financial crisis may occur. However, these models provide little insight into the dynamic conditions of these events – especially when investors have different motivations. To better understand potential market instabilities, it is necessary to employ techniques such as the multi-agent model that permit an analysis of the complexities created when investor heterogeneity and interdependence are introduced. The dynamics of the current models are overly sinusoidal and thus unrepresentative of real markets. In general, choosing agent parameters for a stable market (e.g. Model 1) is easy and straightforward. However, finding a set of agent parameters that are on the boundary of stability (e.g. Models 2 and 3) has been more difficult. Traditional economic models provide little insight into the specification of these parameters, so the current multi-agent models represent a foundation for the coordinated exploration and development of computational and economic models. In particular, the developed multi-agent model suggests a signalling game. In a game theory model, players make their decisions based on payoffs that depend on the actions of the other players. The likely behaviour of the players in a game can be determined by identifying a Nash equilibrium. In a Nash equilibrium, a decision by one player will make a specific decision the most beneficial to the other player, and this decision by the other player will make the original decision the most beneficial to the first player. Since this set of decisions is stable, the game model is in equilibrium. The previous MAMs of financial markets [2][4][18] could not be easily analyzed with game theory. In these models, noise investors (players) who observe a stock price that is rising on fundamentals will decide to buy the stock because of its upward price trajectory, and this decision may subsequently cause the stock to rise
above its fundamental price. When the stock price is above its fundamental value, fundamentalist investors (players) will decide to sell. Since a decision by fundamentalists to buy can lead to a decision by noise investors to also buy which can then lead to a decision by fundamentalist investors to sell, these models do not have a Nash equilibrium. The new multi-agent model which introduces social investors has two Nash equilibria. The first equilibrium involves the buying decisions of informed investors being seen by the social investors which leads them to buy, and this decision will drive up the price expectations of the informed investors which maintains their decision to buy. A Nash equilibrium where all of the players are buying represents the actions of market participants during a speculative frenzy. The second Nash equilibrium involves the converse situation where a decision to sell by informed investors leads the social investors to sell which confirms the decision of informed investors to sell. Between these two Nash equilibria which represent market bubbles and crashes, the actions of the informed and social investors can create the dynamics of an efficient market. In a signalling game, a signal (like the price shock in our MAM) can change the equilibrium outcome of a game model. Signaling models have been widely used in economics [24] to explain phenomena such as uninformative advertising [21][23], limit pricing [20], dividends [1][12] and warranties [11][22]. In these games, a player with private information (the informed player) sends a signal of his private type to the uninformed player before the uninformed player makes a choice. In our MAMs, the informed player can be viewed as the economically informed (traditional) investors while the uninformed player can be viewed as the socially motivated investors. The informed investors will make a decision and pass a signal to the socially motivated investors. The socially motivated investors will make a decision based on this signal. This game can have different equilibriums which depend on the nature of the informed investors. If the informed investors act as one person and all make the same decision, a pooling equilibrium is reached. Conversely, a separating equilibrium will be generated if the informed investors make different investment decisions. To create the possibility that the signaling game can represent an efficient market, it is necessary that some signals will lead to a separating equilibrium where some investors buy and some investors sell. The insight gained from choosing agent parameters that lead to marginally stable MAMs will be of great value in designing new signaling game models.
8 Summary
A series of multi-agent models for financial markets has been developed. These computational models provide a valuable tool for the evaluation and development of economic models that represent the conditions and events of financial crises. A key focus of the models in development is the role of social investors – an investor class that cannot be meaningfully handled by the assumptions of traditional economic models.
References 1. Battacharya, S.: Imperfect information, dividend policy and the ‘bird in the hand’ fallacy. Bell Journal of Economics 9(1):259-70. (1979) 2. Cont, R., Bouchard, J.-P.: Herd Behavior and Aggregate Fluctuations in Financial Markets. Macroeconomic Dynamics 4:170-196. (2000) 3. Dimand, R.W.: Fisher, Keynes, and the Corridor of Stability. American Journal of Economics and Sociology 64(1):185-199. (2005) 4. Eguiluz, V.M., Zimmermann, M.G.: Transmission of Information and Herd Behavior: An Application to Financial Markets. Physical Review Letters 85:5659-5662. (2000) 5. Fama, E.F.: The Behavior of Stock Market Prices. Journal of Business 38:34-105. (1965) 6. Fama, E.F.: Efficient Capital Markets: A Review of Theory and Empirical Work. Journal of Finance, 25:383-416. (1970) 7. Fama, E.F.: Foundations of Finance. Basic Books, New York. (1976) 8. Fama, E.F.: Efficient Markets: II. Journal of Finance 46(5):1575-1617. (1991) 9. Friedman, M.: The Case for Flexible Exchange Rates. In Friedman, M. (ed), Essays in Positive Economics, University of Chicago Press, pgs 157-203. (1953) 10. Froot, K.A., Scharfstein, D.S., Stein, J.C.: Herd on the Street: Informational Inefficiencies in a Market with Short-Term Speculation. Journal of Finance 47:1461-1484. (1992) 11. Gal-Or, E.: Warranties as a signal of quality. Canadian Journal of Economics 22(1):50-61. (1989) 12. John, K., Williams, J.: Dividends, dilution and taxes: A signalling equilibrium. Journal of Finance 40(4):1053-1069. (1985) 13. Keynes, J.M.: The General Theory of Employment, Interest and Money. Reprinted Prometheus Books. (1936, 1997) 14. Kindleberger, C.P.: Manias, Panics, and Crashes. Basic Books, New York. Revised edition issued in 1989. (1978) 15. Leijonhufvud, A.: Effective Demand Failures. Swedish Journal of Economics 75: 27-48. (1973) Reprinted in Leijonhufvud, A., Information and Coordination: Essays in Macroeconomic Theory, Oxford University Press. (1981) 16. De Long, J.B., Schleifer, A., Summers L.H., Waldman, R.J.: Noise Trader Risk in Financial Markets. Journal of Political Economy 98:703-738. (1990) 17. De Long, J.B., Schleifer, A., Summers L.H., Waldman, R.J.: Positive Feedback Investment Strategies and Destabilizing Rational Speculation. Journal of Finance 45(2):379-395. (1990) 18. Lux, T., Marchesi, M.: Scaling and criticality in a stochastic multi-agent model of a financial market. Nature 397:498-500. (1999) 19. Lynch, A.: Thought Contagions in the Stock Market. Journal of Psychology and Financial Markets 1:10-23. (2000) 20. Milgrom, P., Roberts., J.: Limit pricing and entry under incomplete information: An equilibrium analysis. Econometrica 50:443. (1982) 21. Milgrom, P., Roberts., J.: Price and advertising signals of product quality. Journal of Political Economy 94(4):796-821. (1986) 22. Miller, M.H., Rock, K.: Dividend policy under asymmetric information. Journal of Finance 40:1031. (1985) 23. Nelson, P.: Advertising as information. Journal of Political Economy, 82(4):729-54. (1974) 24. Riley, J.G.: Silver signals: Twenty- years of screening and signalling. Journal of Economic Literature 39:432-478. (2001)
25. Visano, B.S.: Financial Crises: Socio-economic causes and institutional context. Routledge, London. (forthcoming) 26. Williams, J.B.: The Theory of Investment Value. Reprinted Augustus M. Kelley, New York. (1938, 1965)
Appendix A – Investor Models
For a series of artificial market prices, the future market price calculated by the informed investors is shown in Table 1. In model 1, the calculated price is always equal to the fundamental value (e.g. 51). In models 2 and 3, the price calculation includes a time-based multiplier and a second derivative of the current trend. If the difference in the past trends has the same sign as the current trend, then it is also multiplied by 0.8. Therefore, the price calculated at time 2 is 42.5% of 51 plus 57.5% of (47 + (-2)/0.8 + (-1)/1.6 + [(-2)-(-1)]*0.8), and it is 42.5% of 51 plus 57.5% of (47 + (-2)/1.6 + (-1)/2.4 + [0-(-2)]) at time 3. For the social investors, the examined sample size is 100 investors; the large number that causes buying is 60 and the small number that causes selling is 42.

Table 1. Future price expectations calculated by the informed investors in each model for a series of artificial market prices

Time   Market Price   Model 1   Models 2 and 3
 0          50           51          51
 1          49           51          48.4
 2          47           51          46.4
 3          47           51          48.9
 4          48           51          50.0
 5          50           51          52.7
 6          51           51          52.1
 7          51           51          51.4
 8          54           51          57.0
 9          53           51          51.4
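The two worked examples in the text can be checked directly; the following snippet is just the arithmetic from the paragraph above, not a general implementation of the models' price rule:

```python
# Expected price at times 2 and 3 for Models 2 and 3, exactly as computed in the text.
inner_t2 = 47 + (-2) / 0.8 + (-1) / 1.6 + ((-2) - (-1)) * 0.8
inner_t3 = 47 + (-2) / 1.6 + (-1) / 2.4 + (0 - (-2))
print(round(0.425 * 51 + 0.575 * inner_t2, 1))   # 46.4, matching Table 1 at time 2
print(round(0.425 * 51 + 0.575 * inner_t3, 1))   # 48.9, matching Table 1 at time 3
```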
Cross-Organisational Workflow Enactment Via Progressive Linking by Run-Time Agents Xi Chen and Paul Chung Computer Science Department, Loughborough University Loughborough, Leicestershire, United Kingdom LE11 3TU {X.Chen, P.W.H.Chung}@lboro.ac.uk
Abstract. Driven by popular adoptions of workflow and requirements from the practice of virtual enterprise (VE), research in workflow interoperability is currently on the increase. Nonetheless, it is still in its early stage compared with the maturity of individual workflow technology. Some attempts have been made, but the results are not satisfactory, especially in a VE context where many of the partnerships are dynamic and temporary. Reasons include the rigidity and high initial coordination cost inherently associated with top-down modelling and enactment approaches. Therefore, this paper proposes a bottom-up and WfMS-independent (WfMS is the acronym for workflow management system [1]) approach towards cross-organisational workflow enactment, via progressive linking enabled by run-time agents. This is expected to pave the way for further cross-organisational workflow needs.
Keywords: Multi-Agent Systems, Workflow Interoperability, Virtual Enterprise.
1 Introduction
Business processes are at the core of productivity for an organisation. They control and describe how business is conducted in terms of “a set of one or more linked procedures or activities which collectively realise a business objective or policy goal, normally within the context of an organisational structure defining functional roles and relationships” [1]. To support mobility and dynamism, individual business processes are vital for a company to react faster and be more flexible in running its daily business in a constantly changing environment [2]. However, the idea of virtual enterprises (VE) blurs the boundaries between organisations and requires cross-organisational interactions, which brings in many challenges. Workflow was born to tackle the issue of business process automation and has proven, to date, to be a mature technology. It is carried out with the support of a workflow management system (WfMS) that provides complete design, execution and management services for workflows. The essential strategy of workflow is the separation of business logic from software applications. Although separated, they are still linked in the form of ‘activities’ that represent logical steps within a process [1]. Centred around a set of activities, an activity-based workflow is constructed, in contrast to an
WfMS is the acronym for workflow management system [1].
M. Ali and R. Dapoigny (Eds.): IEA/AIE 2006, LNAI 4031, pp. 54 – 59, 2006. © Springer-Verlag Berlin Heidelberg 2006
Cross-Organisational Workflow Enactment Via Progressive Linking
55
entity-based workflow [3]. Each activity can be either manual or automatic depending on the task to be carried out, where the automatic ones are, mostly, implemented as applications. The problem of workflow interoperability has been identified due to adoptions of diverse WfMS products between organisations and the inevitability of interconnections for the purpose of cooperation across organisational boundaries. Three basic interoperation patterns, namely, chained process, nested synchronous subprocess, and event synchronised process [4] should be tackled, among which more concerns are put on the first two [5]. Due to inherent complexity, event synchronisation, although encountered very often in real business, has not received much attention by WfMC itself [6]. A number of research projects have been carried out in the area of workflow interoperability, nevertheless results are not satisfactory when applied in a VE context because of their rigidity and the high initial coordination cost imposed by top-down approaches. This project proposes a more effective approach, which addresses enactment of cross-organisational workflow from a bottom-up view and in the form of a progressive linking mechanism supported by run-time agents. It is expected the success of such an approach will shed light on and facilitate the formalisation and execution of cross-organisational workflow. This paper identifies challenges of workflow enactment in the context of VE, describes the approach, presents a possible way of implementation, discusses its effectiveness and highlights the future work.
2 Challenges Workflow interoperability is commonly examined from a top-down view [7,8,9], which intends to start from the concept of traditional workflow and extend it beyond organisational boundaries in order to keep the control flow manageable. However, as a technology-driven approach, it brings in much initial cost in terms of a detailed and rigid pre-definition that does not reflect the run-time nature of agile interactions within VEs where many of the partnerships are dynamic and temporary. However, if centralised control is removed due to the choice of a bottom-up modelling approach, there seems to be no way of propagating control flow from one workflow to another at run time. Real-life interoperation always poses a tightly-interwoven control flow structure. Also, many existing business processes are mobile and ever-changing because of their dynamic nature. The dynamism should be dealt with effectively by the crossorganisational workflow to minimise disturbance to cooperation, which implies a loosely-coupled interaction mechanism. Therefore, the realisation of tightlyinterwoven processes by means of a loosely-coupled mechanism is identified as a challenge. According to the Workflow Reference Model [6] initiated by WfMC, Interface 4 is the standard interface dedicated for the purpose of interoperability and has attracted much attention. Although standardisation provides a solution with regard to interoperability, the practical value is discounted in the face of the diversity of standards [10] and the reality towards their acceptance. Therefore, standardisation cannot be fully relied on.
56
X. Chen and P. Chung
Moreover, approaches that require substantial effort, e.g., dialogue definition [5], workflow view [7,11], Interworkflow [9], agent-based workflow [12], and standardisation [13,14], are unlikely to be adopted widely in the near future.
3 Progressive Linking Approach For the purpose of simplicity, interoperation discussed in this section is confined to participation between two organizations but not any particular two. It is assumed that when a workflow is invoked it will always instantiate a new process instance. 3.1 Interoperation Modelling At least, three aspects, namely, control flow, data flow and communication, should be addressed in order for two workflows to interoperate with each other. For control flow, other measures must be taken to route control due to the lack of a centralised architecture. To facilitate the discussion, the concept of interaction point is introduced here. An interaction point can be defined as from (or to) which, a request (or response) is emitted (or targeted). Since workflow engines are state machines, an appropriate sequence of interaction can be ensured at run time [7] as long as interaction points are correctly specified in both participating processes. Synchronisation is achieved by a process sending request and waiting for a response from the other process[5]. Activity-level modelling for interaction point is chosen in order to make the approach adaptive to all interoperability patterns. An interaction point is therefore modelled as an interface activity, which is further implemented in the form of a generic workflow activity. This activity is configured to synchronously (letting the process wait while the application is executing) invoke software agent as an external run-time application, which makes it an agentenhanced approach [15]. Control token [16] is passed back and forth among WfMSs and agents. Data flow is managed by the semantics of ‘sending’ and ‘waiting’, which are implicitly indicated by the type (incoming or outgoing) of data being exchanged by the two workflow engines through this interface activity. Basic interoperability patterns are all modelled at the activity level by employing the concepts of interface activity, incoming and outgoing data, which is illustrated in Fig. 1. Legend Original Activity Interface Activity Outgoing Data IncomingData
(a) Chained
(b) Nested
(c) Synchronised
Fig. 1. Interoperation patterns modelling using interface activities
Cross-Organisational Workflow Enactment Via Progressive Linking
57
Mediated communication [7] is used due to the loosely-coupled approach is adopted. Semantic service descriptions extracted from individual processes are used for message routing. Thus, agents on the one side pass outgoing data from the interface activity to the mediator, whilst agents on the other side check whether desired incoming data arrives and deliver it to the corresponding interface activity accordingly. 3.2 Compatible Workflow At build time, based on a common agreement, the workflows involved should be modelled and tuned into compatible ones and interface activities are inserted into the processes at desired positions to make both processes ready to go. Semantic service descriptions (in the form of interaction identifiers) are also attached to each pair of outgoing and incoming data belonging to the interface activities on both sides. 3.3 Form Filling At run time, an empty form is created and two compatible workflows begin their interoperation by filling the form jointly in sequence. Using Fig. 1 (a) as an example, when the first interface activity (only containing outgoing data) from Organisation A is executed, Agent A is invoked and puts the source activity ID (A2), interaction identifier (PurchaseOrder), data (PO200511) and attached document (if any) associated with this activity as well as the identifier of the partner process (Process B) into the form. The occurrence of information in the form triggers Agent B who arranges the instantiation of a new process on the side of Organisation B. After being instantiated, Process B reaches its first interface activity (only containing incoming data) who calls Agent B to register its interaction identifier as a mark of interest. Agent B looks at the form and checks whether there is such an identifier in an unfinished entry. If yes, it writes the activity ID (B1) into the entry and transfers the data and the document (if any) to Process B for consumption. Table 1 gives the headings of the form and an example entry. Table 1. Heading of communication form and an example entry
Seq.
Source
Target
Interaction ID
Message
Doc
1
A2
B1
PurchaseOrder
PO200511
attached
Iteration NIL
... By doing so, the form shows the current progress of the interoperation. Apart from making a loosely-coupled structure possible, in case of exception, it can be used to trace and locate the trouble spot. When all interactions finish, the completed form can be saved as a historical record. The progressively filled form is also able to handle event synchronisation and iterative cases effectively through reasoning based on recorded data and progressive status of the form filling. For example, the appearance of two successive uncompleted entries with a blank Source and Target field in each means a rendezvous point is reached; a completed entry with the Source and Interaction ID fields exactly the same
58
X. Chen and P. Chung
as the ones in a previous entry implies an occurrence of iteration, in which case, the entry needs to be marked in the field of Iteration to draw the attention to the recipient.
4 Implementation Implementation of the approach is underway. A client/server system archetecture is chosen. General characteristics of WfMS are fully exploited in order to achieve a WfMS-independent solution and avoid undue complexity. Both interoperation triggering and acceptance will utilise the workflow application invocation mechanism. To support the mediated communication among software agents, a blackboard system [17] will be adopted. This is due to the structure and functionalities provided by the blackboard system architecture match the proposed approach very well. Knowledge sources (KSs) can be implemented as agents on the client side whilst the blackboard (BB) can be used to hold the form on the server. KStrigger mechanism can be used as well to bring attention to agents on both sides when something happens to the form.
5 Discussion and Future Work The approach of progressive linking is developed by using artificial intelligence technology based on a comprehensive investigation of workflow model in terms of control flow, data flow, activity model and application invocation mechanism. It addresses the challenges identified in Section 2. Firstly, this approach reflects realistic cooperation between processes. The complexity of cross-organisational control flow is wrapped in the procedure of invoking mediated software agents, which enable interoperation without a centralised control mechanism. Secondly, loosely-coupled interaction mechanism is provided by the run-time progressive interaction. The tightly-interwound process is dealt with by activity-level interaction modelling that provides a general method for all interoperability patterns. Thirdly, interoperability standards are avoided as much as possible in terms of the agent invocation through application invocation interface rather than workflow interoperability interface. Finally, since the substantial work is implemented in the form of external software agents, there is no structural change imposed on involving WfMSs, which brings in a WfMS-independent solution. However, since this approach relies on compatible workflows, the issue of compatibility has yet to be addressed. Obviously, cross-organisational workflow compatibility cannot be solved by means of cross-organisational workflow enactment alone but the idea of progressive linking paves the way for a possible direction for achieving it by letting the intelligent agents progressively negotiate the flow of interoperation from scratch within an intelligent framework for cross-organisational cooperation. Internal processes are exposed as services, which allows software agents to negotiate and pick up the desired ones on the fly to dynamically construct crossorganisational workflow. These will be addressed in the future work. It is expected that the progressive linking approach enabled by run-time agents will facilitate intelligent interoperation that will benefit B2B e-business among VEs.
Cross-Organisational Workflow Enactment Via Progressive Linking
59
References 1. Workflow Management Coalition: Terminology & Glossary, Technical Report WFMCTC-1011. Workflow Management Coalition (1999) 2. International Business Machines Corporation: IBM WebSphere MQ Workflow Version 3.6 – Concepts and Architecture. 8th edn. IBM Corp. (2005) 3. Guillaume, F.: Trying to Unify Entity-based and Activity-based Workflows. < http://www.zope.org/Wikis/DevSite/Projects/ComponentArchitecture/TryingToUnifiyWor kflowConcepts>. (accessed 3.7.2005) 4. Workflow Management Coalition: Workflow Standard – Interoperability Abstract Specification, Technical Report WFMC-TC-1012. Workflow Management Coalition (1999) 5. Biegus, L., Branki, C.: InDiA: a Framework for Workflow Interoperability Support by Means of Multi-Agent Systems. Engineering Applications of Artificial Intelligence 17 (7) (2004) 825-839 6. Workflow Management Coalition: The Workflow Reference Model, Technical Report WFMC-TC-1003. Workflow Management Coalition (1995) 7. Schulz, K.A., Orlowska, M.E.: Facilitating Cross-Organisational Workflows with a Workflow View Approach. Data & Knowledge Engineering 51 (1) (2004) 109-147 8. Leymann, F., Roller, D.: Production Worklow – Concepts and Techniques. Prentice Hall, New Jersey (2000) 9. Workflow Management Coalition: Interworkflow Application Model: The Design of Cross-Organizational Workflow Processes and Distributed Operations Management, Technical Report WFMC-TC-2102. Workflow Management Coalition (1997) 10. Bernauer, M., Kappel, G., Kramler, G., Retschitzegger, W.: Specification of Interorganizational Workflows – A Comparison of Approaches. Vienna University of Technology White Paper. Vienna (2002) < http://www.big.tuwien.ac.at/ research/ publications/2003/0603.pdf>. (accessed 1.11.2005) 11. Chiu, D.K.W., Cheung, S.C., Karlapalem, K., Li, Q., Till, S.: Workflow View Driven Cross-Organizational Interoperability in a Web-service Environment. Information Technology and Management 5 (2004) 221-250 12. Jennings, N., Norman, T., Faratin, P., O’Brien, P., Odgers, B.: Autonomous Agents for Business Process Management. International Journal of Applied Artificial Intelligence 14(2) (2000) 145-189 13. Workflow Management Coalition: Workflow Standard – Interoperability Wf-XML Binding, Technical Report WFMC-TC-1023. Workflow Management Coalition (2001) 14. O'Riordan, D.: Business Process Standards For Web Services. < http://www.webservicesarchitect.com/content/articles/BPSFWSBDO.pdf>. (accessed 6.7.2005) 15. Shepherdson, J., Thompson, S., Odgers, B.: Cross Organisational Workflow Coordinated by Software Agents. Proceedings of WACC Workshop on Cross-Organisational Workflow Management and Co-ordination, San Francisco. (1999) 16. Aalst, W.v.d. and Hee, K.v.: Workflow Management: Models, Methods, and Systems. MIT Press, Cambridge (2002) 17. Corkill, D.: Blackboard Systems. AI Expert 6(9) (1991) 40-47
Comparison and Analysis of Expertness Measure in Knowledge Sharing Among Robots Panrasee Ritthipravat1 , Thavida Maneewarn1 , Jeremy Wyatt2 , and Djitt Laowattana1 1
2
FIBO, King Mongkut’s University of Technology Thonburi, Thailand {pan, praew, djitt}@fibo.kmutt.ac.th School of Computer Science, University of Birmingham, United Kingdom
[email protected] Abstract. Robot expertness measures are used to improve learning performance of knowledge sharing techniques. In this paper, several fuzzy Q-learning methods for knowledge sharing i.e. Shared Memory, Weighted Strategy Sharing (WSS) and Adaptive Weighted Strategy Sharing (AdpWSS) are studied. A new measure of expertise based on regret evaluation is proposed. Regret measure takes uncertainty bounds of two best actions, i.e. the greedy action and the second best action into account. Knowledge sharing simulations and experiments on real robots were performed to compare the effectiveness of the three expertness measures i.e. Gradient (G), Average Move (AM) and our proposed measure. The proposed measure exhibited the best performance among the three measures. Moreover, our measure that is applied to the AdpWSS does not require the predefined setting of cooperative time, thus it is more practical to be implemented in real-world problems.
1
Introduction
Reinforcement learning notoriously requires a long learning period, particularly when applied with a complicated task. Additionally, it is difficult for a robot to explore huge state and action spaces in a short time. To alleviate these problems, multiple mobile robots have been served to learn a task by exploring different part of state and action spaces simultaneously. During a learning period, they may share the knowledge they have learnt. Unfortunately, most reinforcement learning techniques require auxiliary methods to integrate external knowledge sources into the robot’s knowledge. In general, knowledge gained from one robot is possibly different from that of the others, even if the robots have the same mechanism and learn the same task. This happens because the robots have different experiences and properties. Therefore, knowledge sharing among robots is one of the most challenging topics in robotic research. Knowledge sharing among reinforcement learning robots has been extensively studied in order to utilize and gain benefit from multiple knowledge sources, which can be obtained from other robots or even human being. However, the robot would gain more benefit when the knowledge is shared from the more competent sources than the less competent one. We believe that knowledge sharing M. Ali and R. Dapoigny (Eds.): IEA/AIE 2006, LNAI 4031, pp. 60–69, 2006. c Springer-Verlag Berlin Heidelberg 2006
Comparison and Analysis of Expertness Measure
61
techniques should achieve better performances if robots can identify the expertness of the knowledge source. Hence the robots can effectively determine the appropriate amount of the acquired knowledge from the source. Previous research shows that knowledge sharing can be carried out in several ways. They can be classified as direct and indirect methods. The direct method focuses on direct integration all available sources of shared knowledge into robot’s knowledge. Various techniques were studied. For example, the ‘Policy Averaging’ [1] is the method which averaged all policies into the new knowledge. The ‘Weighted Strategy Sharing: WSS’ [2] is the method in which weights were assigned to all knowledge sources according to the robot expertise or compatibility of agent state spaces [3] and then summed into the new knowledge. The ‘Same-policy’ [1] is the method in which all agents used and updated the same policy. In the indirect method, external knowledge sources will be used to guide the robot’s decision making, but they will not be integrated into robot’s learning directly. Most works used shared knowledge to bias action selection. The robot selects an action according to the resulting probabilities. Techniques were ‘Skill Advice Guided Exploration (SAGE)’ [4], ‘Supervised Reinforcement Learning (SRL)’ [5] etc.
2
Knowledge Sharing Techniques
Knowledge sharing techniques investigated in this paper are summarized as follows. 2.1
Shared Memory: SM
This technique is inspired from the ‘Same-policy’ [1]. After interaction with an environment, the robots use and update the same set of state-action values. Since all robots have the same brain, each individual robot’s experiences directly affects the overall robots’ decision making. For learning a task with n robots, action values will be updated n times in each iteration. Learning should be faster than individual learning since n robots explore various states simultaneously. 2.2
Weighted Strategy Sharing: WSS
The WSS method [2] is composed of two phases: individual learning and cooperative learning phases. Initial all robots are in the individual learning phase. They learn a task separately. At a predefined end of learning trials, all robots switch to the cooperative learning phase which allows the robots to share the learned state-action values. In this phase, the action values of all robots will be weighted and summed as the new knowledge for every robot as shown in Eq. 1 Qnew (s, a) =
n m=1
Wm Qm (s, a),
(1)
62
P. Ritthipravat et al.
where Qnew (s, a) is a new set of state-action values initialized for all n sharing robots. Superscript m indicates the mth robot’s. Wm is weight calculated from robots’ expertise as presented in Eq. 2 Wm =
expertnessm , n expertnessp
(m = 1, . . . , n),
(2)
p=1
where expertnessm is the mth robot expertness value. Therefore, at the end of cooperative learning phase, all robots have homogeneous set of state-action values. The individual learning phase will then be continued thereafter. These phases will switch back and forth at every cooperative time. The cooperative time is set at every predefined end of learning trials. It will determine how often sharing knowledge is carried out. Frequent sharing means the robots have less time for being in the individual learning phase. This results in low diversity of robots’ knowledge. On the contrary, infrequent sharing means that high diversity of the robots’ knowledge can be achieved. However, the robot rarely takes benefit from them. Setting the suitable cooperative time is quite difficult. Additionally, all robots have to finish their individual learning phase before the cooperative learning phase can start. If a robot finishes its individual learning phase before the other robots, it has to wait for other robots to finish their individual learning phase. Then they can share their knowledge. Therefore, the WSS does not support asynchronous knowledge sharing in multiple robots. Problems of setting the suitable cooperative time and waiting for the other robots have made the WSS impractical for real-world implementation. The idea of adaptively selecting the suitable cooperative time is inspired from [6]. We will introduce the new method named Adaptive Weighted Strategy Sharing in the next subsection. 2.3
Adaptive Weighted Strategy Sharing: Adaptive WSS
This strategy allows each robot to make a decision whether to share knowledge with the other n − 1 robots by itself. The robot is presumed to perceive all the other robots’ knowledge and their expertness values at any time t. At the end of robot learning trial, the robot will assign weights to all sources of shared knowledge as computed from Eq. 2. Difference between the robot’s weight, Wi , and that of the other robots, Wj where j ∈ n − 1 will be employed to determine j=i
probability of sharing, P robsharing , as shown in Eq. 3. ⎧ ⎪ if |Wi − Wj | ≤ T h1 , ⎨0 P robsharing = 1 if |Wi − Wj | ≥ T h2 , ⎪ ⎩ |Wi −Wj |−T h1 otherwise. T h2 −T h1
and
(3)
In Eq. 3, two thresholds Th1 and Th2 will be set. In this paper, they are 0.1 and 0.5 respectively. If the difference is less than Th1 , sharing will not be occurred. In the contrary manner, if the difference is higher than the Th2 , the sharing probability will be 1. In this manner, each robot is able to determine which
Comparison and Analysis of Expertness Measure
63
sharing robots are. Once sharing robots have been determined, a new knowledge can be obtained from Eq. 1. After the sharing, all sharing robots will have the same level of expertise. Therefore new expertness value for all sharing robots can be computed from an average of the sharing robot expertness values. Unlike the WSS method, the robot will update its knowledge and the expertness value immediately while the other sharing robots are learning a task. The new knowledge and the expertness value will be kept in robots’ memories. Once robot’s learning trial is finished, they will be employed for the next learning trial. From the techniques presented above, weights play an important role in knowledge sharing among robots. Weight will be used not only to determine whether the knowledge from the sources should be used but also how much the knowledge from the source should be contributed to the new knowledge. In this paper, weight which is determined from the robot expertness will be studied. A new robot expertness measure based on regret evaluation is proposed.
3
Measure of Expertness
The robot expertness indicates how well its current policy is. Two expertness measures previously proposed by Ahmadabadi’s team were Gradient (G) and Average Move (AM) measures. The expertness value evaluated by the G measure takes accumulated rewards since each individual learning phase has begun in order to be less biased from long history of experiences. However, the G measure suffers when it has negative value. The higher negative value of G could have two possible meanings, either it has sufficiently learned to indicate which actions should not be executed or it is exploring improper actions. In the second case, the use of this measure can degrade the robot learning performance. Another measure is the Average Move (AM). The AM takes an average number of moves that the robot executed before achieving the goal into consideration. The lower number of moves that the robot has done, the higher expertness value is. However, when the robot randomly explores an environment, the AM cannot be used to represent the robot expertise. As described above, previous measures are inefficient to represent the robot expertness measure. In this paper, we proposed a new measure of expertise based on regret evaluation. The regret measure is formed from the uncertainty bounds of the two best actions, i.e. the greedy action and the second best action. Bounds of both actions will be compared. If the lower limit of the bound of the greedy action is higher than the upper limit of the bound of the second best action, it is more likely that the greedy action is the best action. The regret measure given state s at time t + 1 is calculated from: regret(st+1 ) = −(lb(Q(st+1 , a1 )) − ub(Q(st+1 , a2 ))),
(4)
where lb(Q(st+1 , a1 )) is the lower limit of estimated state-action value given state s at time t + 1 of the greedy action. ub(Q(st+1 , a2 )) is the upper limit of approximated state-action value given state s at time t + 1 of the second best action. They are approximated from past state-action values sampled from time
64
P. Ritthipravat et al.
t− k + 1 to t as: {QT (sT , a)}tT =t−k+1 = {Qt−k+1 (st−k+1 , a), . . . , Qt (st , a)} where k is a number of samples. Due to the fact that normal distribution in each stateaction value is assumed, the uncertainty bounds of the sampled state-action values, Bound(Qt (st , a)), can be estimated from: sˆ Bound(Qt (st , a)) = QT (sT , a) ± t α2 ,k−1 √ , k
(5)
where QT (sT , a) is the mean of the k samples of the estimated state-action values. sˆ is sample standard deviation. (1 − α) is the confidence coefficient and t α2 ,k−1 is student’s t-distribution providing an area of α2 in the upper tail of a t distribution with k − 1 degrees of freedom. Mapping the regret measure into the expertise value given state s at time t can be defined as follows: expertnessm (st ) = 1 −
1 . 1 + exp(−b ∗ regret(st ))
(6)
From Eq. 6, the regret value is mapped into a flipped sigmoid function ranges between [0 1]. b is slope of the mapped function. The large negative regret measure causes the expertness value to approach one. Illustrative examples of positive and negative regret measure are shown in Figs. 1(a) and 1(b). In Figs. 1(a) and 1(b), a state composed of 5 possible actions is taken into consideration. Each action has its corresponding uncertainty bound as presented by three horizontal lines. The upper and lower lines are the upper and lower limits of the bound. The lines labeled with the numbers indicate limits considered. The middle line represents average of estimated state-action values. Small circles represent possible greedy actions. As seen in Fig. 1(a), there exist two possible greedy actions, i.e. actions a3 and a4 . If the greedy action is a4 , the robot may decide that the action a4 is the best action. However, action a3 has a higher average of estimated state-action values. It is more likely to be the best action. Therefore, when uncertainty bounds of two best actions are overlapped, the best action cannot be explicitly determined. In this case, whatever the greedy action is, the regret measure will have a positive value. It represents that the robot
(a)
(b)
Fig. 1. Examples of regret measure (a) positive regret (b) negative regret
Comparison and Analysis of Expertness Measure
65
lacks the confidence whether the greedy action is the best action. In Fig. 1(b), the greedy action is definitely the action a3 and it has a high chance of being the best action. The regret measure will have a negative value. Thus, the measured regret can represent how confidence the robot has in taking the greedy action.
4
Simulation
Eight techniques were used to test effectiveness of our proposed expertness measure within the scope of two problems: knowledge sharing among robots that learn a task from scratch and among robots that relearn the transferred knowledge. Fuzzy Q-learning technique used in this paper follows unmodified version presented in [7]. These techniques are 1.)‘SP’; Separate learning or learning without sharing knowledge. 2.) ‘SM’: Shared Memory. 3.)‘WSS’: Weighted Strategy Sharing with G expertness measure 4.) ‘WSSAM’: WSS with AM measure 5.) ‘WSSR’: WSS with regret measure. 6.) ‘AdpWSS’: Adaptive WSS + G. 7.) ‘AdpWSSAM’: Adaptive WSS + AM 8.) ‘AdpWSSR’: Adaptive WSS + regret. For the WSS based algorithms, 5 cooperative times are studied. They are at every 20, 40, 60, 80, and 100 learning trials. The best results will be selected and compared with the other algorithms. For the first problem, intelligent goal capturing behaviour is simulated. The goal will avoid being captured once it realizes that the distance between the goal and the robot is lower 30 cm. and the orientation of the goal w.r.t the robot is in between -45◦ and 45◦ . It will run in a perpendicular direction to the robot’s heading direction. 1000 trials are tested for a single run. For the second problem, two robots relearn knowledge gained from a static obstacle avoidance to form the dynamic obstacle avoidance behaviour. Two robots learn to approach a goal which has two opponents moving in opposite direction. Learning parameters for all problems are summarized in table 2. The discount factor and learning rate are 0.9 and 0.1 for all problems respectively. Parameters were tuned for each algorithm by hand. The accumulated reward, averaged over trial in each run, Accumulated Rewards (Intelligent Goal Capturing) 80
60
40
20
SP SM WSS(60) WSSAM(20) WSSR(20) AdpWSS01 AdpWSSAM01 AdpWSSR01
0
−20
−40
−60
(a)
0
100
200
300
400
500 trials
600
700
800
900
(b)
Fig. 2. Intelligent goal capturing (a) robots’ path (b) accumulated rewards
1000
66
P. Ritthipravat et al.
is recorded. Their averages over 50 runs are used to compare the learning performance of these algorithms. From the simulation results, accumulated rewards and collision rates of these behaviours are summarized in table 1. For the WSS based algorithm, the best results are selected to compare with the other algorithms. The best cooperative time will be presented in the parenthesis. As an example, the WSS (20) is the WSS algorithm with the cooperative time is set at every 20 learning trials. Robot’s path and accumulated rewards averaged over trial of the first and the second problems are shown in Figs. 2 and 3 respectively. In Fig. 2(a), two robots learn to capture their goals in separated environment. The robots are presented by the bigger circles with lines indicated their heading direction. In Fig. 3(a), two robots learn to move to the goal while avoiding collision with dynamic obstacles in the same environment. Two opponents move in opposite direction presented by arrow lines as to obscure the robots. The robots have to avoid both opponent robots and their teammate. The simulation results showed that, the WSSR gave the best performances. The second best algorithm is the WSSAM. However, the AM measure is not suitable when applying to the Adaptive WSS. As seen in Figs. 2(b) and 3(b), the AdpWSSAM was only slightly better than the separate learning method. The SM was better than the AdpWSS and the AdpWSSAM. However it converged to suboptimal accumulated rewards in both problems. The AdpWSSR gave the superior performances for the first problem. However, it has slightly inferior performances compared to the WSS based algorithms in the second problem. As seen from the simulation results, our proposed expertness measure based on regret evaluation resulted in better performance than other expertness measures. For the WSS based algorithm, the best cooperative time depended on task and algorithms. Setting the suitable time for real-world implementation makes the algorithms are impractical. Additionally, it is not support asynchronous learning in multiple robots since the robot has to wait for all other robots to finish their individual learning phase. Therefore, only the AdpWSSR, SM and SP methods are suitable to be implemented into real-world robot learning problem. The results from the real-world knowledge sharing experiment are shown in the next section. Accumulated Rewards (Relearning) 60
50
40
30
20 SP SM WSS(20) WSSAM(40) WSSR(60) AdpWSS01 AdpWSSAM01 AdpWSSR01
10
0
−10
−20
(a)
0
0.2
0.4
0.6
0.8
1 trials
1.2
1.4
1.6
1.8
(b)
Fig. 3. Dynamic obstacle avoidance (a) robots’ path (b) accumulated rewards
2 4
x 10
Comparison and Analysis of Expertness Measure
67
Table 1. Simulation results (average over two behaviours) (a) General method Methods Accum. Collision Rewards rate SP 49.4209 0.1274 SM 54.5071 0.1287 — — —
(b) WSS based method Methods Accum. Collision Rewards rate WSS 58.3409 0.1122 WSSAM 61.1725 0.0970 WSSR 61.2822 0.0846
(c) AdpWSS based method Methods Accum. Collision Rewards rate AdpWSS 53.8814 0.1215 AdpWSSAM 51.9866 0.1229 AdpWSSR 58.6398 0.1035
Table 2. Parameter setting Parameters Inputs of the FIS Outputs of the FIS
Problem 1
Problem 2
‘dg ’ , ‘og ’
‘dg ’, ‘og ’, ‘dob ’, ‘oob ’
Gains of linear and angular velocities with candidate actions are [0, 0.25, 0.5] and [-1, 0, 1] respectively
Centre positions of Membership functions
‘dg ’: {0, 15, 40} cm. ‘og ’: {-0.4, 0, 0.4} rad.
Spans of Membership functions
‘dg ’: {5, 5, 15} cm. ‘dg ’: {15, 15, 30} cm. ‘og ’: {0.15, 0.2, 0.15} rad. ‘og ’: {0.15, 0.2, 0.15} rad. ‘dob ’: {5, 5, 5} cm. ‘oob ’: {0.25, 0.25, 0.25, 0.25, 0.25}rad. 1 if ε > 0.1 ε = 0.1 ε = (success)+1 0.1 otherwise ⎧ success ⎨ 100 100 success r1 (t) = r2 (t) = −100 collide −0.1 otherwise ⎩ 0 otherwise
ε − probability
Reward function
‘dg ’: {20, 60, 140} cm. ‘og ’: {-0.4, 0, 0.4} rad ‘dob ’: {15, 30, 10}cm. ‘oob ’: {− π2 , − π4 , 0, π4 , π2 } rad.
where ‘dg ’ is distance from a goal to a robot ‘og ’ is orientation of the goal with respect to the robot reference frame ‘dob ’ is distance from the closest obstacle to the robot ‘oob ’ is orientation of the closest obstacle with respect to the robot reference frame
5
Experiments
In this section, knowledge gained from robotic simulators is transferred to real robots. The robots learn a dynamic obstacle avoidance behaviour in the same environment within 1300 learning trials. Experiments were performed on the robotic soccer platform. However, to avoid frequent collision among robots, the problem is simplified. The opponent will stop moving if there is an obstacle within 5 cm. Three algorithms are implemented into real robots. They are ‘SP’, ‘SM’ and ‘AdpWSSR’. Experimental results are shown in Figs. 4(a) and 4(b).
68
P. Ritthipravat et al. Accumulated reward 120 SP SM AdpWSSR
100
80
60
40
20
0
−20
−40 0
(a)
200
400
600 trials
800
1000
1200
(b)
Fig. 4. Experimental results (a) robots’ path (b) accumulated rewards
In Fig. 4(a), robots’ path was shown. Two opponents moved in the opposite direction as presented by arrows. The robot1 moved to the goal successfully while avoiding collision with the robot2 and the opponent2. The robot2 was moving to the goal. As seen in Fig. 4(b), the AdpWSSR gave higher accumulated rewards averaged over trial in a long run. In the beginning of relearning period, the SM gave the best results. This was corresponding to the simulation results showed in Fig. 3(b). However learning performance of the SM was decreased when learning period was extended. Both knowledge sharing techniques gave better performances than learning without sharing knowledge.
6
Discussion
From the simulation results, the WSSR gave the best performances averaged over two behaviours. However, it is impractical for implementation in the real robot since it requires predefined setting of cooperative time. Additionally, it does not support asynchronous knowledge sharing among robots because the robot has to wait for all other robots to finish their individual learning phase before the cooperative learning phase can begin. Though using AM measure with the WSS based algorithm seems to work properly, it is not suitable when it is applied to the Adaptive WSS algorithm. Additionally, the AM measure cannot be used effectively when the robot randomly explores the states. SM gave the lowest performances compared to the other knowledge sharing algorithms. Its problem arises when a local minimum is encountered. It is difficult for the robots to get out from such situation. This happened because the robots use common decision making policy. It is difficult to achieve different solutions from the group’s judgment. Therefore, the proposed measure of expertise gave the best results among all measures since it truly represents the robot expertness. Moreover, the robot expertness value measured by our method is varied over state space according to robot’s experiences. This is different from weights assigned by previous measures in which the robot’s expertise is treated to be equal in each state. Treating in this manner results in low diversity of the robots’ learning after knowledge
Comparison and Analysis of Expertness Measure
69
sharing is arisen. The experimental results showed that the knowledge gained from robotic simulators can be seamlessly transferred to the real robots. The Adaptive WSS with regret measure is practical for real-world implementation since it does not require the predefined setting of cooperative time and it also supports asynchronous learning in multiple robots.
7
Conclusion
In this paper, various knowledge sharing algorithms are studied. A new expertness measure based on regret evaluation is proposed. Simulation results showed that our proposed measure when applied to various investigated algorithms gave the best performances compared to the other previously proposed measures. Additionally, the Adaptive WSS with our proposed regret measure is practical for real-world implementation since the setting of suitable cooperative time is not required and it also supports asynchronous knowledge sharing among robots. Experimental results showed that knowledge gained from robotic simulators can be seamlessly transferred to real robots. However, learning time still needs to be improved. Extension to other knowledge sharing methods should be explored in the future works.
Acknowledgement This work was supported by the Thailand Research Fund through the Royal Golden Jubilee Ph.D Program (Grant No. PHD/1.M.KT.43/C.2). Matlab software was supported by TechSource System (Thailand).
References 1. Tan, M.: Multi-agent reinforcement learning: Independent vs cooperative agents. In: Proc. 10th Int. Conf. Machine Learning (1993) 2. Ahmadabadi, M.N., Asadpour, M.: Expertness Based Cooperative Q-Learning. IEEE Trans. SMC.–Part B. 32(1) (2002) 66–76 3. Bitaghsir, A.A., Moghimi, A., Lesani, M., Keramati, M.M., Ahmadabadi, M.N., Arabi, B.N.: Successful Cooperation between Heterogeneous Fuzzy Q-learning Agents. IEEE Int. Conf. SMC. (2004) 4. Dixon, K.R., Malak, R.J., Khosla, P.K.: Incorporating Prior Knowledge and Previously Learned Information into Reinforcement Learning Agents. Tech. report. Institute for Complex Engineered Systems, Carnegie Mellon University (2000) 5. Moreno, D.L., Regueiro, C.B., Iglesias, R., Barro, S.: Using Prior Knowledge to Improve Reinforcement Learning in Mobile Robotics. Proc. Towards Autonomous Robotics Systems. Univ. of Essex, UK (2004) 6. Yamaguchi, T., Tanaka, Y., Yachida, M.: Speed up Reinforcement Learning between Two Agents with Adaptive Mimetism. IEEE Int. Conf. IROS. (1997) 594–600 7. Ritthipravat, P., Maneewarn, T., Laowattana, D., Wyatt, J.: A Modified Approach to Fuzzy Q-Learning for Mobile Robots. In: Proc. IEEE Int. Conf. SMC. (2004)
Multiagent Realization of Prediction-Based Diagnosis and Loss Prevention Roz´alia Lakner1 , Erzs´ebet N´emeth1,2 , Katalin M. Hangos2 , and Ian T. Cameron3 1
2
Department of Computer Science, University of Veszpr´em, Veszpr´em, Hungary Systems and Control Laboratory, Computer and Automation Research Institute, Budapest, Hungary 3 School of Engineering, The University of Queensland, Brisbane, Australia 4072
Abstract. A multiagent diagnostic system implemented in a Prot´eg´eJADE-JESS environment interfaced with a dynamic simulator and database services is described in this paper. The proposed system architecture enables the use of a combination of diagnostic methods from heterogeneous knowledge sources. The process ontology and the process agents are designed based on the structure of the process system, while the diagnostic agents implement the applied diagnostic methods. A specific completeness coordinator agent is implemented to coordinate the diagnostic agents based on different methods. The system is demonstrated on a case study for diagnosis of faults in a granulation process based on HAZOP and FMEA analysis.
1
Introduction
For complex multiscale process systems that are difficult to model, a combination of model-based analytical and heuristic techniques is usually needed to develop a diagnostic system [1]. The approach of multiagent systems (MAS) [2] which emerged in AI represents a promising solution for such a diagnosis task, being based on information from heterogeneous knowledge sources [3]. A multiagent system can then be used for describing the system model, the observations, the diagnosis and loss prevention methods with each element being established through formal descriptions. This work investigates the use of the architecture and algorithms of multiagent systems for diagnosing faults in process plants when both dynamic models and heuristic operational knowledge of the plant are available. In particular, we consider a granulation process and the advice to operators in order to reduce potential losses. The significance of this work lies in a coherent fault detection and loss prevention framework based on a well-defined formalization of complex processes and the diagnostic procedures. M. Ali and R. Dapoigny (Eds.): IEA/AIE 2006, LNAI 4031, pp. 70–80, 2006. c Springer-Verlag Berlin Heidelberg 2006
Multiagent Realization of Prediction-Based Diagnosis and Loss Prevention
2 2.1
71
Main Processes and Techniques in Fault Detection and Diagnosis Fault Detection, Diagnosis and Loss Prevention
Early detection and diagnosis of process faults while the plant is still operating in a controllable region can help avoid abnormal events and reduce productivity loss. Therefore diagnosis methods and diagnostic systems have practical significance and strong traditions in the engineering literature. The diagnosis of process systems is usually based on symptoms. Symptoms are deviations from a well-defined ”normal behaviour”, such as Tlow = (T < Tmin ) which is defined by using a measurable temperature variable T . In the case of a dynamic system the measurable quantities are time-varied, so the symptoms related to these variables will also change with time. In model-based fault detection and diagnosis one usually assigns a so-called root cause to every faulty mode of the system, the variation of which acts as a cause of the fault. In the case of a fault it is usually possible to take actions in the initial phase of the transient to avoid serious consequences or to try to drive the system back to its original ”normal” state. Dedicated input signal(s) serve this purpose for each separate fault (identified by its root cause) where the preventive action is a prescribed scenario for the manipulated input signal. 2.2
HAZOP and FMEA Analysis
The information available for the fault detection and diagnosis task is typically derived from a variety of sources which have varying characteristics. These sources include conceptual design studies and risk analyses as well as detailed dynamic models for parts of the system or for specific operating modes [4]. Heuristic operational experience is often elicited from operators and other plant personnel. The heuristic information can be collected with systematic identification and the analysis of process hazards, as well as the assessment and mitigation of potential damages using so-called Process Hazard Analysis (PHA). There are several methods used in PHA studies such as Failure Modes and Effects Analysis (FMEA), Hazard and Operability Analysis (HAZOP), Fault Tree Analysis (FTA) and Event Tree Analysis (ETA). The Hazard and Operability study is the most widely used methodology for hazard identification. HAZOP [5] is a systematic procedure for determining the causes of process deviations from normal behaviour and the consequences of those deviations. This works on the fundamental principle that hazards and operational problems can arise due to deviations from normal behaviour. It addresses both the process equipment, operating procedures and control systems (in this case, known as CHAZOP). Failure mode and effect analysis (FMEA) [6] is a qualitative analysis method for hazard identification, universally applicable in a wide variety of industries. FMEA is a tabulation of each system component, noting the various modes by
72
R. Lakner et al.
which the equipment can fail, and the corresponding consequences (effects) of the failures. FMEA focuses on individual components and their failure modes. HAZOP and FMEA provide a comprehensive analysis of the key elements that help constitute an effective diagnostic system. The incorporation of failure modes can greatly enhance the tool’s capabilities. 2.3
Prediction-Based Diagnosis
Prediction of a system’s behaviour is used for deriving the consequences of a state of the system in time that is usually performed in process engineering by dynamic simulation. With the help of prediction, however, the faulty mode of the system can also be detected based on the comparison between the real plant data and the predicted values generated by a suitable dynamic model. This type of fault detection and diagnosis is called prediction-based diagnosis [7]. Because process systems are highly nonlinear and their models can be drastically altered depending on the actual fault mode, simple reduced models are needed for prediction-based diagnosis.
3
Knowledge Representation of the Diagnostic System
The proposed framework for a multiagent diagnostic system consists of an ontology design tool and a multiagent software system. The domain specific knowledge is represented as modular ontologies using the ontology design tool Prot´eg´e [8]. This knowledge is integrated into a multiagent software system where different types of agents cooperate with each other in order to diagnose a fault. 3.1
Process-Specific Ontology
The process-specific ontology describes the concepts, their semantical relationships and constraints related to the processes in question, similar to the general ontology for process systems given by OntoCAPE [9]. The process-specific ontology has two different parts, namely the common knowledge of the general behaviour of the process systems and the application-specific knowledge. This description defines the structure of a general process model for the process in question and enables the construction of a concrete process model realization which can be used as a dynamic simulation both in real-time simulation and in prediction-based diagnosis. 3.2
Diagnostic Ontology
The knowledge from human expertise and operation about the behaviour of the system in the case of malfunction, together with the reasons, consequences and possible corrections is described here. The diagnostic ontology contains the semantic knowledge on diagnostic notions (e.g. symptoms, root causes), different kind of tools such as FMEA and HAZOP tables and procedures such as reasoning based on FMEA or HAZOP knowledge.
Multiagent Realization of Prediction-Based Diagnosis and Loss Prevention
3.3
73
Real-Time Database
Both the process-specific ontology and the diagnostic ontology contain timevarying elements such as process variables, actuator variables and their related variables. The values of these variables can be supplied by either a real process or a simulator and can be stored in the real-time database.
4 4.1
The Multiagent Diagnostic System The Main Elements of the Multiagent Diagnostic System
Similar to the ontology classification described above, the agents of the diagnostic system belong to three main categories such as process-related, diagnosticrelated and real-time service related agents. The process agents. Process agents assist the user and the other agents in modelling and simulation of the process in question. This can be performed under different, faulty and non-faulty circumstances. Some types of process agents and their main tasks are as follows: – Process output predictors (PPs) provide prediction by using dynamic simulation with or without preventive action(s). – The prediction accuracy coordinator (PAC) checks the accuracy of the prediction result and calls additional agents to refine the result if necessary. – Model parameter estimators can be associated with each of the PPs. The PAC may call this agent by requesting a refinement of the model parameters when the accuracy of the agent is unsatisfactory. The diagnostic agents. Diagnostic agents perform measurements, symptom detection, fault detection [7], fault isolation and advice generation for avoiding unwanted consequences. These agents may perform logical reasoning and/or numerical computations. Some types of diagnostic agents and their main tasks are as follows: – The symptom generator and status evaluator is based on non-permissible deviations that checks whether a symptom is present or not. – The state and diagnostic parameter estimators (SPEs) are advanced symptom generators that use several related signals and a dynamic state space model of a part of the process system to generate a symptom. – Fault detectors (FDs) use the services provided by SPEs or PPs to detect the fault(s) by using advanced signal processing methods. – Fault isolators (FIs) work in the case of the occurrence of a symptom to isolate the fault based on different techniques (fault-tree, HAZOP, FMEA, fault-sensitive observers etc.). – Loss preventors (LPs) suggest preventive action(s) based on different techniques that have been used for the HAZID and remedial actions (HAZOP, prediction, etc.).
74
R. Lakner et al.
– The completeness coordinator checks completeness of the result (detection, isolation or loss prevention) and calls additional agents if necessary. – The contradiction or conflict resolver (CRES) calls additional agents in case of contradiction to resolve it. The real-time agents. Beside the two main categories, the diagnostic system contains the following real-time agents for controlling and monitoring the process environment: – Monitoring agents access and/or provide data from real world or from simulation. – Pre-processor agents detect the non-permissible deviations which can be the possible symptoms. – Control agents control the process in case of preventive actions. – Corroborating agents act on requests from diagnostic agents and provide additional measured values or information. 4.2
The Structure of the Multiagent Diagnostic System
Several agent construction and simulation tools have been proposed in the literature by a number of researchers and commercial organizations. A non-exhaustive list includes: ABLE [10], AgentBuilder [11], FIPA-OS [12], JADE [13] and ZEUS [14]. The JADE (Java Agent DEvelopment Framework) has been chosen as the
Real process or real-time simulator
Monitoring Agent
Real-time database
Corroborating Agent
PreProcessor Agent
Control Agent
Real-time agents Based on Real-time database ontology ACL messages Agent Directory Management Facilitator System RMI server (for communication)
Remote Monitoring Agent (GUI)
Diagnostic agents Based on Diagnostic ontology (HAZOP, FMEA)
Model pa rameter es timator
Pre diction a ccuracy coo rdinator
Proce ss output p redictor
Contradiction or conflict resolver
ACL messages Completeness coordinator
State and diagnostic param eter estim ator
Loss preventor
Fault isolato r
Fault d etector
Symptom generator
ACL messages
Process agents Based on Process-specific ontology
Fig. 1. The structure of the multiagent diagnostic system
Multiagent Realization of Prediction-Based Diagnosis and Loss Prevention
75
multiagent implementation tool, that has integration facilities with the Prot´eg´e ontology editor and the Java Expert System Shell (JESS) [15]. The JADE agent platform can be split into several containers which are separate JAVA virtual machines and contain agents implemented as JAVA threads. The communication among the agents is performed through message passing represented in FIPA Agent Communication Language (FIPA ACL). JADE does not support inferencing techniques but it can be integrated with some reasoning systems, such JESS and Prolog. JESS is a rule engine and scripting environment written in the JAVA language. It possesses both a very efficient forward chaining mechanism using the Rete algorithm as well as a backward chaining mechanism. The dynamic models for the simulations are implemented in MATLAB. MATLAB serves to generate real-time data of the simulated process system and it contains the simplified models for prediction. The communication between MATLAB and JADE is solved by the TCP/IP protocol. For storing the huge amount of archive data a MySQL database is used. The connection between JADE and MySQL databases is realized by MySQL Connector/J. The main elements and the software structure of the proposed multiagent diagnostic system implemented in JADE can be seen in Figure 1.
HP-101 Feed Hopper CC-101 Screw Feeder CC-103 Recycle conveyor M1 Screw Feeder Motor M8 Bucket Elevator Motor
T-101 Binder tank P-101 Binder pump PRV-101 Pressure relief
Binder (S3)
T-101
VC-101 Video Camera HE-101 Dryer Bar Heater RD-102 Rotary Dryer VS-101 Vibratory Sieve M3 Rotary Drier Motor M5 Vibratory Sieve Motor V1 Vibrator for shute
RD-101 Granulator CR-101 Roll Crusher CC-102 Belt Weigher/Conveyor M2 Rotary Drum Motor M6 Crusher Motor M7 Belt Weigher Motor V2 Vibrator for recycle shute
HE-102 Air Heater FN-101 Fan M4 Fan Motor
PRV-101 F FT 103
FE 103
FT 102
FE 102
UNAC/PLC Control System
F
P-101
Fresh Feed (S1) S1
FC 103
F FT 101
FC 102
FE 101
HP-101
from ST101 W101
FC 101
S3b
S3a
V 101
S3c
S1
TT 102
HE-101
RD-101
TE 101
V2
S4 M2 S-10
SC 102
M8 SC 107
CC-103
JC 101
M4
3 Phase
RD-102
air From PLC/UNAC
air
V1
SC 101
From PLC/UNAC
S2 M1
SC 104
VC-101
CC-101
Air (S5)
S5 HE-102
S6
M3
FN-101
Trip SC 103
S 105
S7
HT 101
manual
Oversize
S10
FT 104
VS-101
Product (S11)
S11
CR-101
M6
S8
M5
S9 JC 102
Undersize CC-102 Control Legend S Speed Controller T Temperature Controller W Weight Controller ST Tachometer F Flow controller VC Video Camera V Vision system J Power Controller H Humidity Measurement CS Control System P Pressure
ST 101 To PLC/UNAC
M7 S 106
Granulation Pilot Plant From PLC/UNAC
Department of Chemical Engineering The University of Queensland W 101
Date: 17 / 02 / 04
SIZE
Drawn:
SCALE
C. Atkinson / I.Cameron
FSCM NO
A4
Fig. 2. Granulation pilot plant schematic
DWG NO
REV
GP-1001 NTS
SHEET
3 1 OF 1
76
R. Lakner et al.
5
Case Study
The proposed methods and the prototype diagnostic system are demonstrated on a commercial fertilizer granulation system [16]. The simplified flowsheet of the plant with the variables used by the diagnostic system is shown in Fig. 2. The aim of the case study was to investigate the cooperation of the diagnostic agents, therefore we have selected a case when the diagnostic result can only be obtained by a combination of different fault detection and isolation methods. 5.1
Knowledge Elements of the Granulation Diagnostic System
There are two different types of knowledge elements in the granulator diagnostic system: the dynamic process models, which contain traditional engineering knowledge of the process plant in the form of a set of differential-algebraic equations, and the systematically collected heuristic knowledge that originates from a HAZOP or FMEA analysis. The results of the HAZOP analysis are collected in a HAZOP result table, the structure of which is shown in Fig. 3. It defines logical (static) cause-consequence relationships between symptoms and potential causes that can be traced to root causes of the deviation. The table in Fig. 3 illustrates two related symptoms with at least two different causes each. A possible cause is regarded as a root cause if it refers to a failure mode of a physical component in the system, for example cause (2) in the second row of the HAZOP table. When such a root cause is found, we can complement or refine the diagnosis result by using the corresponding item from the FMEA table, also shown in Fig. 3.
Fig. 3. HAZOP and FMEA result tables
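As an illustration only (the identifiers below are our assumptions, not the authors' data model), one HAZOP row and the FMEA refinement step could be encoded roughly as follows:

```java
import java.util.List;
import java.util.Map;

// Hypothetical encoding of one HAZOP result-table row and the FMEA refinement step.
public class HazopEntry {
    final String variable;            // e.g. "mean particle diameter (d50)"
    final String guideword;           // e.g. "LESS"
    final List<String> possibleCauses;
    final List<String> consequences;
    final List<String> actionsRequired;

    HazopEntry(String variable, String guideword,
               List<String> causes, List<String> consequences, List<String> actions) {
        this.variable = variable;
        this.guideword = guideword;
        this.possibleCauses = causes;
        this.consequences = consequences;
        this.actionsRequired = actions;
    }

    // A cause is treated as a root cause if an FMEA entry exists for the named component failure,
    // in which case the FMEA row can be used to refine the diagnosis.
    static boolean isRootCause(String cause, Map<String, String> fmeaFailureModes) {
        return fmeaFailureModes.containsKey(cause);
    }
}
```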
5.2 Simulation Results
In order to illustrate the operation of the proposed agent-based diagnostic system, only a part of it is demonstrated, namely the set of diagnostic agents based on logical reasoning. The structure of the agent system can be seen on the left-hand side of Figure 4. Apart from the built-in agents of the main container, the agent platform contains three containers: the first for the real-time agents (MonitoringAgent and PreProcessorAgent), the second for the diagnostic agents (SymptomGeneratorAgent, FaultIsolatorAgents based on both HAZOP and FMEA analysis, CompletenessCoordinatorAgent and LossPreventorAgent), and the third for a process agent (ProcessOutputPredictor). The main behaviour of the diagnostic agents is logical reasoning over the heuristic knowledge (HAZOP, FMEA) with the help of the JESS rule engine. The communication and the operation of the diagnostic agent sub-system can be seen on the right-hand side of Figure 4. Based on the variable values supplied by the MonitoringAgent, the PreProcessorAgent determines the deviations in the system. In the case of a deviation, the SymptomGeneratorAgent checks the presence of symptoms and informs the CompletenessCoordinatorAgent, which calls the FaultIsolator- and LossPreventorAgent to determine the possible faults and suggest preventive actions. In case
Fig. 4. The structure and the communication of the agent system
Fig. 5. The HAZOPFaultIsolatorAgent’s conclusion
of multiple faults the FMEAFaultIsolatorAgent refines the result. Based on the suggestions of these agents, the CompletenessCoordinatorAgent orders the operation of the ProcessOutputPredictor to predict the behaviour of the system under the preventive action. The diagnostic process performed by the above agents is illustrated by the example of a symptom in which the mean particle diameter (d50) is less than a limit value. This situation corresponds to the rows of the HAZOP table shown in Fig. 3. A part of these diagnostic agents' conclusions can be seen in Fig. 5, where the messages about the operation of the HAZOPFaultIsolatorAgent are listed. The diagnosis and loss prevention results listed above have been refined by the diagnostic results based on the FMEA analysis, initiated by the CompletenessCoordinatorAgent, and the unique root cause "Slurry flow control valve fails Closed" has been deduced.
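For illustration, the HAZOP row used in this example could be turned into a JESS rule roughly as sketched below; the templates, rule name and fact layout are our assumptions rather than the rules actually used by the HAZOPFaultIsolatorAgent.

```java
import jess.JessException;
import jess.Rete;

public class HazopRuleSketch {
    public static void main(String[] args) throws JessException {
        Rete engine = new Rete();
        engine.executeCommand("(deftemplate symptom (slot variable) (slot deviation))");
        engine.executeCommand("(deftemplate possible-cause (slot description))");
        // One HAZOP row: mean particle diameter (d50) LESS -> decrease/loss of slurry flow.
        engine.executeCommand(
            "(defrule d50-less" +
            "  (symptom (variable d50) (deviation LESS))" +
            "  => (assert (possible-cause (description decrease-or-loss-of-slurry-flow))))");
        engine.executeCommand("(assert (symptom (variable d50) (deviation LESS)))");
        engine.run();   // fires d50-less; an FMEA lookup could then refine the root cause
    }
}
```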
6
Conclusion and Discussion
A novel coherent fault detection and loss prevention framework for process systems is proposed in this paper, implemented in a Protégé-JADE-JESS environment, which has clearly shown the advantages of such a technology in building complex diagnostic systems based on heterogeneous knowledge sources. The process
ontology and the process agents based thereon have been designed following the structure of process systems, which was first explored in [17], where a Coloured Petri Net-based diagnosis system is described. The diagnostic procedures based on model-based reasoning have been developed for a G2-based intelligent diagnostic system in [18], where the need for combining different fault isolation methods to refine the diagnosis arose.
Acknowledgements This research has been supported by the Hungarian Research Fund through grants T042710 and T047198, which is gratefully acknowledged, as well as the Australian Research Council International Linkage Award LX0348222.
References
1. Blanke, M., Kinnaert, M., Lunze, J., Staroswiecki, M., Schröder, J. (Eds.): Diagnosis and Fault-Tolerant Control. Springer-Verlag (2003)
2. Jennings, N. R., Wooldridge, M. J.: Agent Technology. Springer-Verlag, Berlin (1998)
3. Wörn, H., et al.: DIAMOND: Distributed Multi-agent Architecture for Monitoring and Diagnosis. Production Planning and Control 15 (2004) 189–200
4. Cameron, I.T., Raman, R.: Process Systems Risk Management. Elsevier (2005)
5. Knowlton, R. E.: Hazard and Operability Studies: The Guide Word Approach. Chematics International Company, Vancouver (1989)
6. Jordan, W.: Failure modes, effects and criticality analyses. In: Proceedings of the Annual Reliability and Maintainability Symposium, IEEE Press (1972) 30–37
7. Venkatasubramanian, V., Rengaswamy, R., Kavuri, S. N.: A review of process fault detection and diagnosis Part II: Qualitative models and search strategies. Computers and Chemical Engineering 27 (2003) 313–326
8. The Protégé Ontology Editor and Knowledge Acquisition System (2004) http://protege.stanford.edu
9. Yang, A., Marquardt, W., Stalker, I., Fraga, E., Serra, M., Pinol, D.: Principles and informal specification of OntoCAPE. Technical report, COGents project, WP2 (2003)
10. Agent Building and Learning Environment (ABLE). http://www.research.ibm.com/able
11. Reticular Systems: AgentBuilder - An Integrated Toolkit for Constructing Intelligent Software Agents (1999) http://www.agentbuilder.com
12. FIPA-OS. http://www.nortelnetworks.com/products/announcements/fipa/index.html
13. JADE - Java Agent DEvelopment Framework. http://jade.tilab.com
14. Nwana, H. S., Ndumu, D. T., Lee, L. C.: ZEUS: An Advanced Tool-Kit for Engineering Distributed Multi-Agent Systems. In: Proc. of PAAM98 (1998) 377–391
15. JESS, the Rule Engine for the Java Platform. http://herzberg.ca.sandia.gov/jess/
16. Balliu, N.: An object-oriented approach to the modelling and dynamics of granulation circuits. PhD Thesis, School of Engineering, The University of Queensland, Australia 4072 (2004)
17. Németh, E., Cameron, I. T., Hangos, K. M.: Diagnostic goal driven modelling and simulation of multiscale process systems. Computers and Chemical Engineering 29 (2005) 783–796
18. Németh, E., Lakner, R., Hangos, K. M., Cameron, I. T.: Prediction-based diagnosis and loss prevention using model-based reasoning. In: Lecture Notes in Artificial Intelligence, Vol. 3533, Springer-Verlag (2005) 367–369
Emergence of Cooperation Through Mutual Preference Revision
Pedro Santana¹ and Luís Moniz Pereira²
¹ IntRoSys S.A.
² Universidade Nova de Lisboa, Quinta da Torre, Campus FCT-UNL, 2829-516, Portugal
Abstract. This paper proposes a method allowing an agent to perform in a socially fair way by considering other agents’ preferences. Such a balanced action selection process is based on declarative diagnosis, which enables the removal of contradictions arising as all agents’ preferences are confronted within the deciding agent. Agents can be negatively affected when some of their preferences are not respected for the sake of a global compromise. The set of preferences to be yielded by agents in order to remove all contradictions in a balanced way (i.e. the diagnosis that better manages how each agent is to be affected) is determined by minimising a cost function computed over application independent features. By respecting the resulting non-contradictory preferences set, the deciding agent acts cooperatively. Keywords: Preference revision, multi-agent systems, cooperative behaviour.
1 Introduction
Preference criteria may be modified when new information is brought to the knowledge of the individual, or aggregated when we need to represent and reason about the simultaneous preferences of several individuals. As a motivating example, suppose you invite three friends, Karin, Leif and Osvald, to go and see a movie. Karin prefers thrillers to action movies. Leif, on the other hand, prefers action movies to thrillers. Finally, Osvald is like Leif and prefers action movies to thrillers. Suppose you need to buy the tickets. Which movie do you choose? One way to consider preferences in multi-agent scenarios is to devise a strategy for the removal of contradictions, which may arise when the preferences of all agents are put together. For instance, if agent A prefers a to b and agent B the other way around, then we have a contradiction (a synonym of conflict in this context). Removing such a contradiction is another way of saying that at least one of the agents will have to relax its own preferences so that a trade-off is achieved. This paper proposes a methodology along this line of reasoning, which can be used as an agent mental process allowing the agent to act in a fair manner. For instance, an agent engaged in a task that requires making some choices, like selecting a TV programme, may choose to act fairly by considering others' preferences as well. Another possible application for the method set forth herein is to help a broker (i.e. an agent
responsible for distributing work) to allocate agents to those tasks that better suit them; in this case preferences can involve skills as well. Other work on preference revision can be found in [1, 2], where the authors study the preservation of properties by different composition operators of preference relations definable by first-order formulas. In terms of preference aggregation, [3] handles different types of relationships that take into account the relative importance of the interacting agents, whereas [4] extends CP-nets to handle the preferences of several agents based on the notion of voting. In [5], a stimulating survey of opportunities and problems in the use of preferences, reliant on AI techniques, is presented.
2 Approach Overview
This paper builds on some of the concepts presented in [6], which proposes an adapted version of the contradiction removal method defined for the class of normal logic programs plus integrity constraints, as proposed in [7], and transforms a given program into its revisable form. Each of the stable models of the revisable program then specifies which preferences minimally need to be yielded or added for that program model to be consistent. Finally, from all preference revision minimal stable models, the one that is fairest (i.e. seeking highest cooperation while minimising losses) is selected and proffered to the end-user. The proposed method involves the following steps: (1) set the preferences for each agent; (2) integrate all agents' preferences into a single merged program; (3) extend the merged program into a revisable program form; (4) determine the minimal stable models of the revisable program so as to generate all possible revision hypotheses; (5) from the set of minimal stable models select the fairest one, taking into account present and past iterations; and (6) repeat the whole process as desired.
3 Background Concepts
3.1 Stable Models
Let L be a first-order language. A literal in L is an atom A in L or its default negation not A. A Normal Logic Program (NLP) P over L (sometimes simply called a program) is a finite set of rules of the form H ← B1, B2, ..., Bn, not C1, not C2, ..., not Cm, with n ≥ 0 and m ≥ 0, comprising positive literals H and Bi, and default literals not Cj. LP denotes the language of P. Models are 2-valued and represented as sets of those positive literals which hold in the model. Set inclusion and set difference are with respect to these positive literals. Minimality and maximality also refer to this set inclusion.
Definition 1. [8] Let P be an NLP and I a 2-valued interpretation. The GL-transformation of P modulo I is the program P/I, obtained from P by (1) removing from P all rules which contain a default literal not A such that A ∈ I and (2) removing from the remaining rules all default literals. Since P/I is a definite program, it has a unique least model J. Define ΓP(I) = J.
Stable models are the fixpoints of ΓP, and they do not always exist (namely when a finite program contains loops over an odd number of default negations).
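As a small worked example of Definition 1 (our illustration, not from the paper), consider a two-rule program:

```latex
% Program:  P = { p <- not q ,  q <- not p }.   Take I = {p}.
% The rule q <- not p contains "not p" with p \in I, so it is removed;
% the remaining default literal "not q" is deleted, leaving the definite reduct:
\[
P/I \;=\; \{\, p \leftarrow \,\}, \qquad \Gamma_P(I) \;=\; \{p\} \;=\; I ,
\]
% hence {p} is a stable model of P; by symmetry, {q} is the other one.
```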
3.2 Preference Relation
Given a set N, a preference relation ≻ is any binary relation on N. Given two elements a and b in N, a ≻ b means that a is preferred to b. We assume that N contains at least two elements. We restrict ≻ to satisfy the properties of a strict partial order, namely:
Irreflexivity: ∀x, ¬(x ≻ x)
Asymmetry: ∀x∀y, x ≻ y ⇒ ¬(y ≻ x)
Transitivity: ∀x∀y∀z, (x ≻ y ∧ y ≻ z) ⇒ x ≻ z
Let us call the above properties integrity constraints, which are employed to detect preference contradictions (others could be added); they can be described in logic programming as follows:
⊥ ← p(x, x).
⊥ ← p(x, y), p(y, x).
⊥ ← p(x, y), p(y, z), not p(x, z).
where x, y, and z are variables, ⊥ represents a contradiction when present in a model of a program, and p(a, b) represents a preference of type a ≻ b. An agent i can define its preferences by adding facts of type pi(x, y). As previously stated, the preferences of all agents have to be merged into a single merged program. As a consequence, we have to add the following clause to the integrity constraints set: p(x, y) ← pi(x, y). Thus, there will exist a p predicate rule for each corresponding pi predicate. If one of the integrity constraint rules succeeds, then there is a contradiction in the merged program.
3.3 Declarative Diagnosis
In this section we adopt the definitions set forth in [6]. Given a contradictory program P, i.e. one with a contradictory stable model, to revise (i.e. eliminate) the contradiction symbol (⊥) from its models we need to modify P by adding and removing rules. In this framework, the diagnostic process then reduces to finding such combinations of rules. To specify which rules in P may be added or removed, we assume given a set C of predicate symbols in LP. C induces a partition of P into two disjoint parts: a changeable one Pc and a stable one Ps. Pc contains the rules in P defining predicate symbols in C, while Ps contains the rules in P defining predicate symbols not belonging to C. Pc is the part subject to the diagnosis process.
Definition 2. Let P be a program and C a set of predicate symbols in LP. Let D be a pair ⟨U, I⟩ where U is a set of atoms whose predicate symbols are in C and I ⊆ Pc. Then, D is a diagnosis for P iff (P − I) ∪ U ⊬ ⊥. The pair ⟨{}, {}⟩ is called the empty diagnosis.
Intuitively, a diagnosis specifies the rules to be added to and removed from the changeable part of P to revise its contradiction ⊥. In order to minimise the number of changes, one should consider minimal diagnoses.
Definition 3. Let P be a program and D = ⟨U, I⟩ a diagnosis for P. Then, D is a minimal diagnosis for P iff there exists no diagnosis D2 = ⟨U2, I2⟩ for P such that (U2 ∪ I2) ⊂ (U ∪ I).
Let us now clarify what Pc and Ps are. Ps (the stable partition of program P) refers to the integrity constraints and the preferences that agents consider non-negotiable (i.e. those preferences agents do not accept to discard at the cost of rejecting further cooperation). In this work we only consider negotiable preferences. On the other hand, Pc refers to the agents' preferences that will be subject to revision (i.e. that can be yielded). As an example, let us assume that there are two agents, one with a single preference and another one with two preferences; these define Pc, while Ps contains the integrity constraints and the merging rules:
Pc = { p1(a, b); p2(b, a); p2(c, b) }
Ps = { ⊥ ← p(x, x); ⊥ ← p(x, y), p(y, x); ⊥ ← p(x, y), p(y, z), not p(x, z); p(x, y) ← p1(x, y); p(x, y) ← p2(x, y) }
A diagnosis indicates which of these rules (i.e. preferences) can be removed, as well as which rules have to be added, so as to guarantee consistency. In this case, one possibility would be to remove p1(a, b), because it contradicts p2(b, a) (asymmetry), whereas a rule p(c, a) would have to be added to ensure transitivity. Another possibility, which is simpler and so preferable, would be to remove p2(b, a), and no further change would be required. Now it is necessary to extend P so it has a suitable form for contradiction removal. To do that, we have to consider the revisable set, which includes default atoms not A whose CWA (Closed World Assumption) makes them initially true; they can be revised by adding A to P. By so doing, one can disable rules having such defaults in their body and, as a consequence, remove some contradictions. Also, the positive atom A, being initially false, can be made true by adding it as a fact. This will enable rules with A in their body.
Definition 4. The revisables of a program P are a subset of the set of atoms A (with A ≠ ⊥) for which there are no rules defining A in P.
Definition 5. Let P be a program and V a set of revisables of P. A set Z ⊆ V is a revision of P with respect to V iff P ∪ Z ⊬ ⊥.
Definition 6. Let P be a program and C a set of predicate symbols in LP. The transformation Γ that maps P into a program P′ is obtained by applying to P the following two operations: (1) add not incorrect(A ← Body) to the body of each rule A ← Body in Pc, and (2) add, for each predicate p with arity n in C, the rule p(x1, ..., xn) ← uncovered(p(x1, ..., xn)).
In our example, the revisable program would be:
Γ(P) = { ⊥ ← p(x, x);
⊥ ← p(x, y), p(y, x);
⊥ ← p(x, y), p(y, z), not p(x, z);
p(x, y) ← p1(x, y);
p(x, y) ← p2(x, y);
p1(a, b) ← not incorrect(p1(a, b));
p2(b, a) ← not incorrect(p2(b, a));
p2(c, b) ← not incorrect(p2(c, b));
p1(x, y) ← uncovered(p1(x, y));
p2(x, y) ← uncovered(p2(x, y)) }
The transformation Γ preserves the truths of program P . Then, by adding instances of incorrect/1, one can eliminate unsound answers of predicate rules and, by adding instances of uncovered/1, one can complete predicates with missing answers.
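Continuing the example above, the revisions of Γ(P) can be read off directly; this is our own illustration of Definition 5:

```latex
% Yielding p2(b,a) alone already removes the contradiction:
\[
Z_1 = \{\, \mathit{incorrect}(p_2(b,a)) \,\}, \qquad \Gamma(P) \cup Z_1 \nvdash \bot .
\]
% The alternative yields p1(a,b) but must also add p(c,a) to satisfy transitivity:
\[
Z_2 = \{\, \mathit{incorrect}(p_1(a,b)),\; \mathit{uncovered}(p(c,a)) \,\} ,
\]
% so Z_1 changes less and corresponds to the simpler, preferable diagnosis.
```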
4 Proposed Approach
The proposed approach considers that, in opposition to Definition 6, the uncovered literal is applied to p/2 instead of pi/2. That is to say, the added preferences relate to the group (i.e. p/2) and not to an agent i in particular. With this alteration, the number of even loops is reduced, and as a consequence the complexity as well. In addition, the preference to be added is usually related to the group and not to an agent in particular. Let us imagine a scenario where one agent prefers a to b whereas another prefers b to c. So that transitivity holds in that scenario, it is necessary to state that a is preferred to c. However, this information does not refer to any agent in particular; it belongs to the group as a whole. If considered otherwise, the addition of a preference could not be exploited as a metric of unhappiness: how could one assume that an agent has been disadvantaged by the addition of a preference that might be true (i.e. the referred agent may in fact prefer a to c)? The stable models of Γ(P), minimal with respect to the revisables, that do not lead to a contradiction are potential candidates for the best diagnosis. Let us focus on how cooperation among agents is polarised by choosing which preferences have to be minimally yielded by each agent, in a fair way. From the set of such minimal stable models (i.e. the set of possible minimal diagnoses), we have to select the best diagnosis. Since the entire process describes a mental process to be reiterated each time the task and/or the environment require it, we need to take into account all previous losses an agent had in prior iterations (i.e. by recording every time some agent had to yield a preference). Therefore, the best diagnosis will be determined by following a criterion of fairness: a cost function that weighs a given diagnosis in a fair way, i.e. by taking into account previous decisions, is minimised over the whole set of available diagnoses.
4.1 Yielding a Preference
Let us analyse what yielding a preference formally means.
Definition 7. Let P be a program containing all preferences of all agents involved in a given iteration, D = ⟨U, I⟩ a diagnosis for P, PA ⊆ P a program with all preferences of agent A, and p(x, y) a preference of x over y. For each incorrect(p(x, y)) ∈ I, an agent A has 'yielded' a preference if pA(x, y) ∈ PA. For each uncovered(p(x, y)) ∈ U, an agent A has 'yielded to add' a preference.
Computing the number of times an agent has to yield or yield to add a preference in a given diagnosis gives us an idea of how displeased that agent will be with the solution trade-off. The presence of yielded-to-add preferences in a given solution may result in erroneous assumptions about others' preferences; as such, solutions including such assumptions are to be penalised. Still, yielding a preference and yielding to add
a preference do not have the same weight in the unhappiness of the agent. Therefore, we consider different weights for each type, wy and wya respectively. In the experimental part of this paper wy = 2 and wya = 1 are considered; that is to say, yielding a preference is twice as bad as yielding to add a preference, which intuitively makes sense. The exact proportion should be subject to further analysis, but that is out of the scope of this paper. Notice that the cost of yielding a preference only affects the agent in question, whereas the cost of yielding to add a preference affects all agents in the same amount. This means that yielding a preference increases the dispersion of unhappiness among agents, whereas yielding to add preferences increases the average unhappiness.
4.2 Diagnosis Assessment
The goal is to determine which diagnosis (i.e. which minimal stable model) is less expensive in terms of the "yieldings" agents have to endorse. In order to do so, the cost of a diagnosis D is assessed via a cost function. Two main components are considered, namely the average of yieldings, a(D), and the dispersion of yieldings, d(D). The average of yieldings is defined by a(D) = (1/n) · Σa∈A ωyield(a, D), where D is the diagnosis in question, A is the set of agents involved in the process, n is the number of agents in A, and ωyield(a, D) returns the total cost of agent a in diagnosis D. The total cost of an agent a in a diagnosis D is ωyield(a, D) = ωy · ny(a, D) + ωya · nya(a, D), where ωy and ωya are the weights of yielding and yielding to add a preference, respectively, and ny(a, D) and nya(a, D) are the numbers of yielded and yielded-to-add preferences, respectively, in diagnosis D (see Definition 7).
The dispersion of a diagnosis D, d(D) = √((1/n) · Σa∈A (ωyield(a, D) − a(D))²), refers to the concept of standard deviation from statistics. It is worth emphasising the quadratic term in the standard deviation definition, which increases the cost of a diagnosis quadratically in relation to its displacement from the average. Accordingly, as the unfairness of solutions increases, their related cost increases at a faster pace. The standard deviation concept is used to measure how unfair a diagnosis is for a given agent; in other words, the more disparate the sacrifice of a given agent when compared to the average sacrifice, the more unfair the diagnosis is. The purpose of the cost function is to reduce the sacrifice of all agents as well as to avoid that some agents be much more sacrificed than the average. The next formula describes the minimisation of the cost function in order to obtain the best diagnosis of the current iteration, bd[n]:
bd[n] = min_D { (wwin / wall) · (βd · d(D) + βa · a(D)) }    (1)
where wwin is the number of accumulated victories of the agent least sacrificed (i.e. with the smallest ωyield) in D, wall is the number of accumulated victories of the agent with the greatest amount of accumulated victories, and βa and βd are weights. An agent accumulates a victory each time it is the one with the smallest ωyield in the best diagnosis of a given iteration. It is assumed that the preferences of agents may change between iterations. The wwin/wall component endows the system with memory, which allows previous iterations to be taken into account. This way, the selection of the best diagnosis takes into
consideration that some agents may have been more sacrificed in the past than others. Intuitively, the possibility of accepting the diagnosis in question as the best diagnosis increases if the winning agent has fewer victories than the one with more victories, proportionally. As a result, all agents gradually converge to a homogeneous number of victories.
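To make the cost computation concrete, the following sketch (names and inputs are our assumptions, not the authors' code) computes a(D), d(D) and the memory-weighted expression minimised in (1):

```java
// Sketch of the diagnosis cost from Sect. 4.2; per-agent yield counts are assumed as inputs.
public class DiagnosisCost {
    // yields[i], yieldsToAdd[i]: how many preferences agent i yields / yields-to-add in diagnosis D.
    static double cost(int[] yields, int[] yieldsToAdd,
                       double wy, double wya,        // weights of yielding / yielding to add
                       double betaA, double betaD,   // weights of average and dispersion
                       double wWin, double wAll) {   // accumulated victories (memory term)
        int n = yields.length;
        double[] omega = new double[n];
        double avg = 0.0;
        for (int i = 0; i < n; i++) {
            omega[i] = wy * yields[i] + wya * yieldsToAdd[i];   // omega_yield(a, D)
            avg += omega[i];
        }
        avg /= n;                                               // a(D)
        double var = 0.0;
        for (int i = 0; i < n; i++) var += (omega[i] - avg) * (omega[i] - avg);
        double disp = Math.sqrt(var / n);                       // d(D), a standard deviation
        return (wWin / wAll) * (betaD * disp + betaA * avg);    // expression minimised in (1)
    }
}
```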
5 Experimental Results
The proposed method has been implemented in XSB-Prolog¹ using the XSB-XASP package², which implements the Answer Set semantics (cf. [9] for other application examples) and allows computing the stable models of a given program. In this section we briefly go over some implementation details. To do so, some XSB-Prolog code (simplified for presentation purposes) is exhibited. With the purpose of creating several scenarios that can be covered by different stable models, it is necessary to create even loops over default negation (where tnot is XSB-Prolog's tabled not), such as:
covered(X) :- tnot(uncovered(X)).
uncovered(X) :- tnot(covered(X)).
incorrect(X) :- tnot(correct(X)).
correct(X) :- tnot(incorrect(X)).
where each loop allows for its two opposite stable model solutions. The integrity constraints have been defined as follows (a two-agent scenario has been considered):
false(X,_,_) :- p(X,X).
false(X,Y,_) :- p(X,Y), p(Y,X).
false(X,Y,Z) :- p(X,Y), p(Y,Z), X \= Z, tnot( p(X,Z) ).
Agents' preferences can be defined in this manner (note that incorrect/1 literals have already been added):
p(son,X,soaps) :- tnot( incorrect(p(son,X,soaps)) ).
p(son,cinema,X) :- tnot( incorrect(p(son,cinema,X)) ).
which states that the son prefers to see anything on TV to soaps, and prefers cinema to anything else. This code allows creating scenarios where new preferences are added (i.e. where uncovered/1 literals are inserted):
p(X,Y) :- p(agent_1,X,Y).
p(X,Y) :- p(agent_2,X,Y).
p(X,Y) :- not p(agent_1,X,Y), not p(agent_2,X,Y), uncovered(p(X,Y)).
The following code generates several hypotheses (i.e. stable models), which are then considered for the selection of the best diagnosis, by performing a call to the XSB-XASP package so that it computes in Res the stable models with ok as the top goal; ok succeeds only if a given scenario does not fail to cope with each integrity constraint:
¹ http://xsb.sourceforge.net
² http://xsb.sourceforge.net/packages/xasp.pdf
ok :- tnot( notok ).
notok :- option(X), option(Y), option(Z), notok(X,Y,Z).
notok(X,Y,Z) :- false(X,Y,Z).
pstable_model(ok, Res, 1)   % Top goal call
An everyday multi-agent conflict resolution scenario is the one we can find in our houses when all family members have to cooperate (at least it is so expected) to decide on which TV programme they will watch. A simple version of such a scenario has been implemented with these characteristics: Involved Agents: Father, mother, and son; Available TV Programmes: Soaps, news, cinema, and documentaries; Weights: β1 = 1 and β2 = 3. β2 should be greater than β1 in order to guarantee that the solutions converge to a small amount of yielded preferences. As long as this proportion between the two parameters is kept, the final results are not very sensitive to parameter variation. Son’s preferences: cinema x; documentaries news Mother’s preferences: soaps x; cinema news; documentaries news Father’s preferences: x soaps; news x; documentaries news Starting with the accumulated victories as (mother, f ather, son) = (1, 1, 1), the selected TV program (such that no other is preferred to in the best diagnosis) is cinema, the accumulated victories change to (mother, f ather, son) = (1, 1, 2), and the explanation is (“inc” represents “incorrect”): [inc(p(son,cinema,cinema)),inc(p(father,news,cinema)), inc(p(father,news,news)),inc(p(mother,soaps,cinema)), inc(p(mother,soaps,news)),inc(p(father,soaps,soaps)), inc(p(mother,soaps,soaps))]
The son is the one yielding less preferences and as a result he wins. Cinema is the only TV programme such that no other is preferred to in the best diagnosis and so it is selected. All preferences leading to contradiction are said to be incorrect. For instance, the son prefers cinema to anything, but he may not prefer cinema to cinema itself (irreflexivity). The father prefers news to anything else, but since the son prefers cinema to anything else (including news), the father has to yield the preference of news over cinema in particular (asymmetry). Other solutions exist, such as adjusting son’s preferences so the father could see what he most prefers. However, those solutions are not so balanced (i.e. fair) than the selected one, in the current context. In a subsequent iteration the selected TV programme is soaps, the accumulated victories become (mother, f ather, son) = (2, 1, 2), and the explanation is: [inc(p(son,cinema,cinema)),inc(p(father,cinema,soaps)), inc(p(son,cinema,soaps)),inc(p(father,news,cinema)), inc(p(father,news,news)),inc(p(father,news,soaps)), inc(p(father,soaps,soaps)),inc(p(mother,soaps,soaps))]
Although seeing soaps increases the total number of preferences to be yielded (i.e. now the son has to yield two preferences), soaps is preferred because the mother was in
disadvantage in terms of victories. If the past had not been considered, at this iteration the result would be the same as in the previous iteration. In a third iteration, the selected TV programme is soaps, the accumulated victories become (mother, f ather, son) = (2, 2, 2) and the explanation is: [inc(p(son,cinema,cinema)),inc(p(son,cinema,news)), inc(p(mother,cinema,news)),inc(p(son,cinema,soaps)), inc(p(father,news,news)),inc(p(mother,soaps,cinema)), inc(p(mother,soaps,news)),inc(p(father,soaps,soaps)), inc(p(mother,soaps,soaps))]
A new best diagnosis is produced, which is caused by the disadvantage of the father. From this sequence of three examples, one can observe the evolution of the solution so as to maintain everybody happy. In a second experiment a new son’s preference, x soaps, which is going to augment the displeasure in watching soaps (the father also prefers anything to soaps), has been added. Resetting the accumulated victories to (mother, f ather, son) = (1, 1, 1) we obtain cinema as the selected TV programme, (mother, f ather, son) = (1, 1, 2) as the new accumulated victories, and the following explanation: [inc(p(son,cinema,cinema)),inc(p(father,news,cinema)), inc(p(father,news,news)),inc(p(mother,soaps,cinema)), inc(p(mother,soaps,soaps)),inc(p(father,soaps,soaps)), inc(p(son,soaps,soaps)),inc(p(mother,soaps,soaps))]
In line with the victory of the son, cinema is selected. Running a new the selected TV program is cinema, (mother, f ather, son) = (2, 1, 2) is the new set of accumulated victories, and the explanation is as follows: [inc(p(son,cinema,cinema)),inc(p(father,news,cinema)), inc(p(father,news,news)),inc(p(father,news,soaps)), inc(p(son,news,soaps)),inc(p(mother,soaps,cinema)), inc(p(father,soaps,soaps)),inc(p(son,soaps,soaps)), inc(p(mother,soaps,soaps))]
This time the mother wins since she was at a disadvantage in terms of victories; nevertheless, cinema is seen rather than soaps. In a last iteration we get news as the selected programme, (mother, f ather, son) = (2, 2, 2) as the new accumulated victories, and the following explanation: [inc(p(son,cinema,cinema)),inc(p(son,cinema,news)), inc(p(mother,cinema,news)),inc(p(father,news,news)), inc(p(mother,soaps,cinema)),inc(p(mother,soaps,news)), inc(p(father,soaps,soaps)),inc(p(son,soaps,soaps)), inc(p(mother,soaps,soaps))]
The father wins because he was at a disadvantage in terms of victories and, as a consequence, news is the selected programme. This experience demonstrates that it is not enough to be disadvantaged in terms of victories to obtain what one desires. More concretely, the mother was at a disadvantage and, as a result, she won the second iteration; still, soaps was not the chosen programme category. This occurred because the son also made clear that he would prefer to see anything but soaps.
6 Concluding Remarks
A method for preference revision in a multi-agent scenario has been presented. Instead of considering explicit priorities among agents and/or preferences, the method proposes a dynamic approach. A cost function that considers generic features of the solution (e.g. the quantity of preferences yielded by agents) has been employed to obtain a general approach, avoiding overly application-dependent parameters. Introducing memory into the revision process enables the emergence of cooperation as iterations unfold. Cooperation shows up in the form of a homogeneous amount of victories amongst all agents. Assessing which preferences one should add or remove in the way prescribed in this paper allows us to enact a flexible method for preference revision. If instead priorities among preferences are to be considered as the sole method for preference revision, intensive and tedious parameter tuning is going to be required so as to guarantee that preferences are conveniently revised. In such a memory-less solution, the system is not able to evolve towards cooperation as iterations unfold, resulting in fully deterministic and static solutions. On the contrary, we consider our approach of special interest for dynamic environments. We have employed the two-valued Stable Models semantics to provide meaning to our logic programs, but we could just as well have employed the three-valued Well-Founded Semantics [10] for a more skeptical preferential reasoning. Also, we need not necessarily insist on a strict partial order for preferences, but have indicated that different constraints may be provided.
References
1. Chomicki, J.: Preference formulas in relational queries. ACM Transactions on Database Systems 28 (2003) 427–466
2. Andreka, H., Ryan, M., Schobbens, P.Y.: Operators and laws for combining preference relations. Journal of Logic and Computation 12 (2002) 13–53
3. Yager, R.R.: Fusion of multi-agent preference ordering. Fuzzy Sets and Systems 117 (2001) 1–12
4. Rossi, F., Venable, K.B., Walsh, T.: mCP nets: representing and reasoning with preferences of multiple agents. In: Procs. of the 19th Conf. on Artificial Intelligence, AAAI Press (2004) 729–734
5. Doyle, J.: Prospects for preferences. Computational Intelligence 20 (2004) 111–136
6. Dell'Acqua, P., Pereira, L.M.: Preference revision via declarative debugging. In: Progress in Artificial Intelligence, Procs. 12th Portuguese Int. Conf. on Artificial Intelligence (EPIA'05), Covilhã, Portugal, Springer, LNAI 3808 (2005)
7. Pereira, L.M., Damásio, C., Alferes, J.J.: Debugging by diagnosing assumptions. In Fritzson, P., ed.: Procs. of the 1st Int. Workshop on Automatic Algorithmic Debugging (AADEBUG'93), Springer-Verlag, LNCS 749 (1993) 58–74
8. Gelfond, M., Lifschitz, V.: The stable model semantics for logic programming. In: Procs. of the 5th Int. Logic Programming Conf., MIT Press (1988)
9. Baral, C.: Knowledge Representation, Reasoning and Declarative Problem Solving. Cambridge U.P. (2003)
10. Gelder, A.V., Ross, K.A., Schlipf, J.S.: The well-founded semantics for general logic programs. J. ACM 38 (1991) 620–650
Running Contracts with Defeasible Commitment
Ioan Alfred Letia and Adrian Groza
Technical University of Cluj-Napoca, Department of Computer Science
Baritiu 28, RO-400391 Cluj-Napoca, Romania
{letia, adrian}@cs-gw.utcluj.ro
Abstract. Real-life contracts imply commitments which are active during their running window, with effects both on normal runs and in the case of exceptions. We have defined defeasible commitment machines (DCMs) to provide more flexibility. As an extension of the task dependency model for the supply chain, we propose the commitment dependency network (CDN) to monitor contracts between members of the supply chain. The working of the DCMs in the CDN is shown by a simple scenario with a supplier, a producer, and a consumer. Keywords: Multi-agent systems, Autonomous agents, Internet applications.
1
Introduction
Although contracts are a central mechanism for defining interactions between organizations, there is currently inadequate business support for using the information provided by these contracts. The current requirements of supply chains [1] demand a more outward view on contract management for each entity within the chain. The emerging services science deals with such issues: 1) services contract specifications, cases, and models; 2) service level agreements; 3) automatic and semi-automatic services contract generation and management; 4) legal issues in services contracts and operations; 5) decision support systems for contract operations. Our work [2] belongs to the above trend, being based on the temporalised normative positions in defeasible logic [3] and using a special case of the nonmonotonic commitment machines [4]. In the supply chain context a contract breach can propagate over the entire chain, with rules imposed by law that help agents to manage perturbations in the supply chain. Each agent has more than one way to respond to a perturbation, following remedies that are adequate for an efficient functioning of the supply chain: expectation damages, opportunity cost, reliance damages, and party-designed damages [5]. The main contribution of this paper consists in showing how the defeasible commitment machines (DCMs) can be used within a commitment dependency network (CDN). In the next section we describe the temporalised normative positions that we use in section 3 to define DCMs and contracts. In section 4
we show for a simple scenario how contracts are executed, and section 5 discusses how exceptions can be captured in our framework.
2
Temporalised Normative Positions
We are using the temporalised normative positions of [3]. A normative defeasible theory (NDL) is a structure (F, RK, RI, RZ, RO, >) where F is a finite set of facts, RK, RI, RZ, RO are finite sets of persistent or transient rules (strict, defeasible, and defeaters) for knowledge, intentions, actions, and obligations, respectively, and > represents the superiority relation over the set of rules. A rule in NDL is characterized by three orthogonal attributes: strength, persistence, and modality. RK represents the agent's theory of the world, RZ encodes its actions, RO the normative system (obligations), while RI and the superiority relation capture the agent's strategy or policy. The conclusion of a persistent rule holds at all instants of time after the conclusion has been derived, unless a superior rule has derived the opposite conclusion, while a transient rule establishes the conclusion only for a specific instant of time [3]. Whenever the premises of strict rules are indisputable then so is the conclusion, while defeasible rules can be defeated by contrary evidence. Defeaters are rules that cannot be used to draw any conclusions; their only use is to prevent some conclusions, as in "if the customer is a regular one and has a short delay for paying, we might not ask for penalties". This rule cannot be used to support a "not penalty" conclusion, but it can prevent the derivation of the penalty conclusion. →tX, ⇒tX and ⇝tX are used for transient rules (strict, defeasible and defeaters, respectively), and →pX, ⇒pX and ⇝pX for persistent rules, where X ∈ {K, I, Z, O} represents the modality. A conclusion in NDL is a tagged literal, where +ΔτX q : t means that q is definitely provable with modality X at time t in NDL (Fig. 1) and +∂τX q : t means that q is defeasibly provable with modality X at time t in NDL (Figs. 2, 3). Similarly, −ΔτX q : t means that q is not definitely provable with modality X and −∂τX q : t says that q is not defeasibly provable with modality X. Here τ ∈ {t, p}, where t stands for a transient and p for a persistent derivation. A strict rule r ∈ Rs is ΔX-applicable if r ∈ Rs,X and ∀ak : tk ∈ A(r) : ak : tk is ΔX-provable, and is ΔX-discarded if r ∈ Rs,X and ∃ak : tk ∈ A(r) : ak : tk is ΔX-rejected. The conditions for ∂X-applicable and ∂X-discarded are similar, with Δ replaced by ∂. The conditions for concluding whether a query is transiently or persistently definitely provable are shown in Fig. 1. For the transient case, at step i + 1 one can assert that q is definitely transiently provable if there is a strict transient rule r ∈ Rts with the consequent q and all the antecedents of r have been asserted to be definitely (transiently or persistently) provable in previous steps. For the persistent case, the persistence condition (3) allows us to reiterate literals definitely proved at previous times. To show that q is not persistently definitely provable, in addition to the condition we have for the transient case, we
+ΔtX: If P(i + 1) = +ΔtX q : t then
  q : t ∈ F, or
  ∃r ∈ Rts,X[q : t] : r is ΔX-applicable.
+ΔpX: If P(i + 1) = +ΔpX q : t then
  q : t ∈ F, or
  ∃r ∈ Rps,X[q : t] : r is ΔX-applicable, or
  ∃t′ ∈ Γ : t′ < t and +ΔpX q : t′ ∈ P(1..i).
Fig. 1. Transient and persistent definite proof for modality X
Fig. 2. Transient defeasible proof for modality X p t +∂X : If P (i + 1) = +∂X q : t then p (1) +ΔX q : t ∈ P (1..i) or (2)−ΔX ∼ q : t ∈ P (1..i), and p (2.1) ∃r ∈ Rsd,X [q : t]: r is ∂X -applicable, and (2.2) ∀s ∈ R[∼ q : t]: either s is ∂X -discarded or ∃w ∈ R(q : t): w is ∂X -applicable or w s; or p (3) ∃t ∈ Γ : t < t and +∂X q : t ∈ P (1..i) and (3.1) ∀s ∈ R[∼ q : t”], t < t” ≤ t, s is ∂X -discarded, or ∃w ∈ R(q : t”): w is ∂X -applicable and w s.
Fig. 3. Persistent defeasible proof for modality X
have to assure that, for all instants of time before now, the persistent property has not been proved. According to the above conditions, in order to prove that q is definitely provable at time t we have to show that q is either transiently or persistently definitely provable [3]. Defeasible derivations have an argumentation-like structure [3]: first we choose a supported rule having the conclusion q we want to prove, second we consider all the possible counterarguments against q, and finally we rebut all the above counterarguments by showing that either some of their premises do not hold, or the rule used for their derivation is weaker than the rule supporting the initial conclusion q. ∼q denotes the complement of literal q (if q is the positive p then ∼q is ¬p; if q is ¬p then ∼q is p). A goal q which is not definitely provable is defeasibly transiently provable if we can find a strict or defeasible transient rule for which all antecedents are defeasibly provable, ∼q is not definitely provable
and for each rule having ∼q as a consequent we can find an antecedent which does not satisfy the defeasible provability condition (Fig. 2). For the persistent case, the additional clause (3) (Fig. 3) verifies whether the literal q : t has been persistently defeasibly proved before and whether this conclusion has remained valid ever since, i.e. there has been no time t″ at which the contrary ∼q was proved by a rule s, or any such rule was not stronger than the one sustaining q.
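As a small illustration of the transient defeasible conditions (our example, not the authors'), consider two conflicting rules with a superiority relation:

```latex
% Rules:  r_1 : a \Rightarrow^{t}_{K} p      r_2 : b \Rightarrow^{t}_{K} \neg p      with  r_1 > r_2 ;
% facts:  a\!:\!1,\; b\!:\!1.  Both rules are \partial_K-applicable at time 1.
% The attack by r_2 is met by the applicable rule r_1 with r_1 > r_2,
% so clause (2.2) of the transient proof condition holds and we conclude
\[
+\partial^{t}_{K}\, p : 1 .
\]
```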
3
Defeasible Commitment Machines
Commitment machines were proposed as a formalism for the declarative specification of protocols. We view a contract as a protocol binding different parties to their commitments by specifying the type of services agreed upon, the obligations, and the remedies in case of breach. Contracts are represented by defeasible commitment machines (DCMs); a DCM is a theory in the normative defeasible logic (NDL) consisting of two parts. The first part captures the representation of commitments and the operations on them in NDL (section 3.2) as a contract-independent theory, while the second is contract dependent and includes rules describing specific contractual clauses (section 3.3).
3.1 Standard Commitments
We use the notion of commitment for the clauses of the contract, translated into facts, definitions, or normative rules. The commitments capture the obligations of one party towards another. Realistic approaches attach deadlines to commitments in order to detect their breach or satisfaction. A base-level commitment C(x, y, p : tmaturity) : tissue binds a debtor x to a creditor y for fulfilling the proposition p by the deadline tmaturity. A conditional commitment CC(x, y, q : tmaturity, p : tmaturity) : tissue denotes that if a condition q is brought about at tmaturity, then the commitment C(x, y, p : tmaturity) : tissue will hold. In the conditional commitment CC(s, b, pay(Pc) : tmaturity, deliver(gi) : tmaturity + 3) : tissue the agent s (representing the seller agent, or the debtor) assumes the obligation towards agent b (representing the buyer, or the creditor) to deliver the item gi three days after the buyer has paid the price Pc. A commitment may be in one of the following states: active (between tissue and tmaturity and ¬breach), violated (tmaturity ≤ tcurrent and the commitment was not discharged or released) or performed (if the debtor executes it by tmaturity). The operations for the manipulation of commitments [6] are:
– Create(x, C) : tissue - the debtor x signs the commitment C at time tissue (can only be performed by C's debtor x);
– Cancel(x, C) : tbreach - the debtor x will no longer satisfy its obligation (this can usually¹ be performed only by C's debtor x);
The current practice in law decommits an agent from its obligations in some special situations (i.e. the creditor has lost his rights). Hence, the normative agent that monitors the market can also cancel some commitments.
– Release(y, C) : tx - releases C's debtor x from commitment C (performed by the creditor y);
– Assign(y, z, C) : tx - arbitrarily replaces y with z as C's creditor (performed by the creditor y);
– Delegate(x, z, C) : tx - replaces x with z as C's debtor (performed by the debtor x);
– Discharge(x, C) : tx - C's debtor x fulfills the commitment.
These operations cannot be carried out arbitrarily. They are subject to rules that govern the electronic market and which set the power of agents within that market [4]. An agent has power when one of its actions determines a normative effect. For instance, the agents must have the power to delegate or assign a commitment, otherwise their operations have no normative consequence.
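A minimal sketch of how a base-level commitment and its deadline-based states could be represented is shown below; the class and method names are our assumptions, and the DCM rules that actually manipulate such commitments follow in Sect. 3.2.

```java
// Sketch of a base-level commitment C(x, y, p : tMaturity) : tIssue with the state
// rules of Sect. 3.1 (active / violated / performed); identifiers are assumptions of ours.
public class Commitment {
    enum State { ACTIVE, VIOLATED, PERFORMED }

    final String debtor, creditor, proposition;
    final int tIssue, tMaturity;
    private Integer tDischarged = null;   // time at which the debtor brought about p, if any

    Commitment(String debtor, String creditor, String proposition, int tIssue, int tMaturity) {
        this.debtor = debtor; this.creditor = creditor; this.proposition = proposition;
        this.tIssue = tIssue; this.tMaturity = tMaturity;
    }

    void discharge(int t) { tDischarged = t; }            // Discharge(x, C) : t

    State stateAt(int tCurrent) {
        if (tDischarged != null && tDischarged <= tMaturity) return State.PERFORMED;
        if (tMaturity <= tCurrent) return State.VIOLATED;   // deadline reached without discharge/release
        return State.ACTIVE;
    }
}
```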
3.2 Commitments in Temporalised Normative Positions
We enhanced the task dependency network model [1, 5] used to model the supply chain. A commitment dependency network (CDN) is a graph (V, E) with vertices V = G ∪ A, where G is the set of commitments, A = S ∪ P ∪ C is the set of agents, S is the set of suppliers, P the set of producers, C the set of consumers, and E is a set of edges connecting agents with their input and output commitments. An output commitment for agent a is a commitment in which a is the debtor, while an input commitment for agent a is a commitment in which a is the creditor. With each agent a we associate an input set Ia = {c ∈ G | (c, a) ∈ E} containing all the commitments where a is creditor, and an output set Oa = {c ∈ G | (a, c) ∈ E} containing all the commitments where a is debtor. Agent a is a supplier if Ia = ∅, a consumer if Oa = ∅, and a producer otherwise. Such a multi-party commitment network is satisfiable if all the commitments may be discharged [7]. Following the steps in [4], we have defined [2] the defeasible commitment machine (DCM) using the normative defeasible logic instead of the causal logic, with the goal of increasing the flexibility of the commitments. Commitments in NDL are declared as persistent knowledge: →pK C(x, y, p) : ti, →pK CC(x, y, q, p) : ti. The rules of a DCM (Fig. 4) capture the meaning of the operations, where tm stands for tmaturity, the deadline attached to the commitment. Persistent conclusions remain valid until a more powerful derivation retracts them (for instance r2 > r1). For the life cycle of a commitment, cancellation means an exception which appears in contract execution (rules r3 and r4). Usually, cancellation is compensated by activating another commitment or a contrary-to-duty obligation. The debtor may propose another commitment which is more profitable for both partners in the light of some arising opportunities on the market, or it may simply recognize its incapacity to accomplish the task. The sooner the notification, the lower the damages. In some situations, a commitment may be active even after it is breached [6], expressed here by the defeasible rule r3. Therefore, a normative agent can block the derivation of that conclusion in order to force the execution of a specific commitment. The same reason is valid for the rules r13 and r14 when the debtor does not execute its commitment by the deadline tm. However, a
r1 : Create(x, y, p : tm) : tissue →pK C(x, y, p : tm) : tissue
r2 : Discharge(x, y, p : tm) : tperf →pK ¬C(x, y, p : tm) : tperf
r3 : Cancel(x, y, p : tm) : tbreach ⇒pK ¬C(x, y, p : tm) : tbreach
r4 : Cancel(x, y, p : tm) : tbreach ⇒pK C(x, y, contrary to duty : tm) : tbreach
r5 : Release(x, y, p : tm) : trelease →pK ¬C(x, y, p : tm) : trelease
r6 : Delegate(x, y, p : tm, z) : tdelegate ⇒pK ¬C(x, y, p : tm) : tdelegate
r7 : Delegate(x, y, p : tm, z) : tdelegate ⇒pK C(z, y, p : tm) : tdelegate
r8 : Assign(x, y, p : tm, z) : tassign ⇒pK ¬C(x, y, p : tm) : tassign
r9 : Assign(x, y, p : tm, z) : tassign ⇒pK C(x, z, p : tm) : tassign
r10 : CCreate(x, y, q : tm, p : tm + τ) : tissue →pK CC(x, y, q : tm, p : tm + τ) : tissue
r11 : CDischarge(x, y, q : tm, p : tm + τ) : tperf →pK ¬CC(x, y, q : tm, p : tm + τ) : tperf
r12 : CDischarge(x, y, q : tm, p : tm + τ) : tperf →pK C(x, y, p : tm + τ) : tperf
r13 : tcurrent > tm ∧ C(x, y, p : tm) : tm ⇒pK ¬C(x, y, p : tm) : tm
r14 : tcurrent > tm ∧ C(x, y, p : tm) : tm ⇒pK C(x, y, contrary to duty : tm) : tm
r2 > r1, r3 > r1, r5 > r1, r6 > r1, r8 > r1, r11 > r10
Fig. 4. Defeasible commitment machine
r18 : SendRequest : tx →pK request : tx
r19 : SendOffer : tx →pK offer : tx
r20 : SendOffer : tx →tZ CCreate(M, C, acceptC : tm, goods : tm + 2) : tx
r21 : SendAccept : tx →pK accept : tx
r22 : accept : tx ∧ CC(M, C, acceptC : tm, goods : tm + 3) : tx →tZ CDischarge(M, C, acceptC : tm, goods : tm + 3) : tx
r23 : SendAccept : tx →tZ CCreate(C, M, goods : tm, pay : tm + 2) : tx
r24 : SendGoods : tx ⇒pK goods : tx + 2
r25 : SendGoods : tx →tZ CCreate(M, C, pay : tm, receipt : tm + 1) : tx
r26 : goods : tx ∧ CC(C, M, goods : tm, pay : tm + 2) : tx →tZ CDischarge(C, M, goods : tm, pay : tm + 2) : tx
r27 : goods : tx ∧ C(M, C, goods : tm) →tZ Discharge(M, C, goods : tm) : tx
r28 : SendPayment : tx →pK pay : tx + 1
r29 : pay : tx ∧ CC(M, C, pay : tm, receipt : tm + 1) : tx →tZ CDischarge(M, C, goods : tm, pay : tm + 2) : tx
r30 : pay : tx ∧ C(C, M, payC : tm) : tx →tZ Discharge(C, M, payC : tm) : tx
r31 : SendReceipt : tx ⇒pK receipt : tx + 1
r32 : receipt : tx ∧ C(M, C, receipt : tm) : tx →tZ Discharge(M, C, receipt : tm) : tx
Fig. 5. Contracts in Temporalised Normative Positions
commitment cannot be active after it is satisfied (rule r2). Note that assign and delegate are defeasible, because agents need special power to execute them.
3.3 Contract Specification in DCM
The rules in Fig. 5 use the DCM to represent a specific contract between two agents, expressing actions that spread over several instants of time (i.e. rules r24, r28, r31). For instance, rule r24 says that the seller agent starts the action SendGoods at time tx, but the items reach the destination only after two
days, when the fluent goods becomes true. The same rules are also defeasible, meaning that if an unpredictable event appears (i.e. an accident), their consequent fluents may be retracted. The execution of the contract may start from any state, because there is no specific order of actions. This can be useful for the supply chain, where long time business relationships suppose that the first steps in contract negotiation are no longer needed.
4
Running the Contracts
In the simple scenario of Fig. 6 the supplier A commits to deliver the item g1 no later than the deadline tm. The producer B commits to pay for the item within at most 3 days after receiving it, and also commits to deliver g2 by tm. The consumer C commits to pay no later than 2 days after obtaining the product. The commitment dependency network specifies which commitments are in force at a particular instant of time. The picture illustrates the time instant t1 of Fig. 7, which traces an execution in this scenario. At time t0 the consumer C notifies agent B that it intends to buy the item g2, paying the price Pc within 2 days after the shipment is made. Consequently, producer B asks the supplier A to deliver the item g1 at the price Pc. At time t1 agent A commits to deliver its output item, and at time t2 it executes the action SendGoods(g1). Since it is a transient derivation, the agent will send it just once. According to rule r25, the consequence of the operation is the commitment →pK CC(A, B, Pc : tm, receipt : tm + 1) : t2. At time t4, according to rule r24, the items arrive (⇒pK g1 : t4) and agent B pays the price Pc for them. This defeasible derivation can be defeated by an unpredictable event. The fluent g1 : t4 fires the rule r26, the conditional commitment is discharged and, applying rule r12, a base-level commitment is created (→pK C(B, A, Pc : t7) : t4). At time t10 all fluents are true, showing that the goods g1 and g2 were delivered, the corresponding payments were made, and both receipts were sent, meaning that the system has reached a desirable state [8]. The goal of executing a contract does not consist in just performing certain sequences of actions, but in reaching a desirable state. Observe that not all the possible commitments specifying the contract have been activated. Such situations often arise when the agents are running long-term business relationships and do not initiate their interactions from a start state. This is an argument for
Commitments:
c1 = C(A, B, g1 : tm) : t1
c2 = C(B, C, g2 : tm) : t1
c3 = CC(C, B, g2 : tm, Pc : tm + 2) : t1
c4 = CC(B, A, g1 : tm, Pc : tm + 3) : t1
Fig. 6. Commitment dependency network: supplier A, producer B, consumer C
t0: B: →tZ Create(B, A, g1 : tn, Pc : tn + 3) : t0; C: →tZ Create(C, B, g2 : tn, Pc : tn + 2) : t0; DCM: ⇒pK CC(B, A, g1 : tn, Pc : tn + 3) : t0, ⇒pK CC(C, B, g2 : tn, Pc : tn + 2) : t0
t1: A: →tZ Create(A, B, g1 : tm) : t1; B: →tZ Create(B, C, g2 : tm) : t1; DCM: ⇒pK C(A, B, g1 : tm) : t1, ⇒pK C(B, C, g2 : tm) : t1, CC(B, A, g1 : tn, Pc : tn + 3) : t1, CC(C, B, g2 : tn, Pc : tn + 2) : t1
t2: A: →tZ SendGoods(g1) : t2; DCM: →pK CC(A, B, Pc : tm, receipt : tm + 1) : t2, CC(B, A, g1 : tn, Pc : tn + 3) : t2, CC(C, B, g2 : tn, Pc : tn + 2) : t2, C(A, B, g1 : tm) : t2, C(B, C, g2 : tm) : t2
t3: DCM: CC(A, B, Pc : tm, receipt : tm + 1) : t3, CC(B, A, g1 : tn, Pc : tn + 3) : t3, CC(C, B, g2 : tn, Pc : tn + 2) : t3, C(A, B, g1 : tm) : t3, C(B, C, g2 : tm) : t3
t4: B: →tZ SendPay(Pc) : t4; DCM: ⇒pK g1 : t4, →pK C(B, A, Pc : t7) : t4, CC(A, B, Pc : tm, receipt : tm + 1) : t4, CC(C, B, g2 : tn, Pc : tn + 2) : t4, C(B, C, g2 : tm) : t4
...
t9: B: →tZ SendReceipt(receipt) : t9; DCM: ⇒pK C(B, C, receipt : t10) : t9, Pc : t9, g2 : t9, receipt : t9, Pc : t9, g1 : t9
t10: DCM: ⇒pK receipt : t10, Pc : t10, g2 : t10, receipt : t10, Pc : t10, g1 : t10
Fig. 7. Trace of running the contracts in DCM
using our framework for the supply chain context. Note also that there are no base-level commitments, so the system is in a final state, where the interactions may end. But the interaction can continue from such a state by activating any of the commitments of the contract. A well-formed contract is one in which a final state and an undesirable state2 do not occur at the same time. In the supply chain the majority of actions are repetitive, a requirement easily captured by our approach. The agents only have to derive their actions persistently and defeasibly (i.e. ⇒pZ SendGoods : tx). In the case of perturbations in the supply chain, agents can rebut the above rule by activating a stronger one which specifies more or fewer items to be delivered (changing the superiority relation over the set of rules).
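To make the commitment life-cycle in the trace above concrete, the following sketch shows conditional commitments being detached into base-level commitments when their condition fluent becomes true, and base-level commitments being discharged when the promised fluent arrives. The three-field representation and the step function are illustrative simplifications, not the paper's DCM rules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Commitment:
    debtor: str
    creditor: str
    condition: Optional[str]   # None for a base-level commitment
    promise: str               # fluent the debtor must bring about

def step(commitments, fluent):
    """Update commitments when `fluent` becomes true: conditional commitments
    whose condition holds become base-level; base-level commitments whose
    promise holds are discharged."""
    updated = []
    for c in commitments:
        if c.condition == fluent:                       # detach: CC -> C
            updated.append(Commitment(c.debtor, c.creditor, None, c.promise))
        elif c.condition is None and c.promise == fluent:
            continue                                    # discharged
        else:
            updated.append(c)
    return updated

# CC(B, A, g1, Pc): B pays A the price Pc once the goods g1 have arrived.
cs = [Commitment("B", "A", "g1", "Pc")]
cs = step(cs, "g1")   # g1 delivered -> base-level commitment C(B, A, Pc)
cs = step(cs, "Pc")   # Pc paid -> commitment discharged
print(cs)             # [] : no base-level commitments, a final state
```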
5
Exceptions
An exception represents a deviation from the normal flow of contract execution. It can be an opportunity, a breach, or an unpredictable sequence of operations. Expected exceptions can be captured by defining a preference structure over the runs within the commitment dependency network [9]. With the superiority relation in defeasible logic, we can easily define such a structure.
2 States in which at least one fluent is not true.
In our view, unexpected exceptions can be managed in two ways: by introducing exception patterns or, when there is no domain-dependent information, by applying principles of contract law. Contracts can be more or less elaborate, with several levels. Using well-defined exception patterns, one can generate more robust contracts. Moreover, it is considered that 80% of actual judicial cases follow the same classes of exception patterns [10]. We can provide a taxonomy of template contracts and a taxonomy of exceptions. When there are no explicit contrary-to-duty rules and no domain-dependent information, the solution is to apply principles of contract law in order to compute the remedy, such as expectation damages, reliance damages, and opportunity costs. The amount of expectation damages must place the victim in the same position as if the actual contract had been performed. The amount of reliance damages must place the victim in the same position as if no contract had been signed. The amount of opportunity-cost damages must place the victim in the same position as if the best alternative contract had been performed [11, 5]. By tracking the life cycle of the commitments within a CDN one can detect and anticipate exceptions in contract execution, and therefore design proactive agents for such a market. An active base-level commitment represents a hard constraint for the debtor agent, while proposing a conditional commitment denotes a more risk-averse attitude. Moreover, inner commitments are permitted in a defeasible commitment machine. This opens the possibility of designing agents with different levels of risk attitude [2].
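As a worked illustration of the three remedies named above, the sketch below computes them from hypothetical monetary figures; the formulas simply restate the verbal definitions (restore the victim's position under performance, under no contract, or under the best alternative contract) and are not taken from the cited legal literature.

```python
def expectation_damages(value_if_performed: float, value_received: float) -> float:
    """Place the victim as if the actual contract had been performed."""
    return value_if_performed - value_received

def reliance_damages(expenses_in_reliance: float, value_received: float) -> float:
    """Place the victim as if no contract had been signed (recover wasted expenses)."""
    return expenses_in_reliance - value_received

def opportunity_cost_damages(best_alternative_value: float, value_received: float) -> float:
    """Place the victim as if the best alternative contract had been performed."""
    return best_alternative_value - value_received

# Hypothetical breach: the victim spent 40 preparing, would have gained 100
# from the contract, received nothing, and had an alternative worth 80.
print(expectation_damages(100, 0))        # 100
print(reliance_damages(40, 0))            # 40
print(opportunity_cost_damages(80, 0))    # 80
```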
6
Related Work and Conclusions
Nonmonotonic commitment machines have been defined [4] using causal logic, while in DCMs deadlines have been attached to commitments, which represents a more realistic approach. Moreover, defeasible logic is more suitable than causal logic for capturing exceptions. Contracts have already been represented with defeasible logic and RuleML [12], but, by introducing DCMs between members of the supply chain, we offer a more flexible solution for contract monitoring. Capturing exceptions in the commitment machines of [9] is not performed with deadlines, which are needed for detecting the breach of a contractual clause. Exceptions from a semantic perspective [13] have been handled with courteous logic, which is a subset of defeasible logic. Commitments within a network of agents have also been analyzed [7], but without time constraints. The main contribution of this paper consists in introducing DCMs in the execution of contracts, to obtain two main advantages. On the one hand, agents can reason with incomplete information. Therefore, contracts represented as DCMs are more elaboration tolerant [4]. Also, this property of nonmonotonic logics allows us to model confidential contractual clauses. On the other hand, our long-term research goal is to manage exceptions in contract execution. We argue that using DCMs and the expressiveness of defeasible logic it is easier to catch both expected and unexpected exceptions. The novelty regarding commitments
consists in attaching deadlines to each commitment by using the temporalised normative defeasible logic [3].
Acknowledgments We are grateful to the anonymous referees for useful comments. Part of this work was supported by the grant 27702-990 from the National Research Council of the Romanian Ministry for Education and Research.
References
1. Walsh, W., Wellman, E.: Decentralized supply chain formation: A market protocol and competitive equilibrium analysis. Journal of Artificial Intelligence Research 19 (2003) 513–567
2. Letia, I.A., Groza, A.: Agreeing on defeasible commitments. In: Declarative Agent Languages and Technologies, Hakodate, Japan (2006)
3. Governatori, G., Rotolo, A., Sartor, G.: Temporalised normative positions in defeasible logic. In: 10th International Conference on Artificial Intelligence and Law, Bologna, Italy (2005)
4. Chopra, A.K., Singh, M.P.: Nonmonotonic commitment machines. In: International Workshop on Agent Communication Languages and Conversation Policies, Melbourne, Australia (2003)
5. Letia, I.A., Groza, A.: Automating the dispute resolution in a task dependency network. In Skowron, A., ed.: Intelligent Agent Technology, Compiegne, France (2005) 365–371
6. Mallya, A.U., Yolum, P., Singh, M.P.: Resolving commitments among autonomous agents. In: International Workshop on Agent Communication Languages and Conversation Policies, Melbourne, Australia (2003)
7. Wan, F., Singh, M.: Formalizing and achieving multiparty agreements via commitments. In: 4th International Joint Conference on Autonomous Agents and Multiagent Systems, Utrecht, The Netherlands, ACM Press (2005) 770–777
8. Winikoff, M., Liu, W., Harland, J.: Enhancing commitment machines. In: Declarative Agent Languages and Technologies. LNCS 3476, Springer-Verlag (2005) 198–220
9. Mallya, A.U., Singh, M.P.: Modeling exceptions via commitment protocols. In: 4th International Joint Conference on Autonomous Agents and Multiagent Systems, Utrecht, The Netherlands, ACM Press (2005) 122–129
10. Bibel, L.W.: AI and the conquest of complexity in law. Artificial Intelligence and Law 12 (2004) 159–180
11. Craswell, R.: Contract law: General theories. In Bouckaert, B., Geest, G.D., eds.: Encyclopedia of Law and Economics, Volume III. The Regulation of Contracts. Cheltenham (2000) 1–24
12. Governatori, G.: Representing business contracts in RuleML. Journal of Cooperative Information Systems 14 (2005)
13. Grosof, B.: Representing E-Commerce rules via situated courteous logic programs in RuleML. Electronic Commerce Research and Applications 3 (2004) 2–20
A Self-organized Energetic Constraints Based Approach for Modelling Communication in Wireless Systems
Jean-Paul Jamont1 and Michel Occello2
1 Institut National Polytechnique de Grenoble, LCIS, 26000 Valence, France
2 Université Pierre Mendès, LCIS/INPG, 26000 Valence, France
{jean-paul.jamont, michel.occello}@esisar.inpg.fr
Abstract. Open physical artificial systems often involve wireless autonomous entities operating under highly constrained energy policies. Their features naturally lead to applying multiagent techniques to ensure both the autonomy of entities and the best organization of the whole system. We propose a multiagent approach, based on self-organization mechanisms, for robust wireless communication management in such physical systems.
1
Introduction
A physical complex system can be defined as composed of many software/hardware elements which interact with each other and with their environment. These interactions are often non-linear and generally contain feedback loops. These systems are characterized by the emergence, at a global level, of new properties and of a new dynamic which is not easily predictable from the observation and analysis of the elementary interactions. Working with these systems, as in collective robotics or massive instrumentation, imposes the use of wireless technology. The features of such complex cognitive physical systems lead naturally to applying multiagent techniques to ensure both the autonomy of entities and the best organization of the whole system [4]. This paper presents the different steps of the design of an energy-efficient middleware based on the MWAC model (Multi-Wireless-Agent Communication). The first section discusses the necessity of adopting a message-oriented middleware (MOM) for our applications. We then propose our multiagent approach based on self-organization to manage the communication of these decentralized embedded node networks. We finally give an insight into some quantitative results showing the benefit of a multiagent approach compared to traditional protocols in a real-world application: the instrumentation of an underground river system.
2
A Multiagent Approach to Design an Energy Efficient Message Oriented Middleware
Our research work deals with embedded multiagent systems such as collective robotics or physical instrumentation. Considering complex embedded control
systems as networks of decentralized cooperative nodes is an attractive way to design physical intelligent applications [10, 4]. Multihop communication. In wireless networked systems, communication between two hosts is generally not direct. To communicate, entities require help from other hosts (multihop communication). Such a requirement creates an important routing problem because updating the location of neighbors is difficult. All adapted wireless routing protocols use flooding techniques. In a flooding technique, a host gives the message to all its neighbors, which do the same. Limited power resources. Hosts have limited power resources. One of the aims of the whole system is thus to reduce the energy expense as much as possible. When hosts have nothing to do, they generally enter a sleep mode to spare energy. When they communicate, they must use good routing protocols and optimal routes (generally the criterion is the number of hops). But they must reduce the flooding scheme as much as possible because the associated power cost is very high. An aggressive environment such as an underground river system (as in one of our applications) can cause internal faults in agents. The communication infrastructure must be very adaptive, fault tolerant and self-stabilized: an agent failure must not have an important impact on the system. This system must provide reliable communications and must adapt to "real-time" constraints. Furthermore, in the case of mobile devices the infrastructure of the system is not persistent. Message oriented middleware. We need to design a mobile communication management layer to manage the wireless communications between the different agents of the system. This layer must increase the interoperability, portability and flexibility of an application by allowing the application to be distributed over multiple heterogeneous agents. It must reduce the complexity of development of the agents. This layer will be a Message Oriented Middleware (MOM) (fig. 1).
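To make the cost of the flooding scheme concrete, here is a minimal sketch (not the paper's middleware code) of blind flooding: every host forwards a message it has not seen yet to all of its neighbours, so the number of transmissions grows with the size of the whole network rather than with the length of the useful route.

```python
from collections import deque

def flood(neighbors, source):
    """Blind flooding: each host retransmits a new message to all its neighbours.
    Returns the total number of (energy-costly) transmissions."""
    seen = {source}
    queue = deque([source])
    transmissions = 0
    while queue:
        host = queue.popleft()
        for n in neighbors[host]:
            transmissions += 1            # every forwarding costs energy
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return transmissions

# Hypothetical 5-host multihop topology: the message reaches every host,
# but at the price of one transmission per explored link.
topology = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2, 4], 4: [3]}
print(flood(topology, source=0))
```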
Fig. 1. Our embedded multiagent system architecture
Multiagent approach. The distributed and open nature of wireless networks means that the multiagent approach is an adapted answer. Another advantage of this approach is the external representation of the interactions and of the organization. External representations offer multiple possibilities such as monitoring by an external observer. A few works reaching the same objectives show that the approach is interesting; they are described in [5]. Our MOM must be economical from an energy point of view: this constitutes one of the main differences with other works on multiagent-based middleware [1, 9]. We are thereafter going to be interested in the AEIO decomposition [2]. We will follow the method of multiagent design discussed in [11], associated with this multiagent decomposition. It proposes a decomposition according to four aspects collectively accepted today. The agent aspect (A) gathers all elements together for defining and constructing these entities. Our agents have hybrid architectures, i.e. a composition of some pure types of architectures. Indeed, the agents will be of a cognitive type in case of a configuration alteration: it will be necessary for them to communicate and to manipulate their knowledge in order to have an efficient collaboration. On the other hand, in a normal use case it will be necessary for them to be reactive (stimulus/response paradigm) to be most efficient. All the agents have the same communication capabilities but the communicated data depend on their roles. The environment aspect (E) deals with the analysis of environment elements and with capabilities such as the perception of this environment and the actions one can perform on it. The environment will be made of measurable information. It is deterministic, non-episodic, dynamic and continuous. Agents can move in this physical environment but do not know their position. The interaction aspect (I) includes all elements which are used for structuring the external interactions among the agents (agent communication language, interaction protocols). The organization aspect (O) allows agent groups to be ordered into organizations determined according to their roles. The MWAC model focuses on the last two aspects.
3
The MWAC Model
Organization and interaction aspect. In this type of application no one can control the organization a priori. Relations between agents are going to emerge from the evolution of the agents' states and from their interactions. Our basic organizational structures are constituted by (see fig 2): one and only one group representative agent (r) managing the communication in its group, some connection agents (c) which know the different representative agents and can belong to several groups, and some simple members (s) which are active in the communication process only for their own tasks (they do not ensure information relay). With this type of organizational structure, the message path between the source (a) and the receiver (b) is generally ((a, r), ∗[(r, c), (c, r)], (r, b)). Because a representative agent is the most solicited agent in a group, the best one is the one having the highest energy level and the largest number of neighbors. We use a role-allocation-based self-organization
Fig. 2. Group organization
mechanism involving the election of a representative agent based on a function which estimates the adequacy between its desire to be the manager and its capacity to be so. The energy saving is obtained owing to the fact that the flooding is only directed to the representative agents of the groups and to some connection agents. However, networks with an organizational structure must pay attention to the maintenance of their routing tables. Generally, the adaptive features of these tables come from periodical exchanges between the different nodes. In our approach we do not wish to use this technique to ensure the maintenance of coherence. Indeed, our principle will be "if we do not need to communicate, it is useless to spend energy to ensure the coherence maintenance". Instead, we will use eavesdropping of surrounding agent communications. We extract knowledge from these message exchanges to update our beliefs about our neighbors. Moreover, our self-organization mechanism will integrate an energy management policy. The MWAC formal description. We propose here a formal description of our model. The notations find their source in the work described in [8]. Identifier. Hosts of the network are modeled by agents. Each agent has an identifier i. We note Ai the agent identified by i. The multiagent system. The multiagent system Γ is the set of agents Γ = {A1, A2, ..., Ai, ..., An} with card(Γ) = n. Our multiagent system is open: hosts can enter or leave the system. Time. We note T the set ordered by the operator < with an element −∞ such that ∀t ∈ T, t < −∞. So T = N ∪ {−∞}. Groups. (1) An agent group is noted G. In our organization, a group is identified by the identifier of its representative. The group whose representative is AR is noted GR. All groups are part of the system: GR ∈ P(Γ). (2: intention) A group has a finite time to live (with a lower and a higher limit).
The lower limit is the most interesting one (the group birth): we note [AR, t0] the group created by AR at t0, with (AR, t0) ∈ Γ × T. (3: belief in extension) We note [AR, t0]Aj,t1 the set of agents that Aj thinks are members of the group [AR, t0] at t1. (4: extension) We note [AR, t0]t the set of agents really in [AR, t0] at t, i.e. the composition of the group GR created at t0 at the given date t. This knowledge can be defined from the beliefs of the agents:

$[A_R, t_0]_t = \{A_j \in \Gamma \mid A_j \in [A_R, t_0]_{A_j,t} \wedge A_j \in [A_R, t_0]_{A_R,t}\} \qquad (1)$
Belief. BAi ϕ means that the agent Ai thinks ϕ, in other words it thinks that ϕ is true. To highlight the recursive feature of the group definition given in (1), we can note that (Aj ∈ [AR, t0]Ai,t) ≡ (BAi (Aj ∈ [AR, t0]t)). Desire. DAi ϕ means that the agent Ai desires ϕ, in other words it wants to make ϕ true. Knowledge. KAi ϕ means that the agent Ai knows ϕ. Roles. (1) We note role(Ai, t) the function that returns the role of the agent Ai at the date t, with (Ai, t) ∈ Γ × T. A role can be RR for a representative, RC for a connection agent and RS for a simple member. When an agent is initialized, it has no role. The function role can then return ∅ to signify that the agent has no role. (2: simplification of writing) We note rolet(Ai) the last role taken by Ai. (3: choice of a role) Each agent chooses a role depending on its neighborhood. Choosing a role thus leads to notifying the new role to the neighbors and modifying the agent's knowledge about its own role. So KAi (role(Ai, tv) = RR) can be understood in different ways. Firstly, we learn simply that the agent Ai is a representative, but if KAi (role(Ai, tv−1) ≠ role(Ai, tv)) then the agent Ai has modified its role to become a representative. Power supply. (1) We note power(Ai, t) the function which returns the energy level (a percentage) of the agent Ai at the date t, with (Ai, t) ∈ Γ × T. (2: simplification of writing) We note power(Ai) the current energy level of the agent Ai. Neighborhood. We note NAi the neighborhood that Ai knows. It is the set of agents in the emission range of the agent Ai, not including itself. So, NAi ∈ P(Γ). An agent knows a neighbor by its unique identifier but can also access its role and its group (∀Aj ∈ NAi, KAi role(Aj) ∧ KAi group(Aj)), with group a function defined similarly to role, except that group(Aj) returns the group identifier of the agent Aj. We can notice that if KAi [AR, t0]Aj,t1 then KAi group(Aj) = R. The converse is not true because there is an uncertainty about the time. Formalized description of the role attribution. Choosing a role depends firstly on the agent's neighborhood (basic algorithm). However, if its power level is low, an agent cannot desire to be a representative (energetic constraint). The decision processes of agents are not synchronized. Two neighbors can take the same decision at the same time. It is possible that two close agents choose a representative
role: there is a representative conflict which must be detected and corrected. It is also possible to have two close groups which do not include a connection agent between them: there is an inconsistency which must be detected and corrected. We begin by focusing on the algorithm which allows the agent Ai to choose a role as a function of its neighborhood NAi. Basic algorithm. (1) There is no neighbor: the concept of role does not make sense. (NAi = ∅) ⇒ (KAi (role(Ai) = ∅)). (2) Neighbors exist (NAi ≠ ∅): KAi (card({Aj ∈ NAi | role(Aj) = RR}) = 0) ⇒ (KAi (role(Ai) = RR)); KAi (card({Aj ∈ NAi | role(Aj) = RR}) = 1) ⇒ (KAi (role(Ai) = RS)); KAi (card({Aj ∈ NAi | role(Aj) = RR}) > 1) ⇒ (KAi (role(Ai) = RC)). Energetic constraint. Generally, the role of representative or connection agent makes the agents take an active part in the management of communications. As a consequence, their energy consumption is higher. So, (power(Ai) < trigValue) ⇒ (KAi (role(Ai) = RS)). Detecting and correcting a representative conflict. (1: Conflict detection) An agent Ai detects a conflict with other agents if KAi (NAi ≠ ∅) ∧ KAi (role(Ai) = RR) ∧ KAi (card({Aj ∈ NAi | role(Aj) = RR}) ≥ 1). (2: Conflict correction) Ai has detected a conflict with other agents. It sends a ConflictRepresentativeResolution message (see the interaction aspect) to its representative neighbors. This message contains the score of the agent Ai. The agents which receive this message calculate their own score. Agents with an inferior score leave their role and choose another one. An agent with a better score sends its score to its neighbors. An example of a score function can be expressed simply: the following function favors an agent with a high energy level and a large neighborhood (the interest is to have dense groups in order to limit the flooding volume): score(Ai) = power(Ai) · card(NAi). Detecting and correcting an inconsistency. (1: Inconsistency detection) An inconsistency can be detected only by a representative, starting from the beliefs of one of its members. This detection needs an interaction between an agent Ai and its representative AR (message VerifyNeighborGroupConsistency). The agent Ai will send the list of the groups in its neighborhood for which it does not know whether its representative knows the proximity. We define NAi,L = {Ak ∈ NAi | role(Ak) = RC}. A connection agent is a member of many groups, so, if AL ∈ NAi ∧ role(AL) = RC ∧ AL ∈ [Aα, tα]AL,ta ∧ AL ∈ [Aβ, tβ]AL,tb, then KAi (group(AL) = α) and KAi (group(AL) = β). We define ζAi = {Aj ∈ NAi | group(Aj) ≠ group(Ai) ∧ (∄Ak ∈ NAi,L | group(Ak) = group(Aj) ∧ group(Ak) ≠ group(Ai))}. An inconsistency is suspected by Ai if card(ζAi) ≠ 0. The representative agent AR of Ai receives a message with ζAi. For all An ∈ ζAi, if card({Ay ∈ NAR,L | group(Ay) = n}) = 0 then there is a real inconsistency.
(2: Inconsistency correction) In this case several strategies can be used. We judge that if a path with a low energy cost is available, stability of the organization should be favored over a reorganization. A path search towards one of the groups will thus be sent with a relatively low TTL (Time To Live). If a path exists, the organization does not change. If not, the representative proposes to Ai, if role(Ai) = RC, to become a representative (ISuggestYouToBeRepresentative message). The agent Ai can refuse to become a representative (if its energy level is too low) but in any case the representative AR leaves its role. About beliefs and knowledge of the neighborhood. We have seen that the reasoning is based on the beliefs/knowledge about the neighbors. In our system a belief is recent knowledge (on which no reasoning has yet been applied). If an agent receives a piece of information σ, it is a belief. If the agent does not find a contradiction with its knowledge, then σ becomes knowledge. If not, a WhoAreMyNeighbors message can be sent to verify some knowledge. An agent Ai which receives the information σ can be the receiver or can just be a relay of a communication which takes place in its communication range. In this last case, we talk about eavesdropping. Eavesdropping allows an agent to verify some information about a neighbor (identifier) without using a specific message, and thus without extra energy expense. The agents will interact only with their acquaintances. Agents interact by asynchronous exchange of messages (without rendezvous). Among the different protocols that we use, the choice of an introduction protocol is essential. Indeed, this protocol allows the agents to be known, i.e. to bring their knowledge and their know-how to the agents' society. Another important protocol is the best-representative election protocol seen previously. These protocols are arrangements of some of the different types of small messages defined in [5].
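The role-attribution rules and the conflict-resolution score given in the formal description can be condensed into a few lines. The sketch below follows the notation above (power as a percentage, trigValue as the energy threshold) but is only an illustration, not the deployed agent code.

```python
def choose_role(neighbor_roles, power, trig_value=20):
    """Basic role-attribution algorithm with the energetic constraint."""
    if not neighbor_roles:
        return None                 # no neighbour: the notion of role is meaningless
    if power < trig_value:
        return "S"                  # low energy: stay a simple member
    representatives = sum(1 for r in neighbor_roles if r == "R")
    if representatives == 0:
        return "R"                  # no representative in range: become one
    if representatives == 1:
        return "S"                  # exactly one representative: simple member
    return "C"                      # several groups in range: become a connection agent

def score(power, neighborhood):
    """Conflict-resolution score: favour high energy and dense neighbourhoods."""
    return power * len(neighborhood)

# Two neighbouring representatives in conflict: the lower score yields its role.
print(choose_role(["S", "S"], power=80))                       # 'R'
print(score(80, ["a", "b", "c"]), ">", score(60, ["a", "b"]))  # 240 > 120
```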
4
Implementation and Evaluation in the Case of an Underground River System Instrumentation
Implementation. We demonstrate the feasibility of our approach in the case of the instrumentation of an underground hydrographic system (the EnvSys project [6]). In a subterranean river system, the interesting parameters to measure are numerous: temperature of the air and the water, air pressure and, if possible, water pressure for the flooded galleries, pollution rate for classical pollutants, water flow, draft speed, etc. All this information will be collected at the immediate exit of the hydrographic network by a workstation such as a PC. These data will be processed to activate alarms, to study the progress of a given pollution according to miscellaneous measured parameters, and to determine a predictive model of the whole network by relating the subterranean parameter measures of our system to the overground parameter measures made more classically on the catchment basin. We have chosen for the sensors a classical three-layer embedded architecture (physical layer/link layer/applicative layer).
We use the physical layer which is employed by the NICOLA system, a voice transmission system used by the French speleological rescue teams [3]. This layer is implemented in a digital signal processor rather than a fully analog system. Thereby we keep good flexibility, and later we will be able to apply a signal processing algorithm to improve the data transmission. The link layer used is a CAN (Controller Area Network) protocol stemming from the automotive industry and chosen for its good reliability. The applicative layer is constituted by the agents' system. A hybrid architecture makes it possible to combine the strong features of both reactive capabilities (to react to messages) and cognitive capabilities (to detect inconsistencies and re-organize). The ASTRO hybrid architecture [10] is especially adapted to a real-time context. The integration of deliberative and reactive capabilities is possible through the use of parallelism in the structure of the agent. Separating Reasoning/Adaptation and Perception/Communication tasks allows a continuous supervision of the evolution of the environment. The reasoning model of this agent is based on the Perception/Decision/Reasoning/Action paradigm. The cognitive reasoning is thus preserved, and predicted events contribute to the normal progress of the reasoning process. In this application, the agent must transmit measures periodically to the workstation. The communication module calls the MAS middleware services supplied through a component. The agent must use a WCommunication package, written in the Java language and translated into the C++ language because many physical platforms use this language. This package contains two abstract classes (Identifier and Message) and two main classes called Communication and BitField. In the Message abstract class the designer must implement the primitives to convert the message into a bit field (BitField MessageToBitField(Message m)) and the reciprocal primitive (Message BitFieldToMessage(BitField b)). In the Identifier abstract class the designer must implement the type of identifier and two primitives, BitField IdentifierToBitField() and Message BitFieldToMessage(BitField b); the primitive to convert the identifier into a bit field must be implemented by the designer. The Communication class contains a list of couples (Identifier, Message) for emission and reception. This list is private and must be accessed via Bool SendMessage(identifier, Message) and CoupleIdentifierMessage ReceiveMessage(). The package must be connected to the operating system. The operating system must give the battery energy level (primitive SetBatteryLevel(Float l)) to the Communication class and must give the bit field which arrives. On the other hand, the middleware gives to the operating system the bit field to send when it calls BitField GetBitFieldToSend(). These agents are embedded on autonomous processor cards. These cards are equipped with communication modules and with measuring modules to carry out the agent tasks relative to the instrumentation. These cards supply a real-time kernel: the KR-51 kernel allows multi-task software engineering for the C515C microcontroller. We can produce one task per capability. We can then quite easily implement the parallelism inherent to agents and satisfy the real-time constraints.
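The interface shape of the WCommunication package described above can be summarised as follows. The original package is written in Java (translated into C++); this Python rendering only mirrors the primitives named in the text, as an assumption-laden sketch rather than the distributed code.

```python
from abc import ABC, abstractmethod

class BitField(list):
    """Raw bits exchanged with the operating system."""

class Message(ABC):
    @abstractmethod
    def MessageToBitField(self) -> BitField: ...
    # The reciprocal primitive BitFieldToMessage(BitField) is also implemented
    # by the designer for each concrete message type.

class Identifier(ABC):
    @abstractmethod
    def IdentifierToBitField(self) -> BitField: ...

class Communication:
    """Holds the private lists of (Identifier, Message) couples for emission
    and reception, and the battery level supplied by the operating system."""
    def __init__(self):
        self._outbox, self._inbox, self._battery = [], [], 100.0

    def SendMessage(self, identifier: Identifier, message: Message) -> bool:
        self._outbox.append((identifier, message))
        return True

    def ReceiveMessage(self):
        return self._inbox.pop(0) if self._inbox else None

    def SetBatteryLevel(self, level: float) -> None:   # called by the OS
        self._battery = level

    def GetBitFieldToSend(self) -> BitField:           # polled by the OS
        if not self._outbox:
            return BitField()
        ident, msg = self._outbox.pop(0)
        return BitField(ident.IdentifierToBitField() + msg.MessageToBitField())
```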
Fig. 3. Approach comparison for the unidirectional use case
Evaluation. In order to evaluate and improve such agent software architectures and the cooperation techniques that they involve, we introduce a simulation stage in our development process. The simulation first allowed us to experiment with our approach and the software solutions that we provide for the various problems. We have compared our MAS to three traditional solutions based on ad-hoc protocols. The DSDV protocol (Destination-Sequenced Distance-Vector protocol [12]) and the plain DSR protocol (Dynamic Source Routing protocol [7]) do not appear in this comparison because their efficiency was lower than that of the enhanced version of DSR which uses route maintenance (memorization of the main route). We thereafter call efficiency the ratio between the theoretical useful volume of the optimal route and the volume of each transmitted communication. In the EnvSys project, all agents communicate only with the workstation situated at the exit of the underground river system: it is a unidirectional protocol. In this case, messages are small. For this example, three messages are sent every five seconds. The same scenario is applied for the different protocols. We can see that the benefit of our approach (fig 3) is important in the EnvSys case. Our routing method can deliver all messages quickly with a good efficiency. The higher the number of sensors, the better the reactivity of our approach. We must note that if the system experiences no perturbation or mobility variation, DSR will be better from an efficiency point of view; this is normal because in this case DSR learns all the routes (successions of sensors) allowing communication with the workstation. This is not the case for our approach, which reasons about groups and not about individual sensors. One consequence is that the routes used by the messages with our approach are not optimal. We can also see that our approach supports the addition of a large number of sensors: the number of groups does not explode with the number of sensors, but their density increases.
5
Conclusion
We have presented in this paper a multiagent system to manage wireless communication between agents with respect to energy constraints. We have proposed
a multiagent middleware based on a decentralized self-organization model which allows the application level to abstract away this energy-efficient communication management. We use this middleware in the case of a wireless sensor network. In this application, all the agents use the ASTRO hybrid architecture. The middleware is included in the ASTRO communication module. This middleware allows the openness of the system to be managed: adding a host does not require a manual reconfiguration. Most host dysfunctions should not threaten the functional integrity of the whole system: it is self-adaptive to a sensor power fault. Through the simulation step, we can notice that the multiagent approach, providing an emergent feature which is inferred by the MAS, makes the system tolerant to changes of the environment in which it evolves. The generic aspects of the agents allow us to envisage different applications for this middleware such as diagnosis, risk management, data fusion...
References
1. Calisti, M. et al. An agent-based middleware for adaptative roaming in wireless networks. In Workshop on Agents for Ubiquitous Computing, 2004.
2. Y. Demazeau. From interactions to collective behavior in agent-based systems. In European Conference on Cognitive Science, 1995.
3. N. Graham. The Nicola Mark II a New Rescue Radio for France. In The CREG Journal, volume 38, pages 3–6, December 1999.
4. J.-P. Jamont and M. Occello. Using self-organization for functionnal integrity maintenance of wireless sensor networks. In Proceedings of IEEE International Conference on Intelligent Agent Technology. IEEE Computer Society, 2003.
5. J.-P. Jamont and M. Occello. An adaptive multiagent infrastructure for self-organized physical embodied systems. In IEEE International Symposium on Advanced Distributed Systems, volume LNCS 3061. Springer Verlag, January 2004.
6. J.-P. Jamont, M. Occello, and A. Lagreze. A multiagent system for the instrumentation of an underground hydrographic system. In Proceedings of IEEE International Symposium on Virtual and Intelligent Measurement Systems, 2002.
7. D.-B. Johnson and D.-A. Maltz. Dynamic source routing in ad hoc wireless networks. In Mobile Computing, pages 153–181. Kluwer Academic Publishers, 1996.
8. F. Legras and C. Tessier. Lotto: group formation by overhearing in large teams. In Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS 2003, pages 425–432, Australia, July 2003. ACM.
9. M. Mamei and F. Zambonelli. Self-organization in multi-agent systems: a middleware approach. In AAMAS 2003 Workshop on Engineering Self-Organising Systems, pages 233–248, 2003.
10. M. Occello, Y. Demazeau, and C. Baeijs. Designing organized agents for cooperation in a real time context. In Collective Robotics, volume LNCS/LNAI 1456, pages 25–73. Springer-Verlag, March 1998.
11. M. Occello and J.L. Koning. Multi-agent based software engineering: an approach based on model and software reuse. In From Agent Theory to Agent Implementation II - EMCSR 2000 Symposium, pages 645–657, Vienna, April 2000.
12. C.E. Perkins, E.M. Royer, and S. Das. Highly dynamic destination-sequenced distance-vector (DSDV) routing for mobile computers. In ACM SIGCOMM'94, 1994.
Evaluation of Several Algorithms in Forecasting Flood
C.L. Wu and K.W. Chau
Department of Civil and Structural Engineering, Hong Kong Polytechnic University, Hunghom, Kowloon, Hong Kong, People's Republic of China
[email protected]
Abstract. Precise flood forecasting is desirable so as to have more lead time for taking appropriate prevention measures as well as evacuation actions. Although conceptual prediction models have apparent advantages in assisting physical understanding of the hydrological process, the spatial and temporal variability of watershed characteristics and the number of variables involved in modeling the physical processes render them difficult to manipulate other than by specialists. In this study, two hybrid models, based on genetic algorithm-based artificial neural network and adaptive-network-based fuzzy inference system algorithms, are employed for flood forecasting in a channel reach of the Yangtze River. The new contributions made by this paper are the application of these two algorithms to flood forecasting problems in real prototype cases and the comparison of their performances with a benchmarking linear regression model in this field. It is found that these hybrid algorithms with a "black-box" approach are worthy tools since they not only explore a new solution approach but also demonstrate good accuracy performance.
1 Introduction
Numerical models for flood propagation in a channel reach can broadly be classified into two main categories: conceptual models [1-5]; and empirical models based on system analysis or a "black-box" approach. Huge amounts of data are usually required for the calibration of conceptual models. In many cases, a simple "black-box" model may be preferred for identifying a direct mapping between inputs and outputs. During the past decade, several nonlinear approaches, including artificial neural network (ANN), genetic algorithm (GA), and fuzzy logic, have been employed to solve flood forecasting problems. Smith and Eli [6] applied a back-propagation ANN model to predict discharge and time to peak over a hypothetical watershed. Tokar and Johnson [7] compared ANN models with regression and simple conceptual models. Liong et al. [8] employed an ANN approach for river stage forecasting in Bangladesh. Cheng and Chau [9] employed a fuzzy iteration methodology for reservoir flood control operation. Chau and Cheng [10] performed a real-time prediction of water stage with an ANN approach using an improved back propagation algorithm. Chau [11] calibrated flow and water quality modeling using GA. Cheng et al. [12] combined a fuzzy optimal model with a genetic algorithm to solve multiobjective rainfall-runoff model calibration. Chau [13-14] performed river stage forecasting and rainfall-runoff correlation with the particle swarm optimization technique. Cheng et al. [15] carried out long-term prediction of discharges in Manwan Reservoir using ANN models.
In this paper, two hybrid algorithms, namely, the genetic algorithm-based artificial neural network (ANN-GA) and the adaptive-network-based fuzzy inference system (ANFIS), are applied for flood forecasting in a channel reach of the Yangtze River. To the knowledge of the authors, these types of algorithms have never been applied to hydrological and water resources problems. The new contributions made by this paper are the application of these two algorithms to flood forecasting problems in real prototype cases and the comparison of their performances with a benchmarking linear regression (LR) model in this field.
2 Genetic Algorithm-Based Artificial Neural Network (ANN-GA)
A hybrid integration of ANN and GA, taking advantage of the characteristics of both schemes, may be able to increase solution stability and improve the performance of an ANN model. A genetic algorithm-based artificial neural network (ANN-GA) model is developed here wherein a GA [16] is used to optimize the initial parameters of the ANN before it is trained by the conventional ANN procedure. In the GA sub-model, the objective function used for initializing weights and biases is represented as follows:

$\min J(W, \theta) = \sum_{i=1}^{p} \left| Y_i - f(X_i, W, \theta) \right| \qquad (1)$

where $W$ is the weight, $\theta$ is the bias or threshold value, $i$ is the data sequence index, $p$ is the total number of training data pairs, $X_i$ is the $i$-th input data, $Y_i$ is the $i$-th measured data, and $f(X_i, W, \theta)$ represents the simulated output. The main objective of the sub-model is to determine optimal parameters with minimal accumulated errors between the measured data and the simulated data.
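A compact sketch of the ANN-GA idea behind Equation (1): a small GA evolves initial network parameters that minimise the accumulated absolute error, after which conventional gradient training would refine them. The network size, GA settings, and data below are hypothetical and much smaller than the model used in the paper.

```python
import random, math

def ann(x, params):
    """Tiny 1-3-1 feed-forward network; params packs all weights and biases."""
    w1, b1, w2, b2 = params[0:3], params[3:6], params[6:9], params[9]
    hidden = [math.tanh(w1[j] * x + b1[j]) for j in range(3)]
    return sum(w2[j] * hidden[j] for j in range(3)) + b2

def objective(params, data):
    """Equation (1): accumulated absolute error over the p training pairs."""
    return sum(abs(y - ann(x, params)) for x, y in data)

def ga_init(data, pop_size=30, generations=50):
    """GA sub-model: evolve an initial parameter vector before ANN training."""
    pop = [[random.uniform(-1, 1) for _ in range(10)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: objective(p, data))
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, 10)
            child = a[:cut] + b[cut:]                 # one-point crossover
            i = random.randrange(10)
            child[i] += random.gauss(0, 0.1)          # mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda p: objective(p, data))

# Hypothetical normalized stages (x = upstream stage, y = downstream stage).
data = [(x / 10, 0.6 * x / 10 + 0.2) for x in range(11)]
best = ga_init(data)
print(round(objective(best, data), 3))   # initial weights for subsequent ANN training
```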
3 Adaptive-Network-Based Fuzzy Inference System (ANFIS)
In this study, the output of each rule is taken as a linear combination of the input variables together with a constant term. The final output is the weighted average of each rule's output. The fuzzy rule base comprises the combinations of all categories of variables. As an illustration, the following shows a case with three input variables and a single output variable. Each input variable (x, y, and z) is divided into three categories. Equally spaced triangular membership functions are assigned. The categories are assigned: "low," "medium," and "high." The number of rules in a fuzzy rule base is $c^n$, where $c$ is the number of categories per variable and $n$ the number of variables. The optimal number of categories is obtained through trials and performance comparison. The format of the rule set contains an output $o_{i,j,k}$ for a combination of category $i$ of input variable $x$, category $j$ of input $y$, and category $k$ of input variable $z$.
If a rule is triggered, the corresponding memberships of $x$, $y$, and $z$ will be computed. The weight $w_{i,j,k}$ to be assigned to the corresponding output $o_{i,j,k}$ will be furnished by the result of a specific T-norm operation. The multiplication operation is adopted here. A single weighted average output will then be acquired by combining the outputs from all triggered rules as follows:
$o = \dfrac{\sum_{i,j,k} w_{i,j,k} \cdot o_{i,j,k}}{\sum_{i,j,k} w_{i,j,k}} \qquad (2)$
For this flood forecasting model, some parameters, including those of each triangular membership function and the consequent part of each rule, have to be obtained through learning by the ANN. The algorithm is able to enhance the intelligence when working in uncertain, imprecise, and noisy environments and to accomplish faster convergence. It possesses the characteristics of both neural networks, including learning abilities, optimization abilities, and connectionist structures, and fuzzy control systems, including human-like "if-then" rule thinking and ease of incorporating expert knowledge. In this system, the parameters defining the shape of the membership functions and the consequent parameters of each rule are determined by the back-propagation learning algorithm and the least-squares method, respectively.
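The weighted-average output of Equation (2), with equally spaced triangular memberships and the multiplication T-norm, can be sketched as follows. The linear consequent coefficients would normally be fitted by the hybrid learning described above; here they are hypothetical constants used only to make the example runnable.

```python
import itertools

def triangular(x, centre, half_width):
    """Equally spaced triangular membership function."""
    return max(0.0, 1.0 - abs(x - centre) / half_width)

def anfis_output(inputs, centres, consequents):
    """Equation (2): output = sum(w * o) / sum(w) over all triggered rules,
    with w the product of the memberships (T-norm = multiplication)."""
    half_width = centres[1] - centres[0]
    num = den = 0.0
    for combo in itertools.product(range(len(centres)), repeat=len(inputs)):
        w = 1.0
        for x, c in zip(inputs, combo):
            w *= triangular(x, centres[c], half_width)
        if w > 0.0:                                   # rule is triggered
            o = consequents[combo](*inputs)           # linear consequent o_{i,j,k}
            num += w * o
            den += w
    return num / den if den else 0.0

# Hypothetical first-order consequents o_{i,j,k} = a*(x + y + z) + b*(i + j + k).
centres = [0.0, 0.5, 1.0]                             # "low", "medium", "high"
consequents = {combo: (lambda x, y, z, s=sum(combo): 0.2 * (x + y + z) + 0.1 * s)
               for combo in itertools.product(range(3), repeat=3)}
print(round(anfis_output([0.3, 0.6, 0.9], centres, consequents), 3))
```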
4 Application Case
The studied channel reach from Luo-Shan to Han-Kou is located in the middle of the Yangtze River. The water elevation at Luo-Shan station ranges from 17.3 m during the non-flooding period to 31.0 m during the flooding period, whilst the mean levels are 20.8 m and 27.1 m during the non-flooding and flooding periods, respectively. The key objective of this study is to forecast water stages of the downstream station, Han-Kou, on the basis of their counterparts at the upstream station, Luo-Shan. For the ANN-GA model, a three-layer network is adopted with three input nodes and one output node. As an initial data preprocessing step, the input and output data are normalized to the range between 0 and 1, corresponding to the minimum and the maximum water stages, respectively. ANN-GA models are trained with different numbers of nodes in the hidden layer so as to determine the optimal network geometry for these data sets. A testing set is incorporated so as to avoid the overfitting problem. Training is stopped when the error learning curve of the testing set starts to increase whilst that of the training set is still decreasing. It is found that, amongst them, the architecture with 3 nodes in the hidden layer is optimal. For an ANFIS model, a larger number of categories will furnish higher accuracy, but at the same time will have the disadvantages of larger rule bases and higher computation cost. A trial and error procedure is performed with a view to selecting the appropriate number of variable categories. Careful treatment is also made to avoid overfitting, though it is anticipated that more subspaces for the ANFIS model might
result in better performance. An optimal number of categories of 3 is adopted, after taking into consideration the computational time, the root mean square error in training (RMSE_tra), and the root mean square error in validation (RMSE_vali).
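The preprocessing and stopping rule described above amount to a min–max rescaling between the minimum and maximum stages and monitoring of the testing error; the small sketch below illustrates both with hypothetical error sequences.

```python
def minmax_normalize(stages, low, high):
    """Scale water stages to [0, 1] between the minimum and maximum stage."""
    return [(s - low) / (high - low) for s in stages]

def early_stop(train_errors, test_errors, patience=1):
    """Stop when the testing error starts to rise while the training error
    is still decreasing."""
    for epoch in range(patience, len(test_errors)):
        if (test_errors[epoch] > test_errors[epoch - patience]
                and train_errors[epoch] < train_errors[epoch - patience]):
            return epoch
    return len(test_errors) - 1

# Stage range taken from the text (17.3 m to 31.0 m); the error curves are made up.
print(minmax_normalize([17.3, 24.0, 31.0], low=17.3, high=31.0))
print(early_stop([0.9, 0.5, 0.3, 0.2], [0.8, 0.5, 0.4, 0.45]))  # stops at epoch 3
```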
5 Results and Analysis
The performance comparison of the LR, ANN-GA, and ANFIS models in forecasting 1-day lead time water levels at Han-Kou on the basis of the upstream water levels at Luo-Shan station during the past three days is shown in Figure 1. The fluctuation of the absolute error is largest for the LR model and smallest for the ANFIS model. Table 1 shows the performance comparison using RMSE_tra, RMSE_vali, training time, and number of parameters. The ANFIS model is able to attain the highest accuracy, yet requires less training time than the ANN-GA model. However, it should be noted that the ANFIS model involves a larger number of parameters than the other two models.
Fig. 1. Performance comparison in terms of absolute errors for different algorithms (absolute error in metres versus time in days, for the ANFIS, ANN-GA, and LR models)

Table 1. Performance comparison for different models in flood prediction

Models   RMSE_tra (m)   RMSE_vali (m)   Training time (s)   Number of parameters
LR       0.238          0.237           Nil                 4
ANN-GA   0.213          0.226           135                 16
ANFIS    0.204          0.214           49                  135
Their differences in performance can be explained somewhat by the fact that the LR model can only fit a linear function to input-output data pairs, whilst both the ANN-GA and ANFIS models can contort themselves into complex forms in order to handle non-linear problems. It is justifiable that an ANN-GA model with 16 parameters is more flexible than an LR model with 4 parameters, since the coupling of ANN and GA can take advantage of the local optimization of the ANN and the global optimization of the GA. The results indicate that the local approximation approach of the ANFIS model has better performance in mapping the connectivity of input-output data pairs than the global approximation approach of the ANN-GA model. More importantly, the ANN-GA model entails more training time than the ANFIS model due to the time-consuming searching nature of GA. Nevertheless, with the recent rate of development of computer technology, this will not be a major constraint. As such, it is trusted that hybrid algorithms, including ANN-GA and ANFIS, will have significant potential as alternatives to conventional models in solving hydrological problems.
6 Conclusions
In this paper, two hybrid "black-box" models are applied for real flood forecasting. Both ANN-GA and ANFIS models are able to produce accurate flood predictions for the channel reach between the Luo-Shan and Han-Kou stations in the Yangtze River. Amongst them, the ANFIS model, having the characteristics of both ANN and FIS, is optimal in terms of simulation performance, yet requires a larger number of parameters in comparison with the benchmarking LR model. The ANN-GA model adequately combines the advantage of ANN with the advantage of GA, yet consumes the most computation cost. Both ANN-GA and ANFIS models could be considered as feasible alternatives to conventional models. The new contributions made by this paper are the application of these two algorithms to flood forecasting problems in real prototype cases and the comparison of their performances with a benchmarking model in this field.
Acknowledgement This research was supported by the Internal Competitive Research Grant of Hong Kong Polytechnic University (A-PE26).
References
1. Chau, K.W., Jiang, Y.W.: 3D Numerical Model for Pearl River Estuary. Journal of Hydraulic Engineering ASCE 127(1) (2001) 72-82
2. Chau, K.W., Jin, H.S.: Numerical Solution of Two-Layer, Two-Dimensional Tidal Flow in a Boundary Fitted Orthogonal Curvilinear Coordinate System. International Journal for Numerical Methods in Fluids 21(11) (1995) 1087-1107
3. Chau, K.W., Jin, H.S., Sin, Y.S.: A Finite Difference Model of Two-Dimensional Tidal Flow in Tolo Harbor, Hong Kong. Applied Mathematical Modelling 20(4) (1996) 321-328
4. Chau, K.W., Lee, J.H.W.: Mathematical Modelling of Shing Mun River Network. Advances in Water Resources 14(3) (1991) 101-124
5. Chau, K.W., Lee, J.H.W.: A Microcomputer Model for Flood Prediction with Application. Microcomputers in Civil Engineering 6(2) (1991) 109-121
6. Smith, J., Eli, R.N.: Neural-Network Models of Rainfall-Runoff Process. Journal of Water Resources Planning and Management, ASCE 121(6) (1995) 499-508
7. Tokar, A.S., Johnson, P.A.: Rainfall-Runoff Modeling using Artificial Neural Networks. Journal of Hydrologic Engineering, ASCE 4(3) (1999) 232-239
8. Liong, S.Y., Lim, W.H., Paudyal, G.N.: River Stage Forecasting in Bangladesh: Neural Network Approach. Journal of Computing in Civil Engineering, ASCE 14(1) (2000) 1-8
9. Cheng, C.T., Chau, K.W.: Fuzzy Iteration Methodology for Reservoir Flood Control Operation. Journal of the American Water Resources Association 37(5) (2001) 1381-1388
10. Chau, K.W., Cheng, C.T.: Real-time Prediction of Water Stage with Artificial Neural Network Approach. Lecture Notes in Artificial Intelligence 2557 (2002) 715-715
11. Chau, K.W.: Calibration of Flow and Water Quality Modeling using Genetic Algorithm. Lecture Notes in Artificial Intelligence 2557 (2002) 720-720
12. Cheng, C.T., Ou, C.P., Chau, K.W.: Combining a Fuzzy Optimal Model with a Genetic Algorithm to solve Multiobjective Rainfall-Runoff Model Calibration. Journal of Hydrology 268(1-4) (2002) 72-86
13. Chau, K.W.: River Stage Forecasting with Particle Swarm Optimization. Lecture Notes in Computer Science 3029 (2004) 1166-1173
14. Chau, K.W.: Rainfall-Runoff Correlation with Particle Swarm Optimization Algorithm. Lecture Notes in Computer Science 3174 (2004) 970-975
15. Cheng, C.T., Chau, K.W., Sun, Y.G., Lin, J.Y.: Long-Term Prediction of Discharges in Manwan Reservoir using Artificial Neural Network Models. Lecture Notes in Computer Science 3498 (2005) 1040-1045
16. Goldberg, D.E., Kuo, C.H.: Genetic Algorithms in Pipeline Optimization. Journal of Computing in Civil Engineering ASCE 1(2) (1987) 128-141
Simulation Analysis for On-Demand Transport Vehicles Based on Game Theory
Naoto Mukai1, Jun Feng2, and Toyohide Watanabe1
1 Department of Systems and Social Informatics, Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan
[email protected], [email protected]
2 Hohai University, Nanjing, Jiangsu 210098, China
[email protected]
Abstract. In recent years, on-demand transportation systems (such as demand buses) have attracted attention as new transport systems. Vehicles in on-demand transport systems must take reasonable actions in various situations to increase their profits. However, it is difficult to find convincing solutions in such situations because there are uncertainties about customers and other transport vehicles. Therefore, in this paper, we focus on two issues: "how to control risk?" and "how to compete (or cooperate) with another transport vehicle?". Moreover, we show the decision-making processes for the transport vehicles on the basis of game theory. The profits of transport vehicles are classified into assured and expected rewards. The former represents customers scheduled in advance. The latter represents undetermined customers. Transport vehicles set their routes in consideration of the balance between the two rewards (i.e., the risk). The transport vehicles are classified into several types based on risk policies and transport strategies. Finally, we report the results of simulation experiments.
1
Introduction
A transport system is one of the main functions in city life. Thus, efforts to improve such systems have been made continuously. Technological advances in recent years are ready to change the scene surrounding transport systems. In other words, many people own mobile devices equipped with GPS functions and can report their positions to transport companies (or vehicles). Hence, strategies for transport vehicles will change to more dynamic plans. In fact, it appears that most traditional strategies are static: e.g., bus systems incorporate fixed bus routes (fixed bus stops). In recent years, new strategies called on-demand transportations [1, 2, 3] have been introduced into some local towns. An on-demand transportation system requires no specific routes because transport vehicles of the system visit the positions of customers according to demands as they occur. However, there are some problems in on-demand transportations. Especially, the route decision problem of on-demand transportation is more difficult than that of traditional static transportation systems [4, 5] due to conflicts with other transport systems and uncertainties about
other vehicles (which include trains) and customers. For this reason, current managements of the system are limited to local areas and small numbers of customers. Our purpose is to establish a more effective framework of on-demand transportation systems for companies and customers. Therefore, in this paper we focus on two issues: "how to control risk?" and "how to compete (or cooperate) with another transport vehicle?". It is obvious that the risk for transport vehicles (companies) concerns their profits. The profits can be classified into assured and expected rewards. The former represents customers scheduled in advance. The latter represents undetermined customers. It seems that setting travel routes in consideration of the expected reward in addition to the assured reward is useful for transport vehicles, although they involve risks in varying degrees. Hence, we introduce policies against the risk which control the balance between the two rewards. Moreover, it appears that each transport vehicle has a different transport strategy (i.e., an answer to "which route should be selected?") which affects its profits. It is not always possible to plan better transport strategies than other vehicles because one strategy influences the results of other strategies. Hence, we clarify the decision-making processes in environments where different strategies compete with each other on the basis of game theory. In the results, we classify the transport vehicles into three risk policies (neutral, risk-avoiding, and risk-taking) and three transport strategies (mixed, competitive, and Nash). We also performed simulation experiments and show the characteristics of the transport vehicles. The remainder of this paper is as follows: Section 2 formalizes transport areas and vehicles for simulation experiments. Section 3 describes the risk policies which control the balance between assured and expected rewards. Section 4 describes the transport strategies based on game theory [6]. Section 5 reports results of the simulation experiments. Finally, Section 7 concludes and offers our future works.
2
Environment
In this section, we formalize the environment surrounding transport systems for the simulation experiments.
2.1 Transport Area
A transport area A for transport systems is represented by a graph structure which includes nodes N and edges E, as in Equation (1). A node n represents a picking place (like a bus stop), and the number of customers waiting at n at time t is given by |c(n, t)|. An edge e represents a route of length |e| between picking places. The nodes include one or more depots D for transport vehicles. Basically, vehicles set their transport routes according to occasion demands in advance of their departures. After the setting, the transport vehicles depart from their depots. Finally, the transport vehicles return to their depots through their traveling routes. In this simulation experiment, we deal with pick-up transportation problems and the transport area is a two-way circular graph.
$R = (N, E)$
$N = \{n_1, n_2, \cdots\}$
$E = \{e(n, n') \mid n, n' \in N\}$
$D = \{d_1, d_2, \cdots\} \subseteq N \qquad (1)$

Fig. 1. Example of trend function (average number of occurring demands over time, repeating with a fixed cycle)
Generally, occasion patterns of transport demands are not always the same due to the uncertainties of customers. However, it appears that there is a trend depending on location or time of day. For example, at the start of office hours, people move from housing areas to business areas. Hence, we formalize such occasion trends as a distribution function d(n, t), illustrated in Figure 1. The function represents the average number of demands occurring at node n at time t. The function also has a cycle, like the time of day. Moreover, the uncertainty of the trend is given by a normal probability distribution with average d(n, t) and variance σ². In other words, if the variance σ² is small, the number of occurring demands is almost the same as the average number d(n, t). If not, the number of occurring demands varies widely.
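The trend-plus-noise demand model described above — a cyclic average d(n, t) perturbed by normal noise of variance σ² — can be sketched as follows; the particular trend shape is a hypothetical choice for illustration.

```python
import math, random

def trend(node, t, cycle=24.0, peak=5.0):
    """Cyclic average number of demands d(n, t), e.g. peaking once per 'day'."""
    return peak * max(0.0, math.sin(2.0 * math.pi * t / cycle))

def demands(node, t, sigma2=1.0):
    """Number of occurring demands: normal noise around the trend d(n, t).
    A small sigma2 keeps it close to d(n, t); a large sigma2 makes it vary widely."""
    value = random.gauss(trend(node, t), math.sqrt(sigma2))
    return max(0, round(value))

random.seed(0)
print([demands("n1", t, sigma2=0.1) for t in range(0, 24, 6)])
print([demands("n1", t, sigma2=9.0) for t in range(0, 24, 6)])
```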
2.2 Transport Vehicle
A transport vehicle v is given by the five parameters in Equation (2). The depot is the home depot of the vehicle. The speed is the traveling speed of the vehicle, so that the traveling time between n and n′ is |e(n, n′)|/speed. The capacity is the maximum number of customers that can ride at the same time; if the number of waiting customers |c(n, t)| exceeds this maximum, the remaining customers (|c(n, t)| − capacity) must wait for the arrival of the next vehicle. Moreover, if two vehicles visit the same node at the same time, half of the customers, |c(n, t)|/2, are assigned to each vehicle. The policy and strategy are the risk policy and the transport strategy of the vehicle; these two parameters are discussed in the following sections.

v = (depot, speed, capacity, policy, strategy)    (2)
3 Risk Policy
It appears that the risk for transport vehicles relates to their rewards. In this simulation, the reward of a transport vehicle is the number of riding customers in one traveling cycle (i.e., from start to return). We can classify the
rewards into assured and expected rewards. The former represents customers scheduled in advance, as in Equation (3), where td is the departure time from the depot and R is the traveling route. It means that the transport vehicle is certain to get the reward ra at the nodes on its route (except when vehicles scramble for the same reward). We call vehicles which estimate their rewards by Equation (3) reactive vehicles.

ra = Σ_{n∈R} |c(n, td)|    (3)
The latter represents undetermined customers, as in Equation (4), where tr is the return time to the depot. As described above, the function d(n, t) includes uncertainty represented by a normal probability distribution with variance σ². Thus, the transport vehicle may get a reward re at the nodes on its route in addition to the reward ra (except when vehicles scramble for the same reward).

re = Σ_{n∈R} ∫_{td}^{tr} |d(n, t)| dt    (4)
Setting travel routes in consideration of the expected reward in addition to the assured reward may increase the profit of a vehicle, although such routes involve risks in varying degrees. Hence, we introduce a weight function w(σ²) (0 ≤ w(σ²) ≤ 1) for the expected reward as a policy against risk, so that the total reward can be estimated as in Equation (5). We call vehicles which estimate their rewards by Equation (5) proactive vehicles [7, 8].

r = ra + w(σ²) · re    (5)
In this paper, we adopt Equation (6) as the weight function. The parameter α (α ≥ 0) dictates the attitude of a vehicle against risk. Figure 2 shows the three types of weight function. If α is 1, the attitude of the vehicle is neutral: the weight of the expected reward decreases as the variance σ² grows. If α is more than 1, the attitude of the vehicle is risk-taking; the curve of the risk-taking type is distorted upward, so risk-taking vehicles prefer gambles even when the uncertainty (i.e., the variance σ²) is high. If α is less than 1, the attitude of the vehicle is risk-avoiding; the curve of the risk-avoiding type is distorted downward, so risk-avoiding vehicles prefer assuredness even when the uncertainty is low.

w(σ²) = −σ^(2α) + 1    (6)
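A small sketch of Equations (5) and (6); the route rewards and the σ² value used below are invented numbers, chosen only to show how the three attitudes weight the same expected reward differently:

```python
def weight(sigma2, alpha):
    """Weight of the expected reward, w(sigma^2) = 1 - sigma^(2*alpha)  (Equation 6)."""
    return 1.0 - sigma2 ** alpha          # sigma^(2*alpha) == (sigma^2)^alpha

def total_reward(assured, expected, sigma2, alpha):
    """Estimated total reward r = ra + w(sigma^2) * re  (Equation 5)."""
    return assured + weight(sigma2, alpha) * expected

sigma2 = 0.5
for alpha, label in [(1.0, "neutral"), (1 / 3, "risk-avoiding"), (3.0, "risk-taking")]:
    r = total_reward(assured=4.0, expected=2.0, sigma2=sigma2, alpha=alpha)
    print(label, round(r, 3))
```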
4 Transport Strategy
In this section, we consider a competitive game between two transport vehicles, which is called a synchro game in the field of game theory. Transport strategies are classified into three types: mixed, competitive, and Nash. We show the decision-making processes between two vehicles with specific strategies by using the simple
Fig. 2. Three types of weight function (w(σ²) plotted against σ² for the risk-taking, neutral, and risk-avoiding cases)

Fig. 3. Situation of route selection (vehicles v1 and v2 at node n1 choose between node n2 with reward r2 and node n3 with reward r3)
Table 1. Profit matrix for v1 and v2

                   v2 selects n2       v2 selects n3
v1 selects n2      (r2/2 : r2/2)       (r2 : r3)
v1 selects n3      (r3 : r2)           (r3/2 : r3/2)
situation shown in Figure 3. In this situation, there are two vehicles (v1 and v2) at n1. They must select either n2 or n3 as the next node. For simplicity, we assume that both vehicles adopt the same risk policy, i.e., their estimated rewards take the same values; hence, the estimated rewards of n2 and n3 are given by r2 and r3, respectively. A profit matrix for v1 and v2 is shown in Table 1. The profit matrix represents the rewards of the two vehicles for all possible combinations of their choices. For example, if v1 selects n2 and v2 also selects n2, the rewards of v1 and v2 are both r2/2 (half of r2).
4.1 Mixed Strategy
Mixed strategy is a simple probabilistic behavior for transport vehicles. In the situation above, v1 selects n2 with probability p and n3 with probability (1 − p). In the same way, v2 selects n2 with probability q and n3 with probability (1 − q). These probabilities can be determined in advance because they are independent of the strategies of the other vehicles. For example, if p is 0.5, v1 visits n2 and n3 evenly; if p is 1, v1 always visits n2. The expected value of the total reward is calculated by Equation (7), where R(n2) and R(n3) are the expected values when v1 selects n2 and n3, respectively.

R = p · R(n2) + (1 − p) · R(n3)
R(n2) = q · r2/2 + (1 − q) · r2
R(n3) = q · r3 + (1 − q) · r3/2    (7)
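A small sketch of Equation (7); the values of p, q, r2 and r3 below are arbitrary, chosen only for illustration:

```python
def mixed_expected_reward(p, q, r2, r3):
    """Expected total reward of v1 under the mixed strategy (Equation 7).

    p -- probability that v1 selects n2
    q -- probability that v2 selects n2
    """
    R_n2 = q * r2 / 2 + (1 - q) * r2   # v1 goes to n2: shares r2 if v2 also goes there
    R_n3 = q * r3 + (1 - q) * r3 / 2   # v1 goes to n3: shares r3 if v2 also goes there
    return p * R_n2 + (1 - p) * R_n3, R_n2, R_n3

R, R_n2, R_n3 = mixed_expected_reward(p=0.5, q=0.5, r2=3.0, r3=2.0)
print(R_n2, R_n3, R)   # 2.25 1.5 1.875
```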
4.2 Competitive Strategy
Competitive strategy is an optimal behavior against the mixed strategy. Hence, a vehicle with competitive strategy needs to estimate behaviors of rival
vehicles that use the mixed strategy (i.e., their probabilities of route selection). Once the probability is estimated, the vehicle with the competitive strategy simply selects the node with the higher expected reward. Consequently, the expected value of the total reward is calculated by Equation (8).

R = max(R(n2), R(n3))    (8)
4.3 Nash Strategy
Nash strategy is an optimal behavior when the strategies of other vehicles cannot be estimated. Thus, this strategy is independent of the strategies of other vehicles, as is the mixed strategy. The probability is based on the Nash equilibrium of game theory: the expected value of the total reward for v2 cannot exceed that of v1 regardless of whether v2 selects n2 or n3. Hence, the probability is calculated from the equality R(n2) = R(n3), and the result is shown in Equations (9) and (10). Consequently, the expected value of the total reward is calculated by Equation (11). The Nash strategy can also be regarded as a cooperative strategy because the expected values of both vehicles are the same.

p = (2·r2 − r3) / (r2 + r3)    (9)

(1 − p) = (−r2 + 2·r3) / (r2 + r3)    (10)

R = (3/2) · (r2 · r3) / (r2 + r3)    (11)
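A short sketch of Equations (9)–(11); the reward values are the ones used in the example of the next subsection:

```python
def nash_strategy(r2, r3):
    """Nash-equilibrium selection probability and expected reward (Equations 9-11)."""
    p = (2 * r2 - r3) / (r2 + r3)      # probability of selecting n2
    R = 1.5 * (r2 * r3) / (r2 + r3)    # expected total reward at the equilibrium
    return p, R

p, R = nash_strategy(r2=3.0, r3=2.0)
print(p, 1 - p, R)   # 0.8 0.2 1.8
```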
4.4 Example
Consider the following specific situation. Let r2 be 3 and r3 be 2, and let vehicle v1 adopt the mixed strategy with p = 0.5, i.e., v1 selects n2 and n3 evenly. What strategy should vehicle v2 adopt? Figure 4 shows the expected value of the total reward for v2 when
Fig. 4. Expected value of total reward for v2 as a function of p
p is varied from 0 to 1. R(n2) is the straight solid line and R(n3) is the broken line. If v2 can estimate that the probability p is 0.5, v2 should select n2 as the next node, because R(n2) is greater than R(n3) there. If v2 cannot estimate the probability p, v2 should select the probability 0.8 at which the line R(n2) intersects the line R(n3), because then the expected value for v1 cannot exceed that of v2 regardless of whether v1 selects n2 or n3.
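The claims of this example can be checked with a short script (a sketch only; it evaluates the two reward lines of Figure 4 from v2's point of view):

```python
r2, r3 = 3.0, 2.0

def v2_rewards(p):
    """Expected reward of v2 when v1 selects n2 with probability p."""
    R_n2 = p * r2 / 2 + (1 - p) * r2   # v2 selects n2
    R_n3 = p * r3 + (1 - p) * r3 / 2   # v2 selects n3
    return R_n2, R_n3

print(v2_rewards(0.5))   # (2.25, 1.5): v2 should select n2
# Intersection of the two lines: r2*(1 - p/2) = r3*(1 + p)/2  =>  p = 0.8
print(v2_rewards(0.8))   # (1.8, 1.8): v2 is indifferent at p = 0.8
```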
5 Simulation Experiments
In this section, we report the results of two simulation experiments. The first experiment compares the three risk policies: neutral, risk-avoiding, and risk-taking. The second experiment compares the three transport strategies: mixed, competitive, and Nash.
5.1 Simulation Environment
The environment in these simulation experiments is as follows. The transport area is a two-way circular graph which includes 1 depot, 10 nodes, and 5 two-way branches. The lengths of all edges are set to the same value as the speeds of the vehicles (i.e., every vehicle moves from its current node to any linked node in one unit of time). The values of the distribution function over the area are set randomly between 1 and 5, and the cycle of the distribution function is set to 10. The cycle is repeated 1000 times. As described above, we deal with synchro games between two vehicles; thus, in these experiments, the two vehicles depart from the same depot at the same time and return to the same depot at the same time. The graphs of the simulation results show the average profit (the number of riding customers) for each vehicle when the capacity of the vehicles is set from 10 to 19.
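A sketch of how such a transport area could be represented with plain Python dictionaries; the exact layout of the 5 two-way branches is not specified in the paper, so the ring topology assumed here is only an illustration:

```python
import random

NUM_NODES, CYCLE = 10, 10
nodes = list(range(NUM_NODES))
depot = 0

# Two-way edges arranged as a ring; every edge has unit length (= vehicle speed)
edges = {}
for n in nodes:
    m = (n + 1) % NUM_NODES
    edges[(n, m)] = 1
    edges[(m, n)] = 1

# Trend values d(n, t) drawn randomly between 1 and 5, repeating with a cycle of 10
demand_trend = {(n, t): random.randint(1, 5) for n in nodes for t in range(CYCLE)}
```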
5.2 Risk Policy
We investigate the effect of a vehicle's attitude against risk on its profit. The variance σ² of the occurrence patterns is set to 0.1 and to 0.5. Figure 5 shows the results for reactive and proactive vehicles. These results indicate that the proactivity of vehicles improves their profits, and sufficient capacity maximizes this effect; however, the uncertainty of the occurrence patterns makes the improvement less effective. Figure 6 shows the results for the three risk policies: neutral (α = 1), risk-avoiding (α = 1/3), and risk-taking (α = 3). In result (a), there is no difference among the three risk policies because the risk is very low. In result (b), the risk-taking vehicle shows a good result. The reason is that the average of the distribution function is set to a small value, so the number of waiting customers (who could not ride the previous vehicle) is also small; hence, the expected reward becomes a more important factor for the profit.
5.3 Transport Strategy
We investigate the effect of the transport strategies on the vehicles' profits. The variance σ² of the occurrence patterns is set to 0.1. Figure 7 shows the results for the three transport
Fig. 5. Comparison between reactive and proactive vehicles (profit vs. capacity): (a) Reactive vs. Proactive (σ² = 0.1); (b) Reactive vs. Proactive (σ² = 0.5)

Fig. 6. Comparison among the three risk policies (profit vs. capacity): (a) Policies for risk (σ² = 0.1); (b) Policies for risk (σ² = 0.5)

Fig. 7. Simulation results for transport strategy (profit vs. capacity): (a) Mixed Strategy vs. Comp. Strategy; (b) Mixed Strategy vs. Nash Strategy
strategies: mixed, competitive, and Nash. These results indicate that the competitive and Nash strategies naturally show good results, and sufficient vehicle capacity maximizes their performance. At first view, the two curves of the competitive and Nash
strategies are almost the same. However, we can see that the curve of the competitive strategy starts to increase slightly earlier than the curve of the Nash strategy. This fact suggests that, when the capacity is not sufficient, vehicles should adopt the Nash strategy while learning the actions of rival vehicles in order to estimate their strategies; after learning the actions of the rival vehicles, the vehicles should switch to the competitive strategy.
6 Conclusion
In this paper, we focused on on-demand transportation systems. The service of traditional systems is limited due to conflicts with other transport systems and uncertainties about other transport vehicles. Therefore, we formalized the risk for transport vehicles and classified the policies against this risk into three types: neutral, risk-avoiding, and risk-taking. Furthermore, we showed decision-making processes based on game theory and classified the strategies of transport vehicles into three types: mixed, competitive, and Nash. Finally, we reported the results of simulation experiments. The results indicate that a risk policy and a transport strategy appropriate for the environment can improve the profits of transport systems even though there are conflicts and uncertainties. In future work, we must consider alternating games among transport vehicles (i.e., differential arrival times). Moreover, we would like to extend our theory to actual environments.
Acknowledgment. We would like to thank the Japan Society for the Promotion of Science (JSPS). We also acknowledge Prof. Naohiro Ishii of Aichi Institute of Technology.
References
1. Ohta, M., Shinoda, K., Noda, I., Kurumatani, K., Nakashima, H.: Usability of demand-bus in town area. Technical Report 2002-ITS-11-33, Technical Report of IPSJ (2002) in Japanese.
2. Noda, I., Ohta, M., Shinoda, K., Kumada, Y., Nakashima, H.: Is demand bus reasonable in large scale towns? Technical Report 2003-ICS-131, Technical Report of IPSJ (2003) in Japanese.
3. Harano, T., Ishikawa, T.: On the validity of cooperated demand bus. Technical Report 2004-ITS-19-18, Technical Report of IPSJ (2004) in Japanese.
4. Desrochers, M., Lenstra, J., Savelsbergh, M., Soumis, F.: Vehicle routing with time windows: Optimization and approximation. Vehicle Routing: Methods and Studies (1988) 65–84
5. Solomon, M., Desrosiers, J.: Time window constrained routing and scheduling problems. Transportation Science 22 (1988) 1–13
6. Gibbons, R.: Game Theory for Applied Economists. Princeton University Press (1992)
7. Mukai, N., Feng, J., Watanabe, T.: Dynamic construction of routine patterns for transport vehicles based on ant colony system. IPSJ Journal 46 (2005), to appear.
8. Mukai, N., Feng, J., Watanabe, T.: Proactive route planning based on expected rewards for transport systems. In: Proceedings of the IEEE International Conference on Tools with Artificial Intelligence (2005), to appear.
A Set Theoretic View of the ISA Hierarchy

Yee Chung Cheung, Paul Wai Hing Chung, and Ana Sălăgean
Department of Computer Science, Loughborough University, Loughborough, UK
[email protected], [email protected]

Abstract. ISA (is-a) hierarchies are widely used in the classification and representation of related objects. In terms of assessing the similarity between two nodes, current distance-based approaches suffer from the fact that only parent-child relationships among nodes are captured in the hierarchy. This paper presents the idea of treating a hierarchy as a set rather than as a tree in the traditional view. Established set theory is applied to provide a foundation in which the relations between nodes can be specified mathematically, resulting in a more powerful and logical assessment of the similarity between nodes.

Keywords: fuzzy matching, information retrieval, knowledge representation.
1 Introduction
ISA hierarchies are widely used to classify and represent domain concepts. In their tree structure, the root represents the most general concept and its child nodes represent more specific concepts. Each node can be further decomposed as necessary. In a hierarchy, the parent-child relation is the only one that is explicitly represented between nodes. The mainstream approaches to assessing the similarity between nodes in a hierarchy are based on the idea of conceptual distance. The conceptual distance between two nodes is defined in terms of the length of the shortest path that connects the nodes in the hierarchy (Rada, [5]). However, similarity assessment based on conceptual distance does not always provide satisfactory results. This paper describes an alternative approach that views a hierarchy as a set, which enables richer information to be specified. In this view, established set theory can be applied to both hierarchy specification and similarity assessment. The next section gives a brief description of similarity assessment based on conceptual distance. The third section presents the set theoretic approach. The application of the set theoretic view to similarity assessment is demonstrated in section four. A number of examples of capability matching are used throughout the paper. The paper ends with a discussion and conclusion section.
2 Conceptual Distance Approaches
Figure 1 is a simple capability ontology of programming skills represented as a hierarchy. For example, the term Object-Oriented represents the general concept of object-oriented programming skills, and the term VB means Visual Basic programming skills. The parent-child relation between Object-Oriented and VB can be interpreted as: VB programming skill is a kind of Object-Oriented programming skill. This capability ontology can be used to describe the skills of agents and the capability required to perform specific tasks.
Fig. 1. A simple ISA hierarchy of programming skills (root Programming, with branches such as Logic with Prolog; Object-Oriented with C++ (Borland C++, MS C++), Java and VB (VB5, VB6, VB.Net); Structure with COBOL and RPG; plus the labels Standard, Architect, Developer and Professional)
To identify the most appropriate agent for a given task, it is necessary to assess the goodness of fit (GOF) of an agent's skills against the required capability. The GOF is represented as a number in the interval [0, 100], where the upper limit 100 implies a perfect match. In the following examples, oa refers to the capability of an available agent and or refers to the capability required for performing a task. The following equation, taken from [4], defines GOF based on the distance approach:

GOF = (1 − (IP + EP) / (IR + ER)) × 100

where IP is the number of edges on the path between or and the common ancestor of or and oa; EP is the number of edges on the path between oa and the common ancestor of or and oa; IR is the number of edges on the path between or and the root of the hierarchy; and ER is the number of edges on the path between oa and the root of the hierarchy.
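A small sketch of this distance-based GOF on the hierarchy of Figure 1; only a fragment of the tree is encoded here, as an assumption for illustration:

```python
# Parent relation for a fragment of Figure 1 (assumed encoding)
parent = {
    "Logic": "Programming", "Object-Oriented": "Programming", "Structure": "Programming",
    "C++": "Object-Oriented", "Java": "Object-Oriented", "VB": "Object-Oriented",
    "MS C++": "C++",
}

def path_to_root(node):
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def gof_distance(oa, o_r):          # o_r used instead of "or", which is a Python keyword
    pa, pr = path_to_root(oa), path_to_root(o_r)
    ancestor = next(n for n in pr if n in pa)          # common ancestor of oa and o_r
    EP, IP = pa.index(ancestor), pr.index(ancestor)    # edges from oa / o_r to the ancestor
    ER, IR = len(pa) - 1, len(pr) - 1                  # edges from oa / o_r to the root
    return int((1 - (IP + EP) / (IR + ER)) * 100)      # truncated, as in Table 1

print(gof_distance("VB", "Java"))               # 50  (example 2)
print(gof_distance("C++", "Object-Oriented"))   # 66  (example 4)
```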
Table 1 shows the results of applying this equation to a few examples based on Figure 1. From these examples, it can be seen that this approach does not always produce appropriate GOF values. Consider examples 2 and 3: if the required capability is Java, both available capabilities VB and C++ have the same GOF value of 50. However, an agent who knows C++ may require less effort to learn Java than an agent who knows VB. Another problem can be found in examples 4 and 5. In example 5, the required capability is C++ and the available capability is the general capability Object-Oriented; it has the same GOF as example 4, where the required capability is any Object-Oriented skill and the available capability is C++. Finally, examples 6 and 7 show a serious problem: when either oa or or is the root, the GOF value is always 0.

Table 1. Examples of GOF using the traditional distance approach

Example   oa                or                GOF(oa, or)
1         Java              Java              100
2         VB                Java              50
3         C++               Java              50
4         C++               Object-Oriented   66
5         Object-Oriented   C++               66
6         MS C++            Programming       0
7         Programming       MS C++            0
Chung and Jefferson suggested in [3] that different types of relationship between two nodes in a hierarchy have to be dealt with appropriately. They identified four different categories in matching domain concepts in a hierarchy, which are:

1. oa is the same as or;
2. oa is a descendant of or in the hierarchy;
3. oa is an ancestor of or in the hierarchy;
4. oa and or are on different branches in the hierarchy.
In category 1, as oa is the same as or, i.e. oa = or, it is obvious that GOF(oa, or) = 1. In category 2, oa is a subconcept of or; therefore, if a task requires or then someone who knows oa is suitable, and thus GOF(oa, or) = 1. In category 3, oa is more general than or, which means that oa may or may not be what is required, and a general rule for the domain is needed. In category 4, oa and or are on different branches in the hierarchy. Their investigation concludes that it is inappropriate to apply a general rule to determine the GOF value in this category: nodes on different branches in a hierarchy may or may not be related, and it is up to the domain experts to determine how closely two nodes are related, if at all. On the other hand, in [6] Sussna identified that besides the length of the path, the specificity of the two nodes in the path (measured by their depth in the hierarchy) is an important parameter that affects the distance measure. In his work, a weight
is assigned to each edge in the hierarchy and the total weight of the path between two nodes is calculated. The weights try to capture the fact that for the same path length, nodes lower in the hierarchy seem to be conceptually closer. In another similar work, [2], Agirre and Rigau took the density of concepts in the hierarchy into consideration: concepts in a deeper part of the hierarchy should be ranked closer, and the Conceptual Density [1] formula is used to provide more accurate results. Although the above works improve the assessment of similarity, they still suffer from the nature of the ISA hierarchy where only the parent-child relation is captured.
3 Set Theoretic Approach
We propose to view a hierarchy as a collection of sets and their inclusion relationships. Namely, to each node in the tree we associate a set, and each edge from a parent S to a child A represents the fact that the set A is included in the set S, i.e. A ⊆ S. This corresponds to the intuition that the notion A is conceptually included in the more general notion S. Different children of the same parent may or may not overlap; this also corresponds intuitively to the fact that the concepts may or may not have some degree of similarity. We also quantify the “size” of the sets by defining a measure function on the set of all subsets of the root set. For each such set A its measure is a real number µ(A) with µ(A) ≥ 0. As usual, the measure function has the properties:

1. µ(∅) = 0 (the empty set has size 0);
2. if A ⊆ B then µ(A) ≤ µ(B);
3. if A and B are disjoint then µ(A ∪ B) = µ(A) + µ(B).

We are interested not so much in the sizes of the sets but rather in their relative sizes. For each set A except the root we define the quantity P(A) representing the relative size of A against the size of its parent set S, i.e. P(A) = µ(A)/µ(S). Intuitively this quantifies what proportion of the general concept S is covered by the concept A. Obviously, since A ⊆ S, we have 0 ≤ P(A) ≤ 1. For each parent S having children A1, A2, . . . , Ak we assume we are given P(A1), P(A2), . . . , P(Ak) and P(Ai1 ∩ Ai2 ∩ . . . ∩ Ait) for all 2 ≤ t ≤ k and 1 ≤ i1 < i2 < . . . < it ≤ k. We make an important simplifying assumption, namely that each child is, in a sense, “uniformly distributed” throughout its parent set. More precisely, if a node S has children A1 and A2 which are not disjoint (i.e. A1 ∩ A2 ≠ ∅) and furthermore A1 has children B1 and B2, then B1 appears in the same proportion in A1 ∩ A2 as in A1, that is µ(B1 ∩ A1 ∩ A2)/µ(A1 ∩ A2) = P(B1). In the sequel we will call this assumption “the uniformity property”. We are now ready to define GOF(oa, or) for our model. Intuitively, we want to measure what proportion of the required notion or is covered by the available notion oa. Therefore, we define

GOF(oa, or) = 100 · µ(oa ∩ or) / µ(or)    (1)
We will now look in more detail at how this can be computed according to the positions of oa and or in the hierarchy. A summary is given in Table 2.

If or = oa or or ⊂ oa, then oa ∩ or = or, so (1) becomes GOF(oa, or) = 100. This fits well with the intuition that we have a perfect match in this case.

If oa ⊂ or, then oa ∩ or = oa, so (1) becomes GOF(oa, or) = 100 · µ(oa)/µ(or). This can be computed as follows: assume the path in the tree from oa to its ancestor or consists of the sets oa ⊆ B1 ⊆ B2 ⊆ . . . ⊆ Bu ⊆ or. Then

µ(oa)/µ(or) = (µ(oa)/µ(B1)) · (µ(B1)/µ(B2)) · · · · · (µ(Bu)/µ(or))

hence GOF(oa, or) = 100 · P(oa) · P(B1) · · · P(Bu).

Finally, we have the case when neither of oa and or is included in the other. We look first at the situation where oa and or are siblings, i.e. both are children of the same parent S. We have:

GOF(oa, or) = 100 · µ(oa ∩ or)/µ(or) = 100 · (µ(oa ∩ or)/µ(S)) / (µ(or)/µ(S)) = 100 · P(oa ∩ or)/P(or)

For the more general case when neither of oa and or is included in the other and they are not siblings, GOF(oa, or) can be computed as follows: let S be the common ancestor of oa and or, let the path from oa to S consist of the sets oa ⊆ B1 ⊆ B2 ⊆ . . . ⊆ Bu ⊆ S, and let the path from or to S consist of the sets or ⊆ C1 ⊆ C2 ⊆ . . . ⊆ Cv ⊆ S. Then, as before, we have µ(oa)/µ(Bu) = P(oa) · P(B1) · · · P(Bu−1). Due to the uniformity property, oa occupies uniformly a proportion µ(oa)/µ(Bu) of any subset of Bu, in particular of Bu ∩ or. This means

µ(oa ∩ or)/µ(Bu ∩ or) = µ(oa ∩ Bu ∩ or)/µ(Bu ∩ or) = µ(oa)/µ(Bu)

so µ(oa ∩ or) = µ(Bu ∩ or) · µ(oa)/µ(Bu). We still have to compute µ(Bu ∩ or). Since or ⊆ Cv we have µ(Bu ∩ or) = µ(Bu ∩ Cv ∩ or). Again by the uniformity property, Bu ∩ Cv occupies uniformly a proportion µ(Bu ∩ Cv)/µ(Cv) of any subset of Cv, in particular of or. Hence

µ(Bu ∩ or)/µ(or) = µ(Bu ∩ Cv ∩ or)/µ(or) = µ(Bu ∩ Cv)/µ(Cv)

so µ(Bu ∩ or) = µ(or) · µ(Bu ∩ Cv)/µ(Cv). So we can compute

GOF(oa, or) = 100 · µ(oa ∩ or)/µ(or) = 100 · µ(Bu ∩ or) · µ(oa) / (µ(Bu) · µ(or)) = 100 · µ(Bu ∩ Cv) · µ(oa) / (µ(Cv) · µ(Bu))

and finally

GOF(oa, or) = 100 · (P(Bu ∩ Cv)/P(Cv)) · P(oa) · P(B1) · · · P(Bu−1).
Table 2. Formulae for GOF in the set approach

Case                   GOF(oa, or)                                            Explanations
General formula        100 · µ(oa ∩ or)/µ(or)
oa = or                100
oa ⊃ or                100
oa ⊂ or                100 · P(oa) · P(B1) · · · P(Bu)                        oa ⊆ B1 ⊆ . . . ⊆ Bu ⊆ or
oa and or siblings     100 · P(oa ∩ or)/P(or)
oa and or arbitrary    100 · (P(Bu ∩ Cv)/P(Cv)) · P(oa) · P(B1) · · · P(Bu−1)  oa ⊆ B1 ⊆ . . . ⊆ Bu ⊆ S,  or ⊆ C1 ⊆ . . . ⊆ Cv ⊆ S
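As an illustration of the formulae in Table 2, a minimal sketch that evaluates GOF from given relative sizes P; the numeric P values used below are invented for illustration only:

```python
def gof_descendant(P_oa, P_path):
    """oa strictly below or: GOF = 100 * P(oa) * P(B1) * ... * P(Bu)."""
    g = 100.0 * P_oa
    for P_B in P_path:
        g *= P_B
    return g

def gof_siblings(P_intersection, P_or):
    """oa and or children of the same parent: GOF = 100 * P(oa ∩ or) / P(or)."""
    return 100.0 * P_intersection / P_or

def gof_general(P_Bu_Cv, P_Cv, P_oa, P_path):
    """General case: GOF = 100 * P(Bu ∩ Cv)/P(Cv) * P(oa) * P(B1) * ... * P(Bu-1)."""
    g = 100.0 * P_Bu_Cv / P_Cv * P_oa
    for P_B in P_path:
        g *= P_B
    return g

print(gof_descendant(P_oa=0.5, P_path=[0.4]))        # 20.0
print(gof_siblings(P_intersection=0.1, P_or=0.4))    # 25.0
print(gof_general(0.1, 0.4, 0.5, []))                # 12.5
```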
Table 2 summarises the formulae for computing GOF using the set approach. Examples of computations of GOF using this definition will be given in the next section. We note that storing at each node all the quantities P(Ai1 ∩ Ai2 ∩ . . . ∩ Ait) for all 1 ≤ t ≤ k and 1 ≤ i1 < i2 < . . . < it ≤ k can lead to excessive memory usage. It also provides a level of detailed information that often will be neither available nor needed. We can simplify the representation as follows. For each parent S having children A1, A2, . . . , Ak we assume we are given P(A1 ∩ A2 ∩ . . . ∩ Ak) and, optionally, P(A1), P(A2), . . . , P(Ak) and P(Ai1 ∩ Ai2 ∩ . . . ∩ Ait) for all 2 ≤ t < k and 1 ≤ i1 < i2 < . . . < it ≤ k. If we are not given P(Ai1 ∩ Ai2 ∩ . . . ∩ Ait) for some t and some 1 ≤ i1 < i2 < . . . < it ≤ k, we assume by default that Ai1 ∩ Ai2 ∩ . . . ∩ Ait = A1 ∩ A2 ∩ . . . ∩ Ak, and therefore P(Ai1 ∩ Ai2 ∩ . . . ∩ Ait) = P(A1 ∩ A2 ∩ . . . ∩ Ak). If P(A1), P(A2), . . . , P(Ak) are not given, then we assume by default that P(A1) = P(A2) = . . . = P(Ak) and S = A1 ∪ A2 ∪ . . . ∪ Ak. We can then deduce the values P(Ai) using the inclusion-exclusion formula:

µ(A1 ∪ A2 ∪ . . . ∪ Ak) = Σ_{i=1}^{k} µ(Ai) − Σ_{1≤i1<i2≤k} µ(Ai1 ∩ Ai2) + . . . + (−1)^{k+1} µ(A1 ∩ A2 ∩ . . . ∩ Ak)
3-If rk > 0.75 then set µk = µk/2.
4-If rk < 0.25 then set µk = 2·µk.
5-If VN(xk + pk) < VN(xk) then the new iteration is accepted.
6-If the stopping condition for the training is not met, return to step 2.
On-Line Learning of a Time Variant System
3 On-Line Version

As pointed out before, the difficulties come from computing the derivatives for the Hessian matrix, inverting this matrix and computing the trust region, the region for which the approximation contained in the calculation of the Hessian matrix is valid. In the literature, some attempts to build on-line versions can be found, namely the work by Ngia [4], who developed a modified iterative Levenberg-Marquardt algorithm which includes the calculation of the trust region, and the work in [5], which implements a Levenberg-Marquardt algorithm in sliding window mode for Radial Basis Functions.

3.1 A Double Sliding Window Approach with Early Stopping

The current work is an evolution of the one presented in [6], where an on-line version of the Levenberg-Marquardt algorithm was implemented using a sliding window with Early Stopping and a static test set. In the present work two sliding windows are used, one for the training set and another for the evaluation set, with all the data being collected on-line. As in the previous work, the Early Stopping technique [7], [8] is used to avoid the overfitting problem, because it is almost mandatory to employ a technique to avoid overtraining when dealing with systems that are subject to noise. The Early Stopping technique was chosen over other techniques that could have been used (like regularization and pruning techniques) because it has a smaller computational burden. The use of two sliding windows introduces some difficulties, since both data sets change during the training and evaluation phases. For these two windows it is necessary to decide their relative position. In order to be able to perform Early Stopping in a valid way, it was decided to place the windows so that new samples go into the test window and the samples that are removed from the test set go into the training set, according to Figure 1.
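A minimal sketch of this window arrangement (deque-based; the names and window sizes are illustrative, not taken from the original implementation, although 200 matches the training window size reported later):

```python
from collections import deque

test_window = deque(maxlen=100)     # most recent samples
train_window = deque(maxlen=200)    # older samples, used for training

def add_sample(sample):
    """New samples enter the test window; samples leaving it move to the training window."""
    if len(test_window) == test_window.maxlen:
        train_window.append(test_window[0])   # oldest test sample slides into training
    test_window.append(sample)                # newest sample enters the test window
```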
Fig. 1. Relative position of the training and test sets
If the inverse relative positioning of the two windows were used, samples would become part of the test set after having been part of the training set, and so the objective of evaluating the generalization ability would be somewhat faked. In order to save some of the time necessary to collect all the samples needed to fill both the test and training windows, the training is started after some data has been
collected but before the windows are both filled. The test window always keeps the same number of samples, while the training window is growing in the initial stage. The choice of maintaining the test window always with the same number of points was taken with the objectives of maintaining this window as stable as possible (since it is responsible for producing the numerical evaluation of the models) and assuming the use of a minimal test window that should not be shortened. The windows may not change in each training iteration since all the time between sampling is used for training which may permit several training epochs before a new sample is collected. But each time the composition of the windows is changed the test and training errors will probably be subjected to an immediate change that might be interpreted as an overtraining situation. The Early Stopping technique is here used in conjunction with a measure of the best model that is retained for control. Each time there is a change in the windows the values of the best models (direct and inverse) must be re-evaluated because the previous ones, obtained over a different test set, are no longer valid for a direct comparison. The procedure used for the identification of the direct model on-line is represented in figure 2. As was already explained, training starts when a predefined amount of points have been collected. After each epoch the ANN is evaluated with a test set. The value of the Mean Square Error (MSE) obtained is used to perform Early Stopping and to retain the best models. The conditions for overtraining and the maximum number of epochs are then verified. If they are true, the Flag, which indicates that the threshold of quality has been reached, will also be verified and if it is on, the training of the inverse model starts, otherwise the models will be reset since new models need to be prepared. Resetting here means that the model’s weights are replaced by random values between -1 and 1 as in used in the initial models. After testing the conditions for overtraining and the maximum number of epochs, if they are both false, the predefined threshold of quality will also be tested and if it has been reached the variable Flag will be set to on. In either case the remaining time of the sampling period is tested to decide if a new epoch is to be performed or if a new sample is to be collected and training is to be performed with this new sample included in the sliding window. Each time a new sample is collected both the direct and inverse models must be re-evaluated with the new test set and the information about the best models updated. The procedure for the inverse model is very similar and almost the same block diagram could be used to represent it. The on-line training goes on switching from direct to inverse model each time a new model is produced. The main difference between the procedure for direct and inverse model lies in the evaluation step. While the direct model is evaluated with a simple test set, the inverse model is evaluated with a control simulation corresponding to the hybrid Direct/Specialized approach for generating inverse models [9]. During the on-line training the NNSYSID [10] and NNCTRL [11] toolboxes for MATLAB were used.
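A condensed structural sketch of the procedure in Figure 2; every callable passed in (training epoch, evaluation, snapshot/restore of weights, reset, window update) is a hypothetical placeholder standing for the NNSYSID/NNCTRL routines actually used in the paper:

```python
def identify_on_line(train_epoch, evaluate, snapshot, evaluate_weights, reset,
                     window_changed, quality_threshold, max_epochs, patience):
    """Sketch of the on-line identification loop with Early Stopping and a quality flag."""
    best_mse, best, flag = float("inf"), None, False
    epoch = bad = 0
    while True:
        train_epoch()                                  # one Levenberg-Marquardt epoch on the training window
        mse = evaluate()                               # MSE on the current test window
        if mse < best_mse:
            best_mse, best, bad = mse, snapshot(), 0   # retain the best model so far
        else:
            bad += 1                                   # possible overtraining
        flag = flag or mse < quality_threshold         # threshold of quality reached?
        epoch += 1
        if bad > patience or epoch > max_epochs:       # stop this attempt
            if flag:
                return best                            # proceed to training the inverse model
            reset()                                    # otherwise discard and start a new model
            best_mse, best, flag, epoch, bad = float("inf"), None, False, 0, 0
        if window_changed():                           # a new sample slid into the windows
            if best is not None:                       # previous best value is no longer comparable
                best_mse = evaluate_weights(best)
```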
Fig. 2. Block diagram for the identification of a direct model on-line
4 Time Variant System

The time variant system used for this test is a cruise control with a variable gain, according to Equation (14):

G(s) = 0.05/(s + 0.05)   if sample ≤ 500
G(s) = 0.075/(s + 0.05)  if sample > 500    (14)

that is, the gain is increased by 50% after sample 500 is collected. The system used is rather simple, but the time variance introduced allows testing the functionality of the proposed algorithm.
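For reference, a small sketch that simulates this gain switch; the zero-order-hold discretization and the constant input used here are choices made for the illustration, and the 150 s sampling period is taken from the next section:

```python
import math

Ts, pole = 150.0, 0.05                    # sampling period [s] and pole of G(s) = K/(s + 0.05)
a = math.exp(-pole * Ts)                  # exact (zero-order-hold) discretization of the pole

def simulate(n_samples=1000, u=1.0):
    y, out = 0.0, []
    for k in range(n_samples):
        K = 0.05 if k <= 500 else 0.075   # gain increased by 50% after sample 500
        b = (K / pole) * (1.0 - a)        # discrete input gain for this K
        y = a * y + b * u                 # y[k+1] = a*y[k] + b*u[k]
        out.append(y)
    return out

y = simulate()
print(round(y[499], 3), round(y[999], 3))   # steady-state gain 1.0 before, 1.5 after the switch
```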
5 Results

The test sequence is composed of 100 points, the sliding window used for training has a maximum of 200 samples, and training starts after 240 samples have been collected. Both direct and inverse models were one-hidden-layer models with 6 neurons in the hidden layer and one linear output neuron. The direct model has as inputs the past two samples of both the output of the system and the control signal. The sampling period used was 150 seconds, which allowed performing several epochs of training between each control iteration. During the initial phase of collecting data a PI was used in order to keep the system operating within the range of interest. The PI parameters are Kp=0.01 and Ki=0.01. After this initial phase the PI is replaced by an IMC controller, using the direct and inverse models trained on-line. The first inverse model is ready at sample 243, that is, only 2 samples after the training has started. After the 240 samples have been collected, it only took one sampling period to complete the training of the direct model and another sampling period to complete the inverse model, even though the Matlab code was running on a personal computer with a Celeron processor at 466 MHz using 64 Mbytes of memory.
Fig. 3. Result obtained using the IMC control strategy and the proposed on-line learning strategy with a time variant system
As can be seen from Figure 3, after the gain is changed the quality of control decreases and an oscillation appears that can reach 20 to 30 degrees. For a new model to be prepared, a test window and a training window with data that belong to the new system need to be completed. The necessary points are collected over the next 300 samples. Since the system was changed at sample 500, the renewal of the data in the sliding windows is only complete at sample 800. During this stage training and evaluation are quite difficult since there is a mixture of data from the two systems, but once the two sliding windows are filled with data from the new system the proposed solution is able to create accurate models: the quality of control is re-established at sample 795.
6 Conclusion

This paper presents on-line identification and control of a time-varying system using the Levenberg-Marquardt algorithm in a batch version with two sliding windows and Early Stopping. The problems pointed out in Section 3 regarding performing Early Stopping under a changing sliding window for the training set were not critical, and a good choice of the parameters for identifying the overtraining situation and of the maximum number of iterations for each attempt to create a model was sufficient to obtain reasonable models for IMC control. The PI controller is here used only to maintain the system in the operating range while data is being collected, and it is disconnected as soon as the ANN models are ready. As shown here, even for a noisy system, for which overtraining is a real problem, it is possible to create models on-line of acceptable quality, as can be concluded from the values presented in Table 1. The artificial time variant system used in this experiment is an extreme situation compared with most real time variant systems, which vary slowly. Nevertheless, the successful application of the Levenberg-Marquardt sliding window solution to this situation shows that it will also work for slowly varying systems. With this artificial time variant system it can be seen that learning is very difficult when the sliding windows contain data from the system before and after the change; this corresponds to training an ANN with mixed data from two different systems. Once this situation is overcome, the models of the new system are rapidly obtained. This problem would not occur for a slowly changing system. The sliding window solution with Early Stopping for the Levenberg-Marquardt algorithm is very interesting since it does not limit the capabilities of the algorithm and overcomes the limitations of application of the traditional solution.
References 1. K. Levenberg, “A method for the solution of certain problems in least squares,” Quart. Appl. Math., vol. 2, pp. 164–168, 1944. 2. D. Marquardt, “An algorithm for least -squares estimation of nonlinear parameters,” SIAM J. Appl. Math., vol. 11, pp. 431–441, 1963.
3. M. Nørgaard, O. Ravn, N. K. Poulsen, and L. K. Hansen, Neural Networks for Modelling and Control of Dynamic Systems, Springer, 2000. 4. Lester S. H. Ngia, System Modeling Using Basis Functions and Application to Echo Cancelation, Ph.D. thesis, Department of Signals and Systems School of Electrical and Computer Engineering, Chalmers University of Technology, 2000. 5. P. Ferreira, E. Faria, and A. Ruano, “Neural network models in greenhouse air temperature prediction,” Neurocomputing, vol. 43, no. 1-4, pp. 51–75, 2002. 6. Fernando Morgado Dias, Ana Antunes, Jos´e Vieira, and Alexandre Manuel Mota, “Implementing the levenberg-marquardt algorithm on-line: A sliding window approach with early stopping,” 2nd IFAC Workshop on Advanced Fuzzy/Neural Control, 2004. 7. N. Morgan and H. Bourlard, “Generalization and parameter estimation in feedforward nets: Some experiments.,” Advances in Neural Information Processing Systems, Ed. D.Touretzsky, Morgan Kaufmann, pp. 630–637, 1990. 8. Jonas Sj¨oberg, Non-Linear System Identification with Neural Networks, Ph.D. thesis, Dept. of Electrical Engineering, Link¨oping University, Su´ecia, 1995. 9. Fernando Morgado Dias, Ana Antunes, and Alexandre Mota, “A new hybrid direct/specialized approach for generating inverse neural models,” WSEAS Transactions on Systems, vol. 3, Issue 4, pp. 1521–1529, 2004. 10. M. Nørgaard, “Neural network based system identification toolbox for use with matlab, version 1.1, technical report,” Tech. Rep., Technical University of Denmark, 1996. 11. M. Nørgaard, “Neural network based control system design toolkit for use with matlab, version 1.1, technical report,” Tech. Rep., Technical University of Denmark, 1996.
Bioinformatics Integration Framework for Metabolic Pathway Data-Mining

Tomás Arredondo V.¹, Michael Seeger P.², Lioubov Dombrovskaia³, Jorge Avarias A.³, Felipe Calderón B.³, Diego Candel C.³, Freddy Muñoz R.³, Valeria Latorre R.², Loreine Agulló², Macarena Cordova H.², and Luis Gómez²
2
Departamento de Electr´ onica
[email protected] Millennium Nucleus EMBA, Departamento de Qu´ımica 3 Departamento de Inform´ atica, Universidad T´ecnica Federico Santa Mar´ıa Av. Espa˜ na 1680, Valpara´ıso, Chile
Abstract. A vast amount of bioinformatics information is continuously being introduced into different databases around the world. Handling the various applications used to study this information presents a major data management and analysis challenge to researchers. The present work investigates the problem of integrating heterogeneous applications and databases towards providing a more efficient data-mining environment for bioinformatics research. A framework is proposed, and GeXpert, an application using the framework for metabolic pathway determination, is introduced. Some sample implementation results are also presented.
1
Introduction
Modern biotechnology aims to provide powerful and beneficial solutions in diverse areas. Some of the applications of biotechnology include biomedicine, bioremediation, pollution detection, marker assisted selection of crops, pest management, biochemical engineering and many others [3, 14, 22, 21, 29]. Because of the great interest in biotechnology there has been a proliferation of separate and disjoint databases, data formats, algorithms and applications. Some of the many types of databases currently in use include: nucleotide sequences (e.g. Ensemble, Genbank, DDBJ, EMBL), protein sequences (e.g. SWISS-PROT, InterPro, PIR, PRF), enzyme databases (e.g. Enzymes), metabolic pathways (e.g. ERGO, KEGG: Kyoto Encyclopedia of Genes and Genomes database) and literature references (e.g. PubMed) [1, 4, 11, 6, 23]. The growth in the amount of information stored has been exponential: since the 1970s, as of April 2004 there were over 39 billion bases in Entrez NCBI (National Center of Bioinformatics databases), while the number of abstracts in PubMed has been growing by 10,000 abstracts per week since 2002 [3, 13]. The availability of this data has undoubtedly accelerated biotechnological research. However, because these databases were developed independently and are M. Ali and R. Dapoigny (Eds.): IEA/AIE 2006, LNAI 4031, pp. 917–926, 2006. c Springer-Verlag Berlin Heidelberg 2006
918
T. Arredondo V. et al.
managed autonomously, they are highly heterogeneous, hard to cross-reference, and ill-suited to process mixed queries. Also depending on the database being accessed the data in them is stored in a variety of formats including: a host of graphic formats, RAW sequence data, FASTA, PIR, MSF, CLUSTALW, and other text based formats including XML/HTML. Once the desired data is retrieved from the one of the database(s) it typically has to be manually manipulated and addressed to another database or application to perform a required action such as: database and homology search (e.g. BLAST, Entrez), sequence alignment and gene analysis (e.g. ClustalW, T-Coffee, Jalview, GenomeScan, Dialign, Vector NTI, Artemis) [8]. Beneficial application developments would occur more efficiently if the large amounts of biological feature data could be seamlessly integrated with data from literature, databases and applications for data-mining, visualization and analysis. In this paper we present a framework for bioinformatic literature, database, and application integration. An application based of this framework is shown that supports metabolic pathway research within a single easy to use graphical interface with assisting fuzzy logic decision support. To our knowledge, an integration framework encompassing all these areas has not been attempted before. Early results have shown that the integration framework and application could be useful to bioinformatics researchers. In Section 2, we describe current metabolic pathway research methodology. In Section 3 we describe existing integrating architectures and applications. Section 4 describes the architecture and GeXpert, an application using this framework is introduced. Finally, some conclusions are drawn and directions of future work are presented.
2
Metabolic Pathway Research
For metabolic pathway reconstruction experts have been traditionally used a time intensive iterative process [30]. As part of this process genes first have to be selected as candidates for encoding an enzyme within a potential metabolic pathway within an organism. Their selection then has to be validated with literature references (e.g. non-hypothetical genes in Genbank) and using bioinformatics tools (e.g. BLAST: Basic Local Alignment Search Tool) for finding orthologous genes in various other organisms. Once a candidate gene has been determined, sequence alignment of the candidate gene with the sequence of the organism under study has to be performed in a different application for gene locations (e.g. Vector NTI, ARTEMIS). Once the genes required have been found in the organism then the metabolic pathway has to be confirmed experimentally in the laboratory. For example, using the genome sequence of Acidithiobacillus ferrooxidans diverse metabolic pathways have been determined. One major group of organisms that is currently undergoing metabolic pathways research is bacteria. Bacteria possess the highest metabolic versatility of the three domains of living organisms. This versatility stems from their expansion into different natural niches, a remarkable degree of physiological and genetic
Bioinformatics Integration Framework for Metabolic Pathway Data-Mining
919
adaptability and their evolutionary diversity. Microorganisms play a main role in the carbon cycle and in the removal of natural and man-made waste chemical compounds from the environment [17]. For example, Burkholderia xenovorans LB400 is a bacterium capable of degrading a wide range of PCBs [7, 29]. Because of the complex and noisy nature of the data, any selection of candidate genes as part of metabolic pathways is currently only done by human experts prior to biochemical verification. The lack of integration and standards in database, application and file formats is time consuming and forces researchers to develop ad hoc data management processes that could be prone to error. In addition, the possibility of using Softcomputing based pattern detection and analysis techniques (e.g. fuzzy logic) have not been fully explored as an aid to the researcher within such environments [1, 23].
3
Integration Architectures
The trend in the field is towards data integration. Research projects continue to generate large amounts of raw data, and this is annotated and correlated with the data in the public databases. The ability to generate new data continues to outpace the ability to verify them in the laboratory and therefore to exploit them. Validation experiments and the ultimate conversion of data to validated knowledge need expert human involvement with the data and in the laboratory, consuming time and resources. Any effort of data and system integration is an attempt towards reducing the time spent by experts unnecessarily which could be better spent in the lab [5, 9, 20]. The biologist or biochemist not only needs to be an expert in his field as well stay up to date with the latest software tools or even develop his own tools to be able to perform his research [16, 28]. One example of these types of tools is BioJava, which is an open source set of Java components such as parsers, file manipulation tools, sequence translation and proteomic components that allow extensive customization but still require the development of source code [25]. The goal should be integrated user-friendly systems that would greatly facilitate the constructive cycle of computational model building and experimental verification for the systematic analysis of an organism [5, 8, 16, 20, 28, 30]. Sun et al. [30] have developed a system, IdentiCS, which combines the identification of coding sequences (CDS) with the reconstruction, comparison and visualization of metabolic networks. IdentiCS uses sequences from public databases to perform a BLAST query to a local database with the genome in question. Functional information from the CDSs is used for metabolic reconstruction. One shortcoming is that the system does not incorporate visualization or ORF (Open Reading Frames) selection and it includes only one application for metabolic sequence reconstruction (BLAST). Information systems for querying, visualization and analysis must be able to integrate data on a large scale. Visualization is one of the key ways of making the researcher work easier. For example, the spatial representation of the genes within the genome shows location of the gene, reading direction and metabolic
920
T. Arredondo V. et al.
function. This information is made available by applications such as Vector NTI and Artemis. The function of the gene is interpreted through its product, normally the protein in a metabolic pathway, which description is available from several databases such as KEGG or ERGO [24]. Usually, the metabolic pathways are represented as a flowchart of several levels of abstraction, which are constructed manually. Recent advances on metabolic network visualization include virtual reality use [26], mathematical model development [27], and a modeling language [19]. These techniques synthesize the discovery of genes and are available in databases such as Expasy, but they should be transparent to the researcher by their seamless integration into the research application environment. The software engineering challenges and opportunities in integrating and visualizing the data are well documented [16, 28]. There are some applications servers that use a SOAP interface to answer queries. Linking such tools into unified working environment is non-trivial and has not been done to date [2]. One recent approach towards human centered integration is BioUse, a web portal developed by the Human Centered Software Engineering Group at Concordia University. BioUse provided an adaptable interface to NCBI, BLAST and ClustalW, which attempted to shield the novice user from unnecessary complexity. As users became increasingly familiar with the application the portal added shortcuts and personalization features for different users (e.g. pharmacologists, microbiologists) [16]. BioUse was limited in scope as a prototype and its website is no longer available. Current research is continued in the CO-DRIVE project but it is not completed yet [10].
4
Integration Framework and Implementation
This project has focused on the development of a framework using open standards towards integrating heterogeneous databases, web services and applications/tools. This framework has been applied in GeXpert, an application for bioinformatics metabolic pathway reconstruction research in bacteria. In addition this application includes the utilization of fuzzy logic for helping in the selection of the best candidate genes or sequences for specified metabolic pathways. Previously, this categorization was typically done in an ad hoc manner by manually combining various criteria such as e-value, identities, gaps, positives and score. Fuzzy logic enables an efficiency enhancement by providing an automated sifting mechanism for a very manually intensive bioinformatics procedure currently being performed by researchers. GeXpert is used to find, build and edit metabolic pathways (central or peripheral), perform protein searches in NCBI, perform nucleotide comparisons of organisms versus the sequenced one (using tblastn). The application can also perform a search of 3D models associated with a protein or enzyme (using the Cn3D viewer), generation of ORF diagrams for the sequenced genome, generation of reports relating to the advance of the project and aid in the selection of BLAST results using T-S-K (Takagi Sugeno Kang) fuzzy logic.
Bioinformatics Integration Framework for Metabolic Pathway Data-Mining
921
Fig. 1. High level framework architecture
The architecture is implemented in three layers: presentation, logic and data [18]. As shown in Figure 1, the presentation layer provides for a web based as well as a desktop based interface to perform the tasks previously mentioned.0 The logical layer consists of a core engine and a web server. The core engine performs bioinformatic processing functions and uses different tools as required: BLAST for protein and sequence alignment, ARTEMIS for analysis of nucleotide sequences, Cn3D for 3D visualization, and a T-S-K fuzzy logic library for candidate sequence selection. The Web Server component provides an HTTP interface into the GeXpert core engine. As seen in Figure 1, the data layer includes a communications manager and a data manager. The communication manager is charged with obtaining data (protein and nucleotide sequences, metabolic pathways and 3D models) from various databases and application sources using various protocols (application calls, TCP/IP, SOAP). The data manager implements data persistence as well as temporary cache management for all research process related objects. 4.1
Application Implementation: GeXpert
GeXpert [15] is an open source implementation of the framework previously described. It relies on Eclipse [12] builder for multiplatform support. The GeXpert desktop application as implemented consists of the following: – Metabolic pathway editor: tasked with editing or creating metabolic pathways, shows them as directed graphs of enzymes and compounds. – Protein search viewer: is charged with showing the results of protein searches with user specified parameters. – Nucleotide search viewer: shows nucleotide search results given user specified protein/parameters.
922
T. Arredondo V. et al.
– ORF viewer: is used to visualize surrounding nucleotide ORFs with a map of colored arrows. The colors indicate the ORF status (found, indeterminate, erroneous or ignored). The GeXpert core component implements the application logic. It consists of the following elements: – Protein search component: manages the protein search requests and its results. – Nucleotide search component: manages the nucleotide search requests and its results. In addition, calls the Fuzzy component with the search parameters specified in order to determine the best candidates for inclusion into the metabolic pathway. – Fuzzy component: attempts to determine the quality of nucleotide search results using fuzzy criteria [1].The following normalized (0 to 1 values) criteria are used: e-value, identities, gaps. Each has five membership functions (very low, low, medium, high, very high), the number of rules used is 243 (35 ). – ORF component: identifies the ORFs present in requested genome region. – Genome component: manages the requests for genome regions and its results. GeXpert communications manager receives requests from the GeXpert core module to obtain and translates data from external sources. This module consists of the following subcomponents: – BLAST component: calls the BLAST application indicating the protein to be analyzed and the organism data base to be used. – BLAST parser component: translates BLAST results from XML formats into objects. – Cn3D component: sends three-dimensional (3D) protein models to the Cn3D application for display. – KEGG component: obtains and translates metabolic pathways received from KEGG. – NCBI component: performs 3D protein model and document searches from NCBI. – EBI component: performs searches on proteins and documentation from the EBI (European Bioinformatic Institute) databases. GeXpert data manager is tasked with load administrating and storage of application data: – Cache component: in charge of keeping temporary search results to improve throughput. Also implements aging of cache data. – Application data component: performs data persistence of metabolic paths, 3D protein models, protein searches, nucleotide searches, user configuration data, project configuration for future usage.
Bioinformatics Integration Framework for Metabolic Pathway Data-Mining
923
Fig. 2. Metabolic pathway viewer
4.2
Application Implementation: GeXpert User Interface and Workflow
GeXpert is to be used in metabolic pathway research work using a research workflow similar to identiCS [30]. The workflow and some sample screenshots are given: 1. User must provide the sequenced genome of the organism to be studied. 2. The metabolic pathway of interest must be created (can be based on the pathway of a similar organism). In the example in Figure 2, the glycolisis/gluconeogenesis metabolic pathway was imported from KEGG. 3. For each metabolic pathway a key enzyme must be chosen in order to start a detailed search. Each enzyme could be composed of one or more proteins
Fig. 3. Protein search viewer
924
T. Arredondo V. et al.
Fig. 4. Nucleotide search viewer
4.
5.
6. 7.
8. 9. 10.
5
(subunits). Figure 3 shows the result of the search for the protein homoprotocatechuate 2,3-dioxygenase. Perform a search for amino acid sequences (proteins) in other organisms. Translate these sequences into nucleotides (using tblastn) and perform an alignment in the genome of the organism under study. If this search does not give positives results it return to the previous step and search another protein sequence. In Figure 4, we show that the protein homoprotocatechuate 2,3-dioxygenase has been found in the organism under study (Burkholderia xenovorans LB400) in the chromosome 2 (contig 481) starting in position 2225819 and ending in region 2226679. Also the system shows the nucleotide sequence for this protein. For a DNA sequence found, visualize the ORF map (using the ORF Viewer) and identify if there is an ORF that contains a large part of said sequence. If this is not the case go back to the previous step and chose another sequence. Verify if the DNA sequence for the ORF found corresponds to the chosen sequence or a similar one (using blastx). If not choose another sequence. Establish as found the ORF of the enzyme subunit and start finding in the surrounding ORFs sequences capable of coding the other subunits of the enzyme or other enzymes of the metabolic pathway. For the genes that were not found in the surrounding ORFs repeat the entire process. For the enzyme, the 3D protein model can be obtained and viewed if it exists (using CN3D as integrated into the GeXpert interface). The process is concluded by the generation of reports with the genes found and with associated related documentation that supports the information about the proteins utilized in the metabolic pathway.
Conclusions
The integrated framework approach presented in this paper is an attempt to enhance the efficiency and capability of bioinformatics researchers. GeXpert is a
demonstration that the integration framework can be used to implement useful bioinformatics applications. GeXpert has so far proven to be a useful tool for our researchers; it is currently being enhanced with functionality and usability improvements. The current objective of the research group is to use GeXpert to discover new metabolic pathways for bacteria [29]. In addition to our short-term goals, the following development items are planned for the future: using fuzzy logic in batch mode; web services and a web client; blastx for improved verification of the ORFs determined; a multi-user mode to enable multiple users and groups with potentially different roles to share in a common research effort; peer-to-peer communication to enable the interchange of documents (archives, search results, research items), thus enabling a network of collaboration in different or shared projects; and the use of intelligent/evolutionary algorithms to enable learning based on researcher feedback into GeXpert.
Acknowledgements This research was partially funded by Fundación Andes.
References
1. Arredondo, T., Neelakanta, P.S., DeGroff, D.: Fuzzy Attributes of a DNA Complex: Development of a Fuzzy Inference Engine for Codon-'Junk' Codon Delineation. Artif. Intell. Med. 35(1-2) (2005) 87-105
2. Barker, J., Thornton, J.: Software Engineering Challenges in Bioinformatics. Proceedings of the 26th International Conference on Software Engineering, IEEE (2004)
3. Bernardi, M., Lapi, M., Leo, P., Loglisci, C.: Mining Generalized Association Rules on Biomedical Literature. In: Moonis, A., Esposito, F. (eds.): Innovations in Applied Artificial Intelligence. Lect. Notes Artif. Int. 3353 (2005) 500-509
4. Brown, T.A.: Genomes. John Wiley and Sons, NY (1999)
5. Cary, M.P., Bader, G.D., Sander, C.: Pathway information for systems biology (Review Article). FEBS Lett. 579 (2005) 1815-1820
6. Claverie, J.M.: Bioinformatics for Dummies. Wiley Publishing (2003)
7. Cámara, B., Herrera, C., González, M., Couve, E., Hofer, B., Seeger, M.: From PCBs to highly toxic metabolites by the biphenyl pathway. Environ. Microbiol. 6 (2004) 842-850
8. Cohen, J.: Computer Science and Bioinformatics. Commun. ACM 48(3) (2005) 72-79
9. Costa, M., Collins, R., Anterola, A., Cochrane, F., Davin, L., Lewis, N.: An in silico assessment of gene function and organization of the phenylpropanoid pathway metabolic networks in Arabidopsis thaliana and limitations thereof. Phytochem. 64 (2003) 1097-1112
10. CO-Drive project: http://hci.cs.concordia.ca/www/hcse/projects/CO-DRIVE/
11. Durbin, R.: Biological Sequence Analysis. Cambridge, UK (2001)
12. Eclipse project: http://www.eclipse.org
13. Entrez NCBI Database: www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide
14. Gardner, D.: Using genomics to help predict drug interactions. J. Biomed. Inform. 37 (2004) 139-146
15. GeXpert sourceforge page: http://sourceforge.net/projects/gexpert
16. Javahery, H., Seffah, A., Radhakrishnan, T.: Beyond Power: Making Bioinformatics Tools User-Centered. Commun. ACM 47(11) (2004)
17. Jiménez, J.I., Miñambres, B., García, J., Díaz, E.: Genomic insights in the metabolism of aromatic compounds in Pseudomonas. In: Ramos, J.L. (ed.): Pseudomonas, vol. 3. Kluwer Academic Publishers, NY (2004) 425-462
18. Larman, C.: Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and Iterative Development. Prentice Hall PTR (2004)
19. Loew, L.W., Schaff, J.C.: The Virtual Cell: a software environment for computational cell biology. Trends Biotechnol. 19(10) (2001)
20. Ma, H., Zeng, A.: Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics 19 (2003) 270-277
21. Magalhaes, J., Toussaint, O.: How bioinformatics can help reverse engineer human aging. Aging Res. Rev. 3 (2004) 125-141
22. Molidor, R., Sturn, A., Maurer, M., Trajanosk, Z.: New trends in bioinformatics: from genome sequence to personalized medicine. Exp. Gerontol. 38 (2003) 1031-1036
23. Neelakanta, P.S., Arredondo, T., Pandya, S., DeGroff, D.: Heuristics of AI-Based Search Engines for Massive Bioinformatic Data-Mining: An Example of Codon/Noncodon Delineation Search in a Binary DNA Sequence. Proceedings of IICAI (2003)
24. Papin, J.A., Price, N.D., Wiback, S.J., Fell, D.A., Palsson, B.O.: Metabolic Pathways in the Post-genome Era. Trends Biochem. Sci. 18(5) (2003)
25. Pocock, M., Down, T., Hubbard, T.: BioJava: Open Source Components for Bioinformatics. ACM SIGBIO Newsletter 20(2) (2000) 10-12
26. Rojdestvenski, I.: VRML metabolic network visualizer. Comp. Bio. Med. 33 (2003)
27. SBML: Systems Biology Markup Language. http://sbml.org/index.psp
28. Segal, T., Barnard, R.: Let the shoemaker make the shoes - An abstraction layer is needed between bioinformatics analysis, tools, data, and equipment: An agenda for the next 5 years. First Asia-Pacific Bioinformatics Conference, Australia (2003)
29. Seeger, M., Timmis, K.N., Hofer, B.: Bacterial pathways for degradation of polychlorinated biphenyls. Mar. Chem. 58 (1997) 327-333
30. Sun, J., Zeng, A.: IdentiCS - Identification of coding sequence and in silico reconstruction of the metabolic network directly from unannotated low-coverage bacterial genome sequence. BMC Bioinformatics 5:112 (2004)
The Probability Distribution of Distance TSS-TLS Is Organism Characteristic and Can Be Used for Promoter Prediction
Yun Dai 1, Ren Zhang 2, and Yan-Xia Lin 3
1 School of Mathematics and Applied Statistics, University of Wollongong, Australia
2 School of Biological Sciences, University of Wollongong, Australia
3 School of Mathematics and Applied Statistics, University of Wollongong, Australia
Abstract. Transcription is a complicated process which involves the interactions of promoter cis-elements with multiple trans-protein factors. The specific interactions rely not only on the specific sequence recognition between the cis- and trans-factors but also on a certain spatial arrangement of the factors in a complex. The relative positioning of the involved cis-elements provides the framework for such a spatial arrangement. The distance distribution between gene transcription and translation start sites (TSS-TLS) is the subject of the present study, which tests the assumption that over evolution the TSS-TLS distance becomes a distinct character for a given organism. Four representative organisms (Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana and Homo sapiens) were chosen to study the probability distribution of the distance TSS-TLS. The statistical results show that the distance distributions vary significantly and are not independent of species. There seems to be a trend of increasing length of the distances from simple prokaryotic to more complicated eukaryotic organisms. With the specific distance distribution data, computational promoter prediction tools can be improved for higher accuracy.
1 Introduction
Transcription initiation is the first and one of the most important control points in regulating gene expression, and promoters play a pivotal role in this process in all living organisms. A promoter is the nucleic acid sequence region upstream of the transcription start site. It includes a core promoter part that governs the basic transcription process and regulatory sequences that control more complicated temporal and spatial expression patterns. It is generally believed that the regulation of transcription relies not only on the specific sequence pattern recognition and interaction between the DNA sequences (cis-elements) and other factors (trans) but also on their spatial organization during the process. This implies that the cis-elements generally need to be located at a defined distance from one another. With the vast amount of genomic data available and the rapid development of computational tools, there is increasing advancement in the computational
techniques to predict promoters, including recognition of the promoter region, identification of the functional TSS, or both [1, 3, 6, 7]. Many algorithms have been developed for the prediction of promoters, and they vary in performance. In general, the algorithms for promoter recognition can be classified into two groups: the signal-based approach and the content-based approach. There are also methods that combine both, looking for signals and for regions of specific composition [6]. Although a great deal of research has been undertaken in the area of promoter recognition, the prediction techniques are still far from satisfactory. Most programs inevitably produce a considerable level of false positives (FPs) at any significant level of true positive (TP) recognition. Promoter recognition systems for large-scale screening require acceptable ratios of TP and FP predictions (i.e. those that maximize the TP recognition while minimizing the FP recognition). The probability distribution of the distance between the transcription start site and the translation start site (TSS-TLS) has been studied and utilized in promoter prediction of the E. coli bacterium, a model prokaryotic organism. It has been demonstrated that, by combining the information of the E. coli empirical probability distribution of the distance TSS-TLS with a common neural network promoter prediction tool, NNPP2.2 [8, 10], promoter prediction of E. coli can be improved significantly [2]. This has triggered our further interest in examining the TSS-TLS distance distribution in different living organisms, with the assumption that over evolution the TSS-TLS distance becomes a distinct character for a given organism. Here, we report our investigation results on four living organisms and discuss the significance and potential application.
2 The Distribution of Distance TSS-TLS
The distance TSS-TLS is defined as the number of base pairs between the TSS and the first nucleotide (A in the case of ATG) of the TLS. For convenience, the following notation will be used in this paper. Given a genome sequence, let s denote the position of the TSS of a gene and let D(s) denote the distance (in bases) between s and its TLS. D is considered a random variable and we hypothesize that, for different organisms, the probability distribution of D would be significantly different. For each organism, the distribution of D should have its own special characteristics. To test our hypothesis, we considered the available data of four different representative organisms in this study: a model bacterium Escherichia coli [4], a model monocellular eukaryote, the yeast Saccharomyces cerevisiae [12], a model higher plant Arabidopsis thaliana [5] and human Homo sapiens [11]. All of them have records of the positions of TSS and TLS identified through laboratory testing. In a genome sequence, sometimes one TLS might correspond to multiple TSSs. When considering the probability distribution of the distance TSS-TLS, all such valid pairs are counted. Thus, the sample sizes of genome sequences for the species E. coli, S. cerevisiae, A. thaliana and H. sapiens are 820, 202, 18958 and 12763, respectively.
Table 1. Summary statistics for distance TSS-TLS of different organisms

Organism        Mean      SD        Median  Kurtosis  Skewness  Min  Max      Sample Size
E. coli         98.37     117.90    56      11.73     2.92      0    920      820
S. cerevisiae   110.78    102.90    68      2.77      1.79      9    481      202
A. thaliana     137.20    134.27    98      32.09     3.58      1    3214     18958
H. sapiens      10904.95  44422.41  353     227.42    12.69     1    1313939  12727
We produce the histogram and smoothed probability density function of the distance TSS-TLS for the four organisms and present them in Figures 1 and 2. Summary statistics for the four organisms are given in Table 1. In Figures 1 and 2, all four organisms have positively-skewed distributions. From Table 1, we note that both the mean and the median of the samples increase from E. coli to S. cerevisiae, then to A. thaliana and finally to H. sapiens. Since the distributions are positively skewed, the values of the median are more meaningful. Additionally, the smoothed density functions for E. coli and S. cerevisiae have higher and narrower peaks than those for A. thaliana and H. sapiens. This fact can also be seen from the values of the sample standard deviation given in Table 1. Figures 1 and 2 clearly show that the smoothed density functions for the four organisms are significantly different. Since the sample size for each individual organism is reasonably large and the differences between the smoothed density functions are significant, we are confident that the probability distributions of the distance TSS-TLS are different from organism to organism. (This can be tested by a nonparametric method; since the case is so clear, such a test is omitted.)
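For readers who want to reproduce this kind of analysis, the following minimal sketch shows how the Table 1 statistics and a smoothed density could be computed from a sample of TSS-TLS distances; the distance values used here are made-up, and the skewness/kurtosis definitions are assumed to match SciPy's defaults.

```python
# Sketch of how the Table 1 statistics and the smoothed densities could be
# produced from a sample of TSS-TLS distances. The distance values below are
# made-up; the skewness/kurtosis definitions are assumed to be SciPy's defaults.
import numpy as np
from scipy import stats

distances = np.array([12, 35, 56, 60, 71, 90, 120, 150, 320, 910], dtype=float)

summary = {
    "Mean": distances.mean(),
    "SD": distances.std(ddof=1),
    "Median": np.median(distances),
    "Kurtosis": stats.kurtosis(distances),
    "Skewness": stats.skew(distances),
    "Min": distances.min(),
    "Max": distances.max(),
    "Sample Size": distances.size,
}
print(summary)

# Smoothed probability density (Gaussian kernel density estimate), as plotted
# in Figures 1 and 2.
density = stats.gaussian_kde(distances)
grid = np.linspace(0, distances.max(), 200)
smoothed = density(grid)
```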
Fig. 1. The histogram and smoothed density of distance TSS-TLS for E. coli and S. cerevisiae
Fig. 2. The histogram and smoothed density of distance TSS-TLS for A. thaliana and H. sapiens
In summary, the distance TSS-TLS of simple prokaryotes tends to be shorter than that of complicated eukaryotes, and the values of the distance TSS-TLS of simple prokaryotes are clustered more closely than those of complicated eukaryotes. The random variable D is related to the TSS position. Therefore, it must contain information about the location of the TSS. The above facts indicate that for different organisms the probability distributions of D are different, and hence the amount of information on the TSS position provided by the distribution of D will also be different. Some distributions may provide more information than others. For example, Figure 3 shows that the distance TSS-TLS for E. coli and S. cerevisiae might contain more information about the TSS than that for A. thaliana does. (Due to a scaling problem, we are unable to add the smoothed density function for H. sapiens to Figure 3. According to the statistics for H. sapiens, its smoothed density function will be flatter than the other three and the information on the TSS will be less than for the others as well.)
3 Improving Promoter Predictions Through the Information of Distance TSS-TLS
Given a DNA sequence, let S denote the set containing all TSSs of the underlying DNA sequence and let s denote a position in the DNA sequence. Currently, many promoter prediction tools are able to provide the probability of s being a TSS position, i.e. P(s ∈ S) (s ∈ S means s is a position for a TSS). However, the prediction false positive rate is very high because such tools are unable to take into account any
Fig. 3. Comparison of the Smoothed Density of the Distance TSS-TLS for E. coli, S. cerevisiae and A. thaliana
other particular information from the underlying DNA sequence when the tools were developed, for example NNPP2.2 [9, 8, 10]. In the previous section, we demonstrated that different species have different probability distributions for the distance TSS-TLS. Therefore, the information of the distance TSS-TLS should benefit promoter prediction. In the following we give a formula that shows how we combine the probability P(s ∈ S) with the information of the distance TSS-TLS given by an individual organism to improve TSS prediction for that organism.
The probability that position s is a TSS position and that the distance between s and the relevant TLS is between d − a and d + a can be expressed as

P(s ∈ S and D(s) ∈ [d − a, d + a])     (1)

where a and d are positive integers and D(s) denotes the distance between s and the TLS associated with s. We suggest using this joint probability to replace P(s ∈ S) as a score for the purpose of predicting the TSS position. Probability (1) can be evaluated by using the following formula:

P(s ∈ S and D(s) ∈ [d − a, d + a]) = P(s ∈ S) P(D(s) ∈ [d − a, d + a] | s ∈ S).     (2)

By using the above formula, the estimation of P(s ∈ S and D(s) ∈ [d − a, d + a]) can be obtained, provided that estimations of P(s ∈ S) and P(D(s) ∈ [d − a, d + a] | s ∈ S) are available. The estimation of P(s ∈ S) can be obtained from other promoter prediction techniques, for example NNPP2.2. The estimation of P(D(s) ∈ [d − a, d + a] | s ∈ S), as was done in [2], can be provided by the empirical sample distribution of the species under consideration. If the
estimation of P(s ∈ S) is obtained through NNPP2.2, based on the design of NNPP2.2, we always let a = 3 in (1). The previous study on E. coli has shown that using the probability P(s ∈ S and D(s) ∈ [d − a, d + a]) to measure the likelihood of a TSS position is able to significantly reduce the level of false positives [2]. The main reason we suggest using P(s ∈ S and D(s) ∈ [d − a, d + a]) to predict the TSS instead of P(s ∈ S) alone is based on the following argument. In practice, once a position s in a DNA sequence is suspected to be a TSS position, we will study the position s as well as the gene sequence around s. Therefore, while testing whether s is a true TSS position, we should not ignore the information shown in its neighborhood. From our data (Figures 1-2), we found that all the density functions are strongly positively skewed. Considering the plots of the histogram and smoothed density of the distance TSS-TLS given by A. thaliana and H. sapiens, we found that, when the distance TSS-TLS grows beyond a certain point, the value of the probability density function quickly drops to a very small value and the density function provides very limited information on the distribution of the TSS position beyond that point. However, before that point the probability density function of the distance TSS-TLS has a bell shape within a certain range of distances close to zero. Obviously, this range varies from species to species, and the centre, the peak and the fit of the bell shape are different from species to species as well. All of this indicates that the information of the distance TSS-TLS within that range might help the prediction of the TSS. In practice, if we are only interested in predicting the TSS within a particular range, formula (2) can be modified as follows:

P(s ∈ S, D(s) ∈ [d − a, d + a] and D(s) < K)
= P(s ∈ S) P(D(s) < K | s ∈ S) P(D(s) ∈ [d − a, d + a] | s ∈ S and D(s) < K)     (3)

where the estimation of P(s ∈ S) can be obtained from another promoter prediction tool, say NNPP2.2; for different species, P(D(s) < K | s ∈ S) and P(D(s) ∈ [d − a, d + a] | s ∈ S and D(s) < K) can be estimated through the sample drawn from the population of the underlying species. For example, for the A. thaliana data, K can be chosen as 706. Among the A. thaliana sample (size 18958), 94.77% of TSSs have distance TSS-TLS less than 706. Since the sample size is very large, we expect that 94.77% is a reasonable estimation of the probability P(D(s) < 706 | s ∈ S). Considering A. thaliana genes with distance TSS-TLS less than 706, Figure 4 (a) gives the histogram, which provides the information on the conditional probability P(D(s) ∈ [d − a, d + a] | s ∈ S and D(s) < 706), where a = 3. Another example is for H. sapiens. If K is chosen as 42517, by the same reasoning as above, P(D(s) < 42517 | s ∈ S) can be estimated through the sample, which is about 94%, i.e. approximately 94% of H. sapiens genes have distance TSS-TLS less than 42517. The histogram for the H. sapiens sample with distance TSS-TLS < 42517 is given in Figure 4 (b).
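The following sketch is one possible reading of formulas (2) and (3); the NNPP-style probability, the sample of distances and the function names are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of the scoring in formulas (2) and (3): the joint score of a candidate
# TSS position combines the base predictor's probability with the organism's
# empirical TSS-TLS distance distribution. All names and values here are
# illustrative; the estimate of P(s in S) would come from a tool such as NNPP2.2.

def empirical_distance_prob(distances, d, a=3, k=None):
    """Estimate P(D(s) in [d-a, d+a] | s in S), optionally restricted to D(s) < K."""
    sample = [x for x in distances if k is None or x < k]
    if not sample:
        return 0.0
    hits = sum(1 for x in sample if d - a <= x <= d + a)
    return hits / len(sample)

def tss_score(p_tss, distances, d, a=3, k=None):
    """Joint score of formula (2); with k set, the truncated variant of formula (3)."""
    score = p_tss * empirical_distance_prob(distances, d, a, k)
    if k is not None:
        p_less_k = sum(1 for x in distances if x < k) / len(distances)  # P(D(s) < K | s in S)
        score *= p_less_k
    return score

# Example: a candidate TSS with NNPP-like score 0.8, located 150 bases upstream
# of the nearest TLS, scored against a made-up distance sample with K = 706.
sample_distances = [40, 55, 80, 98, 120, 150, 160, 210, 400, 650]
print(tss_score(0.8, sample_distances, d=150, a=3, k=706))
```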
Fig. 4. Histogram and smoothed density of the distance TSS-TLS for A. thaliana (distance < 706) and H. sapiens (distance < 42517)
8:   solve_SCH_conflicts is not needed; solve_PLN_conflicts(rev_PLN)
9:   {Scheduling part; reasoning on time and resources}
10:  for all ai ∈ rev_PLN | ai ∉ rev_SCH do
11:    if ∃ a consistent allocation of ai in rev_SCH then
12:      rev_SCH ← rev_SCH ∪ {ai}
13:    else
14:      solve_SCH_conflicts(ai, R, quantity, time_point, Cannot_acts, Should_not_acts, C)
Algorithm 1. General scheme for integrating planning and scheduling
Fig. 3. Subtasks that are derived from steps 8 and 14 of Algorithm 1
executable due to time/resource conflicts. In both cases, the planner performs the same tasks (see Fig. 3): it converts the original problem into a planning repair problem, which is later solved by applying classical planning techniques (flaw repair mechanisms, such as inserting actions to support unsatisfied preconditions). During the process of solving the planning problem, the planner eventually invokes the scheduler to obtain advice and make a better decision about the most convenient planning choice. That is, the scheduler helps the planner focus its search upon those planning alternatives that better accomplish the time/resource objectives/constraints. The key points in this functional behaviour are the two calls (steps 8 and 14 of Algorithm 1) to the planning process and the subtasks derived from them (see Fig. 3):
1. solve_PLN_conflicts. In this case, the planner invokes itself in order to fix the planning conflict in rev_PLN. This is the task that requires the least heuristic information, entailing fewer decision points, because it means transforming a planning conflict into a planning repair problem. Hence, the overall idea is how to solve a planning conflict from a planning perspective.
2. solve_SCH_conflicts. Unlike solve_PLN_conflicts, this is a harder task as it implies solving a scheduling conflict through planning. When the scheduler detects a constraint violation due to an oversubscribed resource R, it attempts to find a consistent allocation for the actions in rev_SCH. The ultimate goal is to find out when that resource is needed (at which time point), how much is required (quantity) and who requires it (action ai). Additionally, the scheduler provides the planner with some helpful information on which actions cannot and should not be removed from the AN (Cannot_acts and Should_not_acts); for instance, a set of actions that must compulsorily be in
the plan, or that support the fulfillment of some time/resource constraints, or that correspond to a sequence of actions already scheduled, etc. Moreover, in order to guide the planning search as much as possible, the scheduler supplies an extra set of constraints (C) that the planner must hold when solving the problem; for instance, do not use more than n units of resource Rj when solving this conflict.
3. The decision criteria to come up with a planning repair problem. After receiving either of the two mentioned calls, the planner analyses the information provided and transforms it into its own terms, i.e. into a planning repair problem (see Fig. 3). The idea is to formulate a repair problem in terms of which subgoals are to be achieved and which ones are to be maintained in the AN. In order to deduce this information, the planner makes use of the information provided by the scheduler and/or its own planning tactics to decide whether it is convenient to delete some actions prior to calling the planning solver.
4. The advice provided by the scheduler to focus the planning search process. This stage corresponds to a classical planning solving task. The interesting point here is that the planner will count on the scheduler in order to decide which planning alternative to take on. This way, the planner shows the scheduler the planning decision points (plans P1, P2, ..., P5) and the scheduler determines the best alternative (P2) from the time/resource perspective.
In conclusion, this model represents an open collaborative approach where planner and scheduler work together to repair any conflict that arises during the process of building up the plan, no matter the origin of the conflict. Both components work in a strongly-coupled manner, taking into account the information/advice provided by the other module (bidirectional communication). Obviously, the planner plays the central role in this collaborative work, though it is intelligently guided by the heuristic information provided by the scheduler.
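The following is a minimal, self-contained sketch of this bidirectional collaboration around steps 10-14 of Algorithm 1; the Planner and Scheduler classes, the single energy resource and the repair step are simplified stand-ins chosen for illustration, not the architecture's actual modules.

```python
# Minimal, self-contained sketch of the collaboration around steps 10-14 of
# Algorithm 1. The classes, the single "energy" resource and the repair step
# are simplified stand-ins for illustration, not the architecture's modules.

class Scheduler:
    def __init__(self, capacity):
        self.capacity = capacity      # available units of the resource
        self.rev_sch = []             # actions with a consistent allocation

    def consistent_allocation(self, action):
        used = sum(a["demand"] for a in self.rev_sch)
        return used + action["demand"] <= self.capacity

    def describe_conflict(self, action):
        used = sum(a["demand"] for a in self.rev_sch)
        return {"action": action["name"], "resource": "energy",
                "quantity": used + action["demand"] - self.capacity,
                "cannot_acts": [a["name"] for a in self.rev_sch]}

class Planner:
    def solve_sch_conflict(self, conflict, scheduler):
        # Stand-in repair: insert a recharge-like action that frees capacity.
        print("planner repairs scheduling conflict:", conflict)
        scheduler.capacity += conflict["quantity"]

def revise(rev_pln, planner, scheduler):
    for a in rev_pln:                                  # steps 10-14
        if not scheduler.consistent_allocation(a):
            planner.solve_sch_conflict(scheduler.describe_conflict(a), scheduler)
        scheduler.rev_sch.append(a)

plan = [{"name": "sample_soil", "demand": 3}, {"name": "take_image", "demand": 2},
        {"name": "drop", "demand": 1}, {"name": "comm_image", "demand": 6}]
revise(plan, Planner(), Scheduler(capacity=10))        # conflict of 2 units, as in the example
```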
3.3 Application Example
In this section we will use the rovers problem as an application example to show how our integrated model works. The goal is to have soil, rock and image data communicated. The initial plan, generated by any automated planner, is shown in Fig. 4-a, and it is subsequently converted into the AN of Fig. 4-b. However, the problem imposes these additional complex constraints: i) actions have different durations and consume energy of rover0; ii) the initial energy level of rover0 is 10; iii) the sun is available for recharging only in waypoint1 and waypoint3; iv) the subgoal (have image objective1) cannot be achieved before time 20; v) the effect (calibrated camera0) only persists for 30 time units; and vi) the goal (communicated image) has a deadline, so it must be achieved before time 40.
Following Algorithm 1, actions are selected from the AN and revised by the planning and scheduling parts. While no planning or scheduling conflicts arise, the sets rev_PLN and rev_SCH are updated. This process continues until (comm_image wp0 wp1) is revised, which introduces a scheduling conflict in rev_SCH because the resource energy becomes oversubscribed: (comm_image wp0 wp1) requires
6 units of energy, but only 4 (10-3-2-1) are available. Then, the scheduler executes solve_SCH_conflicts((comm_image wp0 wp1), (energy rover0), 2 (6-4), (comm_image wp0 wp1).on, {(sample_soil wp0), (take_image cam0 wp0)}, ∅, {(energy rover0) ≤ 10}). Note that Cannot_acts contains the actions that need to appear in the plan (they are the only way to achieve certain subgoals), whereas Should_not_acts is empty in this case. On the other hand, C indicates that the energy of rover0 cannot now exceed 10 (since this is the initial value). The repair problem solver decides to maintain the same subgoals. Then, the planning solver tries to repair this conflict by inserting the actions (recharge wp1) and (recharge wp3), thus creating several alternative plans that may be inserted at different timepoints (P1, P1', ..., P3'), as seen in Fig. 4-b. Since there are several different alternatives, the planner communicates these choices to the scheduler, which verifies their consistency and informs the planner about the best choice to follow. After solving this conflict, the algorithm resumes by studying the next actions in the AN until no more actions need to be revised.
Fig. 4 (a) initial plan: 0: (calibrate cam0 wp0); 1: (take_image cam0 wp0); 2: (sample_soil wp0); 3: (comm_image wp0 wp1); 4: (comm_soil wp0 wp1); 5: (drop); 6: (sample_rock wp0); 7: (comm_rock wp0 wp1).
Fig. 4. (a) Initial plan given by an automated planner for the rovers application example; (b) Action network that includes the additional complex problem constraints
4 Discussion Through Related Work
From the AI point of view, there exist two immediate perspectives for solving planning and scheduling problems: a planner can be extended to handle time and resources (the temporal planning approach) [1, 9]; or a scheduler can be extended to perform some dynamic action selection (planning capabilities embedded within scheduling) [4]. In both cases, the indistinguishable mixture of capabilities may increase the complexity of the overall approach, making big problems practically intractable. From the OR point of view, there exist very efficient algorithms to perform the task allocation [10], but modelling the problem and, particularly,
finding an initial plan as a set of actions becomes much more difficult. Consequently, the way in which planning and scheduling must be combined has been addressed as an interesting hot topic of research [2]. Although there have been some successful attempts at integrating planning and scheduling, such as HSTS [11] or Aspen [12], they are ad-hoc models designed for solving problems in a particular domain. This loss of generality makes them fail when it comes to demonstrating success in other domains. On the contrary, our integrated approach tries to be a general, flexible model that dynamically interleaves planning and scheduling processes (both playing a similar role) to tackle planning problems with both strong and weak components of resource/time management. We combine both processes, separating the implementations of planning and scheduling to achieve a bi-modular solver in which each component deals with its respective sub-problem, as presented in [13]: the planner produces a plan while the scheduler enforces the time and resource constraints on this plan, but both work on the basis of the information/advice provided by the other component. This allows us to benefit from their knowledge separately, such as plan execution and repair, definition of timepoints, resource allocation, temporal constraint management, etc. Further, it also incorporates the use of common integrated heuristics, reasoning on an action network that combines a TCN with an rCN [7, 8], and an intelligent interaction between both processes.
5 Conclusions
Separating the planning component from the scheduling component when solving real-world problems has shown important drawbacks that are difficult to overcome unless an integrated architecture is used. The key issue about integration is how to level out the roles of both processes. In this paper we have proposed a flexible architecture that makes use of two specialised processes for planning and scheduling under a semi-distributed approach, in opposition to other completely isolated approaches. This way, our approach provides high flexibility to deal with more complex problems in a more efficient way, since the planning features of the problem are managed by a planner, while the scheduling features are managed by a scheduler. Further, both processes take advantage of common heuristics and of the advice provided by the other module. For instance, in situations where there exist different planning alternatives, the scheduler can help the planner select the best decision according to scheduling features, such as use of resources, fulfillment of temporal constraints, etc., which are per se difficult for a planner to manage. We certainly know there is still some work to be done before achieving a fully automated tool able to solve real-life problems (some decision points and improvements in the planning/scheduling interaction require further investigation). However, the open design of our architecture allows new modules to be incorporated to provide extended reasoning on additional aspects of the problem, such as dealing with soft constraints and preferences and including OR algorithms for scheduling, which is part of our current work.
Acknowledgments This work has been partially supported by the Spanish government project MCyT TIN2005-08945-C06-06 (FEDER).
References
1. Ghallab, M., Nau, D., Traverso, P.: Automated Planning. Theory and Practice. Morgan Kaufmann (2004)
2. Smith, D., Frank, J., Jónsson, A.: Bridging the gap between planning and scheduling. Knowledge Engineering Review 15(1) (2000) 47-83
3. Bartak, R.: Integrating planning into production scheduling: a formal view. In: Proc. of ICAPS-2004 Workshop on Integrating Planning Into Scheduling (2004)
4. Smith, S., Zimmerman, T.: Planning tactics within scheduling problems. In: Proc. ICAPS-2004 Workshop on Integrating Planning Into Scheduling (2004) 83-90
5. Gerevini, A., Long, D.: Plan constraints and preferences in PDDL3. Technical report, University of Brescia, Italy (2005)
6. Edelkamp, S., Hoffmann, J., Littman, M., Younes, H. (eds.): Proc. of the International Planning Competition IPC-2004 (2004)
7. Dechter, R., Meiri, I., Pearl, J.: Temporal constraint networks. Artificial Intelligence 49 (1991) 61-95
8. Wallace, R., Freuder, E.: Supporting dispatchability in schedules with consumable resources. Journal of Scheduling 8(1) (2005) 7-23
9. Gerevini, A., Saetti, A., Serina, I., Toninelli, P.: Planning in PDDL2.2 domains with LPG-TD. In: Proc. ICAPS-2004 (2004) 33-34
10. Winston, W.: Operations Research: Applications and Algorithms (1994)
11. Muscettola, N.: HSTS: Integrating planning and scheduling. In: Zweben, M., Fox, M. (eds.): Intelligent Scheduling. Morgan Kaufmann, CA (1994) 169-212
12. Chien, S., et al.: ASPEN - automating space mission operations using automated planning and scheduling. In: Proc. SpaceOps 2000 (2000)
13. Pecora, F., Cesta, A.: Evaluating plans through restrictiveness and resource strength. In: Proc. WIPIS of AAAI-2005 (2005)
A Robust RFID-Based Method for Precise Indoor Positioning
Andrew Lim 1,2 and Kaicheng Zhang 1
1 Department of Industrial Engineering and Logistics Management, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
{iealim, kczhang}@ust.hk
2 School of Computer Science & Engineering, South China University of Technology, Guangdong, PR China
Abstract. A robust method for precise indoor positioning utilizing Radio Frequency Identification (RFID) technology is described. The indoor environment is partitioned into discrete locations, and a set of RFID tags is scattered in the environment. Sample reading patterns of RFID tags are collected at each location by an RFID reader. A pattern recognition and classification method is used, when a human, vehicle, or other carrier moves around the environment with an RFID reader, to estimate its physical location based on the reading pattern. The resulting estimation error is within one meter. The method is adaptive to different tag distributions, reader ranges, tag reading defects, and other alterable physical constraints. Results from extensive experiments show that the described method is a robust and cost-saving solution to indoor positioning problems in the logistics industry. A provisional US patent has been granted on the described method.
Keywords: Intelligent Systems, Systems for Real Life Applications, Indoor Positioning.
1 Introduction
Positioning and location identification are useful in a number of industrial fields. In the logistics and transportation domain, the real-time position of each cargo, vehicle carrier, and even human worker can be valuable information. There are growing needs for acquiring high-precision indoor location information at a low cost.
1.1 Background and Related Work
Global Positioning Systems (GPS) have great value in determining the position of a user around the globe. However, for indoor applications, GPS has some problems. First, when it comes to indoor or urban outdoor areas, GPS is not a reliable solution due to the poor reception of satellite signals. In addition, indoor applications usually require a much higher precision than GPS can achieve, sometimes as high as within one meter.
Many of the existing Indoor Positioning Systems (IPS) [1] are based on Wireless Local Area Networks. Wireless access points are installed in each room and a portable receiving device can determine which room it is in when it receives signals from the access points. This approach is useful for determining which room inside a building the tracked subject is in. However, the precision is not high enough for logistics applications. Also, there will be problems if any of the access points fails to work. There are other technologies that provide more precise positioning. Wi-Fi, infrared, ultrasonic, or magnetic technologies have been employed in positioning applications such as [2], [3] and [4]. The use of these technologies comes at a high cost, and in industrial environments the performance may not be good. Radio Frequency Identification (RFID), a technology invented more than half a century ago, has recently received overwhelming attention, especially for its use in logistics applications for cargo tracking and identification. For a comprehensive understanding of RFID, refer to [5]. There have been notable efforts in employing RFID in positioning, both in outdoor applications [3] and indoor applications [6, 7].
1.2 Indoor Positioning – An Industrial Example
Located at the International Container Terminal, the SunHing godown, owned by SunHing Group, is one of the largest distribution centers in Hong Kong. Its main operations include receiving goods from various manufacturing plants in South China, storing them inside its 11,000 sq ft warehouse, and picking the cargos and loading them into containers, which are then transshipped to various locations in Japan, Europe and the U.S. The receiving, picking and transferring of cargos within the SunHing warehouse are done by more than 20 forklifts. In peak seasons, the operations are quite busy and errors in placement or picking could be costly. SunHing believes that an indoor positioning system could be used on its forklifts to enhance its operation accuracy and efficiency. Usually, within a warehouse, a cargo terminal, or a truck depot, we want to know the position of each vehicle. This position information enables the warehouse to have better performance measurement, dynamic job scheduling, and automatic information acquisition. Furthermore, the use of a high-precision indoor positioning system is not restricted to such logistics applications. It could also be used in high-precision local navigation systems or intelligent systems where location-awareness of the mobile agents is required.
2 Our Proposed System
2.1 “Mobile Reader Dispersed Tag” Approach
There exist RFID positioning systems that put tags on mobile agents and place RFID readers at fixed locations [8, 9]. In this approach, when one tag is being
read by a reader, the system concludes that the mobile agent carrying the tag is inside the read range of that particular reader. This approach is practical for applications in which mobile agents move around well-partitioned and separated areas through some checkpoints. Readers are normally placed at such checkpoints to detect the movement of the agents. However, when the precision requirement is higher, or the number of checkpoints increases, this “mobile tag fixed reader” approach becomes impractical. Also, the unstable nature of wireless communication, especially of UHF radio wave communication in industrial environments [10], introduces a high error rate when the position information of the mobile agent is based only on the reading of one single tag. Thus, in contrast to the “mobile tag fixed reader” method, a “super-distributed tag infrastructure” [11] approach has been proposed. In this approach, a large number of passive tags are distributed in the environment, and an RFID reader is carried by each mobile agent. When an agent moves inside the environment, at each location one tag is read and its location determined. Successful applications of such a method include [6] and [7]. However, the restriction that only one single tag can be read at any time is too harsh. Placing tags on the floor [6, 7] is also inconvenient or infeasible in some industrial applications. Therefore we propose a new “mobile reader dispersed tag” approach: a large number of tags are dispersed in the environment, allowing more than one tag to be read by a reader in the environment. A mobile agent is first placed at designated grid points, or “sample points”. “Sample readings” of which tags are read by the reader at each point are collected and stored. After that, when the agent is moving about in the environment, the real-time reading results are collected and compared against the sample readings, and the location of the agent is classified into one of the sample points with pattern recognition and classification techniques.
2.2 System Overview
Figure 1 illustrates the architecture of our proposed RFID-based indoor positioning system. The system basically contains the Dispersed Tags, the Mobile Reading and Processing Module, and the Backend Server. The dispersed tags are a set of fixed tags distributed in the environment, in this case on the ceiling of the indoor environment. The Mobile Reading and Processing Module is carried by the mobile agent, for instance, a forklift. The module consists of an RFID antenna, an RFID reader, a PDA, and a wireless adapter. The RFID reader controls the antenna, which emits and receives UHF radio waves, and identifies the RFID tags inside its reading range. A PDA or a PC is connected to the RFID reader to control the reader and collect the tag reading results. The results merely indicate which tags were read in the most recent reading cycle. No additional information, such as signal strength or direction of the tag, is required. The processed reading results are then sent through a wireless network to the backend server, where classification algorithms convert the reading results into an estimate of the mobile agent's location. The backend information and control server can provide positioning service and control for more than one mobile agent, and can be further integrated into the warehouse management system.
Fig. 1. Architecture of proposed RFID-based indoor positioning system
2.3 Pattern Matching Formulation
We now formulate the positioning problem in the proposed system as a pattern matching process. First, a set R of m RFID tags, each identified by a 96-, 128-, or 256-bit string, is dispersed in the environment. Next, we partition the environment into n discrete grid points. The sample data we collect at the ith (1 ≤ i ≤ n) point is a 3-tuple Si = (xi, yi, oi), where xi is the x-coordinate of the ith sample point, yi is the y-coordinate of the ith sample point, and oi, a subset of R, is the set of observed RFID tag identification strings at the ith sample point. Finally, we define the input and output of the problem.
Input: A sample data set U collected at n points {S1, S2, ..., Sn}, and a set T of tags observed by a mobile agent at an unknown position, T ⊆ R.
Output: An integer j (1 ≤ j ≤ n), where (xj, yj) is the estimated location for T.
3 System Setup and Experiment Design
3.1 System Setup
We set up our proposed system in a quasi-industrial indoor environment in our research center. The effective area is a rectangle of size 4.2 m by 8.4 m. This area
Fig. 2. Floor Plan of Experiment Environment
is partitioned into 7 rows and 7 columns, thus creating 49 grids. The width and length of a grid are not equal; it is made this way so that one grid perfectly matches one piece of tile on the ceiling. Fig. 2 shows the floor plan of the test environment and the partition. Fig. 3(a) and (b) are photos of the floor and ceiling of the experiment area. A total number of 176 RFID tags, arranged in 22 columns by 8 rows, are placed onto the ceiling. In Fig. 2, each tag is illustrated by a small square. Each rectangular tile on the ceiling has 4 tags at its vertices and another 4 on its two long sides. The tags used are 4" by 4" Symbol UHF Carton Tags, as shown in Fig. 4. A simple carrier was built for the experiment as a mobile agent, as shown in Fig. 5. One Symbol AR400 RFID Reader with power supply and two Symbol High Performance Antennas are installed on the carrier. Metal bars are attached to both the long and short sides of the carrier to assist in placing the carrier at precise locations. The height and angle of the antenna are adjustable. The reader is connected to a computer that collects the reading results at each location.
3.2 Experiment Design
The main purpose of the experiments is to examine the feasibility of the proposed method as an effective and practical approach for indoor positioning in logistics applications. Also, good algorithms for solving the formulated pattern matching problem are to be discovered. A number of factors related to the performance
Fig. 3. (a) Experimental Indoor Environment (b) The Ceiling
Fig. 4. A Tag Placed on the Ceiling
Fig. 5. Mobile Agent with Reader and Antenna
of the system are considered: the height, angle, orientation and power level of the antenna, and the density of sample data collection (sampling rate). In total, 7 sets of sample data are collected, as summarized in Table 1. The standard sample data set U1 is collected at the center of each partitioned grid, thus having 49 sample points. The standard power level is 255, which is full power. The standard antenna angle is 0° with respect to the horizontal plane. The standard antenna height is 77 cm (antenna to ceiling 144.5 cm). The standard orientation of the carrier is facing east (long side of the antenna parallel to the long side of the rectangular area). U2 is collected with the standard configuration at both the center and the boundary of each grid, thus having 15×14 points. U3 through U6 are sample sets with variations in the power, angle, orientation, and height of the antenna, as described in Table 1. U7 is the union of U1 and U5, introducing another element ri, the orientation, into the tuple (xi, yi, oi). Note that, when collecting the readings at every point for each sample set, an additional 9 readings are collected. Together, the 10 readings at each point are used as the testing set of observations at that point with respect to its sample set. This testing set is classified against a sample data set (normally itself, or a standard sample set) with a pattern matching algorithm to examine the accuracy of the classification or the effect of various alterations in the configuration.
Table 1. Sample Data Sets

Sample Data Sets        Descriptions                          Sample Points
U1 – 7×7 std            Standard                              7×7
U2 – 15×14 std          Standard                              15×14
U3 – 7×7 power192       Power level = 192 (half)              7×7
U4 – 7×7 slope11        11° slope of antenna                  7×7
U5 – 7×7 side           Carrier rotated 90° (facing north)    7×7
U6 – 7×7 height115      Raised antenna                        7×7
U7 – 7×7 std side mix   U1 ∪ U5                               7×7×2

4 Pattern Matching Algorithms
We have tested several different algorithms to solve the pattern matching problem formulated in 2.3. Each algorithm runs over all 6 sample data sets and the results are compared. We find that two algorithms – “Intersection over Union” and “Tag-to-Location Mapping Count” – are able to provide sufficiently good results compared to the rest. They are described in this section and the experiment results are presented in Section 5.
4.1 “Intersection over Union” Algorithm
As in the formulation of the problem in 2.3, the algorithm takes in a sample data set U and an observation T, and outputs an integer j – the index of the matched pattern. The following are the steps to find such j:
1. For each Si ∈ U, compute the “similarity” between Si and T as follows:

Sim(Si, T) = |Si ∩ T| / |Si ∪ T|     (1)

2. Choose j such that Sim(Sj, T) ≥ Sim(Si, T) for all 1 ≤ i ≤ n.
Equation (1) simply states that the “similarity” between the observation set and a sample set is the number of their common elements divided by the number of elements in their union.
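A runnable sketch of this classifier is shown below; representing the sample data set as a dictionary from grid-point index to coordinates and tag set is an assumption made for illustration.

```python
# Runnable sketch of the "Intersection over Union" classifier. Representing each
# sample point as ((x, y), set_of_tag_ids) is an assumption made for illustration.

def iou_similarity(observed, sample_tags):
    union = observed | sample_tags
    return len(observed & sample_tags) / len(union) if union else 0.0

def classify_iou(samples, observed):
    """samples: {i: ((x, y), tag_set)}; observed: set of tag ids read on-line."""
    best = max(samples, key=lambda i: iou_similarity(observed, samples[i][1]))
    return samples[best][0]

samples = {
    1: ((0.0, 0.0), {"tagA", "tagB", "tagC"}),
    2: ((0.0, 1.2), {"tagB", "tagC", "tagD"}),
    3: ((0.6, 0.0), {"tagD", "tagE"}),
}
print(classify_iou(samples, {"tagB", "tagC", "tagD"}))   # -> (0.0, 1.2)
```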
4.2 “Tag-to-Location Mapping Count” Algorithm
The “Tag-to-Location Mapping Count” algorithm matches an observation set T to a sample set Sj ∈ U in the following way:
1. Define the tag-to-location mapping, a mapping from a tag id string to a set of integers which are the indices of the tag's associated sample points:

TTL(s) = { p | s ∈ op, 1 ≤ p ≤ n }     (2)

2. Given T, for each sample point at (xi, yi), compute the mapping count score

score(i, T) = Σ_{s ∈ T} f(i, s),  where f(i, s) = 1 if i ∈ TTL(s) and 0 if i ∉ TTL(s)     (3)

3. Choose j such that score(j, T) ≥ score(i, T) for all 1 ≤ i ≤ n.
Intuitively, (3) states that for each tag in an observation we add one count to every sample point whose corresponding sample set contains the tag. The sample point that receives the most counts is considered the solution. Note, however, that this would be faulty when dealing with observations in boundary areas, as Fig. 6 explains. Thus we exclude the boundary points when evaluating the performance of the “Tag-to-Location Mapping Count” algorithm.
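The same illustrative sample representation can be reused for a sketch of this second classifier; again, the data layout and function names are assumptions made for illustration.

```python
# Runnable sketch of the mapping-count classifier, reusing the same
# illustrative sample representation as in the previous sketch.
from collections import defaultdict

def build_ttl(samples):
    """Formula (2): tag id -> indices of sample points whose set contains it."""
    ttl = defaultdict(set)
    for i, (_, tags) in samples.items():
        for s in tags:
            ttl[s].add(i)
    return ttl

def classify_mapping_count(samples, observed):
    ttl = build_ttl(samples)
    score = defaultdict(int)
    for s in observed:                     # formula (3): one count per matching point
        for i in ttl.get(s, ()):
            score[i] += 1
    best = max(samples, key=lambda i: score[i])
    return samples[best][0]

samples = {
    1: ((0.0, 0.0), {"tagA", "tagB", "tagC"}),
    2: ((0.0, 1.2), {"tagB", "tagC", "tagD"}),
    3: ((0.6, 0.0), {"tagD", "tagE"}),
}
print(classify_mapping_count(samples, {"tagC", "tagD"}))  # -> (0.0, 1.2)
```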
Fig. 6. “Tag-to-Location Mapping Count” Algorithm Runs into Problems with Boundary Cases
5 Experimental Results
5.1 Comparison on Algorithms
To compare the effectiveness of the classification algorithms, the 10 readings collected at each point in the 7×7 sample data are used as observations and U1 is used as the sample data set. The reading accuracy is presented in Table 2. Boundary points are excluded for the comparison. Thus, for rows 1-5, there are 5×5×10 = 250 test cases, and for row 6, there are 11×11×10 = 1210 test cases. We can see that both algorithms perform well on the U1 – 7×7 std self-validation test case. Each algorithm outperforms the other in some of the remaining test cases. We also observe that changing the power, orientation and height of the antenna while using the standard sample data significantly reduces the positioning accuracy.

Table 2. Comparison on Two Algorithms

Sample Data  Test Data  Test Points  Intersection over Union   Tag-to-Location Mapping Count
                                     Errors   Error Rate       Errors   Error Rate
U1           U1         250          3        1.2%             1        0.4%
U1           U3         250          113      45.2%            72       28.8%
U1           U4         250          5        2.0%             36       14.4%
U1           U5         250          81       32.4%            111      44.4%
U1           U6         250          143      57.2%            116      46.4%
U2           U2         1210         43       3.55%            72       5.95%
5.2 Positioning Accuracy of the Proposed Method
Table 3 presents the results of the “Intersection over Union” algorithm running over each sample data set with its own test data. For all the test cases, our proposed method could provide >97% accuracy in determining the correct position.
Fig. 2. The procedure of choosing appropriate container for a cluster
works [2]. It uses a genetic algorithm combined with heuristic rules to do the single-container packing with constraints of load bearing limitation and stable packing. Each object is assigned a unique number from 1 to n, where n is the number of objects to be packed. The genetic algorithm uses these object numbers as the genes of individuals. Then, objects are assigned to different layers by heuristic rules based on the unloading sequence of the objects. In the meantime, multiple-chromosome individuals are introduced into the genetic operations in order to do crossover and mutation independently for different layers. Finally, heuristic rules, based on constraints of stability and load bearing, are applied to transfer the one-dimensional string into a three-dimensional packing layout [2]. If the objects in a cluster cannot all be packed into the specified container, the proposed method redistributes the container for that particular cluster. First, the algorithm looks for a larger-capacity container that can pack all the objects in the cluster. If such a larger-capacity container can be found, we choose the container with minimum cost and do the single-container packing again. If no appropriate container can be found to pack all the objects in the particular cluster, the initial k value of K-means is increased by 1, and the grouping and packing procedure starts over for the new clusters.
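The overall regrouping loop described above can be sketched as follows; the chunk-based clustering, the volume/weight feasibility test standing in for the GA-based single-container packer of [2], and the cost fields are simplified assumptions made for illustration only.

```python
# Hedged sketch of the outer loop: cluster objects, choose a minimum-cost
# feasible container per cluster, and restart with k+1 if some cluster cannot
# be packed. The clustering and packing steps below are greedy stand-ins, not
# the K-means grouping or the GA-based packer used in the paper.

def feasible(container, cluster):
    """Volume and weight feasibility check (stand-in for the GA single-container packing)."""
    vol = sum(o["volume"] for o in cluster)
    wgt = sum(o["weight"] for o in cluster)
    return vol <= container["volume"] and wgt <= container["max_weight"]

def cheapest_feasible(containers, cluster):
    ok = [c for c in containers if feasible(c, cluster)]
    return min(ok, key=lambda c: c["cost"]) if ok else None

def cluster_objects(objects, k):
    """Stand-in for the K-means grouping on unloading sequence, dimensions and capacity."""
    return [objects[i::k] for i in range(k)]

def pack_all(objects, containers, k=1):
    while k <= len(objects):
        clusters = cluster_objects(objects, k)
        assignment = [cheapest_feasible(containers, c) for c in clusters]
        if all(a is not None for a in assignment):
            return list(zip(assignment, clusters))
        k += 1                      # no feasible container for some cluster: regroup with k+1
    return None

objects = [{"volume": 2.0, "weight": 30}, {"volume": 1.5, "weight": 25},
           {"volume": 2.5, "weight": 40}, {"volume": 1.0, "weight": 10}]
containers = [{"name": "S", "volume": 3.0, "max_weight": 50, "cost": 100},
              {"name": "L", "volume": 6.0, "max_weight": 120, "cost": 180}]
for container, cluster in pack_all(objects, containers):
    print(container["name"], [o["volume"] for o in cluster])
```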
4 Simulation and Discussion
In order to evaluate the performance of the proposed method, a computer program was developed based on the proposed algorithm and hundreds of testing cases, which
were generated randomly, were simulated. The number of objects for these testing cases is around 100 and the number of container types is 4~12. From the simulation results, we found that the efficiency, which is calculated based on Formula 1 and the proposed mathematical model in Section 2.2, is around 80% for most of these testing cases. The number of generations does not have much effect on the performance when it is over 5, as shown in Figure 3. Because the degree of approximation among objects becomes strong after some generations of the K-means algorithm and the number of objects in these testing cases is not too large, it is quite reasonable that not many generations are needed to reach steady clustering results. We use three related factors of the objective function, namely the capacity of the container, the unloading sequence of the objects, and the dimensions of the objects, as the measure of the degree of approximation during the clustering procedure, so the packing performance is highly efficient.
Fig. 3. The relationship of objective and generation number of K-means clustering
From Figure 4, we clearly observe that the objective is better when the weight of container cost is set to 0.25, the weight of free space is set to 0.25, and the weight of loading overhead is set to 0.5. These results show that loading overhead can be satisfied better than the other two factors in the objective.
Fig. 4. The effects of weights for factors of objective function
Theoretically, the more container types there are, the higher the space utilization efficiency, since with more container types to choose from, a more compact container can be found. From Figure 5, the packing algorithm shows higher space utilization efficiency when there are more container types for most of the testing cases, but these results are not very pronounced.
Fig. 5. The relationship of space utilization and number of container type
To evaluate the effect of the dimensions and number of objects, we designed 100 testing cases whose dimensions vary by ±10% around the bases (8, 5, 6), (10, 5, 6), and (12, 6, 7). The number of objects to be packed is set to 80, 120, and 160, respectively. When the number of objects and the dimensions of the objects are moderate, the simulation results showed very high efficiency, better than the others, as shown in Figure 6. This is also reasonable, because the smaller the dimensions of the objects are, the more compact the packing is. Besides, the more objects there are to be packed, the more dispersed the unloading sequence of the objects becomes. So when the number of objects and the dimensions of the objects are moderate, packing achieves higher efficiency.
Fig. 6. The effects of dimensions and number of objects to be packed (objective (%) over the testing cases for dimension base (8, 5, 6) with 160 objects, (10, 5, 6) with 120 objects, and (12, 6, 7) with 80 objects, each varied by ±10%)
The last observation concerns the variation of dimensions among objects. First, (9, 7, 8) is used as the dimension base of the testing cases. Then objects are generated based on the
dimension base with 10%, 20%, and 30% adjustment. The number of generations is set to 5 and the number of objects is set to 90. Figure 7 shows the simulation results of these testing cases. It is clear that the smaller the variation of dimensions among the objects, the better the performance, since compact packing becomes more difficult when the dimensions vary more widely among the objects.

Fig. 7. The effects of dimension variation among objects (objective (%) over the testing cases for dimension base (9, 7, 8) with ±10%, ±20%, and ±30% variation and 90 objects)
5 Conclusion

This research proposed an efficient algorithm to solve multiple-container packing problems with multiple constraints. We consider not only the space utilization but also the load-bearing limitation, the unloading overhead, and stable packing. Besides the algorithm, a computer program based on the proposed algorithm was developed. The computer system is not only a simulation tool for performance analysis but also a system that provides practical solutions for customer-designated multiple-container packing problems. Thousands of cases were simulated and analyzed to evaluate the performance of the proposed research and to demonstrate its applicability in the real world. The adjustable weights of the factors of the objective function make the proposed method more practicable and useful for real-world applications.
References
[1] Lin, J. L. & Chang, C. H., "Stability 3D container packing problems," The 5th Automation '98 and International Conference on Production Research (1998)
[2] Lin, J. L. & Chung, Cheng-Hao, "Solving Multiple-Constraint Container Packing by Heuristically Genetic Algorithm," The 8th Annual International Conference on Industrial Engineering – Theory, Applications and Practice, Nov. 10-12, 2003, Las Vegas, U.S.A. (2003)
[3] Verweij, B., Multiple destination bin packing. ALCOM-IT Technical Report (1996)
[4] Bellot, P. and El-Beze, M., A clustering method for information retrieval, Technical Report IR-0199, Laboratoire d'Informatique d'Avignon (1999)
[5] Boley, D., Gini, M., Gross, R., Han, E. H., Hastings, K., Karypis, G., Kumar, V., Mobasher, B. and Moore, J., Partitioning-Based Clustering for Web Document Categorization. DSSs Journal (1999)
[6] Dunham, M., Data Mining: Introductory and Advanced Topics, Prentice Hall (2003)
[7] Han, J. and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers (2001)
[8] Inderjit S. Dhillon & Dharmendra S. Modha, "A Data-clustering Algorithm on Distributed Memory Multiprocessors," Proc. of Large-scale Parallel KDD Systems Workshop, ACM SIGKDD, August 1999 (also Large-Scale Parallel Data Mining, Lecture Notes in AI, Vol. 1759, pp. 245-260, 2000)
[9] Jia-Yan Yang, "A Study of Optimal System for Multiple-Constraint Multiple-Container Packing Problems," Master Thesis, Huafan University (2005)
Planning for Intra-block Remarshalling in a Container Terminal
Jaeho Kang1, Myung-Seob Oh1, Eun Yeong Ahn1, Kwang Ryel Ryu1, and Kap Hwan Kim2
1 Department of Computer Engineering, Pusan National University, San 30, Jangjeon-dong, Kumjeong-gu, Busan, 609-735, Korea {jhkang, oms1226, tinyahn, krryu}@pusan.ac.kr
2 Department of Industrial Engineering, Pusan National University, San 30, Jangjeon-dong, Kumjeong-gu, Busan, 609-735, Korea
[email protected] Abstract. Intra-block remarshalling in a container terminal refers to the task of rearranging export containers scattered around within a block into designated target bays of the same block. Since the containers must be loaded onto a ship following a predetermined order, the rearrangement should be done in a way that containers to be loaded earlier are placed on top of those to be loaded later to avoid re-handlings. To minimize time to complete a remarshalling task, re-handlings should be avoided during remarshalling. Moreover, when multiple yard cranes are used for remarshalling, interference between the cranes should be minimized. In this paper, we present a simulated annealing approach to the efficient finding of a good intra-block remarshalling plan, which is free from re-handlings at the time of loading as well as during remarshalling.
1 Introduction
In a container terminal, the efficiency of container loading onto a ship is highly dependent on the storage location of the containers in the yard. The containers must be loaded onto the ship in a predetermined loading sequence. Therefore, if the containers are located adjacently in the yard for successive loading, the travel distance of yard cranes and service time for container loading can be greatly reduced. However, gathering containers close to one another is not sufficient for efficient loading because the loading sequence must be followed. Gathering containers without considering the sequence may cause a considerable number of re-handlings at the time of loading because the container to be fetched out at each time is often found under others. In addition, time required for intra-block remarshalling must be minimized for efficient use of valuable resources such as yard cranes. In particular, when multiple yard cranes are used collaboratively for remarshalling, care must be taken to minimize crane interference.
This work was supported by the Regional Research Centers Program (Research Center for Logistics Information Technology), granted by the Korean Ministry of Education Human Resources Development.
For a remarshalling task, cranes have to move tens of containers considering the sequence of container loading and interference between cranes. Therefore, the number of possible crane schedules is very large and finding the optimal one is intractable. Moreover, a crane schedule for remarshalling should be constructed in a reasonable time. In order to solve the scheduling problem efficiently, we applied simulated annealing[1]. Simulated annealing searches for a good partial order among containers to be rearranged that requires less time for remarshalling and does not make any re-handling during remarshalling and loading. A candidate solution expressed in the form of partial order graph does not regard the number of cranes to be used and their detailed movements but it can express a collection of possible re-handling-free total orders of container moves. Each candidate solution is used to construct a full crane schedule by a greedy evaluation heuristic with detailed crane simulation. By the heuristic, container moves are assigned to each crane and an appropriate order of these moves is determined under the constraint of the given partial order graph in a way of minimizing time for remarshalling, taking crane interference into account. The estimated time of the constructed full schedule is used as the quality of the partial order graph. Simulated annealing navigates the search space of partial order graphs using their evaluation results as a compass. The performance of the evaluation heuristic is adjustable by changing the number of container assignments looked forward in selecting a container for the next assignment. Therefore, under the limited time of scheduling, there are two possible directions of improving the quality of schedule: trying more candidates or constructing a crane schedule with more container assignments looked forward. Experimental results show that a better remarshalling schedule can be obtained by carefully balancing between the two. The next section explains the remarshalling problem in detail and reviews some related researches. The simulated annealing approach and the evaluation heuristic are presented in Section 3 and 4, respectively. In Section 5, the proposed method is tested and the results are summarized. Finally in Section 6, we give our conclusions and some topics for further research.
2 Intra-block Remarshalling
Intra-block remarshalling is a task of moving a set of target containers to designated bays within a block. We call the bays in which the target containers are located before remarshalling ‘source bays’ and the empty bays into which the target containers are moved ‘target bays.’ In Fig. 1, there are two source bays S1 and S2 and two target bays TW and TG . In this figure, each bay in the block is three rows wide and three tiers high. A bay can contain up to nine containers. Figure 1 also shows cross-sectional views of the source and target bays. A rectangle drawn with a solid line in the source bays is a container. In this example, fourteen target containers need to be moved during remarshalling. The number in each container refers to its order of loading at its target bay and an alphabetic character denotes the target bay for the container.
Fig. 1. An example of remarshalling problem (a block served by yard cranes 202 and 203, with source bays S1 and S2, target bays TW and TG, and cross-sectional views of the source and target bays)
Because a predetermined sequence of container loading in each target bay must be followed, if a container for later loading is moved and stacked on top of an earlier one, the upper container should be moved to a temporary location during container loading. For example, in Fig. 1, if container 4G is moved and stacked on top of container 3G in one of stacks in TG , then container 4G should be moved to somewhere before container 3G is being fetched out for loading. This temporary movement of container is called ‘re-handling.’ Re-handlings incur extra works of yard cranes and delay the work of container loading that is one of the most expensive and time-critical services of container terminals. Therefore, target containers should be moved into their target bays in a manner of avoiding such re-handlings. For a very simple solution of scheduling cranes for remarshalling, the moves of target containers can be sequenced in the reverse order of container loading. For example, in Fig. 1, a yard crane can move container 7W first into its target bay TW , container 6W the next, and so on. However, this approach may cause additional work of yard cranes during remarshalling. For example, in Fig. 1, after moving container 7W , a yard crane has to move container 5G temporarily to some other location before moving container 6W . Later in the remarshalling, container 5G will be moved again to be stacked into its proper target bay. These temporary movements of containers during remarshalling are also called re-handlings and they delay the work of remarshalling that seizes valuable resources such as yard cranes. Therefore, this type of re-handlings should also be avoided to finish remarshalling as early as possible. A re-handling-free schedule does not incur any re-handling situation during remarshalling as well as during loading. For a schedule to be free of re-handlings, the following two main constraints must be satisfied. Constraint-Re-handling-Free-In-Loading: After remarshalling, each container should be loaded onto the ship without re-handling. Constraint-Re-handling-Free-In-Remarshalling: Each container must be moved from its source bay to its target bay without re-handling. There has been a little research on the issue of remarshalling. Kim and Bae presented a two-stage approach to planning remarshalling.[2] In the first stage, containers to be moved and the target bays of selected containers are determined in a manner of minimizing the number of containers to be moved by considering the given ship profile. In the second stage, the sequence of container moves is determined to minimize crane movements. They assumed that remarshalling is
started and finished before the loading sequence is determined and a single crane is used for remarshalling. Moreover, they did not consider the specific storage slots for containers in the bays before and after moving. In contrast, our paper assumes that multiple cranes carry out remarshalling after the loading sequence is determined. Furthermore, we consider the specific storage slots of containers in the bays and interference between cranes. Some studies have developed strategies that determine a desirable stacking location for each container that comes into the yard. Dekker and Voogd described and compared various stacking strategies.[3] Each stacking strategy is evaluated by a few measures such as the number of re-handling occasions and workload distribution of yard cranes. Kim et al. derived a decision rule to locate export containers from an optimal stacking strategy to minimize the number of rehandlings in loading.[4] They assumed that containers are loaded onto a ship in the decreasing order of container weight group. These two studies tried to reduce the number of re-handlings by assigning the location of each container that comes into the yard more carefully. However, re-handlings in loading cannot be avoided completely for various reasons such as imprecise container weight information[5] and insufficient yard storage. Therefore, remarshalling may not be completely removed in most of the practical situations. Finally, there are studies on crane scheduling particularly when multiple cranes are used in a block. Ng presented a dynamic programming-based heuristic approach to crane scheduling to minimize the waiting time of trucks.[6] Dynamic programming is used to partition the block into ranges in which each crane can move; these ranges help crane interference to be resolved. However, these studies cannot be used to schedule cranes for remarshalling directly because a primitive task of remarshalling is moving a container from one place to another in the same block rather than giving a container to or taking a container from a truck. In addition, there are some order constraints among container moves imposed by the sequence of container loading. Therefore, a crane scheduling method that can efficiently consider the characteristics of remarshalling is required.
3 Using Simulated Annealing for Remarshalling
This section describes a method of deriving a partial order graph from the locations of containers before and after moving. The derived partial order graph is used as an initial solution of simulated annealing. An efficient method of generating valid neighbors from the current partial order graph is also presented in this section.
3.1 Determining Target Locations for Containers
In this sub-section, we present a heuristic method of determining target locations of containers under the two main constraints introduced in the previous section. Figure 2 shows a step-by-step example of the heuristic. In the initial state Fig. 2(a), containers 7W , 7G, 3G, 4G, 5G, and 5W are located on top of stacks
Planning for Intra-block Remarshalling in a Container Terminal Source bays
Target bays
(a)
Source bays
1215
Target bays
(d) 7W
4G
4W
3G 1G
1G 7G 3G
4W 5G 5W
5W
4G 5G
1W 2G 3W
2W 6W 6G
7W
7G 6G
3G
4W 1W
3G 1G
5W 2W
4G 5G
7W 3W
7G 6G
1W 2G 3W
(b)
2W 6W
(e) 7W 1G
4W 5G 5W
4G
1W 2G 3W
2W 6W 6G
7G
(c)
2G
6W
(f) 1G 1W 2G 3W
4W
3G
4W 1W
3G 1G
5G
5W
4G
5W 2W
4G 5G
2W 6W 6G
7W
7G
7W 3W 6W
7G 6G 2G
Fig. 2. Example of determining of the target locations of containers
in their source bays. These six containers can be moved to their target bays without violating Constraint-Re-handling-Free-In-Remarshalling. From these containers, some that belong to the same target bay can be selected to be moved and stacked without breaking Constraint-Re-handling-Free-In-Loading. For example, if 7G, 4G, and 5G are selected, the order of moves for these selected containers should be 7G → 5G → 4G to satisfy Constraint-Re-handling-Free-In-Loading. Although there can be other container selections for a stack, each selection can have only one order of moving if the selected containers belong to the same target bay. Figure 3 shows the detailed procedure of container selection for filling a stack; a code sketch of this selection procedure is given below. By the procedure, 4G, 7G, and 3G are selected, and the order of moves is 7G → 4G → 3G. The target locations of the three containers are now fixed. After removing these three containers from the source bays and placing them on a stack in their target bay, the state of the bays is shown in Fig. 2(b). The container selection and placing process is repeated until the target locations of all containers are determined. In this example, the set of containers {7W, 5W, 4W} is the next selection, and their order is 7W → 5W → 4W. When the target locations of all containers are determined, we obtain a target configuration defined by the locations of the containers at the target bays.
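The following Python sketch illustrates the kind of greedy selection the procedure of Fig. 3 performs. It is a simplification: the data representation (stacks as lists with the top container last), the container attributes target_bay and load_order, and the helper functions buried_targets and buried_targets_for_bay are assumptions made for illustration.

```python
import random

def movable_tops(source_bays):
    """Containers currently on top of a stack, i.e., movable without re-handling."""
    return [stack[-1] for bay in source_bays for stack in bay if stack]

def select_containers_for_stack(source_bays, stack_height):
    """Greedy selection of containers to fill one stack of a target bay (cf. Fig. 3)."""
    tops = movable_tops(source_bays)
    # 1. pick the target bay with the most movable containers (ties broken at random)
    counts = {}
    for c in tops:
        counts.setdefault(c.target_bay, []).append(c)
    best = max(len(v) for v in counts.values())
    bay = random.choice([b for b, v in counts.items() if len(v) == best])
    candidates = counts[bay]
    # 2. prefer containers with many target containers buried under them
    candidates.sort(key=lambda c: (-buried_targets(c, source_bays),
                                   -buried_targets_for_bay(c, source_bays, bay),
                                   random.random()))
    chosen = candidates[:stack_height]
    # move later-loaded containers first so earlier-loaded ones end up on top
    return sorted(chosen, key=lambda c: c.load_order, reverse=True)
```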
3.2 Deriving Partial Order Graph from the Locations of Containers
Figure 4(a) shows the source configuration defined by the initial positions of containers at the source bays for the remarshalling problem of Fig. 1. The source configuration constrains the order of container moves satisfying Constraint-Re-handling-Free-In-Remarshalling. If container c1 is placed on top of another container c2 on the same stack in a source configuration, c1 should be moved to its target bay before c2 to satisfy Constraint-Re-handling-Free-In-Remarshalling. Similarly, a target configuration, which is defined by the
1. Select target bay t that has the largest number of target containers which can be moved without violating Constraint-Re-handling-Free-In-Remarshalling.
   1.1. For tie breaking, one of the target bays is chosen at random.
   1.2. Let Ct be the set of containers that satisfy Constraint-Re-handling-Free-In-Remarshalling for the selected target bay t.
2. From Ct, select a sufficient number of containers to fill one of the empty stacks of t using the following priority rules. (In Fig. 2, a maximum of three containers can be stacked in a row.)
   2.1. Select the container that has the largest number of target containers under it.
   2.2. If a tie is found, then select the container that has the largest number of target containers for t under it.
   2.3. If a tie still exists, then select one of these containers at random.
Fig. 3. Procedure for selecting containers to fill a stack
positions of containers at the target bays, also restricts the order of moves to fulfill the two main constraints. In a stack of a target configuration, if container c1 is prior to container c2 in loading, c2 must be moved before c1 so that c1 can be placed on top of c2. By combining these two types of ordering constraints imposed by the source and target configurations, a partial order graph can be derived that expresses all the possible re-handling-free sequences of moves under the two configurations. The derived partial order graph is shown in Fig. 4(c). A solid arrow in the figure represents an order constraint issued by the source configuration, and a dotted arrow is a constraint derived from the target configuration. In our experiment, a derived partial order graph is used as the initial solution of simulated annealing.
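A minimal sketch of this derivation is shown below, assuming each configuration is given as a list of stacks with the bottom container first; the edge direction encodes "must be moved before".

```python
def derive_partial_order(source_config, target_config):
    """Build precedence edges (a, b) meaning container a must move before b.
    Source stacks: the container on top must leave before the one below it.
    Target stacks: the container placed lower must arrive before the one above it."""
    edges = set()
    for stack in source_config:                 # containers listed bottom ... top
        for lower, upper in zip(stack, stack[1:]):
            edges.add((upper, lower))           # top of the stack moves first
    for stack in target_config:                 # containers listed bottom ... top
        for lower, upper in zip(stack, stack[1:]):
            edges.add((lower, upper))           # lower slot must be filled first
    return edges
```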
Fig. 4. Partial ordering of container moves: (a) source configuration, (b) target configuration, (c) partial order graph
3.3 Generating Neighborhood Solutions
A different partial order graph can be obtained by slightly modifying the current one. Figure 5 shows an example of generating neighbors from the partial order graph in Fig. 4(c). In the graph, two containers on different stacks of the same target bay are selected at random for swapping. After swapping the
Fig. 5. An example of generating neighbors from the partial order graph in Fig. 4(c): (a) candidate solution 1, (b) candidate solution 2
selected containers, a new partial order graph can be generated. For example, in Fig. 5(a), containers 2G and 4G are selected and a neighbor, which is slightly different from the partial order graph of Fig. 4(c), is obtained. Note that some containers should be reordered in their stacks according to their turns of loading to satisfy Constraint-Re-handling-Free-In-Loading. A single container can be transferred to another non-full stack of the same target bay by swapping it with a virtual container, as shown in Fig. 5(b). A container swapping does not always lead to a valid partial order graph: when a cycle exists in the modified graph, no re-handling-free total order can be derived from it. Another swapping is tried on the original graph if the previous swapping introduces a cycle. Cycle detection can be performed in linear time proportional to the number of containers [7].
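The neighbor-generation step can be sketched as follows, reusing the derive_partial_order helper sketched above. The cycle test here uses a simple depth-first search rather than the stack-based method of [7], and the representation of a solution as target stacks per bay is an assumption for illustration.

```python
import random
from copy import deepcopy

def has_cycle(edges):
    """Simple DFS cycle check on precedence edges (a, b): a before b."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
        succ.setdefault(b, [])
    state = {n: 0 for n in succ}                        # 0 unvisited, 1 on stack, 2 done
    def visit(n):
        state[n] = 1
        for m in succ[n]:
            if state[m] == 1 or (state[m] == 0 and visit(m)):
                return True
        state[n] = 2
        return False
    return any(state[n] == 0 and visit(n) for n in succ)

def random_neighbor(target_bays, source_stacks, max_tries=50):
    """target_bays: {bay name: list of stacks}; each stack lists containers bottom-up.
    Swap two containers of one target bay (or move one to a shorter stack), reorder
    each touched stack by loading order, and accept only an acyclic partial order."""
    for _ in range(max_tries):
        cand = deepcopy(target_bays)
        stacks = cand[random.choice(list(cand))]
        s1, s2 = random.sample(range(len(stacks)), 2)
        if not stacks[s1]:
            continue
        i = random.randrange(len(stacks[s1]))
        j = random.randrange(len(stacks[s2]) + 1)       # last index acts as a virtual container
        if j == len(stacks[s2]):
            stacks[s2].append(stacks[s1].pop(i))
        else:
            stacks[s1][i], stacks[s2][j] = stacks[s2][j], stacks[s1][i]
        for s in (stacks[s1], stacks[s2]):              # earlier-loaded containers end up on top
            s.sort(key=lambda c: c.load_order, reverse=True)
        target_stacks = [s for b in cand.values() for s in b]
        if not has_cycle(derive_partial_order(source_stacks, target_stacks)):
            return cand
    return None
```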
4 Constructing a Schedule from Partial Order Graph
This section presents a detailed description of the evaluation heuristic. The evaluation heuristic constructs an executable crane schedule by assigning each container move to one of the cranes and by determining the order of the moves performed by each crane. It also detects crane interference by simulation based on the state-transition-graph model. Figure 6 shows a snapshot of constructing a crane schedule using the heuristic. In this figure, some container moves have already been assigned and currently crane A is moving container 2W. A new container assignment is required for the idle crane B, which has just finished the work of moving container 3W. Under the constraints of the given partial order graph, containers 1G and 6W are the two possible containers for assignment to crane B. The heuristic simulates all the possible container assignments and selects the best-looking one to construct the partially built crane schedule incrementally.
Fig. 6. An example of partially built crane schedule by the evaluation heuristic
The heuristic uses a depth-limited branch-and-bound search [8] to find the best-looking container. Figure 7 shows a tree constructed by the heuristic with a maximum depth of four. For each of the containers, there are two possible scenarios for avoiding crane interference when two cranes are used for remarshalling and interference is detected by simulation. Interference between the cranes can be resolved by giving the priority of working to one of the two cranes. The crane that is given the priority works without waiting; in contrast, the other should wait at the location nearest to the crane with priority to avoid interference. Therefore, a node in the tree represents both a crane priority and a container assignment. The crane that has priority in each node is shown underlined. A node is evaluated by the delay time, denoted on the right of the node, incurred during the work of moving the assigned container. In Fig. 7, container 1G is selected as the new container assigned to crane B, and crane A will take the priority in moving container 2W. The partially built crane schedule of Fig. 6 is expanded by adding the selected container 1G to crane B. Another tree search will be performed for the next new container assignment, for crane A, because crane A will finish its work of moving container 2W earlier than crane B. This process is repeated until all the containers are assigned to one of the cranes. The maximum depth L of the depth-limited search is adjustable. Usually a better crane schedule can be obtained with a higher L. However, a higher L requires a longer evaluation time and results in fewer trials at the level of simulated annealing if the time for crane scheduling is limited. The results of the experiment show that a better remarshalling schedule can be obtained by adjusting the balance between trying more partial order graphs and evaluating each graph with more forward looking.
Fig. 7. Depth-limited branch-and-bound for assigning a container to idle crane B
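The following sketch outlines the shape of such a depth-limited lookahead. It is a simplification: it scores assignments by simulated delay through the hypothetical helpers ready_containers and simulate_move, omits the crane-priority branching shown in Fig. 7, and omits the bounding (pruning) step of a true branch-and-bound.

```python
def best_next_assignment(schedule, graph, idle_crane, depth):
    """Among containers whose predecessors in the partial order graph are already
    scheduled, pick the one whose greedy continuation for `depth` further
    assignments yields the least total simulated delay."""
    def lookahead(state, remaining_depth):
        candidates = ready_containers(state, graph)      # respects the partial order
        if remaining_depth == 0 or not candidates:
            return 0.0
        best = float("inf")
        for c in candidates:
            # the simulation assigns the move to whichever crane becomes idle first
            next_state, delay = simulate_move(state, c)
            best = min(best, delay + lookahead(next_state, remaining_depth - 1))
        return best

    choices = []
    for c in ready_containers(schedule, graph):
        state, delay = simulate_move(schedule, c)
        choices.append((delay + lookahead(state, depth - 1), c))
    if not choices:
        return None
    return min(choices, key=lambda t: t[0])[1]
```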
5 Results of Experiment
For the experiment, we chose an environment with 33 bays in a block, each bay consisting of nine rows by six tiers. Two non-crossing cranes were used for remarshalling, and the cranes required at least a five-bay gap for suitable working clearance. Four empty target bays were used to stack 196 target containers. Initially, the target containers were randomly scattered within the block. Five different scenarios were used to test our approach on problems of various difficulties by changing the crowdedness of target containers in the source bays. The time for searching for the remarshalling schedule was limited to 10 minutes. About 15,800 candidate solutions were evaluated within this time limit by the depth-limited branch-and-bound with L = 1; about 14 candidates were evaluated with L = 8 in the same time limit. We also tried another local-search algorithm, hill-climbing search, which did not show a better performance than simulated annealing. A 3.2 GHz Pentium PC was used for the experiments and each experiment was repeated ten times. Figure 8 shows the performance of the search with different L for each scenario. The graph depicts the relative estimated time of the generated remarshalling schedules compared to those generated by the search with L = 1. It is easily noticeable that a higher L does not always give a better remarshalling schedule when the running time is limited. There is a point of balance between the maximum depth in evaluation and the number of evaluations for each scenario.
6 Conclusions
This paper presented a method of generating a remarshalling schedule for multiple cranes. A partial order graph is used to make a schedule free of re-handlings, and simulated annealing is applied to minimize the time required for remarshalling at the level of partial order graphs. We also presented an evaluation heuristic for constructing a full crane schedule from a partial order graph.
Fig. 8. The relative estimated time of remarshalling with schedules generated by simulated annealing with various maximum depths of evaluation heuristics (relative estimated time of remarshalling versus the maximum depth L of the depth-limited branch-and-bound, for the five scenarios with target containers spread over 4, 6, 8, 16, and 29 bays)
Our proposed method can generate an efficient crane schedule in a reasonable time. The results of the experiment show that a better remarshalling schedule can be obtained by carefully adjusting the running time of the evaluation heuristic under a limited time of computation. For further research, we are interested in finding a smart way of generating neighbors by selecting containers to be swapped considering feedback information that can be obtained in the process of applying the evaluation heuristic.
References
1. Aarts, E., Korst, J.: Simulated Annealing. Local Search in Combinatorial Optimization. John Wiley & Sons (1997) 91-120
2. Kim, K. H., Bae, J.-W.: Re-Marshaling Export Containers. Computers and Industrial Engineering 35(3-4) (1998) 655-658
3. Dekker, R., Voogd, P.: Advanced methods for container stacking. Proceedings of the International Workshop on Intelligent Logistics Systems (2005) 3-29
4. Kim, K. H., Park, Y. M., Ryu, K. R.: Deriving decision rules to locate export containers in container yards. European Journal of Operational Research 124 (2000) 89-101
5. Kang, J., Ryu, K. R., Kim, K. H.: Determination of Storage Locations for Incoming Containers of Uncertain Weight. Proceedings of the 19th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (2006) 1159-1168
6. Ng, W. C.: Crane scheduling in container yards with inter-crane interference. European Journal of Operational Research 164 (2005) 64-78
7. Nivasch, G.: Cycle Detection Using a Stack. Information Processing Letters 90(3) (2004) 135-140
8. Russell, S. J., Norvig, P.: Artificial Intelligence: A Modern Approach (second edition). Prentice Hall (2002)
Develop Acceleration Strategy and Estimation Mechanism for Multi-issue Negotiation Hsin Rau and Chao-Wen Chen Department of Industrial Engineering, Chung Yuan Christian University Chungli, Taiwan 320, Republic of China
[email protected] Abstract. In recent years, negotiation has become a powerful tool in electronic commerce. When two negotiation parties still have a lot of space to negotiate, little-by-little concession is of no benefit to the negotiation process. In order to improve negotiation efficiency, this study proposes a negotiation acceleration strategy to facilitate negotiation. In addition, this paper develops an estimation mechanism based on a regression technique to estimate the preference of the opponent, with which the joint utility of the negotiation can be maximized. Finally, an example is given to illustrate the proposed estimation mechanism.
1 Introduction

Negotiation is a process by means of which agents communicate and compromise to reach mutually beneficial agreements [8]. Recently, negotiation has become a powerful tool to aid electronic commerce. Regarding negotiation strategy, Jennings and Wooldridge [3] considered that a negotiation agent should increase its capability of intelligence and learning in order to know the opponent's negotiation attitude. Based on this idea, researchers considered that agents should be able to respond to an opponent's actions intelligently, and proposed algorithms or rules to develop new negotiation strategies or tactics [4-10]. The factors in a negotiation strategy that influence the negotiation most are time, the opponent's concession attitude, the offer range, etc. However, little-by-little concession is of no benefit when two negotiation parties still have a lot of space to negotiate. In order to improve negotiation efficiency, this study extends the negotiation decision functions (NDFs) proposed by Faratin et al. [2] to an acceleration strategy that speeds up negotiation. In addition, a regression technique is used to estimate the opponent's preference. Moreover, this study uses linear programming to find the maximal joint utility in order to maximize satisfaction. Finally, an example is given to illustrate the proposed methodology. The rest of this paper is organized as follows. Section 2 briefly introduces negotiation decision functions. Section 3 develops the negotiation acceleration strategy. Section 4 discusses how to estimate the opponent's preference. Section 5 describes the methodology for maximizing utility. Section 6 gives an example for illustration. In the final section, a conclusion is provided.
2 Negotiation Decision Functions

This study extends the negotiation model proposed by Faratin et al. [2]. In their negotiation model, two parties negotiate on one issue or on multiple issues, such as price, delivery time, quality, etc. Let $x_j^{a \to b}(t_n)$ be an offer proposed by agent a to agent b for negotiation issue j at time $t_n$, and let $t_{\max}^{a}$ be agent a's negotiation deadline. A negotiation thread, denoted $X_j^{a \to b}(t_n)$, is a sequence of alternating offers of the form $(x_j^{a \to b}(t_0), x_j^{b \to a}(t_1), x_j^{a \to b}(t_2), \ldots)$. The negotiation is terminated by an acceptance or a withdrawal from either side. Agent a's response at $t_n$ to agent b's offer sent at time $t_{n-1}$ is defined as:

$$I^{a}(t_n, x_j^{b \to a}(t_{n-1})) = \begin{cases} \text{reject} & \text{if } t_n > t_{\max}^{a} \\ \text{accept} & \text{if } V^{a}(x_j^{b \to a}(t_{n-1})) \ge V^{a}(x_j^{a \to b}(t_n)) \\ x_j^{a \to b}(t_n) & \text{otherwise} \end{cases} \quad (1)$$
where $x_j^{a \to b}(t_n)$ is the counter offer from agent a to agent b when an offer $x_j^{b \to a}(t_{n-1})$ is not accepted by a. Agent i (a or b) has a scoring function $V_j^{i}: [\min_j^{i}, \max_j^{i}] \to [0, 1]$ that gives the score (or utility) agent i assigns to a value of issue j in the range of its acceptable values. $V_j^{i}$ is defined in Eqs. (2) and (3). $w_j^{a}$ is agent a's weight for issue j, which indicates the importance of issue j for agent a. J is the number of issues.

$$V_j^{i}(x_j^{i}) = \begin{cases} (x_j^{i} - \min_j^{i})/(\max_j^{i} - \min_j^{i}) & \text{if } V_j^{i} \text{ is an increasing function of } x_j^{i} \\ (\max_j^{i} - x_j^{i})/(\max_j^{i} - \min_j^{i}) & \text{if } V_j^{i} \text{ is a decreasing function of } x_j^{i} \end{cases} \quad (2)$$

$$V^{a}(x) = \sum_{j=1}^{J} w_j^{a}\, v_j^{a}(x_j) \quad (3)$$
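A small Python sketch of this additive scoring model is given below; the issue names, ranges, and weights in the example call are made up for illustration.

```python
def issue_score(value, lo, hi, increasing=True):
    """Eq. (2): normalised score of one issue value within [lo, hi]."""
    s = (value - lo) / (hi - lo)
    return s if increasing else 1.0 - s

def offer_utility(offer, issues):
    """Eq. (3): weighted sum of issue scores; `issues` maps an issue name
    to (weight, lo, hi, increasing)."""
    return sum(w * issue_score(offer[name], lo, hi, inc)
               for name, (w, lo, hi, inc) in issues.items())

# Example (hypothetical buyer): price is better when low, quality when high.
issues = {"price": (0.6, 100.0, 200.0, False), "quality": (0.4, 0.0, 10.0, True)}
print(offer_utility({"price": 150.0, "quality": 8.0}, issues))   # 0.6*0.5 + 0.4*0.8 = 0.62
```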
The NDFs proposed by Faratin et al. [2] include three families of tactics: time-dependent, resource-dependent, and behavior-dependent. Since the negotiation strategies used in practice mostly relate to time, we use the time-dependent tactic in this study. The characteristic of the time-dependent tactic is that the agent makes an offer according to time with a concession parameter $\beta$, described by the concession function $\alpha_j^{a}(t)$ and the offers shown in Eqs. (4) and (5).

$$x_j^{a \to b}(t_n) = \begin{cases} x_j^{a \to b}(t_0) + \alpha_j^{a}(t_n)\,\big(x_j^{a \to b}(t_{\max}^{a}) - x_j^{a \to b}(t_0)\big) & \text{if } x_j^{a \to b}(t_n) \text{ is decreasing} \\ x_j^{a \to b}(t_0) + \alpha_j^{a}(t_n)\,\big(x_j^{a \to b}(t_{\max}^{a}) - x_j^{a \to b}(t_0)\big) & \text{if } x_j^{a \to b}(t_n) \text{ is increasing} \end{cases} \quad (4)$$

$$\alpha_j^{a}(t) = k_j^{a} + (1 - k_j^{a})\left(\frac{\min(t_n, t_{\max}^{a})}{t_{\max}^{a}}\right)^{1/\beta} \quad (5)$$
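A minimal sketch of this time-dependent offer generation is shown below; the parameter values in the example call are arbitrary.

```python
def alpha_time(t, t_max, k=0.0, beta=1.0):
    """Eq. (5): time-dependent concession degree in [k, 1]."""
    return k + (1.0 - k) * (min(t, t_max) / t_max) ** (1.0 / beta)

def offer_at(t, t_max, x0, x_at_deadline, k=0.0, beta=1.0):
    """Eq. (4): offer at time t, moving from the initial offer x0 towards the
    value offered at the deadline."""
    return x0 + alpha_time(t, t_max, k, beta) * (x_at_deadline - x0)

# A seller conceding from 200 down to 120 over 10 rounds, with slow early concession (beta < 1).
print([round(offer_at(t, 10, 200.0, 120.0, beta=0.5), 1) for t in range(11)])
```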
The time-dependent tactic is divided into three ranges according to the value of $\beta$: (1) $\beta < 1$, (2) $\beta = 1$, and (3) $\beta > 1$. The details can be found in Faratin et al. (1998).
3 Develop a Negotiation Acceleration Strategy

In NDFs, two negotiation agents might spend a lot of time advancing only a little toward their negotiation settlement. In order to improve this situation, this study proposes a negotiation acceleration strategy for a single issue, as expressed in Eq. (6).

$$\alpha_j^{a}(t_n) = \Big[\big(\alpha_j^{a}(t_{n-1}) + \mathrm{CTF}(t_n)\big) + \big(1 - (\alpha_j^{a}(t_{n-1}) + \mathrm{CTF}(t_n))\big) \times \gamma \times \mathrm{CAF}(t_n)\Big]^{1/\beta} \quad (6)$$
where $\gamma \in [0, 1]$. CTF is a concession time function and can be defined as the ratio of the remaining concession to the remaining time, as shown in Eq. (7).

$$\mathrm{CTF}(t_n) = \frac{1 - \alpha_j^{a}(t_{n-1})}{t_{\max}^{a} - t_{n-1}} \quad (7)$$
CAF evaluates the distance between the agents' current offers (see Fig. 1) relative to the distance between their initial offers, as shown in Eq. (8).

$$\mathrm{CAF}(t_n) = \begin{cases} \dfrac{x_j^{a \to b}(t_{n-1}) - x_j^{b \to a}(t_{n-1})}{x_j^{a \to b}(t_0) - x_j^{b \to a}(t_0)} & \text{if } x_j^{a \to b}(t_n) \text{ is decreasing} \\[2ex] \dfrac{x_j^{b \to a}(t_{n-1}) - x_j^{a \to b}(t_{n-1})}{x_j^{b \to a}(t_0) - x_j^{a \to b}(t_0)} & \text{if } x_j^{a \to b}(t_n) \text{ is increasing} \end{cases} \quad (8)$$

Fig. 1. Illustration of concession acceleration function (CAF)
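The acceleration strategy of Eqs. (6)–(8) can be sketched as follows; the decreasing-offer case (e.g., a seller lowering its price) is assumed, and the example values are arbitrary.

```python
def ctf(alpha_prev, t_prev, t_max):
    """Eq. (7): remaining concession divided by remaining time."""
    return (1.0 - alpha_prev) / (t_max - t_prev)

def caf(own_prev, opp_prev, own_init, opp_init):
    """Eq. (8), decreasing-offer case: current offer gap relative to the initial gap."""
    return (own_prev - opp_prev) / (own_init - opp_init)

def accelerated_alpha(alpha_prev, t_prev, t_max, own_prev, opp_prev,
                      own_init, opp_init, gamma=0.5, beta=1.0):
    """Eq. (6): accelerated concession degree."""
    base = alpha_prev + ctf(alpha_prev, t_prev, t_max)
    acc = caf(own_prev, opp_prev, own_init, opp_init)
    return (base + (1.0 - base) * gamma * acc) ** (1.0 / beta)

# Seller started at 200, buyer at 100; current offers 180 and 140 at round 3 of 10.
a = accelerated_alpha(0.3, 3, 10, own_prev=180, opp_prev=140,
                      own_init=200, opp_init=100, gamma=0.5, beta=1.0)
print(round(a, 3))   # 0.52
```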
Fig. 2. Effects of weight and gamma: (a) effect of weight; (b) effect of γ (gamma)
CAF ∈ [−1, 1]. When $x_j^{b \to a}(t_{n-1}) > x_j^{a \to b}(t_{n-1})$, CAF > 0 indicates normal concession, whereas CAF < 0 [...]

[...] $\xi_i > 0, \quad i = 1, 2, \ldots, N$   (10)
1236
J. Martyna
Thus the following minimization problem is considered:

$$\min_{w,b,\xi} J(w, b, \xi) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i\big[w^T \phi(x_i) + b\big] \ge 1 - \xi_i,\ \ \xi_i > 0,\ \ i = 1, 2, \ldots, N,\ \ C > 0 \quad (11)$$
where C is a positive constant parameter used to control the tradeoff between the training error and the margin. The dual of the system (11), obtained from the Karush-Kuhn-Tucker (KKT) conditions [9], leads to a well-known convex quadratic programming (QP) problem. The solution of the QP problem is slow for large vectors and it is difficult to implement in an on-line adaptive form. Therefore, a modified version of the SVM called the Least Squares SVM (LS-SVM) was proposed by Suykens et al. [18]. In the LS-SVM method, the following minimization problem is formulated:

$$\min_{w,b,e} J(w, b, e) = \frac{1}{2} w^T w + \gamma\,\frac{1}{2}\sum_{k=1}^{N} e_k^2 \quad \text{subject to} \quad y_k\big[w^T \phi(x_k) + b\big] = 1 - e_k,\ \ k = 1, 2, \ldots, N \quad (12)$$
The corresponding Lagrangian for Eq. (12) is given by

$$L(w, b, e; \alpha) = J(w, b, e) - \sum_{k=1}^{N} \alpha_k \big\{ y_k [w^T \phi(x_k) + b] - 1 + e_k \big\} \quad (13)$$
where the $\alpha_k$ are the Lagrange multipliers. The optimality conditions lead to the following $(N+1) \times (N+1)$ linear system

$$\begin{bmatrix} 0 & Y^T \\ Y & \Omega^* + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix} \quad (14)$$

where $Z = [\phi(x_1)^T y_1, \ldots, \phi(x_N)^T y_N]$, $Y = [y_1, \ldots, y_N]$, $\mathbf{1} = [1, \ldots, 1]$, $\alpha = [\alpha_1, \ldots, \alpha_N]$, and $\Omega^* = Z Z^T$. Due to the application of Mercer's condition [18] there exists a mapping and an expansion

$$\Omega^*_{kl} = y_k y_l\, \phi(x_k)^T \phi(x_l) = y_k y_l\, K(x_k, x_l) \quad (15)$$

Thus, the LS-SVM model for function estimation is given by

$$y(x) = \sum_{k=1}^{N} \alpha_k y_k\, K(x, x_k) + b \quad (16)$$
where the parameters $\alpha_k$ and b are obtained from the solution of Eqs. (14) and (15). The parameters $\alpha_k$ and b define the optimal desired weight vector for the bandwidth reservation when the MT crosses the shadowed areas, and the parameter $y_k$ carries the corresponding information. In comparison with the standard SVM method, the LS-SVM has a lower computational complexity and lower memory requirements.
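As an illustration of how the model of Eqs. (14)–(16) can be trained, the following NumPy sketch solves the linear system with an RBF kernel; the kernel choice and the parameter values are assumptions, since they are not stated in this part of the paper.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the (N+1)x(N+1) system of Eq. (14) for b and alpha."""
    N = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)           # Eq. (15)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                                      # b, alpha

def lssvm_predict(X_train, y_train, b, alpha, X_new, sigma=1.0):
    """Eq. (16): decision values for new inputs."""
    return rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + b
```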
4 Simulation Results
In this section, we give some simulation results presenting the effectiveness of our approach. We compared the proposed LS-SVM bandwidth reservation method with the fuzzy logic controller (FLC) for bandwidth reservation. Our simulation was restricted to 36 cells in the cellular IP network; in each cell one base station (BS) is allocated. Additionally, we assumed that 100 Mbps were accessible in each cell. Each new call is an MT and it will move to one of the six neighbouring cells with equal probability. We assumed that there were two classes of traffic flow, where each class is described by a triple defining the burst size of the flow, the average rate of the flow, and the end-to-end delay requirement for that class [5]. Each base station can service a traffic flow of any class. Figures 4 and 5 show the call-blocking probability and the call-dropping probability for the two schemes: with the LS-SVM bandwidth reservation method and with the fuzzy logic reservation controller. The depicted graphs illustrate that the proposed LS-SVM method can improve the call-blocking probability by ca. 30% for a low call arrival rate and by ca. 100% for a high call arrival rate. Analogously, the call-dropping probability is also improved: by ca. 10% for a low call arrival rate and by ca. 30% for a high call arrival rate.
Fig. 4. Call-blocking probability versus call arrival rate (first and second traffic class, LS-SVM versus FLC)
Fig. 5. Call-dropping probability versus call arrival rate (first and second traffic class, LS-SVM versus FLC)
It can also be seen that the bandwidth reservation method performs somewhat better for light traffic flows than for heavy traffic flows.
5 Conclusion
In this paper we have established a new scheme for bandwidth reservation in wireless IP networks. We have used the LS-SVM method to determine the required bandwidth in neighbouring cells. The obtained results show that the LS-SVM method is a good candidate for effectively improving the handoff call-blocking probability and the call-dropping probability. In future work, the investigated method will be compared with other methods of bandwidth reservation.
References 1. S. Basagni, M. Conti, S. Giordano, I. Stojmenovi´c, ”Mobile Ad Hoc Networking”, IEEE Press, John Wiley and Sons, Inc. (2004). 2. S. Chen, A.K. Samingan, L. Hanzo, ”Support Vector Machine Multiuser Receiver for DS-CDMA Signals in Multipath Channels”, IEEE Trans. on Neural Networks, Vol. 12, No. 3 (2001) 604 - 611. 3. S. Choi, K.G. Kin, ”Predictive and Adaptive Bandwidth Reservation for Handoffs in QoS-Sensitive Cellular Networks”, in: Proceedings of ACM SIGCOMM ’98, Vancouver (1998). 4. C. Cortes, V.N. Vapnik, ”Support Vector Networks”, Machine Learning, 20, (1995) 273 - 297. 5. S. Dixit, R. Prasad (Eds.), ”Wireless IP and Building the Mobile Internet”, Artech House, Boston, London (2003). 6. M. Ei-kadi, S. Olariu, H. Abdel-Wahab, ”Rate-Based Borrowing Scheme for QoS Provisioning in Multimedia Wireless Networks”, IEEE Trans. on Parallel and Distributed Systems, Vol. 13, No. 2 (2002) 156 - 166.
7. X. Gong, A. Kuh, ”Support Vector Machine for Multiuser Detection in CDMA Communications”, in: The 33rd Asilomar Conference on Signals, Systems, and Computers, Vol. 1 (1999) 680 - 684. 8. M. Hasegawa, G. Wu, M. Mizuno, ”Applications of Nonlinear Prediction Methods to the Internet Traffic”, in: The 2001 IEEE International Symposium on Circuits and Systems, Vol. 2 (2001) III-169 - III-172. 9. H. Kuhn, A. Tucker, ”Nonlinear Programming”, in: Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probabilistics, University of California Press (1951) 481 - 492. 10. G.-S. Kuo, P.-C. Ko, M.-L. Kuo, ”A Probabilistic Resource Estimation and SemiReservation Scheme for Flow-Oriented Multimedia Wireless Networks”, IEEE Communications Magazine (2001) 135 - 141. 11. J.-H. Lee, S.-U. Yoon, S.-K. Youm, C.-H. Kang, ”An Adaptive Resource Allocation Mechanism Including Fast and Reliable Handoff in IP-Based 3G Wireless Networks”, IEEE Personal Communications (2000) 42 - 47. 12. D.A. Levine, I.F. Akyildiz, M. Naghshineh, ”A Resource Estimation and Call Admission Algorithm for Wireless Multimedia Networks Using the Shadow Cluster Concept”, IEEE/ACM Trans. on Networking, Vol. 5, No. 1 (1997) 1 - 12. 13. T. Liu, P. Bahl, I. Chlamtac, ”Mobility Modeling, Location Tracking, and Trajectory Prediction in Wireless ATM Networks”, IEEE Journal on Selected Areas in Communications, Vol. 16, No. 6 (1998) 922 - 936. 14. S. Lu, V. Bharghavan, ”Adaptive Resource Management Algorithm for Indoor Mobile Computing Environment”, in: Proceedings of ACM SIGCOMM ’96, Stanford, CA, Sept. (1996). 15. M. Naghshineh, M. Schwartz, ”Distributed Call Admission Control in Mobile/Wireless Networks”, IEEE Journal on Selected Areas in Communications, Vol. 14, No. 4 (1996) 711 - 717. 16. R. Ramjee, R. Nagarajan, D. Towsley, ”On Optimal Call Admission Control in Cellular Networks”, Wireless Networks Journal, Vol. 3, No. 1 (1997) 29 - 41. 17. Q. Ren, G. Ramamurthy, ”A Real-Time Dynamic Connection Admission Controller Based on Traffic Modeling, Measurement and Fuzzy Logic Control”, IEEE Journal on Selected Areas in Communications, Vol. 18, No. 2 (2000) 184 - 196. 18. J.A.K. Suykens, T. van Gestel, J. de Brabanter, B. de Moor, J. Vandewalle, ”Least Squares Support Vector Machines”, World Scientific, New Jersey, London, Singapore, Hong Kong (2002). 19. V.N. Vapnik, ”The Nature of Statistical Learning Theory”, Springer-Verlag, Berlin, Heidelberg, New York (1995). 20. V.N. Vapnik, ”Statistical Learning Theory”, John Wiley and Sons, (1998). 21. V.N. Vapnik, ”The Support Vector Method of Function Estimation”, in: J.A.K. Suykens, J. Vandewolle (Eds.), ”Nonlinear Modeling: Advanced Black-box Techniques”, Kluwer Academic Publishers, Boston (1998) pp. 55 - 85.
An Ontology-Based Intelligent Agent for Respiratory Waveform Classification Chang-Shing Lee and Mei-Hui Wang Department of Computer Science and Information Engineering National University of Tainan, Tainan, Taiwan
[email protected] Abstract. This paper presents an ontology-based intelligent agent for respiratory waveform classification to help the medical staff judge the respiratory waveforms produced by a ventilator. We present the manual construction tool (MCT), the respiratory waveform ontology (RWO), and the intelligent classification agent (ICA) to implement the classification of the respiratory waveform. The MCT allows the medical experts to construct and store the fuzzy numbers of respiratory waveforms to the RWO. When the ICA receives an input respiratory waveform (IRW), it will retrieve the fuzzy numbers from the RWO to carry out the classification task. Next, the ICA will send the classified results to the medical experts to make a confirmation and store the classified results to the classified waveform repository (CWR). The experimental results show that our approach can classify the respiratory waveform effectively and efficiently.
1 Introduction

Research on ontologies has spread widely, and ontologies have become critical components in knowledge management, the Semantic Web, business-to-business applications, and several other application areas [2]. For example, C. S. Lee et al. [8] proposed a fuzzy ontology application to news summarization, M. H. Burstein [4] presented a dynamic invocation of semantic web services using unfamiliar ontologies, A. Hameed et al. [6] presented an approach to acquire knowledge and construct multiple experts' ontologies in a uniform way, and R. Navigli et al. [9] presented an OntoLearn system for automated ontology learning to extract relevant domain terms from a corpus of text, relate them to appropriate concepts in a general-purpose ontology, and detect taxonomic and other semantic relations among the concepts. C. S. Lee et al. [7] proposed an intelligent fuzzy agent for a meeting scheduling decision support system. Currently, for long-term mechanical ventilation (LMV) patients, the ventilator provides the physician with vital information, such as the pressure-time waveform, to assist in diagnosis. Besides, the medical staff often need to spend much time monitoring and making sense of a large amount of information about patients, such as medical history, medicine administered, and physiological variables. Physiological variables are continuously generated over time, and generally correspond to physio-pathological processes that require rapid responses. Therefore, P. Félix et al. [5] proposed the FTP (Fuzzy Temporal Profile) model to represent and reason
on information concerning the evolution of a physical parameter, and also studied the applicability of this model to the recognition of signal patterns. In addition, because of the development of both the healthcare sciences and information technologies, ICU units handle an ever-increasing amount of signals and parameters, so S. Barro et al. [1] proposed a patient supervision system and discussed the role that fuzzy logic plays in its design. In this paper, we present an ontology-based intelligent agent for respiratory waveform classification. The experimental results show that our approach can work efficiently and effectively for classifying the respiratory waveform. The remainder of this paper is organized as follows. In Section 2, we briefly present the structure of the respiratory waveform ontology. Section 3 introduces the structure of the ontology-based intelligent agent for respiratory waveform classification. The experimental results are presented in Section 4. Finally, the conclusions are drawn in Section 5.
2 The Structure of Respiratory Waveform Ontology

In this section, we first briefly introduce the basic notions of fuzzy numbers. If a fuzzy set satisfies normality and convexity, then it is called a fuzzy number [5]. For example, a fuzzy number $\pi_C$ can be denoted by C = (a, b, c, d), a ≤ b ≤ c ≤ d, where [b, c] is the core, $\mathrm{core}(C) = \{v \in R \mid \pi_C(v) = 1\}$, and ]a, d[ is the support, $\mathrm{supp}(C) = \{v \in R \mid \pi_C(v) > 0\}$ [10]. We call a, b, c, and d the begin support, begin core, end core, and end support, respectively [10]. Now we give the definition of the fuzzy number used to generate the possibility distribution of the respiratory waveform slope. A fuzzy number $\tilde{M}$ is of LR-type if there exist reference functions L, R and scalars $\alpha > 0$, $\beta > 0$ with

$$\mu_{\tilde{M}}(x) = \begin{cases} L\!\left(\dfrac{m - x}{\alpha}\right) & x \le m \\[1.5ex] R\!\left(\dfrac{x - m}{\beta}\right) & x \ge m \end{cases} \quad (1)$$

where m, called the mean value of $\tilde{M}$, is a real number, and $\alpha$ and $\beta$ are called the left and right spreads, respectively. Symbolically, $\tilde{M}$ is denoted by $(m, \alpha, \beta)_{LR}$ [11]. Second, we introduce the concepts of the ventilator waveform. Fig. 1 shows the typical pressure-time diagram under volume-controlled, constant-flow ventilation. As shown in Fig. 1, the airway pressure depends on the alveolar pressure and the total of all airway resistances, and it can be affected by the resistance and compliance values specific to the ventilator and the lung. On inspiration, the intra-thoracic volume is increased; this lowers the intra-pleural pressure, making it more negative and causing the lungs to expand and the air to enter [3]. Consequently, at the beginning of the inspiration, the pressure between points A and B increases dramatically on account of the resistances in the system. Then, after point B, the pressure increases in a
Fig. 1. Pressure-time diagram under the volume-controlled and constant flow (pressure versus time over the inspiration and expiration phases, with labelled points A–G, the peak pressure at point C, and the plateau pressure at point D)
straight line until the peak pressure at the point C is reached. At the point C, the ventilator applies the set tidal volume and no further flow is delivered; therefore, the pressure quickly falls to the point D, the plateau pressure. Then, because of lung recruitment, the pressure lightly drops to the point E. In addition, the slope of line A-D will affect the static compliance and it is a key feature when examining the
Fig. 2. The structure of the respiratory waveform ontology (a hierarchy with a domain layer, a major category layer of waveform diagrams and loops such as the pressure-time, volume-time and flow-time diagrams and the pressure-volume and flow-volume loops, a minor category layer of the RT, RAP and RWS fuzzy numbers, and a concept layer split into a physiological parameters layer, e.g. respiratory time, respiratory airway pressure, respiratory waveform slope, lung measurement, work of breathing and arterial blood gas analysis, and a disease layer, e.g. asthma, chronic bronchitis, bronchiectasis, pneumoconiosis and pulmonary fibrosis with their typical findings)
patients. On expiration, the muscles of the chest wall relax and the lungs return to their original size by elastic recoil, with the expulsion of air. As a result, the pressure falls exponentially to point F, the expiratory-end pressure (EEP) or baseline pressure. Finally, we introduce the structure of the respiratory waveform ontology, shown in Fig. 2. Consisting of a domain layer, a major category layer, a minor category layer, and a concept layer, the respiratory waveform ontology is an extended domain ontology of the one in [8]. The domain name is respiratory waveform ontology, and it consists of several major categories such as the pressure-time diagram, volume-time diagram, pressure-volume loop, flow-volume loop, and so on. In the minor category layer, there are several categories such as the respiratory time (RT) fuzzy numbers, respiratory airway pressure (RAP) fuzzy numbers, and respiratory waveform slope (RWS) fuzzy numbers. The concept layer is divided into two sub-layers, the physiological parameters layer and the disease layer. The physiological parameters layer contains the parameters of the fuzzy numbers, such as the respiratory time, respiratory airway pressure, and respiratory waveform slope, while the disease layer associates these fuzzy numbers with diseases such as asthma, chronic bronchitis, pulmonary fibrosis, and so on.
3 The Structure of the Ontology-Based Intelligent Agent for Respiratory Waveform Classification

In this section, we briefly describe the functionality of the ontology-based intelligent agent for respiratory waveform classification, shown in Fig. 3. The manual construction tool (MCT), the respiratory waveform ontology (RWO), and the intelligent classification agent (ICA) are proposed to implement the classification of the respiratory waveform. The MCT allows the medical experts to construct the parameters of the standard respiratory waveform (SRW), such as the RT fuzzy numbers and the RAP fuzzy
Fig. 3. The structure of the ontology-based intelligent agent for respiratory waveform classification (medical experts use the manual construction tool to populate the respiratory waveform ontology; the intelligent classification agent classifies each input respiratory waveform and, after confirmation by the medical experts, stores the result in the classified waveform repository)
numbers, and then the respiratory waveform slope (RWS) fuzzy numbers are constructed automatically. After the construction is completed, all of these parameters of the SRW are stored to the RWO. When the ICA receives an IRW, it carries out the classification of the respiratory waveform based on the information stored in the RWO. Next, the ICA sends the classified results to the medical experts to make a confirmation. If the classified results pass the verification of the medical experts, they are stored to the classified waveform repository (CWR); otherwise, the IRW is sent back to the ICA to work again. Below is the algorithm for the ICA.

Algorithm for the Intelligent Classification Agent
Step 1: Retrieve the fuzzy numbers $\tilde{p}_A$, $\tilde{t}_A$, $\tilde{p}_B$, and $\tilde{t}_B$ from the RWO.
  Step 1.1: $\tilde{p}_A = (n_{p_A}, r_{p_A}, \delta_{p_A})_{LR}$
  Step 1.2: $\tilde{t}_A = (n_{t_A}, r_{t_A}, \delta_{t_A})_{LR}$
  Step 1.3: $\tilde{p}_B = (m_{p_B}, \alpha_{p_B}, \beta_{p_B})_{LR}$
  Step 1.4: $\tilde{t}_B = (n_{t_B}, r_{t_B}, \delta_{t_B})_{LR}$
Step 2: Generate $\tilde{s}_{AB} = (m_{s_{AB}}, \alpha_{s_{AB}}, \beta_{s_{AB}})_{LR}$.
  Step 2.1: Generate the positive slope $\tilde{s}_{AB}$, where $\Delta\tilde{p}$ is positive and $\Delta\tilde{t}$ is positive.
    Step 2.1.1: $m_{s_{AB}} = \dfrac{m_{p_B} - n_{p_A}}{m_{t_B} - n_{t_A}}$
    Step 2.1.2: $\alpha_{s_{AB}} = (m_{p_B} - n_{p_A}) \times (\alpha_{t_B} + \delta_{t_A}) + (m_{t_B} - n_{t_A}) \times (\alpha_{p_B} + \delta_{p_A})$
    Step 2.1.3: $\beta_{s_{AB}} = (m_{p_B} - n_{p_A}) \times (\beta_{t_B} + \gamma_{t_A}) + (m_{t_B} - n_{t_A}) \times (\beta_{p_B} + \gamma_{p_A})$
  Step 2.2: Generate the negative slope $\tilde{s}_{AB}$, where $\Delta\tilde{p}$ is negative and $\Delta\tilde{t}$ is positive.
    Step 2.2.1: $m_{s_{AB}} = \dfrac{m_{p_B} - n_{p_A}}{m_{t_B} - n_{t_A}}$
    Step 2.2.2: $\alpha_{s_{AB}} = (m_{t_B} - n_{t_A}) \times (\alpha_{p_B} + \delta_{p_A}) - (m_{p_B} - n_{p_A}) \times (\beta_{t_B} + \gamma_{t_A})$
    Step 2.2.3: $\beta_{s_{AB}} = (m_{t_B} - n_{t_A}) \times (\beta_{p_B} + \gamma_{p_A}) - (m_{p_B} - n_{p_A}) \times (\alpha_{t_B} + \delta_{t_A})$
Step 3: Retrieve the SRW from the RWO and the IRW from the ventilator.
Step 4: For k ← 1 to c  /* c denotes the number of SRW types $S_k$ */
  Step 4.1: For j ← 1 to n−1  /* n denotes the number of significant points */
    Step 4.1.1: $total\_\mu_{IRW \in Type\,S_k} = \sum_{j=1}^{n-1} \mu_{IRW \in Type\,S_k}(x_j)$  /* $\mu_{IRW \in Type\,S_k}(x)$ denotes the membership degree with which the value x of the IRW belongs to the k-th SRW type $S_k$ */
  Step 4.2: Normalize $total\_\mu_{IRW \in Type\,S_k}$ and save it to $\mu_{IRW \in Type\,S_k}(x)$:
    $\mu_{IRW \in Type\,S_k}(x) = \dfrac{total\_\mu_{IRW \in Type\,S_k}}{\sum_{j=1}^{n} \mathrm{MAX}\{\mu_{IRW \in Type\,S_k}(x_j)\}}$
Step 5: Run the matching task for the IRW and SRW Type 1.
  If $\{(\mu_{IRW \in Type\,S_1}(x_{slope_{AB}}) > \sigma_s) \wedge (\mathrm{I}_{j=1}^{n}[\mu_{\tilde{p}_j}(x_{p_j})] > \sigma_{p_1}) \wedge (\mathrm{I}_{j=1}^{n}[\mu_{\tilde{t}_j}(x_{t_j})] > \sigma_{t_1})\}$
  Then $\mu_{f_1(A)}(y_1) = \mathrm{I}[\mu_{IRW \in Type\,S_1}(x_{slope}), \mathrm{I}_{j=1}^{n}[\mu_{\tilde{p}_j}(x_{p_j})], \mathrm{I}_{j=1}^{n}[\mu_{\tilde{t}_j}(x_{t_j})]]$
Step 6: Run the matching task for the IRW and SRW Type 2.
  If $\{(\mu_{IRW \in Type\,S_2}(x_{slope_{AB}}) > \sigma_s) \wedge (\mathrm{I}_{j=1}^{n}[\mu_{\tilde{p}_j}(x_{p_j})] > \sigma_{p_2}) \wedge (\mathrm{I}_{j=1}^{n}[\mu_{\tilde{t}_j}(x_{t_j})] > \sigma_{t_2})\}$
  Then $\mu_{f_2(A)}(y_2) = \mathrm{I}[\mu_{IRW \in Type\,S_2}(x_{slope}), \mathrm{I}_{j=1}^{n}[\mu_{\tilde{p}_j}(x_{p_j})], \mathrm{I}_{j=1}^{n}[\mu_{\tilde{t}_j}(x_{t_j})]]$
Step 7: End.
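To make the slope-generation step concrete, the following Python sketch (our illustration, not part of the original system) computes the positive-slope fuzzy number of Step 2.1 from two fuzzy significant points. Each fuzzy number is simplified here to a triple (centre, left spread, right spread); the distinct spread symbols r, δ and γ of the paper are collapsed into this single convention, and the example spreads are assumptions of ours.

    # Illustrative sketch (not from the paper): positive-slope fuzzy number of Step 2.1.
    def positive_slope(p_A, t_A, p_B, t_B):
        """p_A, t_A: pressure/time fuzzy numbers of point A as (centre, left, right);
        p_B, t_B: pressure/time fuzzy numbers of point B in the same form."""
        n_pA, l_pA, r_pA = p_A
        n_tA, l_tA, r_tA = t_A
        m_pB, l_pB, r_pB = p_B
        m_tB, l_tB, r_tB = t_B
        # Step 2.1.1: centre of the slope
        m_s = (m_pB - n_pA) / (m_tB - n_tA)
        # Steps 2.1.2 and 2.1.3: spreads of the slope, following the shape of the
        # paper's formulas under the simplified spread convention above
        alpha_s = (m_pB - n_pA) * (l_tB + r_tA) + (m_tB - n_tA) * (l_pB + r_pA)
        beta_s = (m_pB - n_pA) * (r_tB + l_tA) + (m_tB - n_tA) * (r_pB + l_pA)
        return (m_s, alpha_s, beta_s)

    # Example with the SRW Type1 points A(0.2 s, 6 cmH2O) and B(0.3 s, 22 cmH2O)
    # from Table 1, assuming small spreads of 0.05 (time) and 1 (pressure):
    s_AB = positive_slope((6, 1, 1), (0.2, 0.05, 0.05), (22, 1, 1), (0.3, 0.05, 0.05))
    print(s_AB)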
4 Experimental Results To test the performance of the ontology-based intelligent agent for respiratory waveform classification, we set up an experimental environment at the National University of Tainan. First, we select two types of respiratory waveforms as SRW Type1 and SRW Type2, denoting the characteristics of the typical normal waveform and the high-pressure waveform, respectively. Table 1 shows the parameters of SRW Type1 and SRW Type2 defined by the medical staff; all of the parameters of the significant points are stored in the RWO. Second, we design an interface that allows the medical experts to define or tune the RT fuzzy numbers and RAP fuzzy numbers for the respiratory waveform; the RWS fuzzy numbers are then constructed automatically. Fig. 4 shows the MCT for SRW Type1. Finally, in order to evaluate the accuracy of the ICA, we choose precision and recall as our criteria. Fig. 5(a) and Fig. 5(b) show the precision and recall, respectively, versus the number of IRWs when the threshold σ_{s_AB} is 0.75. Table 1. The parameters of significant points for the two types of SRW
SRW      Significant point (time: sec, pressure: cmH2O)
         A         B          C          D          E          F         G
Type1    (0.2, 6)  (0.3, 22)  (1.1, 44)  (1.2, 31)  (1.9, 29)  (2.2, 6)  (4.4, 6)
Type2    (0.2, 6)  (0.3, 22)  (1.1, 60)  (1.2, 31)  (1.9, 29)  (2.2, 6)  (4.4, 6)
Fig. 4. The manual construction tool of the RT, RAP, and RWS fuzzy numbers for the SRW Type1 (panels (a)-(c))
Fig. 5. The precision (a) and recall (b) of the IRW for SRW Type1 and SRW Type2
We observe that the average precision and recall for SRW Type1 are about 64% and 96%, respectively, and that the average precision and recall for SRW Type2 are about 68% and 96%, respectively. The experimental results show that our approach works effectively.
5 Conclusions In this paper, we present an ontology-based intelligent agent, comprising the MCT, the RWO, and the ICA, for respiratory waveform classification to help the medical staff judge the respiratory waveforms coming from the ventilator. From the experimental results, we observe that the proposed ontology-based intelligent agent classifies the respiratory waveform effectively and efficiently. In the future, we will extend our method to process more types of IRW and to construct the RWO semi-automatically.
Acknowledgement This work is partially supported by the National Science Council of Taiwan under the grant NSC94-2213-E-024-006, and the Service Web Technology Research Project of Institute for Information Industry and sponsored by MOEA, Taiwan.
A New Inductive Learning Method for Multilabel Text Categorization Yu-Chuan Chang1, Shyi-Ming Chen2, and Churn-Jung Liau3 1
Department of Computer Science and Information Engineering National Taiwan University of Science and Technology Taipei, Taiwan, R.O.C.
[email protected] 2 Department of Computer Science and Information Engineering National Taiwan University of Science and Technology Taipei, Taiwan, R.O.C.
[email protected] 3 Institute of Information Science, Academia Sinica Taipei, Taiwan, R.O.C.
[email protected] Abstract. In this paper, we present a new inductive learning method for multilabel text categorization. The proposed method uses a mutual information measure to select terms and constructs document descriptor vectors for each category based on these terms. These document descriptor vectors form a document descriptor matrix. It also uses the document descriptor vectors to construct a document-similarity matrix based on the "cosine similarity measure". It then constructs a term-document relevance matrix by applying the inner product of the document descriptor matrix to the document similarity matrix. The proposed method infers the degree of relevance of the selected terms to construct the category descriptor vector of each category. Then, the relevance score between each category and a testing document is calculated by applying the inner product of its category descriptor vector to the document descriptor vector of the testing document. The maximum relevance score L is then chosen. If the relevance score between a category and the testing document divided by L is not less than a predefined threshold value λ between zero and one, then the document is classified into that category. We also compare the classification accuracy of the proposed method with that of the existing learning methods (i.e., Find Similar, Naïve Bayes, Bayes Nets and Decision Trees) in terms of the break-even point of micro-averaging for categorizing the "Reuters-21578 Aptè split" data set. The proposed method gets a higher average accuracy than the existing methods.
1 Introduction Because the amount of information on the Internet is growing so rapidly, it is difficult for users to find desired information unless it is well organized and managed. Text categorization (TC) is a major research topic in machine learning (ML) and information retrieval (IR) that helps users obtain desired information. A document can belong to a
single category, multiple categories, or not belong to any category. The goal of automatic text categorization is to utilize categorized training documents to construct text classifiers, which are then used to classify new documents into appropriate categories automatically. Several machine learning methods have been developed to deal with the text categorization problem, e.g., regression models [10], [17], nearest neighbor classification [16], [18], Bayesian probabilistic approaches [1], [12], [14], decision trees [1], [8], [12], inductive rule learning [1], [6], neural networks [18] and on-line learning [6]. In this paper, we present a new inductive learning method for multilabel text categorization. It uses the mutual information measure [11] for term selection and constructs document descriptor vectors for each category based on the selected terms. These document descriptor vectors form a document descriptor matrix and they are also used to construct a document-similarity matrix based on the cosine similarity measure [2]. It then constructs a term-document relevance matrix by applying the inner product of the document descriptor matrix to the document-similarity matrix. It infers the degree of relevance of the selected terms to construct the category descriptor vector of each category. The relevance score between each category and a testing document is calculated by applying the inner product of its category descriptor vector to the document descriptor vector of the testing document. The maximum relevance score L is then chosen. If the relevance score between a category and the testing document divided by L is not less than a predefined threshold value λ, where λ∈[0, 1], then the document is classified into that category. We also compare the classification accuracy of the proposed method with that of the existing learning methods (i.e., Find Similar [13], Decision Trees [5], Naïve Bayes and Bayes Nets [14]) in terms of the break-even point of micro-averaging for categorizing the "Reuters-21578 Aptè split 10 categories" data set [21]. The experimental results show that the proposed method outperforms the existing methods.
2 Preliminaries Machine learning systems deal with categorization problems by representing samples in terms of features. In order to apply machine learning methods to text categorization, documents must therefore be transformed into feature representations. The vector space model [13] is widely used in information retrieval for the representation of documents. In the vector space model, a document is represented as a document descriptor vector of terms, and every element in the vector denotes the weight of a term with respect to the document. The learning methods presented in [10] and [17] calculate the tf × idf term weight for each term, while the learning methods presented in [1], [8] and [12] use a binary weight of 1 or 0 for each index term. The dimensionality of the term space is an important research topic in text categorization. With a high dimensionality, a classifier overfits the training samples, which may appear good for classification purposes but generalizes poorly to previously unseen testing samples. The purpose of term selection is to choose relevant terms for document indexing that yield the highest accuracy rates. The simplest term selection method [19] is based on the frequency of a term's occurrence in documents, where only the terms that occur in the highest number of documents are retained. Other term
selection methods are based on information-theoretic functions, such as the DIA association factor [9], information gain measure [6], [12], chi-square measure [4], and mutual information measure [8], [19]. In recent years, an increasing number of categorization methods have applied the mutual information (MI) measure in term selection [3], [7]. The mutual information score between term ti and category c is defined by
MI(t_i, c) = p(t_i, c) log_2 [ p(t_i, c) / ( p(t_i) p(c) ) ],     (1)
where p(t_i, c) = N_C(t_i)/N_C, p(t_i) = N(t_i)/N, and p(c) = N_C/N; N_C(t_i) denotes the number of occurrences of term t_i in category c, N_C denotes the number of occurrences of all terms in category c, N(t_i) denotes the number of occurrences of term t_i in the collection, and N denotes the number of occurrences of all terms in the collection. There are some rules for determining the threshold value in multilabel text categorization [15], [16]. The threshold value is used when a document may belong to multiple categories. In this paper, we use a variant of the rank-based thresholding (R-cut) measure [16] to assign text documents to categories. We use the following criterion for multilabel text categorization:

Score(c_i, d) / L ≥ λ,     (2)

where Score(c_i, d) denotes the relevance score between category c_i and document d, L = max_j Score(c_j, d) denotes the maximum relevance score, and λ ∈ [0, 1] is a threshold value that controls the degree of multilabel categorization. If the relevance score between category c_i and document d divided by L is not less than the threshold value λ, then the document d is classified into category c_i. The lower the threshold value λ, the more categories a document may belong to; if λ = 0, then the document belongs to all categories. Thus, the threshold value λ provides some flexibility in dealing with the multilabel text categorization problem. In the following, we briefly review some classifiers [8], namely, Find Similar [13], Decision Trees [5], Naïve Bayes and Bayes Nets [14]. (1) Find Similar Classifier [13]: The Find Similar method is a variant of Rocchio's method for relevance feedback, which is often used to expand queries based on the user's relevance feedback. In text classification, Rocchio's method calculates the weight w_t of a term t as follows:
w_t = α · w_t + β · ( Σ_{i∈pos} w_{t,i} / N_pos ) + γ · ( Σ_{i∈neg} w_{t,i} / N_neg ),     (3)
where wt denotes the weight of term t, Npos denotes the number of positive documents in the category, Nneg denotes the number of negative documents in the category, and α, β and γ are the adjusting parameters. The method finds the representative centroid of the positive documents of each category and classifies a new document by comparing it with the centroid of each category by using a specific similarity measure. In [8],
Dumais et al. let α = 0, β = 1 and γ = 0 and use the Jaccard similarity measure to calculate the degree of similarity between the centroid of each category and a testing document. (2) Decision Trees Classifier [5]: A decision tree (DT) text classifier is a tree in which the internal nodes are labeled by terms, the branches are labeled by weights, and the leaves are labeled by categories. The classifier categorizes a testing document d_j by recursively testing the weights of its terms (i.e., the internal nodes) until a leaf node is reached. The label of the leaf node is then assigned to d_j. The main advantage of the decision tree method is that it is easily interpretable by humans. (3) Naïve Bayes Classifier [12]: The Naïve Bayes (NB) classifier (also called the probabilistic classifier) is a popular approach for handling classification problems. It uses the joint probability of terms and categories to calculate the probability that the terms of a document belong to certain categories, and then applies Bayes' theorem to calculate the probability of the document d_j belonging to category c_i:

P(d_j | c_i) = Π_{k=1}^{n} P(w_kj | c_i),     (4)

P(w_kj | c_i) = P(w_kj, c_i) / P(c_i),     (5)

where P(d_j | c_i) denotes the probability of document d_j belonging to category c_i, P(w_kj | c_i) denotes the probability of term t_k of document d_j belonging to category c_i, and n denotes the number of terms belonging to document d_j and category c_i. The naïve part of the NB method is the assumption of term independence, which makes NB classifiers far more efficient than non-naïve Bayes methods because there is no need to consider the conditional dependencies among terms. (4) Bayes Nets Classifier [14]: In [14], Sahami utilizes a Bayesian network for classification, which relaxes the restrictive assumptions of the Naïve Bayes classifier. A 2-dependence Bayesian classifier allows each feature to be directly influenced by the appearance or non-appearance of at most two other features.
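As a concrete illustration of the machinery defined above, the following Python sketch (ours, not part of the paper) computes the mutual information score of Eq. (1) and applies the multilabel assignment rule of Eq. (2). The count arguments and the example scores are hypothetical.

    # Illustrative sketch: Eq. (1) MI scoring and Eq. (2) multilabel assignment.
    import math

    def mi_score(n_c_ti, n_c, n_ti, n):
        """n_c_ti: occurrences of term ti in category c; n_c: occurrences of all
        terms in c; n_ti: occurrences of ti in the collection; n: occurrences of
        all terms in the collection."""
        p_ti_c = n_c_ti / n_c
        p_ti = n_ti / n
        p_c = n_c / n
        if p_ti_c == 0:
            return 0.0
        return p_ti_c * math.log2(p_ti_c / (p_ti * p_c))

    def assign_categories(scores, lam):
        """scores maps category -> Score(c, d); a document is assigned to every
        category whose score divided by the maximum score L is at least lam."""
        L = max(scores.values())
        return [c for c, s in scores.items() if L > 0 and s / L >= lam]

    # With lam = 0.87 (the value later used in Table 2), a document scoring
    # {"earn": 10.0, "acq": 9.0, "grain": 2.0} is assigned to "earn" and "acq".
    print(assign_categories({"earn": 10.0, "acq": 9.0, "grain": 2.0}, 0.87))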
3 A New Inductive Learning Method for Multilabel Text Categorization In this section, we present a new inductive learning method for multilabel text categorization. The mutual information measure shown in Eq. (1) is used to select the top K terms that have the highest MI scores for a category. Assume there are N documents and K selected terms in category c. We use a K × N document descriptor matrix to represent the binary weights of the K selected terms in each document. A column in the document descriptor matrix is a document descriptor vector based on the K selected terms. For example, assume that there are 5 documents d1, d2, d3, d4, d5 and 4 selected terms t1, t2, t3, t4 in category c. Fig. 1 shows an example of a 4 × 5 document
descriptor matrix A. Each column of the document descriptor matrix A represents the document descriptor vector of a document. For example, from the second column of the document descriptor matrix A, we can see the document descriptor vector d_2 of the document d2, where d_2 = [1 0 1 0]^T. It indicates that the terms t1 and t3 appear in the document d2 and the terms t2 and t4 do not appear in the document d2.

          d1  d2  d3  d4  d5
    t1 [  1   1   0   1   0 ]
A = t2 [  1   0   1   0   0 ]
    t3 [  1   1   1   0   1 ]
    t4 [  1   0   1   0   1 ]
Fig. 1. Document descriptor matrix A
We use the “cosine similarity measure” [2] to construct a document-similarity matrix S, shown as follows:
S(i, j) = ( A(i) · A(j) ) / ( ‖A(i)‖ × ‖A(j)‖ ),     (6)
where the value of S(i, j) indicates the degree of similarity between document di and document dj, S(i, j) ∈ [0, 1], and A(i) and A(j) denote the ith and the jth column vectors of the document descriptor matrix A, respectively. For the document descriptor matrix A shown in Fig. 1, we can get its document-similarity matrix S, as shown in Fig. 2.

          d1      d2      d3      d4      d5
    d1 [  1       1/√2    √3/2    1/2     1/√2 ]
    d2 [  1/√2    1       1/√6    1/√2    1/2  ]
S = d3 [  √3/2    1/√6    1       0       2/√6 ]
    d4 [  1/2     1/√2    0       1       0    ]
    d5 [  1/√2    1/2     2/√6    0       1    ]
Fig. 2. Document-similarity matrix S
We can obtain the term-document relevance matrix R by applying the inner product of the document descriptor matrix A to the document-similarity matrix S, shown as follows:
R = A · S,     (7)
where the value of R(i, j) denotes the relevance degree of term ti with respect to document dj. Therefore, for the above example, we can get the term-document relevance matrix R, as shown in Fig. 3.
          d1    d2    d3    d4    d5
    t1 [  2.2   2.4   1.3   2.2   1.2 ]
R = t2 [  1.9   1.1   1.9   0.5   1.5 ]
    t3 [  3.3   2.6   3.1   1.2   3.0 ]
    t4 [  2.6   1.6   2.7   0.5   2.5 ]
Fig. 3. Term-document relevance matrix R
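The construction of the matrices in Figs. 1-3 can be reproduced with a few lines of NumPy. The following sketch is our illustration only; the numbers it prints agree with the rounded values shown above.

    # Illustrative sketch: building A (Fig. 1), S (Eq. 6 / Fig. 2) and R (Eq. 7 / Fig. 3).
    import numpy as np

    # Document descriptor matrix A (rows = terms t1..t4, columns = documents d1..d5).
    A = np.array([[1, 1, 0, 1, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 1, 0, 1],
                  [1, 0, 1, 0, 1]], dtype=float)

    # Document-similarity matrix S: cosine similarity between the columns of A.
    norms = np.linalg.norm(A, axis=0)
    S = (A.T @ A) / np.outer(norms, norms)

    # Term-document relevance matrix R.
    R = A @ S

    # Category descriptor vector of Eq. (8), then weight-averaged normalization,
    # giving approximately [9.3, 6.9, 13.2, 9.9] and [0.24, 0.18, 0.34, 0.25].
    v_c = R @ np.ones(R.shape[1])
    print(np.round(R, 1), np.round(v_c, 1), np.round(v_c / v_c.sum(), 2))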
We use Eq. (8) to get the category descriptor vector v_c for category c,

v_c = R · 1,     (8)

where 1 = [1, 1, ..., 1]^T. Thus, for the above example, we get v_c = R · 1 = R · [1 1 1 1 1]^T = [9.3 6.9 13.2 9.9]^T.
Then, we use the weight-averaged method to normalize v c . Thus, for the above example, v c is normalized into [0.24 0.17 0.34 0.25]. Finally, we refine the weight
v_ci of the ith term in the category descriptor vector v_c into w_ci to obtain the refined category descriptor vector w_c, where

w_ci = v_ci × log_2( |C| / cf_i ),     (9)

w_ci denotes the refined weight of the ith term in the refined category descriptor vector w_c, |C| denotes the number of categories, and cf_i denotes the number of category descriptor vectors containing term t_i. This refinement reduces the weights of the terms that appear in most of the categories and increases the weights of the terms that only appear in a few categories. Assume that the document descriptor vector of a testing document d_new is d_new. We can then apply the inner product to calculate the relevance score Score(c, d_new) of category c with respect to the testing document d_new as follows:

Score(c, d_new) = d_new · w_c.     (10)
We calculate the relevance score of each category with respect to dnew, rank these relevance scores, and then assign dnew to multiple categories according to Eq. (2). In other words, we choose the maximum relevance score L among them. If the relevance score between a category and the testing document divided by L is not less than a predefined threshold value λ, where λ∈[0, 1], then the document is classified into that category.
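The refinement of Eq. (9) and the scoring of Eq. (10) can be sketched as follows (our illustration; the category count and cf values in the example are hypothetical).

    # Illustrative sketch: Eq. (9) refinement and Eq. (10) scoring.
    import numpy as np

    def refine(v_c, cf, num_categories):
        """Eq. (9): cf[i] = number of category descriptor vectors containing term i;
        terms shared by many categories are down-weighted."""
        return v_c * np.log2(num_categories / cf)

    def score(d_new, w_c):
        """Eq. (10): inner product of the testing document's descriptor vector
        with the refined category descriptor vector."""
        return float(d_new @ w_c)

    # Hypothetical example with 10 categories:
    v_c = np.array([0.24, 0.17, 0.34, 0.25])
    cf = np.array([8, 2, 5, 1], dtype=float)
    w_c = refine(v_c, cf, 10)
    print(score(np.array([1, 0, 1, 0], dtype=float), w_c))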
4 Experimental Results We have implemented the proposed multilabel text categorization method to classify the "Reuters-21578 Aptè split 10 categories" data set [21] using Delphi Version 5.0 on a Pentium 4 PC. The "Aptè split 10 categories" data set contains the 10 top-sized categories obtained from the "Reuters-21578 Aptè split" data set [20], where each category has at least 100 training documents for training a classifier. We chose the "Aptè split 10 categories" data set as our experimental data set because it accounts for 75% of the "Reuters-21578 Aptè split" data set. Table 1 shows the category names of the "Aptè split 10 categories", the number of training samples for each category, and the number of testing samples for each category. In total, there are 6,490 training documents and 2,547 testing documents in the "Aptè split 10 categories" data set. Table 1. The number of training and testing samples for each category of the "Aptè split 10 categories" data set
Category names    Number of training samples    Number of testing samples
Earn              2877                          1087
Acq               1650                           719
Money-fx           538                           179
Grain              433                           149
Crude              389                           189
Trade              369                           118
Interest           347                           131
Wheat              212                            71
Ship               197                            89
Corn               182                            56
Several evaluation criteria for dealing with classification problems have been used in text categorization [15], [16]. The most widely used measures are based on the definitions of precision and recall. If a sample is classified into a category, we call it "positive" with respect to that category. Otherwise, we call it "negative". In this paper, we use the following micro-averaging method [15] to evaluate the recall and the precision of the proposed method, where

Recall = Σ_{i=1}^{|C|} TP_i / ( Σ_{i=1}^{|C|} TP_i + Σ_{i=1}^{|C|} FN_i ),     (11)

Precision = Σ_{i=1}^{|C|} TP_i / ( Σ_{i=1}^{|C|} TP_i + Σ_{i=1}^{|C|} FP_i ),     (12)
TP_i denotes the number of positive samples correctly classified into category c_i, FN_i denotes the number of positive samples of category c_i incorrectly classified as negative, FP_i denotes the number of negative samples incorrectly classified into category c_i, and |C| denotes the number of categories. If the values of the precision and the recall of a classifier can be tuned to the same value, then that value is called the break-even point (BEP) of the system [12]. The BEP has been widely used in text categorization evaluations. If the values of the precision and the recall are not exactly equal, we use the average of the nearest precision and recall values as the BEP. Based on the mutual information measure [11] for term selection, we select the top 300 terms for training the classifiers. Table 2 compares the break-even point of the proposed method with those of four existing learning methods [8], namely, Find Similar, Decision Trees, Naïve Bayes and Bayes Nets. From Table 2, we can see that the proposed method gets a higher average accuracy than the existing methods. Table 2. Break-even performance for the Reuters-21578 Aptè split 10 categories
Category    Find Similar   Naïve Bayes   Bayes Nets   Decision Trees   The Proposed Method (λ = 0.87)
Earn        92.9 %         95.9 %        95.8 %       97.8 %           97.5 %
Acq         64.7 %         87.8 %        88.3 %       89.7 %           95.1 %
Money-fx    46.7 %         56.6 %        58.8 %       66.2 %           79.2 %
Grain       67.5 %         78.8 %        81.4 %       85.0 %           84.7 %
Crude       70.1 %         79.5 %        79.6 %       85.0 %           84.4 %
Trade       65.1 %         63.9 %        69.0 %       72.5 %           85 %
Interest    63.4 %         64.9 %        71.3 %       67.1 %           81 %
Ship        49.2 %         85.4 %        84.4 %       74.2 %           85.4 %
Wheat       68.9 %         69.7 %        82.7 %       92.5 %           79.8 %
Corn        48.2 %         65.3 %        76.4 %       91.8 %           78.2 %
Average     64.6 %         81.5 %        85 %         88.4 %           91.3 %
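For completeness, the micro-averaged measures of Eqs. (11) and (12) used to derive the figures in Table 2 can be computed as in the following short sketch (ours; the per-category counts are hypothetical).

    # Illustrative sketch: micro-averaged recall (Eq. 11) and precision (Eq. 12).
    def micro_recall(tp, fn):
        return sum(tp) / (sum(tp) + sum(fn))

    def micro_precision(tp, fp):
        return sum(tp) / (sum(tp) + sum(fp))

    # Hypothetical counts for three categories:
    tp, fp, fn = [90, 50, 10], [10, 5, 5], [15, 10, 2]
    print(micro_precision(tp, fp), micro_recall(tp, fn))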
5 Conclusions In this paper, we have presented a new inductive learning method for multilabel text categorization. The proposed method uses a mutual information measure for term selection and constructs document descriptor vectors for each category based on the selected terms. These document descriptor vectors form a document descriptor matrix and they are also used to construct a document-similarity matrix based on the cosine similarity measure. It then constructs a term-document relevance matrix by applying the inner product of the document descriptor matrix to the document-similarity matrix. The proposed method infers the degree of relevance of the selected terms to construct the category descriptor vector of each category. The relevance score between each category and a testing document is calculated by applying the inner product of
its category descriptor vector to the document descriptor vector of the testing document. The maximum relevance score L is then chosen. If the relevance score between a category and the testing document divided by L is not less than a predefined threshold value λ, where λ∈[0, 1], then the document is classified into that category. From the experimental results shown in Table 2, we can see that the proposed method gets a higher average accuracy than the existing methods.
Acknowledgements This work was supported in part by the National Science Council, Republic of China, under Grant NSC 94-2213-E-011-003.
References
[1] Aptè, C., Damerau, F.J., Weiss, S.M.: Automatic Learning of Decision Rules for Text Categorization. ACM Transactions on Information Systems 1 (1997) 233−251
[2] Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. ACM Press, New York (1999)
[3] Bekkerman, R., Ran, E.Y., Tishby, N., Winter, Y.: Distributional Word Clusters vs. Words for Text Categorization. Journal of Machine Learning Research (2003) 1183−1208
[4] Caropreso, M.F., Matwin, S., Sebastiani, F.: A Learner-Independent Evaluation of the Usefulness of Statistical Phrases for Automated Text Categorization. In: Chin, A.G. (eds.): Text Databases and Document Management: Theory and Practice. Idea Group Publishing, Hershey PA (2001) 78−102
[5] Chickering, D., Heckerman, D., Meek, C.: A Bayesian Approach for Learning Bayesian Networks with Local Structure. Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, San Francisco, California (1997) 80−89
[6] Cohen, W.W., Singer, Y.: Context-Sensitive Learning Methods for Text Categorization. ACM Transactions on Information Systems 17 (1999) 141−173
[7] Dhillon, I.S., Mallela, S., Kumar, R.: A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification. Journal of Machine Learning Research 3 (2003) 1265−1287
[8] Dumais, S.T., Platt, J., Heckerman, D., Sahami, M.: Inductive Learning Algorithms and Representation for Text Categorization. Proceedings of CIKM-98, 7th ACM International Conference on Information and Knowledge Management. Bethesda MD (1998) 148−155
[9] Fuhr, N., Buckley, C.: A Probabilistic Learning Approach for Document Indexing. ACM Transactions on Information Systems 9 (1991) 223−248
[10] Fuhr, N., Pfeifer, U.: Probabilistic Information Retrieval as Combination of Abstraction Inductive Learning and Probabilistic Assumptions. ACM Transactions on Information Systems 12 (1994) 92−115
[11] Hankerson, D., Harris, G.A., Johnson, P.D., Jr.: Introduction to Information Theory and Data Compression. CRC Press, Boca Raton, Florida (1998)
[12] Lewis, D.D., Ringuette, M.: Comparison of Two Learning Algorithms for Text Categorization. Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (1994) 81−93
[13] Rocchio, J.J.: Relevance Feedback in Information Retrieval. In: Salton, G. (eds.): The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice Hall, New Jersey (1971) 313−323
[14] Sahami, M.: Learning Limited Dependence Bayesian Classifiers. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. AAAI Press (1996) 335−338
[15] Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Computing Surveys 34 (2002) 1−47
[16] Yang, Y.: An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval 1 (1999) 69−90
[17] Yang, Y., Chute, C.G.: An Example-based Mapping Method for Text Categorization and Retrieval. ACM Transactions on Information Systems 12 (1994) 252−277
[18] Yang, Y., Liu, X.: A Re-examination of Text Categorization Methods. Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval. Berkeley, California (1999) 42−49
[19] Yang, Y., Pedersen, J.O.: A Comparative Study on Feature Selection in Text Categorization. Proceedings of ICML-97, 14th International Conference on Machine Learning. Nashville, TN (1997) 412−420
[20] Reuters-21578 Aptè split data set, http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html
[21] Reuters-21578 Aptè split 10 categories data set, http://ai-nlp.info.uniroma2.it/moschitti/corpora.htm
An Intelligent Customer Retention System Bong-Horng Chu1,3, Kai-Chung Hsiao2, and Cheng-Seen Ho2,4 1
Department of Electronic Engineering, National Taiwan University of Science and Technology, 43 Keelung Road Sec. 4, Taipei 106, Taiwan
[email protected] 2 Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, 43 Keelung Road Sec. 4, Taipei 106, Taiwan
[email protected] 3 Telecommunication Laboratories, Chunghwa Telecom Co., Ltd. 11 Lane 74 Hsin-Yi Road Sec. 4, Taipei, 106, Taiwan
[email protected] 4 Department of Electronic Engineering, Hwa Hsia Institute of Technology, 111 Gong Jhuan Road, Chung Ho, Taipei, 235, Taiwan
[email protected] Abstract. This paper proposes an intelligent system for handling the customer retention task, which is becoming increasingly important due to keen competition among companies in many modern industries. Taking the wireless telecommunication industry as the target of research, our system first learns an optimized churn predictive model from a historical services database by a decision tree-based technique to support the prediction of the defection probability of customers. We then construct a retention policy model which maps clusters of churn attributes to retention policies structured in a retention ontology. The retention policy model supports the automatic proposing of suitable retention policies to retain a possible churner, provided that he or she is a valuable subscriber. Our experiment shows that the learned churn predictive model has around 85% correctness in tenfold cross-validation. A preliminary test on proposing suitable package plans shows that the retention policy model works as well as a commercial website. The fact that our system can automatically propose proper retention policies for possible churners according to their specific characteristics is new and important in customer retention study.
1 Introduction To build customer loyalty and maximize profitability, intelligent techniques for predicting customer churning behaviors are becoming more than necessary in customer relationship management (CRM), especially in the highly competitive wireless telecommunications industry. A study by KPMG about customer defection in the UK reported that 44 percent of UK consumers changed at least one of their key product or service suppliers in 2004 [1]. Analysts have estimated that churning cost wireless service providers in North America and Europe more than four billion US dollars a year [2]. In addition, analysts agreed that acquiring a new customer was many times more expensive than retaining an existing one.
Besides predicting customer defection, proposing appropriate retention policies to retain profitable churners is another significant issue. So far, most researchers have focused on analyzing customers' behavior and predicting who is likely to churn [3], [4], [5]. There has been, however, no specific research on how retention policies can be made to fit individual churners. Since the reasons for churn differ from person to person, so do the appropriate retention policies; by analyzing the specific characteristics of each churner, we should be able to provide a suitable policy to retain him or her. In this paper, we describe an intelligent system for customer retention that suggests appropriate retention policies for possible churners according to their specific characteristics. In addition to predicting churners, our system deals with customer retention by first constructing an ontology, which contains comprehensive retention policies of various incentives and recommendations to deal with the most likely causes of customer defection. It then digs out hidden correlations among the attribute values of the churners, trying to find out how retention policies are related to the churn attribute clusters built upon these hidden correlations, which often reveal why a customer defects. This knowledge is then used to automatically propose retention policies for valuable churners. Note that the retention policy ontology we constructed not only supports the construction of specific retention policies but also helps general retention policy design and analysis.
2 Retention Policy Ontology and Historical Services Database Our system runs in two modes: the learning mode and the application mode. In the learning mode, it learns a churn predictive model and constructs a retention policy model, while in the application mode, it uses the above models to predict whether a customer defects and to propose proper retention policies to retain a potential, valuable churner. Before we further describe how each mode works, we first introduce the construction of our retention policy ontology and the structure of the exemplified historical customer services database. 2.1 Retention Policy Ontology The retention policy ontology is the key component in our system [6], [7]. To develop the domain ontology, we surveyed a variety of CRM policies from many industries and academic papers, from which we analyzed the reasons for subscriber defection, the categories of retention policies, and the potential meaning of each policy [2], [8], [9], [10]. Based on these professional suggestions, we have collected comprehensive retention policies for most types of possible churners. There are, however, two issues we have to cope with regarding these collected retention policies. First, they may conflict with one another: each policy has its own specific function, but a combination of policies might not always benefit a churner because of, e.g., mutual exclusion. The second issue is the grouping of policies: policies with similar properties should be grouped into the same class to facilitate their usage. We can solve these two issues by constructing a retention policy ontology that can completely categorize all the retention policies into classes and clearly specify the
conflicts between them. The ontology is designed to contain five main categories. The Money category defines the policies with respect to various fees, the Service category specifies the policies associated with the value-added services the companies provide, the Goods category covers promotion plans associated with limited-edition souvenirs or discounted handsets, the Contact category lists promising channels for sustaining better communication with subscribers, and the Quality category contains the policies for improving quality-of-service, which involves the attitudes of customer service representatives, the efficiency of engineering problem-solving, and so forth. Fig. 1 illustrates the detailed subcategories and the corresponding retention policies of the Money category as an example. Note that we further partition the Money category into eight different policy clusters, each containing a number of retention policies. A "conflict" relationship is associated with the cluster of Monthly rental policies in the figure, which means that no two policies of the cluster can be recommended to a possible churner at the same time. Also, an "alternative" relationship links the cluster of Message fees policies to that of Monthly rental policies. It means that the policies in either cluster can be proposed for a possible churner, but both clusters cannot be chosen at the same time. We used Protégé 3.0, which was developed at Stanford University [11], to build this ontology.
Fig. 1. Part of retention policy ontology about Money category
2.2 Historical Services Database The historical customer services database of a major telecommunication company in Taiwan is used as our example database. It spans six months and contains information about 63,367 subscribers, each described by hundreds of attributes. Among the attributes, we identify 13 attributes which we think are the most significant for our work. We enumerate the value range of each attribute as shown in Table 1. The following represents an example subscriber data: Subscriber A: '15', '8', '3', '2', '4', '7', '1', '3', '0', '0', '1', '2', '2'. Note that some attributes in our database are numerical, including Tenure, Average invoice, and Average domestic invoice. They cannot be directly used by the decision
tree algorithm. We need to divide the range of each such numerical attribute into several intervals whose boundaries are determined according to the subscriber distribution: a cut point is chosen and a boundary is created when the slope of the subscriber distribution curve makes an apparent change.

Table 1. Attributes and their enumerated values in the historical services database

Attribute name: Value enumeration
Zip code: 0 (Taipei city), 1 (Keelung), 2 (Taipei county), 3 (Ilan), 4 (Hsinchu city and country), 5 (Taoyuan), 6 (Miaoli), 7 (Taichung city), 8 (Taichung country), 9 (Changhua), 10 (Nantou), 11 (Chiai city and country), 12 (Yunlin), 13 (Tainan city), 14 (Tainan country), 15 (Kaohsiung city), 16 (Kaohsiung country), 17 (Penghu), 18 (Kinmen), 19 (Pingtung), 20 (Taitung), and 21 (Hualien).
Industry code: 0 (Public), 1 (Business), 2 (Manufacture), 3 (Finance), 4 (Communication), 5 (Department Store), 6 (Social Service), 7 (Farming and Fishing), 8 (Individual), and 9 (Other).
Service center: 0 (Site1), 1 (Site2), 2 (Site3), 3 (Site4), 4 (Site5), 5 (Site6), and 6 (Site7).
Tenure: 0 (0~4), 1 (5~13), 2 (14~25), 3 (26~37), 4 (38~61), and 5 (over 62 months).
Discount type: 0 (Government), 1 (Guaranty Free), 2 (Prepaid Card), 3 (Normal), 4 (Enterprise), 5 (Employee), 6 (Military), 7 (Official), 8 (Public Servant), 9 (Alliance), and 10 (Subordinate).
Dealer ID: The ID number (11 dealers).
Gender: 0 (Company), 1 (Male), and 2 (Female).
Package code: 0 (Standard), 1 (Economy), 2 (Prepay), 3 (Ultra Low Rate), 4 (Base Rate), 5 (Base Rate plus 100NT), 6 (Base Rate plus 400NT), 7 (Base Rate plus 800NT), 8 (Base Rate plus 1500NT), and 9 (Special Low Rate).
Stop-use: 0~9 times that customers actively lay claim to the telecommunication company for temporarily suspending telephone service.
Re-use: 0~7 times of service reopening.
Disconnect: 0~2 times that a telecommunication company actively disconnects the subscriber's telephone call for some reasons.
Average invoice: 0 (0~100), 1 (101~200), 2 (201~500), 3 (501~1000), 4 (1001~2000), 5 (2001~3000), and 6 (over 3001 NT dollars).
Average domestic invoice: 0 (0~100), 1 (101~250), 2 (251~400), 3 (401~600), 4 (601~1000), 5 (1001~1500), 6 (1501~2000), 7 (2001~3000), and 8 (over 3000 NT dollars).
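The boundary-selection heuristic described before Table 1 can be sketched as follows. This is our illustration only; the paper does not specify how an "apparent change" of slope is detected, so the change_threshold parameter and the detection rule below are assumptions.

    # Illustrative sketch: propose interval boundaries where the slope of the
    # subscriber-count distribution changes sharply.
    def cut_points(bin_values, counts, change_threshold=2.0):
        """bin_values[i] is an attribute value, counts[i] the number of
        subscribers with that value; a boundary is proposed at bin_values[i]
        when the slope of the count curve changes noticeably around i."""
        cuts = []
        for i in range(1, len(counts) - 1):
            left = counts[i] - counts[i - 1]
            right = counts[i + 1] - counts[i]
            big_change = (left != 0 and abs(right / left) >= change_threshold) or \
                         (right != 0 and abs(left / right) >= change_threshold) or \
                         (left * right < 0)
            if big_change:
                cuts.append(bin_values[i])
        return cuts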
3 The Learning Mode Fig. 2 shows the flow diagram of the learning mode. The Churn Model Learner learns the churn predictive model, which can decide subscribers' loyalty from the historical services database.
Fig. 2. Flow diagram of the learning mode
The Churn Attribute Cluster Constructor discovers hidden correlations among the attribute values of the same database to form the churn attribute cluster model, based upon which, together with the retention policy ontology, the Policy Model Constructor is able to construct the retention policy model so as to recommend appropriate retention policies for possible churners. 3.1 Churn Model Learner The Churn Model Learner runs the C4.5 algorithm [12], [13] on the historical services database to build a decision tree-represented churn predictive model. Since C4.5 does not guarantee that the created decision tree is optimal, we use the greedy method [14] to optimize the churn predictive model by deleting attributes one at a time in search of a better accuracy. This process repeats until no more improvement can be obtained. The remaining attributes are used as the features for churn prediction. Our experiment showed that C4.5 originally produced a decision tree that contained all 13 attributes listed in Table 1. The follow-up optimization process stopped at the fourth repetition, which produced no further improvement, so nothing more was removed. The previous three repetitions removed Zip code, Industry code, and Gender, respectively. The final churn predictive model thus contained only ten attributes. To our surprise, all of the removed attributes are demographic attributes. We therefore conjecture that the demographic-related attributes may not be crucial in deciding subscriber defection in our exemplified database. Fig. 3 shows part of the constructed churn predictive model.
Fig. 3. Churn predictive model (partial)
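The greedy attribute-removal loop described in Section 3.1 can be sketched as follows. This is our illustration, not the authors' implementation: scikit-learn's DecisionTreeClassifier is used as a stand-in for C4.5, and X is assumed to be a pandas DataFrame of attribute columns with y the churn labels.

    # Illustrative sketch: greedy removal of attributes that hurt cross-validated accuracy.
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    def greedy_attribute_removal(X, y, feature_names):
        """Repeatedly drop the single attribute whose removal most improves
        ten-fold cross-validated accuracy; stop when no removal helps."""
        features = list(feature_names)

        def acc(cols):
            return cross_val_score(DecisionTreeClassifier(), X[cols], y, cv=10).mean()

        best = acc(features)
        improved = True
        while improved and len(features) > 1:
            improved = False
            round_best, drop = best, None
            for f in features:
                score = acc([c for c in features if c != f])
                if score > round_best:
                    round_best, drop = score, f
            if drop is not None:
                features.remove(drop)
                best, improved = round_best, True
        return features, best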
3.2 Churn Attribute Cluster Constructor Designing suitable retention policies for possible churners is not an easy task for humans. To do it automatically is even harder. One naïve idea is to provide possible churners with policies according to each single attribute value. It, however, may miss hidden, but significant interactions among attribute values, which can play a key role in explaining why subscribers defect. Our first task is thus to discover how attributes are associated with one another by the association rule mining technique. From the
historical services database of all churners, we use the Apriori algorithm [15] to mine the association rules, which are in the form of: Tenure=0 Average invoice=2 2480 ==> Package code=4 2363 conf: (0.95). This sample rule means that there are 2,480 churners whose Tenure is 0~4 months (code 0) and Average invoice is 201~500 NT dollars (code 2), and among whom 2,363 churn because the Package code is 4. The confidence of this rule is 0.95. In our experiment, we discovered 12 association rules with the minimum support set to 0.1 and the confidence to 0.8. All of the attributes involved in a single rule have to be considered at the same time when deciding suitable retention policies. In other words, we need to group the antecedents and consequents of a rule into a cluster, and define for it one or several retention policies accordingly. Fig. 4 shows the constructed churn attribute clusters corresponding to these rules. Note that some rules, like Rules 7 and 11, fall into the same group because they contain the same attributes after combination. In the figure some clusters are isolated while others are related by the "isa" relationship and form a class hierarchy. The complete churn attribute cluster model was similarly constructed from 300 association rules, with support from 0.1 down to 0.01 and fixed confidence = 0.8, as partially shown in Fig. 5. This hierarchical model allows us to easily set corresponding policies for two groups that are related by "isa". Note that each cluster in the figure is marked with a support value, which is used for conflict resolution. The support values are created by the association rule mining. Specifically, we executed the Apriori algorithm ten times, starting from support = 0.1 down to support = 0.01 with a 0.01 decrement each time. A smaller minimum support usually generates a greater number of large itemsets, which in turn causes more rules to be created. By comparing the current set of rules with those generated last time, we can associate the newly generated rules with the current minimum support value. This support-value-based distinction between the rules allows us to solve any possible conflicts between two or more clusters of policies by preferring rules with higher supports during conflict resolution.
Fig. 4. Churn attribute clusters corresponding to 12 association rules
Fig. 5. Part of churn attribute cluster model
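The ten-run, decreasing-support tagging procedure described above can be sketched as follows. This is our illustration; mine_association_rules(transactions, min_support, min_confidence) is a hypothetical helper standing in for any Apriori implementation and is assumed to return a set of hashable rules.

    # Illustrative sketch: tag each rule with the largest support level at which it first appears.
    def tag_rules_by_support(transactions, mine_association_rules):
        seen = {}
        for step in range(10):
            min_sup = round(0.1 - 0.01 * step, 2)   # 0.10, 0.09, ..., 0.01
            rules = mine_association_rules(transactions, min_sup, 0.8)
            for rule in rules:
                if rule not in seen:                # rule is new at this support level
                    seen[rule] = min_sup
        return seen                                 # maps rule -> support tag for conflict resolution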
There is one issue left: there are some attribute values that could not be properly clustered with other attribute values. It brings in a closely related issue: none of the attribute values of a possible churner may match any pre-clustered group of the churn attribute cluster model. Our solution to these issues is to define proper nominal policies for each attribute value that cannot be properly pre-clustered, so that we can provide a nominal retention policy for a churner whose major cause of defection is difficult to find, even by an expert. 3.3 Policy Model Constructor Fig. 6 shows how we construct the retention policy model. First, the attribute values analysis module analyzes the churn attribute cluster model and the attributes that do not appear in the churn attribute cluster model in order to derive any possible explanations. The mapping knowledge construction module then follows the explanations to construct the mappings from the retention policies to each churn attribute cluster or singleton attribute. The policy conflict eliminator finally removes all conflicts from the mappings according to the support values. We take Fig. 5 as an example to demonstrate how we construct the retention policy model, as shown in Table 2 for its churn attribute clusters.
Fig. 6. Retention policy model construction
Table 2. Construction of the retention policy model
Cluster 1 (Tenure=0, Package code=4)
  Explanation: The duration of use is less than 4 months, and the package plan is Base Rate, which is a normal plan. So we recommend two policies.
  Proposed retention policies: Provide all kinds of discountable tickets regularly. Announce the newest discountable plans regularly.

Cluster 2 (Zip code=15, Tenure=0, Package code=4)
  Explanation: This cluster looks similar to Cluster 1 except Zip code shows people living in Kaohsiung city tend to churn. We conjecture that the condition might happen because of the efficiency of trouble solving. So we add a new policy.
  Proposed retention policies: Solve the troubles reported by the subscribers immediately and report to them as soon as the troubles are cleared. Provide all kinds of discountable tickets regularly. Announce the newest discountable plans regularly.

Cluster 3 (Tenure=0, Average domestic invoice=0, Package code=4)
  Explanation: The association between Average domestic invoice=0 and Package code=4 reveals no new information. So we still recommend policies according to the value of Tenure.
  Proposed retention policies: Provide all kinds of discountable tickets regularly. Announce the newest discountable plans regularly.

Cluster 4 (Tenure=0, Average invoice=2, Package code=4)
  Explanation: In this cluster, Average invoice (code=2) is higher than Package plan (code=4). We thus add a new policy of changing package plans.
  Proposed retention policies: Change package plan to "Base Rate Plus 100". Provide all kinds of discountable tickets regularly. Announce the newest discountable plans regularly.

Cluster 5 (Service center=5, Tenure=0, Package code=4, Average domestic invoice=0)
  Explanation: This cluster is the superclass of Cluster 3 with one more attribute value: Service center=5. It implies that some services might be unsatisfactory in service center 5. As a result, we add a new policy.
  Proposed retention policies: Enhance the ability of problem-solving of the engineers. Provide all kinds of discountable tickets regularly. Announce the newest discountable plans regularly.

Cluster 6 (Service center=5, Tenure=0, Gender=1, Package code=4)
  Explanation: This cluster is the superclass of Cluster 1 with two more attribute values: Service center=5 and Gender=1. Unfortunately, it seems that no new information can be discovered from these two attribute values. So we recommend the same policies as for Cluster 1.
  Proposed retention policies: Provide all kinds of discountable tickets regularly. Announce the newest discountable plans regularly.

Cluster 7 (Tenure=0, Gender=1, Average invoice=2, Package code=4)
  Explanation: This cluster is the superclass of Cluster 4 along with one more attribute value: Gender=1, which provides no specific information for policy making. We thus recommend the same policies as for Cluster 4.
  Proposed retention policies: Change package plan to "Base Rate Plus 100". Provide all kinds of discountable tickets regularly. Announce the newest discountable plans regularly.

Cluster 8 (Tenure=0, Average invoice=2, Average domestic invoice=0, Package code=4)
  Explanation: This cluster is the superclass of Cluster 4 with one more attribute: Average domestic invoice=0. Taking the values of Average invoice and Average domestic invoice into account, the ratio of the average monthly domestic fees to the average monthly fees is lower than 50 percent. We thus add a new policy.
  Proposed retention policies: Select an appropriate message package plan to suit personal usage. Provide all kinds of discountable tickets regularly. Announce the newest discountable plans regularly.
4 The Application Mode Fig. 7 shows how the application mode works. The Churner Predictor uses the trained churn predictive model to decide the churn possibility of a given subscriber. If the subscriber tends to churn, his/her data will be sent to the Lifetime Customer Value Discriminator, which decides whether he/she is a valuable one to retain. The lifetime value of a possible churner is calculated by Eq. (1):

Lifetime_Customer_Value = (1 / Churn_Score) × Average_Invoice,     (1)
where Churn_Score is the predicted churn rate by the churn predictive model and Average_Invoice is the average monthly invoice of a subscriber. If the calculated value is higher than a default lifetime customer value, a suitable set of retention policies will be proposed by the Policy Proposer, which first allocates the churner to some n (n≥0) churn attribute clusters according to his attribute values. The module then resorts to the retention policy model to retrieve and combine the corresponding retention policies. Note that the retention policy ontology will be checked to see whether there are any conflicts existing among these policies at the same time. Only when this check is OK, will the combined retention policies be proposed for the possible churner.
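A minimal sketch of Eq. (1) and the retention decision follows. This is our illustration; the default lifetime customer value used in the example is an assumption, since the paper does not report the actual threshold.

    # Illustrative sketch: lifetime customer value (Eq. 1) and the retention decision.
    def lifetime_customer_value(churn_score, average_invoice):
        return (1.0 / churn_score) * average_invoice

    def worth_retaining(churn_score, average_invoice, default_value=2000.0):
        # default_value is a hypothetical threshold for illustration only
        return lifetime_customer_value(churn_score, average_invoice) > default_value

    # A subscriber predicted to churn with probability 0.8 and an average monthly
    # invoice of 2,500 NT dollars has a lifetime value of 3,125, so retain.
    print(worth_retaining(0.8, 2500))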
Fig. 7. Flow diagram of the application mode
5 Evaluation In Table 3 we show the results of tenfold cross-validation on the optimized churn predictive model (Fig. 3). Out of 63,367 subscribers, which contain 24,774 churn instances and 38,593 non-churn instances, 17,668 instances were correctly classified as positive (true positive), and 35,692 instances were correctly classified as negative (true negative). There were 7,106 instances incorrectly classified as negative (false negative) and 2,901 instances incorrectly classified as positive (false positive). On average, the accuracy of prediction is 84.21% ((17,668 + 35,692) / 63,367 = 84.21%).
Table 3. Accuracy of the optimized churn predictive model

                      Predicted class
Actual class          +                  -                  Total
+                     17,668 (27.88%)    7,106 (11.21%)     24,774
-                     2,901 (4.57%)      35,692 (56.34%)    38,593
Total                                                       63,367
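The figures quoted above can be reproduced directly from the confusion matrix of Table 3, as in the following short sketch (ours).

    # Illustrative sketch: deriving accuracy from the Table 3 confusion matrix.
    tp, fn, fp, tn = 17668, 7106, 2901, 35692
    total = tp + fn + fp + tn                 # 63,367 subscribers
    accuracy = (tp + tn) / total              # about 0.8421
    precision = tp / (tp + fp)                # about 0.859
    recall = tp / (tp + fn)                   # about 0.713
    print(accuracy, precision, recall)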
As for the evaluation of policy proposing, we found that a telecommunication company provides a website to help subscribers calculate which package plans fit them best [10], and it can be used to partially evaluate how well the automatic policy proposing mechanism in our system performs. The input data of the website include the total call minutes per month, the intra-carrier-call minutes per month, and the cross-carrier-call minutes per month. After normalizing the definition of each package plan of the website to make it the same as ours, we randomly selected 10 churner instances from the historical services database for evaluation. The attribute values of the 10 instances are shown in Table 4. The results of the package plans proposed by the website and our system are shown in Table 5. Only case 9 results in different package plans. This divergence is because of the poor correspondence between the package plan selected by customer 9 and his calling minutes. Note that package code 0 stands for the "standard" payment
package, which has a fixed high monthly rental fee, from which the calling charge is not deductible, even though its per-minute calling charge is low. It is suitable for high-volume traffic customers. In that case, no divergence will happen and our system works as well, e.g., cases 1, 7, and 8. A low-volume traffic user, e.g., case 9, however, will end up with a very high average invoice and cause divergence. This divergence cannot be tackled by our system for now, because the Policy Proposer currently only considers deductible package plans.

Table 4. The values of attributes of the testing churner data

ID   Package code   Monthly usage minutes   Within-carrier counts   Cross-carrier counts   Average invoice
1    0              198                     147 (65%)               80 (35%)               4
2    6              169                     84 (30%)                199 (70%)              4
3    8              545                     1164 (77%)              352 (23%)              6
4    1              639                     106 (30%)               253 (70%)              5
5    4              150                     76 (19%)                327 (81%)              4
6    5              90                      278 (90%)               31 (10%)               2
7    0              373                     188 (39%)               298 (61%)              5
8    0              220                     58 (20%)                238 (80%)              4
9    0              55                      164 (100%)              0 (0%)                 3
10   1              42                      14 (26%)                40 (74%)               2
Table 5. Comparison of package plans proposed by our system and the website

Customer ID                                 1   2   3   4   5   6   7   8   9   10
Package plan recommended by our system      7   7   8   8   7   5   8   7   6   4
Package plan recommended by the website     7   7   8   8   7   5   8   7   4   5
6 Conclusions In this paper we described an intelligent customer retention system which works in two modes. In the learning mode, it learns about potential associations inside the historical services database to construct a churn predictive model. It also creates a churn attribute cluster model by exploiting the correlations among the attribute values of all churners in the historical services database. The churn attribute cluster model, with the support of the retention policy ontology, allows us to create a retention policy model that maps appropriate retention policies to the clusters. In the application mode, the system uses the churn predictive model to calculate the churn probability of a given subscriber. If the subscriber is judged to be a potential churner, it goes one step further to evaluate whether he is valuable to retain. A valuable churner will invoke the system to propose proper retention policies according to the retention policy model. Our experimental results show that the churn predictive model, containing only ten attributes, can reach a high degree of prediction accuracy. A preliminary test on proposing package plans also shows that the system works as well as a commercial website. The attributes we used to construct the churn predictive model contain much information about the subscribers' background and usage behavior, which explains why we can decide customer defection more objectively and correctly. More factors may also strongly affect the propensities of subscribers. For example, subscriber satisfaction can explain why poor services or inefficient problem solving make subscribers lose their confidence in the carriers. Frequently checking the complaints of subscribers and figuring out what subscribers really want are
indispensable to promote the brand image. Subscriber alliance is another example: the defection of a subscriber to another carrier is likely to cause a "snowball effect", which means that he/she may influence his/her business alliances to follow. A more intricate investigation of the complicated relationships between subscribers is necessary to prevent this from happening.
Acknowledgement This paper was partly supported by the National Science Council, Taiwan, R.O.C, under grant NSC 94-2213-E-228-006.
Software Diagnosis Using Fuzzified Attribute Base on Modified MEPA Jr-Shian Chen1,2 and Ching-Hsue Cheng1 1
Department of Information Management, National Yunlin University of Science and Technology, 123, Section 3, University Road, Touliu, Yunlin 640, Taiwan
[email protected] 2 Department of Computer Science and Information Management, HUNGKUANG University, No.34, Chung-Chie Road, Shalu, Taichung 433, Taiwan
[email protected] Abstract. Currently, there are many data preprocessing methods, such as data discretization, data cleaning, data integration and transformation, and data reduction. Concept hierarchies are a form of data discretization that can be used for data preprocessing. Discrete data are usually more compact and can be processed more quickly than continuous data. We therefore propose a data discretization method, a modified minimize entropy principle approach, to fuzzify attributes and then build the classification tree. For verification, two NASA software projects, KC2 and JM1, are used to illustrate the proposed method. We establish a prototype system to discretize the data from these projects. The error rate and the number of rules show that the proposed approach is better than other methods.
1 Introduction
Defective software modules cause software failures, increase development and maintenance costs, and decrease customer satisfaction. Effective defect prediction models can help developers focus quality assurance activities on defect-prone modules and thus improve software quality by using resources more efficiently. Real-world databases are highly susceptible to noisy, missing, and inconsistent data. Noise is a random error or variance in a measured variable [1]. When decision trees are built, many of the branches may reflect noisy or outlier data. Therefore, data preprocessing steps are very important, and there are many methods for data preprocessing. Concept hierarchies are a form of data discretization that can be used for data preprocessing. Data discretization has many advantages: data can be reduced and simplified, and discrete features are usually more compact and more accurate than continuous ones [2]. We therefore propose a modified minimize entropy principle approach to fuzzify attributes and then build the classification tree. This paper is organized as follows: related work is described in Section 2. Section 3 is devoted to the modified minimize entropy principle approach and presents our proposed classifier method. Experimental results are shown in Section 4. The final section gives conclusions.
2 Related Work
In this section, related work on data preprocessing, classification decision trees and the minimize entropy principle approach is reviewed.
2.1 Software Diagnosis
Software diagnosis is becoming an important field in software engineering and applications. Defective software modules cause software failures, increase development and maintenance costs, and decrease customer satisfaction. Hence, effective defect diagnosis models can help developers focus quality assurance activities on defect-prone modules and thus improve software quality by using resources more efficiently [3].
2.2 Data Preprocessing
In data mining algorithms, the data preprocessing steps are very important; preprocessing affects the quality and the efficiency of the data mining algorithms. There are a number of data preprocessing techniques, for example data cleaning [4], data integration and transformation, and data reduction [5][6]. Concept hierarchies [7] for numeric attributes can be constructed automatically based on data distribution analysis. Discretization maps similar values into one discrete bin, which can improve predictions by reducing the search space, reducing noise, and pointing to important data characteristics. Discretization includes unsupervised and supervised approaches. Unsupervised approaches divide the original feature value range into a few equal-length or equal-data-frequency intervals. Supervised methods maximize a measure involving the predicted variable, e.g., entropy or the chi-square statistic [2]. Entropy-based discretization can reduce the data size and, unlike the other methods, uses class information, which makes it more likely that the interval boundaries occur in places that help classification accuracy.
2.3 Classification Decision Trees ID3 and C4.5
Classification is an important data mining technique. Several classification models have been proposed, e.g., statistical based, distance based, neural network based and decision tree based [8]. A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions [1]. ID3 [9] is a decision tree algorithm based on information theory. The basic strategy used by ID3 is to choose splitting attributes with the highest information gain. The concept used to quantify information is called entropy, which measures the information in an attribute. Assume a collection set S with c outcome classes; the entropy is then defined as
H(S) = Σ_i (−p_i · log2 p_i),   (1)
where p_i is the proportion of S belonging to class i.
Gain(S, A), the information gain of example set S on attribute A, is defined as
Gain(S, A) = H(S) − Σ_v (|S_v| / |S|) · H(S_v),   (2)
where v is a value of A, S_v is the subset of S whose examples take value v on A, |S_v| is the number of elements in S_v, and |S| is the number of elements in S. A small sketch of both quantities is given below.
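The following short Python sketch (not part of the original paper; the data layout and attribute values are assumed for illustration, taken from the play dataset of Table 1 below) shows how H(S) and Gain(S, A) of equations (1) and (2) can be computed:

```python
import math
from collections import Counter

def entropy(labels):
    # Eq. (1): H(S) = sum_i -p_i * log2(p_i) over the class proportions of S
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    # Eq. (2): Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v), with A given by `values`
    subsets = {}
    for v, lab in zip(values, labels):
        subsets.setdefault(v, []).append(lab)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
print(information_gain(outlook, play))   # about 0.247 bits for the outlook attribute
```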
C4.5 is a software extension of the basic ID3 algorithm designed by Quinlan [10] to address the following issues not dealt with by ID3: avoiding overfitting the data, determining how deeply to grow a decision tree, reduced-error pruning, rule post-pruning, handling continuous attributes, choosing an appropriate attribute selection measure, handling training data with missing attribute values, handling attributes with differing costs, and improving computational efficiency.
2.4 Minimize Entropy Principle Approach (MEPA)
A key goal of entropy minimization analysis is to determine the quantity of information in a given data set. The entropy of a probability distribution is a measure of the uncertainty of the distribution [11]. To subdivide the data into membership functions, the thresholds between classes of data must be established. A threshold line can be determined with an entropy minimization screening method, and the segmentation process then starts, first into two classes. Repeated partitioning with threshold value calculations allows us to partition the data set into a number of fuzzy sets [12]. Assume that a threshold value is being sought for a sample in the range between x1 and x2. An entropy equation is written for the regions [x1, x] and [x, x2]; denote the first region p and the second region q. The entropy for each value of x is expressed as [13]:
S(x) = p(x)·Sp(x) + q(x)·Sq(x),   (3)
where
Sp(x) = −[p1(x) ln p1(x) + p2(x) ln p2(x)],
Sq(x) = −[q1(x) ln q1(x) + q2(x) ln q2(x)],   (4)
and pk(x) and qk(x) are the conditional probabilities that a class k sample is in the region [x1, x1+x] and [x1+x, x2], respectively; p(x) and q(x) are the probabilities that all samples are in the region [x1, x1+x] and [x1+x, x2], respectively, with
p(x) + q(x) = 1.   (5)
A value of x that gives the minimum entropy is the optimum threshold value. The entropy estimates of pk(x) and qk(x), p(x) and q(x), are calculated as follows: [12]
pk(x) = (nk(x) + 1) / (n(x) + 1),   (6)
qk(x) = (Nk(x) + 1) / (N(x) + 1),   (7)
p(x) = n(x) / n,   (8)
q(x) = 1 − p(x),   (9)
where nk(x) = number of class k samples located in [x1, x1+x], n(x) = total number of samples located in [x1, x1+x], Nk(x) = number of class k samples located in [x1+x, x2], N(x) = total number of samples located in [x1+x, x2], and n = total number of samples in [x1, x2].
Fig. 1. Partitioning process of the minimize entropy principle approach
Figure 1 shows the partitioning process. While moving x in the region [x1, x2], we calculate the value of entropy for each position of x. The value of x that yields the minimum entropy is called the primary threshold (PRI) value. By repeating this process in each subregion, secondary threshold values can be determined, denoted SEC1 and SEC2. To develop seven partitions we need tertiary threshold values, denoted TER1, TER2, TER3 and TER4 [8]. A small sketch of this threshold search is given below.
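The following Python sketch is an illustration of equations (3)-(9), not the authors' implementation; it scans all candidate thresholds of one numeric attribute and returns the one with minimum entropy, i.e., the PRI:

```python
import math

def region_entropy(counts, total):
    # Eqs. (4), (6), (7): class probabilities with the +1 smoothing, natural log
    return -sum(((c + 1) / (total + 1)) * math.log((c + 1) / (total + 1)) for c in counts)

def primary_threshold(values, labels):
    data = sorted(zip(values, labels))
    classes = sorted(set(labels))
    n = len(data)
    best_x, best_s = None, float("inf")
    for i in range(1, n):
        if data[i - 1][0] == data[i][0]:
            continue                                   # skip duplicate values
        x = (data[i - 1][0] + data[i][0]) / 2.0        # candidate threshold
        left = [lab for v, lab in data if v <= x]
        right = [lab for v, lab in data if v > x]
        p = len(left) / n                              # Eq. (8); q = 1 - p, Eq. (9)
        s = (p * region_entropy([left.count(c) for c in classes], len(left)) +
             (1 - p) * region_entropy([right.count(c) for c in classes], len(right)))  # Eq. (3)
        if s < best_s:
            best_x, best_s = x, s
    return best_x, best_s

humidity = [85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
print(primary_threshold(humidity, play))   # 82.5, the PRI of humidity in Tables 2 and 3
```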
3 Proposed Modified MEPA Approach to Fuzzify Attribute
In this section, the proposed approach is described. New approaches are proposed to improve the accuracy rate and to reduce the number of rules of the C4.5 decision tree. The well-known dataset given in Table 1 is used to explain these approaches, and our proposed method is described as follows:
Table 1. A dataset for the play game [8], page 18
outlook | temperature | humidity | windy | play
sunny | 85 | 85 | FALSE | no
sunny | 80 | 90 | TRUE | no
overcast | 83 | 86 | FALSE | yes
rainy | 70 | 96 | FALSE | yes
rainy | 68 | 80 | FALSE | yes
rainy | 65 | 70 | TRUE | no
overcast | 64 | 65 | TRUE | yes
sunny | 72 | 95 | FALSE | no
sunny | 69 | 70 | FALSE | yes
rainy | 75 | 80 | FALSE | yes
sunny | 75 | 70 | TRUE | yes
overcast | 72 | 90 | TRUE | yes
overcast | 81 | 75 | FALSE | yes
rainy | 71 | 91 | TRUE | no
Step 1: Partition the quantitative attributes and calculate the threshold values (PRI, SEC1, and SEC2). The entropy value of each candidate threshold is computed with the entropy equations proposed by Christensen [13] described above. Table 2 shows that when x = 82.5 the entropy value S(x) is the smallest, so the PRI is 82.5. By repeating this procedure to subdivide the data, the thresholds shown in Table 3 are obtained.
Step 2: Build the membership functions. The thresholds from Step 1 are used as the midpoints of triangular fuzzy numbers, but we modify the first and the last membership functions to be trapezoids: when the attribute value is below SEC1 the membership degree is equal to 1, and the same holds when the attribute value is greater than SEC2. The resulting membership functions of the minimize entropy principle approach are shown in Table 3 and Figure 2; a small sketch follows.
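One plausible reading of Step 2, sketched below in Python, is given for illustration only; the exact shapes of the membership functions are an assumption based on Figure 2, not the authors' code:

```python
def mu_low(x, sec1, pri):
    if x <= sec1: return 1.0                # trapezoid: flat at 1 below SEC1
    if x >= pri: return 0.0
    return (pri - x) / (pri - sec1)

def mu_med(x, sec1, pri, sec2):
    if x <= sec1 or x >= sec2: return 0.0   # triangle centred on PRI
    return (x - sec1) / (pri - sec1) if x <= pri else (sec2 - x) / (sec2 - pri)

def mu_high(x, pri, sec2):
    if x >= sec2: return 1.0                # trapezoid: flat at 1 above SEC2
    if x <= pri: return 0.0
    return (x - pri) / (sec2 - pri)

def linguistic_value(x, sec1, pri, sec2):
    degrees = {"low": mu_low(x, sec1, pri),
               "med": mu_med(x, sec1, pri, sec2),
               "high": mu_high(x, pri, sec2)}
    return max(degrees, key=degrees.get)

# humidity thresholds from Table 3: sec1 = 72.5, pri = 82.5, sec2 = 95.5
print(linguistic_value(85, 72.5, 82.5, 95.5))   # "med", as in Table 4
```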
Step 3: Defuzzify the quantitative data to determine their linguistic values. According to the membership functions of Step 2, the membership degree of each datum is calculated to determine its linguistic value (see Table 4).
Step 4: Use the C4.5 decision tree classifier to build the model on the dataset, using the linguistic values to build the classification decision tree.
Table 2. Example of entropy value calculation for attribute "humidity"
x | P1(x) | P2(x) | Q1(x) | Q2(x) | P(x) | Q(x) | Sp(x) | Sq(x) | S(x)
(75+80)/2 = 77.5 | (1+1)/(5+1) = 0.333 | (4+1)/(5+1) = 0.833 | (4+1)/(9+1) = 0.500 | (5+1)/(9+1) = 0.600 | 5/14 = 0.357 | 9/14 = 0.643 | 0.518 | 0.653 | 0.605
(80+85)/2 = 82.5 | (1+1)/(7+1) = 0.250 | (6+1)/(7+1) = 0.875 | (4+1)/(7+1) = 0.625 | (3+1)/(7+1) = 0.500 | 7/14 = 0.500 | 7/14 = 0.500 | 0.463 | 0.640 | 0.552
(85+86)/2 = 85.5 | (2+1)/(8+1) = 0.333 | (6+1)/(8+1) = 0.778 | (3+1)/(6+1) = 0.571 | (3+1)/(6+1) = 0.571 | 8/14 = 0.571 | 6/14 = 0.429 | 0.562 | 0.640 | 0.595
Fig. 2. Membership function of humidity
Table 3. Thresholds of attributes
Attribute | min | sec1 | pri | sec2 | max
temperature | 64 | 66.5 | 84 | 85 | 85
humidity | 65 | 72.5 | 82.5 | 95.5 | 96
Table 4. Fuzzified attribute base on modified MEPA
temperature | linguistic value | humidity | linguistic value
85 | high | 85 | med
80 | med | 90 | high
83 | med | 86 | med
70 | low | 96 | high
68 | low | 80 | med
65 | low | 70 | low
64 | low | 65 | low
72 | low | 95 | high
69 | low | 70 | low
75 | low | 80 | med
75 | low | 70 | low
72 | low | 90 | high
81 | med | 75 | low
71 | low | 91 | high
4 Case Verification
Our empirical case study uses data sets from two NASA software projects, labeled JM1 and KC2 [14]. JM1 is written in C and is a real-time predictive ground system that uses simulations to generate predictions. The original JM1 dataset consisted of 10,885 software modules; we removed tuples that have 5 null values and 2 questionable values. Of the remaining JM1 modules, 2,102 had software faults and 8,776 had no faults. KC2 contains data from C++ functions used for scientific data processing. The original KC2 dataset consisted of 520 software modules, of which 106 modules had software faults and the remaining 414 had no faults. The JM1 and KC2 projects share the same 21 attributes of software measurements, listed in Table 5. Based on the statements of Section 3 we establish a prototype system, implemented in the Delphi 7.0 programming language, and use it to discretize the data; the linguistic values are then used to build the classification decision tree. The C4.5 classifier used in our studies is from the Weka open-source software [15]. A hedged outline of this flow is sketched below.
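The sketch below is an assumption for illustration only: the original prototype used Delphi 7.0 and Weka's C4.5 (J48), while here scikit-learn's DecisionTreeClassifier is used purely as a stand-in, and the fuzzification shortcut is equivalent to taking the label with maximum membership under the triangular functions of Section 3:

```python
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

def linguistic(x, sec1, pri, sec2):
    # the label of maximum membership reduces to comparing x with the midpoints
    # between adjacent membership-function centres
    if x < (sec1 + pri) / 2:
        return "low"
    if x < (pri + sec2) / 2:
        return "med"
    return "high"

temperature = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
humidity = [85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

rows = [[linguistic(t, 66.5, 84.0, 85.0), linguistic(h, 72.5, 82.5, 95.5)]
        for t, h in zip(temperature, humidity)]          # reproduces Table 4
X = OrdinalEncoder().fit_transform(rows)
tree = DecisionTreeClassifier(criterion="entropy").fit(X, play)
```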
The entropy value of each candidate threshold is computed by equations (3)-(9). By repeating this procedure to partition the data, the thresholds shown in Table 6 are obtained. The proposed approach is applied to build the classification decision tree on the KC2 and JM1 datasets, and the standard C4.5 method is also applied for comparison. From Table 7, the error rate and the number of rules show that the proposed approach is better than the other methods.
Table 5. Attribute information
no. | Attribute | Attribute information
1 | loc | numeric % McCabe's line count of code
2 | v(g) | numeric % McCabe "cyclomatic complexity"
3 | ev(g) | numeric % McCabe "essential complexity"
4 | iv(g) | numeric % McCabe "design complexity"
5 | n | numeric % Halstead total operators + operands
6 | v | numeric % Halstead "volume"
7 | l | numeric % Halstead "program length"
8 | d | numeric % Halstead "difficulty"
9 | i | numeric % Halstead "intelligence"
10 | e | numeric % Halstead "effort"
11 | b | numeric % Halstead
12 | t | numeric % Halstead's time estimator
13 | lOCode | numeric % Halstead's line count
14 | lOComment | numeric % Halstead's count of lines of comments
15 | lOBlank | numeric % Halstead's count of blank lines
16 | lOCodeAndComment | numeric
17 | uniq_Op | numeric % unique operators
18 | uniq_Opnd | numeric % unique operands
19 | total_Op | numeric % total operators
20 | total_Opnd | numeric % total operands
21 | branchCount | numeric % of the flow graph
Table 6. Thresholds of attributes in KC2 & JM1 KC2 no. min
JM1
sec1
pri
sec2
max min
sec1
pri
sec2
max
1 11.50
38.50
93.50
3442
1
1
7.50
31.50
100.50
1275
2
1
1.50
6.50
15.50
180
1
1.50
8.50
15.50
470
3
1
3.50
4.50
7.50
125
1
3.50
6.50
12.50
165
4
1
2.50
3.50
8.50
143
1
1.50
4.50
9.50
402
5
1 26.50
72.50
305.50
3982
0
0.50
102.50
295.50
8441
6
0 109.21 354.28 1733.15
33814.56
0
1.00
699.33
7
0
2
0
0.01
0.06
0.01
0.12
0.34
2001.57 80843.08 0.16
1
Table 6. (continued)
8
0
2.93
9.09
24.73
103.53
0
0.50
19.52
26.83
418.2
9
0 24.55
24.73
47.14
415.06
0
1.55
37.29
103.82
569.78
10
0 435.20 3600.08 35401.11 2147483.64
0
4.00 14248.47 41260.17 31079782
11
0
11.27
0
0.16
0.24
12
0 24.18 200.01 1966.73 153047.01
0
0.22
791.59
13
0
7.50
23.50
83.00
1107
0
0.50
26.50
67.50
2824
14
0
0.50
1.50
11.00
44
0
0.50
4.50
20.50
344
15
0
1.50
6.50
7.50
121
0
1.50
5.50
11.50
447
16
0
0.50
1.50
3.00
11
0
0.50
1.50
6.50
108
17
1
6.50
11.50
16.50
47
0
0.50
17.50
21.50
411
18
0
9.50
16.50
34.50
325
0
0.50
23.50
55.50
1026
19
1 14.50
80.50
148.00
2469
0
0.50
83.50
198.50
5420
20
0 11.50
49.50
118.50
1513
0
0.50
63.50
326.50
3021
21
1
12.00
26.00
361
1
2.00
11.50
29.50
826
0.04
2.00
0.13
0.58
0.67
26.95
2292.24 1726655
Table 7. Comparison of the proposed approach with other models
Approach | Dataset | Error rate | Number of rules | Size of tree
Rforest [16] | KC2 | 23.08% | NA1 | NA1
Rforest [16] | JM1 | NA2 | NA2 | NA2
C4.5 | KC2 | 18.08% | 26 | 51
C4.5 | JM1 | 20.45% | 284 | 567
Proposed approach | KC2 | 17.69% | 17 | 25
Proposed approach | JM1 | 18.87% | 169 | 253
Note: NA1 denotes that the value is not given for this method. NA2 denotes that the method was not evaluated on this dataset.
5 Conclusions
In this paper, we have proposed a new method to build a classification tree based on attributes fuzzified with the modified MEPA. From Table 7, the error rate and the number of rules show that the proposed approach is better than the other methods. The proposed method has two main advantages: (1) it reduces the size of the tree and the number of rules, and (2) it improves the classification accuracy without removing the noisy instances.
Future work could focus on the number of membership functions used in the MEPA method, which could be set to two, three, seven, and so on. We could also consider a more flexible method to build the membership functions. These are left as areas for future research.
Reference 1. J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco(2001) 2. H. Liu, F. Hussain, C. Tan, and M. Dash. Discretization: An enabling technique. Data Mining and Knowledge Discovery, Vol. 6, 4(2002), 3. Norman Fenton and Shari Pfleeger. Software Metrics - A Rigorous and Practical Approach. Chapmann & Hall, London, (1997) 4. Vijayshankar Raman, Joe Hellerstein, Potter's wheel: An interactive data cleaning system. VLDB(2001) ,381-390, Roma, Italy. 5. U. Fayyad, G. P. Shapiro, and P. Smyth, The KDD process for extracting useful knowledge from volumes of data, Communications of the ACM, vol. 39(1996), 27-34 6. S. Mitra, S.K. Pal, p. Mitra, Data mining in soft computing framework: A survey, IEEE Trans. Neural Networks Vol. 13, 1 (2002) 3-14 7. Cai, Y., Cercone, N., and Han, J., Knowledge discovery in databases: an attribute-oriented approach, VLDB (1992) 547-559. 8. MH Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, Upper Saddle River, NJ (2003) 9. J. Ross Quinlan, Induction of decision trees. Machine Learning, 1 (1986) 81-106 10. J. Ross Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA (1993) 11. Yager, R. and Filev, D., Template-based fuzzy system modeling, Intelligent and Fuzzy Sys., Vol. 2 (1994) 39-54. 12. Ross, T.J., Fuzzy logic with engineering applications, International edition, McGraw-Hill, USA. (2000) 13. Christensen, R., Entropy minimax sourcebook, Entropy Ltd., Lincoln, MA. (1980) 14. Sayyad Shirabad, J., Menzies, T.J. The PROMISE Repository of Software Engineering Databases. School of Information Technology and Engineering, University of Ottawa, Canada. Available: http://promise.site.uottawa.ca/SERepository(2005) 15. Ian H. Witten, Eibe Frank, Data Mining: Practical machine learning tools and techniques, 2nd Edition, Morgan Kaufmann, San Francisco (2005) 16. Taghi M. Khoshgoftaar, Naeem Seliya, Kehan Gao, Detecting noisy instances with the rule-based classification model, Intelligent Data Analysis, Vol.9, 4 (2005), 347-364
New Methods for Text Categorization Based on a New Feature Selection Method and a New Similarity Measure Between Documents Li-Wei Lee and Shyi-Ming Chen Department of Computer Science and Information Engineering National Taiwan University of Science and Technology Taipei, Taiwan, R.O.C.
[email protected] Abstract. In this paper, we present a new feature selection method based on document frequencies and statistical values. We also present a new similarity measure to calculate the degree of similarity between documents. Based on the proposed feature selection method and the proposed similarity measure between documents, we present three methods for dealing with the Reuters-21578 top 10 categories text categorization. The proposed methods get higher performance for dealing with the Reuters-21578 top 10 categories text categorization than that of the method presented in [4].
1 Introduction
Text categorization is the task of classifying text documents into a predefined number of categories based on classification patterns [3], [22]. The terms appearing in documents are treated as features. One major difficulty in text categorization is the large dimension of the feature space. Therefore, we hope to reduce the dimension of the feature space to get a higher performance for dealing with the text categorization problem. One approach to text categorization is based on feature selection [3], [4], [21]. Some results from previous research show that the semantic feature selection approach affects the performance of text categorization [3], [21]. Several feature selection methods have been presented to deal with a wide range of text categorization tasks, such as the Chi-Square test [1], [2], [6], [20], Information Gain (IG) [1], [9], [10], [12], [15], and Mutual Information (MI) [2], [5], [8], [12], [13]. In this paper, we present a new feature selection method based on document frequencies and statistical values to select useful features. We also present a new similarity measure between documents. Based on the proposed feature selection method and the proposed similarity measure between documents, we present three methods for dealing with the Reuters-21578 top 10 categories text categorization. The proposed methods get higher performance for the Reuters-21578 top 10 categories text categorization than the method presented in [4]. The rest of this paper is organized as follows. In Section 2, we briefly review previous research on text categorization. In Section 3, we present a new feature selection
method based on document frequencies and statistical values for text categorization. In Section 4, we present a new similarity measure between documents. Based on the proposed feature selection method and the proposed similarity measure between documents, we present three methods to deal with the Reuters-21578 top 10 categories text categorization. In Section 5, we show the experimental results. The conclusions are discussed in Section 6.
2 Preliminaries In the vector space model [18], documents are usually represented by feature vectors of terms. The task of preprocessing consists of transforming capital letters into lowercased letters and removing stop words (such as “a”, “an”, “the”, etc.), where words are stemmed by applying the Porter algorithm [17]. The acquired document-term matrix is then transformed into TF-IDF (Term Frequency-Inverse Document Frequency) weights which are normalized by document lengths [19]. After the feature selection process, the dimension of the feature space is reduced and useful features are obtained. Therefore, the feature selection process is a very important task, and it can affect the performance of text categorization. Assume that F consists of n features f1, f2, …, fn and assume that S consists of m features s1, s2, …, sm, where S is a subset of F. The goal of the feature selection process is to choose an optimal subset S of F for text categorization. There are many statistic measures for dealing with the task of feature selection, e.g., Chi-Square [1], [2], [6], [20], Information Gain [1], [9], [10], [12], [15], and Mutual Information [2], [5], [8], [12], [13]. Among these measures, the mutual information measure is the most commonly used measure. It also has a better performance for dealing with the task of feature selection. In the following, we briefly review some feature selection measures, shown as follows: (1) Chi-Square test [2]: Fix a term t, let the class labels be 0 and 1. Let ki,0 denote the number of documents in class i not containing term t and let ki,1 denote the number of documents in class i containing term t. This gives us a 2 × 2 contingency matrix: It
C | It = 0 | It = 1
C = 0 | k00 | k01
C = 1 | k10 | k11
where C and It denote Boolean random variables and k lm denotes the number of observations, where C ∈ {0,1} and I t ∈ {0,1} . Let n = k00 + k01 + k10 + k11. We can estimate the marginal distribution as follows: Pr(C = 0) = (k00 + k01)/n, Pr(C = 1) = (k10 + k11)/n, Pr(It = 0) = (k00 + k10)/n, Pr(It = 1) = (k01 + k11)/n.
The χ² test is shown as follows:
χ² = Σ_{l,m} (k_lm − n·Pr(C = l)·Pr(It = m))² / (n·Pr(C = l)·Pr(It = m)) = n·(k11·k00 − k10·k01)² / [(k11 + k10)(k01 + k00)(k11 + k01)(k10 + k00)].   (1)
The larger the value of χ², the lower is our belief that the independence assumption is upheld by the observed data. In [2], Chakrabarti pointed out that for feature selection it is adequate to sort terms in decreasing order of their χ² values, train several classifiers with a varying number of features, and stop at the point of maximum accuracy (see [2], pp. 139). The larger the value of χ², the higher the priority to choose term t. For more details, please refer to [2].
(2) Information Gain Measure [23], [24]: For the binary document model and two classes (the same as in the case of the χ² test), the Information Gain (IG) of term t with respect to the two classes can be written as follows:
IG(t) = −Σ_{i=0}^{1} P(ci) log P(ci) − ( −P(t) Σ_{i=0}^{1} P(ci | t) log P(ci | t) − P(t̄) Σ_{i=0}^{1} P(ci | t̄) log P(ci | t̄) ),   (2)
where P(c0) = (k00 + k01) / (k00 + k01 + k10 + k11), P(c1) = (k10 + k11) / (k00 + k01 + k10 + k11), P(t) = (k01 + k11) / (k00 + k01 + k10 + k11), P(t̄) = (k00 + k10) / (k00 + k01 + k10 + k11), P(c0 | t) = k01 / (k01 + k11), P(c1 | t) = k11 / (k01 + k11), P(c0 | t̄) = k00 / (k00 + k10), and P(c1 | t̄) = k10 / (k00 + k10).
The larger the value of IG(t), the higher the priority to choose term t. For more details, please refer to [23] and [24].
(3) Mutual Information Measure [2]: For the binary document model and two classes (the same as in the case of the χ² test), the Mutual Information (MI) of term t with respect to the two classes can be written as follows:
MI(It, C) = Σ_{l,m ∈ {0,1}} (k_l,m / n) · log [ (k_l,m / n) / ( (k_l,0 + k_l,1)(k_0,m + k_1,m) / n² ) ],   (3)
where n = k00 + k01 + k10 + k11. The larger the value of MI ( I t , C ) , the higher the priority to choose term t. For more details, please refer to [2].
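For illustration, the sketch below (not taken from [2] or [4], and the example counts are invented) computes the three review measures from the 2 × 2 contingency counts of a single term:

```python
import math

def chi_square(k00, k01, k10, k11):                      # Eq. (1)
    n = k00 + k01 + k10 + k11
    return (n * (k11 * k00 - k10 * k01) ** 2 /
            ((k11 + k10) * (k01 + k00) * (k11 + k01) * (k10 + k00)))

def info_gain(k00, k01, k10, k11):                       # Eq. (2)
    n = k00 + k01 + k10 + k11
    def H(counts):
        t = sum(counts)
        return -sum(c / t * math.log2(c / t) for c in counts if c > 0)
    p_t, p_not_t = (k01 + k11) / n, (k00 + k10) / n
    return H([k00 + k01, k10 + k11]) - p_t * H([k01, k11]) - p_not_t * H([k00, k10])

def mutual_information(k00, k01, k10, k11):              # Eq. (3)
    n = k00 + k01 + k10 + k11
    k = [[k00, k01], [k10, k11]]
    mi = 0.0
    for l in (0, 1):
        for m in (0, 1):
            if k[l][m]:
                p_lm = k[l][m] / n
                mi += p_lm * math.log2(p_lm / (((k[l][0] + k[l][1]) / n) *
                                               ((k[0][m] + k[1][m]) / n)))
    return mi

# hypothetical counts for one term
print(chi_square(40, 10, 5, 45), info_gain(40, 10, 5, 45), mutual_information(40, 10, 5, 45))
```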
3 A New Feature Selection Method Based on Statistical Values and Document Frequencies
In this section, we present a new feature selection method based on statistical values and document frequencies. Let X and Y be two different classes of documents. The mean values μX,t and μY,t of term t in X and Y are (1/|X|) Σ_X x_t and (1/|Y|) Σ_Y y_t, respectively, where x_t denotes the TFIDF of term t in a document of class X and y_t denotes the TFIDF of term t in a document of class Y. Furthermore, the variances σX and σY of X and Y are (1/|X|) Σ_X (x_t − μX,t)² and (1/|Y|) Σ_Y (y_t − μY,t)², respectively. Let |X| denote the
number of documents in the class X and let |Y| denote the number of documents in the class Y. Here, we consider the effect of document frequencies and variances for feature selection. Let DF(xt) denote the document frequencies of term xt in the X class and let DF(yt) denote the document frequency of term yt in the Y class. The proposed feature selection method is as follows: S (t ) =
[ DF(x_t)/|X| − DF(y_t)/|Y| ]² / [ (1/|X|) Σ_X (x_t − μX,t)² + (1/|Y|) Σ_Y (y_t − μY,t)² ].   (4)
The larger the value of S(t), the higher the priority to choose term t.
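A small sketch of the proposed measure follows; it is an illustrative assumption, not the authors' code, and the data layout (per-document TFIDF values and per-class document frequencies) is assumed:

```python
def s_measure(tfidf_X, tfidf_Y, df_X, df_Y):
    # tfidf_X / tfidf_Y: TFIDF values of term t in every document of class X / Y
    # df_X / df_Y: number of documents of each class that contain term t
    nx, ny = len(tfidf_X), len(tfidf_Y)
    mu_x, mu_y = sum(tfidf_X) / nx, sum(tfidf_Y) / ny
    var_x = sum((v - mu_x) ** 2 for v in tfidf_X) / nx
    var_y = sum((v - mu_y) ** 2 for v in tfidf_Y) / ny
    # Eq. (4); assumes the two variances are not both zero
    return (df_X / nx - df_Y / ny) ** 2 / (var_x + var_y)
```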
4 New Methods for Text Classification Based on the Proposed Similarity Measure and the k-NN Approach
Many learning-based approaches have been presented to deal with the task of text categorization, e.g., the k-NN approach [7], [22], [24], support vector machines [5], [14], [24], Naïve Bayes approaches [5], [10], [24], and neural networks [16], [24]. In this paper, we present three classification methods based on the k-NN approach [7], [22], [24] to classify the Reuters-21578 top 10 categories data set. The k-NN classifier uses the k nearest training documents with respect to a testing document to calculate the likelihood of categories. The document-document similarity measure used in the k-NN classifier is the most important part for text categorization. Most previous k-NN classifiers use the cosine similarity measure in the vector space model. The cosine similarity measure cos(a, b) for measuring the degree of similarity between documents a and b is as follows:
cos(a, b) = (a · b) / (|a| · |b|),   (5)
where cos(a, b) ∈ [0, 1], and a and b denote the vector representations of the documents a and b, respectively. The larger the value of cos(a, b), the higher the similarity between the documents a and b.
The term weight wij, calculated by TFIDF normalized by document length, is the most commonly used method [19] for the cosine similarity measure in the vector space model:
wij = TFIDF(ti, dj) / sqrt( Σ_{k=1}^{|T|} (TFIDF(tk, dj))² ),   (6)
where wij denotes the weight of term i in document j, |T| is the total number of terms, and TFIDF(ti, dj) = (term frequency of term ti in dj) × log( number of documents / document frequency of term ti ).
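The following sketch computes the length-normalized TF-IDF weights of formula (6); it is an assumption for illustration (the base of the logarithm is not specified in the text, base 10 is used here), not the authors' MATLAB code:

```python
import math

def tfidf_weights(term_freqs, doc_freq, n_docs):
    # term_freqs: {term: tf in this document}; doc_freq: {term: df over the collection}
    raw = {t: tf * math.log10(n_docs / doc_freq[t]) for t, tf in term_freqs.items()}
    norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0   # document length
    return {t: w / norm for t, w in raw.items()}
```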
A comparison of the three proposed methods with Dona’s method [4] is shown in Table 1. Table 1. A comparison of the three proposed methods with Dona’s method Methods
Method | Feature selection | Term weight | Similarity measure | Classifier
Dona's method [4] | Mutual Information Measure [2, pp. 139-141] | N/A | N/A | Naïve Bayes [24]
The first proposed method | Mutual Information Measure [2, pp. 139-141] | Formula (6) | Formula (5) | k-NN [24]
The second proposed method | Mutual Information Measure [2, pp. 139-141] | Boolean | Formula (7) | k-NN [24]
The third proposed method | Formula (4) | Boolean | Formula (7) | k-NN [24]
In the following, we summarize the three proposed methods as follows: (A) The First Proposed Method for Text Categorization: Step 1: Select a predefined number of features based on the Mutual Information (MI) Measure [2] shown in formula (3) to reduce the number of features of each document. Step 2: Given a testing document, calculate the term weight of the testing document by using formula (6). Find its k-nearest documents among the training documents by using formula (5). Step 3: The testing document belonging to the category has the largest summing weight. (B) The Second Proposed Method for Text Categorization: Step 1: Select a predefined number of features based on the Mutual Information (MI) Measure [2] shown in formula (3) to reduce the number of features of each document. Step 2: Given a testing document, find its k-nearest documents among the training documents by using the proposed document-document similarity measure described as follows. We use the Boolean method for document representation. Each term weight is either 0 or 1, where 0 means that the term is not appearing and 1 means that it is appearing in the document. Let M(d1, d2) denote the number of terms appearing in documents D1 and D2, simultaneously. The proposed similarity measure to calculate the degree of similarity Similarity(d1, d2) between documents is shown as follows:
Similarity(d1, d2) = M(d1, d2) / (|d1| × |d2|),   (7)
where |d1| denotes the number of terms in document d1 and |d2| denotes the number of terms in document d2. Calculate the likelihood of the testing document belonging to each category by summing the weights of its k-nearest documents belonging to the category. For example, assume that there are 3-nearest training documents d1, d2, and d3 of testing document d4 as shown in Fig. 1. Assume that the degree of similarity between document d1 and the testing documents d4 is w1, the degree of the similarity between document d2 and the testing document d4 is w2, and the degree of similarity between document d3 and the testing document d4 is w3. Then, the summing weights of the documents d1 and d2 belonging to Category 1 are equal to w1 + w2, the weight of d3 belonging to Category 2 is w3, if (w1 + w2) < w3, then we let the testing document d4 belong to Category 2.
Fig. 1. The 3-nearest training documents of testing document d4
Step 3: The testing document belonging to the category has the largest summing weight. (C) The Third Proposed Method for Text Categorization: Step 1: Select a predefined number of features based on the proposed feature selection method shown in formula (4) to reduce the number of features of each document. Step 2: The same as Step 2 of the second proposed method. Step 3: The testing document belonging to the category has the largest summing weight.
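The sketch below is an illustrative assumption, not the authors' implementation; it shows the Boolean document representation, the similarity measure of formula (7), and the weighted k-NN vote used in the second and third proposed methods:

```python
from collections import defaultdict

def similarity(d1, d2):
    # Formula (7): shared terms divided by |d1| * |d2|; documents are sets of terms
    return len(d1 & d2) / (len(d1) * len(d2))

def knn_classify(query_terms, training_docs, k=3):
    # training_docs: list of (set_of_terms, category) pairs
    neighbours = sorted(training_docs,
                        key=lambda dc: similarity(query_terms, dc[0]),
                        reverse=True)[:k]
    scores = defaultdict(float)
    for terms, category in neighbours:
        scores[category] += similarity(query_terms, terms)   # sum the neighbour weights
    return max(scores, key=scores.get)                       # largest summed weight wins
```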
5 Experimental Results In our experiment, we use the Reuters-21578 “top 10 categories” data set [4], [25] shown in Table 2 for dealing with the text categorization.
Table 2. Top 10 categories of the Reuters-21578 data set [4], [25]
Category | Number of training documents | Number of testing documents
Earn | 2877 | 1083
Acq | 1650 | 719
Money-fx | 538 | 179
Grain | 433 | 149
Crude | 389 | 189
Trade | 368 | 117
Interest | 347 | 131
Ship | 197 | 89
Wheat | 212 | 71
Corn | 181 | 56
Total | 7769 | 3019
We have implemented the proposed method using MATLAB version 6.5 on a Pentium 4 PC. We use the microaveraged F1 [2] for evaluating the performance of the proposed methods. Precision and Recall are defined as follows [2]:
Precision = (number of documents retrieved that are relevant) / (total number of documents that are retrieved),   (8)
Recall = (number of documents retrieved that are relevant) / (total number of documents that are relevant).   (9)
The relationship between the precision and the recall is characterized by a graph called the precision-recall curve. The F1 measure combines the precision and the recall defined as follows [2]: F1 =
(2 × Precision × Recall) / (Precision + Recall).   (10)
For multiple categories, the precision, the recall, the microaveraged precision, the microaveraged recall, and the microaveraged F1 are calculated based on the global contingency matrix shown in Table 3, where
pi = ai / (ai + ci),   (11)
ri = ai / (ai + bi),   (12)
microaveraged precision = A / (A + C) = Σ_{i=1}^{k} ai / Σ_{i=1}^{k} (ai + ci),   (13)
microaveraged recall = A / (A + B) = Σ_{i=1}^{k} ai / Σ_{i=1}^{k} (ai + bi),   (14)
microaveraged F1 = (2 × microaveraged precision × microaveraged recall) / (microaveraged precision + microaveraged recall).   (15)
Table 3. The contingency matrix for a set of categories [2], C = {c1, c2, ..., c|c|}
 | Predicted "YES" | Predicted "NO"
Actual class "YES" | A = Σ_{i=1}^{|c|} ai | B = Σ_{i=1}^{|c|} bi
Actual class "NO" | C = Σ_{i=1}^{|c|} ci | D = Σ_{i=1}^{|c|} di
In our experiment, we use the three proposed methods (i.e., the first, the second and the third proposed methods) shown in Table 1 and compare their text categorization performance with the performance of the method presented in [4]. To compare the three proposed methods with Dona's method [4], the microaveraged F1 is used to compare the system performances of the different methods. Table 4 shows the results of the Reuters-21578 top 10 categories text categorization for these methods. The experimental results show that the three proposed methods achieve higher performance than Dona's method [4].
Table 4. A comparison of the performance of categorizing the Reuters-21578 top 10 categories data set for different methods (F1 measure)
Category | Dona's method [4] | The first proposed method | The second proposed method | The third proposed method
Earn | 98.04 | 96.82 | 97.76 | 97.54
Acq | 96.67 | 89.2 | 95.43 | 95.56
Money-fx | 76.54 | 73.13 | 73.5 | 73.12
Grain | 57.47 | 57.41 | 60 | 59.67
Crude | 79.43 | 70.7 | 73.68 | 74.42
Trade | 85.60 | 65.63 | 70.72 | 72.44
Interest | 73.38 | 60.97 | 60.76 | 65.25
Ship | 68.75 | 84.85 | 85.71 | 86.75
Wheat | 48.39 | 39.27 | 44.25 | 47.01
Corn | 44.02 | 42.95 | 38.71 | 37.57
Microaveraged F1 | 74.06 | 81.99 | 84.6 | 84.73
6 Conclusions In this paper, we have presented a new feature selection method based on document frequencies and statistical values. We also have presented a new similarity measure to
calculate the degree of similarity between documents. Based on the proposed feature selection method and the proposed similarity measures between documents, we also have presented three methods to deal with the categorization of the Reuters-21578 top 10 categories data set. The experimental results show that the proposed three methods get higher performance for text categorization than the method presented in [4].
Acknowledgements The authors would like to thank Professor Yuh-Jye Lee for his help during this research. This work was supported in part by the National Science Council, Republic of China, under grant NSC 94-2213-E-011-003.
References 1. Caropreso, M. F., Matwin, S., Sebastiani, F.: A Learner-Independent Evaluation of the Usefulness of Statistical Phrases for Automated Text Categorization. In: A. G. Chin, eds. Text Databases and Document Management: Theory and Practice, Idea Group Publishing, Hershey, PA (2001) 78–102 2. Chakrabarti, S.: Mining the Web. New York: Morgan Kaufmann (2003) 137–144 3. Chua, S., K, N.: Semantic Feature Selection Using WordNet. Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (2004) 4. Doan, S.: An Efficient Feature Selection Using Multi-Criteria in Text Categorization. Proceedings of the IEEE Fourth International Conference on Hybrid Intelligent Systems (2004) 5. Dumais, S. T., Plant, J., Heckerman, D., Sahami, M.: Inductive Learning Algorithms and Representations for Text Categorization. Proceedings of the 7th ACM International Conference on Information and Knowledge Management (1998) 148–155 6. Galavotti, L., Sebastiani, F., Simi, M.: Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization. Proceedings of the 4th European Conference on Research and Advanced Technology for Digital Libraries (2000) 59–68 7. Lam, W., Ho, C. Y.: Using a Generalized Instance Set for Automatic Text Categorization. Proceedings of SIGIR-98 the 21st ACM International Conference on Research and Development in Information Retrieval (1998) 195–202 8. Larkey, L. S., Croft, W. B.: Combining Classifiers in Text Categorization. Proceedings of the 19th ACM International Conference on Research and Development in Information Retrieval (1996) 289–297 9. Larkey, L. S.: Automatic Essay Grading Using Text Categorization Techniques. Proceedings of the 21st ACM International Conference on Research and Development in Information Retrieval (1998) 90–95 10. Lewis, D. D.: An evaluation of phrasal and clustered representations on a text categorization task. Proceedings of the 15th ACM International Conference on Research and Development in Information Retrieval (1992) 37–50 11. Lewis, D. D.: Representation and Learning in Information Retrieval. Ph.D. Dissertation, Department of Computer Science, University of Massachusetts, Amherst, MA (1992) 12. Lewis, D. D.: and Ringuette, M., A Comparison of Two Learning Algorithms for Text Categorization. Proceedings of the 3rd Annual Symposium on Document Analysis and Information Retrieval (1994) 81–93
13. Li, Y. H., Jain, A. K.: Classification of Text Documents, Computer Journal Vol. 41, No. 8 (1998) 537–546 14. Li, H., Yamanishi, K.: Text Classification Using ESC-Based Stochastic Decision Lists. Proceedings of the 8th ACM International Conference on Information and Knowledge Management (1999) 122–130 15. Mladenic, D.: Feature Subset Selection in Text Learning. Proceedings of the 10th European Conference on Machine Learning (1998) 95–100 16. Ng, H. T., Goh, W. B., Low, K. L.: Feature Selection, Perceptron Learning, and a Usability Case Study for Text Categorization. Proceedings of the 20th ACM International Conference on Research and Development in Information Retrieval (1997) 67–73 17. Porter, M. F.: An Algorithm for Suffic Stripping Program. Vol. 14, No. 3 (1980) 130–137 18. Salton, G., Wong, A., Yang, C.: A Vector Space Model for Automatic Indexing. Communications of the ACM, Vol. 18, No. 11 (1975) 613–620 19. Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Computing Survey, Vol. 34, No. 1 (2002) 1–47 20. Sebastiani, F., Sperduti, A., Valdambrini, N.: An Improved Boosting Algorithm and its Application to Automated Text Categorization. Proceedings of the 9th ACM International Conference on Information and Knowledge Management (2000) 78–85 21. Shima, K., Todoriki, M., Suzuki, A.: SVM-Based Feature Selection of Latent Semantic Features. Pattern Recognition Letters 25 (2004) 1051–1057 22. Yang, Y.: An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval Journal, Vol. 1, No. 1–2, (1999) 69–90 23. Yang, Y. Pedersen, J.: A Comparative Study on Feature Selection in Text Categorization. Proceedings of the 14th International Conference on Machine Learning (1997) 412–420 24. Yang, Y., Liu, X.: A Re-examination of Text Categorization Methods. Proceedings of the SIGIR-99 22nd ACM International Conference on Research and Development in Information Retrieval, Berkeley, CA (1999) 42–49 25. Reuter-21578 Apte Split Data Set, http://kdd.ics.uci.edu/data-bases/reuter21578/ reuter221578.html
Using Positive Region to Reduce the Computational Complexity of Discernibility Matrix Method Feng Honghai1,2, Zhao Shuo1, Liu Baoyan3, He LiYun3, Yang Bingru2, and Li Yueli1 1
Hebei Agricultural University, 071001 Baoding, China
[email protected] 2 University of Science and Technology Beijing, 100083 Beijing, China 3 China Academy of Traditional Chinese Medicine, 100700 Beijing, China
Abstract. Rough set discernibility matrix method is a valid method to attribute reduction. However, it is a NP-hard problem. Up until now, though some methods have been proposed to improve this problem, the case is not improved well. We find that the idea of discernibility matrix can be used to not only the whole data but also partial data. So we present a new algorithm to reduce the computational complexity. Firstly, select a condition attribute C that holds the largest measure of γ(C, D) in which the decision attribute D depends on C. Secondly, with the examples in the non-positive region, build a discernibility matrix to create attribute reduction. Thirdly, combine the attributes generated in the above two steps into the attribute reduction set. Additionally, we give a proof of the rationality of our method. The larger the positive region is; the more the complexity is reduced. Four Experimental results indicate that the computational complexity is reduced by 67%, 83%, 41%, and 30% respectively and the reduced attribute sets are the same as the standard discernibility matrix method.
1 Introduction The rough set theory introduced by Pawlak [1] provides a systematic framework to study the problems arising from imprecise and insufficient knowledge. In a real world information system there may be hundreds of attributes, many of which may be irrelevant to the decision-making. So, knowledge reduction is one of the most important problems in rough set theory. Because an information system may usually have more than one reduction, we always hope to obtain the set of the most concise rules. Unfortunately, it has been shown that finding the minimal reduct of an information system is a NP-hard problem [2]. Therefore, heuristic methods that explore a reduced search space are commonly used for attribute reduction. In these kinds of methods, the measure of significance of every attribute is analyzed, and regard it as a heuristic information in order to decrease search space. WEI has given uncertainty measures in probabilistic rough set using fuzzy entropy of the fuzzy set [3]. Some authors have introduced information-theoretic measures of uncertainty for rough sets [4]. In addition, some authors apply genetic algorithm to find the suboptimal set [5] and use approximation techniques [6] to obtain relative reduct [7], finding relative core and relative knowledge reduction. M. Ali and R. Dapoigny (Eds.): IEA/AIE 2006, LNAI 4031, pp. 1290 – 1298, 2006. © Springer-Verlag Berlin Heidelberg 2006
Using Positive Region to Reduce the Computational Complexity
1291
2 Concepts of Rough Set Suppose given two finite, non-empty sets U and A, where U is the universe, and A – a set of attributes. With every attribute a∈A we associate a set Va, of its values, called the domain of a. Any subset B of A determines a binary relation I (B) on U, which will be called an indiscernibility relation, and is defined as follows: xI(B)y if and only if a(x) = a(y) for every a∈A, where a(x) denotes the value of attribute a for element x. Obviously I(B) is an equivalence relation. The family of all equivalence classes of I(B), i.e., partition determined by B, will be denoted by U/IND(B); an equivalence class of IND(B), i.e., block of the partition U/ IND(B), containing x will be denoted by B(x). If (x, y) belongs to IND(B) we will say that x and y are B-indiscernible. Equivalence classes of the relation IND(B) (or blocks of the partition U/IND(B)) are referred to as B-elementary sets. In the rough set approach the elementary sets are the basic building blocks (concepts) of our knowledge about reality. The indiscernibility relation will be used next to define approximations, basic concepts of rough set theory. Approximations can be defined as follows:
B_*(X) = {x ∈ U : B(x) ⊆ X},  B^*(X) = {x ∈ U : B(x) ∩ X ≠ ∅}, assigning to every subset X of the universe U two sets B_*(X) and B^*(X), called the B-lower and the B-upper approximation of X, respectively. The set
BN_B(X) = B^*(X) − B_*(X) will be referred to as the B-boundary region of X. Sometimes we distinguish in an information table two classes of attributes, called condition attributes, denoted by C, and decision attributes, denoted by D. The coefficient γ(C, D) expresses the ratio of all elements of the universe that can be properly classified to blocks of the partition U/D employing attributes C.
γ(C, D) = |POS_C(D)| / |U|,  where  POS_C(D) = ∪_{X ∈ U/I(D)} C_*(X).
The expression POS_C(D), called the positive region of the partition U/IND(D) with respect to C, is the set of all elements of U that can be uniquely classified to blocks of the partition U/IND(D) by means of C. A discernibility matrix of B ⊆ A, denoted M(B), is an n × n matrix defined as
c_ij = {a ∈ B : a(x_i) ≠ a(x_j)}  for i, j = 1, 2, ..., n.
Let us assign to each attribute a ∈ B a binary Boolean variable, and let Σδ(x, y) denote the Boolean sum of all Boolean variables assigned to the set of attributes δ(x, y). Then the discernibility function can be defined by the formula
f(B) = Π_{(x,y) ∈ U²} { Σδ(x, y) : (x, y) ∈ U² and δ(x, y) ≠ ∅ }.
The following property establishes the relationship between disjunctive normal form of the function f(B) and the set of all reducts of B. All constituents in the minimal disjunctive normal form of the function f(B) are all reducts of B. In order to compute the value core and value reducts for x we can also use the discernibility matrix as defined before and the discernibility function, which must be slightly modified:
f_x(B) = Π_{y ∈ U} { Σδ(x, y) : y ∈ U and δ(x, y) ≠ ∅ }.
The D-core is the set of all single element entries of the discernibility matrix MD(C), i.e.
CORED (C ) = {a ∈ C : cij = (a ), for some i, j} .
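The following sketch is not from the paper and its tabular data layout is an assumption; it illustrates the dependency degree γ(C, D) and the decision-relative discernibility matrix from which the D-core is read off:

```python
from collections import defaultdict

def partition(rows, attrs):
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def gamma(rows, cond_attrs, dec_attr):
    # |POS_C(D)| / |U|: count objects whose C-elementary set lies in one decision class
    pos = sum(len(b) for b in partition(rows, cond_attrs)
              if len({rows[i][dec_attr] for i in b}) == 1)
    return pos / len(rows)

def discernibility_matrix(rows, cond_attrs, dec_attr):
    entries = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if rows[i][dec_attr] != rows[j][dec_attr]:       # decision-relative matrix
                diff = {a for a in cond_attrs if rows[i][a] != rows[j][a]}
                if diff:
                    entries.append(diff)
    return entries

def d_core(entries):
    return {next(iter(e)) for e in entries if len(e) == 1}   # single-element entries

rows = [["a", "x", 1], ["a", "y", 1], ["b", "y", 0]]         # columns 0,1 condition, 2 decision
print(gamma(rows, [0, 1], 2), d_core(discernibility_matrix(rows, [0, 1], 2)))
```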
3 Algorithm (1) For (C =0; C 0 for 1≤ i ≤ r, σ j = 0 for j ≥ r+1. Matrices U and V contain left and right singular vectors of A, respectively, and diagonal matrix, Σ, contains singular values of A. If only the k largest singular values of Σ are kept along with their corresponding columns in the matrices U and V, we obtain new matrices Uk ,Vk with reduced dimension. A rank-k approximation to A is constructed with the following formula:
A ≈ A_k = U_k Σ_k V_k^T.
Ak is the unique matrix of rank k that is closest
in the least squares sense to A.
2.2 Diagnostic Text Classification (DTC)
The DTC component has the following two major computational steps: formulating a query document vector based on the input problem description, and determining the similarity between the query document and the diagnostic categories. An input problem description, d, is first transformed into a pseudo-document within the reduced term-category space. Namely, d is transformed to q, which is a term vector with the same length M as T_L, and q = (q1, ..., qM)^T, where qi is the frequency with which the ith term on T_L occurs in the query document d, i = 1, ..., M. The decision on the diagnostic category of q is made based on the similarity between q and each column vector of matrix A. Let the column vectors of A be a_j = (a_1j, ..., a_Mj)^T, j = 1, ..., N. We have investigated three different similarity measures between the query vector q and the TCF matrix A. The first similarity function of the query vector and a diagnostic category is the cosine measure [7], denoted as follows:
r_j^c(q, a_j) = (q · a_j) / (|q| · |a_j|) = ( Σ_{i=1}^{M} q_i a_ij ) / ( sqrt(Σ_{i=1}^{M} q_i²) · sqrt(Σ_{i=1}^{M} a_ij²) )
Pearson's correlation coefficient is another measure of the extent to which two vectors are related [8]. The formula for Pearson's correlation coefficient takes on many forms; a commonly used formula is shown below:
r_j^p(q, a_j) = ( M Σ_{i=1}^{M} q_i a_ij − Σ_{i=1}^{M} q_i · Σ_{i=1}^{M} a_ij ) / sqrt( [ M Σ_{i=1}^{M} q_i² − (Σ_{i=1}^{M} q_i)² ] · [ M Σ_{i=1}^{M} a_ij² − (Σ_{i=1}^{M} a_ij)² ] )
The third measure we investigated in our system is the Spearman rank correlation coefficient. It provides a measure of how closely two sets of rankings agree with each other. The Spearman rank correlation coefficient is a special case of the Pearson correlation coefficient in which the data are converted to ranks before calculating the coefficient. It can be used when the distribution of the data makes the latter undesirable or misleading [9]. The formula for Spearman's rank correlation coefficient is:
r_j^s = 1 − ( 6 Σ_{i=1}^{M} d_i² ) / ( M (M² − 1) )
d = (d1 ,..., d M )T is the difference vector between ranks of corresponding values of q and a j .
where:
3 Experiment Results The proposed DKE&TC system has been fully implemented. The LFT program used a training data set of 200,000 documents that describe 54 different categories of auto problems to construct a TCW matrix A with dimensions of 3972x54. Based on the practical consideration, for each input query document d, the DKE&TC finds the three diagnostic categories that best match d. The performance measure is the
percentage of the queries in the test set whose true categories are contained in the matched categories returned by DKE&TC. The performances of DKE&TC system and the SVD system on the test set of 6000 queries are shown in Figure 2 (a). The retrieval performance of SVD is heavily affected by its value of rank K. We explored the effects of varied K values used in SVD and compared it with the DKE&TC’s performance. It appears that the best performance was achieved with K equal to 14. However the DKE&TE system outperformed the best SVD system by more than 11%.
Fig. 2. Performance evaluation. (a) Performances of the proposed DKE&TC system vs SVD with various K values. (b) Comparing DKE&TE with various weight schemes and similarity functions.
We also compared the performance of the DKE&TE system with other weighting schemes commonly used in text document categorization and the results are shown in Figure 2(b) along with the performances of the DKE&TC systems that used three different similarity functions, cosine, Pearson, and Spearman. The definition of these weight schemes are defined in Table 1 in which, tfij is the frequency of term i occurred in document category j, dfi is the total number of document categories in the training data that contain term i, gfi is global frequency at which term i occurs in the entire training data, ndocs is the total number of document categories in the entire collection. In addition, we define:
pij = tfij / gfi,  and  Bij = 0 if tfij = 0, 1 if tfij > 0.
Figure 3 shows that the DKE&TC systems used similarity functions of Pearson and cosine gave the best performances. Table 2 shows three examples of document queries and the diagnostic categories classified by the DKE&TC system. The first column lists the three examples of input queries and the right column lists three top matched diagnostic categories to each input query. The correct categories are highlighted.
Table 1. Definitions of various weight schemes
entropy: tfij · ( 1 − Σ_j [ pij · log(pij) / log(ndocs) ] )
Gfidf: tfij · ( gfi / dfi )
normal: tfij · 1 / sqrt( Σ_j tfij² )
B-idf: Bij · ( log[ ndocs / dfi ] + 1 )
B-normal: Bij · 1 / sqrt( Σ_j tfij² )
Bg-idf: log(tfij + 1) · ( log[ ndocs / dfi ] + 1 )
log-entropy: log(tfij + 1) · ( 1 − Σ_j [ pij · log(pij) / log(ndocs) ] )
log-Gfidf: log(tfij + 1) · ( gfi / dfi )
log-norm: log(tfij + 1) · 1 / sqrt( Σ_j tfij² )
Table 2. Examples of matched diagnostic categories classified by the DKE&TC
Input query 1: CUST STATES VEHICLE OVERHEAT NO LOSS OF COOLANT BUT HAD BURNING COOLANT SMELL.
Top matched categories: C1: COOLANT LEAK; C2: ENGINE OVERHEATS/RADIATOR TROUBLES; C3: UNUSUAL EXHAUST SYSTEM ODOR
Input query 2: CUSTOMER STATES CHECK FOR OIL LEAK BETWEEN TRANS. AND ENGINE
Top matched categories: C1: ENGINE LEAKS OIL; C2: UNDETERMINED ENGINE LEAK; C3: TRANSMISSION/CLUTCH FLUID LEAKS
Input query 3: SAYS GETTING SQUEAK NOISE FROM BELT OR BEARING TYPE
Top matched categories: C1: ENGINE BELT BREAKING/SLIPPING/SQUEALING; C2: ENGINE BELT SLIPPING/SQUEALING; C3: ENGINE BELT OFF/FRAYED/COMING APART/BROKEN
4 Conclusion
In this paper, we presented a text document mining system, DKE&TC, developed for the automotive engineering diagnostic application. We presented our research results in a number of important areas, including various weighting schemes and similarity functions. Our experimental results show that the proposed system, DKE&TC, outperformed the well-known LSA model and many other weight schemes. We also compared the performance of the DKE&TC system with diagnostic engineers on a small data set of 100 queries. The results showed that the DKE&TC system outperformed its human counterpart.
Acknowledgment

This work is supported in part by a CEEP grant from the College of Engineering and Computer Science at the University of Michigan-Dearborn.
Handling Incomplete Categorical Data for Supervised Learning* Been-Chian Chien1, Cheng-Feng Lu2, and Steen J. Hsu3 1
Department of Computer Science and Information Engineering, National University of Tainan, 33, Sec. 2, Su-Lin St., Tainan 70005, Taiwan, R.O.C.
[email protected] 2 Department of Information Engineering, I-Shou University, Kaohsiung 840, Taiwan, R.O.C.
[email protected] 3 Department of Information Management, Ming Hsin University of Science and Technology, 1 Hsin-Hsing Road, Hsin-Fong, Hsin-Chu, Taiwan 304, R.O.C.
[email protected] Abstract. Classification is an important research topic in knowledge discovery. Most of the researches on classification concern that a complete dataset is given as a training dataset and the test data contain all values of attributes without missing. Unfortunately, incomplete data usually exist in real-world applications. In this paper, we propose new handling schemes of learning classification models from incomplete categorical data. Three methods based on rough set theory are developed and discussed for handling incomplete training data. The experiments were made and the results were compared with previous methods making use of a few famous classification models to evaluate the performance of the proposed handling schemes.
1 Introduction

The classification problem is a task of supervised learning that consists of two phases. The learning phase learns a classification model from a set of training data with a predefined class for each datum. The classification phase classifies unknown cases into one of the predefined classes using the learned classification model. Typically, the training data are assumed to be collected completely, without loss or error. Reliable classification models can thus be learned from the training data and represented in the form of classification rules, decision trees, or mathematical functions. However, incomplete data usually occur in real-world applications. An incomplete dataset is a dataset that contains at least one missing value, as in Table 1. The symbols ‘?’ in cases x2 and x3 denote missing values. Incomplete data may be produced, for example, by a broken machine or by values mistakenly erased by a person. Generally, incomplete training data will degrade the learning quality of classification models. *
This work was supported in part by the National Science Council of Taiwan, R. O. C., under contract NSC94-2213-E-024-004.
Table 1. A simple incomplete dataset

No.  outlook   temperature  humidity  play
x1   sunny     L            H         no
x2   ?         H            H         no
x3   overcast  M            ?         yes
x4   rainy     M            H         yes
In this paper, we investigate the problem of handling incomplete categorical data contained in training datasets for supervised learning. Since the incomplete samples do not provide perfect information for training process, most of traditional supervised learning methods can not deal with incomplete data directly but generate inaccurate classifiers from an incomplete dataset. If the incomplete training data can be tackled well, an effective classification model can be learned. Many methods on dealing with missing values for classification had been proposed in the past decades. The previous researches on handling incomplete data can be generally divided into two strategies: (1) Ignore the data with missing values inside the dataset. [7][10][13][18][21]. (2) Fill the missing value by an appropriate alternative. [5][8][12][14][17][23]. To learn an effective classification model from incomplete categorical data, we try to overcome the problem of missing values in attributes based on rough set theory. During the learning phase of classification, three methods are proposed to handle the problem of incomplete categorical data based on rough membership functions. The proposed methods transform categorical data into numerical form and replace the missing values with different granular processes based on rough membership. To evaluate the performance of the proposed handling methods, a few famous classification models, such as GPIZ, SVM, NB and NBTree, were selected to test and compare with C4.5 which handles missing values using information gain. The tested datasets include real and artificial incomplete datasets generated from UCI Machine Learning repository [1]. The experimental results demonstrate that the proposed methods based on rough membership can perform various effects on different classification models. The remainder of this paper is organized as follows: Section 2 reviews the rough set theory and the related researches of incomplete data. In Section 3, we present the proposed scheme for handling incomplete data. Section 4 describes the experiments and demonstrates the performance of proposed methods combining different classification models. Finally, conclusions and future work are presented.
2 Related Work

2.1 Rough Sets Theory

The rough set theory, proposed by Pawlak in 1982 [19][20], is an extension of set theory and is well suited to dealing with incompleteness and uncertainty. Its main advantages are that it does not need any additional information about the data and that it provides a powerful foundation for revealing important structures in data.
Many studies have broadly and successfully applied rough set theory to discover knowledge from incomplete databases [8][10][18][23]. The idea of rough sets is based on the establishment of equivalence classes on a given dataset U, called a universe, and supports two approximations, called the lower approximation and the upper approximation. We give the definitions of rough set theory as follows. Let S = (U, T) denote an information table, where U is a non-empty finite set of objects and T = A ∪ C, where A is the set of condition attributes and C is the decision attribute. We then define an equivalence relation RA(B) on S as RA(B) = {(x, y) | x, y ∈ U, ∀Aj ∈ B, Aj(x) = Aj(y)}. We say that objects x and y are indiscernible if the equivalence relation RA(B) is satisfied on the set U, for all Aj ∈ A and each B ⊆ A. Let [x]_B denote an equivalence class of S on RA(B) and [U]_B denote the set of all equivalence classes [x]_B for x ∈ U. That is, [x]_B = {y | x RA(B) y, ∀x, y ∈ U} and [U]_B = {[x]_B | x ∈ U}. Then, the lower and upper approximations of a concept X with respect to B, denoted B∗(X) and B*(X), are defined as B∗(X) = {x | x ∈ U, [x]_B ⊆ X} and B*(X) = {x | x ∈ U and [x]_B ∩ X ≠ ∅}, respectively. For a given concept X ⊆ U, the rough membership function of X on the set of attributes B is defined as

    μ_B^X(x) = |[x]_B ∩ X| / |[x]_B| ,    (1)
where |[x]B∩X| denotes the cardinality of the set [x]B∩X. The rough membership value μ BX (x) can be interpreted as the conditional probability that an object x belongs to X, given that the object belongs to [x]B, where μ BX (x) ∈ [0, 1]. 2.2 Review of Incomplete Data Handling The techniques of supervised learning on incomplete datasets are more difficult than on complete datasets. Designing an effective learning algorithm being able to deal with incomplete datasets is a challenge to researchers. Many previous researches generally were motivated from two strategies: ignoring and repairing. The first strategy is to ignore the whole record with unknown values or just ignore the missing attribute values. The related researches are enumerated as follows: (1) Ignore samples or attributes [7]. (2) Ignore the missing attribute values [10][13][18][21][24]. The second strategy is to use the mechanism of reparation to transforms an incomplete dataset into a complete dataset. The related techniques on handling incomplete categorical data are presented as follows: (1) Fill in the missing value manual. (2) Concept most common attribute value [17]. (3) Assign all possible values of the attribute [8][9]. (4) Use rough set approach to repair missing values [14][15][23].
Preprocessing incomplete data is important and necessary for learning a high-accuracy classification model.
3 Handling Incomplete Data Based on Rough Membership

In this section, we introduce three handling methods for incomplete data based on rough membership. The first method is called fill in another value (FIAV). The second and third methods are fill in all possible values with class (FIAP-class) and fill in all possible values with minimum (FIAP-min), respectively. All three methods first employ a transformation function based on the rough membership of equation (1) to transform categorical data into numerical data. The purpose of the transformation is to turn each categorical attribute into a set of numerical attributes. The transformation function is described as follows. Given a dataset U with n conditional attributes A1, A2, ..., An and a decision attribute C, let Di be the domain of Ai, for 1 ≤ i ≤ n, and C = {C1, C2, ..., CK}, where K is the number of predefined classes. For each object xj ∈ U, xj = (vj1, vj2, ..., vjn), where vji ∈ Di stands for the value of object xj on attribute Ai. The idea of the transformation is to transform the original categorical attribute Ai into K numerical attributes Ãi. Let Ãi = (Ai1, Ai2, ..., AiK), where the domains of Ai1, Ai2, ..., AiK are in [0, 1]. The transformation is based on the rough membership function of equation (1). For an object xj ∈ U, xj = (vj1, vj2, ..., vjn), vji ∈ Di, the value vji is transformed into (wjk, wj(k+1), ..., wj(k+K−1)), wjk ∈ [0, 1], with wjk = μ_{Ai}^{C1}(xj), wj(k+1) = μ_{Ai}^{C2}(xj), ..., wj(k+K−1) = μ_{Ai}^{CK}(xj), where

    μ_{Ai}^{Ck}(xj) = |[xj]_{Ai} ∩ [xj]_{Ck}| / |[xj]_{Ai}| ,  if vji ∈ [xj]_{Ai} and xj ∈ U.    (2)
After the transformation, the new dataset U′ with attributes à = {Ã1, Ã2, ... , Ãn} is obtained and an object xj ∈ U, xj = (vj1, vj2, ... , vjn) will be transformed into yj, yj = (wj1, wj2, ... , wjn′), where n′ = q × K, q is the number of categorical attributes. The proposed handling methods are presented in detailed as follows. FIAV This method is described as its name; the missing values are filled by an alternate value. For example, we can use ‘unknown’ value to replace the missing values. Then, the values in categorical attributes are transformed into numerical values. We thus ignore what real values should be in the missing fields and shift to concern about the distribution of ‘unknown’ value on the decision attribute. We give an example to illustrate this method using the incomplete dataset as shown in Table 1. Example 1: In Table 1, there are of three categorical attributes: outlook, temperature and humidity and the decision attribute is play. The FIAV method uses the value ‘unknown’ to replace the symbol ‘?’ as a new attribute value. The symbolic categories in the three attributes become: outlook: {sunny, overcast, rainy, unknown}, temperature: {L, M, H} and humidity: {L, H, unknown}. The number of numerical
attributes n′ = 3 × 2 = 6. Based on the above equivalence classes, the values of case x1 after transformation are as follows: w11 = μ_{A1}^{C1}(x1) = 0, w12 = μ_{A1}^{C2}(x1) = 1, w13 = μ_{A2}^{C1}(x1) = 0, w14 = μ_{A2}^{C2}(x1) = 1, w15 = μ_{A3}^{C1}(x1) = 0.33, w16 = μ_{A3}^{C2}(x1) = 0.67. So, y1 = (w11, w12, w13, w14, w15, w16) = (0, 1, 0, 1, 0.33, 0.67). The new dataset U′ is generated and shown as Table 2.

Table 2. The results of the transformation of Table 1 using FIAV

No.  outlook (wj1, wj2)  temperature (wj3, wj4)  humidity (wj5, wj6)  play Ci
y1   0, 1                0, 1                    0.33, 0.67           no
y2   0, 1                0, 1                    0.33, 0.67           no
y3   1, 0                1, 0                    1, 0                 yes
y4   1, 0                1, 0                    0.33, 0.67           yes
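The FIAV transformation can be written down directly. The sketch below recomputes Table 2 from the dataset of Table 1 using the rough membership of equation (2), with the missing symbol '?' replaced by the extra value 'unknown'. The class order (C1 = yes, C2 = no) follows Example 1; everything else is a straightforward rendering of the definition rather than the authors' actual implementation.

```python
# Table 1 with missing values already replaced by 'unknown' (FIAV).
data = [
    ("x1", {"outlook": "sunny",    "temperature": "L", "humidity": "H"},       "no"),
    ("x2", {"outlook": "unknown",  "temperature": "H", "humidity": "H"},       "no"),
    ("x3", {"outlook": "overcast", "temperature": "M", "humidity": "unknown"}, "yes"),
    ("x4", {"outlook": "rainy",    "temperature": "M", "humidity": "H"},       "yes"),
]
classes = ["yes", "no"]    # C1 = yes, C2 = no, as in Example 1

def rough_membership(attr, value, cls):
    """mu_{Ai}^{Ck}(x): fraction of objects sharing this attribute value that belong to cls."""
    block = [c for _, row, c in data if row[attr] == value]
    return block.count(cls) / len(block) if block else 0.0

for name, row, cls in data:
    transformed = []
    for attr in ("outlook", "temperature", "humidity"):
        transformed += [round(rough_membership(attr, row[attr], c), 2) for c in classes]
    print(name, transformed, cls)
# x1 prints [0.0, 1.0, 0.0, 1.0, 0.33, 0.67] with class 'no', matching row y1 of Table 2.
```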
FIAP-class. This method assumes that a missing value can be any possible value of the attribute. Hence, all possible rough membership values are calculated by replacing the missing value with every possible value of that attribute. After all possible rough membership values are derived, the set of values with the maximum of μ_{Ai}^{Ck}(xj) is picked as the alternate of the missing value for object xj on attribute Ai, where Ck is the class of object xj. We give Example 2 to illustrate this method in more detail.
Example 2: In Table 1, the missing values are first replaced by all possible values according to their attributes, e.g. the symbolic categories in the three attributes are: outlook: {sunny, overcast, rainy}, temperature: {L, M, H} and humidity: {L, H}. The object x2 in attribute outlook is replaced by sunny, overcast and rainy. The missing value in the object x2 will have three sets of possible transformation values corresponding to sunny, overcast and rainy in attribute outlook, as follows:
If outlook = sunny in case x2, the values after transformation are w21 = μ_{A1}^{C1}(x2) = 0, w22 = μ_{A1}^{C2}(x2) = 1.
If outlook = overcast in case x2, the values after transformation are w21 = μ_{A1}^{C1}(x2) = 0.5, w22 = μ_{A1}^{C2}(x2) = 0.5.
2
1
If outlook = rainy in case x2, the values after transformation are w21 = μ AC ( x2 ) = 0.5, w22 = μ AC ( x2 ) = 0.5. 1
2
1
Since the class label of the object x2 is C2, i.e. play = no, we select the set of rough membership values with the maximum μ_{A1}^{C2}(x2) among the three sets (i.e. w21 = 0 and w22 = 1) to be the alternate of the missing value on the attribute outlook. The other missing values can also be produced by the same procedure and are shown in Table 3.
Table 3. The results of the transformation of Table 1 using FIAP-class

No.  outlook (wj1, wj2)  temperature (wj3, wj4)  humidity (wj5, wj6)  play Ci
y1   0, 1                0, 1                    0.33, 0.67           no
y2   0, 1                0, 1                    0.33, 0.67           no
y3   1, 0                1, 0                    0.50, 0.50           yes
y4   1, 0                1, 0                    0.33, 0.67           yes
FIAP-min. This method is similar to the FIAP-class method except for the selected rough membership values. FIAP-min selects the minimum of μ_{Ai}^{Ck}(xj) over the possible values, for the same class, instead of the maximum used in the FIAP-class method. We define the minimum value λjk as follows:

    λjk = arg min_{vji ∈ Di} { μ_{Ai=vji}^{Ck}(xj) }.    (3)
Then we use λjk to replace the missing value wjk. We give Example 3 to show the transformation of FIAP-min on Table 1.

Example 3: As in the example of the missing value of object x2 in Table 1, we first obtain the three sets of transformation values corresponding to sunny, overcast and rainy in the attribute outlook, as in Example 2. Then, we have λ21 = min{0, 0.5, 0.5} = 0 and λ22 = min{1, 0.5, 0.5} = 0.5. Therefore, we replace w21 with 0 and w22 with 0.5. The same procedure is applied to the other missing values in Table 1, and the result is shown in Table 4.

Table 4. The results of the transformation of Table 1 using FIAP-min

No.  outlook (wj1, wj2)  temperature (wj3, wj4)  humidity (wj5, wj6)  play Ci
y1   0, 1                0, 1                    0.33, 0.67           no
y2   0, 0.5              0, 1                    0.33, 0.67           no
y3   1, 0                1, 0                    0.50, 0              yes
y4   1, 0                1, 0                    0.33, 0.67           yes
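For the missing outlook value of x2, the FIAP-class and FIAP-min choices can be reproduced with a few lines. The candidate membership pairs (0, 1), (0.5, 0.5) and (0.5, 0.5) are the ones computed in Examples 2 and 3; the rest is a direct reading of the two selection rules.

```python
# Candidate rough membership vectors (w21, w22) obtained by trying every
# possible outlook value for x2, as in Example 2.
candidates = {"sunny": (0.0, 1.0), "overcast": (0.5, 0.5), "rainy": (0.5, 0.5)}
class_index_of_x2 = 1          # x2 belongs to C2 (play = no)

# FIAP-class: keep the candidate whose membership in x2's own class is maximal.
fiap_class = max(candidates.values(), key=lambda w: w[class_index_of_x2])

# FIAP-min: take, per class, the minimum membership over all candidates (equation (3)).
fiap_min = tuple(min(w[k] for w in candidates.values()) for k in range(2))

print("FIAP-class:", fiap_class)   # (0.0, 1.0)  -> row y2 of Table 3
print("FIAP-min:  ", fiap_min)     # (0.0, 0.5)  -> row y2 of Table 4
```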
4 Experiments and Comparison

The experiments were conducted on a PC with a 3.4 GHz CPU and 256 MB of RAM. To understand the effect of the proposed methods for supervised learning on incomplete data, we artificially generated incomplete datasets from datasets selected from the UCI Machine Learning repository [1]. The related information of the selected datasets is summarized in Table 5. The selected datasets were modified to generate the incomplete datasets by randomly selecting a specified percentage of cases and setting the chosen values to null. However, the generation of missing values must still follow the following constraints: C1: Each original case retains at least one attribute value. C2: Each attribute has at least one value left.
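The exact generator used in the paper is not specified, so the sketch below is only one plausible reading of this procedure: cells are blanked at random until the requested missing rate is reached, while rejecting any choice that would empty a whole case (C1) or a whole attribute column (C2).

```python
import random

def make_incomplete(rows, missing_rate, seed=0):
    """Blank a fraction of attribute cells while respecting constraints C1 and C2."""
    rng = random.Random(seed)
    rows = [list(r) for r in rows]                  # copy; class labels are not included here
    n, m = len(rows), len(rows[0])
    target = int(missing_rate * n * m)
    cells = [(i, j) for i in range(n) for j in range(m)]
    rng.shuffle(cells)
    removed = 0
    for i, j in cells:
        if removed >= target:
            break
        row_left = sum(1 for v in rows[i] if v is not None)
        col_left = sum(1 for r in rows if r[j] is not None)
        if row_left > 1 and col_left > 1:           # C1 and C2: keep at least one value each
            rows[i][j] = None                       # None plays the role of '?'
            removed += 1
    return rows

example = [["sunny", "L", "H"], ["sunny", "H", "H"], ["overcast", "M", "H"], ["rainy", "M", "H"]]
print(make_incomplete(example, 0.2))
```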
Table 5. The selected complete datasets

Datasets      Categorical attributes  Numerical attributes  Number of objects  Number of classes
led7          7                       0                     3200               10
lymph         18                      0                     148                4
tic-tac-toe   9                       0                     958                2
Handling Incomplete Categorical Data for Supervised Learning
1325
Table 6. The results of classification using incomplete led7 dataset FIAV
Missing rate led7 0% 5% 10% 20% 30%
GPIZ Ave. S.D. 27.8 0.9 27.6 1.3 27.3 1.5 27.9 1.9 28.5 2.1
5% 10% 20% 30%
27.5 27.6 27.3 28.0
0.9 1.6 2.2 2.1
5% 10% 20% 30%
27.9 28.2 28.5 29.3
1.1 1.7 2.1 2.4
5% 10% 20% 30%
27.9 28.0 28.6 29.9
0.6 1.4 1.6 2.6
C4.5 Ave. S.D. 27.1 0.0 27.0 0.3 27.3 0.9 28.0 1.2 29.2 1.4
SVM NB Ave. S.D. Ave. S.D. 26.6 0.0 26.8 0.0 26.4 0.5 27.0 0.4 26.4 0.7 26.5 0.7 26.4 0.9 26.7 1.0 27.2 1.7 26.9 1.5 FIAP-class 26.8 0.2 26.5 0.4 27.3 0.5 26.9 0.7 26.6 0.9 27.3 0.6 27.1 1.3 26.5 1.2 27.1 1.3 27.5 1.4 26.9 2.3 27.5 1.7 FIAP-min 27.1 0.4 26.6 0.5 27.2 0.3 27.3 0.7 26.9 1.1 27.2 0.6 28.0 1.4 26.8 1.5 26.7 1.0 29.6 1.6 27.2 1.9 27.1 2.0 Concept most common attribute value(CMCAV) 26.7 0.5 26.6 0.5 27.0 0.4 26.9 0.7 26.6 0.9 27.0 0.6 27.0 1.4 26.8 1.2 26.8 1.0 27.6 1.1 27.4 1.6 27.2 1.4
NBTree Ave. S.D. 26.8 0.0 26.9 0.5 26.9 0.8 26.5 0.9 26.9 1.7 27.2 27.2 26.6 27.5
0.5 0.6 1.1 2.1
26.9 27.0 26.5 26.9
0.4 0.8 0.9 1.7
26.9 26.8 26.7 26.8
0.4 0.7 1.1 1.4
Table 7. The results of classification using incomplete lymph dataset FIAV
Missing rate lymph 0% 5% 10%
GPIZ Ave. S.D. 17.9 1.3 19.6 2.8 20.3 4.4
5% 10%
16.9 17.6
2.1 3.9
5% 10%
17.3 18.5
3.6 6.2
5% 10%
20.3 21.2
2.5 4.0
C4.5 Ave. S.D. 22.9 0.0 22.5 5.2 27.3 6.4
SVM NB Ave. S.D. Ave. S.D. 17.5 0.0 16.9 0.0 17.9 3.4 16.2 1.0 15.3 3.6 13.8 6.0 FIAP-class 19.5 4.8 13.6 1.2 18.8 0.8 14.5 5.9 14.7 2.1 15.9 5.3 FIAP-min 23.3 6.2 13.7 2.4 16.3 1.5 23.2 7.2 14.7 2.1 14.2 7.1 Concept most common attribute value(CMCAV) 19.6 2.3 17.3 2.4 12.8 2.7 24.3 4.6 18.6 3.8 16.9 8.2
NBTree Ave. S.D. 14.8 0.0 25.4 3.2 17.2 7.7 25.1 26.6
1.6 8.1
24.3 18.5
5.4 9.3
13.4 19.8
2.8 8.1
SVM classification models, and FIAP-class defeats FIAP-min. However, these two methods are not suitable for NB-based classifiers. The CMCAV method obviously is not a good handling method for the GPIZ and the SVM classification models; nevertheless, it provides a good repair for NB-based classifiers. For the four handling methods, the GPIZ, C4.5 and SVM classifiers combining with the FIAP-class will yield good models for supervised learning. The previous CMCAV method performs fair classification accuracy while combining with NB and
NBTree classification models. Generally, the former combinations (FIAP-class + GPIZ, C4.5 or SVM) perform better than the latter (CMCAV + NB or NBTree). Further, the GPIZ method presents a stable classification rate on average when FIAP-class is used as the handling method for incomplete data.
5 Conclusion Supervised learning from incomplete data to learn an effective classification model is more difficult than learning one from complete data. In this paper, we propose three new methods, FIAV, FIAP-class, FIAP-min, based on rough membership to handle the problem of incomplete categorical data. The approach we used is different from previous methods that filled appropriate categorical values for missing data. The novelty of the proposed methods is to transform the categorical data into numerical data and estimate the membership of missing values belonging to each class. Then, the membership values are used to replace the missing values. The experimental results demonstrate that the proposed method FIAP-class has good performance on GPIZ, SVM and C4.5, but is poor on the NB-based classifiers like NB and NBtree. On the other hand, we found that NB and NBtree have good performance on the CMCAV method. We are also interested in the experimental results that the error rates may decrease when missing rates are larger than 10% or 20% in some datasets. We still analyze and try to explain such a situation using more test data in the future. Table 8. The results of classification using incomplete tic-tac-toe dataset FIAV
Missing rate tic-tac-toe 0% 5% 10% 20% 30%
GPIZ Ave. S.D. 5.9 0.7 7.5 2.1 7.4 2.7 10.7 4.4 13.6 6.3
5% 10% 20% 30%
6.9 7.3 9.4 15.2
2.3 2.8 4.6 5.5
5% 10% 20% 30%
7.0 8.2 13.7 17.1
1.2 3.0 3.5 6.7
5% 10% 20% 30%
7.0 7.4 16.5 18.7
1.9 3.2 4.5 9.0
C4.5 Ave. S.D. 2.60 0.0 7.69 1.7 8.08 1.3 10.6 4.1 13.7 5.9
SVM NB Ave. S.D. Ave. S.D. 19.6 0.0 30.4 0.0 19.9 1.4 36.4 1.4 16.7 2.3 35.1 1.7 18.9 4.4 33.9 4.1 21.4 6.3 33.2 7.4 FIAP-class 6.69 1.1 17.0 1.7 36.4 1.3 6.92 1.4 16.2 2.6 35.0 1.8 10.2 3.5 16.2 3.1 34.6 3.5 16.2 6.0 23.2 7.8 34.9 7.9 FIAP-min 8.41 1.5 17.2 1.7 36.0 1.2 8.06 1.7 15.2 2.5 34.8 1.9 10.9 2.7 18.1 3.6 33.6 3.8 15.1 6.3 21.8 7.9 32.5 7.6 Concept most common attribute value 6.89 1.6 16.3 1.2 34.1 1.1 8.42 1.3 15.8 2.6 32.7 2.3 14.5 4.1 22.2 2.6 31.7 4.9 18.9 8.0 27.1 8.9 31.2 8.7
NBTree Ave. S.D. 16.0 0.0 27.7 2.9 26.6 4.0 26.9 4.3 31.0 6.1 28.2 28.3 27.0 31.3
1.9 2.1 5.0 5.3
24.0 24.8 25.7 31.8
3.9 2.8 5.8 3.8
23.3 19.7 24.1 25.0
1.4 2.8 3.8 8.7
References 1. Blake, C., Keogh E., Merz, C. J.: UCI repository of machine learning database. http://www.ics.uci.edu/~mlearn/MLRepository.html, Irvine, University of California, Department of Information and Computer Science (1998) 2. Chang, C. C., Lin, C. J.: LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm 3. Chien, B. C., Lin, J. Y., Yang, W. P.: Learning effective classifiers with z-value measure based on genetic programming. Pattern Recognition, 37 (2004) 1957-1972 4. Chien, B. C., Yang, J. H., Lin, W. Y.: Generating effective classifiers with supervised learning of genetic programming. Proceedings of the 5th International Conference on Data Warehousing and Knowledge Discovery (2003) 192-201 5. Dempster, P., Laird, N. M., Rubin, D. B.: Maximum-likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B39 (1977) 1-38 6. Duda, R. O., Hart, P. E.: Pattern Classification and Scene Analysis. New York: Wiley, John and Sons Incorporated Publishers (1973) 7. Friedman, J. H.: A recursive partitioning decision rule for non-parametric classification. IEEE Transactions on Computer Science (1977) 404-408 8. Grzymala-Busse, J. W.: On the unknown attribute values in learning from examples. Proceedings of the ISMIS-91, 6th International Symposium on Methodologies for Intelligent Systems, Lecture Notes in Artificial Intelligence, Vol. 542, Springer-Verlag, Berlin Heidelberg New York (1991) 368-377 9. Grzymala-Busse, J. W., Hu, M.: A comparison of several approaches to missing attribute values in data mining. Proceedings of the Second International Conference on Rough Sets and Current Trends in Computing (2000) 378-385 10. Grzymala-Busse, J. W.: Rough set strategies to data with missing attribute values. Proceedings of the Workshop on Foundations and New Directions in Data Mining, associated with the third IEEE International Conference on Data Mining (2003) 56-63 11. Gunn, S. R.: Support vector machines for classification and regression. Technical Report, School of Electronics and Computer Science University of Southampton (1998) 12. Han, J., Kamber, M.: Data Mining: Concept and Techniques. Morgan Kaufmann publishers, (2001) 13. Hathaway, R. J., Bezdek, J. C.: Fuzzy c-means clustering of incomplete data, IEEE Transactions on Systems, Man, and Cybernetics-part B: Cybernetics 31(5) (2001) 14. Hong, T. P., Tseng, L. H., Chien, B. C.: Learning fuzzy rules from incomplete numerical data by rough sets. Proceedings of the 2002 IEEE International Conference on Fuzzy Systems (2002) 1438-1443 15. Hong, T. P., Tseng, L. H., Wang, S.-L.: Learning rules from incomplete training examples by rough sets. Expert Systems with Applications 22 (2002) 285-293 16. Kohavi, R.: Scaling up the accuracy of naïve-bayes classifiers: a decision-tree hybrid. Knowledge Discovery & Data Mining, Cambridge/Menlo Park: AAAI Press/MIT Press Publishers (1996) 202-207 17. Koninenko, I., Bratko, K., Roskar,E.: Experiments in automatic learning of medical diagnostic rules. Technical Report, Jozef Stenfan Institute, Ljubljana (1984) 18. Kryszkiewicz, M.: Rough set approach to incomplete information systems. Information Science 112 (1998) 39-49 19. Pawlak, Z.,: Rough sets. International Journal of Computer and Information Sciences 11 (1982) 341-356
20. Pawlak Z., A. Skowron: Rough membership functions, in: R.R. Yager and M. Fedrizzi and J. Kacprzyk (Eds.), Advances in the Dempster-Shafer Theory of Evidence (1994) 251-271 21. Quinlan, J. R.: C4.5: Programs for Machine Learning. San Mateo, California, Morgan Kaufmann Publishers (1993) 22. Singleton, A.: Genetic Programming with C++. http://www.byte.com/art/9402/sec10/art1.htm, Byte (1994) 171-176 23. Slowinski, R., Stefanowski, J.: Handling various types of uncertainty in the rough set approach. Proceedings of the International Workshop on Rough Sets and Knowledge Discovery (1993) 366-376 24. Stefanowski, J., Tsoukias, A.: On the extension of rough sets under incomplete information. Proceeding of the 7th International Workshop on New Directions in Rough Sets, Data Mining, and Granular-Soft Computing (1999) 73-81 25. Witten, H., Frank, E.: Data Mining: Practical machine learning tools with Java implementations. Morgan Kaufmann, San Francisco (2000)
Mining Multiple-Level Association Rules Under the Maximum Constraint of Multiple Minimum Supports Yeong-Chyi Lee1, Tzung-Pei Hong2,*, and Tien-Chin Wang3 1
Department of Information Engineering, I-Shou University, Kaohsiung, 84008, Taiwan
[email protected] 2 Department of Electrical Engineering, National University of Kaohsiung, Kaohsiung, 811, Taiwan
[email protected] 3 Department of Information Management, I-Shou University, Kaohsiung, 84008, Taiwan
[email protected] Abstract. In this paper, we propose a multiple-level mining algorithm for discovering association rules from a transaction database with multiple supports of items. Items may have different minimum supports and taxonomic relationships, and the maximum-itemset minimum-taxonomy support constraint is adopted in finding large itemsets. That is, the minimum support for an itemset is set as the maximum of the minimum supports of the items contained in the itemset, while the minimum support of the item at a higher taxonomic concept is set as the minimum of the minimum supports of the items belonging to it. Under the constraint, the characteristic of downward-closure is kept, such that the original Apriori algorithm can easily be extended to find large itemsets. The proposed algorithm adopts a top-down progressively deepening approach to derive large itemsets. An example is also given to demonstrate that the proposed mining algorithm can proceed in a simple and effective way.
1 Introduction Finding association rules in transaction databases is most commonly seen in data mining. The mined knowledge about the items tending to be purchased together can be passed to managers as a good reference in planning store layout and market policy. Agrawal and his co-workers proposed several mining algorithms based on the concept of large itemsets to find association rules from transaction data [1-4]. Most of the previous approaches set a single minimum support threshold for all the items [3][5][6][7]. In real applications, different items may have different criteria to judge their importance. The support requirements should thus vary with different items. Moreover, setting the minimum support for mining association rules is a dilemma. If it is set too high, many possible rules may be pruned away; on the other hand, if it is set too low, many uninteresting rules may be generated. Liu et al. thus proposed an approach for mining association rules with non-uniform minimum support values [10]. Their approach allowed users to specify different minimum supports to different *
Corresponding author.
items. Wang et al. proposed a mining approach which allowed the minimum support value of an itemset to be any function of the minimum support values of the items contained in the itemset [11]. In the past, we proposed a simple and efficient algorithm based on the Apriori approach to generate large itemsets under the maximum constraint of multiple minimum supports [12][13]. Furthermore, taxonomic relationships among items often appear in real applications. For example, wheat bread and white bread are two kinds of bread. Bread is thus a higher-level concept than wheat bread or white bread. Meanwhile, the association rule “bread → milk” may be more general to decision makers than the rule “wheat bread → juice milk”. Discovering association rules at different levels may thus provide more information than mining only at a single level [8][9]. In this paper, we thus propose a multiple-level mining algorithm for discovering association rules from a transaction database with multiple minimum supports of items. It is an extension of our previous approach, with taxonomy being considered. Each item is first given a predefined support threshold. The maximum-itemset minimum-taxonomy support constraint is then adopted in finding large itemsets. That is, the minimum support for an itemset is set as the maximum of the minimum supports of the items contained in the itemset, while the minimum support of an item at a higher taxonomic concept is set as the minimum of the minimum supports of the items belonging to it. This is quite consistent with the mathematical concepts of union and intersection: itemsets can be thought of as item intersection in transactions, and higher-level items as item union. The algorithm then adopts a top-down progressively deepening approach to derive large itemsets. The remaining parts of this paper are organized as follows. Some related mining algorithms are reviewed in Section 2. The proposed algorithm for mining multiple-level association rules under the maximum-itemset minimum-taxonomy support constraint of multiple minimum supports is described in Section 3. An example illustrating the proposed algorithm is given in Section 4. Conclusion and discussion are given in Section 5.
2 Review of Related Mining Algorithms Some related researches about mining multiple-level association rules and mining association rules with multiple minimum supports are reviewed in this section. 2.1 Mining Multiple-Level Association Rules Previous studies on data mining focused on finding association rules on the singleconcept level. However, mining multiple-concept-level rules may lead to discovery of more general and important knowledge from data. Relevant data item taxonomies are usually predefined in real-world applications and can be represented using hierarchy trees. Terminal nodes on the trees represent actual items appearing in transactions; internal nodes represent classes or concepts formed by lower-level nodes. A simple example is given in Figure 1.
[Figure 1 shows an example taxonomy rooted at Food: level 1 contains categories such as Milk and Bread, level 2 contains flavors such as Chocolate, Apple, White and Wheat, and level 3 contains brands such as Dairyland, Foremost, OldMills and Wonder.]

Fig. 1. Taxonomy for the example
In Figure 1, the root node is at level 0, the internal nodes representing categories (such as “Milk”) are at level 1, the internal nodes representing flavors (such as “Chocolate”) are at level 2, and the terminal nodes representing brands (such as “Foremost”) are at level 3. Only terminal nodes appear in transactions. Han and Fu proposed a method for finding association rules at multiple levels [8]. Nodes in predefined taxonomies are first encoded by sequences of numbers and the symbol "*" according to their positions in the hierarchy tree. For example, the internal node "Milk" in Figure 2 will be represented as 1**, the internal node "Chocolate" as 11*, and the terminal node "Dairyland" as 111. A top-down progressively deepening search approach is then used to mine the rules out. 2.2 Mining Association Rules with Multiple Minimum Supports In the conventional approaches of mining association rules, minimum supports for all the items or itemsets to be large are set at a single value. However, in real applications, different items may have different criteria to judge its importance. Liu et al. [10] thus proposed an approach for mining association rules with non-uniform minimum support values. Their approach allowed users to specify different minimum supports to different items. The minimum support value of an itemset is defined as the lowest minimum supports among the items in the itemset. This assignment is, however, not always suitable for application requirements. As mentioned above, the minimum support of an item means that the occurrence frequency of the item must be larger than or equal to it for being considered in the next mining steps. If the support of an item is smaller than the support threshold, this item is not worth considering. When the minimum support value of an itemset is defined as the lowest minimum supports of the items in it, the itemset may be large, but items included in it may be small. In this case, it is doubtable whether this itemset is worth considering. It is thus reasonable in some sense that the occurrence frequency of an interesting itemset must be larger than the maximum of the minimum supports of the items contained in it. Wang et al. [11] then generalized the above idea and allowed the minimum support value of an itemset to be any function of the minimum support values of items
contained in the itemset. They proposed a bin-oriented, non-uniform support constraint. Items were grouped into disjoint sets called bins, and items within the same bin were regarded as non-distinguishable with respect to the specification of a minimum support. Although their approach is flexible in assigning the minimum supports to itemsets, the mining algorithm is a little complex due to its generality. As mentioned above, it is meaningful in some applications to assign the minimum support of an itemset as the maximum of the minimum supports of the items contained in the itemset. Although Wang et al.’s approach can solve this kind of problems, the time complexity is high. In our previous work, a simple algorithm based on the Apriori approach was proposed to find the large-itemsets and association rules under the maximum constraint of multiple minimum supports [12][13]. The proposed algorithm is easy and efficient when compared to Wang et al.’s under the maximum constraint. Below, we will propose an efficient algorithm based on Han’s mining approach and our previous approach for multiple-level items under the maximum-itemset minimum-taxonomy support constraint to generate the large itemsets level by level. Some pruning can also be easily done to save the computation time.
3 The Proposed Algorithm In the proposed algorithm, items may have different minimum supports and taxonomic relationships, and the maximum-itemset minimum-taxonomy support constraint is adopted in finding large itemsets. That is, the minimum support for an itemset is set as the maximum of the minimum supports of the items contained in the itemset, and the minimum support for an item at a higher taxonomic concept is set as the minimum of the minimum supports of the items belonging to it. Under the constraint, the characteristic of downward-closure is kept, such that the original Apriori algorithm can be easily extended to find large itemsets. The proposed mining algorithm first encodes items (nodes) in a given taxonomy as Han and Fu's approach did [8]. It first handles the items at level 1. The minimum supports of the items at level 1 are set as the minimum of the minimum supports of the items belonging to it. It then finds all large 1-itemsets L1 from the given transactions by comparing the support of each item with its minimum support. After that, candidate 2-itemsets C2 can be formed from L1. Note that the supports of all the large 1-itemsets comprising each candidate 2-itemset must be larger than or equal to the maximum of the minimum supports of them. This feature provides a good pruning effect before the database is scanned for finding large 2-itemsets. The proposed algorithm then finds all large 2-itemsets L2 for the given transactions by comparing the support of each candidate 2-itemset with the maximum of the minimum supports of the items contained in it. The same procedure is repeated until all large itemsets have been found. The algorithm then find large itemsets at the next level until all levels have been processed. The details of the proposed mining algorithm under the maximum-itemset minimum-taxonomy support constraint are described below.
The mining algorithm for multiple-level association rules under the maximumitemset minimum-taxonomy support constraint of multiple minimum supports: INPUT: A body of n transaction data D, a set of m items, each with a predefined minimum support value, a predefined taxonomy of items in D, a set of membership functions, and a minimum confidence value λ. OUTPUT: A set of multiple-level association rules under maximum constraint of multiple minimum supports. STEP 1: Encode the predefined taxonomy using a sequence of numbers and the symbol “*”, with the l-th number representing the branch number of a certain item at level l. STEP 2: Translate the item names in the transaction data according to the encoding scheme. STEP 3: Set k = 1, where k is used to store the level number currently being processed. STEP 4: Group the items with the same first k digits in each transaction. Use the encoded name to represent the group, and retain only one item when there are more than two items in the same group in a transaction. Denote the j-th group at level k as gjk, where j = 1 to mk and mk is the number of groups at level k. STEP 5: Calculate the count cjk of each group gjk at level k as its occurring number in the transaction data set D. The support sjk of gjk can then be derived as:
    s_j^k = c_j^k / n .
STEP 6: Check whether the support sjk of gjk is larger than or equal to the threshold τjk that is the minimum of minimum supports of items contained in it. If the support of a group gjk satisfies the above condition, put gjk in the set of large 1-itemsets (L1k) at level k. That is,
L1k = {g kj s kj ≥ τ kj , 1 ≤ j ≤ m k } ; otherwise, remove all the items in the group from the transactions D. STEP 7: If L1k is null, then set k = k + 1 and go to STEP 4; otherwise, do the next step. STEP 8: Set r = 2, where r is used to represent the number of items stored in the current large itemsets. STEP 9: Generate the candidate set Crk from Lkr −1 in a way similar to that in the Apriori algorithm [3]. That is, the algorithm first joins Lkr −1 and Lkr −1 , assuming that r-1 items in the two itemsets are the same and the other one is different. In addition, it is different from the Apriori algorithm in that the supports of all the large (r-1)-itemsets comprising a candidate r-itemset I must be larger than or equal to the maximum (denoted as mI) of the minimum supports of these large (r-1)-itemsets. Store in Crk all the itemsets satisfying the above conditions and with all their sub-r-itemsets in Lkr −1 .
STEP 10: If the candidate r-itemsets Crk is null, set k = k + 1 and go to STEP 4; otherwise, do the next step. STEP 11: For each newly formed candidate r-itemset I with items (I1, I2, …, Ir) in Crk , do the following substeps: (a) Calculate the count cIk of each candidate r-itemset I at level k as its occurring number in the transaction data set. The support of I can then be derived as: ck sI = I . n (b) Check whether the support sI of each candidate r-itemset is larger than or equal to τI, which is the minimum of minimum supports of items contained in it. If I satisfies the above condition, put it in the set of large k r-itemsets ( Lr ) at level k. That is, Lkr +1 = {I s I ≥ τ I } . STEP 12: If Lrk is null and k reaches to the level number of the taxonomy, then do the next step; otherwise, if Lrk is null, then set k = k + 1 and go to STEP 4; otherwise, set r = r + 1 and go to STEP 9. STEP 13: Construct the association rules for each large q-itemset I with items ( I1 , I 2 , ..., I q ), q ≥ 2, by the following substeps: (a) Form all possible association rules as follows:
I1 ∧ ... ∧ I r −1 ∧ I r +1 ∧ ... ∧ I q → I r , r = 1 to q. (b) Calculate the confidence values of all association rules by:
    s_I / s_{I1 ∧ ... ∧ I(r−1) ∧ I(r+1) ∧ ... ∧ Iq} .
STEP 14: Output the rules with confidence values larger than or equal to the predefined confidence value λ.
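A compact sketch of the core of STEPs 5–11 for a single taxonomy level is given below. Items are the encoded groups of that level, the minimum support of a group is the minimum over its leaf items, and the minimum support of an itemset is the maximum over its members. The transactions and thresholds shown are the level-1 groupings and values derived from the running example in Section 4; this is an illustrative rendering, not the authors' implementation.

```python
from itertools import combinations

def large_itemsets_at_level(transactions, minsup_of_group, n):
    """Large itemsets for one taxonomy level under the
    maximum-itemset minimum-taxonomy support constraint."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    def tau(itemset):
        # minimum support of an itemset = maximum of its members' thresholds
        return max(minsup_of_group[i] for i in itemset)

    # STEPs 5-6: large 1-itemsets at this level
    items = {i for t in transactions for i in t}
    large = [{frozenset([i]) for i in items if support(frozenset([i])) >= minsup_of_group[i]}]
    r = 1
    while large[-1]:
        # STEP 9: candidate (r+1)-itemsets whose every r-subset is already large
        cands = set()
        for a in large[-1]:
            for b in large[-1]:
                u = a | b
                if len(u) == r + 1 and all(frozenset(s) in large[-1] for s in combinations(u, r)):
                    cands.add(u)
        # STEP 11: keep candidates whose support reaches the maximum constraint
        large.append({c for c in cands if support(c) >= tau(c)})
        r += 1
    return [lvl for lvl in large if lvl]

# Level-1 groups of the running example (1** = foods, 2** = drinks, 3** = alcohol);
# each threshold is the minimum over the group's leaf items from Table 2.
transactions = [frozenset(t) for t in (
    {"2**"}, {"1**", "2**"}, {"1**", "3**"}, {"1**", "2**"}, {"1**", "3**"},
    {"1**", "2**"}, {"1**", "2**"}, {"1**", "2**"}, {"1**", "2**", "3**"}, {"1**", "2**"})]
minsup = {"1**": 0.4, "2**": 0.4, "3**": 0.4}
print(large_itemsets_at_level(transactions, minsup, 10))
# 3** (support 0.3) is pruned; {1**} and {2**} are large, and {1**, 2**} (support 0.7) is large.
```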
4 An Example In this section, a simple example is given to demonstrate the proposed mining algorithm, which generates a set of taxonomic association rules from a given transaction dataset with multiple minimum supports. Assume the transaction dataset includes the ten transactions shown in Table 1. Each transaction consists of two parts, TID and Items. The field TID is used for identifying transactions and the field Items lists the items purchased at a transaction. For example, there are three items, plain milk, tea and coke, purchased at transaction T1. Assume the predefined taxonomy among the items is shown in Figure 2. All the items fall into three main classes: foods, drinks and alcohol. Foods can be further classified into bread and cookies. There are two kinds of bread, white bread and wheat bread. The other nodes can be explained in the same manner.
Table 1. The ten transactions in this example

TID  Items
T1   plain milk, tea, coke
T2   chocolate biscuits, plain milk, tea
T3   white bread, chocolate biscuits, red wine, blended whiskey
T4   wheat bread, chocolate biscuits, plain milk, tea
T5   chocolate biscuits, pure malt whiskey, white wine
T6   white bread, juice milk, tea, coke
T7   chocolate biscuits, plain milk, juice milk, tea
T8   soda cookies, juice milk, coke
T9   soda cookies, coke, blended whiskey
T10  soda cookies, plain milk, tea
[Figure 2 shows the predefined taxonomy: the root Goods branches into Foods, Soft drinks and Alcohol; Foods into Bread (white bread, wheat bread) and Cookies (soda cookies, chocolate biscuits); Soft drinks into Milk (plain milk, juice milk) and Beverages (tea, coke); Alcohol into Wine (red wine, white wine) and Whiskey (blended whiskey, pure malt whiskey).]

Fig. 2. The predefined taxonomy in this example
Also assume that the predefined minimum support values of items are given in Table 2 and the minimum confidence value is set at 0.85. The proposed mining algorithm for finding association rules with the predefined multiple minimum supports and multiple-level item taxonomy under the maximum-itemset minimum-taxonomy support constraint proceeds as follows.

Table 2. The predefined minimum support values for items

Item                 Minsupport    Item                 Minsupport
White bread          0.4           Tea                  0.5
Wheat bread          0.4           Coke                 0.7
Soda cookies         0.4           Red wine             0.4
Chocolate biscuits   0.7           White wine           0.5
Plain milk           0.6           Blended whiskey      0.6
Juice milk           0.4           Pure malt whiskey    0.4
First of all, each item name is encoded using the predefined taxonomy. The results are shown in Table 3. All transactions shown in Table 1 are encoded using the above encoding scheme. Since k = 1, where k is used to store the level number currently being processed, all the items in the transactions are first grouped at level one. For example, the items 112 and 122 are grouped into 1** since their first digits are both 1.

Table 3. Codes of the item names

Item name     Code    Item name            Code
Foods         1**     Soda cookies         121
Drinks        2**     Chocolate biscuits   122
Alcohol       3**     Plain milk           211
Bread         11*     Juice milk           212
Cookies       12*     Tea                  221
Milk          21*     Coke                 222
Beverages     22*     Red wine             311
Wine          31*     White wine           312
Whiskey       32*     Blended whiskey      321
White bread   111     Pure malt whiskey    322
Wheat bread   112
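The encoding of STEP 1 and the level-k grouping of STEP 4 can be expressed compactly: a leaf code such as 122 is turned into its level-k group by keeping the first k digits and padding with '*'. The dictionary below covers only a few of the codes from Table 3, for illustration.

```python
code_of = {"white bread": "111", "wheat bread": "112", "soda cookies": "121",
           "chocolate biscuits": "122", "plain milk": "211", "tea": "221", "coke": "222"}

def group_at_level(code, k):
    """Level-k group of an encoded item, e.g. group_at_level('122', 1) == '1**'."""
    return code[:k] + "*" * (len(code) - k)

def encode_transaction(items, k):
    """Translate item names (STEP 2) and group them at level k (STEP 4),
    keeping one representative per group."""
    return {group_at_level(code_of[i], k) for i in items if i in code_of}

print(encode_transaction(["plain milk", "tea", "coke"], 1))          # {'2**'}
print(encode_transaction(["wheat bread", "chocolate biscuits"], 2))  # {'11*', '12*'}
```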
The count and support of each item occurring in the ten transformed transactions are then calculated. Take 1** as an example. The count of 1** is obtained as 9. Its support value is then calculated as 0.9 (=9/10). The support value of each item at level 1 is compared with its threshold, which is the minimum of minimum supports of items contained in it. Since 1** includes the four items, 111, 112, 121 and 122, its minimum support is then calculated as min(0.4, 0.4, 0.4, 0.5) according to the minimum supports given in Table 2. The minimum support of 1** is thus 0.4. Since the support value of 1** (=0.9) is larger than its minimum support, 1** is put into the large 1-itemset L11 at level 1. In this example, the set of large 1-itemsets at level 1 is: L11 = {{1**}, {2**}}. All the items belonging to 3** (311, 312, 321 and 322) are then removed from the original transaction data set. The candidate set C21 is generated from L11 as {1**, 2**}. The support of {1**, 2**} can then be derived as 0.7. Since the support value of {1**, 2**} is larger than the maximum (0.4) of the minimum supports of 1** (0.3) and 2** (0.4), it is thus large at level 1 and is put into the set of large 2-itemsets L21. In this example, L21 = {1**, 2**}. The mining processes are then executed for level 2 and level 3. When large itemsets are generated through the process, candidate itemsets may be pruned in the following three cases: Case 1: The item which belongs to a higher-level small item does not satisfy its own minimum support and can be removed from the original transactions due to the minimum-taxonomy constraint. For example, the items, 111 and 112, belonging to 11*, are pruned since the support of 11* is smaller than its threshold. The original transaction data set is then reformed as a new one without 111 and 112. Case 2: An itemset is pruned if any of its subset is not in the set of large itemsets. For example, the 2-itemset {11*, 12*} is pruned since 11* is not a large item.
Case 3: An itemset is pruned if any of the support values of the items in the itemset is smaller than the maximum of the minimum supports of the items contained in it. For example, the two items 122 and 221 at level 3 have support values 0.8 and 0.6, which are respectively larger than their own minimum supports. 122 and 221 are thus large 1-itemsets for level 3. The 2-itemset {122, 221}, formed from them, however, is not likely to be a large itemset, since the support value (0.6) of item 221 is smaller than the maximum (0.7) of the minimum supports of these two items. The 2-itemset {122, 221} is thus pruned.
Since the level of the given taxonomy is 3, the association-rule deriving process is then executed. The association rules for each large q-itemset, q ≥ 2, are constructed by the following substeps: (a) all possible association rules are formed; (b) the confidence factors of the above association rules are calculated. Take the possible association rule “If 12*, then 21*” as an example. The confidence value for this rule is calculated as:

    s_{12*∪21*} / s_{12*} = 0.5 / 0.8 = 0.625 .
The confidence values of the above association rules are compared with the predefined confidence threshold λ. Assume the confidence λ is set at 0.85 in this example. The following five association rules are thus output: 1. If 1**, then 2**, with confidence 1.0; 2. If 2**, then 1**, with confidence 0.875; 3. If 21*, then 22*, with confidence 1.0; 4. If 22*, then 21*, with confidence 0.875; 5. If 21* and 12*, then 22*, with confidence 1.0. The proposed algorithm can thus find the large itemsets level by level without backtracking.
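The confidence computation of STEP 13 for the example above can be checked with a few lines; the supports used are the ones quoted in the text (s_{12*} = 0.8 and s_{12*∪21*} = 0.5).

```python
# Supports quoted in the example for the level-2 items.
support = {frozenset({"12*"}): 0.8, frozenset({"12*", "21*"}): 0.5}

def confidence(antecedent, consequent):
    """confidence(antecedent -> consequent) = s(antecedent ∪ consequent) / s(antecedent)."""
    return support[frozenset(antecedent) | frozenset(consequent)] / support[frozenset(antecedent)]

print(confidence({"12*"}, {"21*"}))   # 0.625, below the 0.85 threshold, so this rule is not output
```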
5 Conclusion

Using different criteria to judge the importance of different items and managing taxonomic relationships among items are two issues that usually occur in real mining applications. In this paper, we have proposed a simple and efficient mining algorithm to address these two issues. In the proposed algorithm, items may have different minimum supports and taxonomic relationships, and the maximum-itemset minimum-taxonomy support constraint is adopted: the minimum support for an itemset is set as the maximum of the minimum supports of the items contained in the itemset, while the minimum support of an item at a higher taxonomic concept is set as the minimum of the minimum supports of the items belonging to it. The rationale for using the maximum constraint has been explained, and this constraint may be suitable for some mining domains. Under the constraint, the characteristic of downward closure can be easily kept, such that the original Apriori algorithm can be easily extended to finding large itemsets. The proposed algorithm can thus generate large itemsets from multiple-level items level by level and then derive association rules. Some pruning heuristics have also been adopted in this paper. Due to the minimum-taxonomy
constraint, the item with its higher-level item not satisfying the minimum support is removed from the original transactions. The itemset with any of support values of the items contained in it smaller than the maximum of minimum supports of the items is also pruned. These make the proposed algorithm work in a better way.
Acknowledgement This research was supported by the National Science Council of the Republic of China under contract NSC 94-2213-E-390-005.
References 1. R. Agrawal, T. Imielinksi and A. Swami, “Mining association rules between sets of items in large database,“ The ACM SIGMOD Conference, pp. 207-216, Washington DC, USA, 1993. 2. R. Agrawal, T. Imielinksi and A. Swami, “Database mining: a performance perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, pp. 914-925, 1993. 3. R. Agrawal and R. Srikant, “Fast algorithm for mining association rules,” The International Conference on Very Large Data Bases, pp. 487-499, 1994. 4. R. Agrawal and R. Srikant, ”Mining sequential patterns,” The Eleventh IEEE International Conference on Data Engineering, pp. 3-14, 1995. 5. S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, “Dynamic itemset counting and implication rules for market-basket data,“ in Proceedings of the 1997 ACM-SIGMOD International Conference in Management of Data, pp. 207-216, 1997. 6. J. S. Park, M. S. Chen, and P. S. Yu, “An Effective hash-based algorithm for mining association rules,” in Proceedings of the 1995 ACM-SIGMOD International Conference in Management of Data, pp. 175-186, 1995. 7. A. Savasere, E. Omiecinski, and S. Navathe, “An efficient algorithm for mining association rules in large databases,” in Proceedings of the 21st International Conference in Very Large Data Bases (VLDB’95), pp. 432-443, 1995. 8. J. Han and Y. Fu, “Discovery of multiple-level association rules from large databases,” in Proceeding of the 21st International Conference on Very Large Data Bases, pp. 420-431, 1995. 9. R. Srikant and R. Agrawal, “Mining generalized association rules,” in Proceeding of the 21st International Conference on Very Large Data Bases, pp. 407-419, 1995. 10. B. Liu, W. Hsu, and Y. Ma, “Mining association rules with multiple minimum supports,” in Proceedings of the 1999 International Conference on Knowledge Discovery and Data Mining, pp. 337-341, 1999. 11. K. Wang, Y. H and J. Han, “Mining frequent itemsets using support constraints,” in Proceedings of the 26th International Conference on Very Large Data Bases, pp. 43-52, 2000. 12. Y. C. Lee, T. P. Hong and W. Y. Lin, "Mining fuzzy association rules with multiple minimum supports using maximum constraints", The Eighth International Conference on Knowledge-Based Intelligent Information and Engineering Systems, 2004, Lecture Notes in Computer Science, Vol. 3214, pp. 1283-1290, 2004. 13. Y. C. Lee, T. P. Hong and W. Y. Lin, "Mining association rules with multiple minimum supports using maximum constraints," International Journal of Approximate Reasoning, Vol. 40, No. 1, pp. 44-54, 2005.
A Measure for Data Set Editing by Ordered Projections Jesús S. Aguilar-Ruiz, Juan A. Nepomuceno, Norberto Díaz-Díaz, and Isabel Nepomuceno Bioinformatics Group of Seville, Pablo de Olavide University and University of Seville, Spain
[email protected], {janepo, ndiaz, isabel}@lsi.us.es
Abstract. In this paper we study a measure, named the weakness of an example, which allows us to establish the importance of an example for finding representative patterns for the data set editing problem. Our approach consists in reducing the database size without losing information, using the algorithm of patterns by ordered projections. The idea is to relax the reduction factor with a new parameter, λ, removing all examples of the database whose weakness verifies a condition over this λ. We study how to establish this new parameter. Our experiments have been carried out using all databases from the UCI Repository, and they show that a size reduction is possible in complex databases without a noticeable increase in the error rate.
1 Introduction
Data mining algorithms must work with databases with tens of attributes and thousands of examples when they are used to solve real and specific problems. This kind of database contains much more information than the standard databases, most of them of small size, which are usually used to test data mining techniques. A lot of time and memory are necessary to accomplish the final tests on these real databases. Methodologies based on axis-parallel classifiers provide decision rules that are easy for humans to understand, and they are very useful for the expert interested in getting knowledge from databases. These methodologies are the most common among those used by data mining researchers. If we want to apply one of these tools, such as C4.5 or k-NN [9], to solve a real problem with a huge amount of data, we should use some method to decrease the computational cost of applying these algorithms. Database preprocessing techniques are used to reduce the number of examples or attributes as a way of decreasing the size of the database with which we are working. There are two different types of preprocessing techniques: editing (reduction of the number of examples by eliminating some of them, finding representative patterns or calculating prototypes) and feature selection (eliminating non-relevant attributes).
Editing methods are related to the nearest neighbour (NN) techniques [4]. For example, in [5] Hart proposed to include in the set of prototypes those examples whose classification is wrong using the nearest neighbour technique; in this way, every member of the main set is closer to a member of the subset of prototypes of the same class than to a member of a different class of this subset. In [2] a variant of the previous method is proposed. In [15], Wilson suggests eliminating the examples which are incorrectly classified with the k-NN algorithm; the works of [13] and [11] follow the same idea. Other variants are based on Voronoi diagrams [7], for example Gabriel neighbours (two examples are Gabriel neighbours if their diametrical sphere does not contain any other example) or relative neighbours [14] (two examples p and q are relative neighbours if, for every example x in the set, the expression d(p, q) < max{d(p, x), d(x, q)} holds). In all the previous methods the distances between examples must be calculated, so that, if we are working with n examples with m attributes, the first methods take Θ(mn²) time, the method proposed in [11] takes Θ(mn² + n³), and the methods based on Voronoi diagrams take Θ(mn³). In this paper we work in the line proposed by Aguilar, Riquelme and Toro [1], where a first version of an editing method by the ordered projection technique was introduced. This algorithm works well with continuous attributes. In [10], a second and more elaborate version of this algorithm is proposed; it works simultaneously with continuous and discrete (i.e., nominal) attributes and it preserves the properties of the initial approach. Working with NN-based techniques implies introducing some initial parameters and defining a distance to calculate the proximity between the different examples of the database. The method based on the ordered projection technique does not need to define any distance, and it works with each attribute independently, as we will see in the next section. The most important characteristics of this approach to editing, in addition to the absence of distance calculations, are the considerable reduction of the number of examples, the lower computational cost Θ(mn log n) and the conservation of the decision boundaries (especially interesting for applying classifiers based on axis-parallel decision rules). We are interested in a measure, the weakness of an example, which helps us to determine the importance of an example as a decision boundary: more weakness implies less relevance. We propose a relaxation of the projection approach, eliminating those examples whose weakness is larger than a threshold, by means of a new parameter, λ, in the algorithm. At the present time some authors think that editing methods are rather old-fashioned because, with today's standard technology (even though today's data sets are larger), it is not clear whether it is worthwhile to spend the pre-processing time to perform editing. That is why methods which embed approaches for (simultaneous) feature selection (or editing) and classification, such as SVMs [8], are being used. We are interested in studying how to relax the projection approach to the editing problem in order to combine this new measure with the parameter of a similar approach to feature selection (eliminating non-relevant attributes), see [12]. A good method (as a new theory of measure for preprocessing
techniques) to find out the threshold which reduces the number of examples and the number of attributes without losing information in huge databases would be a great achievement. In this paper we show that in more complicated databases we can relax the reduction factor, eliminating those examples whose weakness verifies a condition over λ. We have dealt with two different databases of the UCI repository [3] (University of California at Irvine), the heart-statlog database and the ionosphere database. The k-NN (for k = 1) and C4.5 classifiers have been used to classify our database before and after applying our editing method POPλ (patterns by ordered projections). The condition over the weakness of each example has been relaxed gradually in order to study the importance of this measure and the goodness of our method when applied to algorithms based on axis-parallel classifiers. After having determined the threshold using the λ parameter, we carry out the study of it over the different databases of the UCI repository with continuous attributes [3]. A ten-fold cross-validation has been used for each database.
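For contrast with the distance-free approach studied in this paper, the following sketch shows the kind of distance-based editing attributed to Wilson [15], in which an example is dropped when its k nearest neighbours disagree with its class. The sketch is ours and only illustrative: the function name wilson_editing, the parameter k and the NumPy-based layout are assumptions, and the all-pairs distance matrix makes the Θ(mn²) cost of this family of methods explicit.

import numpy as np

def wilson_editing(X, y, k=3):
    # Illustrative Wilson-style editing [15]: drop every example that is
    # misclassified by the k-NN rule applied leave-one-out on the data.
    # Building the full distance matrix costs Theta(m * n^2).
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(X)
    keep = np.ones(n, dtype=bool)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    for i in range(n):
        order = np.argsort(d[i])
        neighbours = [j for j in order if j != i][:k]
        values, counts = np.unique(y[neighbours], return_counts=True)
        if values[np.argmax(counts)] != y[i]:   # majority class disagrees with the example
            keep[i] = False
    return X[keep], y[keep]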
2 Description of the Algorithm
A database with m attributes and n examples can be seen as a space with m dimensions, where each example takes a value in the range of each attribute and has a particular class associated. Each attribute represents an axis of this space, with n points or objects inside, and each example has a label associated with its corresponding class. For example, if our database has two attributes, we will be in a two-dimensional space (the attributes are represented by the x and y axes, respectively); see Figure 1. As we said in the previous section, our method does not need to define any distance to compare the different examples; we work with the projection of each example over each axis of the space. The main idea of the algorithm is the following: if the dimension of the space is d, in order to locate a region of this space (in the context of the example, region means hyper-rectangle, although our algorithm works with any hyperplane) containing examples of the same class, we need only 2d examples, which define the borders of this region; for example, in order to define a square in R² we only need four points. Thus, if we have more than 2d examples of the same class in the region, we can eliminate the rest, which are inside. Our objective is to eliminate the examples which are not in the boundaries of the region. The way of finding out whether an example is inner to a region is to study whether it is inner to each corresponding interval in the projection of the region over the different axes of the space. An ordered projected sequence is the sequence formed by the projection of the space onto one particular axis, i.e., a particular attribute. A partition is a subsequence formed from one ordered projected sequence which keeps the projection ordered. We define the weakness of an example as the number of times that the example is not a border in a partition (i.e., it is inner to a partition), for every
Fig. 1. A two-dimensional database with twelve examples
partition obtained from the ordered projected sequences of each attribute. We call irrelevant examples those examples whose weakness is equal to the number of attributes. In order to illustrate the method, we have designed a simple two-dimensional labelled database. This database is depicted in Figure 1, picture 1, and it contains twelve examples from two different classes: N (numbers) and L (letters). An optimal classifier would obtain two rules, one for the examples labelled with numbers and one for those labelled with letters (see picture 2 in Figure 1), with overlapped rules. Such a classifier must be hierarchical because it produces overlapped rules. This is not the case of an axis-parallel classifier, which does not produce overlapped rules. For example, C4.5 and many other similar classifiers would produce situations like the ones we can see in pictures 2, 3, 4 and 5 in Figure 1. The aim of our algorithm is to build regions containing all the examples of the same class and to eliminate those examples which are not necessary to define the regions, that is, those examples which are not in the borders of the regions. If we consider the situation depicted in picture 7 in Figure 1, each region contains only examples of the same class in a maximal way. The projection of the examples on the abscissa axis, for the first attribute, produces four ordered sequences {N, L, N, L} corresponding to {[5, 2, 6, 4, 1], [b], [3], [f, c, d, e, a]}. Correspondingly, the projection on the ordinate axis produces the sequences {N, L, N, L} formed by the examples {[1, 2, 6, 5, 3], [d], [4], [f, c, b, a, e]}. Each sequence represents a rectangular region as a possible solution of a classifier, and the initial and
final examples of the sequence (if it has only one, it is simultaneously the initial and the final one) represent the lower and upper values of each coordinate of this rectangle. In this situation, 5 and 1 are borders for the first attribute. According to this figure, the weakness of each example would be 0 for examples '1', '3' and 'f'; 1 for '4', 'd', 'e', '5', 'b' and 'a'; and 2 for examples '2', '6' and 'c'. The last examples have a weakness equal to the dimension, therefore they are not necessary to define the subregions: they are irrelevant examples. So, they are removed from the database, see picture 8 in Figure 1.
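As a small check of the weakness values quoted above, the snippet below (ours, purely illustrative) takes the class partitions listed for the two axes of Figure 1 and counts, for each example, the partitions in which it is inner, i.e. neither the first nor the last element; the helper name weakness_from_partitions is an assumption.

# Partitions of the ordered projections listed in the text
# (abscissa axis and ordinate axis of Figure 1).
partitions_x = [['5', '2', '6', '4', '1'], ['b'], ['3'], ['f', 'c', 'd', 'e', 'a']]
partitions_y = [['1', '2', '6', '5', '3'], ['d'], ['4'], ['f', 'c', 'b', 'a', 'e']]

def weakness_from_partitions(per_attribute_partitions):
    # Count, for every example, how many partitions it is inner to
    # (neither the first nor the last element of the partition).
    weakness = {}
    for partitions in per_attribute_partitions:   # one list of partitions per attribute
        for part in partitions:
            for pos, example in enumerate(part):
                inner = 0 < pos < len(part) - 1
                weakness[example] = weakness.get(example, 0) + int(inner)
    return weakness

w = weakness_from_partitions([partitions_x, partitions_y])
# Reproduces the values quoted above, e.g. w['1'] == 0, w['a'] == 1, w['2'] == 2.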
2.1 Algorithm
Given the database E, let n and n′ be the initial and the final number of examples (n ≥ n′), let m be the number of attributes and let λ ∈ R be the parameter used to relax the measure of the weakness. The POPλ algorithm (patterns by ordered projections) is the following:
................................................................................
Procedure POPλ (in: (En×m, λ), out: En′×m)
  for each example ei ∈ E, i ∈ {1, . . . , n}
    weakness(ei) := 0
  end for
  for each attribute aj, j ∈ {1, . . . , m}
    Ej := QuickSort(Ej, aj) in increasing order
    Ej := ReSort(Ej)
    for each example ei ∈ Ej, i ∈ {1, . . . , n}
      if ei is not a border
        weakness(ei) := weakness(ei) + 1
      end if
    end for
  end for
  for each example ei ∈ E, i ∈ {1, . . . , n}
    if weakness(ei) ≥ m · λ
      remove ei from E
    end if
  end for
end POPλ
................................................................................
The computational cost of POP is Θ(mn log n). This cost is much lower than that of other algorithms proposed in the bibliography, normally Θ(mn²). The algorithm constructs the ordered projected sequence over each axis of the space (attribute) and calculates the weakness of each example. The values of the projections need to be sorted when we are working with each attribute. We use the QuickSort algorithm [6], and a second sorting, which we call ReSort, is made in order to create regions containing examples of the same class in a maximal way. The examples sharing the same value for an attribute are not necessarily
Fig. 2. Applying POPλ over two different databases. The ordinate axis shows the percentage of retention; the abscissa axis shows the different values of the λ parameter.
nearer to those examples that have the same class and have another value. The solution to that problem consists of resorting the interval containing repeated values. The heuristic is applied so as to obtain the least number of changes of class. Therefore, the algorithm sorts by value and, in case of equality, by class (the ReSort sorting). In the case of working with a database with nominal attributes, another, more elaborate version of this kind of algorithm could be considered: discrete attributes do not need to be sorted, and the weakness of all the examples except the one with the least weakness obtained for the continuous attributes is increased. In this first approach we are not interested in databases with nominal attributes. Finally, the examples verifying the condition over the λ parameter are eliminated from the database. This parameter permits us to control the level of reduction of our editing method.
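A compact sketch of how the procedure above could be implemented for continuous attributes is given below. It follows the pseudocode directly: each attribute is sorted (ties resorted by class), the first and last examples of every maximal run of identical classes are treated as borders, the inner examples have their weakness increased, and the examples whose weakness reaches m · λ are removed. The function name pop_lambda and the NumPy-based layout are our own illustrative choices, not code by the authors.

import numpy as np

def pop_lambda(X, y, lam):
    # Illustrative POPλ for continuous attributes: weakness counting by
    # ordered projections, followed by removal when weakness >= m * lam.
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n, m = X.shape
    weakness = np.zeros(n, dtype=int)
    for j in range(m):
        # QuickSort + ReSort: order by value and, in case of ties, by class
        order = sorted(range(n), key=lambda i: (X[i, j], y[i]))
        classes = [y[i] for i in order]
        start = 0
        for pos in range(1, n + 1):
            # close the current run of identical classes at position pos
            if pos == n or classes[pos] != classes[start]:
                for inner in range(start + 1, pos - 1):   # inner = non-border positions
                    weakness[order[inner]] += 1
                start = pos
    keep = weakness < m * lam
    return X[keep], y[keep]

With lam = 1 only the irrelevant examples (those that are inner on every attribute) are removed; smaller values of lam remove examples more aggressively, which is exactly the relaxation studied in the next section.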
3 Experiments
Our tests have been carried out over two different databases, the heart-statlog database and the ionosphere database, both obtained from the UCI repository [3]. The main objective is to compare the performance of our editing method when the λ parameter is modified. We have a measure for each example, the weakness of the example, which determines its importance as a decision boundary, and we relate this measure to the parameter of our editing algorithm. Our objective is to study how to establish a threshold to eliminate examples from the database, that is, to determine the parameter. A ten-fold cross-validation is made by dividing the database into ten parts and taking blocks of nine parts as the training set, the remaining one being the test set. We apply our reducing method to the training set and then, after having applied the corresponding classifier algorithm, we use the test set to validate the process. This operation is repeated ten times, once for each of the ten subsets we have built.
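The protocol can be summarised by the sketch below, in which only the training folds are edited before the classifier is fitted; the test fold is always left untouched. The use of scikit-learn's KFold and of a 1-NN classifier is purely illustrative, pop_lambda refers to the sketch given in the previous section, and the data are assumed to be NumPy arrays.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

def edited_cv_error(X, y, lam, n_splits=10):
    # Ten-fold cross-validation in which only the training part is reduced
    # with POPλ before fitting the classifier; returns error rate in %.
    errors = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True).split(X):
        X_tr, y_tr = pop_lambda(X[train_idx], y[train_idx], lam)
        clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
        errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))
    return 100 * np.mean(errors), 100 * np.std(errors)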
Table 1. Computational cost in seconds (CCS) and error rate (ER, average ± standard deviation) for the C4.5 and k-NN (with k = 1) algorithms over the different databases obtained with POPλ
                        C4.5                                          k-NN
            Heart-Statlog         Ionosphere            Heart-Statlog         Ionosphere
            CCS   ER ±σ           CCS   ER ±σ           CCS   ER ±σ           CCS   ER ±σ
Original    0.08  21.7 ±6.6       0.16  14.3 ±7.9       0.09  24.6 ±8.1       0.06  15.4 ±8.2
POPλ=1      0.08  24.0 ±6.0       0.18  14.3 ±7.9       0.05  24.8 ±8.6       0.05  16.3 ±8.8
POPλ=0.95   0.13  20.8 ±10.8      0.15  15.4 ±8.7       0.05  28.4 ±5.9       0.05  16.8 ±9.3
POPλ=0.90   0.08  20.8 ±10.8      0.14  14.8 ±7.7       0.05  28.4 ±5.9       0.05  17.1 ±9.4
POPλ=0.85   0.05  22.1 ±6.7       0.14  12.0 ±7.2       0.03  24.9 ±11.4      0.06  17.1 ±9.4
POPλ=0.80   0.03  27.4 ±11.4      0.12  14.3 ±6.8       0.02  39.4 ±8.8       0.05  16.3 ±8.9
POPλ=0.75   0.04  37.4 ±11.4      0.09  17.7 ±8.1       0.01  39.4 ±8.8       0.05  15.4 ±8.2
POPλ=0.70   0.03  40.3 ±14.8      0.08  25.1 ±14.5      0.01  45.4 ±9.5       0.04  15.7 ±8.5
POPλ=0.65   0.02  42.0 ±14.2      0.06  30.2 ±11.7      0.00  41.3 ±9.5       0.04  19.7 ±11.5
POPλ=0.60   0.02  42.0 ±14.2      0.04  47.2 ±13.7      0.00  41.3 ±9.5       0.04  24.8 ±11.2
POPλ=0.55   0.01  43.0 ±15.8      0.04  58.3 ±16.6      0.00  46.6 ±6.5       0.03  44.1 ±9.5
POPλ=0.50   0.01   9.3* ±18.5     0.04  53.4 ±17.0      0.00  42.9 ±7.1       0.03  51.2 ±7.9
We apply the POPλ algorithm with λ ∈ {1, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5}. Figure 2 shows the percentage of retention of our method for the different values of λ (100% means that the database has not been reduced at all).¹ We notice that both curves are increasing; that is because we have imposed the condition "weakness(ei) ≥ m · λ" in the algorithm, so for each new value of the parameter we remove the new examples that verify the condition as well as the old ones. If we had put = instead of ≥, the graphic would not be increasing and the process would not be cumulative. If a group of examples with the same weakness is removed for a value of λ, they will also be removed when the parameter decreases. We are interested in relaxing the condition over λ in order to remove examples from the database gradually. A possible threshold could be established by looking at the graphics, but we must verify that we do not lose any knowledge needed for the classification phase. The results of the classifications using the C4.5 and k-NN techniques are shown in Table 1. We present the CCS, computational cost in seconds, and the ER, error rate, for the original database and the different reduced databases. ER is the classification error produced when the test set validates the model that we have constructed applying the different techniques to the different databases. Values are the average and the standard deviation of the complete 10-fold cross-validation (over the 10 experiments obtained for each value). We can observe that the computational cost decreases as the lambda value decreases. So, if we found a lambda value less than 1 without losing information, we would manage to reduce the database size and the computational cost would decrease too.
¹ Percentages of retention for each value of λ are the average of the ten different experiments carried out in the 10-fold CV.
The purpose is to study the relevance of POPλ as a reduction method for the different values of λ. The best situation would be to find the λ which produces the greatest reduction of our database and the least error rate when the classification methods are applied. We observe a possible threshold between λ = 0.85 and λ = 0.80: for λ = 0.80 the number of examples removed from the database grows dramatically and the error rate seems to increase. We therefore have a good candidate for our threshold. We must verify it and study how to establish the value of the parameter. In order to prove the goodness of our parameter, in Table 2 we carry out the study of the situation for these two values over all the databases from the UCI repository with continuous attributes [3]. We show the percentage of reduction, PR, in order to indicate the number of examples which are removed from the database; values are the average of the complete 10-fold cross-validation process. We must consider how the error changes when the database is reduced considerably. Our aim is to ascertain the value of λ which reduces the database the most without losing information. For example, for heart-statlog, the error rate on the original database using C4.5 is 21.7 ± 6.6, but if we apply POPλ=0.85 (Table 2) the error is only 22.1 ± 6.7. That is, we have managed to reduce the database by 68.8% (100 - PR) without losing knowledge. If we take the same database and configuration but use k-NN, a similar behaviour is observed. In general, for λ = 0.85, databases would be reduced to 61.8% of the original size and the error rate would be incremented from 18.6 ± 4.8 to 24.8 ± 5.8 using C4.5, and from 15.4 ± 4.6 to 23.1 ± 5.0 using 1-NN. We have drawn in bold in both tables the data which are relevant according to the Student's t statistical distribution. Paying attention to the results obtained with λ = 0.8, we have to say that although the databases were reduced dramatically, the error rate is incremented notably. These experiments show us how to establish an appropriate value of the λ parameter in order to apply the POPλ algorithm, reducing the database up to the limit while conserving the knowledge of the original database. In summary, we can state that with lambda values smaller than 1 a greater database size reduction is possible without losing information, but this reduction is limited to a lambda value of 0.85. We have established a method to find out a threshold to relax the reduction factor of the algorithm for finding representative patterns for data set editing. We have proven the goodness of our threshold with all the databases of the UCI repository with only continuous attributes.
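The entries drawn in bold are selected according to a Student's t statistic. One way to reproduce such a check over the per-fold error rates could be the paired test sketched below; the use of scipy's ttest_rel and the 0.05 significance level are assumptions on our part, since the paper does not detail the exact test employed.

from scipy import stats

def significantly_different(errors_original, errors_edited, alpha=0.05):
    # Paired t-test on the per-fold error rates obtained with and without
    # editing; True means the difference is statistically significant.
    t_stat, p_value = stats.ttest_rel(errors_original, errors_edited)
    return p_value < alpha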
4 Conclusions
We have defined a new parameter, λ, which helps us to remove from a database all the examples which verify the condition weakness ≥ (number of attributes) · λ. We have therefore established a threshold, via a measure over each example, in order to reduce the number of examples of a database. After analyzing our approach using some databases from the UCI Repository, we conclude that it is possible to reduce the database size up to 40% without losing any knowledge. Furthermore, the computational cost is decreased by allowing the removal of examples whose weakness is less than the number of attributes.
Table 2. Error rate (ER) for the C4.5 and k-NN (with k = 1) algorithms over the databases from the UCI repository with continuous attributes. Every database is considered before and after applying POPλ, with λ = 0.85 for the first table and λ = 0.80 for the second one. PR is the percentage of reduction achieved by the reduction algorithm.
λ = 0.85                 C4.5                                      k-NN
                         Original      POPλ=0.85                  Original      POPλ=0.85
Data Base                ER ±σ         ER ±σ          PR          ER ±σ         ER ±σ          PR
Heart-Statlog            21.7 ±6.6     22.1 ±6.7      31.2        24.6 ±8.1     24.9 ±11.4     31.2
Ionosphere               14.3 ±7.9     12.0 ±7.2      78.2        15.4 ±8.2     17.1 ±9.4      78.2
Balance-Scale            25.4 ±7.2     66.0 ±17.7     10.8        20.2 ±5.6     54.0 ±13.1     10.8
Breast-W                  5.2 ±2.7     17.0 ±9.9       5.2         5.2 ±4.7     23.2 ±7.8       5.2
Bupa                     15.4 ±0.0     42.0 ±0.0      42.0         0.0 ±0.0     22.6 ±0.0      42.0
Diabetes                 26.4 ±7.3     28.1 ±7.7      55.3        29.6 ±4.0     35.0 ±5.8      55.3
Glass                    51.3 ±19      51.4 ±19       97.6        39.8 ±12      39.8 ±12       97.6
Iris                      2.0 ±0.0     12.7 ±0.0      26.0         0.0 ±0.0      3.3 ±0.0      26.0
Lung-Cancer              15.6 ±0.0      9.4 ±0.0      96.0         0.0 ±0.0      3.1 ±0.0      96.0
Page-Blocks               1.5 ±0.0      9.3 ±0.0      13.0         0.3 ±5.6     23.5 ±3.6      13.0
Pima-Indians-Diabetes    15.9 ±0.0     23.6 ±0.0      58.0         0.0 ±0.0     11.9 ±1.8      58.0
Segment                   3.3 ±1.3      3.8 ±1.5      92.0         3.2 ±1.6      3.2 ±1.6      92.0
Sonar                    45.5 ±19.3    45.5 ±19.3    100.0        49.4 ±16.1    49.4 ±16.1    100.0
Vehicle                  27.8 ±3.4     28.2 ±4.6      88.1        31.9 ±4.6     31.8 ±4.4      88.1
Waveform-5000            25.1 ±1.5     25.0 ±1.8      97.1        26.8 ±1.5     26.8 ±1.6      97.1
Wine-5000                 1.1 ±0.0      1.1 ±0.0      98.0         0.0 ±0.0      0.0 ±0.0      98.0
Average                  18.6 ±4.8     24.8 ±5.8      61.8        15.4 ±4.6     23.1 ±5.0      61.8
λ = 0.80                 C4.5                                      k-NN
                         Original      POPλ=0.80                  Original      POPλ=0.80
Data Base                ER ±σ         ER ±σ          PR          ER ±σ         ER ±σ          PR
Heart-Statlog            21.7 ±6.6     27.4 ±11.4     11.7        24.6 ±8.1     39.4 ±8.8      11.7
Ionosphere               14.3 ±7.9     14.3 ±6.8      62.5        15.4 ±8.2     16.3 ±8.9      62.5
Balance-Scale            25.4 ±7.2      6.0 ±17.8     10.8        20.2 ±5.6     54.0 ±13.2     10.8
Breast-W                  5.2 ±2.7     50.4 ±29        1.3         5.2 ±4.7     34.9 ±9.0       5.2
Bupa                     15.4 ±0.0     42.0 ±0.0      42.0         0.0 ±0.0     22.6 ±0.0      42.0
Diabetes                 26.4 ±7.3     49.5 ±8.2      26.4        29.6 ±4.0     47.0 ±4.1      26.4
Glass                    51.3 ±19      52.6 ±17.5     94.2        39.8 ±12.0    39.8 ±12       94.2
Iris                      2.0 ±0.0     12.7 ±0.0      26.0         0.0 ±0.0      3.3 ±0.0      26.0
Lung-Cancer              15.6 ±0.0     12.5 ±0.0      90.0         0.0 ±0.0      9.4 ±0.0      90.0
Page-Blocks               1.5 ±0.0      9.3 ±0.0      13.0         0.3 ±5.6     23.6 ±0.0      13.0
Pima-Indians-Diabetes    15.9 ±0.0     48.8 ±0.0      28.0         0.0 ±0.0     32.2 ±0.0      13.0
Segment                   3.3 ±1.4      4.1 ±1.5      87.0         3.2 ±1.6      3.7 ±1.5      87.0
Sonar                    45.5 ±19.3    45.5 ±19.3    100.0        49.4 ±16.1    49.4 ±16.1    100.0
Vehicle                  27.8 ±3.4     29.0 ±6.4      79.6        31.9 ±4.6     32.8 ±5.3      79.6
Waveform-5000            25.1 ±1.5     25.1 ±1.6      92.6        26.8 ±1.5     26.9 ±1.6      92.6
Wine-5000                 1.1 ±0.0      1.1 ±0.0      96.6         0.0 ±0.0      0.0 ±0.0      96.6
Average                  18.6 ±4.8     30.6 ±7.5      53.9        15.4 ±4.6     27.2 ±5.0      53.9
In spite of having introduced a new parameter and of treating a problem of editing (some authors also consider editing rather old-fashioned because, with today's standard technologies, it is not clear whether it is worthwhile to spend the pre-processing time), this paper opens a way to consider the preprocessing problem, both editing and feature selection, as a problem of choosing two parameters. As future work, the combination of the POPλ algorithm with the SOAP algorithm [12] is proposed. Thus we would obtain an algorithm to preprocess a database working with two parameters in order to remove both examples and attributes.
References
1. Aguilar, J.S.; Riquelme, J.C.; Toro, M.: Data set editing by ordered projection, in: Proceedings of the 14th European Conference on Artificial Intelligence (ECAI'00), Berlin, Germany, (2000), pp. 251-255.
2. Aha, D.W.; Kibler, D.; Albert, M.K.: Instance-based learning algorithms, Mach. Learning 6 (1991), pp. 37-66.
3. Blake, C.; Merz, E.K.: UCI repository of machine learning databases, (1998).
4. Cover, T.; Hart, P.: Nearest neighbor pattern classification, IEEE Transactions on Information Theory IT-13 (1) (1967), pp. 21-27.
5. Hart, P.: The condensed nearest neighbor rule, IEEE Trans. Inf. Theory 14 (3) (1968), pp. 515-516.
6. Hoare, C.A.R.: Quicksort, Comput. J. 5 (1) (1962), pp. 10-15.
7. Klee, V.: On the complexity of d-dimensional Voronoi diagrams, Arch. Math. 34 (1980), pp. 75-80.
8. Neumann, Julia; Schnörr, Christoph; Steidl, Gabriele: SVM-based feature selection by direct objective minimisation, in: Pattern Recognition, Proc. of the 26th DAGM Symposium, LNCS, Springer, August (2004).
9. Quinlan, J.R.: C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, (1993).
10. Riquelme, José C.; Aguilar-Ruiz, Jesús S.; Toro, Miguel: Finding representative patterns with ordered projections, Pattern Recognition 36 (2003), pp. 1009-1018.
11. Ritter, G.; Woodruff, H.; Lowry, S.; Isenhour, T.: An algorithm for a selective nearest neighbor decision rule, IEEE Trans. Inf. Theory 21 (6) (1975), pp. 665-669.
12. Ruiz, R.; Riquelme, José C.; Aguilar-Ruiz, Jesús S.: NLC: A measure based on projections, in: 14th International Conference on Database and Expert Systems Applications (DEXA 2003), Lecture Notes in Computer Science, Springer-Verlag, Prague, Czech Republic, 1-5 September, (2003).
13. Tomek, I.: An experiment with the edited nearest-neighbor rule, IEEE Trans. Syst. Man Cybern. 6 (6) (1976), pp. 448-452.
14. Toussaint, G.T.: The relative neighborhood graph of a finite planar set, Pattern Recognition 12 (4) (1980), pp. 261-268.
15. Wilson, D.R.; Martinez, T.R.: Improved heterogeneous distance functions, J. Artif. Intell. Res. 6 (1) (1997), pp. 1-34.
Author Index
Abbassene, Ali 800 Abril, Montserrat 400 Adnan, Muhaimenul 363 Aguilar-Ruiz, Jesús S. 1339 Agulló, Loreine 917 Ahn, Eun Yeong 1211 Ahn, Tae-Chon 472 Aioanei, Daniel 691 Alcalá, Rafael 452 Alcalá-Fdez, Jesús 452 Alhajj, Reda 363 Amarger, Véronique 155 An, Zeng 353 Angerer, Bibiane 600 Antunes, Ana 908 Arredondo V., Tomás 462, 917 Avarias A., Jorge 917 Bacauskiene, Marija 701 Bannister, Peter R. 1149 Bao, Yukun 1080 Baoyan, Liu 1290 Barber, Federico 400 Barker, Ken 363 Barker, Nathan 962 Barlatier, Patrick 14 Basile, Teresa Maria Altomare 721 Batouche, Mohamed 800, 809 Belghith, Khaled 838 Berlanga, Francisco José 452 Bhattacharya, Anindya 943 Biba, Marenglen 721 Bingru, Yang 1290 Bø, Ketil 554 Boberg, Jorma 610 Bosin, Andrea 790 Brennan, Jane 898 Brézillon, Patrick 137, 146 Briand, Henri 312 Calderón B., Felipe 917 Cameron, Ian T. 70 Candel C., Diego 917
Carloni, Olivier 590 Carson-Berndsen, Julie 674, 691 Casanovas, Pompeu 1000 Casellas, Núria 1000 Cesta, Amedeo 410, 421 Chaiyaratana, Nachol 1090 Chang, Chir-Ho 1200 Chang, Chuan-Yu 1119 Chang, Yeon-Pung 760 Chang, Yu-Chuan 1249 Chau, K.W. 111, 548 Che, Oscar 711 Cheepala, Satish 972 Chelloug, Samia 809 Chen, Chao-Wen 1221 Chen, Jiah-Shing 197 Chen, Jr-Shian 1270 Chen, Jungan 859 Chen, Lijuan 711 Chen, Peter P. 750 Chen, Rong 639 Chen, Shifu 510 Chen, Shyi-Ming 432, 442, 1249, 1280 Chen, Stephen 44 Chen, Xi 54 Chen, Yi-Wei 760 Chen, ZhiHang 1169 Cheng, Ching-Hsue 478, 1270 Cheung, Yee Chung 127 Chien, Been-Chian 1318 Chohra, Amine 155 Choi, Chang-Ho 177 Choi, Nam-Sup 177 Choi, Sang-Kyu 177 Chu, Bong-Horng 1259 Chu, Ming-Hui 760 Chung, Paul 54 Chung, Paul Wai Hing 127 Chung, Yeh-Ching 1299 Clifford, John 972 Clifton, David A. 1149 Cordova H., Macarena 917 Cortellessa, Gabriella 421
Dai, Yun 927 Damásio, Carlos V. 650 Dan, Pan 353 Dapoigny, Richard 14 Davoodi, Mansoor 1100 De, Rajat K. 943 Demazeau, Yves 731 Despres, Sylvie 1014 Dessì, Nicoletta 790 Di Mauro, Nicola 629, 721 Diallo, Gayo 1024 Dias, Fernando Morgado 908 Díaz-Díaz, Norberto 1339 Dombrovskaia, Lioubov 917 Dourgnon-Hanoune, Anne 583 Ekinci, Murat 500 Esposito, Floriana 629, 721 Fanizzi, Nicola 629 Felfernig, Alexander 869 Feng, Jun 117 Ferilli, Stefano 721 Foulloy, Laurent 14 Fratini, Simone 421 Frenz, Christopher M. 935 Freund, Wolfgang 462 Gacto, María José 452 García-Hernández, Ma. de Guadalupe 1179 Garrido, Antonio 1179 Garza Castañón, Luis E. 520, 530 Gentil, Sylviane 2 Ghose, Aditya K. 780, 1127 Giunchiglia, Fausto 1 Gómez, Luis 917 Gonzalez, Avelino J. 137 Groza, Adrian 91 Gu, Mingyang 554 Guan, Ying 780 Guillet, Fabrice 312 Guo, Yubin 1071 Han, Chang-Wook 238 Hangos, Katalin M. 70 Hasgul, Servet 393 Hendtlass, Tim 292 Hennig, Sascha 332 Herrera, Francisco 452
Ho, Cheng-Seen 1259 Hong, Dong Kwon 879 Hong, Tzung-Pei 1329 Honghai, Feng 1290 Hou, Jia-Leh 197 Hsiao, Kai-Chung 1259 Hsu, Mu-Hsiu 1111 Hsu, Steen J. 1318 Huang, Liping 1309 Hung, Che-Lun 1299 Hung, Ming-Chuan 1299 Hur, Gi T. 488 Huynh, Xuan-Hiep 312 Iannone, Luigi 629 Imada, Miyuki 322 Ingolotti, Laura 400 Islier, A. Attila 741 Jakulin, Aleks 1000 Jamont, Jean-Paul 101 Jang, Min-Soo 540 Jannach, Dietmar 166, 819 Jędrzejowicz, Joanna 24 Jędrzejowicz, Piotr 24 Ji, Se-Jin 770 Jian-bin, He 353 Jorge, Rui D. 650 Kabanza, Froduald 838 Kanaoui, Nadia 155 Kang, Jaeho 1159, 1211 Kang, Yuan 760 Kanokphara, Supphanat 674, 691 Kim, Dongwon 830 Kim, Jung H. 488 Kim, Kap Hwan 1159, 1211 Kim, Kweon Yang 879 Kim, Kyoung Min 177 Kim, Sang-Jun 540 Kim, Sang-Woon 668 Kim, Sun Yong 322 Kim, Yong-Ha 177 Kim, Yong-Guk 540 Komatani, Kazunori 207 Kong, Ying 44 Krishna, Aneesh 780 Kuntanapreeda, Suwat 1090 Kuster, Jürgen 166
Author Index Kuwahara, Hiroyuki 962 Kwon, Jung-Woo 770 Lakner, Rozália 70 Laowattana, Djitt 60 Latorre R., Valeria 917 Leclère, Michel 590 Lee, Buhm 177 Lee, Chang-Shing 1240 Lee, Huey-Ming 1111 Lee, Jeong-Eom 540 Lee, Jia-Chien 682 Lee, Jimmy A. 898 Lee, Lee-Min 682 Lee, Li-Wei 1280 Lee, Seok-Joo 540 Lee, S.H. 889 Lee, Tsang-Yean 1111 Lee, Yeong-Chyi 1329 Letia, Ioan Alfred 91 Li, Xi 1138 Li, Yanzhi 272 Liang, Feng 859 Liao, Shih-Feng 1111 Liau, Churn-Jung 1249 Liegl, Johannes 819 Lim, Andrew 262, 272, 282, 711, 1138, 1189 Lin, Jin-Ling 1200 Lin, Si-Yan 1119 Lin, Ya-Tai 218 Lin, Yan-Xia 927 Liu, Alan 750 Liu, Ying 342 Liu, Yuan-Liang 760 Liu, Zhitao 1080 LiYun, He 1290 Loganantharaj, Raja 972 Loh, Han Tong 342 Lova, Antonio 400 Lu, Cheng-Feng 1318 Lu, Min 373 Lv, Pin 373 Ma, Hong 262, 272 Macek, Jan 674 Madani, Kurosh 155 Maneewarn, Thavida 60 Martin, Trevor 12 Martyna, Jerzy 1231
Masrur, Abul M. 1169 Melaye, Dimitri 731 Meliopoulos, Sakis A. 177 Mellal, Nacima 14 Meshoul, Souham 800, 809 Miao, Zhaowei 262, 1138 Mitra, Debasis 953 Montes de Oca, Saúl 520 Morales-Menéndez, Rubén 520 Moser, Irene 292 Mota, Alexandre Manuel 908 Mugnier, Marie-Laure 590 Mukai, Naoto 117 Mukkamala, Srinivas 619 Muñoz, Cesar 462 Muñoz R., Freddy 917 Murphey, Yi L. 1169 Murphey, Yi Lu 1309 Musliu, Nysret 302 Myers, Chris 962 Nakadai, Kazuhiro 207 Nakano, Mikio 207 Navarro, Nicolas 462 Németh, Erzsébet 70 Nepomuceno, Isabel 1339 Nepomuceno, Juan A. 1339 Nilsson, Carl-Magnus 701 Nkambou, Roger 838, 848 Noda, Jugo 573 Nolazco-Flores, Juan A. 530 Occello, Michel 101 Oddi, Angelo 421 Ogata, Tetsuya 207 Oh, Myung-Seob 1211 Oh, Soo-Hwan 668 Oh, Sung-Kwun 472 Ohta, Masakatsu 322 Okuno, Hiroshi G. 207 Onaindía, Eva 383, 1179 Ozcelik, Feristah 741 Ozkan, Metin 393 Pahikkala, Tapio 610 Park, Gwi-Tae 540, 830 Park, Ho-Sung 472 Park, Jong-Hee 770 Park, Jung-Il 238 Park, Min Chul 540
Park, Se Young 879 Park, Taejin 228 Parlaktuna, Osman 393 Pazienza, Maria Teresa 990, 1042 Peischl, Bernhard 660 Pelle, Josephine 848 Pennacchiotti, Marco 1042 Pereira, Luís Moniz 81 Pérez Reigosa, MariCarmen 530 Pes, Barbara 790 Policella, Nicola 410, 421 Potter, W.D. 244 Qi-lun, Zheng 353 Quirós, Fernando 462 Randall, Marcus 254 Rasconi, Riccardo 410 Rau, Hsin 1221 Rezaei, Jafar 1100 Ritthipravat, Panrasee 60 Roche, Christophe 583, 1034 Rodrigues, Brian 1138 Ryu, Kwang Ryel 228, 1159, 1211 Saga, Ryosuke 573 Sălăgean, Ana 127 Salakoski, Tapio 610 Salaün, Patrick 583 Salido, Miguel Angel 400 Samant, Gandhali 953 Santana, Pedro 81 Sapena, Oscar 383 Saricicek, Inci 393 Schuster, Alfons 187 Seeger P., Michael 917 Sengupta, Kuntal 953 Shao, Jinyan 34 Shen, Lixiang 342 Shr, Arthur M.D. 750 Shuo, Zhao 1290 Sie, Shun-hong 982 Simonet, Ana 1024 Simonet, Michel 1024 Sombattheera, Chattrakul 780, 1127 Soomro, Safeeullah 660 Spotton Visano, Brenda 44 Srikasam, Wasan 1090 Stellato, Armando 990 Su, Jin-Shieh 1111
Sung, Andrew H. 619 Szulman, Sylvie 1014 Takeda, Ryu 207 Tang, Deyou 1071 Tang, Yin 1059 Tarassenko, Lionel 1149 Tormos, Pilar 400 Tseng, Lin-Yu 218 Tsivtsivadze, Evgeni 610 Tsuji, Hiroshi 573 Tsujino, Hiroshi 207 Tuohy, Daniel R. 244 Valin, Jean-Marc 207 Vallbé, Joan-Josep 1000 Verikas, Antanas 701 Verrons, Marie-Hélène 731 Vieira, José 908 Viswanathan, M. 889 Wang, Chih-Huang 432 Wang, Hsing-Wen 564 Wang, Hui-Yu 442 Wang, Hung-Jen 1119 Wang, Jia-Wen 478 Wang, Long 34 Wang, Mei-Hui 1240 Wang, Tien-Chin 1329 Wang, Zhenyu 711 Watanabe, Toyohide 117 Wotawa, Franz 600, 639, 660 Wu, C.L. 111 Wurst, Michael 332 Wyatt, Jeremy 60 Xi, Jianqing 1071 Xu, Dennis 619
Yamamoto, Shun’ichi 207 Yang, Dongyong 859 Yang, Don-Lin 1299 Yang, Jia-Yan 1200 Yang, Y.K. 889 Yang, Yubin 510 Yap, Ivan 342 Yeh, Jian-hua 982 Yong-quan, Yu 353 Yoon, Sung H. 488
Yu, Junzhi 34 Yueli, Li 1290 Zanzotto, Fabio Massimo 1042 Zhang, Jin-fang 373 Zhang, Kaicheng 1189 Zhang, Ren 927 Zhang, Yao 510 Zhu, Wenbin 282 Zinglé, Henri 1053 Zou, Hua 1080