Lecture Notes in Artificial Intelligence Edited by J. G. Carbonell and J. Siekmann
Subseries of Lecture Notes in Computer Science
2639
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Tokyo
Guoyin Wang Qing Liu Yiyu Yao Andrzej Skowron (Eds.)
Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing 9th International Conference, RSFDGrC 2003 Chongqing, China, May 26-29, 2003 Proceedings
Volume Editors

Guoyin Wang
Chongqing University of Posts and Telecommunications, Institute of Computer Science and Technology
Chongqing, 400065, P.R. China
E-mail: [email protected]

Qing Liu
Nanchang University, Department of Computer Science
Nanchang, 330029, P.R. China
E-mail: [email protected]

Yiyu Yao
University of Regina, Department of Computer Science
Regina, Saskatchewan, S4S 0A2, Canada
E-mail: [email protected]

Andrzej Skowron
Warsaw University, Institute of Mathematics
Banacha 2, 02-097 Warsaw, Poland
E-mail: [email protected]

Cataloging-in-Publication Data applied for. A catalog record for this book is available from the Library of Congress. Bibliographic information published by Die Deutsche Bibliothek: Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet.

CR Subject Classification (1998): I.2, H.2.4, H.3, F.4.1, F.1, I.5, H.4
ISSN 0302-9743
ISBN 3-540-14040-9 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag Berlin Heidelberg New York, a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de
© Springer-Verlag Berlin Heidelberg 2003
Printed in Germany
Typesetting: Camera-ready by author, data conversion by PTP-Berlin GmbH
Printed on acid-free paper
SPIN: 10928639   06/3142   5 4 3 2 1 0
Preface

This volume contains the papers selected for presentation at the 9th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC 2003), held at Chongqing University of Posts and Telecommunications, Chongqing, P.R. China, May 26–29, 2003.

There were 245 submissions for RSFDGrC 2003, excluding the 2 invited keynote papers and 11 invited plenary papers. Apart from the 13 invited papers, 114 papers were accepted for RSFDGrC 2003 and were included in this volume. The acceptance rate was only 46.5%. These papers were divided into 39 regular oral presentation papers (each allotted 8 pages), 47 short oral presentation papers (each allotted 4 pages) and 28 poster presentation papers (each allotted 4 pages) on the basis of reviewer evaluations. Each paper was reviewed by three referees.

The conference is a continuation and expansion of the International Workshops on Rough Set Theory and Applications. In particular, this was the ninth meeting in the series and the first international conference. The aim of RSFDGrC 2003 was to bring together researchers from diverse fields of expertise in order to facilitate mutual understanding and cooperation, and to help in cooperative work aimed at new hybrid paradigms.

It is our great pleasure to dedicate this volume to Prof. Zdzislaw Pawlak, who first introduced the basic ideas and definitions of rough set theory over 20 years ago. Rough set theory has grown to be a useful method in soft computing. It has also been applied in many artificial intelligence systems and research fields, such as data mining, machine learning, pattern recognition, uncertain reasoning, granular computing, intelligent decision-making, etc. Many international conferences now include rough sets in their lists of topics.

The main theme of the conference was centered around rough set theory, fuzzy set theory, data mining technology, granular computing, and their applications. The papers contributed to this volume reflect advances in these areas and some other closely related research areas, such as:
− Rough sets foundations, methods, and applications
− Fuzzy sets and systems
− Data mining
− Granular computing
− Neural networks
− Evolutionary computing
− Machine learning
− Pattern recognition and image processing
− Logics and reasoning
− Multi-agent systems
− Web intelligence
− Intelligent systems

We wish to express our gratitude to Profs. Zdzislaw Pawlak, Bo Zhang, and Ling Zhang for accepting our invitation to be keynote speakers at RSFDGrC 2003. We also wish to thank Profs. Hongxing Li, Tsau Young Lin, Sankar K. Pal, Lech Polkowski, Andrzej Skowron, Hideo Tanaka, Shusaku Tsumoto, Shoujue Wang, Michael Wong,
Yiyu Yao, and Yixin Zhong, who accepted our invitation to present plenary papers at this conference.

We wish to express our thanks to the Honorary Chairs, General Chairs, Program Chairs, and the members of the Advisory Board, Zdzislaw Pawlak, Lotfi A. Zadeh, Tsau Young Lin, Andrzej Skowron, Shusaku Tsumoto, Guoyin Wang, Qing Liu, Yiyu Yao, James Alpigini, Nick Cercone, Jerzy Grzymala-Busse, Akira Nakamura, Sankar Pal, James F. Peters, Lech Polkowski, Zbigniew Ras, Roman Slowinski, Lianhua Xiao, Bo Zhang, Ning Zhong, Yixin Zhong, and Wojciech Ziarko, for their kind contribution to and support of the scientific program and many other conference-related issues. We also acknowledge the help of all the reviewers in reviewing papers. We want to thank all individuals who submitted valuable papers to the RSFDGrC 2003 conference and all conference attendees. We also wish to express our thanks to Alfred Hofmann at Springer-Verlag for his support and cooperation.

We are grateful to our sponsors and supporters: the National Natural Science Foundation of China, Chongqing University of Posts and Telecommunications, the Municipal Education Committee of Chongqing, China, the Municipal Science and Technology Committee of Chongqing, China, and the Bureau of Information Industry of Chongqing, China, for their financial and organizational support. We also would like to express our thanks to the Local Organizing Chair, the President of Chongqing University of Posts and Telecommunications, Prof. Neng Nie, for his great help and support in the whole process of preparing RSFDGrC 2003. We also want to thank the secretaries of the conference, Yu Wu, Hong Tang, Li Yang, Guo Xu, Lan Yang, Hongwei Zhang, Xinyu Li, Yunfeng Li, Dongyun Hu, Mulan Zhang, Anbo Dong, Jiujiang An, Zhengren Qin, and Zheng Zheng, for their help in preparing the RSFDGrC 2003 proceedings and organizing the conference.

May 2003
Guoyin Wang Qing Liu Yiyu Yao Andrzej Skowron
RSFDGrC 2003 Conference Committee
Honorary Chairs: Zdzislaw Pawlak, Lotfi A. Zadeh
General Chairs: Tsau Young Lin, Andrzej Skowron, Shusaku Tsumoto
Program Chairs: Guoyin Wang, Qing Liu, Yiyu Yao
Local Chairs: Neng Nie, Guoyin Wang

Advisory Board:
James Alpigini, Nick Cercone, Jerzy Grzymala-Busse, Akira Nakamura, Sankar Pal, James F. Peters, Lech Polkowski, Zbigniew Ras, Roman Slowinski, Lianhua Xiao, Bo Zhang, Ning Zhong, Yixin Zhong, Wojciech Ziarko

Local Committee:
Juhua Jing, Haoran Liu, Yuxiu Song, Hong Tong, Yu Wu, Li Yang
Program Committee
Peter Apostoli, Malcolm Beynon, Hans Dieter Burkhard, Qingsheng Cai, Mihir Kr. Chakraborty, Andrzej Czyzewski, Jitender S. Deogun, Didier Dubois, Ivo Duentsch, Maria C. Fernandez, Guenter Gediga, Fernando Gomide, Salvatore Greco, Xiaohua Hu, Masahiro Inuiguchi, Jouni Jarvinen, Fan Jin, Janusz Kacprzyk, Daijin Kim, Jan Komorowski, Jacek Koronacki, Bozena Kostek, Marzena Kryszkiewicz, Churn-Jung Liau, Yuefeng Li, Pawan Lingras, Chunnian Liu, Jiming Liu, Zongtian Liu, Brien Maguire, Solomon Marcus, Benedetto Matarazzo, Ernestina Menasalvas-Ruiz, Nakata Michinori, Sadaaki Miyamoto, Mikhail Moshkov, Tetsuya Murai, Hung Son Nguyen, Ewa Orlowska, Piero Pagliani, Gheorghe Paun, Witold Pedrycz, Henri Prade, Mohamed Quafafou, Vijay Raghavan, Sheela Ramanna, Ron Shapira, Qiang Shen, Zhongzhi Shi, Jerzy Stefanowski, Jaroslav Stepaniuk, Zbigniew Suraj, Roman Swiniarski, Andrzej Szalas, Marcin Szczuka, Francis E.H. Tay, Helmut Thiele, Mihaela Ulieru, Alicja Wakulicz-Deja, Hui Wang, Anita Wasilewska, Michael Wong, Xindong Wu, Keming Xie, Jingtao Yao, Huanglin Zeng, Wenxiu Zhang, Zhi-Hua Zhou
Table of Contents
Keynote Papers
Flow Graphs and Decision Algorithms . . . 1 Zdzislaw Pawlak
The Quotient Space Theory of Problem Solving . . . 11 Ling Zhang, Bo Zhang

Plenary Papers
Granular Computing (Structures, Representations, and Applications) . . . 16 Tsau Young Lin
Rough Sets: Trends and Challenges . . . 25 Andrzej Skowron, James F. Peters
A New Development on ANN in China – Biomimetic Pattern Recognition and Multi Weight Vector Neurons . . . 35 Shoujue Wang
On Generalizing Rough Set Theory . . . 44 Y.Y. Yao
Dual Mathematical Models Based on Rough Approximations in Data Analysis . . . 52 Hideo Tanaka
Knowledge Theory: Outline and Impact . . . 60 Y.X. Zhong
A Rough Set Paradigm for Unifying Rough Set Theory and Fuzzy Set Theory . . . 70 Lech Polkowski
Extracting Structure of Medical Diagnosis: Rough Set Approach . . . 78 Shusaku Tsumoto
A Kind of Linearization Method in Fuzzy Control System Modeling . . . 89 Hongxing Li, Jiayin Wang, Zhihong Miao
A Common Framework for Rough Sets, Databases, and Bayesian Networks . . . 99 S.K.M. Wong, D. Wu
Rough Sets, EM Algorithm, MST and Multispectral Image Segmentation . . . 104 Sankar K. Pal, Pabitra Mitra
Rough Sets Foundations and Methods
Rough Mereology: A Survey of New Developments with Applications to Granular Computing, Spatial Reasoning and Computing with Words . . . 106 Lech Polkowski
A New Rough Sets Model Based on Database Systems . . . 114 Xiaohua Tony Hu, Tsau Young Lin, Jianchao Han
A Rough Set and Rule Tree Based Incremental Knowledge Acquisition Algorithm . . . 122 Zheng Zheng, Guoyin Wang, Yu Wu
Comparison of Conventional and Rough K-Means Clustering . . . 130 Pawan Lingras, Rui Yan, Chad West
An Application of Rough Sets to Monk's Problems Solving . . . 138 Duoqian Miao, Lishan Hou
Pre-topologies and Dynamic Spaces . . . 146 Piero Pagliani
Rough Sets and Gradual Decision Rules . . . 156 Salvatore Greco, Masahiro Inuiguchi, Roman Słowiński
Explanation Oriented Association Mining Using Rough Set Theory . . . 165 Y.Y. Yao, Y. Zhao, R. Brien Maguire
Probabilistic Rough Sets Characterized by Fuzzy Sets . . . 173 Li-Li Wei, Wen-Xiu Zhang
A View on Rough Set Concept Approximations . . . 181 Jan Bazan, Nguyen Hung Son, Andrzej Skowron, Marcin S. Szczuka
Evaluation of Probabilistic Decision Tables . . . 189 Wojciech Ziarko
Query Answering in Rough Knowledge Bases . . . 197 Aida Vitória, Carlos Viegas Damásio, Jan Małuszyński
Upper and Lower Recursion Schemes in Abstract Approximation Spaces . . . 205 Peter Apostoli, Akira Kanda
Adaptive Granular Control of an HVDC System: A Rough Set Approach . . . 213 James F. Peters, H. Feng, Sheela Ramanna
Rough Set Approach to Domain Knowledge Approximation . . . . . . . . . . . . 221 Tuan Trung Nguyen, Andrzej Skowron Reasoning Based on Information Changes in Information Maps . . . . . . . . . 229 Andrzej Skowron, Piotr Synak
Characteristics of Accuracy and Coverage in Rule Induction . . . . . . . . . . . . 237 Shusaku Tsumoto Interpretation of Rough Neural Networks as Emergent Model . . . . . . . . . . . 245 Yasser Hassan, Eiichiro Tazaki Using Fuzzy Dependency-Guided Attribute Grouping in Feature Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250 Richard Jensen, Qiang Shen Conjugate Information Systems: Learning Cognitive Concepts in Rough Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255 Maria Semeniuk-Polkowska, Lech Polkowski A Rule Induction Method of Plant Disease Description Based on Rough Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259 Ai-Ping Li, Gui-Ping Liao, Quan-Yuan Wu Rough Set Data Analysis Algorithms for Incomplete Information Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264 K.S. Chin, Jiye Liang, Chuangyin Dang Inconsistency Classification and Discernibility-Matrix-Based Approaches for Computing an Attribute Core . . . . . . . . . . . . . . . . . . . . . . . . 269 Dongyi Ye, Zhaojiong Chen Multi-knowledge Extraction and Application . . . . . . . . . . . . . . . . . . . . . . . . . 274 QingXiang Wu, David Bell Multi-rough Sets Based on Multi-contexts of Attributes . . . . . . . . . . . . . . . . 279 Rolly Intan, Masao Mukaidono Approaches to Approximation Reducts in Inconsistent Decision Tables . . . 283 Ju-Sheng Mi, Wei-Zhi Wu, Wen-Xiu Zhang Degree of Dependency and Quality of Classification in the Extended Variable Precision Rough Sets Model . . . . . . . . . . . . . . . . . . . . . . . 287 Malcolm J. Beynon Approximate Reducts of an Information System . . . . . . . . . . . . . . . . . . . . . . 291 Tien-Fang Kuo, Yasutoshi Yajima A Rough Set Methodology to Support Learner Self-Assessment in Web-Based Distance Education . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295 Hongyan Geng, R. Brien Maguire A Synthesis of Concurrent Systems: A Rough Set Approach . . . . . . . . . . . . 299 Zbigniew Suraj, Krzysztof Pancerz Towards a Line-Crawling Robot Obstacle Classification System: A Rough Set Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303 James F. Peters, Sheela Ramanna, Marcin S. Szczuka
Order Based Genetic Algorithms for the Search of Approximate Entropy Reducts . . . 308 Dominik Ślęzak, Jakub Wróblewski
Variable Precision Bayesian Rough Set Model . . . 312 Dominik Ślęzak, Wojciech Ziarko
Linear Independence in Contingency Table . . . 316 Shusaku Tsumoto
The Information Entropy of Rough Relational Databases . . . 320 Yuefei Sui, Youming Xia, Ju Wang
A T-S Type of Rough Fuzzy Control System and Its Implementation . . . 325 Jinjie Huang, Shiyong Li, Chuntao Man
Rough Mereology in Knowledge Representation . . . 329 Cungen Cao, Yuefei Sui, Zaiyue Zhang
Rough Set Methods for Constructing Support Vector Machines . . . 334 Yuancheng Li, Tingjian Fang
The Lattice Property of Fuzzy Rough Sets . . . 339 Fenglan Xiong, Xiangqian Ding, Yuhai Liu
Querying Data from RRDB Based on Rough Sets Theory . . . 342 Qiusheng An, Guoyin Wang, Junyi Shen, Jiucheng Xu
An Inference Approach Based on Rough Sets . . . 346 Fuyan Liu, Shaoyi Lu
Classification Using the Variable Precision Rough Set . . . 350 Yongqiang Zhao, Hongcai Zhang, Quan Pan
An Illustration of the Effect of Continuous Valued Discretisation in Data Analysis Using VPRSβ . . . 354 Malcolm J. Beynon
Fuzzy Sets and Systems Application of Fuzzy Control Base on Changeable Universe to Superheated Steam Temperature Control System . . . . . . . . . . . . . . . . . . . . . 358 Keming Xie, Fang Wang, Gang Xie, Tsau Young Lin Application of Fuzzy Support Vector Machines in Short-Term Load Forecasting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363 Yuancheng Li, Tingjian Fang A Symbolic Approximate Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368 Mazen El-Sayed, Daniel Pacholczyk
Intuition in Soft Decision Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374 Kankana Chakrabarty Ammunition Supply Decision-Making System Design Based on Fuzzy Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378 Deyong Zhao, Xinfeng Wang, Jianguo Liu The Concept of Approximation Based on Fuzzy Dominance Relation in Decision-Making . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382 Yunxiang Liu, Jigui Sun, Sheng-sheng Wang An Image Enhancement Arithmetic Research Based on Fuzzy Set and Histogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386 Liang Ming, Guihai Xie, Yinlong Wang A Study on a Generalized FCM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390 Jian Yu, Miin-shen Yang Fuzzy Multiple Synapses Neural Network and Fuzzy Clustering . . . . . . . . . 394 Kai Li, Houkuan Huang, Jian Yu On Possibilistic Variance of Fuzzy Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . 398 Wei-Guo Zhang, Zan-Kan Nie
Granular Computing
Deductive Data Mining. Mathematical Foundation of Database Mining . . . 403 Tsau Young Lin
Information Granules for Intelligent Knowledge Structures . . . 405 Patrick Doherty, Witold Łukaszewicz, Andrzej Szałas
Design and Implement for Diagnosis Systems of Hemorheology on Blood Viscosity Syndrome Based on GrC . . . 413 Qing Liu, Feng Jiang, Dayong Deng
Granular Reasoning Using Zooming In & Out . . . 421 T. Murai, G. Resconi, M. Nakata, Y. Sato
A Pure Mereological Approach to Roughness . . . 425 Bo Chen, Mingtian Zhou
Neural Networks and Evolutionary Computing Knowledge Based Descriptive Neural Networks . . . . . . . . . . . . . . . . . . . . . . . 430 J.T. Yao Genetically Optimized Rule-Based Fuzzy Polynomial Neural Networks: Synthesis of Computational Intelligence Technologies . . . . . . . . 437 Sung-Kwun Oh, James F. Peters, Witold Pedrycz, Tae-Chon Ahn
Ant Colony Optimization for Navigating Complex Labyrinths . . . . . . . . . . 445 Zhong Yan, Chun-Wie Yuan An Improved Quantum Genetic Algorithm and Its Application . . . . . . . . . 449 Gexiang Zhang, Weidong Jin, Na Li Intelligent Generation of Candidate Sets for Genetic Algorithms in Very Large Search Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453 Julia R. Dunphy, Jose J. Salcedo, Keri S. Murphy Fast Retraining of Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 458 Dumitru-Iulian Nastac, Razvan Matei Fuzzy-ARTMAP and Higher-Order Statistics Based Blind Equalization . . 462 Dong-kun Jee, Jung-sik Lee, Ju-Hong Lee Comparison of BPL and RBF Network in Intrusion Detection System . . . 466 Chunlin Zhang, Ju Jiang, Mohamed Kamel Back Propagation with Randomized Cost Function for Training Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471 H.A. Babri, Y.Q. Chen, Kamran Ahsan
Data Mining, Machine Learning, and Pattern Recognition Selective Ensemble of Decision Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476 Zhi-Hua Zhou, Wei Tang A Maximal Frequent Itemset Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484 Hui Wang, Qinghua Li, Chuanxiang Ma, Kenli Li On Data Mining for Direct Marketing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 Chuangxin Ou, Chunnian Liu, Jiajing Huang, Ning Zhong A New Incremental Maintenance Algorithm of Data Cube . . . . . . . . . . . . . 499 Hongsong Li, Houkuan Huang, Youfang Lin Data Mining for Motifs in DNA Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . 507 David Bell, J.W. Guan Maximum Item First Pattern Growth for Mining Frequent Patterns . . . . 515 Hongjian Fan, Ming Fan, Bingzheng Wang Extended Random Sets for Knowledge Discovery in Information Systems . 524 Yuefeng Li Research on a Union Algorithm of Multiple Concept Lattices . . . . . . . . . . . 533 Zongtian Liu, Liansheng Li, Qing Zhang
A Theoretical Framework for Knowledge Discovery in Databases Based on Probabilistic Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541 Ying Xie, Vijay V. Raghavan An Improved Branch & Bound Algorithm in Feature Selection . . . . . . . . . . 549 Zhenxiao Wang, Jie Yang, Guozheng Li Classification of Caenorhabditis Elegans Behavioural Phenotypes Using an Improved Binarization Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 557 Won Nah, Joong-Hwan Baek Consensus versus Conflicts – Methodology and Applications . . . . . . . . . . . . 565 Ngoc Thanh Nguyen, Janusz Sobecki Interpolation Techniques for Geo-spatial Association Rule Mining . . . . . . . 573 Dan Li, Jitender Deogun, Sherri Harms Imprecise Causality in Mined Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581 Lawrence J. Mazlack Sphere-Structured Support Vector Machines for Multi-class Pattern Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589 Meilin Zhu, Yue Wang, Shifu Chen, Xiangdong Liu HIPRICE-A Hybrid Model for Multi-agent Intelligent Recommendation . . 594 ZhengYu Gong, Jing Shi, HangPing Qiu A Database-Based Job Management System . . . . . . . . . . . . . . . . . . . . . . . . . . 598 Ji-chuan Zheng, Zheng-guo Hu, Liang-liang Xing Optimal Choice of Parameters for a Density-Based Clustering Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603 Wenyan Gan, Deyi Li An Improved Parameter Tuning Method for Support Vector Machines . . . 607 Yong Quan, Jie Yang Approximate Algorithm for Minimization of Decision Tree Depth . . . . . . . 611 Mikhail J. Moshkov Virtual Reality Representation of Information Systems and Decision Rules: An Exploratory Technique for Understanding Data and Knowledge Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615 Julio J. Vald´es Hierarchical Clustering Algorithm Based on Neighborhood-Linked in Large Spatial Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619 Yi-hong Dong Unsupervised Learning of Pattern Templates from Unannotated Corpora for Proper Noun Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623 Seung-Shik Kang, Chong-Woo Woo
Approximate Aggregate Queries with Guaranteed Error Bounds . . . . . . . . 627 Seok-Ju Chun, Ju-Hong Lee, Seok-Lyong Lee Improving Classification Performance by Combining Multiple T AN Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631 Hongbo Shi, Zhihai Wang, Houkuan Huang Image Recognition Using Adaptive Fuzzy Neural Network and Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635 Huanglin Zeng, Yao Yi SOM Based Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 640 Yuan Jiang, Ke-Jia Chen, Zhi-Hua Zhou User’s Interests Navigation Model Based on Hidden Markov Model . . . . . . 644 Jing Shi, Fang Shi, HangPing Qiu Successive Overrelaxation for Support Vector Regression . . . . . . . . . . . . . . . 648 Yong Quan, Jie Yang, Chenzhou Ye Statistic Learning and Intrusion Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . 652 Xian Rao, Cun-xi Dong, Shao-quan Yang A New Association Rules Mining Algorithms Based on Directed Itemsets Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 660 Lei Wen, Minqiang Li A Distributed Multidimensional Data Model of Data Warehouse . . . . . . . . 664 Youfang Lin, Houkuan Huang, Hongsong Li
Logics and Reasoning An Overview of Hybrid Possibilistic Reasoning . . . . . . . . . . . . . . . . . . . . . . . 668 Churn-Jung Liau Critical Remarks on the Computational Complexity in Probabilistic Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 676 S.K.M. Wong, D. Wu, Y.Y. Yao Critical Remarks on the Maximal Prime Decomposition of Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682 Cory J. Butz, Qiang Hu, Xue Dong Yang A Non-local Coarsening Result in Granular Probabilistic Networks . . . . . 686 Cory J. Butz, Hong Yao, Howard J. Hamilton Probabilistic Inference on Three-Valued Logic . . . . . . . . . . . . . . . . . . . . . . . . 690 Guilin Qi Multi-dimensional Observer-Centred Qualitative Spatial-temporal Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 694 Yi-nan Lu, Sheng-sheng Wang, Sheng-xian Sha
Multi-agent Systems Architecture Specification for Design of Agent-Based System in Domain View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697 S.K. Lee, Taiyun Kim Adapting Granular Rough Theory to Multi-agent Context . . . . . . . . . . . . . 701 Bo Chen, Mingtian Zhou How to Choose the Optimal Policy in Multi-agent Belief Revision? . . . . . . 706 Yang Gao, Zhaochun Sun, Ning Li
Web Intelligence and Intelligent Systems Research of Atomic and Anonymous Electronic Commerce Protocol . . . . . 711 Jie Tang, Juan-Zi Li, Ke-Hong Wang, Yue-Ru Cai Colored Petri Net Based Attack Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715 Shijie Zhou, Zhiguang Qin, Feng Zhang, Xianfeng Zhang, Wei Chen, Jinde Liu Intelligent Real-Time Traffic Signal Control Based on a Paraconsistent Logic Program EVALPSN . . . . . . . . . . . . . . . . . . . . . . . . . . . 719 Kazumi Nakamatsu, Toshiaki Seno, Jair Minoro Abe, Atsuyuki Suzuki Transporting CAN Messages over WATM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 724 Ismail Erturk A Hybrid Intrusion Detection Strategy Used for Web Security . . . . . . . . . . 730 Bo Yang, Han Li, Yi Li, Shaojun Yang Mining Sequence Pattern from Time Series Based on Inter-relevant Successive Trees Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734 Haiquan Zeng, Zhan Shen, Yunfa Hu
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 739
Flow Graphs and Decision Algorithms

Zdzisław Pawlak

University of Information Technology and Management, ul. Newelska 6, 01-447 Warsaw, Poland
and
Chongqing University of Posts and Telecommunications, Chongqing, 400065, P.R. China
[email protected]

Abstract. In this paper we introduce a new kind of flow networks, called flow graphs, different from the one proposed by Ford and Fulkerson. Flow graphs are meant to be used as a mathematical tool for the analysis of information flow in decision algorithms, in contrast to the material flow optimization considered in classical flow network analysis. In the proposed approach, branches of the flow graph are interpreted as decision rules, while the whole flow graph can be understood as a representation of a decision algorithm. The information flow in flow graphs is governed by Bayes' rule; in our case, however, the rule does not have a probabilistic meaning and is entirely deterministic: it simply describes the distribution of information flow in flow graphs. This property can be used to draw conclusions from data without referring to its probabilistic structure.
1 Introduction

The paper is concerned with a new kind of flow networks, called flow graphs, different from the one proposed by Ford and Fulkerson [3]. The introduced flow graphs are intended to be used as a mathematical tool for information flow analysis in decision algorithms, in contrast to the material flow optimization considered in classical flow network analysis. In the proposed approach, branches of the flow graph are interpreted as decision rules, while the whole flow graph can be understood as a representation of a decision algorithm. It is shown that the information flow in flow graphs is governed by Bayes' formula; in our case, however, the rule does not have a probabilistic meaning and is entirely deterministic: it simply describes the distribution of information flow in flow graphs, without referring to its probabilistic structure. Although Bayes' rule is fundamental to statistical reasoning, it has led to many philosophical discussions concerning its validity and meaning, and has caused much criticism [1], [2]. In our setting, besides having a very simple mathematical form, Bayes' rule is free of its mystic flavor.
This paper is a continuation of the author's ideas presented in [6], [7], [8], where the relationship between Bayes' rule and flow graphs was introduced and studied. From a theoretical point of view the presented approach can be seen as a generalization of Łukasiewicz's ideas [4]; he first proposed to express probability in logical terms, claiming that probability is a property of propositional functions and can be replaced by truth values belonging to the interval [0, 1]. In the flow graph setting the truth values, and consequently probabilities, are interpreted as flow intensities in the branches of a flow graph. Besides, this leads to simple computational algorithms and a new interpretation of decision algorithms.

The paper is organized as follows. First, the concept of a flow graph is introduced. Next, the information flow distribution in the graph is defined and its relationship with Bayes' formula is revealed. Further, simplification of flow graphs is considered and the relationship of flow graphs and decision algorithms is analyzed. Finally, statistical independence and dependency between nodes is defined and studied. All concepts are illustrated by simple tutorial examples.
2 Flow Graphs

A flow graph is a directed, acyclic, finite graph G = (N, B, σ), where N is a set of nodes, B ⊆ N × N is a set of directed branches, and σ : B → R⁺ is a flow function. The input of x ∈ N is the set I(x) = {y ∈ N : (y, x) ∈ B}; the output of x ∈ N is defined as O(x) = {y ∈ N : (x, y) ∈ B}, and σ(x, y) is called the strength of (x, y). The input and output of a graph G are defined as I(G) = {x ∈ N : I(x) = ∅} and O(G) = {x ∈ N : O(x) = ∅}, respectively. Inputs and outputs of G are the external nodes of G; other nodes are the internal nodes of G.

With every node x of a flow graph G we associate its inflow and outflow, defined as

σ₊(x) = Σ_{y ∈ I(x)} σ(y, x),   σ₋(x) = Σ_{y ∈ O(x)} σ(x, y),

respectively. We assume that for any internal node x, σ₊(x) = σ₋(x) = σ(x), where σ(x) is the throughflow of x. The inflow and outflow of G are defined as

σ₊(G) = Σ_{x ∈ I(G)} σ₋(x),   σ₋(G) = Σ_{x ∈ O(G)} σ₊(x),

respectively. Obviously σ₊(G) = σ₋(G) = σ(G), where σ(G) is the throughflow of G. Moreover, we assume that σ(G) = 1. The above formulas can be considered as flow conservation equations [3].
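The definitions above map directly onto a small data structure. The following Python sketch is not part of the paper; it simply stores branch strengths and checks the conservation conditions σ₊(x) = σ₋(x) for internal nodes and σ(G) = 1 (all node names and numbers are illustrative):

```python
class FlowGraph:
    """A directed acyclic flow graph G = (N, B, sigma); sigma maps branches to strengths."""

    def __init__(self, branches):
        self.sigma = dict(branches)                 # {(x, y): sigma(x, y)}
        self.nodes = {n for branch in branches for n in branch}

    def inputs(self, x):                            # I(x)
        return {a for (a, b) in self.sigma if b == x}

    def outputs(self, x):                           # O(x)
        return {b for (a, b) in self.sigma if a == x}

    def inflow(self, x):                            # sigma_plus(x)
        return sum(self.sigma[(a, x)] for a in self.inputs(x))

    def outflow(self, x):                           # sigma_minus(x)
        return sum(self.sigma[(x, b)] for b in self.outputs(x))

    def throughflow(self, x):
        # for internal nodes inflow == outflow; inputs of G have only an outflow
        return self.inflow(x) if self.inputs(x) else self.outflow(x)

    def is_conserved(self, tol=1e-9):
        """Check sigma_plus(x) = sigma_minus(x) for internal nodes and sigma(G) = 1."""
        internal = [x for x in self.nodes if self.inputs(x) and self.outputs(x)]
        balanced = all(abs(self.inflow(x) - self.outflow(x)) < tol for x in internal)
        total = sum(self.outflow(x) for x in self.nodes if not self.inputs(x))
        return balanced and abs(total - 1.0) < tol
```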
3 Certainty and Coverage Factors

With every branch of a flow graph we associate the certainty and the coverage factors [9], [10]. The certainty and the coverage of (x, y) are defined as

cer(x, y) = σ(x, y)/σ(x)   and   cov(x, y) = σ(x, y)/σ(y),

respectively, where σ(x) is the throughflow of x. Below some properties, which are immediate consequences of the definitions given above, are presented:

Σ_{y ∈ O(x)} cer(x, y) = 1,   (1)
Σ_{x ∈ I(y)} cov(x, y) = 1,   (2)
cer(x, y) = cov(x, y)σ(y)/σ(x),   (3)
cov(x, y) = cer(x, y)σ(x)/σ(y).   (4)

Obviously the above properties have a probabilistic flavor, e.g., equations (3) and (4) are Bayes' formulas. However, these properties can be interpreted in a deterministic way: they describe the flow distribution among the branches of the network. Notice that the Bayes' formulas given above have a new interpretation, which leads to simple computations and gives new insight into the Bayesian methodology.

Example 1: Suppose that three models of cars x1, x2 and x3 are sold to three disjoint groups of customers z1, z2 and z3 through four dealers y1, y2, y3 and y4. Moreover, let us assume that car models and dealers are distributed as shown in Fig. 1.
Fig. 1. Cars and dealers distribution
4
Z. Pawlak
Computing the strength, certainty and coverage factors for each branch, we get the results shown in Fig. 2.
Fig. 2. Strength, certainty and coverage factors
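Since the numbers of Figs. 1 and 2 are given only graphically, the sketch below uses invented strengths for a small three-layer graph of the same shape; it computes cer and cov as defined above and checks properties (1) and (2):

```python
# hypothetical branch strengths sigma for a three-layer graph
# (car models -> dealers -> customer groups); NOT the values of Fig. 1
sigma = {
    ("x1", "y1"): 0.24, ("x1", "y2"): 0.16,
    ("x2", "y1"): 0.36, ("x2", "y2"): 0.24,
    ("y1", "z1"): 0.20, ("y1", "z2"): 0.40,
    ("y2", "z1"): 0.10, ("y2", "z2"): 0.30,
}

def throughflow(x):
    inflow = sum(v for (a, b), v in sigma.items() if b == x)
    outflow = sum(v for (a, b), v in sigma.items() if a == x)
    return inflow if inflow > 0 else outflow   # internal nodes: inflow == outflow

def cer(x, y):   # certainty factor
    return sigma[(x, y)] / throughflow(x)

def cov(x, y):   # coverage factor
    return sigma[(x, y)] / throughflow(y)

# properties (1) and (2): certainties out of a node and coverages into a node sum to 1
assert abs(cer("x1", "y1") + cer("x1", "y2") - 1.0) < 1e-9
assert abs(cov("y1", "z1") + cov("y2", "z1") - 1.0) < 1e-9
```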
4 Paths and Connections

A (directed) path from x to y, x ≠ y, is a sequence of nodes x1, …, xn such that x1 = x, xn = y and (xi, xi+1) ∈ B for every i, 1 ≤ i ≤ n−1. A path from x to y is denoted by [x, y]. The certainty of a path [x1, xn] is defined as

cer[x1, xn] = Π_{i=1}^{n−1} cer(xi, xi+1),   (5)

the coverage of a path [x1, xn] is

cov[x1, xn] = Π_{i=1}^{n−1} cov(xi, xi+1),   (6)

and the strength of a path [x, y] is

σ[x, y] = σ(x) cer[x, y] = σ(y) cov[x, y].   (7)

The set of all paths from x to y (x ≠ y), denoted ⟨x, y⟩, will be called a connection from x to y. In other words, the connection ⟨x, y⟩ is a sub-graph determined by the nodes x and y. The certainty of the connection ⟨x, y⟩ is

cer⟨x, y⟩ = Σ_{[x,y] ∈ ⟨x,y⟩} cer[x, y],   (8)

the coverage of the connection ⟨x, y⟩ is

cov⟨x, y⟩ = Σ_{[x,y] ∈ ⟨x,y⟩} cov[x, y],   (9)

and the strength of the connection ⟨x, y⟩ is

σ⟨x, y⟩ = Σ_{[x,y] ∈ ⟨x,y⟩} σ[x, y].   (10)

Let x, y (x ≠ y) be nodes of G. If we substitute the sub-graph ⟨x, y⟩ by a single branch (x, y) such that σ(x, y) = σ⟨x, y⟩, then cer(x, y) = cer⟨x, y⟩, cov(x, y) = cov⟨x, y⟩ and σ(G) = σ(G′), where G′ is the graph obtained from G by substituting ⟨x, y⟩ by (x, y).

Example 1 (cont.). In order to find how car models are distributed among customer groups we have to compute all connections between car models and consumer groups. The results are shown in Fig. 3.
Fig. 3. Relation between car models and consumer groups
For example, we can see from the flow graph that of the cars bought by consumer group z2, 21% were of model x1, 35% of model x2 and 44% of model x3. Conversely, car model x1 is distributed among the customer groups as follows: 31% of its cars were bought by group z1, 57% by group z2 and 12% by group z3.
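Formulas (5)–(10) can be computed by enumerating paths and multiplying branch certainties along each of them, then summing over the paths of a connection. The sketch below is illustrative only; the certainty values are assumed, not taken from the figures:

```python
# illustrative certainty factors of a three-layer flow graph (assumed values)
cer = {
    ("x1", "y1"): 0.6, ("x1", "y2"): 0.4,
    ("x2", "y1"): 0.6, ("x2", "y2"): 0.4,
    ("y1", "z1"): 1 / 3, ("y1", "z2"): 2 / 3,
    ("y2", "z1"): 0.25, ("y2", "z2"): 0.75,
}

def successors(x):
    return [b for (a, b) in cer if a == x]

def paths(x, y):
    """All directed paths [x, ..., y]; the graph is acyclic, so recursion terminates."""
    if x == y:
        return [[y]]
    return [[x] + p for s in successors(x) for p in paths(s, y)]

def path_certainty(path):                      # formula (5)
    return 1.0 if len(path) < 2 else cer[(path[0], path[1])] * path_certainty(path[1:])

def connection_certainty(x, y):                # formula (8)
    return sum(path_certainty(p) for p in paths(x, y))

print(connection_certainty("x1", "z1"))        # 0.6 * 1/3 + 0.4 * 0.25 = 0.3
```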
5 Decision Algorithms With every branch ( x, y ) we associate a decision rule x→y, read if x then y; x will be referred to as a condition, whereas y – decision of the rule. Such a rule is characterized by three numbers, σ ( x, y ), cer ( x, y ) and cov( x, y ).
Thus every path [x1, xn] determines a sequence of decision rules x1→x2, x2→x3, …, xn−1→xn. From previous considerations it follows that this sequence of decision rules can be interpreted as a single decision rule x1x2…xn−1→xn, in short x*→xn, where x* = x1x2…xn−1, characterized by

cer(x*, xn) = cer[x1, xn],   (11)
cov(x*, xn) = cov[x1, xn],   (12)

and

σ(x*, xn) = σ(x1) cer[x1, xn] = σ(xn) cov[x1, xn].   (13)

Similarly, every connection ⟨x, y⟩ can be interpreted as a single decision rule x→y such that:

cer(x, y) = cer⟨x, y⟩,   (14)
cov(x, y) = cov⟨x, y⟩,   (15)

and

σ(x, y) = σ(x) cer⟨x, y⟩ = σ(y) cov⟨x, y⟩.   (16)

Let [x1, xn] be a path such that x1 is an input and xn an output of the flow graph G. Such a path, and the corresponding connection ⟨x1, xn⟩, will be called complete. The set of all decision rules x_{i1} x_{i2} … x_{in−1} → x_{in} associated with all complete paths [x_{i1}, x_{in}] will be called the decision algorithm induced by the flow graph. The set of all decision rules x_{i1} → x_{in} associated with all complete connections ⟨x_{i1}, x_{in}⟩ in the flow graph will be referred to as the combined decision algorithm determined by the flow graph.

Example 1 (cont.). The decision algorithm induced by the flow graph shown in Fig. 2 is given below:

Rule no.   Rule           Strength
1)         x1 y1 → z1     0.036
2)         x1 y1 → z2     0.072
3)         x1 y1 → z3     0.012
. . .
20)        x3 y4 → z1     0.025
21)        x3 y4 → z2     0.075
22)        x3 y4 → z3     0.150

For the sake of simplicity we give only some of the decision rules of the decision algorithm; the interested reader can easily complete all the remaining decision rules. Similarly, we can compute the certainty and coverage for each rule.

Remark 1. Due to round-off errors in computations, the equalities (1)–(16) may not be satisfied exactly in these examples.
The combined decision algorithm associated with the flow graph shown in Fig. 3 is given below:

Rule no.   Rule        Strength
1)         x1 → z1     0.06
2)         x1 → z2     0.11
3)         x1 → z3     0.02
4)         x2 → z1     0.06
5)         x2 → z2     0.18
6)         x2 → z3     0.06
7)         x3 → z1     0.10
8)         x3 → z2     0.23
9)         x3 → z3     0.18

This decision algorithm can be regarded as a simplification of the decision algorithm given previously and shows how car models are distributed among customer groups.
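For a layered graph such as the one in Example 1, the combined decision algorithm can also be obtained by composing layers: by (5) and (8), cer⟨x, z⟩ = Σ_y cer(x, y)·cer(y, z), i.e., a product of the certainty matrices of consecutive layers, and σ(x→z) = σ(x)·cer⟨x, z⟩ by (16). The sketch below uses assumed values, not those of the figures:

```python
import numpy as np

# certainty matrices of the two layers (rows: sources, cols: targets); assumed values
models, groups = ["x1", "x2"], ["z1", "z2"]
C_xy = np.array([[0.6, 0.4],            # cer(x_i, y_j), illustrative
                 [0.6, 0.4]])
C_yz = np.array([[1 / 3, 2 / 3],        # cer(y_j, z_k), illustrative
                 [0.25, 0.75]])

# cer<x, z> = sum_y cer(x, y) * cer(y, z): a matrix product over the middle layer
C_xz = C_xy @ C_yz

sigma_x = np.array([0.4, 0.6])          # throughflows of the input nodes, summing to 1
strength = sigma_x[:, None] * C_xz      # sigma(x -> z) = sigma(x) * cer<x, z>

for i, x in enumerate(models):
    for k, z in enumerate(groups):
        print(f"{x} -> {z}: cer={C_xz[i, k]:.2f}, strength={strength[i, k]:.2f}")
```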
6 Independence of Nodes in Flow Graphs

Let x and y be nodes in a flow graph G = (N, B, σ), such that (x, y) ∈ B. Nodes x and y are independent in G if

σ(x, y) = σ(x)σ(y).   (17)

From (17) we get

cer(x, y) = σ(x, y)/σ(x) = σ(y),   (18)

and

cov(x, y) = σ(x, y)/σ(y) = σ(x).   (19)

If

cer(x, y) > σ(y),   (20)

or

cov(x, y) > σ(x),   (21)

then y depends positively on x in G. Similarly, if

cer(x, y) < σ(y),   (22)

or

cov(x, y) < σ(x),   (23)

then y depends negatively on x in G. Let us observe that the relations of independence and dependence are symmetric, and are analogous to those used in statistics.
Example 1 (cont.). In flow graphs presented in Fig. 2 and Fig. 3 there are no independent nodes, whatsoever. However, e.g. nodes x1,y1 are positively dependent, whereas, nodes y1,z3 are negatively dependent. Example 2. Let X = {1,2,…,8}, x∈X and let a1 denote “x is divisible by 2”, a0 – “x is not divisible by 2”. Similarly, b1 stands for “x is divisible by 3” and b0 – “x is not divisible by 3”. Because there are 50% elements divisible by 2 and 50% elements not divisible by 2 in X, therefore we assume σ(a1) = ½ and σ(a0) = ½. Similarly, σ(b1) = ¼ and σ(b0) = ¾ because there are 25% elements divisible by 3 and 75% not divisible by 3 in X, respectively. The corresponding flow graph is presented in Fig. 4.
Fig. 4. Divisibility by “2” and “3”
The pairs of nodes (a0,b0), (a0,b1), (a1,b0) and (a1,b1) are independent, because, e.g., cer(a0,b0) = σ(b0) (cov(a0,b0) = σ(a0)).
Fig. 5. Divisibility by “2” and “4”
The pairs of nodes (a0,b0), (a1,b0) and (a1,b1) are dependent. Pairs (a0,b0) and (a1,b1) are positively dependent, because cer(a0,b0) > σ(b0) (cov(a0,b0) > σ(a0)) and – cer(a1,b1) > σ(b1) (cov(a1,b1) > σ(a1)). Nodes (a1,b0) are negatively dependent, because cer(a1,b0) < σ(b0) (cov(a1,b0) < σ(a1)).
For every branch (x, y) ∈ B we define a dependency factor η(x, y) as

η(x, y) = (cer(x, y) − σ(y)) / (cer(x, y) + σ(y)) = (cov(x, y) − σ(x)) / (cov(x, y) + σ(x)).   (24)

Obviously −1 ≤ η(x, y) ≤ 1; η(x, y) = 0 if and only if cer(x, y) = σ(y) and cov(x, y) = σ(x); η(x, y) = −1 if and only if cer(x, y) = cov(x, y) = 0; η(x, y) = 1 if and only if σ(y) = σ(x) = 0.

It is easy to check that if η(x, y) = 0 then x and y are independent, if −1 ≤ η(x, y) < 0 then x and y are negatively dependent, and if 0 < η(x, y) ≤ 1 then x and y are positively dependent. Thus the dependency factor expresses a degree of dependency, and can be seen as a counterpart of the correlation coefficient used in statistics.

For example, in the flow graph presented in Fig. 4 we have: η(a0, b0) = 0, η(a0, b1) = 0, η(a1, b0) = 0 and η(a1, b1) = 0. However, in the flow graph shown in Fig. 5 we have η(a0, b0) = 1/7, η(a1, b0) = −1/5 and η(a1, b1) = 1/3. The meaning of the above results is obvious.
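The dependency factor (24) is straightforward to compute once the strengths are known. The sketch below, written for this note in Python, reproduces the values quoted above for the divisibility-by-2/by-4 graph of Fig. 5:

```python
from fractions import Fraction as F

# Example 3: X = {1,...,8}, a = divisibility by 2, b = divisibility by 4
X = range(1, 9)
sigma = {"a1": F(1, 2), "a0": F(1, 2), "b1": F(1, 4), "b0": F(3, 4)}

def branch_strength(div_by_2, div_by_4):
    # fraction of X falling into the given (a, b) cell
    hits = [x for x in X if (x % 2 == 0) == div_by_2 and (x % 4 == 0) == div_by_4]
    return F(len(hits), len(X))

def eta(a_node, b_node, div_by_2, div_by_4):
    cer = branch_strength(div_by_2, div_by_4) / sigma[a_node]   # cer(x, y)
    return (cer - sigma[b_node]) / (cer + sigma[b_node])        # formula (24)

print(eta("a0", "b0", False, False))   # 1/7
print(eta("a1", "b0", True,  False))   # -1/5
print(eta("a1", "b1", True,  True))    # 1/3
```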
7 Conclusions

In this paper a relationship between flow graphs and decision algorithms has been defined and studied. It has been shown that the information flow in a decision algorithm can be represented as a flow in a flow graph. Moreover, the flow is governed by Bayes' formula; however, in this setting Bayes' formula has an entirely deterministic meaning and does not refer to any probabilistic structure. Besides, the formula has a new simple form, which essentially simplifies the computations. This leads to many new applications and also gives new insight into the Bayesian philosophy.

Acknowledgement. Thanks are due to Professor Andrzej Skowron for critical remarks.
References

1. Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Chichester, New York, Brisbane, Toronto, Singapore (1994)
2. Box, G.E.P., Tiao, G.C.: Bayesian Inference in Statistical Analysis. John Wiley and Sons, Inc., New York, Chichester, Brisbane, Toronto, Singapore (1992)
3. Ford, L.R., Fulkerson, D.R.: Flows in Networks. Princeton University Press, Princeton, New Jersey
4. Łukasiewicz, J.: Die logischen Grundlagen der Wahrscheinlichkeitsrechnung. Kraków (1913). In: L. Borkowski (ed.), Jan Łukasiewicz – Selected Works, North Holland Publishing Company, Amsterdam, London, Polish Scientific Publishers, Warsaw (1970)
5. Greco, S., Pawlak, Z., Słowiński, R.: Generalized Decision Algorithms, Rough Inference Rules, and Flow Graphs. In: J.J. Alpigini et al. (eds.), Lecture Notes in Artificial Intelligence 2475 (2002) 93–104
6. Pawlak, Z.: In Pursuit of Patterns in Data Reasoning from Data – The Rough Set Way. In: J.J. Alpigini et al. (eds.), Lecture Notes in Artificial Intelligence 2475 (2002) 1–9
7. Pawlak, Z.: Rough Sets, Decision Algorithms and Bayes' Theorem. European Journal of Operational Research 136 (2002) 181–189
8. Pawlak, Z.: Decision Rules and Flow Networks (to appear)
9. Tsumoto, S., Tanaka, H.: Discovery of Functional Components of Proteins Based on PRIMEROSE and Domain Knowledge Hierarchy. Proceedings of the Workshop on Rough Sets and Soft Computing (RSSC-94), 1994; Lin, T.Y., Wildberger, A.M. (eds.), Soft Computing, SCS (1995) 280–285
10. Wong, S.K.M., Ziarko, W.: Algorithm for Inductive Learning. Bull. Polish Academy of Sciences 34, 5–6 (1986) 271–276
The Quotient Space Theory of Problem Solving1 Ling Zhang1, 3 and Bo Zhang2, 3 1
Artificial Intelligence Institute, Anhui University, Hefei, Anhui, China 230039
[email protected] 2 Department of Computer Science & Technology, Tsinghua University, Beijing, China 100084
[email protected] 3 State Key Lab of Intelligent Technology & Systems
Abstract. The talk introduces a framework of quotient space theory of problem solving. In the theory, a problem (or problem space) is represented as a triplet, including the universe, its structure and attributes. The worlds with different grain size are represented by a set of quotient spaces. The basic characteristics of different grain-size worlds are presented. Based on the model, the computational complexity of hierarchical problem solving is discussed.
1 Introduction It’s well known that one of the basic characteristics in human problem solving is the ability to conceptualize the world at different granularities and translate from one abstraction level to the others easily, i.e., deal with them hierarchically [1]. In order to analyze and understand the above human behavior, we presented a quotient space model in [2]. The model we proposed was intended to describe the worlds with different grain-size easily and can be used for analyzing the hierarchical problem solving behavior expediently. Based on the model, we have obtained several characteristics of the hierarchical problem solving and developed a set of its approaches in heuristic search, path planning, etc. since the approaches can deal with a problem at different grain size so that the computational complexity may greatly be reduced. The model can also be used to deal with the combination of information obtained from different grain-size worlds (different views), i.e., information fusion. The theory and some recent well-known works on granule computing [3]-[11] have something in common such as the “grain sizes” of the world are partitioned by equivalence relations, a problem can be described under different grain sizes, etc. But we mainly focus on the relationship among the universes with different grain size and the translation among different knowledge bases rather than single knowledge base. In our model, a problem (or space, world) was represented as a triplet, including universe (space), space structure, and attributes, that is, the structure of the space was represented explicitly. In the following discussion, it can be seen that the space structure is of very important in the model. In this talk, we present the framework of 1
Supported by the National Natural Science Foundation of China Grant No. 60135010, the National Key Foundation R&D Project under Grant No. G1998030509
G. Wang et al. (Eds.): RSFDGrC 2003, LNAI 2639, pp. 11–15, 2003. © Springer-Verlag Berlin Heidelberg 2003
12
L. Zhang and B. Zhang
our quotient space theory and the two basic characteristics of different grain-size worlds.
2 The Framework of Quotient Space Theory

Problem Representation at Different Granularities. The aim of representing a problem at different granularities is to enable the computer to solve the same problem at different grain sizes hierarchically. Suppose that a triplet (X, F, f) describes a problem space, or simply a space (X, F), where X denotes the universe, F is the structure of the universe X, and f indicates the attributes (or features) of the universe X. Suppose that X represents the universe with the finest grain size. When we view the same universe X from a coarser grain size, we have a coarse-grained universe denoted by [X], and then a new problem space ([X], [F], [f]). The coarser universe [X] can be defined by an equivalence relation R on X. That is, an element in [X] is equivalent to a set of elements, an equivalence class, in X, so [X] consists of all the equivalence classes obtained from R. From F and f we can define the corresponding [F] and [f]. Then we have a new space ([X], [F], [f]), called a quotient space of (X, F, f). Assume R is the set of all equivalence relations on X. Define a "coarse-fine" relation on R.

… the number of common features, is an integer and may gradually decrease as N increases. Ignoring all the samples that do not fit {f_k}, the remaining N′ samples are named a basic set of the samples. When N and N′ are sufficiently large and {f_k} tends to no longer change as N increases, the CFS is then said to be stable and a concept, abstracted from the class of information samples, is thus formed. The CFS {f_k} is called the intension of the concept, whereas all the members of the class are called the extension of the concept. Otherwise, return to step 3 until the CFS is stable. This is the simplest description of a possible production mechanism of a concept. Based on the procedure described above, it is easy to establish any concepts of interest and to distinguish one concept from another. This is really the basis of knowledge creation.

5.2 The Production Mechanism of Utility Knowledge

It is interesting to note, before going to the next point, that because content knowledge is more abstract than utility knowledge in nature, we can only discuss the mechanism of utility knowledge first and then move to that of content knowledge. In most cases, a subject, or a system, should have a certain objective (goal), and the utility of a piece of information toward the subject/system can then be judged in terms of the contribution the information may make to the implementation of the objective.

Production Mechanism 5-2. The procedure of utility knowledge production from information may need the following steps:
(1) Clearly define the general objective, G, for the concerned subject. G may further be decomposed into a number of sub-objectives, G = {G_n}_{n=1}^N, that are easy to test to see whether any sub-objective is threatened or supported by any external stimulus.
(2) Inputting a piece of information X, the subject records and stores the formal description, D(X), of X, including the description of its states and the manner in which the states vary.
(3) Observe the influence that the information makes on {G_n}: whether any sub-objective suffers threatening or gains support from the information. The former indicates the positive utility, u_n^+, and the latter the negative utility, u_n^-, for all n. An average value of utility, ū, may also need to be considered.
(4) Set up the descriptor {D(X), G; u}, which means that the information with the formal descriptor D(X) provides a utility u to the subject whose objective is G. Notice that {D(X), G; u} is just another expression of utility knowledge.
(5) For any new information received, the utility knowledge can be obtained by comparing the formal description of the new information with D(X). As long as the new information has formal features similar to D(X), it will be regarded to have utility u to the subject having objective G, in accordance with {D(X), G; u}. Otherwise, a loop from step 1 to 4 will be needed.

This mechanism of utility knowledge production is feasible either for humans or for artificial systems. The key steps are the definition of the objective the subject/system has and the calculation of the utility based on the comparison and analysis.

5.3 The Production Mechanism of Content Knowledge

As mentioned above, content knowledge can only be produced after the production of formal and utility knowledge. The production mechanism of content knowledge can be stated as below.

Production Mechanism 5-3. The procedure of content knowledge production may contain the steps:
(1) The production of the related formal knowledge, K_F, by production mechanism 5-1;
(2) The production of the related utility knowledge, K_U, by production mechanism 5-2;
(3) The establishment of a linkage between K_F and K_U such that K_C: K_F α K_U;
(4) The content knowledge is "K_F α K_U", in which α expresses the logical relationship between K_F and K_U, such as "be", "have", "do", and so on;
(5) Name the meaning of the content knowledge by a term.
6 The Activation of Knowledge: The Relation between Knowledge and Intelligence [4]

It is clear from the definition of intelligence in Section 2 that intelligence consists of a number of basic component factors: information acquisition, knowledge creation, strategy formation, and strategy execution, etc. We have discussed the issues of knowledge creation in Section 5, and here we would like to discuss the issue of the formation of strategy from knowledge. This is also termed knowledge activation. How can knowledge be practically activated into an intelligent strategy? Suppose that the information about the present state of the problem has already been stored in the database and that the knowledge needed for solving the problem is also stored in the knowledge base and the rule base. Moreover, the goal that the problem solving is seeking has been set up.

Knowledge Activation 6-1. The algorithm of knowledge activation can then be described as below:
(1) Set up a threshold ε0 that is an allowable error of goal seeking.
(2) Establish a measure that indicates the error ε between the problem state and the goal state.
(3) Select an applicable rule (a rule whose conditions match the present state of the problem) from the rule base and apply it to the database, producing a new state of the problem in the database.
(4) If the error between the new state and the goal state is larger than ε0, select another rule that would be able to make the error decrease. As long as the error is decreased but still larger than ε0, continue the rule selection and application.
(5) When ε ≤ ε0, the algorithm terminates and the problem is solved. The strategy is the sequence of the successful rule selections for the problem solving. Otherwise, go back to step (2).

The algorithm shows that the intelligent strategy for problem solving can be formed through the utilization of the knowledge that is related to the problem, the environment (the constraints of the problem solving) and the goal, including the contents previously stored in the knowledge base and the rule base.
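Algorithm 6-1 is, in effect, a goal-directed, error-reducing production-system loop. The following Python sketch is one possible reading of it; the state representation, the rule format, and the error measure are all placeholders, not prescribed by the paper:

```python
def activate_knowledge(state, goal, rules, error, eps0, max_steps=1000):
    """Knowledge Activation 6-1: greedily apply rules that reduce the error to the goal.

    state  -- current problem state (the 'database')
    rules  -- iterable of (condition, action) pairs (the 'rule base')
    error  -- function measuring the distance between a state and the goal
    eps0   -- allowed error threshold
    Returns the strategy: the sequence of rules that were applied, or None on failure.
    """
    strategy = []
    for _ in range(max_steps):
        eps = error(state, goal)
        if eps <= eps0:
            return strategy                       # goal reached; strategy is the answer
        # applicable rules: those whose condition matches the present state
        candidates = [(c, a) for (c, a) in rules if c(state)]
        # keep only rules whose application would decrease the error
        improving = [(c, a) for (c, a) in candidates if error(a(state), goal) < eps]
        if not improving:
            return None                           # no rule helps; activation fails
        cond, act = improving[0]
        state = act(state)                        # apply the rule to the database
        strategy.append((cond, act))
    return None
```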
7 The Impact: A Unified Theory of Information-Knowledge-Intelligence

It is a pity that there has been a lack of a systematic theory of knowledge up to the present time. We have had "knowledge engineering" for quite some time, but it does not touch the nucleus of knowledge theory: the essential linkage among information, knowledge, and intelligence. Nevertheless, the world is more and more urgently demanding a knowledge theory as the knowledge-based economy advances rapidly everywhere. We have presented the fundamentals of knowledge theory in Sections 2 through 4 of the paper. We have also delivered the main body of the knowledge theory in Sections 5 and 6, which showed that knowledge can be refined from information
through induction mechanism and intelligence can be formed from knowledge through activation mechanism. By jointly employing the two mechanisms, a kernel of the unified theory of information, knowledge, and intelligence may be established. This is the impact that knowledge theory may have. It is necessary to mention that all the results presented in the paper are of course a preliminary work on knowledge theory. Many extensive issues of knowledge theory remain open.
References
[1] C. E. Shannon, The Mathematical Theory of Communication, BSTJ, vol. 27, 1948, pp. 379–423, 632–656
[2] A. Feigenbaum et al., Handbook of Artificial Intelligence, Vol. 1, William Kaufmann, Inc., 1981
[3] Y. X. Zhong, Principles of Information Science, UPT Press, Beijing, 1988 (I), 1996 (II)
[4] Y. X. Zhong, A Framework of Knowledge Theory, Journal of China Electronics, Vol. 28, No. 5, 2000
[5] J. Aczel, Lectures on Functional Equations and Their Applications, Academic Press, 1966
[6] G. H. Hardy, et al., Inequalities, Cambridge University Press, London, 1973
A Rough Set Paradigm for Unifying Rough Set Theory and Fuzzy Set Theory

Lech Polkowski

Polish–Japanese Institute of Information Technology, Koszykowa 86, 02008 Warsaw, Poland
Department of Mathematics and Computer Science, University of Warmia and Mazury, Żołnierska 14a, 10561 Olsztyn, Poland
[email protected]

To Professors Zdzislaw Pawlak and Lotfi A. Zadeh

Abstract. In this plenary address, we would like to discuss rough inclusions, defined in Rough Mereology, a joint idea with A. Skowron, as a basis for common models for rough as well as fuzzy set theories. We would like to justify the point of view that tolerance (or similarity) is the leading motif common to both theories, and that it is in this area that paths between the two lie.

Keywords. Rough set theory, fuzzy set theory, rough mereology, rough inclusion.
1 Introduction

We give here some basic insight into the two theories.

1.1 Rough Sets: Basic Ideas
Rough Set Theory begins with the idea (cf. [6], [5]) of an approximation space, understood as a universe U together with a family R of equivalence relations on U (a knowledge base). Given a sub-family S ⊆ R, the equivalence relation ⋂S induces a partition P_S of U into equivalence classes [x]_S of this relation. In terms of P_S, concept approximation is possible; a concept is a subset X ⊆ U. There are two cases: (1) the concept X is S-exact: X = ⋃{[x]_S : [x]_S ⊆ X}; (2) otherwise, X is said to be S-rough. In case (2), the idea of approximation comes useful (cf. [5]). Two exact sets, approximating X from below and from above, are

(low) S̲X = ⋃{[x]_S : [x]_S ⊆ X},
(upp) S̄X = ⋃{[x]_S : [x]_S ∩ X ≠ ∅}.   (1)

Then clearly: (1) S̲X ⊆ X ⊆ S̄X; (2) S̲X (resp. S̄X) is the largest (resp. the smallest) S-exact set contained in (resp. containing) X.
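The approximations in (1) can be computed directly from an equivalence relation given as a partition; the following Python sketch uses an invented universe and partition for illustration:

```python
def approximations(partition, X):
    """Lower and upper approximations (1) of a concept X w.r.t. a partition of U."""
    X = set(X)
    lower = set().union(*(set(c) for c in partition if set(c) <= X))
    upper = set().union(*(set(c) for c in partition if set(c) & X))
    return lower, upper

# invented example: U = {1,...,6}, classes of some equivalence relation, concept X
partition = [{1, 2}, {3, 4}, {5, 6}]
X = {1, 2, 3}
low, upp = approximations(partition, X)
assert low == {1, 2} and upp == {1, 2, 3, 4}
assert low <= X <= upp    # the lower approximation is contained in X, X in the upper
```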
A Rough Set Paradigm for Unifying Rough Set Theory
71
Sets (concepts) with identical approximations may be identified: consider an equivalence relation ≈S defined as follows (cf.[5]) X ≈S Y ⇔ [S]X = SY ∧ SX = SY
(2)
This is clearly an equivalence relation; let ConceptsS denote the set of these classes. Then for x, y ∈ ConceptsS , we have x = y ⇔ ∀u ∈ U.φu (x) = φu (y) ∧ ψu (x) = ψu (y)
(3)
where for x = [X]≈S we have φu (x) = 1 in case [u]S ⊆ X, otherwise φu (x) = 0; similarly, ψu (x) = 1 in case [u]S ∩ X = ∅, otherwise ψu (x) = 0. The formula (3) witnesses the leibnizian indiscernibility in ConceptS : entities are distinct if and only if they are discerned by at least one of available functionals. The idea of indiscernibility is one of the most fundamental in Rough Set Theory (cf. [5]). Other fundamental notions are derived from the observation on complexity of the generation of S: one may ask whether there is some T ⊂ S such that (*) S = T ; in case the answer is positive one may search for a minimal with respect to inclusion subset U ⊂ S satisfying (*). Such a subset is said to be an S–reduct. Letus observe that U ⊂ S is an S–reduct if and only if for each R ∈ U we have that (U \ {R}) = U; in this case we say that U is independent. These ideas come afore most visibly in case of knowledge representation in the form of information systems cf.[5]. An information system is a universe U along with a set A of attributes each of which is a mapping from U into a value set V . Clearly, each attribute a ∈ A does induce a relation of a–indiscernibility IN D(a) defined as follows: xIN D(a)y ⇔ a(x) = a(y), x, y ∈ U
(4)
The family {Ind(a) : a ∈ A} is a knowledge base and for each B ⊆ A, the relation IN D(B) = {IN D(a) : a ∈ B} is defined. The notions of B– indiscernibility, B–reduct, independence are defined as in general case cf. [5]. 1.2
Fuzzy Sets: Basic Notions
A starting point for Fuzzy Set Theory is that of a fuzzy set cf. [12]. Fuzzy sets come as a generalization of the usual mathematical idea of a set: given a universe U , a set X in U is expressed by means of its characteristic function χX : χX (u) = 1 in case u ∈ X, χX (u) = 0, otherwise. A fuzzy set X is defined by allowing χX to take values in the interval [0, 1]. Thus, χX (u) ∈ [0, 1] is a measure of the degree to which u is in X. Once fuzzy sets are defined, one may define a fuzzy algebra of sets by defining operators responsible for the union, the intersection, the complement of fuzzy sets. Usually, these are defined by selecting a t–norm, a t–conorm, and a negation functions where a t–norm T (x, y) is a function allowing the representation (cf.[10], Ch. 14): T (x, y) = g(f (x) + f (y)) (5)
72
L. Polkowski
where the function f : [0, 1] → [0, +∞) in (5) is continuous decreasing on [0, 1] and g is the pseudo–inverse to f (i.e. g(u) = 0 in case u ∈ [0, f (1)], g(u) = f −1 (u) in case u ∈ [f (1), f (0)], and g(u) = 1 in case u ∈ [f (0), +∞)). A t–conorm C is induced by a t–norm T via the formula C(x, y) = 1 − T (1 − x, 1 − y). A negation n : [0, 1] → [0, 1] is a continuous decreasing function such that n(n(x)) = x. An important example of a t–norm is the Lukasiewicz product ⊗(x, y) = max{0, x+y−1} cf. [8]; we recall also the Menger product P rod(x, y) = x · y. 1.3
Direct Bridging of Rough and Fuzzy Universes
It is natural to create a fuzzy universe within a rough one: given an information system (U, A) and a set B ⊆ A of attributes, and a concept X ⊆ U , a rough membership function mB,X on U is defined as follows cf. [7]: mB,X (u) =
|X ∩ [u]B | |[u]B |
(6)
It may be noticed that (1) in case X is a B–exact set, mB,X is the characteristic function of X i.e. X is perceived as a crisp set (2) mB,X is a piece-wise constant function, constant on classes of IN D(B) (3) from mB,X , rough set approximations are reconstructed via: BX = {u ∈ U : mB,X (u) = 1} BX = {u ∈ U : mB,X (u) > 0
(7)
In this case, as witnessed by (7) the rough and fuzzy notions of necessity and possibility coincide.
2
Rough vs. Fuzzy: Logical Opposition
With each concept X ⊆ U , and a set of attributes B, two approximations BX, BX are defined; for u ∈ U , the sentence u ∈ X may acquire one of the three logical values: 1 in case u ∈ BX (certainty), 0 in case u ∈ / BX (impossibility), and ? in case u ∈ BX \ BX (don’t know). It follows that rough sets are related to 3–valued logic. 2.1
3–Valued Logic
It was proposed in [3] as a logic with truth–values 0, 1, 12 in which the implication functor Cpq and the negation functor N p were defined by means of: C(x, y) = min{1, 1 − x + y} N (x) = 1 − x These values may be seen in a table.
(8)
A Rough Set Paradigm for Unifying Rough Set Theory
73
Table 1. 3–logic of L ukasiewicz C 0 1 12 N 0 111 1 1 0 1 12 0 1 1 1 1 12 2 2
Example 1. Truth table for 3–logic The L ukasiewicz logic was completely axiomatized by M. Wajsberg (cf. [10], Ch. 4) by means of the axiom schemes: W1 Cq(Cpq) W2 C(Cpq)C((Cqr)(Cpr)) W3 C(C(C(pN p)p)p) W4 C((CN qN p)(Cpq)) Formulae W 1 − W 4 may be translated into an algebra W by first identifying formulae α ≈ β of 3–logic in case α ⇔ β is a theorem of the 3–logic, next considering classes [α]≈ of formulae, and then defining operations → and N , and constant 1 on the set of classes as follows: 1. [α]≈ → [β]≈ = [Cαβ]≈ . 2. N [α]≈ = [N α]≈ . 3. [α]≈ = 1 where α is a theorem. Then W 1 − W 4 are rendered as follows in the resulting algebra W (the Wajsberg algebra): w1 x → (y → x) = 1 w2 (x → y) → ((y → z) → (x → z)) = 1 w3 ((x → N x) → x) → x) = 1 w4 (N x → N y) → (y → x) = 1 w5 if 1 → x = 1 then x = 1 w6 x → y = 1 and y → x = 1 imply x = y.
Now, let us look at concepts in a given universe U as defined by means of attributes in A: a concept X is represented via its approximations (I = AX, C = AX) (I for interior, C for closure). Letting E = U \ C (E for exterior), we represent a concept as a pair (I, E); always, I ∩ E = ∅. In case I ∪ E = U , the represented concept is exact; otherwise it is rough and then manifestly, no {x} ⊆ D \ I is exact. In the family C of pairs (I, E) as above, one may introduce operations →, N following Becchio and Pagliani (cf. [10], Ch. 12) (to simplify, we write −X for U \ X): N (I, E) = (E, I)//(I1 , E1 ) → (I2 , E2 ) = (E1 ∪ I2 ∪ −I1 ∩ −E2 , I1 ∩ E2 )
(9)
Then (cf. [10], pp. 361 ff.): Proposition 1. The algebra (C, →, N, 0 = (∅, U ), 1 = (U, ∅)) does satisfy w1.– w6. Proof. A proof is in verifying directly that w1.–w6. are observed under the rough set interpretation. For instance, with x = (I1 , E1 ), y = (I2 , E2 ), the formula w1. yields on the left–hand side x → (y → x) = (E1 ∪ E2 ∪ I1 ∪ −I2 ∩ −E1 ∪ −I1 ∩ −I2 ∪ −I1 ∩ −E1 , ∅) = (U, ∅) = 1.
74
L. Polkowski
Summing up the above, it turns out that 3–logic describes rough sets in logical terms. The corresponding logic for fuzzy sets is clearly [0,1]–valued logic. Functors C, N are expressed by (8) and a set of axiom formulae was conjectured by L ukasiewicz [4] and proved complete by Rose and Rosser (cf. [10], Ch. 13, for references). [0,1]–valued logic enters fuzzy logic in a way envisioned by Goguen (cf. [10] for references) i.e. given a fuzzy set of axioms and fuzzy derivation rules, formulae are produced along with their degrees of truth. Thus, a fuzzy derivation rule is two–fold, in its syntactic part giving rules for new formula forming and in its semantic parts producing the truth degree of the new formula from truth degrees of its parents. As in [8], fuzzy logic is interpreted in the L ukasiewicz lattice L = ([0, 1], ∨, ∧, ⊗, −→, 0, 1) where ∨(x, y) = max{x, y}, ∧(x, y) = min{x, y}, ⊗(x, y) = max{0, x + y − 1}, x −→ y = min{1, 1 − x + y}. A fuzzy set A of axiom formulae is defined; derivation rules are fuzzy detachx,y α x ment: ( α,α⇒β , ⊗(x,y) ) and lifting: ( a⇒α , a−→x ) for a ∈ (0, 1). β The set F of formulae of fuzzy logic is the smallest set containing a set V of variables, the set {a : a ∈ (0, 1) of constants and closed under functors ∨, ∧, ⇒, † interpreted as respectively, ∨, ∧, −→, ⊗. The set A consists among others of theses of [0, 1]–logic, formulae like a ∧ b ⇒ (a ∧ b), a ∨ b ⇒ (a ∨ b), a ⇒ b ⇒ (a −→ b), α ⇒ 1, α † 1 = 1. The fuzzy set A is given via its membership function χA : F → L outlined as follows: χA (a) = a χA (a † b) = ⊗(a, b) (10) χA (a ⇒ b) = a −→ b Otherwise, χA (α) is 1 for α a thesis of [0,1]–logic, and 0 for all other cases of α. Given this, for each fuzzy set X : F → L , we find the smallest set CS (X) : F →L such that (1) A ∨ X ≤ CS (X) (2) for each derivation rule R = (R1 , R2 ): χX (R1 (x1 , ..., xk )) ≥ R2 (χX (x1 ), ..., χX (xk )) and we say that a formula α is a syntactic consequence of X in degree at least a = χCS (X) (α), in symbols X a α. Analogously a semantics is introduced: the semantics is the family E of fuzzy homomorphisms T : F → L i.e. T (α ‡ β) = T (α) ⊕ T (β) where ‡, ⊕ = ∨, ∨, ∧, ∧, †, ⊗, ⇒, −→, respectively. Given E, the semantic consequence CE (X) = {Y ∈ E : X ≤ Y} : L F →L F is defined. We say that α is a semantic consequence to X in degree at least a in case χCE (X) (α) ≥ a, in symbols X |=a α. The basic result of [8] is that CS = CE i.e. fuzzy logic is complete. It is noteworthy that ⊗ is the unique up to isomorphism functor making this logic complete (loc.cit). From the above short discussion we see that rough and fuzzy worlds are opposite ends of many–valued logical scale. Bridging them will be the task in next section.
A Rough Set Paradigm for Unifying Rough Set Theory
3
75
A Common Extension to Rough and Fuzzy: Rough Mereological Similarity
We introduce a reasoning mechanism whose distinct facets correspond to rough respectively fuzzy approaches to reasoning. We outline its main ideas. 3.1
Classical Ideas of Mereology
The basic notion is that of part relation on a universe U cf. [2]; in symbols: xπy reads x is a part of y. It should satisfy the following: p1 xπy ∧ yπz ⇒ xπz; p2 ¬(xπx) The notion of an element, x el y, is the following: x el y ⇔ xπy ∨ x = y. Thus, x el y ∧ y el x is equivalent to x = y. The relation el is an ordering of the universe U . In order to make a non–empty concept M ⊆ U into an entity, the class operator Cls is used cf. [2]. The definition of the class of M , in symbols Cls(M ), is as follows: c1 x ∈ M ⇒ x el Cls(M ) c2 x el Cls(M ) ⇒ ∃y, z.y el x ∧ y el z ∧ z ∈ M
(11)
Condition c1 in (11) includes all members of M into Cls(M ) as elements; condition c2 demands that each element of Cls(M ) has an element in common with a member of M (compare this with the definition of the union of a family of sets). We demand also that Cls(M ) be unique. From this demand it follows that cf. [2]: (Inf ) [∀y.(y el x ⇒ ∃w.w el y ∧ w el z)] ⇒ x el z (12) (Inf) is a useful rule for recognizing that x el z. 3.2
Rough Mereology
Given a mereological universe (U, π), we introduce cf. [11], on U × U × [0, 1] a relation µ(x, y, r) read x is a part of y to degree at least r. We will write rather more suggestively xµr y, calling µ a rough inclusion. We require of µ the following: r1 xµ1 x r2 xµ1 y ⇔ x el y r3 xµ1 y ⇒ ∀z, r.(zµr x ⇒ zµr y) r4 xµr y ∧ s < r ⇒ xµs y
(13)
Informally, r2 ties rough inclusions to mereological underlying universes, r3 does express monotonicity (a bigger entity cuts a bigger part of everything), r4 says that a part in degree r is a part to any lesser degree.
76
3.3
L. Polkowski
Basic Examples
In addition to the rough membership function, we consider the following rough inclusions. Gaussian linear (Gl) rough inclusion. Given (U, A), we define DISA (x, y) = {a ∈ A; a(x) = a(y)} and then yµr x if and only if e(−Σa∈DISA (x,y) wa ≥ r where r x,xµs z wa ∈ (0, ∞) is a weight attached to a. Then cf. [9], the rule yµyµ holds. rs z L ukasiewicz rough inclusion. In notation of the above, we let yµr x if and A (x,y)| r x,xµs z only if 1 − |DIS|A| ≥ r. Then cf. [9], the rule yµ yµ⊗(r,s) z holds. The reader will find in [9] a detailed discussion of the two inclusions, a part of this address. Gl and Luk satisfy the additional requirement: r y,yµs z r5. there exists a t–norm f such that the inference rule xµ xµf (r,s) z holds. We say that µ is an f –rough inclusion. 3.4
Rough World from Rough Inclusions
Assume a rough inclusion µ and a property M as above; given an entity x, we define its lower M, µ–approximation xM as follows: xM = Cls({y ∈ Y : y ∈ M ∧ yµ1 x})
(14)
Then we have as a direct consequence of (Inf) in (12): (L1) xM el x. Again applying (Inf), we obtain: (L2) xM el xM M hence xM = xM M . We will say that x is M –exact in case there exists M0 ⊆ M such that x = Cls(M0 ). Then we have: (L3) x = xM if and only if x is M –exact. Indeed, if x = xM then x = Cls({y ∈ M : yµ1 x}). The converse follows from (Inf) directly. Similarly, we introduce the upper M, µ–approximation xM : xM = Cls({y ∈ U : y ∈ M ∧ ∃r > 0.yµr x})
(15)
We say that M is an rm–covering in case the following holds: (rmcov) ∀y. y el x ⇒ ∃w.y el w ∧ w ∈ M ∧ ∃r > 0.wµr x
(16)
Then, again by (Inf): (U1) if M is an rm–covering, then x el xM . We say that M is an rm–partition in case the following holds: 1. yµr z ∧ y, z ∈ M ⇒ r = 0 2. y ∈ M ∧ M0 ⊆ M ∧ ∀z ∈ M0 .yµ0 z ⇒ yµ0 Cls(M0 ). Then: (U2) if M is an rm–covering and an rm–partition then xM = xM M . Again, using (inf) we get: (U3) for an rm–partition and rm–covering M , x = xM if and only if x = Cls(M0 ) for some non-empty M0 ⊆ M . Thus: (LU) xM = x = xM if and only if x is M –exact.
A Rough Set Paradigm for Unifying Rough Set Theory
3.5
77
Fuzzy World from Rough Inclusions
Because of yµ1 x equivalent to y el x, we may interpret yµr x as a statement of fuzzy membership; we will write µx (y) = r to stress this interpretation. Thus, rough inclusions induce globally a family of fuzzy sets {X : X = Cls(V ) ∧ ∅ = V ⊆ U } with fuzzy membership functions µX . We assume r5 additionally for µ with a t–norm f . Let us consider a relation τ on U defined as follows: xτr y ⇔ xµr y ∧ yµr x
(17)
τr is for each r a tolerance relation. Also: 1. xτ1 x 2. xτr y ⇔ yτr x 3. xτr y ∧ yτs z ⇒ xτf (r,s) z. Thus, τ is the f –fuzzy similarity cf. [13]. From this a number of results on fuzzy equivalences and partitions cf. [10] may be derived along the lines indicated.
References 1. L. Borkowski (ed.). Jan L ukasiewicz. Selected Works. North Holland – Polish Sci. Publ., Amsterdam – Warsaw, 1970. 2. S. Le´sniewski. On the foundations of mathematics. Topoi, 2: 7–52, 1982. 3. J. L ukasiewicz. Farewell lecture, Warsaw Univ., March 1918. In: [1], pp. 84–86. 4. J. L ukasiewicz and A. Tarski. Untersuchungen ueber den Aussagenkalkuls. In: [1], pp. 130–152. 5. Z. Pawlak. Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer, Dordrecht, 1992. 6. Z. Pawlak. Rough sets. Intern. J. Comp. Inf. Sci., 11 (1982), pp. 341–356. 7. Z. Pawlak and A. Skowron. Rough membership functions. In R. R. Yager, M. Fedrizzi, and J. Kacprzyk, editors, Advances in the Dempster-Schafer Theory of Evidence, 251–271, Wiley, New York, 1994. 8. J. Pavelka. On fuzzy logic I,II,III. Zeit. Math. Logik Grund. Math., 25, 1979, pp. 45–52, 119–134, 447–464. 9. L. Polkowski. Rough Mereology. A Survey of new results... These Proceedings. 10. L. Polkowski. Rough Sets. Mathematical Foundations. Physica, Heidelberg, 2002. 11. L. Polkowski, A. Skowron. Rough mereology: a new paradigm for approximate reasoning. International Journal of Approximate Reasoning, 15(4): 333–365, 1997. 12. L. A. Zadeh. Fuzzy sets. Information and Control, 8 (1965), pp. 338–353. 13. L. A. Zadeh. Similarity relations and fuzzy orderings. Information Sciences, 3 (1971), pp. 177–200.
Extracting Structure of Medical Diagnosis: Rough Set Approach Shusaku Tsumoto Department of Medical Informatics, Shimane Medical University, School of Medicine, 89-1 Enya-cho Izumo City, Shimane 693-8501 Japan
[email protected] Abstract. One of the most important problems on rule induction methods is that they cannot extract rules, which plausibly represent experts’ decision processes. It is because rule induction methods induce probabilistic rules that discriminates between a target concept and other concepts, assuming that all the concepts are on the same level. However, medical experts assume that all the concepts of diseases are belonging to the different level of hierarchy. In this paper, the characteristics of experts’ rules are closely examined and a new approach to extract plausible rules is introduced, which consists of the following three procedures. First, the characterization of decision attributes (given classes) is extracted from databases and the concept hierarchy for given classes is calculated. Second, based on the hierarchy, rules for each hierarchical level are induced from data. Then, for each given class, rules for all the hierarchical levels are integrated into one rule. The proposed method was evaluated on a medical database, the experimental results of which show that induced rules correctly represent experts’ decision processes.
1
Introduction
One of the most important problems in data mining is that extracted rules are not easy for domain experts to interpret. One of its reasons is that conventional rule induction methods [6] cannot extract rules, which plausibly represent experts’ decision processes [8]: the description length of induced rules is too short, compared with the experts’ rules. For example, rule induction methods, including AQ15 [2] and PRIMEROSE [8], induce the following common rule for muscle contraction headache from databases on differential diagnosis of headache: [location = whole] ∧[Jolt Headache = no] ∧[Tenderness of M1 = yes] → muscle contraction headache.
This rule is shorter than the following rule given by medical experts. [Jolt Headache = no] ∧([Tenderness of M0 = yes] ∨[Tenderness of M1 = yes] ∨[Tenderness of M2 = yes]) ∧[Tenderness of B1 = no] ∧[Tenderness of B2 = no] ∧[Tenderness of B3 = no] ∧[Tenderness of C1 = no] ∧[Tenderness of C2 = no] ∧[Tenderness of C3 = no] ∧[Tenderness of C4 = no] → muscle contraction headache
where [Tenderness of B1 = no] and [Tenderness of C1 = no] are added. G. Wang et al. (Eds.): RSFDGrC 2003, LNAI 2639, pp. 78–88, 2003. c Springer-Verlag Berlin Heidelberg 2003
Extracting Structure of Medical Diagnosis: Rough Set Approach
79
One of the main reasons why rules are short is that these patterns are generated only by one criteria, such as high accuracy or high information gain. The comparative studies[8,9] suggest that experts should acquire rules not only by one criteria but by the usage of several measures. Those characteristics of medical experts’ rules are fully examined not by comparing between those rules for the same class, but by comparing experts’ rules with those for another class[8]. For example, the classification rule for muscle contraction headache given in Section 1 is very similar to the following classification rule for disease of cervical spine: [Jolt Headache = no] ∧([Tenderness of M0 = yes] ∨[Tenderness of M1 = yes] ∨[Tenderness of M2 = yes]) ∧([Tenderness of B1 = yes] ∨[Tenderness of B2 = yes] ∨[Tenderness of B3 = yes] ∨[Tenderness of C1 = yes] ∨[Tenderness of C2 = yes] ∨[Tenderness of C3 = yes] ∨[Tenderness of C4 = yes]) → disease of cervical spine
The differences between these two rules are attribute-value pairs, from tenderness of B1 to C4. Thus, these two rules can be simplified into the following form: a1 ∧ A2 ∧ ¬A3 → muscle contraction headache a1 ∧ A2 ∧ A3 → disease of cervical spine The first two terms and the third one represent different reasoning. The first and second term a1 and A2 are used to differentiate muscle contraction headache and disease of cervical spine from other diseases. The third term A3 is used to make a differential diagnosis between these two diseases. Thus, medical experts first select several diagnostic candidates, which are very similar to each other, from many diseases and then make a final diagnosis from those candidates. In this paper, the characteristics of experts’ rules are closely examined and a new approach to extract plausible rules is introduced, which consists of the following three procedures. First, the characterization of decision attributes (given classes) is extracted from databases and the concept hierarchy for given classes is calculated. Second, based on the hierarchy, rules for each hierarchical level are induced from data. Then, for each given class, rules for all the hierarchical levels are integrated into one rule. The proposed method was evaluated on a medical database, the experimental results of which show that induced rules correctly represent experts’ decision processes.
2 2.1
Rough Set Theory and Probabilistic Rules Rough Set Notations
In the following sections, we use the following notations introduced by GrzymalaBusse and Skowron[7], which are based on rough set theory[3]. These notations are illustrated by a small database shown in Table 1, collecting the patients who complained of headache.
80
S. Tsumoto
Let U denote a nonempty, finite set called the universe and A denote a nonempty, finite set of attributes, i.e., a : U → Va for a ∈ A, where Va is called the domain of a, respectively.Then, a decision table is defined as an information system, A = (U, A ∪ {d}). For example, Table 1 is an information system with U = {1, 2, 3, 4, 5, 6} and A = {age, location, nature, prodrome, nausea, M 1} and d = class. For location ∈ A, Vlocation is defined as {occular, lateral, whole}. The atomic formulae over B ⊆ A ∪ {d} and V are expressions of the form [a = v], called descriptors over B, where a ∈ B and v ∈ Va . The set F (B, V ) of formulas over B is the least set containing all atomic formulas over B and closed with respect to disjunction, conjunction and negation. For example, [location = occular] is a descriptor of B. For each f ∈ F (B, V ), fA denote the meaning of f in A, i.e., the set of all objects in U with property f , defined inductively as follows. 1. If f is of the form [a = v] then, fA = {s ∈ U |a(s) = v} 2. (f ∧ g)A = fA ∩ gA ; (f ∨ g)A = fA ∨ gA ; (¬f )A = U − fa By the use of the framework above, classification accuracy and coverage, or true positive rate is defined as follows. Definition 1. Let R and D denote a formula in F (B, V ) and a set of objects which belong to a decision d. Classification accuracy and coverage(true positive rate) for R → d is defined as: αR (D) =
|RA ∩ D| |RA ∩ D| , and κR (D) = , |RA | |D|
where |˙|, αR (D), κR (D) and P(S) denote the cardinality of a set, a classification accuracy of R as to classification of D and coverage (a true positive rate of R to D), respectively. Finally, we define partial order of equivalence as follows: Definition 2. Let Ri and Rj be the formulae in F (B, V ) and let A(Ri ) denote a set whose elements are the attribute-value pairs of the form [a, v] included in Ri . If A(Ri ) ⊆ A(Rj ), then we represent this relation as: Ri Rj . 2.2
Probabilistic Rules
According to the definitions, probabilistic rules with high accuracy and coverage are defined as: α,κ
R → d s.t. R = ∨i Ri = ∨ ∧j [aj = vk ], αRi (D) ≥ δα and κRi (D) ≥ δκ , where δα and δκ denote given thresholds for accuracy and coverage, respectively.
Extracting Structure of Medical Diagnosis: Rough Set Approach
3
81
Characterization Sets
In order to model medical reasoning, a statistical measure, coverage plays an important role in modeling, which is a conditional probability of a condition (R) under the decision D(P (R|D)). Let us define a characterization set of D, denoted by L(D) as a set, each element of which is an elementary attribute-value pair R with coverage being larger than a given threshold, δκ . That is, Definition 3. Let R denote a formula in F (B, V ). Characterization sets of a target concept (D) is defined as: Lδκ (D) = {R|κR (D) ≥ δκ } Then, three types of relations between characterization sets can be defined as follows: Independent type: Lδκ (Di ) ∩ Lδκ (Dj ) = φ, Boundary type: Lδκ (Di ) ∩ Lδκ (Dj ) = φ, and Positive type: Lδκ (Di ) ⊆ Lδκ (Dj ). All three definitions correspond to the negative region, boundary region, and positive region, respectively, if a set of the whole elementary attribute-value pairs will be taken as the universe of discourse. We consider the special case of characterization sets in which the thresholds of coverage is equal to 1.0. That is, L1.0 (D) = {Ri |κRi (D) = 1.0} Then, we have several interesting characteristics. Theorem 1. Let Ri and Rj two formulae in L1.0 (D) such that Ri Rj . Then, αRi ≤ αRj . Proof. Since κRi and κRj are 1.0, Ri ∩ D = D and Rj ∩ D = D. From the |D| A ∩D| definition of accuracy, αR (D) = |R|R = |R . Since Ri Rj , |RiA | ≥ |Rj A |. A| A| Therefore, αRi (D) =
|D| |D| ≤ = αRj (D) RiA Rj A
Thus, when we collect the formulae whose values of coverage are equal to 1.0, the sequence of conjunctive formulae corresponds to the sequence of increasing chain of accuracies. Since κR (D) = 1.0 means that the meaning of R covers all the samples of D, its complement U − RA , that is, ¬R do not cover any samples of D. Especially, when R consists of the formulae with the same attributes, it can be viewed as the generation of the coarsest partitions. Thus,
82
S. Tsumoto
procedure T otal P rocess; var inputs LD : List; /* A list of Target Concepts */ begin Calculate a set of characterization set Lc ; Calculate a set of intersection Lid ; Calculate a list of similarity measures Ls ; Calculate a list of grouping Lg ; (Fig. 2) Induce a set of rules for Lg : Lr ; (Fig. 3) Combine Rules in Lr for each Di ; end {T otal P rocess} Fig. 1. An Algorithm for Total Process
Theorem 2. Let R be a formula in L1.0 (D) such that R = ∨j [ai = vj ]. Then, R and ¬R gives the coarsest partition for ai , whose R includes D.
From the propositions 1 and 2, the next theorem holds. Theorem 3. Let A consist of {a1 , a2 , · · · , an } and Ri be a formula in L1.0 (D) such that Ri = ∨j [ai = vj ]. Then, a sequence of a conjunctive formula F (k) = ∧ki=1 Ri gives a sequence which increases the accuracy.
4
Rule Induction with Grouping
As discussed in Section 2, When the coverage of R for a target concept D is equal to 1.0, R is a necessity condition of D. That is, a proposition D → R holds and its contrapositive ¬R → ¬D holds. Thus, if R is not observed, D cannot be a candidate of a target concept. Thus, if two target concepts have a common formula R whose coverage is equal to 1.0, then ¬R supports the negation of two concepts, which means these two concepts belong to the same group. Furthermore, if two target concepts have similar formulae Ri , Rj ∈ L1.0 (D), they are very close to each other with respect to the negation of two concepts. In this case, the attribute-value pairs in the intersection of L1.0 (Di ) and L1.0 (Dj ) give a characterization set of the concept that unifies Di and Dj , Dk . Then, compared with Dk and other target concepts, classification rules for Dk can be obtained. When we have a sequence of grouping, classification rules for a given target concepts are defined as a sequence of subrules. From these ideas, a rule induction algorithm with grouping target concepts can be described as Figure 1. First, this algorithm calculates a characterization set L1.0 (Di ) for {D1 , D2 , · · · , Dk }. Second, from the list of characterization sets, it calculates the intersection between L1.0 (Di ) and L1.0 (Dj ) and stores it into Lid . Third, the procedure calculates the similarity (matching number)of the intersections and sorts Lid with respect of the similarities. Finally, the algorithm chooses one intersection (Di ∩ Dj ) with maximum similarity (highest matching number) and group Di and Dj into a concept DDi . These procedures will be continued until all the grouping is considered.
Extracting Structure of Medical Diagnosis: Rough Set Approach
83
procedure Grouping ; var inputs Lc : List; /* A list of Characterization Sets */ Lid : List; /* A list of Intersection */ Ls : List; /* A list of Similarity */ var outputs Lgr : List; /* A list of Grouping */ var k : integer; Lg , Lgr : List; begin Lg := {} ; k := n; /* n: A number of Target Concepts*/ Sort Ls with respect to similarities; Take a set of (Di , Dj ), Lmax with maximum similarity values; k:= k+1; forall (Di , Dj ) ∈ Lmax do begin Group Di and Dj into Dk ; Lc := Lc − {(Di , L1.0 (Di )}; Lc := Lc − {(Dj , L1.0 (Dj )}; Lc := Lc + {(Dk , L1.0 (Dk )}; Update Lid for DDk ; Update Ls ; Lgr := ( Grouping for Lc , Lid , and Ls ) ; Lg := Lg + {{(Dk , Di , Dj ), Lg }}; end return Lg ; end {Grouping}
Fig. 2. An Algorithm for Grouping
5
Example
Let us consider the case of Table 1 as an example for rule induction. For a similarity function, we use a matching number[1] which is defined as the cardinality of the intersection of two the sets. Also, since Table 1 has five classes, k = 6. 5.1
Grouping
From this table, the characterization set for each concept is obtained as shown in Fig 4. Then, the intersection between two target concepts are calculated. Since common and classic have the maximum matching number, these two classes are grouped into one category, D6 . Then, teh characterization of D6 is obtained as : D6 = {[loc = lateral], [nat = thr], [jolt = 1], [nau = 1], [M 1 = 0], [M 2 = 0]} from Fig 5. In the second iteration, the intersection of D1 and others is considered as shown in Fig 6. From this matrix, we have two possibilities of grouping: one is to group m.c.h. and i.m.l. That is, these two diseases are grouped into D7 : D7 = {([loc = occular] ∨ [loc = whole]), [nat = per], [prod = 0]} The other one is to group D1 and i.m.l., where D7 = {[jolt = 1], [M 1 = 0], [M 2 = 0]}.
84
S. Tsumoto
procedure RuleInduction ; var inputs Lc : List; /* A list of Characterization Sets */ Lid : List; /* A list of Intersection */ Lg : List; /* A list of grouping*/ /* {{(Dn+1 ,Di ,Dj ),{(DDn+2 ,.)...}}} */ /* n: A number of Target Concepts */ var Q, Lr : List; begin Q := Lg ; Lr := {}; if (Q = ∅) then do begin Q := Q − f irst(Q); Lr := Rule Induction (Lc , Lid , Q); end (DDk , Di , Dj ) := f irst(Q); if (Di ∈ Lc and Dj ∈ Lc ) then do begin Induce a Rule r which discriminate between Di and Dj ; r = {Ri → Di , Rj → Dj }; end else do begin Search for L1.0 (Di ) from Lc ; Search for L1.0 (Dj ) from Lc ; if (i < j) then do begin r(Di ) := ∨Rl ∈L1.0 (Dj ) ¬Rl → ¬Dj ; r(Dj ) := ∧Rl ∈L1.0 (Dj ) Rl → Dj ; end r := {r(Di ), r(Dj )}; end return Lr := {r, Lr } ; end {Rule Induction}
Fig. 3. An Algorithm for Rule Induction
In the third iteration of the former case(3a ), the intersection is calculated as Fig 7 and D2 and psycho are grouped into D3 : D3a = { [nat=per], [prod=0] } In the latter case(3b ), it is calculated as Fig 8 and m.c.h. and psycho are grouped into D8 : D8a = { [nat=per], [prod=0] }. Fig 9 and 10 depicts the two results of grouping like a dendrogram in clustering analysis[1]. 5.2
Rule Induction
First Model for Diagnosis. Figure 9 shows one candidate of the differential diagnosis. For the differential diagnosis of common. First, this model discriminate between D6 (common and classic) and D8 (m.c.h., i.m.l. and psycho). Then, common and classic within D6 are differentiated. Thus, a classification rule for common is composed of two subrules: (discrimination between D6 and D8 ) and
Extracting Structure of Medical Diagnosis: Rough Set Approach
85
Table 1. A small example of a database No. loc nat his prod jolt nau M1 M2 class 1 occular per per 0 0 0 1 1 m.c.h. 2 whole per per 0 0 0 1 1 m.c.h. 3 lateral thr par 0 1 1 0 0 common. 4 lateral thr par 1 1 1 0 0 classic. 5 occular per per 0 0 0 1 1 psycho. 6 occular per subacute 0 1 1 0 0 i.m.l. 7 occular per acute 0 1 1 0 0 psycho. 8 whole per chronic 0 0 0 0 0 i.m.l. 9 lateral thr per 0 1 1 0 0 common. 10 whole per per 0 0 0 1 1 m.c.h. Definition. loc: location, nat: nature, his:history, Definition. prod: prodrome, nau: nausea, jolt: Jolt headache, M1, M2: tenderness of M1 and M2, 1: Yes, 0: No, per: persistent, thr: throbbing, par: paroxysmal, m.c.h.: muscle contraction headache, psycho.: psychogenic pain, i.m.l.: intracranial mass lesion, common.: common migraine, and classic.: classical migraine. {([loc = occular] ∨ [loc = whole]), [nat = per], [his = per], [prod = 0], [jolt = 0], [nau = 0], [M 1 = 1], [M 2 = 1]} L1.0 (common) = {[loc = lateral], [nat = thr], ([his = per] ∨ [his = par]), [prod = 0], [jolt = 1], [nau = 1], [M 1 = 0], [M 2 = 0]} L1.0 (classic) = {[loc = lateral], [nat = thr], [his = par], [prod = 1], [jolt = 1], [nau = 1], [M 1 = 0], [M 2 = 0]} L1.0 (i.m.l.) = {([loc = occular] ∨ [loc = whole]), [nat = per], ([his = subacute] ∨[his = chronic]), [prod = 0], [jolt = 1], [M 1 = 0], [M 2 = 0]} L1.0 (psycho) = {[loc = occular], [nat = per], ([his = per] ∨ [his = acute]), [prod = 0]} L1.0 (m.c.h.) =
Fig. 4. Characterization Sets for Table 1 m.c.h.
m.c.h. common − {[prod=0]}
common
−
−
classic i.m.l.
− −
− −
classic ∅
i.m.l. psycho {([loc=occular]∨[loc=whole]), {[nat=per],[prod=0]} {[nat=per],[prod=0]} {[loc=lateral], [nat=thr],[jolt=1], {[prod=0],[jolt=1], {[prod=0]} [nau=1], [M1=0],[M2=0]} [M1=0], [M2=0] } − {[jolt=1],[M1=0],[M2=0]} { } − − {[nat=per],[prod=0]}
Fig. 5. Intersection of Two Characterization Sets (Step 2) m.c.h. D6 i.m.l. psycho − {} {([loc=occular]∨[loc=whole]), {[nat=per],[prod=0]} {[nat=per],[prod=0]} D6 − − {[jolt=1], [M1=0], [M2=0]} {} i.m.l. − − − {[nat=per],[prod=0]}
m.c.h.
Fig. 6. Intersection of Two Characterization Sets after the first Grouping (Step 3)
86
S. Tsumoto D6 D7 psycho D6 − {} {} D7 − − {[nat=per],[prod=0]}
Fig. 7. Intersection of Two Characterization Sets after the first Grouping (1) (Step 4a) m.c.h. D7
m.c.h. D7 psycho − {} {[nat=per],[prod=0] } − {} {}
Fig. 8. Intersection of Two Characterization Sets after the first Grouping (2) (Step 4b)
(discrimination within D6 ). On the other hand, a classification rule for m.c.h. is composed of three subrules: (discrimination between D6 and D8 ), (discrimination between D7 and psycho) and (discrimination within D7 ). Let us consider the first case. The first part can be obtained by the intersection in Figure 7. That is, D8 → [nat = per] ∧ [prod = 0] ¬[nat = per] ∨ ¬[prod = 0] → ¬D8 . Then, since from Figure 4, the difference set between characterization sets of common and classic is {[prod = 1]}, for a classification rule for common within D7 is: [prod = 0] → common. Combining these two parts, the classification rule for common is (¬[nat = per] ∨ ¬[prod = 0]) ∧ [prod = 0] → common. After its simplification, the rule is: ¬[nat = per] → ¬common, whose accuracy is equal to 2/3. In the same way, the rule for classic is obtained as: ¬[nat = per] ∧ [prod = 1] → classic. common classic m.c.h. i.m.l. psycho
Fig. 9. Grouping by Characterization Sets (1)
Extracting Structure of Medical Diagnosis: Rough Set Approach
87
common classic i.m.l. m.c.h.
psycho
Fig. 10. Grouping by Characterization Sets (2)
Second Model for Diagnosis. Figure 10 shows the other candidate of the differential diagnosis. For differential diagnosis, First, this model discriminate between D7 (common, classic and i.m.l.) and D8 (m.c.h. and psycho). Then, D6 and i.m.l. within D7 are differentiated. Finally, common and classic within D7 are checked. Thus, a classification rule for common is composed of two subrules: (discrimination between D7 and D8 ), (discrimination between D6 and D7 ), and (discrimination within D6 ). The first part can be obtained by the intersection in Figure 7. That is, D8 → [nat = per] ∧ [prod = 0] ¬[nat = per] ∨ ¬[prod = 0] → ¬D8 . Then, the second part can be obtained by the intersection in Figure 6. That is, D7 → [jolt = 1] ∧ [M 1 = 0] ∧ [M 2 = 0] ¬[jolt = 1] ∨ ¬[M 1 = 0] → ¬D7 . Finally, the third part can be obtained by the difference set for common and classic: {[prod = 1]}. [prod = 0] → common. Combining these three parts, the classification rule for common is (¬[nat = per] ∨ ¬[prod = 0]) ∧ ([jolt = 1] ∧ [M 1 = 0] ∧ [M 2 = 0]) ∧ [prod = 0] → common. After its simplification, the rule is: (¬[nat = per]) ∧ ([jolt = 1] ∧ [M 1 = 0] ∧ [M 2 = 0]) → common. whose accuracy is equal to 2/3. It is notable that the second part ([jolt = 1] ∧ [M 1 = 0] ∧ [M 2 = 0]) is redundant in this case, compared with the first model. However, from the viewpoint of characterization of a target concept, it is very important part.
88
6
S. Tsumoto
Conclusion
In this paper, the characteristics of experts’ rules are closely examined, whose empirical results suggest that grouping of diseases ais very important to realize automated acquisition of medical knowledge from clinical databases. Thus, we focus on the role of coverage in focusing mechanisms and propose an algorithm for grouping of diseases by using this measure. The above example shows that rule induction with this grouping generates rules, which are similar to medical experts’ rules and they suggest that our proposed method should capture medical experts’ reasoning. This research is a preliminary study on a rule induction method with grouping and it will be a basis for a future work to compare the proposed method with other rule induction methods by using real-world datasets. Acknowledgments. This work was supported by the Grant-in-Aid for Scientific Research (13131208) on Priority Areas (No.759) “Implementation of Active Mining in the Era of Information Flood” by the Ministry of Education, Science, Culture, Sports, Science and Technology of Japan.
References 1. Everitt, B. S., Cluster Analysis, 3rd Edition, John Wiley & Son, London, 1996. 2. Michalski, R. S., Mozetic, I., Hong, J., and Lavrac, N., The Multi-Purpose Incremental Learning System AQ15 and its Testing Application to Three Medical Domains, in Proceedings of the fifth National Conference on Artificial Intelligence, 1041–1045, AAAI Press, Menlo Park, 1986. 3. Pawlak, Z., Rough Sets. Kluwer Academic Publishers, Dordrecht, 1991. 4. Polkowski, L. and Skowron, A.: Rough mereology: a new paradigm for approximate reasoning. Intern. J. Approx. Reasoning 15, 333–365, 1996. 5. Quinlan, J.R., C4.5 – Programs for Machine Learning, Morgan Kaufmann, Palo Alto, 1993. 6. Readings in Machine Learning, (Shavlik, J. W. and Dietterich, T.G., eds.) Morgan Kaufmann, Palo Alto, 1990. 7. Skowron, A. and Grzymala-Busse, J. From rough set theory to evidence theory. In: Yager, R., Fedrizzi, M. and Kacprzyk, J.(eds.) Advances in the Dempster-Shafer Theory of Evidence, pp. 193–236, John Wiley & Sons, New York, 1994. 8. Tsumoto, S., Automated Induction of Medical Expert System Rules from Clinical Databases based on Rough Set Theory. Information Sciences 112, 67–84, 1998. 9. Tsumoto, S. Extraction of Experts’ Decision Rules from Clinical Databases using Rough Set Model Intelligent Data Analysis, 2(3), 1998. 10. Zadeh, L.A., Toward a theory of fuzzy information granulation and its certainty in human reasoning and fuzzy logic. Fuzzy Sets and Systems 90, 111–127, 1997.
A Kind of Linearization Method in Fuzzy Control System Modeling Hongxing Li , Jiayin Wang, and Zhihong Miao Department of Mathematics, Beijing Normal University, Beijing 100875, China
Abstract. A kind of linearization method in fuzzy control system modeling is proposed, in order to deal with the nonlinear model with variable coefficients. The method can turn a nonlinear model with variable coefficients into a linear model with variable coefficients in the way that the membership functions of the fuzzy sets in fuzzy partitions of the universes are changed from triangle waves into rectangle waves. However, the linearization models are incomplete in their forms because of their lacking some items. For solving this problem, joint approximation by using linear models is introduced. The simulation results show that marginal linearization models are of higher approximation precision than their original nonlinear models.
1
Introduction
A kind of modeling method based on fuzzy inference (MMFI) on the plants of control systems is proposed in [1]. The key way is that for a control system a fuzzy logic system describing the plant is obtained by acting fuzzy inference mechanism on the plant, then the fuzzy logic system is turned into a kind of nonlinear differential equation or a system of nonlinear differential equations with variable coefficients[1] based on the interpolation mechanism[2−10] of fuzzy logic systems. By the way, the differential equation is just linear when its order is one. As pointed out in [1], this has actually solved the problem for the plant in a fuzzy control system to be modelled. From the results in [1], we can learn that for a fuzzy system, when its input variables are more than two, the model as obtained by the method in [1] is all of same form nonlinear differential equation or a system of nonlinear differential equations with variable coefficients, called HX equation[1] . Here “nonlinear” may not bring too much trouble for solving such kind of equation. In fact, we can easily find out the solutions of the equations and draw the solution curves and phase plane curves under given initial values by using Matlab5.3 (see the simulation experiments in [1]). However this kind of “nonlinear” may make it difficult to consider some qualitative and quantitative analysis to a system, such as the stability, controllability, and observability. To
Supported by the National Natural Science Foundation of China (Grant No. 60174013) and the Research Fund for Doctoral Program of Higher Education (Grant No. 20020027013) To whom correspondence should be addressed.
[email protected] G. Wang et al. (Eds.): RSFDGrC 2003, LNAI 2639, pp. 89–98, 2003. c Springer-Verlag Berlin Heidelberg 2003
90
H. Li, J. Wang, and Z. Miao
solve this problem, in this paper, we propose a marginal linearization method aiming at approximately turning HX equations into some kind of linear differential equations or systems of linear differential equations with variable coefficients.
2
The Input-Output Models of Fuzzy Control Systems
First of all, we introduce some useful concepts and notations as follows, taking second order systems for examples. Let Y = [a1 , b1 ], Y˙ = [a2 , b2 ] and Y¨ = [a3 , b3 ] respectively be universes of y(t), y(t) ˙ and y¨(t). Suppose A = {Ai }(1≤i≤p) , B = {Bj }(1≤j≤q) and C = {Cij }(1≤i≤p,1≤j≤q) to be the fuzzy partitions respectively of corresponding universes[1−7] Y = [a1 , b1 ], Y˙ = [a2 , b2 ] and Y¨ = [a3 , b3 ] (i.e. groups of base elements), where yi , y˙ j and y¨ij are respectively the peakpoints[1−6] of Ai , Bj and Cij , with the condition: a1 ≤ y1 < y2 < · · · < yp ≤ b1 and a2 ≤ y˙ 1 < y˙ 2 < · · · < y˙ q ≤ b2 . A, B and C can be regarded as linguistic variables so that a group of fuzzy inference rules is formed as follows. If y(t) is Ai and y(t) ˙ is Bj then y¨(t) is Cij ,
(1)
where i = 1, 2, · · · , p, j = 1, 2, · · · , q. By using the conclusions in [2], the fuzzy logic system based on (1) can be represented as a binary piecewise interpolation function: p q y¨(t) = F (y(t), y(t)) ˙ = Ai (y(t))Bj (y(t))¨ ˙ yij . (2) i=1 j=1
Usually, Ai , Bj and Cij are taken as “triangle wave” membership functions y(t) − yi−1 , yi−1 ≤ y(t) ≤ yi , yi − yi−1 Ai (y(t)) = y(t) − yi+1 (3) , yi ≤ y(t) ≤ yi+1 , y − y i i+1 0, otherwise , y(t) ˙ − y˙ j−1 , y˙ j−1 ≤ y(t) ˙ ≤ y˙ j , y˙ j − y˙ j−1 ˙ − y˙ j+1 Bj (y(t)) ˙ = y(t) (4) , y˙ j ≤ y(t) ˙ ≤ y˙ j+1 , y ˙ − y ˙ j j+1 0, otherwise , where i = 1, 2, · · · , p and we stipulate that y0 = y1 and yp+1 = yp , j = 1, 2, 3, · · · , q, and also stipulate y˙ 0 = y˙ 1 and y˙ q+1 = y˙ q . By noticing that (2) is only related to the peakpoints of Cij , we do not need to consider the membership functions of Cij . Theorem 1. Under the above assumptions, the input-output model on free motion of the second order system based on (1) can be represented as a nonlinear differential equation with variable coefficients as follows (see [1]): y¨(t) = F (y(t), y(t)) ˙
= a(y(t), y(t))y(t) ˙ + b(y(t), y(t)) ˙ y(t) ˙ +c(y(t), y(t))y(t) ˙ y(t) ˙ + d(y(t), y(t)) ˙ ,
(5)
A Kind of Linearization Method in Fuzzy Control System Modeling
91
where the variable coefficients a(y(t), y(t)), ˙ · · ·, d(y(t), y(t)) ˙ depend on the timespace structure with the conditions
a(y(t), y(t)) ˙ =
p−1 q−1
a(i,j) , b(y(t), y(t)) ˙ =
i=1 j=1
c(y(t), y(t)) ˙ =
p−1 q−1 i=1 j=1
p−1 q−1
b(i,j) ,
i=1 j=1
c(i,j) , d(y(t), y(t)) ˙ =
p−1 q−1
d(i,j) ,
i=1 j=1
where a(i,j) , · · · , d(i,j) are the local coefficients on the (i, j)-th piece defined as the following: when (y(t), y(t)) ˙ ∈[yi , yi+1 ] × [y˙ j , y˙ j+1 ], a(i,j) = b(i,j) = c(i,j) = (i,j) d = 0, and when (y(t), y(t)) ˙ ∈ [yi , yi+1 ] × [y˙ j , y˙ j+1 ], they are respectively defined by y˙ j (¨ yij+1 − y¨i+1j+1 ) + y˙ j+1 (¨ yi+1j − y¨ij ) , (yi − yi+1 )(y˙ j − y˙ j+1 ) yi (¨ yi+1j − y¨i+1j+1 ) + yi+1 (¨ yij+1 − y¨ij ) = , (yi − yi+1 )(y˙ j − y˙ j+1 ) y¨ij − y¨ij+1 − y¨i+1j + y¨i+1j+1 = , (yi − yi+1 )(y˙ j − y˙ j+1 ) yi+1 (y˙ j+1 y¨ij − y˙ j y¨ij+1 ) yi (y˙ j y¨i+1j+1 − y˙ j+1 y¨i+1j ) + . = (yi − yi+1 )(y˙ j − y˙ j+1 ) (yi − yi+1 )(y˙ j − y˙ j+1 )
a(i,j) =
(6)
b(i,j)
(7)
c(i,j) d(i,j)
(8) (9)
Note 1. The nonlinear differential equation with variable coefficients as (5) is called a (second order) HX equation. When (y(t), y(t)) ˙ ∈ [yi , yi+1 ]×[y˙ j , y˙ j+1 ] (i.e. when on the (i, j)-th piece), the HX equation degenerates a local HX equation: y¨(t) = a(i,j) y(t) + b(i,j) y(t) ˙ + c(i,j) y(t)y(t) ˙ + d(i,j) ,
(10)
which is a nonlinear differential equation with constant coefficients. This means that an HX equation is formed by (p−1)×(q−1) local HX equations. So, in order to solve an HX equation, we should solve every local HX equation piecewisely.
3
Marginal Linearization Method on Input-Output Models
In order to obviate the nonlinear problem of previous model, here we propose a method called marginal linearization which can approximately transform the nonlinear equation in the form of (5) into a kind of linear equation. Therefore, the membership functions of Ai are first changed from “triangle wave” to “rectangle wave” by 1, yi− 12 ≤ y(t) < yi+ 12 , Ai (y(t)) = (11) 0, otherwise , where i = 1, 2, · · · , p, and we stipulate that y1− 12 = y1 and yp+ 12 = yp .
92
H. Li, J. Wang, and Z. Miao
In the sense of signal processing, this changing is equal to the fact that a series of triangle waves are turned to a series of rectangle waves. And in the meaning of sets, it equivalents to the case that the membership functions of fuzzy sets are replaced by the characteristic functions of crisp sets. From the angle of interpolation, it is for some linear(first degree) interpolation base functions being changed into zero-degree interpolation base functions. No matter in which sense, the essence is some simplification being done. Theorem 2. Under the previous assumptions and conditions of (11), the inputoutput model of the second order system based on (1) can be represented as a second order differential equation with variable coefficients: y¨(t) + P1 (y(t), y(t)) ˙ y(t) ˙ = Q1 (y(t), y(t)) ˙ .
(12)
Proof. When (y(t), y(t)) ˙ ∈ [yi− 12 , yi+ 12 ] × [y˙ j , y˙ j+1 ], considering the structures of Ai and Bj , we have p q y¨(t) = F (y(t), y(t)) ˙ = Ai (y(t))Bj (y(t))¨ ˙ yij i=1 j=1
y(t) ˙ − y˙ j+1 y(t) ˙ − y˙ j y¨ij + y¨ij+1 y˙ j − y˙ j+1 y˙ j+1 − y˙ j y¨ij+1 − y¨ij y˙ j y¨ij+1 − y˙ j+1 y¨ij =− y(t) ˙ + . (13) y˙ j − y˙ j+1 y˙ j − y˙ j+1 = Bj (y(t))¨ ˙ yij + Bj+1 (y(t))¨ ˙ yij+1 =
(i,j)
(i,j)
We define local coefficients P1 and Q1 as follows: y¨ij+1 − y¨ij , (y(t), y(t)) ˙ ∈ [yi− 12 , yi+ 12 ] × [y˙ j , y˙ j+1 ] , (i,j) P1 = (14) y˙ − y˙ j+1 0, j otherwise , y˙ j y¨ij+1 − y˙ j+1 y¨ij , (y(t), y(t)) ˙ ∈ [yi− 12 , yi+ 12 ] × [y˙ j , y˙ j+1 ] , (i,j) Q1 = (15) y˙ j − y˙ j+1 0, otherwise . (i,j)
(i,j)
Thus, (13) can be written as y¨(t) = −P1 y(t) ˙ + Q1 , which means that we get a differential equation with constant coefficients on the local (i, j)-th piece: (i,j)
y¨(t) + P1
(i,j)
y(t) ˙ = Q1
.
(16)
Now we respectively take that P1 (y(t), y(t)) ˙ =
p q−1
(i,j)
P1
,
and Q1 (y(t), y(t)) ˙ =
i=1 j=1
p q−1 i=1 j=1
then on the whole, i.e. ∀ (y(t), y(t)) ˙ ∈ Y × Y˙ , we have y¨(t) = F (y(t), y(t)) ˙ =
p q−1
(i,j)
(−P1
i=1 j=1
= −P1 (y(t), y(t))y(t) ˙ + Q1 (y(t), y(t)) ˙ . This is (12) that we want to prove.
(i,j)
y(t) + Q1
)
(i,j)
Q1
,
A Kind of Linearization Method in Fuzzy Control System Modeling
93
In what we did previously, the nonlinear equation (5) is transformed into the linear equation (12) by changing the shape of membership functions of the fuzzy sets on the “edge” Y . In same way, we can get another kind of linear equation by changing the fuzzy sets on the edge Y˙ . For that, the membership functions of Bj are changed into rectangle waves: 1, y˙ j− 12 ≤ y(t) ˙ < y˙ j+ 12 , Bj (y(t)) ˙ = (17) 0, otherwise , where j = 1, 2, · · · , q and we stipulate that y˙ 1− 12 = y˙ 1 and y˙ q+ 12 = y˙ q . Theorem 3. Under the previous assumptions and condition of (17), the inputoutput model of the second order system based on (1) can be represented as a second order differential equation with variable coefficients: y¨(t) + P2 (y(t), y(t))y(t) ˙ = Q2 (y(t), y(t)) ˙ ,
(18)
where P2 (y(t), y(t)) ˙ =
p−1 q
(i,j)
P2
, and Q2 (y(t), y(t)) ˙ =
i=1 j=1 (i,j)
in which P2 by (i,j)
P2
(i,j)
Q2
(i,j)
and Q2
p−1 q
(i,j)
Q2
,
i=1 j=1
are the local coefficients on the (i, j)-th piece, defined
y¨i+1j − y¨ij , (y(t), y(t)) ˙ ∈ [yi , yi+1 ] × [y˙ j− 12 , y˙ j+ 12 ] , = y − yi+1 0, i otherwise , yi y¨i+1j − yi+1 y¨ij , (y(t), y(t)) ˙ ∈ [yi , yi+1 ] × [y˙ j− 12 , y˙ j+ 12 ] , = yi − yi+1 0, otherwise .
(19)
(20)
The proof of the theorem is omitted for it is the same as the one for Theorem 2. Note 2. In the local (i, j)-th piece, (18) degenerates into a second order differential equation with constant coefficients: (i,j)
y¨(t) + P2
(i,j)
y(t) = Q2
.
(21)
Therefore, in order to process (18), we only need to consider (21) piecewisely. Note 3. Equations (12) and (18) are all with “lacking term” phenomenon with respect to so-called a standard linear differential equation, i.e. the former lacks the term y(t) and the latter lacks the term y(t). ˙ Especially for the local equations (16) and (21), this lacking term phenomenon may easily cause us some misconception. For example, for the stability of a system, we may regard the
94
H. Li, J. Wang, and Z. Miao
linear systems (16) and (21) as unstable systems based on Routh criterion if we are careless. However, this is generally wrong. In fact, Routh criterion is available only with respect to a whole linear system, not for some local parts of it. Thus we can not directly use Routh criterion for the linear systems (16) and (21) which are all local. If a local system such as (16), (21) or (10) is viewed as a whole system, then it is easy to know that it generally is a stable system because the variables of it are bounded. It is noteworthy that there does not exist obvious necessary relationship between a whole system and its all parts.
4
The Joint Equation on Edge Linearization Model
The “joint equation” here means to bring (12) together with (18) so as to form a whole equation by some method. We have such an idea for the following reasons. (i) Equations (12) and (18) respectively represent the approximate linear models obtained by using the edge linearization with respect to the nonlinear model (5). These two linear models emphasize particularly on some respective meanings that contain certain imbalance between y(t) and y(t). ˙ Clearly, this kind of imbalance can be compensated if the two linear models are jointed. (ii) Just as the statement of Note 3 in Sect. 2, (12) and (18) are all of “lacking term” phenomenon. When the inference rules degenerate into only one rule, i.e. p = q = 1, the local equations will be extended to a whole equation. Thus, this kind of lacking term phenomenon may make Routh criterion being available so that we have the conclusion that the system is unstable. A method on jointing models thought out easily is the mean value superposition between models (12) and (18), i.e. 1 1 y¨(t) + P1 (y(t), y(t)) ˙ y(t) ˙ + P2 (y(t), y(t))y(t) ˙ 2 2 1 = [Q1 (y(t), y(t)) ˙ + Q2 (y(t), y(t))] ˙ .. 2
(22)
Denoting b1 (y(t), y(t)) ˙ = 12 P1 (y(t), y(t)), ˙ b2 (y(t), y(t)) ˙ = 12 P2 (y(t), y(t)) ˙ and
b∗ (y(t), y(t)) ˙ =
1 [Q1 (y(t), y(t)) ˙ + Q2 (y(t), y(t))] ˙ , 2
(22) can be written as y¨(t) + b1 (y(t), y(t)) ˙ y(t) ˙ + b2 (y(t), y(t))y(t) ˙ = b∗ (y(t), y(t)) ˙ .
(23)
Now we consider the local expressions of variable coefficients b1 (y(t), y(t)), ˙ b2 (y(t), y(t)) ˙ and b∗ (y(t), y(t)). ˙ First of all, we formally assume that
bn (y(t), y(t)) ˙ =
p q i=1 j=1
b(i,j) , (n = 1, 2, ∗) , n
(24)
A Kind of Linearization Method in Fuzzy Control System Modeling
95
(i,j)
where bn is the local coefficient on (i, j)-th piece. Attentively, here (i, j)-th piece means that (y(t), y(t)) ˙ ∈ [yi− 12 , yi+ 12 ] × [y˙ j− 12 , y˙ j+ 12 ] . Obviously the local equation (23) is represented as the following: (i,j)
y¨(t) + b1
(i,j)
y(t) ˙ + b2
(i,j)
y(t) = b∗
.
(25)
Because the domains of definition of the local equations (12) and (18) are different, and in order to make the coefficients of (25) have uniform and clear expressions, we should make the local domain of definition of (25) [yi− 12 , yi+ 12 ] × [y˙ j− 12 , y˙ j+ 12 ] to be partitioned into four parts again, i.e. (i, j)-th is partitioned into four smaller pieces denoted respectively by (i, j)1 , (i, j)2 , (i, j)3 and (i, j)4 : (i, j)1 : [yi− 12 , yi ] × [y˙ j− 12 , y˙ j ], (i, j)2 : [yi− 12 , yi ] × [y˙ j , y˙ j+ 12 ] , (i, j)3 : [yi , yi+ 12 ] × [y˙ j− 12 , y˙ j ], (i, j)4 : [yi , yi+ 12 ] × [y˙ j , y˙ j+ 12 ] . In this way, the local equation are divided into four local sub-equations as follows: (i,j)m
y¨(t) + b1
(i,j)m
y(t) ˙ + b2
(i,j)m
y(t) = b∗
,
(26)
where m = 1, 2, 3, 4, and the relationship between the local coefficients and their local sub-coefficients is b(i,j) = n
4
m b(i,j) , (n = 1, 2, ∗) . n
(27)
m=1 (i,j)
By lengthy deducing, the expressions of local sub-coefficients bn m are clearly represented as the following: when (y(t), y(t)) ˙ is not on the sub-piece (i,j)m (i, j)m , (m = 1, 2, 3, 4), bn = 0 (n = 1, 2, ∗, m = 1, 2, 3, 4); when (y(t), y(t)) ˙ (i,j) is not on the sub-piece (i, j)m , (m = 1, 2, 3, 4), bn m are determined by the following expressions: (i,j)1
b1
(i,j)3
b2
(i,j)1
b∗
(i,j)3
b∗
5
1 (i,j−1) 1 (i,j) (i,j) (i,j) P1 , b1 2 = b1 4 = P1 , 2 2 1 (i,j) 1 (i−1,j) (i,j) (i,j) (i,j) = b2 4 = P2 , b2 1 = b2 2 = P2 , 2 2 1 (i,j−1) 1 (i,j) (i−1,j) (i,j) (i−1,j) = (Q1 + Q2 ), b∗ 2 = (Q1 + Q2 ) , 2 2 1 (i,j−1) 1 (i,j) (i,j) (i,j) (i,j) = (Q1 + Q2 ), b∗ 4 = (Q1 + Q2 ) . 2 2 (i,j)3
= b1
=
Simulation of Edge Linearization Method on Input-Output Models
Given a system, for example, we regard Var der Pol equation as the real model of the system, y¨(t) − µ(1 − y 2 (t))y(t) ˙ + y(t) = 0 , (28)
96
H. Li, J. Wang, and Z. Miao
where we put µ = 1. The aim and operating steps on the simulation have been introduced in [1]. Here we only give the simulations of three kinds of edge linearization methods. We take T = 20 as the simulation time and assume that y(0) = 2 and y(0) ˙ = 0. Example 1. The simulation results on edge linearization model (12). Case 1. Taking p = 7 and q = 8, the simulation results are shown in Fig. 1. We can learn that the simulation curves of edge linearization model (12) approximate the curves of real model well. 3
3 (a)
1.7
(b)
2
2
1
0.7
⋅
−0.3
−1.3
1 2
0
y(t)
y(t)
y(t)
1
(c)
1
2
⋅
0
−1
−1
−2
−2 1
−2.3 0
4
8 12 Time (Second)
16
20
−3 0
4
8 12 Time (Second)
16
20
−3
2 −2
−1
0
y(t)
1
2
−2
Fig. 1. The simulation curves of the linearization model (12) when p = 7 and q = 8. Curves 1 and 2 respectively represent the simulation curves of the linearization model and the real model. (a) and (b) respectively express the simulation results of y and y, ˙ and (c) is the simulation result of the phase plane
Case 2. When p and q are increased in a double number, i.e. p = 14 and q = 16, the simulation curves of edge linearization model (12) approximate the curves of real model very well, in other words the simulation curves are almost coincided with the curves of real model. We omit these simulation figures for space limit. Example 2. The simulation results on edge linearization model (18). Case 1. Taking p = 7 and q = 8, the simulation results are shown in Fig. 2. Clearly, the effect of the edge linearization model approximating to the real model is not better than the effect of model (12) to the real one, because of the lacking term phenomenon, i.e. lacking term y˙ in model (18). Case 2. When p and q are increased in double times, i.e. p = 14 and q = 16, the simulation curves of edge linearization model (18) approximate the curves of real model very well, of which the simulation curves are almost coincided with the curves of real model. We omit these simulation figures for space limit. Example 3. The simulation results on the joint model (23) of edge linearization models. Case 1. Taking p = 7 and q = 8, the simulation results are shown in Fig. 3. Case 2. In order to increase the approximating effect of the joint model (23) to the real model, p and q are respectively increased as taking p = 12 and q = 14. The simulation curves of edge linearization model (18) approximate the curves of real model very well, of which the simulation curves are almost coincided with the curves of real model. These simulation figures are omitted for space limit.
Fig. 2. The simulation curves of the linearization model (18) when p = 7 and q = 8
Fig. 3. The simulation curves of the joint model (23) when p = 7 and q = 8
6   Results and Conclusions
As we know, fuzzy control can be applied to control problems in fuzzy environments that can hardly be modeled by typical methods (see [11] and [12]). Usually we apply fuzzy inference to real systems; algebraic models are then formed from these fuzzy inference systems, so that fuzzy controllers, which often have good control effect, can be designed directly for the real systems. However, a mature control theory is always based on mathematical models of the plants in real systems. Therefore, the fact that the plants in fuzzy control systems can hardly be modeled is a bottleneck for the development of fuzzy control theory. In [1], a modeling method for fuzzy control systems based on fuzzy inference is proposed, so that this bottleneck problem is largely solved. But the mathematical models obtained by the methods in [1] are mostly nonlinear differential equations with variable coefficients. So, how to approximately transform these nonlinear models into linear ones is undoubtedly a very important problem. The edge linearization method proposed in this paper can solve the above problem. The key idea of the method is to change the membership functions on some universes (also called edges) from a triangular wave shape to a rectangular wave shape, so that the basis variables of those universes (such as the basis variable y(t) of universe Y, or the basis variable ẏ(t) of universe Ẏ, etc.) disappear from the original nonlinear equation. This achieves the aim of linearization. For a second order system, we can implement the linearization by using this method on only one of the universes. For a third order system, we can implement the linearization by using this method on any two of the universes Y, Ẏ, and
Ÿ. And so on: for an n-th order system, we can implement the linearization by using this method on any n − 1 of the n edges Y, Ẏ, Ÿ, ..., Y^(n−1). In order to avoid the lacking-term phenomenon in some cases, we propose the concept of a joint equation of edge linearization models, which offers some convenience for qualitative or quantitative analysis. The simulation results show that the edge linearization model indeed approximates the original nonlinear equation well.
References 1. Li, H.X., Wang, J.Y., Miao, Z.H.: Modeling on fuzzy control systems. Science in China, Ser(A) 45 (2002) 1506–1517 2. Li, H.X.: Interpolation mechanism of fuzzy control. Science in China, Ser(E) 41 (1998) 312–320 3. Li, H.X.: Adaptive fuzzy controllers based on variable universe. Science in China, Ser(E) 42 (1999) 10–20 4. Li, H.X.: Relationship between fuzzy controllers and PID controllers. Science in China, Ser(E) 42 (1999) 215–224 5. Li, H.X.: Fuzzy logic systems are equivalent to feedforward neural networks. Science in China, Ser(E) 43 (2000) 42–54 6. Li, H.X.: To see the success of fuzzy logic from mathematical essence of fuzzy control. Fuzzy Systems and Mathematics (in Chinese) 9 (1995) 1–14 7. Wang, G.J.: On the foundation of fuzzy reasoning. Lecture in Fuzzy Mathematics and Computer Science, Omaha: Creighton University 4 (1997) 1–24 8. Li, H.X., Chen, P., Huang, H.P.: Fuzzy Neural Intelligent Systems. Florida: CRC Press (2001) 9. Li, H.X., Yen, V.C.: Fuzzy Sets and Fuzzy Decision-Making. Florida: CRC Press (1995) 10. Koo, T.J.: Stable model reference adaptive fuzzy control of a class of nonlinear systems. IEEE Transactions on Fuzzy Systems 9 (2001) 624–636 11. Sun, Z.Q.: Theorem and Technology of Intelligence Control. Beijing: Qinghua Publishing House (1997) 16–123 12. Zhang, N.Y.: Structure analysis of typical fuzzy controllers. Fuzzy Systems and Mathematics (in Chinese) 11 (1997) 10–21
A Common Framework for Rough Sets, Databases, and Bayesian Networks S.K.M. Wong and D. Wu Department of Computer Science, University of Regina Regina, Saskatchewan, Canada, S4S 0A2 {wong, danwu}@cs.uregina.ca
1
Introduction
It has been pointed out that there exists an intriguing relationship between propositional modal logic and rough sets [8, 2]. In this paper, we use first order modal logic (FOML) to formulate a common framework for rough sets, databases, and Bayesian networks. The relational view of the semantics of first order modal logic provides a unified interpretation of many related concepts.
2
First Order Modal Logic and Its Relational Representation
We first briefly describe the language of first order modal logic (FOML). Consider a system with n agents. We use T to denote a set of relation symbols, function symbols, and constant symbols. Each relation symbol or function symbol has an associated arity, which corresponds to the number of arguments it can take. We use K to denote a set {K1 , . . . , Kn } of n modal operators, each corresponding to an agent. We refer to the set T ∪ K of relation symbols, function symbols, constant symbols, and modal operators as the vocabulary of the language of FOML. We assume an infinite supply of variables, which we usually write as a, b, c, u, . . ., possibly with subscripts. Constant symbols and variables are called terms. They are used to describe the individuals in the domain. We can form more complicated terms by using function symbols. In other words, variable, constant symbols and terms are all used in the language of FOML to denote an individual in the domain. More formally, the set of terms is defined inductively by starting with variable symbols and constant symbols, and closing off under the application of function symbols. That is to say, if f is a k arity function symbol, and if v1 , . . . , vk are terms, then f (v1 , . . . , vk ) is a term. Terms are used to define formulas. An atomic formula is either of the form ϕ(v1 , . . . , vk ), where ϕ is a k arity relation symbol and v1 , . . . , vk are terms, or of the form v1 = v2 . If ψ and φ are formulas, then so are ¬ψ and ψ ∧ φ. If ψ is a formula, then so is Ki ψ. In addition, we can form formulas by using quantifiers. If ψ is a formula and u is a variable, then ∃u ψ is also a formula. The formula ψ ∨ φ G. Wang et al. (Eds.): RSFDGrC 2003, LNAI 2639, pp. 99−103, 2003. Springer-Verlag Berlin Heidelberg 2003
is an abbreviation for ¬(¬ψ ∧ ¬φ), and the formula ∀u ψ is an abbreviation for ¬∃u ¬ψ. The semantics of FOML uses relational Kripke structure [1]. A relational Kripke structure for n agents over the vocabulary T ∪ K is a tuple M = (S, π, K1 , . . . , Kn ), where S is a set of states (possible worlds), π associates with each state s ∈ S a normal interpretation π(s) for the first order logic, and Ki is a binary equivalence relation on S. We assume a common domain D, i.e., the domain is the same at every state. A valuation t on M is a function that assigns to each variable a member of D. Under the relational Kripke structure M , we define truth of formulas in a straightforward way. For a state s ∈ S, a valuation t on M , and a formula ϕ, we write (M, t, s) |= ϕ to mean that the formula ϕ is true at state s of M under valuation t. In the case of Ki ϕ, we define (M, t, s) |= Ki ϕ, if for every s such that (s, s ) ∈ Ki , (M, t, s ) |= ϕ. In the following, we demonstrate how we can conveniently use relations to represent the semantics of a formula in FOML. Definition 1 Consider a formula ϕ in FOML and a relational Kripke structure M . Let Stϕ = {s ∈ S | (M, t, s) |= ϕ}, where t is a valuation and s is a state of M . We call Stϕ the target states of formula ϕ for the valuation t. The above definition indicates that Stϕ denotes all the states under which ϕ is true with a fixed valuation t. Consider a formula ϕ(u) = ϕ(u1 , . . . , un ) with n variables, where u = (u1 , u2 , . . . , un ). We can represent the formula ϕ(u) by the relation r(ϕ) in Fig. 1.
            u1        u2        ...   un        W
            t1(u1)    t1(u2)    ...   t1(un)    S_{t1}^ϕ
  r(ϕ)  =   t2(u1)    t2(u2)    ...   t2(un)    S_{t2}^ϕ
            ...       ...       ...   ...       ...
            tm(u1)    tm(u2)    ...   tm(un)    S_{tm}^ϕ
Fig. 1. A relational representation of the formula ϕ(u).
Note that the attributes of the relation r(ϕ) are those variables u 1 , . . . , un in the formula ϕ(u). An additional column with attribute name W is used to denote the target states Stϕi . Here we have established the connection between a formula ϕ(u) and its relational representation r(ϕ). The relation r(ϕ) shown in Fig. 1 is called the meta relation of the formula ϕ(u).
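As an illustration of how a meta relation could be materialized, the following sketch (our own, not part of the paper) stores one row per valuation t, holding the values t(u1), ..., t(un) together with the set of target states S_t^ϕ obtained by evaluating ϕ at every state.

```python
# Sketch: build the meta relation r(phi) of Fig. 1 as a list of rows
# (valuation values ..., W), where W is the set of states at which phi holds.
def meta_relation(variables, valuations, states, holds):
    """holds(t, s) -> bool plays the role of (M, t, s) |= phi."""
    rows = []
    for t in valuations:                                   # t maps each variable to a domain value
        target = {s for s in states if holds(t, s)}        # the target states S_t^phi
        rows.append(tuple(t[u] for u in variables) + (frozenset(target),))
    return rows

# Toy example: phi(u1) = "u1 is even"; its truth here does not depend on the state,
# so W is either all states or none for each valuation.
states = {"s1", "s2", "s3"}
valuations = [{"u1": v} for v in (1, 2, 3)]
r_phi = meta_relation(["u1"], valuations, states, lambda t, s: t["u1"] % 2 == 0)
```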
3   Rough Sets
Consider a formula Kϕ(u) where u = (u1 , . . . , un ) and K ∈ K. We can represent this formula by a meta relation as shown in Fig.2, in which StKϕ = {s ∈ S | (M, t, s) |= Kϕ}.
            u1        u2        ...   un        W
            t1(u1)    t1(u2)    ...   t1(un)    S_{t1}^{Kϕ}
 r(Kϕ)  =   t2(u1)    t2(u2)    ...   t2(un)    S_{t2}^{Kϕ}
            ...       ...       ...   ...       ...
            tm(u1)    tm(u2)    ...   tm(un)    S_{tm}^{Kϕ}
Fig. 2. A relational representation of the formula Kϕ(u).
We call S_t^{Kϕ} the lower bound of S_t^ϕ for a fixed valuation t. Let S^ϕ = ∪_i S_{t_i}^ϕ and S^{Kϕ} = ∪_i S_{t_i}^{Kϕ}. We refer to S^{Kϕ} as the lower bound of S^ϕ. However, one may alternatively define the lower bound of S^ϕ using the following definition.
Definition 2 (M, s) |= Ki ϕ(u) if for every s′ with (s, s′) ∈ Ki there exists a valuation t such that (M, t, s′) |= ϕ(u). Let Ŝ^{Kϕ} = {s ∈ S | (M, s) |= Kϕ(u)}. We want to point out that Ŝ^{Kϕ} is actually equal to the conventional lower bound of S^ϕ derived from propositional modal logic. In fact, S^{Kϕ} ⊆ Ŝ^{Kϕ} ⊆ S^ϕ. We say Ŝ^{Kϕ} is a more refined lower bound of S^ϕ. Therefore, our approach provides a granular view of S^ϕ in terms of different lower bounds.
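To make the two lower bounds concrete, the following sketch (our own illustration under the definitions above) computes S^{Kϕ} and the refined bound Ŝ^{Kϕ} for a finite set of states, a modal operator given as an equivalence-class map, and a finite list of valuations; the predicate holds(t, s) plays the role of (M, t, s) |= ϕ.

```python
# Sketch: the two lower bounds of S^phi discussed above.
def k_lower(states, K_class, valuations, holds):
    """S^{K phi}: states s such that, for SOME single valuation t, phi holds at every
    state of the K-class of s."""
    return {s for s in states
            if any(all(holds(t, s2) for s2 in K_class(s)) for t in valuations)}

def k_lower_refined(states, K_class, valuations, holds):
    """S_hat^{K phi} (Definition 2): at every state of the K-class of s, phi holds for
    some valuation, possibly a different one at each state."""
    return {s for s in states
            if all(any(holds(t, s2) for t in valuations) for s2 in K_class(s))}

# By construction k_lower(...) is always a subset of k_lower_refined(...),
# matching S^{K phi} being contained in S_hat^{K phi}.
```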
4
Databases
In Fig.1, r(ϕ) is the relational representation of the formula ϕ(u). If we omit the column W in r(ϕ), then the table shown in Fig. 3 becomes a standard relation r(ϕ). This omission corresponds to ignoring the usage of possible worlds in our knowledge system. Traditionally, the relational database model is based on first order logic. Thus, in our approach database relations can be viewed as a special representation of the FOML formulas.
            u1        u2        ...   un
            t1(u1)    t1(u2)    ...   t1(un)
  r(ϕ)  =   t2(u1)    t2(u2)    ...   t2(un)
            ...       ...       ...   ...
            tm(u1)    tm(u2)    ...   tm(un)
Fig. 3. The relation r(ϕ) obtained from meta relation r(ϕ) in Fig.1 by omitting the column W .
5
Bayesian Networks
In this section, we augment the FOML by introducing numeric operators Φ̄i so that if ϕ(u) is a formula, then Φ̄i(ϕ(u)) is a numeric term. The interpretation of the numeric operator Φ̄i, denoted Φi, is a function from 2^S to R+. We have shown that the formula ϕ(u) can be represented by a meta relation (see Fig. 1). Similarly, the numeric term Φ̄i(ϕ(u)) can be represented as a meta relation as shown in Fig. 4.

                             u1        u2        ...   un        W
                             t1(u1)    t1(u2)    ...   t1(un)    Φi(S_{t1}^ϕ)
  Φi^ϕ(u) = r(ϕ(u), Φi)  =   t2(u1)    t2(u2)    ...   t2(un)    Φi(S_{t2}^ϕ)
                             ...       ...       ...   ...       ...
                             tm(u1)    tm(u2)    ...   tm(un)    Φi(S_{tm}^ϕ)
Fig. 4. A relational representation of the term Φ¯i (ϕ(u)).
If we interpret the numeric operator, say Φ̄0, as a probability operator, then the function Φi^ϕ(u) in Fig. 4 becomes a probability distribution. Consider the following formula ϕ(a, b, c), which can be expressed as: ϕ(a, b, c) ← ϕ1(a) ∧ ϕ2(b, a) ∧ ϕ3(c, a). If we adopt the following interpretations:

Φ1^{ϕ1}(a) ⇔ p(a),   Φ2^{ϕ2}(b, a) ⇔ p(b|a),   Φ3^{ϕ3}(c, a) ⇔ p(c|a),   Φ0^ϕ(a, b, c) ⇔ p(a, b, c),

where p(a, b, c) is a joint probability distribution, then the formula ϕ(a, b, c) can be interpreted as:

Φ0^ϕ(a, b, c) = Φ1^{ϕ1}(a) · Φ2^{ϕ2}(b, a) · Φ3^{ϕ3}(c, a).
That is, p(a, b, c) = p(a) · p(b|a) · p(c|a). The above expression is in fact a Bayesian factorization of the joint probability distribution p(a, b, c) in terms of the conditional probability distributions p(a), p(b|a), and p(c|a). In other words, such a factorization represents a Bayesian network [3], whose graphical structure is depicted by the directed acyclic graph as shown in Fig.5.
(nodes a, b, c, with arcs a → b and a → c)
Fig. 5. The graphical representation of the Bayesian network.
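The factorization p(a, b, c) = p(a) · p(b|a) · p(c|a) behind Fig. 5 can be evaluated directly from the conditional tables. The sketch below is our own illustration with hypothetical binary variables and made-up numbers; it is not taken from the paper.

```python
# Sketch: assemble the joint p(a, b, c) of the Bayesian network in Fig. 5
# from p(a), p(b|a) and p(c|a), mirroring the factorization given above.
p_a = {0: 0.6, 1: 0.4}                                                 # hypothetical values
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}     # key: (b, a)
p_c_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}     # key: (c, a)

joint = {}
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            joint[(a, b, c)] = p_a[a] * p_b_given_a[(b, a)] * p_c_given_a[(c, a)]

assert abs(sum(joint.values()) - 1.0) < 1e-9    # the factorization yields a proper joint
```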
6
Conclusion
Within the framework of first order modal logic, we have shown that the relational representation of formulas encompasses rough sets, databases, and Bayesian networks. Therefore, FOML can serve as a common framework for these three apparently different but related knowledge systems [7, 6, 4, 5].
References [1] R. Fagin, J. Halpern, Y. Moses, and Vardi M. Reasoning About Knowledge. MIT Press, Cambridge, Massachusetts, 1996. [2] E. Orlowska. Logical aspects of learning concepts. International Journal of Approximate Reasoning, 2:349–364, 1988. [3] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Francisco, California, 1988. [4] S.K.M. Wong. A logical approach for modeling uncertainty. In 6th International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, volume 1, pages 129–135, 1996. [5] S.K.M. Wong. An extended relational data model for probabilistic reasoning. Journal of Intelligent Information Systems, 9:181–202, 1997. [6] S.K.M. Wong, C.J. Butz, and Y. Xiang. A method for implementing a probabilistic model as a relational database. In Eleventh Conference on Uncertainty in Artificial Intelligence, pages 556–564. Morgan Kaufmann Publishers, 1995. [7] S.K.M. Wong, Y. Xiang, and Xiaopin Nie. Representation of bayesian networks as relational databases. In Fifth International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, pages 159–165, 1994. [8] Y.Y. Yao and T.Y. Lin. Generalization of rough sets using modal logic. Intelligent and Automation and Soft Computing, an International Journal, 2(2):103–120, 1996.
Rough Sets, EM Algorithm, MST and Multispectral Image Segmentation Sankar K. Pal and Pabitra Mitra Machine Intelligence Unit, Indian Statistical Institute, Calcutta 700 108, India. {sankar,pabitra r}@isical.ac.in
Segmentation is a process of partitioning an image space into some nonoverlapping meaningful homogeneous regions. The success of an image analysis system depends on the quality of segmentation. Two broad approaches to segmentation of remotely sensed images are gray level thresholding and pixel classification [1]. Multispectral nature of most remote sensing images make pixel classification the natural choice for segmentation. A general method of statistical multispectral image segmentation is to represent the probability density function of the data as a mixture model, which asserts that the data is a combination of k individual component densities (commonly Gaussians), corresponding to k clusters. The task is to identify, given the data, a set of k populations in it, and provide a model (density distribution) for each of the populations. The EM algorithm is an effective and popular technique for estimating the mixture model parameters. Rough set theory [2] provides an effective means for analysis of data by synthesizing or constructing approximations (upper and lower) of set concepts from the acquired data. An important use of rough set theory and granular computing has been in generating logical rules for classification and association [3]. These logical rules correspond to different important regions of the feature space, which represent data clusters. In this article we exploit the above characteristics of the rough set theoretic logical rules to obtain initial approximation of Gaussian mixture model parameters. The crude mixture model, after refinement through EM, leads to accurate clusters. Here, rough set theory offers a fast and robust (noise insensitive) solution to the initialization besides reducing the local minima problem of iterative refinement clustering. Also the problem of choosing the number of mixtures is circumvented, since the number of Gaussian components to be used is automatically decided by rough set theory. The problem of modelling non-convex clusters is addressed by constructing a minimal spanning tree (MST) with each Gaussian component as nodes and Mahalanobis distance between them as edge weights. Since MST clustering is performed on the Gaussian models rather than the individual data points and G. Wang et al. (Eds.): RSFDGrC 2003, LNAI 2639, pp. 104–105, 2003. c Springer-Verlag Berlin Heidelberg 2003
the number of models is much less than the data points, the computational time requirement is significantly small. Block diagram of the integrated segmentation methodology is shown in Fig. 1. Discretization of the feature space, for the purpose of rough set rule generation, is performed by gray level thresholding of the image bands individually.
Fig. 1. Block diagram of the proposed clustering algorithm
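The MST stage described above can be sketched as follows; this is our own illustration, not the authors' implementation. Gaussian components obtained from EM are taken as nodes, a symmetrized Mahalanobis-type distance between component means (computed with a pooled covariance, one possible choice) is used as the edge weight, and Prim's algorithm builds the tree; removing the longest edges then merges components into non-convex clusters.

```python
import numpy as np

# Sketch: minimal spanning tree over Gaussian mixture components, with a
# Mahalanobis-type distance between component means as the edge weight.
def mahalanobis(mu_i, S_i, mu_j, S_j):
    d = mu_i - mu_j
    S = 0.5 * (S_i + S_j)                 # pooled covariance (an assumption of this sketch)
    return float(np.sqrt(d @ np.linalg.inv(S) @ d))

def mst_edges(mus, covs):
    """Prim's algorithm; returns the MST edges (i, j, weight) over the components."""
    n, in_tree, edges = len(mus), {0}, []
    while len(in_tree) < n:
        i, j, w = min(((i, j, mahalanobis(mus[i], covs[i], mus[j], covs[j]))
                       for i in in_tree for j in range(n) if j not in in_tree),
                      key=lambda e: e[2])
        in_tree.add(j)
        edges.append((i, j, w))
    return edges   # cutting the longest edges groups components into the final clusters
```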
Experiments were performed on two four-band IRS-1A satellite images. Comparison is made both in terms of a cluster quality index [1] and computational time, in order to demonstrate the effect of the individual components.
References 1. S. K. Pal, A. Ghosh, and B. Uma Shankar, “Segmentation of remotely sensed images with fuzzy thresholding, and quantitative evaluation,” International Journal of Remote Sensing, vol. 21(11), pp. 2269–2300, 2000. 2. Z. Pawlak, Rough Sets, Theoretical Aspects of Reasoning about Data, Kluwer Academic, Dordrecht, 1991. 3. A. Skowron and C. Rauszer, “The discernibility matrices and functions in information systems,” in Intelligent Decision Support, Handbook of Applications and Advances of the Rough Sets Theory, R. Slowi´ nski, Ed., pp. 331–362. Kluwer Academic, Dordrecht, 1992.
Rough Mereology: A Survey of New Developments with Applications to Granular Computing, Spatial Reasoning and Computing with Words Lech Polkowski Polish–Japanese Institute of Information Technology, Koszykowa 86, 02008 Warsaw, Poland {Lech.Polkowski, polkow}@pjwstk.edu.pl
Abstract. In this paper, we present a survey of new developments in rough mereology, i.e. approximate calculus of parts, an approach to reasoning under uncertainty based on the notion of an approximate part (part to a degree) along with pointers to its main applications. The paradigms of Granule Computing (GC), Computing with Words (CWW) and Spatial Reasoning (SR) are particularly suited to a unified treatment by means of Rough Mereology (RM). Keywords: Rough mereology, computing with words, granular computing, spatial reasoning, rough sets
1
Rough Sets: First Notions
Rough set theory approaches the problem of in-exact concepts, cf. [6], with representation of knowledge in the form of an information system A = (U, A), where U is the set of objects and A is the set of attributes, each a ∈ A being a map a : U → Va. Definable concepts X ⊂ U are then expressed as unions of equivalence classes [x]_{IND_A} of the indiscernibility relation IND_A = {(x, y) : ∀a ∈ A. a(x) = a(y)}. For an arbitrary concept X ⊂ U, two approximations are formed, viz. the lower approximation {x ∈ U : [x]_{IND_A} ⊆ X} and the upper approximation {x ∈ U : [x]_{IND_A} ∩ X ≠ ∅}.
2
Classical Mereology Theory
Our presentation of Mereology is based on [4], [5] in the first place. First we resort to Ontology by assuming that we have a family of concepts divided into G. Wang et al. (Eds.): RSFDGrC 2003, LNAI 2639, pp. 106–113, 2003. c Springer-Verlag Berlin Heidelberg 2003
two categories: the class AT of atomic (individual) concepts and the class CMP of non-atomic (complex) concepts (names). The predicate AT(x) will take value 1 when x is an atomic concept, and 0 otherwise. The formula xεY will mean that an individual x answers to a complex name Y, i.e. AT(x), CMP(Y) hold by default. However, we point to the fact that in the formula xεY, the object Y may be an individual as well; then, x = Y holds.

2.1   Mereology Axioms
(A1) xpty =⇒ AT (x) ∧ AT (y); this means that the functor pt of part is defined for individual concepts only. (A2) xpty ∧ yptz =⇒ xptz; this means that the functor pt is transitive, i.e. a part of a part is a part. (A3) non(xptx); this means that the functor pt is non-reflexive (or, equivalently, if xpty, then non(yptx). We define an element as follows: xely ⇐⇒ xpty ∨ x = y In terms of the predicate el an important Inference Rule may be stated [4], [11]. (IR) (The Inference Rule) [∀z.(zelx ⇒ ∃w, q.welz ∧ welq ∧ qely)] ⇒ xely. The remaining axioms of mereology are related to the class functor that converts distributive classes (complex concepts) into individual concepts: it may be used to represent “United States” as an individual comprising all US states. The class operator Cls is a principal tool in applications of rough mereology to problems of distributed systems, knowledge granulation, computing with words, [11], [12], [13]. Here we see a formal advantage of mereology: we have to deal only with objects, not with their families. For a non–empty concept Y , the class of Y , Cls(Y ) is defined as follows: x = Cls(Y ) ⇐⇒ ∃z.zεY ∧ ∀z.(zεY =⇒ zelx) ∧ ∀z.(zelx =⇒ ∃u, w.(uεY ∧ welu ∧ welz). The class functor is subject to the following postulates. (A4) xεCls(Y ) ∧ zεCls(Y ) =⇒ x = z; this means that Cls(Y ) is an individual unique concept, for any (nonempty) Y. (A5) ∃z.zεY ⇐⇒ ∃z.zεCls(Y ); meaning that Cls(Y ) exists (i.e. is a nonempty individual name) if and only if Y is a nonempty name.
3
Rough Mereology: A Calculus of Approximate Parts
Rough mereology extends mereology by considering the functor µr of a part to a degree r for r ∈ [0, 1] cf. [11], [12], [13]. The following is a list of basic postulates of rough mereology. We assume the predicate of part pt already defined, so we discuss a fixed mereological context. We introduce a family µr , where r ∈ [0, 1], called a rough inclusion which would satisfy
(RM1) xµ1 y ⇔ x el y
(RM2) xµ1 y ⇒ ∀z.(zµr x ⇒ zµr y)
(RM3) x = y ∧ xµr z ⇒ yµr z
(RM4) xµr y ∧ s ≤ r ⇒ xµs y     (1)
The postulate (RM1) relates rough inclusion to mereology: xµ1 y is equivalent to x being an element of y. In this way, the given exact mereological structure is embedded into the rough mereological structure. It also follows that µr is defined on individual objects only. The postulate (RM2) expresses monotonicity of µ with respect to the relation el. By (RM3), µr is a congruence with respect to identity. (RM4) sets the meaning of µr: it means a degree at least r. The function µ is called a rough inclusion; the term was introduced in [11].
Example 1. A generalized rough membership function, Xµr Y ⇐⇒ |X ∩ Y|/|X| ≥ r in case X is nonempty, and 1 else, cf. [8], [11], where X, Y are (either exact or rough) subsets (concepts) in the universe U of an information system (U, A), is an example of a rough inclusion on concepts (= sets of objects) regarded as individual elements of the mereological universe.
It is evident that we cannot in general say more about properties of µr: in particular, we lack in general the transitivity property. The class operator Cls may be recalled now. We make use of it in defining a granule notion in the rough mereological universe of individual objects.

3.1   Rough Mereological Knowledge Granulation
For given r < 1 and x, we let gr(x) denote the class Cls(Ψr), where Ψr(y) ⇔ yµr x. The class gr(x) collects in a single class-concept all individuals satisfying the class definition with the concept Ψr. From (RM1)-(RM4), the following properties may be deduced.
Proposition 1.
1. xµr y ⇒ x el gr(y)
2. xµr y ∧ y el z ⇒ x el gr(z)
3. ∀z.[z el y ⇒ ∃w, q. w el z ∧ w el q ∧ qµr x] ⇒ y el gr(x)
4. y el gr(x) ∧ z el y ⇒ z el gr(x)
5. s ≤ r ⇒ gr(x) el gs(x)     (2)
Proof. 1 follows by definition of gr and class definition. 2 is implied by (RM2). For 3, use the inference rule (IR). 4 follows by transitivity of the relation el of being an element. 5 is a consequence of (RM4), (IR), and of the class definition. The class gr (x) may be regarded as a neighborhood (cluster) of (about) x of radius r. Let us observe that g1 (x) = x is the class of elements of x hence x itself.
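Since the discussion is over a finite collection of individual objects, the granule g_r(x) can be read, for the rough inclusions considered below, simply as the aggregate of all y with yµ_r x. The sketch below is our own illustration using the generalized rough membership function of Example 1, with concepts represented as finite sets.

```python
# Sketch: the rough inclusion of Example 1 and the granule g_r(x) read set-theoretically.
def mu(X, Y):
    """Degree to which concept X is included in concept Y: |X ∩ Y| / |X|, and 1 if X is empty."""
    return 1.0 if not X else len(X & Y) / len(X)

def granule(x, concepts, r):
    """g_r(x): all concepts y with y mu_r x, i.e. mu(y, x) >= r."""
    return [y for y in concepts if mu(y, x) >= r]

# Toy usage over objects 1..6.
concepts = [frozenset({1, 2}), frozenset({2, 3, 4}), frozenset({5, 6})]
g = granule(frozenset({1, 2, 3}), concepts, r=0.5)   # -> [{1, 2}, {2, 3, 4}]
```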
3.2   Rough Mereology in Information Systems: A Case Study of the Linear Gaussian Rough Inclusion and of the Łukasiewicz Rough Inclusion
We will single out some propositions for a rough inclusion in an information system A = (U, A), as case studies. We define for x, y ∈ U the set

DIS(x, y) = {a ∈ A : a(x) ≠ a(y)}.     (3)
The linear gaussian rough inclusion. With the help of DIS(x, y), we define a linear gaussian rough inclusion (LGRI, for short) µ_r^A by letting

xµ_r^A y ⇔ e^{−Σ_{a∈DIS(x,y)} w_a} ≥ r,     (4)

where w_a ∈ (0, ∞) is a weight associated with the attribute a, for each a ∈ A. It remains to verify that (4) is a rough inclusion indeed. We consider as individual objects the classes of the indiscernibility relation IND_A and we define the notion of an element as follows:

x el y ⇔ DIS(x, y) = ∅.     (5)

This notion of an element is then identical with = and corresponds to the empty part relation.
Proposition 2. µ_r^A satisfies (RM1)-(RM4) with the notion of element as in (5).
Proof. For (RM1), we have xµ_1^A y if and only if DIS(x, y) = ∅, if and only if x el y. For (RM2), clearly, DIS(x, y) = ∅ implies DIS(x, z) = DIS(y, z), and the same argument justifies (RM3). (RM4) follows by definition (4). Properties of linear gaussian inclusions are collected in the next proposition. We denote with the symbol g_r^A the neighborhood induced by µ_r^A.
(6)
Proof. 1 follows directly from the proof of Proposition 2. To prove 2: indeed, from DIS(x, z) ⊆ DIS(x, y) ∪ DIS(y, z) (7) −| wa | −| wa | −| wa | a∈DIS(x,z) a∈DIS(x,y) a∈DIS(y,z) we get e ≥e ·e . It follows from Proposition 3, 2 that LGRI µA r does satisfy the following transitivity law: A xµA r y, yµs z A xµr·s z
(8)
110
L. Polkowski
The reader undoubtedly recognizes the Menger t–norm P rod(x, y) = x · y as the function realizing the transitivity scheme (8). A look at (4) shows that µA r is constant on indiscernibility classes of IN DA . In the sequel, we will use µA r on objects as well as on indiscernibility classes tacitly. Proposition 4. (i) xelgr (y) ⇔ xµA r y (ii) xelgr y ⇒ ∀t ∈ [0, 1].gt (x)elgt r(y)
(9)
Proof. For (i), assume that xelgr (y); then, for every velx, by class definition, A A we find w, q such that welv, welq, qµA r y. Thus wµ1 q hence wµ1 · ry and finally A v, xµr y by (RM2). Proof of (ii) follows on same lines with the usage of the inference lemma (IR). L ukasiewicz rough inclusion, L RI. We begin with the information system A = (U, A), and the sets DIS(x, y) defined as above in (3). As in sect. 3.2, we exploit these sets but in a different way. For x, y ∈ U , we let xµL ry ⇔ 1−
|DIS(x, y)| ≥r |A|
(10)
L calling µ ukasiewicz rough inclusion. We recall the L ukasiewicz tensor r the L product (x, y) defined via the formula (x, y) = max{0, x + y − 1} (11)
The following is the counterpart to Proposition 3,2 stating a transitivity property of L RI. Proposition 5. Transitivity of the L ukasiewicz rough inclusion is expressed by the following scheme L xµL r y, yµs z (12) L xµ(r,s) z L L Proof. Assume that xµL r y, yµs z, xµt z; we need an estimate of t. By (7), we have 1 − t ≤ 1 − r + 1 − s hence t ≥ r + s − 1. As obviously t ≥ 0, we have finally t ≥ max{0, r + s − 1} = (r, s).
Thus, the L ukasiewicz rough inclusion does correspond to the L ukasiewicz product (t–norm). As with LGRI, one checks here that Proposition 4, 1, holds viz. xelgr (y) if and only if xµL r y.
4
Rough Mereological Granular Computing
We define an intelligent unit modeled on a classical perceptron [1].
Rough Mereology: A Survey of New Developments
4.1
111
Rough Mereological Perceptron
We exhibit the structure of a rough mereological perceptron (RM P ). It consists of an intelligent unit int − ag denoted ia. The input to ia is a finite set of connections Linkia,in = link1 , ..., linkm ; each linkj has as the source an information system Aj = (Uj , Aj ) endowed with a linear gaussian rough inclusion µjr . The output of ia is a connection linkia,out to an information system Aia = (Uia , Aia ) equipped with the linear gaussian rough inclusion µia . The operation (function) realized in RM P is denoted with Oia ; thus, for every tuple < x1 , x2 , ..., xm >, where xi ∈ Ui ,the object x = Oia (x1 , x2 , ..., xm ) ∈ Uia . In each Uj as well as in Uia , finite sets Tj , Tia are selected with the properties that (i) for each t ∈ tia there exist t1 ∈ T1 , ..., tm ∈ Tm with t = Oia (t1 , .., tm ) (ii) for each choice of ti ∈ Ti , i = 1, 2, ..., m, there is t ∈ Tia with t = Oia (t1 , ..., tm ). In case t = Oia (t1 , ..., tm ), we will say that < t1 , ..., tm , t > is an admissible set of references. The set of all admissible reference sets is denoted by Σ. The operation of an RM P may be expressed in terms of the functor ωia defined as follows. ωia : Σ × [0, 1]m → [0, 1] f or σ ∈ Σ, with σ =< t1 , .., tm , t >, ωia (σ, r1 , .., rm ) ≥ r if and only if xµia r t whenever i xi µri f or i = 1, 2, .., m where x = Oia (x1 , .., xm ) 4.2
(13)
Granular Computations
The functor ωia factors thru granule operator viz. (13)(ii) may be expressed as follows g ωia (σ, gr1 (t1 ), ..., grm (tm ) = gωia (σ,r1 ,...,rm ) (t) (14) which defines as well the factored functor ω g acting on granules. Let us observe that the acting of RM P may as well be described as that of a granular controller viz. the functor ω g may be described via a decision algoritm consisting of rules of the form if gr1 (t1 ) and gr2 (t2 ) and ...and grm (tm ) then gr (t)
(15)
with r = ωia (σ, r1 , ..., rm ) where σ =< t1 , ..., tm , t >. It is worth noticing that the functor ωia is defined from given information systems Aj , Aia and it is not subject to an arbitrary choice. Composition of RM P ’s involves a composition of the corresponding functors ω viz. given RM P1 , ..., RM Pk , RM P with links to RMP being outputs from RM P1 , .., RM Pk , each RM Pj having inputs Linkj = {link1j , ..., linkkj j }, m = k Σj=1 kj , the composition IA = RM P ◦ (RM P1 , RM P2 , ..., RM Pk ) of m inputs does satisfy the formula ωIA = ωRM P ◦ (ωRM P1 , ..., ωRM Pk )
(16)
under the tacit condition that admissible sets of references are composed as well. Thus RMP’s may be connected in networks subject to standard procedures e.g. learning by backpropagation etc. Computing with Words. The paradigm of computing with words (Zadeh) assumes that syntax of the computing mechanism is given as that of natural language while the semantics is given in a formal computing mechanism involving numbers. Let us observe how this idea may be implemented in an RM P . Assume there is given a set N of noun phrases {n1 , n2 , ..., nm , n} corresponding to information system universes U1 , ..., Um . A set ADJ of adjective phrases is also given and to each σ ∈ Σ, a set adj1 , .., adjm , adj is assigned Then the decision rule (15) may be expressed in the form if n1 is adj1 and ....and nm is adjm then n is adj
(17)
The semantics of (17) is expressed in the form of (15). The reader will observe that (17) is similar in form to decision rules of a fuzzy controller, while the semantics is distinct. Composition of RM P ’s as above is reflected in compositions of rules of the form (17) with semantics expressed by composed functors ω.
5
Spatial Reasoning
Spatial reasoning is usually based on a functor of connection C [2], [10], which satisfies the following conditions:

xCx;   xCy ⇒ yCx;   [∀z.(zCx ⇔ zCy)] ⇒ x = y.     (18)
In terms of C, other spatial relations are defined, like being a tangential/non-tangential part or being an interior [10]. Spatial reasoning is related to spatial objects, hence we depart here from our setting of the LGRI and ŁRI and we consider an example in which objects are located in Euclidean spaces.
Example 2. The universe of objects will consist of squares of the form

[k + j/2^s, k + (j+1)/2^s] × [l + i/2^s, l + (i+1)/2^s]

with k, l ∈ Z, i, j = 1, 2, ..., 2^s − 1, s = 1, 2, .... We define a rough inclusion µ as follows: xµr y ⇔ |x ∩ y|/|x| ≥ r, where |x| is the area of x. Then we define a connection Cu, with u ∈ [0, 1], as follows:

xCu y ⇔ ∃z. ∃r, s ≥ u. zµr x ∧ zµs y.     (19)

Then one verifies directly that Cu with u > 0 is a connection.
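Example 2 can be made computational: represent each square by its interval bounds, let µ be the fraction of the area of x covered by y, and test C_u by looking for a witness square z. The sketch below is our own illustration under these assumptions; it checks candidate witnesses only from a given finite collection of squares.

```python
# Sketch of Example 2: rough inclusion by area overlap and the connection C_u of (19).
# A square is given as ((x0, x1), (y0, y1)).
def area(sq):
    (x0, x1), (y0, y1) = sq
    return (x1 - x0) * (y1 - y0)

def overlap(a, b):
    (ax0, ax1), (ay0, ay1) = a
    (bx0, bx1), (by0, by1) = b
    w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    h = max(0.0, min(ay1, by1) - max(ay0, by0))
    return w * h

def mu(x, y):
    """Degree to which square x is a part of square y: |x ∩ y| / |x|."""
    return overlap(x, y) / area(x)

def connected(x, y, u, candidates):
    """x C_u y: some candidate z is included in both x and y to degree at least u."""
    return any(mu(z, x) >= u and mu(z, y) >= u for z in candidates)
```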
Assume now an RMP as above, endowed with connections C_j, C_ia for j = 1, 2, ..., k. Then, cf. [10]:
Proposition 6. If x_j C_{j,r_j} t_j for j = 1, 2, ..., k, then x C_{ia, inf{ω(σ,s_1,...,s_k) : s_j ≥ r_j, j = 1, 2, ..., k}} t, with σ = {t_1, ..., t_k, t}.
By means of this proposition, one may consider networks of RMPs for constructing more and more complex sets from simple primitive objects like the squares in Example 2.
References 1. C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon, Oxford, 1997. 2. A.G. Cohn. Calculi for qualitative spatial reasoning. In J. Calmet, J.A. Campbell, J. Pfalzgraf (eds.), LNAI 1138: 124–143. Springer, Berlin. 3. J. Srzednicki, S. J. Surma, D. Barnett, and V. F. Rickey, editors. Collected Works of Stanislaw Le´sniewski, Kluwer, Dordrecht, 1992. 4. S. Le´sniewski. Grundz¨ uge eines neuen Systems der Grundlagen der Mathematik. Fundamenta Mathematicae, 14: 1–81, 1929. 5. S. Le´sniewski. On the foundations of mathematics. Topoi, 2: 7–52, 1982. 6. Z. Pawlak. Rough sets, algebraic and topological approach. International Journal Computer Information Sciences, 11: 341–366, 1982. 7. Z. Pawlak. Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer, Dordrecht, 1992. 8. Z. Pawlak and A. Skowron. Rough membership functions. In R. R. Yager, M. Fedrizzi, and J. Kacprzyk, editors, Advances in the Dempster-Schafer Theory of Evidence, 251–271, Wiley, New York, 1994. 9. L. Polkowski. Rough Sets. Mathematical Foundations. Physica, Heidelberg, 2002. 10. L. Polkowski. On connection synthesis via rough mereology. Fundamenta Informaticae, 46(1/2): 83–96, 2001. 11. L. Polkowski, A. Skowron. Rough mereology: a new paradigm for approximate reasoning. International Journal of Approximate Reasoning, 15(4): 333–365, 1997. 12. L. Polkowski, A. Skowron. Adaptive decision-making by systems of cooperative intelligent agents organized on rough mereological principles. Intelligent Automation and Soft Computing. An International Journal, 2(2): 123–132, 1996. 13. L. Polkowski, A. Skowron. Grammar systems for distributed synthesis of approximate solutions extracted from experience. In G. Paun, and A. Salomaa, editors, Grammatical Models of Multi-Agent Systems, 316–333, Gordon and Breach, Amsterdam, 1999.
A New Rough Sets Model Based on Database Systems Xiaohua Tony Hu , T.Y. Lin , and Jianchao Han 1
1
2
3
College of Information Science and Technology, Drexel University, Philadelphia, PA 19104 2 Dept. of Computer Science, San Jose State University, San Jose, CA 94403 3 Dept. of Computer Science, California State University, Dominguez Hills, CA 90747 Abstract. In this paper we present a new rough sets model based on database systems. We borrow the main ideas of the original rough sets theory and redefine them based on the database theory to take advantage of the very efficient set-oriented database operation. We present a new set of algorithms to calculate core, reduct based on our new database based rough set model. Almost all the operations used in generating core, reduct in our model can be performed using the database set operations such as Count, Projection. Our new rough sets model is designed based on database set operations, compared with the traditional rough set models, ours is very efficient and scalable.
1 Introduction Rough sets theory was first introduced by Pawlak in the 1980’s [9] and it has been applied in a lot of applications such as machine learning, knowledge discovery, expert system [6] since then. Many rough sets models have been developed by rough set community in the last decades including such as Ziarko’s VPRS [10], Hu’s GRS [2], to name a few. These rough set models focus on extending the limitations of the original rough sets such as handling statistical distribution or noisy data, not much attention/attempts have been made to design new rough sets model to generate the core, reduct efficiently to make it efficient and scalable in large data set. Based on our own experience of applying VPRS and GRS in large data sets in data mining applications, we found one of the strong drawbacks of rough set model is the inefficiency of rough set methods to compute the core, reduct and identify the dispensable attributes, which limits its suitability in data mining applications. Further investigation of the problem reveals that rough set model does not integrate with the relational database systems and a lot of computational intensive operations are performed in flat files rather than utilizing the high performance database set operations. In considering of this and influenced by [5], we borrow the main ideas of rough sets theory and redefine them based on database set operations to take advantages of the very efficient set-oriented database operations. Almost all the operations used in generating reduct, core etc in our method can be performed using the database operations such as Cardinality, Projection etc. Compared with the traditional rough set approach, our method is very efficient and scalable. The rest of the paper is organized as follows: We give an overview of rough set theory with some
examples in Section 2. In Section 3, we discuss how to redefine the main concepts and methods of rough set based on database set operations. In Section 4, we describe rough set based feature selection. We conclude with some discussions in Section 5.
2 Overview of Rough Sets Theory We assume that our dataset is stored in a relational table with this form Table(condition-attributes decision-attributes), C is used to denote the condition th attributes, D for decision attributes, C∩D=Φ, tj denotes the j tuple. Rough sets theory defines three regions based on the equivalent classes induced by the attribute values: lower approximation, upper approximation and boundary as shown in Figure 1. Lower approximation contains all the objects which are classified surely based on the data collected. Upper approximation contains all the objects which can be classified probably, while the boundary is the difference between lower approximation and upper approximation. Below we give the formal definition. Suppose T={C, D} is a database table, we define two tuples ti and tj are in the same equivalent class induced by attributes S (S is a subset of C or D) if ti(S) = tj(S) . (The tuples in the same equivalent class has the same attribute value for all the attributes in S). Let [D]= {D1, .., Dk} denote the equivalent classes induced by D, ∀A ⊆ C, [A]= {A1,…Am} denotes the equivalent classes induced by A (Dj, Ai are called an equivalent class or elementary set). Definition 1: For a set Dj, the lower approximation Lower of Dj under A ⊆ C is [A]/Dj the union of all those equivalent classes Ai, each of which is contained by Dj: Lower = {∪Ai | Ai ⊆ Dj, i=1,…m}. For any object ti ∈ Lower , ti can be [A]/Dj [A]/Dj classified certainly to Dj, Lower[A]/[D] = ∪{Dj∈ [D]Lower[A]/[Dj] , j=1,…k} Definition 2: For a set Dj, the upper approximation Upper of Dj under A⊆ C is the [A]/Dj union of those equivalent classes Ai, each of which has a non-empty intersection with Dj : Upper = {∪Ai | Ai ∩ Dj ≠ Φ , i=1,..m}. For any object ti ∈ Upper , ti can [A]/Dj [A]/Dj be classified probably to Dj. Upper[A]/[D] = ∪{Dj ∈ [D]Upper , j=1,…,k} [A]/[Dj]
Definition 3: The boundary Boundary[A]/[D] = Upper[A]/[D] − Lower[A]/[D] Example 1: Suppose we have a collection of 8 cars (t1 to t8) with information about the Door, Size, Cylinder and Mileage. Door, Size and Cylinder are the condition attributes and Mileage is the decision attribute. (the Tupel_id is just for explanation purpose) Table 1. 8 Cars with {Door, Size, Cylinder, Mileage} Tuple_id t1 t2 t3 t4 t5 t6 t7 t8
Door 2 4 4 2 4 4 4 2
Size compact sub compact compact compact compact sub sub
Cylinder 4 6 4 6 4 4 6 6
Mileage high low high low low high low low
[Mileage] = {[Mileage=high], [Mileage=low]}
[Mileage=low] = {t2, t4, t5, t7, t8}
[Mileage=high] = {t1, t3, t6}
[Door Size Cylinder] = {{t1}, {t2, t7}, {t3, t5, t6}, {t4}, {t8}}
Lower[Door Size Cylinder]/[Mileage] = {t2, t7, t4, t8, t1}
Upper[Door Size Cylinder]/[Mileage] = {t2, t7, t3, t5, t6, t4, t8, t2, t7, t3, t5, t6, t4, t1}
Boundary[Door Size Cylinder]/[Mileage] = {t3, t5, t6}
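The regions just listed can be computed mechanically from Table 1; the following sketch is our own illustration of Definitions 1-3, with the table held as a list of dictionaries and objects identified by their row index.

```python
# Sketch: lower/upper approximations and boundary (Definitions 1-3) for Table 1.
def partition(rows, attrs):
    """Equivalence classes (sets of row indices) induced by the attributes in `attrs`."""
    classes = {}
    for i, row in enumerate(rows):
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(classes.values())

def regions(rows, cond, dec):
    lower, upper = set(), set()
    for D in partition(rows, dec):
        for A in partition(rows, cond):
            if A <= D:
                lower |= A           # A is contained in the decision class D
            if A & D:
                upper |= A           # A intersects the decision class D
    return lower, upper, upper - lower    # boundary = upper - lower

cars = [  # Table 1, rows t1..t8 (indices 0..7)
    {"Door": 2, "Size": "compact", "Cylinder": 4, "Mileage": "high"},
    {"Door": 4, "Size": "sub",     "Cylinder": 6, "Mileage": "low"},
    {"Door": 4, "Size": "compact", "Cylinder": 4, "Mileage": "high"},
    {"Door": 2, "Size": "compact", "Cylinder": 6, "Mileage": "low"},
    {"Door": 4, "Size": "compact", "Cylinder": 4, "Mileage": "low"},
    {"Door": 4, "Size": "compact", "Cylinder": 4, "Mileage": "high"},
    {"Door": 4, "Size": "sub",     "Cylinder": 6, "Mileage": "low"},
    {"Door": 2, "Size": "sub",     "Cylinder": 6, "Mileage": "low"},
]
low, up, bnd = regions(cars, ("Door", "Size", "Cylinder"), ("Mileage",))
# bnd == {2, 4, 5}, i.e. t3, t5, t6, matching the boundary listed above.
```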
5 cars t2, t7, t4, t8, t1 belong to the lower approximation Lower[Door Size Cylinder]/[Mileage], which means, that relying on the information about the Door, Size, Cylinder, the data collected are not complete, it is only good enough to make a classification model for the above 5 cars. In order to classify t3, t5, t6 (which belong to the boundary region), more information need to be collected about the car. Suppose we add the Weight of each car and the new data is presented in Table 2. Table 2. 8 Cars with {Weight, Door, Size Cylinder, Mileage} Tuple_id t1 t2 t3 t4 t5 t6 t7 t8
Weight low low med high high low high low
Door 2 4 4 2 4 4 4 2
Size compact Sub compact compact compact compact sub sub
Cylinder 4 6 4 6 4 4 6 6
Mileage high low high low low high low low
Based on the new data, we get the lower approximation, upper approximation and boundary as below: [Door Weight Size Cylinder] = {{t1},{t2},{t3},{t4}{t5}{t6},{t7},{t8}} Lower[Door Weight Size Cylinder]/[Mileage] = {t2, t4, t5 ,t7, t8, t1, t3, t6} Upper[Door Weight Size Cylinder]/[Mileage] = {t2, t4, t5 ,t7, t8, t1, t3, t6} Boundary[Door Weight Size Cylinder]/[Mileage] = Φ After the Weight information is added, then a classification model for all 8 cars can be built. One of the nice features of rough sets theory is that rough sets can tell whether the data is complete or not based on the data itself. If the data is incomplete, it suggests more information about the objects need to be collected in order to build a good classification model. On the other hand, if the data is complete, rough sets theory can also determine whether there are more than enough or redundant information in the data and find the minimum data needed for classification model. This property of rough sets theory is very important for applications where domain knowledge is very limited or data collection is very expensive/laborious because it makes sure the data collected is just good enough to build a good classification model without sacrificing the accuracy of the classification model or wasting time and effort to gather extra information about the objects. Furthermore, rough sets theory classifies all the attributes into three categories: core attributes, reduct attributes and dispensable attributes. Core attributes have the essential information to make correct classification for the data set and should be retained in the data set, dispensable attributes are the redundant ones in the data set and they should be eliminated while
reduct attributes are in the middle between. Depending on the combination of the attributes, in some combinations, a reduct attribute is not necessary while in other situation it is essential. Definition 4: An Attribute Cj∈C is a dispensable attribute in C with respect to D if Lower[C]/[D] = Lower[C-Cj]/[D] In Table 2, Lower[Door Weight Size Cylinder]/[Mileage] = Lower[Weight Size Cylinder]/[Mileage] , so Door is a dispensable attribute in C with respect to Mileage Definition 5: An Attribute Cj∈ C is a core attribute in C with respect to D if Lower[C]/[D] ≠ Lower[C-Cj]/[D] Lower[Door Weight Size Cylinder]/[Mileage] ≠ Lower[Dorr Size Cylinder]/[Mileage], Weight is a core attribute in C with respect to Mileage Definition 6: An attribute Cj∈ C is a reduct attribute if Cj is part of a reduct.
3 The New Rough Sets Model Based on Database Systems There are two major limitations of rough sets theory which restricts its suitability in practice: (1) Rough sets theory uses the strict set inclusion definition to define the lower approximation, which does not consider the statistical distribution/noise of the data in the equivalence class. This drawback of the original rough set model has limited its applications in domains where data tends to be noisy or dirty. Some new models have been proposed to overcome this problem such as Ziarko’s Varied Precise Rough Set Model (VPRS) [10] and our previous research work on Generalized Rough Set Model (GRS Model) [2]. A detailed discussion of these new models are beyond the scope of our paper, for interested readers, please refer to the reference papers [2,6,7,10]Another drawback of rough sets theory is the inefficiency in computation, which limits its suitability for large data sets. In order to find the reducts, core, dispensable attributes, rough sets need to construct all the equivalent classes based on the attribute values of the condition and decision attributes. This is a very time consuming process and is very inefficient and infeasible and doesn’t scale for large data set, which is very common in data mining applications. Our research investigation of the inefficiency problem of rough sets model finds out that rough set model does not integrate with the relational database systems and a lot of basic operations of these computations are performed in flat files rather than utilizing the high performance database set operations. In considering of this and influenced by [5], we borrow the main ideas of rough sets theory and redefine them in the database theory to utilize the very efficient set-oriented database operations. Almost all the operations in rough sets computation used in our method can be performed using the database Count, Projection etc. (In this paper, we use Card to denote the Count operation, Π for Projection operation). Below we first give our new definitions of core, dispensable and reduct based on database operations and then present our rough set based feature selection algorithm. Definition 7: An attribute Cj is a core attribute if it satisfies the condition Card(Π(C−Cj+D)) ≠Card(Π (C−Cj)), In Table 2, Card(Π(Door, Size, Cylinder, Mileage)) = 6, Card(Π(Door, Size, Cylinder)) = 5, so Weight is a core attribute in C with respect to Mileage. We can check whether attribute Cj is a core attribute by using some SQL operations. We only need to take two projections of the table: one on the attribute set C−Cj+D, and the
other on C−Cj. If the cardinality of the two projection tables is the same, then it means no information is lost in removing attributes Cj, otherwise, it indicates that Cj is a core attribute. Put it in a more formal way, using database term, the cardinality of two projections being compared will be different iff there exist at least two tuples tl and tk such that for any q ∈ C – Cj, s.t. tl.q = tk.q, tl.Cj ≠ tk.Cj and tl.D ≠ tk.D. In this case, a projection on C−Cj will be one fewer row than the projection on C−Cj+D because tl and tk being identical in C−Cj are being combined in this projection. However, in the projection C−Cj+D, tl, tk are still distinguishable. So eliminating attribute Cj will lose the ability to distinguish tuple tl and tk. Intuitively this means that some classification information is lost after Cj is eliminated For example, in Table 2, t5 and t6 have the same values on all the condition attributes except Weight; the two tuples belong to different classes because they are different on the value on Weight. If Weight is eliminated, then t5, t6 are indistinguishable. So Weight is a core attribute for the table. All the core attributes are indispensable part of every reduct. So it is very important to have a very efficient way to find all the core attributes in order to get the reduct, the minimum subset of the entire attributes. In traditional rough set models, a popular method to get the core attribute is to construct a decision matrix first, then search all the entries in the decision matrix to find all those entries with only one attribute. If the entry in the decision matrix contains only one attribute, that attribute is a core attribute [1]. This method is very inefficient and it is not realistic to construct a decision matrix for millions of tuples, which is a typical situation for data mining applications. We propose a new algorithm based on the database operations to get the core attributes of a decision table. Compared with the original rough set approach, our algorithm is efficient and scalable Algorithm 1: Core Attributes Algorithm Input: a decision table T(C,D) Output: Core –the core attribute of table T Set Core = ΦFor each attribute Ci ∈C { If Card(Π (C-Ci+D)) ≠ Card(Π (C-Ci)) Then Core = Core ∪ C } Definition 8: An attribute Cj∈C is a dispensable attribute with respect to D if the classification result of each tuple is not affected without using Cj. In database term, it means Card(Π(C−Cj+D))= Card(Π (C−Cj)) . This definition means that an attribute is dispensable if each tuple can be classified in the same way no matter whether the attribute is present or not. We can check whether attribute Cj is dispensable by using some SQL operations. We only need to take two projections of the table: one on the attribute set C−Cj+D, and the other on C−Cj. If the cardinality of the two projection tables is the same, then it means no information is lost in removing attributes Cj, otherwise, it indicates that Cj is relevant ad should be reinstated. For example, in Table 2, since Card(Π (Weight, Size, Cylinder, Mileage)) =6, Card(Π (Weight, Size, Cylinder)=6, so Door is a dispensable attribute in C with respect to Mileage. Definition 9: The degree of dependency K(REDU, D) between the attribute REDU ⊆ C and attribute D in decision table T(C,D) is K(REDU,D) = Card(Π(REDU+D))/ Card(C+D)
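Definition 7 and Algorithm 1 map directly onto counts of distinct projections (a COUNT over a DISTINCT projection in SQL terms). The sketch below is our own illustration in code, not the authors' implementation; the helper card_projection plays the role of Card(Π(...)).

```python
# Sketch of Algorithm 1: core attributes via Card(Pi(C - Cj + D)) vs Card(Pi(C - Cj)).
def card_projection(rows, attrs):
    """Number of distinct tuples in the projection of the table onto `attrs`."""
    return len({tuple(row[a] for a in attrs) for row in rows})

def core_attributes(rows, cond, dec):
    core = []
    for cj in cond:
        rest = [a for a in cond if a != cj]
        if card_projection(rows, rest + list(dec)) != card_projection(rows, rest):
            core.append(cj)        # dropping cj would lose classification information
    return core

# On Table 2 (condition attributes Weight, Door, Size, Cylinder; decision Mileage)
# this returns ["Weight"], in line with Definitions 5 and 7.
```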
The value K(REDU,D) is the proportion of these tuples in the decision table which can be classified. It characterizes the ability to predict the class D and the complement ¬D from tuples in the decision table. Definition 10: The subset of attributes RED (RED ⊆ C) is a reduct of attributes C with respect to D if it is a minimum subset of attributes which has the same classification power as the entire collection of condition attributes. K(RED, D) = K(C, D) K(RED, D) ≠ K(R’, D) ∀R’ ⊂ RED For example, for Table 2, there are two reducts: {Weight, Size} and {Weight, Cylinder} (in next section we will present the algorithm to find a reduct) Definition 11: The merit value of an attribute Cj in C is defined as Merit(Cj, C, D) = 1 – Card(Π(C-Cj+D))/Card((Π(C+D)). Merit(Cj, C, D) reflects the degree of contribution made by the attribute Cj only between C and D. For example, in Table 2,Card(Π(Door,Size,Cylinder,Mileage))=6, Card(Π(Door,Weight,Size,Cylinder,Mileage))=8, Merit(Weight,{Door,Weight,Size,Cylinder}, Mileage)=1-6/8=0.25
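Definition 11 likewise reduces to two projection counts; the following sketch is our own illustration, reading the denominator as Card(Π(C + D)).

```python
# Sketch of Definition 11: Merit(Cj, C, D) = 1 - Card(Pi(C - Cj + D)) / Card(Pi(C + D)).
def card_projection(rows, attrs):
    return len({tuple(row[a] for a in attrs) for row in rows})

def merit(rows, cj, cond, dec):
    rest = [a for a in cond if a != cj]
    return 1.0 - card_projection(rows, rest + list(dec)) / card_projection(rows, list(cond) + list(dec))

# On Table 2: merit(rows, "Weight", ("Weight", "Door", "Size", "Cylinder"), ("Mileage",))
# evaluates to 1 - 6/8 = 0.25, the value computed above.
```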
4 Rough Set Based Feature Selection All feature selection algorithms fall into two categories: (1) the filter approach and (2) the wrapper approach. In the filter approach, feature selection is performed as a preprocessing step to induction. Some of the well-known filter feature selection algorithms are RELIEF [4] and PRESET [8]. Filter approach is ineffective in dealing with the second feature redundancy. In the wrapper approach [3], the feature selection is “wrapped around” an induction algorithm, so that the bias of the operators that define the search and that of the induction algorithm interact. Though the wrapper approach suffers less from feature interaction, nonetheless, its running time would make the wrapper approach infeasible in practice, especially if there are many features because the wrapper approach keeps running the induction algorithm on different subsets from entire attributes until a desirable subset are identified. We intend to keep the algorithm bias as small as possible and would like to find a subset of attributes, which can generate good results by applying a suite of data mining algorithms. We focus on an algorithm-independent feature selection. Our goal is very clear: to have a reasonably fast algorithm that can find a relevant subset of attributes and eliminate the two kinds of unnecessary attributes effectively. With these considerations in mind, we proposed a rough set based filter feature selection algorithm. Our algorithm has many advantages over existing methods: (1) it is effective and efficient in eliminating irrelevant and redundant features with strong interaction, (2) it is feasible for applications with hundreds or thousands of features, and (3) it is tolerant to inconsistencies in the data. A decision table may have more than one reduct. Anyone of them can be used to replace the original table. Finding all the reducts from a decision table is NP-Hard [1]. Fortunately in many real applications, it is usually not necessary to find all of them, one is sufficient enough. A natural question is which reduct is the best if there are more than one reduct. The selection depends on the optimality criterion associated with the attributes. If it is possible to assign a cost function to attributes, then the
selection can be naturally based on the combined minimum cost criteria. In the absence of an attribute cost function, the only source of information to select the reduct is the contents of the table [10]. In this paper we adopt the criteria that the best reduct is the one with the minimal number of attributes and that if there are two or more reducts with the same number of attributes, then the reduct with the least number of combinations of values of its attributes is selected. In our algorithm we first rank the attribute based on the merit, then we adopt the backward elimination approach to remove those redundant attributes until a reduct is generated. Algorithm 2: Compute a minimal attribute subset (reduct) Input: A decision Table T(C,D) Output: A set of minimum attribute subset (REDUCT) 1. Run Algorithm 1 to get the core attributes of the table CO 2. REDU= CO; 3. AR = C – REDU 4. Compute the merit values for all attributes of AR; 5. Sort attributes in AR based on merit valuess in decreasing order 6. Choose a attribute Cj with the biggest merit values (if there are several attributes with the same merit value, choose the attribute which has the least number of combinations with those attributes in REDU) 7. REDU = REDU ∪{Cj}, AR = AR − {Cj} 8. If K(REDU, D) = 1, then terminate, otherwise go back to Step 4 There are a lot of algorithms developed to find a reduct, but all these algorithms suffer from the performance problem because they were not integrated into the relational database systems and all the related computation operations were performed on a flat file [6,7]. In our algorithm, all the calculations such as Core, merit values are utilizing the database set operations. Based on this algorithm we can get a reduct {Weight Size}. For each reduct, we can derive a reduct table from the original table. For example, the reduct table T4 based on reduct {Weight, Size} is created by projecting out the attributes Weight and Size from Table 2, which can still make a correct classification model. {Weigh Size} is a minimum subset and can’t reduce further without sacrificing the accuracy of the classification model. Suppose we create another table T5 from T4 by moving Size, it cannot correctly distinguish between tuples t1, t6 and tuples t2, t8 because these tuples have the same Weight values but belong to different classes which were distinguishable in the reduct table T4. Table 3. Reduct Table for {Weight Size} Tuple_id t 1, t 6 t 2, t 8 t3 t 4, t 5 t7
Weight low low med high high
Size compact sub compact compact sub
Mileage high low high low low
Table 4. Reducd Table for {Weight} Tuple_id t 1, t 6 t 2, t 8 t3 t4, t5, t7
Weight low low med high
Mileage high low high low
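For completeness, Algorithm 2 can be sketched on top of the previous fragments. This is our own reading, not the authors' code: the core is computed as in Algorithm 1, the stopping test K(REDU, D) = 1 is realized as "every REDU-equivalence class is pure in D", and candidates are ranked by the resulting dependency degree (rather than by the Definition 11 merit), with the paper's tie-break of preferring the attribute that yields the fewest distinct value combinations; under these choices the toy run reproduces the reduct {Weight, Size} of Table 3.

```python
# Sketch of Algorithm 2: greedy reduct construction starting from the core.
def card_projection(rows, attrs):
    return len({tuple(r[a] for a in attrs) for r in rows})

def dependency(rows, attrs, dec):
    """Fraction of tuples whose attrs-equivalence class is pure in the decision D."""
    classes = {}
    for r in rows:
        classes.setdefault(tuple(r[a] for a in attrs), []).append(tuple(r[a] for a in dec))
    return sum(len(v) for v in classes.values() if len(set(v)) == 1) / len(rows)

def reduct(rows, cond, dec):
    core = [c for c in cond
            if card_projection(rows, [a for a in cond if a != c] + list(dec))
            != card_projection(rows, [a for a in cond if a != c])]
    redu = list(core)
    while dependency(rows, redu, dec) < 1.0:
        rest = [a for a in cond if a not in redu]
        redu.append(max(rest, key=lambda c: (dependency(rows, redu + [c], dec),
                                             -card_projection(rows, redu + [c]))))
    return redu

# On Table 2: reduct(rows, ("Weight", "Door", "Size", "Cylinder"), ("Mileage",))
# -> ["Weight", "Size"].
```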
5 Conclusion In this paper we present a new database operation based rough set model. Most rough set models do not integrate with the databases systems, a lot of computational intensive operations such as generating core, reduct and rule induction are performed on flat file, which limit its applicability for large data set in data mining applications. We borrow the main ideas of rough sets theory and redefine them based on the database theory to take advantage of the very efficient set-oriented database operation. We present a new set of algorithms to calculate core, reduct based on our new database based rough set model. Our feature selection algorithm identifies a reduct efficiently and reduces the data set significantly without losing essential information. Almost all the operations used in generating core, reduct, etc in our method can be performed using the database set operations such as Count, Projection. Our new rough set model is designed based on database set operations, compared with the traditional rough set based data mining approach, our method is very efficient and scalable.
References
[1] Cercone, N., Ziarko, W., Hu, X., Rule Discovery from Databases: A Decision Matrix Approach, in Methodologies for Intelligent Systems, Ras, Z., Zemankova, M. (eds), 1996
[2] Hu, X., Cercone, N., Han, J., Ziarko, W., GRS: A Generalized Rough Sets Model, in Data Mining, Rough Sets and Granular Computing, T.Y. Lin, Y.Y. Yao and L. Zadeh (eds), Physica-Verlag
[3] John, G., Kohavi, R., Pfleger, K., Irrelevant Features and the Subset Selection Problem, in Proc. ML-94, 1994
[4] Kira, K., Rendell, L.A., The Feature Selection Problem: Traditional Methods and a New Algorithm, in Proc. AAAI-92
[5] Kumar, A., A New Technique for Data Reduction in a Database System for Knowledge Discovery Applications, Journal of Intelligent Systems, 10(3)
[6] Lin, T.Y., Yao, Y.Y., Zadeh, L. (eds), Data Mining, Rough Sets and Granular Computing, Physica-Verlag, 2002
[7] Lin, T.Y., Cao, H., Searching Decision Rules in Very Large Databases Using Rough Set Theory, in Rough Sets and Current Trends in Computing, Ziarko & Yao (eds)
[8] Modrzejewski, M., Feature Selection Using Rough Sets Theory, in Proc. ECML-93
[9] Pawlak, Z., Rough Sets, International Journal of Information and Computer Science, 11(5), 1982
[10] Ziarko, W., Variable Precision Rough Set Model, Journal of Computer and System Sciences, Vol. 46, No. 1, 1993
A Rough Set and Rule Tree Based Incremental Knowledge Acquisition Algorithm Zheng Zheng, Guoyin Wang, and Yu Wu Institute of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065,P.R.China
[email protected] Abstract. As a special way of human brains in learning new knowledge, incremental learning is an important topic in AI. It is an object of many AI researchers to find an algorithm that can learn new knowledge quickly based on original knowledge learned before and the knowledge it acquires is efficient in real use. In this paper, we develop a rough set and rule tree based incremental knowledge acquisition algorithm. It can learn from a domain data set incrementally. Our simulation results show that our algorithm can learn more quickly than classical rough set based knowledge acquisition algorithms, and the performance of knowledge learned by our algorithm can be the same as or even better than classical rough set based knowledge acquisition algorithms. Besides, the simulation results also show that our algorithm outperforms ID4 in many aspects.
1 Introduction Being a special intelligent system, human brains have super ability in learning and discovering knowledge. It can learn new knowledge incrementally, repeatedly and increasedly. And this learning and discovering way of human is essential sometimes. For example, when we are learning new knowledge in an university, we need not to learn the knowledge we have already learned in elementary school and high school again. We can update our knowledge structure according to new knowledge. Based on this understanding, AI researchers are working hard to simulate this special learning way. Schlimmer and Fisher developed a decision tree based incremental learning algorithm ID4 [1, 2]. Utgoff designed ID5R algorithm [2] that is an improvement of ID4. G.Y. Wang developed several parallel neural network architectures (PNN’s) [3, 4, 5], Z.T. Liu presented an incremental arithmetic for the smallest reduction of attributes [6], and Z.H. Wang developed an incremental algorithm of rule extraction based on concept lattice [7], etc. In recent years, many rough set based algorithms about the smallest or smaller reduction of attributes and knowledge acquisition are developed. They are almost based on static data. However, real databases are always dynamic. So, many researchers suggest that knowledge acquisition in databases should be incremental. Incremental arithmetic for the smallest reduction of attributes [6] and incremental
algorithm of rule extraction based on concept lattice [7] have been developed, but there are few incremental rough set based algorithms about knowledge acquisition. On the basis of former results, we develop a rough set and rule tree based incremental knowledge acquisition algorithm (RRIA) in this paper. Simulation results show that our algorithm can learn more quickly than classical rough set based knowledge acquisition algorithms, and the performance of knowledge learned by our algorithm can be the same as or even better than classical rough set based knowledge acquisition algorithms. Besides, we compare our algorithm with ID4 algorithm. The results show that the rule quality and the recognition rate of our algorithm are both better than ID4.
2 Basic Concepts of Decision Table For the convenience of description, we introduce some basic notions at first. Definition 1. A decision table is defined as S=, where U is a finite set of objects and R=C D is a finite set of attributes. C is the condition attribute set and D is the decision attribute set. With every attribute a ³ R,set of its values Va is associated. Each attribute has a determine function f : U×R→V. Definition 2. Let S= < U,R,V,f > denote a decision table, and let B ⊆ C. Then a rule set generated from S is defined as F={ f 1 DB , f 2 DB ,..., f r DB }, where
f_i^{d_B} = { Σ_d (a → d) | a ∈ C and d ∈ D }  (i = 1, …, r)

and r is the number of rules in F. In f_i^{d_B}, if some attributes are reduced, then the values of the reduced attributes are taken to be “*”, which is different from any possible value of these attributes. For example, in a decision system with 5 condition attributes (a1, …, a5), (a1=1) ∧ (a3=2) ∧ (a4=3) → d=4 is a rule. In this paper, we write it as (a1=1) ∧ (a2=*) ∧ (a3=2) ∧ (a4=3) ∧ (a5=*) → d=4 according to Def. 2.
3 Related Incremental Algorithms In order to compare our algorithm with other related algorithms, we introduce and discuss some decision table based incremental algorithms at first. 3.1 ID4 Algorithm [1, 2] Based on the concept of decision tree, Schlimmer and Fisher designed an incremental learning algorithm ID4. Decision trees are the tree structures of rule sets. They have higher speed of searching and matching than ordinary rule set. Comparing with ordinary rule set, a decision tree has unique matching path. Thus, using decision tree can avoid confliction of rules and accelerate the speed of searching and matching. However, there are also some drawbacks of decision tree. After recursive partitions, some data is too little to express some knowledge or concept. Besides, there are also some problems of the constructed tree, such as overlap, fragment and replication [12]. The results of our experiments also show that the rule set expressed by decision tree
has lower recognition rate than most of the rough set based knowledge acquisition algorithms. So, we should find a method that has the merit of decision tree and avoid its drawback. 3.2 ID5R Algorithm [2] In 1988, to improve ID4 algorithm’s learning ability, Utgoff developed ID5 decision algorithm and afterwards the algorithm is updated to ID5R. The decision trees generated by ID5R algorithm are the same as those generated by ID3. Each node of decision tree generated by ID5R must save a great deal of information, so the space complexity of ID5R is higher than ID4. Besides, when the decision tree’s nodes are elevated too many times, the algorithm’s time complexity will be too high. 3.3 Incremental Arithmetic for the Smallest Reduction of Attributes [6] Based on the known smallest reduction of attributes, this algorithm searches the new smallest reduction of attributes when some attributes are added to the data set. 3.4 Incremental Algorithm of Rule Extraction Based on Concept Lattice [7] Concept lattice is an effective tool for data analysis and rule generation. Based on concept lattice, it’s easy to establish the dependent and causal relation model. And concept lattice can clearly describe the extensive and special relation. This algorithm is based on the incremental learning idea, but it has some problems in knowledge acquisition: first, it’s complexity is too high; second, it takes so much time and space to construct the concept lattice and Hasse map; third, rule set cannot be extracted from concept lattice directly.
4 Rough Set and Rule Tree Based Incremental Knowledge Acquisition Algorithm (RRIA) Based on the above discussion about incremental learning algorithms and our understanding of incremental learning algorithms, we develop a rough set and rule tree based incremental knowledge acquisition algorithm in this section. In this part, we introduce the concepts of rule tree at first and present our incremental learning algorithm then. At last, we analyze the algorithm’s complexity and performance. For the convenience of description, we might suppose that OTD represents the original training data set in our algorithm, ITD represents the incremental training data set, ORS represents the original rule set and ORT represents the original rule tree. 4.1 Rule Tree 4.1.1 Basic Concept of Rule Tree Definition 3. A rule tree is defined as follows: (1) A rule tree is composed of one root node, some leaf nodes and some middle nodes.
(2) The root node represents the whole rule set. (3) Each path from the root node to a leaf node represents a rule. (4) Each middle node represents an attribute testing. Each possible value of an attribute in a rule set is represented by a branch. Each branch generates a new child node. If an attribute is reduced in some rules, then a special branch is needed to represent it and the value of the attribute in this rule is supposed as “*”, which is different from any possible values of the attribute (Ref. Definition 2). 4.1.2 Algorithms for Building Rule Tree Algorithm 1: CreateRuleTree(ORS) Input: ORS; Output: ORT. Step 1. Arrange the condition attributes in ascending of the number of their values in ORS. Then, each attribute is the discrenibility attribute of each layer of the tree from top to bottom; Step 2. For each rule R of ORS {AddRule(R)} Algorithm 2: AddRule(R) Input: a rule tree and a rule R; Output: a new rule tree updated by rule R. Step 1. CN←root node of the rule tree; Step 2. For i=1 to m /*m is the number of layers in the rule tree*/ { If there is a branch of CN representing the ith discernibility attribute value of rule R, then CN←node I;/*node I is the node generated by the branch*/ else {create a branch of CN to represent the ith discernibility attribute value; CN←node J /*node J is the node generated by the branch*/}} We suppose the attribute with fewer number of attribute values is the discernibility attribute in higher layer of a rule tree. By this supposition, the number of nodes in each layer should be the least, and the searching nodes should also be the least and it could speed up the searching and matching process. 4.2 Rough Set and Rule Tree Based Incremental Knowledge Acquisition Algorithm (RRIA) Algorithm 3: Rough set and rule tree based incremental knowledge acquisition algorithm Input: OTD, ITD={ object1, object2, … , objectt}; Output: a rule tree. Step 1. Use rough set based algorithm to generate ORS from OTD; Step 2. ORT=CreateRuleTree(ORS); Step 3. For i=1 to t (1) CN←root node of ORT; (2) Create array Path[]; /*Array Path records the current searching path in MatchRule() and the initial value of Path is NULL*/ (3) MatchRules=MatchRule(CN, objecti, Path); (4) R=SelectRule (MatchRules); (5) If R exists and the decision value of R is different from objecti’s, then UpdateTree(objecti,R); /*R is conflict with objecti*/ (6) If R doesn’t exit then AddRule(objecti).
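The following Python sketch illustrates the rule tree of Definition 3 together with Algorithms 1 and 2 (and the wildcard matching used later in Algorithm 4). It is a simplified reading of the pseudocode rather than the authors' implementation: only attributes that occur in some rule become layers, the "*" branch stands for a reduced attribute, and the class and attribute names are ours.

```python
class RuleTree:
    """Rules are stored as root-to-leaf paths; '*' marks an attribute reduced out of a rule."""

    def __init__(self, rules):
        # Algorithm 1: order the condition attributes by their number of distinct
        # values, fewest first, so each layer of the tree stays as small as possible.
        values = {}
        for cond, _ in rules:
            for attr, v in cond.items():
                values.setdefault(attr, set()).add(v)
        self.attrs = sorted(values, key=lambda a: len(values[a]))
        self.root = {}
        for rule in rules:
            self.add_rule(rule)

    def add_rule(self, rule):
        # Algorithm 2: walk layer by layer, creating a branch per attribute value.
        cond, decision = rule
        node = self.root
        for attr in self.attrs:
            node = node.setdefault(cond.get(attr, "*"), {})
        node["=>"] = decision

    def match(self, obj):
        # Sketch of Algorithm 4: follow the branch for the object's value or the '*' branch.
        found = []
        def walk(node, depth):
            if depth == len(self.attrs):
                found.append(node["=>"])
                return
            for v in (obj[self.attrs[depth]], "*"):
                if v in node:
                    walk(node[v], depth + 1)
        walk(self.root, 0)
        return found

rules = [({"a1": 1, "a3": 2, "a4": 3}, 4)]        # (a1=1) ^ (a3=2) ^ (a4=3) -> d=4
tree = RuleTree(rules)
print(tree.match({"a1": 1, "a2": 7, "a3": 2, "a4": 3, "a5": 0}))   # [4]
```

The dictionary-per-node layout keeps lookups to one branch test per layer, which is the speed-up over a flat rule list that the paper argues for.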
Where, algorithm AddRule is described in section 4.1 and algorithms MatchRule, SelectRule and UpdateTree will be further illustrated in the following section. 4.3 Child Algorithms of RRIA In the incremental learning process, we need match the object to be learned incrementally with rules of ORT at first. The matching algorithm is as follows. Algorithm 4: MatchRule(CN, Obj, Path) Input: the current matched node CN, the current object Obj to be learned incrementally, the current searching path Path and ORT; Output: MatchRules. /*which records the matched rules of Obj in ORT.*/ Step 1. matchedrulenum=0; Step 2. For I=1 to number of child nodes of CN {If the branch generating child node I represents the current discernibility attribute value of Obj or represents a possible reduced attribute, then {If node I is a leaf node, then { MatchRules[matchedrulenum++]←Path; path=NULL} else { CN←node I; Path[layerof(I)] ←value(I); /*The result of value(I) is the current discernibility attribute value that the branch generating node I represents and the result of layerof(I) is the layer that node I is in ORT. “*” represents a possible reduced attribute value(Ref. Definition 2)*/ MatchRule(node I, Obj, Path)}} else Path=NULL} /* If neither of these two kinds of child nodes exist */ This algorithm is recursive. The maximal number of nodes in ORT to be checked is m×b, and the complexity of this algorithm is O(mb). There maybe more than one rules in MatchRules. There are several methods to select the best rule from MatcheRules, such as High confidence first principle [8], Majority principle [8] and Minority principle [8]. Here we design a new principle, that is Majority Principle of Incremental Learning Data, which is more suitable for our incremental learning algorithm. Algorithm 5: SelectRule(MatchRules) Input: the output of algorithm MatchRule, that is, MatchRules; Output: the final matched rule. During the incremental learning process, we consider that the incremental training data is more important than the original training data. The importance degree of a rule will be increased by 1 each time it is matched to an unseen object to be learned in the incremental learning process. We choose the rule with the highest importance degree to be the final matched rule. If we can’t find the rule absolutely matched to the object to be learned, but we can find a rule R matching its condition attribute values, that is , the object to be learned is conflict with rule R, then we should update the rule tree. The algorithm is as follows. Suppose {C1, C2, …, Cp} is the result of arranging the condition attributes reduced in R in ascending of the number of their values. Algorithm 6: UpdateTree(Obj, R) Input: the current object Obj to be learned incrementally, rule R, ORT, OTD and ITD; Output: a new rule tree.
Step 1. Check all objects of OTD and the objects that have been incrementally learned in ITD and suppose object1, object2, …, objectq are all objects matched to rule R; Step 2. For i=1 to q { dis_num[i]=0; For j=1 to p {mark[i][j]=0}} Step 3. For i=1 to q For j=1 to p If Obj and objecti have different values on attribute Cj, then { mark[i][j]=1; dis_num[i]++} Step 4. For i=1 to p { delattr[i]=Ci} Step 5. For i=1 to q { If dis_num[i]=0, then { delattr=NULL; go to Step 7} else If dis_num[i]=1, then { j=1; While (mark[i][j++] ≠ 1); delattr=delattr-{Cj}}} Step 6. For i=1 to q { discerned=FALSE; If dis_num[i]>1, then { j=1; While (discerned=FALSE) and (j 0 is the number of information source sets (Ψi ): H sources to consider. As an example, consider the case of a domain where objects are characterized by attributes given by continuous crisp quantities, discrete features, fuzzy features, graphs and digital images. Let R be the reals with the ˆ = R ∪ {?} to be a source set and usual ordering, and R ⊆ R. Now define R extend the ordering relation to a partial order accordingly. Now let N be the set of natural numbers and consider a family of nr sets (nr ∈ N+ = N − {0}) given ˆ nr = R ˆ1 × · · · × R ˆ n (nr times) where each R ˆ j (0 ≤ j ≤ nr ) is constructed by R r 0 ˆ ˆ as R, and define R = φ (the empty set). Now, if Oj is a family of ordinal source sets (with the corresponding ordering relation), Nj a family of nominal variables, Fj a collection of fuzzy sets, Gj of graphs, and of digital images, Ij , and the same procedure is applied, a heterogeneous domain is constructed ˆn = R ˆ nr × O ˆ no × N ˆ nm × Fˆ nf × Gˆng × Iˆ ni . Other kinds of heterogeneous as H domains can be constructed in the same way, using the appropriate source sets. In more general information systems the universe is endowed with a set of relations of different arities. Let t =< t1 , . . . , tp > be a sequence of p natural integers, called type, and Y =< Y, γ1 , . . . , γp > the extended information system will be Sˆ =< U, A, Γ >, endowed with the relational system U =< U, Γ >. A virtual reality space is a structure composed by different sets and functions defined as Υ =< O, G, B, m , go , l, gr , b, r >. O is a relational structure defined as above (O =< O, Γ v > , Γ v =< γ1v , . . . , γqv >, q ∈ N+ and the o ∈ O are objects), G is a non-empty set of geometries representing the different objects and relations (the empty or invisible geometry is a possible one). B is a nonempty set of behaviors (i.e. ways in which the objects from the virtual world will express themselves: movement, response to stimulus, etc. ). m ⊂ Rm is a metric space of dimension m (the actual virtual reality geometric space). The other elements are mappings: go : O → G, l : O → m , gr : Γ v → G, b : O → B, r is a collection of characteristic functions for Γ v , (r1 , . . . , rq ) s.t. ri : γiv ti → {0, 1}, according to the type t associated with Γ v . The representation of an extended
information system Sˆ in a virtual world requires the specification of several sets and a collection of extra mappings: Sˆv =< O, Av , Γ v >, O in Υ , which can be done in many ways. A desideratum for Sˆv is to keep as many properties from Sˆ as possible. Thus, a requirement is that U and O are in one-to-one correspondence (with a mapping ξ : U → O). The structural link is given by a ˆ n → m . If u =< fa (u), . . . , fa (u) > and ξ(u) = o, then l(o) = mapping f : H 1 n f (ξ(< fa1 (u), . . . , fan (u) >)) =< fav1 (o), . . . , favm (o) > (favi are the evaluation functions of Av ). It is natural to require that Γ v ⊆ Γ , thus having a virtual world portraying selected relations from the information system. Function f can be constructed as to maximize some metric/non-metric structure preservation criteria as is typical in multidimensional scaling [1], or minimize some error measure of information loss [8], [4].
3 Examples
Clearly, a VR environment can not be shown on paper, and only simplified, grey level screen snapshots from two examples are shown just to give an idea. The VR spaces were kept simple in terms of the geometries used. The f transform used was Sammon error, with ζij given by the Euclidean distance in Υ and δij = (1 − sˆij )/ˆ sij , where sˆij is Gower’s similarity [3]. For genomic research in neurology, time-varying expression data for 2611 genes in 8 time attributes were measured. Fig-1(a) shows the representation in Υ of the information system and the result of a previous rough k-means clustering [5]. Besides showing that there is no differentiated class structure in this data, the undecidable region between the two enforced classes is perfectly clear. The rough clustering parameters were k = 2, ωlower = 0.9, ωupper = 0.1 and threshold = 1. The small cluster encircled at the upper right, contains a set of genes discovered when examining the VR space and was considered interesting by the domain experts. This pattern remained
Fig. 1. VR spaces of (a) a genomic data base (with rough clusters), and (b) a geologic data base with decision rules built with rough set methods.
unnoticed since it was masked by the clustering procedure (its objects were assigned to the nearby bigger cluster). When data sets and decision rules are combined, the information systems are of the form S =< U, A {d} >, Sr =< R, A {d} > (for pthe rules), where {d} is the decision attribute. Decision rules are of the form , i=1 (Aτi = vητii ) → (d = v d j ) , where the Aτi ⊆ A, the vητii ∈ Vτi and v d j ∈ Vd . The sˆij used for δij in A, was given by: sˆij = 1˘ ωij a∈A˘ (ωij · sij ), where: A˘ = Au if i, j ∈ U , a∈ A Ar if i, j ∈ R and Au Ar if i ∈ U and j ∈ R. The s, ω functions are defined as: sij = 1 if fa (i) = fa (j) and 0 otherwise, ωij = 1 if fa (i), fa (j) = ?, and 0 otherwise. The example presented is the geo data set [2]. The last attribute was considered the decision attribute and the rules correspond to the very fast strategy, giving 99% accuracy. The join VR space is shown in Fig-1(b) where objects are spheres and rules cubes, respectively. According to RSL results, Rule 570 is supported by object 173, and they appear very close in Υ . Also, data objects 195 and 294 are very similar and they appear very close in Υ .
References 1. Borg, I., Lingoes, J.: Multidimensional Similarity Structure Analysis. SpringerVerlag 1987. 2. Gawrys, M., Sienkiewicz, J. : Rough Set Library User’s Manual (version 2.0). Inst. of Computer Science. Warsaw Univ. of Technology (1993) 3. Gower, J.C.: A General Coefficient of Similarity and Some of its Properties. Biometrics Vol.1 No. 27 (1973) pp. 857–871 4. Jianchang, M., Jain, A. : Artificial Neural Networks for feature Extraction and Multivariate Data Projection. IEEE Trans. On Neural Networks. Vol. 6, No. 2 (1995) pp. 296–317 5. Lingras, P., Yao, Y. : Time Complexity of Rough Clustering: GAs versus K-Means. Third. Int. Conf. on Rough Sets and Current Trends in Computing RSCTC 2002. Malvern, PA, USA, Oct 14–17. Alpigini, Peters, Skowron, Zhong (Eds.) Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence Series) LNCS 2475, pp. 279–288. Springer-Verlag , 2002 6. Pawlak, Z. : Theoretical Aspects of Reasoning About Data. Kluwer Academic Publishers, Dordrecht, Netherlands. (1991) 7. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Series in Machine Learning (1992) 8. Sammon, J. W. A non-linear mapping for data structure analysis. IEEE Trans. Computers, C-18, 401–408, (1969) 9. Vald´es, J.J: Virtual Reality Representation of Relational Systems and Decision rules: an exploratory tool for understanding data structure. In TARSKI: Theory and Application of Relational Structures as Knowledge Instruments. Meeting of the COST Action 274, Book of Abstracts. Prague, Nov. 14–16, (2002) 10. Vald´es, J.J: Similarity-based Heterogeneous Neurons in the Context of General Observational Models. Neural Network World. Vol 12., No. 5, (2002) pp. 499–508
Hierarchical Clustering Algorithm Based on Neighborhood-Linked in Large Spatial Databases Yi-hong Dong Department of Computer Science, Ningbo University, Ningbo 315211, China
[email protected] Abstract. A novel hierarchical clustering algorithm based on neighborhoodlinked is proposed in this paper. Unlike the traditional hierarchical clustering algorithm, the new model only adopts two steps: clustering primarily and merging. The algorithm can be performed in high-dimensional data set, clustering the arbitrary shape of clusters. Furthermore, not only can this algorithm dispose the data with numeric attributes, but with boolean and categorical attributes. The results of our experimental study in data sets with arbitrary shape and size are very encouraging. We also conduct an experimental study with web log files that can help us to discover the use access patterns effectively. Our study shows that this algorithm generates better quality clusters than traditional algorithms, and scales well for large spatial databases.
1 Introduction A traditional hierarchical method[1] is made up of partitioning method in different layers, which uses iterative partition between layers. The layers are constructed by merge or split approaches. Once a group of objects are merged or split, the process at the next step will operate on the newly generated cluster. It will neither undo what was done previously, nor perform object swapping between clusters. BIRCH does not perform well to the non-spherical in shape, while CURE can not deal with data with boolean or categorical attributes. We suggest a new clustering algorithm-HIerarchical Clustering Algorithm based on NEighborhood-Linked(Hicanel) which can find clusters of arbitrary shape and size, identify the data set with both boolean and categorical attributes. It is a fast and high efficient clustering algorithm.
2 Hierarchical Algorithm Based on Neighborhood-Linked Comparing with the common hierarchical method, Hicanel method uses only two steps: clustering primarily to the objects in data set by partitioning method to get several sub-clusters, then merges sub-clusters linked compactly. This merging step is a one-step process other than layer by layer.
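A minimal Python sketch of this merging step is given below, anticipating the neighbourhood and link definitions of the next section: every point that lies in the ε-neighbourhood of two sub-cluster centroids adds one link between them, weak edges are cut with a threshold (min_links, a parameter introduced here only for illustration), and the connected components of the remaining graph are the final clusters.

```python
import numpy as np
from itertools import combinations

def merge_subclusters(points, centroids, labels, eps, min_links):
    """Merge sub-clusters whose epsilon-neighbourhoods share many points (links)."""
    k = len(centroids)
    weight = np.zeros((k, k))
    for p in points:
        near = [i for i, c in enumerate(centroids) if np.linalg.norm(p - c) < eps]
        for i, j in combinations(near, 2):        # p is a link between sub-clusters i and j
            weight[i, j] += 1
            weight[j, i] += 1
    # cut weak edges, then take connected components with a small union-find
    parent = list(range(k))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in combinations(range(k), 2):
        if weight[i, j] >= min_links:
            parent[find(i)] = find(j)
    return [find(labels[idx]) for idx in range(len(points))]
```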
Fig. 1. An Example
We can easily detect clusters of points when looking at the sample sets of points depicted in Figure 1a. In our method, we regard the cluster with arbitrary shape as the combination of many sub-clusters. The data set is divided into some sub-clusters with similar size(Figure 1b), the combination of which is the cluster we want to discover. Either convex shape or concave can we regard as the combination of many subclusters, so we can distinguish any clusters with arbitrary shape and size. In conclusion, we divide the data set into many sub-clusters, then merge them into some clusters. It is easy to find sub-clusters, but the key to get the clusters is to merge the subclusters effectively and efficiently. Hicanel algorithm proposed merges the compact sub-clusters after analyzing the neighbor and links among sub-clusters. Definition 1:( ¦-neighborhood of a point) ¦-neighborhood of a point is an area with point p as center and r as radius which is called neighborhood p, denoted by Neg(p). To the data of numeric attribute, it is define by Neg(p)={q D|dist(p,q)=U}, where sim(p,q) is Jaccard coefficient. Definition 2: (Link) Point x is a link between neighborhood p and neighborhood q, if and only if x is not only in the neighborhood of point p but in that of point q. We use the signal link(p,q) to denote this relationship. Link(p,q)= {x|x Neg(p),x Neg(q)}. The whole data sets can be well described by joint constitutive graph of links after clustering primarily as following: node represents the centroid of every sub-cluster, edge represents the link of the sub-clusters, and weight represents the link number of the sub-clusters. Database should be rescanned once more to judge the distance between every object and each centroid of sub-clusters. If the object is in several ¦ neighborhoods, weights between these nodes should be added by increment. The composition is finished after whole data sets rescanned. After every object being scanned, we are going to cut off the graph. When the cutting finishing, the unconnected graph is formed and the connected branches, each of which represents a cluster, are the clustering result. Figure 1c is the joint constitutive graph of the subclusters for the data sets of Figure 1a after clustering primarily. We can easily discover that node O1,O2,…,O8 form one cluster and O9 and O10 belong to the other. Unsupervised clustering algorithm[2](UC) is used to clustering primarily in our algorithm which can identify the clusters with similar shape and size to search sub2 2 clusters. Total time complexity is O(2nk+nk +k ) including the complexity of UC algorithm. K is regarded as a constant owing to k ,L , < url n , t n > } , where ti means in the frequent sequence pattern set S, associated the pageview represented urli, whose average visited time is ti. It will be used to determine the weight of user’s attention to the page. 2.2 Content Preprocessing Content Preprocessing analysis the association between subjects and pageviews. Web page automatic marked method based Ontology is used to discover the subject of the pageview[4]. The preprocessing task is to obtain a set p(σ 1 , µ1 ;σ 2 , µ 2 L;σ i , µ i ) according to a pageview. Where µ i is the weight represents the correlation of subject σ i and pageview p.
2.3 Interest Navigation Model Based HMM In the patterns, we can confirm the users have more interest in finding some useful information about the subject and not any other. These patterns can be regarded as the common interest frequent navigation patterns according to the subject. 2.3.1 Computing the Transition Probability and Attention Probability of a Subject on a Pageview Definition1: The one-step transition probability of two pages in set S:
P(q_i → q_j) ≈ count(q_i → q_j) / count(q_i)

Here q_i → q_j means that q_i and q_j are linked directly by a hyperlink. count(q_i → q_j) is the number of transactions in S in which users go from q_i to q_j in one step, and count(q_i) is the number of transactions in S that contain q_i. Definition 2: The attention probability of a keyword σ, which the user pays to the page q_i:
p(σ | qi ) = µ × t ’i
Here µ is the weight of the subject σ in the page content, computed in the content preprocessing step, and t'_i is the ratio of the time the user spends on the page to the user's total visit time. p(σ | q_i) represents the degree of attention the user pays to the subject σ on the page q_i. Definition 3: the interest association rule R(σ | S_k): given an interest frequent navigation pattern S_k and a subject σ such that R(σ | S_k) ≥ C (C is a given confidence threshold), where

R(σ | S_k) = Π_{i=1}^{k−1} ( P(q_i → q_{i+1}) × P(σ | q_{i+1}) )
2.3.2 Defining the Interest Navigation Model
1. A pageview is a state q in the HMM.
2. There is a subject set Σ = {σ_1, σ_2, …, σ_M}; σ_i is a subject on a pageview.
3. A pageview includes a subset (σ_1', …, σ_m') of Σ.
4. q and q' that are linked directly by hyperlink have a transition probability P(q_i → q_j).
S k := S k +1 + ( s, q );
R k +1 := R k +1 + R (σ | ( s, q )); j:=1; End if; End for; End For; k:=k+1; End While; End. Output:
R k ,k=1,…,n
3 Conclusions and Future Work Our approach is essentially a recommendation approach based on web usage mining and web content mining. Through mining the user transaction record and the subjects on the pageviews, we recommend the mining result in order to accelerate the user’s access. We firstly use the HMM to analyze the common interest navigation patterns. This approach resolves the self-adaptation problem of the web site. In the approach, building the HMM doesn’t require the complex training process, the transition probability and the common attention probability to a subject on a pageview are easily calculated. There are some characteristics in our approach. 1) It is a kind of optimizing approach. 2) The mining object is the interactive action and the common interest, and the mining result faces up to the total users. 3) The discovered navigation pattern is not always has direct hyperlink with each other. The next step of our works not only explores the recommendation approaches but also explores the prediction approaches. So we can predict the interest of the users in the future.
References 1. Cooley R, Mobasher B et al. Grouping Web page references into transactions formining World Wide Web browsing patterns. In: Proc Knowledge and Data Engineering Workshop, Newport Beach, CA, 1997.245–253 2. Rabiner,L.R. A tutorial on hidden Markov models and selected applications in speech recognition. In: Proceedings of the IEEE 77(2), New York, 1989, 257–286 3. Rabiner,L.R. A tutorial on hidden Markov models and selected applications in speech recognition. In: Proceedings of the IEEE 77(2), New York, 1989, 257–286 4. I. Horrocks, et al., The Ontology Interchange Language: OIL, tech. report, Free Univ. of Amsterdam, 2000
Successive Overrelaxation for Support Vector Regression Yong Quan, Jie Yang, and Chenzhou Ye Inst. of Image Processing & Pattern Recognition, Shanghai Jiaotong Univ. Shanghai 200030, People’s Republic of China Correspondence: Room 2302B, No.28, Lane 222, fanyu Road, Shanghai City, 200052, People’s Republic of China
[email protected] Abstract. Support vector regression (SVR) is an important tool for data mining. In this paper, we first introduce a new way to make SVR have the similar mathematic form as that of support vector classification. Then we propose a versatile iterative method, successive overrelaxation, for the solution of extremely large regression problems using support vector machines. Experiments prove that this new method converges considerably faster than other methods that require the presence of a substantial amount of the data in memory. Keywords: Support Vector Regression;Support Vector Machine, Successive Overrelaxation, data mining
In this work, we propose a new way to give SVR the same mathematical form as support vector classification, and derive a generalization of SOR to handle regression problems. Simulation results indicate that this modification of SOR for regression problems yields dramatic runtime improvements.
1 Successive Overrelaxation for Support Vector Machine Given a training set of instance-label pairs
(x_i, y_i), i = 1, …, l, where x_i ∈ R^n and y ∈ {1, −1}, Mangasarian outlines the key modifications from standard SVM to RSVM [1]. Suppose a is the solution of the dual optimization problem [2]. Choose ω ∈ (0, 2). Start with any a^0 ∈ R^l. Having a^i, compute a^{i+1} as follows:

a^{i+1} = ( a^i − ωD^{-1}( Aa^i − e + L(a^{i+1} − a^i) ) )_*        (1)

where (·)_* denotes the 2-norm projection on the feasible region of (1), that is,

(a_i)_* = 0 if a_i ≤ 0;  a_i if 0 < a_i < C;  C if a_i ≥ C,  i = 1, …, l.
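Update (1) can be sketched in a few lines of NumPy: the projection is just a componentwise clamp to [0, C], and sweeping the components in place realizes the L(a^{i+1} − a^i) correction. D is taken here to be the diagonal of A, by analogy with the splitting used for regression in Section 2.2; A and e are assumed to be given.

```python
import numpy as np

def project(a, C):
    """(.)_* : componentwise projection onto the box [0, C]."""
    return np.minimum(np.maximum(a, 0.0), C)

def sor_sweep(a, A, e, omega, C):
    """One pass of update (1). Updating a in place means components k < j already
    hold a^{i+1}, which is exactly the L(a^{i+1} - a^i) term of the splitting."""
    D = np.diag(A)
    for j in range(len(a)):
        a[j] = project(a[j] - omega / D[j] * (A[j] @ a - e[j]), C)
    return a
```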
2 Simplified SVR and Its SOR Algorithm
Most of the existing training methods were originally designed to be applicable only to SVM classification. In this paper, we propose a new way to give SVR the same mathematical form as support vector classification, and derive a generalization of SOR to handle regression problems.
2.1 Simplified SVR Formulation
Similar to [3], we also introduce an additional term b^2/2 into the SVR objective. Hence we arrive at the formulation

min  (1/2) w^T w + b^2/2 + C Σ_{i=1}^{l} (ξ_i + ξ_i*)
s.t. y_i − w·ϕ(x_i) − b ≤ ε + ξ_i
     w·ϕ(x_i) + b − y_i ≤ ε + ξ_i*
     ξ_i, ξ_i* ≥ 0,  i = 1, …, l                                        (2)

The solution of (2) can be transformed into the dual optimization problem

min  (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} (a_i − a_i*)(a_j − a_j*) ϕ(x_i)·ϕ(x_j)
     + (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} (a_i − a_i*)(a_j − a_j*)
     + ε Σ_{i=1}^{l} (a_i + a_i*) − Σ_{i=1}^{l} (a_i − a_i*) y_i
s.t. a_i, a_i* ∈ [0, C],  i = 1, …, l                                   (3)

The main reason for introducing our variant (2) of the SVR is that its dual (3) does not contain an equality constraint, as does the dual optimization problem of the original SVR. This enables us to apply in a straightforward manner the effective matrix splitting methods such as those of [3] that process one constraint of (2) at a time through its dual variable, without the complication of having to enforce an equality constraint at each step on the dual variable a. This permits us to process massive data without bringing it all into fast memory. Define

Z = (d_1 ϕ(x_1), …, d_l ϕ(x_l), d_{l+1} ϕ(x_{l+1}), …, d_{2l} ϕ(x_{2l}))^T, a 2l×1 vector,
a = (a_1, …, a_l, a_1*, …, a_l*)^T, a 2l×1 vector,
c = (y_1 − ε, …, y_l − ε, −y_1 − ε, …, −y_l − ε)^T, a 2l×1 vector,
d = (1, …, 1, −1, …, −1)^T, a 2l×1 vector,
H = ZZ^T,  E = dd^T.

Thus (3) can be expressed in a simpler way:

min  (1/2) a^T H a + (1/2) a^T E a − c^T a
s.t. a_i ∈ [0, C],  i = 1, …, 2l                                        (4)

If we ignore the difference in matrix dimension, (4) and (2) have the same mathematical form, so many training algorithms that were used for SVM can also be used for SVR.
2.2 SOR Algorithm for SVR
Here we let A = H + E and A = L + D + L^T. The nonzero elements of L ∈ R^{2l×2l} constitute the strictly lower triangular part of the symmetric matrix A, and the nonzero elements of D ∈ R^{2l×2l} constitute the diagonal of A. The SOR method, which is a matrix splitting method that converges linearly to a point a satisfying (4), leads to the following algorithm:

a_j^{i+1} = ( a_j^i − ω A_{jj}^{-1} ( Σ_{k=j}^{2l} A_{jk} a_k^i − c_j + Σ_{k=1}^{j−1} A_{jk} a_k^{i+1} ) )_*        (5)

A simple interpretation of this step is that one component of the multiplier a_j is updated at a time, bringing in one constraint of (4) at a time.
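The self-contained NumPy sketch below assembles H, E and c for the RBF kernel used in Section 3 and runs the projected SOR sweep of (5) on the sinc test function. The parameters (ε = 0.1, C = 100, kernel width 0.1) are taken from Section 3; recovering b as d^T a from the b^2/2 term is our assumption, and the plain Python loop is only an illustration, not the authors' C++ implementation.

```python
import numpy as np

def rbf(A, B, gamma=10.0):                       # K(x,z) = exp(-||x-z||^2 / 0.1)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def sor_box_qp(A, c, omega=1.0, C=100.0, sweeps=500):
    """Projected SOR for min 1/2 a'Aa - c'a s.t. 0 <= a_i <= C, i.e. problem (4)."""
    a = np.zeros(len(c))
    for _ in range(sweeps):
        for j in range(len(a)):
            a[j] = np.clip(a[j] - omega * (A[j] @ a - c[j]) / A[j, j], 0.0, C)
    return a

# Assemble H (in kernel form), E = dd', and c for problem (4).
X = np.linspace(-3, 3, 40)[:, None]
y = np.sinc(X).ravel()                           # sin(pi x)/(pi x), the Section 3 test function
l = len(y)
d = np.concatenate([np.ones(l), -np.ones(l)])    # (1,...,1,-1,...,-1)'
Xs = np.vstack([X, X])                           # stacked inputs for the 2l dual variables
H = np.outer(d, d) * rbf(Xs, Xs)                 # H_ij = d_i d_j K(x_i, x_j)
E = np.outer(d, d)
c = np.concatenate([y - 0.1, -y - 0.1])          # eps = 0.1

a = sor_box_qp(H + E, c, omega=1.0, C=100.0)
b = d @ a                                        # follows from the b^2/2 term (assumption)
y_hat = rbf(X, Xs) @ (d * a) + b                 # f(x) = sum_i (a_i - a_i*) K(x, x_i) + b
```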
3 Experimental Results The SOR algorithm is tested against the standard chunking algorithm and against the SMO method on a series of benchmarks. The SOR, SMO and chunking are all written in C++, using Microsoft’s Visual C++ 6.0 complier. Joachims’ package SVMlight (version 2.01) with a default working set size of 10 is used to test the decomposition method. The CPU time of all algorithms are measured on an unloaded 633 MHz Celeron II processor running Windows 2000 professional. The chunking algorithm uses the projected conjugate gradient algorithm as its QP solver, as suggested by Burges [4]. All algorithms use sparse dot product code and kernel caching. Both SMO and chunking share folded linear SVM code. In this experiment, we consider the approximation of the sinc function
f(x) = sin(πx) / (πx). Here we use the kernel K(x_1, x_2) = exp(−||x_1 − x_2||^2 / 0.1), C = 100 and ε = 0.1. Figure 1 shows the approximation results of the SMO method and the SOR method respectively.
Fig. 1. Approximation results of SMO method (a) and SOR method (b)
Table 1. Approximation effect of SVMs using various methods

Experiment   Time (sec)       Number of SVs   Expectation of Error   Variance of Error
SOR          0.232 ± 0.023    9 ± 1.26        0.0014 ± 0.0009        0.0053 ± 0.0012
SMO          0.467 ± 0.049    9 ± 0.4         0.0016 ± 0.0007        0.0048 ± 0.0021
Chunking     0.521 ± 0.031    9 ± 0           0.0027 ± 0.0013        0.0052 ± 0.0019
SVMlight     0.497 ± 0.056    9 ± 0           0.0021 ± 0.0011        0.006 ± 0.0023
From Table 1 we can see that the SVMs trained with the various methods have nearly the same approximation accuracy, and that the SOR algorithm is faster than the other algorithms.
4 Conclusion In summary, SOR is a simple method for training support vector regression that does not require a numerical QP library. Because its CPU time is dominated by kernel evaluation, SOR can also be dramatically accelerated by kernel optimizations such as linear SVR folding and sparse dot products. Depending on the data set, SOR can be anywhere from several times to hundreds or even thousands of times faster than the standard chunking algorithm.
References
[1] J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods – Support Vector Learning, MIT Press, 1998
[2] E. Osuna, R. Freund, and F. Girosi. An improved training algorithm for support vector machines. In Proc. of IEEE NNSP'97, 1997
[3] O. L. Mangasarian and D. R. Musicant. Successive overrelaxation for support vector machines. IEEE Trans. on Neural Networks, 1999, 10(5): 1032–1037
[4] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998, 2(2)
Statistic Learning and Intrusion Detection Xian Rao, Cun-xi Dong, Shao-quan Yang p.b.135, Xidian University, Xi’an, China, zip 710071
[email protected] Abstract. The goal of intrusion detection is to determine whether there are illegal or dangerous actions or activities in the system by checking the audit data on local machines or information gathered from network. It also can be look as the problem that search relationship between the audit data on local machines or information gathered from network and the states of the system need to be protected, that is, normal or abnormal. The statistic learning theory just study the problem of searching unknown relationship based on size limited samples. The statistic theory is introduced briefly. By modeling the key process of intrusion detection, the relationship between two problems can be found. The possibility of using the methods of statistic theory in intrusion detection is analyzed. Finally the new fruit in statistic learning theory –Support Vector Machines—is used in simulation of network intrusion detection using the DRAPA data. The simulation results show support vector machines can detection intrusions very successfully. It overcomes many disadvantages that many methods now available have. It can lower the false positive with higher detection rate. And since it using small size samples, it shortens the training time greatly. Keyword: statistic learning intrusion detection network security support vector machines neural network
1 Introduction Statistic learning plays a fundamental role in AI study. It has accumulated lots of theories since it began at 35 years ago. But the Approximation theory that traditional statistic learning studied is the statistic character based on infinite size samples. Practically, the size of samples is usually limited. So given the limited size samples, the generalization of a well learning machine becomes worse. The deeper part in statistic learning did not attract more attention until resent years. The Structure Risk Minimization principle and Minimum Description Length Principle become studying focuses. The research on small size samples problem is carried on. Along with the fast development of Internet, the network security becomes more important each day. Only when the security problem solved, we can take full advantage of network. Intrusion detection is an important area in network security. So it gets a lot of focuses. Many approach are studied to solve intrusion detection problem.
By studying statistic learning and intrusion detection, we find that two theories have something in common. So the new method in statistic learning—Support vector machines—can be used in intrusion detection.
2 Statistic Learning Statistic learning theory[1] is a statistical learning rule for small size samples problem. It is an important development and complement to the traditional statistics. It provides a theory frame for machine learning theory and method given size limited samples. The core of the theory is to control the generalizing ability of a learning machine by controlling its capacity. 2.1 The Description of Learning Problem A learning problem based on samples can be described by a three-parts model as figure 1 shows. There, G is a generator that generates random vectors x ∈ R , which is take out independently from a definite but N
x G
F (x) . S is a y for every
unknown distribution function trainer that returns an output
x according to the same definite but unknown distribution F y | x . LM is a input
y S
LM
( )
learning machine that can realize a certain
( )
Fig. 1. Learning model based on function set f x, α , α ∈ Λ , where Λ is samples parameter set. A learning problem is to choose the best function which is mostly approximate the response of the trainer from a given function sets f ( x, α ), α ∈ Λ . This choice is based on training set which are
composed of samples sized of l drawn out independently according to the unite distribution
( ) () ( ) (x , y ),L, (x , y )
F x, y = F x F y | x 1
1
l
(1)
l
In order to make the chosen function have the best approximation, it is necessary to measure the errors or loss respond function
( )
( ( ))
L y, f x, α between the output of trainer and that of the
f x, y given the input x . The mathematical expectation is
R (α ) = ∫ L( y, f ( x , α ))dF (x , y )
(2)
That is also called Risk functional. The aim of learning is looking for function
(
)
f x,α 0 that makes the RF minimum when the unit distribution is unknown and all the information is included in the training set as the formula (1 ) shows. 2.2 Traditional Statistic Learning Theory The traditional learning machine is based on the Empirical Risk Minimization (ERM) principle, but it is find that the generalization of all these machines is not good given small sized samples in experiment. Statistic learning comes back to its originated point that is studying the learning machine given small size samples. The statistic learning theory can be divided into four parts. First is the Consistent theory of learning process. This theory answers under which situation the learning process based on ERM is consistent. Second one is the non-approximation theory. This theory answer at what speed it converges when studying. By studying it is concluded that the bound of generalization composed of two parts: one is the Risk
l Remp (α l ) , the other is confidence interval Φ , where l is the hk number of samples and hk represents VC dimension. Third is the theory about Functional
generalizing ability in controlling learning procedure. This theory is on how to control the speed of generalization (generalizing ability ) during learning procedure. Since the convergence of learning are controlled by two parts, the learning algorithm based only on ERM principle can not guarantee its generalizing ability given small size samples. So it comes a more universal principle—Stuctural Risk Minimization (SRM) Principle. The principle is minimizing Risk functional based both on Empirical Risk and confidence interval. Thus the ability of generalization when given small sized samples can be guaranteed. The last is a theory on constructing learning algorithm. The algorithm that is able to control the generalizing ability can be made based on this theory. These algorithm included Neural Network (NN) and Support Vector Machines(SVM). 2.3 The Common Methods NN is a learning algorithm that minimize the ER without changing confidence interval. The idea is used to estimate the weight of neural cell. The method that using sigmoid approximation function compute the empirical risk grades is called back propagation algorithm. According to these grades, the parameters of NN can be modified iteratively using the standard computation based on grades. Under the statistic learning frame, the new universal learning method, support vector machine is produced. It learns by fixing the Empirical functional and minimizing the confidence interval. Its main idea is constructing a separating hyperplane under the linearly separable case. Then generalize it to the linearly nonseparable case. SVM, such a learning machine is used for constructing optimal separating hyperplane. It solves the learning problem for small sized samples efficiently.
3 Intrusion Detection The original intrusion detection model was presented by Dorthy Denning[2] in 1968. Now it becomes an important task in network security field. 3.1 Description of Intrusion Detection Intrusion detection judges whether there are illegal or dangerous actions or activities in the system by checking the audit data on local machines or information gathered from network.. Almost every intrusion detecting runs by three steps: gathering information phase, learning phase and detecting phase. In the gathering information phase, all kinds of normal or abnormal information are collected. During the training and learning phase the relationship between gathered information and system state is find by analyzing the information already known. Then in the detecting phase we can determine the state of unknown audit data or network traffic data according to the relationship we got in second phase. Of these three phases, the first two are more important for they guarantee the correctness of the detection. Now a model for these two phase are made. The target system need to protected can be looked as an y x generator, noted by O. all the information of the system O P S outputting can be translated to a number vectors, noted by x , by a processor P. The output of trainer S is noted by y . The training and learning process can be described by figure 2. The distribution
LM Fig. 2. The model of the gathering information and training and learning phase of intrusion detection
F (x) of the parameter number vectors gained from processor P is unknown. The output
y of trainer S is
produced according to the same definite but unknown probability function LM is a learning machine that can realize a certain function set
( )
F y|x .
( )
f x, α , α ∈ Λ ,
where Λ is parameter set. The intrusion detection can be described as choosing the best function which is mostly approximate the response of the trainer from a given function sets f ( x,α ),α ∈ Λ . This choice is based on training set which are composed of
l samples drawn out independently according to the unite probability F x, y = F x F y | x . The training set is noted as T .
( ) () ( )
3.2 Methods in Intrusion Available Detecting intrusions can be divided into two categories: anomaly intrusion detecting and misuse detecting[3]. Anomaly detection means establishing a “normal” behavior pattern for users, programs or resources in the system and then looking for deviations from this behavior. The methods often used are quantitative analysis, statistical approach and non-parameters statistic analysis and rule-based algorithm. According to model above their process can be summarized as following. In the gathering information phase, all the data collected is the legal or expected behavior. Thus all the output of trainer set is +1. Supposed that the number of training samples gathered is lα , the training set can be noted as Ta = {( xi ,+1) | i = 1, L , l a } . The learning machine chooses the best approximation function to the respond of trainer from given function set f ( x , α ), α ∈ Λ based on the training set Ta . Misuse detection means to looking for know malicious or unwanted behavior according to the knowledge base on gathered intrusion information. The main approach are simple matching, expert system and the state translated method. In the gathering information phase, intrusion knowledge or anomaly behaviors are collected, so all the out put of the trainer are –1. Supposed that that the number of training samples gathered is l n , the training set can be noted as
Tn = {( xi ,−1) | i = 1,L , l n } . The learning machine chooses the best approximation function to the respond of trainer from given function set f ( x , α ), α ∈Λ based on the training set Tn . 3.3 Intrusion Detection and Statistic Learning
When the key steps of intrusion detection described by mathematical model, it is can be seen that intrusion detection can be looked as a problem looking for relationship between system audit data or network traffic data and the system state based on the known knowledge. The key point, looking for relationship, here is the problem just studied by statistic learning. It is feasible to solve the intrusion detection using the statistic learning methods. In intrusion detection, both computer system by network system only have two states: being intruded or not, noted as –1 and +1. So the output of trainer in intrusion detection model are also two values: +1 and –1,that is y ∈ {+ 1,−1} . The pattern recognition problem is very suitable here. The difference between intrusion detection and statistic learning is that intrusion detection uses different types of data. Some maybe are string type. Some are char type. So most intrusion detection systems have preprocessor parts to translate the gathered information to the type that the detector can read. Because statistic learning dealing with number vector, it is necessary to translate all gathered information to numbers by the preprocessor when using the statistic learning method.
By analysis the intrusion detection model shown above, it is can be seen that both anomaly detection and misuse detection use unilateral knowledge that brings some disadvantage hard to be overcome. The false alarm rate of anomaly detection is high. And misuse detection can not detection unknown attack. Neural Network as one of the statistic learning has been used in detection. But it is only used by anomaly detection or by misuse detection. It is mean that the training set it uses is unilateral, so the advantage of NN is not taken a good use. Beside that, the NN not only need large training data size but also is inclined to get into local minimum. In most case, the training set we can get is not very big. So the mew statistic learning method base on small size samples—SVM is an approach very suitable for intrusion detection. At the same time, using both normal and abnomal information in training improve the detection performance.
4 Using SVM in Intrusion Detection SVM, a new learning method, in statistic learning has found its successful uses in handwriting recognition[4] and face recognition[5]. We now find it performed well too in intrusion detection. 4.1 DRAPA Data The 1998 DARPA Intrusion Detection Evaluation Program was prepared and managed by MIT Lincoln Labs. The objective was to survey and evaluate research in intrusion detection. A standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment, was provided. Lincoln Labs set up an environment to acquire nine weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. They operated the LAN as if it were a true Air Force environment, but peppered it with multiple attacks. The raw training data was about five million connection records from seven weeks of network traffic. Similarly, the two weeks of test data yielded around two million connection records for test. A connection is a sequence of TCP packets starting and ending at some well defined times, between which data flows to and from a source IP address to a target IP address under some well defined protocol. Each connection has 41 features divided into tree categories. Nine features in first category indicate the basic information of a net connection, such as protocol type (char) and bytes from source (float) et.. There are 13 features in second category, which is called derived features. They are the domain knowledge provided by connected hosts, including number of failed login attempts, number of created files and so on. The 19 features in last category reflect the statistic character of net traffic, for example, number of connections to the same host as the current connection in the past two seconds, percentage of connections that have ``SYN'' errors in past two seconds and so on. At last each connection is labeled as either normal, or as an attack, with exactly one specific attack type. There are over twenty kinds of attacks simulated in 7 weeks. They were falls into four category: DOS: denial-of-service, e.g. syn flood; R2L: unauthorized access from a remote machine, e.g. guessing password; U2R: unauthorized access to local super user (root)
privileges, e.g., various ‘‘buffer overflow’’ attacks; probing: surveillance and other probing, e.g., port scanning. During the last 2 weeks of gathering test data, besides attacked used before, new attacks are added to test the generalizing ability. 4.2 Data Preprocessing By discussing, it is known that the preprocessing must be made before using the statistic method to detect. There are two reasons. First is to unite data form, that is, translating different types of data into a number vector. Second is to minimize the differences between each element. For the network connection data, the types of different features maybe different. Even their types are same, but their ranges are not. So they should be preprocessed first. For those features whose types are string or char we need to code for them, that is translating them to numbers. For the SVM classify something according to their space character, the main purpose of coding is to distinguish different strings or chars. An simple coding method will perform this function. Besides type translation, minimum difference between each connection is necessary. For example, because of the uncertainty number and size of translated data, the value and their change of bytes from source or bytes from destination are great. But for the percentage of same serve rate in past two minutes is ranged from 0 to1. In order to shorten their difference, the bytes from source and from destination need to be normalized. 4.3 Training Phase and Detecting Phase In training phase, the connection labeled is used to train SVM in order to approximate the output of trainer. The training data need to be translated to the form as shown in formula (1) that could be directly used by SVM. The training data set given has over 5,000,000 connections. Only a small part is used for training. The generalization ability of SVM guaranteed the detection system still performance well under small size samples. The training samples we used is only one or two of thousands, so we greatly reduced the time used in training. In detecting phase the SVM can class unlabeled connections. We tested on all the test data of 2,000,000 connections data and got satisfied result. 4.4 Simulation Result The effects of different normalizing methods in preprocessing phase are shown in figure 3. The training set used is only 1.8 of the thousands of all training set. “no norm” means no normalization for training sets; the power normalization is denoted as “power norm”; and max value normalization is denoted as “max norm”. The figure shows that using the normalized data can get better results whatever the normalizing method you use. At the same time, the size of training data size also effects intrusion detection performance. The more data used, the better intrusion detection can perform. The ROC of curve of different training sets size is shown in figure 4. The three curves in the figure represents the size of training set are 0.08%, 0.18% and 0.26%. It just validate out though the more samples are used, the better performance it get.
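A small Python sketch of this preprocessing and training step is shown below. The three categorical and three numeric feature names are a hypothetical subset of the 41 connection features, the two records and labels are made up, max-value normalization stands in for the normalizations discussed above, and scikit-learn's SVC is used only as a stand-in for the authors' SVM implementation.

```python
import numpy as np
from sklearn.svm import SVC

CATEGORICAL = ["protocol_type", "service", "flag"]     # string-valued features: code them
NUMERIC = ["src_bytes", "dst_bytes", "same_srv_rate"]  # ranges differ by orders of magnitude

def encode(records):
    """Translate mixed-type connection records into numbers; a simple integer code per
    categorical value is enough, since its only job is to distinguish values."""
    codes = {c: {} for c in CATEGORICAL}
    rows = []
    for r in records:
        row = [codes[c].setdefault(r[c], len(codes[c])) for c in CATEGORICAL]
        row += [float(r[c]) for c in NUMERIC]
        rows.append(row)
    return np.array(rows)

def max_normalize(X):
    """Max-value normalization: shrink the gap between byte counts and rate features."""
    m = X.max(axis=0)
    m[m == 0] = 1.0
    return X / m

# two made-up connection records; labels: +1 = normal, -1 = attack
train_records = [
    {"protocol_type": "tcp", "service": "http", "flag": "SF",
     "src_bytes": 215, "dst_bytes": 45076, "same_srv_rate": 1.0},
    {"protocol_type": "icmp", "service": "ecr_i", "flag": "SF",
     "src_bytes": 1032, "dst_bytes": 0, "same_srv_rate": 0.05},
]
train_labels = [1, -1]

X = max_normalize(encode(train_records))
clf = SVC(kernel="rbf", C=10.0).fit(X, train_labels)
print(clf.predict(X))
```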
Fig. 3. The ROC curves using different normalizing methods; the training set is 0.18% of the data
Fig. 4. The ROC curves for different training data sizes; the normalizing method is power normalization.
5 Conclusions
By analysis and comparison of statistical learning theory and intrusion detection, it is found that the key problems to be solved in the two areas are the same. It is therefore natural to apply methods from statistical learning to the intrusion detection problem. In most cases, the training sets we can obtain are small, which makes it all the more practical to use SVM in intrusion detection. The simulations show that using SVM in intrusion detection has many advantages, including shortened training time, a high detection rate with a low false alarm rate, real-time detection, easy upgrading, and so on. An attempt at applying a statistical learning method to intrusion detection is made in this paper. More work needs to be done in the future to perfect the application.
References
1. Vapnik V.N.: The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
2. Denning D.E.: An Intrusion Detection Model. IEEE Trans. on Software Engineering, 1987, 13(2):222–232.
3. Kumar S.: Classification and Detection of Computer Intrusions [Ph.D. Thesis]. Purdue University, 1995.
4. Cortes C., Vapnik V.: Support vector networks. Machine Learning, 1995(20):273–297.
5. Osuna E., Freund R., Girosi F.: Training Support Vector Machines: an Application to Face Detection. IEEE Conference on Computer Vision and Pattern Recognition, 1997:130–136.
Rao Xian received the BS degree from the Department of Electronics Engineering at Xidian University, Xi'an, China. She is now working toward a doctoral degree in the Department of Communication and Information at Xidian University. Dong Chunxi
A New Association Rules Mining Algorithm Based on Directed Itemsets Graph
Lei Wen 1,2 and Minqiang Li 1
1 School of Management, Tianjin University, No. 92 Weijin Road, Tianjin 300072, China
2 Department of Economy and Management, North China Electric Power University, No. 204 Qingnian Road, Baoding 071003, Hebei, China
[email protected] [email protected]
Abstract. In this paper, we introduce a new data structure called DISG (Directed Itemsets Graph) in which the information of frequent itemsets is stored. Based on it, a new algorithm called DBDG (DFS Based DISG) is developed using a depth-first searching strategy. Finally, we performed an experiment on a real dataset to test the running time of DBDG. The experiment showed that it is efficient for mining dense datasets.
1 Introduction
Mining association rules is an important task of knowledge discovery. Association rules describe the interesting relationships among attributes in a database. The problem of finding association rules was first introduced by Agrawal [1] and has attracted great attention in the database research community in recent years. Mining association rules can be decomposed into two steps: the first is to generate frequent itemsets; the second is to generate association rules. The most famous algorithm is Apriori [2]. The algorithm employs a breadth-first and downward closure strategy and uses the property that any subset of a large itemset must be large and any superset of a small itemset must be small to prune the search space. Most later algorithms, such as Partition [3], DHP [4] and DIC [5], are variants of Apriori. They all need to scan the database multiple times. The Apriori-inspired algorithms show good performance on sparse datasets, but are not suitable for dense datasets. More recent work focuses on constructing a tree structure to replace the original database for mining frequent itemsets; FP-growth [6] and tree projection [7] are examples, and they perform better than the others. In this paper, we introduce a new algorithm, DBDG, which discovers frequent itemsets by using a directed itemsets graph. This algorithm uses a vertical database to count the support of itemsets. A record of the vertical database is a pair ⟨item, Tidset⟩, where the Tidset is the set of TIDs of the transactions that support the item. So the supports of frequent patterns can be counted efficiently via Tidset intersections.
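As a small illustration of the vertical layout just described (a sketch under our own toy data, not code from the paper), the support of an itemset can be obtained by intersecting the Tidsets of its items:

# Vertical database: item -> Tidset (set of transaction ids containing the item).
vertical_db = {
    "a": {1, 2, 3, 5},
    "b": {1, 3, 4, 5},
    "c": {2, 3, 5},
}

def support(itemset, vdb):
    """Support of an itemset = size of the intersection of its Tidsets."""
    tids = None
    for item in itemset:
        tids = vdb[item] if tids is None else tids & vdb[item]
    return len(tids) if tids is not None else 0

print(support({"a", "b"}, vertical_db))      # 3 (transactions 1, 3, 5)
print(support({"a", "b", "c"}, vertical_db)) # 2 (transactions 3, 5)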
2 The DISG (Directed Itemsets Graph): Design and Construction
Definition 1 (directed graph): A directed graph D = ⟨V, A⟩ consists of a vertex set V and an arc set A. An arc Aij directed from Vi to Vj has Vi as its tail and Vj as its head.
Definition 2: A directed itemsets graph DISG = ⟨V, A⟩ is defined as follows:
(1) The vertex set V of the DISG is the set of frequent 1-itemsets FI = {FI1, FI2, …, FIn}. Each vertex has three fields: the first is the name of the frequent item; the second is the support of the frequent item, denoted Supi; the third is the list of vertices adjacent to the vertex.
(2) An arc indicates a frequent 2-itemset, with a number s corresponding to the support of the frequent 2-itemset.
Based on the definitions above, we have the following DISG construction algorithm:
Algorithm 1 (DISG Construction)
Input: vertical database D′ and minsup s
Output: DISG
begin
  select the frequent items FIi from the vertical database D′;
  add all FIi into the vertex set V in descending order of support;
  while V ≠ ∅ do
    select Vi from V;
    for each Vj ∈ V (j > i) do
    begin
      if s(Vi, Vj) > s then
        add Vj to the adjacent item list of Vi;
    end;
    delete Vi from V;
end;
As a novel structure, the DISG includes all the information of the frequent patterns. At the same time, it only stores the frequent items and frequent 2-itemsets, so the size of the DISG is smaller than a tree structure.
3 The DBDG Algorithm
Now we introduce the algorithm DBDG (DFS Based DISG), which uses a depth-first-search strategy. First, select a vertex Vi from V and select its adjacent vertex Vj with the highest support. Then count s(Vi, Vj); if s(Vi, Vj) > minsup, check the adjacent vertices of Vj. Continue the steps above until the support s(Vi, Vj, …, Vm) is less than minsup or the adjacency list is empty. Then return to the vertex of the previous level and repeat the steps until all the vertices have been visited as start vertices. The algorithm code follows.
Algorithm DBDG
Input: DISG; Output: FIS
Begin
  while V ≠ ∅ do
    for each Vi of V
      FIS = {Vi}; S(FIS) = S(Vi);
      select an unvisited Vj from Vi.adjacentlist;
      FIS = FIS ∪ {Vj}; S(FIS) = S(FIS, Vj);
      call DFS(Vj)
End;
Procedure DFS(Vj)
begin
  if Vj.adjacentlist ≠ ∅ then
    select Vk with the highest support from Vj.adjacentlist;
    if S(FIS, Vk) ≥ minsup then
      S(FIS) = S(FIS, Vk);
      FIS = FIS ∪ {Vk};
      call DFS(Vk)
    else
      output FIS;
      delete Vk from Vj.adjacentlist;
      call DFS(Vj)
  else
    return to its parent vertex Vi;
    call DFS(Vi)
End;
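The following is a minimal sketch of the two algorithms above (our own simplified reading of the pseudocode, not a faithful reimplementation): vertices are frequent 1-itemsets ordered by descending support, arcs are frequent 2-itemsets, and a depth-first walk extends a prefix with later adjacent vertices while the prefix support stays above the threshold.

# Sketch of DISG construction plus a depth-first mining pass (illustrative).
vertical_db = {"a": {1, 2, 3, 5}, "b": {1, 3, 4, 5}, "c": {2, 3, 5}, "d": {4}}
minsup = 2

def sup(tids):
    return len(tids)

# Vertex set: frequent 1-itemsets, ordered by descending support.
vertices = sorted((i for i, t in vertical_db.items() if sup(t) >= minsup),
                  key=lambda i: -sup(vertical_db[i]))

# Arcs: frequent 2-itemsets, stored as adjacency lists following vertex order.
adj = {v: [] for v in vertices}
for i, vi in enumerate(vertices):
    for vj in vertices[i + 1:]:
        if sup(vertical_db[vi] & vertical_db[vj]) >= minsup:
            adj[vi].append(vj)

frequent = []

def dfs(prefix, tids, last):
    """Record the frequent prefix and extend it via the adjacency of its last item."""
    frequent.append((tuple(prefix), sup(tids)))
    for nxt in adj[last]:
        new_tids = tids & vertical_db[nxt]
        if sup(new_tids) >= minsup:
            dfs(prefix + [nxt], new_tids, nxt)

for v in vertices:
    dfs([v], vertical_db[v], v)

print(frequent)   # all frequent itemsets with their supports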
4 Experiments
To assess the performance of the algorithm, we performed an experiment on a PC with a P4 1.5 GHz processor and 256 MB of main memory. All the programs were written in Visual C++ 6.0. A real dataset, mushroom (from the UC Irvine Machine Learning Database Repository), was used in this experiment. The running time is shown in Fig. 1.
Fig. 1. Computational performance: running time (seconds) versus minimum support
5 Conclusion
In this paper we introduced a new data structure, DISG, and an algorithm called DBDG for mining frequent itemsets. There are several advantages of DBDG over other approaches: (1) it constructs a highly compact DISG, which is smaller than the original database; (2) it avoids scanning the database multiple times by using a vertical database to count the support of frequent itemsets; (3) it employs a depth-first strategy and decreases the number of candidate itemsets. An experiment showed that it performs well when mining dense datasets. In recent years, discovering maximal frequent itemsets [8,9] or closed frequent itemsets [10] has emerged as another way to address the dense dataset problem. So in the future, we will focus on how to discover maximal frequent itemsets or closed frequent itemsets based on the DISG.
References 1.
Agrawal R., Imielinski T., Swami A., "Mining association rules between sets of items in very large databases." Proceedings of the ACM SIGMOD Conference on Management of data, washington,USA,(1993) 207–216 2. Rakesh Agrawal and Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. In Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, (1994) 3. A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. Proc. 1995 Int. Conf. Very Large Data Bases (VLDB’95), Zurich, Switzerland,(1995) 432–443 4. J. S. Park, M. S. Chen, and P. S. Yu. An efficient hash-based algorithm for mining association rules. Proc. 1995 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD’95), San Jose, CA,(1995)175–186 5. Brin S., Motwani R. Ullman J. D. and Tsur S. Dynamic Itemset Counting and implication rules for Market Basket Data. Proceedings of the ACM SIGMOD, (1997)255–264 6. J. Han, J. Pei and Y. Yin. Mining Frequent Patterns without Candidate Generation. Proc. 2000 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD00), Dallas, TX, USA, (2000) 1–12 7. Agarwal R. C., Aggarwal C. C., Prasad V. V. V., Crestana V., A Tree Projection Algorithm for Generation of Large Itemsets For Association Rules. Journal of Parallel and Distributed Computing, Special Issue on High Performance Data Mining, 61(3), (2001)350–371 8. R. C. Agarwal, C. C. Aggarwal, and V.V.V. Prasad. Depth first generation of long patterns. In Proc. of the 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, USA ,(2000 )108–118 9. Roberto J. Bayardo. Efficiently mining long patterns from databases. In Proceedings of ACM-SIGMOD International Conference on Management of Data,(1998) 85–93 10. N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. Proc. 7th Int. Conf. Database Theory (ICDT99), Jerusalem, Israel, (1999) 398–416
A Distributed Multidimensional Data Model of Data Warehouse Youfang Lin, Houkuan Huang, and Hongsong Li School of Computer Science and Information Technology, Northern Jiaotong University, 100044, Beijing, China {lyf,hhk}@computer.njtu.edu.cn
Abstract. The base fact tables of a data warehouse are generally of huge size. In order to shorten query response time and to improve maintenance performance, a base fact table is generally partitioned into multiple instances of same schema. To facilitate multidimensional data analysis, common methods are to create different multidimensional data models (MDDM) for different instances which may make it difficult to realize transparent multi-instance queries and maintenance. To resolve the problems, this paper proposes a logically integrated and physically distributed multidimensional data model.
1 Introduction Data size of data warehouse is typically huge. Hence, a base fact table in warehouse may generally be partitioned into several partitions. To facilitate multidimensional data analysis, common resolutions are to build different cubes for these different instances of the base fact table. To do that, we should carry out similar design process repeatedly. Furthermore, for queries involving multiple instances of this kind of models, we have to write extra codes for front-end application to merge different parts of query results, which make it impossible to realize transparent multi-instance query through data access engine. To resolve the problems, we propose and are implementing a distributed multidimensional data model that can organize physically distributed tables into a logically integrated model, aiming at establishing a theoretical foundation of model logic for a distributed warehouse engine and enhancing the engine to provide distributed cube management and transparent query services.
2 Logic and Instances of Multidimensional Data Model We classify base fact table attributes into dimension attributes, measure attributes, and other attributes. We denote a BFT by a tuple (FN, DAS, MAS, OAS). Given a table F and a tuple set TS of it, we call FI=(F, TS) a base fact table instance of F.
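As an illustration of these notions (a sketch with invented field names, not part of the paper), a base fact table schema and two of its partitioned instances might be represented as:

from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class BFT:
    """A base fact table (FN, DAS, MAS, OAS)."""
    name: str                      # FN: fact table name
    dims: FrozenSet[str]           # DAS: dimension attributes
    measures: FrozenSet[str]       # MAS: measure attributes
    others: FrozenSet[str]         # OAS: other attributes

@dataclass
class BFTInstance:
    """A base fact table instance FI = (F, TS): a schema plus a tuple set."""
    schema: BFT
    tuples: List[Tuple] = field(default_factory=list)

sales = BFT("sales", frozenset({"store", "day"}), frozenset({"amount"}), frozenset())
# Two instances (partitions) sharing the same schema, e.g. split by year.
part_2002 = BFTInstance(sales, [("s1", "2002-05-01", 120.0)])
part_2003 = BFTInstance(sales, [("s1", "2003-05-01", 95.0)])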
Dimensions are generally organized into hierarchical levels. In this paper, we use a model of dimension lattice to exemplify the distributed multidimensional data model. Definition 1. A dimension schema is a partial order set 〈DN, ≤〉, where DN is a finite set of dimension levels and ≤ is a partial order defined on DN. Each dimension level d is associated with an attribute set, denoted by AS(d), which includes a key k(d). We denote the domain of k(d) by dom(d). Suppose that 〈DN, ≤〉 is a lattice; then we call it a dimension lattice DL. For a level di, given a domain dom(di), we call a tuple dIi over dom(di) a dimension level instance of di, and we call DNI = {dI1, …, dIk}, k = |DN|, a dimension level set instance. If (dj≤di)∧(dj≠di)∧¬∃dk ((dj≤dk)∧(dk≤di)), we denote the relation between dj and di by dj
c, |= [c]ϕ ⇔ Nw(ϕ) ≥ c, |= [c]+ϕ ⇔ Nw(ϕ) > c,
Note that [c]+ and ⟨c⟩+ correspond to the strict inequality of the uncertainty measures.
4.2 Hybrid Qualitative Possibility Logic
It has been shown that the hybridization of possibilistic logic is helpful in the development of its proof methods [6]. However, for more practical applications, we can take the hybrid qualitative possibility logic (HQPL) as a tool for reasoning about multi-criteria decision making. Let us first define its syntax with respect to the set of propositional symbols, nominals, and a set of modality labels {≥0, ≥1, · · ·}. Its wffs are
WFF := a | p | ¬ϕ | ϕ ∧ ψ | ϕ ∨ ψ | ϕ ⊃ ψ | ϕ ≥i ψ | a : ϕ
For the semantics, an HQPL model is a triple M = (W, (πi)i≥0, V) where W is the set of states, each πi is a possibility distribution over W, and V is the truth valuation as in the HL models. In addition to the satisfaction clauses of HL, we have
M, w |= ϕ ≥i ψ iff Πi(ϕ) ≥ Πi(ψ)
where Πi is the possibility measure for πi and is defined via the truth sets of ϕ and ψ. In practical applications, each modality label can correspond to a preference relation under a decision criterion, while the nominals are exactly the options available to the decision maker. There are in general two kinds of preference statements in multi-criteria decision-making problems. The first is the description of general preference. This can be modelled by the QPL wff ϕ ≥i ψ, which means that some options satisfying ϕ are preferred to some options satisfying ψ according to the criterion i. The second regards the preference between specific options. This can only be modelled by wffs of the form a ≥i b, which means that the option a is preferred to b according to the criterion i.
4.3 Hybrid Possibilistic Description Logic
Some works on the application of fuzzy description or modal logics to information retrieval have been done previously[9,8]. However, following the tradition of DL,
most approaches separate the terms and the formulas. Here, we show a hybrid logic approach where the objects to be retrieved and the queries are treated uniformly. Let propositional symbols and nominals be given as above and let {α0, α1, · · ·} be a set of role names; then the syntax for the wffs of hybrid possibilistic description logic (HPDL) is as follows:
WFF := a | p | ¬ϕ | ϕ ∧ ψ | ϕ ∨ ψ | ϕ ⊃ ψ | a : ϕ | [c]iϕ | [c]+iϕ | ⟨c⟩iϕ | ⟨c⟩+iϕ | [αi]ϕ | ⟨αi⟩ϕ
where c is still a numeral in [0, 1]. An HPDL model for the language is a 4-tuple (W, (Ri)i≥0, (Si)i≥0, V), where W and V are defined as in the HL models, each Ri is a (crisp) binary relation over W and each Si is a fuzzy binary relation over W. In the application, it is especially assumed that each Si is in fact a similarity relation. A fuzzy relation S : W × W → [0, 1] is called a ⊗-similarity relation if it satisfies the following three conditions: (i) reflexivity: S(w, w) = 1 for all w ∈ W; (ii) symmetry: S(w, u) = S(u, w) for all w, u ∈ W; (iii) ⊗-transitivity: S(w, u) ≥ sup_{x∈W} S(w, x) ⊗ S(x, u) for all w, u ∈ W, where ⊗ is a t-norm¹. The semantics for the modalities based on role names is the same as that for ML, and for the similarity-based modalities we adopt the semantics of HGML, so we do not repeat it here. In the application, the model can have an information retrieval interpretation, where W denotes the set of objects (documents, images, movies, etc.) to be retrieved and each Si is an endowed similarity relation associated with some aspect such as style, color, etc. As for the truth valuation function V and the relations Ri, they decide the extensions of each wff just like the interpretation function of DL, so each wff in HPDL also corresponds to a concept term in DL. Since nominals are just a special kind of wff and each nominal refers to a retrievable object, the objects in the model are denoted by wffs in the same way as the queries. Let us look at an example adapted from [8] to illustrate the use of the logic.
Example 1 (Exemplar-based retrieval). In some cases, in particular for the retrieval of multimedia information, we may be given an exemplar or standard document and try to find documents very similar to the exemplar but satisfying some additional properties. In this case, we can formulate the query as ⟨c⟩a ∧ ϕ, where a is the name for the exemplar and ϕ denotes the additional desired properties. According to the semantics, b : ⟨c⟩a will be satisfied by an HPDL model (W, (Ri)i≥0, S, V) iff S(V(a), V(b)) ≥ c. Thus, a document referred to by b will meet the query if it satisfies the properties denoted by ϕ and is similar to the exemplar at least to degree c.
¹ A binary operation ⊗ : [0, 1]² → [0, 1] is a t-norm iff it is associative, commutative, and increasing in both arguments, and 1 ⊗ x = x and 0 ⊗ x = 0 for all x ∈ [0, 1].
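To illustrate how a query of the form ⟨c⟩a ∧ ϕ could be evaluated over a finite model (a toy sketch under our own assumptions; the similarity values and property sets are invented):

# Toy exemplar-based retrieval: return objects whose similarity to the
# exemplar is at least c and which satisfy the required properties.
similarity = {           # symmetric similarity degrees S(x, y) in [0, 1]
    ("doc1", "doc2"): 0.9,
    ("doc1", "doc3"): 0.4,
    ("doc2", "doc3"): 0.5,
}
properties = {"doc1": {"colour"}, "doc2": {"colour", "recent"}, "doc3": {"recent"}}

def S(x, y):
    if x == y:
        return 1.0                      # reflexivity of a similarity relation
    return similarity.get((x, y), similarity.get((y, x), 0.0))

def retrieve(exemplar, required, c):
    """Objects b with S(exemplar, b) >= c that satisfy every required property."""
    return [b for b in properties
            if S(exemplar, b) >= c and required <= properties[b]]

print(retrieve("doc1", {"recent"}, 0.8))   # ['doc2']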
It has been shown that the hybridization of DL in fact makes it possible to accommodate more expressive power than ALC [2]. In particular, it can express number restrictions and collections of individuals. For example, b : (book ∧ ⟨author⟩a1 ∧ ⟨author⟩a2 ∧ ¬a1 : a2) means that b is a book with at least two authors.
5 Conclusion
We have presented some preliminary proposals on the hybridization of possibilistic logic in this paper. We study some variants of hybrid possibilistic logic and show that some application domains such as multi-criteria decision making and information retrieval indeed benefit from the hybridization. Further works on the elaboration of the proposed logical systems are expected.
References 1. P. Blackburn. “Representation, reasoning, and relational structures: a hybrid logic manifesto”. Logic Journal of IGPL, 8(3):339–365, 2000. 2. P. Blackburn and M. Tzakova. “Hybridizing concept languages”. Annals of Mathematics and Artificial Intelligence, 24:23–49, 1999. 3. L. Farinas del Cerro and A. Herzig. “A modal analysis of possibility theory”. In R. Kruse and P. Siegel, editors, Proceedings of the 1st ECSQAU, LNAI 548, pages 58–62. Springer-Verlag, 1991. 4. D. Dubois, J. Lang, and H. Prade. “Possibilistic logic”. In D.M. Gabbay, C.J. Hogger, and J.A. Robinson, editors, Handbook of Logic in Artificial Intelligence and Logic Programming, Vol 3 : Nonmonotonic Reasoning and Uncertain Reasoning, pages 439–513. Clarendon Press - Oxford, 1994. 5. F. Esteva, P. Garcia, L. Godo, and R. Rodriguez. “A modal account of similaritybased reasoning”. International Journal of Approximate Reasoning, pages 235–260, 1997. 6. C.J. Liau. “Hybrid logic for possibilistic reasoning”. In Proc. of the Joint 9th IFSA World Congress and 20th NAFIPS International Conference, pages 1523– 1528, 2001. 7. C.J. Liau and I.P. Lin. “Quantitative modal logic and possibilistic reasoning”. In B. Neumann, editor, Proceedings of the 10th ECAI, pages 43–47. John Wiley & Sons. Ltd, 1992. 8. C.J. Liau and Y.Y. Yao. “Information retrieval by possibilistic reasoning”. In H.C. Mayr, J. Lazansky, G. Quirchmayr, and P. Vogel, editors, Proc. of the 12th International Conference on Database and Expert Systems Applications (DEXA 2001),, LNCS 2113, pages 52–61. Springer-Verlag, 2001. 9. C. Meghini, F. Sebastiani, and U. Straccia. “A model of multimedia information retrieval”. JACM, 48(5):909–970, 2001. 10. M. Schmidt-Schauß and G. Smolka. “Attributive concept descriptions with complements”. Artificial Intelligence, 48(1):1–26, 1991. 11. L.A. Zadeh. “Fuzzy sets as a basis for a theory of possibility”. Fuzzy Sets and Systems, 1(1):3–28, 1978.
Critical Remarks on the Computational Complexity in Probabilistic Inference S.K.M. Wong, D. Wu, and Y.Y. Yao Department of Computer Science University of Regina Regina, Saskatchewan, Canada, S4S 0A2 {wong, danwu, yyao}@cs.uregina.ca
Abstract. In this paper, we review the historical development of using probability for managing uncertain information from the inference perspective. We discuss in particular the NP-hardness of probabilistic inference in Bayesian networks.
1 Introduction
Probability has been successfully used in AI for managing uncertainty [4]. A joint probability distribution (jpd) can be used as a knowledge base in expert systems. Probabilistic inference, namely, calculating posterior probabilities, is a major task in such knowledge based systems. Unfortunately, probabilistic methods for coping with uncertainty fell out of favor in AI from the 1970s to the mid-1980s. There are two major problems. One is the intractability of acquiring a jpd with a large number of variables, and the other is the intractability of computing posterior probabilities for probabilistic inference. However, the probability approach managed to come back in the middle 1980s with the discovery of Bayesian networks (BNs) [4]. The purpose of introducing BNs is to solve the intractability of acquiring the jpd. The BN provides a representation of the jpd as a product of conditional probability distributions (CPDs). The structure of such a product can be characterized by a directed acyclic graph (DAG). Once the jpd is specified in this manner, one still has to design efficient methods for computing posterior probabilities. "Effective" probabilistic inference methods have been developed [2] for BNs and they seem to be quite successful in practice. Therefore, the BNs seemingly overcome the representation problems that early expert systems encountered. However, the task of computing posterior probabilities for probabilistic inference in BNs is NP-hard as shown by Cooper [1]. This negative result has raised some concerns about the practical use of BNs. In this paper, we review the historical development of using probability for managing uncertainty. In particular, we discuss the problem of probabilistic inference in BNs. By studying the proof in [1], we observe that the NP-hardness of probabilistic inference is due to the fact that the DAG of a BN contains a node having a large number of parents. This observation of the cause of NP-hardness may help knowledge engineers to avoid this pitfall in designing a BN.
The paper is organized as follows. We introduce some pertinent notions in Section 2. In Section 3, we recall the difficulties of early expert systems using a jpd. Probabilistic inference is discussed in Section 4. In Section 5, we analyze the cause of the NP-hardness of probabilistic inference in BNs. We conclude our paper in Section 6.
2 Background
Let R denote a finite set of discrete variables. Each variable of R is associated with a finite domain. By XY, we mean the union of X and Y, i.e., X ∪ Y, for X, Y ⊆ R. The domain for X ⊆ R, denoted DX, is the Cartesian product of the individual domains of the variables in X. Similarly, the domain for R, denoted DR, or just D if R is understood, is the Cartesian product of the individual domains of the variables in R. Each element in the domain is called a configuration of the corresponding variable(s) and we use the corresponding lowercase letter, possibly with subscripts, to denote it. For example, we use x to denote an element in the domain of X. Naturally, we use X = x to indicate that X takes the value x. We define a joint probability distribution (jpd) over R to be a function p on D, denoted p(R), such that for each configuration t ∈ D, 0 ≤ p(t) ≤ 1, and Σ_{t∈D} p(t) = 1. Let X ⊆ R. We define the marginal (probability distribution) of p(R), denoted p(X), as
p(X) = Σ_{R−X} p(R).
Let X, Y ⊆ R and X ∩ Y = ∅. We define the conditional probability distribution of X given Y, denoted p(X|Y), as
p(X|Y) = p(XY) / p(Y), whenever p(Y) ≠ 0.
Probabilistic inference means computing the posterior probability distribution for a set of query variables, say X, given exact values for some evidence variables, say Y , that is, to compute p(X|Y = y).
3 Probabilistic Inference Using JPD
Early expert systems tried to handle uncertainty by considering the jpd as a knowledge base and working solely with the jpd [5] without taking advantage of the conditional independencies satisfied by the jpd. Suppose a problem domain can be described by a set R = {A1 , . . . , An } of discrete variables, the early expert systems tried to use the jpd p(R) to describe the problem domain and conduct inference based on p(R) alone. This method quickly becomes intractable
when the number of variables in R becomes large. Let |R| = n and assume the domain of each variable Ai ∈ R is binary. Storing p(R) in a table requires exponential storage, that is, we need to store 2^n entries in a single table. Computing a marginal probability, for example p(Ai), requires 2^{n−1} additions, namely,
p(Ai) = Σ_{A1, ..., Ai−1, Ai+1, ..., An} p(R).
In other words, both the storage of the jpd and the computation of a marginal exhibit exponential complexity. Thus, probabilistic inference using the jpd alone soon fell out of favor in AI in the 1970s [5].
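The blow-up can be seen directly in a naive implementation (a sketch for illustration only): the table below has 2^n rows and each marginal sums over 2^{n−1} of them.

from itertools import product

n = 4                                          # number of binary variables
# A full jpd stored as one table: 2**n entries (here just a uniform toy table).
jpd = {assign: 1.0 / 2 ** n for assign in product((0, 1), repeat=n)}

def marginal(i, value):
    """p(A_i = value): sums over the 2**(n-1) entries that agree on A_i."""
    return sum(p for assign, p in jpd.items() if assign[i] == value)

print(len(jpd), marginal(0, 1))                # 16 entries, p(A_0 = 1) = 0.5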
4 Probabilistic Inference Using BNs
The problems of using a jpd alone for representing and reasoning with uncertainty were soon realized and were not solved until late 1980s, in which time Bayesian networks were discovered as a method for representing not only a jpd, but also the conditional independency (CI) information satisfied by the jpd. A Bayesian network defined over a set R of variables consists of a DAG which is augmented by a set of CPDs whose product yields a jpd [4]. Consider the BN over R = {X, U1 , U2 , U3 , U4 , C1 , C2 , C3 , Y } in Fig. 1, the structure of the DAG encodes CI information which can be identified by d-separation [4]. Each node in the DAG corresponds to a variable in R and is associated with a conditional probability. For each node without parents, it is associated with a marginal. For instance, the node X in the DAG is associated with the marginal p(X). For each node with parents, it is associated with a CPD of this node given its parents. For instance, the node Y in the DAG is associated with the CPD p(Y |C1 C2 C3 ). The product of these CPDs defines the following jpd: p(R) = p(X) · p(U1 |X) · p(U2 |X) · p(U3 |X) · p(U4 |X) ·p(C1 |U1 U2 U3 ) · p(C2 |U1 U2 U3 ) · p(C3 |U2 U3 U4 ) ·p(Y |C1 C2 C3 ).
(1)
The factorization in the above equation indicates that instead of storing the whole jpd p(R) in a single table, in which case the exponential storage problem described early would occur, one can now store each individual CPD instead. Since storing each CPD requires significantly less space than storing the entire jpd, seemingly the BN approach has solved the problem of storage intractability experienced by early expert systems. More encouragingly, “effective” algorithms have been developed for probabilistic inference without the need of recovering the entire jpd as defined by equation (1). These methods are referred to as the local propagation method and its variants [3]. The local propagation method first moralizes and then triangulates the DAG. Then a junction tree is constructed on which the inference can be performed [3].
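For illustration (our own toy encoding, not the authors'), the factorized form can be stored as small CPD tables and the joint probability of one configuration read off as a product:

# A tiny BN X -> U1, X -> U2: p(X, U1, U2) = p(X) * p(U1|X) * p(U2|X).
p_X = {0: 0.6, 1: 0.4}
p_U1_given_X = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.7}  # key: (u1, x)
p_U2_given_X = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.2, (1, 1): 0.8}  # key: (u2, x)

def joint(x, u1, u2):
    """Evaluate the factorized jpd at one configuration."""
    return p_X[x] * p_U1_given_X[(u1, x)] * p_U2_given_X[(u2, x)]

# Storage: 2 + 4 + 4 entries instead of the 2**3 entries of the full table.
print(joint(1, 1, 0))   # 0.4 * 0.7 * 0.2 = 0.056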
Fig. 1. A BN defined over R = {X, U1 , U2 , U3 , U4 , C1 , C2 , C3 , Y }.
5 NP-Hardness of Probabilistic Inference
The introduction of BNs seems to have completely solved the problems encountered by early expert systems. By utilizing the CI information, individual CPD tables are stored instead of the whole jpd. Therefore, the storage intractability problem is “solved”. By applying the local propagation method, computing posterior probability can be effectively and efficiently done without resorting back to the whole jpd. That is, therefore, the computational intractability problem is “solved” as well. However, Cooper [1] showed that the task of probabilistic inference in BNs is NP-hard. This result indicates that there does not exist an algorithm with polynomial time complexity for inference in BNs in general. In the following, we analyze the cause of the NP-hardness by studying the proof in [1]. Cooper [1] proved the decision problem version of probabilistic inference is NP-complete by successfully transforming a well known NP-complete problem, the 3SAT problem [1], into the decision problem of probabilistic inference in BNs. For example, consider an instance of 3SAT in which U = {U1 , U2 , U3 , U4 } is a set of propositions and C = {U1 ∨ U2 ∨ U3 , ¬U1 ∨ ¬U2 ∨ U3 , U2 ∨ ¬U3 ∨ U4 } is a set of clauses. The objective of a 3SAT problem is to find a satisfying truth assignment for the propositions in U so that every clause in C is evaluated to be true simultaneously. We can transform this 3SAT problem into a BN decision problem: “Is p(Y |X) > 0 ?” The corresponding BN is shown in Fig.1. The 3SAT example given in the preceding paragraph involves only 4 propositions in U and 3 clauses in C. Suppose in general the number of propositions in U is n and the number of clauses in C is m. The corresponding BN according to the construction in [1] is depicted in Fig. 2. Note that the BN in Fig.1 and the BN in Fig. 2 have the same structure.
Fig. 2. A BN defined over R = {X, U1 , . . . , Un , C1 , . . . , Cm , Y }.
The jpd defined by the DAG in Fig. 2 is: p(R) = p(X) · p(U1 |X) · . . . p(Ui |X) . . . · p(Un |X) ·p(C1 |Ui1 Ui2 Ui3 ) · . . . · p(Cm |Uj1 Uj2 Uj3 ) ·p(Y |C1 . . . Cm ).
(2)
One may immediately note that in equation (2) the CPD p(Y|C1 . . . Cm) indicates that Y has m parents, i.e., C1, . . . , Cm, and all the other CPDs have at most 3 parents. Therefore, the storage requirement for the CPD p(Y|C1 . . . Cm) is exponential in m, the number of parents of Y. In other words, the representation of this CPD table is intractable, which renders the representation of the BN intractable. On the other hand, if one wants to compute p(Y), one needs to compute p(C1 . . . Cm) regardless of the algorithm used, i.e.,
p(Y) = Σ_{C1 ... Cm} p(Y, C1 . . . Cm) = Σ_{C1 ... Cm} p(C1 . . . Cm) · p(Y|C1 . . . Cm).
The number of additions and multiplications for computing p(Y) is exponential with respect to m, i.e., the number of parents of Y. The above analysis has demonstrated that the cause of the exponential storage and exponential computation is the existence of the CPD p(Y|C1 . . . Cm). In designing a BN, there are several existing techniques such as noise-or and divorcing that can be used to remedy this situation. The noise-or technique [2] allows the designer to specify p(Y|C1), . . ., p(Y|Cm) individually and combine them into p(Y|C1 . . . Cm) under certain assumptions. The divorcing technique [2], on the other hand, partitions the parent set {C1, . . . , Cm} into m′ < m groups and introduces m′ intermediate variables. Each intermediate
variable is the parent of Y . More recently, the notion of contextual weak independency [6] was proposed to further explore possible decomposition of a large CPD table into smaller ones.
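A small sketch of the noise-or idea mentioned above (the numbers and the independent-inhibition assumption are ours; see [2] for the actual model): each parent is given its own probability of activating Y, and the combined CPD is derived rather than stored explicitly.

# Noisy-OR combination: q[i] = p(Y = 1 | only cause C_i is present).
q = {"C1": 0.8, "C2": 0.6, "C3": 0.5}

def p_y_given(parents_on):
    """p(Y = 1 | the causes in parents_on are active), assuming each active
    cause is inhibited independently of the others."""
    prob_all_inhibited = 1.0
    for c in parents_on:
        prob_all_inhibited *= (1.0 - q[c])
    return 1.0 - prob_all_inhibited

print(p_y_given({"C1", "C3"}))   # 1 - 0.2*0.5 = 0.9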
6 Conclusion
In this paper, we have reviewed the problem of probabilistic inference in BNs. We point out that the cause of NP-hardness is due to a particular structure of the DAG. That is, there exists a node in the DAG with a large number of parent nodes.
References [1] G.F. Cooper. The computational complexity of probabilistic inference using bayesian belief networks. Artificial Intelligence, 42(2-3):393–405, 1990. [2] F.V. Jensen. An Introduction to Bayesian Networks. UCL Press, 1996. [3] S.L. Lauritzen, T.P. Speed, and K.G. Olesen. Decomposable graphs and hypergraphs. Journal of Australian Mathematical Society, 36:12–29, 1984. [4] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Francisco, California, 1988. [5] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs, New Jersey, 1995. [6] S.K.M. Wong and C.J. Butz. Contextual weak independence in bayesian networks. In Fifteenth Conference on Uncertainty in Artificial Intelligence, pages 670–679. Morgan Kaufmann Publishers, 1999.
Critical Remarks on the Maximal Prime Decomposition of Bayesian Networks Cory J. Butz, Qiang Hu, and Xue Dong Yang Department of Computer Science, University of Regina, Regina, Canada, S4S 0A2, {butz,huqiang,yang}@cs.uregina.ca
Abstract. We present a critical analysis of the maximal prime decomposition of Bayesian networks (BNs). Our analysis suggests that it may be more useful to transform a BN into a hierarchical Markov network.
1 Introduction
Very recently, it was suggested that a Bayesian network (BN) [3] be represented by its unique maximal prime decomposition (MPD) [1]. An MPD is a hierarchical structure. The root network is a jointree [3]. Each node in the jointree has a local network, namely, an undirected graph called a maximal prime subgraph. This hierarchical structure is claimed to facilitate probabilistic inference by locally representing independencies in the maximal prime subgraphs. In this paper, we present a critical analysis of the MPD representation of BNs. Although the class of parent-set independencies is always contained within the nodes of the root jointree, we establish in Theorem 2 that this class is never represented in the maximal prime subgraphs (see Example 3). Furthermore, in Example 4, we show that there can be an independence in a BN defined precisely on the same set of variables as a node in the root jointree, yet this independence is not represented in the local maximal prime subgraph. Conversely, we explicitly demonstrate in Example 5 that there can be an independence holding in the local maximal prime subgraph, yet this independence cannot be realized using the probability tables assigned the corresponding node in the root jointree. This paper is organized as follows. Section 2 reviews the maximal prime decomposition of BNs. We present a critique of the MPD representation in Section 3. The conclusion is presented in Section 4.
2 Maximal Prime Decomposition of Bayesian Networks
Let X, Y, Z be pairwise disjoint subsets of U. The conditional independence [3] of Y and Z given X is denoted I(Y, X, Z). The conditional independencies encoded in the Bayesian network (BN) [3] in Fig. 1 on U = {a, b, c, d, e, f, g, h, i, j, k} indicate that the joint probability distribution can be written as p(U) = p(a) · p(b) · p(c|a) · p(d|b) · p(e|b) · p(f|d, e) · p(g|b) · p(h|c, f) · p(i|g) · p(j|g, h, i) · p(k|h). Olesen and Madsen [1] proposed that a given BN be represented by its unique maximal prime decomposition (MPD). An MPD is a hierarchical structure.
Fig. 1. A Bayesian network D encoding independencies on the set U of variables.
The root network is a jointree [3]. Each node in the jointree has a local network, namely, an undirected graph called a maximal prime subgraph.
Example 1. Given the BN D in Fig. 1, the MPD representation is shown in Fig. 2. Each of the five nodes in the root jointree has an associated maximal prime subgraph, as denoted with an arrow. The root jointree encodes independencies on U, while the maximal prime subgraphs encode independencies on proper subsets of U. For example, the root jointree in Fig. 2 encodes I(a, c, bdefghijk), I(abcdefgij, h, k), I(ack, fh, bdegij), and I(abcdefk, gh, ij) on U, while I(bde, fg, h), for instance, can be inferred from the maximal prime subgraph for node bdefgh. The numerical component of the MPD is defined by assigning the conditional probability tables of the BN to the nodes of the jointree. In our example, this assignment must be as follows: φ1(ac) = p(a) · p(c|a), φ2(cfh) = p(h|c, f), φ3(bdefgh) = p(b) · p(d|b) · p(e|b) · p(f|d, e) · p(g|b), φ4(ghij) = p(i|g) · p(j|g, h, i), and φ5(hk) = p(k|h).
3 Critical Remarks on the MPD of BNs
A parent-set independency I(Y, X, Z) is one such that Y XZ is the parent-set [2] of a variable ai in a BN. For example, the BN D in Fig. 1 encodes the parent-set independencies I(c, ∅, f ) and I(h, g, i); c and f are the parents of variable h, while g, h and i are the parents of j. The proof of the next result is omitted due to lack of space. Theorem 2. All parent-set independencies in a Bayesian network are not represented in the maximal prime decomposition.
Fig. 2. The maximal prime decomposition (MPD) of the BN D in Fig. 1.
Example 3. Although the BN D in Fig. 1 indicates that variables c and f are unconditionally independent, the maximal prime decomposition of D indicates that c and f are dependent. Similar remarks hold for the parent-set independence I(h, g, i). Example 4. I(def h, b, g) holds in the given BN. On the contrary, b does not separate {d, e, f, h} from g in the maximal prime subgraph bdef gh as g and h are directly connected (dependent). Example 5. I(bde, f g, h) can be inferred by separation from the maximal prime subgraph bdef gh. However, it can never be realized in the probability table φ(bdef gh) as variable h does not appear in any of the conditional probability tables p(b), p(d|b), p(e|b), p(f |d, e), p(g|b) assigned to φ(bdef gh). In [3], Wong et al. suggested that a BN be transformed in a hierarchical Markov network (HMN). An HMN is a hierarchy of Markov networks (jointrees). The primary advantages of HMNs are that they are a unique and equivalent representation of BNs [3]. For example, given the BN D in Fig. 1, the unique and equivalent HMN is depicted in Fig. 3. Example 6. The BN D in Fig. 1 can be transformed into the unique MPD in Fig. 2 or into the unique HMN in Fig. 3. Unlike the MPD representation
Fig. 3. The hierarchical Markov network (HMN) for the BN D in Fig. 1.
which does not encode, for instance, the independencies I(c, ∅, f ), I(h, g, i), and I(d, b, e), these independencies are indeed encoded in the appropriate nested jointree of the HMN.
4 Conclusion
The maximal prime decomposition (MPD) [1] of Bayesian networks (BNs) is a very limited hierarchical representation as it always consists of precisely two levels. Moreover, the MPD is undesirable since it is not a faithful representation of BNs. On the contrary, it has been previously suggested that BNs be transformed into hierarchical Markov networks (HMNs) [3]. The primary advantages of HMNs are that they are a unique and equivalent representation of BNs [3]. These observations suggest that compared with the MPD of BNs, HMNs seem to be a more desirable representation.
References 1. Olesen, K.G. and Madsen, A.L.: Maximal prime subgraph decomposition of Bayesian networks. IEEE Transactions on Systems, Man, and Cybernetics, B, 32(1):21–31, 2002. 2. Wong, S.K.M., Butz, C.J. and Wu, D.: On the implication problem for probabilistic conditional independency. IEEE Transactions on Systems, Man, and Cybernetics, A, 30(6): 785–805, 2000. 3. Wong, S.K.M., Butz, C.J. and Wu, D.: On undirected representations of Bayesian networks. ACM SIGIR Workshop on Mathematical/Formal Models in Information Retrieval, 52–59, 2001.
A Non-local Coarsening Result in Granular Probabilistic Networks Cory J. Butz, Hong Yao, and Howard J. Hamilton Department of Computer Science, University of Regina, Regina, Canada, S4S 0A2, {butz,yao2hong,hamilton}@cs.uregina.ca
Abstract. In our earlier works, we coined the phrase granular probabilistic reasoning and showed a local coarsening result. In this paper, we present a non-local method for coarsening variables (i.e., the variables are spread throughout the network) and establish its correctness.
1 Introduction
In [3], we coined the phrase granular probabilistic reasoning to mean the ability to coarsen and refine parts of a probabilistic network depending on whether they are of interest or not. Granular probabilistic reasoning is of importance as it not only leads to more efficient probabilistic inference, but it also facilitates the design of large probabilistic networks [4]. It is then not surprising that Xiang [4] explicitly states that our granular probabilistic reasoning [3] demands further attention. We proposed two operators called nest and unnest for coarsening and refining variables in a network, respectively [3]. In [1], we showed that the nest operator can be applied locally to a marginal distribution with the same effect as if it were applied directly to the joint distribution. However, no study has ever addressed how to coarsen variables spread throughout a network. In this paper, we present a method, called Non-local Nest, for coarsening non-local variables, that is, variables spread throughout a network. This method gathers all variables to be coarsened into one marginal distribution, and then applies the nest operator. We also prove our method is correct. This paper is organized as follows. Section 2 reviews a local nest method. We present a non-local method for nesting variables in Section 3. The conclusion is presented in Section 4.
2 A Local Method for Nesting
Consider the joint distribution p(R) represented as a probabilistic relation r(R) in Fig. 1, where R = {A, B, C, D, E, F} = ABCDEF is a set of variables. Configurations with zero probability are not shown. The nest operator φ is used to coarsen a relation r(XY). Intuitively, φA=Y(r) groups together all the Y-values into a nested distribution for coarse variable A given the same X-value.
More formally,
φA=Y(r) = {t | t(X) = u(X), t(A) = {u(Y p(R))}, t(p(XA)) = Σ_u u(p(R)), and u ∈ r}.
Attribute p(R) in the A-value is relabeled p(Y) and the values are normalized.

A B C D E F p(R)
0 0 0 0 0 0 0.05
0 0 0 0 1 0 0.05
0 0 1 0 1 1 0.20
0 1 0 1 0 2 0.15
0 1 0 1 1 2 0.15
1 0 0 2 2 0 0.40

Fig. 1. A probabilistic relation r(R) representing a joint distribution p(R).
Example 1. Recall the relation r(ABCDEF ) in Fig. 1. Nesting variables Y = DE as the single variable G gives the nested relation φG={D,E} (r) in Fig. 2. For instance, given the fixed X-value A : 0, B : 1, C : 0, F : 2, the Y-values D : 1, E : 0, p(R) : 0.15 and D : 1, E : 1, p(R) : 0.15 are grouped into a nested distribution. Here the attribute p(R) is relabeled as p(DE), and the probability values 0.15 and 0.15 are normalized as 0.50 and 0.50. In practice, a joint distribution r(R) is represented as a Markov network (MN) [2]. The dependency structure of a MN is an acyclic hypergraph (a jointree) [2]. The acyclic hypergraph encodes conditional independencies [2] satisfied by r(R). For example, the joint distribution p(R) in Fig. 1 can be expressed as the MN: p(R) =
[p(ABD) · p(ABC) · p(ACE) · p(BCF)] / [p(AB) · p(AC) · p(BC)],    (1)
where R = {R1 = {A, B, D}, R2 = {A, B, C}, R3 = {A, C, E}, R4 = {B, C, F }} is an acyclic hypergraph, and the marginal distributions r1 (ABD), r2 (ABC), r3 (ACE), and r4 (BCF ) of r(R) are shown in Fig. 3. In our probabilistic relational model [2], the MN in Eq. (1) is expressed as r(R) = ((r1 (ABD) ⊗ r2 (ABC)) ⊗ r3 (ACE)) ⊗ r4 (BCF ), where the Markov join operator ⊗ means r(XY ) ⊗ r(Y Z) =
[p(XY) · p(YZ)] / p(Y).
A B C | G (nested: D E p(DE))                   | F | p(ABCGF)
0 0 0 | {D=0 E=0: 0.5;  D=0 E=1: 0.5}           | 0 | 0.1
0 0 1 | {D=0 E=1: 1.0}                          | 1 | 0.2
0 1 0 | {D=1 E=0: 0.5;  D=1 E=1: 0.5}           | 2 | 0.3
1 0 0 | {D=2 E=2: 1.0}                          | 0 | 0.4

Fig. 2. The nested relation φG={D,E}(r), where r is the relation in Fig. 1.
r1(ABD):  A B D p(R1):  0 0 0 0.3;  0 1 1 0.3;  1 0 2 0.4
r2(ABC):  A B C p(R2):  0 0 0 0.1;  0 0 1 0.2;  0 1 0 0.3;  1 0 0 0.4
r3(ACE):  A C E p(R3):  0 0 0 0.2;  0 0 1 0.2;  0 1 1 0.2;  1 0 2 0.4
r4(BCF):  B C F p(R4):  0 0 0 0.5;  0 1 1 0.2;  1 0 2 0.3

Fig. 3. The marginals r1(ABD), r2(ABC), r3(ACE), and r4(BCF) of relation r.
We may omit the parentheses for simplified notation. For example, the Markov join r1 (ABD) ⊗ r2 (ABC) ⊗ r3 (ACE) is shown in Fig. 4. The main result in [1] was that the nest operator can be applied locally to one marginal distribution in a Markov network with the same effect as if applied directly to the joint distribution itself. For example, φG={E} (r) is the same nested distribution as r1 (ABD) ⊗ r2 (ABC) ⊗ φG={E} (r3 (ACE)) ⊗ r4 (BCF ). We next study how to coarsen variables spread throughout a network.
3 A Non-local Method for Nesting
We call Y ⊆ R a nestable set with respect to a MN on R = {R1 , . . . , Rn }, if Y does not intersect any separating set [2] of R. Since the nest operator is unary, the first task is to combine the nestable set Y of variables into a single table. The well-known relational database selective reduction algorithm (SRA) is applied for this purpose. The nest operator can then be applied to coarsen Y as attribute A. We now present the formal algorithm Non-local Nest (NLN) to coarsen a nestable set Y as attribute A in a given MN on acyclic hypergraph R.
r1(ABD) ⊗ r2(ABC) ⊗ r3(ACE):

A B C D E p(ABCDE)
0 0 0 0 0 0.05
0 0 0 0 1 0.05
0 0 1 0 1 0.20
0 1 0 1 0 0.15
0 1 0 1 1 0.15
1 0 0 2 2 0.40

Fig. 4. The Markov join r1(ABD) ⊗ r2(ABC) ⊗ r3(ACE).
Algorithm 1 NLN(Y, A, R)
1. Let Rj, . . . , Rn be those elements of R not deleted by the call SRA(Y, R).
2. Return φA=Y(rjn), where rjn = rj(Rj) ⊗ . . . ⊗ rn(Rn).
Theorem 1. Let r(R) = r1(R1) ⊗ . . . ⊗ ri(Ri) ⊗ rj(Rj) ⊗ . . . ⊗ rn(Rn) be represented as a MN, and let Y be a nestable set. Then φA=Y(r) is the same as r1(R1) ⊗ . . . ⊗ ri(Ri) ⊗ r′, where r′ is the nested relation returned by the call NLN(Y, A, {R1, R2, . . . , Rn}).
Example 2. DE is a nestable set with respect to R = {R1, R2, R3, R4} as used in our running example. Suppose we wish to compute NLN(DE, G, R). Then SRA(DE, R) = {R1, R2, R3}. Again, the Markov join r1 ⊗ r2 ⊗ r3 is shown in Fig. 4. The reader can verify that the nested relation φG={D,E}(r) in Fig. 2 is the same as r4(BCF) ⊗ φG={D,E}(r1(ABD) ⊗ r2(ABC) ⊗ r3(ACE)).
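As an illustration of the nest operator used by NLN (a sketch in our own notation; the relation below is the one from Fig. 1), coarsening groups the tuples that agree on the remaining attributes and normalizes the probabilities inside each group:

from collections import defaultdict

# The relation r(ABCDEF) of Fig. 1 as (A, B, C, D, E, F, p) rows.
r = [(0,0,0,0,0,0,0.05), (0,0,0,0,1,0,0.05), (0,0,1,0,1,1,0.20),
     (0,1,0,1,0,2,0.15), (0,1,0,1,1,2,0.15), (1,0,0,2,2,0,0.40)]
attrs = ("A", "B", "C", "D", "E", "F")

def nest(rows, coarse):
    """phi_{G=coarse}(r): group the coarse attributes into a nested
    distribution per remaining X-value, normalizing inside each group."""
    keep = [i for i, a in enumerate(attrs) if a not in coarse]
    drop = [i for i, a in enumerate(attrs) if a in coarse]
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[i] for i in keep)].append(row)
    result = []
    for xval, members in groups.items():
        total = sum(m[-1] for m in members)
        nested = [(tuple(m[i] for i in drop), m[-1] / total) for m in members]
        result.append((xval, nested, total))
    return result

for xval, nested, p in nest(r, {"D", "E"}):
    print(xval, nested, p)   # reproduces Fig. 2, e.g. (0,1,0,2) -> 0.5/0.5, 0.3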
4 Conclusion
Xiang [4] explicitly states that more work needs to be done on our granular probabilistic networks [3]. In this paper, we have extended the work in [1] by coarsening a non-local set of variables (i.e., variables spread throughout a network). Theorem 1 establishes the correctness of our approach.
References 1. Butz, C.J., Wong, S.K.M.: A local nest property in granular probabilistic networks, Proc. of the Fifth Joint Conf. on Information Sciences, 1, 158–161, 2000. 2. Wong, S.K.M., Butz, C.J.: Constructing the dependency structure of a multi-agent probabilistic network. IEEE Trans. Knowl. Data Eng. 13(3) (2001) 395–415. 3. Wong, S.K.M., Butz, C.J.: Contextual weak independence in Bayesian networks, Proc. of the Fifteenth Conf. on Uncertainty in Artificial Intelligence, 670–679, 1999. 4. Xiang, Y.: Probabilistic Reasoning in Multi-Agent Systems: A Graphical Models Approach. Cambridge Publishers, 2002.
Probabilistic Inference on Three-Valued Logic Guilin Qi Mathematics Department, Yichun College Yichun, Jiangxi Province, 336000
[email protected]
Abstract. In this paper, we extend Nilsson's probabilistic logic [1] to allow each sentence S to have three sets of possible worlds. We adopt the ideas of consistency and possible worlds introduced by Nilsson in [1], but propose a new method, called the linear equation method, to deal with the problems of probabilistic inference; the results of our method are consistent with those of Yao's interval-set model method.
1 Introduction
Nilsson in [1] presented a method to combine logic with probability theory. In his probabilistic logic, the problems of probabilistic entailment were solved by a geometric analysis method. Later, Yao in [4] gave a modified modus ponens rule which may be used in incidence calculus, then carried out probabilistic inference based on modus ponens. Furthermore, Yao discussed the relationship between three-valued logic and interval-set model, and gave a modus ponens rule for interval-set model. Then he obtained an extension of previous probabilistic inference. Naturally, we may wonder whether there exists an extension of Nilsson’s probabilistic logic. This problem is discussed here.
2 Probabilistic Inference on Three-Valued Logic
As in probabilistic logic, we will relate each sentence S to some sets of possible worlds, but this time the number of sets of possible worlds is three: two of them, say W1 and W2, containing the worlds in which S is true and false respectively, and the third, say W3, containing the worlds in which S is neither true nor false. Clearly, the consistent sets of truth values for the sentences φ, φ→ψ, ψ are given by the columns in the following table:

φ     T  T  T  I  I  I  F  F  F
φ→ψ   T  I  F  T  I  I  T  T  T
ψ     T  I  F  T  I  F  T  I  F
where T denotes the truth value ``true'', F denotes the truth value ``false'', and I denotes the truth value ``unknown''.
Since the sentence S possesses a third truth value, which is different from T and F , we can extend the probability of S to a probability interval [P∗ (S), P ∗ (S)], where P∗ (S) is taken to be the sum of the probabilities of all the sets of worlds in which S is true, while P ∗ (S) is taken to be the sum of the probabilities of all the sets of worlds in which S is either true or unknown. Next, we define a matrix equation Π = VP ,
(1)
where Π, V, P are defined as follows. Suppose there are K sets of possible worlds for our L sentences in B (B is a set of sentences); then P is the K-dimensional column vector that represents the probabilities of the sets of possible worlds. V = (V1, V2, ..., VK), where each Vi is taken to have components equal to 0, 1, or 1/2. The component vji = 1 if Sj has the value true in the worlds in Wi, vji = 0 if Sj has the value false in the worlds in Wi, and vji = 1/2 if Sj has the value unknown in the worlds in Wi; here ``1/2'' is not just a number but a symbol to represent the unknown state. Π is an L-dimensional column vector whose component πi denotes the ``probability'' of the sentence Si in B.
(Ii )∗ = {j∈Z|vij = 1∨1/2}.
k Since πi = Σj=1 vij pj , we have
P∗ (Si ) = Σj∈(Ii )∗ pj ,
P ∗ (Si ) = Σj∈(Ii )∗ pj .
(2)
The rule of modus ponens allows us to infer ψ from φ and φ→ψ, but if the probability interval of φ and φ→ψ are given, what can we infer for [P∗ (ψ), P ∗ (ψ)]? To solve these kinds of problems, let us consider three sentences φ, φ→ψ, ψ. The consistent truth-value assignments are given by the columns in the matrix V as follows: 1 1 1 1/2 1/2 1/2 0 0 0 V = 1 1/2 0 1 1/2 1/2 1 1 1 1 1/2 0 1 1/2 0 1 1/2 0 The first row of the matrix gives truth values for φ in the four sets of possible worlds. The second row gives truth values for φ→ψ, and the third row gives truth values for ψ. Probability values for these sentences are constrained by the matrix equation (1), and by the rules of probability, Σi pi = 1 and 0≤pi ≤1 for all i. Now suppose we are given the probability interval of sentences φ and φ→ψ. The probability interval of φ, is denoted by interval [P∗ (φ), P ∗ (φ)]; the probability interval of φ→ψ, is denoted by [P∗ (φ→ψ), P ∗ (φ→ψ)]. By the definition of P∗ (S) and P ∗ (S)(where S is an arbitrary sentence) and the equation (2) we have P∗ (φ) = p1 + p2 + p3 , P∗ (φ→ψ) = p1 + p4 + p7 + p8 + p9 , P∗ (ψ) = p1 + p4 + p7 ,
P ∗ (φ) = p1 + p2 + · · · + p6 P (φ→ψ) = p1 + p2 + p4 + · · · + p9 P ∗ (ψ) = p1 + p2 + p4 + p5 + p7 + p8 . ∗
692
G. Qi
Now we will try to obtain the best possible bounds for [P∗ (ψ), P ∗ (ψ)] using a new method called linear equation method: First, we will find the best lower bound for P∗ (ψ), here we only consider the best linear lower bound1 of P∗ (ψ) and we assume it is the best lower bound. The reason that why not other combination be taken will be discussed elsewhere. By the constraints of pi , we know that the linear lower bounds of P∗ (ψ) are l1 p1 + l2 p4 + l3 p7 , 0≤li ≤1, and the best linear lower bound must be one of them. But which one is the best? We claim that it must satisfy following two conditions: Linear Condition: It should be the linear combination of P∗ (φ), P ∗ (φ), P∗ (φ→ψ), P ∗ (φ→ψ), and 1. Maximum Condition: It should be maximum among those satisfy linear condition . Proposition 1. The best lower bound for P (ψ) is P ∗ (φ) + P∗ (φ→ψ) − 1, if we are given the probability interval for φ and φ→ψ. Proof. The linear requires that the lower bounds l1 p1 + l2 p4 + l3 p7 should satisfy the linear equation l1 p1 + l2 p4 + l3 p7 = x1 p∗ (φ) + x2 p∗ (φ) + x3 p∗ (φ→ψ) + x4 p∗ (φ→ψ) + y,
(3)
where variables x1 , x2 , x3 are coefficients of p∗ (φ), p∗ (φ), p∗ (φ→ψ) respectively and y is the coefficient of 1. By the discussion above and Σi pi = 1 we know the equation (3) is equivalent to l1 p1 + l2 p4 + l3 p7 = a1 p1 + a2 p2 + ... + a9 p9
(4)
where a1 = x1 + x2 + x3 + x4 + y, a2 = x1 + x2 + x4 + y, a3 = x1 + x2 + y, a4 = x2 + x3 + x4 + y, a5 = x2 + x4 + y, a6 = x2 + x4 + y, a7 = x3 + x4 + y, a8 = x3 + x4 + y, a9 = x3 + x4 + y. Before continuing with our discussions about solving equation (4), let us consider a lemma: Lemma 1. Suppose p1 , p2 , ..., p9 are probabilities of the nine consistent possible worlds which satisfy Σi pi = 1 and 0≤pi ≤1, then pi , i = 1, 2, ..., 9 are linear independent. Proof. The proof of lemma 1 is clear. By lemma 1 we know the equation (4) is equivalent to the following system of equations: x1 + x2 + x3 + x4 + y = l1 , x1 + x2 + x4 = x1 + x2 + y = x2 + x4 + y = x3 + x4 + y = 0, x2 + x3 + x4 + y = l2 , x3 + x4 + y = l3 . (5) It is clear the solution for the system of equation (5) is x1 = 0, x2 = x3 = l1 = l2 = −y, x4 = 0, l3 = 0, 1
a lower bound is linear if it can be expressed as a linear combination of the pi, i = 1, 2, ..., 9, that is, it is equal to x1 p1 + ... + x9 p9 for some coefficients xi.
Next, by the maximum condition we must have l1 = l2 = 1; therefore the best possible solution for equation (3) is x1 = 0, x2 = x3 = 1, x4 = 0, y = −1. In this way, we know that p1 + p4 = P^∗(φ) + P∗(φ→ψ) − 1 is the best possible lower bound of P∗(ψ). Next, we will determine the best upper bound for P^∗(ψ). As for the best lower bound, here we only consider the linear upper bounds. The linear upper bounds of P^∗(ψ) are l1 p1 + l2 p2 + ... + l9 p9, where li ≥ 1 for i = 1, 2, 4, 5, 7, while li ≥ 0 for the other i and Σ_{i=1}^{9} li ≤ 1. The best upper bound must be one of them. We claim that this best upper bound should satisfy the linear condition and the minimum condition, that is, it should be minimum among those satisfying the linear condition. For the best upper bound, we have the following proposition:
Proposition 2. The best upper bound for P^∗(ψ) is P^∗(φ→ψ), if we are given the probability intervals for φ and φ→ψ.
Proof. The proof of Proposition 2 is similar to that of Proposition 1. Thus, we have obtained the best possible interval containing [P∗(ψ), P^∗(ψ)]: it is the interval [P^∗(φ) + P∗(φ→ψ) − 1, P^∗(φ→ψ)], which coincides with the bounds given by Yao [4].
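The analytical bounds above can also be cross-checked numerically. The sketch below (not the paper's method) treats p1, …, p9 as variables of a linear program subject to the given values of the φ and φ→ψ sums and asks for the extreme values of P∗(ψ) and P^∗(ψ); scipy is assumed to be available, and the interval values 0.7, 0.9, 0.8, 1.0 are invented for the example.

import numpy as np
from scipy.optimize import linprog

# Indicator rows over the nine consistent worlds (columns of V):
# a coefficient of 1 marks the worlds counted by each lower/upper sum.
low_phi = [1, 1, 1, 0, 0, 0, 0, 0, 0]          # P_*(phi)      = p1+p2+p3
up_phi  = [1, 1, 1, 1, 1, 1, 0, 0, 0]          # P^*(phi)      = p1+...+p6
low_imp = [1, 0, 0, 1, 0, 0, 1, 1, 1]          # P_*(phi->psi)
up_imp  = [1, 1, 0, 1, 1, 1, 1, 1, 1]          # P^*(phi->psi)
low_psi = [1, 0, 0, 1, 0, 0, 1, 0, 0]          # P_*(psi)
up_psi  = [1, 1, 0, 1, 1, 0, 1, 1, 0]          # P^*(psi)

def bounds_for_psi(lp_phi, up_phi_val, lp_imp, up_imp_val):
    A_eq = np.array([np.ones(9), low_phi, up_phi, low_imp, up_imp])
    b_eq = np.array([1.0, lp_phi, up_phi_val, lp_imp, up_imp_val])
    out = []
    for obj, sign in ((low_psi, 1), (up_psi, -1)):
        res = linprog(sign * np.array(obj, dtype=float),
                      A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 9)
        out.append(sign * res.fun)
    return out   # [min P_*(psi), max P^*(psi)]

print(bounds_for_psi(0.7, 0.9, 0.8, 1.0))   # lower bound 0.9 + 0.8 - 1 = 0.7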
References 1. N. J. Nilsson: Probability Logic, Artificial Intelligence. 28, 71–87, 1986. 2. G. Shafer: A Mathematical Theory of Evidence, Princeton University Press, 1976. 3. Y. Y. Yao, Xining Li: Comparison of Rough-set and Interval-set Model For Uncertainty Reasoning , Fundamenta Informaticae. 27 (1996) 289–298. 4. N. Recher: Many-valued Logic, New York, McGraw-Hill, 1969. 5. Z. Pawlak: Rough sets, International Journal of Computer and Information Science, 11, 341–356, 1981.
Multi-dimensional Observer-Centred Qualitative Spatial-temporal Reasoning *
Yi-nan Lu¹, Sheng-sheng Wang¹, and Sheng-xian Sha²
¹ Jilin University, Changchun, China 130012
² Changchun Institute of Technology, Changchun, China 130012
Abstract. The multi-dimensional spatial occlusion relation (MSO) is an observer-centred model which can express the relation between the images of two bodies x and y from a viewpoint v. We study the basic reasoning methods of MSO, then extend MSO to the spatio-temporal field by adding a time feature to it, and give the related definitions.
1 Introduction
Most spatial representation models have used dyadic relations which are not observer-centred, but observer-centred relations are genuinely useful in the physical world. The spatial occlusion relation is an observer-centred spatial representation which studies the qualitative relation of two objects seen from a viewpoint. It is mainly used in computer vision and intelligent robotics. In recent years, with the development of spatial reasoning research, occlusion has also been widely investigated in Qualitative Spatial Reasoning (QSR) [1][2]. There are two important spatial occlusion models in QSR. The first is Lines of Sight (LOS), proposed by Galton in 1994; LOS includes 14 relations between convex bodies [3]. The second is Randell's ROC-20 [4], which can handle both convex and concave objects. These two models are both based on RCC (Region Connection Calculus) theory, which expresses the topology of regions. Much acquired spatial data is abstract data, such as points standing for cities, so the dimensions of spatial data vary and multi-dimensional data processing has become more and more important in the spatial information field. Since RCC requires the objects to be of the same dimension [6][7], LOS and ROC are not suitable for multi-dimensional objects. To deal with this, we extended RCC to MRCC, which can express multi-dimensional topology, and based on MRCC we proposed a multi-dimensional spatial occlusion relation, MSO, in 2002 [8].
2 MRCC and MSO
RCC-8, the boundary-sensitive RCC model, is widely used to express spatial relations in GIS, CAD and other spatial information software. RCC-8 has eight JEPD basic
* This paper was supported by the Technological Development Plan of Jilin Province (Grant No. 20010588).
relations [7]. Multi-dimensional RCC (MRCC) is an extended RCC-8 model for multi-dimensional objects. The function DIM(x) indicates the dimension of an object x; its value is 0, 1 or 2 for point, line and area objects, which we call 0-D, 1-D and 2-D objects respectively. The MRCC relation of x and y is defined by a triple (DIM(x), DIM(y), ψ), where ψ ∈ RCC8. The available relations for ψ depend on the combination of DIM(x) and DIM(y); considering all possible combinations of DIM(x) and DIM(y), only 38 MRCC basic relations are available.
That x occludes y from viewpoint v is formally defined as
Occlude(x,y) ≡def ∃a∈i(x) ∃b∈i(y) [Line(v,a,b)]
where the function Line(a,b,c) means that the points a, b, c fall on a straight line with b strictly between a and c. The JEPD occlusion relation set OP = {NO, O, OB, MO} is defined from Occlude(x,y):
NO(x,y) = ¬Occlude(x,y) ∧ ¬Occlude(y,x)
O(x,y) = Occlude(x,y) ∧ ¬Occlude(y,x)
OB(x,y) = ¬Occlude(x,y) ∧ Occlude(y,x)
MO(x,y) = Occlude(x,y) ∧ Occlude(y,x)
The definition of the multi-dimensional spatial occlusion (MSO) relation is
mso(x,y,v) ≡def (DIM(img(x,v)), DIM(img(y,v)), Ψ, φ)
where (DIM(img(x,v)), DIM(img(y,v)), Ψ) ∈ MRCC and φ ∈ OP. There are 79 reasonable MSO relations. Detailed information on MRCC and MSO is given in [8].
3 Conception Neighborhood Graph
The predicate nbr(R1,R2) means that R1 and R2 are conceptual neighbours.
Definition 1: Given two MSO relations mso1(x,y,v) = (d1, d2, rcc, op) and mso2(x,y,v') = (d1', d2', rcc', op'),
nbr(mso1, mso2) ≡def (d1 = d1') and (d2 = d2') and [(rcc = rcc') or nbr(rcc, rcc')] and [(op = op') or nbr(op, op')].
nbr is available for MRCC, OP and MSO.
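As an illustration of Definition 1, the neighbourhood test over MSO relations can be written directly as a predicate over the two quadruples. The sketch below is only a toy example: the RCC-8 and OP neighbour pairs listed in it are a small illustrative subset, not the full conceptual neighbourhood graphs.

```python
# Illustrative nbr() check for MSO relations, following Definition 1.
# The neighbour tables below are assumed partial examples, not the full graphs.
RCC8_NBR = {("EC", "PO"), ("PO", "TPP"), ("TPP", "NTPP"), ("TPP", "EQ")}
OP_NBR = {("NO", "O"), ("O", "MO"), ("NO", "OB"), ("OB", "MO")}

def nbr_pair(r1, r2, table):
    return r1 == r2 or (r1, r2) in table or (r2, r1) in table

def nbr_mso(mso1, mso2):
    d1, d2, rcc, op = mso1
    d1p, d2p, rccp, opp = mso2
    return (d1 == d1p and d2 == d2p
            and nbr_pair(rcc, rccp, RCC8_NBR)
            and nbr_pair(op, opp, OP_NBR))

# Example: a 2-D/2-D pair whose RCC-8 part moves from PO to TPP
# while the occlusion part moves from O to MO.
print(nbr_mso((2, 2, "PO", "O"), (2, 2, "TPP", "MO")))   # True
print(nbr_mso((2, 2, "PO", "O"), (1, 2, "TPP", "MO")))   # False: dimensions differ
```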
4 Relation Composition
Definition 2: Given two MRCC relations mrcc1 = (d1, d2, r1) and mrcc2 = (d2, d3, r2),
mrcc1 ∘ mrcc2 ≡def (d1, d3, r1 ∘ r2) ∩ {all the MRCC basic relations}.
As for the occlusion relation OP, however, we cannot treat the two relations independently, because the occlusion relation is not transitive [2]. We define the fully-occlusion relation set FOP = {FF, FFI, NF} to settle this problem:
FF = { (a,b,R,O) | 0 ≤ a, b ≤ 2, R ∈ {TPP, NTPP, EQ} }
FFI = { (a,b,R,OB) | 0 ≤ a, b ≤ 2, R ∈ {TPPI, NTPPI, EQ} }
NF = ¬(FF ∨ FFI)
Definition 3: Given two MSO relations mso(x,y,v) = (d1, d2, Ψ1, φ1) and mso(y,z,v) = (d2, d3, Ψ2, φ2), with RF1, RF2 ∈ FOP such that mso(x,y,v) ∈ RF1 and mso(y,z,v) ∈ RF2,
mso(x,y,v) ∘ mso(y,z,v) ≡def [(d1,d2,Ψ1) ∘ (d2,d3,Ψ2), {NO,O,OB,MO}] ∩ (RF1 ∘ RF2).
5 Integrating Time Information into MSO
Definition 4: The time feature is defined as (t1, c, t2)(A,B), t1 < t2.
The function RR(R1, R2) gives the possible numbers of state changes when the state starts from R1 and ends at R2:
RR(R1, R2) = { n | n ∈ N, R2 ∈ RC(R1, n) }.
Considering the time feature, the mso relation at a time point t is expressed as
mso(x,y,v)_t = (DIM(img(x,v)), DIM(img(y,v)), Ψ, φ)_t.
Definition 6: The compositions ⊕RC and ⊕RR of time and MSO are defined as follows. Given mso(x,y,v)_t1 = (d_x, d_y, R1, P1)_t1 and mso(x,y,v)_t2 = (d_x, d_y, R2, P2)_t2,
(d_x, d_y, R1, P1)_t1 ⊕RC (t1, c, t2)(x,y) = (d_x, d_y, RC(R1, c), RC(P1, c))_t2
(d_x, d_y, R1, P1)_t1 ⊕RR (d_x, d_y, R2, P2)_t2 = (t1, RR(R1, R2) ∩ RR(P1, P2), t2)(x,y)
References
1. Renz, J., Nebel, B.: Efficient Methods for Qualitative Spatial Reasoning. ECAI-98 (1998) 562–566
2. Petrov, A.P., Kuzmin, L.V.: Visual Space Geometry Derived from Occlusion Axioms. Journal of Mathematical Imaging and Vision 6, 291–308
3. Galton, A.: Lines of Sight. AISB Workshop on Spatial and Spatio-temporal Reasoning (1994) 1–15
4. Randell, D., et al.: From Images to Bodies: Modeling and Exploiting Spatial Occlusion and Motion Parallax. IJCAI (2001) 57–63
5. Papadias, D., et al.: Multidimensional Range Query Processing with Spatial Relations. Geographical Systems 4(4) (1997) 343–365
6. Cohn, A.G., Hazarika, S.M.: Qualitative Spatial Representation and Reasoning: An Overview. Fundamenta Informaticae 46(1-2) (2001) 1–29
7. Escrig, M.T., Toledo, F.: Qualitative Spatial Reasoning: Theory and Practice. Ohmsha (1999) 17–43
8. Wang, S., Liu, D.: Multi-dimensional Spatial Occlusion Relation. International Conference on Intelligent Information Technology (ICIIT 2002), Beijing, 199–204
Architecture Specification for Design of Agent-Based System in Domain View S.K. Lee and Taiyun Kim Department of Computer Science, Korea University Anam-Dong Sungbuk-Ku, Seoul, 136-701, Korea
[email protected]
Abstract. How to engineer an agent-based system is an essential factor in developing agents or agent-based systems. Existing methodologies for agent-based system development focus on the development phases and the activities in each phase, and do not sufficiently consider system organization and performance aspects. It is important to provide the designer with detailed guidance about how an agent-based system can be organized and specified. This paper defines the computing environment from the viewpoint of domains, proposes methods to identify domains, deploy agents and organize the system, and analyzes the characteristics of the established system organization.
1 Introduction
Recently, agent technology has been emerging, and some researchers consider it a new software engineering paradigm. Although agent technology is not yet mature and its usefulness has not been fully verified, interest in it will increase gradually, and it will be developed and used in many fields. According to the number, intelligence, mobility or capability of agents, types of agents can be classified into multi-agent, intelligent agent, informative agent, mobile agent, etc. Whatever the type of agent, an engineering methodology to develop agents or agent-based systems is required. The existing methodologies focus on the development phases and the activities in each phase and do not sufficiently consider system organization and performance aspects. It is meaningful to establish guidance about how an agent-based system can be constructed efficiently [2]. This paper provides the designer with guidance about how an agent-based system can be organized. For this, we survey the existing methodologies for agent-based system development in Section 2. In Section 3, we define the agents' society from the viewpoint of domains and describe an organization method for agent-based systems. Future work is described in Section 4.
2 Related Work
Many methodologies for agent-based system development have been proposed. We survey GAIA [5], MaSE [4], BDI [1] and SODA [3]. The existing methodologies provide useful engineering concepts and methods for agent-based system development, but there are some weaknesses to be improved.
- Consideration of the agents' society. It is necessary to model the computing environment in an agent-oriented view.
- Consideration of agent deployment and system organization. The deployment of agents in an agent-based system is not sufficiently considered in the engineering steps.
- Organization methods are not described clearly. This may be because agents act autonomously. Based on the relationships among agents, we can reason about organization.
- Consideration of the performance of the agent-based system. The existing methodologies do not provide a method to check whether the system organization is efficient.
3 Architecture Design of Agent-Based System
3.1 Domain Oriented Computing Environment Model
In an agent-oriented view, the computing environment can be recognized as sets of agents forming blocks. This paper recognizes a block as a domain: the computing environment is composed of domains, and agents are placed in domains. Each domain has a manager to manage the domain internally and to support interaction among domains externally. Figure 1 models the computing environment in the domain view.
Fig. 1. Computing environment model in view of domain
To construct an agent-based computing environment, domain partition and agent deployment are required.
3.2 Agent-Based System Organization
We can identify and define the necessary agents from the application description using the steps and activities proposed in existing methodologies: the goal-role-agent model, relationship model, agent model, interaction model, etc. Additionally, agent deployment, system organization and system performance must be addressed. This paper devises a method that solves these issues in the domain-oriented view. In order to achieve its roles, each agent interacts with other agents placed locally or remotely. Examining the relationships among agents from the application description, we can define the interaction type between two agents as follows.
l: Local type. The two agents interact within the same infrastructure or organization.
r: Remote type. The two agents interact at remote locations and work in different infrastructures and organizations.
If the interaction type of two
agents Ai and Aj is 'l', then it is desirable to place both Ai and Aj in the same area. On the other hand, if the interaction type between Ai and Aj is 'r', then it is efficient to separate Ai from Aj into different areas. We can define the domain identification procedure as follows: traversing all agents that have interaction type 'l' with agent Ai (i = 1..n, the number of agents) in sequence until there is no further 'l' interaction type, and classifying each such group of agents into a domain, we can extract the domains. When five agents are identified and the interaction types are as in Figure 2, two domains can be extracted: A1, A2 and A4 are deployed in Domain1, and A3 and A5 in Domain2.
Fig. 2. Interaction type between agents
Based on the result of domain partition and agent deployment, the system organization can be established. The relation among domains is basically remote. One domain can be mapped onto one local agent-based system; therefore the entire system is organized such that agent-based systems interact with each other remotely. Applying the domain identification procedure to Figure 2, the system is organized as in Figure 3.
Fig. 3. System organization in view of domain

Table 1. Features of the system organization of Fig. 3

Feature Item                  Domain1             Domain2
Deployed agents               A1, A2, A4          A3, A5
Locality of domain            4/7 (0.571)         2/5 (0.4)
Domain load                   7/3 agents (2.33)   5/2 agents (2.5)
Average load of all agents    12/5 agents (2.4)
Decision of domain load       low                 high
Overloaded agent              A2                  A5
It is necessary to examine the features of the established system organization:
(1) Locality. This feature is the rate of local behaviors over total behaviors in a given domain; a low rate implies that the domain interacts heavily with remote domains. It is measured as
(number of 'l' links of all agents deployed in the domain) / (number of total links in the domain).
(2) Domain load. This feature checks whether a given domain is overloaded. It can be measured as
(average number of behaviors that the agents in the domain should provide) / (average load of all agents).
(3) Agent load. This feature identifies overloaded agents. When the number of links of a given agent is greater than the average load of all agents, the agent is overloaded:
(number of links of the agent) / (average load of all agents).
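A small sketch of the domain identification procedure and the three metrics above is given below. The edge set is an assumption reconstructed from Fig. 2 and Table 1 (three local links A1-A2, A2-A4, A3-A5 and three remote links), so the concrete numbers are illustrative only.

```python
# Hypothetical sketch: group agents into domains over 'l' links, then
# compute locality, domain load and agent load as defined above.
from collections import defaultdict

agents = ["A1", "A2", "A3", "A4", "A5"]
# (agent, agent, type): edge set assumed from Fig. 2 / Table 1.
links = [("A1", "A2", "l"), ("A2", "A4", "l"), ("A3", "A5", "l"),
         ("A1", "A3", "r"), ("A2", "A5", "r"), ("A4", "A5", "r")]

def identify_domains():
    # Domain identification: connected components over 'l' links only.
    local = defaultdict(set)
    for a, b, t in links:
        if t == "l":
            local[a].add(b)
            local[b].add(a)
    domains, seen = [], set()
    for a in agents:
        if a in seen:
            continue
        stack, comp = [a], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(local[x] - comp)
        seen |= comp
        domains.append(comp)
    return domains

def degree(agent, kind=None):
    return sum(1 for a, b, t in links
               if agent in (a, b) and (kind is None or t == kind))

avg_load = sum(degree(a) for a in agents) / len(agents)   # 12/5 = 2.4
for dom in identify_domains():
    total = sum(degree(a) for a in dom)
    local_cnt = sum(degree(a, "l") for a in dom)
    print(sorted(dom), "locality", local_cnt / total,
          "load", total / len(dom),
          "overloaded", [a for a in dom if degree(a) > avg_load])
```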
As an example, applying these metrics to Figure 3, we can analyze the features of the system as in Table 1. When the initial design result does not have good features, the engineer can devise alternative system organizations. Finally, we can specify the agent-based system as in Figure 4.

Computing Environment = < E-manager, E-platform, Domain1, Domain2 >
E-manager = < E-DS, E-coordination, E-Interoperation >
E-coordination = { Contract-net }
E-interoperation = { Translator }
E-platform = < E-communication, E-language >
E-communication = { Internet }
E-language = { ACL }
Domain1 = < I-manager, I-platform, A1, A2, A4 >
I-manager = < I-DS, I-coordination, I-Interoperation >
I-coordination = { Contract-net }
I-interoperation = { Translator }
I-platform = < I-communication, I-language >
I-communication = { Ethernet }
I-language = { ACL }
A1 = < Name, Address, State, Capability >
...

Fig. 4. Specification example to design agent-based system architecture
4 Conclusion and Future Work
The existing development methodologies for agent-based systems must be improved; in particular, methods for agent deployment and for performance prediction based on the system organization are needed. This paper proposes methods to complement the existing methodologies. Modeling a real, complex computing environment as domains, with domain partition and agent deployment by the l-r type, lets the engineer organize an agent-based system. The methods can be used to establish the architecture of an agent-based system and to specify the architecture before detailed design. In the future, we will improve the methods proposed in this paper and combine them with existing methodologies.
References
[1] Huhns, M., et al.: Interaction-oriented Software Development. IJSEKE, Vol. 11, No. 3, World Scientific Pub. (2001) 259–279
[2] Hogg, L.M.J., et al.: Socially Intelligent Reasoning for Autonomous Agents. IEEE Trans. on Systems, Man, and Cybernetics – Part A, Vol. 31, No. 5 (Sep. 2001) 381–393
[3] Omicini, A.: SODA: Societies and Infrastructures in the Analysis and Design of Agent-based Systems. LNCS 1957, Springer (2000) 185–193
[4] Scott, A., et al.: Multiagent Systems Engineering. IJSEKE, Vol. 11, No. 3, World Scientific Pub. (2001) 231–258
[5] Zambonelli, F., et al.: Organizational Rules as an Abstraction for the Analysis and Design of Multi-agent Systems. IJSEKE, Vol. 11, No. 3, World Scientific Pub. (2001) 303–328
Adapting Granular Rough Theory to Multi-agent Context Bo Chen and Mingtian Zhou Microcomputer Institute, School of Computer Science & Engineering, University of Electronic Science & Technology of China, Chengdu, 610054
[email protected],
[email protected]
Abstract. The present paper focuses on adapting Granular Rough Theory to a multi-agent system. By transforming the original triple-form atomic granule into a quadruple, we encapsulate the agent-specific viewpoint into information granules, meaning "an agent knows/believes that a given entity has the attribute type with the specific value". Then a quasi-Cartesian qualitative coordinate system named Granule Space is defined to visualize information granules according to their agent views, entity identities and attribute types. We extend Granular Rough Theory into new versions applicable to the 3-D information-cube-based M-Information System. The challenges that a MAS context poses to rough approaches are then analyzed in the form of an obvious puzzle. Though leaving systematic solutions as open issues, we suggest auxiliary measurements to alleviate, or at least to evaluate, the invalidity of the rough approach in MAS.
Keywords. Granule Space, Information Cube, M-Information System, Granular Rough Theory
1 Introduction
Efforts to encapsulate all relevant elements in individual granules and to develop rough theory over granules stem from the expectation that the rough approach could be made more applicable for knowledge representation and approximation. In [1], we represent the data cells of an information table as atomic granules, i.e. triples of the form (ui, aj, vk), with the sentential meaning "entity ui has the attribute aj with the value vk". Regarding the atomic granule as the primitive construct, we define a granular calculus for an Information System, with whose facilities a granular approach to roughness is built up based on pure mereological relations over granules, referred to here as Granular Rough Theory. Shifting the context of information granules to a multi-agent system, each agent can have her own knowledge/belief/etc. of the outer world, which means there would be multiple information tables in the entire system. Hence, our new approach sets out by incorporating the viewpoint of the agent into the atomic granule, bringing in a quadruple (agt, ui, aj, vk) with the complete semantics a data cell could indicate: "agent agt knows/believes/etc. that entity ui has the attribute aj with the value vk".
2 Granule Space
For a given quadruple-form atomic granule (agt, ui, aj, vk), it is easy to suppose that there exists a functional relation F: Ag × U × A → V, which states that the specific attribute value of a granule is determined by the agent viewpoint as well as the entity identity and the attribute type, viz. vk = F(agt, ui, aj). Such an equation can be represented as a valued point in a 3-D coordinate system, as shown in Fig. 1.
Fig. 1. Coordinate representation of quadruples. A quadruple-form granule is a visualized point Pt(agt, ui, aj) of three coordinate arguments, whereas the value vk is the point-specific functional value. Each coordinate is a discrete set and sorted only by the given index, which distinguishes the coordinate system from common mathematical plot systems
Given the point representation above, we define Granule Space as a hypothetical, qualitative, quasi-Cartesian coordinate system with three non-negative axes standing for arbitrarily ordered discrete agent viewpoints, entity identities in the universe of discourse, and attribute types for entities, holding 3-D points qualified by these coordinates, each of which is valued as the specific attribute value of an entity in the viewpoint of the associated agent. There are some properties of Granule Space. Its coordinate system is similar to a restricted or pruned form of the common Cartesian coordinate system in the convention of coordinate representation and the interpretation of points. It is not a quantitative but a qualitative system, which distinguishes its coordinate system from its ordinary Cartesian counterpart; the differences include the non-numerical semantics of the coordinates and the incomparability of values associated with different points for distinct attributes. It is intrinsically discrete, in two senses: the axes stand for discrete notions such as agent views, entities and attribute types, and the contents are individual information granules dispersed in this conceptual space. The point form of granules can then be substituted with a sphere centered on the point Pt(agt, ui, aj), as indicated by the dashed circles in Fig. 1. An Information Cube is the special case of the information granules contained in Granule Space in which, for each of the agents, there is an information table about the same set of entities describing an identical set of attribute types. Such a situation is common when several agents with similar functions reason about the same set of entities to give their respective decision rules. Fig. 2 gives an illustration.
Fig. 2. An information cube in Granule Space. A 3-D information cube can be a standard extension of the original 2-D information table structure for multi-agent systems; it encodes the knowledge/belief of each agent to provide additional functions, viz. the ability to evaluate, synthesize and harmonize the distributed viewpoints. Associated values of granules are omitted
Three planes parallel to the coordinate planes have their respective implicit connotations. The entity-attribute plane cutting a specific agent graduation agt takes the complete viewpoint of agent agt on the universe of discourse, referred to as the Agent Sight. The agent-attribute plane intersecting a specific entity identity ui on the entity axis stands for all agent views about the entity ui, referred to as the Entity Perspective. The agent-entity plane meeting the attribute axis at the graduation aj indicates the extensional values of the specific attribute aj over each entity from all the agents' viewpoints, referred to as the Attribute Extension.
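A minimal data-structure sketch of these notions is given below. It is only an illustration: the M-Information System is stored as a mapping from (agent, entity, attribute) to a value, and the three slices are read off by fixing one coordinate; the concrete agents, entities, attributes and values are assumptions.

```python
# Illustrative sketch of an information cube and its three kinds of slices.
# cube[(agent, entity, attribute)] = value, i.e. v = F(agt, u_i, a_j).
cube = {
    ("ag1", "u1", "readability"): 3, ("ag1", "u1", "innovation"): 4,
    ("ag1", "u2", "readability"): 2, ("ag1", "u2", "innovation"): 5,
    ("ag2", "u1", "readability"): 4, ("ag2", "u1", "innovation"): 4,
    ("ag2", "u2", "readability"): 2, ("ag2", "u2", "innovation"): 3,
}

def agent_sight(agt):
    """Entity-attribute plane for one agent: its local information table."""
    return {(u, a): v for (g, u, a), v in cube.items() if g == agt}

def entity_perspective(entity):
    """Agent-attribute plane for one entity: all agent views of it."""
    return {(g, a): v for (g, u, a), v in cube.items() if u == entity}

def attribute_extension(attr):
    """Agent-entity plane for one attribute: its values across all views."""
    return {(g, u): v for (g, u, a), v in cube.items() if a == attr}

print(agent_sight("ag1"))          # ag1's 2-D information table
print(entity_perspective("u1"))    # every agent's view of entity u1
print(attribute_extension("innovation"))
```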
3 Adapting Granular Rough Theory to MAS
We define a multi-agent oriented M-Information System IM = (Ag, U, A), given by an information cube in which Ag is the set of agent viewpoints, U is the universe of discourse and A is the set of attribute types. Substituting the triple-form atomic granule ξ(ui, cj, vk) with the quadruple ξ(agt, ui, cj, vk), the underlying granular calculus is modified to an extended version, the M-Granular Calculus CM. In CM, most of the basic operations are consistent with the original system, while the internal structures of the compound granules generated incorporate additional agent information. The collection of agent views of a compound granule is critical: it decides whether the granule at hand is in the local scope of one agent view or spans
multiple views; accordingly, the corresponding compound granules are named agt-Local Granules and Global Granules, respectively. Such a classification is applicable to the original definitions of Cluster Granule, Aspect Granule, Aspect Cluster Granule and so on. It is natural for each participating agent to have the right to analyze her own Agent Sight of the universe to achieve a local perception of roughness. That is, by slicing the given information cube into layers parallel to the entity-attribute plane, the methods of 2-D Granular Rough Theory can be applied to the local compound granules in each layer, so as to roughly approximate local decisional granules with local conditional aspect cluster granules. For the M-Information System itself, the extended Granular Rough Theory should aim at attaining not only the local roughness for each agent but also a global roughness. It is straightforward to apply a process similar to the definition of granular roughness from an information table, viz. to classify all the global aspect cluster granules into Regular, Irregular and Irrelevant Granules with respect to a given global decisional aspect cluster granule, and moreover to define the global Kernel, Hull and Corpus Granules with respect to it. It should be noticed that the Shift operation is based on the intensional connection among information granules, which in an information table is interpreted as aspects belonging to the same entity, whereas in an information cube it is confined to aspects of the same entity in the same agent's view. By nature, the above two approaches both try to reduce the 3-D information structure given by an information cube to a 2-D structure. The former applies the Slicing method to cut the information cube into layers and then confines each pass of investigation to a single information table; the latter implicitly utilizes the Flattening method to merge the layers of the information cube into one large global information table, in which the original universe of entities is enlarged by re-labeling each entity with a new identity incorporating the hint of the agent view. The extended version of Granular Rough Theory is then consistent with its triple version.
4 Challenges of the Multi-agent Context
From the motivation of rough theory, roughness is developed to discover the decision rules of an Information System, so that we can base our reasoning on these rules to infer the decision attribute value of an entity from some of its conditional attribute values. On the other hand, the most important contribution of rough theory is to approximate a set of entities from inside and outside the set. In a non-agent-oriented system there is no great conflict between these two methodological connotations. Nevertheless, in the M-Information System, since different agents might have diverse knowledge/beliefs of the outer world, drastic inconsistency arises. For instance, suppose there are two agents ag1 and ag2 in the system; both agents may infer the decision rule "each paper that has a readability of 3 points and an innovation of 4 is accepted" from their own information tables. Such a rule is translated into the representation "the class of papers that will be accepted can be approximated by the class of papers that have the attribute readability with value 3 and the attribute innovation with value 4". Since the attributes "readability" and "innovation" are both somewhat subjective, the local information tables of the two agents may be quite different in
the real data distribution, yet they happen to reach common rules describing only the inference relation between attribute values. Then, for a concrete paper, it may be hard to decide whether it is qualified or not without further effort to coordinate the contradictions between the agents' views; but if the information tables of the agents were identical, it would not make any sense to expend such effort. The rough approach to information analysis is thus challenged in the context of MAS. Such a puzzle lies in the multiple factors that affect the agents' views of the outer world, including the epistemic characteristics of each agent, the system deployment and other concrete environmental conditions, and so on. It is beyond the reach of the present paper to establish a systematic methodology to resolve the puzzle, and it is left as an open issue for future research. We can, however, establish measurements helpful in alleviating the invalidity of the rough approach in some cases, or at least in evaluating its current applicability. By evaluating the similarity among the rows of a specific Entity Perspective Granule, the degree of inconsistency of an entity's perspective in different agent views can be calculated, whereas by assessing the similarity among the rows of a given Attribute Extension Granule, the subjectivity, viz. the degree of dependence on the agents' personalities, of an attribute can be found. If an attribute type is too subjective, so that its value depends too much on the arbitrary epistemic state of an agent, this attribute is bound to differ drastically in value for most of the entities, aggravating the degree of inconsistency of the entities' perspectives. In such a case, we can consider new attribute types that characterize the entities more objectively, in order to obtain a more rational decision system. On the other hand, if the subjectivity is not serious and there are some entities whose perspectives have a much higher degree of inconsistency than the average case, we can try to find out what implicit reasons lead to these special cases, so that we can adjust the system or these special entities accordingly. A well-founded methodology for investigating similarity among constructs is part of the Rough Mereology established by A. Skowron and L. Polkowski, as stated in [2,3,4], in which the quantitative measure of similarity is given by the value of Rough Inclusion. Based on it, deliberate system parameters can be defined to convey specific semantics in future work.
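To illustrate the kind of auxiliary measurement suggested above, the sketch below computes a per-entity inconsistency degree and a per-attribute subjectivity degree from an information cube as the fraction of disagreeing value pairs across agent views. This plain agreement ratio is only a stand-in chosen for brevity; the paper points to rough inclusion [2,3,4] as the well-founded similarity measure, and all data values are assumptions.

```python
# Illustrative inconsistency/subjectivity measurements over an information cube.
# cube maps (agent, entity, attribute) to a value; data values are assumptions.
from itertools import combinations

cube = {
    ("ag1", "u1", "readability"): 3, ("ag1", "u1", "innovation"): 4,
    ("ag2", "u1", "readability"): 4, ("ag2", "u1", "innovation"): 4,
    ("ag1", "u2", "readability"): 2, ("ag1", "u2", "innovation"): 5,
    ("ag2", "u2", "readability"): 2, ("ag2", "u2", "innovation"): 3,
}
agents = sorted({g for g, _, _ in cube})

def disagreement(values):
    """Fraction of agent pairs that assign different values."""
    diffs = [a != b for a, b in combinations(values, 2)]
    return sum(diffs) / len(diffs) if diffs else 0.0

def entity_inconsistency(entity):
    attrs = sorted({a for _, u, a in cube if u == entity})
    rows = [[cube[(g, entity, a)] for a in attrs] for g in agents]
    return sum(disagreement(col) for col in zip(*rows)) / len(attrs)

def attribute_subjectivity(attr):
    entities = sorted({u for _, u, a in cube if a == attr})
    rows = [[cube[(g, u, attr)] for u in entities] for g in agents]
    return sum(disagreement(col) for col in zip(*rows)) / len(entities)

print(entity_inconsistency("u1"))           # 0.5: agents differ on readability only
print(attribute_subjectivity("innovation")) # 0.5: agents differ on it for u2 only
```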
References
1. Chen, B., Zhou, M.: A Pure Mereological Approach to Roughness. Accepted by RSFDGrC 2003.
2. Polkowski, L., Skowron, A.: Approximate Reasoning about Complex Objects in Distributed Systems: Rough Mereological Formalization. Extended version of a lecture delivered at the Workshop on Logic, Algebra and Computer Science (LACS), Banach International Mathematical Center, Warsaw, December 16, 1996.
3. Skowron, A., Polkowski, L.: Rough Mereological Foundations for Design, Analysis, Synthesis, and Control in Distributed Systems. Information Sciences, Elsevier Science Inc. (1998).
4. Polkowski, L., Skowron, A.: Rough Mereology: A New Paradigm for Approximate Reasoning. International Journal of Approximate Reasoning 15(4) (1996) 333–365.
How to Choose the Optimal Policy in Multi-agent Belief Revision? Yang Gao, Zhaochun Sun, and Ning Li State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, P.R.China {gaoy, szc, ln}@nju.edu.cn
Abstract. In a multi-agent system, it is not enough to maintain the coherence of a single agent's belief set only: through communication, one agent removing or adding a belief sentence may influence the certainty of other agents' belief sets. In this paper, we carefully investigate some features of the belief revision process in different multi-agent systems and bring forward a uniform framework, Bereframe. In Bereframe, we consider that there are objectives other than maintaining the coherence or maximizing the certainty of a single belief revision process, and the agent must choose the optimal revision policy to realize them. A credibility measure is brought forward, which is used to compute the maximal credibility degree of the knowledge background. In a cooperative multi-agent system, agents always choose the combinative policy that maximizes the certainty of the whole system's belief set, according to the welfare principle. In a semi-competitive multi-agent system, agents choose the revision policy according to the Pareto efficiency principle. In a competitive multi-agent system, the Nash equilibrium criterion is applied to the agents' belief revision process.
1 Introduction
Belief revision is the process of incorporating new information into the current belief system. An agent's belief revision function is a mapping brf: ℘(Bel) × P → ℘(Bel) which, on the basis of the current percept P and the current beliefs Bel, determines a new set of beliefs. The most important principle of the belief revision process is that consistency should be maintained, but some extra-logical criteria are also needed, for example the minimal-change principle of the AGM theory [1,2]. Another fundamental reason for belief revision is inherent uncertainty; how uncertainty is represented in a belief system also fundamentally affects how belief is revised in the light of new information. There are two approaches. The first is numerical and uses numbers to summarize the uncertainty, e.g. probability theory. The second is non-numerical: there are no numbers in the representation of uncertainty, and one deals logically with the reasons for believing or disbelieving a hypothesis [3].
Most belief revision research has been developed with a single agent in mind, and even in multi-agent systems agents revise independently, as single agents, without awareness of the others' existence. However, since agents communicate, cooperate, coordinate and negotiate to reach common goals, they are not independent, so it is not enough to maintain the coherence of a single agent's belief set only. Current research on multi-agent belief revision does not clearly differentiate between cooperative, semi-competitive and competitive multi-agent systems. Liu et al. divided multi-agent belief revision into MSBR and MABR: MSBR studies individual-agent revision behaviour, i.e. when an agent receives information from multiple agents towards whom it has social opinions, while MABR investigates the overall BR behaviour of agent teams or a society [4,5]. There is other research in this field; for example, Benedita Malheiro used an ATMS in MABR [6,7], Kfir-Dahav and Tennenholtz initiated research on MABR in the context of heterogeneous systems [8], and there is the mutual belief revision of van der Meyden [9]. But these approaches cannot be applied to all kinds of multi-agent systems. In this paper, by analyzing the various communication forms, we give some postulates about possible belief revision objectives. Because there are objectives other than maintaining coherence or maximizing the certainty of a single belief revision process, the agent must choose the optimal revision policy to realize them; the problem of multi-agent belief revision thus becomes the problem of how to choose the optimal revision policy in different multi-agent systems. First, a certainty or credibility measure is brought forward, which is used to compute the maximal certainty degree of the knowledge background. In a cooperative multi-agent system, agents always choose the combinative policy that maximizes the certainty measure of the whole system's belief set, according to the welfare principle. In a semi-competitive multi-agent system, agents choose the revision policy according to the Pareto efficiency principle. Lastly, in a competitive multi-agent system, the Nash equilibrium criterion is applied to the agents' belief revision process.
2 A General Framework of Multi-agent Belief Revision
In order to implement cooperative, semi-competitive and competitive MABR, we must clearly describe the objectives of an agent's BR in each situation. After BR, the belief set of each individual agent remains consistent, while inter-agent inconsistency is allowed. In a cooperative MAS, we try to maximize the consistency among the agents' belief sets; in the ideal situation there is no inter-agent inconsistency, but inconsistency may remain when the strategies of each agent are limited. In a semi-competitive MAS, each agent first maximizes its own preference and then tries to maximize the consistency of the whole system. In a competitive MAS, an agent only tries to maximize its own preference. In the subsection below, we describe a uniform model of MABR, Bereframe. In our framework, we assume that the new information and the initial belief set of each agent can be different, and that the new information for each agent arrives simultaneously; agents carry out the BR process in parallel. During the process, each agent has
a number of strategies. In fact, there is a mapping from the belief set B to the belief revision strategies, B → 2^B; that is, on a belief set B there are 2^|B| strategies in the perfect situation. But when implementing BR, we can only choose some of them.
2.1 The Uniform Model of Bereframe
The MABR model of Bereframe is a six-tuple ⟨A, B, EE, N, S, P⟩. In this framework, A is the set of agents, B is the set of belief sets of the agents, EE is the set of belief epistemic entrenchments of the agents, N is the set of new information, S is the set of strategies of the agents, and P is the utility function of an agent; P(Sj(i)) denotes the utility value that agent i obtains when it uses revision strategy j. In Bereframe, every agent has a utility measurement, which refers to the amount of satisfaction the agent derives from an object or an event. Since in BR an agent prefers to keep beliefs with higher EE and to give up those with lower EE, the utility of an individual agent can be measured by the ultimate EE of the agent's belief set, which is P(Sj(i)) in our framework. We use the arithmetic average of the EE of all beliefs to express P(Sj(i)) because it is simple to calculate and easy to understand. The utility function of an agent is described by equation (1):

P(Sj(i)) = (1 / |Bi|) Σ_{belief_k ∈ Bi} EE(belief_k)    (1)
2.2 BR in Cooperative, Semi-competitive, and Competitive MAS
We implement MABR with three different evaluation criteria on the basis of Bereframe. Let us take only two agents into account to explain the idea: A = {A1, A2}. The environment of A1 and A2 can be regarded as a big set E, so Cn(B1) ⊆ E and Cn(B2) ⊆ E. If we define increasing the profit of an individual agent as expanding its closure, then the problem can be dealt with through the closure operation.
Process:
1. If the intersection of the two agents' belief set closures is empty, then they have no conflict and are cooperative.
2. If the intersection is not empty, then under the assumption that each agent maximizes its own profit, whether cooperation is possible depends on whether it increases or reduces the agent's profit.
2.1 If the increase of the intersection is larger than the increase of each belief set after belief revision, then the basis for cooperation exists and the requirement of maximizing the whole profit can be conceived.
2.2 Otherwise, the agents are competitive.
2.2.1 Cooperative BR
Cooperative BR uses the Social Welfare evaluation criterion. It considers the global profit, and this kind of BR maximizes the utility of the whole system. The optimal solution
in our model is the strategy pair that maximizes the consistency of the MAS. In our view, an account of the consistency of belief sets should consider two knowledge bases:
1. The knowledge background KB, which is the set of all the beliefs available; here KB = B1 ∪ B2 ∪ ... ∪ Bn.
2. The knowledge base KB* ⊆ KB, which is the maximal consistent subset of KB.
The measurement of consistency ρ is defined as ρ = |KB*| / |KB|, that is, the proportion of the number of maximal consistent beliefs to the number of all beliefs of the agents.
2.2.2 Semi-competitive BR
In a semi-competitive MAS, it is not necessary to compute the global consistency, but rather the EE of each belief after revision. EE measures the priority of selecting a belief and is constrained by three factors. The first is the subjective factor: the system or the designer gives the EE of the beliefs in the initial belief set, and the EE of new information is also given by the system. The second is the logical factor: for example, if p ⊢ q, then the EE of q is not smaller than that of p; if p ∧ q ⊢ r, then the EE of r is not smaller than the minimum of those of p and q; and if p ∨ q ⊢ r, then the EE of r is not smaller than the maximum of those of p and q. The third is the experiential factor: when an agent discovers that other agents add or remove some beliefs, it should revise the EE of these beliefs in its own set accordingly. For example, when A1 discovers that A2 drops a belief α and α is included in its own belief set, it subtracts a very small amount ξ from the EE of α. The semi-competitive MAS uses the Pareto efficiency evaluation criterion.
Definition (Pareto Efficiency): A pair of strategies (Sa(i), Sb(j)) is Pareto-optimal if there is no other strategy pair that can improve P(Sb(j)) without reducing P(Sa(i)).
2.2.3 Competitive BR
In a competitive MAS, each agent cares only about its own profit and chooses the strategy that maximizes it. If we do not take the impact of other agents into account, the BR process is completely equal to individual BR, and the computation of EE is independent of the other agents' experiential knowledge. If agents interact with each other, we assume that an agent knows the other agents' strategies, the other agents' current or initial belief sets, and the new information accepted by the other agents, and that the strategies used by the others are observable after the BR operation. With these assumptions, we can revise beliefs in a competitive MAS using the max-min method. When A1 receives new information, it estimates which strategy employed by A2 would minimize its own average EE, and then it chooses the best strategy it can under this situation; A2 likewise estimates A1 and employs a certain strategy accordingly. During this process the two agents do not exchange their information, because they are opposed. This method results in a Nash equilibrium.
Definition (Nash Equilibrium): A pair of strategies (Sa(i), Sb(j)) is a Nash equilibrium if and only if, for every agent, e.g. agent a, Sa(i) is the best strategy for agent a when all the other agents, here agent b, choose Sb(j).
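The three selection criteria can be made concrete with a small sketch over a finite strategy space. The code below is only an illustration: the two agents, their candidate post-revision belief sets and the EE values are invented, the utility is the average EE of equation (1) adjusted by the experiential factor ξ described above, and the welfare, Pareto and Nash selections follow the definitions given here.

```python
# Illustrative selection of revision-strategy pairs under the three criteria.
# A strategy is identified with the belief set it would leave the agent with;
# utility is the average entrenchment of equation (1), adjusted by the
# experiential factor: an agent discounts by xi any belief it keeps but the
# other agent has dropped. All concrete values are assumptions.
from itertools import product

EE = {"p": 0.9, "q": 0.7, "r": 0.4, "s": 0.2}
XI = 0.1

strategies = {
    "A1": {"S1": {"p", "q"}, "S2": {"p", "r", "s"}},
    "A2": {"T1": {"q", "r"}, "T2": {"p", "q", "s"}},
}

def utility(own, other):
    adjusted = [EE[b] - (XI if b not in other else 0.0) for b in own]
    return sum(adjusted) / len(adjusted)

pairs = list(product(strategies["A1"], strategies["A2"]))
payoff = {(s1, s2): (utility(strategies["A1"][s1], strategies["A2"][s2]),
                     utility(strategies["A2"][s2], strategies["A1"][s1]))
          for s1, s2 in pairs}

# Social welfare (cooperative): maximize the sum of the two utilities.
welfare = max(pairs, key=lambda p: sum(payoff[p]))

# Pareto efficiency (semi-competitive): keep the non-dominated pairs.
def dominated(p):
    return any(payoff[q][0] >= payoff[p][0] and payoff[q][1] >= payoff[p][1]
               and payoff[q] != payoff[p] for q in pairs)
pareto = [p for p in pairs if not dominated(p)]

# Nash equilibrium (competitive): no unilateral deviation helps either agent.
def is_nash(s1, s2):
    return (all(payoff[(s1, s2)][0] >= payoff[(o, s2)][0] for o in strategies["A1"])
            and all(payoff[(s1, s2)][1] >= payoff[(s1, o)][1] for o in strategies["A2"]))

print("welfare:", welfare)
print("pareto:", pareto)
print("nash:", [p for p in pairs if is_nash(*p)])
```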
3 Conclusion and Future Works
In MASs, agents complete tasks by cooperating, negotiating and coordinating; they consider not only their own profit but also the profit of the whole system. A multi-agent belief revision framework has been developed here. Bereframe provides a model that supports several kinds of MABR, e.g. cooperative, semi-competitive and competitive MABR, in order to unite all MASs in one framework. Many other features of this framework are still under investigation and need future work. For example, when the inconsistency of a belief set is reasonable [11] or there are malicious agents in the MAS, the BR becomes very complex and needs to be investigated deeply in our future work.
Acknowledgements. This paper is supported by the National Natural Science Foundation of China under Grant No. 60103012 and the National Grand Fundamental Research 973 Program of China under Grant No. 2002CB312002.
References
1. Alchourrón, C., Gärdenfors, P., Makinson, D.: On the logic of theory change: Partial meet contraction and revision functions. Journal of Symbolic Logic 50 (1985) 510–530
2. Gärdenfors, P.: The dynamics of belief systems: Foundations vs. coherence theories. Revue Internationale de Philosophie 171 (1990) 24–46
3. Shafer, G.: A mathematical theory of evidence. Princeton University Press, Princeton, NJ (1976)
4. Liu, W., Williams, M.A.: A framework for multi-agent belief revision (Part I: The role of ontology). In: Foo, N. (ed.) 12th Australian Joint Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence, Springer-Verlag, Sydney, Australia (1999) 168–179
5. Liu, W., Williams, M.A.: A framework for multi-agent belief revision. Studia Logica 67 (2001) 219–312
6. Malheiro, B., Jennings, N.R., Oliveira, E.: Belief revision in multi-agent systems. In: Proceedings of the 11th European Conference on Artificial Intelligence (ECAI-94) (1994) 294–298
7. Dragoni, A.F., Giorgini, P., Baffetti, M.: Distributed Belief Revision vs. Belief Revision in a Multi-Agent Environment: First Results of a Simulation Experiment. In: Boman, M., Van de Velde, W. (eds.) Multi-agent Rationality, LNCS 1237, Springer-Verlag (1997)
8. Kfir-Dahav, N.E., Tennenholtz, M.: Multi-agent belief revision. In: 6th Conference on Theoretical Aspects of Rationality and Knowledge (1996) 175–194
9. van der Meyden, R.: Mutual Belief Revision. In: Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning, Bonn (1994) 595–606
10. Friedman, N., Halpern, J.Y.: Belief Revision: A Critique. In: Aiello, L.C., Doyle, J., Shapiro, S.C. (eds.) Principles of Knowledge Representation and Reasoning: Proceedings of the Fifth International Conference (KR'96), Morgan Kaufmann, San Francisco (1996) 421–431
Research of Atomic and Anonymous Electronic Commerce Protocol Jie Tang, Juan-Zi Li, Ke-Hong Wang, and Yue-Ru Cai Department of Computer, Tsinghua University, P.R.China, 100084
[email protected],
[email protected]
Abstract. Atomicity and anonymity are two important requirements for applications in electronic commerce. However, absolute anonymity may conflict with law enforcement, e.g. in blackmailing or money laundering. Therefore, it is important to design systems satisfying both atomicity and revocable anonymity. Based on the concept of two-phase commitment, we realize atomicity in electronic transactions with the Trusted Third Party as coordinator. We also develop Brands' fair signature model and propose a method that enables not only anonymity but also owner tracing and money tracing.
Keywords: fair blind signature, atomicity, fair anonymity, payment system
1 Introduction
Recently, e-commerce (electronic commerce) has been among the most important and exciting areas of research and development in information technology. Several e-commerce protocols have been proposed, such as SET [1], NETBILL [2] and DigiCash [3]. However, its application has not grown as expected; the main reasons restricting its wide and rapid development are listed below. (1) Lack of fair transactions. A fair transaction is a transaction satisfying atomicity, which means that both sides agree on the goods and money before the transaction and receive the correct goods (or money) after the transaction; otherwise the transaction terminates with restoration of both sides. Unfortunately, most existing e-transaction protocols do not enable, or only partly enable, atomicity. (2) Lack of privacy protection. Most current systems do not protect users' privacy: the seller not only needs to cater to existing clients on the Internet but also wants to mine potential users, so all clients' activities on the web are logged involuntarily, which leads to the potential misuse of clients' privacy. Another type of system is the anonymous one [3], in which a client can pay anonymously, but new problems emerge, e.g. cheating, money laundering, blackmailing, etc. To deal with these problems, we propose in this paper a new electronic transaction protocol that realizes atomicity and fair anonymity at the same time. The paper is structured as follows: Section 2 introduces related work; Section 3 presents the new electronic transaction protocol; Section 4 analyzes the atomicity and fair anonymity of the protocol; finally, Section 5 summarizes the paper.
2 Related Work
Atomicity and anonymity are two important characteristics in e-commerce. David Chaum put forth the concept of anonymity in electronic commerce and realized privacy in his DigiCash [3]. But absolute anonymity gives criminals the possibility of untraceable e-crime, such as corruption, unlawful purchases, blackmailing, etc. In 1992, van Solms and Naccache [4] discovered a serious attack on Chaum's payment system. Therefore, Chaum and Brands proposed protocols to implement controllable anonymity [5,6]. Afterwards, Stadler brought forward fair anonymity [7]. Tygar argued that atomicity should be guaranteed in e-transactions [8,9] and divided atomicity into three levels: money atomicity, goods atomicity and certified delivery. (1) Money atomicity is the basic level: if, in a transaction, money either transfers successfully or is restored, then the money transfer is atomic. (2) Goods atomicity requires money atomicity and additionally demands that if the money is paid then the goods are guaranteed to be received, and if the goods are received then the money must be paid as well. (3) Certified delivery is based on money atomicity and goods atomicity and provides the capability to certify sellers and customers. Jean Camp presented a protocol realizing atomicity and anonymity in [10], but in that system fair anonymity is unavailable and goods are limited to electronic ones.
3 Atomic and Fair Anonymous Electronic Transaction Protocol
We propose a new protocol in which atomicity and anonymity are both satisfied, and tracing of illegal transactions is available as well. We call the protocol AFAP (Atomic and Fair Anonymous Protocol). The system model is a 5-tuple AFAPM := (TTP, BANK, POSTER, SELLER, CUSTOMER), where TTP is the trusted third party, POSTER is a delivery system, and POSTER and BANK are both second-grade trusted organizations authenticated by the TTP. AFAP comprises five sub-protocols: TTP Setup, BANK Initialization, Registration, Payment and Trace. In this section we focus on payment and trace; an elaborate introduction can be found in [11].
3.1 Payment Sub-protocol
The payment sub-protocol is the most important sub-protocol in AFAP. The flow and relations of the participants in the payment sub-protocol are shown in Fig. 1.
Step 1. Preparing to buy goods, CUSTOMER sends a TransReq to TTP.
Step 2. TTP generates a TID and an Expiration_Date for the transaction.
Step 3. CUSTOMER computes the drawn token Token_w and the payment token Token_p, then sends the blinded Token_p to BANK.
Fig. 1. The flow and relations of the participants in the payment sub-protocol
Step 4. CUSTOMER executes a blind signature protocol with BANK to obtain a signature of the blinded Token_p by BANK, i.e. Sign_SK_BANK(Act, Aot, Bt). BANK then subtracts the corresponding value from CUSTOMER's account; in this way, the signature is regarded as a coin worth that value in the later transaction.
Step 5. CUSTOMER selects the Pickup_fashion and sets the Pickup_Response, which is kept secret by CUSTOMER and POSTER. CUSTOMER then generates Trans_detail and sends Sign_SK_BANK(Act, Aot, Bt) and Trans_detail to SELLER.
Step 6. SELLER starts a local transaction clock; afterwards SELLER sends Sign_SK_BANK(Act, Aot, Bt) and the value to BANK.
Step 7. BANK validates the blind signature Sign_SK_BANK(Act, Aot, Bt). BANK then queries the signature-value and signature-payment databases to judge the validity of the signature and whether the payment is an illegal double payment. Once all these checks are passed, BANK adds the payment signature to SELLER's account and generates a Trans_guarantee for SELLER.
Step 8. SELLER starts the dispatch process: it transfers the Pickup_fashion to POSTER and notifies POSTER to get ready to deliver the goods.
Step 9. POSTER estimates whether the Pickup_fashion is permitted; if it passes, POSTER generates a Pickup_guarantee and sends it to TTP.
Step 10. If the Pickup_guarantee is received before the Expiration_Date, then, if there exist one or more Rollback requests, TTP sends a Rollback command to all participants, otherwise TTP sends Trans_Commit. If TTP does not receive the Pickup_guarantee before the Expiration_Date, it sends Rollback to all participants to roll back the transaction (a sketch of this decision logic is given at the end of this section).
Step 11. Having received Trans_Commit, BANK begins the real transfer process, SELLER dispatches the goods to POSTER, and POSTER delivers the goods according to the Pickup_fashion.
3.2 Trace Sub-protocol
When an illegal transaction occurs, the trace sub-protocol is activated; it includes three aspects: owner trace, payment trace and drawn trace.
(1) Owner trace. Its aim is to discover the account based on a known illegal payment. BANK queries the payment token and sends it to TTP; TTP computes I = t_o^(1/x_T) / t_c to trace the owner of the illegal payment.
(2) Payment trace and drawn trace. The payment trace discovers the payment token based on a known drawn token: BANK sends the drawn token to TTP, and TTP computes t_c' = t_o'^(1/x_T) to obtain the payment token. The drawn trace is the inverse process of the payment trace.
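The atomicity decision of Step 10 of the payment sub-protocol can be summarized in a few lines of code. The sketch below is purely illustrative: the message names follow the protocol description above, and the way rollback requests and guarantees are collected is an assumption.

```python
# Hypothetical sketch of the TTP's two-phase-commit style decision (Step 10).
from datetime import datetime, timedelta

class TTPTransaction:
    def __init__(self, tid, lifetime_minutes=30):
        self.tid = tid
        self.expiration_date = datetime.now() + timedelta(minutes=lifetime_minutes)
        self.rollback_requests = []      # rollback requests from any participant
        self.pickup_guarantee = None     # set when POSTER's guarantee arrives

    def receive_rollback(self, participant):
        self.rollback_requests.append(participant)

    def receive_pickup_guarantee(self, guarantee):
        if datetime.now() <= self.expiration_date:
            self.pickup_guarantee = guarantee

    def decide(self):
        """Return the command TTP broadcasts to all participants."""
        if self.pickup_guarantee is None:        # no guarantee before expiry
            return "Rollback"
        if self.rollback_requests:               # some participant objected
            return "Rollback"
        return "Trans_Commit"

tx = TTPTransaction(tid="TID-0001")
tx.receive_pickup_guarantee("Pickup_guarantee from POSTER")
print(tx.decide())                               # Trans_Commit
```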
4 Conclusion
With the rapid development of the Internet, commercial activity on the Internet has extensive application prospects. The key to improving e-commerce is to provide secure transactions that do not expose privacy. In this paper, we analyze atomicity and fair anonymity and propose a new protocol that realizes them together. Building on this paper, we will make improvements in the following aspects: (1) electronic payment based on group blind signatures; (2) more efficient atomicity.
References
1. Loeb, L.: Secure Electronic Transactions: Introduction and Technical Reference. Artech House, Inc. (1998)
2. Cox, B., Tygar, J.D., Sirbu, M.: NetBill security and transaction protocol. In: Proceedings of the 1st USENIX Workshop on Electronic Commerce (1995) 77–88
3. Chaum, D., Fiat, A., Naor, M.: Untraceable electronic cash. In: Advances in Cryptology: Crypto'88 Proceedings, Springer-Verlag (1990) 200–212
4. von Solms, S., Naccache, D.: On blind signatures and perfect crimes. Computers and Security 11(6) (October 1992) 581–583
5. Chaum, D., Evertse, J.H., van de Graaf, J., Peralta, R.: Demonstrating Possession of a Discrete Logarithm without Revealing It. In: Advances in Cryptology – CRYPTO'86 Proceedings, Springer-Verlag (1987) 200–212
6. Brands, S.: Untraceable Off-line Cash in Wallets with Observers. In: Advances in Cryptology – Proceedings of CRYPTO'93, Lecture Notes in Computer Science 773, Springer-Verlag (1993) 302–318
7. Stadler, M., Piveteau, J.M., Camenisch, J.: Fair blind signatures. In: Guillou, L.C., Quisquater, J.J. (eds.) Advances in Cryptology – EUROCRYPT'95, Lecture Notes in Computer Science 921, Springer-Verlag (1995) 209–219
8. Tygar, J.D.: Atomicity versus Anonymity: Distributed Transactions for Electronic Commerce. In: Proceedings of the 24th VLDB Conference, New York, USA (1998) 1–12
9. Tygar, J.D.: Atomicity in Electronic Commerce. In: Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing (May 1996) 8–26
10. Camp, L., Harkavy, M., Tygar, J.D., Yee, B.: Anonymous Atomic Transactions. In: Proceedings of the 2nd USENIX Workshop on Electronic Commerce (November 1996) 123–133
11. Tang, J.: The Research of Atomic and Fair Anonymous Electronic Transaction Protocol. Yanshan University (2002) 35–59
Colored Petri Net Based Attack Modeling Shijie Zhou, Zhiguang Qin, Feng Zhang, Xianfeng Zhang, Wei Chen, and Jinde Liu College of Computer Science and Engineering University of Electronic Science and Technology of China Sichuan, Chengdu 610054, P.R.China {sjzhou, qinzg, ibmcenter}@uestc.edu.cn
Abstract. A Colored Petri Net (CPN) based attack modeling approach is addressed. The CPN-based attack model is flexible enough to model Internet intrusions, including both the static and the dynamic features of an intrusion. The processes and rules for building a CPN-based attack model from an attack tree are also presented. In order to evaluate the risk of an intrusion, some cost elements are added to the CPN-based attack model. Experience also shows that it is easy to exploit the CPN-based attack modeling approach to provide controlling functions.
1 Introduction
Attack modeling is an approach that pictures the processes of attacks and depicts them semantically. An attack modeling approach must not only characterize all the steps of an attack accurately, but also point out how the attacking process continues [2]. Additionally, reasonable response measures should be indicated in a practical intrusion detection and response system. In this paper, we present an approach based on colored Petri nets (CPN) [1] for attack modeling, which is derived from other attack modeling methods [2][3][4][5][6][7].
2 From Attack Tree to Colored Petri Net Based Attack Models
The CPN-based attack model can be defined from an attack tree to reduce the cost of modeling, because some attack models have already been built with attack trees [2][3]. To build a CPN-based attack model from an attack tree, the mapping rules between them should be determined.
Root Node Mapping. In an attack tree, the root node is the goal of the attack; it is also the result of the attack. In the CPN attack model, the root node maps to a place: the node maps to a place and the node's inputs map to arcs of the place. This kind of place is called a Root Place. The OR gate and AND gate map to event relationships of the CPN; their mappings are shown in Fig. 1. A node with an OR gate maps to the conflict relation of events in the CPN, meaning that if only one of the events occurs, the attack will take place. A node with an AND gate maps to the sequential relation of events, implying that the attack takes place only if all of the events occur.
Fig. 1. Root node and its mapping in CPN: (a) OR gate of the root node, (b) AND gate of the root node
Leaf Nodes Mapping. Leaf nodes in attack trees are the hacker's actions for breaking into the victim's system. Clearly, leaf nodes can map to transitions of the CPN. But in an attack tree all leaf nodes are connected directly, so it is difficult to do a straightforward mapping; some intermediate states must be defined so that the mapping can be performed. Ruiu's analysis of intrusions [8] divided attacks into seven stages:
Fig. 2. Leaf node and its mapping in CPN
Reconnaissance, Vulnerability Identification, Penetration, Control, Embedding, Data Extraction & Modification, and Attack Relay. Each stage can also be divided into one or several sub-stages. So we can model the stages and sub-stages of attacks as intermediate states when translating attack trees into CPN based attack models. Fig. 2 shows how to deal with such a translation. These newly added places (including the places added during the translation of intermediate nodes) are called Added Places. In Fig. 2, the value of place p can be derived from a function f(t), where t ∈ T, and the output arc of place p is the input arc of the next transition. In Fig. 2, the PS place is an Initialization Place whose meaning depends on the transition.
Intermediate Nodes Mapping. Intermediate nodes of attack trees are sub-actions or sub-goals of hackers. It is more difficult to translate these nodes into CPN models because intermediate nodes not only have input arc(s), output arc(s), and OR and AND gate logic, but also raise the same problems confronted in leaf node translation. The mapping rules for intermediate nodes are listed as follows (a small sketch of the overall translation is given below):
− The intermediate node itself maps to a transition t of the CPN.
− The input arc of an intermediate node maps to the input arc of t; OR and AND gates are translated to the conflict relation and the sequence relation, respectively.
− Intermediate places are added in the same way as in the translation of leaf nodes.
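To make the mapping rules above concrete, here is a minimal, illustrative sketch in Python of turning a tiny attack tree into CPN-like places and transitions. The tree structure, the node names, and the helper function are hypothetical examples, not the authors' implementation; the sketch only shows how an OR gate becomes alternative (conflicting) transitions into one place and how an AND gate becomes a single transition that needs all of its input places.

```python
# Hypothetical sketch: translating a tiny attack tree into CPN-style
# places and transitions (not the authors' implementation).

from dataclasses import dataclass, field
from typing import List

@dataclass
class TreeNode:
    name: str
    gate: str = "LEAF"            # "OR", "AND", or "LEAF"
    children: List["TreeNode"] = field(default_factory=list)

def translate(node: TreeNode, places: list, transitions: list) -> str:
    """Return the name of the place holding this node's (sub)goal."""
    place = f"P_{node.name}"       # root/intermediate goal becomes a place
    places.append(place)
    if node.gate == "LEAF":
        # a leaf action becomes a transition feeding an added place
        transitions.append((f"T_{node.name}", [], place))
    elif node.gate == "OR":
        # conflict relation: any single child transition can mark the place
        for child in node.children:
            child_place = translate(child, places, transitions)
            transitions.append((f"T_{node.name}_via_{child.name}",
                                [child_place], place))
    elif node.gate == "AND":
        # all child places are needed before the parent transition can fire
        inputs = [translate(c, places, transitions) for c in node.children]
        transitions.append((f"T_{node.name}", inputs, place))
    return place

# toy attack tree: gain root via either a password guess or
# (scan AND exploit) -- names are made up for illustration
tree = TreeNode("root", "OR", [
    TreeNode("guess_password"),
    TreeNode("remote_exploit", "AND",
             [TreeNode("scan"), TreeNode("exploit")]),
])

places, transitions = [], []
translate(tree, places, transitions)
for t in transitions:
    print(t)
```

Running the sketch lists one transition per leaf action, one synchronizing transition for the AND sub-goal, and two alternative transitions feeding the root place for the OR gate, which mirrors the conflict/sequence mapping described above.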
Temporal Logic Mapping. In building a template CPN from an attack tree, an important issue is how to deal with the temporal sequence of attacks. From the above discussion, we know that an attack comprises many stages and sub-stages that all have temporal logical relations. One occurring sequence of stages and sub-stages constitutes an intrusion, while all occurring sequences together comprise the attack tree. The event relations of CPN can depict the temporal logic in an attack tree.
3 Extended CPN Model for Intrusion and Response
An attack tree only depicts the process of an attack. No mechanism is provided in an attack tree to allow active response and defense, partially due to the limits of the tree model. But CPN based attack modeling can give administrators the means to control the hacker's actions or carry out an effective response. Based on the definition of CPN, a transition can fire only when all its bindings occur. So we can model the defense and response actions as follows: for each transition, an input arc is added to allow control, and an output arc is added to allow response. Additionally, if there are many control and response actions, further arcs can be added. Readers should be aware that this extension is not derived from the attack tree but extends the CPN attack model directly.
4 Conclusions
The CPN based attack modeling approach can be used to model attacks, and an attack tree is easily mapped into a CPN attack model. After further features are added, the model can also be used to model intrusion detection and response. Another important feature of this model is that intrusions can be quantified, so the most effective controlling actions can be determined.
References
1. Jensen, K.: Coloured Petri Nets: Basic Concepts, Analysis Methods and Practical Use, Volume 1. Springer-Verlag, Berlin/Heidelberg/London (1992)
2. Tidwell, T., Larson, R., Fitch, K., Hale, J.: Modeling Internet Attacks. Proceedings of the 2001 IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, 5–6 June 2001, 54–59
3. Cunningham, W.: The WikiWikiWeb. http://c2.com/cgi-bin/wiki
4. Schneier, B.: Attack Trees. Dr. Dobb's Journal of Software Tools 24(12) (Dec. 1999) 21–29
5. Helmer, G., Wong, J., Slagell, M., Honavar, V., Miller, L., Lutz, R.: A Software Fault Tree Approach to Requirements Analysis of an Intrusion Detection System. Proceedings, Symposium on Requirements Engineering for Information Security. Center for Education and Research in Information Assurance and Security, Purdue University (March 2001)
6. Steffan, J., Schumacher, M.: Collaborative Attack Modeling. SAC 2002, Madrid, Spain (2002)
7. McDermott, J.: Attack Net Penetration Testing. The 2000 New Security Paradigms Workshop (Ballycotton, County Cork, Ireland, Sept. 2000), ACM SIGSAC, ACM Press, 15–22
8. Ruiu, D.: Cautionary Tales: Stealth Coordinated Attack How To (July 1999) http://www.nswc.navy.mil/ISSEC/CID/Stealth_Coordinated_Attack.html
Intelligent Real-Time Traffic Signal Control Based on a Paraconsistent Logic Program EVALPSN
Kazumi Nakamatsu 1, Toshiaki Seno 2, Jair Minoro Abe 3, and Atsuyuki Suzuki 2
1 School of H.E.P.T., Himeji Inst. Tech., HIMEJI 670-0092, JAPAN, [email protected]
2 Dept. Information, Shizuoka University, HAMAMATSU 432-8011, JAPAN, {cs9051,suzuki}@cs.inf.shizuoka.ac.jp
3 Dept. Informatics, Paulista University, SAO PAULO 121204026-002, BRAZIL, [email protected]
Abstract. In this paper, we introduce an intelligent real-time traffic signal control system based on a paraconsistent logic program called an EVALPSN (Extended Vector Annotated Logic Program with Strong Negation), which can deal with contradiction and defeasible deontic reasoning. We show how the traffic signal control is implemented in EVALPSN by taking a simple intersection in Japan as an example. Simulation results comparing EVALPSN traffic signal control to fixed-time traffic signal control are also provided.
Keywords: traffic signal control, paraconsistent logic program, intelligent control, defeasible deontic reasoning.
1 Introduction
We have already proposed a paraconsistent logic program called EVALPSN (Extended Vector Annotated Logic Program) that can deal with contradiction and defeasible deontic reasoning [2,4]. Some applications based on EVALPSN, such as robot action control, automatic safety verification for railway interlocking, and air traffic control, have already been introduced in [5,6]. Traffic jams caused by inappropriate traffic signal control are a serious problem that has to be resolved. In this paper, we introduce an intelligent real-time traffic signal control system based on EVALPSN as one proposal to resolve the problem. Suppose that you are waiting at an intersection for the front traffic signal to change from red to green. In your mind you demand the change, and this demand can be regarded as permission for the change. On the other hand, if you are going through the intersection with a green signal, you demand that it keep green, and this demand can be regarded as forbiddance from the change. There is thus a conflict between that permission and that forbiddance. The basic idea of the traffic signal control is that this conflict can be managed by the defeasible deontic reasoning of EVALPSN. We show how to formalize the traffic signal control by defeasible deontic reasoning in EVALPSN.
2 Traffic Signal Control in EVALPSN
We take an intersection in which two roads cross, as described in Fig. 1, as an example for introducing our method based on EVALPSN. We suppose an intersection in Japan, which means "cars have to keep left". The intersection has four traffic signals T1,2,3,4, each of which has four kinds of lights: green, yellow, red and right-turn arrow. Each lane connected to the intersection has a sensor to detect its traffic amount. Each sensor is denoted Si (1 ≤ i ≤ 8) in Fig. 1. For example, the sensor S6 detects the right-turn traffic amount confronting the traffic signal T1.

Fig. 1. Intersection

Basically, the traffic signal control is performed based on the traffic amount sensor values. The chain of signaling is as follows: → red → green → yellow → right-turn arrow → all red →. For simplicity, we assume that the lengths of the yellow and all-red signaling times are constant; therefore, the signaling times of yellow and all red are supposed to be included in those of green and right-turn arrow, respectively, as follows: T1,2 → red → red → green → arrow → red →, and T3,4 → green → arrow → red → red → green →. Mainly, only the changes green to arrow and arrow to red are controlled; the change red to green of the front traffic signal follows the change right-turn arrow to red of the neighboring one. Moreover, the signaling is controlled at each unit time t ∈ {0, 1, 2, . . . , n}. The traffic amount of each lane is regarded as permission for, or forbiddance from, a signaling change such as green to right-turn arrow. For example, if there are many cars waiting for the signaling change from green to right-turn arrow, this can be regarded as permission for the change; on the other hand, if there are many cars moving through the intersection with green, this can be regarded as forbiddance from the change. We formalize such contradiction and its resolution by defeasible deontic reasoning in EVALPSN. We assume that minimum and maximum signaling times are given in advance for each traffic signal, and each signaling time must be controlled between the minimum and the maximum. We consider four states of the traffic signals: state 1 (T1,2 are red and T3,4 are green), state 2 (T1,2 are red and T3,4 are right-turn arrow), state 3 (T1,2 are green and T3,4 are red), and state 4 (T1,2 are right-turn arrow and T3,4 are red). Due to space restrictions, we take only state 1 into account to introduce the traffic signal control in EVALPSN.
S^{rb}_1(t):[(2,0),α] ∧ T_{1,2}(r,t):[(2,0),α] ∧ T_{3,4}(b,t):[(2,0),α] ∧ ∼MIN_{3,4}(b,t):[(2,0),α] ∧ ∼S^{rb}_5(t):[(2,0),α] ∧ ∼S^{rb}_7(t):[(2,0),α] → T_{3,4}(a,t):[(0,1),γ],   (1)
S^{rb}_3(t):[(2,0),α] ∧ T_{1,2}(r,t):[(2,0),α] ∧ T_{3,4}(b,t):[(2,0),α] ∧ ∼MIN_{3,4}(b,t):[(2,0),α] ∧ ∼S^{rb}_5(t):[(2,0),α] ∧ ∼S^{rb}_7(t):[(2,0),α] → T_{3,4}(a,t):[(0,1),γ],   (2)
S^{rb}_2(t):[(2,0),α] ∧ T_{1,2}(r,t):[(2,0),α] ∧ T_{3,4}(b,t):[(2,0),α] ∧ ∼MIN_{3,4}(b,t):[(2,0),α] ∧ ∼S^{rb}_5(t):[(2,0),α] ∧ ∼S^{rb}_7(t):[(2,0),α] → T_{3,4}(a,t):[(0,1),γ],   (3)
S^{rb}_4(t):[(2,0),α] ∧ T_{1,2}(r,t):[(2,0),α] ∧ T_{3,4}(b,t):[(2,0),α] ∧ ∼MIN_{3,4}(b,t):[(2,0),α] ∧ ∼S^{rb}_5(t):[(2,0),α] ∧ ∼S^{rb}_7(t):[(2,0),α] → T_{3,4}(a,t):[(0,1),γ],   (4)
S^{rb}_6(t):[(2,0),α] ∧ T_{1,2}(r,t):[(2,0),α] ∧ T_{3,4}(b,t):[(2,0),α] ∧ S′^{rb}_7(t):[(2,0),α] ∧ ∼MIN_{3,4}(b,t):[(2,0),α] ∧ ∼S^{rb}_5(t):[(2,0),α] ∧ ∼S^{rb}_7(t):[(2,0),α] → T_{3,4}(a,t):[(0,1),γ],   (5)
S^{rb}_8(t):[(2,0),α] ∧ T_{1,2}(r,t):[(2,0),α] ∧ T_{3,4}(b,t):[(2,0),α] ∧ S′^{rb}_5(t):[(2,0),α] ∧ ∼MIN_{3,4}(b,t):[(2,0),α] ∧ ∼S^{rb}_5(t):[(2,0),α] ∧ ∼S^{rb}_7(t):[(2,0),α] → T_{3,4}(a,t):[(0,1),γ],   (6)
S^{rb}_5(t):[(2,0),α] ∧ T_{1,2}(r,t):[(2,0),α] ∧ T_{3,4}(b,t):[(2,0),α] ∧ ∼MAX_{3,4}(b,t):[(2,0),α] → T_{3,4}(a,t):[(0,1),β],   (7)
S^{rb}_7(t):[(2,0),α] ∧ T_{1,2}(r,t):[(2,0),α] ∧ T_{3,4}(b,t):[(2,0),α] ∧ ∼MAX_{3,4}(b,t):[(2,0),α] → T_{3,4}(a,t):[(0,1),β],   (8)
MIN_{3,4}(g,t):[(2,0),α] ∧ T_{3,4}(g,t):[(2,0),α] → T_{3,4}(a,t):[(0,2),β],   (9)
MAX_{3,4}(g,t):[(2,0),α] ∧ T_{3,4}(g,t):[(2,0),α] → T_{3,4}(a,t):[(0,2),γ],   (10)
T_{3,4}(g,t):[(2,0),α] ∧ T_{3,4}(a,t):[(0,1),γ] → T_{3,4}(a,t+1):[(2,0),β],   (11)
T_{3,4}(g,t):[(2,0),α] ∧ T_{3,4}(a,t):[(0,1),β] → T_{3,4}(g,t+1):[(2,0),β].   (12)
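As a rough intuition for how such clauses behave at run time, the following Python fragment gives a much-simplified, procedural paraphrase of the state-1 control rules: waiting-lane sensors generate permission to switch once the minimum green time has passed, through-traffic sensors generate forbiddance until the maximum green time is reached, and forbiddance is preferred while both are derived. The function and variable names are illustrative only; this is not the EVALPSN annotated-logic machinery itself.

```python
# Illustrative paraphrase of the state-1 control rules (not EVALPSN itself).

def next_signal_state(sensors_busy, green_time, min_green, max_green):
    """Decide whether T3,4 should change from green to right-turn arrow.

    sensors_busy: set of sensor indices (1..8) reporting heavy traffic.
    Returns "arrow" if the change is obliged, otherwise "green".
    """
    waiting_lanes = {1, 2, 3, 4, 6, 8}   # lanes waiting behind red / for the arrow
    through_lanes = {5, 7}               # lanes moving through on green

    # clauses (1)-(6): permission to change, once minimum green has passed
    permission = (green_time >= min_green and
                  bool(sensors_busy & waiting_lanes) and
                  not (sensors_busy & through_lanes))

    # clauses (7)-(8): forbiddance from changing, until maximum green is reached
    forbiddance = (green_time < max_green and
                   bool(sensors_busy & through_lanes))

    # clause (10): once maximum green is reached, the change is permitted anyway
    if green_time >= max_green:
        permission, forbiddance = True, False

    # clauses (11)-(12): derive the next state; forbiddance wins while it holds
    return "green" if forbiddance else ("arrow" if permission else "green")

# example: S1, S3 busy, S5/S7 clear, past the minimum green time
print(next_signal_state({1, 3}, green_time=6, min_green=3, max_green=14))  # arrow
```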
3 Simulation
Suppose that the traffic signals T1,2 are red, the traffic signals T3,4 are green, and the minimum signaling time of green has already passed.
• If the sensors S1,3,5 detect more traffic amount than the criteria and the sensors S2,4,6,7,8 do not at time t, then the EVALPSN clause (7) is fired and the forbiddance T3,4(a,t):[(0,1),β] is derived; furthermore, the EVALPSN clause (12) is also fired and the obligatory result T3,4(g,t+1):[(2,0),β] is obtained.
• If the sensors S1,3 detect more traffic amount than the criteria and the sensors S2,4,5,6,7,8 do not at time t, then the EVALPSN clause (8) is fired and the permission T3,4(a,t):[(0,1),γ] is derived; furthermore, the EVALPSN clause (11) is also fired and the obligatory result T3,4(a,t+1):[(2,0),β] is obtained.
We used the cellular automaton model for traffic flow and compared the EVALPSN traffic signal control to fixed-time traffic signal control in terms of the numbers of cars stopped and moved under the following conditions. [Condition 1] We suppose that cars are flowing into the intersection in the following
probabilities from all four directions: right-turn 5%, left-turn 5% and straight 20%; the fixed-time traffic signal control uses green 30, yellow 3, right-arrow 4 and red 40 unit times; the length of green signaling is between 3 and 14 unit times, and the length of right-arrow signaling is between 1 and 4 unit times. [Condition 2] We suppose that cars are flowing into the intersection in the following probabilities: from South, right-turn 5%, left-turn 15% and straight 10%; from North, right-turn 15%, left-turn 5% and straight 10%; from West, right-turn, left-turn and straight 5% each; from East, right-turn and left-turn 5% each, and straight 15%; the other conditions are the same as in [Condition 1]. We measured the sums of cars stopped and moved for 1000 unit times, and repeated this 10 times under each condition. The average numbers of cars stopped and moved are shown in Table 1.

Table 1. Simulation Results

              fixed-time control        EVALPSN control
              cars stopped  cars moved  cars stopped  cars moved
Condition 1   17690         19641       16285         23151
Condition 2   16764         18664       12738         20121

These simulation results say that the number of cars moved under the EVALPSN control is larger than that under the fixed-time control, and the number of cars stopped under the EVALPSN control is smaller than that under the fixed-time control. Taking only this simulation into account, it is concluded that the EVALPSN control is more efficient than the fixed-time one.
4 Conclusion
In this paper, we have proposed an EVALPSN based real-time traffic signal control system. The practical implementation we are planning is based on the assumption that EVALPSN can easily be implemented in microchip hardware, although we have not addressed this in this paper. As future work, we are considering multi-agent intelligent traffic signal control based on EVALPSN.
References
1. Nakamatsu, K., Abe, J.M., Suzuki, A.: Defeasible Reasoning Between Conflicting Agents Based on VALPSN. Proc. AAAI Workshop Agents' Conflicts, AAAI Press (1999) 20–27
2. Nakamatsu, K., Abe, J.M., Suzuki, A.: Annotated Semantics for Defeasible Deontic Reasoning. Proc. the Second International Conference on Rough Sets and Current Trends in Computing, LNAI 2005, Springer-Verlag (2001) 470–478
3. Nakamatsu, K.: On the Relation Between Vector Annotated Logic Programs and Defeasible Theories. Logic and Logical Philosophy, Vol. 8, UMK Press, Poland (2001) 181–205
4. Nakamatsu, K., Abe, J.M., Suzuki, A.: A Defeasible Deontic Reasoning System Based on Annotated Logic Programming. Computing Anticipatory Systems, CASYS2000, AIP Conference Proceedings Vol. 573, AIP Press (2001) 609–620
5. Nakamatsu, K., Abe, J.M., Suzuki, A.: Applications of EVALP Based Reasoning. Logic, Artificial Intelligence and Robotics, Frontiers in Artificial Intelligence and Applications Vol. 71, IOS Press (2001) 174–185
6. Nakamatsu, K., Suito, H., Abe, J.M., Suzuki, A.: Paraconsistent Logic Program Based Safety Verification for Air Traffic Control. Proc. 2002 IEEE Int'l Conf. Systems, Man and Cybernetics (CD-ROM), IEEE (2002)
Transporting CAN Messages over WATM
Ismail Erturk
Kocaeli University, Faculty of Technical Education, Izmit, 41300 Kocaeli, Turkey
[email protected]
Abstract. A new method for fixed wireless CAN networking is presented in this paper, exploiting the advantages of WATM as an over-the-air protocol, which include a fixed, small cell size and connection negotiation. CAN over WATM mapping issues using an encapsulation technique are also explained. A performance analysis of the proposed scheme is presented through computer simulation results obtained with OPNET Modeler.
1 Introduction
CAN (Controller Area Network) has become one of the most advanced and important bus protocols in the communications industry over the last decade. Currently it is prominently used in many industrial applications, as well as in automotive applications, due to its high performance and superior characteristics [1], [2]. The search for cost-effective solutions to overcome the threatening complexity of, for example, a car's or a factory's wiring harness appears to be a key point in such applications. For this reason CAN receives many researchers' attention in ongoing industrial projects. The extensive use of CAN in automotive and other control applications also results in a need for internetworking between CAN and other major public/private networks. In order to enable CAN nodes to be programmed and/or managed from any terminal controller at any time globally, the requirements of wireless transfer of CAN messages inevitably have to be met as well [3], [4]. This idea establishes the basis for the interconnection of CAN and WATM (Wireless Asynchronous Transfer Mode), in view of ATM as a universally accepted broadband access technology for transporting real-time multimedia traffic. WATM, as an extension of the local area network for mobile users, for simplifying wiring, or for simplifying reconfiguration, has an appeal, as stated in [5]. In addition, another rationale for this research work is that mobile/wireless networking will become as common and broadband as traditional networking in the near future.
2 The CAN, ATM, and WATM
Allowing the implementation of peer-to-peer and broadcast or multicast communication functions with lean bus bandwidth use, CAN applications in vehicles are gradually
extended to the machine and automation markets. As CAN semiconductors produced by many different manufacturers are so inexpensive, its widespread use has found a way into such diverse areas as agricultural machinery, medical instrumentation, elevator controls, public transportation systems and industrial automation control components. [1] and [3] supply a detailed overview of the CAN features. CAN utilizes CSMA/CD as the arbitration mechanism to enable its attached nodes to access the bus; therefore, the maximum data rate that can be achieved depends on the bus length. The CAN message format includes 0–8 bytes of variable data and 47 bits of protocol control information (i.e., the identifier, CRC data, acknowledgement and synchronization bits, Fig. 1) [2]. The identifier field serves two purposes: assigning a priority to the transmission and allowing message filtering upon reception. The evolution of communication has, in its final phase, led digital data transfer technologies to ATM as the basis of the B-ISDN concept. Satisfying the most demanding quality of service (QoS) requirements (e.g., of real-time applications), ATM has become internationally recognized as a way to fully integrate networking systems, referring to the interconnection of all Information Technologies and Telecommunications. ATM networks are inherently connection-oriented (though complex) and allow QoS guarantees. ATM provides superior features including flexibility, scalability, fast switching, and the use of statistical multiplexing to utilize network resources efficiently. In ATM, information is sent in short fixed-length blocks called cells. The flexibility needed to support variable transmission rates is provided by transmitting the necessary number of cells per unit time. An ATM cell is 53 bytes, consisting of a 48-byte information field and a 5-byte header, as presented in Fig. 2. The cell header contains a label (VPI/VCI) that is used in multiplexing and denotes the routing address. The header also embodies four other fields, i.e. GFC, PTI, CLP and HEC [2], [4].
Fig. 1. CAN message format
Fig. 2. ATM cell format
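As a concrete illustration of the cell format just described, the following Python sketch packs the five header fields (GFC, VPI, VCI, PTI, CLP) into the first four header bytes of a 53-byte cell; the field widths are the standard ATM UNI values (the GFC field exists only at the user-network interface), and the HEC byte is left as a placeholder rather than computing the real CRC-8. The helper name and example values are ours, not from the paper.

```python
# Sketch of the 5-byte ATM UNI cell header layout (HEC left as a placeholder).

def build_atm_header(gfc: int, vpi: int, vci: int, pti: int, clp: int) -> bytes:
    # GFC: 4 bits, VPI: 8 bits, VCI: 16 bits, PTI: 3 bits, CLP: 1 bit
    word = (gfc & 0xF) << 28 | (vpi & 0xFF) << 20 | (vci & 0xFFFF) << 4 \
           | (pti & 0x7) << 1 | (clp & 0x1)
    hec = 0x00  # real cells carry a CRC-8 over the first four header bytes
    return word.to_bytes(4, "big") + bytes([hec])

header = build_atm_header(gfc=0, vpi=1, vci=42, pti=0, clp=0)
print(header.hex())          # 5 header bytes
print(len(header) + 48)      # 53-byte cell once the 48-byte payload is added
```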
WATM technology provides wireless broadband access to a fixed ATM network. As a result, it introduces new functions/changes on the physical, data link, and access layers of the ATM protocol stack. The physical layer realizes the actual transmission of data over the physical medium by means of a radio or an optical transmitter/receiver pair. Therefore, the most challenging problem is to overcome multipath reception from stationary and moving objects, resulting in a space- and time-varying
dispersive channel [2], [5]. The data link layer for WATM is focused on encapsulation, header compression, QoS, and ARQ (Automatic Repeat Request) and FEC (Forward Error Correction) techniques. The data link layer scheme proposed in this work aims to provide reliable transfer of cells over wireless point-to-point links, where a large number of errors are likely to occur, using a combination of sliding-window and selective-repeat ARQ and FEC techniques [2], [5]. In order to provide these new functions, the cell structure in the WATM part of the proposed network is slightly different from that of a regular ATM network; therefore, a simple translation is sufficient to transfer ATM cells between wireless and regular ATM networks. In this paper, the transport of CAN messages between fixed wireless CAN nodes using WATM is presented. Although there are various challenging wireless technologies, only WATM has the advantage of offering end-to-end multimedia capabilities with guaranteed QoS [5].
3 Interconnection of Fixed Wireless CAN Nodes Using WATM
Considering WATM as a means of access to an ATM network, in this research work WATM is regarded as an extension of the ATM LAN for remote CAN users/nodes. The aim of this work is to allow wireless service between two end points (i.e., WATM-enabled CAN nodes) that do not move relative to each other during the lifetime of the connection [2]. The proposed CAN over ATM protocol stack for wireless ATM is shown in Fig. 3. In the CAN-WATM mapping mechanism it is proposed that, at the WATM-enabled CAN nodes, the Protocol Data Units (PDU) of the CAN protocol are encapsulated within those of ATM to be carried over wireless ATM channels. Since a CAN message is 108 bits (0–64 bits of which are variable data), it can easily be fitted into one ATM cell payload; thus neither segmentation/reassembly of CAN messages nor data compression is necessary for carrying a CAN message in one ATM cell (a simple sketch of this encapsulation is given below). At the destination WATM-enabled CAN node, the header parts of the ATM cells are stripped off, and the CAN messages extracted from the ATM cell payloads can be processed or passed on to the CAN bus. As it may be indicated in the arbitration fields of the CAN messages, different kinds of multimedia application traffic can take advantage of the ATM QoS support. Before the actual data transmission takes place through the ATM network, for example, ABR (Available Bit Rate) traffic is multiplexed into AAL3/4 connections while CBR (Constant Bit Rate) traffic is multiplexed into AAL2 connections.
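The following Python fragment is a minimal sketch of that idea: one CAN frame (identifier, length and up to 8 data bytes) is serialized and padded into a single 48-byte ATM cell payload, and unpacked again at the receiver. The byte layout, field names and helper functions are illustrative assumptions, not the exact encapsulation format defined by the paper.

```python
# Hypothetical CAN-frame <-> ATM-payload encapsulation (illustrative layout only).

import struct

ATM_PAYLOAD_LEN = 48  # bytes

def encapsulate_can(identifier: int, data: bytes) -> bytes:
    """Pack one CAN frame into a single 48-byte ATM cell payload."""
    assert len(data) <= 8, "CAN carries at most 8 data bytes"
    frame = struct.pack(">IB", identifier & 0x7FF, len(data)) + data
    return frame.ljust(ATM_PAYLOAD_LEN, b"\x00")   # pad to the full payload

def decapsulate_can(payload: bytes) -> tuple:
    """Recover (identifier, data) from an ATM cell payload."""
    identifier, dlc = struct.unpack(">IB", payload[:5])
    return identifier, payload[5:5 + dlc]

payload = encapsulate_can(0x123, b"\x01\x02\x03\x04")
print(len(payload))                 # 48 -> fits exactly one ATM cell payload
print(decapsulate_can(payload))     # (291, b'\x01\x02\x03\x04')
```

Because the whole frame fits in one payload, no segmentation/reassembly layer is needed, which is exactly the point made above.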
4 Modeling of the Proposed Scheme and Simulation Results
Figure 4 shows the simulation model created to evaluate the proposed fixed wireless access of CAN nodes over WATM channels, using a commercially available simulation package called OPNET 9.0. Standard OPNET 9.0 modules such as terminals, servers, radio transceivers and ATM switches are exploited to interconnect WATM-
enabled CAN nodes to ATM terminals over the radio medium [2]. As demonstrated in Figure 4, the scenario contains eight CAN nodes and two WATM-enabled CAN nodes (these are wireless ATM terminals, i.e., Base Stations, BS), each with a radio transceiver having access to the access point (AP) that is connected to an ATM switch and an ATM terminal. In this topology, a remote WATM-enabled CAN node gains access to a wired ATM network at the access point (AP), through which it can be connected to the other WATM-enabled CAN node. The BS receives CAN frames containing information from the application traffic sources (i.e., video (V) and data (D)), which are destined for the other BS over the radio medium using the ATM terminal in the wired ATM network. Not only does it place these frames in ATM cell payloads, but it also multiplexes ABR (i.e., D) and VBR (i.e., V) traffic into AAL3/4 and AAL2 connections, respectively. Finally, having inserted the wireless ATM DLC header, including a Non-Control Message bit (1 bit), an Acknowledgement bit (1 bit) and FEC data, into the ATM cells, they are transmitted from the BS radio transmitter to the point-to-point receiver (the two constitute a radio transceiver) at the AP.
Fig. 3. The proposed CAN over WATM layered system architecture
Fig. 4. Simulation model of the proposed scheme
Preliminary simulation results of the proposed model under varying load characteristics are presented. The simulation parameters used are given in Table 1. For the WATM-enabled CAN node application sources D and V, Figures 5 and 6 show the average cell delay (ACD) and cell delay variation (CDV) results as a function of simulation run time, and the ACD results as a function of each CAN node traffic load, respectively. Since the ATM network offers each application traffic a different, negotiated service, the ACD and CDV results for these two applications differ from each other noticeably. As can be seen from Figure 5, the ACD results vary between 6 ms and 22 ms and between 2 ms and 7 ms for the applications D and V, respectively, and the application V traffic experiences almost four times fewer CDVs
compared to the other. Figure 6 clearly shows that the same increase in both D and V traffic loads results in a better ACD for the latter, as a consequence of WATM's QoS support.
5 Summary
A new scheme for carrying CAN application traffic over WATM has been presented. Considering its ease of use in many industrial automation control areas, the CAN protocol inevitably needs wireless internetworking so that its nodes can be controlled and/or operated remotely with greater flexibility. In this study, two different types of data traffic produced by a WATM-enabled CAN node are transferred to the other one over the radio medium using an ATM terminal. CAN over WATM mapping using an encapsulation technique is also presented as an important part of the WATM-enabled CAN and ATM internetworking system. Simulation results show that not only can different CAN application traffic be transmitted over WATM, but the required QoS support can also be provided to the users.

Table 1. Simulation Parameters

Application Source D      5,000* Kbytes/hour
ABR ATM Connection        Peak Cell Rate = 100 Kbps, Minimum Cell Rate = 0.5 Kbps, Initial Cell Rate = 0.5 Kbps
Application Source V      10,000* Kbytes/hour
VBR-nrt ATM Connection    Peak Cell Rate = 100 Kbps, Sustainable Cell Rate = 50 Kbps, Minimum Cell Rate = 20 Kbps
Uplink Bit Rate           50 Kbps
*Produced using Exponential Distribution Function
Fig. 5. Cell delays vs. simulation time
Fig. 6. Average cell delays vs. load
References
1. Lawrenz, W.: CAN System Engineering: from Theory to Practical Applications. Springer-Verlag, New York (1997)
2. Erturk, I.: Remote Access of CAN Nodes Used in a Traffic Control System to a CMU over Wireless ATM. IEEE 4th MWCN, Sweden (Sep. 2002) 626–630
3. Farsi, M., Ratckiff, K., Babosa, M.: An Overview of Controller Area Network. Computing and Control Engineering Journal, Vol. 10, No. 3 (June 1999) 113–120
4. Erturk, I., Stipidis, E.: A New Approach for Real-time QoS Support in IP over ATM Networks. IEICE Trans. on Coms., Vol. E85-B, No. 10 (October 2002) 2311–2318
5. Ayanoglu, E.: Wireless Broadband and ATM Systems. Computer Networks, Vol. 31, Elsevier Science (1999) 395–409
A Hybrid Intrusion Detection Strategy Used for Web Security
Bo Yang 1, Han Li 1, Yi Li 1, and Shaojun Yang 2
1 School of Information Science and Engineering, Jinan University, Jinan, 250022, P.R. China, {yangbo,lihan,liyi}@ujn.edu.cn, http://www.ujn.edu.cn
2 Department of Information Industry, Shandong Provincial Government, Jinan, 250011, P.R. China, [email protected]
Abstract. This paper describes a novel framework for intrusion detection systems used for Web security. A hierarchical structure is proposed to provide both server-based detection and network-based detection. The system consists of three major components. First, there is a host detection module (HDM) in each web server, consisting of a collection of detection units (DU) running in the background on the host. Second, each subnet has a network detection module (NDM), which operates just like an HDM except that it analyzes network traffic. Finally, there is a central control detection module (CCDM), which serves as a high-level administrative center. The CCDM receives reports from the various HDM and NDM modules and detects intrusions by processing and correlating these reports. Detection rules are inductively learned from audit records in the CCDM and distributed to each detection module.
1 Introduction
As web-based computer systems play increasingly vital roles in modern society, they have become the targets of our enemies and criminals. While most web systems attempt to prevent unauthorized use by some kind of access control mechanism, such as passwords, encryption, and digital signatures, there are still some factors that make it very difficult to keep crackers from eventually gaining entry into a system [1][2]. Since the event of an attack should be considered inevitable, there is an obvious need for mechanisms that can detect crackers attempting to gain entry into a computer system, that can detect users misusing their system privileges, and that can monitor the networks connecting all of these systems together. Intrusion Detection Systems (IDS) are based on the principle that an attack on a Web system will be noticeably different from normal system activity. An intruder into a system (possibly masquerading as a legitimate user) is very likely to exhibit a pattern of behavior different from the normal behavior of a legitimate user. The job of the IDS is to detect these abnormal patterns by analyzing numerous sources of information provided by the existing systems [3].
Network intrusions such as eavesdropping, illegal remote accesses, remote break-ins, inserting bogus information into data streams, and flooding the network to reduce the effective channel capacity are becoming more common, while monitoring network activity can be difficult because of the great proliferation of (un)trusted open networks. We therefore propose a network intrusion detection model based on a hierarchical structure and cooperative agents to detect intrusive behaviors both on the web server and in the network, in order to achieve Web security.
2 Model Architecture
The proposed architecture for this intrusion detection system consists of three major components. First, there is a host detection module (HDM) in each Web server; this module is a collection of detection units (DU) running in the background on the host. Second, each subnet that is monitored has a network detection module (NDM), which operates just like an HDM except that it analyzes LAN traffic. Finally, there is a central control detection module (CCDM), which is placed at a single secure location and serves as an administrative center. The CCDM receives reports from the various HDM and NDM modules and, by processing and correlating these reports, it is expected to detect intrusions. Fig. 1 shows the structure of the system.
Fig. 1. Architecture of the hybrid detection model
3 Components
3.1 Host Detection Module (HDM)
There are a collection of Detection Units (DU) and a Host Manager (HM) in each Web server and in some special hosts. Each DU is responsible for detecting attacks directed at one of the system resources; the HM deals with the reports submitted by the DUs and manages and configures them. If a DU detects an attack, it alarms the network administrator and records the event in the system log. Otherwise, it assigns a Hypothesis Value (HV) to this kind of access and reports information such as {Process Symbol, User, CPU Occupancy, Memory Spending, HVi} to the HM. The HM broadcasts this information to the other DUs. If any other DU also detects this kind of access, it increases its HVi. If HVi > Vlimit, this DU raises an alarm and records a log entry; otherwise it reports to the HM. (A simplified sketch of this hypothesis-value accumulation is given below.)
3.2 Network Detection Module (NDM)
The NDM is responsible for monitoring network access, detecting possible attacks, and cooperating with other agents to detect intrusive behaviors. The NDM's implementation is similar to the HDM's. First, the local detection module examines the data (mostly datagram information) collected from the network. If it detects an attack, it gives an alarm message; otherwise it endows this access with an HVi and refers it to the cooperative agent module. The cooperative agent module registers the network information {Source Address, Destination Address, Protocol Type, User Information (for instance, Telnet user name), Data Bytes, Packet Header} in an access list and transports it to the correlated cooperative agent modules. If the accumulated HV received from the local agent and the other agents exceeds the HV limit, this module regards the access as an intrusive behavior, alarms the local agent, and transmits the information to the other agents according to the addresses in the access list. Each cooperative agent that receives this information continues to transmit the intrusion information to other agents besides alarming its local agent. They thus form a transmission chain, and all agents in this chain can receive the intrusion information and respond to the intrusion.
3.3 Central Control Detection Module (CCDM)
The CCDM is located in the network center and is monitored by the administrator. By monitoring the data passing through the network access point, it can detect data coming from and going to the exterior. Since it is at the highest level of all the ID systems, it can detect intrusive behaviors that the lower-level ID systems cannot detect; the CCDM can therefore carry out intrusion detection more comprehensively and exactly than the other modules.
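To illustrate the hypothesis-value mechanism described in Sects. 3.1 and 3.2, here is a small Python sketch of how cooperating detection units might accumulate HV contributions for a suspicious access and raise an alarm once a configured limit is exceeded. The class, report format, and threshold value are hypothetical; the paper does not prescribe a concrete data structure.

```python
# Hypothetical sketch of hypothesis-value (HV) accumulation across detection units.

class AccessRecord:
    def __init__(self, key, hv_limit=10.0):
        self.key = key            # e.g. (source addr, dest addr, protocol, user)
        self.hv = 0.0
        self.hv_limit = hv_limit
        self.reporters = set()

    def add_report(self, unit_id, hv_contribution):
        """A DU/NDM reports a suspicious access with its own HV estimate."""
        self.hv += hv_contribution
        self.reporters.add(unit_id)
        return self.hv > self.hv_limit   # True -> treat as intrusion, raise alarm

record = AccessRecord(("10.0.0.5", "10.0.0.9", "telnet", "guest"))
for unit, hv in [("DU-1", 3.0), ("DU-2", 4.0), ("NDM-A", 4.5)]:
    if record.add_report(unit, hv):
        print(f"alarm: intrusive access {record.key}, HV={record.hv}, "
              f"reported by {sorted(record.reporters)}")
        break
```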
3.4 Communications Agent
In our model, there is a communication agent (namely, TransAgent) in each detection host. The TransAgent is the corresponding bridge between the local modules and the cooperative modules in other hosts. It records all collected information about the ID modules in the local host and the cooperative hosts, and it can also offer a routing service for datagrams. When an ID module wants to transmit a datagram, it appoints the destination ID module and passes the datagram to the TransAgent, which transmits it to the destination agent on the remote host. The communication content has the following format: ::=<Sender><Time>. Because the communication between ID modules is brief, a source ID module can encapsulate the information in a UDP packet and send it to the TransAgent; TCP packets are also used for bulk datagram transmission. In addition, XML is used to improve the functionality of the Web servers by providing more flexible and adaptable information identification; transmitting datagrams in XML is not affected by differences between operating systems.
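As a rough illustration of the transport choice just described, the sketch below wraps a short inter-module report in a tiny XML document and sends it to a TransAgent over UDP. The XML element names, port number, and message fields are invented for the example; the text above only fixes that the content includes at least a sender and a time stamp.

```python
# Illustrative TransAgent message: brief report wrapped in XML and sent over UDP.

import socket
import time
import xml.etree.ElementTree as ET

def send_report(sender, destination, content, agent_addr=("127.0.0.1", 5140)):
    msg = ET.Element("idsmessage")                 # element names are assumptions
    ET.SubElement(msg, "sender").text = sender
    ET.SubElement(msg, "destination").text = destination
    ET.SubElement(msg, "time").text = str(int(time.time()))
    ET.SubElement(msg, "content").text = content
    payload = ET.tostring(msg, encoding="utf-8")

    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, agent_addr)           # brief reports fit in one datagram

send_report("DU-1", "CCDM", "HV=4.5 for telnet access from 10.0.0.5")
```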
4 Conclusion
This paper presented an architecture for a hybrid intrusion detection strategy based on a hierarchical structure and cooperative agents. The target environment is a heterogeneous network that may consist of different hosts, servers, and subnets. This strategy is able to: (1) detect attacks on the network itself, (2) detect attacks involving multiple hosts, (3) track tagged objects, including users and sensitive files, as they move around the network, (4) detect, via erroneous or misleading reports, situations where a host might be taken over by an attacker, and (5) monitor the activity of any networked system that does not have a host monitor yet generates LAN activity, such as a PC. In future work, we will analyze a number of different low-level IP attacks and the vulnerabilities that they exploit. New data mining algorithms will be used to train the detection models automatically. By adding these new features to an IDS implementation, host-based intrusion detection systems can more easily detect and react to low-level network attacks.
References
1. Denning, D.E.: An Intrusion-Detection Model. IEEE Computer Society Symposium on Research in Security and Privacy, Oakland, USA (1987) 118–131
2. Smaha, S.E.: Haystack: An Intrusion Detection System. Proc. of the IEEE Fourth Aerospace Computer Security Application Conference, Orlando, FL, IEEE (1988) 37–44
3. Warrender, C., Forrest, S., Pearlmutter, B.: Detecting Intrusions Using System Calls: Alternative Data Models. Proc. of the 1999 IEEE Symposium on Security and Privacy, IEEE (1999) 133–145
Mining Sequence Pattern from Time Series Based on Inter-relevant Successive Trees Model
Haiquan Zeng, Zhan Shen, and Yunfa Hu
Computer Science Dept., Fudan University, Shanghai 200433, P.R. China
[email protected]
Abstract. In this paper, a novel method is proposed to discover frequent patterns from time series. It first segments the time series based on perceptually important points, then converts it into meaningful symbol sequences by the relative scope, and finally uses a new mining model to find frequent patterns. Compared with previous methods, this method is simpler and more efficient.
1 Introduction
Time series are an important kind of data existing in many fields. By analyzing time series we can discover the changing principles of things and provide help for decision-making. Mining patterns from time series is an important method of time series analysis, which discovers useful patterns that appear frequently (i.e., sequence patterns). Research in this field is attracting more and more attention and is becoming a subject with great theoretical and practical value. However, in many existing methods in the literature [1-5], the mined patterns are described by shape, which is difficult to understand and use. Moreover, these methods are generally based on the Apriori algorithm, which has to generate many pattern candidates, so the mining efficiency is degraded. To remedy the problems above, we propose a sequence mining method based on the Inter-Relevant Successive Trees Model. Experiments show the method is simpler and more efficient.
2 Time Series Segmentation Description
Suppose the time series is S = <x1=(v1,t1), …, xn=(vn,tn)>, where vi is the value observed at time ti, and ti+1 − ti = ∆ = 1 (i = 1, …, n−1). To facilitate the mining operation, a time series is often described segment-wise. Although some approaches are proposed in [3,4,5], they all face the problems of heavy computation and failure to grasp the main changing characteristics. In this paper, we adopt a segmentation method based on comparatively important points. Important points are observation points that have an important visual influence on the change of the series; in fact, they are the local minimum or maximum points of the series (a simple sketch of detecting such points is given below). Segmentation through choosing important points can effectively eliminate the impact of noise on the series and maintain the main feature patterns of the time series' change.
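The following Python fragment is a simplified illustration of picking important points as local extrema of a series. It deliberately ignores the significance constant R used in the formal definition that follows, so it should be read only as an intuition for the idea, not as the paper's exact procedure.

```python
# Simplified illustration: important points as local minima/maxima of a series.
# (The formal definition below additionally uses a significance constant R.)

def important_points(values):
    """Return indices of local extrema, always keeping the two end points."""
    idx = [0]
    for m in range(1, len(values) - 1):
        left, mid, right = values[m - 1], values[m], values[m + 1]
        if (mid < left and mid < right) or (mid > left and mid > right):
            idx.append(m)          # local minimum or local maximum
    idx.append(len(values) - 1)
    return idx

series = [5.0, 5.2, 4.1, 4.3, 6.8, 6.5, 7.9, 3.2, 3.4]
print(important_points(series))    # [0, 1, 2, 4, 5, 6, 7, 8]
```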
Definition 1 Given a constant R(R>1) and time series {<x1=(v1,t1),…,xn=(vn,tn)>},if a data point xm(1≤ m ≤n) is called an Important Minimum Point , it must satisfy one of the following conditions: 1. if 1<m