Lecture Notes in Artificial Intelligence Edited by J. G. Carbonell and J. Siekmann
Subseries of Lecture Notes in Computer Science
4099
Qiang Yang Geoff Webb (Eds.)
PRICAI 2006: Trends in Artificial Intelligence 9th Pacific Rim International Conference on Artificial Intelligence Guilin, China, August 7-11, 2006 Proceedings
Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA
Jörg Siekmann, University of Saarland, Saarbrücken, Germany

Volume Editors
Qiang Yang
Hong Kong University of Science and Technology
Department of Computer Science and Engineering
Clearwater Bay, Kowloon, Hong Kong, China
E-mail: [email protected]

Geoff Webb
Monash University
School of Information Technology
P.O. Box 75, Victoria 3800, Australia
E-mail: [email protected]

Library of Congress Control Number: 2006929802
CR Subject Classification (1998): I.2, F.1
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN: 0302-9743
ISBN-10: 3-540-36667-9 Springer Berlin Heidelberg New York
ISBN-13: 978-3-540-36667-6 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com

© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 11801603 06/3142 543210
Preface
The Pacific Rim International Conference on Artificial Intelligence (PRICAI) is one of the preeminent international conferences on artificial intelligence (AI). PRICAI 2006 (http://www.csse.monash.edu.au/pricai06/Header.htm) was the ninth in this series of biennial international conferences highlighting the most significant contributions to the field of AI. The conference was held during August 7-11, 2006, in the beautiful city of Guilin in Guangxi Province, China. As in previous years, this year's technical program saw very high standards in both the submissions and the paper review process, resulting in an exciting program that reflects the great variety and depth of modern AI research. This year's contributions covered all traditional areas of AI, including machine learning and data mining, knowledge representation and planning, probabilistic reasoning, constraint satisfaction, computer vision and automated agents, as well as various exciting and innovative applications of AI to many different areas. There was particular emphasis on the areas of machine learning and data mining, intelligent agents, evolutionary computing, and intelligent image and video analysis. The technical papers in this volume were selected from a record 596 submissions after a rigorous review process. Each submission was reviewed by at least three members of the PRICAI Program Committee, including at least two reviewers and one Vice Program Chair. Decisions were reached following discussions among the reviewers of each paper, and were finalized in a highly selective process that balanced many aspects of a paper, including the significance and originality of the contribution, its technical quality and clarity, and its relevance to the conference objectives. Out of the 596 submissions, we accepted 81 papers (13.6%) for oral presentation and 87 papers (14.6%) for presentation as posters at the conference.
This corresponds to an overall acceptance rate of 28.8% among all submissions. In addition, we were honored to have keynote speeches by notable leaders in the field: Pedro Domingos, Ah Chung Tsoi, Wei-Xiong Zhang, Ning Zhong and Zhi-Hua Zhou, as well as an invited talk by a further distinguished academic, Kuang-chih Huang. Beyond the main conference, PRICAI 2006 also featured exciting tutorial and workshop programs, as well as several co-located international conferences. PRICAI 2006 relied on the generous help of many people. We extend our appreciation to the Vice PC Chairs: David Albrecht, Hung Bui, William Cheung, John Debenham, Achim Hoffmann, Huan Liu, Wee Keong Ng, Hui Xiong, Mingsheng Ying, Shichao Zhang and Zhi-Hua Zhou, and we acknowledge the hard work of the 333 members of the Program Committee and the additional reviewers. We thank in particular Rong Pan and Michelle Kinsman for their professional help, providing an enormous amount of assistance with the conference reviewing system and website, and we gratefully acknowledge the generous help of the previous PRICAI Chair Chengqi Zhang
and the support of the Local Arrangement Chairs Shichao Zhang and Taoshen Li. We thank the Conference General Chairs Ruqian Lu and Hideyuki Nakashima for their strong support. We also thank the PRICAI Steering Committee for giving us this chance to co-chair the PRICAI 2006 conference, and Springer for its continuing support in publishing the proceedings.
August 2006
Qiang Yang and Geoff Webb
Organization
Conference Co-chairs:
Ruqian Lu (Chinese Academy of Sciences), China
Hideyuki Nakashima (Future University - Hakodate), Japan

Program Committee Co-chairs:
Qiang Yang (Hong Kong University of Science and Technology), Hong Kong
Geoff Webb (Monash University), Australia

Organizing Chairs:
Shichao Zhang (Guangxi Normal University), China
Taoshen Li (Guangxi Normal University), China

Workshops Chairs:
Riichiro Mizoguchi (Osaka University), Japan
Rong Pan (Hong Kong University of Science and Technology), Hong Kong

Tutorials Chair:
Charles Ling (University of Western Ontario), Canada

Industrial Chair:
Wei-Ying Ma (Microsoft Research), China

Sponsorship Co-chairs:
Zhongzhi Shi (Chinese Academy of Sciences), China
Chengqi Zhang (University of Technology, Sydney), Australia

Publicity Co-chairs:
Jian Pei (Simon Fraser University), Canada
Xudong Luo (University of Southampton), UK
Program Committee Vice Chairs:
David Albrecht (Monash University), Australia
Hung Bui (SRI International), USA
William Cheung (Hong Kong Baptist University), Hong Kong
John Debenham (University of Technology, Sydney), Australia
Achim Hoffmann (University of New South Wales), Australia
Huan Liu (Arizona State University), USA
Wee Keong Ng (Nanyang Technological University), Singapore
Hui Xiong (Rutgers University), USA
Mingsheng Ying (Tsinghua University), China
Shichao Zhang (University of Technology, Sydney), Australia
Zhi-Hua Zhou (Nanjing University), China
Program Committee: David Albrecht Aijun An A. Anbulagan Hiroki Arimura Laxmidhar Behera Hung Bui Longbing Cao Tru Cao Nicholas Cercone Rong Chen Yin Chen Zheng Chen Jian-Hung Chen Songcan Chen David Cheung William Cheung Yiu-ming Cheung Sung-Bae Cho Andy Chun Paul Compton Jirapun Daengdej Honghua Dai Dao-Qing Dai Pallab Dasgupta Manoranjan Dash John Debenham James Delgrande Zhi-Hong Deng Norman Foo
Christian Freksa Yan Fu Sharon XiaoYing Gao Yang Gao Shyam Gupta Udo Hahn James Harland Achim Hoffmann Jiman Hong Michael Horsch Wynne Hsu Xiangji Huang Joshua Huang Shell Ying Huang Zhiyong Huang Mitsuru Ishizuka Sanjay Jain Margaret Jefferies Rong Jin Geun Sik Jo Ken Kaneiwa Hiroyuki Kawano Ray Kemp Shamim Khan Deepak Khemani Boonserm Kijsirikul Eun Yi Kim Yasuhiko Kitamura Alistair Knott
Ramamohanarao Kotagiri Peep Küngas Kazuhiro Kuwabara James Kwok Wai Lam Longin Jan Latecki Wee Sun Lee Tze Yun Leong Xue Li Hang Li Chun-Hung Li Jinyan Li Gerard Ligozat Ee-Peng Lim Zuoquan Lin Hong Liu Tie-Yan Liu Jiming Liu Bing Liu Huan Liu Dickson Lukose Xudong Luo Michael Maher Yuji Matsumoto Chris Messon Chunyan Miao KyongHo Min Antonija Mitrovic Shivashankar Nair Geok See Ng Wee Keong Ng Zaiqing Nie Masayuki Numao Takashi Okada Lin Padgham Rong Pan Jeng-Shyang Pan Hyeyoung Park Fred Popowich Arun K. Pujari Hiok Chai Quek Anca Luminita Ralescu Jochen Renz Claude Sammut Ken Satoh
Rudy Setiono Yidong Shen ZhongZhi Shi Daming Shi Akira Shimazu Carles Sierra Arul Siromoney Raymund Sison Paul Snow Von-Wun Soo Kaile Su Ruixiang Sun Wing Kin Sung Hideaki Takeda Ah-Hwee Tan Chew Lim Tan Qing Tao Takao Terano John Thornton Kai Ming Ting Shusaku Tsumoto Miroslav Velev Toby Walsh Huaiqing Wang Jun Wang Lipo Wang Takashi Washio Ian Watson Geoff Webb Ji-Rong Wen Graham Williams Wayne Wobcke Limsoon Wong Zhaohui Wu Xindong Wu Hui Xiong Baowen Xu Seiji Yamada Jun Yan Qiang Yang Ying Yang Hyun Seung Yang Yiyu Yao Min Yao Roland H. C. Yap
Dit-Yan Yeung Jian Yin Mingsheng Ying Xinghuo Yu Jeffrey Xu Yu Lei Yu Jian Yu Pong Chi Yuen Huajun Zeng Hongbin Zha Chengqi Zhang Zili Zhang
Benyu Zhang Junping Zhang Mingyi Zhang Xuegong Zhang Byoung-Tak Zhang Jian Zhang Shichao Zhang Jun Zhang Ning Zhong Aoying Zhou Shuigeng Zhou Zhi-Hua Zhou
Additional Reviewers: Mina Akaishi Bill Andreopoulos Hiroshi Aoyama Naresh Babu Nilufar Baghaei Stuart Bain Shankar Balachandran Ravindran Balaraman Thomas Barkowsky J. P. Bekmann Sven Bertel Michael Blumenstein Abdenour Bouzouane Tiberio Caetano Lawrence Cavedon Hong Chang Ratthachat Chatpatanasiri Qingliang Chen Jilin Chen Yuanhao Chen Jie Chen Ding-Yi Chen Kenil Cheng Pak-Ming Cheung Vic Ciesielski Andrew Connor Diana Cukierman Guang Dai Ugo Dal Lago
Martina Dankova Luc De Raedt Marina De Vos Aldric Degorre Mike Dixon Jeremy Dokter Didier Dubois Frank Dylla Weiguo Fan Joel Fenwick Liliana María Carrillo Flórez Lutz Frommberger Ken-ichi Fukui Naoki Fukuta Chun Che Fung Gabriel Fung Dorian Gaertner Bin Gao Alban Grastien Xue Gui-Rong Makoto Haraguchi Xuefeng He Keijo Heljanko Jan Hladik Chenyong Hu Jinbo Huang Aaron Hunter Sun Jian-Tao Mike Jones
Sindhu Joseph Rohit Joshi Norihiro Kamide Gour Karmakar Yoshikiyo Kato Elizabeth Kemp Philip Kilby Kazuki Kobayashi Takanori Komatsu Yasuo Kudo Sreenivasa Kumar Satoshi Kurihara Roberto Legaspi Hua Li Xiaodong Li Guoliang Li Chavalit Likitvivatanavong Chenxi Lin Han Lin Ning Liu Jimmy Liu Yang Liu Xiangyu Luo Wei Luo Stephen MacDonell Stephen Marsland Akio Maruyama Shouichi Matsui Le Ngoc Minh Nguyen Le Minh Masaharu Mizumoto Mikihiko Mori Koji Morikawa Koichi Moriyama Masao Mukaidono Hiroshi Murata Tsuyoshi Murata N. S. Narayanaswamy Nide Naoyuki Yoshimasa Ohmoto Masayuki Okabe Yoshiaki Okubo Takashi Onoda Mehmet A. Orgun Mehrdad Oveisi
Piero Pagliani Jeffrey Junfeng Pan Tanasanee Phienthrakul Adrián Perreau de Pinninck Kim Leng Poh Wayne Pullan Prasertsak Pungprasertying Josep Puyol-Gruart Ho Bao Quoc M. Masudur Rahman Shri Rai Arthur Ramer Delip Rao Ramesh Rayudu Jochen Renz Kai-Florian Richter Juan A. Rodríguez-Aguilar Maxim Roy Jordi Sabater-Mir Ashish Sabharwal Falko Schmid Holger Schultheis Inessa Seifert Steven Shapiro Andy Song Fausto Spoto Anantaporn Srisawat Sufatrio Xichen Sun Yasufumi Takama Shiro Takata Martti Tammi Thora Tenbrink Quan Thanh Tho Mirek Truszczynski Ivor Tsang Dinh Duc Anh Vu Jan Oliver Wallgrün Meng Wang Gang Wang Minhong Wang Yang Wendy Wang Amali Weerasinghe Miao Wen Michael Winter
Kok Wai Wong Swee Seong Wong Bozena Wozna Wensi Xi Shuicheng Yan
Sheng Zhang Kai Zhang Mengjie Zhang Xin Zheng
Table of Contents
Keynote Speeches

Learning, Logic, and Probability: A Unified View
Pedro Domingos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
Impending Web Intelligence (WI) and Brain Informatics (BI) Research Ning Zhong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
Learning with Unlabeled Data and Its Application to Image Retrieval Zhi-Hua Zhou . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5
Regular Papers

Intelligent Agents

Learning as Abductive Deliberations
Budhitama Subagdja, Iyad Rahwan, Liz Sonenberg . . . . . . . . . . . . . . . .
11
Using a Constructive Interactive Activation and Competition Neural Network to Construct a Situated Agent’s Experience Wei Peng, John S. Gero . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
21
Rule-Based Agents in Temporalised Defeasible Logic Guido Governatori, Vineet Padmanabhan, Antonino Rotolo . . . . . . . .
31
Compact Preference Representation for Boolean Games Elise Bonzon, Marie-Christine Lagasquie-Schiex, Jérôme Lang . . . . . .
41
Agent-Based Flexible Videoconference System with Automatic QoS Parameter Tuning Sungdoke Lee, Sanggil Kang, Dongsoo Han . . . . . . . . . . . . . . . . . . . . . . .
51
Kalman Filter Based Dead Reckoning Algorithm for Minimizing Network Traffic Between Mobile Game Users in Wireless GRID Seong-Whan Kim, Ki-Hong Ko . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
61
Affective Web Service Design Insu Song, Guido Governatori . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
71
An Empirical Study of Data Smoothing Methods for Memory-Based and Hybrid Collaborative Filtering Dingyi Han, Gui-Rong Xue, Yong Yu . . . . . . . . . . . . . . . . . . . . . . . . . . .
81
Eliminate Redundancy in Parallel Search: A Multi-agent Coordination Approach Jiewen Luo, Zhongzhi Shi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
91
Intelligent Market Based Learner Modeling Maryam Ashoori, Chun Yan Miao, Angela Eck Soong Goh, Wang Qiong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
101
User Preference Through Bayesian Categorization for Recommendation Kyung-Yong Jung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
112
Automated Reasoning

A Stochastic Non-CNF SAT Solver
Rafiq Muhammad, Peter J. Stuckey . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
120
Reasoning About Hybrid Probabilistic Knowledge Bases Kedian Mu, Zuoquan Lin, Zhi Jin, Ruqian Lu . . . . . . . . . . . . . . . . . . . .
130
Update Rules for Parameter Estimation in Continuous Time Bayesian Network Dongyu Shi, Jinyuan You . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
140
On Constructing Fibred Tableaux for BDI Logics Vineet Padmanabhan, Guido Governatori . . . . . . . . . . . . . . . . . . . . . . . .
150
The Representation of Multiplication Operation on Fuzzy Numbers and Application to Solving Fuzzy Multiple Criteria Decision Making Problems Chien-Chang Chou . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
161
Finding a Natural-Looking Path by Using Generalized Visibility Graphs Kyeonah Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
170
Comparison Between Two Languages Used to Express Planning Goals: CTL and EAGLE Wei Huang, Zhonghua Wen, Yunfei Jiang, Aixiang Chen . . . . . . . . . .
180
Trajectory Modification Using Elastic Force for Collision Avoidance of a Mobile Manipulator Nak Yong Ko, Reid G. Simmons, Dong Jin Seo . . . . . . . . . . . . . . . . . . .
190
A Hybrid Architecture Combining Reactive Plan Execution and Reactive Learning Samin Karim, Liz Sonenberg, Ah-Hwee Tan . . . . . . . . . . . . . . . . . . . . . .
200
A Knowledge-Based Modeling System for Time-Critical Dynamic Decision-Making Yanping Xiang, Kim-Leng Poh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
212
Machine Learning and Data Mining

Mining Frequent Itemsets for Protein Kinase Regulation
Qingfeng Chen, Yi-Ping Phoebe Chen, Chengqi Zhang, Lianggang Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
222
Constructing Bayesian Networks from Association Analysis Ohm Sornil, Sunatashee Poonvutthikul . . . . . . . . . . . . . . . . . . . . . . . . . . .
231
Bayesian Approaches to Ranking Sequential Patterns Interestingness Kuralmani Vellaisamy, Jinyan Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
241
Mining Multi-dimensional Frequent Patterns Without Data Cube Construction Chuan Li, Changjie Tang, Zhonghua Yu, Yintian Liu, Tianqing Zhang, Qihong Liu, Mingfang Zhu, Yongguang Jiang . . . . . .
251
A New Approach to Symbolic Classification Rule Extraction Based on SVM Dexian Zhang, Tiejun Yang, Ziqiang Wang, Yanfeng Fan . . . . . . . . . .
261
Feature Selection for Bagging of Support Vector Machines Guo-Zheng Li, Tian-Yu Liu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
271
Neural Classification of Lung Sounds Using Wavelet Packet Coefficients Energy Yi Liu, Caiming Zhang, Yuhua Peng . . . . . . . . . . . . . . . . . . . . . . . . . . . .
278
Wireless Communication Quality Monitoring with Artificial Neural Networks Dauren F. Akhmetov, Minoru Kotaki . . . . . . . . . . . . . . . . . . . . . . . . . . . .
288
Prediction of MPEG Video Source Traffic Using BiLinear Recurrent Neural Networks Dong-Chul Park, Chung Nguyen Tran, Young-Soo Song, Yunsik Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
298
Dynamic Neural Network-Based Fault Diagnosis for Attitude Control Subsystem of a Satellite Z.Q. Li, L. Ma, K. Khorasani . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
308
Gauss Chaotic Neural Networks Yao-qun Xu, Ming Sun, Ji-hong Shen . . . . . . . . . . . . . . . . . . . . . . . . . . .
319
Short-Term Load Forecasting Using Multiscale BiLinear Recurrent Neural Network Dong-Chul Park, Chung Nguyen Tran, Yunsik Lee . . . . . . . . . . . . . . . .
329
A Comparison of Selected Training Algorithms for Recurrent Neural Networks Suwat Pattamavorakun, Suwarin Pattamavorakun . . . . . . . . . . . . . . . . .
339
Neural Network Recognition of Scanning Electron Microscope Image for Plasma Diagnosis Byungwhan Kim, Wooram Ko, Seung Soo Han . . . . . . . . . . . . . . . . . . .
350
A New Multi-constrained QoS Routing Algorithm in Mobile Ad Hoc Networks Hu Bin, Liu Hui . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
358
Sparse Kernel Ridge Regression Using Backward Deletion Ling Wang, Liefeng Bo, Licheng Jiao . . . . . . . . . . . . . . . . . . . . . . . . . . .
365
Using Locally Weighted Learning to Improve SMOreg for Regression Chaoqun Li, Liangxiao Jiang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
375
Palmprint Recognition Using Wavelet and Support Vector Machines Xinhong Zhou, Yuhua Peng, Ming Yang . . . . . . . . . . . . . . . . . . . . . . . . .
385
Context Awareness System Modeling and Classifier Combination Mi Young Nam, Suman Sedai, Phill Kyu Rhee . . . . . . . . . . . . . . . . . . . .
394
Non-negative Matrix Factorization on Kernels Daoqiang Zhang, Zhi-Hua Zhou, Songcan Chen . . . . . . . . . . . . . . . . . . .
404
Modelling Citation Networks for Improving Scientific Paper Classification Performance Mengjie Zhang, Xiaoying Gao, Minh Duc Cao, Yuejin Ma . . . . . . . . .
413
Analysis on Classification Performance of Rough Set Based Reducts Qinghua Hu, Xiaodong Li, Daren Yu . . . . . . . . . . . . . . . . . . . . . . . . . . . .
423
Parameter Optimization of Kernel-Based One-Class Classifier on Imbalance Text Learning Ling Zhuang, Honghua Dai . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
434
Clustering-Based Nonlinear Dimensionality Reduction on Manifold Guihua Wen, Lijun Jiang, Jun Wen, Nigel R. Shadbolt . . . . . . . . . . . .
444
Sparse Kernel PCA by Kernel K-Means and Preimage Reconstruction Algorithms Sanparith Marukatat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
454
Clustering-Based Relevance Feedback for Web Pages Seung Yeol Yoo, Achim Hoffmann . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
464
Building Clusters of Related Words: An Unsupervised Approach Deepak P, Delip Rao, Deepak Khemani . . . . . . . . . . . . . . . . . . . . . . . . . .
474
Natural Language Processing and Speech Recognition

Recognition of Simultaneous Speech by Estimating Reliability of Separated Signals for Robot Audition
Shun'ichi Yamamoto, Ryu Takeda, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean-Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
484
Chinese Abbreviation-Definition Identification: A SVM Approach Using Context Information Xu Sun, Houfeng Wang, Yu Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
495
Clause Boundary Recognition Using Support Vector Machines Hyun-Ju Lee, Seong-Bae Park, Sang-Jo Lee, Se-Young Park . . . . . . . .
505
Large Quantity of Text Classification Based on the Improved Feature-Line Method XianFei Zhang, BiCheng Li, WenBin Mu, Yin Liu . . . . . . . . . . . . . . . .
515
Automatic Multi-level Summarizations Generation Based on Basic Semantic Unit for Sports Video Jianyun Chen, Xinyu Zhao, Miyi Duan, Tingting Wu, Songyang Lao . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
524
Query-Topic Focused Web Pages Summarization Seung Yeol Yoo, Achim Hoffmann . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
533
Computer Vision

Invariant Color Model-Based Shadow Removal in Traffic Image and a New Metric for Evaluating the Performance of Shadow Removal Methods
Young Sung Soh, Hwanju Lee, Yakun Wang . . . . . . . . . . . . . . . . . . . . . .
544
Uncontrolled Face Recognition by Individual Stable Neural Network Xin Geng, Zhi-Hua Zhou, Honghua Dai . . . . . . . . . . . . . . . . . . . . . . . . .
553
Fuzzy Velocity-Based Temporal Dependency for SVM-Driven Realistic Facial Animation Pith Xie, Yiqiang Chen, Junfa Liu, Dongrong Xiao . . . . . . . . . . . . . . .
563
Re-ordering Methods in Adaptive Rank-Based Re-indexing Scheme Kang Soo You, Jae Ho Choi, Hoon Sung Kwak . . . . . . . . . . . . . . . . . . .
573
Use of Nested K-Means for Robust Head Location in Visual Surveillance System Hyun Jea Joo, Bong Won Jang, Suman Sedai, Phill Kyu Rhee . . . . . .
583
Appearance Based Multiple Agent Tracking Under Complex Occlusions Prithwijit Guha, Amitabha Mukerjee, K.S. Venkatesh . . . . . . . . . . . . . .
593
Perception and Animation

Variable Duration Motion Texture for Human Motion Modeling
Tianyu Huang, Fengxia Li, Shouyi Zhan, Jianyuan Min . . . . . . . . . . .
603
A Novel Motion Blending Approach Based on Fuzzy Clustering Xiangbin Zhu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
613
Efficient Optimization of Inpainting Scheme and Line Scratch Detection for Old Film Restoration Seong-Whan Kim, Ki-Hong Ko . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
623
Partial Encryption of Digital Contents Using Face Detection Algorithm Kwangjin Hong, Keechul Jung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
632
Relevance Feedback Using Adaptive Clustering for Region Based Image Similarity Retrieval Deok-Hwan Kim, Seok-Lyong Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
641
Evolutionary Computing

Learning and Evolution Affected by Spatial Structure
Masahiro Ono, Mitsuru Ishizuka . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
651
Immune Clonal Selection Evolutionary Strategy for Constrained Optimization Wenping Ma, Licheng Jiao, Maoguo Gong, Ronghua Shang . . . . . . . .
661
An Intelligent System for Supporting Personal Creativity Based on Genetic Algorithm Heng-Li Yang, Cheng-Hwa Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
671
Generating Creative Ideas Through Patents Guihua Wen, Lijun Jiang, Jun Wen, Nigel R. Shadbolt . . . . . . . . . . . .
681
An Improved Multiobjective Evolutionary Algorithm Based on Dominating Tree Chuan Shi, Qingyong Li, Zhiyong Zhang, Zhongzhi Shi . . . . . . . . . . . .
691
Fuzzy Genetic System for Modelling Investment Portfolio Rahib H. Abiyev, Mustafa Menekay . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
701
Fuzzy Genetic Algorithms for Pairs Mining Longbing Cao, Dan Luo, Chengqi Zhang . . . . . . . . . . . . . . . . . . . . . . . . .
711
A Novel Feature Selection Approach by Hybrid Genetic Algorithm Jinjie Huang, Ning Lv, Wenlong Li . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
721
Evolutionary Ensemble Based Pattern Recognition by Data Context Definition Mi Young Nam, In Ja Jeon, Phill Kyu Rhee . . . . . . . . . . . . . . . . . . . . . .
730
Quantum-Behaved Particle Swarm Optimization with a Hybrid Probability Distribution Jun Sun, Wenbo Xu, Wei Fang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
737
A Selection Scheme for Excluding Defective Rules of Evolutionary Fuzzy Path Planning Jong-Hwan Park, Jong-Hwan Kim, Byung-Ha Ahn, Moon-Gu Jeon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
747
An Improved Genetic-Based Particle Swarm Optimization for No-Idle Permutation Flow Shops with Fuzzy Processing Time Niu Qun, Xingsheng Gu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
757
Industrial Applications

Determinants of E-CRM in Influencing Customer Satisfaction
Yan Liu, Chang-Feng Zhou, Ying-Wu Chen . . . . . . . . . . . . . . . . . . . . . .
767
Penalty Guided PSO for Reliability Design Problems Ta-Cheng Chen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
777
Developing Methodologies of Knowledge Discovery and Data Mining to Investigate Metropolitan Land Use Evolution Yongliang Shi, Jin Liu, Rusong Wang, Min Chen . . . . . . . . . . . . . . . . .
787
Vibration Control of Suspension System Based on a Hybrid Intelligent Control Algorithm Ke Zhang, Shiming Wang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
797
Short Papers

Intelligent Agents

An Intelligent Conversational Agent as the Web Virtual Representative Using Semantic Bayesian Networks
Kyoung-Min Kim, Jin-Hyuk Hong, Sung-Bae Cho . . . . . . . . . . . . . . . . .
807
Three-Tier Multi-agent Approach for Solving Traveling Salesman Problem Shi-Liang Yan, Ke-Feng Zhou . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
813
Adaptive Agent Selection in Large-Scale Multi-Agent Systems Toshiharu Sugawara, Kensuke Fukuda, Toshio Hirotsu, Shin-ya Sato, Satoshi Kurihara . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
818
A Mobile Agent Approach to Support Parallel Evolutionary Computation Wei-Po Lee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
823
The Design of Fuzzy Controller by Means of Genetic Algorithms and NFN-Based Estimation Technique Sung-Kwun Oh, Jeoung-Nae Choi, Seong-Whan Jang . . . . . . . . . . . . . .
829
GA-Based Polynomial Neural Networks Architecture and Its Application to Multi-variable Software Process Sung-Kwun Oh, Witold Pedrycz, Wan-Su Kim, Hyun-Ki Kim . . . . . .
834
Topical and Temporal Visualization Using Wavelets T. Mala, T.V. Geetha, Sathish Kumar . . . . . . . . . . . . . . . . . . . . . . . . . . .
839
LP-TPOP: Integrating Planning and Scheduling Through Constraint Programming Yuechang Liu, Yunfei Jiang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
844
Integrating Insurance Services, Trust and Risk Mechanisms into Multi-agent Systems Yuk-Hei Lam, Zili Zhang, Kok-Leong Ong . . . . . . . . . . . . . . . . . . . . . . . .
849
Cat Swarm Optimization Shu-Chuan Chu, Pei-wei Tsai, Jeng-Shyang Pan . . . . . . . . . . . . . . . . . .
854
Heuristic Information Based Improved Fuzzy Discrete PSO Method for Solving TSP Bin Shen, Min Yao, Wensheng Yi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
859
Automated Reasoning

A Network Event Correlation Algorithm Based on Fault Filtration
Qiuhua Zheng, Yuntao Qian, Min Yao . . . . . . . . . . . . . . . . . . . . . . . . . . .
864
CPR Localization Using the RFID Tag-Floor Jung-Wook Choi, Dong-Ik Oh, Seung-Woo Kim . . . . . . . . . . . . . . . . . . .
870
Development of a Biologically-Inspired Mesoscale Robot Abdul A. Yumaryanto, Jaebum An, Sangyoon Lee . . . . . . . . . . . . . . . . .
875
Timed Petri-Net(TPN) Based Scheduling Holon and Its Solution with a Hybrid PSO-GA Based Evolutionary Algorithm(HPGA) Fuqing Zhao, Yahong Yang, Qiuyu Zhang, Huawei Yi . . . . . . . . . . . . .
880
Recognition Rate Prediction for Dysarthric Speech Disorder Via Speech Consistency Score Prakasith Kayasith, Thanaruk Theeramunkong, Nuttakorn Thubthong . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
885
An Emotion-Driven Musical Piece Generator for a Constructive Adaptive User Interface Roberto Legaspi, Yuya Hashimoto, Masayuki Numao . . . . . . . . . . . . . .
890
An Adaptive Inventory Control Model for a Supply Chain with Nonstationary Customer Demands Jun-Geol Baek, Chang Ouk Kim, Ick-Hyun Kwon . . . . . . . . . . . . . . . . .
895
Context-Aware Product Bundling Architecture in Ubiquitous Computing Environments Hyun Jung Lee, Mye M. Sohn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
901
A Relaxation of a Semiring Constraint Satisfaction Problem Using Combined Semirings Louise Leenen, Thomas Meyer, Peter Harvey, Aditya Ghose . . . . . . . .
907
Causal Difference Detection Using Bayesian Networks Tomoko Murakami, Ryohei Orihara . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
912
Tabu Search for Generalized Minimum Spanning Tree Problem Zhenyu Wang, Chan Hou Che, Andrew Lim . . . . . . . . . . . . . . . . . . . . . .
918
Evolutionary Computing

Investigation of Brood Size in GP with Brood Recombination Crossover for Object Recognition
Mengjie Zhang, Xiaoying Gao, Weijun Lou, Dongping Qian . . . . . . . .
923
An Immune Algorithm for the Optimal Maintenance of New Consecutive-Component Systems Y.-C. Hsieh, P.-S. You . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
929
Immune Genetic Algorithm and Its Application in Optimal Design of Intelligent AC Contactors Li-an Chen, Peiming Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
934
The Parametric Design Based on Organizational Evolutionary Algorithm Chunhong Cao, Bin Zhang, Limin Wang, Wenhui Li . . . . . . . . . . . . . .
940
Buying and Selling with Insurance in Open Multi-agent Marketplace Yuk-Hei Lam, Zili Zhang, Kok-Leong Ong . . . . . . . . . . . . . . . . . . . . . . . .
945
Game Ensemble Evolution of Checkers Players with Knowledge of Opening, Middle and Endgame Kyung-Joong Kim, Sung-Bae Cho . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
950
Dynamic Game Level Design Using Gaussian Mixture Model Sangkyung Lee, Keechul Jung . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
955
Machine Learning and Data Mining

Application Architecture of Data Mining in Telecom Customer Relationship Management Based on Swarm Intelligence
Peng Jin, Yunlong Zhu, Sufen Li, Kunyuan Hu . . . . . . . . . . . . . . . . . . .
960
Mining Image Sequence Similarity Patterns in Brain Images Haiwei Pan, Xiaoqin Xie, Wei Zhang, Jianzhong Li . . . . . . . . . . . . . . .
965
Weightily Averaged One-Dependence Estimators Liangxiao Jiang, Harry Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
970
SV-kNNC: An Algorithm for Improving the Efficiency of k-Nearest Neighbor Anantaporn Srisawat, Tanasanee Phienthrakul, Boonserm Kijsirikul . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
975
A Novel Support Vector Machine Metamodel for Business Risk Identification Kin Keung Lai, Lean Yu, Wei Huang, Shouyang Wang . . . . . . . . . . . .
980
Performing Locally Linear Embedding with Adaptable Neighborhood Size on Manifold Guihua Wen, Lijun Jiang, Jun Wen, Nigel R. Shadbolt . . . . . . . . . . . .
985
Stroke Number and Order Free Handwriting Recognition for Nepali K.C. Santosh, Cholwich Nattee . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
990
Diagnosis Model of Radio Frequency Impedance Matching in Plasma Equipment by Using Neural Network and Wavelets Byungwhan Kim, Jae Young Park, Dong Hwan Kim, Seung Soo Han . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
995
Program Plagiarism Detection Using Parse Tree Kernels Jeong-Woo Son, Seong-Bae Park, Se-Young Park . . . . . . . . . . . . . . . . . 1000 Determine the Optimal Parameter for Information Bottleneck Method Gang Li, Dong Liu, Yangdong Ye, Jia Rong . . . . . . . . . . . . . . . . . . . . . . 1005 Optimized Parameters for Missing Data Imputation Shichao Zhang, Yongsong Qin, Xiaofeng Zhu, Jilian Zhang, Chengqi Zhang . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1010 Expediting Model Selection for Support Vector Machines Based on an Advanced Data Reduction Algorithm Yu-Yen Ou, Guan-Hau Chen, Yen-Jen Oyang . . . . . . . . . . . . . . . . . . . . 1017
Study of the SMO Algorithm Applied in Power System Load Forecasting
Jingmin Wang, Kanzhang Wu . . . . . . . . 1022

Filtering Objectionable Image Based on Image Content
Zhiwei Jiang, Min Yao, Wensheng Yi . . . . . . . . 1027

MRA Kernel Matching Pursuit Machine
Qing Li, Licheng Jiao, Shuyuan Yang . . . . . . . . 1032

Multiclass Microarray Data Classification Using GA/ANN Method
Tsun-Chen Lin, Ru-Sheng Liu, Ya-Ting Chao, Shu-Yuan Chen . . . . . . . . 1037

Texture Classification Using Finite Ridgelet Transform and Support Vector Machines
Yunxia Liu, Yuhua Peng, Xinhong Zhou . . . . . . . . 1042

Reduction of the Multivariate Input Dimension Using Principal Component Analysis
Jianhui Xi, Min Han . . . . . . . . 1047

Designing Prolog Semantics for a Class of Observables
Lingzhong Zhao, Tianlong Gu, Junyan Qian, Guoyong Cai . . . . . . . . 1052

A Fingerprint Capture System and the Corresponding Image Quality Evaluation Algorithm Based on FPS200
Hong Huang, Jianwei Li, Wei He . . . . . . . . 1058

Multi-agent Motion Tracking Using the Particle Filter in ISpace with DINDs
TaeSeok Jin, ChangHoon Park, Soo-hong Park . . . . . . . . 1063

Combining Multiple Sets of Rules for Improving Classification Via Measuring Their Closenesses
Yaxin Bi, Shengli Wu, Xuming Huang, Gongde Guo . . . . . . . . 1068
Industrial Applications

Multiple SVMs Enabled Sales Forecasting Support System
Yukun Bao, Zhitao Liu, Rui Zhang, Wei Huang . . . . . . . . 1073

The Application of B-Spline Neurofuzzy Networks for Condition Monitoring of Metal Cutting Tool
Pan Fu, A.D. Hope . . . . . . . . 1078
Simplified Fuzzy-PID Controller of Data Link Antenna System for Moving Vehicles
Jong-kwon Kim, Soo-hong Park, TaeSeok Jin . . . . . . . . 1083

Neuron Based Nonlinear PID Control
Ning Wang, Jinmei Yu . . . . . . . . 1089
Information Retrieval

An Image Retrieval System Based on Colors and Shapes of Objects
Kuo-Lung Hong, Yung-Fu Chen, Yung-Kuan Chan, Chung-Chuan Cheng . . . . . . . . 1094

A Hybrid Mood Classification Approach for Blog Text
Yuchul Jung, Hogun Park, Sung Hyon Myaeng . . . . . . . . 1099

Modeling and Classification of Audio Signals Using Gradient-Based Fuzzy C-Means Algorithm with a Mercer Kernel
Dong-Chul Park, Chung Nguyen Tran, Byung-Jae Min, Sancho Park . . . . . . . . 1104

A Quick Rank Based on Web Structure
Hongbo Liu, Jiaxin Wang, Zehong Yang, Yixu Song . . . . . . . . 1109

A Biologically-inspired Computational Model for Perceiving the TROIs from Texture Images
Woobeom Lee, Wookhyun Kim . . . . . . . . 1114

A Computer-Assisted Environment on Referential Understanding to Enhance Academic Reading Comprehension
Wing-Kwong Wong, Jian-Hau Lee, Yu-Fen Yang, Hui-Chin Yeh, Chin-Pu Chiao, Sheng-Cheng Hsu . . . . . . . . 1119

An Object-Oriented Framework for Data Quality Management of Enterprise Data Warehouse
Li Wang, Lei Li . . . . . . . . 1125
Natural Language Processing

Extending HPSG Towards HDS as a Fragment of pCLL
Erqing Xu . . . . . . . . 1130
Chinese Multi-document Summarization Using Adaptive Clustering and Global Search Strategy
Dexi Liu, Yanxiang He, Donghong Ji, Hua Yang, Zhao Wu . . . . . . . . 1135

Genetic Algorithm Based Multi-document Summarization
Dexi Liu, Yanxiang He, Donghong Ji, Hua Yang . . . . . . . . 1140

MaxMatcher: Biological Concept Extraction Using Approximate Dictionary Lookup
Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu . . . . . . . . 1145

Bootstrapping Word Sense Disambiguation Using Dynamic Web Knowledge
Yuanyong Wang, Achim Hoffmann . . . . . . . . 1150

Automatic Construction of Object Oriented Design Models [UML Diagrams] from Natural Language Requirements Specification
G.S. Anandha Mala, G.V. Uma . . . . . . . . 1155

A Multi-word Term Extraction System
Jisong Chen, Chung-Hsing Yeh, Rowena Chau . . . . . . . . 1160
Neural Networks

A Multiscale Self-growing Probabilistic Decision-Based Neural Network for Segmentation of SAR Imagery
Xian-Bin Wen, Hua Zhang, Zheng Tian . . . . . . . . 1166

Face Detection Using an Adaptive Skin-Color Filter and FMM Neural Networks
Ho-Joon Kim, Tae-Wan Ryu, Juho Lee, Hyun-Seung Yang . . . . . . . . 1171

GA Optimized Wavelet Neural Networks
Jinhua Xu . . . . . . . . 1176

The Optimal Solution of TSP Using the New Mixture Initialization and Sequential Transformation Method in Genetic Algorithm
Rae-Goo Kang, Chai-Yeoung Jung . . . . . . . . 1181

Steering Law Design for Single Gimbal Control Moment Gyroscopes Based on RBF Neural Networks
Zhong Wu, Wusheng Chou, Kongming Wei . . . . . . . . 1186
Automatic Design of Hierarchical RBF Networks for System Identification
Yuehui Chen, Bo Yang, Jin Zhou . . . . . . . . 1191

Dynamically Subsumed-OVA SVMs for Fingerprint Classification
Jin-Hyuk Hong, Sung-Bae Cho . . . . . . . . 1196

Design on Supervised / Unsupervised Learning Reconfigurable Digital Neural Network Structure
In Gab Yu, Yong Min Lee, Seong Won Yeo, Chong Ho Lee . . . . . . . . 1201

Car Plate Localization Using Pulse Coupled Neural Network in Complicated Environment
Ming Guo, Lei Wang, Xin Yuan . . . . . . . . 1206

A Split-Step PSO Algorithm in Predicting Construction Litigation Outcome
Kwok-wing Chau . . . . . . . . 1211
Computer Vision

An Efficient Unsupervised MRF Image Clustering Method
Yimin Hou, Lei Guo, Xiangmin Lun . . . . . . . . 1216

Robust Gaze Estimation for Human Computer Interaction
Kang Ryoung Park . . . . . . . . 1222

A New Iris Control Mechanism for Traffic Monitoring System
Young Sung Soh, Youngtak Kwon, Yakun Wang . . . . . . . . 1227

Invariant Object Recognition Using Circular Pairwise Convolutional Networks
Choon Hui Teo, Yong Haur Tay . . . . . . . . 1232

Face Detection Using Binary Template Matching and SVM
Qiong Wang, Wankou Yang, Huan Wang, Jingyu Yang, Yujie Zheng . . . . . . . . 1237

Gain Field Correction Fast Fuzzy c-Means Algorithm for Segmenting Magnetic Resonance Images
Jingjing Song, Qingjie Zhao, Yuanquan Wang, Jie Tian . . . . . . . . 1242

LVQ Based Distributed Video Coding with LDPC in Pixel Domain
Anhong Wang, Yao Zhao, Hao Wang . . . . . . . . 1248
Object Matching Using Generalized Hough Transform and Chamfer Matching
Tai-Hoon Cho . . . . . . . . 1253

Author Index . . . . . . . . 1259
Learning, Logic, and Probability: A Unified View

Pedro Domingos
Department of Computer Science and Engineering
University of Washington
Seattle, WA 98195, U.S.A.
[email protected] http://www.cs.washington.edu/homes/pedrod
AI systems must be able to learn, reason logically, and handle uncertainty. While much research has focused on each of these goals individually, only recently have we begun to attempt to achieve all three at once. In this talk, I describe Markov logic, a representation that combines first-order logic and probabilistic graphical models, and algorithms for learning and inference in it. Syntactically, Markov logic is first-order logic augmented with a weight for each formula. Semantically, a set of Markov logic formulas represents a probability distribution over possible worlds, in the form of a Markov network with one feature per grounding of a formula in the set, with the corresponding weight. Formulas are learned from relational databases using inductive logic programming techniques. Weights can be learned either generatively (using pseudo-likelihood optimization) or discriminatively (using a voted perceptron algorithm). Inference is performed by a weighted satisfiability solver or by Markov chain Monte Carlo, operating on the minimal subset of the ground network required for answering the query. Experiments in link prediction, entity resolution and other problems illustrate the promise of this approach. This work, joint with Stanley Kok, Hoifung Poon, Matthew Richardson, and Parag Singla, is described in further detail in Domingos et al. [1]. An open-source implementation of Markov logic and the algorithms described in this talk is available in the Alchemy package [2].
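The distribution just described can be written out explicitly. As a sketch of the standard Markov logic formulation (notation assumed here, not taken from the abstract itself): with $w_i$ the weight attached to formula $i$ and $n_i(x)$ the number of true groundings of formula $i$ in a possible world $x$,

```latex
P(X = x) \;=\; \frac{1}{Z} \exp\!\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z \;=\; \sum_{x'} \exp\!\Big( \sum_i w_i \, n_i(x') \Big).
```

Each grounding of formula $i$ contributes one binary feature to the Markov network, and all groundings of the same formula share the weight $w_i$, which is why the sum counts true groundings.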
References

1. Domingos, P., Kok, S., Poon, H., Richardson, M., Singla, P.: Unifying logical and statistical AI. In: Proceedings of the Twenty-First National Conference on Artificial Intelligence, AAAI Press, Boston, MA, U.S.A. (2006). http://www.cs.washington.edu/homes/pedrod/papers/aaai06c.pdf
2. Kok, S., Singla, P., Richardson, M., Domingos, P.: The Alchemy system for statistical relational AI. Technical report, Department of Computer Science and Engineering, University of Washington, Seattle, WA, U.S.A. (2005). http://www.cs.washington.edu/ai/alchemy
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, p. 1, 2006. c Springer-Verlag Berlin Heidelberg 2006
Impending Web Intelligence (WI) and Brain Informatics (BI) Research

Ning Zhong
Department of Information Engineering, Maebashi Institute of Technology, Japan
The International WIC Institute, Beijing University of Technology, China
[email protected]

In this talk, we give a new perspective on Web Intelligence (WI) research from the viewpoint of Brain Informatics (BI), a new interdisciplinary field that studies the mechanisms of human information processing from both the macro and micro viewpoints by combining experimental cognitive neuroscience with advanced information technology. As two related emerging fields of research, WI and BI mutually support each other. When WI meets BI, it is possible to have a unified and holistic framework for the study of machine intelligence, human intelligence, and social intelligence. We argue that new instruments like fMRI and information technology will revolutionize both Web intelligence and brain sciences. This revolution will be bi-directional: new understanding of human intelligence through brain sciences will yield a new generation of Web intelligence research and development, and Web intelligence portal techniques will provide a powerful new platform for brain sciences. The synergy between these two fields will advance our understanding of knowledge, intelligence, and creativity. As a result, Web intelligence will become a central topic that will change the nature of information technology in general, and artificial intelligence in particular, towards human-level Web intelligence.
Learning with Unlabeled Data and Its Application to Image Retrieval

Zhi-Hua Zhou
National Laboratory for Novel Software Technology
Nanjing University, Nanjing 210093, China
[email protected]

Abstract. In many practical machine learning or data mining applications, unlabeled training examples are readily available but labeled ones are fairly expensive to obtain, because labeling the examples requires human effort. Learning with unlabeled data has therefore attracted much attention during the past few years. This paper shows how such techniques can be helpful in a difficult task, content-based image retrieval, improving the retrieval performance by exploiting the images already in the database.
1 Learning with Unlabeled Data
In the traditional setting of supervised learning, a large number of training examples should be available for building a model with good generalization ability. It is noteworthy that these training examples should be labeled, that is, their ground-truth labels are known to the learner. Unfortunately, in many practical machine learning or data mining applications such as web page classification, although a large number of unlabeled training examples can be easily collected, only a limited number of labeled training examples are available, since obtaining the labels requires human effort. So, exploiting unlabeled data to help supervised learning has become a hot topic during the past few years. Currently there are three main paradigms for learning with unlabeled data, i.e., semi-supervised learning, transductive learning and active learning.

Semi-supervised learning deals with methods for automatically exploiting unlabeled data in addition to labeled data to improve learning performance; that is, the exploitation of unlabeled data does not require human intervention. Here the key is to use the unlabeled data to help estimate the data distribution. For example, many approaches consider the contribution of the unlabeled examples by using a generative model for the classifier and employing EM to model the label estimation or parameter estimation process [7,9,12]. Note that previous research on semi-supervised learning has mainly focused on classification, while semi-supervised regression has only been studied recently [18]. A recent comprehensive review on semi-supervised learning can be found in [21].

Transductive learning is a cousin of semi-supervised learning, which also tries to exploit unlabeled data automatically. The main difference lies in the different
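As a toy illustration of the generative-model-plus-EM idea just mentioned (a sketch under simplifying assumptions, not any of the cited methods: one-dimensional data, two Gaussian classes with known equal variance and equal priors, and at least one labeled example per class), labeled points keep their hard labels while unlabeled points contribute soft responsibilities:

```python
import math

def ssl_em_means(labeled, unlabeled, iters=50, var=1.0):
    """labeled: list of (x, y) with y in {0, 1}; returns the two class means."""
    # Initialize each class mean from its labeled examples.
    mu = [sum(x for x, y in labeled if y == c) /
          max(1, sum(1 for _, y in labeled if y == c)) for c in (0, 1)]
    for _ in range(iters):
        # E-step: posterior probability of class 1 for each unlabeled point.
        def post1(x):
            p = [math.exp(-(x - m) ** 2 / (2 * var)) for m in mu]
            return p[1] / (p[0] + p[1])
        resp = [(x, post1(x)) for x in unlabeled]
        # M-step: re-estimate means from labeled (hard) + unlabeled (soft) points.
        for c in (0, 1):
            num = sum(x for x, y in labeled if y == c)
            den = sum(1 for _, y in labeled if y == c)
            num += sum(x * (r if c == 1 else 1 - r) for x, r in resp)
            den += sum((r if c == 1 else 1 - r) for _, r in resp)
            mu[c] = num / den
    return mu
```

With two labeled points and four unlabeled ones around them, the estimated means are pulled toward the unlabeled clusters, which is exactly the sense in which the unlabeled data help estimate the distribution.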
assumptions they hold: transductive learning assumes that the goal is to optimize the generalization ability on only a given test data set, and the unlabeled examples are exactly the test examples [5,14]; semi-supervised learning does not assume a known test set, and the unlabeled examples need not be test examples.

Active learning deals with methods that assume the learner has some control over the input space. In exploiting unlabeled data, it requires that there is an oracle, such as a human expert, that can be queried for the labels of specific instances, with the goal of minimizing the number of queries needed. Here the key is to select the unlabeled example on which the labeling will convey the most helpful information for the learner. There are two major schemes, i.e., uncertainty sampling and committee-based sampling. Approaches of the former train a single learner and then query the unlabeled examples on which the learner is least confident [6], while approaches of the latter generate a committee of multiple learners and select the unlabeled examples on which the committee members disagree the most [1,11].
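The two selection schemes can be sketched in a few lines (hypothetical helper functions, not taken from any cited system; the "committee disagreement" criterion here is the simple vote-margin variant):

```python
def least_confident(probs):
    """Uncertainty sampling: index of the example whose most likely
    class has the lowest predicted probability."""
    return min(range(len(probs)), key=lambda i: max(probs[i]))

def committee_disagreement(votes):
    """Committee-based sampling: index of the example on which the
    committee agrees least (smallest majority vote count)."""
    return min(range(len(votes)),
               key=lambda i: max(votes[i].count(v) for v in votes[i]))
```

For example, with class-probability rows `[[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]`, uncertainty sampling queries the second example, whose top-class probability (0.55) is lowest.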
2 A Machine Learning View of CBIR
With the rapid increase in the volume of digital image collections, content-based image retrieval (CBIR) has attracted a lot of research interest [13]. The user can pose an example image, i.e., a user query, and ask the system to bring out relevant images from the database. A main difficulty here is the gap between high-level semantics and low-level image features, due to the rich content but subjective semantics of an image. Relevance feedback has been shown to be a powerful tool for bridging this gap [10,15]. In relevance feedback, the user has the option of labeling a few images according to whether they are relevant to the target or not. The labeled images are then given to the CBIR system as complementary queries so that more images relevant to the user query can be retrieved from the database.

In fact, the retrieval engine of a CBIR system can be regarded as a machine learning process, which attempts to train a learner to classify the images in the database into two classes, i.e., positive (relevant) or negative (irrelevant). However, this learning task differs from traditional supervised learning tasks in several ways, which makes it interesting and challenging. First, few users will be so patient as to provide a lot of example images in the retrieval process. Therefore, even with relevance feedback, the number of labeled training examples is still very small. Second, few users will be so patient as to take part in a time-consuming interaction process. Therefore, the learning process should meet a real-time requirement. Third, instead of returning a crisp binary classification, the learner is expected to produce a ranking of the images: the higher the rank, the more relevant the image. Fourth, in typical supervised learning the concept classes are known in advance, but in CBIR, since an image can be relevant to one query but irrelevant to another, the concept classes are dynamic and cannot be given a priori. Last but not least, typical
machine learning algorithms treat the positive and negative examples symmetrically and assume that both sets are distributed approximately equally. In CBIR, although it is reasonable to assume that all the positive examples belong to the same target class, it is usually not valid to make the same assumption for the negative ones, because different negative examples may belong to different irrelevant classes, and the small number of negative examples can hardly be representative of all the irrelevant classes.
3 Why Exploit Images in the Database?
Section 2 mentioned that in CBIR, even with relevance feedback, the number of example images provided by the user is still very limited. However, there are abundant images in the database. Can those images be helpful? Of course. It is well known that a main difficulty of CBIR is the gap between high-level semantics and low-level image features, due to the rich content but subjective semantics of an image. This problem can hardly be solved by simply using stronger visual features, but it can be alleviated to some degree by using more example images. Usually, the target concept being queried by the user becomes clearer as the user gives more example images. In fact, the relevance feedback mechanism works simply because more example images are given by the user during the feedback process. Thus, considering the example images as labeled training examples and the images in the database as unlabeled training examples, the CBIR problem resembles what has motivated the research on learning with unlabeled examples: there are a limited number of labeled training examples, which are not sufficient for training a strong learner, but there are abundant unlabeled training examples which can be exploited. So, it is evident that techniques of learning with unlabeled data can be used to help improve the retrieval performance. Note that when the CBIR process is executed on a given database, the task can be mapped to a transductive learning problem, since the generalization ability on that database is what is of concern; when the CBIR process is executed on an open image source, such as the web, the task can be mapped to a semi-supervised learning problem. On the other hand, since relevance feedback involves human interaction, active learning can be helpful. Thus, CBIR provides a good arena for techniques of learning with unlabeled data.
4 Some Results
We have designed some co-training style techniques for exploiting unlabeled data in CBIR [16,17]. Co-training was proposed by Blum and Mitchell [2], and has since been studied and extended by many researchers, becoming a popular scheme for learning with unlabeled data. In its original version, co-training trains two classifiers separately on two sufficient and redundant views, i.e. two attribute
sets each of which is sufficient for learning and conditionally independent of the other given the class label, and uses the predictions of each classifier on unlabeled examples to augment the training set of the other. Later, variants which do not require sufficient and redundant views were presented [3,19], as was an active learning variant [8].

In order to avoid a complicated learning process, so that the real-time requirement of CBIR can be met, we [16,17] employ a very simple model to realize two learners, which use Minkowski distances of different orders to measure image similarities. Each learner gives every unlabeled image a rank, a value between −1 and +1, where positive/negative means the learner judges the image to be relevant/irrelevant, and the bigger the absolute value of the rank, the stronger the learner's confidence in its judgement. Then, each learner chooses some unlabeled images to label for the other learner according to the rank information. After that, both learners are re-trained with the enlarged labeled training sets, and each produces a new rank for the unlabeled images. The new ranks generated by the learners can be easily combined, which yields the final rank for every unlabeled image. Unlabeled images with top ranks are then returned as the retrieval results, displayed in descending order of their rank values. Besides, unlabeled images whose ranks have the smallest absolute values (i.e., near 0) are put into a pool, which is then used for the user to give feedback. With such an active learning scheme, the images labeled by the user in the relevance feedback process have a greater chance of being the ones that are most helpful in improving the retrieval performance. It has been shown that introducing both semi-supervised learning and active learning into CBIR is beneficial [16,17].
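A minimal sketch of this ranking scheme follows. It is a hypothetical simplification, not the published algorithm: each "view" scores an image in (−1, 1) from its nearest-positive versus nearest-negative Minkowski distance (orders 1 and 2), the two scores are averaged, and the pseudo-labeling exchange between the learners is omitted:

```python
def minkowski(a, b, p):
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def rank(img, pos, neg, p):
    # Crude relevance score in (-1, 1): positive when the image lies
    # closer to the relevant examples than to the irrelevant ones.
    dp = min(minkowski(img, q, p) for q in pos)
    dn = min(minkowski(img, q, p) for q in neg)
    return (dn - dp) / (dn + dp + 1e-9)

def retrieve(unlabeled, pos, neg, k=3):
    # Two learners = Minkowski orders 1 and 2; combined rank is their mean.
    scores = [(0.5 * (rank(x, pos, neg, 1) + rank(x, pos, neg, 2)), x)
              for x in unlabeled]
    scores.sort(key=lambda s: -s[0])
    results = [x for _, x in scores[:k]]                  # top ranks -> results
    pool = sorted(scores, key=lambda s: abs(s[0]))[:k]    # near-zero -> feedback
    return results, [x for _, x in pool]
```

With one relevant example at (0, 0) and one irrelevant example at (10, 10), the image (1, 1) is ranked first, while the ambiguous image (5, 5) gets a near-zero rank and lands in the feedback pool, mirroring the active learning scheme described above.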
The above approach works within the relevance feedback process, where several labeled training examples are available. As for the initial retrieval, since there is only one labeled training example, i.e., the user query, exploiting unlabeled examples is more difficult. Such an extreme setting had not been studied before in the area of learning with unlabeled data. In a recent work [20] we have shown that when the images carry textual annotations, exploiting unlabeled images to improve the retrieval performance is feasible even in the initial retrieval. The key is to induce some additional labeled training examples by using Kernel Canonical Correlation Analysis [4] to exploit the correlations between the visual features and the textual annotations. Such an approach can easily be generalized to other cases where there is only one labeled training example but the data have two views.
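For readers unfamiliar with (kernel) canonical correlation analysis, the underlying objective can be stated in standard notation (assumed here, not taken from this paper): find projection directions $\mathbf{w}_x$, $\mathbf{w}_y$ for the two views (visual features and textual annotations) whose projections are maximally correlated,

```latex
\rho \;=\; \max_{\mathbf{w}_x,\,\mathbf{w}_y}
\frac{\mathbf{w}_x^{\top} C_{xy}\,\mathbf{w}_y}
     {\sqrt{\mathbf{w}_x^{\top} C_{xx}\,\mathbf{w}_x}\,
      \sqrt{\mathbf{w}_y^{\top} C_{yy}\,\mathbf{w}_y}},
```

where $C_{xx}$, $C_{yy}$ are the within-view covariance matrices and $C_{xy}$ the between-view covariance. In the kernel variant [4], the directions are sought in feature spaces induced by kernels over each view, with regularization to avoid degenerate perfect correlations.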
5 Conclusion
Techniques of learning with unlabeled data are helpful in diverse machine learning and data mining applications. This paper shows how they can help enhance the performance of CBIR, which exhibits an encouraging new direction for image retrieval research. Note that we do not claim that the retrieval performance can be boosted to a level that will 'make the user satisfied'; that is still a long way off. What we have claimed is that
the retrieval performance can be enhanced by exploiting the unlabeled images. Moreover, we believe that CBIR can raise many interesting machine learning research topics, the outputs of which will not only benefit CBIR but also generalize to other learning tasks.
Acknowledgment

This research was partially supported by FANEDD (200343) and NSFC (60325207).
References

1. N. Abe and H. Mamitsuka. Query learning strategies using boosting and bagging. In Proceedings of the 15th International Conference on Machine Learning, pages 1–9, Madison, WI, 1998.
2. A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the 11th Annual Conference on Computational Learning Theory, pages 92–100, Madison, WI, 1998.
3. S. Goldman and Y. Zhou. Enhancing supervised learning with unlabeled data. In Proceedings of the 17th International Conference on Machine Learning, pages 327–334, San Francisco, CA, 2000.
4. D. R. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical correlation analysis: an overview with application to learning methods. Neural Computation, 16(12):2639–2664, 2004.
5. T. Joachims. Transductive inference for text classification using support vector machines. In Proceedings of the 16th International Conference on Machine Learning, pages 200–209, Bled, Slovenia, 1999.
6. D. Lewis and W. Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3–12, Dublin, Ireland, 1994.
7. D. J. Miller and H. S. Uyar. A mixture of experts classifier with learning based on both labelled and unlabelled data. In M. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 571–577. MIT Press, Cambridge, MA, 1997.
8. I. Muslea, S. Minton, and C. A. Knoblock. Selective sampling with redundant views. In Proceedings of the 17th National Conference on Artificial Intelligence, pages 621–626, Austin, TX, 2000.
9. K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2-3):103–134, 2000.
10. Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra. Relevance feedback: a power tool for interactive content-based image retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 8(5):644–655, 1998.
11. H. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Proceedings of the 5th ACM Workshop on Computational Learning Theory, pages 287–294, Pittsburgh, PA, 1992.
12. B. Shahshahani and D. Landgrebe. The effect of unlabeled samples in reducing the small sample size problem and mitigating the Hughes phenomenon. IEEE Transactions on Geoscience and Remote Sensing, 32(5):1087–1095, 1994.
10
Z.-H. Zhou
13. A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Contentbased image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349–1380, 2000. 14. V. N. Vapnik. Statistical Learning Theory. Wiley, New York, 1998. 15. X. S. Zhou and T. S. Huang. Relevance feedback in image retrieval: a comprehensive review. Multimedia Systems, 8(6):536–544, 2003. 16. Z.-H. Zhou, K.-J. Chen, and H.-B. Dai. Enhancing relevance feedback in image retrieval using unlabeled data. ACM Transactions on Information Systems, 24(2), 2006. 17. Z.-H. Zhou, K.-J. Chen, and Y. Jiang. Exploiting unlabeled data in contentbased image retrieval. In Proceedings of the 15th European Conference on Machine Learning, pages 525–536, Pisa, Italy, 2004. 18. Z.-H. Zhou and M. Li. Semi-supervised learning with co-training. In Proceedings of the 19th International Joint Conference on Artificial Intelligence, pages 908–913, Edinburgh, Scotland, 2005. 19. Z.-H. Zhou and M. Li. Tri-training: exploiting unlabeled data using three classifiers. IEEE Transactions on Knowledge and Data Engineering, 17(11):1529–1541, 2005. 20. Z.-H. Zhou, D.-C. Zhan, and Q. Yang. Semi-supervised learning with a single labeled example. Unpublished manuscript. 21. X. Zhu. Semi-supervised learning literature survey. Technical Report 1530, Department of Computer Sciences, University of Wisconsin at Madison, Madison, WI, 2005. http://www.cs.wisc.edu/∼jerryzhu/pub/ssl survey.pdf.
Learning as Abductive Deliberations

Budhitama Subagdja1, Iyad Rahwan2, and Liz Sonenberg1

1 Department of Information Systems, University of Melbourne
2 Institute of Informatics, The British University in Dubai;
(Fellow) School of Informatics, University of Edinburgh, UK
Abstract. This paper describes an architecture for a BDI agent that can learn from its own experience. The learning is conducted through explicit procedural knowledge, or plans, in a goal-directed manner, and is realized by encoding abductions within the deliberation process. With this model, the agent is capable of modifying its own plans at runtime. We demonstrate that by abducing complex plan structures, the agent can also acquire complex structures of knowledge about its interaction with the environment.
1  Introduction
The BDI (Beliefs, Desires, Intentions) agent model [10] is a design framework commonly used to develop agents that behave both deliberatively and reactively in a complex, changing environment. The main principle is to use explicit representations of the agents’ own mental attitudes (in terms of attributes such as beliefs, desires, and intentions) to direct their actions and their choice of the appropriate predefined plan. To develop such a system, the designer defines some initial mental conditions and explicitly describes plans corresponding to the agent’s behavior in a plan repository. Variability in behavior is attained through the process of deliberation. However, it is always possible that unforeseen conditions require modification of the prescribed plans or knowledge, rather than simply switching from one plan to another. Although the exhibited behavior can adapt reactively, the plans that direct or guide behavior in a BDI agent are still fixed in advance of system execution. Most existing BDI frameworks remain incapable of modifying plans or recipes for actions at runtime. In this paper, we present a new model of learning in BDI agents. We use meta-level plans, expressed in general programming constructs, to enable the agent to specify learning and deliberation steps explicitly. This enables the agent to introspectively monitor its own mental state and update its plans at runtime. Learning is regarded as a kind of deliberation process in which the agent makes plausible hypotheses about expected outcomes and creates (or modifies) plans if the hypotheses are proven. This kind of process is also known as abduction, a term coined by C.S. Peirce [6]. In this case, the agent is not just selecting the best option available but also expecting useful knowledge to be acquired if the selection fails. Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 11–20, 2006.
© Springer-Verlag Berlin Heidelberg 2006
12
B. Subagdja, I. Rahwan, and L. Sonenberg
This work advances the state of the art by combining the strengths of learning and BDI agent frameworks in a rich language for describing deliberation processes. In particular, our approach enables domain experts to specify learning processes and strategies explicitly, while still benefiting from procedural domain knowledge expressed in plan recipes (as opposed to generating and learning plans from scratch). The remainder of this paper is structured as follows. The next section explains the architecture of a BDI agent; it also describes the concept of deliberation processes as meta-level plans which can accommodate abductions. Section 3 then explains how learning can be described explicitly in terms of meta-level plans and deliberation processes; there we describe some primitives for learning and some examples of generic strategies for experience-based plan construction. Section 4 illustrates the characteristics of the learning approach through a case study. Section 5 discusses related work on learning intentional agents. Finally, the last section concludes the paper.
2  BDI Agent Architecture
The BDI architecture works as an interpreter interacting with different data structures. In PRS [4], a commonly used BDI implementation model, there are four types of data structure. Firstly, beliefs or the belief base (B) correspond to a model or knowledge about the world, which can be updated directly by events captured by the sensors. Secondly, the agent’s desires or goals (G) correspond to conditions the agent wants to fulfill. Desires invoke a search for ways to achieve them, of which one (or several) is selected to act upon. Thirdly, the selected ways committed to for execution are the intentions (Γ). Lastly, the knowledge of how to achieve certain desires or goals is stored in the plans, or the plan library. The common process of a BDI interpreter that drives the agent’s behavior is an iteration of steps: updating beliefs based on observation of the world, deciding which intention to achieve, choosing a plan to achieve intentions, and executing the plan [12]. The interpreter goes through a control loop consisting of observation, intention filtering, and plan selection. An adopted intention is committed for execution to its end. If something goes wrong with the intention, the agent can reconsider its intention, select another plan as an alternative, or just drop the intention and select another one. In PRS-like agents, the loop may produce a hierarchical structure of intentions. A selected intention may invoke further deliberations which produce other intentions standing in subordinate relations to the former. This hierarchical structure is also called the intention structure. The intention structure represents a stack structure consisting of goals, subgoals, and their intentions. It maintains information about the state of the agent’s choices and actions and limits the number of choices to be considered at each deliberation moment, thus reducing computational complexity at every cycle.
By using this structure, a goal can be broken down further to be more specific while the agent behaves reactively to changes in the environment.
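The interpreter loop described above can be sketched as a small Python program. This is an illustrative toy, not any real PRS implementation; the class and method names (BDIAgent, step, and so on) are our own invention.

```python
# Toy sketch of the PRS-style observe/filter/select/execute cycle.
class Plan:
    def __init__(self, trigger, precond, body):
        self.trigger, self.precond, self.body = trigger, precond, body

class BDIAgent:
    def __init__(self, plans):
        self.beliefs, self.goals, self.intentions = set(), [], []
        self.plans = plans  # the plan library

    def step(self, percepts):
        self.beliefs |= percepts                      # 1. observe
        if self.goals and not self.intentions:        # 2. filter options
            goal = self.goals[0]
            for plan in self.plans:                   # 3. select a plan
                if plan.trigger == goal and plan.precond <= self.beliefs:
                    self.intentions.append(list(plan.body))  # 4. intend
                    break
        if self.intentions:                           # 5. execute one action
            body = self.intentions[-1]
            action = body.pop(0)
            self.beliefs.add(action)  # pretend actions assert their effects
            if not body:              # intention finished: pop it and its goal
                self.intentions.pop()
                self.goals.pop(0)

plans = [Plan("get_reward", {"hungry"}, ["press_black", "press_white"])]
agent = BDIAgent(plans)
agent.goals.append("get_reward")
agent.step({"hungry"})   # adopts the plan, executes press_black
agent.step(set())        # executes press_white, goal achieved
```

Real BDI interpreters interleave these steps with intention reconsideration; the sketch only shows the single-intention happy path.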
2.1  Plans and Intentions
A plan represents procedural knowledge or know-how. As knowledge for accomplishing a task, a plan is used as a recipe that guides an agent in its decision-making process, hence reducing search through alternative solutions [9]. In classical STRIPS planning [2], a plan consists of a set of operators or actions, each with attributes such as a list of preconditions, an add list, and a delete list. Definition 1. An action α is a tuple ⟨Aα, Pα, Δα, Σα⟩ in which Aα is the action name; Pα is a list of conditions that must be believed to be true prior to the execution of α; Δα is a list of conditions that must be believed to be false after the performance of α; and Σα are those that are believed to be true after the performance of α. Conditions are expressed as literals which can be propositions or predicate logic statements. A plan can be considered as an encapsulated description of actions with its consequences and contexts. It may represent just a single action or it can describe a complex relationship between actions. Similar to an action, a plan also has contextual descriptions like preconditions and effects (add or delete lists). In addition, a plan can also have attributes like a trigger (goal) and a body that describes relationships between actions. Definition 2. A plan π is a tuple ⟨ϕπ, Pπ, Σπ, Δπ, Bπ⟩ where Pπ, Σπ, and Δπ are respectively the preconditions, add list, and delete list, with the same meaning as the corresponding symbols in the action definition above. The trigger ϕπ is the goal that triggers the activation of the plan. The plan body Bπ describes actions and their relationships. The plan goal states what is wanted or desired by executing the plan. There are two types of goals: achieve and perform. A plan with an achieve goal says that a condition stated in the goal will hold or be true after performing the actions described in the plan body.
A perform goal, on the other hand, tells that actions described in the goal will be performed if the plan is executed. Actions described in a perform goal or a plan body are represented as a composite action. Definition 3. A composite action τC[φ1, ..., φn] states the relationship between actions. τC is the type of the relation, in which τC ∈ ΥC and φi can be an action, a proposition, or another composite action forming a nested structure of action relationships. If ια is a composite action, ν ← ια is an assignment of the result of the action ια to the variable ν.
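Definitions 1–3 translate naturally into simple data structures. The following Python sketch is illustrative only; the field names and the Composite encoding are our assumptions, not notation from the paper.

```python
# Sketch of Definitions 1-3 as Python dataclasses (names illustrative).
from dataclasses import dataclass

@dataclass
class Action:                 # Definition 1: <A, P, Delta, Sigma>
    name: str
    preconds: frozenset       # P: must be believed true before execution
    del_list: frozenset       # Delta: believed false afterwards
    add_list: frozenset       # Sigma: believed true afterwards

@dataclass
class Composite:              # Definition 3: tau_C[phi_1, ..., phi_n]
    rel_type: str             # e.g. "seq", "seq-choices", "cycle", "do"
    parts: tuple              # actions, conditions, or nested Composites

@dataclass
class Plan:                   # Definition 2: <phi, P, Sigma, Delta, B>
    trigger: str              # phi: the goal that activates the plan
    preconds: frozenset
    add_list: frozenset
    del_list: frozenset
    body: Composite           # B: a composite-action structure

grasp = Action("grasp", frozenset({"holding_nothing"}),
               frozenset({"holding_nothing"}), frozenset({"holding_object"}))
plan = Plan("achieve(holding_object)", frozenset(), frozenset(), frozenset(),
            Composite("seq", (Composite("do", ("select_object",)),
                              Composite("do", (grasp,)))))
```

The nested Composite mirrors the composite action seq[do[select object], do[grasp]] used in the example below.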
There can be many types of structure in ΥC. Due to space limitations, Table 1 only describes those that seem most relevant and important. A variable, once bound to a value, can be used for a later purpose through variables in term parameters. For example, the composite action seq[do[X ← select object], do[grasp(X)]] states that an object X is selected and then grasped. The object value bound to the variable X as a result of the selection action is fed into the action grasp. In the deliberation cycle of the BDI interpreter, a plan is selected from the plan library based on current goals and intentions. The selected plan is instantiated
Table 1. Action structures and relationships

  do[α]: execute a single action α
  confirm[c]: confirm that condition c is true in the agent’s beliefs
  conclude[c]: conclude that condition c is true by asserting it to the agent’s beliefs
  wait[c]: wait until condition c is true
  subgoal[ϕ]: post the goal ϕ as a subgoal
  seq[β1, ..., βn]: execute substructures β1 to βn consecutively
  seq-choices[β1, ..., βn]: try to execute substructures β1 to βn consecutively until one of them executes successfully
  cycle[β1, ..., βn, ⟨until c⟩]: iteratively execute all substructures in order, stopping when the optional condition c becomes true
and incorporated as an intention. The intention is put on the intention structure before it is executed later on. A plan instance, or intention, stores an index that locates the currently selected goal or action in the corresponding plan body. It also maintains information about variable bindings and the state of the intention. An intention can be in a scheduled, succeeded, failed, pushed, or waiting state. If the plan body of a plan instance has a nested structure, then the substructure of the composite action becomes a new intention which is concatenated to the intention of that plan instance. The same is done for a composite action that posts a subgoal: another plan instance for achieving the subgoal will be concatenated at that location in the intention structure.

2.2  Meta-level Plans and Abductions
In the previous section we described the model of plans and intentions in a BDI agent architecture. In this section, we explain the use of meta-level plans for controlling deliberation, and show how meta-level plans can leverage the deliberation process with abductions. The original PRS model assumes that the deliberation process is handled by the use of meta-level plans [5]. Instances of meta-level plans can obtain information from the intention structure and change it at runtime. A meta-level plan for the deliberation process can be characterized as a plan which contains meta-actions, that is, actions that deal with goals, intentions, and plans. For example, the composite action described below shows some parts of the deliberation process.

cycle[ do[observe],
       G ← do[consider options],
       I ← do[filter options(G)],
       P ← do[select plan(I)],
       do[intend(I, P)] ]

This structure of composite actions can be put initially on the intention structure. It works as an infinite loop of observe for updating beliefs, consider options for generating options, filter options for selecting intentions, select plan for selecting a plan instance, and intend, which inserts the selected plan into its corresponding intention and puts it on the intention structure. Objects passed or
exchanged between actions are goal options (G), selected intentions (I), and a plan (P). The intention execution and reconsideration parts of the loop are skipped for simplicity, and it is assumed that executing the intentions on the intention structure is done by the interpreter as a default process. Based on this process of deliberation and execution, it is possible to say that the agent decides on a plan and selects an action based on its beliefs and goals. We assume that if the agent has a plan for achieving a goal, it also means that the agent believes that executing the actions described in the plan will bring about the goal. This kind of process of selecting and adopting a plan instance can be regarded as a deductive inference. When there is a failure in executing the plan, a re-deliberation or re-selection process can be conducted from the beginning with a refreshed condition of beliefs. A more sophisticated technique is to retain the history of past failures so that a failed plan instance will never be chosen a second time. In this paper, we suggest that abductions can be incorporated into the deliberation process so that the agent may not just re-deliberate to deal with failures and changes but also anticipate what would happen. The agent tries to develop explanations about possible failures while it tries to achieve the goal. In our model, the abduction is activated by the deliberation. The agent decides not just the goal and the intention to fulfill but also some proofs of hypothetical failures. The composite action for the deliberation cycle becomes as follows:

cycle[ do[observe],
       G ← do[consider options],
       I ← do[filter options(G)],
       P ← do[select plan(I)],
       H ← do[make hypothesis(G, I)],
       P′ ← do[select testing plan(H, I)],
       do[intend(I, P)],
       do[intend(H, P′)] ]
The action make hypothesis generates hypotheses based on prescribed beliefs about what kind of situation might come up. The hypothesis made can be expressed as a composite action describing the sequence of events or actions that would happen. The plan for testing the hypothesis can be described as a meta-level plan. This meta-level plan involves types of actions that can capture changes in the agent’s mental state (e.g. wait, confirm). The successful execution of the test will produce or revise a belief about the thing that is hypothesized. When a test to prove a hypothesis fails, it is simply dropped or removed from the intention structure, just like a normal plan execution. The testing plan will be re-activated in the next deliberation cycle if the same goal still needs to be achieved. The testing plan may also update beliefs about the quality, preference, or confidence level of a plan so that the plan has a greater chance of being picked up by the action select plan in the next cycle of the deliberation loop. Different proving or testing strategies can be used for testing different hypotheses. For example, a hypothesis testing plan can have a composite action like the following: seq[wait[done(I)], confirm[success(I)], do[assert belief(H, I)]]. This composite action can be used to prove that the intention I will be executed successfully, so that a new belief about the hypothesis H in relation to I can be asserted. The action wait waits for I until it finishes before confirming its success.
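As an illustration, the simple testing plan seq[wait[done(I)], confirm[success(I)], do[assert belief(H, I)]] can be mimicked by an event-driven coroutine. All names below (test_hypothesis, the event tuples) are hypothetical, not part of the architecture.

```python
# Sketch of the hypothesis test as an event-driven coroutine: it waits for
# the watched intention to finish, confirms success, then asserts a belief.
def test_hypothesis(intention_id, hypothesis, beliefs):
    """Send the coroutine ("done", who) events; when the watched intention
    finishes, send a success flag to decide whether the belief is asserted."""
    while True:
        event, who = yield                       # wait[done(I)]
        if event == "done" and who == intention_id:
            break
    success = yield "query_success"              # confirm[success(I)]
    if success:
        beliefs.add((hypothesis, intention_id))  # assert_belief(H, I)

beliefs = set()
t = test_hypothesis("I1", "plan_works", beliefs)
next(t)                       # start the coroutine
t.send(("done", "I2"))        # another intention finished: keep waiting
t.send(("done", "I1"))        # the watched intention finished
try:
    t.send(True)              # report success: belief gets asserted
except StopIteration:
    pass                      # the test plan has run to completion
```

If the success flag were False, the coroutine would terminate without asserting anything, mirroring a dropped test plan.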
In another case, a different testing strategy might have a more complex composite action, like the following:

seq-choices[ cycle[ wait[done(I)],
                    confirm[not success(I)],
                    H ← do[append(H, I)] ],
             do[assert belief(H, I)] ]

This composite action can be used to prove that the actions in the intention I will eventually succeed, or reach the goal, if they are repeated a certain number of times. The first branch of the seq-choices captures an unsuccessful attempt and updates the hypothesis with an additional step. The second branch is executed when a successful attempt is found, so that a belief about the repetition structure can be asserted. This hypothesis testing strategy still has a flaw: it deals with only a single alternative action. An infinite loop might be produced if the goal cannot be attained. However, the structure can trivially be amended by inserting some confirm actions on both branches to test whether the length of the repetition exceeds a certain limit. To deal with complex situations and problems, different abduction plans can be given to test different possible structures of actions over several attempts at goal achievement. The abduction plans can be provided by the domain expert or the agent designer as heuristics for acquiring knowledge.
3  Representing Learning Processes
Learning in BDI agents can be defined as abduction-deliberation processes which can result in the improvement of the agent’s performance. The approach to improvement suggested in this paper is to modify or generate plans in the plan library. The hypotheses confirmed through the abduction process are candidate plans, and the steps of confirmation are followed by plan generation or modification. For example, the following composite action can be considered as the body of a learning plan:

seq-choices[ cycle[ wait[done(I)],
                    confirm[not success(I)],
                    H ← do[add plan step(H, I)] ],
             do[create plan(H, I)] ]

This composite action is similar to the example of a composite action for hypothesis testing mentioned above: it captures repetitive actions for achieving the goal. However, the end result of the testing process is a new plan. In this case, the hypothesis H is a template for the possible plan. The learning plan can be seen as trying to confirm that there is a sequence of repetitive actions that eventually reaches the goal. If a repetition of actions is confirmed, a new plan with a repetition or a sequence structure can be asserted.
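A minimal sketch of the plan-generation step follows, assuming a plan is just a dictionary with a trigger, a precondition and a body; create_plan and the cycle/seq encoding are illustrative choices, not the paper's implementation.

```python
# Sketch of turning a confirmed hypothesis into a new plan-library entry.
def create_plan(goal, precondition, confirmed_steps, plan_library):
    """Wrap a confirmed sequence of steps into a plan recipe and store it."""
    if len(set(confirmed_steps)) == 1:
        body = ("cycle", tuple(confirmed_steps))  # repetition structure
    else:
        body = ("seq", tuple(confirmed_steps))    # plain sequence
    plan = {"trigger": goal, "precond": precondition, "body": body}
    plan_library.append(plan)
    return plan

library = []
create_plan("get_reward", {"button_pos(black, white)"},
            ["press(right)", "press(right)"], library)
# stored as a repetition ("cycle") structure, since all steps are the same
```

The repetition-versus-sequence choice here stands in for the "repetition or sequence structure" the text mentions; a real implementation would derive it from the confirmed hypothesis template.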
4  Case Study
This section shows some examples of learning plans and the development of the agent’s knowledge when the agent is given certain tasks and is situated in
a certain environment. In order to implement the experiment for studying the characteristics of the learning agent, we have developed a special type of BDI interpreter which supports introspective plan monitoring and modification at runtime.

4.1  The Rat’s World
The Rat’s World is an implemented simulation inspired by the psychological experiment of operant conditioning. An artificial rat agent is put in a designated place with desires of getting some rewards (metaphorically, some cheese). To get the reward, the agent must select (press) some buttons in a particular order. Assume there are two buttons, each with a different color (say, a black and a white button). If the appropriate order has been set up so that the reward is obtained by first pressing the black button followed by the white one, the rat can learn the combination over several trials of the same situation and converge to the right sequence (some reinforcement learning algorithms, like Q-learning, can learn this kind of task very well). However, a simple modification makes this problem non-trivial. In particular, the situation becomes complicated when the position of the buttons is randomly swapped on every trial. At any one moment, the agent has beliefs about the buttons’ positions and its own last action. Following the Markovian model of decision processes, these beliefs represent a state. Let’s say, initially, the agent has the following belief based on its initial perception: button pos(black, white), which states that the black button is on the left and the white one is on the right. The predicate last act(A) is used to refer to the last action taken, and is added after the agent performs its first action. A can be press(left) for pressing the button on the left or press(right) for pressing the one on the right. The agent is provided with some initial plans for getting the reward. One plan contains the action do[press(left)] and the other do[press(right)]. With these initial plans, a deliberation process will cause the agent to press the left or the right button in a random fashion to get the reward. A simple learning plan can be made which is triggered by a drop in the agent’s performance level (a high rate of failures).
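The Rat's World environment itself is easy to reproduce. The sketch below is a minimal, hypothetical reconstruction (class and predicate names are ours); it rewards pressing black then white and can swap button positions between trials.

```python
# Minimal sketch of the Rat's World: reward requires pressing the black
# button then the white one; positions may swap randomly between trials.
import random

class RatsWorld:
    def __init__(self, swap=True, seed=0):
        self.rng = random.Random(seed)
        self.swap = swap
        self.new_trial()

    def new_trial(self):
        self.buttons = ["black", "white"]      # [left, right]
        if self.swap and self.rng.random() < 0.5:
            self.buttons.reverse()             # positions swapped this trial
        self.pressed = []

    def observe(self):
        # e.g. "button_pos(black, white)" means the black button is on the left
        return f"button_pos({self.buttons[0]}, {self.buttons[1]})"

    def press(self, side):                     # side: "left" or "right"
        self.pressed.append(self.buttons[0 if side == "left" else 1])
        return self.pressed[-2:] == ["black", "white"]   # the cheese

world = RatsWorld(swap=False)
world.press("left")                # black (no swap, so black is on the left)
got_cheese = world.press("right")  # white: correct order, reward obtained
```

With swap=True the mapping from sides to colors changes per trial, which is exactly what defeats a plain state-to-action policy in the experiments below.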
The body of the learning plan (hypothesis testing) can be as follows:

seq[ wait[done(I)],
     S ← do[observe],
     seq-choices[
         seq[ confirm[success(I)],
              P ← do[create plan(I, S)] ],
         seq[ wait[done(I′)],
              confirm[success(I′)],
              P ← do[create plan(I′, S)],
              G ← do[obtain goal(I)],
              P ← do[add plan step(P, subgoal(G))] ] ],
     do[generate plan(P)] ]          (Learning Plan 1)
The learning plan monitors events produced by the intention I on the intention structure. When I succeeds straightaway, a new plan is created by the action create plan with the instance of actions in the intention I as its plan
body and the observed state S as its precondition. Otherwise, it waits for another intention I′ that succeeds straightaway, and a new plan is created with the plan instance of the intention I′ and the precondition S, but with a subgoal-posting action appended at the end. Learning Plan 1 produced two types of plan. One type maps a belief state directly to an action. The other maps a belief state to an action that reaches an intermediate state and posts the same subgoal recursively. For example, one plan produced has only the following structure in its body: do[press(right)]; another has the structure seq[do[press(right)], subgoal[get reward]]. Each generated plan has a precondition, for example, button pos(black, white) and last act(do[press(right)]).
The experiment conducted has shown that the learning plan described above is not effective in dealing with the dynamic situation. This is because the agent cannot distinguish observed states in which the buttons stay still from states in which the buttons have just been swapped. The agent relies only on chance and the probability distribution of the learnt plans in dealing with uncertainties. Figure 1(i) shows that the performance of learning with Learning Plan 1 is not much different from no learning at all. Over 400 learning trials in each of 20 cases, the performance on average stays only slightly above the 50% chance level, with a relatively high level of variability. The performance level is measured by the rate of successful attempts (getting the rewards).
Fig. 1. Performance of the agent in the Rat’s World domain: (i) with Learning Plan 1; (ii) with Learning Plan 2
We modified the learning plan by encoding a different hypothesis: the agent is made to wait one more action before producing a sequence of actions. The composite action of the learning plan above is modified as follows:

seq[ S ← do[observe],
     wait[done(I)],
     wait[done(I′)],
     confirm[I ≠ I′],
     confirm[fails(I)],
     confirm[success(I′)],
     P ← do[create plan(I, S)],
     P ← do[add plan step(P, planbody(I′))],
     do[generate plan(P)] ]          (Learning Plan 2)
This learning plan waits for two consecutive executions of actions from intentions I and I′. If both actions are different and the first action failed while the
second one succeeded, then a new plan is generated with the two consecutive actions in the plan body and the observed state as the precondition. The result of the learning is one type of plan with a sequence of different actions, as follows: seq[do[press(left)], do[press(right)]]. The plan is also provided with appropriate preconditions based on observations. The experiment using Learning Plan 2 produced much better results. Figure 1(ii) shows that successful attempts raise the performance quickly to maximum values. In all cases, the performance always reaches the maximum. This happens because the right interaction between the agent and the buttons can be modeled as a composite action. By hypothesizing a pattern of consecutive actions underlying the right interaction, a significant number of combinations of plan structures to be searched can be pruned. The experiments in the Rat’s World domain have clearly shown that a simple mapping between an observational state and a single action is not enough to make the agent learn the right model of interaction in a changing situation. The agent should first make assumptions about the model of its interaction with the environment. They also indicate that applying different patterns of hypotheses influences the agent’s capability to learn from the environment: using complex patterns of hypotheses makes the agent learn complex things.
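The confirmation logic of Learning Plan 2 can be sketched as a plain function, assuming each observed attempt is reported as an (action, succeeded) pair; all names here are illustrative.

```python
# Sketch of Learning Plan 2's confirmation step: two consecutive, different
# actions where the first fails and the second succeeds yield a new
# two-step plan conditioned on the observed state.
def learning_plan_2(state, first, second):
    """first/second: (action, succeeded) pairs observed in order.
    Returns a new plan dict if the hypothesis is confirmed, else None."""
    (a1, ok1), (a2, ok2) = first, second
    if a1 != a2 and not ok1 and ok2:      # confirm[I != I'], fails, success
        return {"precond": state,
                "body": ("seq", (a1, a2))}  # e.g. seq[press(left), press(right)]
    return None                           # hypothesis not confirmed

plan = learning_plan_2("button_pos(white, black)",
                       ("press(left)", False), ("press(right)", True))
```

The generated ("seq", ...) body corresponds to the learnt plan seq[do[press(left)], do[press(right)]] reported above.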
5  Related Work
There is some previous work on making BDI agents learn [1,3] by treating learning as a separate process conducted by separate modules. For example, Hernandez et al. [3] apply an inductive decision-tree learning algorithm that learns new plans by feeding logged events produced by the BDI engine to the learning program. In contrast, the learning processes suggested in this paper are mainly heuristics or knowledge that describe learning and abduction as parts of the BDI architecture. Other work has considered learning as part of the BDI mechanism [7,8,11]; however, it still assumes the creation of plans based on direct mappings from observational states to corresponding plans or actions. As demonstrated in the previous section, this kind of plan creation may lead to the perceptual aliasing problem, in which the agent is unable to distinguish one state from another. The advantage of putting learning into the explicit knowledge of the BDI architecture is that it enables the agent developer to specify learning processes and strategies easily from domain or expert knowledge. In any case, the main feature of the BDI agent architecture is that the behavior of the agent is driven mainly by procedural domain knowledge which can be prescribed as plan recipes, as opposed to generating plans from scratch.
6  Conclusion
In this paper, we describe a model of learning in the BDI agent architecture. We use meta-level plans to program learning as an explicit part of the agent’s
20
B. Subagdja, I. Rahwan, and L. Sonenberg
behavior. The meta-level plans enable the agent to introspectively monitor its own mental conditions and update its plans at runtime. Learning can be described as a process of abduction that tries to confirm the occurrence of particular plan structures. The experiments conducted have shown that the agent needs to make assumptions about its possible interactions with the environment; direct observations might be insufficient to learn the appropriate model. Although the heuristics shown in this paper are problem-specific, they can still be useful in a range of different situations. By providing a set of different types of heuristics for different classes of problems, the learning might cover various application domains. Moreover, the learning can be multi-strategic and reactive to change in the environment. However, further work is still needed to find the appropriate set of heuristics so that the approach can be practically useful.
References 1. C. Olivia, C.-F. Chang, C. F. Enguix, and A. K. Ghose. Case-based BDI agents: an effective approach for intelligent search on the world wide web. In Proceedings of the AAAI-99 Spring Symposium on Intelligent Agents in Cyberspace, Stanford University, USA, 1999. 2. R. Fikes and N. Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2:189–208, 1971. 3. A. G. Hernandez, A. E. Segrouchini, and H. Soldano. BDI multiagent learning based on first-order induction of logical decision trees. In N. Zhong, J. Liu, S. Ohsuga, and J. Bradshaw, editors, Intelligent Agent Technology: Research and Development. World Scientific, New Jersey, 2001. 4. F. Ingrand, M. Georgeff, and A. Rao. An architecture for real-time reasoning and system control. IEEE Expert, 7(6):34–44, 1992. 5. F. F. Ingrand and M. P. Georgeff. Managing deliberation and reasoning in real-time AI systems. In Proceedings of the DARPA Workshop on Innovative Approaches to Planning, San Diego, California, 1990. 6. L. Magnani. Abduction, Reason, and Science: Processes of Discovery and Explanation. Kluwer Academic/Plenum Publishers, New York, 2001. 7. E. Norling. Learning to notice: Adaptive models of human operators. In Second International Workshop on Learning Agents, Montreal, 2001. 8. E. Norling. Folk psychology for human modelling: Extending the BDI paradigm. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS-04), New York, NY, July 2004. 9. A. S. Rao. A unified view of plans as recipes. In Contemporary Action Theory. Kluwer Academic, Netherlands, 1997. 10. A. S. Rao and M. P. Georgeff. BDI agents: From theory to practice. In Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), San Francisco, 1995. 11. C. Sioutis and N. Ichalkaranje. Cognitive hybrid reasoning intelligent agent system.
In Knowledge-Based Intelligent Information and Engineering Systems, 9th International Conference, KES 2005, Melbourne, Australia, volume 3682 of LNAI, pages 838–843. Springer, 2005. 12. M. Wooldridge. Reasoning about Rational Agents. MIT Press, Cambridge, 2000.
Using a Constructive Interactive Activation and Competition Neural Network to Construct a Situated Agent’s Experience

Wei Peng and John S. Gero

Key Centre of Design Computing and Cognition, University of Sydney, NSW 2006, Australia
{wpeng, john}@arch.usyd.edu.au
http://www.arch.usyd.edu.au/kcdc/
Abstract. This paper presents an approach that uses a Constructive Interactive Activation and Competition (CIAC) neural network to model a situated agent’s experience. It demonstrates an implemented situated agent and its learning mechanisms. Experiments add to the understanding of how the agent learns from its interactions with the environment. The agent can develop knowledge structures and their intentional descriptions (conceptual knowledge) specific to what it is confronted with – its experience. This research is presented within the design optimization domain.
1 Introduction Experience is defined as the accumulation of knowledge or skill that results from direct participation in events or activities [15]. Learning is a process whereby knowledge is created through the transformation of experience [6], [8]. Experience plays a key role in a learning process. Learning, also known as knowledge acquisition, has been widely explored by many artificial intelligence (AI) and machine learning researchers. A broad spectrum of approaches has been developed, including inductive learning methods, explanation-based learning approaches and connectionist algorithms. Another theory of learning emerged from the cognitive science domain. "Situatedness" [1], [4] and "situated learning" [9], [13] emphasize the role of social interactions in learning. The construction of an agent's memory can be regarded as a learning process. The notion of "constructive memory" contradicts many views of knowledge as being unrelated to either its locus or application [5]. A constructive memory model [5], [10] provides a conceptual framework for us to utilize the concept of "situatedness" in a software agent. Learning takes place when a learner interacts with, or is stimulated by, an environment [6]. An agent relates its own experience to a situation and gives that situation meaning [7]. These two theories are not incompatible but complementary, with both addressing the same issue at different levels. An agent that is designed to be situated at a conceptual level can still be implemented using various machine learners. In this paper, we describe how to model a situated agent's experience using a Constructive Interactive Activation and Competition (CIAC) neural network. Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 21 – 30, 2006. © Springer-Verlag Berlin Heidelberg 2006
This situated agent is applied in the design optimization domain. Design optimization is concerned with identifying optimal design solutions which meet design objectives while conforming to design constraints. The design optimization process involves tasks that are both knowledge-intensive and error-prone, including problem formulation, algorithm selection and the use of heuristics to improve the efficiency of the optimization process. Choosing a suitable optimizer becomes the bottleneck of a design optimization process. Designers rely on their experience to carry out this task. Such a manual process may result in a sub-optimal design solution and hence an inefficient design. Our objective is to construct a computational model which is able to capture the knowledge of using a design optimization tool and, as a consequence, can aid the tool's future use in supporting design. For example, a designer working on optimizing a hospital layout may find that a certain optimizer is more efficient for the problem at hand. When the same or another designer tackles a similar design task, the tool constructs memories of the design situation and anticipates its own potential use. It can therefore offer help to designers in their design interactions even before they ask for it. Our approach is to use a situated agent that wraps around an existing design optimization tool. A user accesses the design tool via this wrapper, where the situated agent senses the events performed by that user and learns new concepts from the user's interactions with it.
2 Situated Agency Software agents are intentional systems that work autonomously and interact with environments in selecting actions to achieve goals [14]. A situated agent is software founded on the notion of "situatedness".
[Figure 1: (a) a sequence of perceptual categorizations C1, C2, C3 built over sensory data S1–S4, with previous conceptual coordinations E1, E2 contributing to higher-order categorization in the context of "what I'm doing now"; (b) a situated learning scenario over times t, t′, t″. Legend: S: Sensory Data; C: Perceptual Categories; E: Previous Conceptual Coordination]
Fig. 1. (a) shows the conceptual knowledge as a higher order categorization of a sequence (after Fig. 1.6 of Clancey [2]). (b) illustrates a situated learning scenario.
Situatedness involves the context, the observer's experiences, and the interactions between them. Situatedness [1] holds that "where you are when you do what you do matters" [4]. Conceptual knowledge can be learned by taking account of
how an agent orders its experience in time, which is referred to as conceptual coordination [2], Fig. 1. Conceptual knowledge is a function of previously organized perceptual categories and what subsequently occurs, Fig. 1(a). It is generally formed by holding active a categorization that previously occurred (C1) and relating it to an active categorization C2 [2]. Fig. 1(b) illustrates a scenario of a situated concept learning process in which sensory data are augmented into a Gestalt whole. Perceptual category C1 groups the sensory sequence "S1 S2" and activates the agent's experience to obtain similar organizations. The agent's experiential response (E1) represents the agent's hypotheses about what will happen later in the environment. The agent constructs E1 together with environmental changes (S3) into the current perceptual category C2. This construction involves a validation process in which environmental changes are matched against the agent's hypotheses. "Valid" means the environmental changes are consistent with the agent's projection of such changes from a previous time. The grounding process then reinforces a valid experience. For invalid expectations, the agent updates its perceptual category (C2) with the latest environmental changes. A situated agent contains sensors, effectors, experience and a concept formation engine, which consists of a perceptor, a cue_Maker, a conceptor, a hypothesizer, a validator and related processes. Sensors gather events from the environment. These events include key strokes of objective functions, the users' selections of design optimization algorithms, etc. Sense-data take the form of a sequence of actions and their initial descriptions. For instance, sense-data can be expressed as:

S(t) = {…… "click on objective function text field", key stroke of "x", "(", "1", ")", "+", "x", "(", "2", ")" …}   (1)
The perceptor processes sense-data and groups them into multimodal percepts, which are intermediate data structures illustrating environment states at a particular time. Percepts are structured as triplets:

P(t) = (Object, Property, Values of properties)   (2)
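The percept triplet in (2) can be sketched as a small data structure. The names below (Percept, perceive_objective_function) and the keystroke-filtering heuristic are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Percept:
    """A percept triplet (Object, Property, Value), cf. formula (2)."""
    obj: str
    prop: str
    value: str

def perceive_objective_function(keystrokes):
    """Hypothetical perceptor step: fold the single-character keystrokes of a
    sense-data sequence (cf. formula (1)) into one percept describing the
    objective-function field."""
    text = "".join(k for k in keystrokes if len(k) == 1)
    return Percept("Objective Function Object", "Objective_Function", text)

p1 = perceive_objective_function(
    ["click on objective function text field",
     "x", "(", "1", ")", "+", "x", "(", "2", ")"])
# p1 corresponds to (Objective Function Object, Objective_Function, "x(1)+x(2)")
```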
For example, a percept P1 can be described as (Objective Function Object, Objective_Function, "x(1)+x(2)"). The cue_Maker generates cues that can be used to activate the agent's experience. The conceptor categorizes the agent's experience to form concepts. Concepts attach meanings to percepts. The hypothesizer generates hypotheses from the learned concepts. This is where reinterpretation takes place, allowing the agent to learn in a "trial and error" manner. The validator pulls information from the environment and examines whether the environmental changes are consistent with the agent's responses. An agent needs to validate its hypotheses in interactions to locate a suitable concept for the current situation. An effector is the unit via which the agent brings changes to the environment through its actions. The agent's experience is structured as a Constructive Interactive Activation and Competition (CIAC) neural network, in which we extend a basic IAC network [11] to accommodate the concept learning process, Fig. 2. An IAC network consists of two basic types of node: instance nodes and property nodes. An instance node has inhibitory connections
to other instance nodes and excitatory connections to the relevant property nodes. The property nodes encode the special characteristics of an individual instance [12]. Property nodes are grouped into cohorts of mutually exclusive values [12]. Each property node represents perceptual-level experience processed from sensory data. Instance nodes, along with the related property nodes, describe an instance of a concept. Knowledge is extracted from the network by activating one or more of the nodes and then allowing the excitation and inhibition processes to reach equilibrium [12]. In Fig. 2, the shaded instance node (Ins-1) and the related shaded property nodes represent a context-addressable memory cued by an environment stimulus, e.g. [Objective_Function, f1]. Such a response is a dynamic construction in the sense that when environment stimuli change, the agent develops adapted knowledge. This organized experience is grounded by weight adaptation and constructive learning.
[Figure 2: a CIAC network with an instance cohort (instance nodes 1, 2) and property cohorts for Optimizer (o1, o2), Objective_Function_Type (ft1, ft2), Constraint_Type (c1, c2), Variable_Type (v1, v2), Has_Hessian (h1, h2) and Objective_Function (f1, f2); activated and inhibited instance/property nodes are linked by activation and inhibition edges]
Fig. 2. A CIAC neural network as a representation of the agent’s experience; Property nodes are labeled by their value, e.g. f1 represents a property node in the Objective_Function cohort with objective function value f1
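The instance/property structure of Fig. 2 can be sketched in a few lines. The class and node names below are hypothetical, not the paper's implementation; the weight 0.3 follows the initial weights mentioned in Fig. 4:

```python
from collections import defaultdict

class ExperienceNet:
    """Minimal sketch of the CIAC experience structure: instance nodes have
    excitatory links to their property nodes and inhibitory links to other
    instance nodes; property values within one cohort mutually inhibit."""

    def __init__(self):
        self.weights = defaultdict(float)   # (node_a, node_b) -> weight
        self.cohorts = defaultdict(set)     # cohort name -> property nodes
        self.instances = set()

    def add_instance(self, instance, properties, w=0.3):
        # inhibitory links between competing instance nodes
        for other in self.instances:
            self.weights[(instance, other)] = self.weights[(other, instance)] = -w
        self.instances.add(instance)
        for cohort, value in properties.items():
            prop = (cohort, value)
            # mutually exclusive values in a cohort inhibit each other
            for other in self.cohorts[cohort] - {prop}:
                self.weights[(prop, other)] = self.weights[(other, prop)] = -w
            self.cohorts[cohort].add(prop)
            # excitatory instance <-> property links
            self.weights[(instance, prop)] = self.weights[(prop, instance)] = w

    def excitatory_neighbours(self, node):
        return {b for (a, b), wt in self.weights.items() if a == node and wt > 0}

net = ExperienceNet()
net.add_instance("Ins-1", {"Objective_Function": "f1", "Optimizer": "o1"})
net.add_instance("Ins-2", {"Objective_Function": "f2", "Optimizer": "o2"})
# Cueing the property node ("Objective_Function", "f1") excites only Ins-1.
```

In a full IAC retrieval the excitation/inhibition dynamics would then be iterated to equilibrium; this sketch only shows the connectivity that those dynamics run over.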
3 Constructing a Situated Agent's Experience

In this section, we discuss how to construct a situated agent's experience with a CIAC neural net. Table 1 illustrates the formulas used to compute the network input value (Ne) and the activation value (Ac) for each node during the activation and competition phase. Table 1 also describes the formula applied to adjust the weights of each excitatory connection of a valid concept during the grounding via weight adaptation process, so that nodes that fire together become more strongly connected. Weight adaptation is formulated similarly to a Hebbian-like learning mechanism [12]. The pseudo code below (Table 2) presents the procedure for constructing a situated agent's experience with a CIAC neural network. It shows the relationships between
Table 1. Major formulas applied in the CIAC neural net
Network Input Values (Ne) (the activation of each neuron can be regarded as the "degree of belief" in that hypothesis):

    Ne = μE + Σ_{i=1..n} Wi·Ai

    μ: excitatory gain for the initial network stimulus, set to 4.0
    E: initial network stimulus, default 1.0
    Wi: inbound weights for a neuron
    Ai: activation value for each inbound weight
    n: number of neurons in the network

Activation Values (Ac) (units connected by negative weights tend to turn each other off; a unit that begins with a small advantage "wins" the competition and becomes active at the expense of the other units in the pool [3]):

    If Ne > 0:  Ac = Ac−1 + ℓ[(Amax − Ac−1)Ne − φ(Ac−1 − R)]
    else:       Ac = Ac−1 + ℓ[(Ac−1 − Amin)Ne − φ(Ac−1 − R)]

    Ac: activation value of a neuron at the current cycle
    Ac−1: activation value of that neuron at the previous cycle
    Ne: net input for the neuron
    Amax: maximum permitted activation value, set to 1.0
    Amin: minimum permitted activation value, set to 0.2
    R: initial resting activation value, default −0.1
    φ: decay factor, default 1.0
    ℓ: learning rate, default 0.1

Weight Adaptation (Wn) (the process that verifies the usefulness of a related experience in the current situation; similar to a Hebbian-like learning mechanism):

    If Wo > 0:  Wn = Wo + ℓ(Wmax − Wo)·Ai·Aj − δWo
    else:       Wn = Wo

    Wo: weight value before weight adaptation
    Wn: weight value after weight adaptation
    ℓ: learning rate, default 0.1
    Wmax: maximum permitted weight value, set to 1
    Ai, Aj: activation values of neurons i and j
    δ: weight decay factor, set to 0.1
functions in the experience package. Functions related to other packages are denoted by the capitalized names of their packages; for example, Perceptor.generateCue() denotes a method belonging to the Perceptor package. The implemented prototype system is illustrated in Fig. 3. The tool wrapper interface allows designers to define problems. Sensors gather a user's actions that comprise a design optimization process and activate a perceptor to create percepts. A percept cues the agent's initial experience. Activation diagrams output the neurons winning at the equilibrium state, which represent the activated memory.
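The three Table 1 update rules can be written as plain functions. This is a sketch with hypothetical names; the sign of Amin is an assumption (the table prints 0.2, while classic IAC models use −0.2):

```python
def net_input(weighted_inputs, mu=4.0, stimulus=0.0):
    """Ne = mu*E + sum_i Wi*Ai (Table 1); stimulus E is 1.0 for cued nodes."""
    return mu * stimulus + sum(w * a for w, a in weighted_inputs)

def next_activation(a_prev, ne, a_max=1.0, a_min=-0.2, rest=-0.1,
                    decay=1.0, lr=0.1):
    """One activation-and-competition update of a node (Table 1)."""
    if ne > 0:
        return a_prev + lr * ((a_max - a_prev) * ne - decay * (a_prev - rest))
    return a_prev + lr * ((a_prev - a_min) * ne - decay * (a_prev - rest))

def adapt_weight(w_old, a_i, a_j, w_max=1.0, lr=0.1, w_decay=0.1):
    """Hebbian-like grounding: only positive (excitatory) weights are adapted."""
    if w_old > 0:
        return w_old + lr * (w_max - w_old) * a_i * a_j - w_decay * w_old
    return w_old
```

Iterating next_activation over all nodes until the values stop changing corresponds to the "Cycling ... End Cycling" loop of Table 2; adapt_weight corresponds to its grounding step.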
Table 2. Constructing a situated agent's experience with a CIAC neural net
 1. Initial experience ← CIAC net built from a matrix file
 2. Sensory data ← GUISensor.catcher()           // captures a user's actions
 3. Percepts ← Perceptor.modalitySwitch()        // data transform
 4. Cues ← Perceptor.generateCue()               // create memory cues from percepts
 5. Activation: set activation values of the cue nodes to 1.0000
 6. Cycling: while CIAC net not isEquilibrium()  // check equilibrium states
 7.     for all nodes in the CIAC neural net, compute Ne for each node
 8.     for all nodes in the CIAC neural net, update Ac for each node
 9. End Cycling                                  // equilibrium state reached
10. Output activated nodes                       // nodes with activation value > threshold
11. Percepts ← Validator.pull()                  // check environment changes, transfer data
12. result ← Validator.isValid()                 // check the usefulness of the activated experience
13. If result = true
14.     Grounding via weight adaptation()        // Hebbian-like learning
15. Else {
16.     Percepts ← Validator.pull()              // update and pull more environment changes
17.     hypotheses ← Hypothesizer.hypothesize(Percepts, Concepts, Target); output hypotheses
18.     result ← Validator.isValid()             // check hypotheses
19.     If result = true Grounding()             // weight adaptation via Hebbian-like learning
20.     Else { Constructive_learning()           // incorporate new experience
            Concepts ← Concepter.learnInductiveConcepts() } }
21. New experience
[Figure 3: architecture of the prototype. The Tool Wrapper supplies cues that activate the agent's Initial Experience (A); activating existing experience produces an activation diagram and, via backward-chaining hypothesizing, explanation-based hypotheses; inductive learning yields conceptual knowledge; grounding via weight adaptation produces Grounded Experience B, and grounding via constructive learning produces Grounded Experience C]
Fig. 3. Constructing a situated agent’s experience
Based on the responses from the CIAC neural net, the agent constructs initial concepts and displays the constructed knowledge in the tool wrapper. The grounding process initiates a validation function which matches the initially constructed concepts with environmental changes. The weight adaptation function increases the connection weights of a valid concept and grounds experience A into experience B. The explanation-based learner can be invoked to form a new concept if no valid concept has been activated. A percept at runtime can also be developed into a new concept by a constructive learning process. Experience C is learned from constructive learning and the related self conceptual labeling process. Conceptual labels are generalised knowledge obtained by applying an inductive learner to the agent's experience. This conceptual knowledge serves as domain theories, from which the agent creates hypotheses. A typical grounded experience is illustrated in Fig. 4.
Fig. 4. A typical experience grounded from an initial experience which has 1 instance node connected to several property nodes with weights “0.3000”. Property nodes are described as property and value pairs, e.g. “OBJF_Type: Linear” represents a property node with linear objective function type.
4 Agent Behaviours and Experiments

This system provides a basis for us to explore the behaviours of a situated agent in various situations. We examine how the agent learns new concepts, in terms of developing knowledge structures and their intentional generalizations, in its interactions with the environment. The following five internal states and their changes can be used to study how an agent constructs concepts:
1. The knowledge structure, which is a Constructive Interactive Activation and Competition (CIAC) neural network composed of instance nodes connected to a number of property (or feature) nodes;
2. The expectations about environmental changes, which are generated by the agent's experiential responses to environmental cues (shown in the activation diagram in Fig. 3);
3. The validator states, which show whether an agent's expectation is consistent with the environmental changes;
4. Hypotheses, which depict the agent's reinterpretation of its failures in creating a valid expectation;
5. Concepts, which are the agent's high-level experiences, the domain theories an agent uses to classify and explain its observations.

Table 3. Experiments with various design optimization scenarios and the agent's behaviours. Ac denotes activated experience. V1 represents the validator state for Ac. Hs are hypotheses. V2 describes the validator state for Hs. Be is the abbreviation for the agent's behaviours. Nk means new knowledge learned. QP stands for a quadratic programming optimizer. NLP is a nonlinear programming optimizer. √ shows that the agent correctly predicted the situation and X shows otherwise. "Ins-1" stands for design experience instance 1 and "OBJF" is the abbreviation for objective function. "Cons" represents constraints and "HF" is for Hessian function.
Design Scenario | Ac | V1 | Hs | V2 | Be | Nk
Identical OBJF to Ins-1, Linear Cons | Ins-1 | √ | N/A | N/A | Grounds Ins-1 | Grounded Ins-1
Linear OBJF, No Cons | Ins-1 | X | N/A | N/A | Constructs Ins-2 | New Experience Ins-2
Linear OBJF, Linear Cons | Ins-1 | √ | N/A | N/A | Grounds Ins-1, Constructs Ins-3 | Grounded Ins-1 and New Ins-3
Quadratic OBJF and No Cons | None | X | N/A | N/A | Constructs Ins-4 | New Experience Ins-4
Quadratic OBJF and No Cons | Ins-4 | √ | N/A | N/A | Grounds Ins-4, Constructs Ins-5 | Grounded Ins-4 and New Ins-5
Quadratic OBJF and No Cons | Ins-4,5 | √ | N/A | N/A | Grounds Ins-4,5, Constructs Ins-6, Inductive Learning | Grounded Ins-4,5 and New Ins-6; New Concepts 1,2
Quadratic OBJF, Linear Cons | Ins-4,5,6 | X | Is a QP | √ | Hypothesize, Grounds Ins-4,5,6, Constructs Ins-7 | Grounded Ins-4,5,6 and New Ins-7
Quadratic OBJF, Linear Cons, no HF | Ins-4,5,6,7 | X | Is a QP | X | Hypothesize, Constructs Ins-8 | New Ins-8, New Concepts 3,4
• New Concept 1: OBJF_Type = Quadratic ⇒ Optimizer = Quad-Programming;
• New Concept 2: OBJF_Type = Linear ⇒ Optimizer = Lin-Programming;
• New Concept 3: OBJF_Type = Quadratic and Provide_Hessian = false ⇒ Optimizer = Nonlin-Programming;
• New Concept 4: OBJF_Type = Quadratic and Provide_Hessian = true ⇒ Optimizer = Quad-Programming.
The initial experience of the agent holds one instance of a design optimization scenario using a linear programming algorithm. An experiment was carried out to study the learning behaviours of a situated agent in heterogeneous design optimization scenarios over time. Performance is defined as the correctness of the system's response to an environment cue, which predicts, and hence assists, the design task at hand. The "0-1" loss function is applied to measure the outcomes of the prediction. The results are illustrated in Table 3. From the results of this experiment, we can see that the agent develops its experience by reorganizing existing experience or constructing new design experience from its interaction with the environment. It adapts to the environment with the learned knowledge, ranging from detailed design instances to generalizations of these low-level experiences, i.e., New Concepts 1-4 in Table 3. As shown in Fig. 5, even at this early stage of its learning, the agent achieves a performance of 62.5% in recognizing design optimization problems. We conjecture that one of the reasons for this is the content-addressable ability of a CIAC neural net, which can generalize across exemplars and provide plausible default values for unknown variables [3]. A situated agent that can inductively learn new concepts and subsequently deduce explanations for environment changes also adds a new level of learning to this CIAC neural net.
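The 62.5% figure can be reproduced from Table 3 under the 0-1 loss, counting a situation as correctly predicted when either the activated experience is validated (V1 = √) or, failing that, the hypothesis is validated (V2 = √):

```python
# Validator outcomes per situation in Table 3: (V1, V2); None means N/A.
outcomes = [
    (True,  None),   # 1st: Ins-1 grounded
    (False, None),   # 2nd
    (True,  None),   # 3rd
    (False, None),   # 4th
    (True,  None),   # 5th
    (True,  None),   # 6th
    (False, True),   # 7th: hypothesis "Is a QP" validated
    (False, False),  # 8th
]

def zero_one_score(v1, v2):
    """0-1 loss: 1 if the agent's final expectation matched the environment."""
    return 1 if (v1 or v2) else 0

performance = sum(zero_one_score(v1, v2) for v1, v2 in outcomes) / len(outcomes)
# performance == 0.625, i.e. 5 of the 8 situations correctly predicted
```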
[Figure 5: plot of the agent's performance (%) in situated learning across the 1st–8th situations]

Fig. 5. Performance of the initial stage of experience learning in a situated agent. The square dots show the performance of the agent in recognizing new situations in the experiment. The dark black line represents the mean squared estimation of the agent's performance (62.5%).
5 Conclusions In this paper, we have described an approach that applies a Constructive Interactive Activation and Competition (CIAC) neural network to model a situated agent’s experience. We demonstrate an implemented situated agent that uses a constructive memory model to learn new concepts from its interaction with the environment. From
the results obtained from the experiment, we can conclude that the agent develops its knowledge structures and behaviours specific to what it is confronted with – its experience. Based on the conceptual knowledge learned, the agent can further improve its learning behaviour. As a result, designers can integrate their expertise with the knowledge learned from the agent to develop design solutions. Such a system has a potential role in enhancing design optimization efficiency. Acknowledgments. This work is supported by a Cooperative Research Centre for Construction Innovation (CRC-CI) Scholarship and a University of Sydney Sesqui R and D grant.
References
1. Clancey, W.: Situated Cognition. Cambridge University Press, Cambridge (1997)
2. Clancey, W.: Conceptual Coordination: How the Mind Orders Experience in Time. Lawrence Erlbaum Associates, New Jersey (1999)
3. Dennis, S.: The Interactive Activation and Competition Network: How Neural Networks Process Information. http://www.itee.uq.edu.au/~cogs2010/cmc/chapters/IAC/ (1998)
4. Gero, J.S.: Towards a Model of Designing which Includes its Situatedness. In: Grabowski, Rude and Grein (eds.): Universal Design Theory. Shaker Verlag, Aachen (1998) 47-56
5. Gero, J.S.: Constructive Memory in Design Thinking. In: Goldschmidt and Porter (eds.): Design Thinking Research Symposium: Design Representation. MIT, Cambridge (1999) 29-35
6. Hansen, R.E.: The Role of Experience in Learning: Giving Meaning and Authenticity to the Learning Process. Journal of Technology Education 11 (2000) 23-32
7. Jarvis, P.: Meaningful and Meaningless Experience: Towards an Analysis of Learning from Life. Adult Education Quarterly 37 (1987) 164-172
8. Kolb, D.A.: Experiential Learning: Experience as the Source of Learning and Development. Prentice Hall, Englewood Cliffs (1984)
9. Lave, J., Wenger, E.: Situated Learning: Legitimate Peripheral Participation. Cambridge University Press, Cambridge (1991)
10. Liew, P.: A Constructive Memory System for Situated Design Agents. PhD Thesis, Key Centre of Design Computing and Cognition, University of Sydney, Sydney, Australia (2004)
11. McClelland, J.L.: Retrieving General and Specific Information from Stored Knowledge of Specifics. In: Proceedings of the Third Annual Meeting of the Cognitive Science Society. Erlbaum, Hillsdale, NJ (1981) 170-172
12. Medler, D.A.: A Brief History of Connectionism. Neural Computing Surveys 1 (1998) 61-101
13. Reffat, R., Gero, J.S.: Computational Situated Learning in Design. In: Gero, J.S. (ed.): Artificial Intelligence in Design'00. Kluwer Academic Publishers, Dordrecht (2000) 589-610
14. Wooldridge, M., Jennings, N.R.: Intelligent Agents: Theory and Practice. Knowledge Engineering Review 10 (1995) 115-152
15. WordNet. http://wordnet.princeton.edu.au/
Rule-Based Agents in Temporalised Defeasible Logic

Guido Governatori¹, Vineet Padmanabhan¹, and Antonino Rotolo²

¹ School of ITEE, The University of Queensland, Australia
{guido, vnair}@itee.uq.edu.au
² CIRSFID, University of Bologna, Bologna, Italy
[email protected]

Abstract. This paper provides a framework based on temporal defeasible logic to reason about deliberative rule-based cognitive agents. Compared to previous works in this area our framework has the advantage that it can reason about temporal rules. We show that for rule-based cognitive agents deliberation is more than just deriving conclusions in terms of their mental components. Our paper is an extension of [5,6] in the area of cognitive agent programming.
1 Introduction

There are two main trends in the agent literature for programming cognitive agents in a BDI (belief, desire, intention) framework. The first one is system-based, wherein the main idea is to develop a formal specification language that provides an explicit representation of states and operations on states that underlie any BDI implementation [14,13,1]; the aim is to formalise the operational semantics of the implemented system. The second one can be termed rule-based: rules are used to represent or manipulate an agent's mental attitudes, i.e., an agent consists of a belief base, goal (desire) base, and intention base specified by logic formulas in the form of rules [7,3,4,8,15]. In addition to the three mental attitudes of beliefs, desires and intentions, the works above also include obligations, which are used to denote norms and commitments of social agents and social rationality. There are also works which combine these two approaches, as in [12]. Here we adopt the rule-based approach of [5,6] and extend it to accommodate temporal defeasible rules. The main question we try to answer in this paper is: what does it mean to deliberate for rule/policy-based agents? (By policy we mean a set of rules.) Of particular concern to us is the reasoning process involved in the deliberation of a rule-based agent, wherein the agent can take a decision at t about what she has to do at t′ based on her beliefs and policies at t. In such a set-up, if no relevant event occurs, then she can retain her deliberation at t′. Consider the following rule

p : tp, OBL q : tq ⇒ (OBL p : tp ⇒OBL s : ts) : tr    (1)

whose reading is: if p is true at time tp and q is obligatory at time tq, then the deontic rule OBL p : tp ⇒OBL s : ts is in force at time tr. In this work we develop a formal machinery
Supported by the Australian Research Council under the Discovery Project No. DP0558854. Supported by the European project for Standardised Transparent Representations in order to Extend Legal Accessibility (ESTRELLA, IST-4-027655).
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 31–40, 2006. c Springer-Verlag Berlin Heidelberg 2006
to reason about rules like (1). In general we want to accommodate in our framework rules of the type (a : t ⇒ b : t′) : t″, where t and t′ indicate the times at which a and b hold, while t″ is the time at which the rule is in force. To incorporate this simple temporal reasoning we have to express whether events and states are permanent or immanent. If we use non-monotonic rules as in some of the works above, then deliberation means reasoning about how to derive conclusions in terms of intentions/goals/plans, i.e., just deriving conclusions from a theory. Though [4] proposes a deliberation language, it considers only snapshots of the deliberation process, in the sense that the deliberation program depends on the agent's current state (one cycle): put in Bratman's terms, it reasons about intentions/goals/plans at t and decides for t. A complete solution would require the addition of a temporal variable to allow reasoning about the deliberation process after each time round [9]. A formal framework like the one developed in this paper is useful in the domain of legal information systems to reason about normative conditionals [11,10]. Another domain in which the framework could be useful is policy-based rationality as outlined by Bratman [2] in his pursuit of a temporally extended rational agency. The principle can be roughly stated as follows:

– At t0, agent A deliberates about what policy to adopt concerning a certain range of activities. On the basis of this deliberation, agent A forms a general intention to ϕ in circumstances of type ψ.
– From t0 to t1, A retains this general intention.
– At t1, A notes that she will be (is) in circumstance ψ at t2, where t2 ≥ t1.
– Based on the previous steps, A forms the intention at t1 to ϕ at t2.

Notice that Bratman is concerned only with policy-based intentions¹ and does not provide any formal framework to show the working of his historical principle.
In our model we have temporal rules for beliefs, desires, intentions and obligations, as in (1), and a machinery based on defeasible logic (DL) to reason about such temporal rules. Given the temporal nature of Bratman's historical principle, and the idea that some intentions can be retained from one moment to another, we must then account for two types of temporal deliberations: transient deliberations, which hold only for an instant of time, and persistent deliberations, which an agent is going to retain unless some intervening event occurs that forces the agent to reconsider her deliberation. This event can be just a brute fact or it can be a modification of the policy of the agent. Thus an agent must be able to cope with changes of the environment but also of her policies. Let us consider the following scenario. Our agent (Guido) has sent a paper to the PRICAI-06 conference, and he has the intention to attend the conference to present the paper if accepted. Guido's school policy for funding is that if somebody intends to travel then she has to submit the request for travel funds two weeks (15 days) before the actual travel. This scenario can be represented as follows:

r1: (PRICAIpaperAccepted : t1 ⇒INT Travel : t2) : t0    (2)
r2: (INT Travel : tX ⇒OBL Request : tX−15) : t0    (3)

¹ In [2], historical principles for deliberative as well as non-deliberative intentions are outlined. Here we are concerned only with the policy-based aspect.
Rule-Based Agents in Temporalised Defeasible Logic
33
Rule r1 states that Guido will form the intention to travel to PRICAI at a certain time in case the paper is accepted, and that the rule is in force from t0 (say, the time when the paper is submitted to the conference); r2 encodes Guido's school travel policy (the policy is in force at time t0). Suppose that Guido, at time t1, receives the notification that the paper has been accepted; then at that time he forms the intention to travel to PRICAI at time t2. This triggers his obligation to have the travel request submitted two weeks before the date of the conference. Accordingly, he plans to prepare the required paperwork in due time. Time passes and two important events happen: the School updates the travel policy and Guido is appointed to a research-only position. The changes to the travel policy concern research-only staff, and the actual change is that, due to the new accounting software, travel funds can be made available in less than one week. Thus the rule encoding the update to the policy is

r3: (Research : tY ⇒ (INT Travel : tX ⇒OBL Request : tX−7) : t4) : t3

Here t3 is the time the new policy was issued and t4 is the time the new policy becomes effective. Based on the updated policy and the new event, Guido complies with the new obligation if he submits the application for funds one week before travelling. Accordingly, he can change his plans and postpone filling in the forms to a later time.
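The idea that the applicable policy is the latest rule in force at the current time can be sketched as follows. All names and the integer time scale are assumptions for illustration, and the research-only condition of r3 is elided:

```python
from dataclasses import dataclass

@dataclass
class TemporalRule:
    """A temporalised rule (A => C) : t carrying the time from which it is
    in force; here reduced to the lead time of the travel-request policy."""
    label: str
    in_force_from: int   # the t of "(A => C) : t"
    lead_days: int       # days before travel by which the request is due

policy = [
    TemporalRule("r2", in_force_from=0, lead_days=15),  # original school policy
    TemporalRule("r3", in_force_from=4, lead_days=7),   # update, effective t4
]

def request_deadline(travel_time, now):
    """Deadline tX - lead under the most recent rule in force at `now`."""
    rule = max((r for r in policy if r.in_force_from <= now),
               key=lambda r: r.in_force_from)
    return travel_time - rule.lead_days
```

Before the policy update (now = 1) the 15-day lead of r2 applies; once r3 is effective (now ≥ 4) the 7-day lead applies, so the request deadline moves later, matching Guido's revised plan.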
2 Temporalised DL for Cognitive Agents

We focus on how mental attitudes and obligations jointly interplay in modelling an agent's deliberation and behaviour. Such an interplay is modelled within a temporal setting. The logical framework is based on DL, which is a simple and flexible sceptical non-monotonic formalism that has proven able to represent various aspects of non-monotonic reasoning. We extend here the machinery developed in [11,6,5] to represent temporalised motivational attitudes of agents. The basic language is based on a (numerable) set of atomic propositions Prop = {p, q, . . . }, a set of rule labels {r1, r2, . . . }, a discrete totally ordered set of instants of time T = {t1, t2, . . . }, a set of modal operators M = {BEL, DES, INT, OBL} (belief, desire, intention, and obligation, respectively), and the negation sign ¬. A plain literal is either an atomic proposition or the negation of it. If l is a plain literal then, for any X ∈ M, Xl and ¬Xl are modal literals. A literal is either a plain literal or a modal literal. Given a literal l, ∼l denotes the complement of l, that is, if l is a positive literal p then ∼l = ¬p, and if l = ¬p then ∼l = p. A temporal literal is a pair l : t where l is a literal and t ∈ T. Intuitively, a temporal literal l : t means that l holds at time t. Knowledge in DL can be represented in two ways: facts and rules. Facts are indisputable statements, represented in the form of literals and modal literals. For example, "John is a minor". In the logic, this might be expressed as Minor(John). A rule is a relation (represented by an arrow) between a set of premises (conditions of applicability of the rule) and a conclusion. In this paper, conclusions usually correspond to literals, but for a special class of rules they can also be rules themselves; in addition all the conclusions and the premises will be qualified with the time when they hold.
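The complement operation on literals just defined can be sketched directly; this is a minimal illustration, not part of the paper's formal machinery:

```python
def complement(lit: str) -> str:
    """~l from the text: if l is a positive literal p then ~l = ¬p,
    and if l = ¬p then ~l = p."""
    return lit[1:] if lit.startswith("¬") else "¬" + lit

# A temporal literal l : t is a pair of a literal and an instant of T;
# here ~(¬p) = p paired with t3, i.e. p holds at time t3.
temporal_literal = (complement("¬p"), 3)
```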
We consider four classes of rules: rules for belief, desire, intention and obligation. Each class
G. Governatori, V. Padmanabhan, and A. Rotolo
of rules is qualified by labelling the arrow with an X ∈ M (for belief, desire, intention, and obligation). If X ∈ {DES, INT, OBL}, the applicability of the corresponding rules permits deriving only literals: more precisely, if the consequent of such a rule is a literal l : t, then its applicability leads to the modal literal Xl : t. Any consequent l : t obtained through a rule for X ∈ {DES, INT, OBL} is called a temporal goal. Rules for belief play a special role. They constitute the basic inference mechanism of an agent, as they concern the knowledge an agent has about the world. For this reason, their conclusions, if obtained, are not modalised; on the other hand, this is the only class of rules whose conclusions can also be rules (rules for X ∈ M). Besides the above classification, rules can be partitioned according to their strength into strict rules (denoted by →), defeasible rules (denoted by ⇒) and defeaters (denoted by ;). Strict rules are rules in the classical sense: they are monotonic, and whenever the premises are indisputable so is the conclusion. Defeasible rules, on the other hand, are non-monotonic: they can be defeated by contrary evidence. Defeaters are the weakest rules: they do not directly support conclusions, but can be used to block the derivation of opposite conclusions. Henceforth we use ↪ as a metavariable standing for → when the rule is strict, ⇒ when the rule is defeasible, and ; when the rule is a defeater. Thus we define the set of rules, Rules, by the following recursive definition:
– a rule is either a rule for X, X ∈ M, or the empty rule ⊥;
– if r is a rule and t ∈ T, then r : t is a temporalised rule (the meaning of a temporalised rule is that the rule is valid at time t);
– let A be a finite set of temporal literals, C a temporal literal and r a temporalised rule; then A ↪X C and A ↪X r are rules for X = BEL;
– let A be a finite set of temporal literals and C a temporal plain literal.
Then A ↪X C is a rule for X ∈ {DES, INT, OBL}. For a rule r labelled with any X ∈ M we write A(r) for the body or antecedent of the rule and C(r) for the head or consequent. It is also possible to have nested rules, i.e., rules occurring inside rules for belief. However, it is not possible for a rule to occur inside itself. Thus, for example, the following is a rule:

p : tp, OBLq : tq ⇒BEL (OBLp : tp ⇒INT s : ts) : tr
(4)
Rule (4) means that if p is true at time tp and q is obligatory at time tq, then the intention rule OBLp : tp ⇒INT s : ts is valid at time tr. Every temporalised rule is identified by its rule label and its time. Formally, we can express this relationship by establishing that every rule label r is a function r : T → Rules. Thus a temporalised rule r : t returns the value/content of the rule r at time t. This construction allows us to uniquely identify rules by their labels, and to replace rules by their labels when rules occur inside other rules. In addition, there is no risk that a rule includes its own label in itself. For example, if we associate the temporalised rule (OBLp : tp ⇒INT s : ts) : tr with the pair r1 : tr, we can concisely rewrite (4) as

p : tp, OBLq : tq ⇒BEL r1 : tr     (5)

It should be noted that we have to consider two temporal dimensions for rules. The first regards the efficacy (effectiveness) of a rule, i.e., the capacity of a rule to produce a
Rule-Based Agents in Temporalised Defeasible Logic
desired effect at a certain time point, and the second shows when the rule is valid/comes into force. Consider the following two rules about a hypothetical tax regulation: r1 : (Income > 90K : 1Mar ⇒OBL Tax10 : 1Jan) : 1Jan : 15Jan
(6)
r2 : (Income > 100K : 1Mar ⇒OBL Tax40 : 1Jan) : 1Apr : 1Feb
(7)
Rule r1 states that if a person's income exceeds ninety thousand as of 1st March (Income > 90K : 1Mar), then he or she is obliged to pay the top marginal tax rate of 10 percent from 1st January (Tax10 : 1Jan), with the policy being in force from 15th January and effective from 1st January. This means that the norm becomes part of the tax regulation on 15th January, but it is effective from 1st January. Accordingly, the policy covers tax returns lodged after 15th January as well as all tax returns lodged before the validity of the policy itself. The second rule, valid (i.e., part of the tax regulation) from 1st February, establishes a top marginal tax rate of 40% for tax returns lodged after the effectiveness date of 1st April. These two rules illustrate the difference between the effectiveness and the validity of a rule. To differentiate between them, we introduce the notion of a temporalised rule with viewpoint; 15 Jan in r1 denotes exactly this. A conclusion or a temporalised rule with viewpoint is an expression s@t, where t ∈ T, meaning that s "holds" when the agent reasons with the information available to her at t. Thus the expression r1 : t1@t2 represents a rule r1 valid at time t2 and effective at time t1. In the case of (6) this could be given as s@16Jan, where s is (Income > 90K : 1Mar ⇒OBL Tax10 : 1Jan) : 1Jan and t = 16Jan. Thus, for an agent intending to lodge a tax return on 16th January there are no alternatives: she has to pay her taxes at the top marginal rate of 10%. However, should she postpone the decision until after 1st February, she has the option to evaluate when and how much tax she has to pay (10% if the tax return is lodged before 1st April and 40% if lodged afterwards). Therefore she can plan her actions so as to achieve the most suitable result according to her goals.
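How validity restricts what an agent can reason with at a given viewpoint can be illustrated with a small sketch, assuming days of the year are encoded as integers; the dictionary encoding of rules (6) and (7) is ours:

```python
# Each rule records when it becomes part of the regulation (valid_from)
# and the tax rate it imposes. An agent reasoning at viewpoint day t only
# sees rules with valid_from <= t. Illustrative encoding only.

R1 = {"name": "r1", "valid_from": 15, "rate": 10}   # in force 15 Jan (day 15)
R2 = {"name": "r2", "valid_from": 32, "rate": 40}   # in force 1 Feb (day 32)

def visible_rules(viewpoint, rules):
    """Names of the rules the agent can reason with at the given viewpoint."""
    return [r["name"] for r in rules if r["valid_from"] <= viewpoint]

print(visible_rules(16, [R1, R2]))  # 16 Jan: only r1
print(visible_rules(40, [R1, R2]))  # 9 Feb: both r1 and r2
```

At viewpoint 16 January only r1 is visible, so the 10% obligation is inescapable; after 1 February both rules are visible and the agent can plan around them.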
Hence, an agent equipped with such temporal rules should be able to figure out plans that are applicable at a particular time point. Temporal rules like (7) are the more interesting ones, as they allow the agent to plan using rules that refer to past as well as future time points. We discuss temporalised rules with viewpoint further in Section 4. In addition, the example shows that, unlike in other approaches, there is in general no need to impose constraints on the time instants involved in a rule. Another issue we need to consider is that we have two different types of conditionals for deriving beliefs and goals (i.e., rules labelled with X ∈ M): conditionals that initiate a state of affairs which persists until an interrupting event occurs, and conditionals whose conclusion is co-occurrent with the premises. To represent this distinction we introduce a further classification of rules, orthogonal to the previous one, where rules are partitioned into persistent and transient rules. A persistent rule is a rule whose conclusion holds at all instants of time after the conclusion has been derived, unless interrupting events occur; transient rules, on the other hand, establish the conclusion only for a specific instant of time. We use the following notation to differentiate the various types of rules: →^t_X represents a transient rule for X, and →^p_X a persistent rule. Given a set R of rules, we denote the set of strict rules in R by Rs, the set of strict and defeasible rules in R by Rsd, the set of defeasible rules in R by Rd, and the set of
defeaters in R by Rdft. R[q : t] denotes the set of rules in R with consequent q : t. We use RX for the set of rules for X ∈ M. The set of transient rules is denoted by Rtr and the set of persistent rules by Rper. Finally, we assume a set of rule modifiers. A rule modifier is a function m : Rules × T → Rules × T. This construction allows us to apply rule modifiers to rule labels. Thus m(r1 : t1) : t2 returns the rule obtained from the content of r1 at time t1 after the application of the modification corresponding to the function m, where the result refers to the content of the rule at time t2. Given this basic notion of rule modifier, we can define some functional predicates, i.e., specific rule modifications: Delete, Update and Add. For the sake of brevity, we omit the technical details on how to adapt the basic definition of rule modifier to cover these specific rule modifications. As we shall see, these functional predicates can only occur in the head of belief rules. For the moment, let us give their intuitive reading. The functional predicate Delete(r) : t′ says that a given rule r is deleted at t′. More precisely, Delete(r) : t′ assigns the empty rule (⊥) : t′ to r as holding at t′. The rule r is thus dropped from the system at t′ and so, at t′, r is no longer valid. If r is a rule for X ∈ {DES, INT, OBL}, let A′ and C′ be a set of temporal literals and a temporal plain literal, respectively; if r is a rule for belief, let A′ be defined as before, while C′ is a temporal plain literal or a temporalised rule. Then

Update(r, A′) : t′
Update(r, C′) : t′
say that we perform, at t′, an update of r which replaces a subset or all of the components in the antecedent of r with other appropriate components, or replaces the consequent with a new appropriate element of the language. The new version of r will hold at t′. Similarly, Add(r′, A(r′), C(r′)) : t′ indicates that a new rule r′ is added to the system at t′, and that r′ has the antecedent and consequent specified by A(r′) and C(r′).
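The three modifications can be sketched over a toy rule store that models each rule label as a map from instants to rule contents, with `None` standing for the empty rule ⊥; the dictionary-based store is our simplification, not the paper's machinery:

```python
# A rule label denotes a function from instants to rule contents; we model a
# rule store as {label: {time: (antecedent, consequent)}}.

store = {"r": {0: ("A", "C")}}  # rule r with antecedent A, consequent C, from t=0

def delete(store, label, t):
    """Delete(r) : t -- r becomes the empty rule from t on."""
    store[label][t] = None

def update(store, label, t, antecedent=None, consequent=None):
    """Update(r, A') : t / Update(r, C') : t -- replace components from t on."""
    a, c = store[label][max(k for k in store[label] if k <= t)]
    store[label][t] = (antecedent or a, consequent or c)

def add(store, label, t, antecedent, consequent):
    """Add(r', A(r'), C(r')) : t -- introduce a new rule holding from t on."""
    store.setdefault(label, {})[t] = (antecedent, consequent)

update(store, "r", 5, consequent="C2")   # consequent replaced from t=5
delete(store, "r", 9)                    # r no longer valid from t=9
add(store, "r2", 2, "B", "D")            # fresh rule r2 from t=2
```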
3 Conflicts Between Rule Modifications

Table 1 summarises the basic conflicts between rule modifications. Notice that conflicts obtain only if the conflicting modifications apply to the same time instant. Deleting a rule r is incompatible with any update of r (first and second rows of the table). This is the only case of genuine conflict. The third row considers a "residual" but in principle possible conflict between modifications, namely between deleting and adding the same rule r at the same time. This case is marginal, essentially because adding a rule r usually presupposes that r is not yet valid in the theory. However, nothing prevents adding a rule r which is already valid in the system. In that case the operation is redundant but, if performed together with deleting r, we do indeed have a conflict between modifications.

Table 1. Conflicts

  Modifications                               Conditions
  Delete(r) : t    Update(r, A′) : t′         t = t′
  Delete(r) : t    Update(r, C′) : t′         t = t′
  Delete(r) : t    Add(r, A(r), C(r)) : t′    t = t′
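Table 1 amounts to a simple pairwise check, which can be sketched as follows; the triple encoding of a modification is our own:

```python
# A modification is (kind, rule_label, time), with kind in {"Delete",
# "Update", "Add"}. Per Table 1, Delete conflicts with an Update or Add of
# the same rule at the same instant; nothing else conflicts.

def in_conflict(m1, m2):
    kinds = {m1[0], m2[0]}
    same_target = m1[1] == m2[1] and m1[2] == m2[2]
    return same_target and "Delete" in kinds and bool(kinds & {"Update", "Add"})

assert in_conflict(("Delete", "r", 3), ("Update", "r", 3))
assert not in_conflict(("Delete", "r", 3), ("Update", "r", 4))  # different times
assert not in_conflict(("Update", "r", 3), ("Add", "r", 3))     # no Delete
```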
4 Temporalised Rules with Viewpoint

In [11] we showed how to derive temporal literals in a DL framework. But this is of limited use; what we need is a way to derive temporal rules. In this section we extend the framework developed in [11,6,5] with temporal rules with a viewpoint. This means that we can reason about temporal rules that are valid at a particular instant of time. Suppose that we have a defeasible theory D = (T, F, R, ≺), where T is a discrete totally ordered set of instants of time, F is a finite set of temporalised literals, R is a finite set of rules (comprising strict rules, defeasible rules and defeaters), and ≺ is a ternary (superiority) relation over R × R × T, meaning that one rule is stronger than another at a particular time; for example, r1 ≺t r2 means that rule r2 is stronger than rule r1 at time t. Conclusions in DL can have one of the following four forms (where X ranges over M):

+Δ_X @t′ q : t  meaning that q is definitely provable in D with mode X, at time t with viewpoint t′ (i.e., using only facts and strict rules).
−Δ_X @t′ q : t  meaning that we have proved that q is not definitely provable in D with mode X, at time t with viewpoint t′.
+∂_X @t′ q : t  meaning that q is defeasibly provable in D with mode X, at time t with viewpoint t′.
−∂_X @t′ q : t  meaning that we have proved that q is not defeasibly provable in D with mode X, at time t with viewpoint t′.

For example, +∂_OBL @t1 q : t0 means that we have a defeasible proof for OBLq at t0 or, in other words, that OBLq holds at time t0 when we use the rules in force in the system at time t1. However, these tags do not record whether a conclusion q : t0 is obtained via transient rules (so that q holds only at time t0) or via persistent rules, in which case, for every t′ such that t0 < t′, the property q persists at time t′ unless we have evidence to the contrary, i.e., a piece of evidence that terminates the property q.
To reflect these issues, we introduce auxiliary proof tags for persistent and transient conclusions. Formally, +Δ_X @t p means that either +Δ^tr_X @t p or +Δ^pr_X @t p, i.e., p is either transient at t or persistent at t; −Δ_X @t p means both −Δ^tr_X @t p and −Δ^pr_X @t p, i.e., p is neither transient nor persistent at t. The proof tags are labelled with the mode used to derive the rule, according to the appropriate proof conditions. It is not possible to give the complete set of proof conditions in this paper. Here we concentrate only on the proof conditions for deriving defeasible persistence of rules with belief mode and of literals. The proof conditions given here extend those of [11] with the temporal aspects, and can be used for goals and planning as in [6,5]. The proof conditions missing from this paper can be obtained from the corresponding conditions of [11,6,5] following the same intuition as the conditions we present below. Provability is based on the concept of a derivation (or proof) in D. A derivation is a finite sequence P = (P(1), . . . , P(n)) of tagged literals satisfying the proof conditions (which correspond to inference rules for each kind of conclusion). P(1..n) denotes the initial part of the sequence P of length n. A strict derivation (i.e., a conclusion tagged with Δ) is a monotonic derivation using forward chaining of rules, i.e., modus ponens. A defeasible derivation in DL, on the other hand, has three phases. In the first
phase we put forward an argument in favour of the conclusion we want to prove. In the simplest case this consists of an applicable rule for the conclusion (a rule is applicable if its antecedent has already been proved). In the second phase we examine all possible counter-arguments (rules for the opposite conclusion). Finally, we have to rebut the counter-arguments: we provide evidence against each counter-argument. Accordingly, we can demonstrate that the counter-argument does not stand (i.e., some of its premises are not provable), or we can show that it is weaker than an argument for the conclusion. For persistent conclusions we have a further method: we can use a derivation of the conclusion at a previous time, provided that no terminating event occurred in between. In [11] the rules are given, but here rules can also be derived. Thus the proof conditions have to cater for this option. Accordingly, we have to give conditions that allow us to derive rules instead of literals. For the sake of simplicity we will assume that all rules in R can be overruled/modified. We then have to extend the notation R[x : t] to the case where x is a rule label (and to rule modifiers). Given a set of belief rules R and a set of rule modifiers M = {m1, . . . , mn},

R[r : tr] = {s ∈ R : C(s) = mi(v : tv) and mi(v : tv) = r : tr}

R[r : tr] gives the set of nested rules whose head results in the rule r : tr after the application of a rule modifier; and

R[∼r : tr] = {s ∈ R : C(s) = mi(r : tr) and mi(r : tr) is in conflict with r : tr}

The set R[∼r : tr] gives the set of rules that modify r : tr where the modification is in conflict with r : tr (see Table 1 for such conflicts). We can now give the proof conditions for +∂^pr to derive a rule.
If P(n + 1) = +∂^pr_BEL @t r : tr then
1a) r : tr @t ∈ R_BEL, or
1b) ∃ s : ts ∈ R_BEL[r : tr] such that +∂_BEL @t s : ts ∈ P(1..n) and
    ∀ Ya a : ta ∈ A(s), +∂_Ya @t a : ta ∈ P(1..n); and
2) ∀ v : tv ∈ R_BEL[∼r : tr], if +∂_BEL @t v : tv ∈ P(1..n), then either
   2.1) ∃ Yb b : tb ∈ A(v) such that −∂_Yb @t b : tb ∈ P(1..n), or
   2.2) a) v : tv ≺t r : tr if 1a obtains, or b) v : tv ≺t s : ts if 1b obtains; or
3) +∂^pr_BEL @t′ r : tr ∈ P(1..n), t′ < t, and
   3.1) ∀ t″ with t′ ≤ t″ < t, ∀ s : ts ∈ R[∼r : tr], if +∂_BEL @t″ s : ts ∈ P(1..n), then
        3.1.1) ∃ Ya a : ta ∈ A(s), −∂_Ya @t″ a : ta ∈ P(1..n) or ts < tr; or
4) +∂^pr_BEL @t r : t′r ∈ P(1..n), t′r < tr, and
   4.1) ∀ t″ with t′r ≤ t″ < tr, ∀ s : ts ∈ R[∼r : tr], if +∂_BEL @t″ s : ts ∈ P(1..n), then
        4.1.1) ∃ Ya a : ta ∈ A(s), −∂_Ya @t″ a : ta ∈ P(1..n) or ts < tr.
Let us briefly examine the above proof conditions. To prove a rule at time t, the rule must be in force at time t, i.e., it must be one of the given rules (condition 1a). A second possibility is that the rule is derived from another rule; the second rule must then be provable and applicable at t (condition 1b). However, this is not enough, since there may have been modifications to the rule effective at t. Thus we have to show that either any possible modifications were not applicable (2.1) or the modifications
were not successful since they were defeated (2.2a and 2.2b). Finally, the rule could be provable because it is persistent, i.e., it was persistently in force before (3) and no modification occurred in between: the possible modifications in force after the rule came into force were not applicable to it. Or (4) the rule was persistently effective before, and its effectiveness has not been revoked. The conditions for positive persistent defeasible proofs are as follows:

If P(n + 1) = +∂^pr_X @t q : t then
1) +Δ^pr_X @t q : t ∈ P(1..n), or
2) −Δ_X @t ∼q : t ∈ P(1..n), and
   2.1) ∃ r : tr ∈ R^{X,pr}_sd[q : t]: +∂_BEL @t r : tr ∈ P(1..n), and
        ∀ Ya a : ta ∈ A(r : tr), +∂_Ya @t a : ta ∈ P(1..n), and
   2.2) ∀ s : ts ∈ R^X[∼q : t]: if +∂_BEL @t s : ts ∈ P(1..n), then either
        2.2.1) ∃ Ya a : ta ∈ A(s : ts), −∂_Ya @t a : ta ∈ P(1..n), or
        2.2.2) ∃ w : tw ∈ R^X[q : t]: +∂_BEL @t w : tw ∈ P(1..n) and
               ∀ Ya a : ta ∈ A(w : tw), +∂_Ya @t a : ta ∈ P(1..n) and s : ts ≺t w : tw; or
3) ∃ t′ ∈ T : t′ < t and +∂^pr_X @t q : t′ ∈ P(1..n), and
   3.1) ∀ t″ with t′ < t″ ≤ t, ∀ s : ts ∈ R^X[∼q : t″]: if +∂_BEL @t s : ts ∈ P(1..n), then
        3.1.1) ∃ Ya a : ta ∈ A(s : ts), −∂_Ya @t a : ta ∈ P(1..n), or
        3.1.2) ∃ v : tv ∈ R^X[q : t″]: +∂_BEL @t v : tv ∈ P(1..n) and
               ∀ Yb b : tb ∈ A(v : tv), +∂_Yb @t b : tb ∈ P(1..n) and s : ts ≺t v : tv; or
4) ∃ t′ ∈ T : t′ < t and +∂^pr_X @t′ q : t ∈ P(1..n), and
   4.1) ∀ t″ with t′ < t″ ≤ t, ∀ s : ts ∈ R^X[∼q : t″]: if +∂^pr_BEL @t″ s : ts ∈ P(1..n), then
        4.1.1) ∃ Ya a : ta ∈ A(s : ts), −∂_Ya @t″ a : ta ∈ P(1..n), or
        4.1.2) ∃ v : tv ∈ R^X[q : t″]: +∂_BEL @t″ v : tv ∈ P(1..n) and
               ∀ Yb b : tb ∈ A(v : tv), +∂_Yb @t″ b : tb ∈ P(1..n) and s : ts ≺t v : tv.
Clause 1 of the above proof condition allows us to infer a defeasible persistent conclusion from a strict persistent conclusion with the same mode. Clause 2 requires that the complement of the literal we want to prove is not definitely provable (or is definitely provable, for −∂), but it does not specify whether the complement is persistent or transient: remember that what we want to establish is whether the literal or its complement is provable at t, but not both; in the same way, and for the same reason, q can be attacked by any rule for the complement of q (clause 2.2.1). An important point in all clauses of this proof condition is that each time we use a rule (to support the conclusion (2.1), to attack it (2.2), or to rebut an attack (2.2.2)) the rule must be provable at the time t of the derivation (@t). Clauses 3 and 4 implement persistence (i.e., the conclusion has been derived at a previous time and carries over to the current time). Essentially, clause 3 ensures that the conclusion was derived at a previous time and that no interrupting event occurred between that time and t, while clause 4 covers the case where q was derived persistently at an earlier viewpoint, and no interrupting event occurs between the effectiveness of q and the time q is expected to hold according to the current derivation.
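The persistence idea behind clauses 3 and 4 can be caricatured in a few lines: a conclusion derived at an earlier instant survives to the current one if no interrupting event falls in between. The sketch below is a toy illustration of ours that ignores modes, viewpoints and the superiority relation:

```python
# interrupts: list of (time, literal) events that terminate a literal.

def persists(literal, t_derived, t_query, interrupts):
    """True if `literal`, derived at t_derived, still holds at t_query."""
    return not any(
        lit == literal and t_derived < t <= t_query
        for t, lit in interrupts
    )

assert persists("q", 1, 5, [])              # nothing interrupts q
assert not persists("q", 1, 5, [(3, "q")])  # q terminated at t=3
assert persists("q", 1, 5, [(7, "q")])      # interruption only after t=5
```

The real proof conditions refine this by demanding, for every intermediate instant, that each potentially interrupting rule is either inapplicable or defeated by a stronger rule.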
5 Summary

In this paper we combined and extended the approaches presented in [11] and [6,5]. In particular, we have extended the approach to programming cognitive agents with
temporalised literals. This makes the resulting logic more expressive and better suited to the task at hand. In addition, we have introduced the notion of viewpoint. The deliberation of an agent based on a policy depends not only on the environment but also on the rules in force in the policy at the time of deliberation and at the time when the plan resulting from the deliberation will be executed. These two aspects are neglected in the literature on agent planning. Moreover, the framework we propose can handle revision of theories in the same way as the framework that inspired it handles complex modifications of normative codes [10]. An aspect we did not consider here is how to extend the temporal framework to reason with actions and their duration. This matter is left for future work.
References
1. R.H. Bordini, A.L.C. Bazzan, R. de O. Jannone, D.M. Basso, R.M. Vicari, and V.R. Lesser. AgentSpeak(XL): Efficient intention selection in BDI agents via decision-theoretic task scheduling. In AAMAS'02, pp. 1294–1302, 2002.
2. M.E. Bratman. Intentions, Plans and Practical Reason. Harvard University Press, 1987.
3. J. Broersen, M. Dastani, J. Hulstijn, and L. van der Torre. Goal generation in the BOID architecture. Cognitive Science Quarterly, 2:428–447, 2002.
4. M. Dastani, F. de Boer, F. Dignum, and J.-J. Meyer. Programming agent deliberation. In AAMAS'03, pp. 91–104. ACM Press, 2003.
5. M. Dastani, G. Governatori, A. Rotolo, and L. van der Torre. Preferences of agents in defeasible logic. In Australian AI 2005, pp. 695–704. Springer, 2005.
6. M. Dastani, G. Governatori, A. Rotolo, and L. van der Torre. Programming cognitive agents in defeasible logic. In LPAR 2005, pp. 621–636. Springer, 2005.
7. M. Dastani and L.W.N. van der Torre. Programming BOID-plan agents: Deliberating about conflicts among defeasible mental attitudes and plans. In AAMAS 2004, pp. 706–713, 2004.
8. M. Dastani, B. van Riemsdijk, F. Dignum, and J.-J. Meyer. A programming language for cognitive agents: Goal directed 3APL. In ProMAS'03, pp. 111–130. Springer, 2003.
9. A.S. d'Avila Garcez and L.C. Lamb. Reasoning about time and knowledge in neural symbolic learning systems. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, 2004.
10. G. Governatori, M. Palmirani, R. Riveret, A. Rotolo, and G. Sartor. Norm modifications in defeasible logic. In JURIX 2005, pp. 13–22. IOS Press, 2005.
11. G. Governatori, A. Rotolo, and G. Sartor. Temporalised normative positions in defeasible logic. In ICAIL 2005, pp. 25–34. ACM Press, 2005.
12. K.V. Hindriks, F.S. de Boer, W. van der Hoek, and J.-J. Meyer. Agent programming in 3APL. Autonomous Agents and Multi-Agent Systems, 2(4):357–401, 1999.
13. M. d'Inverno and M. Luck.
Engineering AgentSpeak(L): A formal computational model. Journal of Logic and Computation, 8:1–27, 1998.
14. A.S. Rao. AgentSpeak(L): BDI agents speak out in a logical computable language. Technical report, Australian Artificial Intelligence Institute, 1996.
15. M.B. van Riemsdijk, M. Dastani, and J.-J. Meyer. Semantics of declarative goals in agent programming. In AAMAS 2005, pp. 133–140. ACM Press, 2005.
Compact Preference Representation for Boolean Games

Elise Bonzon, Marie-Christine Lagasquie-Schiex, and Jérôme Lang
IRIT, UPS, F-31062 Toulouse Cédex 9, France
{bonzon, lagasq, lang}@irit.fr
Abstract. Boolean games, introduced by [15,14], allow for expressing compactly two-player zero-sum static games with binary preferences: an agent's strategy consists of a truth assignment of the propositional variables she controls, and a player's preferences are expressed by a plain propositional formula. These restrictions (two players, zero-sum, binary preferences) strongly limit the expressivity of the framework. While the first two can easily be lifted by defining the agents' preferences as an arbitrary n-uple of propositional formulas, relaxing the last one requires coupling Boolean games with a propositional language for compact preference representation. In this paper, we consider generalized Boolean games where players' preferences are expressed within two such languages: prioritized goals and propositionalized CP-nets.
1 Introduction

The framework of Boolean games, introduced by [15,14], allows for expressing compactly two-player zero-sum static games with binary preferences: an agent's strategy consists of a truth assignment of the propositional variables she controls, and a player's preferences are expressed by a plain propositional formula. Arguably, these three restrictions (two players, zero-sum, binary preferences) strongly limit the expressivity of the framework. The first two can easily be lifted by defining the agents' preferences as an arbitrary n-uple of propositional formulas (see [3], which addresses complexity issues for these binary n-player Boolean games). In this paper we focus on the third one, which requires considerably more work. The starting point of our paper is that whereas a single propositional formula (goal) ϕ cannot express more than a binary preference relation on interpretations (models of ϕ are strictly better than models of ¬ϕ), expressing arbitrary (non-binary) preferences within a propositional framework is possible by making use of a propositional language for compact preference representation. The study of such languages has been a very active research area in the AI community for some years. Several classes of languages based on propositional logic have been proposed and studied (see for instance [16,8] for an overview of these languages). A first question has to be addressed before going further: should agents' preferences be expressed in a numerical way or in an ordinal way? This depends a lot on the notions we want to deal with. While some notions (such as pure Nash equilibria and dominated strategies) can be defined in a purely ordinal setting, others (such as mixed-strategy Nash equilibria) need quantitative (real-valued) preferences. Here we choose to stick to ordinal settings (we leave numerical preferences in Boolean games for further

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 41–50, 2006.
© Springer-Verlag Berlin Heidelberg 2006
work; see Section 5), and we successively integrate Boolean games with two of these languages: first prioritized goals, and then (propositionalized) CP-nets. In Section 2 we give some background and define n-player, non-zero-sum Boolean games with binary preferences. Boolean games are then enriched with prioritized goals in Section 3, and with propositionalized CP-nets in Section 4. Section 5 discusses related work and further issues.
2 n-Players Boolean Games

Let V = {a, b, . . .} be a finite set of propositional variables and LV be the propositional language built from V with the usual connectives and the Boolean constants ⊤ (true) and ⊥ (false). Formulas of LV are denoted by ϕ, ψ, etc. A literal is a formula of the form x or ¬x, where x ∈ V. A term is a consistent conjunction of literals. 2V is the set of interpretations for V, with the usual meaning that an interpretation M gives the value true to a variable x if x ∈ M, and the value false otherwise. |= denotes classical logical consequence. Let X ⊆ V. 2X is the set of X-interpretations. A partial interpretation (for V) is an X-interpretation for some X ⊆ V. Partial interpretations are denoted by listing all variables of X, with a bar over a variable set to false: for instance, if X = {a, b, d}, then the X-interpretation M = {a, d} is denoted ab̄d. If {V1, . . . , Vp} is a partition of V and M1, . . . , Mp are partial interpretations with Mi ∈ 2Vi, then (M1, . . . , Mp) denotes the interpretation M1 ∪ . . . ∪ Mp. Given a set of propositional variables V, a Boolean game on V [15,14] is a zero-sum game with two players (1 and 2), where the actions available to each player consist in assigning a truth value to each variable in a given subset of V. The utility functions of the two players are represented by a propositional formula ϕ formed upon the variables in V and called the Boolean form of the game.¹ ϕ represents the goal of Player 1: her payoff is 1 when ϕ is satisfied, and 0 otherwise. Since the game is zero-sum,² the goal of Player 2 is ¬ϕ. This simple framework can be extended in a straightforward way to non-zero-sum n-player games (see [3], especially for complexity issues): each player i has a goal ϕi (a formula of LV); her payoff is 1 when ϕi is satisfied, and 0 otherwise.

Definition 1. An n-players Boolean game is a 4-uple (A, V, π, Φ), where A = {1, 2, . . . , n} is a set of players, V is a set of propositional variables, π : A → 2V is a control assignment function, and Φ = ⟨ϕ1, . . . , ϕn⟩ is a collection of formulas of LV.

The control assignment function π associates with every player the variables that she controls. For ease of notation, the set of variables controlled by i is written πi instead of π(i). We require that each variable be controlled by one and only one agent, i.e., {π1, . . . , πn} forms a partition of V. The original definition by [15,14] is the special case of this more general framework obtained by letting n = 2 and ϕ2 = ¬ϕ1.
¹ The original definition in [15,14] is inductive: a Boolean game consists of a finite dynamic game. We use here the equivalent, simpler definition of [11], who showed that this tree-like construction is unnecessary.
² Stricto sensu, the obtained games are not zero-sum but constant-sum (the sum of utilities being 1); the difference is irrelevant and we use the terminology "zero-sum" nevertheless.
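The definitions above can be made concrete with a small sketch in which goals are Boolean predicates over strategy profiles; this encoding is ours and elides the propositional syntax:

```python
from itertools import product

def payoff(goal, profile):
    """A player's payoff: 1 if her goal is satisfied by the profile, else 0."""
    return 1 if goal(profile) else 0

# Original two-player zero-sum case: player 1 controls a, player 2 controls b;
# player 1's goal phi is "a <-> b", and player 2's goal is its negation.
goal1 = lambda m: m["a"] == m["b"]
goal2 = lambda m: not goal1(m)

# Payoffs sum to 1 on every profile (constant-sum, cf. footnote 2).
checks = []
for a, b in product([True, False], repeat=2):
    m = {"a": a, "b": b}
    checks.append(payoff(goal1, m) + payoff(goal2, m))
print(checks)  # [1, 1, 1, 1]
```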
Definition 2. Let G = (A, V, π, Φ). A strategy si for a player i is a πi-interpretation. A strategy profile S for G is an n-uple S = (s1, s2, . . . , sn) where for all i, si ∈ 2πi.

In other words, a strategy for i is a truth assignment for all the variables i controls. Remark that since {π1, . . . , πn} forms a partition of V, a strategy profile S is an interpretation for V, i.e., S ∈ 2V. Ω denotes the set of all strategy profiles for G. The following notations are usual in game theory. Let G = (A, V, π, Φ) and let S = (s1, . . . , sn), S′ = (s′1, . . . , s′n) be two strategy profiles for G. s−i denotes the projection of S on A \ {i}: s−i = (s1, s2, . . . , si−1, si+1, . . . , sn). Similarly, π−i denotes the set of variables controlled by all players except i: π−i = V \ πi. Finally, (s−i, s′i) denotes the strategy profile obtained from S by replacing si with s′i without changing the other strategies: (s−i, s′i) = (s1, s2, . . . , si−1, s′i, si+1, . . . , sn).

Example 1. We consider here a Boolean n-players version of the well-known prisoners' dilemma. n prisoners (denoted by 1, . . . , n) are kept in separate cells. The same proposal is made to each of them: "Either you cover your accomplices (Ci, i = 1, . . . , n) or you denounce them (¬Ci, i = 1, . . . , n). Denouncing sets you free while your partners will be sent to prison (except those who denounced you as well; these ones will be freed too). But if none of you chooses to denounce, everyone will be freed.³" This can be expressed compactly by the following n-players Boolean game G = (A, V, π, Φ): A = {1, 2, . . . , n}; V = {C1, . . . , Cn}; and, for every i ∈ {1, . . . , n}, πi = {Ci} and ϕi = (C1 ∧ C2 ∧ . . . ∧ Cn) ∨ ¬Ci. Here is the representation of this game in normal form for n = 3, where in each (x, y, z), x, y and z represent the payoffs of players 1, 2 and 3, respectively (player 1 chooses the row, player 2 the column, and player 3 the half of the table):

             strategy of 3: C3        strategy of 3: C̄3
           C2          C̄2           C2          C̄2
  C1    (1, 1, 1)   (0, 1, 0)     (0, 0, 1)   (0, 1, 1)
  C̄1    (1, 0, 0)   (1, 1, 0)     (1, 0, 1)   (1, 1, 1)

The explicit representation of this game in normal form would need exponential space, which illustrates the succinctness power of a representation by Boolean games. Each player i has two possible strategies: si1 = {Ci} and si2 = {C̄i}. There are 8 strategy profiles for G. Consider S1 = (C1, C2, C3) and S2 = (C̄1, C2, C3). Under S1, players 1, 2 and 3 all have their goal satisfied, while S2 satisfies only Player 1's goal.

This choice of binary utilities (where agents can only express plain satisfaction or plain dissatisfaction, with no intermediate levels) is a real loss of generality. We would now like to allow for associating an arbitrary preference relation on Ω with each player. A preference relation ⪰ is a reflexive and transitive binary relation (not necessarily complete) on Ω. The strict preference ≻ associated with ⪰ is defined as usual by S1 ≻ S2 if and only if S1 ⪰ S2 and not (S2 ⪰ S1). A generalized Boolean game will be a 4-uple G = (A, V, π, Φ), where A = {1, . . . , n}, V and π are as before and Φ = ⟨Φ1, . . . , Φn⟩, where each Φi is a compact representation (in some preference representation language) of the preference relation ⪰i of agent i on Ω. We let PrefG = ⟨⪰1, . . . , ⪰n⟩.
³ The case where everyone will be freed if everyone denounces the others is a side effect of our simplification of the prisoners' dilemma.
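To make the succinctness claim concrete, the following sketch (not from the paper; function and variable names are ours) expands the O(n)-sized Boolean prisoners' dilemma into its 2^n-entry normal form, encoding a profile as a tuple of booleans where entry i means "Ci is true":

```python
from itertools import product

def goal(i, profile):
    """Goal of player i: (C1 ∧ ... ∧ Cn) ∨ ¬Ci -- everyone covers, or i denounces."""
    return all(profile) or not profile[i]

def normal_form(n):
    """Expand the O(n)-sized Boolean game into its 2^n-entry normal form."""
    return {p: tuple(int(goal(i, p)) for i in range(n))
            for p in product([True, False], repeat=n)}  # p[i] == "Ci is true"

table = normal_form(3)
assert len(table) == 8                            # 2^n profiles for n = 3
assert table[(True, True, True)] == (1, 1, 1)     # all cover: every goal holds
assert table[(False, True, True)] == (1, 0, 0)    # only player 1 denounces
```

The Boolean game needs only the n goal formulas, whereas the expanded table grows as 2^n, which is the succinctness gap discussed above.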
E. Bonzon, M.-C. Lagasquie-Schiex, and J. Lang
A pure-strategy Nash equilibrium (PNE) is a strategy profile such that each player's strategy is an optimal response to the other players' strategies. However, PNEs are classically defined for games where preferences are complete, which is not necessarily the case here. Therefore we have to define two notions of PNE, a weak one and a strong one (they are equivalent to the notions of maximal and maximum equilibria in [14]).

Definition 3. Let G = (A, V, π, Φ) and PrefG = ⟨⪰1, ..., ⪰n⟩ the collection of preference relations on Ω induced from Φ. Let S = (s1, ..., sn) ∈ Ω.
S is a weak PNE (WPNE) for G iff ∀i ∈ {1, ..., n}, ∀s′i ∈ 2^πi, (s′i, s−i) ⊁i (si, s−i).
S is a strong PNE (SPNE) for G iff ∀i ∈ {1, ..., n}, ∀s′i ∈ 2^πi, (s′i, s−i) ⪯i (si, s−i).
NEstrong(G) and NEweak(G) denote respectively the sets of strong and weak PNEs for G. Clearly, any SPNE is a WPNE, that is, NEstrong(G) ⊆ NEweak(G).
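Definition 3 can be checked mechanically by enumeration. The sketch below (ours, not the authors'; it assumes one controlled variable per player, and `geq(i, S1, S2)` is a hypothetical encoding of S1 ⪰i S2, possibly partial) tests the weak and strong conditions and recovers the equilibria of the 2-player prisoners' dilemma with binary goals:

```python
from itertools import product

def deviations(S, i):
    """Profiles obtained from S by changing only player i's variable."""
    for v in (True, False):
        if v != S[i]:
            yield S[:i] + (v,) + S[i + 1:]

def strictly_prefers(geq, i, S1, S2):
    return geq(i, S1, S2) and not geq(i, S2, S1)

def is_weak_pne(S, n, geq):
    """No player strictly prefers one of its deviations to S."""
    return not any(strictly_prefers(geq, i, D, S)
                   for i in range(n) for D in deviations(S, i))

def is_strong_pne(S, n, geq):
    """Every player weakly prefers S to each of its deviations."""
    return all(geq(i, S, D) for i in range(n) for D in deviations(S, i))

# Binary goals of the 2-player prisoners' dilemma: (C1 ∧ C2) ∨ ¬Ci.
sat = lambda i, S: all(S) or not S[i]
geq = lambda i, S1, S2: sat(i, S1) >= sat(i, S2)

spnes = [S for S in product([True, False], repeat=2) if is_strong_pne(S, 2, geq)]
assert spnes == [(True, True), (False, False)]
```

Since the binary-goal preference used here is complete, the weak and strong notions coincide on this example; they differ only when `geq` leaves some profiles incomparable.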
3 Boolean Games and Prioritized Goals

The preferences of a single player in this framework are expressed by a set of goals ordered by a priority relation:

Definition 4. A prioritized goal base Σ is a collection ⟨Σ1; ...; Σp⟩ of sets of propositional formulas. Σj represents the set of goals of priority j, with the convention that the smaller j, the higher the priority of the formulas in Σj.

In this context, several criteria can be used in order to induce a preference relation from Σ. We recall below the three most common ones. In the following, if S is an interpretation of 2^V then we let Sat(S, Σj) = {ϕ ∈ Σj | S ⊨ ϕ}.

Definition 5. Let Σ = ⟨Σ1; ...; Σp⟩, and let S and S′ be two interpretations of 2^V.
Discrimin preference relation [7,13,2]: S ≻disc S′ iff ∃k ∈ {1, ..., p} such that Sat(S, Σk) ⊃ Sat(S′, Σk) and ∀j < k, Sat(S, Σj) = Sat(S′, Σj).
Leximin preference relation [10,2,17]: S ≻lex S′ iff ∃k ∈ {1, ..., p} such that |Sat(S, Σk)| > |Sat(S′, Σk)| and ∀j < k, |Sat(S, Σj)| = |Sat(S′, Σj)|.
Best-out preference relation [10,2]: Let a(S) = min{j | ∃ϕ ∈ Σj, S ⊭ ϕ}, with the convention min(∅) = +∞. Then S ⪰bo S′ iff a(S) ≥ a(S′).

Note that ⪰bo and ⪰lex are complete preference relations, while ⪰disc is in general a partial preference relation. Moreover, the following implications hold (see [2]):

(S ≻bo S′) ⇒ (S ≻disc S′) ⇒ (S ≻lex S′)   (1)
(S ⪰disc S′) ⇒ (S ⪰lex S′) ⇒ (S ⪰bo S′)   (2)
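The three criteria of Definition 5 translate directly into code. In this sketch (ours; goals are modelled as Python predicates over an interpretation given as a dict, an encoding we chose for illustration), we compare two interpretations under player 1's two-stratum base ⟨{a}; {¬b, c}⟩ from Example 2 below:

```python
def sat(S, stratum):
    """Sat(S, Σj): the subset of a stratum's goals satisfied by interpretation S."""
    return frozenset(phi for phi in stratum if phi(S))

def discrimin_strict(S, Sp, sigma):
    """S ≻disc S': strict superset of satisfied goals at the first stratum where they differ."""
    for stratum in sigma:
        a, b = sat(S, stratum), sat(Sp, stratum)
        if a != b:
            return a > b        # frozenset '>' is proper superset
    return False

def leximin_strict(S, Sp, sigma):
    """S ≻lex S': strictly more satisfied goals at the first stratum where counts differ."""
    for stratum in sigma:
        a, b = len(sat(S, stratum)), len(sat(Sp, stratum))
        if a != b:
            return a > b
    return False

def bestout_level(S, sigma):
    """a(S): index of the first stratum containing a violated goal (min ∅ = +∞)."""
    for j, stratum in enumerate(sigma, start=1):
        if any(not phi(S) for phi in stratum):
            return j
    return float('inf')

# Player 1's base: Σ1 = ⟨{a}; {¬b, c}⟩
sigma1 = [[lambda S: S['a']],
          [lambda S: not S['b'], lambda S: S['c']]]

S, Sp = {'a': True, 'b': False, 'c': True}, {'a': True, 'b': True, 'c': True}
assert discrimin_strict(S, Sp, sigma1) and leximin_strict(S, Sp, sigma1)
assert bestout_level(S, sigma1) > bestout_level(Sp, sigma1)   # S ≻bo S' as well
```

On this pair the three criteria agree; they come apart when the satisfied-goal sets at the first differing stratum are incomparable (discrimin undecided) while their cardinalities still differ (leximin decided), which is exactly the refinement expressed by implications (1) and (2).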
Definition 6. A PG-Boolean game is a 4-tuple G = (A, V, π, Φ), where Φ = ⟨Σ1, ..., Σn⟩ is a collection of prioritized goal bases. We write Σi = ⟨Σi^1; ...; Σi^p⟩, that is, Σi^j denotes the stratum j of Σi, or equivalently, the (multi)set of goals of priority j for player i.

Note that the assumption that the number of priority levels is the same (p) for all players entails no loss of generality, as adding empty strata to a prioritized base does not change the induced preference relation. We make use of the following notations:
– if G is a PG-Boolean game and c ∈ {disc, lex, bo}, then PrefG^c = ⟨⪰1^c, ..., ⪰n^c⟩;
– NEweak^c(G) and NEstrong^c(G) denote respectively the sets of all weak and strong Nash equilibria for PrefG^c.

Example 2. Let G = (A, V, π, Φ) with A = {1, 2}, V = {a, b, c}, π1 = {a, c}, π2 = {b}, Σ1 = ⟨a; (¬b, c)⟩, Σ2 = ⟨(¬b, ¬c); ¬a⟩. For each of the three criteria c ∈ {lex, disc, bo}, we draw the corresponding preference relations PrefG^c = ⟨⪰1^c, ..., ⪰n^c⟩. The arrows are oriented from more preferred to less preferred strategy profiles (S1 → S2 denotes that S1 is preferred to S2). To make the figures clearer, we do not draw edges that are obtained from others by transitivity. The dotted arrows indicate the links taken into account in order to compute the Nash equilibria.
[Figure: the discrimin, leximin and best-out preference relations of players 1 and 2 over the eight strategy profiles.]
– Discrimin and leximin: NEweak^disc(G) = NEstrong^disc(G) = {ab̄c}
– Best-out: NEweak^bo(G) = NEstrong^bo(G) = {abc, ab̄c}
Lemma 1. Let ⪰ = ⟨⪰1, ..., ⪰n⟩ and ⪰′ = ⟨⪰′1, ..., ⪰′n⟩ be two collections of preference relations, and let S be a strategy profile.
1. If ⪰ is contained in ⪰′ and if S is a SPNE for ⪰, then S is a SPNE for ⪰′.
2. If ≻ is contained in ≻′ and if S is a WPNE for ⪰′, then S is a WPNE for ⪰.

This lemma enables us to derive the following:

Proposition 1. Let G = (A, V, π, Φ) be a PG-Boolean game and PrefG^c = ⟨⪰1^c, ..., ⪰n^c⟩. Then NEstrong^disc(G) ⊆ NEstrong^lex(G) ⊆ NEstrong^bo(G) and NEweak^lex(G) ⊆ NEweak^disc(G) ⊆ NEweak^bo(G).

We may now wonder whether a PG-Boolean game can be approximated by focusing on the first k strata of each player. Here the aim is twofold: to obtain a game that is simpler (for PNE computation), and to increase the chance of finding a significant PNE by taking the most prioritized strata into account.

Definition 7. Let G = (A = {1, ..., n}, V, π, Φ) be a PG-Boolean game, and k ∈ {1, ..., p}. G^[1→k] = (A, V, π, Φ^[1→k]) denotes the k-reduced game of G, in which all players' goals are reduced to their first k strata: Φ^[1→k] = ⟨Σ1^[1→k], ..., Σn^[1→k]⟩.

Lemma 2. Let G be a PG-Boolean game. Then for every k ≤ p, c ∈ {disc, lex, bo}, and every i ∈ A, we have: S ⪰i^{c,[1→k]} S′ ⇒ S ⪰i^{c,[1→(k−1)]} S′ and S ≻i^{c,[1→(k−1)]} S′ ⇒ S ≻i^{c,[1→k]} S′.

Proposition 2. Let G be a PG-Boolean game and c ∈ {disc, lex, bo}. If S is a SPNE (resp. WPNE) for Pref^c of the k-reduced game G^[1→k], then S is a SPNE (resp. WPNE) for Pref^c of the game G^[1→(k−1)].
An obvious consequence of this proposition is that if G^[1] has no SPNE (resp. WPNE) for Pref^c, then the game G has no SPNE (resp. WPNE) for PrefG^c, whatever the criterion used. The converse is false, as the following example shows.

Example 3. Let G with A = {1, 2}, V = {a, b}, π1 = {a}, π2 = {b}, Σ1 = ⟨a → b; b → a⟩, Σ2 = ⟨a ↔ ¬b; ¬b⟩. We check that NEweak^bo(G) = NEstrong^bo(G) = ∅. Let us now focus on the 1-reduced game G^[1] = (A, V, π, Φ^[1]) of G. We have Σ1^[1] = ⟨a → b⟩, Σ2^[1] = ⟨a ↔ ¬b⟩. We check that for any criterion c, NEweak^c(G^[1]) = NEstrong^c(G^[1]) = {āb}. This example shows that Proposition 2 can be used to find the right level of approximation for a PG-game. For instance, we may want to focus on the largest k such that G^[1→k] has a SPNE, and similarly for WPNEs.
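Example 3 can be verified by brute force. The following sketch (ours; it uses a hypothetical predicate encoding of the goal bases and assumes one controlled variable per player, and under best-out, a complete relation, weak and strong PNEs coincide) computes the best-out equilibria of G and of its 1-reduced game:

```python
from itertools import product

def bo_level(S, strata):
    """Best-out rank a(S): index of the first stratum containing a violated goal."""
    for j, stratum in enumerate(strata, start=1):
        if any(not phi(S) for phi in stratum):
            return j
    return float('inf')

def bo_spnes(strata_per_player, owner):
    """Brute-force PNEs of a 2-variable PG-Boolean game under best-out.
    owner[v] tells which player controls variable v (one variable each here)."""
    profiles = [dict(zip('ab', bits)) for bits in product([True, False], repeat=2)]
    def is_spne(S):
        for i, strata in enumerate(strata_per_player):
            for v in 'ab':
                if owner[v] != i:
                    continue
                D = dict(S); D[v] = not D[v]          # unilateral deviation of player i
                if bo_level(S, strata) < bo_level(D, strata):
                    return False                      # i strictly prefers the deviation
        return True
    return [S for S in profiles if is_spne(S)]

# Example 3 (hypothetical encoding): player 1 controls a, player 2 controls b.
sigma1 = [[lambda S: (not S['a']) or S['b']],         # a -> b
          [lambda S: (not S['b']) or S['a']]]         # b -> a
sigma2 = [[lambda S: S['a'] == (not S['b'])],         # a <-> ¬b
          [lambda S: not S['b']]]                     # ¬b
owner = {'a': 0, 'b': 1}

assert bo_spnes([sigma1, sigma2], owner) == []                              # G: no PNE
assert bo_spnes([[sigma1[0]], [sigma2[0]]], owner) == [{'a': False, 'b': True}]
```

Truncating each base to its first stratum is exactly the passage from G to G^[1], which is why the second call just drops the second stratum of each player.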
4 Boolean Games and CP-Nets

A problem with prioritized goals is the difficulty, for the agent, of expressing her preferences (from a cognitive or linguistic point of view). In this section we consider another very popular language for compact preference representation on combinatorial domains, namely CP-nets. This graphical model exploits conditional preferential independence in order to structure the decision maker's preferences under a ceteris paribus assumption. CP-nets were introduced in [6] and extensively studied in many subsequent papers, especially [4,5].
Although CP-nets generally consider variables with arbitrary finite domains, for the sake of simplicity (and of homogeneity with the rest of the paper) we consider here only "propositionalized" CP-nets, that is, CP-nets with binary variables (note that this is not a real loss of generality, as all our definitions and results can easily be lifted to the more general case of non-binary variables).

Definition 8. Let V be a set of propositional variables and {X, Y, Z} a partition of V. X is conditionally preferentially independent of Y given Z if and only if ∀z ∈ 2^Z, ∀x1, x2 ∈ 2^X and ∀y1, y2 ∈ 2^Y we have: x1y1z ⪰ x2y1z iff x1y2z ⪰ x2y2z.

For each variable X, the agent specifies a set of parent variables Pa(X) that can affect her preferences over the values of X. Formally, X and V \ ({X} ∪ Pa(X)) are conditionally preferentially independent given Pa(X). This information is used to create the CP-net:

Definition 9. Let V be a set of variables. N = ⟨G, T⟩ is a CP-net on V, where G is a directed graph over V and T is a set of conditional preference tables CPT(Xj), one for each Xj ∈ V. Each CPT(Xj) associates a total order ⪰p^j with each instantiation p ∈ 2^{Pa(Xj)}.

Definition 10. A CP-Boolean game is a 4-tuple G = (A, V, π, Φ), where A = {1, ..., n} is a set of players, V = {x1, ..., xp} is a set of variables, and Φ = ⟨N1, ..., Nn⟩. Each Ni is a CP-net on V.

Example 4. G = (A, V, π, Φ) where A = {1, 2}, V = {a, b, c}, π1 = {a, b}, π2 = {c}. N1 and N2 are represented in the following figure.
[Figure: the CP-nets N1 and N2 with their conditional preference tables, and the partial pre-orders they induce over the eight strategy profiles.]
Using these partial pre-orders, the Nash equilibria are: NEstrong = NEweak = {abc}. The first property concerns a very interesting case in which both the existence and the uniqueness of a PNE hold:
Proposition 3. Let G = (A, V, π, Φ) be a CP-Boolean game such that the graphs Gi are all identical (∀i, j, Gi = Gj) and acyclic. Then G has one and only one strong PNE.

The proof of this result makes use of the forward sweep procedure [6,4] for outcome optimization (this procedure consists in instantiating the variables following an order compatible with the graph, choosing for each variable its preferred value given the values of its parents). The point is that in general the graphs Gi for i ∈ {1, ..., n} may not be identical. However, they can be made identical, once it is noticed that a CP-net ⟨G, T⟩ can be expressed as a CP-net ⟨G′, T′⟩ as soon as the set of edges of G is contained in the set of edges of G′. We may then take as the graph common to all players the graph whose set of edges is the union of the sets of edges of G1, ..., Gn. The only problem is that the resulting graph may not be acyclic, in which case Proposition 3 is not applicable. Formally:

Definition 11. Let G be a CP-Boolean game. For each player i, Gi is denoted by (V, Arci), with Arci being the set of edges of i's CP-net. The union graph of G is defined by G = (V, Arc1 ∪ ... ∪ Arcn). The normalized game equivalent to G, denoted by G* = (A, V, π, Φ*), is the game obtained from G by rewriting: the graph of each player's CP-net is replaced by the union graph of G, and the CPTs of each player's CP-net are modified in order to fit the new graph while keeping the same preferences (formally, if ⪰i^y denotes the relation associated with CPTi(y) for player i's CP-net in G, then for G* we have, for every x ∈ V such that x is a parent of y in G* but not in G, ⪰i,x^y = ⪰i,x̄^y = ⪰i^y).

The following lemma is straightforward:

Lemma 3. Let G be a CP-Boolean game and G* its equivalent normalized game. Then G* and G define the same preference relations on strategy profiles.

Therefore, if the common graph of G* is acyclic, then Proposition 3 applies and G* has one and only one SPNE.
Now, since G and G* define the same pre-orders on Ω, this profile is also the only SPNE of G (on the other hand, if the union graph of G is cyclic, neither the existence nor the uniqueness of SPNEs is guaranteed).

Proposition 4. Let G = (A, V, π, Φ) be a CP-Boolean game. If the union graph of G is acyclic, then G has one and only one strong PNE.

Example 4, continued: The players' preferences in the normalized game G* (equivalent to G) are represented by the CP-nets given in Figure 1. The union graph is acyclic, therefore Proposition 3 can be applied and G has one and only one strong PNE (abc).

There is a last condition (less interesting in practice, because it is quite strong) guaranteeing the existence and uniqueness of a SPNE. This condition states that any variable controlled by an agent is preferentially independent of the variables controlled by the other agents (in other words, the parents of any variable controlled by a player i are also controlled by i). In this case, each agent is able to instantiate her variables in an unambiguously optimal way, according to her preferences.

Proposition 5. Let G = (A, V, π, Φ) be a CP-Boolean game such that for every player i ∈ A and for every v ∈ πi, we have Pa(v) ⊆ πi. Then G has one and only one SPNE.
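The forward sweep procedure underlying Proposition 3 is easy to sketch. In this illustration (ours, not the authors' code; since the figure's CPTs are not fully legible here, the concrete tables below are assumptions chosen so that the sweep reproduces the equilibrium abc of Example 4):

```python
def forward_sweep(order, owner, cpt):
    """Forward sweep over an acyclic common CP-net graph: instantiate the
    variables in an order compatible with the graph, giving each variable the
    value its *controlling* player prefers given the parents already set.
    cpt[(player, var)] maps a partial assignment to the preferred bool value."""
    S = {}
    for v in order:
        S[v] = cpt[(owner[v], v)](S)
    return S

# Assumed encoding of Example 4: player 0 controls a and b, player 1 controls c.
owner = {'a': 0, 'b': 0, 'c': 1}
cpt = {
    (0, 'a'): lambda S: True,      # assumption: player 1 unconditionally prefers a
    (0, 'b'): lambda S: True,      # assumption: player 1 unconditionally prefers b
    (1, 'c'): lambda S: S['b'],    # assumption: player 2 prefers c exactly when b holds
}
print(forward_sweep(['a', 'b', 'c'], owner, cpt))
```

Because each variable is fixed once, in topological order, by the only player who controls it, the resulting profile is the unique strong PNE whenever the common graph is acyclic, which is the content of Proposition 3.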
Fig. 1. CP-nets of Players 1 and 2's preferences for G*
5 Related Work and Conclusion

Apart from previous work on Boolean games [15,14,11], related work includes a few papers in which games with ordinal preferences are expressed within well-developed AI frameworks.

In [12], a game in normal form is mapped into a logic program with ordered disjunction (LPOD) in which each player owns a set of clauses encoding the player's preference over her possible actions given every possible strategy profile of the other players. It is shown that PNEs correspond exactly to the most preferred answer sets. The given translation suffers from a limitation, namely its size: the size of the LPOD is the same as that of the normal form of the game (since each player needs a number of clauses equal to the number of possible strategy profiles of the other players). However, this limitation is due to the way LPODs are induced from games, and could be overcome by allowing the players' preferences to be expressed by arbitrary LPODs (in the same spirit as our Section 3).

In [9], a strategic game is represented using a choice logic program, in which a set of rules expresses that a player will select a "best response" given the other players' choices. Then, for every strategic game, there exists a choice logic program such that the set of stable models of the program coincides with the set of Nash equilibria of the game. This property provides a systematic method to compute Nash equilibria for finite strategic games.

In [1], CP-nets are viewed as games in normal form and vice versa. Each player i corresponds to a variable Xi of the CP-net, whose domain D(Xi) is the set of actions available to the player. Preferences over a player's actions given the other players' strategies are then expressed in a conditional preference table. The CP-net expression of the game can sometimes be more compact than its explicit normal-form representation, provided that some players' preferences depend only on the actions of a subset of the other players.
A first important difference with our framework is that we allow players to control an arbitrary set of variables, and thus we do not view players as variables; the only way of expressing in a CP-net that a player controls several variables would consist in introducing a new variable whose domain is the set of all combinations of values for these variables, and the size of the CP-net would then be exponential in the number of variables. A second important difference, which holds as well for the
comparison with [12] and [9], is that players can express arbitrary preferences, including extreme cases where the satisfaction of a player's goal may depend only on variables controlled by other players. A last (less technical and more foundational) difference with both lines of work, which actually explains the first two above, is that we do not map normal-form games into anything: we express games directly in a logical language. Further work includes the investigation of other notions (such as dominated strategies) within the two frameworks proposed in this paper, as well as the integration of other preference representation languages within Boolean games.
References

1. K. R. Apt, F. Rossi, and K. B. Venable. CP-nets and Nash equilibria. In Proc. CIRAS 2005 (Third International Conference on Computational Intelligence, Robotics and Autonomous Systems), Singapore, December 13-16, 2005.
2. S. Benferhat, C. Cayrol, D. Dubois, J. Lang, and H. Prade. Inconsistency management and prioritized syntax-based entailment. In Proc. of the 13th IJCAI, pages 640–645, 1993.
3. E. Bonzon, M.-C. Lagasquie-Schiex, J. Lang, and B. Zanuttini. Boolean games revisited. Available at ftp://ftp.irit.fr/pub/IRIT/RPDMP/ecai06.ps.gz, 2006.
4. C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, and D. Poole. CP-nets: a tool for representing and reasoning with conditional ceteris paribus preference statements. Journal of Artificial Intelligence Research, 21:135–191, 2004.
5. C. Boutilier, R. Brafman, C. Domshlak, H. Hoos, and D. Poole. Preference-based constrained optimization with CP-nets. Computational Intelligence, 20(2):137–157, 2004. Special issue on preferences.
6. C. Boutilier, R. I. Brafman, H. H. Hoos, and D. Poole. Reasoning with conditional ceteris paribus preference statements. In Proc. of UAI, 1999.
7. G. Brewka. Preferred subtheories: an extended logical framework for default reasoning. In Proc. of the 11th IJCAI, pages 1043–1048, 1989.
8. S. Coste-Marquis, J. Lang, P. Liberatore, and P. Marquis. Expressive power and succinctness of propositional languages for preference representation. In Proc. of the 9th KR, pages 203–212, 2004.
9. M. De Vos and D. Vermeir. Choice logic programs and Nash equilibria in strategic games. In J. Flum and M. Rodriguez-Artalejo, editors, Computer Science Logic (CSL'99), volume 1683 of LNCS, pages 266–276, 1999.
10. D. Dubois, J. Lang, and H. Prade. Inconsistency in possibilistic knowledge bases: to live with it or not live with it. In Fuzzy Logic for the Management of Uncertainty, pages 335–351. Wiley and Sons, 1992.
11. P. E. Dunne and W. van der Hoek. Representation and complexity in Boolean games. In Proc. of JELIA, LNCS 3229, pages 347–359, 2004.
12. N. Foo, T. Meyer, and G. Brewka. LPOD answer sets and Nash equilibria. In M. Maher, editor, Proceedings of the 9th Asian Computer Science Conference (ASIAN 2004), LNCS 3321, pages 343–351. Chiang Mai, Thailand, Springer, 2004.
13. H. Geffner. Default Reasoning: Causal and Conditional Theories. MIT Press, 1992.
14. P. Harrenstein. Logic in Conflict. PhD thesis, Utrecht University, 2004.
15. P. Harrenstein, W. van der Hoek, J.-J. Meyer, and C. Witteveen. Boolean games. In Proc. of TARK, pages 287–298, 2001.
16. J. Lang. Logical preference representation and combinatorial vote. Annals of Mathematics and Artificial Intelligence, 42:37–71, 2004.
17. D. Lehmann. Another perspective on default reasoning. Annals of Mathematics and Artificial Intelligence, 15:61–82, 1995.
Agent-Based Flexible Videoconference System with Automatic QoS Parameter Tuning

Sungdoke Lee¹, Sanggil Kang², and Dongsoo Han¹

¹
School of Engineering, Information and Communication University P.O. Box 77, Yuseong, Daejeon, 305-600 Korea {sdlee, dshan}@icu.ac.kr 2 Computer Science, Engineering of Information Technology The University of Suwon San 2-2, Wau-ri, Bongdam-eup, Hwaseong, Gyeonggi-do, 445-743 Korea
[email protected]

Abstract. In this paper, we propose a new agent-based flexible videoconference system (AVCS), obtained by modifying the videoconference manager (VCM) agent of a conventional flexible videoconference system (FVCS). The proposed AVCS can cope with changes in working conditions during videoconferencing more flexibly than the conventional FVCS, because an automatic parameter tuning algorithm is embedded in the VCM to dynamically adapt the QoS (Quality of Service) parameters so that the current working condition meets the user's desired working condition, which can change over time during videoconferencing. In the experimental section, we design a new structure of the VCM with the automatic parameter tuning module, embed it in the prototype of the FVCS, and implement the new AVCS. It is also shown experimentally that the proposed AVCS outperforms the existing FVCS.
1 Introduction
To maintain a stable conference session during videoconferencing [1]-[3] on various computer systems and network environments, users have to consider the working conditions, such as the status of system resources on both their own side and the other end, variations in the network service, and so on. Manually adjusting system parameters to keep the conference stable is sometimes a burden to novice users. In order to reduce this burden, the flexible videoconference system (FVCS) [4]-[7], a user support environment for videoconferencing based on a multi-agent system, has been studied, to the best of our knowledge, by only a few groups, including ours. An FVCS can be modeled by adding flexibility features to a traditional videoconference system (VCS). An FVCS can flexibly cope with changes in user requirements and in the system/network environment by adapting its quality of service (QoS). Existing adaptive QoS control mechanisms adjust the QoS parameters¹ such as smoothness, quality, and resolution by means of the strategic
¹ In this paper, "QoS parameter" and "parameter" are used interchangeably for convenience.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 51–60, 2006. c Springer-Verlag Berlin Heidelberg 2006
knowledge stored in an agent. We can identify several drawbacks in these mechanisms: one is that their problem-solving fashion is static, because they cannot repeat the adjustment process within the same session; the other is that they cannot automatically cope with changes in a user's preference over the parameters during videoconferencing. In order to overcome these drawbacks, we modify the architecture of the videoconference manager (VCM) in the conventional FVCS and develop an automatic parameter tuning module (APTM) that resides in the VCM. The APTM dynamically tunes the parameters so that the current working condition meets the user's desired working condition, which usually changes over time during videoconferencing. Using the gradient method [8], the parameters are trained in the parameter space until a set of optimal parameter values is obtained. In the APTM, the parameters are tuned differently according to the order of the user's preference over them. In the experimental section, we design a new structure of the VCM with the APTM embedded in the prototype of the FVCS and implement the proposed agent-based flexible videoconference system (AVCS). It is also shown that the proposed AVCS outperforms the existing FVCS in the experiment. The remainder of this paper is organized as follows. Section 2 overviews the background of videoconference systems and related work, which helps in understanding our system. Section 3 explains the proposed architecture of the VCM in the AVCS. In Section 4, we demonstrate our automatic parameter tuning algorithm. In Section 5, we describe the design of the proposed AVCS, analyze its operation, and present experimental results comparing it with the existing FVCS. In Section 6, we draw our conclusion.
2 Background and Related Work

2.1 Videoconference Systems
In traditional VCSs, there are two main methods for QoS control at the application level: IVS [1] and the framework-based approach [2]. IVS was developed for videoconferencing over the Internet. It controls the outgoing data transmission rate, based on information about changes in the network condition, by adjusting the parameters of the video coders. It can also accommodate simple user requirements through a policy defined on QoS control. The framework-based approach addresses two fundamental challenges in constructing "network-aware" applications: how to detect dynamic changes in the quality of the network service, and how to translate network-centric quality measures into application-centric ones. The two methods explained above are simple to execute; however, they have two critical limitations: (L1) Flexibility - they are designed for a specific network environment and their QoS control mechanism is simple, which is the main reason why they lack flexibility in unexpected situations. (L2) Cooperation - in a traditional VCS, the two users do not exchange information about the working conditions, such as the status of CPU resources, the network condition, and the user requirements on QoS, at the
other side. By cooperating during videoconferencing, a stable conference can be maintained with the help of the other side in cases where the computer on one's own side cannot adapt to an unexpected situation. The cooperation between the two sides can be carried out by the cooperation protocol [5]. In order to overcome the limitations of the traditional QoS control mechanisms, the multiagent-based flexible videoconference system has been developed, as explained in the following section.

2.2 Flexible Videoconference System
The conventional FVCS has been developed with the aim of providing a user-centered communication environment based on agent-based computing technology [4]-[7]. The objective of the FVCS is to reduce the burden on users of traditional VCSs by making effective use of traditional VCS software and of the expertise of VCS designers/operators. To lighten users' burdens, the FVCS embeds the following functionality into existing VCSs: (F1) a service configuration function at the start of a session, and (F2) a service tuning function during the session. Here, (F1) automatically composes the most suitable service configuration of the VCS by selecting the best software modules and deciding their set-up parameters under the given conditions of the environment and the users. (F2) adjusts the QoS autonomously according to changes in the network/computational environment or in the user requirements on QoS. This function is realized by two-phase tuning operations: parameter tuning for small-scale changes, and reconfiguration of the videoconference service for large-scale changes. The FVCS consists of four kinds of agents: the Videoconference Manager (VCM) Agent, the User Agent, the Sensor Agent, and the Service Agent. The User Agent monitors the requirements of a user. The Sensor Agent monitors the status of the network environment and of CPU resources. The Service Agent executes the videoconference services such as video, audio, whiteboard, etc., in accordance with the requirements from the User Agent or the VCM Agent. The VCM Agent manages the other agents, e.g., it analyzes the other agents' working conditions, controls the message flow among agents, cooperates with the peer VCM agent (Video-Conf-Manager-A and Video-Conf-Manager-B), and tunes the QoS parameters (F2). In general, QoS parameter tuning is driven by the following steps: 1) changes during the videoconference are detected by the Sensor Agents or the User Agents and reported to the VCM Agents.
2) The VCM Agents negotiate with each other to decide suitable operations for the videoconference process agents. 3) The videoconference process agents set the parameters. 4) The Sensor Agents check the recovery status and report it to the VCM. 5) Steps 2) to 4) are repeated until the changes are recovered. In the existing VCM Agents, a rule-based programming language method [4]-[7] is used for adjusting the QoS parameters. The rule-based method is performed by the knowledge-base selection algorithm (KBSA), which selects an appropriate value for each parameter in a trial-and-error fashion from the knowledge database. The KBSA has several drawbacks: 1) designers must know ahead of time all possible domain knowledge, i.e., the combinations
Fig. 1. The model of VCM Agent
of the values of the QoS parameters. 2) The parameter selection process cannot be repeated within the same session. 3) The process cannot automatically cope with changes in a user's preference over the QoS parameters during videoconferencing. In order to solve these problems, we develop an automatic parameter tuning algorithm and design a new VCM architecture for it, which is explained in detail in the following section.
3 Architecture of VCM Agent
In this section, we design a model of the VCM agent that accommodates the automatic parameter tuning algorithm explained in detail below. As shown in Fig. 1, the VCM is composed of four parts: the Cooperation Module (CM), the State Knowledge Module (SKM), the Automatic Parameter Tuning Module (APTM), and the Base Process Module (BPM). The Cooperation Module (CM) exchanges information on the working conditions, such as available CPU resources, bandwidth, user requirements, etc., with the SKM on a regular basis using specific protocols. According to the information from the SKM, the CM decides whether to request cooperation from the CM on the other user's side. If the CM is not able to resolve a change in the working condition by itself within a time limit, it sends a cooperation message, using a specific protocol, to the CM of the VCM Agent at the other end. The State Knowledge Module (SKM) is responsible not only for detecting any change in the working network environment, but also for storing it in the repository and reporting it to the CM. If a change is detected in the working environment (or in the status of the computer resources) of its own computer, the SKM notifies the APTM of the change and activates it. A user's specific requirement for service quality is also propagated to the APTM via the SKM. As seen in Fig. 2, the SKM is composed of two parts: the assemble environment and the repository. The assemble environment sets up the initialization of the videoconference, based on the information on working environments obtained from previous videoconferences and stored in the repository. During the videoconference, it can detect any change in the environment and send it to the
Fig. 2. The structure of State Knowledge Module
repository. If the QoS parameters need to be adjusted, the information on the status of the environment is sent to the APTM. The APTM then maintains an appropriate working environment by tuning the QoS parameters. The automatic tuning algorithm is explained in the following section. The Base Process Module (BPM) controls the service agents, including video, audio, and whiteboard.
4 Automatic Parameter Tuning Algorithm
Based on the information sent from the SKM, the APTM activates its parameter tuning algorithm. The parameters are tuned differently according to the order of the user's preference over them, so that the working condition (CPU resources, network bandwidth, etc.) meets the desired working condition, which can be changed by the user during videoconferencing. Using the gradient ascent or descent method, the parameters are trained in the parameter space until a set of optimal parameter values is obtained. The training aims at reaching an acceptable working condition by iterating the tuning process over the parameter space whenever the current working condition changes. The adjustment is done by the automatic scheme demonstrated in the following. Let d(t) and y(t) denote the desired working condition (or an acceptable working condition) and the current working condition at time t, respectively. y(t) can be expressed as a function of the set of QoS parameters, because the current resource depends on the values of the parameters, as in Eq. (1); the available resource decreases if the parameter values increase, and vice versa:

y(t) = f(A(t))   (1)

where A(t) = [a1(t), a2(t), ..., ai(t), ..., an(t)] is the set of parameters and ai(t) is parameter i at time t. The error e(t) between the desired working condition and the current working condition at time t can be expressed as:

e(t) = |d(t) − y(t)|   (2)
As seen in Eq. (2), the value of e(t) depends on y(t), which in turn depends on A(t). The parameters must be adjusted to values for which the error becomes zero or close to it. The parameters after one iteration of adjustment can be expressed as A(t + 1) = [a1(t + 1), a2(t + 1), ..., ai(t + 1), ..., an(t + 1)], where the "+1" denotes one iteration of the adjustment and ai(t + 1) is given by:

ai(t + 1) = ai(t) + Δai(t)    (3)

where Δai(t) is the amount of adjustment for parameter ai(t) at that iteration. For convenience, Δai(t) is set to a predetermined ratio of ai(t), i.e.,

Δai(t) = ρi · ai(t)    (4)

where ρi is a scaling factor. The magnitude of the scaling factor reflects the user's preference among the parameters: the larger the scaling factor of a parameter, the higher the quality of service the user requires for it relative to the others. For example, if the parameters ordered from lowest to highest preference are a1(t), a2(t), ..., ai(t), ..., an(t), then their scaling factors are ordered ρn, ρn−1, ..., ρi, ..., ρ1. Suitable scaling factor values for a given preference order can be obtained from exhaustive empirical experience, as demonstrated in the experimental section. Also, as seen in Eqs. (3) and (4), the sign of the scaling factor determines the search direction in the parameter space: if the sign is positive, the parameter is adapted in the direction of increasing values, and vice versa. The sign can change during the tuning process, which makes our algorithm capable of dynamically tuning the parameters toward a desired working condition that varies more than once within a videoconference session. After the first iteration, the current working condition and the error become

y(t + 1) = f(A(t + 1))    (5)

and

e(t + 1) = |d(t) − y(t + 1)|    (6)
If e(t + 1) reaches a satisfactory value, the tuning process ends; otherwise the adjustment continues until the error is tolerable. As seen in Eqs. (4)–(6), the value of the scaling factor is critical for the tuning process. If the scaling factor is too large, tuning is fast but may oscillate around the optimal parameter values as the iterations proceed; if it is too small, tuning is slow but converges to a set of optimal parameter values. To avoid this problem, the amount of adjustment is decreased as the iterations proceed, as in Eq. (7):

Δai(t + k) = Δai(t + k − 1) / 2^(k−1)    (7)

where k is the iteration number.
5 Experiment and Analysis

5.1 Experimental Environment
The proposed VCM architecture with the automatic parameter tuning algorithm is embedded into the AVCS and implemented in the experimental environment shown in Fig. 3. We designed a new VCM agent by modifying the repository and knowledge base parts of the ADIPS framework [9], which serves as the agent-based computing infrastructure. In the BPM, the videoconferencing tool vic [3] is used to control the service agents according to the tuned QoS parameters, namely frame rate, quality, and resolution. Among the service agents, the Video-A and Video-B agents provide the information from vic to the VCM agent so that the APTM can execute the tuning process. The CPU-Monitor agent uses the standard Unix command "sar" to obtain CPU status information, which is sent to the VCM agent. The VCM agent, Sensor agent, and Service agents were written in Tcl/Tk [10]. A SPARCstation Ultra 1 (200 MHz CPU clock) was used as the hardware for the agent workspace and the videoconference terminal. The total size of the agents is about 1,800 lines.
Fig. 3. The experimental environment
5.2 Experimental Results by Variation of CPU Resources
In this section, we observe the behaviors of the proposed AVCS and the conventional FVCS as CPU resources or user requirements change during videoconferencing. We add extra load to the CPU of WS-B on User-B's side with a CPU load generator and observe the changes in the QoS parameters (frame rate, encoding quality, and resolution) of the video processes served to User A and User B. For WS-A on User-A's side, frame rate in video movement is the highest-priority parameter, video quality the second highest, and video resolution the lowest. On the other hand, for WS-B on User-B's side, video quality is the highest priority, frame rate is the
Fig. 4. The behavior of the conventional FVCS according to the variation of CPU load: (a) Change of QoS at User-A, (b) Change of QoS at User-B
second highest priority, and resolution is the lowest priority. Figs. 4 and 5 show the transitions of the parameter values controlled by the VCM agents. In each graph, the x-axis represents time (seconds) and the y-axis represents each parameter value observed at the recipient site. The parameter values are expressed as percentages, with the following values regarded as 100%: CPU load 100%, frame rate 30 fps, quality level 32, resolution level 5. The parameter tuning algorithms are set to activate when CPU load exceeds 80%, and the acceptable level of CPU load is set to below 60%. Fig. 4 shows the experimental result for the rule-based programming method of the conventional FVCS [4]-[7] under varying CPU load. From Fig. 4(a), when CPU load is increased by the CPU load generator on the WS-B side at around 10 seconds (the first ) after the videoconference starts, CPU load reaches around 80% at around 60 seconds (point a) on the WS-A side. At that point the resolution, which has the lowest priority, is degraded first, causing CPU load to decrease. From Fig. 4(b), quality at point b on the WS-B side is degraded, and frame rate, which has the highest priority, is degraded slightly. CPU load decreases to some extent until around 240 seconds (the second ). At 240 seconds, CPU load starts to increase again on the WS-B side and exceeds 80% at point c in Fig. 4(a) and point d in Fig. 4(b). However, the parameters are not tuned any further, owing to the characteristics of the conventional FVCS. This result reveals that the tuning method of the conventional FVCS is static: it cannot flexibly cope with dynamic environment changes within one session, due to the lack of state knowledge stored in the VCM. In addition, the protocols for cooperation between agents may be inappropriate in some cases. Fig. 5 shows the experimental result for our proposed AVCS, in which the APTM is embedded in the VCM. The experimental conditions are the same as for the conventional FVCS. In this experiment, the parameters are tuned according to their priorities. As seen in Eq. (4), for WS-A the initial scaling factors are set to ρa1 = 0.196 for frame
Fig. 5. The behavior of the proposed AVCS according to the variation of CPU load: (a) Change of QoS at User-A, (b) Change of QoS at User-B
rate, ρa2 = 0.324 for quality, and ρa3 = 0.525 for resolution. For WS-B, ρa1 = 0.182 for quality, ρa2 = 0.351 for frame rate, and ρa3 = 0.463 for resolution. The initial values were obtained through exhaustive empirical experience. From Fig. 5(a), when CPU load is increased by the CPU load generator on the WS-B side at around 10 seconds (the first ) after the videoconference starts, CPU load reaches around 80% at around 55 seconds (point e) on the WS-A side. At that point, the parameters start to be tuned with the initial scaling factors by the automatic parameter tuning algorithm in the APTM. On the WS-B side, the parameters start to be tuned at around 65 seconds (point f), as seen in Fig. 5(b). CPU load decreases to around 40% by about 240 seconds (the second ). In the meantime, the parameters are tuned upward by automatically changing the tuning direction from negative to positive, in order to efficiently use the 20% CPU margin below the acceptable level (60%). At 240 seconds, CPU load starts to increase again on the WS-B side and exceeds 80% at point g in Fig. 5(a) and point h in Fig. 5(b). By the same process as in the first tuning step, the parameters are tuned again and reach optimal points at which CPU load stays at an acceptable level.
6 Conclusion

In this paper, we have proposed a new AVCS by designing a new VCM in which the automatic parameter tuning module (APTM) is embedded. From the experiments, it can be concluded that the proposed AVCS can cope more flexibly with changes in working conditions than the conventional FVCS. The conventional FVCS adjusts the QoS parameters in a static fashion because it cannot repeat the adjustment process within the same session. In contrast, the APTM can maintain a stable videoconference under varying working conditions by automatically tuning the QoS parameters using the gradient method.
Despite the encouraging results of our method, further studies are needed. In the experiments, we obtained the scaling factor values from exhaustive empirical experience, which does not guarantee that they are the optimal initial scaling factors. To provide more automatic parameter tuning, an algorithm that can decide the initial values of the scaling factors is needed. Moreover, both the proposed AVCS and the conventional FVCS were developed only for two-user videoconferences. We need to extend our method to multi-user videoconferences, which are becoming increasingly popular.
References

1. Turletti, T., Huitema, C.: Videoconferencing on the Internet. IEEE/ACM Trans. on Networking, Vol. 4, No. 3 (1996) pp. 340-351
2. Bolliger, J., Gross, T.: A framework-based approach to the development of network-aware applications. IEEE Trans. on Software Engineering, Vol. 24, No. 5 (1998)
3. McCanne, S., Jacobson, V.: vic: a flexible framework for packet video. ACM Multimedia, Nov. (1995) pp. 511-522
4. Suganuma, T., Kinoshita, T., Sugawara, K., Shiratori, N.: Flexible videoconference system based on ADIPS framework. Proc. of the 3rd International Conference and Exhibition on the Practical Application of Intelligent Agents and Multi-Agent Technology (PAAM98) (1998) pp. 83-100
5. Lee, S. D., Karahashi, T., Suganuma, T., Kinoshita, T., Shiratori, N.: Construction and evaluation of agent domain knowledge for flexible videoconference system. IEICEJ, Vol. J83-B (2000) pp. 195-206
6. Suganuma, T., Imai, S., Kinoshita, T., Shiratori, N.: A QoS control mechanism using knowledge-based multiagent framework. IEICE Trans. Information and Systems, Vol. E86-D, No. 8 (2003) pp. 1344-1355
7. Lee, S. D., Han, D. S.: Multiagent based adaptive QoS control mechanism in flexible videoconference system. ICACT 2004, Vol. II, Feb. (2004) pp. 745-750
8. Kang, S. G., Lim, J. Y., Kim, M. C.: Modeling the user preference on broadcasting contents using Bayesian networks. Journal of Electronic Imaging, Vol. 14, No. 2 [Online]: 0230221-10 (2005)
9. Kinoshita, T., Sugawara, K.: ADIPS framework for flexible distributed systems. Springer-Verlag Lecture Notes in AI, 1599 (1998) pp. 18-32
10. Ousterhout, J. K.: Tcl and the Tk Toolkit. Addison-Wesley (1994)
Kalman Filter Based Dead Reckoning Algorithm for Minimizing Network Traffic Between Mobile Game Users in Wireless GRID

Seong-Whan Kim and Ki-Hong Ko
Department of Computer Science, Univ. of Seoul, Jeon-Nong-Dong, Seoul, Korea
Tel.: +82-2-2210-5316, fax: +82-2-2210-5275
[email protected],
[email protected]

Abstract. Whereas conventional GRID services are static, the wireless GRID supports mobility, and it must maintain geographic position information to support efficient resource sharing and routing. When devices are highly mobile, much traffic is needed to exchange the geographic position information of each mobile node, which adversely affects battery usage and causes network congestion. To minimize the network traffic between mobile users, we can use a dead reckoning (DR) algorithm on each mobile node: each node estimates its own movement (as well as the other nodes' movements), and only when the estimation error exceeds a threshold does it send an UPDATE packet (including position, velocity, etc.) to the other devices. As estimation accuracy increases, each node can reduce the number of UPDATE packet transmissions. To improve the prediction accuracy of the DR algorithm, we propose a Kalman filter based DR approach, together with adaptive Kalman gain control to minimize the number of UPDATE packets sent to distant devices. To evaluate our scheme, we implemented it in each mobile node of a popular network game (BZFlag), and the results show that we achieve better prediction accuracy and a reduction of network traffic by 12 percent. Keywords: Dead reckoning, Kalman filter, Wireless GRID.
1 Introduction

Conventional GRID services support no mobility, resulting in drawbacks such as the need for continuous connection, waste of bandwidth, and service overloading. The wireless GRID supports mobility, and it must consider geographic position to support efficient resource sharing and routing [1]. However, if the devices in the GRID are highly mobile, much traffic is needed to manage the geographic position of each mobile node, which adversely affects battery usage. To minimize the network traffic between networked mobile devices, the dead reckoning (DR) technique is used [2]. Each mobile device uses the algorithm to estimate its own movement and the other devices' movements; thereby, each device can minimize the transmission of its information (position, velocity, etc.) to other entities.

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 61-70, 2006. © Springer-Verlag Berlin Heidelberg 2006

R. Gossweiler and R. J. Laferriere
62
S.-W. Kim and K.-H. Ko
introduced the DR algorithm for multi-user games [2], and S. Aggarwal and H. Banavar proposed the use of globally synchronized clocks among the participating players and a time-stamp augmented DR vector that enables the receiver to render entities accurately [3]. In addition, W. Cai and F. B. S. Lee proposed a multi-level threshold scheme, adaptively adjusted based on the relative distance between entities, to reduce the rate of UPDATE packet transmissions [4]. To improve the prediction accuracy of the DR algorithm, we propose the Kalman filter based DR approach. To simulate the mobility scenarios of mobile devices in a wireless GRID, we use a simple analogy: the network game BZFlag. In Section 2, we review DR and the Kalman filter. In Section 3, we propose a Kalman filter based DR algorithm. In Section 4, we apply our Kalman approach to the BZFlag game and show experimental results with minimized UPDATE packets between game players. We conclude in Section 5.
2 Related Works

In the wireless mobile GRID, the overall GRID area is partitioned into several square GRID regions. Each GRID region elects its own leader (gateway), and the leader performs location-aware routing by exploiting geographic information. For geographic location-aware routing, we can use GEOCAST, GEOTORA, and GEOGRID [10].

• GEOCAST: GEOCAST sends a message to all mobile devices within a designated geographic area (a so-called geo-cast region). It uses a forwarding zone and a multicast region: in the forwarding zone, data packets are sent by unicast to each device, and in the multicast region, data packets are sent by multicast to each device.
• GEOTORA: GEOTORA derives from TORA (temporally ordered routing algorithm). TORA maintains a DAG (directed acyclic graph) with the destination device as sink; data packets are forwarded along the DAG's direction to the sink. GEOTORA divides each region into a TORA (DAG) region and a GEOCAST region. In GEOCAST regions, mobile devices perform flooding, and in the DAG region, mobile devices perform an anycast from the source to any host.
• GEOGRID: GEOGRID uses two methods, flooding-based geo-casting and ticket-based geo-casting. Flooding-based geo-casting allows any grid leader in the forwarding zone to rebroadcast messages, whereas ticket-based geo-casting allows only ticket-holding grid leaders to rebroadcast.

For these location-aware routing schemes, it is important to maintain precise location information on the other devices. Maintaining exact geographic information for all other devices requires frequent location information exchange between devices, which can easily cause network congestion. A way to maintain accurate geographic information without heavy processing overhead is to use a form of location prediction on each device.
When mobile devices are physically distributed, each mobile device should maintain the other devices' geographic state for efficient resource sharing and routing. As the number of mobile devices increases, they must exchange more geographic information update requests with each other, which can generate a large amount of communication and thus saturate network bandwidth. To reduce the number of UPDATE packets, the DR technique is used [4]. In addition to the high-fidelity model that maintains the accurate positions of its entities, each mobile device also has a DR model that estimates the positions of all entities (both local and remote). Therefore, instead of transmitting state UPDATE packets, the estimated position of a remote mobile device is readily available through a simple and localized computation [4]. Each mobile device M compares its real position with its DR-estimated position; if the difference between them is greater than a threshold, M informs the other mobile devices so that they can update their geographic information about M [2]. A simple DR algorithm can be described as follows.
Algorithm: Dead Reckoning
for every received packet of other device do
    switch (received packet type) {
    case UPDATE:
        fix position information of other device;
        break;
    }
[Extrapolation]
Extrapolate all devices' (including my own) geographic information
based on the past state information;
if (my true position - my extrapolated position) > Threshold {
    Broadcast an UPDATE packet to all the other devices
}
3 Kalman Filter Based Dead Reckoning Algorithm

In a wireless GRID environment, mobile devices are geographically distributed. A technique referred to as DR is commonly used to exchange information about movement among the mobile devices [6, 7, 8]. Each mobile device sends information about its own movement, as well as the movement of the entities it controls, to the other mobile devices using DR UPDATE packets. An update packet typically contains the current position of the entity in terms of x, y, and z coordinates (at the time the packet is sent) as well as the trajectory of the entity in terms of the velocity component in each dimension [3]. In this paper, we use the mobility of network game users to simulate a real geographically distributed mobile device environment. For the network game, we
present a Kalman filter based DR to optimize the network traffic. A Kalman filter is a recursive procedure for estimating the state sk of a discrete-time controlled process governed by a linear stochastic difference equation, from a set of measured observations tk. The mathematical model is shown in Equations (1) and (2):

sk = A sk−1 + wk−1    (1)

tk = H sk + rk    (2)

The N×N matrix A is the state transition matrix, wk is an N×1 process noise vector with distribution N(0, σw²), tk is an M×1 measurement vector, H is the M×N measurement matrix, and rk is an M×1 measurement noise vector with distribution N(0, σr²). To estimate the process, the Kalman filter uses a form of feedback control as shown in Figure 1 [5]. We define ŝk⁻, ŝk, pk⁻, and pk as the a priori state estimate, a posteriori state estimate, a priori estimate error covariance, and a posteriori estimate error covariance, respectively; Kk is the Kalman gain.

Time Update (Predict):
    ŝk⁻ = A ŝk−1
    pk⁻ = A pk−1 Aᵀ + σw²

Measurement Update (Correct):
    Kk = pk⁻ Hᵀ [H pk⁻ Hᵀ + σr²]⁻¹
    ŝk = ŝk⁻ + Kk [tk − H ŝk⁻]
    pk = [I − Kk H] pk⁻

Fig. 1. Kalman filter cycle [5]
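For a single state variable with A = H = 1, the predict/correct cycle of Figure 1 collapses to a few lines. This scalar sketch is ours, not the authors' implementation; the structure and variable names are illustrative.

```c
#include <assert.h>

typedef struct {
    double q;   /* process noise variance, sigma_w^2 */
    double r;   /* measurement noise variance, sigma_r^2 */
    double s;   /* a posteriori state estimate */
    double p;   /* a posteriori estimate error covariance */
} Kalman1D;

/* One predict/correct cycle for a scalar state with A = H = 1. */
double kalman_step(Kalman1D *k, double t) {
    /* Time update (predict) */
    double s_pri = k->s;
    double p_pri = k->p + k->q;
    /* Measurement update (correct) */
    double K = p_pri / (p_pri + k->r);   /* Kalman gain */
    k->s = s_pri + K * (t - s_pri);
    k->p = (1.0 - K) * p_pri;
    return k->s;
}
```

Note that a larger measurement noise variance r yields a smaller gain K, so the filter trusts the model more than the measurements; scheme 4 below exploits exactly this knob for distant devices.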
The DR technique disseminates both the locations and the movement models of mobile entities in the network, so that every entity is able to predict or track the movement of every other entity with very low overhead. The accuracy of the location information affects system performance; however, frequent transmission of DR UPDATE packets for increased accuracy can overuse network bandwidth. To trade off the accuracy of location information against network bandwidth use, we can rely on the following observation, as used in [11]:

- As the distance between devices increases, the accuracy requirement for geographic information decreases (the "distance effect").

Figure 2 illustrates the distance effect. Entity N moves the same distance as entity M does. However, entity M is much farther from P than N is, so entity P does not need to
send an UPDATE packet to M to maintain accurate geographic information about M. A layer concept has been used to differentiate the accuracy of location information with distance [11]: each layer applies a different threshold to classify location accuracy. However, this requires processing overhead in each device, because UPDATE packets must be filtered according to the distance between device P and the destination device. Instead of using the layer concept, we maintain the distance between device P and every other device, and we set a different Kalman gain (through the allowable measurement noise) depending on that distance. For devices such as N and M in Figure 2, we set a higher observation error allowance on N; in other words, the measurement noise variance for the nearby entity N is higher than that for the faraway entity M. Since the Kalman gain is determined by the measurement noise variance, we can control the Kalman gain (via the measurement noise variance) to achieve the distance effect.
Fig. 2. Tradeoff between accuracy of geographic information and network traffic: (a) distance effect, and (b) density effect
To evaluate our scheme, we experimented with a simple DR scheme (scheme 1) and a DR algorithm optimized for the game logic (scheme 2). Starting from scheme 2, we added a Kalman filter for better prediction (scheme 3), and then added adaptive Kalman gain control on top of scheme 3 (scheme 4). The overall schemes are shown in Figure 3.
[Figure 3 shows the four schemes as block diagrams: scheme 1 feeds (x, y, z) through plain DR extrapolation; scheme 2 feeds (x, y, z), (vx, vy, vz), and angle through the BZFlag-optimized DR, producing extrapolated position, velocity, and angle; scheme 3 inserts a Kalman filter after the BZFlag DR; scheme 4 additionally adapts the Kalman gain depending on distance.]
Fig. 3. Kalman filter approach for DR algorithm
Scheme 1: We compute the extrapolated position from the last position, last velocity, and time step as follows. We keep extrapolating as long as the difference between the extrapolated position and the true position stays under the threshold.

extrapolated position = last position + last velocity * time step;

Scheme 2: To get a new extrapolated position, the scheme uses two equations depending on the game entity's motion type, as follows. Again, we extrapolate as long as the difference between the extrapolated position and the true position stays under the threshold.

if (linear motion) {
    extrapolated position = last position + last velocity * time step;
} else {
    extrapolated position = BZFlag function(angle);
}

Scheme 3: Scheme 3 adds a Kalman filter after computing the extrapolated position, velocity, and angle as in scheme 2. Our DR algorithm (scheme 3) is described as follows.

float speed = (vx * cosf(angle)) + (vy * sinf(angle));
// speed relative to the tank's direction
radius = (speed / angular_velocity);
float inputTurnCenter[2]; // tank turn center
float inputTurnVector[2]; // tank turn vector
inputTurnVector[0] = +sinf(last_angle) * radius;
inputTurnVector[1] = -cosf(last_angle) * radius;
inputTurnCenter[0] = last_position - inputTurnVector[0];
inputTurnCenter[1] = last_position - inputTurnVector[1];

// compute new extrapolated angle using Kalman filter
float angle = Kalman(time step * angular_velocity);
float cos_val = cosf(angle);
float sin_val = sinf(angle);

// compute new extrapolated position
const float* tc = inputTurnCenter;
const float* tv = inputTurnVector;
new_x = tc[0] + ((tv[0] * cos_val) - (tv[1] * sin_val));
new_y = tc[1] + ((tv[1] * cos_val) + (tv[0] * sin_val));
new_z = last_position + (vz * time step);

// compute new extrapolated velocity
float vx = Kalman((vx * cos_val) - (vy * sin_val));
float vy = Kalman((vy * cos_val) + (vx * sin_val));
float vz = Kalman(vz);
Scheme 4: Scheme 4 considers the distance effect to minimize the number of DR UPDATE packets. We achieve this minimization by controlling the Kalman gain. The Kalman gain depends on the allowable variance of the measurement noise, and we set a smaller Kalman gain as the distance between my device and another device grows. For example, the allowable variance is set to 0.001 for distant devices, 0.01 for middle distances, and 0.1 for nearby devices.
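The distance-dependent variance of scheme 4 amounts to a small lookup. Only the three variance values (0.001, 0.01, 0.1) come from the text above; the distance thresholds below are invented for illustration.

```c
#include <assert.h>

/* Map the distance to another device to the allowable measurement-noise
 * variance that determines the Kalman gain. Thresholds are in arbitrary
 * world units and are assumptions, not values from the paper. */
double measurement_variance(double distance) {
    const double near_limit = 10.0;
    const double far_limit = 50.0;
    if (distance >= far_limit)  return 0.001;  /* distant: smallest allowance */
    if (distance >= near_limit) return 0.01;   /* middle distance */
    return 0.1;                                /* nearby: track closely */
}
```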
4 Experimental Results

In this paper, we used the popular on-line game BZFlag (Battle Zone Flag) to evaluate our scheme. BZFlag is a first-person shooting game in which each player in a team drives a tank within a battlefield. The aim of the players is to navigate, capture flags belonging to the other team, and bring them back to their own area. Players shoot other players' tanks using shooting bullets. The movements of the tanks (players) as well as those of the shots (entities) are exchanged among the players using DR vectors [3, 9]. The position and velocity values of each player are sampled from real BZFlag game plays. We used 8301 samples and set the UPDATE threshold to 0.09. Notably, the BZFlag game uses a game-optimized DR algorithm that considers two additional vectors (orientation and angle) to predict the position more accurately. This game-optimized DR works very well and performs much better than the usual DR schemes. To assess the Kalman filter approach over DR, we compare the number of DR UPDATE packet transmissions and the average prediction error E defined in Equation (3), where (x, y, z) is the true position, (newx, newy, newz) is the extrapolated position, and n is the number of samples.
E = (1/n) Σ_{i=1..n} √[(xi − newxi)² + (yi − newyi)² + (zi − newzi)²],  n = 8301    (3)
Table 1 shows the experimental results. The number of DR UPDATE packet transmissions of scheme 1 (the simple DR scheme) is greatly decreased by the game-optimized DR scheme (scheme 2). Incorporating the Kalman filter (scheme 3) makes the prediction more accurate and further decreases the number of DR UPDATE packets. Finally, the distance effect can be used to decrease the number of DR UPDATE packets sent to distant devices (scheme 4).

Table 1. Comparison of the number of DR UPDATE packet transmissions and prediction error E

                                      Scheme 1   Scheme 2   Scheme 3   Scheme 4
# of DR UPDATE packet transmissions     4658       700        644        634
E                                      4.511      2.563     0.363614   0.363484
Fig. 4. Error in X prediction: (a) errors in X direction, (b) enlarged for box in (a), (c) Error in Y prediction, and (d) enlarged for box in (c)
Figure 4 compares the prediction errors in the X and Y coordinates for scheme 2 and scheme 3. Scheme 2 shows fluctuations, and BZFlag clients must send DR UPDATE packets whenever the prediction error exceeds the threshold (set to 0.09). Minimizing DR packets also minimizes network latency and game response time. Figure 4 (a) and (c) show the overall prediction errors, and Figure 4 (b) and (d) show detailed views of the prediction errors. Even in the detailed views, the prediction errors of scheme 4 are smaller than those of scheme 3. Scheme 3 does not consider the distance effect: it assures the same prediction accuracy of geographic information for nearby and distant mobile devices. However, as a mobile device moves far away from my device, it becomes hardly observable, and it does not require the same accuracy as a nearby mobile device. Scheme 4 accounts for the distance effect by controlling the Kalman gain. The Kalman gain depends on the variance of the measurement noise, and we can allow a larger variance for distant devices. For distant devices, the number of DR UPDATE packets is then minimized, although the prediction error could increase. However, as shown in Table 2, scheme 4 even decreases the prediction error for distant devices.

Table 2. Comparison of scheme 3 and scheme 4 in the number of DR UPDATE packet transmissions and errors
             # of DR UPDATE packet transmissions              E
             Near     Middle     Far           Near        Middle      Far
Scheme 3      11       615        18         0.025864     0.613829   0.020523
Scheme 4      11       607        16         0.025864     0.613677   0.020373
5 Conclusions

In this paper, we have proposed a Kalman filter approach to improve the DR algorithm for geographically oriented networking between mobile nodes in wireless GRID environments. Our scheme improves the accuracy of DR prediction and minimizes the network traffic among mobile devices, leading to more efficient battery usage. Instead of experimenting with geographically distributed mobile devices, we used the popular on-line game BZFlag and compared our scheme with a state-of-the-art DR algorithm optimized for the game logic. Our Kalman filter based DR scheme reduces network traffic by more than 10% over the game-optimized DR algorithm, and further traffic reduction is achieved with adaptive Kalman gain control depending on the distance between devices.
References

1. Zhang, W., Zhang, J., Ma, D., Wang, B., Chen, Y.: Key technique research on GRID mobile service. Proc. 2nd Int. Conf. Information Technology (2004)
2. Gossweiler, R., Laferriere, R. J., Keller, M. L., Pausch, R.: An introductory tutorial for developing multi-user virtual environments. Tele-operators and Virtual Environments, Vol. 3, No. 4 (1994) pp. 255-264
3. Aggarwal, S., Banavar, H., Khandelwal, A., Mukherjee, S., Rangarajan, S.: User experience: accuracy in dead-reckoning based distributed multi-player games. Proc. ACM SIGCOMM 2004 Workshops on NetGames: Network and System Support for Games (2004)
4. Cai, W., Lee, F. B. S., Chen, L.: An auto-adaptive dead reckoning algorithm for distributed interactive simulation. Proc. of the 13th Workshop on Parallel and Distributed Simulation (1999)
5. Welch, G., Bishop, G.: An introduction to the Kalman filter. Available at http://www.cs.unc.edu/~welch/Kalman/index.html
6. Gautier, L., Diot, C.: Design and evaluation of MiMaze, a multiplayer game on the Internet. Proc. IEEE Multimedia, ICMCS (1998)
7. Mauve, M.: Consistency in replicated continuous interactive media. Proc. of the ACM Conference on Computer Supported Cooperative Work (2000) pp. 181-190
8. Singhal, S. K., Cheriton, D. R.: Exploiting position history for efficient remote rendering in networked virtual reality. Tele-operators and Virtual Environments, Vol. 4, No. 2 (1995) pp. 169-193
9. Schoeneman, C., Riker, T.: BZFlag (Battle Zone capture Flag). Available at http://www.bzflag.org
10. Tseng, Y.-C., Wu, S.-L., Liao, W.-H., Chao, C.-M.: Location awareness in ad hoc wireless mobile networks. IEEE Computer, Vol. 34, No. 6 (2001) pp. 46-52
11. Vijay, K., Asmir, R. D.: Performance of dead reckoning-based location service for mobile ad hoc networks. Wireless Communications and Mobile Computing Journal, Vol. 4, No. 2 (2004) pp. 189-202
Affective Web Service Design

Insu Song and Guido Governatori
School of Information Technology & Electrical Engineering
The University of Queensland, Brisbane, QLD, 4072, Australia
{insu, guido}@itee.uq.edu.au
Abstract. In this paper, we propose that, in order to improve customer satisfaction, we need to incorporate communication modes (e.g., speech acts) in the current standards of Web service specifications. We show that, with the communication modes, we can estimate various affects on service consumers during their interactions with Web services. With this information, a Web-service management system can automatically prevent and compensate for potential negative affects, and even take advantage of positive affects. Keywords: E-commerce and AI, Human-computer interaction.
1 Introduction
In this paper, we discuss an important factor in creating a successful Web services user experience that has been largely ignored. We propose that, in order to counter (detect, prevent, and resolve) negative affects that can be caused by Web services, we need to reconsider some Web standards (e.g., WSDL) to include information about the communication modes of interfaces (e.g., speech acts). Communication modes are important information for checking whether or not Web services are well-behaved. A Web service violating well-behavedness can cause negative affects on its users. For instance, when a directive operation is invoked by a user (e.g., 'add item A to my shopping cart'), it is common knowledge that the user expects an acknowledgement within a certain time frame. A violation of this common knowledge can cause negative affects on the users. Despite a great deal of effort on the Semantic Web, this factor has been largely ignored. Thus, currently a large amount of development time is spent on making sure that Web services behave as intended. For example, in most e-commerce environments, the following problems frequently arise:

1. Customers often feel ignored or uncertain because they are not properly informed of prospective events, such as delivery notices.
2. It is hard for customer service personnel to sense how customers are affected by the overall process. Therefore, it is difficult to provide more adaptive and reasonable services, and differences in customers' situations are often ignored.

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 71–80, 2006.
© Springer-Verlag Berlin Heidelberg 2006
3. Promises are often not fulfilled. E-commerce Web sites often promise customers certain service behaviors, such as promotions, but the actual behavior does not match the promise.

We propose that Web services can be guaranteed to meet a certain acceptable quality by incorporating affective computing, thus avoiding these common problems. Affective computing is not a new concept. In designing Web applications (e.g., e-commerce applications) that carry out a certain set of goals for human users, such as purchase orders, the importance of the affects of such applications has already been brought to system designers' attention (e.g., [24,12]). Accordingly, there has been much research into evaluating human emotions [11,6], expressing emotions [17,4], and the effectiveness of such approaches in improving human-computer interaction [16,5,3,25]. Business communities have also been aware of the significance of customer satisfaction in measuring business performance (e.g., the American Customer Satisfaction Index [1]) because, for most companies, repeat customers are major contributors to profit [1]. Preventing negative affects on computer users is also one of the primary goals of the HCI community [20]. However, there are still no well-defined languages to represent or account for the affects of Web services on human users. As a result, current Web service design approaches have no means of representing and estimating affects on users. Thus, it is impossible for Web service management systems to prevent possible negative affects or to take advantage of positive affects. In this paper we define well-behaved protocols of Web services based on speech act theory, and a method to evaluate various affects on users when Web services violate such protocols, based on cognitive appraisal theories of emotion. The rest of the paper is organized as follows.
In the next section we discuss the issue of incorporating speech acts in Web service specifications. In Section 3, we define well-behavedness of Web service interfaces. In Section 4, we develop a method to evaluate affects on users during their interactions with Web services. Finally we conclude with comparisons with other approaches and some remarks.
2 Web Service Definition
Let us investigate the information that we can obtain from Web service specifications in WSDL [7], the W3C standard for defining Web services. A WSDL document defines a set of interfaces through which consumers can interact with a Web service. Although the WSDL standard defines four operation (interface) primitives that a service can support, to make the presentation more readable we consider in this paper only two types of one-way operations: (One-way) the service receives a message; (Notification) the service sends a message. Since the other primitive operation types defined in WSDL can be modelled abstractly using these two one-way messages, this is not a big limitation. We represent a Web service W as a structure:

W = (OPi, OPo, M, DB, PL)
where OPi is a set of input operation types, OPo is a set of output operation types, M is a set of message types, DB is a database definition (e.g., DTD specifications for XML views of relational databases), and PL is a set of plans (or procedures), each of which achieves certain goals. OPi, OPo, and M can be obtained directly from WSDL specifications of Web services. Let us represent each message type in M as follows:

MessageTypeName(Part1, ..., Partn)

which is just an alternative representation of WSDL message definitions. We also represent each operation type in OPi and OPo as OpName(m), where OpName is the name of the operation and m is a message type. Unfortunately, WSDL specifications do not tell us much about the meanings of the operations. In particular, there is no way to tell what the speech acts of the messages exchanged between Web services and service consumers are. Without the speech act information of a message, it is impossible to know the intention of the message. Thus, we need to provide speech act information for each operation. According to the speech act theory of Searle [26], each communication interaction is an attempt to perform an illocutionary act such as a request or an assertion. Interactions can therefore be classified into five types (illocutionary points) of illocutionary acts, but we find that mostly only four types (directive, assertive, commissive, declarative) are used in user-service communications. We define an operation-to-speech-act mapping function as follows:

MSmap : O → U × F × Q

where O is a set of operations, U is a set of users, F = {directive, assertive, declarative, commissive} is a set of illocutionary points, and Q is a set of XPath queries. A query is a specification for checking the achievement of the operation. MSmap(op) returns a triple (u, f, q) for an operation instance op. We now suppose that this mapping function is given. Example 1.
Let us consider a shopping cart Web service example defined as follows:

OPi = {AddItem(Item)}    OPo = {Response(Item)}
M = {Item(name)}         PL = {(AddItem(Item) → p1)}

PL has only one plan, p1, whose goal is AddItem(Item). Suppose DB is an XML view of a relational database and its schema is defined by the following DTD specification:

root = basket
R = {basket ← customerName, basketItem*;
     basketItem ← itemName;
     customerName ← #PCDATA;
     itemName ← #PCDATA;}

The root item 'basket' represents the shopping basket of a customer in an online shopping Web service. It can have zero or more items. Now, let us suppose the
process receives the following request message from a user u through the one-way operation AddItem: AddItem(Item('Book')); and suppose the mapping function returns a triple (u, f, q) with the following values:

f = directive
q = "/basket[customerName=u]/basketItem[itemName='Book']/itemName/text()"

Then, upon receiving the message, the service executes plan p1 for goal AddItem(Item('Book')). The plan performs some actions that will eventually lead to an update of the database so that the XPath query q will return 'Book'. An instance op of an incoming operation can have one of the following goal types, depending on the illocutionary point of the operation:

1. If f is a directive, the goal is op, meaning that the service achieves op. For example, AddItem(Item('Book')).
2. If f is an assertive or declarative, the goal is Bel(s, op), meaning that the service s believes op. For example, Details(PhoneNo('5555')).
3. If f is a commissive, the goal is Bel(s, Int(u, op)), meaning that the service s believes that the customer u intends to achieve op for the service.

Here Bel(s, p) means s believes p, Int(u, p) means u intends p, and thus Bel(s, Int(u, p)) means s believes that u intends p. For outgoing messages op, if they are commissive, the goal is just op, meaning that the service promises to achieve op. Other types of outgoing messages do not create goals; they are either assertive (informing messages) or directive, and are treated as actions produced by services.
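A minimal sketch of the structure W and the goal-type derivation above (the Python names and the toy service are illustrative assumptions; only the case analysis on the illocutionary point follows the text):

```python
from dataclasses import dataclass

@dataclass
class WebService:
    """W = (OPi, OPo, M, DB, PL), as defined in the text."""
    op_in: set      # OPi: input operation names
    op_out: set     # OPo: output operation names
    messages: dict  # M: message type name -> list of part names
    db: dict        # DB: database definition (here, a toy dict)
    plans: dict     # PL: goal pattern -> plan name

ILLOCUTIONARY_POINTS = {"directive", "assertive", "declarative", "commissive"}

def goal_of(service, user, op, point):
    """Derive the goal of an incoming operation instance from its
    illocutionary point, following cases 1-3 in the text."""
    assert point in ILLOCUTIONARY_POINTS
    if point == "directive":
        return op                                    # service achieves op
    if point in ("assertive", "declarative"):
        return ("Bel", service, op)                  # service believes op
    return ("Bel", service, ("Int", user, op))       # commissive case

# Example 1 as data: the shopping cart service.
w = WebService(op_in={"AddItem"}, op_out={"Response"},
               messages={"Item": ["name"]},
               db={"basket": []}, plans={"AddItem": "p1"})
g = goal_of("s", "u", ("AddItem", ("Item", "Book")), "directive")
```

For the directive AddItem request, the derived goal is simply the operation instance itself, matching case 1 above.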
3 Well-Behaved Web Service Interface
With the information about the illocutionary point of each message, we can now define how a Web service should interact with its users. In this paper, we consider two cases: directive one-way operations and commissive notification operations. When a user of a Web service knows that the service has received a message containing the user's goal and that the service is responsible for achieving it, the user expects the service to send an acknowledgement or to inform the user whether the goal is achieved or not. The acknowledgement means that (a) the message has been received; (b) the goal will be achieved within a certain time limit; and (c) if the goal is achieved, the user will be informed. If the goal is a promised goal, an acknowledgement is not necessary. In both cases, if the goal cannot be achieved within a certain time limit, the process must send a delay message telling the user to wait for a certain time. These are just basic protocols that are expected by most human users. Similar protocols are also defined in an agent communication language called FIPA-ACL [14]. Let us call Web service interfaces conforming to the above descriptions well-behaved interfaces. Figure 1 shows state flow diagrams for well-behaved interfaces
[Figure 1: state flow diagrams]

Fig. 1. Well-behaved web-service interface state flow diagrams for a directive one-way operation and a commissive notification operation. It is assumed that the response time, rt, is smaller (earlier) than the goal achievement time, gt: rt ≤ gt.
for both a directive one-way operation and a commissive notification operation. In the figure, upon receiving a directive operation, the service must respond within a certain response time rtg with one of the following messages:

1. a confirming message confirmg, or
2. a disconfirming message disconfirmg, or
3. a delay message delayg(d), or
4. an acknowledgement message ackg.
If the response is either confirmg or disconfirmg, the communication ends. But if the response is delayg(d) or ackg, the user expects one of the above messages (except ackg) again within a certain goal achievement time, gtg. If the following response is delayg(d), this time the user expects one of the above messages (except ackg) within d. The response behavior for a commissive notification operation is similar to that for a directive operation, except that there is no acknowledgement, as shown in Figure 1.
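The directive-operation protocol of Figure 1 can be sketched as a small state machine (the message names and the handling of repeated acknowledgements are simplifying assumptions of this sketch):

```python
class DirectiveProtocol:
    """Tracks whether a service's responses to one directive goal g
    conform to the well-behaved interface of Figure 1."""

    def __init__(self, rt, gt):
        self.gt = gt                       # goal achievement deadline
        self.deadline = rt                 # time by which the next message is due
        self.state = "awaiting_response"

    def on_message(self, t, msg, d=0):
        """Process a service message `msg` arriving at time `t`."""
        if self.state in ("closed", "violated"):
            return self.state
        if t > self.deadline:
            self.state = "violated"        # late reply: the interface misbehaved
        elif msg in ("confirm", "disconfirm"):
            self.state = "closed"          # the communication ends
        elif msg == "ack" and self.state == "awaiting_response":
            self.state = "awaiting_result" # final answer now due by gt
            self.deadline = self.gt
        elif msg == "delay":
            self.gt += d                   # a delay message extends gt
            self.deadline = self.gt
            self.state = "awaiting_result"
        else:
            self.state = "violated"        # e.g. a second acknowledgement
        return self.state

p = DirectiveProtocol(rt=2, gt=10)
p.on_message(1, "ack")        # acknowledged in time -> awaiting the result
p.on_message(9, "confirm")    # goal confirmed before gt -> closed
```

A timer-driven variant would additionally fire the failure events of Section 4 when a deadline passes with no message at all; this sketch only checks messages that do arrive.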
4 Detecting Affects on Users
Now we consider how human users might be affected (or feel) if Web services are not well-behaved. There are many reasons why Web services cannot conform to the definition of a well-behaved interface: the Internet is not always reliable, and there are situations that Web-service designers have not anticipated or that are beyond the control of the services, such as delays of back-ordered items and natural disasters. We propose a simple and effective method to estimate affects on users during their interactions with Web services. The method relies on prospective events, which are the main drivers of the emotional states of human users according to cognitive appraisal theories of emotion (e.g., Ortony, Collins and Clore (OCC) [23]).

4.1 Prospective Events of Web Services
We consider two classes of goals: directive goals created by directive one-way operations and promised goals created by commissive notification operations.
Table 1. Prospective events for a goal g and OCC classification of the events

Event Name                     Symbol
goal failure time event        gfeg
response failure time event    rfeg
confirming event               ceg
disconfirming event            dceg
delay event                    dlyeg
acknowledgement event          ackeg

OCC Type        Symbols
Prospective     all
Unexpected      –
Desirable       ceg
Undesirable     gfeg, rfeg, dceg
Unconfirmed     dlyeg, ackeg
Confirming      ceg
Disconfirming   dceg
Table 2. Interface based classification of events

Event Type                      Symbols
User side time events: TEg      gfeg, rfeg
Informing events: IEg           ceg, dceg, dlyeg, ackeg
Given the two classes of goals, we can now enumerate the events that are relevant to these goals. But first, we make two important assumptions. When a user interacts with the service, it is reasonable to assume the following:

1. Users know the meaning of each interface;
2. Users are aware of (or accustomed to) all prospective events related to the goals.

Based on these assumptions, we obtain the user side time events listed in Table 2. The service responsible for the goals must strive to prevent these events from occurring by producing the informing events listed in the same table. Table 1 describes the event symbols and also shows the Ortony, Collins and Clore (OCC) [23] classification of these events. OCC propose a classification of events and the affects (emotions) they cause on communication participants. As shown in Figure 1, for a directive goal g, there is a time rtg by which to acknowledge acceptance of the goal to the user and a time gtg by which to fulfill the goal before the user becomes aware of undesirable events. When the user is not informed of the achievement of the goal within the goal achievement time gtg, a goal failure time event gfeg fires. When neither the achievement nor an acknowledgement is communicated to the user within the response time rtg, a response failure event rfeg fires. As shown in Figure 1, for promised goals, rtg is not necessary and only the goal failure event gfeg can occur. Any user side time event in {gfeg, rfeg} can cause negative emotions in the user. Thus, the process must create appropriate informing messages to prevent the user side time events from occurring. According to Figure 1, there are four possible types of informing events: confirming events ceg, disconfirming events dceg, delay events dlyeg, and acknowledgement events ackeg. Each of these events occurs when the service sends the corresponding notification message.
The following production rules (whose conclusions are evaluated when their conditions are satisfied) summarize the event firing policies for the user side time events:
r1: directiveg ∧ (rtg ≤ t) ∧ ¬ackeg ∧ ¬dlyeg ∧ ¬ceg ∧ ¬dceg → rfeg
r2: (gtg ≤ t) ∧ ¬ceg ∧ ¬dceg → gfeg
r3: delayg(d) → (gtg = gtg + d)

Rule r1 says that if the response time has passed and there has been no response at all for a directive goal g, a response failure time event rfeg occurs. Rule r2 says that if the goal achievement time has passed and there has been neither a confirming message nor a disconfirming message, then a goal failure time event gfeg occurs. Rule r3 says that a delay message resets the goal achievement time. If a failure event occurs, a new promising goal g should be created in order to compensate for the failure. The promising goal can only be formulated if we know the affect of the failure on the user.

4.2 Estimating Emotional States
This section describes how emotional states can be deduced from the prospective events of Web services, based on the Ortony, Collins and Clore (OCC) [23] cognitive appraisal theory of emotion, which is one of the most widely accepted emotion models. The OCC model defines twenty-two emotion types, but we describe only six of them in this paper: hope, satisfied, fear, fears-confirmed, disappointment, and relief. These are prospect-based emotions, that is, emotions in response to expected and suspected states and in response to the confirmation or disconfirmation of such states [23]. Although the events we have described provide significant information for estimating users' emotional states, there can always be many other sources that affect the users. Thus, we cannot use strict rules to capture the relations between the events and emotional states. We use a fragment of Defeasible Logic (DL) [22], a popular nonmonotonic logic that is simple and computationally efficient. In DL, a defeasible rule L ⇒ c consists of its antecedent L, which is a finite set of literals, an arrow, and its consequent c, which is a literal. A literal is an atom p or its negation ¬p. A defeasible rule a1, ..., an ⇒ c can be expressed in the following logic program (without the monotonic kernel and the superiority relation of DL):

supported(c) :- conclusion(a1), ..., conclusion(an).
conclusion(c) :- supported(c), not supported(∼c), not strictly(∼c).

where ∼c is the complement of literal c; conclusion(c) denotes that c is defeasibly provable; strictly(c) denotes that c is strictly provable. Then, the following defeasible rules roughly capture the relations between the events and emotional states:

R0. ⇒ ¬gfeg,  ⇒ ¬rfeg
R1. ceg ⇒ ¬hopeg,  dceg ⇒ ¬hopeg,  gfeg ⇒ ¬hopeg
R2. ceg ⇒ ¬fearg,  dceg ⇒ ¬fearg,  gfeg ⇒ ¬fearg
R3. directiveg, ¬rfeg ⇒ hopeg
R4. directiveg, rfeg ⇒ fearg
R5. directiveg, ¬rfeg, ¬gfeg, ceg ⇒ satisfiedg
R6. commissiveg, ¬gfeg, ceg ⇒ satisfiedg
R7. directiveg, rfeg, gfeg ⇒ fearconfirmg,  directiveg, rfeg, dceg ⇒ fearconfirmg
R8. directiveg, ¬rfeg, gfeg ⇒ disappointg,  directiveg, ¬rfeg, dceg ⇒ disappointg
R9. commissiveg, gfeg ⇒ disappointg,  commissiveg, dceg ⇒ disappointg
R10. directiveg, rfeg, ceg ⇒ relievedg
The rules in R0 are assumptions that response failure events and goal failure events have not occurred. The rules in R1 and R2 say that when a communication has ended, a user usually feels neither hope nor fear. R3 says that a user usually feels hope over a desired goal g if no fear prospect (rfeg) has been triggered. R4 says that a user usually feels fear if a desirable goal seems to be failing (rfeg). The rules in R5 and R6 say that a user usually feels satisfied if a desirable goal (directive or commissive) is fulfilled without trouble. The rules in R7 say that a user usually feels fears-confirmed if a desirable goal (directiveg) that seemed to be failing actually fails. The rules in R8 and R9 say that a user is usually disappointed if a desirable goal (directive or commissive) fails. R10 says that a user is usually relieved if the user has had fear over a desirable goal (directive) but it is actually achieved. The rules can be used to predict and estimate the various affects that a Web service can cause on its users. With this information, a Web-service management system can prevent potential negative affects, compensate for negative affects (e.g., sending an apology gift when a delivery is delayed), and even take advantage of positive affects (e.g., advertising when goods are successfully delivered without any trouble). This is a perfect technology that can tell when it is acceptable to send spam messages or to show pop-up advertisements.
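A rough sketch of how rules R0–R10 can be evaluated (this hard-codes the defeasible attacks, e.g. R1 blocking R3, instead of running a full DL engine; event names are abbreviated to 'rfe', 'gfe', 'ce', 'dce'):

```python
def emotions(goal_type, events):
    """Return the prospect-based emotions defeasibly concluded for one
    goal, given goal_type in {'directive', 'commissive'} and the set of
    observed events, a subset of {'rfe', 'gfe', 'ce', 'dce'}."""
    rfe, gfe = 'rfe' in events, 'gfe' in events   # R0: assumed absent unless observed
    ce, dce = 'ce' in events, 'dce' in events
    directive = goal_type == 'directive'
    commissive = goal_type == 'commissive'
    ended = ce or dce or gfe                      # R1/R2: communication has ended
    out = set()
    if directive and not rfe and not ended:       # R3, unless attacked by R1
        out.add('hope')
    if directive and rfe and not ended:           # R4, unless attacked by R2
        out.add('fear')
    if (directive and not rfe and not gfe and ce) or (commissive and not gfe and ce):
        out.add('satisfied')                      # R5, R6
    if directive and rfe and (gfe or dce):
        out.add('fears-confirmed')                # R7
    if (directive and not rfe and (gfe or dce)) or (commissive and (gfe or dce)):
        out.add('disappointed')                   # R8, R9
    if directive and rfe and ce:
        out.add('relieved')                       # R10
    return out
```

For example, a directive goal with no adverse events yields hope, while one whose response deadline passed (rfe) but which is eventually confirmed (ce) yields relief, matching the readings of R3 and R10 above.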
5 Conclusion
Intelligent agent research communities have been working on various agent communication languages (e.g., KQML [13], FIPA-ACL [14]) [21,28] based on speech act theory. Speech act theory [27,8,9,10] has helped define the types of messages based on the concept of illocutionary point, which constrains the semantics of the communication act itself [18, p.87]. It is also used as a basic ontology in organizational management systems [15] and in a conversation model for Web services [2]. However, Web service development communities have largely ignored the semantic issues of interactions. Currently, most work on Web services focuses on design tools, infrastructure, and Web service composition. Thus, the standards developed for Web services mainly focus on the syntax of the description languages (e.g., WSDL), structural issues, or operational semantics (e.g., BPEL1), largely ignoring service quality issues. Thus, there are
http://www.ibm.com/developerworks/webservices/library/ws-bpel/
currently no standard ways to represent the data necessary for quality management, and it is difficult to compose Web services that meet the minimum requirements of well-behaved Web services for human users. In contrast to usability evaluation methods (e.g., [19]), we focus only on the affects on Web service users rather than on efficiency-oriented issues such as cognitive workload, performance, ease of use, and ease of learning. We should also note that, unlike existing approaches to affective computing, our work does not require any direct measurement of users, such as brain activity or facial expressions. All required information is provided by the illocutionary points of operations and the emotion generation rules, because our work accounts only for affects related to the goals that the services promise to deliver. However, these affects are the main issues that must be addressed. This paper proposed that we need to incorporate communication modes in the current standards of Web-service specifications in order to improve customer satisfaction. We showed that, with the communication modes, we can define well-behavedness of Web-service interfaces and estimate various affects on customers during their interactions with Web services. The result is important, since more and more businesses are relying on Web services as their primary point of contact with customers.
References 1. ACSI. American customer satisfaction index. http://www.theacsi.org, 2004. 2. L. Ardissono, A. Goy, and G. Petrone. Enabling conversations with Web services. In Proc. AAMAS 2003, pages 819–826, 2003. 3. Lesley Axelrod and Kate Hone. E-motional advantage: performance and satisfaction gains with affective computing. In CHI ’05: CHI ’05 extended abstracts on Human factors in computing systems, pages 1192–1195, New York, NY, USA, 2005. ACM Press. 4. Alberto Battocchi, Fabio Pianesi, and Dina Goren-Bar. A first evaluation study of a database of kinetic facial expressions (dafex). In Proc. ICMI ’05: the 7th international conference on Multimodal interfaces, pages 214–221, New York, NY, USA, 2005. ACM Press. 5. Timothy W. Bickmore and Rosalind W. Picard. Establishing and maintaining long-term human-computer relationships. ACM Trans. Comput.-Hum. Interact., 12(2):293–327, 2005. 6. Carlos Busso, Zhigang Deng, Serdar Yildirim, Murtaza Bulut, Chul Min Lee, Abe Kazemzadeh, Sungbok Lee, Ulrich Neumann, and Shrikanth Narayanan. Analysis of emotion recognition using facial expressions, speech and multimodal information. In Proc. ICMI ’04: the 6th international conference on Multimodal interfaces, pages 205–211, New York, NY, USA, 2004. ACM Press. 7. Erik Christensen, Francisco Curbera, Greg Meredith, and Sanjiva Weerawarana. Web services description language (WSDL). W3C: http://www.w3.org/TR/wsdl, 2001. 8. Philip R. Cohen and Hector J. Levesque. Performatives in a rationally based speech act theory. In Meeting of the Association for Computational Linguistics, pages 79–88, 1990. 9. Philip R. Cohen and Hector J. Levesque. Rational interaction as the basis for communication. In Intentions in Communication, pages 221–255. MIT Press, Cambridge, Massachusetts, 1990.
10. Philip R. Cohen and Hector J. Levesque. Communicative actions for artificial agents. In Proc. ICMAS '95, pages 65–72, San Francisco, CA, USA, 1995. The MIT Press: Cambridge, MA, USA.
11. Erica Costantini, Fabio Pianesi, and Michela Prete. Recognising emotions in human and synthetic faces: the role of the upper and lower parts of the face. In Proc. IUI '05, pages 20–27, New York, NY, USA, 2005. ACM Press.
12. F.N. Egger. Affective design of e-commerce user interfaces: How to maximise perceived trustworthiness. In Proc. CAHD2001: Conference on Affective Human Factors Design, pages 317–324, 2001.
13. Tim Finin, Richard Fritzson, Don McKay, and Robin McEntire. KQML as an agent communication language. In Proc. CIKM '94: the third international conference on Information and knowledge management, pages 456–463, New York, NY, USA, 1994. ACM Press.
14. FIPA [Foundation for Intelligent Physical Agents]. FIPA ACL message structure specification. http://www.FIPA.org, 2002.
15. Fernando Flores, Michael Graves, Brad Hartfield, and Terry Winograd. Computer systems and the design of organizational interaction. ACM Trans. Inf. Syst., 6(2):153–172, 1988.
16. Kiel M Gilleade and Alan Dix. Using frustration in the design of adaptive videogames. In Proc. ACE '04: the 2004 ACM SIGCHI International Conference on Advances in computer entertainment technology, pages 228–232, New York, NY, USA, 2004. ACM Press.
17. Lisa Gralewski, Neill Campbell, Barry Thomas, Colin Dalton, David Gibson, and University of Bristol. Statistical synthesis of facial expressions for the portrayal of emotion. In Proc. GRAPHITE '04: the 2nd international conference on Computer graphics and interactive techniques in Australasia and South East Asia, pages 190–198, New York, NY, USA, 2004. ACM Press.
18. Michael N. Huhns and Larry M. Stephens. Multiagent systems and societies of agents. In Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, chapter 2, pages 79–120.
The MIT Press, Cambridge, MA, USA, 1999.
19. Melody Y. Ivory and Marti A. Hearst. The state of the art in automating usability evaluation of user interfaces. ACM Computing Surveys, 33(4):470–516, December 2001.
20. Jonathan Klein, Youngme Moon, and Rosalind W. Picard. This computer responds to user frustration. In Proceedings of ACM CHI 99, volume 2, pages 242–243, 1999.
21. Yannis Labrou, Tim Finin, and Yun Peng. Agent communication languages: The current landscape. IEEE Intelligent Systems, 14(2):45–52, March/April 1999.
22. M. J. Maher and G. Governatori. A semantic decomposition of defeasible logics. In AAAI '99, pages 299–305, 1999.
23. A. Ortony, G. L. Clore, and A. Collins. The Cognitive Structure of Emotions. Cambridge University Press, Cambridge, 1988.
24. Rosalind W. Picard. Affective Computing. MIT Press, 1997.
25. Rosalind W. Picard. Affective computing: challenges. International Journal of Human-Computer Studies, 59(1/2):55–64, 2003.
26. John Searle. Expression and Meaning. Cambridge University Press, Cambridge, England, 1979.
27. Munindar P. Singh. A semantics for speech acts. In Readings in Agents, pages 458–470. Morgan Kaufmann, San Francisco, CA, USA, 1997. (Reprinted from Annals of Mathematics and Artificial Intelligence, 1993.)
28. Michael Wooldridge. An Introduction to Multiagent Systems. John Wiley & Sons, 2001.
An Empirical Study of Data Smoothing Methods for Memory-Based and Hybrid Collaborative Filtering

Dingyi Han, Gui-Rong Xue, and Yong Yu
Shanghai Jiao Tong University, No. 800, Dongchuan Road, Shanghai, 200240, China
{handy, grxue, yyu}@sjtu.edu.cn

Abstract. Collaborative Filtering (CF) techniques are important in the e-business era as vital components of many recommender systems, for they facilitate the generation of high-quality recommendations by leveraging the similar preferences of community users. However, there is still a major problem preventing CF algorithms from achieving better effectiveness: the sparsity of training data. Lots of ratings in the training matrix are not collected. Few current CF methods try to do data smoothing before predicting the ratings of an active user. In this work, we have validated the effectiveness of data smoothing for memory-based and hybrid collaborative filtering algorithms. Our experiments show that all these algorithms achieve a higher accuracy after proper smoothing. The average mean absolute error improvements of the three CF algorithms, Item Based, k Nearest Neighbor and Personality Diagnosis, are 6.32%, 8.85% and 38.0% respectively. Moreover, we have compared different smoothing methods to show which works best for each of the algorithms.
1 Introduction
With the ability to predict user preferences, recommender systems can help companies better analyze their product markets in today's e-business era. As the core technique, collaborative filtering (CF) algorithms have proved their ability to generate high-quality recommendations by leveraging the similar preferences of community users. There are two major classes of prediction algorithms: model-based algorithms and memory-based algorithms. Model-based algorithms first learn a descriptive model of users, items and/or ratings, from which they generate recommendations. Generally, they are memory-economical, but the models are time-consuming to build and update. Algorithms in this category include the Bayesian network approach [1], clustering approaches [2,3], the aspect models [4], etc. On the other hand, memory-based algorithms simply maintain a table of all user ratings for all items, over which they perform some computation for each prediction. Compared to model-based ones, they are usually memory-consuming but accurate. Notable examples include the Pearson-correlation based approach [5], the vector similarity based approach [1] and the extended generalized vector-space model [6].

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 81–90, 2006.
© Springer-Verlag Berlin Heidelberg 2006
However, there is still a major problem preventing CF algorithms from achieving better effectiveness: the sparsity of training data. Lots of ratings in the training matrix are not collected, mainly because of the time and effort the data collection process consumes. Take the popular EachMovie data set as an example. It is an explicit voting example set using data from the EachMovie collaborative filtering site 1 deployed by the Digital Equipment Research Center from 1995 through 1997 2. It includes over 2.1 million ratings (ranging in value from 0 to 5) of 1,623 movies by 61,265 users. Although the number of collected ratings is great, the density of the data set is still only 2.1%. Few current CF methods try to smooth the sparse training matrix before predicting the ratings of an active user. Most of them analyze the training data directly to find similar users/items or to construct a model. Though a hybrid memory-and-model-based approach proposed by Pennock et al. in 2000 [7], named Personality Diagnosis (PD), does introduce a simple smoothing method and attains better results, it just assigns a uniform distribution over ratings to the blanks and does not exploit the information in the training data. Therefore, finding effective smoothing methods is meaningful for recommender systems, as such methods may improve prediction accuracy. Meanwhile, smoothing methods can also lower the cost of collecting training data. Since they can fill in the blank elements of the rating matrix, as much training data may not be needed as before; i.e., CF may work with fewer ratings. In this work, we have validated the effectiveness of data smoothing methods for memory-based and hybrid collaborative filtering algorithms. In our experiments, the average mean absolute error improvements of the three CF algorithms, Item Based, k Nearest Neighbor and Personality Diagnosis, are 6.32%, 8.85% and 38.0% respectively.
Moreover, we compare different smoothing methods to show which works best for each algorithm. The rest of the paper is organized as follows: we discuss related work in Section 2. Section 3 introduces background knowledge, including our notation, the problem definition and brief descriptions of the three CF algorithms. Section 4 describes the smoothing methods. Experimental results and analysis are given in Section 5, and Section 6 concludes.
2 Related Work
There are currently three kinds of approaches to the sparsity problem: dimension reduction, acquiring additional information, and, most recently, data smoothing. Dimension-reduction methods aim at reducing the dimensionality of the user-item matrix directly. Principal Component Analysis (PCA) [8] and information retrieval techniques such as Latent Semantic Indexing (LSI) [9,10] have been used. Zeng proposed to compute user similarities by a matrix-conversion method [11]. These approaches address the sparsity problem by removing unrepresentative or
http://www.eachmovie.com/. For more information, please visit http://research.compaq.com/SRC/eachmovie/.
An Empirical Study of Data Smoothing Methods
83
insignificant users or items to condense the user-item matrix. However, some potentially useful information might also be removed during the reduction process. Content-boosted CF approaches [12,13] require additional information about items, as well as a metric to compute meaningful similarities among them. In [14], Popescul et al. also proposed a unified probabilistic model for integrating content information to solve the sparse-data problem. Most previous studies have demonstrated significant improvement in recommendation quality; however, such information may be difficult or expensive to acquire in practice. Gui-Rong et al. proposed a novel cluster-based smoothing method to address both the sparsity and the scalability problem [15]. They use smoothing to tackle sparsity and achieve higher accuracy with the KNN algorithm. However, whether the method works for other CF algorithms was left unknown.
3 Background
3.1 Smoothing Method Definition
Let U be the user set, n the number of users and m the number of items. Denote the n × m matrix of all ratings as R. The rating of user i for item j is R_{ij}; R_{i*} is user i's rating vector and R_{*j} is item j's rating vector. The value of R_{ij} may be a provided integer R^{true}_{ij} or a blank (marked as ⊥).

The CF problem can then be described as follows. Given a rating matrix R and some ratings R^{true}_{i*} of an active user i, CF is a function cf that predicts a blank rating R_{ij} = ⊥ of user i, i.e., R_{ij} = cf(R, R^{true}_{i*}). The closer the predicted R_{ij} is to R^{true}_{ij}, the better the CF algorithm.

A smoothing method s assigns new values to the blank values R_{ij} (= ⊥) in the training matrix R. We use the superscript s to mark a smoothed rating:

\[ R^{s}_{ij} = \begin{cases} R_{ij} & \text{if } R_{ij} \neq \perp, \\ s(R^{true}, i, j) & \text{otherwise.} \end{cases} \]

Thus, a smoothing method is a function that replaces the ⊥'s in a sparse matrix R with new values s(R^{true}, i, j).
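To make the definition concrete, here is a hypothetical NumPy sketch (not from the paper; names are our own) that treats ⊥ as NaN and applies a pluggable smoothing function s to every blank:

```python
import numpy as np

def smooth(R, s):
    """Replace blank ratings (NaN, standing for the ⊥ symbol) in R
    with values produced by the smoothing function s(R, i, j)."""
    Rs = R.copy()
    for i, j in np.argwhere(np.isnan(R)):
        Rs[i, j] = s(R, i, j)
    return Rs

# Example: 3 users x 3 items, NaN marks an unobserved rating.
R = np.array([[5.0, np.nan, 3.0],
              [4.0, 2.0, np.nan],
              [np.nan, 1.0, 4.0]])

# A trivial smoothing function: the global mean of observed ratings.
global_mean = lambda R, i, j: np.nanmean(R)
Rs = smooth(R, global_mean)
```

Any of the smoothing methods introduced in Section 4 can be plugged in as `s`.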
3.2 Memory-Based and Hybrid CF Algorithms
Memory-based CF algorithms fall into two categories, user-based and item-based, which differ in the perspective of similarity measurement. The representative algorithms are k-nearest neighbors (KNN) [11] and item-based CF (IB) [16], respectively. KNN's basic idea is to compute the active user's predicted vote on an item as a weighted average of the votes given to that item by other users [11]. It first measures similarity as the Pearson correlation of two rating vectors; predictions are then computed as the weighted average of deviations from the k nearest neighbors' means. IB makes predictions along the other dimension of the rating matrix. The best similarity measure is adjusted
cosine similarity [16]. IB predicts the active user's rating as a weighted average of that user's known ratings. Personality Diagnosis (PD) is a hybrid memory-and-model-based approach proposed by Pennock et al. in 2000 [7]. It assumes that users report ratings for movies with Gaussian noise, and that the distribution of personality types (rating vectors) in the matrix is representative of the distribution of personalities in the target population of users. By applying Bayes' rule, the probability that the active user is of the same personality type as any other user can be computed. After computing this quantity for each user, the probability distribution for the active user's rating of an unseen item can be calculated.
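The KNN prediction rule described above can be sketched as follows (a hypothetical implementation, not the authors' code; NaN marks a blank rating):

```python
import numpy as np

def pearson(u, v):
    """Pearson correlation over the items rated by both users (NaN = blank)."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    if mask.sum() < 2:
        return 0.0
    du, dv = u[mask] - u[mask].mean(), v[mask] - v[mask].mean()
    denom = np.sqrt((du ** 2).sum() * (dv ** 2).sum())
    return 0.0 if denom == 0 else float((du * dv).sum() / denom)

def knn_predict(R, a, j, k=10):
    """Predict user a's rating of item j: a's mean plus the similarity-weighted
    average of the k nearest neighbors' deviations from their own means."""
    sims = sorted(((pearson(R[a], R[u]), u) for u in range(len(R))
                   if u != a and not np.isnan(R[u, j])), reverse=True)[:k]
    mean_a = np.nanmean(R[a])
    num = sum(s * (R[u, j] - np.nanmean(R[u])) for s, u in sims)
    den = sum(abs(s) for s, u in sims)
    return mean_a if den == 0 else mean_a + num / den

R = np.array([[5.0, 4.0, np.nan],
              [5.0, 4.0, 3.0],
              [1.0, 2.0, 5.0]])
pred = knn_predict(R, 0, 2)   # user 0's predicted rating of item 2
```

IB works the same way on the transposed matrix, with adjusted cosine in place of Pearson correlation.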
4 Smoothing Methods
Several scenarios underlie the smoothing methods. A user will probably vote for an item with a value similar to most other users', so we may use the item's mean rating to fill the blanks. Some users tend to give high ratings to most movies and others low ratings, according to their tastes, so we may instead use the user's mean rating to fill the blanks. Besides, we may cluster users/items into groups to get better smoothing results. Accordingly, smoothing methods can be divided into groups from different perspectives. From the dimensional perspective, they fall into three groups: item-based, user-based and user-item-based smoothing. From the granular perspective, they fall into matrix-based and cluster-based smoothing.
4.1 Dimensional Perspective
Item-Based Smoothing. An item-based smoothing method is a function of the form s({R_{ij_0}}, i_0, j_0): it replaces a blank with the other users' ratings of the same item, e.g. their mean value:

\[ s_I(R^{true}, i_0, j_0) = \mathop{\mathrm{average}}_{i,\; R_{ij_0} \neq \perp} (R_{ij_0}) \]

User-Based Smoothing. A user-based smoothing method is a function of the form s({R_{i_0 j}}, i_0, j_0): it replaces a blank with the same user's ratings of the other items, e.g. their mean value:

\[ s_U(R^{true}, i_0, j_0) = \mathop{\mathrm{average}}_{j,\; R_{i_0 j} \neq \perp} (R_{i_0 j}) \]

User-Item-Based Smoothing. User-item-based smoothing takes both of the above scenarios into consideration. We first calculate each user's average rating \(\bar{R}_{i_0}\) over the items, and then do item-based smoothing on the deviation rating matrix. Denote the deviation rating as \(\Delta R_{i_0 j_0} = R_{i_0 j_0} - \bar{R}_{i_0}\). The smoothing function is:

\[ s_{UI}(R^{true}, i_0, j_0) = \bar{R}_{i_0} + \mathop{\mathrm{average}}_{i,\; R_{ij_0} \neq \perp} (\Delta R_{ij_0}) \]
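The three dimensional-perspective methods can be sketched as follows (a hypothetical NumPy rendering of s_I, s_U and s_UI; NaN stands for ⊥):

```python
import numpy as np

def smooth_item(R):
    """s_I: replace each blank with the column (item) mean."""
    Rs = R.copy()
    col_mean = np.nanmean(R, axis=0)
    for i, j in np.argwhere(np.isnan(R)):
        Rs[i, j] = col_mean[j]
    return Rs

def smooth_user(R):
    """s_U: replace each blank with the row (user) mean."""
    Rs = R.copy()
    row_mean = np.nanmean(R, axis=1)
    for i, j in np.argwhere(np.isnan(R)):
        Rs[i, j] = row_mean[i]
    return Rs

def smooth_user_item(R):
    """s_UI: user mean plus the item mean of deviations from user means."""
    Rs = R.copy()
    row_mean = np.nanmean(R, axis=1)
    dev = R - row_mean[:, None]            # deviation matrix (NaN preserved)
    dev_col_mean = np.nanmean(dev, axis=0)
    for i, j in np.argwhere(np.isnan(R)):
        Rs[i, j] = row_mean[i] + dev_col_mean[j]
    return Rs

R = np.array([[5.0, np.nan],
              [3.0, 1.0]])
```

These are the matrix-based variants (IM, UM, UIM) of the next subsection; the cluster-based variants apply the same computations within each cluster.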
4.2 Granular Perspective
Matrix-Based Smoothing. Matrix-based smoothing methods are large-granularity methods: they reference all rating values in the matrix, just as s_I, s_U and s_UI discussed above. We name them s_IM, s_UM and s_UIM, respectively.

Cluster-Based Smoothing. Cluster-based smoothing methods hypothesize that user preferences can be implicitly clustered into groups and that smoothing can be more effective among similar users. These methods first cluster users or items into groups by some clustering algorithm and then perform smoothing within each cluster.

IC method. The IC method first clusters users into groups, dividing the whole rating matrix vertically into several small matrices. For each small matrix, it smooths as IM does for the whole matrix:

\[ s_{IC}(R^{true}, i_0, j_0) = \mathop{\mathrm{average}}_{i \in C(i_0),\; R_{ij_0} \neq \perp} (R_{ij_0}) \]

where C(i_0) is the user cluster that i_0 belongs to.

UC method. The UC method first clusters items into groups, dividing the whole rating matrix horizontally into several small matrices. For each small matrix, it smooths as UM does for the whole matrix:

\[ s_{UC}(R^{true}, i_0, j_0) = \mathop{\mathrm{average}}_{j \in C(j_0),\; R_{i_0 j} \neq \perp} (R_{i_0 j}) \]

where C(j_0) is the item cluster that j_0 belongs to.

UIC method. The UIC method first clusters users into groups. For each user cluster, it smooths as UIM does for the original matrix:

\[ s_{UIC}(R^{true}, i_0, j_0) = \bar{R}_{i_0} + \mathop{\mathrm{average}}_{i \in C(i_0),\; R_{ij_0} \neq \perp} (\Delta R_{ij_0}) \]
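The cluster-based variants can be sketched as follows. This is a hypothetical illustration (our own names): the clustering step itself is abstracted into a precomputed array of user cluster labels, and the IC rule fills each blank from the item mean within the user's cluster.

```python
import numpy as np

def smooth_ic(R, labels):
    """s_IC: fill each blank R[i, j] with the mean rating of item j
    among the users in i's cluster (labels[i] gives user i's cluster).
    Falls back to the whole column if the cluster has no rating for j."""
    Rs = R.copy()
    labels = np.asarray(labels)
    for i, j in np.argwhere(np.isnan(R)):
        members = R[labels == labels[i], j]
        if np.isnan(members).all():
            members = R[:, j]              # fall back to the global item mean
        Rs[i, j] = np.nanmean(members)
    return Rs

# 4 users in 2 clusters; NaN marks a blank rating.
R = np.array([[5.0, np.nan],
              [4.0, 5.0],
              [1.0, 2.0],
              [np.nan, 1.0]])
labels = [0, 0, 1, 1]       # users 0-1 and users 2-3 behave alike
Rs = smooth_ic(R, labels)
```

UC and UIC follow the same pattern with item clusters and deviation ratings, respectively.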
5 Experimental Results
5.1 Metrics and Datasets
Mean Absolute Error (MAE) [17] is used here to measure prediction quality:

\[ MAE = \frac{1}{|T|} \sum_{u \in T} |\tilde{R}_{ui} - R^{true}_{ui}| \]

where R^{true}_{ui} is the rating given to item i by user u, \(\tilde{R}_{ui}\) is the predicted rating of user u on item i, T is the test set, and |T| is the size of the test set.
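The metric is a straightforward average of absolute errors; a minimal sketch (helper name is our own):

```python
def mean_absolute_error(predicted, actual):
    """MAE over aligned lists of predicted and true ratings."""
    return sum(abs(p, ) if False else abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Three test ratings: errors 0.2, 0.2 and 1.0, so MAE = 1.4 / 3.
mae = mean_absolute_error([3.2, 4.8, 1.0], [3.0, 5.0, 2.0])
```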
Table 1. Training and Testing Dataset Characteristics

Dataset   Data Source  Rate Number (All)  Movie Number  Matrix Density  Rating Levels
EM-Train  EachMovie    293,081            1,628         4.50%           0,1,2,3,4,5
ML-Train  MovieLens    670,346            3,900         4.30%           1,2,3,4,5
EM-Test   EachMovie    107,379            1,628         4.40%           0,1,2,3,4,5
ML-Test   MovieLens    253,918            3,900         4.34%           1,2,3,4,5
To test the matrix-filling methods under different data distributions, we took EachMovie and MovieLens as the two sources of our test-bed. From each data source we extracted two non-intersecting subsets, restricted to users with more than 20 ratings: one for training, containing 4,000 users, and one for testing, containing 1,500 users. Some characteristics of the data sets are listed in Table 1. To obtain training datasets of different densities, we randomly removed some ratings. By predicting every rating as the most frequent rating value, 3, we can also get an MAE baseline for each test set: 1.20 on EM-Test and 1.07 on ML-Test. If a method yields an MAE larger than this baseline, we may regard the prediction as a failure.

5.2 How Does Smoothing Work?
To test how smoothing works, we ran the IB, KNN without penalty (KNN(U)), KNN with penalty (KNN(P)) and PD algorithms, without smoothing and with the UM, IM and UIM smoothing methods, on the two datasets. MAE values are given in Figure 1. Typically, MAE decreases as the training dataset density grows, except for KNN. From the figure, we make the following observations:
1. The IB algorithm achieves higher accuracy on training datasets of density below 0.4% using IM/UIM, and above 0.4% using UM. IM and UIM decrease the accuracy of IB when the training density exceeds 1%.
2. For both KNN(U) and KNN(P), all three methods help to lower MAE; only in some cases on the EachMovie data does UM fail.
3. All three methods improve PD's accuracy: the sparser the training data, the higher the improvement.
These observations validate that with proper smoothing methods, memory-based and hybrid CF algorithms can achieve higher accuracy. Item-based CF should use UM smoothing, while KNN and PD should use IM or UIM smoothing, because such methods provide useful information from the other
1 Million MovieLens Dataset: http://www.cs.umn.edu/research/GroupLens/.
After smoothing, the KNN algorithm gets the same MAE with and without penalty; we set k = 10 in the experiments.
[Figure 1: six panels of MAE vs. training-set density (%) curves — (a) IB on EM, (b) KNN on EM, (c) PD on EM, (d) IB on ML, (e) KNN on ML, (f) PD on ML — comparing each algorithm without smoothing and with the IM, UM and UIM smoothing methods.]

Fig. 1. MAEs of IB, KNN, and PD on different sparsity EachMovie and MovieLens datasets with different smoothing methods
perspective of the training matrix than the one from which those algorithms make predictions. If we smooth from the same perspective from which a CF algorithm predicts, MAEs are lowered only on sparse datasets; on dense datasets they become higher, as such smoothing weakens the CF algorithm's ability to identify unique user preferences. The experimental results of "IM + IB" and "UM + KNN" also show evidence of this.

5.3 Is There Any Difference with Cluster Numbers?
We also clustered the 0.125%, 1% and 2% density datasets of both the EachMovie and MovieLens training data into different numbers of clusters, to see whether clustering can help the smoothing and CF algorithms. The clustering is performed by the Repeated Bisections method [18] of CLUTO, a software package for clustering high-dimensional datasets, using the I2 criterion [19]. The MAEs are shown in Figure 2. We omit the figures for the 2% density datasets due to space limits; they are quite similar to those for the 1% density datasets. In the figures, when the cluster number is 1, the IC, UC and UIC methods are identical to IM, UM and UIM. The observations are:
1. Most of the minimum MAEs are obtained by the smoothing methods without clustering; there is no clear advantage for cluster-based smoothing methods.
2. The effects of the cluster-based algorithms are similar on the two datasets.
[Figure 2: twelve panels of MAE vs. cluster number curves — (a) IB on 0.125% EM, (b) KNN on 0.125% EM, (c) PD on 0.125% EM, (d) IB on 1% EM, (e) KNN on 1% EM, (f) PD on 1% EM, (g) IB on 0.125% ML, (h) KNN on 0.125% ML, (i) PD on 0.125% ML, (j) IB on 1% ML, (k) KNN on 1% ML, (l) PD on 1% ML — comparing the IC, UC and UIC smoothing methods.]

Fig. 2. MAEs of IB, KNN, and PD on different density datasets with different cluster number and different smoothing methods
5.4 A Brief Summary
When CF algorithms are run on a quite sparse training set, proper smoothing methods can help them predict user preferences more accurately. Table 2 lists the best smoothing method for each of the three CF algorithms on the 0.125%, 1% and 2% density datasets.
Table 2. Best Smoothing (Smt.) Methods on Two Datasets

EachMovie
Density            2%                      1%                      0.125%
Algorithm          IB     KNN    PD        IB     KNN    PD        IB     KNN    PD
Best Smt. Method   UM     UIM    IM        UIM    UIM    IM        UIM    IM     IM
MAE without Smt.   0.947  1.047  1.063     0.990  1.113  1.345     1.230  1.113  2.754
MAE with Smt.      0.931  0.956  0.935     0.969  0.975  0.955     1.043  1.020  1.020
Improvement        1.69%  8.69%  12.0%     2.12%  12.4%  29.0%     15.2%  8.36%  62.9%

MovieLens
Density            2%                       1%                       0.125%
Algorithm          IB     KNN    PD         IB     KNN    PD         IB      KNN    PD
Best Smt. Method   UM     UIM    UIC(70)    UM     UIM    UIC(70)    IC(30)  IM     UIC(20)
MAE without Smt.   0.739  0.845  0.917      0.751  0.862  1.282      0.915   0.830  2.476
MAE with Smt.      0.723  0.766  0.759      0.740  0.769  0.768      0.775   0.801  0.825
Improvement        2.17%  9.35%  17.2%      1.46%  10.8%  40.1%      15.3%   3.49%  66.7%

6 Conclusion
In this paper, we have compared several smoothing methods for dealing with the sparsity problem in recommendation systems. When proper smoothing methods are applied, the representative memory-based algorithms (IB, KNN) and the hybrid algorithm (PD) reach higher accuracy. In our experiments, the average MAE improvements of the IB, KNN and PD algorithms are 6.32%, 8.85% and 38.0%, respectively. These improvements can lead current recommendation systems to higher accuracy with lower training-data requirements. Our conclusion is that, to deal with sparse training data in collaborative filtering, the IB algorithm should apply UM smoothing before predicting, KNN should apply UIM smoothing, and PD should apply IM smoothing. Although clustering may sometimes help to improve prediction accuracy, cluster-based methods do not show a clear advantage. Future work may include a theoretical analysis of the effects of these smoothing methods on CF algorithms.
References
1. Breese, J.S., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence. (1998) 43–52
2. Kohrs, A., Merialdo, B.: Clustering for collaborative filtering applications. IOS Press (1999)
3. Ungar, L.H., Foster, D.P.: Clustering methods for collaborative filtering. In: Proceedings of the Workshop on Recommendation Systems, AAAI Press (1998)
4. Hofmann, T.: Latent semantic models for collaborative filtering. ACM Transactions on Information Systems 22 (2004) 89–115
5. Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J.: GroupLens: applying collaborative filtering to Usenet news. Communications of the ACM 40 (1997) 77–87
6. Soboroff, I., Nicholas, C.: Collaborative filtering and the generalized vector space model (poster session). In: Proceedings of the 23rd Annual International Conference on Research and Development in Information Retrieval. (2000) 351–353
7. Pennock, D.M., Horvitz, E., Lawrence, S., Giles, C.L.: Collaborative filtering by personality diagnosis: A hybrid memory-and-model-based approach. In: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence. (2000) 473–480
8. Goldberg, K.Y., Roeder, T., Gupta, D., Perkins, C.: Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval 4 (2001) 133–151
9. Fisher, D., Hildrum, K., Hong, J., Newman, M., Thomas, M., Vuduc, R.: SWAMI: a framework for collaborative filtering algorithm development and evaluation. In: Proceedings of the 23rd Annual International Conference on Research and Development in Information Retrieval. (2000) 366–368
10. Sarwar, B.M., Karypis, G., Konstan, J.A., Riedl, J.T.: Application of dimensionality reduction in recommender system – a case study. In: ACM WebKDD 2000 Web Mining for E-Commerce Workshop. (2000)
11. Zeng, C., Xing, C.X., Zhou, L.Z.: Similarity measure and instance selection for collaborative filtering. In: Proceedings of the 12th International Conference on World Wide Web. (2003) 652–658
12. Balabanovic, M., Shoham, Y.: Fab: content-based, collaborative recommendation. Communications of the ACM 40 (1997) 66–72
13. Claypool, M., Gokhale, A., Miranda, T., Murnikov, P., Netes, D., Sartin, M.: Combining content-based and collaborative filters in an online newspaper. In: ACM SIGIR Workshop on Recommender Systems – Implementation and Evaluation. (1999)
14. Popescul, A., Ungar, L.H., Pennock, D.M., Lawrence, S.: Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments. In: Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence. (2001) 437–444
15.
Xue, G.R., Lin, C., Yang, Q., Xi, W., Zeng, H.J., Yu, Y., Chen, Z.: Scalable collaborative filtering using cluster-based smoothing. In: Proceedings of the 28th Annual International Conference on Research and Development in Information Retrieval. (2005) 114–121
16. Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web. (2001) 285–295
17. Herlocker, J.L., Konstan, J.A., Terveen, L.G., Riedl, J.T.: Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22 (2004) 5–53
18. Zhao, Y., Karypis, G., Fayyad, U.: Hierarchical clustering algorithms for document datasets. Data Mining and Knowledge Discovery 10 (2005) 141–168
19. Zhao, Y., Karypis, G.: Soft clustering criterion functions for partitional document clustering: a summary of results. In: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management. (2004) 246–247
Eliminate Redundancy in Parallel Search: A Multi-agent Coordination Approach

Jiewen Luo (1,2) and Zhongzhi Shi (1)
(1) Institute of Computing Technology, Chinese Academy of Sciences, 100080 Beijing, China — {luojw, shizz}@ics.ict.ac.cn
(2) Graduate University of Chinese Academy of Sciences, 100080 Beijing, China
Abstract. The web spider is a widely used approach to obtaining information for search engines. As the size of the Web grows, parallelizing the spider's crawling process becomes a natural choice. However, parallel execution often causes redundant web pages to consume vast storage space, and solving this problem is a significant issue in the design of next-generation web spiders. In this paper, we employ methods from multi-agent coordination to design a parallel spider model and implement it on the multi-agent platform MAGE. Under the control of a central facilitator agent, spiders coordinate with each other to avoid redundant pages during the web-page search process. Experimental results demonstrate that the approach is very effective at improving collection efficiency and can eliminate redundant pages at a tiny efficiency cost.
1 Introduction
One of the important components of a search engine is the web spider, a program that downloads and stores web pages. It traverses the World Wide Web information space by following hypertext links and retrieves web documents via the standard HTTP protocol. Generally, a spider starts off by placing an initial URL, the seed URL, in a queue. From this queue, the spider takes a URL, downloads the page, extracts any URLs in the downloaded page, and puts the new URLs in the queue. This process is repeated until the spider decides to stop. Collected pages are mainly used to create a copy of all visited websites for later processing by a search engine that indexes the downloaded pages to provide fast searches. As the size of the Web grows, it becomes more difficult to retrieve the whole Web, or a significant portion of it, with a single spider. It is therefore a natural choice for many search engines to parallelize the collection process to maximize the download rate. We refer to this type of spider as a parallel spider. Although this approach can considerably improve efficiency, it also raises great challenges: how to eliminate the web-page redundancy caused by parallelization on the one hand, and how to minimize the coordination cost on the other. To investigate this problem, we design the parallel spider model based on a multi-agent system. A multi-agent system is one in which a number of agents Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 91–100, 2006. © Springer-Verlag Berlin Heidelberg 2006
92
J. Luo and Z. Shi
cooperate and interact with each other to achieve a global objective in a complex and distributed environment. Through this cooperation, spider agents can coordinate their information collection actions, which effectively avoids page redundancy. To test the approach, we realize the model on the multi-agent platform MAGE [3] and conduct a series of experiments. Experimental results demonstrate that it is very effective at solving the page-redundancy problem with a tiny multi-agent interaction cost.
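The basic crawling loop described above can be sketched as follows (a hypothetical single-threaded spider; `fetch` and `extract_urls` stand in for HTTP download and HTML link extraction):

```python
from collections import deque

def crawl(seed_url, fetch, extract_urls, max_pages=100):
    """Breadth-first crawl: start from a seed URL, download pages,
    queue newly discovered links, stop after max_pages downloads."""
    queue, seen, pages = deque([seed_url]), {seed_url}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        page = fetch(url)                  # download the page (HTTP GET)
        pages[url] = page                  # store for later indexing
        for link in extract_urls(page):    # parse hyperlinks out of the page
            if link not in seen:           # avoid revisiting pages locally
                seen.add(link)
                queue.append(link)
    return pages

# Toy web graph standing in for real pages and link extraction.
web = {'a': ['b', 'c'], 'b': ['c'], 'c': []}
pages = crawl('a', fetch=lambda url: url, extract_urls=lambda page: web[page])
```

Note that the `seen` set only prevents one spider from revisiting its own pages; the redundancy problem addressed in this paper arises because several such spiders run independently.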
2 Related Work
Web spiders have been studied since the advent of the Web [4-10]. As to parallel spiders, there is a significant body of literature on the general problem of parallel and distributed computing [12,13]. Some of these studies focus on the design of efficient parallel algorithms; for example, references [14,15] present various architectures for parallel computing, propose algorithms solving various problems under those architectures, and study the complexity of the proposed algorithms. Besides, some researchers explore building cooperative information-gathering systems from a distributed problem-solving perspective [18]. Another body of literature investigates the parallel spider from a multi-agent perspective [1,17,18]. Reference [1] presents the Collaborative Spider, a multi-agent system designed to provide post-retrieval analysis and enable cross-user collaboration in web search and mining; this system allows the user to annotate search sessions and share them with other users. References [17,18] introduce the InfoSleuth project, which is based on a multi-agent system for heterogeneous information resources in open and dynamic environments. However, none of the parallel models and multi-agent systems referred to above amply considers, or presents effective approaches for, the redundant-pages issue. When multiple spiders run in parallel to download pages, different processes may download the same pages multiple times: one spider may not be aware that another spider has already downloaded a page. Can we coordinate the parallel process to prevent redundant pages? This paper investigates this issue in the following sections.
3 Parallel Web Spider Model
3.1 The Architecture of the Parallel Model
Based on the framework of MAGE, we design a parallel spider model for web information collection. Figure 1 shows the model's architecture. It contains three kinds of agents — the managing agent (facilitator), spider agents and an index agent — all executing on the MAGE platform. In our system, MAGE provides the multi-agent runtime environment. Moreover, the MAGE platform has implemented most basic classes, which can be inherited or extended by new classes
http://www.intsci.ac.cn/en/research/mage.html
Fig. 1. Generic Agent Structure in MAGE Platform
conveniently, e.g. the Agent class and the ACLMessage class. This makes the implementation process easier, since we need not consider the complex multi-agent mechanisms and can focus our work on the web spider's search function. The facilitator is the management center of the parallel spider system. It is a special agent that manages the seed-URL resources and the interaction process among agents. If an agent wants to participate in the system, it must register with the facilitator to get a permit message. Moreover, the facilitator controls the life cycle of the other agents; for instance, it can kill all registered agents at runtime. The spider is the main part of the system. It starts work after receiving a permit message from the facilitator, and then searches and downloads web pages from the seed URL and parses the hyperlinks of their child URLs for useful information. In our parallel model, we adopt FIPA ACL as the standard communication language among agents. Through the coordination of the facilitator, spider agents can effectively avoid redundant pages. After finishing page collection, the spiders inform the index agent, whose main job is classifying, indexing and storing the web pages, which form the resource library for the search engine and future web analysis.

3.2 Dynamic Assignment Mechanism
In this section, we explain how the dynamic assignment mechanism avoids the redundant pages caused by parallelization. When multiple spiders download pages in parallel, different spiders may download the same pages multiple times; the overlap situation is illustrated in Figure 2. To solve this problem, we implement the dynamic assignment mechanism through multi-agent coordination. The central facilitator logically divides the WWW into different domains (e.g. sina.com, yahoo.com, etc.) and dynamically assigns each domain to a spider as the seed URL. After MAGE generates a new spider and arbitrarily assigns it a seed URL, the new spider first queries the facilitator's URL domain library to determine whether the seed URL has been
http://www.fipa.org/specs/fipa00061/
http://www.sina.com/ is the biggest Chinese website.
Algorithm 1. Facilitator Dynamic Assignment
1: Facilitator(); // facilitator initialization
2: Message = Facilitator.WaitForMessage();
3: if Message.Type = Registration then
4:   Record(SpiderName, IP, seedURL);
5:   Put(seedURL, DomainLibrary); // put the new seed URL into the domain library
6: else if Message.Type = checkSeedURL then
7:   URL = Message.getSeedURL(); // get the seed URL from the message
8:   Flag = Check(URL); // check whether the URL already exists in the domain library
9: end if
10: if Flag = True then
11:   sendMessage(Spider, Refuse, "The seed URL is existent");
12: else if Flag = False then
13:   sendMessage(Spider, Allow, "The seed URL is vacant");
14:   wakeup(Spider);
15: end if
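The facilitator's bookkeeping in Algorithm 1 can be rendered as the following hypothetical Python sketch (class and method names are our own; the ACL messaging layer is omitted):

```python
class Facilitator:
    """Central coordinator: registers spiders and guarantees that each
    seed-URL domain is claimed by at most one spider."""

    def __init__(self):
        self.domain_library = set()   # seed URLs already claimed
        self.registry = {}            # spider name -> (ip, seed URL)

    def check_seed_url(self, url):
        """Return 'Allow' if the domain is vacant, else 'Refuse'."""
        return 'Refuse' if url in self.domain_library else 'Allow'

    def register(self, spider_name, ip, seed_url):
        """Record the spider and claim its seed URL in the domain library."""
        self.registry[spider_name] = (ip, seed_url)
        self.domain_library.add(seed_url)

f = Facilitator()
reply = f.check_seed_url('http://sina.com')   # domain is still vacant
f.register('spider-1', '10.0.0.1', 'http://sina.com')
```

Because every spider must pass this check before crawling, no two spiders ever receive the same domain, which is what eliminates the redundant downloads.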
Algorithm 2. Spider Dynamic Assignment
1: m = maxOfVisitSite; // assign the max number of sites to visit to m
2: n = maxOfSearchDepth; // assign the max search depth to n
3: Spider(SpiderName, IP); // spider agent initialization
4: seedURL = Spider.getSeedURL();
5: sendMessage(Facilitator, checkSeedURL);
6: Message = Spider.WaitForMessage();
7: if Message.Type = Refuse then
8:   Inform(Message.content);
9: else if Message.Type = Allow then
10:   sendMessage(Facilitator, Registration);
11: end if
12: loop
13:   if numOfSite

[The remainder of Algorithm 2 is cut off in the source; the text resumes mid-derivation in the paper "Vibration Control of Suspension System" by K. Zhang and S. Wang.]

\[ i_{nk} = nC + k \ge i_{n0}, \quad n = 1, 2, \ldots, N, \; k = 0, 1, \ldots, C-1, \quad \frac{i_{n0}+C-k}{C} > n \ge \frac{i_{n0}-k}{C}, \; \text{if } f_{mod}(i_{n0}-k, C) = 0 \tag{9} \]

\[ i_{nk} = nC + k, \quad i_{nk} - k = nC \ge i_{n0} - k, \quad \frac{i_{n0}+C-k}{C} \ge n > \frac{i_{pk}-k}{C}, \; \text{if } f_{mod}(i_{n0}-k, C) = 0 \tag{10} \]
Then the multidimensional receptive field function can be defined as:

\[ X_{nk} = i_{nk} + \sum_{t=1}^{n-1} X_t + \left( \sum_{p=2}^{n-1} \frac{i_{pk} - f_{mod}(i_{nk}, C)}{C} \sum_{s=1}^{p-1} X_s + \frac{i_{1k} - f_{mod}(i_{nk}, C)}{C} \right) + 1 \tag{11} \]

The output is:

\[ F(s) = \sum_{k=0}^{C-1} W(X_k)\,\mu(k) \tag{12} \]
802
K. Zhang and S. Wang
The FCMAC weights are tuned by:

\[ \Delta W(X) = \beta\,[d - F(s)]\,\mu(k)/C \tag{13} \]
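Equations (12) and (13) amount to a membership-weighted table lookup plus an LMS-style error-correction update. A hypothetical sketch (our own names; the addresses X_k and memberships μ(k) are assumed precomputed by the addressing scheme above):

```python
import numpy as np

def fcmac_output(W, X, mu):
    """Eq. (12): F(s) = sum_k W[X_k] * mu(k) over the C active cells."""
    return sum(W[Xk] * mu_k for Xk, mu_k in zip(X, mu))

def fcmac_update(W, X, mu, d, beta):
    """Eq. (13): distribute the output error over the C active cells,
    weighted by membership and the learning rate beta."""
    C = len(X)
    err = d - fcmac_output(W, X, mu)
    for Xk, mu_k in zip(X, mu):
        W[Xk] += beta * err * mu_k / C
    return W

W = np.zeros(16)            # weight memory
X = [1, 5, 9]               # addresses of the C = 3 active cells
mu = [0.2, 0.5, 0.3]        # fuzzy memberships of the active cells
for _ in range(200):        # repeated presentation of one training sample
    fcmac_update(W, X, mu, d=1.0, beta=0.9)
```

Repeated presentation drives the output toward the target d, which is exactly the iterative behavior analyzed in equations (14)-(19) below.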
In equation (13), d denotes the target value of the output, β denotes the learning rate, and μ(k) is the membership of the fuzzy set of the variables. We consider the case where a set of training data is repeatedly presented to the learning structure; the update at the presentation of the s-th sample in the i-th iteration is expressed as

\[ W_s^i = W_{s-1}^i + \Delta W_{s-1}^i = W_{s-1}^i + \frac{\beta}{C}\,[d_{s-1} - \mu_{s-1}^T(k)\,W_{s-1}^i]\,\mu_{s-1}(k) \tag{14} \]
where the subscripts s−1 and s indicate sample numbers, the superscript i indicates the iteration number, and d_{s−1} is the target value of sample s−1. Using equation (14), the difference of the memory contents between two consecutive iterations is calculated as:

\[ \begin{aligned} DW_s^{(i)} &= W_s^{(i+1)} - W_s^{(i)} \\ &= DW_{s-1}^{(i)} + \frac{\beta}{C}\,\mu_{s-1}(k) \sum_{k=0}^{C-1} [d_{s-1} - \mu_{s-1}^T(k)\,W_{s-1}^{(i+1)}] - \frac{\beta}{C}\,\mu_{s-1}(k) \sum_{k=0}^{C-1} [d_{s-1} - \mu_{s-1}^T(k)\,W_{s-1}^{(i)}] \\ &= DW_{s-1}^{(i)} - \frac{\beta}{C}\,\mu_{s-1}(k)\,\mu_{s-1}^T(k)\,DW_{s-1}^{(i)} \end{aligned} \tag{15} \]
Let us further define

\[ G_s = E_{N_s} \cdots E_s\,E_{s-1}\,E_{s-2} \cdots E_1 \]

so that

\[ \begin{aligned} DW^{(i)} &= G_s\,[DW_1^{(i-1)}, DW_2^{(i-1)}, \ldots, DW_{N_s}^{(i-1)}] \\ &= G_s^2\,[DW_1^{(i-2)}, DW_2^{(i-2)}, \ldots, DW_{N_s}^{(i-2)}] \\ &= G_s^i\,[DW_1^{(0)}, DW_2^{(0)}, \ldots, DW_{N_s}^{(0)}] = G_s^i\,DW^{(0)} \end{aligned} \tag{16} \]
With the definition of DW_s^{(0)}:

\[ \begin{aligned} DW_s^{(0)} &= W_s^{(1)} - W_s^{(0)} = W_{s-1}^{(1)} + \Delta W_{s-1}^{(1)} - W_s^{(0)} \\ &= W_1^{(1)} + \Delta W_1^{(1)} + \cdots + \Delta W_{s-1}^{(1)} - W_s^{(0)} \\ &= W_{N_s}^{(0)} + \Delta W_{N_s}^{(0)} + \Delta W_1^{(1)} + \cdots + \Delta W_{s-1}^{(1)} - W_s^{(0)} \\ &= W_s^{(0)} + \Delta W_s^{(0)} + \Delta W_{s+1}^{(0)} + \cdots + \Delta W_{N_s}^{(0)} + \Delta W_1^{(1)} + \cdots + \Delta W_{s-1}^{(1)} - W_s^{(0)} \\ &= \Delta W_s^{(0)} + \Delta W_{s+1}^{(0)} + \cdots + \Delta W_{N_s}^{(0)} + \Delta W_1^{(1)} + \cdots + \Delta W_{s-1}^{(1)} \end{aligned} \tag{17} \]
Note that $\Delta W_s^{(i)}$ is different from $DW_s^{(i)}$: $\Delta W_s^{(i)}$ is the update made when sample $s$ is presented in the $i$-th iteration, whereas $DW_s^{(i)} = W_s^{(i)} - W_s^{(i-1)}$ is the change of weight between the $i$-th and $(i-1)$-th iterations. From equation (13) we have:

$$\Delta W_s^{(i)} = \frac{\beta}{C}\,\mu_s(k)\left[d_s - \mu_s^{T}(k)\,W_s^{(i)}\right] \qquad (18)$$
Vibration Control of Suspension System
803
We define the scalar $d_s - \mu_s^{T}(k)\,W_s^{(i)}$ as $u_s^{(i)}$, thus:

$$\Delta W_s^{(i)} = \frac{\beta}{C}\,\mu_s(k)\,u_s^{(i)} \qquad (19)$$
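Equation (15) says each pass multiplies the iteration-to-iteration change by a factor of the form $E = I - (\beta/C)\,\mu\mu^{T}$ (the $E$ matrices composed into $G_s$ in (16)). A quick numerical check with arbitrary values, purely illustrative, confirms that this factor leaves the component of $DW$ orthogonal to $\mu$ untouched while geometrically shrinking the component along $\mu$ whenever $0 < \beta\|\mu\|^{2}/C < 2$:

```python
import numpy as np

beta, C = 0.8, 4.0
mu = np.array([0.1, 0.4, 0.4, 0.1])            # hypothetical memberships
E = np.eye(4) - (beta / C) * np.outer(mu, mu)  # the factor appearing in eq. (15)

rng = np.random.default_rng(0)
DW0 = rng.normal(size=4)                       # arbitrary initial weight change
DW = DW0.copy()
for _ in range(50):
    DW = E @ DW                                # fifty applications of eq. (15)

# Per-step shrink factor of the mu-component (0.932 with these numbers).
shrink = 1.0 - beta * np.dot(mu, mu) / C
```

This geometric decay of the weight change along the addressed directions is the mechanism behind the CMAC learning convergence results of [10].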
Fig. 2 shows the flow diagram of the control synthesis process. The relation between the input-output properties of a system and its stability has been extensively studied using the theory of dissipative systems [11]. The relevance is that the FCMAC neural network used for control purposes here is constructed to have an important dissipation property that makes it robust to disturbances and unmodeled dynamics.
Fig. 2. Flow diagram of control synthesis process
4 Experiments

The physical model can be established based on Buckingham's Pi theorem as described in Section 2; the variables of the model are shown in Table 2. As can easily be seen, the universes of discourse of the controller inputs are closely related to road disturbances; thus, in the FCMAC semi-active system, we determine the fuzzy inputs and output as follows. The controller inputs are the vertical sprung mass acceleration, the change of vertical sprung mass acceleration, and the suspension deflection; the controller output is the voltage signal used to drive the servo valve. We divide each variable into 7 fuzzy subsets. The closed-loop experiments are excited by three typical vibration signals: the first is approximately a step input, representing a large-amplitude isolated obstacle, and
Table 2. Variables of the physical model
variable   value            variable   value
m1         7 (kg)           m2         62 (kg)
k1         110 (kN/m)       k2         35 (kN/m)
Ee         700 (MPa)        Kq         0.67 (m2/s)
Kv         4×10^4 (m/V)     Kc         1.9×10^10 (m3/Pa·s)
Vt         0.005 (m3)       Ctp        6×10^12 (m3/Pa·s)
As         0.0031 (m2)      u          0~10 (V)
ρ          900 (kg/m3)      xv_max     0.002 (m)
Fig. 3. Experiment result of vibration control for a step input (sprung mass acceleration in m/s² versus time t in s)
illustrating primary ride control. The second type of vertical road disturbance is a sinusoidal input; the effect of small-amplitude, high-frequency inputs can be seen from the sinusoidal road input. The third type of vertical road disturbance is a random disturbance input. In order to show that the performance of the designed FCMAC semi-active suspension system is good, a comparative study with another scheme is necessary. In this experiment, another semi-active suspension system is designed using a PID approach, which has been widely used and has been proven to give good control results in designing semi-active suspension systems. The results for the experiments described are given in Figs. 3-5. In each case, the dashed line shows the response of the conventional PID semi-active suspension, and the solid line shows the FCMAC semi-active suspension system. The result for the step input, Fig. 3, shows that the FCMAC semi-active suspension gives an improved heave response in terms of overshoot, although the improvement is small. Figs. 3-5 show the
sprung mass vertical acceleration with road disturbances of step input, sinusoidal input, and random input in the FCMAC neural network semi-active suspension system. It can easily be noted that the FCMAC neural network semi-active suspension system shows better performance in reducing the sprung mass vertical acceleration. Figs. 3-5 show reductions in the sprung mass vertical acceleration of 25%, 27% and 20%, respectively, greatly improving the ride quality under road disturbance inputs.
Fig. 4. Experiment result of vibration control for a sinusoidal input (sprung mass acceleration in m/s² versus time t in s)

Fig. 5. Experiment result of vibration control for a random disturbance input (sprung mass acceleration in m/s² versus time t in s)
5 Conclusions

Extensions to studies on the vibration absorption technique in semi-active suspension have been presented. Based on a physical model established by a rigorous technique, online control of the semi-active suspension is performed by a hybrid intelligent controller integrating the fuzzy logic technique with a CMAC neural network. Experiment results show that the FCMAC controller described in this paper produced significant coach performance improvements in the FCMAC semi-active suspension system. The system model is used to assist offline learning. The controller described is only an initial feasibility study with minimal tuning, yet it nevertheless produced improvements. There is scope for further improvement by extending the controller to the full range of coach motion, including longitudinal motion. The physical model described is included only as a realistic representation of the suspension system and allowed road inputs via the vertical tyre force. The model should then be extended to a full-car model by including the steering system, with driver steering wheel inputs via the rack motion, and a representation of the lateral tyre force.
References
1. Mo Jamshidi: Fuzzy Control of Complex Systems. Soft Computing, Springer-Verlag (1997) 42-56
2. Chian-Shyong Tseng: Integrating Fuzzy Knowledge by Genetic Algorithms. IEEE Transactions on Evolutionary Computation 2(4) (1998) 138-149
3. Takao Sato, Akira Inoue, Yoichi Hirashima: Self-Tuning Two-Degree-of-Freedom PID Controller Reducing the Effect of Disturbance. Proc. American Control Conf. (2002) 3997-4002
4. Daniel Neagu: Modular Neuro-fuzzy Networks Used in Explicit and Implicit Knowledge Integration. Proc. 15th International Conf. of the Florida Artificial Intelligence Research Society, Florida (2002) 277-281
5. Buswell, R., Angelov, P., Wright, J.: Transparency and Simplification of Fuzzy Rule-based Models for On-line Adaptation. Proc. 2nd EUSFLAT Conf., Leicester (2001) 234-237
6. Yaochu Jin: A Framework for Evolutionary Optimization with Approximate Fitness Functions. IEEE Transactions on Evolutionary Computation 6(5) (2002) 481-494
7. Markus Olhofer, Toshiyuki Arima, Yaochu Jin, Toyotaka Sonoda, Bernhard Sendhoff: Optimization of Transonic Gas Turbine Blades with Evolution Strategies. Honda R&D Technical Review 14(1) (2002) 203-216
8. Kim, M. Y., Cho, H. S., Kim, J. H.: Neural Network Based Recognition of Navigation Environment for Intelligent Shipyard Welding Robots. IROS, Maui, Hawaii, USA (2001)
9. Jeong, I. S., Cho, H. S.: Self-localization for Mobile Robots by Matching of Two Consecutive Environmental Range Data. Proc. of the 2001 IEEE International Conf. on Robotics and Automation (2001) 1603-1608
10. Chun-Shin Lin: Learning Convergence of CMAC Technique. IEEE Transactions on Neural Networks 8(6) (1997) 1281-1291
11. Jong H. Park: H-Infinity Direct Yaw Moment Control with Brakes for Robust Performance and Stability of Vehicles. JSME International Journal, Series C 44(2) (2001) 403-413
An Intelligent Conversational Agent as the Web Virtual Representative Using Semantic Bayesian Networks Kyoung-Min Kim, Jin-Hyuk Hong, and Sung-Bae Cho Dept. of Computer Science, Yonsei University 134 Shinchon-dong, Seodaemoon-ku, Seoul 120-749, Korea {kminkim, hjinh}@sclab.yonsei.ac.kr,
[email protected] Abstract. In this paper, we propose semantic Bayesian networks that infer the user’s intention based on Bayesian networks and their semantic information. Since conversation often contains ambiguous expressions, managing the context or the uncertainty is necessary to support flexible conversational agents. The proposed method drives the mixed-initiative interaction (MII) that prompts for missing concepts and clarifies for spurious concepts to understand the user’s intention correctly. We have applied it to an information retrieval service for Web sites so as to verify the usefulness.
1 Introduction

Conversational agents are representative intelligent agents that provide information for users through natural language dialogue. They understand the user's intention through conversation and offer an appropriate service [1]. Pattern matching, one of the popular methods for constructing conversational agents, works well at the sentence level, but it is not feasible for understanding a dialogue in which context must be considered. Moreover, it is likely to fail to understand a complex sentence that requires deep analysis. Recently, researchers have investigated flexible dialogue models using Bayesian networks (BN) [2], and Bayesian networks have also been used in information retrieval (IR) [3,4]. When many variables in the application domain are related to each other, the inference of the user's intention becomes very difficult. In this paper, we propose semantic Bayesian networks (SeBN) not only to reduce the complexity of construction, but also to infer the user's intention in more detail.
2 Intelligent Conversational Agent For the efficient inference of conversational agents, we design semantic Bayesian networks composed of the probabilistic inference and the semantic inference. The stepwise modeling helps to understand the user’s intention through conversation. It is constructed with three levels according to the function: keywords, concepts, and targets. The keyword layer consists of words related to the user’s query, while the concept layer is composed of entities of the domain and their semantic relationship. Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 807 – 812, 2006. © Springer-Verlag Berlin Heidelberg 2006
808
K.-M. Kim, J.-H. Hong, and S.-B. Cho
The target layer represents target information (products) whose attributes are defined. The concept layer is divided into three components: objects, attributes, and values. Each object is a set of attribute-value pairs, where node $a_i$ is an attribute and node $v_k$ is a value in the domain. A solid line represents the probabilistic relationship between nodes, while a dotted line signifies the semantic relationship between them. The probabilistic relationship in semantic Bayesian networks is similar to that in the traditional IR model. First, it infers probabilistically between the keyword layer and the concept layer. The user's query is $U = \{k_1, k_2, \ldots, k_t\}$, where keyword $k_i$ is interpreted as an elementary word in the keyword layer. A keyword node is set to 1 when the word of the keyword layer is observed in the query, and 0 otherwise. It infers the probability of each node in the concept layer when all evidence variables associated with keywords are set. The probability $P(c \mid W)$, using keyword set $W$ in the keyword layer as evidence, is defined as follows:

$$P(c \mid W) = P(c \mid w_1, w_2, \ldots, w_N) = \frac{P(c) \times P(w_1, w_2, \ldots, w_N \mid c)}{P(w_1, w_2, \ldots, w_N)} \approx P(c) \times P(w_1 \mid c) \times P(w_2 \mid c) \times \cdots \times P(w_N \mid c) = P(c)\prod_{i=1}^{N} P(w_i \mid c)$$
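The naive-Bayes factorization above can be sketched in a few lines; the priors and keyword likelihoods below are toy values for two hypothetical concept nodes, not parameters from the paper.

```python
import math

# Toy parameters: prior P(c) and per-keyword likelihoods P(w|c) (hypothetical).
priors = {"camera": 0.5, "mp3": 0.5}
likelihood = {
    "camera": {"pixel": 0.8, "zoom": 0.7, "memory": 0.3},
    "mp3":    {"pixel": 0.1, "zoom": 0.1, "memory": 0.8},
}

def concept_posterior(observed, priors, likelihood):
    """Score each concept as P(c) * prod_i P(w_i | c), then normalize."""
    score = {c: priors[c] * math.prod(likelihood[c][w] for w in observed)
             for c in priors}
    total = sum(score.values())
    return {c: s / total for c, s in score.items()}

post = concept_posterior(["pixel", "zoom"], priors, likelihood)
```

With these toy numbers, observing the keywords "pixel" and "zoom" pushes nearly all posterior mass onto the camera concept, which mirrors the evidence-setting step described above.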
After computing the probability of all nodes in the concept layer, it infers the probability $P(p \mid C)$ of product $p$ in the target layer using them as evidence, similarly to the inference of $P(c \mid W)$.

Table 1. Semantic inference in SeBN

[Concept] Object: O = {o1, o2, o3, ..., on};  Attribute: A = {a1, a2, a3, ..., an};  Value: V = {v1, v2, v3, ..., vn}

    i = find_high_probability_object();        // Search for an object over the threshold.
    if ( object(o) > α ) {
        j = find_OA_attribute(o);              // Search for attribute 'a' whose probability is below the
                                               // threshold and which has an O-A relationship with node o.
        if ( attribute(a) < β ) response(a, v);
        else reject;
    } else {
        j = find_high_probability_attribute(); // Search for attribute 'a' over the threshold.
        if ( attribute(a) > α ) {
            i = find_OA_object(o);             // Search for the object that has an O-A relationship with node a.
            response(o);
        } else reject;
    }
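Read as executable logic, the procedure of Table 1 might look as follows; the node probabilities, the thresholds α and β, and the return conventions are placeholders chosen for illustration, not the paper's implementation.

```python
def semantic_inference(obj_prob, attr_prob, oa_links, alpha, beta):
    """Pick the next mixed-initiative move per Table 1 (illustrative sketch).

    obj_prob / attr_prob : inferred probabilities of object / attribute nodes
    oa_links             : object -> attributes related by 'O-A'
    Returns a clarifying move, or None for reject.
    """
    # Case 1: some object is already likely -> ask about its weakest O-A attribute.
    o = max(obj_prob, key=obj_prob.get)
    if obj_prob[o] > alpha:
        a = min(oa_links[o], key=attr_prob.get)
        if attr_prob[a] < beta:
            return ("ask_attribute", a)   # response(a, v)
        return None                       # reject
    # Case 2: a strong attribute can point back to its O-A object instead.
    a = max(attr_prob, key=attr_prob.get)
    if attr_prob[a] > alpha:
        o = next(obj for obj, attrs in oa_links.items() if a in attrs)
        return ("ask_object", o)          # response(o)
    return None                           # reject

move = semantic_inference(
    obj_prob={"mp3": 0.9, "camera": 0.2},
    attr_prob={"color": 0.3, "memory": 0.8},
    oa_links={"mp3": ["color", "memory"], "camera": ["color"]},
    alpha=0.7, beta=0.5,
)
```

With these placeholder numbers the agent would ask about the MP3's color, mirroring the dialogue of Fig. 1.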
It selects a node in the target layer whose probability is higher than the threshold after the inference. It provides the information of the target product to the user when a proper number of nodes are selected. In this paper, we define it as successful execution
when one product is selected. When there is no product selected, it executes the semantic inference of semantic Bayesian networks in the concept layer. There are two major relationships ('Has-a', 'Is-a') between nodes, while 'Is-a' has two different types ('O-A', 'A-V'). Table 1 shows the semantic inference executed when the probabilistic inference fails to infer the user's intention. At first, it searches for an object node whose probability is higher than the threshold. Then, it looks up an attribute whose probability is below the threshold and which has an 'O-A' relationship with the object node. It collects supplementary information on the selected attribute and carries out the inference again with the information gathered from the user. It repeats the procedure until a target product is selected. In order to find what the user wants, it should gather enough information to infer target products. Traditional information retrieval systems work well only when the user's query includes enough information for inference. When there is not enough information, however, the proposed method provides a suitable response to the user based on the mixed-initiative interaction. Finally, the proposed method is able to show good performance in diverse dialogue situations.
3 Experimental Results 3.1 Experimental Environments
In order to verify the usefulness of the proposed method, we have developed a flexible conversational agent as a virtual representative of Web sites. It consists of a main window for displaying information, an input text box, and an avatar system with a speech generation engine. When the user types a query, the avatar responds in speech with a corresponding action. Q-avatar (www.qavatar.com) is employed as the avatar system, while Voiceware (voiceware.co.kr), a solution for speech generation, is used to provide the user with a realistic and convenient interface.

Table 2. The attributes of objects

Object                          Attributes
Cellular-phone (240 products)   Brand, Product, Model, Image, Bell, Camera, Pixel, Size, Weight, Color, Price, Year
Digital camera (688 products)   Brand, Product, Model, Image, Memory, Run-time, Color, Size, Feature, Weight, Price, Year
MP3 (488 products)              Brand, Product, Model, Image, Pixel, Memory, Weight, Feature, Size, Zoom, Color, Price, Year
The target domain is mobile Web sites introducing cellular phones, digital cameras, and MP3 players. Table 2 describes the attributes of each object in the target database. The database is built by extracting information from 5 Web sites: Naver (www.nshopping.naver.com), Samsung-mall (www.samsung-mall.co.kr), LG-eshop (www.gseshop.co.kr), Enuri (www.enuri.com) and DCinside (www.dcinside.com).
3.2 Qualitative Analysis: Illustration
In many cases, the user has background knowledge in addition to the content of the conversation, so a query may not include all the information required to infer the user's intention. The proposed conversational agent uses a mixed-initiative dialogue as shown in Fig. 1, requesting additional information from the user. Finally, information on the target product is provided to the user after inference with that information. As shown in Fig. 1, it finds plural objects from the initial query. Since the agent needs additional information for correct intention inference, it outputs a supplementary query, "What color do you like? Red or Silver?", to the user as the mixed-initiative interaction. The user responds "I'd like a red one," and it then executes the probabilistic inference again using semantic Bayesian networks based on this response. As long as it detects plural products as the result of the prior inference, the agent keeps up the conversation by the mixed-initiative interaction. If one product is selected, the agent finishes the inference and provides the information of the target product to the user. User: I'm looking for a MP3. - Infer probabilistically using the SeBN. - Search the multiple objects. - Acquire additional information using MII. a. Search the object: MP3 b. Search the O-A attribute: color
<Semantic Inference> Agent: What color do you like? Red or Silver? User: I'd like a red one. - Infer probabilistically using SeBN. - Search the target product. → Inference completion - Provide the information. Agent: Here it is (Samsung Anycall SCH-S130). The price is ₩383,000.
Fig. 1. The target retrieval using MII
3.3 Quantitative Analysis
In order to evaluate the efficacy of the agent and the satisfaction of younger adults with it, we have compared three conversational agents: the Script-based, BN-based and SeBN-based agents. Thirty subjects aged from 22 to 33 living in Korea evaluated them. Participants performed ten tasks to search for information on several products. Satisfaction scores were measured by a single item on five-point Likert scales (1.0 = "not at all", 5.0 = "very much"). The result (see Table 3) shows that the proposed method (M=94.42) is superior to the others (M=92.15, 87.51). It is able to manage various types of dialogues while the
Script-based and BN-based agents fail to respond to them. It also shows good performance in providing suitable responses to the user with few interactions (M=2.96). As shown in Table 4, satisfaction with the overall intervention was very high for the proposed method. The effect of the proposed method on the easy, friendly, informative, repetitive and interesting scales was statistically measured by a one-way ANOVA with the variant of the SeBN as the among-systems factor. Post-hoc tests were also conducted whenever one or more of the significant factors entailed more than two categories or levels. In most cases, the proposed method shows better results than the others.

Table 3. Comparative results in efficiency (PR: retrieval rate, AI: average interactions)

            Script          BN              SeBN
            PR(%)   AI      PR(%)   AI      PR(%)   AI
Average     87.51   3.53    92.15   3.18    94.42   2.96
Table 4. Comparative results in the user satisfaction
User Satisfaction
Script
BN
SeBN
Mean
SD
Mean
SD
Mean
SD
Easy
2.9
.7379
4.0
.4714
4.6
.5164
Friendly
2.7
.6749
3.8
.4216
4.7
.4830
Informative
3.1
.5676
3.7
.6749
4.4
.5164
Repetitive
3.9
.8756
2.3
.4830
1.6
.5164
Interesting
3.1
.5676
3.8
.7888
4.5
.5270
4 Conclusion and Future Works

We have proposed a conversational agent using semantic Bayesian networks to make the inference of the user's intention more flexible and thorough. If the information in the query is insufficient, the agent asks the user for more information so as to infer the user's intention correctly. This improves the answering performance. Moreover, designing the networks becomes easier and more comprehensible by intuition. Research on the automatic construction of semantic networks remains for future study.
Acknowledgements The work was supported by the Korea Research Foundation Grant. (KRF-2004-005H00005)
References 1. S. Macskassy, “A conversational agent,” Master Essay, Rutgers University, 1996. 2. J.-H. Hong and S.-B. Cho, “A two-stage Bayesian network for effective development of conversational agent,” LNCS 2690, pp. 1-8, 2003. 3. S. Acid, et al., “An information retrieval model based on simple Bayesian networks,” Int. Journal of Intelligent Systems, 18(2), pp. 251-265, 2003. 4. P. Calado, et al., “A Bayesian network approach to searching Web databases through keyword-based queries,” Information Processing and Management, 40, pp. 773-790, 2004.
Three-Tier Multi-agent Approach for Solving Traveling Salesman Problem Shi-Liang Yan1 and Ke-Feng Zhou2 1
Engineering & Technology Center, Southwest University of Science and Technology, Mianyang, 621010, P.R. China
[email protected] 2 Department of computer science, Qiu-Ai Experimental Elementary School, Ningbo, 315153, P.R. China
[email protected] Abstract. The Traveling Salesman Problem (TSP) is a very hard optimization problem in the field of operations research. It has been shown to be NP-hard, and is an often-used benchmark for new optimization techniques. This paper brings up a three-tier multi-agent approach for solving the TSP. The proposed approach supports distributed solving of the TSP. It is divided into three tiers (layers): the first tier is an ant colony optimization agent, whose function is continuously generating new solutions; the second tier is a genetic algorithm agent, whose function is optimizing the current solution group; and the third tier is a fast local searching agent, whose function is optimizing the best solution from the beginning of the trial. Ultimately, the experimental results show that the proposed hybrid approach has good performance with respect to the quality of solution and the speed of computation.
1 Introduction The usual ways of solving the traveling salesman problem (TSP) are based either on integer linear programming techniques or on heuristic algorithms [1]. The former approach pursues the solution of the problem up to optimality. For example, highly optimized exact algorithms based on the branch-and-cut method [2] have been proposed that enable even large TSP instances to be solved. Unfortunately, this is not always possible because of the increase in computational work with problem size. Some heuristic approaches, however, have been proved to be very effective both in terms of execution times and quality of the solutions achieved. Domain-specific heuristics, such as 2-Opt [3], 3-Opt [4], and Lin-Kernighan (LK) [5], are surprisingly very effective for the TSP. On the other hand, general problem-independent heuristics like simulated annealing (SA) [6], genetic algorithms (GA) [7], ant system (AS) [8] and Neural Network (NN) [9] perform quite poorly on large TSP instances. Several published results demonstrate that combining a problem-independent heuristic with a local search method is a viable and effective approach for finding high-quality solutions of large TSPs [10]. The problem-independent part of the hybrid algorithm drives the exploration of the search space, thus, focusing on the global Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 813 – 817, 2006. © Springer-Verlag Berlin Heidelberg 2006
814
S.-L. Yan and K.-F. Zhou
optimization task, while the local search algorithm visits the promising sub-regions of the solution space. Reference [11] proposed the chained local optimization algorithm, where a special type of 4-Opt move is used under the control of an SA schema to escape from the local optima found with LK. Reference [12] combines an original compact genetic algorithm with an efficient implementation of LK. Since the TSP has been proved to belong to the class of NP-hard problems, heuristics and meta-heuristics occupy an important place among the methods developed so far to provide practical solutions for large instances. Accordingly, this paper proposes a three-tier multi-agent approach for solving the TSP. The proposed approach supports distributed solving of the TSP. The experimental results show that the proposed hybrid approach has good performance with respect to the quality of solution and the speed of computation.
2 Three-Tier Multi-agent Approach

2.1 Three-Tier Multi-agent Framework

Figure 1 depicts the framework of the proposed three-tier multi-agent approach. From Figure 1, we can see that the proposed approach includes three tiers (layers): the first tier is an ant colony optimization agent, whose function is continuously generating new solutions; the second tier is a genetic algorithm agent, whose function is optimizing the current solution group; and the third tier is a fast local searching agent, whose function is optimizing the best solution from the beginning of the trial. Obviously, the proposed approach supports distributed solving of the TSP. The proposed approach is terminated when one of the following criteria (end condition of the whole approach) is satisfied: (1) the maximum preset search time is exhausted; (2) the known optimal solution is achieved by the proposed approach.
Fig. 1. The framework of the proposed three-tier multi-agent approach
2.2 Ant Colony Optimization Agent

The first tier of the proposed approach is the ant colony optimization (ACS) agent, whose function is continuously generating new solutions. The ACS agent is terminated
when one of the following criteria is satisfied: (1) the known optimal solution is achieved by the ACS agent; (2) all of the solutions generated in five consecutive iterations are worse than the globally best tour from the beginning of the trial. When the current optimization in the first tier meets the end condition of the ACS agent but does not satisfy the end condition of the whole approach, the genetic algorithm (GA) agent is used to improve the quality of the solutions achieved by the ACS agent. That is, if the final solutions achieved by the ACS agent do not satisfy the end condition of the whole approach, then these solutions are selected as the initial chromosome population of the GA agent.

2.3 Genetic Algorithm Agent

The second tier of the proposed approach is the genetic algorithm agent, whose function is optimizing the current solution group. The GA agent is terminated when one of the following criteria is satisfied: (1) all individuals of a population are identical; (2) all of the solutions generated in five consecutive iterations are worse than the globally best tour from the beginning of the trial; (3) the maximum preset generation is reached; (4) the known optimal solution is achieved by the GA agent. When the current optimization in the second tier meets the end condition of the GA agent but does not satisfy the end condition of the whole approach, the fast local searching agent is used to refine the improved solutions achieved by the GA agent. That is, if the final solutions achieved by the GA agent do not satisfy the end condition of the whole approach, then these solutions (or part of them, such as the top ten) are selected as the initial optimization individuals of the fast local searching agent.

2.4 Fast Local Searching Agent

The third tier of the proposed approach is the fast local searching (FLS) agent, whose function is optimizing the best solution from the beginning of the trial.
The FLS agent is terminated when one of the following criteria is satisfied: (1) the maximum preset search time is exhausted; (2) the known optimal solution is achieved by the FLS agent. If the final solutions achieved by the FLS agent do not satisfy the end condition of the whole approach, then the ACS agent is used to generate some new, more promising solutions. Note that the ACS agent should apply the global updating rule to update the pheromone level using the refined solutions achieved by the FLS agent. Throughout the whole optimization process of the proposed approach, the globally best solution from the beginning of the trial is recorded because of the randomness of the ACS agent and the GA agent.
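The hand-off logic of Sections 2.2-2.4 can be read as a control loop. In the sketch below the three agents are stubbed with deliberately simple stand-ins (random restarts for the ACS tier, truncation selection for the GA tier, and a 2-swap local search for the FLS tier) on a toy five-city instance; none of these stubs are the actual algorithms used in the paper, they only illustrate how the tiers pass solutions to each other.

```python
import itertools
import random

cities = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0)]          # toy instance

def tour_len(t):
    return sum(((cities[a][0] - cities[b][0]) ** 2 +
                (cities[a][1] - cities[b][1]) ** 2) ** 0.5
               for a, b in zip(t, t[1:] + t[:1]))

def aco_generate(rng, n):      # tier 1 stub: random tours stand in for ACS sampling
    return [rng.sample(range(len(cities)), len(cities)) for _ in range(n)]

def ga_evolve(pop):            # tier 2 stub: keep the better half of the solution group
    return sorted(pop, key=tour_len)[:max(1, len(pop) // 2)]

def fls_refine(t):             # tier 3 stub: 2-swap local search to a local optimum
    improved = True
    while improved:
        improved = False
        for i, j in itertools.combinations(range(len(t)), 2):
            u = t[:]
            u[i], u[j] = u[j], u[i]
            if tour_len(u) < tour_len(t):
                t, improved = u, True
    return t

def three_tier_solve(seed=0, rounds=5):
    """Cycle ACS -> GA -> FLS, keeping the globally best tour (cf. Sect. 2.4)."""
    rng, best = random.Random(seed), None
    for _ in range(rounds):    # `rounds` stands in for the preset time budget
        group = ga_evolve(aco_generate(rng, 8))
        cand = fls_refine(min(group, key=tour_len))
        if best is None or tour_len(cand) < tour_len(best):
            best = cand        # in the real approach, ACS also re-seeds its
    return best                # pheromone trails from this refined tour
```

The key design point this sketch preserves is that each tier only ever hands a *solution set* to the next tier, so the tiers could run as separate distributed agents.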
3 Simulation Study

In this section, the proposed method was tested using a set of benchmark TSP instances. To avoid any misinterpretation of the optimization results relating to the choice of a particular initial solution, we performed each test 20 times. The performance of the proposed technique is compared with four other published optimization algorithms (see Table 1). The experimental results obtained for 8 test problems using these five different methods are given in Table 2. The proposed method was implemented on a Pentium IV 2.4 GHz personal computer with a single processor and
Table 1. The 5 different methods used in this section

Mark   Name of the optimization algorithm   Reference
M1     Annealing-Based Heuristics           Reference [13]
M2     Guided Local Search                  Reference [14]
M3     Lin-Kernighan Heuristic              Reference [15]
M4     Evolutionary Algorithm               Reference [16]
M5     Three-tier Multi-agent Approach      This paper
Table 2. The experimental results obtained from 8 test problems using the 5 methods

TSP                Performance Index     M1      M2      M3      M4      M5
ATT532 (42029)     Average time (s)      20.93   17.01   17.65   16.68   15.07
                   Average error (%)     0.016   0.014   0.013   0.013   0.010
RAT783 (8806)      Average time (s)      55.06   44.74   46.43   43.89   39.65
                   Average error (%)     0.022   0.020   0.019   0.018   0.014
PR1002 (259045)    Average time (s)      129.58  105.28  109.26  103.28  93.31
                   Average error (%)     0.027   0.024   0.023   0.022   0.017
VM1084 (239297)    Average time (s)      114.49  93.02   96.54   91.25   82.45
                   Average error (%)     0.033   0.029   0.027   0.027   0.021
PCB1173 (56892)    Average time (s)      119.68  97.24   100.92  95.39   86.19
                   Average error (%)     0.025   0.022   0.021   0.020   0.016
U1432 (152970)     Average time (s)      150.03  121.89  126.51  119.58  108.04
                   Average error (%)     0.049   0.044   0.042   0.040   0.031
U2152 (64253)      Average time (s)      299.20  243.09  252.30  238.49  215.47
                   Average error (%)     0.056   0.050   0.047   0.046   0.035
PR2392 (378032)    Average time (s)      293.59  238.53  247.56  234.01  211.43
                   Average error (%)     0.077   0.068   0.065   0.063   0.049
512 MB of RAM. Here, the average optimization time and the average optimization error were used for evaluating these different methods. The average optimization time is the average of the computation times of the 20 independent runs. In the same way, the average optimization error is the average of the computation errors of the 20 independent runs. The computation error is the relative error between the optimal result achieved by a given method and the optimal result produced by the recently published heuristic optimization algorithms. From Table 2, we can see that, in terms of both the average optimization time and the average optimization error, the proposed method of this paper is better than the other four methods. Simulations have shown that the proposed hybrid approach for the TSP has excellent performance with respect to the quality of solutions and the speed of calculation.
4 Conclusions

The contribution of this paper is summarized as follows. (1) This paper presents a new multi-agent architecture for the TSP; future new agents can be integrated into this architecture. (2) This study designs a three-tier multi-agent framework with different
functions in the proposed approach; all of these agents cooperate with each other to find an optimal solution.
References
1. LAPORTE G.: The Traveling Salesman Problem: an Overview of Exact and Approximate Algorithms. European Journal of Operational Research. 59(3) (1992) 231-247
2. PADBERG M., RINALDI G.: Optimization of a 532-city Symmetric Traveling Salesman Problem by Branch and Cut. Operations Research Letters. 6(1) (1987) 1-7
3. CROES G.A.: A Method for Solving Traveling Salesman Problems. Operations Research. 6(6) (1958) 791-812
4. LIN S.: Computer Solution of the Traveling Salesman Problem. Bell System Technical Journal. 44(5) (1965) 2245-2269
5. LIN S., KERNIGHAN B.W.: An Effective Heuristic Algorithm for the Traveling Salesman Problem. Operations Research. 21(2) (1973) 498-516
6. MARTIN O., OTTO S.W.: Combining Simulated Annealing with Local Search Heuristics. Annals of Operations Research. 63(2) (1996) 57-75
7. MICHALEWICZ Z.: Genetic Algorithms + Data Structures = Evolution Programs. 3rd edn. Springer-Verlag, Berlin Heidelberg New York (1996)
8. DORIGO M., GAMBARDELLA L.M.: Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman Problem. IEEE Transactions on Evolutionary Computation. 1(1) (1997) 53-66
9. ARAS N., ALTINEL I.K., OOMMEN J.: A Kohonen-Like Decomposition Method for the Euclidean Traveling Salesman Problem - KNIES_DECOMPOSE. IEEE Transactions on Neural Networks. 14(4) (2003) 869-890
10. BURIOL L., FRANCA P.M.: A New Memetic Algorithm for the Asymmetric Traveling Salesman Problem. Journal of Heuristics. 10(3) (2004) 483-506
11. MARTIN O., OTTO S.W., FELTEN E.W.: Large Step Markov Chain for the Traveling Salesman Problem. Journal of Complex Systems. 5(3) (1991) 299
12. BARAGLIA R., HIDALGO J.I., PEREGO R.: A Hybrid Heuristic for the Traveling Salesman Problem. IEEE Transactions on Evolutionary Computation. 5(6) (2001) 613-622
13. PEPPER J.W., GOLDEN B.L., WASIL E.A.: Solving the Traveling Salesman Problem with Annealing-Based Heuristics: A Computational Study.
IEEE Transactions on Systems, Man and Cybernetics - Part A: Systems and Humans. 32(1) (2002) 72-77 14. VOUDOURIS C., TSANG E.: Guided Local Search and its Application to the Traveling Salesman Problem. European Journal of Operational Research. 113(2) (1999) 469-499 15. HELSGAUN K.: An Effective Implementation of the Lin-Kernighan Traveling Salesman Heuristic. European Journal of Operational Research, 126 (1) (2000) 106-130 16. TSAI H.K., YANG J.M., TSAI Y.F., et al.: An Evolutionary Algorithm for Large Traveling Salesman Problems. IEEE Transactions on Systems, Man and Cybernetics. 34(4) (2004) 1718-1729
Adaptive Agent Selection in Large-Scale Multi-Agent Systems

Toshiharu Sugawara1, Kensuke Fukuda2, Toshio Hirotsu3, Shin-ya Sato4, and Satoshi Kurihara5

1 NTT Communication Science Laboratories, 3-1 Wakamiya Morinosato, Atsugi, Kanagawa 243-0198, Japan
2 National Institute of Informatics
3 Toyohashi University of Technology
4 NTT Network Innovation Laboratories
5 Osaka University
Abstract. An agent in a multi-agent system (MAS) has to select appropriate agents to which to assign tasks. Unfortunately, no agent in an open environment can identify the states of all other agents, so this selection must be made according to local information about the other known agents; this information, however, is limited and may contain uncertainty. In this paper, we investigate how the overall performance of a MAS is affected by the learning parameters of adaptive strategies for selecting partner agents for collaboration. We present experimental results obtained by simulation and discuss why the overall performance of the MAS varies.
1 Introduction

A huge number of agents are deployed across the vast Internet. Some agents work together, cooperatively or competitively, to provide many services. In general, an agent selects partner agents appropriate for collaboration based on their abilities. However, if multiple candidate agents remain, a more efficient agent is preferable. Of course, the efficiency of these agents is determined by their states, such as workload and communication bandwidth, as well as by their intrinsic capabilities, such as CPU power. In an open environment like the Internet, however, agents have to select efficient partners based only on locally available information, which contains uncertainties. Our interest lies in the total performance of a MAS when all agents select partner agents based on their partner selection strategies (PSSs), where "total" means the average performance over all agents. For this issue, we have already investigated how a PSS with learning can gradually improve total performance; this improvement is achieved by load balancing when workloads are high and by concentration when they are low [4]. This paper investigates how learning parameters, such as the fluctuation related to the exploitation-versus-exploration tradeoff, affect total performance and load balancing. As the performance measure we use the response time from sending a task to receiving the result. We also introduce a simple coordination strategy to improve the total performance of the MAS.

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 818–822, 2006.
© Springer-Verlag Berlin Heidelberg 2006
2 Simulation and Its Model

Our simulation model consists of a set of agents A = {ai} and server agents S = {sj} (⊂ A) that can execute a specific task T. When agent ai has T, it assigns it to a server that it knows, from Si (⊂ S). A PSS corresponds to the method whereby ai selects sj from Si. To clearly understand the relationship between total performance and PSSs, we simplify the other parameters and assume that all tasks are of a single type. On the Internet, one observable parameter concerning performance is the response time (rt), the time between the request of a task and the return of the result. Other parameters, such as CPU types, are not usually available, so they are not used in our experiments. Using the observed data, agents learn which server's rt will be the smallest. In our experiments, ai calculates the expected response time eji of each known server sj and (usually) selects the best server, arg min_{sj∈Si} eji. To evaluate the total performance of a MAS, we adopt the average response time over all agents, denoted by RT. Our experiments show how RT changes when all agents try to make their rt smaller.

Let |A| = 10000 and |S| = 120. All agents are randomly placed on points of a 150 × 150 grid plane with torus topology and Manhattan distance. Every tick, tl tasks are generated and given to tl randomly selected agents; tl is called the task load, denoted by tl task/tick or tl T/t. An agent ai receiving a task selects sj ∈ Si using its PSS and sends the task to sj. Server sj processes it and returns the result. Agent ai observes the response time and calculates the expected response time eji. Servers are assumed to have their own CPU capabilities; each server processes a task in 10 to 50 ticks, and these capabilities are randomly assigned. When a task arrives at sj, it is immediately executed if sj has no other tasks; otherwise, the received task is stored in its queue and queued tasks are processed in turn. An agent can store 20 tasks in its queue.
If ai already has 20 tasks in its queue, the new task is dropped. The communication cost, the time to send a task, is assumed to be proportional to the distance between the agent and the server and ranges from 10 to 120 msec. An agent has a scope based on distance (less than 14), so it can communicate with the agents and servers in its scope. For all agents to compare server response times, they have to know at least two servers. If an agent knows fewer than two servers, it asks all known agents for servers; thus, in the following experiments, each agent initially knows two to fifteen servers. The results of all our simulations are the averages of three independent experiments from three series of random numbers using three seeds. In these experiments, the total capability of all servers allows them to theoretically process 4.7 to 5.0 tasks every tick. The actual capabilities are influenced by the communication cost, the deviation of task allocation, and the server distribution in the grid plane. Note that the response time is the sum of the durations of communication, queuing, and processing.

Agents select appropriate servers using the expected response time eji, which is calculated either as the average of the observed response times, hji, or with the update function wij often used in reinforcement learning:

$$h_i^j[n] = h_i^j[n-1]\cdot(1 - 1/n) + rt_i^j[n]\cdot(1/n) \qquad (1)$$

$$w_i^j[n] = \begin{cases} w_i^j[n-1]\cdot(1-\lambda) + rt_i^j[n]\cdot\lambda & \text{if } n > 1 \\ rt_i^j[1] & \text{if } n = 1, \end{cases} \qquad (2)$$
where rt_i^j[n] is the n-th observed response time when ai sent a task to server sj, and λ is the learning parameter (0 ≤ λ ≤ 1; λ = 0.2 in our experiments). The value h_i^j[n] (w_i^j[n]) is ai's expected response time for sj by Eq. 1 (Eq. 2) after n observations rt_i^j[1], ..., rt_i^j[n] of sj. We write h_i^j and w_i^j when the value of n is unimportant. Note that lim_{n→∞} 1/n = 0, so h_i^j becomes stable, whereas w_i^j may keep changing with the server's performance. Hereafter, Eq. 1 is called the average value function and Eq. 2 the update function.
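The two estimators of Eqs. 1 and 2 — an incremental running average and an exponentially weighted update — can be sketched as follows (class and variable names are ours, not from the paper):

```python
class ResponseEstimator:
    """Expected response time e_i^j for one known server (Eqs. 1 and 2)."""

    def __init__(self, lam=0.2):
        self.lam = lam      # learning parameter lambda of the update function
        self.n = 0          # number of observations so far
        self.h = 0.0        # running average (Eq. 1, the "average value function")
        self.w = 0.0        # exponentially weighted estimate (Eq. 2)

    def observe(self, rt):
        """Fold in the n-th observed response time rt_i^j[n]."""
        self.n += 1
        if self.n == 1:
            self.h = self.w = rt
        else:
            # Eq. 1: h[n] = h[n-1] * (1 - 1/n) + rt[n] * (1/n)
            self.h = self.h * (1 - 1 / self.n) + rt / self.n
            # Eq. 2: w[n] = w[n-1] * (1 - lambda) + rt[n] * lambda
            self.w = self.w * (1 - self.lam) + rt * self.lam

est = ResponseEstimator()
for rt in [100, 120, 80]:
    est.observe(rt)
print(round(est.h, 6))  # prints 100.0, the plain average of 100, 120, 80
```

Note how `h` converges (the 1/n weight vanishes) while `w` keeps a fixed weight on new data, which is exactly the stability difference the paper discusses.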
3 Performance Improvement by Learning

In the first experiment (Exp. 1), ai selects a server from Si by the following PSS:

P1. With probability p (0 ≤ p ≤ 1), ai selects the server arg min_{sj∈Si} eji. If multiple servers share the best eji, one is selected at random.

P2. Otherwise, ai selects a server according to the probability distribution

$$\Pr(s_j) = (e_i^j)^{-l} \Big/ \sum_{s_k \in S_i} (e_i^k)^{-l},$$
where eji = hji or wij. Agent ai initially sets eki = 0 for each known server sk, so servers with no observed data are selected first. Note that P2 adds fluctuation to the PSS, since an agent may select a server other than the best one; the larger l is, the smaller this fluctuation. First we assume tl = 4 T/t. The overall response time RT is calculated every 20K ticks, and the averages of these RT values between 600K and 800K ticks are shown in Fig. 1; this illustrates how independent learning by each agent can improve the total performance of the entire MAS, probably by balancing load. Note that the first observed values of RT, that is, during 0K to 20K ticks, are 272.9 ticks and 11180.0 tasks, respectively, so learning improves them considerably. When the update function is used (eji = wij), Fig. 1(a) indicates that RT generally improves as l grows (p = 0.9 or 0.8, fixed); this result is consistent with the fixed-load case in [2] (in [2], p = 0, so graphs (ii) in Fig. 1(a) are closer to their experiments). However, when the average value function is used (eji = hji), RT is the best
Fig. 1. Total performance values. (a) Average response time (RT) vs. l, the power of the fluctuation factor, for the update function ((i) p = 0.9, (ii) p = 0.8) and the average value function ((iii) p = 0.9). (b) Average response time (RT) vs. p, the probability of selecting the best server, with l = 2.
around 1.5 ≤ l ≤ 2. This means that some fluctuation can improve total performance. It also suggests that the average value function makes agents' server selections stable and conservative over time, although the convergent state is not optimal. We then investigate how p affects total performance by fixing l to 2.0, since RT is the best around l = 2.0. The result is shown in Fig. 1(b). RT values are minimal around p = 0.85 to 0.9 in both cases. This graph shows that the tradeoff observed in the previous experiment is a special phenomenon: when p ≤ 0.8, agents select the best servers learned from their own viewpoints, so RT improves as p becomes larger. However, if p > 0.9, this PSS concentrates tasks on a few servers with high CPU capability, and consequently RT degrades. The simulation shows that the tradeoff appears only when 0.8 ≤ p ≤ 0.9, a kind of intermediate region. Finally, the most noticeable feature of these graphs is that the average value function outperforms the update function in RT. When agents use the update function, they act more adaptively to the environment, which includes the other agents; hence this adaptability also continuously changes the environment. Agents can detect these changes only by observing worse response times and never become stable, so the total performance does not improve. With average values, however, agent selections become stable and conservative. This convergent state is not optimal, but it is better than the unstable situation.
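Strategy P1/P2 of Exp. 1 can be sketched as follows. The names are ours; since the paper initializes unobserved servers with e = 0 (for which e^{-l} would be undefined), this sketch simply tries unobserved servers first, which matches the intended behavior:

```python
import random

def select_server(expected, p=0.9, l=2.0, rng=random):
    """P1/P2 partner selection. `expected` maps server -> e_i^j (expected rt)."""
    # Servers with no observation (e == 0) are tried before anything else.
    unseen = [s for s, e in expected.items() if e == 0]
    if unseen:
        return rng.choice(unseen)
    if rng.random() < p:
        # P1: pick the best (smallest expected rt); ties broken at random.
        best = min(expected.values())
        return rng.choice([s for s, e in expected.items() if e == best])
    # P2: probabilistic choice with Pr(s_j) proportional to (e_i^j)^(-l).
    servers = list(expected)
    weights = [expected[s] ** (-l) for s in servers]
    total = sum(weights)
    r, acc = rng.random() * total, 0.0
    for s, w in zip(servers, weights):
        acc += w
        if r <= acc:
            return s
    return servers[-1]
```

With p close to 1 the P1 branch dominates (exploitation); smaller p or smaller l increases the fluctuation introduced by P2.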
4 Communication with Local Agents

In [4], we suggested that a collaborative strategy (CS) can drastically improve total performance but that fluctuation degrades it, in contrast to the experiment in the previous section. In the next experiment (Exp. 2), we investigate in more detail how the performance value RT changes when agents collaborate. For this purpose, the following CS is inserted before P1.

P0. With a probability of 0.01, ai randomly selects a known agent and asks it to recommend a "best server" sn. If ai already knows sn, the recommendation is ignored; if not, ai adds sn and sets eni = 0.
Fig. 2. Total performance values (p = 0.9). (a) Average response time (RT) vs. l, the power of the fluctuation factor, for the update function and the average value function. (b) Average response time (RT) vs. p with l = 2.
The requested agent recommends the server that it would select by P1 and P2. P0 embodies the idea that a server that is good for a neighboring agent may also be good for the requester. This CS requires one tick for communication, but its communication cost is relatively small: an agent only communicates with others within its scope, and a recommendation occurs fairly infrequently (1% of the time). Additionally, only "best" (or at least "better") servers are recommended, so low-performance servers are filtered out. Note that this CS is open; that is, a good distant server may be delivered through an agent-by-agent recommendation chain. The results of Exp. 2 are illustrated in Fig. 2. Both figures show that the value of RT worsens as l and p become larger when agents use the update function; this indicates that when agents make rational decisions with higher probability, the total performance of the MAS degrades, although these variations with l and p are smaller than those in Exp. 1. This is contrary to our intuition. In contrast, when the agents use the average value function, we cannot observe any clear relation among p, l, and RT. This means that with the average value function, the convergent state of the MAS does not depend on the parameters p and l; fluctuation makes total performance neither better nor worse. As in Exp. 1, the average value function outperforms the update function except when p is small, as shown in Fig. 2(b).
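The collaboration step P0 can be sketched as follows. The function and table names are ours, and for brevity the neighbor recommends its current best-ranked server rather than running its full P1/P2 selection:

```python
import random

def maybe_ask_neighbor(known, neighbor_tables, prob=0.01, rng=random):
    """P0: with small probability, ask a random known agent for its
    recommended server; a new server is added with expected rt e = 0.

    known: this agent's table {server: expected rt}.
    neighbor_tables: the tables of agents within scope (illustrative).
    """
    if not neighbor_tables or rng.random() >= prob:
        return known
    table = rng.choice(neighbor_tables)
    recommended = min(table, key=table.get)   # the neighbor's best server
    if recommended not in known:
        known[recommended] = 0.0              # unknown server: init e_i^n = 0
    return known
```

Because a newly imported server starts with e = 0, it is tried soon afterwards, which is how a good distant server can propagate along a recommendation chain.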
5 Discussion and Conclusion

We investigated how the learning parameters in agents' local strategies for selecting partner agents influence the total performance of an entire MAS. Although many task allocation methods have been proposed [1,2,3], our experiments show that task allocation is not obvious when the MAS is massive and each agent has sophisticated abilities such as decision-making based on its own information and learning. All the figures in this paper indicate that adaptation by the update function makes the environment fluid, so some degradation is observed. However, our experiments assume quite simple situations; tasks are given at a constant rate of 4 T/t. In actual Internet environments, some characteristics always change. Hence, we believe there is a tradeoff between unstable and stable, or in other words, adaptive and non-adaptive, behavior; which is better may depend on the speed of environmental change versus the speed of adaptation. We believe that adaptability is vital because the Internet is a changing environment. The relationship between the speed of change and the speed of adaptation is our next research issue.
References

1. Mirchandany, R., Stankovic, J.: Using Stochastic Learning Automata for Job Scheduling in Distributed Processing Systems. Journal of Parallel and Distributed Computing 38(11) (1986) 1513-1525
2. Schaerf, A., Shoham, Y., Tennenholtz, M.: Adaptive Load Balancing: A Study in Multi-Agent Learning. Journal of Artificial Intelligence Research 2 (1995) 475-500
3. Mehra, T., Wah, B.W.: Population-Based Learning of Load Balancing Policies for a Distributed Computer System. Proc. of Computing in Aerospace 9 Conference (1993) 1120-1130
4. Sugawara, T., et al.: Total Performance by Local Agent Selection Strategies in Multi-Agent Systems. Proc. of 5th Int. Joint Conf. on Autonomous Agents and Multiagent Systems (2006)
A Mobile Agent Approach to Support Parallel Evolutionary Computation Wei-Po Lee Department of Information Management National University of Kaohsiung Kaohsiung, Taiwan
[email protected] Abstract. To enhance the performance of evolutionary algorithms, different parallel computation models have been proposed, and they have been implemented on parallel computers to speed up the computation. Instead of using expensive parallel computing facilities, in this paper we propose to implement parallel evolutionary computation models on easily available networked PCs, and present a multi-agent framework to support parallelism. To evaluate the proposed approach, different kinds of experiments have been conducted to assess the developed system and the preliminary results show the efficiency of our approach.
1 Introduction

Evolutionary Algorithms (EAs) have become increasingly popular for solving problems in different domains. Yet, in order to solve more difficult problems, two inherent drawbacks of EAs, premature convergence and long computation time, must be overcome. To address these two problems, the idea of parallelizing EAs has been proposed by different researchers and has proven to be promising [1]. Conceptually, the parallelism divides the single big population of a sequential EA into multiple smaller sub-populations that are distributed to separate processors and can then be evaluated simultaneously. According to the sub-population size, parallel EAs are categorized into two types, coarse-grain and fine-grain, which are usually implemented on MIMD and SIMD computers, respectively. Though parallel computers can speed up EAs dramatically, they are not easily available, and it is expensive to upgrade their processing power and memory; such machines cannot be expected especially in a campus-based computing environment. A promising alternative that avoids expensive hardware facilities is to construct the parallel EC framework on a set of networked personal computers. There are several ways to manage the operation of such a distributed computational framework, for example a client-server technique or an agent-based approach. Client-server is the most common paradigm of distributed computing at present, but in this paradigm all components are stationary with respect to execution. The mobile agent-based design paradigm therefore provides a better choice to support parallel computing [2][3].

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 823 – 828, 2006.
© Springer-Verlag Berlin Heidelberg 2006

Mobile agents are software agents that are capable of transmitting
themselves across a computer network and recommencing execution at a remote site. Many multi-agent platforms are publicly available; application developers can use them as platforms and focus on the software development issues at the application level. To make parallel EAs more practical, in this paper we propose a mobile agent-based methodology to support parallelism on networked computers, in which mobile agents dynamically allocate available machines for the EA code. To verify the proposed approach, we have developed a prototype application system on the middleware platform JADE (Java Agent Development Framework [3]). Different kinds of experiments have been conducted to assess the developed prototype system, and the preliminary results show the promise and efficiency of our approach.
2 The Proposed Mobile Agent-Based Approach

To develop a parallel EC framework without a powerful connection machine, we implement a coarse-grain model on a set of networked PCs in our laboratory. Figure 1 shows our computational framework for island-model parallelism. In this figure, each grey block contains a sub-population of individuals in which evolution continues for a certain number of generations before migration happens. During this period, the computation for each sub-population is independent of the others, so evolution of different sub-populations can proceed simultaneously. In this model, migration happens only between immediate neighbors along the different dimensions of the hypercube, and the communication phase sends a certain number of the best individuals of each sub-population to replace the same number of the worst individuals of its immediate neighbors at a regular interval. Though realizing parallelism this way is much cheaper than using a parallel computer, some machines would have to be pre-assigned to contribute their computation power for running sub-populations. This is impractical in a computing laboratory environment whose computational resources are shared by many end-users. Therefore, we instead develop an agent-based framework to manage the execution and communication of the different sub-populations in an adaptive manner. Our agent-based framework mainly includes three kinds of agents (in addition to the default agents provided by JADE for network services): the mobile agent, which carries the EC code; the status agent, which reports machine status; and the synchronization agent, which records the evolving progress of the sub-populations. The agents communicate with each other in a pre-specified language through a common channel that is compliant with established FIPA standards to ensure interoperability between agents.
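The migration scheme described above — every island periodically sends its best individuals to overwrite the worst individuals of its hypercube neighbors — can be sketched as follows (all names are ours):

```python
def migrate(subpops, neighbors, fitness, n_migrants):
    """Island-model communication phase.

    subpops:   list of populations (each a list of individuals)
    neighbors: neighbors[i] -> indices of island i's hypercube neighbors
    fitness:   individual -> score (higher is better)
    n_migrants: number of best individuals copied to each neighbor
    """
    # Snapshot each island's migrants first so migration order is irrelevant.
    best = [sorted(pop, key=fitness, reverse=True)[:n_migrants]
            for pop in subpops]
    for i, pop in enumerate(subpops):
        for j in neighbors[i]:
            incoming = best[j]
            pop.sort(key=fitness)              # worst individuals first
            pop[:len(incoming)] = list(incoming)  # overwrite the worst
    return subpops

# Two islands that are mutual neighbors, with fitness = the value itself:
pops = migrate([[1, 2, 3], [7, 8, 9]], {0: [1], 1: [0]},
               fitness=lambda x: x, n_migrants=1)
```

In the paper's setting each grey block would run its EA for a fixed number of generations between calls to such a communication phase.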
In the proposed framework, mobile agents play the major role because they can migrate from machine to machine. In our strategy, each mobile agent is responsible for running one sub-population. Initially, a status agent is created for each machine in the network, and the status agent SA on the main host maintains the machine-status information (e.g., CPU utilization) reported by the other status agents. After the mobile agent MA starts the computation on the main host, it clones itself, together with the EC code, for each node of the pre-specified n-cube model. According to the information provided by SA, MA dispatches each clone to an available machine. In the proposed framework, a mobile agent
can only execute its EC code on an idle machine. Therefore, during execution, if a status agent on any machine (except the main host) detects the presence of a new end-user, it informs the mobile agent on the same machine to suspend the execution of its EC code and release the computing resources to the end-user. The mobile agent then inquires about machine availability from SA to find a free machine on which to resume the execution. If the mobile agent finds a machine with CPU utilization lower than a certain threshold, it asks SA to reserve this machine and then carries its code and related information to that machine to continue the computation from the point at which it was interrupted; otherwise, it stays on its current machine and waits for an available one. A synchronized method is used to exchange individuals in our island-model EC: all sub-populations have to evolve for the same number of generations before the communication phase can happen. This is achieved by creating a synchronization agent on the main host to record the evolving progress of the different sub-populations. Once a sub-population has evolved for a pre-defined number of generations, the mobile agent responsible for it sends a message to the synchronization agent to indicate this, and the status agent on the same machine informs SA to update the machine-availability information accordingly. Because sub-populations have to wait for each other and release their computation resources while waiting, eventually each mobile agent can find a free machine for each sub-population that has not yet finished. In the worst case, when all machines are taken by end-users during the execution, the mobile agents distributed on them can move back to the main host to continue their evolutionary computation. To implement the above computing environment, we choose JADE as the platform and build our agents on it.
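On the decision side, the relocation policy described above reduces to a small host-selection rule. A plain-Python sketch under our own names follows (the paper's JADE implementation and its exact CPU-utilization threshold are not specified at this level of detail):

```python
def choose_host(cpu_utility, threshold=0.25, main_host='main'):
    """Decide where a suspended mobile agent should resume.

    cpu_utility: host -> current CPU utilization as reported to the status
    agent SA (the 0.25 threshold is illustrative, not from the paper).
    Returns a host under the threshold, or None to stay put and wait;
    when every machine stays busy, the agent falls back to the main host.
    """
    free = [h for h, u in cpu_utility.items()
            if u < threshold and h != main_host]
    if free:
        return min(free, key=cpu_utility.get)  # least-loaded free host
    return None  # wait in place for a machine to free up
```

A wrapper loop would call this each time an end-user displaces the agent, moving the EC code and its saved state to the returned host, or to `main_host` when the wait is abandoned.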
JADE is compliant with the FIPA standard specifications, so agents developed on it can interoperate with other agents built to the same standard. JADE also allows each agent to dynamically discover other agents and to communicate with them in a peer-to-peer manner. By using such a platform, we can ignore the details of the middleware issues of a distributed architecture and concentrate on building the agents that constitute our parallel evolutionary computing framework for solving application tasks.
Fig. 1. Control flows of the parallel evolutionary computation: n sub-populations (sub-pop1, ..., sub-popn) evolve in virtual parallelism over generations t, t+1, ..., t+k-1, separated by communication phases, and are mapped onto an n-cube model over the computer network for real parallelism.
3 Implementations and Experiments

Following our computational architecture and the mobile agent-based methodology, we conducted two series of experiments to compare the corresponding performance. The experiments evaluate the performance of using mobile agents to exploit the computational power of multiple machines; two strategies, one static and one adaptive, are used to achieve real parallelism. To verify our methodology, we apply it to a time-consuming application task: evolving robot controllers [4]. Here, our previous genetic programming system [5] is extended with the agent-based parallel model to evolve robot controllers for an obstacle avoidance task. In the experiments using agent-based strategies, we arranged eight networked computers running the Windows NT operating system as a distributed computing environment to support the JADE platform. One of the computers played the role of the "main host", running the JADE Main-Container. Once the platform was activated, the JADE default agents AMS, DF, and RMA were instantiated; the DF agent provided the other agents with information (e.g., the IP address) about the hosts connected to the JADE platform. Each machine in this framework had a status agent to report its status and a mobile agent to take care of the computation of one sub-population. Also, a synchronization agent was created in the main container to activate the communication phase for exchanging individuals between sub-populations.

The first phase examines whether the developed agent-based framework can exploit the networked computing power to speed up the evolutionary computation. Hence, the agents remained static; they did not migrate between different machines.
To compare the effect of using different numbers of computers, we conducted experiments with one population of 400 individuals, two sub-populations of 200 individuals, four sub-populations of 100 individuals, and eight sub-populations of 50 individuals. In these experiments, the communication phase happened every ten generations, and the migrants exchanged were 4% of a sub-population; these values were chosen because they were found to give the best performance in a small pilot study. Each sub-population was executed on one computer, and end-users were not allowed to access the machines used for the experiments. Mobile agents here were created only to communicate with agents on other machines. Figure 2 (left) shows the average computation time for each strategy. As can be observed, the time for running a single experiment is reduced in an almost linear manner. Although agent communication for exchanging individuals between sub-populations requires extra computational effort in this parallel model, it is relatively small compared to the time for running a time-consuming evolutionary experiment.

Unlike the first phase, the second phase shows how the proposed mobile agent approach can support adaptive parallelism on networked computers. In these experiments, eight hosts were connected to the JADE platform, of which only one was reserved for running EC experiments. The reserved host was initiated as the main container to enable the default agents AMS, DF, and RMA, and end-users were allowed to use the other seven computers as they usually do. As in the first phase, each host was a container that included a mobile agent and a status agent, and a synchronization agent was allocated in the main container. Initially, the EC code was executed on the main container, and then the mobile agent in this host checked
with the status agent to find other available hosts. For any free host, a duplicated mobile agent packed the EC code and relevant information, moved to the target host, and started the execution of a new sub-population. After that, the mobile agents moved between hosts according to the strategy described in Section 2 to execute the corresponding EC code for the application task. As mentioned, the end-users were free to use the machines connected to the mobile agent framework. Because of the varying user-access situations, the experiments on adaptive parallelism were conducted six times on different days. Figure 2 (right) shows the results. As can be seen, all runs were sped up by the proposed approach, which shows the efficiency of our mobile agent-based approach.
Fig. 2. Computational cost for experiments with a static strategy (left; m = 1, 2, 4, 8 is the number of computers used) and an adaptive strategy (right; six runs on different days), measured in hours.
4 Conclusions and Future Work

In this paper, we proposed implementing parallel evolutionary computation models on easily available networked PCs and presented a multi-agent framework for parallelizing evolutionary computation, in which mobile agents play the key role of managing the execution and communication of the EC code distributed over different computers. To evaluate our framework, two sets of experiments were conducted with static and adaptive mobile agent strategies. The results show that both strategies can efficiently speed up the computation. Currently, we are investigating system reliability and fault tolerance by constructing a mechanism that ensures our framework can recover from unexpected faults caused by hardware defects or inappropriate user operations.
References

1. Cantú-Paz, E.: Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic Publishers (2000)
2. Lange, D.B., Oshima, M.: Programming and Deploying Java Mobile Agents with Aglets. Addison-Wesley, Menlo Park, CA (1998)
3. Bellifemine, F., Poggi, A., Rimassa, G.: Developing Multi-Agent Systems with a FIPA-Compliant Agent Framework. Software: Practice and Experience 31 (2001) 103-128
4. Nolfi, S., Floreano, D.: Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines. MIT Press, MA (2000)
5. Lee, W.-P.: Evolving Complex Robot Behaviors. Information Sciences 121 (1999) 1-25
The Design of Fuzzy Controller by Means of Genetic Algorithms and NFN-Based Estimation Technique Sung-Kwun Oh1, Jeoung-Nae Choi2, and Seong-Whan Jang2 1
Department of Electrical Engineering, The University of Suwon, San 2-2 Wau-ri, Bongdam-eup, Hwaseong-si, Gyeonggi-do, 445-743, South Korea
[email protected] 2 Department of Electronic and Information Engineering, Wonkwang University, 344-2, Shinyong-Dong, Iksan, Chon-Buk, 570-749, South Korea
[email protected]

Abstract. In this study, we introduce a neurogenetic approach to the design of fuzzy controllers. The design procedure exploits the technology of Computational Intelligence (CI), focusing on the use of genetic algorithms and neurofuzzy networks (NFNs). The crux of the design concerns the selection and determination of optimal values of the scaling factors of the fuzzy controllers, which are essential to the entire optimization process. First, the scaling factors of the fuzzy controller are tuned, and then a nonlinear mapping for the scaling factors is developed by using a GA-based NFN.

Keywords: Fuzzy Controller, Neurofuzzy Network (NFN), Genetic Algorithms, Estimation technique.
1 Introduction

In parallel to PID controllers, which are nowadays regarded as the standard control constructs of numeric control [1], fuzzy controllers have positioned themselves in a similarly dominant role at the knowledge-rich end of the spectrum of control algorithms. The intent of this study is to develop, optimize, and experiment with fuzzy controllers within a general design scheme of Computational Intelligence. One of the difficulties in the construction of a fuzzy controller is deriving a set of optimal control parameters, such as the linguistic control rules, scaling factors, and membership functions of the fuzzy controller. In the conventional design method, a control expert proposes some linguistic rules and decides upon the type and parameters of the associated membership functions. To enhance the quality of the control knowledge conveyed by the expert (usually a matter of calibrating this initial domain knowledge), genetic algorithms (GAs) have already started to play a pivotal role. The development process consists of two main phases. First, using genetic optimization, we determine optimal parameters of the fuzzy controller for various initial states (conditions) of the dynamic system. Second, we build a nonlinear model that captures the relationship between

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 829 – 833, 2006.
© Springer-Verlag Berlin Heidelberg 2006
S.-K. Oh, J.-N. Choi, and S.-W. Jang
the initial states of the system and the corresponding genetically optimized control parameters.
2 The Fuzzy Controller

In the fuzzy PID controller, we confine ourselves to the following notation: e denotes the error between the reference and the response (output of the system under control), Δe is the first-order difference of the error signal, and Δ2e is the second-order difference of the error. Note that the input variables to the fuzzy controller are transformed by the scaling factors (GE, GD, GH, and GC), whose role is to allow the fuzzy controller to properly “perceive” the external world to be controlled. The fuzzy PID controller consists of rules of the following form:

Rj : if E is A1j and ΔE is A2j and Δ2E is A3j then Δuj is Dj
(1)
The capital letters in rule (Rj) denote fuzzy variables (linguistic terms), whereas Dj is a numeric value (a singleton) of the control action. The overall operation of the fuzzy PID controller is such that the resulting control is formed incrementally, based on the previous control:

u(k) = u(k − 1) + Δu(k)    (2)
3 Auto-tuning of the Fuzzy Controller Using GAs

In this study, the number of generations is set to 100, the crossover rate is 0.6, and the mutation rate is 0.1. The number of bits used in the coding is 10. Let us recall that this involves tuning the scaling factors and constructing the control rules; these are genetically optimized. We set the initial individuals of the GA using three parameter-estimation modes: a basic mode, a contraction mode and an expansion mode. In the basic mode (BM), the initial individuals use scaling parameters that normalize the error between the reference and the output, the first-order error difference and the second-order error difference to the interval [-1, 1]. In the contraction mode (CM), the scaling parameters are reduced by 25% relative to the basic mode, while in the expansion mode (EM) they are enlarged by 25%. The standard ITAE expressed for the reference and the output of the system under control is treated as the fitness function [2].
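As an illustration of this set-up (our own sketch, not the authors' code), the GA loop with 10-bit coding, the stated rates, and the three initialization modes might look as follows; the fitness function here is a stand-in for the ITAE obtained by simulating the closed control loop:

```python
import random

# Hypothetical sketch: 10-bit binary coding per scaling factor, 100
# generations, crossover rate 0.6, mutation rate 0.1. The fitness is a
# placeholder; in the paper it is the ITAE of the simulated control loop.

BITS, N_FACTORS = 10, 4          # GE, GD, GH, GC
POP, GENS, P_CROSS, P_MUT = 30, 100, 0.6, 0.1

def decode(chrom, lo=-1.0, hi=1.0):
    """Map each 10-bit gene to a real value in [lo, hi]."""
    vals = []
    for i in range(N_FACTORS):
        gene = chrom[i * BITS:(i + 1) * BITS]
        x = int("".join(map(str, gene)), 2) / (2 ** BITS - 1)
        vals.append(lo + x * (hi - lo))
    return vals

def init_population(mode="BM"):
    """BM normalizes to [-1, 1]; CM/EM shrink/enlarge the range by 25%."""
    scale = {"BM": 1.0, "CM": 0.75, "EM": 1.25}[mode]
    pop = [[random.randint(0, 1) for _ in range(BITS * N_FACTORS)]
           for _ in range(POP)]
    return pop, scale

def itae(factors):               # placeholder fitness (lower is better)
    return sum((f - 0.3) ** 2 for f in factors)

def evolve(mode="BM", seed=0):
    random.seed(seed)
    pop, scale = init_population(mode)
    for _ in range(GENS):
        pop.sort(key=lambda c: itae([v * scale for v in decode(c)]))
        nxt = pop[:2]                          # elitism
        while len(nxt) < POP:
            a, b = random.sample(pop[:POP // 2], 2)
            if random.random() < P_CROSS:      # one-point crossover
                cut = random.randrange(1, len(a))
                a = a[:cut] + b[cut:]
            child = [1 - g if random.random() < P_MUT else g for g in a]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=lambda c: itae([v * scale for v in decode(c)]))
    return [v * scale for v in decode(best)]
```

The three modes only bias the effective range of the initial individuals; the genetic operators themselves are identical in all modes.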
4 The Estimation Algorithm by Means of GA-Based Neurofuzzy Networks (NFNs)

Let us consider an extension of the network with the fuzzy partition realized by fuzzy relations. Figure 1 visualizes the architecture of such an NFN for two inputs and one output, where each input is assigned three membership functions. The circles denote processing units of the NFN. The node labeled ∏ denotes a Cartesian product, whose output is the product of all incoming signals, and N denotes the normalization of the membership grades.
The Design of Fuzzy Controller by Means of Genetic Algorithms
Fig. 1. NFN structure by means of the fuzzy space partition realized by fuzzy relations
As far as learning is concerned, the connections change as follows:
w(new) = w(old ) + Δw
(3)
In this algorithm, the genetic algorithm is used to optimize the learning rate, the momentum term and the fuzzy membership functions of the above NFN.
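A minimal sketch of the NFN of Fig. 1 and the delta-rule update of Eq. (3) is given below. This is our own illustration: the triangular membership shape, the term centers and the learning rate are assumptions, and the GA of the text would tune the learning rate and momentum term.

```python
import itertools

# Illustrative sketch (not the authors' code) of the NFN of Fig. 1:
# two inputs, three membership functions each, product (∏) nodes over
# the 3 x 3 fuzzy relations, normalization (N), and a weighted sum output.

def triangular(x, center, width=1.0):
    """A simple triangular membership function; the shape is an assumption."""
    return max(0.0, 1.0 - abs(x - center) / width)

CENTERS = [-1.0, 0.0, 1.0]        # three linguistic terms per input

def nfn_output(x1, x2, weights):
    """weights: one connection weight per ∏ node (9 in total)."""
    mu1 = [triangular(x1, c) for c in CENTERS]
    mu2 = [triangular(x2, c) for c in CENTERS]
    products = [a * b for a, b in itertools.product(mu1, mu2)]  # ∏ nodes
    total = sum(products) or 1.0
    normalized = [p / total for p in products]                  # N nodes
    return sum(w * f for w, f in zip(weights, normalized))      # y_hat

def update_weights(weights, x1, x2, target, lr=0.1):
    """Delta-rule step of Eq. (3): w(new) = w(old) + Δw."""
    err = target - nfn_output(x1, x2, weights)
    mu1 = [triangular(x1, c) for c in CENTERS]
    mu2 = [triangular(x2, c) for c in CENTERS]
    products = [a * b for a, b in itertools.product(mu1, mu2)]
    total = sum(products) or 1.0
    return [w + lr * err * (p / total) for w, p in zip(weights, products)]
```

Repeated calls to `update_weights` drive the network output toward the target for the given input pair.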
5 Experimental Studies

In this study, the dynamics of the inverted pendulum system are characterized by two state variables: θ (the angle of the pole with respect to the vertical axis) and θ̇ (the angular velocity of the pole). The behavior of these two state variables is governed by the following second-order dynamic equation of the inverted pendulum:

$$\ddot{\theta} = \frac{g\sin\theta + \cos\theta\left(\dfrac{-F - ml\dot{\theta}^{2}\sin\theta}{m_{c} + m}\right)}{l\left(\dfrac{4}{3} - \dfrac{m\cos^{2}\theta}{m_{c} + m}\right)} \qquad (4)$$
where g (acceleration due to gravity) is 9.8 m/s², mc (mass of the cart) is 1.0 kg, m (mass of the pole) is 0.1 kg, l (length of the pole) is 0.5 m, and F is the applied force in newtons. Proceeding with the genetic optimization, we consider the ITAE (Integral of the Time multiplied by the Absolute value of Error), the overshoot and the rising time as the three underlying criteria of the performance index (PI) of the controller. We selected 0.1 rad, 0.2 rad, …, 0.7 rad, and 0.8 rad as the collection of initial angular positions, and 0.1 rad/sec, 0.2 rad/sec, …, 0.7 rad/sec, and 0.8 rad/sec as the corresponding family of initial angular velocities. We tune (adjust) the control parameters of each controller (fuzzy PID controller, fuzzy PD controller and PID controller). Figure 2 visualizes the values of the scaling factors of the fuzzy PID controller treated as functions of the initial angular position and angular velocity of the inverted pendulum; the characteristics are evidently nonlinear. In general, the fuzzy PD and fuzzy PID controllers are the preferred architectures, but the PID controller is also satisfactory in comparison with the fuzzy PID controller within the linear range θ < 0.4, while in the nonlinear range θ > 0.6 the fuzzy PID architecture performs better than both the fuzzy PD and PID controllers.
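Equation (4) can be simulated directly. The following sketch (our own, with an Euler integrator whose step size is an assumption) also computes the ITAE criterion used as the performance index:

```python
import math

# Minimal simulation sketch of the inverted pendulum of Eq. (4), using the
# constants stated in the text; the Euler step size is our own assumption.
G, M_CART, M_POLE, L = 9.8, 1.0, 0.1, 0.5

def theta_ddot(theta, theta_dot, force):
    """Angular acceleration from Eq. (4)."""
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    num = G * sin_t + cos_t * (
        (-force - M_POLE * L * theta_dot ** 2 * sin_t) / (M_CART + M_POLE))
    den = L * (4.0 / 3.0 - M_POLE * cos_t ** 2 / (M_CART + M_POLE))
    return num / den

def step(theta, theta_dot, force, dt=0.01):
    """One Euler integration step of the pole dynamics."""
    acc = theta_ddot(theta, theta_dot, force)
    return theta + dt * theta_dot, theta_dot + dt * acc

def itae(controller, theta0, theta_dot0, dt=0.01, t_end=5.0):
    """Integral of Time multiplied by Absolute Error; reference = upright."""
    theta, theta_dot, score, t = theta0, theta_dot0, 0.0, 0.0
    while t < t_end:
        force = controller(theta, theta_dot)
        theta, theta_dot = step(theta, theta_dot, force, dt)
        t += dt
        score += t * abs(theta) * dt
    return score
```

For instance, a crude proportional-derivative law such as `lambda th, thd: 30 * th + 5 * thd` (coefficients are our own choice) keeps small initial angles upright, whereas zero force lets the pole fall.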
Fig. 2. Auto-tuned scaling factors according to the change of initial angles and angular velocity in the fuzzy PID controller (a) GE and (b) GC
The fuzzy PID and fuzzy PD controllers are superior to the conventional PID controller from the viewpoint of ITAE, overshoot and rising time. Now, we consider the case in which the initial angular positions and angular velocities of the inverted pendulum are selected arbitrarily within the given range. Here the control parameters for the arbitrarily selected initial conditions are not tuned by the GAs; instead, the control parameters of each controller are estimated by the estimation algorithm of the GA-based NFN. We implement the optimal neurofuzzy networks for parameter estimation using GAs, adjusting the learning rates, momentum coefficient, and apexes of the membership functions of the neurofuzzy networks. Table 1 shows the estimated scaling factors of the fuzzy controllers and the control parameters of the PID controller, and reports the performance index (ITAE, overshoot and rising time) for θ = 0.22, 0.45 (rad) and θ̇ = 0.22, 0.78 (rad/sec), respectively.

Table 1. The parameters estimated by means of the GA-based NFN and the performance index (ITAE, overshoot and rising time) of the controllers in the case of θ = 0.22, 0.45 (rad) and θ̇ = 0.22, 0.78 (rad/sec)

Controller | Initial angle | Initial angular velocity | GE | GD | GH | GC | ITAE | Overshoot (%) | Rising time (sec)
FPID | 0.22 | 0.22 | 2.0328 | 61.546 | 237.3 | 3.706 | 0.419 | 0.000 | 0.261
FPID | 0.45 | 0.78 | 1.9079 | 61.082 | 236.4 | 2.411 | 0.855 | 0.000 | 0.167
FPD | 0.22 | 0.22 | 7.4378 | 0.529 | – | 1.854 | 0.149 | 0.000 | 0.129
FPD | 0.45 | 0.78 | 4.1532 | 0.305 | – | 2.865 | 0.728 | 0.102 | 0.149
PID | 0.22 | 0.22 | K = 168.686 | Ti = 164.717 | Td = 0.104 | – | 0.247 | 0.087 | 0.172
PID | 0.45 | 0.78 | K = 168.507 | Ti = 165.861 | Td = 0.103 | – | 0.953 | 0.147 | 0.181
In Table 1, we can see that the fuzzy PD and fuzzy PID controllers control the inverted pendulum system effectively. The proposed estimation algorithm, the GA-based NFN, generates the preferred model architectures. The performance of the fuzzy controllers (the fuzzy PD and fuzzy PID controllers), with their evidently nonlinear characteristics, is superior to that of the PID controller, especially in the nonlinear range θ > 0.45 when using the nonlinear dynamic equation of the inverted pendulum, while in the case of the linear range θ

…in which the component tag is used to mark whether the constraint cxy appears in a disjunction or not:

$$\operatorname{tag}(c_{xy}) = \begin{cases} \Lambda, & \text{if } c_{xy} \text{ does not appear as a disjunct;} \\ c_{uv}, & \text{if } c_{xy} \text{ appears in the disjunction } c_{xy} \vee c_{uv}. \end{cases}$$
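One possible encoding of such tagged constraints (our own illustration, not the paper's data structures) is:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative encoding of an r-STN edge carrying the tag component
# described above: LAMBDA marks a plain constraint, while a tagged
# constraint records its sibling disjunct c_uv.

LAMBDA = None  # stands for the Λ tag

@dataclass
class Constraint:
    """c_xy: t_y - t_x <= weight, optionally part of a disjunction."""
    x: str
    y: str
    weight: float
    tag: Optional[Tuple[str, str]] = LAMBDA   # (u, v) of the sibling disjunct

    def in_disjunction(self) -> bool:
        return self.tag is not LAMBDA

# c1 and c2 form the disjunction c_AB ∨ c_CD: each one tags the other.
c1 = Constraint("A", "B", 3.0, tag=("C", "D"))
c2 = Constraint("C", "D", -2.0, tag=("A", "B"))
c3 = Constraint("B", "C", 1.0)                 # not a disjunct: tag = Λ
```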
LP-TPOP: Integrating Planning and Scheduling Through Constraint Programming

Consistency Check and Threat Resolution in LP-TPOP

LP-TPOP checks the consistency of an r-STN by detecting all the negative cycles in it. When all the disjuncts in a disjunctive constraint are involved in negative cycles, the r-STN is inconsistent. Fig. 2(a) and (b) illustrate two inconsistent r-STNs. Taking Fig. 2(b) as an example, the two disjuncts are involved in the negative cycles DABCD and DBCD respectively, which makes the network inconsistent. When potential threats are detected in a partial plan, LP-TPOP takes measures to resolve them. The basic idea is to separate the two intervals on which two conflicting propositions are asserted. Fig. 2(c) and (d) respectively illustrate a partial plan and the related r-STN for the problem with the actions in Fig. 1, the initial state {p1#[tinit, tp1), p2#[tinit, tp2)}, and the goals {p3#[t1w, t∞), p4#[t2w, t∞)}. In (c) the solid and dotted arrows denote the causal links and the threats respectively. From (d) it can be seen that three disjuncts labeled “X” are eliminated during the planning procedure.
Fig. 2. Two inconsistent r-STNs (a, b) and the exemplified partial plan and related r-STN (c, d)
3.3 Scheduling of Partial Plan
Once a consistent complete plan has been achieved, all the collected temporal constraints are passed to the scheduling module (steps 3.3) and 3.4) of the LP-TPOP algorithm). Table 1 displays the constraints collected along the planning procedure w.r.t. the example of Fig. 2. Once they are passed to the scheduling module, an optimal solution ta = 15, tb = 10 can be calculated.

Table 1. Table of constraint additions in the planning procedure

Action | Constraints
initial state | tinit≤tp1, tinit≤tp2, tinit≤t1w, tinit≤t2w, t1w≤t∞, t2w≤t∞
adding A | ta+5≤tp3, tinit≤ta, ta+5≤t1w, t∞≤tp3, ta+17≤tinit ∨ tp2≤ta+10
adding B | tb+13≤tnp1, tb+2≤tp4, tinit≤tb, tb+2≤t2w, t∞≤tp4, tnp1≤tinit ∨ tp1≤tb+13, ta+17≤tb ∨ tb+15≤ta+10
adding a0->A | tinit≤ta+2, ta+8≤tp1
adding a0->B | tinit≤tb, tb+15≤tp2
supplement at end | tp1=tb+13, tp2=ta+10
Y. Liu and Y. Jiang
3.4 Formal Properties of LP-TPOP
Similar to the definition of a partial plan in POCL planning [7], a partial plan P in LP-TPOP is a tuple ⟨A, r-STN, CL, TDB⟩, where A is the action set with each action attached to a starting point, r-STN maintains the temporal constraints, CL is the set of causal links, and TDB is the set of propositions, each attached to an interval.

Definition 2 (Validity of a partial plan). A partial plan P = ⟨A, r-STN, CL, TDB⟩ is a valid plan for a planning problem if there is no open goal in P and its r-STN is consistent.

Proposition 1 (Soundness and completeness of LP-TPOP). LP-TPOP is sound: whenever it returns a plan, the plan is valid. LP-TPOP is complete, in the sense that whenever there is a valid plan (schedule) for a planning problem P, LP-TPOP will find one.¹
4 Conclusions and Future Work

In this paper, a sound and complete planning algorithm named LP-TPOP, which integrates planning with scheduling through temporal constraint management, is presented. LP-TPOP operates on ground CBI actions in the POCL planning framework. While the schedule may not be globally optimal, it is locally optimal w.r.t. a given plan. Although the planner is still under implementation, it is sound and complete, and has moderate efficiency. There is still much future work to do to further optimize LP-TPOP, e.g., more effective heuristics to tackle the large branching factor, support for more features (e.g., resources, processes, events), and so on.
References

1. Cesta, A., Pecora, F., Rasconi, R.: Biasing the Structure of Scheduling Problems Through Classical Planners. In: Proceedings of WIPIS-04, ICAPS Workshop on “Integrating Planning into Scheduling”, Whistler, British Columbia, Canada (2004) 3-7
2. Fox, M., Long, D.: PDDL2.1: An Extension to PDDL for Expressing Temporal Planning Domains. Journal of Artificial Intelligence Research 20 (2003) 61-124
3. Halsey, K., Long, D., Fox, M.: Isolating where Planning and Scheduling Interact. In: Proceedings of the 22nd UK Planning and Scheduling Special Interest Group (PlanSIG'03) (2003) 104-114
4. Do, M., Kambhampati, S.: SAPA: A Multi-objective Metric Temporal Planner. Journal of Artificial Intelligence Research 20 (2003) 155-194
5. Smith, D., Frank, J., Jónsson, A.: Bridging the Gap Between Planning and Scheduling. Knowledge Engineering Review 15(1) (2000)
6. Penberthy, J.S., Weld, D.S.: Temporal Planning with Continuous Change. In: Proceedings of AAAI-94 (1994) 1010-1015
7. Weld, D.S.: An Introduction to Least Commitment Planning. AI Magazine 15(4) (1994) 27-61
¹ However, we do not present the complete proof in this paper due to space limitations.
Integrating Insurance Services, Trust and Risk Mechanisms into Multi-agent Systems

Yuk-Hei Lam¹, Zili Zhang¹,², and Kok-Leong Ong¹

¹ School of Engineering and Information Technology, Deakin University, Waurn Ponds, Victoria 3217, Australia
{yuk, zzhang, leong}@deakin.edu.au
² Faculty of Computer and Information Science, Southwest University, Chongqing 400715, China
Abstract. In multi-agent systems, there is often the need for an agent to cooperate with others so as to ensure that a given task is achieved in a timely and cost-effective manner. Current multi-agent systems support this through mechanisms such as coalition formation, trust and risk assessments, etc. In this paper, we incorporate the concept of insurance with trust and risk mechanisms in multi-agent systems. The novelty of this proposal is that it ensures continuous sharing of resources while encouraging expected utility to be maximized in a dynamic environment. Our experimental results confirm the feasibility of our approach.
1 Introduction
In multi-agent systems, software agents often need to seek external help when the allocated task cannot be accomplished on their own. Even though agents are mostly self-interested, they tend to cooperate with other agents in order to improve their global and individual performance [1]. However, it is often difficult for agents to discover or search for the required resources¹, e.g., manpower, platform access time, or services such as a particular skill, in such dynamic environments. Currently, agent systems address this problem using mechanisms such as insurance services [2], negotiation [3], and trust and risk assessments [4,5]. In this paper, we integrate our insurance mechanism [2] with the trust and risk mechanisms. With insurance, the insured agents are better guaranteed to have the requested resources at execution time without any extra effort in negotiation and resource discovery. With trust management, agents can evaluate the trustworthiness of potential collaborators through their own past experience and through the acquisition of other agents' reputations provided by the insurance agents. With risk management, agents can make decisions on purchasing insurance according to their own risk attitude.

¹ We define resources as anything that agents are willing to share or exchange with other agents.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 849–853, 2006. c Springer-Verlag Berlin Heidelberg 2006
The remaining sections of this paper are organized as follows. In the next section, we define the concept of insurance and examine the trustworthiness of potential partners. We then discuss how agents reason rationally about demanding and purchasing insurance in Section 3. In Section 4, we present the simulation results, and the conclusions are laid out in Section 5.
2 The Insurance Model
Without loss of generality, we assume a multi-agent system operating in a networked environment, in which self-interested agents A1, …, An are required to complete a set of assigned tasks, T = {T1, …, Tm}, where n m. Moreover, each agent owns a set of resources, R = {R1, …, Rk}, which is used for executing any given task. We further define I1, …, Ip to be the set of insurance agents who provide insurance services in the system. Due to space limitations, this paper concentrates only on the integration of the trust mechanism with the existing insurance model.

2.1 Local Reputation Ratings
Consider the situation where the insurance agent Ii maintains a list of agents, PA = {Aj, …, Ak}, who are capable of providing the requested resource to the insured agent Ai. It is therefore natural for the insured agent Ai to seek someone who is considered the most trustworthy, i.e., whose performance has been consistent over many transactions. In this case, the insured agent Ai can obtain a general local reputation of each potential collaborative partner (i.e., Aj ∈ PA), which leads us to the following:

Definition 1. Let RL(Aj) denote the average local reputation of agent Aj from agent Ai. Thus, RL is an indication of the trust assigned by Ai to Aj and, for a given set of ℓ transactions with Ai, RL(Aj) is given as:

$$R^{L}(A_j) = \frac{1}{\ell}\sum_{r=1}^{\ell} R^{L}(A_j, r) \qquad (1)$$
where RL(Aj, r) ∈ [0, 1) and r denotes the r-th transaction of Ai with Aj. The local reputation of an agent Aj is assigned by an agent Ai and forms the personal view of agent Ai about the average past performance of agent Aj. Agent Ai can then make use of this information to predict the future behaviour of agent Aj. However, if agent Ai has not conducted any transactions with agent Aj before, then agent Ai should only consider the global reputation rating of agent Aj instead.

2.2 Global Reputation Rating
When one lacks information about other entities, it is hard to decide whom to trust. The word-of-mouth method provides a way for entities to gain knowledge or to get advice using the testimonies of third parties. Testimonies from other trusted parties are mainly used in environments where entities have little or no prior knowledge of the potential collaborative partner(s). By using the insurance agents as trusted third parties who propagate the global reputation ratings of agents, the insured agents are guaranteed to obtain the same reputation information about the other agents regardless of the number of friends they have. In order for the insurance agents to take the role of the trusted third parties, they need to gather the local reputation ratings from the insured agents so as to form the average global reputation of agents, RG(·). The global view of the reputation of agent Aj is calculated as

$$R^{G}(A_j) = \frac{1}{\ell}\sum_{n=1}^{\ell} R^{L}(A_j, n) \qquad (2)$$

where n denotes the n-th agent who has submitted a reputation rating of agent Aj to the insurance agent Ii, and ℓ denotes the total number of submissions. In other words, the insurance agent Ii collects the reputation ratings submitted by other agents who have made transactions with agent Aj previously.

2.3 Trust Degree
Once the values of the local and global reputation of each potential collaborative partner are obtained, the insured agent Ai is required to quantify the amount of trust it has in each of them. When agent Ai evaluates the trustworthiness of agent Aj, we have the global reputation rating RG(Aj), a score collected by the insurance agents about Aj. At the same time, every agent has its own score for Aj's reputation based on its own experience, i.e., RL(Aj). Therefore, when an agent decides to initiate a new transaction using insurance, it needs to reconcile its own knowledge (i.e., RL(Aj)) about Aj with the global score (i.e., RG(Aj)) before passing a judgement.

Definition 2. Given agents Ai and Aj, the trust degree TD quantifies the amount of trust Ai has in Aj, based on Aj's local reputation, RL(Aj), held by agent Ai, and the global reputation, RG(Aj), provided by the insurance agent Ii. TD is given as:

TD(Aj) = w1 ∗ RG(Aj) + w2 ∗ RL(Aj)    (3)

where w1 and w2 are weights given by agent Ai with w1 + w2 = 1. The purpose of weighting RG and RL is to obtain the appropriate broad-sense trust degree TD relevant to the context of a given situation. If an agent relies more on its local reputation rating, it can adjust the weights
accordingly (i.e., w2 > w1). In contrast, an agent might rely on the global reputation rating if it has little or even no prior knowledge of the potential collaborative partner (i.e., w1 > w2). Furthermore, if an agent has no past experience at all with the potential collaborative partners, the weight can be set as w1 = 1. To decide which agent is the most trustworthy, the insured agent selects the agent with the highest trust degree, TD(·), among the potential collaborative partners, i.e.,

{p ∈ PA | ∀g ∈ PA, p ≠ g, TD(p) > TD(g)}    (4)

3 Reasoning Insurance Purchase with Trust
We have so far considered the benefits of the insurance concept and trust issues in multi-agent systems. However, agents need not purchase insurance for every transaction; in fact, they must reason about the effectiveness of purchasing insurance in different circumstances. In [2], the evaluation methods are based on the criticality of the task, financial aspects and risk assessment. In this paper, trust issues are also considered to enrich the decision making. If the potential collaborative partner is not trustworthy from the insured agent's perspective, there is no point for the insured agent to commit to the insurance contract in the first place. Therefore, it is crucial for the insured agents to evaluate the trustworthiness of the potential collaborative partners, which gives an indication of how they will perform in the future. Although Equation (3) enables the insured agents to identify the most trustworthy one among the potential collaborative partners, the criteria for being a trustworthy agent differ from agent to agent. As a result, a trust weight threshold Ω(Ai) is set to determine the level of trust accepted by agent Ai. In this case, an agent Ai who wishes to evaluate the trustworthiness of agent Aj tests the following rule:

if TD(Aj) ≥ Ω(Ai), then agent Aj is trustworthy.    (5)
The weighting of Ω(Ai) depends on the personal attitude of the agent. If an agent is risk-neutral, the value of Ω(Ai) will be neutral (i.e., the median of the trust degree, TD). For risk-averse agents, the value of Ω(Ai) will be higher than for risk-neutral agents, as such agents prefer less risk in transactions. Conversely, risk-seeking agents would rather take risks, so a value of Ω(Ai) lower than the average is still acceptable.
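The pipeline of Eqs. (1)-(3) together with rules (4) and (5) can be sketched as follows (our own illustration; the equal weights w1 = w2 = 0.5 are an assumption):

```python
# Illustrative sketch (not the authors' implementation) of the trust
# pipeline: local average (Eq. 1), global average (Eq. 2), weighted trust
# degree (Eq. 3), threshold rule (5) and partner selection (4).

def local_reputation(ratings):
    """Eq. (1): mean of Ai's own ratings of Aj; empty history -> None."""
    return sum(ratings) / len(ratings) if ratings else None

def global_reputation(submitted):
    """Eq. (2): mean of the ratings submitted to the insurance agent."""
    return sum(submitted) / len(submitted)

def trust_degree(r_global, r_local, w1=0.5, w2=0.5):
    """Eq. (3); with no local history the text sets w1 = 1."""
    if r_local is None:
        return r_global
    assert abs(w1 + w2 - 1.0) < 1e-9
    return w1 * r_global + w2 * r_local

def select_partner(candidates, threshold):
    """Rules (4) and (5): highest trust degree, accepted only above Ω(Ai).

    candidates: dict name -> (global_ratings, local_ratings)."""
    best, best_td = None, float("-inf")
    for name, (glob, loc) in candidates.items():
        td = trust_degree(global_reputation(glob), local_reputation(loc))
        if td > best_td:
            best, best_td = name, td
    return best if best_td >= threshold else None
```

For example, with `candidates = {"A_j": ([0.8, 0.9], [0.7]), "A_k": ([0.4], [0.5, 0.6])}`, a risk-neutral threshold of 0.5 accepts A_j, while a risk-averse threshold of 0.9 rejects every candidate.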
4 Empirical Results
Experiments have been carried out to verify the effectiveness and benefits of applying the insurance concept in multi-agent systems. Due to space limitations, we only report a summary of our results here; the full details can be obtained from [6].
From the experimental results, we can conclude that the insurance concept helps to stabilize the performance of agents, enables better utilization of resources, and thus increases the throughput of the whole system. In general, agents with insurance show more stable performance and often reach a higher success rate. Second, the insurance concept works well in all environments, from only 10% to 90% of resources available in the system: agents can still maintain a high success rate with insurance in all cases. Although the reputation evaluation enables agents to maximize the chances of dealing with the most trustworthy agent, there is evidence that it also destroys many of the cooperation opportunities set up by the insurance agents.
5 Conclusions
The concept of insurance with trust and risk mechanisms provides agents an alternative approach to allocating resources and avoiding risky situations. The use of insurance is demonstrated in Appendix A. From our initial simulation results, we have shown the benefits of applying the insurance, trust and risk concepts in multi-agent systems, and thus have evidence to support the feasibility of our proposal.
References

1. Breban, S., Vassileva, J.: A Coalition Formation Mechanism Based on Inter-Agent Trust Relationships. In: International Conference on Autonomous Agents and Multi-agent Systems '02, New York, NY, USA (2002) 306-307
2. Lam, Y., Zhang, Z., Ong, K.: Insurance Services in Multi-agent Systems. In: Proceedings of the 18th Australian Joint Conference on Artificial Intelligence (2005) 664-673
3. Luo, X., Jennings, N.R., Shadbolt, N., Leung, H.-f., Lee, J.H.-m.: A Fuzzy Constraint-Based Knowledge Model for Bilateral, Multi-issue Negotiations in Semi-competitive Environments. Artificial Intelligence 148 (2003) 53-102
4. He, M., Jennings, N.R., Leung, H.: On Agent-Mediated Electronic Commerce. IEEE Transactions on Knowledge and Data Engineering 15 (2003) 985-1003
5. He, M., Leung, H., Jennings, N.R.: A Fuzzy Logic Based Bidding Strategy for Autonomous Agents in Continuous Double Auctions. IEEE Transactions on Knowledge and Data Engineering 15 (2003) 1345-1363
6. Lam, Y., Zhang, Z., Ong, K.: Integrating Insurance Services, Trust and Risk Mechanisms into Multi-agent Systems. Technical Report TRC06/07, School of Engineering and Information Technology, Deakin University, http://www.deakin.edu.au/~yuk/TechReports/TrustInsuranceModel06.pdf (2006)
Cat Swarm Optimization

Shu-Chuan Chu¹, Pei-wei Tsai², and Jeng-Shyang Pan²

¹ Department of Information Management, Cheng Shiu University
² Department of Electronic Engineering, National Kaohsiung University of Applied Sciences
Abstract. In this paper, we present a new swarm-intelligence algorithm, namely Cat Swarm Optimization (CSO). CSO is derived from observing the behaviors of cats and is composed of two sub-models, the seeking mode and the tracing mode, which model these behaviors. Experimental results on six test functions demonstrate that CSO performs much better than Particle Swarm Optimization (PSO).
1 Introduction

In the field of optimization, many algorithms have been proposed in recent years, e.g., the Genetic Algorithm (GA) [1-2], Ant Colony Optimization (ACO) [6-7], Particle Swarm Optimization (PSO) [3-5], and Simulated Annealing (SA) [8-9]. Some of these optimization algorithms were developed based on swarm intelligence. Cat Swarm Optimization (CSO), the algorithm proposed in this paper, is motivated by PSO [3] and ACO [6]. According to the literature, PSO with a weighting factor [4] usually finds a good solution faster than the pure PSO, but, according to our experimental results, CSO presents even better performance. By observing the behavior of creatures, we may get ideas for solving optimization problems: studying the behavior of ants led to ACO, and examining the movements of flocking gulls led to PSO. By inspecting the behavior of cats, we present the Cat Swarm Optimization (CSO) algorithm.
2 Behaviors of Cats

According to the biological classification, there are about thirty-two different species in the cat family, e.g., the lion, tiger, leopard, domestic cat, etc. Though they live in different environments, many behaviors are shared by most felines. Although the hunting skill is not innate in felines, it can be acquired through training. For wild felines, the hunting skill ensures the survival of their species; for indoor cats, it shows up as a natural instinct of strong curiosity about any moving object. Though all cats have this strong curiosity, they are inactive most of the time. If you spend some time observing cats, you will easily find that they spend most of their waking time resting.

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 854 – 858, 2006. © Springer-Verlag Berlin Heidelberg 2006
The alertness of cats is very high; they stay alert even while resting. Thus you may find a cat that looks lazy, lying somewhere but with its eyes wide open, looking around: at that moment, it is observing its environment. Cats may seem lazy, but they are actually smart and deliberate. Of course, a careful examination of the behaviors of cats would reveal much more than the two remarkable properties discussed above.
3 Proposed Algorithm

In the proposed Cat Swarm Optimization, we first model the two major behaviors of cats as two sub-models, namely the seeking mode and the tracing mode. By mingling these two modes in a user-defined proportion, CSO achieves good performance.

3.1 The Solution Set in the Model: the Cat

Whatever the optimization algorithm, the solution set must be represented in some way. For example, GA uses chromosomes to represent the solution sets; ACO uses ants as agents, with the paths made by the ants depicting the solution sets; PSO uses the positions of particles to delineate the solution sets. In our proposed algorithm, we use cats and the model of their behaviors to solve optimization problems, i.e., cats portray the solution sets. In CSO, we first decide how many cats to use, and then apply them to solve the problem. Every cat has its own position composed of M dimensions, a velocity for each dimension, a fitness value representing how well the cat fits the fitness function, and a flag identifying whether the cat is in seeking mode or tracing mode. The final solution is the best position found by any of the cats, since CSO keeps the best solution until the end of the iterations.

3.2 Seeking Mode

This sub-model models the situation of a cat that is resting, looking around and seeking the next position to move to. In seeking mode, we define four essential factors: the seeking memory pool (SMP), the seeking range of the selected dimension (SRD), the counts of dimensions to change (CDC), and self-position considering (SPC). SMP defines the size of the seeking memory of each cat, which indicates the candidate points sought by the cat; the cat picks a point from this memory pool according to the rules described later. SRD declares the mutative ratio for the selected dimensions.
In seeking mode, if a dimension is selected to mutate, the difference between the new value and the old one will not exceed the range defined by SRD. CDC discloses how many dimensions will be varied. These factors all play important roles in the seeking mode. SPC is a Boolean variable that decides whether the point where the cat is already standing will be one of the candidates to move to. Whether the value of SPC
is true or false, the value of SMP is not influenced. The seeking mode can be described in five steps as follows:

Step 1: Make j copies of the present position of catk, where j = SMP. If the value of SPC is true, let j = (SMP − 1) and retain the present position as one of the candidates.
Step 2: For each copy, according to CDC, randomly add or subtract SRD percent of the present values, replacing the old ones.
Step 3: Calculate the fitness values (FS) of all candidate points.
Step 4: If the FS values are not all exactly equal, calculate the selecting probability of each candidate point by equation (1); otherwise set the selecting probability of every candidate point to 1.
Step 5: Randomly pick the point to move to from the candidate points, and replace the position of catk.
$$P_i = \frac{\lvert FS_i - FS_b \rvert}{FS_{\max} - FS_{\min}}, \quad \text{where } 0 < i < j \qquad (1)$$

If the goal of the fitness function is to find the minimum solution, FSb = FSmax; otherwise FSb = FSmin.

3.3 Tracing Mode

Tracing mode is the sub-model for the case of a cat tracing some target. Once a cat goes into tracing mode, it moves according to its own velocity in every dimension. The action of tracing mode can be described in three steps as follows:

Step 1: Update the velocity of every dimension (vk,d) according to equation (2).
Step 2: Check whether the velocities are within the range of the maximum velocity. If a new velocity is out of range, set it equal to the limit.
Step 3: Update the position of catk according to equation (3).
vk,d = vk,d + r1 × c1 × (xbest,d − xk,d), where d = 1, 2, …, M  (2)
xbest,d is the position of the cat with the best fitness value; xk,d is the position of catk; c1 is a constant and r1 is a random value in the range [0, 1].
xk,d = xk,d + vk,d  (3)
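The tracing-mode update of equations (2) and (3) can be sketched as follows. This is an illustrative implementation, not the authors' code; the function name and the default values c1 = 2.0 and v_max = 1.0 are our assumptions.

```python
import random

def trace(position, velocity, best_position, c1=2.0, v_max=1.0):
    """One tracing-mode move (equations (2) and (3)): steer the cat
    toward the best position found so far, clamping each velocity
    component to the maximum velocity (illustrative values assumed)."""
    for d in range(len(position)):
        r1 = random.random()                                      # r1 in [0, 1]
        velocity[d] += r1 * c1 * (best_position[d] - position[d])  # eq. (2)
        velocity[d] = max(-v_max, min(v_max, velocity[d]))         # clamp to limit
        position[d] += velocity[d]                                 # eq. (3)
    return position, velocity
```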
3.4 Cat Swarm Optimization
As described in the subsections above, CSO includes two sub-models, the seeking mode and the tracing mode. To combine the two modes in the algorithm, we define a mixture ratio (MR) of joining seeking mode with tracing mode. Observing the behaviors of cats, we notice that cats spend most of their waking time resting. While resting, they move their position carefully and slowly, sometimes even staying in the original position. To apply this behavior in CSO, we use seeking mode to represent it.
The cat's behavior of chasing targets is applied in tracing mode. Therefore, MR should be a tiny value in order to guarantee that the cats spend most of their time in seeking mode, just as in the real world. The process of CSO can be described in 6 steps as follows:
Step 1: Create N cats.
Step 2: Randomly sprinkle the cats into the M-dimensional solution space and randomly assign each cat velocities within the range of the maximum velocity. Then randomly pick a number of cats according to MR and set them into tracing mode; set the others into seeking mode.
Step 3: Evaluate the fitness value of each cat by applying the positions of the cats to the fitness function, which represents the criterion of our goal, and keep the best cat in memory. Note that we only need to remember the position of the best cat (xbest), since it represents the best solution so far.
Step 4: Move the cats according to their flags: if catk is in seeking mode, apply the seeking mode process to it; otherwise apply the tracing mode process. The process steps are presented above.
Step 5: Re-pick a number of cats according to MR and set them into tracing mode, then set the other cats into seeking mode.
Step 6: Check the termination condition; if satisfied, terminate the program, otherwise repeat Steps 3 to 5.
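The six steps above can be sketched in Python. This is a minimal, illustrative implementation, not the authors' code: the parameter values (SMP, SRD, CDC, MR, c1, Vmax) and the search bounds are assumptions for the example, and minimization of a simple sphere function stands in for the test functions of Section 4.

```python
import random

def cso_minimize(f, dim, n_cats=10, iters=200, mr=0.2, smp=5, srd=0.2,
                 cdc=None, spc=True, c1=2.0, v_max=1.0, lo=-5.0, hi=5.0):
    """Minimal Cat Swarm Optimization sketch (minimization)."""
    cdc = cdc or max(1, dim // 2)
    cats = [{'x': [random.uniform(lo, hi) for _ in range(dim)],
             'v': [random.uniform(-v_max, v_max) for _ in range(dim)]}
            for _ in range(n_cats)]
    best = min((c['x'][:] for c in cats), key=f)

    def seek(x):
        # Steps 1-2 of seeking mode: make j copies, mutate CDC dimensions
        # by +/- SRD percent of their present values.
        j = smp - 1 if spc else smp
        cands = [x[:]] if spc else []
        for _ in range(j):
            y = x[:]
            for d in random.sample(range(dim), cdc):
                y[d] *= 1 + random.choice((-1, 1)) * srd
            cands.append(y)
        # Steps 3-5: pick a candidate with probability from equation (1);
        # for minimization FS_b = FS_max, so lower fitness gets higher P_i.
        fs = [f(c) for c in cands]
        fmax, fmin = max(fs), min(fs)
        if fmax == fmin:
            probs = [1.0] * len(cands)
        else:
            probs = [abs(v - fmax) / (fmax - fmin) for v in fs]
        return random.choices(cands, weights=[p + 1e-12 for p in probs])[0]

    for _ in range(iters):
        # Step 5: re-pick which cats trace according to the mixture ratio MR.
        tracing = set(random.sample(range(n_cats), max(1, int(mr * n_cats))))
        for i, cat in enumerate(cats):
            if i in tracing:
                # Tracing mode: equations (2) and (3) with velocity clamping.
                for d in range(dim):
                    cat['v'][d] += random.random() * c1 * (best[d] - cat['x'][d])
                    cat['v'][d] = max(-v_max, min(v_max, cat['v'][d]))
                    cat['x'][d] += cat['v'][d]
            else:
                cat['x'] = seek(cat['x'])
            if f(cat['x']) < f(best):     # keep the best solution so far
                best = cat['x'][:]
    return best
```

A small MR (e.g. 0.2 here) keeps most cats in seeking mode each iteration, mirroring the observation that real cats rest most of the time.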
4 Experimental Results
We applied CSO, PSO and PSO with a weighting factor to six test functions to compare their performance. All the experiments demonstrate that the proposed Cat Swarm Optimization (CSO) is superior to PSO and PSO with a weighting factor. Due to the space limit of this paper, only the experimental results for test function one are shown in Fig. 1.

Fig. 1. The experimental result of test function 1 (Rosenbrock function): fitness value (log axis) versus iteration for CSO, PSO with WF, and PSO
Heuristic Information Based Improved Fuzzy Discrete PSO Method for Solving TSP Bin Shen, Min Yao, and Wensheng Yi College of Computer Science, Zhejiang University, Hangzhou, 310027, P.R. China
[email protected],
[email protected]
Abstract. In this paper, we propose an improved fuzzy discrete Particle Swarm Optimization method (IFD-PSO) and apply it to TSP. We use a fuzzy matrix space to represent the corresponding TSP solution, and put forward the transformation method for the fuzzy matrix space. Heuristic information is employed to improve the convergence speed. The experimental results show that IFD-PSO performs better and achieves a satisfactory effect.
1 Introduction
Recently, there have been several works on solving the Traveling Salesman Problem (TSP) using the Particle Swarm Optimization (PSO) algorithm [1,2,3,4]. References [2], [3] and [4] proposed discrete PSO methods based on the swap operation. The disadvantage of these methods is that they get trapped in local optima more easily than basic PSO. Wei Pang et al. [1] apply a fuzzy discrete PSO (FD-PSO) to solve TSP and achieve a satisfactory effect. In order to further improve the performance of solving TSP with PSO, we propose an improved fuzzy discrete PSO algorithm (IFD-PSO).
2 Fuzzy Matrix Representing TSP Solution
2.1 Construction of Fuzzy Matrix
Assume the solution of TSP is S = {s1, s2, …, sn} = {(s1, s2), (s2, s3), …, (sn−1, sn), (sn, s1)}, where n is the number of cities, si (i = 1, …, n) is the i-th visited city node in this solution, and (s1, s2), (s2, s3), …, (sn−1, sn), (sn, s1) are the visited directed edges in turn. Then the fuzzy matrix R can be represented as R = (rij)n,n, where rij ∈ [0,1] (i, j = 1, …, n) means the possibility of choosing directed edge (i, j) after city i has been chosen in the TSP solution. In order to avoid a directed edge (i, i) (i = 1, …, n) appearing in the TSP solution, we need to make the diagonal elements of the fuzzy matrix small enough, so we let rii = −Max (i = 1, …, n). Thus the fuzzy matrix representing a TSP solution can be constructed.
2.2 Transforming Fuzzy Matrix into TSP Solution
When an n × n fuzzy matrix R is obtained, it needs to be defuzzified to obtain the corresponding TSP solution. Firstly, R is defuzzified according to the global maximum
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 859 – 863, 2006. © Springer-Verlag Berlin Heidelberg 2006
method and the first row maximum method, respectively. Then we compare the fitness of the two solutions and choose the better one as the final solution. The global maximum method and the first row maximum method are listed below, where a flag array of n bits is set to record whether the corresponding column has been selected or not:
Method 1. Global maximum method
Step 1. Initialize the flag array as all unselected.
Step 2. Choose the maximum element in fuzzy matrix R, say rs1s2, which represents that directed edge (s1, s2) has been selected. We mark the flags of column s1 and column s2 in the flag array as "selected", to avoid returning to visited cities before all of the cities have been visited.
Step 3-1. Choose the maximum element among the unselected columns in row s2 of fuzzy matrix R, and let it be rs2s3. Thus directed edge (s2, s3) is selected as the subsequent edge of (s1, s2). Similarly, mark column s3 as "selected".
Step 3-2. Repeat Step 3-1 to obtain directed edges (s3, s4), …, (sn−1, sn) in turn. In this way, the TSP solution visits every city once and only once.
Step 4. Complete the TSP solution with directed edge (sn, s1), which means the traveler returns to the start from the last visited city. Thus we obtain the solution (s1, s2), (s2, s3), …, (sn−1, sn), (sn, s1).
Method 2. First row maximum method
We only need to change Step 2 of Method 1, replacing "in fuzzy matrix R" with "in the first row of fuzzy matrix R". Thus Method 2 is obtained.
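Method 1 can be sketched as follows. This is an illustrative implementation with our own names, assuming the fuzzy matrix is given as a list of lists with very small diagonal entries (−Max).

```python
def defuzzify_global_max(R):
    """Global maximum method: turn a fuzzy matrix R into a TSP tour.
    Diagonal entries are assumed to hold a very small value (-Max)."""
    n = len(R)
    selected = [False] * n                       # flag array, one bit per column
    # Step 2: pick the largest off-diagonal element r[s1][s2] of the matrix.
    s1, s2 = max(((i, j) for i in range(n) for j in range(n) if i != j),
                 key=lambda ij: R[ij[0]][ij[1]])
    selected[s1] = selected[s2] = True
    tour = [s1, s2]
    # Step 3: repeatedly extend from the last city using the row maximum
    # among unselected columns, so every city is visited exactly once.
    while len(tour) < n:
        cur = tour[-1]
        nxt = max((j for j in range(n) if not selected[j]),
                  key=lambda j: R[cur][j])
        selected[nxt] = True
        tour.append(nxt)
    return tour                                  # edge (s_n, s_1) closes the tour
```

Method 2 differs only in Step 2: the starting edge is chosen from the first row of R instead of the whole matrix.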
3 Heuristic Information Based Improved Fuzzy Discrete PSO
3.1 Symbol Definitions
Definition 1. A fuzzy matrix position (FM-position for short) X = (xij)n,n, whose diagonal elements are −Max, represents a corresponding solution in the TSP solution space.
Definition 2. A fuzzy matrix velocity (FM-velocity for short) V = (vij)n,n, whose diagonal elements are 0, represents the change of an FM-position. It keeps the diagonal elements of fuzzy matrix X equal to −Max throughout the iterative operations.
Definition 3. The addition and subtraction of basic PSO are redefined as addition and subtraction between matrices, denoted ⊕ and Θ; multiplication is redefined as every element of R multiplied by α, denoted α ⊗ R.
3.2 Initialization Using Heuristic Information
In TSP, we know the following heuristic information: the smaller the distance (cost) between two cities, the higher the possibility of the directed edge between these two cities being selected. Let the distance matrix between cities be D = (dij)n,n, where

dij = sqrt((xi − xj)² + (yi − yj)²), i, j = 1, …, n,

and xi, yi are the X-coordinate and Y-coordinate of city i.
It is obvious that dii = 0, i = 1, …, n. Because every element of an FM-position denotes the possibility of the corresponding edge being selected, the FM-position X is initialized as follows. First, set the diagonal elements to −Max, then set every other element xij (i, j = 1, …, n; i ≠ j) to 1/dij. Thus particles can quickly converge to the local minimum and try to find a better one. To keep the diversity of particles, we set only one particle at the local minimum. Every element except the diagonal elements, vij (i, j = 1, …, n; i ≠ j), is initialized subject to −Vmax ≤ vij ≤ Vmax, where Vmax is the control parameter.
3.3 Constraints of Standardization
In order to keep the comparability of FM-velocity and FM-position during iterative operations, a standard FM-position X = (xij)n,n should satisfy the following conditions:

∑_{j=1, j≠i}^{n} xij = 1, (i = 1, …, n)  (1)
xij ∈ [0, 1], (i, j = 1, …, n; i ≠ j)  (2)
xii = −Max, (i = 1, …, n)  (3)

A standard FM-velocity V = (vij)n,n should satisfy the following conditions:

∑_{j=1, j≠i}^{n} vij = 0, (i = 1, …, n)  (4)
−Vmax ≤ vij ≤ Vmax, (i, j = 1, …, n; i ≠ j)  (5)
vii = 0, (i = 1, …, n)  (6)
We can prove that once the FM-position satisfies condition (3) and the FM-velocity satisfies condition (6), respectively, we need not adjust them again to satisfy conditions (3) and (6) during the iterative operations. The proof is similar to the proof in FD-PSO's initialization process [1]; we omit it for lack of space. We can also prove that once the FM-position satisfies condition (1) and the FM-velocity satisfies condition (4), respectively, we need not adjust them again to satisfy conditions (1) and (4), unless condition (2) or condition (5) is violated. The proof is omitted here.
3.4 Standardization of FM-Position and FM-Velocity
After the initialization of the FM-position, or after several iterative operations, condition (2) may be violated, so it is necessary to standardize the FM-position again. The standardization method of the FM-position is as follows:
Step 1. Check whether there exist off-diagonal elements less than 0. If so, set them to 0.
Step 2. Set the diagonal elements of the FM-position to −Max, and transform every other element xjk (j, k = 1, …, n; j ≠ k) into xjk / ∑_{i=1, i≠j}^{n} xji. The transformed FM-position then satisfies conditions (1), (2) and (3).
After the initialization of the FM-velocity, or after several iterative operations, condition (5) may be violated, so it is necessary to standardize the FM-velocity again. The standardization method of the FM-velocity is as follows:
Step 1. Check whether there exist elements violating condition (5). If so, set each element less than −Vmax to −Vmax, and each element greater than Vmax to Vmax. Therefore we have −Vmax ≤ vij ≤ Vmax.
Step 2. Let the diagonal elements hold their original value, and transform every other element vjk (j, k = 1, …, n; j ≠ k) into vjk/2 − (1/(n−1)) ∑_{i=1, i≠j}^{n} (vji/2).
Obviously, after the above transformation, the FM-velocity satisfies conditions (4) and (6). We can also prove that it satisfies condition (5). The proof is omitted here.
3.5 Description of the Algorithm
Step 1. Set the population size to Num and the maximum number of generations to MaxNum. Initialize and standardize the FM-positions and FM-velocities. Then set the local best pbesti = X0, and set the global best gbest as the best position among the pbesti.
Step 2. If the current iteration number equals MaxNum, go to Step 5.
Step 3. Calculate the new FM-velocity and FM-position for each particle. Compute the new velocity; if it violates condition (5), standardize it using our proposed method. Calculate the new position; if it violates constraint (2), standardize it using our proposed method. Obtain the fitness of the new position. If this fitness is better than that of the particle's local best, update the local best position with the new position.
Step 4. If there exist particles whose local best fitness is better than that of the global best, update the global best position with the best of the local best positions. Then go to Step 2.
Step 5. Output the global best position, i.e., the solution, and its fitness value.
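The FM-velocity standardization can be sketched as follows (an illustrative implementation; names are ours). Halving both terms in Step 2 is what guarantees that the clamped values cannot leave [−Vmax, Vmax], so conditions (4)-(6) all hold afterwards.

```python
def standardize_velocity(V, v_max):
    """Standardize an FM-velocity matrix so that conditions (4)-(6) hold:
    every off-diagonal row sums to 0, all entries lie in [-v_max, v_max],
    and the diagonal stays 0."""
    n = len(V)
    # Step 1: clamp off-diagonal entries into [-v_max, v_max] (condition (5)).
    W = [[0.0 if j == i else max(-v_max, min(v_max, V[i][j]))
          for j in range(n)] for i in range(n)]
    # Step 2: v_jk/2 - (1/(n-1)) * sum_{i != j} v_ji/2.  The halving keeps
    # |result| <= v_max/2 + v_max/2 = v_max, so condition (5) still holds,
    # while the subtraction zeroes each off-diagonal row sum (condition (4)).
    out = [[0.0] * n for _ in range(n)]          # diagonal stays 0: condition (6)
    for j in range(n):
        s = sum(W[j][i] for i in range(n) if i != j)
        for k in range(n):
            if k != j:
                out[j][k] = W[j][k] / 2 - s / (2 * (n - 1))
    return out
```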
4 Experiment Results
We test IFD-PSO using Burma14 from TSPLIB, and compare the experimental results with DPSO [4] and FD-PSO [1]. The experiments were run on a PC (AMD Sempron 2400+, 512 MB RAM, WinXP, MATLAB 7.0). We set the population size to 100 and the maximum number of generations to 1000, and ran each algorithm 20 times. Let the fitness of run i at generation t be Ei(t) (i = 1, …, 20; t = 1, …, 1000); we plot the mean fitness E(t) = (1/20) ∑_{i=1}^{20} Ei(t) in Fig. 1 to describe the convergence speed. The inertia weight w differs among the algorithms: in IFD-PSO, a linearly decreasing inertia weight is used, starting at 1 and ending at 0; in FD-PSO, w is fixed at 1; in DPSO, w is 0.9 all along. From Fig. 1 and Table 1, it can be seen that IFD-PSO has the best convergence speed, with FD-PSO in second place. Considering robustness and effectiveness under various criteria, IFD-PSO has the best performance, and DPSO the worst.
Table 1. Comparisons of IFD-PSO, FD-PSO and DPSO for Burma14

                                                     IFD-PSO   FD-PSO   DPSO
Times of converging to the global optimum value /
total running times                                   35%       5%       0%
Best fitness value                                    30.879    30.879   31.807
Mean fitness value                                    31.449    33.071   36.119
Worst fitness value                                   33.437    35.318   40.87
Iteration number of the best run                      32        461      1000
Fig. 1. The convergence curves of IFD-PSO, FD-PSO and DPSO
5 Conclusions
In this paper, we propose a PSO method, IFD-PSO, and apply it to TSP. We use a fuzzy matrix space to represent the corresponding TSP solution, and put forward the transformation method. Heuristic information is employed to improve the convergence speed. The experimental results show that IFD-PSO converges to the optimal value more quickly than the current fuzzy discrete PSO and performs better. This method can also be utilized in various applications that can be transformed into TSP.
References
1. Wei Pang, Kang-ping Wang, Chun-guang Zhou, Long-jiang Dong: Fuzzy discrete particle swarm optimization for solving traveling salesman problem. Proceedings of the Fourth International Conference on Computer and Information Technology (2004) 796-800
2. Wei Pang, Kang-ping Wang, et al.: Modified particle swarm optimization based on space transformation for solving traveling salesman problem. Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai (2004) 2342-2346
3. Maurice Clerc: Discrete Particle Swarm Optimization. http://clerc.maurice.free.fr/pso/pso_tsp/Discrete_PSO_TSP.htm
4. Kang-Ping Wang, et al.: Particle Swarm Optimization for Traveling Salesman Problem. The Second International Conference on Machine Learning and Cybernetics (2003)
A Network Event Correlation Algorithm Based on Fault Filtration* Qiuhua Zheng1,2, Yuntao Qian1,2, and Min Yao1 1
Computational Intelligence Research Laboratory, College of Computer Science Zhejiang University, China 2 State key Laboratory of Information Security, Institute of Software of Chinese Academy of Sciences, Beijing, China
[email protected],
[email protected]
Abstract. This paper proposes a new event correlation technique to enhance the heuristic of the incremental hypothesis updating (IHU) algorithm. The approach estimates the likelihood of each fault in the fault set and removes the faults with lower likelihood. With this approach we can also determine whether an event is spurious. Simulation shows that the approach achieves high accuracy and fast correlation even when the network has high event loss and spuriousness.
1 Introduction
Event correlation, a central aspect of network fault diagnosis, is the process of analyzing received alarms to isolate the possible root causes responsible for the occurrence of the network's symptoms. Since failures are unavoidable in large and complex networks, effective event correlation can make network systems more robust and their operation more reliable, ultimately increasing the confidence level in the services they provide. Network event correlation techniques have become a focus of research activity [1-7]. In previous work, we proposed a correlation technique, IHUCB, that integrates the IHU algorithm with the codebook approach [8]. This approach utilizes the codebook technique to encode the network's fault-symptom model, uses the IHU algorithm to create the fault hypothesis set, and then calculates the likelihoods of these fault hypotheses through the codebook approach. The algorithm can correlate multiple-fault cases even when its codebook includes only the codes of single faults, and it remains efficient and robust to event loss and spuriousness. However, the IHUCB algorithm considers neither the differences among the likelihoods of faults nor spurious events specially; it treats spurious events and real events in the same way. Many fault hypotheses with low probability are not removed during the hypothesis updating phase until the final measurement phase. This increases the event correlation time because of the over-large fault hypothesis set, and results in a high false positive rate, because some faults in a hypothesis exist only to explain received spurious events. In addition, since all events are treated as real
This research is supported partly by Science and Technology Project of Zhejiang (2006C21001).
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 864 – 869, 2006. © Springer-Verlag Berlin Heidelberg 2006
events, sometimes the result of the algorithm is not the fault set with the smallest distance to the real events. To avoid these disadvantages, we introduce a new heuristic approach that estimates the likelihoods of faults and removes the faults with lower likelihood. The rest of this paper is organized as follows. In Section 2, we describe a formal model of event correlation. In Section 3, we propose the new event correlation algorithm. In Section 4, we describe the simulation results evaluating the proposed technique. Finally, we conclude our work and present directions for future research in Section 5.
2 The Event Correlation Model
A fault-symptom map can be represented by a bipartite directed graph that encodes the direct causal relationships between faults and the set of symptoms observed when each fault occurs. This map can also be represented by a dependence matrix, where dij = 1 means that if fault fj occurs it will lead to symptom si, and dij = 0 means that fault fj is independent of symptom si, i.e., fault fj occurs without causing symptom si. An illustrative example is shown in Fig. 1.

     s1  s2  s3  s4  s5  s6
f1    1   0   0   1   0   0
f2    1   1   1   0   1   0
f3    0   1   0   0   1   0
f4    0   0   0   1   0   1

Fig. 1. A fault-symptom map and the dependence matrix
When faults F occur in the network system, they cause the corresponding symptoms to appear, and the monitoring system receives a number of alarms. This procedure is called fault propagation. Conversely, fault diagnosis is the reverse procedure of fault propagation. It aims to analyze the observed events E according to the dependence matrix D in order to find the set of possible fault hypotheses FH such that the likelihood of the events E occurring is maximal. In the fault propagation procedure, the symptoms occurring in the system are given by the equation S = D * F. In general, event correlation can be divided into two phases: first, the hypothesis creation phase, whose task is to create the fault hypothesis sets that can explain the received events; and second, the belief measurement phase, whose task is to score these fault hypothesis sets and choose the best fault hypothesis as the result.
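The propagation equation S = D * F is a Boolean matrix-vector product, which can be illustrated with the dependence matrix of Fig. 1. The sketch below is ours, with rows indexed by faults as the figure draws them.

```python
# Dependence matrix of Fig. 1: row j = fault f_{j+1}, column i = symptom s_{i+1}.
D = [[1, 0, 0, 1, 0, 0],   # f1 causes s1, s4
     [1, 1, 1, 0, 1, 0],   # f2 causes s1, s2, s3, s5
     [0, 1, 0, 0, 1, 0],   # f3 causes s2, s5
     [0, 0, 0, 1, 0, 1]]   # f4 causes s4, s6

def propagate(D, F):
    """Boolean product S = D * F: symptom s_i occurs if some occurring
    fault f_j has a 1 in its row at column i."""
    n_sym = len(D[0])
    return [int(any(F[j] and D[j][i] for j in range(len(D))))
            for i in range(n_sym)]
```

For example, if f1 and f4 occur together, the union of their rows yields the observed symptoms s1, s4 and s6.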
3 The Event Correlation Algorithm Based on Fault Filtration
In the IHU technique [4], when an event ei is received, the algorithm creates a set of fault hypotheses FHSi by updating FHSi−1 with an explanation of the event ei. Hypothesis hk can explain event ei ∈ EO if hk includes at least one fault that can lead to the occurrence of ei. In the worst case, there may be 2^|Ψ| fault hypotheses that can be responsible for the fault events EO, so the fault hypothesis set does not contain all subsets that can explain the events EO. A greedy algorithm results in fast growth of the fault hypothesis set's size, making the computational complexity of the event correlation algorithm unacceptable. The literature [4] proposed a heuristic approach that uses a function u(fl) to determine whether fault fl can be added to hypothesis hk ∈ FHSi−1. Fault fl ∈ Fei can be appended to hk ∈ FHSi−1 only if the size of hk, |hk|, is smaller than u(fl), where u(fl) is defined as the minimal size of a hypothesis in FHSi−1 that contains fault fl. In the IHUCB algorithm [8], we introduced a constraint on the maximal fault number to enhance this heuristic, combined with the weighted Hamming distance to measure the result. The results of the IHUCB algorithm show that the heuristic can reduce the number of hypotheses to a great extent. However, just as mentioned in Section 1, the IHUCB algorithm considers neither the differences among the likelihoods of faults nor spurious events specially. Therefore, we propose a novel approach that incorporates the likelihood of faults to shrink the hypothesis set and to judge whether an event is spurious.
3.1 The Heuristic Approach for Fault Filtration
To avoid the disadvantages of IHUCB, we propose a heuristic approach that judges whether an event is spurious and, for every received event, creates the fault set that can explain it. It is described in detail as follows:
(1) First, the approach builds a fault set Fei composed of all faults that can lead to the occurrence of event ei.
(2) Then, for each fault in Fei, find the event vector corresponding to that fault in the codebook, and compare this vector with the received event vector Erec to calculate the number of lost events. When the number of lost events is more than a predicted threshold value T (Section 3.2 explains how to compute the threshold), we consider that event ei is not caused by this fault and remove the fault from Fei. Repeat this step until all faults in Fei have been processed.
(3) If Fei is empty, we judge event ei to be spurious and remove it from Erec. Otherwise, Fei is returned as the fault set that can explain event ei.
3.2 How to Compute the Threshold Value
There are two kinds of methods to compute the threshold value. (1) The static threshold method chooses a fixed threshold value all the time. (2) The dynamic threshold method calculates the threshold value dynamically according to the number of events in the codebook and the average event loss ratio. In this paper, we design a method that makes the probability of the number of lost events being greater than or equal to the threshold T less than a value α, i.e.,

P(event loss number < T) ≥ 1 − α, equivalently P(event loss number ≥ T) ≤ α  (1)
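The filtration steps of Section 3.1 and the dynamic threshold of equation (1) can be sketched together as follows. This is an illustrative rendering with our own names; in particular, modeling the loss count as Binomial(m, p), with m the number of symptom events of a code and p the average loss ratio, is our assumption — the paper only states that the dynamic method uses the codebook's event count and the average loss ratio.

```python
from math import comb

def dynamic_threshold(m, p, alpha):
    """Smallest T with P(X >= T) <= alpha for X ~ Binomial(m, p), i.e.
    the chance of losing T or more of a fault's m symptom events stays
    below alpha (binomial model assumed)."""
    def tail(t):  # P(X >= t)
        return sum(comb(m, k) * p**k * (1 - p)**(m - k)
                   for k in range(t, m + 1))
    for t in range(m + 2):   # tail(m + 1) == 0, so the loop always returns
        if tail(t) <= alpha:
            return t

def filter_faults(event, codebook, received, threshold):
    """Heuristic fault filtration for one received event (steps 1-3).
    codebook maps each fault to its 0/1 event vector; `received` is the
    observed event vector E_rec.  Returns (is_spurious, fault set)."""
    # (1) All faults whose codebook vector contains this event.
    candidates = {f for f, vec in codebook.items() if vec[event] == 1}
    # (2) Drop a fault when explaining the observation by that fault
    #     would require more lost events than the threshold allows.
    kept = set()
    for f in candidates:
        lost = sum(1 for i, bit in enumerate(codebook[f])
                   if bit == 1 and received[i] == 0)
        if lost <= threshold:
            kept.add(f)
    # (3) An empty set means the event is judged spurious.
    return len(kept) == 0, kept
```

For instance, with the Fig. 1 codebook and observed events {s1, s4}, event s1 is explained by f1 alone once f2 is filtered out for implying too many lost events.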
4 Simulation Study
In this section, we describe the simulation study performed to evaluate the technique presented in this paper. In our simulation, we use LR to represent the event loss ratio, SR the event spuriousness ratio, CB the codebook, and FP the distribution of fault occurrence. Since it is impossible to exhaust all codebooks of a given size, we only test a limited number of codebooks of each size. Given the parameters LRl, LRh, SRl, SRh, the fault number FN, and the codebook's symptom number SN, we design K simulation cases as follows: randomly create the codebook CBi (1 ≤ i ≤ K); randomly generate the prior fault probability distribution FPi, which is uniformly distributed and sums to 1; randomly generate LRij (1 ≤ j ≤ SN) for every event in the codebook according to the uniform distribution [LRl, LRh]; and randomly generate SRij (1 ≤ j ≤ SN) for every event according to the uniform distribution [SRl, SRh].
For the i-th simulation case (1 ≤ i ≤ K), we create M simulation scenarios as follows. Generate the fault number distribution FNP according to FP and FN; randomly generate the fault sets Fik (1 ≤ k ≤ M) according to the fault distribution FPi and the fault number distribution FNP; generate the codes CFi of Fik (1 ≤ k ≤ M) according to the codebook CBi; randomly generate the lost events with LRi and the spurious events with SRi, then generate the observed events Ei by adding the noisy events to CFi; correlate the observed events Eik (1 ≤ k ≤ M) with the algorithms IHUCB and IHUCBFPF and obtain the correlation results; and calculate the detection rate DRik (1 ≤ k ≤ M) and the false positive rate FPRik (1 ≤ k ≤ M) with the following equations:

DRik = |FiDk ∩ FiCk| / |FiCk|,  FPRik = |FiDk \ FiCk| / |FiDk|  (2)

where FiDk denotes the detected fault set and FiCk the actual fault set of scenario k. For the i-th simulation case we calculate the mean detection rate DRi = (1/M) ∑_{k=1}^{M} DRik and the mean false positive rate FPRi = (1/M) ∑_{k=1}^{M} FPRik. Then we calculate the expected values of the detection rate and the false positive rate, denoted by
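The two metrics of equation (2) can be computed on fault sets as follows; this is an illustrative sketch with our own names, following the standard definitions (detection rate normalized by the actual fault set, false positive rate by the detected set) used in [4].

```python
def detection_rate(detected, actual):
    """DR = |F_D ∩ F_C| / |F_C|: fraction of actual faults that were found."""
    return len(detected & actual) / len(actual)

def false_positive_rate(detected, actual):
    """FPR = |F_D \\ F_C| / |F_D|: fraction of reported faults that are wrong."""
    return len(detected - actual) / len(detected)
```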
Fig. 2. The detection rate of the two algorithms (IHUCB and IHUCBFPF) versus the fault number of the codebook, under various combinations of LR and SR
Fig. 3. The false positive rate of the two algorithms versus the fault number of the codebook, under various combinations of LR and SR
Fig. 4. The event correlation time (milliseconds) of the two algorithms versus the fault number of the codebook, under various combinations of LR and SR
DRFN,SN and FPRFN,SN, respectively. In our simulation, we set fnmax = 7 and α = 2.5‰, and varied FN from 20 to 50 and SN from 15 to 25. The parameter K is 100, and M is 5000. Fig. 2 and Fig. 3 present a comparison of the detection rate (DR) and false positive rate (FPR) for IHUCB and IHUCBFPF, respectively. As shown in Fig. 2, when LR is 0 and SR increases, the DR of IHUCB decreases while that of IHUCBFPF remains at a high level. We can conclude that IHUCBFPF is more robust than IHUCB to spurious events. When LR is 0, the FPR of IHUCB is higher than that of IHUCBFPF. The reason is that IHUCB treats spurious events as real events, so in the hypothesis creation procedure the correct hypothesis is removed due to premature hypothesis removal [4], and in the measurement phase a wrong final choice is made. The figures also show that when SR is 0, varying LR produces no significant difference between the FPRs of the two algorithms; this is because our proposed heuristic only enhances the handling of spurious events. Fig. 3 shows that IHUCBFPF achieves a satisfactory FPR even under a high SR, while that of IHUCB is unsatisfactory. There is one phenomenon worth noting: when the event loss ratio is high, the DRs of IHUCB and IHUCBFPF are basically the same, although IHUCBFPF still greatly improves the FPR. The DRs are not improved because when LR is high, the number of lost events exceeds the diameter of the codebook; for this problem, we can increase the event number of the codebook to improve the tolerance of event loss. Fig. 4 shows that IHUCBFPF is faster than IHUCB, especially in low-LR cases. The reason is that in IHUCBFPF many fault hypotheses with low probability are removed during the hypothesis creation phase, whereas in IHUCB they are removed only during the final measurement phase, and the time for creating hypotheses is the dominant part.
5 Conclusion
The IHUCBFPF technique proposed in this paper enhances the IHUCB algorithm by estimating the likelihood of each fault in the fault set that can explain an event, and removing the faults with lower likelihood. It uses a codebook model that represents the relationship between faults and symptoms. As shown in the simulation, it achieves a satisfying improvement in detection accuracy, false positive rate and correlation time when the event spuriousness ratio is high. In the near future, we plan to apply this heuristic to the probabilistic codebook.
References 1. Bouloutas, A.T., S. Calo, and A.Finkel, Alarm Correlation and Fault Identification in Communication Networks. IEEE Transactions on Communications, 1994. 42: p. 523 - 533. 2. Katzela, I. and M. Schwartz, Schemes for Fault Identification in Communication Networks. IEEE/ACM Transactions on Networking, 1995. 3(6): p. 753-764. 3. Steinder, M. and A.S. Sethi. Increasing Robustness of Fault Localization through Analysis of Lost, Spurious, and Positive Symptoms. in INFOCOM2002. 2002. New York. 4. Steinder, M. and A.S. Sethi, Probabilistic Fault Diagnosis in Communication Systems through Incremental Hypothesis Updating. Computer Networks, 2004. 45(4): p. 537-562. 5. Yemini, S.A., et al., High Speed and Robust Event Correlation. IEEE Communications Magazine, 1996. 34(5): p. 82-90. 6. Liu, G., A.K. Mok, and E.J. Yang. Composite events for network event correlation. in IM99. 1999. 7. Kliger, S., et al. A coding approach to event correlation. in Intelligent Network Management. 1995. Santa Barbara, CA. 8. Zheng, Q.H. and Qian. Y.T. An Event Correlation Approach Based on the Combination of IHU and Codebook. in CIS05. 2005. Xian, China.
CPR Localization Using the RFID Tag-Floor* Jung-Wook Choi, Dong-Ik Oh, and Seung-Woo Kim Dept. of Computer Science & Engineering, College of Engineering, SoonChunHyang University, Asan, Korea {jwchoi, dohdoh, seungwo}@sch.ac.kr
Abstract. In this paper, we describe our approach to achieve accurate localization using RFID for Cellular Phone Robot (CPR). We solely rely on RFID mechanism and complement coordinate errors accumulated during the wheel-based CPR navigation. We especially focus on how to distribute RFID tags (tag pattern) and how many to place (tag granularity) on the floor to accomplish efficient navigations. We define the error in navigation and use it to compare the effectiveness of various RFID floor settings through a simulation. Identified tag patterns and granularities would be useful to many USN applications where the adoption of RFID technology is appropriate.
1 Introduction
Under the ubiquitous computing paradigm, we are connected to the world of computing anywhere and anytime through USN (Ubiquitous Sensor Network). In USN, object information pertaining to the computing is important. One of the most fundamental pieces of information needed is the positional (localization) information of an object. Lasers and ultrasonic sensors are commonly used for this purpose, but they cannot sense objects through obstacles (the line-of-sight problem). Recently, there have been a few attempts to apply RFID to localization [1, 2]. RFID does not suffer from the line-of-sight problem, and its object ID can be trivially retrieved. However, RFID has a problem with triangulation, the most common localization technique adopted in USN. For triangulation to work, the distance between a sensor and a sensed object must be measured accurately, but today's RFID technology does not provide enough accuracy in distance measurement. In the RFID sensing literature, the best result so far gives an average error of 0.77 feet in coordinate detection, which is not accurate enough for robot navigation [1]. Errors in RF distance measurement account for such inaccuracies. Therefore, in this paper, we suggest a new way of implementing robot localization using RFID. This effort is part of CPR (Cellular Phone Robot) development [3, 4]. In this development, we place an RFID reader on the CPR and prepare a tag-floor: a floor installed with RFID tags, each of which holds its own coordinate information. With such a setting, the CPR may acquire accurate coordinates during navigation, because the localization error does not depend on the distance between the reader and the tag.
This research was supported in part by the Ministry of Science & Technology of Korea (Grant No. R01-2004-000-10274-0(2005)).
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 870 – 874, 2006. © Springer-Verlag Berlin Heidelberg 2006
Several tag-floor arrangements have already been suggested [5, 6]. However, these studies do not address the issue of how (tag pattern) and how many (tag granularity) tags should be distributed on the floor. In this study, we suggest a way to determine efficient tag patterns and tag granularities for the tag-floor. This paper is organized as follows. In section 2, we introduce the CPR navigation. We describe the tag-floor assignment and discuss how it may affect the overall localization performance. Section 3 covers the localization efficiency by presenting and analyzing the simulation results. Section 4 concludes.
2 CPR and RFID Tag-Floor Localization
CPR is a new technological concept combining CP (Cellular Phone) and RT (Robot Technology) [3, 4]. This paper focuses on the localization and navigation of CPR, named CPR Mobility, which consists of two main controllers: the Trajectory Controller and the Self-Localization Controller. The Trajectory Controller is responsible for wheel-based navigation. The Self-Localization Controller provides coordinate information for the moving CPR. For robotic movement, a pair of navigational wheels is installed. With the coordinate information acquired from the Self-Localization Controller, the Trajectory Controller refines CPR movements to achieve better navigation. We let the CPR know its own coordinates through RFID sensing. We place a small RFID reader on the CPR. On the indoor floor, we place multiple RFID tags and associate each tag (tag ID) with a coordinate. During an actual navigation, as the CPR detects a tag, the tag's coordinate is fed to the Trajectory Controller. There may be many sensible ways of placing RFID tags on the floor. Nevertheless, it is practical to prepare tiles of the same fashion and assemble them to produce a navigational floor. The simplest way of arranging RFID tags on a tile is to place 4 tags in a square (“square”) as in Fig. 1(a). Another way might be the “parallelogram” suggested in Fig. 1(b). It has been suggested that with the pattern in Fig. 1(a), there is a higher possibility of missing tag detections [5]. However, as we can see from Fig. 1(a) and 1(b), the possibilities of missing tags in both cases are about the same. Therefore, we suggest the “tilted-square” pattern as in Fig. 1(c). This pattern appears to have the lowest possibility of missing tag detections during navigation.
Fig. 1. Possible misses of tag detection: (a) Square, (b) Parallelogram, (c) Tilted-Square
For tag-floor navigation, the more tags we place on the floor, the higher the probability of encountering a tag will be. In that case, we can compensate more often for the error accumulated by the wheel-based navigation. However, it is desirable to place only a minimum number of tags on the floor.
3 Efficient Tag-Floor Setting
In order to determine efficient tag granularities and tag arrangement patterns for the tag-floor, we developed a simulation program. We defined the error in navigation and used it to compare the effectiveness of the tag-floor.

3.1 Performance Factors
Since we consider CPR movement with no obstacles (near obstacles, dead-reckoning navigation is performed; see [4] for details), the most important performance factor is the time it takes to travel from a given departure point to a given destination point. This is the sum of the straight-line movement times between the successive points it goes through, plus the times to adjust its posture at each tag. The movement depends on the arrangement of tags on the floor as well. Therefore, we define the CPR navigation time as:

    Te(x0, xn) = Σ_{i=0}^{n−1} [ tm(xi, xi+1) + tr(xi, xn) ]    (1)
where
    Te(xs, xd): navigation time from xs to xd under the floor arrangement e
    tm(xa, xb): straight-line movement time from xa to xb
    tr(xa, xb): posture adjustment time from xa for xb
    xs, x0: departure point;  xd, xn: destination point
    x1 … xn−1: locations of successive tags encountered during navigation

By subtracting the straight-line movement time between the departure and destination points from Equation (1), and normalizing it against that straight-line movement time, we define the relative navigational error REe(xs, xd) to compare the effectiveness of different tag arrangements and granularities:

    REe(xs, xd) = [Te(xs, xd) − tm(xs, xd)] / tm(xs, xd)    (2)
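Equations (1)-(2) can be evaluated directly once tm and tr are modeled. The sketch below assumes a constant velocity for straight-line movement and a constant posture-adjustment time per tag; both are simplifying assumptions of ours, not the paper's simulator:

```python
import math

# Sketch of Eqs. (1)-(2). Assumptions (not from the paper's simulator):
# straight-line time = distance / constant velocity, and a constant
# posture-adjustment time at every tag.

def tm(a, b, v=10.0):
    """Straight-line movement time from point a to point b at velocity v (cm/s)."""
    return math.dist(a, b) / v

def navigation_time(path, t_adjust=1.0):
    """Eq. (1): movement time over each leg plus one posture adjustment per leg.
    `path` is [departure, tag_1, ..., tag_{n-1}, destination]."""
    n = len(path) - 1
    return sum(tm(path[i], path[i + 1]) + t_adjust for i in range(n))

def relative_error(path):
    """Eq. (2): extra time over the direct route, normalized by the direct time."""
    direct = tm(path[0], path[-1])
    return (navigation_time(path) - direct) / direct

# A detour through one tag costs extra distance plus adjustment time, so RE > 0.
print(round(relative_error([(0, 0), (30, 40), (100, 0)]), 3))  # 0.506
```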
3.2 Simulation Parameters and Fixed Values
The simulation program measures the relative effectiveness of the navigation. In this program we use various parameters to better reflect real-world navigation. The major parameters a user can specify are:
Tag patterns. “square,” “parallelogram,” or “tilted-square” can be chosen.
Tag granularities. Four tags per tile; hence, the tile size determines the granularity.
Read ranges of the tag and the reader. There are many types of tags and readers; therefore, we make this parameter flexible.
Others. Velocity, posture adjustment time, and angular wheel error.
Using the definition of REe in Equation (2) and by varying the patterns and granularities of the tags, we analyzed CPR navigation performance through
simulation. We used 50,000 randomly generated departure-destination pairs. We accumulated REe values for each of the three floor patterns and used the mean REe values for the performance comparison. Among the simulation parameters, some are fixed to reflect real-world situations. Table 1 summarizes the fixed values used.

Table 1. Fixed simulation parameters
    Parameters                           Values
    Floor Size                           1000 cm × 1000 cm
    CPR Velocity (Straight, Rotation)    10.0 cm/sec, 2.0 cm/sec
    Minimum Path Distance                400 cm
    Angular Error Rate                   ±π/36 rad
3.3 Simulation Results and Analysis
In order to determine how many tags are sufficient for efficient navigational performance, we find the correlation between navigational performance and tag granularity. We differentiate tag granularities by adjusting the read range and the tag interval. The read range is determined by the maximum distance between the centers of a tag and a reader. The tag interval is the distance between the centers of one tag and the closest tag on the floor.
Tag Granularity. According to the simulation results, we notice that there exists an optimal tag interval at which REe becomes minimal, regardless of the tag arrangement pattern. By performing simulations on various read ranges and extracting those minimal REe values, we conclude that there exists an optimal relationship between the read range and the tag interval (tag granularity). Table 2 summarizes the optimal results obtained for various read ranges. It shows that by maintaining an interval about four times the read range (about a 4:1 ratio between the tag interval and the read range), we reach the minimal navigational error in terms of REe.
Tag Arrangement Pattern. Under the optimal tag granularities, we compare the performance of the three tag arrangement patterns by consulting the RE column in Table 2. In all cases, the “tilted-square” arrangement produces the best relative effectiveness. With these results, we conclude that the “tilted-square” pattern suggested in this study produces the best efficiency for tag-floor navigation.

Table 2. Optimal tag intervals and RE values (T: Tilted-Square, S: Square, P: Parallelogram)
    Read range   Optimal tag interval   Relative error (RE) in percent
    6 cm         28 cm                  (T) 36.94%  (S) 40.36%  (P) 39.82%
    8 cm         33 cm                  (T) 32.51%  (S) 35.97%  (P) 35.59%
    10 cm        41 cm                  (T) 29.75%  (S) 32.76%  (P) 33.08%
    12 cm        45 cm                  (T) 27.53%  (S) 30.58%  (P) 30.21%
    14 cm        53 cm                  (T) 26.01%  (S) 28.35%  (P) 29.32%
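The roughly 4:1 relationship can be checked directly against Table 2; the values below are copied from the table:

```python
# Interval-to-read-range ratios computed from Table 2: all cluster near 4:1.
read_range = [6, 8, 10, 12, 14]          # cm
optimal_interval = [28, 33, 41, 45, 53]  # cm

ratios = [i / r for i, r in zip(optimal_interval, read_range)]
print([round(x, 2) for x in ratios])  # [4.67, 4.12, 4.1, 3.75, 3.79]
```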
4 Conclusion
For the implementation of the self-localization feature of CPR, we used RFID technology. Unlike conventional sensors, RFID readers do not suffer from the line-of-sight problem; hence, a better implementation of CPR navigation is possible. In this paper we suggested an RFID-tag-installed floor for effective CPR navigation. Furthermore, we developed a simulation program to find better ways of distributing RFID tags on the floor. We defined the error in navigation and used it to compare the effectiveness of various floor settings. The simulation results indicate that a 4:1 ratio between the tag interval and the tag/reader read range is desirable. They also demonstrate that the “tilted-square” is the most adequate tag arrangement pattern. This analytical result should be beneficial for many other USN applications in which an RFID tag-floor is appropriate. In continuing research, live test navigation based on the findings of this work will be conducted.
References
1. G. Kantor and S. Singh, "Preliminary Results in Range-Only Localization and Mapping," Proceedings of the IEEE Conference on Robotics and Automation, Washington, DC, May 2002
2. D. Hahnel, W. Burgard, D. Fox, K. Fishkin, and M. Philipose, "Mapping and Localization with RFID Technology," Intel Research Institute, Seattle, WA, Tech. Rep. IRS-TR-03-014, December 2003
3. Jaeil Choe, Seungwoo Kim, "A Study on Infra-Technology of Robotic Cellular Phone," Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3571-3576, Sept. 2004
4. Seungwoo Kim, Jaeil Choe, "A Study on The New Technological Concept of Robotic Cellular Phone (RCP)," International Journal of Intelligent Material Systems and Structures, Vol. 16, No. 12, pp. 995-1005, Dec. 2005
5. Itiro Siio, "User Position Detection using RFID Tags," Technical Report Proceedings of Japan Information Processing Society, 00-HI-88, pp. 45-50, 2000
6. Svetlana Domnitcheva, "Smart Vacuum Cleaner: An Autonomous Location-Aware Cleaning Device," Proceedings of the 6th International Conference on Ubiquitous Computing, Tokyo, Japan, Sep. 2004
Development of a Biologically-Inspired Mesoscale Robot Abdul A. Yumaryanto1, Jaebum An2, and Sangyoon Lee2 1
Department of Advanced Technology Fusion, Konkuk University 1 Hwayang-dong, Seoul, South Korea
[email protected] 2 School of Mechanical and Aerospace Engineering, Konkuk University 1 Hwayang-dong, Seoul, South Korea {pojec, slee}@konkuk.ac.kr
Abstract. This paper presents the design and prototype of a mesoscale (13 cm long) six-legged walking robot whose locomotion is actuated by a piezoelectric actuator named LIPCA, which consists of multiple layers of glass/epoxy and carbon/epoxy that encapsulate a unimorph piezoceramic actuator. Inspired by the walking kinematics of cockroaches, our robot uses the alternating tripod gait (the front and rear legs on the same side move together with the middle leg on the other side for the locomotion), and has six legs that are designed to mimic the function of those of cockroaches. All the experiments with the prototype show a possibility of a small, light, and agile walking robot that is actuated by LIPCA without using any conventional electromagnetic actuator.
1 Introduction
Piezoelectric materials are smart in the sense that they can sense changes in the environment and respond by changing their material properties and geometry. Such materials have been used in the robotics field as sensors [1] and actuators [2, 3]. Recent developments in piezoelectric devices include unimorph-type actuators, in which a piezoelectric ceramic is bonded to a thin metal sheet. The secondary material amplifies the axial displacement of the actuator by constricting the lateral motion. Examples of this type are THUNDER and RAINBOW [4]. A piezo-composite actuator called LIPCA (Lightweight Piezoceramic Composite curved Actuator) has a different structure: lightweight fiber-reinforced plastic layers instead of heavy metal layers. Experimental results show that LIPCA can produce 60% larger displacement and is 40% lighter than THUNDER [5]. In addition to previous applications of LIPCA as an actuator, LIPCA is used here to actuate a small (13 cm long), six-legged walking robot. Compared to a conventional-scale robot, a mesoscale robot has a limited design space, and hence it needs a simple design approach and a lighter and smaller actuator like LIPCA. Noticing that biological insects are intelligent in terms of agile and stable motion, we reflect the walking kinematics of cockroaches in the design of our hexapod robot. We report the experimental results with real cockroaches and the design and prototype of a LIPCA-actuated mobile robot.
2 Design and Prototype of the Robot
The walking mechanism of our robot is inspired by an agile six-legged insect, the cockroach. We analyzed the walking kinematics of German cockroaches (Blattella germanica) with an experimental apparatus composed of a high-speed camera, a computer, and a homemade walking track. We observed that when a cockroach walks, it uses the alternating tripod gait, in which the front and rear legs on one side and the middle leg on the opposite side move concurrently (Fig. 1). In addition, the front, middle, and rear legs perform different functions while walking. The rear leg is used to thrust or accelerate the body, while the front leg is used to decelerate it. The middle leg has two functions: to decelerate and to accelerate the body while stabilizing it in the lateral direction. The tripod gait and the functions of each leg of cockroaches are reflected in the design of our robot.
Fig. 1. Walking of a cockroach
Our six-legged robot is actuated by two LIPCA strips, which are placed in the body frame using simply-supported joints (see Fig. 2). Under a high AC voltage, the LIPCA strips move up and down alternately, except at the edges, which are constrained to the body frame. Each LIPCA strip is used to move one set of legs for the alternating tripod gait, in which one set of legs strokes on the ground while the other set swings above the ground. We constructed two transfer mechanisms to convert a LIPCA displacement into a stroke in order to realize the walking mechanism. The first mechanism connects the middle part of LIPCA to the hip part of the leg, and the second joins the hip part and the foot. On the rear legs, the displacement of LIPCA is transferred using a slider-crank mechanism, which amplifies the displacement. The front and middle legs are directly connected without any amplification. Theoretically the hip displacement of the front leg is equal to that of the middle one, while the hip displacement of the rear leg is larger than that of LIPCA.
Fig. 2. Design of the robot
The amplification can be calculated as follows:

    Amp = (d′ − d) / (p − p′)    (1)

where

    d′ = L1 sin(θ1 + Δθ) − √(L2² − (s − L1 cos(θ1 + Δθ))²),
    d  = L1 sin θ1 − √(L2² − (s − L1 cos θ1)²),
    Δθ = tan⁻¹(p/q) − tan⁻¹(p′/q).

All the parameters on the right-hand side are known from the geometry of the robot (see Fig. 3).
Fig. 3. Slider-crank mechanism of the leg
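Equation (1) can be transcribed directly; the sketch below uses arbitrary placeholder values for the link lengths and angles, not the robot's actual geometry:

```python
import math

# Direct transcription of Eq. (1) for the slider-crank amplification.
# The parameter values used below are arbitrary placeholders, not the
# robot's actual geometry.

def amplification(L1, L2, s, p, p_prime, q, theta1):
    """Amp = (d' - d) / (p - p'); lengths in one consistent unit, angles in radians."""
    d_theta = math.atan(p / q) - math.atan(p_prime / q)

    def d_of(theta):
        # Slider position for crank angle theta (the expression for d and d').
        return L1 * math.sin(theta) - math.sqrt(L2**2 - (s - L1 * math.cos(theta))**2)

    return (d_of(theta1 + d_theta) - d_of(theta1)) / (p - p_prime)

print(round(amplification(20.0, 30.0, 25.0, 5.0, 3.0, 10.0, math.radians(60)), 2))
```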
The horizontal displacement of the foot creates a thrust for accelerating the body forward. When positioning the slot in the rear leg, we used an optimization method to maximize the displacement in the horizontal direction. The result is that the slot should be placed 15 mm horizontally behind the hip and 23.3 mm vertically below the hip, which produces an 8.7 mm backward stroke. The body frame and the linkages of the prototype robot are made of balsa wood and carbon composite rods,
respectively. The legs are made using a metal rod to get a high impulse. The total weight is 35 grams and the dimension is 120 mm × 55 mm × 65 mm.
3 Experiments and Results
To measure the walking performance of our robot we conducted several experiments with an experimental apparatus (Fig. 4). Both LIPCA strips were driven with a square wave at three different voltages (±100 V, ±150 V, and ±200 V, i.e., 200, 300, and 400 V peak-to-peak, respectively), with a phase difference of 180°, at 2, 10, 20, 30, and 40 Hz for each voltage input. For each frequency we measured the speed three times and took the average value, which is shown in Fig. 5. As the operating frequency increases, the alternating tripod gait frequency becomes larger, and thus the robot moves faster. The supply voltage changes the displacement of LIPCA, and thus the rear-leg stroke on the ground becomes larger, increasing the speed. At frequencies over 30 Hz, the change from 200 Vpp to 300 Vpp had a larger effect on the velocity than the change from 300 Vpp to 400 Vpp.
Fig. 4. Experimental apparatus for walking of the robot
4 Discussion and Conclusions
We have reported the design, prototype, and experiments of a mesoscale, light, and agile walking robot that is actuated by a smart material, LIPCA, and inspired by an agile insect, the cockroach. Compared with MG3 [4], our robot is slower at the same frequency and voltage. However, unlike MG3, which has a frequency-based turning capability, our robot can walk straight at any frequency, which implies that it has the potential to walk faster at higher frequencies. The second prototype of our LIPCA-actuated robot and a light and small power
supply converter are under development, with the goal that the robot becomes a self-powered LIPCA-actuated robot. In the second prototype, we place the two LIPCA strips in the middle instead of fixing them at the edges.

Fig. 5. Performance of the robot for various applied voltages and frequencies

Acknowledgements. This work was supported by a Korea Research Foundation Grant (Intensive Research Center Program 2004, KRF-2004-005-D00047), and the support is sincerely appreciated.
References
1. Klahold, J., Rautenberg, J., Ruckert, U.: Continuous sonar sensing for mobile mini-robots. Proceedings of the IEEE International Conference on Robotics and Automation, Vol. 1 (2004) 323-328
2. Hollinger, G.A., Briscoe, J.M.: Genetic Optimization and Simulation of a Piezoelectric Pipe-Crawling Inspection Robot. Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Vol. 1 (2005) 484-489
3. Goldfarb, M., Golgola, M., Fischer, G., Garcia, E.: Development of a piezoelectrically-actuated mesoscale robot quadruped. Journal of Micromechatronics, Vol. 1, No. 3 (2001) 205-219
4. Wise, S.A.: Displacement properties of RAINBOW and THUNDER piezoelectric actuators. Sensors and Actuators, Vol. 69 (1998) 33-38
5. Yoon, K.J., Park, K.H., Lee, S.K., Goo, N.S., Park, H.C.: Analytical design model for a piezo-composite unimorph actuator and its verification using lightweight piezo-composite curved actuators. Smart Materials and Structures, Vol. 13 (2004) 459-467
Timed Petri-Net (TPN) Based Scheduling Holon and Its Solution with a Hybrid PSO-GA Based Evolutionary Algorithm (HPGA)
Fuqing Zhao1, Yahong Yang2, Qiuyu Zhang1, and Huawei Yi1 1
School of Computer and Communication, Lanzhou University of Technology, 730050 Lanzhou, P.R. China {zhaofq, zhangqy, yihw}@mail2.lut.cn 2 College of Civil Engineering, Lanzhou University of Technology, 730050 Lanzhou, P.R. China
[email protected]

Abstract. Modern manufacturing systems have to cope with dynamic changes and uncertainties such as machine breakdowns, hot orders, and other kinds of disturbances. Holonic manufacturing systems (HMS) provide a flexible and decentralized manufacturing environment that accommodates changes dynamically. In this paper, a new class of Timed Petri Nets (TPN), Buffer-nets, for defining a Scheduling Holon is proposed, which enhances the modeling techniques for manufacturing systems with features that are considered difficult to model. The proposed novel GA performs the population alternation according to the way populations evolve in nature. Simulation results show that the proposed GA is more efficient than standard GAs. The proposed HPGA synthesizes the merits of both PSO and GA. The simulation results of the example show that the methods are effective for solving the scheduling problem of a Scheduling Holon.
1 Introduction
Holonic manufacturing is a highly distributed control paradigm based on a kind of autonomous and cooperative entity called a “holon” [1]. HMS requires a robust coordination and collaboration mechanism to allocate available resources to achieve the production goal. Multi-agent systems (MAS) [2] provide desirable characteristics for proactively handling uncertainties, and an HMS is usually modeled as a cooperative MAS. Although there is a great deal of research work on HMS [3][4], the deadlock issue has not been addressed [5]. The remainder of this paper is organized as follows. Section 2 proposes a new class of Timed Petri Nets (TPN), Buffer-nets, for defining a Scheduling Holon. In Section 3, a hybrid PSO-GA (HPGA) based evolutionary algorithm is proposed. A scheduling holon architecture, which integrates TPN models and HPGA techniques, is given in Section 4. Section 5 concludes this paper.
2 Formulating the Scheduling Operation of a Scheduling Holon Using Petri-Nets

2.1 Buffer Nets

Definition 1. A timed-PN is called a Buffer net (B-net) if (P = R ∪ Q) ∧ (R ∩ Q = ∅), where the set of places R represents the resources and the set of places Q represents the buffers, and the following three conditions are also satisfied:

(a) I(r, t) = O(r, t), ∀t ∈ T, ∀r ∈ R;
(b) ∀t ∈ T, there exists a single p ∈ Q : I(p, t) = 1 and a single p′ ∈ Q : O(p′, t) = 1, for p ≠ p′;
(c) the subnet G′ = (Q, T, I′, O′, M′, τ), where I′ and O′ are the restrictions of I to (Q × T) and O to (T × Q).
BI ⊂ Q is called a set of input buffer places if ∀p ∈ BI and t ∈T , I ( p, t ) = 1 and O ( p, t ) = 0 , i.e. BI = { p ∈ Q | (!∃) t ∈ T O( p, t ) > 0} . 2) Bo ⊂ Q is called a set of output buffer places if ∀p ∈ Bo and t ∈T , I ( p, t ) = 0 and O ( p, t ) = 1 , i.e. Bo = { p ∈ Q | (!∃) t ∈ T I ( p, t ) > 0} . 1) For a B-net,
2.2 B-Nets to Model Scheduling Holon of HMS In a Scheduling Holon, there are n (where n > 1) products to be produced using m (where m > 1) processing units. For each product, the sequence by which the processing units will be visited is pre-specified and is referred to as the product (or job) routing or processing recipes. Normally, the processing time τ ij for a product (or job)
i (i = 1,2,L, n) in unit j ( j = 1,2,L, m) is given. Operation Oij can be represented by two transitions t sij and t fij for the start and the termination of this operation, respectively, and one place
τ ij
pij with time duration
for the processing activity.
3 Hybrid PSO-GA Based Evolutionary Algorithm (HPGA)

3.1 A Novel GA

By introducing a “dying probability” for the individuals and a “war/disease process” for the population, the authors propose a novel approach in this paper to determine the population size and the alternation between generations.
The steps of the algorithm are summarized as follows:
(1) Generate the initial population: sizeof_population = POP_INITIAL; create population[sizeof_population] randomly; die_probability[sizeof_population] = DIE_PROBABILITY[0].
(2) Evaluate the individuals to obtain the maximum and minimum fitness of the population: value_max and value_min.
(3) Memorize the best solution and stop if value_max > VALMAX or gap > GAP.
(4) Select SELECTION_POP individuals into the reproducing pool randomly according to their fitness.
(5) Divide the individuals in the reproduction pool into couples randomly. All couples perform the crossover and mutation operations.
(6) Perform the die process. For each individual, determine according to its die probability whether it will die. If the individual dies, then sizeof_population--; else if die_probability = DIE_PROBABILITY[k], then die_probability = DIE_PROBABILITY[k++].
(7) Perform the war/disease process: if sizeof_population > POP_MAX, then select POP_INITIAL individuals randomly into the new population according to their fitness. Go to step 2.

3.2 PSO-GA Based Hybrid Algorithm
Particle Swarm Optimization (PSO) is also an evolutionary computational model based on swarm intelligence. PSO was developed by Kennedy and Eberhart [6], who were inspired by research on artificial life; it finds the optimum solution by having the swarm follow the best particle. Owing to its advantages, PSO is suitable not only for scientific research but also for engineering applications, in the fields of evolutionary computing, optimization, and many others [7]. This paper proposes a novel hybrid PSO-GA based algorithm (HPGA). The algorithm proceeds as follows:
(1) Initialize the GA and PSO sub-systems, respectively.
(2) Execute GA and PSO simultaneously.
(3) Memorize the best solution as the final solution and stop if the best individual in one of the two sub-systems satisfies the termination criterion.
(4) Perform the hybrid process if the number of generations is exactly divisible by the designated iteration interval N: select P individuals from both sub-systems randomly according to their fitness and exchange them. Go to step 2.
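Steps (1)-(4) amount to running the two populations side by side and periodically exchanging fitness-selected individuals. A schematic sketch on a toy one-dimensional objective (the update rules are deliberately minimal stand-ins for the actual GA die/war process and PSO velocity update, and N, P, and the population sizes are placeholders):

```python
import random

# Schematic HPGA loop: run GA and PSO populations side by side and, every N
# generations, exchange P fitness-selected individuals (step 4). The update
# rules here are minimal stand-ins, not the paper's actual GA or PSO.
random.seed(0)

def fitness(x):
    return -(x - 3.0) ** 2      # toy objective, maximized at x = 3

def step(pop, scale):
    """Placeholder update: perturb each individual, keep only improvements."""
    out = []
    for x in pop:
        y = x + random.gauss(0, scale)
        out.append(y if fitness(y) > fitness(x) else x)
    return out

def hpga(generations=60, N=10, P=2):
    ga = [random.uniform(-10, 10) for _ in range(8)]
    pso = [random.uniform(-10, 10) for _ in range(8)]
    for g in range(1, generations + 1):
        ga, pso = step(ga, 0.5), step(pso, 0.2)
        if g % N == 0:                            # step (4): hybridization
            ga.sort(key=fitness)
            pso.sort(key=fitness)
            ga[:P], pso[:P] = pso[-P:], ga[-P:]   # swap each side's best for the other's worst
    return max(ga + pso, key=fitness)

best = hpga()
print(round(best, 2))  # close to the optimum at 3.0
```

Because each sub-system keeps its own best individuals through the exchange, the global best never degrades; the hybridization only spreads good material between the two sub-systems.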
4 Numerical Results
A multipurpose batch plant in a scheduling holon with five products (p1–p5) and five processing units (u1–u5) is considered as the case study to test our model and algorithm; its product recipes and processing times are given in Table 1. The TPN model for the NIS policy is shown in Fig. 1. The GA, PSO, and HPGA algorithms have all been implemented for this example. The number of searches can be decreased from 307 to 152 by the use of the HPGA
algorithm, and again the results were the same as those from GA and PSO. The calculation of the usage duration for each unit in the multipurpose case is improved by about 56% and 39% compared with GA and PSO, respectively. Comparisons of the different algorithms for the case study are shown in Table 2. From the table, we can see that the performance of HPGA is better than that of SGA and PSO.

Table 1. Processing times (h) of products
    Units   P1    P2    P3    P4    P5
    U1      2.0   0.0   2.0   3.0   0.0
    U2      0.0   1.0   0.0   2.0   3.0
    U3      3.0   4.0   5.0   0.0   1.0
    U4      2.0   4.0   0.0   5.0   0.0
    U5      0.0   2.0   1.0   0.0   2.0
Fig. 1. TPN model for a Scheduling Holon (5 × 5)

Table 2. Comparisons of different algorithms for the case study
    Algorithm   Average Success (%)   Time    Objective Function Solution
    PSO         91.1%                 95 s    767.6933
    SGA         78.8%                 206 s   976.966
    HPGA        98.2%                 58 s    635.2341
5 Concluding Remarks
In a manufacture-to-order environment, production plans can only be drawn up and executed successfully with a planning and control concept that provides predictability and stability. The modeling and control of HMS based on Timed Petri net theory provides a functional structure for a computer application that enables planners to cope with logistic and technological scheduling problems on multiple levels of aggregation. The proposed HPGA synthesizes the merits of both PSO and GA. It is a simple and yet effective model for handling different kinds of continuous optimization problems.
Acknowledgements
This research is supported by the Natural Science Foundation of Gansu Province (grants No. 3ZS041-A25-020 and ZS032-B25-013).
References
1. Mondal, S., Tiwari, M.K.: Application of an autonomous agent network to support the architecture of a holonic manufacturing system. International Journal of Advanced Manufacturing Technology, 12 (2002) 931-942
2. Leitao, P., Restivo, F.: Experimental validation of the ADACOR holonic control system. Lecture Notes in Artificial Intelligence, 3593 (2005) 121-132
3. Luder, A., Klostermeyer, A., Peschke, J., et al.: Distributed Automation: PABADIS versus HMS. IEEE Transactions on Industrial Informatics, 1 (2005) 31-38
4. Giret, A., Botti, V., Valero, S.: MAS methodology for HMS. Lecture Notes in Artificial Intelligence, 3593 (2005) 39-49
5. Hosack, B., Mahmoodi, F., Mosier, C.T.: A comparison of deadlock avoidance policies in flexible manufacturing systems. International Journal of Production Research, 13 (2003) 2991-3006
6. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. Proc. IEEE International Conference on Neural Networks, Perth, Australia, Vol. IV, IEEE Service Center, Piscataway, NJ (1995) 1942-1948
7. Da, Y., Ge, X.: An improved PSO-based ANN with simulated annealing technique. Neurocomputing, 63 (2005) 527-533
Recognition Rate Prediction for Dysarthric Speech Disorder Via Speech Consistency Score
Prakasith Kayasith1,2, Thanaruk Theeramunkong2, and Nuttakorn Thubthong3
1 Assistive Technology Center, National Electronics and Computer Technology Center (NECTEC), Thailand Science Park, Klong Luang, Pathumthani 12120, Thailand
[email protected]
2 School of Information and Computer Technology, Sirindhorn International Institute of Technology (SIIT), Thammasat University, Klong Luang, Pathumthani 12121, Thailand
[email protected]
3 Acoustics and Speech Research Laboratory (ASRL), Department of Physics, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
[email protected]
Abstract. Dysarthria is a collection of motor speech disorders. The severity of dysarthria is traditionally evaluated by human experts or a group of listeners. This paper proposes a new indicator called the speech consistency score (SCS). By considering the relation of speech similarity and dissimilarity, SCS can be applied to evaluate the severity of a dysarthric speaker's disorder. Aside from being used as a tool for speech assessment, SCS can be used to predict the possible outcome of speech recognition as well. A number of experiments were conducted to compare the predicted recognition rates generated by SCS with the recognition rates of two well-known recognition systems, HMM and ANN. The results show that the root mean square errors between the predicted rates and the recognition rates are less than 7.0% (R2 = 0.74) and 2.5% (R2 = 0.96) for HMM and ANN, respectively. Moreover, to validate the use of SCS in the general case, a test on an unknown recognition set showed an error of 11% (R2 = 0.48) for HMM.
1 Introduction Dysarthria is a term given to a group of speech disorders in which the transmission of messages controlling the motor movements for speech is interrupted. Severe dysarthric speech may be completely unintelligible to unfamiliar listeners. However, previous studies [1-3] have shown that, with careful design, people with dysarthria can truly benefit from speech recognition systems incorporated into assistive devices. In the area of speech assessment for people with speech disorders, there are two common tests based on perceptual analysis, namely the articulatory test and the intelligibility test. Both tests are basically subject to human perception [4-5]. Besides, the results cannot be applied explicitly to evaluate the level of severity in terms of speech processing features such as consistency in speech. One may apply a straightforward method by running the test with a full speech recognition system. However, this choice is quite complicated and time consuming since both transcribing and training processes are needed.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 885 – 889, 2006. © Springer-Verlag Berlin Heidelberg 2006
To this end, this
paper proposes an indicator called the speech consistency score (SCS), which can be directly applied to predict the possible outcome performance of those alternative speech technologies without such pre-processing steps.
2 Speech Consistency Score (SCS) In this work, a speech consistency score (SCS) is defined as a ratio between the similarity and dissimilarity of speech, so that consistency can be compared among speakers. A similarity score (SIM) represents the level of speech signal similarity between repetitions of the same words produced by a speaker, while a dissimilarity score (DIS) represents the dissimilarity of speech signals between different words produced by the same speaker. The relation of SIM and DIS can be used to evaluate the level of signal distortion represented by a group of speech features, which are essentially those used in modern speech-communication technologies. To cope with the time variation of speech samples, a technique of dynamic time warping (DTW), with adaptive slope constraint and accumulated penalty score, is applied to measure SIM and DIS. A signal is separated into a sequence of smaller frames (25 ms width with 10 ms interval), and each frame is represented by speech features. The definition of the features can be changed depending on project interest; in this study, we focused on a consistency score of speech recognition features, so MFCC features and their derivatives were chosen. To evaluate the consistency of speech, three parameters, i.e., SIM, DIS, and SCS, were taken into account. The similarity (SIM) value represents the similarity within the same word, calculated from m samples (each speaker was asked to speak the same word m times). The average distance within the same word w, denoted X̄^w, is calculated from equation (1), where w ∈ {1, 2, ..., n}. The distance between the i-th and j-th samples of word w, |X_i^w − X_j^w|, is calculated by a DTW technique (with Euclidean distance for frame comparison). The value of similarity is then represented by the average of these distances over all words of each speaker, SIM_sp, as shown in equation (2), where n is the number of words within the test set.

    X̄^w = (1 / mC2) Σ_{i=1}^{m} Σ_{j=i+1}^{m} |X_i^w − X_j^w|    (1)

    SIM_sp = (1/n) Σ_{w=1}^{n} X̄^w    (2)
As the second parameter, the dissimilarity (DIS) value represents the average distance between all different words within the test set. The method starts by choosing a template for each word and then calculating the average distance for every pair of words. The template for each word, T^w, is chosen by selecting one sample from the m samples of word w (out of five samples in this case). The selection criterion is to choose the utterance with the minimum sum of distances to the others, as shown in equation (3). Then the DIS value for each speaker (DISsp) is
calculated from the average distance of all words, as shown in equation (4), where n is the number of words in the test set.

    T^w = argmin_{X_i^w} Σ_{j=1}^{m} |X_i^w − X_j^w|, where X_i^w ∈ {X_1^w, X_2^w, ..., X_m^w}    (3)

    DIS_sp = (1/n) Σ_{w=1}^{n} T^w    (4)
The last parameter, the speech consistency score (SCS), is formulated as the ratio of DIS to SIM. Generally, the ability to produce a variety of speech differs for each speaker. While SIM indicates the similarity within a same word, DIS indicates the dissimilarity among different words pronounced by the same speaker. Considered separately, these values fail to provide a meaningful comparison among different speakers, so their ratio (the speech consistency score, SCS) is evaluated instead, as shown in equation (5).

    SCS_sp = DIS_sp / SIM_sp    (5)
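As a concrete illustration, the SIM/DIS/SCS computation (equations 1-5) can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: dtw() here is a plain dynamic-programming DTW with Euclidean frame distance, without the paper's adaptive slope constraint and accumulated penalty score, and utterances are assumed to be given as lists of feature-vector frames (e.g., MFCCs).

```python
import math
from itertools import combinations

def dtw(a, b):
    """Dynamic time warping distance between two frame sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def scs(speaker):
    """speaker: {word: [utterance, ...]}, each utterance a list of feature frames.
    Returns the speech consistency score DIS_sp / SIM_sp (eq. 5)."""
    # SIM (eqs. 1-2): mean pairwise DTW distance within each word, averaged over words
    sim = sum(
        sum(dtw(x, y) for x, y in combinations(samples, 2)) / math.comb(len(samples), 2)
        for samples in speaker.values()
    ) / len(speaker)
    # Template per word (eq. 3): the sample with minimum summed distance to the others
    templates = [
        min(samples, key=lambda x: sum(dtw(x, y) for y in samples))
        for samples in speaker.values()
    ]
    # DIS (eq. 4): average DTW distance between templates of different words
    pairs = list(combinations(templates, 2))
    dis = sum(dtw(t1, t2) for t1, t2 in pairs) / len(pairs)
    return dis / sim
```

Under this reading, a speaker whose repetitions of a word cluster tightly while different words stay well separated gets a high SCS, matching the interpretation given for the dysarthric speakers below.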
3 Experiment 3.1 Subjects To evaluate our method, a speech corpus of sixteen speakers was constructed from eight CP-dysarthric children (7-14 years old) and eight normal speakers, including four adults (23-36 years old) and four children (7-12 years old). The corpus was created with a balanced set of males and females. All CP-dysarthric children, with varied severity of dysarthria, were recruited from Srisungwan school, a school for children with disabilities. 3.2 Speech Corpus and Evaluation Methods In order to evaluate the proposed method, two sets of speech corpora were used. The first corpus, a control set, was designed especially for Thai phoneme error analysis. The second one, an unknown set, was designed as a set of words that are frequently used for assistive technology. Recording was carried out under normal environmental conditions, in a quiet room with the door closed but no additional soundproofing materials. Subjects were instructed to speak each word in isolation. The results of SCS were compared with the evaluation results obtained from articulatory and intelligibility tests, as well as with the accuracy rates of two speech recognition (SRR) models, HMM and ANN. Incorporated with SCS, recognition results of the control set from both models were used to generate the prediction function. Then the predicted rate of each speaker was calculated and compared with the HMM recognition rate on the unknown set. All results were evaluated using the average root mean square error (Erms) and the correlation coefficient (R2).
4 Results and Discussion Table 4.1 shows the SCS results for each speaker. The mean SCS value of the normal speakers is 1.5 (σ = 0.11), while the mean value of the dysarthric speakers is 1.0 (σ = 0.10). The ratio of DIS to SIM represents the relative spread of the speech signal distributions of different words pronounced by a speaker. A higher SCS indicates a higher spread (less overlap) among different words for that speaker. Therefore, we can expect a high correlation between SCS and recognition rates. As shown by the experiment, the results agree with this expectation. In some cases, such as DF01, DF04, and DM02, the ratios are even less than 1. This indicates a high overlap of speech signals among different words; accordingly, the recognition results are low for these speakers. When all pairs of SCS and SRR results in Table 4.1 were used to generate the prediction function, the correlation coefficient (R2) was about 0.74 for HMM and 0.96 for ANN; that is, the correlation between SCS and ANN is higher than that between SCS and HMM. This tells us that the ANN-based model is closer than HMM to the DTW-based method of finding SCS, in terms of time alignment and direct pattern comparison. HMM is a time-dynamic recognition system that relies on probabilistic learning and a language model, which may not reflect the properties of SCS. Next, the predicted recognition rates (PSCS) based on each SCS were calculated. The evaluation results are shown in Table 4.2.

Table 4.1. Experiment results for normal speakers (left) and dysarthric speakers (right), compared to the speech recognition rate (SRR) on the control set

Code  SRR-HMM  SRR-ANN  SCS       Code  SRR-HMM  SRR-ANN  SCS
AF01  0.99     0.93     1.52      DF01  0.38     0.38     0.86
AF02  0.99     0.97     1.57      DF02  0.49     0.54     1.00
AM01  0.98     0.95     1.67      DF03  0.77     0.65     1.04
AM02  0.98     0.97     1.44      DF04  0.51     0.47     0.94
NF01  0.98     0.95     1.56      DM01  0.55     0.53     1.01
NF02  0.92     0.84     1.48      DM02  0.49     0.37     0.89
NM01  0.95     0.84     1.28      DM03  0.72     0.60     1.05
NM02  0.94     0.91     1.56      DM04  0.75     0.77     1.16
At the end of both tables are the root mean square errors (Erms) between the recognition method and each speech evaluation method (SCS, speech articulatory test, and speech intelligibility test, shown as ESCS, EArti, and EIntel, respectively).

Table 4.2. Predicted recognition rate error and correlation, evaluated on the control set (HMM and ANN) and the unknown set (HMM)

Corpus       System  EArti   EIntel  ESCS    R2
Control Set  HMM     0.1274  0.1023  0.0688  0.74
Control Set  ANN     0.1574  0.1508  0.0246  0.96
Unknown Set  HMM     0.1221  0.0971  0.1117  0.47
According to the experiments on the control data set, our proposed method (SCS) shows the lowest prediction error compared to the others (articulatory test and intelligibility test) for both recognition systems (HMM and ANN). The experiment on the unknown set showed that SCS is comparable to the other standard methods. The lowest prediction error came from the intelligibility test, followed by our method and the articulatory test, with errors of 9.7%, 11.17%, and 12.21%, respectively.
5 Conclusion and Future Work Speech assessment for people with speech disorders is very important. The current methods, the articulatory test and the intelligibility test, are subjective, depending on the individual experts and listeners. Moreover, it is not clear that the standard evaluations obtained from humans reflect the recognition rates of modern speech recognition models. In addition, both tests are time-consuming and labor-intensive. This paper proposes a criterion called the speech consistency score (SCS), which can be used not only to evaluate the severity of a speech disorder but also to predict the possible accuracy of a speech recognition system. The prediction can serve as a decision index for whether a dysarthric speaker would benefit from the technology. As for future work, the research will focus on exploring more parameters, such as the overlap factor and the energy and time consistency, to improve the accuracy of prediction for the unknown set. A further issue is how to incorporate word (or phoneme) density distribution into the model in order to construct a general framework for predicting the recognition accuracy for any set of speech.
References 1. Deller, J., Hsu, D., Ferrier, L.: On the Use of Hidden Markov Modeling for Recognition of Dysarthric Speech. Computer Methods and Programs in Biomedicine 35 (1991) 125-139 2. Kotler, A., Thomas-Stonel, N.: Effects of speech training on the accuracy of speech recognition for an individual with speech impairment. Journal of Augmentative and Alternative Communication 12 (1997) 71-80 3. Rosen, K., Yampolsky, S.: Automatic Speech Recognition and a Review of Its Functioning with Dysarthric Speech. Journal of Augmentative and Alternative Communication 16 (2000) 46-60 4. Bernthal, J.E., Bankson, N.W.: Articulation and Phonological Disorders (3rd ed.). Prentice Hall, Boston (1993) 5. Bodt, M.S.D., Huici, M.E.H., Heyning, P.H.V.D.: Intelligibility as a Linear Combination of Dimensions in Dysarthric Speech. Journal of Communication Disorders 35 (2002) 283-292
An Emotion-Driven Musical Piece Generator for a Constructive Adaptive User Interface Roberto Legaspi, Yuya Hashimoto, and Masayuki Numao The Institute of Scientific and Industrial Research, Osaka University 8-1 Mihogaoka, Ibaraki, Osaka, 567-0047, Japan {roberto, hasimoto}@ai.sanken.osaka-u.ac.jp,
[email protected] Abstract. This paper presents the results of recent modification in the Constructive Adaptive User Interface (CAUI) that induces a model of emotional impressions towards certain musical piece structures and improvises a piece based on the model. The CAUI previously employed a ready-made melody generating module with its internal workings abstracted from the CAUI. Utilizing such black-box modules, however, may impede further effort to enhance the CAUI. To address this problem, a replacement module that automatically creates tunes tailored to the listener’s impressions has been incorporated. Current results indicate that the CAUI may induce relevant relations that can support the adaptive improvisation of impression-causing tunes of a musical piece.
1 Introduction To understand the significant link that unites music and the emotions has been a subject of considerable interest involving various fields (exemplified in [1]). Although AI has played a crucial role in computer music for almost five decades (reviewed in [2]) and, more recently, a few works in machine learning have aimed to find regularities in musical performance examples (e.g., [8,7,6]), the consideration of emotions in intelligent music systems has received little attention. This paper reports the results of a current modification of the Constructive Adaptive User Interface (CAUI) [3], which induces a model of the listener's emotional impressions towards certain musical piece structures and subsequently re-arranges or composes a piece based on the model. The CAUI previously employed an external ready-made module, whose internal workings were abstracted from the CAUI, to generate a melody while necessitating some degree of manual support. Utilizing such black-box modules, however, may impede further efforts to improve the CAUI. Hence, a tune-creating module that is adaptive to the listener's impressions has been integrated into the CAUI in place of the abstracted module. With this new module, the CAUI is able to generate a chord progression consisting of tones that make up a specific tune and to alter certain tones based on an existing music theory, thereby creating a non-monotonic musical piece.
2 Acquisition of Impression-Causing Musical Structures The CAUI collects a person’s impressions of certain musical pieces, based on which it extracts a musical structure causing a specific impression. The CAUI compiles a listener’s evaluations of musical pieces using a web-based evaluation instrument (illustrated in [3]). By using Osgood’s semantic differential method in psychology, each subject can rate a piece as one of 5 grades for 6 pairs of impression adjectives, namely, favorable-unfavorable, lively-dull, stable-unstable, beautiful-ugly, happy-sad, and heartrending-joyful. For the first pair, for instance, a 5 means a piece is favorable and a 1 means otherwise. Each subjective rating in the 5-1 scale reflects the degree of the listener’s impression. 75 well-known pieces, from which 8 or 16 successive bars were extracted, were evaluated by 14 subjects. All the musical pieces used to train the CAUI were prepared in the predicate music/2 form that represents an entire piece given 2 (as specified in the denominator) arguments, namely, a song_frame/7 and a list of chord/12 predicates. The predicate song_frame/7 describes the musical frame in terms of tonality, rhythm and musical instrument, and the predicate chord/12 describes a chord structure in terms of its main chord, key, root, duration and function. Using the subjects’ ratings and the predicate-represented musical pieces, the CAUI employs FOIL[4] to generate the rules that describe the musical structures that cause specific impressions, and subsequently uses Rx[5] to refine the FOIL-obtained rules. Each rule is described using a target predicate. The CAUI aims to learn three kinds of target predicates, namely, frame/1, pair/2, and triplet/3, which represent the whole framework of music, and a pattern of two and three successive chords, respectively. The number of generated rules may vary with each subject. 
For instance, when a model was induced for subject A for his impression of a heartrending tune, 10, 69, and 70 rules in the form of frame/1, pair/2, and triplet/3, respectively, were learned.
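To make the representation concrete, a training piece might be encoded along the following lines. This is only a hypothetical sketch mirroring the music/2 structure described above (a song frame plus a list of chords); the field names and values are illustrative and cover only a subset of the 7 and 12 arguments of song_frame/7 and chord/12.

```python
from collections import namedtuple

# Subsets of the song_frame/7 and chord/12 arguments named in the text;
# the remaining arguments and all concrete values are hypothetical.
SongFrame = namedtuple("SongFrame", "tonality rhythm instrument")
Chord = namedtuple("Chord", "main_chord key root duration function")

piece = (  # mirrors music/2: a frame plus a list of chords
    SongFrame(tonality="minor", rhythm="8beat", instrument="piano"),
    [
        Chord("Am", "A", "A", 4, "tonic"),
        Chord("Dm", "A", "D", 4, "subdominant"),
        Chord("E7", "A", "E", 4, "dominant"),
    ],
)
```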
3 Composition of a Musical Piece Evolutionary methods have been instrumental in music generation (e.g., [9]). The CAUI is distinct in that its GA uses for its fitness function the model of user-specific impression-reflective structures together with music theory. The bit string in the GA is extended to a row of columns in which the first column contains the bit representation of the components of a song_frame/7, and the remaining columns contain chord/12 bit representations. A one-point crossover splits and exchanges a whole column, thereby creating alternative chord progressions, and mutation changes a column's structure, thereby altering the music framework and the chord structure. A fitness function is employed to evaluate each possible alternative. The fitness function reflects the user-specific model and music theory:

    Fitness_Function(M) = Fitness_User(M) + Fitness_Theory(M)    (1)

where M is a piece described by the predicate music/2. This makes it possible to generate a chord progression that fits the music theory and causes the required feeling. Fitness_Theory(M) penalizes a chord progression when it violates the music theory. The Fitness_User(M) function is computed as:
    Fitness_User(M) = Fitness_Frame(M) + Fitness_Pair(M) + Fitness_Triplet(M)    (2)
To compute each function, the target predicates learned for each of the impression adjectives (ia) in the various pairs ia1-ia2 are used. Table 1 specifies the computation algorithm and Table 2 specifies the meaning of each variable used in the algorithm.

Table 1. Algorithm for computing Fitness_Frame(M), Fitness_Pair(M), and Fitness_Triplet(M)

1. A list L of Predi is extracted from M, where Predi depends on the targeted fitness function. The length of L is denoted by n.
2. m patterns of x successive Predi are subsequently extracted from L. Each pattern is denoted as Pi.
3. Each Pi is input to four sub-functions, namely δF, δ'F, δFR, and δ'FR:
   - δF returns +2 with reference to ia1, and δ'F returns -2 with reference to ia2, if Pi corresponds to any T among those learned by FOIL for ia1 and ia2.
   - δFR returns +1 with reference to ia1, and δ'FR returns -1 with reference to ia2, if Pi corresponds to any T among those learned by FOIL and Rx for ia1 and ia2.
   The returned values were determined empirically. The values returned for Pi by all four functions are averaged; in the event that both δF and δFR hold true, only δF is counted (the same holds for δ'F and δ'FR). The average is denoted Eval(Pi).
4. Each corresponding fitness function (e.g., Fitness_Frame(M)) is computed as:

    Σ_{i=1}^{m} Eval(Pi)    (3)
Table 2. Meaning of each variable in the above algorithm

Fitness function   Predi         x       m     T
Fitness_Frame      song_frame/7  single  n     frame/1
Fitness_Pair       chord/12      two     n-1   pair/2
Fitness_Triplet    chord/12      three   n-2   triplet/3
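The scoring scheme of Table 1 and equation (3) can be sketched as follows. This is a hedged illustration, assuming rule correspondence can be tested by set membership over pattern encodings; the rule sets shown stand in for the FOIL- and Rx-induced predicates, and the divide-by-four averaging is one reading of step 3 above.

```python
def eval_pattern(p, foil_ia1, foil_ia2, foilrx_ia1, foilrx_ia2):
    """Eval(P_i): average of the values returned by the four sub-functions."""
    values = []
    if p in foil_ia1:
        values.append(+2)      # deltaF fires (FOIL rule for ia1); deltaFR not counted
    elif p in foilrx_ia1:
        values.append(+1)      # deltaFR fires (FOIL+Rx rule for ia1)
    if p in foil_ia2:
        values.append(-2)      # delta'F fires (FOIL rule for ia2)
    elif p in foilrx_ia2:
        values.append(-1)      # delta'FR fires (FOIL+Rx rule for ia2)
    return sum(values) / 4.0 if values else 0.0

def fitness_component(patterns, foil_ia1, foil_ia2, foilrx_ia1, foilrx_ia2):
    """Eq. (3): sum of Eval(P_i) over the m extracted patterns."""
    return sum(eval_pattern(p, foil_ia1, foil_ia2, foilrx_ia1, foilrx_ia2)
               for p in patterns)
```

A chord pair that matches a FOIL rule for the targeted adjective raises the fitness of the candidate piece, while a match for the opposite adjective lowers it, steering the GA toward structures associated with the requested impression.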
Fig. 1 shows that the CAUI automatically composed a heartrending piece without any handcrafted background knowledge about this impression. For each participant, a total of 48 tunes were composed, i.e., 4 tunes for each impression adjective (4 × 6 × 2).
4 Evaluation of the Composed Pieces and Future Works The subjects were asked to evaluate, this time, the composed musical pieces using the same evaluation instrument. Fig. 2 shows the average results of the subjects' evaluations, where the parenthesized values indicate the standard deviation, for each impression adjective. Fig. 3 shows the resulting t-test values and significance levels when a t-test was applied to the participants' evaluative ratings of each impression pair.
Fig. 1. A CAUI-composed heartrending musical piece
Fig. 2. Average results of the subjects’ evaluations of composed pieces
Fig. 3. Results on performing t-test on the subjects’ evaluations
The empirical results indicate that the CAUI has been moderately successful in creating tunes tailored to the subjects' impressions for 4 out of 6 adjective pairs. For the adjective pairs concerning stability and beauty, satisfactory results were not achieved. This can be attributed to a shortfall in creating adequately well-structured tunes from the viewpoint of music theory, rather than to any shortcoming in inducing the user-specific model of impression-structure relations. The results obtained are acceptably sufficient at this developmental stage and provide the motivation to further enhance the CAUI's capability. Tunes obtained at this stage are rhythmically monotonic due to the exclusive use of the 8-beat basic rhythmic pattern; the use of 4- and 16-beat patterns would certainly contribute to a greater variety. Furthermore, the system only created 8-bar tunes and would need predicates that could represent musical periods, which would provide more consistency to the tunes.
5 Conclusion What makes the CAUI distinct from other systems is its machine-learning framework capable of creating emotion-driven musical pieces based on a model of impressions towards certain piece structures, as generalized from examples. Its current implementation is an exploratory attempt at automatic tune composition in which one's sensibility is reflected while keeping such knowledge relevant to music theory.
References 1. Juslin, P.N., Sloboda, J.A. (eds.): Music and Emotion: Theory and Research. Oxford University Press, New York (2001) 2. Lopez de Mantaras, R., Arcos, J.L.: AI and Music: From Composition to Expressive Performances. AI Magazine, Vol. 23, No. 3 (2002) 43-57 3. Numao, M., Takagi, S., Nakamura, K.: Constructive Adaptive User Interfaces - Composing Music Based on Human Feelings. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (2002) 193-198 4. Quinlan, J.R.: Learning Logical Definitions from Relations. Machine Learning, Vol. 5, Kluwer Academic Publishers, Boston (1990) 239-266 5. Tangkitvanich, S., Shimura, M.: Refining a Relational Theory with Multiple Faults in the Concept and Subconcept. In Machine Learning: Proceedings of the Ninth International Workshop (1992) 436-444 6. Thom, B.: Unsupervised Learning and Interactive Jazz/Blues Improvisation. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (2000) 652-657 7. Tobudic, A., Widmer, G.: Relational IBL in Classical Music. Machine Learning, Springer Netherlands (2006) 8. Widmer, G.: Discovering Simple Rules in Complex Data: A Meta-learning Algorithm and Some Surprising Musical Discoveries. Artificial Intelligence, Vol. 146, No. 2 (2003) 129-148 9. Wiggins, G., Papadopoulos, G., Phon-Amnuaisuk, S., Tuson, A.: Evolutionary Methods for Musical Composition. International Journal of Computing Anticipatory Systems (1999)
An Adaptive Inventory Control Model for a Supply Chain with Nonstationary Customer Demands
Jun-Geol Baek1, Chang Ouk Kim2, and Ick-Hyun Kwon3
1 Department of Industrial Systems Engineering, Induk Institute of Technology, Wolgye-dong, Nowon-gu, Seoul, 139-749, Republic of Korea
2 Department of Information and Industrial Engineering, Yonsei University, Sinchon-dong, Seodaemun-gu, Seoul, 120-749, Republic of Korea
3 Department of Industrial Systems and Information Engineering, Korea University, Anam-dong, Sungbuk-gu, Seoul, 136-701, Republic of Korea
Abstract. In this paper, we propose an adaptive inventory control model for a supply chain consisting of one supplier and multiple retailers with nonstationary customer demands. The objective of the adaptive inventory control model is to minimize inventory-related cost. The inventory control parameter is the safety lead time. Unlike most extant inventory control approaches, modeling the uncertainty of customer demand as a statistical distribution is not a prerequisite in this model. Instead, using a reinforcement learning technique called action-reward based learning, the control parameter is designed to adapt as the customer demand pattern changes. A simulation-based experiment was performed to compare the performance of the adaptive inventory control model.
1 Introduction This paper deals with the inventory replenishment problem of a single item in a two-stage supply chain system, in which a supplier replenishes the inventories of multiple retailers facing nonstationary customer demands. By a nonstationary customer demand process, we mean that the mean and variance of the demand distribution change over time. At a retailer, if customer demands are not satisfied at the point of sale, the demands are treated as lost sales. Between the supplier and each retailer there exists a constant transportation lead time. However, the retailers' actual lead times are not constant unless the supplier has a sufficient amount of inventory to meet the retailers' orders. Also, in this paper, we consider a varying reorder point system, based on which a centralized adaptive inventory control model is proposed. By adaptive, we mean that, as the customer demand changes, the inventory control model automatically adjusts reorder points in the direction of reducing inventory-related costs. The associated decisions therefore concern when the retailers' inventories are replenished. In more detail, at each review period, the supplier accesses each retailer's inventory position and sales history data. With the data, the supplier anticipates the time point at which the inventory position of the retailer first drops below zero. If the time interval between the inspection time and the anticipated time is equal to the sum of the supplier's lead time and the retailer's transportation lead time, then the supplier places
an order to the outside source. As soon as the supplier receives the ordered quantity, he/she delivers it directly to the retailer without keeping it in the supplier's warehouse. If the anticipation is accurate, the retailer is guaranteed to replenish its inventory without delay. However, because anticipating the time point at which the inventory position of the retailer drops below zero is inaccurate, especially under a nonstationary demand trend, the supplier should have a time buffer to adaptively regulate the order release time. The order release time of the supplier is expedited or delayed using this time buffer. A positive (negative) time buffer makes the supplier release an order to the outside source earlier (later) than the total lead time would indicate. The control parameter of the supplier is the length of this time buffer; hereafter, the time buffer is called the safety lead time. In this paper, the control parameter of the adaptive inventory control model is designed to adapt using an action-reward based learning approach, one of the reinforcement learning techniques (Sutton and Barto 1998). In the context of action-reward based learning, the learner or decision maker is called the agent, and it interacts with a non-static control domain. These two interact continuously, the agent selecting actions and the domain responding to those actions and giving rise to rewards, numerical values that are the inputs to the performance measure of the agent. The agent selects future actions based on the updated performance measure. In the adaptive inventory control model, the decision maker is the supplier, and the control domain is the two-stage supply chain wide inventory. The performance measure is the average inventory-related cost incurred during lead times, and an action corresponds to a safety lead time.
2 An Adaptive Inventory Control Model We define the following notation to explain the adaptive inventory control model.

L0: lead time of the supplier
Li: transportation lead time of retailer i (i = 1, 2, ..., N)
Qi: order quantity of retailer i
sij: j-th safety factor of retailer i
stij: safety lead time of the supplier when safety factor sij is applied
σ̂ε(t): standard deviation of forecast errors estimated at inspection time t
Di(t) (D̂i(t)): (expected) customer demand at retailer i at inspection time t
Zi(t): inventory level (if Zi(t) ≥ 0) or shortage (if Zi(t) < 0) at site i at inspection time t (i = 0, 1, 2, ..., N)
hi: inventory holding cost per stock keeping unit (SKU) at site i (i = 0, 1, 2, ..., N)
li: shortage cost per SKU at site i (i = 0, 1, 2, ..., N)
Ci(t) = [Zi(t)]⁻ · li + [Zi(t)]⁺ · hi: chain wide inventory related cost at site i at inspection time t (i = 0, 1, 2, ..., N)
The supplier monitors the inventory position of retailer i (i = 1, 2, ..., N) and its sales history at each discrete inspection time. With the sales history data, the supplier updates the linear time series model D̂i(t′) = a0 + a1t′. This model is used for estimating the amount of customer demand at a future time t′. In the model, coefficients a0 and a1 are also updated using an exponential smoothing method (see Brown (1962) for the detailed update formulas). At each inspection time t, the inventory position of retailer i at future time t′ is defined as the inventory position observed at inspection time t minus the sum of estimated demands during the time interval between t and t′. Now suppose that the future time t′ is set to the time at which the inventory position of retailer i falls to zero. Then the JIT delivery policy can be briefly stated as follows:

    At inspection time t, if {(the time t′ at which the time series model predicts that the inventory position of retailer i reaches zero) − t} ≤ L0 + Li + stij, then the supplier issues an order of Qi to the outside source.    (1)
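The order-trigger check in (1) can be sketched as follows, assuming the linear demand forecast D̂i(t′) = a0 + a1t′ and a known current inventory position; the function names and the finite search horizon are illustrative choices, not part of the paper.

```python
def zero_crossing_time(inv_position, a0, a1, t, horizon=1000):
    """Earliest future time t' at which forecast demand depletes the inventory position."""
    remaining = inv_position
    for tp in range(t + 1, t + horizon):
        remaining -= max(a0 + a1 * tp, 0.0)  # forecast demand D-hat(t') = a0 + a1 * t'
        if remaining < 0:
            return tp
    return None  # position never depletes within the search horizon

def should_order(inv_position, a0, a1, t, L0, Li, st_ij):
    """Condition (1): order when predicted depletion falls within L0 + Li + st_ij."""
    tz = zero_crossing_time(inv_position, a0, a1, t)
    return tz is not None and (tz - t) <= L0 + Li + st_ij
```

For example, a position of 10 units with a flat forecast of 2 units per period is predicted to deplete 6 periods out, so an order is triggered only when L0 + Li + st_ij covers that interval.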
If the demand process is stationary and its variance is very small, the forecasting model will accurately estimate demand over the total lead time. As a result, the JIT delivery policy can replenish the retailers' inventories just as their inventory positions approach zero. A problem arises, however, if a retailer experiences a sudden genuine change in the underlying demand process, that is, a change in its mean and/or variance, causing demand to be overestimated (or underestimated). Forecasting errors can, of course, be reduced to some extent with more sophisticated time series models, but no model can fundamentally eliminate the errors generated by a change in the demand process. Safety lead time adjusts the order placement time. Suppose, for example, that demand is underestimated. Adding a positive safety lead time to the actual lead time forces the JIT delivery policy to place orders earlier than it would without the safety lead time. Since delivery of the ordered quantity still takes the actual lead time, a positive safety lead time has the effect of expediting the ordering process. Similarly, a negative safety lead time delays the ordering process, which is effective when demand is overestimated. The action-reward based learning approach is used to determine an appropriate safety lead time. In general, safety lead time can be obtained as a function of the lead time and the forecast error (Bernard 1999). Let S_i = {s_i1, s_i2, ..., s_ik} be the set of safety factors for retailer i. Then, at inventory replenishment time, the safety lead time st_ij corresponding to safety factor s_ij is derived from

Find st_ij such that Σ_{t'=t}^{t+st_ij} D̂_i(t') = s_ij × σ̂_ε(t) × (L_0 + L_i)
where the estimated standard deviation of the forecast errors, σ̂_ε(t), is commonly approximated as 1.25 × MAD (see Brown (1962) for a detailed justification of this approximation).
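The derivation above can be sketched as a simple forward search: starting at the inspection time, accumulate forecast demand until it reaches s_ij × σ̂_ε(t) × (L_0 + L_i). The sketch below is illustrative, not the authors' implementation; the function name, the zero-floor on forecasts, and the symmetric handling of negative safety factors are our assumptions.

```python
def safety_lead_time(a0, a1, t, s_ij, sigma_hat, L0, Li, max_steps=10_000):
    """Find the smallest st such that forecast demand accumulated from
    inspection time t reaches s_ij * sigma_hat * (L0 + Li).  A negative
    safety factor is handled by symmetry and yields a negative safety
    lead time (the order is delayed instead of expedited)."""
    if s_ij < 0:
        return -safety_lead_time(a0, a1, t, -s_ij, sigma_hat, L0, Li, max_steps)
    target = s_ij * sigma_hat * (L0 + Li)
    cumulative, st = 0.0, 0
    while cumulative < target and st < max_steps:
        cumulative += max(a0 + a1 * (t + st), 0.0)  # linear forecast, floored at 0
        st += 1
    return st
```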
J.-G. Baek, C.O. Kim, and I.-H. Kwon
Suppose that some s_ij is selected at an inventory replenishment time, and an order for retailer i is placed at inspection time t according to the JIT delivery policy specified in (1). The ordered quantity will be delivered to retailer i at time t + L_0 + L_i. The resulting cost is then reflected in the value C(s_ij) of s_ij as follows:

C_new(s_ij) = C_old(s_ij) + StepSize × [ (1/(L_0 + L_i)) × Σ_{t'=t}^{t+L_0+L_i} C_i(t') − C_old(s_ij) ]   (2)

In this model, because the inventory level of the supplier is always zero and the supplier centrally controls the whole supply chain, the inventory holding and shortage costs of the supplier are zero. However, since the objective is to minimize the total average cost of the supply chain, the supplier must select the safety lead time based on the retailers' costs. If C_new(s_ij) is reduced, then s_ij can be regarded as an appropriate safety factor for
the current demand trend. Hence, the chance of selecting s_ij at the next replenishment should be increased. Clearly, the safety factors with the lowest average costs should have the greatest chance of being selected in order to reduce the total cost of the supply chain. Therefore, the next safety factor is determined according to the following rule:

Pr{next safety factor = s_ij} = e^{1/C_new(s_ij)} / Σ_{j'=1}^{k} e^{1/C_new(s_ij')}   (3)
The complete inventory control procedure of the adaptive inventory control model is as follows.

Step 0. The supplier initially selects a safety factor s_ij for each retailer i (i = 1, 2, ..., N).
Step 1. At inspection time t, if order placement condition (1) is satisfied for retailer i, then the supplier issues an order of size Q_i to the outside source.
Step 2. When the ordered quantity Q_i arrives from the outside source, the supplier immediately delivers Q_i to retailer i.
Step 3. After retailer i receives the ordered quantity Q_i, the supplier updates C_new(s_ij) according to the learning formula in (2), selects the next safety factor according to the probabilistic rule in (3), sets s_ij = the next safety factor, and goes to Step 1.
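The learning formula (2) and the selection rule (3) used in the procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names are ours, and rule (3) assumes every C_new value is strictly positive.

```python
import math
import random

def update_value(c_old, step_size, observed_costs):
    """Learning formula (2): move C(s_ij) toward the average per-period
    retailer cost observed over the delivery lead time L0 + Li."""
    return c_old + step_size * (sum(observed_costs) / len(observed_costs) - c_old)

def select_factor(values, factors):
    """Probabilistic rule (3): softmax over 1/C_new, so safety factors with
    lower average cost are selected more often.  Assumes every C_new > 0."""
    weights = [math.exp(1.0 / values[s]) for s in factors]
    r = random.random() * sum(weights)
    cumulative = 0.0
    for s, w in zip(factors, weights):
        cumulative += w
        if r <= cumulative:
            return s
    return factors[-1]
```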
3 Simulation Based Experiment

The simulated supply chain consists of one supplier and four retailers. A different customer demand process is assumed for each retailer. The time interval for inspecting the retailers' inventory positions and customer demands is set to one day. The length of a simulation run is 5000 days. Given a specific demand process for each retailer, 20 simulation runs were performed and their average was measured. The set of safety factors for each
An Adaptive Inventory Control Model for a Supply Chain
retailer i (i = 1, 2, 3, 4) is defined as S_i = {−1, −0.75, −0.5, −0.25, 0, 0.25, 0.5, 0.75, 1}. Four experimental factors are considered: lead time, demand pattern, the lost_sales_cost/inventory_holding_cost ratio (L/H ratio), and the supplier_cost/retailer_cost ratio (S/R ratio). For each combination of factor levels, 20 simulation runs were performed and their average value was used for the performance comparison. For the lead time of retailer i, we consider three levels: short lead time (SL): L_i = 0.3 × cycle_i, normal lead time (NL): L_i = 0.6 × cycle_i, and long lead time (LL): L_i = 0.9 × cycle_i, where cycle_i = Q_i / demand rate_i and demand rate_i is the average demand per period. In this paper, we assume that customer demand is nonstationary: the mean of the normal distribution changes at every random interval T according to the rule mean_j = mean_{j−1} + slope.

In this rule, slope and T are randomly generated from the uniform distributions U(−sm, sm) and U(tu/2, tu), respectively. sm and tu characterize the nonstationarity of the demand process. In this experiment, we set the two parameters as follows: low nonstationarity (LN): sm = 1.0 and tu = 30; medium nonstationarity (MN): sm = 2.0 and tu = 15; and high nonstationarity (HN): sm = 4.0 and tu = 8. In general, since the lost sales cost per item is larger than the inventory holding cost per item, we consider three cases: small difference (L/H ratio = 5), medium difference (L/H ratio = 10), and large difference (L/H ratio = 20). Finally, we also consider two different cost structures between the supplier and the retailers: normal difference (S/R ratio = 1/3) and no difference (S/R ratio = 1). From Figs. 1 and 2, we can observe that, in most cases, the performance of the adaptive inventory control model (AM) proposed in this paper is better than that of a traditional inventory control model such as the (Q, R) model, which determines the reorder point using average customer demand. The reason is that the adaptive inventory control model adaptively adjusts the control parameter of the inventory policy as customer demand patterns change. As shown in Fig. 1-(c) and Fig. 2-(c), the performance advantage of the adaptive model over the traditional model (QR) is greater when the L/H ratio is high (e.g., L/H ratio = 20). This implies that AM is more effective when the lost sales cost per item is very large compared with the inventory holding cost. In addition, the effectiveness of AM becomes more obvious as the S/R ratio becomes larger (e.g., S/R ratio = 1), which
[Figure 1: three panels (a) L/H=5, (b) L/H=10, (c) L/H=20, plotting normalized average cost per period for AM and QR over each (nonstationarity, lead time) combination]
Fig. 1. Simulation results in the case of S/R=1/3
[Figure 2: three panels (a) L/H=5, (b) L/H=10, (c) L/H=20, plotting normalized average cost per period for AM and QR over each (nonstationarity, lead time) combination]
Fig. 2. Simulation results in the case of S/R=1
can be found in relationships between a supplier and retailers that belong to different companies. Fig. 2-(c) supports this result, which would accrue from the fact that the supplier never keeps inventory in the adaptive inventory control model.
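The nonstationary demand process used in the experiment (mean shifted by slope ~ U(−sm, sm) at every random interval T ~ U(tu/2, tu)) can be sketched as follows; the demand standard deviation sd and the floor at zero are illustrative assumptions, not values from the paper.

```python
import random

def nonstationary_demand(n_periods, mean0, sd, sm, tu, seed=None):
    """Demand whose mean jumps by slope ~ U(-sm, sm) at every random
    interval T ~ U(tu/2, tu), as in the LN/MN/HN experimental settings.
    The standard deviation sd and the floor at zero are our assumptions."""
    rng = random.Random(seed)
    mean = mean0
    next_change = rng.uniform(tu / 2, tu)
    demands = []
    for t in range(n_periods):
        if t >= next_change:
            mean += rng.uniform(-sm, sm)            # mean_j = mean_{j-1} + slope
            next_change = t + rng.uniform(tu / 2, tu)
        demands.append(max(rng.gauss(mean, sd), 0.0))  # no negative demand
    return demands
```

With (sm, tu) set to (1.0, 30), (2.0, 15), or (4.0, 8), this reproduces the LN, MN, and HN settings, respectively.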
4 Conclusions

In most cases, due to unpredictable customer needs and economic conditions, customer demands fluctuate over time, showing nonstationary patterns. To cope with this situation, we propose an adaptive, intelligent inventory control model under the assumption that the supplier can access online information about customer demand as well as the inventory position of each retailer. Applying an action-reward based learning approach, one of the reinforcement learning techniques, the control parameter of the inventory control model is designed to change adaptively as customer demand patterns change. A simulation based experiment was performed to evaluate the performance of the adaptive inventory control model.
References

1. Achabal, D.D., McIntyre, S.H., Smith, S.A., Kalyanam, K.: A decision support system for vendor managed inventory. Journal of Retailing 76 (2000) 430-454
2. Bernard, P.: Integrated Inventory Management. John Wiley & Sons (1999)
3. Brown, R.: Smoothing, Forecasting, and Prediction of Discrete Time Series. Prentice-Hall (1962)
4. Sutton, R., Barto, A.: Reinforcement Learning. MIT Press (1998)
5. Zhao, X., Xie, J.: Forecasting errors and the value of information sharing in a supply chain. International Journal of Production Research 40 (2002) 311-335
Context-Aware Product Bundling Architecture in Ubiquitous Computing Environments*

Hyun Jung Lee¹ and Mye M. Sohn²,**

¹ Sungkyun Institute of Management Research, Sungkyunkwan University, Myung Ryun 3-53, Chong No-Ku, Seoul 110-745, Korea
[email protected]
² Department of Systems Management Engineering, Sungkyunkwan University, 300, Chunchun-dong, Jangan-gu, Suwon, Kyunggi-do, 440-746, Korea
[email protected]

Abstract. We propose the Context-Aware PRoduct Bundling Architecture (CARBA). Various products must be easily and immediately integrated according to customers' changing requirements. In order to integrate information from various resources, such as airline and hotel reservation services, a semantic web service supporting an ontology based travel information system is required. CARBA is implemented as a semantic web service with several components for reconfiguring a bundle of traveling products, and it guarantees traveler mobility in ubiquitous computing environments.
1 Introduction

Mobile technology is constantly evolving to better fulfill mobile users' information service requirements, such as access to dispersed information on the web at any time and location [5]. Even so, it has remained difficult for highly mobile users to access changing context-dependent information and to obtain information integration services that reflect a changed context, such as an altered traveling schedule. CARBA is designed as a product bundling and semantic web service for handling unpredictable requirements for relevant and actionable information needed to perform the task at hand [2], with several components for integrating services or products. In order to overcome data and information heterogeneity among the services or products to be integrated, we adopt an ontology as in [1]. The target problem addressed here is travel product bundling [6]. In this paper, we describe the architecture of CARBA and the context-aware traveling product bundling procedure with an illustrative example. Finally, the paper concludes with a discussion of how CARBA works effectively and of future research issues.

* This work was supported by grant No. R01-2006-000-10303-0 from the Basic Research Program of the Korea Science & Engineering Foundation.
** Corresponding author.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 901 – 906, 2006. © Springer-Verlag Berlin Heidelberg 2006
2 Overall Architecture of CARBA

CARBA has three components to reconfigure a bundle of traveling products, as depicted in Figure 1.
Fig. 1. Architecture of CARBA
In the context identifier, the single product identifier receives context-aware requests from mobile agents and parses them, and the message interpreter classifies the variables from the single products into local and global variables using a rule base. In the rules/constraints (R/C) manager, the relevant R/C identifier and the global R/C generator determine the local and global relationships among variables, respectively. The composite R/C generator generates R/C within a bundled product. Finally, the conflict resolver resolves conflicts between variables, and the message handler communicates with the service requester's agent, creating a context reflective bundled product.
3 Context-Aware Traveling Product Bundling Procedure

3.1 Context-Aware Procedure

Two activities occur in the context-aware procedure: communication with the service providers via the single product identifier, and the classification of variables and rules/constraints by the message interpreter. The single product identifier decomposes the traveler's requirements, given in the form of HTML, into rules and facts. It adopts the Rule Identification Markup Language (RIML) to identify rules and data implied in HTML. In this paper, rules and facts are identified manually by a knowledge engineer, but Park suggests ontology based rule acquisition [4]. The identified rules and facts are used to decide on adequate service providers. An illustrative rule is as follows:

IF ((NewFact.variable HAS DepartureCity) AND (NewFact.variable HAS ArrivalCity) AND (NewFact.variable HAS DepartureDate))
THEN SingleProduct IS Airline
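A minimal sketch of how such rules might be evaluated by the single product identifier follows; the rule table, product names, and function names below are hypothetical illustrations, not part of CARBA.

```python
# Hypothetical rule table: a single product is triggered when the new fact
# contains all of its required variables (mirrors the IF/THEN rule above).
RULES = {
    "Airline": {"DepartureCity", "ArrivalCity", "DepartureDate"},
    "HotelReservation": {"Place", "CheckInDate", "CheckOutDate"},
}

def identify_single_products(new_fact_vars):
    """Return every single product whose rule fires on the given fact."""
    fact = set(new_fact_vars)
    return [name for name, required in RULES.items() if required <= fact]
```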
To reflect the traveler's changed requirements, the single product identifier may interact with an airline service using the above rule. A partial ontology for intelligent agents and interaction models is depicted in Figure 2. The message interpreter classifies the local
rules, constraints, and variables implied in the SOAP/RSML message. The SOAP/RSML message, a variation of a SOAP message, carries processing results as well as rules [3] and/or constraints which should be announced to service requesters. The message interpreter also classifies variables into global variables, which are shared among single products, and local variables, which are referenced only by a single product and may be related to a local rule itself, and it extracts the R/C.
Fig. 2. Partial Ontology for airline service
As illustrated in Figure 3, the variable "Age" is a global variable because it is referenced by the Airline, Hotel Reservation, and Transportation products. The variable "Route," however, is referenced only by the Airline product; it is a local variable, as are "Dinner" and "Class."

[Figure 3: global variables (e.g., Age, Date, Place, Available budget) shared across the Airline, Hotel Reservation, Transportation, and City-Guide products, and local variables (e.g., Route, Class, Dinner, Room-type, #-of-room, Bus, Train, Museum, Castle, Square) referenced by a single product]
Fig. 3. Local and global variables
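The classification in Figure 3 amounts to counting, for each variable, how many single products reference it. A sketch follows (the data structures and names are our own illustrations, not CARBA internals):

```python
from collections import defaultdict

def classify_variables(products):
    """Split variables into global (referenced by two or more single
    products) and local (referenced by exactly one), as in Figure 3."""
    referenced_by = defaultdict(set)
    for product_name, variables in products.items():
        for v in variables:
            referenced_by[v].add(product_name)
    global_vars = {v for v, ps in referenced_by.items() if len(ps) > 1}
    local_vars = {v for v, ps in referenced_by.items() if len(ps) == 1}
    return global_vars, local_vars
```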
3.2 Rules and Constraints Generating Procedure

There are rules and constraints between the configurable variables in a single product; they are defined in the product DB of each web service provider and are extracted in order to bundle several single products. In general, configurable variables have other relevant variables, according to rules and constraints. Therefore, the relevant rules and constraints identifier identifies the relationships among configurable variables which come from
several products. The global rules and constraints generator generates, for each single product to be bundled, global rules and constraints of two kinds: between a global variable of the bundled product and a local variable from a single product, and between a global variable of the bundled product and local variables from several single products. Figure 4 represents a bundled product combining the Airline and Hotel Reservation products. Local R/C represent relationships between local variables. Global R/C represent relationships between global and local variables within a single product. For instance, the local variable "AirFare" is relevant to the global variable "AvailableBudget." The local rules and constraints represent relationships among local variables only; they have no relationship with variables arising from other single products, such as "AirClass."
Fig. 4. A bundled product of “Airline” and “Hotel Reservation”
The composite rules and constraints generator generates global R/C. For instance, GC1 is the composition of the constraints Air_GC3 and Acc_GC3, which are sourced from the Airline product and the Hotel Reservation product, respectively. GC2 is the composition of the constraints Air_GC1, Air_GC2, Acc_GC1, and Acc_GC2. This generator creates new rules and constraints between the combined products.

3.3 Traveling Product Bundling Procedure

The conflict resolver resolves conflicts among the constraints of composite variables. The composite rule is then newly generated, and the context-reflective bundled product is expressed in XML. The message handler transmits the context-reflective bundled product to the traveler requesting the services. This overall product bundling procedure is repeated until the traveler is satisfied with the proposed context-reflective bundled product.
4 Evaluation

To evaluate the performance of the proposed CARBA, we conducted a simulation using the NetLogo software in which two types of product bundling system were designed. One
type uses CARBA to search for and bundle products, while the other is a general search system in which customers have to access several sites to bundle products. This experimental design is based on the expectation that CARBA contributes more to the performance of context-aware product bundling, which can be measured by the decrease in search costs. To add realism to our experiment, we assumed that there are 27 customers and 100 search sites. A customer agent can modify its requirements freely, up to 10 times. To evaluate the performance of CARBA, we tested the following null hypothesis: the search cost of bundling products with CARBA is equal to the cost obtained with the general search system. To show the validity of the proposed approach, we performed simulation experiments using several types of changing requirements. Performance is measured using the cost function of each agent as follows:

Cost of CARBA = ((wC1 × search cost of site + wC2 × no. of coordinations + wC3 × no. of bundlings) / (wC1 + wC2 + wC3)) × no. of requirement changes

Cost of General System = ((wG1 × search cost of sites + wG2 × coordination cost + wG3 × bundling cost) / (wG1 + wG2 + wG3)) × no. of requirement changes

wC1 and wG1 represent the weight of search, wC2 and wG2 the weight of coordination, and wC3 and wG3 the weight of bundling. A paired t-test with 161 pairs was conducted to verify whether the costs obtained from CARBA are significantly lower. We found t = 21.34; the test therefore rejects the null hypothesis, which means the search costs of bundling products with CARBA are significantly (p < 0.0001) lower than those of the general search system.
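The paired t-test used here compares, pair by pair, the costs obtained under CARBA and under the general search system. A sketch of the test statistic follows (the data values are illustrative; the paper's 161 cost pairs are not reproduced here):

```python
import math

def paired_t(x, y):
    """Paired t statistic for H0: mean(x_i - y_i) == 0, with n - 1 df."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```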
5 Conclusion

The ubiquitous computing world is emerging more rapidly than expected, and it is becoming clear that context-aware decision support for mobile users would provide significant benefits in most mobility situations, including the traveling product bundling problem. CARBA was designed as a semantic web service for bundling traveling products, a task that requires a high level of ubiquity in order to fulfill the changing requirements of travelers. With CARBA, it is expected that the high mobility issues of travelers are handled more effectively. To support this claim, an experimental situation is illustrated with a context-aware traveling product bundling problem consisting of travelers, service providers (serving processing results, local rules, and constraints of single products), and CARBA. We believe this research provides a foundation for realizing semantic web efforts faster, providing improved quality of service for mobile users in ubiquitous environments.
References

1. Caragea, D., et al.: Information Integration and Knowledge Acquisition from Semantically Heterogeneous Biological Data Sources. In: Proc. of the 2nd Int. Workshop on Data Integration in the Life Sciences (DILS'05), San Diego, CA (2005)
2. Choueiry, B.Y., Noubir, G.: On the Computation of Local Interchangeability in Discrete Constraint Satisfaction Problems. In: Proc. AAAI-98, pp. 326-333 (1998)
3. Lee, J.K., Sohn, M.: eXtensible Rule Markup Language. Communications of the ACM, Vol. 46, No. 5, pp. 59-64 (2003)
4. Lyytinen, K., Yoo, Y.: Research Commentary: The Next Wave of Nomadic Computing. Information Systems Research, Vol. 13, No. 4, pp. 377-388 (2002)
5. Weiser, M.: Ubiquitous Computing. IEEE Computer (1993)
6. Werthner, H., Ricci, F.: E-Commerce and Tourism. Communications of the ACM, Vol. 47, No. 12, pp. 101-105 (2004)
A Relaxation of a Semiring Constraint Satisfaction Problem Using Combined Semirings

Louise Leenen¹, Thomas Meyer², Peter Harvey¹, and Aditya Ghose¹

¹ Decision Systems Laboratory, School of IT and Computer Science, University of Wollongong, Australia
{ll916, pah06, aditya}@uow.edu.au
² National ICT Australia, School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
[email protected],
[email protected]

Abstract. The Semiring Constraint Satisfaction Problem (SCSP) framework is a popular approach for representing partial constraint satisfaction problems. In this framework, preferences (semiring values) can be associated with tuples of values from the variable domains. Bistarelli et al. [1] define an abstract solution to an SCSP, which consists of the best set of solution tuples for the variables in the problem. Sometimes this abstract solution is not good enough, and in that case we want to change the constraints so that we solve a problem that is slightly different from the original but has an acceptable solution. In [2] we propose a relaxation of an SCSP in which we define a measure of distance (a semiring value from a second semiring) between the original SCSP and a relaxed SCSP. In this paper we show how the two semirings can be combined into a single semiring. This combined semiring structure allows us to use existing tools for SCSPs to solve Combined Semiring Relaxations of SCSPs. At this stage our work is preliminary and needs further investigation to develop into a useful algorithm.
1 Introduction
The considerable interest in over-constrained problems, partial constraint satisfaction problems, and soft constraints is motivated by the observation that, for most real-life problems, it is difficult to offer a priori guarantees that the input set of constraints to a constraint solver is solvable. Many real-life problems are inherently over-constrained. In order to solve an over-constrained problem, we have to identify appropriate relaxations of the original problem that are solvable. Early approaches to such relaxations largely focussed on finding maximal subsets (with respect to set cardinality) of the original set of constraints that are solvable (such as Freuder and Wallace's work on the MaxCSP problem [3]). Subsequent efforts considered more fine-grained notions of relaxation, where entire constraints did not have to be removed from consideration ([4], [5], [6]). Bistarelli et al. [1] proposed an abstract semiring CSP scheme that generalised most of these earlier attempts, while making it possible to define several

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 907-911, 2006. © Springer-Verlag Berlin Heidelberg 2006
useful new instances of the scheme. The SCSP scheme assumes the existence of a semiring of abstract preference values, such that the associated multiplicative operator is used for combining preference values, while the associated additive operator is used for comparing them. An SCSP constraint assigns a preference value to every possible value assignment to the variables in its signature. These preferences implicitly define a relaxation strategy. In our previous paper [2] we define how an SCSP may be relaxed by introducing a mechanism by which we can minimally alter (or relax) the constraints of the problem. We also introduce a measure of distance between an original constraint and its relaxed version, modeled via a second semiring. Our aim in this paper is to show that we can combine the first semiring (whose values are the preference values associated with tuples of constraints) and the second semiring (whose values are the distance values between constraints and relaxed constraints) into a single semiring. With a single semiring we can resort to existing SCSP tools to solve Relaxed SCSPs.
2 The SCSP Framework and Relaxations of SCSPs
This section summarizes the SCSP framework of Bistarelli et al. [1], as well as the results in [2], where we propose a technique for relaxing the constraints of the original problem.

Definition 1. A c-semiring is a tuple S = ⟨A, +, ×, 0, 1⟩ such that
– A is a set with 0, 1 ∈ A;
– + is defined over (possibly infinite) sets of elements of A as follows: Σ({a}) = a for all a ∈ A, Σ(∅) = 0 and Σ(A) = 1, and Σ(∪ A_i, i ∈ I) = Σ({Σ(A_i), i ∈ I}) for all sets of indices I. (When + is applied to sets of elements, we use the symbol Σ.)
– × is a commutative, associative binary operation such that 1 is its unit element and 0 is its absorbing element, and × distributes over +.

The elements of the set A are the preference values to be assigned to tuples of values of the domains of constraints. Let ≤_S be a partial order over A: α ≤_S β iff α + β = β. 0 is the minimum element and 1 is the maximum element.

Definition 2. Consider a constraint system CS = ⟨S_p, D, V⟩ where S_p = ⟨A_p, +_p, ×_p, 0_p, 1_p⟩ is a c-semiring, V is an ordered finite set of variables, and D is a finite set of allowed values for the variables in V. A constraint over CS is a pair c = ⟨def_c^p, con_c⟩ with con_c ⊆ V and def_c^p : D^k → A_p (k is the cardinality of con_c). A Semiring Constraint Satisfaction Problem (SCSP) over CS is a pair P = ⟨C, con⟩ where C is a finite set of constraints over CS and con = ∪_{c∈C} con_c. ⟨def_c1^p, con_c⟩ ∈ C and ⟨def_c2^p, con_c⟩ ∈ C implies def_c1^p = def_c2^p.

Definition 3. Given a constraint system CS = ⟨S_p, D, V⟩ where S_p = ⟨A_p, +_p, ×_p, 0_p, 1_p⟩, and two constraints c1 = ⟨def_c1^p, con_c1⟩ and c2 = ⟨def_c2^p, con_c2⟩ over CS, their combination c1 ⊗ c2 is the constraint c = ⟨def_c^p, con_c⟩ with con_c = con_c1 ∪ con_c2 and def_c^p(t) = def_c1^p(t ↓_{con_c1}^{con_c}) ×_p def_c2^p(t ↓_{con_c2}^{con_c}).
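Definitions 1-3 can be made concrete with the fuzzy c-semiring ⟨[0,1], max, min, 0, 1⟩, one of the standard instances of the scheme. The sketch below (our own helper names; binary domains for brevity) combines two constraints as in Definition 3:

```python
from itertools import product

class Semiring:
    """A c-semiring <A, +, x, 0, 1> given by its two operators and constants."""
    def __init__(self, plus, times, zero, one):
        self.plus, self.times, self.zero, self.one = plus, times, zero, one

# The fuzzy c-semiring <[0,1], max, min, 0, 1>.
FUZZY = Semiring(max, min, 0.0, 1.0)

def combine(c1, c2, s, domain=(0, 1)):
    """Def. 3: c1 (x) c2 over the union of the two variable sets.  Each
    constraint is a pair (variables, definition), where the definition maps
    value tuples (in variable order) to semiring values."""
    v1, d1 = c1
    v2, d2 = c2
    con = sorted(set(v1) | set(v2))
    proj = lambda t, vs: tuple(t[con.index(v)] for v in vs)  # tuple projection
    return con, {t: s.times(d1[proj(t, v1)], d2[proj(t, v2)])
                 for t in product(domain, repeat=len(con))}
```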
See [1] for the definition of the projection t ↓_{W'}^{W} of a tuple t from a set of variables W to a set W', and for the definition of the projection c ⇓_I of a constraint c = ⟨def_c^p, con_c⟩ over a set I of variables. A solution to an SCSP is a single constraint formed by the combination of all the original constraints. An abstract solution consists of the set of k-tuples of D whose c-semiring values are maximal w.r.t. ≤_Sp.

Definition 4. Given an SCSP P = ⟨C, con⟩ over a constraint system CS, the solution of P is a constraint Sol(P) = ⊗(C) = ⟨def_c^p, con⟩ where ⊗(C) = c1 ⊗ c2 ⊗ ... ⊗ cn with C = {c1, ..., cn}. The set ASol(P) = {⟨t, v⟩ | def_c^p(t) = v and there is no t' such that v <_Sp def_c^p(t')} is the abstract solution and ASolV(P) = {v | ⟨t, v⟩ ∈ ASol(P)} contains the maximal preference values.

Definition 5. [7] Let a good enough (abstract) solution for an SCSP P be one such that some element of ASolV(P) is in the region β̂, where β̂ = {γ ∈ A_p : β ≤_Sp γ}. If ASolV(P) ∩ β̂ = ∅, we want to find a relaxation P' of P such that ASolV(P') ∩ β̂ ≠ ∅. P' should be as close to the original P as possible.

Definition 6. A constraint c_j = ⟨def_j^p, con_j⟩ is called a c_i-weakened constraint of the constraint c_i = ⟨def_i^p, con_i⟩ iff the following hold: con_i = con_j; for all tuples t, def_i^p(t) ≤_Sp def_j^p(t); and for every two tuples t1 and t2, if def_i^p(t1) <_Sp def_i^p(t2), then def_j^p(t1) <_Sp def_j^p(t2).

Definition 7. Given a constraint system CS = ⟨S_p, D, V⟩ and an SCSP P = ⟨C, con⟩, for each c ∈ C let W_c be the set containing all c-weakened constraints, i.e. W_c = {c_j | c_j is a c-weakened constraint}. Let S_d = ⟨A_d, +_d, ×_d, 0_d, 1_d⟩ be a c-semiring and wdef_c^d : W_c → A_d be any function such that the following hold: wdef_c^d(c_j) = 0_d iff c_j = c; for all c_i, c_j ∈ W_c, if for all tuples t def_i^p(t) ≤_Sp def_j^p(t), then wdef_c^d(c_i) ≤_Sd wdef_c^d(c_j); and if there exists one tuple t such that def_i^p(t) <_Sp def_j^p(t) and for all tuples s we have def_i^p(s) ≤_Sp def_j^p(s), then wdef_c^d(c_i) <_Sd wdef_c^d(c_j).

Definition 8.
– The c-weakened constraint c_i is closer to c than the c-weakened constraint c_j iff wdef_c^d(c_i) <_Sd wdef_c^d(c_j).
– The c-weakened constraint c_i is no closer to c than the c-weakened constraint c_j iff wdef_c^d(c_j) ≤_Sd wdef_c^d(c_i).
– The c-weakened constraints c_i and c_j are incomparable w.r.t. closeness to c iff neither wdef_c^d(c_i) ≤_Sd wdef_c^d(c_j) nor wdef_c^d(c_j) ≤_Sd wdef_c^d(c_i).

The function wdef_c^d assigns a distance value from the set of the c-semiring S_d to each c-weakened constraint, and is restricted as follows. Let c_ik be a c_i-weakened constraint, and c_jm and c_jn be c_j-weakened constraints. If wdef_cj^d(c_jm) <_Sd wdef_cj^d(c_jn), then wdef_ci^d(c_ik) ×_d wdef_cj^d(c_jm) <_Sd wdef_ci^d(c_ik) ×_d wdef_cj^d(c_jn).

Definition 9. An SCSP P' = ⟨C', con⟩ is a d-relaxation of the SCSP P = ⟨C, con⟩, where S_d = ⟨A_d, +_d, ×_d, 0_d, 1_d⟩, iff there is a bijection f : C → C' such that for all c ∈ C, f(c) is a c-weakened constraint. Let R(P) = {P' | P' is a d-relaxation of P}, and R_β̂(P) = {P' ∈ R(P) | ASolV(P') ∩ β̂ ≠ ∅}.
A d-relaxation P' = ⟨C', con⟩ of P = ⟨C, con⟩ is such that every c-weakened constraint c' ∈ C' is the closest possible to the constraint c ∈ C while the abstract solution of P' is still good enough (w.r.t. β̂).

Definition 10. Given a d-relaxation P' = ⟨C', con⟩ of an SCSP P = ⟨C, con⟩ such that P' ∈ R_β̂(P), let d(P') = ×_d_{c∈C} wdef_c^d(f(c)) be the distance between P and P'. The set MR_β̂(P) = {P' ∈ R_β̂(P) | there is no P'' ∈ R_β̂(P) such that d(P'') <_Sd d(P')} contains the relaxations closest to P.
3 A Combined Semiring
Definition 11. Suppose S_A = ⟨A, ⊕_A, ⊗_A, 0_A, 1_A⟩ and S_B = ⟨B, ⊕_B, ⊗_B, 0_B, 1_B⟩ are two c-semirings. Let a Combined C-Semiring be S_U = ⟨U, ⊕_U, ⊗_U, 0_U, 1_U⟩ with U = {⟨a_1, ..., a_k, b⟩ | a_i ∈ A and b ∈ B} for some fixed non-negative integer k. For u1, u2 ∈ U with u1 = ⟨a_11, ..., a_1k, b_1⟩ and u2 = ⟨a_21, ..., a_2k, b_2⟩, the following statements hold.
– u1 ⊗_U u2 = ⟨a_11 ⊗_A a_21, ..., a_1k ⊗_A a_2k, b_1 ⊗_B b_2⟩.
– u1 ⊕_U u2 = ⟨a_11 ⊕_A a_21, ..., a_1k ⊕_A a_2k, b_1 ⊕_B b_2⟩.
– 0_U = ⟨a_1, ..., a_k, b⟩ with a_i = 0_A for every i and b = 0_B, and 1_U = ⟨a_1, ..., a_k, b⟩ with a_i = 1_A for i ∈ {1, ..., k} and b = 1_B.
– A pre-order ≤_U over the set U is defined as u1 ≤_U u2 iff b_1 ≤_B b_2.

Definition 12. Let P = ⟨C, con⟩ be an SCSP over a constraint system CS = ⟨S_p, D, V⟩ and P' = ⟨C', con⟩ be a d-relaxation of P. A Combined Semiring Relaxation of P is a tuple ⟨P', g⟩ with S_U = ⟨U, ⊕_U, ⊗_U, 0_U, 1_U⟩, where g : C × C' → U, i.e. for every c = ⟨def_c^p, con_c⟩ ∈ C and every c-weakened constraint c_r ∈ C', g(c, c_r) = u_cr with u_cr ∈ U. Assume all tuples of values of D are strictly ordered. Let u_cr = ⟨Pref_cr, b_cr⟩, where b_cr is the distance value associated with the constraint c_r, and Pref_cr = ⟨a_cr1, ..., a_crk⟩, where the a_cri, for i ∈ {1, ..., k}, are the preference values associated with the constraint c_r ⊗_p c_BEST. Let c_BEST = ⟨def_cBEST^p, con⟩ be a dummy constraint with def_cBEST^p(t) = 1_p for every tuple t over the set of variables con. Note that def_{c_r ⊗_p c_BEST}^p(t) = def_cr^p(t ↓_{con_c}^{con}) ⊗_p 1_p = def_cr^p(t ↓_{con_c}^{con}). The coordinates in Pref_cr are the preference values associated with the k tuples in the relaxed constraint (over the variables in the set con), while b_cr represents the distance between the relaxed problem P' and the original problem P.
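Definition 11's coordinate-wise operators can be sketched directly. The two instance semirings below, fuzzy preferences ⟨[0,1], max, min, 0, 1⟩ and a min/+ distance semiring ⟨[0,∞], min, +, ∞, 0⟩, are illustrative choices of ours, not prescribed by the paper.

```python
# Each c-semiring is given as (plus, times, zero, one).
FUZZY = (max, min, 0.0, 1.0)
DIST = (min, lambda a, b: a + b, float("inf"), 0.0)

def combined_semiring(sa, sb, k):
    """Def. 11: values are tuples (a_1, ..., a_k, b).  Both operators act
    coordinate-wise; the pre-order compares only the distance coordinate b."""
    pa, ta, za, oa = sa
    pb, tb, zb, ob = sb
    plus = lambda u, v: tuple(pa(x, y) for x, y in zip(u[:k], v[:k])) + (pb(u[k], v[k]),)
    times = lambda u, v: tuple(ta(x, y) for x, y in zip(u[:k], v[:k])) + (tb(u[k], v[k]),)
    leq = lambda u, v: pb(u[k], v[k]) == v[k]  # u1 <=_U u2 iff b1 <=_B b2
    zero = (za,) * k + (zb,)
    one = (oa,) * k + (ob,)
    return plus, times, zero, one, leq
```

Under the min/+ distance semiring, a smaller numeric distance sits higher in the induced order, so the top element 1_B is distance 0.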
Definition 13. Given a Combined Semiring Relaxation RP = ⟨P', g⟩ of an SCSP P = ⟨C, con⟩ and a d-relaxation P' = ⟨C', con⟩, the solution of RP is a constraint defined as RSol(RP) = ⊗(C') with g(⊗C, ⊗C') = u_CR. Suppose u_CR = ⟨Pref_CR, b_CR⟩ and Pref_CR = ⟨a_CR1, ..., a_CRk⟩. Then the abstract solution of RP is the set RASol(RP) = {⟨t, a⟩ | a ∈ Pref_CR, t is the tuple with which a is associated, and there is no ⟨t', a'⟩ such that a <_Sp a'}. Let RASolV(RP) = {a | ⟨t, a⟩ ∈ RASol(RP)}.
Let Rel(P) = {RP | RP is a Combined Semiring Relaxation of the SCSP P = ⟨C, con⟩}. Now we define a set containing the best Combined Semiring Relaxations with solutions that are good enough.

Definition 14. Suppose that for every RP = ⟨P', g⟩ ∈ Rel(P) with P' = ⟨C', con⟩ we have g(⊗C, ⊗C') = u_CR = ⟨Pref_CR, b_CR⟩, with Pref_CR = ⟨a_CR1, ..., a_CRk⟩. Let Rel_α̂(P) = {RP ∈ Rel(P) | RASolV(RP) ∩ α̂ ≠ ∅ and there is no relaxation RP' ∈ Rel(P) such that b_CR' <_Sd b_CR}.
4 Conclusion and Future Work
If the preference value associated with the abstract solution of an SCSP is not regarded as good enough, a suitable relaxation of the SCSP that has a good enough solution is found by adjusting the preferences associated with the tuples of some of the constraints (i.e. the c-semiring values of the first semiring) of the original SCSP. In other words, the constraints of the original problem are relaxed until the resulting problem has a satisfactory solution. Distance values (i.e. c-semiring values from a second semiring) are associated with each relaxed constraint so that different relaxations of a problem can be compared in terms of their distance to the original problem. In this paper we showed how to combine these two semirings into a single semiring. The combined semiring allows us to rely on existing techniques for solving SCSPs. At this stage we simply have a technical result and need to investigate the computational aspects of this process. We aim to develop techniques to efficiently calculate the solutions of a maximal Combined Semiring Relaxation of an SCSP.
References
1. Bistarelli, S., Montanari, U., Rossi, F.: Semiring-based constraint solving and optimization. Journal of the ACM 44(2) (1997) 201–236
2. Leenen, L., Meyer, T., Ghose, A.: Relaxations of semiring constraint satisfaction problems. In: Proceedings of the International Conference on Constraint Programming Preferences and Soft Constraints Workshop (SOFT-05). (2005)
3. Freuder, E.C., Wallace, R.J.: Partial constraint satisfaction. Artificial Intelligence 58 (1992) 21–70
4. Wilson, M., Borning, A.: Hierarchical constraint logic programming. Journal of Logic Programming 16 (1993) 277–318
5. Dubois, D., Fargier, H., Prade, H.: The calculus of fuzzy restrictions as a basis for flexible constraint satisfaction. In: Proc. of IEEE Conference on Fuzzy Systems. (1993)
6. Fargier, H., Lang, J.: Uncertainty in constraint satisfaction problems: a probabilistic approach. In: Proc. ECSQARU. (1993)
7. Ghose, A., Harvey, P.: Partial constraint satisfaction via semiring CSPs augmented with metrics. In: Proceedings of the Australian Joint Conference on AI. Volume 2557 of Lecture Notes in Computer Science, Springer (2002)
Causal Difference Detection Using Bayesian Networks Tomoko Murakami and Ryohei Orihara Corporate Research & Development Center 1, Komukai Toshiba-cho, Saiwai-ku Kawasaki 212-8582, Japan {tomoko.murakami, ryohei.orihara}@toshiba.co.jp Abstract. In market analysis, it is desirable to detect not only differences between consumer groups, or changes over time, but also their causal factors observed in consumer behavior, because this enables the marketer to take marketing actions. Although rule-discovery approaches can efficiently identify differences between groups or changes, it is still difficult to explain their causes. In this paper we propose an algorithm to detect causal differences in two Bayesian networks by search and probability inference. We perform experimental studies analyzing consumer behavior in purchasing personal computers.
1 Introduction
Understanding consumers’ purchasing behavior towards your own and/or competitors’ products is important for marketing and customer relationship management. There has been interest in the discovery of knowledge related to consumer behavior using data mining methods since data acquisition technologies such as point-of-sale (POS) systems were developed. Detecting differences is one of the basic and important tasks in consumer behavior analysis. Difference is interpreted in two ways: one is difference between several groups, and the other is difference within a single group as it varies over attributes such as time or location. For example, the former is the difference in brand selection between male and female consumers, and the latter is the change in female consumers’ purchasing attitude over 3 years. In recent years contrast set mining, a technique for detecting differences between several groups, has been proposed and related studies have been reported [1,2,3]. There are several rule-discovery approaches specifically designed for identifying the differences between contrasting groups. They can efficiently detect differences between groups or changes, but it is still difficult to explain their causes. In market analysis, detecting not only differences between consumer groups, or changes, but also their causal factors observed in consumer behavior is desirable because it enables the marketer to take marketing actions. In this paper, we propose a method, Causal Difference Detection using Bayesian Networks (CDDBN), to identify causal differences in two models based on Bayesian networks. CDDBN realizes discovery of causal differences by search and probability inference in Bayesian networks. With it, it is possible to efficiently discover the causes of differences between consumer groups or of trend changes in the market. Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 912–917, 2006. © Springer-Verlag Berlin Heidelberg 2006
2 Related Works
Tian proposed a method of discovering causal relations from data, based on detection and interpretation of local spontaneous changes in the environment [4]. The method uses a χ2 test to investigate whether causal relations change over time. Although Tian showed how to discover changes of causal relations in data under a dynamic environment, the reason why the changes occur is not discussed. Dong and Li worked on the problem of discovering emerging patterns (EPs) [1], where an EP captures a difference in support values for an itemset between different data sets, based on association rule discovery [5]. Bay developed the STUCCO algorithm for mining contrast sets by combining statistical hypothesis testing with search [2]. They showed a reduction of processing time for efficient mining and summarized results by applying the algorithm to UCI data sets. Webb conducted experiments using Magnum Opus [6], STUCCO [2] and C4.5 [7] to mine the differences between contrasting groups [3]. They showed that Magnum Opus could successfully perform the task and also discussed how new and valuable contrast set mining is. With these rule-discovery approaches, it is possible to detect differences between two or more probability distributions, but still difficult to discover causal differences. Thus we can obtain answers for a query such as "How does group A differ from B?", but answers for a query such as "Why does group A differ from B?" cannot be returned.
3 Causal Difference Detection
We propose a causal difference detection algorithm, a Bayesian method to discover the factors which cause differences, by search and probability inference in Bayesian networks. The details of the CDDBN algorithm are shown in figure 1. The input is two Bayesian networks and a target probability. The two Bayesian networks are composed of the same graph structure but distinct probability distributions. The output is a report describing differences, and causal differences, in the target probability for the two Bayesian networks. Let us denote them as (G, P) and (G, P′), where G = (V, E) is a graph composed of variables V and edges E, and P and P′ are probability distributions. A trigger to start a search is user input of a target probability for analysis, for example, a probability P(c1) meaning that a variable C ∈ V takes value c1. The task is to discover differences in frequency or causal relation which cause differences in the target probability between (G, P) and (G, P′). Suppose that {C, L1, · · · , Ln} ⊆ V, where {L1, · · · , Ln} are the parents of C in G, taking arbitrary values. If the user inputs a target probability, for example P(ck), then search based on graph structure and probability inference in the Bayesian networks starts. A difference in the target probability, ΔP(ck), between (G, P) and (G, P′) is calculated by utilizing a parent variable Li with values {l1, . . . , lm} in the graph structure as follows. We obtain equation (1) by assigning P′(ck|li) = P(ck|li) + ΔP(ck|li) and P′(li) = P(li) + ΔP(li), and dropping the second-order term:

ΔP(ck) ≈ Σ_{i=1}^{m} ( P(ck|li) · ΔP(li) + ΔP(ck|li) · P(li) ).   (1)
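The first-order decomposition of equation (1) can be checked numerically on a toy example. All the numbers below are illustrative, for a parent L with two values l1, l2; the exact change in the target probability differs from the right-hand side of (1) only by the dropped second-order terms ΔP(ck|li)·ΔP(li).

```python
# Two networks' distributions over parent L = {l1, l2} (toy numbers).
P_l  = {"l1": 0.6, "l2": 0.4}          # P(l_i)  in (G, P)
P2_l = {"l1": 0.7, "l2": 0.3}          # P'(l_i) in (G, P')
P_c  = {"l1": 0.2, "l2": 0.5}          # P(c_k | l_i)
P2_c = {"l1": 0.3, "l2": 0.5}          # P'(c_k | l_i)

# Exact difference in the target probability, by total probability
p  = sum(P_c[l]  * P_l[l]  for l in P_l)
p2 = sum(P2_c[l] * P2_l[l] for l in P_l)
exact = p2 - p

# Right-hand side of equation (1): a frequency term P(c_k|l_i) * dP(l_i)
# plus a causal-relation term dP(c_k|l_i) * P(l_i), summed over i
approx = sum(P_c[l] * (P2_l[l] - P_l[l]) + (P2_c[l] - P_c[l]) * P_l[l]
             for l in P_l)

print(exact, approx)   # close: they differ by the second-order terms only
```

Here exact = 0.04 and approx = 0.03; the gap of 0.01 is exactly the second-order term Σ ΔP(ck|li)·ΔP(li).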
Input: Two Bayesian networks (G, P) and (G, P′), where G = (V, E) is a graph composed of variables (V) and edges (E).
Output: causal differences in probabilities between (G, P) and (G, P′).
Let D be a set of all values of V
Let N ⊆ V, v ⊆ D
Let th1, th2 be thresholds
Let report be a function for reporting causal differences
Function: reportDiff(G, N, v, D)
Begin
1.  for (each q ∈ D)
2.    if (q = v) return
3.  D ← D ∪ {v};
4.  if (P′(N = v) · log(P′(N = v)/P(N = v)) < th1) return
5.  flag ← false
6.  for (each parent Prt of N)
7.    for (each value k of Prt)
8.      if (P′(N = v|Prt = k) < th2) continue
9.      ΔPOccDiff ← P′(Prt = k) · log(P′(Prt = k)/P(Prt = k))
10.     if (ΔPOccDiff > th1)
11.       flag ← true
12.       report(N, v, Prt, k)
13.       reportDiff(G, Prt, k, D)
14. if (flag = true) return
15. for (each parent Prt of N)
16.   for (each value k of Prt)
17.     if (P′(Prt = k) < th2) continue
18.     ΔPRelDiff ← P′(N = v|Prt = k) · log(P′(N = v|Prt = k)/P(N = v|Prt = k))
19.     if (ΔPRelDiff > th1) report(N, v, Prt, k)
End
Fig. 1. Algorithm for CDDBN
Equation (1) indicates that the difference in the target probability is approximately obtained by calculating the frequency P(li), the causal relation P(ck|li), and the differences ΔP(li) and ΔP(ck|li) between (G, P) and (G, P′). If the difference in the target probability is greater than the threshold th1 in figure 1, the search for the causes is performed according to equation (1). Differences in causal relation are calculated according to the latter part of equation (1): the frequency P(li) is first calculated for each i (i = 1 . . . m), then ΔP(ck|li) is calculated for those i for which P(li) is greater than the threshold th2. Differences in frequency are similarly calculated according to the former part of equation (1): the causal relation P(ck|li) is first calculated for each i (i = 1 . . . m), then ΔP(li) is calculated for those i for which P(ck|li) is greater than the threshold th2. If ΔP(li) is greater than the threshold th1, the calculation described above is executed recursively for Li and Li’s parents in G. The search in the Bayesian networks (G, P) and (G, P′) continues until reaching a root node or visiting all parents of the target variable. CDDBN assumes that causal relations do not vary dramatically, so we fix the causal structure and introduce two Bayesian networks composed of the same graph structure but distinct probability distributions. If the causal structure is correctly defined as a Bayesian network, it is possible to efficiently detect even minor but critical causal differences observed in consumer behavior, avoiding redundant search, by following the causal structure. Furthermore, it is possible to detect rare events because a ratio of probabilities is used as the criterion for detecting causal differences.
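The weighted log-ratio score that the algorithm in figure 1 thresholds against th1 (lines 9 and 18) can be sketched as follows. The threshold values match those reported in the experiments later in the paper; the probabilities themselves are toy numbers, and the assignment of P′ as the "new" distribution is an assumption for illustration.

```python
import math

th1, th2 = 0.05, 0.3            # thresholds, as set in the experiments

def weighted_log_ratio(p_new, p_old):
    """p' * log(p'/p): the score thresholded against th1 in figure 1."""
    return p_new * math.log(p_new / p_old)

# Toy numbers for one parent value k of the target variable N = v.
p_cond_new = 0.6                     # P'(N = v | Prt = k), must pass the th2 gate
p_occ_new, p_occ_old = 0.45, 0.30    # P'(Prt = k) and P(Prt = k)

if p_cond_new >= th2:           # line 8 of figure 1: relation strong enough
    d_occ = weighted_log_ratio(p_occ_new, p_occ_old)
    if d_occ > th1:             # lines 9-12: report an occurrence difference
        print("occurrence difference reported:", round(d_occ, 3))
```

Because the score is a ratio of probabilities scaled by the new probability, even a rare event (small absolute probabilities) can produce a large score when the ratio is large, which is the property the paper relies on for detecting rare but critical differences.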
4 Causal Difference in Consumer Behavior
We conducted experiments on consumer behavior in the personal computer (PC) market with CDDBN. With it, we detect causal differences between two consumer groups, or changes in a single consumer group over time. Due to space limitations, however, we describe only the results of detecting causal differences between two consumer groups. We detected causal differences in selecting 6 major PC brands by consumer groups in Kanto and Kansai, the eastern and western regions of Japan respectively. We first prepared two consumer behavior models, one for each consumer group, then set the probability of selecting each PC brand as a target probability for analysis, that is, the trigger to start a search. The consumer behavior models are Bayesian networks constructed by utilizing knowledge possessed by marketers and survey data from regular investigations of the PC market. The survey data is obtained by means of questionnaires collected from 3000 respondents ranging in age from 20 to 59. We adopted 24 variables related to consumers’ profiles, groups, purchasing reasons and brand selection in the consumer behavior model. They include 3 unobserved variables introduced by marketers, related to the tendency to follow the people around oneself, the tendency to lead the market, and computer skills. Conditional probabilities for the observed variables in the Bayesian networks are computed from the frequency distribution in the data. The unobserved variables, on the other hand, are approximately computed by applying the expectation-maximization (EM) algorithm [8]. For the parameters of the EM algorithm, namely the number of iterations i of the expectation and maximization steps and the number of sample data N used to compute expected sufficient statistics, we set i = 100 and N = 2000 to converge to a local maximum. We constructed consumer behavior models by region and time in this way. We then applied CDDBN to the consumer behavior models.
The task was to detect causal differences by region or time. We experimentally set the two thresholds for detecting differences to th1 = 0.05 and th2 = 0.3 in CDDBN. We set th1 = 0.05 because an increase or decrease of approximately 5% from 10% is frequently observed as a difference in consumer behavior. We set th2 = 0.3 because high frequencies and strong causal relations are, from experience, to be expected in consumer behavior. In CDDBN, probabilities are computed by probability inference, where we use a variation of likelihood weighting [9]. As a result of applying the CDDBN algorithm, the probability of selecting brand C in Kanto was found to be larger than that in Kansai. CDDBN reports the detected differences and their causal differences. It was revealed that the factor causing the difference in the probability of selecting brand C is that consumers who do not mind a PC’s reputation are more numerous in Kanto than in Kansai. Additionally, we obtained two factors causing the difference in the number of consumers who do not mind a PC’s reputation ("reputation = 0"), as follows.
Factor 1. The number of consumers who purchase whatever she/he wants to buy, due to having an annual household income from 5 to 10 million yen, is greater in Kanto.
Factor 2. The number of consumers who have advanced values and are either female, or in their 10s or 40s, is greater in Kanto.
Our results have shown that CDDBN can detect causal differences in consumer behavior. Marketers in our company’s PC division evaluated the results as newly discovered and valuable knowledge. We also compared our results to those of a χ2 test investigating differences in frequencies and causal relations in the data. It was verified that half of the differences detected by CDDBN are significant between the datasets for the consumer groups in Kanto and Kansai. Since equation (1) indicates that the difference in the target probability is derived not only from differences in frequencies and causal relations but also from the frequencies and causal relations themselves, the χ2 test result does not necessarily show a lack of validity of CDDBN.
5 Conclusion
In this paper, we proposed a method, CDDBN, to identify causal differences in two models based on Bayesian networks. CDDBN realizes discovery of causal differences by search and probability inference in Bayesian networks. We conducted experimental studies on discovering the causes of differences between consumer groups in the personal computer market. We consider CDDBN a new technique for the contrast-discovery task with respect to causal difference detection. However, there is room for refinement. It would be interesting to investigate the issue of correcting differences in causal relations. It would also be interesting to extend CDDBN to apply to more than two Bayesian networks. We will continue to conduct further experiments and refine the algorithm.
References
1. Dong, G., Li, J.: Efficient mining of emerging patterns: Discovering trends and differences. Proc. of the Fifth International Conference on Knowledge Discovery and Data Mining, ACM Press (1999) 43–52
2. Bay, S.D., Pazzani, M.J.: Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery 5(3) (2001) 213–246
3. Webb, G.I., Butler, S., Newlands, D.: On detecting differences between groups. Proc. of the Ninth International Conference on Knowledge Discovery and Data Mining, ACM Press (2003) 256–265
4. Tian, J., Pearl, J.: Causal discovery from changes. Proc. of Uncertainty in Artificial Intelligence (2001) 512–521
5. Agrawal, R., Imielinski, T., Swami, A.: Mining associations between sets of items in massive databases. Proc. of the International Conference on Management of Data, ACM Press (1993) 207–216
6. Webb, G.I.: Magnum Opus version 1.3. Computer software [http://www.rulequest.com/]. Distributed by Rulequest Research (2001)
7. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA (1993)
8. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39 (1977) 1–38
9. Fung, R., Chang, K.C.: Weighting and integrating evidence for stochastic simulation in Bayesian networks. Proc. of Uncertainty in Artificial Intelligence (1989)
Tabu Search for Generalized Minimum Spanning Tree Problem Zhenyu Wang1 , Chan Hou Che2 , and Andrew Lim1,2 1
School of Computer Science & Engineering South China University of Technology, Guang Dong, P.R. China 2 Dept of Industrial Engineering and Logistics Management Hong Kong Univ of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
[email protected],
[email protected],
[email protected] Abstract. The Generalized Minimum Spanning Tree (GMST) problem requires spanning exactly one node from every cluster in an undirected graph. GMST problems are encountered in telecommunications network planning. A Tabu Search (TS) for the GMST problem is presented in this article. In our computational tests on 194 TSPLIB instances, TS found 152 optimal solutions. For those 42 unsolved instances, our algorithm has improved some previously best known solutions. Lower bounds of some unknown problems are improved by our heuristic relaxation algorithm.
1 Introduction
The Generalized Minimum Spanning Tree (GMST) problem is defined as follows: given an undirected graph G = (V, E), where V is the node set partitioned into clusters Vk, 1 ≤ k ≤ m, with Vi ∩ Vj = ∅ (i ≠ j), and E = {(i, j) | i ∈ Vk, j ∈ Vl, k ≠ l} is the edge set, we need to find a minimum spanning tree including exactly one node from each cluster. The GMST problem arises in telecommunications, where local area networks (LANs) must be connected with each other [1]. The GMST problem is NP-hard, as proved by Myung et al. [1]. Feremans et al. [2] described eight formulations for the GMST problem. In Golden et al. [5], a local search and a Genetic Algorithm (GA) were developed to provide solutions for instances with up to 226 nodes. Pop et al. [8] proposed a relaxation model for GMST with up to 240 nodes. Another version of GMST, which requires spanning at least one node from each node cluster, was studied by Dror et al. [4] and Feremans et al. [3]. Haouari et al. [9] proposed two stochastic heuristics and a Lagrangian-based lower bound for this GMST variant. This paper proposes a Tabu Search (TS) for the GMST problem; we also implement the GA proposed by Golden et al. [5] for comparison. The computational results show that TS is more effective than the GA. In addition, we develop a relaxation algorithm to estimate the lower bound (LB). The results are promising: the relaxation is able to find better LBs than previous results. The remainder of this paper is organized as follows. In Section 2, we introduce the Tabu Search. Then we describe a lower bounding heuristic method in Section 3. Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 918–922, 2006. © Springer-Verlag Berlin Heidelberg 2006
Computational results are reported in Section 4, and our conclusions follow in Section 5.
2 Description of the Heuristic
The TS algorithm is implemented in three phases: a Basic Tabu Search (BTS) phase, an intensification phase and a diversification phase. Algorithm 1 describes the BTS in pseudocode. The procedure randomly generates a feasible solution S = (v1, v2, ..., vm), vi ∈ Vi, as the initial solution. It then selects a cluster Ri and tries replacing the node chosen from Ri with every other node in Ri to generate new solutions. Our heuristic uses a tabu queue of fixed length l, which records the clusters visited in recent iterations. The algorithm stops when the number of iterations without any improvement exceeds kmax.

Result: Solution S = (v1, v2, ..., vm)
Data: G is the graph
Create initial feasible solution S;
repeat
  Randomly choose cluster Ri;
  if Ri ∉ Tabu then
    foreach node v in cluster Ri do
      // vi represents cluster Ri in S
      S′ ← S − vi + v;
      // MST(S) is the function that calculates the cost of solution S
      if MST(S′) < MST(S) then S ← S′;
    end
    add Ri to the Tabu list;
  end
until no improvement after kmax iterations;
Algorithm 1. Basic Tabu Search
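A minimal runnable sketch of the BTS above is given below, under several stated assumptions: the instance is a toy one with clusters of points in the plane, the MST cost is computed with Prim's algorithm on the complete graph over the chosen nodes, and a tabu iteration that hits a tabu cluster counts as a non-improving iteration. The cluster layout, kmax and tabu length are illustrative choices, not the paper's parameter settings.

```python
import random, math
from collections import deque

random.seed(0)
# Toy GMST instance: 4 clusters of 3 points each in the plane.
clusters = [[(random.random() * 10, random.random() * 10) for _ in range(3)]
            for _ in range(4)]

def mst_cost(nodes):
    """Prim's algorithm on the complete graph over the chosen nodes."""
    in_tree, rest = [nodes[0]], list(nodes[1:])
    cost = 0.0
    while rest:
        d, v = min((min(math.dist(u, w) for u in in_tree), w) for w in rest)
        cost += d
        in_tree.append(v)
        rest.remove(v)
    return cost

def bts(clusters, kmax=50, tabu_len=2):
    sol = [random.choice(c) for c in clusters]   # initial feasible solution
    best = mst_cost(sol)
    tabu = deque(maxlen=tabu_len)                # fixed-length tabu queue
    stall = 0
    while stall < kmax:
        i = random.randrange(len(clusters))      # randomly choose cluster R_i
        if i in tabu:
            stall += 1
            continue
        improved = False
        for v in clusters[i]:                    # try every node of R_i
            cand = sol[:i] + [v] + sol[i + 1:]
            c = mst_cost(cand)
            if c < best:                         # keep strictly better moves
                sol, best, improved = cand, c, True
        tabu.append(i)
        stall = 0 if improved else stall + 1
    return sol, best

sol, cost = bts(clusters)
print(cost)
```

The `deque(maxlen=...)` automatically evicts the oldest cluster index, which matches the fixed-length tabu queue described in the text.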
When designing the algorithm, we found that some nodes always appeared in the local optimal solutions. These nodes are more likely than others to appear in the global optimal solution. Therefore, we build a new search space generated from a list of local optimal solutions, the so-called "elite candidate list". We restart the BTS fifty times and select the best five solutions as elite candidates. An intensive search space G′ = (V′, E′), with V′i ⊆ Vi and E′ = {(vi, vj) | vi ∈ V′k, vj ∈ V′l, k ≠ l}, is generated by grouping these five solutions. To avoid solutions being trapped in local optima, we apply a threshold strategy to increase the diversity of the intensive search space, i.e. we multiply a random value with the number of differences between the subcluster and the old cluster; if the result is higher than the threshold, the subcluster is replaced by the old cluster. Finally, the overall algorithm is described in Algorithm 2.
Result: Solution S
Data: G is the graph
while i < 50 do
  S′ ← BTS(G);
  add S′ to CandidateList;
  i ← i + 1;
end
EliteList ← Top5(CandidateList);
while length of EliteList > 1 do
  // getIntensiveSpace is a function to get a new search space
  G′ ← getIntensiveSpace(EliteList);
  S′ ← BTS(G′);
  if MST(S′) < MST(S) then add S′ to EliteList;
  remove the largest one in EliteList;
end
Algorithm 2. Overall Tabu Search Algorithm
3 Lower Bound Study
We present a relaxation approach to generate the relaxation instance of the GMST problem.
Step 1. Generate the set of elite candidates T = {t1, t2, ..., tk} using the BTS described in Section 2.1. Each tj = (v1, v2, ..., vm), vi ∈ Vi, is a feasible solution. From these, the subgraph G′ = (V′, E′) mentioned in Section 2.2 can be obtained.
Step 2. Generate the set V″ = V − V′ and, for each cluster, the subset V″k = Vk − V′k.
Step 3. The graph G = (V, E) is divided into two different subgraphs, the elite and the non-elite graph. Choosing one of the subgraphs, we contract each cluster in this subgraph into a single node. The cost of an edge then becomes:

e″ = eij                                    if i, j ∉ V′
e″ = min{eij | j ∈ V′(nj)}                  if i ∉ V′, j ∈ V′
e″ = min{eij | i ∈ V′(ni), j ∈ V′(nj)}      if i, j ∈ V′

After this transformation, we obtain the relaxation of the graph G.
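The Step 3 cost transformation can be sketched as follows. As an assumption for illustration, the elite nodes of each cluster are merged into one super-node per cluster, and an edge touching a merged part gets the minimum cost over the merged elite nodes, matching the three cases above; the cluster data, node names and costs are all toy values.

```python
import itertools, math

clusters = {0: ["a0", "a1"], 1: ["b0", "b1"], 2: ["c0", "c1"]}
elite = {"a0", "b1"}                 # elite nodes V' drawn from the candidates
# Dummy symmetric edge costs keyed by unordered node pairs.
cost = {frozenset(p): i + 1.0
        for i, p in enumerate(itertools.combinations(
            [v for vs in clusters.values() for v in vs], 2))}

def node_id(v, k):
    """Elite nodes collapse into their cluster's super-node."""
    return ("super", k) if v in elite else v

relaxed = {}
# GMST edges exist only between distinct clusters, so iterate cluster pairs.
for k, l in itertools.combinations(clusters, 2):
    for u, v in itertools.product(clusters[k], clusters[l]):
        key = frozenset((node_id(u, k), node_id(v, l)))
        c = cost[frozenset((u, v))]
        # e'' = min over all original edges merged into this contracted edge
        relaxed[key] = min(relaxed.get(key, math.inf), c)

print(relaxed)
```

An edge between two non-elite nodes keeps its original cost (first case); an edge between a non-elite node and a super-node, or between two super-nodes, takes the minimum over the merged elite endpoints (second and third cases).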
4 Computational Study
TS and GA were coded in Java. All tests were run on a dual-core Xeon 3GHz server with 2GB of RAM. The set of instances in our experiment is a subset of the instances described in Fischetti et al. [6]. This subset includes the TSP instances from TSPLIB 2.1 (Reinelt [7]) having 48 ≤ |V| ≤ 318, where the instances with 48 to 226 nodes are identical to those in Feremans et al. [2] and Golden et al. [5]. We also considered the geographical problems in [2] and [5]. Fischetti et al. [6] provided two procedures to generate node clusters, center clustering and grid clustering. For grid clustering, there is a user-defined parameter μ. We set kmax = 50 and the length of the tabu list to 0.12|V|. The threshold was set to 1.5. We
Table 1. Computational results for TS and GA on the TSPLIB instances where the optimal solution is unknown
K |E| instances, 40 18841 41 19532 45 24650 46 24626 46 25036 53 33507 53 34028 60 43786 64 49363 instances, 67 19101 68 19826 75 24900 84 25118 81 25584 95 33845 101 34357 102 44114 108 49821 instances, 40 18772 41 19303 45 24726 50 24711 47 25129 63 33643 55 34016 69 43893 64 49320 instances, 32 18372 31 18872 35 24544 33 24355 34 24595 49 33487 43 33481 47 43463 49 49106 instances, 25 18149 21 17904 25 24300 23 23761 36 33208 27 32814 35 43054 36 48664
Previous TS LB/UB(%) best Soln CPU(s) center clustering 123 94.12 7044 7044 242 84 94.63 243 94 99.78 62315 62268 55515 55515 103 100.0∗ 787 101 90.22 – – 942 230 78.66∗ – 21886 267 89.02∗ – 20316 307 71.09∗ – 18501 350 84.6∗ grid clustering µ = 3 8283 8283 312 89.05∗ 293 293 158 90.44∗ 79019 79019 301 98.75∗ 62527 62527 207 87.22∗ – 935 262 89.3∗ – 1318 408 78.53∗ 29208 367 – – 23141 753 – – 24152 819 – – grid clustering µ = 5 94 91.07 7098 7098 232 232 93 90.52∗ 105 92.38 60659 60639 117 98.38 56721 56721 713 146 90.04 – – 1018 244 72.4∗ 21365 141 91.53 – – 18614 506 65.95∗ – 17696 304 78.13∗ grid clustering µ = 7 6501 6501 56 88.72∗ 203 45 91.13 203 53 98.95 50813 50813 44 100.0 48249 48249 626 55 84.66 – – 829 122 73.7∗ 20455 74 85.76 – 15255 106 76.17 – – 14931 114 79.63∗ grid clustering µ = 10 34 91.32 6185 6185 177 27 93.79 177 31 99.9 40339 40339 515 31 87.96 – – 655 73 73.44∗ 16554 64 97.46 – 11640 94 74.91 – 10139 136 76.72 –
GA Soln CPU(s) 7053 242 62504 55515 788 966 22028 20725 18788
149 118 165 171 171 259 258 392 437
8287 293 79408 62527 935 1329 29239 23743 24380
261 227 317 366 369 584 594 811 902
7098 232 60886 56721 714 1047 21513 19330 17879
134 111 158 191 175 334 258 475 429
6501 203 50889 48249 627 843 20461 15521 15191
88 70 109 101 95 226 183 258 282
6185 177 40339 515 679 16554 11781 10201
60 38 64 52 144 90 167 174
first tested our relaxation procedure on the 44 instances where the optimal solution is unknown. In our experiment, testing both types of relaxation with different lengths of lists (l = 5, 10, 20, 30, 40), we generated ten relaxation instances for each instance. In addition, we tested each instance without the relaxation procedure. We used the model described in Pop et al. [8], coded for CPLEX 9.0 via Java. The running time for each instance was 1 hour. In Table 1, asterisks mark the instances whose lower bound was improved. 19 out of the 44 instances were found to have better lower bounds. In addition, for 2 instances the optimal solution was obtained, so the number of instances with known optimal solutions increased to 152. We tested 194 instances in total. GA found 144 optimal solutions, while TS found all 152. Table 1 shows the results for the instances where the optimal solution is unknown. TS improves the solution for 3 instances and obtains the same solutions as those in [5] for all other instances.
5 Conclusions
In this paper, we provided a Tabu Search for the GMST problem. We also presented a relaxation procedure to estimate the lower bound for the GMST problem. Notably, our relaxation improved the lower bounds obtained in previous results. In addition, we solved instances with up to 318 nodes.
References
1. Myung, Y.S., Lee, C.H., Tcha, D.W.: On the generalized minimum spanning tree problem. Networks 26 (1995) 231–241
2. Feremans, C.: Generalized Spanning Trees and Extensions. PhD thesis, Institut de Statistique et de Recherche Opérationnelle, Université Libre de Bruxelles, Bruxelles, Belgium (2001)
3. Feremans, C., Labbé, M., Laporte, G.: The generalized minimum spanning tree problem: Polyhedral analysis and branch-and-cut algorithm. Networks 43 (2004) 71–86
4. Dror, M., Haouari, M., Chaouachi, J.: Generalized spanning trees. Eur. J. Oper. Res. 120 (2000) 583–592
5. Golden, B., Raghavan, S., Stanojević, D.: Heuristic search for the generalized minimum spanning tree problem. INFORMS J. Comput. 17(3) (2005) 290–304
6. Fischetti, M., Salazar-González, J.J., Toth, P.: The symmetric generalized traveling salesman problem. Oper. Res. 45 (1997) 378–394
7. Reinelt, G.: TSPLIB, a traveling salesman problem library. INFORMS J. Comput. 3 (1991) 376–384
8. Pop, P.C., Kern, W., Still, G.: A new relaxation method for the generalized minimum spanning tree problem. Eur. J. Oper. Res. 170 (2006) 900–908
9. Haouari, M., Chaouachi, J.S.: Upper and lower bounding strategies for the generalized minimum spanning tree problem. Eur. J. Oper. Res. 171 (2006) 632–647
Investigation of Brood Size in GP with Brood Recombination Crossover for Object Recognition Mengjie Zhang1,2 , Xiaoying Gao1,2 , Weijun Lou1 , and Dongping Qian2 1
School of Mathematics, Statistics and Computer Science Victoria University of Wellington, P. O. Box 600, Wellington, New Zealand 2 Artificial Intelligence Research Centre Agricultural University of Hebei, Baoding, China {mengjie, xgao, norman}@mcs.vuw.ac.nz, {zmj, qdp}@hebau.edu.cn Abstract. This paper investigates the brood size in the brood recombination crossover method in genetic programming for object recognition problems. The approach is examined and compared with the standard crossover operator on three object classification problems of increasing difficulty. The results suggest that the brood recombination method outperforms the standard crossover operator on all the problems in terms of classification accuracy. As the brood size increases, the effective performance of the system improves. Beyond a certain point, however, the effective performance no longer improves and the system becomes less efficient.
1 Introduction
Since the early 1990s, genetic programming (GP) [1,2] has been applied to a range of object recognition problems such as shape classification, face identification, and medical diagnosis [3,4,5,6]. While showing promise, current GP techniques are limited and frequently do not give satisfactory results on difficult classification tasks. One main problem is that the standard crossover operator is not sufficiently powerful to generate good solutions [2]. In the standard crossover operator, two sub-programs (crossover points) are randomly chosen from two parent programs, and two new programs are generated by simply swapping them. However, this totally random choice clearly cannot guarantee the best choice. To improve the standard crossover operator, Tackett [7] introduced the "brood recombination" method. In this method, a brood of size N is created for each crossover operation: the standard crossover operation is repeated N times on the same two parent programs selected from the population, generating 2N child programs. These child programs are then evaluated and ranked by fitness. The two programs with the best fitness are considered the "real" children of the parents and retained, while the other children are discarded. Clearly, the brood size N is a key parameter of this approach. It might be task dependent and related to evolutionary process parameters such as the number of generations. However, this parameter was not properly investigated in previous work. Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 923–928, 2006. © Springer-Verlag Berlin Heidelberg 2006
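Brood recombination, as Tackett describes it, can be sketched on a toy GP setup: programs are nested prefix lists, the standard subtree-swap crossover is repeated N times on the same two parents, and the two fittest of the 2N children are kept. The tree representation, target function and fitness below are illustrative stand-ins, not the paper's GP system.

```python
import random, copy

random.seed(1)

def subtree_paths(t, path=()):
    """Collect index paths to every subtree of a nested-list program."""
    paths = [path]
    if isinstance(t, list):
        for i, child in enumerate(t[1:], start=1):  # skip the operator at [0]
            paths += subtree_paths(child, path + (i,))
    return paths

def get(t, path):
    for i in path:
        t = t[i]
    return t

def set_(t, path, sub):
    if not path:
        return sub                       # swapping at the root replaces the tree
    parent = get(t, path[:-1])
    parent[path[-1]] = sub
    return t

def crossover(p1, p2):
    """Standard GP crossover: swap two randomly chosen subtrees."""
    c1, c2 = copy.deepcopy(p1), copy.deepcopy(p2)
    pt1 = random.choice(subtree_paths(c1))
    pt2 = random.choice(subtree_paths(c2))
    s1, s2 = copy.deepcopy(get(c1, pt1)), copy.deepcopy(get(c2, pt2))
    return set_(c1, pt1, s2), set_(c2, pt2, s1)

def evaluate(t, x):
    if t == "x":
        return x
    if isinstance(t, (int, float)):
        return t
    op, a, b = t
    va, vb = evaluate(a, x), evaluate(b, x)
    return va + vb if op == "+" else va * vb

def fitness(t):
    """Error against the toy target f(x) = x*x + 1 (lower is better)."""
    return sum(abs(evaluate(t, x) - (x * x + 1)) for x in range(-3, 4))

def brood_crossover(p1, p2, N):
    brood = []
    for _ in range(N):                   # repeat standard crossover N times
        brood.extend(crossover(p1, p2))
    brood.sort(key=fitness)              # rank the 2N children by fitness
    return brood[0], brood[1]            # keep the two fittest as "real" children

p1 = ["+", ["*", "x", "x"], 2]
p2 = ["*", ["+", "x", 1], "x"]
k1, k2 = brood_crossover(p1, p2, N=8)
print(fitness(k1), fitness(k2))
```

The cost of the operator grows with N because 2N children must be evaluated per crossover, which is exactly the effectiveness/efficiency trade-off this paper sets out to measure.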
924
M. Zhang et al.
The goal of this paper is to further analyse the brood size in the brood recombination crossover method. To do this, the brood recombination crossover with different brood sizes will be compared with the standard crossover operator in GP on three object classification problems of increasing difficulty. We will investigate whether a larger brood size can lead to better performance and whether there exists a certain point for the brood size beyond which the system performance will not be improved.
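As a concrete illustration, the brood recombination operator described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: the `crossover` and `fitness` callables (standard GP subtree crossover and training-set accuracy in the paper) are assumed to be supplied by the surrounding GP system.

```python
def brood_crossover(parent1, parent2, n_brood, crossover, fitness):
    """Brood recombination (Tackett [7]): repeat the standard crossover
    N times on the same two parents, keep only the two fittest children.

    crossover(p1, p2) -> (child1, child2) is one standard crossover;
    fitness(program) -> float, where higher is better."""
    children = []
    for _ in range(n_brood):
        c1, c2 = crossover(parent1, parent2)
        children.extend([c1, c2])
    # Rank the 2N children and keep the best two as the "real" offspring;
    # the remaining children are discarded.
    children.sort(key=fitness, reverse=True)
    return children[0], children[1]
```

With n_brood = 1 this reduces to a single application of the standard crossover operator, which is how the N = 1 rows of the experiments can be read.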
2 The Approach
2.1 Image Data Sets
Experiments were conducted on three different image data sets providing object classification problems of increasing difficulty. Sample images for each data set are shown in figure 1.
Fig. 1. Sample images in datasets: (a) Shape; (b) Coins; (c) Texture
The first data set (shape, figure 1a) was generated to give well-defined objects against a reasonably noisy background. The pixels of the objects were produced using a Gaussian generator with different means and variances for each class. Four classes of 600 small objects (150 per class) were cut out from the images to form the classification data set: dark circles, grey squares, light circles and noisy background. The second set of images (coin, figure 1b) contains scanned 10-cent New Zealand coins. The coins were located in different places with different orientations and showed either side (heads or tails), and the background was cluttered. Three classes of 500 objects were cut out from the large images to form the data set: head (160 cutouts), tail (160 cutouts) and background (180 cutouts). Compared with the shape data set, this classification problem is harder: although the objects are still regular, the noisy background and low resolution make the task quite difficult. The third set of images (figure 1c) contains four different kinds of texture images, photographed under natural light. The images are
Investigation of Brood Size in GP with Brood Recombination Crossover
925
taken from a web-based image database held by SIPI of USC [8]. The four texture classes are woollen cloth, wood grain, raffia and herringbone weave. Because they are quite similar in many respects, this classification task is expected to be more difficult than that of the coin data set. There are 900 sample cutouts from four large images, with 225 samples per class. This dataset is referred to as texture. For each data set, the objects were split equally into three subsets: one third for the training set, used directly for learning the classifiers; one third for the validation set, used to control overfitting; and one third for the test set, used to measure the performance of the learned program classifiers.
2.2 GP Settings
In this approach, we used the tree structure to represent genetic programs [1]. The ramped half-and-half method was used for generating programs in the initial population and for the mutation operator [2]. The proportional selection mechanism and the reproduction, crossover and mutation operators [1] were used in the learning and evolutionary process. Terminal Set and Function Set. In this approach, we use four simple features extracted from each data set as terminals. Given an object cutout image, the four pixel statistics, mean, standard deviation, skewness, and kurtosis, are calculated as features. In addition, we also used a constant terminal for all three tasks. The function set consists of the four standard arithmetic operators and a conditional operation: {+, −, ∗, /, if}. The +, −, and ∗ operators have their usual meanings of addition, subtraction and multiplication, while / represents "protected" division. If its first argument is negative, the if function returns its second argument; otherwise, it returns its third argument. Fitness Function. We used classification accuracy on the training set of object images as the fitness function. The output of a genetic program in the GP system is a floating point number. In this approach, we used a variant of the program classification map [6] to translate the single output value of a genetic program into a set of class labels. Parameters and Termination Criteria. In this approach, the population size is 300 for the Shape data set and 500 for the other two sets. The initial maximum program sizes for the three data sets are 3, 5 and 5, and can grow to 5, 6 and 8 during evolution. The crossover, mutation and reproduction rates used are 60%, 30% and 10%, respectively.
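The two non-standard primitives of the function set can be written down directly. A small sketch: the return value of protected division on a zero denominator is not specified in the text, so the 1.0 used here is an assumption, while the if semantics follow the description above.

```python
def protected_div(a, b):
    # "Protected" division: avoid a runtime error on division by zero.
    # The fallback value 1.0 is an assumption; the paper does not state it.
    return a / b if b != 0 else 1.0

def if_func(first, second, third):
    # If the first argument is negative, return the second argument;
    # otherwise return the third argument (as described in the text).
    return second if first < 0 else third
```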
The evolutionary process is terminated when the number of generations reaches 50, or earlier if the classification problem has been solved on the training set or the accuracy on the validation set begins to fall. To compare the results of the different brood sizes with the standard crossover operator, we use the classification accuracy, the training time and the number of generations to measure the performance of these methods. Each experiment was run 80 times, and the average results are presented in the following sections.
3 Results
To investigate the effect of the brood size in the brood recombination crossover method, we ran experiments on the three data sets using fixed brood sizes of 2, 4, 6, 8 and 10. The average results on the test set for these brood sizes, together with the standard crossover operator (N = 1), are presented in table 1.

Table 1. Results of brood recombination crossover with different fixed brood sizes

Brood     Shape                      Coin                       Texture
size N    Gens  Time(s)  Accu.(%)    Gens   Time(s)  Accu.(%)   Gens   Time(s)  Accu.(%)
 1        8.59  0.09     96.16       28.64  1.78     90.37      29.99  1.83     72.45
 2        5.26  0.10     98.25       21.88  2.50     92.42      26.01  3.31     76.68
 4        3.48  0.10     98.25       19.70  3.07     93.08      21.82  4.23     76.46
 6        3.01  0.11     98.12       17.59  3.53     92.82      23.69  6.09     79.82
 8        2.66  0.12     98.44       15.85  3.89     93.08      20.00  6.12     80.71
10        2.30  0.13     98.03       15.94  4.49     92.80      17.80  6.50     78.13
As shown in table 1, for all brood sizes investigated here, the brood recombination crossover method achieved better classification accuracy than the standard crossover operator on all the data sets. Although the brood recombination method used fewer generations in the evolutionary process than the standard crossover operator, the actual training time increased, mainly because the number of evaluations per generation in the brood recombination method is larger. From the effectiveness point of view, this method improves on the standard crossover in two ways. Firstly, it reduces the standard crossover operator's disruption of potential building blocks through multiple trials of searching for good crossover points. Secondly, it effectively adds a kind of hill-climbing search to the genetic beam search in GP. The results also show that different brood sizes led to different results. For the object classification problems investigated here, a brood size of 4–8 seems to be a good starting point. Further Analysis. Further inspection of the results reveals that as the brood size increases up to a certain number (4 to 8, depending on the data set), the classification accuracy increases. When the brood size exceeds this number, however, the accuracy achieved begins to fall. These results suggest that there exists a brood size that leads to the best performance for a particular task. We refer to this number as the brood-diversity point (or range). In the biological world, the chromosomes of a particular species are usually quite long and crossover can occur in multiple genes at different positions. Accordingly, a huge number of crossover points is available, which allows a large brood to produce distinct child chromosomes.
In most GP systems, however, the program size is limited by the maximum program size parameter. In addition, GP crossover chooses only a single point and swaps the subtrees of the parent programs. Accordingly, when the brood size increases beyond a certain number, the probability that crossover on the same two parent programs produces redundant programs becomes extremely high. In other words, when the brood size exceeds the brood-diversity point, the brood recombination crossover operator not only fails to produce distinct child programs, but also takes longer because of the extra evaluations. This results in a longer training time with unimproved or even slightly worse effectiveness, partly due to the possibility of premature convergence.
4 Conclusions
The goal of this paper was to investigate the effect of brood size in the brood recombination crossover operator in GP for object classification problems. The goal was successfully achieved by testing five different brood sizes. The approach was examined and compared with the standard crossover operator on three object classification problems of increasing difficulty. The experimental results suggest that the brood recombination method, with all the brood sizes investigated here, achieved better classification performance than the standard crossover operator, but at the cost of increased evolutionary training time. The results also suggest that different brood sizes usually result in different performance. As the brood size increases up to the brood-diversity point, the system's effectiveness improves; when the brood size exceeds this point, however, effectiveness no longer improves and the system becomes less efficient. Our results reveal that the brood size is closely related to the maximum program size parameter and to the specific task. In the future, we will investigate the relationship between the brood size, the program size and the number of generations.
Acknowledgement
This work was supported in part by the Marsden Fund of New Zealand and the University Research Fund (6/9) at Victoria University of Wellington.
References
1. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA (1992)
2. Banzhaf, W., Nordin, P., Keller, R.E., Francone, F.D.: Genetic Programming: An Introduction on the Automatic Evolution of Computer Programs and Its Applications. Morgan Kaufmann Publishers (1998)
3. Howard, D., Roberts, S.C., Brankin, R.: Target detection in SAR imagery by genetic programming. Advances in Engineering Software 30 (1999) 303–311
4. Song, A., Ciesielski, V., Williams, H.: Texture classifiers generated by genetic programming. In: Proceedings of the 2002 Congress on Evolutionary Computation CEC2002, IEEE Press (2002) 243–248
5. Tackett, W.A.: Genetic programming for feature discovery and image discrimination. In: Proceedings of the 5th International Conference on Genetic Algorithms, Morgan Kaufmann (1993) 303–309
6. Zhang, M., Ciesielski, V., Andreae, P.: A domain independent window-approach to multiclass object detection using genetic programming. EURASIP Journal on Signal Processing 2003(8) (2003) 841–859
7. Tackett, W.A.: Recombination, Selection and the Genetic Construction of Computer Programs. PhD thesis, University of Southern California, Department of Electrical Engineering Systems (1994)
8. Signal & Image Processing Institute, University of Southern California: http://sipi.usc.edu/services/database/database.cgi?volume=textures (accessed 22 July 2004)
An Immune Algorithm for the Optimal Maintenance of New Consecutive-Component Systems Y.-C. Hsieh1 and P.-S. You2 1
Department of Industrial Management, National Formosa University Huwei, Yunlin 632, Taiwan
[email protected] 2 Institute of Transportation and Logistics Engineering, National Chia-Yi University Chia-Yi 600, Taiwan
[email protected] Abstract. This paper has two main objectives: (1) to propose a more general class of consecutive-component systems, which generalizes both the typical consecutive-k-out-of-n:F systems and the two-dimensional consecutive-k-out-of-n:F systems; and (2) to propose an immune algorithm to investigate the optimal maintenance policy for the proposed consecutive-component systems. Numerical results are reported and compared with those of implicit enumeration.
1 Introduction The C(k, n: F) system consists of n linearly connected components, and it fails if and only if k or more consecutive components fail (Fig. 1).
Fig. 1. C(k, n: F) system
The two-dimensional C(k, n: F) system consists of n2 components in a square grid of side n, and it fails if and only if there is at least one square of side k (2 ≤ k ≤ n − 1) that contains all failed components (Salvia and Lasher [8]). The system has applications in various areas, e.g., safety monitoring systems, design of electronic devices, disease diagnosis, and pattern recognition (Hsieh and Chen [4]).
2 The New Proposed Consecutive-Component Systems
The proposed consecutive-component systems can be generally defined as systems with consecutive minimal cuts [6]. For convenience, we consider the following consecutive-component system in Fig 2 as an example.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 929 – 933, 2006. © Springer-Verlag Berlin Heidelberg 2006
This system contains 13
components, and the underlined groups in Fig. 2 denote the minimal cuts of the system. That is, there are 7 minimal cuts, namely CA={1,2,3}, CB={2,4}, CC={4,5}, CD={5,6,7}, CE={7,8,9}, CF={8,10,11}, and CG={11,12,13}.
Fig. 2. The consecutive-component system
There are several practical applications for this kind of system, e.g., safety monitoring systems. In Fig 2, we may treat the 13 components as 13 cameras in a specific building. Cameras 1, 2 and 3 are responsible for lane A, and lane A fails if and only if cameras 1, 2 and 3 have all failed. Similarly, cameras 11, 12 and 13 are responsible for lane G, and lane G fails if and only if cameras 11, 12 and 13 have all failed. The failure reliability of the safety monitoring system of Fig 2 is defined as the probability that one or more lanes have failed. Suppose that the units at 1, 2, 3 and 4 of Fig 2 require a minimal operational reliability of at least Ra, the units at 5, 6, 7, 8 and 9 require at least Rb, and the units at 10, 11, 12 and 13 require at least Rc.
2.1 The System Reliability of Consecutive-Component Systems
Suppose that C1, C2, C3, …, Ck are the minimal cuts of a specific consecutive-component system. The failure reliability of the system can be obtained by the disjoint subsets method. Consider the following terms:

P(C1)    (1)
P(C2) − P(C2C1)    (2)
⋮
P(Ck) − P(Ck(C1∪C2∪…∪Ck−1))    (3)
Thus P(C1∪C2∪…∪Ck) is the sum of the terms (1) to (3). Several advantages of this disjoint subsets method can be found in Hsieh [5].
2.2 Maintenance Policy
Over the past decades, the maintenance of systems/components has been one of the main issues (Wang [9]). As is well known, the replacement of systems/components is the most thorough approach, and there is a large body of research on this issue (Wang [9]). Following Flynn and Chung [1][2], we define the maintenance problem and notation for our proposed consecutive-component systems.
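The failure reliability of Section 2.1 can also be computed directly for independent components. The sketch below uses inclusion-exclusion over the minimal cuts, which yields the same quantity P(C1∪…∪Ck) as the disjoint subsets method; the function name and cut representation are illustrative, not from the paper.

```python
from itertools import combinations

def system_failure_probability(cuts, q):
    """P(at least one minimal cut has all of its components failed),
    for independent components with failure probabilities q[j]."""
    prob = 0.0
    for r in range(1, len(cuts) + 1):
        sign = (-1) ** (r + 1)
        for combo in combinations(cuts, r):
            union = set().union(*combo)   # intersection of cut events fails
            term = 1.0                    # iff every component in the union fails
            for j in union:
                term *= q[j]
            prob += sign * term
    return prob

# The Fig. 2 system: seven minimal cuts over 13 components.
FIG2_CUTS = [{1, 2, 3}, {2, 4}, {4, 5}, {5, 6, 7},
             {7, 8, 9}, {8, 10, 11}, {11, 12, 13}]
```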
An Immune Algorithm for the Optimal Maintenance
931
Assumptions. 1. Component i has mi maintenance policies, {0,1,…,mi-1}, where 0 denotes no maintenance, 1 ≤ i ≤ n . 2. The reliabilities for components are known and all components in the system are independent. 3. The total cost is the long run expected average cost per period.
Notations.
S : the component set of maintenance.
n : the number of components in the system.
pij : the reliability of component i under maintenance policy j, 1 ≤ i ≤ n, j ∈ Mi = {0,1,...,mi−1}.
qij : qij = 1 − pij, 1 ≤ i ≤ n, j ∈ Mi.
cij : the maintenance cost of component i under maintenance policy j, 1 ≤ i ≤ n, j ∈ Mi.
c0 : the failure cost of the system.
PF(S) : the failure probability under S.
The mathematical programming model for the maintenance problem is:

min_{S∈Ω} G(S) = ∑_{i∈S} ∑_{j∈Mi} qij cij + c0 PF(S)    (4)

s.t. 0 ≤ qij ≤ 1, 0 ≤ PF(S) ≤ 1, cij ≥ 0    (5)
Clearly, this proposed maintenance problem for consecutive-component systems is NP-hard (Flynn and Chung [1][2]).
Definition of Optimal Maintenance Policy (Flynn and Chung [1]; Flynn et al. [3]). Denote by Ω the set of all subsets of {1, 2, …, n}. Results in Chung and Flynn [1][2] ensure the existence of an optimal critical component policy (CCP). A CCP is a stationary policy whose decisions are determined by a critical component set S∈Ω: component j is replaced if and only if it has failed and j∈S. The detailed definitions of the critical component set and the optimal policy can be found in Chung and Flynn [1][2].
3 The Procedure of Immune Algorithm
The steps of the proposed immune algorithm are as follows:
Step 1. Randomly generate an initial population of strings (antibodies).
Step 2. Evaluate each individual in the current population and calculate its fitness value.
Step 3. Select the n individuals with the highest fitness values.
Step 4. Clone the n best individuals (antibodies) selected in Step 3. Note that the clone size for each selected individual is an increasing function of its affinity with the antigen.
Step 5. Apply the genetic operations, i.e., crossover and mutation, to the set of clones from Step 4 (Michalewicz [7]).
Step 6. Calculate the fitness values of the new individuals (antibodies) from Step 5. Individuals that are superior to those in the memory set replace the inferior ones. When the memory set is updated, individuals whose structures are too similar are eliminated.
Step 7. Check the stopping criterion; if it is not met, go to Step 2. Otherwise, go to the next step.
Step 8. Stop. The optimal or near-optimal solution(s) can be obtained from the memory set.
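Steps 1-8 above can be sketched as a generic loop. This is a hedged sketch, not the authors' MATLAB implementation: the clone, mutate and crossover operators and the similarity measure (a simple Hamming ratio here) are illustrative placeholders supplied by the application.

```python
import random

def similarity(a, b):
    # Illustrative structural similarity: fraction of matching positions.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def immune_algorithm(init_pop, fitness, clone, mutate, crossover,
                     n_best=10, memory_size=5, generations=100,
                     similarity_threshold=0.95):
    population = list(init_pop)                                   # Step 1
    memory = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)    # Steps 2-3
        best = ranked[:n_best]
        # Step 4: clone size grows with affinity (fitness).
        clones = [c for ind in best for c in clone(ind, fitness(ind))]
        random.shuffle(clones)
        offspring = []                                            # Step 5
        for a, b in zip(clones[::2], clones[1::2]):
            c1, c2 = crossover(a, b)
            offspring += [mutate(c1), mutate(c2)]
        # Step 6: keep superior, mutually dissimilar individuals in memory.
        for ind in offspring:
            if all(similarity(ind, m) < similarity_threshold for m in memory):
                memory.append(ind)
        memory = sorted(memory, key=fitness, reverse=True)[:memory_size]
        population = offspring or population                      # Step 7
    return memory                                                 # Step 8
```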
4 Numerical Results and Discussions
For C(k, n: F) test problems, we let Mi={0,1,2,3}, pi0=0, pi1=0.9, pi2=0.95, pi3=0.97, ci0=0, ci1=2, ci2=3, ci3=6 for all i when k is not fixed. Numerical results are reported in Table 1. For consecutive-component test problems, we consider the system in Fig 2 and let Mi={0,1,2,3}, pi0=0, pi1=0.9, pi2=0.97, pi3=0.99, ci0=0, ci1=2, ci2=4, ci3=15 for all i, with Ra=0.98, Rb=0.95, Rc=0.98. Numerical results for various c0 are reported in Table 2. All results were computed on a Pentium IV 2.8 GHz PC with programs coded in MATLAB 6.5. From Tables 1 and 2, we observe: (1) the CPU time of implicit enumeration increases exponentially with n, whereas the increase is far less drastic for the proposed immune algorithm; (2) for almost all test problems, the proposed immune algorithm obtains the same optimal solutions as implicit enumeration; (3) Table 2 shows the top-5 multiple optimal solutions for the consecutive-component test problems; clearly, higher levels of maintenance tend to be used as the failure cost c0 increases.
Table 1. Numerical results of C(k, n: F) problems (k is not fixed)
n   Min   Immune Algorithm                            Implicit Enumeration                        A/B        No. Comb.
    Cuts  G(S)    R       Maintenance    CPU(s)       G(S)    R       Maintenance    CPU(s)
                          policy         A                            policy         B
4   MC1   0.6262  0.9971  0232           0.828        0.6262  0.9971  0232           0.688      1.203488   256
8   MC2   1.2976  0.9944  20330332       1.125        1.2976  0.9944  02330332       204.656    0.005497   65536
12  MC3   1.7455  0.9933  033303303302   6.562        1.7445  0.9927  023303303302   75651.6    8.67E-06   16777216

No. Comb. = Number of combinations of all maintenance policies.
MC1={1,2,3},{3,4}. MC2={1,2,3},{3,4},{4,5,6},{6,7},{7,8}. MC3={1,2,3},{3,4},{4,5,6},{6,7},{7,8,9},{9,10},{10,11,12}.
Table 2. Numerical results of consecutive-component problems with various c0
(memory set=5, mutation=0.85, crossover=0.9, generation=5000, affinity=0.1)

c0=50: G(S)=1.1735, R=0.9981, CPU=112.31 s; top-5 maintenance policies (Ra/Rb/Rc):
  2303303200302 (0.9991/0.9990/0.9994), 0323303200320 (0.9991/0.9990/0.9994), 2303303200302 (0.9993/0.9988/0.9994), 2303303200320 (0.9991/0.9990/0.9994), 0323303200302 (0.9991/0.9990/0.9994)
c0=30: G(S)=1.1295, R=0.9973, CPU=111.64 s; top-5 maintenance policies (Ra/Rb/Rc):
  2302302200320 (0.9991/0.9982/0.9994), 2302203200320 (0.9985/0.9982/0.9994), 2302203200302 (0.9985/0.9982/0.9994), 2302302200302 (0.9991/0.9982/0.9994), 0322302200320 (0.9991/0.9982/0.9994)
c0=10: G(S)=0.9940, R=0.9606, CPU=107.28 s; top-5 maintenance policies (Ra/Rb/Rc):
  0300300300030 (0.9801/0.9801/0.9801), 3003003000300 (0.9801/0.9703/0.9900), 0303003000300 (0.9801/0.9703/0.9900), 0300300300300 (0.9801/0.9801/0.9900), 0300300030300 (0.9801/0.9703/0.9900)
5 Conclusions
In this paper, (i) we have proposed a more general class of consecutive-component systems, and (ii) we have proposed a new immune algorithm to investigate the optimal maintenance policy for the proposed systems. Numerical results have shown superior performance on all test problems. It should be emphasized that the proposed immune algorithm can obtain multiple optimal solutions for all test problems and can therefore provide decision makers with alternative choices.
References
1. Flynn, J., Chung, C.S.: A heuristic algorithm for determining replacement policies in consecutive k-out-of-n systems. Computers and Operations Research 31 (2004) 1335-1348
2. Flynn, J., Chung, C.S.: A branch and bound algorithm for computing optimal replacement policies in consecutive k-out-of-n systems. Naval Research Logistics 49 (2002) 288-302
3. Flynn, J., Chung, C.S., Chiang, D.: Replacement policies for a multicomponent reliability system. Operations Research Letters 7 (1988) 167-172
4. Hsieh, Y.C., Chen, T.C.: Reliability lower bounds for two-dimensional consecutive-k-out-of-n: F systems. Computers and Operations Research 31 (2004) 1259-1272
5. Hsieh, Y.C.: Alternative approach for coherent system reliability with minimal cuts/paths. Working paper (2006)
6. Hsieh, Y.C., Chen, T.C.: The reliability of systems with stair-type consecutive minimal cuts. Working paper, submitted (2006)
7. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin Heidelberg New York (1994)
8. Salvia, A.A., Lasher, W.C.: 2-dimensional consecutive-k-out-of-n: F models. IEEE Transactions on Reliability 39 (1990) 382-385
9. Wang, H.: A survey of maintenance policies of deteriorating systems. European Journal of Operational Research 139 (2002) 469-489
Immune Genetic Algorithm and Its Application in Optimal Design of Intelligent AC Contactors* Li-an Chen1 and Peiming Zhang2 1
Department of Electronic Engineering, Xiamen University of Technology, P.R.C 2 Department of Electrical Engineering, Fuzhou University, P.R.C
[email protected],
[email protected] Abstract. An application of the Immune Genetic Algorithm (IGA) to the optimal design of intelligent AC contactors is presented in this paper. Besides the stochastic global searching ability of the Simple Genetic Algorithm (SGA), IGA incorporates mechanisms that exist in the biological immune system, such as immune memory, immune regulation and antibody diversity. The simulation results show that IGA overcomes the premature convergence of SGA and improves global searching efficiency and capability. The algorithm has been successfully applied to the optimal design of intelligent AC contactors. Keywords: Evolutionary computing, optimal design, contactor.
1 Introduction
With the rapid development of artificial intelligence technology, several stochastic methods originating from biological evolutionary theories, such as Evolutionary Algorithms and Genetic Algorithms (GA), are widely used in the area of optimal design. Unlike deterministic methods, stochastic methods have more opportunities to converge to the global optimum. In particular, GA has been applied to solving the design and optimization problems of electromagnetic devices. However, the Simple Genetic Algorithm (SGA) has its limitations: it is difficult to overcome premature convergence, and its search efficiency is low. To overcome these drawbacks of SGA, various modified methods have been proposed. The Immune Genetic Algorithm (IGA) is a modified genetic algorithm based on the theory of the biological immune system. The biological immune system is a parallel, distributed, self-organizing and highly adaptive complex system with special characteristics such as immune recognition, immune memory and immune regulation [1]-[2]. IGA not only retains the stochastic global searching ability of SGA, but also introduces a selection mechanism based on antibody concentration and the diversity-maintaining mechanism that exists in the natural immune system. IGA has better global convergence and very strong self-adaptive ability. In this paper, IGA is applied to the optimal design of intelligent AC contactors.
This work is supported by the Bureau of Science and Technology of Fujian Province and of Xiamen City (project numbers 2005F004 and 3502Z20041072, respectively).
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 934 – 939, 2006. © Springer-Verlag Berlin Heidelberg 2006
2 Background Knowledge of IGA
GA is a kind of evolutionary algorithm that imitates natural evolution. The initial population is produced randomly, and new populations are reproduced by three basic genetic operations: selection, crossover and mutation. Each individual in the population is evaluated by a suitably chosen fitness function; an individual with higher fitness has a higher probability of being selected and preserved into the next generation, and vice versa. As GA is a stochastic search method, it places few restrictions on the design conditions, so it is an attractive algorithm for optimal design problems. But some drawbacks exist in GA, and it is necessary to improve it. IGA is an improved genetic algorithm based on the natural immune system. The objective functions of the problem are taken as antigens, while the solutions to the problem are taken as antibodies. According to the principle of the natural immune system, the biological immune system is able to produce corresponding antibodies to resist invading antigens. This process is called the immune response. Some of the antibodies are maintained in memory cells. When the same kind of antigen invades again, the memory cells are stimulated and produce a large number of antibodies, so the secondary immune response is faster and stronger than the primary one. This demonstrates the memory function of the immune system. At the same time, antibodies stimulate and suppress each other so as to keep population diversity and immune balance; this is the self-regulation mechanism of the immune system. Corresponding to these functions and mechanisms of the biological immune system, IGA has salient characteristics compared with SGA. Assume that the population size is N and the gene length is M. Some terms are defined as follows.
① Diversity: The immune system composed of antibodies is an uncertain system during the evolutionary process; its diversity can be expressed by Shannon's entropy as

H(N) = (1/M) ∑_{j=1}^{M} Hj(N)    (1)
where Hj(N) is the entropy of the jth gene.
② Similarity: The similarity Aij is the degree of similarity between antibodies i and j:

Aij = 1 / (1 + H(2))    (2)
where H(2) is the entropy calculated by (1) with N = 2.
③ Antibodies' concentration: The concentration Ci represents the weight of the antibodies similar to antibody i in the population:

Ci = (the number of antibodies whose similarity with antibody i is greater than λ) / N    (3)

where λ is the similarity constant; generally, 0.9 ≤ λ ≤ 1.
④ Integrated fitness: The integrated fitness, which synthesizes fitness and concentration, is given by
fitness′ = fitness ∗ exp(k ∗ Ci)    (4)
Actually, the integrated fitness fitness′ is an adjustment to the fitness [3]. For a maximization problem, k is negative; in this paper, k is chosen as -0.8 by tests. In the selection operation, the selection probability of an antibody is proportional to its integrated fitness, so the higher an antibody's concentration is, the smaller its integrated fitness and hence its selection probability. This is a new strategy to keep the diversity of the population. The key steps of IGA are as follows: ① initializing the population; ② calculating fitness; ③ producing new antibodies; ④ reproducing the population based on concentration; ⑤ refreshing the memory; ⑥ terminating the calculation.
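Definitions ①-④ can be turned into code directly. A minimal sketch, assuming antibodies are equal-length sequences of gene values; the function names and the string representation are illustrative, not from the paper.

```python
import math
from collections import Counter

def gene_entropy(population, j):
    """Shannon entropy of the j-th gene across the population."""
    counts = Counter(ind[j] for ind in population)
    n = len(population)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def diversity(population):
    """H(N) of Eq. (1): average gene entropy over the M gene positions."""
    m = len(population[0])
    return sum(gene_entropy(population, j) for j in range(m)) / m

def similarity(a, b):
    """A_ij of Eq. (2): 1 / (1 + H(2)) for the two-antibody set {a, b}."""
    return 1.0 / (1.0 + diversity([a, b]))

def concentration(i, population, lam=0.9):
    """C_i of Eq. (3): fraction of antibodies whose similarity with
    antibody i exceeds the similarity constant lam (0.9 <= lam <= 1)."""
    return sum(similarity(population[i], ind) > lam
               for ind in population) / len(population)

def integrated_fitness(fitness, c_i, k=-0.8):
    """Eq. (4): fitness' = fitness * exp(k * C_i); k < 0 for maximization,
    so crowded (high-concentration) antibodies are penalized."""
    return fitness * math.exp(k * c_i)
```

Two identical antibodies have H(2) = 0 and hence similarity 1; the more their genes differ, the larger H(2) and the smaller A_ij.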
3 Application of IGA to the Optimal Design of Intelligent AC Contactors
3.1 Introduction to the Intelligent AC Contactor
The intelligent AC contactor has a special control principle: a highly integrated single-chip microcomputer is introduced to control the AC contactor's starting, maintaining and breaking operations in real time. In the starting process, the intelligent AC contactor is activated at an optimal voltage phase angle, so that the coil requires minimum starting energy. After a short while, the power is shut down by the control circuit and then restored, in order to obtain optimal dynamic behaviour. In the maintaining and breaking processes, the contactor can operate with no noise and with no or reduced arcing. The principle of this kind of intelligent AC contactor is described in [4].
The target of the optimal design of the intelligent AC contactor is to obtain a set of structural and control parameters under which the overall technical and economic index is best.
① Optimal Variables
Six parameters (az, N, d, φ, ts, Δt) which greatly influence the dynamic characteristics of the intelligent AC contactor are chosen as the optimization variables. Here az, N and d are structural parameters denoting the width of the U-shaped iron core, the number of coil turns and the wire diameter of the coil, respectively; φ, ts and Δt are control parameters, where φ denotes the inception phase angle, ts the time at which the power is shut down, and Δt the duration of the power shutdown.
② Objective Functions
Three targets (V(X), Pt(X), Ek(X)) which best reflect the overall economic and technical demands are chosen as the multi-objective functions: volume, power loss and kinetic energy of the armature. IGA is used as the optimization algorithm to find the optimal values of the variables. Here V(X) denotes the volume of the electromagnetic structure, composed of the core volume VFe and the effective coil volume Vcu; Pt(X) denotes the power loss of the coil during the contactor's starting process; and Ek(X) denotes the kinetic energy per unit area when the armature strikes the stationary core.
③ Fitness Function
The multi-objective functions can be converted into a single-objective fitness function by a weighted sum; at the same time, the minimization problem is changed into a maximization problem:

F(X) = Fmax − {w1[αVFe(X)/VFe0 + βVcu(X)/Vcu0] + w2 Pt(X)/Pt0 + w3 Ek(X)/Ek0}    (5)
where Fmax is a constant large enough to guarantee F(X) > 0; VFe0 and Vcu0 are the pre-optimization values of the core volume and the effective coil volume; Pt0 and Ek0 are the pre-optimization values of the power loss during the starting process and the kinetic energy per unit area; and α, β, w1, w2, w3 are weight factors.
3.3 Example and Analysis
The population size is set to 50, the crossover probability to 0.95 and the mutation probability to 0.08. The weight factors α and β are 0.4 and 0.6, respectively. The calculation results are shown in Fig. 1, Fig. 2 and Table 1. Fig. 1 compares the population diversity of SGA and IGA. It is obvious that IGA can effectively maintain the diversity of the evolutionary population as the number of generations increases, whereas SGA cannot. As a result, SGA is prone to being trapped in a local optimum, while IGA alleviates premature convergence thanks to its diversity-keeping feature. Fig. 2 shows the dynamic curves of the intelligent AC contactor, including the voltage, exciting current, attractive force, velocity and displacement characteristics.
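As an illustration of how the weighted-sum fitness of Eq. (5) scores a candidate design, here is a hedged Python sketch. It is not the authors' code: the function name, parameter names, Fmax value and default weights are our own choices for the example, with the pre-optimization values from Table 1 used as normalization constants.

```python
# Illustrative sketch of Eq. (5): weighted-sum conversion of the three
# objectives into a single maximization fitness.  Names and Fmax are
# hypothetical; normalization constants are the pre-optimization values.

def fitness(v_fe, v_cu, p_t, e_k,
            v_fe0=95.82, v_cu0=31.30, p_t0=3.70, e_k0=2657.1,
            alpha=0.4, beta=0.6, w1=0.5, w2=0.1, w3=0.4, f_max=10.0):
    """F(X) = Fmax - weighted sum of normalized objectives."""
    cost = (w1 * (alpha * v_fe / v_fe0 + beta * v_cu / v_cu0)
            + w2 * p_t / p_t0
            + w3 * e_k / e_k0)
    return f_max - cost

# The pre-optimization design scores exactly Fmax - (w1 + w2 + w3) = Fmax - 1,
# so any design scoring above that improves the weighted index.
baseline = fitness(95.82, 31.30, 3.70, 2657.1)
improved = fitness(71.87, 8.73, 5.49, 3.94)   # IGA column of Table 1
assert improved > baseline
```

Note that a larger Fmax only shifts all fitness values by a constant, so it affects selection pressure in fitness-proportional schemes but not the ranking of designs.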
[Fig. 1. Similarity of population (w1=0.5, w2=0.1, w3=0.4)]

[Fig. 2. Dynamic curves (w1=0.5, w2=0.1, w3=0.4): (a) SGA, (b) IGA]
938
L.-a. Chen and P. Zhang

Table 1 shows the results of design optimization under different weights w1, w2 and w3. Both IGA and SGA yield a higher integrated technical and economic index. The most obvious effect is the decrease of the kinetic energy Ek. The reason is that, with the optimal control parameters, the intelligent AC contactor has a sound starting process: the armature reaches the stationary core at near-zero velocity (as seen in Fig. 2). Thus Ek is enormously reduced and the mechanical life span is greatly prolonged. The material saving is also obvious, especially for the copper consumption Vcu, which lowers the cost of the device. As for the power loss Pt, the results cross over in certain circumstances. When the weight of Pt (w2) is large enough (for example, w2 = 0.3), the optimized Pt is smaller than the pre-optimization value, so power is saved in the starting process; however, the other targets, including Vcu and Ek, are not reduced as much as when w2 is smaller, i.e., Vcu and Ek are sacrificed for power saving. Comparing the optimal results of IGA and SGA, the power loss of IGA is slightly greater than that of SGA in certain circumstances, but the copper consumption and the kinetic energy of IGA are smaller, and the integrated technical and economic index of IGA is higher. In other words, IGA finds better solutions than SGA and, in the evolutionary process, reaches them at earlier generations. In conclusion, IGA has higher search ability and efficiency than SGA in the optimal design of intelligent AC contactors.

Table 1. Results of design optimization
Items              Names          Pre-Opt.   w1=0.5, w2=0.1, w3=0.4   w1=0.5, w2=0.2, w3=0.3   w1=0.5, w2=0.3, w3=0.2
                                             SGA       IGA            SGA       IGA            SGA       IGA
Optimal            az (cm)        3.2        2.4       2.4            2.4       2.4            2.4       2.4
Variables          N (turns)      2820       2620      2820           2820      2720           2520      2720
                   d (mm)         0.31       0.23      0.19           0.23      0.23           0.27      0.25
                   φ (°)          0-180      69        171            147       69             90        84
                   ts (ms)        --         7.0       11.0           8.0       7.0            6.0       7.0
                   Δt (ms)        --         7.0       7.0            11.0      7.0            7.0       6.0
Feature            VFe (cm³)      95.82      71.87     71.87          71.87     71.87          71.87     71.87
Targets            Vcu (cm³)      31.30      12.40     8.73           13.51     12.95          17.28     15.75
                   Pt (W·s)       3.70       4.06      5.49           3.94      3.84           2.93      3.15
                   Ek (W·s/cm²)   2657.1     25.06     3.94           28.44     22.12          256.0     145.8
4 Conclusions

An improved genetic algorithm based on the immune principle is introduced into the optimal design of intelligent AC contactors in this paper. Inspired by the immune principle, IGA has several distinguishing features, such as maintaining the diversity of the evolutionary population, alleviating premature convergence and enhancing search efficiency. The simulation results demonstrate that IGA outperforms SGA in both search speed and search ability in the optimal design of the intelligent AC contactor.
References
1. Xufa Wang, Xianjun Zhang, Xianbin Cao, Jun Zhang, and Lei Feng, "An Improved Genetic Algorithm Based on Immune Principle," Mini-Micro Systems, vol. 20, pp. 117-120, Feb. 1999.
2. Jang-Sung Chun, Min-Kyu Kim, and Hyun-Kyo Jung, "Shape Optimization of Electromagnetic Devices Using Immune Algorithm," IEEE Trans. on Magnetics, vol. 33, no. 2, pp. 1876-1879, 1997.
3. Xunxue Cui, "The Study of Evolutionary Algorithms Based on Multiobjective Optimization," Ph.D. dissertation, University of Science and Technology of China, 2001.
4. Zhihong Xu and Peiming Zhang, "Research on Intelligent AC Contactor," Low-Voltage Electrical Apparatus, vol. 26, pp. 19-21, 53, 1998.
The Parametric Design Based on Organizational Evolutionary Algorithm Cao Chunhong1, Zhang Bin1, Wang Limin2, and Li Wenhui2 1
College of Information Science and Engineering, Northeastern University, Shenyang 110004, P.R. China
[email protected],
[email protected] 2 College of Computer Science and Technology, Jilin University, Changchun 130012, P.R. China
[email protected],
[email protected]

Abstract. A geometric constraint problem is substantially equivalent to solving a set of nonlinear equations. In this paper we propose a new optimization algorithm, the organizational evolutionary algorithm (OEA), and apply it to geometric constraint solving. In OEA the population is composed of organizations, and three organizational evolutionary operators (the split operator, the merging operator and the cooperating operator) drive the evolution of the population. These operators play different roles in the algorithm. The split operator limits the size of an organization and ensures that part of an organization passes directly into the next generation, which maintains the diversity of the population. The merging operator makes full use of the leader's information and acts as a local search. The cooperating operator increases the fitness of two organizations through their interactions. Experiments show that OEA performs well in geometric constraint solving.

Keywords: parametric design, geometric constraint solving, organizational evolutionary algorithm, split operator, merging operator, cooperating operator.
1 Introduction

Parametric design is a geometric constraint-solving problem. Geometric constraint solving approaches fall into three categories: algebraic-based, rule-based and graph-based approaches. A constraint describes a relation that should be satisfied. Once a user defines a series of relations, the system satisfies the constraints by selecting a proper state after the parameters are modified; this idea is called model-based constraints. The constraint solver is the component of the system that solves the constraints. Evolutionary computation is a class of stochastic search algorithms simulating natural evolution. It can solve complex and ill-conditioned problems and has been applied to numerical optimization, combinatorial optimization, machine learning and neural networks. A problem that troubles evolutionary algorithms is that they may get stuck in a local optimum. To overcome this shortcoming, many algorithms have been introduced, such as the orthogonal genetic algorithm [1],

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 940 – 944, 2006. © Springer-Verlag Berlin Heidelberg 2006
the microgenetic algorithm [2], immune programming [3], the genetic algorithm coordinating exploration and exploitation [4] and the good-point-set-based genetic algorithm [5]. In 1995, Wilcox first introduced the economic concept of "organization" into genetic-algorithm-based classifiers [6]. In reference [7], the authors proposed a novel bottom-up method, the organizational coevolutionary algorithm, and achieved good results. In this paper we apply organizational evolution to geometric constraint solving and propose a new optimization algorithm, the organizational evolutionary algorithm.
2 Organizational Evolutionary Algorithm

Split Operator: An organization org is split when the following condition holds:

    (‖org‖ > Maxos) or {(‖org‖ ≤ Maxos) and (U(0,1) < ‖org‖/N0)}    (1)

where ‖org‖ is the size of the organization, Maxos is the maximum allowed organization size, N0 is a constant, and U(0,1) is a uniformly distributed random number.

Merging Operator: Suppose the leader of org_p1 is (x1, x2, ..., xn) and the new individual is rj = (rj,1, rj,2, ..., rj,n), j = 1, 2, ..., N. In merging strategy 1, rj is obtained from formula (2):

    rj,k = x̲k,   if zj,k < x̲k
    rj,k = x̄k,   if zj,k > x̄k        k = 1, 2, ..., n    (2)
    rj,k = zj,k,  otherwise

where zj,k = xk + Uk(0,1) × (xk − yj,k), and x̲k, x̄k are the lower and upper bounds of the k-th variable.

In merging strategy 2, rj is obtained from formula (3):

    rj,k = xk + β × (x̄k − x̲k),  if Uk(0,1) < 1/n
    rj,k = xk,                   otherwise           k = 1, 2, ..., n    (3)

where β = U(0,1) is drawn anew for every xk. Before the result for rj is accepted, zj+M is obtained from formula (4):

    zj+M = rj,  if Fitness(rj) ≥ Fitness(yj)
    zj+M = rj,  if (Fitness(rj) < Fitness(yj)) and (U(0,1) < exp(Fitness(rj) − Fitness(yj)))    (4)
    zj+M = yj,  otherwise
Cooperating Operator: Suppose the two parent organizations are org_p1 = {x1, x2, ..., xM}

5. Else if j = i,
6. then k = i and record the pattern as <PSOP(IMk), nj = 1>; j = j + 1;
7. else k = i + 1 and record the pattern as <PSOP(IMk), nj = 1>; j = j + 1; }

For ISP(Si) and ISP(Sj), the ISSP of two objects refers to the longest similar and continuous sub-patterns belonging to the ISP of each object. Since there is a spatial relationship among all images in Si (for any IMi1, ..., IMin, IMi(j+1) must be farther from the calvaria than IMij), it is not necessary to retrieve the whole PSOPs of one object to find the patterns similar to a given Mi. For example, if M1 is the PSOP of the image farthest from the calvaria in one object and Mp is the PSOP of the nearest image of another object, it is not meaningful to compare M1 and Mp for similarity, because they show different parts of the brain. Accordingly, two rules are introduced to reduce the retrieval space. For two objects Si and Sj, assume that ‖ISP(Si)‖ = m and ‖ISP(Sj)‖ = n. (1) If m = n, that is, the two objects have the same number of PSOPs, then we only need to retrieve Mi-1', Mi' and Mi+1' in ISP(Sj) to discover the patterns similar to Mi in ISP(Si). (2) If m

β > 0, which balances the tradeoff between compression and preservation. For applications in which the motivation is to group the data, it is tempting to set β to ∞. In this case, equation (1) becomes the maximization of I(T; Y), and the optimal IB solution tends to become crisp, i.e., p(t|x) approaches zero or one almost everywhere. If the cardinality of T is given, several information bottleneck algorithms aim to produce this crisp representation T, as shown in [2] and [3]. It would be attractive if an IB algorithm had a built-in mechanism that could automatically determine this parameter. This is in general a question of model selection.
In this paper, we introduce the Minimum Message Length (MML) principle into the IB method, and show that this question can be given a formal optimal answer that is also based on information-theoretic considerations.
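The information quantities used throughout this discussion, the entropy H(X) and the mutual information I(X; T), can be computed directly from a joint distribution. The following small helper is our own illustration (not from the paper), working in bits:

```python
# Minimal sketch: entropy and mutual information in bits, computed from
# a joint distribution given as a matrix of probabilities.
import math

def entropy(p):
    """H(p) = -sum p_i log2 p_i over the non-zero entries."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution matrix."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    hxy = entropy([p for row in joint for p in row])
    return entropy(px) + entropy(py) - hxy

# Independent variables carry zero mutual information:
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # → 0.0
```

A perfectly correlated joint distribution such as [[0.5, 0.0], [0.0, 0.5]] instead yields the full 1 bit of shared information.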
1006
G. Li et al.

2 Choosing the Optimal Cardinality Value Using the MML Principle
The basic idea of the Minimum Message Length (MML) principle [4] is to find the hypothesis H that leads to the shortest message. In the case of determining the optimal cardinality value for the IB method, this requires estimating the optimal encoding length for the description of the IB solution and of the original data D. The whole message consists of two parts:

1. The message needed to describe the IB solution, which can be further divided into two sub-messages: (a) the message encoding the cardinality value k; (b) the message encoding the IB solution Tk.
2. The message needed to describe the original data set D, under the assumption that the IB solution is the optimal one.

Given a cardinality value k, the IB solution returned by any IB algorithm is k groups of objects. The encoding of this model consists of the following two terms:

1. Encoding of the cardinality value k: the encoding length can be estimated as log*(k), where log* is the universal code length for integers [5].
2. Encoding of the IB solution Tk: to encode the IB solution, we can code its index in some agreed enumeration of all possible groupings. As the number of possible groupings is exactly the Stirling number of the second kind [6], the number of bits needed to code Tk can be estimated as log S2(n, k).

Thus, the encoding length of k and Tk can be estimated as:

    MsgLen(k) + MsgLen(Tk) = log*(k) + log S2(n, k)    (2)

where k is the cardinality value and n is the number of objects. For encoding the original data set X when the IB solution Tk is known, the encoding length can be estimated as n times H(X|Tk), i.e.

    MsgLen(D|k, Tk) = n × H(X|Tk) = n × (H(X) − I(X; Tk))    (3)

where I(X; Tk) is the mutual information between X and Tk, and H(X) is the entropy of X. Therefore, the total message length for the IB model and the original data set D can be estimated as

    MsgLen = log*(k) + log S2(n, k) + n × (H(X) − I(X; Tk))    (4)

where k is the cardinality value and n is the number of objects in D.
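Formula (4) is straightforward to compute. The sketch below is our own illustration (names and base-2 lengths are our choices): log* is implemented via the iterated logarithm and S2(n, k) via the standard recurrence, while H(X) and I(X; Tk) would come from the IB solution and are plain arguments here.

```python
# Hedged sketch of Eq. (4): total message length for cardinality k.
import math

def log_star(k):
    """Rissanen's universal code length for a positive integer, in bits:
    log2(c) + log2(k) + log2(log2(k)) + ... over the positive terms."""
    total = math.log2(2.865064)        # normalizing constant c
    v = math.log2(k)
    while v > 0:
        total += v
        v = math.log2(v)
    return total

def stirling2(n, k):
    """Stirling number of the second kind via the recurrence
    S(i, j) = j * S(i-1, j) + S(i-1, j-1)."""
    s = [[0] * (k + 1) for _ in range(n + 1)]
    s[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            s[i][j] = j * s[i - 1][j] + s[i - 1][j - 1]
    return s[n][k]

def message_length(k, n, h_x, i_xt):
    """Eq. (4): log*(k) + log S2(n, k) + n * (H(X) - I(X; T_k))."""
    return log_star(k) + math.log2(stirling2(n, k)) + n * (h_x - i_xt)
```

The first two terms grow with k (a finer model costs more to describe), while the data term n·(H(X) − I(X; Tk)) shrinks as the solution captures more information about X, which is exactly the balance the minimum of Eq. (4) resolves.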
Determine the Optimal Parameter for Information Bottleneck Method
1007

2.1 Automatic SIB Algorithm — ASIB

Based on the encoding-length formula (4), we can design an iterative algorithm as described in Algorithm 1: starting with a range [mink, maxk] of possible cardinality values (by default, mink = 1 and maxk = n), the algorithm runs through each possible k value; a sequential IB algorithm is called to get the IB solution Tk, and the encoding length of the IB model is calculated. Finally, the k that yields the minimum encoding message length is returned as the optimal cardinality value.

Algorithm 1. Automatic SIB algorithm
Require: a joint distribution p(x, y); the range of possible cardinality values [mink, maxk] (optional).
Ensure: the optimal cardinality value bestK and the feature grouping result T_bestK.
  T ← an empty array of length maxk − mink + 1   {T stores the IB solution for each k}
  MsgLen ← an empty array of length maxk − mink + 1   {MsgLen stores the encoding length for each k}
  for each k ∈ [mink, maxk] do
    Tk ← sIB(k)
    MsgLen(k) ← log*(k) + log S2(n, k) + n × (H(X) − I(X; Tk))
    if MsgLen(k) < MsgLen(bestK) then
      bestK ← k
    end if
  end for   {iterate over each candidate cardinality to choose the optimal one}
  return bestK and T_bestK
3 Experiment Design and Results Analysis

In this section, we evaluate the performance of the proposed ASIB algorithm in the document clustering scenario, as in [2]. The data sets used in our test are 9 original and 9 shuffled document-word count data sets [2]. In the ASIB algorithm, we set mink = 2 and maxk = 20. The ASIB algorithm is then called to automatically determine the correct number of document categories and to cluster the documents. The left column of Figure 1 shows the general shape of the total cost versus the number of possible categories. In Figure 1(a), it is interesting to note that the encoding length increases with the number of clusters, and that data sets Binary-1 and Binary-2 reach their minimum encoding length at k = 2. Data set Binary-3 reaches its minimum encoding length at k = 3, which is very close to its correct number of categories, 2. Figures 1(c) and 1(e) present similar results: the ASIB algorithm determined the correct number of clusters for Multi5-1, Multi10-1 and Multi10-3, and located very close results for data sets Multi5-2, Multi5-3 and Multi10-2. From the figure, it is evident that the ASIB algorithm successfully determined
[Figure 1 appears here. Six panels: (a) Binary, (b) S-Binary, (c) Multi5, (d) S-Multi5, (e) Multi10, (f) S-Multi10; each plots the description length against the number of clusters (2 to 20).]

Fig. 1. Cost versus number of categories
the correct number of categories in 5 out of 9 data sets, and located very close values for the other 4 data sets, with an error of only around 1. The ASIB algorithm is then applied to the 9 shuffled data sets to automatically determine the correct number of document categories. The right column of Figure 1 shows the general shape of the total cost versus the number of possible categories. In Figure 1(b), it is interesting to note that the encoding length also increases with k, and that data sets S-Binary-1 and S-Binary-2 reach their minimum encoding length at k = 2. Data set S-Binary-3 reached
its minimum encoding length at k = 3, which is very close to its correct category number, 2. Figures 1(d) and 1(f) present similar results: the ASIB algorithm determined the correct number of clusters for S-Multi5-1, S-Multi10-1 and S-Multi10-3, and located very close results for data sets S-Multi5-2, S-Multi5-3 and S-Multi10-2. These results are of special interest considering that no prior knowledge of the number of categories is provided to the ASIB algorithm. Moreover, there is strong empirical evidence that the proposed method can identify an appropriate parameter value for the IB method automatically.
4 Conclusion

In this paper, we designed an encoding scheme for IB solutions corresponding to different cardinality values, and developed an MML-based algorithm, ASIB, to automatically determine the best cardinality value from the data set itself. The experimental results on document clustering showed that the proposed ASIB algorithm is capable of recovering the true category number, which indicates that the proposed parameter-determination mechanism is promising.
References
1. Tishby, N., Pereira, F.C., Bialek, W.: The information bottleneck method. In: Proc. 37th Allerton Conference on Communication, Control and Computing (1999)
2. Slonim, N.: The Information Bottleneck: Theory and Applications. PhD thesis, the Senate of the Hebrew University (2002)
3. Slonim, N., Tishby, N.: Agglomerative information bottleneck. Advances in Neural Information Processing Systems (NIPS) 12 (1999) 617–623
4. Wallace, C., Freeman, P.R.: Estimation and inference by compact coding. Journal of the Royal Statistical Society 49 (1987) 223–265
5. Rissanen, J.: A universal prior for integers and estimation by minimum description length. Annals of Statistics 11 (1983) 416–431
6. Weisstein, E.W.: Stirling Number of the Second Kind. From MathWorld – A Wolfram Web Resource, http://mathworld.wolfram.com/StirlingNumberoftheSecondKind.html
Optimized Parameters for Missing Data Imputation* Shichao Zhang1,2, Yongsong Qin1, Xiaofeng Zhu1, Jilian Zhang1, and Chengqi Zhang2 1
Department of Computer Science, Guangxi Normal University, China Faculty of Information Technology, University of Technology Sydney P.O. Box 123, Broadway NSW 2007, Australia {zhangsc, chengqi}@it.uts.edu.au,
[email protected],
[email protected],
[email protected] 2
Abstract. To complete missing values, one solution is to exploit the attribute correlations within the data. However, it is difficult to identify such relations in data containing missing values. Accordingly, we develop a kernel-based missing data imputation method in this paper. This approach aims at optimizing the statistical parameters (the mean and the distribution function) after the missing data are imputed. We refer to this approach as the parameter optimization method (the POP algorithm, a random regression imputation). We evaluate our approach experimentally and demonstrate that the POP algorithm is much better than deterministic regression imputation at generating inferences on the above two parameters. The results also show that our algorithm is computationally efficient, robust and stable for missing data imputation.
1 Introduction

Missing value imputation is a practical yet challenging issue confronting the machine learning and data mining fields [1,3,4,5,8,9,12,13,14,16,17,18]. There are many approaches to dealing with missing values [1,6,17,18], mainly: (a) ignore objects containing missing values; (b) fill in the missing values manually; (c) substitute the missing values with a global constant or the mean of the objects; and (d) use the most probable value to fill in the missing values. The first approach usually loses too much useful information, whereas the second is time-consuming and expensive, so it is infeasible in many applications. The third approach assumes that all missing values can be replaced with the same value, probably leading to considerable distortion of the data distribution. The fourth approach, filling in the most probable value, is the most effective way to deal with the missing value problem [6]. This means that missing data imputation is a reasonable strategy. There exist many methods for missing data imputation (we adopt the term 'imputation' from the statistics domain). Traditional missing value imputation techniques can be roughly classified into parametric imputation (e.g., linear regression) and non-parametric imputation
This work is partially supported by large Australian ARC grants (DP0449535, DP0559536 and DP0667060), a China NSFC major research program (60496327), a China NSFC grant (60463003), and a grant from the Overseas Outstanding Talent Research Program of the Chinese Academy of Sciences (06S3011S01).
(e.g., the nearest neighbor method or non-parametric regression imputation). Parametric regression imputation must correctly specify the parametric model for the data set. However, in many real applications it is almost impossible to know what distribution captures the attribute correlations within real data (i.e., it is very difficult to set the exact relation between the feature attributes and a target attribute). An ill-specified model may produce highly biased and miscalculated results. An alternative approach for missing value imputation is the use of non-parametric techniques, which are beneficial when the form of the relationship between input and output data is not known a priori [21]. The kernel method, one of the non-parametric techniques widely used in machine learning and pattern recognition, is an effective way to deal with missing data because it is computationally efficient, robust and stable [22]. The most commonly used imputation methods are deterministic imputation and random imputation [15]. Wang and Rao [19, 20] show that deterministic imputation performs well in making inferences for the mean of Y. Qin and Rao [11] showed that one must use random imputation in making inferences for the distribution function and quantiles of Y. However, the most appropriate way to handle missing or incomplete data depends on how the data are missing. Little and Rubin [9] classified missing data mechanisms into three categories: Missing Completely at Random (MCAR), Missing at Random (MAR) and Nonignorable.
In this paper, different from the above imputation methods, we propose a random regression imputation method (named the POP algorithm) that constructs a kernel-based non-parametric estimator to impute the missing values under the missing mechanisms MAR and MCAR. We then evaluate the performance of POP in making inferences for two parameters of the response variable Y: the mean and the distribution function. The rest of this paper is organized as follows. Section 2 reviews related work on missing value completion. Section 3 presents our POP algorithm in detail. Experimental results are given in Section 4. Conclusions and future work are in Section 5.
2 Related Work

Currently, two kinds of methods are used for missing value completion. Machine-learning-based methods include decision tree imputation, case-wise deletion and lazy decision trees; statistics-based methods include linear regression, the robust Bayesian estimator and a family of reconstruction problems for multiple images from minimal data [7]. These methods are not completely satisfactory ways to handle missing value problems, because the machine learning methods, such as C4.5, usually handle only discrete values: continuous attributes are discretized before processing, and their true characteristics may be lost in the conversion from continuous to discrete values. [10] reviewed the main missing data techniques (MDTs), revealing that statistical methods have mainly been developed to manage survey data and have proved very effective in many situations. However, the main problem of these techniques is the need for strong model
assumptions, that is, an assumption that the data follow a specific distribution. In many practical cases these assumptions do not describe the data well, and sometimes they are even misleading for modeling the data. An alternative approach for missing value imputation is the use of non-parametric techniques, which need fewer assumptions about the data distribution. In many real-world applications it is very common that the user has no idea about the data being discussed. Consequently, non-parametric methods should be adopted for this kind of missing value imputation task. Our POP algorithm is a non-parametric kernel method for missing value imputation, presented in Section 3.
3 The POP Algorithm

Let X be an n×d-dimensional vector and let Y be a variable influenced by X; we denote X and Y as the attribute values and the target attribute (class label), respectively, where X and Y are continuous. We assume that X has no missing values, while Y does. To simplify the discussion, the data set is denoted as (Xi, Yi, δi), i = 1, ..., n, where δi is an indicator of missingness: δi = 0 if Yi is missing and δi = 1 otherwise. In a real-world database, we suppose that X and Y satisfy:

    Yi = m(Xi) + εi,  i = 1, ..., n    (1)
where m(·) is an unknown function and εi is a random error with mean 0 and variance σ². In other words, we assume that Y is related to Xi; the relation may be linear or nonlinear, but we cannot know it exactly. In this paper, we use

    m̂(x) = [Σ(i=1..n) δi Yi K((x − Xi)/h)] / [Σ(i=1..n) δi K((x − Xi)/h) + n⁻²]    (2)

where m̂(x) is the kernel estimate of m(x), the term n⁻² is introduced to prevent the denominator from vanishing, and h is the bandwidth. We use m̂(Xi) as the imputed value for a missing Yi. In Equation 2, K(·) is a kernel function. There are many commonly used kernel functions, such as the Gaussian kernel and the uniform kernel; in practice there is no significant difference among them under the MAR and MCAR assumptions, and we adopt the widely used Gaussian kernel. Silverman [23] points out that the choice of bandwidth is much more important than the choice of kernel function: a small h makes the estimate look 'wiggly' and shows spurious features, whereas a too-large h leads to an estimate that is too smooth, in the sense that it is too biased and may not reveal structural features. There is no generally accepted method for choosing the window width; available methods include 'subjective choice' and automatic approaches such as 'plug-in', 'cross-validation' (CV) and 'penalizing function' methods. In this paper, we use cross-validation to choose the bandwidth. Let Yi(D) and Yi(R), i ∈ sm, be the values imputed for the missing data using the deterministic and the random imputation method, respectively. Deterministic imputation [19, 20] uses m̂n(Xi) as the imputed value, i.e., Yi(D) = m̂n(Xi), i ∈ sm; POP imputation uses
methods such as the “plug-in”, ‘cross-validation’ (CV), and ‘penalizing function ’ approaches. In this paper, we use the method of cross-validation to choose c. Let Yi( D ) and Yi( R ) , i∈sm be values imputed for the missing data using deterministic and random imputation methods, respectively. Deterministic imputation [19, 20] uses mˆ n ( x ) as the imputed value, i.e. Y ( D ) = mˆ ( x ) , i ∈ s m ; POP imputation uses n i Yi ( R ) = mˆ n ( X i ) + ε i* =Yi ( D ) + ε i*
, i∈sm , as the imputed values, where { εi* } is a simple
random sample of size m with replacement from { Y j − mˆ n ( X i ), j∈sr }.
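The estimator of Eq. (2) and the POP residual-resampling step just described can be sketched together in Python. This is our own illustration, not the authors' code: the data layout, the fixed bandwidth and the function names are assumptions, and bandwidth selection by cross-validation is omitted.

```python
# Hedged sketch of Eq. (2) (Gaussian kernel) and POP random imputation:
# missing Y_i get the deterministic kernel estimate plus a residual
# resampled with replacement from the observed cases.
import math
import random

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def m_hat(x, xs, ys, deltas, h=0.1):
    """Kernel estimator of Eq. (2); deltas[i] == 1 marks an observed Y_i.
    The n**-2 term keeps the denominator from vanishing."""
    n = len(xs)
    num = sum(d * y * gaussian_kernel((x - xi) / h)
              for xi, y, d in zip(xs, ys, deltas))
    den = sum(d * gaussian_kernel((x - xi) / h)
              for xi, d in zip(xs, deltas)) + n ** -2
    return num / den

def pop_impute(xs, ys, deltas, h=0.1, rng=random):
    """Random (POP) imputation: Y_i^(R) = m_hat(X_i) + resampled residual."""
    residuals = [y - m_hat(x, xs, ys, deltas, h)
                 for x, y, d in zip(xs, ys, deltas) if d == 1]
    out = list(ys)
    for i, (x, d) in enumerate(zip(xs, deltas)):
        if d == 0:                      # Y_i is missing
            out[i] = m_hat(x, xs, ys, deltas, h) + rng.choice(residuals)
    return out
```

Because the resampled residuals preserve the spread of the observed errors, repeated runs of `pop_impute` reproduce the variability of Y rather than collapsing every imputed value onto the regression curve, which is exactly why the random method recovers the distribution function where the deterministic one does not.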
4 Analysis of Performance

To show the effectiveness of the proposed method, extensive experiments were conducted on simulation models as well as on a real data set, using a DELL Workstation PWS650 with 2 GB main memory and a 2.6 GHz CPU running Windows 2000. The data set we adopt is Abalone, downloaded from the UCI machine learning repository [2]; it contains 4177 instances with 9 attributes each (no missing attribute values). These attributes give the length, diameter, height and other features of the abalone, and the last attribute, Rings, gives the age in years. In the interest of space, we select only the attribute Length as X and the attribute Rings as Y. Detailed statistics of these two attributes are listed in Table 1.

Table 1. Statistics for attributes Length and Rings of the Abalone data set

Statistics            Length   Rings
Min                   0.075    1
Max                   0.815    29
Mean                  0.524    9.934
Standard deviation    0.120    3.224
We apply the missing mechanisms MCAR and MAR to Y at missing rates of 10% and 40%. The POP algorithm is then used to fill in the missing values of Y, and the estimators of the parameters of Y are evaluated after imputation. Note that in real-world applications the missing values need to be imputed only once, i.e., repeat time = 1. We do not present the performance of the estimators of the quantile θq, as the estimators under the different imputations are almost identical to the one computed from the original data set: the missing values of Y have little effect on the quantile estimators in this case. We report the performance on μ and θ in Figures (1)-(4). Figures (1) and (3) show that both POP and the deterministic method work well on the mean of Y for the real data set, as the MSE values of both imputation methods are small and very close to each other. Figures (2) and (4) show that the MSE for the distribution function (y = 10) based on POP is significantly smaller than that based on deterministic imputation, i.e., POP performs much better than its deterministic counterpart on the distribution function of Y under the missing mechanisms MCAR and MAR.
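The masking and evaluation protocol described above can be sketched as follows. This is an illustrative reading of the setup with hypothetical names, not the authors' experiment code:

```python
# Hedged sketch: MCAR masking at a given rate, and the MSE of the mean
# estimator over repeated imputed data sets.
import random

def mcar_mask(ys, rate, seed=0):
    """Under MCAR, each Y value is made missing independently with the
    given probability; returns delta indicators (1 = observed)."""
    rng = random.Random(seed)
    return [0 if rng.random() < rate else 1 for _ in ys]

def mean_mse(true_mean, imputed_samples):
    """MSE of the sample mean across repeated imputations, against the
    complete-data mean."""
    errs = [(sum(s) / len(s) - true_mean) ** 2 for s in imputed_samples]
    return sum(errs) / len(errs)
```

Under MAR, by contrast, the masking probability would depend on the observed X values rather than being a constant rate, which is the only part of this sketch that would change.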
POP still performs reasonably well under a high missing rate, as the MSE of the estimators of μ and θ remains small when the missing rate of Y is increased from 10% to 40%. This trend can be seen by comparing Figures (1) and (2) with (3) and (4).
5 Conclusions and Future Work

Because learning from incomplete data is a challenging and practical issue, we have designed a kernel-based missing data imputation method for this problem. In non-parametric regression settings, we have shown that our POP algorithm (random regression imputation) works well in making inferences on all parameters, whereas deterministic regression imputation works only for the mean of Y, not for the distribution function and quantiles of Y. That is, to make inferences on the mean of Y, one can use either deterministic or random imputation; to make inferences on the distribution function, one must use random imputation. Our experiments have also demonstrated that the POP algorithm generates much better inferences on the above parameters than deterministic regression imputation.
Optimized Parameters for Missing Data Imputation
In future work, we plan to use the POP algorithm to make inference for the nonparametric model with fixed design points X_i, and for the partial linear model Y_i = β'X_i + m(T_i) + ε_i.
Expediting Model Selection for Support Vector Machines Based on an Advanced Data Reduction Algorithm

Yu-Yen Ou¹, Guan-Hau Chen¹, and Yen-Jen Oyang²

¹ Graduate School of Biotechnology and Bioinformatics, Department of Computer Science and Engineering, Yuan-Ze University, Chung-Li, Taiwan
² Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Abstract. In recent years, the support vector machine (SVM) has been extensively applied to deal with various data classification problems. However, it has also been observed that, for some datasets, the classification accuracy delivered by the SVM is very sensitive to how the cost parameter and the kernel parameters are set. As a result, the user may need to conduct extensive cross validation in order to figure out the optimal parameter setting. How to expedite the model selection process of the SVM has attracted a high degree of attention in the machine learning research community in recent years. This paper proposes an advanced data reduction algorithm aimed at expediting the model selection process of the SVM. Experimental results reveal that the proposed mechanism is able to deliver a speedup of over 70 times without causing meaningful side effects and compares favorably with the alternative approaches.
1 Introduction
The support vector machine (SVM) was first proposed by Vapnik [1] and has since attracted a high degree of interest in the machine learning research community. However, it has also been observed that, for some datasets, the classification accuracy delivered by the SVM is very sensitive to how the cost parameter and the kernel parameters are set. As a result, the user may need to conduct extensive cross validation in order to find the optimal parameter setting. This process is commonly referred to as model selection [2]. One practical issue concerning model selection with the SVM is that the process can be very time-consuming if the dataset is large and an extensive search of the parameter space is conducted. In recent years, several studies have addressed this issue, all sharing the common goal of reducing the search space of parameter combinations [3,4,5]. In this paper, we propose a novel approach to expediting the model selection process of the SVM based on an advanced data reduction mechanism. The study was motivated by the observation that the learning algorithm of the SVM
is aimed at figuring out the boundary that separates the two different classes of training instances in a hyperspace, subject to certain optimization criteria. Accordingly, the exact profile of the boundary can be figured out by observing only the training instances located in the proximity of the boundary. In other words, those training instances that are far away from the boundary essentially play no role in determining the boundary and can be removed. However, we must also consider the side effect of data reduction: it may cause degradation of classification accuracy if some essential training instances are mistakenly removed. Fortunately, the experiments that we have conducted show that extensive data reduction can be applied to the dataset fed into the model selection process of the SVM without causing meaningful side effects on classification accuracy. In the first stage of our study on this issue [6], we employed a conventional data reduction mechanism and observed an average reduction rate of over 80%, i.e., over 80% of the input instances were removed, for the three datasets in Table 1. This level of reduction in turn leads to an average speedup of the model selection process of over 20 times. On the other hand, we observed no degradation of classification accuracy in two of the three datasets and a degradation of only 0.6% in the third. In our follow-up study, we developed an advanced data reduction mechanism based on the efficient kernel density estimation algorithm that we recently proposed [7]. With the kernel density estimation (KDE) based data reduction mechanism, the average reduction rate for the three datasets in Table 1 is raised to 90% of the input instances removed, which in turn leads to a further threefold speedup of the model selection process.
Since it is generally believed that the time complexity of the model selection process of the SVM is over O(n²), the additional reduction achieved can mean a much higher level of speedup for large datasets. The experiments conducted to evaluate the effects of the KDE based data reduction mechanism again show no meaningful side effects on classification accuracy.
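The scaling argument can be made concrete with a one-line cost model. This is an illustration only: the quadratic exponent is the assumed lower bound on training cost just mentioned, and the function name is our own:

```python
def fit_speedup(keep_fraction: float, exponent: float = 2.0) -> float:
    """If a single SVM training run costs roughly n**exponent, keeping a
    fraction `keep_fraction` of the n instances speeds each run up by
    keep_fraction**(-exponent)."""
    return keep_fraction ** (-exponent)

# removing 80% of the instances vs. removing 90%, under a quadratic cost model:
# fit_speedup(0.2) ~ 25x per fit, fit_speedup(0.1) ~ 100x per fit
```

Under this model, an 80% removal gives roughly a 25-fold speedup per training run and a 90% removal roughly 100-fold; end-to-end speedups are lower because the reduction step itself and other overheads are included.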
2 The KDE Based Data Reduction Mechanism
The design of the data reduction mechanism proposed in this paper for expediting the model selection process of the SVM is motivated by the observation that the learning algorithm of the SVM aims to find the boundary that separates the two classes of training instances in a hyperspace, subject to certain optimization criteria. Accordingly, the profile of the boundary can be accurately determined by the training instances located in the proximity of the boundary. As mentioned earlier, in the first stage of our study on this issue, we employed a conventional data reduction mechanism to expedite the model selection process of the SVM. The conventional mechanism operates by first sorting the instances in the training dataset in descending order of the distance of each instance to its nearest enemy [8], which is defined to be the nearest neighbor belonging to a different class. With the sorted list, the data reduction algorithm then examines the instances one by one. An instance is
considered redundant and is removed if the instance and all of its k nearest neighbors in the residue dataset belong to the same class. Here, k is a parameter to be set, and the residue dataset refers to the dataset generated by removal of the redundant instances examined earlier. In the remainder of this paper, this conventional data reduction mechanism is referred to as the NE (nearest enemy) based data reduction algorithm. Experimental results show that, with parameter k set to 3, the NE based data reduction mechanism is capable of removing over 80% of the training instances in the three datasets listed in Table 1 and reduces the execution time of the model selection process by over 95% without causing any meaningful side effect on classification accuracy. In our follow-up study, we developed an advanced data reduction mechanism based on the efficient kernel density estimation (KDE) algorithm [7]; in the remainder of this paper it is referred to as the KDE based data reduction algorithm, and the details of the KDE algorithm can be found in [7]. The efficient kernel density estimation algorithm treats each class of training instances as a set of random samples taken from a probability distribution and constructs an approximate probability density function accordingly. Since the approximate probability density function is continuous and smooth, it can be expected that the function values at the instances in the outer regions of the distribution are generally lower than those at the instances in the inner regions. Accordingly, the function values can be exploited to sort the instances in descending order. With the instances in the dataset sorted, the KDE based data reduction algorithm then examines the instances one by one for redundancy.
In the KDE based data reduction mechanism, two conditions are employed to determine whether an instance is redundant, and an instance is considered redundant if either condition is met. The first condition is that the instance and all of its k nearest neighbors in the residue dataset belong to the same class. The second condition is that the value of the approximate probability density function at the instance is higher than the average function value multiplied by a factor r. As far as the time complexity of the KDE based data reduction mechanism is concerned, for each of the n training instances we need to compute the value of the approximate probability density function. If the kd-tree structure is employed, the average time complexity of this process is O(n log n) [9]. We then need to sort the function values at the training instances, which can be carried out in O(n log n). Finally, we need to remove a training instance from the kd-tree if the instance is redundant; with the kd-tree, the removal of an instance can be carried out in O(log n). Therefore, the average time complexity of the data reduction mechanism is O(n log n) if k is treated as a constant [7].
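The scan just described can be sketched in a few lines. The sketch below is ours, not the authors' implementation: it uses a naive one-dimensional Gaussian kernel density estimate in place of the efficient KDE algorithm of [7] and brute-force neighbour search in place of the kd-tree, and all names and toy parameter values are assumptions:

```python
import math

def gauss_kde(x, data, h=0.5):
    # naive 1-D Gaussian kernel density estimate (stand-in for the efficient
    # KDE algorithm of [7])
    return sum(math.exp(-0.5 * ((x - z) / h) ** 2) for z in data) \
           / (len(data) * h * math.sqrt(2.0 * math.pi))

def kde_reduce(points, labels, k=3, r=1.5):
    """Scan instances in descending density order; drop an instance if its k
    nearest neighbours in the residue set all share its class, or if its
    density exceeds r times the average density (the second condition)."""
    dens = {p: gauss_kde(p, points) for p in points}   # assumes distinct points
    avg = sum(dens.values()) / len(dens)
    order = sorted(zip(points, labels), key=lambda pl: dens[pl[0]], reverse=True)
    kept = list(order)
    for p, lab in order:
        neighbours = sorted((q for q in kept if q[0] != p),
                            key=lambda q: abs(q[0] - p))[:k]
        if len(kept) > k + 1 and (all(l == lab for _, l in neighbours)
                                  or dens[p] > r * avg):
            kept.remove((p, lab))
    return kept
```

On two well-separated clusters this keeps only the instances facing the opposite class, which is precisely the boundary information the SVM learning algorithm needs.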
3 Experimental Results
Tables 1-3 show how the KDE based data reduction mechanism performs in comparison with the alternative approaches in various aspects. As Table 1 reveals,
Table 1. Execution time of the model selection process with alternative mechanisms (in seconds)

Dataset    Original      Two-line method        NE based method        KDE based method
           time (s)      time (s)  % of orig.   time (s)  % of orig.   time (s)  % of orig.
satimage      29052        3244      11.1%        1791      6.2%          632      2.2%
letter       192358       15716       8.2%       14528      7.6%         3915      2.0%
shuttle      252018        7887       3.1%          63      0.2%         10.7      0.04%
Average                               7.5%                  4.6%                   1.4%
Speedup           1        13.3                   21.7                   71.4

Table 2. Generalization accuracy with alternative mechanisms

Dataset    Original method   Two-line method   NE based method   KDE based method
satimage       91.8%             91.55%            91.2%             90.95%
letter         97.82%            96.54%            97.82%            97.16%
shuttle        99.92%            99.81%            99.92%            99.92%
Average        96.51%            95.97%            96.31%            96.01%

Table 3. The numbers of training instances left after data reduction is applied

Dataset    Original # of        SVM                    NE based method        KDE based method
           training instances   # of SV  % of orig.    # left  % of orig.     # left  % of orig.
satimage        4435             1689     38.0%         1167     26.3%          737     16.6%
letter         15000             8931     59.5%         4027     26.8%         2002     13.3%
shuttle        44500              287      0.64%         272      0.61%          81      0.18%
Average                                   32.7%                  17.9%                  10.0%
all three mechanisms significantly reduce the execution time taken to carry out the model selection process, while causing only slight degradation of classification accuracy. An overall observation is that both the KDE based and the NE based data reduction mechanisms deliver higher speedups and cause a smaller loss of classification accuracy than the two-line approach. In addition, the KDE based mechanism delivers a higher speedup than the NE based mechanism. Table 3 provides further insight into the effects of the KDE based mechanism: with it applied, the average number of training instances remaining in the dataset is about one third of the number remaining after the NE based mechanism has been applied.
4 Discussion and Conclusion
This paper proposes a novel approach to expedite the model selection process of the SVM based on an advanced data reduction mechanism. Experimental results reveal that the proposed KDE based mechanism is able to deliver a significant
speedup without causing meaningful side effects and compares favorably with the alternative approaches. In particular, in the experiments with the three benchmark datasets employed in this paper, the KDE based mechanism achieves an average speedup of over 70 times. Since it is generally believed that the time complexity of the model selection process of the SVM is over O(n²), an even higher level of speedup could be expected if the dataset involved were larger than the benchmark datasets employed in the experiments. As far as the execution time of the KDE based data reduction mechanism itself is concerned, the average time complexity is O(n log n), where n is the number of instances in the input dataset.
References

1. Cortes, C., Vapnik, V.: Support-vector networks. Machine Learning 20 (1995) 273–297
2. Hsu, C.W., Lin, C.J.: A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks 13 (2002) 415–425
3. Chapelle, O., Vapnik, V., Bousquet, O., Mukherjee, S.: Choosing multiple parameters for support vector machines. Machine Learning 46 (2002) 131–159
4. DeCoste, D., Wagstaff, K.: Alpha seeding for support vector machines. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD-2000). (2000)
5. Keerthi, S.S.: Efficient tuning of SVM hyperparameters using radius/margin bound and iterative algorithms. IEEE Transactions on Neural Networks 13 (2002) 1225–1229
6. Ou, Y.Y., Chen, C.Y., Hwang, S.C., Oyang, Y.J.: Expediting model selection for support vector machines based on data reduction. In: Proc. IEEE International Conference on Systems, Man and Cybernetics (SMC 2003). (2003) 786–791
7. Oyang, Y.J., Hwang, S.C., Ou, Y.Y., Chen, C.Y., Chen, Z.W.: Data classification with radial basis function networks based on a novel kernel density estimation algorithm. IEEE Transactions on Neural Networks 16 (2005) 225–236
8. Wilson, D.R., Martinez, T.R.: Reduction techniques for instance-based learning algorithms. Machine Learning 38 (2000) 257–286
9. Bentley, J.: Multidimensional binary search trees used for associative searching. Communications of the ACM 18(9) (1975) 509–517
Study of the SMO Algorithm Applied in Power System Load Forecasting

Jingmin Wang and Kanzhang Wu

Department of Economy and Management, North China Electric Power University, Baoding 071003
[email protected], [email protected]

Abstract. A new methodology based on the sequential minimal optimization (SMO) algorithm for power system load forecasting is presented. To address the problem that support vector machines (SVM) cannot easily deal with large-scale data, this paper introduces a modified SMO algorithm that increases operational speed by using a single threshold value. Using actual load data from the distribution network of a domestic city, the load is forecast with support vector regression (SVR) based on the modified SMO algorithm and a proper kernel function. The forecast results are compared with those of SVR employing a quadratic programming (QP) optimization algorithm and with a BP artificial neural network, and it is shown that the presented forecasting method is more accurate and efficient.

Keywords: Support vector machine; sequential minimal optimization; kernel function; load forecasting.
1 Introduction

STLF (short-term load forecasting) plays an important role in the reliability of the electric power system and in economic operation [1]. Especially with the development of the electricity market, people attach more and more importance to STLF. Researchers at home and abroad have carried out a large number of studies and put forward various forecasting methods, such as time series methods, expert systems and artificial neural networks. However, because the load of an electric power system is a multidimensional nonlinear chaotic system, the above-mentioned methods have limitations in their convergence or applicability. Vapnik [2] and his co-workers proposed the SVM, which is considered a novel machine learning method. It overcomes inherent drawbacks of neural networks, such as local minima, overfitting, and architecture selection that depends heavily on experience. The SMO algorithm proposed by Platt [3] was designed for SVM classifier training. Because SMO uses a sub-problem of size two, each sub-problem has an analytical solution, so the SVM can be optimized without a QP solver. Fast computation and simple implementation are the significant features of the SMO algorithm. Alex Smola later generalized the SMO algorithm to regression problems, which made it possible to solve large-scale regression problems with
support vector machines. Many scholars have since revised Smola's SMO algorithm [4,5]. In this paper, we introduce a feasible SMO algorithm.
2 Support Vector Regression (SVR)

Consider a set of data points {(x_i, y_i)}, i = 1, 2, …, m, such that x_i ∈ ℝ^d is an input and y_i is a target output. The basic idea of support vector regression is to map the input vector into a high-dimensional feature space by a nonlinear mapping function Φ and then to perform linear regression in the feature space. This transformation is realized by a kernel function. The regression function can be written as

    f(x, w) = w·Φ(x) + b                                            (1)

The coefficients w and b are estimated by minimizing

    E(w) = C Σ_{i=1}^{n} (ξ_i + ξ_i*) + (1/2)||w||²                  (2)

subject to

    y_i − f(x_i, w) ≤ ε + ξ_i
    f(x_i, w) − y_i ≤ ε + ξ_i*
    ξ_i, ξ_i* ≥ 0

The slack variables ξ_i and ξ_i* are introduced for data that cannot be fitted by the function f within the precision ε. Introducing Lagrange multipliers, the model output is given by

    f(x, α) = Σ_{i=1}^{N} (α_i* − α_i) K(x_i, x) + b                 (3)
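Once the multipliers are known, evaluating the model output (3) is a plain kernel expansion. The sketch below is illustrative only: the RBF kernel choice, the coefficient values, and all names are our own assumptions:

```python
import math

def rbf(u, v, gamma=0.5):
    # Gaussian (RBF) kernel on scalars; an assumed kernel choice
    return math.exp(-gamma * (u - v) ** 2)

def svr_predict(x, support, coef, b, gamma=0.5):
    """Model output of eq. (3): f(x) = sum_i (alpha_i* - alpha_i) K(x_i, x) + b,
    where coef[i] stores the difference alpha_i* - alpha_i."""
    return sum(c * rbf(xi, x, gamma) for xi, c in zip(support, coef)) + b

# hypothetical fitted model with three support vectors
support = [0.0, 1.0, 2.0]
coef = [0.5, -0.2, 0.7]
b = 0.1
```

Only instances with a nonzero difference alpha_i* - alpha_i (the support vectors) contribute to the sum, which is what keeps the fitted model sparse.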
3 SMO Algorithm for Regression

3.1 SVR Based on SMO
Let λ_i = α_i* − α_i and λ̄_i = α_i* + α_i; the new values of λ_i obey the box constraint −C < λ_i < C, i = 1, 2, …, N. Substituting λ_i and λ̄_i into (3) and (4), the problem can be written as

    Minimize:   L_p(λ) = ε Σ_{i=1}^{N} λ̄_i − Σ_{i=1}^{N} λ_i y_i + (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} λ_i λ_j k_ij    (5)

    Subject to: −C < λ_i < C  and  Σ_{i=1}^{N} λ_i = 0

and the model output becomes

    f(x, λ) = Σ_{i=1}^{N} λ_i K(x_i, x) + λ_0                        (6)
Our goal is to express analytically the minimum of (5) as a function of two parameters. Let these two parameters have indices a and b, so that λ_a and λ_b are the unknowns. A superscript * indicates that a value is computed with the old parameter values, meaning that those portions of the expression are not functions of the new parameters. Because (5) can be expressed analytically as a function of the two parameters, we can obtain the analytic solution. However, (5) contains absolute values, which complicates the partial derivatives. Nevertheless, if we take d|x|/dx = sgn(x), the resulting derivative is algebraic:

    λ_b = λ_b* + (1/η) [ y_b − y_a + f_a* − f_b* + ε (sgn(λ_a) − sgn(λ_b)) ]     (7)

where η = k_bb + k_aa − 2k_ab. Equation (7) is the update rule for λ_b.

Computation of the threshold λ_0: if λ_a satisfies the constraint, then f_a = y_a, and

    λ_0^a = y_a − f_a* + (λ_a^old − λ_a^new) k_aa + (λ_b^old − λ_b^new) k_ab + λ_0*     (8)

If λ_b satisfies the constraint, then f_b = y_b, and

    λ_0^b = y_b − f_b* + (λ_a^old − λ_a^new) k_ab + (λ_b^old − λ_b^new) k_bb + λ_0*     (9)

Otherwise: λ_0^new = 0.5 (λ_0^a + λ_0^b).

KKT conditions:

    λ_i = 0            ⇔  |y_i − f_i| ≤ ε
    −C < λ_i ≠ 0 < C   ⇔  |y_i − f_i| ≈ ε                                              (10)
    |λ_i| = C          ⇔  |y_i − f_i| ≥ ε

If one or both multipliers violate the KKT conditions, the algorithm continues; when no parameter violates any KKT condition, the global minimum has been reached within machine precision.

3.2 The Complete SMO Algorithm

1. Select the first multiplier: the outer loop first iterates over the non-bound training subset. If a sample violates the KKT conditions, it is qualified for immediate optimization. If there are no such samples, the loop iterates over the entire training set.
2. Choose the sample with the maximum of (E_1 − E_2)/η as the second multiplier.
3. Compute λ_b^raw, clip it to the box constraint to obtain λ_b^new, and set λ_a^new = λ_a^old + λ_b^old − λ_b^new.
4. Update the threshold: if f_a = y_a, set λ_0^new as in equation (8); if f_b = y_b, set λ_0^new as in equation (9); otherwise λ_0^new = 0.5(λ_0^a + λ_0^b).
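For concreteness, the core update (7), followed by clipping to the box constraint of Sect. 3.1, fits in a few lines. This is a direct transcription of the formula, not the authors' implementation, and the variable names are ours:

```python
def sign(v):
    # sgn(x) as used in eq. (7)
    return (v > 0) - (v < 0)

def update_lambda_b(lam_a, lam_b, y_a, y_b, f_a, f_b, kaa, kbb, kab, eps, C):
    """One step of eq. (7): move lam_b to the analytic minimum of the
    two-variable subproblem, then clip it to the box [-C, C]."""
    eta = kbb + kaa - 2.0 * kab
    raw = lam_b + (y_b - y_a + f_a - f_b + eps * (sign(lam_a) - sign(lam_b))) / eta
    return max(-C, min(C, raw))
```

The clipping step is what keeps every intermediate solution feasible, which is why each size-two subproblem can be solved without a QP solver.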
4 Short-Term Load Forecasting Based on SVR

There are great differences between the load on workdays and at weekends, and the weather influences the load of the forecast day, so the training samples should include weather data. The kernel function used is the radial basis function. In the SVR algorithm, the parameters C and ε influence the generalization performance of the SVR. The value of C is selected between 10 and 100; when C is over 100, the under-learning phenomenon can result. The bigger ε is, the smaller the number of support vectors and the lower the estimation precision of the function, so ε is generally chosen between 0 and 1. In this paper, C is 50 and ε is 0.011.

4.1 The Process of Load Forecasting
1. Smooth pretreatment of the historical data;
2. Establish the forecast samples, including historical load data, temperature data, date-type attributes, etc., and establish the system model;
3. Calculate λ_i and λ_0 using the improved SMO algorithm;
4. Substitute λ_i and λ_0 into (6) and forecast the load for the near future using the forecasting samples.

4.2 Computational Example
Power load data from the Shijiazhuang region are used to verify the effectiveness of the model. The power load data from 0:00 on 5/8/2005 to 12:00 on 5/17/2005 are used as the training sample, and the power load data from 13:00 to 20:00 on 5/17/2005 as the testing sample. The average relative error is used to evaluate the accuracy of forecasting, and the training times and errors of the different methods are compared; the results are shown in Table 1.

Table 1. Comparison of the prediction results of SVR, BP and QP

Method   MAE%   RMSE%   Training time (s)
SVR      2.25   1.6       24
BP       3.10   2.6       86
QP       2.83   1.94     157
Basically the forecasting error is under 3%, and 80% of the errors are under 2%. The results show that the presented method has higher forecast accuracy than the BP artificial neural network and faster training than SVR employing the QP method.
5 Conclusions

In this paper, a new short-term load forecasting method is presented to enhance accuracy and cut down training time. In this method, the global QP problem is broken
down into a sequence of optimization sub-problems of size two. The results show that SVR based on the SMO algorithm is more accurate than the BP artificial neural network and trains faster than SVR employing the QP method. It not only avoids the BP method's tendency to fall into local minima but also provides a feasible solution for large datasets.
References

1. Li Yuancheng, Fang Tingjian, Yu Erkeng: Study of support vector machines for short-term load forecasting. Proceedings of the CSEE 23 (2003) 55–59
2. Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A., Vapnik, V.: Support vector regression machines. In: Mozer, M., Jordan, M., Petsche, T. (eds.): Advances in Neural Information Processing Systems 9. MIT Press, Cambridge, MA (1997) 155–161
3. Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.): Advances in Kernel Methods - Support Vector Learning. MIT Press (1998)
4. Osuna, E., Freund, R., Girosi, F.: An improved training algorithm for support vector machines. In: Proc. 1997 IEEE Workshop on Neural Networks for Signal Processing, Amelia Island, FL (1997) 276–285
5. Flake, G.W., Lawrence, S.: Efficient SVM regression training with SMO. Machine Learning 46 (2002) 271–290
Filtering Objectionable Image Based on Image Content

Zhiwei Jiang¹, Min Yao¹,², and Wensheng Yi¹

¹ College of Computer Science, Zhejiang University, Hangzhou 310027, P.R. China
² State Key Lab. for Software Engineering, Wuhan University, Wuhan 430072, P.R. China
[email protected]

Abstract. This paper proposes an effective system for detecting adult images. We treat this task as a two-class pattern classification problem. The system first applies a histogram color model to detect the skin regions, then extracts color, texture and shape features from the skin regions; these features are fed to an SVM to determine whether the input image is benign or not. Experimental results show that the proposed method achieves satisfactory classification performance at high speed, which makes it suitable for real-world applications.
1 Introduction

Pornographic content on the web is harmful to under-age net surfers, and effective ways are needed to keep children from accessing objectionable content. Maintaining a list of URLs is a feasible approach; the inconvenience is that the lists need to be updated frequently by humans. Another method is to utilize the text on the web page, but this may fail when there is no pornographic text or when the text is displayed as an image. Since nude images are the main characteristic of adult web sites, filtering nude images by means of image analysis techniques is a promising way. Forsyth's system [1] used a set of grouping rules to identify naked people; since there are innumerable human postures, a comprehensive set of grouping rules may be hard to construct. Wang's WIPE system [2] adopted matching techniques from the CBIR domain to screen objectionable images; in this system, the features repository is a crucial factor affecting accuracy and speed. Jones's system [3] yielded good performance in nude detection based purely on color attributes, but the color cue alone is insufficient to capture the full characteristics of images. For a practical system, both accuracy and speed are required. To achieve this goal, we refine objectionable image detection techniques based on the following three points: (1) Following Jones's work, we believe that color can be a powerful cue for detecting skin regions. (2) Many wildlife and landscape pictures resemble human skin in color, so shape and texture cues are necessary for further discrimination. (3) An appropriate learning scheme and classifier can improve both accuracy and speed. The basic structure of the proposed system is shown in Fig. 1. The remainder of this paper is organized as follows: Section 2 introduces the color model and skin filter; feature extraction is described in Section 3; the classifier is given in Section 4; experimental results are discussed in Section 5; and the work is concluded in Section 6.
Fig. 1. Basic structure of the nude image detection system: web image → skin filter → feature extraction → SVM classifier → nude / non-nude
2 Skin Detection

For the skin filter, we adopt the technique proposed by Jones and Rehg [3]. The skin pixel likelihood ratio is

    L(rgb) = P(rgb | skin) / P(rgb | ¬skin)

where P(rgb | skin) and P(rgb | ¬skin) are the probabilities that a given color with value rgb belongs to the skin and non-skin classes, respectively. A particular pixel with color rgb is classified as skin if L(rgb) ≥ θ, where θ is a threshold that can be adjusted to trade off correct detections against false positives. Since we do not depend solely on the color cue to detect nude images, we wish to accept as many of the probable skin pixels as possible, so we use a lower threshold value at the skin filter stage, though it may also allow some non-skin pixels to pass.
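The decision rule above is a plain likelihood-ratio test. As a sketch, with toy per-bin probabilities standing in for the 3-D color histograms of [3] (all values and names below are made up for illustration):

```python
def is_skin(rgb, p_skin, p_nonskin, theta=0.4):
    """Classify a quantized rgb bin as skin iff
    L(rgb) = P(rgb|skin) / P(rgb|nonskin) >= theta."""
    ps = p_skin.get(rgb, 1e-9)      # small floor for unseen bins
    pn = p_nonskin.get(rgb, 1e-9)
    return ps / pn >= theta

# toy histograms over quantized color bins (hypothetical values)
p_skin = {(224, 160, 128): 0.0300, (32, 96, 32): 0.0005}
p_nonskin = {(224, 160, 128): 0.0100, (32, 96, 32): 0.0200}
```

Lowering θ admits more of the probable skin pixels, which is exactly the trade-off exploited at the skin filter stage.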
3 Feature Description and Extraction

Based on the fact that there is a strong correlation between large patches of skin and nude images, some researchers use only the color cue to separate nude images from non-nude images [3]. We believe that relying solely on color information to identify nude images is insufficient, for the following main reasons: (1) Many wildlife and
Fig. 2. Typical images marked mistakenly as nude by a skin filter relying solely on the color cue. (a) tiger with skin resembling human skin in color. (b) image (a) after skin filtering with the threshold set to 0.6. (c) desert with color similar to human skin. (d) image (c) after skin filtering with the threshold set to 0.6.
landscape pictures resemble human skin in color. (2) Highly saturated or shadowed skin may be missed. (3) Images close to each other in color may actually be unrelated in semantics. To capture the main characteristics of an image, we compute a joint feature vector from the output of the skin detector; the feature vector consists of color, texture and shape information.
Filtering Objectionable Image Based on Image Content
1029
3.1 Color Information

The color features are computed directly from the output of the skin detector. They are: the percentage of pixels detected as skin, the number of connected skin components, the average probability of the skin pixels, the mean and standard deviation of the R, G, B values of skin pixels, and the entropy of the R, G, B values of skin pixels. Together these give a color feature vector of 12 features.

3.2 Texture Information

Texture is another important primitive visual cue. Different textures in an image are usually very apparent to a human observer, but to date there is still no good mathematical definition covering the diversity of textures. According to the work in [4], Gabor filter masks can be viewed as orientation- and scale-tunable edge and line detectors, and the statistics of these micro-features in a given region can be used to characterize the underlying texture. Gabor-wavelet-based texture is robust to orientation and illumination changes and is a powerful tool for extracting texture features. After filtering the image with Gabor wavelets at three scales and four orientations, we use the mean and standard deviation of the filter responses to characterize the texture, yielding twelve texture features. For an RGB image, each color channel contributes twelve features, so the texture feature vector contains 36 features.

3.3 Shape Information

To a human observer, shape is one of the most important features of an object, but capturing this spatial information is a tricky task. Traditionally, moments have been widely used in pattern recognition to describe geometrical shapes [5,6]. A good shape representation should be independent of position, size, and orientation. We compute the seven translation-, rotation-, and scale-invariant moments introduced in [5]; these moments use all the information in the shape boundary and interior region. The improved moments given in [7] are computed on the shape boundary only.
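The Gabor texture step of Sect. 3.2 can be sketched as follows. This is an illustrative numpy-only implementation, not the authors' code: the envelope width, carrier frequency and kernel size per scale are assumed values, and the sketch keeps mean and standard deviation for each of the 12 filter responses (how the paper collapses these into twelve features per channel is not fully specified).

```python
import numpy as np

def gabor_kernel(scale, theta, size=15):
    """Real Gabor kernel: Gaussian envelope times an oriented cosine carrier.
    sigma and freq per scale are illustrative choices, not the paper's values."""
    sigma = 2.0 * scale
    freq = 0.25 / scale
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # rotated coordinate along theta
    return np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * freq * xr)

def gabor_texture_features(img):
    """Mean and std of |response| for 3 scales x 4 orientations (24 values here)."""
    img = np.asarray(img, dtype=float)
    F = np.fft.fft2(img)
    feats = []
    for s in (1, 2, 3):
        for k in range(4):
            kern = gabor_kernel(s, k * np.pi / 4.0)
            # circular convolution via FFT, same-size output
            resp = np.real(np.fft.ifft2(F * np.fft.fft2(kern, img.shape)))
            feats += [np.abs(resp).mean(), np.abs(resp).std()]
    return np.array(feats)
```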
We use the Daubechies-4 wavelet to decompose the image into four frequency bands: LL, HL, LH and HH. A zero-crossing detector is then applied to the HL, LH and HH bands to detect the vertical, horizontal and diagonal edges E_v, E_h and E_d; the integrated edge image is obtained as E = sqrt(E_v^2 + E_h^2 + E_d^2). We then adopt the method in [7] to obtain 4 × 7 = 28 invariant moments from the E, E_v, E_h and E_d edge images, so the shape feature vector contains 28 + 7 = 35 features.
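A numpy-only sketch of the edge-integration step follows (our illustration): one level of the Daubechies-4 transform with periodic extension, using the standard D4 filter coefficients. The band-to-orientation mapping is an assumption, and the zero-crossing detector is omitted for brevity — the band values are combined directly.

```python
import numpy as np

# standard Daubechies-4 analysis lowpass, and its quadrature-mirror highpass
H = np.array([1 + np.sqrt(3), 3 + np.sqrt(3), 3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
G = H[::-1] * np.array([1.0, -1.0, 1.0, -1.0])

def analyze_rows(x, f):
    """Periodic filtering along rows followed by dyadic downsampling."""
    out = np.zeros(x.shape, dtype=float)
    for k, c in enumerate(f):
        out += c * np.roll(x, -k, axis=1)
    return out[:, ::2]

def dwt2_d4(img):
    """One separable decomposition level: returns LL, HL, LH, HH."""
    lo = analyze_rows(np.asarray(img, float), H)
    hi = analyze_rows(np.asarray(img, float), G)
    LL = analyze_rows(lo.T, H).T
    LH = analyze_rows(lo.T, G).T
    HL = analyze_rows(hi.T, H).T
    HH = analyze_rows(hi.T, G).T
    return LL, HL, LH, HH

def integrated_edge(img):
    # assumed mapping: HL ~ vertical, LH ~ horizontal, HH ~ diagonal edges
    _, Ev, Eh, Ed = dwt2_d4(img)
    return np.sqrt(Ev**2 + Eh**2 + Ed**2)
```

Because the periodized D4 transform is orthogonal, the four subbands together preserve the image energy, which is a convenient sanity check.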
4 Image Classification Based on SVM

The feature extraction stage described in the previous section produces a feature vector with 83 features for each image. In this stage, our task is to find a decision rule that optimally separates objectionable images from benign ones. Traditional learning techniques can do well at minimizing the training error, but this does not imply a small expected test error; in other words, they carry little
information about the generalization ability of the learning machine. SVM is a machine learning method based on the statistical learning theory proposed by Vapnik [8]. SVM pursues structural risk minimization (SRM) instead of empirical risk minimization (ERM), a principle that guarantees the generalization ability of the system. An SVM solves two-class problems by finding an optimal separating hyperplane. Our classification task is a binary problem, distinguishing nude images from non-nude ones, so SVM is an ideal classifier for it.
5 Experimental Results

We used 300 manually classified nude images and 1200 non-nude images to train an SVM classifier. All images were obtained from the Internet. The nude images include Caucasians, Asians and Blacks. The non-nude images are diverse, including humans, wildlife, buildings, scenery and so on. The SVM training process uses the SMO algorithm introduced in [9]. The classifier was then tested on a set of 2989 test images comprising 338 adult images and 2651 benign images. Recall and accuracy are used as the performance measures. Experimental results show that the proposed method achieves satisfactory performance at high speed, obtaining 97.3% recall and 98.09% accuracy. Table 1 summarizes the results.
Fig. 3. Typical misclassified images: (a) close-up face image; (b) pornographic image with little undress; (c) poorly exposed image.
Misclassification occurs mostly in three cases: close-up face images, pornographic images with little undress, and poorly exposed images, as shown in Fig. 3. To handle these cases, face detection and posture detection techniques are required. The average processing speed is about 14 images/s on a PC (Pentium IV 2.66 GHz CPU, 512 MB RAM). This high speed makes our algorithm practical for real-world applications.

Table 1. Performance of SVM classification

                          Correctly classified    Recall
Nude images (338)         329                     97.3%
Non-nude images (2651)    2603                    N/A
Overall accuracy: 98.09%
6 Conclusion

Our work develops techniques to filter adult images on the Internet based on image content. We first apply a histogram color model to detect the skin regions; then a joint feature vector containing color, texture and shape information is extracted. We use Gabor wavelets to extract texture features, the Daubechies-4 wavelet to obtain edge images, and invariant moments to extract shape features. The joint feature vector is then fed to an SVM to classify whether the input image is nude or not. Experimental results show that the proposed method achieves satisfactory performance at high speed, which makes it suitable for real-world applications.

At present, our SVM classifier is fixed once its training process is finished, which is not enough to classify an unbounded stream of new images. One improvement is to use much larger collections of training images; another is to incorporate a long-term learning scheme into our system, such as relevance feedback and SVM active learning techniques. The latter is what we are going to investigate. We will also incorporate face and posture detection techniques into a new version of our system.
Acknowledgement

This work is supported by the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP) under contract 20040335129, and the Natural Science Foundation of Zhejiang Province under contract Z104267.
References

1. Fleck, M., Forsyth, D.A., Bregler, C.: Finding Naked People. In: European Conf. on Computer Vision (1996) Vol. 2, 593-602
2. Wang, J.Z., Wiederhold, G., Firschein, O.: System for Screening Objectionable Images. Computer Communications 21 (1998) 1355–1600
3. Jones, M.J., Rehg, J.M.: Statistical Color Model with Application to Skin Detection. Proceedings of CVPR (1999) 274-280
4. Ma, W.Y., Manjunath, B.S.: Texture Features and Learning Similarity. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (1996) 425-430
5. Hu, M.K.: Visual Pattern Recognition by Moment Invariants. IEEE Transactions on Information Theory 8(2) (1962) 179-187
6. Prokop, R.J., Reeves, A.P.: A Survey of Moment-Based Techniques for Unoccluded Object Representation and Recognition. Graphical Models and Image Processing 54(5) (1992) 438-460
7. Chen, C.C.: Improved Moment Invariants for Shape Discrimination. Pattern Recognition 26 (1993) 683-686
8. Vapnik, V.: The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995)
9. Platt, J.C.: Fast Training of Support Vector Machines Using Sequential Minimal Optimization. http://research.microsoft.com/~jplatt
MRA Kernel Matching Pursuit Machine

Qing Li, Licheng Jiao, and Shuyuan Yang

Institute of Intelligent Information Processing and National Key Laboratory for Radar Signal Processing, Xidian University, Xi’an 710071, China
[email protected]

Abstract. The Kernel Matching Pursuit Machine (KMPM) is a relatively new learning algorithm that uses Mercer kernels to produce non-linear versions of conventional supervised and unsupervised learning algorithms. But the commonly used Mercer kernels cannot span a set of complete bases in the feature space (a subspace of the square-integrable space), so the decision function found by the machine cannot approximate an arbitrary objective function in feature space as precisely as possible. Multiresolution analysis (MRA) shows promise for both nonstationary signal approximation and pattern recognition, so we combine KMPM with the multiresolution analysis technique to improve the performance of the machine, and put forward an MRA shift-invariant kernel, which is shown to be an admissible Mercer kernel by theoretical analysis. An MRA kernel matching pursuit machine (MKMPM) is then constructed using the Shannon MRA shift-invariant kernel. A large number of comparative experiments show that the MKMPM is much more effective on regression and pattern recognition problems.
1 Introduction

The KMPM uses kernel methods to map the data in input space to a high-dimensional feature space in which the problem becomes linearly separable [1,2], and many kinds of kernels can be used in KMPM as long as they are admissible Mercer kernels [1], such as the RBF and polynomial kernels. However, the commonly used Mercer kernels cannot span a set of complete bases in the feature space (a subspace of the square-integrable space). Hence the decision function found by the machine cannot approximate an arbitrary objective function in feature space precisely, which means the decision function learned by KMPM is only an approximation to the objective function, not its reconstruction. The concept of multiresolution approximations of L2(R) and the way to link it to
orthonormal wavelet bases of L2(R) was introduced by Mallat and Zhang [3,4]. With this technique, the MRA functions can be shown to form a set of complete bases in the subspace of square-integrable functions, which gives great flexibility and validity in signal coding and information processing. Nowadays, multiresolution analysis is applied in many domains, such as pattern recognition, selective modification of signals, detection and classification. Given the success of multiresolution analysis in machine learning, it is worthwhile to study whether a better performance could be obtained by applying multiresolution analysis to KMPM. In this paper, an MRA shift-invariant kernel,

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1032 – 1036, 2006. © Springer-Verlag Berlin Heidelberg 2006
MRA Kernel Matching Pursuit Machine
1033
Shannon MRA kernel, is proposed and shown to be an admissible Mercer kernel by theoretical analysis. Numerical experiments show its validity in improving the performance of the machine.
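For context, the "basic" kernel matching pursuit of Vincent and Bengio [2] greedily adds one kernel column at a time to shrink the squared-loss residual. The following minimal numpy sketch is our illustration (the periodic back-fitting step is omitted), not the authors' implementation:

```python
import numpy as np

def kernel_matching_pursuit(K, y, n_basis=10):
    """Greedy KMP with squared loss: at each step, pick the column of the
    kernel matrix K (n_samples x n_candidates) that most reduces the residual,
    and fit its coefficient by 1-D least squares."""
    r = y.astype(float).copy()
    idx, beta = [], []
    for _ in range(n_basis):
        # optimal single-column least-squares gain for every candidate column
        gains = (K.T @ r) ** 2 / np.maximum((K ** 2).sum(axis=0), 1e-12)
        j = int(np.argmax(gains))
        b = (K[:, j] @ r) / (K[:, j] @ K[:, j])
        idx.append(j)
        beta.append(b)
        r -= b * K[:, j]          # update the residual
    return idx, np.array(beta), r
```

The learned function is then f(x) = Σ_k β_k K(x, x_k) over the selected support points, whatever Mercer kernel fills K.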
2 MRA Mercer Kernel

It is well known that the Shannon MRA is essentially the only regular MRA having the property of shift-invariance [5]. The Shannon MRA function can be expressed as

    φ_S(t) = 2^(−m/2) · sin(π(2^(−m) t − n)) / (π(2^(−m) t − n)),        (1)

where m, n ∈ Z. When using the Shannon MRA in pattern recognition, we give it a minor revision and rewrite it as

    φ_S(t) = sin(π(t/α − b)) / (απ(t/α − b)),        (2)
which is obtained by substituting α (the dilation factor) and b (the shift factor) for 2^m and n. Here α, b ∈ R. In order to exploit the advantages of multiresolution analysis, we propose an MRA Mercer kernel. First, one lemma must be cited.

Lemma 1 (Mercer dot-product MRA kernel [6]). Let φ(x) be an MRA function, and let a and b denote the dilation and shift factors, respectively. If x, x′ ∈ R^n, then the MRA dot-product kernel is

    K(x, x′) = ∏_{i=1}^{n} φ((x_i − b_i)/a) · φ((x_i′ − b_i)/a).        (3)
From Lemma 1, the Mercer MRA dot-product kernel of an MRA function φ(x) can be constructed directly, but the dilation and shift factors must be predefined. However, when kernel methods are used in machine learning, the lack of shift-invariance is especially undesirable, because choosing the parameters of the kernel function is quite difficult.

Lemma 2 (Mercer shift-invariant kernel [6]). The shift-invariant kernel K(x, x′) = K(x − x′) is an admissible Mercer kernel if and only if the Fourier transform of K(x) satisfies

    F[K](ω) = (2π)^(−n/2) ∫_{R^n} exp(−i ω·x) K(x) dx ≥ 0,        (4)

where x ∈ R^n.
1034
Q. Li, L. Jiao, and S. Yang
From Lemma 2, if we can design an MRA Mercer shift-invariant kernel, only the dilation factor needs to be predefined, so the difficulty of putting the MRA kernel to practical use is greatly decreased. In the following, a practical MRA shift-invariant kernel is designed with the MRA function chosen as the Shannon MRA function.

Theorem 3. Given the MRA function (2) and a dilation factor α ≥ 0, if x, x′ ∈ R^n, the MRA shift-invariant kernel is

    K(x, x′) = ∏_{i=1}^{n} φ_S(x_i − x_i′) = ∏_{i=1}^{n} sin(π(x_i − x_i′)/α) / (απ(x_i − x_i′)/α),        (5)
which is an admissible Mercer kernel.

Now, the regression function learned with the MRA kernel is

    f_R(x) = Σ_{k=1}^{N} β_k K(x, x_k) = Σ_{k=1}^{N} β_k ∏_{i=1}^{n} sin(π(x_i − x_{k,i})/α) / (απ(x_i − x_{k,i})/α),        (6)

and the decision function for pattern recognition is

    f_C(x) = sgn( Σ_{k=1}^{N} β_k K(x, x_k) ) = sgn( Σ_{k=1}^{N} β_k ∏_{i=1}^{n} sin(π(x_i − x_{k,i})/α) / (απ(x_i − x_{k,i})/α) ),        (7)

where x_k = (x_{k,1}, …, x_{k,n}), k = 1, …, N, are the support points.
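Eq. (5) is a per-dimension product of scaled sinc functions, so it can be evaluated directly with numpy's normalized sinc (np.sinc(u) = sin(πu)/(πu), with sinc(0) = 1 handled automatically); the α value is whatever cross-validation selects. A minimal sketch:

```python
import numpy as np

def shannon_mra_kernel(x, y, alpha):
    """Shannon MRA shift-invariant kernel of Eq. (5):
    K(x, y) = prod_i sin(pi*(x_i - y_i)/alpha) / (alpha*pi*(x_i - y_i)/alpha),
    i.e. a product over dimensions of sinc((x_i - y_i)/alpha) / alpha."""
    u = (np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) / alpha
    return float(np.prod(np.sinc(u) / alpha))
```

Note the kernel depends only on the difference x − y, which is exactly the shift-invariance that Lemma 2 requires.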
3 Validation of MKMPM's Effectiveness

In this paper, we compare the MRA kernel with the most commonly used kernel, the RBF kernel. In each experiment, the parameter p of the RBF kernel and the parameter α of the Shannon MRA kernel are selected by the widely used cross-validation procedure [7]. We use the following notation for the KMPM parameters: N — the maximum number of basis functions; stops — the KMPC stopping criterion (a predefined accuracy); fitN — KMPC performs back-fitting every fitN steps. In the regression tests we adopt the approximation error ess = ( Σ_{i=1}^{l} (y_i − f_i)² ) / l, and in both the regression and classification tests the KMPM uses the squared loss function. To avoid fluke results, each experiment was performed for 50
independent runs, and all experiments were carried out on a Pentium IV 2.6 GHz PC with 512 MB RAM using the Matlab 6.01 compiler.
3.1 Regression Experiments
A well-known dataset, the Boston housing dataset from the UCI machine learning repository1, is used in this experiment. Its input space has 13 dimensions. It contains 506 samples; some of them are taken as training examples and the rest as test examples.
Test parameters: RBF kernel K(x, y) = exp(−‖x − y‖² / (2p²)) with p = 3.32e+2; MRA kernel with α = 3.53e+2; KMPM parameters N = 180, stops = 0.01, fitN = 10. Table 1 lists the approximation errors obtained with the two kernels.

Table 1. Approximation Results of the Boston Housing Dataset
# Training    # Test    Kernel    # s.p.    Error
350           156       MRA       11        8.0121
                        RBF       7         8.0966
400           106       MRA       13        7.9801
                        RBF       7         8.0736
450           56        MRA       12        7.2311
                        RBF       7         7.8405
3.2 Experiments of Pattern Recognition
Waveform is a dataset from the UCI repository. It has 21 characteristic attributes, all containing noise, plus one class attribute. It contains three classes — waveform 0, waveform 1 and waveform 2 — with 5000 instances in total. When training the KMPM, we choose one class as the positive samples and the other two classes as the negative samples. The selection of the training samples is shown in Table 2, and the remaining samples are used as test data. In Table 2, "Team i" means the test on waveform i (i = 1 ∼ 3), and "+, −" denote positive and negative samples, respectively.

Table 2. Selection of the Training Data

Group    Team 1 (+ / −)    Team 2 (+ / −)    Team 3 (+ / −)
1        200 / 400         200 / 400         200 / 400
2        250 / 500         250 / 500         250 / 500

1 URL: http://www.ics.uci.edu/mlearn
We select the RBF kernel with p = 10 and the MRA kernel with α = 10. KMPM parameters: N = 120, stops = 0.05, fitN = 4. Table 3 lists the classification results obtained with the MRA kernel and the RBF kernel.

Table 3. Recognition Results of the Waveform Dataset

                   Recognition rates
Group    Kernel    Team 1    Team 2    Team 3
1        MRA       91.35%    92.01%    91.86%
         RBF       89.47%    89.83%    89.95%
2        MRA       92.65%    92.45%    93.01%
         RBF       90.12%    89.87%    90.56%
4 Concluding Remarks

The Kernel Matching Pursuit Machine (KMPM) is a class of learning algorithms that uses Mercer kernels to map the input space of a problem to a high-dimensional feature space in which the problem becomes linearly separable, while allowing better control of the sparsity of the solution. But the commonly used Mercer kernels cannot span a set of complete bases in the feature space, so the decision function found by the machine cannot approximate an arbitrary objective function in that space precisely. In this paper, the multiresolution analysis technique has been combined with the kernel matching pursuit machine to improve the performance of the machine, and the Shannon MRA shift-invariant kernel has been proposed and shown to be an admissible Mercer kernel by theoretical analysis. Finally, the MRA kernel matching pursuit machine (MKMPM) has been constructed, and our simulation results show the feasibility and validity of the MKMPM in regression and pattern recognition.
References

1. Engel, Y., Mannor, S., Meir, R.: The Kernel Recursive Least-Squares Algorithm. IEEE Trans. Signal Processing 52(8) (2004) 2275-2285
2. Vincent, P., Bengio, Y.: Kernel Matching Pursuit. Machine Learning 48 (2002) 165-187
3. Davis, G., Mallat, S., Zhang, Z.: Adaptive Time-Frequency Decompositions. Optical Engineering 33(7) 2183-2191
4. Mallat, S.: A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans. Pattern Anal. Machine Intell. 11 (1989) 674-693
5. Bastys, A.: Periodic Shift-Invariant Multiresolution Analysis. IEEE Digital Signal Processing Workshop Proceedings (1996) 398-400
6. Zhang, L., Zhou, W.D., Jiao, L.C.: Wavelet Support Vector Machine. IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics 34(1) (2004)
7. Kearns, M., Ron, D.: Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation. Proc. Tenth Conf. Comput. Learning Theory, ACM, New York (1997) 152-162
Multiclass Microarray Data Classification Using GA/ANN Method

Tsun-Chen Lin1, Ru-Sheng Liu1, Ya-Ting Chao2, and Shu-Yuan Chen1

1 Department of Computer Science and Engineering, Yuan Ze University, Nei-Li, Chung-Li, Taoyuan, 32026, Taiwan, ROC
[email protected], {csrobinl, cschen}@saturn.yzu.edu.tw
2 Graduate School of Biotechnology and Bioinformatics, Yuan Ze University, Nei-Li, Chung-Li, Taoyuan, 32026, Taiwan, ROC
[email protected]

Abstract. This work explores the use of gene expression data in discriminating heterogeneous cancers. We introduce a hybrid learning methodology that integrates genetic algorithms (GA) and artificial neural networks (ANN) to find optimal subsets of genes for tissue/cancer classification. The method was tested on two published microarray datasets: (1) the NCI60 cancer cell lines and (2) the GCM dataset. Experimental results on both datasets show that our GA/ANN method not only outperforms many reported prediction approaches, but also reduces the number of predictive genes needed in classification analysis.
1 Introduction

The analysis of gene expression profiles, which serve as molecular signatures for cancer classification, has become a challenging research topic in bioinformatics. A special characteristic of the microarray classification problem is that the datasets usually contain very few samples in an extremely high-dimensional gene space. In the past few years, many classification algorithms combining rank-based gene selection methods have been applied to 2-class to 4-class microarray datasets, and most of them achieve prediction accuracies close to 95-100%. However, when the problem is expanded to datasets that contain more than five classes, such as the 9-class NCI60 dataset [1] and the 14-class GCM dataset [2], the performance of these methods deteriorates significantly [3]. Instead, GA has been proposed to select the genes encoded in chromosomes so that correlations between features are considered on the actual classification task itself [4-7]. Consequently, these hybrid learning methodologies have shown better accuracy than rank-based methods in multiclass microarray classification. This leads us to combine the above two approaches for classifying more difficult datasets in two steps. First, in order to filter out irrelevant genes, the between-groups to within-groups sum of squares (BSS/WSS) ratio of Dudoit et al. [8] is computed to form small data subsets. Next, we apply GA/ANN in a way that considers correlations between genes and allows non-linear separation of the data in classification. Finally, GA/ANN achieved classification accuracies of 90.8% and 100% for

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1037 – 1041, 2006. © Springer-Verlag Berlin Heidelberg 2006
1038
T.-C. Lin et al.
the NCI60 data and the GCM data, respectively, and needed fewer genes than previously reported techniques.
2 Methods

To reduce the dimensionality of the gene space from 6128-14476 to 1000 for the NCI60 data and GCM data respectively, a preliminary selection keeps the genes with the highest BSS/WSS ratios to form small data subsets. We also randomly split the data into 30 perturbed versions, each divided in the ratio 2:1 (2/3 of the samples for training and 1/3 for testing). For feature selection, the genetic algorithms were adopted from Ooi and Tan [4], with toolboxes for two selection methods: stochastic universal sampling (SUS) and roulette wheel selection (RWS). In addition, two tuning parameters, the crossover rate Pc and the mutation rate Pm, control the one-point and uniform crossover operations used to evolve the population of individuals in the mating pool. A chromosome is represented by the string Si = [g1 g2 … g20], where g1, g2, …, g20 are the indices of 20 genes of the dataset, used as discriminatory genes and evaluated by the fitness function f(Si) = (1 − Et) × 100, where Et is the prediction error rate on the training set. The GA parameters used with the ANN are given by the following procedure.

Procedure GAANN
  for seeds = 1 to 10
    generate an initial population of 50 chromosomes, each of 20 genes
    while (the maximum of 100 generations is not reached)
      select a mating pool by stochastic universal sampling
      generate offspring from parents by uniform crossover with Pc = 1
      apply mutation to the offspring with Pm = 0.002
      replace the worst individuals with the offspring
      record as useful the gene subsets whose fitness score reaches 100
    end while
  end for

For the classification task, the feedforward perceptron algorithm of the ANN was developed by modifying that of Nørgaard [9] to set up the neural networks. Figure 1 shows the 3-layer winner-takes-all neural network with 20 neurons in the input layer and 10 neurons in the hidden layer.
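The GAANN loop can be sketched in numpy as follows. This is an illustrative simplification, not the authors' implementation: fitness-proportional sampling stands in for SUS, and the replace-worst-half policy is an assumption.

```python
import numpy as np

def ga_select(fitness, n_genes=1000, chrom_len=20, pop_size=50,
              gens=100, pm=0.002, seed=0):
    """GA over gene-index chromosomes: fitness-proportional mating pool
    (a simplification of SUS), uniform crossover, pointwise mutation,
    and replacement of the worst half of the population by offspring."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, n_genes, size=(pop_size, chrom_len))
    for _ in range(gens):
        fit = np.array([fitness(c) for c in pop], dtype=float)
        p = fit - fit.min() + 1e-9                      # shift to positive weights
        parents = pop[rng.choice(pop_size, size=pop_size, p=p / p.sum())]
        mask = rng.random(parents.shape) < 0.5          # uniform crossover
        kids = np.where(mask, parents, np.roll(parents, 1, axis=0))
        mut = rng.random(kids.shape) < pm               # mutation: resample index
        kids[mut] = rng.integers(0, n_genes, size=int(mut.sum()))
        worst = np.argsort(fit)[: pop_size // 2]        # replace worst half
        pop[worst] = kids[: pop_size // 2]
    fit = np.array([fitness(c) for c in pop], dtype=float)
    return pop[int(np.argmax(fit))]
```

In the paper's setting, fitness would be (1 − Et) × 100 from training and evaluating the ANN on the 20 selected genes; any toy scoring function can stand in to exercise the loop.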
The chromosomes, consisting of 20 features, represent patterns of training samples with known class labels y = 1, …, q that are fed into the input layer to build the classifier. The target is set as follows: if a sample belongs to class p, the output target of the p-th neuron is set to 1 and the other output neurons are set to 0. The presentation of training samples then induces a weight matrix under which the class-indicator neuron has the maximum activation. Thus, to predict the class of an unseen query sample of class label p, the system output should be the "winning unit", i.e., the neuron p with y_p > y_k for all k ≠ p, k = 1, …, q; otherwise the sample is misclassified. The output y_p is defined as

    y_p = F_p( Σ_{r=1}^{10} W_{pr} f_r( Σ_{i=1}^{20} w_{ri} g_i + w_{r0} ) + w_{p0} ),        (1)
Multiclass Microarray Data Classification Using GA/ANN Method
1039
where g_i are the neuron inputs and W_{pr} are the weights feeding the p-th output neuron. F(u) = tanh(u) is the hyperbolic tangent activation used by the neurons to perform a nonlinear mapping of their input.

Fig. 1. The neural architecture used by the GA/ANN method: the 20 chromosome genes g1 … g20 feed 20 input neurons, followed by 10 hidden neurons f_r(·) and the output neurons F_p(·), with bias neurons feeding the hidden and output layers.
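A minimal forward pass of Eq. (1) in numpy; the weight matrices here are placeholders, since the original networks are built and trained with Nørgaard's toolbox [9]:

```python
import numpy as np

def ann_predict(g, W_hid, b_hid, W_out, b_out):
    """Winner-takes-all forward pass of Eq. (1):
    y_p = F_p( sum_r W_pr * f_r( sum_i w_ri g_i + w_r0 ) + w_p0 ),
    with f and F both tanh. Returns the index of the winning output neuron."""
    h = np.tanh(W_hid @ g + b_hid)    # 10 hidden activations f_r
    y = np.tanh(W_out @ h + b_out)    # q output activations F_p
    return int(np.argmax(y))          # winner-takes-all decision
```

For a 9-class or 14-class problem, W_out simply has 9 or 14 rows; the argmax implements the y_p > y_k rule.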
3 Results

During the 300 runs of the GAANN procedure on the 30 datasets, we obtained almost 150,000 and 1,200,000 subsets of genes that fully separate the training samples with 100% accuracy for the NCI and GCM data, respectively. We then examined the frequency of membership of the genes and calculated the mean µ of the frequencies and the standard deviation σ of each feature in these near-optimal sets. The 1000 genes were subsequently rank-ordered according to the number of times each was selected, as shown in Figure 2(a). We first identified the 10 top-ranked genes by their GenBank accession numbers, namely W03157, AA045756, AA053504, AA033882, N98804, AA047106, W25510, H18563, W73203, and W47652 (σ ranging from 3.56 to 10.54) for the NCI data, and U43944, R33301, W68502, AA024428, T50576, S77393, X52332, D11922, AA412505 and CCNE (σ ranging from 2.38 to 2.97) for the GCM data, and used them to classify the 30 versions of test samples. As the result, the novel samples of the NCI data were correctly classified with an average accuracy of 90.8% (range 75-100%), and 100% (range 100-100%) classification accuracy was obtained for the GCM data. Furthermore, we also tested GA/ANN on the 4-class SRBCT dataset [10]. As expected, we obtained the same 100% prediction accuracy as Deutsch [5] and Lee et al. [11], whereas Cho et al. [12], who used the same method, achieved only 96% classification accuracy for the SRBCT data.

With a choice of only a few top genes the classification may not be reliable, whereas too many genes add noise to the classification. For both datasets, we varied the size of the input pattern in steps of 10, first using the top 10 genes, then adding the next 10 and so forth up to the top 200 genes, classifying the test samples of the 30 datasets to find the average classification performance. Figure 2(b) shows that the GA/ANN with the 10 top-scoring genes was good enough for classification, and that the classification accuracy dropped significantly
when more than 60 genes (σ ranging from 1.68 to 2.97) were used for the GCM data. In addition, we compare the classification accuracies achieved by GA/ANN with previously published methods in Table 1.
Fig. 2. The effect of the selected genes on classification accuracy: (a) selection frequency of the 1000 genes, sorted by frequency, for the NCI60 and GCM data; (b) average accuracy versus the number of top-scoring genes used.

Table 1. Results compared to some other reported methods

NCI60 data                                     GCM data
Method             Accuracy (%)  No. of genes  Method           Accuracy (%)  No. of genes
GA/ANN             90.8          10            GA/ANN           100           10
GA/MLHD [4]        80            13            GA/MLHD [4]      86            32
GA/SVM [7]         88.52         40            GA/SVM [7]       80.99         40
GA/KNN [6]         76.23         30            OVA/SVM [13]     78.26         16063
BSS/WSS/DLDA [8]   88.33         30            OVA/KNN [13]     54.34         100
4 Conclusion

The main issue in pattern recognition for microarray data classification is to identify genes useful for accurate classification. Our belief is that such a hybrid learning system identifies more discriminatory feature subsets than existing methods, for two reasons. First, GA/ANN, which embeds the gene selection process in the classification task itself, can discover more informative genes than rank-based gene selection methods. Second, neural networks, which capture the nonlinear relationships between genes when discriminating samples, are more powerful than linear discriminant analyses. The results not only show good classification accuracy but also require the fewest discriminatory genes, and we hope the method will be useful for microarray data analysis in cancer diagnosis.

Acknowledgments. This work was partially supported by the National Science Council of Taiwan, R.O.C., under Grant NSC-95-2745-E-155-008-URD.
References

1. Ross, D.T., Scherf, U., Eisen, M.B., Perou, C.M., Rees, C., Spellman, P., Iyer, V., Jeffrey, S.S., Van de Rijn, M., Waltham, M., Pergamenschikov, A., Lee, J.C., Lashkari, D., Shalon, D., Myers, T.G., Weinstein, J.N., Botstein, D., Brown, P.O.: Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics 24 (2000) 227–235
2. Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C.H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J., Poggio, T. et al.: Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences USA 98 (2001) 15149–15154
3. Li, T., Zhang, C., Ogihara, M.: A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20 (2004) 2429–2437
4. Ooi, C.H., Tan, P.: Genetic algorithms applied to multi-class prediction for the analysis of gene expression data. Bioinformatics 19 (2003) 37–44
5. Deutsch, J.M.: Evolutionary algorithms for finding optimal gene sets in microarray prediction. Bioinformatics 19 (2003) 45–52
6. Thanyaluk, J.U., Stuart, A.: Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes. BMC Bioinformatics 6:148 (2005) 1–11
7. Liu, J.J., Cutler, G., Li, W., Pan, Z., Peng, S., Hoey, T., Chen, L., Ling, X.B.: Multiclass cancer classification and biomarker discovery using GA-based algorithms. Bioinformatics 21 (2005) 2691–2697
8. Dudoit, S., Fridlyand, J., Speed, T.P.: Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association 97 (2002) 77–87
9. Nørgaard, M.: Neural Network Based System Identification Toolbox. Tech. Report 00-E-891, Department of Automation, Technical University of Denmark (2000)
10. Khan, J., Wei, J.S., Rigner, M., Saal, L.H., Ladani, M., Westermann, F., Berthold, F., Schwab, M., Antonescus, C.R., Peterson, C., Meltzer, P.S.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 7 (2001) 673–679
11. Lee, Y., Lee, C.K.: Classification of multiple cancer types by multicategory support vector machines using gene expression data. Bioinformatics 19 (2003) 1132–1139
12. Cho, H.S., Kim, T.S., Wee, J.W., Jeon, S.M., Lee, C.H.: cDNA Microarray Data Based Classification of Cancers using Neural Networks and Genetic Algorithms. Nanotech 1 (2003) 28–31
13. Yeang, C.H., Ramaswamy, S., Tamayo, P., Mukherjee, S., Rifkin, R.M., Angelo, M., Reich, M., Lander, E., Mesirov, J., Golub, T.: Molecular classification of multiple tumor types. Bioinformatics 1 (2001) 1–7
Texture Classification Using Finite Ridgelet Transform and Support Vector Machines*

Yunxia Liu, Yuhua Peng, and Xinhong Zhou

School of Information Science and Engineering, Shandong University, Jinan, Shandong 250100, China
[email protected], [email protected], [email protected]

Abstract. Based on an analysis of the energy distribution of FRIT coefficients, a novel feature extraction method of low computational complexity in the FRIT domain is proposed for texture classification. A 'one-against-one' multi-class SVM with an RBF kernel is adopted as the classifier. Experiments carried out on large texture databases of varying sizes demonstrate the method's validity.
1 Introduction In recent decades, texture classification has received considerable attention in various application areas ranging from industrial automation to medical diagnosis etc. Texture analysis methods can be classified into four primary categories, namely statistical, geometrical, model-based and signal processing-based approaches [1]. Lots of wavelet-based methods were proposed and have achieved good results due to their multi-resolution characteristics. However, research on Human Visual System (HVS) revels that it relies on edges, contours more than pixels in information acquisition. Thus 2-D separable wavelet loses its superiority. New transform schemes with better geometrical presentation ability are expected to arise. Based on the statistical study of energy distribution in FRIT domain, a transform domain subband division scheme is obtained. Subband statistics are then extracted as features for classification. Compared with former feature extraction work done by Shutao Li [2] and Arivazhagan [3] in Ridgelet domain, computation complexity of our method is greatly reduced. Besides, FRIT has a much smaller coefficient matrix, thus leads to the possibility of more precise subband division in transform domain. Support vector machine (SVM) is adopted as classifier in this paper. It outperforms the traditional risk-based approaches and has demonstrated excellent performance in a variety of pattern recognition problems in various practical problems [4].The Radial Basis Function (RBF) is used as kernel function, and the ‘one against one’ strategy is adopted for multi-class classification problem. *
This project is sponsored by SRF for ROCS, SEM (2004.176.4) and NSF SD Province (Z2004G01) of China.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1042 – 1046, 2006. © Springer-Verlag Berlin Heidelberg 2006
Texture Classification Using Finite Ridgelet Transform and SVM
2 Feature Extraction Using Finite Ridgelet Transform

The FRIT is a multiscale orthogonal transform pioneered by Do and Vetterli for better representation of linear singularities [5]. After the Finite Radon Transform (FRAT) maps linear singularities in the image domain to point singularities in the FRAT domain, a collection of orthogonal transforms is applied column-wise to the FRAT coefficients to accomplish the FRIT. The resulting transform is invertible, non-redundant, and leads to a family of directional orthogonal bases for digital images.

To compute the FRIT of an image of dyadic size n × n, one first expands the image to a prime size p × p, where p is the minimum prime number larger than n. A fast implementation of the FRAT described in [6] involves only p² multiplications, and its coefficient matrix is p × (p + 1). The DRT adopted in [3] has a coefficient matrix twice the size of the input image. The computational complexity of the iterative algorithm adopted in [7] is O(n² log(n²)), and its coefficient matrix is four times over-complete. As the inequality O(p²) ≈ O(n²) < O(n² log n²) holds for all natural numbers, we conclude that the FRIT has advantages over the existing methods in terms of both computational complexity and coefficient over-completeness, which benefits its application in feature extraction.

We investigate the statistical properties of the FRIT for subband division. Firstly, the energy distribution among FRIT columns is examined. Fig. 1(a) displays the typical normalized energy distribution obtained for texture D20 in the Brodatz album. In fact, for almost every texture image, energy tends to concentrate in the middle (45°) column and the two end (0° and 90°) columns of the FRIT, while the energy of the other columns is relatively small and randomly distributed. The definition of FRAT lines helps explain this: the 0°, 45°, and 90° FRAT lines coincide with natural straight lines in the image and are therefore more likely to present energy peaks.
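As a concrete illustration of the FRAT step described above, the following sketch (ours, not the authors' code) computes the p × (p + 1) coefficient matrix for a prime-size image. It assumes the standard finite line definition j = (k·i + l) mod p for the p "slope" directions plus one vertical direction; the column-wise orthogonal transform of the FRIT would then be applied to this matrix.

```python
import numpy as np

def frat(f):
    """Finite Radon transform of a p x p image (p prime).

    out[l, k] sums f over the finite line with slope k and offset l,
    j = (k*i + l) mod p; column k = p holds the row sums (the
    'vertical' direction). The result has the p x (p + 1) size quoted
    in the text, normalized by 1/sqrt(p)."""
    p = f.shape[0]
    out = np.zeros((p, p + 1))
    i = np.arange(p)
    for k in range(p):
        for l in range(p):
            out[l, k] = f[i, (k * i + l) % p].sum()
    out[:, p] = f.sum(axis=1)
    return out / np.sqrt(p)
```

Because each direction partitions the p² pixels, every column of the output sums to the same value (the image sum divided by √p), which is a quick sanity check for an implementation.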
Secondly, a three-level DWT decomposition (assumed as the default setup in the sequel) is applied to all FRAT projections for a further examination of the energy distribution within FRIT columns. Our experiments reveal that the low-frequency approximation coefficients concentrate most of the energy within each FRIT column, which is consistent with wavelet theory. Guided by the idea that areas holding more energy carry more information and should be covered with finer subbands, we propose an FRIT-based subband division method, see Fig. 1(b). The cases of 32 × 32 and 128 × 128 images are similar and are omitted for lack of space.

Fig. 1. (a) Normalized energy distribution (n = 128); (b) subband division (n = 64).

Y. Liu, Y. Peng, and X. Zhou
3 Support Vector Machine for Classification

Assume the training set D = {(x_i, y_i), i = 1, 2, …, l}, where each input training vector x_i ∈ Rⁿ comes from one of two classes and the output label y_i ∈ {+1, −1}. Firstly, the training vectors x are mapped into a Hilbert space F via a nonlinear map φ: Rⁿ → F. The SVM then constructs a classifier with a maximized separation gap between the positive and negative examples by solving the following primal problem:

min_{w,b,ε}  (1/2) wᵀw + C Σ_{i=1}^{l} ε_i    (1)

subject to the constraints y_i(wᵀφ(x_i) + b) ≥ 1 − ε_i, ε_i ≥ 0, i = 1, …, l.

For the extension of the binary SVM to multi-class classifiers, Hsu and Lin [8] compared the performance of three commonly used strategies and concluded that the 'one-against-one' and DAGSVM methods outperform the others for practical use. The 'one-against-one' voting strategy is adopted in our method: for a k-class problem, C(k, 2) = k(k − 1)/2 classifiers are constructed, each in charge of classifying data from two different classes.
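The 'one-against-one' scheme can be sketched as follows. For brevity the pairwise binary classifier is a nearest-class-mean rule standing in for a binary SVM (an assumption for illustration; any binary classifier, including the SVM of Eq. (1), fits the same slot), and the final label is chosen by majority vote over the k(k − 1)/2 pairwise decisions.

```python
import numpy as np
from itertools import combinations

def train_one_against_one(X, y):
    """Train one binary model per class pair: k(k-1)/2 models in total.
    Each 'model' here is just the pair of class means (a stand-in for
    a pairwise binary SVM)."""
    classes = np.unique(y)
    models = {(a, b): (X[y == a].mean(axis=0), X[y == b].mean(axis=0))
              for a, b in combinations(classes, 2)}
    return classes, models

def predict_one_against_one(classes, models, x):
    """Majority vote over all pairwise decisions."""
    votes = dict.fromkeys(classes.tolist(), 0)
    for (a, b), (ma, mb) in models.items():
        winner = a if np.linalg.norm(x - ma) <= np.linalg.norm(x - mb) else b
        votes[winner] += 1
    return max(votes, key=votes.get)
```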
4 Method and Procedures

We adopt the database in [2] as the first database. Sixteen natural textures (i.e., Brick.0000, Fabric.0007, 0008, 0013, 0017, 0018, Flowers.0005, 0006, 0007, Grass.0001, Metal.0000, 0002, 0004, Misc.0001, Sand.0000, 0002) of size 512 × 512 from VisTex are selected as the second database. This is a challenging task, as some of these textures are very similar to each other. Completely separate training and testing sets are required in our experiment. Fig. 2 depicts the classification steps.
Fig. 2. Texture classification diagram: (a) texture training (training texture images → FRIT → feature extraction → feature library); (b) SVM classification (testing texture images → FRIT → feature extraction → trained SVM → pattern label).
4.1 Feature Extraction and Performance Evaluation

Each image is divided into non-overlapping 64 × 64 subimages, resulting in 1000 and 1024 subimages for the two databases, respectively. A parameter ratio ∈ (0, 1) is introduced to control the sizes of the training and testing datasets. Local energies

E_i = (1 / (M × N)) Σ_{x=1}^{M} Σ_{y=1}^{N} I_i²(x, y)    (2)

from each FRIT subband (indexed by i, i = 1, 2, …, 21) are then used as texture features, where M and N denote the subband size. The average of each subimage, subtracted in the first step of the FRAT, is appended to the feature vector as well. Thus a 22-dimensional feature vector is extracted for each 64 × 64 image.

As suggested by previous work in [2], combining the DRT with the DWT helps improve classification performance, so DWT features are also included in our method. The mean and standard deviation of the DWT detail subbands (LH_k, HL_k, HH_k for k = 1, 2, 3) are calculated according to the following formulas [9]:

Mean(m) = (1/N²) Σ_{i,j=1}^{N} d(i, j),   Std = sqrt( (1/N²) Σ_{i,j=1}^{N} (d(i, j) − m)² )    (3)
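The subband features of Eqs. (2) and (3) amount to a mean-square energy and first/second-order statistics per subband; a minimal sketch:

```python
import numpy as np

def subband_energy(c):
    """Local energy of a subband, Eq. (2): mean of the squared coefficients."""
    M, N = c.shape
    return (c ** 2).sum() / (M * N)

def detail_mean_std(d):
    """Mean and standard deviation of a DWT detail subband, Eq. (3)."""
    n2 = d.size
    m = d.sum() / n2
    std = np.sqrt(((d - m) ** 2).sum() / n2)
    return m, std
```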
Performance is evaluated by computing the classification gain

G(%) = (C_corr / M) × 100%,    (4)

where C_corr is the number of images correctly classified and M is the total number of images belonging to that particular class.

4.2 Experiment Results

An RBF kernel function is used here, as preliminary results suggest that it outperforms other kernels on our texture images. The gamma value is set to 0.04 and C = 5; both can be adjusted to achieve better performance. Classification results on the first database are given in Table 1, together with the results reported in [2]. An obvious improvement in classification gain can be observed, while the FRIT-based methods have lower computational complexity.

Table 1. Comparison of different feature extraction methods in terms of classification gain

Ratio    DWT      DRT      COM      FRIT     FRIT&DWT
12.5%    79.8%    75.9%    84%      86.3%    93.6%
25%      84.9%    84.9%    91.4%    91.2%    96.2%
50%      90.3%    90.1%    94.6%    91%      98.2%
75%      92.4%    91.3%    95.7%    92%      98.4%
The experiment on the second database employs a subtler adjustment of ratio. As shown in Fig. 3(a), the FRIT-based classification method outperforms the DWT-based one by an average of 3.16 percentage points, while the combined FRIT and DWT method gives the best performance. As training data are always scarce in practical applications, a closer view of (a) is given in (b). The combined method achieves an average advantage of 1.42 percentage points in classification gain over [2]. A compromise between these two methods can be chosen according to practical application requirements.
Fig. 3. Classification gain comparison on database two (FRIT, DWT, and FRIT&DWT; classification gain (%) vs. proportion of samples used for training (%)): (a) training proportions up to 80%; (b) close-up of small training proportions (1.56% to 15.6%).
5 Conclusions

An FRIT-based feature extraction method with low computational complexity for texture classification is proposed in this paper. A multi-class SVM constructed with the 'one-against-one' strategy and an RBF kernel is adopted as the classifier. Experiments carried out on extensive databases demonstrate the validity of the method, with improved classification gain and satisfactory overall results.
References

1. Tuceryan, M., Jain, A.K.: Texture analysis. In: Handbook of Pattern Recognition and Computer Vision, Chapter 2.1. World Scientific (1998) 207-248
2. Li, S., Li, Y., Wang, Y.: Combining wavelet and ridgelet transforms for texture classification using support vector machines. In: Proc. Int. Symposium on Intelligent Multimedia, Video and Speech Processing (2004) 442-445
3. Arivazhagan, S., Ganesan, L., Subash Kumar, T.G.: Texture classification using ridgelet transform. In: Proc. 6th ICCIMA (2005) 321-326
4. Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2(2) (1998) 121-167
5. Do, M.N., Vetterli, M.: The finite ridgelet transform for image representation. IEEE Transactions on Image Processing 12(1) (2003) 16-28
6. Matus, F., Flusser, J.: Image representation via a finite Radon transform. IEEE Transactions on PAMI 15(10) (1993) 996-1006
7. Donoho, D.L., Flesia, A.G.: Digital ridgelet transform based on true ridge functions. Technical Report 31 (2001)
8. Hsu, C.-W., Lin, C.-J.: A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks 13(2) (2002) 415-425
9. DeBrunner, V., Kadiyala, M.: Texture classification using wavelet transform. In: Proc. 42nd Midwest Symposium on Circuits and Systems, vol. 2 (1999) 1053-1056
Reduction of the Multivariate Input Dimension Using Principal Component Analysis

Jianhui Xi¹ and Min Han²

¹ Department of Automation, Shenyang Institute of Aeronautical Engineering, Liaoning, China, 110034
[email protected]
² School of Electronic and Information Engineering, Dalian University of Technology, Liaoning, China, 116023
Abstract. Existing methods have limitations in modeling multivariate time series because defining the input components is highly difficult. The main purpose of this paper is to extend the principal component analysis (PCA) method to extract the joint information of multiple variables. First, both the linear correlations and the nonlinear correlations are detected to initialize an embedding delay window that contains enough information for prediction. Then, the PCA method is extended to extract the joint information of multiple variables in a complex system. Finally, a neural network makes predictions by approximating both the functional relationship between different variables and the map between the current state and the future state.
1 Introduction

Most previously published prediction methods concentrate on the modeling of univariate time series. The main idea involves two steps: reconstruct a phase space from the data [1], and find a functional relationship between the current state and the future state [2]. In the first step, the embedding dimension m and the time delay τ are computed so that, in the m-dimensional space, the time-lagged vectors describe over time an object that is topologically equivalent to the conjectured attractor of the physical system. However, for a practical complex system, the internal dynamics is often contained in multiple model variables. When a vectorial time series is available, it is possible to exploit the joint information to obtain a better reconstruction, which in turn also produces a predictive improvement. In fact, once the embedding vectors have been built reasonably, the prediction procedure is exactly the same as in the univariate case. But the joint information is highly difficult to extract from different time series. Cao et al. [3] select m and τ from a large number of input vector sets according to the prediction precision, which requires heavy computation. Some previous research [4] suggests that it may be more appropriate to fix the reconstruction window, T = τ(m − 1), rather than τ alone. Considering that, for time series prediction, recent delays are more important than older delays, τ is usually small. Consequently, m is usually large, especially for multivariate prediction, which may give the model an excessive number of highly correlated inputs. This problem can be solved by reducing

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1047 – 1051, 2006. © Springer-Verlag Berlin Heidelberg 2006
the dimension of the delay vector with the principal component analysis (PCA) method. Liang et al. [5] prove the optimal performance of the proper orthogonal bases given by the singular value decomposition (SVD); therefore, PCA realized by SVD extracts the principal components of a stochastic process. In this paper, to include enough prediction information in the input vector, a linear function and a nonlinear function are respectively used to detect the linear and the nonlinear correlations in the data [1], and an initial embedding delay window is defined. Then PCA is used to reconstruct the input vectors, and neural networks are used to model the relationship between inputs and outputs. In the following sections, Section 2 describes the methodology and the basic model structure, Section 3 describes a simulation using the proposed method, and Section 4 gives a summary and discussion.
2 Basic Model Structure

More than one time series can be observed in the modeling of a complex system. Figure 1 shows a schematic representation of the basic prediction model with n variables, where x1(t), …, xn(t), t = 1, …, N, are the observed time series, y1(t), …, yn(t) are the outputs of the prediction model, and e1(t), …, en(t) denote the errors. m1, …, mn are the embedding dimensions corresponding to each variable. Taking a system including two variables as an example, the initial input vector is built as

x(t) = {x1(t), …, x1(t − (m1 − 1)τ), x2(t), …, x2(t − (m2 − 1)τ)}ᵀ = {x1(t)ᵀ, x2(t)ᵀ}ᵀ    (1)
First, in order to retain enough prediction information for each variable, a delay window T_im = τ_i(m_i − 1) (i = 1, 2) is first fixed. Here a linear function and a nonlinear function are respectively used:

Φ_xx(T_im) = E{[x_i(t) − x̄_i(t)][x_i(t − T_im) − x̄_i(t)]}    (2)

Φ_x²x²(T_im) = E{[x_i²(t) − x̄_i²(t)][x_i²(t − T_im) − x̄_i²(t)]}    (3)

Fig. 1. Structure of basic prediction model

where Φ_xx(T_im) can only detect the linear correlations, while Φ_x²x²(T_im) is designed to detect nonlinear correlations in the data. E{·} is the mathematical expectation and the overbars indicate averaging with respect to time. Assuming that T_im^x is the time of the first maximum of Φ_xx(T_im), and that T_im^{x²} is defined analogously for Φ_x²x²(T_im), choose

T_im = max(T_im^x, T_im^{x²}) − 1    (4)
The data within T_im thus contain the linear and nonlinear correlations of the series.

Second, PCA is implemented separately for x₁(t) and x₂(t). Because a τ_i suitable for all model variables is difficult to obtain, we often choose τ_i = 1, which requires the lowest functional complexity. If z₁(t) denotes the principal components of x₁(t), and z₂(t) those of x₂(t), the principal components z(t) of the input vector are

z(t) = {z₁(t)ᵀ, z₂(t)ᵀ}ᵀ    (5)

z_i(t) ≈ x_i(t) Ṽ    (6)

where Ṽ (m_i × r) consists of the first r columns of V, which is obtained from

U Σ Vᵀ = X_i    (7)

Σ = diag[ s₁ s₂ … s_p 0 … 0 ]    (8)

Here X_i (l × m_i) contains the original x_i(t), with l = N − max_i[(m_i − 1)τ_i].
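The SVD-based projection of Eqs. (6)–(8) can be sketched with numpy. The number r of retained columns is chosen here from a cumulative energy threshold eta0; the specific threshold value is an assumption, as the paper leaves it to the user.

```python
import numpy as np

def pca_reduce(X, eta0=0.9):
    """Project delay vectors X (l x m) onto their first r principal
    directions, Eqs. (6)-(8): U S V^T = X, keeping the first r columns
    of V whose cumulative energy contribution s_j^2 / sum(s_i^2)
    exceeds eta0."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    eta = s ** 2 / (s ** 2).sum()          # energy contributions
    r = int(np.searchsorted(np.cumsum(eta), eta0)) + 1
    return X @ Vt[:r].T, r                 # z ~ x V~, Eq. (6)
```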
The energy contribution η_j of a principal component is estimated as

η_j = s_j² / Σ_{i=1}^{p} s_i²,   j = 1, 2, …, p    (9)

and the corresponding first r principal components are retained if Σ_{j=1}^{r} η_j > η₀.

f(x, α, δ) = |x|^α sgn(x),   when |x| > δ,
f(x, α, δ) = x / δ^(1−α),    when |x| ≤ δ,    (5)

where α, δ are the parameters of the nonlinear function. Usually, α is between 0 and 1 (0 < α ≤ 1); when α = 1, it is the linear function f(x) = x. δ is a small positive number applied to create a small linear area in this nonlinear function when x is around zero, and e(t) is the error. The nonlinear function f(x, α, δ) gives a high gain for small x and a small gain for large x.

3.2 The Neuron Controller
According to the neuron model and the learning strategy described above, the neuron model-free control method is proposed as follows [5]:

u(t) = K Σ_{i=1}^{n} w_i(t) x_i(t) / Σ_{i=1}^{n} w_i(t),
w_i(t + 1) = w_i(t) + d e(t) u(t) x_i(t),    (6)

where y(t) is the output of the plant, u(t) is the control signal produced by the neuron, e(t) = r(t) − y(t) with r(t) the set point, and the neuron inputs x_i(t) can be selected according to the demands of the control system design.

3.3 The Neuron Based Nonlinear PID Control Method
The neuron is used to construct the nonlinear PID controller by combining the advantages of the neuron controller and the conventional nonlinear PID controller. The neuron based nonlinear PID control system is set up and shown in Fig.2.
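The nonlinear function f(x, α, δ) used to shape the three controller inputs can be sketched as follows. The printed form of Eq. (5) is partly garbled in this copy, so the commonly used power-law form with a linear band near zero is assumed; it matches every stated property (f = x when α = 1, linear for |x| ≤ δ, high gain for small errors, low gain for large ones).

```python
import math

def f(x, alpha, delta):
    """Nonlinear gain: |x|**alpha * sign(x) outside the linear band
    |x| <= delta, and x / delta**(1 - alpha) inside it (the two
    branches are continuous at |x| = delta)."""
    if abs(x) > delta:
        return abs(x) ** alpha * math.copysign(1.0, x)
    return x / delta ** (1.0 - alpha)
```

With α = 0.5 and δ = 0.1, f amplifies a small error (f(0.01) ≈ 0.032, a gain above 3) while attenuating a large one (f(4) = 2, a gain of 0.5), which is the "high gain for small x, small gain for large x" behavior described above.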
Fig. 2. The neuron-based nonlinear PID control system: the error e(t) = r(t) − y(t) is shaped by f(e), f(∫e dt), and f(ė); the three signals are weighted by w1(t), w2(t), and w3(t), summed, and scaled by K to give the control u(t) applied to the plant, while the learning algorithm adjusts the weights.
In this control system, the neuron-based nonlinear PID controller is constructed by selecting the neuron inputs, and the control action of the nonlinear PID controller is determined by modifying the neuron weights on-line.

N. Wang and J. Yu

The neuron-based nonlinear PID controller produces the control signal in a model-free way. It is designed as follows: the inputs are selected as

x1(t) = f(e(t), α_P, δ_P),  x2(t) = f(∫e(t)dt, α_I, δ_I),  x3(t) = f(ė(t), α_D, δ_D),    (7)

where the nonlinear function f(·) is defined as in Eq. (5) and e(t) = r(t) − y(t). From Eqs. (6) and (7), we have

u(t) = K w̃1(t) x1(t) + K w̃2(t) x2(t) + K w̃3(t) x3(t)
     = K_P(t) f(e(t), α_P, δ_P) + K_I(t) f(∫e(t)dt, α_I, δ_I) + K_D(t) f(ė(t), α_D, δ_D),    (8)

where w̃_i(t) = w_i(t) / Σ_{j=1}^{3} w_j(t), and K_P(t) = K w̃1(t), K_I(t) = K w̃2(t), K_D(t) = K w̃3(t). Thus Eqs. (6) and (7) constitute the neuron-based nonlinear PID controller. According to the description above, the neuron-based nonlinear PID control law is

u(t) = K Σ_{i=1}^{3} w_i(t) x_i(t) / Σ_{i=1}^{3} w_i(t),
w_i(t + 1) = w_i(t) + d_i e(t) u(t) x_i(t),
x1(t) = f(e(t), α_P, δ_P),  x2(t) = f(∫e(t)dt, α_I, δ_I),  x3(t) = f(ė(t), α_D, δ_D),    (9)

where K, d_i (i = 1, 2, 3), α_P, δ_P, α_I, δ_I, α_D, δ_D are constants to be determined.
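A minimal realization of the control law (9), with the nonlinear gain of Eq. (5) taken in its assumed power-law form; the discrete accumulation of the integral and the backward difference for ė are implementation choices not fixed by the paper.

```python
import math

def fal(x, alpha, delta):
    # nonlinear gain of Eq. (5), power-law form assumed
    if abs(x) > delta:
        return abs(x) ** alpha * math.copysign(1.0, x)
    return x / delta ** (1.0 - alpha)

class NeuronNonlinearPID:
    def __init__(self, K, d, alphas, deltas, w0=(0.3, 0.3, 0.3), dt=1.0):
        self.K, self.d = K, list(d)          # gain and learning rates d_i
        self.alphas, self.deltas = alphas, deltas
        self.w = list(w0)                    # neuron weights w_i(t)
        self.dt, self.integral, self.e_prev = dt, 0.0, 0.0

    def step(self, r, y):
        """One control step of Eq. (9): shape e, its integral, and its
        derivative with fal, form the normalized weighted sum, then
        update the weights."""
        e = r - y
        self.integral += e * self.dt
        de = (e - self.e_prev) / self.dt
        self.e_prev = e
        x = [fal(e, self.alphas[0], self.deltas[0]),
             fal(self.integral, self.alphas[1], self.deltas[1]),
             fal(de, self.alphas[2], self.deltas[2])]
        u = self.K * sum(wi * xi for wi, xi in zip(self.w, x)) / sum(self.w)
        self.w = [wi + di * e * u * xi
                  for wi, di, xi in zip(self.w, self.d, x)]
        return u
```

With the paper's parameters (K = 1.1, d = (15, 1, 10), α = (1, 0.5, 0.5), δ = (0.1, 0.4, 0.3)) and a unit set-point step from rest, all three shaped inputs are 1 on the first step, so u = 1.1·(0.3 + 0.3 + 0.3)/0.9 = 1.1.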
4 Simulation Tests and Results

A paper-making process plant with large uncertainties is taken as an example to verify the proposed nonlinear PID control method. The dynamic characteristics of the plant can be written as follows [8]:

G(z) = 0.2719 / (z(z − 0.8187)),  when 80 g/m² paper is made,
G(z) = 0.4484 / (z(z − 0.7788)),  when 100 g/m² paper is made,    (10)
G(z) = 0.7087 / (z(z − 0.7165)),  when 120 g/m² paper is made.

Experiments with the proposed nonlinear PID controller were carried out, with the controller parameters selected as follows: K = 1.1, d1 = 15, d2 = 1, d3 = 10, α_P = 1, α_I = 0.5, α_D = 0.5, δ_P = 0.1, δ_I = 0.4, δ_D = 0.3. The simulation results are shown in Fig. 3. From the simulation results, the response curves obtained with the neuron-based nonlinear PID controller are close to the same shape for all three grades of paper.
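Each operating point in Eq. (10) above is a second-order discrete plant of the form G(z) = b / (z(z − a)), i.e. the difference equation y(t) = a·y(t − 1) + b·u(t − 2). A simulation sketch:

```python
def make_plant(a, b):
    """Return a step function simulating y(t) = a*y(t-1) + b*u(t-2),
    the G(z) = b / (z (z - a)) family of Eq. (10)."""
    state = {"y1": 0.0, "u1": 0.0, "u2": 0.0}
    def step(u):
        y = a * state["y1"] + b * state["u2"]
        state["y1"] = y
        state["u2"] = state["u1"]
        state["u1"] = u
        return y
    return step
```

For the 80 g/m² grade (a = 0.8187, b = 0.2719) the static gain is b/(1 − a) ≈ 1.50, so a constant unit input settles near 1.50; the 100 and 120 g/m² grades differ in both gain and pole location, which is exactly the model uncertainty the controller must absorb.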
Neuron Based Nonlinear PID Control

Fig. 3. The simulation results of the proposed controller (basis weight response vs. time in minutes).
5 Conclusions

This work illustrates that the neuron-based nonlinear PID controller can efficiently control a plant with large uncertainties. The main advantages of this model-free control method are its strong adaptability and robustness: the control system shows good performance when different kinds of paper are made.

Acknowledgments. This paper is supported by the National Science Foundation of China under grants 60421002 and 70471052.
References

1. Shinskey, F.G.: Process Control Systems: Application, Design and Adjustment. McGraw-Hill, New York (1988)
2. Jiang, F.J., et al.: An application of nonlinear PID control to a class of truck ABS problems. In: Proc. 40th IEEE Conference on Decision and Control, Orlando (2001) 516-521
3. Ibrahim, A.A.S.: Nonlinear PID controller design using fuzzy logic. In: Proc. 11th Mediterranean Electrotechnical Conference, Menouf (2002) 595-599
4. Chen, C.L., Chang, F.Y.: Design and analysis of neural/fuzzy variable structure PID control systems. IEE Proc. Control Theory and Applications 143(2) (1996) 200-208
5. Wang, N., Tu, J., Chen, J.: Intelligent control using the single adaptive neuron. J. Hangzhong Univ. of Sci. & Tech. 21(3) (1993) 31-35
6. Wang, N., Tu, J., Chen, J.: Neuron intelligence control for electroslag remelting processes. ACTA Automatica SINICA 19(5) (1993) 634-636
7. Wang, N., Chen, J., Wang, J.C.: Neuron intelligent control for hydraulic turbine generators. In: Proc. IEEE International Conference on Industrial Technology (1994) 288-292
8. Wang, Q.G., et al.: A multiple-model based design method for the control of uncertain multivariable systems with paper machines. ACTA Automatica SINICA 17(1) (1991) 68-76
An Image Retrieval System Based on Colors and Shapes of Objects

Kuo-Lung Hong¹, Yung-Fu Chen²,*, Yung-Kuan Chan³, and Chung-Chuan Cheng¹

¹ Department of MIS, Chaoyang University of Technology, Taichung 413, Taiwan
{klhung, s9314638}@cyut.edu.tw
² Department of CSIE, National Chin-Yi Institute of Technology, Taichung County 411, Taiwan
[email protected]
³ Department of MIS, National Chung Hsing University, Taichung 402, Taiwan
[email protected]

Abstract. This paper proposes a color-shape based method (CSBM) based on the color, area, and perimeter intercepted lengths of segmented objects in an image. It characterizes the shape of an object by the intercepted lengths obtained by intercepting the object perimeter with eight lines of different orientations passing through the object center. The experimental results show that CSBM performs better than the fuzzy color histogram (FCH) and the conventional color histogram (CCH). Besides, it is insensitive to translation, rotation, distortion, scaling, and hue variations, but susceptible to contrast and noise variations.

Keywords: Content-based image retrieval, Perimeter intercepted length, Fuzzy color histogram (FCH), Conventional color histogram (CCH).
1 Introduction

Content-based image retrieval has been studied for more than two decades and generally works with features such as color, texture, and shape [1]. The color histogram is one of the most commonly adopted features for designing image retrieval systems [2]; its advantages are simplicity of operation and ease of calculation. To improve on the conventional color histogram (CCH), a fuzzy-based technique, the fuzzy color histogram (FCH) [3], was proposed for color histogram construction. It considers the degree of color similarity of each pixel to all the histogram bins through fuzzy memberships, which are calculated with the fuzzy c-means algorithm [3]. It has been demonstrated to be more robust than the CCH in dealing with quantization errors and changes in light intensity. CCH and FCH, however, share a significant drawback: they can delineate only the global properties of an image. To overcome this problem, this paper proposes a color-shape based method (CSBM) based on the colors, areas, and perimeter intercepted lengths of segmented objects, the lengths being obtained by intercepting the object perimeter with eight lines of different orientations passing through the object center. The intercepted lengths are demonstrated to be effective in discriminating the shapes of objects and are immune to translation and rotation variations.

* Corresponding author.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1094 – 1098, 2006. © Springer-Verlag Berlin Heidelberg 2006
2 Color-Shape Based Method

If the possible pixel colors in a color image (for example, trademark, cartoon, flag, traffic sign, and synthesized images) are reduced to fewer colors, the image usually turns into one consisting of several large regions, each made up of a set of pixels having the same color. Since it is very difficult to segment the objects contained in an image, this paper regards a region consisting of pixels with identical color as an object. Two similar images generally contain several larger objects that have similar colors and shapes. Figure 1(a) shows an image containing five objects, in which A and B are usually treated as different objects because of their different shapes, even though they have identical color. Objects A and C are also different because of their distinctive colors, although they have the same shape and size.

Fig. 1. (a) An image containing five objects (A-E) and (b) the 8 perimeter intercepts of an object bounded by an MBR

Fig. 2. Objects having the same area but different shapes (PILs)
2.1 Feature Extraction and Image Matching

CSBM first classifies the pixels of images into K clusters using the K-means clustering algorithm [4]. The mean value of each cluster is regarded as a representative color in a color palette, namely the common color palette (CCP), shared by all images, both database and query images. For an image I, its color-reduced image I′ containing only K colors is generated by replacing each pixel in I with the closest color found in the CCP.

Every object has its distinctive color, area, and shape. For an object O with an area greater than a threshold value ThA, features including the color, area, and perimeter intercepted lengths (PILs) are calculated and recorded; otherwise, the object is treated as noise and ignored. As shown in Fig. 1(b), a minimal bounding rectangle (MBR) with its sides parallel to the X and Y axes is applied to enclose the object. The area is the number of pixels within O, whereas the shape is a set of 8 perimeter intercepted lengths obtained from 8 lines passing through the central pixel of the MBR and intersecting the perimeter of object O; the orientations of these eight lines are separated by 22.5° from 0° to 180°. For object O, CSBM records its color, area, and PILs in a database.

The difference between the PILs of two objects is demonstrated to be a good indicator of shape variation. As shown in Fig. 2, although the areas of all three objects are very close, their shapes are greatly different. To achieve rotation invariance, two objects are compared in 8 different orientations and the smallest distance is used as the indicator for shape differentiation.

Let I be an image of H × W pixels. To remedy the problem caused by scale variation, this paper normalizes the features by dividing each value of the PILs by
(H + W) and the object area by (H × W), respectively.

Let O^q_hi (1 ≤ i ≤ n^q_h) and O^d_hj (1 ≤ j ≤ n^d_h) be the objects with the h-th color of the CCP obtained from I′_q and I′_d, the color-reduced images of the query image I_q and the database image I_d, respectively; n^q_h and n^d_h denote the total numbers of objects with the h-th color in I′_q and I′_d. CSBM calculates the distance d_ij between objects O^q_hi and O^d_hj according to Eq. (1):

d_ij = d^L_ij + w × d^A_ij    (1)

in which w is a weighting constant, whereas d^L_ij and d^A_ij are the distances in PILs and area, respectively, between O^q_hi and O^d_hj. With the PILs denoted as L^q = {l^q_0, l^q_1, …, l^q_7} and L^d = {l^d_0, l^d_1, …, l^d_7}, and the areas as A^q_i and A^d_j, d^L_ij and d^A_ij can be calculated from the following equations, with z a given constant:

d^L_ij = min_{r=0,…,7} ( Σ_{s=0}^{7} | l^q_s − l^d_{(r+s) mod 8} |^z )^{1/z}   and   d^A_ij = | A^q_i − A^d_j |    (2)
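The rotation-invariant PIL distance of Eq. (2) above can be sketched as follows (z = 1.1 as in the experiments of Section 3):

```python
import numpy as np

def pil_distance(Lq, Ld, z=1.1):
    """Shape distance of Eq. (2): z-norm of the PIL differences,
    minimized over the 8 cyclic orientations of the second object."""
    Lq = np.asarray(Lq, dtype=float)
    Ld = np.asarray(Ld, dtype=float)
    best = min((np.abs(Lq - np.roll(Ld, -r)) ** z).sum() for r in range(8))
    return best ** (1.0 / z)
```

A rotated copy of an object shifts its PILs cyclically, so the distance to any of its 8 rotations is zero, which is the rotation invariance claimed in Section 2.1.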
CSBM adopts a dynamic programming method, described in Algorithm OBJ_Match( ), to calculate the minimal matching distance between the two sets of objects O^q_h and O^d_h, in which the objects are sorted by area in descending order. The algorithm fills a two-dimensional (n^q_h + 1) × (n^d_h + 1) matrix MMD^h, where element MMD^h(i, j) is the minimal matching distance between the objects {O^q_h1, …, O^q_hi} and {O^d_h1, …, O^d_hj}; the element MMD^h(n^q_h, n^d_h) therefore gives the final minimal matching distance between O^q_h and O^d_h. An object O_hm, in either a query image (1 ≤ m ≤ n^q_h) or a database image (1 ≤ m ≤ n^d_h), that has no matched object in its counterpart incurs a penalty defined in Eq. (3), with A_hm indicating the area of O_hm:

Penalty(O_hm) = ( Σ_{r=0}^{7} (l_r)^z )^{1/z} + A_hm × w    (3)

Algorithm OBJ_Match( )
Input: O^q_h1, O^q_h2, …, O^q_{h,n^q_h} and O^d_h1, O^d_h2, …, O^d_{h,n^d_h}
Output: MMD^h
  MMD^h(0, 0) = 0
  for i = 1 to n^q_h
    MMD^h(i, 0) = Penalty(O^q_hi) + MMD^h(i - 1, 0)
  for j = 1 to n^d_h
    MMD^h(0, j) = Penalty(O^d_hj) + MMD^h(0, j - 1)
  for i = 1 to n^q_h
    for j = 1 to n^d_h
      MMD^h(i, j) = min( MMD^h(i - 1, j) + Penalty(O^q_hi),
                         MMD^h(i, j - 1) + Penalty(O^d_hj),
                         MMD^h(i - 1, j - 1) + d_ij )
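Algorithm OBJ_Match( ) is an edit-distance style dynamic program; a runnable version (ours, with the object features abstracted into a precomputed distance matrix and penalty lists):

```python
def min_match_distance(dist, penalty_q, penalty_d):
    """Minimal matching distance between two area-sorted object lists.
    dist[i][j] is d_ij of Eq. (1); penalty_q / penalty_d hold the
    Eq. (3) penalties for leaving an object unmatched."""
    nq, nd = len(penalty_q), len(penalty_d)
    MMD = [[0.0] * (nd + 1) for _ in range(nq + 1)]
    for i in range(1, nq + 1):
        MMD[i][0] = MMD[i - 1][0] + penalty_q[i - 1]
    for j in range(1, nd + 1):
        MMD[0][j] = MMD[0][j - 1] + penalty_d[j - 1]
    for i in range(1, nq + 1):
        for j in range(1, nd + 1):
            MMD[i][j] = min(MMD[i - 1][j] + penalty_q[i - 1],
                            MMD[i][j - 1] + penalty_d[j - 1],
                            MMD[i - 1][j - 1] + dist[i - 1][j - 1])
    return MMD[nq][nd]
```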
Finally, the image matching distance Dist between I_q and I_d is defined as

Dist = Σ_{h=1}^{K} MMD^h(n^q_h, n^d_h).
2.2 Experimental Images and System Evaluation

In the experiment, an image set consisting of 100 true-color images is used as the query set. Additionally, a database is constructed containing variant images (rotation, distortion, noise, scaling, hue, luminance, and contrast) generated from each query image with the Adobe Photoshop 7.0 package. Table 1 shows examples of query images and their geometry (rotation, scaling, and translation), color (luminance, contrast, and hue), noise, blurring, and distortion variations, used for testing the performance of the three methods.

Table 1. Examples of query images and their corresponding images with variations (columns: Query, Trans. & Rot., Lumin., Contrast, Noise, Distort., Scaling, Hue; images omitted).
To evaluate the system performance, the average normalized modified retrieval rank (ANMRR) proposed by MPEG-7 [5], shown in Eq. (4), is used as a benchmark. It not only reflects the recall and precision of the retrieved images, but also accounts for the ranks of all the retrieved images:

ANMRR = (1/Q) Σ_{q=1}^{Q} ( AVR(q) − 0.5 − 0.5 × I(q) ) / ( T + 0.5 − 0.5 × I(q) ),   with   AVR(q) = Σ_{t=1}^{I(q)} R(t) / I(q)    (4)

in which T is Min{4 × I(q), 2 × GTM}, with GTM representing Max{I(q)} over the queries. I(q) indicates the number of returned images most similar to the query image, and Q is the total number of queries used for evaluation. Note that a smaller ANMRR value indicates better retrieval performance.
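Eq. (4) can be sketched as follows; the caller is assumed to have already replaced the rank of any ground-truth image missed by the system with the MPEG-7 penalty rank 1.25·T (an assumption about preprocessing that Eq. (4) leaves implicit).

```python
def anmrr(ranks_per_query):
    """ANMRR of Eq. (4). ranks_per_query[q] lists the retrieval ranks
    R(t) of the I(q) ground-truth images of query q."""
    Q = len(ranks_per_query)
    gtm = max(len(r) for r in ranks_per_query)
    total = 0.0
    for ranks in ranks_per_query:
        Iq = len(ranks)
        T = min(4 * Iq, 2 * gtm)
        avr = sum(ranks) / Iq                          # AVR(q)
        total += (avr - 0.5 - 0.5 * Iq) / (T + 0.5 - 0.5 * Iq)
    return total / Q
```

Perfect retrieval (every ground-truth image in the top I(q) ranks) gives AVR(q) = (I(q) + 1)/2, a numerator of zero, and hence ANMRR = 0; larger values mean worse ranking.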
3 Experimental Results

Since CSBM achieves its best performance when the parameters w, z, K, and ThA are set to 0.009, 1.1, 27, and 5, respectively, these values are used in the experiment. Table 2 compares the performance of CSBM, FCH, and CCH. The results show that CSBM is immune to geometry, distortion, and hue variations, slightly susceptible to luminance variation, and very sensitive to contrast and noise variations. FCH and CCH, on the other hand, resist scale variation, are slightly susceptible to noise and distortion variations, and are very susceptible to luminance and contrast variations. Moreover, FCH is very vulnerable to geometric variations and CCH to hue variation. The average ANMRR of CSBM (0.147) is smaller than those of FCH (0.206) and CCH (0.231), which indicates that it provides better retrieval performance.

Table 2. Comparison of CSBM, FCH, and CCH retrieval performance (T = 2); each cell gives ANMRR (Rank 1, Rank 2)

Method  Tran&Rot         Luminance      Contrast       Noise          Distortion     Scaling          Hue            Mean
CSBM    0.000 (100,100)  0.145 (80,87)  0.435 (53,60)  0.320 (63,73)  0.030 (96,98)  0.075 (89,96)    0.025 (97,98)  0.147
FCH     0.090 (86,96)    0.535 (43,50)  0.330 (61,73)  0.190 (80,82)  0.145 (83,88)  0.000 (100,100)  0.150 (83,87)  0.206
CCH     0.095 (84,97)    0.455 (51,58)  0.405 (54,65)  0.205 (76,83)  0.110 (86,92)  0.065 (89,98)    0.280 (67,77)  0.231
4 Discussions and Conclusions

The method adopts the colors, areas, and PILs of segmented objects to describe the characteristics of an image. The PILs of an object can be used to effectively characterize its shape. Since the same objects under different luminance or color contrast might be regarded as different objects, none of the three methods can achieve good performance for these variations. Noise may generate various small isolated objects, which affect the minimal matching distance between two groups of objects having identical color; CSBM is hence very sensitive to noise variation. In conclusion, CSBM is more robust than FCH and CCH in resisting geometric, luminance, distortion, and hue variations, but more susceptible to noise variation. Based on mean ANMRR, it provides better retrieval performance than the other two methods.

Acknowledgment. This work was funded in part by the National Science Council of Taiwan under Grant NSC93-2213-E-212-038 for Y. F. Chen.
References
1. Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-Based Image Retrieval at the End of the Early Years. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22 (2000) 1349-1380
2. Swain, M.J., Ballard, D.H.: Color Indexing. International Journal of Computer Vision, Vol. 7 (1991) 11-32
3. Han, J., Ma, K.K.: Fuzzy Color Histogram and Its Use in Color Image Retrieval. IEEE Transactions on Image Processing, Vol. 11 (2002) 944-952
4. Su, M.C., Chou, C.H.: A Modified Version of the K-means Algorithm with a Distance Based on Cluster Symmetry. IEEE Transactions on Pattern Analysis and Machine Intelligence (2001) 674-680
5. Manjunath, B.S., Ohm, J.R., Vasudevan, V.V., Yamada, A.: Color and Texture Descriptors. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11 (2001) 703-715
A Hybrid Mood Classification Approach for Blog Text Yuchul Jung, Hogun Park, and Sung Hyon Myaeng∗ School of Engineering, Information and Communications University, South Korea 119, Munjiro, Yuseong-gu, Daejeon, 305-732, Korea {enthusia77, gsgphg, myaeng}@icu.ac.kr
Abstract. As an effort to detect the mood of a blog regardless of its length and writing style, we propose a hybrid approach to detecting the mood of blog text, which incorporates commonsense knowledge obtained from the general public (ConceptNet) and the Affective Norms for English Words (ANEW) list. Our approach picks up the unique features of blog text and computes simple statistics such as term frequency, n-grams, and point-wise mutual information (PMI) for the SVM classification method. In addition, to catch mood transitions in a given blog text, we developed paragraph-level segmentation based on a mood flow analysis, using a revised version of ConceptNet's GuessMood operation and an ANEW-based affective sensing module. For evaluation, a mood corpus comprising real blog texts has been built semi-automatically. Our experiments using the corpus show meaningful results for 4 mood types: happy, sad, angry, and fear.
1 Introduction

A blog is a web site where anybody can write about his or her own personal experiences and thoughts on a voluntary basis. As a result, it reflects the user's personality and cultural biases, sometimes forming a unique society. Since blog texts often carry the emotions of their writers, they should lend themselves to automatic categorization based on moods. Compared to topicality-based text classification, mood classification is challenging in many respects. A recent approach to mood classification of text uses a Support Vector Machine (SVM) with 6 features: frequency counts, lengths, semantic orientations, Point-wise Mutual Information for Information Retrieval (PMI-IR), emphasized words, and special symbols [1]. While this approach of using surface-level features can achieve reasonable accuracy, it appears limited by its inability to deal with idiosyncratic aspects of moods and blogs. For example, although an author is under a certain mood when starting to write a blog document, the initial mood may not be maintained all the way to the end. Some blogs are so intertwined that even human readers would have difficulty identifying the mood, not to mention a statistically motivated method using surface-level features. To detect the mood of blog text more accurately, we propose a hybrid approach to mood classification that incorporates commonsense knowledge obtained from the
∗ Corresponding author.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1099 – 1103, 2006. © Springer-Verlag Berlin Heidelberg 2006
general public (ConceptNet) [2] and the Affective Norms for English Words (ANEW) list [3]. ConceptNet is an easily usable, freely available commonsense knowledge base and natural language processing toolkit that supports many practical textual-reasoning tasks over real-world documents, including topic-gisting, affect-sensing, analogy-making, and other context-oriented inferences. The knowledge base is a semantic network presently consisting of over 1.6 million assertions of commonsense knowledge covering the spatial, physical, social, temporal, and psychological aspects of everyday life. The ANEW list, created from a psychological study, contains 1,034 unique terms with affective valence (unpleasant ~ pleasant), arousal (calm ~ excited), and dominance (submissive ~ dominant) scores. It can be used to identify different mood types through lexical analysis, by mapping terms in text to those in the list. Our approach is hybrid in the sense that several tools are integrated: the SVM classification model [4], which has shown superior performance over other classification models in many application domains; the GuessMood function of ConceptNet [2]; an affective sensing model based on ANEW [3]; and Open Mind Common Sense (OMCS)1 [5]. We observed that features such as term frequency, PMI-IR, emoticons, abbreviated words, and mood-specific terms contribute to detecting the mood of a given text. In addition, paragraph-level segmentation and a mood flow analysis were applied to handle blog texts of different lengths and writing styles. To evaluate our hybrid mood classification approach, we built a mood corpus based on a large number of blog documents extracted from LiveJournal.com. More than 50GB of text has been processed to semi-automatically classify the documents into four categories: happy, sad, angry, and fear.
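For illustration only, ANEW-style affect sensing amounts to a lexicon lookup followed by averaging; the mini-lexicon below uses invented placeholder numbers, not the published ANEW scores:

```python
# Toy ANEW-style lexicon: (valence, arousal) on a 1-9 scale.
# NOTE: these numbers are invented placeholders, not the published ANEW values.
ANEW_SAMPLE = {
    "happy": (8.2, 6.5), "joy": (8.1, 6.0), "lonely": (2.2, 4.5),
    "angry": (2.9, 7.2), "afraid": (2.0, 6.6), "death": (1.6, 4.6),
}

def affect_score(text, lexicon=ANEW_SAMPLE):
    """Mean valence/arousal of lexicon words found in `text`,
    or None if the text contains no mood term."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    if not hits:
        return None
    n = len(hits)
    return (sum(v for v, _ in hits) / n, sum(a for _, a in hits) / n)
```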
2 Proposed System

Our system includes two steps, as shown in Fig. 1. In the first step, when a blog document comes in, the system performs statistical analyses to obtain term frequency, n-grams, and PMI-IR [7] sequentially. Based on these statistical features, the system applies SVM2-based mood classification to assign a mood category to the document. In the second step, the system initiates a mood flow analysis to identify a global mood for the given blog document. As detailed in Fig. 2, our mood flow analyzer segments a blog document into paragraphs. Then, in paragraph analysis, the number of mood terms is counted to select a scheme between a revised version of ConceptNet's GuessMood [2] and a PAD [3] based affective sensing module; if the number of mood terms exceeds an experimentally set threshold, the latter is chosen. If a mood is sustained without transitions throughout the whole blog document, the "final resolver" module only checks the consistency and assigns that mood as the final result. When some paragraphs have different mood types, heuristically determined weights are multiplied into the results of paragraph analysis. A global mood score is
1 http://commonsense.media.mit.edu/cgi-bin/search.cgi
2 http://svmlight.joachims.org/
calculated by averaging the weighted sum of all paragraph analyses in the final resolution phase. The heuristic weights were obtained through several hundred trials with our training corpus (Fig. 2).
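The final resolution step described above can be sketched as follows; the score format and the uniform default weights are illustrative assumptions, since the paper's heuristically tuned weights are not published:

```python
def resolve_global_mood(paragraph_scores, weights=None):
    """Combine per-paragraph mood scores into one global mood.
    paragraph_scores: one dict per paragraph mapping mood -> score.
    weights: optional per-paragraph weights (uniform placeholder by default)."""
    n = len(paragraph_scores)
    weights = weights if weights is not None else [1.0] * n
    moods = paragraph_scores[0].keys()
    # weighted sum of paragraph scores, averaged over the paragraphs
    avg = {m: sum(w * p.get(m, 0.0) for w, p in zip(weights, paragraph_scores)) / n
           for m in moods}
    return max(avg, key=avg.get), avg
```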
Fig. 1. Overall Flow
Fig. 2. Details of Mood Flow Analysis
3 A Blog Mood Corpus

We have semi-automatically collected over 50GB of blog text from LiveJournal.com that is appropriate for our mood categories (happy, sad, angry, and fear) and the ANEW list, in order to build a trustworthy mood corpus for training/testing purposes. To select blogs falling into our mood categories, we adopted LiveJournal's mood hierarchy3. In this process, 108,892 blog documents were extracted from the 50GB of blog text. Because the authors' mood "annotation" sometimes does not match the actual content, we performed a specially designed refinement process to obtain a more trustworthy mood corpus. It uses a keyword-spotting technique to remove non-affective parts of the text, which contain few or none of the predefined mood key terms. In addition, to avoid irregular, meaningless, and ambiguously long blog documents, it ignores blog texts shorter than 5 or longer than 40 sentences. Finally, we have 10,479 blog documents (about 10MB) as a highly refined mood corpus.
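The refinement filter described above can be sketched as follows; the keyword list and the sentence splitter are simplified assumptions:

```python
import re

# Illustrative mood key terms; the paper's predefined list is not published.
MOOD_KEYWORDS = {"happy", "glad", "sad", "cry", "angry", "furious",
                 "fear", "scared", "afraid"}

def keep_blog_post(text, min_sents=5, max_sents=40):
    """Refinement filter: keep a post only if its length is 5-40 sentences
    and it contains at least one predefined mood key term."""
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not (min_sents <= len(sents) <= max_sents):
        return False
    words = set(re.findall(r"[a-z']+", text.lower()))
    return bool(words & MOOD_KEYWORDS)
```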
4 Experiments and Discussion

For improved accuracy of mood classification, diverse features of blog documents were considered. In addition, the internal parameters of the two methods, one with GuessMood of ConceptNet and the other with the PAD-based affective text sensing model, were tuned based on experimental work. The main goal of the experiments was to evaluate the hybrid mood classification model. For the happy and sad mood categories, classification accuracy approached the range of 89~92%. Table 2 shows classification accuracy comparisons between the baseline SVM and our proposed approach. A total of 20,000 randomly chosen blog documents (happy: 5,000, sad: 5,000, angry: 5,000, fear: 5,000) and 5,000 semi-automatically refined blog documents (happy: 2,000, sad: 1,800, angry: 600, fear: 600) were used for training and testing, respectively. 5-fold cross validation was used for the SVM classifier's evaluation. In Table 2, "random" and "refined" denote the corpus with 20,000 documents and the one with 5,000 documents, respectively.

3 http://www.livejournal.com/moodlist.bml?moodtheme=140&mode=tree

Table 2. Mood classification: SVM vs. our approach (accuracy, %)

Testing Type                                  Category  SVM      Our Approach
Type 1. Training: Random + Testing: Random    Happy     59.30    55.14 (-4.16)
                                              Sad       35.60    31.94 (-3.66)
                                              Angry     44.80    38.28 (-6.52)
                                              Fear      42.10    35.12 (-6.98)
                                              Average   45.45%   43.24%
Type 2. Training: Refined + Testing: Random   Happy     77.80    76.79 (+9.99)
                                              Sad       43.15    38.37 (-4.78)
                                              Angry     26.67    23.25 (-3.42)
                                              Fear      27.23    25.40 (-1.83)
                                              Average   43.71%   40.95%
Type 3. Training: Refined + Testing: Refined  Happy     90.13    92.85 (+2.72)
                                              Sad       89.58    89.27 (-0.31)
                                              Angry     80.77    83.12 (+2.35)
                                              Fear      57.68    61.97 (+4.29)
                                              Average   79.54%   81.80%
Although the semantic network of ConceptNet consists of 1.6 million assertions, it contains a great deal of commonsense knowledge that is not required for processing mood-related concepts. Thus, we re-organized the semantic network by filtering out unnecessary concepts, forming the "refined" version. SVM: Randomly selected training data were not sufficient for constructing a classifier; with their limited coverage, performance was only 45.45% on average. When the SVM classifier was tested with randomly selected testing data, which contain inconsistent lexical features, it failed to reach a reasonable level of accuracy except for the happy mood. In the case of Type 2, although highly refined data were used as the training corpus, the classification results became worse due to the heterogeneity between the training and testing corpora. However, highly refined training data helped achieve an average accuracy of 79.54% when well-refined testing data were also used.
Our Hybrid Approach: In every test, classification accuracy for the happy mood was enhanced when the refined corpus was used in training. However, in the cases of Types 1 and 2, when randomly chosen testing data were used, results fluctuated. On the other hand, our hybrid approach obtained an average accuracy of 81.80% when well-refined training and testing data were used. While the revised GuessMood and the PAD-based affective text sensing modules caused noisy classification results, because the semantic network is not comprehensive enough to cover all the mood-related terms that appear in blog text, the experimental result with the refined data indicates that the approach is promising. Even the relatively small refined training corpus allowed for quite reasonable performance.
5 Conclusion and Future Work

This paper presents a hybrid model for mood classification of blog text, which uses statistical features, informal commonsense reasoning with ConceptNet, and a PAD-based affective text sensing method. In addition, a semi-automatically refined mood corpus has been built and used to evaluate the proposed model. Mood classification of blog documents is a very difficult task because of the diverse situations and expressions of authors. Although we can hardly capture an author's internal emotional status exactly, we can at least perceive a global mood for a given blog text at the surface level if statistical features and commonsense knowledge are incorporated. Through the experiments using blog text, we have developed a firm belief that sophisticated preprocessing and a specially designed hybrid mood classifier are quite feasible for mood classification of blog text.
References
1. Mishne, G.: Experiments with Mood Classification in Blog Posts. Style2005 – the 1st Workshop on Stylistic Analysis of Text for Information Access, at ACM SIGIR 2005 (2005)
2. Liu, H., Singh, P.: ConceptNet – A Practical Commonsense Reasoning Tool-kit. BT Technology Journal (2004) 211-226
3. Bradley, M.M., Lang, P.J.: Affective Norms for English Words (ANEW). The NIMH Center for the Study of Emotion and Attention, University of Florida, Gainesville, FL (1999)
4. Joachims, T.: Text Categorization with Support Vector Machines: Learning with Many Relevant Features. Proc. of ECML-98, 10th European Conf. on Machine Learning, Springer-Verlag (1998) 137-142
5. Singh, P., Lin, T., Mueller, E.T., Lim, G., Perkins, T., Li Zhu, W.: Open Mind Common Sense: Knowledge Acquisition from the General Public. Proc. of the First Int. Conf. on Ontologies, Databases, and Applications of Semantics for Large Scale Information Systems (2002)
6. Liu, H., Lieberman, H., Selker, T.: A Model of Textual Affect Sensing Using Real-World Knowledge. Proc. of the 2003 Int. Conf. on Intelligent User Interfaces, IUI 2003 (2003)
7. Read, J.: Recognising Affect in Text Using Pointwise Mutual Information. Master's thesis, University of Sussex (2004)
8. Mehrabian, A.: Framework for a Comprehensive Description and Measurement of Emotional States. Genetic, Social, and General Psychology Monographs, 121(3) (1995) 339-361
Modeling and Classification of Audio Signals Using Gradient-Based Fuzzy C-Means Algorithm with a Mercer Kernel Dong-Chul Park1, Chung Nguyen Tran1 , Byung-Jae Min1 , and Sancho Park2 1
Dept. of Information Engineering, Myong Ji University, Korea {parkd, tnchung, mbj2000}@mju.ac.kr 2 Davan Tech Co., Seongnam, Korea
[email protected]

Abstract. In this paper, we propose a novel classification algorithm for content-based audio signal retrieval. The algorithm uses the Gradient-Based Fuzzy C-Means with a Mercer Kernel (GBFCM(MK)) to perform clustering of Gaussian Probability Density Function (GPDF) data of a Gaussian Mixture Model (GMM). The GBFCM(MK) algorithm incorporates a kernel method into the GBFCM to implicitly perform nonlinear mapping of the input data into a high-dimensional feature space. Experiments on several audio data sets have shown that the GBFCM(MK)-based classification algorithm achieves accuracy improvements of as much as 7.49% and 3.14% over classification algorithms employing the traditional k-means and the Fuzzy C-Means (FCM), respectively.
1 Introduction
In recent years, various automatic and computerized methods based on content-based analysis of audio data have been proposed for audio data classification and retrieval. Essentially, content-based classification of audio data can be performed using a pattern recognition approach, with consideration of two issues: feature extraction and classification of the extracted features. Acoustical features such as loudness, pitch, brightness, bandwidth, and harmony have been most widely used to discriminate speech and music signals [1]. To classify music signals more specifically, music-oriented features such as timbral texture, rhythmic content, and pitch content have been proposed [2]. In order to model and classify the extracted features, statistical models based on a Gaussian Mixture Model (GMM) are widely used because of their computational simplicity [1,2]. The GMM is estimated from audio data by considering the data as mixtures of Gaussian Probability Density Function (GPDF) components. The GPDF data are represented by a mean vector and a covariance matrix, obtained by using a clustering algorithm to distill natural groupings from a large data set. For clustering, the k-means and Fuzzy C-Means (FCM) algorithms are widely used [3,4]. As an improvement of the FCM, the Gradient-Based Fuzzy C-Means (GBFCM) [5] exploits characteristics of Kohonen's Self-Organizing Map [6] to improve the speed and computational complexity of the FCM. However, these algorithms lack the ability to deal with data in which the boundaries among clusters are nonlinear, and this shortcoming leads to inefficiencies in forming the mixtures of the GMM. In this paper, we propose a novel model that can efficiently classify audio signals by employing the Gradient-Based Fuzzy C-Means with a Mercer Kernel (GBFCM(MK)) for the clustering of GPDF data of GMMs. The GBFCM(MK) algorithm employs a kernel method [7], the Mercer kernel, to implicitly perform nonlinear mapping of the input data into a high-dimensional feature space. By doing so, complex nonlinear classification boundaries in the original input space can more likely be treated linearly in the expanded feature space [7]. By incorporating the kernel method into the GBFCM, the GBFCM(MK) gains the ability to deal with nonlinearly bounded clusters in GPDF data. When applied to the clustering of GPDF data of GMMs, the GBFCM(MK) can form the mixtures of GMMs more efficiently than the conventional k-means or the FCM. The remainder of this paper is organized as follows: a brief review of the GBFCM is given in Section 2; Section 3 introduces the GBFCM(MK) algorithm; Section 4 presents our experiments and results for several audio data sets; conclusions and closing remarks are presented in Section 5.

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1104–1108, 2006. © Springer-Verlag Berlin Heidelberg 2006
2 Gradient-Based Fuzzy C-Means (GBFCM)
The GBFCM was first introduced in [5]. The algorithm improves on the FCM by minimizing the objective function using one input datum at a time instead of the entire input data set. Given one datum x_i and c clusters with centers v_j (j = 1, 2, ..., c), the objective function to be minimized is:

\[
J_i = \mu_{1i}^2 (v_1 - x_i)^2 + \mu_{2i}^2 (v_2 - x_i)^2 + \cdots + \mu_{ci}^2 (v_c - x_i)^2 \tag{1}
\]

with the following constraint:

\[
\mu_{1i} + \mu_{2i} + \cdots + \mu_{ci} = 1 \tag{2}
\]

The group centers are updated as follows:

\[
v_{k+1} = v_k - 2 \eta \mu_{ki}^2 (v_k - x_i) \tag{3}
\]

where η is a learning constant, and the membership grades are defined as:

\[
\mu_{ki} = \frac{1}{\sum_{j=1}^{c} \left( d_k(x_i) / d_j(x_i) \right)^2} \tag{4}
\]

where d_j(x_i) denotes the distance between x_i and center v_j. More detail about the GBFCM can be found in [5].
3 Gradient-Based Fuzzy C-Means with a Mercer Kernel
The objective function of the FCM or GBFCM with a kernel can be rewritten in feature space with the mapping function Φ:

\[
J_i^{\Phi} = \sum_{k=1}^{c} \mu_{ki}^2 \, \| \Phi(v_k) - \Phi(x_i) \|^2 \tag{5}
\]

By using the kernel substitution [7] with the Gaussian kernel function, the objective function becomes:

\[
J_i^{\Phi} = 2 \sum_{k=1}^{c} \mu_{ki}^2 \left( 1 - K(v_k, x_i) \right) \tag{6}
\]

In order to minimize the objective function with a kernel, we use the steepest gradient descent algorithm. The learning rule can be summarized as follows:

\[
\Delta v_k = -\eta \, \frac{\partial J_i^{\Phi}}{\partial v_k} \tag{7}
\]

In the case of the Gaussian kernel function, the objective function in Eq. (6) can be rewritten as:

\[
J_i^{\Phi} = 2 \sum_{k=1}^{c} \mu_{ki}^2 \left( 1 - e^{-\|v_k - x_i\|^2 / \sigma^2} \right) \tag{8}
\]

By substituting Eq. (8) into Eq. (7), the group centers are updated as follows:

\[
v_{k+1} = v_k - \eta \mu_{ki}^2 \sigma^{-2} K(v_k, x_i)(v_k - x_i) \tag{9}
\]

By solving the optimization condition of Eq. (6) with respect to the constraint in Eq. (2) using a Lagrange multiplier, the membership grades are updated as follows:

\[
\mu_{ki} = \frac{1}{\sum_{j=1}^{c} \dfrac{1 - K(v_k, x_i)}{1 - K(v_j, x_i)}} \tag{10}
\]
More detail on GBFCM(MK) can be found in [8].
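A minimal numerical sketch of one online GBFCM(MK) update, following Eqs. (9) and (10); the function names, learning rate, and kernel width are illustrative assumptions:

```python
import math

def gaussian_kernel(v, x, sigma):
    """K(v, x) = exp(-||v - x||^2 / sigma^2)."""
    d2 = sum((vi - xi) ** 2 for vi, xi in zip(v, x))
    return math.exp(-d2 / sigma ** 2)

def gbfcm_mk_step(centers, x, eta=0.1, sigma=1.0):
    """One online GBFCM(MK) update for a single datum x.
    Assumes x does not coincide exactly with any center (so 1 - K > 0).
    Returns (updated centers, membership grades)."""
    # membership grades, Eq. (10)
    dissim = [1.0 - gaussian_kernel(v, x, sigma) for v in centers]
    mu = [1.0 / sum(dk / dj for dj in dissim) for dk in dissim]
    # center update, Eq. (9): pull each center toward x, damped by the kernel
    new_centers = []
    for v, m in zip(centers, mu):
        step = eta * m ** 2 * gaussian_kernel(v, x, sigma) / sigma ** 2
        new_centers.append([vi - step * (vi - xi) for vi, xi in zip(v, x)])
    return new_centers, mu
```

By construction the membership grades sum to one, and each center is pulled toward the datum with a strength damped by the kernel value.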
4 Experiments and Results
A data set of 2,100 audio signals, including speech and music data, was used for the experiments. The speech data consisted of 300 excerpts for males and 300 excerpts for females. The music data comprised 5 genres: country, folk, jazz, hip-hop, and rock. Each signal was a 30s-long excerpt, totaling more than 17 hours of audio data. To extract features from the audio signals, we used the open-source framework Marsyas, provided by Tzanetakis [2].
Table 1. Classification accuracy (%) of the speech/music classifier using different algorithms and 3 code vectors

            Speech  Music  Overall
K-means     95      87     91.0
FCM         96      89     92.5
GBFCM(MK)   98      93     95.5

Table 2. Classification accuracy (%) of the speech classifier (male/female) using different algorithms and 3 code vectors

            Male  Female  Overall
K-means     99    70      84.5
FCM         99    73      86.0
GBFCM(MK)   100   79      89.5
The audio signals were classified into a hierarchy of genres. First, the audio signals were classified into speech or music signals using the speech/music classifier. The speech signals were then classified into male and female speech signals, while the music signals were classified into 5 genres: country, folk, jazz, hip-hop, and rock. Table 1 compares the classification accuracies of the algorithms using the conventional k-means, the FCM, and the GBFCM(MK) for the speech/music classifier. The results were obtained using 400 speech excerpts and 400 music excerpts for training; the remaining 200 speech excerpts and 200 music excerpts were used for testing. As can be seen from Table 1, the classification model using the GBFCM(MK) outperforms those using the conventional k-means and the FCM in every case. After classifying audio signals into speech or music, the speech signals were further classified into male or female speech. A summary of the classification accuracy of the speech classifier using 3 code vectors is given in Table 2: overall classification accuracies of 84.5%, 86%, and 89.5% were achieved using the conventional k-means, the FCM, and the GBFCM(MK), respectively. To classify music signals into 5 genres, 200 excerpts from each genre were used for training, while the remaining 100 excerpts from each genre were used for testing. The classification accuracy of the music classifier using 5 code vectors is given in Table 3: overall classification accuracies of 64.2%, 65.2%, and 69.4% were achieved using the conventional k-means, the FCM, and the GBFCM(MK), respectively. Table 4 shows the confusion matrix, which describes the classification results for each genre in detail. One significant point that can be inferred from the confusion matrix is that hip-hop and rock are well discriminated from the other genres, while country is quite likely to be confused.
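As a quick sanity check on Table 4, the per-genre accuracies are the diagonal entries of the confusion matrix divided by the row sums (each row totals 100 test excerpts), and the diagonal total reproduces the 69.4% overall accuracy of Table 3:

```python
GENRES = ["country", "folk", "jazz", "hiphop", "rock"]

# Rows: actual genre; columns: predicted genre (values from Table 4)
CONFUSION = {
    "country": [47, 23,  9,  9, 12],
    "folk":    [ 9, 57, 13,  7, 14],
    "jazz":    [ 3, 14, 65, 11,  7],
    "hiphop":  [ 0,  0,  0, 99,  1],
    "rock":    [ 2,  5, 10,  4, 79],
}

def per_genre_accuracy(cm):
    """Diagonal entry divided by row total, per genre."""
    return {g: cm[g][i] / sum(cm[g]) for i, g in enumerate(GENRES)}

def overall_accuracy(cm):
    """Total correct (diagonal) divided by total test excerpts."""
    correct = sum(cm[g][i] for i, g in enumerate(GENRES))
    return correct / sum(sum(row) for row in cm.values())
```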
Table 3. Classification accuracy (%) of different algorithms, using 5 code vectors

            Country  Folk  Jazz  Hiphop  Rock  Overall
K-means     42       45    62    97      75    64.2
FCM         42       48    58    99      79    65.2
GBFCM(MK)   47       57    69    99      79    69.4

Table 4. Confusion matrix of audio genres, using 5 code vectors (rows: actual genre; columns: predicted genre)

          Country  Folk  Jazz  Hiphop  Rock  Accuracy
Country   47       23    9     9       12    47%
Folk      9        57    13    7       14    57%
Jazz      3        14    65    11      7     65%
Hiphop    0        0     0     99      1     99%
Rock      2        5     10    4       79    79%
5 Conclusions

In this paper, a new approach for modeling and classification of audio signals, using the GBFCM(MK) algorithm for clustering of GPDF data of GMMs, is proposed. The GBFCM(MK) algorithm was formulated by incorporating the kernel method into the GBFCM to manage nonlinear separation boundaries among clusters in GPDF data. The GMM using the GBFCM(MK) for clustering of GPDFs was applied to the audio signal classification problem. Our experiments and results for several audio signal data sets showed that respective improvements of as much as 7.49% and 3.14% can be achieved over the conventional k-means and the FCM.
References
1. Lu, L., Zhang, H.J., Jiang, H.: Content Analysis for Audio Classification and Segmentation. IEEE Trans. on Speech and Audio Processing 10(7) (2002) 504-516
2. Tzanetakis, G., Cook, P.: Musical Genre Classification of Audio Signals. IEEE Trans. on Speech and Audio Processing 10(5) (2002) 293-302
3. Hartigan, J.: Clustering Algorithms. Wiley, New York (1975)
4. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York (1981)
5. Park, D.C., Dagher, I.: Gradient Based Fuzzy C-means Algorithm. IEEE Int. Conf. on Neural Networks, Vol. 3, ICNN-94 (1994) 1626-1631
6. Kohonen, T.: The Self-Organizing Map. Proc. IEEE, Vol. 78 (1990) 1464-1480
7. Muller, K.-R., Mika, S., Ratsch, G., Tsuda, K., Scholkopf, B.: An Introduction to Kernel-Based Learning Algorithms. IEEE Transactions on Neural Networks 12(2) (2001) 181-201
8. Park, D.C., Tran, C.N., Park, S.: Gradient Based Fuzzy C-Means Algorithm with a Mercer Kernel. Lecture Notes in Computer Science (LNCS), Vol. 3971, Springer-Verlag, Berlin Heidelberg New York (2006) 1038-1043
A Quick Rank Based on Web Structure Hongbo Liu, Jiaxin Wang, Zehong Yang, and Yixu Song State Key Lab of Intelligent Technology and System, Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China
[email protected]

Abstract. The hyperlink structure of the Web provides valuable information for ranking query results and has been used in several famous search engines. The development of search engines offering personalized and topic-sensitive search intensifies the need for quick ranking algorithms. In this paper, a link-based ranking algorithm called ExpRank is proposed. It converges quickly while preserving the fundamental features of PageRank. Experimental results on a real dataset, in comparison with PageRank, are also discussed.
1 Introduction
Search engines are successful applications of web mining technology and play an increasingly crucial role on the Internet. PageRank, proposed by Sergey Brin and Larry Page [1], is one of the most important link-based ranking algorithms and became the basis of the ranking system used by the Google search engine. One advantage of a link-based algorithm is that it is query independent and content independent: it can be computed offline using only the link structure, and then used when users submit queries to the search engine. The simplicity, robustness, and effectiveness of link-based ranking methods have been witnessed through the great success of Google. Since the WWW contains billions of pages, it is necessary to accelerate the ranking computation. Recent research on personalized and topic-sensitive search [2,3] has greatly intensified the need for faster ranking algorithms. Some accelerated methods [4,5] have been proposed to speed up the computation. However, these methods are limited by the intrinsically slow convergence of the power method. In this paper, we propose a new link-based ranking method called ExpRank, which dramatically accelerates the iterative computation and preserves the fundamental features of PageRank. In the next section, the main idea and computation of ExpRank are introduced. Experimental results in comparison with PageRank are presented in Section 3.
2 ExpRank and Its Computation

2.1 Matrix Representation and PageRank
Consider web pages as nodes and hyperlinks as directed links; the hyperlink structure of the WWW can then be represented as a directed graph G.

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1109–1113, 2006. © Springer-Verlag Berlin Heidelberg 2006

In digraph G, let
⟨u, v⟩ denote a link from node u to node v, and let c(u) denote the outdegree of node u. Let n be the number of nodes in the digraph; G can then be represented by an n × n transition matrix P whose element P_uv is 1/c(u) if there is a link from u to v and zero otherwise. For example, the transition matrix of a small fraction of the web containing the six linked pages shown in Fig. 1 can be represented as

\[
P = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1/2 & 1/2 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 1/2 & 0 & 1/2 \\
1/2 & 0 & 0 & 1/2 & 0 & 0
\end{pmatrix}.
\]

[Fig. 1. Digraph representing a web containing six pages]

The rank vector on the graph represents the importance of each node. The basic conception underlying link-based rank is that a link ⟨u, v⟩ can be viewed as evidence that v is important for u. In PageRank, the rank vector w^T is defined recursively as

\[
w^T = \alpha w^T P + (1 - \alpha) v^T, \tag{1}
\]

where w^T and v^T are both n-dimensional row vectors and α is the damping factor, 0 ≤ α < 1. v^T is known as the personalization vector and can generally be assigned the n-dimensional uniform row vector. The iterative form of Eq. (1) can be written as

\[
w^{(k)T} = \alpha w^{(k-1)T} P + (1 - \alpha) v^T. \tag{2}
\]
Starting from w^{(0)T}, w^{(k)T} can be calculated using the power method.
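As an illustrative sketch (not the authors' code), the power iteration of Eq. (2) can be written as follows, using the six-page transition matrix of Fig. 1:

```python
def pagerank_power(P, alpha=0.85, v=None, tol=1e-10, max_iter=1000):
    """Power iteration for w^T = alpha * w^T P + (1 - alpha) * v^T (Eq. 2)."""
    n = len(P)
    v = v if v is not None else [1.0 / n] * n
    w = v[:]
    for _ in range(max_iter):
        new = [(1 - alpha) * v[j] + alpha * sum(w[i] * P[i][j] for i in range(n))
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, w)) < tol:
            return new
        w = new
    return w

# Transition matrix of the six-page example (Fig. 1)
P6 = [
    [0,   1,   0,   0,   0,   0],
    [0,   0,   0.5, 0.5, 0,   0],
    [0,   0,   0,   0,   1,   0],
    [0,   0,   0,   0,   0.5, 0.5],
    [0,   0,   0,   0.5, 0,   0.5],
    [0.5, 0,   0,   0.5, 0,   0],
]
```

Since P is row-stochastic and v is uniform, the iterate remains a probability distribution at every step.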
2.2 ExpRank
From Eq. (2), we obtain

\[
w^{(k)T} = w^{(0)T} (\alpha P)^k + (1 - \alpha) v^T \Big( I + \sum_{i=1}^{k-1} (\alpha P)^i \Big). \tag{3}
\]

Since the spectral radius of P equals 1, (αP)^k converges to zero as k → ∞. Thus, when k → ∞, Eq. (3) can be written as

\[
w^T = (1 - \alpha) v^T \Big( I + \sum_{i=1}^{\infty} \alpha^i P^i \Big) = (1 - \alpha) v^T (I - \alpha P)^{-1}. \tag{4}
\]
Eq. (4) is the stable solution of the PageRank algorithm, which can also be obtained by solving the linear system of Eq. (1). Thus Eq. (4) is the ideal rank result attained from the PageRank algorithm. It can be viewed as the accumulated effect of the vector sequence produced along with the transition of v^T on digraph G. Thus, the generalized form of PageRank can be written as

\[
w^T = v^T \Big( I + \sum_{i=1}^{\infty} \Gamma_i P^i \Big), \tag{5}
\]
where Γ is a sequence of coefficients converging to zero. Let Γ = {β^n / n! | n ∈ N, β ≥ 0}; this leads to our ExpRank algorithm. In ExpRank, Eq. (5) becomes

\[
w^T = v^T \Big( I + \sum_{i=1}^{\infty} \frac{\beta^i}{i!} P^i \Big) = v^T e^{\beta P}. \tag{6}
\]
β is the decay factor used to adjust the decay rate of Γ, and v^T is a personalization vector that customizes ExpRank for anti-spam or user-dependent applications.

Algorithm 1. ExpRank Computation
  k ← 0
  t^T ← v^T
  w^T ← v^T
  repeat
    k ← k + 1
    t^T ← β t^T P
    w^T ← w^T + t^T / k!
    δ ← ||t^T / k!||_1
  until δ < ε
For the computation of ExpRank, the matrix exponential does not need to be computed explicitly; only vector-matrix products are required. The process of ExpRank computation is described in Algorithm 1. Algorithm 1 is matrix-free, and only nnz(P) multiplications are needed per iteration, where nnz(P) is the number of non-zeros in P. Since there are generally no more than 10 non-zeros per row in P, O(nnz(P)) ≈ O(n). Only the storage of two vectors, t^T and w^T, is required at each iteration. Thus, the algorithm is well suited to the size and sparsity of the web matrix.
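Algorithm 1 can be sketched in plain Python as follows; dense lists stand in for the sparse web matrix, so a production version would replace the inner product with a sparse vector-matrix multiply:

```python
import math

def exprank(P, beta=3.0, v=None, eps=1e-12):
    """Matrix-free ExpRank (Algorithm 1): accumulate w^T = v^T e^{beta P}
    term by term. t holds v^T (beta P)^k after k steps; the k-th summand
    is t / k!, and iteration stops when its L1 norm falls below eps."""
    n = len(P)
    v = v if v is not None else [1.0 / n] * n
    t = v[:]          # t^T <- v^T
    w = v[:]          # w^T <- v^T
    k = 0
    while True:
        k += 1
        # t^T <- beta * t^T * P
        t = [beta * sum(t[i] * P[i][j] for i in range(n)) for j in range(n)]
        term = [x / math.factorial(k) for x in t]
        w = [wi + ti for wi, ti in zip(w, term)]
        if sum(abs(x) for x in term) < eps:   # delta = ||t^T / k!||_1
            return w
```

Because P is row-stochastic and v sums to one, the entries of v^T e^{βP} sum to e^β, which gives an easy correctness check.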
3 Experimental Results
In digraph G, nodes without outlinks are called dangling nodes. In PageRank, dangling nodes need to be pretreated to avoid rank sinking. This pretreatment is not necessary for the ExpRank calculation, and ExpRank can be calculated directly with the transition matrix P including dangling nodes. However, for the
1112
H. Liu et al.
convenience of comparison with PageRank, all dangling nodes were removed in our experiments. The dataset used in our experiments was generated from a crawl within China in 2004. The link graph contains 92382 nodes and 238430 links. After removing the dangling nodes recursively, 33519 nodes and 164110 links were left for the ExpRank and PageRank computation. In the following experiments, β = 3, α = 0.85, and v^T is uniform if not specified otherwise. The uniform vector was also assigned to w^(0)T in the PageRank calculation.

The ExpRank vector w^T_exp and the PageRank vector w^T_page are plotted in semi-log plots in Fig. 2, with the x-axis indicating the labels of the nodes. w^T_exp and w^T_page are both normalized. As most rank values are very small, the y-axes of the w^T_exp and w^T_page plots are logarithmically scaled. The residual r^T = w^T_exp − w^T_page is also plotted in this figure. The average of r^T is −6.398 × 10^−19 and the variance is 6.586 × 10^−10.

Fig. 2. Comparison of ExpRank and PageRank

We measure the rate of convergence using the L1 norm of the residual vector, i.e., Δ(k) = ||w^(k)T − w^(k−1)T||_1. Since P is a stochastic matrix and v^T is uniform, from Algorithm 1 we obtain Δ_exp^(k) = β^k/k!. Thus, the convergence rate of ExpRank is much faster than the O(α^k) convergence rate of PageRank. The convergence rates of ExpRank and PageRank in our experiments are plotted on the semi-log graph shown in Fig. 3.

As the order of nodes is important in a rank, the percentage of reversed node pairs can be used as valuable evidence to evaluate the difference between ExpRank and PageRank. Let r_1 and r_2 be two different ranks containing m nodes. We define Ψ(r_1, r_2) as the ratio of reversed order between the two ranks, i.e., the number of node pairs reversed between the two ranks divided by all possible reversed node pairs. If the order of a node changes from a to b, there are |a − b| reversals. Thus, Ψ(r_1, r_2) reflects both the number of changed node pairs and the amplitude of the change of one node's rank order. Ψ(r^T_exp, r^T_page) for the top m nodes in ExpRank versus m, for m ≤ 1000, is plotted in Fig. 4. The relatively high difference in the low region of m is due to the small value of m, which magnifies reversals of rank order. From Fig. 2 and Fig. 4, we can infer that the difference between the results of ExpRank and PageRank is not very obvious, especially for nodes with high rank, which are more meaningful for the users of search engines.
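The pairwise reversal count behind Ψ is a standard inversion count between two rankings; a small O(m²) sketch (function and variable names are ours), adequate for the m ≤ 1000 range used here:

```python
def psi(r1, r2):
    """Ratio of reversed node pairs between two rankings, each given as
    a list of node ids ordered from highest to lowest rank."""
    pos2 = {node: i for i, node in enumerate(r2)}
    m = len(r1)
    reversed_pairs = 0
    for i in range(m):
        for j in range(i + 1, m):
            # r1 ranks r1[i] above r1[j]; count a reversal if r2 disagrees
            if pos2[r1[i]] > pos2[r1[j]]:
                reversed_pairs += 1
    return reversed_pairs / (m * (m - 1) / 2)
```

Ψ of identical rankings is 0 and of fully reversed rankings is 1; moving one node by |a − b| positions creates |a − b| reversals, matching the description in the text.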
Fig. 3. The convergence rates of ExpRank and PageRank

Fig. 4. The ratio of reversed order for high ranked nodes

4 Conclusion
From the above discussion, we can conclude that ExpRank preserves the main features of PageRank, such as query independence, content independence, simplicity, and suitability for the huge and extremely sparse web matrix. The convergence rate of ExpRank is much faster, and the ExpRank calculation can also be carried out using the Krylov subspace approximation method [6]. It is therefore a good candidate for the growing need for quick rank algorithms that accompanies the rapid development of the World Wide Web and the proliferation of specialized and customized search engines.
References

1. Brin, S., Page, L., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical report, Stanford University (1999). Available at http://dbpubs.stanford.edu:8090/pub/1999-66
2. Jeh, G., Widom, J.: Scaling personalized web search. Technical report, Stanford University (2002). Available at http://dbpubs.stanford.edu:8090/pub/2002-12
3. Haveliwala, T.H.: Topic-sensitive PageRank. In: Proc. of the Eleventh International World Wide Web Conference, New York: ACM Press (2002) 517–526
4. Kamvar, S.D., Haveliwala, T.H., Manning, C.D., Golub, G.H.: Extrapolation methods for accelerating PageRank computations. In: Proc. of the Twelfth International World Wide Web Conference, New York: ACM Press (2003) 261–270
5. Langville, A.N., Meyer, C.D.: Deeper inside PageRank. Internet Mathematics 1 (2003) 335–380
6. Hochbruck, M., Lubich, C.: On Krylov subspace approximations to the matrix exponential operator. SIAM J. Numer. Anal. 34(5) (1997) 1911–1925
A Biologically-inspired Computational Model for Perceiving the TROIs from Texture Images

Woobeom Lee and Wookhyun Kim

Department of Computer Engineering, Yeungnam University, 214-1 Dae-dong, Gyeongsan-si, Gyeongbuk-do 712-749, Republic of Korea {beomlee, whkim}@ynu.ac.kr

Abstract. This paper presents a biologically-inspired method of perceiving the TROIs (Texture Regions Of Interest) in various texture images. Our approach is motivated by a computational model of the neuron cells found in the primary visual cortex. An unsupervised learning scheme, the SOM (Self-Organizing Map), is used for block-based image clustering, and 2D spatial filters modeled on the response properties of neuron cells are used for extracting spatial features from an original image and segmenting any TROI from the clustered image. To evaluate the effectiveness of the proposed method, various texture images were built, and the quality of the extracted TROIs was measured according to the discrepancies. Our experimental results demonstrate a very successful performance.

1 Introduction
Texture analysis using a biologically-motivated Gabor scheme is a highly effective technique and represents the state-of-the-art in this area. Two major approaches have been studied in this literature. One is the supervised method that uses a bank of Gabor filters [1,2]. These methods are restricted by the supervised requirement of fore-knowledge and by a high computational complexity. The other is the unsupervised method that designs a single Gabor filter which responds distinctly to a specific texture component [3,4]. Although these are unsupervised methods, such optimal filtering has focused on detecting only the pertinent texture component, using texture information inherent to a particular image together with fore-knowledge.

Accordingly, this paper proposes a biologically-inspired method that, like human perception, recognizes the TROIs in an image without fore-knowledge and provides useful edge information for object recognition from a query image. This paper focuses on implementing 2D spatial filters corresponding to the receptive fields of neuron cells such as the retinal ganglion cell (cG) and two types of simple cell (cS1, cS2) found in the primary visual cortex. cS1 cells extract an orientation-selective spatial feature from an original image, and an unsupervised learning scheme based on the SOM clusters the original image into block-based parts. A cS2 cell performs the selective attention that segments any TROI from the clustering results automatically without fore-knowledge, and a cG cell performs the contrast extraction that detects the edge of any TROI.

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1114–1118, 2006. © Springer-Verlag Berlin Heidelberg 2006
2 A Self-organized Clustering Model (cS1 Cell)
A cS1 cell is defined using the asymmetrical difference of two Gaussian functions with a preferred orientation φ, as follows:

  cS1(x′, y′, φ) = [exp(−x′²/(2σe²)) − (σe/σi)·exp(−x′²/(2σi²))] · exp(−y′²/(2σen²)),    (1)

where (x′, y′) = (x cos φ + y sin φ, −x sin φ + y cos φ) are rotated coordinates, σe and σi represent the space constants of the excitatory and inhibitory regions respectively, and σen determines the sensitivity to the preferred orientation of the filter. This filter corresponds to the simple-cell receptive field found in the mammalian visual cortex. Simple cells are excellent at detecting the presence of simple visual features, such as lines and edges of a particular orientation [5]. The output of a cS1 cell is given by

  uS1(x, y, φ) = ϕ[ (1 + ∫∫_AS1 cS⁺(ξ, η, φ)·i(x+ξ, y+η) dξ dη) / (1 + ∫∫_AS1 cS⁻(ξ, η, φ)·i(x+ξ, y+η) dξ dη) − 1 ],    (2)

where AS1 denotes the radius of the receptive field satisfying |ξ|² + |η|² ≤ |AS1|², cS⁺(·) and cS⁻(·) represent the strength coefficients of the excitatory and inhibitory connections respectively, i(x, y) is the gray-level intensity of the image, and ϕ[·] is a step function. After the response of the cS1 cells has been computed for each preferred orientation in an image, the image is divided into equal-sized block parts by the unsupervised learning scheme proposed by Kohonen [6,7].
3 A Selective Attention Model (cS2 Cell)
A cS2 cell corresponds to another simple-cell receptive field found in the mammalian visual cortex. cS2 cells are more appropriate for selective attention to an image region containing a very specific frequency and orientation characteristic [5]. Thus, this cell acts as a local bandpass filter. This cell is defined using a 2D Gabor function, as follows:

  cS2(x, y; σ, u0, v0, λ, φ) = g(x′, y′; σ) · exp(−2πi(u0 x + v0 y))
                             = g(x′, y′; σ) · exp(−2πi f0 x′)
                             = g(x′, y′; σ) · [cos(2πf0 x′) − i sin(2πf0 x′)],    (3)

where

  g(x, y; σ) = (1/(2πλσ²)) · exp(−((x/λ)² + y²)/(2σ²)),

(x′, y′) = (x cos φ + y sin φ, −x sin φ + y cos φ) are rotated coordinates, λ (= b/a) specifies the aspect ratio, and σ is the standard deviation of the Gaussian envelope. Also, the radial center frequency f0 can be calculated as f0 = √(u0² + v0²),
1116
W. Lee and W. Kim
and λ, φ, and the center frequency (u0, v0) of the Gabor function are defined as follows:

  u0 = fu/N,  v0 = fv/M,  λ = 0.5,  φ = θ (= tan⁻¹(v0/u0)).    (4)

Here N and M, generally taken as N = M, give the spatial resolution of the perceived TROI, and 1/N is the frequency sample interval. The tuning frequency (fu, fv) for selective attention to a TROI is then defined as follows:

  (fu, fv) = arg max_{1≤k≤m} { FS_t^k − Σ_{i=1 (i≠t)}^{n} FS_i^1 },    (5)

where FS_t^k is the k-th maximum frequency in TROI_t, which corresponds to the k-th center frequency ordered by the Fourier spectrum of TROI_t, and FS_i^1 is the highest center frequency in TROI_i. Also, m is the number of ordered spatial frequency candidates in TROI_t, and n is the number of clustered TROIs in the image [7]. The response of a cS2 cell, uS2(·), can be defined in the form of Eq. (6):

  uS2(x, y) = [ (∫∫_AS2 cS2^R(ξ, η)·i(x+ξ, y+η) dξ dη)² + (∫∫_AS2 cS2^I(ξ, η)·i(x+ξ, y+η) dξ dη)² ]^(1/2),    (6)

where AS2 denotes the extent of the receptive field satisfying |ξ/a|² + |η/b|² ≤ |AS2|², a and b denote the two scale parameters, and cS2^R(·) and cS2^I(·) represent the strength coefficients of the real and imaginary parts respectively.
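A sampled version of the complex Gabor kernel of Eq. (3) makes the construction concrete; Eq. (6) is then the magnitude of the inner products of the image with the kernel's real and imaginary parts. Parameter values here are illustrative:

```python
import numpy as np

def cs2_kernel(size=15, f0=0.1, phi=0.0, lam=0.5, sigma=3.0):
    """Complex c_S2 Gabor kernel: elliptical Gaussian envelope g(x', y')
    modulated by a complex sinusoid of radial frequency f0 along x'."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(phi) + y * np.sin(phi)
    yr = -x * np.sin(phi) + y * np.cos(phi)
    g = (np.exp(-((xr / lam)**2 + yr**2) / (2 * sigma**2))
         / (2 * np.pi * lam * sigma**2))
    return g * np.exp(-2j * np.pi * f0 * xr)

def u_s2(patch, kernel):
    """Eq. (6) at one location: magnitude of the complex response of an
    image patch the same size as the kernel."""
    return abs((kernel * patch).sum())
```

In use, f0 and phi would be set from the tuning frequency (fu, fv) of Eq. (5) so that the filter attends to the selected TROI's dominant spectral component.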
4 A Contrast Extraction Model (cG Cell)
The edge detection of a TROI requires a binary image of the uS2(·) output. Thus, Eq. (7) below is applied to the uS2(·) output of any TROI:

  uB(x, y) = ϕ[uS2(x, y)] = 1 if ω × ⌈H/ω⌉ ≥ uS2(x, y) ≥ ω × ⌊L/ω⌋, and 0 otherwise,    (7)

where H and L are the highest and lowest response values, respectively, of uS2(·) in any clustered TROI, ω is the precision coefficient, and ⌈·⌉ and ⌊·⌋ denote the ceiling() and floor() functions, respectively, used for truncation to integer multiples. Thus, the upper and lower bounds for the binary image transformation of uS2(·) can be determined automatically, without any fore-knowledge or heuristic decision. The final segmentation is achieved by the contrast-extracting property of a retinal ganglion cell. This cell corresponds to the on-center, off-surround receptive field of ganglion cells found in the retina of the visual pathways [5]. A cG
cell is defined using a 2D circularly symmetric difference of two Gaussian functions, as follows:

  cG(x, y) = (1/(2πσe²))·exp(−(x² + y²)/(2σe²)) − (1/(2πσi²))·exp(−(x² + y²)/(2σi²)),    (8)

where σe and σi represent the space constants of the excitatory and inhibitory regions, respectively, and the ratio of the space constants σe/σi = 1.6. This ratio yields a good approximation of the ideal Laplacian operator. The output of a cG cell can be defined by

  uG(x, y) = ψ[ ∫∫_AG cG(ξ, η)·uS2(x+ξ, y+η) dξ dη ],    (9)

where ψ[·] is a function for finding the zero crossings of the cG cell response.
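The cG kernel of Eq. (8) is a classic difference of Gaussians; the sketch below assumes, following the usual Marr-Hildreth reading of the 1.6 space-constant ratio, that the inhibitory surround is the wider Gaussian. Size and sigma values are illustrative:

```python
import numpy as np

def cg_kernel(size=9, sigma_e=1.0, ratio=1.6):
    """Sampled c_G on-center/off-surround kernel (Eq. (8)). We assume
    sigma_i = ratio * sigma_e, i.e., the inhibitory Gaussian is the
    wider one; this reading of the 1.6 ratio is our assumption."""
    sigma_i = ratio * sigma_e
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    return (np.exp(-r2 / (2 * sigma_e**2)) / (2 * np.pi * sigma_e**2)
            - np.exp(-r2 / (2 * sigma_i**2)) / (2 * np.pi * sigma_i**2))
```

Convolving the uS2 map with this kernel and taking zero crossings, as ψ[·] in Eq. (9) does, yields the TROI edges.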
5 Computer Simulation
To evaluate the quality of the segmentation performance, given more than 100 texture images obtained from the Brodatz texture book, the segmentation quality was measured according to the discrepancies based on the number of mis-segmented pixels [9], as defined below:

  Dk = 100 × (Σ_{i=1}^{N} Cik − Ckk) / Σ_{i=1}^{N} Cik,    (10)

where Cij represents the number of cluster-j pixels classified as cluster i in the segmentation results. The measured results were close to 5% for the two-texture problem and 7% for the three-texture problem. These results show that the performance of the proposed system was very successful.
6 Conclusions
A biologically-inspired computational model was presented for perceiving the TROIs in a texture image in an unsupervised manner. This paper focused on (1) implementing 2D spatial filters corresponding to the receptive fields of neuron cells, namely the ganglion cell of the retina and two types of simple cell found in the primary visual cortex, (2) proposing an unsupervised learning scheme for clustering the TROIs without fore-knowledge, and (3) detecting the edge of any TROI from the perceived results automatically. However, several problems remain for future work: the size of a block unit and the number of preferred orientations. In particular, the selection of appropriate parameters, such as the orientation, phase, and aspect ratio, is an important task when tuning selective attention. Consequently, when these problems are solved, and the object-shape definition of the extracted TROI is also included, the proposed method has potential application in the development of a real image query system.
Fig. 1. Experimental Results: (a) Collage of the Brodatz textures Background (Sand), D112 (Plastic bubbles) and D24 (Pressed calf leather), where the size of the original image is 512 × 512 pixels. [uS1(·) and SOM]: (b) Clustered map and (c) Merged map, where one block unit is scaled to a size of 32 × 32 pixels. The bounding box corresponds to the maximum square including the TROI and preserving the block-based connectivity with respect to the same label. (d) Results of browsing the TROIs. [uS2(·)]: (e) Result of selective attention by tuning a spatial frequency of the TROI (D112). [uB(·)]: (f) Binary image of the extracted TROI (D112). (g) Binary image of the extracted TROI (D24). [uG(·)]: (h) Edge-detected image of the TROIs (D112 and D24) by zero-crossing.
References

1. Manthalkar, R., et al.: Rotation invariant texture classification using even symmetric Gabor filters. Pattern Recognition Letters 24 (2003) 2061–2068
2. Idrissa, M., Acheroy, M.: Texture classification using Gabor filters. Pattern Recognition Letters 23 (2002) 1095–1102
3. Tsai, D., et al.: Optimal Gabor filter design for texture segmentation using stochastic optimization. Image and Vision Computing 19 (2001) 299–316
4. Clausi, D. A., Jernigan, M.: Designing Gabor filters for optimal texture separability. Pattern Recognition 33 (2000) 1835–1849
5. Marr, D.: Vision: A computational investigation into the human representation and processing of visual information. W. H. Freeman & Company (1982)
6. Kohonen, T.: The self-organizing map. Proc. IEEE 78 (9) (1990) 1464–1480
7. Lee, W.B., Kim, W.H.: Texture Segmentation by Unsupervised Learning and Histogram Analysis using Boundary Tracing. In: Yue, H., et al. (eds.): Computational Intelligence and Security. Lecture Notes in Artificial Intelligence, Vol. 3801. Springer-Verlag, Berlin Heidelberg New York (2005) 25–32
8. Fukushima, K.: Neural network model for extracting optical flow. Neural Networks 18 (2005) 549–556
9. Zhang, Y. J.: A survey on evaluation methods for image segmentation. Pattern Recognition 29 (8) (1996) 1335–1346
A Computer-Assisted Environment on Referential Understanding to Enhance Academic Reading Comprehension Wing-Kwong Wong1, Jian-Hau Lee1, Yu-Fen Yang2, Hui-Chin Yeh2, Chin-Pu Chiao2, and Sheng-Cheng Hsu3,4 1
Graduate School of Computer Science & Information Engineering, 2 Graduate School of Applied Foreign Languages, 3 Graduate School of Engineering Science & Technology, National Yunlin University of Science & Technology, Douliou, Yunlin, Taiwan {wongwk, g9317708, yangy, hyeh, chiaocp, g9010802}@yuntech.edu.tw 4 Department of Information Management, Nan Kai Institute of Technology, Tsaotun, Nantou, Taiwan

Abstract. To comprehend English-written texts successfully, readers have to construct a referential map of the textual information. The referential device is one of the important means of helping readers comprehend a text. The purpose of this study is to develop a computer-assisted environment to enhance EFL college students' comprehension. Four modules are included: natural language processing (NLP), user interface, recording, and feedback. Among these four modules, the feedback module compares students' initial maps with the expert's. The results of the comparison inform students which referents are incorrect and offer them appropriate scaffolding. The recording module records all of the students' behavioral data. From these data, the teacher can identify the difficulties students encounter and the different performance levels among students of various reading proficiencies.
1 Introduction

To comprehend a text successfully, the reader has to construct a comprehensible and coherent mental representation of the textual information in his/her memory [1]. The mental representation shows a network of the text, with nodes that indicate individual text elements and connections that show the meaningful relations [1]. The ability to integrate textual information is one of the essential skills for reading comprehension [2][3]. Halliday and Hasan [4] proposed five cohesive ties to help the reader integrate textual meaning and form a coherent mental picture of the information presented in a text: reference, substitution, ellipsis, conjunction, and lexical cohesion. Among these five cohesive ties, the referential device accounts for 75% of the variation for students who learn English as a Foreign Language (EFL) in academic reading comprehension [5]. Three major types of reference are included: personal, demonstrative, and comparative. Personal reference refers to individuals or objects by specifying their functions or roles in the speech situation [4], such as "I", "me", and "you." Demonstrative reference acts as a form of verbal locating, such

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1119 – 1124, 2006. © Springer-Verlag Berlin Heidelberg 2006
1120
W.-K. Wong et al.
as “this”, “these”, and “that.” The speaker figures out the referent by means of location, on a scale of proximity. Comparative reference refers to the indirect reference for indicating identity or similarity. As cohesive signals for connecting sentences in texts occur less often in Chinese [6], most Taiwanese EFL students are found to be less aware of them while reading English texts. They rarely use cohesive devices or referential words for integrating textual information [7][8]. The difficulty in identifying references results in their lower English reading proficiency. EFL college students are expected to be equipped with sufficient English reading skills since most of them have to acquire domain-specific knowledge by reading texts. That is, they are required to apply basic skills to read English for Specific Purpose (ESP) texts, such as statistics, physics, or chemistry. These basic skills include understanding relations within a sentence and between sentences, using cohesive and discourse markers, predicting, inferring, guessing, processing, and evaluating the information during reading [9]. This study aims to develop a computer-based reference instruction for enhancing their comprehension in reading texts and exploring the relationship between referential understanding and reading comprehension among EFL college students.
2 English Learning Environment

The English learning environment built in this study includes four modules: natural language processing (NLP), user interface, recording, and feedback. Fig. 1 shows the system architecture of the learning environment.

Fig. 1. System Architecture
The teacher designs the course, selects texts for students to read, and inputs the texts to the NLP module through a teacher interface. The NLP module picks out all of the referential devices in each text and segments the text into sentences. The selected referential devices and the sentences are then saved in the database. The recording module records the student's keyboard and mouse actions while he/she constructs a referential map of the text. These recorded behavioral data are then
reported back to the teacher, who identifies the difficulties the students encounter and the performance of groups of different reading proficiencies. After the student has constructed his/her initial referential map, the feedback module compares the map with the expert's map. It then reports their differences back to the student.

2.1 Natural Language Processing Module

To find all referential devices and segment a text into sentences automatically, we chose OpenNLP [10] for part-of-speech tagging, chunking, and sentence detection. It provides a variety of Java-based tools that perform sentence detection, tokenization, POS tagging, chunking and parsing, and named-entity detection [10]. For example, the referential devices of the text in Fig. 2 are detected as shown in Fig. 3.

Dr. Chen is interested in using the sample data to make an inference about the average hours of useful life for the populations of all lightbulbs that could be produced with the new filament.

Fig. 2. An example text
[NP the sample data] [NP the average hours] [NP the populations] [NP that] [NP the new filament] Fig. 3. Referential devices detected by NLP module
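The actual system uses OpenNLP's POS tagger and chunker; as a rough illustrative stand-in, even a crude regex heuristic can flag pronouns and short definite noun phrases of the kind shown in Fig. 3 (the patterns and function name below are ours, not the paper's):

```python
import re

PRONOUNS = r"\b(?:I|me|you|he|she|it|we|they|this|these|that|those)\b"
DEFINITE_NP = r"\bthe\s+\w+(?:\s+\w+)?"   # "the" plus one or two words

def referential_devices(text):
    """Return candidate referential devices: personal/demonstrative
    pronouns and short definite noun phrases."""
    pattern = PRONOUNS + "|" + DEFINITE_NP
    return re.findall(pattern, text, flags=re.IGNORECASE)

text = ("Dr. Chen is interested in using the sample data to make an "
        "inference about the average hours of useful life for the "
        "populations of all lightbulbs that could be produced with "
        "the new filament.")
devices = referential_devices(text)
```

A real chunker would recover full NP boundaries rather than a fixed two-word window, which is why the system relies on OpenNLP instead of patterns like these.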
2.2 User Interface

The user interfaces include a teacher interface and a learner interface. With the teacher interface, the teacher manages course data, inputs the texts that students should read, and accesses students' behavioral data. With the learner interface, students input the relationships among referents. Fig. 4 shows the learner interface.
Fig. 4. Learner interface and Feedback received by students
A. Toolbar. The toolbar includes many graphic tools. The text tool adds text elements to the canvas; the connection tool establishes meaningful relations among referential devices. Other tools that help students manage the canvas are cut, copy, paste, erase, group, ungroup, zoom in, zoom out, undo, redo, and request hint.

B. Text field. The text field shows a reading text. Students can select a word or a sentence as a text element and drag it into the canvas directly when they comprehend the relations within the text. The entire sentence is highlighted when a student clicks on a word.

C. Referential device list. All referential devices found by the NLP module are listed in this area. Students are asked to figure out what these referential devices refer to, then drag and drop them onto the canvas. Students can also add other referential devices from the text to the list by themselves. Moreover, when a referential device is selected, it is also highlighted in the text.

D. Canvas. Students draw arrows indicating the relations among referents in the canvas. They can add, erase, and drag and drop elements in the canvas, and establish relations among these text elements. The map shows a simplified network of the text, with nodes for individual text elements and connections that show the relations between elements in the text [1].

E. Feedback frame. The feedback frame informs students which referents are incorrect and offers them some clues about the correct referents. The details of the feedback module are discussed later in this article.

2.3 Recording Module

While students construct the referential map, the system records all of their actions through the recording module. Students' behavioral data and maps are saved in the database so that the teacher can examine these data later. The module uses predicates to record students' behavioral data. Table 1 shows some of the predicates.

Table 1. Predicates for recording students' actions

  Predicate              Description
  select_sentence(T)     Select a sentence T which a student is reading in the text field.
  element(X)             Add an element X to the canvas.
  erase_element(X)       Erase an element X from the canvas.
  refer_to(X, Y)         Add a relation between X and Y.
  erase_reference(X, Y)  Erase a relation between X and Y.
  …                      …
2.4 Feedback Module After a student has constructed his/her initial referential map, he/she can request an evaluation of the map. The feedback module will compare the student’s map with the expert’s map and then inform the student what referents are incorrect and offer him/her some clues about the correct referents. Fig. 4 shows the feedback received by
a student. When an incorrect referent is chosen by the student, the clues about the referent will be highlighted in the text field. The types of mistakes can be divided into missing referents, incorrect referents, and incorrect direction of referential relation.
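The comparison itself reduces to set operations on directed (device, referent) edges mirroring the refer_to(X, Y) predicate; a sketch classifying the three mistake types named above (the example edges are invented for illustration):

```python
def compare_maps(student, expert):
    """Compare a student's referential map with the expert's. Both maps
    are sets of directed (device, referent) edges. Returns the edges the
    student is missing, drew backwards, or got wrong."""
    missing = {e for e in expert
               if e not in student and (e[1], e[0]) not in student}
    reversed_dir = {e for e in student
                    if e not in expert and (e[1], e[0]) in expert}
    incorrect = {e for e in student
                 if e not in expert and (e[1], e[0]) not in expert}
    return {"missing": missing, "reversed": reversed_dir,
            "incorrect": incorrect}

expert = {("that", "filament"), ("the populations", "lightbulbs")}
student = {("filament", "that"), ("the populations", "bulbs")}
feedback = compare_maps(student, expert)
```

Each bucket maps naturally onto a different feedback message and hint in the feedback frame.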
3 Conclusion

In this study, a computer-assisted environment was developed to enhance EFL college students' reading comprehension. Natural language processing technology is used to automatically find referential devices and segment texts into sentences. The recording module records students' actions and reports them back to the teacher. From these data, the teacher can further identify students' reading strengths and weaknesses, which helps the teacher adjust instruction in order to reduce students' reading difficulties. After students construct their initial referential maps, the feedback module compares each map with the expert's map. If there are mistakes in a student's map, the feedback module offers a second chance to correct the referents: based on the results of the comparison, students are told which referents are incorrect and get hints on candidate referents. In the future, this computer-assisted learning environment will be empirically tested with EFL college students, and the experimental results will be reported in subsequent studies. We also plan to add a module that automatically generates the expert's map from a text, using the lexical resource WordNet [11] to help resolve referential ambiguities.
Acknowledgement This project is supported by the National Science Council, Taiwan (NSC 94-2520-S224-001) and the Center for English Teaching Resources (MOE 0940052270).
References

1. Broek, P. V. D., Kremer, K. E.: The Mind in Action: What It Means to Comprehend During Reading. In: Taylor, B. M., Graves, M. F., Broek, P. V. D. (eds.): Reading for Meaning: Fostering Comprehension in the Middle Grades. Allyn & Bacon (1998) 1–31
2. Peterson, C.: Identifying Referents and Linking Sentences Cohesively in Narration. Discourse Processes 16 (1993) 507–524
3. Grabe, W., Stoller, F. L.: Teaching and Researching Reading. Pearson Education (2002)
4. Halliday, M. A. K., Hasan, R.: Cohesion in English. Longman Group Ltd (1976)
5. Huang, S. H.: Assessing the Relationship between Referential Understanding and Academic Reading Comprehension among EFL College Students. Thesis, Institute of Applied Foreign Languages, National Yunlin University of Science and Technology, Taiwan (2005)
6. Chu, H. C. J., Swaffar, J., Charney, D. H.: Cultural Representation of Rhetorical Conventions: the Effects on Reading Recall. TESOL Quarterly 36(4) (2002) 511–541
7. Chen, L. T.: Improving High School Students' Performance on "Discourse Structure" Tests through Instruction of Text Structure and Think-aloud Modeling. Thesis, English Department, National Taiwan Normal University, Taiwan (2003)
8. Sharp, A.: Reading Comprehension and Text Organization. The Edwin Mellen Press, Ltd (2003)
9. Dudley-Evans, T., St. John, M. J.: Developments in ESP: a Multi-Disciplinary Approach. Cambridge University Press (1998)
10. OpenNLP, available electronically from http://opennlp.sourceforge.net/
11. WordNet, available electronically from http://wordnet.princeton.edu/
An Object-Oriented Framework for Data Quality Management of Enterprise Data Warehouse Wang Li and Li Lei Software Institute of Sun Yat-Sen University 510630 Guangzhou, China
[email protected] Abstract. Enterprise data warehousing technology aims at providing integrated, consolidated and historical data for users to analyze businesses and make decisions. In order to obtain the correct results, the high data quality is required. In this paper, we analyze the quality problems of enterprise data warehouse and present an object-oriented framework for data quality management. In this framework, an object-oriented data quality model (OODQM) is built. The data quality requirements, the participators, the data quality checking object, and the possible data quality problems, form the core components of OODQM. The method we provide is a goal-driven method. Once the data quality goal is built, we manage data quality by the interaction of those components of OODQM.
1 Introduction Effective, real, and accurate data is a necessary condition for enterprise data warehouse (EDW) building. We should build a series of managing mechanisms to guarantee the data quality before the data is used to make decision. Many experts study on the data quality, and the issues include data quality requirements analysis and modeling [4], data quality defining [5, 10], data quality assessment [7], and data quality management [1, 2, 3, 8, 9]. R.Y. Wang builds the framework of data quality analyzing [10], and provides the attribute-based method [2] which attaches the data quality measure to attribute and build the data quality model based on attributes. Yang W. Lee [3] provides an approach to manage data quality based on context-reflective which includes paradigm, role, goal, time, and place. DWQ [1, 9] extends the architecture of EDW by building the metadata model of data quality and embedding them into the concept model, logic model, and physical model of EDW respectively. The quality data is acquired, stored, and maintained in the metadata model. These researches build and develop the foundation of theories and techniques for guaranteeing data quality. In this paper, we build a novel data quality management framework for EDW according to the actual experience of project implementing. EDW has special data environments which integrate many homogeneous or heterogeneous data. We analyze the origins of data quality problems in these environments, establish an objectoriented data quality model, and provide the data quality processing method. Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1125 – 1129, 2006. © Springer-Verlag Berlin Heidelberg 2006
L. Wang and L. Li
2 Definition of Data Quality and the Data Environments of EDW

Definitions of data quality differ across categories [5, 10]. For EDW, we define data quality along six dimensions derived from the actual projects we carried out, and we measure every dimension quantitatively.

Completeness— There are two kinds of completeness: business rule completeness (BRC) and value-of-attribute completeness (VAC). BRC is the percentage of data that satisfy business requirements; VAC is the percentage of records whose attribute values are correct.
Correctness— Correctness is the percentage of data in accord with the facts.
Usability— Usability is the percentage of data that can be used.
Currency— Currency is the percentage of data that are current for the EDW application, namely the percentage of data refreshed within the accepted delay tolerance.
Consistency— Consistency is the percentage of data that do not conflict with other data.
Relevance— Relevance is the percentage of data that can be related to other data according to business requirements.

Data environments are the environments in which the data exist. The general data environments of an EDW are shown in Figure 1. Data quality problems are mostly caused by the source data, but the logical data model (LDM) and the ETL design can also produce wrong data. Errors may occur when data are downloaded, transmitted, loaded, or integrated by the ETL programs, and further quality problems can arise while the EDW is running.
Fig. 1. Data Environments
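The percentage measures above are straightforward to compute over a record set. The sketch below is our own illustration, not from the paper: the field layout, the use of None for missing values, and the 30-day delay tolerance are all assumptions. It computes VAC and Currency for a toy record list.

```python
from datetime import date

# Toy records: (customer_id, balance, last_refresh).
# None models a missing/invalid attribute value (an assumption of this sketch).
records = [
    ("c1", 120.0, date(2006, 8, 1)),
    ("c2", None,  date(2006, 8, 10)),
    ("c3", -5.0,  date(2006, 6, 1)),
]

def vac(rows):
    """Value-of-attribute completeness: percent of records whose
    attribute values are all present."""
    ok = sum(1 for r in rows if all(v is not None for v in r))
    return 100.0 * ok / len(rows)

def currency(rows, today, tolerance_days=30):
    """Percent of records refreshed within the accepted delay tolerance."""
    fresh = sum(1 for r in rows if (today - r[2]).days <= tolerance_days)
    return 100.0 * fresh / len(rows)

print(vac(records))                          # 2 of 3 records are complete
print(currency(records, date(2006, 8, 15)))  # 2 of 3 records are current
```

The other dimensions (Correctness, Consistency, Relevance, BRC) follow the same percentage pattern, differing only in the predicate applied to each record.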
3 Framework for Data Quality Management of EDW

3.1 Object-Oriented Data Quality Model (OODQM)

Object-oriented theory has been popular since the 1970s. In this section, we build the object-oriented data quality model (OODQM) on this basis. The main classes of OODQM are defined as follows.

1. Goal— Goal is a class representing the data quality requests coming from users or the system. Goal contains attributes related to the data quality dimensions. The Goal class has six subclasses: Completeness, Correctness, Usability, Currency, Consistency, and Relevance; furthermore, the Completeness
subclass includes the BRC and VAC subclasses. The meanings of these subclasses are given in Section 2. The main function of the operations defined in these subclasses is to acquire the data quality measures.

2. Data Carrier— Data Carrier is a class representing the various forms in which data exist, together with the corresponding data quality information. According to the data environments of EDW, the Data Carrier class has five subclasses: Source DB, Data File, Temp DB, EDW LDM, and EDW DB.

The Source DB subclass represents the databases of the source systems and contains their data quality information. The main function of its operations is to collect this information by mining the relations between source systems or the business rules within a source system.

The Data File subclass represents the data files downloaded from the source systems or stored on the ETL server, together with their data quality information. The main function of its operations is to collect this information by checking key features of the data files, such as file names, data dates, file sizes, record counts, record lengths, and so forth.

The Temp DB subclass represents the temp database (or temp space) built during the ETL process and contains its data quality information. Generally, the temp database has the same data structure as the source data in the source system. The main function of its operations is to collect data quality information by checking key features of the tables in the temp database, such as attribute domains, referential integrity, data consistency within a system or between systems, and correctness of indexes (computed on a sample).
The EDW LDM subclass represents the logical data model of the EDW and contains data quality information about entities, attributes, and the mapping rules from the source systems to the EDW. The main function of its operations is to collect data quality information by checking the LDM. The EDW DB subclass represents the EDW database and its data quality information; its operations are the same as those of Temp DB.

3. Quality Problem— Quality Problem is a class representing all kinds of data quality problems, such as null values, incorrect data, out-of-date data, inconsistent data, etc. The main functions of its operations are the methods used to solve the corresponding problems, such as null processing (for example, replacing a null with the sample average) and noisy data processing (for example, the binning approach). Quality Problem can be specialized into multiple subclasses according to the types of data quality problems.

4. Role— Role is a class representing the participants in data quality management. Role has three subclasses: Modeler, ETL designer, and DBA (database administrator). The operations defined in these subclasses carry out their part of data quality management. The Modeler takes charge of the correctness of the LDM. The ETL designer takes charge of the data quality
management of data files, the temp database, and the EDW. The DBA takes charge of monitoring and maintaining data quality while the EDW is running.

The main associations of OODQM are defined as follows:
1. "Check" between Goal and Data Carrier: a Goal checks the data quality situation of a Data Carrier.
2. "Managed" between Data Carrier and Role: a Data Carrier is managed by a Role.
3. "Exist" between Data Carrier and Quality Problem: records which problems a Data Carrier has.

The main generalizations of OODQM are shown in Figure 2, for example, the relation between Modeler and Role. There are no dependencies among the classes defined above.
Fig. 2. OODQM Primary Class Diagram
Figure 2 illustrates the primary class diagram of OODQM. The other concepts of OODQM, such as message, polymorphism, inheritance, etc., follow the general definitions of object-oriented theory and are not repeated here.

3.2 Data Quality Processing in EDW

Data quality processing in EDW is a goal-driven procedure. We first build the goal that represents the data quality requirements. Once the goal is built, we manage data quality through the interaction of the components of OODQM. Figure 3 illustrates the architecture of the data quality management framework of EDW. The data quality requirements are input by the clients and stored as Goal objects on the data quality management server. The management server checks the data quality information in the Data Carrier objects and finds data quality problems by comparing this information with the Goals. The server then informs the participants, who select the appropriate methods to solve the problems. The framework has been partly implemented in an actual project; the remaining key work is completing and optimizing the data quality checking and solving methods.
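The goal-driven interaction just described can be outlined in code. The sketch below is our own rendering, not the authors' implementation: the class names follow the paper, but all method names, thresholds, and data values are hypothetical.

```python
class Goal:
    """A data quality requirement on one dimension, e.g. Correctness >= 99%."""
    def __init__(self, dimension, threshold):
        self.dimension = dimension
        self.threshold = threshold

class DataCarrier:
    """One form in which data exist (Source DB, Data File, Temp DB, ...)."""
    def __init__(self, name, role, measures):
        self.name = name          # e.g. "Temp DB"
        self.role = role          # "Managed" association: the responsible Role
        self.measures = measures  # observed quality measure per dimension

    def check(self, goal):
        """'Check' association: compare an observed measure with a Goal."""
        return self.measures.get(goal.dimension, 0.0) >= goal.threshold

def manage(goals, carriers):
    """Goal-driven loop: find quality problems and name the participant
    ('Exist' association) who should select a solving method."""
    problems = []
    for carrier in carriers:
        for goal in goals:
            if not carrier.check(goal):
                problems.append((carrier.name, goal.dimension, carrier.role))
    return problems

goals = [Goal("Correctness", 99.0), Goal("Currency", 95.0)]
carriers = [
    DataCarrier("Source DB", "DBA", {"Correctness": 97.5, "Currency": 99.0}),
    DataCarrier("EDW DB", "ETL designer", {"Correctness": 99.5, "Currency": 90.0}),
]
print(manage(goals, carriers))
```

Each tuple returned names a carrier, the violated dimension, and the Role to notify, mirroring the server's comparison of Data Carrier information against Goal objects.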
Fig. 3. Architecture of the data quality management framework of EDW
4 Conclusion

In this paper, we presented an object-oriented framework for data quality management of EDW. Our method emphasizes the importance of the data quality goal, the data quality checking objects, the participants, and the types of data quality problems in the process of managing data quality. The framework is flexible: the objects of OODQM can be tailored to the actual application.
References
1. Jarke, M., Jeusfeld, M.A., et al.: Architecture and Quality in Data Warehouses: An Extended Repository Approach. Information Systems 24(3) (1999) 229–253
2. Wang, R.Y., Reddy, M.P., Kon, H.B.: Towards Quality Data: An Attribute-Based Approach. Decision Support Systems 13(4) (1995) 349–372
3. Lee, Y.W.: Crafting Rules: Context-Reflective Data Quality Problem Solving. Journal of Management Information Systems 20(3) (2004) 93–119
4. Wang, R.Y.: Data Quality Requirements Analysis and Modeling. Ninth International Conference on Data Engineering, Vienna, Austria, April (1993)
5. Strong, D., Lee, Y., Wang, R.: Data Quality in Context. Communications of the ACM 40(5) (1997) 103–110
6. Wand, Y., Wang, R.: Anchoring Data Quality Dimensions in Ontological Foundations. Communications of the ACM 39(11) (1996) 86–95
7. Pipino, L., Lee, Y., Wang, R.: Data Quality Assessment. Communications of the ACM 45(4) (2002) 211–218
8. Wang, R., Lee, Y., Pipino, L., Strong, D.: Manage Your Information as a Product. Sloan Management Review 39(4) (1998) 95–105
9. Jarke, M., Vassiliou, Y.: Foundations of Data Warehouse Quality: A Review of the DWQ Project. In: Proc. 2nd Intl. Conf. on Information Quality (IQ-97), Cambridge, Mass. (1997)
10. Wang, R.Y., Storey, V.C., Firth, C.P.: A Framework for Analysis of Data Quality Research. IEEE Trans. Knowledge and Data Engineering 7(4) (1995) 623–640
Extending HPSG Towards HDS as a Fragment of pCLL Erqing Xu Shanghai International Studies University, 200083 Shanghai, China
[email protected]

Abstract. Rebuilding Minimalist Grammars (MG) into Categorial Minimalist Grammars (CMG), as an extension of MG towards partially commutative linear logic (pCLL), is of significance. But the bijective syntax-semantics interface established in CMG sometimes fails to obtain certain semantic proof trees that involve more than one quantifier, and no satisfactory solution has been found yet. Aiming at this problem, we keep the type psoa of HPSG (Head-Driven Phrase Structure Grammar), which the glue semantics approach removes, and extend HPSG towards an HPSG Deductive System (HDS) as a fragment of pCLL, in the way MG were rebuilt. We handle the Subcategorization Principle, the Trace Principle, and the Semantic Principle of HPSG within HDS; establish the correspondence between syntax and semantics; obtain with this correspondence the problematic semantic proof tree; and thus solve the above-mentioned problem.
1 Introduction

Since Minimalist Grammars (MG) are relatively recent and largely informal, while linear logic provides a valuable common ground upon which proposals in MG can be formulated clearly, redundancies and notational variants can be identified, and compatibility with the data can have the deciding role it deserves [4], rebuilding MG into Categorial Minimalist Grammars (CMG) as an extension of MG towards partially commutative linear logic (pCLL) [1][2] is of significance. In CMG, syntactic analyses can be regarded as pCLL proofs. Using the embedding of intuitionistic logic into pCLL [4], semantic λ-terms, which by the Curry-Howard isomorphism are intuitionistic proofs, can also be viewed as pCLL proofs [4]. Thus, both the semantic homomorphic image of the syntactic analysis and the semantic recipes are pCLL proofs, and the semantic recipe of an utterance can be computed [4]. This gives rise to a bijective syntax-semantics interface, which enables the handling of the interaction between syntax and semantics [1][2]. But this interface sometimes fails to work for certain cases where more than one quantifier is involved; more specifically, some semantic proof trees involving more than one quantifier are hard to obtain. For example, the semantic proof tree of "∃x.[poem(x) ∧ ∀y.[student(y) → knows(y, x)]]: t" (denoting one of the two readings of "every student knows a poem") is hard to obtain. In order to solve this problem, the tool of CLLS (Constraint Language for Lambda Structures) was introduced, which helps to generate all the possible semantic proof trees [2]. But CLLS is too flexible
[2] and the bijective correspondence between syntactic proof trees and semantic proof trees is lost; further work on this problem is needed [2]. The glue semantics for HPSG (Head-Driven Phrase Structure Grammar) [5] is able to deal with multiple quantifiers and to generate all the possible semantic proof trees. But the problem with the glue semantics approach is similar to that of CLLS: it is too flexible, and the correspondence between syntax and semantics cannot be found. We argue that this is because the glue semantics approach removes from HPSG the type psoa (parameterized state of affairs), namely the useful quantifier-handling information. To solve the above-mentioned problem, the idea of this paper is to keep the type psoa that the glue semantics approach removes and to extend HPSG towards an HPSG Deductive System (HDS) as a fragment of pCLL, in the way MG were rebuilt. Then the correspondence between syntax and semantics can be established and the above-mentioned problem solved.
2 A Logical Analysis of the Syntax of HPSG

In a headed phrase, the SUBCAT value of the head daughter is the concatenation of the phrase's SUBCAT list with the list of SYNSEM values of the complement daughters [3]. This is the Subcategorization Principle of HPSG. Suppose that in an HPSG schema there are a head daughter and its complement sister, which can be described as mere lists of features. The head daughter H is z1/1 and its complement sister C is 1, where z1 is a sublist of head features of H, /1 is on the SUBCAT list of H and is to be satisfied by 1, and 1 is the head features of C. We have: z1/1, 1 ├ z1. Here /1 and 1 are both cancelled. With the operator / we are able to attach a syntactic object to the right of another. If the head selects an additional syntactic object, it will be attached on the left, and therefore we necessarily also use \. We assume the usual elimination rules for / and \, which are the rules of classical categorial grammars.

Now we consider that the syntactic features of a lexical entry or a syntactic object are put together by a non-commutative product •. The rules [•I] and [•E] of HDS are, respectively:

  [•I]   Γ ├ x: 1    Δ ├ y: 2
         ─────────────────────
         (Γ; Δ) ├ xy: 1 • 2

  [•E]   Γ ├ ω: 1 • 2    Δ[(x: 1; y: 2)] ├ z: 3
         ──────────────────────────────────────────
         Δ[Γ] ├ let (x; y) = (π1(ω); π2(ω)) in z: 3

Here Δ[] denotes a position in context Δ [4]. ω, x, and y are phonological labels; 1, 2, and 3 are tagged features; π1(ω) and π2(ω) are projections; (;) expresses order. The product • is also associative. As an example, take the lexicon she ::= /she/: 1; walks ::= /walks/: 1\2. We build the syntactic proof tree with [\E]:

         ├ /she/: 1    ├ /walks/: 1\2
         ──────────────────────────────
         ├ /she walks/: 2
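The [/E] and [\E] elimination rules behave like typed function application, so a tiny interpreter can replay such derivations. The sketch below is our own illustration (types encoded as nested tuples, tags as strings, phonology as plain words): it derives /she walks/: 2 and, anticipating the lexicon of Section 4, /every student knows a poem/: 3.

```python
# A type is a tag string, ("/", result, arg) for result/arg,
# or ("\\", arg, result) for arg\result. An item is (phonology, type).

def fs_e(fn, arg):
    # [/E]: from x: B/A and y: A, derive "x y": B.
    phon, ty = fn
    assert ty[0] == "/" and ty[2] == arg[1], "no forward application"
    return (phon + " " + arg[0], ty[1])

def bs_e(arg, fn):
    # [\E]: from x: A and y: A\B, derive "x y": B.
    phon, ty = fn
    assert ty[0] == "\\" and ty[1] == arg[1], "no backward application"
    return (arg[0] + " " + phon, ty[2])

she, walks = ("she", "1"), ("walks", ("\\", "1", "2"))
print(bs_e(she, walks))                 # ('she walks', '2')

every   = ("every", ("/", "1", "7"))
student = ("student", "7")
knows   = ("knows", ("/", ("\\", "1", "3"), "2"))
a       = ("a", "8")
poem    = ("poem", ("\\", "8", "2"))

np_subj = fs_e(every, student)          # every student: 1
np_obj  = bs_e(a, poem)                 # a poem: 2
vp      = fs_e(knows, np_obj)           # knows a poem: 1\3
print(bs_e(np_subj, vp))                # ('every student knows a poem', '3')
```

The hypothesis-discharging rules ([•E], [⊗E]) of the next section go beyond this pure-application sketch, which covers only the classical categorial core.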
1132
E. Xu
3 A Logical Analysis of the Trace Principle

In HPSG, every trace must be subcategorized by a substantive head [3]; this is the Trace Principle. Consider the following example of a WUDC (weak Unbounded Dependency Construction): "I1 am easy to please ___1." Here the nominative "I" is coindexed with the accusative trace "___". The phenomenon is no longer as simple as before, where hypotheses were discharged by the [•E] rule because the factors of the product type/feature were in the same order as the hypotheses to discharge. Here the relevant head features of the complement, which should satisfy the features on the SUBCAT list of the head, are not in the right place to cancel those features by means of our usual rules. This problem can be solved by introducing two products: • (which expresses order) and ⊗ (which ignores order). The rules [⊗I] and [⊗E] of HDS are, respectively:

  [⊗I]   Γ ├ x: 1    Δ ├ y: 2
         ─────────────────────
         Γ, Δ ├ {x, y}: 1 ⊗ 2

  [⊗E]   Γ ├ ω: 1 ⊗ 2    Δ[(x: 1, y: 2)] ├ z: 3
         ──────────────────────────────────────────
         Δ[Γ] ├ let {x, y} = {π1(ω), π2(ω)} in z: 3

Here {} ignores order. The rule [entropy] is then a direct consequence:

  [entropy]   Γ[(Δ1; Δ2)] ├ 1
              ───────────────
              Γ[(Δ1, Δ2)] ├ 1

The rule [⊗E] can act on non-adjacent features, so these rules can be employed to handle the trace-plus-coindexation problem. We try the example with the lexicon I ::= /I/: 3⊗5; am ::= /am/: (3\2)/4; easy ::= /easy/: 4/1; to_please ::= /to_please/: 1/5, and build the following syntactic proof tree, written as a derivation:

  ├ /to_please/: 1/5    x: 5 ├ x: 5
  x: 5 ├ /to_please/ x: 1                    [/E]
  ├ /easy/: 4/1
  x: 5 ├ /easy to_please/ x: 4               [/E]
  ├ /am/: (3\2)/4
  x: 5 ├ /am easy to_please/ x: 3\2          [/E]
  y: 3 ├ y: 3
  y: 3, x: 5 ├ y /am easy to_please/ x: 2    [\E]
  ├ /I/: 3⊗5
  ├ π1(/I/) /am easy to_please/ π2(/I/): 2   [⊗E]

To summarize the system, we give the set of rules and the structure of lexical entries. Lexical entries are axioms, and the tagged features are in the type-logical sense. Let us call HDS (HPSG Deductive System) this fragment of pCLL [1]. We thus obtain the rules [/E], [\E], [•I], [•E], [⊗I], [⊗E], [entropy], along with [⊸I] and [⊸E] in Section 4, and a lexicon in which each lexical entry is an axiom ├ w: T, where T is a type of the form ((2\(3\…(n\(n+1 ⊗ n+2 ⊗ … ⊗ n+m ⊗ n+m+1))))/1.
4 Semantic Interpretation

The work in this section simplifies the complicated Semantic Principle [3] of HPSG; we employ HDS for the semantic calculus of HPSG. Consider the example sentence "Every student knows a poem." [3] We have the lexicon below, giving for each word its syntactic axiom, its semantic type, and its semantic term (we write ⊸ for the linear implication of pCLL):

  every ::= ├ /every/: 1/7        (esubj ⊸ t) ⊸ (esubj ⊸ t) ⊸ t             λP.λQ.∀x.P(x) → Q(x)
  student ::= ├ /student/: 7      esubj ⊸ t                                    λx.student(x)
  knows ::= ├ /knows/: (1\3)/2    esubj ⊸ (eobj ⊸ t) or eobj ⊸ (esubj ⊸ t)   λx.λy.knows(x, y) or λy.λx.knows(x, y)
  a ::= ├ /a/: 8                  (eobj ⊸ t) ⊸ (eobj ⊸ t) ⊸ t               λP.λQ.∃x.P(x) ∧ Q(x)
  poem ::= ├ /poem/: 8\2          eobj ⊸ t                                     λx.poem(x)

Here t bears a truth-denoting or propositional meaning, while e bears entity-denoting meanings; esubj and eobj are subtypes of e [2]. The semantic rules [⊸I] and [⊸E] also belong to HDS. With [\E] and [/E] we build the syntactic proof tree:
  ├ /every/: 1/7    ├ /student/: 7
  ├ /every student/: 1                           [/E]
  ├ /a/: 8    ├ /poem/: 8\2
  ├ /a poem/: 2                                  [\E]
  ├ /knows/: (1\3)/2    ├ /a poem/: 2
  ├ /knows a poem/: 1\3                          [/E]
  ├ /every student/: 1    ├ /knows a poem/: 1\3
  ├ /every student knows a poem/: 3              [\E]

In HPSG, each lexical entry contains both syntactic and semantic information in its AVM. If we highlight the semantic information as part of CONTENT, whose value is of type psoa [3], we obtain the two structure trees in Figure 1 [3]. The value of "knows" should be selected accordingly to ensure that the corresponding semantic proof does not crash, and we obtain the following two semantic proof trees for tree (a) and tree (b) in Figure 1, respectively.

For tree (a):

  z: eobj ├ z: eobj    ├ λy.λx.knows(x, y): eobj ⊸ (esubj ⊸ t)
  z: eobj ├ λx.knows(x, z): esubj ⊸ t                            [⊸E]
  ├ λQ.∀y.[student(y) → Q(y)]: (esubj ⊸ t) ⊸ t
  z: eobj ├ ∀y.[student(y) → knows(y, z)]: t                     [⊸E]
  ├ λz.∀y.[student(y) → knows(y, z)]: eobj ⊸ t                   [⊸I]
  ├ λQ.∃x.[poem(x) ∧ Q(x)]: (eobj ⊸ t) ⊸ t
  ├ ∃x.[poem(x) ∧ ∀y.[student(y) → knows(y, x)]]: t              [⊸E]

For tree (b):

  z: esubj ├ z: esubj    ├ λx.λy.knows(x, y): esubj ⊸ (eobj ⊸ t)
  z: esubj ├ λy.knows(z, y): eobj ⊸ t                            [⊸E]
  ├ λQ.∃x.[poem(x) ∧ Q(x)]: (eobj ⊸ t) ⊸ t
  z: esubj ├ ∃x.[poem(x) ∧ knows(z, x)]: t                       [⊸E]
  ├ λz.∃x.[poem(x) ∧ knows(z, x)]: esubj ⊸ t                     [⊸I]
  ├ λQ.∀y.[student(y) → Q(y)]: (esubj ⊸ t) ⊸ t
  ├ ∀y.[student(y) → ∃x.[poem(x) ∧ knows(y, x)]]: t              [⊸E]

The two semantic proof trees above are built with the rules [⊸I] and [⊸E] of HDS, and thus the complicated Semantic Principle [3] is simplified. The two distinct trees arise from the different orders in which the stored quantifiers tagged 4 and 6 are retrieved. This analysis reveals that the sentence "every student knows a poem" has two readings.
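The two resulting λ-terms genuinely differ in truth conditions, which can be checked by direct evaluation. The sketch below is our own illustration: it encodes the quantifier terms as Python functions and evaluates both readings over a toy model (two students, two poems, each student knowing a different poem), where the wide-scope-existential reading is false and the wide-scope-universal reading is true.

```python
# Toy model: two students, two poems; each student knows a different poem.
students = {"s1", "s2"}
poems = {"p1", "p2"}
K = {("s1", "p1"), ("s2", "p2")}            # knows(subject, object)
domain = students | poems

student = lambda x: x in students
poem = lambda x: x in poems
knows = lambda subj, obj: (subj, obj) in K

# Quantifier terms from the lexicon, as Python functions:
every = lambda P: lambda Q: all(Q(x) for x in domain if P(x))   # λP.λQ.∀x.P(x) → Q(x)
a     = lambda P: lambda Q: any(P(x) and Q(x) for x in domain)  # λP.λQ.∃x.P(x) ∧ Q(x)

# Tree (a): ∃x.[poem(x) ∧ ∀y.[student(y) → knows(y, x)]]
wide_exists = a(poem)(lambda x: every(student)(lambda y: knows(y, x)))
# Tree (b): ∀y.[student(y) → ∃x.[poem(x) ∧ knows(y, x)]]
wide_forall = every(student)(lambda y: a(poem)(lambda x: knows(y, x)))

print(wide_exists, wide_forall)  # the two readings come apart on this model
```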
Fig. 1. Two structure trees (a) and (b). In tree (a) the quantifier tagged 6 is RETRIEVED from QSTORE before the quantifier tagged 4; the other way round in tree (b).
5 Conclusion

In this paper, we extend HPSG towards HDS as a fragment of pCLL and successfully obtain certain problematic semantic proof trees that involve more than one quantifier.
References 1. Lecomte, A.: Rebuilding MP on a Logical Ground. Research on Language and Computation. 2 (2004) 27-55 2. Amblard, M.: Synchronisation Syntax-Sémantique, des Grammaires Minimalistes Catégorielles aux Constraint Languages for Lambda Structures. RECITAL 2005, Dourdan, France, 6-10 Juin 2005 3. Pollard, C., Sag, I.A.: Head-Driven Phrase Structure Grammar. The University of Chicago Press, Chicago London (1994) 4. Rétoré, C., Stabler, E.: Generative Grammars in Resource Logics. Research on Language and Computation. 2 (2004) 3-25 5. Asudeh, A., Crouch, R.: Glue Semantics for HPSG. Proceedings of the 8th International Conference on Head-Driven Phrase Structure Grammar. CSLI, Stanford (2001) 1-19
Chinese Multi-document Summarization Using Adaptive Clustering and Global Search Strategy

Dexi Liu1,2, Yanxiang He2, Donghong Ji3, Hua Yang2, and Zhao Wu1,2

1 School of Physics, Xiangfan University, Xiangfan 441053, P.R. China
2 School of Computer, Wuhan University, Wuhan 430079, P.R. China
3 Institute for Infocomm Research, Heng Mui Keng Terrace 119613, Singapore
[email protected]
Abstract. Multi-document summarization has become a key technology in natural language processing. This paper proposes a strategy for Chinese multi-document summarization based on clustering and sentence extraction. For clustering, we propose two heuristics to automatically detect the proper number of clusters: the first makes full use of the summary length fixed by the user; the second is a stability method that has been applied to other unsupervised learning problems. We also discuss a global search method for selecting sentences from the clusters. To evaluate our summarization strategy, an extrinsic evaluation method based on a classification task is adopted. Experimental results on a news document set show that the new strategy significantly enhances the performance of Chinese multi-document summarization.
1 Introduction

The sentence extraction strategy ranks and extracts representative sentences from multiple documents. Radev et al. [1] described an extractive multi-document summarizer that extracts a summary from multiple documents based on the document cluster centroids. Although sentence extraction is not the best method for producing a readable summary, sentences extracted from the documents can describe part of the content to a certain extent, and extraction-based summarization remains a promising solution, especially when speed is a concern. In the sentence extraction strategy, clustering is frequently used to eliminate the redundant information resulting from the multiplicity of the original documents [2]. However, two problems must be solved for Chinese multi-document summarization based on clustering and sentence extraction. The first is how many clusters are appropriate for the sentences in the document collection. We use two strategies to automatically infer the cluster number: the first makes full use of the summary length fixed by the user; the other is a stability-based strategy [3], which has been applied to other unsupervised learning problems. The second problem is how to select representative sentences from the clusters. We formalize local and global strategies and compare their performance.
2 Sentence Clustering

We propose two methods to detect K automatically. The first is simple and inspired by the summary length limit fixed by the user. On the one hand, since the summary length is fixed by the user, the number of extracted sentences is approximately fixed as well. On the other hand, to generate a non-redundant summary, the summarizer usually extracts only one sentence from each cluster. So the number of sentences in a fixed-length summary is an acceptable value for the number of clusters. The most probable number of sentences in a fixed-length summary is the summary length divided by the average sentence length in the document collection. Thus, we determine the approximate number of clusters as:
(1)
Where LSM denotes the summary length fixed by the user, avg(LS) denotes the average length of sentences in the document collection. In contrast to the former method, the second one is suitable for the condition in which the summary length is absent. We adopt a stability method, which has been applied to other unsupervised learning problems [4,5]. Formally, let K be the cluster number, we need to find K which meets (2).
K = arg max_k F(k) .    (2)
where F(k) is the evaluation function based on resampling-based stability. Let P^μ be a subset sampled from the full sentence set P of the document collection, with size δ|P| (δ is set to 0.9 in this paper), and let C (resp. C^μ) be the P × P (resp. P^μ × P^μ) connectivity matrix based on the clustering result on P (resp. P^μ). Each entry c_ij (resp. c^μ_ij) of C (resp. C^μ) is calculated as follows: if the pair p_i, p_j ∈ P (resp. P^μ) belong to the same cluster, then c_ij (resp. c^μ_ij) equals 1, otherwise 0. The stability is then defined as:
M(C^μ, C) = ( Σ_{p_i, p_j ∈ P^μ; p_i′, p_j′ ∈ P} η ) / ( Σ_{p_i, p_j ∈ P^μ} ξ ) .    (3)

  η = 1 if C^μ_{i,j} = C_{i′,j′} = 1, p_i = p_i′, and p_j = p_j′; otherwise η = 0
  ξ = 1 if C^μ_{i,j} = 1; otherwise ξ = 0
Intuitively, M(C^μ, C) denotes the consistency between the clustering results on C^μ and C. The assumption is that if k is actually the "natural" number of clusters, then the clustering result on the sampled subset P^μ should be similar to the clustering result on the full sentence set P. Obviously, the above function satisfies 0 ≤ M ≤ 1. Note that M(C^μ, C) tends to decrease as k increases. Therefore, to avoid the bias towards selecting a small value of k as the cluster number,
we use the cluster validity of a random predictor ρ_k to normalize M(C^μ, C). The random predictor ρ_k achieves its stability value by assigning uniformly drawn labels to objects, in other words, by splitting the data into k clusters randomly. Furthermore, for each k, we run q trials. The normalized objective function is then defined as (4):
F(k) = M_k^norm = (1/q) Σ_{i=1}^{q} M(C_k^{μ_i}, C_k) − (1/q) Σ_{i=1}^{q} M(C_{ρ_k}^{μ_i}, C_{ρ_k}) .    (4)
Normalizing M(C^μ, C) by the stability of the random predictor yields values independent of k [4]. After the optimal number of clusters has been chosen, we adopt the k-means algorithm for the clustering phase. Each output sentence cluster is supposed to denote one topic in the document collection. For the sake of running efficiency, we limit the cluster number k to the range from 8 to 12 in the following experiments.
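Formulas (3) and (4) can be realized in a few lines. The sketch below is our pure-Python rendering, not the authors' code: the toy threshold "clusterer" stands in for k-means, and all function names and data are ours. It counts co-clustered pairs shared between a resampled subset and the full clustering, then normalizes by a random predictor.

```python
import random

def stability(sub_idx, sub_labels, full_labels):
    """M(C^mu, C): fraction of ordered pairs co-clustered in the subset
    clustering that are also co-clustered in the full clustering."""
    num = den = 0
    n = len(sub_idx)
    for a in range(n):
        for b in range(n):
            if a != b and sub_labels[a] == sub_labels[b]:
                den += 1
                if full_labels[sub_idx[a]] == full_labels[sub_idx[b]]:
                    num += 1
    return num / den if den else 0.0

def f_norm(full_labels, cluster, points, k, q=10, delta=0.9, seed=0):
    """F(k) in (4): mean stability over q resamples minus the mean
    stability of a random predictor assigning k labels uniformly."""
    rng = random.Random(seed)
    m = max(1, int(delta * len(points)))
    acc = acc_rand = 0.0
    for _ in range(q):
        sub_idx = rng.sample(range(len(points)), m)
        sub_labels = cluster([points[i] for i in sub_idx], k)
        rand_labels = [rng.randrange(k) for _ in sub_idx]
        acc += stability(sub_idx, sub_labels, full_labels)
        acc_rand += stability(sub_idx, rand_labels, full_labels)
    return (acc - acc_rand) / q

# Toy "clusterer": 1-D points bucketed by integer part (a k-means stand-in).
def cluster(points, k):
    return [min(k - 1, int(p)) for p in points]

points = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3, 2.1, 2.2, 2.3]
full = cluster(points, 3)
print(f_norm(full, cluster, points, k=3))
```

Because the toy clusterer is deterministic, every resample reproduces the full clustering (stability 1.0), so F(k) is roughly one minus the random-predictor baseline; model selection would compare this value across candidate k.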
3 Representative Sentence Selection

For each sentence cluster, we need to select one sentence to represent the topic denoted by the cluster. Since the terms extracted from the texts (a sentence cluster or the whole document collection) are supposed to denote the main concepts in the texts, we weight each sentence based on the terms it contains. Local search strategies select the representative sentences based on the clusters themselves; we try three methods: centroid sentence, TF*IDF [6], and TF. The global search strategy selects a sentence according to its contribution to the performance of the whole summary. To do so, we need a global criterion to measure the summary, defined as follows:
w_summary = Σ_{t ∈ summary} ( log(1 + f_t^D) · log(1 + l_t) ) / log(1 + l_summary) .    (5)
where t is a term in the summary, f_t^D is the term's frequency in the document collection, and l_t is the term's length. Intuitively, the criterion reflects the global term density of a summary: at each selection step we prefer the summary to contain more terms and longer terms while remaining as short as possible.
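Criterion (5) and the greedy global selection can be sketched as follows. This is our illustration under stated assumptions: terms are reduced to whitespace tokens, f_t^D to raw counts in the collection, l to character lengths, and the sum is taken over distinct terms; none of these simplifications are fixed by the paper.

```python
import math
from collections import Counter

def criterion(summary_sentences, doc_term_freq):
    """w_summary from (5): global term density of the whole summary."""
    terms = [t for s in summary_sentences for t in s.split()]
    if not terms:
        return 0.0
    length = sum(len(t) for t in terms)  # l_summary: total summary length
    score = sum(math.log(1 + doc_term_freq[t]) * math.log(1 + len(t))
                for t in set(terms))
    return score / math.log(1 + length)

def global_select(clusters, doc_term_freq):
    """Pick one representative per cluster: at each step choose the
    candidate that maximizes the criterion of the summary so far."""
    summary = []
    for cluster in clusters:
        best = max(cluster,
                   key=lambda s: criterion(summary + [s], doc_term_freq))
        summary.append(best)
    return summary

docs = ["data quality matters", "warehouse data is integrated",
        "quality checks run nightly"]
freq = Counter(t for d in docs for t in d.split())
clusters = [["data quality matters", "quality checks run nightly"],
            ["warehouse data is integrated"]]
print(global_select(clusters, freq))
```

In contrast to the local strategies, each choice here is scored against the whole summary built so far, so a sentence that merely repeats already-selected terms adds little to the criterion.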
4 Experiments and Results

We adopt an extrinsic method [7] to evaluate summarization quality through the results of a classification task. The training and testing data are the document collection and the summaries produced by the summarizer under evaluation. For this classification task, the document collection D and the summary set S are divided into
two equal parts D1, D2 and S1, S2, respectively. The effectiveness of summarization is evaluated by comparing the effectiveness of the following four subtasks:
i) S1D2: classify S1 using the classifier trained on D2;
ii) S1S2: classify S1 using the classifier trained on S2;
iii) D1S2: classify D1 using the classifier trained on S2;
iv) D1D2: classify D1 using the classifier trained on D2.

The data set consists of news pages crawled from the Sohu website (www.sohu.com) in 2004. All 1912 news articles have been classified into 7 categories: economics (312 articles), science & technology (283), law (150), politics (409), military (256), sports (324), and entertainment (178). We extracted the content from these pages using an XML parser and saved it as text files.

Table 1 suggests that the summaries express the main ideas of the original documents accurately, and that classification performance when training on the summaries is better than when training on the corresponding documents. The main reason is that most of the little-informative sentences have been removed while the themes of the original documents are still covered in the summaries.

Table 1. Evaluation of summarization based on the classification task and comparison of different sentence selection strategies (CS – Centroid Sentence ranking, TFIDF – TF*IDF ranking, TF – Term Frequency ranking, GS – Global Search)
          Macro-averaged F1                   Micro-averaged F1
        CS      TFIDF   TF      GS         CS      TFIDF   TF      GS
S1D2    0.8045  0.8103  0.8162  0.8156     0.8127  0.8211  0.8095  0.8126
S1S2    0.9183  0.9647  0.9718  0.9630     0.9198  0.9647  0.9710  0.9605
D1S2    0.8061  0.8412  0.8433  0.8459     0.8002  0.8335  0.8562  0.8531
D1D2    0.7847  0.7847  0.7847  0.7847     0.7818  0.7818  0.7818  0.7818
The results also show that TF*IDF ranking, term frequency ranking, and the global search obtain very similar scores, while centroid sentence ranking produces summaries with lower performance. The reason may be that term frequency, cluster frequency, and term length information help to select better representative sentences. To check whether automatic cluster number detection helps to improve summary quality, we design three experiments using different cluster number determination methods. First, we fix the summary length at 10%, 30%, and 50% of the average document length of the original document collection, and compute the number of clusters using formula (1) in Section 2. Second, we fix the number of clusters at 8, 10, and 12, respectively. In the third experiment, the number of clusters is automatically detected using formula (2) in Section 2. For each K, we use term-frequency-ranking sentence selection after k-means clustering, and then evaluate the generated summaries through the classification task. The macro-averaged F1 scores of the classification results are listed in Table 2.
Table 2. Comparison of different cluster number determination methods
        Fixed summary length         Fixed cluster number         Auto detection
        10%     30%     50%          K=8     K=10    K=12
S1D2    0.7526  0.8169  0.8083       0.8005  0.8106  0.7961       0.8162
S1S2    0.8209  0.9683  0.9467       0.9562  0.9437  0.9546       0.9718
D1S2    0.7314  0.8327  0.8059       0.8255  0.8370  0.8359       0.8433
D1D2    0.7847  0.7847  0.7847       0.7847  0.7847  0.7847       0.7847
We can see that the algorithm with automatic cluster number detection outperforms the fixed cluster number methods. The reason is that the optimal number of clusters differs for each document collection; a fixed cluster number can produce non-optimal cluster structures, which degrades the overall performance.
5 Conclusion

In this paper, we propose a cluster-based method for Chinese multi-document summarization. It consists mainly of two steps: sentence clustering and sentence selection. For sentence clustering, we propose two strategies to determine the number of clusters automatically: one makes full use of the summary length fixed by the user, while the other is stability based and can infer the optimal cluster number automatically. For sentence selection, we present a global search method and compare it with other, local methods. Experimental results show that our summarization strategy is effective and efficient for classification tasks. Our future work is to utilize more Chinese text features in sentence selection and generation.
Genetic Algorithm Based Multi-document Summarization

Dexi Liu 1,2, Yanxiang He 2, Donghong Ji 3, and Hua Yang 2

1 School of Physics, Xiangfan University, Xiangfan 441053, P.R. China
2 School of Computer, Wuhan University, Wuhan 430079, P.R. China
3 Institute for Infocomm Research, Heng Mui Keng Terrace 119613, Singapore

[email protected]
Abstract. The multi-document summarizer using genetic algorithm-based sentence extraction (SBGA) regards summarization as an optimization problem in which the optimal summary is chosen from a set of summaries formed from combinations of sentences in the original articles. To solve this NP-hard optimization problem, SBGA adopts a genetic algorithm, which can choose the optimal summary from a global perspective. To improve the accuracy of term frequency, SBGA employs a novel method, TFS, which takes word sense into account when calculating term frequency. Experiments on DUC04 data show that our strategy is effective: the ROUGE-1 score is only 0.55% lower than that of the best participant in DUC04.
1 Introduction

The sentence extraction strategy ranks and extracts the representative sentences in multiple documents. Radev et al. [1] described an extractive multi-document summarizer that extracts a summary from multiple documents based on document cluster centroids. Knight and Marcu [2] introduced two algorithms for sentence compression, based on a noisy-channel model and a decision-tree approach. Barzilay et al. [3] described an algorithm for information fusion that tries to combine similar sentences across documents to create new sentences using language generation technologies. In the sentence extraction strategy, clustering is introduced to eliminate the information redundancy resulting from the multiplicity of the original documents [4]. However, the redundancy problem cannot be totally solved because the clustering process cannot maintain the disjunction between clusters. On the other hand, a critical problem of clustering methods is how to find the appropriate number of clusters. Some researchers adopt a predefined cluster number or similarity threshold. Even if the cluster number is determined, sentences with high scores in their clusters may not be the best ones if we view the summary as a whole. To solve these problems, we adopt a genetic algorithm (GA) [5] to extract appropriate sentences. In addition, a novel method that takes word sense into account is adopted when calculating TF*IDF.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1140-1144, 2006. © Springer-Verlag Berlin Heidelberg 2006

2 System Design

SBGA includes three modules: a pre-processing module, a statistical module, and a sentence extraction module. In the pre-processing module, documents are split into
paragraphs and sentences, common words (stop words and other non-important words) are deleted, word stems are computed, and word synsets are extracted from WordNet. In the statistical module, word frequency, word weight, sentence position, sentence type, sentence score, and the sentence similarity matrix are calculated. In the sentence extraction module, the best summary is chosen from the summary population generated by a genetic algorithm.

2.1 Pre-processing Module

The pre-processing module contains four operators: i) split documents into paragraphs and sentences; ii) get the stem of each word with the Porter stemmer (SBGA uses stems instead of the original words in the statistical module); iii) get the type of each sentence according to the punctuation at its end (interrogative sentences and dialog should have less chance to appear in the summary); iv) get the synset of each word from WordNet.

2.2 Statistical Module

Three statistics are calculated: word score, sentence score, and the sentence similarity matrix. Let D be a document cluster with a common topic, |D| the number of documents in D, d_k the k-th document in D, s_{i,k} the i-th sentence in document d_k, p_{j,k} the j-th paragraph in document d_k, w a word, and synset(w) the synset of word w.

(1) Word Score. The product of term frequency (TF) and inverted document frequency (IDF) is commonly used as the weight of a word [6]. However, TF*IDF is not accurate enough because authors prefer to use different vocabularies to express the same meaning. In this paper, not only the words themselves but also their lexical meanings (according to WordNet) are employed when calculating word frequency: if a word's synonym occurs in the same document, the word should have a higher frequency. Hence the method TFS adds the weighted synonym frequency to the term (word) frequency to enhance accuracy. The weight of a synonym is calculated according to its position in the synset:

    SCORE_W(w) = TFS(w) · IDF(w) / |D|                                (1)

where TFS(w) = Σ_{y_i ∈ {w} ∪ synset(w)} λ^i · f(y_i, D), y_i is the i-th element of the union of {w} and its synset, f(y_i, D) is the term frequency of y_i in document cluster D, and λ^i is the position weight of the synonym (λ = 0.5 in SBGA). IDF(w) = lg(#documents / #documents-containing-w) is the inverted document frequency, computed over the documents in the whole corpus (we use the BNC corpus in this work).

A word with a higher word score is more informative. We define a keyword as:

    w is a keyword if SCORE_W(w) ≥ 10 × Σ_{w∈D} SCORE_W(w) / nW(D)

where nW(D) is the number of words in document cluster D. We define the keyword set of sentence s_{i,k} as keywords(s_{i,k}) = {w | w ∈ s_{i,k} ∧ w is a keyword}.
(2) Sentence Score. Four features are used to check whether a sentence is suitable for the summary: the word feature F_W(s_{i,k}), the position feature F_P(s_{i,k}), the length feature F_L(s_{i,k}), and the type feature F_T(s_{i,k}). The sentence score is defined as:

    SCORE_S(s_{i,k}) = w_W·F_W(s_{i,k}) + w_P·F_P(s_{i,k}) + w_L·F_L(s_{i,k}) + w_T·F_T(s_{i,k})    (2)

where w_W, w_P, w_L, w_T are the weights of the features, which satisfy w_W, w_P, w_L, w_T > 0 and w_W + w_P + w_L + w_T = 1. We let w_W = 0.5, w_P = 0.3, w_L = 0.1, and w_T = 0.1 in SBGA.

(3) Sentence Similarity Matrix. The similarity between two words is defined as:

    sim_W(w, w') = 0                                                       if |synset(w) ∪ synset(w')| = 0
    sim_W(w, w') = |synset(w) ∩ synset(w')| / |synset(w) ∪ synset(w')|     otherwise    (3)

Let SIM_D denote the sentence similarity matrix of the document cluster. Its element sim_{i,j} is computed as follows:

    sim_{i,j} = sim_S(s_i, s_j) = 0                                                          if |keywords(s_i)| · |keywords(s_j)| = 0
    sim_{i,j} = sim_S(s_i, s_j) = (Σ_{w∈s_i} Σ_{w'∈s_j} sim_W(w, w')) / (|keywords(s_i)| · |keywords(s_j)|)   otherwise    (4)

where s_i and s_j are the i-th and j-th sentences in the sentence collection of the document cluster.

2.3 Sentence Extraction Module
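The two similarity measures, Eqs. (3) and (4), can be sketched directly in Python. This is an illustrative rendering, with synsets represented as Python sets, `keywords` the precomputed keyword set of the cluster, and sentences given as token lists.

```python
def sim_w(syn_a, syn_b):
    """Word similarity (Eq. 3): Jaccard overlap of the two synsets."""
    union = syn_a | syn_b
    if not union:
        return 0.0
    return len(syn_a & syn_b) / len(union)

def sim_s(sent_i, sent_j, keywords, synsets):
    """Sentence similarity (Eq. 4): summed word similarities, normalized
    by the product of the two sentences' keyword counts."""
    n_i = sum(1 for w in sent_i if w in keywords)
    n_j = sum(1 for w in sent_j if w in keywords)
    if n_i * n_j == 0:
        return 0.0
    total = sum(sim_w(synsets.get(w, set()), synsets.get(v, set()))
                for w in sent_i for v in sent_j)
    return total / (n_i * n_j)
```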
The sentence extraction module generates a summary by a GA, which starts from a random solution and then, at each stage, builds a set of solutions and evaluates them by maximizing the evaluation function. Suppose the original document cluster has N sentences; each summary is then represented as a vector of N bits, where "1" in the i-th position means that the i-th sentence will be extracted and "0" means otherwise. Initially, the algorithm starts with a random set of vectors (the initial population). Each vector (genome, or individual) has n 1s and N-n 0s. The evaluation function of the GA is defined as:

    E(S) = w_len·LEN(S) + w_cov·COV(S) + w_info·INFO(S) + w_sim·SIM(S)    (5)
where:
− S is the candidate summary generated from an individual in the population.
− LEN(S) corresponds to the "length" criterion used in [7].
− COV(S) corresponds to the "coverage" criterion: |keywords(S)| / |keywords(D)|.
− INFO(S) corresponds to the "informativeness" criterion:

    INFO(S) = Σ_{s_i∈S} SCORE_S(s_i) / max_{Ŝ} Σ_{s_i∈Ŝ} SCORE_S(s_i)    (6)

  where Ŝ ranges over all of the summaries in the summary population.
− SIM(S) corresponds to the "anti-redundancy" criterion:

    SIM(S) = 1 − (Σ_{s_i∈S} Σ_{s_j∈S, s_j≠s_i} sim_S(s_i, s_j)) / (nS(S) × (nS(S) − 1))    (7)

  where nS(S) is the number of sentences in summary S.
− w_len, w_cov, w_info, w_sim are the weights of the corresponding features, which satisfy w_len, w_cov, w_info, w_sim > 0 and w_len + w_cov + w_info + w_sim = 1 (w_len = 0.2, w_cov = 0.3, w_info = 0.4, w_sim = 0.1 in SBGA).

Applying selection, crossover, and mutation operators, the genetic algorithm generates an optimal summary after hundreds of iterations. In this work, the genetic algorithm implementation is GAlib, released by Matthew [8]. A summary is generated by reordering all extracted sentences according to their position in the original documents and the chronology of the original documents.
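As an illustration of the evaluation function (Eq. 5) on the bit-vector representation, here is a hedged Python sketch. LEN(S) is left as a placeholder constant because its exact definition follows [7]; the other criteria follow Eqs. (6) and (7), and `max_info` (the population maximum of summed sentence scores) is assumed to be supplied by the caller.

```python
W_LEN, W_COV, W_INFO, W_SIM = 0.2, 0.3, 0.4, 0.1  # SBGA weights

def evaluate(genome, scores, sim, sent_keywords, all_keywords, max_info):
    """E(S) (Eq. 5) for a genome: a list of 0/1 flags over the N sentences."""
    chosen = [i for i, bit in enumerate(genome) if bit]
    n = len(chosen)
    # COV(S): fraction of the cluster's keywords covered by the summary
    covered = set().union(*(sent_keywords[i] for i in chosen)) if chosen else set()
    cov = len(covered) / len(all_keywords) if all_keywords else 0.0
    # INFO(S): summed sentence scores, normalized by the population maximum (Eq. 6)
    info = sum(scores[i] for i in chosen) / max_info if max_info else 0.0
    # SIM(S): anti-redundancy penalty on pairwise sentence similarity (Eq. 7)
    pair = sum(sim[i][j] for i in chosen for j in chosen if i != j)
    anti_red = 1.0 - pair / (n * (n - 1)) if n > 1 else 1.0
    length = 1.0  # LEN(S): placeholder; defined as in [7]
    return W_LEN * length + W_COV * cov + W_INFO * info + W_SIM * anti_red
```

A GA library such as GAlib then only needs this function as the fitness of each genome; selection, crossover, and mutation are applied to the bit vectors themselves.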
3 Experimentation and Evaluation

We use the document sets from DUC2003 and DUC2002 as training data and the document set from DUC2004 as testing data. ROUGE-1 has been shown to correlate consistently and highly with human assessments and to have high recall and precision in significance tests against manual evaluation results [9], so we choose ROUGE-1 as the measurement for our experiments. Results show that SBGA is an effective system: its score is only 0.55% lower than that of the best participant. For reference, the average score difference among the 34 participants is 2.95%, and more than two-thirds of these scores differ by only about 0.30%. Compared with a clustering-based system, SBGA scores 1.94% higher, which implies that the genetic algorithm based method performs better than the clustering-based one, and that problems such as determining the number of clusters do not arise. This work also checks whether sentence similarity helps to improve summary quality. ExtraNews [7] uses a GA as well, but it does not consider sentence similarity in its GA evaluation function; SBGA scores 0.62% higher than ExtraNews. To check whether word sense is useful for improving the accuracy of term frequencies, we also evaluate SBGA against SBGA1: SBGA uses TFS as the term frequency calculation method, whereas SBGA1 uses TF. The summarizer clearly performs better when word sense is taken into account.
Table 1. ROUGE-1 scores of different systems (Human average: the average score of the human summaries; Best system: the best participant system at DUC2004; SBGA: the system using GA-based sentence extraction; ExtraNews: the participant system at DUC2004, id=19; SBGA1: the system using TF; Clustering: the clustering-based system, which selects one sentence from each cluster; Baseline system: the baseline, which selects the first 665 bytes of the latest document in each document cluster)

System           ROUGE-1 (F-score)   vs. Human average   vs. SBGA
Human average    0.40441             0                   +6.76%
Best system      0.37917             -6.24%              +0.55%
SBGA             0.37709             -6.76%              0
SBGA1            0.37521             -7.22%              -0.36%
ExtraNews        0.37476             -7.33%              -0.62%
Clustering       0.36978             -8.56%              -1.94%
Baseline system  0.32095             -20.64%             -14.88%
References

1. Radev, D., Jing, HY., Budzikowska, M.: Centroid-Based Summarization of Multiple Documents: Sentence Extraction, Utility-Based Evaluation and User Studies. Information Processing and Management, Vol. 40(6) (2004) 919-938
2. Knight, K., Marcu, D.: Summarization Beyond Sentence Extraction: A Probabilistic Approach to Sentence Compression. Artificial Intelligence, Vol. 139(1) (2002) 91-107
3. Barzilay, R., McKeown, K.R., Michael, E.: Information Fusion in the Context of Multi-Document Summarization. The 37th Annual Meeting of the Association for Computational Linguistics, New Jersey: Association for Computational Linguistics (1999) 550-557
4. Maña-López, M.J.: Multi-document Summarization: An Added Value to Clustering in Interactive Retrieval. ACM Transactions on Information Systems, Vol. 22(2) (2004) 215-241
5. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, New York (1989)
6. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison Wesley, New York (1999) 27-30
7. Jaoua Kallel, F., Jaoua, M.: Summarization at LARIS Laboratory. http://duc.nist.gov/pubs/2004papers/larislab2.jaoua.pdf (2004)
8. Matthew, W.: GAlib: A C++ Library of Genetic Algorithm Components. http://lancet.mit.edu/ga/ (1996)
9. Lin, C.Y., Hovy, E.: Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics. http://www.isi.edu/~cyl/papers/NAACL2003.pdf (2003)
MaxMatcher: Biological Concept Extraction Using Approximate Dictionary Lookup*

Xiaohua Zhou, Xiaodan Zhang, and Xiaohua Hu

College of Information Science & Technology, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104
[email protected], {xzhang, thu}@ischool.drexel.edu
Abstract. Dictionary-based biological concept extraction is still the state-of-the-art approach to large-scale biomedical literature annotation and indexing. Exact dictionary lookup is a very simple approach, but it always achieves low extraction recall, because a biological term often has many variants and it is impossible for a dictionary to collect all of them. We propose a generic extraction approach, referred to as approximate dictionary lookup, to cope with term variations, and implement it as an extraction system called MaxMatcher. The basic idea of this approach is to capture the significant words of a particular concept instead of all of its words. The new approach dramatically improves extraction recall while maintaining precision: in a comparative study on the GENIA corpus, the new approach reaches 57% recall while exact dictionary lookup achieves only 26%.
1 Introduction

A biological concept is a unique meaning in the biological domain; it represents a set of synonymous terms. For example, C0020538 is a concept for the symptom hypertension in the Unified Medical Language System (UMLS) [13]; it represents a set of synonymous terms including high blood pressure, hypertension, and hypertensive disease. In comparison with individual words, a concept is more meaningful; in comparison with multi-word phrases, a concept resolves the polysemy and synonymy problems well [12]. Therefore, using biological concepts can improve the performance of many applications such as large-scale biomedical literature retrieval, clustering, and summarization. There are volumes of work addressing biological concept extraction in the literature. However, most of it utilizes special naming conventions or patterns to identify a few types of biological concepts such as genes, proteins, and cells [1, 3, 4, 7, 8, 9, 10]. In general, those approaches are designed for very specific types of concepts, and work efficiently and effectively when the types of biological concepts have unique naming patterns. Many large-scale biomedical applications such as literature retrieval, clustering, and summarization, however, are interested in many rather than a few types of biological concepts, most of which do not have unique naming patterns. *
* This research work is supported in part by the NSF Career grant (NSF IIS 0448023), NSF CCF 0514679, and a research grant from the PA Dept of Health.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1145 – 1149, 2006. © Springer-Verlag Berlin Heidelberg 2006
For example, UMLS covers 135 semantic types of biological concepts; a typical genomic IR system will index all of them. Dictionary-based biological concept extraction is still the state-of-the-art approach to large-scale biomedical literature annotation and indexing [6, 11, 12]. Its major advantage over pattern-based approaches is that it not only recognizes names, but also identifies unique concept identities. Among dictionary-based approaches, exact dictionary lookup is the simplest, but it always achieves low extraction recall because a biological term often has many variants, such as morphological, syntactic, and semantic variants [2], and it is impossible for a dictionary to collect all of them. In this paper, we propose a new approach to biological concept extraction, referred to as approximate dictionary lookup. The basic idea is to capture the significant words rather than all words of a concept. For example, the word gyrb is significant to the concept "gyrb protein"; we recognize it as a concept name even if the word protein is not present. Using the UMLS Metathesaurus [13] as the dictionary, we implement this approach as an extraction system called MaxMatcher. We test the new approach on the GENIA corpus [14]. As expected, the new approach dramatically raises recall from 26% to 58%.
2 The Concept Extraction Approach

To overcome the limitation of exact dictionary lookup, we introduce an approximate dictionary lookup technique. The basic idea of this technique is to capture significant words rather than all words in a concept name. For example, the word gyrb is obviously very significant to the concept "gyrb protein"; we treat it as a concept name even if the word protein is not present. The problem is thus reduced to measuring the significance of any word to given concept names. In particular, we propose a relative significance score measure. Suppose a concept c has n concept names, denoted s_1, ..., s_n. Let N(w) denote the number of concepts whose variant names contain word w, and let w_ji denote the i-th word in the j-th variant name of the concept. The significance of w to the concept is defined as follows:

    I(w, c) = max{ I(w, s_j) | j ≤ n }                            (2.1)

where:

    I(w, s_j) = 0                                     if w ∉ s_j
    I(w, s_j) = (1/N(w)) / Σ_i (1/N(w_ji))            if w ∈ s_j
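In Python, the relative significance score can be sketched as follows; `concept_count` plays the role of N(w), and `names` is the list of a concept's variant names, each a list of words (an illustrative sketch, not the MaxMatcher code).

```python
def significance(word, name_words, concept_count):
    """I(w, s_j) (Eq. 2.1): inverse concept count of w, normalized by the
    summed inverse concept counts of all words in the variant name."""
    if word not in name_words:
        return 0.0
    denom = sum(1.0 / concept_count[v] for v in name_words)
    return (1.0 / concept_count[word]) / denom

def significance_to_concept(word, names, concept_count):
    """I(w, c): maximum significance over the concept's variant names."""
    return max(significance(word, n, concept_count) for n in names)
```

For the paper's example, a rare word like gyrb (contained in very few concept names) dominates the score of "gyrb protein", while the ubiquitous word protein contributes almost nothing.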
We use the UMLS Metathesaurus 2005AA version [13] as the dictionary to train the significance score of each word to the biological concepts containing that word. The UMLS Metathesaurus has a table called the normalized string index, which records all normalized names of each concept. We remove normalized strings containing more than ten words and then use the remaining 2,573,244 strings to build the significance score matrix. A huge matrix of 509,170 rows (words) by 998,774 columns (concepts) is obtained. Because only a few concepts contain any given word, we use a sparse matrix to make storage and search more efficient.

During the extraction stage, we use a set of simple rules to identify the boundary of a concept candidate. A biological concept name should begin with a noun, a number, or an adjective and end with a noun or a number; it cannot contain any boundary words, including punctuation (except hyphen, period, and single quote), verbs, conjunctions, and prepositions (except "of"). In other words, whenever a boundary word is encountered, a candidate concept name reaches its end. The detailed search algorithm is shown in Figure 1.

    Find next starting word t_s
    k = 0
    C = {c | t_s ∈ T(c)}         /* T(c) is the set of words appearing in names of concept c */
    For each c ∈ C
        S_c = I(t_s, c)          /* I(t_s, c) is the score of word t_s to concept c */
    While next word t is not a boundary word AND k < skip
        N = {c | t ∈ T(c) ∧ c ∈ C}
        If N = ∅ Then
            k = k + 1
        Else
            C = N
            For each c ∈ C
                S_c = S_c + I(t, c)
        End If
    Wend
    C = {c | S_c > threshold ∧ c ∈ C}
    If |C| > 0 Then
        return concept name and candidate concepts c ∈ C
    End If

Fig. 1. The algorithm for extracting one concept name and its candidate concept IDs. The threshold is set to 0.95; the maximum number (skip) of skipped words is set to 1.

The major advantage of approximate dictionary lookup is that even if a concept name changes word ordering a little, or inserts or deletes a couple of insignificant words, it can still be recognized. According to its definition, the significance score of a concept name should be equal to or greater than 1.0 if no word is missing; thus the threshold on the significance score should be close to 1.0. If the threshold is too small, our approach may falsely recognize "high pressure" as the concept name "high blood pressure"; if it is too high, our approach may fail to recognize "gyrb" as "gyrb protein". We found that a threshold of 0.95 gives good results for UMLS-based biological concept extraction. Our approach is able to recognize concept names with a couple of insertions such as articles, pronouns, and even nouns; the parameter skip controls the maximum number of insertions, and we found that skip = 1 gives good results. The search results are concept names and corresponding concept IDs. If two or more concept IDs are returned, we need to further determine which meaning the extracted concept name refers to. The words surrounding the extracted concept name are often indicative of the meaning [5].
Thus, we take surrounding words (4 to the left and 4 to the right) as the context and use the same algorithm as shown in Figure 1 to disambiguate the meaning of the extracted concept name if necessary.
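A minimal Python rendering of the Figure 1 scan might look as follows. Here `index[w]` is the set of concept IDs whose names contain w, `sig[(w, c)]` is the trained score I(w, c), and `is_boundary` encodes the boundary-word rules; all of these names are ours, not MaxMatcher's.

```python
def extract_at(tokens, start, index, sig, is_boundary, threshold=0.95, skip=1):
    """Scan forward from tokens[start], accumulating significance scores
    for candidate concepts, tolerating up to `skip` non-matching words."""
    t = tokens[start]
    cands = set(index.get(t, ()))
    scores = {c: sig.get((t, c), 0.0) for c in cands}
    k, end = 0, start
    for pos in range(start + 1, len(tokens)):
        t = tokens[pos]
        if is_boundary(t) or k >= skip:
            break
        narrowed = {c for c in cands if c in index.get(t, ())}
        if not narrowed:
            k += 1                       # skip an inserted word
        else:
            cands = narrowed
            for c in cands:
                scores[c] += sig.get((t, c), 0.0)
            end = pos
    hits = {c for c in cands if scores[c] > threshold}
    return (tokens[start:end + 1], hits) if hits else None
```

Note how the threshold realizes the behavior described above: "gyrb" alone already exceeds 0.95 and is recognized, while a common word like "protein" on its own never does.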
3 Experimental Results

We evaluate both the efficiency and the effectiveness of MaxMatcher. Effectiveness is evaluated on the GENIA 3.02 corpus [14], which consists of 2,000 human-annotated PubMed abstracts. We compare the results of MaxMatcher with those of two exact dictionary lookup systems, BioAnnotator [8] and ExactMatcher (implemented by us). The machine-extracted terms are compared with the human annotations. Because human annotation is somewhat subjective, we provide both exact-match and approximate-match evaluation, following the evaluation method in [8]. For an approximate match, the human annotation should be a substring of the machine annotation, or vice versa. The comparison among the three systems is presented in Table 1. For exact match, MaxMatcher performs significantly better than the other two systems in terms of both precision and recall. For approximate match, the precision of MaxMatcher is comparable to that of the other two systems, while its recall is significantly better.

Table 1. The effectiveness comparison. BioAnnotator [8] actually tested several configurations, but only the configuration with dictionaries alone (i.e., exact dictionary lookup) is compared here. BioAnnotator was evaluated on GENIA 1.1 (containing 670 human-annotated abstracts of research papers). The dictionary used for BioAnnotator also includes LocusLink and GeneAlias in addition to UMLS.

                 Exact Match Eval.               Approximate Match Eval.
IE System        Recall  Precision  F-score     Recall  Precision  F-score
MaxMatcher       57.73   54.97      56.32       75.18   71.60      73.35
ExactMatcher     26.63   31.45      28.84       61.56   72.69      66.66
BioAnnotator     20.27   44.58      27.87       39.75   87.67      54.70
For the efficiency comparison, we download the first 10,000 PubMed abstracts published in 2005 and measure the time MaxMatcher and ExactMatcher each take to annotate them. MaxMatcher takes 510 seconds to annotate all 10,000 PubMed abstracts, an average annotation speed of 19.6 abstracts per second. ExactMatcher is faster: it takes only 320 seconds, an average of 31.3 abstracts per second. However, ExactMatcher consumes much more memory (765 megabytes) than MaxMatcher (362 megabytes).
4 Conclusions

Dictionary-based biological concept extraction is still the state-of-the-art approach to large-scale biomedical literature annotation and indexing. Exact dictionary lookup is very simple but always achieves low extraction recall, because biological terms often have many variants and it is impossible for a dictionary to collect all of them. In this paper, we propose a generic approach, referred to as approximate dictionary lookup, to cope with biological concept variation. The basic idea of the new approach is to capture the significant words of a biological concept rather than all of
them. A comparative study on the GENIA corpus shows that the new approach dramatically improves extraction recall while maintaining precision. However, the extraction efficiency of the new approach drops slightly in comparison with exact dictionary lookup.
References

1. Chang, J.T., Schütze, H., and Altman, R.B., "GAPSCORE: finding gene and protein names one word at a time", Bioinformatics, Vol. 20, No. 2, pp. 216-225, 2004.
2. Chiang, J.-H. and Yu, H.-C., "Literature extraction of protein functions using sentence pattern mining", IEEE Transactions on Knowledge and Data Engineering, 17(8), Aug. 2005, pp. 1088-1098.
3. Collier, N., Nobata, C., and Tsujii, J., "Extracting the names of genes and gene products with a Hidden Markov Model", Proc. COLING 2000, 201-207, 2000.
4. Fukuda, K., Tamura, A., Tsunoda, T., and Takagi, T., "Toward information extraction: Identifying protein names from biological papers", Proceedings of the Pacific Symposium on Biocomputing, pp. 707-718, Maui, Hawaii, January 1998.
5. Lesk, M., "Automatic Sense Disambiguation: How to Tell a Pine Cone from an Ice Cream Cone", Proceedings of the SIGDOC '86 Conference, ACM, 1986.
6. Rindfleisch, T.C., Tanabe, L., and Weinstein, J.N., "EDGAR: Extraction of Drugs, Genes and Relations from the Biomedical Literature", Proceedings of the Pacific Symposium on Biocomputing, Hawaii, USA, pp. 514-525, 2000.
7. Song, Y.-I., Kim, S.-B., and Rim, H.-C., "Terminology Indexing and Reweighting Methods for Biomedical Text Retrieval", Proceedings of the SIGIR '04 Workshop on Search and Discovery in Bioinformatics, Sheffield, UK, ACM, July 2004.
8. Subramaniam, L., Mukherjea, S., Kankar, P., Srivastava, B., Batra, V., Kamesam, P., and Kothari, R., "Information Extraction from Biomedical Literature: Methodology, Evaluation and an Application", Proceedings of the ACM Conference on Information and Knowledge Management, New Orleans, Louisiana, 2003.
9. Tanabe, L. and Wilbur, W., "Tagging gene and protein names in biomedical text", Bioinformatics, Vol. 18, No. 8, pp. 1124-1132, 2002.
10. Zhou, G.-D., Zhang, J., Su, J., Shen, D., and Tan, C.-L., "Recognizing Names in Biomedical Texts: A Machine Learning Approach", Bioinformatics, 20(7), 1178-1190, 2004.
11. Zhou, X., Han, H., Chankai, I., Prestrud, A., and Brooks, A., "Converting Semi-structured Clinical Medical Records into Information and Knowledge", Proceedings of the International Workshop on Biomedical Data Engineering (BMDE), in conjunction with the 21st International Conference on Data Engineering (ICDE), Tokyo, Japan, April 5-8, 2005.
12. Zhou, X., Hu, X., and Zhang, X., "Using Concept-based Indexing to Improve Language Modeling Approach to Genomic IR", The 28th European Conference on Information Retrieval (ECIR 2006), 10-12 April 2006, London, UK.
13. UMLS, http://www.nlm.nih.gov/research/umls/
14. GENIA Corpus, http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/
Bootstrapping Word Sense Disambiguation Using Dynamic Web Knowledge

Yuanyong Wang and Achim Hoffmann

School of Computer Science & Engineering, The University of New South Wales, Sydney, Australia

[email protected], [email protected]

Abstract. Word sense disambiguation (WSD) is one of the traditionally most difficult problems in natural language processing and has broad theoretical and practical implications. One of the main difficulties for WSD systems is the lack of relevant knowledge, commonly known as the knowledge acquisition bottleneck. We present in this paper a novel method that utilizes dynamic Web data obtained through Web search engines to effectively enrich the semantic knowledge available to WSD systems. We demonstrate, through a word sense disambiguation system, the large quantity and good quality of the extracted knowledge.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1150-1154, 2006. © Springer-Verlag Berlin Heidelberg 2006

1 Introduction

Word sense disambiguation is the problem of determining the proper sense of a target word in a given context. "All disambiguation work involves matching the context of the instance of the word to be disambiguated with either information from an external knowledge source (knowledge-driven WSD), or information about the contexts of previously disambiguated instances of the word derived from corpora (data-driven or corpus-based WSD)" (Ide and Veronis, 1998) [4]. Despite this seemingly simple principle, the field of word sense disambiguation has always had difficulty obtaining enough prior knowledge, whether in the form of an "external knowledge source" or of annotated corpora. Recently, owing to the enormous amount of data available on the Web, many researchers have turned to the Web for a way to solve the knowledge acquisition bottleneck problem. Many statistical word sense disambiguation models are built on huge corpora collected from the Web (Turney, 2004) [5]. Web search engines like AltaVista usually have several billion web pages indexed. The collection of all Web pages, treated as a single corpus with the search engine as a tool for drawing statistical information from it, offers a very encouraging solution to the problems caused by hand tagging and data sparseness in corpus-based approaches. Our work explores the enormous potential of Web data, with search engines as a gateway to this rich resource, for WSD. This paper is organized as follows. In Section 2 we describe the proposed WSD model, which bootstraps from minimal initial human input by extracting Web data through search engines. In Section 3 the model is evaluated on 35 selected nouns
and the results presented. A comparison with related work is conducted in Section 4. Finally, in Section 5, conclusions and a discussion of future development are given.
2 Our WSD Model

– Step 1: Seeding. The first step is called seeding, during which a human is asked to nominate three typical phrases, or "seed phrases", for each of the senses to be disambiguated for a target word. For example, for the target word "age" two senses are considered. The first sense refers to "a period in history"; the second refers to "duration of life". Three seed phrases for the first sense are "stone age", "digital age", and "new age"; for the second sense they are "old age", "age discrimination", and "at age". As shown above, the seed phrases can be of any type (different kinds of noun phrases, prepositional phrases) as long as they are highly specific to only one of the senses being disambiguated and contain the target word.

– Step 2: Bootstrapping. Each of the seed phrases is issued to the Web search engine AltaVista as a quoted query. The snippets containing target words are extracted from the retrieved pages; each snippet usually contains 30 to 40 words. For each seed phrase we automatically download approximately 1,000 such snippets. The snippets are pooled together for each sense, so that each sense has a snippet pool of 3,000 snippets. This way, the information contained in the three seed phrases specific to one sense is dramatically expanded into a sense-specific snippet pool with texts of more than 10,000 words.

– Step 3: Statistical knowledge extraction. In the third step, statistical information on single-word occurrences and 2-gram occurrences is collected from the snippet pools. For each sense, two lists are compiled: one contains all the words that occur in its snippet pool, ranked by frequency, and the other contains all the 2-gram phrases, ranked likewise. All functional words like "the", "a" ... are removed from the word list.
Then all phrases that do not contain any of the remaining words in the word list are also removed from the corresponding phrase list. The word list is thus used to filter semantically irrelevant phrases out of the phrase list. The different phrase lists are also used for mutual pruning: a phrase that is present in different phrase lists with significantly different frequencies is removed from the list where its frequency is low. Eventually only the pruned phrase lists are used.
– Step 4: Disambiguation. In the fourth step, called disambiguation, the actual sense disambiguation is performed on the test data. Each test case is a whole news article from Google News (news.google.com) that contains the target word in its title (and most of the time in its body as well). The overlap between the sense-specific phrase lists and the news article is counted; one overlap is a 2-gram phrase that occurs both in a phrase list and in the news article. The sense whose phrase list has the larger overlap with the news article is taken as the correct sense of the target word.
1152
Y. Wang and A. Hoffmann
– Step 5: Second bootstrapping. The fifth step is called second bootstrapping because the model bootstraps again from the knowledge extracted in the third step, in the hope of further expanding the sense-specific information. As in the first bootstrapping, new seed phrases must first be determined. The top phrases in the expanded phrase lists undergo the disambiguation process (Step 4) just like a test case. As a result, some phrases are ruled out as noisy; from the remaining phrases the first three are selected. These "second seed phrases" are then used as seed phrases for the next round of expansion. For the "age" example, the sense-specific phrase list for sense one grew from 24,608 phrases to 41,472 and then 57,777 phrases over two iterations; for sense two it grew from 13,267 phrases to 27,154 and then 40,112 phrases.
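The disambiguation step (Step 4) reduces to counting 2-gram overlaps between each sense's pruned phrase list and the test article. The sketch below illustrates this with tiny hand-made phrase lists; in the actual system the lists are mined from Web snippets and the sense labels are ours:

```python
def extract_2grams(text):
    """All adjacent word pairs (2-grams) in a lowercased text."""
    words = text.lower().split()
    return {" ".join(words[i:i + 2]) for i in range(len(words) - 1)}

def disambiguate(article, sense_phrase_lists):
    """Pick the sense whose phrase list overlaps the article the most.

    Returns None (no judgement) when no sense has any overlap,
    mirroring the model's non-judgement cases.
    """
    grams = extract_2grams(article)
    scores = {sense: len(grams & set(phrases))
              for sense, phrases in sense_phrase_lists.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# Hypothetical pruned phrase lists for the two senses of "age"
lists = {
    "period_in_history": {"stone age", "digital age", "bronze age"},
    "duration_of_life": {"old age", "age discrimination", "retirement age"},
}
print(disambiguate("The digital age began with the stone age long gone", lists))
```

With the example article, two 2-grams ("digital age", "stone age") hit the first sense's list and none hit the second's, so the first sense is returned.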
3 Evaluation

The WSD model is evaluated on a data set of 35 nouns: 29 have a two-way sense distinction, 5 a three-way distinction and 1 a four-way distinction. All the disambiguated senses are subsets of the WordNet senses for these target words. The nouns are chosen so that their senses have substantial presence in Google News articles. They are summarized below.
– 2-way distinction nouns: party, tissue, atmosphere, image, mouse, drug, deposit, age, degree, nature, player, rally, toll, stage, plant, yard, mine, bass, organ, treatment, offense, head, trial, memory, body, bug, power, room, trunk
– 3-way distinction nouns: paper, channel, charge, course, heart
– 4-way distinction noun: interest

The experimental results are summarized in Table 1. Precision and applicability are chosen to present the results because they are more informative than the commonly used precision/recall measures. At the second iteration, the results of 93.3% precision and 74.4% applicability are comparable to 70% precision/recall. When a WSD system gives a judgement for every case, its precision equals its recall. Our system, if assumed to give a judgement for every case (non-judgement cases counted as incorrect), has 69.4% (93.3% multiplied by 74.4%) precision/recall. The results in the last column are obtained by simply adding the default-sense heuristic to the second-iteration disambiguation: default senses, predetermined by prior sense distribution statistics, are assigned to the non-judgement cases.

Table 1. Summary of the results over two iterations. "#S" is the number of senses disambiguated; "#N" is the number of nouns in the category; "ave-prec" is average precision (precision is the proportion of correct judgements among all judgements made); "ave-app" is average applicability (applicability is the proportion of test cases for which a judgement is made); "iter-n" is the nth iteration of the algorithm. Each iteration consists of all 5 steps of the algorithm.

#S     #N   measure    iter-1   iter-2   default
2      29   ave-prec   92.8%    94.3%    87.2%
            ave-app    70.8%    76.1%    100%
3      5    ave-prec   81.4%    90%      77%
            ave-app    61.6%    64.3%    100%
4      1    ave-prec   74%      79.6%    57.2%
            ave-app    72.5%    67.5%    100%
total  35   ave-prec   90.8%    93.4%    84.9%
            ave-app    69.5%    74.4%    100%

Bootstrapping Word Sense Disambiguation Using Dynamic Web Knowledge
1153
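Precision, applicability, and the single precision/recall figure obtained by treating non-judgement cases as incorrect can all be computed from judgement counts. A small sketch; the counts below are illustrative, not the paper's raw data:

```python
def wsd_metrics(correct, judged, total_cases):
    """Precision = correct / judged; applicability = judged / total.

    Treating non-judgement cases as incorrect collapses the two
    measures into a single precision/recall figure:
    precision * applicability.
    """
    precision = correct / judged
    applicability = judged / total_cases
    return precision, applicability, precision * applicability

# Illustrative counts: 744 of 1000 cases judged, 694 of them correctly
p, a, pr = wsd_metrics(694, 744, 1000)
print(round(p, 3), round(a, 3), round(pr, 3))
```

With these counts the sketch reproduces the kind of trade-off reported above: 93.3% precision at 74.4% applicability corresponds to 69.4% precision/recall.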
4 Comparison with Related Work

Yarowsky (1995) [1], in one of the pioneering unsupervised WSD systems, designed a model featuring one sense per discourse and a bootstrapping algorithm to expand the sense-specific collocations. Ten words were tested in his experiment, all with 2-way distinction. Precision as high as 96% was achieved, but no information about applicability is given. In that work a collocation is treated as a bag of words; in comparison, our model uses phrases instead of words to avoid ambiguity among the collocational words themselves. Yarowsky also noted that the fundamental limitation of the model is coverage: in half of the examples no overlap is found between the sense-specific information and the collocational contexts. In our work, because a Web search engine is used to collect highly concentrated sense-specific information, the data sparseness problem is greatly reduced.
Carroll and McCarthy [3], in their unsupervised system for the Senseval-2 WSD competition, automatically extract subject-verb and verb-direct object dependencies from a 90 million word corpus as selectional preference evidence for WSD. They achieved 69.1% precision and 20.5% recall on the all-words task. The low recall is an apparent indication of the low coverage of the automatically extracted selectional preference information: the 90 million word corpus, despite being large overall, is not necessarily large at all for a randomly chosen word. This is the notorious data sparseness problem that troubles most corpus-based approaches.
Agirre and Martinez (2004) [2] also used Google snippets to compile a Web corpus for their minimally supervised WSD model. Tested on 29 nouns from the Senseval-2 competition, their model achieved 49.8% precision/recall. The seeds they use are monosemous words from WordNet.
This, while eliminating possible ambiguity from the seeds, again seriously limited the coverage of the sense-specific information automatically extracted from the Web.
Mihalcea and Moldovan (1999) [7] designed a simple and elegant WSD algorithm that uses WordNet glosses and word pairs from the disambiguation contexts to produce sense-specific queries. The search engine hit counts of these queries are used as evidence to determine the correct sense. This model achieved 80.1% precision over 384 word pairs manually extracted from the Brown Corpus. No information about applicability is given.
Many more WSD systems use the Web in a different way and compile huge text corpora out of Web texts. Turney (2004) [5] used a system that relies on syntactic and semantic feature vectors trained on the training data for disambiguation. Co-occurrence statistics extracted from a huge Web corpus are used to assist similarity score computation, which in turn helps assign values to the semantic feature vectors. His fully supervised system, when applied to the Senseval-3 lexical sample task, achieved 75.9% precision/recall. Mihalcea and Moldovan (1999) [6] proposed a model that uses WordNet synsets and definitions as seeds and bootstraps by retrieving, through search engines, Web documents with these seeds as queries. The retrieved documents are manually checked for relevance. The model is limited in that it is still largely word oriented (its limitation in terms of possible high ambiguity in the seeds is also mentioned by the authors).
5 Conclusion and Future Work

Comparing the experimental results of our proposed WSD model with related work, our approach appears very encouraging. The proposed model, being minimally supervised, produced results comparable or favorable to the best contemporary unsupervised and minimally supervised systems attacking the Senseval-3 lexical sample task and similar tasks reported in other works. Using only phrases as seeds for bootstrapping, and sense-specific phrases for disambiguation, contributes most to the high precision of our results. Using a Web search engine to download large amounts of sense-specific snippets, as well as the snowballing effect of the iterative bootstrapping (with the assistance of the first-sense heuristic), are the determining factors behind the high applicability. One aspect of the model that clearly needs improvement is the seed phrase selection process. We will explore further techniques to this end, and improve other aspects of our model, in future work.
References
1. Yarowsky, D.: Unsupervised word sense disambiguation rivaling supervised methods. In: Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (1995)
2. Agirre, E., Martinez, D.: Unsupervised WSD based on automatically retrieved examples: The importance of bias. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain (2004)
3. Carroll, J., McCarthy, D.: Word sense disambiguation using automatically acquired verbal preferences. Computers and the Humanities 34(1–2), Netherlands (1999)
4. Ide, N., Véronis, J.: Word sense disambiguation: The state of the art. Computational Linguistics 24(1), 1–40 (1998)
5. Turney, P.D.: Word sense disambiguation by Web mining for word co-occurrence probabilities. In: Proceedings of the 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, Barcelona, Spain (2004) 239–242
6. Mihalcea, R., Moldovan, D.: An automatic method for generating sense tagged corpora. In: Proceedings of the American Association for Artificial Intelligence, Orlando, FL (1999)
7. Mihalcea, R., Moldovan, D.: A method for word sense disambiguation of unrestricted text. In: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, College Park, MD (1999)
Automatic Construction of Object Oriented Design Models [UML Diagrams] from Natural Language Requirements Specification G.S. Anandha Mala1 and G.V. Uma2 Department of Computer Science and Engineering, College of Engineering, Anna University, Guindy, Chennai, Tamil Nadu, India-600025
[email protected],
[email protected]

Abstract. The application of natural language understanding to requirements gathering remains a field with only limited exploration so far. This paper presents an approach to extract the object-oriented elements of the required system. The approach starts by assigning part-of-speech tags to each word in the given input document. To resolve the ambiguity posed by pronouns, pronoun resolution is performed before normalizing the text. Finally, the elements of the object-oriented system, namely the classes, attributes, methods, relationships between the classes, sequences of actions, use cases and actors, are identified by mapping the part-of-speech-tagged words onto Object Oriented Modeling Language elements using mapping rules, which are the key to a successful implementation of user requirements.
1 Introduction

Several attempts have already been made to semi-automate the process of requirements capture; this paper presents another approach, the automatic construction of object oriented design models [UML diagrams] from a natural language requirements specification. The paper begins with a review of advances in the field of requirements engineering in Section 2. The proposed methodology is explained in Section 3. Our implementation and results are explained in Section 4. The conclusion and future work are contained in Section 5.
2 Related Work

The first relevant published technique attempting a systematic procedure to produce design models from NL requirements was that of Abbot [1]. Abbot suggested a non-automatic methodology that produces only static analysis and design products, obtained by an informal technique requiring substantial participation from users for decisions. Methods to establish a justified relationship between natural-language structures and OO concepts are proposed by Sylvain et al. [9], who show that computational linguistic tools are appropriate for preliminary computer-assisted OO analysis. Sawyer et al., in their REVERE system [5], make use of a lexicon to disambiguate word senses, thus obtaining a summary of requirements from a natural language text, but do not attempt to model the system. Liwu Li [6] presents a semi-automatic approach to translate a use case to a sequence diagram; it requires a use case to be normalized manually. Overmyer et al. [8] present an interactive methodology and prototype, but the text analysis remains in good part a manual process. Liu et al. [3] present an approach which uses formalized use cases to capture and record requirements. Ke Li [4] also semi-automates the process of requirements elicitation: the text is matched against predefined statements, and if there is no match, the user is asked to clarify the incomplete or ambiguous data. In contrast to our fully automatic methodology, these approaches need the participation of domain experts and customers in the class identification process.

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1155 – 1159, 2006. © Springer-Verlag Berlin Heidelberg 2006
3 The Proposed System

In all the earlier works mentioned, requirements elicitation is not fully automatic. The proposed methodology includes automatic reference resolution, which eliminates the user intervention required in previous works. The system architecture is shown in Fig. 1. The system is named 'Requirements Elicitor'. The given input problem statement is split into sentences by the sentence splitter. Each sentence is then tagged to obtain a part-of-speech marker for every word. The noun and verb phrases in the tagged text are identified by a chunker based on simple phrasal grammars. To remove the ambiguity posed by pronouns, they are resolved to their respective noun phrases by the reference resolver. The text is then simplified by the normalizer into the following constructs, to ease the task of mapping the words onto the object-oriented system constituents:

• Conditional: the conditional syntax is If aCondition transactions [else other transactions] End if.
• Iteration: the iteration syntax is While condition transactions endwhile.
• Concurrency: the concurrency syntax is Start concurrency transaction 1 … concurrent transaction k end concurrency, which executes transaction 1 to transaction k concurrently.
• Synchronization: the synchronization syntax is Start synchronization transaction 1 … synchronized transaction k end synchronization, which synchronizes transaction 1 to transaction k.

All the transaction statements are simple. A number of patterns using conjunctions, together with their corresponding splits of the sentences, are stored in the catalog. Each sentence is checked against the stored patterns and the corresponding split is made. For example, "If the source and the destination of the request fall on the same route, the receptionist checks the seats that are available and issues the ticket to the passenger and blocks the seat" is normalized to:

If the source and the destination of the request fall on the same route
The receptionist checks the seat.
The receptionist issues the ticket to the passenger
The receptionist blocks the seat
End if
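The catalog-driven normalization can be sketched as pattern-based splitting. The sketch below handles only one conditional pattern and re-attaches a caller-supplied subject to each clause (in the full system the subject comes from the chunker, and the catalog holds many patterns); splitting the body on "and" is a simplification, since a real catalog must distinguish clause-level "and" from noun-phrase "and" (as in "the source and the destination"):

```python
import re

def normalize_conditional(sentence, subject):
    """Split 'If <cond>, <clause> and <clause> ...' into one simple
    transaction per line, bracketed by If / End if."""
    m = re.match(r"If (.+?), (.+)", sentence)
    if not m:
        return [sentence]
    condition, body = m.groups()
    clauses = [c.strip() for c in body.split(" and ")]
    lines = [f"If {condition}"]
    for c in clauses:
        # Re-attach the subject to clauses that lost it in the split
        lines.append(c if c.startswith(subject) else f"{subject} {c}")
    lines.append("End if")
    return lines

s = ("If the source and the destination of the request fall on the same route, "
     "the receptionist checks the seat and issues the ticket to the passenger "
     "and blocks the seat")
for line in normalize_conditional(s, "the receptionist"):
    print(line)
```

Note that the lazy match up to the first comma keeps the "and" inside the condition intact, so only the body clauses are split.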
Fig. 1. System Architecture. The Requirements Elicitor takes an input problem statement through a preprocessor (sentence splitter, tagger, chunker, reference resolver and normalizer, backed by the catalog) and an NL-OOML mapper driven by syntactic structures, producing message records and object oriented design models.
The NL-OOML mapper accepts a normalized problem description as input. Using the syntactic structures in Table 1, it translates each normalized sentence into a message record. A simple rule-based approach is followed for identifying OO elements. The rules are:
1: Translating nouns to classes. A noun which does not have any attributes need not be proposed as a class.
2: Translating Noun-Noun to Class-Property according to position. When two nouns appear in sequence in the text, the first noun is translated to a class and the following noun to a property of this class.
3: A simple heuristic is used to decide which nouns are classes and which form attributes. In a Noun-Noun sequence, if the first noun has already been chosen as a class, the second noun is taken as the attribute. The attributes are decided based on the verb phrase.
4: Translating the lexical verb of a non-personal noun to a method of this noun. The sender and receiver classes and the arguments of this method are decided based on Table 1.
5: Translating the lexical verb of a personal noun to a use case (or part of a use case) linked with an actor defined by this noun.
6: Matching a personal pronoun to a noun of the previous sentence.
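Rules 1 and 2 (nouns to classes, Noun-Noun to Class-Property) can be sketched over a POS-tagged sentence. The tag set and token layout here are illustrative, not the system's actual representation:

```python
def map_oo_elements(tagged_sentence):
    """Apply Rules 1-2: a noun followed by a noun yields a
    class -> property pair; a noun that gains no attributes is
    dropped as a class candidate (Rule 1)."""
    classes = {}  # class name -> list of properties
    tokens = tagged_sentence
    i = 0
    while i < len(tokens):
        word, tag = tokens[i]
        if tag == "NN":
            if i + 1 < len(tokens) and tokens[i + 1][1] == "NN":
                # Rule 2: Noun-Noun => Class-Property
                classes.setdefault(word, []).append(tokens[i + 1][0])
                i += 2
                continue
            classes.setdefault(word, [])  # candidate class, no attributes yet
        i += 1
    # Rule 1: keep only candidates that have at least one attribute
    return {c: props for c, props in classes.items() if props}

tagged = [("the", "DT"), ("passenger", "NN"), ("name", "NN"),
          ("is", "VBZ"), ("recorded", "VBN")]
print(map_oo_elements(tagged))
```

For the example sentence, "passenger" becomes a class with the attribute "name", while no attribute-less candidates survive.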
Table 1. Syntactic Structures of Simple Sentences
4 Implementation and Results

The Requirements Elicitor was implemented in Java and validated on 100 problem samples of around 500 lines each. The results produced by the system were compared with human output, obtained by conducting noun-verb analysis on the text; this was taken as the baseline expert judgement. The system misses none of the classes and methods. However, approximately 12.4% additional classes and 7.4% additional methods are identified over the entire sample: these are candidates that a human removes by intuition because they may not be classes, and since the system lacks that knowledge, they are listed as classes. Missed methods occur only when the tagger assigns a wrong tag to a word. The system also identifies all the attributes, use cases and actors perfectly, without any additional, missed or mis-assigned items.
5 Conclusion and Future Work

This work presents an approach to restructure natural language text into a modelling language in order to elicit the stated requirements of a system. The work can be extended to identify the different modules present in the requirements specification by properly segmenting the input text, which will help identify packages. The deficiencies in the tagger and the reference resolver can be overcome by building a knowledge base, which can also improve the effectiveness of the generation of system elements.
References
1. Abbot, R.J.: Program design by informal English descriptions. Communications of the ACM 26 (1983) 882–894
2. Brill, E.: A simple rule-based part-of-speech tagger. In: Proceedings of the Third ACL Conference on Applied Natural Language Processing, Trento, Italy (1992) 152–155
3. Liu, D., Subramaniam, K., Far, B.H., Eberlein, A.: Automating transition from use cases to class model. MSc thesis, University of Calgary (2003)
4. Li, K.: Towards semi-automation in requirements elicitation: mapping natural language and object-oriented concepts. In: 13th IEEE International Requirements Engineering Conference (2005)
5. Sawyer, P., Rayson, P., Garside, R.: REVERE: support for requirements synthesis from documents. Information Systems Frontiers 4 (2002) 343–353
6. Li, L.: A semi-automatic approach to translating use cases to sequence diagrams. In: Proceedings of Technology of Object-Oriented Languages and Systems, IEEE CS Press (1999) 184–193
7. Mitkov, R.: Robust pronoun resolution with limited knowledge. In: Proceedings of the 18th International Conference on Computational Linguistics (COLING'98)/ACL'98, Montreal, Canada (1998) 869–875
8. Overmyer, S.P., Lavoie, B., Rambow, O.: Conceptual modelling through linguistic analysis using LIDA. In: Proceedings of the 23rd International Conference on Software Engineering (ICSE 2001), Toronto (2001)
9. Delisle, S., Barker, K., Biskri, I.: Object-oriented analysis: Getting help from robust computational linguistic tools. In: Friedl, G., Mayr, H.C. (eds.): Application of Natural Language to Information Systems, Oesterreichische Computer Gesellschaft (1999) 167–172
A Multi-word Term Extraction System Jisong Chen, Chung-Hsing Yeh, and Rowena Chau The Clayton School of Information Technology, Monash University, Clayton, Victoria 3800, Australia {Jisong.Chen, ChungHsing.Yeh, Rowena.Chau}@infotech.monash.edu.au
Abstract. Traditional statistical approaches to identifying multi-word terms have to handle a large amount of noisy data and are extremely time consuming. This paper introduces a system for extracting multi-word terms from a set of documents based on the co-related text-segments existing in those documents. The system uses a short predefined stoplist as an initial input to segment a set of documents into text-segments, calculates the segment-weights of all text-segments, and then applies the shorter text-segments to segment the longer text-segments based on the weight values, recursively, until no text-segment can be further divided. The resulting text-segments are identified as terms based on a specified threshold. An initial experiment on a set of traditional Chinese documents shows that the system achieves a recall rate of at least 76.39% and a precision rate of at least 91.05% in retrieving multiple-occurrence terms, of which 18.30% are newly identified terms.
1 Introduction

A term (a concept and its designation) can consist of a single word or multiple words. In general, a multi-word term may carry more meaning than a single-word term and can represent documents more accurately. Statistical approaches to multi-word term identification are based on the detection of one or more lexical units in specialized documents with a frequency-derived value higher than a given threshold [1,2]. The underlying idea is that documents are characterized by the repeated use of certain lexical units or morpho-syntactic constructions. Traditional statistical methods for multi-word term identification have to handle a large amount of noisy data and are very time consuming. In this paper, we propose a multi-word term extraction system which uses a new and effective statistical method for identifying multi-word terms. The system uses a short predefined stoplist as an initial input to segment a set of documents into text-segments, calculates the segment-weights of all text-segments, and then applies the shorter text-segments to segment the longer text-segments based on the weight values. The system performs the weighting and segmenting tasks on the newly generated text-segments recursively until no text-segment can be further divided. The resulting text-segments are then identified as terms based on a specified threshold. The system has been tested on a set of traditional Chinese documents downloaded from the Hong Kong government website (www.info.gov.hk). The experimental result shows that the system achieves a recall rate of at least 76.39% and a precision rate of at least 91.05% in retrieving multiple-occurrence multi-word terms, of which 18.30% are newly retrieved terms.
2 Proposal of the Multi-word Term Extraction System

Fig. 1 shows the proposed multi-word term extraction system. The system includes four components: a text-segment generator, which uses a short predefined stoplist as an initial input to segment a set of text documents into text-segments; a text-segment weigher, which calculates the segment-weight of each generated text-segment; a text-segment segmenter, which segments the text-segments against each other based on their segment-weights to generate new text-segments, the term candidates; and a term identifier, which identifies the resulting term candidates as terms based on a specified threshold. The term candidates can be re-input to the segmenter for further segmentation or passed directly to the term identifier.
Fig. 1. A Proposed Multi-Word Term Extraction System
2.1 Generating Text-Segments

In a text document, some words have a very low discrimination value for Information Retrieval (IR); these are known as stopwords. By removing these stopwords from each sentence, a sentence may become one or more text-segments. A text-segment may include one or more words (a Chinese character is treated as a word). As this research focuses on multi-word term identification, a text-segment here refers only to a segment with multiple words.
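Removing stopwords splits each sentence into text-segments at the stopword positions. A minimal sketch, using a tiny illustrative English stoplist (the paper's experiments use an Oracle-defined Chinese stoplist on BIG-5 documents):

```python
def generate_segments(sentence, stoplist):
    """Split a sentence into text-segments at stopword positions,
    keeping only multi-word segments (per the paper's focus)."""
    segments, current = [], []
    for word in sentence.lower().split():
        if word in stoplist:
            if len(current) > 1:
                segments.append(" ".join(current))
            current = []
        else:
            current.append(word)
    if len(current) > 1:
        segments.append(" ".join(current))
    return segments

stoplist = {"the", "a", "of", "is", "by", "and"}
print(generate_segments(
    "the segment weight is calculated by term frequency and document frequency",
    stoplist))
```

Each stopword acts as a cut point, so the example sentence yields the segments "segment weight", "term frequency" and "document frequency"; single-word remainders are discarded.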
During this stage, a term frequency (TF) and a document frequency (DF) are recorded for each text-segment. The term frequency of a text-segment t (TFt) is the total number of occurrences of t, and the document frequency of t (DFt) is the number of documents in which t occurs.

2.2 Weighing Text-Segments

To develop a scheme for weighing text-segments, the following factors have been considered: (a) a longer multi-word term is more descriptive than a shorter one, so the segment-length of a text-segment, i.e., the number of words it contains, must be included in estimating its segment-weight; (b) a new multi-word term, such as a new terminology, may appear frequently in the documents that discuss it, so such terms should have a high TF; (c) an existing multi-word term, such as a commonly used phrase, usually has a high DF and/or a high TF due to its frequent usage. Given a text-segment t ∈ C (C is a group of documents), the segment-weight Wt is calculated by Equation (1), where TFt is the term frequency of t, DFt its document frequency, and Lt its segment-length:

Wt = TFt × DFt × Lt    (1)
Based on Equation (1), a text-segment with a longer segment-length, a high TF and/or a high DF gains a larger segment-weight.

2.3 Segmenting Text-Segments

Fig. 2 shows the rule for segmenting text-segments. Generally speaking, a longer text-segment is more descriptive than a shorter one, and always applying the shorter text-segment to segment the longer one may lead to the loss of meaningful terms. The segmentation rule in Fig. 2 therefore uses the segment-weight of each text-segment: only a text-segment with a higher weight can segment a text-segment with a lower weight. As discussed in Section 2.2, a multi-word term with a longer text-segment will usually have a higher segment-weight, and will therefore have a lower chance of being further segmented by other text-segments.

Given two text-segments p and q, p can be further segmented by q when the following conditions hold simultaneously:
1. q ⊂ p
2. Wq > Wp
where Wp and Wq denote the segment-weights of p and q respectively.

Fig. 2. Rule for segmenting a text-segment
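Equation (1) and the rule of Fig. 2 can be combined in a small sketch: a segment p is split by a contained segment q only when q has the higher weight. The TF/DF counts below are made up for illustration, and the containment test uses simple substring matching rather than the tokenized segments of the real system:

```python
def weight(tf, df, length):
    # Equation (1): W = TF * DF * L
    return tf * df * length

def segment(p, q, w_p, w_q):
    """Apply the Fig. 2 rule: split p on q when q is contained in p
    and q has the strictly higher segment-weight."""
    if q in p and w_q > w_p:
        return [part.strip() for part in p.split(q) if part.strip()]
    return [p]

p = "hong kong government website"
q = "hong kong"
w_p = weight(tf=3, df=2, length=4)    # 24
w_q = weight(tf=40, df=25, length=2)  # 2000
print(segment(p, q, w_p, w_q))
```

Here the frequent short segment "hong kong" outweighs the rarer long one, so the long segment is cut and "government website" survives as a new candidate; with the weights reversed, p would be left intact.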
2.4 Selecting Terms

As discussed in Section 2.2, a new multi-word term such as a new terminology may have a high TF, whereas an existing multi-word term may have a high DF and/or a high TF. As such, the term selection algorithm can be developed based on both TF and DF. In this research, a term-selection-weight (TSW) has been used to evaluate each term candidate. Equation (2) shows the method for evaluating the term-selection-weight of term candidate i, where TFi and DFi are the values of TF and DF for candidate i.
TSWi = TFi × DFi    (2)
Once each term candidate has been given its term selection weight, the system can select the terms from these candidates based on a specified threshold.
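Term selection by Equation (2) then reduces to a threshold filter over the candidates. A sketch with made-up candidate counts:

```python
def select_terms(candidates, threshold):
    """Keep candidates whose term-selection-weight TSW = TF * DF
    (Equation (2)) meets the specified threshold."""
    return [term for term, (tf, df) in candidates.items()
            if tf * df >= threshold]

candidates = {
    "hong kong": (40, 25),       # TSW 1000
    "term extraction": (3, 2),   # TSW 6
    "noise fragment": (1, 1),    # TSW 1
}
print(select_terms(candidates, threshold=2))
```

The threshold of 2 here matches the paper's Ex1 setting, which extracts the minimum multiple-occurrence terms; higher thresholds (Ex2, Ex3) prune more candidates.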
3 Experiment

3.1 Experimental Resources

The Hong Kong government website at http://www.info.gov.hk was used for the experiment, which was mainly conducted on traditional Chinese documents (encoded in BIG-5). An Oracle-defined stoplist [3] was used to segment these documents, and the experiment was run on a P4 1.8 GHz computer with 256 MB of RAM. Using a Web spider, Teleport, a collection of 308 Web documents in traditional Chinese was randomly fetched from the website. The extracted terms from these documents were identified as follows: (a) Tsai's list of Chinese words [4] was used as a dictionary to identify extracted terms; (b) the remaining unidentified terms were checked manually for correctness.

3.2 Setting Term Selection Thresholds and the Evaluation Base

Based on Equation (2), three experimental thresholds Ex1, Ex2 and Ex3 were set to 2, 3 and 4 respectively. Ex1 is assigned to extract the minimum multiple-occurrence terms; Ex2 and Ex3 are assigned for comparison with Ex1. To provide an evaluation base, Tsai's list of Chinese words was applied to extract terms from the selected 308 documents. A total of 11662 terms were extracted, and the total time spent on this extraction was 690 minutes and 24 seconds. Table 1 shows the extraction results for the different thresholds.

Table 1. Dictionary-based extraction result

                     Ex1    Ex2    Ex3
No. of Terms (NOT)  6939   5216   4920
3.3 Experimental Results

Table 2 shows the experimental results for the different thresholds. For each result, the number of extracted terms included in the dictionary (TiD) and the number of extracted terms not found in the dictionary (TxD) were recorded.

Table 2. Experiment Result

       Ex1    Ex2    Ex3
TiD   5301   4273   4251
TxD   1985   1251   1215
Recall and Precision: Table 3 shows the recall rate and the precision rate for each result, calculated by Equations (3) and (4) respectively. The results in Table 3 show that the higher the threshold applied, the higher the recall and precision rates achieved.

Table 3. Recall and Precision

                 Ex1     Ex2     Ex3
Recall (%)      76.39   81.92   86.40
Precision (%)   72.76   77.35   77.77
Recall = TiDExi / NOTExi    (3)

Precision = TiDExi / (TiDExi + TxDExi)    (4)
New Identified Terms: the experimental results in Table 2 show that a large number of extracted terms are not included in the dictionary (TxD). Manual examination shows that most of them are meaningful phrases, person names, street names, and so on. When these manually identified new terms are included, Equation (5) should be used to calculate the precision rate instead of Equation (4). Table 4 shows the total number of these manually identified new terms (NT) and the recalculated precision rates.

Precision = (TiDExi + NTExi) / (TiDExi + TxDExi)    (5)
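Equations (3)-(5) can be checked against the Ex1 column of Tables 1-4 (TiD = 5301, TxD = 1985, NOT = 6939, NT = 1333):

```python
TiD, TxD, NOT, NT = 5301, 1985, 6939, 1333

recall = TiD / NOT                       # Equation (3)
precision = TiD / (TiD + TxD)            # Equation (4)
precision_nt = (TiD + NT) / (TiD + TxD)  # Equation (5), with new terms

print(f"{recall:.2%} {precision:.2%} {precision_nt:.2%}")
```

This reproduces the reported Ex1 figures: 76.39% recall, 72.76% precision by Equation (4), and 91.05% precision once the manually verified new terms are counted via Equation (5).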
Table 4. Number of Manually Identified New Terms (NT)

               Ex1     Ex2     Ex3
NT            1333    944     917
Precision (%) 91.05   94.44   94.65
System Efficiency: the complexity of the multi-word term extraction system is O(N²) in the worst case, where N is the number of generated text-segments. In this
experiment, the total time spent was 386 minutes and 35 seconds (the dictionary-based approach spent 690 minutes and 24 seconds).
4 Conclusion

In this paper, we have presented a multi-word term extraction system, a new automatic statistical approach to identifying multi-word terms based on the co-related text-segments existing in a set of documents. New algorithms have been developed to identify multi-word terms effectively and efficiently, and object-oriented techniques have been applied to develop this extensible system. The experiment conducted on a set of traditional Chinese documents downloaded from the Hong Kong government website has shown that the system achieves a recall rate of at least 76.39% and a precision rate of at least 91.05% in retrieving multiple-occurrence terms, of which 18.30% are newly identified terms. An experiment on English documents is ongoing.
A Multiscale Self-growing Probabilistic Decision-Based Neural Network for Segmentation of SAR Imagery Xian-Bin Wen1, Hua Zhang1, and Zheng Tian2 1
Department of Computer Science and Engineering, Tianjin University of Technology, Tianjin 300191, China
[email protected] 2 Northwestern Polytechnical University, Xi’an, 710072, China
Abstract. A new segmentation algorithm for synthetic aperture radar (SAR) images is proposed using a multiscale self-growing probabilistic decision-based neural network (MSPDNN). The proposed algorithm is able to find the natural number of categories in a SAR image based on the Bayesian information criterion (BIC). The learning process starts from a single, randomly initialized category in the feature space of the SAR image at a proper scale, and grows adaptively during learning until the most appropriate number of categories is found. Experimental results of the proposed algorithm are presented and compared with those of previous algorithms.
1 Introduction

Synthetic aperture radar (SAR) image segmentation is usually regarded as a complex problem in the pattern recognition area, due to the presence of speckle. To fully exploit the coherent nature and complexity of SAR image formation, we employ a recently introduced class of mixture multiscale autoregressive (MMAR) models evolving on dyadic trees, to which the expectation-maximization (EM) algorithm is applied [1]. However, the EM algorithm has a high possibility of being trapped in local optima and is also slow to converge [2]. In this paper, we propose a new MMAR-based neural network, namely the multiscale self-growing probabilistic decision-based neural network (MSPDNN). The learning process starts by randomly initializing from a single SAR image at any scale in the feature space and adaptively grows the categories until the most appropriate number of categories is reached.
2 Multiscale Self-growing Probabilistic Decision-Based Neural Network

Given a SAR image associated with class ω_i, a multiscale sequence X_L, X_{L−1}, …, X_0 of SAR images is constructed as in [1], and the pixel mapped to node s is denoted X(s). We assume that the likelihood function f(X(s) | ω_i, ℑ_s) for class ω_i is a MMAR model, i.e.

Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1166 – 1170, 2006. © Springer-Verlag Berlin Heidelberg 2006
f(X(s) | ω_i, ℑ_s) = Σ_{r=1}^{R} P(Θ_{r|i} | ω_i) p(X(s) | ω_i, Θ_{r|i})
                   = Σ_{r=1}^{R} P(Θ_{r|i} | ω_i) Φ( (X(s) − a_{r|i,0} − a_{r|i,1} X(sγ) − ⋯ − a_{r|i,p_{r|i}} X(sγ^{p_{r|i}})) / σ_{r|i} )   (1)
and the parameters of the MMAR model can be estimated iteratively by the EM algorithm [1], where Θ_{r|i} represents the parameters of the r-th mixture component, R is the total number of mixture components, and p(X(s) | ω_i, Θ_{r|i}) and P(Θ_{r|i} | ω_i) are the distribution function and the prior probability (also called the mixture coefficient) of the r-th component, respectively, with Σ_{r=1}^{R} P(Θ_{r|i} | ω_i) = 1. f(X(s) | ω_i, ℑ_s) is the probability distribution function, ℑ_s is the set of X(sγ), …, X(sγ^{p_i}) (p_i = max_r p_{r|i}), sγ is defined to reference the parent of node s, and Φ(·) is the standard normal distribution function. The MSPDNN is a multiscale Gaussian mixture neural network that employs a modular network structure, as shown in Fig. 1. A detailed description of the MSPDNN is given in the following sections.
Fig. 1. The structure of the multiscale self-organizing mixture network
2.1 Discriminant Functions of MSPDNN
Based on the likelihood function f(X(s) | ω_i, ℑ_s) for class ω_i, the discriminant function of the multi-class MSPDNN models the log-likelihood function

ϕ(X(s), w_i) = log f(X(s) | ω_i, ℑ_s)   (2)

where w_i = {Θ_{r|i}, P(Θ_{r|i} | ω_i), T_i} and T_i is the output threshold of subnet i.
2.2 Locally Unsupervised Learning
Given a set of patterns X+ = {X(t), t = 1, 2, …, N} and a set of candidate MMAR models M = {MM_i | i = 1, 2, …, L}, each model is associated with a parameter set w_i. In order to select a proper model MM_i from M to represent the distribution of X+, the Bayesian information criterion (BIC) for model MM_i and training data X+ is defined as

BIC(MM_i, X+) = −2 log P(X+ | ŵ_i, MM_i) + d(MM_i) log N   (3)
where ŵ_i is a maximum likelihood estimate of w_i, d(MM_i) is the number of free parameters in model MM_i, and N is the number of training data. As in the analysis in [3], choosing the model with the minimum BIC is equivalent to choosing the model with the largest posterior probability, and BIC can be used to compare models with differing parameterizations, differing numbers of cluster components, or both. So, if there are two candidate models MM_1 and MM_2 for modeling a data set X+, the BIC difference

ΔBIC_21(X+) = BIC(MM_2, X+) − BIC(MM_1, X+)   (4)
can be used to evaluate which model is preferred. The unsupervised training process of LU learning based on BIC can be described as follows:
(1) Construct the multiscale sequence of the SAR image.
(2) Set the initial number of multiscale mixture Gaussian components Gc = 1, and randomly set the initial values of the parameters in Θ, which represents the parameters of a MMAR.
(3) If ΔBIC_21(X+) ≤ growing-confidence, relearn Θ by applying the EM algorithm to a one-component multiscale mixture Gaussian on X+, and the process terminates; otherwise, increment Gc and relearn Θ by applying the EM algorithm to a two-component multiscale mixture Gaussian on X+.
(4) Clustering: let EM_class_i denote the input data X(t) that belong to the i-th multiscale Gaussian component after EM learning. For each pattern X(t) in X+, if k = argmax_i {P(λ_i | X(t))}, assign X(t) to EM_class_k.
(5) Grow one component: let growing = argmax_i {ΔBIC_21(EM_class_i)} for i = 1, …, Gc. If max_i {ΔBIC_21(EM_class_i)} ≤ growing-confidence, the process terminates; otherwise, initialize the parameters Θ̃_1 and Θ̃_2 of the two newly split components from EM_class_growing, remove the parameter Θ_growing from Θ, update Θ by putting Θ̃_1 and Θ̃_2 into Θ, and increment Gc.
(6) Using the current Θ as the initial values, perform EM learning on all the clusters.
(7) Repeat (4)-(6) until the process terminates.
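The BIC-driven growing decision can be illustrated with ordinary 1-D Gaussian mixtures standing in for the MMAR components; the data, the tiny EM routine, and the parameter counts below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_gmm_1d(x, k, n_iter=100):
    """Tiny EM for a k-component 1-D Gaussian mixture; returns the
    final log-likelihood (a stand-in for the MMAR likelihood)."""
    n = len(x)
    means = np.quantile(x, np.linspace(0.1, 0.9, k))   # deterministic init
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        dens = weights * np.exp(-(x[:, None] - means) ** 2 / (2 * variances)) \
               / np.sqrt(2 * np.pi * variances)
        resp = dens / dens.sum(axis=1, keepdims=True)          # E-step
        nk = resp.sum(axis=0)                                  # M-step
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    dens = weights * np.exp(-(x[:, None] - means) ** 2 / (2 * variances)) \
           / np.sqrt(2 * np.pi * variances)
    return np.log(dens.sum(axis=1)).sum()

def bic(loglik, n_params, n):
    """Eq. (3): BIC = -2 log P(X+ | w_hat, MM) + d(MM) log N."""
    return -2.0 * loglik + n_params * np.log(n)

# Clearly bimodal data: growing from one to two components should lower BIC.
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
bic1 = bic(fit_gmm_1d(x, 1), 2, len(x))    # 1 mean + 1 variance
bic2 = bic(fit_gmm_1d(x, 2), 5, len(x))    # 2 means, 2 variances, 1 free weight
delta_bic_21 = bic2 - bic1                 # Eq. (4)
print(delta_bic_21)
```

A negative ΔBIC_21 plays the role of the growing-confidence comparison in step (3): the two-component model is preferred, so a component is added.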
2.3 Global Supervised Learning
In the Globally Supervised (GS) training phase, training data are used to fine-tune the decision boundaries. Specifically, when a training pattern is misclassified to the i-th class, reinforced and/or anti-reinforced learning is applied to update the parameters of subnet i. Thus we have the reinforced learning rule

w_i^(m+1) = w_i^(m) + η ∇ϕ(X_i(m), w_i)   (5)

and the anti-reinforced learning rule

w_j^(m+1) = w_j^(m) − η ∇ϕ(X_i(m), w_j)   (6)
where η is a user-assigned (positive) learning rate, 0 < η ≤ 1. For the false rejection data set D_2^i, reinforced and anti-reinforced learning are applied to classes ω_i and ω_j, respectively. As for the false acceptance set D_3^i, anti-reinforced learning is applied to class ω_i, and reinforced learning is applied to class ω_j. The gradient vector ∇ϕ in (5) and (6) can be computed in a similar manner, as proposed in [4]. The false rejection data set D_2^i and the false acceptance set D_3^i are defined as follows: D_2^i = {X(t); X(t) ∈ ω_i, X(t) is misclassified to another class ω_j}; D_3^i = {X(t); X(t) ∉ ω_i, X(t) is classified to ω_i}. The threshold value T_i of subnet i in the MSPDNN recognizer can also be learned by reinforced or anti-reinforced learning rules. Specifically, the threshold T_i at iteration m is updated according to the reinforced learning rule

T_i^(m+1) = T_i^(m) + η_t l′(T_i^(m) − ϕ(X_i(m), w_i))   (7)

and the anti-reinforced learning rule

T_i^(m+1) = T_i^(m) − η_t l′(ϕ(X_i(m), w_i) − T_i^(m))   (8)
where ηt is a positive learning parameter, l (.) is a penalty function, and l ′(.) is the derivative of the penalty function.
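A minimal sketch of the reinforced/anti-reinforced updates of Eqs. (5) and (6), assuming each subnet is a single unit-variance Gaussian so that the gradient of ϕ with respect to its mean is simply (x − μ); the 1-D data, learning rate, and deliberately swapped initialization are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def phi(x, mu):
    """Discriminant of a unit-variance Gaussian subnet: log-likelihood
    up to a constant, so grad w.r.t. mu is (x - mu)."""
    return -0.5 * (x - mu) ** 2

x0 = rng.normal(-2.0, 1.0, 200)   # illustrative patterns of class 0
x1 = rng.normal(2.0, 1.0, 200)    # illustrative patterns of class 1
mu = np.array([1.5, -1.5])        # subnet means, deliberately swapped
eta = 0.05                        # user-assigned learning rate

def count_errors(mu):
    e0 = int(np.sum(phi(x0, mu[0]) < phi(x0, mu[1])))   # class-0 rejections
    e1 = int(np.sum(phi(x1, mu[1]) < phi(x1, mu[0])))   # class-1 rejections
    return e0 + e1

before = count_errors(mu)
for _ in range(10):
    for cls, xs in ((0, x0), (1, x1)):
        other = 1 - cls
        for x in xs:
            if phi(x, mu[cls]) < phi(x, mu[other]):    # misclassified pattern
                mu[cls] += eta * (x - mu[cls])         # reinforced, Eq. (5)
                mu[other] -= eta * (x - mu[other])     # anti-reinforced, Eq. (6)
after = count_errors(mu)
print(before, after)
```

Reinforced learning pulls the correct subnet toward the misclassified pattern while anti-reinforced learning pushes the wrongly winning subnet away, so the error count drops over the epochs.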
3 Experimental Results for SAR Imagery

To demonstrate the proposed algorithm, we apply it to the complex SAR image in Fig. 2(a), which consists of multiple classes of homogeneous regions. In the experiments, we generate the above-mentioned quadtree representation and use a second-order regression, because it was found that increasing the regression order to p = 2 yields a lower probability of misclassification and a good trade-off between modeling accuracy and computational efficiency. The learning rates η and η_t in the MSPDNN were set to 0.5 and 0.05, respectively. The penalty function l(x) was chosen to be 1/(1 + exp(x)). Fig. 2 shows the segmentation results obtained by applying the MSPDNN to the SAR image, together with the results of the EM algorithm in [1] for comparison. As Fig. 2 shows, the MSPDNN algorithm not only performs better than the EM algorithm, especially at region boundaries, but also automatically selects the proper number of clusters in images, converges much faster than the EM algorithm, and considerably reduces the segmentation time.

Fig. 2. (a) Original SAR image composed of woodland and cornfield. (b) Segmented image obtained using the EM algorithm. (c) Segmented image obtained using the MSPDNN algorithm.
References
1. Wen, X.B., Tian, Z.: Mixture Multiscale Autoregressive Modeling of SAR Imagery for Segmentation. Electronics Letters 39 (2003) 1272-1274
2. Redner, R.A., Walker, H.F.: Mixture Densities, Maximum Likelihood and the EM Algorithm. SIAM Rev. 26 (1984) 195-239
3. Kass, R.E., Raftery, A.E.: Bayes Factors. J. Amer. Statist. Assoc. 90 (1995) 773-795
4. Lin, S.H., Kung, S.Y., Lin, L.J.: Face Recognition/Detection by Probabilistic Decision-Based Neural Networks. IEEE Trans. Neural Networks, Special Issue on Artificial Neural Networks and Pattern Recognition 8 (1997) 114-132
Face Detection Using an Adaptive Skin-Color Filter and FMM Neural Networks* Ho-Joon Kim1, Tae-Wan Ryu2, Juho Lee3, and Hyun-Seung Yang3 1
School of Computer Science and Electronic Engineering Handong University, Pohang, 791-708, Korea
[email protected] 2 Department of Computer Science, California State University, Fullerton, CA, 92834 , USA
[email protected] 3 Department of Computer Science, KAIST Daejeon, 305-701, Korea {jhlee, hsyang}@paradise.kaist.ac.kr
Abstract. In this paper, we present a real-time face detection method based on hybrid neural networks. We propose a modified version of the fuzzy min-max (FMM) neural network for feature analysis and face classification. A relevance factor between features and pattern classes is defined to analyze the saliency of features, and this measure can be utilized for feature selection to construct an adaptive skin-color filter. The feature extraction module employs a convolutional neural network (CNN) with a Gabor transform layer to extract successively larger features in a hierarchical set of layers. In this paper we first describe the behavior of the proposed FMM model, and then introduce the feature analysis technique for the skin-color filter and the pattern classifier.
1 Introduction

Growing interest in computer vision has motivated a recent surge in research on problems such as face recognition, pose estimation, face tracking and gesture recognition. However, most methods assume that the human faces in their input images have already been detected and localized [1-2]. Recently, skin detection has emerged as an active research topic in several practical applications including face detection and tracking [3-4]. In this paper we present an adaptive skin-color filter model which is capable of adjusting the skin-color model through a training process. We also present an improved neuro-fuzzy pattern classification model based on FMM neural networks [5-6]. Since the weight factor can be adjusted by the training process, the system can prevent undesirable performance degradation which may be caused by some environmental factors such as
* This research was supported by a 21st Century Frontier R&D Program and the Brain Neuroinformatics Research Program sponsored by the Ministry of Information and Communication and the Ministry of Commerce, Industry and Energy in Korea.
illumination changes. Through the feature analysis using the proposed model, we can select the most relevant features for the skin-color filter as well as the pattern classifier.
2 Underlying System

As shown in Fig. 1, the underlying face detection system consists of three sub-processes: a skin-color filter, a feature extractor and a pattern classifier.
Fig. 1. The underlying face detection system
Through the skin color analysis and training process, the system can generate an adaptive skin model and a relevant feature set for the given illumination condition. The feature extractor generates numerous features from the input image. The number of features and their relevance factors affect the computation time and the performance of the system. Therefore we propose a feature analysis technique to reduce the number of features for the pattern classifier.
3 A Weighted FMM Neural Network

We have proposed a modified FMM neural network [7] called the weighted fuzzy min-max (WFMM) neural network. In this paper we present an improved structure of the model and a feature analysis method. As shown in Equations (1) and (2), the model employs an activation function which incorporates the feature value distribution and a weight value for each feature in a hyperbox. The hyperbox membership function has a weight factor that reflects the different relevance of each feature. In the equations, w_ij is the connection weight between the i-th feature and the j-th hyperbox. The weighted FMM neural network is capable of utilizing the feature distribution and frequency in the learning process as well as in the classification process. Since the weight factor effectively reflects the relationship
between the feature range and its distribution, the system can prevent the undesirable performance degradation that may be caused by noisy patterns.

b_j(A_h) = (1 / Σ_{i=1}^{n} w_ji) · Σ_{i=1}^{n} w_ji [ max(0, 1 − max(0, γ_jiV · min(1, a_hi − v_ji))) + max(0, 1 − max(0, γ_jiU · min(1, u_ji − a_hi))) − 1.0 ]   (1)

γ_jiU = γ / R_U,  γ_jiV = γ / R_V,  where R_U = max(s, u_ji^new − u_ji^old) and R_V = max(s, v_ji^old − v_ji^new)   (2)
Consequently, the proposed model can provide more robust pattern classification performance when the training data set in a given problem includes noisy or unusual patterns.
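A sketch of the weighted hyperbox membership of Eq. (1), simplified by using one sensitivity parameter γ for both the lower and upper bounds (the paper uses separate γ_jiU and γ_jiV from Eq. (2)); all numeric values are illustrative:

```python
import numpy as np

def wfmm_membership(a, u, v, w, gamma=4.0):
    """Weighted hyperbox membership, Eq. (1). a: input pattern in [0,1]^n,
    u/v: hyperbox min/max points, w: per-feature weights. A single
    sensitivity gamma stands in for the separate gamma_jiU/gamma_jiV."""
    above = np.maximum(0.0, 1.0 - np.maximum(0.0, gamma * np.minimum(1.0, a - v)))
    below = np.maximum(0.0, 1.0 - np.maximum(0.0, gamma * np.minimum(1.0, u - a)))
    return float(np.sum(w * (above + below - 1.0)) / np.sum(w))

u = np.array([0.2, 0.3])        # illustrative hyperbox min point
v = np.array([0.6, 0.7])        # illustrative hyperbox max point
w = np.array([1.0, 2.0])        # feature 2 weighted as twice as relevant
inside = wfmm_membership(np.array([0.4, 0.5]), u, v, w)
outside = wfmm_membership(np.array([0.9, 0.95]), u, v, w)
print(inside, outside)
```

Points inside the hyperbox receive full membership, and membership decays with distance outside the box at a rate set by γ, with each feature's contribution scaled by its weight.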
4 A Feature Analysis Technique

The most advantageous feature of a convolutional neural network is its invariant detection capability for distorted patterns in images [1-2]. The underlying system employs a convolutional neural network in which a Gabor transform layer is added as the first layer. The first layer of the network extracts local feature maps from the input image using Gabor transform filters. The other layers of the feature extractor include two types of sub-layers, called convolution layers and sub-sampling layers. Each layer of the network extracts successively larger features in a hierarchical set of layers. Finally, a feature set is generated as the input of the pattern classifier. The number of features can be reduced by the feature analysis technique using the FMM model described in the previous section. We define a measure called the relevance factor (RF), as shown in Equation (3), which quantifies the degree of relevance between a feature value and a pattern class.
RF(x_i, C_k) = ( (1/N_k) Σ_{B_j ∈ C_k} S(x_i, (u_ji, v_ji)) · w_ij − (1/(N_B − N_k)) Σ_{B_j ∉ C_k} S(x_i, (u_ji, v_ji)) · w_ij ) / Σ_{B_j ∈ C_k} w_ij   (3)
In the equation, the constants N_B and N_k are the total number of hyperboxes and the number of hyperboxes that belong to class k, respectively. S is a similarity measure between two fuzzy intervals. If RF(x_i, k) has a positive value, it indicates an excitatory relationship between the feature x_i and the class k, whereas a negative value of RF(x_i, k) indicates an inhibitory relationship between them.

5 Experimental Results

For the training of the skin-color filter, the system considers eleven color features, labeled F-1 = Red, F-2 = Green, F-3 = Blue, F-4 = Intensity, F-5 = Cb, F-6 = Cr, F-7 = Magenta, F-8 = Cyan, F-9 = Yellow, F-10 = Hue, and F-11 = Saturation.
Fig. 2. Two training data captured under different illumination conditions

Table 1. Feature analysis results for the two different images

          image-1                          image-2
features  feature range   RF      features  feature range   RF
F-5       0.547 ~ 0.737   9.3703  F-11      0.027 ~ 0.128   0.8888
F-3       0.435 ~ 0.627   9.2604  F-10      0.828 ~ 0.983   0.5827
F-9       0.372 ~ 0.564   9.2631  F-6       0.053 ~ 0.233   0.5004
F-11      0.074 ~ 0.279   8.7050  F-5       0.759 ~ 0.958   0.4529
Table 1 shows the skin-color analysis results and the feature range data derived from the training process. As shown in the table, different kinds of features can be adaptively selected for a given condition, and the feature ranges of the skin-color filter can also be adjusted by the training process. Table 1 lists the four features that have the highest values of the relevance factor RF. A number of hyperboxes for face and non-face patterns have been generated, and the relevance factors are also adjusted through the training process. Therefore the system can adaptively select a more effective feature set for the given environment.
6 Conclusion

A feature analysis method for face detection using a modified FMM model has been introduced. Through the training process, the skin-color filter is adapted to the illumination condition under which the given images are captured. A relevance factor has been defined for the feature selection technique; the measure can also be utilized in designing an optimal structure for the classifier. We have applied the proposed model to a real-time face detection system in which the illumination conditions change frequently.
References
1. Garcia, C., Delakis, M.: Convolutional Face Finder: A Neural Architecture for Fast and Robust Face Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.26, No.11 (2004) 1408-1423
2. Lawrence, S., Giles, C.L., Tsoi, A.C., Back, A.D.: Face Recognition: A Convolutional Neural-Network Approach. IEEE Transactions on Neural Networks, Vol.8, No.1 (1997) 98-113
3. Hsu, R.-L., Abdel-Mottaleb, M., Jain, A.K.: Face Detection in Color Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.24, No.5 (2002) 696-706
4. Storring, M., Kocka, T., Andersen, H.J., Granum, E.: Tracking Regions of Human Skin Through Illumination Changes. Pattern Recognition Letters, Vol.24 (2003) 1715-1723
5. Simpson, P.K.: Fuzzy Min-Max Neural Networks Part 1: Classification. IEEE Transactions on Neural Networks, Vol.3, No.5 (1992) 776-786
6. Gabrys, B., Bargiela, A.: General Fuzzy Min-Max Neural Network for Clustering and Classification. IEEE Transactions on Neural Networks, Vol.11, No.3 (2000) 769-783
7. Kim, H.J., Ryu, T.W., Nguyen, T.T., Lim, J.S., Gupta, S.: A Weighted Fuzzy Min-Max Neural Network for Pattern Classification and Feature Extraction. Proceedings of the International Conference on Computational Science and Its Applications, Part 4 (2004) 791-798
GA Optimized Wavelet Neural Networks Jinhua Xu Department of Computer Science, East China Normal University
[email protected]

Abstract. In this paper, a new GA-based constructive algorithm is proposed for wavelet neural networks (WNNs). Wavelets are added to the WNN from the low resolution level to the high resolution level. At each resolution, the translation parameters of a new wavelet are trained using GA, and the output weights are obtained using least squares techniques. The proposed algorithm is suitable for high dimensional problems.
1 Introduction
Wavelet transforms have emerged as a means of representing a function in a manner which readily reveals properties of the function in localized regions of the joint time-frequency space. The idea of combining wavelets with neural networks has led to the development of wavelet neural networks (WNNs) [1]. The determination of the network size and weight parameters is clearly critical when using WNNs. Some research has been done on this problem. In [2], an iterative method combining genetic algorithms and least squares techniques is proposed for optimizing WNNs: GAs are used for optimal selection of the structure of the WNN and the parameters of the transfer functions of its neurons, while least squares techniques are used to update the weights of the network. In [3], a new class of wavelet networks is proposed, where the model structure for a high dimensional system is chosen to be a superposition of a number of functions with fewer variables; a forward orthogonal least squares algorithm and the error reduction ratio are applied to select the model terms. In [4], a wavelet network is constructed from wavelets selected from a wavelet basis by exploiting the sparseness of the training data and using techniques from regression analysis. In [5], an orthogonalized residual based selection (ORBS) algorithm is proposed for WNNs. The use of evolutionary algorithms (EAs) to aid artificial neural network (ANN) learning has been a popular approach to address the local optima and design problems of ANNs [6]. The typical approach is to combine the strength of backpropagation (BP) in weight learning with the EA's capability of searching the architecture space. Some EA methods have been proposed to learn both the network structure and the connection weights [2]. The genetic algorithm (GA) is a directed random search technique that is widely applied to optimization problems. It is especially useful for complex optimization problems where the number of parameters is large and the analytical
solutions are difficult to obtain. GA can help to find the optimal solution globally over a domain. In this paper, a new constructive algorithm is proposed for WNNs. Wavelets are added to the WNN from the low resolution level to the high resolution level. At each resolution, GA is applied to select the wavelet basis, so the local optima problem of gradient-based algorithms is avoided. The proposed algorithm is suitable for high dimensional problems.
2 Preliminaries
The wavelet analysis procedure is implemented with dilated and translated versions of a mother wavelet. In theory, the dilation (scale) parameter of a wavelet can be any positive real value and the translation (shift) can be an arbitrary real number. In practice, in order to improve computational efficiency, the values of the shift and scale parameters are often limited to some discrete lattice; this is then referred to as the discrete wavelet transform (DWT). The WNNs stemming from the DWT have a linear-in-parameters structure [4,5]. In practical applications, it is unnecessary and impossible to represent a signal using an infinite decomposition in terms of wavelet basis functions. The decomposition is therefore often truncated at an appropriate accuracy. An approximation to a function f ∈ L2(R) using the truncated wavelet decomposition with the coarsest resolution Jmin and the finest resolution Jmax can be expressed as follows:

f(x) = Σ_{j=Jmin}^{Jmax} Σ_{k∈Kj} c_{j,k} ψ_{j,k}(x)   (1)
where Kj are subsets of Z and often depend on the resolution level j for all compactly supported wavelets and for most rapidly vanishing wavelets. The wavelet network in (1) may involve a great number of candidate wavelet terms. Experience shows that many of the terms are often redundant and only a small number of significant wavelet terms are necessary to describe a given nonlinear system with a given accuracy. Some basis selection algorithms have been proposed to select the significant basis functions from the candidate wavelet library [4,5]. However, when the number of basis functions in the wavelet library is very large, the heavy computational cost may make the basis selection algorithms infeasible in practice. In this paper, a genetic algorithm is introduced to find the significant wavelets to include in the wavelet network.
3 A GA-Based Constructive Algorithm for Wavelet Networks
In this section, a new constructive algorithm is proposed for WNNs, which starts with no wavelet in the WNN and adds new wavelets trained using GA.
Given N pairs of training samples, {x(1), y(1)}, …, {x(N), y(N)}, set the desired output y = [y(1), y(2), …, y(N)]^T. Suppose the dilation is in the range [Jmin, Jmax], where Jmin and Jmax are integers which represent the coarsest and finest resolution levels respectively. For simplicity, assume that the dilations of all dimensions for each wavelet are equal, that is, d_i1 = … = d_in = d_i. The algorithm can be summarized as follows:

Step 1: Initialization. Set the output of the WNN ŷ_0 = 0, the residual r_0 = y, and the dilation d = Jmin; set the number of wavelets i = 1. A WNN with i − 1 wavelets implements the function given by

ŷ_{i−1}(x) = Σ_{j=1}^{i−1} w_j ψ_j(x)   (2)
where ψ_j(x) represents the function implemented by the j-th wavelet. Moreover, r_{i−1}(x) = y(x) − ŷ_{i−1}(x) is the residual error function for the current network with i − 1 wavelets. Addition of a new wavelet proceeds in two steps:

Step 2: Input training. Use GA to train the translation parameters Θ = [θ_1, θ_2, …, θ_n]^T, where n is the dimension of the wavelet. Set φ = [ψ(1), ψ(2), …, ψ(N)]^T, where ψ(t) = Π_{j=1}^{n} ψ(2^d x_j(t) − θ_j). Let

ŷ_i = ŷ_{i−1} + wφ,  r_i = y − ŷ_i = r_{i−1} − wφ

with w = (φ^T φ)^{−1} φ^T r_{i−1}. The best Θ may be selected to minimize the cost function

V_i(Θ) = r_i^T r_i = (r_{i−1} − wφ)^T (r_{i−1} − wφ) = r_{i−1}^T r_{i−1} − (φ^T φ)^{−1} (φ^T r_{i−1})^2   (3)

A genetic algorithm is proposed to solve the optimization problem in (3) to find Θ*. The translation parameters of a new wavelet are encoded into the chromosome. The fitness function used to evaluate a chromosome in the population can be chosen as the residue reduction of a new wavelet, which is written as

f(Θ) = (φ^T φ)^{−1} (φ^T r_{i−1})^2   (4)

Spinning the roulette wheel is used as the selection operator. A conventional one-point crossover operator has been employed. Two mutation operators, random mutation and little-perturbation mutation, have been used.

Step 3: Output training. If V_{i−1} − V_i(Θ*) < λV_{i−1} (λ is a chosen constant threshold for the decay rate), the wavelet is rejected; go to Step 4 for the next
resolution; otherwise, Θ is accepted, and we set T_i := [t_i1, …, t_in]^T = Θ*, d_i := d and φ_i := [ψ_i(1), …, ψ_i(N)]^T, where ψ_i(t) = Π_{j=1}^{n} ψ(2^{d_i} x_j(t) − t_ij). Then φ_i is normalized as v_i = φ_i / √(φ_i^T φ_i). Suppose i − 1 wavelets have been obtained and orthonormalized as q_1, q_2, …, q_{i−1}. The newly obtained v_i is orthogonalized to the previous wavelets as follows:

p_i = v_i − ((v_i^T q_1) q_1 + … + (v_i^T q_{i−1}) q_{i−1})   (5)

q_i = p_i / √(p_i^T p_i)   (6)

w̄_i = q_i^T y   (7)

and set ŷ_i = ŷ_{i−1} + w̄_i q_i, r_i = r_{i−1} − w̄_i q_i. If r_i^T r_i < ε, the approximation accuracy is reached; go to Step 5. Otherwise, set i := i + 1 and go to Step 2 to train a new wavelet at the current resolution.

Step 4: Change the dilation parameter d := d + 1; if d < Jmax (the maximum resolution), go to Step 2; otherwise, go to Step 5.

Step 5: Set M := i (the number of wavelets). Stop training.
4 Numerical Examples
Chaotic time series identification: The logistic map [2] is a chaotic time series, close to white noise, that satisfies the ergodicity property. This series can be generated as follows:

x_{n+1} = 4 x_n (1 − x_n),  x_0 ∈ (0, 1)   (8)
The 1-D wavelet used is the Mexican hat with support [−4, 4]. The population size used for the GA is 20, and the maximum number of generations is 200. In order to compare the prediction results of the WNN with other work [2], the training and testing sets are identical to those in [2]. The training set consists of 2000 points extracted from the series generated from (8) with initial condition x_0 = 1/11. Ten independent runs are performed. To test the generalization capabilities of the constructed wavelet networks, 8000 points have been generated starting from the following different initial conditions:

x_1 = √2/2,  x_2 = √3/3,  x_3 = 1/11,  x_4 = 4/7,  x_5 = 8/9   (9)

The generalization capabilities are measured in terms of the normalized square error, ERR% = 100 · Σ_{k=1}^{N} (y(k) − ŷ(k))² / Σ_{k=1}^{N} y(k)², and the standard deviation (SD), σ = √( Σ_{k=1}^{N} (y(k) − ŷ(k))² / N ). Means of the numbers of wavelets, ERRs and SDs are shown in Tables 1 and 2. It can be seen that the wavelet networks constructed using the proposed approach have fewer nodes and better generalization capabilities than the networks in [2].
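The series of Eq. (8) and the two error measures can be sketched as follows (the square root inside the SD is an assumption consistent with its name; the measures are evaluated here on a trivial pair of identical series just to show the interface):

```python
import math

def logistic_series(x0, n):
    """Eq. (8): x_{n+1} = 4 x_n (1 - x_n), with x_0 in (0, 1)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs

def err_percent(y, y_hat):
    """Normalized square error: ERR% = 100 * sum (y - yhat)^2 / sum y^2."""
    return 100.0 * sum((a - b) ** 2 for a, b in zip(y, y_hat)) / sum(a * a for a in y)

def sd(y, y_hat):
    """Standard deviation of the prediction error: sqrt(sum (y - yhat)^2 / N)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

train = logistic_series(1.0 / 11.0, 2000)   # the paper's training series
print(len(train), err_percent(train, train), sd(train, train))
```

The map keeps every iterate inside [0, 1], which is why the training and testing series remain bounded regardless of the chaotic sensitivity to x_0.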
Table 1. Comparison of the prediction result of the proposed WNNs with the result in [2]

Algorithms   Number of nodes   ERR      SD
RBNN2 [2]    20                0.0090   0.0033
WBNN3 [2]    14                0.0091   0.0033
WBNN4 [2]    12                0.0090   0.0033
GA-WNN       4                 0.0052   1.92e-3

Table 2. Generalization performance of the WNNs using the proposed GA algorithm

Measures   x1 = √2/2   x2 = √3/3   x3 = 1/11   x4 = 4/7   x5 = 8/9
ERR, %     0.0049      0.0050      0.0051      0.0051     0.0051
SD (σ)     1.86e-3     1.88e-3     1.89e-3     1.89e-3    1.89e-3
5 Conclusion
In this paper, a new constructive algorithm has been proposed for WNNs, which starts with no wavelet in the WNN and adds new wavelets from the low resolution level to the high resolution level. At each resolution, the translation parameters of a new wavelet are trained using GA. Since GA is a directed random search technique, the local optima trap problem of gradient-type algorithms is avoided. Moreover, the proposed GA-optimized WNN can be used for high dimensional problems.
References
1. Zhang, Q., Benveniste, A.: Wavelet Networks. IEEE Trans. on Neural Networks 3 (1992) 889-898
2. Alonge, F., D'Ippolito, F., Raimondi, F.M.: System Identification via Optimised Wavelet-Based Neural Networks. IEE Proc.-Control Theory Appl. 150(2) (2003) 147-154
3. Billings, S.A., Wei, H.: A New Class of Wavelet Networks for Nonlinear System Identification. IEEE Trans. on Neural Networks 16(4) (2005) 862-874
4. Zhang, Q.: Using Wavelet Network in Nonparametric Estimation. IEEE Trans. on Neural Networks 8 (1997) 227-236
5. Xu, J., Ho, D.W.C.: A Basis Selection Algorithm for Wavelet Neural Networks. Neurocomputing 48 (2002) 681-689
6. Yao, X.: Evolving Artificial Neural Networks. Proc. IEEE 87(9) (1999) 1423-1447
The Optimal Solution of TSP Using the New Mixture Initialization and Sequential Transformation Method in Genetic Algorithm* Rae-Goo Kang and Chai-Yeoung Jung** Dept. Computer Science & Statistic, Chosun University, 375 Seoseok-dong Dong-gu Gwangju, Korea
[email protected],
[email protected]

Abstract. TSP is the problem of finding the shortest route among all possible tours where one starts at a certain city, visits every one of the N cities exactly once, and returns to the starting city. This paper proposes a new method that uses both population initialization and the sequential transformation method at the same time, and then demonstrates the improved performance by comparing it with existing methods.

Keywords: Genetic Algorithm, GA, TSP, Initialization.
1 Introduction
TSP is the problem of finding the shortest tour that starts from a certain city, visits each of N cities exactly once, and returns to the starting city, given the distances between cities. The search space of TSP is {T1, T2, ....., Tn}, the set of all tours, and its size is N!. The solution is the shortest travelling distance. TSP is applied to various fields such as network optimization, deciding the process order in a factory, and home-delivery routing [1][2][3]. This paper derives a new Mixture Initialization, using both Random Initialization and Induced Initialization, for the population initialization that must precede applying a GA to TSP. It also obtains a solution closest to the optimum by applying a sequential transformation method before the selection operator.
2 Genetic Algorithm in the Experiment
2.1 Selection Operator
Although various selection operators have been presented so far, the common rule is that superior solutions should be chosen with high probability. This paper uses the Roulette Wheel selection and Stochastic Universal Sampling operators.
* This study was supported by research funds from Chosun University, 2006.
** Corresponding author.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1181 – 1185, 2006. © Springer-Verlag Berlin Heidelberg 2006
Roulette Wheel selection, the most representative operator, computes the selection probability of each individual i as f(i) divided by the sum of f(i), i = 1, 2, 3, ....., N, where f denotes the fitness of individual i (fitness must not be 0). With Stochastic Universal Sampling, individuals are selected with a single spin of equally spaced pointers, so the number of times each individual is chosen closely matches its fitness-proportional expectation.
2.2 Crossover Operator
The crossover operator is the most varied and representative operator in GA. PMX, CX, OX, and Edge Recombination (ER) were used in this paper [4]. Unlike PMX, CX, and OX, ER is a kind of heuristic crossover operator in the line of Grefenstette [5]: it decides the offspring's genes using the edge information of the parent generation, rather than the parents' gene positions directly. It was introduced by Starkweather et al. [6].
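The two selection operators of Section 2.1 can be sketched as follows. This is an illustrative Python sketch (the paper gives no code; the function names are ours); note how SUS spaces k pointers evenly over one spin of the wheel, while roulette wheel selection spins once per pick.

```python
import random

def roulette_wheel(population, fitness, k):
    """Pick k individuals, each with probability proportional to its fitness."""
    total = sum(fitness)
    probs = [f / total for f in fitness]
    chosen = []
    for _ in range(k):
        r = random.random()
        acc = 0.0
        for ind, p in zip(population, probs):
            acc += p
            if r <= acc:
                chosen.append(ind)
                break
        else:  # guard against floating-point shortfall
            chosen.append(population[-1])
    return chosen

def stochastic_universal_sampling(population, fitness, k):
    """Pick k individuals with k equally spaced pointers on the wheel."""
    total = sum(fitness)
    step = total / k
    start = random.uniform(0, step)
    pointers = [start + i * step for i in range(k)]
    chosen, acc, i = [], 0.0, 0
    for ind, f in zip(population, fitness):
        acc += f
        while i < k and pointers[i] <= acc:
            chosen.append(ind)
            i += 1
    return chosen
```

Because SUS uses a single random offset, an individual holding a fraction p of total fitness receives between floor(pk) and ceil(pk) copies, which reduces the sampling variance of plain roulette wheel selection.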
Fig. 1. Edge Recombination
Fig. 1 shows the operation process of ER, one of the most widely used crossover operators. First, one city is chosen at random (in Fig. 1, city 0 is chosen), and then the next city is the one with the fewest remaining linkable edges; if the numbers of edges are equal, a city is chosen at random. After all these steps are finished, the new offspring is obtained.
2.3 Mutation Operator
Each population becomes stronger and more alike through the selection and crossover operators. However, as generations pass, the diversity of genes decreases. Mutation operators are used to compensate for this fault: they prevent a specific bit from becoming fixed from the early generations, producing new individuals. In this paper, swapping mutation and inversion are used among the mutation operators.
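The two mutation operators used here admit a compact sketch. This is an illustrative Python version (names and the per-individual mutation probability `pm` are our conventions); both operators preserve the tour as a permutation, which is essential for TSP.

```python
import random

def swap_mutation(tour, pm):
    """With probability pm, exchange two randomly chosen cities."""
    tour = tour[:]  # do not modify the parent in place
    if random.random() < pm:
        i, j = random.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

def inversion_mutation(tour, pm):
    """With probability pm, reverse a randomly chosen sub-path of the tour."""
    tour = tour[:]
    if random.random() < pm:
        i, j = sorted(random.sample(range(len(tour)), 2))
        tour[i:j + 1] = reversed(tour[i:j + 1])
    return tour
```

Inversion is often preferred for TSP because reversing a sub-path changes only the two edges at its endpoints, a smaller disturbance than an arbitrary swap.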
3 Proposed Methods This paper proposes Mixture Initialization and Sequential Transformation method to obtain a superior solution of TSP.
3.1 Mixture Initialization
The first population initialization is more important than anything else for getting close to the optimal solution. There are two methods of population initialization. One is Random Initialization, where the population is produced by random sampling without any rules. The other is Induced Initialization, where the population is produced consistently using background knowledge and information about the given values. Random Initialization has mostly been used among the population initialization methods for TSP. This paper proposes Mixture Initialization, which uses Random Initialization and Induced Initialization at the same time. A starting city is chosen with a random generator (the random component), and the remaining cities are listed in order from the shortest to the farthest distance, using the already-known distances between cities (the induced component). If N cities are listed in this way, an N × N matrix is formed. This matrix serves as the first population.
3.2 Sequential Transformation Method
Before the selection operator runs in each generation, the proposed sequential transformation method reorders the solutions of the population produced in the former generation and then applies the selection operator. With this method, the probability that a superior solution is chosen becomes higher when the population for the next generation is produced, because the populations are rearranged in order of the quality of the solutions they produced in the former generation.
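One plausible reading of the Mixture Initialization above can be sketched as follows. The paper describes the construction only in prose, so the details here are our interpretation: each row of the N × N matrix is a tour whose start city is the random component and whose remaining cities are sorted from nearest to farthest from that start (the induced component).

```python
import random

def mixture_initialization(dist):
    """Build the initial N x N population matrix: one tour per start city.
    dist is an N x N matrix of known inter-city distances."""
    n = len(dist)
    starts = list(range(n))
    random.shuffle(starts)  # random component: order of start cities
    population = []
    for start in starts:
        # induced component: remaining cities from nearest to farthest
        others = sorted((c for c in range(n) if c != start),
                        key=lambda c: dist[start][c])
        population.append([start] + others)
    return population
```

Each tour is a valid permutation, so the usual permutation-preserving crossover and mutation operators (Sections 2.2-2.3) can be applied directly to this population.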
4 Experimental Results
To analyze the results of the Mixture Initialization and sequential transformation method proposed in this paper, 2 selection operators, 4 crossover operators, and 2 mutation operators were used, and the 2 best genes were preserved by elitism. The number of cities was 100, the population size 1000, and the number of generations unlimited; a run ended automatically if the best solution did not change for 100 generations. The crossover and mutation probabilities were Pc = 0.7 and Pm = 0.1. The experiment was implemented with PowerBuilder 8.0 on Windows XP on a Pentium 4 2.4 GHz machine, and the data were saved and analyzed with Oracle 9i. Fig. 2 shows the application, newly developed with PowerBuilder 8.0 to prove the capability of the two proposed methods, at the moment the best result was obtained. As shown in Table 1, 2664 km is the shortest distance with the commonly used Random Initialization, but a better result is obtained with the newly proposed Mixture Initialization and sequential transformation method: the shortest distance is 2528 km. The distances obtained with the proposed methods are shorter in every case, and the two methods achieved a maximum improvement rate of 10.1%, a minimum of 0.2%, and an average of 6.2%.
Fig. 2. Application used in experiment

Table 1. Comparison of experimental results

                                              existing method           newly-proposed method
Selection    Crossover  Mutation    Min    Avg.    gen       Min    Avg.    gen
Roulette     PMX        Inversion   2776   3812.2  2650      2528   3332.1  2427
Wheel                   Swapping    3328   4215    3520      3185   3965.1  3320
             ER         Inversion   3521   4996.1  3358      3217   4402.3  3256
                        Swapping    2998   3512    2658      2698   3168    2756
             CX         Inversion   2664   3081.2  2375      2618   3990.8  3012
                        Swapping    2968   3325.2  2451      2812   4065.1  2405
             OX         Inversion   3028   3521.2  3302      2786   3302.1  3598
                        Swapping    2908   3865.5  2785      2901   4002    2741
Stochastic   PMX        Inversion   3054   3344.5  2556      3015   4812.1  2588
universal               Swapping    4215   3702    3322      3854   4102.2  3025
sampling     ER         Inversion   3524   3561.1  3158      3222   5021    2930
                        Swapping    3698   2968.7  2745      3213   5502.6  3302
             CX         Inversion   2967   3206.2  2566      2851   4892.1  2547
                        Swapping    4015   3345    2856      3874   1561.9  2635
             OX         Inversion   3025   3625    3254      3005   5009    3010
                        Swapping    3332   3478    3023      2995   5821.1  2998
Fig. 3. Optimal value graph using newly-proposed method
5 Conclusion
This paper proposes two methods to solve TSP more effectively. One is Mixture Initialization, which uses Random Initialization and Induced Initialization at the same time; the other is the sequential transformation method, which rearranges the individuals in each generation and raises the probability that superior genes are chosen. With these two methods, an average improvement rate of 6.2% was obtained, and superior values were produced from the first generation, unlike with existing methods. In other words, efficiency is improved by using the given information on distances between cities and by raising the selection probability of superior genes through rearranging the individuals. The methods proposed in this paper were tested with a plain GA only; further study is needed to obtain optimal values when a new algorithm is applied or the city sets become larger and more complicated.
References
1. Jin gang gou: Genetic Algorithm and the Application. Kousa (2000)
2. Goldberg, D.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison Wesley, Reading, MA (1989)
3. Boese, K.D.: Cost Versus Distance in the Traveling Salesman Problem. Technical Report CSD-950018, UCLA (1995)
4. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag (1992)
5. Grefenstette, J., Gopal, R., Rosmaita, B., Gucht, D.: Genetic Algorithms for the Traveling Salesman Problem. Proc. the 1st Inter. Conf. on GAs and Their Applications (1985) 160-168
6. Whitley, D., Starkweather, T., Fuquay, D.: Scheduling Problems and Traveling Salesman: The Genetic Edge Recombination Operator. Proc. Third Int. Conf. on GAs (1989) 133-140
Steering Law Design for Single Gimbal Control Moment Gyroscopes Based on RBF Neural Networks Zhong Wu1, Wusheng Chou2, and Kongming Wei1 1 School of Instrumentation Science and Optoelectronics Engineering Beijing University of Aeronautics and Astronautics, Beijing 100083, China
[email protected],
[email protected] 2 School of Mechanical and Automation Engineering Beijing University of Aeronautics and Astronautics, Beijing 100083, China
[email protected]
Abstract. Usually, the pseudo-inverse of the Jacobian matrix needs to be calculated in the conventional steering laws for Single Gimbal Control Moment Gyroscopes (SGCMGs). However, the steering law cannot work when the Jacobian matrix is singular and its pseudo-inverse is indefinite. To avoid these conditions, a new steering law is designed using radial basis function (RBF) neural networks. This algorithm outputs the desired gimbal angles directly according to the momentum command. Moreover, it can deal with singular conditions, since the pseudo-inverse of the Jacobian matrix is not needed. Simulation results demonstrate the effectiveness of the steering law.
1 Introduction
Single Gimbal Control Moment Gyroscopes (SGCMGs) are torque-producing devices that are mounted inside a spacecraft and operate on the principle of momentum exchange. Due to their torque-amplification capability and simple construction, SGCMGs have wide application prospects in the field of spacecraft control. Generally, three or more SGCMGs are used to meet the needs of 3-axis attitude control. In order to control the spacecraft attitude accurately using SGCMGs, high-performance steering laws should be designed. Usually, the pseudo-inverse of the Jacobian matrix needs to be calculated in the conventional steering laws [1-7]. However, the steering law cannot work when the Jacobian matrix is singular and its pseudo-inverse is indefinite. Although steering laws based on the singularity-robust inverse can escape from the singular points, they cannot produce the exact torque for attitude control, and unexpected attitude oscillations result from the steering errors [4-7]. Therefore, Krishnan and Vadali [8] presented a steering law based on the transpose of the Jacobian matrix, which can provide viable gimbal rates for SGCMGs even when the gimbals pass through a singular configuration. Paradiso [9] adopted a global search technique to produce singularity-avoiding feedforward gimbal trajectories in response to a command history forecast from a momentum management or maneuver scheduler. Nevertheless, both methods have a heavy computational burden and cannot be implemented easily.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1186 – 1190, 2006. © Springer-Verlag Berlin Heidelberg 2006
Actually, the problem of steering law design can be regarded as inverting a nonlinear map, the SGCMG kinematics. If the inverse map can be obtained, the steering law design is fulfilled. Therefore, the problem of steering law design can be transformed into a study of the inverse kinematics. In this paper, radial basis function (RBF) neural networks are utilized to approximate the inverse, since they can approximate any nonlinear continuous function with an arbitrary degree of accuracy. Thus, the steering law based on RBF neural networks is a global method and can output the desired gimbal angles directly in response to the momentum command. The rest of this paper is organized as follows. In Section 2, a brief description of the kinematics of SGCMG systems is given. Section 3 describes how to design a steering law based on RBF neural networks. In Section 4, simulation results are presented to verify the steering law. Conclusions are drawn in Section 5.
2 SGCMG Kinematics and Problem Statement
For convenience, assume n identical SGCMGs are used in the attitude control system of the spacecraft (n > 3). Let σ, h denote the gimbal angle vector and the angular momentum vector respectively; then the kinematics of the SGCMG system can be written as [2-3]:

h(t) = f(σ(t))    (1)

and correspondingly,

ḣ(t) = J(σ(t)) σ̇(t)    (2)

where f(⋅) is a certain nonlinear function, and the Jacobian matrix J(σ(t)) = ∂f(σ(t))/∂σ(t).
3 Steering Law Design Based on RBF Neural Networks
In order to avoid the calculation of the pseudo-inverse of the Jacobian matrix, the steering law can be designed as a direct inverse of (1). RBF neural networks can be employed to approximate the inverse, since f is a complex nonlinear function and it is impossible to obtain its inverse in analytical form. Furthermore, the RBF network should be adjusted on line to guarantee the steering performance, as shown in Fig. 1. This network has 3 input nodes and n output nodes. The weights wji can be adjusted on line according to the steering errors. The gimbal angles are given by [10]:

σi = w0i + Σ(j=1..3) wji φj(|hd − cj|),  i = 1, 2, ⋅⋅⋅, n    (3)

Fig. 1. Steering law based on RBF neural networks
where hd(t) is the desired momentum trajectory, wji is the weight between the jth hidden node and the ith output node, φj is the radial basis function, and cj is the center vector of the function, j = 1, 2, 3. Let σ = [σ1, ⋅⋅⋅, σn]T, W0 = [w01, ⋅⋅⋅, w0n]T, W = (wij)n×3, φ = [φ1(|hd − c1|), φ2(|hd − c2|), φ3(|hd − c3|)]T; then (3) can be written in the following form:

σ = W0 + Wφ    (4)

Choose a performance index:

E = eTe / 2    (5)

where e = hd − h. Assume hd varies very slowly; then

∂e/∂σ ≈ −∂h/∂σ = −J    (6)

Taking the time derivative of E along (2), (4), and (6), we get

Ė = eTė = eT (∂e/∂σ) σ̇ = −eT J Ẇ φ    (7)

Let

Ẇ = α JT e φT    (8)

Then (7) becomes

Ė = −α eT J JT e φT φ ≤ 0    (9)

where α is a positive constant. From (9), it is easy to see that the function E is non-increasing and bounded below. Therefore, we can use (8) as the update law of the weight matrix W to make the steering error e converge to zero asymptotically. Additionally, the update law (8) can also be written in terms of wji, i.e.

ẇji = α JiT e φj(|hd − cj|),  i = 1, 2, ⋅⋅⋅, n,  j = 1, 2, 3    (10)
where Ji = ∂h/∂σi is the ith column vector of J. Before use, the steering law based on RBF networks should first be trained with enough sample data. For redundant SGCMGs, it is very difficult to determine the sample space, because a fixed momentum h corresponds to a manifold in gimbal-angle space. Therefore, sample trajectories can be used to generate the sample data. For example, we can first choose enough momentum trajectories in momentum space. Then the corresponding gimbal-angle trajectories can be determined using global optimization techniques [9] that guarantee avoidance of all internal singularities. Thus, the momentum and gimbal-angle trajectories compose the sample space used to train the network. Obviously, the use of sample trajectories eliminates the problems caused by the nonlinear surjective map (the SGCMG kinematics), and improves the performance of singularity avoidance.
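A minimal numpy sketch of the steering law (3) and the online weight update (8) is given below. The unit-width Gaussian basis and the Euler integration step are our assumptions (the paper does not specify them); `J`, `e`, and `phi` correspond to the Jacobian, steering error, and basis vector of the text.

```python
import numpy as np

def gimbal_angles(W0, W, centers, hd):
    """Eq. (3): sigma_i = w0i + sum_{j=1..3} wji * phi_j(|hd - cj|).
    Unit-width Gaussian basis functions are assumed."""
    phi = np.array([np.exp(-np.linalg.norm(hd - c) ** 2) for c in centers])
    return W0 + W @ phi

def update_weights(W, J, e, phi, alpha, dt):
    """Eq. (8): Wdot = alpha * J^T e phi^T, integrated with one Euler step."""
    return W + dt * alpha * np.outer(J.T @ e, phi)
```

By (9), the derivative of the performance index along this update is −α eᵀJJᵀe φᵀφ, which is never positive, so E cannot increase as the weights adapt.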
4 Simulation Results To demonstrate the effectiveness of the steering law based on RBF neural networks, a simulation study was performed concerning a pyramid-type SGCMG system with the following kinematics[3].
h1 = h0 (−cβ s1 − c2 + cβ s3 + c4)
h2 = h0 (c1 − cβ s2 − c3 + cβ s4)    (11)
h3 = h0 sβ (s1 + s2 + s3 + s4)

where sβ = sin β, cβ = cos β, si = sin σi, ci = cos σi, i = 1, ⋅⋅⋅, 4, and β = 54.74°. Assume that each SGCMG has unit momentum (h0 = 1), the initial gimbal angles are σ0 = [45°, −45°, 45°, −45°]T, and α = 0.2. Simulation results under different momentum commands are shown in Fig. 2 and Fig. 3, where the singularity measure is m = sqrt(|JJT|).
Fig. 2. Simulation curves (singularity measure m and steering error ‖e‖2) when hd = 0.7071[t, t, 0]T

Fig. 3. Simulation curves when hd = 0.7071[t, t, 0]T, t ≤ 0.83 s; hd = 0.7071[(1.66 − t), t, 0]T, t > 0.83 s
From Fig. 2, it can be seen that no internal singularities are encountered during the whole steering procedure; the maximal steering error is less than 0.04%. When the steering procedure approaches the saturation singular points (t > 3 s), the steering error increases rapidly, and a desaturation measure should be used. Fig. 3 also shows that there are no internal singularities during the steering procedure. Although the sudden variation of the momentum command (t = 0.83 s) results in an increase of the steering error, the error converges to zero rapidly and the maximal error is only 0.6%. Compared with the conventional steering laws [2-7], this one does not need the calculation of the pseudo-inverse of the Jacobian matrix and can deal with singular conditions easily.
5 Conclusion
In order to eliminate the problems resulting from the calculation of the pseudo-inverse of the Jacobian matrix in conventional steering laws, a new steering law is designed for redundant SGCMG systems using RBF neural networks. Since RBF networks can approximate any nonlinear function with arbitrary precision and without local-minimum problems, the steering law of this paper has good performance in accuracy and singularity avoidance. Furthermore, the algorithm outputs the desired gimbal angles directly according to the momentum command, and the pseudo-inverse of the Jacobian matrix is not needed.
References
1. Lappas, V.J., Steyn, W.H., Underwood, C.I.: Control moment gyro (CMG) gimbal angle compensation using magnetic control during external disturbances. Electronics Letters, 37(9) (2001) 603-604
2. Wu, Z., Wu, H.X.: Survey on steering laws for single gimbal control moment gyroscope systems. Journal of Astronautics, 21(4) (2000) 140-145 (in Chinese)
3. Bedrossian, N.S.: Steering law design for redundant single gimbal control moment gyro systems. M.S. Thesis, Mechanical Engineering, Massachusetts Inst. of Technology, Cambridge, MA, Aug. (1987)
4. Wie, B., Bailey, D., Heiberg, C.: Singularity robust steering logic for redundant single-gimbal control moment gyros. Journal of Guidance, Control and Dynamics, 24(5) (2001) 865-872
5. Ford, K.A., Hall, C.D.: Singular direction avoidance steering for control-moment gyros. Journal of Guidance, Control, and Dynamics, 23(4) (2000) 648-656
6. Jung, D., Tsiotras, P.: An experimental comparison of CMG steering control laws. In: Collection of Technical Papers - AIAA/AAS Astrodynamics Specialist Conference, Providence, RI, (2) (2004) 1128-1144
7. Wie, B.: Singularity escape/avoidance steering logic for control moment gyro systems. Journal of Guidance, Control and Dynamics, 28(5) (2001) 948-955
8. Krishnan, S., Vadali, S.R.: An inverse-free technique for attitude control of spacecraft using CMGs. Acta Astronautica, 39(6) (1996) 431-438
9. Paradiso, J.A.: Global steering of single gimballed control moment gyroscopes using a directed search. Journal of Guidance, Control and Dynamics, 15(5) (1992) 1236-1244
10. Yang, M., Li, J.G., Lu, G.Z.: The inverse kinematics control algorithm based on RBF neural networks for manipulators. In: Proceedings of the 5th World Congress on Intelligent Control and Automation, Hangzhou, China, (2004) 855-859
Automatic Design of Hierarchical RBF Networks for System Identification Yuehui Chen1, Bo Yang1,2, and Jin Zhou1
1 School of Information Science and Engineering, Jinan University, Jinan 250022, P.R. China
[email protected]
2 State Key Lab. of Advanced Technology for Materials Synthesis and Processing, Wuhan University of Science and Technology, Wuhan, China
[email protected] Abstract. The purpose of this study is to identify the hierarchical radial basis function neural networks and select important input features for each sub-RBF neural network automatically. Based on the pre-defined instruction/operator sets, a hierarchical RBF neural network is created and evolved by using Extended Compact Genetic Programming (ECGP), and the parameters are optimized by Differential Evolution (DE) algorithm. Empirical results on benchmark system identification problems indicate that the proposed method is efficient.
1 Introduction
A Hierarchical Neural Network (HNN) is a neural network architecture in which the problem is divided and solved in more than one step [1]. Ohno-Machado divides hierarchical networks into two architectures, bottom-up and top-down [1]. Many versions of HNN have been introduced and applied in various applications [1][3][4][5]. Erenshteyn and Laskov examine the application of a hierarchical classifier to the recognition of finger spelling [2]; they refer to hierarchical NNs as multi-stage NNs. Their approach aimed to minimize the network's learning time without reducing the accuracy of the classifier. Mat Isa et al. used a Hierarchical Radial Basis Function (HiRBF) network to increase RBF performance in diagnosing cervical cancer [3]. HiRBF cascades two RBF networks, where both networks have different structures but use the same algorithms. The first network classifies all data and performs a filtering process to ensure that only certain attributes are fed to the second network. The study shows that HiRBF performs better than a single RBF. HRBF has also proved effective in the reconstruction of smooth surfaces from sparse noisy data points [5]. In this paper, an automatic method for constructing HRBF networks is proposed. Based on pre-defined instruction/operator sets, an HRBF network can be created and evolved. HRBF allows input variable selection. In our previous studies, in order to optimize Flexible Neural Trees (FNTs), the hierarchical structure was evolved using the Probabilistic Incremental Program Evolution algorithm (PIPE) with specific instructions [6][7] and Ant Programming [8]. In this
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1191–1195, 2006. © Springer-Verlag Berlin Heidelberg 2006
Fig. 1. A basis function operator (left), and a tree-structural representation of a hierarchical RBF neural network with function instruction set F = {+2, +3, +4, +5, +6} and terminal instruction set T = {x1, x2, x3} (right)
research, the hierarchical structure is evolved using Extended Compact Genetic Programming. The fine tuning of the parameters encoded in the structure is accomplished using Differential Evolution (DE). The novelty of this paper is the use of the HRBF model for selecting the important variables and for improving the accuracy of system identification.
2 The Hierarchical RBF Model
The function set F and terminal instruction set T used for generating a hierarchical RBF model are described as S = F ∪ T = {+2, +3, ..., +N} ∪ {x1, ..., xn}, where +i (i = 2, 3, ..., N) denote non-leaf nodes' instructions taking i arguments, and x1, x2, ..., xn are leaf nodes' instructions taking no other arguments. The output of a non-leaf node is calculated as an RBF neural network model (see Fig. 1). From this point of view, the instruction +i is also called a basis function operator with i inputs; the operator is shown in Fig. 1 (left). In general, a basis function network can be represented as y = Σ(i=1..m) ωi ψi(x; θ), where x ∈ Rn is the input vector, ψi(x; θ) is the ith basis function, ωi is the corresponding weight of the ith basis function, and θ is the parameter vector used in the basis functions. In this research, Gaussian radial basis functions are used, ψi(x; θ) = Π(j=1..n) exp(−(xj − bj)² / aj²), and the number of basis functions used in the hidden layer is the same as the number of inputs, that is, m = n.
Tree Structure Optimization. Finding an optimal or near-optimal HRBF is formulated as a product of evolution. In this paper, Extended Compact Genetic Programming (ECGP) [9] is employed to find an optimal or near-optimal HRBF structure. ECGP is a direct extension of ECGA to the tree representation, based on the PIPE prototype tree. In ECGA, Marginal Product Models (MPMs) are used to model the interaction among genes, represented as random variables, given a population of Genetic Algorithm individuals. MPMs are represented as measures of marginal distributions on partitions of random variables. ECGP is based on the PIPE prototype tree, and thus each node in the prototype tree is a random variable. ECGP decomposes or partitions the prototype tree into sub-trees, and the MPM factorizes the joint probability of all
nodes of the prototype tree to a product of marginal distributions on a partition of its sub-trees. A greedy search heuristic is used to find an optimal MPM model under the framework of minimum-encoding inference. ECGP can represent the probability distribution for more than one node at a time; thus, it extends PIPE in that the interactions among multiple nodes are considered.
Parameter Optimization with the DE Algorithm. The DE algorithm was first introduced by Storn and Price in 1995 [10]. In generation k, we denote the population members by x1^k, x2^k, ..., xN^k. The DE algorithm is given as follows [11]: 1) Set k = 0, and randomly generate N points x1^0, x2^0, ..., xN^0 from the search space to form an initial population; 2) For each point xi^k (1 ≤ i ≤ N), execute the DE offspring generation scheme to generate an offspring xi^(k+1); 3) If the given stop criterion is not met, set k = k + 1 and go to step 2). The DE offspring generation approach is given as follows: 1) Choose one point xd randomly such that f(xd) ≤ f(xi^k), another two points xb, xc randomly from the current population, and a subset S = {j1, ..., jm} of the index set {1, ..., n}, where m < n and all ji are mutually different; 2) Generate a trial point u = (u1, u2, ..., un) as follows:
DE mutation. Generate a temporary point z as

z = (F + 0.5) xd + (F − 0.5) xi + F (xb − xc)    (1)

where F is a given control parameter.
DE crossover. For j ∈ S, uj is chosen to be zj; otherwise, uj is chosen to be (xi^k)j;
3) If f(u) ≤ f(xi^k), set xi^(k+1) = u; otherwise, set xi^(k+1) = xi^k.
3 Simulation Studies
In this research, the benchmark Mackey-Glass and Jenkins-Box time-series problems are employed to evaluate the performance of the proposed method. For the Mackey-Glass time-series, we predict x(t + 6) using the input variables x(t), x(t − 6), x(t − 12), x(t − 18), x(t − 24) and x(t − 30). The evolved HRBF tree and the actual time-series data, the output of the HRBF model, and the prediction error are shown in Fig. 2 (left). A comparison of different methods for forecasting the Mackey-Glass data is shown in Table 1 (left). For the Jenkins-Box time-series, 10 input variables are used for constructing a HRBF model; the proper time-lags are finally determined by an evolutionary procedure. The evolved HRBF tree and the actual time-series, the HRBF model output, and the prediction error are shown in Fig. 2 (right). From the evolved HRBF tree, it can be seen that the optimal input
Fig. 2. The evolved architecture of the HRBF model for prediction of the Mackey-Glass time-series (left), and the actual time-series data, output of the evolved HRBF model, and the prediction error (right)
variables for constructing a HRBF model are: u(t − 2), u(t − 3), u(t − 4), u(t − 6), y(t − 1), y(t − 2) and y(t − 3). It should be noted that the HRBF model with properly selected input variables has accurate precision and good generalization ability. A comparison of different methods for forecasting the Jenkins-Box data is shown in Table 1 (right). From the above simulation results, it can be seen that the proposed HRBF model works well for generating prediction models of time series.

Table 1. Comparison of prediction errors using different methods for the Mackey-Glass and Gas furnace time-series problems

Mackey-Glass                      Gas Furnace
Method           RMSE             Method           MSE
RBF [12]         0.0114           ANFIS [14]       0.0073
GA+Fuzzy [13]    0.049            FuNN [15]        0.0051
FNT1 [7]         0.0069           FNT1 [7]         0.00066
FNT2 [7]         0.0027           FNT2 [7]         0.00029
HRBF             0.0076           HRBF             0.0012

4 Conclusions
Based on a novel representation and calculation of hierarchical RBF models, an approach for evolving HRBF networks was proposed in this paper. The hierarchical architecture and input selection of the HRBF were accomplished using the ECGP algorithm, and the free parameters embedded in the HRBF model were optimized using the DE algorithm. Simulation results show that the evolved HRBF models are effective for time-series prediction problems. Our future work will concentrate on applying the proposed approach to more complex problems.
Acknowledgment
This research was partially supported by the Natural Science Foundation of China under grant No. 60573065.
References
1. Ohno-Machado, L.: Medical Applications of Artificial Neural Networks: Connectionist Model of Survival. Ph.D. Dissertation, Stanford University (1996)
2. Erenshteyn, R., Laskov, P.: A Multi-Stage Approach to Fingerspelling and Gesture Recognition. In: Proceedings of the Workshop on the Integration of Gesture in Language and Speech. (1996) 185-194
3. Mat Isa, N.A., Mashor, M.Y., Othman, N.H.: Diagnosis of Cervical Cancer using Hierarchical Radial Basis Function (HiRBF) Network. In: Sazali Yaacob, R. Nagarajan, Ali Chekima (Eds.), Proceedings of the International Conference on Artificial Intelligence in Engineering and Technology. (2002) 458-463
4. Ferrari, S., Maggioni, M., Alberto Borghese, N.: Multiscale Approximation With Hierarchical Radial Basis Functions Networks. IEEE Trans. on Neural Networks. 15 (2004) 178-188
5. Ferrari, S., Frosio, I., Piuri, V., Alberto Borghese, N.: Automatic Multiscale Meshing Through HRBF Networks. IEEE Trans. on Instrumentation and Measurement. 54 (2005) 1463-1470
6. Chen, Y., Yang, B., Dong, J.: Nonlinear System Modeling via Optimal Design of Neural Trees. International Journal of Neural Systems. 14 (2004) 125-137
7. Chen, Y., Yang, B., Dong, J., Abraham, A.: Time-series Forecasting using Flexible Neural Tree Model. Information Science. 174 (2005) 219-235
8. Chen, Y., Yang, B., Dong, J.: Automatic Design of Hierarchical TS-FS Models using Ant Programming and PSO Algorithm. Lecture Notes in Computer Science 3192. (2004) 285-294
9. Sastry, K., Goldberg, D.E.: Probabilistic Model Building and Competent Genetic Programming. In: R. L. Riolo and B. Worzel (Eds.), Genetic Programming Theory and Practise. Kluwer (2003) 205-220
10. Storn, R., Price, K.: Differential Evolution - A Simple and Efficient Adaptive Scheme for Global Optimization over Continuous Spaces. Technical Report, International Computer Science Institute, Berkeley (1995)
11. Price, K.: Differential Evolution vs. the Functions of the 2nd ICEO. In: Proceedings of the 1997 IEEE International Conference on Evolutionary Computation (ICEC'97), Indianapolis, USA. (1997) 153-157
12. Cho, K.B., Wang, B.H.: Radial Basis Function Based Adaptive Fuzzy Systems and Their Application to System Identification and Prediction. Fuzzy Sets and Systems, 83 (1995) 325-339
13. Kim, D., Kim, C.: Forecasting Time Series with Genetic Fuzzy Predictor Ensembles. IEEE Trans. Fuzzy Systems 5 (1997) 523-535
14. Jang, J.-S.R., Sun, C.-T., Mizutani, E.: Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice-Hall, Upper Saddle River, NJ (1997)
15. Kasabov, N., Kim, J.S., Watts, M., Gray, A.: FuNN/2 - A Fuzzy Neural Network Architecture for Adaptive Learning and Knowledge Acquisition. Information Science. 101 (1996) 155-175
Dynamically Subsumed-OVA SVMs for Fingerprint Classification

Jin-Hyuk Hong and Sung-Bae Cho

Dept. of Computer Science, Yonsei University, 134 Sinchon-dong, Sudaemoon-ku, Seoul 120-749, Korea
[email protected], [email protected]

Abstract. A novel method for fingerprint classification, integrating the naïve Bayes classifier (NB) and one-vs-all (OVA) SVMs, is presented. To resolve the tie problem that arises when combining OVA SVMs, we propose a subsumption architecture dynamically organized by the class probabilities. NB calculates these probabilities using singularities and pseudo ridges, while the OVA SVMs are trained on FingerCode. The proposed method not only tolerates ambiguous fingerprint images by combining different fingerprint features, but also yields a classification accuracy of 90.8% for 5-class classification on the NIST 4 database, which is higher than that of conventional methods.
1 Introduction

Since the Henry system categorizes fingerprints by the relative position and number of core and delta points, many researchers have tried to extract these points from the flow of the ridges [1]. Karu and Jain proposed a heuristic algorithm based on singularities [2], while Zhang and Yan used singularities together with pseudo ridges to classify fingerprints [3]. Various features have also been actively investigated in pursuit of higher classification rates. Jain et al. proposed FingerCode, which uses a Gabor filter to extract the directional ridge flow [4]; Park used the orientation field filtered by the fast Fourier transform [5]; and Min et al. proposed localized models of SVMs using FingerCode [6]. There have also been attempts to integrate several features and methods to produce a robust fingerprint classifier [1, 7]. Senior used hidden Markov models and decision trees to recognize the ridge structure of the print [1], while Yao et al. combined flat and structured features using recursive neural networks and support vector machines (SVMs) [7].

This paper describes a novel fingerprint classification approach integrating the naïve Bayes classifier (NB) and SVMs. To achieve highly accurate classification, SVMs over FingerCode are generated with the one-vs-all (OVA) scheme, while NB with singularities dynamically organizes them.
Q. Yang and G. Webb (Eds.): PRICAI 2006, LNAI 4099, pp. 1196-1200, 2006. © Springer-Verlag Berlin Heidelberg 2006

2 A Dynamic Fingerprint Classifier

2.1 Overall Architecture

Contrary to conventional methods, which follow a static classification process, we propose a dynamic fingerprint classifier that not only uses various fingerprint features
(singularity, pseudo ridges and FingerCode) but also resolves the ambiguity of OVA SVMs. The proposed method consists of NB and OVA SVMs, as shown in Fig. 1. NB estimates the posterior probability for the fingerprint classes, prob = {pW, pL, pR, pA, pT}, using singular points and pseudo ridges, while the OVA SVMs classify fingerprints using FingerCode, producing the margins of a sample, o-svm = {mW, mR, mL, mA, mT}. We evaluate them with a subsumption architecture to manage ambiguities such as ties and rejects. The subsumption architecture selects an action among multiple models by evaluating each model sequentially: when a model is satisfied, it suppresses the other models.
Fig. 1. Overview of the proposed method
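The dynamic combination described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class probabilities, margins, and the rejection threshold r1 below are invented for the example, and the fallback when every margin is negative is an assumption.

```python
# Sketch of the dynamic subsumption architecture (Sect. 2.1).
# NB posteriors order the five one-vs-all SVMs; the SVMs are then
# evaluated in that order, and the first one whose margin is
# positive suppresses the rest.

CLASSES = ["W", "R", "L", "A", "T"]  # whorl, right loop, left loop, arch, tented arch

def classify(prob, margin, r1=0.05):
    """prob: NB posterior per class; margin: OVA SVM margin per class."""
    order = sorted(CLASSES, key=lambda c: prob[c], reverse=True)
    if prob[order[0]] < r1:     # no class is probable enough: reject
        return "reject"
    for c in order:             # subsumption: first satisfied model wins
        if margin[c] > 0:
            return c
    return order[0]             # all margins negative: fall back to NB (assumed)

prob = {"W": 0.55, "R": 0.25, "L": 0.10, "A": 0.06, "T": 0.04}
margin = {"W": -0.2, "R": 0.7, "L": 0.1, "A": -0.5, "T": -0.9}
print(classify(prob, margin))  # "R": the most probable class with a positive margin
```

Evaluating the SVMs in the order given by NB resolves ties (several positive margins) and rejects (all samples with a weak NB posterior) in a single sequential pass, rather than by a fixed static rule.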
prob[5]  = {pW, pR, pL, pA, pT}  // prob[] is calculated by the naïve Bayes classifier
order[5] = {0, 1, 2, 3, 4}
o-svm[5] = {mW, mR, mL, mA, mT}  // o-svm[] is obtained by the OVA SVMs

// classify with OVA SVMs according to the subsumption architecture
if(prob[order[0]] < r1)          // r1 is a rejection threshold
    return reject;

// determine the order of OVA SVMs to evaluate
for(i=0; i